Append instead of overwrite #19

Closed
rachelvuu opened this issue Jul 8, 2019 · 5 comments

Comments

@rachelvuu

Is there any way to append to an existing Hyper extract rather than overwrite the file?

@bwiley1
Owner

bwiley1 commented Jul 10, 2019

Hi Rachel,

Thanks for reaching out! Last time I looked into this, I had trouble manipulating the .hyper or .tde files because the writing functions in the tableausdk package appeared to be encrypted. I originally wanted to release another version that could convert a .tde or .hyper file to a pandas DataFrame, or otherwise move data between sources, but I ran into the same problem. I agree it would be a useful feature to add, though; I'll do more research on the issue. Thanks!

Best,
Ben

@ghost

ghost commented Jul 26, 2019

Hey @bwiley1 ,

I think the relevant code is around lines 139-154 in pandleau.py. That logic could be abstracted so that, instead of always creating a table definition from scratch, it first checks whether the 'Extract' table already exists and, if so, sets table_def to the existing definition.

At least in the 'old' SDK, tableauSDKSample.py has an example of this: the procedure createOrOpenExtract() first checks whether the table exists and creates it only if it doesn't. The procedure populateExtract() then gets the table schema using table.getTableDefinition().
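The create-or-open pattern from the SDK sample can be sketched in Python as follows. This is a minimal sketch, not pandleau's code: `InMemoryExtract` and `InMemoryTable` are hypothetical stand-ins so the example runs without the Tableau SDK installed. Their method names are modeled on the legacy SDK's `Extract`/`Table` methods (`hasTable`, `addTable`, `openTable`, `getTableDefinition`, `insert`), but their behavior is simplified, and `create_or_open_table` is an illustrative helper, not an SDK function. The `table_name` parameter shows how the same check could let one extract hold several tables.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a table definition: ordered (column name, type name) pairs.
TableDefinition = List[Tuple[str, str]]


class InMemoryTable:
    """Hypothetical stand-in for an extract table (not the real SDK class)."""

    def __init__(self, table_def: TableDefinition) -> None:
        self._table_def = list(table_def)
        self.rows: List[tuple] = []

    def getTableDefinition(self) -> TableDefinition:
        return list(self._table_def)

    def insert(self, row: tuple) -> None:
        if len(row) != len(self._table_def):
            raise ValueError("row does not match the table definition")
        self.rows.append(row)


class InMemoryExtract:
    """Hypothetical stand-in for an extract file holding named tables."""

    def __init__(self) -> None:
        self._tables: Dict[str, InMemoryTable] = {}

    def hasTable(self, name: str) -> bool:
        return name in self._tables

    def addTable(self, name: str, table_def: TableDefinition) -> InMemoryTable:
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        self._tables[name] = InMemoryTable(table_def)
        return self._tables[name]

    def openTable(self, name: str) -> InMemoryTable:
        return self._tables[name]


def create_or_open_table(extract, table_def, table_name="Extract"):
    """Open the named table if it exists, otherwise create it.

    When the table already exists, the stored definition is returned and the
    passed-in table_def is ignored, so new rows are appended, not overwritten.
    """
    if extract.hasTable(table_name):
        table = extract.openTable(table_name)
    else:
        table = extract.addTable(table_name, table_def)
    return table, table.getTableDefinition()


extract = InMemoryExtract()
schema = [("id", "INTEGER"), ("name", "UNICODE_STRING")]

table, table_def = create_or_open_table(extract, schema)
table.insert((1, "a"))

# A later call reuses the existing table instead of replacing it.
table, table_def = create_or_open_table(extract, schema)
table.insert((2, "b"))
print(len(table.rows))  # 2
```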

However, I don't know how well this will play with pandleau's "add index" function. TDEs (and, I assume, Hyper files as well) aren't really meant to be read by anything but Tableau, and the SDK doesn't have any public reading functions that I'm aware of.

I should have some time when I get home to clone the repo and see whether it can be adjusted. I'm assuming these functions exist in both SDK and SDK2, but I guess I'll find out!

@bwiley1
Owner

bwiley1 commented Jul 27, 2019

That's very true... I think using createOrOpenExtract would also solve the problem of writing multiple tables to a single extract (another open issue here). It would be great if you figure it out! Let me know if there's anything I can help with!

@ghost

ghost commented Jul 30, 2019

@bwiley1 Check it out here: https://github.com/harrison-h/pandleau/tree/load-existing-table

It was pretty straightforward. I've only tested it on the legacy SDK (since that's what my use case requires), but it works exactly as intended. My use case is transforming very large datasets to feed into Tableau, so I have to write the data to the extract in chunks.
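For chunked writes, one way to address the "add index" concern raised earlier is for the caller to keep a running row offset, so an added index column keeps counting across chunks instead of restarting. This is a self-contained sketch under that assumption: `iter_chunks`, `append_chunk`, and `ListTable` are hypothetical names (not pandleau or SDK functions), and because the SDK has no public reading functions, the offset is tracked by the caller rather than read back from the extract.

```python
from typing import Iterator, List, Sequence


def iter_chunks(rows: Sequence[tuple], size: int) -> Iterator[Sequence[tuple]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


class ListTable:
    """Hypothetical stand-in for an opened extract table; only supports insert()."""

    def __init__(self) -> None:
        self.rows: List[tuple] = []

    def insert(self, row: tuple) -> None:
        self.rows.append(row)


def append_chunk(table, chunk: Sequence[tuple], start_index: int) -> int:
    """Insert each row prefixed with a running index; return the next free index."""
    next_index = start_index
    for row in chunk:
        table.insert((next_index,) + tuple(row))
        next_index += 1
    return next_index


table = ListTable()
data = [("a",), ("b",), ("c",), ("d",)]
next_index = 0
for chunk in iter_chunks(data, 3):
    # In the real use case, each chunk would be written after reopening the existing table.
    next_index = append_chunk(table, chunk, next_index)
print(table.rows)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```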

Additionally, I think you're right that it allows writing multiple tables! As long as that argument is passed, it should work fine, though I haven't tested that either.

In full disclosure, I'm not a CS person, so feel free to point out anything in my code that could be improved. If it looks good to you as well, I can open a pull request.

@bwiley1
Owner

bwiley1 commented Jul 30, 2019

I think this looks fine! If you want to open a pull request I'll approve it, thanks!

@bwiley1 bwiley1 closed this as completed Jul 31, 2019