SAS7BDAT backend #41
@convert.register(list, SAS7BDAT, cost=8.0)
def sas_to_list(s, dshape=None, **kwargs):
    s.skip_header = True
    return list(s.readlines())
Don't need this one. I would just depend on the Iterator conversion. The conversion network will handle other transformations from Iterator.
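To illustrate the point about the conversion network, here is a minimal sketch of a cost-weighted registry. It is a toy, not into's actual internals: `register`, `cheapest_path`, `convert`, and the string node names are all hypothetical stand-ins. It only shows why an Iterator edge is enough for a list conversion to be found by routing.

```python
import heapq
from collections import defaultdict

# Toy registry (hypothetical, not into's real data structure):
# edges[source][target] = (cost, function)
edges = defaultdict(dict)


def register(target, source, cost=1.0):
    """Record a conversion edge from `source` to `target`."""
    def decorator(func):
        edges[source][target] = (cost, func)
        return func
    return decorator


def cheapest_path(source, target):
    """Dijkstra over the registered edges; returns the functions to apply in order."""
    counter = 0  # tie-breaker so the heap never compares paths or functions
    queue = [(0.0, counter, source, [])]
    seen = set()
    while queue:
        cost, _, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, (step_cost, func) in list(edges[node].items()):
            counter += 1
            heapq.heappush(queue, (cost + step_cost, counter, nxt, path + [func]))
    raise KeyError("no conversion from %r to %r" % (source, target))


def convert(target, source, value):
    """Apply each conversion on the cheapest path in turn."""
    for func in cheapest_path(source, target):
        value = func(value)
    return value


@register("iterator", "sas", cost=1.0)
def sas_to_iterator(rows):
    # Stand-in for reading rows from a SAS file.
    return iter(rows)


@register("list", "iterator", cost=1.0)
def iterator_to_list(it):
    return list(it)
```

With only these two edges registered, `convert("list", "sas", rows)` routes through the iterator step, so a dedicated sas-to-list function adds nothing unless it is cheaper than the two-hop path.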
Yep.
This looks pretty slick. Into has gotten smarter in recent months, so I've marked a couple of functions that probably no longer need to exist. Otherwise things look nice. Regarding test data. Is there any way to use
We'll also need to import it as is done here https://github.com/ContinuumIO/into/blob/master/into/__init__.py#L17-L20 and add sas7bdat to .travis.yml.
from ..resource import resource


SAS_type_map = {'number': 'float64',
                'string': 'string'}
Do we need to extend this?
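For context on what extending the map would involve, here is a rough sketch of turning per-column SAS types into a datashape-style record string. The helper `columns_to_dshape` and the `(name, sas_type)` input format are assumptions for illustration, not code from this PR. Unmapped types fail loudly so a missing entry is easy to spot.

```python
# Copy of the mapping from the diff above.
SAS_type_map = {'number': 'float64',
                'string': 'string'}


def columns_to_dshape(columns):
    """Build a record datashape string from (name, sas_type) pairs.

    Hypothetical helper: raises NotImplementedError for any SAS type
    that has no entry in SAS_type_map.
    """
    fields = []
    for name, sas_type in columns:
        if sas_type not in SAS_type_map:
            raise NotImplementedError(
                'No datashape mapping for SAS type %r' % sas_type)
        fields.append('%s: %s' % (name, SAS_type_map[sas_type]))
    return 'var * {%s}' % ', '.join(fields)
```

Supporting a new type would then mean adding one entry to `SAS_type_map`, assuming the underlying reader reports it under a distinct type name.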
Do you know how we would create a SAS file from any sort of object (e.g. a Pandas DataFrame)?
No, but @jaredhobbs might know how to create such a SAS file.
It would save us from having to hold on to binary data in the repository. It would also be nice to craft particular datasets to test certain things.
Unfortunately, the
Good to know. Thanks for the work that's already there and for chiming in.
This work revealed the following issue (and subsequent fix): https://bitbucket.org/jaredhobbs/sas7bdat/pull-request/5/unicode-fix-for-python-3/diff. This affects the DataFrame test in this PR.
Thanks for the fix. I merged the change and released v2.0.3.
Same behavior in Python 2.7 and 3.x now. Workarounds for Python 2.6.
Awesome. Sorry to leave this lingering. My bandwidth has been severely limited the last couple of days. Should definitely have some time on Monday to finish this up.
@mrocklin the only question I had was about the way I worked around Python 2.6. Happy to get any other feedback here too.
@convert.register(Iterator, SAS7BDAT, cost=1.0)
def sas_to_iterator(s, **kwargs):
    s.skip_header = True
Do we always want this? Is there some reason why this isn't default in the sas7bdat library?
I also removed the use of setting Last time around @jaredhobbs was pretty speedy about merging and updating. Here's hoping for a repeat performance :) @talumbau I'll push my changes to my fork and issue another PR.
Looks like @jaredhobbs has merged and updated PyPI. I'm good to merge with the changes in my PR to @talumbau. Are you ok with those changes? If so I'll go ahead and merge both into master.
few changes to sas7bdat backend
I agree with the changes and thanks for the 2.6 fixes. Also thanks @jaredhobbs for the quick merge!
This is in. Thanks all.
Following this conversation, it appears that sas7bdat now supports Python 3. This leverages the latest package, which can read compressed and uncompressed data. Some notes:
Since sas7bdat doesn’t write files, I’m not sure how to provide the test data other than putting it in the repo, which seems odd.
I don’t have a sense about cost weights. The conversions are unlikely to be a “middle step” though, so it probably doesn’t matter.
This can get better if I get access to data that has more than just ‘string’ or ‘numeric’ types.
Renaming sql_csv.py:excute_copy -> execute_copy causes an error, so I changed it to a (hopefully) meaningful name, just not the same name as the decorator.