Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAS7BDAT backend #41

Closed
wants to merge 1 commit into from
Closed

SAS7BDAT backend #41

wants to merge 1 commit into from

Conversation

talumbau
Copy link
Contributor

Following this conversation, it appears that sas7bdat now supports Python 3. This leverages the latest package, which can read compressed and uncompressed data. Some notes:

Since sas7bdat doesn’t write files, I’m not sure how to do the test data, other than put it in the repo, which seems odd

I don’t have a sense about cost weights. The conversions are unlikely to be a “middle step” though so it probably doesn’t matter.

This can get better if I get access to data that is more than either ‘string’ or ‘numeric’ types

Renaming sql_csv.py:excute_copy -> execute_copy causes an error, so I changed it to a (hopefully) meaningful name, just not the same name as the decorator

@convert.register(list, SAS7BDAT, cost=8.0)
def sas_to_list(s, dshape=None, **kwargs):
s.skip_header = True
return list(s.readlines())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this one. I would just depend on the Iterator conversion. The conversion network will handle other transformations from Iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

@mrocklin
Copy link
Member

This looks pretty slick.

Into has gotten smarter in recent months, so I've marked a couple of functions that probably no longer need to exist. Otherwise things look nice.

Regarding test data. Is there any way to use sas7bdat to create SAS files that we could use for testing?

@mrocklin
Copy link
Member

We'll also need to import it as is done here https://github.com/ContinuumIO/into/blob/master/into/__init__.py#L17-L20

and add sas7bdat to .travis.yml

from ..resource import resource

SAS_type_map = {'number': 'float64',
'string': 'string'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to extend this?

@mrocklin
Copy link
Member

Do you know how we would create an sas file from any sort of object (e.g. a Pandas DataFrame?)

@talumbau
Copy link
Contributor Author

No, but @jaredhobbs might know how to create such a SAS file.

@mrocklin
Copy link
Member

It would save us from having to hold on to binary data in the repository. It would also be nice to craft particular datasets to test certain things.

@jaredhobbs
Copy link

Unfortunately, the sas7bdat library is read-only. The format is closed and the only way we're able to read the data is through reverse engineering the format. While this seems to work ok for reading, there are still a lot of holes in the format that are not fully understood. Unless sas opens the format or someone can provide the missing information on the spec, there won't be any write support.

@mrocklin
Copy link
Member

Good to know. Thanks for the work that's already there and for chiming in.

@talumbau
Copy link
Contributor Author

This work revealed the following issue (and subsequent fix):

https://bitbucket.org/jaredhobbs/sas7bdat/pull-request/5/unicode-fix-for-python-3/diff

this affects the DataFrame test in this PR.

@jaredhobbs
Copy link

Thanks for the fix. I merged the change and released v2.0.3

@talumbau
Copy link
Contributor Author

same behavior in Python 2.7 and 3.x now. Work arounds for Python 2.6.

@mrocklin
Copy link
Member

Awesome. Sorry to leave this lingering. My bandwidth has been severely limited the last couple of days. Should definitely have some time on Monday to finish this up.

@talumbau
Copy link
Contributor Author

@mrocklin only question I had was on the way I worked around Python 2.6. Happy to get any other feedback here too.


@convert.register(Iterator, SAS7BDAT, cost=1.0)
def sas_to_iterator(s, **kwargs):
s.skip_header = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want this? Is there some reason why this isn't default in the sas7bdat library?

@mrocklin
Copy link
Member

I also removed the use of setting skip_header. This seemed a bit invasive. This did require a few changes upstream though. Work here https://bitbucket.org/jaredhobbs/sas7bdat/pull-request/6/python-26-compatibility-and-iterator-fix/diff

Last time around @jaredhobbs was pretty speedy about merging and updating. Here's hoping for a repeat performance :)

@talumbau I'll push my changes to fork and issue another PR.

@mrocklin
Copy link
Member

Looks like @jaredhobbs has merged and updated PyPI. I'm good to merge with the changes in my PR to @talumbau . Are you ok with those changes? If so I'll go ahead and merge both into master.

few changes to sas7bdat backend
@talumbau
Copy link
Contributor Author

I agree with the changes and thanks for the 2.6 fixes. Also thanks @jaredhobbs for the quick merge!

@mrocklin
Copy link
Member

This is in. Thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants