SAS7BDAT backend #41

talumbau · 2015-01-20T21:27:11Z

Following this conversation, it appears that sas7bdat now supports Python 3. This PR uses the latest release of the package, which can read both compressed and uncompressed data. Some notes:

Since sas7bdat doesn’t write files, I’m not sure how to provide test data other than checking a file into the repo, which seems odd.

I don’t have a good sense of the cost weights. These conversions are unlikely to be a “middle step” in a conversion path, though, so it probably doesn’t matter.

The type mapping can improve once I get access to data with column types other than ‘string’ or ‘numeric’.

Renaming sql_csv.py:excute_copy -> execute_copy causes an error, so I changed it to a (hopefully) meaningful name that just isn't the same as the decorator's name.

mrocklin · 2015-01-20T21:35:25Z

into/backends/sas.py

+@convert.register(list, SAS7BDAT, cost=8.0)
+def sas_to_list(s, dshape=None, **kwargs):
+    s.skip_header = True
+    return list(s.readlines())


Don't need this one. I would just depend on the Iterator conversion. The conversion network will handle other transformations from Iterator.
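
To illustrate why a direct SAS7BDAT -> list edge is redundant, here is a minimal sketch of a cost-weighted conversion graph. It is a toy stand-in, not into's actual implementation: the `register`, `cheapest_path`, and `convert` helpers and the string node names are all hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical registry: source name -> {target name: (cost, function)}.
# This only mimics the idea of a conversion network; it is not into's code.
_edges = defaultdict(dict)


def register(target, source, cost=1.0):
    """Record the decorated function as a conversion edge source -> target."""
    def decorator(func):
        _edges[source][target] = (cost, func)
        return func
    return decorator


def cheapest_path(source, target):
    """Dijkstra over registered edges; returns (total_cost, [node names])."""
    queue = [(0.0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, (step, _) in _edges[node].items():
            heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    raise LookupError("no conversion from %s to %s" % (source, target))


def convert(target, source, value):
    """Apply each conversion function along the cheapest path."""
    _, path = cheapest_path(source, target)
    for a, b in zip(path, path[1:]):
        value = _edges[a][b][1](value)
    return value


@register("iterator", "sas", cost=1.0)
def sas_to_iterator(rows):
    return iter(rows)


@register("list", "iterator", cost=1.0)
def iterator_to_list(it):
    return list(it)
```

With only the Iterator edge registered for the SAS source, `convert("list", "sas", rows)` still succeeds by routing through the iterator node, which is the behavior the comment above relies on.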

mrocklin · 2015-01-20T21:36:49Z

This looks pretty slick.

Into has gotten smarter in recent months, so I've marked a couple of functions that probably no longer need to exist. Otherwise things look nice.

Regarding test data. Is there any way to use sas7bdat to create SAS files that we could use for testing?

mrocklin · 2015-01-20T21:49:12Z

We'll also need to import it as is done here https://github.com/ContinuumIO/into/blob/master/into/__init__.py#L17-L20

and add sas7bdat to .travis.yml
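
A minimal sketch of guarding an optional dependency at import time, assuming the linked lines follow the usual try/except ImportError pattern (I have not reproduced them here). The `optional_import` helper and `HAS_SAS` flag are hypothetical names for illustration.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# The backend would only be wired up when the library is available.
sas7bdat = optional_import("sas7bdat")
HAS_SAS = sas7bdat is not None
```

This keeps the package importable for users who have not installed sas7bdat, while CI can install it so the backend tests actually run.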

mrocklin · 2015-01-20T22:08:33Z

into/backends/sas.py

+from ..resource import resource
+
+SAS_type_map = {'number': 'float64',
+                'string': 'string'}


Do we need to extend this?
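
For context, a rough sketch of how a map like this could be turned into a datashape string. The `(name, type)` column pairs and the `columns_to_dshape` helper are assumptions for illustration, not the sas7bdat attribute layout or the code in this PR. Raising on an unmapped type makes it obvious when the map needs extending.

```python
SAS_type_map = {'number': 'float64',
                'string': 'string'}


def columns_to_dshape(columns, type_map=SAS_type_map):
    """Build a datashape string from hypothetical (name, sas_type) pairs."""
    fields = []
    for name, sas_type in columns:
        if sas_type not in type_map:
            # Fail loudly so a new column type is noticed, not silently coerced.
            raise KeyError("unmapped SAS column type: %r" % sas_type)
        fields.append("%s: %s" % (name, type_map[sas_type]))
    return "var * {%s}" % ", ".join(fields)
```

For example, `columns_to_dshape([("name", "string"), ("age", "number")])` gives `"var * {name: string, age: float64}"`.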

mrocklin · 2015-01-20T22:25:40Z

Do you know how we would create a SAS file from any sort of object (e.g. a Pandas DataFrame)?

talumbau · 2015-01-20T22:46:50Z

No, but @jaredhobbs might know how to create such a SAS file.

mrocklin · 2015-01-20T22:48:39Z

It would save us from having to hold on to binary data in the repository. It would also be nice to craft particular datasets to test certain things.

jaredhobbs · 2015-01-20T22:55:49Z

Unfortunately, the sas7bdat library is read-only. The format is closed, and the only way we're able to read the data is by reverse engineering the format. While this seems to work OK for reading, there are still a lot of holes in the format that are not fully understood. Unless SAS opens the format or someone can provide the missing information on the spec, there won't be any write support.

mrocklin · 2015-01-20T22:58:49Z

Good to know. Thanks for the work that's already there and for chiming in.

talumbau · 2015-01-21T19:03:09Z

This work revealed the following issue (and subsequent fix):

https://bitbucket.org/jaredhobbs/sas7bdat/pull-request/5/unicode-fix-for-python-3/diff

This affects the DataFrame test in this PR.

jaredhobbs · 2015-01-21T19:15:27Z

Thanks for the fix. I merged the change and released v2.0.3

talumbau · 2015-01-23T08:06:43Z

The behavior is now the same in Python 2.7 and 3.x, with workarounds for Python 2.6.

mrocklin · 2015-01-23T16:25:18Z

Awesome. Sorry to leave this lingering. My bandwidth has been severely limited the last couple of days. Should definitely have some time on Monday to finish this up.

talumbau · 2015-01-27T16:47:05Z

@mrocklin only question I had was on the way I worked around Python 2.6. Happy to get any other feedback here too.

mrocklin · 2015-01-27T17:20:37Z

into/backends/sas.py

+
+@convert.register(Iterator, SAS7BDAT, cost=1.0)
+def sas_to_iterator(s, **kwargs):
+    s.skip_header = True


Do we always want this? Is there some reason why this isn't default in the sas7bdat library?
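
One way to avoid mutating the reader at all is to drop the header row on the consumer side. The sketch below uses a hypothetical `FakeReader` stand-in whose first yielded row is the header; it is not the sas7bdat API and not necessarily what this PR ends up doing.

```python
from itertools import islice


class FakeReader(object):
    """Hypothetical stand-in for a reader that yields the header row first."""

    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def __iter__(self):
        yield self.header
        for row in self.rows:
            yield row


def data_rows(reader):
    """Iterate over data rows only, leaving the reader's own state untouched."""
    return islice(iter(reader), 1, None)
```

Because nothing is set on the reader, a later caller iterating the same object still sees the header as the first row.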

mrocklin · 2015-01-27T18:09:54Z

I also stopped setting skip_header on the reader, which seemed a bit invasive. That required a few changes upstream, though. The work is here: https://bitbucket.org/jaredhobbs/sas7bdat/pull-request/6/python-26-compatibility-and-iterator-fix/diff

Last time around @jaredhobbs was pretty speedy about merging and updating. Here's hoping for a repeat performance :)

@talumbau I'll push my changes to my fork and issue another PR.

mrocklin · 2015-01-27T18:55:05Z

Looks like @jaredhobbs has merged and updated PyPI. I'm good to merge with the changes in my PR to @talumbau. Are you OK with those changes? If so, I'll go ahead and merge both into master.

few changes to sas7bdat backend

talumbau · 2015-01-27T19:22:32Z

I agree with the changes and thanks for the 2.6 fixes. Also thanks @jaredhobbs for the quick merge!

mrocklin · 2015-01-27T19:47:55Z

This is in. Thanks all.

Merge pull request #2 from mrocklin/sas7bdat2

2f01d84

few changes to sas7bdat backend

mrocklin closed this Jan 27, 2015

mrocklin mentioned this pull request Feb 5, 2015

sas conversion #30

Closed

mrocklin mentioned this pull request Mar 11, 2015

sas sas7bdat, stata .dta formats <-> HDF5 #129

Open
