-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new encoding parameter to ascii.read #5448
Conversation
@saimn - thanks! I think that you might be able to do this with a much smaller footprint and more in the idiom of
You can also do a global grep for |
For testing, in this case you definitely want to test the values as well as the column header names. So I would suggest removing the standardized tests you added and putting in a new test function that tests the column names and values and dtypes. To be honest I don't entirely understand the expected output for this in Python 2.7 and even why it is not failing tests now (I have an idea but didn't dig through the code to check). However, if I read the latin1 test table in Py2 and try to print it then this raises an exception. I'm not sure if this is a separate issue that can / should be fixed, but in any case we need to make sure Py2 users don't get a partially working situation. (Either fail up front or fully work). |
@taldcroft - Good points, I forgot to test on Python 2 but actually it fails on Travis (https://travis-ci.org/astropy/astropy/builds/172562778). The green status on the PR is because I pushed the CHANGES entry later with [ci skip] ... I will take a look ! |
One possibility is not allowing |
fcf08e8
to
7cffbd2
Compare
I pushed some commits to give an update, even though I didn't make progress the last 2 weeks. The current status is:
|
@saimn - sorry for the slow response. This looks much better now and is definitely close. I'm good with your approach in the second bullet regarding limiting the functionality to slow readers and Py3. I'm assuming that the existing tests you wrote are failing and that is why you [skip ci]'d the last commit here. So just check for the presence of an |
astropy/wcs/wcs.py
Outdated
@@ -2642,7 +2642,7 @@ def footprint_to_file(self, filename=None, color='green', width=2): | |||
color : str, optional | |||
Color to use when plotting the line. | |||
|
|||
width : int, optional | |||
width : int, optionalastropy/io/ascii/tests/test_read.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental edit.
astropy/io/ascii/setup_package.py
Outdated
@@ -72,6 +72,8 @@ def get_package_data(): | |||
't/simple3.txt', | |||
't/simple4.txt', | |||
't/simple5.txt', | |||
't/simple_latin1.txt', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like your current test strategy is to generate the appropriate file on the fly, so these are not needed.
astropy/io/ascii/tests/test_read.py
Outdated
print('\n\n******** SKIPPING %s' % testfile['name']) | ||
continue | ||
|
||
tmpfile = str(tmpdir.join(os.path.basename(testfile['name']))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here giving guidance on the overall plan of generating a new table with an encoded column where possible.
astropy/io/ascii/tests/test_read.py
Outdated
|
||
format = formats.get(testfile['opts'].get('Reader')) | ||
|
||
with open(tmpfile, mode='w', encoding='latin1') as fout: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does utf-8
need to be tested as well?
astropy/io/ascii/tests/test_read.py
Outdated
name = u'à' if not six.PY2 else 'alpha' | ||
col = Column(name=name, data=[six.u(x) for x in table.columns[0]]) | ||
table.add_column(col, 0) | ||
table[0][0] = u"àéö" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later on this table value should be tested. After this block you could define a variable table00 = table[0][0]
and then test.
@taldcroft - no problem for the slow pace, I'm not better ;-). Thanks for your comments, the last commit on tests is really messy and unfinished, sorry about that. I will try to make progress on this soon. |
Looks like we're going to have to push this to 2.0 . (will need to move the changelog entry to that section, too). |
@saimn - what's your thought on this? Defer to next release or try to push? |
@taldcroft - I will check if I can get something, at least for Python 3 and the slow reader. |
Currently passed to get_readable_fileobj only when guess is True
@taldcroft - I pushed a new simplified version, a said above it works for slow readers and Python 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great! In fact so nice that I also request putting some mention or example in the Getting Started docs. There is a Note at the end of the Reading section that would be a perfect place to mention the option of specifying the encoding
parameter.
astropy/io/ascii/tests/test_read.py
Outdated
'--- --- -----', | ||
' 1 2 héllo'] | ||
|
||
table = ascii.read(testfile, format=fmt, fast_reader=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a loop here for guess in (True, False):
to explicitly include the no-guessing case.
CHANGES.rst
Outdated
@@ -30,6 +30,9 @@ New Features | |||
|
|||
- ``astropy.io.ascii`` | |||
|
|||
- Allow to specify encoding in ``ascii.read``, only for Python 3 and with the | |||
slow readers. [#5448] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say "pure-Python readers" instead of "slow readers". 😄
@taldcroft - I addressed your comments, and the builds passed. |
Allow to specify encoding when using
ascii.read(...)
. See #3826 for the motivations.I added test files only for the simple reader, not sure if I should do the same for other readers ?
cc @taldcroft @pllim