delimiter in CSV writer should support unicode #1949

taldcroft · 2014-01-07T19:43:37Z

I think the following should probably be made to work:

In [6]: t.write('test.csv', format='ascii', delimiter=u',')
ERROR: TypeError: "delimiter" must be an 1-character string [astropy.io.ascii.core]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-1f963bd3edbd> in <module>()
----> 1 t.write('test.csv', format='ascii', delimiter=u',')

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/table/table.pyc in write(self, *args, **kwargs)
   1732         passed through to the underlying data reader (e.g. `~astropy.io.ascii.ui.write`).
   1733         """
-> 1734         io_registry.write(self, *args, **kwargs)
   1735 
   1736     def copy(self, copy_data=True):

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/registry.pyc in write(data, *args, **kwargs)
    356 
    357     writer = get_writer(format, data.__class__)
--> 358     writer(data, *args, **kwargs)
    359 
    360 

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/ascii/connect.pyc in write_asciitable(table, filename, **kwargs)
     23 def write_asciitable(table, filename, **kwargs):
     24     from .ui import write
---> 25     return write(table, filename, **kwargs)
     26 
     27 io_registry.register_writer('ascii', Table, write_asciitable)

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/ascii/ui.pyc in write(table, output, format, Writer, **kwargs)
    293     Writer = _get_format_class(format, Writer, 'Writer')
    294     writer = get_writer(Writer=Writer, **kwargs)
--> 295     lines = writer.write(table)
    296 
    297     # Write the lines to output

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/ascii/core.pyc in write(self, table)
    914         # Write header and data to lines list
    915         lines = []
--> 916         self.header.write(lines)
    917         self.data.write(lines)
    918 

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/ascii/core.pyc in write(self, lines)
    414                                        itertools.cycle(self.write_spacer_lines)):
    415                 lines.append(spacer_line)
--> 416             lines.append(self.splitter.join([x.name for x in self.cols]))
    417 
    418     @property

/Users/tom/Library/Python/2.7/lib/python/site-packages/astropy-0.4.dev6923-py2.7-macosx-10.8-x86_64.egg/astropy/io/ascii/core.pyc in join(self, vals)
    294                                          quotechar=self.quotechar,
    295                                          quoting=self.quoting,
--> 296                                          lineterminator='',
    297                                          )
    298         self.csv_writer_out.seek(0)

TypeError: "delimiter" must be an 1-character string

I found this in a script that was importing unicode_literals from __future__.

cc @taldcroft
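
For reference, the failure can be sketched outside astropy using only the standard csv module. This is a minimal illustration, not astropy code: the helper name `delimiter_is_accepted` is made up here, and the only assumption is that csv.writer raises TypeError for a delimiter that is not the interpreter's native str type, as the traceback above shows for Python 2.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def delimiter_is_accepted(delimiter):
    """Return True if csv.writer accepts `delimiter`, False on TypeError.

    Hypothetical helper for illustration only.
    """
    # The csv module on Python 2 writes bytes, so give it a bytes buffer there.
    out = io.BytesIO() if PY2 else io.StringIO()
    try:
        csv.writer(out, delimiter=delimiter, lineterminator='')
    except TypeError:
        return False
    return True
```

On Python 2 a native `','` is accepted and `u','` is rejected, which is what unicode_literals triggers. On Python 3 the situation is mirrored: `','` is accepted and `b','` is rejected.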

taldcroft · 2014-01-07T19:56:15Z

@astrofrog - code attached.

I've taken the kludgey step of copying two key test files and putting a future unicode_literals import at the top. As discussed previously with @mdboom for astropy.table, there isn't any totally obvious way to do this cleanly, though perhaps some clever solution is possible. This isn't a future-proof solution, but it does demonstrate that io.ascii should now work with unicode literals in place.

The main issue was related to csv, which is explicitly not unicode-aware according to the docs. I also found a mistake in the way an input was tested for str-ness.
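
The two fixes described above might look roughly like the sketch below. It is not the code from this PR: `is_string`, `to_native_str` and `join_row` are hypothetical names, and the sketch assumes ASCII-only delimiters. The idea is to coerce csv arguments to the native str type before they reach csv.writer, and to test for str-ness with isinstance rather than `type(x) == str`.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2
# basestring only exists on Python 2, where it covers both str and unicode.
string_types = (basestring,) if PY2 else (str,)  # noqa: F821


def is_string(value):
    """Str-ness test that also accepts unicode, unlike `type(value) == str`."""
    return isinstance(value, string_types)


def to_native_str(value):
    """Coerce an ASCII-only text or bytes value to the native str type."""
    if isinstance(value, str):
        return value
    # Python 2: unicode -> str (bytes); Python 3: bytes -> str (text).
    return value.encode('ascii') if PY2 else value.decode('ascii')


def join_row(vals, delimiter=u','):
    """Join values into one CSV line, coercing the delimiter first."""
    out = io.BytesIO() if PY2 else io.StringIO()
    writer = csv.writer(out, delimiter=to_native_str(delimiter),
                        lineterminator='')
    writer.writerow(vals)
    return out.getvalue()
```

With this in place, `join_row(['a', 'b', 'c'], delimiter=u',')` goes through csv.writer without the TypeError, whether or not the calling module uses unicode_literals.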

mdboom · 2014-01-15T16:22:27Z

This looks good to me. Note that we aren't really "supporting unicode" here -- this will still fail with non-ascii characters -- but that's pretty much a corner case -- hopefully no one is using unicode delimiters. Ideally, we would encode the delimiter etc. in the same encoding as whatever lines is in, but I don't think we (can) know what that is, so this is probably good enough.
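
To make the non-ascii caveat concrete, here is a small sketch under the assumption that coercion amounts to an ASCII encode, which is what str() does to a unicode object on Python 2. The function name `coerce_delimiter` is hypothetical and the input is assumed to be text (unicode), not bytes.

```python
def coerce_delimiter(delimiter):
    """Encode a text delimiter as ASCII bytes, or raise ValueError.

    Hypothetical helper: it shows why an ASCII delimiter such as u','
    survives coercion while a non-ASCII one cannot.
    """
    try:
        return delimiter.encode('ascii')
    except UnicodeEncodeError:
        raise ValueError('delimiter %r is not ASCII' % (delimiter,))
```

Here `coerce_delimiter(u',')` succeeds, while a delimiter such as `u'\u00a7'` raises ValueError, since there is no known target encoding to fall back on.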

Needs a CHANGES entry, but other than that I think this is good to merge. Let's just leave the duplicated test_*_unicode files in there for now, and I will take them out again as part of #1962.

Allow passing unicode delimiters in io.ascii read and write

embray · 2014-02-12T18:07:54Z

I merged this into the v0.3.x branch, and several of these tests are failing there. Not sure why yet but am investigating--any ideas?

taldcroft · 2014-02-12T18:27:45Z

No obvious ideas. Is there a link to the test log we can look at?

…ifferences likely due to slight implementation differences between the two versions. Now all tests are passing except some of those introduced by #1949. Still trying to understand the root of that problem.

embray · 2014-02-12T21:22:49Z

I just fixed a few other small issues in the release branch, and narrowed it down to the tests from this PR that aren't working. Travis should have a build log in a bit.

embray · 2014-02-12T21:53:27Z

Here we go: https://travis-ci.org/astropy/astropy/jobs/18745796

The doctest failures are unrelated.

embray · 2014-02-14T22:14:34Z

Never mind, I understand the issue now: the original version of this PR, as you wrote, added test_read_unicode.py and test_write_unicode.py as copies of the non-unicode_literals versions of those test modules (with a few other slight tweaks). Then, when the PR was backported, those new test modules were added directly to the v0.3.x branch, even though they tested for features/fixes that are unique to the master branch. I redid the merge of this PR by using the versions of test_read.py and test_write.py from the v0.3.x branch and applying the appropriate changes in new versions of test_read_unicode.py and test_write_unicode.py.


taldcroft added 3 commits January 7, 2014 14:39

Add unicode_literal versions of test_read/write.py

7269feb

Change test type() == str to isinstance(basestring)

8aa27f7

Update io.ascii csv interface to coerce args to str() type

2fad99c

taldcroft mentioned this pull request Jan 7, 2014

Refactor io.ascii to use six #1950

Closed

astrofrog mentioned this pull request Jan 11, 2014

Add an option to run all tests with the 'unicode_literals' option #1956

Closed

3 tasks

taldcroft mentioned this pull request Jan 14, 2014

Test with and without unicode literals by parsing with ast #1962

Merged

2 tasks

Update CHANGES.rst

de08b2f

taldcroft added a commit that referenced this pull request Jan 15, 2014

Merge pull request #1949 from taldcroft/ascii/uni-lit

4210065

Allow passing unicode delimiters in io.ascii read and write

taldcroft merged commit 4210065 into astropy:master Jan 15, 2014

taldcroft deleted the ascii/uni-lit branch January 15, 2014 16:41

taldcroft added a commit that referenced this pull request Feb 12, 2014

Merge pull request #1949 from taldcroft/ascii/uni-lit

f55352c

Allow passing unicode delimiters in io.ascii read and write

taldcroft added a commit that referenced this pull request Feb 12, 2014

Merge pull request #1949 from taldcroft/ascii/uni-lit

f03467a

Allow passing unicode delimiters in io.ascii read and write

taldcroft added a commit that referenced this pull request Feb 14, 2014

Merge pull request #1949 from taldcroft/ascii/uni-lit

5b086fe

Allow passing unicode delimiters in io.ascii read and write
