Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing na_values and quoting doesn't work. #602

Open
Matttman opened this issue Jan 11, 2018 · 5 comments
Open

Passing na_values and quoting doesn't work. #602

Matttman opened this issue Jan 11, 2018 · 5 comments

Comments

@Matttman
Copy link

I am trying to read from a tab delimited file and setting na_values and quoting aren't working. I used the same options with pandas and it creates the dataframe correctly.

t = odo(file, "postgresql://(removed)",
        encoding='utf-8',
        delimiter='\t',
        na_values="\\N",
        quoting=3,
        dshape=ds)

I was getting one warning when I running it, I fixed that as shown in #546 to make sure it wan't related and no change.

/usr/local/lib/python3.6/dist-packages/odo/backends/pandas.py:94: FutureWarning: pandas.tslib is deprecated and will be removed in a future version.
You can access NaTType as type(pandas.NaT)
  @convert.register((pd.Timestamp, pd.Timedelta), (pd.tslib.NaTType, type(None)))

If i run as is I get an error because of quotes in my data messing up the columns. I noticed in the error message some of the options specified don't seem to be set in parameters, quoting doesn't show up at all and na_values shows up but isn't set to what I specified.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
psycopg2.DataError: invalid input syntax for type boolean: "1996"
CONTEXT:  COPY titlebasics, line 679626, column isAdult: "1996"


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./makedb.py", line 42, in <module>
    main()
  File "./makedb.py", line 36, in main
    convert_csv()
  File "./makedb.py", line 30, in convert_csv
    dshape=ds)
  File "/usr/local/lib/python3.6/dist-packages/odo/odo.py", line 91, in odo
    return into(target, source, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 149, in into_string_string
    return into(a, resource(b, **kwargs), **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 143, in into_string
    return into(a, b, dshape=dshape, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/into.py", line 131, in into_object
    return append(target, source, dshape=dshape, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo/backends/sql_csv.py", line 233, in append_csv_to_sql_table
    conn.execute(stmt)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.DataError) invalid input syntax for type boolean: "1996"
CONTEXT:  COPY titlebasics, line 679626, column isAdult: "1996"
 [SQL: '\n            COPY titlebasics FROM %(path)s (\n                FORMAT CSV,\n                DELIMITER %(delimiter)s,\n                NULL %(na_value)s,\n                QUOTE %(quotechar)s,\n                ESCAPE %(escapechar)s,\n                HEADER %(header)s,\n                ENCODING %(encoding)s\n            )\n            '] [parameters: {'path': (removed), 'delimiter': '\t', 'na_value': '', 'quotechar': '"', 'escapechar': '\\', 'header': True, 'encoding': 'utf-8'}] (Background on this error at: http://sqlalche.me/e/9h9h)

I can sidestep that by setting quotechar='|', which doesn't exist in my data and stops the quotes from breaking it but obviously I would rather use quoting=QUOTE_NONE so that it doesn't end up breaking if i end up with a |. But na_values still doesn't work.

@llllllllll
Copy link
Member

That doesn't see like the same error. It looks live you have a numeric value for a boolean field:

CONTEXT:  COPY titlebasics, line 679626, column isAdult: "1996"

Also, what version of odo is this?

@Matttman
Copy link
Author

Using 0.5.0
It's getting the numeric value because of quotes in that row that cause a tab to get missed , if I run the same command with panads it ignores the quotes. I just included that error because I saw that the last line showed that not all of the values seem to be getting passed.
[SQL: '\n COPY titlebasics FROM %(path)s (\n FORMAT CSV,\n DELIMITER %(delimiter)s,\n NULL %(na_value)s,\n QUOTE %(quotechar)s,\n ESCAPE %(escapechar)s,\n HEADER %(header)s,\n ENCODING %(encoding)s\n )\n '] [parameters: {'path': (removed), 'delimiter': '\t', 'na_value': '', 'quotechar': '"', 'escapechar': '\\', 'header': True, 'encoding': 'utf-8'}] (Background on this error at: http://sqlalche.me/e/9h9h)
I see the 'delimiter': '\t' that I set is there but 'na_value': '' isn't set to my option, and I don't see quoting at all.

@llllllllll
Copy link
Member

Can you try this on master, I remember dealing with something like this in the past. There hasn't been an odo release in a comically long time so there are a lot of unreleased bug fixes.

@Matttman
Copy link
Author

Tried on master and ran into a different error. Seems to be this #585 I didn't see anyone with a fix for it.

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "./makedb.py", line 43, in <module>
    main()
  File "./makedb.py", line 37, in main
    convert_csv()
  File "./makedb.py", line 31, in convert_csv
    dshape=ds)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/odo.py", line 91, in odo
    return into(target, source, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 149, in into_string_string
    return into(a, resource(b, **kwargs), **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 143, in into_string
    return into(a, b, dshape=dshape, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 43, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/into.py", line 131, in into_object
    return append(target, source, dshape=dshape, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/backends/sql.py", line 583, in append_anything_to_sql_Table
    return append(t, convert(Iterator, o, **kwargs), **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/core.py", line 83, in __call__
    return _transform(self.graph, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/core.py", line 106, in _transform
    x = f(x, excluded_edges=excluded_edges, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/odo-0.5.1+60.ge62822a-py3.6.egg/odo/backends/csv.py", line 371, in CSV_to_chunks_of_dataframes
    data = [first] + rest
TypeError: can only concatenate list (not "TextFileReader") to list

@Matttman
Copy link
Author

Realized this won't work with Version 0.5.0 because it looks like it only uses psycopg2's copy_from to convert the csv to sql and never actually makes a dataframe. It does accept a null_value so I was able to set that instead of pandas null_values, but no quoting option.
It should work on master but somethings broken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants