[C++] CSV reader: Ability to not infer column types. #22232
Comments
Antoine Pitrou / @pitrou: We could add more inference options, though, for example to select the datatypes for which inference is enabled.
Antoine Pitrou / @pitrou: |
Wes McKinney / @wesm: |
Neal Richardson / @nealrichardson: |
Antoine Pitrou / @pitrou: |
Neal Richardson / @nealrichardson: If
|
Antoine Pitrou / @pitrou: |
Wes McKinney / @wesm: |
I'm trying to read CSVs as is, with all columns as strings. I don't know the schema of these CSVs, and it will vary because the files are provided by users.
Right now I'm using pandas.read_csv(dtype=str), which works great, but since the final destination of these CSVs is Parquet files, it seems much more efficient to use pyarrow.csv.read_csv in the future, as soon as this becomes available :)
I tried things like
pyarrow.csv.read_csv(convert_types=ConvertOptions(columns_types=defaultdict(lambda: 'string')))
but it doesn't work. Maybe I just didn't find something that already exists? :)
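One possible workaround, sketched here under some assumptions rather than taken from this thread: read the header row first, then pass an explicit name-to-type mapping through the `column_types` argument of `pyarrow.csv.ConvertOptions`, supplied to `read_csv` as `convert_options`. The helper names `header_names` and `read_csv_all_strings` are hypothetical, and the sketch assumes a UTF-8 file with a single header row and unique column names.

```python
import csv


def header_names(path, delimiter=","):
    """Return the column names from the first row of a CSV file (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))


def read_csv_all_strings(path):
    """Read a CSV with pyarrow, forcing every column to string (hypothetical helper)."""
    # Imported lazily so header_names() can be used without pyarrow installed.
    import pyarrow as pa
    from pyarrow import csv as pa_csv

    # Columns listed in column_types are converted to the given type
    # instead of going through type inference.
    column_types = {name: pa.string() for name in header_names(path)}
    options = pa_csv.ConvertOptions(column_types=column_types)
    return pa_csv.read_csv(path, convert_options=options)
```

Calling `read_csv_all_strings("data.csv")` should return a table whose columns are all of string type, so values such as `00123` keep their leading zeros. The cost is that the header is read twice; for user-supplied files with unusual delimiters or encodings, the header step would need the same parse settings as the pyarrow call.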
Environment: Ubuntu Xenial
Reporter: Bogdan Klichuk
Note: This issue was originally created as ARROW-5811. Please see the migration documentation for further details.