Parquet export IR #183

Merged
max-hoffman merged 3 commits into main from max/export-pq-ir
Jun 1, 2022

max-hoffman
Collaborator

Exporting data through a CSV intermediary is subject to loss
of specificity and type info. This is particularly noticeable
for `read_pandas`, where the resulting DataFrame has every column
of type `object` and NULLs are indistinguishable from zero values.
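
To illustrate the general problem, here is a sketch that uses only Python's standard `csv` module (not Dolt's actual export path): after a CSV round trip every value comes back as a string, and a SQL NULL (`None`) comes back as an empty string that cannot be told apart from a real empty string. The rows and the `csv_round_trip` helper are hypothetical.

```python
import csv
import io

# Hypothetical rows standing in for a query result: typed values plus SQL NULLs (None).
rows = [
    {"id": 1, "score": 0, "note": ""},
    {"id": 2, "score": None, "note": None},
]


def csv_round_trip(records):
    """Hypothetical helper: write records as CSV text, then parse them back."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)  # the csv module writes None as an empty field
    buf.seek(0)
    return list(csv.DictReader(buf))  # every parsed value is a str


parsed = csv_round_trip(rows)
print(parsed)
```

Both the integer `0` and the `None` score lose their types, and the NULL note is now identical to the empty-string note, so any typed reconstruction downstream has to guess.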

I used a small hack to export data from Dolt into a DataFrame using Parquet
instead of CSV. This requires the `pyarrow` dependency.

I left TODOs for improvements on the Dolt side that would make
this code cleaner, and filed Dolt issues for the associated features.

There is one bug with NULL datetime values; I filed a Dolt issue
for it.

@max-hoffman
Collaborator Author

re: #179

@codecov-commenter

codecov-commenter commented Jun 1, 2022

Codecov Report

Merging #183 (83a5f9e) into main (f3c83cc) will increase coverage by 1.10%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##             main     #183      +/-   ##
==========================================
+ Coverage   42.88%   43.98%   +1.10%     
==========================================
  Files          23       23              
  Lines         977      998      +21     
==========================================
+ Hits          419      439      +20     
- Misses        558      559       +1     
Impacted Files Coverage Δ
doltpy/cli/read.py 97.05% <95.65%> (-2.95%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3c83cc...83a5f9e.

@max-hoffman max-hoffman merged commit 0043e7e into main Jun 1, 2022
@max-hoffman max-hoffman deleted the max/export-pq-ir branch June 1, 2022 21:19