Update docs to show parquet files need to be serialised with version 1.0 to be visible in the Azure ML studio #104875

Open
adriantorrie opened this issue Feb 7, 2023 · 3 comments

@adriantorrie

When Parquet files are saved as MLTables in any of the following ways, the file data is not viewable in the Azure ML studio:

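# Assume `df` is a pandas DataFrame (e.g. one of the train/test splits)
# and `output_dir` is the folder the data is written to
from pathlib import Path

import mltable
import pyarrow as pa
import pyarrow.parquet as pq

# 1 doesn't work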
df.to_parquet(Path(output_dir, "data.parquet"))

# nor 2
df.to_parquet(Path(output_dir, "data.parquet"), engine='pyarrow')

# nor 3
pyarrow_table = pa.Table.from_pandas(df)
pq.write_table(pyarrow_table, Path(output_dir, "data.parquet"))

# and finally save the MLTable file
table = mltable.from_parquet_files(paths=[{"pattern": "./*parquet"}])
table.save(Path(output_dir), overwrite=True)

Error:

image

However, if I explicitly set the Parquet format version to 1.0 when writing the file:

# Note the `version` argument passed to `pq.write_table`
pyarrow_table = pa.Table.from_pandas(df)
pq.write_table(pyarrow_table, Path(output_dir, "data.parquet"), version="1.0")

# and finally save the MLTable file
table = mltable.from_parquet_files(paths=[{"pattern": "./*parquet"}])
table.save(Path(output_dir), overwrite=True)

We can view the data:

image



@ManoharLakkoju-MSFT
Contributor

@adriantorrie
Thanks for your feedback! We will investigate and update as appropriate.

@RamanathanChinnappan-MSFT
Contributor

@adriantorrie

I've delegated this to @samuel100, a content author, to review and share their valuable insights.

@adriantorrie
Author

Any updates?
