OSError: Not yet implemented: Unsupported encoding. #442
Comments
Hi @csy08! Thanks for reporting it. Currently CUR writes your report using the "V2" Parquet schema, and Apache Arrow (Wrangler's dependency for low-level parquet functionality) doesn't currently support the encodings used by this schema. So, unfortunately, Wrangler/Apache Arrow will not be able to read columns encoded with DELTA_BYTE_ARRAY. My recommendation is to use CSV/JSON, or just skip the unsupported columns if you can.

Reference: https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-strings-delta_byte_array--7

P.S. Let's block this issue until we have a CUR or Apache Arrow update.
@igorborgest Thank you for coming back to me. Understood, and I shall try to avoid parquet files for now.
Hi @igorborgest,
This will be supported naturally once CUR or Apache Arrow overcomes this incompatibility.
One way to work around this is to use Athena CTAS statements instead to manipulate the data. As a bonus, this offloads the CPU and memory intensive operations to Athena. Here is an example that uses CTAS to write CUR joined with AWS Account metadata: https://github.com/aws-samples/glue-enrich-cost-and-usage. |
When trying to read an AWS Cost and Usage Report (CUR) in Parquet format, I get an encoding error on a particular file when I request the column identity_line_item_id (it seems to be only this column that has the issue). The column identity_line_item_id should just be alphanumeric, so I don't understand why it's causing an issue. Examples:

Code

I can use the following code and export to CSV or JSON without a problem:
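The reporter's original code was not captured in this thread. The following is a hypothetical local sketch of the CSV/JSON route they describe, using pandas with made-up column names and data; in practice the DataFrame would come from reading the CUR report:

```python
import pandas as pd

# Hypothetical frame standing in for a CUR report that was read successfully.
df = pd.DataFrame({
    "identity_line_item_id": ["a1", "b2"],
    "line_item_unblended_cost": [0.12, 3.4],
})

# Exporting to CSV or JSON sidesteps the parquet-encoding path entirely.
df.to_csv("cur_export.csv", index=False)
df.to_json("cur_export.json", orient="records", lines=True)

roundtrip = pd.read_csv("cur_export.csv")
print(roundtrip.shape)  # (2, 2)
```

CSV and JSON have no per-column encodings, which is why this path works while the V2-encoded parquet columns fail to decode.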