I am trying to export numeric data plus some metadata from Python to a Parquet file and read it back in R. However, the metadata is a dict in Python but a string in R. I would have expected a list in R (roughly the equivalent of a Python dict). Am I missing something? Here is the code to demonstrate the issue:
import sys
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
print(sys.version)
print(pa.__version__)
x = np.random.randint(0, 10, (10, 3))
arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]
table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],
                             metadata={'foo': '42'})
pq.write_table(table, 'array.parquet', compression='snappy')
table = pq.read_table('array.parquet')
metadata = table.schema.metadata
print(metadata)
print(type(metadata))
And in R:
library(arrow)
print(R.version)
print(packageVersion("arrow"))
table <- read_parquet("array.parquet", as_data_frame = FALSE)
metadata <- table$schema$metadata
print(metadata)
print(is(metadata))
print(metadata["foo"])
Output Python:
3.6.8 (default, Aug 7 2019, 17:28:10)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
0.13.0
OrderedDict([(b'foo', b'42')])
<class 'collections.OrderedDict'>
Output R:
[1] ‘0.17.0’
[1] " -- metadata -- foo: 42"
[1] "character" "vector" "data.frameRowLabels"
[4] "SuperClassMethod"
[1] NA
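For comparison, the Python output above shows that the metadata is returned as a mapping with bytes keys and values (`b'foo'`, `b'42'`), so lookups by key work there once the bytes are handled. The following is a minimal sketch of turning such a mapping into a plain dict of strings. The helper name `decode_metadata` is hypothetical and not part of pyarrow, and UTF-8 encoding is an assumption. It uses a hand-built `OrderedDict` in place of `table.schema.metadata` so that it runs without the Parquet file.

```python
from collections import OrderedDict


def decode_metadata(metadata):
    """Convert a bytes-keyed metadata mapping into a dict of str -> str.

    Hypothetical helper (not a pyarrow function); assumes UTF-8 encoded
    keys and values, and treats missing metadata (None) as empty.
    """
    if metadata is None:
        return {}
    return {key.decode("utf-8"): value.decode("utf-8")
            for key, value in metadata.items()}


# Stand-in for table.schema.metadata, copied from the Python output above.
raw = OrderedDict([(b'foo', b'42')])

decoded = decode_metadata(raw)
print(decoded["foo"])
```

This key-based lookup is the behaviour I expected from `metadata["foo"]` in R as well.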
Reporter: René Rex
Assignee: Neal Richardson / @nealrichardson
Note: This issue was originally created as ARROW-8703. Please see the migration documentation for further details.