Repro script
import cellxgene_census
from scipy.sparse import csr_matrix, coo_matrix
import tiledbsoma
import pyarrow as pa
census = cellxgene_census.open_soma(census_version="stable")
exp = census["census_data"]["homo_sapiens"]
obs = exp.obs
obs_df = obs.read().concat().to_pandas()
obs_df_shuffled = obs_df.sample(frac=1, random_state=1).reset_index(drop=True)
import pandas as pd
obs_df_shuffled["soma_joinid"] = pd.Series(range(len(obs_df_shuffled)))
idx = obs_df_shuffled.copy()["soma_joinid"]
idx.to_pickle('index.pkl')
with tiledbsoma.DataFrame.create("./obs", schema=obs.schema) as df:
    data = pa.Table.from_pandas(
        obs_df_shuffled
    )
    df.write(data)
Output
    df.write(data)
  File "/home/ssm-user/venv/lib/python3.10/site-packages/tiledbsoma/_dataframe.py", line 408, in write
    col = values.column(name).combine_chunks()
  File "pyarrow/table.pxi", line 746, in pyarrow.lib.ChunkedArray.combine_chunks
  File "pyarrow/array.pxi", line 3775, in pyarrow.lib.concat_arrays
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
Analysis
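The exception is raised inside pyarrow's `ChunkedArray.combine_chunks`, which tiledbsoma's `DataFrame.write` calls on a column of the table being written (see the traceback above). This looks like an Arrow bug in `combine_chunks`, but perhaps there is something we can do on our side to work around it.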
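One possible workaround, sketched here under assumptions rather than confirmed against this dataset: "offset overflow while concatenating arrays" is typically raised when Arrow concatenates a variable-length column (for example strings with 32-bit offsets) whose combined data no longer fits those offsets. If that is the cause here, writing the table in row slices would keep each `combine_chunks` call small. `row_ranges` and `write_in_batches` are hypothetical helper names, the default of 1,000,000 rows is an arbitrary choice, and the sketch assumes `DataFrame.write` may be called repeatedly on the same open handle.

```python
from typing import Iterator, Tuple


def row_ranges(num_rows: int, max_rows: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover num_rows rows in order."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for offset in range(0, num_rows, max_rows):
        yield offset, min(max_rows, num_rows - offset)


def write_in_batches(soma_df, table, max_rows: int = 1_000_000) -> int:
    """Write `table` to `soma_df` one row slice at a time.

    Hypothetical helper (not part of tiledbsoma or pyarrow):
    `soma_df` is assumed to be a tiledbsoma.DataFrame opened for writing,
    `table` a pyarrow.Table. Returns the number of write calls made.
    """
    n_writes = 0
    for offset, length in row_ranges(table.num_rows, max_rows):
        # Table.slice(offset, length) is zero-copy, so each write only
        # has to combine the chunks that overlap this row range.
        soma_df.write(table.slice(offset, length))
        n_writes += 1
    return n_writes
```

In the repro script this would mean replacing `df.write(data)` with `write_in_batches(df, data)`. Whether it avoids the error depends on the overflow really being a per-column size limit, which has not been verified here.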
Reported by @ebezzi (see also #2120)
cc @johnkerl @nguyenv for visibility.