Addition of from_numpy Support for mode={ingest,schema_only,append} #1185
This pull request has been linked to Shortcut Story #17765: from_numpy support for mode={ingest,schema_only,append}.
I don't know how to use this; sorry, cannot accept or reject the PR :(
Using a local checkout of https://github.com/single-cell-data/TileDB-SingleCell/tree/main/apis/python and trying to re-upload:
$ git diff
diff --git a/apis/python/src/tiledbsc/uns_array.py b/apis/python/src/tiledbsc/uns_array.py
index 838f071..4fde242 100644
--- a/apis/python/src/tiledbsc/uns_array.py
+++ b/apis/python/src/tiledbsc/uns_array.py
@@ -93,13 +93,18 @@ class UnsArray(TileDBArray):
# Note arr.astype('str') does not lead to a successfuly tiledb.from_numpy.
arr = np.array(arr, dtype="O")
+ mode = "ingest"
+ if self.exists():
+ mode = "append"
+ logger.info(f"{self._indent}Re-using existing array {self.uri}")
+
# overwrite = False
# if self.exists:
# overwrite = True
# logger.info(f"{self._indent}Re-using existing array {self.uri}")
# tiledb.from_numpy(uri=self.uri, array=arr, ctx=self._ctx, overwrite=overwrite)
# TODO: find the right syntax for update-in-place (tiledb.from_pandas uses `mode`)
- tiledb.from_numpy(uri=self.uri, array=arr, ctx=self._ctx)
+ tiledb.from_numpy(uri=self.uri, array=arr, mode=mode, ctx=self._ctx)
logger.info(
util.format_elapsed(
$ ingestor anndata/pbmc3k_processed.h5ad s3://tiledb-johnkerl/scratch/try003
START SOMA.from_h5ad anndata/pbmc3k_processed.h5ad -> s3://tiledb-johnkerl/scratch/try003
START READING anndata/pbmc3k_processed.h5ad
FINISH READING anndata/pbmc3k_processed.h5ad TIME 0.076 seconds
START DECATEGORICALIZING
FINISH DECATEGORICALIZING TIME 0.005 seconds
START WRITING s3://tiledb-johnkerl/scratch/try003
START WRITING s3://tiledb-johnkerl/scratch/try003/uns
START WRITING s3://tiledb-johnkerl/scratch/try003/uns/draw_graph
START WRITING s3://tiledb-johnkerl/scratch/try003/uns/draw_graph/params
START WRITING FROM NUMPY.NDARRAY s3://tiledb-johnkerl/scratch/try003/uns/draw_graph/params/random_state
Traceback (most recent call last):
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/tools/ingestor", line 254, in <module>
main()
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/tools/ingestor", line 167, in main
ingest_one(
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/tools/ingestor", line 235, in ingest_one
tiledbsc.io.from_h5ad(soma, input_path)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/io.py", line 17, in from_h5ad
_from_h5ad_common(soma, input_path, from_anndata)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/io.py", line 46, in _from_h5ad_common
handler_func(soma, anndata)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/io.py", line 112, in from_anndata
soma.uns.from_anndata_uns(anndata.uns)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/uns_group.py", line 182, in from_anndata_uns
subgroup.from_anndata_uns(value)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/uns_group.py", line 182, in from_anndata_uns
subgroup.from_anndata_uns(value)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/uns_group.py", line 211, in from_anndata_uns
elif array._maybe_from_numpyable_object(value):
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/uns_array.py", line 65, in _maybe_from_numpyable_object
self.from_numpy_ndarray(arr)
File "/Users/johnkerl/git/single-cell-data/TileDB-SingleCell/apis/python/src/tiledbsc/uns_array.py", line 102, in from_numpy_ndarray
tiledb.from_numpy(uri=self.uri, array=arr, mode='append', ctx=self._ctx)
File "/Users/johnkerl/git/TileDB-Inc/TileDB-Py/tiledb/highlevel.py", line 98, in from_numpy
return tiledb.DenseArray.from_numpy(uri, array, ctx=_get_ctx(ctx, config), **kwargs)
File "tiledb/libtiledb.pyx", line 4302, in tiledb.libtiledb.DenseArrayImpl.from_numpy
File "tiledb/libtiledb.pyx", line 4307, in tiledb.libtiledb.DenseArrayImpl.from_numpy
File "tiledb/libtiledb.pyx", line 4816, in tiledb.libtiledb.DenseArrayImpl.write_direct
File "tiledb/libtiledb.pyx", line 574, in tiledb.libtiledb._raise_ctx_err
File "tiledb/libtiledb.pyx", line 559, in tiledb.libtiledb._raise_tiledb_error
tiledb.cc.TileDBError: [TileDB::Subarray] Error: Cannot add range to dimension '__dim_0'; Range [1, 1] is out of domain bounds [0, 0]
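One plausible reading of the error above, sketched in plain Python: if the first ingest created the array with a domain sized exactly to the original one-element data, then an append that starts right after the existing data lands at index 1, outside the domain [0, 0]. This is a toy model only. The helper names below are hypothetical and are not part of TileDB-Py, and the assumption that the domain was fitted exactly to the data is inferred from the error message rather than confirmed in this thread.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (low, high) bounds


def ingest_domain(length: int) -> Range:
    """Hypothetical: inclusive domain of a 1-D array sized to fit `length` cells exactly."""
    return (0, length - 1)


def append_range(existing_length: int, new_length: int) -> Range:
    """Hypothetical: inclusive range written when appending right after existing data."""
    start = existing_length
    return (start, start + new_length - 1)


def fits(domain: Range, rng: Range) -> bool:
    """True when the write range lies entirely inside the domain."""
    return domain[0] <= rng[0] and rng[1] <= domain[1]


# A one-element array, as with uns/draw_graph/params/random_state.
domain = ingest_domain(1)
rng = append_range(existing_length=1, new_length=1)

print(f"domain={domain} append range={rng} fits={fits(domain, rng)}")

# Assumption for illustration: a wider domain would accept the same append range.
print(f"wider domain fits={fits((0, 9), rng)}")
```

Under this reading, re-uploading to an existing array needs either a domain with headroom or a write that targets the existing cells instead of the cells after them.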
Here's what I do with tiledb.from_pandas:
https://github.com/single-cell-data/TileDB-SingleCell/blob/main/apis/python/src/tiledbsc/annotation_dataframe.py#L226-L229
What I want, and what I thought this PR was (maybe I misunderstand this PR), is to be able to do with tiledb.from_numpy what I'm already successfully doing with tiledb.from_pandas.
c71d222 to c978bfa
Based on our discussion earlier, I have added a By default,
@nguyenv sorry for the delay in replying -- this works beautifully!
LGTM, although this would probably be a good time to pull as much of the schema_like_numpy code as possible out of cython for maintainability (we'll have to do it eventually either way).
c978bfa to f98fd54
🎉