Support pyarrow output for GenomicsDB queries #51

nalinigans · 2024-06-20T23:47:41Z

No description provided.

test/test_genomicsdb.py

.github/scripts/install_prereqs.sh

.github/workflows/basic.yml

nalinigans · 2024-06-21T00:06:04Z

src/genomicsdb.pyx

+          self._genomicsdb.query_variant_calls(processor, array_name, scan_range)
+
+      if non_blocking:
+        query_thread = threading.Thread(target=query_calls)


I will change this to use the thread pool from asyncio.
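Here is a minimal sketch of what moving to asyncio's thread pool could look like, using only the standard library. It is a generic illustration, not the change that landed in the PR. produce_batches is a hypothetical stand-in for the blocking query_variant_calls() call and is not the GenomicsDB API. asyncio.to_thread (Python 3.9+) submits work to the event loop's default ThreadPoolExecutor, so there is no hand-managed threading.Thread to start and join.

```python
import asyncio
import queue

_DONE = object()  # sentinel marking the end of the stream


def produce_batches(out: queue.Queue, n: int) -> None:
    """Hypothetical blocking producer standing in for query_variant_calls():
    it pushes each batch into `out` as soon as it is ready."""
    try:
        for i in range(n):
            out.put(f"batch-{i}")
    finally:
        # Always signal completion, even on error, so the consumer never hangs.
        out.put(_DONE)


async def stream_batches(n: int) -> list:
    q: queue.Queue = queue.Queue()
    # Run the blocking producer in the loop's default ThreadPoolExecutor
    # instead of creating and joining a threading.Thread by hand.
    producer = asyncio.create_task(asyncio.to_thread(produce_batches, q, n))
    results = []
    while True:
        # queue.Queue.get blocks, so it is also offloaded to the pool.
        item = await asyncio.to_thread(q.get)
        if item is _DONE:
            break
        results.append(item)
    await producer  # re-raises any exception from the producer thread
    return results


if __name__ == "__main__":
    print(asyncio.run(stream_batches(3)))
```

Collecting into a list keeps the example short. The same loop could `yield` each item from an async generator to hand batches to the caller incrementally.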

oliviawebber · 2024-06-21T23:34:11Z

requirements.txt

@@ -28,3 +28,4 @@
 numpy>=1.24.0
 pandas>=2.1.0
 protobuf>=4.21.1
+pyarrow


Could a version specifier be added to this dependency?

@oliviawebber , any suggestions for pyarrow versions? I don't want to introduce incompatibilities with other packages.

If there's no known minimum, my usual rule of thumb is to pick a version from about a year ago, which would give 14.0.0. The one library we use that I know might eventually pull in pyarrow appears to require >= 7.0.0.

Sort of related: I recently ran into the issue described at https://stackoverflow.com/questions/78634235/numpy-dtype-size-changed-may-indicate-binary-incompatibility-expected-96-from. Based on that, and perhaps on the NumPy 2.0 migration guide (https://numpy.org/devdocs/numpy_2_0_migration_guide.html), do we also need to pin numpy<2.0.0?

Yes, I am using the numpy C interface, which may need porting to 2.0. So we will have:

numpy>=1.25,<2.0.0
pyarrow>=14.0.0
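To make the proposed specifiers concrete, here is a hypothetical stdlib-only check of which version combinations they accept. The function names are illustrative and are not part of GenomicsDB-Python. The comparison is deliberately simplified to (major, minor). Real code should rely on pip resolving requirements.txt, or on packaging.version, rather than this sketch.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """(major, minor) from a version string such as "1.26.4" or "2.0.0rc1".

    Simplified on purpose: this is not a full PEP 440 parser."""
    nums = []
    for part in v.split(".")[:2]:
        m = re.match(r"\d+", part)
        nums.append(int(m.group()) if m else 0)
    while len(nums) < 2:
        nums.append(0)
    return tuple(nums)


def satisfies_pins(numpy_version: str, pyarrow_version: str) -> bool:
    # numpy>=1.25,<2.0.0 and pyarrow>=14.0.0, compared on (major, minor) only
    np_ok = (1, 25) <= release_tuple(numpy_version) < (2, 0)
    pa_ok = release_tuple(pyarrow_version) >= (14, 0)
    return np_ok and pa_ok


def installed_versions_ok() -> bool:
    # Check the versions actually installed in the current environment.
    try:
        return satisfies_pins(version("numpy"), version("pyarrow"))
    except PackageNotFoundError:
        return False
```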

src/genomicsdb.pyx

oliviawebber · 2024-06-21T23:42:08Z

src/genomicsdb.pyx

+          schema_obj = _ArrowSchemaWrapper._import_from_c_capsule(schema_capsule)
+          schema = pa.schema(schema_obj.children_schema)
+          yield schema.serialize().to_pybytes()
+        if arrow_array and arrow_schema:


The "and arrow_schema" part of this check seems redundant given the "arrow_schema == NULL" check just above. Or is there some truthiness conversion?

mlathara

Some minor questions

src/genomicsdb.pyx

mlathara · 2024-06-24T19:56:47Z

src/genomicsdb.pyx

+          break
+
+      if non_blocking:
+        query_thread.join()


Do we need to clean up the array and schema here? Or does the client do that?

The client does the cleanup here, since the operations are zero-copy. See

GenomicsDB-Python/src/utils.pxi

Lines 102 to 103 in 4e37170

cdef object pycapsule_get_arrow_schema(void *schema):
  return PyCapsule_New(<ArrowSchema*>schema, "arrow_schema", &pycapsule_delete_arrow_schema);

and

GenomicsDB-Python/src/utils.pxi

Lines 105 to 106 in 4e37170

cdef object pycapsule_get_arrow_array(void *array):
  return PyCapsule_New(<ArrowArray*>array, "arrow_array", &pycapsule_delete_arrow_array);

where we register the cleanup callbacks for the schema and the array, respectively.
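The ownership model here is that the capsule, not the producer, decides when the memory goes away. The registered destructor runs when the consumer drops the last reference. Below is a stdlib-only analogy of that pattern. It uses weakref.finalize in place of a PyCapsule destructor. OwnedBuffer and hand_out are hypothetical names and are not part of utils.pxi.

```python
import gc
import weakref

released = []  # names of resources whose release callback has run


class OwnedBuffer:
    """Hypothetical stand-in for a capsule wrapping an ArrowSchema/ArrowArray."""

    def __init__(self, name: str):
        self.name = name
        # Analogous to passing &pycapsule_delete_arrow_* to PyCapsule_New:
        # the callback fires when this object is garbage-collected.
        # It must not reference self, or the object would never be collected.
        self._finalizer = weakref.finalize(self, released.append, name)


def hand_out(name: str) -> OwnedBuffer:
    # The producer keeps no reference; cleanup is now up to the consumer.
    return OwnedBuffer(name)


if __name__ == "__main__":
    buf = hand_out("arrow_array")
    print(released)  # nothing released while the consumer still holds buf
    del buf
    gc.collect()
    print(released)
```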


…lls() (#338)
* Support arrow output using nanoarrow in response to query_variant_calls()
* Introduce semaphore for separate array_schema() see GenomicsDB/GenomicsDB-Python#51, address review comments and some cleanup
* Test coverage for ArrowVariantCallProcessor::allocate_schema
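The commit message above mentions a semaphore so that array_schema() can be requested separately. That change lives in the GenomicsDB C++ library and is not shown here. The following is only a generic Python illustration of the idea. SchemaHandoff and publish are hypothetical names. In this sketch, array_schema() is modeled as a blocking accessor, which is an assumption about the C++ behavior.

```python
import threading


class SchemaHandoff:
    """Hypothetical sketch: array_schema() blocks until a producer thread
    has published the schema, using a semaphore that starts at zero."""

    def __init__(self):
        self._ready = threading.Semaphore(0)
        self._schema = None

    def publish(self, schema) -> None:
        self._schema = schema
        self._ready.release()  # signal that the schema is available

    def array_schema(self, timeout=None):
        if not self._ready.acquire(timeout=timeout):
            raise TimeoutError("schema was not published in time")
        # Put the permit back so later callers do not block (turnstile).
        self._ready.release()
        return self._schema
```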

mlathara

lgtm

Nalini Ganapati added 2 commits June 20, 2024 16:07

Support pyarrow output for GenomicsDB queries

4752272

Modify message while installing devtoolset-11 gcc for cibuildwheel

fb35067

nalinigans commented Jun 20, 2024

test/test_genomicsdb.py (outdated, resolved)

nalinigans commented Jun 20, 2024

.github/scripts/install_prereqs.sh (outdated, resolved)

nalinigans commented Jun 20, 2024

.github/workflows/basic.yml (outdated, resolved)

nalinigans requested review from mlathara and oliviawebber June 21, 2024 00:04

nalinigans commented Jun 21, 2024


Add arrow tests to test_genomicsdb_demo for benchmarking

58781a0

oliviawebber reviewed Jun 21, 2024


mlathara reviewed Jun 24, 2024


nalinigans pushed a commit to GenomicsDB/GenomicsDB that referenced this pull request Jun 24, 2024

Introduce semaphore for separate array_schema() see GenomicsDB/GenomicsDB-Python#51, address review comments and some cleanup

0425e2a

Nalini Ganapati added 2 commits June 24, 2024 17:10

Address review comments

6626ac4

Rename non_blocking to batching mode

e382ddc

Move back to using develop for builds

4e37170

nalinigans requested review from oliviawebber and mlathara June 25, 2024 15:55

mlathara approved these changes Jun 25, 2024


nalinigans merged commit e082f3c into develop Jun 25, 2024

nalinigans deleted the ng_arrow branch June 25, 2024 17:02
