[FEA] Let Gpu arrow python runners support writing one batch one time for the single threaded model. #9844

Closed
firestarman opened this issue Nov 23, 2023 · 0 comments · Fixed by #9833
Assignees
Labels
feature request New feature or request

Comments

@firestarman
Collaborator

This is a follow-up for #9833.

That PR only changed the basic GpuArrowPythonRunner to support the single threaded model. Other GPU python runners (e.g. GpuCogroupArrowPythonRunner) still need to be updated.

@firestarman firestarman added feature request New feature or request ? - Needs Triage Need team to review and classify labels Nov 23, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 28, 2023
firestarman added a commit that referenced this issue Nov 30, 2023
fix #9493
fix #9844

The Python runner normally uses two separate threads to write data to and read data from the Python process.
On DB 13.3, however, it becomes single-threaded, so reading and writing run on the same thread,
and the first read always happens before the first write. The original BatchQueue blocks that first read
until the first write is done, which can never happen on the same thread, so it waits forever.

Changes made:

- Update the BatchQueue to support asking for a batch instead of waiting until one is inserted into the queue.
   This eliminates the ordering requirement between reading and writing.
- Introduce a new class named BatchProducer that works with the new BatchQueue so the reader can
   peek at the row count on demand.
- Apply the new BatchQueue to the relevant plans.
- Update the Python runners to support writing one batch at a time for the single-threaded model.
- Found an issue with PythonUDAF and RunningWindowFunctionExec that may be a bug specific to DB 13.3,
   and added a test (test_window_aggregate_udf_on_cpu) for it.
- Other small refactors
---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
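
The hang and the pull-style fix described in the commit message above can be illustrated with a small, self-contained sketch. This is not the plugin's code (which is Scala); it is a Python model under stated assumptions. The class name BatchProducer comes from the commit message, but the methods `next_for_reader` and `next_for_writer`, the two backlogs, and `run_single_threaded` are hypothetical names invented for illustration. The idea shown is that whichever side asks first pulls the batch from the input and caches it for the other side, so a read that comes before the write no longer has to wait.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple

Batch = List[int]  # stand-in for a columnar batch; len(batch) is its row count


class BatchProducer:
    """Illustrative model only: hands every input batch to both the writer
    and the reader, pulling from the input on demand instead of blocking."""

    def __init__(self, source: Iterable[Batch]) -> None:
        self._source = iter(source)
        self._writer_backlog: Deque[Batch] = deque()  # pulled by reader first
        self._reader_backlog: Deque[Batch] = deque()  # pulled by writer first

    def next_for_reader(self) -> Optional[Batch]:
        # The reader never waits for the writer: if nothing was written yet,
        # it pulls the batch itself and leaves a copy for the writer.
        if self._reader_backlog:
            return self._reader_backlog.popleft()
        batch = next(self._source, None)
        if batch is not None:
            self._writer_backlog.append(batch)
        return batch

    def next_for_writer(self) -> Optional[Batch]:
        # Symmetric: reuse a batch the reader already pulled, else pull one.
        if self._writer_backlog:
            return self._writer_backlog.popleft()
        batch = next(self._source, None)
        if batch is not None:
            self._reader_backlog.append(batch)
        return batch


def run_single_threaded(batches: Iterable[Batch]) -> List[Tuple[str, int]]:
    """One thread, read before write, one batch written per iteration."""
    producer = BatchProducer(batches)
    log: List[Tuple[str, int]] = []
    while True:
        expected = producer.next_for_reader()  # the first read comes first
        if expected is None:
            break
        log.append(("read-expects-rows", len(expected)))
        written = producer.next_for_writer()   # then exactly one batch is written
        log.append(("write-rows", len(written)))
    return log


events = run_single_threaded([[1, 2, 3], [4, 5]])
```

With a queue that only blocks until a writer inserts a batch, the first `next_for_reader` call in this loop would never return, because the write that would unblock it is scheduled later on the same thread.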
razajafri pushed a commit to razajafri/spark-rapids that referenced this issue Jan 25, 2024
razajafri added a commit that referenced this issue Jan 26, 2024
* Download Maven from apache.org archives (#10225)

Fixes #10224 

Replace broken install using apt by downloading Maven from apache.org.

Signed-off-by: Gera Shegalov <gera@apache.org>

* Fix a hang for Pandas UDFs on DB 13.3[databricks] (#9833)

* Fix a potential data corruption for Pandas UDF (#9942)

This PR moves the BatchQueue into the DataProducer so that it shares the same lock as the output iterator
returned by asIterator, and makes moving a batch from the input iterator into the batch queue
an atomic operation, eliminating the race when appending batches to the queue.
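
The locking change described above can be sketched as follows. This is a Python model under stated assumptions, not the plugin's Scala code: DataProducer and asIterator are named in the PR description, while `next_batch`, `take_queued`, and the use of `threading.Lock` are hypothetical choices made for illustration. The point shown is that pulling a batch from the input and appending it to the queue happen under one lock, so the queue order always matches the order in which batches left the input.

```python
import threading
from collections import deque
from typing import Any, Deque, Iterable, Iterator, Optional


class DataProducer:
    """Illustrative model only: the batch queue lives inside the producer
    and is guarded by the same lock as the output iterator."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = iter(source)
        self._lock = threading.Lock()
        self._queue: Deque[Any] = deque()

    def next_batch(self) -> Optional[Any]:
        # Pull from the input and enqueue in one critical section, so no
        # other thread can interleave between the two steps.
        with self._lock:
            batch = next(self._source, None)
            if batch is not None:
                self._queue.append(batch)
            return batch

    def as_iterator(self) -> Iterator[Any]:
        # Output iterator; every step goes through the locked next_batch.
        return iter(self.next_batch, None)

    def take_queued(self) -> Optional[Any]:
        # Consumers of the queue take the same lock as the iterator.
        with self._lock:
            return self._queue.popleft() if self._queue else None


producer = DataProducer([[1, 2], [3], [4, 5, 6]])
seen = list(producer.as_iterator())
queued = [producer.take_queued() for _ in seen]
```

If the pull and the append were two separate steps with separate locks, two threads could pull batches in one order and append them in the other, which is the kind of race the PR description refers to.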

* Do some refactor for the Python UDF code to try to reduce duplicate code. (#9902)

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Fixed 330db Shims to Adopt the PythonRunner Changes [databricks] (#10232)

This PR removes the old 330db shims in favor of the new Shims, similar to the one in 341db. 

**Tests:**
Ran udf_test.py on Databricks 11.3; all tests passed.

fixes #10228 

---------

Signed-off-by: raza jafri <rjafri@nvidia.com>

---------

Signed-off-by: Gera Shegalov <gera@apache.org>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: raza jafri <rjafri@nvidia.com>
Co-authored-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
@sameerz sameerz changed the title [FEA]Let Gpu arrow python runners support writing one batch one time for the single threaded model. [FEA] Let Gpu arrow python runners support writing one batch one time for the single threaded model. Feb 10, 2024