Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds plugin, example and tests for Vaex (#457) #708

Merged
merged 4 commits into from
Feb 24, 2024

Conversation

tyapochkin
Copy link
Contributor

@tyapochkin tyapochkin commented Feb 21, 2024

This PR adds a plugin for Vaex and closes #457

Changes

  • hamilton/plugins/vaex_extensions.py: registration of vaex types
  • hamilton/plugins/h_vaex.py: code of VaexDataFrameResult
  • plugin_tests/h_vaex/test_h_vaex.py: unit tests
  • plugin_tests/h_vaex/resources/functions.py: functions for unit tests
  • examples/vaex/notebook.ipynb: example with vaex as a jupyter notebook
  • examples/vaex/my_script.py: example with vaex as a script
  • examples/vaex/my_functions.py: functions for example with vaex
  • examples/vaex/requirements.txt: requirements for example with vaex
  • .circleci/config.yml: added new task for vaex
  • .ci/setup.sh: added new task for vaex
  • .ci/test.sh: added new task for vaex
  • setup.py: added new installation option sf-hamilton[vaex]

How I tested this

I added and ran a new CircleCI job with unit tests for VaexDataFrameResult.

Notes

Vaex builds its wheels so long and depends on so many additional packages, that I decided to take it out in the separate installation option sf-hamilton[vaex] and separate CircleCI task vaex-py39. Right now, it doesn't support Python 3.11 and 3.12, so I created only one task for just one Python version 3.9.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

import sys

from hamilton import base, driver
from hamilton.plugins import h_vaex, vaex_extensions # noqa F401
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good just please add vaex to the plugins list in function_modifiers/base.py -- that'll save people an import :)

I'll then run this locally this week, and then we can plan to release this next week. 🚀

@zilto
Copy link
Collaborator

zilto commented Feb 22, 2024

Great job on the PR! Excited to try it!

@skrawcz @elijahbenizzy Should we move tests to tests/plugins/test_vaex_extensions or tests/plugins/test_h_vaex instead of plugin_tests/h_vaex/?

We have a growing list of test dependencies for plugins / extensions and we might make that more modular in CI tasks?

@skrawcz
Copy link
Collaborator

skrawcz commented Feb 23, 2024

Great job on the PR! Excited to try it!

@skrawcz @elijahbenizzy Should we move tests to tests/plugins/test_vaex_extensions or tests/plugins/test_h_vaex instead of plugin_tests/h_vaex/?

We have a growing list of test dependencies for plugins / extensions and we might make that more modular in CI tasks?

Sorry @tyapochkin @zilto I confused myself with what I was talking about earlier.

I think this test setup is fine, and mirrors what we have for spark, etc.

Basically, if it's special then plugins_tests/ is where to put it. Otherwise tests/plugins is fine.

Adding to the registry so we try to load the extension if vaex is in the environment.
Then fixing a few minor typos otherwise to get things to work.
@tyapochkin
Copy link
Contributor Author

tyapochkin commented Feb 23, 2024

Oh good point. @tyapochkin any reason for your current test structure? It doesn't mirror what we currently do. And @zilto you have a point about scaling this.

Yeah, my logic is the following:

  • I created distinct installation option sf_hamilton[vaex] and distinct task for circleci because vaex pulls a lot of additional requirements.
  • So I can't add vaex tests to the default folder tests just because default circleci job runs pytest tests. So if I add vaex tests there, the default circleci job will also run these vaex tests, and the vaex tests will fail just because vaex is not installed
  • Additionally, we can't add vaex to standard requirements-test.txt because it still doesn't support python 3.11 and 3.12.

So for me, separating tests for vaex in distinct folder outside tests still looks like better solution

@skrawcz
Copy link
Collaborator

skrawcz commented Feb 23, 2024

Oh good point. @tyapochkin any reason for your current test structure? It doesn't mirror what we currently do. And @zilto you have a point about scaling this.

Yeah, my logic is the following:

  • I created distinct installation option sf_hamilton[vaex] and distinct task for circleci because vaex pulls a lot of additional requirements.
  • So I can't add vaex tests to the default folder tests just because default circleci job runs pytest tests. So if I add vaex tests there, the default circleci job will also run these vaex tests, and the vaex tests will fail just because vaex is not installed
  • Additionally, we can't add vaex to standard requirements.txt because it still doesn't support python 3.11 and 3.12.

So for me, separating tests for vaex in distinct folder outside tests still looks like better solution

yep yep, I think what you have is fine.

@skrawcz skrawcz merged commit e2ea1d8 into DAGWorks-Inc:main Feb 24, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[good first issue - beginner] Add Hamilton example using Vaex
3 participants