Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples: Example with pandas for split apply combine #753

Merged
merged 16 commits into from
Mar 10, 2024

Conversation

nhuray
Copy link
Contributor

@nhuray nhuray commented Mar 9, 2024

Hi everyone,

This is my first Pull Request to contribute to the project 🎉

As a beginner I had issue reading the documentation about how to implement a simple split / apply / combine on a pandas DataFrame.

So after discussing it on Slack with @skrawcz and @elijahbenizzy, I decided to investigate how things works creating a simple code example.

Changes

The changes are located in examples/pandas/split-apply-combine (can be renamed).

This is the files I added:

  • README.md: to explain the example
  • my_functions.py: module when all the transformation functions have been implemented
  • my_wrapper.py: module when I just define a class with static methods wrapping the Hamilton driver
  • my_script.py: module defining the input for the example and run the wrapper to execute the dataflow

Screenshots

Here is the screenshots of the console when we run the example:

Screenshot 2024-03-08 at 10 28 57 PM

And this is the DAG generated:

my_full_dag

How I tested this

I just run it locally in Debug to ensure the output expected is rendered and the DAG generated.

Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example. Looks good. Mostly stylistic feedback and questions :)

Oh, and can you add the image too, that way it can also be displayed in the README please :)

examples/pandas/split-apply-combine/README.md Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/README.md Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
examples/pandas/split-apply-combine/my_functions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I got these pandas warnings:

hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:69: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Credit"] = output["Tax Credit"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]

I sugget doing pd.concat([df, output], axis=1) instead

nhuray and others added 3 commits March 9, 2024 05:58
Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>
Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>
@nhuray
Copy link
Contributor Author

nhuray commented Mar 9, 2024

Otherwise I got these pandas warnings:

hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:69: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Credit"] = output["Tax Credit"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]

I sugget doing pd.concat([df, output], axis=1) instead

I also got them ! It was my plan to fix it today !

@nhuray
Copy link
Contributor Author

nhuray commented Mar 9, 2024

Otherwise I got these pandas warnings:

hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:69: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Credit"] = output["Tax Credit"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Tax Rate"] = output["Tax Rate"]

I sugget doing pd.concat([df, output], axis=1) instead

I also got them ! It was my plan to fix it today !

Fixed !

@nhuray nhuray requested a review from skrawcz March 9, 2024 12:47
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful. I think this looks good to me. CC @elijahbenizzy ?

We'll squash merge this one so will take the PR description largely.

Otherwise not sure what's up with circleci. It should have run -- only thing I'd need to check is the pre-commit check.

examples/pandas/split-apply-combine/my_full_dag Outdated Show resolved Hide resolved
@skrawcz
Copy link
Collaborator

skrawcz commented Mar 10, 2024

@nhuray okay pre-commit failed locally for me:

$ pre-commit run --all-files
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted examples/pandas/split-apply-combine/my_wrapper.py
reformatted examples/pandas/split-apply-combine/my_functions.py

All done! ✨ 🍰 ✨
2 files reformatted, 519 files left unchanged.

trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing examples/pandas/split-apply-combine/README.md

fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing examples/pandas/split-apply-combine/README.md

fix requirements.txt.....................................................Passed
check python ast.........................................................Passed
isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /hamilton/examples/pandas/split-apply-combine/my_functions.py
Fixing /hamilton/examples/pandas/split-apply-combine/my_script.py
Fixing /hamilton/examples/pandas/split-apply-combine/my_wrapper.py

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

examples/pandas/split-apply-combine/my_functions.py:8:1: F401 'hamilton.function_modifiers.value' imported but unused

So if you run that locally it should fix/flag things to fix.

Thanks!

@nhuray
Copy link
Contributor Author

nhuray commented Mar 10, 2024

@skrawcz I removed the my_full_dag file, add a local .gitignore and fix the lint issues !

I think the PR is ready to be merged now ! Thanks for the review and suggestions.

@skrawcz skrawcz merged commit e93b829 into DAGWorks-Inc:main Mar 10, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants