Adding official version for Fraud detection notebook #321

@sudarshan-SpringML
Contributor

sudarshan-SpringML commented Feb 23, 2022

This PR contains a notebook that shows how to build, deploy, and analyze predictions from a simple random forest model for financial fraud detection, using scikit-learn, Vertex AI, and the What-If Tool (WIT) on a synthetic fraud transaction dataset.

  • Use the notebook template as a starting point.
  • Follow the style and grammar rules outlined in the above notebook template.
  • Verify the notebook runs successfully in Colab, since the automated tests cannot guarantee this even when they pass.
  • Passes all the required automated checks. You can locally test for formatting and linting with these instructions.
  • You have consulted with a tech writer to see if tech writer review is necessary. If so, the notebook has been reviewed by a tech writer, and they have approved it.
  • This notebook has been added to the CODEOWNERS file under # Official Notebooks section, pointing to the author or the author's team.
  • The Jupyter notebook cleans up any artifacts it has created (datasets, ML models, endpoints, etc) so as not to eat up unnecessary resources.

@sudarshan-SpringML requested a review from a team as a code owner February 23, 2022 12:13

@kweinmeister
Contributor

/gcbrun

@andrewferlitsch
Contributor

Build error:
ModuleNotFoundError Traceback (most recent call last)
Step #3: Input In [17], in
Step #3: 1 import warnings
Step #3: ----> 3 import joblib
Step #3: 4 import matplotlib.pyplot as plt
Step #3: 5 import numpy as np
Step #3:
Step #3: ModuleNotFoundError: No module named 'joblib'

@ivanmkc
Contributor

ivanmkc commented Feb 25, 2022

/gcbrun

@ivanmkc
Contributor

ivanmkc commented Feb 25, 2022

tl;dr: pip install joblib. You don't need an IS_TESTING check, since this is a required dependency. Don't worry about what Workbench or Colab have preinstalled; just install any dependencies that our base image doesn't provide. The base image is the source of truth.
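
As a rough sketch of the advice above, the install can run unconditionally near the top of the notebook, with no IS_TESTING guard. The helper names below are hypothetical and `--quiet` is only a convenience; in a notebook cell the usual shorthand would be a `pip install` shell command instead.

```python
import subprocess
import sys

# Packages the notebook imports that the test base image does not provide.
# Only joblib is named in this thread; extend the list as needed.
REQUIRED_PACKAGES = ["joblib"]


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running the notebook."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


def install(packages):
    """Install unconditionally; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(packages))


# In the notebook, call this before the import cell (needs network access):
# install(REQUIRED_PACKAGES)
```

Using `sys.executable -m pip` keeps the install tied to the kernel's own interpreter rather than whichever `pip` happens to be first on the PATH.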

@sudarshan-SpringML changed the title from "Fraud detection notebook" to "Adding official version for Fraud detection notebook" Mar 1, 2022
@sudarshan-SpringML
Contributor Author

notebook ready for review

@kweinmeister
Contributor

/gcbrun

@@ -9,6 +9,57 @@
"# Build a fraud detection model on Vertex AI"

Line #8.        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest",

Can we pull this out as a constant? Also, let's use the latest version, 1.0, listed here: https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers
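
A minimal sketch of that suggestion, assuming the 1.0 image follows the same naming pattern as the line under review (the `1-0` tag should be checked against the pre-built containers page). The constant names, display name, and bucket path are hypothetical.

```python
# Assumption: the registry path is copied from the reviewed line, with only
# the version tag changed from "0-24" to "1-0".
SKLEARN_VERSION_TAG = "1-0"
SERVING_CONTAINER_IMAGE_URI = (
    "us-docker.pkg.dev/vertex-ai/prediction/"
    f"sklearn-cpu.{SKLEARN_VERSION_TAG}:latest"
)

# Hypothetical keyword arguments for the model upload call; the placeholder
# values are not taken from the notebook.
model_upload_kwargs = {
    "display_name": "fraud-detection-model",  # hypothetical
    "artifact_uri": "gs://your-bucket/model",  # hypothetical
    "serving_container_image_uri": SERVING_CONTAINER_IMAGE_URI,
}

print(SERVING_CONTAINER_IMAGE_URI)
```

Defining the URI once at the top of the notebook means a later version bump touches a single line.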

@kweinmeister
Contributor

@sudarshan-SpringML I added one comment here: https://app.reviewnb.com/GoogleCloudPlatform/vertex-ai-samples/pull/321/

Other than that, looks good!

@kweinmeister
Contributor

/gcbrun

@sudarshan-SpringML
Contributor Author

sudarshan-SpringML commented Mar 11, 2022 via email

@sudarshan-SpringML
Contributor Author

@sudarshan-SpringML the kernel died again. Can you please check the specific cell below? Note that cell 66 is the one that prints out the dimensions. Two cells below, the issue occurs.

INFO:papermill:Executing Cell 66--------------------------------------
INFO:papermill:(4453834, 11) (1908786, 11)

(4453834, 11) (1908786, 11)
INFO:papermill:Ending Cell 66-----------------------------------------
INFO:papermill:Executing Cell 67--------------------------------------
INFO:papermill:Ending Cell 67-----------------------------------------
INFO:papermill:Executing Cell 68--------------------------------------
ERROR:papermill:Kernel died while waiting for execute reply.
INFO:papermill:Ending Cell 68-----------------------------------------

@kweinmeister I think this is the cell that fails: "forest = RandomForestClassifier()
forest.fit(X_train, y_train)". For me, this cell takes around 10-15 minutes to run.
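
The thread never establishes why the kernel dies. As a back-of-envelope check on one possibility only (memory pressure, which is an assumption here), the shapes printed by cell 66 give a lower bound on the size of the data held in memory. The sketch assumes dense float64 arrays, which the notebook may or may not use.

```python
import math

# Shapes printed by cell 66 in the papermill log.
TRAIN_SHAPE = (4453834, 11)
TEST_SHAPE = (1908786, 11)

# Assumption: every value is stored as a dense 8-byte float.
BYTES_PER_FLOAT64 = 8


def dense_bytes(shape, itemsize=BYTES_PER_FLOAT64):
    """Bytes needed for one dense copy of an array with this shape."""
    return math.prod(shape) * itemsize


train_bytes = dense_bytes(TRAIN_SHAPE)
test_bytes = dense_bytes(TEST_SHAPE)

print(f"train: {train_bytes / 2**20:.1f} MiB")
print(f"test:  {test_bytes / 2**20:.1f} MiB")
```

Under these assumptions one copy of the training data is roughly 374 MiB and the test data roughly 160 MiB. Any extra copies made during preprocessing, plus the fitted trees themselves, would come on top of that, so the build worker's memory limit is worth comparing against these numbers.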

@sudarshan-SpringML
Contributor Author

Added debug statements to find the cell where the kernel dies.
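
Independently of debug statements inside the notebook, the papermill log quoted earlier already brackets each cell, so the failing cell can be read off mechanically. The sketch below assumes the log keeps the "Executing Cell N" and "Kernel died" wording shown in this thread; the function name is hypothetical.

```python
import re

# Matches the cell-start lines as they appear in the log quoted in this thread.
CELL_START = re.compile(r"INFO:papermill:Executing Cell (\d+)-+")
KERNEL_DIED = "Kernel died"


def find_failing_cell(log_text):
    """Return the number of the cell executing when the kernel died, or None."""
    current_cell = None
    for line in log_text.splitlines():
        match = CELL_START.match(line.strip())
        if match:
            current_cell = int(match.group(1))
        elif KERNEL_DIED in line:
            return current_cell
    return None


SAMPLE_LOG = """\
INFO:papermill:Executing Cell 66--------------------------------------
INFO:papermill:(4453834, 11) (1908786, 11)
INFO:papermill:Ending Cell 66-----------------------------------------
INFO:papermill:Executing Cell 67--------------------------------------
INFO:papermill:Ending Cell 67-----------------------------------------
INFO:papermill:Executing Cell 68--------------------------------------
ERROR:papermill:Kernel died while waiting for execute reply.
INFO:papermill:Ending Cell 68-----------------------------------------
"""

print(find_failing_cell(SAMPLE_LOG))  # 68
```

On the sample above this reports cell 68, which matches the observation that the failure occurs two cells after the one that prints the dimensions.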

@andrewferlitsch
Contributor

/gcbrun

@kweinmeister
Contributor

/gcbrun

@andrewferlitsch
Contributor

Weird error. I'll see if I can force a retest.

Traceback (most recent call last):
Step #3: File "/workspace/.cloud-build/execute_notebook_cli.py", line 36, in <module>
Step #3: execute_notebook_helper.execute_notebook(
Step #3: File "/workspace/.cloud-build/execute_notebook_helper.py", line 91, in execute_notebook
Step #3: raise execution_exception
Step #3: File "/workspace/.cloud-build/execute_notebook_helper.py", line 56, in execute_notebook
Step #3: pm.execute_notebook(
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 107, in execute_notebook
Step #3: nb = papermill_engines.execute_notebook_with_engine(
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine
Step #3: return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 359, in execute_notebook
Step #3: cls.execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 418, in execute_managed_notebook
Step #3: return PapermillNotebookClient(nb_man, **final_kwargs).execute()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/clientwrap.py", line 45, in execute
Step #3: self.papermill_execute_cells()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/clientwrap.py", line 72, in papermill_execute_cells
Step #3: self.execute_cell(cell, index)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/util.py", line 84, in wrapped
Step #3: return just_run(coro(*args, **kwargs))
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/util.py", line 62, in just_run
Step #3: return loop.run_until_complete(coro)
Step #3: File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
Step #3: return future.result()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/client.py", line 953, in async_execute_cell
Step #3: raise DeadKernelError("Kernel died")
Step #3: nbclient.exceptions.DeadKernelError: Kernel died

@andrewferlitsch
Contributor

/gcbrun

@andrewferlitsch
Contributor

Step #3: Traceback (most recent call last):
Step #3: File "/workspace/.cloud-build/execute_notebook_cli.py", line 36, in <module>
Step #3: execute_notebook_helper.execute_notebook(
Step #3: File "/workspace/.cloud-build/execute_notebook_helper.py", line 91, in execute_notebook
Step #3: raise execution_exception
Step #3: File "/workspace/.cloud-build/execute_notebook_helper.py", line 56, in execute_notebook
Step #3: pm.execute_notebook(
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/execute.py", line 107, in execute_notebook
Step #3: nb = papermill_engines.execute_notebook_with_engine(
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine
Step #3: return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 359, in execute_notebook
Step #3: cls.execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/engines.py", line 418, in execute_managed_notebook
Step #3: return PapermillNotebookClient(nb_man, **final_kwargs).execute()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/clientwrap.py", line 45, in execute
Step #3: self.papermill_execute_cells()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/papermill/clientwrap.py", line 72, in papermill_execute_cells
Step #3: self.execute_cell(cell, index)
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/util.py", line 84, in wrapped
Step #3: return just_run(coro(*args, **kwargs))
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/util.py", line 62, in just_run
Step #3: return loop.run_until_complete(coro)
Step #3: File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
Step #3: return future.result()
Step #3: File "/builder/home/.local/lib/python3.9/site-packages/nbclient/client.py", line 953, in async_execute_cell
Step #3: raise DeadKernelError("Kernel died")
Step #3: nbclient.exceptions.DeadKernelError: Kernel died
Finished Step #3

@andrewferlitsch
Contributor

@kweinmeister Has anyone debugged why the kernel died?

@kweinmeister
Contributor

/gcbrun

@andrewferlitsch
Contributor

kernel died again

@ivanmkc
Contributor

ivanmkc commented Apr 19, 2022

/gcbrun

@ivanmkc
Contributor

ivanmkc commented Apr 19, 2022

Seems like it's dying on forest.fit(X_train, y_train).

Try initializing RandomForestClassifier with verbose=1 to get more info.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
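
A small sketch of that suggestion, assuming scikit-learn is installed. The data here is a synthetic stand-in (11 columns, heavily imbalanced labels, far fewer rows than the real training set), and `n_estimators=10` is chosen only to keep the example fast; the notebook itself calls the constructor with defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=11, weights=[0.99, 0.01], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# verbose=1 makes fit() and predict() emit progress messages, which helps show
# how far training got before a kernel death.
forest = RandomForestClassifier(n_estimators=10, verbose=1, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))
```

If the progress messages stop partway through on the full dataset, that narrows the failure to tree construction rather than to data loading or the train/test split.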

@sudarshan-SpringML
Contributor Author

Done

@andrewferlitsch
Contributor

@sudarshan-SpringML Please run this notebook manually in Vertex AI Workbench and see whether the kernel dies.

@sudarshan-SpringML
Contributor Author

sudarshan-SpringML commented Apr 21, 2022 via email

@andrewferlitsch
Contributor

Per the comment above, the notebook runs fine in Workbench, so I'll merge manually.

@andrewferlitsch merged commit 831c515 into GoogleCloudPlatform:main Apr 22, 2022