model_monitoring notebook is flaky #31

Closed
ivanmkc opened this issue Aug 19, 2021 · 4 comments

@ivanmkc
Contributor

ivanmkc commented Aug 19, 2021

Describe the bug
model_monitoring notebook is flaky

Step #4: model_monitoring.ipynb                                            FAILED    00:28:43    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [13]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         _InactiveRpcError                         Traceback (most recent call last)
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              66         try:
Step #4:                                                                                         ---> 67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
Step #4:                                                                                             945                                       wait_for_ready, compression)
Step #4:                                                                                         --> 946         return _end_unary_response_blocking(state, call, False, None)
Step #4:                                                                                             947
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
Step #4:                                                                                             848     else:
Step #4:                                                                                         --> 849         raise _InactiveRpcError(state)
Step #4:                                                                                             850
Step #4: 
Step #4:                                                                                         _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
Step #4:                                                                                         	status = StatusCode.INVALID_ARGUMENT
Step #4:                                                                                         	details = "List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.	"
Step #4:                                                                                         	debug_error_string = "{"created":"@1629366729.915949502","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"List of found errors:\t1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.\t","grpc_status":3}"
Step #4:                                                                                         >
Step #4: 
Step #4:                                                                                         The above exception was the direct cause of the following exception:
Step #4: 
Step #4:                                                                                         InvalidArgument                           Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_31500/3164745998.py in <module>
Step #4:                                                                                              20 objective_configs = set_objectives(model_ids, objective_template)
Step #4:                                                                                              21
Step #4:                                                                                         ---> 22 monitoring_job = create_monitoring_job(objective_configs)
Step #4: 
Step #4:                                                                                         /tmp/ipykernel_31500/1942246749.py in create_monitoring_job(objective_configs)
Step #4:                                                                                              56     client = JobServiceClient(client_options=options)
Step #4:                                                                                              57     parent = f"projects/{PROJECT_ID}/locations/{REGION}"
Step #4:                                                                                         ---> 58     response = client.create_model_deployment_monitoring_job(
Step #4:                                                                                              59         parent=parent, model_deployment_monitoring_job=job
Step #4:                                                                                              60     )
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/cloud/aiplatform_v1beta1/services/job_service/client.py in create_model_deployment_monitoring_job(self, request, parent, model_deployment_monitoring_job, retry, timeout, metadata)
Step #4:                                                                                            2294
Step #4:                                                                                            2295         # Send the request.
Step #4:                                                                                         -> 2296         response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
Step #4:                                                                                            2297
Step #4:                                                                                            2298         # Done; return the response.
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
Step #4:                                                                                             143             kwargs["metadata"] = metadata
Step #4:                                                                                             144
Step #4:                                                                                         --> 145         return wrapped_func(*args, **kwargs)
Step #4:                                                                                             146
Step #4:                                                                                             147
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
Step #4:                                                                                              67             return callable_(*args, **kwargs)
Step #4:                                                                                              68         except grpc.RpcError as exc:
Step #4:                                                                                         ---> 69             six.raise_from(exceptions.from_grpc_error(exc), exc)
Step #4:                                                                                              70
Step #4:                                                                                              71     return error_remapped_callable
Step #4: 
Step #4:                                                                                         /usr/local/lib/python3.9/site-packages/six.py in raise_from(value, from_value)
Step #4: 
Step #4:                                                                                         InvalidArgument: 400 List of found errors:	1.Field: model_deployment_monitoring_job; Message: model_deployment_monitoring_objective_configs is empty in ModelDeploymentMonitoringJob.
Step #4: metrics_viz_run_compare_kfp.ipynb                                 FAILED    00:01:55    ---------------------------------------------------------------------------
Step #4:                                                                                         Exception encountered at "In [28]":
Step #4:                                                                                         ---------------------------------------------------------------------------
Step #4:                                                                                         ValueError                                Traceback (most recent call last)
Step #4:                                                                                         /tmp/ipykernel_1835/3492064536.py in <module>
Step #4:                                                                                         ----> 1 df = pd.DataFrame(pipeline_df["metric.confidenceMetrics"][0])
Step #4:                                                                                               2 auc = np.trapz(df["recall"], df["falsePositiveRate"])
Step #4:                                                                                               3 plt.plot(df["falsePositiveRate"], df["recall"], label="auc=" + str(auc))
Step #4:                                                                                               4 plt.legend(loc=4)
Step #4:                                                                                               5 plt.show()
Step #4: 
Step #4:                                                                                         ~/.local/lib/python3.9/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
Step #4:                                                                                             728         else:
Step #4:                                                                                             729             if index is None or columns is None:
Step #4:                                                                                         --> 730                 raise ValueError("DataFrame constructor not properly called!")
Step #4:                                                                                             731
Step #4:                                                                                             732             # Argument 1 to "ensure_index" has incompatible type "Collection[Any]";
Step #4: 
Step #4:                                                                                         ValueError: DataFrame constructor not properly called!
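
The `ValueError` in the second failure is the one pandas raises when `pd.DataFrame` is handed a scalar (for example a string or NaN) rather than a list of records, so one possible explanation is that `pipeline_df["metric.confidenceMetrics"][0]` was not populated in that run. The log does not confirm this. A minimal sketch of a guard that could run before the plotting cell, assuming the cell is expected to hold a list of dicts with `recall` and `falsePositiveRate` keys; `as_metric_records` and `REQUIRED_KEYS` are hypothetical names, not part of the notebook:

```python
from typing import Any, Dict, List

# Columns the plotting cell in the traceback reads (assumed to be all it needs).
REQUIRED_KEYS = ("recall", "falsePositiveRate")


def as_metric_records(value: Any) -> List[Dict[str, Any]]:
    """Return `value` unchanged if it is a non-empty list of dicts that each
    carry REQUIRED_KEYS; raise ValueError with a descriptive message otherwise."""
    if not isinstance(value, list) or not value:
        raise ValueError(
            f"expected a non-empty list of metric records, got {type(value).__name__}"
        )
    for i, record in enumerate(value):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is {type(record).__name__}, not a dict")
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return value
```

With this in place, the failing line could become `df = pd.DataFrame(as_metric_records(pipeline_df["metric.confidenceMetrics"][0]))`, which would fail with a message naming the unexpected value type instead of the generic constructor error.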

Additional context
Please fix (it may be hard to reproduce as it doesn't happen all the time) or move to unofficial.
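
The `INVALID_ARGUMENT` error in the log says the job was submitted with an empty `model_deployment_monitoring_objective_configs`, so one way to make the intermittent failure easier to diagnose is to wait for a non-empty list before calling `create_monitoring_job`. A minimal sketch, assuming the list is sometimes empty because an upstream step has not finished yet (the log does not confirm the cause); `wait_for_nonempty` and its parameters are hypothetical names:

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wait_for_nonempty(
    fetch: Callable[[], Sequence[T]],
    attempts: int = 5,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence[T]:
    """Call fetch() up to `attempts` times, sleeping `delay_s` seconds between
    tries, and return the first non-empty result. Raise RuntimeError if every
    attempt returns an empty sequence."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        items = fetch()
        if items:
            return items
        if attempt < attempts:
            sleep(delay_s)  # no sleep after the final attempt
    raise RuntimeError(f"fetch() returned no items after {attempts} attempts")
```

In the notebook this could wrap the step that builds the configs, for example `objective_configs = wait_for_nonempty(lambda: set_objectives(model_ids, objective_template))`. The callable would need to re-fetch `model_ids` on each try for the retry to have any effect. If the list is still empty at the end, the run fails with a local error rather than a 400 from the service.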

@mco-gh
Member

mco-gh commented Nov 18, 2021

Hi Ivan,

Was this observed in official? I have a pending new version in unofficial but I assume this relates to official since I think that's the only version you run your tests on.

@mco-gh
Member

mco-gh commented Nov 18, 2021

Thinking about this some more, I'm going to start with the unofficial/community version, make sure it works, verify the unit tests, and then move it to official. The problem with debugging the older version in official is that the notebook has changed a fair bit in the community version (I added explainable AI and feature attribution modeling), so I may end up chasing phantom bugs in the old version.

@ivanmkc
Contributor Author

ivanmkc commented Jan 24, 2022

@mco-gh This hasn't been failing recently, so feel free to close this.

@mco-gh
Member

mco-gh commented Jan 24, 2022 via email

@mco-gh mco-gh closed this as completed Jan 24, 2022