Incorrect persistent history server URI for Dataproc Serverless notebook #70

Open
nyoungstudios opened this issue Sep 26, 2023 · 5 comments

Comments

@nyoungstudios

Error message

Error from Gateway: [Bad Request] failure creating a backend resource: failure starting the kernel creation: failure starting the kernel creation: failure creating session: [400 Bad Request] generic::invalid_argument: com.google.cloud.hadoop.services.common.error.DataprocException: Cluster name 'projects/my-project-id/locations/my-location/clusters/my-phs-cluster-name' must conform to ^(?:/?/?dataproc\.googleapis\.com/)?projects/([^/]+)/regions/([^/]+)/clusters/([^/]+) pattern (INVALID_ARGUMENT)
. Ensure gateway url is valid and the Gateway instance is running.

The regex in the error message validates for a regions/ segment, but the cluster name string being passed uses locations/ instead.
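
As a quick local illustration (assuming GNU grep with PCRE support, i.e. grep -P, which is available on Ubuntu), the pattern quoted in the error rejects the locations/ form of the cluster name but accepts the regions/ form:

# Pattern copied verbatim from the error message
PATTERN='^(?:/?/?dataproc\.googleapis\.com/)?projects/([^/]+)/regions/([^/]+)/clusters/([^/]+)'

# No output: this is the string the plugin currently sends, and it does not match
echo 'projects/my-project-id/locations/my-location/clusters/my-phs-cluster-name' | grep -P "$PATTERN"

# Prints the line: the regions/ form matches
echo 'projects/my-project-id/regions/my-location/clusters/my-phs-cluster-name' | grep -P "$PATTERN"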

Steps

  • Open JupyterLab
  • Click "New Runtime Template" under "Dataproc Serverless Notebooks" in the Launcher
  • Select an existing "Persistent Spark History Server" from the drop-down menu (the sketch after this list shows the URI form the backend expects)
  • Click "Save" (after also filling out the other required configuration fields)
  • Back in the Launcher, create a new Dataproc Serverless notebook from the previously created template
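
For comparison (not something the plugin exposes directly; my_job.py is a placeholder, and the project, region, and cluster names are taken from the error message above), Dataproc Serverless batch submission accepts the same PHS reference through the documented --history-server-cluster flag, and it expects the regions/ form:

gcloud dataproc batches submit pyspark my_job.py \
    --project=my-project-id \
    --region=my-location \
    --history-server-cluster=projects/my-project-id/regions/my-location/clusters/my-phs-cluster-name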

Environment

# OS
Ubuntu 20.04 LTS x86_64

# Python version
Python 3.10.13

# Relevant Python dependencies
jupyterlab==4.0.6
dataproc_jupyter_plugin==0.1.9

# output of gcloud version
Google Cloud SDK 448.0.0
beta 2023.09.22
bq 2.0.98
bundled-python3-unix 3.9.16
core 2023.09.22
gsutil 5.25
@ywskycn
Contributor

ywskycn commented Oct 12, 2023

@nyoungstudios, thanks for reporting the issue. I think this has been fixed now. Could you try the latest version?
cc @ptwng @outflyer
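
A minimal upgrade sketch, assuming the plugin was installed with pip into the same environment that runs JupyterLab (restart JupyterLab after upgrading):

# Upgrade the plugin to the latest published release
pip install --upgrade dataproc-jupyter-plugin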

@nyoungstudios
Author

@ywskycn I installed the latest package, dataproc-jupyter-plugin==0.1.51, and while it no longer throws an error when creating the template, the Dataproc Serverless notebook now hangs when starting.

The SPARK HISTORY SERVER link on the Interactive Session Details page points to the correct persistent history server. However, on the Logs tab, I see messages like these:

Failed to connect to master...
Failed to send RPC RPC ... to gdpic-srvls-session-...-m/...: io.netty.channel.StacklessClosedChannelException
Failed to send ExecutorStateChanged(app-...,0,EXITED,Some(Command exited with code 143),Some(143)) to Master NettyRpcEndpointRef(spark://Master@gdpic-srvls-session-...-m.c.....internal:...), will retry (1/5)."
Failed to send RPC RPC ... to gdpic-srvls-session-...-m/...:...: io.netty.channel.StacklessClosedChannelException
Connection to master failed! Waiting for master to reconnect...
Connection to master failed! Waiting for master to reconnect...
RECEIVED SIGNAL TERM

@ywskycn
Contributor

ywskycn commented Oct 26, 2023

@nyoungstudios This looks like a network connectivity issue. Could you verify your network/firewall configs and make sure they follow https://cloud.google.com/dataproc-serverless/docs/concepts/network?
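
The linked guide requires, among other things, that the subnet used by the session allow internal VM-to-VM ingress on all ports. A minimal sketch of such a rule, assuming a hypothetical VPC network name and subnet range (check the guide for the exact requirements that apply to your setup):

# NETWORK_NAME and 10.0.0.0/16 are placeholders for your VPC network and subnet CIDR range
gcloud compute firewall-rules create allow-intra-subnet-ingress \
    --network=NETWORK_NAME \
    --direction=ingress \
    --action=allow \
    --rules=all \
    --source-ranges=10.0.0.0/16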

@nyoungstudios
Author

@ywskycn Sorry for the delay. We are using a subnet, and I believe it is set up correctly: I was able to launch a serverless notebook without the persistent history server on that subnet, as well as a serverless batch job with the same persistent history server and the same subnet.

Is there anything else I should be checking?

@ywskycn
Contributor

ywskycn commented Nov 7, 2023

@nyoungstudios To confirm: a serverless interactive session works fine without PHS, but the interactive session with PHS does not, right? If so, could you compare the detailed configurations of those two interactive sessions to rule out any other difference? To view session details, you can click through from the Jupyter Launcher page: "Dataproc Jobs and Sessions" -> "Serverless" -> "SESSIONS".
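
If it is easier to diff the two sessions outside the UI, here is a sketch using the command line, assuming your gcloud version exposes the beta Dataproc sessions surface and using placeholder project, location, and session IDs:

# List sessions, then describe the session with PHS and the one without, and diff the output
gcloud beta dataproc sessions list --project=my-project-id --location=my-location
gcloud beta dataproc sessions describe SESSION_WITH_PHS --project=my-project-id --location=my-location
gcloud beta dataproc sessions describe SESSION_WITHOUT_PHS --project=my-project-id --location=my-location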

ojarjur pushed a commit that referenced this issue May 29, 2024