A Problem #85

Open
gogogwwb opened this issue Jul 6, 2021 · 16 comments

Comments

@gogogwwb

gogogwwb commented Jul 6, 2021

No description provided.

@gaocegege
Contributor

[kubeflow][worker-0] urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

This looks like a network connection problem.

@gogogwwb
Author

gogogwwb commented Jul 6, 2021

The network should be working normally.

@gogogwwb
Author

gogogwwb commented Jul 6, 2021

I don’t know why "[kubeflow][worker-0] -----> Running code..." does not appear.

@gaocegege
Contributor

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

yann.lecun.com is blocked in China.
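
One possible workaround is to try more than one download location instead of relying on a single host. The following is a minimal sketch, not the code ciao or the example notebook actually uses: the function name `fetch_with_fallback` is hypothetical, and the second mirror URL is an assumption that should be verified before use.

```python
import urllib.request

# Assumption: the second entry is an illustrative mirror; confirm that it is
# reachable from your network and serves the same files before relying on it.
MIRRORS = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://storage.googleapis.com/cvdf-datasets/mnist/",
]


def fetch_with_fallback(filename, mirrors=MIRRORS, fetch=None, timeout=30):
    """Try each mirror in order; return (url, data) from the first that works."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    errors = []
    for base in mirrors:
        url = base + filename
        try:
            return url, fetch(url)
        except OSError as exc:
            # urllib.error.URLError and ConnectionResetError both derive
            # from OSError, so a reset or blocked host falls through here.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Calling `fetch_with_fallback("train-images-idx3-ubyte.gz")` would return the bytes from the first reachable mirror, or raise with the collected errors if none responds.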

@gogogwwb
Author

gogogwwb commented Jul 13, 2021

[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.

The pods are running normally, but the kubeflow log does not seem to be shown. I don't know why.

@gogogwwb
Author

Do I need to install kubeflow locally when using Dockerized Kernel?

@gaocegege
Contributor

I think so. You need to install at least tf-operator.
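
A quick way to check whether tf-operator's custom resource is present is to ask the cluster for the TFJob CRD. This is a minimal sketch that assumes `kubectl` is on the PATH and already points at the target cluster; the helper name `tfjob_crd_installed` is hypothetical.

```python
import subprocess


def tfjob_crd_installed(run=subprocess.run):
    """Return True if `kubectl get crd tfjobs.kubeflow.org` exits with status 0."""
    result = run(
        ["kubectl", "get", "crd", "tfjobs.kubeflow.org"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

A `False` result only says that the CRD lookup failed; it does not distinguish a missing operator from a cluster that cannot be reached.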

@gogogwwb
Author

But if this is the case, what are the advantages of CIAO? If Kubeflow is installed, why not use the built-in Jupyter Notebook?

@gogogwwb
Author

I'm a little confused.

@gaocegege
Contributor

With ciao, you can write code in the notebook and run it as a distributed job from there.

@gaocegege
Contributor

If you do not want to run distributed training jobs from the notebook, you can use Kubeflow's Jupyter notebooks or https://github.com/tkestack/elastic-jupyter-operator

@gogogwwb
Author

OK, thanks!

[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
........
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.

Is there a version issue?

@gaocegege
Contributor

I am not sure. Can you get the TFJob in the cluster?

@gogogwwb
Author

gogogwwb commented Jul 23, 2021

Yes. It seems that the log cannot be obtained. Is it a label problem?

@gaocegege
Contributor

Maybe. I am not sure about it. Are there any logs from the kernel?

@gaocegege
Contributor

It may be related to labels: TFJob's labels were changed in the latest version.
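
To check whether a label mismatch explains "cannot get the pods", one option is to query the pods with several candidate label selectors and see which one matches. This is a rough sketch, not what ciao does internally: the label keys listed are assumptions about what different operator releases may set, the helper names are hypothetical, and `kubectl` is assumed to be on the PATH.

```python
import subprocess

# Assumption: candidate label keys that different tf-operator or
# training-operator releases may put on TFJob pods. Compare against
# `kubectl get pods --show-labels` on your own cluster.
LABEL_KEYS = ["training.kubeflow.org/job-name", "job-name", "tf-job-name"]


def candidate_selectors(job_name, keys=LABEL_KEYS):
    """Build one `key=value` label selector per candidate key."""
    return [f"{key}={job_name}" for key in keys]


def find_pods(job_name, namespace="default", run=subprocess.run):
    """Return (selector, pods) for the first selector that matches any pod,
    or (None, []) if no candidate selector matches."""
    for selector in candidate_selectors(job_name):
        result = run(
            ["kubectl", "get", "pods", "-n", namespace,
             "-l", selector, "-o", "name"],
            capture_output=True,
            text=True,
        )
        pods = result.stdout.split()
        if result.returncode == 0 and pods:
            return selector, pods
    return None, []
```

For the job in the log above, `find_pods("jupyter-kernel-nxeof")` would report which selector, if any, finds the pods; if only a key the kernel does not use matches, that would point to the label change.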
