How to correctly modify OpenWhisk’s gateway blocking limit from source code (currently 60s) #5467

Open
QWQyyy opened this issue Feb 29, 2024 · 13 comments

Comments

QWQyyy commented Feb 29, 2024

Our project involves a large number of HTTP request tests, and our functions (modern machine learning workflows) run for a relatively long time. Our team therefore decided to build the OpenWhisk Docker image from source and change the 1-minute gateway blocking limit.
To that end, we made the following source code modifications:

  1. In the controller source, we edited application.conf and set request-timeout = 36000s:
    image
  2. Also in the controller source, we edited reference.conf:
    image
  3. In the common Scala module, we edited application.conf as well:
    image
    image

After these three changes, we built the Docker image with ./gradlew :core:controller:distDocker, replaced the controller image used by our Kubernetes OpenWhisk cluster, and updated values.yaml accordingly.

We verified that values.yaml contains the required settings:
image
We also modified nginx: both the default.conf inside the nginx:1.21.1 image and the read-only nginx.conf supplied through nginx-configmap:
image
image
However, when we tested with curl, the change did not fully work. The nginx gateway no longer cut the request off, but the following output was printed:
image
What else do we need to modify to get past this? Looking forward to your guidance!

QWQyyy (Author) commented Feb 29, 2024

@style95 Could you please give me some guidance?

style95 (Member) commented Feb 29, 2024

It seems you are using the API gateway. Could you first check again without it?

QWQyyy (Author) commented Feb 29, 2024

image

style95 (Member) commented Feb 29, 2024

You are supposed to be able to invoke the action with wsk.
I think that's the starting point to look into.

QWQyyy (Author) commented Feb 29, 2024

I'm sure that my gateway now allows end-to-end responses longer than 60 seconds, and I have explicitly configured the controller as well. I can't find any other place where a timeout needs to be configured. Can you give me some suggestions?

QWQyyy (Author) commented Feb 29, 2024

wsk does work, but it only lets us view what the activation record stores. We would rather serve our requests directly over HTTP through the gateway.

QWQyyy (Author) commented Feb 29, 2024

image

QWQyyy (Author) commented Feb 29, 2024

It seems that I should also pay attention to apigateway:
image

QWQyyy (Author) commented Feb 29, 2024

image
Also, how do I configure the execution time limit correctly? I set 500 minutes in values.yaml, but when I use wsk to set an action's execution time limit to 500 minutes, the invoker log prints 600s.
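
One thing worth ruling out here is a unit mismatch. The wsk CLI's `--timeout` flag takes milliseconds, so 500 minutes has to be written as 30,000,000, while 600s corresponds to only 10 minutes. The helper below is a hypothetical convenience for that arithmetic and is not part of any OpenWhisk tooling.

```python
def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to milliseconds (hypothetical helper for wsk --timeout values)."""
    return int(minutes * 60 * 1000)


def seconds_to_minutes(seconds: float) -> float:
    """Convert seconds to minutes, e.g. to read a limit printed in a log."""
    return seconds / 60


print(minutes_to_ms(500))       # 30000000
print(seconds_to_minutes(600))  # 10.0
```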

QWQyyy (Author) commented Feb 29, 2024

I am currently studying the OpenWhisk source code in depth, and I hope to make some solid changes.

style95 (Member) commented Mar 3, 2024

@QWQyyy
First, IIRC, the Kubernetes client timeout in the log above is related to pod creation.
It is not related to the execution of an activation.
The action timeout controls the execution timeout against the pod (container).

I think you need to make sure you can invoke your action successfully with the wsk action invoke command.
If you can invoke your action successfully without the API gateway, then the culprit is the API gateway.
If your action is invoked fine but the call is switched to asynchronous mode at some point (you get a 202 response), that is related to the controller configuration. If you can't invoke your action even in asynchronous mode, you may not have configured the action timeout properly.
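
The triage above can be summarized as a small decision function. This is only a sketch: the function name, its parameters, and the returned labels are hypothetical and not part of OpenWhisk; it merely encodes the three cases described in this comment.

```python
from typing import Optional


def triage(direct_blocking_status: int, async_completed: Optional[bool] = None) -> str:
    """Return the component to inspect first, following the cases above.

    direct_blocking_status: HTTP status of a blocking invoke sent WITHOUT the
        API gateway (hypothetical input, e.g. read from verbose client output).
    async_completed: whether a non-blocking invoke eventually produced a
        successful activation; None if that has not been tested yet.
    """
    if direct_blocking_status == 200:
        # The action works without the API gateway, so the gateway is the suspect.
        return "api-gateway"
    if direct_blocking_status == 202:
        # The blocking call was switched to asynchronous mode along the way.
        return "controller-configuration"
    if async_completed is False:
        # Not even an asynchronous invoke succeeds.
        return "action-timeout"
    # Anything else is not covered by the three cases above.
    return "unknown"
```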

QWQyyy (Author) commented Mar 4, 2024

Okay let's try it!

QWQyyy (Author) commented Mar 5, 2024

It is true that I can invoke my functions successfully with wsk, but only functions that finish within 60 seconds. For functions that take longer, the wsk client also returns an Oops--504 error. However, htop shows that the function code is still running, and once it finishes, wsk activation shows that the function completed correctly. This confuses me.
