Pulsar Functions lifecycle and deployment details. #20195

niclash · 2023-04-27T10:29:36Z

Hi,

I have been using Pulsar quite successfully for a bit more than a year now, and am quite happy with it.

Now, I would like to streamline my app a bit and use Pulsar Functions (on 2.11.x), but I need a little bit of guidance.

1. IIUIC, I have the option (among others) to run my (trusted) functions inside the same JVM as the Pulsar Broker itself, by choosing "Run function workers with brokers" and the "thread runtime". And if I want to run on the same VM as the broker, but in a separate OS process, I simply follow "process runtime". Is that correct? I don't need K8s to run functions (I have not set up K8s)?
2. Lifecycle of functions is very unclear to me.
   a. How many instances are created?
   b. One per request?
   c. One per key?
   d. Are instances pooled?
   e. Do the functions need to be thread-safe?
   f. Can I control it?
3. Deployment
   a. When deploying with "pulsar-admin functions create", do I need to do that on each broker instance, or just once? What happens if my Ansible tries to do that in parallel on all instances?
   b. I assume that "pulsar-admin functions update" will let all functions complete before killing the thread/process. Right?

I guess there will be more questions once I dive deeper.

TIA
Niclas

asafm (Collaborator) · 2023-04-30T13:17:19Z

I'm not a functions expert, but I can contribute a bit of what I know here.

> IIUIC, I have the option (among others) to run my (trusted) functions inside the same JVM as the Pulsar Broker itself, by choosing "Run function workers with brokers" and the "thread runtime". And if I want to run on the same VM as the broker, but in a separate OS process, I simply follow "process runtime". Is that correct? I don't need K8s to run functions (I have not set up K8s)?

Process runtime will launch a process per function instance. For example, if you specify that a function should have 3 instances, 3 processes will be launched.
You can have many Function Workers (even when they run inside the broker). One of them acts as the leader and scheduler, meaning it decides which worker should run each function instance.

If this is a production environment, it seems better to put the Function Worker and the Broker on separate machines.

> Lifecycle of functions is very unclear to me.
> a. How many instances are created?
> b. One per request?
> c. One per key?
> d. Are instances pooled?
> e. Do the functions need to be thread-safe?
> f. Can I control it?

When you deploy or update a function using REST or the admin CLI, you can specify how many instances you want for each function.
Each instance runs in its own thread, process, or pod, depending on the runtime.
Instances are not created per request or per key.
Not sure about the pooling question. Can you elaborate?

Regarding thread safety, I guess only if you share variables, but you shouldn't.

> When deploying with "pulsar-admin functions create", do I need to do that on each broker instance, or just once? What happens if my Ansible tries to do that in parallel on all instances?

From my understanding, Pulsar has a storage area where the function metadata and the JAR/NAR are stored. You should deploy a function to Pulsar only once.

> b. I assume that "pulsar-admin functions update" will let all functions complete before killing the thread/process. Right?

Complete what?

niclash (Author) · 2023-04-30T14:23:24Z

Thanks a lot for taking the time to answer.

Pooling: Basically, whether instances are dynamically created and destroyed during the lifetime, depending on load. What you write amounts to: no, it is set up statically by the user.

Thread-safety: Everyone showcases stateless functions that have no context to operate in. I don't find myself in that luxurious situation, and to set up the overall, over-arching context, it helps a lot to understand the exact behavior of the underlying framework. I don't really like the "don't worry about it" answer that some systems/frameworks give.

Deploy: The thing is, it is a lot simpler for me to let Ansible do the same thing on plenty of machines than to do it on one. Pulsar itself sits behind a firewall, so I can't reach the Pulsar APIs from my workstation, so Ansible can't execute it on localhost either.

Complete: The processing in the function, i.e. letting the function return before killing it. And then there is the immediate follow-up: what happens if the function hangs?

1 reply

asafm May 1, 2023
Collaborator

Regarding auto-scaling - there is another scheduler (WorkerService plugin) which runs the function instances on k8s: https://functionmesh.io/docs/. I think it has auto-scaling built in.

Deploy: With any database, whether SQL or NoSQL (HBase, Cassandra, for example), setting up a table needs to be done only once. The same goes for S3 buckets. I'm sure Ansible has a built-in pattern for handling "do it once".

nlu90 (Collaborator) · 2023-05-01T16:47:58Z

Running Function Worker with Pulsar Brokers is fine for testing purposes. In a production env, running Function Worker and Pulsar Broker separately is recommended so your Broker will be more stable. The ThreadRuntime will launch your function inside Function Worker's thread pool, and ProcessRuntime will launch separate JVM processes for your functions.

2.a. You can control how many instances your function has via "pulsar-admin functions create --parallelism". Once submitted, the number won't change at runtime.

2.b. You are responsible for making your function thread-safe and for not blocking in the process method.
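To make the thread-safety point concrete, here is a minimal sketch of a stateful Java function. It is not taken from the Pulsar code base. It assumes the function is written against plain java.util.function.Function, which Pulsar's Java runtime accepts as far as I know, and it assumes the shared state may be touched by more than one thread at a time (for example, several instances in one JVM under the thread runtime; whether they really share static state depends on class loading, which nothing in this thread confirms). The class name SeenCounter and the counting logic are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical example: reports how many times each input string has been seen.
// Kept package-private so the sketch compiles in any file; a deployed
// function class would normally be public.
final class SeenCounter implements Function<String, String> {

    // Static on purpose: this models state that several instances in the
    // same JVM could reach at once (an assumption, see the text above).
    private static final Map<String, Long> COUNTS = new ConcurrentHashMap<>();

    @Override
    public String apply(String input) {
        // merge() on ConcurrentHashMap is atomic per key, so two concurrent
        // callers cannot lose an increment.
        long seen = COUNTS.merge(input, 1L, Long::sum);
        return input + ":" + seen;
    }
}
```

The design choice is to push every mutation through a single atomic call (merge) instead of a separate read followed by a write, which would need explicit locking. The map lives only in memory, so the counts are lost whenever the instance is restarted.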

3.a. You only need to send the request once. If the same request is submitted multiple times, the later requests will be rejected with an Already Exists error.
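For the "Ansible on all hosts in parallel" case, the behaviour in 3.a can be pictured as first-writer-wins. The following is a toy model only: ToyFunctionRegistry, its create method, and the function name used with it are hypothetical and are not Pulsar classes or APIs. It just shows why exactly one of several simultaneous create attempts succeeds while the rest are rejected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for the function metadata store; NOT a Pulsar class.
final class ToyFunctionRegistry {

    private final Map<String, String> functions = new ConcurrentHashMap<>();

    /**
     * Registers a function under the given name.
     * Returns true for the first caller and false ("already exists")
     * for every later caller that uses the same name.
     */
    boolean create(String name, String packagePath) {
        // putIfAbsent is atomic: among concurrent callers, exactly one sees null.
        return functions.putIfAbsent(name, packagePath) == null;
    }
}
```

In practice this means a playbook that runs the create step everywhere would have to tolerate the rejected attempts, or restrict the step to a single host.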

3.b. The Function Worker will try to shut down the instance gracefully, with a 10-second timeout. After 10 seconds, it will forcibly terminate the process. One thing to note is that, as long as the subscription is not cleaned up, the newly updated function instance will continue processing messages from where the old instances stopped.
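The "graceful first, forced after a timeout" behaviour in 3.b follows a common shutdown pattern. Below is a generic sketch of that pattern using a plain ExecutorService. It is an illustration under my own assumptions, not the Function Worker's actual implementation; GracefulStop and stop are hypothetical names, and the only detail taken from this thread is the idea of a fixed grace period (10 seconds above).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Generic illustration of a two-phase stop; NOT Pulsar code.
final class GracefulStop {

    /**
     * Stops accepting new work, waits up to the grace period for running
     * work to finish, then interrupts whatever is still running.
     * Returns true only if everything finished within the grace period.
     */
    static boolean stop(ExecutorService executor, long grace, TimeUnit unit) {
        executor.shutdown(); // phase 1: no new tasks, running tasks continue
        try {
            if (executor.awaitTermination(grace, unit)) {
                return true; // all work returned in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
        }
        executor.shutdownNow(); // phase 2: interrupt tasks that are still running
        return false;
    }
}
```

A call such as GracefulStop.stop(executor, 10, TimeUnit.SECONDS) would mirror the timeout mentioned above. Note that shutdownNow only interrupts threads: code that ignores interruption keeps running inside the JVM, whereas a separate OS process can be killed from the outside, which is one practical difference when a function hangs.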

2 replies

niclash May 2, 2023
Author

Ok, thanks. That helps a lot.

Does each function get its own OS process (times the parallelism)? Or can functions share OS processes?

nlu90 May 10, 2023
Collaborator

If you use ThreadRuntime, they all run within the Function Worker process.
If you use ProcessRuntime, each function's instance gets its own process.
If you use KubernetesRuntime, each function's instance gets its own pod.
