Orchestration started but not recorded in task hub #126

Closed
markheath opened this Issue Jan 6, 2018 · 13 comments

Contributor

markheath commented Jan 6, 2018

I tried creating some durable functions using the templates in the portal. I created a new function app and selected the 2.0.11415.0 (beta) runtime (version 1 didn't offer the durable functions templates).

I then made an instance of each durable function template - the HTTP starter function, the orchestrator and the "hello" activity.

I was able to successfully call my starter function which returned a 202 and indicated a new orchestration had been started and gave me the following JSON back from starter.CreateCheckStatusResponse (secure codes redacted):

{"id":"6b749f72e73443d0bc84d7bb53a6a79f",
  "statusQueryGetUri":"https://durable-functions-test.azurewebsites.net/runtime/webhooks/DurableTaskExtension/instances/6b749f72e73443d0bc84d7bb53a6a79f?taskHub=DurableFunctionsHub&connection=Storage&code={code}",
  "sendEventPostUri":"https://durable-functions-test.azurewebsites.net/runtime/webhooks/DurableTaskExtension/instances/6b749f72e73443d0bc84d7bb53a6a79f/raiseEvent/{eventName}?taskHub=DurableFunctionsHub&connection=Storage&code={code}",
  "terminatePostUri":"https://durable-functions-test.azurewebsites.net/runtime/webhooks/DurableTaskExtension/instances/6b749f72e73443d0bc84d7bb53a6a79f/terminate?reason={text}&taskHub=DurableFunctionsHub&connection=Storage&code={code}"}

However, when I tried to call statusQueryGetUri, I just got a 404 back.

I checked the DurableFunctionsHubHistory table in Azure storage and it was empty.

However, when I ran my HTTP starter function again and got a new id, this time rows did appear in DurableFunctionsHubHistory and I could call the statusQueryGetUri to see the orchestration had completed with the expected output.

However, I then attempted to run the starter function again and once again it returned an orchestration id that didn't exist in table storage. I kept running it and the same thing happened multiple times.

Eventually after multiple attempts, I got a few that worked, but also got one stuck in a running state:

{
  "runtimeStatus": "Running",
  "input": null,
  "output": null,
  "createdTime": "2018-01-06T12:07:56Z",
  "lastUpdatedTime": "2018-01-06T12:08:00Z"
}
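A status payload like this one can be checked for staleness by comparing lastUpdatedTime with the current time. The sketch below is a hypothetical helper, not part of Durable Functions: the function name `looks_stuck` and the five-minute threshold are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def looks_stuck(status, now, max_idle=timedelta(minutes=5)):
    """Return True if the payload says Running but has not been updated recently.

    `status` is the parsed JSON returned by statusQueryGetUri.
    `max_idle` is an arbitrary threshold (assumption), not a value
    defined by Durable Functions.
    """
    if status.get("runtimeStatus") != "Running":
        return False
    # Timestamps in the payload look like "2018-01-06T12:08:00Z" (UTC).
    last_updated = datetime.strptime(
        status["lastUpdatedTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now - last_updated > max_idle


# The payload quoted in this report.
status = {
    "runtimeStatus": "Running",
    "input": None,
    "output": None,
    "createdTime": "2018-01-06T12:07:56Z",
    "lastUpdatedTime": "2018-01-06T12:08:00Z",
}

print(looks_stuck(status, datetime.now(timezone.utc)))
```

An orchestration that is legitimately waiting on a long activity would also trip this check, so it is only a hint that an instance deserves a closer look.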

So it seems quite unreliable in its current state. Is this a known issue?

vandersmissenc Jan 8, 2018

I am also seeing the exact same behavior as described above.

cgillum Jan 8, 2018

Collaborator

Thanks for reporting this. I think it might be some race condition in Functions 2.0 that causes this, though I've had a hard time reproducing it locally. I have not seen any such issue with Functions 1.0, but if you're using the portal then Functions 2.0 is your only option. I'll keep investigating when I get a chance - obviously not a great first impression for folks trying this out from the portal (and thanks for making a note about it in your blog post). :)

One other thing to be aware of is that the portal installs the 1.0.0-beta version of the extension, which is outdated. 1.1.0-beta2 is the latest version, though it's not yet clear to me whether the behavior exists in one or both versions.

vandersmissenc Jan 8, 2018

I created a precompiled function with version 1.1.0-beta2 of the durable task extension, deployed it to Azure Functions on runtime version 1.0.11388.0, and saw the same results. I use a logic app to call the function. The function begins execution, the logic app attempts to hit the statusQueryGetUri, and a 404 occurs. Checking the DurableFunctionsHubHistory storage table, there are no entries for the matching id. After the long-running task completes, I see an entry get created in the storage table.

cgillum Jan 8, 2018

Collaborator

@vandersmissenc so in this case, you're seeing the orchestration is running, but you're just not getting back a status result until it completes?

vandersmissenc Jan 9, 2018

When running locally, everything works as expected: I immediately receive a 202, and when I check the status, the orchestration is in a running state. When deployed and triggered via a logic app, I see the httpstart logs in Azure Functions, but nothing is inserted into the storage table until the task completes. If I check the status before it is complete, I receive a 404.

cgillum Jan 9, 2018

Collaborator

What is your orchestrator function doing? Can you share an instance ID with me and the region in which you observed this?

vandersmissenc Jan 9, 2018

My orchestrator is almost identical to https://github.com/Azure/azure-functions-durable-extension/blob/master/samples/precompiled/HttpStart.cs except that I read a specific type from the request content instead of an object. I just ran it and the instance ID was 7b8b6634-c30c-4be0-8c9f-5bf2cff58809 in East US 2. The orchestrator started at 2018-01-09T13:37:00.899 and completed at 2018-01-09T13:37:00.930 with a 202 response. I attempted to hit the statusQueryGetUri and received a 404. The function that it kicks off started at 2018-01-09T13:37:08.659 and completed at 2018-01-09T13:44:19.107 after which time I was able to hit the statusQueryGetUri and receive a 200 response.
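The four timestamps quoted above can be turned into intervals to see where the time went. This is a small sketch that uses only those values; the variable names are assumptions that follow the wording of the comment.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"


def parse(ts):
    """Parse a timestamp such as 2018-01-09T13:37:00.899."""
    return datetime.strptime(ts, FMT)


# Timestamps copied from the report above.
orchestrator_begin = parse("2018-01-09T13:37:00.899")
orchestrator_end = parse("2018-01-09T13:37:00.930")
function_begin = parse("2018-01-09T13:37:08.659")
function_end = parse("2018-01-09T13:44:19.107")

# Time taken to return the 202 response.
orchestrator_seconds = (orchestrator_end - orchestrator_begin).total_seconds()
# Gap between the 202 response and the start of the long-running function.
gap_seconds = (function_begin - orchestrator_end).total_seconds()
# Run time of the long-running function itself.
function_seconds = (function_end - function_begin).total_seconds()

print(f"202 returned after  {orchestrator_seconds:.3f} s")  # 0.031 s
print(f"gap before function {gap_seconds:.3f} s")           # 7.729 s
print(f"function run time   {function_seconds:.3f} s")      # 430.448 s
```

Per this report, statusQueryGetUri returned a 404 until the function finished, so the 404 window covered roughly the 7.7 s gap plus the 430.4 s run, a little over seven minutes.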

cgillum Jan 14, 2018

Collaborator

@vandersmissenc thanks for the details. This sounds like a different issue, though, likely with a different root cause. Would you mind opening a separate GitHub issue with this information so that we don't forget to follow up on this?

tohling Jan 31, 2018

Member

@markheath, I created a setup using the same templates from Portal with the following Functions:

  1. HttpStarterFunction (HTTP-triggered starter function that calls the orchestrator function)
  2. OrchestratorFunction (orchestrator function)
  3. Hello (activity function)

Then, I wrote a simple C# console app to trigger the HttpStarterFunction 5000 times and maintained some book-keeping to ensure that all orchestration ids were accounted for.
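The console app itself is not shown in this thread. The following is a minimal sketch, in Python rather than C#, of what such book-keeping could look like. `start_orchestration` and `get_status` are hypothetical callables standing in for the HTTP calls to the starter function and to statusQueryGetUri; in-memory fakes are used here so the sketch runs without a function app.

```python
from collections import Counter


def run_batch(start_orchestration, get_status, count):
    """Start `count` orchestrations and tally the status seen for each id.

    start_orchestration() -> instance id (str)
    get_status(instance_id) -> runtimeStatus (str), or None if the id is
    unknown (the 404 case). Both callables are placeholders (assumptions)
    for real HTTP requests.
    """
    ids = [start_orchestration() for _ in range(count)]
    tally = Counter()
    missing = []
    for instance_id in ids:
        status = get_status(instance_id)
        if status is None:
            missing.append(instance_id)  # started, but never recorded
        else:
            tally[status] += 1
    return tally, missing


# In-memory stand-ins for the function app (illustration only).
_store = {}


def fake_start():
    instance_id = f"instance-{len(_store)}"
    _store[instance_id] = "Completed"
    return instance_id


tally, missing = run_batch(fake_start, _store.get, count=5000)
print(dict(tally), len(missing))
```

Any id that ends up in `missing` corresponds to the symptom in the original report: an orchestration that was started but cannot be found afterwards.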

Admittedly, the sub-second initial poll to statusQueryGetUri would sometimes return a 404, but subsequent delayed retries would eventually return a 202, after which continued polling would resolve to a 200 with a runtime status of Completed. There is another open issue, #143, specifically related to this that we will fix soon.
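Given the sequence described here (an early 404, then 202 while the orchestration runs, then 200 on completion), a client can treat the first few 404s as transient rather than fatal. This is a sketch under stated assumptions: `fetch(url)` is a hypothetical callable returning `(status_code, parsed_body)`, and the retry limits and delay are arbitrary values, not recommendations from the maintainers.

```python
import time


def wait_for_completion(fetch, status_url, max_not_found=5, max_polls=100,
                        delay_seconds=1.0, sleep=time.sleep):
    """Poll status_url until it returns 200, tolerating a few early 404s.

    fetch(url) -> (http_status, parsed_json_or_None) is a placeholder
    (assumption) for a real HTTP GET against statusQueryGetUri.
    """
    not_found = 0
    for _ in range(max_polls):
        code, body = fetch(status_url)
        if code == 200:
            return body  # treated as terminal, following the description above
        if code == 404:
            not_found += 1
            if not_found > max_not_found:
                raise LookupError("instance still not found after retries")
        elif code != 202:
            raise RuntimeError(f"unexpected status code {code}")
        sleep(delay_seconds)
    raise TimeoutError("orchestration did not complete in time")


# Scripted responses that mimic the sequence described above.
responses = iter([
    (404, None),
    (202, {"runtimeStatus": "Running"}),
    (200, {"runtimeStatus": "Completed"}),
])
result = wait_for_completion(
    lambda url: next(responses),
    "https://example.invalid/status",  # placeholder URL, never contacted
    sleep=lambda seconds: None,        # skip real waiting in the demo
)
print(result["runtimeStatus"])
```

Capping the number of tolerated 404s keeps the client from waiting forever on an instance that was never recorded.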

I was not able to repro the situation where an orchestration would stay stuck in a Running state indefinitely. Are you still experiencing this issue?

markheath Feb 1, 2018

Contributor

@tohling I have re-tested the original function app, and it seems reliable now: I can query the status of every orchestration I started. My function app is still reporting a runtime of 2.0.11415.0 (beta) and is still using 1.0.0-beta of the durable functions extension, so I don't know what might have changed since.

soninaren Feb 1, 2018

Ignore the SO post; it was an error in user code.

tohling Feb 1, 2018

Member

@markheath, thanks for the quick response! I will close this issue for now, but feel free to reopen it if you experience this again.

tohling closed this Feb 1, 2018
