fix(ingest/okta): Removed code closing okta's event_loop #8675

Merged
hsheth2 merged 3 commits into datahub-project:master on Aug 29, 2023


skrydal
Contributor

@skrydal skrydal commented Aug 19, 2023

After the previous change #8637, the okta connector could start but not finish properly, because the line closing the event loop raised the exception shown below. I am a little confused as to why it started failing now: we previously ran it with version 0.9.6.1, released in January 2023, and everything was fine, even though the event_loop code was present both in ingest_cli.py and in the okta ingestor. My conclusion is that this change, made a month ago, might have caused the issue to appear:
4fb77e4#diff-97f18760199a7ea8722507b31d24f6520b4c1bbf9145d3b3ad24748ecfa831adR125
(making the function async)
What I wonder, though, is why nobody else seems to have experienced the problem, since it causes the stateful okta ingestor to always fail in the environment where I tested it. Surely somebody else must be using this connector in the newest version.
This PR removes the line closing the loop, which lets the okta ingestor finish successfully after ingesting all the data. It definitely makes the code messier, but a proper refactoring here looks like it would take quite a while. Let me know what you think about this PR.

[2023-08-17 08:26:04,191] ERROR    {datahub.entrypoints:199} - Command failed: Cannot close a running event loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 182, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/usr/local/lib/python3.10/asyncio/futures.py", line 285, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 367, in run
    for wu in itertools.islice(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 119, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 143, in auto_workunit_reporter
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 208, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 346, in _batch_workunits_by_urn
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 156, in auto_materialize_referenced_tags
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/identity/okta.py", line 411, in get_workunits_internal
    event_loop.close()
  File "/usr/local/lib/python3.10/asyncio/unix_events.py", line 68, in close
    super().close()
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 84, in close
    raise RuntimeError("Cannot close a running event loop")
RuntimeError: Cannot close a running event loop
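
For readers who want to see the failure in isolation, here is a minimal sketch of what the traceback appears to show: synchronous code, called from inside a coroutine, closes the loop that is currently running it. The function names are hypothetical stand-ins and are not taken from the DataHub code base.

```python
import asyncio


def sync_step_that_closes_loop() -> None:
    # Hypothetical stand-in for synchronous source code that fetches the
    # event loop and closes it when done. Called from inside a coroutine,
    # asyncio.get_event_loop() returns the loop that is currently running.
    loop = asyncio.get_event_loop()
    loop.close()  # raises RuntimeError because the loop is still running


async def run_pipeline() -> str:
    # Hypothetical stand-in for an async wrapper around a synchronous run.
    try:
        sync_step_that_closes_loop()
    except RuntimeError as e:
        return str(e)
    return "closed without error"


def reproduce() -> str:
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(run_pipeline())
    finally:
        # Closing is fine here: the loop has stopped running by now.
        loop.close()


if __name__ == "__main__":
    print(reproduce())
```

This would be consistent with the failure appearing only once the calling function became async, since before that the loop would not have been running at the point of the close() call.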

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Aug 19, 2023
@maggiehays maggiehays added the community-contribution PR or Issue raised by member(s) of DataHub Community label Aug 21, 2023
Collaborator

@hsheth2 hsheth2 left a comment


This is kinda tricky

I think we should only be closing the event loop if we created it here, via event_loop = asyncio.new_event_loop(); if we got it from get_event_loop, then we should not try to close it

That way, if people run the pipeline using Pipeline.create(...); pipeline.run() it also works correctly and doesn't leak the event loop
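
The ownership rule described above could be sketched roughly as follows. This is an illustration under assumptions, not the code that was merged: the helper names are hypothetical, and it uses asyncio.get_running_loop instead of get_event_loop because the former behaves the same across Python versions when no loop exists.

```python
import asyncio
from typing import Tuple


def get_or_create_loop() -> Tuple[asyncio.AbstractEventLoop, bool]:
    """Return (loop, created_here). Hypothetical helper."""
    try:
        # Succeeds only when called from code that a loop is already running.
        return asyncio.get_running_loop(), False
    except RuntimeError:
        # No running loop: create one, and remember that we own it.
        return asyncio.new_event_loop(), True


def run_with_loop() -> Tuple[bool, bool]:
    """Hypothetical stand-in for the source's work-unit generation.

    Returns (created_here, loop_is_closed_afterwards).
    """
    loop, created_here = get_or_create_loop()
    try:
        pass  # the source's actual work on the loop would go here
    finally:
        if created_here:
            # Only close a loop this function created; a borrowed loop
            # belongs to the caller and may still be running.
            loop.close()
    return created_here, loop.is_closed()


async def called_from_async_context() -> Tuple[bool, bool]:
    # Mimics a CLI that wraps the synchronous run in a coroutine.
    return run_with_loop()
```

Called directly, run_with_loop() creates and closes its own loop, so nothing leaks. Called from a coroutine, it leaves the caller's loop alone.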

@skrydal
Contributor Author

skrydal commented Aug 25, 2023

> This is kinda tricky
>
> I think we should only be closing the event loop if we created it here, via event_loop = asyncio.new_event_loop(); if we got it from get_event_loop, then we should not try to close it
>
> That way, if people run the pipeline using Pipeline.create(...); pipeline.run() it also works correctly and doesn't leak the event loop

I agree, it's not a perfect solution, but rather the one we used to make the okta ingestor run fine as a cronjob in our environment, so I thought it might be worth sharing here.
A proper solution would require refactoring to resolve the singleton-like problem of the event_loop object...
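
One possible direction for such a refactoring, sketched here purely as an illustration and not as what this PR does, is to stop sharing a loop at all: run each coroutine on a fresh loop in a worker thread, so the caller's loop is never touched. The helper run_coro_isolated and the coroutine fetch_user_count are hypothetical, and the approach assumes the coroutine does not depend on objects bound to another event loop.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_isolated(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion on a private event loop.

    asyncio.run creates a new loop in the worker thread and closes it
    when the coroutine finishes, so no loop is shared with the caller.
    The calling thread blocks until the result is available.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fetch_user_count() -> int:
    # Hypothetical stand-in for an async SDK call.
    await asyncio.sleep(0)
    return 42


async def caller_with_running_loop() -> int:
    # Works even when the caller is itself running inside an event loop.
    return run_coro_isolated(fetch_user_count())
```

The trade-off is one extra thread per call and a blocked caller while it runs, in exchange for never having to decide who owns the loop.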

@sgomezvillamor
Contributor

@hsheth2 What about this commit: FundingCircle@dcd9e0f? Would it settle the discussion and fix the issue?

@hsheth2
Collaborator

hsheth2 commented Aug 28, 2023

@sgomezvillamor yeah that looks like it should do the trick

@sgomezvillamor
Contributor

@hsheth2 could you review and merge this one?

@hsheth2 hsheth2 merged commit 2776903 into datahub-project:master Aug 29, 2023
51 checks passed
@hsheth2
Collaborator

hsheth2 commented Aug 29, 2023

@sgomezvillamor merged!
