Subbasin Timing Out for Schuylkill HUC-8 #3446

Closed
rajadain opened this issue Dec 27, 2021 · 3 comments
Labels: bug, PA DEP (Funding Source: Pennsylvania Department of Environment Protection)

Comments

@rajadain
Member

rajadain commented Dec 27, 2021

Subbasin is timing out for the Schuylkill HUC-8 on staging, which has the NLCD2019 and high-resolution streams data:

image

It works correctly on production, and works for smaller shapes like the Lower Schuylkill HUC-10 on staging.

See #2835 for similar work in the past.

@rajadain rajadain added bug PA DEP Funding Source: Pennsylvania Department of Environment Protection labels Dec 27, 2021
@rajadain rajadain mentioned this issue Dec 27, 2021
@rajadain rajadain added the + label Dec 28, 2021
@rajadain rajadain self-assigned this Dec 28, 2021
rajadain added a commit to WikiWatershed/mmw-geoprocessing that referenced this issue Jan 4, 2022
@rajadain
Member Author

rajadain commented Jan 4, 2022

While the increased timeout in WikiWatershed/mmw-geoprocessing#98 gets us part of the way, we're not done yet. Now seeing chord_unlock loops when running GWLF-E:

[2022-01-04 14:47:42,228: INFO/MainProcess] Task apps.modeling.geoprocessing.multi[57048cbc-41c0-4fd3-8542-1aced8880850] received
[2022-01-04 14:47:42,631: INFO/MainProcess] Task apps.modeling.mapshed.tasks.collect_subbasin[06fd646f-e3f6-43a1-966f-eebd741b4630] received
[2022-01-04 14:47:54,519: INFO/MainProcess] Task apps.modeling.tasks.subbasin_results_to_dict[3f8585e1-339e-4884-874b-b31dc0762a2f] received
[2022-01-04 14:47:57,573: INFO/MainProcess] Task apps.core.tasks.save_job_result[a1440582-b1ac-4495-a695-6e67f44a0e01] received
[2022-01-04 14:48:03,618: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:03,621: INFO/MainProcess] Task apps.modeling.tasks.run_subbasin_gwlfe_chunks[f46c977c-19b6-4a5a-a12e-1e458118cf03] received
[2022-01-04 14:48:03,641: INFO/MainProcess] Task apps.modeling.tasks.run_subbasin_gwlfe_chunks[98658202-179b-458b-a119-c255c91ed30f] received
[2022-01-04 14:48:04,552: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:04,556: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:04,941: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:04,959: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:05,069: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:05,090: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:05,398: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:05,414: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:05,486: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:05,502: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:05,822: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:05,848: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:05,920: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:05,940: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:06,256: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:06,273: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:06,344: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:06,361: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:06,687: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:06,705: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:06,775: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:06,793: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:07,112: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:07,138: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:07,216: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:07,225: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:07,552: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:07,555: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:07,639: DEBUG/ForkPoolWorker-1] Running model...
[2022-01-04 14:48:07,643: DEBUG/ForkPoolWorker-2] Running model...
[2022-01-04 14:48:07,975: DEBUG/ForkPoolWorker-1] Model run complete for 30 years of data.
[2022-01-04 14:48:07,998: DEBUG/ForkPoolWorker-2] Model run complete for 30 years of data.
[2022-01-04 14:48:08,017: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:09,185: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:10,308: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:11,332: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:12,454: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:13,504: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:14,547: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:15,662: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:16,701: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:17,745: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:18,899: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:19,928: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:21,075: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:22,133: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:23,264: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:24,279: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:25,319: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:26,446: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:27,467: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:28,595: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:29,634: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:31,279: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:32,298: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:33,320: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:34,483: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:35,539: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:36,654: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:37,687: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
[2022-01-04 14:48:38,699: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received
^C
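
For triaging logs like the one above, here is a minimal sketch (not part of the MMW codebase) that counts how often each task ID shows up as `received` and over what time span. The regex assumes the worker log format seen in this comment; `summarize_received` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches worker lines of the form "[<timestamp>: INFO/MainProcess] Task <name>[<uuid>] received".
RECEIVED = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+): INFO/MainProcess\] "
    r"Task (?P<name>[\w.]+)\[(?P<id>[0-9a-f-]+)\] received$"
)


def summarize_received(lines):
    """Count 'received' events per (task name, task id) and record first/last timestamps."""
    counts = Counter()
    span = {}
    for line in lines:
        m = RECEIVED.match(line.strip())
        if not m:
            continue  # skip DEBUG lines and anything else that is not a 'received' event
        key = (m["name"], m["id"])
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        counts[key] += 1
        first, _ = span.get(key, (ts, ts))
        span[key] = (first, ts)
    return counts, span


# A few lines copied from the log above.
sample = [
    "[2022-01-04 14:48:03,618: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received",
    "[2022-01-04 14:48:03,621: INFO/MainProcess] Task apps.modeling.tasks.run_subbasin_gwlfe_chunks[f46c977c-19b6-4a5a-a12e-1e458118cf03] received",
    "[2022-01-04 14:48:04,552: DEBUG/ForkPoolWorker-2] Running model...",
    "[2022-01-04 14:48:08,017: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received",
    "[2022-01-04 14:48:09,185: INFO/MainProcess] Task celery.chord_unlock[cffdccbf-1a47-4686-839e-e3c49a503e27] received",
]

counts, span = summarize_received(sample)
for (name, task_id), n in counts.most_common():
    first, last = span[(name, task_id)]
    print(f"{name}[{task_id}] received {n}x over {(last - first).total_seconds():.1f}s")
```

A task ID that is received dozens of times at roughly one-second intervals, like `celery.chord_unlock` here, is the pattern to look for.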

rajadain added a commit that referenced this issue Jan 4, 2022
The manner in which the chord was written was causing chord_unlock
loops in large shapes, see #3446 (comment).

By writing it in a more idiomatic way, as other chains are written,
we fix the chord issues.
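
For context on why a chord can loop like this: on result backends without native chord support, Celery's documented default is a recurring `celery.chord_unlock` task that polls the header group about once per second and calls the callback once every header task is ready. The sketch below is a stdlib-only simulation of that polling. It is not Celery code and not the actual change in this commit; `run_chunk`, `collect`, and `chord_with_polling` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_chunk(chunk):
    """Stand-in for one header task: pretend to run the model for each sub-basin."""
    return [f"result-{huc12}" for huc12 in chunk]


def collect(results):
    """Stand-in for the chord callback: flatten the per-chunk results."""
    return [item for chunk in results for item in chunk]


def chord_with_polling(chunks, interval=0.01, max_polls=500):
    """Submit the header tasks, poll until all are done, then run the callback.

    Returns the callback result and the number of polls that found the header not ready.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        header = [pool.submit(run_chunk, c) for c in chunks]
        polls = 0
        while not all(f.done() for f in header):
            polls += 1
            if polls >= max_polls:
                # Without a cap like this, a header that never reports ready polls forever.
                raise TimeoutError("header group never became ready")
            time.sleep(interval)
        return collect([f.result() for f in header]), polls


results, polls = chord_with_polling([["huc12_a", "huc12_b"], ["huc12_c"]])
print(results, polls)
```

If one header result never becomes ready, the loop keeps polling, which is consistent with the repeated `chord_unlock` lines in the log referenced above.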
@rajadain rajadain removed the + label Jan 6, 2022
rajadain added a commit that referenced this issue Jan 7, 2022
@rajadain
Member Author

Still seeing timeouts on staging:

==> /var/log/celery/Green-worker0.log <==
[2022-01-10 17:32:07,107: ERROR/ForkPoolWorker-1] Task caf6a9f9-2536-4582-815d-a07ee8b1c15c run from job 142429 raised exception: [convert_data] huc12__54600 Geoprocessing service timed out.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/app/apps/modeling/mapshed/tasks.py", line 562, in collect_subbasin
    return [
  File "/opt/app/apps/modeling/mapshed/tasks.py", line 563, in <listcomp>
    collect_data(convert_data(payload, wkaoi), aoi, watershed_id,
  File "/usr/local/lib/python3.8/dist-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 735, in __protected_call__
    return orig(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/celery/app/task.py", line 392, in __call__
    return self.run(*args, **kwargs)
  File "/opt/app/apps/modeling/mapshed/tasks.py", line 513, in convert_data
    raise Exception(
Exception: [convert_data] huc12__54600 Geoprocessing service timed out.
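
To illustrate the failure mode in this traceback, here is a minimal sketch of a call that raises a tagged exception when it exceeds a deadline. It is an assumption-laden stand-in, not how `convert_data` is actually implemented: `call_with_deadline` and `slow_geoprocessing` are hypothetical names, and only the message format is taken from the log above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, wkaoi, timeout_s):
    """Run fn() in a worker thread and raise a tagged exception if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Message format copied from the log; the tag identifies which shape timed out.
        raise Exception(
            f"[convert_data] {wkaoi} Geoprocessing service timed out."
        ) from None
    finally:
        # Do not block on the still-running call; the thread finishes in the background.
        pool.shutdown(wait=False)


def slow_geoprocessing():
    """Hypothetical stand-in for a geoprocessing request that takes too long."""
    time.sleep(0.3)
    return {"ok": True}


try:
    call_with_deadline(slow_geoprocessing, "huc12__54600", timeout_s=0.05)
except Exception as exc:
    print(exc)
```

Because `collect_subbasin` builds its results in a single list comprehension, one timed-out HUC-12 raising like this is enough to fail the whole task.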

@rajadain
Member Author

As of #3459, this is now working:

image.png
