Search before asking
What happened
This is a follow-up to the discussion in #8576
When running a pipeline on a project that includes a very large GitHub repository (5+ years of history, 100s of commits weekly), the github_graphql task for collecting job runs hangs for days. The logs show that the process eventually fails, likely due to a stream error: CANCEL received from the peer (GitHub's servers).
As identified by @klesh in the original thread, this is likely caused by the API response body size exceeding a limit on GitHub's end, which leads to the server terminating the connection.
Errors in the logs:
time="2025-10-15 13:01:52" level=warning msg=" [pipeline service] [pipeline #63] [task #16353] retry #1 graphql calling after 120s\n\tcaused by: stream error: stream ID 1; CANCEL; received from peer"
time="2025-10-15 13:04:03" level=warning msg=" [pipeline service] [pipeline #63] [task #16353] retry #2 graphql calling after 120s\n\tcaused by: non-200 OK status code: 502 Bad Gateway body: \"<html>\\r\\n<head><title>502 Bad Gateway</title></head>\\r\\n<body>\\r\\n<center><h1>502 Bad Gateway</h1></center>\\r\\n<hr><center>nginx</center>\\r\\n</body>\\r\\n</html>\\r\\n\""
What do you expect to happen
The pipeline should handle large API responses gracefully: either complete the data collection successfully or fail with a clear error message about exceeding API limits, rather than hanging indefinitely.
How to reproduce
- Configure a DevLake project with a connection to a very large GitHub repository.
- Create and run a new pipeline that includes collecting GitHub Actions data.
- Monitor the devlake container logs.
- Observe that the pipeline hangs on the github_graphql task, specifically "Collect Job Runs".
- After a long period, the errors shown above appear in the logs.
Anything else
A new snapshot log file is attached:
task-16353-2-1-github_graphql.log
Version
v1.0.3-beta6@44f2db2
Are you willing to submit PR?
Code of Conduct