Error download large LAR files #801

Open
meissadia opened this issue Dec 21, 2020 · 4 comments · May be fixed by #1739
Labels: Backlog, blocked (Waiting on other work to be completed.), bug (Something isn't working)

Comments

@meissadia
Contributor

Similar issues:
cfpb/hmda-data-browser#62
cfpb/hmda-data-browser#83

meissadia created this issue from a note in Sprint 82 (To do) Dec 21, 2020
meissadia added the Backlog and bug labels and removed the Backlog label Dec 21, 2020
meissadia self-assigned this Dec 21, 2020
meissadia moved this from To do to In progress in Sprint 82 Dec 21, 2020
@meissadia
Contributor Author

meissadia commented Dec 22, 2020

Downloading

  • Chrome fails after 20 minutes, 150MB (unclear error)
  • Safari fails after 40 minutes, 1.43GB (memory error)

Alternate fetch methods

  • Anchor method used for the Data Browser fails because these endpoints require authentication (a plain anchor link cannot send the Authorization header)
  • Streams method fails (premature done signal)

Need to save the stream?
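One way to try the "save the stream" idea is to read the response body ourselves and count bytes, so a premature done signal is at least detectable instead of silently producing a short file. This is a minimal sketch, assuming the caller passes `response.body` and the raw `Content-Length` header; `readBodyWithCheck` and `parseContentLength` are hypothetical names, not part of the HMDA frontend. Collecting chunks in an array still keeps the whole file in memory, so on its own it would not avoid the Safari memory error.

```typescript
// Sketch only: the names below are hypothetical, not from the HMDA frontend.
// Reads a fetch() body chunk by chunk and records how many bytes arrived before "done".

interface BodyReadResult {
  chunks: Uint8Array[]; // still held in memory, like response.blob()
  received: number; // bytes read before the stream reported done
  expected: number | null; // parsed Content-Length, or null if absent/invalid
  complete: boolean | null; // null when there is nothing to compare against
}

function parseContentLength(header: string | null): number | null {
  if (header === null || header.trim() === "") return null;
  const n = Number(header);
  return Number.isSafeInteger(n) && n >= 0 ? n : null;
}

async function readBodyWithCheck(
  body: ReadableStream<Uint8Array>,
  contentLength: string | null,
): Promise<BodyReadResult> {
  const expected = parseContentLength(contentLength);
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let received = 0;
  try {
    for (;;) {
      // A dropped connection may reject here instead of returning done early;
      // that error propagates to the caller.
      const result = await reader.read();
      if (result.done) break;
      chunks.push(result.value);
      received += result.value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  // Caveat: with Content-Encoding (e.g. gzip), Content-Length counts encoded bytes
  // while the stream yields decoded ones, so `complete` is only meaningful uncompressed.
  return {
    chunks,
    received,
    expected,
    complete: expected === null ? null : received === expected,
  };
}
```

In the browser this would be called after an authenticated `fetch`, e.g. `readBodyWithCheck(res.body!, res.headers.get("Content-Length"))`; a `complete === false` result would confirm the early termination seen above.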

@meissadia
Contributor Author

Tested again on 12/29

  • Anchor method failed with 'File not found'.
  • Basic fetch terminated prematurely without error.
  • Stream method received a premature done signal.

@meissadia
Contributor Author

meissadia commented Dec 30, 2020

12/30

  • Doubled nginx config limits
    client_body_buffer_size  32k;
    client_header_buffer_size 2k;
    client_max_body_size 10m;
    large_client_header_buffers 4 16k;
    client_body_timeout 120s;
    client_header_timeout 120s;
    send_timeout 120s;
    
  • Used blob format for API response

I'm still seeing the same early termination for large files.

I also see the same behavior via Postman. When doing a "Send and Download", I have not been able to download more than 230MB.

meissadia removed this from In progress in Sprint 82 Jan 5, 2021
meissadia added this to To do in Sprint 83 via automation Jan 5, 2021
meissadia moved this from To do to In progress in Sprint 83 Jan 5, 2021
@meissadia
Contributor Author

1/22

Testing via curl

curl -H "Authorization: Bearer <token>" \
"https://<dev>/v2/filing/institutions/B90YWS6AFX2LGWOXJ1LD/filings/2020/submissions/723/edits/csv?format=csv" \
--output ~/Downloads/edits_via_curl.test

Result:

curl: (18) transfer closed with outstanding read data remaining

Some suggestions that did not resolve the error:

  • Add option --keepalive-time 2
  • Add header Accept-Encoding: gzip, deflate
curl -H "Authorization: Bearer <token>" \
-H 'Accept-Encoding: gzip, deflate' \
--keepalive-time 2 \
--output ~/Downloads/edits_via_curl.test \
"https://<dev>/v2/filing/institutions/B90YWS6AFX2LGWOXJ1LD/filings/2020/submissions/723/edits/csv?format=csv"

Other avenues to explore:

  • Server sending wrong Content-Length header?
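To check the Content-Length theory, the two numbers to compare are the header value and the bytes actually received; with curl these can be captured via `--dump-header <file>` and `--write-out '%{size_download}'`. The sketch below is a hypothetical helper (`diagnoseLength` is not an existing function) that classifies the result, assuming `received` counts decoded body bytes as `fetch()` reports them; it therefore treats any `Content-Encoding` as not directly comparable.

```typescript
// Hypothetical diagnostic for "Server sending wrong Content-Length header?".
// Assumes `received` is the decoded body size (what fetch() exposes).
type LengthDiagnosis =
  | "no-content-length" // e.g. chunked transfer: end of body is signalled in-band
  | "invalid-content-length"
  | "compressed" // header counts encoded bytes; decoded size will differ
  | "match"
  | "truncated" // fewer bytes than announced: connection ended early
  | "overrun"; // more bytes than announced: header is likely wrong

function diagnoseLength(
  contentLength: string | null,
  contentEncoding: string | null,
  received: number,
): LengthDiagnosis {
  if (contentLength === null || contentLength.trim() === "") return "no-content-length";
  const expected = Number(contentLength);
  if (!Number.isSafeInteger(expected) || expected < 0) return "invalid-content-length";
  // Content-Length is measured after Content-Encoding is applied.
  const enc = (contentEncoding ?? "").trim().toLowerCase();
  if (enc !== "" && enc !== "identity") return "compressed";
  if (received === expected) return "match";
  return received < expected ? "truncated" : "overrun";
}
```

A "truncated" result would point at the connection being closed early (consistent with curl's error 18), whereas "overrun" would support the wrong-header theory.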

meissadia removed this from In progress in Sprint 83 Jan 26, 2021
meissadia added this to To do in Sprint 84 via automation Jan 26, 2021
meissadia moved this from To do to In progress in Sprint 84 Jan 26, 2021
meissadia added the blocked label Jan 28, 2021
meissadia removed this from In progress in Sprint 84 Feb 8, 2021
meissadia added this to To do in Sprint 85 - FE via automation Feb 8, 2021
meissadia moved this from To do to In progress in Sprint 85 - FE Feb 8, 2021
meissadia moved this from In progress to To do in Sprint 85 - FE Feb 10, 2021
meissadia removed this from To do in Sprint 85 - FE Mar 4, 2021
meissadia added this to To do in Sprint 86 - FE via automation Mar 4, 2021
meissadia removed this from To do in Sprint 86 - FE Mar 15, 2021
meissadia added this to To do in Sprint 87 - FE via automation Mar 15, 2021
meissadia removed this from To do in Sprint 87 - FE Mar 31, 2021
meissadia added this to To do in Sprint 88 - FE via automation Mar 31, 2021
kgudel removed this from To do in Sprint 88 - FE Apr 20, 2021
meissadia removed their assignment Apr 26, 2022
meissadia self-assigned this Apr 26, 2022
meissadia linked a pull request Mar 10, 2023 that will close this issue

2 participants