Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: import_utils fetch_archive_from_http - improve url parsing for fetching archive from http #5199

Merged

Conversation

malte-aws
Copy link
Contributor

@malte-aws malte-aws commented Jun 23, 2023

Related Issues

Proposed Changes:

The extension of the archive was determined by splitting the URL at "." using rpartition and then using the string to the right of the "." as the file extension.

This pull request uses urllib.parse.urlparse to split the URL into its components. Then I use splitext to split the archive extension from the path in the URL.

How did you test it?

Manual test with the To Reproduce instruction from #5198

Checklist

@malte-aws malte-aws requested a review from a team as a code owner June 23, 2023 10:18
@malte-aws malte-aws requested review from masci and removed request for a team June 23, 2023 10:18
@CLAassistant
Copy link

CLAassistant commented Jun 23, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type:documentation Improvements on the docs label Jun 23, 2023
@malte-aws malte-aws force-pushed the fetch-archive-from-http-with-parameters branch from 628d961 to 8e56a07 Compare July 1, 2023 09:38
@anakin87 anakin87 self-requested a review July 3, 2023 08:17
@anakin87
Copy link
Member

anakin87 commented Jul 3, 2023

Hello @malte-aws and thanks for the PR, that is useful and appreciated! 👍

Some suggestions:

Let us know if you need any help and thanks again!

@malte-aws
Copy link
Contributor Author

Hi @anakin87,

thank you for the feedback.

I addressed the improvements by

  1. Running pre-commit
  2. Extracting the code that gets the filename and extension into a separate function
  3. Added unit test covering the code contribution.

Kind regards
Malte

@coveralls
Copy link
Collaborator

coveralls commented Jul 3, 2023

Pull Request Test Coverage Report for Build 5448745686

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 288 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.008%) to 43.939%

Files with Coverage Reduction New Missed Lines %
utils/import_utils.py 19 43.1%
document_stores/faiss.py 116 18.68%
document_stores/pinecone.py 153 51.37%
Totals Coverage Status
Change from base Build 5425580157: -0.008%
Covered Lines: 10087
Relevant Lines: 22957

💛 - Coveralls

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in a very good state.

It would be great if you could add a test for a "normal" URL, such as http://www.mysite.com/resources/myfile.zip

haystack/utils/import_utils.py Outdated Show resolved Hide resolved
haystack/utils/import_utils.py Outdated Show resolved Hide resolved
test/others/test_utils.py Outdated Show resolved Hide resolved
test/others/test_utils.py Outdated Show resolved Hide resolved
malte-aws and others added 4 commits July 3, 2023 23:33
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
@anakin87 anakin87 self-assigned this Jul 4, 2023
@anakin87 anakin87 self-requested a review July 4, 2023 08:18
Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Thank you once again for your first PR on Haystack 🥳

@anakin87 anakin87 merged commit 195077e into deepset-ai:main Jul 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Import utils fetch_archive_from_http url parsing URL format assumption
4 participants