Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolver.workspace_from_url: cover relative paths, too #266

Closed
wants to merge 1 commit into from

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Jul 19, 2019

This (at least partially) fixes #96, i.e. workspace_from_url failing to download files that had relative paths. The reason was that workspace.download_file does not know anything about baseurl, i.e. it cannot concatenate the source path before copying locally. In contrast, download_url does wrap this case correctly.

@codecov-io
Copy link

codecov-io commented Jul 19, 2019

Codecov Report

Merging #266 into master will increase coverage by 6.32%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
+ Coverage   93.13%   99.45%   +6.32%     
==========================================
  Files          30       30              
  Lines        1485     1279     -206     
  Branches      287      248      -39     
==========================================
- Hits         1383     1272     -111     
+ Misses         91        6      -85     
+ Partials       11        1      -10
Impacted Files Coverage Δ
ocrd/ocrd/resolver.py 100% <100%> (ø) ⬆️
ocrd_models/ocrd_models/ocrd_exif.py 100% <0%> (ø) ⬆️
ocrd/ocrd/workspace_backup.py 100% <0%> (ø) ⬆️
ocrd_models/ocrd_models/ocrd_file.py 100% <0%> (ø) ⬆️
ocrd_models/ocrd_models/constants.py 100% <0%> (ø) ⬆️
ocrd/ocrd/__init__.py 100% <0%> (ø) ⬆️
..._validators/ocrd_validators/workspace_validator.py 100% <0%> (+3.44%) ⬆️
ocrd_models/ocrd_models/ocrd_mets.py 100% <0%> (+4.57%) ⬆️
ocrd_utils/ocrd_utils/__init__.py 92.95% <0%> (+16.88%) ⬆️
ocrd/ocrd/workspace.py 97.87% <0%> (+29.8%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5ad22f...452a10c. Read the comment docs.

@bertsky
Copy link
Collaborator Author

bertsky commented Jul 19, 2019

Urgently needed, for example for tests in OCR-D/ocrd_keraslm#7.

@kba kba mentioned this pull request Aug 13, 2019
@kba kba mentioned this pull request Sep 3, 2019
@kba
Copy link
Member

kba commented Sep 6, 2019

#303 should have fixed the issue here. Regressions are always possible, please open an issue if problem persists, thanks.

@kba kba closed this Sep 6, 2019
@bertsky
Copy link
Collaborator Author

bertsky commented Sep 16, 2019

I am still having problems with a simple Resolver.workspace_from_url (as used here:

test/test_wrapper.py 09:58:08.817 DEBUG ocrd.resolver - workspace_from_url
mets_basename='mets.xml'
mets_url='test/assets/kant_aufklaerung_1784/data/mets.xml'
src_baseurl='test/assets/kant_aufklaerung_1784/data'
dst_dir='/tmp/pyocrd-test-ocrd_keraslm'
09:58:08.824 DEBUG ocrd.resolver.download_to_directory - directory=|/tmp/pyocrd-test-ocrd_keraslm| url=|test/assets/kant_aufklaerung_1784/data/mets.xml| basename=|mets.xml| if_exists=|raise| subdir=|None|
09:58:08.825 DEBUG ocrd.resolver.download_to_directory - Copying file '/home/xbert/unsortiert/arbeit/heyer/tools/ocrd_keraslm/test/assets/kant_aufklaerung_1784/data/mets.xml' to '/tmp/pyocrd-test-ocrd_keraslm/mets.xml'
09:58:08.863 DEBUG ocrd.resolver.download_to_directory - directory=|/tmp/pyocrd-test-ocrd_keraslm| url=|OCR-D-IMG/OCR-D-IMG_0001.tif| basename=|OCR-D-IMG_0001.tif| if_exists=|skip| subdir=|OCR-D-IMG|
09:58:08.864 DEBUG ocrd.resolver.download_to_directory - directory=|/tmp/pyocrd-test-ocrd_keraslm| url=|test/assets/kant_aufklaerung_1784/data/OCR-D-IMG/OCR-D-IMG_0001.tif| basename=|OCR-D-IMG_0001.tif| if_exists=|skip| subdir=|OCR-D-IMG|
...
E               FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: OCR-D-IMG/OCR-D-IMG_0001.tif
E               FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: test/assets/kant_aufklaerung_1784/data/OCR-D-IMG/OCR-D-IMG_0001.tif
E                   Exception: Already tried prepending baseurl 'test/assets/kant_aufklaerung_1784/data'. Cannot retrieve 'test/assets/kant_aufklaerung_1784/data/OCR-D-IMG/OCR-D-IMG_0001.tif'

So, looking at your solution in #303 (i.e. keeping workspace_from_url more or less as it is, based on download_file not download_url, but incorporating some baseurl + recursion logic into download_file), it seems to me the new logic is still flawed: It is not enough to prepend baseurl like this, because baseurl itself is relative to the old CWD where we resolved the source METS, not to dst_dir.

I take the liberty of reopening here for comparison instead of starting a new issue.
I opened #319 for this (with the same description as above) – sorry for the noise!

@bertsky bertsky reopened this Sep 16, 2019
@bertsky bertsky closed this Sep 16, 2019
@bertsky bertsky deleted the fix-workspace-from-url branch October 8, 2020 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ocrd workspace clone -a -m mets.xml will not adapt links for files
3 participants