ocrd workspace clone -a -m mets.xml will not adapt links for files #96

Closed
VolkerHartmann opened this issue Jun 12, 2018 · 10 comments · Fixed by #303

@VolkerHartmann

After cloning, mets.xml looks the same as before, although the contained image was copied to the new directory.

kba added the bug label Jun 13, 2018
kba closed this as completed in 569d4b4 Jun 18, 2018
@bertsky
Collaborator

bertsky commented Feb 12, 2019

I am not sure if this is the same problem, but there still is a problem with workspace clone:

cd assets/data
ocrd workspace clone -l kant_aufklaerung_1784/data/mets.xml ws1

After this, ws1/ contains nothing but a mets.xml, which points (as LOCTYPE="URL" with file:// hrefs) not to the original directory but to new local directories under ws1 that do not exist.

Am I doing something wrong or is that behaviour broken?
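
One way to confirm this kind of symptom is to list every file reference in the cloned METS that points to a missing local file. The following is only a diagnostic sketch using the Python standard library, not part of OCR-D core; the function name dangling_local_hrefs is made up for illustration, and it assumes file locations are stored in xlink:href attributes of mets:FLocat elements.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, unquote

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def dangling_local_hrefs(mets_path):
    """Return the hrefs of mets:FLocat elements that point to missing local files.

    Hypothetical helper for illustration; not an OCR-D API.
    """
    base_dir = os.path.dirname(os.path.abspath(mets_path))
    missing = []
    for flocat in ET.parse(mets_path).iterfind(".//mets:FLocat", NS):
        href = flocat.get(XLINK_HREF)
        if not href:
            continue
        parsed = urlparse(href)
        if parsed.scheme in ("http", "https"):
            continue  # remote references are not checked here
        if parsed.scheme == "file":
            path = unquote(parsed.path)
        else:
            # relative paths are resolved against the directory of the METS
            path = os.path.join(base_dir, href)
        if not os.path.exists(path):
            missing.append(href)
    return missing
```

Running it on ws1/mets.xml after the clone above would be expected to report the file:// hrefs under ws1, since none of those files were created.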

@bertsky
Collaborator

bertsky commented Feb 13, 2019

Also, by the new rules for METS file URLs, I think core must now not produce absolute file paths at all when cloning.

@bertsky
Collaborator

bertsky commented Feb 14, 2019

I think at least one problem is in Resolver.workspace_from_url: If the source METS file uses relative paths, as is now allowed (and even required if local), then it will not be enough to just dump it in the new directory and instantiate a Workspace with src_dir set appropriately – because nothing will preserve that information without download=True or download_local=True. The new METS must always rebase its local file paths.
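
To illustrate what rebasing could mean here, the sketch below copies a METS file to a new directory and rewrites each relative href so that it still resolves to the original file, without introducing absolute paths. This is one possible reading of the requirement, not the implementation in core; the function name rebase_relative_hrefs is hypothetical, and it assumes locations live in xlink:href attributes of mets:FLocat elements.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = "{%s}href" % XLINK_NS


def rebase_relative_hrefs(src_mets, dst_mets):
    """Write src_mets to dst_mets, rewriting relative hrefs so they still
    resolve to the original files from the new location.

    Hypothetical helper for illustration; not an OCR-D API.
    """
    src_dir = os.path.dirname(os.path.abspath(src_mets))
    dst_dir = os.path.dirname(os.path.abspath(dst_mets))
    # keep readable prefixes in the serialized output
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    tree = ET.parse(src_mets)
    for flocat in tree.iter("{%s}FLocat" % METS_NS):
        href = flocat.get(XLINK_HREF)
        if not href or urlparse(href).scheme or os.path.isabs(href):
            continue  # leave URLs and absolute paths untouched
        target = os.path.join(src_dir, href)
        # stays relative, now expressed from the new directory
        flocat.set(XLINK_HREF, os.path.relpath(target, dst_dir))
    os.makedirs(dst_dir, exist_ok=True)
    tree.write(dst_mets, encoding="utf-8", xml_declaration=True)
```

With the source METS in src/ and the clone in ws1/ next to it, an href such as OCR-D-IMG/a.tif would become ../src/OCR-D-IMG/a.tif. Copying the files into the new workspace instead (download=True) would make the rewrite unnecessary.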

@bertsky
Collaborator

bertsky commented Feb 14, 2019

Please reopen.

@kba
Member

kba commented Feb 14, 2019

The original bug is what is now #227; what you describe is indeed a regression. Thanks for investigating.

@kba
Member

kba commented Feb 25, 2019

The -l option is gone now, but with #232

ocrd workspace clone -a kant_aufklaerung_1784/data/mets.xml ws1

should work as expected (files are copied).

@bertsky
Collaborator

bertsky commented Mar 4, 2019

I tried that with 1.0.0b5, but there I get a FileNotFoundError on the very first source file, which does exist in the filesystem. At first glance, could it be that the chdir in download_file is wrong, or at least not compensated for when entering download_url?
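
The suspected failure mode can be reproduced in isolation: once the process has changed into the target directory, a path that was relative to the source directory no longer resolves. The sketch below is a generic illustration of that effect and of one way to compensate, not the code of download_file or download_url; pushd and read_relative are hypothetical names.

```python
import contextlib
import os


@contextlib.contextmanager
def pushd(directory):
    """Temporarily change the working directory and always restore it."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        yield
    finally:
        os.chdir(previous)


def read_relative(src_dir, relative_path):
    """Read a file by resolving relative_path against src_dir explicitly,
    so the result does not depend on the current working directory."""
    with open(os.path.join(src_dir, relative_path), "rb") as f:
        return f.read()
```

Inside `with pushd("ws1"):`, a plain `open("OCR-D-IMG/a.tif")` looks under ws1/ and raises FileNotFoundError even though the file exists in the source directory, whereas `read_relative(src_dir, "OCR-D-IMG/a.tif")` still finds it as long as src_dir itself is absolute.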

@bertsky
Collaborator

bertsky commented Mar 4, 2019

It also applies to all processors or tests that use resolver.workspace_from_url (with download=True) or workspace.download_file – when the input has relative paths, as do spilled bags.

I think the cause is changeset 76e0294, which seems to be the wrong way of fixing #233.

@bertsky
Collaborator

bertsky commented Mar 4, 2019

Also, since c0aca9c, now workspace.add_file seems to require a kwarg local_filename, or will raise an exception. How is this supposed to be used by processors, what is going on?

@bertsky
Collaborator

bertsky commented Jul 19, 2019

> Also, since c0aca9c, now workspace.add_file seems to require a kwarg local_filename, or will raise an exception. How is this supposed to be used by processors, what is going on?

This is just poor documentation. The docstring should say that, unless content is given, the function merely references an existing file URL in the METS; otherwise it writes the given content to disk, and therefore needs a local_filename as well.

But apart from that, I think I have found a (partial) fix for the Resolver.workspace_from_url problem (which fails downloading files into the new workspace if they are relative paths in the METS): see #266.
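
The two add_file modes described above can be summarized in a small stand-in function. This is purely a sketch of the documented intent, not the real workspace.add_file: the name add_file_sketch, its signature, and the plain list used in place of the METS are all assumptions made for illustration.

```python
import os


def add_file_sketch(registry, url=None, content=None, local_filename=None):
    """Hypothetical stand-in for the behaviour described above (not the real API).

    - Without content: only record a reference to an existing file URL.
    - With content: write it to local_filename (required) and record that path.
    """
    if content is None:
        if url is None:
            raise ValueError("need either 'url' or 'content'")
        registry.append(url)  # reference only, nothing is written
        return url
    if local_filename is None:
        raise ValueError("'content' requires 'local_filename'")
    os.makedirs(os.path.dirname(local_filename) or ".", exist_ok=True)
    mode = "wb" if isinstance(content, bytes) else "w"
    with open(local_filename, mode) as f:
        f.write(content)
    registry.append(local_filename)
    return local_filename
```

A processor that only wants to register an existing file would pass url alone; one that produces new output would pass both content and local_filename.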

kba mentioned this issue Aug 13, 2019
kba closed this as completed in #303 Sep 6, 2019
coordinate automation moved this from Backlog to Done Sep 6, 2019
kba added a commit to kba/ocrd-core that referenced this issue Sep 6, 2019
kba added a commit that referenced this issue Sep 6, 2019
bertsky pushed a commit to bertsky/core that referenced this issue Apr 26, 2023