Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workspace pageId to page_id #874

Merged
merged 16 commits into from
Jul 18, 2022
Merged

Workspace pageId to page_id #874

merged 16 commits into from
Jul 18, 2022

Conversation

joschrew
Copy link
Contributor

@joschrew joschrew commented May 24, 2022

fix #862. Change kwargs (expect page_id instead of pageId) for ocrd-workspace-api.
I changed add_file and merge because I think that where the only ones left accepting pageId.I didn't change anything regarding to Workspace.find_files (offering new functionality), because I think maybe that would be a separate topic.

for code in ocrd (e.g. workspace), ocrd_models keep pageId.
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found some more :-)

Plus, to be really consistent, let's also do this:

(Perhaps we should also offer Workspace.find_files as a delegator to OcrdMets.find_files and translator of file_grp|file_id|page_id to fileGrp|ID|pageId)

ocrd/ocrd/cli/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
ocrd/ocrd/cli/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
@joschrew
Copy link
Contributor Author

Thanks for your help especially for workspace.py line 186, I simply overlooked that. I need to take more time for this stuff, it is too easy to miss that kind of things.
I think you are right about fileGrp, ID and the find_files delegator. I will give it a try (I am not sure yet how to do the delegator) and alter this PR.

@bertsky
Copy link
Collaborator

bertsky commented May 25, 2022

Splendid!

(I am not sure yet how to do the delegator)

Just like Workspace.add_file (i.e. translating some kwargs, entering self.directory). We should then also delegate to that from cli.workspace_find and cli.workspace_prune.

@kba
Copy link
Member

kba commented May 25, 2022

I think you are right about fileGrp, ID and the find_files delegator. I will give it a try (I am not sure yet how to do the delegator) and alter this PR.

Careful about renaming ID though, because lowercase id is a builtin.

@bertsky
Copy link
Collaborator

bertsky commented May 25, 2022

Careful about renaming ID though, because lowercase id is a builtin.

Oh, right, we forgot about ID vs file_id, too!

It looks like:

  • ocrd_models always uses ID, never fileId (which would be nicer, but it's too late for that)
  • Workspace.add_file and .remove_file take ID (but should be file_id IMO)
  • cli.workspace command options use --file-id
  • cli.workspace.workspace_find's --output-field uses the names from ocrd_models again (ID, pageId, fileGrp)

@bertsky
Copy link
Collaborator

bertsky commented Jun 10, 2022

What's the status here @joschrew?

And can we get file ID naming in as well? (That is, by my previous comment, deprecating ID kwargs to Workspace.add_file and Workspace.remove_file in favour of file_id, and allowing the use of fileId as a synonym for ID in the output-field of CLI's find.)

@joschrew
Copy link
Contributor Author

joschrew commented Jun 10, 2022

The status is: I (hopefully correctly) rebased this PR to the newest changes. Then I changed ID and fileGrp params according to what I did for pageId. Next I wanted to to tackle the find_files-delegator, but i haven't really started on that yet.

I don't know how to do the output-field of CLI's find, because the only possibility I see there is to allow file_id, page_id and file_grp in addition to ID, pageId and fileGrp and then manually verify consistency.

@kba
Copy link
Member

kba commented Jun 10, 2022

I don't know how to do the output-field of CLI's find, because the only possibility I see there is to allow file_id, page_id and file_grp in addition to ID, pageId and fileGrp and then manually verify consistency.

Perhaps something like this to normalize the fields to one consistent set of either CamelCase or snake_case at the beginning of the workspace_find function?

camel_to_snake = {                                      
    'page_id': 'page_id',                               
    'page_id': 'page_id',                               
    'fileGrp': 'file_grp',                              
    'file_grp': 'file_grp',
     # [...]                             
}                                                       
output_field = [camel_to_snake[k] for k in output_field]

@joschrew
Copy link
Contributor Author

Added find_files delegator. But I am not finished yet, the last thing (except fixes) I want to do is convert output-field values in workspace_find to snake_case.

ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
add possible output_field values in workspace_find to allow providing them with snake_case in addition to old camelCase-naming
@joschrew
Copy link
Contributor Author

Ok, finished my changes with finally allowing 'new' variable naming for workspace_find's parameter output_field.
I just made it possible to use e.g. page_id in addition to 'old' value pageId. I thought about printing a deprecated warning or extending comments in the python-file but finally left that out because I am unsure about that.

From my current point of view this PR should be finished now. The test are passing but I didn't make any additional tests except briefly testing output_field on cmd.

ocrd/ocrd/cli/workspace.py Outdated Show resolved Hide resolved
Update ocrd/ocrd/cli/workspace.py

Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
@joschrew
Copy link
Contributor Author

joschrew commented Jun 14, 2022

Thanks, that hasn't been in my python-repertoire yet.

@kba kba requested review from kba and bertsky June 16, 2022 08:27
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good – except for that one place in cli.workspace.find regarding pageId.

We should also cover that case by the tests. All we have in test.cli.test_workspace is test_find_all_files_multiple_physical_pages_for_fileids and test_find_all_files, but nothing with -k pageId (or -k page-id now).

ocrd/ocrd/cli/workspace.py Show resolved Hide resolved
ocrd/ocrd/cli/workspace.py Show resolved Hide resolved
@joschrew
Copy link
Contributor Author

joschrew commented Jun 17, 2022

Thanks for review. I agree on the tests, I will try to write them. And I agree about the cosmetic workspace.add_file will change that as well.

Regarding tests you wrote "but nothing with -k page-id". Actually it is page_id currently. I asked myself (a while ago and forgot) if that should be page-id, and now I am unsure about that (file_grp and file_id as well).

But the pageId references you mentioned (maybe I don't understand correctly what you mean):
That are "just" strings, no variable names (cli/workspace.py lines 411, 415, 416) and they are used to query ocrd_models so page_id cannot be used there as far as I understand. Unless I rewrite or remove the performance-reasons-block. So "just" replacing would not work what would you suggest on that part?

@bertsky
Copy link
Collaborator

bertsky commented Jun 17, 2022

Regarding tests you wrote "but nothing with -k page-id". Actually it is page_id currently. I asked myself (a while ago and forgot) if that should be page-id, and now I am unsure about that (file_grp and file_id as well).

Oh, sorry, I actually did mean -k page_id. But it's a good question...

But the pageId references you mentioned (maybe I don't understand correctly what you mean):
That are "just" strings, no variable names (cli/workspace.py lines 411, 415, 416) and they are used to query ocrd_models so page_id cannot be used there as far as I understand. Unless I rewrite or remove the performance-reasons-block. So "just" replacing would not work what would you suggest on that part?

Sorry again. You are right. After your snake_to_camel replacement, the ocrd_models convention already applies. Everything is correct as it is.

@joschrew
Copy link
Contributor Author

Thanks.

Please note that I still want to add the tests and change the one occurrence of workspace.mets.add_file, so this pr is not finished for me yet. I hope to finish that until beginning of next week.

@joschrew
Copy link
Contributor Author

Added 2 tests and changed the call in ocrd/ocrd/cli/workspace.py workspace_add_file() to use the new delegator. In that process I tried to simplify/shorten the code a bit (maybe I just should have kept the kwargs-dict it as before?!).

@kba
Copy link
Member

kba commented Jul 15, 2022

Regarding tests you wrote "but nothing with -k page-id". Actually it is page_id currently. I asked myself (a while ago and forgot) if that should be page-id, and now I am unsure about that (file_grp and file_id as well).

Oh, sorry, I actually did mean -k page_id. But it's a good question...

You mean having kebab-case versions in addition to snake_case and CamelCase? Why?

@kba
Copy link
Member

kba commented Jul 15, 2022

(maybe I just should have kept the kwargs-dict it as before?!).

No, the more explicit variant you used is fine.

This was referenced Jul 15, 2022
@bertsky
Copy link
Collaborator

bertsky commented Jul 15, 2022

You mean having kebab-case versions in addition to snake_case and CamelCase? Why?

Because it is actually the least surprising for the CLI users. They cannot be expected to know anything about the API name conventions, neither in ocrd (snake case) nor in ocrd_models (camel case). They only know these names from GNU longopt style CLI options.

But nevermind – whatever works and is documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consistency of pageId kwarg name
3 participants