Add rule for page2img #74

Merged
merged 1 commit into from
Mar 21, 2020

stweil
Collaborator

@stweil stweil commented Mar 21, 2020

Signed-off-by: Stefan Weil <sw@weilnetz.de>
@bertsky
Collaborator

bertsky commented Mar 21, 2020

Maybe we should add a note somewhere to ensure users know that page2img only represents the PAGE annotations partially, and that there is ocrd-segment-extract-{pages,regions,lines} for full export – including masking of areas outside the polygon (on each level), page cropping (if Border is present), deskewing (on each level / if @orientation is present), dewarping and binarization (on each level / if AlternativeImage is present) etc.
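
To illustrate what "masking of areas outside the polygon" and cropping mean here, the following is a minimal pure-Python sketch. It is not the code used by page2img or by ocrd-segment-extract-*; the function names (`parse_points`, `inside`, `mask_outside`, `crop_to_bbox`) and the nested-list image are assumptions made for this example. It relies only on the PAGE convention that a `Coords/@points` attribute is a space-separated list of `x,y` pairs.

```python
# Illustrative sketch only; all names are hypothetical, not OCR-D API.

def parse_points(points):
    """Parse a PAGE Coords/@points string such as '1,1 4,1 4,4 1,4'."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def inside(x, y, polygon):
    """Even-odd ray casting: is the point (x, y) inside the polygon?"""
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def mask_outside(image, polygon, fill=255):
    """Return a copy of image (list of pixel rows) with every pixel whose
    centre lies outside the polygon replaced by fill."""
    return [
        [pix if inside(x + 0.5, y + 0.5, polygon) else fill
         for x, pix in enumerate(row)]
        for y, row in enumerate(image)
    ]


def crop_to_bbox(image, polygon):
    """Crop image to the bounding box of the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return [row[min(xs):max(xs)] for row in image[min(ys):max(ys)]]


# Usage: a 6x6 all-black "page" with a square region in the middle.
page = [[0] * 6 for _ in range(6)]
region = parse_points("1,1 4,1 4,4 1,4")
segment = crop_to_bbox(mask_outside(page, region), region)
```

A rectangular crop alone keeps whatever neighbouring content falls inside the bounding box; masking first is what removes it for non-rectangular regions.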

@stweil
Collaborator Author

stweil commented Mar 21, 2020

> Maybe we should add a note somewhere

What would be a good place? The ocrd_all wiki?
Slightly related: where should users of OCR-D and ocrd_all document their experiences, for example how to install and use OCR-D on a Windows PC?

@stweil stweil merged commit 76f8e69 into OCR-D:master Mar 21, 2020
@stweil stweil deleted the update branch March 21, 2020 12:04
@bertsky
Collaborator

bertsky commented Mar 23, 2020

> > Maybe we should add a note somewhere
>
> What would be a good place? The ocrd_all wiki?
> Slightly related: where should users of OCR-D and ocrd_all document their experiences, for example how to install and use OCR-D on a Windows PC?

Good question. IMHO the most important thing is that we don't scatter documentation across too many places, but always use a single entry point. That would probably be the setup guide for things like OS specifics, and the user guide or workflow guide for recommendations on processors and parameters. Maybe we should start a sub-page for custom (layout / recognition) model training. In each case, we could easily link to external sources (like your tesstrain wiki pages). But I am not sure whether that is the appropriate place for sharing experiences / user-generated content.

@wrznr what do you think?

@wrznr

wrznr commented Mar 26, 2020

  1. We should think about dropping page2img entirely. ocrd-segment-extract-* offers much more.
  2. We should use the OCR-D docs for documentation related to OCR-D. The new website is promoted heavily by the OCR-D team. The cookbook section would be the most appropriate place for user-contributed recipes. The Windows installation instructions should be reviewed by @kba and tested by @tboenig and then become part of the official setup guide.

@stweil
Collaborator Author

stweil commented Mar 27, 2020

@FVoigtschild and @Ma-Nuechter have worked with me for the last three weeks, and both used OCR-D on Windows. The OCR-D installation for Windows is based on the Windows Subsystem for Linux (WSL), so once WSL and a Linux distribution have been installed, it is simply an installation on a Debian or Ubuntu platform.

https://ocr-d.de/ contains great documentation and improved very much recently, but in my opinion it is difficult to find out how to contribute there.

@bertsky
Collaborator

bertsky commented Mar 28, 2020

@stweil

> https://ocr-d.de/ contains great documentation and improved very much recently, but in my opinion it is difficult to find out how to contribute there.

What @wrznr meant by OCR-D docs was the docs subdirectory of the new website repo, and I should have pointed my links there instead of the running webserver. So,

> use a single entry point. That would probably be the setup guide for things like OS specifics,

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/setup.md

> the user guide

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/user_guide.md

> or workflow guide for recommendations on processors and parameters.

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/workflows.md

> Maybe we should start a sub-page for custom (layout / recognition) model training.

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/ocrd-training.md

Anyone can and should open a new issue or start a PR.

@bertsky
Collaborator

bertsky commented Mar 28, 2020

@wrznr

> 1. We should think about dropping `page2img` entirely

I am not convinced of that. page2img.py has minimal dependencies – that's sometimes more important.

(See https://github.com/OCR-D/ocr-d.github.io/issues/22)

> 2. The `cookbook` section would be the most appropriate place for user-contributed recipes.

Good idea! Perhaps this could be framed more generally as user stories, so that the threshold for users to contribute is as low as possible. (Or is a wiki more suited as a first stage?)

@stweil
Collaborator Author

stweil commented Mar 28, 2020

> Or is a wiki more suited as a first stage?

I think so, because the threshold to fix or add information in a wiki is much lower than managing the usual GitHub workflow. Ideally there are several ways to contribute to the documentation: issues, pull requests and wiki. And all contributions finally get integrated into the official documentation. This official documentation must also document those different ways for users who want to contribute.

@kba
Member

kba commented Mar 30, 2020

Thanks for starting this discussion.

A quick breakdown of the OCR-D website: it's a Jekyll-generated static site. The Jekyll source and the generated website are version-controlled in https://github.com/OCR-D/ocrd-website. The ocr-d.github.io repo is only for hosting the GitHub Pages; it contains a copy of the docs output folder of ocrd-website. Everything you see on the OCR-D page is generated from the site source folder of ocrd-website and repo submodules.

This setup has the advantage of being mostly self-contained and allowed us to move away from various repos for documentation, slides etc. to a single repo with a well-defined structure, multilingual support, menu support etc.

This means that all changes should be done in https://github.com/OCR-D/ocrd-website which should be public now.

However, as @stweil pointed out, using a wiki significantly lowers the threshold to contribute, and we'll be happy to integrate wiki content into the official documentation in due time. PRs are still welcome, but the workflow is more complex and we're currently lacking comprehensive "documentation on documentation". I've put it on my TODO list to consolidate the info on documentation and how to contribute, archive obsolete repos etc.

I propose to use the wiki of ocrd-website for this (and will have a look through our network of repos for information from wikis of other projects).

@stweil
Collaborator Author

stweil commented Mar 31, 2020

Here is some new documentation for installing OCR-D on Windows. https://ocr-d.de/en/setup is already rather lengthy, so please suggest where it would best be added.

Thank you to @LauraErhard for doing the final test with Ubuntu and finding some missing information.
