Add rule for page2img #74

stweil · 2020-03-21T08:38:59Z

Signed-off-by: Stefan Weil <sw@weilnetz.de>

bertsky · 2020-03-21T11:47:38Z

Maybe we should add a note somewhere to ensure users know that page2img only represents the PAGE annotations partially, and that there is ocrd-segment-extract-{pages,regions,lines} for full export – including masking of areas outside the polygon (on each level), page cropping (if Border is present), deskewing (on each level / if @orientation is present), dewarping and binarization (on each level / if AlternativeImage is present) etc.
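To make "masking of areas outside the polygon" concrete, here is a minimal pure-Python sketch. It is not taken from page2img or ocrd_segment, and all function names are hypothetical. It only assumes that a segment outline is given as a PAGE-style `points` string (`x,y x,y ...`) and that the image is a plain list of rows of grey values.

```python
# Illustrative sketch only: NOT the implementation used by page2img or
# ocrd_segment. All names below are hypothetical.

def parse_points(points):
    """Parse a PAGE-style points string such as "0,0 6,0 0,4"."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def inside(x, y, polygon):
    """Even-odd ray casting test for the point (x, y)."""
    hit = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def mask_outside(image, polygon, fill=255):
    """Return a copy of `image` in which every pixel whose centre lies
    outside `polygon` is replaced by `fill` (white by default)."""
    return [
        [value if inside(x + 0.5, y + 0.5, polygon) else fill
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


image = [[0] * 6 for _ in range(4)]      # 6x4 all-black dummy "page"
triangle = parse_points("0,0 6,0 0,4")   # hypothetical region outline
masked = mask_outside(image, triangle)
for row in masked:
    print(row)
```

A plain crop to the bounding box would keep every pixel of the rectangle; the mask additionally blanks whatever lies inside the rectangle but outside the outline, which matters when neighbouring segments overlap the box.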

stweil · 2020-03-21T12:01:26Z

> Maybe we should add a note somewhere

What would be a good place? The ocrd_all wiki?
Slightly related: where should users of OCR-D and ocrd_all document their experiences, for example how to install and use OCR-D on a Windows PC?

bertsky · 2020-03-23T17:26:50Z

> > Maybe we should add a note somewhere
>
> What would be a good place? The ocrd_all wiki?
> Slightly related: where should users of OCR-D and ocrd_all document their experiences, for example how to install and use OCR-D on a Windows PC?

Good question. IMHO it is most important that we don't scatter documentation across too many places, but always use a single entry point. That would probably be the setup guide for things like OS specifics, and the user guide or workflow guide for recommendations on processors and parameters. Maybe we should start a sub-page for custom (layout / recognition) model training. In each case, we could easily link to external sources (like your tesstrain wiki pages). But I am not sure whether that is the appropriate place for sharing experiences / user-generated content.

@wrznr what do you think?

wrznr · 2020-03-26T16:34:37Z

We should think about dropping page2img entirely. ocrd_segment_extract_* offers much more.
We should use the OCR-D docs for documentation related to OCR-D. The new website is promoted heavily by the OCR-D team. The cookbook section would be the most appropriate place for user-contributed recipes. The Windows installation instructions should be reviewed by @kba and tested by @tboenig and then become part of the official setup guide.

stweil · 2020-03-27T21:55:58Z

@FVoigtschild and @Ma-Nuechter worked with me for the last three weeks, and both used OCR-D on Windows. The OCR-D installation for Windows is based on the Windows Subsystem for Linux (WSL), so once WSL and a Linux distribution are installed, it is simply an installation on a Debian or Ubuntu platform.

https://ocr-d.de/ contains great documentation and has improved very much recently, but in my opinion it is difficult to find out how to contribute there.

bertsky · 2020-03-28T00:44:28Z

@stweil

> https://ocr-d.de/ contains great documentation and has improved very much recently, but in my opinion it is difficult to find out how to contribute there.

What @wrznr meant by OCR-D docs was the docs subdirectory of the new website repo, and I should have pointed my links there instead of the running webserver. So,

> use a single entry point. That would probably be the setup guide for things like OS specifics,

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/setup.md

> the user guide

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/user_guide.md

> or workflow guide for recommendations on processors and parameters.

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/workflows.md

> Maybe we should start a sub-page for custom (layout / recognition) model training.

i.e. https://github.com/OCR-D/ocr-d.github.io/tree/master/docs/ocrd-training.md

Anyone can and should open a new issue or start a PR.

bertsky · 2020-03-28T01:11:12Z

@wrznr

> 1. We should think about dropping `page2img` entirely

I am not convinced of that. page2img.py has minimal dependencies – that's sometimes more important.

(See https://github.com/OCR-D/ocr-d.github.io/issues/22)

> 2. The `cookbook` section would be the most appropriate place for user-contributed recipes.

Good idea! Perhaps this could be framed more generally as user stories, to invite users with as low a threshold as possible. (Or is a wiki more suited as a first stage?)

stweil · 2020-03-28T06:41:37Z

> Or is a wiki more suited as a first stage?

I think so, because the threshold to fix or add information in a wiki is much lower than that of going through the usual GitHub workflow. Ideally there are several ways to contribute to the documentation: issues, pull requests and wiki. All contributions eventually get integrated into the official documentation. This official documentation must also describe those different ways for users who want to contribute.

kba · 2020-03-30T15:43:19Z

Thanks for starting this discussion.

A quick breakdown of the OCR-D website: it's a Jekyll-generated static site. The Jekyll source and the generated website are version-controlled in https://github.com/OCR-D/ocrd-website. The ocr-d.github.io repo is only for hosting the GitHub Pages; it contains a copy of the `docs` output folder of ocrd-website. Everything you see on the OCR-D page is generated from the `site` source folder of ocrd-website and repo submodules.

This setup has the advantage of being mostly self-contained and allowed us to move away from various repos for documentation, slides etc. to a single repo with a well-defined structure, multilingual support, menu support etc.

This means that all changes should be done in https://github.com/OCR-D/ocrd-website which should be public now.

However, as @stweil pointed out, using a wiki significantly lowers the threshold to contribute, and we'll be happy to integrate that into the official documentation in due time. PRs are still welcome, but the workflow is more complex and we're currently lacking comprehensive "documentation on documentation". I've put it on my TODO list to consolidate the info on documentation and how to contribute, archive obsolete repos etc.

I propose to use the wiki of ocrd-website for this (and will have a look through our network of repos for information from wikis of other projects).

stweil · 2020-03-31T19:15:18Z

Here is some new documentation for installing OCR-D on Windows. https://ocr-d.de/en/setup is already rather lengthy, so please suggest where it would best be added.

Thank you to @LauraErhard for doing the final test with Ubuntu and finding some missing information.

Add rule for page2img

76f8e69

Signed-off-by: Stefan Weil <sw@weilnetz.de>

bertsky approved these changes Mar 21, 2020

stweil merged commit 76f8e69 into OCR-D:master Mar 21, 2020

stweil deleted the update branch March 21, 2020 12:04
