Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is GIT_RECURSIVE disabled by default? #32

Closed
kba opened this issue Jan 6, 2020 · 18 comments
Closed

Why is GIT_RECURSIVE disabled by default? #32

kba opened this issue Jan 6, 2020 · 18 comments
Assignees
Labels
question Further information is requested

Comments

@kba
Copy link
Member

kba commented Jan 6, 2020

GIT_RECURSIVE = # --recursive
[...]
git submodule sync $(GIT_RECURSIVE) $@
if git submodule status $(GIT_RECURSIVE) $@ | grep -qv '^ '; then \
    git submodule update --init $(GIT_RECURSIVE) $@ && \
    touch $@; fi

This will fail for modules that have themselves submodules such as ocrd_olena. Why ist this not the default?

@kba kba assigned stweil and bertsky Jan 6, 2020
@kba kba added the question Further information is requested label Jan 6, 2020
@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

I changed recursive submodules into an opt-in with 4127bdd in #24. Nobody complained back then.

The ratio was that at the time, submodules were only needed for tests and models across, but they bloated the time and space requirements heavily.

So maybe we should make this distinction more fine-grained now: modules like ocrd_olena which now needs the recursive submodule should set GIT_RECURSIVE = --recursive as a target-specific variable, overriding general user preference. Users can still override GIT_RECURSIVE = --recursive (even in a target-specific way) in their local.mk. What do you think?

@kba
Copy link
Member Author

kba commented Jan 6, 2020

I understand the reasoning. Yeah, probably the easiest way is to either set GIT_RECURSIVE for the ocrd_olena target or just do a git submodule update --init for ocrd_olena. I actually find the latter more intuitive but don't feel strongly either way.

It would be good to decide on a solution soon-ish since building the docker images with a blanket GIT_RECURSIVE=--recursive set will indeed waste a lot of unnecessary space.

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

Since segmentation-runner and ocrd_anybaseocr have no submodules anymore, we are currently left with:

  • opencv-python: needs the submodules to compile
  • ocrd_olena: needs at least 1 submodule to compile
  • tesseract: submodules only needed for testing and training utils
  • cor-asv-ann: submodules only for models and training
  • cor-asv-fst: submodules only for models

I'll prepare a PR right away which gives opencv-python and ocrd_olena recursive, and document this possibility for local.mk.

@kba
Copy link
Member Author

kba commented Jan 6, 2020

Great, thanks, feel free to push directly to https://github.com/kba/ocrd_all/tree/submodules-2020-01-03

@kba
Copy link
Member Author

kba commented Jan 6, 2020

On a semi-related note, we'll need to allow .git and .gitmodules for at least ocrd_olena building docker images or the git submodule calls in ocrd_olena will fail :/

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

Great, thanks, feel free to push directly to https://github.com/kba/ocrd_all/tree/submodules-2020-01-03

Ok, I will. Just found that opencv-python has no build rules so far, checking to get that right as well.

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

On a semi-related note, we'll need to allow .git and .gitmodules for at least ocrd_olena building docker images or the git submodule calls in ocrd_olena will fail :/

Are you referring to the entries in .dockerignore? I thought these would only catch the top-level (ocrd_all)? Otherwise editable mode images would not have worked (ocrd/all:minimal-git etc).

@kba
Copy link
Member Author

kba commented Jan 6, 2020

Otherwise editable mode images would not have worked (ocrd/all:minimal-git etc).

You're right that it will only ignore the top-level .git/.gitmodules but I think all submodules including the recursive ones are hoisted to .git/modules. And in ocrd_olena we do a git submodule sync which then complains that it cannot find ocrd_olena in the root level .git/modules.

make[2]: Entering directory '/build/ocrd_olena'
git submodule sync "repo/olena"
fatal: not a git repository: /build/ocrd_olena/../.git/modules/ocrd_olena
Makefile:36: recipe for target 'repo/olena/configure' failed
make[2]: Leaving directory '/build/ocrd_olena'
make[2]: *** [repo/olena/configure] Error 128
Makefile:46: recipe for target 'deps' failed
make[1]: Leaving directory '/build/ocrd_olena'
make[1]: *** [deps] Error 2
Makefile:181: recipe for target '/usr/local/bin/ocrd-olena-binarize' failed
make: *** [/usr/local/bin/ocrd-olena-binarize] Error 2
The command '/bin/sh -c set -a; bash docker.sh' returned a non-zero code: 2
Makefile:447: recipe for target 'docker-maximum' failed
make: *** [docker-maximum] Error 2

https://circleci.com/gh/OCR-D/ocrd_all/17

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

but I think all submodules including the recursive ones are hoisted to .git/modules

you are absolutely right. So perhaps it's best to just remove these 2 lines from .dockerignore – it only costs a little extra space in build context, the final images are cleared of the build directory anyway (except in editable mode).

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

Just found that opencv-python has no build rules so far, checking to get that right as well.

@stweil can you give me any information on how you were using that submodule? You said you needed it for ARM builds – GNU/Linux or Android/Linux perhaps? Also, did you (have to) build the C++ library from scratch as well, or only the Python bindings? (I am able to build and install a whl file with ENABLE_HEADLESS=1 python opencv-python/setup.py bdist_wheel && pip install dist/opencv_python_headless-*.whl but I am not sure this is what you originally intended, so I am hesitating to forge that into the Makefile.)

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

I decided not to wait for Stefan's answer (since we can still improve opencv-python build later).

Fingers crossed! – yey it helped!

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

Oh @kba, just one more request – it doesn't strictly belong here or in #31 either, but it's not worth a separate issue: Can you please add the editable mode Docker configurations (minimum-git medium-git maximum-git) to your Circle-CI configuration for the master branch and your Dockerhub bridge as well?

@kba
Copy link
Member Author

kba commented Jan 6, 2020

for the master branch

Sure

and your Dockerhub bridge

You actually want to publish those? Is it reasonable to fetch such a docker image that is meant to be edited? I see confusion looming.

But it cannot hurt to set it up.

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

You actually want to publish those? Is it reasonable to fetch such a docker image that is meant to be edited? I see confusion looming.

It's not the Docker image that is editable. It's the git repos inside it. When you run the image, your local container may make modifications. These are not meant for publication on Docker hub, on the contrary: users can then pull from upstream git repos (or make local changes themselves) instead of having to wait for new Docker images.

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

Thanks!

@kba
Copy link
Member Author

kba commented Jan 6, 2020

I understand the concept but I see a certain danger that user will pull docker image, say ocrd/all:maximum-git, instantiate it, make adaptations within that container, pull to update the maximum-git image, create a new container (where are my changes??!1!!), accidentally create her/his containers with --rm -it to edit and delete right after (oops) etc.

Just saying that it is dangerous to treat by-nature ephemeral containers like virtual machines unless you know what you're doing. But until we're documenting this somewhere, it's a non-issue.

@bertsky
Copy link
Collaborator

bertsky commented Jan 6, 2020

create a new container (where are my changes??!1!!), accidentally create her/his containers with --rm -it to edit and delete right after (oops) etc.

Oh, I see what you mean. Or the inevitable docker container prune ...

Yes, (allowing) to deal with git repos gives an impression of persistence, more than any other possible modification. So the increased flexibility comes with a risk here.

Just saying that it is dangerous to treat by-nature ephemeral containers like virtual machines unless you know what you're doing. But until we're documenting this somewhere, it's a non-issue.

Agreed. And if we do, we should mention docker commit for any non-trivial modifications.

@kba
Copy link
Member Author

kba commented Jan 6, 2020

This now works as intended, closing. @stweil feel free to reopen or create a new issue if those changes broke your ARM setup.

@kba kba closed this as completed Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants