Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify shared dependencies #38

Merged
merged 11 commits into from
May 31, 2021
Merged

Unify shared dependencies #38

merged 11 commits into from
May 31, 2021

Conversation

mgorsk1
Copy link
Contributor

@mgorsk1 mgorsk1 commented May 22, 2021

Signed-off-by: at91mm mariusz.gorski@ing.com

Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>

## Alternatives

- Using pip functionality `constraint` but I've identified this as less intuitive for end users as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with python packaging to have an answer here, but it is a bit of a shame we can't use constraint files -- it feels like the "correct" solution. But I do agree it's not intuitive, and even if we published build scripts everywhere, many people wouldn't use them because there is an expectation you can just pip install -r requirements and be off to the races

If instead of including requirements.txt, we only used setup.py deps + constraint file, i wonder if it's possible to automatically bring in the constraint file in setup.py or setup.cfg ?

Copy link
Contributor Author

@mgorsk1 mgorsk1 May 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I know (and from research I did) there is no way to make setup.py constraints aware. what you would do in such case is just add shared dependencies to install_requires without pinning versions and then, when doing pip install you supply pinned versions using --constraint parameter. what's important is that --constraint doesn't force installation of additional deps - if something is listed in constraints file but not in install_requires then it won't be installed

Copy link
Contributor Author

@mgorsk1 mgorsk1 May 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be completely precise - I also was thinking about making clearing setup.py files out of deps and pull them by reading requirements*txt files. again, it would be super easy for anything apart databuilder

Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
@mgorsk1 mgorsk1 marked this pull request as ready for review May 24, 2021 07:30
@mgorsk1 mgorsk1 requested a review from a team as a code owner May 24, 2021 07:30
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
Copy link

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great idea. I have one suggestion by splitting out the test-dependencies using pre-commit. LMYWYT

rfcs/038-unify-shared-dependencies.md Show resolved Hide resolved
Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
5. Create `requirements-dev.txt` file in root folder of amundsen repo containing pinned dependencies for shared dev libraries (mypy, flake8, etc.)
6. Remove libraries listed in `requirements-dev.txt` from `requirements.txt` of each subdirectory
7. Create symbolic link to `requirements-common.txt` and `requirements-dev.txt` in each subdirectory
8. Shift context of building docker images to root folder (otherwise we cannot share `requirements-common.txt` and `requirements-dev.txt` symlinks across subdirectories)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of shifting the build context up one directory makes sense, it also gives you the ability to copy in the amundsen common that is on the same brach (not in pypi) and copy it into each image. This could also give the benefit that we wouldn't need to add new versions of amundsen-common to pypi.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we go about pypi installation of amundsen services if common is not a pypi pckage? I think not all users use pre-built images.

Moreover, we have introduced apache atlas utils recently as a part of common and will be relying on them for both databuilder and metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pypi installation of search, metadata and frontend really that common of use-case? I get that we wouldn't want to pull the plug on that user group, and by no means am I against keeping it in pypi and just overriding it during the build process, but it just feels like one of those areas where we shouldn't need to have the common directory in pypi any more since it's now in the same branch - mostly thinking out loud here.

@mgorsk1 mgorsk1 added Status: Final Comment Period (FCP) On final comment period (seven days) and removed Status: Active labels May 24, 2021
@verdan
Copy link
Member

verdan commented May 24, 2021

@mgorsk1 how would that work with the PyPi versions of Amundsen packages?
PyPi requires the files needed for the setup.py in the build, that means we need to include the requirements-common.txt and other common requirements file within the build for each package.

@mgorsk1
Copy link
Contributor Author

mgorsk1 commented May 24, 2021

That's exactly how it would work. The requirements-dev and requirements-common files would be a part of artifacts built by sdist and bdist. It should be possible with proper manifest definitions (assuming your concern is over symlinks resolution?)

Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
@feng-tao
Copy link
Member

how about databuilder? currently the model is defined in both common and databuilder which we should eventually make databuilder depends on common model to make it consistent to avoid model drifting.

@mgorsk1
Copy link
Contributor Author

mgorsk1 commented May 28, 2021

In principle I agree that long term there should not even be models folder in any of the project except for the common. This would be however a major refactor from a different angle than I had in mind for this RFC. I will add this to future possibilities.

Copy link
Member

@verdan verdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me and really excited about this 🙂
I don't know the technical challenges regarding the PyPi release yet and adding links to files outside the root directory, but more than happy to at least give this a try.
The next step would be to remove all the local models from each package and simply rely completely on common models.

@mgorsk1
Copy link
Contributor Author

mgorsk1 commented May 28, 2021

Nice, thanks! I've already started to draft something under amundsen-io/amundsen#1163 , seems most major things are resolved - only testing of pypi artifacts and building docker images remains.

Signed-off-by: mgorsk1 <gorskimariusz13@gmail.com>
@mgorsk1 mgorsk1 merged commit 8a0d170 into master May 31, 2021
@mgorsk1 mgorsk1 added Status: Landed The proposed changes are shipped in an actual release and removed Status: Final Comment Period (FCP) On final comment period (seven days) labels Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Landed The proposed changes are shipped in an actual release
Projects
None yet
7 participants