CI to publish multiple e-mission-server images from multiple branches on release/commit #752
Comments
Concretely:
@aGuttman, please LMK if you have any questions around this
Just want to acknowledge that I've read this. Will probably have questions soon when I begin actually engaging with it.
@aGuttman I also want to know whether there is any principled reason to put the Dockerfile in the server repo. Because at some point, I think that we may want to split up the monolithic server code. The modules for core code which does not change frequently (aka the storage or the data models) can be pulled into separate python packages. Then the analysis code (aka in https://github.com/e-mission/e-mission-eval-private-data) can pip install it instead of having to check out the server and add it to the PYTHONPATH.

So there may end up being multiple server repos, not just one. So maybe a pre-requisite decision is whether (and how) we will split up the server code.
Keeping the Dockerfile in the server repo solves several problems. It keeps the GitHub actions needed to build and push images simpler and avoids potential branch-confusion issues. Keeping the Dockerfile in a separate repo requires either manual work to initiate the build/push, cross-repository dependent actions, or additional management software.

Cross-repository actions can work, but in addition to the yaml file simply being more complex, they create design decisions around branches that don't have clear best answers. Say branch A in the server repo is updated and we want a new image built and pushed. The docker repo doesn't know anything has changed. We can create a GitHub action in the branch that checks out the docker repo and uses the Dockerfile it gets from that repo to build. What happens when branch B is updated and needs a new image built? It also checks out the docker repo. Should the docker repo have a single Dockerfile that is general enough for all server branches to use, with some secondary configuration file kept in each server branch to deal with the differences? Or should the secondary file be kept in the docker repo? Or should we have different Dockerfiles for each branch, and if so, how do we determine which one gets used? Should the docker repo have different branches that mirror the server branches? If we keep more branch-specific configuration info on the server branches, does that properly separate the responsibilities implied by having a separate docker repo? If more branch-specific configuration info is kept in the docker repo, is a developer working in a server branch responsible for modifying the configuration in another repo as the branch changes? Or when new server branches are made?

If the Dockerfile is kept in the server repo, each branch has its own copy that can be changed as needed without worrying about interactions outside the repo. When a new image is needed, a GitHub action runs and no code needs to be copied from any outside source; the image can be built and pushed right there.

Most discussions online prefer keeping the Dockerfile in the app repo. There are those who advocate having a separate docker repo, but they are usually paired with more automation software and seem to assume that you have one privileged main branch, while the others are just for dev and don't need to be published.

Differentiating the images of different branches by tags works. I prefer build-on-commit since it makes the GitHub actions easier and allows different branches to be updated at their own pace. This is also premised on the idea that commits made to these branches are good (as in, the messy parts of development get dealt with locally, on a branch that we care less about, or in a fork, and then merged in), so that we aren't building and pushing bad images. See the sketch below.

Host images on Docker Hub to support the cloud team's workflow.

If we split up the server code and have the analysis pip installed, would we still want to provide different images for just the server vs. the server with different analysis packages installed? If so, I feel like the idea of keeping a Dockerfile in the server/analysis repo still makes sense. As the analysis is updated, a Dockerfile in that repo can build an image with the latest changes to be pushed for use.
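To make the build-on-commit idea concrete, here is a minimal sketch of the kind of per-branch workflow this would imply. The file name, branch name, secret names, and tag scheme are all assumptions for illustration, not the actual CI:

```yaml
# .github/workflows/image-build-push.yml (hypothetical sketch)
name: docker image for this branch
on:
  push:
    branches: [ master ]   # each branch lists itself here
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Credentials stored as repo secrets; the secret names are assumptions
      - name: Log in to Docker Hub
        run: docker login -u "${{ secrets.DOCKER_USER }}" -p "${{ secrets.DOCKER_PASSWORD }}"
      # The Dockerfile lives in this branch, so no cross-repo checkout is needed
      - name: Build image tagged with the branch name
        run: docker build -t emission/e-mission-server:${GITHUB_REF_NAME} .
      - name: Push the branch-tagged image
        run: docker push emission/e-mission-server:${GITHUB_REF_NAME}
```

Because the workflow and the Dockerfile travel together on each branch, none of the cross-repo questions above arise; each branch is the single source of truth for its own image.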
@aGuttman can you add the original links here for my edification?
Frustratingly, I didn't find a comprehensive resource explaining the way things should be done or the principles involved; it was mostly low-engagement Stack Overflow posts. I am assuming a monorepo because separating the modules didn't come up until later. I did try to address the idea of splitting up the modules in the last paragraph, but honestly I'm not sure I understand the codebase or deployment strategies well enough to know what to say.
Some of the modules in the e-mission-server repository are reused by other repos. As a concrete example, the DB access code is accessed by the public dashboard repo. Right now, we handle that by cloning the entire server repo every time, which seems like overkill. There is no reason for the REST API components to be accessible from a repo that runs jupyter notebooks. An even more apt example here is that the scheduled task instances that will be launched from these images are intended for running modeling tasks. There is no reason that they need access to the REST APIs.

Splitting the storage code out into a library will help with this issue. If we publish the storage library as a pip package, the other repos can depend on just that package. We need to plan out how that will interact with the automatic image creation.
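As a sketch of where this could go (the package name `emission-storage` is purely hypothetical; nothing has been published), a consumer repo such as the public dashboard would then declare a single dependency instead of cloning the whole server:

```sh
# Hypothetical: depend on just the storage layer, not the whole server repo
pip install emission-storage
```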
There are two ways to handle this:
Main question for the first one: "Can we pip install from a local directory?" ETA: Friday
Seems like this is just `pip install -e <local-dir>`?
pip installing from a local directory can be done with `pip install -e <path>`. The difficulty for us is going to be trying to eliminate circular dependencies. For example, it would be nice to break off the entire storage directory, but it has dependencies on (and is depended on by) other parts of the codebase.

If I'm thinking about this correctly (please let me know if I'm not), we would need to break these directories into multiple modules each so that our dependencies form a DAG. This should be possible: since the code works, it doesn't have dependency cycles where imports actually fail. I don't know if this makes sense to do or not. Looking at a few files by hand, it looks messy, and a good analysis to figure this out isn't coming to mind immediately. That said, I have a nagging feeling that I'm overcomplicating this, and that it's actually fine if I think about it from some different perspective; I'm just not sure what. Perhaps just find a bunch of things that we can package into one big module?
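For illustration, here is a minimal `setup.py` of the sort a split-out package would need for `pip install -e` to work. The package name, version, and dependency listed are assumptions, not the real ones:

```python
# setup.py -- minimal sketch for a hypothetical split-out storage package
from setuptools import setup, find_packages

setup(
    name="emission-storage",      # hypothetical package name
    version="0.1.0",              # version numbering lives here
    packages=find_packages(),     # discovers all sub-packages automatically
    install_requires=[
        "pymongo",                # assumed runtime dependency for DB access
    ],
)
```

With this file in place, `pip install -e /path/to/package` installs the package in editable mode, so local edits are picked up without reinstalling.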
@aGuttman just to clarify: the goal here is not to actually modularize the server code right now. That is a fairly tricky change and well outside the scope of this PR (and is tracked in #506 anyway). The goal of this investigation is to make design choices for the CI that don't preclude splitting up the code later. So I would want to basically test out the `pip install` from a local directory.

That + code cleanup are the only two tasks left on this issue. By Tuesday, please:
Ok, I guess I'm confused about what I should show for this. The command works provided you include a `setup.py` file in the directory.

Changes made to the code are used automatically without having to run a reinstall/update. Version numbering is kept in `setup.py`. Moving the code out into a separate directory and installing it this way works, but I don't like the naming scheme for the imports. It feels wrong to me that you access the modules under a different top-level name than in the server repo. Is that adequate information for this task?
I guess the easy thing to do for the naming issue is to just create a dir `emission` inside the package and put the code in it.
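To illustrate the naming issue and the suggested fix (the module path below follows the server's existing import style and may not match the test package exactly):

```python
# Server code imports storage modules under the emission.* prefix, e.g.:
import emission.storage.timeseries.abstract_timeseries as esta

# If the split-out package installs its code under a different top-level
# name (say, storage), every caller would have to be rewritten as:
#   import storage.timeseries.abstract_timeseries as esta
# Nesting the package code under an emission/ directory preserves the
# original import path, so existing callers keep working unchanged.
```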
Yes. Our goal was just to see that the option of putting the Dockerfile into the e-mission-server repo would work even after we (in the future) split out the repo. You can now complete the changes for moving the Dockerfile into the server repo and building it via CI.
Not sure why an even easier solution is not to just change the name to `emission`?
Ah, true. I used a different name for the test package.
@aGuttman any updates on the cleaned up PR to review?
Further cleanup is at
Finally closing this!
In the NREL hosted environment, the cronjobs are run as AWS scheduled tasks.
This causes the containers running them to be created and destroyed for every run.
This implies that the containers should be fast to create.
The analysis tasks are currently created using the `emission/e-mission-server.dev.server-only` images. This allows maximum flexibility, since we clone the server code and set up the related dependencies at run-time. It works fine for a long-running analysis container with scripts that are launched using cronjobs. However, it is a terrible design for scheduled tasks where the container is created on demand. In particular, setting up the dependencies using conda is extremely slow and resource-intensive. When we launched multiple scheduled tasks in parallel on the NREL hosting environment, they consumed the entire cluster.
From @jgu2
So we really need to create images with the conda dependencies and the server code preinstalled. The scheduled tasks can then spin up the container and execute the task quickly.
Our default Dockerfile currently does this:
https://github.com/e-mission/e-mission-docker/blob/master/Dockerfile

And we currently push it to the `emission/e-mission-server` dockerhub image:
https://github.com/e-mission/e-mission-docker/blob/master/README.md#docker-build-instructions
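For reference, a minimal sketch of what such a prebuilt image looks like; the base image, paths, branch, and command below are illustrative assumptions — see the linked Dockerfile for the real version:

```dockerfile
FROM continuumio/miniconda3

RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app
# Bake the server code into the image at build time instead of cloning on start
RUN git clone --single-branch --branch master \
    https://github.com/e-mission/e-mission-server.git .
# Resolve the conda dependencies once, at build time; this is the slow,
# resource-intensive step that should not happen on every container start
RUN conda env update --name emission --file setup/environment36.yml
# Placeholder command: a scheduled task container can now start and run immediately
CMD ["/bin/bash", "-c", "source activate emission && ./e-mission-py.bash bin/intake_multiprocess.py"]
```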
However, we need to support better automation and flexibility.
@aGuttman can you handle this task in coordination with @jgu2?