Feature deployment ordering & Environmental Variables #54
Conversation
Thanks @ThomasThelen! I'll have a look this week.
This looks really good @ThomasThelen, thanks. The wait/sleep approach was exactly what I was thinking would work well here.
I made a few comments you can take or leave.
Re: Testing, I don't think it's ideal to have to run the tests in a container. But do they all pass there? I ran pytest locally and got just two failures:
```
================================================= short test summary info =================================================
FAILED tests/test_client.py::test_client - rq.connections.NoRedisConnectionException: Could not resolve a Redis connection
FAILED tests/test_eml220_processor.py::test_production_eml - httpx.ConnectError: [Errno 8] nodename nor servname provide...
============================================== 2 failed, 65 passed in 9.29s ===============================================
```
The errors just look like missing or wrong hostnames, so maybe some minor tweaking would get things passing?
```diff
@@ -19,7 +19,10 @@ spec:
   image: slinkythegraph/slinky
   imagePullPolicy: Always
   command: ["/bin/sh","-c"]
-  args: ["slinky schedule --prod; rqscheduler --host redis -i 10"]
+  args: ["slinky schedule; rqscheduler --host redis -i 10"]
```
Is it possible to use the value in REDIS_HOST, or just not specify it at all and rely on the environmental variables we're already setting?
Nice catch! Done in c921777.
```diff
@@ -20,7 +20,10 @@ spec:
   image: slinkythegraph/slinky
   imagePullPolicy: Always
   command: ["slinky"]
-  args: ["work", "default", "--prod"]
+  args: ["work", "default", "--debug"]
```
Same comment here as above in the worker-dataset deployment
```dockerfile
    curl
ADD enable_update.sh /enable_update.sh
ENV DBA_PASSWORD "dba"
```
This comment is outside the scope of this review I think, but having the password hardcoded here seems like a future problem. Do you have any good ideas for handling this secret? Maybe filing an issue is a good place to start, unless there's already one.
It definitely leaves room for errors during deployment. The problem is that the secret needs to be shared across developers that are building this image, which means some sort of overarching system where the build takes place. For the moment I think Keybase works well enough; this image should be built rarely anyways.
> The problem is that the secret needs to be shared across developers that are building this image, which means some sort of overarching system where the build takes place.
Does it? For example, the Virtuoso image has a default password and lets you override it during container creation, i.e., customization of the password isn't baked into the image. I think the change here is just removing the ENV statement from both Virtuoso Dockerfiles and injecting it at runtime from config. I imagine we're already doing that?
Running the tests in a container is kludgy, but I don't see any other alternative for anyone who can't get the RDF package installed locally. I think that it should be runnable both in the container and outside of the container, given that the REDIS_HOST, REDIS_PORT, VIRTUOSO_HOST, and VIRTUOSO_PORT environmental variables are set to the right locations.
Actually I take that back. d1lod/tests/conftest.py has hardcoded service names that need to get changed to env vars.
Right, if you can't get the deps installed you can't run the tests. That's fine. To your last statement, I'd really like to be able to run the tests without having to set environmental variables. It makes me think we're missing a default or two somewhere. Is that what you're having to do?
The defaults are set to production values, which are service names. I left the redis default at None, which is the local graph store though. One problem I'm seeing is in d1lod/tests/conftest.py with hardcoded service names. The issue is that we run two graphs in tests, but only have one environmental variable for the graph host (and if we want to run these on localhost we want to say the host is on localhost, not at http://blazegraph). I can add a second environmental variable, BLAZEGRAPH_HOST, but then we have unittest requirements trickling up into the deployment logic.
Gotcha, thanks for explaining. I think it generally works better and follows norms if defaults are for local development, the reasoning being that local development is the same for everyone and it's production that varies. It also makes it faster to get started and easier to do various tasks if development setup steps are minimal (e.g., I open up ~6 separate shells while developing, and setting global or per-shell env vars is just more work).

Re: Helping the tests pick the right graph store: it doesn't feel like we'd need another env var, so I agree with you there. Both stores could use the same env var, have appropriate sensible defaults, and their tests could manage getting them the environment variables they need (which I think should be zero).
The issue is that both stores need their own environmental variable for tests. For production we only run one graph, so we have the single environmental variable. This causes problems in testing because we run two graphs, and since the second graph can be either at localhost:1234 or http://blazegraph:1234 we need a way to specify which. Since they both run under the same instance, they can't share the same address. I can set the default to localhost so that it works for local tests, but then dockerized tests will fail.
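One hypothetical way to reconcile this, sketched below (the fixture names, ports, and endpoint paths are placeholders, not the actual conftest.py): give each store its own env var with a localhost default, and have the dockerized test setup export the service names.

```python
import os

import pytest

# Hypothetical sketch, not the real d1lod/tests/conftest.py: each graph store
# reads its own env var and falls back to localhost, so local runs need no
# setup while the docker-compose test container sets the service names.

@pytest.fixture
def virtuoso_endpoint():
    host = os.environ.get("VIRTUOSO_HOST", "localhost")
    port = os.environ.get("VIRTUOSO_PORT", "8890")  # assumed default port
    return f"http://{host}:{port}/sparql"

@pytest.fixture
def blazegraph_endpoint():
    host = os.environ.get("BLAZEGRAPH_HOST", "localhost")
    port = os.environ.get("BLAZEGRAPH_PORT", "8080")  # assumed default port
    return f"http://{host}:{port}/bigdata/sparql"  # path is illustrative
```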
In ebf2c1f I changed the default network location in …
Great, thanks @ThomasThelen. Do you want me to review again?
I'd appreciate it if you would.
I ran into two things: …
The redis issue should be resolved in e6a62be. The default store for the cli is now LocalStore, but it can be overridden to use the environmental variables with … To reproduce, …
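A rough sketch of that default-store selection (the class stub, function name, and endpoint below are placeholders, not the PR's actual code):

```python
import os

class LocalStore:
    """Stand-in for the project's LocalStore (an in-process graph store)."""

def pick_store():
    # Placeholder selection logic matching the behavior described above:
    # default to LocalStore; only target a networked graph store when the
    # environment explicitly names a host.
    host = os.environ.get("VIRTUOSO_HOST")
    if host is None:
        return LocalStore()
    port = os.environ.get("VIRTUOSO_PORT", "8890")
    # A real implementation would construct a SPARQL-backed store here;
    # the endpoint URL below is illustrative only.
    return f"http://{host}:{port}/sparql"
```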
Thanks for the continued work on this @ThomasThelen. For commands like … If the above commands all produce stack traces without … PS: I made one PR comment that might be a culprit. Does that change any of the above output?
Alright, with 57dd95e, using … Tested using docker-compose and deployed to dev.
Thanks @ThomasThelen. I tested this out and I had to make two changes, d6b8067 and 7033442, to make the tests pass locally. They look minor and it doesn't look like they'll affect anything but local development.
Fixes #53
Fixes #51
Deployment Startup Order
This PR should fix any issues around startup order, meaning that we can start the scheduler, workers, database, etc. in any order, and the services that depend on others will wait for them. Concretely, the scheduler and workers both wait for redis to come online before executing any job logic. The workers also wait for Virtuoso to come online once a connection to Redis is established, to prevent any premature SPARQL inserts.
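The waiting behavior might look roughly like the following sketch (assuming the redis-py client that rq uses; the function name and retry structure are illustrative, not the PR's actual code):

```python
import time

import redis  # redis-py client

def wait_for_redis(host, port, interval=30):
    """Block until Redis answers a PING, retrying every `interval` seconds."""
    while True:
        try:
            conn = redis.Redis(host=host, port=port)
            conn.ping()  # raises ConnectionError until the service is up
            return conn
        except redis.exceptions.ConnectionError:
            time.sleep(interval)
```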
This waiting allows us to decouple startup order from the Kubernetes deployment, resulting in a few changes to the deployment files. I left the redis and virtuoso readiness probes in; I don't think there's much harm in letting other containers or developers know that they're not ready for interaction (note that the pods are in the Ready state while the code is waiting on redis).
The deployment order is invariant with respect to readiness probes. The initContainer on the scheduler, however, does dictate startup behavior (it won't run the scheduler ENTRYPOINT until redis comes online). Since this logic is now in the scheduler itself, I removed the initContainer.

Environmental Variables
This PR also adds a few environmental variables for specifying hosts and ports. These are stored in the settings; this was done in ea6feb4.
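A minimal sketch of what env-var-backed settings like these can look like (the defaults and module layout here are assumptions, not the PR's literal code):

```python
import os

# Defaults favor local development; production overrides them through the
# deployment's environment (e.g., the Kubernetes pod spec).
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
VIRTUOSO_HOST = os.environ.get("VIRTUOSO_HOST", "localhost")
VIRTUOSO_PORT = int(os.environ.get("VIRTUOSO_PORT", "8890"))
```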
Production/Development/CLI Environments
The ENVIRONMENTS dict was partially deprecated in the changes above and is fully removed in 3ff13e3. The SlinkyClient constructor default parameters now come from the environmental variables. To match this change, the calls to SlinkyClient in the cli were refactored.

Changes to Testing
I think that the ideal testing workflow is achieved through setting up pipenv before running pytest to isolate all of the dependencies. The RDF module requirement forces us to break this model because of the manual building & installing that needs to happen for its dependencies. This PR includes an additional container running the d1lod image alongside the graph store in the docker-compose file. This container has all of the test dependencies and can run the unit tests. The pain point is that you have to docker exec in to run them. I could just call pytest in the entrypoint, but the container will exit before the test logs can be checked.

It's not a perfect system, but it at least provides a reliable way to run the tests on any system, and I'm open to other options/routes. There are some changes in the unit test fixtures that change the default blazegraph and virtuoso endpoints to the docker service name rather than localhost.
Testing
Testing for this mostly revolved around deploying things in different orders and making sure that the job system still functioned.
Scheduler, Redis, & Worker Tests
I did some manual testing, bringing the three services up in different orders, which are bulleted below. In each case I confirmed with the logs that services waited for redis to come online. Because the retry timeout is set to 30 seconds, give the tests at least a minute before the thumbs up/down.
Virtuoso Startup Tests
Remaining Testing