Feature deployment ordering & Environmental Variables #54

ThomasThelen · 2021-11-18T02:22:30Z

Fixes #53
Fixes #51

Deployment Startup Order

This PR should fix any issues around startup order, meaning that we can start the scheduler, workers, database, etc in any order and the services that need particular services to come online will wait. What this looks like is the scheduler and workers both waiting for redis to come online before executing any job logic. The workers also wait for Virtuoso to come online once a connection to Redis is established to prevent any premature SPARQL inserts.

This allows us to decouple startup order from the kubernetes deployment, resulting in a few changes to the deployment files. I left the redis and virtuoso readiness probes in. I don't think there's much harm in letting other containers or developers know that they're not ready for interaction (note that the pods are in the Ready state while the code is waiting on redis).

The deployment order is invariant with respect to readiness probes. The initContainer on the scheduler however, does dictate startup behavior (it won't run the scheduler ENTRYPOINT until redis comes online). Since this logic is now in the scheduler itself, I removed the initContainer.

Environmental Variables

This PR also has the addition of a few environmental variables for specifying hosts and ports. These are stored in the settings and was done in ea6feb4.

Production/Development/CLI Environments

The ENVIRONMENTS dict was partially deprecated in the changes above and is fully removed in 3ff13e3. The SlinkyClient constructor default parameters now come from the environmental variables. To match this change, the calls to SlinkyClient in the cli were refactored.

Changes to Testing

I think that the ideal testing workflow is achieved through setting up pipenv before running pytest to isolate all of the dependancies. The RDF module requirement forces us to break this model because of the manual building & installing that needs to happen for its dependencies. This PR includes an additional container running the d1lod image alongside the graph store in the docker-compose file. This container has all of the test dependencies and can run the unit tests. The pain point is that you have to docker exec in to run them. I could just call pytest in the entrypoint - but the container will exit before the test logs can be checked.

It's not a perfect system, but it at least provides a reliable way to run the tests on any system and I'm open to other options/routes. There are some changes in the unit test fixtures that change the default blazegraph and virtuoso endpoints to the docker service name rather than localhost.

Testing

Testing for this mostly revolved around deploying things at different orders and making sure that the job system still functioned.

Scheduler, Redis, & Worker Tests

I did some manual testing, bringing the three services up in different orders which are bulleted below. In each case I confirmed with the logs that services waited for redis to come online. Because the retry timeout is set for 30 seconds, give the tests at least a minute before the thumbs up/down.

Virtuoso Startup Tests

Workers, Virtuoso
Virtuoso, Workers

Remaining Testing

Docker compose
Local tests
CLI test

…ting

…vars in their place

… can use it

amoeba · 2021-12-02T02:10:53Z

Thanks @ThomasThelen! I'll have a look this week.

amoeba

This looks really good @ThomasThelen, thanks. The wait/sleep approach was exactly what I was thinking would work well here.

I made a few comments you can take or leave.

Re: Testing, I don't think it's ideal to have to run the tests in a container. But do they all pass there? I ran pytest locally and got just two failures:

================================================= short test summary info =================================================
FAILED tests/test_client.py::test_client - rq.connections.NoRedisConnectionException: Could not resolve a Redis connection
FAILED tests/test_eml220_processor.py::test_production_eml - httpx.ConnectError: [Errno 8] nodename nor servname provide...
============================================== 2 failed, 65 passed in 9.29s ===============================================

The errors just look like missing or wrong hostnames so maybe some minor tweaking would get things passing?

amoeba · 2021-12-07T00:55:04Z

deploy/deployment/scheduler-deployment.yaml

@@ -19,7 +19,10 @@ spec:
            image: slinkythegraph/slinky
            imagePullPolicy: Always
            command: ["/bin/sh","-c"]
-            args: ["slinky schedule --prod; rqscheduler --host redis -i 10"]
+            args: ["slinky schedule; rqscheduler --host redis -i 10"]


Is it possible to use the value in REDIS_HOST or just not specify it all and rely on the environmental variables we're already setting?

Nice catch-done in c921777.

deploy/deployment/worker-dataset-deployment.yaml

amoeba · 2021-12-07T00:56:46Z

deploy/deployment/worker-default-deployment.yaml

@@ -20,7 +20,10 @@ spec:
          image: slinkythegraph/slinky
          imagePullPolicy: Always
          command: ["slinky"]
-          args: ["work", "default", "--prod"]
+          args: ["work", "default", "--debug"]


Same comment here as above in the worker-dataset deployment

amoeba · 2021-12-07T00:58:39Z

deploy/docker/initContainer/Dockerfile

+  curl
+
+ADD enable_update.sh /enable_update.sh
+ENV DBA_PASSWORD "dba"


This comment is outside of the scope of this review I think but having the password hardcoded here seems like a future problem. Do you have any good ideas for handling this secret? Maybe filing an issue is a good place to start unless there's already one.

It definitely leaves room for errors during deployment. The problem is that the secret needs to be shared across developers that are building this image-which means some sort of overarching system where the build takes place. For the moment I think Keybase works well enough-this image should be built rarely anyways.

The problem is that the secret needs to be shared across developers that are building this image-which means some sort of overarching system where the build takes place.

Does it? For example, the Virtuoso image has a default password and lets you override it during container creation. i.e., customization of the password isn't baked into the image. I think the change here is just removing the ENV statement from both Virtuoso Dockerfiles and injecting it at runtime from config. I imagine we're already doing that?

Looks like I forgot to remote the initContainer folder, done here aef613b
I also removed the virtuoso password from the dockerfile (which is overridden by the one in the k8 deployment file and docker-compose). Done in 65b3f33

ThomasThelen · 2021-12-08T21:06:42Z

Running the tests in a container is kludgy-but I don't see any other alternative for anyone that can't get the RDF package installed locally. I think that it should be runnable in both the container and outside of the container, given that the REDIS_HOST, REDIS_PORT, VIRTUOSO_HOST, and VIRTUOSO_PORT environmental variables are set to the right locations.

From the docker image,

ThomasThelen · 2021-12-08T21:15:27Z

Actually I take that back. d1lod/tests/conftest.py has hardcoded service names that need to get changed to env vars

amoeba · 2021-12-08T21:16:34Z

Right, if you can't get the deps installed you can't run the tests. That's fine. To your last statement, I'd really like to be able to run the tests without having to set environmental variables. It makes me think we're missing a default or two somewhere. Is that what you're having to do?

ThomasThelen · 2021-12-08T21:21:07Z

The defaults are set to production values, which are service names. I left the redis default to None, which is the local graph store though.

One problem I'm seeing is in d1lod/tests/conftest.py with hardcoded service names. The issue is that we run two graphs in tests, but only have one environmental variable for the graph host (and if we want to run these on localhost we want to say the the host is on localhost not at http://blazegraph). I can add a second environmental variable, BLAZEGRAPH_HOST - but then we have unittest requirements trickling up into the deployment logic.

amoeba · 2021-12-08T21:31:58Z

Gotcha. Thanks for explaining.

I think it generally works better and follows norms if defaults are for local development. Reasoning being that local development is the same for everyone and its production that varies. Also makes it faster to get started and easier to do various tasks if development setup steps are minimal (e.g., I open up ~6 separate shells while developing and setting global env vars or per-shell env vars is just more work).

Re: Helping the tests pick the right graph store: It doesn't feel like we'd need another env var so I agree with you there. Both Stores could use the same env var, have appropriate sensible defaults, and have their tests could manage getting them the environment variables they need (which I think should be zero).

ThomasThelen · 2021-12-08T21:42:54Z

Re: Helping the tests pick the right graph store: It doesn't feel like we'd need another env var so I agree with you there. Both Stores could use the same env var, have appropriate sensible defaults, and have their tests could manage getting them the environment variables they need (which I think should be zero).

The issue is that both stores need their own environmental variable for tests. For production we only run one graph, so we have the single environmental variable. This causes problems in testing because we run two graphs-and since the second graph can be either at localhost:1234 or http://blazegraph:1234 we need a way to specify. Since they both run under the same instance, they can't share the same address. I can set the default to localhost so that it works for local tests, but then dockerized tests will fail.

ThomasThelen · 2021-12-09T21:03:45Z

In ebf2c1f I changed the default network location in settings.py to what I think they should be for running things locally. In the docker compose and kubernetes deployment files, they're set to the networked version (http://virtuoso, http://blazegraph, etc).

ThomasThelen · 2021-12-15T00:44:25Z

@amoeba Commit e3a8e62 should remove the failing unit test that we encountered earlier with the Graph class.

amoeba · 2021-12-15T01:06:20Z

Great, thanks @ThomasThelen. Do you want me review again?

ThomasThelen · 2021-12-15T01:13:45Z

Great, thanks @ThomasThelen. Do you want me review again?

I'd appreciate it if you would.

amoeba · 2021-12-15T01:31:29Z

I ran into two things:

pytest fails with on failure: tests/test_client.py::test_client (rq.connections.NoRedisConnectionException: Could not resolve a Redis connection). Do you see all the tests passing on your end? I'm running pytest directly and not containerized.
When virtuoso and redis are not running, slinky get fails with connection errors. This is technically a regression but it's not a total deal-breaker, I found it handy to have slinky get not depend on an external services since Docker and such can take so long to start up and I'm constantly stopping it. This could be addressed after we merge unless you see an easy way to enable it that still fits within the spirit of this PR.

ThomasThelen · 2021-12-15T20:50:14Z

I have all the tests working while containerized-I'll try and take a look to see what's not working locally.
I can adda flag to the SlinkyClient to use the in-memory store rather than a running service. This can be used by the cli get method to use the local store.

ThomasThelen · 2021-12-26T22:38:42Z

The redis issue should be resolved in e6a62be

The default store for the cli is now LocalStore, but can be overridden to use the environmental variables with --local False.
I ran across a few issues while testing with the LocalStore; I think(I cant test previous versions without the dockerization) that these were pre-existing conditions; I can file issues around each of these if they're in fact bugs that .

To reproduce,

Clone & checkout
Build the d1lod image
Modify the docker-compose file to use the local image build in step 2
run docker-compose up
docker exec into the image
Run the commands below
Run them again with --local false ie slinky insert --local false xyz and see them work

slinky insert urn:uuid:0910e8cf-9a67-43e1-b257-ebd4e3d66bca

root@ce762e064de4:/tmp# slinky insert urn:uuid:0910e8cf-9a67-43e1-b257-ebd4e3d66bca
Traceback (most recent call last):
  File "/usr/local/bin/slinky", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/cli.py", line 221, in insert
    client.process_dataset(id)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/client.py", line 57, in process_dataset
    self.store.insert_model(model)
TypeError: insert_model() missing 1 required positional argument: 'model'

slinky clear

root@ce762e064de4:/tmp# slinky clear
Traceback (most recent call last):
  File "/usr/local/bin/slinky", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/cli.py", line 142, in clear
    old_size = client.store.count()
AttributeError: type object 'LocalStore' has no attribute 'count'

slinky count will throw the same

Redis<ConnectionPool<Connection<host=redis,port=6379,db=0>>>
Traceback (most recent call last):
  File "/usr/local/bin/slinky", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/cli.py", line 159, in count
    print(client.store.count())
AttributeError: type object 'LocalStore' has no attribute 'count'

slinky insertall (scroll right)

root@ce762e064de4:/tmp# slinky insertall
  0%|                                                                                                                                                                                                                 | 0/130 [00:00<?, ?it/s]Failed to insert doi:10.5063/F16971VP due to query() missing 1 required positional argument: 'query_text'
  1%|█▌                                                                                                                                                                                                       | 1/130 [00:00<00:54,  2.38it/s]Failed to insert doi:10.5063/F1XS5SP4 due to query() missing 1 required positional argument: 'query_text'
  2%|███                                                                                                                                                                                                      | 2/130 [00:00<00:54,  2.36it/s]Failed to insert doi:10.5063/F1T151XN due to query() missing 1 required positional argument: 'query_text'
  2%|████▋                                                                                                                                                                                                    | 3/130 [00:01<00:58,  2.15it/s]Failed to insert urn:uuid:230592c4-cf17-46e7-834b-26fd32731f56 due to query() missing 1 required positional argument: 'query_text'
  3%|██████▏                                                                                                                                                                                                  | 4/130 [00:01<00:47,  2.63it/s]Failed to insert doi:10.5063/F1R20ZMM due to query() missing 1 required positional argument: 'query_text'
  4%|███████▋                                                                                                                                                                                                 | 5/130 [00:02<01:03,  1.98it/s]Failed to insert doi:10.5063/F1BV7DWF due to query() missing 1 required positional argument: 'query_text'
  5%|█████████▎                                                                                                                                                                                               | 6/130 [00:02<00:59,  2.10it/s]Failed to insert doi:10.5063/F1736P5F due to query() missing 1 required positional argument: 'query_text'
  5%|██████████▊                                                                                                                                                                                              | 7/130 [00:03<01:04,  1.92it/s]Failed to insert doi:10.5063/F13B5XD8 due to query() missing 1 required positional argument: 'query_text'
  6%|████████████▎                                                                                                                                                                                            | 8/130 [00:03<01:04,  1.90it/s]Failed to insert doi:10.5063/F1ZK5DXZ due to query() missing 1 required positional argument: 'query_text'

amoeba · 2022-01-04T04:01:00Z

Thanks for the continued work on this @ThomasThelen.

For commands like clear, count, insert, and insertall, using a the LocalStore doesn't really make sense to me since LocalStore is ephemeral (i.e., thrown away when the command exits). Their intended target is going to be some persistent triple store. On the other hand, get only makes sense with a LocalStore because it's generating triples on the fly.

If the above commands all produce stack traces without --local false, I think we're making the CLI harder to use. Do you think the flag is necessary or could the code just follow the logic in my first paragraph here? Is the explicitness of the flag better? If you do think a flag is necessary, maybe it could be --remote instead of --local false since I think the former pattern is more common for boolean flags.

PS: I made one PR comment that might be a culprit. Does that change any of the above output?

ThomasThelen · 2022-01-18T21:20:07Z

Alright with 57dd95e, using get will always use the local store. All other commands no longer have the option for using LocalStore. If there's still issues, maybe a complete documentation of what each command is and how it's supposed to be used will help.

Tested using docker-compose and deployed to dev.

Ref #54

amoeba · 2022-02-24T22:24:04Z

Thanks @ThomasThelen. I tested this out and I had two make two changes, d6b8067 and 7033442 to make the tests pass locally. They look minor and it doesn't look like they'll affect anything but local development.

ThomasThelen added 5 commits November 17, 2021 18:00

Make the worker and scheduler wait for redis and virtuoso before star…

6da693c

…ting

Add more deployment options to the makefile

66f1f3f

Use a configMap for storing the networking environmental variables

ea6feb4

Remove the cli, development, and production environemnts and use env …

3ff13e3

…vars in their place

Generalize the graph database endpoint so that others like blazegraph…

bfb607b

… can use it

ThomasThelen changed the title ~~[WIP] Feature deployment ordering~~ Feature deployment ordering & Environmental Variables Nov 18, 2021

ThomasThelen requested a review from amoeba December 1, 2021 23:54

amoeba suggested changes Dec 7, 2021

View reviewed changes

ThomasThelen added 2 commits December 8, 2021 13:10

Remove initContainer

aef613b

Add missing unit test changes

dd21017

Fix import

666e0e0

ThomasThelen added 5 commits December 8, 2021 13:44

Remove virtuoso env var from the dockerfile

65b3f33

Use the REDIS_HOST env var for running the scheduler

c921777

Remove --debug flags

d53fafa

Update the readme with dockerized testing instructions

0be270f

Add BLAZEGRAPH_ environmental variables for unit testing

ebf2c1f

ThomasThelen mentioned this pull request Dec 9, 2021

Support Different Graphs in Production #58

Open

Remove the legacy 'Graph' class

e3a8e62

ThomasThelen added 3 commits December 16, 2021 10:54

Change the default Redis location to localhost

e6a62be

Remove unused reference to the Graph class in tests

def0f9c

Add a flag to the cli arguments to enable using LocalStore

3d82991

Always use localstore for 'get'

57dd95e

amoeba added 2 commits February 24, 2022 13:21

Force CLI's get method to use local RDF store

d6b8067

Ref #54

Remove http:// from REDIS_HOST fallback value

7033442

amoeba merged commit 243a823 into develop Feb 24, 2022

This was referenced Feb 24, 2022

Configure networking via env vars #51

Closed

Deployment ordering #53

Closed

ThomasThelen mentioned this pull request Apr 28, 2022

Remove legacy code #55

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature deployment ordering & Environmental Variables #54

Feature deployment ordering & Environmental Variables #54

ThomasThelen commented Nov 18, 2021 •

edited

Loading

amoeba commented Dec 2, 2021

amoeba left a comment

amoeba Dec 7, 2021

ThomasThelen Dec 9, 2021

amoeba Dec 7, 2021

amoeba Dec 7, 2021

ThomasThelen Dec 8, 2021

amoeba Dec 8, 2021

ThomasThelen Dec 8, 2021

ThomasThelen commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

amoeba commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

amoeba commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

ThomasThelen commented Dec 9, 2021

ThomasThelen commented Dec 15, 2021

amoeba commented Dec 15, 2021

ThomasThelen commented Dec 15, 2021

amoeba commented Dec 15, 2021

ThomasThelen commented Dec 15, 2021

ThomasThelen commented Dec 26, 2021 •

edited

Loading

amoeba commented Jan 4, 2022

ThomasThelen commented Jan 18, 2022 •

edited

Loading

amoeba commented Feb 24, 2022

Feature deployment ordering & Environmental Variables #54

Feature deployment ordering & Environmental Variables #54

Conversation

ThomasThelen commented Nov 18, 2021 • edited Loading

Deployment Startup Order

Environmental Variables

Production/Development/CLI Environments

Changes to Testing

Testing

Scheduler, Redis, & Worker Tests

Virtuoso Startup Tests

Remaining Testing

amoeba commented Dec 2, 2021

amoeba left a comment

Choose a reason for hiding this comment

amoeba Dec 7, 2021

Choose a reason for hiding this comment

ThomasThelen Dec 9, 2021

Choose a reason for hiding this comment

amoeba Dec 7, 2021

Choose a reason for hiding this comment

amoeba Dec 7, 2021

Choose a reason for hiding this comment

ThomasThelen Dec 8, 2021

Choose a reason for hiding this comment

amoeba Dec 8, 2021

Choose a reason for hiding this comment

ThomasThelen Dec 8, 2021

Choose a reason for hiding this comment

ThomasThelen commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

amoeba commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

amoeba commented Dec 8, 2021

ThomasThelen commented Dec 8, 2021

ThomasThelen commented Dec 9, 2021

ThomasThelen commented Dec 15, 2021

amoeba commented Dec 15, 2021

ThomasThelen commented Dec 15, 2021

amoeba commented Dec 15, 2021

ThomasThelen commented Dec 15, 2021

ThomasThelen commented Dec 26, 2021 • edited Loading

amoeba commented Jan 4, 2022

ThomasThelen commented Jan 18, 2022 • edited Loading

amoeba commented Feb 24, 2022

ThomasThelen commented Nov 18, 2021 •

edited

Loading

ThomasThelen commented Dec 26, 2021 •

edited

Loading

ThomasThelen commented Jan 18, 2022 •

edited

Loading