/healthcheck endpoint should check for Elasticsearch availability #487

aldenstpage · 2020-05-06T18:45:48Z

During deployments, our load balancer repeatedly polls the /healthcheck endpoint to check that the server is reachable. If this check succeeds, the newly deployed instance starts receiving production traffic. Right now, if Elasticsearch is not responsive, /healthcheck will still return 200 OK.

The healthcheck endpoint should check the health of the image index in Elasticsearch using the cluster health API. If it is unavailable, return error 500. Log an informative message explaining why the healthcheck failed.

Because the healthcheck endpoint may be called many times, and Elasticsearch calls are not free, we should cache the Elasticsearch response for up to 10 seconds.
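The caching requirement above could be sketched roughly as follows. This is only an in-process illustration, not the project's implementation (later comments in this thread mention Redis being used instead). `CachedHealthCheck`, `probe`, and `clock` are hypothetical names introduced for the example; `probe` stands in for whatever function actually queries Elasticsearch.

```python
import logging
import time

log = logging.getLogger(__name__)

CACHE_TTL_SECONDS = 10  # from the issue: cache for up to 10 seconds


class CachedHealthCheck:
    """Caches the result of an expensive health probe for a short time."""

    def __init__(self, probe, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._probe = probe    # hypothetical callable returning (healthy, reason)
        self._ttl = ttl
        self._clock = clock    # injectable so the expiry logic can be tested
        self._cached = None    # last (healthy, reason) result, if any
        self._expires_at = 0.0

    def status_code(self):
        """Return 200 if the cached or fresh probe is healthy, else 500."""
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Cache miss or expired entry: run the real probe once.
            self._cached = self._probe()
            self._expires_at = now + self._ttl
        healthy, reason = self._cached
        if not healthy:
            log.error("Healthcheck failed: %s", reason)
            return 500
        return 200
```

A cache like this lives in a single worker process, so a deployment with several workers would probe Elasticsearch once per worker per TTL window; a shared store avoids that.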


madewithkode · 2020-05-08T13:47:52Z

Hi Alden, this looks interesting, I'd love to work on it.

madewithkode · 2020-05-08T14:54:24Z

Hi Alden, in order to check the health of the image index in the /healthcheck view, I'm trying to use urllib's urlopen() method to make a request to Elasticsearch's cluster health API this way:

cluster_response = urlopen('http://0.0.0.0:8000/_cluster/health/image')

However, I keep getting a 404. Is there something I'm doing wrong?

madewithkode · 2020-05-08T16:44:25Z

Figured this out. I didn't know Elasticsearch was running on a separate host/port :)

aldenstpage · 2020-05-08T20:06:34Z

That's great!

It would be best to use the equivalent elasticsearch-py or elasticsearch-dsl query instead of making direct calls to the REST API (you can get an instance of the connection to Elasticsearch from search_controller.py). Here's an example for getting the cluster health (https://discuss.elastic.co/t/how-to-get-cluster-health-using-python-api/25431); there ought to also be a way to narrow the query to the image index.

madewithkode · 2020-05-09T08:48:02Z

Alright, I'll look into the suggestion.

madewithkode · 2020-05-09T19:22:18Z

Update:

I've managed to query the health of the entire cluster using the Elasticsearch connection instance from search_controller.py. However, when I try to limit the health check to just the image index, the request never resolves and keeps running with no response. And when I try to specify a timeout for the request, I get an "Illegal argument exception", even though timeout is a valid kwarg according to the API docs.
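One possible explanation for the "Illegal argument exception", not confirmed anywhere in this thread: the cluster health API takes `timeout` as a time value with units (for example "5s"), so a bare number may be rejected by the server. The sketch below assumes a 7.x-era elasticsearch-py client passed in as `es`. `image_index_status`, `wait`, and `client_timeout` are hypothetical names used only for illustration.

```python
def image_index_status(es, index="image", wait="5s", client_timeout=10):
    """Ask the cluster health API about a single index.

    `es` is assumed to be an elasticsearch-py client, for example the
    connection created in search_controller.py.
    """
    report = es.cluster.health(
        index=index,
        timeout=wait,                    # server-side wait, a time value with units
        request_timeout=client_timeout,  # client-side HTTP timeout, in seconds
    )
    return report["status"]  # "green", "yellow", or "red"
```

Here `timeout` is sent to Elasticsearch as a query parameter, while `request_timeout` only limits how long the Python client waits for the HTTP response.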

I should point out that, as of the time of writing, I have yet to successfully run ./load_sample_data.sh, so I don't know whether this could be linked to the problem above.

madewithkode · 2020-05-11T14:46:29Z

Hi Alden, Progress Report :)

I successfully got load_sample_data.sh to run, and so far everything else is working fine.
I've also set up the 10s response caching on the /healthcheck view using Redis, as well as the error logging.

However, I figured out that querying the Elasticsearch image index was unresponsive because the index did not exist, and the whole cluster was empty too.

Do I need to populate it manually or something?

aldenstpage · 2020-05-11T19:49:11Z

Hi again Onyenanu – if the index doesn't exist, the healthcheck should fail. This could happen in situations where we are switching Elasticsearch clusters in production and forgot to index data into the new one (or something went wrong while we were loading data into the new cluster).

In my experience, the ES Python libs can behave in unexpected ways that you sometimes have to work around. Since it seems like querying specifically for the image index health hangs when the index doesn't exist, perhaps you could query for healthchecks of every index in the cluster, and fail the healthcheck if image is not among them and green?
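The workaround suggested above might look something like the sketch below: request health for every index in one call and then look for `image` in the result. It assumes `es` is an elasticsearch-py client and relies on the cluster health API's `level="indices"` option. `image_index_is_healthy` is a hypothetical helper name, not code from the repository.

```python
import logging

log = logging.getLogger(__name__)


def image_index_is_healthy(es, index="image"):
    """Return True only if `index` exists in the cluster and is green."""
    # level="indices" adds a per-index breakdown to the health report.
    report = es.cluster.health(level="indices")
    indices = report["indices"]
    if index not in indices:
        log.error("Healthcheck failed: index %r not found in cluster", index)
        return False
    status = indices[index]["status"]
    if status != "green":
        log.error(
            "Healthcheck failed: index %r is %s, expected green", index, status
        )
        return False
    return True
```

Because the index name is looked up in the response rather than passed to the API, a missing index shows up as an absent key instead of a request that waits on the server. Note that an index with unassigned replicas reports yellow, which is common on single-node development clusters, so the strict green requirement may need relaxing there.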

It sounds like it's coming along nicely!

madewithkode · 2020-05-11T21:25:13Z

Hey Alden, many thanks again for the helpful insights. The suggestion sounds good; I'll proceed with it.

And yes, the whole thing is getting more interesting. I've learned a lot in the past few days :)


aldenstpage added the enhancement label May 6, 2020


madewithkode added a commit to madewithkode/cccatalog-api that referenced this issue May 12, 2020

Fixes cc-archive#487(image index availability check)

e4a9838

madewithkode mentioned this issue May 12, 2020

Fixes #487(image index availability check) #492

Closed

2 tasks

madewithkode added a commit to madewithkode/cccatalog-api that referenced this issue May 12, 2020

Fixes cc-archive#487(image index availability check)

dba4eca

madewithkode added a commit to madewithkode/cccatalog-api that referenced this issue May 14, 2020

Fixes cc-archive#487(reflected changes mentioned in PR review)

ee92b11

kgodey added this to Pending Review in Backlog May 15, 2020

madewithkode added a commit to madewithkode/cccatalog-api that referenced this issue May 15, 2020

Fixes cc-archive#487(removed extra jq install cmds to use travis built in jq)

ed4c0d1

kgodey moved this from Pending Review to Q2 2020 in Backlog May 21, 2020

annatuma removed this from Q2 2020 in Backlog Jun 12, 2020

annatuma added this to Ready for Development in Active Sprint via automation Jun 12, 2020

annatuma moved this from Ready for Development to In Progress (Community) in Active Sprint Jun 12, 2020

kgodey assigned aldenstpage Jul 24, 2020

kgodey added the ✨ goal: improvement label (Improvement to an existing feature) and removed the enhancement label Sep 24, 2020

cc-open-source-bot added the 🏷 status: label work required label (Needs proper labelling before it can be worked on) Dec 2, 2020

kgodey added the 🙅 status: discontinued label (Not suitable for work as repo is in maintenance) Dec 16, 2020

kgodey closed this as completed Dec 16, 2020

Active Sprint automation moved this from In Progress (Community) to Done Dec 16, 2020

obulat mentioned this issue Apr 21, 2021

/healthcheck endpoint should check for Elasticsearch availability (original #487) WordPress/openverse-api#14

Closed

TimidRobot removed this from Done in Active Sprint Jan 12, 2022
