New changes in Dockerfile break Dashboard on Staging #3263

Closed · RC-Lee opened this issue Mar 19, 2022 · 19 comments · Fixed by #3280
Labels: type: bug (Something isn't working)

RC-Lee (Contributor) commented Mar 19, 2022

What happened:
Dashboard is broken on staging.
(Clicking on the "i" on the hero banner)
https://dev.api.telescope.cdot.systems/v1/status/

[screenshot of the broken dashboard omitted]

What should have happened:
Should show the dashboard

How to reproduce it (as precise as possible):
Go here: https://dev.api.telescope.cdot.systems/v1/status/

Anything else we need to know?:
The Docker logs show a healthcheck failure from the Status image (sometimes it happens to Posts too):

{"level":30,"time":1647656844771,"pid":1,"hostname":"62b3e97e79c2","msg":"Server started on port 1111"}
{"level":50,"time":1647656863276,"pid":1,"hostname":"62b3e97e79c2","msg":"healthcheck failed"

I went through the commit history to trace back the commit that may have caused this.
It seems commit e1b73fd19100a9d91a458ac538b2f349a6744340 introduced the problem; the commit before it is stable, and the dashboard works.

Scanning the changes in that commit, it looks like this line terminates the container with an error whenever the healthcheck fails:

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
	CMD wget --no-verbose --tries=1 --spider localhost:${PORT}/healthcheck || exit 1

I tested locally: changing exit 1 to exit 0 made the dashboard show again.
This is only a temporary workaround, though; we should investigate the healthcheck itself further.
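
For reference, the temporary change looks like this (it only masks the failure, because the healthcheck then always reports success):

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
	CMD wget --no-verbose --tries=1 --spider localhost:${PORT}/healthcheck || exit 0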

RC-Lee added the type: bug label Mar 19, 2022
cindyledev (Contributor) commented:

Would it make sense to rip out the HEALTHCHECKs temporarily until we can figure out how to get them working properly?

RC-Lee changed the title from "New changes in Dockerfiles break Staging" to "New changes in Dockerfiles breaks Dashboard on Staging" Mar 19, 2022
RC-Lee (Contributor, Author) commented Mar 19, 2022

Yes, if it temporarily gets the dashboard working, but we should continue to investigate.

cindyledev (Contributor) commented:

My guess is that staging is not happy with the localhost in this line:

CMD wget --no-verbose --tries=1 --spider localhost:${PORT}/healthcheck || exit 1

RC-Lee (Contributor, Author) commented Mar 19, 2022

I'm just baffled, because some other Dockerfiles have this same line but don't seem to be affected. Just Status, and sometimes Posts.

manekenpix changed the title from "New changes in Dockerfiles breaks Dashboard on Staging" to "New changes in Dockerfile break Dashboard on Staging" Mar 19, 2022
humphd (Contributor) commented Mar 19, 2022

The problem here is that we're asking the HEALTHCHECK to look at ${PORT}, which will be 3000 on staging: https://github.com/Seneca-CDOT/telescope/blob/master/config/env.staging#L184. However, the status service uses STATUS_PORT=1111: https://github.com/Seneca-CDOT/telescope/blob/master/config/env.staging#L45.

We should use STATUS_PORT instead of PORT in this file: https://github.com/Seneca-CDOT/telescope/blob/master/src/api/status/Dockerfile#L47
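
A minimal sketch of that fix (assuming STATUS_PORT is available inside the container at runtime, per env.staging):

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
	CMD wget --no-verbose --tries=1 --spider localhost:${STATUS_PORT}/healthcheck || exit 1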

Kevan-Y (Contributor) commented Mar 19, 2022

So PORT in the Dockerfile is overriding our env PORT?

manekenpix (Member) commented Mar 19, 2022

@humphd I'm checking all the Dockerfiles, and they all use ${PORT}, but only status seems to be affected. Also, I'm surprised nobody saw this in local testing, because we're all testing every PR locally, right, RIGHT?

Kevan-Y (Contributor) commented Mar 19, 2022

@manekenpix, should I fix all the other Dockerfiles to use the right env variable in this issue?

manekenpix (Member) commented Mar 19, 2022

@Kevan-Y I don't even know if that's the issue. All the other Dockerfiles use the same var, but none of them are affected (maybe Posts, but I'd have to confirm).

Kevan-Y (Contributor) commented Mar 19, 2022

@manekenpix I don't think it's a PORT issue after all; in Portainer I can see that the env PORT is correctly set to 1111.

manekenpix (Member) commented:

@Kevan-Y thanks for checking 👍

humphd (Contributor) commented Mar 19, 2022

Regardless, PORT is wrong here. All of the Dockerfiles use their own {service-name}_PORT variable, so this should get fixed.

Kevan-Y (Contributor) commented Mar 19, 2022

Test scenarios (each tested on its own, without keeping the changes from the previous one):

  1. Removing || exit 1 in the Dockerfile
  2. Changing || exit 1 to || exit 0 in the Dockerfile
  3. Commenting out HEALTHCHECK in the Dockerfile
  4. Commenting out healthCheck in server.js

I think the getPackage() function needs the GITHUB_ACCESS_TOKEN env variable to be defined, but in our case it isn't, so the healthcheck route fails there. Since that route fails, and the recent Dockerfile change makes HEALTHCHECK exit 1, our status container shows as unhealthy in Portainer and is not accessible. The failing call:

const { version } = await getPackage('https://github.com/Seneca-CDOT/telescope/blob/master/');

humphd (Contributor) commented Mar 19, 2022

Anything that gets awaited needs a try/catch, or we risk crashing (and here we are).
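
For illustration only (the logger and the fallback value are assumptions, not the actual status service code), guarding that call might look like this:

let version = 'unknown';
try {
  // getPackage() queries GitHub and can reject, e.g. when GITHUB_ACCESS_TOKEN is missing
  ({ version } = await getPackage('https://github.com/Seneca-CDOT/telescope/blob/master/'));
} catch (err) {
  // Swallow the failure instead of crashing the healthcheck route
  logger.warn({ err }, 'unable to get version info from GitHub');
}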

However, I think this getPackage call is not needed. Let's pass the value in as a build ARG to the Dockerfile.

In a GitHub Actions Workflow, you can get the following:

  1. name of branch or tag: ${{ github.ref_name }}
  2. git sha: ${{ github.sha }}

We can stamp these values into the status or even nginx images when we build them, and staging/production can simply read them when they run vs. needing to query anything.

I already do this for all the URLs we pass to next.js when we build the front-end, see https://github.com/Seneca-CDOT/telescope/blob/master/.github/workflows/docker-build-and-push.yml. We could add some more build-args to the appropriate container, and pass these values through. Then in the Dockerfile, we accept the ARG and set an ENV value so it will exist at runtime.
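
A sketch of that flow (hedged: the step name and the GIT_COMMIT/GIT_VERSION arg names are illustrative, not the actual workflow contents):

# .github/workflows/docker-build-and-push.yml (sketch)
- name: Build and push status image
  uses: docker/build-push-action@v2
  with:
    context: ./src/api/status
    build-args: |
      GIT_COMMIT=${{ github.sha }}
      GIT_VERSION=${{ github.ref_name }}

# src/api/status/Dockerfile (sketch)
ARG GIT_COMMIT
ARG GIT_VERSION
# Persist the build-time values so they exist as env vars at runtime
ENV GIT_COMMIT=${GIT_COMMIT} GIT_VERSION=${GIT_VERSION}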

Kevan-Y (Contributor) commented Mar 20, 2022

I was looking at the previous commit, but I don't remember why we return version, sha, and gitHubUrl.

humphd (Contributor) commented Mar 20, 2022

It's so the front-end can display it on the landing page. But we don't need to do that in the status service; we can build that info into the nginx front-end container statically.

manekenpix (Member) commented:

> In a GitHub Actions Workflow, you can get the following:
> 1. name of branch or tag: ${{ github.ref_name }}
> 2. git sha: ${{ github.sha }}
>
> We can stamp these values into the status or even nginx images when we build them, and staging/production can simply read them when they run vs. needing to query anything.

If we do this, we'll have undefined in development, and we won't be able to display Telescope's version and commit. Are we ok with that?

humphd (Contributor) commented Mar 20, 2022

I don't think it matters as much in development.

Kevan-Y (Contributor) commented Mar 21, 2022

> It's so the front-end can display it on the landing page. But we don't need to do that in the status service; we can build that info into the nginx front-end container statically.

I checked the code; our front-end doesn't call the status healthcheck to get that info:
https://github.com/Seneca-CDOT/telescope/blob/master/src/web/src/components/Banner.tsx#L145
https://github.com/Seneca-CDOT/telescope/blob/master/src/backend/web/routes/health.js

In development, when we run pnpm start to start our backend, the front-end is able to get and display the version.

Kevan-Y self-assigned this Mar 21, 2022
cindyledev added this to the 2.9 Release milestone Mar 25, 2022