Anchor inside navigation #96

Open
jkfran opened this issue Apr 7, 2021 · 8 comments

Comments

@jkfran
Contributor

jkfran commented Apr 7, 2021

When the Navigation title includes an anchor, the navigation is missing. Our Discourse instances don't add the anchor automatically, but it can happen, as on https://discuss.kubernetes.io/t/introduction-to-microk8s/11243:

image

The issue was solved by changing ## Navigation to <h2>Navigation</h2>

@evilnick

evilnick commented Apr 7, 2021

@jkfran thanks for working out the cause. I imagine some regex update would fix it ;)

@jpmartinspt
Contributor

Just to add some more info: whitespace inside the h2 also broke the parsing.
For example, <h2> URLs</h2> results in url_map failing to be parsed.
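
The two failure cases reported so far (an anchor inside the heading, and leading whitespace inside the heading) could both be tolerated by comparing the heading's visible text rather than its raw markup. The following is only a minimal sketch of that idea, not the library's actual parser: the function names `heading_text` and `find_section_heading` are hypothetical, and the anchor markup in the usage example is an assumed sample of what a Discourse instance might emit.

```python
import html
import re

# Matches <h1>..<h6> elements, with or without attributes (hypothetical helper).
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def heading_text(inner_html):
    """Return the visible text of a heading: tags dropped, whitespace collapsed."""
    text = html.unescape(TAG_RE.sub("", inner_html))
    return " ".join(text.split())


def find_section_heading(document, title):
    """Return the regex match for the first heading whose text equals `title`.

    The comparison ignores case, nested tags such as anchors, and
    surrounding whitespace. Returns None when no heading matches.
    """
    wanted = title.casefold()
    for match in HEADING_RE.finditer(document):
        if heading_text(match.group(2)).casefold() == wanted:
            return match
    return None
```

For example, `find_section_heading('<h2><a class="anchor" href="#navigation"></a>Navigation</h2>', "Navigation")` and `find_section_heading("<h2> URLs</h2>", "URLs")` both return a match, whereas a strict comparison against the raw heading markup would miss them.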

@evilnick

evilnick commented Apr 8, 2021

I assume we are going to make some changes to make handling these things more robust? I have minimal control over the discourse we use for MicroK8s, so I'm not even notified if any aspect of it changes. If this had happened today, for example, it would have been a major problem.

@anthonydillon
Contributor

Understood. This is triaged as a high priority and so should be scheduled to be remedied soon.

@ktsakalozos
Member

Thank you web-team and @nick Veitch for addressing this issue so quickly, especially since it was outside your working hours.

What guards will we have in place so we do not need to face such an unpleasant incident again? One case we have to consider is what would happen if https://discuss.kubernetes.io goes down or is inaccessible for a significant amount of time. Do we keep backups of our docs? How can we use them? Do we have enough alerting to quickly react to such incidents?

This is probably not the right place to have this discussion. Please let me know how we can move forward and possibly address some of the above. As I am not familiar with the processes already in place, it is very possible that we already address most of the issues I am concerned with. Thank you again.

@nottrobin
Contributor

nottrobin commented Apr 9, 2021

Hi @ktsakalozos

> What guards will we have in place so we do not need to face such an unpleasant incident again? One case we have to consider is what would happen if https://discuss.kubernetes.io goes down or is inaccessible for a significant amount of time.

If the back-end that contains the content is down, the cached content should currently be shown for about 5 minutes, after which we will start displaying an error page. I don't know what this error message currently looks like, but we could certainly look at improving the messaging to inform the user as best we can about what happened. I'm not sure this behaviour should be fundamentally changed though, since the canonical source of the content is the kubernetes discourse. This is where it is edited etc. - it is a fundamental part of the system. If we feel relying on an external discuss.kubernetes.io in this way is too risky, we should simply move the content somewhere else, but that discussion should probably involve Mark.
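
To make the behaviour described above concrete, here is a minimal sketch of a time-limited cache in front of a back-end fetch. It is an illustration under stated assumptions, not the site's real caching layer: the `TTLCache` class, the `render` function, the `ERROR_PAGE` placeholder and the 300-second figure (taken from "about 5 minutes") are all hypothetical.

```python
import time

ERROR_PAGE = "error page"  # placeholder for whatever the site really renders


class TTLCache:
    """Keeps values for a fixed time; expired entries behave as missing."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            return None  # too old: treated as if it was never cached
        return value


def render(cache, key, fetch):
    """Serve from cache while fresh; otherwise fetch, or show the error page."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    try:
        content = fetch()
    except Exception:
        return ERROR_PAGE  # back-end is down and the cache has expired
    cache.set(key, content)
    return content
```

With this shape, an outage shorter than the TTL is invisible to readers, and anything longer shows the error page until the back-end recovers.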

> Do we keep backups of our docs? How can we use them?

This is a very good point, thanks for bringing it up. I don't know what backups happen of the Discourse platforms run by IS, but we certainly don't back up the content in discuss.kubernetes.io. I've filed #98, and we'll discuss it soon.

> Do we have enough alerting to quickly react to such incidents?

I'm not sure. We have some alerting, and we have been actively working on this. @tbille was setting up some alerting that would help, but I'm not sure where that work got to. @tbille, do you know if there's an issue about this anywhere? If not, could you create one?

@ktsakalozos
Member

Hi @nottrobin, thank you for the reply.

> I'm not sure this behaviour should be fundamentally changed though, since the canonical source of the content is the kubernetes discourse. This is where it is edited etc. - it is a fundamental part of the system. If we feel relying on an external discuss.kubernetes.io in this way is too risky, we should simply move the content somewhere else, but that discussion should probably involve Mark.

The content should be served from the kubernetes discourse since this is where it is edited. We agree on this. However, IMHO when there is an outage we should have a plan to mitigate it. What if the core personnel are sleeping, missing, or on holiday? What if discuss.kubernetes.io goes down for a weekend or makes a change that we are not able to adjust to in a short period of time? It also feels wrong to have the engineers work under the pressure of the site being down. I would much prefer if we could flip a switch and serve the content from a backup while we wait for Nick and the web-team to wake up and work on the issue without stress during their working hours. Again, this is only my opinion.

@evilnick

evilnick commented Apr 9, 2021

Hi @nottrobin, welcome back! Thanks for looking at this. My brief thoughts on this:

> I'm not sure this behaviour should be fundamentally changed though, since the canonical source of the content is the kubernetes discourse. This is where it is edited etc. - it is a fundamental part of the system.

I know there are very good reasons for keeping the content there, so nobody is talking about moving it. However, if the front-end code can't see or make sense of the discourse, there is a possibility we can't either. It makes sense to me to continue serving the last 'single source of truth' as we know it until everything is working again. I would particularly like to see a solution where:

  • the front end notified us/someone if the discourse was 'unhealthy'
  • continued serving the cache or a backup indefinitely until the problem was resolved.

As for what counts as unhealthy, I think failing to load the navigation or URL mapping would qualify, and I think this would be useful for all of the discourse-based docs, not just the MicroK8s ones.
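
The two requests above (notify someone when the discourse looks unhealthy, and keep serving the last good copy until it recovers) could be combined roughly as follows. This is a sketch under assumptions, not a proposal for the actual implementation: the `LastKnownGood` class, the `notify` callback, and the idea that the parsed result is a dict with `navigation` and `url_map` keys are all hypothetical.

```python
class LastKnownGood:
    """Serve the most recent healthy parse result while the source is unhealthy."""

    def __init__(self, fetch, notify):
        self.fetch = fetch      # callable returning the parsed docs as a dict
        self.notify = notify    # callable taking a short problem description
        self.snapshot = None    # last result that passed the health check

    @staticmethod
    def is_healthy(parsed):
        # "Unhealthy" here means the navigation or URL map failed to load.
        return bool(parsed.get("navigation")) and bool(parsed.get("url_map"))

    def get(self):
        try:
            parsed = self.fetch()
            problem = None if self.is_healthy(parsed) else "navigation or url_map missing"
        except Exception as exc:
            parsed, problem = None, f"fetch failed: {exc}"

        if problem is None:
            self.snapshot = parsed  # remember the newest healthy result
            return parsed

        self.notify(problem)        # alert on every unhealthy response
        if self.snapshot is None:
            raise RuntimeError("no healthy snapshot available: " + problem)
        return self.snapshot        # keep serving the last good copy
```

In this sketch the snapshot lives in memory only, so it would not survive a restart; persisting it somewhere durable would be closer to the "backup" discussed earlier in the thread.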

7 participants