Automated Repository Testing #2731

Closed · fearphage opened this issue May 15, 2019 · 16 comments

It's not uncommon for the (Debian?) repository to be broken (#2608).

Is there an automated (CI-able) way to confirm this is working so the owners don't have to find out from user-initiated bug reports?

dagood (Member) commented May 15, 2019

As far as I know, this has only happened twice, and only on the Ubuntu 14.04 repository. I'm asking the repository owners about this in my mail about the recent instance, though. Other Microsoft products depend on these feeds, so it shouldn't be on .NET Core to set up some sort of health check.

dagood (Member) commented May 15, 2019

/cc @leecow

fearphage (Author) commented:

> As far as I know, this has only happened twice, and only on the Ubuntu 14.04 repository.

That's one time too many in my opinion. That's the purpose of (regression) tests: when you find a bug, you write a test case for it, and then, at a minimum, you'll know the next time it breaks.

dagood (Member) commented May 15, 2019

Agreed; I was just contrasting that with "not uncommon".

dagood (Member) commented Jul 29, 2019

The repo owners have set up some monitoring and alerting to catch this. My impression is that the system is too complex for pre-publish testing to actually catch these things, so this is probably the best we can hope for right now (the owners automatically knowing there's a problem rather than us having to ping them). There are continuing conversations about how to improve the service.

dagood closed this as completed Jul 29, 2019

fearphage (Author) commented Aug 21, 2019

> The repo owners have set up some monitoring and alerting to catch this.

It may be ineffective, since the issue is back yet again. It seems like running an Ubuntu Docker image, executing sudo apt-get update after the deploy, and ensuring a successful (0) exit code would suffice; a sketch of that kind of check follows.
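A minimal sketch of such a post-publish check, assuming Docker is available on the CI host; the image tag and feed-setup command here are illustrative placeholders, not the team's actual pipeline:

```python
#!/usr/bin/env python3
"""Post-publish smoke test sketch: do the documented repo setup inside a clean
Ubuntu container, run `apt-get update`, and fail on a non-zero exit code."""
import subprocess
import sys

UBUNTU_IMAGE = "ubuntu:14.04"  # hypothetical: run one pass per supported distro/version

# Placeholder for whatever the published install instructions tell users to run
# (adding the package feed and its signing key) before `apt-get update`.
SETUP_COMMANDS = "echo 'add the package feed and signing key here'"

def main() -> int:
    script = f"{SETUP_COMMANDS} && apt-get update"
    # `bash -ec` makes the container exit non-zero as soon as any step fails.
    result = subprocess.run(
        ["docker", "run", "--rm", UBUNTU_IMAGE, "bash", "-ec", script],
        check=False,
    )
    if result.returncode != 0:
        print("apt-get update failed; the published feed looks broken", file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```

Run once per supported distro image right after each publish, something like this could turn this class of breakage into a failed pipeline step instead of a user-filed bug report.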

Note: This doesn't feel fixed to me, but I'm unable to reopen the issue.

dagood (Member) commented Aug 21, 2019

I'll be interested to hear from them whether the monitoring caught this. I don't know what response time we expect them to have when it is caught automatically.

dagood (Member) commented Aug 21, 2019

Rolling back to last known good on error would be ideal, of course. 😕

herebebeasties commented:

> My impression is that the system is too complex for pre-publish testing to actually catch these things

If your PGP-signed InRelease file is missing, or has an earlier timestamp than your Release file, it's clearly not going to work. I have a complete lack of context around this process so may be wrong, but that seems easy enough to test/monitor (a sketch is below). 😕
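A minimal monitoring sketch along those lines, assuming a hypothetical dist URL (the real feed layout may differ); it only checks that InRelease exists and that its Date field is at least as new as the one in Release:

```python
#!/usr/bin/env python3
"""Monitoring sketch: verify that a dist's InRelease file exists and is not
older than its Release file. The DIST_URL below is a hypothetical placeholder."""
from email.utils import parsedate_to_datetime
from urllib.request import urlopen
import sys

# Hypothetical feed; point this at the dists/<suite> directory being monitored.
DIST_URL = "https://packages.example.com/repos/example-ubuntu-trusty-prod/dists/trusty"

def fetch(name: str) -> str:
    # urlopen raises HTTPError on 404/5xx, so a missing InRelease fails loudly.
    with urlopen(f"{DIST_URL}/{name}") as resp:
        return resp.read().decode("utf-8", errors="replace")

def release_date(text: str):
    # Release and InRelease both carry a "Date:" field in RFC 2822 form.
    for line in text.splitlines():
        if line.startswith("Date:"):
            return parsedate_to_datetime(line.split(":", 1)[1].strip())
    raise ValueError("no Date: field found")

def main() -> int:
    release = release_date(fetch("Release"))
    inrelease = release_date(fetch("InRelease"))
    if inrelease < release:
        print(f"InRelease ({inrelease}) is older than Release ({release})", file=sys.stderr)
        return 1
    print("InRelease present and at least as new as Release")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run on a schedule (or immediately after each publish), a check like this would flag the stale or missing InRelease case before users hit signature errors.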

herebebeasties commented:

Can we get this re-opened, or another ticket made? It's an improvement if you have monitoring to catch when this happens, but the actual root cause clearly needs addressing.

dagood (Member) commented Sep 4, 2019

Does an open issue help you in some way? (Linking to it from somewhere, etc.) I'm not opposed to having one, but there's no work we (.NET Core) can do since we rely on a shared Microsoft resource to operate the repository properly for a variety of teams. I don't have any visibility into the underlying problems.

herebebeasties commented Sep 4, 2019

Not if it's not in the right place to get fixed, obviously. It's not uncommon for "master" tickets (which this one seems to have become) to be held open while the thing they depend on is fixed, especially if this is the public-facing view of it all.

It's a great shame that there's no external visibility (or internal, you say) into something that breaks a ton of stuff across the globe whenever it happens, probably costing (tens of?) thousands of man-hours to people using the Microsoft stack each time, especially when it's recurring and clearly not fixed yet. Can't you escalate this with the right team or something?

dagood (Member) commented Sep 4, 2019

> Can't you escalate this with the right team or something?

@leecow has plans for this; I'll let him decide the best course of action for this (or a new) issue.

leecow (Member) commented Sep 4, 2019

We do escalate when issues like this are encountered, and I am planning a meeting with them to cover this and other areas of concern with respect to SLA and validation.

fearphage (Author) commented:

@leecow Any updates?

leecow (Member) commented Sep 12, 2019

Meeting set for next Thurs.
