pub.dev is down (global outage) #4663

Closed
timsneath opened this issue Mar 25, 2021 · 92 comments

Comments

@timsneath
Contributor

timsneath commented Mar 25, 2021

image

Overview

At approximately 7pm Pacific, the pub infrastructure began to respond with an HTTP 502 Site Error message. The site infra is deployed to the us-central GCP region. This impacts core Flutter services, specifically:

  • The pub.dev website itself, which presents the error above;
  • Calls made by the flutter or pub commands (e.g. flutter upgrade or pub upgrade).

Please do not reply to this bug with "me too" or +1 messages; they make it harder for folk to track. Thanks!

Updates

7:10pm Pacific
We're currently experiencing an outage on pub.dev, which appears to be related to a load balancer issue. We don't have an ETA for a resolution at this time; we're currently working to understand the issue.

7:59pm Pacific
The pub.dev site is still down. We have a Google on-call engineer currently investigating. We have not yet identified a root cause.

8:15pm Pacific
We apologize for the inconvenience. We're seeing load balancer errors and are escalating to the appropriate team. Still no ETA, unfortunately, since we still haven't determined the root cause.

8:27pm Pacific
We have multiple Google Cloud engineers on-call investigating, but I'm sorry to report that we still don't have a root cause. We'll continue to post updates regularly. Thank you for your patience.

9:00pm Pacific
We are continuing to debug the problem. We have declared a Google escalated outage while we attempt to identify the root cause. Some folk have been successful using the Chinese mirror site at https://pub.flutter-io.cn.

9:20pm Pacific
Again, apologies.

9:35pm Pacific
We are currently exploring the theory that we have exceeded a quota, but that the error didn't show in the log. Paging an on-call team to try to increase the quota to see if this resolves the issue. Again, this really sucks -- we recognize that it's a major inconvenience to you all, and we're feeling sick that we're down. Thank you for being patient with us :(

9:45pm Pacific
We have updated the quota and are resetting the VM instances, to see if we have successfully identified the root cause.

9:51pm Pacific
We are seeing evidence of partially restored service.

9:55pm Pacific
The pub service appears to be fully restored.

10:15pm Pacific
Here's what we think we know at this point in time. At some point within the last day or two, a change was made to the pub.dev landing page that includes a call to the YouTube API. There is a quota limit for YouTube calls that we didn't hit over the last few days, but today we hit it. Confounding the issue, the code was missing exception handling and the logging was inadequate or obfuscated sufficiently that we were unable to immediately spot the problem. The immediate resolution was to raise the quota temporarily to give us time to revert the original change.

At this time we think the issue is resolved, but we'll obviously be monitoring closely. Again, apologies on behalf of the Flutter & Dart teams for the disruption. We take this very seriously, and we will perform a full post-mortem and share the learnings and actions we'll take as a result of this.
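
As an illustration of the missing exception handling and logging described above, here is a minimal Python sketch (not the pub.dev code, which is written in Dart). It assumes the landing-page content is optional, so a failed fetch can fall back to an empty result. `QuotaExceeded`, `fetch_optional`, and `over_quota` are hypothetical names invented for this example.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("landing_page")


class QuotaExceeded(Exception):
    """Hypothetical error standing in for a third-party quota failure."""


def fetch_optional(fetch: Callable[[], T], fallback: T, what: str) -> T:
    """Run a non-critical fetch; on failure, log the cause and return the fallback."""
    try:
        return fetch()
    except Exception as exc:  # broad on purpose: optional content must not take the page down
        # Record the exception type and message so a quota error is visible in the logs.
        log.error("optional fetch %r failed: %s: %s", what, type(exc).__name__, exc)
        return fallback


def over_quota() -> list:
    """Simulated fetch that always fails the way an exhausted quota might."""
    raise QuotaExceeded("daily quota exhausted")


# The page would render with an empty video list instead of failing outright.
videos = fetch_optional(over_quota, fallback=[], what="videos")
print(videos)
```

The broad `except` is a deliberate choice for optional content only; critical paths would normally catch narrower exception types.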

@timsneath
Contributor Author

timsneath commented Mar 25, 2021

No ETA to fix yet. We're figuring out what's gone wrong; we'll update this bug with status as we learn more.

@timsneath timsneath pinned this issue Mar 25, 2021

@GiteshDalal

GiteshDalal commented Mar 25, 2021

Is there any backup website that we can point to while pub.dev is down?
This is a big blocker for business.

@boyan01

boyan01 commented Mar 25, 2021

Is there any backup website that we can point to while pub.dev is down?
This is a big blocker for business.

If you can access pub.flutter-io.cn, then you can try to use this URL as a temporary solution.

export PUB_HOSTED_URL=https://pub.flutter-io.cn

ref: https://flutter.dev/community/china

Did not work.
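
The mirror workaround above can be sketched as a small Python helper that picks the first reachable host and prints the matching `export` line. This is only an illustration: the candidate order, the function names (`is_reachable`, `pick_hosted_url`, `export_line`), and the injectable probe are assumptions made for this example, and only `PUB_HOSTED_URL` and the two host URLs come from the thread.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# Candidate hosts taken from this thread; the order is an assumption.
CANDIDATES = ["https://pub.dev", "https://pub.flutter-io.cn"]


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the host answers an HTTP request with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:  # any failure (DNS, timeout, HTTP 5xx) counts as unreachable
        return False


def pick_hosted_url(candidates: Iterable[str],
                    probe: Callable[[str], bool] = is_reachable) -> Optional[str]:
    """Return the first candidate the probe reports as reachable, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None


def export_line(url: str) -> str:
    """Format the shell command that points pub at the chosen host."""
    return f"export PUB_HOSTED_URL={url}"


def fake_probe(url: str) -> bool:
    """Offline stand-in for is_reachable that simulates pub.dev being down."""
    return url != "https://pub.dev"


chosen = pick_hosted_url(CANDIDATES, probe=fake_probe)
print(export_line(chosen) if chosen else "no reachable host")
```

Calling `pick_hosted_url(CANDIDATES)` without the fake probe performs real network requests. As the replies in this thread show, the mirror did not work for everyone.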

@timsneath
Contributor Author

timsneath commented Mar 25, 2021

Updates will be posted at the top of this bug. To make it easier to follow along, please don't post "me too" or +1 comments.

@timsneath timsneath changed the title from "pub.dev appears to be down" to "pub.dev is down" Mar 25, 2021

@LeoAiolia

KPI is gone this month

@BytesZero

BytesZero commented Mar 25, 2021

Here are some workarounds; if you can't build, please try the following.
If you have successfully run the project before, please use the second option. This does not work for newly added packages; for those, you need to wait for the website to be completely restored.

  • pub.dev
 export PUB_HOSTED_URL=~/.pub-cache/hosted/pub.dev
  • pub.flutter-io.cn (in China, please use the command below)
 export PUB_HOSTED_URL=~/.pub-cache/hosted/pub.flutter-io.cn
  • Result
    image
Running "flutter pub get" in example...                             0.7s

Please see below for new solutions #4663 (comment)

@paurakhsharma

Is this the reason for me being stuck at this? This happened after I downgraded from 2.0.1 to 1.22.6. I even reinstalled flutter but still stuck at it.

image

@eseidelGoogle eseidelGoogle changed the title from "pub.dev is down" to "pub.dev is down (global outage)" Mar 25, 2021
@Rquinz

Rquinz commented Mar 25, 2021

Is this the reason for me being stuck at this? This happened after I downgraded from 2.0.1 to 1.22.6. I even reinstalled flutter but still stuck at it.

image

Yeah, it is

@Rquinz

Rquinz commented Mar 25, 2021

Does Aliyun have any pub.dev mirrors, or something we can use in the meantime?

Saw this above. You can try this.

If you can access pub.flutter-io.cn, then you can try to use this URL as a temporary solution.

export PUB_HOSTED_URL=https://pub.flutter-io.cn

ref: https://flutter.dev/community/china

Doesn't work for me though. Good luck.

@mingyouzhu

pub.flutter-io.cn is accessible

@sudoaccess

uuuuuuuppppppp

@ensaryusuf

@timsneath What was the source of the problem?

@kxviel

kxviel commented Mar 25, 2021

pub.dev seems to be back, but pub get still isn't working
UPDATE: pub get works now (India)

@hjleesm

hjleesm commented Mar 25, 2021

👍

@Nanra

Nanra commented Mar 25, 2021

pub.dev is now accessible from Indonesia. Great 👍🏻✨
Screen Shot 2021-03-25 at 12 07 46

@timsneath
Contributor Author

Thank you all. I've posted a quick update at the top of this bug, but in summary, services should now be restored. We've identified the root cause and increased the quota as a short-term measure until we roll back the offending code.

@timnew

timnew commented Mar 25, 2021

@timsneath is there a status page for pub.dev, like https://status.cloud.google.com for GCP?
And will there be a public incident report for today's issue?

@timsneath
Contributor Author

Can't speak to the status page yet; we'll figure out the right mitigations during the post-mortem. I'm not sure it would have helped us much: the issue page seemed fairly effective for communicating status. But interested to hear from others.

Yes, we'll share the post-mortem summary. It should make for fun reading :) We operate a blameless post-mortem policy at Google; it's all about learning lessons rather than finding scapegoats. Any failure is a system failure, and we try and learn how we can address the system causes.

@lookiestudio

pub.dev is now accessible from Vietnam.

@vgsrivathsan

sock error
pub get is showing socket error

@BytesZero

sock error
pub get is showing socket error

You've been blocked by the firewall.

@vgsrivathsan

sock error
pub get is showing socket error

You've been blocked by the firewall.

thanks!

@themisir

Seriously, why didn't you cache YouTube calls for some period of time (e.g. release the cache a few times a day) in the beginning?
Just wondering, seriously.

@isoos
Collaborator

isoos commented Mar 25, 2021

Seriously, why didn't you cache YouTube calls for some period of time (e.g. release the cache a few times a day) in the beginning?
Just wondering, seriously.

We do cache them, here is the related code with history:
https://github.com/dart-lang/pub-dev/blob/master/app/lib/service/youtube/backend.dart

However, once the fetch failed with the quota limit, the error propagated up the chain until the isolate was killed and restarted, and with the restart we started to fetch it again. We will redesign/refactor this and similar background tasks so we can make sure such failures will not be propagated in the future.
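
To illustrate the failure mode described above, here is a minimal Python sketch of a cache whose refresh failures are contained instead of propagated. It is not the pub.dev implementation (that is the Dart file linked above); `ResilientCache`, its `ttl` and `backoff` parameters, and `flaky_fetch` are hypothetical names chosen for this example. The idea is that a failed refresh keeps serving the previous value and waits before retrying, so the caller never sees the exception.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ResilientCache(Generic[T]):
    """Caches a fetched value; a failed refresh keeps the old value and backs off."""

    def __init__(self, fetch: Callable[[], T], ttl: float, backoff: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl          # seconds a successful fetch stays fresh
        self._backoff = backoff  # seconds to wait after a failed fetch
        self._clock = clock
        self._value: Optional[T] = None
        self._next_attempt = float("-inf")  # the first call always tries to fetch

    def get(self) -> Optional[T]:
        now = self._clock()
        if now >= self._next_attempt:
            try:
                self._value = self._fetch()
                self._next_attempt = now + self._ttl
            except Exception:
                # Contain the failure: keep the previous value (possibly None)
                # and do not retry until the backoff window has passed.
                self._next_attempt = now + self._backoff
        return self._value


calls = {"n": 0}


def flaky_fetch() -> list:
    """Succeeds once, then fails like an exhausted quota (simulated, not a real API error)."""
    calls["n"] += 1
    if calls["n"] >= 2:
        raise RuntimeError("quota exceeded")
    return ["video-1"]


now = [0.0]  # fake clock so the example runs instantly
cache = ResilientCache(flaky_fetch, ttl=60.0, backoff=300.0, clock=lambda: now[0])

first = cache.get()   # initial fetch succeeds
now[0] = 61.0
second = cache.get()  # refresh fails; the stale value is served
now[0] = 100.0
third = cache.get()   # inside the backoff window; no new fetch is attempted
print(first, second, third, calls["n"])
```

An in-memory value like this does not survive a process restart, so the sketch only helps if the failure is also prevented from killing the process; a real service would likely log the swallowed exception as well.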

@themisir

Seriously, why didn't you cache YouTube calls for some period of time (e.g. release the cache a few times a day) in the beginning?
Just wondering, seriously.

We do cache them, here is the related code with history:
https://github.com/dart-lang/pub-dev/blob/master/app/lib/service/youtube/backend.dart

However, once the fetch failed with the quota limit, the error propagated up the chain until the isolate was killed and restarted, and with the restart we started to fetch it again. We will redesign/refactor this and similar background tasks so we can make sure such failures will not be propagated in the future.

Oh I get it. :D Since the previous fetch had failed and the cache was empty, the restarted isolate tried to fetch new data again, failed, and crashed the isolate; GCE restarted it, and the loop continued... Interesting failure.

Thanks for letting us know!

@Levi-Lesches

Levi-Lesches commented Mar 25, 2021

@timsneath + Google team
Want to reiterate that -- thanks for the quick response and clear communication. Maintenance is one of those things that goes unnoticed until it's a bad thing, but it makes us appreciate that pub.dev is otherwise 100% reliable and easy-to-use. ❤️

@jonasfj
Member

jonasfj commented Apr 27, 2021

Postmortem is referenced here: https://github.com/flutter/flutter/wiki/Postmortems

@xi1570-krupeshanadkat

It seems it is down again (region South Asia - India)

Screenshot 2021-11-29 at 10 35 44 AM

Attached Chrome DevTools > Network tab screenshots for reference.

My network seems fine, rest of the stuff is opening correctly.
Screenshot 2021-11-29 at 10 37 50 AM

@xi1570-krupeshanadkat

xi1570-krupeshanadkat commented Nov 29, 2021

It seems it is down again (region South Asia - India)

Screenshot 2021-11-29 at 10 35 44 AM

Attached Chrome DevTools > Network tab screenshots for reference.

My network seems fine, rest of the stuff is opening correctly. Screenshot 2021-11-29 at 10 37 50 AM

Looks like it is working now!

Screenshot 2021-11-29 at 11 04 55 AM

@manglide

It's still not working. Can't access pub.dev

@isoos
Collaborator

isoos commented Apr 15, 2022

@manglide: please open a new issue next time; we don't monitor closed issues.

As pub.dev is working for me, it is possible that the problem is on your ISP's side. Please run this script and post its output:
https://github.com/dart-lang/pub-dev/blob/master/app/bin/tools/check_domain_access.dart

@manglide

Hi @isoos, thanks for your response. The issue is from my local dnsmasq configuration on mac. I have resolved it now and can access pub.dev.

Thanks once again.

@luis901101

Hi, is there a problem with this again right now?

@isoos
Collaborator

isoos commented Apr 13, 2023

@luis901101 There is an outage right now, we are aware and trying to fix. Also: please don't comment on old threads.

@iamchathu

Is there any public status page for pub.dev?

@sigurdm
Contributor

sigurdm commented Dec 14, 2023

@iamchathu no, we don't have such a page.
