Flag Day 2020: The date #139

Closed
oerdnj opened this issue Nov 17, 2019 · 32 comments

@oerdnj
Member

@oerdnj oerdnj commented Nov 17, 2019

This issue serves as a public, open to all, discussion forum for what the date should be for DNS Flag Day 2020.

(I will make a summary of the discussion below...)

@oerdnj
Member Author

@oerdnj oerdnj commented Nov 17, 2019

I proposed 31. October 2020 during the DNS-OARC in Austin and nobody objected. Therefore, I propose the DNS Flag Day 2020 should be 31. October 2020.

@vixie
Contributor

@vixie vixie commented Nov 17, 2019

i do not and may never love calling these "flag days" which they manifestly are not, and the cost of the resulting confusion will not be zero. however, as to the date itself, i think it's exactly arbitrary enough.

@mnordhoff

@mnordhoff mnordhoff commented Nov 17, 2019

I'm loath to sound this reasonable, but I have concerns about 31 October 2020. That's near e-commerce companies' holiday shopping freeze period, the 51 weeks of the year when they get especially whiny and intransigent when people ask them to fix bugs.

Since buggy load balancers popular in industries like that are often the cause of DNS pain, it might be better to schedule it a few months earlier, when they might be more solicitous? (I would hate to schedule it a few months later.)

@vttale

@vttale vttale commented Nov 17, 2019

In addition to mnordhoff's entirely reasonable concern about peak e-commerce season, I have to wonder something in the opposite direction ... if we expect this to be largely inconsequential to actual operations, why delay it for a year? I'm in favour of doing it much sooner, like say in the spring. To just throw out a number for the sake of something specific: April 1.

@vdukhovni

@vdukhovni vdukhovni commented Nov 17, 2019

If a few weeks in October make enough of a difference to e-commerce sites, then the DNS "flag day" could pay homage to one of the biggest "flag days" in history: the introduction of the Gregorian calendar on 4 Oct 1582 (followed by 15 Oct 1582).

Or, if we wanted to move it back (earlier), Britain adopted the new calendar on 2 Sep 1752 (followed by 14 Sep 1752). 2 Sep is a Wednesday in 2020, just as it was in 1752.

@vdukhovni

@vdukhovni vdukhovni commented Nov 17, 2019

Another historical flag day is the introduction of the metric system in France on 30 Mar 1791. That fits with the proposals to schedule it early in the year (Northern Hemisphere spring).

@vttale

@vttale vttale commented Nov 17, 2019

With Viktor's comment, I hereby amend my April 1st suggestion to March 31. That's 03/037/03744 in octal, which must mean something somehow too.

@franklouwers

@franklouwers franklouwers commented Nov 17, 2019

Why not Feb 1st, as that historical day marked the first DNS Flag Day?

Unless there are very good reasons (e.g., some big noncompliant vendor told us they could make October but not February), why wait almost another year?

@oerdnj
Member Author

@oerdnj oerdnj commented Nov 17, 2019

the 51 weeks of the year
Did you mean days? ;)

We’ve been told that business would like to have a period closer to 1 year rather than couple of months. What about 1. October 2020 then?

@vttale

@vttale vttale commented Nov 18, 2019

We’ve been told that business would like to have a period closer to 1 year rather than couple of months. What about 1. October 2020 then?

I guess I'd like to hear more about which businesses, and why. Maybe they could actually participate in this thread. Given that y'all have already been telegraphing the next "flag day" since at least -- what, late spring? -- then spring 2020 is a year too.

@wtoorop

@wtoorop wtoorop commented Nov 18, 2019

Last time, the results of impact studies were not available to operators before the flag day. I think it would be nice if impact study results were available before the new flag day this time. Realistically, this means results will be available in spring. So, to allow operators time to process and respond to those results, the flag day should IMHO be in the fall.

@letoams

@letoams letoams commented Nov 18, 2019

The day picked is arbitrary, because there isn't actually a flag day, given the delays between upstream implementers and downstream vendors.

If you wrote a BCP document for DNS implementers, then that RFC's publication date would be what you need, AND you wouldn't need to confuse people with "flag day" as a term or confuse people about who "DNS violations" is. And this information would remain available even after the "dns-violations" website and GitHub repository have perished.

@pspacek
Contributor

@pspacek pspacek commented Nov 18, 2019

Let me point out that this actually is a flag day for the 1/4 of the Internet user population that is behind cloud resolvers. We know that cloud resolvers are able to roll out changes in short periods of time (as opposed to slow software upgrades elsewhere), so the date actually matters.

For that reason we need to coordinate with cloud-resolver operators; let's see what they can tell us.

@letoams

@letoams letoams commented Nov 18, 2019

@vcunat
Contributor

@vcunat vcunat commented Nov 20, 2019

I believe the main point of a coordinated date is to lower the first mover's disadvantage, like:

My site is broken with your resolver but works with (almost) everyone else's, so it's "obviously" your fault.

Last movers might still have some shorter-term advantage, but I don't think that really matters, as we should now have enough critical mass to force fixing TCP in basically all cases that care to keep running (if I simplify it). Having a doomsdate also helps with marketing, i.e. making them fix it in advance and dismissing complaints that they couldn't have known this would come.

Either you will break 1/4 of the Internet and then you shouldn’t or you don’t break 1/4 of the Internet and then it is not a flag day?

Depends on your point of view. If you're a user, almost nothing should break. I hear that many millions of users are already behind a post-flag resolver config. If you're a badly set-up service, you will experience problems from a large fraction of the Internet... but you get warned in advance, are given testing tools, etc.

@vcunat
Contributor

@vcunat vcunat commented Jan 10, 2020

So... are we waiting for something in particular? I've heard no real objections against the beginning of October so far. If some claim they need to know long in advance, we shouldn't take too long to set the date.

@oerdnj
Member Author

@oerdnj oerdnj commented Feb 3, 2020

1. October 2020 seems like a good date then. Let's settle on that.

@Habbie @bjovereinder @wtoorop @pspacek @ralphdolmans ok with that?

@amsowellman

@amsowellman amsowellman commented Mar 19, 2020

Greetings! I wanted to check in regarding recent world events. Since the COVID-19 pandemic has led to global unrest, should the next DNS Flag Day (i.e. DNS Flag Day 2020) be moved later to accommodate it? Enforcing stricter rules during a period of unrest could cause more pain and be counterproductive.

@jelu
Contributor

@jelu jelu commented Mar 20, 2020

@amsowellman

Enforcing stricter rules...

I don't think you understand what we are trying to do... it's about removing workarounds and making things better by, sometimes, actually following the rules.

@puneetsood

@puneetsood puneetsood commented Jul 9, 2020

Hi Ondrej,

Is October 2020 official now? And is it the beginning or end of Oct?

@pspacek
Contributor

@pspacek pspacek commented Jul 10, 2020

Hi @puneetsood! Thanks for asking. It was meant as October 1st 2020 aka 2020-10-01.

As far as I can see, it received a positive response from ISC, CZ.NIC, NLnet Labs and PowerDNS as well, so now Google et al. are missing from the list.

(GitHub auto-formatting changed the date, written in the obsolete pre-ISO notation (https://en.wikipedia.org/wiki/Date_and_time_notation_in_Europe), into a numbered list with a single item in it.)

@SvenVD-be

@SvenVD-be SvenVD-be commented Jul 13, 2020

It is still unclear to me whether 2020-10-01 is now the official date or whether we are still waiting for Google et al. to confirm it.

@oerdnj
Member Author

@oerdnj oerdnj commented Jul 16, 2020

Ok, let's go with 2020-10-01 officially. The impact of this DNS Flag Day will be minimal anyway; removing the EDNS0 workarounds was a much bigger deal, and it was handled gracefully.

@vixie
Contributor

@vixie vixie commented Jul 16, 2020

@puneetsood

@puneetsood puneetsood commented Jul 23, 2020

We are doing some experimentation to quantify impact for our users. Will have an update in 2 weeks.

@pspacek
Contributor

@pspacek pspacek commented Jul 24, 2020

That's great news, thank you.

FYI, I did a quick scan over the CZ TLD, and the number of domains that work over UDP but do not support TCP at all is around 0.05 %. This methodology does not take into account that some domains which do not support TCP will never send an answer big enough to require TCP, i.e. it is an upper bound.

[Unfortunately I'm swamped and do not have enough time to optimize the code for the com. zone, so I do not have worldwide results.]
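The upper-bound reasoning above can be made concrete with a small bookkeeping sketch. It assumes per-domain probe results (did an authoritative server answer over UDP, did one answer over TCP) were already collected by some other tool; `ProbeResult` and `udp_only_percentage` are hypothetical names, and using all probed domains as the denominator is an assumption, since the comment does not spell out the exact methodology.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical per-domain probe outcome, collected by some other tool."""
    domain: str
    udp_ok: bool  # an authoritative server answered the probe over UDP
    tcp_ok: bool  # an authoritative server answered the probe over TCP


def udp_only_percentage(results: Iterable[ProbeResult]) -> float:
    """Percentage of probed domains that answer over UDP but not over TCP.

    This is only an upper bound on real breakage: a domain whose answers
    always fit into a single UDP response never needs TCP in practice.
    """
    results = list(results)
    if not results:
        return 0.0
    udp_only = sum(1 for r in results if r.udp_ok and not r.tcp_ok)
    return 100.0 * udp_only / len(results)


if __name__ == "__main__":
    sample = [
        ProbeResult("a.example", udp_ok=True, tcp_ok=True),
        ProbeResult("b.example", udp_ok=True, tcp_ok=False),
        ProbeResult("c.example", udp_ok=True, tcp_ok=True),
        ProbeResult("d.example", udp_ok=False, tcp_ok=False),
    ]
    print(f"{udp_only_percentage(sample):.2f} %")  # 25.00 %
```

Turning that upper bound into a real impact estimate would additionally require knowing which of the UDP-only domains ever produce answers larger than the advertised buffer size.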

@dkg

@dkg dkg commented Aug 12, 2020

@puneetsood wrote:

We are doing some experimentation to quantify impact for our users. Will have an update in 2 weeks.

@puneetsood, any results from these experiments?

@pspacek
Contributor

@pspacek pspacek commented Aug 13, 2020

FYI recent research results about EDNS buffer size values were presented at DNS-OARC 32b: https://indico.dns-oarc.net/event/36/contributions/776/

Keep in mind this is academic research. A practical implementation will have to take into account additional real-world complexity, e.g. that a resolver does not know whether the "other end" of the communication lies in the same network or on the other side of the Internet, etc.

@puneetsood

@puneetsood puneetsood commented Aug 13, 2020

Our experiments show that the change will work well for our users.

Summary of our plan below. I will be posting a similar message to the dns-operations@ list today.

  • Google Public DNS Resolver to Authoritative
    For UDP requests, our current implementation sets an EDNS0 bufsize of 4096 for requests to most nameservers. We intend to change the EDNS0 bufsize value to either 1232 or 1400, based on further experimentation. Our experiments show that either value has failure rates similar to those of the earlier value of 4096. At this time we do not intend to implement PMTUD.

We plan to deploy this change incrementally over a period of 4-6 weeks starting on the flag day. We will start with a low percentage of queries using the new EDNS0 bufsize value on the flag day and increase to 100% coverage by the end of the period. In case of significant problems, we will pause or roll back the changes and communicate this to the community.
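For readers less familiar with EDNS0: the "bufsize" is the UDP payload size a requester advertises in the CLASS field of the OPT pseudo-record (RFC 6891). The 1232 value is commonly derived as the 1280-byte IPv6 minimum MTU minus 40 bytes of IPv6 header and 8 bytes of UDP header. The sketch below builds such a query with the Python standard library and, purely as a hypothetical illustration of a percentage-based rollout, picks the new or old value per query name by hashing. The plan above does not say how queries are selected, and `build_query` and `pick_bufsize` are made-up names.

```python
import hashlib
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as DNS wire-format labels (RFC 1035, section 3.1)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the root name "." has no labels
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # terminating zero-length root label
    return bytes(out)


def build_query(qname: str, bufsize: int, qtype: int = 1, qid: int = 0x1234,
                rd: bool = False) -> bytes:
    """Build a DNS query carrying an EDNS0 OPT record that advertises `bufsize`."""
    flags = 0x0100 if rd else 0x0000  # RD is normally clear on resolver-to-authoritative queries
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 1)
    question = encode_name(qname) + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    # OPT pseudo-RR (RFC 6891): root owner name, TYPE=41, CLASS=UDP payload size,
    # TTL=extended RCODE/version/flags (all zero here), RDLENGTH=0 (no options).
    opt = b"\x00" + struct.pack("!HHIH", 41, bufsize, 0, 0)
    return header + question + opt


def pick_bufsize(qname: str, rollout_pct: float, new: int = 1232, old: int = 4096) -> int:
    """Hypothetical staged rollout: a stable share of names gets the new value."""
    digest = hashlib.sha256(qname.lower().encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100  # stable value in [0, 100)
    return new if bucket < rollout_pct else old


if __name__ == "__main__":
    size = pick_bufsize("example.com.", rollout_pct=5.0)
    query = build_query("example.com.", bufsize=size)
    print(size, query.hex())
```

Hashing the name (rather than choosing randomly per query) keeps a given name on the same value at a given rollout percentage, which makes failures easier to reproduce; raising `rollout_pct` from a low value to 100 mirrors the "low percentage to 100% coverage" schedule described above.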

  • Client to Google Public DNS Resolver
    For UDP queries, our current implementation uses the client-specified EDNS0 bufsize to determine the size of the response to send. Responses that do not fit are truncated, and clients can retry them over TCP.

This behavior has been working well for our users. We do not plan to make any changes on the client side.
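The client-side behavior described above (a truncated UDP answer, then a retry over TCP) can be sketched as a minimal stub client using only the standard library. This is a hypothetical illustration, not Google's code: `query_with_tcp_fallback` is a made-up name, it expects an already-encoded query that carries its own EDNS0 bufsize, and it assumes an IPv4 server address and omits the retries and validation a real client needs.

```python
import socket
import struct

TC_BIT = 0x0200  # "truncated" flag in the DNS header flags word (RFC 1035, section 4.1.1)


def is_truncated(message: bytes) -> bool:
    """Return True if the TC flag is set in a DNS message header."""
    if len(message) < 12:
        raise ValueError("too short to be a DNS message")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte length (RFC 1035, section 4.2.2)."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        buf += chunk
    return bytes(buf)


def query_with_tcp_fallback(query: bytes, server: str, port: int = 53,
                            timeout: float = 2.0) -> bytes:
    """Send `query` over UDP; if the answer is truncated, repeat it over TCP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, port))
        response, _ = udp.recvfrom(65535)
    if response[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    if not is_truncated(response):
        return response
    # Truncated: the server could not fit the answer into the advertised bufsize.
    with socket.create_connection((server, port), timeout=timeout) as tcp:
        tcp.sendall(frame_for_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

This is also why servers that answer over UDP but not over TCP become a problem once resolvers advertise smaller buffer sizes: the TCP branch is the only way to obtain a large answer.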

  • Summary of experiment results
    NOTE: Experiments with 1232 and 1400 were run in different metros which resulted in different levels of UDP truncation.

| Experiment | EDNS0 bufsize | Queries with UDP truncation | Queries with TCP retry failure |
| --- | --- | --- | --- |
| 1232 experiment | 4096 (baseline) | 0.345% | 0.115% |
| 1232 experiment | 1232 | 0.367% | 0.116% |
| 1400 experiment | 4096 (baseline) | 0.238% | 0.072% |
| 1400 experiment | 1400 | 0.259% | 0.071% |

@pspacek
Contributor

@pspacek pspacek commented Aug 14, 2020

If I read the table above correctly, dropping the EDNS buffer size to 1232 adds TCP fallback for roughly 0.022 % of queries. That seems negligible to me. Out of curiosity, do you have error bounds for these measurements?

Also, the increase in TCP failure rate, roughly 0.001 %, seems almost like measurement error.
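The differences quoted above are absolute changes in percentage points between each experiment and its own 4096 baseline; because the two experiments ran in different metros, each bufsize should only be compared with its own baseline. A small sketch reproducing that arithmetic from the posted numbers (`deltas` is a hypothetical helper, and the relative-change figure is an extra computed here, not something reported in the thread):

```python
# Rates posted in the thread, in percent of queries: (UDP truncation, TCP retry failure).
RESULTS = {
    1232: {"baseline": (0.345, 0.115), "experiment": (0.367, 0.116)},
    1400: {"baseline": (0.238, 0.072), "experiment": (0.259, 0.071)},
}


def deltas(baseline: tuple[float, float], experiment: tuple[float, float]) -> dict[str, float]:
    """Absolute changes in percentage points, plus the relative change of truncation."""
    trunc_pp = experiment[0] - baseline[0]
    tcp_fail_pp = experiment[1] - baseline[1]
    return {
        "truncation_pp": round(trunc_pp, 3),
        "tcp_failure_pp": round(tcp_fail_pp, 3),
        "truncation_relative_pct": round(100 * trunc_pp / baseline[0], 1),
    }


if __name__ == "__main__":
    for bufsize, rates in RESULTS.items():
        print(bufsize, deltas(rates["baseline"], rates["experiment"]))
```

For 1232 this gives +0.022 percentage points of truncation (about 6.4 % more truncated queries relative to the baseline) and +0.001 points of TCP retry failure; for 1400 it gives +0.021 and -0.001.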

Thank you very much for these measurements, it is much appreciated.

@puneetsood

@puneetsood puneetsood commented Aug 26, 2020

There is variability across metros, and we do see a decrease in the truncation and TCP retry failure rates in some metros. The TCP failure rate in the experiment ranges between 0.0046% and 0.0104%.

| Increase in experiment | Min (all) | Max (all) | Average (1232) | Average (1400) | Average (all) |
| --- | --- | --- | --- | --- | --- |
| Truncation | -0.0474% | 0.0628% | 0.0330% | 0.0123% | 0.0218% |
| TCP retry failure | -0.0354% | 0.0009% | -0.0014% | -0.0065% | -0.0042% |

@pspacek
Contributor

@pspacek pspacek commented Sep 8, 2020

Apparently we forgot to close the issue even though the date is set, let me fix this mistake!
