SDLC, Production Support, Bugs, Escalation, etc #620

Open
thegreatfatzby opened this issue Jun 12, 2023 · 1 comment

thegreatfatzby commented Jun 12, 2023

(To be clear, this applies to both TEE and on-device auctions. Both are interesting, but the on-device case is particularly so.)

What is the thinking around the operational aspects of running systems across all of Privacy Sandbox? (I'm asking here because I had to choose somewhere, but it applies to all of the proposals.) At the various iterations of this company I've been at (AppNexus, Xandr, Microsoft Ads), we've had processes of varying levels of respectability, such as:

  • A defined process by which bugs can be submitted by clients, evaluated, routed and escalated if needed.
  • A 24/7 pager rotation with primary and secondary, plus SLAs for what counts as an Incident (I believe Google calls them OMGs): expected ack time in the single to low double digit minutes; resolution time depends on the incident, but anything beyond a few hours triggers a fairly serious escalation depending on the system.
  • Incident (OMG) post-mortems to focus significant attention on the root cause, prevention, amelioration, etc.
  • Metrics for observability with passive/active alerts that can be routed to teams.
  • Code review and testing requirements to prevent regressions.
  • Critical systems having phased deployments to help identify unexpected issues.
  • Beatings to keep morale up.
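
To make the pager item concrete, here is a minimal sketch of the kind of ack/escalation rule described above. Everything in it is an assumption for illustration: the 15-minute and 3-hour thresholds, the `Incident` type, and the escalation targets are hypothetical and do not describe any real company's policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds, chosen only to mirror "ack within minutes,
# serious escalation after a few hours".
ACK_SLA = timedelta(minutes=15)
RESOLVE_SLA = timedelta(hours=3)


@dataclass
class Incident:
    opened_at: datetime
    acked_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None


def escalation_target(incident: Incident, now: datetime) -> str:
    """Return who should be paged for this incident at time `now`."""
    if incident.resolved_at is not None:
        return "none"
    age = now - incident.opened_at
    if age > RESOLVE_SLA:
        # Unresolved for too long: escalate regardless of ack state.
        return "management"
    if incident.acked_at is None and age > ACK_SLA:
        # Primary did not ack in time: page the secondary.
        return "secondary"
    return "primary"
```

The open question in this issue is who plays "primary" and "secondary" when the failing component is the browser rather than the ad tech's own system.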

Privacy Sandbox introduces some really interesting challenges to overcome:

  • Broadly, we are now going from a distributed environment of thousands of executors to millions (billions??!!)
  • An Ad Tech's code now runs in an execution environment it does not own from a code, operations, or accountability perspective.
  • That execution environment can now have revenue impacting bugs of different severities; those bugs may not impact all consumers or ad techs equally.
  • Deployment of that execution environment represents a change to the ad tech's environment.
  • Chrome browser metrics are now business relevant to other businesses.
  • The Chromium SDLC, including deployment (see above) but also standards, code review, etc., now impacts those businesses.
  • Tests for the ad tech's code are now, in theory, relevant to Chrome deployment.
  • But of course, Chrome is a platform with many stakeholders, so Chrome being tightly coupled to every ad tech's deployment process would be...suboptimal.

So my question is...what do we do about all that? Some particular questions:

  • How do ad techs "observe" this system, meaning things like metrics and anonymized logs?
  • How will the browser owner (Google in the case of Chrome, MSFT in the case of Edge) observe what is happening that is relevant to ad tech?
  • Should Chrome/Edge have 24/7 support available for ad techs who have been woken up at 3 AM (it's never at 2 PM) because event level reporting is failing in X% of revenue impacting cases after a rollout of Chrome x.y.z ramps up to 100%?
  • How does a bug get submitted? Is the assumption that somehow it gets to the ad tech and they figure out if it belongs to Chromium or to them?
  • How will Chromium deploy in a way that minimizes the risk of ad tech regressions? Will ad tech test suites be factored in?
  • Do ad tech engineers have some general say in the Chrome/Edge SDLC, code review, testing standards and strategy w/r/t auction and bidding, etc?
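
As an illustration of the "X% of cases failing after Chrome x.y.z ramps up" scenario, here is a rough sketch of the per-version comparison an ad tech (or the browser vendor) might want to run, assuming some aggregated, privacy-safe signal of report success per browser version were available. The input shape, function names, and the 2-point threshold are all hypothetical; nothing here reflects an existing Privacy Sandbox or Chromium API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def failure_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the failure rate per browser version.

    `events` is a hypothetical stream of (browser_version, report_succeeded).
    """
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for version, succeeded in events:
        totals[version] += 1
        if not succeeded:
            failures[version] += 1
    return {v: failures[v] / totals[v] for v in totals}


def regressed_versions(
    events: Iterable[Tuple[str, bool]],
    baseline_version: str,
    max_increase: float = 0.02,  # hypothetical alert threshold
) -> List[str]:
    """List versions whose failure rate exceeds the baseline's by more than
    `max_increase`. Raises KeyError if the baseline version has no events."""
    rates = failure_rates(events)
    baseline = rates[baseline_version]
    return sorted(
        v for v, rate in rates.items()
        if v != baseline_version and rate - baseline > max_increase
    )
```

Even with a check like this in place, the harder questions above remain: who gets paged when it fires, and who decides whether to halt the rollout.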

@thegreatfatzby

Did a small edit but wanted to add a bigger related thought separately.

More fundamentally, we'll be putting Chrome/Edge/etc. into not just advertising code execution but advertising business relationships. When I've been woken up at 3 AM to fix an IM that I (or occasionally someone else) caused, what I'm ostensibly doing is fixing code/data/systems, but what I'm really doing is keeping clients happy and maintaining revenue, both of which I have a direct financial incentive to protect. When a Product Manager and I discuss a backlog, we're ostensibly deciding on work for our team(s), but what we're really doing is keeping clients happy to maintain or expand revenue.

What is the thinking on how Chrome/Edge will incorporate those relationships into their planning, response times, etc.? If a Publisher comes to MSFT-Xandr hoppin' mad because their monetization has dropped and it's a core Chromium Ads issue, how will their anger be translated into action?

(Note: clearly, the browser is already a critical part of the advertising ecosystem: HTTP requests need to work, cookies need storage, etc. But those functions are part of a platform, a fairly well contained, distinct layer that ad tech functions are built on, and business relationships and processes are built around the parties above that layer. Chrome/Edge/Chromium are now direct parties in the auction, bidding, notification, and reporting processes, rather than a platform those functions are built around, and are therefore more directly and deeply involved in a business relationship than they were before.)
