Skip to content

Latest commit

 

History

History
47 lines (34 loc) · 1.87 KB

45-how-to-design-netflix-view-state-service.md

File metadata and controls

47 lines (34 loc) · 1.87 KB
layout title date comments categories language references
post
How Netflix Serves Viewing Data?
2018-09-13 20:39
true
system design
en

Motivation

How to keep users' viewing data in scale (billions of events per day)?

Here, viewing data means...

  1. viewing history. What titles have I watched?
  2. viewing progress. Where did I leave off in a given title?
  3. on-going viewers. What else is being watched on my account right now?

Architecture

Netflix Viewing Data Architecture

The viewing service has two tiers:

  1. stateful tier = active views stored in memory

    • Why? to support the highest volume read/write
    • How to scale out?
      • partitioned into N stateful nodes by account_id mod N
        • One problem is that load is not evenly distributed and hence the system is subject to hot spots
      • CP over AP in CAP theorem, and there is no replica of active states.
        • One failed node will impact 1/nth of the members. So they use stale data to degrade gracefully.
  2. stateless tier = data persistence = Cassandra + Memcached

    • Use Cassandra for very high volume, low latency writes.
      • Data is evenly distributed. No hot spots because of consistent hashing with virtual nodes to partition the data.
    • Use Memcached for very high volume, low latency reads.
      • How to update the cache?
        • after writing to Cassandra, write the updated data back to Memcached
        • eventually consistent to handling multiple writers with a short cache entry TTL and a periodic cache refresh.
      • in the future, prefer Redis' appending operation to a time-ordered list over "read-modify-writes" in Memcached.