# AWS - Customer story

## Airbnb
- Airbnb is an online marketplace where guests and hosts can connect with each other and share accommodations

<img src="images/airbnb_AWS_architecture.png">

**Challenges solved by business architecture**
- Airbnb runs its own multi-tenant Kubernetes clusters on EC2
- it operates the clusters itself because it sometimes needs to backport fixes or add new features to Kubernetes
- challenges faced in the past were
  - there are many different pods that run on a single host
  - these pods need fine-grained access controls, i.e. fine-grained IAM roles
- currently
  - the control plane runs on an EC2 host
  - that host is physically separate from all worker nodes
  - the control plane manages all the pods in cluster
  - the control plane injects service account tokens into the pods
    - these tokens are cryptographically signed and contain an identity
    - each token carries the namespace the pod belongs to and the name of the pod
  - the pod can then send this token directly to STS for verification
  - STS verifies that the token is valid
  - STS then checks the requested IAM role and verifies that the pod's namespace and name are allowed to assume it
  - if the checks pass, STS issues IAM credentials and sends them back to the pod
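
To make the token step concrete, here is a minimal sketch of reading the identity out of a service account token. It assumes the token is a JWT whose payload follows the layout of Kubernetes bound service account tokens; the issuer URL, namespace, pod name and service account name are all hypothetical, and the signature is a placeholder, so nothing is verified here (in the flow above, verification is done by STS).

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))


# Hypothetical claims, shaped like a projected service account token.
claims = {
    "iss": "https://oidc.example.com",  # hypothetical issuer URL
    "aud": ["sts.amazonaws.com"],
    "sub": "system:serviceaccount:payments:billing-sa",
    "kubernetes.io": {
        "namespace": "payments",
        "pod": {"name": "billing-7d9f"},
        "serviceaccount": {"name": "billing-sa"},
    },
}
header = {"alg": "RS256", "typ": "JWT"}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(claims).encode()),
    "signature-placeholder",  # a real token carries the control plane's signature
])

decoded = decode_claims(token)
identity = (
    decoded["kubernetes.io"]["namespace"],
    decoded["kubernetes.io"]["pod"]["name"],
)
```

The `identity` tuple is the namespace and pod name that the role check relies on.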
  
**From a multi-tenant perspective, how are pods isolated from one another within a Kubernetes cluster?**
- each pod has its own identity that is provided by the control plane
- in the IAM role definition, service owners can specify that role XYZ may be assumed only by pods in namespace XYZ
- because the role is tied to that identity, pods in other namespaces are guaranteed to be unable to assume the XYZ IAM role
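
The namespace restriction can be sketched as an IAM trust policy plus a toy check. This is an illustration under assumptions, not Airbnb's actual policy: the issuer host, the account ID and the `may_assume` helper are hypothetical, and the helper models only a single `StringLike` condition on the token subject rather than real IAM evaluation.

```python
from fnmatch import fnmatchcase

ISSUER = "oidc.example.com"  # hypothetical OIDC issuer host

# Trust policy for role XYZ: only subjects in namespace "xyz" may assume it.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Federated": f"arn:aws:iam::111122223333:oidc-provider/{ISSUER}"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringLike": {f"{ISSUER}:sub": "system:serviceaccount:xyz:*"}
        },
    }],
}


def may_assume(policy: dict, subject: str) -> bool:
    """Toy check: does an Allow statement's subject pattern match?"""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        pattern = statement["Condition"]["StringLike"][f"{ISSUER}:sub"]
        if fnmatchcase(subject, pattern):
            return True
    return False


same_namespace = may_assume(trust_policy, "system:serviceaccount:xyz:web")
other_namespace = may_assume(trust_policy, "system:serviceaccount:other:web")
```

Here `same_namespace` is `True` and `other_namespace` is `False`, which is the isolation property described above.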

**In terms of reconciliation and making sure that the pods are assuming the right role, how are they monitored and audited?**
- STS emits events into CloudTrail whenever it creates new tokens and whenever a pod assumes a new IAM role
- these events are then ingested into an Elasticsearch cluster
- an engineer can then see
  - when a pod assumes an IAM role
  - which IAM roles are being used
  - when a failure happens, often before the service owner knows about it, so that they can
    - check the definition of the IAM role to make sure it is correct
    - alert the service owner automatically
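
The audit step can be sketched as a filter over CloudTrail records. The field names (`eventSource`, `eventName`, `errorCode`, `requestParameters.roleArn`) follow the CloudTrail record format; the sample records, role ARNs and the `summarize_assume_role_events` helper are hypothetical, and a real setup would run an equivalent query in Elasticsearch rather than in Python.

```python
from collections import Counter


def summarize_assume_role_events(records):
    """Split STS web-identity events into per-role success counts and failures."""
    used = Counter()
    failures = []
    for record in records:
        if record.get("eventSource") != "sts.amazonaws.com":
            continue
        if record.get("eventName") != "AssumeRoleWithWebIdentity":
            continue
        role = (record.get("requestParameters") or {}).get("roleArn", "unknown")
        if "errorCode" in record:
            # Failed attempts carry an errorCode; these are the ones to alert on.
            failures.append((role, record["errorCode"]))
        else:
            used[role] += 1
    return used, failures


# Hypothetical sample records.
sample = [
    {"eventSource": "sts.amazonaws.com",
     "eventName": "AssumeRoleWithWebIdentity",
     "requestParameters": {"roleArn": "arn:aws:iam::111122223333:role/xyz"}},
    {"eventSource": "sts.amazonaws.com",
     "eventName": "AssumeRoleWithWebIdentity",
     "requestParameters": {"roleArn": "arn:aws:iam::111122223333:role/abc"},
     "errorCode": "AccessDenied"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]

used, failures = summarize_assume_role_events(sample)
```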


<img src="https://www.codekarle.com/images/Airbnb.png">

## SBB Cargo: Data collection and processing with serverless analytics services

- SBB - Swiss Federal Railways
- SBB Cargo is its freight company

**What is the project**
- previously, freight trains had to be inspected physically, which took a lot of time; the goal is to reduce that time
- sensors are installed on the trains and along the track
- inspections can then focus on the critical parts and be done quickly

**Architecture**
- the producer is the locomotive, which sends a lot of data: 3 million samples a day
- Lambda takes this data and forwards it to Kinesis
- Kinesis Analytics filters the 3 million samples and forwards the result into a Kinesis stream
- this data is then sent to Lambda, which pushes it into DynamoDB in the right format
- a consumer can be an algorithm that decides whether a train is in good condition
- the consumer requests this data through an API Gateway interface
- the data is sent from DynamoDB through Lambda to the consumer
- Lambda handles the different data formats entering the system, which enables different pipelines
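
The Lambda step between the Kinesis stream and DynamoDB might look roughly like the sketch below. The event shape (`Records[].kinesis.data`, base64-encoded) is what Lambda passes for a Kinesis trigger; the sample fields (`wagon`, `ts`, `sensor`, `value`), the item layout and the injectable `put_item` callback are assumptions made for illustration. In a deployed function the callback would be a DynamoDB write.

```python
import base64
import json


def to_item(sample: dict) -> dict:
    """Map one sensor sample to a flat item; all field names are hypothetical."""
    return {
        "wagon_id": sample["wagon"],
        "measured_at": sample["ts"],
        "sensor": sample["sensor"],
        "value": str(sample["value"]),  # kept as a string in this sketch
    }


def handler(event, context=None, put_item=None):
    """Decode Kinesis records from a Lambda event and pass each item to put_item."""
    items = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        item = to_item(json.loads(raw))
        if put_item is not None:
            put_item(item)
        items.append(item)
    return items


# Local usage with a hand-built event and an in-memory "table".
payload = {"wagon": "W-42", "ts": "2020-01-01T00:00:00Z",
           "sensor": "axle_temp", "value": 71.5}
event = {"Records": [
    {"kinesis": {"data": base64.b64encode(json.dumps(payload).encode()).decode()}}
]}
stored = []
result = handler(event, put_item=stored.append)
```

Keeping the format mapping in one small function is what lets the same handler shape serve several pipelines with different input formats.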

**Why was another solution, such as EC2, not considered?**
- a priority of this project was to keep operations to a minimum

<img src="images/ssb_AWS_arch.png">  

## Disney+ scales globally on Amazon DynamoDB
- **Introduction**
  - What is Disney+
    - video streaming service
    - content discovery team is responsible for APIs that serve the content metadata for Disney+ application
    - metadata around the videos
    - launched in Nov 2019
    - 3 billion requests to content APIs  a day
    - tens of TB of content metadata a day
    - hundreds of TB of images a day
    - hundreds of millions of recommendations per day, inserted into DynamoDB
- **DynamoDB use cases (architecture)**
  - **watchlist**
    - backed by a global table
    - simple: a service in front of a global DynamoDB table, which lets us query the watchlist-related tables
    - sync across all regions, low latency
  - **bookmarks**
    - start watching a video, pause it, then pick it up where you left off, or resume it on another device
    - while a user watches a video, the app sends a stream of bookmark data to a telemetry service in the AWS region nearest to the video player
    - that service writes the bookmark data to a Kinesis stream; the data is then read from the stream and inserted into a global DynamoDB table that exists in the regions where the content API is deployed
    - the client then requests the bookmark data from one of the content API services when it loads the homepage or a movie page
    - this architecture decouples where bookmark data is ingested from where it is served to clients
    - it can be deployed to any number of regions just by adding a region to the global DynamoDB table
  - **recommendations**
    - the ML team generates recommendations and writes them into a Kinesis stream in a single region
    - the Kinesis stream is read and the recommendations are placed into a DynamoDB global table
    - they can then be used in any region through the content API
    - DynamoDB takes care of replication
  - **content caching**
    - DynamoDB is not the primary datastore in this use case; a separate document datastore is
    - DynamoDB caches the results of queries that are run against the document datastore
    - each result is cached with a TTL (time-to-live)
    - basic use case is
      - when a user queries for a piece or set of content, the DynamoDB cache is checked
      - if the entry is missing or has expired based on its TTL
      - pull the result from the primary datastore and put it into the cache
      - serve it from there
      - this buffers requests to the primary datastore
- **Takeaways**
  - **on occasions like failovers or planned maintenance, regions are evacuated** 
    - traffic moves to a different region to maintain the reliability and performance of the service
    - the replication that DynamoDB offers has latencies in the low single-digit seconds
    - this allows traffic to be shifted
  - **when Disney+ is launched in new countries**
    - an additional AWS region can be used
    - just by adding it to the DynamoDB global table
  - **scaling - recommendations and bookmarks volume growth**
    - DynamoDB grows along with user base
    - very little operational overhead
  - **on-demand vs provisioned mode**
    - ability to switch back and forth between on-demand and provisioned mode
    - helps when launching in a new country where the required capacity is unknown
  - **pre-partitioning**
    - DynamoDB partitions data as storage and throughput grow
    - the number of partitions grows as well, to maintain a high level of throughput
    - on launch day 1, a large influx of bookmark data was expected
    - into an empty table
    - had the table not been partitioned beforehand to meet that scale, it would have experienced throttling
    - to meet the anticipated read and write throughput without throttling, tables were pre-partitioned prior to launch
    - a provisioned write throughput value was set
    - allowing DynamoDB to partition based on that
    - the tables were then switched back to on-demand mode prior to launch
    - this avoided throttles on those tables
  - **supporting concentrated read traffic**
    - each partition has its own read/write limits, which the items living in it share
    - in the content cache, some content is more popular than other content
    - this can result in a disproportionate number of lookups against one partition, which then results in throttling
    - to resolve this, a sequence number was appended to the end of the key
    - the data was written to more than one entry, one per sequence number
    - on a request for any content, a GUID from the request is hashed to one of the sequence numbers
    - the resulting key is then used for the lookup
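
For the pre-partitioning takeaway above, a rough back-of-the-envelope estimate shows why setting a provisioned write value ahead of launch forces partitions to exist. The sketch assumes the per-partition ceilings of 3,000 read capacity units and 1,000 write capacity units; the estimate itself is a simplification, DynamoDB's actual partitioning is internal and may differ, and the throughput numbers are hypothetical.

```python
import math

MAX_RCU_PER_PARTITION = 3000  # per-partition read ceiling
MAX_WCU_PER_PARTITION = 1000  # per-partition write ceiling


def estimated_partitions(read_units: int, write_units: int) -> int:
    """Rough estimate of partitions needed for the given provisioned throughput."""
    needed = (read_units / MAX_RCU_PER_PARTITION
              + write_units / MAX_WCU_PER_PARTITION)
    return max(1, math.ceil(needed))


# Hypothetical launch-day target: 40,000 writes per second into an empty table.
launch_partitions = estimated_partitions(read_units=0, write_units=40000)
```

Under these assumptions an empty table would need on the order of tens of partitions before the first request arrives, which is what provisioning the write value up front achieves.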
    
    
<p align="center"><img src="./images/disneyplus_watchlist.png" width=400 height=400></p>

$\tiny{\text{YouTube - Disney+ - Watchlist}}$   

<p align="center"><img src="./images/disneyplus_bookmark.png" width=400 height=400></p>

$\tiny{\text{YouTube - Disney+ - Bookmark}}$   

<p align="center"><img src="./images/disneyplus_recom.png" width=400 height=400></p>

$\tiny{\text{YouTube - Disney+ - Recommendation}}$   

<p align="center"><img src="./images/disneyplus_contentCache.png" width=400 height=400></p>

$\tiny{\text{YouTube - Disney+ - Content Caching}}$   
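
The content caching flow can be sketched as a read-through cache with a TTL. This is a minimal in-memory model, not the Disney+ implementation: the dictionary stands in for the DynamoDB cache table, `load_from_primary` stands in for a query against the primary document datastore, and the class name, key and TTL value are hypothetical.

```python
import time


class ReadThroughCache:
    """Serve from the cache when fresh, otherwise load from the primary store."""

    def __init__(self, load, ttl_seconds, clock=time.time):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at); stands in for the table

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, still within its TTL
        value = self._load(key)  # missing or expired: go to the primary store
        self._entries[key] = (value, now + self._ttl)
        return value


calls = []


def load_from_primary(key):
    calls.append(key)  # record each trip to the primary datastore
    return f"metadata for {key}"


now = [0.0]  # fake clock so the example does not need to sleep
cache = ReadThroughCache(load_from_primary, ttl_seconds=60, clock=lambda: now[0])
first = cache.get("movie-1")   # miss: loads from the primary store
second = cache.get("movie-1")  # hit: served from the cache
now[0] = 61.0
third = cache.get("movie-1")   # expired: reloads from the primary store
```

The explicit expiry check matters with DynamoDB because its TTL feature deletes expired items asynchronously, so a read can still return an item that is past its expiry time.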

<p align="center"><img src="./images/disneyplus_supprtConcReadTraffic.png" width=400 height=400></p>

$\tiny{\text{YouTube - Disney+ - Supporting Concentrated Read Traffic}}$   
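
The key-suffix strategy for hot content can be sketched as follows. The shard count, the `#` separator, the use of SHA-256 and the helper names are assumptions for illustration; the notes only state that a sequence number is appended to the key and that a request GUID is hashed to pick one.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of copies per cached entry


def shard_keys(content_id: str, num_shards: int = NUM_SHARDS) -> list:
    """All keys a cached entry is written under, one per sequence number."""
    return [f"{content_id}#{n}" for n in range(num_shards)]


def read_key(content_id: str, request_guid: str,
             num_shards: int = NUM_SHARDS) -> str:
    """Pick one of the copies by hashing the request GUID to a sequence number."""
    digest = hashlib.sha256(request_guid.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % num_shards
    return f"{content_id}#{n}"


# Write path: the same cached value is stored under every suffixed key.
table = {key: "cached result" for key in shard_keys("popular-movie")}

# Read path: different requests land on different copies.
keys_hit = {read_key("popular-movie", f"guid-{i}") for i in range(1000)}
value = table[read_key("popular-movie", "guid-0")]
```

The cost of this design is write amplification: every cache fill writes `NUM_SHARDS` items so that reads for popular content spread across more keys, and therefore potentially across more partitions.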