# Concerns

#### Features
- Features of the product
- Eg: Design FB messenger
    + 1-to-1 chat feature
    + seen/read message feature
    + Group chat and security

#### APIs
- Which APIs does each service expose?
- Who calls those APIs?

#### Availability
- How available is the service?
- Example: If a host goes down -> is the service still available?

#### Latency Performance
- Customer facing request -> must care about latency
- Solution: Add caches to improve latency

#### Scalability
- Can the service scale to 1,000, 100,000, 1,000,000 users?
- Can the system scale if we add more users/requests?
    + Are services still available?
    + How does latency change?

#### Durability
- Can data stored in the database be lost or compromised? How is it secured?

#### Class Diagram
- OOP design

#### Security and Privacy
- Some systems focus on Security and Privacy, e.g. an authentication system

#### Cost Effective


# Concepts
### Vertical vs horizontal Scaling
- **Vertical Scaling** = upgrade computing resources: CPUs, memory, hard drives...
    + Expensive
    + Limited by the maximum capacity of a single machine
- **Horizontal Scaling** = add more machines/hosts
    + Faces distributed system challenges
    + Usually preferred over Vertical Scaling for large systems
    
### CAP Theorem
- **Consistency**
    + Every read gets the most recent write
- **Availability**
    + Every request gets a response, which may or may not reflect the most recent write
- **Partition Tolerance**
    + The system keeps working even when network packets between nodes are dropped or delayed
    
```
CAP theorem: can only guarantee 2 of the 3
    + In a distributed system, Partition Tolerance must be ensured
    + So the real trade-off is between Consistency and Availability
```

- **Traditional Relational database**: usually favors Consistency
- **No-SQL**: often favors Availability

### ACID vs BASE
- **ACID**
    + Atomicity: All operations in a transaction succeed or every operation is rolled back.
    + Consistency: A transaction moves the database from one valid state to another, so all defined constraints still hold
    + Isolation: 
        - One transaction cannot read data from another transaction that is not yet completed
        - If two transactions are executing concurrently, each one will see the world as if they were executing sequentially
        - If one needs to read data that is written by another, it will have to wait until the other is finished.
    + Durability
        - Once a transaction is complete, it is guaranteed that all of the changes have been recorded to a durable medium (such as a hard disk)
        - The fact that the transaction has completed is likewise recorded
    + Used mostly in relational database
    
- **BASE**
    + Basically Available: the system guarantees availability (every request gets a response, possibly with stale data)
    + Soft state: the state of the system may change over time, even without input
    + Eventual consistency: system will become consistent over time, given that the system doesn't receive input during that time.
    + Used mostly in No-SQL database
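
The Atomicity property from the ACID list above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `accounts` table, the `transfer` helper, and the balances are made-up names and values. The credit runs first and the debit then violates a `CHECK` constraint, so the whole transaction is rolled back.

```python
import sqlite3

# In-memory database with a constraint that forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_big = transfer(conn, "alice", "bob", 500)   # fails: alice only has 100
after_failure = balances(conn)                 # unchanged by the failed transfer
ok_small = transfer(conn, "alice", "bob", 30)  # succeeds
after_success = balances(conn)
```

After the failed transfer the balances are unchanged, even though the credit to `bob` had already been executed inside the transaction.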
    
### Partition/sharding data
- Store data over multiple different nodes
- Technique: Consistent hashing
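
A minimal sketch of consistent hashing, assuming MD5 as the ring hash and 100 virtual nodes per host. `ConsistentHashRing`, the node names, and the `user:N` keys are hypothetical and chosen only for illustration. The point it shows: when a node is added, only the keys that now belong to the new node move.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a position on the ring (any stable hash would do).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring many times (virtual nodes)
        # so that keys are spread more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.get_node(key) for key in keys}

# Keys that changed owner; every one of them moved to the new node.
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved")
```

With a plain `hash(key) % number_of_nodes` scheme, adding a node would instead remap most keys.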

### Optimistic vs Pessimistic Locking
- **Optimistic Locking**
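    + Run the transaction without acquiring locks
    + At commit -> check that no other transaction has modified the record (e.g. via a version number); if one has, abort and retry
- **Pessimistic Locking**
    + Acquire all needed locks before touching the data; other transactions wait until the locks are released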
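
Optimistic locking is often implemented with a version number per record. The sketch below assumes an in-memory store; `VersionedStore`, `VersionConflict`, and the `balance` key are hypothetical names for illustration. A write succeeds only if the version it read is still the current one.

```python
import threading


class VersionConflict(Exception):
    """Raised when a record changed since the caller read it."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (value, version)
        # This lock only protects the short compare-and-set step,
        # it is not held for the duration of a transaction.
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                raise VersionConflict(
                    f"{key}: expected version {expected_version}, found {current}"
                )
            self._data[key] = (value, current + 1)
            return current + 1


store = VersionedStore()
store.write("balance", 100, expected_version=0)

value_a, version_a = store.read("balance")  # transaction A reads version 1
value_b, version_b = store.read("balance")  # transaction B reads version 1

store.write("balance", value_a - 30, version_a)  # A commits, version is now 2
try:
    store.write("balance", value_b - 50, version_b)  # B holds a stale version
    conflict = False
except VersionConflict:
    conflict = True  # B has to re-read the record and retry
```

Pessimistic locking would instead make transaction B wait before it could even read the record.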

### Strong vs Eventual Consistency
- **Strong Consistency**: 
    + Every read sees the latest write
    + Typical of relational databases
- **Eventual Consistency**: 
    + A read may see a stale write
    + Eventually, all reads see the latest write
    + Often chosen for No-SQL databases -> ensures high availability
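
A toy sketch of the difference, assuming one primary copy and replicas that receive writes with a delay. `ReplicatedStore`, `read_strong`, `read_eventual`, and `replicate` are hypothetical names; a real database replicates in the background rather than on an explicit call.

```python
class ReplicatedStore:
    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet applied to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_strong(self, key):
        # Reading from the primary always sees the latest write.
        return self.primary.get(key)

    def read_eventual(self, key, replica=0):
        # Reading from a replica may return stale data.
        return self.replicas[replica].get(key)

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore()
store.write("status", "online")

strong = store.read_strong("status")            # sees the new value
stale = store.read_eventual("status")           # replica has not caught up yet
store.replicate()
caught_up = store.read_eventual("status")       # now sees the new value
```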
    
### Relational vs No-SQL database
- **Relational database**: Strong Consistency
- **No-SQL database**: High Availability, Scalable

### Types of NoSQL database
- key-value
- wide column: each row can have its own set of columns
- Document based
- Graph based

### Caching
- Store frequently used data in a cache -> quick access
- 2 types of cache
    + Cache shared between nodes
    + Each node has its own cache
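
A minimal sketch of a per-node, in-memory cache. The least-recently-used (LRU) eviction policy is an assumption added here, since the notes do not name one, and `LRUCache` is a hypothetical class name.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # cache is full, so "b" is evicted
```

A shared cache (for example Memcached or Redis) follows the same get/put pattern, but the data lives on separate hosts so every node sees the same entries.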
    
### Latency
- Between data centers
- Between racks in a data center
- Between hosts in a rack

### Computing resources
- CPU, memory, hard drive, network bandwidth
- Resources are limited

### Random vs Sequential read/write on disk
- Sequential read/write is faster than random read/write on a hard disk
- Try to avoid random read/write

### http - http2 - websockets
- http: request/response architecture between clients and servers
- http2: improves on http; multiplexes multiple requests over a single connection
- websockets: bi-directional communication between clients and servers over one long-lived connection

### https and TLS
- https
    + Secured version of http: http over TLS
- TLS
    + Transport Layer Security
    + Ensures confidentiality and integrity of data between server and client
    
### Certificate Authority
- A Certificate Authority issues certificates for websites. Eg: Facebook, Google
- Verifies that a public key really belongs to the certified website
- Prevents a 3rd party from impersonating the website

### Symmetric and Asymmetric Encryption
- Asymmetric Encryption
    + Public - private key encryption
    + Slow -> suited to small data, e.g. exchanging a symmetric key
- Symmetric Encryption
    + Same key for encryption and decryption
    + Fast -> suited to large amounts of data
    + Eg: AES
    
### Load balancer
- Distributes requests across nodes
- L4: routes using client and destination IP addresses and port numbers
- L7: routes using application-level data such as the http URI
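
A small sketch of both ideas, assuming round-robin as the distribution policy (the notes do not name one). `RoundRobinBalancer`, `PathRouter`, and the backend names are hypothetical. The router mimics L7 behaviour by choosing a backend pool from the URI path prefix.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class PathRouter:
    """L7-style routing: pick a backend pool from the URI path prefix."""

    def __init__(self, routes, default):
        # Check longer prefixes first so "/api/chat" would win over "/api".
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)
        self._default = default

    def pick(self, path):
        for prefix, pool in self._routes:
            if path.startswith(prefix):
                return pool.pick()
        return self._default.pick()


api_pool = RoundRobinBalancer(["api-1", "api-2"])
static_pool = RoundRobinBalancer(["static-1"])
web_pool = RoundRobinBalancer(["web-1"])
router = PathRouter({"/api": api_pool, "/static": static_pool}, default=web_pool)

api_picks = [router.pick("/api/users") for _ in range(3)]
static_pick = router.pick("/static/logo.png")
default_pick = router.pick("/index.html")
```

An L4 balancer would make the same kind of choice using only IP addresses and ports, without looking at the path.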

### CDN and Edge 
- CDN
    + Content Delivery Network: serves content from servers close to the user instead of from the origin Data Centres
    + Improves performance and reduces latency
- Edge
    + Dedicated network: packets enter a private network at a point close to the user instead of wandering over the public internet

### Bloom filters and count-min sketch
- Bloom filters
    + Data structure to decide whether an element is a member of a set
    + Can give false positives but never false negatives
- count-min sketch
    + Counts the frequency of events
    + Useful to keep track of the most frequent items
    + Gives an approximate answer: it may overestimate a count but never underestimates it
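
Both structures can be sketched in a few lines. The sizes, the number of hash functions, and the use of salted SHA-256 are assumptions for illustration; `BloomFilter`, `CountMinSketch`, and `_hashes` are hypothetical names.

```python
import hashlib


def _hashes(item: str, count: int, modulus: int):
    # Derive `count` indexes by salting SHA-256 with the hash number.
    for i in range(count):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self._bits = [False] * size

    def add(self, item: str) -> None:
        for idx in _hashes(item, self.num_hashes, self.size):
            self._bits[idx] = True

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "probably".
        return all(self._bits[idx] for idx in _hashes(item, self.num_hashes, self.size))


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self._rows = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # One counter per row is incremented.
        for row, idx in enumerate(_hashes(item, self.depth, self.width)):
            self._rows[row][idx] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum is the best guess.
        return min(
            self._rows[row][idx]
            for row, idx in enumerate(_hashes(item, self.depth, self.width))
        )


seen = BloomFilter()
seen.add("alice")

events = CountMinSketch()
for _ in range(5):
    events.add("login")
events.add("logout", 2)
```

`seen.might_contain("alice")` is always true after the `add`, while `events.estimate("login")` returns at least 5: the true count plus whatever collisions added.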
    

# Highly Scalable Systems

### Cassandra
- Wide column, scalable database
- Rows are looked up by partition key, much like a key-value store
- Good fit for time series data
- Tunable consistency: from eventual to strong, chosen per query
- Consistent hashing to shard data
- Uses gossiping so that nodes keep each other informed

### MongoDB
- JSON-like document structure
- ACID properties (multi-document transactions in recent versions)
- Scalable

### MySQL
- Relational database
- Full ACID
- Master-slave architecture
- Scales up (vertically) well

### Memcached
- Distributed cache -> limited amount of data
- Hold data in host memory
- Simple, fast key-value storage

### Redis
- Distributed cache -> limited amount of data
- Hold data in host memory
- Clustered -> higher availability, data replication
- Can persist data to the hard drive

### Zookeeper
- Centralized configuration management tool
- Used for distributed locking + leader election
- Scales well for reads, not for writes
- Keep data in memory -> limited amount of data
- High availability

### Kafka
- Fault-tolerance
- High availability
- Used for streaming applications
- Delivers messages from producers to consumers

### Nginx, HAProxy
- Load balancer
- A single instance can manage 10000+ concurrent client connections
