## What is Scalability?

* __Scalability__: the ability to cost-efficiently adjust your system in response to any amount of demand without compromising user experience
    - being able to handle more requests, transactions, users, etc
    - being able to scale up as well as scale down
* the ability to scale is measured by 3 requirements:
    1. Handling more data
        - common challenge
        - the more data you have, the harder it becomes to sort through and search through it
    2. Handling higher concurrency levels
        - basically, how many users can use your application at the same time
        - more users means more things needing to be handled at the same time and your system needs to be able to handle that
    3. Handling higher interaction rates
        - refers to the # of interactions between the user and your servers
        - this is related to concurrency levels but is more so a requirement based on the type of application
        - a typical web application would have very low interaction rates compared to a multiplayer online game
            * an online game could have multiple interactions happening in less than a second
        - the main challenge for this requirement is _latency_
            * the higher the interaction rates, the higher the need to serve responses quicker
* scalability is _related_ to performance but they are not the same thing
    - performance refers to how efficiently you are able to serve a user's request
        * being able to load your website up in 3 seconds vs 500 ms
    - scalability refers to how efficiently you are able to accommodate any number of users using your website
        * being able to serve 1k users vs 100k users vs 1 million users and scaling up or down from there
* scalability is also constrained by the number of engineers that work on the system, i.e. organizational scalability matters as well
    - if the architecture and design of your system are tightly coupled, it becomes very difficult to scale your engineering teams
    - you would end up with very large teams working on the same codebase at once, which makes communication between engineers very hard

## Evolution from a Single Server to a Global Audience

* these stages of evolution can only work if they have been planned for from the beginning
* typically, you plan for a type of architecture and you stay there
    - you might have to scale up one or 2 stages but for the most part, you do not continually evolve your architecture unless you plan to redesign and rewrite your application

### Single-Server Configuration

![Single-server configuration](./assets/Single-server%20configuration.png)

* simplest configuration
    - entire application runs on a single machine
    - your single server has to have multiple responsibilities, e.g.:
        * serving static content to render your web page
        * also runs a database management system
* in this scenario:
    - users connect to a Domain Name System (DNS) server to resolve your domain name to its Internet Protocol (IP) address
        * normally, the DNS is provided by the hosting company and not run on your own servers
    - using this IP address, HTTP requests are sent to the web server
    - your web server would then send any information that the user requires like HTML/CSS/JavaScript to render your webpage and any images/videos/files
        * so that single machine handles the processing and traffic
* this type of setup is great for simple company websites
    - they might not even need a dedicated server (physical machine) and can just host on a virtual private server (VPS) or on shared hosting
    - Virtual Private Server (VPS)
        * a virtual machine for rent
        * a VPS instance is hosted together with other VPS instances on one shared machine but they have their own dedicated resources that don't impact the performance of each other
        * you are able to add more power to this machine pretty quickly and cheaply
    - Shared Hosting:
        * multiple websites or applications share the same server
        * it is pretty cheap but they share the same resources and the performance of one website can impact the performance of another
        * they are not isolated
* __when is this configuration good?__
    - when you have a simple website that has pretty low traffic
* __this configuration won't take you far in scalability for a couple of reasons:__
    - as your user base grows, you have to accommodate more traffic to your server
    - your database will have to grow to accommodate the data being added
        * requires more resources to query your database
    - new functionality added to your system would also take up more resources
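
As a rough illustration of the single-server setup described above, the sketch below runs a web server and a database in one process on one machine. Everything in it is an assumption made for illustration: the `build_server` and `demo` names, the `visits` table, and the single static page are hypothetical, and SQLite stands in for whatever database management system the real server would run.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static content served by the same machine that runs the database.
STATIC_PAGES = {"/": b"<html><body>Hello</body></html>"}


def build_server() -> HTTPServer:
    """Create one server that both serves static pages and talks to a local database."""
    # check_same_thread=False because the handler runs in the serving thread.
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE visits (path TEXT)")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/visits":
                # Database work happens on the same machine as the web work.
                (count,) = db.execute("SELECT COUNT(*) FROM visits").fetchone()
                body = json.dumps({"visits": count}).encode()
                content_type = "application/json"
            elif self.path in STATIC_PAGES:
                db.execute("INSERT INTO visits VALUES (?)", (self.path,))
                db.commit()
                body = STATIC_PAGES[self.path]
                content_type = "text/html"
            else:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    # Port 0 asks the OS for any free port.
    return HTTPServer(("127.0.0.1", 0), Handler)


def demo():
    """Start the server, request the page and the visit count, then shut down."""
    server = build_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    # An empty ProxyHandler makes sure local requests are not sent through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base + "/", timeout=5) as response:
            page = response.read()
        with opener.open(base + "/visits", timeout=5) as response:
            stats = json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
    return page, stats


if __name__ == "__main__":
    print(demo())
```

Because the page handler and the database share one machine, every extra request competes for the same CPU, memory, and disk, which is why this configuration runs out of room quickly.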

### Making the Server Stronger: Scaling Vertically

![Single server, but stronger](./assets/Single%20server%2C%20but%20stronger.png)

* __Vertical Scalability__: upgrading the server's hardware to be able to handle more traffic or data processing
    - very simple to do without requiring any changes to your architecture or application
    - just add more RAM bro
* ways to scale vertically:
    - adding more hard drives to improve I/O capacity
    - switching from hard disks to solid-state drives (SSDs)
    - adding more RAM to reduce the need for disk I/O operations
    - improving network throughput by upgrading network interfaces or adding new ones
    - switching to a server with more processors or virtual cores
* __when is this scaling good?__
    - useful for small applications
    - good if you can afford the hardware upgrades
    - vertical scaling is pretty simple, you don't have to change anything about your application
* __when is this not great?__
    - it's cheaper in the beginning but becomes extremely expensive after a certain point
        * e.g. getting 128 GB of RAM = ~\$3,000 whereas doubling it to 256 GB of RAM = ~\$18,000
        * the pricing is NOT linear after a certain point
    - there is a limit to how much you can actually upgrade the hardware
        * you can't keep adding RAM forever bro
    - the operating system design or your own application could hinder how much you can upgrade the hardware
        * e.g. MySQL cannot be scaled infinitely with more CPU due to lock contention
            * locks basically control access to a shared resource like memory or files or data structures between threads
            * lock contention refers to a performance bottleneck where threads have to play a waiting game in order to acquire a lock to access a shared resource
                - they basically have to wait for a thread to release possession of a lock to gain access to it
        * so regardless of how much vertical scaling you do, if lock contention is not under control, the extra cores mostly sit waiting on locks instead of doing useful work
    - your application may not have been designed with high concurrency in mind, so it cannot take advantage of the extra cores
        * retrofitting it is hard because lock management is very complex
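
To make the non-linear pricing point concrete, the sketch below works out the cost per GB for the two rough price points quoted in this section. The dollar figures are the approximate ones from these notes, not current market prices, and `cost_per_gb` is just an illustrative helper name.

```python
def cost_per_gb(total_price_usd: float, ram_gb: int) -> float:
    """Return the price paid for each GB of RAM."""
    return total_price_usd / ram_gb


# Approximate price points taken from the notes above.
small = cost_per_gb(3_000, 128)   # 23.4375 USD per GB
large = cost_per_gb(18_000, 256)  # 70.3125 USD per GB
ratio = large / small             # 3.0

if __name__ == "__main__":
    print(f"128 GB: ${small:.2f} per GB")
    print(f"256 GB: ${large:.2f} per GB")
    print(f"Each GB costs {ratio:.1f}x more at the larger size")
```

With these numbers, doubling the capacity multiplies the total price by six and the price of each GB by three, which is what "not linear" means in practice.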

### Isolation of Services

![Configuration with separate services residing on different servers](./assets/Separate%20Services.png)

* can also scale your system by separating different services into their own servers (physical machines)
    - so instead of 1 server for a web server and database engine, you now have 2
    - this still looks pretty similar to a single-server setup though but the number of servers has increased to share the load
    - these different servers are hosted in a 3rd party data center usually
        * data centers are physical locations where these machines reside or they could be cloud data centers
* __when is this scaling good?__
    - pretty good for small websites or for web development agencies
        * they host multiple websites on a single web server
        * if one of those websites has a large amount of traffic, it can have its own web server
    - you can also vertically scale each server as well
* __when is it not good?__
    - when you require even more scaling b/c each server for each service can only be scaled vertically which has its own limitations

#### Functional Partitioning

![Configuration showing functional partitioning of the application](./assets/Functional%20Partitioning.png)

* you can also divide your system based on functionality, i.e. __functional partitioning__
    - i.e. you have separate servers for your admin portal and the rest of your application
* each part of your application could use a different subdomain, and each subdomain resolves to the IP address of its own web server, so traffic is directed to the right partition
    - each partition could have its own servers installed and have different vertical scaling needs

### Content Delivery Network: Scalability for Static Content

![Integration with a content delivery network provider](./assets/CDN.png)

* could offload some traffic to a 3rd party content delivery network service
    - hosted service that handles serving static content to your clients globally
    - clients connect to one of the CDN servers to grab any static content they need
        * if the CDN doesn't have it, it'll request it from your servers, cache it, then fulfill any subsequent requests for that content with the cached content
* __what are the benefits of a CDN?__
    - reduces the amount of bandwidth (maximum amount of data you can transfer per second over a network) your servers need
    - requires fewer web servers to serve static content
    - CDNs are global so your clients will be served content faster if they aren't close to your servers
    - CDNs are a 3rd party service
        * you just have to use the service, you don't really need to change your application that much
        * it's cheaper to optimize serving content like this using a CDN which has a global network than you coming up with your own solution
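
The cache-miss behaviour described above (ask the origin once, then serve from the cache) can be sketched as follows. This is a simplified model under stated assumptions: `CdnEdge` and `origin` are made-up names, and the cache never expires anything, whereas real CDNs honour cache lifetimes.

```python
from typing import Callable, Dict


class CdnEdge:
    """A toy CDN server that pulls content from the origin on a cache miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times your servers were contacted

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: request the content from the origin and keep a copy.
            self.origin_requests += 1
            self._cache[path] = self._fetch_from_origin(path)
        # Cache hit (or freshly cached): serve without touching the origin.
        return self._cache[path]


def origin(path: str) -> bytes:
    """Hypothetical stand-in for your own web server."""
    return f"contents of {path}".encode()


if __name__ == "__main__":
    edge = CdnEdge(origin)
    for _ in range(3):
        edge.get("/logo.png")
    print(edge.origin_requests)  # prints 1: only the first request reached the origin
```

Three client requests resulted in a single origin request, which is where the bandwidth and server savings come from.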

### Distributing the Traffic: Horizontal Scalability

![Multiple servers dedicated to each role](./assets/Horizontal%20Scalability.png)

* __Horizontal Scalability__: scalability that increases your system's capacity by adding more servers
    - harder to achieve and has to be considered before the application is built
        * horizontal scalability can be added later on but requires a lot of effort
    - there are multiple ways to achieve this scalability but the simplest one would be to run each component of your system on multiple servers and add more servers when you need them
        * each server does not need to be strong at all since you can just run more servers
    - initially, horizontal scalability costs more to set up but as more capacity is needed, the costs grow much more efficiently than with vertical scaling
        * it costs more up front b/c it's more complex to set up
        * also requires experienced engineers to set up and maintain
        * but as you need more and more capacity, it's pretty easy to estimate how much it would cost
            - e.g. if you get 2x as much traffic, you'll be charged 2x as much if you're using a 3rd party service
    - horizontal scaling does not have a ceiling like vertical scaling does
        * there's a limit to how much more powerful your hardware can get but there isn't really a limit on how many servers you can add
* __when is this scaling good to use?__
    - when you know you're going to need it and can plan for it
    - when you have experienced engineers that can set it up and maintain it
    - when you have the option of using 3rd party providers that can add more servers (AWS, for example)
* __what makes this scaling different from the previous ones?__
    - each service can be scaled by adding more servers
    - it's more expensive up front b/c of its complexity but becomes predictable at higher capacities
    - allows for partial horizontal scaling in stages on things that are easier to scale first:
        * i.e. scaling your web servers and caches first b/c they're easier to set up
        * then focusing on scaling your databases and persistence stores later
* horizontal scaling also makes use of a round-robin DNS service to distribute traffic between web servers
* __Round-robin DNS__: allows you to map a domain name to multiple IP addresses
    - each IP address points to a different machine
    - when a client tries to resolve a domain name to an IP address, the DNS returns one of the IP addresses mapped to that domain
    - when that client receives that IP address, it will only communicate with that server

### Scalability for a Global Audience

![Customers from different locations are served via local edge caches](./assets/Global%20Scalability.png)

* used by the largest websites (think Facebook or Twitter)
    - you need more than one data center
    - a data center can host many servers but its location hinders its usefulness
        * if a user is far away from your data center, their experience is worse
    - using multiple data centers also prepares you for outages from things like floods, storms, fires, etc
    - scaling of this size requires the use of a GeoDNS
* __GeoDNS__: a DNS service that resolves domain names to IP addresses based on the user's location
    - you would have different IP addresses based on a geographic region
    - if a user is a part of that region, they would be served from that IP address
    - the goal of a GeoDNS is to serve an IP address of a data center closest to the client
        * this reduces latency and provides a much better user experience
* global scaling could also make use of edge-cache servers
* __Edge cache__: an HTTP cache server located close to your users
    - it serves already generated web pages or part of one
    - if it serves part of a web page, it will make requests in the background to your web servers for the rest of it
    - if the edge cache cannot cache the web page, it will just delegate that function to your web servers entirely
    - in essence, edge-cache servers can serve entire pages or cache fragments of HTTP responses
* Edge cache vs CDN:
    - edge cache is a server, CDN is an entire network
    - edge cache serves cached static assets
    - CDN also does that but it has more features like load balancing and DDoS protection

## Overview of a Data Center Infrastructure