# Software Engineering concepts
## Database design
### DB stack
- SQL vs NoSQL
- RDBMS
- Oracle
- MSSQL
- Sybase
- Hadoop
- MapReduce
- Cassandra
- MongoDB
- Redis
- Kafka
- BigData
- Data warehousing
- OLAP
- OLTP

### Types of database
- Structured 
  - Relational/Normalized
- Unstructured 
  - Heterogeneous types
  - text, image, video, blobs, etc
  - cannot be consumed directly
- Semi-structured 
  - mix of structured and unstructured data
  - XML, JSON or as per business needs
- User state data 
  - information of activity that user performs on website

### Relational
- Forms
  - one-to-one
  - one-to-many
  - many-to-one
  - many-to-many
- data consistency
  - normalized

<p align="center"><img src="https://miro.medium.com/max/1236/1*kTcdlLdvq6pZUpsjKpifOg.png" title="medium.com" width=500></p>

- ACID compliant
  - Atomicity
    - Transactions are made up of multiple statements
    - Atomicity guarantees that either it succeeds completely, or fails completely
    - if any statement in a transaction fails to complete, the entire transaction fails and the database is left unchanged
  - Consistency
    - any data written to database must be valid as per the rules, including constraints, cascades, triggers, and any combination thereof. 
    - no illegal/incorrect transaction must be allowed
  - Isolation
    - concurrent execution of transactions leaves the database in the same state as if the transactions had been executed sequentially
    - incomplete transaction must not be visible to other transactions
  - Durability
    - if a transaction has been committed it will remain committed during reboot or a system failure or crash


- Scaling
  - scaling relational databases horizontally is not trivial
  - they have to be sharded and replicated to run smoothly on a cluster
  - this requires engineers with specialized skills
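
The atomicity guarantee described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production pattern: the `accounts` table, the `transfer` helper and the balances are all assumptions made up for the example. The second `UPDATE` violates a `CHECK` constraint when funds are short, and the whole transaction is rolled back.

```python
import sqlite3

# Hypothetical two-account ledger, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both rows untouched, even though the credit to the destination account had already been executed inside the transaction.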

### NoSQL
- non-relational databases; many store JSON-like documents, others use key-value, wide-column or graph models
- built for high-frequency reads and writes
- Scaling
  - can add new server nodes easily
  - designed to manage exponential growth of Web
  - as nodes can be added easily, it allows handling more concurrent traffic quickly
- Cluster
  - they are designed to run on clusters
- ACID
  - support for ACID transactions is often limited
  - when a large application is deployed on hundreds of servers across the globe, the distributed nodes may take some time to agree on the current value, for example a like or follower count on Twitter
  - many NoSQL systems therefore work on the principle of **eventual consistency**
  - a good fit for analytics use cases
- Examples
  - MongoDB, Redis, Neo4j, Cassandra, Memcached, Elasticsearch, Google Cloud Datastore
  
<p align="center"><img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Which-Database_v07-10-21_1.max-2000x2000.jpeg" width=600></p>

$\tiny{\text{Priyanka Vergadia}}$

### DataLake 

### BigQuery - data warehouse

### Why Postgres

### Why MySQL

### Graph database

### Document-oriented database
- data is generally semi-structured and stored in a JSON-like format
- optimal for use cases that require flexibility and fast, continual development
- MongoDB, CouchDB, Elasticsearch, Google Cloud Datastore, and Amazon DocumentDB

#### MongoDB - Document
- A **document is a record in a document database**. 
- A document typically stores information about one object and any of its related metadata.
- Documents **store data in field-value pairs**. 
- The values can be a **variety of types and structures**, including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like JSON, BSON, and XML.
- A **collection is a group of documents**. Collections typically store documents that have similar contents.
- Not all documents in a collection are required to have the same fields, because **document databases have a flexible schema**. Note that **some document databases provide schema validation**, so the schema can optionally be locked down when needed.


```json
{
     "_id": 1,
     "first_name": "Tom",
     "email": "tom@example.com",
     "cell": "765-555-5555",
     "likes": [
        "fashion",
        "spas",
        "shopping"
     ],
     "businesses": [
        {
           "name": "Entertainment 1080",
           "partner": "Jean",
           "status": "Bankrupt",
           "date_founded": {
              "$date": "2012-05-19T04:00:00Z"
           }
        },
        {
           "name": "Swag for Tweens",
           "date_founded": {
              "$date": "2012-11-01T04:00:00Z"
           }
        }
     ]
  }
```

### Key-Value database
- dictionary data structure 
- key has to be unique
- commonly used for caching, storing and managing user sessions, ad serving, and recommendations
- examples - Redis
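
As a rough sketch of the dictionary idea above, the class below keeps unique keys in a Python `dict` and adds an optional expiry, which is what makes such stores handy for sessions and caches. The class name, the `ttl` parameter and the injectable `clock` are assumptions for illustration; this is not how Redis is implemented.

```python
import time


class KeyValueStore:
    """Tiny in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value
```

For example, `store.set("session:42", {"user": "tom"}, ttl=1800)` would keep a session for thirty minutes, after which `store.get("session:42")` returns `None`.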

### Time Series database

### Wide-Column database

### Eventual vs Strong Consistency
- Eventual consistency
  - enables datastores to be highly available
  - is also known as **optimistic replication** and is key to distributed systems.
  - each data center has multiple clusters, each running numerous server nodes
  - data is initially inconsistent but eventually becomes consistent across all the server nodes deployed around the world
- Strong consistency
  - data has to be consistent at all times
  - to achieve this all nodes need to be locked down while getting updated
  - this behavior enables implementation of ACID transactions
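
The difference between the two models can be illustrated with a toy simulation. Everything here is an assumption for the sketch: three in-process replicas, a `sync` method standing in for background replication, and a `write_strong` method standing in for a write that updates every node before returning.

```python
class Replica:
    def __init__(self):
        self.data = {}


class ReplicatedStore:
    """Toy store: eventual writes reach one replica first, the rest later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet applied to every replica

    def write(self, key, value):
        """Eventually consistent write: acknowledged after one replica has it."""
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def write_strong(self, key, value):
        """Strongly consistent write: every replica is updated before returning."""
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Stand-in for asynchronous replication catching up."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()
```

Between `write` and `sync`, a read served by another replica can return a stale value; after `sync`, all replicas agree.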

### Polyglot persistence
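- using multiple data storage technologies within one application, each chosen to fit the kind of data and access pattern it serves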

<p align="center"><img src="https://www.abhishek-tiwari.com/assets/images/Polyglot-persistence-pattern-for-an-ecommerce-application.png"></p>

### CAP theorem
- consistency, availability, partition tolerance (fault tolerance)
  - Consistency
    - Every read receives the most recent write or an error.
  - Availability
    - Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  - Partition tolerance
    - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
  > - No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability.  
  >> - When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. 
  >> - When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning. 
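
The two choices during a partition can be sketched as follows. The `Node` class, its `mode` flag and the `PartitionedError` exception are hypothetical names used only to show the behaviour; real systems detect partitions through timeouts and quorum checks.

```python
class PartitionedError(Exception):
    """Raised by a CP node that cannot confirm it holds the latest write."""


class Node:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when the node cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk a stale answer.
            raise PartitionedError("cannot confirm the latest write")
        # Availability over consistency: answer with the best local copy.
        return self.value
```

An `AP` node keeps answering with its local copy while cut off; a `CP` node returns an error until the partition heals.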

### BASE semantics
- basically-available, soft-state, eventual consistency
- Eventually consistent services provide only a liveness guarantee: updates will be observed eventually
  - **Basically available:** reading and writing operations are available as much as possible (using all nodes of a database cluster), but might not be consistent (the write might not persist after conflicts are reconciled, the read might not get the latest write)
  - **Soft-state:** without consistency guarantees, after some amount of time, we only have some probability of knowing the state, since it might not yet have converged
  - **Eventually consistent:** If we execute some writes and then the system functions long enough, we can know the state of the data; any further reads of that data item will return the same value
  
### Multi-model database
- ArangoDB, CosmosDB, OrientDB, Couchbase, etc
- support multiple data models, such as graph, document-oriented and relational
- They also avert the need to manage multiple persistence technologies in a single service
- They reduce operational complexity considerably

### Cassandra

### Memcached

### HBase

### Neo4j

### MongoDB

### MapReduce

### Elasticsearch

### Redis

### Google Cloud Datastore

### Redis vs Memcached

### Oracle vs MySQL



## Cloud design
- [Cloud design considerations](system_designs/design_patterns.html)

## AWS vs Azure vs Google Cloud


## Amazon Web Services (AWS)
- Infrastructure as a service
  - EC2 - virtual machines
    - launch any application
    - choose an instance type based on CPU and memory requirements
    - launch in different regions and availability zones
    - install software and get started
- Platform as a service
  - for when you do not want to manage virtual machines yourself
  - and only want a platform to run the application on
  - use Elastic Beanstalk
    - upload the application
    - it gets deployed on the AWS cloud
    - limited control over middleware such as the Java runtime
- Software as a service
- Database as a service
  - RDBMS
  - Load balancing services
- flexible billing

----
- Services
  - Batch
    - can run batch/overnight applications
  - Lambda
    - a function runs in response to an event trigger and returns a response
    - no server that you have to manage runs in the background
    - an execution environment is initialized when a request comes in
    - the request is served, and the environment is torn down once it is no longer needed
  - Elastic container service
    - runs and orchestrates Docker containers
    - deploy containers into AWS
  - Developer tools
    - CodeStar/CodeCommit/CodeBuild/CodeDeploy
    - code, build, deploy and pipe it into the queue
  - Services for Machine Learning
    - to check
  - Storage 
    - S3
      - Simple Storage Service (object storage)
      - can be used with any of the compute design patterns
    - EFS
    - Glacier
    - Storage Gateway
  - Management Tools
    - CloudWatch
    - AWS Auto scaling
      - automatically deploys and spawns new instances of, say, a Tomcat server
  - Analytics
    - Athena
    - EMR
    - CloudSearch
  - Security
  - Migration services
  - Networking

## Chaos Engineering
- discipline of experimenting on a distributed system in production to build confidence in its capability to withstand failures
- pioneered by Netflix
- ideally nothing breaks in production
- but if something does break, how do we keep the system available?
- helps in identifying weaknesses in the system
- what are the principles
  - build a hypothesis around steady state behaviour
  - real world events
    - simulating real-world scenarios and experimenting on them
    - increasing the traffic
    - adding failure events
    - adding non-expected events
  - run experiments in production
  - automate experiments in a continuous manner
  - minimize the blast radius
    - identify the problem and contain it as far as possible
    - so that the impact on live traffic is kept to a minimum
- how does it work in AWS 
  - different services are launched for NA-East, NA-West, EU-East, EU-West and APAC
  - if the service in one region goes down, traffic is rerouted and the request is handled by the service in another region
- Open Source Tools 
  - Netflix 
    - Chaos Monkey
      - randomly terminates instances (virtual machines or containers) in production
      - tests resilience
      - checks system behaviour
    - Chaos Gorilla
      - takes down an entire availability zone
      - for example, one zone in NA-West
    - Chaos Kong
      - kill the whole region
    - Latency Monkey
      - delay the response
  - Facebook
    - Project Storm
      - simulates an entire datacenter going down
  - AWS
    - AWS gamedays

## Scalability
- An application's ability to handle increased workload without degrading performance
- If the application responds to a user request in x seconds, it should still take about x seconds when the number of concurrent users increases

Types of scaling
- Horizontal scaling / Scale out
  - add servers
- Vertical scaling / Scale up
  - add more power to a server, for example more RAM or CPU
  - this is simple, as it needs no code changes or complex system configuration changes
- Cloud elasticity
  - scale up/down your servers, as per demand

The scalability of an application should be tested with load and stress tests that simulate concurrent traffic.

## Scale trade-offs

## AutoScaling
- Autoscaling is a cloud computing feature that enables organizations to scale cloud services such as server capacities or virtual machines up or down automatically, based on defined situations such as traffic utilization levels. Cloud computing providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), offer autoscaling tools.
- The overall benefit of autoscaling is that it removes the need to respond manually, in real time, to traffic spikes that call for new resources: the number of active servers changes automatically.
- A load balancer and an auto scaling group work in tandem. Autoscaling policies, defined from the application's requirements, scale instances in and out, and the load balancer distributes the traffic load across whichever instances are running. Together they improve availability and performance and decrease application latency.
- Autoscaling allows a user to set a policy, based on predefined criteria, that manages the number of available instances in both peak and off-peak hours. Multiple instances with the same functionality run in parallel, and their number increases or decreases with demand.
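
A scaling policy of the kind described above can be sketched as a single function. This is an illustrative rule under assumed names and defaults (`target_percent`, `min_size`, `max_size`), not the algorithm of any particular cloud provider: it sizes the group so that average CPU utilization moves toward a target, within fixed bounds.

```python
def desired_instances(current, cpu_percent, target_percent=60, min_size=2, max_size=10):
    """Return how many instances the group should run.

    If 4 instances average 90% CPU and the target is 60%, the same load
    spread over 4 * 90 / 60 = 6 instances would sit at the target.
    """
    if current <= 0:
        return min_size
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = (current * cpu_percent + target_percent - 1) // target_percent
    return max(min_size, min(max_size, wanted))
```

The `min_size` floor keeps some redundancy during off-peak hours, and the `max_size` cap bounds cost during a spike.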

## Latency
- Network latency
- Application latency
  - use stress and load tests to find bottlenecks


## Cost considerations

## High availability
The ability of a system to stay online despite failures at the infrastructure level. Cloud platforms state availability in their SLAs as a percentage of uptime, commonly quoted in "nines" such as 99.99% or 99.999%. The distributed system must be designed to be fault-tolerant and redundant.

To achieve high availability at application level, the entire service is architecturally broken down into granular loosely coupled services called **_microservices_**.

### Redundancy
Redundancy is the practice of duplicating server instances and keeping them on standby in case any of the active instances goes down. It is the fail-safe backup mechanism.

This helps:  
- eliminate the single points of failure found in a monolithic architecture
- facilitate fault-tolerant microservices architecture

Tools like __*Kubernetes*__ are intelligent enough to add or remove instances on the fly as per requirement, thereby reducing human error.

### Replication
- running __*similar/duplicate nodes*__, so that when a few nodes go down, the other nodes bear the load. It's a __*fail-safe backup mechanism*__.

### Highly available cluster
- A fail-over cluster contains a set of nodes running in conjunction with each other to ensure high availability of the service

Tools like __*ZooKeeper*__ monitor the state of the cluster with a __*heartbeat network*__ and maintain shared distributed memory (to keep a common state across several nodes in a cluster). A node coordinator service keeps track of the nodes in the cluster.

Multiple HA clusters run together in one geographical zone to ensure minimum downtime and uninterrupted service.
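
The heartbeat idea mentioned above can be sketched as a small failure detector. This is a simplified illustration with hypothetical names (`HeartbeatMonitor`, `timeout`), not the ZooKeeper API: each node reports in periodically, and a node whose last heartbeat is older than the timeout is treated as down.

```python
class HeartbeatMonitor:
    """Treats a node as down when its heartbeats stop arriving."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node id -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def alive_nodes(self, now):
        """Return the nodes heard from within the timeout window, sorted by id."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )
```

A fail-over controller could poll `alive_nodes` and promote a standby when the active node drops out of the list.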

## Load balancing
Load balancing enables services to scale well when the traffic load increases. The components that facilitate this are called *__load balancers__*. They distribute load across the servers in a cluster using different algorithms, guarding against overload and spikes in latency.

Load balancers are the *__single point of contact for client requests__*. They can be set up to __*manage traffic*__ at the application component level, database component level, message queue level or backend server level. They regularly perform __*health checks*__ on the machines in the cluster and keep track of _**in service**_ and **_out of service_** instances.

### Load balancing methods
- DNS load balancing
- Hardware based load balancing
  - expensive
  - high performance
- Software based load balancing
  - cost effective
  - flexible
  - advanced compared to DNS load balancers
  - continually perform health checks
  - __*HAProxy*__ is one of the leading software load balancers
  
### Load balancing algorithms
- Round Robin
- Weighted Round Robin
- Least Connection (dynamic)
- Weighted Least Connection
- Resource Based (Adaptive)
- Weighted Response Time
- Source IP Hash
- URL Hash
- Random
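
Four of the algorithms listed above can be sketched in a few lines each. These are simplified illustrations with assumed class names and server labels; a real load balancer also handles health checks, concurrency and smoother weighting.

```python
import hashlib
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, request=None):
        return next(self._cycle)


class WeightedRoundRobin(RoundRobin):
    def __init__(self, weights):
        # Simple expansion: a server with weight 2 appears twice per cycle.
        expanded = [s for s, w in weights.items() for _ in range(w)]
        super().__init__(expanded)


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def pick(self, request=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


class SourceIPHash:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        # A stable hash keeps one client on the same server between requests.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```

Round robin ignores server state, least connections reacts to it, and source IP hash trades even distribution for session stickiness.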

## Fault isolation
- A microservice architecture makes it easier to isolate faults, debug and fix issues

## Fault tolerance
Fault tolerance is the system's ability to stay up and running despite taking hits and faults. Some instances/nodes may go offline and come back up, and in the meantime the system keeps working, possibly at a reduced level, instead of going down entirely. This is called **_fail soft_**.

## Distributed system
- There are four capabilities that are involved in a resilient, scalable, distributed system:
  - __Load balancing (aka spraying)__ - The ability to spread a load source in an arbitrary manner to certain servers in a distributed system;
  - __Partition targeting (aka routing)__ - The ability to direct a request to a particular server based on attributes of the request and/or the service that handles the request;
  - __Partitioning (aka sharding)__ - The ability to divide a domain into discrete, identifiable sub-divisions that together form the whole; and
  - __Replication__ - The use of events, transaction logs, state copy, or other means to provide (i) resiliency to server failure, (ii) dynamic repartitioning, (iii) immediate access to non-mutating state, and (iv) local cache copies of mutating state.  
$\tiny{\text{Source - Quora}}$  

## Caching

## Distributed Caching

## Distributed Transactions

## Distributed File System

## Distributed Datastore

## Locks

## Distributed Locks

## Development practices
- Continuous integration
  - Available tools
    - Jenkins, Azure DevOps
  - Practices
    - merge as often as possible
    - automated builds and test runs (integration/acceptance)
    - emphasis on test automation to check that new commits don't break the main branch
- Continuous delivery
  - Available tools
    - Jenkins, TeamCity, AWS CodeBuild, Spinnaker
  - Practices
    - extension of continuous integration
    - automated deployment and release to test and prod environment
    - best practice is to deploy and release in small batches
- Continuous deployment
  - Available tools
    - Jenkins, TeamCity
  - Practices
    - extension of continuous delivery
    - changes that pass the production pipeline are released to customers
    - there is no human intervention, which accelerates the feedback loop from customers to developers
- [Link](https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment)

## Intelligent Automation 

## Project Management
- Agile 
  - structured iterative approach
  - self organizing teams
- Kanban  
  - continuous improving flow of work 
  - uses boards or visuals
- Scrum 
  - short structured work sprints
- Jira

## Tools
- Kubernetes 
  - orchestrates clusters of virtual machines and schedules containers to run on those machines based on their available computing resources
- Docker
  - packages an application and its dependencies into a container and runs it on a single node

## Containers vs Virtual Machines
- [Containers vs VM](https://www.docker.com/resources/what-container)

## Clusters

## Pipelines

## Google Compute Engine

## Microsoft Azure

## Microservice
- [Architecture](https://martinfowler.com/articles/microservices.html)

_"In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies."_  
$\tiny{\text{martinfowler.com - Microservices Guide}}$   

### Monolithic Architecture
- self-contained as a single codebase 
- tightly coupled application
- LinkedIn started with a monolithic architecture in 2003 and later scaled out with cloud computing and a microservices architecture
- Simple to develop, test, deploy and manage as everything resides in one repo
- reduced complexity of deployment
- a monolith often holds a lot of application state in static variables, whereas an application needs to be stateless to scale easily on the cloud

### Microservice Architecture
- Loosely coupled services, deployed and working in conjunction, form one large distributed service
- This architecture is designed to scale, which is relatively easier than with a monolith
- Continuous deployment is a pre-requisite, dedicated teams can have their independent release and deployments
- Regression testing becomes easier
- No single point of failure
- Can leverage heterogeneous technologies
- services can interact, for example, through REST APIs behind an API gateway

### Micro frontends


<img src="https://devblogs.microsoft.com/dotnet/wp-content/uploads/sites/10/2017/08/eShopOnContainers_Architecture_Diagram.png">


## Sharded services
- Do we have sharding of services, or is it applicable only in database management?

## Sharding database
### Types of partitioning
- Horizontal partitioning(sharding) 
  - separating table rows into multiple tables with same schema 
- Vertical partitioning
  - separating columns into distinct tables
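
The two kinds of partitioning can be sketched on a list of row dictionaries. The `users` rows and both helper functions are made up for illustration; a real database partitions at the storage layer rather than in application code.

```python
# Hypothetical table: one dict per row.
users = [
    {"id": i, "name": f"user{i}", "email": f"u{i}@example.com"}
    for i in range(1, 5)
]


def horizontal_partition(rows, shard_count):
    """Split rows across tables that all share the full schema."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[row["id"] % shard_count].append(row)
    return shards


def vertical_partition(rows, key, column_groups):
    """Split columns into separate tables, repeating the key in each one."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]
```

Horizontal partitioning keeps every column but only some rows per table; vertical partitioning keeps every row but only some columns, joined back through the shared key.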

### What is Sharding
- breaking data into logical shards
- distribute these logical shards into separate database nodes or into physical shards
- physical shards can hold multiple logical shards
- helps in mitigating impact of outages

### Benefits of sharding
- facilitates horizontal scaling (scaling out)
  - adding more machines to the existing stack to spread out the load and allow for more traffic
- in contrast to vertical scaling (scaling up)
  - upgrading the hardware of an existing server by adding more RAM or CPU
  
### Sharding architectures
- Key based sharding
  - hash based sharding based on partition key
- Range based sharding
  - for example, price range
- Directory based sharding
  - maintain lookup table
  - Multiple tenants might share the same shard, but the data for a single tenant won't be spread across multiple shards
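
The three architectures above differ only in how a record is routed to a shard, which can be sketched as three small functions. The shard names, price boundaries and tenant directory below are assumptions for illustration.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def key_based_shard(partition_key):
    """Hash the partition key and take it modulo the shard count."""
    digest = hashlib.sha256(str(partition_key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Exclusive upper bounds of the first two price ranges; the last shard takes the rest.
PRICE_BOUNDS = [50, 200]


def range_based_shard(price):
    """Prices below 50 go to shard-0, 50 up to 200 to shard-1, the rest to shard-2."""
    return SHARDS[bisect.bisect_right(PRICE_BOUNDS, price)]


# Directory-based: an explicit lookup table, here keyed by tenant.
DIRECTORY = {"tenant-a": "shard-0", "tenant-b": "shard-0", "tenant-c": "shard-2"}


def directory_based_shard(tenant):
    """Look the tenant up in the directory; all of its data lives on one shard."""
    return DIRECTORY[tenant]
```

Key-based routing spreads load evenly but reshuffles keys when the shard count changes, range-based routing keeps related values together at the risk of hot ranges, and the directory gives full control at the cost of an extra lookup.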

<p align="center"><img src="images/shardTable.png" width=600 height=600></p>

$\tiny{\text{Types of Partitioning - digitalocean.com}}$   



## Hashing
- bucket
- hash collision
- consistent hashing

### Consistent Hashing
- is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.
- it allows requests to be mapped into hash buckets while allowing the system to add and remove nodes flexibly so as to maintain a good load factor on each machine
- the standard way to hash objects is to map them to a search space, and then transfer the load to the mapped computer
  - systems using this policy are likely to suffer when nodes are added or removed, because nearly all keys get remapped
- Consistent hashing maps servers onto the key space and assigns each request (mapped to its bucket, called the load) to the next server clockwise. Servers can then store the relevant request data while the system stays flexible and scalable
- system design scenarios
  - such as fault tolerance - in which a machine crashes
  - or scalability - in which a machine needs to be added to process more requests
    - consistent hashing is used in both scenarios
- this principle is used extensively in load balancing, caching, databases
- request allocation
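
A hash ring along the lines described above can be sketched as follows. The class name, the `vnodes` parameter (virtual points per server, a common refinement to even out the load) and the server labels are assumptions for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual points per server, to even out the load
        self._points = []     # sorted hash positions on the ring
        self._owner = {}      # hash position -> server name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # First server point clockwise from the key, wrapping past the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a server is added, the only keys that move are the ones it takes over; when it is removed, those keys fall back to their previous owners and nothing else changes.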


<p align="center"><img src="./images/consistentHashing.png" width=400 height=400></p>

$\tiny{\text{YouTube - Gaurav Sen}}$   

## HTTP Messaging

### HTTP GET vs POST Message
- GET is used for viewing something, without changing it
  - fetch data
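  - GET carries request parameters appended to the URL as a query string
- POST is used for changing something
  - for example, changing a password
  - writing data into resources
  - POST carries request parameters in the message body, which keeps them out of the URL (and so out of browser history and most access logs); the data is only protected in transit when HTTPS is used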
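
Where the parameters travel can be seen by building the two requests with Python's standard `urllib`, without sending anything. The `example.com` URLs and the form fields are placeholders for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

query = {"user": "tom", "page": "2"}
form = {"user": "tom", "new_password": "s3cret"}

# GET: parameters are appended to the URL as a query string.
get_req = Request("https://example.com/search?" + urlencode(query))

# POST: parameters are encoded into the message body and stay out of the URL.
post_req = Request("https://example.com/password", data=urlencode(form).encode())
```

`urllib` picks the method from the presence of a body: `get_req.get_method()` is `"GET"` and `post_req.get_method()` is `"POST"`.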
  
### HTTP codes

<p align="center"><img src="images/http_codes.png" width=600 height=600></p>  

$\tiny{\text{https://datatracker.ietf.org/}}$   


## HTTP 1.0 vs 2.0 vs 3.0
- https://cheapsslsecurity.com/p/http2-vs-http1/
- https://ably.com/topic/http-2-vs-http-3

<p align="center"><img src="images/httpProtocol.webp" width=600 height=600></p>

$\tiny{\text{https://ably.com/}}$   



## Messaging

### Messaging patterns
- Client-Server
- Publish-Subscribe
  - https://ably.com/topic/pub-sub
  - Topic 
  - https://cloud.google.com/pubsub/docs/subscriber
  - https://www.bmc.com/blogs/pub-sub-publish-subscribe/
- Push-Pull
- Polling
- Conventional sockets
  - One-to-one
  - Many-to-one
  - One-to-many(multicast)
- Event streaming
  - https://ably.com/topic/event-streaming


## Components vs Services

## Stateless architecture


## Transport Protocols
- TCP
- UDP
- Gossip Protocol
- Broadcast
- Multicast
- Web sockets
- FTP
- Telnet 

## gRPC
- gRPC
- Protocol buffers

## Payload
- Request payload
- API payload

## Sockets
- TCP sockets
- SSL sockets
- IPv4/IPv6 connections

## Connection Pools

## Connection Proxy

## POSIX message queue
## Windows message queues
[Windows Message Queue](https://web.archive.org/web/20120317065349/http://msdn.microsoft.com/en-us/library/ms644927(VS.85).aspx)

## Distributed system or N-tier application

An n-tier application is split into several components, typically three or more, such as:
- user interface
- backend server
- database

The components that are part of such system architecture are:
- Load balancers
- Caching system
- Message queue for asynchronous activity
- Microservices 
- Web services
- etc

## Layers vs Tiers
- Layers exist at the code level and represent the conceptual organization of the code. In an application, this means logical separation, for example into a business layer, a service layer and a data access layer
- Tiers exist at the physical level: they are where the layers are deployed and where they run

> - <i> Logical layers are merely a way of organizing your code. Typical layers include Presentation, Business and Data – the same as the traditional 3-tier model. But when we’re talking about layers, we’re only talking about logical organization of code. In no way is it implied that these layers might run on different computers or in different processes on a single computer or even in a single process on a single computer. All we are doing is discussing a way of organizing a code into a set of layers defined by specific function. </i>  
> - <i> Physical tiers however, are only about where the code runs. Specifically, tiers are places where layers are deployed and where layers run. In other words, tiers are the physical deployment of layers.</i>  

$\tiny{\text{Rockford Lhotka, Should all apps be n-tier?}}$   

## POSIX thread model

## Static and Dynamic Dispatch

https://medium.com/ingeniouslysimple/static-and-dynamic-dispatch-324d3dc890a3

## Static vs Dynamic typing

http://en.wikipedia.org/wiki/Type_system

> **Static typing**

> A programming language is said to use static typing when type checking is performed during compile-time as opposed to run-time. In static typing, types are associated with variables not values. Statically typed languages include Ada, C, C++, C#, JADE, Java, Fortran, Haskell, ML, Pascal, Perl (with respect to distinguishing scalars, arrays, hashes and subroutines) and Scala. Static typing is a limited form of program verification (see type safety): accordingly, it allows many type errors to be caught early in the development cycle. Static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient (i.e. faster or taking reduced memory) by omitting runtime type checks and enabling other optimizations.

> Because they evaluate type information during compilation, and therefore lack type information that is only available at run-time, static type checkers are conservative. They will reject some programs that may be well-behaved at run-time, but that cannot be statically determined to be well-typed. For example, even if an expression \<complex test\> always evaluates to true at run-time, a program containing the code

> if \<complex test\> then 42 else \<type error\>

> will be rejected as ill-typed, because a static analysis cannot determine that the else branch won't be taken. The conservative behaviour of static type checkers is advantageous when \<complex test\> evaluates to false infrequently: A static type checker can detect type errors in rarely used code paths. Without static type checking, even code coverage tests with 100% code coverage may be unable to find such type errors. Code coverage tests may fail to detect such type errors because the combination of all places where values are created and all places where a certain value is used must be taken into account.

> The most widely used statically typed languages are not formally type safe. They have "loopholes" in the programming language specification enabling programmers to write code that circumvents the verification performed by a static type checker and so address a wider range of problems. For example, Java and most C-style languages have type punning, and Haskell has such features as unsafePerformIO: such operations may be unsafe at runtime, in that they can cause unwanted behaviour due to incorrect typing of values when the program runs.

> **Dynamic typing**

> A programming language is said to be dynamically typed, or just 'dynamic', when the majority of its type checking is performed at run-time as opposed to at compile-time. In dynamic typing, types are associated with values not variables. Dynamically typed languages include Groovy, JavaScript, Lisp, Lua, Objective-C, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl. Compared to static typing, dynamic typing can be more flexible (e.g. by allowing programs to generate types and functionality based on run-time data), though at the expense of fewer a priori guarantees. This is because a dynamically typed language accepts and attempts to execute some programs which may be ruled as invalid by a static type checker.

> Dynamic typing may result in runtime type errors—that is, at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. This operation may occur long after the place where the programming mistake was made—that is, the place where the wrong type of data passed into a place it should not have. This makes the bug difficult to locate.

> Dynamically typed language systems, compared to their statically typed cousins, make fewer "compile-time" checks on the source code (but will check, for example, that the program is syntactically correct). Run-time checks can potentially be more sophisticated, since they can use dynamic information as well as any information that was present during compilation. On the other hand, runtime checks only assert that conditions hold in a particular execution of the program, and these checks are repeated for every execution of the program.

> Development in dynamically typed languages is often supported by programming practices such as unit testing. Testing is a key practice in professional software development, and is particularly important in dynamically typed languages. In practice, the testing done to ensure correct program operation can detect a much wider range of errors than static type-checking, but conversely cannot search as comprehensively for the errors that both testing and static type checking are able to detect. Testing can be incorporated into the software build cycle, in which case it can be thought of as a "compile-time" check, in that the program user will not have to manually run such tests.

## Name binding

### Static binding vs Dynamic binding
- Static binding
  - the binding is resolved without running the program
    - in most statically typed languages this happens when the program is compiled
  - the type of such a variable doesn't change during its lifetime
    - although we may cast its value to some other data type
- Dynamic binding, also called late binding or virtual binding
  - the method to invoke is determined while the program is running


## Typing
- Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
  - memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
  - instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
  - program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)

### Static typing vs Dynamic typing
- Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. 
- Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.


### Dynamic typing vs Dynamic binding
- Dynamic typing defers the determination of the class that an object belongs to until the program is executing
- Dynamic binding defers the determination of the actual method to invoke on an object until program execution time
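
The difference can be seen in a minimal Python sketch. The `Square` and `Circle` classes are hypothetical and exist only for illustration; pi is rounded to 3 so the arithmetic stays exact.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # pi rounded to 3 to keep the result an exact integer
        return 3 * self.radius * self.radius


# Dynamic typing: the name `shape` has no declared type. The type belongs
# to the value it currently refers to and is only known at run time.
shape = Square(2)
first_type = type(shape).__name__

shape = Circle(1)
second_type = type(shape).__name__

# Dynamic binding: which `area` method runs is looked up on the object
# at call time, so the same call expression reaches different code.
areas = [s.area() for s in (Square(2), Circle(1))]
```

Here the same variable holds values of two different classes in turn, and the `s.area()` call is bound to a different method for each object.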

## Weak vs Strong typing

Check out this JavaScript code:
```javascript
4 + '7';      // '47'
4 * '7';      // 28
2 + true;     // 3
false - 3;    // -3
```

- Adding the number 4 to the string '7' gives us the string '47'. JavaScript converted the number 4 into the string '4' and concatenated the two strings. JavaScript just took the liberty of assuming this is what we wanted. It's hard to blame it: what did we want? Adding a number to a string doesn't make sense. Some other languages, like Ruby or Python, would have raised an error instead of guessing.

- Multiplying number 4 by a string '7' is, well, 28, according to JavaScript. In this case, it converted string '7' into number 7 and did the normal multiplication.

- JavaScript tries to guess what we meant and converts from type to type without telling us. Sometimes it's useful, sometimes it's mind-boggling. This happens because **JavaScript is a weakly typed language**. 

- This has nothing to do with dynamic versus static typing, __which is about WHEN to check for types. Strong versus weak is about HOW SERIOUS DO YOU GET while checking the types.__

- **You can say that weak typing is relaxed typing, and strong typing is strict typing.**

- JavaScript, PHP and Python are all dynamically typed languages, but they differ in how strict they are
  - JavaScript has very weak typing 
  - PHP has somewhat stronger typing 
  - Python has even stronger typing 
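
For contrast with the JavaScript snippet, here is a small Python sketch of the same kind of expression. The helper `try_add` is a hypothetical name used only to capture the error instead of crashing.

```python
def try_add(left, right):
    """Return left + right, or the name of the exception Python raises."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__


mixed = try_add(4, "7")        # Python refuses to guess and raises TypeError
explicit_str = str(4) + "7"    # the conversion must be spelled out
explicit_int = 4 * int("7")    # same here, for the numeric reading
bool_sum = 2 + True            # still allowed: bool is a subclass of int
```

Python is stricter than JavaScript here, but not absolutely strict: `2 + True` works because `bool` is defined as a subclass of `int`.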


## Single vs Multiple Dispatch
https://en.wikipedia.org/wiki/Dynamic_dispatch#Single_and_multiple_dispatch

> The choice of **which version of a method to call** may be based either on **a single object**, or **on a combination of objects**. The former is called **single dispatch** and is directly supported by common object-oriented languages such as Smalltalk, C++, Java, C#, Objective-C, Swift, JavaScript, and Python. In these and similar languages, one may call a method for division with syntax that resembles

>> dividend.divide(divisor)  # dividend / divisor

> where the **parameters are optional**. This is thought of as sending a message named divide with parameter divisor to dividend. An **implementation will be chosen based only on dividend's type** (perhaps rational, floating point, matrix), **disregarding the type or value of divisor**.

> By contrast, some languages dispatch methods or functions based on the **combination of operands**; in the division case, the **types of the dividend and divisor together** determine which divide operation will be performed. This is known as **multiple dispatch**. Examples of languages that support multiple dispatch are Common Lisp, Dylan, and **Julia**.
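
A rough Python sketch of the two ideas follows. `functools.singledispatch` is a standard-library decorator that picks an implementation from the type of the first argument only. Python has no built-in multiple dispatch, so the second half emulates it with a hypothetical lookup table (`_DIVIDE_IMPLS`, `register_divide`) keyed on the exact types of both operands; this is an illustration, not how Julia or Common Lisp implement it.

```python
from fractions import Fraction
from functools import singledispatch


# Single dispatch: only the type of the first argument matters.
@singledispatch
def describe(value):
    return "something"


@describe.register
def _(value: int):
    return "an integer"


@describe.register
def _(value: str):
    return "a string"


# Emulated multiple dispatch: the types of BOTH operands select the code.
_DIVIDE_IMPLS = {}


def register_divide(left_type, right_type):
    def decorator(func):
        _DIVIDE_IMPLS[(left_type, right_type)] = func
        return func
    return decorator


@register_divide(int, int)
def _divide_ints(dividend, divisor):
    return Fraction(dividend, divisor)      # keep the result exact


@register_divide(float, int)
def _divide_float_by_int(dividend, divisor):
    return dividend / divisor               # ordinary floating-point division


def divide(dividend, divisor):
    # Exact type match only; a real system would also consider subclasses.
    impl = _DIVIDE_IMPLS[(type(dividend), type(divisor))]
    return impl(dividend, divisor)
```

With this table, `divide(1, 2)` and `divide(1.0, 2)` run different functions even though the divisor is the same, because the pair of types differs.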




## Dynamic Dispatch mechanisms

A language may be implemented with different dynamic dispatch mechanisms. The choices of the dynamic dispatch mechanism offered by a language to a large extent alter the programming paradigms that are available or are most natural to use within a given language.

Normally, in a typed language, the dispatch mechanism will be performed based on the type of the arguments (most commonly based on the type of the receiver of a message). **Languages with weak or no typing systems often carry a dispatch table as part of the object data for each object. This allows instance behaviour as each instance may map a given message to a separate method.**

Some languages offer a hybrid approach.

**Dynamic dispatch will always incur an overhead so some languages offer static dispatch for particular methods.**

- C++ uses early binding and offers both dynamic and static dispatch. The **default form of dispatch is static**. To get dynamic dispatch the programmer must declare a method as **virtual**.


C++ example:
```cpp
#include <iostream>

// Pet is an abstract base class: speak() is pure virtual,
// so calls through a Pet reference or pointer are dispatched dynamically.
class Pet {
    public:
    virtual ~Pet() = default;
    virtual void speak() = 0;
};

class Dog : public Pet {
    public:
    void speak() override
    {
        std::cout<<"Woof!\n";
    }
};

class Cat : public Pet {
    public:
    void speak() override
    {
        std::cout<<"Meow!\n";
    }
};

//Speak will be able to accept anything deriving from Pet
void speak(Pet& pet)
{
    pet.speak();
}

int main()
{
    Dog fido;
    Cat simba;
    speak(fido);
    speak(simba);
    return 0;
}
```

Python example:
```python
class Cat:
    def speak(self):
        print("Meow")

class Dog:
    def speak(self):
        print("Woof")


def speak(pet):
    # Dynamically dispatches the speak method
    # pet can either be an instance of Cat or Dog
    pet.speak()

cat = Cat()
speak(cat)
dog = Dog()
speak(dog)
```

## Virtual Method Table

- https://legacy.python.org/workshops/1998-11/proceedings/papers/lowis/lowis.html
- https://programs.wiki/wiki/detailed-parsing-of-c-virtual-functions-with-examples.html


> - Anyone who knows C++ should know that virtual functions are implemented through a virtual table, or V-Table for short. **This table mainly holds the addresses of a class's virtual functions.** It solves the problem of **inheritance and overriding**, and ensures that a call reaches the function that actually applies to the object. **Instances of classes with virtual functions carry a pointer to this table**, so when we manipulate a subclass object through a parent-class pointer, the **virtual function table becomes important, just like a map**, indicating which function should actually be called.
> - The C++ compiler should ensure that the **pointer to the virtual function table exists at the top of the object instance** (this is to ensure the highest performance when fetching the virtual function table, even with multi-level or multiple inheritance). This means that **we can get the virtual function table from the address of the object instance, then iterate through its function pointers and call the corresponding function.**

> - The **inherited class derives a table of virtual functions**, if any, and copies the virtual function address of the parent class.
> - Virtual functions are **placed in tables in the order in which they are declared**.
> - The virtual function of the parent class precedes the virtual function of the child class

> - The **override function is placed in the new virtual table where the parent virtual function address** is, that is, the parent virtual function address is overridden, thus **achieving polymorphism.**
> - Functions that are not overridden keep their entries in the virtual table, in declaration order.


- codes/cpp_templates/virtual_test.cpp
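
The table-building rules quoted above can be modelled in a few lines of Python. This is only a toy model under simplifying assumptions (single inheritance, a table represented as a list of `(name, function)` pairs); the names `build_vtable` and `call_virtual` are hypothetical, and real C++ compilers lay tables out in implementation-specific ways.

```python
def build_vtable(parent_vtable, declared):
    """Derive a child's table: copy the parent's slots, replace overridden
    ones in place, and append newly declared virtual functions at the end."""
    table = list(parent_vtable)                       # copy the parent's entries
    slots = {name: i for i, (name, _) in enumerate(table)}
    for name, func in declared:
        if name in slots:
            table[slots[name]] = (name, func)         # override keeps the parent's slot
        else:
            slots[name] = len(table)
            table.append((name, func))                # new virtuals follow the parent's
    return table


def call_virtual(vtable, slot):
    """A virtual call is an indexed lookup followed by an indirect call."""
    _, func = vtable[slot]
    return func()


base_vtable = build_vtable([], [
    ("f", lambda: "Base::f"),
    ("g", lambda: "Base::g"),
])
derived_vtable = build_vtable(base_vtable, [
    ("f", lambda: "Derived::f"),   # overrides Base::f
    ("h", lambda: "Derived::h"),   # new virtual function
])
```

Calling slot 0 through `derived_vtable` reaches the overriding function, slot 1 still reaches the inherited one, and the parent's own table is left untouched.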

## Bottlenecks

## CDN - Content delivery network
> A content delivery network (CDN) refers to a geographically distributed group of servers which work together to provide fast delivery of Internet content.  

> A CDN allows for the quick transfer of assets needed for loading Internet content including HTML pages, javascript files, stylesheets, images, and videos. The popularity of CDN services continues to grow, and today the majority of web traffic is served through CDNs, including traffic from major sites like Facebook, Netflix, and Amazon.  

> It helps cache content at the network edge, which improves website performance. Many websites struggle to have their performance needs met by traditional hosting services, which is why they opt for CDNs.  

> Benefits of using a CDN:
> - Improving website load times
> - Reducing bandwidth costs
> - Increasing content availability and redundancy
> - Improving website security

<p align="center"><img src="https://cf-assets.www.cloudflare.com/slt3lc6tev37/540CpDkqSDg6QAPi5nO1AP/b44a3edb5abc4e115ddab9b4d9bf7a32/Learning-How-does-a-CDN-work.svg" width=500></p>

$\tiny{\text{cloudflare.com}}$   

## Point of Presence (PoP)
A point of presence (PoP) is a demarcation point, access point, or physical location at which two or more networks or communication devices share a connection.

The routers, switches, servers, and other devices necessary for traffic to cross over networks are all present at PoPs. Internet service providers and edge networks like StackPath typically have multiple points of presence located near large Internet exchange points (IXPs) at which they have peering agreements. The proximity of points of presence and Internet exchange points is one very important factor in how quickly traffic is able to traverse the Internet.

$\tiny{\text{https://blog.stackpath.com/}}$   

## Domain Name System (DNS)
This lookup service maps human-readable domain names such as google.com to numeric IP addresses. There are four key components:
- DNS resolver
- Root nameserver
- Top-level domain (TLD) nameserver
- Authoritative nameserver

The authoritative server can return a list of IP addresses, rotated in round-robin order, so the client can try another address if the first server doesn't respond within a time limit.
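
The lookup chain and the round-robin rotation can be sketched with in-memory tables. Everything here is a stand-in: the dictionaries `ROOT`, `TLD` and `AUTHORITATIVE`, the server labels, and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration, and no real DNS traffic is involved.

```python
from collections import deque

# Hypothetical zone data standing in for real nameservers.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {
    "ns-example": {
        "example.com": deque(["192.0.2.1", "192.0.2.2", "192.0.2.3"]),
    },
}


def resolve(domain):
    """Walk root -> TLD -> authoritative, as a DNS resolver would."""
    tld_label = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld_label]                   # 1. root points at the TLD server
    auth_server = TLD[tld_server][domain]          # 2. TLD points at the authoritative server
    records = AUTHORITATIVE[auth_server][domain]   # 3. authoritative server holds the records
    answer = list(records)
    records.rotate(-1)                             # round-robin: next answer starts one further on
    return answer


def connect(domain, is_up):
    """Try each returned address in order until one responds."""
    for address in resolve(domain):
        if is_up(address):
            return address
    return None
```

Two consecutive calls to `resolve("example.com")` return the same addresses in a different order, which spreads clients across servers, and `connect` falls through to the next address when one is down.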

<p align="center"><img src="https://foxutech.com/wp-content/uploads/2017/09/How-DNS-works.png" width=500></p>

$\tiny{\text{https://foxutech.com}}$   

## System Design Problems

- 1) Requirements clarifications
  - system design questions are open-ended
  - they don't have one correct answer
  - define end goals of system
  - clarify which part of system to focus on
- 2) Back-of-the-envelope estimation
  - estimate scale of system to design
  - how much storage is needed
  - what type of data must be stored: photos, videos, tweets
  - what network bandwidth is expected
- 3) System interface definition
  - define the main APIs
  - define the main contracts expected within the system
- 4) Defining data model
  - how data flows between components
  - define different aspects of data management for example storage
  - what type of database to choose - relational or no-sql
  - how will storage of photos and videos be done
- 5) High-level design
  - draw block diagrams showing
    - how many application servers are needed
    - how load balancers will be placed
    - what sort of database is required
    - how many reads and writes are expected
    - what sort of distributed file storage system is required
- 6) Detailed design
  - dig deeper into a couple of major components
  - at which layers do we need caching
  - how will we partition the data
  - when will traffic peak (for example, when most tweets are posted) and how to optimize for it
  - which components need load balancing
- 7) Identifying and resolving bottlenecks
  - which components may cause bottlenecks
  - how to create redundant services/applications so the system keeps serving well
  - how to monitor performance of services
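
Step 2, the back-of-the-envelope estimation, is mostly multiplication. The sketch below works through it for an imagined photo-sharing service; every input number is a made-up assumption chosen to make the arithmetic easy to follow, not data from a real system.

```python
# All inputs are made-up assumptions for an imagined photo-sharing service.
DAILY_ACTIVE_USERS = 10_000_000
UPLOADS_PER_USER_PER_DAY = 2
AVG_PHOTO_BYTES = 500 * 1000        # 500 KB per photo
READS_PER_WRITE = 100               # read-heavy workload
SECONDS_PER_DAY = 24 * 60 * 60

# Traffic
uploads_per_day = DAILY_ACTIVE_USERS * UPLOADS_PER_USER_PER_DAY
writes_per_second = uploads_per_day / SECONDS_PER_DAY
reads_per_second = writes_per_second * READS_PER_WRITE

# Storage
storage_per_day_tb = uploads_per_day * AVG_PHOTO_BYTES / 1e12
storage_per_year_pb = storage_per_day_tb * 365 / 1000

# Network bandwidth for uploads
ingress_mb_per_second = writes_per_second * AVG_PHOTO_BYTES / 1e6
```

Under these assumptions the service takes about 231 uploads per second, stores 10 TB of new photos per day (roughly 3.65 PB per year), and needs around 116 MB/s of upload bandwidth.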


### Design URL shortening system like bit.ly or tinyurl
- What is the need?
  - shorter links
  - saves space
  - less likely to mistype
- What are the requirements of designing such system?
  - Functional requirements
    - generate a shorter link or alias for a given URL
    - the service must redirect the short link to the original link
    - users should be able to pick a custom short name for their link
    - links should expire after a standard time span
  - Non-Functional requirements
    - minimal or no downtime
    - URL redirection must happen in real time, with minimal latency
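
One possible way to meet the functional requirements is to encode an increasing counter in base 62 and keep an in-memory table with an expiry time per link. This is a minimal single-process sketch, not a production design; the `Shortener` class, its method names and the injectable `clock` parameter are all assumptions made for illustration.

```python
import string
import time

ALPHABET = string.digits + string.ascii_letters   # 62 characters


def base62(number):
    """Encode a non-negative integer with the 62-character alphabet."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.next_id = 1
        self.links = {}             # alias -> (original URL, expiry timestamp)

    def shorten(self, url, alias=None):
        if alias is None:
            # Skip counter values whose encoding a user already claimed.
            while True:
                alias = base62(self.next_id)
                self.next_id += 1
                if alias not in self.links:
                    break
        elif alias in self.links:
            raise ValueError(f"alias already taken: {alias}")
        self.links[alias] = (url, self.clock() + self.ttl_seconds)
        return alias

    def resolve(self, alias):
        """Return the original URL, or None if unknown or expired."""
        entry = self.links.get(alias)
        if entry is None:
            return None
        url, expires_at = entry
        if self.clock() >= expires_at:
            del self.links[alias]
            return None
        return url
```

A real service would persist the table in a database and generate IDs in a way that works across many servers, but the alias, redirect and expiry logic stays the same in spirit.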



### How Hotstar scaled to 10.3 million concurrent users
https://www.scaleyourapp.com/how-hotstar-scaled-with-10-3-million-concurrent-users-an-architectural-insight/

### How Facebook supports global events
https://engineering.fb.com/2018/02/12/production-engineering/how-production-engineers-support-global-events-on-facebook/



## Application server features

## Web server features

## API 
Application Programming Interface


## Proxy Server
- A proxy (forward proxy) sits between clients and the outside world and can act as a firewall
- Multiple clients can talk to the same proxy, which interacts with the outside world on their behalf
- Acts as a security layer and can help protect against phishing and malware
- Helps with caching by saving static content
- Can help with encrypting/decrypting sensitive client data

## Reverse Proxy Server
- instead of protecting the client, it protects the server
- the outside world interacts with the reverse proxy instead of interacting with the server directly
- can work as a load balancer
- helps with caching
- helps with compressing the data, thereby reducing bandwidth use and improving network performance
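
Two of these roles, load balancing and caching, can be sketched without any networking. In the toy model below, backends are plain Python callables; `ReverseProxy`, `make_backend` and the rule that only paths under `/static/` are cached are assumptions for illustration.

```python
from itertools import cycle


class ReverseProxy:
    def __init__(self, backends):
        self._backends = cycle(backends)   # round-robin load balancing
        self._cache = {}                   # path -> cached response

    def handle(self, path):
        if path in self._cache:            # cache hit: no backend is contacted
            return self._cache[path]
        backend = next(self._backends)
        response = backend(path)
        if path.startswith("/static/"):    # only static content is cached
            self._cache[path] = response
        return response


def make_backend(name):
    """Hypothetical origin server that records the requests it receives."""
    calls = []

    def backend(path):
        calls.append(path)
        return f"{name} served {path}"

    backend.calls = calls
    return backend
```

Clients only ever call `proxy.handle(...)`; which backend answered, or whether any backend was contacted at all, stays hidden behind the proxy.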

## OAuth 2.0
When we launch Spotify to listen to some music, we may choose to log in using Facebook. Spotify then logs us in through the Facebook API. 
- We may think Spotify uses our Facebook username and password to log in. Wrong. __*It does not*__ 
- Passwords are never passed from server to server in the OAuth 2.0 framework

In the example below, Sarah wants to check the balance of her Memorial Bank account. She logs into the MyBucks application to see her checking account in the MyBucks dashboard. 
- MyBucks makes an _**authorization request**_ to the Memorial Bank authorization server
- Memorial Bank asks Sarah to _**authorize**_ MyBucks to access her account balances 
- Sarah's approval is recorded by the Memorial Bank __*authorization server*__ as an _**authorization grant**_, and MyBucks receives an _**authorization code**_
- The Authorization Server does not hold any of Sarah's account details. They are held by the _**Resource Server**_, so the authorization code is only used between MyBucks and the Authorization Server
- The Authorization Server exchanges the authorization code for an _**access token**_ that MyBucks can use to get Sarah's checking details
- The access token carries details of what __*authorization Sarah provided*__. In this case it's her account balances only
- MyBucks sends the _**access token**_ to the Resource Server, which validates the token
- The Resource Server sends back the **_protected resource_**
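
The exchange of code and token can be simulated in memory. This is a simplified sketch of the authorization code flow only: the class and method names (`AuthorizationServer`, `ResourceServer`, `authorize`, `exchange`, `introspect`) and the sample account data are assumptions, and real deployments also authenticate the client application, use redirect URIs and expire tokens.

```python
import secrets


class AuthorizationServer:
    def __init__(self):
        self._codes = {}     # authorization code -> (user, scope)
        self._tokens = {}    # access token -> (user, scope)

    def authorize(self, user, scope, user_approves):
        """The user approves or denies; no password reaches the client app."""
        if not user_approves:
            return None
        code = secrets.token_urlsafe(16)
        self._codes[code] = (user, scope)
        return code

    def exchange(self, code):
        """Swap a one-time authorization code for an access token."""
        grant = self._codes.pop(code, None)
        if grant is None:
            return None
        token = secrets.token_urlsafe(32)
        self._tokens[token] = grant
        return token

    def introspect(self, token):
        return self._tokens.get(token)


class ResourceServer:
    def __init__(self, auth_server, accounts):
        self._auth = auth_server
        self._accounts = accounts

    def get(self, token, resource):
        grant = self._auth.introspect(token)
        if grant is None:
            return "401 invalid token"
        user, scope = grant
        if resource not in scope:
            return "403 out of scope"
        return self._accounts[user][resource]


# Hypothetical data for the Sarah / MyBucks / Memorial Bank example.
bank_auth = AuthorizationServer()
bank_api = ResourceServer(bank_auth, {"sarah": {"balance": 1250, "transactions": []}})
code = bank_auth.authorize("sarah", {"balance"}, user_approves=True)
token = bank_auth.exchange(code)
```

MyBucks ends up holding only `token`, which is limited to the `balance` scope Sarah approved; her password never leaves the bank.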

<p align="center"><img src="./images/OAuth2.png" width=400></p>

$\tiny{\text{YouTube - InterSystems Learning Services}}$   

## Master Slave Model

## API Gateway
- APIs are the interfaces which applications use to communicate and the gateway is the control point for routing, shaping, and securing that traffic
- Incoming traffic is filtered and routed to the appropriate services.

Example
- Reverse proxy servers such as Nginx and HAProxy can be configured to act as an API gateway

<p align="center"><img src="https://docs.microsoft.com/en-us/azure/architecture/microservices/images/gateway.png" width=500></p>

$\tiny{\text{https://docs.microsoft.com}}$

## Worker Pool

## Data Lake
- https://www.snaplogic.com/glossary/data-lake
- https://www.snaplogic.com/glossary/hadoop-data-lake


## Message Queue
A message queue is a data structure, or a container - a way to hold messages for eventual consumption

### Message Broker
- A message broker is a **separate component that manages queues**
- It is also known as a **service bus**: a piece of middleware responsible for persisting and routing messages while allowing you to decouple your system into smaller parts. 
  - A message queue is a part of a message broker and is just a persistence mechanism.
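
The relationship between the two can be shown with a toy in-memory broker that owns several named queues. The `Broker` class and its `publish`/`consume` methods are hypothetical names; real brokers add persistence, acknowledgements and network transport.

```python
from collections import defaultdict, deque


class Broker:
    """Manages named queues so producers and consumers never talk directly."""

    def __init__(self):
        self._queues = defaultdict(deque)   # queue name -> pending messages

    def publish(self, queue_name, message):
        # Routing: the queue name decides where the message waits.
        self._queues[queue_name].append(message)

    def consume(self, queue_name):
        """Return the oldest pending message, or None if the queue is empty."""
        queue = self._queues[queue_name]
        return queue.popleft() if queue else None
```

A producer only needs the broker and a queue name, and a consumer can pick messages up later, which is what decouples the two sides.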


## Web components
- low level browser APIs
- gives a standard component interface
- web components are not tied to a specific framework
- web components simply tell the browser **"When and Where to create components, but not how"**

### AMP component



## Web 2.0

## Costs of operation
- Read from disk
- Read from memory
- Local Area Network (LAN) round-trip
- Cross Continental Network

## Solution Patterns
- Sharding Data
- Replication Types
- Write-Ahead Logging
- Separating Data and Metadata Storage
- Load Distribution

## Trade-Offs and Compromise
- Choose rotating disks and pay in latency
- Choose flash drives and pay more money
- Seek multiple solutions, pick one and commit to it

## Better Practices
- how you think - how you solve problems
- most questions will be open ended
  - ask for clarifications
- first solution may not be the best solution
  - ways to improve upon it
- practice on paper or white-board

## Hardware

## Stack vs Heap

## Functional programming vs OOP

## Advantages of Physical layer vs Logical layer separation

## Service Providers 
- What types of features do service providers offer, for example Gmail, Microsoft or a generic IMAP service?
  

## Calculate capacity of shard services

## Calculate acceptable latency

## Calculate hardware capability

## Calculate scalability requirements

## Elasticity of cloud

## Features that cloud provides

## User Session 
- When a website stores user state, the user can continue from where they left off the next time they log in
- It does not feel like starting fresh with all previous activity lost

## Algorithms 
### Completeness
### Time complexity
### Space complexity


## Hypervisor

- A hypervisor makes virtualization possible. 
- It is also called a virtual machine monitor (VMM). 
- It divides the host system and allocates resources to each of the resulting virtual environments. 
- Multiple operating systems can run on a single host system


- Type 1 hypervisor 
  - native (bare-metal) hypervisor
  - runs directly on the underlying host hardware
  - it has direct access to the host hardware, so it does not require a base operating system
- Type 2 hypervisor
  - hosted hypervisor
  - runs on top of the underlying host operating system


## Virtualization

## Containerization
- An application that is developed and deployed is bundled together with all its configuration files and dependencies. This is **containerization** and the bundle is called a **container**. Docker is a widely used containerization platform, and Kubernetes orchestrates containers across a cluster.

## Virtualization vs Containerization
- A container provides an isolated environment for running the application
- the entire user space is explicitly dedicated to the application
- changes made inside a container are not reflected in the host or in other containers running on the same host
- a container is an abstraction at the application layer
- each container typically runs a different application


- a hypervisor gives the guest an entire virtual machine, which includes its own kernel
- a virtual machine is an abstraction of the hardware layer, and each virtual machine behaves like a separate physical machine


## Bare metal deployment
### Bare Metal vs. Virtual Machine Deployments
- https://www.containiq.com/post/deploying-kubernetes-on-bare-metal

## TTL
- Time to live

## TLS/SSL

## CA certificate

## Public-Private Key