<a href="https://colab.research.google.com/github/NadiaHolmlund/BDS_M6_Exam_Notes/blob/main/BDS_M6_Exam_Notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Databases

## 1. What are the main types of databases and what are the main differences between them?

- Relational databases:
 - Store data in tables with rows and columns.
 - Relational databases use SQL (Structured Query Language) to manipulate data.
 - Well suited for structured data and are widely used in business applications.

- NoSQL databases:
 - Use a document-based model, key-value pairs, column-family or graphs.
 - Often used for large and unstructured data sets
 - Can handle different types of data such as images, videos, and social media feeds.


The main differences:
- Data structures
- The way they store and organize data
- The way they handle queries and data access
 
Main advantages:
- Relational databases are well suited for structured data and support complex queries
- NoSQL databases are well suited for unstructured data and are more scalable

## 8. How can we use a database in an ML project, and can you give an example?

Databases can be used to store, manage, and preprocess large amounts of data, which can then be used for model training, testing, and validation.


Example:

Use a database, such as MySQL or PostgreSQL, to store data in a structured format. Then use SQL queries to extract and transform the data into a format that can be used for training and testing a machine learning model. For example, we can use SQL queries to:
- Join multiple tables to combine data from different sources
- Filter and select data based on specific criteria
- Aggregate data to calculate summary statistics
- Normalize and preprocess data to prepare it for machine learning algorithms, such as scaling or one-hot encoding categorical features.
- Once the data has been transformed and preprocessed, we can use it to train and test our machine learning model
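
The steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `customers` and `orders` tables, their columns, and all values are hypothetical, and SQLite stands in for MySQL or PostgreSQL. The query joins, filters, and aggregates; the one-hot encoding and min-max scaling are then done in plain Python.

```python
import sqlite3

# In-memory database; table names, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DK', 30), (2, 'SE', 25), (3, 'DK', 41);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 80.0), (4, 3, 20.0);
""")

# Join the two tables, filter on age, and aggregate spend per customer.
rows = conn.execute("""
    SELECT c.id, c.country, c.age,
           SUM(o.amount) AS total_spent,
           COUNT(o.id)   AS n_orders
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.age >= 25
    GROUP BY c.id, c.country, c.age
    ORDER BY c.id
""").fetchall()
conn.close()

# Preprocess: one-hot encode `country` and min-max scale `total_spent`.
countries = sorted({row[1] for row in rows})
spent = [row[3] for row in rows]
lowest, highest = min(spent), max(spent)
features = [
    [float(row[1] == c) for c in countries]        # one-hot columns
    + [(row[3] - lowest) / (highest - lowest)]     # scaled spend in [0, 1]
    + [row[2]]                                     # raw age
    for row in rows
]
```

Each entry of `features` is a numeric vector that could be handed to a training routine; in practice the preprocessing step is often done with a library rather than by hand.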

## SQL

SQL database:
* SQL stands for Structured Query Language
* SQL is a domain-specific language for managing relational databases (tables with relationships)
* CRUD operations: Create, Read, Update, and Delete data
* Commonly used in data analytics, data science, and data engineering

### What is a relational database

Relational Databases 

* Relational database management systems (RDBMS) store data in tables with rows and columns
  * Each row represents a record or an entity
  * Each column represents an attribute of the record
* Tables are connected through relationships (primary and foreign keys)

Example: 

| id | first_name | last_name | age | affiliation_id |
|----|------------|-----------|-----|----------------|
| 1  | John       | Doe       | 30  | 1337           |
| 2  | Jane       | Smith     | 25  | 1505           |

### 2. What is a primary key in a relational database?

- A primary key is a unique identifier for each record (row) in a table.
- It is a column or combination of columns that uniquely identifies a record and ensures that each record in a table can be accessed and updated efficiently.
- The primary key is typically used as a reference in other tables (foreign keys) to establish relationships between tables

For example:

In a customer and orders table, the customer ID column could be the primary key in the customer table and a foreign key in the orders table, linking the two tables together.

### 3. What is a foreign key in a relational database?

- A foreign key is a column or a combination of columns that references the primary key of another table.
- It is used to establish a relationship between two tables by creating a link between them.
- Foreign keys ensure referential integrity by preventing actions that would violate relationships between tables, such as deleting a record that is referenced by a foreign key in another table.
- They also allow for efficient retrieval of related data from multiple tables.

For example:

In a customer and orders table, the customer ID column could be the primary key in the customer table and a foreign key in the orders table, linking the two tables together. This allows the orders table to associate each order with a specific customer in the customer table.
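
The customer/orders relationship described above can be sketched with `sqlite3`. The table and column names are illustrative assumptions, and note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued; other database systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

# customer_id is the primary key of `customer` and a foreign key in `orders`.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'John Doe')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'book')")

def is_rejected(sql):
    """Run a statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A second customer with the same primary key violates uniqueness.
duplicate_rejected = is_rejected("INSERT INTO customer VALUES (1, 'Jane Smith')")
# An order that points at a non-existent customer violates referential integrity.
orphan_rejected = is_rejected("INSERT INTO orders VALUES (101, 999, 'pen')")
# Deleting a customer that an order still references is also blocked.
delete_rejected = is_rejected("DELETE FROM customer WHERE customer_id = 1")

n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

All three statements are rejected, so the tables still hold exactly one customer and one order.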

### 4. Can you give me some examples of SQL syntax (statements) and their applications?

SQL Syntax Overview:

* SELECT: Read data from the database
* INSERT: Add new records to the database
* UPDATE: Modify existing records in the database
* DELETE: Remove records from the database
* CREATE, ALTER, DROP: Manage database structure
* WHERE: Filter rows based on a condition
* ORDER BY: Arrange rows based on a column
* JOIN: Combine tables based on common columns
* GROUP BY: Group rows that share a value so aggregates (COUNT, SUM, AVG) can be computed per group
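
A minimal sketch of most of these statements, run through Python's `sqlite3` module. The `people` table reuses the shape of the example table earlier in these notes; the third row is an added, hypothetical record so that the grouping has something to aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define the table structure.
cur.execute("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        age INTEGER,
        affiliation_id INTEGER
    )
""")

# INSERT: add new records (the third row is hypothetical).
cur.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, "John", 30, 1337), (2, "Jane", 25, 1505), (3, "Ann", 41, 1337)],
)

# UPDATE ... WHERE: modify only the rows matching the condition.
cur.execute("UPDATE people SET age = 31 WHERE id = 1")

# DELETE ... WHERE: remove only the rows matching the condition.
cur.execute("DELETE FROM people WHERE id = 2")

# SELECT ... WHERE ... ORDER BY: read, filter, and sort.
over_30 = cur.execute(
    "SELECT first_name, age FROM people WHERE age > 30 ORDER BY age DESC"
).fetchall()

# GROUP BY: one result row per affiliation with aggregated values.
by_affiliation = cur.execute(
    "SELECT affiliation_id, COUNT(*), AVG(age) FROM people GROUP BY affiliation_id"
).fetchall()
```

Here `over_30` is `[('Ann', 41), ('John', 31)]` and `by_affiliation` is `[(1337, 2, 36.0)]`.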

### 5. Give some examples of SQL applications.

Business Analytics:
* Data warehousing for storing historical data
* Structured data analysis for business intelligence and reporting
* Support for complex queries and aggregations for decision-making

Machine Learning:
* Data preprocessing for machine learning algorithms
* Feature engineering to create relevant features from raw data
* Joining multiple data sources for a comprehensive dataset
* Storing and managing machine learning model metadata and results


### 6. What are the different types of SQL and what are their key features and use cases?

Types of SQL databases:

- MySQL
 - MySQL is one of the most popular open-source relational database management systems and is widely used for web applications.

- PostgreSQL
 - PostgreSQL is an advanced open-source object-relational database system, often chosen when a project needs more advanced features than MySQL offers.
 - It blends the traditional table-based approach with user-defined objects (such as custom types and functions), which helps when storing and analyzing complex, high-volume data.

- SQLite
 - SQLite is a SQL database engine implemented as a C library. Instead of running as a separate server, it is embedded within other applications and stores the whole database in a single file. It is often used as the on-disk file format in applications for financial analysis, cataloging, etc.

- Microsoft SQL Server
 - Microsoft SQL Server (MSSQL) is a widely used commercial RDBMS from Microsoft. T-SQL (Transact-SQL), Microsoft's extension of SQL, is used to interact with MSSQL databases. The 2019 version integrates Apache Spark and the Hadoop Distributed File System (HDFS) for big data management and analysis.

- MariaDB
 - MariaDB is an open-source fork of MySQL. It is intended to remain freely available under the GNU General Public License (GPL) and is positioned as an alternative to MySQL.

- Oracle
 - Oracle Database, the relational database management system provided by Oracle Corp., is a multi-model RDBMS that can support diverse workloads. It is commonly used for online transaction processing and data warehousing.

### 9. What are the main advantages of SQL?

Advantages of SQL:
- ACID transactions:
 - Atomicity: transactions are all or nothing, meaning that either all the changes in the transaction are committed, or none of them are
 - Consistency: transactions take the database from one valid state to another, and no constraints or rules in place are violated by the transaction.
 - Isolation: transactions are executed in isolation from other transactions, so that they don't interfere with each other
 - Durability: once a transaction is committed, it will persist even in the face of power failures, system crashes, or other failures

- Standardized language:
 - SQL for querying and managing data

- High Performance:
 - designed to handle large volumes of data efficiently and quickly process complex queries.

- Scalability:
 - can scale to accommodate increasing amounts of data and users.

- Flexibility:
 - allows for the manipulation of data in various ways, incl. sorting, filtering, and aggregating

- Security:
 - offer built-in security features that allow for controlled access to data.

- Data Integrity:
 - ensure data integrity through the use of constraints, such as unique keys, foreign keys, and check constraints.

- Compatibility: 
 - supported by many database management systems, making it a widely adopted and compatible language for data storage and manipulation.
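
Atomicity and consistency from the ACID list above can be sketched with `sqlite3`. The `accounts` table, names, and amounts are hypothetical. A transfer consists of two `UPDATE` statements; if the second one violates the `CHECK` constraint, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer(conn, "alice", "bob", 30)    # valid: both updates are committed
second = transfer(conn, "alice", "bob", 500)  # would overdraw alice: nothing is applied
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls `balances` is `{'alice': 70, 'bob': 30}`: the failed transfer left no partial credit on bob's account.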

## NoSQL

NoSQL Databases:
* NoSQL stands for "Not only SQL"
* Non-relational databases (document, key-value, column-family, graph)
* Designed to handle unstructured data and scalability challenges
* Flexible schema allows for evolving data structures
* Horizontal scalability for handling large volumes of data
* Generally, weaker consistency models compared to SQL databases

### Can you give me some examples of NoSQL syntax (statements) and their applications?

NoSQL databases are often schema-less, which means that they don't have a fixed structure for storing data like SQL databases. Instead, they use various query languages and data models to manipulate data.
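
Because each NoSQL system has its own query language, the sketch below uses plain Python rather than any real database API. It only illustrates two of the data models: schema-less documents with nested fields, and a key-value store. The `find` helper, the field names, and the data are hypothetical.

```python
import json

# Document model: a "collection" of nested documents; fields may differ per document.
users = [
    {"_id": 1, "name": "John", "tags": ["admin"], "address": {"city": "Aalborg"}},
    {"_id": 2, "name": "Jane", "age": 25, "address": {"city": "Copenhagen"}},
]

def find(collection, path, value):
    """Return documents whose field at the dotted `path` equals `value`."""
    matches = []
    for doc in collection:
        current = doc
        for key in path.split("."):
            # Walk into nested dicts; a missing field yields None.
            current = current.get(key) if isinstance(current, dict) else None
        if current == value:
            matches.append(doc)
    return matches

in_aalborg = find(users, "address.city", "Aalborg")
aged_25 = find(users, "age", 25)  # works although only some documents have `age`

# Key-value model: a unique key maps to an opaque value (here a JSON string).
cache = {}
cache["session:42"] = json.dumps({"user_id": 1, "cart": ["book"]})
session = json.loads(cache["session:42"])
```

Real document stores such as MongoDB expose comparable "query by nested field" operations, and key-value stores such as Redis expose get/set by key, each with their own syntax.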

### 7. What are the different types of NoSQL and what are their key features and use cases?

Types and Structure of NoSQL Databases

- Document Databases
  * Store data in documents (usually JSON or BSON)
  * Documents can have nested structures
  * Documents can be organized in collections
  * Examples: MongoDB, Couchbase, RavenDB
  - Applications:
    * Content management systems (CMS) with complex data structures
    * Real-time analytics for handling unstructured data
    * IoT data management for handling diverse data from multiple devices

- Key-Value Stores
 * Store data as key-value pairs
 * Keys are unique identifiers for the data
 * Values can be any data type, including complex structures
 * Examples: Redis, Amazon DynamoDB, Riak KV
 - Applications:
   * Caching for improving the performance of data retrieval
   * Session management for storing user-specific data across sessions
   * Configuration data storage for application settings and metadata

- Column-Family Stores
 * Store data in column families (loosely related to tables)
 * Data is organized as rows and columns
 * Designed for high write and read performance on large-scale data
 * Examples: Apache Cassandra, HBase, ScyllaDB
 - Applications:
   * Time-series data management for high write and read workloads
   * Event logging and analytics for large-scale systems
   * Recommendation systems for analyzing user behavior and preferences

- Graph Databases
 * Store data as nodes (entities) and edges (relationships)
 * Designed for graph traversal and querying connected data
 * Examples: Neo4j, Amazon Neptune, ArangoDB
 - Applications:
   * Social network analysis for exploring connections between users
   * Fraud detection for uncovering complex patterns in financial transactions
   * Knowledge graphs for storing and querying complex, interrelated information

### 10. What are the main advantages of NoSQL?

Advantages of NoSQL Databases:
* Flexibility:
 - Flexible schemas adapt to changing data requirements and can handle unstructured data
* Scalability/performance:
 - High scalability for large data volumes and high read/write rates
* Availability:
 - High availability and fault tolerance through data replication, even if a node in the cluster fails
* Specialization:
 - Optimized for specific use cases, depending on the type of NoSQL database
* Cost:
 - NoSQL databases can be less expensive than traditional SQL databases

## Big Data

### 11. What is Apache Spark used for?

- Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It is designed to handle large datasets by distributing the data and computation across a cluster of computers
- Apache Spark is used for large-scale data processing and analytics.
- It is designed to handle batch processing, streaming data processing, machine learning, graph processing, and other data processing workloads.
- One of the main advantages of Spark is its ability to run SQL commands for data analysis. 
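
The idea of distributing data and computation can be sketched without Spark. The example below is not Spark code; it uses only Python's standard library to show the pattern Spark applies across a cluster: split the data into partitions, process each partition independently, then combine the partial results. The input lines and the two-way split are arbitrary assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

lines = [
    "spark makes big data simple",
    "big data needs big clusters",
    "simple data",
]

def split_into_partitions(data, n):
    """Distribute items round-robin over n partitions."""
    return [data[i::n] for i in range(n)]

def count_words(partition):
    """Work done independently on one partition (the 'map' side)."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts

partitions = split_into_partitions(lines, 2)

# Each partition is processed by its own worker, as cluster nodes would do.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

# Combine the partial results into one answer (the 'reduce' side).
total = sum(partial_counts, Counter())
```

In Spark the partitions live on different machines and a lost partition can be recomputed, which is where the fault tolerance mentioned above comes from; this single-machine sketch shows only the partition, process, and combine steps.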

Common use cases:
- Data processing:
 - Process large volumes of data quickly and efficiently.
 - Supports structured, semi-structured, and unstructured data.

- Machine learning:
 - Built-in libraries (MLlib) for classification, regression, clustering, and collaborative filtering.

- Real-time analytics:
 - Spark's streaming API enables real-time data processing and analytics.

- Graph processing:
 - Spark's graph processing API enables the processing and analysis of large-scale graph data.

- Data integration:
 - Spark can be used to integrate data from multiple sources and transform it into a common format for analysis.
 - This makes it useful for data warehousing and ETL (extract, transform, load) processes.

### 12. Can you provide an example of a use case where Apache Spark might be preferred?

Example: e-commerce website that generates a massive amount of data, including customer interactions, clickstream data, and purchase history

Task: analyze data to gain insights into customer behavior, optimize marketing campaigns, and improve the customer experience.

- Real-time analytics (Spark streaming API):
 - Analyze customer interactions and clickstream data in real-time, to optimize customer experience and improve conversion rates.

- Machine learning (built-in libraries):
 - Build and train models on large datasets to personalize the customer experience, recommend products, and optimize marketing campaigns.

- Data integration:
 - Integrate data from multiple sources, including social media, email marketing, and customer service, into a single platform for analysis to gain a comprehensive view of customer behavior and preferences.

- Data processing:
 - Process and clean up large volumes of data quickly and efficiently, making it easier to prepare data for analysis.

### What is Polars used for?

What is Polars and Why is it Faster Than Pandas?

Polars is a DataFrame library designed for parallelization. It is built from the ground up and written in Rust but also has a Python package, making it a potential alternative to Pandas. 
> Polars has two different APIs: an eager API and a lazy API. Eager execution is similar to Pandas, while lazy execution is more efficient because it avoids running unnecessary code. 

Polars is faster than Pandas because it utilizes all available cores on your machine. Polars has different syntax from Pandas and can perform operations in parallel. However, Polars code is usually a little longer than the Pandas code. If you need to do a lot of data processing on large datasets, Polars can be a good alternative to Pandas.

What is Polars?
Polars is best understood as a faster DataFrame library that offers several advantages over Pandas:

- Polars does not use an index for the dataframe. Eliminating the index makes it easier to manipulate the dataframe (the index is often redundant in a Pandas dataframe anyway).
- Polars represents data internally using Apache Arrow arrays, while Pandas stores data internally using NumPy arrays. Apache Arrow arrays are more efficient in areas like load time, memory usage, and computation.
- Polars supports more parallel operations than Pandas. As Polars is written in Rust, it can run many operations in parallel.
- Polars supports lazy evaluation. Polars examines your query, optimizes it, and looks for ways to accelerate it or reduce memory usage. Pandas, on the other hand, supports only eager evaluation, which evaluates an expression as soon as it encounters one.
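
The difference between eager and lazy evaluation can be sketched in plain Python. This is not Polars code and does not reproduce its query optimizer; it only shows the core idea that a lazy pipeline does no work until a result is requested, and then does only the work that result needs. The `expensive` function and the data are hypothetical.

```python
from itertools import islice

rows = [{"id": i, "value": i * 10} for i in range(1_000)]
calls = {"eager": 0, "lazy": 0}  # counts how often the costly step runs

def expensive(row, mode):
    """Stand-in for a costly per-row transformation."""
    calls[mode] += 1
    return {**row, "value_sq": row["value"] ** 2}

# Eager: every step runs immediately on the full data set.
eager_step = [expensive(r, "eager") for r in rows]
eager_result = [r for r in eager_step if r["id"] % 2 == 0][:3]

# Lazy: generators only describe the pipeline; nothing has run yet.
lazy_step = (expensive(r, "lazy") for r in rows)
lazy_filtered = (r for r in lazy_step if r["id"] % 2 == 0)
# Work happens here, and stops as soon as three rows have been produced.
lazy_result = list(islice(lazy_filtered, 3))
```

Both results are identical, but `calls` ends up as `{'eager': 1000, 'lazy': 5}`: the lazy pipeline touched only the five rows it needed to produce three matches.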

### Can you provide an example of a use case where Polars might be preferred?

### 13. What are the key differences between Apache Spark, Polars, and Pandas, and their use cases?

Comparison between Pandas and Polars
At first glance, Pandas and Polars (eager API) are similar regarding syntax because of their shared main building blocks: Series and DataFrames.

Apache Spark, Polars, and Pandas are all popular data processing and analytics tools, but they have some key differences in terms of their design, features, and use cases. Here's a comparison of the three:

- Apache Spark: Apache Spark is a distributed computing system that is designed to process large datasets quickly and efficiently. It includes a range of libraries for batch processing, streaming data processing, machine learning, and graph processing. Spark's key features include scalability, fault tolerance, and real-time data processing. It is best suited for processing and analyzing very large datasets, where distributed processing is necessary for performance.
- Polars: Polars is a data processing library for Python and Rust that is designed to provide fast, memory-efficient data processing capabilities for large datasets. It includes a range of data manipulation and aggregation functions, as well as support for parallel processing and GPU acceleration. Polars is best suited for working with large, structured datasets that require complex data transformations and filtering.
- Pandas: Pandas is a popular data manipulation library for Python that is designed for working with smaller datasets. It includes a range of functions for data cleaning, transformation, and aggregation, as well as support for data visualization. Pandas is best suited for working with structured data that can fit into memory on a single machine.

Overall, the key differences between these tools come down to their scalability and performance capabilities. Apache Spark is designed for processing and analyzing very large datasets that require distributed processing, while Polars is designed for fast, memory-efficient processing of large structured datasets. Pandas is designed for working with smaller datasets that can fit into memory on a single machine. The choice of tool depends on the size and complexity of the dataset, the performance requirements, and the specific use case.

# MLOps

## 14. What is MLOps, and why is it important in the context of machine learning projects?

Definition:
- MLOps is the process of taking experimental ML models into a production system.
- MLOps (Machine Learning Operations) is a set of practices and tools used to manage the lifecycle of ML models in production environments. It combines principles and techniques from ML, DevOps, and data engineering to create a consistent and efficient process for managing ML projects.

![picture](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/ML_Ops_Venn_Diagram.svg.png)

Why is MLOps important:

- Collaboration:
 - MLOps provides a common framework for collaboration and communication across teams, ensuring that everyone is working towards the same goals.

- Efficiency:
 - MLOps helps to automate many of the repetitive tasks involved in ML projects, such as data cleaning, feature engineering, model training, and deployment.

- Reproducibility:
 - MLOps provides a systematic approach to managing ML projects, including version control, testing, and documentation.

- Scalability:
 - MLOps provides tools and techniques for managing data, models, and infrastructure at scale, allowing teams to deploy and manage models across multiple environments.


![picture](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/HIddenTechnicalDebtinML.jpg)

In summary:

MLOps is important because it helps to streamline the development, deployment, and maintenance of models, making it easier to collaborate across teams, improve efficiency, ensure reproducibility, and scale up projects as needed.

Stats:
- ~85% of ML models that are built never reach production
- ~60% of projects make it from prototype to production

## 15. How does MLOps help in streamlining the machine learning lifecycle from development to deployment? Answer based on an example.

MLOps helps to streamline the machine learning lifecycle from development to deployment by providing a systematic approach to managing machine learning projects. For example:

- Data preparation:
 - MLOps provides tools and techniques for managing data, including data cleaning, transformation, and feature engineering.
 - For example, data pipelines can be developed and managed using tools such as Apache Airflow or Kubeflow, allowing data to be processed and transformed automatically.

- Model development:
 - MLOps provides tools and techniques for managing the model development process, including version control, testing, and documentation.
 - For example, models can be developed and tested using tools such as Jupyter notebooks or PyCharm, and code can be managed using version control systems such as Git.

- Model training:
 - MLOps provides tools and techniques for managing the model training process, including scalability and reproducibility.
 - For example, models can be trained using distributed computing frameworks such as Apache Spark or TensorFlow, and training can be managed using tools such as Kubeflow or MLflow.

- Model deployment:
 - MLOps provides tools and techniques for managing the model deployment process, including automation and scalability.
 - For example, models can be packaged as Docker containers and orchestrated with Kubernetes, and deployment can be managed with a GitOps workflow or with tools such as Jenkins.

- Model monitoring:
 - MLOps provides tools and techniques for managing the model monitoring process, including tracking model performance and detecting anomalies.
 - For example, models can be monitored using tools such as Prometheus or Grafana, and alerts can be triggered based on specific criteria.

## 16. How do Continuous Integration (CI) and Continuous Deployment (CD) principles apply to MLOps, and can you give a use case?

Definitions:

- Continuous Integration (CI) is the practice of regularly integrating code changes into a shared repository, and automatically building and testing the code to identify and fix any issues.
- Continuous Deployment (CD) is the practice of automatically deploying code changes to production once they have been tested and verified.

We can also consider other "continuous" concepts:

- Continuous Training (CT): Increasing automation allows a model to be retrained when new data becomes available. 
- Continuous Monitoring (CM): Another reason to retrain a model is decreasing performance. We should also understand whether models are still delivering value against business metrics. 
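
The two retraining triggers just described (new data and decreasing performance) can be sketched as a simple check. The function name, the thresholds, and the choice of accuracy as the monitored metric are illustrative assumptions, not values from any specific tool.

```python
def retraining_reasons(new_rows, recent_accuracy, min_new_rows=1000, min_accuracy=0.90):
    """Return the reasons (possibly none) why a model should be retrained.

    new_rows:        rows collected since the last training run (Continuous Training)
    recent_accuracy: accuracy measured on recent production data (Continuous Monitoring)
    """
    reasons = []
    if new_rows >= min_new_rows:
        reasons.append("new data available")
    if recent_accuracy < min_accuracy:
        reasons.append("performance below threshold")
    return reasons

healthy = retraining_reasons(new_rows=50, recent_accuracy=0.95)
fresh_data = retraining_reasons(new_rows=5000, recent_accuracy=0.95)
degraded = retraining_reasons(new_rows=5000, recent_accuracy=0.50)
```

A scheduler or pipeline tool would typically run such a check periodically and start a training job whenever the returned list is non-empty.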

CI/CD can help streamline the ML development process, by automating tasks involved in building, testing, and deploying models. For example:

- Continuous Integration:
 - Data scientists can use CI tools such as GitHub Actions, CircleCI, or Jenkins to automatically build and test their models whenever changes are made to the code or data.
 - For example, when a new feature is added to the model or when new data becomes available, the CI system can automatically rebuild and test the model to ensure that it is still working as expected.

- Continuous Deployment:
 - Once a model has been developed and tested, CD tools such as Argo or Jenkins can be used to automatically deploy the model to production, for example onto a Kubernetes cluster.
 - For example, when a new version of the model is ready, the CD system can automatically deploy it to a testing environment, where it can be further evaluated and tested. Once the model has passed all tests, it can be automatically deployed to production.
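
The "deploy only after the tests pass" step can be sketched as an automated promotion gate. Everything here is a hypothetical stand-in: the models are plain functions, the holdout set is tiny, and the 0.8 accuracy threshold is arbitrary. A real pipeline would run a check like this as one stage of a CI/CD workflow.

```python
def evaluate(model, examples):
    """Fraction of (input, label) examples the model predicts correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)

def promotion_gate(candidate, production, examples, min_accuracy=0.8):
    """Allow deployment only if the candidate is good enough and no worse than production."""
    candidate_score = evaluate(candidate, examples)
    production_score = evaluate(production, examples)
    return {
        "candidate": candidate_score,
        "production": production_score,
        "deploy": candidate_score >= min_accuracy and candidate_score >= production_score,
    }

# Hypothetical task: predict whether a number is odd.
holdout = [(1, True), (2, False), (3, True), (4, False), (5, True)]

def production_model(x):
    return True  # naive model currently in production

def candidate_model(x):
    return x % 2 == 1  # new model version

decision = promotion_gate(candidate_model, production_model, holdout)
```

Here `decision` is `{'candidate': 1.0, 'production': 0.6, 'deploy': True}`, so the pipeline would proceed to deployment; a failing candidate would stop it instead.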

![](https://www.synopsys.com/glossary/what-is-cicd/_jcr_content/root/synopsyscontainer/column_1946395452_co/colRight/image_copy.coreimg.svg/1663683682045/cicd.svg)

## 17. What are some popular tools and platforms used in MLOps for managing and deploying machine learning models?

- Experiment Tracking and Model Metadata Management Tools:
 - MLflow
 - Comet ML
 - Weights & Biases

- Orchestration and Workflow Pipelines MLOps Tools:
 - Prefect
 - Metaflow
 - Kedro

- Data and Pipeline Versioning Tools:
 - Pachyderm
 - Data Version Control (DVC)

- Model Deployment and Serving Tools:
 - TensorFlow Serving (used with TensorFlow Extended, TFX)
 - BentoML
 - Cortex

- Model Monitoring in Production ML Ops Tools:
 - Evidently
 - Fiddler
 - Censius AI

- End-to-End MLOps Platforms
 - AWS SageMaker
 - DagsHub
 - Kubeflow

## 18. Can you give a best practice for implementing MLOps in an organization?

Follow a systematic and iterative approach that involves the following steps:

- Define your ML goals and use cases:
 - Identify business problems that you want to solve using machine learning.
 - Define the specific ML use cases and goals that you want to achieve.

- Assemble a team:
 - Build a team with the necessary skills to work on the project.

- Define data pipelines:
 - Create data pipelines to collect, preprocess, and store data for machine learning.
 - Use version control systems to manage code and data.

- Develop and train models:
 - Develop models using best practices in data science and machine learning.
 - Use scalable architectures to build models that can handle large datasets.

- Test and evaluate models:
 - Test models using validation and testing techniques to ensure that they are accurate and reliable.
 - Establish a process to monitor models in production and to retrain models when necessary.

- Deploy models:
 - Deploy models as containers using Docker, and use an orchestrator such as Kubernetes to run them at scale.
 - Establish a process for continuous integration and continuous deployment (CI/CD) to streamline the deployment process.

- Monitor and optimize:
 - Establish a process to monitor the performance of models in production and to optimize them when necessary.

# MLflow

## 19. What is MLflow, and how can it help in terms of MLOps?

- MLflow is a platform for managing the end-to-end machine learning lifecycle
- In terms of MLOps, MLflow provides a centralized platform for managing the ML lifecycle, from data preparation and model development to deployment and monitoring. Using MLflow, teams can work collaboratively on ML projects, easily reproduce experiments, and deploy models in a standardized and scalable way.

- It provides tools for:
 - Tracking experiments
 - Packaging and sharing code
 - Deploying models

- It helps to increase:
 - Productivity
 - Collaboration
 - Reproducibility

Main components of MLflow:

- Experiment Tracking:
  - Helps track ML experiments by recording and visualizing:
    - Metrics: values computed during an experiment, e.g. accuracy, loss, F1 score
    - Parameters: configuration used during an experiment, e.g. learning rate, batch size, epochs
    - Artifacts: outputs or results of an experiment, e.g. trained models, visualizations, or data files
  - Allows easy comparison of experiment runs and reproducible results

- Model Packaging:
  - Simple format for packaging code in a reusable and reproducible way
  - Lets you specify dependencies, such as libraries and data files, and run code in different environments
  - Provides a standardized way to package and deploy machine learning models, supporting frameworks such as TensorFlow, PyTorch, and Scikit-learn, and deployment targets such as Docker containers and cloud services

- Model Registry:
  - Centralized repository for managing and sharing ML models
  - Lets you track model versions, assign permissions, and share models with other users


![picture](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/mlflow.jpg)

## 20. What types of information can be stored in MLflow?

MLflow can store information related to ML experiments, including:
- Metrics: Numeric values that track the performance of a model during training or testing, such as accuracy, precision, recall, and F1-score.
- Parameters: Settings and configurations used during model training, such as learning rate, batch size, and number of epochs.
- Artifacts: Any output generated during model training or evaluation, such as trained models, visualizations, or data preprocessing scripts.
- Source code: The code used to train and evaluate the model, along with any dependencies and environment settings.
- Experiment metadata: Information about the experiment, such as the name, date, user, and any tags or annotations.

The information is stored in a centralized repository, which can be accessed and managed through the MLflow UI or API.
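
To make these categories concrete, here is a sketch of what a single run record might contain. It deliberately does not use MLflow's API: it is a plain dataclass written to a JSON file, and every name and value in it is a hypothetical placeholder. In MLflow itself the same categories are recorded with calls such as `mlflow.log_param`, `mlflow.log_metric`, `mlflow.log_artifact`, and `mlflow.set_tag`.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class RunRecord:
    """Illustrative container for the information tracked about one experiment run."""
    name: str
    params: dict = field(default_factory=dict)     # training configuration
    metrics: dict = field(default_factory=dict)    # numeric performance values
    artifacts: list = field(default_factory=list)  # paths to output files
    tags: dict = field(default_factory=dict)       # metadata such as user or notes

run = RunRecord(name="baseline")
run.params.update({"learning_rate": 0.01, "batch_size": 32, "epochs": 5})
run.metrics.update({"accuracy": 0.91, "f1": 0.88})  # placeholder values, not real results
run.artifacts.append("model.pkl")
run.tags["user"] = "example-user"

# Persist the record so that it can be inspected or compared with other runs later.
store = Path(tempfile.mkdtemp()) / "run.json"
store.write_text(json.dumps(asdict(run), indent=2))
loaded = json.loads(store.read_text())
```

Storing every run in the same structure is what makes runs comparable afterwards, which is the role the tracking component plays.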

# SSH

## 21. What is the purpose of SSH?

- SSH = Secure Shell
- Provides a secure connection between two entities (computer/computer or computer/remote server) over an unsecured network such as the internet
- Uses encryption and public-key authentication to protect sensitive information and data transferred between the two entities
- E.g. in terminal: `ssh ubuntu@Public_IP -i key.pem`
 - `ssh`: command to start an SSH session
 - `ubuntu`: the username to log in as (the default user on Ubuntu operating system images provided by AWS)
 - `Public_IP`: public IP address of the AWS instance you want to connect to
 - `-i key.pem`: the path to the private key file downloaded from AWS. The key authenticates the connection to the AWS instance.
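
When the connection is started from a script, the same command can be assembled as an argument list. This is a small sketch: `build_ssh_command` is a hypothetical helper, and `203.0.113.10` is a reserved documentation address standing in for the instance's public IP.

```python
import shlex

def build_ssh_command(user, host, key_path):
    """Assemble the ssh invocation as an argument list (no shell quoting issues)."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]

cmd = build_ssh_command("ubuntu", "203.0.113.10", "key.pem")
printable = shlex.join(cmd)  # human-readable form of the same command
```

`printable` is `ssh -i key.pem ubuntu@203.0.113.10`. Passing `cmd` to `subprocess.run` would start the session, provided the key file exists and the host is reachable.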

# Docker

## 22. What is Docker, and why is it useful?

- Docker is an open-source containerization platform that allows software developers and system administrators to package, distribute, and run applications in isolated and portable containers.
- Containers are lightweight, standalone, and portable units that contain all the necessary dependencies and configurations needed to run an application.

Docker is useful for several reasons:

- Portability: Docker allows developers to package an application with all its dependencies and configurations into a container, making it easily portable and deployable across different environments, such as local machines, servers, and cloud platforms.
- Consistency: By using Docker, developers can ensure that the application runs consistently across different environments, eliminating the risk of issues arising due to differences in the underlying infrastructure or dependencies.
- Scalability: Docker enables developers to quickly and easily spin up multiple instances of an application in a matter of seconds, making it easier to scale applications to meet changing demand.
- Isolation: Docker containers provide a secure and isolated runtime environment, which helps prevent conflicts between different applications and dependencies.
- Efficiency: Docker containers are lightweight and have minimal overhead, making them faster and more efficient than traditional virtual machines.

Overall, Docker is a powerful tool that simplifies the process of building, distributing, and running applications, enabling developers to focus on their core tasks rather than the underlying infrastructure.

![picture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*gLwtHvsO8yebQzwB05nZ8Q.png)

Deploying Scalable ML Models:

Docker can be combined with orchestration tools like Kubernetes or Docker Swarm to deploy scalable ML models. These tools help manage, scale, and distribute Docker containers across multiple nodes or clusters. This allows you to handle high loads and serve multiple requests concurrently, ensuring a reliable and efficient service.

![picture](https://editor.analyticsvidhya.com/uploads/85227PSLLpU1LQX8EY9LNae5tvSpq0BXn7DLhlI9VRp-rMxPxtqcbwa6EpAeQI6WFheKQZ4jtvJC2DgaSW9Ogs3ON5BksIKFgxNlczWKTrCI8k0WrBFMA2byFJElr3V-tfLDSV0C1eRE6.png)

## 23. What are containers, and how do they relate to Docker?

Containers:

- Containers are a technology that allows developers to package and run applications with all their dependencies and configurations in an isolated environment.
- Containers are lightweight, portable, and provide a consistent runtime environment for applications, regardless of the underlying infrastructure.

Containers in relation to Docker:

- Docker uses containers to package and distribute applications.
- A Docker container is a running instance of a Docker image.
- When a Docker image is run, it creates a container that runs in isolation from the host operating system, providing a secure, reproducible and consistent runtime environment for the application.
- Multiple containers can be run on a single host, each with its own set of resources and isolated from other containers running on the same host.
- Containers can be started, stopped, and removed, while the underlying Docker image remains unchanged.

## 24. What is the difference between a virtual machine and a Docker container?

The key difference between a virtual machine (VM) and a Docker container is the way they use and interact with the host operating system.
 - Virtual machines are like a computer within a computer. They run a complete operating system and require a significant amount of resources to run (CPU, memory, disk space). Each VM is isolated from the host operating system and other VMs running on the same hardware.
 - Docker containers are like a lightweight wrapper around an application and its dependencies. They share the same kernel as the host operating system and are isolated from other containers running on the same hardware.


Key differences include:

- Architecture:
 - A virtual machine emulates an entire computer system, including the operating system, hardware, and virtualized resources
 - A Docker container shares the same kernel as the host operating system, but isolates the application and its dependencies from the rest of the system.

- Resource usage:
 - Virtual machines require a significant amount of resources, including CPU, memory, and disk space, to run a complete operating system.
 - Docker containers are lightweight and share resources with the host operating system, which allows them to be more efficient and scalable.

- Portability:
 - Virtual machines are more difficult to move between different virtualization platforms and operating systems.
 - Docker containers are designed to be highly portable and easily moved between different systems and environments.

- Deployment speed:
 - Virtual machines can take longer to start up and shut down.
 - Docker containers can be started and stopped very quickly, allowing for rapid deployment and scaling of applications. 

In summary, virtual machines provide a complete virtualized system, while Docker containers provide a lightweight, isolated environment for running applications.

Containers are more efficient, portable, and scalable than virtual machines, which makes them a popular choice for deploying modern applications.


![](https://thesolving.com/wp-content/uploads/2021/05/How-docker-works-structure-and-functioning-in-detail.png)

## 25. What is a Docker image, and how is it created?

Docker image:
- A Docker image is a lightweight, stand-alone, executable package that includes everything needed to run a piece of software, including the code, runtime, system tools, libraries, and settings.
- Images act as a blueprint or template from which containers can be created.

How is it created:
- Docker images are created using a Dockerfile, which is a text file that contains instructions on how to build the image.
- The Dockerfile specifies the base image, sets environment variables, copies files, installs dependencies, and runs commands to configure the software.
- The Dockerfile is then used to build the Docker image, which is stored locally and can be pushed to a Docker registry to share it.

Example of a simple Dockerfile that creates a Docker image for a Python Flask web application:
- The Dockerfile sets the base image to Python 3.9
- Installs the dependencies from the requirements file
- Copies the application code to the working directory
- Sets the environment variables
- Exposes port 5000 for the application
- Runs the Flask web server

In [None]:
# Set the base image
FROM python:3.9

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Set the environment variables
ENV FLASK_APP=app.py

# Expose the application port
EXPOSE 5000

# Run the application
CMD ["flask", "run", "--host=0.0.0.0"]

To build the Docker image from the Dockerfile, run the command below in the same directory as the Dockerfile.

This command builds a Docker image with the tag "my-flask-app" using the Dockerfile in the current directory (".") as the build context. Once the Docker image is built, it can be run as a Docker container on any platform that supports Docker.

In [None]:
docker build -t my-flask-app .

## 26. How do you run a Docker container from an image?

To run a Docker container from an image, run the following commands:

In [None]:
# Basic syntax
docker run <image_name>

# Example from above
docker run my-flask-app

By default, Docker creates a new container from the image and starts it in the foreground, with the container's output attached to your terminal. To type commands into the container interactively, add the -it flags.

However, you may want to customize the behavior of the container by passing additional arguments to the docker run command. Here are some common options:

- -d: Runs the container in the background (detached mode).
- -p: Maps a port on the host machine to a port on the container.
- --name: Assigns a name to the container.
- -v: Mounts a volume from the host machine to the container.
- -e: Sets environment variables for the container.

For example, to run the "my-flask-app" container in detached mode and map port 5000 on the host machine to port 5000 on the container, you can run the following command:

In [None]:
docker run -d -p 5000:5000 my-flask-app
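When containers are launched from a script, the options listed above map directly onto an argument list. The following is a minimal sketch in Python, assuming the Docker CLI is installed; `build_docker_run_command` is a hypothetical helper (not part of Docker) and the container name `flask-demo` is made up for illustration. The command is only printed, not executed.

```python
import shlex


def build_docker_run_command(image, detach=False, ports=None, name=None,
                             volumes=None, env=None):
    """Assemble a `docker run` argument list (hypothetical helper)."""
    cmd = ["docker", "run"]
    if detach:
        cmd.append("-d")                       # run in the background
    for host_port, container_port in (ports or {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    if name:
        cmd += ["--name", name]                # note the double dash
    for host_path, container_path in (volumes or {}).items():
        cmd += ["-v", f"{host_path}:{container_path}"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)                          # the image name comes last
    return cmd


command = build_docker_run_command(
    "my-flask-app", detach=True, ports={5000: 5000}, name="flask-demo"
)
print(shlex.join(command))
# To actually start the container: subprocess.run(command, check=True)
```

Building the command as a list (rather than one long string) avoids shell-quoting problems when it is later passed to `subprocess.run`.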

## 27. What is Docker Hub, and what is its purpose?

Docker Hub:
- Public registry that allows developers to store, share, and distribute Docker images.
- It serves as a central repository for Docker images (like GitHub does for code).

Purpose of Docker Hub:
- To store Docker images and make them available to others.
- Discover and download images that others have created and shared.
- Collaborate with others by sharing images and collaborating on Dockerfiles.
- Automate the build and testing of images using Continuous Integration (CI) tools.
- Docker Hub also provides additional features for paid users, such as:
 - Private repositories
 - Role-based access control
 - Image vulnerability scanning

## 28. What is a Dockerfile, and what are its main components?

Dockerfile:
- A text file containing instructions and commands for building a Docker image.
- It defines the environment, packages, dependencies, configurations, files and commands required to run the application.


Main components:
- Base image:
 - This is the starting point for the image. The base image provides a set of pre-configured operating system files and packages that the new image can build on. 
 - E.g. Base image: FROM python:3.9-alpine

- Instructions:
 - These are the commands that tell Docker how to build the image. For example, an instruction might install a specific package or copy files from the local file system into the image
 - E.g. Instructions: RUN apk add --update git

- Arguments:
 - These are variables that can be passed to the Dockerfile at build time.
 - Arguments allow the Dockerfile to be customized for different environments or use cases.
 - E.g. Arguments: ARG version=1.0

- Environment variables:
 - These are variables that can be set within the image and used by the container at runtime.
 - E.g. Environment variables: ENV FLASK_APP=app.py

- Labels:
 - These are key-value pairs that provide metadata about the image, such as the version, author, and build date.
 - E.g. Labels: LABEL maintainer="John Smith"

- Working directory:
 - This is the directory inside the image in which subsequent instructions (such as RUN, CMD, and COPY) are executed. It can be set to any location within the image.
 - E.g. Working directory: WORKDIR /app
 
Using a Dockerfile provides a consistent and repeatable way to build Docker images, making it easier to manage and deploy containers. By defining the image creation process in a Dockerfile, developers can easily version control their image builds and ensure that the same image is built every time.

Example of a simple Dockerfile that creates a Docker image for a Python Flask web application:
- The Dockerfile sets the base image to Python 3.9
- Installs the dependencies from the requirements file
- Copies the application code to the working directory
- Sets the environment variables
- Exposes port 5000 for the application
- Runs the Flask web server

In [None]:
# Set the base image
FROM python:3.9

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Set the environment variables
ENV FLASK_APP=app.py

# Expose the application port
EXPOSE 5000

# Run the application
CMD ["flask", "run", "--host=0.0.0.0"]

## 29. How do you build a Docker image using a Dockerfile?

To build a Docker image using a Dockerfile:
- Create a Dockerfile:
 - Create a text file called "Dockerfile" in a directory where you want to build the image.
- Write the Dockerfile instructions:
 - In the Dockerfile, write the instructions for building the image.
 - These instructions typically start with a base image and include commands for installing packages, copying files, and configuring the environment.
- Build the Docker image:
 - Run the "docker build" command in the directory where the Dockerfile is located. This command will read the instructions in the Dockerfile and build a Docker image based on those instructions.

The example command below builds a Docker image with the tag "my-flask-app" using the current directory (".") as the build context, which is also where Docker looks for the Dockerfile by default. Once the Docker image is built, it can be run as a Docker container on any platform that supports Docker.

The "-t" flag is used to specify the name and tag for the image, and the "." indicates that the build context is the current directory.

In [None]:
# Basic syntax
docker build -t <image_name> .

# Example from above
docker build -t my-flask-app .

## 30. How can you share data between Docker containers?

There are several ways to share data between Docker containers:

- Docker volumes (recommended method):
 - A volume is a persistent data storage mechanism that can be shared between multiple containers.
 - Volumes can be created using the "docker volume create" command, and then mounted to one or more containers using the "-v" flag.
 - Data written to the volume from one container can be read from another container that shares the same volume.

- Docker networks:
 - Docker networks enable containers to communicate with each other, and data can be shared between containers through network connections.
 - Containers can be connected to the same network using the "docker network connect" command.

- Shared file systems:
 - If multiple containers are running on the same host, they can share data using a shared file system.
 - In this approach, files are stored on the host file system and mounted into the containers using the "-v" flag.

- Docker Compose:
 - Docker Compose is a tool for defining and running multi-container Docker applications.
 - With Docker Compose, you can define multiple services (containers) that interact with each other and share data through volumes or networks.

Mounting a volume to a Docker container allows you to persist data across container restarts and share data between containers. Docker supports two types of volumes: 
- Bind mounts
 - Bind mounts map a directory or file on the host system to a directory or file in the container. This method is useful when you need to access files on the host system or share data between the host and the container.
- Managed volumes
 - Managed volumes are created and managed by Docker, providing better isolation and improved performance compared to bind mounts. Docker automatically manages the storage and lifecycle of these volumes.

With both bind mounts and managed volumes, the storage is mounted at a chosen directory inside the container, and any changes made in that directory persist even after the container is stopped or removed.
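The shared-volume approach can be sketched as three CLI calls, assembled here in Python. This is only an illustration: the volume name `shared-data`, the mount path `/data`, and the images `my-writer` and `my-reader` are hypothetical. The commands are printed rather than run.

```python
import shlex

VOLUME = "shared-data"      # hypothetical volume name
MOUNT_PATH = "/data"        # hypothetical path inside both containers


def shared_volume_commands(volume, mount_path, images):
    """Return the CLI calls that mount one named volume into several containers."""
    commands = [["docker", "volume", "create", volume]]
    for image in images:
        commands.append(
            ["docker", "run", "-d", "-v", f"{volume}:{mount_path}", image]
        )
    return commands


commands = shared_volume_commands(VOLUME, MOUNT_PATH, ["my-writer", "my-reader"])
for cmd in commands:
    print(shlex.join(cmd))
```

For a bind mount, the volume name in the `-v` argument would be replaced by an absolute path on the host.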

![](https://techmormo.com/wp-content/uploads/2022/11/docker4-stateless-containers.png)

# Code Refactoring

## 31. How would you define Code Refactoring (CF)?

Definition:
- CF is the process of restructuring existing code to improve its internal structure, without changing its external behaviour


Goal:
- To make the code easier to read, understand, and maintain, as well as to improve its performance, scalability, and reliability

Steps:
- Transformation of 'working' code into deployed code
- Separating parts of code into multiple files
- Organizing repository to be deployment-ready

### a. How does CF influence the external behavior of the code?

Code refactoring should not change the external behavior of the code. The purpose of refactoring is to improve the internal structure and design of the code, without modifying its functionality or behavior.
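A small sketch of what "same external behavior" means in practice: the two functions below return identical results for the same input, but the second has the duplicated rule extracted into a helper. The discount rule and the function names are invented for illustration.

```python
def report_before(prices):
    # Original version: the discount rule is duplicated in two places.
    total = 0
    for p in prices:
        if p > 100:
            total += p * 0.9
        else:
            total += p
    largest = max(prices)
    if largest > 100:
        largest = largest * 0.9
    return round(total, 2), round(largest, 2)


def discounted(price):
    """Apply the 10% discount to prices above 100 (rule stated once)."""
    return price * 0.9 if price > 100 else price


def report_after(prices):
    # Refactored version: same inputs, same outputs, clearer structure.
    total = sum(discounted(p) for p in prices)
    largest = discounted(max(prices))
    return round(total, 2), round(largest, 2)


sample = [50, 120, 200]
# The check that makes this a refactoring rather than a behavior change.
assert report_before(sample) == report_after(sample)
print(report_after(sample))
```

In a real project this comparison would usually live in a unit test that is run before and after the refactoring.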

### b. What are the concrete goals one wants to achieve with CF?

Common goals of CF include:
- Improving code readability: By improving the structure and organization of the code, it becomes easier for developers to understand and work with it.
- Increasing code maintainability: CF can help simplify complex code, reduce dependencies, and eliminate redundant code, making it easier to maintain it.
- Enhancing code reusability: CF can help extract common functionality into reusable components, reducing the amount of code that needs to be written and improving overall code efficiency.
- Optimizing code performance: CF can help identify and eliminate bottlenecks and improve the overall performance.
- Reducing technical debt: By addressing technical debt, or the accumulation of bad code practices over time, CF can help improve the overall quality and sustainability of the codebase.
- Making the code more testable: CF can help reduce coupling and increase cohesion, making the code more modular and easier to test.

## 32. What do we call “code smell,” and what are the most common types?

- "Code smell" refers to potential design problems or poor programming practices in code.
- Code smells are not necessarily bugs or errors, but they may make the code difficult to maintain, modify, or extend over time.

Common types of code smell include:
- Long functions
- Duplicate code
- Dead code
- Data clumps
- Improper names

Other types include:
- Large class
- Feature envy
- Primitive obsession
- Shotgun surgery
- God object

### a. Can you describe any of the types, why does it appear and how to solve/improve it?

Common types:

- Long functions
 - Description: code blocks that are too long and complex, containing too many lines of code, making them harder to read and understand.
 - Why it appears: devs try to solve too many problems at once or fail to break down a complex problem into smaller, more manageable parts
 - How to solve/improve it: break down complex problems into smaller, more manageable parts, and write separate functions to solve each part

- Duplicate code
 - Description: code blocks that appear in multiple places in the codebase, leading to redundancy and increased risk of errors
 - Why it appears: devs copy/paste code instead of creating reusable functions or modules.
 - How to solve/improve it: create reusable functions or modules that can be called from multiple places in the codebase

- Dead code
 - Description: code blocks that are no longer being used in the codebase and serve no purpose.
 - Why it appears: devs forget to remove code that has been replaced or is no longer needed.
 - How to solve/improve it: regularly review the codebase and remove any code blocks that are no longer being used or serve no purpose

- Data clumps
 - Description: groups of data elements that appear together in multiple places throughout the codebase, leading to redundancy and increased complexity
 - Why it appears: devs fail to encapsulate related data elements into classes or data structures.
 - How to solve/improve it: group related data elements together into classes or data structures
 
- Improper names
 - Description: unclear, confusing, or inconsistent naming conventions in the codebase, leading to reduced readability and increased confusion
 - Why it appears: devs fail to establish clear and consistent naming conventions in the codebase.
 - How to solve/improve it: establish clear and consistent naming conventions in the codebase. Devs should use descriptive, meaningful, and easy-to-understand names and avoid names that are too short, too long, or inconsistent with the established conventions

Other types:
- Large class
 - Description: classes that contain too much functionality or data, leading to increased complexity and reduced maintainability
 - Why it appears: devs try to solve too many problems in a single class or fail to encapsulate related functionality into separate classes.
 - How to solve/improve it: break down complex functionality into smaller, more focused classes. Devs can use inheritance, composition, and other design patterns to create a modular and maintainable codebase.

- Feature envy
 - Description: a method that uses the data or functionality of another class more than that of its own class, leading to increased coupling and reduced flexibility
 - Why it appears: functionality ends up in the wrong class, away from the data it works on
 - How to solve/improve it: move the method (or the relevant part of it) to the class whose data it uses, and otherwise rely on established interfaces and APIs instead of reaching into the internals of other classes

- Primitive obsession
 - Description: overuse of primitive data types instead of more appropriate data structures or objects, leading to increased complexity and reduced maintainability
 - Why it appears: devs fail to create appropriate data structures or objects for their code, or when they try to optimize for performance at the expense of maintainability
 - How to solve/improve it: use appropriate data structures and objects that reflect the domain model of the code. Instead of passing raw primitive values around, devs should create custom objects and data structures where appropriate

- Shotgun surgery
 - Description: changes that require modifying multiple parts of the codebase, leading to increased complexity and reduced maintainability
 - Why it appears: devs fail to properly modularize their code or create proper dependencies between different parts of the codebase.
 - How to solve/improve it: modularize the code properly and keep related logic in one place. Devs can use design patterns such as dependency injection and inversion of control to reduce coupling between different parts of the codebase and make changes easier to manage

- God object
 - Description: classes that contain too much functionality and data, making them difficult to understand and maintain
 - Why it appears: devs fail to properly modularize their code or create appropriate abstractions for their functionality
 - How to solve/improve it: break down complex functionality into smaller, more focused classes. Devs can use design patterns such as composition and inheritance to create modular and maintainable code, and appropriate abstractions and interfaces to separate concerns and reduce complexity
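As an illustration of one of the smells above, the sketch below shows a data clump (street, city, and postcode always passed together) and one possible fix using a small dataclass. The names `Address`, `format_label_before`, and `format_label_after` are invented for this example.

```python
from dataclasses import dataclass


# Before: street, city and postcode travel together through every signature.
def format_label_before(name, street, city, postcode):
    return f"{name}\n{street}\n{postcode} {city}"


# After: the clump becomes one small class with its own behaviour.
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postcode: str

    def as_lines(self):
        return f"{self.street}\n{self.postcode} {self.city}"


def format_label_after(name, address):
    return f"{name}\n{address.as_lines()}"


home = Address("1 Example Street", "Aalborg", "9000")
# Same output as before, but the related fields now move as one unit.
assert format_label_before("Ada", home.street, home.city, home.postcode) == \
    format_label_after("Ada", home)
print(format_label_after("Ada", home))
```

The same pattern also addresses primitive obsession: three loose strings are replaced by a type that names the concept.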

## 33. How can we improve understanding in reusability of our code when publishing in a specific platform or in specific company?

The following practices can be applied:

- Standardization:
 - Developing a set of coding guidelines and standards can help ensure that the code is written in a consistent and reusable manner.

- Documentation:
 - Documenting the code, including comments, API documentation, and user guides, can help other developers understand how to use the code and how it fits into the overall architecture.

- Code reuse analysis:
 - Analyzing the code to identify common patterns or functions that can be reused, e.g.:
   - looking for duplicate code
   - creating libraries or modules that can be used across projects
   - developing code designed to be generic and reusable

- Education and training:
 - Help devs understand the importance of code reusability and best practices for achieving it. 
 - This can include training on coding standards, documentation practices, and code reuse analysis.

- Collaboration:
 - Encouraging collaboration among devs can help improve understanding of code reusability.
 - This can include pair programming, code reviews, and working together on shared projects.

# API

![picture](https://res.cloudinary.com/dyd911kmh/image/upload/v1664210695/A_simple_API_architecture_design_f98bfad9ce.png)

## 34. Can you define Application Programming Interface?

Definitions:
- It is a set of defined rules that enable different applications to communicate with each other.
- It acts as an intermediary layer that processes data transfers between systems.

Usability:
- APIs simplify software development and innovation by enabling applications to exchange data and functionality easily and securely.

## 35. What types of API do we know based on their availability?

- Public API
 - is open and available for use by any outside developer or business. These are also called open or external APIs
- Partner API
 - is only available to specifically selected and authorized outside developers or API consumers. It facilitates business-to-business activities
- Private API
 - is intended only for use within the enterprise to connect systems and data within the business
- Composite API
 - combines multiple data or service APIs into a single call. The underlying calls run as a sequence of tasks triggered by that one call, rather than each being requested separately by the client

### a. Can you describe any of the types, how they differentiate from others, and can you tell one case you would use it?

- Public API:
 - These APIs are available for public use and can be accessed by anyone.
 - They are typically offered by companies or organizations as a way to promote their services, products, or platforms.
 - For example the OpenWeatherMap API, which provides weather data for various locations around the world. It can be used by developers to integrate weather information into their applications.

- Partner API:
 - These APIs are offered to partners or third-party developers for specific purposes.
 - They are typically used to integrate with a specific service or platform.
 - For example, a social media platform might offer an API to allow developers to integrate their applications with the platform and access user data.

- Private API:
 - These APIs are used within an organization and are not accessible to the public.
 - They are typically used to facilitate communication between different systems or services within the organization.
 - For example, a company might use an internal API to allow different departments to share data and information.

- Composite APIs:
 - These APIs are created by combining multiple APIs into a single interface.
 - They are used to simplify complex processes and provide a unified interface to developers.
 - For example, a travel booking website might use a composite API to integrate with multiple airlines and hotels.


In summary:
- Public APIs are useful for promoting services or products
- Partner APIs are useful for integrating with specific services or platforms
- Private APIs are useful for facilitating communication within an organization
- Composite APIs are useful for simplifying complex processes
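The composite idea from the travel-booking example can be sketched in a few lines of Python. The two backend functions are stand-ins for real airline and hotel APIs, and all names and data are hypothetical.

```python
def search_flights(destination):
    # Stand-in for a call to an airline API (hypothetical data).
    return [{"flight": "XX123", "to": destination}]


def search_hotels(destination):
    # Stand-in for a call to a hotel API (hypothetical data).
    return [{"hotel": "Example Inn", "city": destination}]


def trip_options(destination):
    """Composite endpoint: one request fans out to several backend APIs."""
    return {
        "destination": destination,
        "flights": search_flights(destination),
        "hotels": search_hotels(destination),
    }


result = trip_options("Lisbon")
print(result)
```

The client makes one call and receives one combined response, instead of calling each backend API itself.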

## 36. What are the types based on their protocol?

- REST (Representational State Transfer)
 - is a web services API and crucial for modern web applications

- SOAP (Simple Object Access Protocol)
 - is a well-established protocol, but it comes with strict rules and rigid standards

- RPC (Remote Procedure Call protocol)
 - is the oldest and simplest type of API with a goal for the client to execute code on a server

- Event-driven or asynchronous APIs
 - transmit information in quasi-real-time. The advantage is that it allows the source to send a response only when the information is new or has changed, useful for stock exchanges

### a. Can you describe any of the types, how they differentiate from others, and can you tell one case you would use it?

Description and use case examples:

- REST (Representational State Transfer):
 - REST is an architectural style for building APIs that uses HTTP requests to perform CRUD (Create, Read, Update, Delete) operations on resources.
 - RESTful APIs use common HTTP verbs like GET, POST, PUT, and DELETE to manipulate data. They also typically use JSON (JavaScript Object Notation) as the data format.
 - Use case: REST APIs are widely used for building web applications and mobile apps.
   - For example, a social media app might use a REST API to fetch and display posts, comments, and likes.
   - For example, Twitter's REST API allows developers to retrieve tweets, user profiles, and other data related to the platform's users and content.

- SOAP (Simple Object Access Protocol):
 - SOAP is an XML-based messaging protocol for exchanging information between web services.
 - SOAP uses a set of well-defined rules for messaging and authentication, and typically uses HTTP or SMTP (Simple Mail Transfer Protocol) for transport.
 - Use case: SOAP APIs are often used in enterprise applications that require high security and reliability.
   - For example, a bank might use a SOAP API to exchange sensitive financial information between different systems.
   - For example, integrating two enterprise systems, such as an order management system and a shipping provider's system.

- RPC (Remote Procedure Call):
 - RPC is a protocol for building distributed applications in which a client calls a remote procedure (function) on a server and receives the result.
 - RPC can use different transport protocols such as HTTP, TCP, or UDP.
 - Use case: RPC APIs are often used for building microservices-based architectures where different services need to communicate with each other.
   - For example, an e-commerce website might use an RPC API to handle order processing and inventory management.
   - For example, controlling a remote device or system, such as a robotic arm or a network router.

- Event-driven or asynchronous APIs:
 - Event-driven APIs use a publish-subscribe model in which events (such as updates or changes) are sent to subscribers who have expressed interest in them.
 - Asynchronous APIs use non-blocking I/O to handle requests and responses, allowing multiple requests to be processed simultaneously.
 - Use case: Event-driven and asynchronous APIs are often used for building real-time applications like chat apps or online games.
   - For example, a chat app might use an event-driven API to send and receive messages in real-time.
   - For example, a real-time data streaming service, such as a stock market feed. The API would push updates to subscribed clients as new data becomes available, rather than requiring the clients to repeatedly query for updates.
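The publish-subscribe model behind event-driven APIs can be sketched with a tiny in-process event bus. This is an illustration only: real systems typically use a message broker or a protocol such as WebSockets, and the class name `EventBus` and the topic names are invented here.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish-subscribe hub (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Push the event to every subscriber of this topic.
        for callback in self._subscribers[topic]:
            callback(payload)


bus = EventBus()
received = []
bus.subscribe("price/ACME", received.append)
bus.subscribe("price/ACME", lambda tick: print("new tick:", tick))

bus.publish("price/ACME", {"price": 101.5})
bus.publish("price/OTHER", {"price": 7.0})   # no subscribers, nothing happens
```

Subscribers never poll for updates; they are called only when something is published on a topic they registered for.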

How they differentiate:
- Architecture:
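 - REST, SOAP, and RPC are based on the client-server model: the client sends a request and the server responds
 - Event-driven APIs are based on the publish-subscribe model, often with a message broker between producers and consumers.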

- Data format:
 - REST typically uses JSON or XML to format data
 - SOAP uses XML
 - RPC uses various data formats such as JSON and XML
 - Event-driven APIs can use any data format.

- Communication:
 - REST, SOAP, and RPC typically use synchronous request-response communication (the client waits for the server's reply)
 - Event-driven APIs use asynchronous communication.

- Transport protocol:
 - REST uses HTTP as the transport protocol; SOAP usually uses HTTP but can also use others such as SMTP
 - RPC can use any protocol
 - Event-driven APIs can use any protocol that supports asynchronous communication.

- Endpoint definition:
 - REST and SOAP define endpoints using URLs
 - RPC and event-driven APIs use function calls or event subscriptions.

- Caching:
 - REST has built-in support for caching
 - SOAP and RPC do not.

- Error handling:
 - REST uses HTTP status codes for error handling
 - SOAP reports errors as fault elements in the XML response, and RPC uses protocol-specific error codes

## 37. We use API options that can be passed with the endpoint to influence the response. How do we call them, and can you give a few examples?

- API options passed with the endpoint are called query parameters (the part of the URL after the "?" is the query string).
- To pass an option, append its name and value as name=value to the end of the URL after the "?" separator. Multiple options are separated with an ampersand "&", e.g. https://api.example.com/users?sort=name&page=2
- Query parameters can be attached to any HTTP request method, such as GET, POST, PUT, DELETE and PATCH, but are most common with GET.
- Data that does not fit in the URL is sent in the request body instead, e.g. from a Python script using the 'requests' library.

What are API options used for:

- Filtering: You can filter data by passing parameters that specify certain conditions that must be met.

- Sorting: You can sort data by passing parameters that specify which field to sort by and whether to sort in ascending or descending order.

- Pagination: If an API endpoint returns a large amount of data, you can use pagination parameters to control how much data is returned at a time.

- Authentication: You can pass authentication parameters to authenticate the API request.

- Language: Some APIs allow you to specify the language of the response by passing a language parameter.
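A short sketch of how such options end up in a URL, using only the Python standard library. The endpoint and the parameter names (`country`, `sort`, `order`, `page`, `per_page`, `lang`) are placeholders; every real API defines its own names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.example.com/users"   # placeholder endpoint

params = {
    "country": "DK",        # filtering
    "sort": "name",         # sorting
    "order": "asc",
    "page": 2,              # pagination
    "per_page": 50,
    "lang": "en",           # language
}

# urlencode joins the pairs with "&" and escapes special characters.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Parsing the URL back shows that each option survives the round trip.
parsed = parse_qs(urlsplit(url).query)
print(parsed["page"])
```

With the 'requests' library, the same dictionary can be passed as `params=` and the library builds the query string itself.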

### a. Can you describe any of the request types, and for what purpose do we use it?

- GET:
 - This request type is used to retrieve data from a server.
 - For example, if you want to retrieve information about a user from a database, you would use a GET request to retrieve that information.

- POST:
 - This request type is used to send data to a server to create or update a resource.
 - For example, if you want to create a new user in a database, you would use a POST request to send the user's information to the server.

- PUT:
 - This request type is used to update an existing resource on the server by replacing it with the version sent in the request.
 - For example, if you want to update a user's information in a database, you would use a PUT request to send the updated information to the server.

- DELETE:
 - This request type is used to delete a resource from the server.
 - For example, if you want to delete a user from a database, you would use a DELETE request to remove that user's information from the server.

- PATCH:
 - This request type is used to make a partial update to an existing resource on the server.
 - For example, if you want to update only a specific field of a user's information in a database, you would use a PATCH request to send only the updated field to the server.
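The difference between the request types can be sketched with an in-memory stand-in for a server. This is not a real web server: `handle` is a hypothetical function, the dictionary plays the role of the database, and the status codes follow common conventions rather than any specific API.

```python
import itertools

users = {}                      # in-memory stand-in for a database table
_ids = itertools.count(1)       # generates 1, 2, 3, ... for new users


def handle(method, user_id=None, body=None):
    """Return (status_code, payload) for a request on a user resource."""
    if method == "POST":                       # create a new resource
        user_id = next(_ids)
        users[user_id] = dict(body)
        return 201, {"id": user_id, **users[user_id]}
    if user_id not in users:
        return 404, {"error": "user not found"}
    if method == "GET":                        # read
        return 200, users[user_id]
    if method == "PUT":                        # replace the whole resource
        users[user_id] = dict(body)
        return 200, users[user_id]
    if method == "PATCH":                      # change only the given fields
        users[user_id].update(body)
        return 200, users[user_id]
    if method == "DELETE":                     # remove
        del users[user_id]
        return 204, None
    return 405, {"error": "method not allowed"}


status, created = handle("POST", body={"name": "John Doe", "email": "johndoe@example.com"})
handle("PATCH", created["id"], {"email": "john@example.com"})   # name is kept
handle("PUT", created["id"], {"name": "John D."})               # email is dropped
print(handle("GET", created["id"]))
```

Note how PATCH merges the new field into the stored record, while PUT replaces the record entirely, so fields missing from the PUT body are lost.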

Code example of POST option:

In [None]:
# Example of URL
POST https://api.example.com/users

In [None]:
# Example of request body
{
  "name": "John Doe",
  "email": "johndoe@example.com",
  "password": "secretpassword"
}

In [None]:
# Example of request in Python, i.e. how the request body is POSTed to the URL
import requests

# Same (placeholder) endpoint and request body as in the two cells above
url = 'https://api.example.com/users'

data = {
    'name': 'John Doe',
    'email': 'johndoe@example.com',
    'password': 'secretpassword'
}

# json= serializes the dict and sets the Content-Type: application/json header
response = requests.post(url, json=data, timeout=10)

print(response.status_code)  # e.g. 201 if the user was created
print(response.json())       # parsed JSON body of the response


## 38. Discuss pros/cons of API deployment on a virtual machine vs a managed serverless solution (e.g. GCP Cloud Run)

Virtual Machine (VM)

Pros:

- More control: With a VM, you have complete control over the software stack, including the operating system and any libraries or dependencies that are required for your API.
- Customizable infrastructure: You can customize the infrastructure to your exact needs, such as storage, networking, and security.
- Flexibility: A VM can be used to deploy a wide range of applications, including APIs, web applications, and databases.

Cons:

- Maintenance: Managing a VM requires ongoing maintenance, including security updates, software updates, and patches.
- Scalability: Scaling a VM can be complex, as it requires additional resources and configuration.
- Cost: VMs can be expensive to run, especially if you require a high level of resources or require custom infrastructure.

Managed Serverless Solution

Pros:

- Easy deployment: Managed serverless solutions make it easy to deploy and run your API, as they handle the underlying infrastructure for you.
- Automatic scaling: These solutions can automatically scale to handle increases in traffic, which can be beneficial for APIs that experience fluctuating traffic.
- Low maintenance: With a serverless solution, you don't need to worry about managing the underlying infrastructure, as this is taken care of for you.
- Cost-effective: You only pay for the resources you use, so serverless solutions can be cost-effective for APIs with low to moderate traffic.

Cons:

- Less control: With a serverless solution, you have less control over the underlying infrastructure, which can be a problem if you require specific configurations.
- Performance: Serverless solutions can be slower than running an API on a dedicated VM, because when no instance is running the platform needs time to start one before it can answer the request (a "cold start").
- Vendor lock-in: You may be locked into a specific vendor's platform if you choose to use a serverless solution, which can be problematic if you want to switch to a different provider.

In summary:

Virtual machine: Choose this if you require a high level of control and customization or have specific infrastructure requirements.

Managed serverless solution: Choose this if you want a cost-effective, easy-to-deploy solution with automatic scaling.
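
The cost trade-off can be made concrete with a rough calculation. The sketch below compares an always-on VM with pay-per-use billing. All prices and the request duration are hypothetical numbers chosen for illustration, not real provider prices, and the model ignores free tiers, minimum instances, and other billing details.

```python
HOURS_PER_MONTH = 24 * 30

# Hypothetical prices, for illustration only
VM_PRICE_PER_HOUR = 0.05               # billed for every hour, busy or idle
SERVERLESS_PRICE_PER_SECOND = 0.00003  # billed only while a request is handled
SECONDS_PER_REQUEST = 0.2


def vm_monthly_cost(price_per_hour: float) -> float:
    # The VM runs (and is billed) around the clock
    return price_per_hour * HOURS_PER_MONTH


def serverless_monthly_cost(requests_per_month: int,
                            seconds_per_request: float,
                            price_per_second: float) -> float:
    # Cost grows linearly with the number of requests
    return requests_per_month * seconds_per_request * price_per_second


vm_cost = vm_monthly_cost(VM_PRICE_PER_HOUR)
for n in (100_000, 1_000_000, 10_000_000):
    serverless_cost = serverless_monthly_cost(
        n, SECONDS_PER_REQUEST, SERVERLESS_PRICE_PER_SECOND
    )
    cheaper = "serverless" if serverless_cost < vm_cost else "VM"
    print(f"{n:>10} requests: VM {vm_cost:.2f}, serverless {serverless_cost:.2f} -> {cheaper}")
```

Under these assumed numbers the serverless option is cheaper at low traffic and the VM becomes cheaper once traffic is high enough, which matches the "low to moderate traffic" point above.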

## FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+.
- It is fast compared with many other Python web frameworks
- Supports asynchronous code (the `async` and `await` keywords), so it can handle multiple tasks concurrently. It doesn't need to wait until one call has been answered before it continues with a new request
- Short development time: a minimal working API takes about 7-8 lines of code
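
The idea behind `async` and `await` can be shown with plain `asyncio`, independent of FastAPI. In this minimal sketch, `handle_request` is a hypothetical stand-in for an endpoint that waits on something slow (e.g. a database), simulated here with `asyncio.sleep`. Two such calls wait at the same time, so the total time is close to the longest single wait rather than the sum.

```python
import asyncio
import time


async def handle_request(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a database or another API
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # Both "requests" wait at the same time instead of one after the other
    results = await asyncio.gather(
        handle_request("request 1", 0.2),
        handle_request("request 2", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # In a notebook, where an event loop is already running, use `await main()` instead
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```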


Uvicorn + FastAPI
- Uvicorn is a lightning-fast ASGI server that is used to serve FastAPI applications.
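- FastAPI is built on top of Starlette (an ASGI web framework) and Pydantic (data validation). Uvicorn is not part of that stack; it is the ASGI server that runs the resulting app.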

Transformation process

Starting point: A folder with working Python code for prediction with an already trained model saved in a sub-folder.

Steps:
- Install dependencies: fastapi, uvicorn, pydantic
- Create a new Python script, based on the prediction script, which initiates the FastAPI app
- Define the request parameters and their types
- Start the Uvicorn server, specifying the address and the FastAPI app
- Send a request and hope for response 200 :)