
# 1. What are the main types of databases and what are the main differences between them?

There are several types of databases, each with its own unique features and characteristics. Some of the main types of databases are:

- Relational databases: These are the most common type of database and store data in tables with rows and columns. They are based on the relational model and use SQL (Structured Query Language) to manipulate data. Relational databases are well suited for structured data and are widely used in business applications.

- NoSQL databases: These are databases that do not use the traditional table-based model of relational databases. Instead, they use models such as documents, key-value pairs, wide columns, or graphs. NoSQL databases are often used for large unstructured or semi-structured data sets and can handle varied data such as social media feeds, logs, and metadata for images and videos.

- Object-oriented databases: These databases store data in objects rather than in tables. They are used primarily in object-oriented programming and can store complex data structures such as arrays and linked lists.

- Graph databases: These databases, usually classed as a kind of NoSQL database, use graph structures (nodes and edges) to store and organize data. They are used to represent complex relationships between data entities and are often used in social networks, recommendation systems, and fraud detection.


The main differences between these types of databases are the data structures they use, the way they store and organize data, and the way they handle queries and data access. Relational databases are well suited for structured data with a fixed schema and support complex queries such as multi-table joins, while NoSQL databases are better suited for unstructured or rapidly changing data and are typically easier to scale horizontally across many servers. Object-oriented databases are designed for object-oriented programming languages, and graph databases are designed to handle complex relationships between data entities.

# 2. What is a primary key in a relational database?

In a relational database, a primary key is a unique identifier for each record (row) in a table. It is a column or combination of columns that uniquely identifies a record and ensures that each record in a table can be accessed and updated efficiently.

The primary key constraint enforces this uniqueness: no two records in a table may have the same value for the primary key column(s), and the primary key may not be NULL. Every record in the table therefore has exactly one unique, non-null primary key value.

The primary key can be a single column or a combination of columns in the table, and it is typically used as a reference in other tables (foreign keys) to establish relationships between tables.

For example, in a customer and orders table, the customer ID column could be the primary key in the customer table and a foreign key in the orders table, linking the two tables together.

Using primary keys in a database helps ensure data integrity, consistency, and efficient data retrieval.
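The uniqueness rule can be checked with a small experiment. This is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the table layout and the names `Ada` and `Grace` are illustrative assumptions, not part of the notes above.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Ada')")

# A second row with the same primary key value violates the constraint.
try:
    conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The rejected row was never stored, so the table still holds one record.
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_rejected, row_count)
```

The database itself rejects the duplicate, so the application does not have to check for it.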

# 3. What is a foreign key in a relational database?

In a relational database, a foreign key is a column or a combination of columns that references the primary key (or another unique key) of a table, usually a different one. It is used to establish a relationship between two tables by creating a link between them.

A foreign key column in one table is used to refer to the primary key column(s) in another table. This allows data from one table to be linked to data in another table, creating a relationship between the two tables.

For example, in a database for an e-commerce website, there may be a customer table and an orders table. The customer table might have a primary key column called "customer_id", and the orders table might have a foreign key column called "customer_id" that references the primary key column in the customer table. This allows the orders table to associate each order with a specific customer in the customer table.

Foreign keys ensure referential integrity in the database by preventing actions that would violate relationships between tables, such as deleting a record that is referenced by a foreign key in another table. They also allow for efficient retrieval of related data from multiple tables.
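Referential integrity can be seen in action with a short sketch. It uses Python's built-in `sqlite3` module; the schema is a simplified, assumed version of the customer and orders tables described above. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def violates_foreign_key(sql):
    """Return True if the statement is rejected with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An order for a customer that does not exist is rejected.
orphan_rejected = violates_foreign_key("INSERT INTO orders VALUES (101, 999)")
# Deleting a customer that an order still references is rejected too.
delete_rejected = violates_foreign_key("DELETE FROM customers WHERE customer_id = 1")
print(orphan_rejected, delete_rejected)
```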

# 4. Can you give me some examples of SQL syntax (statements) and their applications?

SELECT statement: This statement is used to retrieve data from one or more tables in a database. For example, the following would retrieve all the columns and rows from the "customers" table:

```sql
SELECT * FROM customers;
```

INSERT statement: This statement is used to insert new data into a table. For example, the following would insert a new record into the "customers" table with the values "John" for the first_name column, "Doe" for the last_name column, and "johndoe@example.com" for the email column:

```sql
INSERT INTO customers (first_name, last_name, email)
VALUES ('John', 'Doe', 'johndoe@example.com');
```

UPDATE statement: This statement is used to update existing data in a table. For example, the following would update the email address of the customer with a customer_id of 123 in the "customers" table:

```sql
UPDATE customers
SET email = 'newemail@example.com'
WHERE customer_id = 123;
```

DELETE statement: This statement is used to delete data from a table. For example, the following would delete the record for the customer with a customer_id of 123 from the "customers" table:

```sql
DELETE FROM customers
WHERE customer_id = 123;
```

CREATE TABLE statement: This statement is used to create a new table in a database. For example, the following would create a new table called "orders" with columns for order_id, customer_id, order_date, and total_amount:

```sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  total_amount DECIMAL(10, 2)
);
```
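To try the statements above end to end, they can be run from Python against an in-memory SQLite database. This is a rough sketch: the `customers` schema is an assumption (the notes never define it), and the `?` placeholders are `sqlite3`'s way of passing values separately from the SQL text, which avoids SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for the "customers" table used in the examples.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT
    )
    """
)

# INSERT with placeholders instead of values pasted into the SQL string.
conn.execute(
    "INSERT INTO customers (customer_id, first_name, last_name, email) "
    "VALUES (?, ?, ?, ?)",
    (123, "John", "Doe", "johndoe@example.com"),
)

# UPDATE, then SELECT to see the change.
conn.execute(
    "UPDATE customers SET email = ? WHERE customer_id = ?",
    ("newemail@example.com", 123),
)
after_update = conn.execute("SELECT first_name, email FROM customers").fetchall()

# DELETE, then SELECT again to confirm the row is gone.
conn.execute("DELETE FROM customers WHERE customer_id = ?", (123,))
after_delete = conn.execute("SELECT * FROM customers").fetchall()
print(after_update, after_delete)
```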

# 5. Give some examples of SQL applications.

SQL (Structured Query Language) is a declarative query language used to manage and manipulate relational databases. Here are some examples of applications of SQL:

- Database management: SQL is used to create and manage relational databases. SQL statements can be used to create tables, insert data, retrieve data, update data, and delete data from a database.
- Business intelligence: SQL is used to query databases to generate reports and analyze data. SQL statements can be used to filter data, group data, calculate averages, sums, and other metrics, and combine data from multiple tables.
- E-commerce websites: SQL is used to manage online transactions, such as tracking orders and customer information, and updating inventory levels. SQL statements can be used to update order status, retrieve customer information, and adjust inventory levels.
- Healthcare: SQL is used to manage patient information, including medical records, insurance information, and billing information. SQL statements can be used to retrieve patient data, update medical records, and generate billing statements.
- Social media: SQL is used to store and manage large amounts of user-generated data, such as posts, comments, and likes. SQL statements can be used to retrieve user data, update user profiles, and track engagement metrics.
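The business intelligence case (filtering, grouping, and combining tables) might look like the following sketch. It uses Python's `sqlite3` module; the tables, countries, and amounts are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        total_amount REAL
    );
    INSERT INTO customers VALUES (1, 'DK'), (2, 'DK'), (3, 'SE');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 40.0), (12, 2, 30.0), (13, 3, 50.0);
    """
)

# Join the two tables, then compute per-country metrics.
report = conn.execute(
    """
    SELECT c.country,
           COUNT(*)            AS n_orders,
           SUM(o.total_amount) AS revenue,
           AVG(o.total_amount) AS avg_order
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.country
    ORDER BY revenue DESC
    """
).fetchall()

for country, n_orders, revenue, avg_order in report:
    print(country, n_orders, revenue, avg_order)
```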

# 6. What are the different types of SQL and what are their key features and use cases?

Strictly speaking, SQL is one standardized language, and the different "types" are dialects and procedural extensions implemented by particular database systems. Here are some of the most common ones:

- Standard SQL: Standard SQL is the version of the language defined by the ANSI/ISO SQL standard. Most relational database management systems (RDBMS) support its core features, although no system implements the standard completely. Standard SQL is used for managing and querying relational databases, and it includes sublanguages for data definition (DDL), data manipulation (DML), and data control (DCL).
- Transact-SQL (T-SQL): T-SQL is a proprietary implementation of SQL used by Microsoft SQL Server. It includes extensions to standard SQL, such as additional built-in functions, error handling, and support for procedural programming constructs like IF-ELSE statements and loops. T-SQL is commonly used for building complex queries and stored procedures in SQL Server databases.
- PL/SQL: PL/SQL is a procedural extension to SQL used by Oracle databases. It includes support for loops, conditionals, and exception handling, and it is used for building complex database applications and stored procedures.
- MySQL: MySQL is a popular open-source relational database management system with its own SQL dialect. It supports features such as transactions, subqueries, and triggers, and it is commonly used for web development and data-driven applications.
- PostgreSQL: PostgreSQL is an open-source relational database management system with its own SQL dialect and a procedural language, PL/pgSQL. It supports features such as savepoints within transactions, user-defined functions, and full-text search, and it is commonly used for enterprise applications and data warehousing.
- NoSQL: NoSQL is not a type of SQL, but it is often contrasted with it. NoSQL databases use non-relational data models and their own query languages or APIs, and they are used for managing large volumes of unstructured data, such as social media posts, sensor data, and log files. Some popular NoSQL databases include MongoDB, Cassandra, and Couchbase.

# 7. What are the different types of NoSQL and what are their key features and use cases?

NoSQL (Not only SQL) databases are non-relational databases that do not use the traditional table-based relational database model. Instead, they use a variety of data models and structures that are designed to handle large volumes of unstructured and semi-structured data. Here are some of the most common types of NoSQL databases and their key features and use cases:

- Document databases: Document databases store data in flexible, semi-structured documents, such as JSON or BSON. These documents can contain nested values and arrays, making them highly flexible and scalable. Document databases are commonly used for content management systems, e-commerce platforms, and mobile applications. Examples of document databases include MongoDB, CouchDB, and RavenDB.
- Key-value stores: Key-value stores are simple databases that store data as key-value pairs, where the value can be any type of data, including binary data or complex objects. Key-value stores are highly scalable and can handle high volumes of read and write operations. They are commonly used for caching, session management, and real-time analytics. Examples of key-value stores include Redis, Riak, and Amazon DynamoDB.
- Column-family databases: Column-family (wide-column) databases store each row under a row key and group its columns into column families, so different rows can have different columns. This makes them highly scalable and efficient for handling large volumes of data. Column-family databases are commonly used for content management, time-series data, and analytics. Examples of column-family databases include Apache Cassandra, HBase, and Google Cloud Bigtable.
- Graph databases: Graph databases store data as nodes and edges, allowing for highly efficient traversal and querying of relationships between data. Graph databases are commonly used for social networking, recommendation engines, and fraud detection. Examples of graph databases include Neo4j, OrientDB, and Titan (since succeeded by JanusGraph).
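To make the four data models concrete, here is a rough sketch that represents the same hypothetical customer and order in each of them, using only plain Python containers. No real NoSQL system is involved, and the keys, labels, and the `neighbors` helper are illustrative assumptions.

```python
# Document model: one nested, self-contained record (as in JSON).
document = {
    "_id": "cust:1",
    "name": "Ada",
    "orders": [{"order_id": 100, "items": ["book", "pen"]}],
}

# Key-value model: opaque values that are looked up by key only.
key_value = {"cust:1:name": "Ada", "cust:1:last_order": "100"}

# Column-family model: row key -> column family -> columns.
column_family = {
    "cust:1": {
        "profile": {"name": "Ada"},
        "orders": {"100": "book,pen"},
    }
}

# Graph model: nodes plus labeled, directed edges.
nodes = {
    "cust:1": {"label": "Customer"},
    "order:100": {"label": "Order"},
    "item:book": {"label": "Item"},
}
edges = [
    ("cust:1", "PLACED", "order:100"),
    ("order:100", "CONTAINS", "item:book"),
]


def neighbors(node, relation):
    """Follow all outgoing edges of one relation type from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


# Two-hop traversal: which items did the customer buy?
items_bought = [
    item
    for order in neighbors("cust:1", "PLACED")
    for item in neighbors(order, "CONTAINS")
]
print(items_bought)
```

The graph version answers a relationship question by following edges, whereas the other three models are organized around looking up a record by its key.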

# 8. How can we use a database in an ML project, and can you give an example?

Databases can be used in machine learning projects to store and manage large amounts of data, which can then be used for model training, testing, and validation. Here is an example of how a database can be used in an ML project:

Let's say we want to build a machine learning model that can predict the likelihood of a customer making a purchase on an e-commerce website. To build this model, we will need to collect and store data about customer behavior, such as their browsing history, purchase history, demographics, and other relevant features.

We can use a database, such as MySQL or PostgreSQL, to store this data in a structured format. We can then use SQL queries to extract and transform the data into a format that can be used for training and testing our machine learning model.

For example, we can use SQL queries to:

- Join multiple tables to combine data from different sources
- Filter and select data based on specific criteria
- Aggregate data to calculate summary statistics, such as average purchase value or conversion rate
- Normalize and preprocess data to prepare it for machine learning algorithms, such as scaling or one-hot encoding categorical features

Once the data has been transformed and preprocessed, we can use it to train and test our machine learning model. We can then use the trained model to make predictions about new customers and their likelihood of making a purchase.

In summary, databases can be used in machine learning projects to store, manage, and preprocess large amounts of data, which can then be used for model training, testing, and validation.
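The extraction step could be sketched as follows, assuming a tiny made-up schema (`customers` and `sessions`) in an in-memory SQLite database. The SQL joins, aggregates, and encodes a categorical column, and Python then splits the result into a feature matrix `X` and labels `y` that any ML library could consume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE sessions (
        session_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        pages_viewed INTEGER,
        purchased INTEGER
    );
    INSERT INTO customers VALUES (1, 'new'), (2, 'returning');
    INSERT INTO sessions VALUES
        (1, 1, 2, 0), (2, 1, 4, 0), (3, 2, 10, 1), (4, 2, 6, 1);
    """
)

# One row per customer: an aggregated feature, an encoded category, and a label.
rows = conn.execute(
    """
    SELECT c.customer_id,
           AVG(s.pages_viewed) AS avg_pages,
           CASE WHEN c.segment = 'returning' THEN 1 ELSE 0 END AS is_returning,
           MAX(s.purchased) AS label
    FROM customers AS c
    JOIN sessions AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()

# Split into features and labels for model training.
X = [[avg_pages, is_returning] for _, avg_pages, is_returning, _ in rows]
y = [label for *_, label in rows]
print(X, y)
```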

# 9. What are the main advantages of SQL?

Here are some of the main advantages of SQL:

- Easy to learn and use: SQL is a simple and user-friendly language that is easy to learn, even for those with no programming experience. Its intuitive syntax and simple structure make it easy to write and understand SQL queries.
- Widely used: SQL is a standard language that is widely used in the industry and is supported by many relational database management systems (RDBMS). This means that SQL skills are in high demand, and there are many resources available for learning and using SQL.
- High performance: Relational database systems are designed to work efficiently with large volumes of data. Query optimizers and indexes help to keep query execution times low.
- Data integrity: SQL databases ensure data integrity through the use of constraints and rules that enforce data consistency and accuracy. This helps to prevent errors and inconsistencies in the data, ensuring its reliability and trustworthiness.
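- Scalability: SQL databases can handle large volumes of data and many concurrent users. They are most commonly scaled vertically (a larger server) or with read replicas; scaling horizontally across many servers is possible but usually takes more effort than with NoSQL systems.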
- Security: SQL databases have built-in security features that help to protect data from unauthorized access and ensure its confidentiality and integrity.

In summary, SQL is a widely used and powerful language that offers a range of benefits, including ease of use, high performance, data integrity, scalability, and security.
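The effect of indexing on query execution can be inspected directly. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python to compare the plan for the same query before and after creating an index; the table contents and the index name `idx_customers_email` are illustrative, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

query = "SELECT * FROM customers WHERE email = 'user42@example.com'"


def plan(sql):
    """Return the human-readable plan text (the last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Without an index the database has to scan the whole table.
plan_before = plan(query)

# With an index on email it can search the index instead.
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```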

# 10. What are the main advantages of NoSQL?

Here are some of the main advantages of NoSQL:

- Scalability: NoSQL databases are highly scalable and can handle large volumes of data and traffic. They are designed to be horizontally scalable, meaning that additional servers can be added to the cluster to handle increased load and traffic.
- Flexible data models: NoSQL databases do not use a fixed schema, which allows for more flexible data models that can be adapted to changing business needs. This means that data can be added or modified without the need for complex schema migrations.
- High performance: NoSQL databases are optimized for performance and can handle large volumes of read and write operations. They use techniques such as sharding, caching, and distributed processing to ensure fast query response times.
- Cost-effective: NoSQL databases can be more cost-effective than traditional relational databases for some large-scale applications, because they are designed to run on clusters of commodity hardware rather than on a single high-end server. This can help to reduce infrastructure costs.
- Availability: NoSQL databases are designed to be highly available, with built-in replication and failover mechanisms. This means that if one server fails, the system can switch to a replica with little or no interruption to service.
- Easy to use: NoSQL databases are often easier to use than traditional relational databases, with simpler query languages and more intuitive interfaces. This can make it easier for developers to work with the database and to build applications that use it.

In summary, NoSQL databases offer a range of benefits, including scalability, flexible data models, high performance, cost-effectiveness, availability, and ease of use. These advantages make NoSQL databases a good choice for many modern applications that require flexible and scalable data storage.
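Horizontal scaling by sharding, together with the flexible schema, can be illustrated with a toy in-memory store. This is only a sketch of the idea: the helpers `shard_for`, `put`, and `get` are hypothetical, the "shards" are plain dictionaries standing in for servers, and real systems often use consistent hashing so that adding a shard moves only a small share of the keys.

```python
import hashlib


def shard_for(key, n_shards):
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


# Three dictionaries stand in for three servers.
shards = [dict() for _ in range(3)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


# Records do not need to share a fixed schema.
put("user:1", {"name": "Ada"})
put("user:2", {"name": "Grace", "tags": ["admin"]})

print(get("user:2"))
```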

# 11. What is Apache Spark used for?

Apache Spark is a popular open-source distributed computing system used for large-scale data processing and analytics. It is designed to handle batch processing, streaming data processing, machine learning, graph processing, and other data processing workloads. Here are some common use cases for Apache Spark:

- Data processing: Apache Spark can be used to process large volumes of data quickly and efficiently. It supports a wide range of data sources and data formats, including structured, semi-structured, and unstructured data.
- Machine learning: Apache Spark provides a built-in machine learning library (MLlib), including classification, regression, clustering, and collaborative filtering. It can be used to build and train machine learning models on large datasets.
- Real-time analytics: Apache Spark's Structured Streaming API enables near-real-time data processing and analytics, by default in small micro-batches. It can be used to process and analyze data as it is generated, making it a useful tool for real-time applications.
- Graph processing: Apache Spark's graph processing API (GraphX) enables the processing and analysis of large-scale graph data. This makes it useful for applications such as social network analysis and recommendation engines.
- Data integration: Apache Spark can be used to integrate data from multiple sources and transform it into a common format for analysis. This makes it useful for data warehousing and ETL (extract, transform, load) processes.

Overall, Apache Spark is a powerful tool for large-scale data processing and analytics that can be used for a wide range of applications, including data processing, machine learning, real-time analytics, graph processing, and data integration.
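The processing model behind this can be sketched without Spark itself. The following pure-Python example is not Spark code and does not use its API; it only imitates the idea of splitting data into partitions, processing each partition independently (the part Spark would distribute across a cluster), and merging the partial results. The helper names and the sample lines are made up.

```python
from collections import Counter
from functools import reduce

lines = ["spark makes big data simple", "big data needs big clusters"]


def partition(data, n):
    """Split the data into n roughly equal partitions."""
    return [data[i::n] for i in range(n)]


def count_words(part):
    """Count words within a single partition."""
    return Counter(word for line in part for word in line.split())


partitions = partition(lines, 2)

# "Map" side: each partition is processed independently.
partial_counts = [count_words(p) for p in partitions]

# "Reduce" side: the partial results are merged into one answer.
word_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(word_counts.most_common(2))
```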

# 12. Can you provide an example of a use case where Apache Spark might be preferred?

Here is an example of a use case where Apache Spark might be preferred:

Let's say you are working for a company that runs a large e-commerce website. The website generates a massive amount of data, including customer interactions, clickstream data, and purchase history. You need to analyze this data to gain insights into customer behavior, optimize marketing campaigns, and improve the customer experience.

To process this data efficiently, you might consider using Apache Spark. With Spark, you can process and analyze large volumes of data quickly and efficiently, making it easier to gain insights and take action in real-time.

Here are some specific use cases for Apache Spark in this scenario:

- Real-time analytics: With Spark's streaming API, you can analyze customer interactions and clickstream data in real-time, allowing you to optimize the customer experience and improve conversion rates.
- Machine learning: Apache Spark provides built-in libraries for machine learning, allowing you to build and train models on large datasets. You can use these models to personalize the customer experience, recommend products, and optimize marketing campaigns.
- Data integration: Apache Spark can be used to integrate data from multiple sources, including social media, email marketing, and customer service, into a single platform for analysis. This makes it easier to gain a comprehensive view of customer behavior and preferences.
- Data processing: Apache Spark can be used to process and clean up large volumes of data quickly and efficiently, making it easier to prepare data for analysis.

Overall, Apache Spark can be a powerful tool for e-commerce companies looking to analyze large volumes of customer data and gain insights to improve the customer experience and optimize marketing campaigns. Its scalability, real-time processing capabilities, and machine learning libraries make it an ideal choice for this type of use case.

# 13. What are the key differences between Apache Spark, Polars, and Pandas, and their use cases?

Apache Spark, Polars, and Pandas are all popular data processing and analytics tools, but they have some key differences in terms of their design, features, and use cases. Here's a comparison of the three:

- Apache Spark: Apache Spark is a distributed computing system that is designed to process large datasets quickly and efficiently. It includes a range of libraries for batch processing, streaming data processing, machine learning, and graph processing. Spark's key features include scalability, fault tolerance, and real-time data processing. It is best suited for processing and analyzing very large datasets, where distributed processing is necessary for performance.
- Polars: Polars is a DataFrame library written in Rust, with bindings for Python, that is designed to provide fast, memory-efficient data processing on a single machine. It includes a range of data manipulation and aggregation functions, as well as multi-threaded execution and lazy query optimization. Polars is best suited for working with large, structured datasets that require complex data transformations and filtering but do not need a cluster.
- Pandas: Pandas is a popular data manipulation library for Python that is designed for working with smaller datasets. It includes a range of functions for data cleaning, transformation, and aggregation, as well as support for data visualization. Pandas is best suited for working with structured data that can fit into memory on a single machine.

Overall, the key differences between these tools come down to their scalability and performance capabilities. Apache Spark is designed for processing and analyzing very large datasets that require distributed processing, while Polars is designed for fast, memory-efficient processing of large structured datasets. Pandas is designed for working with smaller datasets that can fit into memory on a single machine. The choice of tool depends on the size and complexity of the dataset, the performance requirements, and the specific use case.

# 14. What is MLOps, and why is it important in the context of machine learning projects?

MLOps, or Machine Learning Operations, is a set of practices and tools that aims to streamline the development, deployment, and maintenance of machine learning models. It combines principles and techniques from software engineering, DevOps, and data science to create a consistent and efficient process for managing machine learning projects.

MLOps is important in the context of machine learning projects for several reasons:

- Collaboration: Machine learning projects typically involve multiple teams and stakeholders, including data scientists, software engineers, and business analysts. MLOps provides a common framework for collaboration and communication across these teams, ensuring that everyone is working towards the same goals.
- Efficiency: MLOps helps to automate many of the repetitive tasks involved in machine learning projects, such as data cleaning, feature engineering, model training, and deployment. This can help to reduce the time and effort required to develop and deploy models, allowing teams to focus on more creative and innovative tasks.
- Reproducibility: MLOps provides a systematic approach to managing machine learning projects, including version control, testing, and documentation. This helps to ensure that models are reproducible, and that results can be verified and validated across different environments.
- Scalability: As machine learning projects grow in size and complexity, it becomes increasingly important to have a scalable and reliable infrastructure for managing models. MLOps provides tools and techniques for managing data, models, and infrastructure at scale, allowing teams to deploy and manage models across multiple environments.

Overall, MLOps is important in the context of machine learning projects because it helps to streamline the development, deployment, and maintenance of models, making it easier to collaborate across teams, improve efficiency, ensure reproducibility, and scale up projects as needed.

# 15. How does MLOps help in streamlining the machine learning lifecycle from development to deployment? Answer based on an example.

MLOps helps to streamline the machine learning lifecycle from development to deployment by providing a systematic approach to managing machine learning projects. Here's an example of how MLOps can help to streamline the lifecycle:

- Data preparation: MLOps provides tools and techniques for managing data, including data cleaning, transformation, and feature engineering. For example, data pipelines can be developed and managed using tools such as Apache Airflow or Kubeflow, allowing data to be processed and transformed automatically.
- Model development: MLOps provides tools and techniques for managing the model development process, including version control, testing, and documentation. For example, models can be developed and tested using tools such as Jupyter notebooks or PyCharm, and code can be managed using version control systems such as Git.
- Model training: MLOps provides tools and techniques for managing the model training process, including scalability and reproducibility. For example, models can be trained using distributed computing frameworks such as Apache Spark or TensorFlow, and training can be managed using tools such as Kubeflow or MLflow.
- Model deployment: MLOps provides tools and techniques for managing the model deployment process, including automation and scalability. For example, models can be packaged as Docker containers and run on an orchestrator such as Kubernetes, and deployment can be managed with a GitOps tool such as Argo CD or an automation server such as Jenkins.
- Model monitoring: MLOps provides tools and techniques for managing the model monitoring process, including tracking model performance and detecting anomalies. For example, models can be monitored using tools such as Prometheus or Grafana, and alerts can be triggered based on specific criteria.

By using MLOps, the entire machine learning lifecycle can be managed in a systematic and consistent way, allowing teams to collaborate effectively and improve efficiency. For example, data scientists can work on developing and testing models, while software engineers can focus on deploying and managing models in production. This helps to reduce the time and effort required to develop and deploy models, and improves the overall quality and reliability of the models.
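The stages above can be compressed into a toy pipeline to show how they chain together. This is a deliberately simplified sketch: the "model" is a single threshold, the data is made up, evaluation reuses the training rows to keep the example short, and the `fingerprint` helper is a hypothetical stand-in for proper data versioning.

```python
import hashlib
import json


def prepare(raw):
    """Data preparation: drop rows with missing values."""
    return [r for r in raw if r["pages"] is not None]


def train(rows):
    """Model training: a threshold midway between the two class means."""
    pos = [r["pages"] for r in rows if r["bought"]]
    neg = [r["pages"] for r in rows if not r["bought"]]
    return {"threshold": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2}


def evaluate(model, rows):
    """Model evaluation: share of rows the threshold classifies correctly."""
    hits = sum((r["pages"] > model["threshold"]) == bool(r["bought"]) for r in rows)
    return hits / len(rows)


def fingerprint(obj):
    """Short, stable hash so a run can be tied to the exact data it used."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]


raw = [
    {"pages": 2, "bought": 0},
    {"pages": 3, "bought": 0},
    {"pages": 9, "bought": 1},
    {"pages": 7, "bought": 1},
    {"pages": None, "bought": 0},
]

data = prepare(raw)
model = train(data)
accuracy = evaluate(model, data)  # evaluated on the training rows for brevity

# What a tracking or monitoring tool would record for this run.
run_record = {"data_version": fingerprint(data), "model": model, "accuracy": accuracy}
print(run_record)
```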

# 16. How do Continuous Integration (CI) and Continuous Deployment (CD) principles apply to MLOps, and can you give a use case?

Continuous Integration (CI) and Continuous Deployment (CD) principles apply to MLOps in much the same way as they apply to software development. CI is the practice of regularly integrating code changes into a shared repository, and automatically building and testing the code to identify and fix any issues. CD is the practice of automatically deploying code changes to production once they have been tested and verified.

In the context of MLOps, CI/CD can help to streamline the machine learning development process, by automating many of the tasks involved in building, testing, and deploying models. Here's an example of how CI/CD principles can be applied to MLOps:

- Continuous Integration: Data scientists can use CI tools such as GitHub Actions, CircleCI, or Jenkins to automatically build and test their models whenever changes are made to the code or data. For example, when a new feature is added to the model or when new data becomes available, the CI system can automatically rebuild and test the model to ensure that it is still working as expected.
- Continuous Deployment: Once a model has been developed and tested, CD tools such as Argo CD or Jenkins can be used to automatically deploy the model to production, often onto a Kubernetes cluster. For example, when a new version of the model is ready, the CD system can automatically deploy it to a testing environment, where it can be further evaluated and tested. Once the model has passed all tests, it can be automatically deployed to production.

By using CI/CD principles in MLOps, the machine learning development process can be streamlined, with changes to the model and data being automatically tested and deployed. This helps to reduce the time and effort required to deploy new models and updates, and improves the overall quality and reliability of the models in production.
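One common pattern is an automated quality gate between the CI and CD steps. The function below is a hypothetical sketch of such a gate, not part of any CI tool: the name `should_deploy` and the thresholds `min_accuracy` and `min_gain` are assumptions chosen for illustration.

```python
def should_deploy(candidate_accuracy, production_accuracy,
                  min_accuracy=0.80, min_gain=0.01):
    """Allow deployment only if the candidate clears an absolute bar
    and improves on the model currently in production."""
    if candidate_accuracy < min_accuracy:
        return False
    return candidate_accuracy - production_accuracy >= min_gain


# Three candidate models evaluated by a CI job.
decisions = {
    "better_model": should_deploy(0.90, 0.85),    # clears the bar and improves
    "weak_model": should_deploy(0.75, 0.70),      # improves, but below the bar
    "no_improvement": should_deploy(0.85, 0.85),  # clears the bar, no gain
}
print(decisions)
```

A CI job could run this check after training and fail the build when it returns `False`, so that only models passing the gate reach the deployment step.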

# 17. What are some popular tools and platforms used in MLOps for managing and deploying machine learning models?

There are several popular tools and platforms used in MLOps for managing and deploying machine learning models. Here are some examples:

- Kubeflow: An open-source machine learning toolkit for Kubernetes, which enables data scientists to build, train, and deploy machine learning models at scale.
- TensorFlow Extended (TFX): An end-to-end machine learning platform for building production ML pipelines, which includes components for data validation, preprocessing, training, and deployment.
- MLflow: An open-source platform for managing the ML lifecycle, which includes tools for tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
- AWS SageMaker: A cloud-based platform for building, training, and deploying machine learning models, which provides tools for data preparation, model training, and deployment.
- Microsoft Azure Machine Learning: A cloud-based platform for building, training, and deploying machine learning models, which provides tools for data preparation, model training, and deployment.
- Google Cloud AI Platform (since succeeded by Vertex AI): A cloud-based platform for building, training, and deploying machine learning models, which provides tools for data preparation, model training, and deployment.
- Databricks: A cloud-based platform for building, deploying, and managing machine learning models at scale, which includes tools for data preparation, model training, and deployment.

These tools and platforms help data scientists and machine learning engineers to streamline the MLOps process, by providing tools for managing data, building models, and deploying them into production. They also provide capabilities for monitoring and optimizing models in production, which is critical for ensuring their ongoing performance and reliability.

# 18. Can you give a best practice for implementing MLOps in an organization?

One best practice for implementing MLOps in an organization is to follow a systematic and iterative approach that involves the following steps:

- Define your ML goals and use cases: Start by identifying the business problems that you want to solve using machine learning, and define the specific ML use cases and goals that you want to achieve.
- Assemble a team: Build a team with the necessary skills and expertise in data science, machine learning, software engineering, and DevOps to work on the project.
- Define data pipelines: Create data pipelines to collect, preprocess, and store data for machine learning. Use version control systems to manage code and data.
- Develop and train models: Develop models using best practices in data science and machine learning. Use scalable architectures to build models that can handle large datasets.
- Test and evaluate models: Test models using validation and testing techniques to ensure that they are accurate and reliable. Establish a process to monitor models in production and to retrain models when necessary.
- Deploy models: Package models as containers with a tool like Docker and run them on an orchestrator like Kubernetes. Establish a process for continuous integration and continuous deployment (CI/CD) to streamline the deployment process.
- Monitor and optimize: Establish a process to monitor the performance of models in production and to optimize them when necessary. This includes collecting data on model performance and using this data to improve the accuracy and reliability of the models over time.

By following this systematic and iterative approach, you can effectively implement MLOps in your organization and achieve the desired results from your machine learning projects. This approach helps to ensure that all aspects of the machine learning lifecycle are properly managed and that the models are accurate, reliable, and scalable.

# 19. What is MLflow, and how can it help in terms of MLOps?

MLflow is an open-source platform that provides tools for managing the end-to-end machine learning lifecycle. It is designed to help data scientists and machine learning engineers streamline the process of developing, testing, deploying, and managing machine learning models.

MLflow consists of several components, including:

- Tracking: This component allows users to track experiments and model runs, including metrics, parameters, and artifacts, in a centralized repository.
- Projects: This component provides a standardized format for packaging and deploying machine learning code, making it easier to reproduce and share models across teams and environments.
- Models: This component provides a standard format for packaging machine learning models so they can be deployed with different tools. It supports models from various libraries (called "flavors"), including scikit-learn, TensorFlow, and PyTorch.
- Model Registry: This component provides a central repository for managing models, including versioning, access control, and collaboration features.

MLflow can help in terms of MLOps by providing a centralized platform for managing the end-to-end machine learning lifecycle, from data preparation and model development to deployment and monitoring. By using MLflow, teams can work collaboratively on machine learning projects, easily reproduce experiments, and deploy models in a standardized and scalable way.

MLflow also supports continuous integration and continuous deployment (CI/CD) workflows, making it easier to integrate machine learning models into production environments. For example, MLflow can be integrated with tools like Jenkins or GitLab CI/CD pipelines to automate the process of building, testing, and deploying machine learning models.

Overall, MLflow can help organizations to achieve better results from their machine learning projects by providing a standardized and streamlined approach to managing the entire machine learning lifecycle.

# 20. What types of information can be stored in MLflow?

MLflow can store various types of information related to machine learning experiments, including:

- Metrics: Numeric values that track the performance of a machine learning model during training or testing, such as accuracy, precision, recall, and F1-score.
- Parameters: Settings and configurations used during model training, such as learning rate, batch size, and number of epochs.
- Artifacts: Any output generated during model training or evaluation, such as trained models, visualizations, or data preprocessing scripts.
- Source code: The code used to train and evaluate the model, along with any dependencies and environment settings.
- Experiment metadata: Information about the experiment, such as the name, date, user, and any tags or annotations.

All of this information is stored in a centralized repository, which can be accessed and managed through the MLflow UI or API. By storing all of this information in a single location, MLflow provides a centralized platform for managing and tracking machine learning experiments, making it easier to reproduce and share models across teams and environments.
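
To make these categories concrete, here is a minimal sketch of what a single tracked run might hold. It is a toy stand-in written with the standard library, not MLflow's API: the class `RunRecord`, its methods, and the experiment name are hypothetical. In MLflow itself, the corresponding calls are `mlflow.log_param`, `mlflow.log_metric`, `mlflow.set_tag`, and `mlflow.log_artifact`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Toy stand-in for one tracked run (hypothetical, not MLflow's API)."""

    experiment: str
    params: dict = field(default_factory=dict)     # settings used for training
    metrics: dict = field(default_factory=dict)    # name -> list of (step, value)
    artifacts: list = field(default_factory=list)  # paths to output files
    tags: dict = field(default_factory=dict)       # free-form experiment metadata
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step=0):
        # Keep the full history so a training curve can be plotted later.
        self.metrics.setdefault(key, []).append((step, value))

    def log_artifact(self, path):
        self.artifacts.append(path)

    def latest(self, key):
        """Return the most recently logged value of a metric."""
        return self.metrics[key][-1][1]


run = RunRecord(experiment="demo-experiment", tags={"stage": "exploration"})
run.log_param("learning_rate", 0.01)
run.log_param("epochs", 2)
run.log_metric("accuracy", 0.81, step=1)
run.log_metric("accuracy", 0.86, step=2)
run.log_artifact("model.pkl")
print(run.params, run.latest("accuracy"), run.artifacts)
```

The point of the sketch is the shape of the data: parameters are single values, metrics are histories, and artifacts are references to files rather than the files themselves.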

# 21. What is the purpose of SSH?

SSH (Secure Shell) is a cryptographic network protocol used to securely connect and communicate between two networked devices. The purpose of SSH is to provide a secure and encrypted channel over an unsecured network, such as the internet, to enable secure remote access to a server or network device.

In practice, SSH lets a user securely log in to and remotely control a server or network device. It provides encryption and authentication mechanisms to protect against eavesdropping, data tampering, and unauthorized access to the remote device.

SSH is commonly used by system administrators and developers to remotely access servers and devices for tasks such as software installation, system configuration, and debugging. It is also used for secure file transfer between devices using Secure Copy (SCP) or the SSH File Transfer Protocol (SFTP), both of which run over an SSH connection.

Overall, SSH provides a secure and reliable mechanism for remote access and communication between networked devices, helping to protect the confidentiality and integrity of data sent over the network.

# 22. What is Docker, and why is it useful?

Docker is an open-source containerization platform that allows software developers and system administrators to package, distribute, and run applications in isolated and portable containers. Containers are lightweight, standalone, and portable units that contain all the necessary dependencies and configuration needed to run an application.

Docker is useful for several reasons:

- Portability: Docker allows developers to package an application with all its dependencies and configurations into a container, making it easily portable and deployable across different environments, such as local machines, servers, and cloud platforms.
- Consistency: By using Docker, developers can ensure that the application runs consistently across different environments, eliminating the risk of issues arising due to differences in the underlying infrastructure or dependencies.
- Scalability: Docker enables developers to quickly and easily spin up multiple instances of an application in a matter of seconds, making it easier to scale applications to meet changing demand.
- Isolation: Docker containers provide a secure and isolated runtime environment, which helps prevent conflicts between different applications and dependencies.
- Efficiency: Docker containers are lightweight and have minimal overhead, making them faster and more efficient than traditional virtual machines.

Overall, Docker is a powerful tool that simplifies the process of building, distributing, and running applications, enabling developers to focus on their core tasks rather than the underlying infrastructure.

# 23. What are containers, and how do they relate to Docker?

Containers are a technology that allows software developers and system administrators to package and run applications with all their dependencies and configurations in an isolated environment. Containers are lightweight, portable, and provide a consistent runtime environment for applications, regardless of the underlying infrastructure.

Docker is an open-source containerization platform that uses containers to package and distribute applications. Docker provides a set of tools and APIs for building, shipping, and running containers, making it easier for developers to build, deploy, and manage applications in a containerized environment.

Docker uses a layered file system and container images to package and distribute applications. A container image is a lightweight, standalone, and portable package that contains all the necessary dependencies and configuration needed to run an application. Docker images can be shared and reused across different environments, making it easier to deploy applications consistently across different platforms.

When a Docker image is run, it creates a container that runs in isolation from the host operating system, providing a secure and consistent runtime environment for the application. Multiple containers can be run on a single host, each with its own set of resources and isolated from other containers running on the same host.

Overall, containers and Docker provide a powerful toolset for building, deploying, and managing applications in a consistent and efficient manner, enabling developers to focus on their core tasks rather than the underlying infrastructure.

# 24. What is the difference between a virtual machine and a Docker container?

The main difference between a virtual machine (VM) and a Docker container is the way they use and interact with the host operating system.

A virtual machine is an emulation of a complete computer system, including its own operating system, hardware, and resources, running on top of a hypervisor or virtual machine manager. Each virtual machine requires its own set of resources, including CPU, memory, and storage, and has its own isolated operating system environment. This makes virtual machines heavier and slower to start and run than Docker containers.

In contrast, a Docker container is a lightweight and portable package that includes only the application and its dependencies, sharing the underlying host operating system with other containers. Containers use a layered file system and container images to package and distribute applications, making them faster and more efficient to start and run than virtual machines.

Another difference between VMs and Docker containers is the level of isolation they provide. Virtual machines offer complete isolation from the host operating system, while Docker containers share the same kernel as the host but are still isolated from each other. This makes Docker containers more lightweight and efficient than virtual machines, as they use fewer resources and start faster.

Overall, Docker containers provide a more efficient and lightweight alternative to virtual machines, allowing developers to package and distribute applications quickly and easily across different environments while leveraging the underlying host operating system.

# 25. What is a Docker image, and how is it created?

A Docker image is a lightweight, standalone, and executable package that contains everything needed to run a piece of software, including the code, libraries, dependencies, and configuration files. Docker images are used to create Docker containers, which can be run on any platform that supports Docker.

Docker images are created using a Dockerfile, which is a text file that contains instructions on how to build the image. The Dockerfile specifies the base image, sets environment variables, copies files, installs dependencies, and runs commands to configure the software. The docker build command then turns the Dockerfile into an image, which is stored locally and can be pushed to a Docker registry to share it.

Here's an example of a simple Dockerfile that creates a Docker image for a Python Flask web application:

In [None]:
# Set the base image
FROM python:3.9

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Set the environment variables
ENV FLASK_APP=app.py

# Expose the application port
EXPOSE 5000

# Run the application
CMD ["flask", "run", "--host=0.0.0.0"]

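This Dockerfile sets the base image to the official Python 3.9 image, sets /app as the working directory, installs the dependencies from the requirements file, copies the application code into the working directory, sets the FLASK_APP environment variable, documents port 5000 as the application port, and runs the Flask web server. Note that EXPOSE only documents the port; it is published to the host with the -p option of docker run.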

To build the Docker image from the Dockerfile, we can run the following command in the same directory as the Dockerfile:

In [None]:
docker build -t my-flask-app .

This command builds a Docker image with the tag "my-flask-app", using the current directory (".") as the build context. By default, Docker looks for a file named Dockerfile in that directory. Once the Docker image is built, it can be run as a Docker container on any platform that supports Docker.

# 26. How do you run a Docker container from an image?

To run a Docker container from an image, you need to use the docker run command followed by the name of the Docker image you want to run. Here's the basic syntax:

In [None]:
docker run <image_name>

For example, if you have a Docker image named "my-flask-app" that you want to run as a container, you can run the following command:

In [None]:
docker run my-flask-app

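By default, Docker creates a new container from the image and starts it in the foreground, with the container's output attached to your terminal. To type commands into the container (for example, into a shell), add the -i and -t flags, usually written together as -it.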

However, you may want to customize the behavior of the container by passing additional arguments to the docker run command. Here are some common options:

- -d: Runs the container in the background (detached mode).
- -p: Maps a port on the host machine to a port on the container.
- --name: Assigns a name to the container.
- -v: Mounts a volume or a host directory into the container.
- -e: Sets environment variables for the container.

For example, to run the "my-flask-app" container in detached mode and map port 5000 on the host machine to port 5000 on the container, you can run the following command:

In [None]:
docker run -d -p 5000:5000 my-flask-app

This command runs the container in the background and maps port 5000 on the host machine to port 5000 on the container. Because no --name is given, Docker assigns the container a random name. You can use the docker ps command to see a list of running containers and their details.
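
When containers are started from a script, the options above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `docker_run_args`; it only builds the argument list and does not call Docker. The container name and the environment variable in the usage lines are illustrative.

```python
import shlex


def docker_run_args(image, detach=False, ports=None, name=None,
                    volumes=None, env=None):
    """Build the argument list for `docker run` (hypothetical helper)."""
    args = ["docker", "run"]
    if detach:
        args.append("-d")                                # run in the background
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]  # host:container
    if name:
        args += ["--name", name]
    for source, target in (volumes or {}).items():
        args += ["-v", f"{source}:{target}"]             # volume or host path
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    args.append(image)                                   # image name goes last
    return args


command = docker_run_args(
    "my-flask-app",
    detach=True,
    ports={5000: 5000},
    name="web",
    env={"FLASK_DEBUG": "1"},
)
print(shlex.join(command))
# docker run -d -p 5000:5000 --name web -e FLASK_DEBUG=1 my-flask-app
```

Keeping the command as a list rather than one string means it could be passed to `subprocess.run(command)` without any shell quoting issues.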

# 27. What is Docker Hub, and what is its purpose?

Docker Hub is a public registry that allows developers to store, share, and distribute Docker images. It serves as a central repository for Docker images, much like GitHub is for code. Docker Hub provides both free and paid services to developers and organizations.

Developers can use Docker Hub to:

- Store their own Docker images and make them available to others
- Discover and download images that others have created and shared
- Collaborate with others by sharing images and working together on Dockerfiles
- Automate the build and testing of images using Continuous Integration (CI) tools

Docker Hub also provides additional features for paid users, such as private repositories, role-based access control, and image vulnerability scanning.

Using Docker Hub can save developers time and effort by allowing them to easily access and share Docker images without having to build them from scratch every time. It also enables collaboration and sharing of best practices within the Docker community.

# 28. What is a Dockerfile, and what are its main components?

A Dockerfile is a text file that contains instructions for building a Docker image. These instructions tell Docker how to build the image by specifying what packages and dependencies to install, what commands to run, and what files to include.

The main components of a Dockerfile are:

- Base image (FROM): The starting point for the image. It provides a set of pre-configured operating system files and packages that the new image builds on.
- Instructions: The commands that tell Docker how to build the image, for example RUN to install a package or COPY to copy files from the local file system into the image.
- Arguments (ARG): Variables that can be passed to the Dockerfile at build time, allowing it to be customized for different environments or use cases.
- Environment variables (ENV): Variables that are set within the image and are available to the container at runtime.
- Labels (LABEL): Key-value pairs that provide metadata about the image, such as the version, author, and build date.
- Working directory (WORKDIR): The directory inside the image in which the instructions that follow it are executed. It can be set to any location within the image.

Using a Dockerfile provides a consistent and repeatable way to build Docker images, making it easier to manage and deploy containers. By defining the image creation process in a Dockerfile, developers can version control their image builds and ensure that the image is built the same way every time.

# 29. How do you build a Docker image using a Dockerfile?

To build a Docker image using a Dockerfile, follow these steps:

- Create a Dockerfile: Create a text file called "Dockerfile" in a directory where you want to build the image.
- Write the Dockerfile instructions: In the Dockerfile, write the instructions for building the image. These instructions typically start with a base image and include commands for installing packages, copying files, and configuring the environment.
- Build the Docker image: Run the "docker build" command in the directory where the Dockerfile is located. This command will read the instructions in the Dockerfile and build a Docker image based on those instructions. For example:

In [None]:
docker build -t my-image-name .

The "-t" flag is used to specify the name and tag for the image, and the "." indicates that the build context is the current directory.

- Check the Docker image: After the build is complete, run the "docker images" command to see a list of all the Docker images on your system. Your newly created image should be listed in the output.

You can now use the Docker image to create and run Docker containers, or push the image to a container registry like Docker Hub for others to use.

# 30. How can you share data between Docker containers?

There are several ways to share data between Docker containers:

- Using Docker volumes: A volume is a persistent data storage mechanism that can be shared between multiple containers. Volumes can be created using the "docker volume create" command, and then mounted to one or more containers using the "-v" flag. Data written to the volume from one container can be read from another container that shares the same volume.
- Using Docker networks: Docker networks enable containers to communicate with each other, and data can be shared between containers through network connections. Containers can be connected to the same network using the "docker network connect" command.
- Using shared file systems: If multiple containers are running on the same host, they can share data using a shared file system. In this approach, files are stored on the host file system and mounted into the containers using the "-v" flag.
- Using Docker Compose: Docker Compose is a tool for defining and running multi-container Docker applications. With Docker Compose, you can define multiple services (containers) that interact with each other and share data through volumes or networks.

Overall, the approach for sharing data between Docker containers depends on the specific use case and requirements of the application.
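
As an illustration of the volume approach, the sketch below lists the commands that would create one named volume and mount it into two containers. The volume name, mount path, container names, and image names are all hypothetical, and the script only prints the commands rather than running Docker.

```python
import shlex


def shared_volume_commands(volume, mount_point, writer_image, reader_image):
    """Return the docker commands for two containers sharing one volume."""
    mount = f"{volume}:{mount_point}"  # same volume, same path in both containers
    return [
        ["docker", "volume", "create", volume],
        ["docker", "run", "-d", "--name", "writer", "-v", mount, writer_image],
        ["docker", "run", "-d", "--name", "reader", "-v", mount, reader_image],
    ]


commands = shared_volume_commands(
    volume="shared-data",             # hypothetical volume name
    mount_point="/data",              # hypothetical path inside the containers
    writer_image="my-writer-image",   # hypothetical image names
    reader_image="my-reader-image",
)
for command in commands:
    print(shlex.join(command))
```

Because both containers mount the same volume at /data, a file written there by the first container is visible to the second.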

# 31. How would you define Code Refactoring (CF)?

Code refactoring is the process of restructuring and improving existing source code without changing its external behavior. The main purpose of code refactoring is to improve the code quality, maintainability, and extensibility, while reducing technical debt and increasing efficiency.

Code refactoring can involve various tasks such as:

- Simplifying code by removing unnecessary complexity or duplication
- Modifying the code structure to make it more modular and easier to maintain
- Improving code readability and naming conventions
- Optimizing code performance
- Removing code smells or anti-patterns

Code refactoring is an essential part of the software development process and is often done as a proactive measure to prevent future issues or as a reactive measure to resolve existing problems. It can be carried out manually or with the help of automated tools.

## 1. How does CF influence the external behavior of the code?

Code refactoring should not change the external behavior of the code. The purpose of refactoring is to improve the internal structure and design of the code, without modifying its functionality or behavior.

Code refactoring involves making changes to the code that are safe and do not introduce new bugs or errors. The process should be carried out in a way that ensures that the code still behaves exactly the same way before and after refactoring.

While the refactoring process does not affect the external behavior of the code, it can have a positive impact on the overall quality and maintainability of the code. By improving the code structure, readability, and maintainability, it becomes easier for developers to work with the code, make changes, and add new features.
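
A small sketch of what "same external behavior" means in practice: the function below is refactored by extracting a helper and naming the magic numbers, and an assertion checks that both versions return the same result. The invoice example and all names in it are invented for illustration.

```python
def invoice_total_before(items):
    """Original version: everything happens inside one loop."""
    total = 0
    for item in items:
        price = item["price"] * item["quantity"]
        if item["quantity"] >= 10:
            price = price * 0.9
        total = total + price
    return round(total, 2)


# Refactored version: magic numbers are named and the per-line logic is
# extracted into its own function. Inputs and outputs are unchanged.
BULK_THRESHOLD = 10
BULK_DISCOUNT = 0.9


def line_total(item):
    subtotal = item["price"] * item["quantity"]
    if item["quantity"] >= BULK_THRESHOLD:
        subtotal *= BULK_DISCOUNT
    return subtotal


def invoice_total_after(items):
    return round(sum(line_total(item) for item in items), 2)


sample = [
    {"price": 2.5, "quantity": 4},
    {"price": 1.0, "quantity": 10},
]
# The external behavior must be identical before and after refactoring.
assert invoice_total_before(sample) == invoice_total_after(sample)
print(invoice_total_after(sample))
```

In a real project the assertion would be an automated test suite that is run before and after each refactoring step.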

## 2. What are the concrete goals one wants to achieve with CF?

The concrete goals of code refactoring can vary depending on the specific situation and the code being refactored. However, some common goals that are typically aimed for include:

- Improving code readability: By improving the structure and organization of the code, it becomes easier for developers to understand and work with the code.
- Increasing code maintainability: Refactoring can help simplify complex code, reduce dependencies, and eliminate redundant code, making it easier to maintain the code in the long run.
- Enhancing code reusability: Refactoring can help extract common functionality into reusable components, reducing the amount of code that needs to be written and improving overall code efficiency.
- Optimizing code performance: Refactoring can help identify and eliminate bottlenecks and improve the overall performance of the code.
- Reducing technical debt: By addressing technical debt, or the accumulation of bad code practices over time, refactoring can help improve the overall quality and sustainability of the codebase.
- Making the code more testable: Refactoring can help reduce coupling and increase cohesion, making the code more modular and easier to test.

# 32. What do we call “code smell,” and what are the most common types?

"Code smell" is a term used to describe symptoms in code that indicate potential design problems or poor programming practices. Code smells are not necessarily bugs or errors, but they can indicate that the code may be difficult to maintain, modify, or extend over time. Some common types of code smells include:

- Duplicated code: Repeated code that performs the same function in different parts of the codebase, which can make the code harder to maintain.
- Long method: A method that is too long and complex, which can make the code harder to understand and modify.
- Large class: A class that has too many responsibilities or methods, which can make the code harder to understand and maintain.
- Feature envy: A code smell where a method or class is more interested in the data or behavior of another class, rather than its own data or behavior, which can lead to increased coupling and reduced cohesion.
- Primitive obsession: A code smell where simple data types, such as integers or strings, are used excessively instead of creating custom classes, which can make the code harder to maintain and understand.
- Shotgun surgery: A code smell where a single change requires modifications to many different parts of the code, which can make the codebase more fragile and difficult to modify.
- God object: A class that does too much or has too much knowledge of other classes, which can lead to increased coupling and reduced cohesion.

Identifying and addressing code smells is an important part of code refactoring and can help improve the overall quality and maintainability of the codebase.

## 1. Can you describe any of the types, why does it appear and how to solve/improve it?

One common type of code smell is a long method or function, which occurs when a function contains too many lines of code or performs too many tasks. This can make the code difficult to read, test, and maintain. To solve this code smell, the method can be broken down into smaller, more focused methods, each responsible for a specific task. This can make the code more modular, easier to test, and more reusable.

Another type of code smell is duplicated code, which occurs when the same or similar code is repeated in multiple places throughout the codebase. This can lead to inconsistencies, errors, and difficulties in maintenance. To solve this code smell, the duplicated code can be extracted into a separate function or class and reused where needed. This can make the code more modular, reduce redundancy, and make it easier to maintain.

A third type of code smell is a complex conditional or switch statement, which occurs when the logic of a function relies heavily on multiple nested if-else or switch-case statements. This can make the code difficult to read and understand, and can lead to errors and difficulties in maintenance. To solve this code smell, the logic can be refactored using polymorphism or the strategy pattern, which can simplify the logic and make it more modular and maintainable.

Overall, code smells are indicators of potential problems in code, and by identifying and addressing them, developers can improve the quality, maintainability, and readability of their code.
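
The third smell can be sketched as follows. A chain of if/elif branches is replaced by a lookup table of small functions, which is a lightweight form of the strategy pattern in Python. The shipping methods and prices are invented for illustration.

```python
def shipping_cost_before(method, weight_kg):
    """Smelly version: every new method means another branch here."""
    if method == "standard":
        return 4.0 + 0.5 * weight_kg
    elif method == "express":
        return 8.0 + 1.0 * weight_kg
    elif method == "pickup":
        return 0.0
    else:
        raise ValueError(f"unknown shipping method: {method}")


# Refactored version: one small function (strategy) per shipping method.
def standard(weight_kg):
    return 4.0 + 0.5 * weight_kg


def express(weight_kg):
    return 8.0 + 1.0 * weight_kg


def pickup(weight_kg):
    return 0.0


SHIPPING_STRATEGIES = {
    "standard": standard,
    "express": express,
    "pickup": pickup,
}


def shipping_cost_after(method, weight_kg):
    try:
        strategy = SHIPPING_STRATEGIES[method]
    except KeyError:
        raise ValueError(f"unknown shipping method: {method}") from None
    return strategy(weight_kg)


for method in SHIPPING_STRATEGIES:
    # Both versions must agree, since refactoring keeps behavior unchanged.
    assert shipping_cost_before(method, 2) == shipping_cost_after(method, 2)
print(shipping_cost_after("express", 2))  # 10.0
```

Adding a new shipping method now means adding one function and one dictionary entry, without touching the existing logic.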

# 33. How can we improve the understanding and reusability of our code when publishing on a specific platform or in a specific company?

The understanding and reusability of code published on a specific platform or within a specific company can be improved through the following practices:

- Standardization: Developing a set of coding guidelines and standards can help ensure that the code is written in a consistent and reusable manner. Standardization can also facilitate code reviews and make it easier for developers to collaborate on projects.
- Documentation: Documenting the code, including comments, API documentation, and user guides, can help other developers understand how to use the code and how it fits into the overall architecture. This documentation should be accessible to all developers who will be using the code.
- Code reuse analysis: Analyzing the code to identify common patterns or functions that can be reused can help improve the code's reusability. This can include looking for duplicate code, creating libraries or modules that can be used across projects, and developing code that is designed to be generic and reusable.
- Education and training: Providing education and training to developers can help them understand the importance of code reusability and best practices for achieving it. This can include training on coding standards, documentation practices, and code reuse analysis.

- Collaboration: Encouraging collaboration among developers can help improve understanding of code reusability. This can include pair programming, code reviews, and working together on shared projects.

# 34. Can you define Application Programming Interface?

An Application Programming Interface (API) is a set of protocols, routines, and tools for building software applications. It specifies how software components should interact and exchange data with each other. APIs are commonly used for building web applications, allowing different software components to communicate and share data with each other over the internet. APIs can be built for internal use within an organization or external use by third-party developers. They typically expose a set of endpoints or functions that can be called by other software components to perform specific tasks or retrieve specific information.

# 35. What types of API do we know based on their availability?

Based on their availability, APIs fall into two main types (partner and composite APIs, described in the next subsection, are often listed alongside them):

- Public APIs: These are APIs that are made available to third-party developers and the public. Public APIs are typically used to provide access to a service or data from a company or organization. For example, Twitter's API allows developers to access Twitter's data and build applications on top of it.
- Private APIs: These are APIs that are used within a company or organization, and are not made available to third-party developers or the public. Private APIs are typically used to enable communication between different internal systems or applications. For example, a company might use a private API to allow their customer service application to access data from their order processing system.

## 1. Can you describe any of the types, how they differentiate from others, and can you tell one case you would use it?

Sure, here are some common types of APIs based on their availability and usage:

- Open/Public APIs: These APIs are available for public use and can be accessed by anyone. They are typically offered by companies or organizations as a way to promote their services, products, or platforms. One example is the OpenWeatherMap API, which provides weather data for various locations around the world. It can be used by developers to integrate weather information into their applications.
- Internal/Private APIs: These APIs are used within an organization and are not accessible to the public. They are typically used to facilitate communication between different systems or services within the organization. For example, a company might use an internal API to allow different departments to share data and information.
- Partner APIs: These APIs are not open to everyone but are shared with selected business partners or approved third-party developers, usually under an agreement and with access keys. They are typically used to integrate with a specific service or platform. For example, a social media platform might offer an API that lets approved partner developers integrate their applications with the platform and access user data.
- Composite APIs: These APIs are created by combining multiple APIs into a single interface. They are used to simplify complex processes and provide a unified interface to developers. For example, a travel booking website might use a composite API to integrate with multiple airlines and hotels.

Each type of API has its own advantages and use cases. Open/Public APIs are useful for promoting services or products, while Internal/Private APIs are useful for facilitating communication within an organization. Partner APIs are useful for integrating with specific services or platforms, while Composite APIs are useful for simplifying complex processes.

Choosing the right type of API for a specific use case depends on the specific needs of the project or application.
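
To illustrate the composite type with the travel booking case, here is a minimal sketch in which one function combines two underlying services. The functions `search_flights` and `search_hotels` are hypothetical stand-ins that return fixed data; in a real system they would call separate APIs.

```python
def search_flights(destination):
    """Hypothetical flight API, stubbed with fixed data."""
    return [
        {"airline": "ExampleAir", "to": destination, "price": 120},
        {"airline": "SampleJet", "to": destination, "price": 150},
    ]


def search_hotels(destination):
    """Hypothetical hotel API, stubbed with fixed data."""
    return [
        {"hotel": "Example Inn", "city": destination, "price_per_night": 80},
        {"hotel": "Sample Suites", "city": destination, "price_per_night": 110},
    ]


def search_trip(destination, nights):
    """Composite API: one call fans out to both services and merges the results."""
    flight = min(search_flights(destination), key=lambda f: f["price"])
    hotel = min(search_hotels(destination), key=lambda h: h["price_per_night"])
    return {
        "destination": destination,
        "flight": flight,
        "hotel": hotel,
        "estimated_total": flight["price"] + nights * hotel["price_per_night"],
    }


trip = search_trip("Copenhagen", nights=3)
print(trip["estimated_total"])  # 120 + 3 * 80 = 360
```

The caller makes a single request and never needs to know that two separate services were involved.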

# 36. What are the types based on their protocol?

Based on their protocol or architectural style, four common types of APIs are:

- RESTful API: REST (Representational State Transfer) is a software architectural style for building distributed systems. A RESTful API uses HTTP requests to GET, POST, PUT, and DELETE data. It is stateless, meaning that the server does not store any session data, making it easier to scale. A RESTful API is widely used in web-based applications and mobile applications.
- SOAP API: SOAP (Simple Object Access Protocol) is a messaging protocol for exchanging structured data between applications. SOAP APIs use XML to encode the data, and the messages are transported over HTTP or SMTP. SOAP has built-in error handling (SOAP faults) and standardized security extensions such as WS-Security, whereas RESTful APIs typically rely on HTTPS and token-based authentication. However, SOAP is also more complex, and its XML payloads are usually larger than those of RESTful APIs.
- GraphQL API: GraphQL is a query language for APIs developed by Facebook. It allows the client to specify the data that it needs, and the server will return only the requested data. It typically uses a single endpoint for all requests, and the response is in JSON format. It is popular in modern web applications where clients need flexible queries and want to avoid fetching more data than they use.
- WebSocket API: WebSocket is a protocol for real-time communication between the client and the server. Unlike HTTP, which is request/response-based, WebSocket provides a bidirectional communication channel, enabling the server to push data to the client at any time. It is commonly used in applications that require real-time updates, such as chat applications, multiplayer games, and financial trading platforms.

Each protocol has its advantages and disadvantages, and the choice of API protocol depends on the specific use case and requirements of the application.

## 1. Can you describe any of the types, how they differentiate from others, and can you tell one case you would use it?

Sure! Here are some examples of API types based on their protocol:

- REST (Representational State Transfer) API: REST is a widely used protocol for building APIs, and it relies on HTTP to transfer data. REST APIs are stateless and allow clients to access and manipulate web resources using a set of predefined operations, such as GET, POST, PUT, and DELETE. They are popular for building web applications and mobile apps that need to interact with web servers.
- SOAP (Simple Object Access Protocol) API: SOAP is a protocol for exchanging structured information between applications. SOAP APIs are based on XML and use a set of well-defined rules for defining messages and operations. They require a more complex setup than REST APIs but offer features like encryption, authentication, and reliability. They are often used for building enterprise-level applications that require secure and reliable communication.
- GraphQL API: GraphQL is a query language for APIs that was developed by Facebook. It allows clients to request exactly the data they need and nothing more, which can help reduce overfetching and underfetching of data. GraphQL APIs have a strongly typed schema and allow clients to specify the shape and structure of the response they want. They are popular for building complex client-side applications that need to fetch data from multiple sources.

For example, if you were building a mobile app that needs to display a list of products, a REST API could be a good choice. You could define a set of endpoints for retrieving, creating, updating, and deleting products, and the mobile app could make requests to these endpoints using HTTP.

On the other hand, if you were building a financial application that needs to securely transfer data between different systems, a SOAP API could be a better choice. It would allow you to define a set of operations for transferring data, and you could use encryption and authentication to ensure that the data is transmitted securely.

Finally, if you were building a complex web application that needs to fetch data from multiple sources, a GraphQL API could be a good choice. It would allow you to define a schema for your data and let clients query for exactly the data they need, reducing network traffic and improving performance.
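As a rough illustration of how a GraphQL client asks for exactly the fields it needs, the sketch below builds (but does not send) the HTTP request for the product list example. The endpoint URL, the `products` field, and its `first` argument are hypothetical; the only convention relied on is that GraphQL requests over HTTP are usually sent as a JSON body with `query` and `variables` keys.

```python
import json

# Hypothetical single GraphQL endpoint; every query goes to the same URL.
GRAPHQL_URL = "https://api.example.com/graphql"

# The client names exactly the fields it wants (the schema is assumed).
PRODUCT_QUERY = """
query Products($first: Int!) {
  products(first: $first) {
    name
    price
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> dict:
    """Return the parts of an HTTP POST that carries a GraphQL query."""
    return {
        "method": "POST",
        "url": GRAPHQL_URL,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"query": query, "variables": variables}),
    }


request = build_graphql_request(PRODUCT_QUERY, {"first": 10})
print(request["method"], request["url"])
print(request["body"])
```

With a REST API, the same screen would usually call an endpoint such as `/products` and receive every field the server chooses to include. Here the client asks only for `name` and `price`.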

# 37. We use API options that can be passed with the endpoint to influence the response. What do we call them, and can you give a few examples?

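The options that can be passed along with the endpoint to influence the response are called "query parameters". They are appended to the URL after a `?` as `key=value` pairs separated by `&`, and together they form the "query string". Here are a few examples: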

- https://api.example.com/books?limit=10&offset=20: This endpoint retrieves a list of 10 books starting from the 21st book (offset=20).
- https://api.example.com/search?q=python&lang=en: This endpoint searches for books with the keyword "python" and filters the results to only include books in English.
- https://api.example.com/books?sort=title: This endpoint retrieves a list of books sorted by title in ascending order.
- https://api.example.com/books?id=1234: This endpoint retrieves the details of a book with the ID "1234".

Query parameters allow clients to specify additional information about the request, such as filtering, sorting, pagination, and search criteria.
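The following sketch shows one way to build and read such URLs with Python's standard library instead of concatenating strings by hand. The `build_url` helper is a name introduced here for illustration, and the base URL is the same placeholder used in the examples above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.example.com/books"  # placeholder endpoint


def build_url(base_url: str, params: dict) -> str:
    """Append query parameters to an endpoint, percent-encoding the values."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


# Pagination and sorting combined in one request URL.
url = build_url(BASE_URL, {"limit": 10, "offset": 20, "sort": "title"})
print(url)

# Special characters such as spaces are encoded automatically.
print(build_url("https://api.example.com/search", {"q": "data science"}))

# On the receiving side, the query string can be parsed back into a dict.
# Each value is a list because a key may appear more than once.
params = parse_qs(urlsplit(url).query)
print(params)
```

Letting `urlencode` handle the encoding avoids broken URLs when a value contains spaces, `&`, or other reserved characters.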

## 1. Can you describe any of the request types, and for what purpose do we use it?

Yes, there are several types of requests that can be made through an API, including:

- GET: This request type is used to retrieve data from a server. For example, if you want to retrieve information about a user from a database, you would use a GET request to retrieve that information.
- POST: This request type is used to send data to a server to create or update a resource. For example, if you want to create a new user in a database, you would use a POST request to send the user's information to the server.
- PUT: This request type is used to update an existing resource on the server by replacing it with the representation sent in the request. For example, if you want to update a user's information in a database, you would use a PUT request to send the complete, updated user record to the server.
- DELETE: This request type is used to delete a resource from the server. For example, if you want to delete a user from a database, you would use a DELETE request to remove that user's information from the server.
- PATCH: This request type is used to make a partial update to an existing resource on the server. For example, if you want to update only a specific field of a user's information in a database, you would use a PATCH request to send only the updated field to the server.

Each request type serves a specific purpose in communicating with an API and can be used to interact with the server in different ways.
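The differences between the request types are easiest to see side by side. The sketch below is a toy, in-memory stand-in for a server (no real HTTP is involved), and the `UserStore` class and its `handle` method are hypothetical names. PUT is modelled as a full replacement of the resource, which is how HTTP defines it, while PATCH changes only the supplied fields.

```python
class UserStore:
    """In-memory stand-in for a server-side collection of users."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def handle(self, method, user_id=None, data=None):
        if method == "POST":    # create a new resource; the server picks the ID
            user_id = self.next_id
            self.next_id += 1
            self.users[user_id] = dict(data)
            return user_id
        if method == "GET":     # read without modifying anything
            return self.users[user_id]
        if method == "PUT":     # replace the whole resource
            self.users[user_id] = dict(data)
            return self.users[user_id]
        if method == "PATCH":   # change only the supplied fields
            self.users[user_id].update(data)
            return self.users[user_id]
        if method == "DELETE":  # remove the resource
            return self.users.pop(user_id)
        raise ValueError(f"Unsupported method: {method}")


store = UserStore()
uid = store.handle("POST", data={"name": "Ada", "email": "ada@example.com"})
print(store.handle("PATCH", uid, {"email": "ada@new.example.com"}))  # name is kept
print(store.handle("PUT", uid, {"name": "Ada Lovelace"}))            # email is dropped
store.handle("DELETE", uid)
print(uid in store.users)
```

A real API would also return status codes and handle missing IDs. Those details are left out to keep the focus on what each method does to the stored data.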

# 38. Discuss pros/cons of API deployment on a virtual machine vs a managed serverless solution (e.g. GCP Cloud Run).

API deployment on a virtual machine and a managed serverless solution each have their own advantages and disadvantages, which should be considered when deciding which approach to use.

Virtual Machine (VM)

Pros:

- More control: With a VM, you have complete control over the software stack, including the operating system and any libraries or dependencies that are required for your API. This can be important if you have specific requirements or need to make changes to the underlying system.
- Customizable infrastructure: You can customize the infrastructure to your exact needs, such as storage, networking, and security.
- Flexibility: A VM can be used to deploy a wide range of applications, including APIs, web applications, and databases.

Cons:

- Maintenance: Managing a VM requires ongoing maintenance, including security updates, software updates, and patches.
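- Scalability: Scaling a VM can be complex, as you either have to resize the machine or add more machines behind a load balancer, and both require additional resources and configuration.
- Cost: VMs can be expensive to run, because you pay for the provisioned capacity even when the API is idle, and more so if you need a high level of resources or custom infrastructure.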

Managed Serverless Solution

Pros:

- Easy deployment: Managed serverless solutions make it easy to deploy and run your API, as they handle the underlying infrastructure for you.
- Automatic scaling: These solutions can automatically scale to handle increases in traffic, which can be beneficial for APIs that experience fluctuating traffic.
- Low maintenance: With a serverless solution, you don't need to worry about managing the underlying infrastructure, as this is taken care of for you.
- Cost-effective: You only pay for the resources you use, so serverless solutions can be cost-effective for APIs with low to moderate traffic.

Cons:

- Less control: With a serverless solution, you have less control over the underlying infrastructure, which can be a problem if you require specific configurations.
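- Performance: Serverless solutions can be slower than running an API on a dedicated VM, because a new instance needs time to spin up when a request arrives and no instance is currently running (a "cold start"). Requests served by an already running instance do not pay this penalty.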
- Vendor lock-in: You may be locked into a specific vendor's platform if you choose to use a serverless solution, which can be problematic if you want to switch to a different provider.

Overall, the decision of whether to use a virtual machine or a managed serverless solution for API deployment will depend on your specific needs and resources. If you require a high level of control and customization or have specific requirements, then a VM may be the better choice. On the other hand, if you want a cost-effective, easy-to-deploy solution with automatic scaling, then a managed serverless solution may be more appropriate.
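The cost trade-off can be sketched with some simple arithmetic. All prices below are made-up round numbers chosen for illustration, not actual GCP rates, and the model is deliberately simplified: it ignores free tiers, memory pricing, and the fact that one serverless instance can serve several requests concurrently.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# Hypothetical prices, for illustration only.
VM_PRICE_PER_HOUR = 0.05
SERVERLESS_PRICE_PER_SECOND = 0.00003
SERVERLESS_PRICE_PER_MILLION_REQUESTS = 0.40


def vm_monthly_cost(price_per_hour: float = VM_PRICE_PER_HOUR) -> float:
    """A VM is billed for every hour it runs, regardless of traffic."""
    return price_per_hour * HOURS_PER_MONTH


def serverless_monthly_cost(requests: int, seconds_per_request: float = 0.2) -> float:
    """Serverless is billed per request and per second of compute actually used."""
    compute = requests * seconds_per_request * SERVERLESS_PRICE_PER_SECOND
    request_fee = requests / 1_000_000 * SERVERLESS_PRICE_PER_MILLION_REQUESTS
    return compute + request_fee


for monthly_requests in (100_000, 1_000_000, 10_000_000):
    print(
        f"{monthly_requests:>10,} requests/month: "
        f"VM ${vm_monthly_cost():.2f} vs "
        f"serverless ${serverless_monthly_cost(monthly_requests):.2f}"
    )
```

Under these assumed prices, the serverless option is far cheaper at low traffic, while the always-on VM becomes the cheaper option once traffic is high and steady. The break-even point depends entirely on the real prices and the workload.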