# Overview of Cloud Concepts

Cloud computing is the <span style="color:brown;">on-demand</span> delivery of <span style="color:yellow;">compute power</span>, <span style="color:yellow;">databases</span>, <span style="color:yellow;">storage</span>, <span style="color:yellow;">applications</span>, and other IT resources <span style="color:brown;">over the internet</span> with <span style="color:brown;">pay-as-you-go pricing</span>.

It lets an organization stop thinking of infrastructure as hardware and start thinking of it (and using it) as software.

In the traditional computing model, infrastructure is hardware. This requires space, staff, physical security, planning, capital expenditure, and capacity provisioning based on guesses about theoretical maximum peaks. The hardware procurement cycle is long.

In the cloud computing model, by contrast, infrastructure is software. Solutions are flexible, can change more quickly, easily, and cheaply than hardware solutions, and eliminate undifferentiated heavy-lifting tasks.

<img src="figs/aula01/iaas_saas_paas.png" alt="IaaS, PaaS, and SaaS Comparison" style="width:40%;"/>


- **Public**: This model represents the use of cloud services provided by third-party cloud service providers. All the infrastructure and services are managed by the provider, and the user can access and utilize these resources over the internet. It offers scalability, flexibility, and a pay-as-you-go pricing model.

- **Hybrid**: This model combines both cloud and on-premises infrastructure. It allows data and applications to be shared between them, providing greater flexibility and more deployment options. This approach can help businesses balance between having control over critical data and leveraging the benefits of cloud computing.

- **Private**: In this model, the cloud infrastructure is hosted within an organization’s own data center. It offers greater control over data, enhanced security, and compliance with regulatory requirements. This is ideal for organizations that have stringent data privacy needs and require complete control over their IT environment.

<img src="figs/aula01/aws.png" alt="IaaS, PaaS, and SaaS Comparison" style="width:70%;"/>


### Security
- **Traditional IT:**
  - **Firewalls:** Used to protect the network from unauthorized access.
  - **ACLs:** Access Control Lists used to manage user permissions.
  - **Administrators:** Manage security policies and configurations.
- **AWS:**
  - **Security groups:** Control the inbound and outbound traffic to AWS resources.
  - **Network ACLs:** Provide an additional layer of security at the subnet level.
  - **IAM:** Identity and Access Management to control user permissions and access to AWS resources.

### Networking
- **Traditional IT:**
  - **Router:** Directs data packets between networks.
  - **Network pipeline:** Manages the flow of data in and out of the network.
  - **Switch:** Connects devices within the same network to enable communication.
- **AWS:**
  - **Elastic Load Balancing:** Distributes incoming application traffic across multiple targets.
  - **Amazon VPC:** Virtual Private Cloud to provision a logically isolated section of the AWS cloud.

### Compute
- **Traditional IT:**
  - **Local servers:** Physical servers hosted on-premises.
- **AWS:**
  - **AMI:** Amazon Machine Images to launch virtual servers.
  - **Amazon EC2 instances:** Virtual servers in the cloud providing scalable computing capacity.

### Storage and Databases
- **Traditional IT:**
  - **DAS (Direct-Attached Storage):** Storage directly attached to the server.
  - **SAN (Storage Area Network):** High-speed network of storage devices.
  - **NAS (Network-Attached Storage):** Dedicated file storage connected to a network.
  - **RDBMS (Relational Database Management Systems):** Databases hosted on-premises.
- **AWS:**
  - **Amazon EBS:** Elastic Block Store for persistent block storage.
  - **Amazon EFS:** Elastic File System for scalable file storage.
  - **Amazon S3:** Simple Storage Service for scalable object storage.
  - **Amazon RDS:** Relational Database Service for managed relational databases.


Using a cloud solution often represents a shift from Capital Expenditure (CapEx) to Operational Expenditure (OpEx).

Because usage is aggregated across all customers, AWS can achieve large economies of scale and pass the savings on to customers.
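The CapEx-to-OpEx shift can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names, the demand figures, and the prices are hypothetical and are not real AWS prices. It contrasts buying hardware for the estimated peak with paying per hour for what is actually used.

```python
def capex_cost(peak_servers: int, price_per_server: float) -> float:
    """Up-front cost: hardware is bought for the estimated peak demand."""
    return peak_servers * price_per_server


def opex_cost(hourly_demand: list, price_per_server_hour: float) -> float:
    """Pay-as-you-go cost: each hour is billed for the servers actually used."""
    return sum(hourly_demand) * price_per_server_hour


# Hypothetical workload: servers needed in each of six hours (peak of 10).
demand = [2, 2, 3, 10, 4, 2]

# Traditional model: capacity for the peak must be purchased in advance.
print(capex_cost(peak_servers=max(demand), price_per_server=1000.0))

# Cloud model: the bill follows the demand curve hour by hour.
print(opex_cost(demand, price_per_server_hour=0.10))
```

The two numbers are not directly comparable as prices; the point is that the first depends only on the peak, while the second depends on the sum of actual usage.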

<img src="figs/aula01/scalability.png" alt="Scalability" style="width:70%;"/>

## Introduction to Amazon Web Services

A <span style="color: #5a2ca0;">web service</span> is any piece of software made available over the internet that uses a <span style="color: #5a2ca0;">standardized format</span>, such as Extensible Markup Language (XML) or JavaScript Object Notation (JSON), for the request and response of an <span style="color: #5a2ca0;">application programming interface (API)</span> interaction.
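To make the request/response idea concrete, here is a minimal sketch of a JSON exchange using only Python's standard `json` module. The `DescribeItem` action, the field names, and the `handle_request` function are hypothetical and do not correspond to any real AWS API; no network is involved.

```python
import json


def handle_request(raw_request: str) -> str:
    """Toy 'service': parse a JSON request and return a JSON response."""
    request = json.loads(raw_request)
    if request.get("action") == "DescribeItem":  # hypothetical action name
        body = {"status": "ok", "item": {"id": request["id"], "name": "example"}}
    else:
        body = {"status": "error", "message": "unknown action"}
    return json.dumps(body)


# The client serializes its request to the standardized format...
raw = json.dumps({"action": "DescribeItem", "id": 42})

# ...and parses the response that comes back in the same format.
response = json.loads(handle_request(raw))
print(response["status"], response["item"]["id"])
```

Because both sides agree on the format, the client and the service can be written in different languages and still interoperate.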

You pay only for the individual services you need, for as long as you use them.


### Example Solution

<img src="figs/aula01/sol1.png" alt="Example solution architecture" style="width:50%;"/>

1. **Users**
   - **Users**: Represent the end-users who interact with the application or services hosted on the AWS cloud.

2. **AWS Cloud**
   - **AWS Cloud**: The overall cloud environment provided by Amazon Web Services (AWS).

3. **Virtual Private Cloud (VPC)**
   - **Virtual Private Cloud (VPC)**: A logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define. It provides complete control over the virtual networking environment, including selection of your IP address range, creation of subnets, and configuration of route tables and network gateways.

4. **Amazon EC2**
   - **Amazon EC2 (Elastic Compute Cloud)**: Provides resizable compute capacity in the cloud. It allows you to run virtual servers, known as instances, to host your applications and services.

5. **Amazon DynamoDB**
   - **Amazon DynamoDB**: A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It is used to store and retrieve any amount of data, and serve any level of request traffic.

6. **Amazon S3**
   - **Amazon S3 (Simple Storage Service)**: An object storage service that offers industry-leading scalability, data availability, security, and performance. It is used to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.

- **Users** interact with the application through the **AWS Cloud**.
- **Virtual Private Cloud (VPC)** acts as the main networking component, isolating the environment and controlling network traffic.
- **Amazon EC2** instances run within the VPC and handle the compute workload.
- **Amazon DynamoDB** is used for database storage, interacting with EC2 instances to handle data operations.
- **Amazon S3** is used for storing and retrieving files and objects, providing durable and scalable storage.
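- Amazon EC2 instances inside the VPC communicate with Amazon DynamoDB and Amazon S3. Both are managed services that are not launched inside the VPC itself; instances reach them through the services' endpoints, and this traffic can be kept private with VPC endpoints.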

- **Networking**
  - Includes services like VPC, which provides networking capabilities.
- **Compute**
  - Includes services like Amazon EC2, which provide computing power.
- **Database**
  - Includes services like Amazon DynamoDB for database management.
- **Storage**
  - Includes services like Amazon S3 for object storage.

This simple solution example demonstrates how different AWS services can be combined to create a secure, scalable, and efficient cloud infrastructure. The interaction between compute, database, and storage services within a Virtual Private Cloud (VPC) ensures that the application can handle various workloads and provide reliable service to end-users.


### Three ways to interact with AWS

1. **AWS Management Console**
   - Provides an easy-to-use graphical interface for interacting with AWS services.

2. **AWS Command Line Interface (AWS CLI)**
   - Provides access to AWS services through commands or scripts.

3. **Software development kits (SDKs)**
   - Provide access to AWS services directly from your code, in languages such as Java, Python, and others.

#### AWS Cloud Adoption Framework (CAF)

The AWS CAF provides guidance and best practices to help organizations build a comprehensive approach to cloud computing, across the organization and throughout the IT lifecycle, in order to accelerate successful cloud adoption.

## The NIST Definition of Cloud Computing

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.

### Essential Characteristics

- **On-demand self-service.** A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

- **Broad network access.** Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

- **Resource pooling.** The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

- **Rapid elasticity.** Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

- **Measured service.** Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
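The "measured service" characteristic can be sketched as a small metering component. This is only an illustration under stated assumptions: the `Meter` class, the resource names, and the rates are hypothetical and do not describe how any real provider meters usage.

```python
from collections import defaultdict


class Meter:
    """Toy metering service: records usage per consumer and resource type."""

    def __init__(self, rates):
        self.rates = rates  # hypothetical price per unit for each resource
        self.usage = defaultdict(lambda: defaultdict(float))

    def record(self, consumer, resource, amount):
        """Monitoring: accumulate what each consumer has used."""
        self.usage[consumer][resource] += amount

    def report(self, consumer):
        """Reporting: give the consumer a transparent view of its usage."""
        return dict(self.usage[consumer])

    def bill(self, consumer):
        """Charge only for measured usage (pay-as-you-go)."""
        return sum(self.rates[r] * amt for r, amt in self.usage[consumer].items())


meter = Meter({"cpu_hours": 2.0, "storage_gb_hours": 0.5})
meter.record("team-a", "cpu_hours", 3)
meter.record("team-a", "cpu_hours", 1)
meter.record("team-a", "storage_gb_hours", 10)
print(meter.report("team-a"))
print(meter.bill("team-a"))
```

The same usage records serve both parties: the provider uses them to bill and optimize, and the consumer uses them to see what it is paying for.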


### Service Models

#### Software as a Service (SaaS)
The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

A cloud infrastructure is the collection of hardware and software that enables the five essential characteristics of cloud computing. The cloud infrastructure can be viewed as containing both a physical layer and an abstraction layer. The physical layer consists of the hardware resources that are necessary to support the cloud services being provided, and typically includes server, storage and network components. The abstraction layer consists of the software deployed across the physical layer, which manifests the essential cloud characteristics. Conceptually the abstraction layer sits above the physical layer.

#### Platform as a Service (PaaS)

The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider (this capability does not necessarily preclude the use of compatible programming languages, libraries, services, and tools from other sources). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

#### Infrastructure as a Service (IaaS)

The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

### Deployment Models

- **Private cloud.** The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

- **Community cloud.** The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

- **Public cloud.** The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

- **Hybrid cloud.** The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).


## Machine Requirements

PC running Ubuntu 22.04 or 24.04 (minimum: 8 GB of RAM, 4 cores, virtualization enabled in the BIOS, hyperthreading disabled, 32 GB of free disk space)

VirtualBox: it is hard to set up a cluster of machines with it.

Core: a physical core, without hyperthreading.

A typical Intel machine has 4 to 6 cores (4 to 6 cores per socket).
To run the processing workloads, we will set up an Ubuntu server with 2 GB of RAM. We can go up to 3 virtual machines; assuming we allocate 2 cores per virtual machine, that requires 6 cores.

We need 8 GB of RAM and 6 cores.

In cloud computing there are typically 4 virtual machines per physical core. Placing 1 virtual machine per core would make the service much more expensive for the customer. For us in the course, however, it is important to have more cores so that we can evaluate what is happening without one virtual machine negatively affecting another. We will try to place one virtual machine per core and, ideally, do pinning: fixing a virtual machine to a core (for commercial reasons, providers instead run 4 virtual machines per core).
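The sizing arithmetic above can be written down as a short sketch. The function names are hypothetical, and the 2 GB reserved for the host OS is an assumption made here so that the total matches the 8 GB stated in the notes; the notes themselves do not break the figure down.

```python
def host_requirements(num_vms, cores_per_vm, ram_per_vm_gb, host_reserve_gb):
    """Return the physical cores and RAM (GB) needed to host the VMs."""
    return {
        "cores": num_vms * cores_per_vm,
        "ram_gb": num_vms * ram_per_vm_gb + host_reserve_gb,
    }


def vms_supported(physical_cores, vms_per_core):
    """Number of VMs a host can run at a given VM-to-core ratio."""
    return physical_cores * vms_per_core


# Course setup: 3 VMs, 2 cores and 2 GB each, plus an assumed 2 GB for the host.
print(host_requirements(num_vms=3, cores_per_vm=2, ram_per_vm_gb=2, host_reserve_gb=2))

# Same 6-core host at the commercial ratio (4 VMs per core) vs. one VM per core.
print(vms_supported(physical_cores=6, vms_per_core=4))
print(vms_supported(physical_cores=6, vms_per_core=1))
```

The gap between the last two numbers is why a 1-to-1 ratio is expensive commercially but useful for measurement in the course.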


### VirtualBox and Clustering

1. **VirtualBox**: This is a software application that allows you to create and run virtual machines on your computer. A virtual machine (VM) is a software emulation of a physical computer.

2. **Cluster of Machines**: In a cloud computing context, a cluster refers to a group of interconnected computers that work together as a single system. Creating a cluster of VMs can be complex because it involves configuring multiple VMs to communicate and cooperate effectively.

### CPU and Cores

1. **Core**: A core is a processing unit within a CPU. It is capable of executing instructions from a computer program. Modern CPUs often have multiple cores, allowing them to perform multiple tasks simultaneously.

2. **Physical Core**: This refers to the actual hardware core in the CPU.

3. **Hyperthreading**: This is a technology used by some Intel processors that allows a single physical core to act like two logical cores, which can improve performance for certain types of tasks. The course requirements refer to physical cores, not hyperthreaded logical ones.

### Machine Requirements

1. **Number of Cores**: Typical modern Intel CPUs have 4 to 6 cores per CPU socket. For the course, 6 cores are suggested for running VMs.

2. **RAM**: For running the VMs, 8 GB of RAM is recommended.

### Virtual Machines (VMs)

1. **Server Setup**: The course will use Ubuntu as the server operating system, with each VM allocated 2 GB of RAM.

2. **Number of VMs**: You can create up to 3 VMs, each using 2 cores. This totals 6 cores for 3 VMs (2 cores per VM).

### Cloud Computing and VM Allocation

1. **VM to Core Ratio**: In a commercial cloud environment, it's common to have multiple VMs sharing a single physical core to optimize resource usage and reduce costs. Typically, there might be 4 VMs per physical core.

2. **Course Setup**: For educational purposes, it's important to minimize interference between VMs. Therefore, the course recommends a 1-to-1 ratio (one VM per core) and ideally performing "pinning," which means fixing each VM to a specific core to ensure stable performance and accurate monitoring.
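As an analogy for pinning, the sketch below fixes the current Python process to a single CPU using the operating system's CPU affinity interface. It is an assumption-laden illustration: it pins a process rather than a VM, the helper name `pin_current_process` is hypothetical, and `os.sched_setaffinity` is only available on Linux, so the example guards against other platforms.

```python
import os


def pin_current_process(cpu: int) -> set:
    """Restrict the calling process to one CPU and return the new affinity set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_setaffinity"):  # Linux-only API
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    target = min(allowed)              # pick one CPU that is actually allowed
    print("allowed before:", sorted(allowed))
    print("allowed after: ", sorted(pin_current_process(target)))
else:
    print("CPU affinity API not available on this platform")
```

Once pinned, the scheduler will not migrate the process to another core, which is the property that makes measurements on a pinned VM more stable.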


Requirements:
- Ubuntu 22.04/24.04 machine, BARE METAL
- 4/6 cores
- 8/16 GB of RAM
- 32 GB of disk space for the VMs

# Virtualization, Hypervisors, and KVM

## Virtualization

The simulation of the software or hardware upon which other software runs. This simulated environment is called a virtual machine.

A methodology for emulation or abstraction of hardware resources that enables complete execution stacks including software applications to run on it.

The use of an abstraction layer to simulate computing hardware so that multiple operating systems can run on a single computer.



A mainframe is a large, powerful computer system primarily used by large organizations for critical applications, bulk data processing, and enterprise resource planning. Mainframes are known for their high reliability, scalability, and security, making them suitable for handling massive volumes of transactions and data.

Mainframes were expensive, so people began studying ways to share their resources and to install more than one operating system on them. The goal was to place several virtual machines inside the mainframe, each with an environment completely isolated from the others (so that different operating environments could coexist). That way the various users could do their work independently, without interference.

With the arrival of smaller computers, interest in virtualization declined (personal computers made one machine per user an option). Interest returned in the 1990s as the computing capacity of computers (in this case, servers) grew.

There was another boom when it became possible to deploy virtual machines using hardware support. Since around 2010, these classic forms of virtualization have shown lower performance than virtualization via **containers**.

<img src="figs/aula02/taxonomy_of_virtualization.png" alt="Taxonomy of Virtualization" style="width:50%;"/>

Virtualization is the process of creating a virtual version of something, such as hardware platforms, storage devices, or network resources.

**Categories of Virtualization**:
   - **Execution Environment**:
     - Virtualization of environments where processes run.
   - **Storage**:
     - Virtualization of storage devices.
   - **Network**:
     - Virtualization of network resources.

Virtualization can be done at two primary levels: 

1. **Process Level**:
   - **Technique**:
     - **Emulation**:
       - Creates an environment that mimics another system, allowing applications to run as if they are on the original hardware.
     - **High-Level VM (Virtual Machine)**:
       - Uses a high-level virtual machine to execute programs.
     - **Multiprogramming**:
       - Multiple programs run on a single processor by managing their execution.
   - **Virtualization Model**:
     - **Application**:
       - Virtualization at the application level.
     - **Programming Language**:
       - Virtualization using programming languages.
     - **Operating System**:
       - Virtualization at the OS level.

2. **System Level**:
   - **Technique**:
     - **Hardware-Assisted Virtualization**:
       - Uses hardware features to improve the efficiency of virtualization.
     - **Full Virtualization**:
       - Complete simulation of the underlying hardware to run unmodified operating systems.
     - **Paravirtualization**:
       - A virtualization technique that presents a software interface to virtual machines that is similar, but not identical, to that of the underlying hardware.
     - **Partial Virtualization**:
       - Only some parts of the target environment are virtualized.
   - **Virtualization Model**:
     - **Hardware**:
       - Virtualization at the hardware level.


System level: the main focus is virtualizing the processor, but there is also interest in virtualizing other components (everything surrounding the processor is currently the performance bottleneck).





#### Summary of "Formal Requirements for Virtualizable Third Generation Architectures" by Gerald J. Popek and Robert P. Goldberg

The paper by Popek and Goldberg presents a formal analysis of the requirements for third generation computer architectures to support virtualization. The key contributions of the paper include defining what constitutes a virtual machine (VM), outlining the characteristics of a virtual machine monitor (VMM), and establishing the conditions under which a third generation architecture can support virtual machines.

##### Key Concepts

1. **Virtual Machine (VM)**: An efficient, isolated duplicate of a real machine, where programs running under a VM experience an environment identical to the real machine with only minor performance overhead.
2. **Virtual Machine Monitor (VMM)**: A software layer that creates and manages virtual machines, providing an environment identical to the underlying hardware, maintaining control over system resources, and ensuring efficient execution of most instructions directly by the hardware.

##### Formal Model

- The authors develop a model of a third-generation-like computer system, specifying necessary assumptions about its behavior, state-space, and state transitions.
- The model includes a processor with supervisor and user modes, relocation registers for memory addressing, and a set of conventional instructions.

##### Conditions for Virtualization

The paper establishes a critical condition for an architecture to support virtualization:
- **Sensitive Instructions**: Instructions that can affect the hardware state in a way that could interfere with the operation of the VMM must be a subset of the privileged instructions. This ensures that any sensitive operation will trap to the VMM, allowing it to maintain control over system resources.

##### Major Theorems

1. **Theorem 1**: For any conventional third-generation computer, a VMM can be constructed if the set of sensitive instructions is a subset of the privileged instructions.
2. **Theorem 2**: A conventional third generation computer is recursively virtualizable if it is virtualizable and a VMM without timing dependencies can be constructed for it.
3. **Theorem 3**: A hybrid virtual machine monitor (HVM) can be constructed for any conventional third generation machine where user-sensitive instructions are a subset of privileged instructions.
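The condition in Theorem 1 is a set-inclusion test, so it can be sketched directly with Python sets. The function names are hypothetical, and the instruction sets below are a small, simplified sample chosen for illustration rather than a complete classification; they reflect the well-known case of classic x86 (before hardware virtualization extensions), where instructions such as `POPF` and `SGDT` are sensitive but do not trap in user mode.

```python
def is_virtualizable(sensitive: set, privileged: set) -> bool:
    """Theorem 1 condition: every sensitive instruction must be privileged."""
    return sensitive <= privileged


def offending_instructions(sensitive: set, privileged: set) -> set:
    """Sensitive instructions that would NOT trap to the VMM."""
    return sensitive - privileged


# Simplified, partial sample of classic x86 instructions (illustration only).
privileged = {"LGDT", "HLT", "MOV_TO_CR3"}
sensitive = {"LGDT", "MOV_TO_CR3", "POPF", "SGDT"}

print(is_virtualizable(sensitive, privileged))
print(sorted(offending_instructions(sensitive, privileged)))
```

Any instruction returned by `offending_instructions` can touch or observe sensitive state without the VMM noticing, which is why such architectures need binary translation, paravirtualization, or hardware assistance.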

##### Practical Implications

- The formal techniques provided in the paper can be applied to evaluate existing architectures and design new architectures to support virtualization.
- The results have been used to modify existing systems, such as the DEC PDP-11/45, to support virtual machines.

##### Conclusion

The paper concludes that while the model captures essential aspects of third generation virtual machines, some simplifications were made for presentation purposes. Empirical evidence suggests that additional complexities, such as I/O operations and asynchronous events, can be integrated into the model. The formal techniques outlined may also be applied to newer architectures designed to support virtualization without traditional VMM overhead.


Virtualization is a core technology used for the implementation of cloud computing. It increases the utilization of resources such as processor, storage, network etc. by collecting various underutilized resources available in the form of a shared pool of resources built through the creation of Virtual Machines (VMs).

Requirements in a cloud environment are dynamic, so there is always a need to move virtual machines within the same cloud or between different clouds. This is achieved through VM migration, which brings several benefits, such as saving energy on the host, providing fault tolerance when a host is not working properly, and balancing load among all hosts.

Cloud is a parallel and distributed computing system consisting of a **collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources** based on service level agreements (SLAs) established through negotiation between the service provider and consumers.

Virtualization creates an abstraction layer over the actual hardware and software. It emulates a physical machine in software so that multiple operating systems can run on a single physical machine. The main goal of virtualization is to make full use of the capacity of available resources such as processor, storage, and network. By creating virtual machines, it combines multiple underutilized resources into a shared resource pool and uses them to perform different tasks simultaneously, fulfilling multiple user demands. These resources can be scaled on virtual machines (i.e., allocated dynamically).

There are various types of virtualization, such as:

- **Application Virtualization** – Applications, including the host machine's operating system, are moved to a virtual environment. The application resides elsewhere but is accessed from the client computer, where it behaves like a local application. Examples include VMware ThinApp and Oracle Secure Global Desktop.

- **Storage Virtualization** – Provides a virtual storage environment by combining various physical storage devices, so that distributed storage is managed as if it were one consolidated store. Storage availability increases because applications are no longer tied to a limited or specific resource, and the storage can be updated at any time without affecting application performance.

- **Server Virtualization** – An existing server is moved into a virtual environment (i.e., a hypervisor) hosted on a physical server. The server's resources are hidden from clients, and the physical server is divided into multiple virtual environments. Web server virtualization, used to provide low-cost web hosting services, is one of the most popular examples.

- **Hardware Virtualization** – Presents the hardware components of the real machine as virtual components. This hides all the physical components and details of the actual computing platform from end users.


Virtualization is done by using a **hypervisor**, software that acts as an intermediary between the virtual machine and the physical hardware. It is used to create virtual machines.

The hypervisor manages the virtual hardware and the guest operating systems that run on it. A hypervisor can be native (Type-1) or hosted (Type-2).

A Type-1 hypervisor sits below the operating system and runs directly on the hardware to manage the guest operating systems. This arrangement is commonly associated with full virtualization.

A Type-2 hypervisor requires a host operating system to run on, and the guest operating systems are then managed by the hypervisor. Some sources label this arrangement para-virtualization, but paravirtualization more precisely refers to modifying the guest OS so that it is aware of the hypervisor, independently of the hypervisor type.


<img src="figs/aula02/Type-1-and-Type-2-Hypervisor.png" alt="Type-1 and Type-2 Hypervisors" style="width:50%;"/>

These resources can be allocated or de-allocated dynamically on VMs, allowing a single physical host to be converted into a number of virtual hosts. Each virtual host delivers a secure and isolated environment for applications. These environments can be customized, in terms of software and hardware platform, according to demand.

The system that supports the deployment of VMs is the VM manager (or monitor), also called the hypervisor. The hypervisor can run on top of:

- the hardware platform (bare-metal solution, **Type 1**)
- another operating system (**Type 2**).



Virtualization techniques:

- Complete Machine Emulation (Hosted Interpretation): the most powerful technique of all. It allows code for any processor to run on your machine, as long as you have the specific emulator for that processor. However, it needs a model of the processor and must examine the code instruction by instruction, so it is very slow.

- Full Virtualization (Direct Execution): closely associated with Type 1 hypervisors.

    - Direct Execution with Trap-and-Emulate: every instruction that touches sensitive resources must be executed in a protected way. This causes problems on classic x86, where some sensitive instructions do not trap when executed outside kernel mode.

    - Direct Execution with Binary Translation: for example, VMware's Dynamic Binary Translation. It performs simulation (not emulation). Instructions executed at user level require little work, whereas instructions executed at kernel level (operating system) require software that dynamically translates the code, on demand, into native code for your processor. The part of the code that makes no critical calls runs fast, at bare-metal speed (execution goes straight to the processor), but the kernel-related part of the code runs more slowly.

    - Direct Execution with Hardware-Assisted Virtualization: for example, hardware-assisted CPU virtualization (Intel VT-x). VT-x concerns the processor, but performance also depends on other hardware.

- Paravirtualization: the code has to be changed so that it becomes aware of the virtualizer underneath it. The guest OS is recompiled and mapped to the operating environment where it will run. The user-level part does not need to change (it is sent straight to the processor); only the code that runs at kernel level has to be changed and mapped.

Emulation: software executes the file's instructions one by one.

A real program has a fraction that runs inside the operating system (input and output operations, for example) and other fractions that run at user level. Direct Execution with Binary Translation runs the user-level fraction directly and quickly, and dynamically translates the fraction that runs inside the operating system.



### Complete Machine Emulation

- The VMM implements the complete hardware architecture in software.
- The VMM steps through the VM's instructions and updates the emulated hardware as needed.
- It can handle every type of instruction, but it is very slow.
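
The bullets above can be illustrated with a minimal sketch of instruction-by-instruction interpretation. The three-instruction toy ISA (`LOAD`, `ADD`, `HALT`), the `EmulatedCPU` class, and the register names are hypothetical and exist only for illustration; a real emulator models a full architecture.

```python
from dataclasses import dataclass, field


@dataclass
class EmulatedCPU:
    """All 'hardware' state lives in ordinary software variables."""
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    pc: int = 0
    halted: bool = False


def emulate(program, cpu=None):
    """Fetch, decode and execute one guest instruction at a time."""
    cpu = cpu or EmulatedCPU()
    while not cpu.halted:
        op, *args = program[cpu.pc]   # fetch and decode
        cpu.pc += 1
        if op == "LOAD":              # LOAD reg, immediate
            cpu.regs[args[0]] = args[1]
        elif op == "ADD":             # ADD dst, src (dst += src)
            cpu.regs[args[0]] += cpu.regs[args[1]]
        elif op == "HALT":
            cpu.halted = True
        else:
            raise ValueError(f"unknown instruction: {op}")
    return cpu


guest_program = [("LOAD", "r0", 2), ("LOAD", "r1", 40), ("ADD", "r0", "r1"), ("HALT",)]
final_state = emulate(guest_program)
print(final_state.regs["r0"])  # 42
```

Every guest instruction costs several host-level operations (tuple unpacking, comparisons, dictionary updates), which is one way to see why this approach is slow.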


QEMU (Quick EMUlator) is an open-source emulator and virtualizer. It provides the ability to emulate entire hardware systems and supports a wide range of architectures. QEMU can emulate the hardware of various computer systems, including CPUs, memory, storage, network interfaces, and other peripherals. This allows software written for one architecture to run on a completely different architecture.


### Direct Execution with Trap-and-Emulate

- VMM (Ring 0)
- Guest OS (Ring 1)
- Guest Applications (Ring 3)

Execute most guest instructions natively on the hardware (assuming the guest OS runs on the same architecture as the real hardware).

Applications run in ring 3 and cannot access memory belonging to the guest OS (ring 1).

The guest OS runs in ring 1 and cannot access memory belonging to the VMM (ring 0).

The guest OS must not be allowed to execute sensitive instructions directly!
- Popek and Goldberg (1974) define two classes of instructions:
  - *privileged instructions*: those that trap when executed in user mode
  - *sensitive instructions*: those that modify or depend on the hardware configuration

When the guest OS executes a privileged instruction, it traps to the VMM.

When guest applications generate a software interrupt, it also traps to the VMM.

Privileged instructions execute in kernel mode and can only be run by the OS. If one is issued in user mode, control is diverted to the OS through a *trap*.

*TRAP*: some events interrupt the processor's normal flow of execution so that execution is diverted elsewhere (usually to a location given by the interrupt vector, a large table indexed by the number of the interrupt that arrived).

x86 has sensitive instructions that are not privileged: some operations that change processor state should trap so that they can be handled on behalf of the guest OS running in the VM. That does not happen, because on x86 such a sensitive instruction is an ordinary instruction that executes without trapping. This caused problems.

The hypervisor, or virtual machine manager (monitor), runs at level 0 (the highest privilege: it can execute anything).

User applications run at level 3.

The guest OS runs at level 1, because it cannot be at level 0 (if it were, it could control the VMM, since it would have access to everything). Several guest OSs may be running at once, and a guest at level 0 could disturb the others.

If the sensitive operations are all a subset of the privileged operations, there is no problem.

#### Trap and Emulate

- Goal: hand off sensitive operations to the VMM
- Reality: privileged operations trap to VMM
- VMM emulates the effect of privileged operations on virtual hardware provided to the guest OS
- VMM controls how the VM interacts with physical hardware
- VMM fools the guest OS into thinking that it runs at the highest privilege level
- Performance implications
  - Almost no overhead for non-privileged instructions
  - Large overhead for privileged instructions
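
The list above can be sketched as a toy model. The `ToyVMM` class, the `Trap` exception, and the two-instruction privileged set are hypothetical stand-ins chosen for illustration; real hardware raises the trap itself, and a real VMM emulates far more state.

```python
class Trap(Exception):
    """Raised when deprivileged guest code issues a privileged instruction."""
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


PRIVILEGED = {"CLI", "STI"}  # illustrative sample, not a full list


class ToyVMM:
    def __init__(self):
        # Virtual hardware state presented to the guest OS.
        self.virtual_interrupts_enabled = True
        self.native_count = 0
        self.trap_count = 0

    def cpu_execute(self, instr):
        """Stand-in for the real CPU running the guest in a deprivileged mode."""
        if instr in PRIVILEGED:
            raise Trap(instr)
        self.native_count += 1  # runs directly, no VMM involvement

    def handle_trap(self, instr):
        """Emulate the instruction's effect on the guest's virtual hardware only."""
        self.trap_count += 1
        if instr == "CLI":
            self.virtual_interrupts_enabled = False
        elif instr == "STI":
            self.virtual_interrupts_enabled = True

    def run_guest(self, instructions):
        for instr in instructions:
            try:
                self.cpu_execute(instr)
            except Trap as trap:
                self.handle_trap(trap.instr)


vmm = ToyVMM()
vmm.run_guest(["MOV", "ADD", "CLI", "MOV"])
print(vmm.native_count, vmm.trap_count, vmm.virtual_interrupts_enabled)  # 3 1 False
```

The guest "disabled interrupts" only in its own virtual state; the physical machine was never touched, which is how the VMM keeps control.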



Popek and Goldberg theorem: *A machine can be virtualized using trap-and-emulate if every sensitive instruction is privileged*.
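
The condition in the theorem is a subset test, which the following sketch makes explicit. The function names and the `clean_isa` instruction names are hypothetical, and the x86 sets are a deliberately tiny sample (POPF is a well-known x86 instruction that is sensitive but does not trap in user mode), not the full instruction set.

```python
def trap_and_emulate_possible(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction must be privileged."""
    return set(sensitive) <= set(privileged)


def problem_instructions(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return set(sensitive) - set(privileged)


# Hypothetical architecture where the condition holds.
clean_isa = {"sensitive": {"SET_PAGE_TABLE", "DISABLE_IRQ"},
             "privileged": {"SET_PAGE_TABLE", "DISABLE_IRQ", "HALT_CPU"}}

# Simplified picture of classic x86 (small sample only).
classic_x86 = {"sensitive": {"LGDT", "POPF"},
               "privileged": {"LGDT", "HLT"}}

print(trap_and_emulate_possible(**clean_isa))    # True
print(trap_and_emulate_possible(**classic_x86))  # False
print(problem_instructions(**classic_x86))       # {'POPF'}
```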

### Direct Execution with Binary Translation

The VMM rewrites instructions dynamically so that non-virtualizable instructions trap into the VMM.

It is VMware's main selling point.

Trap-and-emulate fails to trap those instructions. In this approach, the instruction stream is monitored so that sensitive instructions are translated.

- Binary: Input is binary x86 code
- Dynamic: Translation happens at runtime
- On demand: Code is translated only when it is about to execute
- System level: Rules set by x86 ISA, not higher-level ABIs
- Subsetting: Output a safe subset of input full x86 instruction set
- Adaptive: Translated code is adjusted according to guest behavior changes

Ensuring efficiency is a challenge for this approach.
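
The "dynamic", "on demand", and "subsetting" properties listed above can be sketched with a toy translator. The `ToyTranslator` class, the `CALL_VMM`/`NATIVE` markers, and the block addresses are hypothetical; this is a rough illustration of the idea, not how VMware's translator is implemented.

```python
SENSITIVE = {"POPF"}  # illustrative sample of non-virtualizable instructions


class ToyTranslator:
    def __init__(self):
        self.cache = {}          # block address -> translated block
        self.translations = 0    # how many blocks were actually translated

    def translate(self, block):
        """Rewrite one basic block: keep safe instructions, replace sensitive ones."""
        self.translations += 1
        return [("CALL_VMM", instr) if instr in SENSITIVE else ("NATIVE", instr)
                for instr in block]

    def fetch(self, address, guest_code):
        """On demand: translate a block only the first time it is about to run."""
        if address not in self.cache:
            self.cache[address] = self.translate(guest_code[address])
        return self.cache[address]


guest_code = {0x1000: ["MOV", "POPF", "ADD"], 0x2000: ["MOV", "RET"]}
translator = ToyTranslator()

first = translator.fetch(0x1000, guest_code)
again = translator.fetch(0x1000, guest_code)   # served from the cache
print(first)
print(translator.translations)  # 1
```

The block at `0x2000` is never translated because it never runs, and the second fetch of `0x1000` reuses the cached translation. Caching of this kind is one way such a translator can amortize its cost.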



### Direct Execution with Hardware-Assisted Virtualization

#### Hardware-assisted CPU virtualization (Intel VT-x)

Two new execution modes (orthogonal to the protection rings):
- VMX root mode: same as x86 without VT-x
- VMX non-root mode: runs the VM; sensitive instructions cause a transition to root mode, even in ring 0

New hardware structure: VMCS (virtual machine control structure)
- One VMCS per virtual processor
- Configured by the VMM to determine which sensitive instructions cause a VM exit
- Specifies the state of the guest OS


<img src="figs/aula02/vtx.png" alt="Taxonomy of Virtualization" style="width:50%;"/>

Without VT-x, the guest OS sat in Ring 1; certain guest OS instructions raised an interrupt that landed in the hypervisor, while other guest code ran directly on the hardware (Ring 3 going straight to the hardware). However, some things running in the guest OS and in the guest applications never invoked the hypervisor.

With VT-x, both the hypervisor and the guest OS sit in Ring 0. The guest OS still does not get excessive access, because it runs in VMX non-root mode, where VMX grants differentiated access to certain things.

### KVM-QEMU

- QEMU (userspace process): works with binary translation if there is no hardware support; sets up guest VM memory as part of the userspace process
- KVM (kernel module): when invoked, KVM switches to VMX mode to run the guest
- CPU with VMX mode: the CPU switches between VMX root and VMX non-root modes

KVM is what is strictly needed to run a VM; QEMU is not. QEMU, an emulator, usually comes along with it even though it is not required to have the VM. QEMU is very good at emulating input/output (disk, video, console terminal, network); there is even an SSD emulator. KVM also handles some input/output.

<img src="figs/aula02/qemu.png" style="width:50%;"/>

<img src="figs/aula02/qemu_kvm.png" style="width:50%;"/>

<img src="figs/aula02/qemu_kvm2.png" style="width:50%;"/>

- QEMU creates the guest physical memory and one thread per VCPU
- QEMU VCPU thread gives KVM_RUN command to KVM kernel module
- KVM configures VM information in VMCS, launches guest OS in VMX mode
- Guest OS runs natively on CPU until VM exit happens
- Control returns to KVM/Host OS on VM exit
- VM exits handled by KVM or QEMU
- Host schedules QEMU like any other process, not aware of guest OS


### LIBVIRT

<img src="figs/aula02/libvirt.png" style="width:50%;"/>


<span style="color:yellow;">1h36</span>
https://www.youtube.com/watch?v=BLpv0BZG3II

In emulation, the Virtual Machine Monitor (VMM) provides hardware simulation, making it independent of the underlying system hardware. This is because emulation simulates the entire hardware environment, allowing the guest system to run regardless of the host system's hardware.

VirtualBox is an example of a Type-2 hypervisor, which runs on top of an existing operating system rather than directly on the hardware.

## Installing Virtual Machine Manager

Following the tutorial
https://www.tecmint.com/install-qemu-kvm-ubuntu-create-virtual-machines/
on my PC.

alex@alex-inspiron:~$ egrep -c '(vmx|svm)' /proc/cpuinfo
8

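That is, 8 lines of `/proc/cpuinfo` matched, which normally corresponds to the logical CPUs that report the flag (the count includes hyperthreads, so it may exceed the number of physical cores). Any value above 0 means hardware virtualization is available.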
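
What the `egrep -c` command counts can be sketched in a few lines. The function name and the sample text below are made up for illustration (a real `/proc/cpuinfo` lists many more fields and flags); `vmx` is the Intel VT-x flag and `svm` is the AMD-V flag.

```python
import re


def count_virtualization_lines(cpuinfo_text):
    """Count lines mentioning vmx or svm, like `egrep -c '(vmx|svm)'`."""
    pattern = re.compile(r"vmx|svm")
    return sum(1 for line in cpuinfo_text.splitlines() if pattern.search(line))


# Shortened, made-up sample with two logical CPUs.
sample = """processor : 0
flags : fpu vme vmx sse2
processor : 1
flags : fpu vme vmx sse2
"""
print(count_virtualization_lines(sample))  # 2
```

On a Linux host, passing the contents of `/proc/cpuinfo` to the function should give the same number as the shell command.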

alex@alex-inspiron:~$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

/dev/kvm is the entry point to the KVM module. This is where QEMU requests resources.

alex@alex-inspiron:~$ sudo apt install qemu-kvm virt-manager virtinst libvirt-clients bridge-utils libvirt-daemon-system -y

At this point, we have installed QEMU and all the essential virtualization packages. 

sudo systemctl enable --now libvirtd

sudo systemctl start libvirtd

alex@alex-inspiron:~$ sudo systemctl status libvirtd
● libvirtd.service - Virtualization daemon
     Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2024-06-26 19:34:17 -03; 13min ago
TriggeredBy: ● libvirtd-ro.socket
             ● libvirtd.socket
             ● libvirtd-admin.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
   Main PID: 13661 (libvirtd)
      Tasks: 21 (limit: 32768)
     Memory: 10.7M
        CPU: 651ms
     CGroup: /system.slice/libvirtd.service
             ├─13661 /usr/sbin/libvirtd
             ├─13821 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvir>
             └─13822 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvir>
jun 26 19:34:17 alex-inspiron systemd[1]: Started Virtualization daemon.
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: started, version 2.90 cachesize 150
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: compile time options: IPv6 GNU-getopt DBus no-UBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP conntrack >
jun 26 19:34:18 alex-inspiron dnsmasq-dhcp[13821]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
jun 26 19:34:18 alex-inspiron dnsmasq-dhcp[13821]: DHCP, sockets bound exclusively to interface virbr0
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: reading /etc/resolv.conf
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: using nameserver 127.0.0.53#53
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: read /etc/hosts - 8 names
jun 26 19:34:18 alex-inspiron dnsmasq[13821]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 names
jun 26 19:34:18 alex-inspiron dnsmasq-dhcp[13821]: read /var/lib/libvirt/dnsmasq/default.hostsfile

From the output above, the libvirtd daemon is up and running as expected. 

sudo usermod -aG kvm $USER

sudo usermod -aG libvirt $USER

The next step is to launch the QEMU/KVM GUI tool which is the Virtual Machine Manager.

sudo virt-manager

A new "Virtual Machine Manager" window pops up. From here, you can start creating and managing virtual machines.

I installed Ubuntu Server 22.0 in this Virtual Machine Manager.

### Ubuntu Server 22.0

<img src="figs/aula02/cpu.png" alt="Sysbench CPU benchmark output" style="width:50%;"/>

#### Sysbench CPU Benchmark Results

##### Test Configuration
- **Sysbench Version**: 1.0.20 (using system LuaJIT 2.1.0-beta3)
- **Number of Threads**: 1
- **Prime Numbers Limit**: 10000

##### CPU Speed
- **Events Per Second**: 1049.97

The CPU handled 1049.97 events per second, which indicates the performance of the CPU in processing the workload.

##### General Statistics
- **Total Time**: 10.0006 seconds
- **Total Number of Events**: 10502

The total time for the test was approximately 10 seconds. The CPU processed a total of 10502 events during the test.

##### Latency (milliseconds)
- **Minimum**: 0.94 ms
- **Average**: 0.95 ms
- **Maximum**: 3.18 ms
- **95th Percentile**: 1.01 ms
- **Sum**: 9990.28 ms

##### Threads Fairness
- **Events (avg/stddev)**: 10502.0000/0.00
- **Execution Time (avg/stddev)**: 9.9903/0.00 seconds

The average and standard deviation of events processed by the thread were 10502.0000 and 0.00, respectively, indicating consistent performance.

The average execution time was 9.9903 seconds with no deviation, showing uniformity in execution time across the test.

These results provide an overview of the CPU's performance under the given test conditions, highlighting its event processing capability and latency metrics.
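
The reported figures can be cross-checked against each other with simple arithmetic. This is only a consistency check on the numbers listed above; the variable names are my own.

```python
total_events = 10502
total_time_s = 10.0006
latency_sum_ms = 9990.28

# Average latency is the summed latency divided by the number of events.
avg_latency_ms = latency_sum_ms / total_events
# Events per second is roughly total events divided by total time.
events_per_second = total_events / total_time_s

print(round(avg_latency_ms, 2))    # 0.95
print(round(events_per_second, 1))
```

The average latency reproduces the reported 0.95 ms. The computed rate lands within a fraction of a percent of the reported 1049.97 events per second; the small gap presumably comes from the tool measuring the rate over a slightly different interval.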



<img src="figs/aula02/memory.png" alt="Sysbench memory benchmark output" style="width:50%;"/>

The memory test performed 44,148,636 operations, with an average rate of 4,414,078.59 operations per second.

A total of 43,113.90 MiB of data was transferred, with an average transfer rate of 4,310.62 MiB/sec.

These results provide an overview of the memory performance under the given test conditions, highlighting the operations per second and data transfer rates, along with latency metrics.

The total time for the test was approximately 10 seconds, during which the memory test handled 44,148,636 events.
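
The operation count and the transferred volume together imply the size of each memory operation. The following is a small consistency check on the numbers above, with variable names of my own choosing.

```python
total_operations = 44_148_636
total_mib = 43_113.90
ops_per_second = 4_414_078.59

bytes_per_operation = total_mib * 1024 * 1024 / total_operations
mib_per_second = ops_per_second * bytes_per_operation / (1024 * 1024)

print(round(bytes_per_operation))   # 1024
print(round(mib_per_second, 2))     # 4310.62
```

Each operation therefore moved about 1 KiB, and multiplying that by the operation rate reproduces the reported 4,310.62 MiB/sec.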


<img src="figs/aula02/fileio.png" alt="Sysbench file I/O benchmark output" style="width:50%;"/>

The test performed 0 reads per second as it was a sequential write test. The test performed 979.05 writes per second. The test performed 1259.57 fsyncs per second.

The read throughput was 0.00 MiB/s because the test did not involve reading operations. The write throughput was 15.30 MiB/s, indicating the rate at which data was written to the files.

The average and standard deviation of events processed by the thread were 22,280.0000 and 0.00, respectively, indicating consistent performance. The average execution time was 9.9463 seconds with no deviation, showing uniformity in execution time across the test. These results provide an overview of the file I/O performance under the given test conditions, highlighting the write operations, throughput, and latency metrics.
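
The write rate and the write throughput reported above imply the size of each write. A quick check, using variable names of my own:

```python
writes_per_second = 979.05
write_throughput_mib_s = 15.30

kib_per_write = write_throughput_mib_s * 1024 / writes_per_second
print(round(kib_per_write))  # 16
```

Each write was therefore about 16 KiB, so the throughput figure is simply the write rate multiplied by the block size.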



# Docker and Containers

https://www.youtube.com/watch?v=Sc9XOSTjFcU

1h41min

Virtual machines are a key technology for implementing IaaS. Operating-system-level virtualization is important for implementing PaaS; it is a lighter and faster form of virtualization.

OS-level virtualization does not use the concept of virtual machines (you cannot install your own OS in each virtualized system).

<img src="figs/aula03/container_x_vm.png"  style="width:80%;"/>

Containers share the base OS but have their own sets of libraries, utilities, root filesystems, view of the process tree, network, and so on. VMs each have their own copy of the OS. Containers impose less overhead than VMs, but they also provide less isolation.

Each container believes it has the OS to itself. A virtual machine believes it has everything to itself (not just the OS, but the hardware as well).

When you download an OS image for a container, the kernel does not come with it (the image can have its own root filesystem, libraries, and utilities, but not a kernel); a virtual machine, by contrast, has its own kernel. On a machine running containers, the kernel is shared.

In Unix-like operating systems, the user with ID 0 is known as the root user, who possesses the highest level of privileges and control over the system. The process with Process ID (PID) 1 is the initial process started by the kernel during the system boot sequence. This process, commonly known as "init" or "systemd" in modern Linux distributions, is responsible for mounting the root filesystem and initiating other essential system processes, thereby setting up the user space environment.

From within a container, it is not possible to see the processes of another container. Containers isolate their own processes, hiding any processes created within them from other containers. This isolation ensures that each container operates independently and securely, maintaining a clear separation of processes between containers.


## Namespaces



Namespaces are a Linux kernel feature that was introduced incrementally; PID namespaces, for example, arrived in kernel version 2.6.24 in 2008. They provide processes with their own system view, thus isolating independent processes from each other. In other words, namespaces define the set of resources that a process can use (you cannot interact with something that you cannot see). At a high level, they allow fine-grained partitioning of global operating system resources such as mount points, the network stack, and inter-process communication utilities. A powerful aspect of namespaces is that they limit access to system resources without the running process being aware of the limitations. In typical Linux fashion, they are represented as files under the `/proc/<pid>/ns` directory, where `<pid>` is the process ID; these files let you inspect and manage the namespaces.

Think of them as creating mini-systems within the larger operating system. Each of these mini-systems, or namespaces, has its own set of resources, like files, network interfaces, and communication tools. This means that processes running in different namespaces are kept separate and can't see or interfere with each other’s resources. A process in one namespace doesn't even know about the existence of resources in another namespace. 

The processes run as if they have full access to the system, but in reality, their view is limited to what the namespace allows. This is particularly useful for creating secure and isolated environments for applications.

Namespaces provide a way to create isolated environments within the same operating system, enhancing security and resource management, and they are crucial for technologies like containers, which rely on this isolation to function effectively.

<img src="figs/aula03/namespace1.png"  style="width:50%;"/>

Containers on Linux utilize the namespaces provided by the Linux kernel. On Linux, the containers are running as normal processes which share the same kernel as other processes. On Windows or Mac, one virtual machine is provided first. All the containers are running inside the virtual machine.

In computer science, everything can be represented by an object, including network devices, processes, threads, PID numbers, routing tables, file systems, etc. If we define a namespace object, then put the network device objects as a member of this namespace object, then we are able to restrict the scope of network devices to a certain namespace. The same goes for other objects.

Each process on Linux has a unique PID number. The processes are in a tree structure. When one process creates another, it becomes the parent of the other process. The first process on Linux is the init process which has PID=1.

<img src="figs/aula03/init.png"  style="width:35%;"/>

Processes can create a new `pid_namespace` and put their child processes in the new PID namespace. The new `pid_namespace` becomes a child of the parent `pid_namespace`. PID numbering in a new `pid_namespace` starts at 1.

<img src="figs/aula03/init2.png"  style="width:35%;"/>

When a process inside a `pid_namespace` sees itself, it sees its subjective PID from within its own `pid_namespace`. However, when another process outside of the `pid_namespace` sees it, the other process uses its own `pid_namespace` as reference. It sees a different PID number.

In the picture, we have a three-layer PID namespace hierarchy. The processes in the `child-namespace` see themselves having PID 1,2,3. However, processes from `init-namespace` see the processes in the `child-namespace` having PID 4,5,6.

Processes in the `grandchild-namespace` see themselves having PID 1,2,3. However, the processes from `child-namespace` see them having PID 4,5,6. The processes from `init-namespace` see them having PID 7,8,9.
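
The numbering described above can be reproduced with a small model in which every process receives one PID in its own namespace and one in each ancestor namespace. The `PidNamespace` class and `spawn` function are hypothetical helpers written for this illustration; they mimic the bookkeeping only and do not create real processes.

```python
class PidNamespace:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.next_pid = 1          # numbering restarts at 1 in every namespace

    def allocate(self):
        pid = self.next_pid
        self.next_pid += 1
        return pid


def spawn(namespace):
    """Create a process; it gets one PID in its namespace and in each ancestor."""
    pids = {}
    ns = namespace
    while ns is not None:
        pids[ns.name] = ns.allocate()
        ns = ns.parent
    return pids


init_ns = PidNamespace("init-namespace")
child_ns = PidNamespace("child-namespace", parent=init_ns)
grandchild_ns = PidNamespace("grandchild-namespace", parent=child_ns)

for _ in range(3):                 # PIDs 1-3, visible only in init-namespace
    spawn(init_ns)
child_procs = [spawn(child_ns) for _ in range(3)]
grandchild_procs = [spawn(grandchild_ns) for _ in range(3)]

print(child_procs[0])       # {'child-namespace': 1, 'init-namespace': 4}
print(grandchild_procs[0])  # {'grandchild-namespace': 1, 'child-namespace': 4, 'init-namespace': 7}
```

The same process thus carries a different PID depending on which namespace the observer lives in, matching the 1,2,3 / 4,5,6 / 7,8,9 pattern in the picture.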

### Two Mechanisms in the Linux Kernel on which Containers are Built:

**Namespaces:**  
Namespaces provide an isolated view of a global resource (such as the root filesystem) for a set of processes. Processes within a namespace see only their portion of the global resource. This isolation ensures that processes in different namespaces cannot see or interact with each other's resources.

**Cgroups:**  
Cgroups (Control Groups) are a way to define resource limits for a group of processes. They allow the allocation and restriction of resources like CPU time, memory, disk I/O, etc., ensuring that no single process can exhaust system resources.

Together, namespaces and cgroups allow us to isolate a set of processes into a "bubble" and set resource limits for them. This combination ensures both isolation and resource management for processes.

Implementations of containers like LXC and Docker leverage these mechanisms to build container abstractions. LXC is a general-purpose container, while Docker is optimized for single applications.

Frameworks like Docker Swarm or Kubernetes help manage multiple containers across hosts, providing features like automatic scaling, lifecycle management, and more.


Namespaces: group of Processes with an Isolated/Sliced View of a Global Resource

In Linux, all processes start in the default namespace. Through system calls, new namespaces can be created, and processes can be assigned to these namespaces.

#### Which Resources Can Be Isolated?

1. **Mount Namespace:** Isolates the filesystem mount points seen by a group of processes. The `mount()` and `umount()` system calls affect only the processes within this namespace.

2. **PID Namespace:** Isolates the PID number space seen by processes. For example, the first process in a new PID namespace has a PID of 1.

3. **Network Namespace:** Isolates network resources like IP addresses, routing tables, and port numbers. For example, processes in different network namespaces can reuse the same port numbers.

4. **UTS Namespace:** Isolates the hostname and NIS domain name.

5. **User Namespace:** Isolates the UID/GID number space. For example, a process can have a UID of 0 (root) within a namespace but have no privileges outside that namespace. UID mappings between the parent and child namespaces are specified.

6. **IPC Namespace:** Isolates POSIX message queues and other IPC resources.
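
The UID mapping mentioned for user namespaces can be sketched as follows. Each mapping line has three numbers (start of the range inside the namespace, start of the range outside, and range length), which is the layout of `/proc/<pid>/uid_map`. The helper names and the example mapping are assumptions made for illustration.

```python
def parse_uid_map(text):
    """Parse lines of the form '<inside-start> <outside-start> <length>'."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def uid_outside(uid_inside, mappings):
    """Translate a UID seen inside the namespace to the UID in the parent namespace."""
    for inside_start, outside_start, length in mappings:
        if inside_start <= uid_inside < inside_start + length:
            return outside_start + (uid_inside - inside_start)
    return None  # unmapped UID


# Hypothetical mapping: root inside the namespace is ordinary user 1000 outside.
mappings = parse_uid_map("0 1000 1\n")
print(uid_outside(0, mappings))  # 1000
print(uid_outside(1, mappings))  # None
```

With this mapping, a process that sees itself as UID 0 is treated as UID 1000 by the rest of the system, which is why it has no special privileges outside the namespace.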


The UTS (UNIX Time-Sharing) namespace allows you to modify the hostname and domain name of your machine. Each process can have a different hostname and domain name, which can be modified as needed. This feature provides isolation at the system identity level, ensuring that each process or container can operate with its own unique system identifiers.

Namespace is more powerful than `chroot()`, which isolates only the filesystem root.

The filesystem root in Linux, denoted by "/", is the topmost directory in the hierarchy of the filesystem. It serves as the starting point from which all other directories and files branch out. When you list the contents of the root directory, you typically see system-critical directories like `/bin`, `/etc`, `/home`, `/lib`, `/var`, and more. This root directory is fundamental to the organization and structure of the filesystem, acting as the anchor point for all other files and directories on the system.

The UID (User ID) and GID (Group ID) number space in Linux refer to the unique identifiers assigned to each user and group, respectively. Every user on a Linux system has a unique UID, and every group has a unique GID. These IDs are used by the system to determine ownership and permissions for files and processes. The UID/GID number space is essentially the range of possible values for these identifiers, allowing the system to manage and distinguish between different users and groups efficiently. 

For example, the root user typically has a UID of 0, which grants it superuser privileges. Other users will have different UIDs, ensuring that they have access only to their own files and resources. Similarly, groups are identified by GIDs, which help manage permissions and access control for multiple users collectively.

IPC stands for Inter-Process Communication. IPC resources in Linux are mechanisms that allow processes to communicate and synchronize their actions. Common IPC methods include message queues, semaphores, shared memory, and sockets. These resources are crucial for enabling different processes to work together by exchanging data or signaling events.

POSIX stands for Portable Operating System Interface. It is a family of standards specified by the IEEE for maintaining compatibility between operating systems. POSIX defines APIs and interfaces for system calls, file handling, IPC, and more, ensuring that software developed on one compliant system can run on another with minimal modification.



Namespaces are a more powerful and flexible mechanism for isolation in Unix-like operating systems compared to `chroot()`. While `chroot()` only isolates the filesystem root, essentially changing the apparent root directory for a process and its children, namespaces provide isolation across multiple aspects of the operating system environment. 

Namespaces can isolate:

1. **Process IDs (PID namespace)**: Each container or process group can have its own PID namespace, allowing processes to have the same PID in different containers without conflict.

2. **Mount points (Mount namespace)**: This allows each container to have its own filesystem structure, independent of the host or other containers.

3. **Network interfaces (Network namespace)**: Each container can have its own network interfaces, IP addresses, routing tables, and firewall rules.

4. **Interprocess Communication (IPC namespace)**: This isolates communication mechanisms like message queues and semaphores, so they are not shared between containers.

5. **UTS (UNIX Time-Sharing namespace)**: This allows each container to have its own hostname and domain name, providing isolation at the system identity level.

6. **User IDs (User namespace)**: This enables each container to have its own set of user and group IDs, enhancing security by allowing a process to be root (UID 0) inside the container while mapping to an unprivileged user on the host.

7. **Control groups (Cgroup namespace)**: This isolates a process's view of the cgroup hierarchy. The resource limits themselves (CPU, memory, disk I/O) are enforced by cgroups, not by the namespace.

By isolating these various aspects, namespaces provide a much more comprehensive and robust form of isolation compared to `chroot()`, which only limits the view of the filesystem. This makes namespaces ideal for creating containers and other isolated environments, enabling multiple isolated instances to run on a single host system without interfering with each other.

Namespaces are the essence of containerization and Docker, providing the foundational isolation required for containers. However, namespaces alone are not sufficient for Docker to function effectively. In addition to namespaces, Docker relies on other technologies such as cgroups for resource management, union file systems for efficient storage, and a robust networking stack to enable communication between containers and the outside world. These additional components work together to create the complete containerization platform that Docker provides, ensuring containers are isolated, resource-efficient, and easy to manage.

### Namespaces API

System Calls Related to Namespaces:

1. **clone()**:
    - Used to create a new process and place it in a new namespace. This is a more general version of `fork()`.
    - Example:
      ```c
      childPID = clone(childFunc, childStack, flags, arg);
      ```
    - The `flags` specify what should be shared with the parent and what should be created anew for the child (including virtual memory, file descriptors, namespaces, etc.).

2. **setns()**:
    - Allows a process to join an existing namespace. 
    - The arguments specify which namespace and its type.

3. **unshare()**:
    - Creates a new namespace and places the calling process in it. 
    - The `flags` indicate which namespace to create. 
    - Forking a process and calling `unshare()` is equivalent to `clone()`.

- Once a process is in a namespace, it can open a shell and perform other useful tasks within that namespace.
- By default, forked children of a process belong to the parent's namespace.


To effectively use namespaces in Linux, there are three crucial system calls: `clone`, `setns`, and `unshare`.

1. **clone**: This system call is a more versatile version of `fork`. It creates a new process and can place it into a new namespace. With `clone`, you can specify which namespaces (e.g., PID, network, mount, etc.) should be shared with the parent process and which should be created anew, allowing for fine-grained control over process isolation.

2. **setns**: This system call allows an existing process to join an already established namespace. Essentially, it instructs the process to enter a specified namespace, such as a PID, network, or mount namespace. The target namespace must exist beforehand, and this call changes the namespace context of the process to that of the specified namespace.

3. **unshare**: This system call enables a process to disassociate from its current namespace and create a new one. When a process calls `unshare`, it breaks away from its existing namespace, creating a new namespace for specified resources. This allows the process to have a separate environment for those resources. If a process forks and then calls `unshare`, it is effectively the same as using `clone` to create a new namespace from the start.

These system calls are fundamental for creating isolated environments, which are essential for containerization. They enable processes to operate in distinct namespaces, ensuring isolation and independence from the host system and other containers.
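
The combination of forking and then calling `unshare` can be tried from Python through `ctypes`. This is a minimal sketch that assumes a Linux host; the function name is my own, and creating a UTS namespace normally requires elevated privileges (such as running as root), so on an ordinary account the attempt is expected to fail cleanly and return `False`.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # flag value from <sched.h>


def try_hostname_in_new_uts_namespace(new_name):
    """Fork a child that unshares its UTS namespace and renames the host there.

    Returns True if the child succeeded, False if it was not permitted.
    """
    pid = os.fork()
    if pid == 0:                                   # child
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUTS) != 0:    # usually needs privileges
                os._exit(1)
            socket.sethostname(new_name)           # only affects the new namespace
            os._exit(0)
        except Exception:
            os._exit(1)
    _, status = os.waitpid(pid, 0)                 # parent
    return os.WEXITSTATUS(status) == 0


before = socket.gethostname()
succeeded = try_hostname_in_new_uts_namespace("sandbox")
print(succeeded, socket.gethostname() == before)
```

Whether or not the child succeeds, the parent's hostname is unchanged, because the rename happens only inside the child's new UTS namespace.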

Each application you run on your computer, like a web browser, text editor, or game, is a process. For example, when you open a web browser, the operating system creates a process to run it. Operating systems also run many background processes that manage hardware, perform scheduled tasks, and provide various system services. These include processes like `init`, which is the first process started by the kernel and remains running to manage system startup and shutdown.

The `fork` system call in Unix-like operating systems is used to create a new process by duplicating the existing process. The new process is referred to as the child process, while the original process is called the parent process. Here's a more detailed explanation:

When a process calls `fork`, the operating system creates a new process that is an exact copy of the parent process. This includes a copy of the parent’s memory, file descriptors, and execution state.
The `fork` call returns twice: once in the parent process and once in the child process.
- In the parent process, `fork` returns the Process ID (PID) of the newly created child process.
- In the child process, `fork` returns 0.

After the `fork` call, both the parent and child processes continue executing from the point where the `fork` call was made. They have separate address spaces, meaning changes in the memory of one process do not affect the other.

`fork` is commonly used to perform parallel execution. For example, a server might `fork` a new process to handle each incoming client request, allowing the server to handle multiple clients simultaneously.

Each process runs independently, providing isolation. This is crucial for tasks that require a high degree of separation between different operations.

Modern operating systems use a technique called copy-on-write (COW) for `fork`. This means that the parent and child processes share the same memory pages initially, and pages are copied only when they are modified, which makes `fork` more efficient. Since `fork` creates two processes running concurrently, proper synchronization mechanisms (like semaphores or mutexes) may be needed to coordinate shared resources between parent and child processes. The `fork` system call is a fundamental mechanism in Unix-like operating systems for process creation and is the basis for process control and management in these systems.
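
The two return values of `fork` and the separate address spaces can be observed with a short sketch using Python's `os.fork`, a thin wrapper over the system call on Unix-like systems. The function name `fork_demo` is my own.

```python
import os


def fork_demo():
    """Return (child's exit code, parent's value) to show separate address spaces."""
    value = 1
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Changing `value` here touches only the copy.
        value = 2
        os._exit(value)
    # Parent: fork() returned the child's PID.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), value


print(fork_demo())  # (2, 1)
```

The child reports its modified value through its exit code, while the parent still sees the original value, since the child's write went to its own copy of the page.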

### Namespace handles

The directory `/proc/PID/ns` of a process contains information about which namespace a process belongs to. These are symbolic links pointing to the inode of that namespace ("handle").

Example:
```sh
$ ls -l /proc/$$/ns
# $$ is replaced by shell's PID
total 0
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 ipc  -> ipc:[4026531839]
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 mnt  -> mnt:[4026531840]
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 net  -> net:[4026531956]
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 pid  -> pid:[402653186]
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 user -> user:[4026531837]
lrwxrwxrwx. 1 user user 0 Jan  8 04:12 uts  -> uts:[4026531838]
```

- A namespace handle can be used in system calls: for example, a file descriptor obtained by opening one of these links is passed as an argument to `setns`.
- Processes in the same namespace will have the same identifier. A new identifier will be created when a new namespace is created.
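The same handles can be read programmatically. The sketch below is a minimal example that assumes a Linux system with `/proc` mounted; the helper names `namespace_ids` and `same_namespace` are hypothetical and not part of any library.

```python
import os

def namespace_ids(pid="self"):
    """Return {entry name: 'type:[inode]'} for each link in /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    if not os.path.isdir(ns_dir):      # not Linux, or /proc is not mounted
        return ids
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue                   # link not readable by this user
    return ids

def same_namespace(pid_a, pid_b, kind):
    """True if both processes show the same identifier for namespace `kind`."""
    a = namespace_ids(pid_a).get(kind)
    b = namespace_ids(pid_b).get(kind)
    return a is not None and a == b

if __name__ == "__main__":
    for name, handle in namespace_ids().items():
        print(f"{name:20} -> {handle}")
```

Comparing the link targets of two processes, as `same_namespace` does, is a simple way to check whether they share a namespace.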


### PID Namespaces (1)

- **The first process to be created in a new PID namespace** will have `PID=1` and will act as the init process in that namespace.
    - This process will reap orphans in this namespace.

- **Processes in a PID namespace** get a separate PID number space.
    - Processes created after the init process get PIDs from 2 onwards.

- **A process can see all other processes in its own or nested namespaces**, but not in its parent namespace.
    - Example:
        - P2 and P3 are not aware of P1 (P2 sees its parent PID as 0).
        - P1 can see P2 and P3 in its namespace (with different PIDs).
        - P2 and P2' are the same process, just seen with different PIDs in different namespaces.

<img src="figs/aula03/pid1.png"  style="width:20%;"/>
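The visibility rule can be made concrete with a toy model. This is not kernel code: the classes `PidNamespace` and `Process` are hypothetical, and the PID values come from a simple counter, on the assumption that a process receives one PID in its own namespace and one in every ancestor namespace.

```python
class PidNamespace:
    """Toy PID namespace: a counter plus a link to the parent namespace."""

    def __init__(self, parent=None):
        self.parent = parent
        self.next_pid = 1

    def allocate(self):
        pid = self.next_pid
        self.next_pid += 1
        return pid

    def ancestors(self):
        """Yield this namespace, then its parent, grandparent, and so on."""
        ns = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Process:
    """Toy process holding one PID per namespace that can see it."""

    def __init__(self, name, namespace):
        self.name = name
        self.pids = {ns: ns.allocate() for ns in namespace.ancestors()}

    def pid_seen_from(self, observer_ns):
        """PID as seen from observer_ns, or None if the process is invisible there."""
        return self.pids.get(observer_ns)


if __name__ == "__main__":
    root = PidNamespace()
    child = PidNamespace(parent=root)
    p1 = Process("P1", root)
    p2 = Process("P2", child)      # first process in the child namespace: its init
    p3 = Process("P3", child)
    print(p2.pid_seen_from(child), p2.pid_seen_from(root))   # 1 2
    print(p1.pid_seen_from(child))                            # None
```

In this model P2 is PID 1 inside the child namespace and PID 2 in the parent, while P1 has no PID at all from the child's point of view.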


### PID Namespaces (2)

- **First process in a namespace acts as init and has special privileges**:
    - Other processes in the namespace cannot kill it.
    - If the init process dies, the namespace is terminated.
    - However, a process in the parent namespace (such as the parent process) can kill the init process.

- **Who reaps whom?**:
    - The init process is reaped by the parent in the parent namespace.
    - Other child processes are reaped by their parent in the same namespace.
    - Any orphan process in a namespace is reaped by the init process of that namespace.

<img src="figs/aula03/pid2.png"  style="width:20%;"/>

### PID Namespaces (3)

Namespace-related system calls behave slightly differently for PID namespaces. `clone()` creates a new namespace for the child as expected. However, `setns()` and `unshare()` do not change the PID namespace of the calling process. Instead, the children that the caller creates afterwards begin in the new PID namespace.

Why this difference? If the namespace changes, the PID returned by `getpid()` will also change. However, many programs assume that `getpid()` returns the same value throughout the life of the process. `getpid()` returns the PID in the namespace the process resides in.


### Mount Namespaces

When a Linux system boots up, it starts with an initial filesystem structure that contains the necessary directories and files for the operating system to function. This initial structure can be extended by mounting additional filesystems. For instance, a USB drive (pen drive) has its own independent directory structure. To access the contents of a USB drive in Linux, you need to mount it to a directory within the existing Linux filesystem. This is done using the mount command, which integrates the USB drive’s filesystem with the Linux filesystem.

Once the USB drive is mounted, its directory structure becomes part of the overall filesystem hierarchy of the system. This integration allows users to access and interact with files on the USB drive as if they were part of the original filesystem. Consequently, the new filesystem is a composite of the initial filesystem plus any additional filesystems that have been mounted, enabling seamless access and management of multiple storage devices within a unified directory structure.

- The root filesystem seen by a process is constructed from a set of mount points (syscalls `mount()` and `umount()`).

- The new mount namespace can have a new set of mount points:
    - This provides a new view of the root filesystem.

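- Mount points can be shared or private:
    - Mount and unmount events under a shared mount point are propagated to the other namespaces that share it, while events under a private one are not.
    - A child cloned into a new mount namespace starts with a copy of the parent's mount points. If the parent has made all of them private, whatever the child mounts or unmounts afterwards (including a new root filesystem) stays invisible to the parent.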

- Use mount namespaces to create a customized root filesystem for each container using a base `rootfs` image.


### Mount Namespaces and ps

- **How does `ps` work?**:
    - Linux has a special `procfs`, where the kernel populates information about processes.
    - Reading `/proc/PID/...` does not read a file from disk but fetches information from the operating system.
    - `procfs` is mounted at root as a special type of filesystem.

- P1 clones P2 into a new PID namespace but keeps the old mount namespace. We open a shell in the new PID namespace and run `ps`. We still see all the processes from the parent namespace. Why?
    - The `ps` command is still using `procfs` from the parent mount namespace.

- **How to make `ps` work correctly inside a PID namespace?**:
    - Place P2 in the new mount namespace and mount a new `procfs` at the root.
    - The new `procfs` at the mount point is different from the parent's `procfs`.
    - `ps` will now show only processes in this PID + mount namespace.
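The dependence of `ps` on `procfs` can be illustrated with a stripped-down process lister. This is only a sketch of the idea, not how the real `ps` is implemented; the function name `list_processes` and its `proc_root` parameter are assumptions made for the example.

```python
import os

def list_processes(proc_root="/proc"):
    """Return {pid: command name} by scanning a procfs mount point."""
    processes = {}
    if not os.path.isdir(proc_root):
        return processes
    for entry in os.listdir(proc_root):
        if not entry.isdigit():        # only numeric directories are processes
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                processes[int(entry)] = f.read().strip()
        except OSError:
            continue                   # process exited while we were scanning
    return processes

if __name__ == "__main__":
    for pid, name in sorted(list_processes().items()):
        print(f"{pid:>7} {name}")
```

The result depends entirely on which `procfs` is mounted at `proc_root`, which is why a shell in a new PID namespace keeps listing the parent's processes until a fresh `procfs` is mounted.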


### Network Namespaces (1)

- A network namespace can be created by cloning a process into a new namespace or simply via the command line:
    ```sh
    # ip netns add netns1
    ```

- The list of network namespaces can be viewed in `/var/run/netns`. You can use `setns()` to join an existing namespace.

- The command `ip netns exec` can be used to execute commands inside the network namespace, for example, to list all IP links:
    ```sh
    # ip netns exec netns1 ip link list
    1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    ```


### Network Namespaces (2)

- Any new network namespace only has a loopback interface. How to communicate with the rest of the network?

- Create a virtual Ethernet link (veth pair) to connect the parent namespace to the new child namespace:
    - Assign endpoints to two different namespaces.
    - Assign IP addresses to both endpoints.
    - Communicate through this link with the parent namespace.
    - Configure a bridge or NAT to connect to the wider Internet.

<img src="figs/aula03/network_namespaces2.png"  style="width:40%;"/>
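The veth steps above map onto a handful of `ip` commands. The sketch below only builds and prints them (a dry run) because running them requires root; the interface names `veth0`/`veth1` and the `10.0.0.x/24` addresses are arbitrary assumptions, and `netns1` is the namespace created earlier.

```python
import shlex
import subprocess

def veth_setup_commands(netns="netns1", host_if="veth0", child_if="veth1",
                        host_ip="10.0.0.1/24", child_ip="10.0.0.2/24"):
    """Build the `ip` commands that link the current namespace to `netns`."""
    in_ns = ["ip", "netns", "exec", netns]
    return [
        # Create the pair of connected virtual Ethernet endpoints.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", child_if],
        # Move one endpoint into the child namespace.
        ["ip", "link", "set", child_if, "netns", netns],
        # Address and enable the endpoint that stays in the parent namespace.
        ["ip", "addr", "add", host_ip, "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        # Address and enable the endpoint (and loopback) inside the namespace.
        in_ns + ["ip", "addr", "add", child_ip, "dev", child_if],
        in_ns + ["ip", "link", "set", child_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
    ]

def run(commands, dry_run=True):
    """Print each command; execute it too when dry_run is False (needs root)."""
    for argv in commands:
        print("#", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    run(veth_setup_commands())
```

Bridge or NAT configuration is left out of the sketch, since it depends on how the host is connected to the outside network.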

### Cgroups

Namespaces provide the isolation that containers need, but isolation alone does not stop one container from exhausting the machine. A machine has limited resources, such as CPU cores and memory. If processes or containers request more resources than are available, it can cause system instability and performance issues.

To prevent this, resource limiters are essential for each container. For example, you might create a new namespace but want to restrict it to use only a maximum of two CPU cores. These limitations ensure that each container operates within its designated resource boundaries, preventing resource contention and ensuring stable and efficient performance across the system.

By combining namespaces for isolation with resource control mechanisms, such as cgroups, you can effectively manage and allocate system resources, maintaining balance and preventing any single container from monopolizing the available resources.

- **Cgroup (Control Groups)**: A Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.

- **cgroups-v1**:
    - Development was started in 2006 by Google engineers Paul Menage and Rohit Seth, under the name "process containers".
    - Renamed to "control groups" in late 2007 to avoid confusion with other meanings of the word "container", and merged into the official kernel in version 2.6.24, released in January 2008. This original design is now known as cgroups-v1.

- **cgroups-v2**:
    - Development and maintenance were then taken over by Tejun Heo, who redesigned and rewrote cgroups starting in 2013.
    - This rewrite is now called cgroups-v2 and its documentation appeared in Linux 4.5 on March 14, 2016.

Namespaces allow us to isolate processes in a slice with respect to many resources: mount points, PID/UID number space, network endpoints, etc. Cgroups allow us to assign resource limits to a group of processes:
- Divide processes into groups and subgroups hierarchically.
- Assign resource limits to processes in each group/subgroup.

What resources can be limited?
- CPU, memory, I/O, CPU sets (which process can be executed on which CPU core), and so on.
- Specify what fraction of a resource can be used by each group of processes.

Cgroups can be organized as separate hierarchies for each resource, or as a combined hierarchy for multiple resources together.

<img src="figs/aula03/cgroups.png"  style="width:40%;"/>

### Creating Cgroups

Creating cgroups does not involve any new system calls; it is managed through the filesystem. A special cgroup filesystem is mounted at `/sys/fs/cgroup`. Within this filesystem, directories and subdirectories are created for different resources and different classes of users. 

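A new cgroup is created by making a directory inside this filesystem. To place a process in the cgroup, you write the PID of the "founding parent" task into the cgroup's `tasks` file (in cgroups-v2 the file is named `cgroup.procs`). All child processes that this task creates afterwards will also be in the same cgroup. For example, to assign a process with `browser_pid` to a cgroup, you would use the command: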
```sh
# echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks
```

Tasks can be assigned to leaf nodes in the hierarchy. If tasks are not explicitly placed into any hierarchy, they will belong to the default cgroup of the parent. This hierarchical structure allows for organized and efficient resource management, as each task or group of tasks can be limited and accounted for in terms of resource usage such as CPU, memory, and I/O. By managing these resources at the cgroup level, system administrators can ensure that no single process or group of processes can monopolize system resources, thereby maintaining overall system stability and performance.
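Because cgroups are driven through ordinary file operations, the same steps can be sketched with plain file I/O. The helpers `create_cgroup` and `add_task` below are hypothetical names; the `root` parameter is an assumption added so the sketch can be tried on a scratch directory instead of the real `/sys/fs/cgroup`, which requires root.

```python
import os

def create_cgroup(root, restype, userclass):
    """Create the directory <root>/<restype>/<userclass> and return its path."""
    path = os.path.join(root, restype, userclass)
    os.makedirs(path, exist_ok=True)
    return path

def add_task(cgroup_path, pid, tasks_file="tasks"):
    """Write a PID into the cgroup's tasks file (cgroups-v1 naming by default)."""
    with open(os.path.join(cgroup_path, tasks_file), "w") as f:
        f.write(str(pid))

if __name__ == "__main__":
    import tempfile
    # Scratch directory standing in for the cgroup filesystem.
    with tempfile.TemporaryDirectory() as scratch:
        group = create_cgroup(scratch, "cpu", "browsers")
        add_task(group, os.getpid())
        print(open(os.path.join(group, "tasks")).read())
```

On a real cgroups-v1 system the calls would point at `/sys/fs/cgroup` instead of a scratch directory, and on cgroups-v2 `tasks_file` would be `cgroup.procs`.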


### How to Create a Container?

To create a container, you first separate its environment logically using namespaces, which isolate the container's processes, network, and filesystem. You then use cgroups to limit the container's resource usage (CPU, memory, and I/O) so that it operates within its designated boundaries. Finally, tools like `chroot` or `pivot_root` set up a new root filesystem for the container, separate from the host system's filesystem.

In this sense, some argue that containers don't truly exist: a container is an ordinary process whose access to the system is restricted by combining existing kernel mechanisms. The environment is masked and manipulated to achieve the desired isolation and resource management, rather than a separate entity being created.

Suppose you want to run an application or shell in a container. How would you do it?

First, create separate namespaces for isolation. This includes creating namespaces for PID, mount points, network, and other necessary resources. These namespaces ensure that the process running inside the container is isolated from the host system and other containers.

Next, create a root filesystem compatible with the CPU's ISA (Instruction Set Architecture) and the operating system's binaries. This root filesystem should contain all the utilities, binaries, and configuration files necessary to run the application.

After setting up the root filesystem, a process enters the namespaces, mounts the root filesystem (rootfs), registers itself in the cgroups, and then executes the desired application or shell. This setup ensures that the application or shell runs in a fully isolated environment, effectively creating a "container."

By following these steps, you can successfully create and run applications in containers, leveraging the isolation and resource management features provided by namespaces and cgroups.
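One way to picture these steps is as a single command line built from the `unshare` and `chroot` utilities. The sketch below only assembles the argument list and does not run it; the function name `container_command` and the `/srv/rootfs` path are hypothetical, the cgroup registration step is omitted, and it assumes an `unshare` from util-linux that supports `--mount-proc`.

```python
import os
import shlex

def container_command(rootfs, program="/bin/sh"):
    """Build an argv that starts `program` in new namespaces, rooted at `rootfs`."""
    proc_dir = os.path.join(rootfs, "proc")
    return [
        "unshare",
        "--pid", "--fork",             # new PID namespace; forked child becomes PID 1
        "--mount",                     # new mount namespace
        "--net", "--uts", "--ipc",     # new network, hostname and IPC namespaces
        f"--mount-proc={proc_dir}",    # fresh procfs so `ps` sees only the container
        "chroot", rootfs,              # switch to the container's root filesystem
        program,
    ]

if __name__ == "__main__":
    print("#", shlex.join(container_command("/srv/rootfs")))
```

Running the printed command requires root and a populated root filesystem at the given path.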


### Container Frameworks

LXC and Docker are two widely used container solutions. LXC behaves like a lightweight virtual machine. Docker is broader in scope and supports developing entire applications.

Existing container frameworks, such as LXC and Docker, automatically handle the configuration of namespaces and cgroups, simplifying the process of container creation and management.

**LXC** (Linux Containers) provides a lightweight virtual machine-like environment. LXC containers use the standard shell interface of the operating system and leverage namespaces and cgroups to ensure isolation and resource management.

**Docker** containers are optimized for running a single application. The Docker configuration file specifies the base root filesystem along with all the necessary utilities to run a specific application. Docker runs the application in an isolated container environment, making it easy to package an application and all its dependencies and run it anywhere.

By using these frameworks, developers and system administrators can create and manage containers efficiently, without manually configuring the underlying namespaces and cgroups.


### Container Orchestration Frameworks

Kubernetes can run with Docker or with other container solutions.

**Docker Swarm** and **Kubernetes** are frameworks designed to manage multiple containers across various hosts. These frameworks handle the orchestration of containers, ensuring that they run efficiently and can scale as needed.

**Kubernetes** is a popular container orchestration framework. It operates across multiple <span style="color:green;">physical or virtual machines ("nodes")</span>, each containing multiple "pods". <span style="color:yellow;">A pod consists of one or more containers sharing the same network namespace and IP address</span>.

Pods are typically layers of a multi-tier application (e.g., "frontend", "backend", "database", "web server"). Kubernetes manages these nodes and pods, for instance, by instantiating pods on free nodes, automatically scaling pods when the load increases, and restarting pods when they crash.

Kubernetes facilitates the efficient and scalable deployment of containerized applications, ensuring high availability and resource optimization across a cluster of nodes.


### Summary

Containers provide lightweight isolation with less overhead than traditional virtual machines, since the operating system is not reinstalled for each one. They share the same kernel binary but have different root filesystems: on top of the shared kernel sit the basic utilities, libraries, and configuration that the application needs to run.

Containers are implemented using two Linux primitives: **namespaces** and **cgroups**. Namespaces provide isolation, ensuring that the containerized processes do not interfere with each other or the host system. Cgroups are used to enforce resource limits, ensuring that each container gets its fair share of CPU, memory, and I/O resources.

Frameworks like Docker, LXC, and Kubernetes build on these primitives to offer additional functionalities, such as simplified container management, orchestration, and scaling across multiple hosts.


### Docker

|                | Packaged Software | IaaS | PaaS | SaaS |
|----------------|-------------------|------|------|------|
| Applications   | x                 | x    | x    | ✓    |
| Data           | x                 | x    | x    | ✓    |
| Runtime        | x                 | x    | ✓    | ✓    |
| Middleware     | x                 | x    | ✓    | ✓    |
| OS             | x                 | x    | ✓    | ✓    |
| Virtualization | x                 | ✓    | ✓    | ✓    |
| Servers        | x                 | ✓    | ✓    | ✓    |
| Storage        | x                 | ✓    | ✓    | ✓    |
| Networking     | x                 | ✓    | ✓    | ✓    |

✓ : managed by vendor; x : managed by user

In this classification, Docker corresponds to PaaS.

VMware, KVM, and VirtualBox correspond to IaaS.

Middleware is software that provides common services and capabilities to applications outside of what's offered by the operating system. It enables communication and data management for distributed applications. Middleware essentially acts as a bridge between different software applications or between an application and the network. Some common functions of middleware include:
- **Message Oriented Middleware (MOM)**: Facilitates communication between different systems using messaging queues.
- **Database Middleware**: Connects applications to databases, enabling database access and management.
- **Application Servers**: Provides an environment for running and managing applications.
- **Web Servers**: Handles HTTP requests and responses, serving web pages to users.
- **Transaction Monitors**: Manages transactions to ensure data integrity across distributed systems.


A Docker image is a file used to execute code in a Docker container. Docker images act as a set of instructions to build a Docker container, much like a template. Docker images also act as the starting point when using Docker.

A Docker image contains application code, libraries, tools, dependencies, and other files needed to make an application run. When a user runs an image, it can become one or many instances of a container.

A Docker image has many layers. Each image includes everything needed to configure a container environment, including system libraries, tools, dependencies, and other files. The parts of an image include the following:

- **Base image**: The user can build this first layer entirely from scratch with the build command. A base image functions as the initial empty layer, facilitating the construction of Docker images from the ground up. While providing full control over image contents, base images are generally tailored for users with advanced Docker skills.

- **Parent image**: As an alternative to a base image, a parent image can be the first layer in a Docker image. It's a reused image that serves as a foundation for all other layers. A standard parent image typically consists of a bare-bones Linux distribution or comes with an installed service, such as a content management system or database management system.

- **Layers**: Layers are added on top of the base image by the build instructions that change the filesystem. The layers of an image can be listed with the `docker history` command in the command-line interface (CLI); on disk, their contents live under the Docker data directory (for example `/var/lib/docker/aufs/diff` when the `aufs` storage driver is in use). By default, `docker images` shows only top-level images, with their repository, tags, and file sizes; intermediate layers are cached and hidden. Docker storage drivers manage the image layer contents.

- **Container layer**: Running a Docker image creates not only a new container but also a writable layer, called the container layer. This layer hosts changes made to the running container, and it stores newly written and deleted files as well as changes to existing files. This layer is also used to customize containers.
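The relationship between read-only image layers and the writable container layer can be sketched with a toy model. This is not how Docker's storage drivers are implemented; the `Container` class, the `WHITEOUT` marker, and the file paths are all hypothetical and serve only to illustrate the lookup order.

```python
from collections import ChainMap

WHITEOUT = object()   # marker meaning "this path was deleted in the container layer"

class Container:
    """Toy container: shared read-only image layers plus one private writable layer."""

    def __init__(self, image_layers):
        # image_layers: list of {path: content} dicts, bottom layer first.
        self.container_layer = {}
        # ChainMap searches left to right: writable layer, then image layers top-down.
        self.view = ChainMap(self.container_layer, *reversed(image_layers))

    def read(self, path):
        content = self.view.get(path, WHITEOUT)
        if content is WHITEOUT:
            raise FileNotFoundError(path)
        return content

    def write(self, path, content):
        self.container_layer[path] = content      # image layers stay untouched

    def delete(self, path):
        self.read(path)                           # raises if the file does not exist
        self.container_layer[path] = WHITEOUT     # hide it without touching the image

if __name__ == "__main__":
    base = {"/etc/os-release": "demo-linux"}
    app = {"/app/main.py": "print('v1')"}
    c1, c2 = Container([base, app]), Container([base, app])
    c1.write("/app/main.py", "print('v2')")
    print(c1.read("/app/main.py"), "|", c2.read("/app/main.py"))
```

Two containers started from the same layers see each other's image files but not each other's changes, which is why one image can become many container instances.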

A Dockerfile is a text file that contains instructions for the Docker daemon to use when building a container image. It provides information on the commands to run, files to copy, startup command, and more.

Dockerfiles can be used to create automated builds that execute several command-line instructions in succession. They can also enable greater flexibility and portability of business applications. For example, IT organizations can use Dockerfiles to package applications and their dependencies in a virtual container that can run on premises, in public or private clouds, or on bare metal.

Docker supports over 15 different instructions in Dockerfiles, including:

- **FROM**: Refers to an existing image as the base for your build.
- **COPY**: Adds files and folders to your image's filesystem.
- **ENV**: Sets environment variables that will be available within your containers.
- **RUN**: Executes commands within the container at build time.
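- **USER**: Sets the user (and optionally the group) for subsequent instructions and for the running container.
- **WORKDIR**: Changes the current directory for subsequent instructions.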
- **ADD**: Copies files and directories to the image and creates a new layer.
- **CMD**: Sets default arguments for container start.
- **ENTRYPOINT**: Sets default command for container start.
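- **EXPOSE**: Documents the ports on which the running container listens; the ports are actually published with `docker run -p` or `-P`.
- **VOLUME**: Declares a directory in the image as a mount point that is backed by a volume on the host system when the container starts.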



<img src="figs/aula03/teste1.png"  style="width:50%;"/>

In virtual environments, each application runs on its own kernel. This is because virtual machines (VMs) provide complete isolation and include a full operating system instance with its own kernel. Containers, on the other hand, share the host system's kernel and do not run separate kernels for each application.


<img src="figs/aula03/teste2.png" style="width:50%;"/>

In containerization, images are used to create and deploy containers. These images contain everything needed to run a piece of software, including the code, runtime, libraries, and dependencies. Virtualization can also use images (often called VM images) to create virtual machines, but the term "images" is more commonly associated with containerization.


<img src="figs/aula03/teste3.png" style="width:50%;"/>

Containers share the host system's kernel but have their own isolated user space. This allows containers to run multiple isolated applications on the same host operating system while sharing the same kernel.


<img src="figs/aula03/teste4.png" style="width:50%;"/>

Virtualization uses a hypervisor to create and manage virtual machines. The hypervisor allows multiple operating systems to share a single hardware host by abstracting and partitioning the underlying hardware. Containerization, on the other hand, does not use a hypervisor; it relies on the host operating system's kernel to provide isolation and resource management.

## Build Your Own Docker with Linux Namespaces, cgroups, and chroot