# AUA, DS 229 – MLOps
### Week 12 – Docker Compose in all its beauty

***

## [PostgreSQL](https://www.postgresql.org/)


<center><img src="./images/postgresql.png" width=400 height = 80/></center>

PostgreSQL is a popular open-source relational database management system that stores data in a structured way. When you create a database in PostgreSQL, it creates a directory on your system's file system where all the data associated with that database is stored.

Within the database directory, PostgreSQL stores each table (and each index) in one or more files on disk. Each table has a name and consists of rows and columns. Rows are not stored in separate files: they are packed into fixed-size pages (8 kB by default) inside the table's files, and the column values are stored as fields within each row.

The most significant contrast between SQLite and other servers such as MySQL or Postgres is that SQLite is essentially a file which can be accessed with SQL, whereas **Postgres is a server that necessitates interaction**.

- **Server-based**: Postgres is typically hosted on cloud servers, such as those provided by Amazon or Google. This allows multiple users or applications to connect to it simultaneously and perform operations. *In contrast, imagine collaborating with someone in another country using a SQLite file. It would be challenging to determine where to store the file and establish a connection for both parties. With a Postgres server, each user can access it using a connection string that includes the IP and Port of the server, enabling them to establish a socket connection to the database*.

- **Full text search**: Postgres is capable of storing vector representations of textual data and performing lightning-fast queries on it. This feature is beneficial for various purposes, such as implementing autocomplete search fields on websites and conducting data science projects that involve natural language processing.

- **Data types**: Compared to SQLite, Postgres provides a more *extensive range of column data types*. Some examples of column types that are available in Postgres but not SQLite include:
    - *JSON*: allows you to store JSON documents and perform queries on them.
    - *MONEY*: stores currency amounts with a fixed fractional precision, which is convenient for values such as prices.
    - *Date* and *timestamp*: allows you to sort and index data based on dates and times, which is particularly useful for time series data.
    - *Inet/cidr*: enables you to store IP addresses, which can be helpful for certain web applications.
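
To make the connection string from the *Server-based* point concrete, here is a minimal sketch of how such a URL can be assembled and taken apart again using only the Python standard library. The host, credentials, and database name are made-up placeholders. The password is percent-encoded because characters such as `@` or `/` would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

# Hypothetical values, for illustration only.
username = "analyst"
password = "p@ss/word"
hostname = "203.0.113.10"  # Address from a range reserved for documentation.
port = 5432                # Default PostgreSQL port.
db_name = "shared_db"

# Percent-encode the password so "@" and "/" are not parsed as delimiters.
db_url = f"postgresql://{username}:{quote_plus(password)}@{hostname}:{port}/{db_name}"
print(db_url)

# Splitting the URL recovers the components a client needs to open a socket.
parts = urlsplit(db_url)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
print(unquote_plus(parts.password))
```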

***
Installation guides:
- [Windows](https://www.postgresqltutorial.com/postgresql-getting-started/install-postgresql/)
- [Linux](https://www.postgresqltutorial.com/postgresql-getting-started/install-postgresql-linux/)
- [MacOS](https://www.postgresqltutorial.com/postgresql-getting-started/install-postgresql-macos/)

> Set the PATH environment variable as done [here](https://stackoverflow.com/questions/36155219/psql-command-not-found-mac).
***

### Plan for today:
1) Play with PostgreSQL database.
2) Introduce `docker-compose`.
3) Run 2 services -- a Python API and a PostgreSQL db, where the API must support requests to write/read from the db.
4) Is there any dependency? How to handle it with docker?
5) Explore the API with swagger-ui. 
6) What if I change my code while the containers are still running? Maybe I need to rebuild them.
7) A few words about "Shadow mode deployment".
***

### HTTP request methods

**When developing a web application, deciding which HTTP request method (GET, POST, PUT, DELETE, etc.) to use for a particular operation depends on the nature of the operation and the conventions of the HTTP protocol. Here's a general guideline to help you make this decision**:

- **GET**: This method should be used when you want to retrieve data from the server. In general, GET requests should not have side effects on the server, meaning they should not change any data or state on the server. For example, when you want to retrieve a webpage, a list of products, or some user information, you would typically use a GET request.

- **POST**: This method should be used when you want to create new data on the server. POST requests usually involve submitting a form or uploading a file, and they may modify the state of the server by adding new data to a database or initiating a new process. In general, a POST request should be used when you are creating new data on the server.

- **PUT**: This method should be used when you want to update existing data on the server. A PUT request usually involves submitting data to update an existing resource or record. In general, a PUT request should be used when you are modifying an existing resource on the server.

- **DELETE**: This method should be used when you want to delete data from the server. A DELETE request usually involves deleting an existing resource or record. In general, a DELETE request should be used when you want to remove an existing resource from the server.

In summary, GET requests should be used for retrieving data, POST requests should be used for creating new data, PUT requests should be used for updating existing data, and DELETE requests should be used for deleting existing data. However, it's important to note that these guidelines are not strict rules, and the conventions may vary depending on the specific application and its requirements.
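
The conventions above can be sketched with a toy in-memory store. The `handle` function and the status codes it returns are illustrative assumptions, not the API built later in this notebook; a real service would delegate this routing to a web framework.

```python
def handle(method, store, key, value=None):
    """Apply one request to an in-memory store and return (status_code, body)."""
    if method == "GET":       # Read only: the store is never modified.
        return (200, store[key]) if key in store else (404, None)
    if method == "POST":      # Create a new record.
        if key in store:
            return 409, None  # Conflict: the record already exists.
        store[key] = value
        return 201, value
    if method == "PUT":       # Update an existing record.
        if key not in store:
            return 404, None
        store[key] = value
        return 200, value
    if method == "DELETE":    # Remove an existing record.
        if key not in store:
            return 404, None
        del store[key]
        return 204, None
    return 405, None          # Method not allowed.


users = {}
print(handle("POST", users, "murzik", {"age": 65}))
print(handle("GET", users, "murzik"))
print(handle("PUT", users, "murzik", {"age": 66}))
print(handle("DELETE", users, "murzik"))
print(handle("GET", users, "murzik"))
```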

Now let's connect to PostgreSQL on the local machine, create a database and a table, and query the data.

In [None]:
import base64  # For password encoding (obfuscation, not encryption).
import pandas as pd
from sqlalchemy import create_engine, MetaData, inspect, Column, Integer, String
from sqlalchemy_utils import database_exists, create_database, drop_database
from sqlalchemy.orm import declarative_base, sessionmaker, scoped_session

Use the following code to hide your password in the code. This is not real protection (Base64 is an encoding, not encryption, and anyone can decode it), but we use it here for its simplicity.

```python
<ENCODED-PASSWORD> = base64.b64encode(<PASSWORD>.encode("utf-8")).decode("utf-8")
<PASSWORD> = base64.b64decode(<ENCODED-PASSWORD>).decode("utf-8")
```
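
A runnable round trip with a made-up password illustrates why this is obfuscation rather than encryption: the encoded string can be turned back into the original without any key.

```python
import base64

password = "not-a-real-password"  # Made-up value, for illustration only.

# str -> bytes -> Base64 bytes -> str
encoded = base64.b64encode(password.encode("utf-8")).decode("utf-8")

# Anyone holding `encoded` can reverse the step above.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == password)
```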

A widely used tool for storing sensitive information is [Vault](https://www.vaultproject.io/), which we are not going to cover. Another option is [Docker secrets](https://docs.docker.com/engine/swarm/secrets/).

In [None]:
# Credentials.
username = "postgres"
password = base64.b64decode("RGF2JHlhbjNpZDAw").decode("utf-8")  # Returns a string.
hostname = "localhost"
port = "5432"
db_name = "aua_mlops_test_db"

# Constructing URL for PostgreSQL.
DB_URL = f"postgresql://{username}:{password}@{hostname}:{port}/{db_name}"

# Create (if necessary) and connect.
engine = create_engine(DB_URL, pool_recycle=3600)
if not database_exists(engine.url):
    print("Creating the database..", end=" ")
    create_database(engine.url)
    print("done")

In [None]:
# Printing the table names.
def get_table_names(engine):
    """Print the names of the tables that exist in the connected database."""
    inspector = inspect(engine)
    print("Registered tables:", inspector.get_table_names())
    

get_table_names(engine)

In [None]:
Base = declarative_base()
metadata = Base.metadata


class TestTable(Base):
    __tablename__ = "test_table"
    
    row_id = Column(Integer, primary_key=True, autoincrement=True)  # Starts from 1.
    name = Column(String(100))
    age = Column(Integer)
    
    def __repr__(self):
        return f"<TestTable(row_id={self.row_id}, name={self.name}, age={self.age})>"

In [None]:
# Creating the table(s).
metadata.create_all(bind=engine)
get_table_names(engine)

In [None]:
# Defining session object to run queries.
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)

In [None]:
# Fill in some data.
samples = [
    TestTable(name="Murzik", age=65), 
    TestTable(name="Ernesto Parpeci", age=68),
    TestTable(name="Voldemort", age=13)
]
print("An example of a sample:", samples[0])

with Session() as sess:
    sess.add_all(samples)
    sess.commit()

In [None]:
# Read data.
with Session() as sess:
    result = sess.query(TestTable.name, TestTable.age).all()
    
print(result)

In [None]:
# Read data as pandas DataFrame.
with Session() as sess:
    # Reuse the session's own connection instead of opening one that is never closed.
    data = pd.read_sql_query(sql=sess.query(TestTable.name, TestTable.age).statement,
                             con=sess.connection())
    
data

<div class="alert alert-block alert-danger">
<b>Action</b>:
    <b>Open pgAdmin 4 and visualize the table.</b>
</div> 

In [None]:
# Drop the table:
engine = create_engine(DB_URL, pool_recycle=3600)
TestTable.__table__.drop(bind=engine)
get_table_names(engine)

# Drop the database:
drop_database(DB_URL)
if not database_exists(engine.url):
    print("Successfully dropped the database.")
    
engine.dispose()  # Close the connection to the db.

# [Docker Compose](https://docs.docker.com/compose/)

<center><img src="./images/docker_compose.png" width=400 height = 80/></center>

Docker Compose is a tool that allows you to **define and run multi-container Docker applications**. It is used to manage the configuration and orchestration of multiple Docker containers as a single application, making it easier to deploy and manage complex applications.

With Docker Compose, you can define the services that make up your application in a YAML file. Each service can be defined with its own configuration options, including image, environment variables, ports, and volumes. You can also define dependencies between services, so that one service can be started only after another service is running.

Once you have defined your application in a Compose file, you can use the `docker-compose` command to start, stop, and manage your containers. Docker Compose will automatically create and manage the necessary containers and networks to run your application, based on the configuration specified in the Compose file.

<center><img src="./images/docker_compose_arch.png" width=700 height = 80/></center>

[[Image source](https://www.biaudelle.fr/docker-compose/)]

A Docker Compose file, usually named `docker-compose.yml`, is written in YAML format and consists of one or more services that define the containers, volumes, and networks that make up an application.

- The `version` field specifies the version of the Compose file format. In this example, we are using version 3.8. Recent releases of Docker Compose treat this field as obsolete and ignore it.
- The `services` section defines the containers that make up our application. Each service has a name, which is used to refer to it within the Compose file and other Docker commands.
- The `environment` section of each service is used to set environment variables.
- The `volumes` section defines named volumes that can be used by the services.

Docker Compose files can be much more complex depending on the needs of the application. However, the basic structure remains the same: define the services that make up the application, configure the containers, volumes, and networks, and specify any dependencies between services.

**Commands**:
- `docker-compose up`: This command starts the containers defined in the `docker-compose.yml` file. If the containers don't exist, Docker Compose will create them. The `up` command also builds any images that need to be built and attaches the containers to a network.
- `docker-compose down`: This command stops and removes the containers defined in the `docker-compose.yml` file, together with the networks created by `up`. Named volumes are kept by default; add the `-v` (`--volumes`) flag to remove them as well.
- `docker-compose ps`: This command lists the containers started by Docker Compose, along with their status.
- `docker-compose logs`: This command shows the logs of the containers started by Docker Compose. By default, it shows the logs of all containers. You can specify a service name to show only the logs of that service.
- `docker-compose build`: This command builds the images for the services defined in the `docker-compose.yml` file.
- `docker-compose exec`: This command runs a command inside a running container. You can specify the service name and the command to run. For example, `docker-compose exec web bash` will start a shell inside the web container.
- `docker-compose restart`: This command restarts the containers defined in the `docker-compose.yml` file. You can specify a service name to restart only that service.

### Examine the `docker-compose.yml` file

#### app

```
├── app/
│   ├── requirements.txt
│   ├── tables.py
│   ├── prepare_db.py
│   ├── endpoints.py
│   ├── swagger.yml.py
│   ├── openapi_main.py
│   ├── start_service.sh
│   └── Dockerfile
```


`start_service.sh`

```bash
#!/bin/bash

set -e

exec python prepare_db.py &  # Creates a table in the database (runs in the background).
exec python openapi_main.py  # Runs the API.
```

**`set -e`**   
> `set -e` is a command in a Linux bash script that tells the shell to exit immediately if any command in the script fails (i.e., returns a non-zero exit status).

> By default, if a command in a bash script fails, the script will continue running unless you specifically handle the error with an if statement or some other mechanism. However, in some cases, you might want the script to stop immediately if any command fails, to prevent any further damage or unwanted behavior.

> In this example, `set -e` is used at the beginning of the script to ensure that any command that fails will cause the script to immediately exit.

> Note that `set -e` can be disabled in a specific part of the script by using `set +e`. This can be useful if you want to handle errors in a specific section of the script without exiting the script entirely.

> It's generally a good practice to use `set -e` in your bash scripts to ensure that errors are caught early and the script doesn't continue executing with potentially invalid or incomplete data.

**`exec`**  
> In a bash script, `exec` is a command that is used **to replace the current shell process with a new process**. This can be useful in situations where you want to run a new command or script, but you don't want to create a new process for it as done by default.

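> Note that `exec` can also be used without specifying a command. In that case the shell is not replaced; instead, any redirections given on the same line (for example `exec > output.log`) take effect for the rest of the script.

> Note that if we remove the `exec` part from the above commands, the API starts in the same way 😊. The difference is in process handling: with `exec`, the final Python process replaces the shell, so it receives signals (such as the `SIGTERM` sent by `docker stop`) directly.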

**`&`**  
> In a bash script, you can run a command **in the background** by adding an ampersand (`&`) at the end of the command. **The script then continues executing without waiting for the command to complete.**

> Note that a command run in the background still writes to the same terminal by default, so its output may be interleaved with the output of other commands. You can redirect the output to a file using the `>` operator or pipe it to another command using the `|` operator.
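
For comparison, here is a rough Python analogue of the pattern in `start_service.sh`: one command is started in the background, a second runs in the foreground, and the script then collects the background exit code explicitly. The two inline commands are stand-ins for `prepare_db.py` and `openapi_main.py`, not the actual scripts.

```python
import subprocess
import sys

# Stand-in for `python prepare_db.py &`: Popen returns immediately.
background = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.2); print('db prepared')"]
)

# Stand-in for `python openapi_main.py`: run() blocks until the command exits.
foreground = subprocess.run(
    [sys.executable, "-c", "print('api running')"], check=True
)

# Unlike a bare `&` in bash, the background exit code is checked explicitly here.
background_code = background.wait()
print("background exit code:", background_code)
print("foreground exit code:", foreground.returncode)
```

In the shell script, a failure of the background command does not stop the script even with `set -e`, which is one reason to check the exit code explicitly when it matters.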

#### database

```yml
version: '3.8'

services:

  app:
    build:
      context: ./app
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes: 
      - ./app:/app
    depends_on:
      database:
        condition: service_healthy

  database:
    image: postgres:15.2-alpine
    restart: always
    environment:
      POSTGRES_USER: docker_db_user
      POSTGRES_PASSWORD: the_most_secure_pass
      POSTGRES_DB: aua_mlops_test_db
    ports:
      - "5454:5432"
    volumes: 
      - database:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
      
volumes:
  database:
```

**Volumes**  
> Docker volumes are a way to store persistent data used by Docker containers outside of the container's filesystem. Volumes are created and managed by Docker and can be shared among multiple containers, allowing data to persist even when a container is deleted or recreated.

**environment**  
> In a Docker Compose file, the `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` environment variables are used to configure a PostgreSQL container.

> `POSTGRES_USER`: This environment variable sets the name of the superuser that is created when the database is first initialized. If it is not set, the default name *postgres* is used.

> `POSTGRES_PASSWORD`: This environment variable sets the password for that superuser. The official image requires it to be set before it will initialize a database.

> `POSTGRES_DB`: This environment variable sets the name of the default database that will be created when the PostgreSQL container first starts. If it is not set, the value of `POSTGRES_USER` is used (so **postgres** when neither variable is set).

**`restart: always`**  
> In a Docker Compose file, the `restart: always` option for a PostgreSQL service specifies that the container should be restarted automatically whenever it exits, for example after a crash, and also when the Docker daemon restarts.

> Note that a restarted container does not resume from the point where it stopped. Stopping a container terminates its processes; starting it again runs the container's command from the beginning, while the container's filesystem and any mounted volumes keep their contents.

> This option is particularly useful for services that need to be highly available and should always be running. By using the `restart: always` option, you can ensure that your PostgreSQL service is automatically restarted in case of any unexpected downtime.



### Networks and dependencies

<center><img src="./images/bridge_network.png" width=500 height = 80/></center>

<div class="alert alert-block alert-success">
When you run <code>docker-compose up</code>, Docker Compose creates a network for your services and connects the containers to that network. <b>This allows the containers to communicate with each other using their service names as the hostnames, without the need to publish ports to the host or use external IP addresses</b>.

By default, Docker Compose creates a <b>bridge</b> network for your services and names it after your project. For example, if your project is named <b>myapp</b>, the default network name would be <b>myapp_default</b>.
</div> 
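
As a sketch of what this means for the app container: it would reach PostgreSQL at host `database` (the service name) on container port 5432, not at `localhost:5454`, which is the mapping published to the host machine. The environment variable names `DB_HOST` and `DB_PORT` below are assumptions for illustration. The `docker-compose.yml` in this notebook does not define them for the app service, and the actual `prepare_db.py` may build its URL differently.

```python
import os


def build_db_url(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables with fallbacks."""
    user = env.get("POSTGRES_USER", "docker_db_user")
    password = env.get("POSTGRES_PASSWORD", "the_most_secure_pass")
    db_name = env.get("POSTGRES_DB", "aua_mlops_test_db")
    # Hypothetical variables: default to the Compose service name and the
    # port PostgreSQL listens on inside the container.
    host = env.get("DB_HOST", "database")
    port = env.get("DB_PORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{db_name}"


# From inside the app container (service-name lookup on the Compose network).
print(build_db_url({}))

# From the host machine, through the published port mapping "5454:5432".
print(build_db_url({"DB_HOST": "localhost", "DB_PORT": "5454"}))
```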
    
The `depends_on` option in Docker Compose is used **to specify the order in which services should be started or stopped**. It's useful when you have multiple services that depend on each other and need to be started or stopped in a specific order.

When you run `docker-compose up`, Compose will start the services in the order specified by `depends_on`. In our case, it will first start the **database** service, and then the **app** service (once the **database** service is up – read the below statement carefully).

<mark>**IMPORTANT**</mark>  
It's important to note that, in its basic form (a plain list of service names), `depends_on` only **specifies the order in which services should be started or stopped**. **It doesn't wait for the services to be fully available or ready before starting the dependent service**.

<mark>But how to wait for the service to be available/up ?</mark>

**In Docker, a health check is a command that a container runs to determine whether or not it is healthy. A healthy container is one that is able to respond to requests and perform its intended function**. A health check is a way for Docker to monitor the status of a container; Compose and orchestrators can then act on that status, for example by delaying the start of dependent services.

The `service_healthy` condition in Docker Compose is used to wait for a service to be healthy before proceeding with the start-up of dependent services. In our example we have an **app** service that depends on a **database** service. You can define a health check for the **database** service using `healthcheck`, and then use `condition: service_healthy` under `depends_on` to make sure that the **database** service is healthy before starting the **app** service. This is what we have done in our `docker-compose.yml`.

By using health checks and the `service_healthy` option, you can ensure that your services are running correctly before proceeding with the start-up of dependent services.
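
Health checks are handled by Docker itself, but the same idea can be sketched on the application side as a retry loop. The helper `wait_for_port` below is hypothetical (it is not part of Docker, Compose, or this project). It mirrors the `interval`, `timeout`, and `retries` settings of the health check and only tests that a TCP port accepts connections, which is a weaker signal than `pg_isready`.

```python
import socket
import time


def wait_for_port(host, port, interval=1.0, timeout=5.0, retries=5):
    """Return True once a TCP connection succeeds, False after all retries fail."""
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            print(f"Attempt {attempt}/{retries} failed, retrying in {interval}s")
            time.sleep(interval)
    return False


# From the host machine the database is published on port 5454; inside the
# app container the host would be the service name "database" and the port 5432.
ready = wait_for_port("localhost", 5454, interval=0.1, timeout=0.5, retries=2)
print("database reachable:", ready)
```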

***

The `--build` flag of `docker-compose up` is used to build (or rebuild) the Docker images for services defined in a Docker Compose file before starting the containers. It is useful when you have changed your application's code or dependencies and want to rebuild the Docker images that your services are based on.

<div class="alert alert-block alert-danger">
<b>Action</b>:
    <b>Open a terminal and run</b>: <code>docker-compose up --build</code>
</div> 

1) Visit http://localhost:8000/ui
2) Explore the API request methods and play with it

In [None]:
###################################################################
########## If you want to connect to the db on container ##########
###################################################################

### You can use the following credentials to connect to the database
### that is running on a container and execute, for example, all
### the operations provided in the starting part of this notebook.

# username = "docker_db_user"
# password = "the_most_secure_pass"
# hostname = "localhost"
# port = "5454"
# db_name = "aua_mlops_test_db"

<div class="alert alert-block alert-danger">
<b>Action</b>:
    <b>Connect to the database on container using pgAdmin 4.</b>
</div> 

<mark>Note</mark>: If you decide to change the username or password above (in `docker-compose.yml`), you will first need to remove the volume (`docker volume rm <VOLUME-NAME>`) defined in `docker-compose.yml`, because these variables are only applied when PostgreSQL initializes an empty data directory.

# Shadow mode deployment

Shadow mode deployment is a technique used in MLOps (Machine Learning Operations) to validate the performance of a new version of a machine learning model before it is fully deployed to production.

In this approach, the new version of the model is deployed in a "shadow" mode alongside the current production model. **The shadow model receives the same inputs as the current model and produces its own predictions, but these predictions are not used by the application**.

**Instead, the predictions made by the shadow model are compared to those made by the current model to identify any discrepancies. This allows developers to validate the performance of the new model against the current production model, and to make sure that the new model is producing accurate and consistent results**.

<center><img src="./images/shadow_mode.jpeg" width=500 height = 100/></center>

[[image source](https://christophergs.com/machine%20learning/2019/03/30/deploying-machine-learning-applications-in-shadow-mode/)]

Shadow mode deployment is particularly useful when deploying changes to models that have a high impact on business outcomes or user experience, such as models that drive recommendations or predictions for critical business decisions. It helps reduce the risk of unintended consequences and ensures that the new model performs as expected before it is fully deployed to production.
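
The flow described above can be sketched in a few lines. Both "models" below are toy threshold functions and the in-memory `shadow_log` is a stand-in for a real logging or monitoring backend; all names are assumptions for illustration, not part of the course code.

```python
def production_model(features):
    """Stand-in for the model currently serving traffic."""
    return sum(features) > 1.0


def shadow_model(features):
    """Stand-in for the candidate model under evaluation."""
    return sum(features) > 1.5


shadow_log = []  # In practice this would go to a database or monitoring system.


def predict(features):
    """Serve the production prediction and record the shadow one on the side."""
    served = production_model(features)
    try:
        shadowed = shadow_model(features)
        shadow_log.append({"features": features, "production": served,
                           "shadow": shadowed, "agree": served == shadowed})
    except Exception as error:  # A failing shadow model must not affect users.
        shadow_log.append({"features": features, "error": repr(error)})
    return served  # Only the production output reaches the caller.


requests = [[0.2, 0.3], [0.9, 0.4], [1.0, 1.0]]
responses = [predict(features) for features in requests]
agreement = sum(entry["agree"] for entry in shadow_log) / len(shadow_log)
print(responses)
print(f"agreement rate: {agreement:.2f}")
```

Wrapping the shadow call in `try`/`except` reflects the main design constraint of shadow mode: whatever the candidate model does, the response seen by the user comes only from the production model.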

# References
- [SQLAlchemy](https://docs.sqlalchemy.org/en/20/)
- [PostgreSQL Engine Configuration](https://docs.sqlalchemy.org/en/20/core/engines.html)
- [An example of creating a simple RESTful App using OpenAPI, Flask & Connexions](https://haseebmajid.dev/posts/2019-08-16-creating-a-simple-restful-app-using-openapi-flask-connexions/)
- [Communication between docker containers](https://docs.docker.com/compose/networking/)
- [Service startup control](https://docs.docker.com/compose/startup-order/)
- Docker [depends_on](https://docs.docker.com/compose/compose-file/05-services/#depends_on) and [healthcheck](https://docs.docker.com/compose/compose-file/05-services/#healthcheck)