# Week 1 Notes

## Running Docker Containers

Run a docker container with an ubuntu env:<br>
```docker run -it ubuntu bash```

Run a docker container with an interactive terminal python env:<br>
```docker run -it python:3.9```

Same, but with entrypoint as bash:<br>
```docker run -it --entrypoint=bash python:3.9```

Once there, you can ```pip install pandas``` or whatever packages are needed.

However, these installation steps must be repeated every time: each new container starts from the image's original state, so packages installed in a previous container are gone.

## Using Dockerfile

A Dockerfile avoids the repeated installation by baking the steps into the image:

```Dockerfile
FROM python:3.9

RUN pip install pandas

ENTRYPOINT [ "bash" ]
```

To build the image, run:<br>
```docker build -t test:pandas .```<br>
General form: ```docker build -t <image-name>:<tag> <path-to-directory-containing-Dockerfile>```

To run the image:<br>
```docker run -it test:pandas```

## Using the Dockerfile and pipeline.py

To build the image:<br>
```docker build -t test:pandas .```<br>
General form: ```docker build -t <image-name>:<tag> <path-to-directory-containing-Dockerfile>```

To run the image with argument:<br>
```docker run -it test:pandas 2023-12-27```
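
The notes do not show `pipeline.py` or the Dockerfile changes that let the image accept an argument. As a minimal sketch, assuming the Dockerfile copies `pipeline.py` into the image and uses `ENTRYPOINT [ "python", "pipeline.py" ]`, the script could read the date from its first command-line argument. The function name `run_pipeline` and the message text are illustrative, not taken from the original script.

```python
import sys


def run_pipeline(argv):
    """Return a status message for the day passed as the first argument."""
    if len(argv) < 2:
        raise ValueError("expected a day argument, e.g. 2023-12-27")
    # Anything written after the image name in `docker run` is appended
    # to the entrypoint, so it shows up here as argv[1].
    day = argv[1]
    return f"job finished successfully for day = {day}"


if __name__ == "__main__":
    # Fall back to a sample date so the sketch also runs without arguments.
    args = sys.argv if len(sys.argv) > 1 else ["pipeline.py", "2023-12-27"]
    print(run_pipeline(args))
```

With an entrypoint like the one assumed above, `docker run -it test:pandas 2023-12-27` would hand `2023-12-27` to the script as `argv[1]`.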

## Docker Compose

```yaml
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
      # - <name-of-volume>:</path/to/data/in/container>
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
```

## Notes on docker run parameters

```bash
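# -e sets environment variables inside the container
# -v mounts a volume: /local/folder/location:/path/to/data/in/container
# -p maps ports: host_port:container_port
# The comments sit above the command because a comment placed between
# continued lines (after a trailing backslash) would end the command early.
docker run -it \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="ny_taxi" \
  -v "${PWD}/ny_taxi_postgres_data:/var/lib/postgresql/data" \
  -p 5432:5432 \
  postgres:13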
```

## Docker volume for postgres

```bash
# Create volume
docker volume create --name ny_taxi_postgres_data -d local

# Run docker container for db interactively
# The host port was changed to 5431 on the second run (format is host_port:container_port)
docker run -it \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="ny_taxi" \
  -v ny_taxi_postgres_data:/var/lib/postgresql/data \
  -p 5431:5432 \
  postgres:13
```
In a separate terminal:
```bash
# Connect to database from terminal using pgcli
pgcli -h localhost -p 5431 -u root -d ny_taxi
```
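
Before connecting, it can help to confirm that the published host port is reachable at all. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is made up for illustration, and a successful TCP connection only shows that something is listening on the port, not that Postgres is ready to accept queries.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # 5431 is the host port published by the docker run command above.
    print(port_is_open("localhost", 5431))
```

If this prints `False`, check that the database container is still running and that the `-p` mapping matches the port passed to `pgcli`.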

## Docker for pgadmin

```bash
docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -p 8080:80 \
  dpage/pgadmin4
```

## Connect containers using docker networks
```bash
# create network
docker network create pg-network

# Create volume
docker volume create --name ny_taxi_postgres_data -d local

# Run docker container for db interactively
docker run -it \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="ny_taxi" \
  -v ny_taxi_postgres_data:/var/lib/postgresql/data \
  -p 5431:5432 \
  --network=pg-network \
  --name pg-database \
  postgres:13

# Run docker container for pgadmin
docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -p 8080:80 \
  --network=pg-network \
  --name pgadmin \
  dpage/pgadmin4
```

Once logged in to pgAdmin with the credentials above, register a server with:<br>
name: Docker localhost<br>
host name/address: pg-database (the name of the database container)<br>
port: 5432 (the container port; the 5431 host mapping does not apply inside the network)<br>
username: root<br>
password: root<br>
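
The host and port to use depend on where the client runs, which is easy to mix up. As a small sketch of that rule, using the values from the commands above (the names `Target` and `postgres_target` are hypothetical):

```python
from typing import NamedTuple


class Target(NamedTuple):
    host: str
    port: int


def postgres_target(inside_network: bool) -> Target:
    """Pick the Postgres address for a client, based on where the client runs."""
    if inside_network:
        # Containers on pg-network reach each other by container name
        # and talk to the container port directly.
        return Target("pg-database", 5432)
    # From the host machine, use the published host port (-p 5431:5432).
    return Target("localhost", 5431)


print(postgres_target(inside_network=True))   # what pgAdmin (a container) uses
print(postgres_target(inside_network=False))  # what pgcli on the host uses
```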

## Running the ingest_data.py script

**The database container needs to be running for the following to work:**

```bash
URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz"

python3 ingest_data.py \
  --user=root \
  --password=root \
  --host=localhost \
  --port=5431 \
  --db=ny_taxi \
  --table_name=yellow_taxi_trips \
  --url="${URL}"
```
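
The notes do not include `ingest_data.py` itself. As a hedged sketch of how a script could accept the flags used above, the following builds an `argparse` parser with the same option names and assembles a Postgres connection URL from them. The helper names `build_parser` and `connection_url`, and the choice of making every flag required, are assumptions for illustration.

```python
import argparse


def build_parser():
    """Create a parser for the flags passed to the ingest script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    parser.add_argument("--user", required=True, help="Postgres user name")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", required=True, help="Postgres host")
    parser.add_argument("--port", required=True, type=int, help="Postgres port")
    parser.add_argument("--db", required=True, help="database name")
    parser.add_argument("--table_name", required=True, help="destination table")
    parser.add_argument("--url", required=True, help="URL of the CSV file")
    return parser


def connection_url(args):
    """Build a postgresql:// URL from the parsed arguments."""
    return f"postgresql://{args.user}:{args.password}@{args.host}:{args.port}/{args.db}"


# Parse the same values as the command above, passed as an explicit list.
example = build_parser().parse_args([
    "--user=root",
    "--password=root",
    "--host=localhost",
    "--port=5431",
    "--db=ny_taxi",
    "--table_name=yellow_taxi_trips",
    "--url=https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz",
])
print(connection_url(example))  # postgresql://root:root@localhost:5431/ny_taxi
```

In a real script, `parse_args()` would be called without a list so that the values come from the command line.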

## Ingesting Data using Docker
**NOTE:** The database and pgAdmin containers must be running.

```bash
# Store the data URL in a shell variable
URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz"

# Build the image using the Dockerfile
docker build -t taxi_ingest:v001 .

# Run the container
# Check the network name generated by docker compose (docker network ls)
docker run -it \
  --network=week_1_pg-network \
  taxi_ingest:v001 \
    --user=root \
    --password=root \
    --host=pgdatabase \
    --port=5432 \
    --db=ny_taxi \
    --table_name=yellow_taxi_trips  \
    --url=${URL}
```
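
To make the data flow concrete, here is a minimal sketch of chunked ingestion from a gzipped CSV into a table. It deliberately uses only the standard library, with `sqlite3` standing in for Postgres, so it is not the method used by `ingest_data.py`, which may rely on different libraries. The function name `ingest_csv_gz`, the sample columns, and the chunk size are all hypothetical.

```python
import csv
import gzip
import itertools
import os
import sqlite3
import tempfile


def ingest_csv_gz(path, conn, table_name, chunk_size=100_000):
    """Load a gzipped CSV into table_name in chunks; return the rows inserted."""
    total = 0
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Identifiers are double-quoted; row values are bound as parameters.
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')
        insert_sql = f'INSERT INTO "{table_name}" VALUES ({placeholders})'
        while True:
            # Read at most chunk_size rows so the whole file never sits in memory.
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            conn.executemany(insert_sql, chunk)
            conn.commit()
            total += len(chunk)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny sample file instead of downloading the real dataset.
        sample = os.path.join(tmp, "sample.csv.gz")
        with gzip.open(sample, "wt", newline="") as f:
            csv.writer(f).writerows([["id", "fare"], [1, 9.5], [2, 12.0], [3, 7.25]])
        conn = sqlite3.connect(":memory:")
        print(ingest_csv_gz(sample, conn, "yellow_taxi_trips", chunk_size=2))
        conn.close()
```

Committing after each chunk keeps memory use bounded and means a failure partway through leaves the earlier chunks in place.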

## docker-compose for pgadmin and postgres

**From this directory run:**
```bash
docker compose up
```

Open port 8080 in a browser for pgAdmin, log in with the credentials from docker-compose.yaml, and register a server with:<br>
name: Docker localhost<br>
host name/address: pgdatabase (the name of the database service in docker-compose.yaml)<br>
port: 5432<br>
username: root<br>
password: root<br>

## Ingest zone data

```bash

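# Option 1: run the script directly
# Note: the host name pgdatabase only resolves inside the Docker network;
# from the host machine, use localhost and the published port instead.
python3 ingest_zone_data.py \
  --user=root \
  --password=root \
  --host=pgdatabase \
  --port=5432 \
  --db=ny_taxi \
  --table_name=zones

# Option 2: build an image and run the script in a container
docker build -t taxi_zone_ingest:v001 .

# Run the container (use the image tag that was just built)
# Check the network name generated by docker compose (docker network ls)
docker run -it \
  --network=week_1_pg-network \
  taxi_zone_ingest:v001 \
    --user=root \
    --password=root \
    --host=pgdatabase \
    --port=5432 \
    --db=ny_taxi \
    --table_name=zones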
```

