In [2]:
import os

os.getcwd()

'/Users/prbnrs/GIT/Data-Engineering-ZoomCamp/HW/HW01'

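# Installation
```bash
# Either package provides the `psycopg2` module: psycopg2-binary ships a prebuilt wheel,
# while psycopg2 builds from source and needs the libpq headers (pg_config) and a compiler.
pip install psycopg2
pip install psycopg2-binary
```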

# Download data

```bash
mkdir -p data
wget https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-10.csv.gz -O data/green_tripdata_2019-10.csv.gz
```
```bash
wget https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv -O data/taxi_zone_lookup.csv
```
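If `wget` is not available, the same two files can be fetched from Python. This is a minimal sketch using only the standard library; `local_path` and `download` are hypothetical helper names, and `download` creates `data/` if it does not exist yet.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

URLS = [
    "https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-10.csv.gz",
    "https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv",
]


def local_path(url: str, dest_dir: str = "data") -> Path:
    """Map a release URL to data/<file name>, mirroring `wget -O data/...` (hypothetical helper)."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download(url: str, dest_dir: str = "data") -> Path:
    """Download url into dest_dir, creating the directory first (hypothetical helper)."""
    target = local_path(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # urllib follows GitHub's redirect from the release URL to the actual asset.
    urlretrieve(url, target)
    return target
```

Usage: `for url in URLS: download(url)` fetches both files into `data/`.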

---

Postgres server

```bash
docker run -d \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="tripdata" \
  -v $(pwd)/vol/postgres_data:/var/lib/postgresql/data \
  -p 5433:5432 \
  postgres:13
```
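The `-p 5433:5432` flag publishes the container's Postgres port 5432 on port 5433 of the host, so clients running on the host connect to `localhost:5433`. A small sketch that builds the connection URL passed to `create_engine` later in this notebook; `pg_url` is a hypothetical helper, and percent-encoding the credentials guards against characters such as `@` or `/` in a password.

```python
from urllib.parse import quote_plus


def pg_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a postgresql:// URL, percent-encoding the user name and password."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"


# From the host machine: use the published host port (5433), not the container port (5432).
HOST_URL = pg_url("root", "root", "localhost", 5433, "tripdata")
```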

---

Dockerfile:

```docker
FROM python:3.12.8

RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
RUN pip install pandas sqlalchemy psycopg2

WORKDIR /app
COPY ingest_data.py ingest_data.py

ENTRYPOINT [ "python", "ingest_data.py" ]
```

```bash
docker build -t test:hw01 .
```

```bash
docker run -it test:hw01
```

---

Ingest the data with `ingest_data.py` (run from the host)

```bash
URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-10.csv.gz"

python ingest_data.py \
    --user=root \
    --password=root \
    --host=localhost \
    --port=5433 \
    --db=tripdata \
    --table_name=green_tripdata \
    --url=${URL} \
    --dtype '{"lpep_pickup_datetime": "str", "lpep_dropoff_datetime": "str"}' \
    --parse_dates "lpep_pickup_datetime,lpep_dropoff_datetime"
```
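`ingest_data.py` itself is not shown in these notes. As a hedged sketch only, an argument parser accepting the flags used above could look like this; `build_parser` is a hypothetical name, and reading `--dtype` as a JSON object and `--parse_dates` as a comma-separated list are assumptions inferred from the command line, not the script's confirmed behaviour.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI matching the flags passed to ingest_data.py above."""
    parser = argparse.ArgumentParser(description="Ingest a CSV file into Postgres")
    parser.add_argument("--user", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--host", required=True)
    parser.add_argument("--port", type=int, required=True)
    parser.add_argument("--db", required=True)
    parser.add_argument("--table_name", required=True)
    parser.add_argument("--url", required=True)
    # Assumed format: JSON object mapping column names to pandas dtypes.
    parser.add_argument("--dtype", type=json.loads, default=None)
    # Assumed format: comma-separated column names to parse as datetimes.
    parser.add_argument(
        "--parse_dates",
        type=lambda s: [c.strip() for c in s.split(",") if c.strip()],
        default=None,
    )
    return parser
```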

```bash
URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv"

python ingest_data.py \
    --user=root \
    --password=root \
    --host=localhost \
    --port=5433 \
    --db=tripdata \
    --table_name=taxi_zone_lookup \
    --url=${URL}
```
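The core of an ingestion script like this is usually a chunked `read_csv` → `to_sql` loop, so the whole monthly file never has to sit in memory at once. A minimal sketch assuming pandas; `ingest` is a hypothetical helper, not the author's script. It is demonstrated against an in-memory SQLite connection only so it runs without a server; against Postgres you would pass the SQLAlchemy `engine` and a path or URL instead.

```python
import io
import sqlite3

import pandas as pd


def ingest(source, con, table_name, chunksize=100_000, dtype=None, parse_dates=None):
    """Load a CSV (path, URL or file-like) into table_name chunk by chunk.

    Returns the total number of rows written.
    """
    read_kwargs = {"chunksize": chunksize}
    if dtype is not None:
        read_kwargs["dtype"] = dtype
    if parse_dates:
        read_kwargs["parse_dates"] = parse_dates
    total = 0
    for i, chunk in enumerate(pd.read_csv(source, **read_kwargs)):
        # The first chunk recreates the table; later chunks append to it.
        mode = "replace" if i == 0 else "append"
        chunk.to_sql(table_name, con, if_exists=mode, index=False)
        total += len(chunk)
    return total


# Offline demo: an in-memory SQLite connection stands in for the Postgres engine.
demo = sqlite3.connect(":memory:")
rows = ingest(io.StringIO("a,b\n1,x\n2,y\n3,z\n"), demo, "demo", chunksize=2)
```

Replacing on the first chunk makes a rerun overwrite the table instead of duplicating rows.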

---

In [18]:
import pandas as pd
from sqlalchemy import create_engine

In [30]:
engine = create_engine('postgresql://root:root@localhost:5433/tripdata')

In [31]:
engine.connect()

<sqlalchemy.engine.base.Connection at 0x1257a2410>

In [32]:
q1 = '''
SELECT 1 as number;
'''

pd.read_sql(q1, con = engine)

|   | number |
|---|--------|
| 0 | 1      |


In [40]:
q1 = '''
SELECT
    table_schema || '.' || table_name as Table
FROM
    information_schema.tables
WHERE
    table_type = 'BASE TABLE'
AND
    table_schema NOT IN ('pg_catalog', 'information_schema');
'''

pd.read_sql(q1, con = engine)

|   | table                   |
|---|-------------------------|
| 0 | public.green_tripdata   |
| 1 | public.taxi_zone_lookup |

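Once both tables are listed, a quick sanity check is to count their rows. A sketch that works with any DB-API connection (for example `engine.raw_connection()` in this notebook); `row_counts` is a hypothetical helper. Table names are formatted directly into the SQL because identifiers cannot be bound as parameters, so only pass names taken from the catalog query above. SQLite stands in for Postgres in the demo.

```python
import sqlite3


def row_counts(con, tables):
    """Return {table: row count} using a plain DB-API connection."""
    counts = {}
    cur = con.cursor()
    try:
        for table in tables:
            # Identifiers cannot be query parameters, so the name is formatted in.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


# Offline demo with SQLite standing in for Postgres.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE zones (id INTEGER)")
demo.executemany("INSERT INTO zones VALUES (?)", [(1,), (2,)])
```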

---

pgAdmin:

```bash
docker run -it \
    -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
    -e PGADMIN_DEFAULT_PASSWORD="root" \
    -p 8080:80 \
    dpage/pgadmin4
```

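However, pgAdmin cannot reach the Postgres container yet: inside the pgAdmin container, `localhost` refers to pgAdmin itself, and containers on Docker's default bridge network cannot resolve each other by name.

To connect them, we put both containers on a shared user-defined network.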

## Form a network

Network:
```bash
docker network create pg-network
```

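Postgres server (stop the earlier Postgres container first: this one reuses host port 5433 and the same data directory):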
```bash
docker run -d \
    -e POSTGRES_USER="root" \
    -e POSTGRES_PASSWORD="root" \
    -e POSTGRES_DB="tripdata" \
    -v $(pwd)/vol/postgres_data:/var/lib/postgresql/data \
    -p 5433:5432 \
    --network=pg-network \
    --name pg-db \
    postgres:13
```

pgAdmin (likewise stop the earlier pgAdmin container, which holds port 8080):
```bash
docker run -it \
    -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
    -e PGADMIN_DEFAULT_PASSWORD="root" \
    -p 8080:80 \
    --network=pg-network \
    --name pg-admin \
    dpage/pgadmin4
```
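On a user-defined network, Docker's embedded DNS resolves container names, so other containers reach Postgres at `pg-db` on the container port 5432; `localhost:5433` only works from the host, where `-p 5433:5432` publishes it. The same applies when registering the server in pgAdmin (host name `pg-db`, port 5432). A small sketch of that rule; `pg_target` and `Target` are hypothetical names.

```python
from typing import NamedTuple


class Target(NamedTuple):
    host: str
    port: int


def pg_target(inside_pg_network: bool) -> Target:
    """Where a client should connect to the Postgres container in this setup."""
    if inside_pg_network:
        # Docker DNS resolves the --name of containers attached to pg-network.
        return Target("pg-db", 5432)
    # From the host, use the port published by -p 5433:5432.
    return Target("localhost", 5433)
```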

Set the source URLs:
```bash
URL_green_tripdata="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-10.csv.gz"
URL_zone_lookup="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv"
```

Load green_tripdata (inside `pg-network` the ingest container reaches Postgres by its container name `pg-db` on the container port 5432, not `localhost:5433`):
```bash
docker run -it --rm \
    --network=pg-network \
    --name py-notebook \
    test:hw01 \
        --user=root \
        --password=root \
        --host=pg-db \
        --port=5432 \
        --db=tripdata \
        --table_name=green_tripdata \
        --url=${URL_green_tripdata} \
        --dtype '{"lpep_pickup_datetime": "str", "lpep_dropoff_datetime": "str"}' \
        --parse_dates "lpep_pickup_datetime,lpep_dropoff_datetime"
```

Load zone_lookup (same host and port; a different container name avoids clashing with a leftover `py-notebook` container):
```bash
docker run -it --rm \
    --network=pg-network \
    --name py-zone-lookup \
    test:hw01 \
        --user=root \
        --password=root \
        --host=pg-db \
        --port=5432 \
        --db=tripdata \
        --table_name=taxi_zone_lookup \
        --url=${URL_zone_lookup}
```
