## Question 1. Understanding docker first run

Run docker with the ```python:3.12.8``` image in an interactive mode, use the entrypoint bash.

What's the version of pip in the image?

---------------------------------------

Inside ```/dez/homeworks/module_1```, I created a Dockerfile that pulls the ```python:3.12.8``` image and sets ```["pip", "--version"]``` as its entrypoint. After building and running the container, it prints ```pip 24.3.1 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)```.
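
If the version number needs to be picked out of that output programmatically, a minimal sketch could look like the following. The sample line and the helper name `parse_pip_version` are my own assumptions for illustration, written in the usual `pip --version` output format.

```python
import re

# Sample line in the usual `pip --version` format (assumed input for this sketch).
SAMPLE = "pip 24.3.1 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)"


def parse_pip_version(line: str) -> str:
    """Return the version number from a `pip --version` output line."""
    match = re.match(r"pip (\S+) from ", line)
    if match is None:
        raise ValueError(f"unexpected pip --version output: {line!r}")
    return match.group(1)


print(parse_pip_version(SAMPLE))
```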

**Final Answer: 24.3.1**

## Question 2. Understanding Docker networking and docker-compose

Given the following docker-compose.yaml, what is the hostname and port that pgadmin should use to connect to the postgres database?

```
services:
  db:
    container_name: postgres
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
      POSTGRES_DB: 'ny_taxi'
    ports:
      - '5433:5432'
    volumes:
      - vol-pgdata:/var/lib/postgresql/data

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      PGADMIN_DEFAULT_EMAIL: "pgadmin@pgadmin.com"
      PGADMIN_DEFAULT_PASSWORD: "pgadmin"
    ports:
      - "8080:80"
    volumes:
      - vol-pgadmin_data:/var/lib/pgadmin  

volumes:
  vol-pgdata:
    name: vol-pgdata
  vol-pgadmin_data:
    name: vol-pgadmin_data
```
-----------

In line 2 of the file, the Postgres service is named "db", and its container_name is set to "postgres". Line 10 maps host port 5433 to container port 5432. Since pgAdmin runs inside the same Compose network, it connects to the container port, 5432.

To confirm this, I first stopped and removed all Docker containers on my VM with this command:

```
docker ps -aq | xargs docker stop | xargs docker rm
```

Then I started the docker-compose.yaml above with ```docker compose up```. When connecting pgAdmin to the database, I found that both "db" and "postgres" work as the hostname. This [Stack Overflow post](https://stackoverflow.com/questions/55522620/docker-compose-yml-container-name-and-hostname) explains why this is the case.
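
The difference between the two sides of the ```'5433:5432'``` entry can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not part of Docker: the helper names are hypothetical, and the parser assumes the simple `HOST:CONTAINER` short syntax used in this file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortMapping:
    host_port: int
    container_port: int


def parse_port_mapping(mapping: str) -> PortMapping:
    """Parse a Compose port entry of the simple form 'HOST:CONTAINER'.

    Longer forms (with an IP address or a protocol suffix) are not handled here.
    """
    host, container = mapping.split(":")
    return PortMapping(int(host), int(container))


def endpoints_inside_network(service: str, container_name: str, mapping: str) -> list[str]:
    """Addresses another container on the same Compose network can use."""
    port = parse_port_mapping(mapping).container_port
    return [f"{service}:{port}", f"{container_name}:{port}"]


def endpoint_from_host(mapping: str) -> str:
    """Address a client running on the Docker host itself would use."""
    return f"localhost:{parse_port_mapping(mapping).host_port}"


print(endpoints_inside_network("db", "postgres", "5433:5432"))
print(endpoint_from_host("5433:5432"))
```

pgAdmin falls into the first case, so the published host port 5433 only matters for clients outside the Compose network.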

**Final Answer: db:5432, postgres:5432**

## Question 3. Trip Segmentation Count

During the period of October 1st 2019 (inclusive) and November 1st 2019 (exclusive), how many trips, respectively, happened:

- Up to 1 mile
- In between 1 (exclusive) and 3 miles (inclusive),
- In between 3 (exclusive) and 7 miles (inclusive),
- In between 7 (exclusive) and 10 miles (inclusive),
- Over 10 miles

--------------------------

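I ran the following queries in pgAdmin. Each one counts the rows of the October table that fall in one distance bucket, without an explicit date filter: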

```
select count(*)
from "green_tripdata_2019-10"
where trip_distance <= 1
```

```
select count(*)
from "green_tripdata_2019-10"
where trip_distance > 1 and trip_distance <= 3
```

```
select count(*)
from "green_tripdata_2019-10"
where trip_distance > 3 and trip_distance <= 7
```

```
select count(*)
from "green_tripdata_2019-10"
where trip_distance > 7 and trip_distance <= 10
```

```
select count(*)
from "green_tripdata_2019-10"
where trip_distance > 10
```

These queries returned 104,838; 199,013; 109,645; 27,688; and 35,202, respectively.

**Final Answer: 104,838; 199,013; 109,645; 27,688; 35,202**
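
As a side note, the five counts can also be computed in a single pass with conditional aggregation. The sketch below is an assumption-laden illustration: it uses Python's built-in SQLite instead of Postgres, and the distances are made-up toy values chosen to sit on the bucket boundaries. The `select` statement itself uses only standard SQL, so it should carry over to pgAdmin.

```python
import sqlite3

# Toy distances (made up for illustration); boundary values are included on purpose.
distances = [0.5, 0.9, 1.0, 1.5, 3.0, 3.1, 7.0, 10.0, 10.5]

conn = sqlite3.connect(":memory:")
conn.execute('create table "green_tripdata_2019-10" (trip_distance real)')
conn.executemany(
    'insert into "green_tripdata_2019-10" (trip_distance) values (?)',
    [(d,) for d in distances],
)

# One scan of the table; each sum counts the rows that fall in its bucket.
QUERY = """
select
    sum(case when trip_distance <= 1 then 1 else 0 end) as up_to_1,
    sum(case when trip_distance > 1 and trip_distance <= 3 then 1 else 0 end) as from_1_to_3,
    sum(case when trip_distance > 3 and trip_distance <= 7 then 1 else 0 end) as from_3_to_7,
    sum(case when trip_distance > 7 and trip_distance <= 10 then 1 else 0 end) as from_7_to_10,
    sum(case when trip_distance > 10 then 1 else 0 end) as over_10
from "green_tripdata_2019-10"
"""

bucket_counts = conn.execute(QUERY).fetchone()
print(bucket_counts)
```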

## Question 4. Longest trip for each day

Which was the pick up day with the longest trip distance? Use the pick up time for your calculations.

---------------------------------------------------

This query orders the trips by distance from highest to lowest, pairs each with its pickup date, and limits the result to the first row:
```
select
	date(lpep_pickup_datetime) as date,
	trip_distance
from "green_tripdata_2019-10"
order by trip_distance desc
limit 1
```

The highest value of trip_distance is 515.89, and this trip happened on 2019-10-31.
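
To see the longest trip for every pickup day rather than only the overall maximum, the same idea can be written with `group by`. This is a minimal sketch under stated assumptions: it runs on Python's built-in SQLite rather than Postgres, and the rows are made-up toy data.

```python
import sqlite3

# Toy rows (made up for illustration): (pickup timestamp, trip distance).
rows = [
    ("2019-10-01 08:15:00", 3.2),
    ("2019-10-01 23:59:59", 12.4),
    ("2019-10-02 00:00:01", 40.0),
    ("2019-10-02 14:30:00", 7.7),
    ("2019-10-03 09:00:00", 1.1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table "green_tripdata_2019-10" '
    "(lpep_pickup_datetime text, trip_distance real)"
)
conn.executemany('insert into "green_tripdata_2019-10" values (?, ?)', rows)

# One row per pickup day, carrying that day's longest trip, longest day first.
QUERY = """
select
    date(lpep_pickup_datetime) as pickup_date,
    max(trip_distance) as longest_trip
from "green_tripdata_2019-10"
group by date(lpep_pickup_datetime)
order by longest_trip desc
"""

per_day = conn.execute(QUERY).fetchall()
print(per_day)
```

The first row of `per_day` is the day that answers the question; the remaining rows show how the other days compare.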

**Final Answer: 2019-10-31**

## Question 5. Three biggest pickup zones

Which were the top pickup locations with over 13,000 in total_amount (across all trips) for 2019-10-18?

Consider only lpep_pickup_datetime when filtering by date.

--------------------------------

As we saw in one of the videos, we have only the location IDs inside the trips table. So, the trip table has to be joined with the zones lookup table to get the borough and the zone. The query below aggregates the total amount, joins the tables as described before, and orders from highest to lowest:

```
select
	date(lpep_pickup_datetime) as date,
	"Borough",
	"Zone",
	sum(total_amount) as total_amount
from "green_tripdata_2019-10"
left join taxi_zone_lookup
	on "LocationID" = "PULocationID"
where date(lpep_pickup_datetime) = date('2019-10-18')
group by "Borough", "Zone", date
order by total_amount desc
```

The zones with a total_amount over 13,000 were East Harlem North, East Harlem South and Morningside Heights.
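
The 13,000 threshold can also be applied inside the query with a `having` clause, so only the qualifying zones are returned. The sketch below is illustrative only: it uses Python's built-in SQLite instead of Postgres, and the zone names and amounts are made-up toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table taxi_zone_lookup ("LocationID" integer, "Borough" text, "Zone" text);
    create table "green_tripdata_2019-10" (
        lpep_pickup_datetime text, "PULocationID" integer, total_amount real
    );
    -- Toy data, made up for illustration.
    insert into taxi_zone_lookup values
        (1, 'Borough A', 'Zone A'),
        (2, 'Borough B', 'Zone B');
    insert into "green_tripdata_2019-10" values
        ('2019-10-18 08:00:00', 1, 9000.0),
        ('2019-10-18 19:30:00', 1, 5000.0),
        ('2019-10-18 10:00:00', 2, 12000.0),
        ('2019-10-19 10:00:00', 2, 5000.0);
    """
)

# `having` filters on the aggregated sum, after the grouping has been done.
QUERY = """
select
    "Zone",
    sum(total_amount) as zone_total
from "green_tripdata_2019-10"
left join taxi_zone_lookup
    on "LocationID" = "PULocationID"
where date(lpep_pickup_datetime) = '2019-10-18'
group by "Borough", "Zone"
having sum(total_amount) > 13000
order by zone_total desc
"""

top_zones = conn.execute(QUERY).fetchall()
print(top_zones)
```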

**Final Answer: East Harlem North, East Harlem South and Morningside Heights**

## Question 6. Largest tip

For the passengers picked up in October 2019 in the zone name "East Harlem North" which was the drop off zone that had the largest tip?

Note: it's tip , not trip

---------------------------------------------------

Same as the last question, but this time we're not aggregating the tips, and we have to double-join so we can get the drop off zone:

```
select
	date(lpep_pickup_datetime) as date,
	pick_up."Zone" as pick_up_zone,
	drop_off."Zone" as drop_off_zone,
	tip_amount
from "green_tripdata_2019-10" as green_trips
left join taxi_zone_lookup as pick_up
	on pick_up."LocationID" = green_trips."PULocationID"
left join taxi_zone_lookup as drop_off
	on drop_off."LocationID" = green_trips."DOLocationID"
where pick_up."Zone" = 'East Harlem North'
order by tip_amount desc
limit 1
```

The largest tip, 87.3, was given on a trip that ended in the JFK Airport drop off zone.

**Final Answer: JFK Airport**

## Question 7. Terraform Workflow

Which of the following sequences, respectively, describes the workflow for:

1. Downloading the provider plugins and setting up backend,
2. Generating proposed changes and auto-executing the plan
3. Remove all resources managed by terraform

Options:
- terraform import, terraform apply -y, terraform destroy
- terraform init, terraform plan -auto-apply, terraform rm
- terraform init, terraform run -auto-approve, terraform destroy
- terraform init, terraform apply -auto-approve, terraform destroy
- terraform import, terraform apply -y, terraform rm

-------------------------------------------

I studied and practiced the Terraform workflow with the help of the following resources: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/01-docker-terraform/1_terraform_gcp and https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/cohorts/2025/01-docker-terraform/homework.md

Following those instructions, I created a ```main.tf``` file and a ```variables.tf``` file, which describe my Terraform infrastructure in GCP. To create the sample GCS bucket and BigQuery dataset and then tear them down again, I ran the following commands in order:

- terraform init, terraform apply -auto-approve, terraform destroy

**Final Answer: terraform init, terraform apply -auto-approve, terraform destroy**