## Question 1. Understanding docker first run 

Run docker with the `python:3.12.8` image in interactive mode, using the entrypoint `bash`.

What's the version of `pip` in the image?

- 24.3.1
- 24.2.1
- 23.3.1
- 23.2.1

**Answer: 24.3.1**

Explain/Command:
```bash
docker run -it --entrypoint=bash python:3.12.8

Unable to find image 'python:3.12.8' locally
3.12.8: Pulling from library/python
cc9fb0d581e7: Download complete 
e3696bf6d1ae: Download complete 
9668c23f1ccb: Download complete 
Digest: sha256:251ef8e69b6ccdf3c7bf7effaa51179d59af35364dd9c86469142aa72a2c8cfc
Status: Downloaded newer image for python:3.12.8

root@157f851a4adc:/# pip --version
pip 24.3.1 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)
```

## Question 2. Understanding Docker networking and docker-compose

Given the following `docker-compose.yaml`, what is the `hostname` and `port` that **pgadmin** should use to connect to the postgres database?

```yaml
services:
  db:
    container_name: postgres
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
      POSTGRES_DB: 'ny_taxi'
    ports:
      - '5433:5432'
    volumes:
      - vol-pgdata:/var/lib/postgresql/data

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      PGADMIN_DEFAULT_EMAIL: "pgadmin@pgadmin.com"
      PGADMIN_DEFAULT_PASSWORD: "pgadmin"
    ports:
      - "8080:80"
    volumes:
      - vol-pgadmin_data:/var/lib/pgadmin  

volumes:
  vol-pgdata:
    name: vol-pgdata
  vol-pgadmin_data:
    name: vol-pgadmin_data
```

- postgres:5433
- localhost:5432
- db:5433
- postgres:5432
- db:5432

**Answer: postgres:5432**

Explain: 

The `db` service sets `container_name: postgres`, so pgadmin can reach the database at the hostname `postgres`. The service name `db` also resolves on the Compose network, so `db:5432` would connect as well.
The `ports` entry `'5433:5432'` publishes container port 5432 on host port 5433. That mapping only matters for clients running on the host machine; pgadmin runs in a separate container on the same network, so it should use the internal port 5432.
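
The rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something Docker provides: the helper names `split_port_mapping` and `connection_target` are hypothetical, and the sketch assumes the short `HOST:CONTAINER` port syntax used in the compose file.

```python
def split_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a Compose short-syntax 'HOST:CONTAINER' mapping into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def connection_target(hostname: str, mapping: str, same_network: bool) -> str:
    """Return the 'host:port' string a client should use to reach the database.

    same_network=True  -> the client is another container on the Compose network
    same_network=False -> the client runs directly on the host machine
    """
    host_port, container_port = split_port_mapping(mapping)
    if same_network:
        # Containers talk to each other on the internal (container) port.
        return f"{hostname}:{container_port}"
    # From the host, only the published host port is reachable.
    return f"localhost:{host_port}"


# pgadmin runs inside the Compose network:
print(connection_target("postgres", "5433:5432", same_network=True))
# A client such as psql on the host machine:
print(connection_target("postgres", "5433:5432", same_network=False))
```

The first call prints `postgres:5432` (the answer for pgadmin), the second prints `localhost:5433` (what a tool on the host would use).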


## Question 3. Trip Segmentation Count

Between October 1st 2019 (inclusive) and November 1st 2019 (exclusive), how many trips, **respectively**, happened:
1. Up to 1 mile
2. In between 1 (exclusive) and 3 miles (inclusive),
3. In between 3 (exclusive) and 7 miles (inclusive),
4. In between 7 (exclusive) and 10 miles (inclusive),
5. Over 10 miles 


- 104,802;  197,670;  110,612;  27,831;  35,281
- 104,802;  198,924;  109,603;  27,678;  35,189
- 104,793;  201,407;  110,612;  27,831;  35,281
- 104,793;  202,661;  109,603;  27,678;  35,189
- 104,838;  199,013;  109,645;  27,688;  35,202

**Answer: 104,838;  199,013;  109,645;  27,688;  35,202**

Query:
```sql
SELECT
    COUNT(*) FILTER (WHERE trip_distance <= 1) AS trips_up_to_1_mile,
    COUNT(*) FILTER (WHERE trip_distance > 1 AND trip_distance <= 3) AS trips_between_1_and_3_miles,
    COUNT(*) FILTER (WHERE trip_distance > 3 AND trip_distance <= 7) AS trips_between_3_and_7_miles,
    COUNT(*) FILTER (WHERE trip_distance > 7 AND trip_distance <= 10) AS trips_between_7_and_10_miles,
    COUNT(*) FILTER (WHERE trip_distance > 10) AS trips_over_10_miles
FROM green_taxi_data
WHERE lpep_pickup_datetime >= '2019-10-01' AND lpep_pickup_datetime < '2019-11-01';
```
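
To double-check the inclusive/exclusive boundaries, the same bucketing can be sketched in plain Python. This is a rough equivalent of the `FILTER` clauses, not the query itself: the sample distances are made up, `bucket_for` and `count_buckets` are hypothetical helper names, and NULL distances are not modelled.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first four buckets; each bound is inclusive.
UPPER_BOUNDS = [1, 3, 7, 10]
LABELS = [
    "trips_up_to_1_mile",
    "trips_between_1_and_3_miles",
    "trips_between_3_and_7_miles",
    "trips_between_7_and_10_miles",
    "trips_over_10_miles",
]


def bucket_for(distance: float) -> str:
    """Map a distance to its bucket label.

    bisect_left returns the index of the first bound >= distance, so a
    distance equal to a bound stays in the lower bucket (e.g. 3.0 -> 1..3).
    """
    return LABELS[bisect_left(UPPER_BOUNDS, distance)]


def count_buckets(distances) -> dict:
    """Count trips per bucket, keeping the bucket order of the SQL query."""
    counts = Counter(bucket_for(d) for d in distances)
    return {label: counts[label] for label in LABELS}


# Made-up sample that sits on and around every boundary.
sample = [0.5, 1.0, 1.01, 3.0, 3.5, 7.0, 9.9, 10.0, 10.1]
print(count_buckets(sample))
```

With this sample, each of the first four buckets receives two trips and the last bucket one, which confirms that values exactly on a boundary land in the lower bucket, as in the SQL.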

## Question 4. Longest trip for each day

Which was the pick up day with the longest trip distance?
Use the pick up time for your calculations.

Tip: For every day, we only care about one single trip with the longest distance. 

- 2019-10-11
- 2019-10-24
- 2019-10-26
- 2019-10-31

**Answer: 2019-10-31**

Query:
```sql
SELECT
    lpep_pickup_datetime::date AS pickup_date,
    MAX(trip_distance) AS longest_trip_distance
FROM green_taxi_data
WHERE lpep_pickup_datetime >= '2019-10-01' AND lpep_pickup_datetime < '2019-11-01'
GROUP BY pickup_date
ORDER BY longest_trip_distance DESC
LIMIT 1;
```
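
The query's logic (take the maximum distance per pickup day, then pick the day with the largest maximum) can be sketched in Python as follows. The trips below are hypothetical values chosen for illustration, not rows from the dataset, and the function names are made up for this sketch.

```python
from datetime import datetime


def longest_trip_per_day(trips) -> dict:
    """Return {pickup_date: longest trip_distance} for (pickup_datetime, distance) pairs."""
    longest = {}
    for pickup, distance in trips:
        day = pickup.date()  # same idea as lpep_pickup_datetime::date
        if day not in longest or distance > longest[day]:
            longest[day] = distance
    return longest


def day_with_longest_trip(trips):
    """Return the (pickup_date, distance) pair with the largest per-day maximum."""
    per_day = longest_trip_per_day(trips)
    return max(per_day.items(), key=lambda item: item[1])


# Hypothetical sample rows: (pickup datetime, trip distance in miles).
sample_trips = [
    (datetime(2019, 10, 1, 8, 15), 2.5),
    (datetime(2019, 10, 1, 23, 59), 8.0),
    (datetime(2019, 10, 2, 0, 5), 12.3),
    (datetime(2019, 10, 2, 14, 30), 1.1),
]

day, distance = day_with_longest_trip(sample_trips)
print(day.isoformat(), distance)
```

For the sample rows this prints `2019-10-02 12.3`, since the 12.3-mile trip is the longest single trip across both days.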

## Question 5. Three biggest pickup zones

Which were the top pickup locations with over 13,000 in
`total_amount` (across all trips) for 2019-10-18?

Consider only `lpep_pickup_datetime` when filtering by date.
 
- East Harlem North, East Harlem South, Morningside Heights
- East Harlem North, Morningside Heights
- Morningside Heights, Astoria Park, East Harlem South
- Bedford, East Harlem North, Astoria Park

**Answer: East Harlem North, East Harlem South, Morningside Heights**

Query:
```sql
SELECT
    gtd."PULocationID",
    tz."Zone",
    SUM(gtd.total_amount) AS total_revenue
FROM green_taxi_data gtd
JOIN taxi_zone tz ON gtd."PULocationID" = tz."LocationID"
WHERE gtd.lpep_pickup_datetime::date = '2019-10-18'
GROUP BY gtd."PULocationID", tz."Zone"
HAVING SUM(gtd.total_amount) > 13000
ORDER BY total_revenue DESC;
```
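
As a cross-check of the `GROUP BY` / `HAVING` step, here is a minimal Python sketch of the same aggregation. It assumes the rows are already filtered to the pickup date and joined to zone names; the zone names and amounts are placeholders, and `zones_over_threshold` is a hypothetical helper.

```python
from collections import defaultdict


def zones_over_threshold(trips, threshold=13000):
    """Sum total_amount per zone and keep zones strictly above the threshold.

    trips: iterable of (zone_name, total_amount) pairs.
    Returns (zone, total) pairs sorted by total, largest first.
    """
    totals = defaultdict(float)
    for zone, total_amount in trips:
        totals[zone] += total_amount
    # Strict '>' mirrors HAVING SUM(total_amount) > 13000.
    over = [(zone, total) for zone, total in totals.items() if total > threshold]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Placeholder rows: (zone, total_amount).
sample_trips = [
    ("Zone A", 9000.0),
    ("Zone A", 5000.0),
    ("Zone B", 12000.0),
    ("Zone C", 13000.0),
    ("Zone C", 0.5),
]

print(zones_over_threshold(sample_trips))
```

Here `Zone A` (14000.0) and `Zone C` (13000.5) pass the filter, while `Zone B` (12000.0) does not. A zone totalling exactly 13,000 would also be excluded because the comparison is strict.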

## Question 6. Largest tip

For the passengers picked up in October 2019 in the zone
name "East Harlem North" which was the drop off zone that had
the largest tip?

Note: it's `tip` , not `trip`

We need the name of the zone, not the ID.

- Yorkville West
- JFK Airport
- East Harlem North
- East Harlem South

**Answer: JFK Airport**

Query:
```sql
SELECT
    tz_dropoff."Zone" AS dropoff_zone,
    MAX(gtd."tip_amount") AS largest_tip
FROM green_taxi_data gtd
JOIN taxi_zone tz_pickup ON gtd."PULocationID" = tz_pickup."LocationID"
JOIN taxi_zone tz_dropoff ON gtd."DOLocationID" = tz_dropoff."LocationID"
WHERE tz_pickup."Zone" = 'East Harlem North'
AND gtd.lpep_pickup_datetime >= '2019-10-01'
AND gtd.lpep_pickup_datetime < '2019-11-01'
GROUP BY dropoff_zone
ORDER BY largest_tip DESC
LIMIT 1;
```

## Question 7. Terraform Workflow

Which of the following sequences, **respectively**, describes the workflow for: 
1. Downloading the provider plugins and setting up backend,
2. Generating proposed changes and auto-executing the plan
3. Remove all resources managed by terraform

- terraform import, terraform apply -y, terraform destroy
- terraform init, terraform plan -auto-apply, terraform rm
- terraform init, terraform run -auto-approve, terraform destroy
- terraform init, terraform apply -auto-approve, terraform destroy
- terraform import, terraform apply -y, terraform rm

**Answer: terraform init, terraform apply -auto-approve, terraform destroy**


Explain/Commands:

Downloading the provider plugins and setting up backend:
   - `terraform init`
        ```bash
            terraform init
            Initializing the backend...
            Initializing provider plugins...
            - Reusing previous version of hashicorp/google from the dependency lock file
            - Using previously-installed hashicorp/google v6.16.0
            
            Terraform has been successfully initialized!
        ```

Generating proposed changes and auto-executing the plan:
   - `terraform apply -auto-approve`
        ```bash
            terraform apply -auto-approve

            Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
              + create
            (...)
            Plan: 1 to add, 0 to change, 0 to destroy.
            google_storage_bucket.learn-bucket: Creating...
            google_storage_bucket.learn-bucket: Creation complete after 2s [id=learning-v001-bucket]

            Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
        ```

Remove all resources managed by terraform:
   - `terraform destroy`
        ```bash
            terraform destroy
            google_storage_bucket.learn-bucket: Refreshing state... [id=learning-v001-bucket]

            Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
              - destroy
            (...)
            google_storage_bucket.learn-bucket: Destroying... [id=learning-v001-bucket]
            google_storage_bucket.learn-bucket: Destruction complete after 2s

            Destroy complete! Resources: 1 destroyed.
        ```
 