# Investigating Missing Packages in Boston

This project analyzes the `packages.db` database from the problem set at [https://cs50.harvard.edu/sql/2024/psets/1/packages/](https://cs50.harvard.edu/sql/2024/psets/1/packages/) to locate missing packages reported by concerned residents of Boston. Acting as a mail clerk, we use the database schema and the limited information each customer provides to determine the current location, address type, and contents of each lost parcel. The investigation is documented through SQL queries that show the steps taken to solve each mystery.

## Database Schema: `packages.db`

The `packages.db` database contains several tables that track the delivery of packages within Boston. The relationships between these tables are essential for tracing the journey of each package. The schema of each table is detailed below:

### `addresses` Table

| Column Name | Data Type | Description                               |
|-------------|-----------|-------------------------------------------|
| `id`        | INTEGER   | Unique identifier for each address.       |
| `address`   | TEXT      | The full street address (e.g., 7660 Sharon Street). |
| `type`      | TEXT      | The type of address (e.g., residential, commercial). |

### `drivers` Table

| Column Name | Data Type | Description                     |
|-------------|-----------|---------------------------------|
| `id`        | INTEGER   | Unique identifier for each driver. |
| `name`      | TEXT      | The first name of the driver.   |

### `packages` Table

| Column Name      | Data Type | Description                                                              |
|------------------|-----------|--------------------------------------------------------------------------|
| `id`             | INTEGER   | Unique identifier for each package.                                      |
| `contents`       | TEXT      | A description of the package's contents.                               |
| `from_address_id`| INTEGER   | Foreign key referencing the `id` in the `addresses` table for the sender. |
| `to_address_id`  | INTEGER   | Foreign key referencing the `id` in the `addresses` table for the intended recipient. **Note:** This is not necessarily the final destination. |

### `scans` Table

| Column Name | Data Type | Description                                                                 |
|-------------|-----------|-----------------------------------------------------------------------------|
| `id`        | INTEGER   | Unique identifier for each scan record.                                     |
| `driver_id` | INTEGER   | Foreign key referencing the `id` in the `drivers` table for the driver who performed the scan. |
| `package_id`| INTEGER   | Foreign key referencing the `id` in the `packages` table for the scanned package. |
| `address_id`| INTEGER   | Foreign key referencing the `id` in the `addresses` table where the scan occurred. |
| `action`    | TEXT      | The type of scan action: "Pick" (picked up) or "Drop" (dropped off).          |
| `timestamp` | TEXT      | The date and time of the scan.                                              |
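
To experiment with queries without touching the real file, the four tables can be recreated in an in-memory database. The DDL below is a sketch reconstructed from the tables above; the constraints (`PRIMARY KEY`, `REFERENCES`) are assumptions, and the actual `packages.db` may declare them differently.

```python
import sqlite3

# Assumed DDL, reconstructed from the schema tables in this document.
# The real packages.db may use different constraints.
SCHEMA = """
CREATE TABLE addresses (
    "id" INTEGER PRIMARY KEY,
    "address" TEXT,
    "type" TEXT
);
CREATE TABLE drivers (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);
CREATE TABLE packages (
    "id" INTEGER PRIMARY KEY,
    "contents" TEXT,
    "from_address_id" INTEGER REFERENCES addresses("id"),
    "to_address_id" INTEGER REFERENCES addresses("id")
);
CREATE TABLE scans (
    "id" INTEGER PRIMARY KEY,
    "driver_id" INTEGER REFERENCES drivers("id"),
    "package_id" INTEGER REFERENCES packages("id"),
    "address_id" INTEGER REFERENCES addresses("id"),
    "action" TEXT,
    "timestamp" TEXT
);
"""


def build_schema():
    """Return an in-memory connection holding the four empty tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


demo = build_schema()
# List the tables that were created, in alphabetical order.
tables = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```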

## Investigation Process

For each reported missing package, as detailed in the linked problem set, the investigation will involve formulating and executing SQL queries on the `packages.db` database. These queries will aim to trace the last known location of the package by examining the `scans` table, identify the type of address where the package was last scanned, and retrieve the contents of the package from the `packages` table.
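
The approach described above can be sketched as a small helper that returns the most recent scan for one package. The function name `last_known_location` and the sample rows are hypothetical and exist only to exercise the query. The sketch assumes timestamps are stored as ISO-formatted text, as in the outputs in this document, so that sorting them as text is also chronological.

```python
import sqlite3


def last_known_location(conn, package_id):
    """Return (address, address_type, contents, action) for the latest scan,
    or None if the package has never been scanned."""
    return conn.execute(
        """
        SELECT addresses."address", addresses."type",
               packages."contents", scans."action"
        FROM scans
        JOIN addresses ON addresses."id" = scans."address_id"
        JOIN packages ON packages."id" = scans."package_id"
        WHERE scans."package_id" = ?
        ORDER BY scans."timestamp" DESC
        LIMIT 1
        """,
        (package_id,),
    ).fetchone()


# Made-up dataset, not taken from packages.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses ("id" INTEGER PRIMARY KEY, "address" TEXT, "type" TEXT);
CREATE TABLE packages ("id" INTEGER PRIMARY KEY, "contents" TEXT,
                       "from_address_id" INTEGER, "to_address_id" INTEGER);
CREATE TABLE scans ("id" INTEGER PRIMARY KEY, "driver_id" INTEGER,
                    "package_id" INTEGER, "address_id" INTEGER,
                    "action" TEXT, "timestamp" TEXT);
INSERT INTO addresses VALUES (1, '1 Example Road', 'Residential'),
                             (2, '2 Sample Lane', 'Warehouse');
INSERT INTO packages VALUES (10, 'Sample parcel', 1, 2);
INSERT INTO scans VALUES (1, 1, 10, 1, 'Pick', '2023-01-01 09:00:00.000000'),
                         (2, 1, 10, 2, 'Drop', '2023-01-01 11:30:00.000000');
""")

print(last_known_location(conn, 10))
```

A final `Drop` means the package is at the returned address; a final `Pick` means it has left that address and is with a driver.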

In [18]:
import sqlite3
import pandas as pd

### Connecting to the Database

In [19]:
connection = sqlite3.connect("data_bases/packages.db")
print("Connected to the database successfully!")

Connected to the database successfully!
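
Note that `sqlite3.connect` can succeed even when the file is missing, because by default it creates an empty database at that path. Since the investigation only reads data, one option is to open the file read-only through a SQLite URI, which also fails fast if the file does not exist. The sketch below is one way to do that; the helper name `open_read_only` is an assumption, and the temporary file stands in for `data_bases/packages.db`.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path):
    """Open an existing SQLite file read-only; raise if it does not exist."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")

    # Create a small throwaway database to open read-only afterwards.
    writer = sqlite3.connect(db_path)
    writer.execute('CREATE TABLE t ("x" INTEGER)')
    writer.commit()
    writer.close()

    # Writes through the read-only connection are rejected.
    reader = open_read_only(db_path)
    try:
        reader.execute("INSERT INTO t VALUES (1)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()

    # A missing file raises instead of silently creating an empty database.
    try:
        open_read_only(os.path.join(tmp, "missing.db"))
        missing_detected = False
    except sqlite3.OperationalError:
        missing_detected = True

print(write_blocked, missing_detected)
```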


## The Lost Letter

Clerk, my name’s Anneke. I live over at 900 Somerville Avenue. Not long ago, I sent out a special letter. It’s meant for my friend Varsha. She’s starting a new chapter of her life at 2 Finnegan Street, uptown. (That address, let me tell you: it was a bit tricky to get right the first time.) The letter is a congratulatory note—a cheery little paper hug from me to her, to celebrate this big move of hers. Can you check if it’s made its way to her yet?

In [20]:
query = """
SELECT
    scans."package_id",
    scans."action",
    addresses."address",
    addresses."type" AS "address_type",
    packages."contents",
    scans."timestamp" AS "scan_time"
FROM scans
JOIN addresses ON scans."address_id" = addresses."id"
JOIN packages ON packages."id" = scans."package_id"
WHERE packages."from_address_id" = (
        SELECT "id"
        FROM addresses
        WHERE "address" = '900 Somerville Avenue'
    )
    AND packages."contents" LIKE '%congrat%'
ORDER BY "scan_time"
;
"""

df = pd.read_sql_query(query, connection)

df

|   | package_id | action | address | address_type | contents | scan_time |
|---|------------|--------|---------|--------------|----------|-----------|
| 0 | 384 | Pick | 900 Somerville Avenue | Residential | Congratulatory letter | 2023-07-11 19:33:55.241794 |
| 1 | 384 | Drop | 2 Finnigan Street | Residential | Congratulatory letter | 2023-07-11 23:07:04.432178 |


### Answer

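The delivery records show that the letter arrived, and that the customer's spelling of the street differs from the one stored in the database:
- **Address as given by the customer**: 2 Finnegan Street
- **Address in the database**: 2 Finnigan Street
- **Address type**: Residential
- **Status**: Delivered (last scan is a drop at 2 Finnigan Street on 2023-07-11 at 23:07)
- **Contents**: Congratulatory letter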
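
When a customer's spelling may not match the database, a `LIKE` pattern with the single-character wildcard `_` can surface close variants. The sketch below runs against a small in-memory table; the helper name `find_similar_addresses` and the third row are made up for illustration, while the first two rows use values from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses ("id" INTEGER PRIMARY KEY, "address" TEXT, "type" TEXT);
INSERT INTO addresses ("address", "type") VALUES
    ('2 Finnigan Street', 'Residential'),
    ('900 Somerville Avenue', 'Residential'),
    ('12 Finch Road', 'Commercial');  -- made-up row
""")


def find_similar_addresses(conn, pattern):
    """Return (address, type) rows whose address matches a LIKE pattern."""
    return conn.execute(
        'SELECT "address", "type" FROM addresses '
        'WHERE "address" LIKE ? ORDER BY "address"',
        (pattern,),
    ).fetchall()


# "_" matches exactly one character, so this covers Finnegan and Finnigan.
matches = find_similar_addresses(conn, "2 Finn_gan Street")
print(matches)
```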

## The Devious Delivery

Good day to you, deliverer of the mail. You might remember that not too long ago I made my way over from the town of Fiftyville. I gave a certain box into your reliable hands and asked you to keep things low. My associate has been expecting the package for a while now. And yet, it appears to have grown wings and flown away. Ha! Any chance you could help clarify this mystery? Afraid there’s no “From” address. It’s the kind of parcel that would add a bit more… quack to someone’s bath times, if you catch my drift.

In [21]:
query = """
SELECT
    scans."package_id",
    scans."action",
    addresses."address",
    addresses."type" AS "address_type",
    packages."contents",
    scans."timestamp" AS "scan_time"
FROM scans
JOIN addresses ON scans."address_id" = addresses."id"
JOIN packages ON packages."id" = scans."package_id"
WHERE packages."from_address_id" IS NULL
ORDER BY "scan_time"
;
"""

df = pd.read_sql_query(query, connection)

df

|   | package_id | action | address | address_type | contents | scan_time |
|---|------------|--------|---------|--------------|----------|-----------|
| 0 | 5098 | Pick | 123 Sesame Street | Residential | Duck debugger | 2023-10-24 08:40:16.246648 |
| 1 | 5098 | Drop | 7 Humboldt Place | Police Station | Duck debugger | 2023-10-24 10:08:55.610754 |


### Answer

The tracking data reveals the following package details:
- **Package ID**: 5098
- **Contents**: Duck debugger
- **Current Location**: 7 Humboldt Place (Police Station)
- **Timeline**:
  1. Picked up: 123 Sesame Street at 08:40:16 (Oct 24, 2023)
  2. Delivered: Police Station at 10:08:55 (Oct 24, 2023)
- **Status**: In police custody (the last scan is a drop at the police station)
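
The query for this case filters with `IS NULL` instead of `= NULL`. In SQL, comparing anything to `NULL` with `=` never evaluates to true, so an equality test would return no rows. A minimal sketch with two made-up packages shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages ("id" INTEGER PRIMARY KEY, "contents" TEXT,
                       "from_address_id" INTEGER, "to_address_id" INTEGER);
-- Both rows are made up for illustration.
INSERT INTO packages VALUES (1, 'Made-up parcel', 7, 8),
                            (2, 'Made-up anonymous parcel', NULL, 8);
""")

# "= NULL" is never true, so this matches nothing.
with_equals = conn.execute(
    'SELECT "id" FROM packages WHERE "from_address_id" = NULL'
).fetchall()

# "IS NULL" is the correct test for a missing sender address.
with_is_null = conn.execute(
    'SELECT "id" FROM packages WHERE "from_address_id" IS NULL'
).fetchall()

print(with_equals, with_is_null)
```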

## The Forgotten Gift

Oh, excuse me, Clerk. I had sent a mystery gift, you see, to my wonderful granddaughter, off at 728 Maple Place. That was about two weeks ago. Now the delivery date has passed by seven whole days and I hear she still waits, her hands empty and heart filled with anticipation. I’m a bit worried wondering where my package has gone. I cannot for the life of me remember what’s inside, but I do know it’s filled to the brim with my love for her. Can we possibly track it down so it can fill her day with joy? I did send it from my home at 109 Tileston Street.

In [22]:
query = """
SELECT
    scans."package_id",
    scans."action",
    addresses."address",
    addresses."type" AS "address_type",
    packages."contents",
    scans."timestamp" AS "scan_time"
FROM scans
JOIN addresses ON scans."address_id" = addresses."id"
JOIN packages ON packages."id" = scans."package_id"
WHERE packages."from_address_id" = (
        SELECT "id"
        FROM addresses
        WHERE "address" = '109 Tileston Street'
    )
    AND packages."to_address_id" = (
        SELECT "id"
        FROM addresses
        WHERE "address" = '728 Maple Place'
    )
ORDER BY "scan_time"
;
"""



df = pd.read_sql_query(query, connection)

df

|   | package_id | action | address | address_type | contents | scan_time |
|---|------------|--------|---------|--------------|----------|-----------|
| 0 | 9523 | Pick | 109 Tileston Street | Residential | Flowers | 2023-08-16 21:41:43.219831 |
| 1 | 9523 | Drop | 950 Brannon Harris Way | Warehouse | Flowers | 2023-08-17 03:31:36.856889 |
| 2 | 9523 | Pick | 950 Brannon Harris Way | Warehouse | Flowers | 2023-08-23 19:41:47.913410 |


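### Answer

The delivery records show:
- **Package ID**: 9523
- **Contents**: Flowers
- **Shipping Timeline**:
  1. Picked up from 109 Tileston Street (Aug 16, 2023, 21:41)
  2. Dropped at the warehouse at 950 Brannon Harris Way (Aug 17, 2023, 03:31)
  3. Picked up again from the warehouse (Aug 23, 2023, 19:41), after more than 6 days in storage
- **Status**: Not delivered. The last scan is a pick-up with no later drop, so the package is no longer in the warehouse and has not reached 728 Maple Place; it is with the driver who collected it. The query above does not select the driver.
- **Impact**: After more than six days in warehouse storage, the flowers have likely deteriorated.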
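
Two follow-up questions for this case are how long the package sat in the warehouse and who holds it now. The sketch below answers both on an in-memory copy of the three scans. The actions and timestamps are taken from the output above, while the driver ids, driver names, and address ids are placeholders, since the original query did not select them.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers ("id" INTEGER PRIMARY KEY, "name" TEXT);
CREATE TABLE scans ("id" INTEGER PRIMARY KEY, "driver_id" INTEGER,
                    "package_id" INTEGER, "address_id" INTEGER,
                    "action" TEXT, "timestamp" TEXT);
-- Driver ids and names are placeholders, not values from packages.db.
INSERT INTO drivers VALUES (1, 'Driver A'), (2, 'Driver B');
-- Actions and timestamps come from the output above; address ids are placeholders.
INSERT INTO scans VALUES
    (1, 1, 9523, 100, 'Pick', '2023-08-16 21:41:43.219831'),
    (2, 1, 9523, 200, 'Drop', '2023-08-17 03:31:36.856889'),
    (3, 2, 9523, 200, 'Pick', '2023-08-23 19:41:47.913410');
""")

# Full scan history for the package, oldest first, with the scanning driver.
rows = conn.execute(
    """
    SELECT scans."action", scans."timestamp", drivers."name"
    FROM scans
    JOIN drivers ON drivers."id" = scans."driver_id"
    WHERE scans."package_id" = ?
    ORDER BY scans."timestamp"
    """,
    (9523,),
).fetchall()

# If the latest scan is a pick-up, that driver still has the package.
last_action, _, last_driver = rows[-1]
holder = last_driver if last_action == "Pick" else None

# Time between the warehouse drop and the later warehouse pick-up.
warehouse_drop = datetime.fromisoformat(rows[1][1])
warehouse_pick = datetime.fromisoformat(rows[2][1])
dwell = warehouse_pick - warehouse_drop

print(holder, dwell.days)
```

Against the real database, the same join on `drivers` would return the actual name of the driver who last scanned the package.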

### Disconnecting from the Database

In [23]:
connection.close()