# Fetch Challenge - Requirement 2

Generate a query that answers a predetermined business question. 

We will use DuckDB to build a database from the CSV files we exported in the other notebook.

In [25]:
import duckdb

from IPython.display import display, HTML # Widen the cells 
display(HTML("<style>.container { width:100% !important; }</style>"))

In [3]:
# Open (or create) a persistent DuckDB database file named 'Fetch'
conn = duckdb.connect('Fetch')

In [28]:
# DuckDB can query a CSV file directly, without creating a table first
conn.sql("SELECT * FROM 'CSV-Files/users_table.csv' LIMIT 3;")

┌──────────────────────────┬─────────┬─────────────┬────────────┬──────────┬──────────────┬─────────┐
│           _id            │ active  │ createdDate │ lastLogin  │   role   │ signUpSource │  state  │
│         varchar          │ boolean │    date     │    date    │ varchar  │   varchar    │ varchar │
├──────────────────────────┼─────────┼─────────────┼────────────┼──────────┼──────────────┼─────────┤
│ 5ff1e194b6a9d73a3a9f1052 │ true    │ 2021-01-03  │ 2021-01-03 │ consumer │ Email        │ WI      │
│ 5ff1e1eacfcf6c399c274ae6 │ true    │ 2021-01-03  │ 2021-01-03 │ consumer │ Email        │ WI      │
│ 5ff1e1e8cfcf6c399c274ad9 │ true    │ 2021-01-03  │ 2021-01-03 │ consumer │ Email        │ WI      │
└──────────────────────────┴─────────┴─────────────┴────────────┴──────────┴──────────────┴─────────┘

In [11]:
# Create the users table in the Fetch database
# (OR REPLACE lets the cell be re-run against the persistent database file)
conn.sql("""CREATE OR REPLACE TABLE users AS
    SELECT * FROM read_csv('CSV-Files/users_table.csv', AUTO_DETECT=TRUE);""")

In [21]:
# Create the brands table
conn.sql("""CREATE OR REPLACE TABLE brands AS
    SELECT * FROM read_csv('CSV-Files/brands_table.csv', AUTO_DETECT=TRUE);""")

In [12]:
# Create the receipts table
conn.sql("""CREATE OR REPLACE TABLE receipts AS
    SELECT * FROM read_csv('CSV-Files/receipts_table.csv', AUTO_DETECT=TRUE);""")

In [16]:
# Create the receipt_items table
conn.sql("""CREATE OR REPLACE TABLE receipt_items AS
    SELECT * FROM read_csv('CSV-Files/receipt_item_table.csv', AUTO_DETECT=TRUE);""")

## Question 1

What are the top 5 brands by receipts scanned for the most recent month?

In [22]:
conn.sql("""WITH most_recent_month AS (
    SELECT DATE_TRUNC('month', MAX(dateScanned)) as month
    FROM receipts
)
SELECT 
    b.name,
    COUNT(DISTINCT r._id) as number_of_scans,
    SUM(CAST(ri.finalPrice AS DECIMAL)) as total_spent
FROM receipt_items as ri
JOIN receipts as r ON ri.receipt_id = r._id
JOIN brands as b ON CAST(ri.barcode AS VARCHAR) = CAST(b.barcode AS VARCHAR)
WHERE DATE_TRUNC('month', r.dateScanned) = (SELECT month FROM most_recent_month)
GROUP BY b.name
ORDER BY number_of_scans DESC
LIMIT 5;""")

┌─────────┬─────────────────┬───────────────┐
│  name   │ number_of_scans │  total_spent  │
│ varchar │      int64      │ decimal(38,3) │
├───────────────────────────────────────────┤
│                  0 rows                   │
└───────────────────────────────────────────┘

The empty result suggests a problem with the JOIN. Let's check how many receipt items actually match a brand by barcode.

In [23]:
conn.sql("""SELECT COUNT(*)
FROM receipt_items as ri
JOIN brands as b ON CAST(ri.barcode AS VARCHAR) = CAST(b.barcode AS VARCHAR);""")

┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│           89 │
└──────────────┘

That's very few. This is most likely because many receipt items have a null barcode, which severely limits the analysis: of 7381 receipt items, only 89 rows survive the join to brands on barcode. None of those matched items belongs to a receipt scanned in the most recent month, which is why the first query returned no rows.

Let's see whether the description column of the receipt_items table tells a better story. Descriptions are product names rather than brand names, so this is only a proxy for the original question.

In [24]:
conn.sql("""
WITH most_recent_month AS (
    SELECT DATE_TRUNC('month', MAX(dateScanned)) as month
    FROM receipts
)
SELECT 
    description,
    COUNT(DISTINCT r._id) as number_of_receipts,
    SUM(CAST(finalPrice AS DECIMAL)) as total_spent
FROM receipt_items ri
JOIN receipts r ON ri.receipt_id = r._id
WHERE description IS NOT NULL
    AND DATE_TRUNC('month', r.dateScanned) = (SELECT month FROM most_recent_month)
GROUP BY description
ORDER BY number_of_receipts DESC
LIMIT 5;""")

┌─────────────────────────────────────────────────────────────────────────────────┬────────────────────┬───────────────┐
│                                   description                                   │ number_of_receipts │  total_spent  │
│                                     varchar                                     │       int64        │ decimal(38,3) │
├─────────────────────────────────────────────────────────────────────────────────┼────────────────────┼───────────────┤
│ mueller austria hypergrind precision electric spice/coffee grinder millwith l…  │                 13 │       298.610 │
│ thindust summer face mask - sun protection neck gaiter for outdooractivities    │                 13 │       155.870 │
└─────────────────────────────────────────────────────────────────────────────────┴────────────────────┴───────────────┘

## Question 4

When considering the total number of items purchased from receipts with a 'rewardsReceiptStatus' of 'Accepted' or 'Rejected', which is greater?

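Assuming that 'Accepted' corresponds to the 'FINISHED' status, the query below shows that more items were purchased on 'FINISHED' receipts (5,920 item rows, 8,176 total units) than on 'REJECTED' receipts (167 item rows, 141 total units).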

In [26]:
conn.sql("""SELECT 
    r.rewardsReceiptStatus,
    COUNT(DISTINCT r._id) as number_of_receipts,
    COUNT(ri.*) as total_items_purchased,
    SUM(ri.quantityPurchased) as total_quantity_purchased
FROM receipts r
LEFT JOIN receipt_items ri ON r._id = ri.receipt_id
WHERE r.rewardsReceiptStatus IN ('FINISHED', 'REJECTED')
GROUP BY r.rewardsReceiptStatus
ORDER BY total_items_purchased DESC;""")

┌──────────────────────┬────────────────────┬───────────────────────┬──────────────────────────┐
│ rewardsReceiptStatus │ number_of_receipts │ total_items_purchased │ total_quantity_purchased │
│       varchar        │       int64        │         int64         │          double          │
├──────────────────────┼────────────────────┼───────────────────────┼──────────────────────────┤
│ FINISHED             │                518 │                  5920 │                   8176.0 │
│ REJECTED             │                 71 │                   167 │                    141.0 │
└──────────────────────┴────────────────────┴───────────────────────┴──────────────────────────┘