# Graded Lab: Environment Setup Challenge 

## 📋 Overview 
Welcome to your first graded SQL lab! You'll be working as a data analyst for an e-commerce company that needs to audit its product catalog data. Using SQL queries in a Jupyter Notebook, you'll explore the data, identify quality issues, and generate business insights that could help the company optimize its inventory across different cities.

## 🎯 Learning Outcomes 
By the end of this lab, you will be able to:
- Write basic SELECT statements to retrieve data from a product catalog
- Use WHERE clauses to filter query results
- Identify data quality issues using SQL

## 📚 Dataset Information 
You'll be working with the product catalog dataset <b>(products.csv)</b> containing:
- Product IDs and locations
- Product URLs and tags
- Product image links
- Data across multiple cities

## 🚀 Activities

### Activity 1: Initial Set Up and Exploration

Before analyzing the data quality, we need to understand what's in our product catalog.

<b>Step 1:</b> Connect to the Database

In [1]:
import sqlite3
import pandas as pd
from db_setup_challenge import setup_database

# Run the lab's database setup helper
setup_database()

# Connect to the database
conn = sqlite3.connect('product_catalog.db')
cursor = conn.cursor()
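
Once a connection is open, it can help to check which tables and columns exist before writing queries. The following is a minimal sketch, not part of the graded lab: it builds a hypothetical in-memory `products` table (the `demo` connection and its schema are assumptions modeled on the columns shown in Step 2) so that it runs without `product_catalog.db`.

```python
import sqlite3

# Hypothetical stand-in for product_catalog.db so the sketch runs anywhere
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE products ("
    "product_id INTEGER, city TEXT, product_url TEXT, "
    "tags TEXT, product_picture TEXT)"
)

# sqlite_master holds one row per table, index, view, and trigger
tables = [
    row[0]
    for row in demo.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column; position 1 is the column name
columns = [row[1] for row in demo.execute("PRAGMA table_info(products)")]

print(tables)
print(columns)
demo.close()
```

The same two statements should work against the lab's `conn` object, since both are standard SQLite features.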

<b>Step 2:</b> View the Basic Data Structure

In [2]:
query = "SELECT * FROM products LIMIT 5"
df = pd.read_sql_query(query, conn)

# Display the result
df.head()

Unnamed: 0,product_id,city,product_url,tags,product_picture
0,1,New York,https://data_shop_inc.com/product/1,electronics,https://picsum.photos/300?random=1
1,2,Miami,https://data_shop_inc.com/product/2,electronics,https://picsum.photos/300?random=2
2,3,New York,https://data_shop_inc.com/product/3,summer,https://picsum.photos/300?random=3
3,4,Houston,https://data_shop_inc.com/product/4,beauty,https://picsum.photos/300?random=4
4,5,Miami,https://data_shop_inc.com/product/5,beauty,https://picsum.photos/300?random=5


<b>💡 Tip:</b> When exploring a table, start with a LIMIT clause so a large result doesn't overwhelm your output.
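
To see what LIMIT buys you, here is a small sketch that compares an unrestricted query with a limited one. The in-memory table and its 1,000 rows are made up for illustration and are not the lab dataset.

```python
import sqlite3

# Hypothetical table with 1,000 rows (not the lab data)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (product_id INTEGER, city TEXT)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, "Miami") for i in range(1, 1001)],
)

# Without LIMIT every row is fetched; with LIMIT 5 at most five rows come back
all_rows = demo.execute("SELECT * FROM products").fetchall()
preview = demo.execute("SELECT * FROM products LIMIT 5").fetchall()

print(len(all_rows), len(preview))  # 1000 5
demo.close()
```

Note that without an ORDER BY clause, SQL does not guarantee which five rows are returned.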


<b>Step 3:</b> Count the Total Number of Products

In [3]:
# Query to count the total number of products
query = "SELECT COUNT(*) as total_products FROM products"
df = pd.read_sql_query(query, conn)

# Display the result
df.head()

Unnamed: 0,total_products
0,103


<b>Step 4:</b> Count Products by City

In [4]:
# Query to count products by city
query = "SELECT city, COUNT(*) as product_count FROM products GROUP BY city"
df = pd.read_sql_query(query, conn)

# Display the result
df.head()

Unnamed: 0,city,product_count
0,Chicago,23
1,Houston,21
2,Los Angeles,17
3,Miami,24
4,New York,18
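
In the output above, the five city counts (23 + 21 + 17 + 24 + 18) add up to the 103 products found in Step 3. That kind of cross-check is a quick way to confirm a GROUP BY query hasn't dropped any rows. The sketch below shows the same check on a hypothetical five-row table; the ORDER BY clause is an optional addition that sorts the largest group first.

```python
import sqlite3

# Hypothetical five-row table (not the lab data)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (product_id INTEGER, city TEXT)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "Miami"), (2, "Houston"), (3, "Miami"), (4, "Chicago"), (5, "Miami")],
)

total = demo.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# One row per city, largest group first
per_city = demo.execute(
    "SELECT city, COUNT(*) AS product_count FROM products "
    "GROUP BY city ORDER BY product_count DESC"
).fetchall()

# The per-group counts should add up to the overall row count
group_sum = sum(count for _, count in per_city)
print(per_city[0])          # ('Miami', 3)
print(group_sum == total)   # True
demo.close()
```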


### Activity 2: Graded Challenges

For the tasks below, replace <b>"YOUR CODE HERE"</b> with a query that produces the required output. Write each query on a single line, not as a multi-line SQL string. <br>
<b>For example:</b> query = "SELECT column_name FROM table_name" <br>


<b>Task 1:</b> Write a query to find all distinct product categories (tags).

In [5]:
# Query to find distinct product categories (tags)

query = "SELECT DISTINCT tags FROM products"

df = pd.read_sql_query(query, conn)
# Display the result
display(df)

Unnamed: 0,tags
0,electronics
1,summer
2,beauty
3,sale
4,home
5,fashion
6,sports
7,


In [6]:
# Do not edit this cell, just run it. This cell contains test cases.

cursor.execute(query)
rows = cursor.fetchall()
category_count = len(rows)


<b>Task 2:</b> Find how many unique product tags exist in Miami.  

In [7]:
# Query to count unique product tags in Miami

query = "SELECT COUNT(DISTINCT tags) AS unique_tag_count FROM products WHERE city = 'Miami'"

df = pd.read_sql_query(query, conn)
# Display the result
display(df)

Unnamed: 0,unique_tag_count
0,7


In [8]:
# Do not edit this cell, just run it. This cell contains test cases.

cursor.execute(query)
rows = cursor.fetchall()
product_count = rows[0][0]
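
Task 1 and Task 2 treat missing values differently, which is worth knowing when you compare their results. SELECT DISTINCT returns a missing value as a row of its own (the blank last row in the Task 1 output), while COUNT(DISTINCT ...) skips NULLs. The sketch below illustrates this on a hypothetical four-row table; it assumes the blank tag is stored as NULL rather than as an empty string, which the lab output alone doesn't confirm.

```python
import sqlite3

# Hypothetical rows: two real tags (one repeated) and one NULL tag
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (product_id INTEGER, city TEXT, tags TEXT)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Miami", "beauty"),
        (2, "Miami", "home"),
        (3, "Miami", None),      # None is stored as SQL NULL
        (4, "Miami", "beauty"),
    ],
)

# DISTINCT keeps NULL as one of the distinct values
distinct_rows = demo.execute("SELECT DISTINCT tags FROM products").fetchall()

# COUNT(DISTINCT ...) ignores NULL entirely
distinct_count = demo.execute(
    "SELECT COUNT(DISTINCT tags) FROM products"
).fetchone()[0]

print(len(distinct_rows))  # 3
print(distinct_count)      # 2
demo.close()
```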


<b>Task 3:</b> Write a query that will identify which products are lacking visual assets.

In [9]:
# Query to find products lacking visual assets (NULL product_picture)

query = "SELECT * FROM products WHERE product_picture IS NULL"

df = pd.read_sql_query(query, conn)
# Display the result
display(df)

Unnamed: 0,product_id,city,product_url,tags,product_picture
0,6,Miami,https://data_shop_inc.com/product/6,beauty,
1,10,Chicago,https://data_shop_inc.com/product/10,beauty,
2,14,Miami,https://data_shop_inc.com/product/14,home,
3,23,Chicago,https://data_shop_inc.com/product/23,fashion,
4,33,Miami,https://data_shop_inc.com/product/33,,
5,42,New York,https://data_shop_inc.com/product/42,summer,
6,72,Chicago,https://data_shop_inc.com/product/72,fashion,
7,82,Houston,https://data_shop_inc.com/product/82,sale,
8,92,New York,https://data_shop_inc.com/product/92,sale,


In [10]:
# Do not edit this cell, just run it. This cell contains test cases.

cursor.execute(query)
row = cursor.fetchall()
null_picture_count = len(row) 
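
A common mistake in this kind of task is to write `= NULL` instead of `IS NULL`. In SQL, comparing anything to NULL with `=` never evaluates to true, so such a query returns no rows. The sketch below shows the difference on a hypothetical three-row table, and also shows that an empty string is not the same as NULL.

```python
import sqlite3

# Hypothetical rows: one real link, one NULL, one empty string
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (product_id INTEGER, product_picture TEXT)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (1, "https://picsum.photos/300?random=1"),
        (2, None),   # stored as SQL NULL
        (3, ""),     # empty string, not NULL
    ],
)

is_null = demo.execute(
    "SELECT product_id FROM products WHERE product_picture IS NULL"
).fetchall()
equals_null = demo.execute(
    "SELECT product_id FROM products WHERE product_picture = NULL"
).fetchall()
empty_string = demo.execute(
    "SELECT product_id FROM products WHERE product_picture = ''"
).fetchall()

print(is_null)       # [(2,)]
print(equals_null)   # []
print(empty_string)  # [(3,)]
demo.close()
```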


<b>Task 4:</b> Find products where the product_url doesn't contain 'data_shop_inc.com'.

<b>💡 Tip:</b> Use <b>NOT LIKE</b> with the % wildcard on both sides of 'data_shop_inc.com' to find URLs that don't contain it.
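
Two details of LIKE are easy to miss. First, a row whose URL is NULL is returned by neither LIKE nor NOT LIKE. Second, `_` is itself a wildcard that matches any single character, so the pattern `'%data_shop_inc.com%'` is slightly looser than it looks. The sketch below demonstrates both on a hypothetical four-row table. The `ESCAPE '!'` variant is shown only for illustration; the choice of `!` as the escape character is arbitrary, and the lab's Task 4 doesn't require it.

```python
import sqlite3

# Hypothetical rows (not the lab data)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (product_id INTEGER, product_url TEXT)")
demo.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (1, "https://data_shop_inc.com/product/1"),  # correct domain
        (2, "https://product/2"),                    # domain missing
        (3, None),                                   # NULL URL
        (4, "https://dataXshopXinc.com/product/4"),  # look-alike domain
    ],
)

# Plain NOT LIKE: "_" matches any character, so row 4 counts as a match
# and is filtered out; row 3 is NULL and is never returned
loose = demo.execute(
    "SELECT product_id FROM products "
    "WHERE product_url NOT LIKE '%data_shop_inc.com%' ORDER BY product_id"
).fetchall()

# With ESCAPE, "!_" means a literal underscore, so row 4 is now reported
strict = demo.execute(
    "SELECT product_id FROM products "
    "WHERE product_url NOT LIKE '%data!_shop!_inc.com%' ESCAPE '!' "
    "ORDER BY product_id"
).fetchall()

print(loose)   # [(2,)]
print(strict)  # [(2,), (4,)]
demo.close()
```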

In [11]:
# Query to find products where product_url doesn't contain 'data_shop_inc.com'

query = "SELECT * FROM products WHERE product_url NOT LIKE '%data_shop_inc.com%'"

df = pd.read_sql_query(query, conn)
# Display the result
display(df)

Unnamed: 0,product_id,city,product_url,tags,product_picture
0,8,New York,https://product/8,summer,https://picsum.photos/300?random=8
1,13,Miami,https://product/13,home,https://picsum.photos/300?random=13
2,52,Miami,https://product/52,fashion,https://picsum.photos/300?random=52
3,87,Chicago,https://product/87,sale,https://picsum.photos/300?random=87
4,97,Miami,https://product/97,electronics,https://picsum.photos/300?random=97


In [12]:
# Do not edit this cell, just run it. This cell contains test cases.
cursor.execute(query)
rows = cursor.fetchall()
null_producturl_count = len(rows) 


## 🔍 Verify Your Results Before Submission
Please check that your queries returned the following expected results:
1. Task 1: Your query should return 8 rows: 7 named product categories (tags) plus one blank value
2. Task 2: You should find 7 unique product tags in Miami
3. Task 3: Your query should identify 9 products lacking visual assets (NULL product_picture)
4. Task 4: You should find 5 products where product_url doesn't contain 'data_shop_inc.com'

## 📋 Pre-Submission Checklist
- All queries run without syntax errors
- Each query returns the expected number of results
- You've used the correct SQL clauses:
  * Task 1: SELECT DISTINCT
  * Task 2: COUNT, DISTINCT, WHERE
  * Task 3: WHERE ... IS NULL
  * Task 4: WHERE ... NOT LIKE
  

## 🔄 Need to Review?
If your results don't match the expected outputs:
1. Double-check your WHERE clauses for correct conditions
2. Verify you're using DISTINCT where required
3. Ensure your column names match exactly (product_picture, product_url, etc.)
4. Run each query again after making corrections
    
You can retry the lab as many times as needed until you get the correct results.
Ready to submit? Make sure all your cells have been run in order and show the expected outputs!
