# Ungraded Lab: Data Exploration Lab

## 📋 Overview 
Welcome to the Data Exploration lab! In this hands-on session, you'll investigate BookCycle's data to help the management team gain valuable insights about their inventory and sales. You'll learn how to sort query results and use basic aggregate functions to summarize data, skills that are crucial for data analysis in real business settings.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:
<ul>
    <li>Sort query results using ORDER BY</li>
    <li>Use basic aggregate functions (COUNT, SUM, AVG) to summarize data</li>
    <li>Apply filters with WHERE clauses in combination with aggregations</li>
    <li>Interpret summarized data to derive business insights</li>
</ul>

## 📚 Dataset Information
We'll be working with the 'books' table from the BookCycle database. This table contains information about the books in inventory, including details like title, author, genre, condition, pricing, and location.
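Before querying an unfamiliar table, it can help to list its columns. The sketch below uses SQLite's `PRAGMA table_info` on a small in-memory stand-in for `books`. The column types are assumptions made for illustration, since the lab does not show the real schema, and only a subset of the columns is included.

```python
import sqlite3

# In-memory stand-in for the lab's 'books' table. The column types are
# assumptions for illustration; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        book_id TEXT PRIMARY KEY,
        title TEXT,
        author TEXT,
        genre TEXT,
        condition TEXT,
        list_price REAL,
        quantity INTEGER
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(books);").fetchall()
column_names = [row[1] for row in columns]

for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<12} {col_type}")

conn.close()
```

In the lab itself, the same `PRAGMA` statement could be passed to `pd.read_sql_query` once you have connected to `bookcycle.db`.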

## 🖥️ Activities

### Activity 1: Connecting to the Database and Basic Sorting 

As a data analyst at BookCycle, your first task is to organize the book inventory data to help the management team quickly access information about their stock.

<b>Step 1</b>: Import the necessary libraries and connect to the database:


In [1]:
import sqlite3
import pandas as pd

# Setting up the database. DO NOT edit the code given below
from db_setup import setup_database
setup_database() 

✅ Database setup complete: Tables created and populated with data!


In [2]:
# Connect to the SQLite database
conn = sqlite3.connect('bookcycle.db')

# Test the connection by querying the first 5 rows of the books table
query = """
SELECT *
FROM books
LIMIT 5;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | book_id | title | author | isbn | genre | condition | purchase_price | list_price | date_acquired | current_location | quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B1001 | A Christmas Carol | Charles Dickens | 9780141324524 | Classic Fiction | Good | 5.5 | 8.99 | 2023-01-15 | Suburban | 2 |
| 1 | B1002 | A Farewell to Arms | Ernest Hemingway | 9780684801469 | Classic Fiction | Very Good | 7.0 | 11.99 | 2023-01-15 | University | 3 |
| 2 | B1003 | A Tale of Two Cities | Charles Dickens | 9780141439600 | Classic Fiction | Fair | 4.5 | 7.99 | 2023-01-16 | Downtown | 2 |
| 3 | B1004 | Adventures of Huckleberry Finn | Mark Twain | 9780142437179 | Classic Fiction | Good | 6.0 | 9.99 | 2023-01-16 | University | 4 |
| 4 | B1005 | Agnes Grey | Anne Bronte | 9780140432107 | Classic Fiction | Fair | 4.0 | 7.99 | 2023-01-17 | Suburban | 1 |


<b>Step 2:</b> Let's sort the books by list price in descending order and display the 10 most expensive:

In [3]:
query = """
SELECT title, author, list_price
FROM books
ORDER BY list_price DESC
LIMIT 10;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | title | author | list_price |
|---|---|---|---|
| 0 | Crime and Punishment | Fyodor Dostoevsky | 13.99 |
| 1 | Iliad | Homer | 13.99 |
| 2 | Moby Dick | Herman Melville | 13.99 |
| 3 | The Catcher in the Rye | J.D. Salinger | 13.99 |
| 4 | The Republic | Plato | 13.99 |
| 5 | Thus Spoke Zarathustra | Friedrich Nietzsche | 13.99 |
| 6 | Ulysses | James Joyce | 13.99 |
| 7 | Anne of Green Gables | L.M. Montgomery | 12.99 |
| 8 | Brave New World | Aldous Huxley | 12.99 |
| 9 | Fahrenheit 451 | Ray Bradbury | 12.99 |


<b>💡 Tip:</b> The ORDER BY clause sorts the results. DESC specifies descending order; without it, results are sorted in ascending order (ASC) by default.
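Several books in the output above share the same list price, and SQL does not guarantee any particular order among tied rows. A minimal sketch of one way to make the order predictable is to add a second sort key. It runs against a small in-memory table (not the lab database) filled with four titles and prices taken from the earlier outputs.

```python
import sqlite3

# Small in-memory table so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Ulysses", "James Joyce", 13.99),
        ("Brave New World", "Aldous Huxley", 12.99),
        ("Iliad", "Homer", 13.99),
        ("Agnes Grey", "Anne Bronte", 7.99),
    ],
)

# Sort by price (highest first); break ties alphabetically by title.
query = """
SELECT title, list_price
FROM books
ORDER BY list_price DESC, title ASC;
"""
rows = conn.execute(query).fetchall()

for title, price in rows:
    print(price, title)

conn.close()
```

Each column in the ORDER BY list takes its own direction, so `DESC` applies only to `list_price` here.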

<b>Step 3: Try it yourself:</b> Sort the books by title in alphabetical order and display the first 15 results.

In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

#### ⚙️ Test Your Work:
<ul>
    <li>Did your query execute without errors?</li>
    <li>Are the books sorted alphabetically by title?</li>
    <li>Did you see 15 results?</li>
</ul>

### Activity 2: Using Aggregate Functions 

The management team wants to understand the overall state of their inventory. You'll use aggregate functions to provide summary statistics.

<b>Step 1:</b> Let's start by counting the total number of books in the inventory:

In [4]:
query = """
SELECT COUNT(*) as total_books
FROM books;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | total_books |
|---|---|
| 0 | 100 |
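Note that `COUNT(*)` counts rows, which here means distinct inventory entries. If the `quantity` column holds the number of copies on hand (an assumption; the lab does not define it), the number of physical books would come from adding up `quantity` instead. A small sketch using the five rows shown in Activity 1, loaded into an in-memory table:

```python
import sqlite3

# In-memory table holding the five sample rows from Activity 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("A Christmas Carol", 2),
        ("A Farewell to Arms", 3),
        ("A Tale of Two Cities", 2),
        ("Adventures of Huckleberry Finn", 4),
        ("Agnes Grey", 1),
    ],
)

# COUNT(*) counts rows; SUM(quantity) adds up the values in one column.
query = """
SELECT COUNT(*) AS total_titles,
       SUM(quantity) AS total_copies
FROM books;
"""
total_titles, total_copies = conn.execute(query).fetchone()
print(total_titles, total_copies)

conn.close()
```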


<b>Step 2:</b> Let's calculate the average purchase price and list price of the books:

In [5]:
query = """
SELECT 
    AVG(purchase_price) as avg_purchase_price,
    AVG(list_price) as avg_list_price
FROM books;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | avg_purchase_price | avg_list_price |
|---|---|---|
| 0 | 6.44 | 11.01 |
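Two details are worth knowing when you report averages: aggregate functions such as `AVG` skip NULL values, and `ROUND` can trim the result to two decimal places. The sketch below illustrates both on an in-memory table. The unpriced row is hypothetical, added only to show the NULL behavior; the lab data may not contain any missing prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, list_price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("A Christmas Carol", 8.99),
        ("A Farewell to Arms", 11.99),
        ("A Tale of Two Cities", 7.99),
        ("Hypothetical Unpriced Book", None),  # hypothetical row with a NULL price
    ],
)

# COUNT(*) counts every row; COUNT(list_price) and AVG(list_price)
# only consider rows where list_price is not NULL.
query = """
SELECT COUNT(*) AS all_rows,
       COUNT(list_price) AS priced_rows,
       ROUND(AVG(list_price), 2) AS avg_list_price
FROM books;
"""
all_rows, priced_rows, avg_list_price = conn.execute(query).fetchone()
print(all_rows, priced_rows, avg_list_price)

conn.close()
```

The average is computed over the three priced rows only, so a missing price does not pull the result toward zero.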


<b>Step 3: Try it yourself:</b> Calculate the total inventory value (the sum of all list prices) and the number of unique authors in the database. Name the result columns total_inventory_value and unique_authors.


In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

#### ⚙️ Test Your Work:
<ul>
    <li>Did your query execute without errors?</li>
    <li>Do you see two columns: total_inventory_value and unique_authors?</li>
    <li>Are the values reasonable given what you know about the bookstore?</li>
</ul>

### Activity 3: Combining Aggregations with Filters

The management team wants to analyze the inventory of specific genres and conditions to make informed decisions about future purchases.

<b>Step 1:</b> Let's find the average list price for books in the 'Classic Fiction' genre:

In [6]:
query = """
SELECT AVG(list_price) as avg_price_classic_fiction
FROM books
WHERE genre = 'Classic Fiction';
"""

df = pd.read_sql_query(query, conn)
display(df)

| | avg_price_classic_fiction |
|---|---|
| 0 | 11.08589 |


<b>Step 2:</b> Let's count the number of books in 'Very Good' condition for each genre:

In [7]:
query = """
SELECT genre, COUNT(*) as count_very_good
FROM books
WHERE condition = 'Very Good'
GROUP BY genre
ORDER BY count_very_good DESC;
"""

df = pd.read_sql_query(query, conn)
display(df)

| | genre | count_very_good |
|---|---|---|
| 0 | Classic Fiction | 19 |
| 1 | Classic Poetry | 4 |
| 2 | Classic Drama | 4 |
| 3 | Classic Mystery | 1 |
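The WHERE clause removes rows before GROUP BY forms the groups. One consequence is that a genre with no matching rows does not appear in the result at all, rather than showing a count of 0. The sketch below runs the same query as above on a few hypothetical genre and condition pairs in an in-memory table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (genre TEXT, condition TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        # Hypothetical rows, chosen only to illustrate the behavior.
        ("Classic Fiction", "Good"),
        ("Classic Fiction", "Very Good"),
        ("Classic Fiction", "Fair"),
        ("Classic Poetry", "Very Good"),
        ("Classic Fiction", "Very Good"),
        ("Classic Drama", "Good"),
    ],
)

# Rows are filtered by WHERE first, then grouped, then sorted.
query = """
SELECT genre, COUNT(*) AS count_very_good
FROM books
WHERE condition = 'Very Good'
GROUP BY genre
ORDER BY count_very_good DESC;
"""
rows = conn.execute(query).fetchall()
genres_in_result = [genre for genre, _ in rows]
print(rows)

conn.close()
```

'Classic Drama' has no 'Very Good' rows in this sample, so it is missing from the output entirely.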


<b>Step 3: Try it yourself:</b> Find the total value (sum of list prices) of books in 'Like New' condition for each location, sorted by total value in descending order.

In [None]:
query = """
<YOUR CODE HERE>
"""

df = pd.read_sql_query(query, conn)
display(df)

#### Close the Connection
It's good practice to close the database connection when you're done:

In [8]:
# Close the database connection 
conn.close()
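If you would rather not rely on remembering to call `close()`, one option is `contextlib.closing` from the standard library, which closes the connection when the `with` block ends, even if a query raises an error. A minimal sketch, using an in-memory database instead of `bookcycle.db`:

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() automatically when the block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE books (title TEXT)")
    conn.execute("INSERT INTO books VALUES ('Moby Dick')")
    total = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]

# The connection is now closed, so any further query fails.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(total, still_open)
```

Using the connection object directly as a context manager (`with sqlite3.connect(...) as conn:`) only commits or rolls back the transaction and does not close the connection, which is why `closing` is used here.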

#### ⚙️ Test Your Work:

- Did your query execute without errors?
- Do you see results for different locations?
- Are the results sorted with the highest total value first?

## ✅ Success Checklist
- You can sort query results using ORDER BY
- You can use COUNT, SUM, and AVG functions to summarize data
- You can combine WHERE clauses with aggregations
- You can interpret the results to derive business insights

## 🔍 Common Issues & Solutions 
- Problem: Syntax error in SQL query 
    - Solution: Double-check your SQL syntax, especially commas between selected columns and semicolons at the end of queries
- Problem: Unexpected results from aggregations 
    - Solution: Verify that you're grouping correctly and using the right aggregate function for your needs
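A missing comma deserves extra care because it does not always raise an error. In SQLite, `SELECT title author` is valid SQL: `author` is read as an alias for `title`, so the query runs but returns one column instead of two. The sketch below shows both the silent case and a case that does raise a syntax error, using an in-memory table with one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, list_price REAL)")
conn.execute("INSERT INTO books VALUES ('Moby Dick', 'Herman Melville', 13.99)")

# One missing comma: 'author' becomes an alias for title, so the query
# runs without error but returns a single, misleadingly named column.
cursor = conn.execute("SELECT title author FROM books;")
aliased_columns = [col[0] for col in cursor.description]
aliased_row = cursor.fetchone()

# Two missing commas: this is no longer valid SQL and raises an error.
try:
    conn.execute("SELECT title author list_price FROM books;")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(aliased_columns, aliased_row)
print(error_message)

conn.close()
```

When a query runs through `pd.read_sql_query` instead, pandas may wrap the underlying error in its own exception type, but the SQLite message still points to the text near the problem.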

## ➡️ Summary

Congratulations on completing the Data Exploration lab. You've practiced the essential SQL skills for sorting and aggregating data: ordering results, summarizing them with COUNT, SUM, and AVG, and filtering before you aggregate. These skills let you turn raw inventory records into summaries that support data-driven business decisions.

### 🔑 Key Points
- ORDER BY is used to sort results in ascending (default) or descending (DESC) order
- Aggregate functions like COUNT, SUM, and AVG summarize data
- WHERE clauses can be used with aggregations to filter data before summarizing
- GROUP BY is used with aggregations to summarize data for each group
