# SQL for Analysts

## Introduction
SQL (Structured Query Language) is essential for analysts to query, filter, aggregate, and join data directly from databases.
In this notebook, we’ll focus on the SQL commands analysts use most often for reporting and analytics.
We’ll use SQLite within Python for a self-contained example.

## 1. Setting up SQLite Database in Python




In [1]:
import sqlite3
import pandas as pd

# Create an in-memory SQLite database
conn = sqlite3.connect(":memory:")

# Sample dataset
data = {
    "CustomerID": [1, 2, 3, 4, 5],
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Country": ["USA", "USA", "UK", "Germany", "USA"],
    "Sales": [250, 450, 300, 500, 150]
}

df = pd.DataFrame(data)

# Load into SQLite
df.to_sql("customers", conn, index=False, if_exists="replace")

df

|   | CustomerID | Name | Country | Sales |
|---|---|---|---|---|
| 0 | 1 | Alice | USA | 250 |
| 1 | 2 | Bob | USA | 450 |
| 2 | 3 | Charlie | UK | 300 |
| 3 | 4 | David | Germany | 500 |
| 4 | 5 | Eva | USA | 150 |


### Explanation

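- We created a small customer dataset as a pandas DataFrame.

- `to_sql` loaded it into an in-memory SQLite table named `customers`, which stands in for a real database table. `if_exists="replace"` overwrites the table if the cell is rerun.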
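
To confirm that a table was actually created, you can ask SQLite for its own metadata. The following is a minimal sketch that rebuilds the same sample data with plain `sqlite3` so it runs on its own; the column types in the `CREATE TABLE` statement are an assumption and may differ from what `to_sql` infers.

```python
import sqlite3

# Rebuild the sample table without pandas so this snippet is self-contained.
# The column types below are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, Name TEXT, Country TEXT, Sales INTEGER)"
)
rows = [
    (1, "Alice", "USA", 250),
    (2, "Bob", "USA", 450),
    (3, "Charlie", "UK", 300),
    (4, "David", "Germany", 500),
    (5, "Eva", "USA", 150),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# sqlite_master lists the objects (tables, indexes) stored in the database.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column; position 1 holds the column name.
columns = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(tables, columns, row_count)
# ['customers'] ['CustomerID', 'Name', 'Country', 'Sales'] 5
```

Within the notebook, the same two queries can be run against the existing `conn` instead of rebuilding the table.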

## 2. Selecting & Filtering Data



In [2]:
query = "SELECT * FROM customers WHERE Sales > 200;"
pd.read_sql_query(query, conn)

|   | CustomerID | Name | Country | Sales |
|---|---|---|---|---|
| 0 | 1 | Alice | USA | 250 |
| 1 | 2 | Bob | USA | 450 |
| 2 | 3 | Charlie | UK | 300 |
| 3 | 4 | David | Germany | 500 |


### Explanation

- `SELECT *` fetches all columns.

- `WHERE Sales > 200` keeps only the rows whose `Sales` value exceeds 200, so Eva (150) is excluded.
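
In reports you often want only a few columns and more than one condition, with the threshold supplied from Python rather than typed into the SQL string. The sketch below shows one way to do that using `?` placeholders; the variable names `min_sales` and `country` are illustrative, and the table is rebuilt with plain `sqlite3` (assumed column types) so the snippet runs standalone.

```python
import sqlite3

# Self-contained copy of the sample table (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, Name TEXT, Country TEXT, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "USA", 250),
        (2, "Bob", "USA", 450),
        (3, "Charlie", "UK", 300),
        (4, "David", "Germany", 500),
        (5, "Eva", "USA", 150),
    ],
)

# Hypothetical filter values; in practice these might come from user input.
min_sales = 200
country = "USA"

# Name only the columns you need, and combine conditions with AND.
# The ? placeholders are filled in by sqlite3, which avoids building SQL by string concatenation.
query = """
SELECT Name, Sales
FROM customers
WHERE Sales > ? AND Country = ?
ORDER BY Sales DESC;
"""
result = conn.execute(query, (min_sales, country)).fetchall()

print(result)
# [('Bob', 450), ('Alice', 250)]
```

`pd.read_sql_query` accepts the same values through its `params` argument, so the query string can be reused unchanged with pandas.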

## 3. Aggregating Data



In [3]:
query = """
SELECT Country, SUM(Sales) AS TotalSales
FROM customers
GROUP BY Country
ORDER BY TotalSales DESC;
"""
pd.read_sql_query(query, conn)

|   | Country | TotalSales |
|---|---|---|
| 0 | USA | 850 |
| 1 | Germany | 500 |
| 2 | UK | 300 |


### Explanation

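- `GROUP BY Country` collapses the rows into one group per country.

- `SUM(Sales)` adds up the sales within each group, and `AS TotalSales` names the resulting column.

- `ORDER BY TotalSales DESC` sorts the result from highest to lowest total sales.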
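
A single `GROUP BY` query can compute several aggregates at once. The following sketch adds `COUNT`, `AVG`, and `MAX` alongside `SUM`; the output column names (`Customers`, `AvgSales`, `TopSale`) are made up for this example, and the table is rebuilt with plain `sqlite3` under assumed column types.

```python
import sqlite3

# Self-contained copy of the sample table (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, Name TEXT, Country TEXT, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "USA", 250),
        (2, "Bob", "USA", 450),
        (3, "Charlie", "UK", 300),
        (4, "David", "Germany", 500),
        (5, "Eva", "USA", 150),
    ],
)

# Each aggregate is evaluated once per country group.
query = """
SELECT Country,
       COUNT(*)   AS Customers,
       SUM(Sales) AS TotalSales,
       AVG(Sales) AS AvgSales,
       MAX(Sales) AS TopSale
FROM customers
GROUP BY Country
ORDER BY TotalSales DESC;
"""
result = conn.execute(query).fetchall()

for country, customers, total, avg, top in result:
    print(f"{country}: {customers} customers, total {total}, average {avg:.2f}, top {top}")
# USA: 3 customers, total 850, average 283.33, top 450
# Germany: 1 customers, total 500, average 500.00, top 500
# UK: 1 customers, total 300, average 300.00, top 300
```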

## 4. Joining Tables



In [4]:
# Create another table
orders = pd.DataFrame({
    "OrderID": [101, 102, 103, 104],
    "CustomerID": [1, 2, 2, 3],
    "OrderAmount": [100, 200, 150, 300]
})

orders.to_sql("orders", conn, index=False, if_exists="replace")

# SQL Join
query = """
SELECT c.Name, o.OrderAmount
FROM customers c
JOIN orders o
ON c.CustomerID = o.CustomerID;
"""
pd.read_sql_query(query, conn)

|   | Name | OrderAmount |
|---|---|---|
| 0 | Alice | 100 |
| 1 | Bob | 150 |
| 2 | Bob | 200 |
| 3 | Charlie | 300 |


### Explanation

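- `JOIN` (an inner join by default) combines rows from two tables. Customers without any orders, here David and Eva, do not appear in the result.

- The `ON` condition states which rows match: a customer row is paired with every order row that has the same `CustomerID`, so Bob appears twice.

- The query has no `ORDER BY`, so the order of the returned rows is not guaranteed.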
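
If customers without orders should stay in the report, a `LEFT JOIN` is the usual alternative. The sketch below counts and totals orders per customer; the output column names `OrderCount` and `TotalOrdered` are invented for the example, and both tables are rebuilt with plain `sqlite3` under assumed column types.

```python
import sqlite3

# Self-contained copies of both sample tables (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, Name TEXT, Country TEXT, Sales INTEGER)"
)
conn.execute("CREATE TABLE orders (OrderID INTEGER, CustomerID INTEGER, OrderAmount INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "USA", 250),
        (2, "Bob", "USA", 450),
        (3, "Charlie", "UK", 300),
        (4, "David", "Germany", 500),
        (5, "Eva", "USA", 150),
    ],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, 100), (102, 2, 200), (103, 2, 150), (104, 3, 300)],
)

# LEFT JOIN keeps every customer; order columns are NULL where no order matches.
# COUNT(o.OrderID) ignores those NULLs, and COALESCE turns a NULL sum into 0.
query = """
SELECT c.Name,
       COUNT(o.OrderID)                AS OrderCount,
       COALESCE(SUM(o.OrderAmount), 0) AS TotalOrdered
FROM customers c
LEFT JOIN orders o
ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID, c.Name
ORDER BY c.CustomerID;
"""
result = conn.execute(query).fetchall()

print(result)
# [('Alice', 1, 100), ('Bob', 2, 350), ('Charlie', 1, 300), ('David', 0, 0), ('Eva', 0, 0)]
```

The explicit `ORDER BY` makes the row order deterministic, which is worth doing whenever the output feeds a report.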

## 5. Using SQL for Insights


In [5]:
query = """
SELECT Country, AVG(Sales) AS AvgSales
FROM customers
GROUP BY Country
HAVING AvgSales > 200;
"""
pd.read_sql_query(query, conn)

|   | Country | AvgSales |
|---|---|---|
| 0 | Germany | 500.0 |
| 1 | UK | 300.0 |
| 2 | USA | 283.333333 |


### Explanation

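- `HAVING` filters groups after aggregation, whereas `WHERE` filters individual rows before grouping.

- Only countries with average sales above 200 are kept. In this dataset all three countries pass the threshold, so none is dropped.

- SQLite lets `HAVING` refer to the alias `AvgSales`. Some other databases (PostgreSQL, for example) require the expression to be repeated: `HAVING AVG(Sales) > 200`.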
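
The difference between `WHERE` and `HAVING` is easiest to see when both appear in one query. The following is a small sketch with thresholds (200 and 300) chosen only for illustration; it repeats `AVG(Sales)` in the `HAVING` clause rather than using the alias, and rebuilds the table with plain `sqlite3` under assumed column types.

```python
import sqlite3

# Self-contained copy of the sample table (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, Name TEXT, Country TEXT, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "USA", 250),
        (2, "Bob", "USA", 450),
        (3, "Charlie", "UK", 300),
        (4, "David", "Germany", 500),
        (5, "Eva", "USA", 150),
    ],
)

# WHERE removes Eva's row (150) before grouping, so the USA average
# is computed over Alice and Bob only: (250 + 450) / 2 = 350.
# HAVING then drops the UK group, whose average of 300 is not above 300.
query = """
SELECT Country, AVG(Sales) AS AvgSales
FROM customers
WHERE Sales > 200
GROUP BY Country
HAVING AVG(Sales) > 300
ORDER BY AvgSales DESC;
"""
result = conn.execute(query).fetchall()

print(result)
# [('Germany', 500.0), ('USA', 350.0)]
```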

## Conclusion
In this notebook, we learned:

- How to load a pandas DataFrame into SQLite and query it with SQL

- Basic querying, filtering, and aggregation

- Joining multiple datasets

- Extracting insights with aggregate functions and `HAVING`

SQL is a must-have tool for analysts to efficiently retrieve and manipulate data from databases.