<a href="https://colab.research.google.com/github/boonecabaldev/pandas_exercises/blob/main/Pandas_Exercises_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Exercises


You're doing great, son.

## Problem 1: Merging and Joining DataFrames

---

*Files: `orders.csv` and `customers.csv`*

`orders.csv`:

```
OrderID,CustomerID,OrderDate,Amount
101,1,2023-01-15,250.50
102,3,2023-02-10,180.75
103,2,2023-03-05,315.00
104,1,2023-03-20,400.25
105,4,2023-04-12,95.80
```

`customers.csv`:

```
CustomerID,FirstName,LastName,City
1,John,Doe,New York
2,Jane,Smith,Los Angeles
3,Alice,Johnson,Chicago
4,Bob,Brown,Miami
```

**Tasks:**

1. Read both CSV files into DataFrames.
2. Merge the DataFrames based on the `CustomerID` column.
3. Calculate the total amount spent per customer.
4. Find customers who haven't placed any orders.

**Solution:**

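```python
import pandas as pd

# 1. Read both CSV files
orders_frame = pd.read_csv('sample_data/orders.csv')
customers_frame = pd.read_csv('sample_data/customers.csv')

# 2. Merge on the shared CustomerID column (inner join by default)
merged_frame = pd.merge(orders_frame, customers_frame, on='CustomerID')

# 3. Total amount spent per customer
total_amount_per_customer = merged_frame.groupby('CustomerID')['Amount'].sum()

# 4. Customers whose ID never appears in the orders table
#    (empty for this data: every customer has at least one order)
customers_without_orders = customers_frame[~customers_frame['CustomerID'].isin(orders_frame['CustomerID'])]

total_amount_per_customer
```

```
CustomerID
1    650.75
2    315.00
3    180.75
4     95.80
Name: Amount, dtype: float64
```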
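Grouping by `CustomerID` alone leaves the totals unlabelled. A minimal sketch of task 3 with the customer names attached, assuming the sample rows above are typed in directly instead of read from `sample_data/` (the `orders`, `customers`, and `totals` names are illustrative):

```python
import pandas as pd

# Rows copied from orders.csv and customers.csv above (built inline instead of read from disk)
orders = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105],
    'CustomerID': [1, 3, 2, 1, 4],
    'OrderDate': ['2023-01-15', '2023-02-10', '2023-03-05', '2023-03-20', '2023-04-12'],
    'Amount': [250.50, 180.75, 315.00, 400.25, 95.80],
})
customers = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'FirstName': ['John', 'Jane', 'Alice', 'Bob'],
    'LastName': ['Doe', 'Smith', 'Johnson', 'Brown'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami'],
})

merged = orders.merge(customers, on='CustomerID')

# Group on the ID plus the name columns so each total says who spent it;
# as_index=False returns a flat DataFrame instead of a MultiIndex Series
totals = (
    merged.groupby(['CustomerID', 'FirstName', 'LastName'], as_index=False)['Amount']
          .sum()
          .sort_values('Amount', ascending=False)
)
print(totals)
```

Sorting by `Amount` in descending order puts the biggest spender (John Doe, 650.75) first.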

```python
orders_frame
```

|   | OrderID | CustomerID | OrderDate | Amount |
|---|---|---|---|---|
| 0 | 101 | 1 | 2023-01-15 | 250.5 |
| 1 | 102 | 3 | 2023-02-10 | 180.75 |
| 2 | 103 | 2 | 2023-03-05 | 315.0 |
| 3 | 104 | 1 | 2023-03-20 | 400.25 |
| 4 | 105 | 4 | 2023-04-12 | 95.8 |
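By default `read_csv` leaves `OrderDate` as plain strings, so date operations will not work on it. A minimal sketch of task 1 with `parse_dates`, assuming the first two order rows are embedded as text so it runs without the file:

```python
import io

import pandas as pd

# First rows of orders.csv above, embedded as text so the sketch runs without the file
orders_csv = io.StringIO(
    "OrderID,CustomerID,OrderDate,Amount\n"
    "101,1,2023-01-15,250.50\n"
    "102,3,2023-02-10,180.75\n"
)

# parse_dates converts the OrderDate column from strings to datetime64 values
orders = pd.read_csv(orders_csv, parse_dates=['OrderDate'])
print(orders.dtypes)
```

Once parsed, the `.dt` accessor (for example `orders['OrderDate'].dt.month`) becomes available.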


```python
customers_frame
```

|   | CustomerID | FirstName | LastName | City |
|---|---|---|---|---|
| 0 | 1 | John | Doe | New York |
| 1 | 2 | Jane | Smith | Los Angeles |
| 2 | 3 | Alice | Johnson | Chicago |
| 3 | 4 | Bob | Brown | Miami |


```python
merged_frame
```

|   | OrderID | CustomerID | OrderDate | Amount | FirstName | LastName | City |
|---|---|---|---|---|---|---|---|
| 0 | 101 | 1 | 2023-01-15 | 250.5 | John | Doe | New York |
| 1 | 104 | 1 | 2023-03-20 | 400.25 | John | Doe | New York |
| 2 | 102 | 3 | 2023-02-10 | 180.75 | Alice | Johnson | Chicago |
| 3 | 103 | 2 | 2023-03-05 | 315.0 | Jane | Smith | Los Angeles |
| 4 | 105 | 4 | 2023-04-12 | 95.8 | Bob | Brown | Miami |
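The default inner merge drops customers without orders, so it cannot show them directly. An alternative sketch for task 4 uses a left merge with `indicator=True` (an anti-join). It assumes trimmed-down inline copies of the sample data plus one **hypothetical** customer, Carol White (ID 5), who is not in `customers.csv`; she is added only so the anti-join has something to find:

```python
import pandas as pd

# Trimmed copies of the sample data above
orders = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105],
    'CustomerID': [1, 3, 2, 1, 4],
    'Amount': [250.50, 180.75, 315.00, 400.25, 95.80],
})
customers = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'FirstName': ['John', 'Jane', 'Alice', 'Bob', 'Carol'],  # ID 5 'Carol' is HYPOTHETICAL
    'LastName': ['Doe', 'Smith', 'Johnson', 'Brown', 'White'],
})

# how='left' keeps every customer; indicator=True adds a '_merge' column
# that is 'left_only' for customers with no matching order
joined = customers.merge(orders, on='CustomerID', how='left', indicator=True)
customers_without_orders = joined.loc[joined['_merge'] == 'left_only', customers.columns]
print(customers_without_orders)
```

With the real `customers.csv` the result would be empty, because every customer there has at least one order.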


## Problem 2: Reshaping Data with Pivot Tables

*File: `sales_data_pivot.csv`*

```
Region,Product,Quarter,Sales
North,A,Q1,5000
South,A,Q1,4500
East,B,Q1,7000
West,B,Q1,6800
North,A,Q2,5200
South,A,Q2,4800
East,B,Q2,7200
West,B,Q2,7100
```

**Tasks:**

1. Read the CSV.
2. Create a pivot table with 'Region' as the index, 'Quarter' as columns, and 'Sales' as values.
3. Fill any missing values in the pivot table with 0.
4. Calculate the total sales for each region across all quarters.

**Solution:**

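```python
import pandas as pd

df_sales = pd.read_csv('sample_data/sales_data_pivot.csv')

# Region as rows, Quarter as columns; fill_value=0 covers any missing combinations.
# aggfunc='sum' adds up repeated Region/Quarter rows instead of averaging them (the default is 'mean').
pivot_table = df_sales.pivot_table(index='Region', columns='Quarter', values='Sales',
                                   aggfunc='sum', fill_value=0)
print(pivot_table)

# Total sales per region across all quarters (row-wise sum)
total_sales_per_region = pivot_table.sum(axis=1)
print(f"\n{total_sales_per_region}")
```

```
Quarter    Q1    Q2
Region             
East     7000  7200
North    5000  5200
South    4500  4800
West     6800  7100

Region
East     14200
North    10200
South     9300
West     13900
dtype: int64
```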
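`pivot_table` can also produce the totals itself, and pivoting on `Product` instead of `Quarter` shows `fill_value` actually doing work, since each region in this data sells only one product. A minimal sketch, assuming the rows of `sales_data_pivot.csv` are built inline (`quarterly` and `by_product` are illustrative names):

```python
import pandas as pd

# Rows copied from sales_data_pivot.csv above
df_sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'] * 2,
    'Product': ['A', 'A', 'B', 'B'] * 2,
    'Quarter': ['Q1'] * 4 + ['Q2'] * 4,
    'Sales': [5000, 4500, 7000, 6800, 5200, 4800, 7200, 7100],
})

# margins=True appends a 'Total' row and column holding the sums
# across quarters (per region) and across regions (per quarter)
quarterly = df_sales.pivot_table(index='Region', columns='Quarter', values='Sales',
                                 aggfunc='sum', fill_value=0,
                                 margins=True, margins_name='Total')
print(quarterly)

# With Product as the columns, each region has no sales for the other product,
# so those cells would be NaN; fill_value=0 replaces them with 0
by_product = df_sales.pivot_table(index='Region', columns='Product', values='Sales',
                                  aggfunc='sum', fill_value=0)
print(by_product)
```

The `Total` column matches `pivot_table.sum(axis=1)`, so the margins option replaces task 4's separate step.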


## Problem 3: Dealing with Duplicate Data

*File: `product_catalog.csv`* (some products listed multiple times with slight variations)

```
ProductID,ProductName,Category,Price
123,Laptop A,Electronics,999.99
123,Laptop A,Electronics,1050.00
456,Smartphone X,Electronics,599.99
456,Smartphone X ,Electronics,599.99
789,Headphones Y,Audio,149.99
```

The second `123` row repeats the laptop at a higher price, and the second `456` row differs only by a trailing space in `ProductName` (`"Smartphone X "`).

**Tasks:**

1. Read the CSV.
2. Identify duplicate rows based on `ProductID`.
3. Remove the duplicates, keeping the row with the lowest price for each product.

**Solution:**

```python
import pandas as pd

# Read the CSV
df_products = pd.read_csv('sample_data/product_catalog.csv')

# Normalize stray whitespace so 'Smartphone X ' and 'Smartphone X' match
df_products['ProductName'] = df_products['ProductName'].str.strip()

# Sort so the cheapest row for each ProductID comes first, then keep only that row.
# drop_duplicates returns a new DataFrame, so assign the result back.
df_products = (
    df_products.sort_values(['ProductID', 'Price'])
               .drop_duplicates(subset='ProductID', keep='first')
)

df_products
```

|   | ProductID | ProductName | Category | Price |
|---|---|---|---|---|
| 0 | 123 | Laptop A | Electronics | 999.99 |
| 2 | 456 | Smartphone X | Electronics | 599.99 |
| 4 | 789 | Headphones Y | Audio | 149.99 |
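The solution removes duplicates without showing which rows were involved. A minimal sketch of the "identify" step, plus an alternative way to keep the cheapest row, assuming the catalog rows are embedded as plain CSV text (`dupes` and `cheapest` are illustrative names):

```python
import io

import pandas as pd

# product_catalog.csv rows from above, as plain CSV text
catalog_csv = io.StringIO(
    "ProductID,ProductName,Category,Price\n"
    "123,Laptop A,Electronics,999.99\n"
    "123,Laptop A,Electronics,1050.00\n"
    "456,Smartphone X,Electronics,599.99\n"
    "456,Smartphone X ,Electronics,599.99\n"
    "789,Headphones Y,Audio,149.99\n"
)
df = pd.read_csv(catalog_csv)

# keep=False flags every row that shares a ProductID, not just the later copies
dupes = df[df.duplicated(subset='ProductID', keep=False)]
print(dupes)

# idxmin gives, for each ProductID, the index label of its lowest Price
# (the first one on ties), which .loc then uses to pull those rows
cheapest = df.loc[df.groupby('ProductID')['Price'].idxmin()]
print(cheapest)
```

The `groupby`/`idxmin` route avoids relying on sort order, while `sort_values` plus `drop_duplicates` reads more like the task statement; both keep rows 0, 2, and 4 here.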

