# Customer Purchase Data Exercises

This notebook contains a series of exercises based on a generated customer purchase dataset, provided as `customer_purchase_data.csv`, `customer_purchase_data.xlsx`, and `customer_purchase_data.json`.

## Exercise 1: Loading the Data

### Task 1.1: Load from CSV

1. Load the data from `customer_purchase_data.csv` into a DataFrame. 
2. Display the first 5 rows using `.head()`.
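
As a minimal sketch of Task 1.1, the cell below reads a small inline CSV with `pd.read_csv`. The inline rows are hypothetical stand-ins so the snippet runs without the file; with the real data you would pass the path `customer_purchase_data.csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of customer_purchase_data.csv;
# the real file has more rows and columns.
csv_text = """CustomerID,CustomerName,Country,PurchaseAmount
1,Alice,USA,120.5
2,Bob,Canada,310.0
3,Carol,UK,89.9
"""

# pd.read_csv accepts a file path or any file-like object.
# With the real file: df = pd.read_csv("customer_purchase_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# .head() returns the first 5 rows (or fewer if the frame is shorter).
print(df.head())
```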

### Task 1.2: Load from Excel

1. Load the data from `customer_purchase_data.xlsx` into a DataFrame. 
2. Display the column names of the DataFrame.

### Task 1.3: Load from JSON

1. Load the data from `customer_purchase_data.json` (JSON Lines format, one record per line) into a DataFrame.
2. Display the shape of the DataFrame.
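
For Task 1.3, line-delimited JSON can be read with `pd.read_json(..., lines=True)`. The sketch below uses two hypothetical records in an in-memory buffer; with the real data you would pass the path `customer_purchase_data.json`.

```python
import io

import pandas as pd

# Hypothetical records, one JSON object per line (JSON Lines).
jsonl_text = (
    '{"CustomerID": 1, "Country": "USA", "PurchaseAmount": 120.5}\n'
    '{"CustomerID": 2, "Country": "Canada", "PurchaseAmount": 310.0}\n'
)

# lines=True tells pandas to parse each line as a separate record.
# With the real file: pd.read_json("customer_purchase_data.json", lines=True)
df_json = pd.read_json(io.StringIO(jsonl_text), lines=True)

# .shape is a (rows, columns) tuple.
print(df_json.shape)
```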

In [None]:
# Your code here

## Exercise 2: Data Inspection and Overview

### Task 2.1: Inspecting the Data

1. Using the DataFrame loaded from CSV (or Excel/JSON), do the following:
   - Display the last 5 rows using `.tail()`.
   - Use `.info()` to see the data types and non-null counts.
   - Use `.describe()` to get summary statistics for the numerical columns.
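
The three inspection calls above might look like the following sketch. The small DataFrame is a hypothetical stand-in for the loaded data.

```python
import pandas as pd

# Hypothetical sample; None becomes NaN in numeric columns.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Age": [25, 41, None, 33, 58, 29],
    "PurchaseAmount": [120.5, 310.0, 89.9, None, 275.0, 45.0],
})

# Last 5 rows.
print(df.tail())

# Column dtypes and non-null counts (info() prints directly).
df.info()

# Summary statistics for numeric columns; NaN values are excluded.
summary = df.describe()
print(summary)
```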

### Task 2.2: Data Cleaning Overview

1. Count the missing values in each column and print the counts.
2. List the unique values in the `ProductCategory` column.
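
One possible approach to Task 2.2, sketched on hypothetical data: `isna().sum()` counts missing values per column, and `unique()` lists distinct values.

```python
import pandas as pd

# Hypothetical sample with some missing entries.
df = pd.DataFrame({
    "ProductCategory": ["Books", "Toys", None, "Books"],
    "ReviewScore": [5, None, 3, None],
})

# isna() marks missing cells as True; sum() counts them per column.
missing_counts = df.isna().sum()
print(missing_counts)

# unique() includes the missing value itself; dropna() excludes it.
categories = df["ProductCategory"].dropna().unique()
print(categories)
```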

In [None]:
# Your code here

## Exercise 3: Selection and Indexing

### Task 3.1: Column Selection

1. Select and print the `CustomerID`, `CustomerName`, and `Email` columns from the CSV DataFrame.
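
Selecting several columns takes a list of column names inside the indexing brackets. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "CustomerID": [1, 2],
    "CustomerName": ["Alice", "Bob"],
    "Email": ["alice@example.com", "bob@example.com"],
    "Age": [25, 41],
})

# A list of names returns a DataFrame with just those columns.
contact = df[["CustomerID", "CustomerName", "Email"]]
print(contact)
```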

### Task 3.2: Row Selection and Filtering

1. Using boolean indexing, select all rows where `PurchaseAmount` is greater than 250.
2. Select rows where the `Country` is either 'USA' or 'Canada'.
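
Both filters can be written as boolean masks. The sketch below uses hypothetical rows; `isin` is one convenient way to express the "either/or" condition.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Country": ["USA", "Canada", "UK", "USA"],
    "PurchaseAmount": [120.5, 310.0, 400.0, 275.0],
})

# The comparison yields a boolean Series used as a row mask.
high_value = df[df["PurchaseAmount"] > 250]

# isin() is True where the value matches any item in the list.
north_america = df[df["Country"].isin(["USA", "Canada"])]

print(high_value)
print(north_america)
```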

### Task 3.3: Using `.loc` and `.iloc`

1. Use `.loc` to select the row with `CustomerID` 10 and display all columns (if `CustomerID` is not the index, use a condition).
2. Use `.iloc` to select the first 10 rows of the DataFrame.
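
The sketch below shows both variants of the `.loc` lookup (by condition and by index label) along with positional slicing via `.iloc`. The data is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical sample with 15 customers.
df = pd.DataFrame({
    "CustomerID": range(1, 16),
    "PurchaseAmount": [10.0 * i for i in range(1, 16)],
})

# CustomerID is an ordinary column here, so .loc takes a boolean condition.
row_by_condition = df.loc[df["CustomerID"] == 10]

# If CustomerID is the index, .loc can look the label up directly.
row_by_label = df.set_index("CustomerID").loc[10]

# .iloc is position-based: rows at positions 0 through 9.
first_ten = df.iloc[:10]

print(row_by_condition)
print(first_ten)
```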

In [None]:
# Your code here

## Exercise 4: DataFrame Operations

### Task 4.1: Sorting and Ordering

1. Sort the DataFrame by `Age` in ascending order and display the first 10 rows.
2. Sort the DataFrame by `PurchaseAmount` in descending order and display the first 10 rows.
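
Both sorts can be done with `sort_values`, which is ascending by default. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [41, 25, 58, 33],
    "PurchaseAmount": [120.5, 89.9, 310.0, 275.0],
})

# Ascending sort by Age, then keep up to 10 rows.
youngest = df.sort_values("Age").head(10)

# Descending sort by PurchaseAmount.
top_spenders = df.sort_values("PurchaseAmount", ascending=False).head(10)

print(youngest)
print(top_spenders)
```

Note that `sort_values` returns a new DataFrame and leaves `df` unchanged.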

### Task 4.2: Grouping and Aggregation

1. Group the data by `Country` and compute the average `PurchaseAmount` for each country.
2. Group the data by `ProductCategory` and compute the count and average `ReviewScore` for each category.
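
A sketch of both aggregations on hypothetical rows: `mean()` on a single grouped column, and `agg` with a list of function names to get several statistics at once.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Country": ["USA", "USA", "Canada", "Canada"],
    "ProductCategory": ["Books", "Toys", "Books", "Books"],
    "PurchaseAmount": [100.0, 200.0, 50.0, 150.0],
    "ReviewScore": [4, 2, 5, 3],
})

# Average purchase amount per country.
avg_by_country = df.groupby("Country")["PurchaseAmount"].mean()

# Count and mean of review scores per category, one column per statistic.
review_stats = df.groupby("ProductCategory")["ReviewScore"].agg(["count", "mean"])

print(avg_by_country)
print(review_stats)
```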

### Task 4.3: Applying Custom Functions

1. Create a new column `DiscountedAmount` that applies a 10% discount to the `PurchaseAmount` (if not null).
2. Create a custom function that categorizes a `ReviewScore` into `'Low'` (score 1-2), `'Medium'` (score 3), and `'High'` (score 4-5). Use `.apply()` to create a new column `ReviewCategory`.
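
One way to approach Task 4.3 is sketched below on hypothetical rows. The function name `categorize_review` is illustrative, and returning `None` for a missing score is an assumption, since the task does not say how missing scores should be labeled.

```python
import pandas as pd


def categorize_review(score):
    """Map a review score to 'Low' (1-2), 'Medium' (3), or 'High' (4-5)."""
    if pd.isna(score):
        return None  # assumption: leave missing scores uncategorized
    if score <= 2:
        return "Low"
    if score == 3:
        return "Medium"
    return "High"


# Hypothetical sample rows with one missing value in each column.
df = pd.DataFrame({
    "PurchaseAmount": [100.0, None, 250.0, 80.0],
    "ReviewScore": [1, 3, 5, None],
})

# Arithmetic propagates NaN, so null amounts stay null without a check.
df["DiscountedAmount"] = df["PurchaseAmount"] * 0.9

# apply() calls the function once per value in the column.
df["ReviewCategory"] = df["ReviewScore"].apply(categorize_review)

print(df)
```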

In [None]:
# Your code here

## Exercise 5: Handling Missing Data

### Task 5.1: Detecting Missing Data

1. Check for missing values in the loaded DataFrame and print the total missing values per column.

### Task 5.2: Filling and Dropping Missing Data

1. Create a copy of the original DataFrame named `df_filled` and fill missing values in its numeric columns (e.g., `Age`, `PurchaseAmount`, `ReviewScore`) with the column median.
2. For the categorical columns, fill missing values using forward-fill.
3. Display the number of missing values after filling.
4. Create another copy of the original DataFrame and drop all rows that have any missing value. Compare the shapes of the two DataFrames.
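
The four steps above can be sketched as follows, using hypothetical rows. Picking the numeric columns with `select_dtypes` is one option; listing the column names by hand works just as well. The name `df_dropped` is illustrative.

```python
import pandas as pd

# Hypothetical sample with missing values in every column.
df = pd.DataFrame({
    "Age": [20.0, None, 40.0, 30.0],
    "PurchaseAmount": [100.0, 300.0, None, 200.0],
    "Country": ["USA", None, "Canada", None],
})

# Work on a copy so the original stays untouched.
df_filled = df.copy()

# Fill numeric columns with each column's median.
numeric_cols = df_filled.select_dtypes(include="number").columns
df_filled[numeric_cols] = df_filled[numeric_cols].fillna(
    df_filled[numeric_cols].median()
)

# Forward-fill the remaining columns; a missing first row would stay missing.
categorical_cols = df_filled.select_dtypes(exclude="number").columns
df_filled[categorical_cols] = df_filled[categorical_cols].ffill()

print(df_filled.isna().sum())

# dropna() removes every row that has at least one missing value.
df_dropped = df.dropna()
print(df_filled.shape, df_dropped.shape)
```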

In [None]:
# Your code here

## Exercise 6: Exporting the Processed Data

### Task 6.1: Save the Cleaned Data

1. Save the DataFrame with filled missing values (`df_filled`) into a new CSV file named `customer_purchase_data_cleaned.csv`.
2. Save the same DataFrame to an Excel file named `customer_purchase_data_cleaned.xlsx`.

### Task 6.2: Verify the Exports

1. Load the newly exported CSV file and display its info to confirm the export was successful.
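
A minimal sketch of the export and verification round trip is shown below. It writes to a temporary directory so it can run anywhere, and `df_filled` here is a hypothetical stand-in for the cleaned DataFrame from Exercise 5. The Excel export is left as a comment because `to_excel` needs an Excel engine such as `openpyxl` to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame.
df_filled = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Country": ["USA", "Canada", "UK"],
    "PurchaseAmount": [120.5, 310.0, 89.9],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "customer_purchase_data_cleaned.csv")

    # index=False keeps the row index out of the file.
    df_filled.to_csv(csv_path, index=False)

    # Excel export (requires an engine such as openpyxl):
    # df_filled.to_excel("customer_purchase_data_cleaned.xlsx", index=False)

    # Reload the CSV and inspect it to confirm the export.
    df_check = pd.read_csv(csv_path)
    df_check.info()
```

Without `index=False`, the reloaded file would gain an extra unnamed column holding the old index.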

In [None]:
# Your code here

## Summary

In this notebook, you practiced the following skills on the customer purchase data:

- **Data Loading:** Reading data from CSV, Excel, and JSON formats.
- **Data Inspection:** Viewing heads/tails, checking data types and missing values, and reviewing summary statistics.
- **Selection and Indexing:** Selecting specific columns, filtering rows, and using both `.loc` and `.iloc` for data selection.
- **DataFrame Operations:** Sorting, grouping/aggregating data, and applying custom functions to create new columns.
- **Handling Missing Data:** Detecting, filling, and dropping missing values.
- **Exporting Data:** Saving the processed data to new files and verifying the exports.
