# Customer Data Cleaning

This notebook imports the cleaned customer dataset built by the `clean_data` script, which loads the raw customer data and applies the cleaning transformations, then inspects and validates the result before analysis.

## Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
from clean_data import clean_data  # Cleaned DataFrame built by our clean_data script
from IPython.display import display  # Explicit import so display() also works outside Jupyter

## Display the Cleaned Data

The `clean_data` DataFrame has been created with the following cleaning operations applied:
- Missing `time_spent` values replaced with the median
- Missing `pages_viewed` values replaced with the mean (converted to integer)
- Missing `basket_value` values replaced with 0
- Missing `device_type` values replaced with "Unknown"
- Missing `customer_type` values replaced with "New"
- Each column cast to the data type given in the specification
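
The fill rules listed above can be sketched in pandas as follows. This is only an illustration and not the contents of the `clean_data` script: the `raw` DataFrame and its values are made up, `clean_sketch` is a hypothetical name, and the integer conversion of `pages_viewed` is assumed to happen after filling.

```python
import pandas as pd

# Hypothetical raw data: column names follow the list above, values are made up.
raw = pd.DataFrame({
    "time_spent": [10.0, None, 30.0],
    "pages_viewed": [2.0, 4.0, None],
    "basket_value": [25.5, None, 0.0],
    "device_type": ["Mobile", None, "Desktop"],
    "customer_type": [None, "Returning", "New"],
})


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed fill rules to a copy of df (illustrative only)."""
    out = df.copy()
    # Median is robust to outliers in time spent.
    out["time_spent"] = out["time_spent"].fillna(out["time_spent"].median())
    # Assumption: fill with the mean first, then cast the column to int
    # (the cast truncates any fractional part).
    out["pages_viewed"] = out["pages_viewed"].fillna(out["pages_viewed"].mean()).astype(int)
    out["basket_value"] = out["basket_value"].fillna(0)
    out["device_type"] = out["device_type"].fillna("Unknown")
    out["customer_type"] = out["customer_type"].fillna("New")
    return out


cleaned = clean_sketch(raw)
print(cleaned)
```

Working on a copy keeps the raw frame unchanged, which makes it easy to compare values before and after cleaning.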

In [None]:
# Display the first few rows of the cleaned data
print("First few rows of cleaned data:")
display(clean_data.head())

In [None]:
# Display information about the cleaned data
print("\nDataFrame info:")
clean_data.info()

In [None]:
# Display summary statistics
print("\nSummary statistics:")
display(clean_data.describe(include='all'))

In [None]:
# Check for any remaining missing values
print("\nMissing values in each column:")
print(clean_data.isnull().sum())
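
Printing the counts relies on someone reading the output. A minimal sketch of turning the same check into a hard failure is shown here; `assert_no_missing` is a hypothetical helper and `sample` is made-up stand-in data, not the real `clean_data`.

```python
import pandas as pd


def assert_no_missing(df: pd.DataFrame) -> None:
    """Raise ValueError naming every column that still has missing values."""
    missing = df.isnull().sum()
    remaining = missing[missing > 0]
    if not remaining.empty:
        raise ValueError(f"Columns with missing values: {remaining.to_dict()}")


# Made-up stand-in for the cleaned data.
sample = pd.DataFrame({
    "time_spent": [10.0, 20.0],
    "device_type": ["Mobile", "Unknown"],
})
assert_no_missing(sample)  # passes silently when nothing is missing
```

In the notebook, the same call could be made on `clean_data` directly.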

## Data Validation

The cell below prints the column data types and the unique values of the categorical columns so they can be checked against the specification:

In [None]:
# Check data types
print("\nData types:")
print(clean_data.dtypes)

# Check unique values for categorical columns
print("\nUnique device types:", clean_data['device_type'].unique())
print("Unique customer types:", clean_data['customer_type'].unique())
print("Unique purchase values:", clean_data['purchase'].unique())
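
The unique values printed above still have to be compared with the specification by eye. One way to automate that comparison is sketched here, under stated assumptions: `unexpected_values` is a hypothetical helper, and the allowed set is a placeholder because the actual specification is not reproduced in this notebook.

```python
import pandas as pd


def unexpected_values(series: pd.Series, allowed: set) -> set:
    """Return the values present in series that are not in the allowed set."""
    return set(series.unique()) - allowed


# Made-up data with one deliberately mis-cased label.
sample = pd.DataFrame({"customer_type": ["New", "Returning", "new"]})

# Placeholder set of allowed labels; replace with the values from the specification.
ALLOWED_CUSTOMER_TYPES = {"New", "Returning"}

print(unexpected_values(sample["customer_type"], ALLOWED_CUSTOMER_TYPES))
```

An empty set means every label in the column is among the allowed ones.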

If the output above shows no missing values and the expected data types and category values, the `clean_data` DataFrame is ready for analysis: it contains the required columns with the specified data types and missing-value handling applied.