# Module 2: Data Exploration and Visualization

## Data Cleaning and Preprocessing
- Handling missing values
- Data types and conversions
- Outlier detection
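
As a rough illustration of the three cleaning steps listed above, here is a minimal sketch on a tiny hand-made table. The column names, the values, the median fill and the 1.5 × IQR rule are all assumptions chosen for the example; other strategies may suit a real dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names and values are invented for illustration.
raw = pd.DataFrame({
    "Age": [25, 31, np.nan, 47, 38, 29],
    "Income": ["42000", "58000", "61000", "39000", "55000", "400000"],
})

clean = raw.copy()

# Handling missing values: fill the missing age with the column median.
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

# Data types and conversions: Income arrived as text, so convert it to numbers.
# errors="coerce" turns unparseable entries into NaN instead of raising.
clean["Income"] = pd.to_numeric(clean["Income"], errors="coerce")

# Outlier detection: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = clean["Income"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["Income_Outlier"] = (
    (clean["Income"] < q1 - 1.5 * iqr) | (clean["Income"] > q3 + 1.5 * iqr)
)

print(clean)
```

In this toy table only the 400000 income is flagged. Whether a flagged value should be dropped, capped or kept is a judgment call that depends on the data.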

## Exploratory Data Analysis (EDA)
- Summary statistics
- Correlation analysis
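
The two EDA steps above can be sketched with pandas as follows. The table is hypothetical toy data; `describe()` and `corr()` are standard pandas methods, and `corr()` uses the Pearson coefficient by default.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
customers = pd.DataFrame({
    "Age": [25, 31, 47, 38, 29, 52],
    "Income": [42000, 58000, 91000, 67000, 50000, 98000],
    "Purchases": [2, 3, 7, 5, 2, 8],
})

# Summary statistics: count, mean, std, min, quartiles and max per column.
summary = customers.describe()
print(summary)

# Correlation analysis: pairwise Pearson correlation between numeric columns.
corr = customers.corr()
print(corr.round(2))
```

A correlation close to 1 or -1 indicates a strong linear relationship in the sample. It does not by itself show that one variable causes the other.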

## Visualization Techniques
- Bar charts, histograms, scatter plots
- Using matplotlib/seaborn (Python)
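
As one possible sketch of the chart types listed above, the following draws a bar chart, a histogram and a scatter plot side by side with matplotlib. The data is hypothetical, and the choice of columns and bin count is only an example; seaborn offers similar plots with a higher-level interface.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, invented for illustration.
customers = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Female", "Female"],
    "Age": [25, 31, 47, 38, 29, 52, 44, 35],
    "Income": [42000, 58000, 91000, 67000, 50000, 98000, 83000, 61000],
    "Purchases": [2, 3, 7, 5, 2, 8, 6, 4],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Bar chart: number of customers in each category.
gender_counts = customers["Gender"].value_counts()
axes[0].bar(gender_counts.index, gender_counts.values)
axes[0].set_title("Customers by gender")

# Histogram: distribution of a single numeric variable.
axes[1].hist(customers["Purchases"], bins=5)
axes[1].set_title("Distribution of purchases")

# Scatter plot: relationship between two numeric variables.
axes[2].scatter(customers["Age"], customers["Income"])
axes[2].set_xlabel("Age")
axes[2].set_ylabel("Income")
axes[2].set_title("Income vs. age")

fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```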

## Practice Exercise
1. Load a sample marketing dataset and perform EDA.
2. Visualize the distribution of a key variable.

## Practice with Imaginary Data

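Let's create a small imaginary marketing dataset of 20 customers for hands-on analytics practice. The values are randomly generated, so they carry no real-world meaning, but working with them will help you understand the structure and key variables before moving on to real data.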

In [None]:
import pandas as pd
import numpy as np

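# Fix the random seed so the generated values are the same on every run
np.random.seed(42)

# Create imaginary marketing data for 20 customers
# Note: the upper bound of np.random.randint is exclusive
data = {
    'ID': range(1, 21),
    'Age': np.random.randint(22, 60, 20),            # ages 22 to 59
    'Gender': np.random.choice(['Male', 'Female'], 20),
    'Income': np.random.randint(30000, 120000, 20),  # 30000 to 119999
    'Campaign_Contacted': np.random.choice([0, 1], 20, p=[0.3, 0.7]),  # 1 = contacted (probability 0.7)
    'Response': np.random.choice([0, 1], 20, p=[0.7, 0.3]),            # 1 = responded (probability 0.3)
    'Purchases': np.random.randint(1, 10, 20)        # 1 to 9 purchases
}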
df_imaginary = pd.DataFrame(data)
df_imaginary.head()

### Example: Calculate Key Metrics from Imaginary Data

In [None]:
# Total number of customers
num_customers = df_imaginary['ID'].nunique()
print(f'Total customers: {num_customers}')

# Average income per customer (the dataset has no sales column, so this is income, not sales)
avg_income = df_imaginary['Income'].mean()
print(f'Average income: {avg_income:.2f}')

# Overall response rate: share of all customers with Response == 1,
# whether or not they were contacted by the campaign
response_rate = df_imaginary['Response'].mean() * 100
print(f'Overall response rate: {response_rate:.2f}%')

# Average purchases per customer
avg_purchases = df_imaginary['Purchases'].mean()
print(f'Average purchases per customer: {avg_purchases:.2f}')
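
Taking the mean of `Response` over every row gives the response rate across all customers. If the rate among contacted customers is what matters, the rows can be filtered first. The sketch below uses a small hypothetical table, not `df_imaginary`, so that the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
campaign = pd.DataFrame({
    "Campaign_Contacted": [1, 1, 1, 1, 0, 0, 1, 0],
    "Response":           [1, 0, 1, 0, 0, 1, 0, 0],
})

# Rate over all customers: 3 responders out of 8 rows.
overall_rate = campaign["Response"].mean() * 100

# Rate among contacted customers only: 2 responders out of 5 contacted rows.
contacted = campaign[campaign["Campaign_Contacted"] == 1]
contacted_rate = contacted["Response"].mean() * 100

print(f"Overall response rate: {overall_rate:.2f}%")
print(f"Response rate among contacted customers: {contacted_rate:.2f}%")
```

In the randomly generated practice data, `Response` is drawn independently of `Campaign_Contacted`, so the two rates there differ only by chance.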

### Try it Yourself
- Calculate the percentage of customers contacted by the campaign.
- Find the average income by gender.
- Visualize the distribution of purchases using a histogram.