# 🎮 Welcome to Data Science Sandbox!

## Level 1: Data Explorer - Getting Started

Welcome to your data science journey! This notebook will introduce you to the fundamentals of working with data using Python. 

### What You'll Learn:
- Loading and exploring datasets
- Basic data manipulation with pandas
- Understanding data types and structures
- Creating your first visualizations

### Game Elements:
- 🎯 Complete exercises to earn XP
- 🏆 Unlock achievement badges
- 📈 Track your progress through levels

Let's get started!

## 📚 Section 1: Setting Up Your Environment

First, let's import the essential libraries you'll use throughout your data science journey.

In [None]:
# Essential data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Configure display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

# Set up plotting
plt.rcParams['figure.figsize'] = (10, 6)
sns.set_style("whitegrid")

print("✅ Environment setup complete!")
print(f"📊 Pandas version: {pd.__version__}")
print(f"🔢 NumPy version: {np.__version__}")

## 🎯 Challenge 1: Your First Dataset

Let's start with a simple dataset to get familiar with pandas basics.

In [None]:
# Load your first dataset
df_simple = pd.read_csv('../data/datasets/simple_data.csv')

# Display the data
print("🎉 Your first dataset:")
print(df_simple)
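
If `simple_data.csv` isn't available at that relative path, you can still practice. Here is a minimal sketch that builds a stand-in dataset in memory. `pd.read_csv` accepts any file-like object, so `io.StringIO` can replace a file path. The `age`, `city`, and `score` column names match the ones Exercise 1.1 uses; the `name` column and every value are made up for illustration and are not the real file's contents.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV contents: made-up names and values, using the column
# names the exercises below expect (age, city, score)
csv_text = """name,age,city,score
Ana,25,Austin,88
Ben,32,Boston,92
Chloe,47,Austin,75
Dev,29,Denver,95
"""

# read_csv works on any file-like object, so StringIO stands in for a path
df_simple = pd.read_csv(StringIO(csv_text))

print("🎉 Your practice dataset:")
print(df_simple)
```

Every cell in this challenge runs unchanged on this stand-in `df_simple`.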

In [None]:
# Explore the dataset structure
print("📊 Dataset Info:")
print(f"Shape: {df_simple.shape}")
print(f"Columns: {list(df_simple.columns)}")
print("\nData Types:")
print(df_simple.dtypes)
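
Data types decide which operations make sense: you can average a numeric column but not a text one. As a small sketch on made-up data (not the CSV above), `select_dtypes` splits a DataFrame's columns by kind so you know where `mean()` and `max()` apply.

```python
import pandas as pd

# Small made-up DataFrame mixing numeric and text columns
df = pd.DataFrame({
    "age": [25, 32, 47],
    "score": [88.5, 92.0, 75.5],
    "city": ["Austin", "Boston", "Austin"],
})

# age is an integer dtype, score a float dtype; city holds text
# (shown as object in pandas 2.x)
print(df.dtypes)

# Split columns by kind: numeric columns support mean()/max(), text columns don't
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Text:", text_cols)
```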

### 🎯 Exercise 1.1: Basic Operations

Try these operations on your dataset:

In [None]:
# TODO: Find the average age
avg_age = df_simple['age'].mean()
print(f"Average age: {avg_age:.2f}")

# TODO: Find the highest score
max_score = df_simple['score'].max()
print(f"Highest score: {max_score}")

# TODO: Count unique cities
unique_cities = df_simple['city'].nunique()
print(f"Number of unique cities: {unique_cities}")

print("\n🎉 Great job! You've completed your first data analysis!")
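
To double-check your answers, here is a sketch that runs the same three operations on a tiny made-up table (the values are invented; only the column names follow the exercise). It adds `value_counts()`, which lists how many rows fall in each city; the number of entries it returns should equal `nunique()`.

```python
import pandas as pd

# Made-up data with the same column names the exercise uses
df = pd.DataFrame({
    "age": [25, 32, 47, 29],
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "score": [88, 92, 75, 95],
})

# The three exercise answers, formatted for readability
print(f"Average age: {df['age'].mean():.2f}")    # Average age: 33.25
print(f"Highest score: {df['score'].max()}")     # Highest score: 95
print(f"Unique cities: {df['city'].nunique()}")  # Unique cities: 3

# Rows per city; its length is a cross-check on nunique()
city_counts = df["city"].value_counts()
print(city_counts)
```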

## 🎯 Challenge 2: Working with Real Data

Now let's work with a more realistic dataset: sales data!

In [None]:
# Load the sales dataset
df_sales = pd.read_csv('../data/datasets/sample_sales.csv')

# Quick overview
print("📈 Sales Dataset Overview:")
print(f"Records: {len(df_sales):,}")
print(f"Columns: {len(df_sales.columns)}")
print("\nFirst 5 rows:")
df_sales.head()

In [None]:
# Explore the data structure
print("📊 Dataset Information:")
df_sales.info()

print("\n📈 Statistical Summary:")
df_sales.describe()
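
By default, `describe()` only summarizes numeric columns. Text columns such as `region` or `category` get a different summary when you exclude numeric types. The sketch below uses a few made-up rows, with column names borrowed from later cells in this notebook, to show both views.

```python
import numpy as np
import pandas as pd

# Made-up sales-like rows; the column names mirror ones used later in this notebook
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "category": ["Toys", "Books", "Books", "Books"],
    "sales": [120.0, 80.5, 99.9, 45.0],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
numeric_summary = df.describe()
print(numeric_summary)

# Excluding numeric types summarizes the text columns instead:
# count, unique, top (most frequent value), freq (how often top appears)
text_summary = df.describe(exclude=[np.number])
print(text_summary)
```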

### 🎯 Exercise 2.1: Data Quality Check

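Always check your data quality before analyzing! By default, pandas functions like `mean()` and `sum()` silently skip missing values, which can skew your results.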

In [None]:
# Check for missing values
print("🔍 Missing Values Check:")
missing_data = df_sales.isnull().sum()
total_missing = missing_data.sum()

if total_missing > 0:
    # Show only the columns that actually contain gaps
    print(missing_data[missing_data > 0])
    print(f"\n⚠️ Found {total_missing} missing values across the dataset")
else:
    print("\n✅ No missing values found!")
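
Raw counts are easier to judge as percentages: 5 missing values matter more in a 20-row table than in a 20,000-row one. Here is a sketch on made-up data with deliberate gaps. The trick it relies on: the mean of a True/False mask is the fraction of True values.

```python
import numpy as np
import pandas as pd

# Made-up data with a few deliberate gaps (NaN / None)
df = pd.DataFrame({
    "sales": [120.0, np.nan, 99.9, 45.0],
    "region": ["North", "South", None, "East"],
    "quantity": [3, 1, 2, 5],
})

missing_count = df.isnull().sum()
# mean() of a boolean mask is the fraction of True values; *100 makes it a percentage
missing_pct = df.isnull().mean() * 100

report = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
# Keep only the columns with at least one gap
print(report[report["missing"] > 0])
```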

## 📊 Challenge 3: Your First Visualizations

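Data visualization is crucial for understanding patterns in your data. A chart can reveal skewed distributions, outliers, and differences between groups that a table of numbers hides.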

In [None]:
# Create a histogram of sales amounts
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
# dropna() keeps any missing sales values out of the histogram
plt.hist(df_sales['sales'].dropna(), bins=30, alpha=0.7, color='skyblue', edgecolor='black')
plt.title('Distribution of Sales Amounts')
plt.xlabel('Sales Amount ($)')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)

# Create a bar chart of sales by region
plt.subplot(1, 2, 2)
sales_by_region = df_sales.groupby('region')['sales'].sum().sort_values(ascending=False)
bars = plt.bar(sales_by_region.index, sales_by_region.values, color='lightgreen', edgecolor='black')
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + height*0.01,
             f'${height:,.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

print("🎉 Excellent! You've created your first data visualizations!")
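
The bar chart depends on a single chain: `groupby('region')['sales'].sum().sort_values(ascending=False)`. Running that chain on a tiny made-up table shows what it produces before anything is plotted. The result is a Series indexed by region with the largest total first, and its index and values become the bars' labels and heights.

```python
import pandas as pd

# Made-up rows: two North orders, one South, one East
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [120.0, 80.0, 100.0, 45.0],
})

# Group rows by region, add up sales within each group, largest total first
sales_by_region = df.groupby("region")["sales"].sum().sort_values(ascending=False)
print(sales_by_region)

# The bar labels and bar heights used by plt.bar
print(sales_by_region.index.tolist())  # ['North', 'South', 'East']
print(sales_by_region.tolist())        # [220.0, 80.0, 45.0]
```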

### 🎯 Exercise 3.1: Create Your Own Visualization

Now it's your turn to create a visualization!

In [None]:
# TODO: Create a visualization of your choice
# Ideas: 
# - Sales by category
# - Quantity vs Sales scatter plot  
# - Customer satisfaction distribution

# Your code here:
plt.figure(figsize=(10, 6))

# Example: Sales by category
sales_by_category = df_sales.groupby('category')['sales'].sum().sort_values(ascending=True)
plt.barh(sales_by_category.index, sales_by_category.values, color='coral')
plt.title('Total Sales by Product Category')
plt.xlabel('Total Sales ($)')
plt.ylabel('Category')

# Add value labels
for i, v in enumerate(sales_by_category.values):
    plt.text(v + v*0.01, i, f'${v:,.0f}', va='center')

plt.tight_layout()
plt.show()

print("🏆 Outstanding work! You're becoming a data visualization expert!")
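
Want to try the "Quantity vs Sales scatter plot" idea? Here is one possible sketch. It assumes a `quantity` column exists alongside `sales` (check `df_sales.columns` first) and uses made-up rows so it runs on its own. It uses Matplotlib's object-oriented style (`fig, ax = plt.subplots()`), which makes it easy to return the axes and keep customizing them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; 'quantity' and 'sales' are assumed column names
df = pd.DataFrame({
    "quantity": [1, 2, 3, 4, 5, 6],
    "sales": [20.0, 41.0, 58.0, 83.0, 97.0, 121.0],
})

def plot_quantity_vs_sales(data):
    """Scatter plot of sales against quantity; returns the Axes for further tweaks."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(data["quantity"], data["sales"], alpha=0.7, color="purple", edgecolor="black")
    ax.set_title("Quantity vs Sales")
    ax.set_xlabel("Quantity")
    ax.set_ylabel("Sales ($)")
    ax.grid(True, alpha=0.3)
    return ax

ax = plot_quantity_vs_sales(df)
```

In Jupyter the figure renders inline. In a plain Python script, call `plt.show()` afterwards. With the real data, you would call `plot_quantity_vs_sales(df_sales)`.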

## 🎯 Final Challenge: Data Insights

Let's answer some business questions using our data!

In [None]:
print("🔍 Business Insights from Sales Data:")
print("=" * 50)

# 1. Which region generates the most revenue?
revenue_by_region = df_sales.groupby('region')['sales'].sum()
top_region = revenue_by_region.idxmax()
top_revenue = revenue_by_region.max()
print(f"💰 Top revenue region: {top_region} (${top_revenue:,.2f})")

# 2. What's the average order value?
avg_order = df_sales['sales'].mean()
print(f"📊 Average order value: ${avg_order:.2f}")

# 3. Which category has the highest average satisfaction?
satisfaction_by_category = df_sales.groupby('category')['customer_satisfaction'].mean().sort_values(ascending=False)
print(f"😊 Highest satisfaction category: {satisfaction_by_category.index[0]} ({satisfaction_by_category.iloc[0]:.2f}/5)")

# 4. How many sales representatives are there?
unique_reps = df_sales['sales_rep'].nunique()
print(f"👥 Number of sales representatives: {unique_reps}")

print("\n🎉 Congratulations! You've completed Level 1: Data Explorer!")
print("🚀 Ready to move on to Level 2: Analytics Apprentice?")
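
If you want to reuse these answers on another dataset, you could wrap them in a function. This is a sketch, not the notebook's required approach. The function name `business_insights` is hypothetical, and the sample rows are made up. Each grouped result is computed once, and `idxmax()` returns the label of the largest value directly, so there is no need to sort and then take `index[0]`.

```python
import pandas as pd

def business_insights(df):
    """Compute the Final Challenge answers; expects region, sales,
    category, customer_satisfaction and sales_rep columns."""
    revenue_by_region = df.groupby("region")["sales"].sum()
    satisfaction_by_category = df.groupby("category")["customer_satisfaction"].mean()
    return {
        "top_region": revenue_by_region.idxmax(),
        "top_revenue": revenue_by_region.max(),
        "avg_order_value": df["sales"].mean(),
        "top_satisfaction_category": satisfaction_by_category.idxmax(),
        "num_sales_reps": df["sales_rep"].nunique(),
    }

# Made-up rows for a quick check
sample = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [120.0, 80.0, 100.0, 40.0],
    "category": ["Toys", "Books", "Books", "Toys"],
    "customer_satisfaction": [4.0, 5.0, 4.5, 3.0],
    "sales_rep": ["Kim", "Lee", "Kim", "Ray"],
})

insights = business_insights(sample)
for key, value in insights.items():
    print(f"{key}: {value}")
```

On the real data, `business_insights(df_sales)` should match the answers printed above.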

## 🏆 Level 1 Complete!

### What You've Accomplished:
- ✅ Loaded and explored datasets
- ✅ Performed basic data analysis operations
- ✅ Created your first data visualizations
- ✅ Generated business insights from data

### Skills Unlocked:
- 🐼 **Pandas Basics**: DataFrames, Series, basic operations
- 📊 **Data Exploration**: `head()`, `info()`, `describe()`, `.shape`
- 📈 **Basic Visualization**: histograms, bar charts, customization
- 🔍 **Data Quality**: checking for missing values

### Next Level Preview - Level 2: Analytics Apprentice
- Data cleaning and preprocessing
- Advanced data manipulation
- Statistical analysis
- More complex visualizations

### 🎯 Achievement Unlocked: "First Steps" Badge!

Keep going: your data science journey is just beginning! 🚀