# 🐼 Pandas Fundamentals - Your Data Superpower!

## What is Pandas?

**Pandas** is like Excel on steroids! 💪 It's a Python library for loading, cleaning, and analyzing tabular data in just a few lines of code.

### 📊 DataFrames vs Excel

| Excel | Pandas DataFrame |
|-------|------------------|
| Click and drag | Write code |
| Manual work | Automated |
| Slows down with large sheets | Fast, vectorized operations ⚡ |
| Limited to 1,048,576 rows per sheet | Limited mainly by available memory |
| Hard to reproduce | Repeatable scripts |

### Why Learn Pandas?

- 📈 Analyze data like a pro
- 🔄 Clean messy data automatically
- 📊 Create charts quickly (Pandas plots through Matplotlib)
- 🚀 Process huge datasets quickly

**Time to dive in!** 🏊‍♂️

## 📦 Import Pandas

First, we need to import the Pandas library. The standard convention is to import it as `pd`:

In [None]:
import pandas as pd
import numpy as np  # NumPy often works hand-in-hand with Pandas

print(f"Pandas version: {pd.__version__}")
print("🎉 Pandas is ready to rock!")

---

## 📝 Part 1: Series - Your First Data Structure

### What is a Series?

A **Series** is like a single column in Excel - a one-dimensional array with labels (index).

Think of it as a list with superpowers! 🦸‍♀️

```
Index    Values
  0   →  Apple
  1   →  Banana
  2   →  Cherry
```

In [None]:
# Create a Series from a list
fruits = pd.Series(['Apple', 'Banana', 'Cherry', 'Durian', 'Elderberry'])
print("🍎 Fruits Series:")
print(fruits)
print()

# Create a Series from a dictionary (custom index)
prices = pd.Series({
    'Apple': 2.50,
    'Banana': 1.20,
    'Cherry': 4.00,
    'Durian': 15.00,
    'Elderberry': 8.50
})
print("💰 Prices Series:")
print(prices)
print()

# Access elements
print(f"The price of a Banana is: ${prices['Banana']:.2f}")
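
The index labels are what give a Series its "superpowers". Here is a minimal sketch, reusing a shortened version of the prices above, that shows label-based access, position-based access with `.iloc`, and arithmetic applied to every value at once. The 10% discount is only an illustrative assumption.

```python
import pandas as pd

# Shortened copy of the prices Series from the cell above
prices = pd.Series({'Apple': 2.50, 'Banana': 1.20, 'Cherry': 4.00})

# Label-based access: the dictionary keys became the index
by_label = prices['Banana']

# Position-based access: .iloc counts from 0, so 1 is the second element
by_position = prices.iloc[1]

# Arithmetic applies to every value at once and keeps the labels
# (the 10% discount is a made-up example)
discounted = prices * 0.9

print(by_label, by_position)
print(discounted)
```

Both lookups return the same value here because `'Banana'` is the second entry in the Series.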

### 🎯 YOUR TURN: Create a Series of Your Favorite Foods

**TODO:** Create a Series called `my_foods` with at least 5 of your favorite foods. Try using a dictionary to give each food a rating from 1-10!

Example structure:
```python
my_foods = pd.Series({
    'Pizza': 10,
    'Sushi': 9,
    # Add your own foods here!
})
```

In [None]:
# YOUR CODE HERE
# Create your my_foods Series



---

## 📊 Part 2: DataFrames - The Main Event!

### What is a DataFrame?

A **DataFrame** is like an entire Excel spreadsheet - a 2D table with rows and columns!

It's basically multiple Series placed side by side, all sharing the same index. 🥞

```
       Name    Age  Grade
  0    Alice   20    A
  1    Bob     21    B
  2    Carol   19    A+
```

Each column is a Series, and together they make a powerful DataFrame!

In [None]:
# Create a DataFrame from a dictionary
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank'],
    'Age': [20, 21, 19, 22, 20, 23],
    'Grade': ['A', 'B', 'A+', 'B+', 'A', 'A-'],
    'Score': [92, 85, 98, 88, 94, 90]
})

print("👨‍🎓 Student DataFrame:")
print(students)
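
To see that each column really is a Series, here is a small sketch using a shortened student table. It pulls one column out and then builds a DataFrame back up from two separate Series. The variable names `names`, `scores`, and `rebuilt` are just illustrative.

```python
import pandas as pd

# Shortened version of the students table
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Score': [92, 85, 98],
})

# Selecting one column gives back a Series with the same index
score_column = students['Score']
print(type(score_column).__name__)
print(score_column.index.equals(students.index))

# Going the other way: two Series with matching indexes form a DataFrame
names = pd.Series(['Alice', 'Bob', 'Carol'])
scores = pd.Series([92, 85, 98])
rebuilt = pd.DataFrame({'Name': names, 'Score': scores})
print(rebuilt)
```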

### 🔍 Exploring Your DataFrame

Pandas gives us powerful methods to understand our data:

In [None]:
# .head() - See the first few rows (default 5)
print("📋 First 3 students:")
print(students.head(3))
print()

# .info() - Column names, data types, non-null counts, and memory usage
print("ℹ️ DataFrame Info:")
students.info()
print()

# .describe() - Statistical summary of numeric columns
print("📊 Statistical Summary:")
print(students.describe())
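
Two more quick checks often go alongside the methods above: `.shape` for the table size and `.dtypes` for the type of each column. The sketch below, on a shortened student table, also shows that `.describe()` skips text columns by default when the table mixes text and numbers.

```python
import pandas as pd

# Shortened version of the students table
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [20, 21, 19],
    'Grade': ['A', 'B', 'A+'],
    'Score': [92, 85, 98],
})

# (number of rows, number of columns)
print(students.shape)

# Data type of each column
print(students.dtypes)

# Only the numeric columns (Age, Score) appear in the default summary
numeric_summary = students.describe()
print(list(numeric_summary.columns))
```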

### 🎯 YOUR TURN: Create Your Own Product DataFrame

**TODO:** Create a DataFrame called `products` with information about 5 products from your favorite store.

Include these columns:
- `name`: Product name (string)
- `price`: Price in dollars (float)
- `quantity`: How many in stock (integer)

Then display it and use `.describe()` to see the statistics!

In [None]:
# YOUR CODE HERE
# Create your products DataFrame



---

## 📁 Part 3: Reading CSV Files - Real-World Data

### Loading External Data

In real life, data often comes from CSV files (Comma-Separated Values). Pandas makes reading them super easy!

Let's create some sample data to work with, then look at how CSV loading works:

In [None]:
# First, let's create a sample DataFrame
sales_data = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'Category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories'],
    'Price': [899.99, 24.99, 79.99, 299.99, 149.99],
    'Units_Sold': [50, 200, 120, 75, 150]
})

print("🛒 Sales Data:")
print(sales_data)
print()

# In real scenarios, you would read from a file like this:
# df = pd.read_csv('sales_data.csv')
# df = pd.read_csv('https://example.com/data.csv')  # Can even read from URLs!

print("💡 Tip: Use pd.read_csv('filename.csv') to load real CSV files!")
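
If you don't have a CSV file handy, you can still try `pd.read_csv` right now. This sketch assumes a small piece of CSV text held in a string (the `csv_text` variable is made up for the example) and wraps it in `io.StringIO` so Pandas can read it like a file.

```python
import io

import pandas as pd

# Illustrative CSV content; a real file on disk would look the same
csv_text = """Product,Category,Price,Units_Sold
Laptop,Electronics,899.99,50
Mouse,Accessories,24.99,200
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # column types are inferred from the text
```

The first line of the CSV becomes the column names, and the numeric columns are converted from text to numbers automatically.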

---

## 🎯 Part 4: Selecting Data - Finding What You Need

The real power of Pandas comes from selecting and filtering data!

In [None]:
# Select a single column (returns a Series)
print("📦 Just the Product names:")
print(sales_data['Product'])
print()

# Select multiple columns (returns a DataFrame)
print("💰 Product and Price:")
print(sales_data[['Product', 'Price']])
print()

# Select rows by label with .loc[] (the end label is included, so 0:2 returns labels 0, 1, and 2)
print("🔢 First three rows:")
print(sales_data.loc[0:2])
print()

# Conditional filtering - THIS IS POWERFUL! 💪
print("🔍 Products with price > $100:")
expensive_products = sales_data[sales_data['Price'] > 100]
print(expensive_products)
print()

# Multiple conditions
print("🎯 Electronics products with price > $200:")
premium_electronics = sales_data[
    (sales_data['Category'] == 'Electronics') & 
    (sales_data['Price'] > 200)
]
print(premium_electronics)
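
Because `.loc[]` works with labels, it is easy to mix up with `.iloc[]`, which works with positions. Here is a minimal sketch on a shortened sales table showing how the two differ, especially after filtering, when the row labels no longer match the row positions.

```python
import pandas as pd

# Shortened version of the sales table
sales = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'Price': [899.99, 24.99, 79.99, 299.99],
})

# .loc slices by label and includes the end label: rows 0, 1, 2
by_label = sales.loc[0:2]

# .iloc slices by position and excludes the end: rows 0, 1
by_position = sales.iloc[0:2]

print(len(by_label), len(by_position))

# Filtering keeps the original labels, so they are no longer 0, 1, 2, ...
expensive = sales[sales['Price'] > 100]
print(expensive.index.tolist())

# .iloc[1] is the second remaining row, whatever its label is
second_expensive = expensive.iloc[1]
print(second_expensive['Product'])
```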

### 🎯 YOUR TURN: Filter the Data

**TODO:** Using the `sales_data` DataFrame above, filter it to show only items with price < $50.

Store the result in a variable called `affordable_items` and print it.

**Bonus Challenge:** Try filtering for items that sold more than 100 units!

In [None]:
# YOUR CODE HERE
# Filter for items with price < 50



---

## 🧮 Part 5: Basic Operations - Data Manipulation Magic

Now let's do some real analysis!

In [None]:
# Add a new calculated column
sales_data['Revenue'] = sales_data['Price'] * sales_data['Units_Sold']

print("💵 Sales Data with Revenue Column:")
print(sales_data)
print()

# Group by category and calculate total revenue
print("📊 Total Revenue by Category:")
category_revenue = sales_data.groupby('Category')['Revenue'].sum()
print(category_revenue)
print()

# Sort by revenue (descending)
print("🏆 Products sorted by Revenue (highest first):")
sorted_sales = sales_data.sort_values('Revenue', ascending=False)
print(sorted_sales)
print()

# Quick statistics
print(f"💰 Total Revenue: ${sales_data['Revenue'].sum():,.2f}")
print(f"📈 Average Price: ${sales_data['Price'].mean():.2f}")
print(f"🔝 Best Seller: {sales_data.loc[sales_data['Units_Sold'].idxmax(), 'Product']}")
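
The grouping above computes one number per category. If you want several at once, `.agg()` with named results is one way to do it. This is a sketch on the same sales figures; the output column names `total_revenue`, `avg_price`, and `products` are made up for the example.

```python
import pandas as pd

# Same sales figures as in the cells above
sales = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'Category': ['Electronics', 'Accessories', 'Accessories',
                 'Electronics', 'Accessories'],
    'Price': [899.99, 24.99, 79.99, 299.99, 149.99],
    'Units_Sold': [50, 200, 120, 75, 150],
})
sales['Revenue'] = sales['Price'] * sales['Units_Sold']

# Each keyword is a new column name: (source column, aggregation)
summary = sales.groupby('Category').agg(
    total_revenue=('Revenue', 'sum'),
    avg_price=('Price', 'mean'),
    products=('Product', 'count'),
)

print(summary)
```

The result is a DataFrame with one row per category, so you can sort or filter it just like any other table.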

### 🎯 YOUR TURN: Calculate Total Revenue

**TODO:** Go back to your `products` DataFrame from earlier.

1. Add a new column called `total_value` that calculates the total value (price × quantity) for each product
2. Find the product with the highest total value
3. Calculate the sum of all total values

**Hint:** Use the same techniques you saw above!

```python
# Example structure:
products['total_value'] = products['price'] * products['quantity']
```

In [None]:
# YOUR CODE HERE
# Add total_value column and analyze your products



---

## 🎉 Summary - You're a Pandas Pro Now!

### What We Learned Today:

✅ **Series** - One-dimensional labeled arrays  
✅ **DataFrames** - Two-dimensional tables (the real MVP!)  
✅ **Reading Data** - Import from CSV files  
✅ **Selecting Data** - Columns, rows, and filtering  
✅ **Operations** - Calculations, grouping, and sorting  

### Key Methods to Remember:

| Method | What It Does |
|--------|-------------|
| `.head()` | Preview first rows |
| `.info()` | Column types and non-null counts |
| `.describe()` | Statistical summary |
| `.loc[]` | Select rows by label |
| `.groupby()` | Group data for analysis |
| `.sort_values()` | Sort by column |

### 🚀 Next Steps:

- 📈 **Data Visualization** - Create amazing charts with matplotlib/seaborn
- 🧹 **Data Cleaning** - Handle missing values, duplicates
- 🔗 **Merging Data** - Combine multiple DataFrames
- ⚡ **Advanced Operations** - Pivot tables, time series

### 💡 Pro Tips:

1. Always use `.head()` to preview your data first
2. Check for missing values with `.isnull().sum()`
3. Use meaningful column names (no spaces!)
4. Save your work: `df.to_csv('output.csv', index=False)`
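
Here is a quick sketch that puts these tips together. It assumes a tiny made-up product table with a couple of missing values, and it writes the CSV to an in-memory buffer instead of a real file so nothing is left on disk.

```python
import io

import numpy as np
import pandas as pd

# Made-up table with two missing values (NaN)
df = pd.DataFrame({
    'name': ['Widget', 'Gadget', 'Gizmo'],
    'price': [9.99, np.nan, 4.50],
    'quantity': [10, 3, np.nan],
})

# Tip 1: preview the data
print(df.head())

# Tip 2: count missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Tip 4: save without the index column
# (a filename such as 'output.csv' works in place of the buffer)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```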

---

## 🎓 Challenge Yourself!

Try combining everything you learned:

1. Create a DataFrame of 10 movies with: title, genre, rating (1-10), year
2. Filter for movies rated above 7
3. Group by genre and find average rating per genre
4. Find the highest-rated movie

**You've got this!** 💪🐼

In [None]:
# BONUS CHALLENGE SPACE
# Try the movie DataFrame challenge here!

