# Part 1: Python & Pandas Foundations
## CSC 2053 - Lab 12

Welcome to data science with Python! This week, let's dive into **RadioLand sports radio data** to learn how data scientists load, explore, and summarize real datasets.

**What You'll Learn:**
- Advanced Python concepts (list comprehensions, lambdas)
- Working with NumPy for numerical computing
- Introduction to Pandas for data analysis
- Loading and exploring real datasets

**Dataset:** Sports radio affiliates (MLB, NHL, NFL, NBA)

---
## Part 1: Advanced Python Concepts

Before we dive into data science libraries, let's cover some powerful Python features you'll use constantly.

### List Comprehensions

**What are they?** List comprehensions are a Pythonic way to create new lists by transforming or filtering existing lists.

**When to use them:** 
- When you need to transform each item in a list (e.g., converting units, formatting data)
- When filtering lists based on conditions
- When you want cleaner, more readable code than traditional for loops

**Why they matter in data science:** You'll constantly transform and filter data - list comprehensions make this fast and readable.

In [None]:
# Traditional way
frequencies = [88.5, 90.1, 93.3, 97.5, 101.1]
truncated = []
for freq in frequencies:
    truncated.append(int(freq))  # int() drops the decimal part (truncates, doesn't round)
print("Traditional:", truncated)

# List comprehension way (one line!)
truncated = [int(freq) for freq in frequencies]
print("Comprehension:", truncated)

In [None]:
# With conditions
teams = ["Eagles", "Phillies", "76ers", "Flyers", "Union"]

# Only teams with 'e' in the name
teams_with_e = [team for team in teams if 'e' in team.lower()]
print(teams_with_e)
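A comprehension can also transform and filter in the same expression. The sketch below reuses the `frequencies` list from earlier. The 92 MHz cutoff and the `low`/`high` labels are arbitrary choices for illustration. Pay attention to where the condition goes. A trailing `if` drops items. An `if ... else` placed before `for` keeps every item and transforms it.

```python
frequencies = [88.5, 90.1, 93.3, 97.5, 101.1]

# Filter + transform: keep stations above 92 MHz and format each as a label
labels = [f"{freq:.1f} FM" for freq in frequencies if freq > 92]
print(labels)  # ['93.3 FM', '97.5 FM', '101.1 FM']

# Conditional expression: every item is kept, but mapped to one of two values
bands = ["low" if freq < 92 else "high" for freq in frequencies]
print(bands)  # ['low', 'low', 'high', 'high', 'high']
```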

### Exercise 1.1: List Comprehensions

Given a list of radio callsigns, create a new list containing only callsigns that start with 'W'.

In [None]:
callsigns = ["WXPN", "WMMR", "KYW", "WYSP", "KROQ", "WDAS"]

# YOUR CODE HERE
w_stations = 

print(w_stations)  # Should print: ['WXPN', 'WMMR', 'WYSP', 'WDAS']

### Lambda Functions

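**What are they?** Lambda functions are small, anonymous functions created with the `lambda` keyword instead of `def`. Their body must be a single expression, and the value of that expression is returned automatically.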

**When to use them:**
- When you need a simple function for a short time (sorting, filtering, mapping)
- As arguments to functions like `sorted()`, `map()`, `filter()`
- When defining a full function would be overkill

**Why they matter in data science:** Pandas and data analysis tools frequently need small transformation functions - lambdas keep your code concise.

In [None]:
# Traditional function
def double(x):
    return x * 2

# Lambda (one-liner)
double_lambda = lambda x: x * 2

print(double(5))
print(double_lambda(5))

In [None]:
# Useful with map() and filter()
powers = [1, 5, 10, 50, 100]  # kW

# Convert to watts
watts = list(map(lambda kw: kw * 1000, powers))
print("Watts:", watts)

# Filter for high power (>= 50 kW)
high_power = list(filter(lambda kw: kw >= 50, powers))
print("High power:", high_power)
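For simple transforms and filters, a list comprehension often reads more clearly than `map()` or `filter()` with a lambda, and it produces the same result. The side-by-side check below reuses the `powers` list from above:

```python
powers = [1, 5, 10, 50, 100]  # kW

# map() + lambda vs. comprehension: same transformation
watts_map = list(map(lambda kw: kw * 1000, powers))
watts_comp = [kw * 1000 for kw in powers]

# filter() + lambda vs. comprehension: same filter
high_filter = list(filter(lambda kw: kw >= 50, powers))
high_comp = [kw for kw in powers if kw >= 50]

print(watts_map == watts_comp)   # True
print(high_filter == high_comp)  # True
```

Lambdas are still the natural fit for `key=` arguments, such as the `sorted()` call in Exercise 1.2.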

### Exercise 1.2: Lambda Functions

Use a lambda function with `sorted()` to sort stations by frequency.

In [None]:
stations = [
    {"callsign": "WXPN", "frequency": 88.5},
    {"callsign": "WMMR", "frequency": 93.3},
    {"callsign": "WYSP", "frequency": 94.1}
]

# YOUR CODE HERE - sort by frequency
sorted_stations = sorted(stations, key=lambda s: s["frequency"])

for s in sorted_stations:
    print(f"{s['callsign']}: {s['frequency']}")

---
## Part 2: Introduction to NumPy

**What is NumPy?** NumPy (Numerical Python) is the foundation of scientific computing in Python. It provides fast operations on arrays of numbers.

**Why use NumPy instead of Python lists?**
- **Speed:** Vectorized NumPy operations are often 10-100x faster than equivalent pure-Python loops, especially on large arrays
- **Convenience:** Built-in mathematical and statistical functions
- **Vectorization:** Operate on entire arrays at once, not element-by-element

**When to use NumPy:**
- Mathematical operations on large datasets
- Statistical calculations (mean, standard deviation, etc.)
- Scientific computing and numerical analysis
- As the foundation for Pandas, Matplotlib, and other data science libraries
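You can check the speed claim on your own machine with a rough timing sketch like the one below. It compares a Python-level loop (here, a list comprehension) with the equivalent vectorized NumPy operation. The array size is arbitrary. Measured times will vary with your hardware and NumPy version. A single `time.perf_counter()` measurement is only an estimate; the `timeit` module gives more reliable numbers.

```python
import time
import numpy as np

n = 1_000_000  # arbitrary size; larger arrays make the gap easier to see
values = list(range(n))
arr = np.arange(n)

# Python-level loop: the interpreter multiplies one element at a time
start = time.perf_counter()
loop_result = [v * 2 for v in values]
loop_time = time.perf_counter() - start

# Vectorized: one NumPy call multiplies the whole array in compiled code
start = time.perf_counter()
vec_result = arr * 2
vec_time = time.perf_counter() - start

print(f"Python loop: {loop_time:.4f} s")
print(f"NumPy:       {vec_time:.4f} s")
print("Same values:", loop_result == vec_result.tolist())
```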

In [None]:
import numpy as np

print(f"NumPy version: {np.__version__}")

### NumPy Arrays vs Python Lists

In [None]:
# Python list
py_list = [1, 2, 3, 4, 5]

# NumPy array
np_array = np.array([1, 2, 3, 4, 5])

print("Python list:", py_list)
print("NumPy array:", np_array)
print("Type:", type(np_array))

In [None]:
# NumPy operations are vectorized (fast!)
frequencies = np.array([88.5, 90.1, 93.3, 97.5, 101.1])

# Multiply all by 2 (one operation)
doubled = frequencies * 2
print("Doubled:", doubled)

# Statistics are built-in
print(f"Mean: {frequencies.mean():.2f}")
print(f"Std: {frequencies.std():.2f}")
print(f"Min: {frequencies.min()}")
print(f"Max: {frequencies.max()}")
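Two details about the cell above are worth knowing.

First, comparisons are vectorized too. `frequencies > 92` produces a boolean array, and that array can be used to select elements. This is the NumPy counterpart of a filtering list comprehension. The 92 MHz cutoff is arbitrary.

Second, `ndarray.std()` computes the population standard deviation (`ddof=0`) by default. In contrast, pandas' `Series.std()` defaults to the sample version (`ddof=1`). As a result, the two libraries can report different values for the same data.

```python
import numpy as np

frequencies = np.array([88.5, 90.1, 93.3, 97.5, 101.1])

# Boolean mask: True where the condition holds, element by element
mask = frequencies > 92
print(mask)               # boolean array, one entry per station
print(frequencies[mask])  # only the stations above 92 MHz

# Population vs. sample standard deviation
print(f"Population std (ddof=0): {frequencies.std():.2f}")
print(f"Sample std (ddof=1):     {frequencies.std(ddof=1):.2f}")
```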

### Exercise 2.1: NumPy Arrays

Create a NumPy array of station powers (kW) and calculate statistics.

In [None]:
# Station powers in kilowatts
powers_kw = [1.0, 5.5, 10.0, 50.0, 100.0, 0.25, 3.0]

# YOUR CODE HERE
# 1. Convert to NumPy array
powers = 

# 2. Calculate mean, median, and standard deviation
#    (hint: arrays have .mean() and .std() methods, but median needs np.median())
mean_power = 
median_power = 
std_power = 

print(f"Mean power: {mean_power:.2f} kW")
print(f"Median power: {median_power:.2f} kW")
print(f"Std deviation: {std_power:.2f} kW")

---
## Part 3: Introduction to Pandas

**What is Pandas?** Pandas is THE library for data analysis in Python. It provides DataFrames - think Excel spreadsheets, but way more powerful.

**Why use Pandas?**
- **DataFrames:** Work with tabular data (rows and columns) naturally
- **Built-in operations:** Filtering, sorting, grouping, merging - all easy
- **Handles messy data:** Built-in tools for missing values, mixed data types, and dates
- **Industry standard:** Used widely in data science, from startups to Fortune 500 companies

**When to use Pandas:**
- Loading data from CSV, Excel, databases
- Cleaning and preprocessing data
- Exploratory data analysis
- Transforming data for visualization or machine learning

**Key concept:** If your data looks like a table, use Pandas.

In [None]:
import pandas as pd

print(f"Pandas version: {pd.__version__}")

### Creating DataFrames

In [None]:
# From a dictionary
data = {
    'callsign': ['WXPN', 'WMMR', 'WYSP'],
    'frequency': [88.5, 93.3, 94.1],
    'format': ['AAA', 'Classic Rock', 'Rock']
}

df = pd.DataFrame(data)
print(df)

In [None]:
# Basic DataFrame info
print("Shape:", df.shape)  # (rows, columns)
print("\nColumns:", df.columns.tolist())
print("\nData types:")
print(df.dtypes)
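You can also build a DataFrame from a list of dictionaries, with one dict per row, such as the `stations` list from Exercise 1.2. The sketch below uses a new name, `df_stations`, so it does not overwrite `df`. It shows three operations:

- filtering rows with a boolean mask
- sorting by a column
- running a lambda on every value with `Series.apply()`

The 92 MHz cutoff and the `band` column are for illustration only.

```python
import pandas as pd

stations = [
    {"callsign": "WXPN", "frequency": 88.5},
    {"callsign": "WMMR", "frequency": 93.3},
    {"callsign": "WYSP", "frequency": 94.1},
]
df_stations = pd.DataFrame(stations)  # dict keys become columns, each dict a row

# A boolean mask selects rows, just like with NumPy arrays
above_92 = df_stations[df_stations["frequency"] > 92]
print(above_92)

# pandas counterpart of sorted(..., key=...)
print(df_stations.sort_values("frequency", ascending=False))

# apply() runs a small function (here a lambda) on every value in a column
df_stations["band"] = df_stations["frequency"].apply(lambda f: "low" if f < 92 else "high")
print(df_stations)
```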

### Exercise 3.1: Create a DataFrame

Create a DataFrame of Villanova basketball stats.

In [None]:
# YOUR CODE HERE
# Create a DataFrame with these columns: player, points, rebounds, assists
# Add at least 3 players with their stats

nova_stats = pd.DataFrame({
    'player': [],  # Fill in
    'points': [],  # Fill in
    'rebounds': [],  # Fill in
    'assists': []  # Fill in
})

print(nova_stats)

---
## Part 4: Loading Real Data

Now let's work with real sports radio data!

### Choose Your Sport!

Pick your favorite sport and load that data.

In [None]:
# Choose one: "MLB", "NHL", "NFL", "NBA"
sport = "MLB"  # Change this to your favorite!

# Load from GitHub
url = f'https://raw.githubusercontent.com/CSC-2053-100-Fall25/python-datascience-template/main/{sport}.csv'
df = pd.read_csv(url)

print(f"✓ Loaded {sport} data!")
print(f"Dataset has {len(df)} stations")

### First Look at the Data

In [None]:
# First few rows
df.head()

In [None]:
# Dataset info
df.info()

In [None]:
# Statistical summary
df.describe()

### Exercise 4.1: Explore Your Data

Answer these questions about your dataset.

In [None]:
# YOUR CODE HERE

# 1. How many columns are in the dataset?
num_columns = 

# 2. What are all the column names?
column_names = 

# 3. How many stations are in the dataset?
num_stations = 

print(f"Columns: {num_columns}")
print(f"Column names: {column_names}")
print(f"Total stations: {num_stations}")

---
## Part 5: Basic Data Exploration

**Why explore data?** Before analyzing data, you need to understand what you're working with. Data exploration helps you:
- Understand the structure and contents
- Identify patterns and anomalies
- Form hypotheses to test
- Catch data quality issues early

**Key exploration techniques:**
- **Value counts:** See distributions of categorical data (formats, states, etc.)
- **Statistical summaries:** Understand numerical data (mean, median, range)
- **Column access:** Extract specific information for deeper analysis

**Data science principle:** Always explore before you analyze!
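The sketch below illustrates how to catch data quality issues early. It builds a tiny hypothetical table in memory: the rows are made up, and only some column names mirror the real data. `pd.read_csv()` reads it from an `io.StringIO` buffer instead of a URL. Two checks follow:

- `isna().sum()` counts missing values in each column.
- `dtypes` shows whether numeric columns were actually parsed as numbers.

```python
import io
import pandas as pd

# Hypothetical rows for illustration only; note the blank erp value for KBBB
csv_text = """callsign,frequency,country,erp
WAAA,88.5,USA,5.0
KBBB,101.1,USA,
CAAA,99.9,Canada,50.0
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.dtypes)        # frequency and erp are parsed as float64
print(sample.isna().sum())  # missing values per column (erp has 1)
```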

### Accessing Columns

In [None]:
# Get a single column
callsigns = df['callsign']
print("Type:", type(callsigns))
print("\nFirst 5 callsigns:")
print(callsigns.head())

In [None]:
# Get multiple columns
station_info = df[['callsign', 'frequency', 'city']]
print(station_info.head())

### Value Counts - What's in the Data?

In [None]:
# Most common formats
print("Top 10 formats:")
print(df['new_format'].value_counts().head(10))

In [None]:
# Countries represented
print("Stations by country:")
print(df['country'].value_counts())
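Two variations help at this stage. `value_counts(normalize=True)` returns proportions instead of raw counts. A boolean condition selects only the rows that match. The sketch uses a small hypothetical DataFrame that has the same `callsign` and `country` column names as the real data; the rows themselves are made up.

```python
import pandas as pd

# Hypothetical stations for illustration only
sample = pd.DataFrame({
    "callsign": ["WAAA", "KBBB", "CAAA", "WCCC"],
    "country": ["USA", "USA", "Canada", "USA"],
})

# Proportions instead of counts (multiply by 100 for percentages)
shares = sample["country"].value_counts(normalize=True)
print(shares * 100)  # USA: 75.0, Canada: 25.0 (printed as a Series)

# Select only the rows where a condition holds
canadian = sample[sample["country"] == "Canada"]
print(canadian["callsign"].tolist())  # ['CAAA']
```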

### Exercise 5.1: Data Exploration

Explore different aspects of your dataset.

In [None]:
# YOUR CODE HERE

# 1. What are the top 5 most common owners?
top_owners = 

# 2. How many unique states/provinces are represented?
num_states = 

# 3. What's the average frequency of all stations?
avg_frequency = 

print("Top 5 owners:")
print(top_owners)
print(f"\nUnique states/provinces: {num_states}")
print(f"Average frequency: {avg_frequency:.2f} MHz")

---
## Putting It All Together

Create a comprehensive summary of your chosen sport's radio affiliate data.

In [None]:
# YOUR CODE HERE
# Create a summary report that includes:
# 1. Sport name and total stations
# 2. Frequency range (min to max)
# 3. Top 3 formats
# 4. Top 3 states/provinces by station count
# 5. Average station power (the `erp` column, effective radiated power)

print(f"{'='*50}")
print(f"{sport} RADIO AFFILIATE SUMMARY")
print(f"{'='*50}")

# Your code here to print the summary


---
## Challenge Problem

**Advanced Data Summary**

Create a function that takes a DataFrame and returns a dictionary with:
- Total stations
- Countries represented (list)
- Most common format
- Highest power station (callsign and ERP)
- Percentage of stations in USA vs Canada

Test it on your sports data!

In [None]:
def analyze_dataset(df):
    """
    Create comprehensive summary statistics for a radio affiliate dataset.
    
    Parameters:
        df (DataFrame): Sports radio affiliate data
    
    Returns:
        dict: Summary statistics
    """
    # YOUR CODE HERE
    pass

# Test your function
summary = analyze_dataset(df)
print(summary)

---
## Wrap-Up

Congratulations! You've learned:

- ✅ **Advanced Python** - List comprehensions and lambda functions
- ✅ **NumPy basics** - Arrays and vectorized operations
- ✅ **Pandas foundations** - DataFrames, loading CSVs, basic exploration
- ✅ **Real data analysis** - Working with actual sports radio data

### Next Steps:
- **Part 2:** Deep dive into Pandas for complex analysis
- **Part 3:** Data visualization with Matplotlib and Seaborn
- **Part 4:** Interactive mapping with Folium

### Key Takeaways:
- NumPy provides fast numerical operations
- Pandas makes working with tabular data easy
- Real datasets are messy but interesting!
- Data exploration is the first step in any analysis

