# Python Data Structures for Data Analysis - Part 1

## Week 1, Day 2 (Thursday) - April 10th, 2025

### Overview
Understanding data structures is essential for effective data analysis in Python. Today, we'll explore the fundamental Python data structures and how they're used in data analysis workflows.

### Learning Objectives
- Understand and work with lists, tuples, and dictionaries
- Learn indexing and slicing operations
- Master list and dictionary comprehensions
- Apply these concepts to real-world data examples

### Prerequisites
- Basic Python syntax (covered in Day 1)
- Google Colab environment setup

## 1. Lists in Python

Lists are one of the most versatile data structures in Python. They are ordered, mutable, and can contain elements of different types.

### Creating Lists

In [None]:
# Basic list creation
empty_list = []
numbers = [1, 2, 3, 4, 5]
mixed_list = [1, "hello", 3.14, True]
nested_list = [1, [2, 3], [4, [5, 6]]]

print("Empty list:", empty_list)
print("Numbers:", numbers)
print("Mixed types:", mixed_list)
print("Nested list:", nested_list)
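
Because lists are mutable, it matters whether two names refer to the same list or to separate copies. The sketch below illustrates the difference; the variable names (`scores`, `alias`, `independent`) are made up for this example.

```python
# Lists are mutable: an element can be reassigned in place.
scores = [70, 85, 90]
scores[0] = 75
print(scores)        # [75, 85, 90]

# Plain assignment creates a second name for the same list, not a copy.
alias = scores
alias.append(100)
print(scores)        # [75, 85, 90, 100]

# list.copy() returns a shallow copy that can change independently.
independent = scores.copy()
independent.append(0)
print(scores)        # [75, 85, 90, 100]
print(independent)   # [75, 85, 90, 100, 0]
```

Note that `copy()` is shallow: in a nested list, the inner lists are still shared between the original and the copy.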

### List Indexing and Slicing

Python uses zero-based indexing, meaning the first element is at index 0. Negative indices count from the end, so index -1 is the last element.

In [None]:
# Sample list for demonstration
fruits = ["apple", "banana", "cherry", "durian", "elderberry"]

# Single element access
print("First fruit:", fruits[0])  # apple
print("Last fruit:", fruits[-1])  # elderberry

# Slicing: list[start:end:step]
print("First three fruits:", fruits[0:3])  # ['apple', 'banana', 'cherry']
print("Every other fruit:", fruits[::2])   # ['apple', 'cherry', 'elderberry']
print("Reversed list:", fruits[::-1])      # ['elderberry', 'durian', 'cherry', 'banana', 'apple']
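
Slicing and single-element indexing behave differently at the edges of a list. The following sketch reuses the `fruits` list to show this; `last_two` and `first_two` are illustrative names.

```python
fruits = ["apple", "banana", "cherry", "durian", "elderberry"]

# A slice end beyond the list length is clipped instead of raising an error.
print(fruits[3:100])   # ['durian', 'elderberry']

# Indexing past the end, by contrast, raises IndexError.
try:
    fruits[10]
except IndexError as e:
    print("IndexError:", e)

# Negative indices also work inside slices and count from the end.
last_two = fruits[-2:]
print(last_two)        # ['durian', 'elderberry']

# A slice is a new list, so changing it leaves the original untouched.
first_two = fruits[:2]
first_two[0] = "apricot"
print(first_two)       # ['apricot', 'banana']
print(fruits[0])       # apple
```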

### Common List Methods

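Lists provide built-in methods for adding, removing, sorting, and searching elements. Methods such as `append()`, `sort()`, and `reverse()` modify the list in place and return `None`.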

In [None]:
# Let's start with a basic list
data = [3, 1, 4, 1, 5, 9]

# Adding elements
data.append(2)       # Add to the end
print("After append:", data)  # [3, 1, 4, 1, 5, 9, 2]

data.insert(2, 7)    # Insert at index 2
print("After insert:", data)  # [3, 1, 7, 4, 1, 5, 9, 2]

# Removing elements
data.remove(1)       # Remove first occurrence of 1
print("After remove:", data)  # [3, 7, 4, 1, 5, 9, 2]

popped = data.pop()  # Remove and return last element
print("Popped value:", popped, "List after pop:", data)  # Popped: 2, List: [3, 7, 4, 1, 5, 9]

# Other common operations
data.sort()          # Sort in place
print("Sorted list:", data)  # [1, 3, 4, 5, 7, 9]

data.reverse()       # Reverse in place
print("Reversed list:", data)  # [9, 7, 5, 4, 3, 1]

# Finding elements
print("Index of 5:", data.index(5))  # 2
print("Count of 7:", data.count(7))  # 1
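
Since `sort()` changes the list in place, it helps to know the built-in `sorted()` function, which returns a new list instead. A minimal sketch follows; the `words` list and the result names are assumptions made for illustration.

```python
data = [3, 1, 4, 1, 5, 9]

# sorted() returns a new list and leaves the original unchanged.
ascending = sorted(data)
descending = sorted(data, reverse=True)
print(data)        # [3, 1, 4, 1, 5, 9]
print(ascending)   # [1, 1, 3, 4, 5, 9]
print(descending)  # [9, 5, 4, 3, 1, 1]

# The key argument sorts by a computed value, here the length of each word.
words = ["banana", "fig", "cherry", "kiwi"]
by_length = sorted(words, key=len)
print(by_length)   # ['fig', 'kiwi', 'banana', 'cherry']

# list.sort() sorts in place and returns None, so do not assign its result.
result = data.sort()
print(result)      # None
print(data)        # [1, 1, 3, 4, 5, 9]
```

In analysis code, `sorted()` is often the safer choice when the original order of the data is still needed later.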

### List Comprehensions

List comprehensions provide a concise way to build a new list from any iterable, such as a list, a tuple, or a `range`.

In [None]:
# Basic syntax: [expression for item in iterable if condition]

# Create a list of squares from 0 to 9
squares = [x**2 for x in range(10)]
print("Squares:", squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Only even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print("Even squares:", even_squares)  # [0, 4, 16, 36, 64]

# Multiple if conditions
filtered = [x for x in range(100) if x % 3 == 0 if x % 5 == 0]
print("Multiples of 3 and 5:", filtered)  # [0, 15, 30, 45, 60, 75, 90]

# If-else in list comprehension
parity = ["even" if x % 2 == 0 else "odd" for x in range(5)]
print("Parity strings:", parity)  # ['even', 'odd', 'even', 'odd', 'even']

# Nested list comprehension (creating a 3x3 matrix)
matrix = [[i*3 + j + 1 for j in range(3)] for i in range(3)]
print("Matrix:")
for row in matrix:
    print(row)
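
A comprehension is shorthand for a loop that appends to a list. The sketch below writes one comprehension as an explicit loop, then uses the same idea to flatten and transpose the 3x3 matrix; the names `even_squares_loop`, `flat`, and `transposed` are illustrative.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The loop form of [x**2 for x in range(10) if x % 2 == 0]
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares_loop == even_squares)  # True

# Flattening: the for clauses appear in the same order as nested loops would.
flat = [value for row in matrix for value in row]
print(flat)        # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Transposing: the outer comprehension walks columns, the inner one walks rows.
transposed = [[row[col] for row in matrix] for col in range(3)]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```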

### Application to Data Analysis

Let's see how lists can be applied to real-world data analysis scenarios.

In [None]:
# Sample sales data
sales_data = [
    ["2025-01-01", "Product A", 150, 10.99],
    ["2025-01-02", "Product B", 200, 20.49],
    ["2025-01-02", "Product A", 100, 10.99],
    ["2025-01-03", "Product C", 50, 5.99],
    ["2025-01-03", "Product B", 300, 20.49],
    ["2025-01-03", "Product A", 200, 10.99],
]

# Calculate total revenue per product
product_revenues = {}

for date, product, quantity, price in sales_data:
    if product not in product_revenues:
        product_revenues[product] = 0
    product_revenues[product] += quantity * price

print("Product Revenues:")
for product, revenue in product_revenues.items():
    print(f"{product}: ${revenue:.2f}")

# Find the day with the highest revenue by accumulating totals per date
daily_revenues = {}
for date, _, quantity, price in sales_data:
    if date not in daily_revenues:
        daily_revenues[date] = 0
    daily_revenues[date] += quantity * price

# Get day with maximum revenue
best_day = max(daily_revenues.items(), key=lambda x: x[1])
print(f"\nBest sales day: {best_day[0]} with ${best_day[1]:.2f} in revenue")
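
Comprehensions and generator expressions can answer similar questions about the same rows without building a dictionary first. This is a sketch using the sample `sales_data`; the threshold of 200 units and the names `product_a_units`, `high_volume`, and `largest` are assumptions made for illustration.

```python
sales_data = [
    ["2025-01-01", "Product A", 150, 10.99],
    ["2025-01-02", "Product B", 200, 20.49],
    ["2025-01-02", "Product A", 100, 10.99],
    ["2025-01-03", "Product C", 50, 5.99],
    ["2025-01-03", "Product B", 300, 20.49],
    ["2025-01-03", "Product A", 200, 10.99],
]

# Total units of one product: filter and sum in a single generator expression.
product_a_units = sum(
    quantity for _, product, quantity, _ in sales_data if product == "Product A"
)
print("Product A units:", product_a_units)  # Product A units: 450

# Keep only the rows with at least 200 units (illustrative threshold).
high_volume = [row for row in sales_data if row[2] >= 200]
print("High-volume rows:", len(high_volume))  # High-volume rows: 3

# Row with the largest revenue (quantity * price).
largest = max(sales_data, key=lambda row: row[2] * row[3])
print("Largest transaction:", largest)
```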

## 2. Tuples in Python

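Tuples are ordered like lists, but they are immutable: once a tuple is created, its elements cannot be reassigned, added, or removed. This makes them a good fit for data that should stay fixed, such as a coordinate pair or a single record.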

In [None]:
# Creating tuples
empty_tuple = ()
single_item = (1,)  # Note the comma - needed for single-item tuples
coordinates = (10.5, 20.8)
person = ("John", "Doe", 30, "Analyst")

print("Empty tuple:", empty_tuple)
print("Single item tuple:", single_item)
print("Coordinates:", coordinates)
print("Person:", person)

# Tuple unpacking
x, y = coordinates
print(f"X coordinate: {x}, Y coordinate: {y}")

first_name, last_name, age, role = person
print(f"{first_name} {last_name} is a {age}-year-old {role}")

# Tuples are immutable
try:
    coordinates[0] = 15.0  # This will raise an error
except TypeError as e:
    print(f"Error: {e}")
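
Unpacking has a few more forms that come up often. The sketch below shows starred unpacking, swapping two values, and what happens when the number of names does not match; the names `middle`, `low`, and `high` are illustrative.

```python
person = ("John", "Doe", 30, "Analyst")

# A starred name collects the remaining items into a list.
first_name, *middle, role = person
print(first_name)  # John
print(middle)      # ['Doe', 30]
print(role)        # Analyst

# Swapping two values relies on tuple packing and unpacking.
low, high = 20.8, 10.5
low, high = high, low
print(low, high)   # 10.5 20.8

# Unpacking into the wrong number of names raises ValueError.
try:
    a, b = person
except ValueError as e:
    print("ValueError:", e)
```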

### Tuple Methods and Operations

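Because tuples are immutable, they provide only two methods, `count()` and `index()`. Indexing, slicing, and concatenation work as they do with lists, and each produces a new object rather than changing the tuple.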

In [None]:
# Sample tuple
data = (3, 1, 4, 1, 5, 9, 2, 6, 5)

# Counting and indexing
print(f"Count of 5: {data.count(5)}")
print(f"Index of first 5: {data.index(5)}")

# Slicing works just like with lists
print(f"First three elements: {data[:3]}")
print(f"Last three elements: {data[-3:]}")
print(f"Every other element: {data[::2]}")

# Concatenation
more_data = data + (5, 3, 5)
print(f"Concatenated tuple: {more_data}")

# Tuple unpacking in a for loop
student_records = [
    ("Alice", "Smith", 3.9),
    ("Bob", "Johnson", 3.7),
    ("Charlie", "Williams", 4.0)
]

print("\nStudent GPAs:")
for first, last, gpa in student_records:
    print(f"{first} {last}: {gpa}")
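
Several built-in functions produce or consume tuples, which is why a list of tuples is a common shape for records. A minimal sketch with the same student records follows; `ranked`, `names`, `gpas`, and `paired` are illustrative names.

```python
student_records = [
    ("Alice", "Smith", 3.9),
    ("Bob", "Johnson", 3.7),
    ("Charlie", "Williams", 4.0),
]

# Sort the records by GPA (index 2), highest first.
ranked = sorted(student_records, key=lambda record: record[2], reverse=True)
print(ranked[0])  # ('Charlie', 'Williams', 4.0)

# zip() pairs up parallel lists into tuples.
names = ["Alice", "Bob", "Charlie"]
gpas = [3.9, 3.7, 4.0]
paired = list(zip(names, gpas))
print(paired)     # [('Alice', 3.9), ('Bob', 3.7), ('Charlie', 4.0)]

# enumerate() yields (index, item) tuples, which can be unpacked in the loop.
for position, (first, last, gpa) in enumerate(ranked, start=1):
    print(f"{position}. {first} {last}: {gpa}")
```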

### When to Use Tuples in Data Analysis

Tuples are particularly useful in data analysis for:
- Representing record structures with fixed fields
- Storing data that should not change (like dates, coordinates)
- Dictionary keys (a tuple of hashable items is hashable, whereas a list is not and cannot be used as a key)
- Function return values with multiple elements

In [None]:
# Using tuples to represent geographic data points
city_data = [
    ("New York", (40.7128, -74.0060), 8.4),  # (name, (lat, long), population in millions)
    ("Tokyo", (35.6762, 139.6503), 13.9),
    ("Paris", (48.8566, 2.3522), 2.2),
    ("London", (51.5074, -0.1278), 8.9),
    ("Beijing", (39.9042, 116.4074), 21.5)
]

# Calculate the average population
total_population = sum(city[2] for city in city_data)
avg_population = total_population / len(city_data)
print(f"Average population: {avg_population:.1f} million")

# Find the northernmost city
northernmost = max(city_data, key=lambda city: city[1][0])  # city[1][0] is latitude
print(f"Northernmost city: {northernmost[0]} at latitude {northernmost[1][0]}°")

# Function returning multiple values as a tuple
def analyze_city_data(city_list):
    populations = [city[2] for city in city_list]
    return (min(populations), max(populations), sum(populations)/len(populations))

min_pop, max_pop, avg_pop = analyze_city_data(city_data)
print(f"Population stats (in millions) - Min: {min_pop}, Max: {max_pop}, Avg: {avg_pop:.1f}")
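
The cell above does not exercise the dictionary-key use case listed earlier, so here is a small sketch of it. The `units_sold` dictionary and its (product, date) keys are made up for illustration.

```python
# Keys are (product, date) tuples; values are units sold.
units_sold = {
    ("Product A", "2025-01-01"): 150,
    ("Product B", "2025-01-02"): 200,
}
units_sold[("Product A", "2025-01-02")] = 100
print(units_sold[("Product A", "2025-01-01")])  # 150

# A list is not hashable, so it cannot be used as a key.
try:
    units_sold[["Product C", "2025-01-03"]] = 50
except TypeError as e:
    print("TypeError:", e)

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash(("Product C", ["2025-01-03"]))
except TypeError as e:
    print("TypeError:", e)

print(len(units_sold))  # 3
```

Composite keys like these are a simple way to index data by two fields at once before reaching for a dedicated library.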

## Summary - Part 1

In this first part of our data structures lesson, we've explored:

1. **Lists** - Mutable, ordered collections that can hold mixed data types
   - Creation and basic operations
   - Indexing and slicing
   - Common methods (append, insert, remove, etc.)
   - List comprehensions

2. **Tuples** - Immutable, ordered collections
   - Creation and basic operations
   - Unpacking and multiple returns
   - Applications in data analysis

In Part 2, we'll continue with dictionaries, sets, and more advanced data structure concepts relevant to data analysis.