# Module 01: Python for Data Science

**Estimated Time**: 45 minutes

## Learning Objectives

By the end of this module, you will:
- Master Python data structures essential for data science
- Use list comprehensions for efficient data transformation
- Work with lambda functions and functional programming
- Handle errors gracefully with try-except blocks
- Write clean, reusable functions for data analysis
- Read and write simple data files

## Prerequisites

- Basic Python knowledge (variables, loops, conditionals)
- Module 00 completed

---

## 1. Essential Data Structures

Python has four core data structures that you'll use constantly in data science.

### Lists: Ordered Collections

In [None]:
# Lists store ordered sequences of items
temperatures = [72, 68, 75, 71, 69]
cities = ["New York", "London", "Tokyo", "Paris"]
mixed = [1, "text", 3.14, True]  # Can contain different types

print("Temperatures:", temperatures)
print("First temperature:", temperatures[0])
print("Last temperature:", temperatures[-1])
print("First three:", temperatures[:3])

In [None]:
# Common list operations
numbers = [1, 2, 3, 4, 5]

# Add items
numbers.append(6)  # Add to end
numbers.insert(0, 0)  # Insert at position
print("After adding:", numbers)

# Remove items
numbers.remove(3)  # Remove specific value
last = numbers.pop()  # Remove and return last item
print("After removing:", numbers)
print("Popped value:", last)

# Useful methods
print("Length:", len(numbers))
print("Sum:", sum(numbers))
print("Max:", max(numbers))
print("Min:", min(numbers))

### Dictionaries: Key-Value Pairs

In [None]:
# Dictionaries map keys to values (like a real dictionary maps words to definitions)
student = {"name": "Alice", "age": 25, "grades": [85, 90, 88], "is_enrolled": True}

print("Student name:", student["name"])
print("Average grade:", sum(student["grades"]) / len(student["grades"]))

# Add or modify
student["major"] = "Data Science"
student["age"] = 26
print("\nUpdated student:", student)

In [None]:
# Dictionary methods
sales = {"Jan": 1000, "Feb": 1200, "Mar": 1100}

print("Keys:", sales.keys())
print("Values:", sales.values())
print("Items:", sales.items())

# Safe access with get()
print("\nApril sales:", sales.get("Apr", 0))  # Returns 0 if key doesn't exist

# Iterate over dictionary
print("\nMonthly sales:")
for month, amount in sales.items():
    print(f"{month}: ${amount}")
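
A common use of `get()` with a default value is tallying how often each item appears. The sketch below assumes a hypothetical `orders` list that is not part of this module's data; it only illustrates the pattern.

```python
# Hypothetical sample data, for illustration only
orders = ["Widget", "Gadget", "Widget", "Gizmo", "Widget"]

counts = {}
for product in orders:
    # get() returns 0 the first time a product is seen
    counts[product] = counts.get(product, 0) + 1

print("Order counts:", counts)
```

Because `get()` never raises `KeyError`, the loop needs no separate check for whether the key already exists.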

### Tuples: Immutable Sequences

In [None]:
# Tuples are like lists but cannot be modified (immutable)
coordinates = (40.7128, -74.0060)  # New York City lat/long
rgb_color = (255, 0, 128)

print("Latitude:", coordinates[0])
print("Longitude:", coordinates[1])

# Tuple unpacking
lat, lon = coordinates
print(f"\nLocation: {lat}, {lon}")


# Use case: Multiple return values
def get_stats(numbers):
    return min(numbers), max(numbers), sum(numbers) / len(numbers)


minimum, maximum, average = get_stats([1, 2, 3, 4, 5])
print(f"\nStats - Min: {minimum}, Max: {maximum}, Avg: {average}")

### Sets: Unique Collections

In [None]:
# Sets store unique values (no duplicates)
unique_ids = {1, 2, 3, 4, 5}
duplicates = [1, 2, 2, 3, 3, 3, 4]
unique = set(duplicates)  # Convert list to set

print("Original:", duplicates)
print("Unique:", unique)

# Set operations
customers_jan = {"Alice", "Bob", "Charlie"}
customers_feb = {"Bob", "Diana", "Eve"}

print("\nBoth months:", customers_jan & customers_feb)  # Intersection
print("Either month:", customers_jan | customers_feb)  # Union
print("Only Jan:", customers_jan - customers_feb)  # Difference
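
In practice the four structures are often combined. The following is a minimal sketch, assuming a hypothetical `readings` list invented for illustration: a list of dictionaries holds the records, a set tracks which cities appear, and a tuple holds a fixed (min, max) pair for each city.

```python
# Hypothetical records: a list of dictionaries, one per measurement
readings = [
    {"city": "Tokyo", "temp": 75},
    {"city": "Paris", "temp": 68},
    {"city": "Tokyo", "temp": 71},
]

cities_seen = set()   # set: each city is stored once
temps_by_city = {}    # dictionary: city -> list of temperatures

for reading in readings:
    city = reading["city"]
    cities_seen.add(city)
    if city not in temps_by_city:
        temps_by_city[city] = []
    temps_by_city[city].append(reading["temp"])

# Tuple: a fixed (min, max) pair per city
ranges = {}
for city, temps in temps_by_city.items():
    ranges[city] = (min(temps), max(temps))

print("Cities:", cities_seen)
print("Ranges:", ranges)
```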

## 2. List Comprehensions

List comprehensions provide a concise way to create lists - essential for data transformations.

In [None]:
# Traditional approach
squares_traditional = []
for i in range(10):
    squares_traditional.append(i**2)

# List comprehension (more Pythonic)
squares = [i**2 for i in range(10)]

print("Traditional:", squares_traditional)
print("Comprehension:", squares)
print("Same result:", squares_traditional == squares)

In [None]:
# With conditions
temperatures = [72, 65, 80, 55, 90, 68]

# Only temperatures above 70
warm_days = [temp for temp in temperatures if temp > 70]
print("Warm days:", warm_days)

# Convert Fahrenheit to Celsius
celsius = [(temp - 32) * 5 / 9 for temp in temperatures]
print("Celsius:", [f"{c:.1f}" for c in celsius])

In [None]:
# Dictionary comprehensions
numbers = [1, 2, 3, 4, 5]
squares_dict = {n: n**2 for n in numbers}
print("Squares dictionary:", squares_dict)

# Filter dictionary
sales = {"Jan": 1000, "Feb": 1200, "Mar": 900, "Apr": 1500}
good_months = {month: amount for month, amount in sales.items() if amount >= 1000}
print("Months with $1000+ sales:", good_months)
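
The position of `if` changes what a comprehension does, which is easy to mix up. As a short sketch reusing the temperature values from above (the `labels` name is introduced here for illustration): an `if` after the `for` clause filters items out, while an `if ... else` before the `for` clause transforms every item.

```python
temperatures = [72, 65, 80, 55, 90, 68]

# Filter: `if` after the for clause drops items that fail the test
warm = [t for t in temperatures if t > 70]

# Transform: `if ... else` before the for clause keeps every item
labels = ["warm" if t > 70 else "cool" for t in temperatures]

print("Warm only:", warm)
print("Labels:", labels)
```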

## 3. Functions and Lambda Expressions

Functions make code reusable and organized - critical for data science workflows.

In [None]:
# Regular functions
def calculate_mean(numbers):
    """
    Calculate the arithmetic mean of a list of numbers.

    Args:
        numbers (list): List of numeric values

    Returns:
        float: The mean value
    """
    return sum(numbers) / len(numbers)


def calculate_statistics(numbers):
    """
    Calculate the min, max, mean, and range of a list of numbers.
    """
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": calculate_mean(numbers),
        "range": max(numbers) - min(numbers),
    }


data = [23, 45, 67, 34, 89, 12, 56]
stats = calculate_statistics(data)

print("Statistics:")
for key, value in stats.items():
    print(f"  {key}: {value}")

In [None]:
# Default arguments
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"


print(greet("Alice"))  # Uses default
print(greet("Bob", "Hi"))  # Custom greeting


# Keyword arguments
def describe_data(data, precision=2, currency="$"):
    mean = sum(data) / len(data)
    return f"Average: {currency}{mean:.{precision}f}"


sales = [100, 150, 200]
print(describe_data(sales))
print(describe_data(sales, precision=0))
print(describe_data(sales, currency="€", precision=1))

In [None]:
# Lambda functions (anonymous functions)
# Useful for simple, one-time operations


# Regular function
def square(x):
    return x**2


# Equivalent lambda (named here only for comparison; prefer def for named functions)
square_lambda = lambda x: x**2

print("Regular:", square(5))
print("Lambda:", square_lambda(5))

# Common use: with map(), filter(), sorted()
numbers = [1, 2, 3, 4, 5]

# map(): Apply function to each item
squared = list(map(lambda x: x**2, numbers))
print("\nSquared:", squared)

# filter(): Keep items that match condition
evens = list(filter(lambda x: x % 2 == 0, numbers))
print("Evens:", evens)

# sorted(): Sort with custom key
students = [("Alice", 85), ("Bob", 92), ("Charlie", 78)]
sorted_by_grade = sorted(students, key=lambda x: x[1], reverse=True)
print("\nTop students:", sorted_by_grade)
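
The `map()` and `filter()` calls above can also be written as list comprehensions, which many Python programmers find easier to read. A brief side-by-side sketch (the variable names are chosen here for illustration):

```python
numbers = [1, 2, 3, 4, 5]

# map() with a lambda vs. the equivalent comprehension
squared_map = list(map(lambda x: x**2, numbers))
squared_comp = [x**2 for x in numbers]

# filter() with a lambda vs. the equivalent comprehension
evens_filter = list(filter(lambda x: x % 2 == 0, numbers))
evens_comp = [x for x in numbers if x % 2 == 0]

print("Squares match:", squared_map == squared_comp)
print("Evens match:", evens_filter == evens_comp)
```

Lambdas remain the natural choice for the `key` argument of `sorted()`, where a comprehension has no equivalent.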

## 4. Error Handling

Real-world data is messy. Handle errors gracefully to prevent crashes.

In [None]:
# Without error handling
def risky_division(a, b):
    return a / b


# This works
print("10 / 2 =", risky_division(10, 2))

# This crashes!
# print(risky_division(10, 0))  # ZeroDivisionError

In [None]:
# With error handling
def safe_division(a, b):
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        print("Error: Cannot divide by zero!")
        return None
    except TypeError:
        print("Error: Invalid types for division")
        return None


print("10 / 2 =", safe_division(10, 2))
print("10 / 0 =", safe_division(10, 0))
print("10 / 'a' =", safe_division(10, "a"))

In [None]:
# try-except-else-finally
def process_data(numbers):
    try:
        mean = sum(numbers) / len(numbers)
    except ZeroDivisionError:
        print("Error: Empty list!")
        mean = None
    except TypeError:
        print("Error: Invalid data type!")
        mean = None
    else:
        print("Success: Calculation completed")
    finally:
        print("Processing finished\n")

    return mean


print("Result:", process_data([1, 2, 3, 4, 5]))
print("Result:", process_data([]))
print("Result:", process_data("not a list"))
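
When cleaning messy input, it is often more useful to skip bad values and keep going than to stop at the first failure. The sketch below shows one way to do that; `parse_numbers` and the sample `raw` list are hypothetical names made up for this example.

```python
def parse_numbers(raw_values):
    """Convert values to floats, collecting those that cannot be parsed."""
    parsed = []
    rejected = []
    for value in raw_values:
        try:
            parsed.append(float(value))
        except (ValueError, TypeError):
            # ValueError: text that is not a number; TypeError: e.g. None
            rejected.append(value)
    return parsed, rejected


# Hypothetical messy input, for illustration only
raw = ["12.5", "n/a", "7", None, " 3.0 "]
clean, bad = parse_numbers(raw)

print("Clean values:", clean)
print("Rejected values:", bad)
```

Returning the rejected values alongside the clean ones makes it possible to report or inspect what was dropped instead of discarding it silently.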

## 5. File I/O Basics

Data science starts with loading data from files.

In [None]:
# Writing to a file
sales_data = [
    "Date,Product,Sales",
    "2024-01-01,Widget,100",
    "2024-01-02,Gadget,150",
    "2024-01-03,Widget,120",
]

# Write to file
with open("temp_sales.csv", "w") as file:
    for line in sales_data:
        file.write(line + "\n")

print("File written successfully!")

In [None]:
# Reading from a file
with open("temp_sales.csv", "r") as file:
    lines = file.readlines()

print("File contents:")
for line in lines:
    print(line.strip())  # strip() removes surrounding whitespace, including the trailing newline

In [None]:
# Parse CSV manually (we'll use Pandas for real work)
data = []
with open("temp_sales.csv", "r") as file:
    header = file.readline().strip().split(",")
    for line in file:
        values = line.strip().split(",")
        row_dict = dict(zip(header, values))
        data.append(row_dict)

print("\nParsed data:")
for row in data:
    print(row)

# Clean up
import os

os.remove("temp_sales.csv")
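
Splitting on commas breaks as soon as a field itself contains a comma. Python's standard-library `csv` module handles quoted fields, and its `DictReader` produces the same row-as-dictionary shape as the manual parser above. The sketch below reads from an in-memory string (via `io.StringIO`) rather than a file so that it runs on its own; the sample text is a shortened copy of the sales data.

```python
import csv
import io

# In-memory stand-in for a CSV file, so no file needs to exist on disk
csv_text = "Date,Product,Sales\n2024-01-01,Widget,100\n2024-01-02,Gadget,150\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# Values are read as strings, so convert before doing arithmetic
total_sales = sum(int(row["Sales"]) for row in rows)

print("First row:", rows[0])
print("Total sales:", total_sales)
```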

## 6. Exercises

Practice what you've learned!

In [None]:
# Exercise 1: Data Structure Manipulation
# TODO: Create a dictionary of stock prices with at least 5 stocks
# TODO: Calculate the average stock price
# TODO: Find the most expensive stock
# TODO: Create a list of stocks that cost more than $100

# Your code here
stocks = {}

# Solution will be provided separately

In [None]:
# Exercise 2: List Comprehensions
# TODO: Given a list of temperatures in Celsius, convert to Fahrenheit
# TODO: Filter to only show temperatures above 70°F

celsius_temps = [20, 25, 30, 15, 35, 10]

# Formula: F = C * 9/5 + 32
# Your code here

In [None]:
# Exercise 3: Function Writing
# TODO: Write a function that takes a list of numbers and returns:
#   - Count of numbers
#   - Sum of numbers
#   - Mean (average)
#   - Median (middle value)
#   - Standard deviation (bonus!)


def analyze_numbers(numbers):
    # Your code here
    pass


# Test with: [10, 20, 30, 40, 50]
# Expected mean: 30

In [None]:
# Exercise 4: Error Handling
# TODO: Write a function that safely converts a string to int
# TODO: Return the integer if successful, or 0 if conversion fails
# TODO: Test with: "123", "abc", "45.6", None


def safe_int_conversion(value):
    # Your code here
    pass

## 7. Key Takeaways

Excellent work completing Module 01! Here's what you mastered:

✓ **Data structures**: Lists, dictionaries, tuples, and sets  
✓ **List comprehensions**: Concise data transformations  
✓ **Functions**: Reusable code with parameters and return values  
✓ **Lambda functions**: Quick, anonymous functions  
✓ **Error handling**: Graceful failure with try-except  
✓ **File I/O**: Reading and writing data files  

## Next Steps

**Next Module**: `02_numpy_fundamentals.ipynb`

In Module 02, you'll learn NumPy - the foundation of numerical computing in Python!

---

**Keep practicing!** The more you code, the more natural it becomes. 🐍