# Introduction to Python

**Overview**: Python is one of the most popular programming languages for data science, and it's the language we'll use throughout this baseball analytics series. This notebook covers the fundamentals you'll need — from variables and data types to functions and data structures.

**No prior programming experience required.** We'll start from the very beginning.

**What you'll learn:**
- Variables, data types, and basic operations
- Data structures (lists, dictionaries, tuples, sets)
- Control flow (if/else, loops, comprehensions)
- Writing reusable functions
- Putting it all together with baseball examples

## Table of Contents

1. [Variables & Data Types](#1-variables--data-types)
2. [Operators](#2-operators)
3. [Data Structures](#3-data-structures)
4. [Control Flow](#4-control-flow)
5. [Functions](#5-functions)
6. [Putting It All Together](#6-putting-it-all-together)

## 1. Variables & Data Types <a id="1-variables--data-types"></a>

In Python, a **variable** is a name that refers to a value. Unlike some other languages, you don't need to declare a type; Python infers it from the value you assign.

Think of a variable like a jersey number — it's a label that points to something (in this case, data instead of a player).
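Because the type comes from the value rather than from a declaration, the same name can later be pointed at a value of a different type. A minimal sketch (the name `current_batter` is just an illustration):

```python
# A variable is a label for a value; the label can be moved to a new value
current_batter = "Aaron Judge"
print(type(current_batter))  # <class 'str'>

# Rebinding the same name to an int is allowed: the type follows the value
current_batter = 99
print(type(current_batter))  # <class 'int'>
```

This flexibility is convenient, but Python won't warn you if you accidentally reuse a name for something unrelated, so descriptive names matter.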

In [None]:
# Creating variables -- just pick a name and use the = sign
player_name = "Shohei Ohtani"
home_runs = 54
batting_average = 0.310
is_mvp = True

# print() displays values to the screen
print(player_name)
print(home_runs)

### The Four Basic Data Types

Python has many built-in types, but these four basic ones are the ones you'll use constantly:

| Type | What it stores | Example |
|------|---------------|---------|
| `str` | Text (strings) | `"Aaron Judge"` |
| `int` | Whole numbers (integers) | `62` |
| `float` | Decimal numbers | `0.311` |
| `bool` | True or False (boolean) | `True` |

Use the `type()` function to check what type a variable is:

In [None]:
# Check the type of each variable
print(type(player_name))      # str   -- text
print(type(home_runs))        # int   -- whole number
print(type(batting_average))  # float -- decimal number
print(type(is_mvp))           # bool  -- True or False

In [None]:
# Type conversion -- changing one type to another
jersey_number = "99"          # This is a string, not a number!
print(type(jersey_number))

jersey_as_int = int(jersey_number)  # Convert string to integer
print(type(jersey_as_int))

# You can also go the other direction
hr_as_string = str(home_runs)  # Convert integer to string
print(f"{hr_as_string} is now type: {type(hr_as_string)}")
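
Conversions don't always behave the way you might guess. A short sketch of three common surprises, using illustrative values: `float()` can parse a decimal string but `int()` cannot, `int()` on a float drops the fractional part instead of rounding, and text that isn't a number raises a `ValueError`.

```python
# float() can parse a decimal string
avg_text = "0.310"
avg = float(avg_text)
print(avg)  # 0.31

# int() on a float truncates toward zero; use round() to round
print(int(3.9))    # 3
print(round(3.9))  # 4

# int() cannot parse a decimal string, so this raises a ValueError
try:
    int(avg_text)
except ValueError as err:
    print(f"Could not convert: {err}")
```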

### Printing and f-strings

The `print()` function outputs text. **f-strings** (formatted strings) let you embed variables directly inside text — just put `f` before the quotes and wrap variables in `{curly braces}`.
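Inside the braces, a colon can be followed by a *format specifier* that controls how the value is displayed. A few common ones, using made-up values (the `attendance` figure is hypothetical):

```python
name = "Aaron Judge"
avg = 0.311
attendance = 3000000  # hypothetical season attendance

print(f"{avg:.3f}")       # 0.311 (exactly 3 decimal places)
print(f"{attendance:,}")  # 3,000,000 (comma as thousands separator)
print(f"{name:<15}|")     # left-align the name in a 15-character column
print(f"{avg:.1%}")       # 31.1% (multiplies by 100 and adds a percent sign)
```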

In [None]:
# Basic print
print("Hello, baseball!")

# f-strings -- the most useful way to format output
print(f"{player_name} hit {home_runs} home runs with a {batting_average:.3f} batting average.")

# You can even do math inside f-strings
games = 162
print(f"That's roughly {home_runs / games:.2f} home runs per game.")
# The :.2f means "format as a float with 2 decimal places"

## 2. Operators <a id="2-operators"></a>

Operators let you perform calculations, make comparisons, and combine conditions.
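Python applies the usual math precedence: `**` first, then `*`, `/`, `//`, and `%`, then `+` and `-`. Parentheses override that order, which matters for rate stats with a sum on top. A sketch with made-up totals:

```python
hits = 180
walks = 80
at_bats = 550

# Without parentheses only walks is divided: 180 + (80 / 630)
wrong = hits + walks / (at_bats + walks)

# With parentheses the whole numerator is divided: 260 / 630
right = (hits + walks) / (at_bats + walks)

print(f"Without parentheses: {wrong:.3f}")
print(f"With parentheses:    {right:.3f}")
```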

In [None]:
# Arithmetic operators
hits = 180
at_bats = 580
walks = 40

print(f"Hits + Walks:    {hits + walks}")       # Addition
print(f"At bats - Hits:  {at_bats - hits}")     # Subtraction
print(f"Hits x 2:        {hits * 2}")           # Multiplication
print(f"Batting avg:     {hits / at_bats:.3f}") # Division (returns float)
print(f"Floor division:  {at_bats // 3}")       # Division rounded down
print(f"Remainder:       {at_bats % 3}")        # Modulo (remainder)
print(f"2 to the 10th:   {2 ** 10}")            # Exponent (power)

In [None]:
# Comparison operators -- these return True or False
era = 2.45

print(f"ERA == 2.45:  {era == 2.45}")   # Equal to
print(f"ERA != 3.00:  {era != 3.00}")   # Not equal to
print(f"ERA > 3.00:   {era > 3.00}")    # Greater than
print(f"ERA < 3.00:   {era < 3.00}")    # Less than
print(f"ERA >= 2.45:  {era >= 2.45}")   # Greater than or equal
print(f"ERA <= 2.00:  {era <= 2.00}")   # Less than or equal

In [None]:
# Logical operators -- combine True/False conditions
hr = 40
avg = 0.280

# 'and' -- both must be True
print(f"40+ HR AND .280+ AVG: {hr >= 40 and avg >= 0.280}")

# 'or' -- at least one must be True
print(f"40+ HR OR .300+ AVG:  {hr >= 40 or avg >= 0.300}")

# 'not' -- flips True to False (and vice versa)
is_pitcher = False
print(f"Is a position player:  {not is_pitcher}")

### A Note on Comments

Lines starting with `#` are **comments** — Python ignores them. Use comments to explain *why* your code does something, not *what* it does (the code itself should be clear enough for that).

```python
# Good comment -- explains the why
avg = hits / at_bats  # At-bats, not plate appearances: walks and sacrifices don't count as at-bats

# Bad comment -- just restates the code
avg = hits / at_bats  # Divide hits by at bats
```

## 3. Data Structures <a id="3-data-structures"></a>

Data structures let you store **collections** of values. This section covers Python's four core built-in collection types, each with different strengths.

### Lists

A **list** is an ordered, changeable collection. It's the most commonly used data structure in Python. Create one with square brackets `[]`.
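Because lists are changeable, assigning a list to a second name does not copy it: both names point at the same list. Use `.copy()` when you want an independent list. A short sketch with an illustrative lineup:

```python
lineup = ["Judge", "Soto", "Stanton"]
same_lineup = lineup           # a second name for the SAME list
copied_lineup = lineup.copy()  # a new, independent list

lineup.append("Rizzo")

print(same_lineup)    # ['Judge', 'Soto', 'Stanton', 'Rizzo'] (sees the change)
print(copied_lineup)  # ['Judge', 'Soto', 'Stanton'] (unaffected)
```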

In [None]:
# Creating a list
al_east = ["Yankees", "Red Sox", "Blue Jays", "Orioles", "Rays"]
print(al_east)

# Lists can hold any type (even mixed types)
ohtani_stats = ["Shohei Ohtani", 54, 0.310, True]
print(ohtani_stats)

In [None]:
# Indexing -- access items by position (starts at 0, not 1!)
print(f"First team:  {al_east[0]}")   # Yankees
print(f"Third team:  {al_east[2]}")   # Blue Jays
print(f"Last team:   {al_east[-1]}")  # Rays (negative index counts from the end)

# Slicing -- get a sub-list with [start:stop]
print(f"First three: {al_east[0:3]}")  # Items at index 0, 1, 2 (stop is exclusive)
print(f"Last two:    {al_east[-2:]}")  # From second-to-last to the end

In [None]:
# Common list operations
al_east.append("New Team")     # Add to the end
print(f"After append:  {al_east}")

al_east.remove("New Team")     # Remove by value
print(f"After remove:  {al_east}")

print(f"Length:        {len(al_east)}")          # Number of items
print(f"Contains Rays? {'Rays' in al_east}")     # Check membership with 'in'

### Dictionaries

A **dictionary** stores data as **key-value pairs** — like a lookup table. Create one with curly braces `{}`. This is extremely useful for structured data like player stats.
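Looking up a missing key with square brackets raises a `KeyError`. The `.get()` method returns `None`, or a default you supply, instead, which is handy when some players are missing a stat. A minimal sketch:

```python
player = {"name": "Aaron Judge", "home_runs": 58}

# .get() returns None, or the default you pass, when the key is missing
print(player.get("stolen_bases"))     # None
print(player.get("stolen_bases", 0))  # 0

# Square brackets raise KeyError for a missing key
try:
    player["stolen_bases"]
except KeyError:
    print("No 'stolen_bases' key")
```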

In [None]:
# Creating a dictionary
player = {
    "name": "Aaron Judge",
    "team": "Yankees",
    "position": "RF",
    "home_runs": 58,
    "batting_avg": 0.322
}

# Access values by key
print(f"Name: {player['name']}")
print(f"Home runs: {player['home_runs']}")

In [None]:
# Adding/updating entries
player["rbi"] = 131                # Add a new key
player["home_runs"] = 62           # Update an existing key

# Useful dictionary methods
print(f"Keys:   {list(player.keys())}")
print(f"Values: {list(player.values())}")

# Loop through key-value pairs with .items()
for key, value in player.items():
    print(f"  {key}: {value}")

### Tuples and Sets

**Tuples** are like lists, but **immutable** (can't be changed after creation). Use parentheses `()`. They're great for data that shouldn't change, like coordinates or fixed records.

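**Sets** store only **unique values** with no order. Use curly braces `{}` with plain values instead of key-value pairs. An empty set must be written `set()`, because `{}` on its own creates an empty dictionary.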
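A sketch of what immutability looks like in practice, plus the union and difference operators that sit alongside intersection (`&`). The player sets are illustrative:

```python
game_score = ("Yankees", 5, "Red Sox", 3)

# Tuples are immutable: item assignment raises a TypeError
try:
    game_score[1] = 6
except TypeError as err:
    print(f"TypeError: {err}")

# Illustrative sets of players
yankees_hitters = {"Judge", "Soto", "Stanton"}
all_stars = {"Judge", "Soto", "Ohtani"}

print(yankees_hitters | all_stars)  # Union: in either set
print(yankees_hitters - all_stars)  # Difference: {'Stanton'}
print(yankees_hitters & all_stars)  # Intersection: in both sets
```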

In [None]:
# Tuples -- immutable (can't change after creation)
game_score = ("Yankees", 5, "Red Sox", 3)
print(f"Game: {game_score}")

# Tuple unpacking -- assign multiple variables at once
home_team, home_score, away_team, away_score = game_score
print(f"{home_team} beat {away_team} {home_score}-{away_score}")

# Sets -- unique values only
positions_played = {"OF", "DH", "OF", "1B", "DH"}  # Duplicates are removed
print(f"Unique positions: {positions_played}")

# Set operations
nl_teams = {"Dodgers", "Mets", "Braves", "Cubs"}
recent_ws_winners = {"Dodgers", "Braves", "Astros"}  # 2020-2022 champions
print(f"NL teams among recent WS winners: {nl_teams & recent_ws_winners}")  # Intersection

### When to Use What

| Structure | Syntax | Ordered? | Changeable? | Duplicates? | Best For |
|-----------|--------|----------|-------------|-------------|----------|
| **List** | `[a, b, c]` | Yes | Yes | Yes | Most general-purpose collections |
| **Dictionary** | `{k: v}` | Yes* | Yes | Keys: No | Structured/labeled data |
| **Tuple** | `(a, b, c)` | Yes | No | Yes | Fixed records, function returns |
| **Set** | `{a, b, c}` | No | Yes | No | Unique values, membership testing |

*Dicts maintain insertion order in Python 3.7+

## 4. Control Flow <a id="4-control-flow"></a>

Control flow lets your code make decisions and repeat actions. The two key concepts are **conditionals** (if/else) and **loops** (for/while).

### If / Elif / Else

Use `if` to run code only when a condition is True. Add `elif` (else if) for additional conditions, and `else` as a catch-all.
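Python checks the conditions from top to bottom and runs only the first branch whose condition is True; the rest are skipped. That's why the order of thresholds matters. A sketch with an illustrative ERA, written in both orders:

```python
era = 2.10

# Correct order: strictest condition first
if era < 2.50:
    label = "Elite"
elif era < 3.50:
    label = "Above average"
else:
    label = "Below average"
print(label)  # Elite

# Reversed order: 2.10 < 3.50 is also True, so this branch wins first
if era < 3.50:
    wrong_label = "Above average"
elif era < 2.50:
    wrong_label = "Elite"  # never reached for any ERA below 3.50
else:
    wrong_label = "Below average"
print(wrong_label)  # Above average
```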

**Important**: Python uses **indentation** to define code blocks, instead of the curly braces used by some other languages. Four spaces per level is the standard convention.

In [None]:
# Simple if/elif/else
era = 2.80

if era < 2.50:
    print("Elite pitcher -- Cy Young contender!")
elif era < 3.50:
    print("Above average pitcher.")
elif era < 4.50:
    print("Average pitcher.")
else:
    print("Below average -- needs improvement.")

In [None]:
# Combining conditions
hr = 45
sb = 35

if hr >= 40 and sb >= 40:
    print("40-40 club! Incredibly rare.")
elif hr >= 30 and sb >= 30:
    print("30-30 club -- impressive power and speed!")
else:
    print("Still a great player, just not in the 30-30 club.")

### For Loops

A `for` loop repeats code once for each item in a collection. This is how you process lists of players, stats, or games.
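Two built-in helpers come up constantly in `for` loops: `enumerate()` gives you a counter alongside each item, and `zip()` walks through two lists in parallel. A small sketch (the home run numbers are made up):

```python
lineup = ["Ohtani", "Betts", "Freeman"]
home_runs = [30, 20, 25]  # illustrative numbers

# enumerate() yields (index, item) pairs; start=1 makes the count begin at 1
for spot, name in enumerate(lineup, start=1):
    print(f"Batting {spot}: {name}")

# zip() pairs up items from each list by position
for name, hr in zip(lineup, home_runs):
    print(f"{name}: {hr} HR")
```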

In [None]:
# Loop through a list
teams = ["Yankees", "Dodgers", "Astros", "Braves"]

for team in teams:
    print(f"Let's go {team}!")

In [None]:
# range() generates a sequence of numbers -- great for counting
# range(start, stop) -- stop is exclusive
for inning in range(1, 10):
    print(f"Inning {inning}")

print("\nThat's the ballgame!")

In [None]:
# Looping through a dictionary
player = {"name": "Mookie Betts", "HR": 39, "SB": 14, "AVG": 0.307}

for stat, value in player.items():
    print(f"  {stat}: {value}")

### While Loops

A `while` loop repeats as long as a condition is True. Use `break` to exit early and `continue` to skip to the next iteration.
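Here is a sketch that uses both `continue` and `break` while walking through a hypothetical sequence of pitches. Balls are skipped, a foul with two strikes doesn't add a strike, and the loop stops as soon as the batter strikes out:

```python
# A hypothetical sequence of pitches in one plate appearance
pitches = ["ball", "strike", "foul", "ball", "foul", "strike"]
strikes = 0
i = 0

while i < len(pitches):
    pitch = pitches[i]
    i += 1
    if pitch == "ball":
        continue  # balls don't change the strike count; go to the next pitch
    if pitch == "foul" and strikes == 2:
        continue  # a foul with two strikes doesn't add a third strike
    strikes += 1
    print(f"Strike {strikes} ({pitch})")
    if strikes == 3:
        print("Strikeout!")
        break  # stop processing pitches once the batter is out
```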

In [None]:
# While loop -- count strikes
strikes = 0

while strikes < 3:
    strikes += 1  # Same as: strikes = strikes + 1
    print(f"Strike {strikes}!")

print("You're out!")

### List Comprehensions

List comprehensions are a concise way to create lists from loops. They're a Python favorite — you'll see them everywhere in data science code.

The syntax is: `[expression for item in iterable if condition]`, where the `if condition` part is optional.
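The expression and the condition can be used together in one comprehension. In this sketch, `f"{hr} HR"` is the expression, `hr` the item, `hr_totals` the iterable, and `hr >= 30` the condition:

```python
hr_totals = [12, 45, 8, 33, 51, 22, 40]

# Filter (hr >= 30) and transform (format as a label) in one step
labels = [f"{hr} HR" for hr in hr_totals if hr >= 30]
print(labels)  # ['45 HR', '33 HR', '51 HR', '40 HR']
```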

In [None]:
# Traditional loop approach
hr_totals = [12, 45, 8, 33, 51, 22, 40]
big_hr_hitters = []

for hr in hr_totals:
    if hr >= 30:
        big_hr_hitters.append(hr)

print(f"Traditional: {big_hr_hitters}")

# Same thing as a list comprehension -- one line!
big_hr_hitters = [hr for hr in hr_totals if hr >= 30]
print(f"Comprehension: {big_hr_hitters}")

# Transform values: convert batting averages to percentages
averages = [0.301, 0.275, 0.322, 0.198, 0.288]
percentages = [f"{avg * 100:.1f}%" for avg in averages]
print(f"As percentages: {percentages}")

## 5. Functions <a id="5-functions"></a>

A **function** is a reusable block of code that performs a specific task. You've already used built-in functions like `print()`, `len()`, and `type()`. Now let's write our own.

Functions help you:
- Avoid repeating code
- Organize logic into named, testable pieces
- Make your code easier to read
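To make the "avoid repeating code" point concrete, here is a sketch of a hypothetical helper, `safe_batting_avg`, that handles the zero at-bat edge case in one place so every caller benefits:

```python
def safe_batting_avg(hits, at_bats):
    """Return hits / at_bats, or 0.0 when there are no at-bats (avoids ZeroDivisionError)."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats

# The same check now protects every call
print(f"{safe_batting_avg(180, 550):.3f}")  # 0.327
print(f"{safe_batting_avg(0, 0):.3f}")      # 0.000
```

Returning `0.0` for a player with no at-bats is a design choice; returning `None` instead would let callers tell "no data" apart from "hitless".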

In [None]:
# Define a function with 'def', give it a name and parameters
def calculate_batting_avg(hits, at_bats):
    """Calculate batting average from hits and at-bats."""
    return hits / at_bats

# Call the function
avg = calculate_batting_avg(180, 550)
print(f"Batting average: {avg:.3f}")

In [None]:
# Default parameters -- provide a fallback value
def classify_pitcher(era, threshold=3.50):
    """Classify a pitcher as 'Ace', 'Solid', or 'Struggling' based on ERA."""
    if era < 2.50:
        return "Ace"
    elif era < threshold:
        return "Solid"
    else:
        return "Struggling"

# Using the default threshold
print(classify_pitcher(2.15))  # Ace
print(classify_pitcher(3.20))  # Solid
print(classify_pitcher(4.80))  # Struggling

# Overriding the default with a keyword argument
print(classify_pitcher(3.20, threshold=3.00))  # Now classified as Struggling

In [None]:
# Functions can return multiple values (as a tuple)
def slugging_and_obp(hits, doubles, triples, home_runs, at_bats, walks):
    """Calculate SLG and a simplified OBP (ignores hit-by-pitch and sacrifice flies)."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + (2 * doubles) + (3 * triples) + (4 * home_runs)
    slg = total_bases / at_bats
    obp = (hits + walks) / (at_bats + walks)  # Full formula: (H + BB + HBP) / (AB + BB + HBP + SF)
    return slg, obp  # Returns both values

# Unpack the results
slg, obp = slugging_and_obp(
    hits=180, doubles=35, triples=3, home_runs=40,
    at_bats=550, walks=80
)
print(f"SLG: {slg:.3f}")
print(f"OBP: {obp:.3f}")
print(f"OPS: {slg + obp:.3f}")

### Variable Scope

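Variables created inside a function are **local**: they only exist within that function. Variables created outside functions are **global**: any function can read them, but assigning to a global name inside a function requires the `global` statement (otherwise Python treats the name as local to that function).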
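A sketch of the read-versus-assign rule, using a hypothetical `season_hr` counter. Assigning inside the function makes the name local, so reading it first fails; passing the value in and returning the new value avoids the problem (the `global` statement also works, but returning values is usually easier to follow):

```python
season_hr = 0  # global

def add_homer_broken():
    # Assigning to season_hr makes it local to this function,
    # so reading it on the right-hand side raises UnboundLocalError
    season_hr += 1

try:
    add_homer_broken()
except UnboundLocalError as err:
    print(f"UnboundLocalError: {err}")

# Usually clearer: pass the value in and return the updated value
def add_homer(total):
    return total + 1

season_hr = add_homer(season_hr)
print(season_hr)  # 1
```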

In [None]:
# Scope example
league = "MLB"  # Global variable

def get_league_info():
    division = "AL East"  # Local variable -- only exists inside this function
    return f"{division} in {league}"  # Can access global 'league'

print(get_league_info())
print(f"Global league: {league}")
# print(division)  # This would cause an error -- 'division' doesn't exist here

## 6. Putting It All Together <a id="6-putting-it-all-together"></a>

Let's combine everything we've learned into a mini-project: a program that stores player stats, processes them, and prints a formatted report.

In [None]:
# Store player data as a list of dictionaries
roster = [
    {"name": "Aaron Judge",    "position": "RF",  "HR": 58, "AVG": 0.322, "RBI": 131},
    {"name": "Juan Soto",      "position": "LF",  "HR": 41, "AVG": 0.288, "RBI": 109},
    {"name": "Shohei Ohtani",  "position": "DH",  "HR": 54, "AVG": 0.310, "RBI": 130},
    {"name": "Mookie Betts",   "position": "SS",  "HR": 39, "AVG": 0.307, "RBI": 98},
    {"name": "Freddie Freeman","position": "1B",  "HR": 22, "AVG": 0.282, "RBI": 89},
]

# Define helper functions
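def calculate_ops_estimate(avg, hr, at_bats=550):
    """Very rough OPS estimate from AVG and HR (illustration only).

    Assumes `at_bats` at-bats, counts only the extra bases from home runs
    toward SLG, and approximates OBP as AVG + .050.
    """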
    slg_estimate = avg + (hr * 3 / at_bats)
    obp_estimate = avg + 0.05
    return slg_estimate + obp_estimate

def player_tier(hr, avg):
    """Classify player into a tier."""
    if hr >= 50 or avg >= 0.320:
        return "MVP Caliber"
    elif hr >= 35 or avg >= 0.300:
        return "All-Star"
    else:
        return "Starter"

In [None]:
# Process and display results using everything we've learned
print("=" * 60)
print("  PLAYER REPORT")
print("=" * 60)

for p in roster:
    tier = player_tier(p["HR"], p["AVG"])
    ops_est = calculate_ops_estimate(p["AVG"], p["HR"])

    print(f"\n  {p['name']} ({p['position']})")
    print(f"    HR: {p['HR']}  |  AVG: {p['AVG']:.3f}  |  RBI: {p['RBI']}")
    print(f"    OPS (est): {ops_est:.3f}  |  Tier: {tier}")

# Summary stats using list comprehensions
total_hr = sum([p["HR"] for p in roster])
mvp_candidates = [p["name"] for p in roster if player_tier(p["HR"], p["AVG"]) == "MVP Caliber"]

print(f"\n{'=' * 60}")
print(f"  Total HR: {total_hr}")
print(f"  MVP Candidates: {', '.join(mvp_candidates)}")

### Key Takeaways

| Concept | What You Learned |
|---------|-----------------|
| **Variables** | Store values with `=`. Python infers the type. |
| **Data Types** | `str`, `int`, `float`, `bool` — check with `type()` |
| **Operators** | Arithmetic (`+`, `-`, `*`, `/`, `//`, `%`, `**`), comparison (`==`, `!=`, `>`, `<`, `>=`, `<=`), logical (`and`, `or`, `not`) |
| **Lists** | Ordered, mutable collections: `[a, b, c]` |
| **Dictionaries** | Key-value pairs: `{"name": "Judge", "HR": 62}` |
| **Tuples** | Immutable sequences: `(a, b, c)` |
| **Sets** | Unique values: `{a, b, c}` |
| **If/Elif/Else** | Make decisions based on conditions |
| **For Loops** | Iterate over collections |
| **While Loops** | Repeat while a condition is True; `break` exits, `continue` skips ahead |
| **List Comprehensions** | Concise list creation: `[x for x in iterable if cond]` |
| **Functions** | Reusable code blocks: `def name(params): return result` |

---

**Next up: [Data Manipulation with Pandas & NumPy](02_data_manipulation.ipynb)** — where we'll use these Python fundamentals to work with real tabular data.