# Module 1: Python Fundamentals

## Topics Covered
1. Introduction to Python & Jupyter Notebooks
2. Variables and Data Types (int, float, str, bool)
3. Basic Operators (arithmetic, comparison, logical)
4. Strings and String Methods
5. Getting User Input and Type Conversion
6. Comments and Code Documentation

## Learning Objectives

By the end of this module, you will be able to:
- Understand what Python is and why it's popular in data science
- Navigate and use Jupyter Notebooks effectively
- Create and use variables to store different types of data
- Perform calculations and comparisons using operators
- Manipulate text data using string methods
- Get input from users and convert between data types
- Write clean, well-documented code with comments

---

---
# Section 1: Introduction to Python & Jupyter Notebooks
---

## What is Python?

Python is a high-level, general-purpose programming language created by Guido van Rossum and first released in 1991. It's designed to be easy to read and write, making it an excellent choice for beginners.

Think of Python as the English of programming languages – it's relatively easy to understand, widely used, and gets the job done without unnecessary complexity.

### Why Python for Data Science?

Python has become **one of the most widely used languages for data science** for several reasons:

1. **Easy to learn** - Clean, readable syntax that looks almost like plain English
2. **Powerful libraries** - NumPy, Pandas, Matplotlib, Scikit-learn, and more
3. **Large community** - Millions of developers, endless tutorials and resources
4. **Industry adoption** - Used by Google, Netflix, Meta, banks, and research institutions
5. **Versatility** - From web development to machine learning to automation

### Why This Matters in Data Science

As a data analyst or data scientist, you'll spend most of your time:
- Cleaning and transforming data
- Performing statistical analysis
- Creating visualizations
- Building predictive models

Python makes all of these tasks efficient and enjoyable. The skills you learn here will directly apply to your data career.

## What is a Jupyter Notebook?

A Jupyter Notebook is an interactive document that combines:
- **Code cells** - Where you write and execute Python code
- **Markdown cells** - Where you write formatted text, explanations, and notes
- **Output** - Results appear directly below each code cell

Think of it like a lab notebook for data scientists – you can mix your code, results, and explanations all in one place.

### Why Jupyter for Data Science?

- **Interactive exploration** - Run code in chunks and see results immediately
- **Documentation** - Explain your analysis alongside your code
- **Visualization** - Charts and graphs display inline
- **Sharing** - Easy to share your work with colleagues
- **Industry standard** - Used in companies and research worldwide

## Your First Python Code

Let's write your very first Python program! The traditional first program in any language prints "Hello, World!" to the screen.

In [None]:
# Example: Your first Python program
# The print() function displays output to the screen

print("Hello, World!")

In [None]:
# Example: Print a message relevant to data science

print("Welcome to Python for Data Science!")
print("Let's analyze some data!")

## Syntax

```python
print(value)
```

**Parameters:**
- `value`: The text, number, or variable you want to display. You can pass several values separated by commas.

**Returns:** None (just displays output)
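
The syntax above can be extended a little. The sketch below assumes nothing beyond the built-in `print()` function and shows how it handles several values, plus its optional `sep` and `end` arguments. The variable `num_rows` and its value are made up for illustration.

```python
# print() accepts several values separated by commas.
# By default it puts one space between them and ends with a newline.
num_rows = 500  # illustrative value, not from a real dataset
print("Rows loaded:", num_rows)

# sep changes the text placed between the values
print("2024", "01", "15", sep="-")

# end changes what is printed after the last value (default is a newline)
print("Loading", end="... ")
print("done")
```

Passing values with commas also saves you from converting numbers to strings yourself.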

## Practice Exercise 1.1

**Task:** Write a program that prints your name and your goal for learning Python.

**Expected Output:**
```
My name is [Your Name]
I want to learn Python to become a data analyst
```

In [None]:
# Your code here


In [None]:
# Solution 1.1

print("My name is Alex")
print("I want to learn Python to become a data analyst")

## Practice Exercise 1.2

**Task:** Print three separate lines showing:
1. The current year
2. Your favorite data-related topic (e.g., "Machine Learning", "Data Visualization")
3. A motivational message for your learning journey

**Expected Output:**
```
2024
I love Data Visualization
I will master Python!
```

In [None]:
# Your code here


In [None]:
# Solution 1.2

print("2024")
print("I love Data Visualization")
print("I will master Python!")

---
# Section 2: Variables and Data Types
---

## What are Variables?

A **variable** is a container that stores a value. Think of it like a labeled box – you give the box a name, and you can put something inside it.

For example, if you're analyzing sales data:
- `total_sales` might store the number `150000`
- `product_name` might store the text `"Laptop"`
- `is_profitable` might store `True` or `False`

### Why This Matters in Data Science

Variables are the foundation of all data analysis. You'll use them to:
- Store data values you're analyzing
- Keep track of calculations and results
- Make your code readable and reusable
- Pass data between different parts of your analysis

## Syntax

```python
variable_name = value
```

**Rules for Variable Names:**
- Must start with a letter or underscore (`_`)
- Can contain letters, numbers, and underscores
- Cannot contain spaces (use underscores instead)
- Case-sensitive (`age` and `Age` are different)
- Cannot use Python reserved words (`if`, `for`, `while`, etc.)
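
If you are unsure whether a name follows these rules, Python can check it for you. This is a small sketch using the string method `isidentifier()` and the standard-library `keyword` module. The candidate names are invented for illustration.

```python
import keyword

# isidentifier() checks the character rules (letters, digits, underscores)
print("total_sales".isidentifier())   # True
print("_temp".isidentifier())         # True
print("2nd_quarter".isidentifier())   # False: starts with a digit
print("total sales".isidentifier())   # False: contains a space

# Reserved words pass the character rules but still cannot be used as names
print("for".isidentifier())           # True
print(keyword.iskeyword("for"))       # True, so "for" is not allowed
```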

In [None]:
# Example: Creating variables

# Storing a customer's information
customer_name = "Sarah Johnson"
customer_age = 28
account_balance = 15750.50
is_premium_member = True

# Print the variables
print(customer_name)
print(customer_age)
print(account_balance)
print(is_premium_member)

## Python Data Types

Python has several built-in data types. The four most common ones are:

| Type | Name | Example | Use Case |
|------|------|---------|----------|
| `int` | Integer | `42`, `-7`, `0` | Counting, whole numbers |
| `float` | Floating-point | `3.14`, `-0.5`, `2.0` | Decimals, measurements |
| `str` | String | `"hello"`, `'data'` | Text, names, categories |
| `bool` | Boolean | `True`, `False` | Yes/no, on/off, conditions |
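
One property of `float` is worth seeing early: decimals are stored in binary, so some results are very slightly off. The sketch below shows the classic case and two common ways to deal with it, `round()` and `math.isclose()`. The variable name `total` is only for illustration.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: exact comparison fails

# Compare floats with a tolerance instead of ==
print(math.isclose(total, 0.3))   # True

# Or round to the precision you care about before displaying
print(round(total, 2))            # 0.3
```

This matters whenever you compare calculated amounts, such as totals or percentages.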

In [None]:
# Example: Different data types in a data context

# Integer - whole numbers (counts, IDs)
num_employees = 250
customer_id = 10042

# Float - decimal numbers (measurements, money, percentages)
average_salary = 65432.50
growth_rate = 0.15  # 15% growth

# String - text data (names, categories, descriptions)
department = "Data Science"
status = "Active"

# Boolean - True/False values (flags, conditions)
is_full_time = True
has_bonus = False

print("Number of employees:", num_employees)
print("Average salary:", average_salary)
print("Department:", department)
print("Full-time:", is_full_time)

## Checking Data Types with `type()`

You can check what type of data a variable holds using the `type()` function. This is especially useful when working with data from external sources.

In [None]:
# Example: Checking data types

sales_count = 1500
revenue = 45000.75
product = "Wireless Mouse"
in_stock = True

print(type(sales_count))
print(type(revenue))
print(type(product))
print(type(in_stock))
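
When you want a yes/no answer rather than a printed type, the built-in `isinstance()` is a common companion to `type()`. A brief sketch, reusing two of the sample values from the cell above:

```python
revenue = 45000.75
in_stock = True

# isinstance(value, type) returns True or False
print(isinstance(revenue, float))         # True

# A tuple of types means "any of these"
print(isinstance(revenue, (int, float)))  # True

# bool is a special kind of int in Python
print(isinstance(in_stock, int))          # True
```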

## Variable Naming Best Practices

Good variable names make your code **self-documenting**. Someone reading your code should understand what each variable represents.

In [None]:
# Example: Good vs. Bad variable names

# ❌ Bad - unclear what these mean
x = 50000
y = 12
z = x / y

# Good - self-explanatory
annual_revenue = 50000
num_months = 12
monthly_revenue = annual_revenue / num_months

print(f"Monthly revenue: ${monthly_revenue:.2f}")

## Practice Exercise 2.1

**Task:** Create variables to store information about a product in an online store:
- Product name (string)
- Price (float)
- Quantity in stock (integer)
- Is the product on sale? (boolean)

Print each variable and its type.

**Expected Output:**
```
Product: Mechanical Keyboard
Type: <class 'str'>
Price: 89.99
Type: <class 'float'>
Stock: 150
Type: <class 'int'>
On Sale: True
Type: <class 'bool'>
```

In [None]:
# Your code here


In [None]:
# Solution 2.1

product_name = "Mechanical Keyboard"
price = 89.99
quantity_in_stock = 150
is_on_sale = True

print("Product:", product_name)
print("Type:", type(product_name))
print("Price:", price)
print("Type:", type(price))
print("Stock:", quantity_in_stock)
print("Type:", type(quantity_in_stock))
print("On Sale:", is_on_sale)
print("Type:", type(is_on_sale))

## Practice Exercise 2.2

**Task:** You're analyzing employee data. Create variables for:
- Employee ID: 10052
- Name: "Maria Garcia"
- Hourly rate: $35.50
- Hours worked this week: 42
- Is manager: False

Calculate and print the weekly pay (hourly rate × hours worked).

**Expected Output:**
```
Employee: Maria Garcia (ID: 10052)
Weekly Pay: $1491.00
```

In [None]:
# Your code here


In [None]:
# Solution 2.2

employee_id = 10052
name = "Maria Garcia"
hourly_rate = 35.50
hours_worked = 42
is_manager = False

weekly_pay = hourly_rate * hours_worked

print(f"Employee: {name} (ID: {employee_id})")
print(f"Weekly Pay: ${weekly_pay:.2f}")

## Practice Exercise 2.3

**Task:** Create variables containing information about your favorite dataset or data topic:
- Topic name (string)
- Number of rows in a dataset you'd like to analyze (integer)
- Estimated time in hours to analyze it (float)
- Have you worked with this data before? (boolean)

Print a summary of your data project.

**Expected Output:**
```
Data Project Summary
-------------------
Topic: Customer Churn Analysis
Dataset Size: 10000 rows
Estimated Time: 4.5 hours
Previous Experience: False
```

In [None]:
# Your code here


In [None]:
# Solution 2.3

topic_name = "Customer Churn Analysis"
num_rows = 10000
estimated_hours = 4.5
previous_experience = False

print("Data Project Summary")
print("-------------------")
print(f"Topic: {topic_name}")
print(f"Dataset Size: {num_rows} rows")
print(f"Estimated Time: {estimated_hours} hours")
print(f"Previous Experience: {previous_experience}")

---
# Section 3: Basic Operators
---

## What are Operators?

Operators are symbols that perform operations on values and variables. Think of them as the verbs of programming – they tell Python what action to take.

### Why This Matters in Data Science

Operators are essential for:
- Calculating metrics (sum, average, percentage change)
- Filtering data based on conditions
- Combining multiple conditions for complex queries
- Creating new features from existing data

## Arithmetic Operators

Used for mathematical calculations.

| Operator | Name | Example | Result |
|----------|------|---------|--------|
| `+` | Addition | `5 + 3` | `8` |
| `-` | Subtraction | `5 - 3` | `2` |
| `*` | Multiplication | `5 * 3` | `15` |
| `/` | Division | `5 / 3` | `1.666...` |
| `//` | Floor Division | `5 // 3` | `1` |
| `%` | Modulus (remainder) | `5 % 3` | `2` |
| `**` | Exponentiation | `5 ** 3` | `125` |

In [None]:
# Example: Arithmetic operators in a sales context

# Monthly sales data
january_sales = 15000
february_sales = 18000

# Addition - total sales
total_sales = january_sales + february_sales
print(f"Total Sales: ${total_sales}")

# Subtraction - difference
sales_increase = february_sales - january_sales
print(f"Sales Increase: ${sales_increase}")

# Multiplication - projected annual (if Feb rate continues)
projected_annual = february_sales * 12
print(f"Projected Annual: ${projected_annual}")

# Division - average per month
average_monthly = total_sales / 2
print(f"Average Monthly: ${average_monthly}")
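
Percentage change, one of the metrics mentioned earlier, combines several of these operators. The sketch below reuses the January and February figures and shows why the parentheses matter. The names `pct_change` and `without_parentheses` are illustrative.

```python
january_sales = 15000
february_sales = 18000

# Parentheses force the subtraction to happen before the division
pct_change = (february_sales - january_sales) / january_sales * 100
print(f"Month-over-month change: {pct_change:.1f}%")  # 20.0%

# Without parentheses, / and * run before -, giving a different number
without_parentheses = february_sales - january_sales / january_sales * 100
print(without_parentheses)  # 17900.0
```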

In [None]:
# Example: Floor division and modulus

total_items = 100
items_per_box = 12

# Floor division - how many full boxes can we fill?
full_boxes = total_items // items_per_box
print(f"Full boxes: {full_boxes}")

# Modulus - how many items left over?
leftover_items = total_items % items_per_box
print(f"Leftover items: {leftover_items}")
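
The built-in `divmod()` returns both results at once, which can be handy for the box-packing calculation above. The sketch also shows how `//` and `%` behave with negative numbers, which often surprises people.

```python
total_items = 100
items_per_box = 12

# divmod(a, b) returns (a // b, a % b) as a pair
full_boxes, leftover_items = divmod(total_items, items_per_box)
print(full_boxes, leftover_items)  # 8 4

# Floor division rounds down (toward negative infinity), not toward zero
print(-7 // 2)  # -4
print(-7 % 2)   # 1
```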

In [None]:
# Example: Exponentiation - compound growth

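initial_investment = 1000
annual_return = 1.07  # Growth factor for a 7% return: 1 + 0.07
years = 10

# Compound growth formula: P * (1 + r)^t
# annual_return already holds (1 + r), so it is raised to the power directly
future_value = initial_investment * (annual_return ** years)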
print(f"Initial Investment: ${initial_investment}")
print(f"After {years} years: ${future_value:.2f}")

## Comparison Operators

Used to compare values. They always return `True` or `False`.

| Operator | Name | Example | Result |
|----------|------|---------|--------|
| `==` | Equal to | `5 == 5` | `True` |
| `!=` | Not equal to | `5 != 3` | `True` |
| `>` | Greater than | `5 > 3` | `True` |
| `<` | Less than | `5 < 3` | `False` |
| `>=` | Greater than or equal | `5 >= 5` | `True` |
| `<=` | Less than or equal | `5 <= 3` | `False` |

In [None]:
# Example: Comparison operators in data analysis

sales_target = 50000
actual_sales = 52000
previous_sales = 48000

# Did we meet the target?
target_met = actual_sales >= sales_target
print(f"Target met: {target_met}")

# Did sales improve from last period?
sales_improved = actual_sales > previous_sales
print(f"Sales improved: {sales_improved}")

# Exactly on target?
exactly_on_target = actual_sales == sales_target
print(f"Exactly on target: {exactly_on_target}")

# Below target?
below_target = actual_sales < sales_target
print(f"Below target: {below_target}")

## Logical Operators

Used to combine multiple conditions.

| Operator | Description | Example |
|----------|-------------|--------|
| `and` | True if BOTH conditions are true | `x > 0 and x < 10` |
| `or` | True if AT LEAST ONE condition is true | `x < 0 or x > 100` |
| `not` | Reverses the result | `not (x > 5)` |
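
To see the table in action, here is a minimal sketch that prints the result of each operator on plain `True`/`False` values. The last line illustrates Python's evaluation order, `not` first, then `and`, then `or`.

```python
print(True and True)    # True
print(True and False)   # False
print(False or True)    # True
print(False or False)   # False
print(not True)         # False

# and is evaluated before or, so this reads as: True or (False and False)
print(True or False and False)    # True
print((True or False) and False)  # False: parentheses change the result
```

When a condition mixes `and` with `or`, adding parentheses makes the intent clear to readers.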

In [None]:
# Example: Logical operators for customer segmentation

customer_age = 35
annual_income = 75000
has_premium_account = True

# Check if customer qualifies for special offer
# Must be: age 25-45 AND income > 50000
qualifies_for_offer = (customer_age >= 25 and customer_age <= 45) and (annual_income > 50000)
print(f"Qualifies for special offer: {qualifies_for_offer}")

# Check if customer gets priority support
# Either: premium account OR income > 100000
gets_priority_support = has_premium_account or annual_income > 100000
print(f"Gets priority support: {gets_priority_support}")

# Check if customer is NOT premium
needs_upgrade_prompt = not has_premium_account
print(f"Show upgrade prompt: {needs_upgrade_prompt}")

In [None]:
# Example: Combining multiple conditions

temperature = 72
humidity = 45
is_weekend = True

# Perfect weather for outdoor event:
# Temperature between 65-80 AND humidity below 60 AND it's a weekend
perfect_conditions = (temperature >= 65 and temperature <= 80) and (humidity < 60) and is_weekend

print(f"Temperature: {temperature}°F")
print(f"Humidity: {humidity}%")
print(f"Weekend: {is_weekend}")
print(f"Perfect for outdoor event: {perfect_conditions}")
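
Range checks like the ones above can also be written as a chained comparison, which Python supports directly. A short sketch showing that both forms give the same answer:

```python
temperature = 72

# Explicit form, as used in the examples above
in_range_explicit = temperature >= 65 and temperature <= 80

# Chained form: reads like the math notation 65 <= t <= 80
in_range_chained = 65 <= temperature <= 80

print(in_range_explicit, in_range_chained)  # True True
```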

## Practice Exercise 3.1

**Task:** You're calculating metrics for a retail store:
- Total revenue: $125,000
- Total costs: $87,500
- Number of transactions: 2,500

Calculate and print:
1. Profit (revenue - costs)
2. Profit margin percentage (profit / revenue * 100)
3. Average transaction value (revenue / transactions)

**Expected Output:**
```
Profit: $37500
Profit Margin: 30.0%
Average Transaction: $50.0
```

In [None]:
# Your code here


In [None]:
# Solution 3.1

total_revenue = 125000
total_costs = 87500
num_transactions = 2500

profit = total_revenue - total_costs
profit_margin = (profit / total_revenue) * 100
avg_transaction = total_revenue / num_transactions

print(f"Profit: ${profit}")
print(f"Profit Margin: {profit_margin}%")
print(f"Average Transaction: ${avg_transaction}")

## Practice Exercise 3.2

**Task:** A company has 247 employees and wants to form teams of 8. Calculate:
1. How many complete teams can be formed?
2. How many employees will be left without a full team?

**Hint:** Use floor division (`//`) and modulus (`%`).

**Expected Output:**
```
Complete teams: 30
Remaining employees: 7
```

In [None]:
# Your code here


In [None]:
# Solution 3.2

total_employees = 247
team_size = 8

complete_teams = total_employees // team_size
remaining_employees = total_employees % team_size

print(f"Complete teams: {complete_teams}")
print(f"Remaining employees: {remaining_employees}")

## Practice Exercise 3.3

**Task:** You're filtering customers for a marketing campaign. A customer qualifies if:
- Age is between 25 and 55 (inclusive), AND
- Either has made a purchase in the last 30 days OR is a loyalty member

Test with this customer:
- Age: 42
- Days since last purchase: 45
- Is loyalty member: True

Does the customer qualify?

**Expected Output:**
```
Customer Age: 42
Days Since Purchase: 45
Loyalty Member: True
Qualifies for Campaign: True
```

In [None]:
# Your code here


In [None]:
# Solution 3.3

customer_age = 42
days_since_purchase = 45
is_loyalty_member = True

# Check age requirement
age_qualifies = customer_age >= 25 and customer_age <= 55

# Check purchase or loyalty requirement
recent_purchase = days_since_purchase <= 30
activity_qualifies = recent_purchase or is_loyalty_member

# Both conditions must be met
qualifies_for_campaign = age_qualifies and activity_qualifies

print(f"Customer Age: {customer_age}")
print(f"Days Since Purchase: {days_since_purchase}")
print(f"Loyalty Member: {is_loyalty_member}")
print(f"Qualifies for Campaign: {qualifies_for_campaign}")

---
# Section 4: Strings and String Methods
---

## What are Strings?

A **string** is a sequence of characters enclosed in quotes. In Python, you can use single quotes (`'hello'`) or double quotes (`"hello"`) – both work the same way.

### Why This Matters in Data Science

Text data is everywhere:
- Customer names and addresses
- Product descriptions
- Survey responses
- Social media posts
- Log files and error messages

Learning to manipulate strings is essential for data cleaning and text analysis.

## Creating Strings

In [None]:
# Example: Different ways to create strings

# Single quotes
product_name = 'Wireless Mouse'

# Double quotes
customer_name = "John Smith"

# String with quotes inside - use the other type
message = "The customer said 'I love this product!'"
review = 'She wrote: "Excellent quality!"'

# Multi-line strings with triple quotes
description = """This is a detailed product description
that spans multiple lines.
It's perfect for long text."""

print(product_name)
print(customer_name)
print(message)
print(review)
print(description)

## String Concatenation and Repetition

In [None]:
# Example: Combining strings

first_name = "Sarah"
last_name = "Johnson"

# Concatenation with +
full_name = first_name + " " + last_name
print(f"Full name: {full_name}")

# Repetition with *
separator = "-" * 30
print(separator)
print("SALES REPORT")
print(separator)

## String Indexing and Slicing

You can access individual characters or portions of a string using **indexing** and **slicing**.

```
String:   P  y  t  h  o  n
Index:    0  1  2  3  4  5
Negative: -6 -5 -4 -3 -2 -1
```

In [None]:
# Example: String indexing

product_code = "LAPTOP-2024-PRO"

# Access single characters
print(f"First character: {product_code[0]}")
print(f"Last character: {product_code[-1]}")

# Slicing [start:end] - end is exclusive
print(f"First 6 characters: {product_code[0:6]}")
print(f"Last 3 characters: {product_code[-3:]}")

# Extract the year from the code
year = product_code[7:11]
print(f"Year: {year}")
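
A few more slicing patterns are worth knowing. This sketch reuses the same product code and adds `len()`, omitted start/end positions, and a step value.

```python
product_code = "LAPTOP-2024-PRO"

print(len(product_code))    # 15 characters, so valid indexes are 0 to 14

# Leaving out start or end means "from the beginning" or "to the end"
print(product_code[:6])     # LAPTOP
print(product_code[12:])    # PRO

# A third value is the step; -1 walks backwards
print(product_code[::-1])   # ORP-4202-POTPAL

# Slices that run past the end are allowed and simply stop at the end.
# A single index past the end, such as product_code[15], raises IndexError.
print(product_code[7:100])  # 2024-PRO
```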

## Common String Methods

Python provides many built-in methods to work with strings.

In [None]:
# Example: Case methods

customer_input = "  jOhN sMiTh  "

# Convert case
print(f"Upper: {customer_input.upper()}")
print(f"Lower: {customer_input.lower()}")
print(f"Title: {customer_input.title()}")

# Remove whitespace
cleaned = customer_input.strip()
print(f"Stripped: '{cleaned}'")

# Clean and format in one step
formatted_name = customer_input.strip().title()
print(f"Final: {formatted_name}")

In [None]:
# Example: Searching in strings

email = "john.smith@example.com"

# Find position of a substring
at_position = email.find("@")
print(f"@ is at position: {at_position}")

# Check if string contains a substring
has_com = ".com" in email
print(f"Contains .com: {has_com}")

# Check how string starts or ends
print(f"Starts with 'john': {email.startswith('john')}")
print(f"Ends with '.com': {email.endswith('.com')}")

# Count occurrences
sentence = "data science uses data to create data-driven decisions"
data_count = sentence.count("data")
print(f"'data' appears {data_count} times")
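
Searching and slicing work well together. The sketch below uses `find()` to split the same email into a username and domain, and shows that `find()` returns `-1` when nothing is found. The variable names are illustrative.

```python
email = "john.smith@example.com"

at_position = email.find("@")
username = email[:at_position]        # everything before the @
domain = email[at_position + 1:]      # everything after the @
print(username)  # john.smith
print(domain)    # example.com

# find() returns -1 when the substring is missing.
# Check for -1 before slicing, because email[:-1] would silently drop the last character.
missing = email.find("#")
print(missing)   # -1
```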

In [None]:
# Example: Replacing and splitting

# Replace substrings
old_format = "2024/01/15"
new_format = old_format.replace("/", "-")
print(f"New date format: {new_format}")

# Split string into a list
csv_line = "John,Smith,35,New York,Engineer"
fields = csv_line.split(",")
print(f"Fields: {fields}")
print(f"Name: {fields[0]} {fields[1]}")
print(f"Age: {fields[2]}")

# Join list into string
words = ["Python", "for", "Data", "Science"]
title = " ".join(words)
print(f"Title: {title}")
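
All of these methods return a new string and leave the original unchanged, because Python strings are immutable. A quick sketch of what that means in practice, using a made-up `raw_city` value:

```python
raw_city = "  new york "

# Calling a method without saving the result changes nothing
raw_city.strip().title()
print(repr(raw_city))    # '  new york '

# Assign the result to keep the cleaned version
clean_city = raw_city.strip().title()
print(repr(clean_city))  # 'New York'
```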

## String Formatting with f-strings

F-strings (formatted string literals), available since Python 3.6, are the modern way to embed variables and expressions in strings.

In [None]:
# Example: f-strings for formatting

product = "Laptop"
price = 999.99
quantity = 5

# Basic f-string
message = f"Product: {product}, Price: ${price}, Qty: {quantity}"
print(message)

# Calculations inside f-strings
total = f"Total: ${price * quantity:.2f}"
print(total)

# Formatting numbers
large_number = 1234567.89
print(f"With commas: {large_number:,.2f}")
print(f"Percentage: {0.156:.1%}")
print(f"Padded: {42:05d}")  # 5 digits, zero-padded
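
Format specifiers can also set a column width and alignment, which helps when printing simple tables. This is a sketch with illustrative column widths (12, 10, and 8 characters).

```python
product = "Laptop"
price = 999.99
units = 5

# <12 left-aligns in 12 characters; >10 and >8 right-align
header = f"{'Product':<12}{'Price':>10}{'Units':>8}"
row = f"{product:<12}{price:>10.2f}{units:>8d}"

print(header)
print(row)
```

Both lines come out 30 characters wide, so the columns line up.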

## Practice Exercise 4.1

**Task:** You have messy customer data. Clean it up:
- Name: "  JENNIFER WILLIAMS  " (has extra spaces, all caps)
- Email: "JENNIFER.WILLIAMS@EMAIL.COM" (should be lowercase)

Print the cleaned data.

**Expected Output:**
```
Name: Jennifer Williams
Email: jennifer.williams@email.com
```

In [None]:
# Your code here


In [None]:
# Solution 4.1

raw_name = "  JENNIFER WILLIAMS  "
raw_email = "JENNIFER.WILLIAMS@EMAIL.COM"

clean_name = raw_name.strip().title()
clean_email = raw_email.lower()

print(f"Name: {clean_name}")
print(f"Email: {clean_email}")

## Practice Exercise 4.2

**Task:** Extract information from a product code: "ELEC-TV-55IN-2024"
- Category (first 4 characters): ELEC
- Product type: TV
- Size: 55IN
- Year: 2024

**Hint:** Use the `split()` method.

**Expected Output:**
```
Category: ELEC
Product: TV
Size: 55IN
Year: 2024
```

In [None]:
# Your code here


In [None]:
# Solution 4.2

product_code = "ELEC-TV-55IN-2024"

parts = product_code.split("-")

category = parts[0]
product_type = parts[1]
size = parts[2]
year = parts[3]

print(f"Category: {category}")
print(f"Product: {product_type}")
print(f"Size: {size}")
print(f"Year: {year}")

## Practice Exercise 4.3

**Task:** Create a formatted sales report line:
- Product: "Wireless Headphones"
- Units sold: 1250
- Revenue: 87500.00

Format the output with proper number formatting (commas for thousands).

**Expected Output:**
```
================================
SALES REPORT
================================
Product: Wireless Headphones
Units Sold: 1,250
Revenue: $87,500.00
================================
```

In [None]:
# Your code here


In [None]:
# Solution 4.3

product = "Wireless Headphones"
units_sold = 1250
revenue = 87500.00

separator = "=" * 32

print(separator)
print("SALES REPORT")
print(separator)
print(f"Product: {product}")
print(f"Units Sold: {units_sold:,}")
print(f"Revenue: ${revenue:,.2f}")
print(separator)

---
# Section 5: Getting User Input and Type Conversion
---

## Getting User Input

The `input()` function allows you to get data from the user. This is useful for creating interactive programs and testing your code with different values.

**Important:** `input()` always returns a **string**, even if the user types a number!

### Why This Matters in Data Science

While you won't use `input()` much in data analysis scripts, understanding it helps you:
- Create interactive data exploration tools
- Build simple data entry applications
- Understand how data comes in as strings and needs conversion

## Syntax

```python
variable = input("prompt message")
```

**Parameters:**
- `prompt message`: Text displayed to the user before they type

**Returns:** A string containing what the user typed

In [None]:
# Example: Basic input (run this cell and type something!)

name = input("Enter your name: ")
print(f"Hello, {name}!")

In [None]:
# Example: Input is always a string!

age_input = input("Enter your age: ")
print(f"You entered: {age_input}")
print(f"Type: {type(age_input)}")

# This will NOT work as expected:
# next_year = age_input + 1  # Error! Can't add string and int
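
To see why this matters without typing anything, the sketch below uses a fixed string in place of what `input()` would return. With strings, `+` and `*` join and repeat text instead of doing arithmetic.

```python
age_input = "28"  # stands in for a value returned by input()

print(age_input + "1")     # 281  (text joined, not added)
print(age_input * 2)       # 2828 (text repeated, not multiplied)

# Convert first to get real arithmetic
print(int(age_input) + 1)  # 29
```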

## Type Conversion

Python provides built-in functions to convert between data types:

| Function | Converts to | Example |
|----------|-------------|--------|
| `int()` | Integer | `int("42")` → `42` |
| `float()` | Float | `float("3.14")` → `3.14` |
| `str()` | String | `str(42)` → `"42"` |
| `bool()` | Boolean | `bool(1)` → `True` |
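
`bool()` deserves a closer look, because it does not read the text of a string. A minimal sketch: any non-empty string is `True`, even `"False"`. The yes/no check at the end is one possible approach, and the accepted words are an assumption.

```python
print(bool(0))        # False
print(bool(""))       # False: empty string
print(bool("False"))  # True: non-empty string, the content is ignored
print(bool("0"))      # True: also a non-empty string

# To interpret text as yes/no, compare it against the words you accept
answer = " No "
is_yes = answer.strip().lower() in ("yes", "y", "true")
print(is_yes)         # False
```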

In [None]:
# Example: Converting string input to numbers

# Get quantity from user
quantity_str = input("Enter quantity: ")
quantity = int(quantity_str)  # Convert to integer

# Get price from user
price_str = input("Enter price: ")
price = float(price_str)  # Convert to float

# Now we can do math!
total = quantity * price
print(f"Total: ${total:.2f}")

In [None]:
# Example: Converting in one line

# You can convert immediately when getting input
age = int(input("Enter your age: "))
years_to_retirement = 65 - age
print(f"Years until retirement: {years_to_retirement}")

In [None]:
# Example: Converting between types for calculations

# Sometimes you need to convert for display or calculations
count = 42
percentage = 0.75

# Convert int to float explicitly (in Python 3, / already returns a float)
result = float(count) / 3
print(f"Result as float: {result}")

# Convert float to int (truncates decimals)
whole_number = int(result)
print(f"Result as int: {whole_number}")

# Convert numbers to strings for concatenation
message = "The count is " + str(count)
print(message)

## Common Conversion Errors

Be careful when converting! Invalid conversions, such as passing non-numeric text to `int()`, raise a `ValueError` and stop your code.

In [None]:
# Example: What works and what doesn't

# These work
print(int("42"))        # String with only digits
print(float("3.14"))    # String with decimal number
print(int(3.99))        # Float to int (truncates to 3)
print(float(42))        # Int to float (42.0)

# ❌ These would cause errors (commented out)
# print(int("42.5"))    # Can't convert string with decimal to int directly
# print(int("hello"))   # Can't convert non-numeric string
# print(int(""))        # Can't convert empty string
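
As a preview of topics covered later (functions and error handling), here is one possible way to convert without crashing. The helper name `to_int_or_none` is hypothetical and not part of Python. It returns `None` when the text is not a valid integer.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none("42"))     # 42
print(to_int_or_none("42.5"))   # None
print(to_int_or_none("hello"))  # None
print(to_int_or_none(""))       # None
```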

In [None]:
# Example: Converting a string with a decimal to int in two steps

price_str = "42.99"

# First convert to float, then to int (int() drops the decimals, it does not round)
price_float = float(price_str)
price_int = int(price_float)

print(f"Original: {price_str}")
print(f"As float: {price_float}")
print(f"As int: {price_int}")
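
Because `int()` truncates, it can give a different answer from rounding. The sketch below compares it with the built-in `round()`, including the cases that tend to surprise people.

```python
print(int(42.99))    # 42: decimals are dropped
print(round(42.99))  # 43: rounded to the nearest whole number

# int() truncates toward zero, so negatives move up
print(int(-3.7))     # -3

# round() sends exact halves to the nearest even number
print(round(2.5))    # 2
print(round(3.5))    # 4
```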

## Practice Exercise 5.1

**Task:** Create a simple tip calculator:
1. Ask the user for the bill amount
2. Ask the user for the tip percentage (as a whole number like 15, 18, or 20)
3. Calculate and display the tip amount and total bill

**Example Run:**
```
Enter bill amount: 85.50
Enter tip percentage: 18

Bill: $85.50
Tip (18%): $15.39
Total: $100.89
```

In [None]:
# Your code here


In [None]:
# Solution 5.1

# Get user input
bill_amount = float(input("Enter bill amount: "))
tip_percentage = int(input("Enter tip percentage: "))

# Calculate tip and total
tip_amount = bill_amount * (tip_percentage / 100)
total = bill_amount + tip_amount

# Display results
print()
print(f"Bill: ${bill_amount:.2f}")
print(f"Tip ({tip_percentage}%): ${tip_amount:.2f}")
print(f"Total: ${total:.2f}")

## Practice Exercise 5.2

**Task:** Create a data type converter that:
1. Asks the user for a decimal number (as a string)
2. Shows the original value and its type
3. Converts it to float and shows the result
4. Converts the float to int and shows the result
5. Converts the int back to string and shows the result

**Example Run:**
```
Enter a decimal number: 42.75

Original: 42.75 (type: str)
As float: 42.75 (type: float)
As int: 42 (type: int)
As string: 42 (type: str)
```

In [None]:
# Your code here


In [None]:
# Solution 5.2

# Get user input
original = input("Enter a decimal number: ")

# Convert through different types
as_float = float(original)
as_int = int(as_float)
back_to_string = str(as_int)

# Display results
print()
print(f"Original: {original} (type: {type(original).__name__})")
print(f"As float: {as_float} (type: {type(as_float).__name__})")
print(f"As int: {as_int} (type: {type(as_int).__name__})")
print(f"As string: {back_to_string} (type: {type(back_to_string).__name__})")

## Practice Exercise 5.3

**Task:** Create a simple data entry form for an employee:
1. Ask for employee name (string)
2. Ask for employee ID (integer)
3. Ask for hourly rate (float)
4. Ask for hours worked this week (float)

Calculate weekly pay and display a summary.

**Example Run:**
```
=== Employee Data Entry ===
Enter employee name: Carlos Martinez
Enter employee ID: 10045
Enter hourly rate: 28.50
Enter hours worked: 42.5

=== Employee Summary ===
Name: Carlos Martinez
ID: 10045
Hourly Rate: $28.50
Hours Worked: 42.5
Weekly Pay: $1,211.25
```

In [None]:
# Your code here


In [None]:
# Solution 5.3

print("=== Employee Data Entry ===")
name = input("Enter employee name: ")
employee_id = int(input("Enter employee ID: "))
hourly_rate = float(input("Enter hourly rate: "))
hours_worked = float(input("Enter hours worked: "))

weekly_pay = hourly_rate * hours_worked

print()
print("=== Employee Summary ===")
print(f"Name: {name}")
print(f"ID: {employee_id}")
print(f"Hourly Rate: ${hourly_rate:.2f}")
print(f"Hours Worked: {hours_worked}")
print(f"Weekly Pay: ${weekly_pay:,.2f}")

---
# Section 6: Comments and Code Documentation
---

## Why Comments Matter

Comments are notes in your code that Python ignores. They're written for humans – including your future self!

### Why This Matters in Data Science

Data analysis code often involves:
- Complex business logic that isn't obvious from the code
- Assumptions about the data
- Reasons why you chose a particular approach
- Warnings about edge cases

Good comments make your analysis **reproducible** and **understandable** by teammates (and yourself in 6 months).

## Types of Comments

In [None]:
# Example: Single-line comments

# This is a single-line comment
# It starts with a # symbol

revenue = 150000  # You can also comment at the end of a line

# Calculate profit margin
# Formula: (revenue - costs) / revenue * 100
costs = 95000
profit_margin = (revenue - costs) / revenue * 100

print(f"Profit margin: {profit_margin:.1f}%")

In [None]:
# Example: Multi-line notes with triple-quoted strings

"""
This is a triple-quoted string, not a true comment.
Python evaluates it and discards the result, so it is
often used for longer notes. When it is the first
statement in a file or function, it is called a docstring
and describes that file's or function's purpose.
"""

'''
Single quotes work the same way, but double
quotes are more common by convention.
'''

print("Triple-quoted strings are useful for longer explanations")
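
As a preview of functions (covered later), this sketch shows where a docstring normally lives and how Python keeps it available. The function name `profit_margin` is made up for illustration. The figures are the revenue and costs from the single-line comment example.

```python
def profit_margin(revenue, costs):
    """Return the profit margin as a percentage of revenue."""
    return (revenue - costs) / revenue * 100

# Unlike a # comment, a docstring is stored and can be read back
print(profit_margin.__doc__)
print(f"{profit_margin(150000, 95000):.1f}%")  # 36.7%
```

Tools such as `help(profit_margin)` display this same text.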

## Good vs. Bad Comments

Not all comments are helpful. The best comments explain **why**, not just **what**.

In [None]:
# Example: Bad comments (state the obvious)

# Sample values so this cell runs on its own
a = 10
b = 20
counter = 0

# ❌ Bad - just repeats what the code says
x = 5  # Set x to 5
total = a + b  # Add a and b
counter = counter + 1  # Increment counter

# These comments add no value - the code is already clear

In [None]:
# Example: Good comments (explain why and provide context)

# Good - explains business logic
discount_threshold = 1000  # Customers spending over $1000 get loyalty discount

# Good - explains a non-obvious decision
# Using 2.5 as multiplier based on Q3 seasonal adjustment
# See marketing report: https://internal.docs/q3-analysis
seasonal_factor = 2.5

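# Good - warns about edge cases
# WARNING: Division assumes sales_count > 0
# Zero sales will cause ZeroDivisionError
total_revenue = 48000  # Sample value for illustration
sales_count = 120      # Sample value for illustration
average_sale = total_revenue / sales_count

# Good - clarifies complex calculation
# Calculate compound annual growth rate (CAGR)
# Formula: (end_value / start_value)^(1/years) - 1
start_value = 100000  # Sample value for illustration
end_value = 150000    # Sample value for illustration
years = 3             # Sample value for illustration
cagr = (end_value / start_value) ** (1 / years) - 1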
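A warning comment documents an edge case, and a guard in the code handles it. Here is a minimal sketch of a guard for the zero-sales case, using illustrative values for `total_revenue` and `sales_count`. It assumes that reporting `0.0` is acceptable when there are no sales, which may not hold for every analysis.

```python
# Hypothetical sample values for illustration
total_revenue = 48000
sales_count = 0

# Guard against ZeroDivisionError: with no sales there is no meaningful
# average, so we report 0.0 (an assumption; another placeholder may suit
# your analysis better)
if sales_count > 0:
    average_sale = total_revenue / sales_count
else:
    average_sale = 0.0

print(f"Average sale: ${average_sale:,.2f}")
```

The comment above the `if` explains why the fallback value was chosen, which is exactly the kind of "why" a reader cannot infer from the code alone.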

## Commenting for Data Analysis

In [None]:
# Example: Well-documented data analysis code

"""
Customer Segmentation Analysis
==============================
This analysis categorizes customers based on their
purchase behavior for targeted marketing campaigns.

Data source: sales_database.customers (updated daily)
Author: Data Science Team
Last updated: 2024-01-15
"""

# Define segment thresholds
# These values were determined through customer research (see doc: segmentation-study-2024)
HIGH_VALUE_THRESHOLD = 10000  # Annual spend > $10,000
MEDIUM_VALUE_THRESHOLD = 2500  # Annual spend > $2,500

# Sample customer data
customer_spend = 7500
customer_tenure_years = 3

# Segment classification
# Priority is given to high-value customers regardless of tenure
if customer_spend > HIGH_VALUE_THRESHOLD:
    segment = "Premium"
elif customer_spend > MEDIUM_VALUE_THRESHOLD:
    segment = "Growth"
else:
    segment = "Standard"

print(f"Customer Segment: {segment}")

## Using Comments to Plan Your Code

A great technique is to write comments FIRST as a plan, then fill in the code.

In [None]:
# Example: Planning with comments first

# Step 1: Define the input data
monthly_sales = [12000, 15000, 18000, 14000, 22000, 19000]

# Step 2: Calculate the total sales
total_sales = sum(monthly_sales)

# Step 3: Calculate the average monthly sales
average_sales = total_sales / len(monthly_sales)

# Step 4: Find the best and worst months
best_month = max(monthly_sales)
worst_month = min(monthly_sales)

# Step 5: Display the results
print(f"Total Sales: ${total_sales:,}")
print(f"Average Monthly: ${average_sales:,.2f}")
print(f"Best Month: ${best_month:,}")
print(f"Worst Month: ${worst_month:,}")

## Practice Exercise 6.1

**Task:** The following code works but has no comments. Add appropriate comments to explain what it does and why.

```python
revenue = 250000
costs = 175000
tax_rate = 0.21

gross_profit = revenue - costs
tax_amount = gross_profit * tax_rate
net_profit = gross_profit - tax_amount
profit_margin = (net_profit / revenue) * 100

print(f"Net Profit: ${net_profit:,.2f}")
print(f"Profit Margin: {profit_margin:.1f}%")
```

In [None]:
# Your code here - add comments to the code above


In [None]:
# Solution 6.1

"""
Profit Analysis Calculator
Calculates net profit and profit margin after taxes
"""

# Input values (in dollars)
revenue = 250000
costs = 175000
tax_rate = 0.21  # US corporate tax rate (21%)

# Calculate gross profit (before taxes)
gross_profit = revenue - costs

# Calculate tax on profits
# Note: Only profits are taxed, not revenue
tax_amount = gross_profit * tax_rate

# Calculate net profit (after taxes)
net_profit = gross_profit - tax_amount

# Calculate profit margin as percentage of revenue
# This shows how much of each dollar earned becomes profit
profit_margin = (net_profit / revenue) * 100

# Display results
print(f"Net Profit: ${net_profit:,.2f}")
print(f"Profit Margin: {profit_margin:.1f}%")

## Practice Exercise 6.2

**Task:** Write a well-commented program that calculates a customer's loyalty status:
- Gold: 5+ years OR $50,000+ lifetime spending
- Silver: 2+ years AND $10,000+ lifetime spending  
- Bronze: Everyone else

Test with:
- Customer years: 3
- Lifetime spending: $25,000

Include:
- A header comment explaining the program
- Comments explaining the loyalty tier logic
- Comments on any non-obvious decisions

In [None]:
# Your code here


In [None]:
# Solution 6.2

"""
Customer Loyalty Status Calculator
===================================
Determines customer loyalty tier based on tenure and spending.
Used by the marketing team for targeted campaigns.

Tier Criteria:
- Gold: 5+ years membership OR $50,000+ lifetime spend
- Silver: 2+ years membership AND $10,000+ lifetime spend
- Bronze: All other customers
"""

# Customer data
customer_years = 3
lifetime_spending = 25000

# Threshold constants
# These align with marketing's customer value analysis (2024)
GOLD_YEARS = 5
GOLD_SPENDING = 50000
SILVER_YEARS = 2
SILVER_SPENDING = 10000

# Determine loyalty tier
# Gold tier uses OR logic - either condition qualifies
# This rewards long-tenured customers as well as high spenders
if customer_years >= GOLD_YEARS or lifetime_spending >= GOLD_SPENDING:
    tier = "Gold"
# Silver tier uses AND logic - both conditions required
# Ensures customers have demonstrated sustained engagement
elif customer_years >= SILVER_YEARS and lifetime_spending >= SILVER_SPENDING:
    tier = "Silver"
else:
    tier = "Bronze"

# Display result
print("Customer Profile:")
print(f"  Years: {customer_years}")
print(f"  Lifetime Spending: ${lifetime_spending:,}")
print(f"  Loyalty Tier: {tier}")

## Practice Exercise 6.3

**Task:** Write a mini data analysis script with proper documentation:
1. Create variables for a product's monthly sales data (use any 6 numbers)
2. Calculate: total, average, highest, lowest
3. Calculate the overall growth rate from the first month to the last month
4. Document everything properly with comments

Include a header docstring and explanatory comments throughout.

In [None]:
# Your code here


In [None]:
# Solution 6.3

"""
Product Sales Analysis - H1 2024
=================================
Analyzes 6 months of sales data for Product SKU-12345
to identify trends and calculate key metrics.

Data Source: Sales Database (monthly aggregates)
Time Period: January - June 2024
"""

# Monthly sales data (in units)
# Each value represents total units sold that month
jan_sales = 1200
feb_sales = 1350
mar_sales = 1500
apr_sales = 1400
may_sales = 1650
jun_sales = 1800

# Calculate aggregate metrics
total_sales = jan_sales + feb_sales + mar_sales + apr_sales + may_sales + jun_sales
average_sales = total_sales / 6  # 6 months of data

# Find extremes using built-in functions
# Note: In a real scenario, we'd use a list for this
highest_month = max(jan_sales, feb_sales, mar_sales, apr_sales, may_sales, jun_sales)
lowest_month = min(jan_sales, feb_sales, mar_sales, apr_sales, may_sales, jun_sales)

# Calculate overall growth rate (first month to last month)
# Formula: ((end - start) / start) * 100
# This shows percentage change over the 6-month period
growth_rate = ((jun_sales - jan_sales) / jan_sales) * 100

# Display analysis results
print("=" * 40)
print("PRODUCT SALES ANALYSIS - H1 2024")
print("=" * 40)
print(f"Total Units Sold: {total_sales:,}")
print(f"Monthly Average: {average_sales:,.0f} units")
print(f"Best Month: {highest_month:,} units")
print(f"Worst Month: {lowest_month:,} units")
print(f"6-Month Growth: {growth_rate:.1f}%")
print("=" * 40)
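As the note in the solution says, a list is the more natural container for this data. The sketch below repeats the same calculations with a list (the name `monthly_units` is illustrative). It also computes the change between consecutive months, which a single first-to-last growth figure hides. It uses a `for` loop and `zip`, which this module has not introduced, so read it as a preview rather than required material.

```python
# Same six monthly values as above, stored in one list (Jan through Jun)
monthly_units = [1200, 1350, 1500, 1400, 1650, 1800]

total_units = sum(monthly_units)
average_units = total_units / len(monthly_units)  # no hard-coded 6
highest = max(monthly_units)
lowest = min(monthly_units)

# Overall growth: first month compared with last month
overall_growth = (monthly_units[-1] - monthly_units[0]) / monthly_units[0] * 100

# Month-over-month growth: each month compared with the one before it
mom_growth = []
for previous, current in zip(monthly_units, monthly_units[1:]):
    mom_growth.append((current - previous) / previous * 100)

print(f"Total: {total_units:,} | Average: {average_units:,.0f}")
print(f"Best: {highest:,} | Worst: {lowest:,}")
print(f"Overall growth: {overall_growth:.1f}%")
print("Month-over-month:", [f"{g:.1f}%" for g in mom_growth])
```

Using `len(monthly_units)` instead of a literal `6` means the average stays correct if more months are added later.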

---
# Module Summary

## Key Takeaways

### 1. Python & Jupyter Notebooks
- Python is the #1 language for data science due to its simplicity and powerful libraries
- Jupyter Notebooks combine code, output, and documentation in one interactive document
- Use `print()` to display output

### 2. Variables and Data Types
- Variables store data using the `variable = value` syntax
- Four main types: `int` (whole numbers), `float` (decimals), `str` (text), `bool` (True/False)
- Use `type()` to check a variable's data type
- Use descriptive, meaningful variable names

### 3. Operators
- **Arithmetic**: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- **Comparison**: `==`, `!=`, `>`, `<`, `>=`, `<=`
- **Logical**: `and`, `or`, `not`

### 4. Strings
- Create with single or double quotes
- Use indexing `[0]` and slicing `[start:end]` to access parts
- Common methods: `.upper()`, `.lower()`, `.strip()`, `.split()`, `.replace()`, `.find()`
- Format with f-strings: `f"Hello, {name}!"`

### 5. Input and Type Conversion
- `input()` gets user input (always returns a string)
- Convert types with `int()`, `float()`, `str()`, `bool()`
- Always convert input before doing math
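
A quick illustration of why conversion matters. A plain string stands in for a real `input()` call so the cell runs without prompting (the variable name `raw_quantity` is illustrative):

```python
raw_quantity = "3"  # what input() would return if the user typed 3

# Without conversion, * repeats the string instead of multiplying
repeated = raw_quantity * 2

# With conversion, * does arithmetic
quantity = int(raw_quantity)
doubled = quantity * 2

print(repeated)  # 33
print(doubled)   # 6
```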

### 6. Comments
- Single-line: `# comment`
- Multi-line: `"""triple-quoted string"""` (called a docstring when it is the first statement in a file, function, or class)
- Good comments explain **why**, not just **what**
- Document your data analysis for reproducibility

---

## Next Module

In **Module 2: Data Structures**, we'll learn about:
- **Lists** - Ordered collections of items
- **Tuples** - Immutable sequences
- **Dictionaries** - Key-value pairs for structured data
- **Sets** - Collections of unique items

These data structures are essential for organizing and manipulating data in Python!

---

## Additional Practice

For extra practice, try these challenges:

1. **Temperature Converter**: Create a program that converts Celsius to Fahrenheit and vice versa. Ask the user which conversion they want, get the temperature, and display the result with proper formatting.

2. **Simple Interest Calculator**: Calculate simple interest given principal, rate, and time. Formula: Interest = Principal × Rate × Time. Display both the interest and the final amount.

3. **Email Parser**: Given an email address, extract and display:
   - The username (before @)
   - The domain (after @)
   - Whether it's a .com, .org, or other domain

4. **Data Summary Generator**: Create variables for a dataset (name, rows, columns, file size in MB, date created). Generate a formatted summary report with proper documentation.