# Python for Data Analytics I: Introduction to Python Basics

In today's business world, data is everywhere - from sales figures and customer behavior to market trends and financial metrics. But data alone isn't enough; we need the tools and skills to transform this data into actionable business insights.

## What You'll Learn

Imagine you're a business analyst at a retail company. You have sales data from multiple stores, customer feedback, and inventory levels. How do you make sense of all this data to make better business decisions?

In this workshop, you'll learn how Python can help you:

- Organize and structure your business data effectively
- Calculate key metrics like sales growth and customer retention
- Perform basic analysis to uncover patterns and trends
- Automate repetitive analysis tasks

Whether you're analyzing sales performance, understanding customer segments, or tracking market trends, Python will become your trusted tool for data analytics.

Let's get started!

---

## Your First Python Program

In Python, we can use the `print()` function to display text or messages.

Let's try it out by printing a message!

In [None]:
# Print a simple message
print("Hello, world!")

### What's Happening Here?
- `print()` is a function. Think of it as a command that asks Python to display something. We use `print()` to:
  - Display results, such as totals or averages, during analysis.
  - Debug code by checking intermediate values or verifying logic.
  - Communicate results clearly in a step-by-step manner.

- `"Hello, world!"` is a **string**, which means it's a piece of text enclosed in quotes.

Try changing the message inside the quotes to something else!

## Variables and Data Types

A **variable** is a way to store data in Python.
Think of it as a labeled box where you can store numbers, text, or other data.

In business analytics, we work with different types of data:
1. **Numerical** values:
   - **Integers**: Whole numbers, such as number of products sold, customer count, inventory units
   - **Floats**: Numbers with decimals, like revenue amounts, price points, growth rates
2. **Non-Numerical** values:
   - **Strings**: Text data, such as product names, customer feedback, market segments
   - **Booleans**: `True`/`False` values that represent yes/no answers, such as customer active status, product availability, promotion eligibility
   - **None**: A special indicator to represent the absence of a value.

We can assign a value to a variable using the assignment operator `=`. Let's explore these with examples.

In [None]:
# Storing data in variables
Q1_revenue = 420060.50  # Float for precise financial data
num_customers = 250  # Integer for counting
product_category = "Electronics"  # String for labels
is_profitable = True  # Boolean for status checks

# Printing the variables
print("First quarter revenue:", Q1_revenue)
print("Product category:", product_category)
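
To check which type Python has assigned to a value, the built-in `type()` function can help. The sketch below repeats the variables from the cell above so that it runs on its own; `missing_discount` is a hypothetical name added only to illustrate `None`.

```python
# Repeat the workshop variables so this sketch is self-contained
Q1_revenue = 420060.50
num_customers = 250
product_category = "Electronics"
is_profitable = True
missing_discount = None  # Hypothetical variable: no value recorded yet

# type() reports the data type of each value
print(type(Q1_revenue))        # <class 'float'>
print(type(num_customers))     # <class 'int'>
print(type(product_category))  # <class 'str'>
print(type(is_profitable))     # <class 'bool'>
print(type(missing_discount))  # <class 'NoneType'>
```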

We can perform arithmetic operations on numerical variables:

In [None]:
Q2_revenue = 396_000.75  # Using underscores for better readability

# Adding two numbers
total_revenue = Q1_revenue + Q2_revenue
print("Total revenue (Q1 & Q2):", total_revenue)

# Dividing two numbers
print("Average revenue (Q1 & Q2):", total_revenue / 2)
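
The same operators can produce other metrics, such as growth between two periods. This is a minimal sketch, assuming growth is defined as the change from Q1 to Q2 divided by Q1 (the workshop does not fix a formula); `growth_rate` is a name chosen for illustration.

```python
Q1_revenue = 420060.50
Q2_revenue = 396_000.75

# Assumed definition: growth rate = (new - old) / old
growth_rate = (Q2_revenue - Q1_revenue) / Q1_revenue

# Multiply by 100 and round to two decimals to show a percentage
print("Q1 to Q2 growth (%):", round(growth_rate * 100, 2))
```

A negative result means revenue fell from Q1 to Q2.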

## Containers: Organizing Data Efficiently

So far, we've worked with individual variables like `Q1_revenue` and `Q2_revenue`. But what if we:

1. Need to analyze revenue or sales over multiple periods (e.g., days)?
2. Want to track all products sold in a week?
3. Need to identify unique product categories?

For such tasks, we use **containers** in Python to organize and manage collections of data efficiently.

### Lists: Storing Multiple Items

A **list** is an ordered collection of items, often used to represent time series data or multiple data points. Think of a list as a column in Excel, where each row holds a different value.

Lists are useful for:
- Storing daily sales figures.
- Keeping track of product names or categories.

#### Example: Daily Sales Figures

|       | Day       | Sales |
|-------|-----------|-------|
| 0     | Sunday    | 200   |
| 1     | Monday    | 250   |
| 2     | Tuesday   | 300   |
| 3     | Wednesday | 200   |
| 4     | Thursday  | 400   |
| 5     | Friday    | 350   |
| 6     | Saturday  | 450   |



In [None]:
# Example: Daily sales in a week
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
daily_sales = [200, 250, 300, 200, 400, 350, 450]

print("Daily sales for the week:", daily_sales)

We can count the number of items in a list using the `len()` function.

In [None]:
num_of_days = len(daily_sales)
print("Number of records:", num_of_days)

#### Example: Analyzing Daily Sales

Now that we have sales data for the past few days, let's:
1. Calculate the total sales over the days.
2. Find the average daily sales.
3. Identify the highest and lowest sales days.

In [None]:
# Total and average sales
total_daily_sales = sum(daily_sales)
average_daily_sales = total_daily_sales / num_of_days

# Highest and lowest sales
highest_sales = max(daily_sales)
lowest_sales = min(daily_sales)

# Display results
print("Total sales for the week:", total_daily_sales)
print("Average daily sales:", average_daily_sales)
print("Highest sales in a day:", highest_sales)
print("Lowest sales in a day:", lowest_sales)

The built-in `sum()`, `min()`, and `max()` functions perform common calculations on containers like lists.

#### Indexing: Accessing Data in Lists

Since lists are ordered collections, we can retrieve specific items using their **index**. Think of a list as a numbered locker room, where each locker (index) holds an item (data).

In Python:
1. Indexing starts at `0` (the first item is at position `0`).
2. Negative indexing allows you to access items from the end of the list (e.g., `-1` refers to the last item).


#### Example: Daily Sales
Let's say we have a list of sales for the week:
- Sunday's sales are at index `0`.
- Monday's sales are at index `1`, and so on.

We use `[]` with the position of the item (the locker number) to retrieve it.


In [None]:
# Accessing sales data using positive indexing
print("Sales on Sunday (index 0):", daily_sales[0])  # first element
print("Sales on Tuesday (index 2):", daily_sales[2])

# Accessing sales data using negative indexing
print("Sales on Saturday (index -1):", daily_sales[-1])  # last element
print("Sales on Thursday (index -3):", daily_sales[-3])

#### How Indexing Works
- **Positive Indexing**: Counts from the beginning of the list (`0` to `len(list) - 1`).
- **Negative Indexing**: Counts from the end of the list (`-1` to `-len(list)`).
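
An index outside these ranges raises an `IndexError`. One way to guard against that, sketched here with a hypothetical `position` variable, is to compare the index with `len()` before using it. The `if`/`else` statement used here is introduced later in the workshop.

```python
daily_sales = [200, 250, 300, 200, 400, 350, 450]
position = 7  # Hypothetical index: one past the last valid position (6)

# Valid indices run from -len(daily_sales) up to len(daily_sales) - 1
if -len(daily_sales) <= position < len(daily_sales):
    selected_sales = daily_sales[position]
    print("Sales at position", position, ":", selected_sales)
else:
    selected_sales = None
    print("Position", position, "is out of range for", len(daily_sales), "items")
```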
 
> **Note**: Ensure your index is within `0` to `len(list) - 1` for positive indexing.

#### Slicing: Extracting Portions of Lists

Slicing is a powerful feature in Python that allows you to extract specific portions of a list. It provides a way to access a range of elements from a sequence (like a `list` or a string) without modifying the original sequence.

The basic syntax for slicing is:

```python
sequence[start:stop]
```

- `start`: The index where the slice begins (inclusive). Defaults to 0 if omitted.
- `stop`: The index where the slice ends (exclusive). Defaults to the length of the sequence if omitted.

In [None]:
# Example: Using slices to access parts of a list

daily_sales = [200, 250, 300, 200, 400, 350, 450]

mid_week_sales = daily_sales[2:5]  # Slicing from index 2 to 4 (5-1)
print("Mid-week sales:", mid_week_sales)

In [None]:
early_week_sales = daily_sales[:3]  # Slicing from the beginning to index 2
print("First three days sales:", early_week_sales)

final_days_sales_total = sum(daily_sales[-2:])  # Slicing from the second last to the end
print("Sum of last two days sales:", final_days_sales_total)
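
A slice produces a new list, so changing the slice leaves the original untouched. Here is a short sketch of that behaviour (the name `weekday_sales` is chosen for illustration):

```python
daily_sales = [200, 250, 300, 200, 400, 350, 450]

weekday_sales = daily_sales[1:6]  # Monday (index 1) through Friday (index 5)
weekday_sales[0] = 999            # Change the first item of the slice only

print("Slice after change:", weekday_sales)  # The slice is now [999, 300, 200, 400, 350]
print("Original list:", daily_sales)         # Still [200, 250, 300, 200, 400, 350, 450]
```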

#### Why Indexing and Slicing Matter
Indexing helps us open the right locker to access its contents quickly. We use it to retrieve specific data points from a collection, such as:

- The sales on a particular day (e.g., Monday).
- The most recent data point in a dataset (e.g., the last recorded sale).

Slicing allows us to extract a subset of data from a collection. It's like selecting a range of lockers in a row to see what's inside. We use slicing to obtain:

- The sales figures for a range of days (e.g., Monday through Thursday).
- The most recent data points in a dataset (e.g., the last five recorded sales).


#### Key Features of Lists

1. **Ordered**: The items have a specific sequence.
2. **Indexed**: Access items using their position.
3. **Dynamic**: Add or remove items as needed.

In [None]:
# Example: Adding new data to a list
daily_sales = [200, 250, 300, 200, 400, 350, 450]

daily_sales.append(500)  # Append 500 to the end
print("Updated sales data:", daily_sales)
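
Items can be removed as well as added. The following minimal sketch uses the list methods `pop()` and `remove()`, starting from the list as it stands after the `append()` call above:

```python
daily_sales = [200, 250, 300, 200, 400, 350, 450, 500]

last_value = daily_sales.pop()  # Removes and returns the last item (500)
daily_sales.remove(200)         # Removes the first occurrence of 200 only

print("Removed value:", last_value)
print("Remaining sales data:", daily_sales)  # The list is now [250, 300, 200, 400, 350, 450]
```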

### Sets: Unique Collections of Data

A **set** is a collection of unique items. Think of it as a way to store distinct categories. Sets are useful for:

- Identifying unique products sold in a week.
- Checking for overlap between two groups of data.
- Removing duplicate entries from a list.
- Finding the difference between two sets of data.

In [None]:
# Example: Creating a set using curly braces {}
customer_segments = {"Premium", "Standard", "Basic"}
print("Customer segments:", customer_segments)

In [None]:
# Example: Comparing two sets with difference (-) and intersection (&)

current_categories = {"Electronics", "Beauty", "Books", "Home", "Apparel"}
discontinued_categories = {"Books", "Games"}

# Difference: current categories that have not been discontinued
active_categories = current_categories - discontinued_categories
print("Active categories:", active_categories)

# Intersection: categories that appear in both sets
inactive_categories = current_categories & discontinued_categories
print("Inactive categories:", inactive_categories)

#### Example: Products Sold in a Day

Imagine you want to find all unique products sold today from a list of transactions. Using a set ensures no duplicate product names appear, making your analysis cleaner and more accurate.

In [None]:
# Creating a set based on a list of values
products_sold = ["Laptop", "Tablet", "Smartphone", "Tablet", "Laptop", "Camera", "Camera"]
unique_products = set(products_sold)  # Converting list to set

print("All products sold:", products_sold)
print("Unique products sold:", unique_products)

In [None]:
# Finding the number of unique products
products_sold = ["Laptop", "Tablet", "Smartphone", "Tablet", "Laptop", "Camera", "Camera"]
unique_products = set(products_sold)
print("Number of unique products sold:", len(unique_products))

You can modify a set using the `add()` method to include new items or the `remove()` method to delete existing items.

In [None]:
unique_products.add("Smartwatch")  # Adding a new product
print("Updated unique products:", unique_products)
print("Number of unique products:", len(unique_products))

In [None]:
unique_products.add("Laptop")  # Adding a duplicate product
print("Updated unique products:", unique_products)
print("Number of unique products:", len(unique_products))
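
Removing items works in a similar way. The sketch below uses `remove()` together with the related `discard()` method: `remove()` raises a `KeyError` if the item is missing, while `discard()` does nothing. The set is rebuilt here so the example runs on its own, and `"Drone"` is a made-up product.

```python
unique_products = {"Laptop", "Tablet", "Smartphone", "Camera", "Smartwatch"}

unique_products.remove("Camera")  # The item exists, so it is removed
unique_products.discard("Drone")  # The item is absent; discard() does not raise an error

print("Updated unique products:", unique_products)
print("Number of unique products:", len(unique_products))  # 4
```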

#### Key Features of Sets

1. **Unique**: Automatically removes duplicate items.
2. **Unordered**: Items have no specific sequence.
3. **Efficient**: Fast operations for checking membership or finding intersections.

#### Recap: Lists vs. Sets

1. **Lists**: Ordered, indexed, and allow duplicates.
   - Ideal for storing or tracking sequential data, such as daily sales figures over time.
   - Functions like `sum()`, `max()`, `min()`, and `len()` enable quick calculations of totals, averages, trends, and more for lists.
2. **Sets**: Unordered, ensure uniqueness.
   - Great for analytics tasks where uniqueness matters, such as identifying unique customers or products.
   - Use them to store distinct items or to remove duplicates from a large dataset.


Next, we'll explore another powerful container: **dictionaries**, which allow us to map keys to values for quick lookups.


### Dictionaries: Mapping Keys to Values

Imagine you're managing sales data for multiple products. 
- Using a list, you'd have to keep track of indices manually to know which value belongs to which product.
- Using a set, you'd lose the ability to link products with their specific sales figures.

**Lists** and **sets** are useful containers for storing and organizing data, but they aren't designed for this kind of task. Instead, we use a **dictionary**, which allows us to:
1. **Map one value to another**
   - Storing sales by product (e.g., `"Laptop": 530`).
   - Mapping customer names to their purchase history.
2. **Quickly retrieve specific values by a key**
   - Looking up the sales figure for `"Tablet"`.
   - Finding the details for a given store.
3. **Add or update data dynamically**
   - Add sales for a new product or update the sales for `"Camera"`.
   - Store additional details for a customer.

#### What Is a Dictionary?

A **dictionary** is a collection of key-value pairs, where:
- A **key** acts like a label (e.g., product name).
- A **value** is the data associated with the key (e.g., sales figure).


A dictionary is created using curly braces `{}`:
```python
dictionary_name = {key1: value1, key2: value2, ...}
```

Dictionaries in Python allow us to organize and retrieve data efficiently using **key-value** pairs.

#### Example: Total Product Sales

| Product (key) | Sales (value) |
|---------------|---------------|
| Laptop        | 530           |
| Tablet        | 400           |
| Smartphone    | 690           |
| Camera        | 530           |

In [None]:
# Total sales by product
total_sales_by_product = {
    "Laptop": 530,
    "Tablet": 400,
    "Smartphone": 690,
    "Camera": 530,
}

# Accessing values using the key inside square brackets
laptop_sales = total_sales_by_product["Laptop"]
print("Laptop sales:", laptop_sales)

#### Keys and Values in Dictionaries

We can access the keys and values of a dictionary using the `keys()` and `values()` methods, respectively. To retrieve both keys and values together, we use the `items()` method.

In [None]:
# Accessing keys
keys = total_sales_by_product.keys()
print("Keys:", keys)

# Accessing values
values = total_sales_by_product.values()
print("Values:", values)

# Accessing items
items = total_sales_by_product.items()
print("Items:", items)
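
These views work with the built-in functions introduced earlier. The sketch below uses the same data; names such as `best_seller` are illustrative.

```python
total_sales_by_product = {
    "Laptop": 530,
    "Tablet": 400,
    "Smartphone": 690,
    "Camera": 530,
}

product_names = list(total_sales_by_product.keys())   # Convert the keys view to a list
overall_sales = sum(total_sales_by_product.values())  # 530 + 400 + 690 + 530 = 2150

# max() with key=... compares products by their sales value rather than by name
best_seller = max(total_sales_by_product, key=total_sales_by_product.get)

print("Products:", product_names)
print("Overall sales:", overall_sales)
print("Best seller:", best_seller)  # Smartphone
```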

In [None]:
# Daily sales by product
daily_sales_by_product = {
    "Laptop": [50, 60, 70, 40, 100, 90, 120],
    "Tablet": [30, 40, 50, 30, 80, 70, 100],
    "Smartphone": [70, 80, 100, 60, 120, 110, 150],
    "Camera": [50, 70, 80, 70, 100, 80, 80],
}

# Accessing values using the key inside square brackets
print("Sales for Laptop:", daily_sales_by_product["Laptop"])
print("Last day's Tablet sales:", daily_sales_by_product["Tablet"][-1])
print("Total Camera sales:", sum(daily_sales_by_product["Camera"]))

#### Key Features of Dictionaries
1. **Key-Value Pairing**:
   - Keys act as labels (e.g., `"Laptop"`), and values store the associated data (e.g., a list of recent sales).
2. **Fast Lookups**:
   - Access values directly using keys, which is faster than searching in a list.
3. **Dynamic**:
   - Easily add, update, or remove key-value pairs.

Let's explore these features in more detail.


In [None]:
# Adding a new product
daily_sales_by_product["Smartwatch"] = [40, 50, 60, 30, 70, 60, 80]
print("Updated sales data:", daily_sales_by_product)

In [None]:
# Updating sales for an existing product
daily_sales_by_product["Smartwatch"][-1] = 81
print("Smartwatch sales after update:", daily_sales_by_product["Smartwatch"])

#### Modifying a Dictionary

1. To **add** a new key-value pair, use the syntax:
   ```python
   dictionary_name[new_key] = new_value
   ```
2. To **update** an existing key's value, assign a new value to the key:
   ```python
   dictionary_name[existing_key] = new_value
   ```
3. To **remove** a key-value pair, use the `pop()` method:
   ```python
   dictionary_name.pop(key_to_remove)
   ```
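
The three operations can be sketched together on a small dictionary (`sales_by_product` and its values are illustrative):

```python
sales_by_product = {"Laptop": 530, "Tablet": 400}

sales_by_product["Camera"] = 530  # Add: "Camera" is a new key
sales_by_product["Tablet"] = 450  # Update: "Tablet" already exists, so its value is replaced
removed_sales = sales_by_product.pop("Laptop")  # Remove: pop() returns the removed value

print("Removed Laptop sales:", removed_sales)  # 530
print("Current data:", sales_by_product)       # {'Tablet': 450, 'Camera': 530}
```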


> **Note**: Dictionary keys must be unique. If you assign a value to an existing key, it will update the existing value, not create a new key.

Dictionaries can contain other dictionaries as values. This is known as a *nested dictionary*.

In [None]:
# Nested dictionary: Store performance metrics
store_metrics = {
    "Edmonton": {
        "revenue": 89000.75,
        "customer_count": 900,
        "top_products": {"Camera", "Smartwatch", "Laptop"},
        "quarterly_sales": [500, 400, 650, 700],
    },
    "Calgary": {
        "revenue": 120000.50,
        "customer_count": 1200,
        "top_products": {"Laptop", "Smartphone", "Tablet"},
        "quarterly_sales": [600, 550, 750, 600],
    },
    "Red_Deer": {
        "revenue": 45000.25,
        "customer_count": 510,
        "top_products": {"Tablet", "Smartphone", "Laptop"},
        "quarterly_sales": [150, 200, 300, 350],
    },
}

# Accessing specific information
print("Calgary store revenue:", store_metrics["Calgary"]["revenue"])
print("Edmonton store sales in Q1:", store_metrics["Edmonton"]["quarterly_sales"][0])

#### Nested Dictionaries
- Organize complex business data (store profiles, sales metrics, inventory, customer information)
- Access specific data points easily, such as the sales for a particular store.
- Easy updates as new data comes in
- Efficient data analysis and reporting
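
Updating nested data chains the keys in the same way as reading it. The sketch below uses one store from the example above; the new figures are made up for illustration.

```python
store_metrics = {
    "Edmonton": {
        "revenue": 89000.75,
        "customer_count": 900,
        "quarterly_sales": [500, 400, 650, 700],
    },
}

store_metrics["Edmonton"]["customer_count"] += 25       # Hypothetical: 25 new customers
store_metrics["Edmonton"]["quarterly_sales"][-1] = 720  # Hypothetical correction to Q4

annual_sales = sum(store_metrics["Edmonton"]["quarterly_sales"])
print("Edmonton customers:", store_metrics["Edmonton"]["customer_count"])  # 925
print("Edmonton annual sales:", annual_sales)  # 500 + 400 + 650 + 720 = 2270
```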

#### Recap: Why Use Dictionaries?
1. Use **dictionaries** when working with natural key-value mapping for real-world data, such as store details.
2. Keys act as labels, making the data structure more intuitive and readable.
3. Direct access to values using unique keys makes the lookups more efficient.
4. Can mix different data types as values.
5. Nested data structures enable complex business models.

##### Comparing Lists, Sets, and Dictionaries

| Feature            | List                     | Set                  | Dictionary                         |
|--------------------|--------------------------|----------------------|------------------------------------|
| Order              | Ordered                  | Unordered            | Insertion order (Python 3.7+)      |
| Duplicates         | Allowed                  | Not Allowed          | Keys: Not Allowed, Values: Allowed |
| Access             | By index                 | By membership        | By key                             |
| Use Case           | Sequential data          | Unique items         | Key-value pairs                    |

---

## Making Decisions with Conditional Statements

In data analytics, we often need to make decisions based on the data. For example:

- Apply a discount if a customer's total spending exceeds $100.
- Categorize sales as "low," "medium," or "high" based on thresholds.
- Check for missing or invalid data in a dataset.

In Python, we use **conditional statements** to implement such decision-making.
Let's explore how Python's `if`, `elif`, and `else` statements help us make decisions and tailor our analyses to different data scenarios.

### Example: Identify A Sales Figure Below Average

In [None]:
daily_sales = [200, 250, 300, 200, 400, 350, 450]
average_sales = sum(daily_sales) / len(daily_sales)
print("Average daily sales:", average_sales)

In [None]:
# Let's investigate the first day's sales
sales = daily_sales[0]

if sales > average_sales:
    print("Selected sales were above average.")
elif sales < average_sales:
    print("Selected sales were below average.")
else:
    print("Selected sales were exactly at the average.")

### How `if/else` Statements Work


Conditional statements allow Python to execute specific blocks of code based on one or multiple conditions.
They follow this structure:

```python
if condition_1:
    # Code Block 1
elif condition_2:
    # Code Block 2
...
elif condition_n:
    # Code Block n
else:
    # The default code block
    # Code to execute if none of the conditions are True
```

Python uses Boolean logic to determine the result of the conditions in order. If a condition is `True`, the corresponding code block is executed. Otherwise, the next condition is checked. If none of the conditions are `True`, the `else` block is executed (if present).


```mermaid
flowchart TD
    A[Start] --> B{condition_1}
    B -- True --> C[Execute Code Block 1]
    B -- False --> D{condition_2}
    D -- True --> E[Execute Code Block 2]
    D -- False --> X{...}
    X --> F{condition_n}
    F -- True --> G[Execute Code Block n]
    F -- False --> H[Execute Code Block for else]
    C --> I[End]
    E --> I
    G --> I
    H --> I
```
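
The same structure covers the earlier idea of categorizing sales as low, medium, or high. This is a minimal sketch, assuming thresholds of 250 and 400 (illustrative values, not taken from the workshop):

```python
sales = 300
low_threshold = 250   # Hypothetical cut-off between "low" and "medium"
high_threshold = 400  # Hypothetical cut-off between "medium" and "high"

if sales < low_threshold:
    category = "low"
elif sales < high_threshold:  # Checked only if the first condition was False
    category = "medium"
else:                         # Runs when none of the conditions above were True
    category = "high"

print("Sales category:", category)  # medium
```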


#### What is a Condition?

A **condition** is any expression that evaluates to `True` or `False`. In Python, we build conditions with the following operators:

- Comparison Operators:
    - `>` (greater than), `<` (less than), `>=` (greater than or equal to), `<=` (less than or equal to), `==` (equal to), `!=` (not equal to).
- Logical Operators:
    - `and`, `or`, `not` to combine or negate conditions.
- Membership Operators:
    - `in`, `not in` to check if a value exists in a collection.
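
Each of these operators yields `True` or `False`. Here are a few conditions built from the Edmonton figures used earlier (the thresholds are illustrative):

```python
revenue = 89000.75
customer_count = 900
top_products = {"Camera", "Smartwatch", "Laptop"}

# Comparison operators
print(revenue >= 100_000)  # False
print(revenue != 0)        # True

# Logical operators combine or negate conditions
print(revenue > 50_000 and customer_count >= 600)   # True
print(revenue >= 100_000 or customer_count >= 600)  # True
print(not revenue >= 100_000)                       # True

# Membership operators check whether a value is in a collection
print("Laptop" in top_products)      # True
print("Tablet" not in top_products)  # True
```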

### Example: Assigning Bonus Points to Store Members

Suppose that we have a nested dictionary containing information about our stores:

In [None]:
store_metrics = {
    "Edmonton": {
        "revenue": 89000.75,
        "customer_count": 900,
        "top_products": {"Camera", "Smartwatch", "Laptop"},
        "quarterly_sales": [500, 400, 650, 700],
    },
    "Calgary": {
        "revenue": 120000.50,
        "customer_count": 1200,
        "top_products": {"Laptop", "Smartphone", "Tablet"},
        "quarterly_sales": [600, 550, 750, 600],
    },
    "Red_Deer": {
        "revenue": 45000.25,
        "customer_count": 510,
        "top_products": {"Tablet", "Smartphone", "Laptop"},
        "quarterly_sales": [150, 200, 300, 350],
    },
}

Now, imagine that we want to assign bonus points to store teams based on their performance. We follow these rules:
- Assign a high bonus percentage (15%) if the store has reached the target monthly revenue ($100,000.00).
- Assign a small bonus percentage (5%) if the store's revenue is close to the target (above 85% of it).
- Assign no bonus if the store's revenue is far from the target.

We can use conditional statements to assign the bonus percentage based on the `"revenue"` metric:

In [None]:
target_revenue = 100_000.0

store = "Calgary"
revenue = store_metrics[store]["revenue"]  # Access the revenue for the selected store

if revenue >= target_revenue:
    bonus_percent = 0.15
    print("Monthly target achieved! Team bonus approved.")
elif revenue > 0.85 * target_revenue:
    bonus_percent = 0.05
    print("Close to target. Small bonus approved.")
else:
    bonus_percent = 0.0
    print("Target not met. Review needed.")

# Add bonus information to the store's data
store_metrics[store]["bonus"] = bonus_percent

print("Updated store data:", store_metrics[store])

Now, try to assign bonus points to another store by changing the store name!

#### Key Features of This Example
1. **Using a Dictionary**: Stores multiple attributes for a store.
2. **Conditional Statements**: Categorizes the stores based on their `"revenue"`.
3. **Personalized Messages**: Combines data from the dictionary to display meaningful outputs.
4. **Updating Store Data**: Easily modify the records to include a new attribute `"bonus"`.

In [None]:
store = "Red_Deer"

# Check if a bonus percentage is assigned to the store
if "bonus" not in store_metrics[store]:
    print("No bonus information available for store", store)
else:
    bonus_percent = store_metrics[store]["bonus"]
    print("Bonus percentage for store", store, ":", bonus_percent)

### 🧑‍💻 Exercise: Even More Bonuses!

A store is eligible if revenue is at least 90% of the target and customer count is 600 or more. Check if our stores are eligible for a special bonus.

In [None]:
store = "Edmonton"

# <YOUR CODE HERE>

revenue = store_metrics[store]["revenue"]
num_customers = store_metrics[store]["customer_count"]

condition_1 = revenue >= 0.9 * target_revenue  # "At least 90%" includes exactly 90%
condition_2 = num_customers >= 600

if condition_1 and condition_2:
    print("Store", store, "is eligible for a special bonus point!")
else:
    print("Store did not meet the criteria for a special bonus.")
    print("Revenue condition:", condition_1)
    print("Customer condition:", condition_2)

### Recap: Conditional Statements

Controlling the decision flow in our programs is a crucial skill in data analytics for implementing logic and decision-making.

1. **What Are Conditional Statements?**
   - Tools to execute specific code blocks based on whether a condition is `True` or `False`.

2. **How They Work:**
   - Python evaluates conditions in order (`if`, `elif`, `else`).
   - Stops as soon as a `True` condition is found.
   - Provides a fallback (`else`) for cases that don't match any condition.

3. **Why They Matter in Data Analytics:**
   - **Decision-Making**: Automatically categorize or process data based on criteria.
     - Example: Assigning bonuses to employees or categorizing customers by their order amount.
   - **Dynamic Analysis**: Apply business logic like discounts or flags for anomalies.
     - Example: Detecting orders exceeding a certain threshold.
   - **Custom Logic**: Tailor workflows based on data characteristics.
     - Example: Handling missing or invalid data.


In the next section, we'll explore **loops**, which allow us to repeat tasks efficiently. This is especially useful for analyzing larger datasets or automating repetitive operations.

---

## Automating Repetitive Tasks with Loops

In data analytics, we often work with datasets containing thousands or even millions of records. Imagine needing to:
1. Analyze daily sales for an entire year.
2. Send personalized messages to hundreds of customers.
3. Apply bonuses to all employees based on their team performance.
4. Identify and remove duplicate entries in a dataset.

Doing these tasks manually would be inefficient and time-consuming. This is where **loops** come to the rescue! Loops empower us to automate repetitive tasks, making our workflows faster and more efficient.

### Types of Loops in Python:
1. **`while` Loop**: Perfect for repeating a task until a specific condition is met.
2. **`for` Loop**: Ideal for iterating over a sequence when you know the exact number of iterations.

Let's dive into how loops work and discover how they can make our data analysis tasks easier.

### `while` Loops: Repeating Tasks Until a Condition is Met

A `while` loop keeps repeating as long as a specified condition is `True`.
It's like saying, "Keep doing this until something changes."

```mermaid
flowchart TD
    Start --> Condition{Condition True?}
    Condition -- Yes --> Body[Execute Loop Body]
    Body --> Condition
    Condition -- No --> End[Exit Loop]
```

Let's see how a `while` loop works with an example.

In [None]:
# Example: Simulating stock level changes

inventory = 5
while inventory >= 0:
    print(inventory)
    inventory -= 1  # Equivalent to inventory = inventory - 1

#### Example: Printing Daily Sales with a `while` Loop

In [None]:
daily_sales = [200, 250, 300, 200, 400, 350, 450]

print("Daily sales figures:")
index = 0
while index < len(daily_sales):
    sales = daily_sales[index]
    print("Day", index + 1, "sales:", sales)
    index += 1  # Go to the next day

##### How It Works

The goal of the `while` loop in this example is to iterate through the `daily_sales` list and print the sales figures for each day. Here's how it works:

1. **Initialization**  
   The loop begins with an `index` variable set to `0`, which tracks the current position in the `daily_sales` list. This index is used to access each day's sales data.

2. **Condition Evaluation**  
   The loop runs as long as the `index` is less than the total number of items in the list (`len(daily_sales)`). On each iteration, Python checks if the current `index` is still within the bounds of the list.

3. **Processing and Incrementing**  
   During each iteration, the sales for the current day are retrieved using the index. The day number and corresponding sales figure are printed, and the `index` is incremented by 1 to move to the next day.
   
4. **Termination**  
   When `index` equals `len(daily_sales)`, the condition `index < len(daily_sales)` evaluates to `False`, and the loop stops.

#### Example: Cumulative Sales

Let's use a `while` loop to determine the day on which cumulative sales reach a certain threshold.

In [None]:
daily_sales = [200, 250, 300, 200, 400, 350, 450]

threshold = 1000  # Threshold for cumulative sales

index = 0  # Start from the first day
cumulative_sales = 0

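# Using a while loop: stop once the threshold is reached or the data runs out
while index < len(daily_sales) and cumulative_sales < threshold:
    cumulative_sales += daily_sales[index]  # Add the current day's sales to the cumulative total
    index += 1

# The loop can also end because the list ran out, so check which case occurred
if cumulative_sales >= threshold:
    print("Reached the threshold on day", index)
else:
    print("Threshold not reached after", index, "days")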

### `for` Loops: Iterating Through Collections

A `for` loop is perfect for iterating over collections like lists, dictionaries, or ranges.
It's like saying, "For each item in this collection, do something."

#### Example: Iterating through daily sales

In [None]:
daily_sales = [200, 250, 300, 200, 400, 350, 450]

print("Daily sales figures:")
for sales in daily_sales:
    print(sales)

##### How It Works

1. The `for` loop goes through each item in `daily_sales`, one by one.
2. The variable `sales` represents the current item in the list.
3. The `print(sales)` statement runs for each item, displaying the daily sales figure.

This is much easier than writing separate `print()` statements for each day!
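
When the position is needed as well as the value, a `for` loop can run over `range(len(...))` instead of the items themselves. The sketch below pairs each sales figure with its day name, reusing the `days` list from earlier; `best_index` is a name chosen for illustration.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
daily_sales = [200, 250, 300, 200, 400, 350, 450]

best_index = 0  # Position of the highest sales figure seen so far
for index in range(len(daily_sales)):  # index takes the values 0, 1, ..., 6
    print(days[index], "sales:", daily_sales[index])
    if daily_sales[index] > daily_sales[best_index]:
        best_index = index

print("Best day:", days[best_index])  # Saturday
```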

#### Example: Calculating Total and Average Sales

In [None]:
total_sales = 0
num_of_days = 0
for sales in daily_sales:
    total_sales += sales  # equivalent to: total_sales = total_sales + sales
    num_of_days += 1

print("Total weekly sales:", total_sales)
print("Average daily sales:", total_sales / num_of_days)

#### 🧑‍💻 Exercise: Dealing with Outliers in Sales Data

In [None]:
daily_sales = [200, 250, 3000, 200, 400, 350, 0]  # Contains outliers
clean_sales = []  # An empty list to store cleaned data

common_max_sales = 1000
common_min_sales = 20

# Solution 1: Remove the outliers

for sales in daily_sales:
    if common_min_sales <= sales <= common_max_sales:  # Only add non-outliers
        clean_sales.append(sales)
    # No else block needed

num_outliers = len(daily_sales) - len(clean_sales)
print("Removed", num_outliers, "outlier(s)")
print("Cleaned sales data:", clean_sales)

In [None]:
# Solution 2: Replace the outliers with the average of the non-outliers

average_sales = sum(clean_sales) / len(clean_sales)

# <YOUR CODE HERE>

clean_sales = []

for sales in daily_sales:
    if common_min_sales <= sales <= common_max_sales:
        clean_sales.append(sales)
    else:
        clean_sales.append(average_sales)

print("Cleaned sales data:", clean_sales)

#### 🧑‍💻 Exercise (continued)

As an alternative solution, you'll replace any sales figures that are outside the range of `common_min_sales` and `common_max_sales` with the closest non-outlier value. This will help in cleaning the data and making it more reliable for analysis.

In [None]:
# Solution 3: Replace the outliers with the closest non-outlier

daily_sales = [200, 250, 3000, 200, 400, 350, 0]  # Contains outliers
non_outliers = []  # An empty list to store non-outliers
clean_sales = []  # An empty list to store cleaned data

common_max_sales = 1000
common_min_sales = 20

# Step 1: Identify the non-outliers

# <YOUR CODE HERE>

for sales in daily_sales:
    if common_min_sales <= sales <= common_max_sales:
        non_outliers.append(sales)

# Step 2: Replace the outliers with the closest non-outlier

# <YOUR CODE HERE>

for sales in daily_sales:
    if sales in non_outliers:
        clean_sales.append(sales)
    else:
        # Find the closest non-outlier
        closest_nonoutlier = non_outliers[0]  # Initialize with the first non-outlier
        min_difference = abs(sales - closest_nonoutlier)  # Tracks the minimum difference

        for non_outlier in non_outliers[1:]:
            if abs(sales - non_outlier) < min_difference:
                closest_nonoutlier = non_outlier
                min_difference = abs(sales - non_outlier)
        clean_sales.append(closest_nonoutlier)

print("Cleaned sales data:", clean_sales)

##### How It Works

This code cleans the `daily_sales` data by replacing outliers with the closest non-outlier. Here's how it works:

1. **Step 1 - Identify Non-Outliers**:
   The first loop iterates through `daily_sales` to collect all values within the acceptable range and stores them in the `non_outliers` list.  

2. **Step 2 - Replace Outliers**:
   The second loop checks each value in `daily_sales`:
   - If the value is a non-outlier, it is added directly to `clean_sales`.
   - If it is an outlier, the loop finds the closest non-outlier by comparing the difference between the outlier and each value in the `non_outliers` list. The closest value is then added to `clean_sales`.

3. **Output**:
   Finally, the `clean_sales` list contains the cleaned data where all outliers are replaced with the nearest valid sales values.  
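
The search for the closest non-outlier can also be written without manually tracking the minimum difference. The sketch below is one possible variation, not the workshop's reference solution: it stores every difference in a list (the name `differences` is introduced here for illustration) and then uses `min()` and `index()` to pick the closest value.

```python
daily_sales = [200, 250, 3000, 200, 400, 350, 0]  # Same sample data as the exercise
common_min_sales = 20
common_max_sales = 1000

# Step 1: collect the values inside the accepted range
non_outliers = []
for sales in daily_sales:
    if common_min_sales <= sales <= common_max_sales:
        non_outliers.append(sales)

# Step 2: keep in-range values and replace the rest
clean_sales = []
for sales in daily_sales:
    if common_min_sales <= sales <= common_max_sales:
        clean_sales.append(sales)
    else:
        # Distance from this outlier to every non-outlier
        differences = []
        for value in non_outliers:
            differences.append(abs(sales - value))
        # index() returns the first position of the smallest difference
        closest_position = differences.index(min(differences))
        clean_sales.append(non_outliers[closest_position])

print("Cleaned sales data:", clean_sales)
```

When two non-outliers are equally close, this version keeps the one that appears first in the list, which matches the behavior of the loop in Solution 3.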

#### Example: Categorizing Stores

We can use loops to iterate through dictionaries, making it easy to analyze or update store information.

In [None]:
store_metrics = {
    "Edmonton": {
        "revenue": 89000.75,
        "customer_count": 900,
        "top_products": {"Camera", "Smartwatch", "Laptop"},
        "quarterly_sales": [500, 400, 650, 700],
    },
    "Calgary": {
        "revenue": 120000.50,
        "customer_count": 1200,
        "top_products": {"Laptop", "Smartphone", "Tablet"},
        "quarterly_sales": [600, 550, 750, 600],
    },
    "Red_Deer": {
        "revenue": 45000.25,
        "customer_count": 510,
        "top_products": {"Tablet", "Smartphone", "Laptop"},
        "quarterly_sales": [150, 200, 300, 350],
    },
}

# Iterate over the dictionary using the keys
# (target_revenue is assumed to be defined in an earlier cell)
for store in store_metrics:
    revenue = store_metrics[store]["revenue"]
    if revenue >= target_revenue:
        bonus_percent = 0.15
    elif revenue > 0.85 * target_revenue:
        bonus_percent = 0.05
    else:
        bonus_percent = 0.0

    # Add bonus information to the store's data
    store_metrics[store]["bonus"] = bonus_percent

print("Updated store data:")
print(store_metrics)

#### 🧑‍💻 Exercise: Finding the Store with the Highest Total Sales

In [None]:
highest_sales = 0
top_store = None  # Not identified yet

# <YOUR CODE HERE>

# Iterate over the dictionary using the key-value pairs (items)
for store, metrics in store_metrics.items():
    total_sales = sum(metrics["quarterly_sales"])
    if total_sales > highest_sales:
        highest_sales = total_sales
        top_store = store  # Update the top store

print("Store with the highest sales:", top_store)

##### How It Works

The goal of the `for` loop in this example is to identify the store in the `store_metrics` dictionary with the highest total quarterly sales. Here's how it works:

1. **Iterating Through the Dictionary**
    - The `for` loop iterates through each key-value pair in the `store_metrics` dictionary. The `items()` method allows the loop to process all store data systematically.
    - `store`: Represents the unique key (name) for each store (e.g., `"Edmonton"`).
    - `metrics`: Represents the associated nested dictionary containing store details like revenue, quarterly sales, etc.
2. **Evaluating the Condition**
    - Inside the loop, the program uses an `if` statement to compare the `sum()` of `quarterly_sales` of the current store with the current value of `highest_sales`.
    > **Note**: The initial value of `highest_sales` is `0`, so the first store's total sales satisfies the condition whenever that total is positive, and the first store becomes the top store so far.
3. **Updating the Top Store**
    - If the current store sells more than the previous top store, the program updates the `highest_sales` and `top_store` variables.
    - This ensures that by the end of the loop, the variables hold the total sales and the name of the top store.
4. **After the Loop**
    - After iterating through all stores, `top_store` contains the key of the top store.

This example demonstrates how `for` loops and conditional logic work together to analyze datasets and extract meaningful insights.
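
Starting `highest_sales` at `0` works when every total is positive, as in `store_metrics`. If totals could be zero or negative, a slightly more defensive sketch starts from `None` instead. The `net_sales` dictionary and its store names below are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical data: net quarterly sales after returns, so totals can be negative
net_sales = {
    "Store_A": [-50, -20, -10, -5],
    "Store_B": [-10, -5, -5, -5],
}

highest_sales = None  # No total has been seen yet
top_store = None

for store, quarterly in net_sales.items():
    total_sales = sum(quarterly)
    # The first store always becomes the provisional leader
    if highest_sales is None or total_sales > highest_sales:
        highest_sales = total_sales
        top_store = store

print("Store with the highest sales:", top_store)
```

With a starting value of `0`, neither store in this made-up dataset would pass the comparison, and `top_store` would stay `None`.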

#### Example: Finding the Top Product(s) Sold in All Stores

In [None]:
all_top_products = []  # An empty list to collect each store's set of top products

# Iterate over each store's metrics only (i.e., the dictionary values)
for metrics in store_metrics.values():
    top_product = metrics["top_products"]
    all_top_products.append(top_product)

print("Top products in all stores:", all_top_products)

In [None]:
common_products = all_top_products[0]  # Initialize with the first store's data
for products in all_top_products[1:]:  # Iterate over the remaining stores
    common_products = common_products & products  # Find the intersection

print("The top product(s) sold in all stores:", common_products)
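
The intersection loop above can also be expressed in a single call. This sketch assumes the list of sets is not empty and rebuilds the three `top_products` sets by hand so that it runs on its own.

```python
# The same three sets of top products that the loop collects from store_metrics
all_top_products = [
    {"Camera", "Smartwatch", "Laptop"},
    {"Laptop", "Smartphone", "Tablet"},
    {"Tablet", "Smartphone", "Laptop"},
]

# The * operator passes each set in the list as a separate argument
common_products = set.intersection(*all_top_products)

print("The top product(s) sold in all stores:", common_products)
```

The loop version is easier to follow step by step, while the one-call version is shorter once set operations feel familiar.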

#### Example: Finding Cumulative Sales

In [None]:
daily_sales = [200, 250, 300, 200, 400, 350, 450]
num_of_days = len(daily_sales)

cumulative_sales = 0  # Running total; threshold is assumed to be defined in an earlier cell

for index in range(0, num_of_days):
    cumulative_sales += daily_sales[index]
    if cumulative_sales >= threshold:
        print("Reached the threshold on day", index + 1)
        break  # Exit the loop

##### How It Works

This example calculates cumulative sales over several days and checks when the total reaches a specified threshold. Here's a breakdown of how the `for` loop and its components work:

1. **Iterating with `range()`**  
   - The `range(start, stop)` function generates a sequence of numbers starting at `start` (inclusive) and ending before `stop` (exclusive). If `start` is omitted, it defaults to `0`.
   - In this loop, `index` takes on each value from `0` to `num_of_days - 1`, representing the position (index) of the current day in the `daily_sales` list.
   - The `range()` function is particularly useful when you need a series of numbers to control iterations, especially when working with indexed data like lists.

2. **Exiting the Loop with `break`**  
   - The `break` statement is used to exit the loop immediately once a certain condition is met.  
   - This prevents unnecessary iterations after achieving the goal, making the program more efficient. Without the `break`, the loop would continue until all days are processed, even though the threshold had already been reached.
   - In this example, it stops the loop as soon as `cumulative_sales` reaches or passes `threshold`.

This combination of `range()` and `break` demonstrates how to efficiently process indexed data and terminate iterations based on dynamic conditions.
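
To see both pieces in isolation, the following sketch prints what `range()` generates and then reruns the cumulative sales loop. The `threshold` value of `1000` and the `day_reached` variable are assumptions made for this illustration only.

```python
# range(start, stop) stops just before `stop`
print(list(range(0, 4)))  # [0, 1, 2, 3]
print(list(range(4)))     # Omitting start gives the same sequence

daily_sales = [200, 250, 300, 200, 400, 350, 450]
threshold = 1000  # Assumed value for illustration only

cumulative_sales = 0
day_reached = None  # Stays None if the threshold is never reached

for index in range(len(daily_sales)):
    cumulative_sales += daily_sales[index]
    if cumulative_sales >= threshold:
        day_reached = index + 1  # Convert the 0-based index to a day number
        break  # Skip the remaining days

print("Reached the threshold on day", day_reached)
```

With this assumed threshold, the loop stops on day 5 and never reads the last two values in the list.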

#### Example: Separating Workday and Weekend Sales

In [None]:
daily_sales = [200, 250, 300, 400, 350, 450, 500]
workday_sales = 0
weekend_sales = 0

for index in range(len(daily_sales)):  # Equivalent to range(0, len(daily_sales))
    if index in range(1, 6):  # Workdays (indices 1-5); indices 0 and 6 are treated as the weekend
        workday_sales += daily_sales[index]
    else:
        weekend_sales += daily_sales[index]

print("Total workday sales:", workday_sales)
print("Total weekend sales:", weekend_sales)

#### 🧑‍💻 Exercise: Identifying the Product with the Highest Total Sales

In [None]:
# Daily sales by product
daily_sales_by_product = {
    "Laptop": [50, 60, 70, 40, 100, 90, 120],
    "Tablet": [30, 40, 50, 30, 80, 70, 100],
    "Smartphone": [70, 80, 100, 60, 120, 110, 150],
    "Camera": [50, 70, 80, 70, 100, 80, 80],
}

# Step 1: Create a dictionary to store the total sales for each product

# <YOUR CODE HERE>

total_sales_by_product = {}
for product, sales in daily_sales_by_product.items():
    total_sales_by_product[product] = sum(sales)

print("Total sales by product:", total_sales_by_product)

# Step 2: Identify the product with the highest sales

# <YOUR CODE HERE>

max_total_sales = max(total_sales_by_product.values())
for product, sales in total_sales_by_product.items():
    if sales == max_total_sales:
        top_product = product
        break

print("Product with the highest sales:", top_product)

#### Example: Finding Average Growth Rate

In [None]:
store_metrics = {
    "Edmonton": {
        "revenue": 89000.75,
        "customer_count": 900,
        "top_products": {"Camera", "Smartwatch", "Laptop"},
        "quarterly_sales": [500, 400, 650, 700],
    },
    "Calgary": {
        "revenue": 120000.50,
        "customer_count": 1200,
        "top_products": {"Laptop", "Smartphone", "Tablet"},
        "quarterly_sales": [600, 550, 750, 600],
    },
    "Red_Deer": {
        "revenue": 45000.25,
        "customer_count": 510,
        "top_products": {"Tablet", "Smartphone", "Laptop"},
        "quarterly_sales": [150, 200, 300, 350],
    },
}

print("Average quarterly sales growth rates (%):")
for store, metrics in store_metrics.items():
    quarterly_sales = metrics["quarterly_sales"]
    rates = []
    for i in range(1, len(quarterly_sales)):
        growth_rate = (quarterly_sales[i] - quarterly_sales[i - 1]) / quarterly_sales[i - 1]
        rates.append(growth_rate)

    average_rate = sum(rates) / len(rates)
    print(store, ":", average_rate * 100)
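
To check the calculation by hand, the sketch below repeats the inner loop for Edmonton only and prints each quarter-over-quarter rate. The variable names `previous` and `current` are introduced here for readability. Note that the result is the arithmetic mean of the individual rates, not a compound growth rate.

```python
quarterly_sales = [500, 400, 650, 700]  # Edmonton's figures from store_metrics

rates = []
for i in range(1, len(quarterly_sales)):
    previous = quarterly_sales[i - 1]
    current = quarterly_sales[i]
    growth_rate = (current - previous) / previous  # Change relative to the prior quarter
    rates.append(growth_rate)
    print("Q", i, "to Q", i + 1, ":", round(growth_rate * 100, 1), "%")

average_rate = sum(rates) / len(rates)
print("Average growth rate (%):", round(average_rate * 100, 1))
```

Sales drop from 500 to 400 in the first step (a rate of -20%), then rise by 62.5% and by roughly 7.7%, which averages to about 16.7%.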

### Recap: Why Loops Matter in Data Analytics

In this section, we explored two essential loop structures in Python:

1. **`for` Loops**  
   - Used to iterate over a sequence of items (like lists, dictionaries, or ranges).  
   - Ideal when you know the number of iterations in advance.  
   - Example Use Case: Calculating total sales from a list of daily sales.  

2. **`while` Loops**  
   - Repeats a block of code as long as a condition is `True`.  
   - Useful for tasks where the number of iterations depends on a dynamic condition.  
   - Example Use Case: Processing data until a threshold is met.
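
The two use cases above can be sketched side by side. The `target` value of `900` is a hypothetical figure chosen for this illustration, and the sales list reuses the sample data from the cumulative sales example.

```python
daily_sales = [200, 250, 300, 200, 400, 350, 450]

# for loop: visit every item exactly once
total_sales = 0
for sales in daily_sales:
    total_sales += sales

# while loop: keep going until a condition stops being True
target = 900  # Hypothetical target for illustration
running_total = 0
days_needed = 0
while running_total < target and days_needed < len(daily_sales):
    running_total += daily_sales[days_needed]
    days_needed += 1

print("Total sales:", total_sales)
print("Days needed to reach the target:", days_needed)
```

The second condition in the `while` loop guards against running past the end of the list if the target is never reached.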

Here is a summary of key differences between `for` and `while` loops:

| Feature             | `for` Loop                                | `while` Loop                                |
|---------------------|-------------------------------------------|--------------------------------------------|
| Use Case            | Iterating over a sequence (e.g., `list`, `range`)    | Repeating tasks until a condition is met   |
| Control Flow        | Sequential, controlled by the number of iterations     | Conditional, controlled by logical evaluation                 |
| Syntax Simplicity   | Easier for fixed iterations               | Better for dynamic stopping conditions     |


Loops are fundamental tools for automating repetitive tasks and processing datasets in data analytics. With loops, you can iterate through lists, process dictionaries, and dynamically adjust workflows, allowing you to:

- **Automate repetitive tasks**: Analyze, categorize, and process large datasets efficiently.
- **Build dynamic workflows**: Adjust or terminate loops based on the data as it is processed.
- **Scale your analysis**: Apply the same operations to thousands or even millions of data points with minimal extra code.

---

## Conclusion

Congratulations on completing Part I of the Python for Data Analytics workshop! 🎉  

### What We Covered Today
1. **Programming Basics**: 
   - Variables, data types, and basic operations.
2. **Data Structures**:
   - Lists, sets, and dictionaries for organizing and accessing data efficiently.
3. **Flow Control**:
   - Conditional statements for decision-making.
   - Loops for automating repetitive tasks.

These foundational concepts are the building blocks of Python, setting the stage for more advanced analytics tasks.


### Looking Ahead to Part II

In the next session, we'll dive deeper into Python's capabilities, focusing on:
1. **Functions**: Writing reusable and modular code.
2. **Data Processing with Python Libraries**: Working with tools like `pandas` for cleaning and transforming datasets.
3. **Data Visualization**: Creating impactful charts and graphs to uncover trends and tell stories with your data.

We look forward to seeing you in Part II! Until then, keep practicing and exploring! 🚀