# Lecture 4 



## 1) Recap and more


In [3]:
# Loop → filter (>100) + transform (add 5% tax)
sales = [120, 80, 150, 95, 220, 60, 175]

# Let's keep only sales over $100 and add a 5% tax
sales_with_tax = []
for sale in sales:
    if sale > 100:  # FILTER: Only sales above $100
        taxed_sale = sale * 1.05  # TRANSFORM: Add 5% tax
        sales_with_tax.append(round(taxed_sale, 2))

print("Loop version:", sales_with_tax)

Loop version: [126.0, 157.5, 231.0, 183.75]


In [1]:
# Basic dictionary access
person = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

# Accessing values by key
print(person["name"])  # Output: Alice
print(person["age"])   # Output: 30

# Using the .get() method (safe access)
print(person.get("city"))        # Output: New York
print(person.get("country"))     # Output: None (key doesn't exist)

# Iterating over dictionary keys and values
for key in person:
    print(f"{key}: {person[key]}")

# Or using .items() for key-value pairs
for key, value in person.items():
    print(f"{key}: {value}")

# Adding and updating dictionary entries
person["email"] = "alice@example.com"
person["age"] = 31  # Update age

print(person)

# Removing dictionary entries
del person["city"]  # raises KeyError if the key is missing
removed_value = person.pop("email", "not found")  # Remove email; returns the default if the key is missing
print(person)

Alice
30
New York
None
name: Alice
age: 30
city: New York
name: Alice
age: 30
city: New York
{'name': 'Alice', 'age': 31, 'city': 'New York', 'email': 'alice@example.com'}
{'name': 'Alice', 'age': 31}
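
As a small extension of the safe-access idea above, the sketch below shows `.get()` with a fallback value, contrasted with square-bracket access and a membership test. The fallback string `"Unknown"` is only an illustrative choice, not part of the lecture data.

```python
person = {"name": "Alice", "age": 30, "city": "New York"}

# .get() takes an optional second argument that is returned when the key is missing
country = person.get("country", "Unknown")  # "Unknown" is an illustrative default
print(country)

# Square-bracket access raises KeyError for a missing key
try:
    person["country"]
except KeyError as err:
    print("Missing key:", err)

# The `in` operator checks whether a key exists without reading its value
has_city = "city" in person
print(has_city)
```

Use `.get()` when a missing key is expected and a default makes sense; use square brackets when a missing key should be treated as an error.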


## 2) List Comprehensions — Compact Loop Patterns
Rewrite the loop from the recap as a single, more Pythonic expression.

In [21]:
# Pattern: [ <expression> for <item> in <list> ]

# <expression>: what you want to do to each item.
# <item>: each element from your list.
# <list>: the original list.

### Loop vs. List Comprehension — Side-by-Side Comparison

| **Operation Type** | **Loop Version** | **List Comprehension Version** |
|---------------------|------------------|--------------------------------|
| **Transform (apply a function or formula)** | `new = []`<br>`for x in xs:`<br>&nbsp;&nbsp;&nbsp;&nbsp;`new.append(f(x))` | `[f(x) for x in xs]` |
| **Filter (keep items meeting a condition)** | `new = []`<br>`for x in xs:`<br>&nbsp;&nbsp;&nbsp;&nbsp;`if cond(x):`<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`new.append(x)` | `[x for x in xs if cond(x)]` |
| **Filter + Transform (apply only to some items)** | `new = []`<br>`for x in xs:`<br>&nbsp;&nbsp;&nbsp;&nbsp;`if cond(x):`<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`new.append(f(x))` | `[f(x) for x in xs if cond(x)]` |
| **Conditional Expression (if–else inline)** | `new = []`<br>`for x in xs:`<br>&nbsp;&nbsp;&nbsp;&nbsp;`if cond(x):`<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`new.append(A)`<br>&nbsp;&nbsp;&nbsp;&nbsp;`else:`<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`new.append(B)` | `[A if cond(x) else B for x in xs]` |
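
The four patterns in the table can be tried on a small list. This is a minimal sketch: `f` (square the number) and `cond` (is the number even) are hypothetical stand-ins for whatever transform and condition you need.

```python
xs = [1, 2, 3, 4, 5, 6]

def f(x):
    """Hypothetical transform: square the number."""
    return x * x

def cond(x):
    """Hypothetical condition: keep even numbers."""
    return x % 2 == 0

transformed = [f(x) for x in xs]                        # transform every item
filtered = [x for x in xs if cond(x)]                   # keep only matching items
both = [f(x) for x in xs if cond(x)]                    # filter, then transform
labeled = ["even" if cond(x) else "odd" for x in xs]    # one label per item

# The loop version of "filter + transform" builds the same list
loop_both = []
for x in xs:
    if cond(x):
        loop_both.append(f(x))

print(transformed)
print(filtered)
print(both)
print(labeled)
print(loop_both == both)
```

Note where the `if` sits: a trailing `if` drops items, while `A if cond else B` before the `for` keeps every item and only chooses its value.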



In [4]:
# Transform (rounded to 1 decimal here; the recap loop rounded to 2)
transformed_sales_comp = [round(sale * 1.05, 1) for sale in sales]
print("Transform only (comprehension):", transformed_sales_comp)





Transform only (comprehension): [126.0, 84.0, 157.5, 99.8, 231.0, 63.0, 183.8]


In [5]:
# Filter + transform (rounds to 1 decimal, so the loop's 183.75 appears as 183.8)
sales = [120, 80, 150, 95, 220, 60, 175]
sales_with_tax_comp = [round(sale * 1.05, 1) for sale in sales if sale > 100]
print("Filter + transform (comp):", sales_with_tax_comp)

Filter + transform (comp): [126.0, 157.5, 231.0, 183.8]


In [5]:
# Conditional inside comprehension
labels = ["High" if sale > 100 else "Low" for sale in sales]
print("Labels:", labels)

Labels: ['High', 'Low', 'High', 'Low', 'High', 'Low', 'High']


## 3) String Manipulation (Cleaning Text Data)
Text cleaning is everywhere in analytics. Focus on: `.strip()`, `.lower()`, `.replace()`, `.split()`, `' '.join()`.

| Method           | Purpose                            | Example                  | Result            |
| ---------------- | ---------------------------------- | ------------------------ | ----------------- |
| `.strip()`       | Remove whitespace from both ends   | `"  hello ".strip()`     | `"hello"`         |
| `.lower()`       | Convert to lowercase               | `"HeLLo".lower()`        | `"hello"`         |
| `.replace(a, b)` | Replace one substring with another | `"A_B".replace("_"," ")` | `"A B"`           |
| `.split()`       | Split into a list of words         | `"A B C".split()`        | `['A', 'B', 'C']` |
| `.join(list)`    | Combine a list back into a string  | `" ".join(['A','B'])`    | `"A B"`           |
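
The methods in the table can be applied one after another. The sketch below uses a made-up string (`raw` is illustrative, not lecture data) and shows that each method returns a new string, because Python strings are immutable.

```python
raw = "\t  Quarterly_Sales_REPORT \n"   # illustrative messy input

stripped = raw.strip()                  # removes spaces, tabs, and newlines at both ends
lowered = stripped.lower()              # lowercase copy
spaced = lowered.replace("_", " ")      # underscores become spaces
words = spaced.split()                  # split on whitespace into a list
rejoined = "-".join(words)              # the separator string goes before .join

print(repr(stripped))
print(repr(lowered))
print(repr(spaced))
print(words)
print(rejoined)
print(repr(raw))                        # the original string is unchanged
```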


In [6]:
### STRING MANIPULATION (CLEANING AND REFORMATTING BUSINESS TEXT DATA)


# messy email data

emails = [" Alice@Company.com ", "bob@business.org", "  CHARLIE@SHOP.net"]


# Issues to fix:
# - Extra spaces before or after
# - Inconsistent capitalization (some uppercase, some lowercase)

print("Raw emails:", emails)

# Cleaning step-by-step: using .strip() and .lower()


# .strip() → removes leading and trailing spaces
# .lower() → converts entire string to lowercase (for uniformity)

# Let’s first test this for one email 
example_email = "  CHARLIE@SHOP.net  "
print("Before cleaning:", example_email)
print("After strip:", example_email.strip())
print("After strip + lower:", example_email.strip().lower())






Raw emails: [' Alice@Company.com ', 'bob@business.org', '  CHARLIE@SHOP.net']
Before cleaning:   CHARLIE@SHOP.net  
After strip: CHARLIE@SHOP.net
After strip + lower: charlie@shop.net


In [7]:
# Apply to all emails using a list comprehension


# Clean all emails
cleaned_emails = [email.strip().lower() for email in emails]
print("Cleaned emails:", cleaned_emails)

Cleaned emails: ['alice@company.com', 'bob@business.org', 'charlie@shop.net']
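
Once the emails are clean, `.split()` with an explicit separator can pull them apart. This is a small sketch that assumes every email contains exactly one `"@"`; real data may need a check for that first.

```python
cleaned_emails = ["alice@company.com", "bob@business.org", "charlie@shop.net"]

# split("@") returns [username, domain] when there is exactly one "@" (assumed here)
usernames = [email.split("@")[0] for email in cleaned_emails]
domains = [email.split("@")[1] for email in cleaned_emails]

print("Usernames:", usernames)
print("Domains:", domains)
```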


In [8]:

# using replace, split, and join

# Cleaning a company string
raw_string = " ACME_CORP_2025 "
print("Raw company string:", repr(raw_string))

clean = raw_string.strip()
print("After strip:", repr(clean))

clean = clean.replace("_", " ")
print("After replace:", repr(clean))

parts = clean.split()
print("After split:", parts)

# Remove the year
company_no_year = parts[:-1]
print("Without year:", company_no_year)

final_name = " ".join(company_no_year)
print("Clean company name:", final_name)



Raw company string: ' ACME_CORP_2025 '
After strip: 'ACME_CORP_2025'
After replace: 'ACME CORP 2025'
After split: ['ACME', 'CORP', '2025']
Without year: ['ACME', 'CORP']
Clean company name: ACME CORP
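
The same cleaning steps can be chained into fewer lines. The sketch below also adds one guard that is an assumption on my part: it drops the last part only when it is all digits, so a name without a trailing year is left intact. The second input string is hypothetical.

```python
def clean_company(raw_string):
    """Strip, replace underscores, and drop a trailing all-digit part (assumed to be a year)."""
    parts = raw_string.strip().replace("_", " ").split()
    if parts and parts[-1].isdigit():   # assumption: an all-digit last part is a year
        parts = parts[:-1]
    return " ".join(parts)

with_year = clean_company(" ACME_CORP_2025 ")
without_year = clean_company(" GLOBEX_INC ")   # hypothetical input with no year

print(with_year)
print(without_year)
```

Slicing with `parts[:-1]` alone would also remove `INC` from the second string, which is why the digit check is there.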


## 4) Iterating Nested Business Data 
Real data often comes as a **list of dictionaries**, where each dict is one record.

In [9]:
# Dictionary iteration
product_prices = {"Shoes": 79.5, "Bags": 120.0, "Watches": 249.0}
for product, price in product_prices.items():
    print(f"{product}: ${price:.2f}")

# List of dicts → per-row totals
transactions = [
    {"sku":"A101", "units":5,  "price":20},
    {"sku":"B202", "units":2,  "price":120},
    {"sku":"C303", "units":8,  "price":15},
    {"sku":"D404", "units":3,  "price":40}
]

row_totals = []
for transaction in transactions:
    total = transaction["units"] * transaction["price"]
    row_totals.append(total)
print("Row totals:", row_totals)

# Filter rows by business rule (total >= 100; every row in this sample qualifies)
kept_transactions = [transaction for transaction in transactions if (transaction['units'] * transaction['price']) >= 100]
print("Kept rows (total>=100):", kept_transactions)

Shoes: $79.50
Bags: $120.00
Watches: $249.00
Row totals: [100, 240, 120, 120]
Kept rows (total>=100): [{'sku': 'A101', 'units': 5, 'price': 20}, {'sku': 'B202', 'units': 2, 'price': 120}, {'sku': 'C303', 'units': 8, 'price': 15}, {'sku': 'D404', 'units': 3, 'price': 40}]
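
With a cutoff of 100, every sample row passes the filter. The sketch below uses a stricter cutoff of 120 (a hypothetical value chosen only so the filter visibly drops a row) and also sums the row totals.

```python
transactions = [
    {"sku": "A101", "units": 5, "price": 20},
    {"sku": "B202", "units": 2, "price": 120},
    {"sku": "C303", "units": 8, "price": 15},
    {"sku": "D404", "units": 3, "price": 40},
]

threshold = 120  # hypothetical stricter cutoff

# Keep only the rows whose total meets the cutoff
large = [t for t in transactions if t["units"] * t["price"] >= threshold]
large_skus = [t["sku"] for t in large]

# sum() accepts a generator expression, so no intermediate list is needed
grand_total = sum(t["units"] * t["price"] for t in transactions)

print("SKUs kept:", large_skus)
print("Grand total:", grand_total)
```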


## 5) From Records to DataFrames 
A `list of dicts` maps naturally to a DataFrame (each dict → row). We won’t go deep here; this just shows the idea.

In [None]:
# Uncomment when pandas is available in your environment
# import pandas as pd
# df = pd.DataFrame(transactions)
# df
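
To make the mapping concrete, here is a hedged sketch. The guarded import is my own addition so the cell still runs where pandas is not installed; with pandas present, each dict becomes a row and each key becomes a column, and the per-row total is computed for the whole column at once.

```python
transactions = [
    {"sku": "A101", "units": 5, "price": 20},
    {"sku": "B202", "units": 2, "price": 120},
    {"sku": "C303", "units": 8, "price": 15},
    {"sku": "D404", "units": 3, "price": 40},
]

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas not installed; fall back to plain Python below

if pd is not None:
    df = pd.DataFrame(transactions)          # each dict -> one row, each key -> one column
    df["total"] = df["units"] * df["price"]  # column arithmetic, no explicit loop
    totals = df["total"].tolist()
else:
    totals = [t["units"] * t["price"] for t in transactions]

print(totals)
```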

## 6) Functions

In [17]:

# Before: the repetitive calculation written inline (filter + transform pattern)
transactions = [
    {"sku":"A101", "units":5,  "price":20},
    {"sku":"B202", "units":2,  "price":120},
    {"sku":"C303", "units":8,  "price":15},
    {"sku":"D404", "units":3,  "price":40}
]

discount = 0.10
tax = 0.08
final_revenues = []
for transaction in transactions:
    raw_total = transaction["units"] * transaction["price"]
    if raw_total >= 120:
        final = raw_total * (1 - discount) * (1 + tax)
        final_revenues.append(round(final, 2))
print("Finals (before):", final_revenues)

Finals (before): [233.28, 116.64, 116.64]


### A function is a reusable block of code that performs a specific task.

### Why use a function?
- **Modularity**: the business rule (filter) + transform live in one place.
- **Reusability**: call it anywhere (loops, list comprehensions, future DataFrame `.apply`).
- **Maintainability**: if discount changes to 12%, update **one** function.

In [13]:
def add(a, b):                       # def starts the function definition; return gives back the result.
    result = a + b
    return result

# Using (calling) the function:
sum1 = add(2, 3)
sum2 = add(10, 5)
print(sum1, sum2)  # Output: 5 15

5 15


In [14]:
# You can set default values so callers don’t have to pass every argument:

def final_price(price, discount=0.1, tax=0.08):
    return price * (1 - discount) * (1 + tax)

print(final_price(100))         # uses default discount and tax
print(final_price(100, 0.2))    # uses default tax, but 20% discount

97.2
86.4
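
Defaults can also be overridden by name, which lets you change `tax` without restating `discount`. This is a small sketch using the same `final_price` definition; results are rounded before printing because floating-point arithmetic can leave tiny trailing digits.

```python
def final_price(price, discount=0.1, tax=0.08):
    return price * (1 - discount) * (1 + tax)

# Keyword arguments: override only the parameter you name
lower_tax = final_price(100, tax=0.05)        # keeps the default 10% discount
no_discount = final_price(100, discount=0)    # keeps the default 8% tax

print(round(lower_tax, 2))
print(round(no_discount, 2))
```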


**Now refactor into a function**: easier to reuse, test, and change once.

In [18]:
# After (using a function):
def calculate_final_revenue(units, price, discount=0.10, tax=0.08):
    raw_total = units * price
    if raw_total < 120:
        return None
    return round(raw_total * (1 - discount) * (1 + tax), 2)

final_revenues_func = []
for transaction in transactions:
    result = calculate_final_revenue(transaction["units"], transaction["price"])
    if result is not None:
        final_revenues_func.append(result)
print("Finals (after):", final_revenues_func)

Finals (after): [233.28, 116.64, 116.64]
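
One way to see the "test" and "change once" benefits is to check the function against known cases and then pass a different discount. This is a minimal sketch using plain `assert` statements; the 12% discount is the hypothetical change mentioned in the bullet list above.

```python
def calculate_final_revenue(units, price, discount=0.10, tax=0.08):
    raw_total = units * price
    if raw_total < 120:
        return None
    return round(raw_total * (1 - discount) * (1 + tax), 2)

# Quick checks against the values printed in the notebook
assert calculate_final_revenue(5, 20) is None      # raw total 100 is below the cutoff
assert calculate_final_revenue(2, 120) == 233.28
assert calculate_final_revenue(8, 15) == 116.64

# Changing the discount needs no edit to the loop, only a different argument
with_12_percent = calculate_final_revenue(2, 120, discount=0.12)
print(with_12_percent)
```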


In [19]:
# List comprehension with function:
finals_comp = [calculate_final_revenue(t["units"], t["price"]) for t in transactions]
finals_comp = [x for x in finals_comp if x is not None]
print("Finals (comp + function):", finals_comp)

Finals (comp + function): [233.28, 116.64, 116.64]
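
An alternative design, sketched here with hypothetical helper names (`raw_total`, `qualifies`, `final_revenue`), separates the business rule from the calculation. The comprehension then matches the `[f(x) for x in xs if cond(x)]` pattern directly, and no `None` values need to be filtered out afterwards.

```python
transactions = [
    {"sku": "A101", "units": 5, "price": 20},
    {"sku": "B202", "units": 2, "price": 120},
    {"sku": "C303", "units": 8, "price": 15},
    {"sku": "D404", "units": 3, "price": 40},
]

def raw_total(t):
    """Units times price for one transaction dict."""
    return t["units"] * t["price"]

def qualifies(t, minimum=120):
    """Business rule (filter): keep transactions at or above the minimum total."""
    return raw_total(t) >= minimum

def final_revenue(t, discount=0.10, tax=0.08):
    """Transform: apply discount, then tax, rounded to cents."""
    return round(raw_total(t) * (1 - discount) * (1 + tax), 2)

finals = [final_revenue(t) for t in transactions if qualifies(t)]
print("Finals (split helpers):", finals)
```

Whether to return `None` from one function or split the rule into its own helper is a judgment call; the split version keeps each function to a single job.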
