# TechFlow Python Foundations - Module 0.2
## Strings and Text Operations

**Your Role:** Future Data Analyst at TechFlow (B2B SaaS Company)

**Your Mission:** Master text manipulation - essential for data cleaning.

**Why this matters:**
- Most real-world datasets contain text that needs cleaning
- Customer names, emails, industries - all strings
- Inconsistent text ("Healthcare" vs "HEALTHCARE" vs " healthcare ") breaks analysis
- String skills = data cleaning skills

**This module covers:**
- Creating and combining strings
- Accessing parts of strings (indexing & slicing)
- String methods for cleaning and transformation
- Finding and replacing text
- Splitting and joining strings

**Time to complete:** ~45 minutes

---

# PART 1: Creating Strings

Strings are sequences of characters - letters, numbers, symbols, spaces.

In Python, strings are surrounded by quotes (single or double).

**Single vs double quotes**

Both work identically. Choose one style and be consistent.

```python
company1 = 'TechFlow'
company2 = "TechFlow"
print(company1)
print(company2)
print(company1 == company2)  # They're equal!
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**When to use which quote**

Use double quotes when your text contains an apostrophe.
Use single quotes when your text contains double quotes.

```python
# Apostrophe in text - use double quotes
message1 = "It's a great product"
print(message1)

# Quote in text - use single quotes
message2 = 'He said "Hello"'
print(message2)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Escape characters**

Use backslash `\` for special characters:
- `\'` - single quote
- `\"` - double quote
- `\n` - new line
- `\t` - tab
- `\\` - backslash itself

```python
# Escape quote inside same quote type
message = 'It\'s working!'
print(message)

# New line
two_lines = "Line 1\nLine 2"
print(two_lines)

# Tab
tabbed = "Name:\tTechFlow"
print(tabbed)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Multi-line strings**

Use triple quotes for text that spans multiple lines.

```python
description = """TechFlow is a B2B SaaS company.
We help businesses manage their data.
Our customers span multiple industries."""

print(description)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Empty string**

An empty string has zero characters. It's still a valid string!

```python
empty = ""
print(f"Content: '{empty}'")
print(f"Length: {len(empty)}")
print(f"Is empty string: {empty == ''}")
print(f"Boolean value: {bool(empty)}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Raw strings**

Prefix with `r` to treat backslashes literally (useful for file paths).

```python
# Normal string - \n becomes newline
path1 = "C:\new_folder\names.csv"
print(f"Normal: {path1}")

# Raw string - \n stays as \n
path2 = r"C:\new_folder\names.csv"
print(f"Raw: {path2}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
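
One way to see what the `r` prefix changes is to compare string lengths. A small sketch; the path is made up for illustration:

```python
# Hypothetical Windows-style path, used only for illustration
normal = "C:\new"    # \n is read as ONE newline character
raw = r"C:\new"      # the backslash and the n stay as two characters

print(len(normal))     # 5
print(len(raw))        # 6
print("\n" in normal)  # True: the string contains a real newline
print("\\" in raw)     # True: the raw string contains a literal backslash
```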


---
# PART 2: String Operations

Basic operations: length, concatenation, repetition.

**String length**

`len()` returns the number of characters (including spaces!).

```python
company = "TechFlow"
print(f"'{company}' has {len(company)} characters")

with_spaces = "Tech Flow Inc"
print(f"'{with_spaces}' has {len(with_spaces)} characters")

empty = ""
print(f"Empty string has {len(empty)} characters")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Concatenation (joining strings)**

Use `+` to join strings together.

```python
first = "Tech"
second = "Flow"

# Simple concatenation
combined = first + second
print(combined)

# With space
full_name = first + " " + second + " Inc"
print(full_name)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Concatenation with variables**

```python
first_name = "John"
last_name = "Doe"
title = "Manager"

# Building a full name
full_name = first_name + " " + last_name
print(full_name)

# Building a display string
display = full_name + " (" + title + ")"
print(display)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
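
One catch with `+`: it only joins strings to strings. The sketch below (the sample values are made up) shows two ways to include a number: convert it with `str()`, or use an f-string, the `f"...{value}..."` form that this module's `print()` calls already rely on.

```python
customer = "Acme Corp"   # hypothetical sample values
seats = 25

# customer + " has " + seats   # Would raise TypeError: str and int cannot be joined with +

# Option 1: convert the number to a string first
with_str = customer + " has " + str(seats) + " seats"
print(with_str)

# Option 2: an f-string converts whatever is inside {} to text for you
with_fstring = f"{customer} has {seats} seats"
print(with_fstring)

print(with_str == with_fstring)  # True
```

For anything longer than two or three pieces, the f-string form is usually easier to read.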


**String repetition**

Use `*` to repeat a string.

```python
# Create a divider line
line = "-" * 40
print(line)
print("CUSTOMER REPORT")
print(line)

# Create a pattern
pattern = "=-" * 20
print(pattern)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**String membership (in operator)**

Check if one string exists inside another.

```python
email = "analyst@techflow.com"

print(f"Contains '@': {'@' in email}")
print(f"Contains 'techflow': {'techflow' in email}")
print(f"Contains 'gmail': {'gmail' in email}")

# Case sensitive!
print(f"Contains 'TechFlow': {'TechFlow' in email}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 3: Indexing and Slicing

Access individual characters or portions of a string.

**Key concept:** Python uses zero-based indexing. The first character is at position 0.

```
String:     T   e   c   h   F   l   o   w
Index:      0   1   2   3   4   5   6   7
Negative:  -8  -7  -6  -5  -4  -3  -2  -1
```

**Access single character by index**

```python
company = "TechFlow"

print(f"First character: {company[0]}")
print(f"Second character: {company[1]}")
print(f"Fifth character: {company[4]}")
print(f"Last character: {company[7]}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Negative indexing (count from end)**

Use negative numbers to count from the end. -1 is the last character.

```python
company = "TechFlow"

print(f"Last character: {company[-1]}")
print(f"Second to last: {company[-2]}")
print(f"Third to last: {company[-3]}")
print(f"First (from end): {company[-8]}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Slicing: Get a range of characters**

Syntax: `string[start:end]`

**Important:** The end index is NOT included!

```python
company = "TechFlow"

# Get characters 0,1,2,3 (not 4)
print(f"[0:4] = {company[0:4]}")

# Get characters 4,5,6,7
print(f"[4:8] = {company[4:8]}")

# Get characters 2,3,4
print(f"[2:5] = {company[2:5]}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Slice shortcuts**

Omit start or end for convenience.

```python
company = "TechFlow"

# From start up to (not including) index 4
print(f"[:4] = {company[:4]}")

# From index 4 to end
print(f"[4:] = {company[4:]}")

# Copy entire string
print(f"[:] = {company[:]}")

# Last 4 characters
print(f"[-4:] = {company[-4:]}")

# All except last 4
print(f"[:-4] = {company[:-4]}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
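
Indexing and slicing behave differently past the end of a string: an out-of-range index raises `IndexError`, while a slice quietly returns whatever characters exist. A short sketch:

```python
company = "TechFlow"

# Slices never fail; they simply stop at the end of the string
print(company[4:100])        # Flow
print(repr(company[20:30]))  # '' (an empty string)

# A single index past the end raises IndexError
try:
    company[20]
except IndexError as error:
    print(f"IndexError: {error}")
```

This is convenient, but it also means a wrong slice can return an empty string without any warning.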


**Slice with step**

Syntax: `string[start:end:step]`

```python
alphabet = "ABCDEFGHIJ"

# Every character (step=1, default)
print(f"[::1] = {alphabet[::1]}")

# Every 2nd character
print(f"[::2] = {alphabet[::2]}")

# Every 3rd character
print(f"[::3] = {alphabet[::3]}")

# Reverse the string!
print(f"[::-1] = {alphabet[::-1]}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Practical slicing examples**

```python
# Extract parts of a date
date = "2024-01-15"
year = date[:4]
month = date[5:7]
day = date[8:]
print(f"Year: {year}, Month: {month}, Day: {day}")

# Extract domain from email
email = "analyst@techflow.com"
at_position = email.find("@")
domain = email[at_position + 1:]
print(f"Domain: {domain}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
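
The domain example assumes the email contains `@`. If it does not, `find()` returns -1, so `email[at_position + 1:]` becomes `email[0:]`, which is the whole string. A minimal sketch of a guarded version; the function name `extract_domain` is made up for illustration:

```python
def extract_domain(email):
    """Return the part after '@', or None if there is no '@'."""
    at_position = email.find("@")
    if at_position == -1:
        return None
    return email[at_position + 1:]

print(extract_domain("analyst@techflow.com"))  # techflow.com
print(extract_domain("not-an-email"))          # None
```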


---
# PART 4: String Methods - Case Conversion

**Methods** are functions that belong to strings. Call them with `.method()` after the string or variable name.

**Important:** String methods return NEW strings. The original is unchanged!

**Case conversion methods**

```python
text = "TechFlow Analytics"

print(f"Original: {text}")
print(f"upper(): {text.upper()}")
print(f"lower(): {text.lower()}")
print(f"title(): {text.title()}")
print(f"capitalize(): {text.capitalize()}")
print(f"swapcase(): {text.swapcase()}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Original string unchanged!**

Methods don't modify the original - they return a new string.

```python
original = "TechFlow"
result = original.upper()

print(f"Original: {original}")  # Still "TechFlow"
print(f"Result: {result}")      # "TECHFLOW"

# To update, reassign:
original = original.upper()
print(f"After reassign: {original}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
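
Methods return new strings because Python strings are immutable: their characters cannot be changed in place. A short sketch of what that means in practice:

```python
company = "techFlow"

try:
    company[0] = "T"   # strings do not support item assignment
except TypeError as error:
    print(f"TypeError: {error}")

# Build a new string instead and reassign the variable
company = "T" + company[1:]
print(company)  # TechFlow
```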


**Why case matters in data**

In real data, the same value appears in different cases. This breaks analysis!

```python
# These look the same to humans but NOT to Python
industry1 = "Healthcare"
industry2 = "healthcare"
industry3 = "HEALTHCARE"

print(f"Healthcare == healthcare: {industry1 == industry2}")
print(f"Healthcare == HEALTHCARE: {industry1 == industry3}")

# Solution: normalize before comparing
print(f"\nAfter .lower():")
print(f"{industry1.lower()} == {industry2.lower()}: {industry1.lower() == industry2.lower()}")
print(f"{industry1.lower()} == {industry3.lower()}: {industry1.lower() == industry3.lower()}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
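
To see how mixed case distorts a summary, the sketch below counts industries before and after normalizing. The sample list is made up, and `Counter` comes from Python's standard `collections` module:

```python
from collections import Counter

# Hypothetical sample values
industries = ["Healthcare", "healthcare", "HEALTHCARE", "Finance", "finance"]

raw_counts = Counter(industries)
clean_counts = Counter(name.lower() for name in industries)

print(len(raw_counts))             # 5 "different" industries
print(len(clean_counts))           # 2 after normalizing
print(clean_counts["healthcare"])  # 3
```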


---
# PART 5: String Methods - Whitespace

**Whitespace** = spaces, tabs, newlines. It is often invisible, but it causes problems!

**Remove whitespace**

```python
messy = "   TechFlow   "

print(f"Original: '{messy}'")
print(f"strip(): '{messy.strip()}'")
print(f"lstrip(): '{messy.lstrip()}'")
print(f"rstrip(): '{messy.rstrip()}'")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Strip other characters**

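Pass the characters to remove as an argument. `strip()` removes any of those characters from both ends of the string, never from the middle.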

```python
# Remove specific characters
data = "###TechFlow###"
print(f"Original: {data}")
print(f"strip('#'): {data.strip('#')}")

# Same idea with a different character
data2 = "...TechFlow..."
print(f"strip('.'): {data2.strip('.')}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
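
A common surprise: the argument to `strip()` is a set of characters, not a word to remove. The sketch below uses a made-up file name to show the difference, and `removesuffix()`, which is available from Python 3.9, as the exact-match alternative:

```python
filename = "stats.csv"   # hypothetical file name

# strip() reads ".csv" as the set {'.', 'c', 's', 'v'} and trims both ends
print(filename.strip(".csv"))         # tat

# removesuffix() removes only the exact ending (Python 3.9+)
print(filename.removesuffix(".csv"))  # stats

# Characters in the middle are never touched
print("##Tech#Flow##".strip("#"))     # Tech#Flow
```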


**Real-world whitespace problem**

```python
# Data imported from a file often has hidden whitespace
company1 = "TechFlow"
company2 = "TechFlow "
company3 = " TechFlow"

print(f"'{company1}' == '{company2}': {company1 == company2}")
print(f"'{company1}' == '{company3}': {company1 == company3}")

# Solution: always strip when comparing
print(f"\nAfter strip():")
print(f"{company1.strip() == company2.strip()}")
print(f"{company1.strip() == company3.strip()}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 6: String Methods - Find and Search

**Find position of substring**

```python
text = "TechFlow is a great company"

# find() returns position of first match, or -1 if not found
print(f"Position of 'great': {text.find('great')}")
print(f"Position of 'Flow': {text.find('Flow')}")
print(f"Position of 'bad': {text.find('bad')}")

# index() is similar but raises error if not found
print(f"index('great'): {text.index('great')}")
# text.index('bad')  # Would raise ValueError!
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Count occurrences**

```python
text = "TechFlow helps customers. TechFlow grows. TechFlow wins."

print(f"'TechFlow' appears {text.count('TechFlow')} times")
print(f"'.' appears {text.count('.')} times")
print(f"'e' appears {text.count('e')} times")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Check start and end**

```python
filename = "customer_data.csv"
email = "analyst@techflow.com"

print(f"Starts with 'customer': {filename.startswith('customer')}")
print(f"Ends with '.csv': {filename.endswith('.csv')}")
print(f"Ends with '.xlsx': {filename.endswith('.xlsx')}")

print(f"\nIs TechFlow email: {email.endswith('@techflow.com')}")
print(f"Is Gmail: {email.endswith('@gmail.com')}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
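
Both `startswith()` and `endswith()` also accept a tuple of options, which saves writing several checks joined with `or`. A small sketch with made-up file names:

```python
filenames = ["customers.csv", "notes.txt", "revenue.xlsx"]  # hypothetical names

data_files = []
for name in filenames:
    # A tuple argument means "ends with any of these"
    if name.endswith((".csv", ".xlsx")):
        data_files.append(name)

print(data_files)  # ['customers.csv', 'revenue.xlsx']
```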


---
# PART 7: String Methods - Replace

**Replace substring**

```python
text = "TechFlow is great. TechFlow is growing."

# Replace all occurrences
new_text = text.replace("TechFlow", "Our Company")
print(f"Original: {text}")
print(f"Replaced: {new_text}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Replace with count limit**

```python
text = "apple, apple, apple, apple"

# The optional third argument limits how many occurrences are replaced
print(f"Original: {text}")
print(f"Replace all: {text.replace('apple', 'orange')}")
print(f"Replace first 2: {text.replace('apple', 'orange', 2)}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Remove by replacing with empty string**

```python
phone = "(555) 123-4567"

# Remove special characters
clean = phone.replace("(", "").replace(")", "").replace(" ", "").replace("-", "")
print(f"Original: {phone}")
print(f"Clean: {clean}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
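
Chained `replace()` calls only remove the characters you list, so a new separator slips through. An alternative sketch keeps only the digits, using `isdigit()` and `join()` (both covered later in this module). The second phone format is made up for illustration:

```python
phones = ["(555) 123-4567", "555.123.4567"]

cleaned = []
for phone in phones:
    # Keep each character only if it is a digit, then glue them back together
    digits = "".join(ch for ch in phone if ch.isdigit())
    cleaned.append(digits)

print(cleaned)  # ['5551234567', '5551234567']
```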


---
# PART 8: String Methods - Check Content

**Content checking methods**

All return True or False.

```python
# Numbers only
print(f"'12345'.isdigit(): {'12345'.isdigit()}")
print(f"'123.45'.isdigit(): {'123.45'.isdigit()}")  # False! Has dot

# Letters only
print(f"'Hello'.isalpha(): {'Hello'.isalpha()}")
print(f"'Hello World'.isalpha(): {'Hello World'.isalpha()}")  # False! Has space

# Letters and numbers
print(f"'Hello123'.isalnum(): {'Hello123'.isalnum()}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
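
Because `isdigit()` rejects decimal points and minus signs, it is not a reliable test for whether text holds a number. One common workaround, sketched here with a made-up helper named `is_number`, is to attempt the conversion with `float()` and catch the `ValueError`:

```python
def is_number(text):
    """Return True if float() can convert the text."""
    try:
        float(text)
        return True
    except ValueError:
        return False

for value in ["12345", "123.45", "-7", "12a"]:
    print(f"{value!r}: isdigit={value.isdigit()}, is_number={is_number(value)}")
```

Note that `float()` also accepts forms such as `"1e5"` and `"nan"`, so a stricter check may be needed if those should be rejected.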


**More content checks**

```python
# Case checks
print(f"'HELLO'.isupper(): {'HELLO'.isupper()}")
print(f"'hello'.islower(): {'hello'.islower()}")
print(f"'Hello World'.istitle(): {'Hello World'.istitle()}")

# Space check
print(f"'   '.isspace(): {'   '.isspace()}")
print(f"'  x  '.isspace(): {'  x  '.isspace()}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 9: Split and Join

Converting between strings and lists - essential for parsing data!

**Split a string into a list**

```python
# Split by comma
data = "Healthcare,Technology,Finance,Retail"
industries = data.split(",")
print(f"Original: {data}")
print(f"Split: {industries}")
print(f"Type: {type(industries)}")
print(f"Count: {len(industries)}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
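
`split(",")` cuts exactly at the commas and keeps any surrounding spaces, so real data often needs a `strip()` on each piece. A short sketch with made-up input:

```python
data = "Healthcare, Technology ,Finance"   # hypothetical messy input

parts = data.split(",")
print(parts)    # ['Healthcare', ' Technology ', 'Finance']

cleaned = []
for part in parts:
    cleaned.append(part.strip())
print(cleaned)  # ['Healthcare', 'Technology', 'Finance']
```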


**Split by whitespace (default)**

```python
sentence = "TechFlow is growing fast"
words = sentence.split()  # No argument = split by whitespace
print(f"Words: {words}")

# Works with multiple spaces too
messy = "word1    word2   word3"
clean_words = messy.split()
print(f"Clean split: {clean_words}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Split with limit**

```python
data = "John,Doe,Manager,Sales,NY"

# Split into maximum 3 parts
parts = data.split(",", 2)
print(parts)

# Useful for "first, rest" pattern
first_name, rest = data.split(",", 1)
print(f"First: {first_name}")
print(f"Rest: {rest}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Split lines**

```python
multi_line = """Line 1
Line 2
Line 3"""

lines = multi_line.splitlines()
print(lines)

for i, line in enumerate(lines, 1):
    print(f"{i}: {line}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Join list back to string**

Syntax: `"separator".join(list)`

```python
industries = ["Healthcare", "Technology", "Finance"]

# Join with comma
joined = ",".join(industries)
print(f"Comma: {joined}")

# Join with comma and space
joined2 = ", ".join(industries)
print(f"Comma-space: {joined2}")

# Join with pipe
joined3 = " | ".join(industries)
print(f"Pipe: {joined3}")

# Join with newline
joined4 = "\n".join(industries)
print(f"Newlines:\n{joined4}")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
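
`join()` only works when every item in the list is a string. If the list holds numbers, convert them first. A minimal sketch with made-up values:

```python
revenues = [500, 1200, 75]   # hypothetical numbers

# ",".join(revenues)   # Would raise TypeError: join() expects strings

as_text = []
for amount in revenues:
    as_text.append(str(amount))

row = ",".join(as_text)
print(row)  # 500,1200,75
```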


---
# PART 10: Chaining Methods

You can call multiple methods in sequence. Each returns a new string for the next method.

**Method chaining**

```python
messy = "  HEALTHCARE  "

# Chain: strip spaces, then convert to title case
clean = messy.strip().title()
print(f"'{messy}' → '{clean}'")

# More complex chain
raw = "   tEcHfLoW  aNaLyTiCs   "
processed = raw.strip().lower().title()
print(f"'{raw}' → '{processed}'")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Real data cleaning example**

```python
# Raw data from import
raw_industries = ["  HEALTHCARE ", "technology", "  Finance  ", "RETAIL"]

print("Raw data:")
for industry in raw_industries:
    print(f"  '{industry}'")

print("\nCleaned data:")
for industry in raw_industries:
    cleaned = industry.strip().title()
    print(f"  '{cleaned}'")
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run
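
When the same cleaning steps are needed in several places, one option is to wrap the chain in a small function. The sketch below assumes a made-up helper named `clean_industry` and adds one extra sample value to show that duplicates collapse after cleaning:

```python
def clean_industry(value):
    """Strip surrounding whitespace and convert to title case."""
    return value.strip().title()

# Hypothetical raw values; the last one duplicates the first after cleaning
raw_industries = ["  HEALTHCARE ", "technology", "  Finance  ", "RETAIL", "healthcare"]

cleaned = []
for industry in raw_industries:
    cleaned.append(clean_industry(industry))
print(cleaned)

# set() drops duplicates; sorted() turns the result into an ordered list
unique = sorted(set(cleaned))
print(unique)  # ['Finance', 'Healthcare', 'Retail', 'Technology']
```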


---
# PRACTICE: Data Cleaning Scenarios

Apply what you've learned to real business data problems.

### Q1: Clean inconsistent industry names

Transform `"  HEALTHCARE  "` into `"Healthcare"`

In [None]:
# Your answer:
messy_industry = "  HEALTHCARE  "


### Q2: Extract domain from email

From `"analyst@techflow.com"`, extract `"techflow.com"`

In [None]:
# Your answer:
email = "analyst@techflow.com"


### Q3: Parse CSV-like data

Split `"TechFlow,Enterprise,500"` into three separate variables: name, plan, revenue

In [None]:
# Your answer:
row = "TechFlow,Enterprise,500"


### Q4: Validate email format

Check if `"user@company.com"` contains `"@"` and ends with `".com"`

In [None]:
# Your answer:
email = "user@company.com"


### Q5: Clean phone number

Convert `"(555) 123-4567"` to `"5551234567"`

In [None]:
# Your answer:
phone = "(555) 123-4567"


### Q6: Build a formatted address

Given these variables, create: `"123 Main St, New York, NY 10001"`

In [None]:
# Your answer:
street = "123 Main St"
city = "New York"
state = "NY"
zip_code = "10001"


### Q7: Reverse a string

Reverse `"TechFlow"` to get `"wolFhceT"`

In [None]:
# Your answer:
text = "TechFlow"


---
# CHEAT SHEET

## Creating Strings
| What | Code |
|------|------|
| Single quotes | `'text'` |
| Double quotes | `"text"` |
| Multi-line | `"""text"""` |
| Raw string | `r"path\file"` |
| Escape chars | `\n` `\t` `\'` `\"` `\\` |

## Operations
| What | Code |
|------|------|
| Length | `len(string)` |
| Concatenate | `str1 + str2` |
| Repeat | `string * 3` |
| Contains | `"text" in string` |

## Indexing & Slicing
| What | Code |
|------|------|
| First char | `string[0]` |
| Last char | `string[-1]` |
| Slice | `string[start:end]` |
| From start | `string[:end]` |
| To end | `string[start:]` |
| Step | `string[::2]` |
| Reverse | `string[::-1]` |

## Case Methods
| What | Code |
|------|------|
| Uppercase | `string.upper()` |
| Lowercase | `string.lower()` |
| Title Case | `string.title()` |
| Capitalize | `string.capitalize()` |

## Whitespace
| What | Code |
|------|------|
| Trim both | `string.strip()` |
| Trim left | `string.lstrip()` |
| Trim right | `string.rstrip()` |

## Find & Replace
| What | Code |
|------|------|
| Find position | `string.find("text")` |
| Count | `string.count("text")` |
| Starts with | `string.startswith("text")` |
| Ends with | `string.endswith("text")` |
| Replace | `string.replace("old", "new")` |

## Split & Join
| What | Code |
|------|------|
| Split by char | `string.split(",")` |
| Split whitespace | `string.split()` |
| Split lines | `string.splitlines()` |
| Join list | `", ".join(list)` |

---
## Module 0.2 Complete! 🎉

**You now know how to:**
- ✅ Create strings with single, double, and triple quotes
- ✅ Use escape characters for special chars
- ✅ Concatenate and repeat strings
- ✅ Index and slice to extract parts
- ✅ Convert case (upper, lower, title)
- ✅ Strip whitespace for data cleaning
- ✅ Find, count, and replace text
- ✅ Split strings into lists
- ✅ Join lists back into strings
- ✅ Chain methods for complex cleaning

**Key Takeaways:**
1. Always `.strip()` data from external sources
2. Normalize case with `.lower()` before comparing
3. String methods return NEW strings - they don't modify originals
4. `split()` and `join()` are your best friends for parsing

**Next: Module 0.3 - Numbers and Math Operations**