# Python Fundamentals: Strings - Immutable Text Sequences

## Introduction

A **string** (`str`) is an ordered, immutable sequence of Unicode characters. Strings are the primary way to represent text data in Python.

**Key Characteristics:**
*   **Sequence:** Strings are ordered sequences of characters.
*   **Immutable:** Once a string is created, it cannot be changed in place. Operations that seem to modify a string actually create *new* string objects.
*   **Unicode:** Python strings handle a wide range of characters from different languages and symbols.
*   **Iterable:** You can loop through the characters of a string.
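
The four characteristics above can be seen in a few lines. This is a minimal sketch; the sample word `héllo` and the variable names are chosen only for illustration.

```python
word = "héllo"

# Sequence: characters have positions and the string has a length.
print(len(word), word[0], word[-1])

# Immutable: upper() returns a new string; the original is unchanged.
shouted = word.upper()
print(word, shouted)

# Unicode: each element is a code point, not a byte.
code_points = [ord(ch) for ch in word]
print(code_points)

# Iterable: a for-loop visits one character at a time.
for ch in word:
    print(ch, end=" ")
print()
```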

## Real-World Analogies & Use Cases

*   **Text Messages & Emails:** Storing and manipulating textual communication.
*   **Web Content:** Representing HTML, URLs, JSON data.
*   **Code:** Source code itself is text.
*   **File Paths:** Representing locations in a file system.
*   **User Input:** Capturing and processing text entered by users.
*   **Data Parsing:** Extracting information from log files, CSVs, or other text-based formats.
*   **Configuration:** Storing settings and parameters as text.
*   **Data Serialization:** Representing data structures as strings (e.g., JSON).

## 1. Explain & Demonstrate: Creating Strings

Strings are created using single (`'`) or double (`"`) quotes. Triple quotes (`'''` or `"""`) are used for multi-line strings or docstrings.

In [1]:
# Single or double quotes are equivalent
s1: str = 'Hello, World!'
s2: str = "Python Programming"
print(f"Single quotes: {s1}")
print(f"Double quotes: {s2}\n")

# Quotes inside strings
s3: str = "He said, 'Python is fun!'"
s4: str = 'She replied, "Indeed!"'
print(f"Quotes inside: {s3}")
print(f"Quotes inside: {s4}\n")

# Escaping quotes
s5: str = 'It\'s a beautiful day.' # Use \ to escape the inner quote
s6: str = "This is a \"quoted\" word."
print(f"Escaped quote: {s5}")
print(f"Escaped quote: {s6}\n")

# Triple quotes for multi-line strings
multi_line: str = """This is line one.
This is line two.
  Indentation is preserved.
"""
print(f"Multi-line string:\n{multi_line}")

# Backslash line continuation: adjacent string literals are joined into
# a single string, and the backslash lets the statement span two lines
continued_string: str = "This is a very long string that " \
                       "continues on the next line."
print(f"Continued string: {continued_string}\n")

# Raw strings (r-prefix): Backslashes are treated as literal characters
# Useful for regular expressions or Windows file paths
path: str = r"C:\Users\Documents\file.txt"
regex_pattern: str = r"\d+\.\d+"
print(f"Raw string (path): {path}")
print(f"Raw string (regex): {regex_pattern}")

Single quotes: Hello, World!
Double quotes: Python Programming

Quotes inside: He said, 'Python is fun!'
Quotes inside: She replied, "Indeed!"

Escaped quote: It's a beautiful day.
Escaped quote: This is a "quoted" word.

Multi-line string:
This is line one.
This is line two.
  Indentation is preserved.

Continued string: This is a very long string that continues on the next line.

Raw string (path): C:\Users\Documents\file.txt
Raw string (regex): \d+\.\d+
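
Because triple-quoted strings preserve indentation, a literal written inside an indented block carries that indentation with it. The sketch below shows one common way to handle this with the standard library's `textwrap.dedent`, and a parenthesized alternative to the backslash continuation. The helper `build_help_text` and its text are hypothetical, invented for this example.

```python
import textwrap


def build_help_text() -> str:
    # Hypothetical helper: the literal is indented to match the function body.
    raw = """
        Usage: tool [options]
          -h  show help
    """
    # dedent() removes the whitespace common to every line;
    # strip() drops the blank first and last lines.
    return textwrap.dedent(raw).strip()


help_text = build_help_text()
print(help_text)

# Adjacent literals inside parentheses are joined into one string,
# so no backslash is needed.
long_text = (
    "This is a very long string that "
    "continues on the next line."
)
print(long_text)
```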


## 2. Explain & Demonstrate: Accessing Characters & Slicing

Individual characters and subsequences (substrings) are accessed using indexing and slicing, similar to lists and tuples.

In [2]:
text: str = "Programming"

# --- Indexing (0-based) ---
first_char: str = text[0]   # 'P'
third_char: str = text[2]   # 'o'
last_char: str = text[-1]  # 'g' (negative indexing from the end)
second_last: str = text[-2] # 'n'

print(f"Original string: {text}")
print(f"First char: {first_char}")
print(f"Last char: {last_char}\n")

# --- Slicing [start:stop:step] ---
# 'stop' index is exclusive

substring1: str = text[0:4]    # 'Prog' (index 0 up to, but not including, 4)
substring2: str = text[4:7]    # 'ram'
print(f"Slice [0:4]: {substring1}")
print(f"Slice [4:7]: {substring2}\n")

# Omitting start or stop
from_start: str = text[:7]     # 'Program' (from beginning up to index 7)
to_end: str = text[8:]       # 'ing' (from index 8 to the end)
print(f"Slice [:7]: {from_start}")
print(f"Slice [8:]: {to_end}\n")

# Using step
every_other: str = text[::2]    # 'Pormig' (every second character)
reversed_str: str = text[::-1]  # 'gnimmargorP' (reverse the string)
print(f"Slice [::2]: {every_other}")
print(f"Slice [::-1] (Reversed): {reversed_str}\n")

# --- Immutability Reminder ---
try:
    text[0] = 'p' # This will raise a TypeError
except TypeError as e:
    print(f"Cannot assign to index: {e}")

Original string: Programming
First char: P
Last char: g

Slice [0:4]: Prog
Slice [4:7]: ram

Slice [:7]: Program
Slice [8:]: ing

Slice [::2]: Pormig
Slice [::-1] (Reversed): gnimmargorP

Cannot assign to index: 'str' object does not support item assignment
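
One detail worth seeing alongside the cell above is that slicing and indexing treat out-of-range positions differently. The following sketch reuses the same `text` value; `lowered_first` is a name introduced here for illustration.

```python
text = "Programming"

# Slices clamp out-of-range bounds instead of raising an error.
clamped = text[8:100]   # 'ing'
empty = text[50:]       # ''
print(repr(clamped), repr(empty))

# Indexing past the end raises IndexError.
try:
    text[50]
except IndexError as exc:
    print(f"IndexError: {exc}")

# "Changing" a character means building a new string from pieces.
lowered_first = "p" + text[1:]
print(lowered_first)    # 'programming'
print(text)             # the original is untouched
```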


## 3. Explain & Demonstrate: String Concatenation & Formatting

Combining strings and embedding values within them are common tasks.

In [3]:
# --- Concatenation (+) ---
# Creates a *new* string object.
first_name: str = "Ada"
last_name: str = "Lovelace"
full_name: str = first_name + " " + last_name
print(f"Concatenation (+): {full_name}\n")

# --- Repetition (*) ---
# Creates a *new* string object.
separator: str = "-" * 10
print(f"Repetition (*): {separator}\n")

# --- Formatting (Modern & Preferred: f-Strings) ---
# Introduced in Python 3.6. Clear, concise, and efficient.
item: str = "laptop"
price: float = 1250.99
quantity: int = 3

f_string_basic: str = f"Item: {item}, Price: ${price}"
print(f"f-string (basic): {f_string_basic}")

# f-strings support expressions inside the braces
f_string_expr: str = f"Total cost for {quantity} {item}s: ${price * quantity}"
print(f"f-string (expression): {f_string_expr}")

# f-strings support format specifiers (after a colon)
# :.2f means float with 2 decimal places
# :>10 means right-aligned in a field of 10 characters
# :03d means integer, zero-padded to 3 digits
f_string_formatted: str = f"Item: {item:>10}, Price: ${price:.2f}, ID: {quantity:03d}"
print(f"f-string (formatted): {f_string_formatted}\n")

# --- Formatting (Alternative: str.format()) ---
# Predates f-strings; still useful when the template is stored separately from its values.
format_method: str = "Item: {}, Price: ${:.2f}, Quantity: {}".format(item, price, quantity)
format_method_indexed: str = "Price: ${1:.2f}, Item: {0}, Qty: {2}".format(item, price, quantity)
format_method_named: str = "Item: {itm}, Price: ${prc:.2f}".format(itm=item, prc=price)
print(f"str.format() (positional): {format_method}")
print(f"str.format() (indexed): {format_method_indexed}")
print(f"str.format() (named): {format_method_named}\n")

# --- Formatting (Old Style: % Operator - Avoid in New Code) ---
# Less readable and flexible than f-strings or .format(). Included for completeness.
old_style: str = "Item: %s, Price: $%.2f" % (item, price)
print(f"Old style (%): {old_style}")

Concatenation (+): Ada Lovelace

Repetition (*): ----------

f-string (basic): Item: laptop, Price: $1250.99
f-string (expression): Total cost for 3 laptops: $3752.9700000000003
f-string (formatted): Item:     laptop, Price: $1250.99, ID: 003

str.format() (positional): Item: laptop, Price: $1250.99, Quantity: 3
str.format() (indexed): Price: $1250.99, Item: laptop, Qty: 3
str.format() (named): Item: laptop, Price: $1250.99

Old style (%): Item: laptop, Price: $1250.99
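
The expression example above prints a total with floating-point noise (`3752.9700000000003`) because no format specifier was applied. A short sketch of how specifiers tidy this up, reusing the same `price` and `quantity` values (the `=` specifier requires Python 3.8+):

```python
price = 1250.99
quantity = 3
total = price * quantity

two_decimals = f"{total:.2f}"        # fixed to two decimal places
with_separator = f"{total:,.2f}"     # adds a thousands separator
debug_form = f"{total=:.2f}"         # '=' echoes the expression text first

print(f"Unformatted: {total}")
print(f"Two decimals: {two_decimals}")
print(f"With separator: {with_separator}")
print(debug_form)
```

For monetary values where exact decimal arithmetic matters, the standard library's `decimal` module is often a better fit than `float`; the formatting shown here only changes how the number is displayed.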


## 4. Apply: Common String Methods

Python provides a rich set of built-in methods for string manipulation.

In [4]:
message: str = "   learn Python Programming!   "
print(f"Original message: '{message}'")

# --- Cleaning Whitespace ---
stripped: str = message.strip()   # Remove leading/trailing whitespace
left_stripped: str = message.lstrip() # Remove leading whitespace
right_stripped: str = message.rstrip() # Remove trailing whitespace
print(f".strip(): '{stripped}'")
print(f".lstrip(): '{left_stripped}'")
print(f".rstrip(): '{right_stripped}'\n")

# --- Case Conversion ---
upper_case: str = stripped.upper()
lower_case: str = stripped.lower()
title_case: str = stripped.title() # Capitalize first letter of each word
capitalized: str = stripped.capitalize() # Capitalize first letter of the string
print(f".upper(): {upper_case}")
print(f".lower(): {lower_case}")
print(f".title(): {title_case}")
print(f".capitalize(): {capitalized}\n")

# --- Searching & Counting ---
starts_with_learn: bool = stripped.startswith("learn")
ends_with_mark: bool = stripped.endswith("!")
count_p: int = stripped.count("P") # Case-sensitive
find_python: int = stripped.find("Python") # Returns start index or -1 if not found
find_java: int = stripped.find("Java")
# .index() is similar to .find() but raises ValueError if not found
try:
    index_python = stripped.index("Python")
    index_java = stripped.index("Java")
except ValueError as e:
    print(f"Error using .index(): {e}")

print(f".startswith('learn'): {starts_with_learn}")
print(f".endswith('!'): {ends_with_mark}")
print(f".count('P'): {count_p}")
print(f".find('Python'): {find_python}")
print(f".find('Java'): {find_java}\n")

# --- Replacing ---
# Returns a *new* string with replacements.
replaced: str = stripped.replace("Python", "Data Science")
replaced_limit: str = "aaa".replace("a", "b", 2) # Replace max 2 occurrences
print(f"Original (stripped): '{stripped}'")
print(f".replace('Python', 'Data Science'): '{replaced}'")
print(f".replace('a', 'b', 2): '{replaced_limit}'\n")

# --- Splitting & Joining ---
# .split() turns a string into a list of substrings
words: list[str] = stripped.split() # Splits by whitespace by default
csv_data: str = "apple,banana,orange"
fruits: list[str] = csv_data.split(',') # Split by comma
print(f".split(): {words}")
print(f".split(','): {fruits}\n")

# .join() joins elements of an iterable (like a list) into a single string, using the string it's called on as a separator.
separator: str = " "
joined_words: str = separator.join(words)
csv_joined: str = ",".join(fruits)
print(f"' '.join(words): '{joined_words}'")
print(f"','.join(fruits): '{csv_joined}'\n")

# --- Checking Character Types ---
print(f"'Python123'.isalnum(): {'Python123'.isalnum()}") # Alpha-numeric?
print(f"'Python'.isalpha(): {'Python'.isalpha()}")    # Alphabetic?
print(f"'123'.isdigit(): {'123'.isdigit()}")       # Digits?
print(f"'   '.isspace(): {'   '.isspace()}")       # Whitespace?
print(f"'Title Case'.istitle(): {'Title Case'.istitle()}") # Title case?
print(f"'lowercase'.islower(): {'lowercase'.islower()}") # Lowercase?
print(f"'UPPERCASE'.isupper(): {'UPPERCASE'.isupper()}") # Uppercase?

# --- Prefix/Suffix Removal (Python 3.9+) ---
filename = "document.txt"
print(f"'{filename}'.removeprefix('docu'): {filename.removeprefix('docu')}")
print(f"'{filename}'.removesuffix('.txt'): {filename.removesuffix('.txt')}")

Original message: '   learn Python Programming!   '
.strip(): 'learn Python Programming!'
.lstrip(): 'learn Python Programming!   '
.rstrip(): '   learn Python Programming!'

.upper(): LEARN PYTHON PROGRAMMING!
.lower(): learn python programming!
.title(): Learn Python Programming!
.capitalize(): Learn python programming!

Error using .index(): substring not found
.startswith('learn'): True
.endswith('!'): True
.count('P'): 2
.find('Python'): 6
.find('Java'): -1

Original (stripped): 'learn Python Programming!'
.replace('Python', 'Data Science'): 'learn Data Science Programming!'
.replace('a', 'b', 2): 'bba'

.split(): ['learn', 'Python', 'Programming!']
.split(','): ['apple', 'banana', 'orange']

' '.join(words): 'learn Python Programming!'
','.join(fruits): 'apple,banana,orange'

'Python123'.isalnum(): True
'Python'.isalpha(): True
'123'.isdigit(): True
'   '.isspace(): True
'Title Case'.istitle(): True
'lowercase'.islower(): True
'UPPERCASE'.isupper(): True
'document.txt'.removeprefix('docu'): ment.txt
'document.txt'.removesuffix('.txt'): document
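
Two related tools complement the searching methods shown in the cell above: the `in` operator for yes/no membership tests, and `str.partition()` for splitting once around a separator. A brief sketch, reusing the stripped message from the cell; the other variable names are illustrative.

```python
stripped = "learn Python Programming!"

# `in` answers "is it there?" without dealing with -1 or exceptions.
has_python = "Python" in stripped
has_java = "Java" in stripped
print(has_python, has_java)

# Case-insensitive check: normalize case before testing.
has_python_any_case = "python" in stripped.lower()
print(has_python_any_case)

# partition() splits at the first separator and always returns 3 parts.
head, sep, tail = stripped.partition(" ")
print(repr(head), repr(sep), repr(tail))

# If the separator is missing, the whole string lands in the first part.
missing = "abc".partition("-")
print(missing)
```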

## Performance: Concatenation in Loops (`+` vs `.join()`)

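Because strings are immutable, repeatedly concatenating inside a loop using `+` or `+=` is inefficient. Each concatenation can create a new string object and copy everything built so far, so building a string of n characters this way costs O(n^2) time in the worst case. CPython can sometimes extend the string in place when `+=` is used, but that is an implementation detail and should not be relied on.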

The **`.join()`** method is the recommended, efficient way to combine multiple strings from an iterable.

In [5]:
from timeit import default_timer as timer
import sys

# Generate a list of characters
num_chars = 100_000
char_list = ["a"] * num_chars

# Inefficient method: += in a loop
start_time = timer()
result_plus = ""
for char in char_list:
    result_plus += char # Creates many intermediate strings
end_time = timer()
time_plus = end_time - start_time
print(f"Time using += : {time_plus:.5f} seconds")
# print(f"Memory (approx): {sys.getsizeof(result_plus)} bytes") # Memory usage can also be high

# Efficient method: .join()
start_time = timer()
result_join = "".join(char_list) # Creates the final string more directly
end_time = timer()
time_join = end_time - start_time
print(f"Time using join: {time_join:.5f} seconds")
# print(f"Memory (approx): {sys.getsizeof(result_join)} bytes")

if time_join > 0:
    print(f"\n.join() was approximately {time_plus / time_join:.1f} times faster.")

Time using += : 0.14851 seconds
Time using join: 0.00087 seconds

.join() was approximately 171.5 times faster.
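
In practice the parts are rarely available as a ready-made list, so the usual pattern is to collect them first and join once at the end. The sketch below shows that pattern, plus `io.StringIO` as an alternative buffer. The function names and the `rows` data are hypothetical, chosen only to illustrate the idea.

```python
import io


def build_report_join(rows: list[tuple[str, int]]) -> str:
    # Collect the pieces in a list, then join once at the end.
    parts = []
    for name, count in rows:
        parts.append(f"{name}: {count}")
    return "\n".join(parts)


def build_report_buffer(rows: list[tuple[str, int]]) -> str:
    # Alternative: write into an in-memory text buffer.
    buffer = io.StringIO()
    for name, count in rows:
        buffer.write(f"{name}: {count}\n")
    return buffer.getvalue().rstrip("\n")


rows = [("apples", 3), ("pears", 5)]
report_join = build_report_join(rows)
report_buffer = build_report_buffer(rows)
print(report_join)
print(report_join == report_buffer)
```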


## Best Practices & Enterprise Context

*   **Use f-Strings:** For string formatting, prefer f-strings (Python 3.6+) for clarity, conciseness, and performance.
*   **Use `.join()` for Concatenating Iterables:** Avoid `+=` inside loops for building strings from multiple parts; use `separator.join(iterable)` instead.
*   **Understand Immutability:** String methods never modify the string they are called on; methods such as `strip()` and `replace()` return their result and leave the original untouched. Assign the result back to a variable if you want to keep the changes (e.g., `my_string = my_string.strip()`).
*   **Be Mindful of Case:** Many string operations (`find`, `count`, `replace`, `startswith`, `endswith`) are case-sensitive by default. Convert case (`.lower()` or `.upper()`) before comparison if needed.
*   **Raw Strings for Paths/Regex:** Use the `r"..."` prefix for strings containing literal backslashes, like Windows paths or regular expression patterns, to avoid excessive escaping.
*   **Encoding Awareness:** In enterprise systems dealing with files, network streams, or databases, be aware of text encoding (e.g., UTF-8, ASCII). Use explicit encoding/decoding (`my_bytes.decode('utf-8')`, `my_string.encode('utf-8')`).
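
The encoding point in the list above is easiest to see with a string that contains non-ASCII characters. A minimal sketch, with sample text chosen for illustration:

```python
text = "café ☕"

# Encoding turns a str into bytes; the byte count can exceed the character count.
data = text.encode("utf-8")
print(len(text), len(data))

# Decoding with the same encoding restores the original string.
round_trip = data.decode("utf-8")
print(round_trip == text)

# ASCII cannot represent these characters, so a strict encode fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(f"UnicodeEncodeError: {exc.reason}")

# An explicit error handler makes the lossy behavior a deliberate choice.
lossy = text.encode("ascii", errors="replace")
print(lossy)
```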

## Common Pitfalls & Interview Questions

*   **Pitfall: Forgetting Immutability:** Assigning to an index (`my_str[0] = 'x'`) raises `TypeError`. A related mistake is calling a method such as `strip` or `replace` without reassigning the result: `my_str.strip()` on its own leaves `my_str` unchanged.
*   **Pitfall: `+` Concatenation in Loops:** Using `+=` to build large strings iteratively is inefficient; collect the parts and use `.join()`.
*   **Pitfall: `split()` Behavior:** With no argument, `split()` splits on runs of any whitespace and discards empty strings. With an explicit separator, as in `split(',')`, it splits only on that separator and can return empty strings (e.g., for `"a,,b"`).
*   **Pitfall: `find()` vs `index()`:** `find()` returns -1 if the substring is not found; `index()` raises `ValueError`.
*   **Pitfall: Slicing Off-by-One:** The `stop` index in a slice is exclusive, so `s[0:4]` covers indices 0 through 3.

*   **Interview Question:** "Are Python strings mutable or immutable? What does that mean?"
    *   *Answer:* Immutable. They cannot be changed in place after creation. Operations produce new strings.
*   **Interview Question:** "What is the most efficient way to join a list of strings into a single string?"
    *   *Answer:* Use the `separator.join(list_of_strings)` method.
*   **Interview Question:** "What are f-strings and why are they preferred?"
    *   *Answer:* A modern way (Python 3.6+) to format strings using an `f` prefix and embedding variables/expressions directly in `{}`. They are generally more readable and performant than `.format()` or `%`.
*   **Interview Question:** "How would you reverse a string in Python?"
    *   *Answer:* The idiomatic way is using slicing: `reversed_string = my_string[::-1]`.
*   **Interview Question:** "Explain the difference between `str.find()` and `str.index()`."
    *   *Answer:* Both search for a substring. `find()` returns -1 if not found, while `index()` raises a `ValueError`.
*   **Interview Question:** "How do you remove leading and trailing whitespace from a string?"
    *   *Answer:* Use the `.strip()` method. `.lstrip()` for leading, `.rstrip()` for trailing.
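
The `split()` and reassignment pitfalls listed above are easy to reproduce. The following sketch uses made-up sample strings to show them side by side.

```python
messy = "  a  b   c "
row = "x,,y,"

# No argument: split on runs of whitespace, no empty strings in the result.
default_split = messy.split()
print(default_split)

# Explicit separator: every single occurrence splits, so empties appear.
space_split = messy.split(" ")
comma_split = row.split(",")
print(space_split)
print(comma_split)

# Edge case: an empty string behaves differently in the two modes.
print("".split(), "".split(","))

# Forgetting to reassign: the method call alone changes nothing.
padded = " Hello "
padded.strip()            # result discarded
print(repr(padded))
padded = padded.strip()   # result kept
print(repr(padded))
```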

## 5. Challenge: Simple Log Line Parser

Given a log line string in a specific format, extract key information.

**Format:** `"<TIMESTAMP> [LEVEL] - MESSAGE"`
Example: `"2023-03-15T10:30:05 [INFO] - User logged in successfully."`

1.  Write a function `parse_log_line` that takes a single log line string as input.
2.  Handle potential leading/trailing whitespace.
3.  Extract the timestamp, log level, and message content.
4.  Return these three pieces of information, perhaps as a tuple or dictionary.

In [6]:
from typing import Dict, Optional, Tuple

LogParts = Tuple[str, str, str]  # Alternative return shape if a tuple is preferred (unused below)
LogPartsDict = Dict[str, str]

def parse_log_line(log_line: str) -> Optional[LogPartsDict]:
    """Parses a log line into timestamp, level, and message.

    Args:
        log_line: The log line string.

    Returns:
        A dictionary {'timestamp': ..., 'level': ..., 'message': ...}
        or None if parsing fails.
    """
    line = log_line.strip()
    
    # Find split points
    try:
        level_start = line.index('[')
        level_end = line.index(']')
        message_separator = line.index(' - ')
        
        # Check if parts are in expected order
        if not (level_start < level_end < message_separator):
             return None # Order is wrong
            
    except ValueError:
        # One of the delimiters wasn't found
        return None

    # Extract parts
    timestamp = line[:level_start].strip()
    level = line[level_start + 1:level_end].strip()
    message = line[message_separator + 3:].strip() # Length of ' - ' is 3

    if not timestamp or not level or not message: # Check for empty parts
        return None
        
    return {"timestamp": timestamp, "level": level, "message": message}

# --- Test the function ---
log1 = "  2023-03-15T10:30:05 [INFO] - User logged in successfully.  "
log2 = "2023-03-15T10:32:11 [WARNING] - Disk space low."
log3 = "Invalid log line format"
log4 = "2023-03-15T10:35:00 [ERROR] - Database connection failed"
log5 = "2023-03-15T10:35:00 - [DEBUG] Missing brackets"  # ' - ' comes before '[', so the order check fails

parsed1 = parse_log_line(log1)
parsed2 = parse_log_line(log2)
parsed3 = parse_log_line(log3)
parsed4 = parse_log_line(log4)
parsed5 = parse_log_line(log5)

print(f"Parsed log 1: {parsed1}")
print(f"Parsed log 2: {parsed2}")
print(f"Parsed log 3: {parsed3}")
print(f"Parsed log 4: {parsed4}")
print(f"Parsed log 5: {parsed5}")

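Parsed log 1: {'timestamp': '2023-03-15T10:30:05', 'level': 'INFO', 'message': 'User logged in successfully.'}
Parsed log 2: {'timestamp': '2023-03-15T10:32:11', 'level': 'WARNING', 'message': 'Disk space low.'}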
Parsed log 3: None
Parsed log 4: {'timestamp': '2023-03-15T10:35:00', 'level': 'ERROR', 'message': 'Database connection failed'}
Parsed log 5: None
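
As a point of comparison, the same format can also be matched with a regular expression. This is a sketch of an alternative, not a replacement for the solution above: the name `parse_log_line_re` and the pattern are introduced here, and the pattern assumes the timestamp contains no spaces, which the index-based version does not require.

```python
import re
from typing import Optional

# Assumed shape: <timestamp without spaces> [LEVEL] - message
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+\[(?P<level>[^\]]+)\]\s+-\s+(?P<message>.+)$"
)


def parse_log_line_re(log_line: str) -> Optional[dict[str, str]]:
    """Return the named parts of a log line, or None if it does not match."""
    match = LOG_PATTERN.match(log_line.strip())
    if match is None:
        return None
    return match.groupdict()


ok = parse_log_line_re("  2023-03-15T10:32:11 [WARNING] - Disk space low.  ")
bad_text = parse_log_line_re("Invalid log line format")
bad_order = parse_log_line_re("2023-03-15T10:35:00 - [DEBUG] Missing brackets")
print(ok)
print(bad_text, bad_order)
```

The regular expression states the expected layout in one place, while the index-based version keeps each step explicit and may be easier to debug for readers new to `re`.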


## Quiz

1.  What is the output of `"Python"[1:4]`?
    a) `"Pyth"`
    b) `"yth"`
    c) `"ytho"`
    d) `"tho"`

2.  Which method is best for creating the string `"apple-banana-cherry"` from the list `["apple", "banana", "cherry"]`?
    a) `"-".append(["apple", "banana", "cherry"])`
    b) `"-".join(["apple", "banana", "cherry"])`
    c) `str.concat("-", ["apple", "banana", "cherry"])`
    d) `"-" + "apple" + "-" + "banana" + "-" + "cherry"`

3.  If `my_str = " Hello "`, what is the result of `my_str.strip()`?
    a) `" Hello"`
    b) `"Hello "`
    c) `"Hello"`
    d) `my_str` is modified in place to `"Hello"`.

4.  Which is the **most recommended** way to format the string `"Name: Alice, Age: 30"` in modern Python (3.6+)?
    a) `"Name: %s, Age: %d" % ("Alice", 30)`
    b) `"Name: {}, Age: {}".format("Alice", 30)`
    c) `name="Alice"; age=30; f"Name: {name}, Age: {age}"`
    d) `"Name: " + "Alice" + ", Age: " + str(30)`

*(Answers: 1-b, 2-b, 3-c, 4-c)*
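
The answers can be checked by running the expressions directly. A small sketch; the `answers` dictionary is just a convenience for printing.

```python
my_str = " Hello "
name, age = "Alice", 30

answers = {
    1: "Python"[1:4],
    2: "-".join(["apple", "banana", "cherry"]),
    3: my_str.strip(),
    4: f"Name: {name}, Age: {age}",
}

for number, value in answers.items():
    print(number, repr(value))

# Question 3, option d: strip() does not modify my_str in place.
print(repr(my_str))
```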

## Conclusion

Strings are fundamental to almost any Python program. Understanding their immutable nature, mastering slicing, leveraging the powerful built-in methods, and using modern formatting techniques like f-strings are essential skills for efficient and readable Python coding. Remember the performance implications of concatenation versus joining, especially when dealing with large amounts of text data.