# Lecture 2: Python Code Examples

This notebook contains all the Python code examples from the Beamer presentation `Lecture 2: Introduction to Python for Data Analytics`.

## 1. The Python Environment: IPython & Jupyter

Data analysis is an interactive and exploratory process. IPython (Interactive Python) and Jupyter Notebooks are essential tools that provide an environment perfect for this kind of work, allowing you to mix executable code, text, equations, and visualisations.

### Introspection with `?`

A powerful feature of IPython/Jupyter is **introspection**. By placing a question mark (`?`) after a function or object, you can pull up its documentation (known as a "docstring") and other helpful information. This is incredibly useful for understanding how a library function works without having to search online.

In [None]:
# First, we need to import the pandas library to use its functions.
# We use the standard alias 'pd' to make our code shorter and more readable.
import pandas as pd

# Now, let's use introspection to get help on the 'read_csv' function from pandas.
# This command will open a help pane at the bottom of the screen in Jupyter.
pd.read_csv?

### Magic Commands

Magic commands, which start with a `%` or `%%`, provide special functionalities within the IPython/Jupyter environment. They are not part of the Python language itself but are shortcuts for common tasks.

#### `%timeit`: Measuring Performance

The `%timeit` magic command measures the execution time of a small piece of code. It runs the code many times and reports the mean and standard deviation across runs, giving a more reliable performance measure than a single run.

In [None]:
# We use %timeit to measure how long it takes to create a list of the first 1000 square numbers.
# This is a list comprehension, a concise way to create lists.
%timeit [x**2 for x in range(1000)]
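
Outside IPython or Jupyter, the `%timeit` magic is not available, but the standard-library `timeit` module offers a comparable measurement. The following is a minimal sketch; the values of `number` and `repeat` are arbitrary choices for illustration.

```python
import timeit

# Time the same list comprehension with the standard-library timeit module.
# number=1000 executions per measurement and repeat=5 measurements are
# illustrative values, not recommendations.
timings = timeit.repeat(
    stmt="[x**2 for x in range(1000)]",
    number=1000,
    repeat=5,
)

# Each entry is the total time for 1000 executions, so divide for a per-run time.
best_per_run = min(timings) / 1000
print(f"Best of 5: {best_per_run * 1e6:.1f} microseconds per run")
```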

### Standard Import Conventions

In data science, we use standard aliases when importing common libraries. This practice makes code consistent and easily readable by others in the field.

In [None]:
# Import NumPy for numerical operations, aliased as np.
import numpy as np

# Import pandas for data manipulation, aliased as pd.
import pandas as pd

# Import Matplotlib's pyplot for plotting, aliased as plt.
import matplotlib.pyplot as plt

# The Path object from pathlib provides an object-oriented way to handle filesystem paths.
from pathlib import Path

print("Libraries imported successfully!")

## 2. Python Language Basics

This section covers the fundamental syntax and semantics of the Python language.

### Indentation and Function Definition

Unlike languages that use braces `{}` to define code blocks, Python uses **indentation** (typically 4 spaces). A colon `:` signifies the start of an indented block. This enforces a clean and readable code style.

In [None]:
# The 'def' keyword starts a function definition.
# 'add' is the function name, and (a, b) are its parameters.
def add(a, b):
    # This indented block is the body of the function.
    # A '#' symbol denotes a comment, which is ignored by Python.
    # This function returns the sum of the two input numbers.
    return a + b

# Call the function 'add' with arguments 5 and 3.
result = add(5, 3)

# An f-string (formatted string literal) lets us embed expressions inside string literals.
# Here, we print the value of the 'result' variable.
print(f"The result is {result}")

### Variables as References

When you assign a variable to another (e.g., `b = a`), you are not creating a copy of the object. Instead, you are creating a new label or **reference** that points to the *exact same object* in memory. If the object is mutable (like a list), changes made through one reference will be visible through the other.

In [None]:
# 'a' is a variable that refers to a list object in memory.
a = [1, 2, 3]

# 'b = a' does NOT copy the list. 'b' now points to the SAME list object as 'a'.
b = a

# We modify the list using the reference 'a'.
a.append(4)

# When we print 'b', we see the change because both variables point to the same list.
print(b) # Output will be [1, 2, 3, 4]
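
When an independent copy is needed, the list's `.copy()` method is one option. The sketch below uses the `is` operator to check whether two names refer to the same object; note that `.copy()` is a shallow copy, so nested mutable objects would still be shared.

```python
# 'a' and 'b' are two references to the same list object.
a = [1, 2, 3]
b = a

# .copy() creates a new, independent list with the same elements (a shallow copy).
c = a.copy()

a.append(4)

print(b is a)  # True: 'b' and 'a' are the same object
print(c is a)  # False: 'c' is a separate list
print(b)       # [1, 2, 3, 4]
print(c)       # [1, 2, 3], unaffected by the append
```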

### Scalar Data Types

These are the basic building blocks for data in Python.

#### Integers (`int`) and Floats (`float`)

In [None]:
# An integer is a whole number.
my_integer = 100

# Python's integers can be arbitrarily large.
large_integer = 10**12 # This is 1 trillion.

print(f"My integer: {my_integer}")
print(f"Large integer: {large_integer}")

In [None]:
# A float is a number with a decimal point.
my_float = 3.14159

# Floats can also be written in scientific notation.
scientific_notation = 6.78e-5 # This is equal to 0.0000678

print(f"My float: {my_float}")
print(f"Scientific notation: {scientific_notation}")

#### Arithmetic

Pay attention to the difference between standard division (`/`), which always produces a float, and floor division (`//`), which rounds the quotient down to the nearest whole number (towards negative infinity).

In [None]:
# Standard division (/) always results in a float, even if the numbers divide evenly.
result = 10 / 3
print(f"10 / 3 = {result}")

# Floor division (//) rounds the quotient down; with two integers the result is an integer.
floor_result = 10 // 3
print(f"10 // 3 = {floor_result}")
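
The behaviour of floor division is easiest to see with a negative operand. The short sketch below also shows the modulo operator (`%`) and the built-in `divmod()`, which are not used elsewhere in this notebook.

```python
# Floor division rounds towards negative infinity, not towards zero.
print(10 // 3)    # 3
print(-10 // 3)   # -4, because -3.33... is rounded down

# The modulo operator (%) gives the remainder that pairs with //.
print(10 % 3)     # 1
print(-10 % 3)    # 2, since (-4 * 3) + 2 == -10

# divmod() returns the quotient and the remainder together as a tuple.
quotient, remainder = divmod(10, 3)
print(quotient, remainder)  # 3 1
```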

#### Strings (`str`)

Strings are used to store text data. They are **immutable**, meaning they cannot be changed after creation. String methods always return a *new* string.

In [None]:
# Define a string with leading and trailing whitespace.
my_string = "  Hello, World!  "

# The .strip() method removes whitespace from the beginning and end.
print(my_string.strip()) 

# The .lower() method converts the entire string to lowercase.
print(my_string.lower()) 

# The .replace() method returns a new string with specified phrases replaced.
print(my_string.replace("World", "Python"))
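
To make the immutability concrete, the sketch below checks that the original string is unchanged after a method call and shows that assigning to a single character raises a `TypeError`.

```python
my_string = "  Hello, World!  "

# Calling a method does not modify the original string; it returns a new one.
stripped = my_string.strip()
print(repr(my_string))  # '  Hello, World!  '
print(repr(stripped))   # 'Hello, World!'

# Attempting to change a character in place raises a TypeError.
try:
    my_string[2] = "J"
except TypeError as exc:
    print(f"TypeError: {exc}")
```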

#### Booleans (`bool`) and `None`

Booleans represent the truth values `True` and `False`. They are the result of comparison operations and are fundamental to control flow.

`None` is a special value (the only instance of the type `NoneType`) that represents the absence of a value. It's often used as a placeholder or to signal that a variable is empty.

In [None]:
# A comparison operator (>) results in a boolean value.
is_greater = 10 > 5 # This evaluates to True.

# The equality operator (==) checks if two values are equal.
is_equal = 5 == 6   # This evaluates to False.

# Boolean values can be combined with logical operators 'and', 'or', and 'not'.
result = is_greater and not is_equal # True and not False -> True and True -> True

print(f"is_greater is: {is_greater}")
print(f"is_equal is: {is_equal}")
print(f"The final result is: {result}")

In [None]:
# 'None' is used to represent an empty or null value.
my_variable = None

# Use 'is None' to check if a variable has the value None.
if my_variable is None:
    print("The variable has no value")

#### Dates and Times (`datetime`)

Python's built-in `datetime` module is the standard way to handle dates and times.

In [None]:
# Import the specific classes 'datetime' and 'timedelta' from the 'datetime' module.
from datetime import datetime, timedelta

# Get the current date and time.
now = datetime.now()

# A 'timedelta' represents a duration of time.
# Here, we subtract 14 days from the current moment.
two_weeks_ago = now - timedelta(days=14)

# The strftime method formats a datetime object into a string.
# '%Y' is the full year, '%m' is the month, and '%d' is the day.
print(f"The current datetime is: {now}")
print(f"Two weeks ago was: {two_weeks_ago}")
print(f"Formatted date for two weeks ago: {two_weeks_ago.strftime('%Y-%m-%d')}")
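
The reverse of `strftime` is `datetime.strptime`, which parses a string into a `datetime` object. A minimal sketch, using arbitrary example dates so that the output is reproducible:

```python
from datetime import datetime

# Parse a date string; the format codes must match the layout of the text.
# The dates below are arbitrary example values.
parsed = datetime.strptime("2024-03-15", "%Y-%m-%d")
print(parsed)  # 2024-03-15 00:00:00

# Subtracting one datetime from another gives a timedelta.
delta = datetime(2024, 3, 29) - parsed
print(delta.days)  # 14
```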

#### Type Casting

You can explicitly convert values from one type to another using functions like `int()`, `float()`, `str()`, and `bool()`.

In [None]:
# Start with a string that looks like a number.
s = '3.14159'

# Convert the string 's' to a floating-point number.
fval = float(s)
# The type() function confirms the new type of the variable.
print(f"fval is {fval} and its type is {type(fval)}")

# Convert the float 'fval' to an integer. This truncates (cuts off) the decimal part.
ival = int(fval)
print(f"ival is {ival} and its type is {type(ival)}")

# Convert the integer to a boolean. Any non-zero number is True.
print(f"The boolean value of {ival} is {bool(ival)}")

# Converting 0 to a boolean results in False.
print(f"The boolean value of 0 is {bool(0)}")
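
Two casting pitfalls are worth seeing in code: `bool()` on a string only checks whether the string is empty, and `int()` cannot parse a string containing a decimal point. The sketch below illustrates both.

```python
# Any non-empty string is truthy, including the text 'False'.
print(bool("False"))  # True
print(bool(""))       # False

# int() cannot parse a decimal string directly; it raises a ValueError.
try:
    int("3.14159")
except ValueError as exc:
    print(f"ValueError: {exc}")

# Converting via float() first works; int() then truncates towards zero.
print(int(float("3.14159")))  # 3
print(int(-3.9))              # -3
```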

### Control Flow

Control flow statements allow you to direct the execution of your code based on certain conditions or to repeat blocks of code.

#### `for` Loops

A `for` loop is used for **definite iteration**, that is, iterating over a sequence (like a list) where you know how many items there are.

In [None]:
# Initialise an empty list to store the results of our processing.
processed = []
# Define the list that we will iterate over.
my_list = [1, 2, None, 4]

# The 'for' loop will assign each item from 'my_list' to the variable 'val' one by one.
for val in my_list:
    # Check if the current value ('val') is None.
    if val is None:
        # The 'continue' keyword immediately stops the current iteration and jumps to the next one.
        continue
    # If the value is not None, we double it and append it to our 'processed' list.
    processed.append(val * 2)

print(f"The original list was: {my_list}")
print(f"The processed list is: {processed}")

#### `while` Loops

A `while` loop is used for **indefinite iteration**. The loop continues to run as long as a specified condition is `True`. You must ensure the condition eventually becomes `False`, or you will have an infinite loop.

In [None]:
# Initialise the variable 'x' to 100.
x = 100

# This loop will continue as long as 'x' is greater than 0.
while x > 0:
    # Print the current value of x in each iteration.
    print(f"x is currently {x}")
    # Decrease x by 10. This is crucial to ensure the loop eventually ends.
    x = x - 10
    # Check if x has dropped below 50.
    if x < 50:
        # The 'break' keyword exits the loop immediately, regardless of the 'while' condition.
        print("x is less than 50, breaking the loop!")
        break

print(f"The loop finished with x = {x}")

#### Conditionals (`if`, `elif`, `else`)

These statements allow you to execute different blocks of code based on a series of checks. Python evaluates them in order and runs the code block for the *first* condition that is met.

In [None]:
# Assign an initial value to x.
x = 10 

# The 'if' statement checks the first condition.
if x < 0:
    status = "negative"
# If the first condition was false, the 'elif' (else if) checks the next condition.
elif x == 0:
    status = "zero"
# If none of the preceding conditions were true, the 'else' block is executed as a default.
else:
    status = "positive"

# Display the resulting status.
print(f"The number {x} is {status}.")

#### `range`

The `range` function generates a sequence of integers. It is very efficient as it doesn't store all the numbers in memory at once. It is commonly used with `for` loops to repeat an action a specific number of times. To see the numbers, you can convert the range object to a list.

In [None]:
# range(5) generates integers from 0 up to (but not including) 5.
print(f"range(5): {list(range(5))}") # -> [0, 1, 2, 3, 4]

# range(0, 10, 2) generates integers starting at 0, up to 10, in steps of 2.
print(f"range(0, 10, 2): {list(range(0, 10, 2))}") # -> [0, 2, 4, 6, 8]
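
A `range` object can be queried directly without converting it to a list, which is where its memory efficiency shows. The sketch below also shows a negative step; the bounds are arbitrary illustrative values.

```python
# A negative step counts downwards.
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]

# A range supports len(), membership tests, and indexing
# without ever building the full list in memory.
big = range(0, 1_000_000, 2)
print(len(big))         # 500000
print(999_998 in big)   # True
print(big[10])          # 20
```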

## 3. Built-in Data Structures

Python comes with several powerful and flexible data structures for organising collections of data.

### Tuples

A tuple is a fixed-length, **immutable** sequence of Python objects. Once a tuple is created, its contents cannot be changed. They are often used for data that should not be modified, such as the components of a coordinate or a key in a dictionary.

In [None]:
# A tuple is created with parentheses ().
tup = (4, 5, 6)
# Trying to change an element, e.g. `tup[1] = 0`, would raise a TypeError.

# Unpacking a tuple assigns its elements to variables.
# The number of variables must match the number of elements in the tuple.
a, b, c = tup
print(f"Unpacked variable 'a' is: {a}")

# Tuples have a .count() method to count occurrences of a value.
another_tuple = (1, 2, 2, 2, 3)
print(f"The number 2 appears {another_tuple.count(2)} times.")
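
As a sketch of the dictionary-key use case mentioned above, the example below uses coordinate tuples as keys (the coordinates and city names are illustrative values only). It also shows starred unpacking, which collects the remaining elements into a list.

```python
# Tuples of immutable values are hashable, so they can serve as dictionary keys.
# The coordinates and names here are illustrative values only.
locations = {(51.5, -0.1): "London", (48.9, 2.4): "Paris"}
print(locations[(51.5, -0.1)])  # London

# A starred name collects any remaining elements into a list during unpacking.
first, *rest = (4, 5, 6)
print(first)  # 4
print(rest)   # [5, 6]
```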

### Lists

A list is a variable-length, **mutable** sequence. It is the most common and versatile sequence type in Python. You can add, remove, and change elements after the list has been created.

#### Adding and Removing Elements from a List

In [None]:
# A list is created with square brackets [].
a_list = [2, 3, 7]
print(f"Initial list: {a_list}")

# .append() adds an element to the end of the list.
a_list.append(9)
print(f"After append(9): {a_list}")

# .insert(index, value) adds an element at a specific position.
a_list.insert(1, 5)
print(f"After insert(1, 5): {a_list}")

# .pop(index) removes and returns the element at a specific position.
popped_value = a_list.pop(2)
print(f"Popped value at index 2 was: {popped_value}")
print(f"List after pop(2): {a_list}")

# .remove(value) removes the first occurrence of a specific value.
a_list.remove(9)
print(f"List after remove(9): {a_list}")

#### Combining and Sorting Lists

In [None]:
# Define two lists.
x = [1, 2, 3]
y = [4, 5]

# Using the '+' operator creates a new list without modifying the originals.
concatenated_list = x + y
print(f"Result of x + y: {concatenated_list}")
print(f"Original x is still: {x}") # x is unchanged

# The .extend() method modifies the list in-place by appending elements from another sequence.
x.extend(y)
print(f"Result of x.extend(y): {x}") # x is now changed

In [None]:
# Define an unsorted list.
a = [7, 2, 5, 1, 3]
print(f"Unsorted list: {a}")

# The .sort() method sorts the list in-place (it modifies the original list).
a.sort()
print(f"Sorted list: {a}")
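
When the original order must be preserved, the built-in `sorted()` function is an alternative to `.sort()`. The sketch below contrasts the two and shows a common mistake: assigning the return value of `.sort()`.

```python
a = [7, 2, 5, 1, 3]

# sorted() returns a new sorted list and leaves the original untouched.
b = sorted(a)
print(a)  # [7, 2, 5, 1, 3]
print(b)  # [1, 2, 3, 5, 7]

# Both sorted() and .sort() accept reverse=True for descending order.
print(sorted(a, reverse=True))  # [7, 5, 3, 2, 1]

# .sort() works in-place and returns None, so its result should not be assigned.
result = a.sort()
print(result)  # None
print(a)       # [1, 2, 3, 5, 7]
```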

#### Slicing Lists

Slicing is a powerful feature that lets you select sub-lists using the syntax `start:stop:step`. The `start` index is included, but the `stop` index is not.

In [None]:
# Define a sequence to slice.
seq = [7, 2, 3, 7, 5, 6, 0, 1]

# Slice from index 1 up to (but not including) index 5.
print(f"Slice [1:5]: {seq[1:5]}")

# Omitting the start index defaults to the beginning of the list.
print(f"First 5 elements: {seq[:5]}")

# Negative indices count from the end of the list.
print(f"Last 4 elements: {seq[-4:]}")

# The 'step' argument allows you to skip elements. Here we take every second element.
print(f"Every other element: {seq[::2]}")

# A step of -1 reverses the list.
print(f"Reversed: {seq[::-1]}")

### Dictionaries (`dict`)

A dictionary stores a collection of key-value pairs. They are highly optimised for retrieving a value when you know the key. Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable values.

In [None]:
# A dictionary is created with curly braces {} and key: value pairs.
d1 = {'a': 'some value', 'b': [1, 2, 3]}

# Access the value associated with a key using square brackets.
print(f"The value for key 'b' is: {d1['b']}")

# Add a new key-value pair.
d1['c'] = 'new value'
print(f"The dictionary after adding key 'c': {d1}")

# Check if a key exists in the dictionary using the 'in' keyword.
print(f"Is 'b' a key in d1? {'b' in d1}")

# The .keys() method returns a view of all keys.
print(f"All keys: {list(d1.keys())}")

# The .values() method returns a view of all values.
print(f"All values: {list(d1.values())}")
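
A few further dictionary operations are common in practice. The sketch below shows `.get()` for safe lookups, `.items()` for looping over pairs, and `del` for removing a key.

```python
d1 = {'a': 'some value', 'b': [1, 2, 3]}

# Indexing a missing key with square brackets raises a KeyError;
# .get() returns None, or a default you supply, instead.
print(d1.get('z'))             # None
print(d1.get('z', 'missing'))  # missing

# .items() yields (key, value) pairs, which is convenient for looping.
for key, value in d1.items():
    print(f"{key} -> {value}")

# 'del' removes a key-value pair.
del d1['a']
print(d1)  # {'b': [1, 2, 3]}
```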

### Sets

A set is an unordered collection of **unique** elements. They are very fast for membership testing (checking if an element is present) and for performing mathematical set operations like union, intersection, and difference.

In [None]:
# A non-empty set is created with curly braces {}; the set() function creates an empty set or converts another collection.
# Duplicate elements are automatically removed.
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# The union (|) contains all elements that are in either set.
print(f"Union (a | b): {a | b}")

# The intersection (&) contains only the elements that are in both sets.
print(f"Intersection (a & b): {a & b}")

# The difference (-) contains elements that are in set 'a' but not in set 'b'.
print(f"Difference (a - b): {a - b}")
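
The uniqueness and membership-testing properties can be sketched as follows. The example also shows that empty braces create a dictionary rather than a set.

```python
# Converting a list to a set removes duplicates.
values = [3, 1, 3, 2, 1]
unique_values = set(values)
print(sorted(unique_values))  # [1, 2, 3]

# Empty braces create a dictionary, not a set; use set() for an empty set.
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>

# Membership testing uses the 'in' keyword.
print(2 in unique_values)  # True
print(9 in unique_values)  # False
```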

### Comprehensions

Comprehensions are a concise and readable way to create collections (lists, sets, dictionaries) in Python. They are often a more efficient and "Pythonic" alternative to a standard `for` loop.

#### The `for` loop way:

In [None]:
# Define a list of words.
words = ['apple', 'bat', 'bar', 'atom']
# Create an empty list to store the results.
upper_words = []
# Loop through each word in the list.
for x in words:
    # Check if the word starts with 'a'.
    if x.startswith('a'):
        # If it does, convert it to uppercase and add it to the results list.
        upper_words.append(x.upper())

print(f"Result from for loop: {upper_words}")

#### The comprehension way:

In [None]:
# Redefine the list of words.
words = ['apple', 'bat', 'bar', 'atom']

# This single line does the exact same thing as the for loop above.
# [expression for item in list if condition]
upper_words_comp = [x.upper() for x in words if x.startswith('a')]

print(f"Result from comprehension: {upper_words_comp}")
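
The same pattern extends to dictionaries and sets. A minimal sketch, reusing the `words` list from above:

```python
words = ['apple', 'bat', 'bar', 'atom']

# Dictionary comprehension: {key_expression: value_expression for item in iterable}
word_lengths = {word: len(word) for word in words}
print(word_lengths)  # {'apple': 5, 'bat': 3, 'bar': 3, 'atom': 4}

# Set comprehension: duplicates collapse, so only the distinct lengths remain.
unique_lengths = {len(word) for word in words}
print(sorted(unique_lengths))  # [3, 4, 5]
```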

## 4. Functions and Files

### Functions

Functions are the primary way to organise code into logical, reusable blocks. They help make your code more modular, readable, and easier to debug.

In [None]:
# A function is a block of reusable code. We define one
# called `clean_up_text` that takes one argument, `text`.
def clean_up_text(text):
    
    # Use the .strip() method to remove any whitespace
    # from the beginning and end of the text.
    text_stripped = text.strip()

    # Use the .lower() method to convert the entire
    # string to lowercase letters.
    text_lower = text_stripped.lower()

    # Use the .replace() method to find all '!' characters
    # and replace them with nothing (which removes them).
    text_no_punct = text_lower.replace('!', '')

    # The 'return' keyword sends the final, cleaned version of the text
    # back as the function's output.
    return text_no_punct


# Let's define a string to test our function.
original_string = "   Hello World!   "

# Call the function with our string and store the returned result.
cleaned_string = clean_up_text(original_string)


# Print both versions to see the result of the cleaning.
print(f"Original: '{original_string}'")
print(f"Cleaned: '{cleaned_string}'")

### Advanced Function Concepts

#### Returning Multiple Values

A function can return multiple values. Python automatically packs them into a tuple, which can then be conveniently unpacked into separate variables.

In [None]:
# This function returns three values.
def f():
    return 1, 2, 3

# Call the function and unpack the returned tuple into three variables.
a, b, c = f()
print(f"a={a}, b={b}, c={c}")

#### Lambda (Anonymous) Functions

For simple, one-line functions, you can use the `lambda` keyword to create a small, anonymous function. This is often used when you need to pass a simple function as an argument to another function, like the `key` argument in `sort()`.

In [None]:
# Define a list of strings.
words = ['banana', 'apple', 'fig']
print(f"Original list: {words}")

# Sort the list using the 'key' argument.
# The lambda function `lambda x: x[-1]` takes an element `x` 
# and returns its last character `x[-1]`. 
# The list is then sorted based on these last characters ('a', 'e', 'g').
words.sort(key=lambda x: x[-1])

print(f"List sorted by last letter: {words}")
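
A lambda is interchangeable with a named function that has the same body. In the sketch below, `last_letter` is a hypothetical helper introduced only for comparison; the example also passes the built-in `len` as a key.

```python
words = ['banana', 'apple', 'fig']

# A named function equivalent to `lambda x: x[-1]` (hypothetical helper).
def last_letter(word):
    return word[-1]

# sorted() accepts the same 'key' argument as .sort() and returns a new list.
by_named = sorted(words, key=last_letter)
by_lambda = sorted(words, key=lambda x: x[-1])
print(by_named == by_lambda)  # True

# Any callable can be a key, including built-ins such as len.
print(sorted(words, key=len))  # ['fig', 'apple', 'banana']
```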

### Generators

A generator is a special kind of iterator, created by a function that uses the `yield` keyword. Instead of computing all values at once and storing them in memory, a generator produces values one at a time, on-the-fly. This is highly memory-efficient for working with very large sequences.

In [None]:
# This function is a generator because it uses 'yield'.
def squares(n=5):
    print("Generator started!")
    # Loop from 1 to n (inclusive).
    for i in range(1, n + 1):
        # Announce the value that is about to be produced.
        print(f"Yielding {i**2}...")
        # 'yield' hands the value to the caller and pauses the function until the next value is requested.
        yield i ** 2

# When we loop over the generator, the code inside 'squares' only runs when a value is requested.
for val in squares():
    print(f"Received {val} from generator.")
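
The one-at-a-time behaviour can be seen directly by calling the built-in `next()` on a generator. The sketch below redefines `squares` without the print statements and also shows a generator expression, which uses comprehension-like syntax inside parentheses.

```python
# Same generator as above, without the print statements.
def squares(n=5):
    for i in range(1, n + 1):
        yield i ** 2

# next() requests a single value from the generator.
gen = squares(3)
print(next(gen))  # 1
print(next(gen))  # 4

# list() consumes whatever values remain.
print(list(gen))  # [9]

# A generator expression produces values lazily, without building a list first.
total = sum(x ** 2 for x in range(1, 6))
print(total)  # 55
```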

### Error Handling

Robust code should be able to handle potential errors gracefully without crashing. The `try...except` block is used for this. Code that might cause an error is placed in the `try` block, and the code to run if that specific error occurs is placed in the `except` block.

In [None]:
# The 'try' block contains code that might fail.
try:
    # This line will cause a ZeroDivisionError.
    result = 10 / 0
# If a ZeroDivisionError occurs in the try block, the code in this 'except' block is executed.
except ZeroDivisionError:
    # Instead of crashing, the program will print this message.
    print("Error: Cannot divide by zero!")
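
A `try` statement can also carry `else` and `finally` clauses: `else` runs only if no exception occurred, and `finally` runs in every case. The sketch below wraps this in `safe_divide`, a hypothetical helper introduced for illustration.

```python
# 'safe_divide' is a hypothetical helper, not part of any library.
def safe_divide(numerator, denominator):
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the division fails.
        print("Error: Cannot divide by zero!")
        return None
    else:
        # Runs only when the 'try' block raised no exception.
        return result
    finally:
        # Runs in every case, just before the function returns.
        print("Division attempted.")

print(safe_divide(10, 2))  # prints "Division attempted." and then 5.0
print(safe_divide(10, 0))  # prints the error, "Division attempted.", then None
```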

### Reading and Writing Files

The standard and safest way to work with files is using the `with open(...)` statement. This ensures that the file is automatically closed when you are finished with it, even if errors occur.

#### Writing to a File

In [None]:
# 'open' is used with the filename and a mode ('w' for write).
# The 'with' statement handles opening and closing the file.
with open('output.txt', 'w') as f:
    # The .write() method writes a string to the file.
    # '\n' is the newline character to move to the next line.
    f.write('Line 1\n')
    f.write('Line 2\n')

print("File 'output.txt' has been written successfully.")

#### Reading from a File

In [None]:
# Create an empty list to hold the lines from the file.
lines = []
# Open the file we just created in read mode ('r').
with open('output.txt', 'r') as f:
    # We can iterate directly over the file object 'f' to read it line by line.
    for line in f:
        # .strip() is used to remove the trailing newline character ('\n') from each line.
        lines.append(line.strip())

print(f"The lines read from the file are: {lines}")
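
The `Path` class imported earlier offers a shorter route for small text files. The sketch below writes and reads the same two lines; it uses a temporary directory (an assumption made here so that the example leaves no files behind).

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    # The '/' operator joins path components.
    file_path = Path(tmp_dir) / "output.txt"

    # write_text() opens the file, writes the string, and closes the file.
    file_path.write_text("Line 1\nLine 2\n")

    # read_text() returns the whole file as one string;
    # splitlines() breaks it into lines without the trailing '\n'.
    lines = file_path.read_text().splitlines()

print(lines)  # ['Line 1', 'Line 2']
```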