# 🐍 Python Bootcamp for Big Data and Analytics

**Module:** WM9G1-15 Big Data and Analytics for Industry  
**Department:** WMG, University of Warwick  

---

## Welcome to the Python Bootcamp!

This bootcamp is designed for **complete beginners** with no prior programming experience. Python is one of the most versatile and in-demand programming skills in the job market, particularly in data analytics, artificial intelligence, and engineering management.

### 📋 What You Will Learn

By the end of this bootcamp, you will be able to:

1. Write and run basic Python code
2. Work with different data types (text, numbers, lists)
3. Use variables to store and manipulate data
4. Control program flow with conditions and loops
5. Create reusable functions
6. Handle data using the Pandas library

### ⏱️ Estimated Time

This bootcamp should take approximately **4-6 hours** to complete. Work at your own pace and don't rush: understanding the fundamentals is more important than speed.

### 📝 Is This Compulsory?

**No, this is optional preparatory material.** It is not assessed and carries no credits. However, completing this bootcamp will significantly help you:

- Understand code examples faster during lectures
- Complete practical exercises more confidently
- Work more independently on your assessments

During the module, you will receive template code and full guidance, so you can participate even without completing this bootcamp. However, having these basics will make everything easier.

> **Note:** Python is not the only tool we will use. The module also covers Excel, SQL, and Tableau for data visualisation.

---

## Getting Help

If you encounter problems while working through this bootcamp:

### 1. Use the Module Teams Channel
Post your questions in the Teams space. This helps everyone learn and saves time for all students facing similar issues.

### 2. Online Resources
These free resources are excellent for self-study:

| Resource | Best For |
|----------|----------|
| [Python Documentation](https://docs.python.org/3/) | Official reference |
| [W3Schools Python](https://www.w3schools.com/python/) | Beginner tutorials |
| [Stack Overflow](https://stackoverflow.com/questions/tagged/python) | Problem-solving |
| [Google's Python Class](https://developers.google.com/edu/python) | Structured learning |
| [Microsoft's Python Course](https://learn.microsoft.com/en-us/training/paths/beginner-python/) | Video-based learning |

### 3. Warwick Library Digital Books
Recommended titles (available digitally):
- *Python Crash Course* by Eric Matthes
- *Automate the Boring Stuff with Python* by Al Sweigart
- *Think Python* by Allen B. Downey

### 4. Using AI Tools Wisely
AI assistants like Gemini can help explain concepts and provide hints. However:
- ✅ **Do** use AI to understand concepts you find difficult
- ✅ **Do** ask AI to explain error messages
- ❌ **Don't** copy AI code without understanding it
- ❌ **Don't** use AI to do your assessments for you

### 5. Contact the Module Team
If you've tried the above resources and still need help:
- **Module Leader:** Dr Leonardo Alves Dias (leonardo.alves-dias@warwick.ac.uk)

---

## Setting Up Your Programming Environment

A **programming environment** is the software you use to write, test, and run code. Think of it as a workshop with all the tools you need.

### What We Will Use: Google Colab

For this bootcamp (and the module), we use **Google Colaboratory (Colab)**, a free, cloud-based platform that:

- ✅ Requires **no installation** and runs entirely in your web browser
- ✅ Works on any device (Windows, Mac, Linux, Chromebook)
- ✅ Provides free computing resources
- ✅ Saves your work automatically to Google Drive
- ✅ Supports Python notebooks (`.ipynb` files)

### Python Notebooks vs Python Files

| Type | Extension | Description |
|------|-----------|-------------|
| Python Notebook | `.ipynb` | Interactive document mixing text, code, and outputs. Ideal for learning, data analysis, and experimentation. |
| Python Script | `.py` | Plain text file containing only code. Used for building applications and automation. |

We use **notebooks** because they let you run code in small chunks and see results immediately, which is perfect for learning!

### How to Save Your Work

In Google Colab, you can download your notebook in different formats:
1. Click **File** in the menu
2. Select **Download**
3. Choose either:
   - **Download .ipynb**: keeps the notebook format
   - **Download .py**: converts to a Python script

### Optional: Running Python Locally

If you prefer working on your own computer, you have two options:

**Option 1: Anaconda (Recommended)**
1. Download from [anaconda.com/download](https://www.anaconda.com/download)
2. Install using the graphical installer
3. Launch **Jupyter Notebook** from Anaconda Navigator

**Option 2: Python + VS Code**
1. Download Python from [python.org](https://www.python.org/downloads/)
2. Download VS Code from [code.visualstudio.com](https://code.visualstudio.com/)
3. Install the Python extension in VS Code

> **For this module, Google Colab is sufficient and recommended.**

---

## Introduction to Google Colab

Before we start coding, let's understand how Colab works.

### Cell Types

A Colab notebook consists of **cells**. There are two types:

1. **Text cells** (like this one): contain explanations and instructions using Markdown formatting
2. **Code cells**: contain Python code that you can run

### How to Run Code

To run a code cell:
- Click the **▶ Play button** on the left of the cell, OR
- Press **Shift + Enter** on your keyboard

Try running the cell below:

In [None]:
# This is your first code cell!
# Click the play button or press Shift+Enter to run it

print("Welcome to Python!")

### Useful Colab Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Shift + Enter` | Run current cell and move to next |
| `Ctrl + Enter` | Run current cell and stay |
| `Ctrl + M B` | Insert cell below |
| `Ctrl + M A` | Insert cell above |
| `Ctrl + M D` | Delete current cell |
| `Ctrl + /` | Comment/uncomment selected lines |

---

# Section 1: Your First Python Code

**Learning Objectives:**
- Use the `print()` function to display output
- Understand the difference between strings and numbers
- Write comments to explain your code

## 1.1 The `print()` Function

The `print()` function displays information on the screen. It's one of the most commonly used functions in Python.

### Printing Text (Strings)

Text in Python is called a **string**. Strings must be enclosed in quotation marks, either single `'` or double `"` quotes.

In [None]:
# Printing text using double quotes
print("Hello, World!")

# Printing text using single quotes (works the same way)
print('Welcome to Big Data and Analytics!')

### Printing Numbers

Numbers do **not** need quotation marks. Python recognises them automatically.

In [None]:
# Printing a whole number (integer)
print(42)

# Printing a decimal number (float)
print(3.14159)

# Printing the result of a calculation
print(10 + 5)

### ⚠️ Important: String vs Number

There's a crucial difference between `"123"` (a string) and `123` (a number):

In [None]:
# This is a STRING (text) - note the quotation marks
print("123")

# This is a NUMBER (integer) - no quotation marks
print(123)

# They look the same when printed, but behave differently:
print("10" + "5")  # String concatenation: joins text together
print(10 + 5)      # Arithmetic addition: calculates the sum

## 1.2 Comments

**Comments** are notes for humans reading the code. Python ignores them completely.

- Comments start with the `#` symbol
- Use comments to explain **why** you're doing something, not just what

In [None]:
# This is a single-line comment
print("This code runs")  # Comments can also go at the end of a line

# The line below is "commented out" - it won't run
# print("This code is ignored")

# Good comment: Explains the business purpose
# Calculate total revenue including 20% VAT
total = 100 * 1.20
print(total)

### ✏️ Try It Yourself

In the cell below, write code to:
1. Print your name
2. Print your age
3. Add a comment explaining what the code does

In [None]:
# Write your code below this line



---

# Section 2: Variables

**Learning Objectives:**
- Create and use variables to store data
- Follow Python naming conventions
- Understand variable reassignment

## 2.1 What Are Variables?

A **variable** is like a labelled container that stores data in your computer's memory. You give it a name, and Python remembers the value for you.

### Creating a Variable

To create a variable, choose a name and use the `=` sign to assign a value:

In [None]:
# Creating variables
product_name = "Steel Beam"  # A string variable
quantity = 150               # An integer variable
unit_price = 24.99           # A float variable

# Using the variables
print(product_name)
print(quantity)
print(unit_price)

### Using Variables in Calculations

Variables can be used in calculations and combined together:

In [None]:
# Calculate total cost
quantity = 150
unit_price = 24.99

total_cost = quantity * unit_price
print(total_cost)

## 2.2 Variable Naming Rules

### Rules (Must Follow)

| ✅ Allowed | ❌ Not Allowed |
|-----------|---------------|
| Start with letter or underscore | Start with number |
| Contain letters, numbers, underscores | Contain spaces |
| `my_variable`, `_count`, `data2024` | `2024data`, `my variable`, `my-var` |
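
A quick way to test the rules in the table without causing an error is the built-in string method `str.isidentifier()`, which reports whether a piece of text would be a legal name. The sketch below is only illustrative: it reuses the names from the table and adds the standard-library `keyword` module to show that reserved words are a separate restriction.

```python
import keyword

# str.isidentifier() returns True when the text follows Python's naming rules
print("my_variable".isidentifier())   # True
print("_count".isidentifier())        # True
print("data2024".isidentifier())      # True
print("2024data".isidentifier())      # False - starts with a number
print("my variable".isidentifier())   # False - contains a space
print("my-var".isidentifier())        # False - the hyphen is read as a minus sign

# Reserved words such as "for" pass the check above but still cannot be used as names
print("for".isidentifier())           # True
print(keyword.iskeyword("for"))       # True - so "for" is not available as a variable name
```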

### Conventions (Best Practice)

- Use **lowercase** letters
- Separate words with **underscores** (called `snake_case`)
- Choose **descriptive** names that explain the purpose
- Use **UPPERCASE** for constants (values that shouldn't change)

In [None]:
# Good variable names (descriptive and follow conventions)
customer_name = "Precision Steel Ltd"
order_quantity = 500
unit_price_gbp = 12.50

# Poor variable names (avoid these)
x = "Precision Steel Ltd"  # Not descriptive
n = 500                     # Single letter, unclear meaning
UnitPriceGBP = 12.50       # Should use snake_case

# Constants (use UPPERCASE)
VAT_RATE = 0.20
MAX_ORDER_SIZE = 10000

## 2.3 Case Sensitivity

Python is **case-sensitive**, so `name`, `Name`, and `NAME` are three different variables:

In [None]:
name = "Alice"
Name = "Bob"
NAME = "Charlie"

print(name)   # Alice
print(Name)   # Bob
print(NAME)   # Charlie

## 2.4 Reassigning Variables

You can change a variable's value at any time by assigning a new value:

In [None]:
# Initial value
stock_level = 100
print(f"Initial stock: {stock_level}")

# Update after a sale
stock_level = stock_level - 25
print(f"After sale: {stock_level}")

# Shorthand notation for updating
stock_level -= 10  # Same as: stock_level = stock_level - 10
print(f"After another sale: {stock_level}")

## 2.5 Multiple Assignment

Python allows you to assign multiple variables in a single line:

In [None]:
# Assign different values to multiple variables
name, age, department = "Sarah", 28, "Engineering"
print(name)
print(age)
print(department)

# Assign the same value to multiple variables
x = y = z = 0
print(x, y, z)
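
Multiple assignment also gives a tidy way to swap two values. The short sketch below uses the hypothetical names `low` and `high`; Python evaluates the right-hand side first, so no temporary variable is needed.

```python
low, high = 90, 10   # the values are the wrong way round

# Swap the two values in a single line
low, high = high, low

print(low, high)  # 10 90
```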

---

# Section 3: Data Types

**Learning Objectives:**
- Identify the main data types in Python
- Use the `type()` function to check data types
- Convert between data types

## 3.1 Basic Data Types

Python has several built-in data types. The most common ones are:

| Type | Name | Example | Description |
|------|------|---------|-------------|
| `str` | String | `"Hello"` | Text data |
| `int` | Integer | `42` | Whole numbers |
| `float` | Float | `3.14` | Decimal numbers |
| `bool` | Boolean | `True` or `False` | Logical values |
| `list` | List | `[1, 2, 3]` | Ordered collection |
| `dict` | Dictionary | `{"key": "value"}` | Key-value pairs |

## 3.2 Checking Data Types

Use the `type()` function to check what type of data you're working with:

In [None]:
# Check the type of different values
print(type("Hello"))        # str (string)
print(type(42))             # int (integer)
print(type(3.14))           # float
print(type(True))           # bool (boolean)
print(type([1, 2, 3]))      # list

## 3.3 Converting Between Types

Sometimes you need to convert data from one type to another:

| Function | Purpose | Example |
|----------|---------|--------|
| `str()` | Convert to string | `str(42)` → `"42"` |
| `int()` | Convert to integer | `int("42")` → `42` |
| `float()` | Convert to float | `float("3.14")` → `3.14` |

In [None]:
# Converting string to integer
quantity_text = "100"
quantity_number = int(quantity_text)
print(quantity_number + 50)  # Now we can do maths with it

# Converting number to string (useful for display)
price = 24.99
message = "The price is £" + str(price)
print(message)
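
Conversions do not always behave the way beginners expect, so here is a small sketch of two common surprises. The variable names (`weight_text`, `weight`) are made up for illustration.

```python
# int() on a float cuts off the decimal part (it does not round)
print(int(3.99))        # 3

# round() goes to the nearest whole number instead
print(round(3.99))      # 4

# A decimal written as text must go through float() first
weight_text = "3.7"
weight = float(weight_text)
print(weight)           # 3.7
print(int(weight))      # 3

# int("3.7") would stop the program with a ValueError,
# because "3.7" is not written as a whole number
```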

### Why Conversion Matters: User Input

Python's built-in `input()` function lets a program ask the user for a value. It pauses the program and waits for the user to type something; once the user presses Enter, it returns what was typed.

The `input()` function always returns a **string**, even if the user types a number. You must convert it if you want to do calculations:

In [None]:
# Get user input (always returns a string)
age_text = input("Enter your age: ")
print(f"You entered: {age_text}")
print(f"Type: {type(age_text)}")

# Convert to integer for calculations
age_number = int(age_text)
years_to_retirement = 65 - age_number
print(f"Years until retirement: {years_to_retirement}")

---

# Section 4: Working with Strings

**Learning Objectives:**
- Create and manipulate strings
- Use string methods for common operations
- Format strings to include variables

## 4.1 String Basics

Strings can be created with single quotes `'...'` or double quotes `"..."`. Use double quotes if your string contains an apostrophe:

In [None]:
# Both work the same
message1 = 'Hello, World!'
message2 = "Hello, World!"

# Use double quotes when the string contains an apostrophe
sentence = "It's a beautiful day."
print(sentence)

# Use single quotes when the string contains double quotes
quote = 'She said, "Hello!"'
print(quote)

## 4.2 String Concatenation

**Concatenation** means joining strings together using the `+` operator:

In [None]:
first_name = "Jane"
last_name = "Smith"

# Join strings with +
full_name = first_name + " " + last_name
print(full_name)

# Repeat strings with *
divider = "=" * 30
print(divider)

## 4.3 Useful String Methods

Strings have many built-in **methods** (functions that belong to the string) for common operations:

In [None]:
text = "  Big Data Analytics  "

# Case conversion
print(text.upper())        # "  BIG DATA ANALYTICS  "
print(text.lower())        # "  big data analytics  "
print(text.title())        # "  Big Data Analytics  "

# Whitespace handling
print(text.strip())        # "Big Data Analytics" (removes leading/trailing spaces)

# Finding and replacing
print(text.replace("Big", "Large"))  # "  Large Data Analytics  "

# Checking content
print(text.startswith("  Big"))  # True
print("Data" in text)             # True
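
A few more string methods come up often when cleaning data. The sketch below uses a made-up `sentence`; note that `split()` returns a list, a data type covered in Section 6.

```python
sentence = "big data needs big storage"

# Count how many times a piece of text appears
print(sentence.count("big"))     # 2

# Find the position (index) of the first match; -1 means "not found"
print(sentence.find("data"))     # 4
print(sentence.find("cloud"))    # -1

# Split into a list of words, then join them back with a different separator
words = sentence.split()
print(words)                     # ['big', 'data', 'needs', 'big', 'storage']
joined = "-".join(words)
print(joined)                    # big-data-needs-big-storage

# len() gives the number of characters, including spaces
print(len(sentence))             # 26
```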

## 4.4 F-Strings (Formatted String Literals) ⭐

**F-strings** are the modern, recommended way to include variables in strings. Add `f` before the opening quote and put variables inside `{}`:

In [None]:
product = "Steel Beam"
quantity = 150
price = 24.99

# F-string - clean and readable
message = f"Order: {quantity} x {product} at £{price} each"
print(message)

# You can include expressions inside the braces
total = f"Total cost: £{quantity * price}"
print(total)

# Formatting numbers (2 decimal places)
formatted = f"Total cost: £{quantity * price:.2f}"
print(formatted)
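
The part after the colon inside the braces is a format specification, and `.2f` is only one option. The sketch below shows a few others that are handy in reports; the values and labels are invented for illustration.

```python
revenue = 1234567.891
vat_rate = 0.2

# A comma in the format adds thousands separators
print(f"Revenue: £{revenue:,.2f}")   # Revenue: £1,234,567.89

# The % format multiplies by 100 and adds a percent sign
print(f"VAT rate: {vat_rate:.0%}")   # VAT rate: 20%

# < and > with a width align text left or right (useful for neat columns)
header = f"{'Item':<10}{'Qty':>5}"
row = f"{'Bolt':<10}{250:>5}"
print(header)
print(row)
```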

## 4.5 String Indexing and Slicing

Each character in a string has a position called an **index**. Python starts counting from 0:

In [None]:
word = "Python"

# Indexing: access individual characters
print(word[0])    # 'P' - first character
print(word[1])    # 'y' - second character
print(word[-1])   # 'n' - last character

# Slicing: get a portion of the string [start:stop]
print(word[0:3])  # 'Pyt' - characters 0, 1, 2 (stop is exclusive)
print(word[2:])   # 'thon' - from index 2 to end
print(word[:4])   # 'Pyth' - from start to index 3
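
Slices accept an optional third number, the step. The short sketch below extends the `word` example to show stepping and reversing.

```python
word = "Python"

# A third number sets the step: [start:stop:step]
print(word[::2])    # 'Pto' - every second character
print(word[1::2])   # 'yhn' - every second character, starting at index 1

# A step of -1 walks backwards, which reverses the string
print(word[::-1])   # 'nohtyP'

# Negative indexes work in slices too
print(word[-3:])    # 'hon' - the last three characters
```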

---

# Section 5: Numbers and Arithmetic

**Learning Objectives:**
- Perform arithmetic operations
- Understand integers vs floats
- Use comparison operators

## 5.1 Arithmetic Operators

| Operator | Name | Example | Result |
|----------|------|---------|--------|
| `+` | Addition | `5 + 3` | `8` |
| `-` | Subtraction | `5 - 3` | `2` |
| `*` | Multiplication | `5 * 3` | `15` |
| `/` | Division | `5 / 3` | `1.666...` |
| `//` | Floor Division | `5 // 3` | `1` |
| `%` | Modulus (remainder) | `5 % 3` | `2` |
| `**` | Exponentiation | `5 ** 3` | `125` |

In [None]:
# Basic arithmetic
print(10 + 3)   # 13
print(10 - 3)   # 7
print(10 * 3)   # 30
print(10 / 3)   # 3.333... (always returns a float)

# Floor division - rounds down to nearest whole number
print(10 // 3)  # 3

# Modulus - returns the remainder
print(10 % 3)   # 1

# Exponentiation - raises to a power
print(2 ** 10)  # 1024
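
One practical difference between integers and floats is precision: integers are exact, while floats are stored in binary and can be very slightly off. The sketch below shows the classic case and two common ways of handling it, using the standard-library `math` module.

```python
import math

# Floats are stored in binary, so some decimals are only approximate
total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False

# Option 1: round before displaying or comparing
print(round(total, 2))            # 0.3

# Option 2: ask whether two floats are "close enough"
print(math.isclose(total, 0.3))   # True

# Integers are exact, so whole-number arithmetic has no such issue
print(10 // 3, 10 % 3)            # 3 1
```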

### Practical Example: Engineering Calculation

In [None]:
# Calculate total cost with VAT
unit_price = 45.50
quantity = 200
vat_rate = 0.20

subtotal = unit_price * quantity
vat_amount = subtotal * vat_rate
total = subtotal + vat_amount

print(f"Subtotal: £{subtotal:.2f}")
print(f"VAT (20%): £{vat_amount:.2f}")
print(f"Total: £{total:.2f}")

### Order of Operations

Just like in standard maths, Python follows a specific order when evaluating arithmetic expressions. This is often remembered by the acronym PEMDAS (or BODMAS in the UK):

- **P**arentheses: `( )`
- **E**xponents: `**`
- **M**ultiplication and **D**ivision: `*`, `/`, `%`, and `//` (from left to right)
- **A**ddition and **S**ubtraction: `+` and `-` (from left to right)

Let's demonstrate using an example where we're coding a video game that calculates the player's score by adding the number of kills to the time they've been alive and then tripling this sum.

Here is the code:

In [None]:
kills = 20
time = 60
final_score = (kills + time) * 3
print(final_score) # Correct: Outputs 240

Without brackets, multiplication has precedence over addition, leading to an incorrect score.

See below:

In [None]:
kills = 20
time = 60
final_score = kills + time * 3
print(final_score)  # Incorrect: Outputs 200

### Different Behaviours with Different Data Types

In Python, the plus (`+`) operator works differently depending on the data type of the objects we are using.

- With **strings**, `+` concatenates them.
- With **numbers**, `+` adds them.

Remember, anything inside quotation marks is treated as a string, so Python will not do arithmetic on it as it would with a number: `"7" + "7"` joins the text rather than adding the values.

In [None]:
# This is string concatenation, not addition.
string_numbers = "7" + "7" + "7"
print(string_numbers)  # Outputs: 777
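
When one value is a string and the other is a number, you have to choose which behaviour you want by converting one of them. A minimal sketch, with hypothetical variable names:

```python
quantity_text = "7"
extra = 3

# Convert the string to a number to ADD the values
print(int(quantity_text) + extra)    # 10

# Convert the number to a string to JOIN the text
print(quantity_text + str(extra))    # 73

# Mixing the two, as in quantity_text + extra, stops the program with a
# TypeError because Python will not guess which behaviour you want
```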

## 5.2 Comparison Operators

Comparison operators return `True` or `False`:

| Operator | Meaning | Example | Result |
|----------|---------|---------|--------|
| `==` | Equal to | `5 == 5` | `True` |
| `!=` | Not equal to | `5 != 3` | `True` |
| `>` | Greater than | `5 > 3` | `True` |
| `<` | Less than | `5 < 3` | `False` |
| `>=` | Greater than or equal | `5 >= 5` | `True` |
| `<=` | Less than or equal | `5 <= 3` | `False` |

In [None]:
# Comparison examples
x = 10
y = 5

print(x == y)   # False - is x equal to y?
print(x != y)   # True - is x not equal to y?
print(x > y)    # True - is x greater than y?
print(x >= 10)  # True - is x greater than or equal to 10?

## 5.3 Logical Operators

Logical operators combine multiple conditions into a single `True` or `False` result:

| Operator | Description | Example |
|----------|-------------|--------|
| `and` | Both must be True | `(x > 0) and (x < 10)` |
| `or` | At least one must be True | `(x < 0) or (x > 10)` |
| `not` | Reverses the result | `not (x == 5)` |

In [None]:
age = 25
has_license = True

# Both conditions must be True
can_rent_car = (age >= 21) and has_license
print(f"Can rent a car: {can_rent_car}")  # True

# At least one condition must be True
is_eligible = (age < 18) or (age >= 65)
print(f"Eligible for discount: {is_eligible}")  # False

# Negate a condition
is_adult = not (age < 18)
print(f"Is adult: {is_adult}")  # True
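
For range checks, Python also allows comparisons to be chained, which often reads more naturally than joining two comparisons with `and`. The sketch below uses illustrative tolerance values.

```python
measurement = 10.2

# Written with "and"
in_range = (measurement >= 9.8) and (measurement <= 10.5)
print(in_range)           # True

# Chained form: reads like the maths notation 9.8 <= measurement <= 10.5
in_range_chained = 9.8 <= measurement <= 10.5
print(in_range_chained)   # True
```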

---

# Section 6: Lists

**Learning Objectives:**
- Create and modify lists
- Access elements using indexing and slicing
- Use common list methods

## 6.1 What Are Lists?

A **list** is an ordered collection of items. Lists are one of the most versatile data structures in Python.

- Created with square brackets `[]`
- Items are separated by commas
- Can contain mixed data types
- Can be modified (mutable)

In [None]:
# Creating lists
products = ["Steel Beam", "Copper Wire", "Aluminium Sheet"]
quantities = [150, 500, 75]
mixed_list = ["Product A", 100, 25.99, True]

print(products)
print(quantities)
print(mixed_list)

## 6.2 Accessing List Elements

Use indexing (starting from 0) to access individual elements:

In [None]:
products = ["Steel Beam", "Copper Wire", "Aluminium Sheet", "Brass Tube"]

# Access by index
print(products[0])    # First item: "Steel Beam"
print(products[2])    # Third item: "Aluminium Sheet"
print(products[-1])   # Last item: "Brass Tube"

# Slicing
print(products[1:3])  # Items at index 1 and 2
print(products[:2])   # First two items
print(products[2:])   # From index 2 to end

## 6.3 Modifying Lists

Lists are **mutable**, meaning you can change, add, and remove items:

In [None]:
products = ["Steel Beam", "Copper Wire", "Aluminium Sheet"]

# Change an item
products[1] = "Silver Wire"
print(products)  # ['Steel Beam', 'Silver Wire', 'Aluminium Sheet']

# Add items
products.append("Brass Tube")      # Add to end
print(products)

products.insert(1, "Gold Bar")     # Insert at specific position
print(products)

# Remove items
products.remove("Gold Bar")        # Remove by value
print(products)

removed = products.pop()           # Remove and return last item
print(f"Removed: {removed}")
print(products)

## 6.4 Useful List Methods

In [None]:
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Length
print(f"Length: {len(numbers)}")        # 8

# Count occurrences
print(f"Count of 1: {numbers.count(1)}")  # 2

# Find index of value
print(f"Index of 5: {numbers.index(5)}")  # 4

# Sort (modifies the list in place)
numbers.sort()
print(f"Sorted: {numbers}")

# Reverse
numbers.reverse()
print(f"Reversed: {numbers}")

# Min, Max, Sum
print(f"Min: {min(numbers)}, Max: {max(numbers)}, Sum: {sum(numbers)}")
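
Because `.sort()` changes the list itself, it is worth knowing the built-in `sorted()` function, which leaves the original list alone. A short sketch with a made-up `readings` list:

```python
readings = [3, 1, 4, 1, 5]

# sorted() returns a NEW list and leaves the original untouched
ascending = sorted(readings)
descending = sorted(readings, reverse=True)

print(readings)     # [3, 1, 4, 1, 5] - unchanged
print(ascending)    # [1, 1, 3, 4, 5]
print(descending)   # [5, 4, 3, 1, 1]

# By contrast, .sort() changes the list itself and returns None
result = readings.sort()
print(result)       # None
print(readings)     # [1, 1, 3, 4, 5]
```

A common slip is writing `readings = readings.sort()`, which replaces the list with `None`.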

---

# Section 7: Dictionaries

**Learning Objectives:**
- Create and use dictionaries
- Access and modify key-value pairs
- Understand when to use dictionaries vs lists

## 7.1 What Are Dictionaries?

A **dictionary** stores data as **key-value pairs**. Think of it like a real dictionary where each word (key) has a definition (value).

- Created with curly braces `{}`
- Keys must be unique
- Values can be any data type
- Accessed by key, not by position

In [None]:
# Creating a dictionary
product = {
    "name": "Steel Beam",
    "sku": "SB-100",
    "price": 24.99,
    "quantity": 150,
    "in_stock": True
}

print(product)

## 7.2 Accessing Dictionary Values

In [None]:
product = {
    "name": "Steel Beam",
    "sku": "SB-100",
    "price": 24.99,
    "quantity": 150
}

# Access by key
print(product["name"])      # "Steel Beam"
print(product["price"])     # 24.99

# Using .get() - safer, returns None if key doesn't exist
print(product.get("name"))           # "Steel Beam"
print(product.get("weight"))         # None (key doesn't exist)
print(product.get("weight", "N/A"))  # "N/A" (default value)

## 7.3 Modifying Dictionaries

In [None]:
product = {"name": "Steel Beam", "price": 24.99, "quantity": 150}

# Update a value
product["price"] = 26.99
print(product)

# Add a new key-value pair
product["weight"] = 15.5
print(product)

# Remove a key-value pair
del product["weight"]
print(product)

# Get all keys and values
print(f"Keys: {list(product.keys())}")
print(f"Values: {list(product.values())}")
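
A few other dictionary operations are useful alongside those above. The sketch below is illustrative only; the `supplier` key and its value are invented for the example.

```python
product = {"name": "Steel Beam", "price": 24.99, "quantity": 150}

# Check whether a key exists before using it
print("price" in product)     # True
print("weight" in product)    # False

# update() changes or adds several pairs at once
product.update({"price": 27.50, "supplier": "Precision Steel Ltd"})
print(product["price"])       # 27.5

# pop() removes a key and hands back its value
removed_quantity = product.pop("quantity")
print(removed_quantity)       # 150

# len() counts the key-value pairs that remain
print(len(product))           # 3
```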

## 7.4 Nested Structures

Dictionaries can contain lists, and vice versa:

In [None]:
# Dictionary with a list
customer = {
    "name": "Precision Steel Ltd",
    "contact": "John Smith",
    "orders": ["ORD001", "ORD002", "ORD003"]
}

print(customer["orders"])       # ['ORD001', 'ORD002', 'ORD003']
print(customer["orders"][0])    # 'ORD001'

# List of dictionaries (very common in data analysis)
products = [
    {"name": "Steel Beam", "price": 24.99},
    {"name": "Copper Wire", "price": 12.50},
    {"name": "Aluminium Sheet", "price": 18.75}
]

print(products[0]["name"])  # 'Steel Beam'
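
Nested structures can be changed in the same way they are read: chain the keys and indexes until you reach the item you want. A small sketch, reusing shortened versions of the data above:

```python
customer = {
    "name": "Precision Steel Ltd",
    "orders": ["ORD001", "ORD002"]
}

# Add to the list stored inside the dictionary
customer["orders"].append("ORD003")
print(len(customer["orders"]))   # 3

products = [
    {"name": "Steel Beam", "price": 24.99},
    {"name": "Copper Wire", "price": 12.50}
]

# Update a value inside the second dictionary in the list
products[1]["price"] = 13.00
print(products[1])               # {'name': 'Copper Wire', 'price': 13.0}
```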

---

# Section 8: More Data Types

**Learning Objectives:**
- Recap comparison operators
- Become more familiar with Booleans
- Get an introduction to tuples and sets

A Boolean is a data type that can only be `True` or `False`.

Here is how you might see Booleans from comparison operators:

In [None]:
print(5 == 10)  # Outputs: False
print(5 != 10)  # Outputs: True
print(5 > 10)   # Outputs: False
print(5 < 10)   # Outputs: True
print(5 >= 10)  # Outputs: False
print(5 <= 5)   # Outputs: True

#### Points to remember

- Use comparison operators to compare values.
- The result of a comparison is always a Boolean: `True` or `False`.
- Booleans are essential for decision-making in programming.
- In programming, a statement or expression that evaluates to either `True` or `False` is referred to as a condition.

### 8.1 Introduction to Tuples

A tuple is an **ordered** and **unchangeable** collection of items. Tuples are similar to lists, but the key difference is that tuples are immutable. This means that once a tuple is created, the items in it cannot be changed.

Tuples are used when we have a collection of items that we want to keep together and unmodified throughout the life of a program. They can be faster than lists and protect your data against accidental change.

#### Tuple Syntax

A tuple is created by placing all the items inside parentheses `()`, separated by commas.

In [None]:
my_tuple = (1, 2, 3)
print(my_tuple)

If we wish to store a single element in a tuple, we need to include a comma after the item, even though there is only one value.

In [None]:
my_tuple = (1)
print(type(my_tuple)) # Outputs: <class 'int'>

my_tuple = (1,)
print(type(my_tuple)) # Outputs: <class 'tuple'>

#### Unpacking a Tuple

When we create a tuple, we normally assign values to it. This is called "packing" a tuple. We are also allowed to extract the values back into variables. This is called "unpacking".

In [None]:
my_tuple_2 = (4, 5, 6)
(number_1, number_2, number_3) = my_tuple_2

print(number_1)
print(number_2)
print(number_3)

#### Accessing Tuple Elements by Index

Tuples, being ordered collections, allow us to access their elements using their index, just like we would with a list or a string. Each element in the tuple has a specific position assigned to it, starting from 0 for the first element.

In [None]:
colours = ("red", "yellow", "pink", "green", "orange", "purple", "blue")

first_colour = colours[0]
last_colour = colours[-1]
middle_colours = colours[2:5]

print(first_colour)
print(last_colour)
print(middle_colours)
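
To see what "immutable" means in practice, the sketch below tries to change an item in a tuple. It uses `try` and `except`, which this bootcamp has not introduced, purely so that the cell keeps running after the error; the `dimensions` values are made up.

```python
dimensions = (1200, 300, 50)   # length, width, height in mm

try:
    dimensions[0] = 1500       # tuples do not allow item assignment
except TypeError as error:
    message = str(error)
    print(f"Not allowed: {message}")

# The tuple is unchanged
print(dimensions)              # (1200, 300, 50)

# To "change" a tuple, build a new one instead
longer = (1500,) + dimensions[1:]
print(longer)                  # (1500, 300, 50)
```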

### 8.2 Introduction to Sets

A set is an **unordered** collection of **unique** elements. Sets are useful when you need to ensure that there are no duplicates and when the order is not important.

In [None]:
my_set = {"red", "yellow", "pink", "green", "orange", "purple", "blue"}
print(my_set) # Check the ordering of the output.
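
Because a set is unordered, you cannot index it (`my_set[0]` raises an error). The usual operations are membership tests and adding or removing items, sketched below with a smaller, illustrative set.

```python
colours = {"red", "yellow", "pink"}

# Membership tests are what sets are best at
print("red" in colours)      # True
print("blue" in colours)     # False

# add() inserts a new item; adding an existing item changes nothing
colours.add("blue")
colours.add("red")
print(len(colours))          # 4

# discard() removes an item (and does nothing if the item is missing)
colours.discard("pink")
print(sorted(colours))       # ['blue', 'red', 'yellow']
```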

#### Using Sets to Remove Duplicates from a List

One common use of a set in Python is to remove duplicate elements from a list. Since sets cannot have duplicate values, converting a list to a set automatically removes any repeated items.

In [None]:
my_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_values = set(my_list)
unique_list = list(unique_values)

print(unique_list)
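
One caveat: converting to a set does not promise to keep the original order. If order matters, two common alternatives are sketched below, using a hypothetical `readings` list.

```python
readings = [30, 10, 30, 20, 10]

# Option 1: sort the unique values
unique_sorted = sorted(set(readings))
print(unique_sorted)             # [10, 20, 30]

# Option 2: dict.fromkeys() keeps the first occurrence of each value
# in its original position
unique_in_order = list(dict.fromkeys(readings))
print(unique_in_order)           # [30, 10, 20]
```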

#### Joining Sets

To join two sets, use the `union()` method, which returns a new set containing all items from both sets, or the `update()` method, which adds all the items from one set into the other in place:


In [None]:
set1 = {"a", "b", "c"}
set2 = {1, 2, 3}

set3 = set1.union(set2)
print(set3)

In [None]:
set1 = {"a", "b", "c"}
set2 = {1, 2, 3}

set1.update(set2)
print(set1)

Both `union()` and `update()` will exclude any duplicate items.

In [None]:
set4 = {"a", "b", "c", 1}
set5 = {1, 2, 3}

set4.update(set5)
print(set4)

#### Some Useful Set Methods

Python provides several built-in methods for working with sets.

The `difference()` method returns a new set containing the items that are in the first set but not in the other set(s).

In [None]:
x = {"a", "b", "c", 1}
y = {1, 2, 3}

z = x.difference(y) # Return a set that contains the items that only exist in set x, and not in set y

print(z)

In [None]:
x = {"a", "b", "c", 1}
y = {1, 2, 3}

z = y.difference(x) # Reverse the above example. Return a set that contains the items that only exist in set y, and not in set x

print(z)

The `intersection()` method returns a new set containing only the items that appear in all of the sets being compared.

In [None]:
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

z = x.intersection(y)

print(z)
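
The same set operations can also be written with operators, which you will often see in other people's code. The sketch below uses invented team names and wraps each result in `sorted()` only so that the printed order is predictable.

```python
team_a = {"Ana", "Ben", "Cara"}
team_b = {"Cara", "Dev"}

print(sorted(team_a | team_b))   # union: ['Ana', 'Ben', 'Cara', 'Dev']
print(sorted(team_a & team_b))   # intersection: ['Cara']
print(sorted(team_a - team_b))   # difference: ['Ana', 'Ben']
print(sorted(team_a ^ team_b))   # in exactly one set: ['Ana', 'Ben', 'Dev']
```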

### 8.3 Exercise

Here are some exercises to check your understanding of the sections above. Try your best to solve each problem; once you finish, use the link at the end to check the solutions.

1) Calculate the area of a circle with a radius of 7.5. Use the formula area = πr², where r is the radius of the circle.

In [None]:
# Your answer here

2) Given the string `"Python is awesome"`, write a Python program to:

- Print this string in uppercase.
- Replace "awesome" with "great".
- Print the number of occurrences of the letter 's' in this string.

In [None]:
# Your answer here

3) Given the list `numbers = [10, 20, 30, 40, 50]`, write a Python program to:

- Add the number 60 to the end of the list.
- Insert the number 25 between 20 and 30.
- Remove the number 40 from the list.
- Print the sum of all numbers in the updated list.

In [None]:
# Your answer here

4) Given the dictionary `student = {"name": "Emma", "class": "9th", "marks": 75}`, write a Python program to:

- Add a new key-value pair `"gender": "Female"`.
- Update the marks to 85.
- Print all keys in the dictionary.
- Print all values in the dictionary.

In [None]:
# Your answer here

5) Given the tuple `info = ("John", "Doe", 30)`, write a Python program to:

- Unpack this tuple into variables `first_name`, `last_name`, and `age`.
- Print these variables.

In [None]:
# Your answer here

6) Given two sets `set1 = {1, 2, 3, 4, 5}` and `set2 = {4, 5, 6, 7, 8}`, write a Python program to:

- Print the union of these two sets.
- Print the intersection of these two sets.
- Find the set difference of `set1` minus `set2`.

In [None]:
# Your answer here

[Click the link to check the solutions!](https://colab.research.google.com/drive/1VdbLgwSrifeigCVSpbqjuv_-x7fTIRdi?usp=sharing)


---

# Section 9: Conditional Statements

**Learning Objectives:**
- Use `if`, `elif`, and `else` statements
- Write conditions that control program flow
- Understand indentation in Python

## 9.1 The `if` Statement

An `if` statement runs code only when a condition is `True`:

```python
if condition:
    # code to run if condition is True
```

**Important:** The code inside the `if` block must be **indented** (by convention, 4 spaces per level).

In [None]:
temperature = 35

if temperature > 30:
    print("Warning: High temperature!")  # Indented, so this line belongs to the if block
    print("Consider cooling measures.")  # Also in the block: runs only when temperature > 30 is True

print("Temperature check complete.")  # This always runs

## 9.2 The `if-else` Statement

Use `else` to specify what happens when the condition is `False`:

In [None]:
stock_level = 45
reorder_point = 50

if stock_level < reorder_point:
    print("Stock is low. Please reorder!")  # Runs if stock_level < reorder_point is True
else:
    print("Stock levels are adequate.")  # Runs if the condition is False

## 9.3 The `if-elif-else` Statement

Use `elif` (short for "else if") to check multiple conditions:

In [None]:
score = 72  # Try changing this value and re-running the cell to see each branch in action

if score >= 70:
    grade = "Distinction"
elif score >= 60:
    grade = "Merit"
elif score >= 50:
    grade = "Pass"
else:
    grade = "Fail"

print(f"Score: {score}% - Grade: {grade}")
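
Python checks the conditions from top to bottom and runs only the first branch whose condition is `True`; every later branch is skipped. The sketch below illustrates why the order of the checks matters. The names `grade_correct` and `grade_wrong` are made up for this illustration and are not used elsewhere in the bootcamp.

```python
score = 72

# Sensible order: the most demanding condition is checked first
if score >= 70:
    grade_correct = "Distinction"
elif score >= 60:
    grade_correct = "Merit"
elif score >= 50:
    grade_correct = "Pass"
else:
    grade_correct = "Fail"

# Reversed order: 72 >= 50 is already True, so Python stops at the first branch
if score >= 50:
    grade_wrong = "Pass"
elif score >= 60:
    grade_wrong = "Merit"
elif score >= 70:
    grade_wrong = "Distinction"
else:
    grade_wrong = "Fail"

print(f"Sensible order: {grade_correct}")  # Distinction
print(f"Reversed order: {grade_wrong}")    # Pass
```

When the thresholds overlap like this, start with the strictest condition and work down.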

### Practical Example: Quality Control

In [None]:
# Quality control check
measurement = 10.2  # mm
lower_limit = 9.8
upper_limit = 10.5

if measurement < lower_limit:
    status = "REJECT - Below tolerance"
elif measurement > upper_limit:
    status = "REJECT - Above tolerance"
else:
    status = "PASS - Within tolerance"

print(f"Measurement: {measurement}mm")
print(f"Tolerance: {lower_limit}mm - {upper_limit}mm")
print(f"Status: {status}")

---

# Section 10: Loops

**Learning Objectives:**
- Use `for` loops to iterate over sequences
- Use `while` loops for condition-based repetition
- Control loop execution with `break` and `continue`

## 10.1 The `for` Loop

A `for` loop repeats code for each item in a sequence (list, string, range, etc.):

```python
for item in sequence:
    # code to run for each item
```

In [None]:
# Loop through a list
products = ["Steel Beam", "Copper Wire", "Aluminium Sheet"]

for product in products:
    print(f"Processing: {product}")

### Using `range()` for Numeric Loops

In [None]:
# range(5) generates: 0, 1, 2, 3, 4
for i in range(5):
    print(f"Iteration {i}")

print("---")

# range(start, stop) - from 1 to 4
for i in range(1, 5):
    print(f"Item {i}")

print("---")

# range(start, stop, step)
for i in range(0, 10, 2):  # Count by 2
    print(i)
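
A quick way to check what a `range()` call will produce is to wrap it in `list()`. The short sketch below does this for the three forms used above, plus a countdown with a negative step; the variable names are chosen only for this illustration.

```python
# range() produces its numbers one at a time, so wrap it in list() to view them all
first_five = list(range(5))
one_to_four = list(range(1, 5))
evens = list(range(0, 10, 2))
countdown = list(range(5, 0, -1))  # a negative step counts down; the stop value (0) is still excluded

print(first_five)   # [0, 1, 2, 3, 4]
print(one_to_four)  # [1, 2, 3, 4]
print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [5, 4, 3, 2, 1]
```

In every case the stop value itself is never included.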

### Using `enumerate()` for Index and Value

In [None]:
products = ["Steel Beam", "Copper Wire", "Aluminium Sheet"]

# Get both index and value
for index, product in enumerate(products):
    print(f"{index + 1}. {product}")

## 10.2 The `while` Loop

A `while` loop repeats code **as long as** a condition is `True`:

```python
while condition:
    # code to repeat
```

⚠️ **Warning:** Make sure the condition eventually becomes `False`, or you'll create an infinite loop!

In [None]:
# Count down example
count = 5

while count > 0:
    print(f"Count: {count}")
    count -= 1  # Decrease count by 1 each iteration, so count eventually reaches 0, the condition becomes False, and the loop ends

print("Liftoff!")
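
A `while` loop is most useful when you do not know in advance how many repetitions are needed. The sketch below uses made-up stock figures (`stock_level`, `daily_demand`) to count how many days pass before stock falls below the reorder point.

```python
stock_level = 120    # hypothetical starting stock
reorder_point = 50
daily_demand = 30    # hypothetical units sold per day
days = 0

while stock_level >= reorder_point:
    stock_level -= daily_demand  # this update is what eventually makes the condition False
    days += 1
    print(f"Day {days}: stock = {stock_level}")

print(f"Reorder needed after {days} days.")  # Reorder needed after 3 days.
```

If the line that reduces `stock_level` were removed, the condition would never change and the loop would run forever.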

## 10.3 Loop Control: `break` and `continue`

- `break`: exit the loop immediately
- `continue`: skip to the next iteration

In [None]:
# Using break - stop when we find what we're looking for
numbers = [1, 3, 5, 7, 9, 2, 4, 6, 8]

for num in numbers:
    if num % 2 == 0:  # If even
        print(f"Found first even number: {num}")
        break
    print(f"Checking {num}...")

print("---")

# Using continue - skip certain items
for num in range(1, 6):
    if num == 3:
        continue  # Skip number 3
    print(f"Number: {num}")

### Practical Example: Processing Orders

In [None]:
# Calculate total order value
orders = [
    {"product": "Steel Beam", "quantity": 10, "price": 24.99},
    {"product": "Copper Wire", "quantity": 50, "price": 12.50},
    {"product": "Aluminium Sheet", "quantity": 25, "price": 18.75}
]

total_value = 0

for order in orders:
    line_total = order["quantity"] * order["price"]
    print(f"{order['product']}: {order['quantity']} x £{order['price']:.2f} = £{line_total:.2f}")
    total_value += line_total

print(f"\nTotal Order Value: £{total_value:.2f}")

---

# Section 11: Functions

**Learning Objectives:**
- Create reusable functions
- Understand parameters and return values
- Use built-in Python functions

## 11.1 What Are Functions?

A **function** is a reusable block of code that performs a specific task. Functions help you:

- Avoid repeating the same code
- Organise your program into logical pieces
- Make code easier to test and debug

The code inside a function is only executed when the function is called. To call a function, write its name followed by parentheses.

### Creating a Function

In [None]:
# Define a function using 'def' followed by a name and ().
# The name of the function below is greet.
def greet():
    """A simple greeting function."""  # This is a docstring. It is optional and can be used to describe what the function does.
    print("Hello! Welcome to our system.")

# Call (use/execute) the function
greet()
greet()  # You can call a function multiple times

## 11.2 Functions with Parameters

**Parameters** allow you to pass data into a function:

In [None]:
# Function greet_user with one parameter called "name"
def greet_user(name):
    """Greet a user by name."""
    print(f"Hello, {name}!")

greet_user("Sarah")
greet_user("James")

# Function with multiple parameters
def calculate_total(quantity, unit_price):
    """Calculate total cost."""
    total = quantity * unit_price
    print(f"Total: £{total:.2f}")

calculate_total(100, 24.99)
calculate_total(50, 12.50)

## 11.3 Return Values

Use `return` to send a value back from a function:

In [None]:
# You can also give parameters default values, such as rate=0.20.
# This allows you to call the function without passing a value for that parameter.
def calculate_vat(amount, rate=0.20):
    """
    Calculate VAT for a given amount.

    Parameters:
        amount: The base amount
        rate: VAT rate (default 20%)

    Returns:
        The VAT amount
    """
    vat = amount * rate
    return vat

# Using the returned value
subtotal = 100.00
vat_amount = calculate_vat(subtotal)  # The second parameter, rate, is not passed, so its default value of 0.20 is used
total = subtotal + vat_amount

print(f"Subtotal: £{subtotal:.2f}")
print(f"VAT: £{vat_amount:.2f}")
print(f"Total: £{total:.2f}")
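
The difference between printing a value and returning it is easy to miss. In the sketch below, `total_printed` and `total_returned` are hypothetical functions written only to show the contrast: a function with no `return` statement hands back `None`, so its result cannot be used in later calculations.

```python
def total_printed(quantity, unit_price):
    """Print the total but return nothing."""
    print(f"Total: £{quantity * unit_price:.2f}")

def total_returned(quantity, unit_price):
    """Return the total so the caller can use it."""
    return quantity * unit_price

result_a = total_printed(10, 2.50)   # prints the total; result_a is None
result_b = total_returned(10, 2.50)  # prints nothing; result_b is 25.0

print(result_a)      # None
print(result_b + 5)  # 30.0 (a returned value can be reused)
```

As a rule of thumb, use `return` when the rest of your program needs the result, and `print()` only to show something to the user.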

## 11.4 Default Parameter Values

You can give parameters default values:

In [None]:
def create_order_summary(product, quantity, unit_price, discount=0):
    """Create an order summary with optional discount."""
    subtotal = quantity * unit_price
    discount_amount = subtotal * discount
    total = subtotal - discount_amount

    return {
        "product": product,
        "quantity": quantity,
        "subtotal": subtotal,
        "discount": discount_amount,
        "total": total
    }

# Without discount
order1 = create_order_summary("Steel Beam", 100, 24.99)
print(order1)

# With 10% discount
order2 = create_order_summary("Copper Wire", 50, 12.50, discount=0.10)
print(order2)
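
When a function has more than one default parameter, naming the argument in the call (as with `discount=0.10` above) lets you set one default and leave the others untouched. The sketch below uses a hypothetical `shipping_cost` function, with made-up rates, to show the three ways of calling it.

```python
def shipping_cost(weight_kg, rate_per_kg=2.0, express=False):
    """Return a shipping cost; express delivery doubles it (illustrative rule)."""
    cost = weight_kg * rate_per_kg
    if express:
        cost *= 2
    return cost

standard = shipping_cost(10)                    # both defaults are used
custom_rate = shipping_cost(10, 3.0)            # the second positional value replaces rate_per_kg
express_only = shipping_cost(10, express=True)  # rate_per_kg keeps its default of 2.0

print(standard, custom_rate, express_only)      # 20.0 30.0 40.0
```

Parameters without defaults must come first in the function definition, followed by those with defaults.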

## 11.5 Learn More About Functions with an Example

Let's create an e-commerce example, assuming we have some customers and recorded their ordering information. Here we have created two dictionaries, storing customer information and order details respectively.

In [None]:
# Assume we have 4 customers. For each one we store their name ('name'), the number of open orders ('open_orders') and their order numbers ('orders').
# If 'open_orders' == 0, 'orders' is an empty list '[]'
customers = {'James':{'name':'James Pennington', \
                      'open_orders':1, \
                      'orders':['o10001', 'o10002', 'o10004']
                      }, \
             'Gareth':{'name':'Gareth Edwards', \
                       'open_orders':0, \
                       'orders':[]
                       }, \
             'Mark':{'name':'Mark Bonnett', \
                     'open_orders':1, \
                     'orders':['o10003']
                     }, \
             'Emily':{'name':'Emily Davis', \
             'open_orders':0, \
             'orders':[]}
             }

In [None]:
# We have seen the order numbers in the dictionary above; below are the details for each order.

orders = {'o10001':{'date':'2019/01/10', \
                    'product':'Social Media Detector', \
                    'quantity':'1'
                    }, \
          'o10002':{'date':'2019/01/13', \
                    'product':'Realistic Man\'s wig', \
                    'quantity':'2' \
                    }, \
          'o10003':{'date':'2019/01/14', \
                    'product':'Weather\'s Originals', \
                    'quantity':'10' \
                    }, \
          'o10004':{'date':'2019/01/15', \
                    'product':'Brown Shoes', \
                    'quantity':'2' \
                    }
          }

Assume a customer wants to log in, go to the "Orders" page of the website, and view their open orders. However, this is probably not the only part of our system where we would want to be able to make such a check. For instance, we may want to enable our sales staff to view open orders before visiting a client; our customer services team may need to view this information to deal with customer queries; our accounts team may need to review orders for invoicing purposes; and so on. We want to be able to include this functionality on all these separate pages of our staff intranet site.


TL;DR We would like to create a program that helps us check the order details of a customer.

To do this, we need to know whether the customer has placed any orders; if yes, we print a list of their orders; if not, we print a message saying that the customer has no orders. We can use the `for` loops and `if` statements that we learned earlier.

In [None]:
for customer in customers: # here we assume the customer we are checking are in the customers dictionary
  if len(customers[customer]['orders']) > 0: # here we use len() to know whether there are items in "orders" list. If not empty, then it should be >0
    print('Customer: ' + str(customer))
    for order in customers[customer]['orders']: # here we are finding each order number of the customer from the customer dictionary
      print('Date: ' + str(orders[order]['date'])) # and then we use order number to find out order information from order dictionary, including date, product and quantity
      print('Product: ' + orders[order]['product'])
      print('Quantity: ' + orders[order]['quantity'])
      print('\n')
  else:
    print('Customer: ' + str(customer) + ' has no orders') # if the orders is empty in customers dictionary, then tell this customer has no order.
    print('\n')


The majority of this functionality should be familiar to you from earlier sections. We have made use of `for` loops, `if` statements, `len()` for measuring the length of a list, and building print strings by concatenation/joining. One part that may not be familiar is the `print('\n')` command. `\n` is the Python code for a new line, so effectively we are asking Python to print a line break before going on to the next part of the `for` loop.
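
As a small illustration of `\n` (a sketch using a made-up `message` string), the escape sequence can appear anywhere inside a string. Note that `print()` already ends its output with a line break, so `print('\n')` produces two line breaks, while `print()` with no arguments produces one.

```python
print("Line one\nLine two")  # \n splits the text over two lines

print("Before")
print()                      # print() with no arguments outputs a single blank line
print("After")

message = "Date: 2019/01/10\nProduct: Brown Shoes"
lines = message.split("\n")  # split the string wherever a line break occurs
print(len(lines))            # 2
```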

We have two problems with this code in respect to our stated goal. While this is not a particularly long program (11 lines of code) we don't want to have to write this code on every page where the functionality will be used. If we do this, we'd also have the issue of having to rewrite this code on every page any time we have an update to the functionality. If we can store this as a single function we would only have 1 line of code on each page, and one place where updates need to be applied. Secondly, our program will loop through every customer. What we want is to be able to pass the name of a single customer, and return just their results. We can do this with a function.

Firstly, we need to tell Python that we are building a function. We can do so like this:

`def order_printer(customer):`

The `def` part (short for "define") tells Python this is a function, and the name that follows, `order_printer`, is the name we will use to call it. We can use any name we like but we want a name that is logical and describes the purpose of the function.

After the function name, within the brackets, we tell Python if the function should expect a data input. This can be blank if the function requires no data to be passed. In our case we definitely want a data item to be passed: the customer name we want to print orders for. The name we use here doesn't matter; for instance, it could be the letter "i". Within the function we use this name whenever we want to refer to the data item being passed, in the same way as "customer" in our "for customer in customers:" loop.

Let's look at the full code:

In [None]:
def order_printer(customer):
  if len(customers[customer]['orders']) > 0:
    print('Customer: ' + str(customer))
    for order in customers[customer]['orders']:
      print('Date: ' + str(orders[order]['date']))
      print('Product: ' + orders[order]['product'])
      print('Quantity: ' + orders[order]['quantity'])
      print('\n')
  else:
    print('Customer: ' + str(customer) + ' has no orders')
    print('\n')

In fact it is exactly the same code as the code we used in the for loop, except we changed the first line to make it a function definition rather than a loop. If we want to call the function all we need to do is type the function name, and the data we wish to pass to it:

In [None]:
# For example, here we would like to check the orders placed by James.
# After running the cell above (our function), you can run the code below to get the results directly

order_printer('James')

In [None]:
# We can also test "Emily" to see what we get

order_printer('Emily')

We can now use the `order_printer` function anywhere in our code and it will run the assigned set of code on our data. For completeness we will update the function with a bit of extra code so that we can deal with situations where the data input is not in our `customers` dictionary. Fortunately, this is fairly easy to do by wrapping the body in an `if` condition that checks whether the customer name is in the `customers` dictionary, like so:

`if customer in customers:`

This means that we can pass a more friendly message if they are not, rather than the error message Python would pass. The final code, and friendly message, is shown as below:

In [None]:
def order_printer(customer):
  if customer in customers:
    if len(customers[customer]['orders']) > 0:
      print('Customer: ' + str(customer))
      for order in customers[customer]['orders']:
        print('Date: ' + str(orders[order]['date']))
        print('Product: ' + orders[order]['product'])
        print('Quantity: ' + orders[order]['quantity'])
        print('\n')
    else:
      print('Customer: ' + str(customer) + ' has no orders')
      print('\n')
  else:
    print('Customer name not found. Please try again')

In [None]:
order_printer('John')

#### In-built Functions in Python

Alongside custom functions we can design ourselves, Python has a large number of in-built functions that can be used. We have already encountered several of these such as `print( )`, `str( )`, `len( )`, and others. The full list of these, and their definitions, are available here: https://docs.python.org/3/library/functions.html.
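
A few more in-built functions are especially handy for data work. The sketch below applies `len()`, `sum()`, `min()`, `max()`, `sorted()` and `round()` to a small list of prices; the list itself is illustrative.

```python
prices = [24.99, 12.50, 18.75, 32.00]

print(len(prices))               # number of items in the list
print(min(prices), max(prices))  # smallest and largest values
print(sorted(prices))            # a new sorted list; the original list is unchanged

# Combine functions: average price rounded to 2 decimal places
average = round(sum(prices) / len(prices), 2)
print(average)                   # 22.06
```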

## 11.6 Exercise

This exercise aims to test your understanding of functions.

Objective: Practice defining and using functions in Python by creating a simple contact book application. This exercise involves creating functions to manage a contact book stored as a dictionary.

Your task: Your contact book will store contacts as key-value pairs where the key is the contact's name (a string) and the value is their phone number (also a string for simplicity).

1) Add Contact: Write a function named add_contact that accepts the contact book dictionary, a name, and a phone number as arguments. The function should add the contact to the dictionary. If the contact already exists, print a message indicating that the contact is already in the book.

2) Find Contact: Implement a function called find_contact that takes the contact book and a name as arguments and prints the phone number of the contact. If the contact does not exist, print a message indicating so.

3) Update Contact: Create a function named update_contact that accepts the contact book, a name, and a new phone number as arguments. It should update the contact's phone number if it exists; otherwise, it should print a message indicating that the contact does not exist.

4) Delete Contact: Write a function called delete_contact that takes the contact book and a name as arguments. If the contact exists, it should delete the contact from the book; otherwise, print a message indicating that the contact was not found.

5) Display All Contacts: Finally, write a function named display_contacts that prints all the contacts in the contact book, formatted nicely. If there are no contacts in the book, print a message indicating that the book is empty.

The example usage:

contact_book = {}

add_contact(contact_book, "Alice", "123-456-7890")

add_contact(contact_book, "Bob", "987-654-3210")

find_contact(contact_book, "Alice")

update_contact(contact_book, "Alice", "111-222-3333")

delete_contact(contact_book, "Bob")

display_contacts(contact_book)

So basically, you start with an empty dictionary `contact_book` and then you create 5 functions to add, find, update, delete and display contacts. Once you have built the functions, you should be able to call the functions as we showed above and get the expected output reflecting the changes made to the contact book.

In [None]:
#Your Answer here

[Click the link to check the solution!](https://colab.research.google.com/drive/1o9X5M-o6veUB-VQqNOc4xEDmFT1vKSse?usp=sharing)

---

# Section 12: Working with Packages

**Learning Objectives:**
- Import and use external packages
- Understand how packages extend Python's functionality

## 12.1 What Are Packages?

**Packages** (also called libraries or modules) are collections of pre-written code that extend Python's capabilities. Instead of writing everything from scratch, you can import packages to:

- Perform complex mathematical operations
- Handle data analysis
- Create visualisations
- Connect to databases
- And much more!

### Importing Packages

In [None]:
# Import an entire package
import math

print(math.pi)          # 3.141592653589793
print(math.sqrt(16))    # 4.0

# Import with an alias (shorter name)
import numpy as np

array = np.array([1, 2, 3, 4, 5])
print(array.mean())     # 3.0

# Import specific items from a package
from datetime import datetime, timedelta

today = datetime.now()
print(f"Today: {today.strftime('%Y-%m-%d')}")

## 12.2 Common Packages for Data Analysis

| Package | Purpose | Common Alias |
|---------|---------|-------------|
| `pandas` | Data manipulation and analysis | `pd` |
| `numpy` | Numerical computing | `np` |
| `matplotlib` | Basic plotting | `plt` (for `matplotlib.pyplot`) |
| `seaborn` | Statistical visualisation | `sns` |
| `scikit-learn` | Machine learning | imported as `sklearn` |

These packages come pre-installed in Google Colab.

## 12.3 More about Packages

Alongside Python's in-built functions, and functions that we build ourselves (Section 11), Python provides access to a large number of modules and packages (the equivalent of libraries in other languages). These modules/packages give us pre-built functionality that can be used to help with a wide range of tasks, including statistics and data management. Some of these come automatically with our Google Colab (or Python distribution if you run it locally) but, unlike the in-built functions, they need to be imported into the session before use.

In this section, we are exploring how we can import and install packages and modules in Python.

As an example, we will use the Python module `random`. As the name suggests, `random` provides random number functionality for our Python programs. We can import the whole of the `random` library with the command:

In [None]:
import random

This gives us access to all the functionality available. To demo its use, let's build a simple program that replicates the popular 1980s British quiz show "Play Your Cards Right" (https://en.wikipedia.org/wiki/Play_Your_Cards_Right).

In this game a player draws a card from the deck where "2" is the lowest card and "Ace" is the highest (effectively 14). The player then has to guess if the next card will be higher or lower. We will assume that the player bases this purely on the most logical answer: if the card is between two and seven they will call "higher"; if the card is between nine and ace they will call "lower"; if the card is eight they will pick either "higher" or "lower" at random. Every time they get it right they win £100 and get to go again. If they get it wrong, or the new card is the same as the existing card, they lose and whatever money they have won is their final award. There will be two players and the winner is the one who wins the most money.


*Note: we are simplifying the game so that we will ignore that there are actually only four of each number, one for each suit, and assuming there are infinite of each number (infinite cards with duplications).*


The code looks like this:


In [None]:
import random

players = ['Player #1', 'Player #2']
game_continue = True
money_dictionary = {}

def print_name(card_drawn):
  if card_drawn == 11:
    card_drawn_print = "Jack"
  elif card_drawn == 12:
    card_drawn_print = "Queen"
  elif card_drawn == 13:
    card_drawn_print = "King"
  elif card_drawn == 14:
    card_drawn_print = "Ace"
  else:
    card_drawn_print = str(card_drawn)
  return card_drawn_print

for player in players:
  game_continue = True
  prize = 0
  while game_continue == True:
    card_drawn = random.randint(2, 14)
    print("The card is a " + print_name(card_drawn))
    if card_drawn < 8:
      guess = "higher"
    elif card_drawn > 8:
      guess = "lower"
    else:
      if random.randint(1, 2) == 1:
        guess = "lower"
      else:
        guess = "higher"
    # Back at the while-loop level: these lines run for every card, not only when an 8 is drawn
    print(player + " guesses " + guess)
    new_card = random.randint(2, 14)
    print("The card is a " + print_name(new_card))
    if new_card > card_drawn and guess == "higher":
      prize += 100
      print(player + " wins £100")
    elif new_card < card_drawn and guess == "lower":
      prize += 100
      print(player + " wins £100")
    else:
      print(player + " loses!")
      print("Total prize money is £" + str(prize))
      money_dictionary[player] = prize
      game_continue = False

if money_dictionary['Player #1'] > money_dictionary['Player #2']:
  print("Player #1 wins with a total prize of £" + \
        str(money_dictionary['Player #1']))
elif money_dictionary['Player #1'] < money_dictionary['Player #2']:
  print("Player #2 wins with a total prize of £" + \
        str(money_dictionary['Player #2']))
else:
  print("It's a tie. No-one wins")


Much of this should be familiar to you, but there are a few points to make.

Firstly, there is our `print_name()` function. For the most part it works like the `order_printer` function we created in Section 11.5, but it ends with the statement `return card_drawn_print`, so the function returns this value to the caller instead of printing it.

Secondly, when we compare the two cards (`new_card > card_drawn`) we include an `and` so that both conditions need to be true for the test to pass.

Thirdly, as detailed in the operators shown in section 6, we increase the value of `prize` using the command `prize += 100` (meaning the prize is increased by 100 if the player wins).

The final "new" code in this example relates to the `random` library. At various points we make calls to the `random.randint()` function. `randint` is short for "random integer" and, as this suggests, the function draws a random whole number between the first value passed and the last value passed, with both end points included. So `random.randint(2, 14)` will draw a random integer from 2 to 14.

We can check that the program works by running it in the notebook. The output will vary from run to run because it is based on random numbers.
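
Because the output changes on every run, it can be hard to check your work. One option, sketched below, is to fix the starting point of the random number generator with `random.seed()`; the seed value 42 is an arbitrary choice, and the variable names are made up for this illustration. The second half checks that every draw stays within the limits passed to `randint()`.

```python
import random

random.seed(42)  # fixing the seed makes the sequence of "random" numbers repeatable
first_run = []
for _ in range(5):
    first_run.append(random.randint(2, 14))

random.seed(42)  # same seed again, so the same five cards are drawn
second_run = []
for _ in range(5):
    second_run.append(random.randint(2, 14))

print(first_run == second_run)  # True

# Draw many cards and confirm none fall outside the range 2 to 14
draws = []
for _ in range(10000):
    draws.append(random.randint(2, 14))

print(min(draws) >= 2 and max(draws) <= 14)  # True
```

Remove the `random.seed()` lines when you want a genuinely different game each time.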

In this example we have only used one function available in random (there are lots of them), the `randint( )` function. If we are not planning on using any other functions we can import the package in a different way:

In [None]:
from random import randint

The benefit is that rather than having to call `random.randint( )` we can call just `randint( )` like so:

In [None]:
new_card = randint(2, 14)

We can also import a function under a different name (an alias), such as:

In [None]:
from random import randint as num_generator

which allows us to call the function as:

In [None]:
card_drawn = num_generator(2, 14)

Thus, the code we showed earlier can be updated as below:

In [None]:
from random import randint as num_generator

players = ['Player #1', 'Player #2']
game_continue = True
money_dictionary = {}

def print_name(card_drawn):
  if card_drawn == 11:
    card_drawn_print = "Jack"
  elif card_drawn == 12:
    card_drawn_print = "Queen"
  elif card_drawn == 13:
    card_drawn_print = "King"
  elif card_drawn == 14:
    card_drawn_print = "Ace"
  else:
    card_drawn_print = str(card_drawn)
  return card_drawn_print

for player in players:
  game_continue = True
  prize = 0
  while game_continue == True:
    card_drawn = num_generator(2, 14)  # changed: randint is now called through its alias
    print("The card is a " + print_name(card_drawn))
    if card_drawn < 8:
      guess = "higher"
    elif card_drawn > 8:
      guess = "lower"
    else:
      if num_generator(1, 2) == 1:
        guess = "lower"
      else:
        guess = "higher"
    # Back at the while-loop level: these lines run for every card, not only when an 8 is drawn
    print(player + " guesses " + guess)
    new_card = num_generator(2, 14)
    print("The card is a " + print_name(new_card))
    if new_card > card_drawn and guess == "higher":
      prize += 100
      print(player + " wins £100")
    elif new_card < card_drawn and guess == "lower":
      prize += 100
      print(player + " wins £100")
    else:
      print(player + " loses!")
      print("Total prize money is £" + str(prize))
      money_dictionary[player] = prize
      game_continue = False

if money_dictionary['Player #1'] > money_dictionary['Player #2']:
  print("Player #1 wins with a total prize of £" + \
        str(money_dictionary['Player #1']))
elif money_dictionary['Player #1'] < money_dictionary['Player #2']:
  print("Player #2 wins with a total prize of £" + \
        str(money_dictionary['Player #2']))
else:
  print("It's a tie. No-one wins")

If you have time, you can play with some of the other functions in the `random` module (https://docs.python.org/3/library/random.html).

## 12.4 Installing Packages with pip

Alongside the modules which come with our Python installation, we can install further packages, which are basically modules that come from other sources. While we import them into our session in the same way as modules, we need to first install them on our machine.

There are multiple ways to do this, including directly from binaries/zip files/tarballs, or via the older `easy_install` tool, but by far the easiest and most common approach is to use `pip`, which comes pre-installed with this version of Python.

Let's demo this by installing and using a third-party package. The package we'll use is the slightly obscure, but fun and useful, `flashtext` (https://flashtext.readthedocs.io/en/latest/). We can install `flashtext` directly in the notebook window using:

In [None]:
!pip install flashtext

We don't need to read the full output here. What matters is that the package installed, and the line near the bottom, **"Successfully installed flashtext-2.7"**, tells us that this is the case.

We can also use `pip` to upgrade packages to the latest version. Let's do that for a package that comes with our distribution and which we will use later in the book: `numpy` (numerical Python, http://www.numpy.org/).

In [None]:
!pip install --upgrade numpy

Again, we only really care about the text at the end, which says that we have **"uninstalled numpy-1.25.2"** and have **"Successfully installed numpy-1.26.4"**. The exact version numbers you see may differ.

After installing the packages successfully, we are ready to import and use them as we did in the previous example. Let's return to `flashtext` and run an example using it (read the documentation for other tutorials and functionalities - https://flashtext.readthedocs.io/en/latest/). We're going to write a simple program that can find specified keywords in a block of text.


In [None]:
from flashtext import KeywordProcessor

keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('computational', 'Computers')
keyword_processor.add_keyword('computer', 'Computers')
keyword_processor.add_keyword('high performance computing', 'Computers')
keyword_processor.add_keyword('statistics', 'Statistics')
keyword_processor.add_keyword('statistical', 'Statistics')
keyword_processor.add_keyword('python', 'Python')

text = 'This module aims to introduce students to many of the advanced statistical techniques made possible by innovations in computing and modern processing power. This includes Markov chain Monte Carlo approaches, probabilistic methods, Bayesian statistics, dimension reduction and high performance computing'
text = text.lower()

keywords_found = keyword_processor.extract_keywords(text)
print(keywords_found)

Here we have imported the `KeywordProcessor` class from the package and created an object from it, which we stored in the variable `keyword_processor`. The brackets after `KeywordProcessor` are what create the object; from then on we work with it through its methods.

Next we added some keywords that we are interested in and then processed some text.

The words we search for (the first word listed each time) are a combination of single words (called unigrams) and multi-word phrases (called n-grams), each written inside quote marks as before. We pass them to flashtext using the `keyword_processor` variable we created and its `add_keyword()` method. After each keyword we give a category (Computers, Statistics or Python), which is what flashtext reports when the keyword is found.

We then want to pass the text. Because computers do not treat an uppercase letter as the same as a lowercase letter, "Python" in our text would not necessarily match the keyword "python". To be safe, we convert our text to lower case using the `lower()` method.

Lastly, we extract the keywords that occur and print the category of each match. The output is a list with two entries for Statistics (1x "statistical" and 1x "statistics") and one for Computers ("high performance computing").
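
To turn the list of matches into a count per category, one option is `Counter` from Python's standard `collections` module. In the sketch below, `keywords_found` is written out by hand to mirror the matches described above, so the cell runs even without `flashtext` installed; `category_counts` is a name chosen for this illustration.

```python
from collections import Counter

# Written out by hand to mirror the matches described above
keywords_found = ["Statistics", "Statistics", "Computers"]

# Counter tallies how many times each item appears in the list
category_counts = Counter(keywords_found)

for category, count in category_counts.items():
    print(f"{category}: {count}")
```

In the notebook you could pass the real `keywords_found` list from the previous cell straight into `Counter()` instead.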

If you have time, you can play with some of the other functions in the package (https://flashtext.readthedocs.io/en/latest/).

---

# Section 13: Introduction to Pandas

**Learning Objectives:**
- Load data into a DataFrame
- Explore and understand your data
- Perform basic data manipulation

## 13.1 What is Pandas?

**Pandas** is the most popular Python library for data analysis. It provides:

- **DataFrame**: A table-like structure (think Excel spreadsheet)
- **Series**: A single column of data
- Powerful tools for data cleaning, transformation, and analysis

In [None]:
# Import pandas
import pandas as pd

# Check the version
print(f"Pandas version: {pd.__version__}")

## 13.2 Creating a DataFrame and Series

The primary data structures in *pandas* are implemented as two classes:

  * **`DataFrame`**, which you can imagine as a relational data table, with rows and named columns.
  * **`Series`**, which is a single column. A `DataFrame` contains one or more `Series` and a name for each `Series`.

The data frame is a commonly used abstraction for data manipulation. Similar implementations exist in [Spark](https://spark.apache.org/) and [R](https://www.r-project.org/about.html).

You can create a DataFrame from a dictionary:

In [None]:
# Create a DataFrame from a dictionary
data = {
    "Product": ["Steel Beam", "Copper Wire", "Aluminium Sheet", "Brass Tube"],
    "SKU": ["SB-100", "CW-200", "AS-300", "BT-400"],
    "Price": [24.99, 12.50, 18.75, 32.00],
    "Stock": [150, 500, 75, 200],
    "Category": ["Structural", "Electrical", "Structural", "Plumbing"]
}

df = pd.DataFrame(data)
df

Creating a Data Series:

In [None]:
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])

In [None]:
city_names

In [None]:
print(population)

`DataFrame` objects can be created by passing a `dict` mapping `string` column names to their respective `Series`. If the `Series` don't match in length, missing values are filled with special [NA/NaN](http://pandas.pydata.org/pandas-docs/stable/missing_data.html) values.
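
As a small sketch of the length-mismatch behaviour described above, the example below builds a DataFrame from two Series of different lengths. The variable names `short_population` and `cities` are chosen for this illustration only.

```python
import pandas as pd

# Three city names, but only two population values
city_names = pd.Series(["San Francisco", "San Jose", "Sacramento"])
short_population = pd.Series([852469, 1015785])

# The Series are aligned by their index (0, 1, 2);
# the missing population for index 2 becomes NaN
cities = pd.DataFrame({"City name": city_names, "Population": short_population})
print(cities)

# Count the missing values in the Population column
print(cities["Population"].isna().sum())
```

Because `NaN` is a floating-point value, the `Population` column is displayed with decimals even though the original values were whole numbers.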

Following the examples above, we can create a DataFrame from the two Series:

In [None]:
pd.DataFrame({ 'City name': city_names, 'Population': population })

In short, we created a [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) by passing a dictionary of objects that can each be converted into a Series-like structure. Here is another example:

In [None]:
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "grade": ["a", "b", "b", "a", "a", "e"]
    }
)

df

## 13.3 Loading Data from Files

In practice, you'll usually load data from external files:

In [None]:
# Load from CSV (most common)
# df = pd.read_csv("filename.csv")

# Load from Excel
# df = pd.read_excel("filename.xlsx")

# Load from URL
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic_df = pd.read_csv(url)

# Preview the first few rows
titanic_df.head()

We can also read a file (for example, Excel or CSV) that has been uploaded to Colab into a DataFrame.

In Google Colab, we first upload the dataset from our local machine (your laptop) to the Colab environment, and then use `pd.read_csv()` for a CSV file or `pd.read_excel()` for an Excel file.

Here is an example:

In [None]:
# You can upload any dataset. For example, the well-known Boston Housing data, which is commonly used in machine learning courses.
# You can download it from GitHub: https://github.com/selva86/datasets/blob/master/BostonHousing.csv

# Run this cell (click the play button) and a "Choose Files" button will appear, letting you select a file from your laptop to upload.
# Alternatively, click the folder icon in the left-hand sidebar and upload the file there manually.
from google.colab import files
uploaded = files.upload()

In [None]:
# When the upload shows 100% done, the file is available in your Colab session and can be read.
df = pd.read_csv('BostonHousing.csv') # make sure the file name matches the file you uploaded. Our data is a CSV file, so we use pd.read_csv()
df

Once you run the code above, the DataFrame is displayed along with its number of rows and columns, which means you have successfully read the data into a DataFrame with Python. Next, we will show you how to view the data and carry out some basic exploratory analysis.

## 13.4 Exploring Your Data

You can use `DataFrame.head()` and `DataFrame.tail()` to view the top and bottom rows of the frame, respectively:

In [None]:
titanic_df.head() # view the top 5 rows. To view a different number of rows (3, 10, ...), put that number inside the brackets

In [None]:
titanic_df.tail() # view the bottom 5 rows. As above, put a number inside the brackets to view that many bottom rows

In [None]:
# Basic information about the DataFrame
print("Shape (rows, columns):", titanic_df.shape)
print("\nColumn names:")
print(titanic_df.columns.tolist())

print("\nData types:")
print(titanic_df.dtypes)

In [None]:
# Summary statistics for numeric columns
titanic_df.describe()

In [None]:
# Detailed information, including data types and non-null counts (which reveal missing values)
titanic_df.info()

## 13.5 Selecting Data

Sometimes we do not want to view or use the whole dataset, only parts of it. Here are some ways to do that.

Use `DataFrame['column_name']` or `DataFrame.column_name` to select one specific column.

In [None]:
titanic_df["Age"]

In [None]:
titanic_df.Age

In [None]:
# Select a single column (returns a Series)
ages = titanic_df["Age"]
print("Ages (first 5):")
print(ages.head())

# Select multiple columns (returns a DataFrame)
subset = titanic_df[["Name", "Age", "Survived"]]
print("\nSubset (first 5):")
print(subset.head())

We can also select data by position using `DataFrame.iloc[]`:

In [None]:
titanic_df.iloc[0:3] # returns the first three rows with all columns

In [None]:
titanic_df.iloc[0:3, 0:2] # returns the first three rows and the first two columns

In [None]:
titanic_df.iloc[:, 3:] # returns all rows, and every column from the fourth column onwards (position 3)

In [None]:
titanic_df.iloc[1, 1] # returns a single value: the second row and second column (positions start at 0)

Selecting by position is very useful: you can choose where a slice of the data starts and ends, or pick out one or a few specific values.
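
Because `iloc` counts positions from 0 and excludes the end of a slice, it is easy to be off by one. The sketch below uses a small, made-up product table (the name `products` is an assumption for this example) so that each result can be checked by eye.

```python
import pandas as pd

# Small illustrative table so the results are easy to verify by eye
products = pd.DataFrame({
    "Product": ["Steel Beam", "Copper Wire", "Aluminium Sheet", "Brass Tube"],
    "Price": [24.99, 12.50, 18.75, 32.00],
    "Stock": [150, 500, 75, 200],
})

# Rows at positions 0 and 1 (the end position 2 is excluded)
first_two_rows = products.iloc[0:2]
print(first_two_rows)

# Single value: row position 1 (second row), column position 1 (second column)
second_row_price = products.iloc[1, 1]
print(second_row_price)

# All rows, columns from position 1 onwards (Price and Stock)
from_second_column = products.iloc[:, 1:]
print(from_second_column.columns.tolist())
```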

You can also select data by conditions:

In [None]:
titanic_df[titanic_df["Age"] > 20] # returns only the rows where Age is greater than 20

In [None]:
titanic_df[(titanic_df["Age"] > 20) & (titanic_df["Age"] <= 25)] # conditions can be combined; wrap each one in parentheses

## 13.6 Filtering Data

In [None]:
# Filter rows based on a condition
survivors = titanic_df[titanic_df["Survived"] == 1]
print(f"Number of survivors: {len(survivors)}")

# Multiple conditions (use & for AND, | for OR)
young_survivors = titanic_df[(titanic_df["Survived"] == 1) & (titanic_df["Age"] < 18)]
print(f"\nYoung survivors (under 18): {len(young_survivors)}")
print(young_survivors[["Name", "Age", "Sex"]].head())
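
To see what happens behind a filter, it can help to look at the condition on its own. The sketch below uses a tiny made-up table (the names and values are invented for illustration) to show that each condition produces a Series of True/False values, which `&` and `|` then combine row by row.

```python
import pandas as pd

# Tiny illustrative table (values are made up)
passengers = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara", "Dev"],
    "Age": [15, 42, 30, 8],
    "Survived": [1, 0, 1, 0],
})

# Each condition is a Series of True/False values, one per row
is_child = passengers["Age"] < 18
survived = passengers["Survived"] == 1
print(is_child.tolist())
print(survived.tolist())

# & keeps rows where BOTH conditions are True
child_and_survived = passengers[is_child & survived]
print(child_and_survived["Name"].tolist())

# | keeps rows where AT LEAST ONE condition is True
child_or_survived = passengers[is_child | survived]
print(child_or_survived["Name"].tolist())
```

Storing each condition in a named variable avoids having to wrap every comparison in parentheses and often makes the filter easier to read.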

## 13.7 Grouping and Aggregating Data

In [None]:
# Group by a category and calculate statistics
survival_by_class = titanic_df.groupby("Pclass")["Survived"].mean()
print("Survival rate by class:")
print(survival_by_class)

# Multiple aggregations
summary = titanic_df.groupby("Sex").agg({
    "Survived": "mean",
    "Age": "mean",
    "Fare": "mean"
})
print("\nSummary by sex:")
print(summary)
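
Grouping can be easier to follow on a table small enough to check by hand. The following sketch reuses the product data from Section 13.2 (recreated here under the assumed name `products`) and applies a different aggregation to each column.

```python
import pandas as pd

# Product data from Section 13.2, recreated so this cell runs on its own
products = pd.DataFrame({
    "Product": ["Steel Beam", "Copper Wire", "Aluminium Sheet", "Brass Tube"],
    "Price": [24.99, 12.50, 18.75, 32.00],
    "Stock": [150, 500, 75, 200],
    "Category": ["Structural", "Electrical", "Structural", "Plumbing"],
})

# One row per category: average price and total stock
category_summary = products.groupby("Category").agg({
    "Price": "mean",
    "Stock": "sum",
})
print(category_summary)
```

The two "Structural" products are combined into a single row: their prices are averaged and their stock levels are added together, while the other categories contain one product each.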

## 13.8 Saving Data

In [None]:
# Save to CSV
# df.to_csv("output.csv", index=False)

# Save to Excel
# df.to_excel("output.xlsx", index=False)

# In Google Colab, you can download files:
# from google.colab import files
# df.to_csv("output.csv", index=False)
# files.download("output.csv")

print("Data can be saved using df.to_csv() or df.to_excel()")
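
As a quick check that saving works, the sketch below writes a small DataFrame to a CSV file in a temporary folder and reads it back. The temporary folder and the `products` table are assumptions made for this example; in Colab you would normally save to the current folder as shown above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small illustrative table
products = pd.DataFrame({
    "Product": ["Steel Beam", "Copper Wire"],
    "Price": [24.99, 12.50],
    "Stock": [150, 500],
})

# Use a temporary folder so no files are left behind after the demo
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "output.csv"

    # index=False leaves out the row labels (0, 1, ...) when writing
    products.to_csv(csv_path, index=False)

    # Look at the header line of the file, then load it back into pandas
    first_line = csv_path.read_text().splitlines()[0]
    reloaded = pd.read_csv(csv_path)

print(first_line)
print(reloaded)
```

Without `index=False`, the row labels would be written as an extra first column, which then shows up as an unnamed column when the file is read back.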

## 13.9 Data Preprocessing

Sometimes we need to drop columns or rows and keep only the useful ones, or check whether there is missing data and then drop or fill it. Here is how to do this.

Dropping columns or rows using `DataFrame.drop()`:

In [None]:
titanic_df.head()

In [None]:
df1 = titanic_df.drop(0,axis = 0) # here we drop row 0 (which is the first row) and use axis = 0 to tell we are dropping a row
df1

In [None]:
df1 = titanic_df.drop(['Name'],axis = 1) # here we drop column Name and use axis = 1 to tell we are dropping a column.
df1

When doing exploratory data analysis, we should always check for missing data. We can use `.isna()` together with `.sum()` to count the missing values in each column.

In [None]:
titanic_df.isna().sum() # check for missing data

So, we have missing values in Age, Cabin, and Embarked. We can either drop or fill the missing data using `.dropna()` or `.fillna()`:

In [None]:
titanic_df.dropna(inplace = True) # with inplace = True, every row containing at least one missing value is removed from titanic_df itself
titanic_df.isna().sum() # check for missing data again

Now, we have no missing data anymore.

Alternatively, we can fill in the missing data. Before showing you how to do this, let's load the dataset again. We need to do this because the missing values no longer exist since we dropped them.

In [None]:
# Load from URL
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic_df = pd.read_csv(url)

In [None]:
titanic_df.isna().sum() # check for missing data again. Note that the missing values are back

Given that Cabin and Embarked are columns with string values and Age contains numeric values, we have to fill each column with a value of the appropriate data type.

In [None]:
titanic_df["Age"] = titanic_df["Age"].fillna(titanic_df["Age"].mean()) # Use fillna to replace missing values, in this case with the mean of the Age column. You can fill with any value you choose
titanic_df["Cabin"] = titanic_df["Cabin"].fillna("Unknown") # Replace missing values with the word "Unknown"
titanic_df["Embarked"] = titanic_df["Embarked"].fillna(titanic_df["Embarked"].mode()[0]) # Replace missing values with the mode, that is, the value that occurs most often in the column
titanic_df.isna().sum() # check for missing data again.
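
The difference between dropping and filling is easiest to see on a few rows. The sketch below uses a small made-up table with the same three column names (the values are invented for illustration) and compares the two approaches without modifying the original data.

```python
import numpy as np
import pandas as pd

# Small illustrative table with some missing values
passengers = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0],
    "Cabin": ["C85", None, None, "C103"],
    "Embarked": ["S", "C", None, "S"],
})

# Count the missing values in each column
missing_per_column = passengers.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value
dropped = passengers.dropna()
print(f"Rows before: {len(passengers)}, rows after dropna: {len(dropped)}")

# Option 2: keep all rows and fill the gaps instead (working on a copy)
filled = passengers.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
filled["Cabin"] = filled["Cabin"].fillna("Unknown")
filled["Embarked"] = filled["Embarked"].fillna(filled["Embarked"].mode()[0])
print(filled)
```

Dropping is simple but can discard a large share of the data when one column has many gaps, so it is worth comparing the row counts before and after. If only some columns matter, `dropna(subset=["Age"])` drops rows based on those columns alone.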

---

# Section 14: Debugging Process

**Learning Objectives:**
- Understand errors and fix them

## The Coding Process

Coding typically follows a three-step process:

1. Write Code: We start by writing our code.
2. Run Code: We run our code to see if it works.
3. Fix Code: If there are errors or it doesn't work as expected, we fix the code.

This cycle repeats until our code runs correctly and does what we want it to do.

## Computers and Instructions

Computers don't make guesses; they follow our instructions precisely. If our instructions (code) contain mistakes, the computer cannot execute the code. Unlike humans, who can often understand meaning despite errors, computers require correct syntax and logic to run a program.

## Debugging is Part of the Process

Making mistakes in coding is normal. Debugging is a natural part of the coding process. Even experienced programmers make mistakes. Don't worry about making errors; instead, use them as opportunities to learn and improve. Enjoy the process of problem-solving and debugging.

## Fixing Multiple Bugs

If a program has several bugs, all of them need to be addressed. Sometimes fixing one bug can reveal another, so patience is key. Work through each error methodically.

## Execution Flow in Python

Python executes code line by line from top to bottom. If it encounters an error, it stops (or "**interrupts**") the program at the first error it finds. We need to fix errors in the order they appear.
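
The sketch below illustrates this behaviour. It records each step that runs, then deliberately uses a variable that does not exist. The `try`/`except` block is used here only so that the demonstration can finish and report what happened; the variable names are made up for this example.

```python
# Keep a record of which steps actually ran
executed = []

try:
    executed.append("step 1")
    print(undefined_colour)      # this name was never created, so Python raises a NameError
    executed.append("step 3")    # never reached: execution stopped at the line above
except NameError as error:
    print("Python stopped with:", error)

print("Steps that ran:", executed)
```

Only "step 1" is recorded, because Python never reached the line after the error.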

## Logical Errors

A program can run without syntax errors but still not do what we want it to do. These are called logical errors. For example, if we accidentally multiply where we meant to divide, Python will do the multiplication without error. The program runs, but the result is incorrect. Logical errors require careful review to catch.
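
Here is a minimal sketch of such a logical error, using made-up numbers. Both versions run without any error message, but only one gives the intended result.

```python
total_cost = 60
quantity = 4

# Bug: multiplies where we meant to divide. Python raises no error.
buggy_unit_price = total_cost * quantity
print(buggy_unit_price)

# Fix: divide the total cost by the quantity
unit_price = total_cost / quantity
print(unit_price)
```

A quick sanity check helps catch this kind of mistake: the price of one item cannot be higher than the total cost of four.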

## Practice: Fixing Bugs

With the knowledge you've gained from the previous notebooks, you are now equipped to tackle debugging. Below, you will find a series of programs with intentional bugs. Your task is to run the code, identify and fix these issues using what you've learned. You can rewrite the code and make it work.

Before you begin, here is some advice:
- Read error messages carefully; they often point you toward the location and type of the problem.
- Check your syntax: parentheses, quotation marks.
- Make sure you're using the correct operators and variables.
- Verify the logic of your code to ensure it does what you intend.

#### Good Luck Detective!

In [137]:
greeting = "Hello, World!

SyntaxError: unterminated string literal (detected at line 1) (ipython-input-3873043922.py, line 1)

In [138]:
5 + 3 = sum

SyntaxError: cannot assign to expression here. Maybe you meant '==' instead of '='? (ipython-input-3526272935.py, line 1)

In [139]:
print"The quick brown fox jumps over the lazy dog"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)? (ipython-input-748923745.py, line 1)

In [140]:
name <- "Alice"

TypeError: bad operand type for unary -: 'str'

In [141]:
print("Jordan said "Five in a row is my favourite board game"")

SyntaxError: invalid syntax. Perhaps you forgot a comma? (ipython-input-3466391269.py, line 1)

In [None]:
print("C:\Users\Username\Documents\File.txt")

In [None]:
name = "Bob"
favourite_number - 28
print(name)

In [None]:
anime = Re:Zero

In [None]:
print"Having fun yet?")

In [None]:
name = "Carol"
age = 28
job = Teacher
print(age)

In [None]:
player_name = "Player 1"
score = "1000"
bonus_score = score * 3
print(player_name + bonus_score)

In [None]:
print(colour)
colour = "Black"

In [None]:
book = "The Count of Monte Cristo"
main_character = "Edmond Dantès"
print(author)

In [None]:
count = "10"
print(count + 1)

In [None]:
number1 = 10
number2 = 20
average = (number1 + number2)

In [None]:
#Print("Print Me!!!")

In [None]:
number = 5
number_squared = (number ** number)
print(number_squared)

In [None]:
Storing University of Warwick geographical information
location = "University of Warwick"
latitude = 52.379414
longitude = -1.561904

print(latitude)
print(longitude)

In [None]:
character name = "Subaru"

In [None]:
users_age = input("Enter your age: ")
print("Soon you will be " + (users_age + 1) + " years old.")

In [None]:
dog_age = "4"
dog_years = dog_age * 7
print(dog_years)

In [None]:
first_name = input("Enter your first name: ")
last_name = input("Enter your last name: ")
print("Hello" + first_name + last_name)

In [None]:
1stplace = "Alice"
2ndplace = "Bob"

In [None]:
x, y, z = 10
print(x)
print(y)
print(z)

---

# 🎉 Congratulations!

You have completed the Python Bootcamp! You now understand:

- ✅ Basic Python syntax and data types
- ✅ Variables that store data
- ✅ Strings for text manipulation
- ✅ Numbers and arithmetic operations
- ✅ Lists and dictionaries for data organisation
- ✅ Conditional statements for decision-making
- ✅ Loops for repetition
- ✅ Functions for code reusability
- ✅ Pandas for data analysis

## Next Steps

### Practice Resources

1. **[HackerRank Python](https://www.hackerrank.com/domains/python)**: Lots of problems at different levels
2. **[W3Schools Python Exercises](https://www.w3schools.com/python/python_exercises.asp)**: Tutorials with exercises
3. **[LeetCode](https://leetcode.com/)**: Coding challenges (also used for interview prep)
4. **[Kaggle](https://www.kaggle.com/)**: Real datasets and notebooks from the community

### In the Module

During the Big Data and Analytics module, you will:

- Apply these skills to real engineering datasets
- Learn more advanced data analysis techniques
- Create visualisations using Tableau and Python
- Work with predictive analytics and decision science

---

**Good luck with your studies!**

If you have any questions, please contact:  
**Dr Leonardo Alves Dias**: leonardo.alves-dias@warwick.ac.uk