# Notebook 01: Python Fundamentals via Choice Modeling

**Objective:** Introduce core Python programming fundamentals (variables, data types, data structures, functions, and control flow) and illustrate them with simple examples. We then apply these basics to a toy choice modeling scenario: a softmax-based utility model for choosing among travel modes. By the end of this notebook, participants will be comfortable with basic Python syntax and see how these concepts translate to discrete choice contexts.

## 01.1 Variables and Data Types

In Python, you store data in variables. A variable is a name that refers to a value, and you create one with the assignment operator `=`. Python is dynamically typed: you don't declare types explicitly. Each value carries its own type, and the same variable can later be reassigned to a value of a different type.

Examples of basic data types:

In [1]:
# Assigning variables of different types
traveler_name = "Alice"        # str (string)
age = 30                      # int (integer)
ticket_price = 15.50          # float (floating-point number)
is_student = True             # bool (Boolean)


Here, `traveler_name` is a string (text) containing `"Alice"`, `age` is an integer, `ticket_price` is a floating-point number, and `is_student` is a Boolean value (True/False). We can check their types using the built-in `type()` function:

In [3]:
print(type(traveler_name), type(age), type(ticket_price), type(is_student))

<class 'str'> <class 'int'> <class 'float'> <class 'bool'>


Variables let us label data and reuse it. In the choice modeling examples, we will use variables to store travel times, costs, choices, and so on.

> **Note:** Python variable names must start with a letter or an underscore and can contain letters, digits, and underscores. They are case-sensitive (`Mode` and `mode` would be different variables).
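
The sketch below illustrates dynamic typing and case sensitivity. The variable names (`trip_length`, `minutes`, `price`) are illustrative only and are not used elsewhere in the notebook.

```python
# Dynamic typing: the same name can be rebound to values of different types
trip_length = 30            # int
print(type(trip_length))    # <class 'int'>
trip_length = 30.5          # now a float
print(type(trip_length))    # <class 'float'>
trip_length = "30 minutes"  # now a str
print(type(trip_length))    # <class 'str'>

# Explicit conversion between types
minutes = int("45")         # str -> int
price = float(minutes)      # int -> float
print(minutes + 5, price)   # 50 45.0

# Names are case-sensitive: Mode and mode are two different variables
Mode = "Car"
mode = "Bus"
print(Mode, mode)           # Car Bus
```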

## 01.2 Data Structures: Lists and Dictionaries

**Lists:** A list in Python is an ordered, mutable sequence of items. Lists are created with square brackets `[]`. They can contain elements of any type (even mixed types, though usually we keep them homogeneous). Use lists to store collections of related items, like a list of mode names or a list of travel times.

In [6]:
# List of available travel modes
modes = ["Car", "Bus", "Train"]
print("Modes:", modes)
print("First mode:", modes[0])   # Indexing (0-based: 0 is first item)
modes.append("Air")              # Add an element to the list
print("Updated modes:", modes) 

Modes: ['Car', 'Bus', 'Train']
First mode: Car
Updated modes: ['Car', 'Bus', 'Train', 'Air']


Lists preserve the order of insertion and allow duplicates. You can modify elements (`modes[1] = "Coach"` would change "Bus" to "Coach"), iterate over them, and use built-in functions like `len(modes)` (number of items).

In [9]:
modes[1] = "Coach"
print("Updated modes:", modes) 
print("Number of modes:", len(modes))  # Length of the list

Updated modes: ['Car', 'Coach', 'Train', 'Air']
Number of modes: 4
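
A few more list operations mentioned above (duplicates, membership, slicing, iteration), shown on a hypothetical standalone list `chosen_modes` so the `modes` list is left unchanged:

```python
# Hypothetical example data: the modes chosen by four travelers (note the duplicate "Car")
chosen_modes = ["Car", "Bus", "Car", "Train"]

print(chosen_modes.count("Car"))    # duplicates are kept: 2
print("Bus" in chosen_modes)        # membership test: True
print(chosen_modes[1:3])            # slicing, end index excluded: ['Bus', 'Car']
print(chosen_modes[-1])             # negative indices count from the end: Train

# enumerate() gives the position as well as the value while iterating
for i, m in enumerate(chosen_modes):
    print(i, m)
```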


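**Dictionaries:** A dictionary is a collection of key-value pairs enclosed in curly braces `{}`. Each entry maps a key to a value, much as a real dictionary maps words to definitions. Since Python 3.7, dictionaries preserve insertion order (as the printed output below shows), but you look values up by key rather than by position. Use dictionaries to structure data by named attributes.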

Example:

In [11]:
# Dictionary of travel times for each mode (in minutes)
travel_time = {"Car": 30, "Bus": 45, "Train": 40}
print("Travel time by Car:", travel_time["Car"], "minutes")
# Add a new key-value pair for Air
travel_time["Air"] = 60
print("Modes and times:", travel_time)

Travel time by Car: 30 minutes
Modes and times: {'Car': 30, 'Bus': 45, 'Train': 40, 'Air': 60}


Here, `"Car"`, `"Bus"`, etc. are keys (must be unique and immutable, typically strings or numbers), and the numbers are values. We accessed the Car time with `travel_time["Car"]`. We then added `"Air": 60`. Dictionaries are great for structured data – e.g., storing attributes of an alternative (mode) by name.
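Key uniqueness and lookup behavior can be checked directly. This sketch uses a separate copy named `example_times` (a hypothetical name) so that `travel_time` itself is not modified:

```python
# Separate copy, so the notebook's travel_time dictionary is left untouched
example_times = {"Car": 30, "Bus": 45, "Train": 40, "Air": 60}

example_times["Bus"] = 50                  # assigning to an existing key overwrites its value
print(example_times["Bus"])                # 50
print(len(example_times))                  # still 4 entries: keys are unique
print(list(example_times))                 # ['Car', 'Bus', 'Train', 'Air']: order is kept

print("Walk" in example_times)             # `in` checks the keys: False
print(example_times.get("Walk", "n/a"))    # .get() returns a default instead of failing: n/a

try:
    example_times["Walk"]                  # [] with a missing key raises KeyError
except KeyError as err:
    print("Missing key:", err)             # Missing key: 'Walk'
```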

In [12]:
print("All modes:", list(travel_time.keys()))        # List all modes
print("All travel times:", list(travel_time.values()))  # List all travel times

# storing attributes of an alternative (mode) by name
mode_attributes = {
    "Car": {"cost": 10.0, "comfort": 7},
    "Bus": {"cost": 5.0, "comfort": 5},
    "Train": {"cost": 8.0, "comfort": 6},
    "Air": {"cost": 50.0, "comfort": 9}
}
print("Train attributes:", mode_attributes["Train"])
print("Air cost:", mode_attributes["Air"]["cost"])
print("Bus comfort:", mode_attributes["Bus"]["comfort"])

All modes: ['Car', 'Bus', 'Train', 'Air']
All travel times: [30, 45, 40, 60]
Train attributes: {'cost': 8.0, 'comfort': 6}
Air cost: 50.0
Bus comfort: 5


In [14]:
# mode_attributes could be used in a choice model to evaluate alternatives
# based on cost, comfort, and travel time stored in the dictionaries.
# For example, calculating a simple utility score for each mode
for mode, attrs in mode_attributes.items():
    time = travel_time.get(mode, 999)  # Default to 999 if mode not found
    utility = -0.1 * attrs["cost"] + 0.5 * attrs["comfort"] - 0.05 * time
    print(f"Utility for {mode}: {utility:.2f}")

Utility for Car: 1.00
Utility for Bus: -0.25
Utility for Train: 0.20
Utility for Air: -3.50


We will often use dictionaries to hold parameters or results in modeling (for example, a dictionary of utility coefficients by variable name, or a record of outputs).
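
As a minimal sketch of that idea, the coefficients from the utility loop above can be stored in a dictionary keyed by attribute name. The utility is then a sum over the shared keys. The names `betas` and `train` are illustrative:

```python
# Coefficients keyed by attribute name (same values as the utility loop above)
betas = {"cost": -0.1, "comfort": 0.5, "time": -0.05}

# Attributes of one alternative, using the same keys
train = {"cost": 8.0, "comfort": 6, "time": 40}

# Linear utility: sum of coefficient * attribute over all named attributes
utility_train = sum(betas[name] * train[name] for name in betas)
print(f"Utility for Train: {utility_train:.2f}")   # Utility for Train: 0.20
```

Keeping coefficients in a dictionary means adding a new attribute only requires a new key in both dictionaries, not a change to the utility formula.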

**List of dictionaries:** Sometimes you'll have a list of records, where each record is a dictionary. This could represent dataset-like structures (each dict is an observation). For instance, a list of individuals each with their attributes, or a list of alternatives each with its characteristics. Python's flexibility with these structures is useful for simple simulations.

In [18]:
# Example: List of individuals with attributes
individuals = [
    {"name": "Alice", "age": 30, "income": 70000},
    {"name": "Bob", "age": 25, "income": 50000},
    {"name": "Charlie", "age": 35, "income": 100000}
]

# Example: List of alternatives with characteristics
transport_modes = [
    {"mode": "car", "cost": 0.5, "comfort": 0.8, "time": 30},
    {"mode": "bus", "cost": 0.2, "comfort": 0.6, "time": 45},
    {"mode": "bike", "cost": 0.1, "comfort": 0.7, "time": 60}
]
print("Individuals:", individuals)
print("Transport modes:", transport_modes)

# Accessing data from the list of dictionaries
for person in individuals:
    print(f"{person['name']} is {person['age']} years old with an income of ${person['income']}.")

for mode in transport_modes:
    print(f"Transport mode: {mode['mode']}, Cost: {mode['cost']}, Comfort: {mode['comfort']}, Time: {mode['time']} minutes")


Individuals: [{'name': 'Alice', 'age': 30, 'income': 70000}, {'name': 'Bob', 'age': 25, 'income': 50000}, {'name': 'Charlie', 'age': 35, 'income': 100000}]
Transport modes: [{'mode': 'car', 'cost': 0.5, 'comfort': 0.8, 'time': 30}, {'mode': 'bus', 'cost': 0.2, 'comfort': 0.6, 'time': 45}, {'mode': 'bike', 'cost': 0.1, 'comfort': 0.7, 'time': 60}]
Alice is 30 years old with an income of $70000.
Bob is 25 years old with an income of $50000.
Charlie is 35 years old with an income of $100000.
Transport mode: car, Cost: 0.5, Comfort: 0.8, Time: 30 minutes
Transport mode: bus, Cost: 0.2, Comfort: 0.6, Time: 45 minutes
Transport mode: bike, Cost: 0.1, Comfort: 0.7, Time: 60 minutes
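
Lists of dictionaries combine well with `min()`/`max()` and list comprehensions. The sketch below repeats the records from the cell above so it runs on its own:

```python
# Same records as the cell above
individuals = [
    {"name": "Alice", "age": 30, "income": 70000},
    {"name": "Bob", "age": 25, "income": 50000},
    {"name": "Charlie", "age": 35, "income": 100000},
]
transport_modes = [
    {"mode": "car", "cost": 0.5, "comfort": 0.8, "time": 30},
    {"mode": "bus", "cost": 0.2, "comfort": 0.6, "time": 45},
    {"mode": "bike", "cost": 0.1, "comfort": 0.7, "time": 60},
]

# Record with the smallest travel time (key= tells min() what to compare)
fastest = min(transport_modes, key=lambda rec: rec["time"])
print("Fastest mode:", fastest["mode"])   # Fastest mode: car

# Filter records with a condition inside a list comprehension
cheap = [rec["mode"] for rec in transport_modes if rec["cost"] < 0.3]
print("Cheap modes:", cheap)              # Cheap modes: ['bus', 'bike']

# Extract one field from every record, then summarize it
incomes = [person["income"] for person in individuals]
print("Average income:", sum(incomes) / len(incomes))
```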


## 01.3 Introducing NumPy Arrays

While Python lists are very flexible, for numeric data we often use **NumPy arrays** for efficiency. NumPy (Numerical Python) provides a multi-dimensional array object and fast operations for processing arrays.

A NumPy array is like a grid of values (all of the same type) indexed by tuple(s) of nonnegative integers. They are optimized for numeric computations, enabling vectorized operations (operating on whole arrays at once).

First, import NumPy:

In [21]:
import numpy as np


Create a NumPy array from a Python list:

In [24]:
times_list = [30, 45, 40, 60]               # regular Python list
times_array = np.array(times_list)          # NumPy array
print("List * 2:", times_list * 2)          # List * 2 concatenates the list with itself
print("Array * 2:", times_array * 2)        # Array * 2 multiplies each element by 2

List * 2: [30, 45, 40, 60, 30, 45, 40, 60]
Array * 2: [ 60  90  80 120]


>  Notice the difference: multiplying a Python list by 2 repeats it (because for lists, `*` is defined as repetition), whereas multiplying a NumPy array by 2 performs element-wise numerical doubling. This vectorization is powerful for mathematical operations and is much faster than using loops for large arrays.
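
To see that the vectorized form computes the same thing as an explicit loop, compare the two directly. The `wait_min` values are hypothetical waiting times added only to show element-wise arithmetic between two arrays:

```python
import numpy as np

times_list = [30, 45, 40, 60]
times_array = np.array(times_list)

# Element-wise doubling with a plain Python loop (list comprehension)
doubled_loop = [2 * t for t in times_list]

# The same operation, vectorized: NumPy applies it to every element
doubled_vec = times_array * 2

print(doubled_loop)                                # [60, 90, 80, 120]
print(doubled_vec)                                 # [ 60  90  80 120]
print(np.array_equal(doubled_loop, doubled_vec))   # True

# Arithmetic between two arrays of the same shape is also element-wise
wait_min = np.array([0, 10, 5, 30])                # hypothetical waiting times (minutes)
print(times_array + wait_min)                      # [30 55 45 90]
```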

In [None]:
print("Array mean:", np.mean(times_array))    # Mean of the array
print("Array sum:", np.sum(times_array))      # Sum of the array
print("Array sqrt:", np.sqrt(times_array))    # Square root of each element

Array mean: 43.75
Array sum: 175
Array sqrt: [5.47722558 6.70820393 6.32455532 7.74596669]
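
Every array has a single element type (`dtype`) and a `shape`. A short sketch of both, including what happens when integers and floats are mixed:

```python
import numpy as np

times_array = np.array([30, 45, 40, 60])
# e.g. int64 (4,); the default integer dtype depends on the platform
print(times_array.dtype, times_array.shape)

# Arrays hold one type: mixing ints and floats upcasts everything to float
mixed = np.array([30, 45.5, 40])
print(mixed.dtype)   # float64
print(mixed)         # [30.  45.5 40. ]
```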


Creating a 2D NumPy array (matrix) for travel times of different modes over different distances:

In [29]:
travel_times = np.array([[30, 45, 40], [60, 50, 55], [70, 80, 75]])  # rows = 3 modes, columns = 3 distances
print("Travel times (2D array):")
print(travel_times)
print("First row (mode 1):", travel_times[0])        # First row
print("Element at (2,1):", travel_times[2, 1])       # row index 2, column index 1 (0-based): third mode, second distance
print("Mean travel time:", np.mean(travel_times))      # Mean of all elements
print("Sum of travel times:", np.sum(travel_times))      # Sum of all elements
print("Travel times * 1.1 (10% increase):")
print(travel_times * 1.1)  # Increase all times by 10%

Travel times (2D array):
[[30 45 40]
 [60 50 55]
 [70 80 75]]
First row (mode 1): [30 45 40]
Element at (2,1): 80
Mean travel time: 56.111111111111114
Sum of travel times: 505
Travel times * 1.1 (10% increase):
[[33.  49.5 44. ]
 [66.  55.  60.5]
 [77.  88.  82.5]]
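
Slicing with `:` selects whole rows or columns of a 2D array. The sketch below reuses the same matrix (rows are modes, columns are distances):

```python
import numpy as np

travel_times = np.array([[30, 45, 40], [60, 50, 55], [70, 80, 75]])

print(travel_times.shape)      # (3, 3): 3 rows x 3 columns
print(travel_times[:, 0])      # first column (all modes, first distance): [30 60 70]
print(travel_times[1, :])      # second row, same as travel_times[1]: [60 50 55]
print(travel_times[0:2, 1:])   # first two rows, last two columns
```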


Example: calculate the average travel time for each mode (row-wise mean, `axis=1`).

In [28]:
average_times = np.mean(travel_times, axis=1)
print("Average travel time per mode:", average_times)

Average travel time per mode: [38.33333333 55.         75.        ]


>  NumPy arrays will be heavily used when we deal with large datasets or model computations (e.g., calculating utility for many observations at once). We will explore NumPy further in Notebook 02, but remember: when you see `np.array` and similar syntax, we are leveraging NumPy for speed and convenience in numeric calculations.
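
As a preview of "utility for many observations at once", here is a minimal sketch with hypothetical data for five travelers. The coefficients are the toy values used by `compute_utility` in Section 01.4:

```python
import numpy as np

# Hypothetical data: travel time (minutes) and cost (£) of one mode for five travelers
times = np.array([30, 45, 25, 50, 35])
costs = np.array([5.0, 2.0, 1.0, 4.0, 3.0])

# Toy coefficients, same values as compute_utility in Section 01.4
beta_time, beta_cost = -0.1, -0.5

# One vectorized expression gives every traveler's utility; no explicit loop
utilities = beta_time * times + beta_cost * costs
print(utilities)
```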

## 01.4 Functions and Control Flow

**Functions:** Functions are reusable blocks of code that perform a specific task. We define a function with the `def` keyword, specifying parameters, and use `return` to output a result. Functions help organize code and avoid repetition.

For example, let's define a simple function to compute a linear utility given some attributes:

In [32]:
def compute_utility(time, cost):
    """Compute utility as a weighted sum of time and cost (toy example)."""
    beta_time = -0.1   # coefficient for travel time (per minute)
    beta_cost = -0.5   # coefficient for travel cost (per currency unit)
    utility = beta_time * time + beta_cost * cost
    return utility

# Test the function
u_car = compute_utility(time=30, cost=5)   # e.g., 30 mins, £5
print("Utility for time=30, cost=5:", u_car)

u_bus = compute_utility(time=45, cost=2)   # e.g., 45 mins, £2
print("Utility for time=45, cost=2:", u_bus)

u_bike = compute_utility(time=25, cost=1)   # e.g., 25 mins, £1
print("Utility for time=25, cost=1:", u_bike)



Utility for time=30, cost=5: -5.5
Utility for time=45, cost=2: -5.5
Utility for time=25, cost=1: -3.0


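So, for a 30-minute trip costing £5, the utility is -5.5 (the negative sign reflects the disutility of time and cost, as expected). The 45-minute, £2 trip gets exactly the same utility. With these coefficients, its extra 15 minutes reduce utility by 1.5 (15 × 0.1), and the £3 saving adds back exactly 1.5 (3 × 0.5).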
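
Hard-coding the coefficients inside the function makes them hard to change. One common alternative is to expose them as parameters with default values. The function name `compute_utility_general` is hypothetical:

```python
def compute_utility_general(time, cost, beta_time=-0.1, beta_cost=-0.5):
    """Linear utility with the coefficients exposed as parameters (defaults match compute_utility)."""
    return beta_time * time + beta_cost * cost

print(compute_utility_general(30, 5))                   # defaults, same as compute_utility: -5.5
print(compute_utility_general(30, 5, beta_time=-0.2))   # a more time-sensitive traveler
```

Callers that don't pass coefficients get the original behavior, while keyword arguments make any override explicit at the call site.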

Another example:

In [None]:
def calculate_travel_cost(distance, mode):
    """Return distance (km) times the per-km cost of the mode; unknown modes cost 0."""
    cost_per_km = {"car": 0.5, "bus": 0.2, "bike": 0.1}  # cost per km for each mode
    return distance * cost_per_km.get(mode, 0)

# Test the function
print("Travel cost (car, 100 km):", calculate_travel_cost(100, "car"))
print("Travel cost (bus, 100 km):", calculate_travel_cost(100, "bus"))
print("Travel cost (bike, 100 km):", calculate_travel_cost(100, "bike"))



Travel cost (car, 100 km): 50.0
Travel cost (bus, 100 km): 20.0
Travel cost (bike, 100 km): 10.0
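
Because `calculate_travel_cost` uses `.get(mode, 0)`, an unknown or misspelled mode (for example `"Car"` instead of `"car"`) silently costs 0. A stricter variant, sketched here under the hypothetical name `calculate_travel_cost_strict`, raises an error instead:

```python
def calculate_travel_cost_strict(distance, mode):
    """Like calculate_travel_cost, but raises ValueError for unknown modes instead of returning 0."""
    cost_per_km = {"car": 0.5, "bus": 0.2, "bike": 0.1}  # cost per km for each mode
    if mode not in cost_per_km:
        raise ValueError(f"Unknown mode: {mode!r}")
    return distance * cost_per_km[mode]

print(calculate_travel_cost_strict(100, "car"))   # 50.0

try:
    calculate_travel_cost_strict(100, "train")     # "train" has no cost entry
except ValueError as err:
    print(err)                                     # Unknown mode: 'train'
```

Which behavior is preferable depends on the use: a default of 0 is convenient for quick tallies, while raising an error surfaces data-entry mistakes early.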


Functions make code more readable and maintainable, especially when the logic might be used multiple times. We will use functions to encapsulate tasks like computing probabilities or evaluating log-likelihoods in later notebooks.

**Control Flow:** Control flow statements like **if-else** and **loops** allow us to execute code based on conditions and to repeat tasks.

* *Conditional statements*: `if` checks a condition and executes a block if true, optionally followed by `elif` (else-if) and `else` for additional cases. For example:

In [34]:
mode = "Bus"
if mode == "Car":
    print("Driving a car")
elif mode == "Bus":
    print("Taking a bus")
else:
    print("Other mode")


Taking a bus


In [None]:
def travel_advice(mode, weather):
    if mode == "bike":
        if weather == "rainy":
            return "It's rainy, consider taking public transport instead of biking."
        else:
            return "Great weather for biking!"
    elif mode == "car":
        return "Driving a car is comfortable."
    elif mode == "bus":
        return "Taking a bus is economical."
    else:
        return "Consider walking or other modes."   
    
# Test the function
print(travel_advice("bike", "sunny"))
print(travel_advice("bike", "rainy"))
print(travel_advice("car", "cloudy"))

Great weather for biking!
It's rainy, consider taking public transport instead of biking.
Driving a car is comfortable.


* *Loops*: Python has *for* loops to iterate over items in a sequence, and *while* loops to repeat until a condition is false. For instance, to iterate over our modes list:

In [36]:
for m in modes:
    print("Mode option:", m)


Mode option: Car
Mode option: Coach
Mode option: Train
Mode option: Air
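
Two related loop tools: `enumerate()` for numbering items, and a `while` loop, which repeats as long as its condition is true. The growth scenario below is hypothetical and chosen only to show the loop mechanics:

```python
# enumerate() yields (index, item) pairs; start=1 numbers from 1
for i, m in enumerate(["Car", "Coach", "Train", "Air"], start=1):
    print(f"Option {i}: {m}")

# Hypothetical scenario: a 40-minute trip grows 10% per step.
# Count the steps until it exceeds 60 minutes.
time_minutes = 40.0
steps = 0
while time_minutes <= 60:
    time_minutes *= 1.1
    steps += 1
print(f"After {steps} steps: {time_minutes:.1f} minutes")   # After 5 steps: 64.4 minutes
```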


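The same advice can also be written as a dictionary lookup keyed by `(mode, weather)` tuples, as in the next cell. This is more compact, but it only covers the exact pairs listed. For example, `travel_advice("car", "sunny")` would now return the default message, whereas the `if`/`elif` version returns "Driving a car is comfortable." for any weather.

In [40]: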
def travel_advice(mode, weather):
    advice = {
        ("bike", "sunny"): "Great weather for biking!",
        ("bike", "rainy"): "It's rainy, consider taking public transport instead of biking.",
        ("car", "cloudy"): "Driving a car is comfortable.",
        ("bus", "sunny"): "Taking a bus is economical.",
    }
    return advice.get((mode, weather), "Consider walking or other modes.")

# Test the function
print(travel_advice("bike", "sunny"))
print(travel_advice("bike", "rainy"))
print(travel_advice("car", "cloudy"))
print(travel_advice("bus", "sunny"))



Great weather for biking!
It's rainy, consider taking public transport instead of biking.
Driving a car is comfortable.
Taking a bus is economical.


In [39]:
# Helper (same as the earlier definition, repeated so this cell runs on its own)
def calculate_travel_cost(distance, mode):
    cost_per_km = {"car": 0.5, "bus": 0.2, "bike": 0.1}  # cost per km for each mode
    return distance * cost_per_km.get(mode, 0)

def calculate_total_cost(travel_data):
    """Sum the travel cost over a list of trip records."""
    total_cost = 0
    for data in travel_data:
        mode = data.get("mode")
        distance = data.get("distance", 0)      # a missing distance counts as 0 km
        cost = calculate_travel_cost(distance, mode)
        total_cost += cost
    return total_cost

# Test the function
travel_data = [
    {"mode": "car", "distance": 100},
    {"mode": "bus", "distance": 50},
    {"mode": "bike", "distance": 20},
]

print("Total travel cost:", calculate_total_cost(travel_data))

Total travel cost: 62.0


We will use loops to iterate over records (like going through each individual or each alternative) and if-statements to apply logic (like availability checks: e.g., if a mode is not available, skip it).

That said, when working with NumPy or pandas, explicit loops are often unnecessary because operations apply to whole arrays or columns at once. It is still important to know how to write loops, both for clarity in simple cases and for simulation logic.
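
The availability check described above can be sketched with `continue` inside a `for` loop. The availability flags are hypothetical. The times and costs are the ones assumed in Section 01.5, and the coefficients (-0.1 per minute, -0.5 per £) match `compute_utility`:

```python
# Hypothetical availability for one traveler (e.g., no car in the household)
available = {"Car": False, "Bus": True, "Train": True}
times_min = {"Car": 30, "Bus": 45, "Train": 40}
costs_gbp = {"Car": 5.0, "Bus": 2.0, "Train": 3.0}

avail_utilities = {}
for mode_name in times_min:
    if not available[mode_name]:
        continue   # skip alternatives this traveler cannot choose
    avail_utilities[mode_name] = -0.1 * times_min[mode_name] - 0.5 * costs_gbp[mode_name]

print(avail_utilities)   # only Bus and Train appear
```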

## 01.5 Example: A Toy Softmax Utility Model (Three Travel Modes)

Now that we've covered basics, let's apply them to a simple choice modeling scenario. We will create a toy example of a traveler choosing among three travel modes (Car, Bus, Train) for a trip, using a softmax function to model choice probabilities. This mimics a Multinomial Logit model where utilities are computed and then converted to choice probabilities via the softmax (logit) formula.
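
For reference, the standard softmax (logit) formula is shown below. $V_i$ is the systematic utility of mode $i$, written with the same coefficients as in Step 1:

$$
P(i) = \frac{\exp(V_i)}{\sum_{j \in \{\text{Car},\, \text{Bus},\, \text{Train}\}} \exp(V_j)}, \qquad V_i = \beta_{time}\, \text{time}_i + \beta_{cost}\, \text{cost}_i
$$

Each probability is positive, and the probabilities sum to 1. Adding the same constant to every $V_i$ leaves them unchanged, so only utility differences between modes matter.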

**Step 1: Define mode attributes and parameters.** For simplicity, assume:

- Travel times for Car, Bus, Train are 30, 45, 40 minutes respectively.

- Travel costs for Car, Bus, Train are £5, £2, £3 respectively.

- We will use fixed utility coefficients: $\beta_{time}= -0.1$ (per minute), $\beta_{cost} = -0.5$ (per £).

Let's set this up: