# An Intro to Python

The cell below contains a line of Python code. To run it, select the cell (it will turn blue), then either press the triangle (Run) button at the top of your screen or press Shift + Enter on your keyboard.

In [1]:
print("Hello World")

Hello World


##### Note
`print()` is a function used to display its input. We will talk more about functions in a later section.

## Python Basics and Data Types

### Common Python Data Types

Python provides several built-in data types for storing and working with data. Below are some of the most commonly used ones.

**`int` (Integer)**  
Represents whole numbers. Integers can be positive, negative, or zero and are commonly used for counting or indexing.

**`float` (Floating-Point Number)**  
Represents numbers that contain a decimal point.

**`bool` (Boolean)**  
Represents one of two values: `True` or `False`.

**`str` (String)**  
Represents text data. Strings are sequences of characters and are enclosed in quotation marks.

**`list`**  
An ordered collection of items. Lists can contain elements of different data types and can be modified after creation.

**`set`**  
An unordered collection of unique items. Sets automatically remove duplicate values and are useful when uniqueness matters or when performing mathematical set operations.

**`dict` (Dictionary)**  
A collection of key–value pairs. Dictionaries are used to store data in a structured way, allowing values to be accessed using meaningful keys rather than numeric positions.
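A short illustrative snippet, with made-up variable names, that creates one value of each type listed above. `type()`, used on the first `print` line, is covered in more detail below.

```python
count = 42                              # int
price = 3.5                             # float
is_open = True                          # bool
greeting_text = "hello"                 # str
mixed = [1, "two", 3.0]                 # list: ordered, can mix types
unique = {1, 2, 2, 3}                   # set: the duplicate 2 is dropped
person = {"name": "Timmy", "height": 65}  # dict: key -> value

print(type(count), type(price), type(is_open), type(greeting_text))
print(mixed, unique, person)
```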

### Integers, Floats, and Strings

In [2]:
# The # symbol denotes a comment. These do not affect the code and are useful for explaining the code

# A single = defines a variable
# The following defines 3 ints

a = 10
b = 3
c = a * b
print(a,b,c)

10 3 30


In [3]:
# Assigning to the same variable a second time overwrites its original value
b = 3.14
print(a,b,c)

10 3.14 30


In [4]:
# We can update a variable in place by using +=
# The same shorthand works for the other arithmetic operators: -=, *=, and /=
a += 4
print(a)

14


##### Note
Notice that `c` kept its value of 30 even after `b` was changed.  
When we set `c = a * b` earlier, Python assigned `c` the product of the **current** values of `a` and `b` at that moment. After this assignment, `c` does **not** automatically update if `a` or `b` change. The value of `c` will only change if it is explicitly reassigned.
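To see this directly, the sketch below repeats the idea with fresh, illustrative variable names (`x`, `y`, `z`) so the notebook's `a`, `b`, and `c` are left untouched.

```python
x = 10
y = 3
z = x * y   # z is computed once, from the current values: 30
x = 100     # changing x afterwards...
print(z)    # ...does not change z, so this prints 30
z = x * y   # z only changes when it is reassigned
print(z)    # prints 300
```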

In [5]:
# Use type() to see the type of a variable
print(a,type(a))
print(b,type(b))

14 <class 'int'>
3.14 <class 'float'>


In [6]:
# Use "" or '' to define a string
message = "Hello"
greeting = 'Welcome!'

In [7]:
# You can add strings together
print(message + ' and ' + greeting)

# You can even multiply a string by an integer
print(message*10)

Hello and Welcome!
HelloHelloHelloHelloHelloHelloHelloHelloHelloHello


##### Note
You can also put variables into a string by using a formatted string, also called an f-string. Put an `f` before the opening quote and place the variable name inside curly braces `{}`.

In [8]:
print(f"Variable a = {a}")

Variable a = 14
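f-strings accept any expression inside the braces, not just a variable name, plus an optional format spec after a colon. A small sketch with made-up values:

```python
price = 19.5
quantity = 3

# Any expression can go inside the braces
print(f"{quantity} items cost {price * quantity}")   # 3 items cost 58.5

# A format spec after a colon controls the display, here 2 decimal places
print(f"Total: {price * quantity:.2f}")             # Total: 58.50
```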


### Booleans

In [9]:
# True and False are predefined values in Python that you can set variables to
a = True
b = False

In [10]:
# To compare two values use ==. This will return True if they are equal and False if not
(4+4) == 8

True

In [11]:
"Hello" == "Goodbye"

False

In [12]:
# > >= < and <= can also be used to compare numeric values
print(10 <= 10)
print(1<2)
print(1>2)

True
True
False


In [13]:
# You can use the keywords not, and, or

# not takes the opposite of the following boolean
print(not True)

# and is put between two booleans
# if they are both True, the and statement is True, otherwise it is False
print(True and False)

# or is put between two booleans
# if they are both False, the or statement is False, otherwise it is True
print(True or False)

False
False
True
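Comparisons and the boolean keywords are often combined in a single expression. An illustrative example (the variable names and thresholds are arbitrary):

```python
temperature = 72
is_raining = False

# Each comparison produces a boolean, and and/not combine them
nice_day = temperature >= 60 and temperature <= 85 and not is_raining
print(nice_day)  # True
```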


### Lists

In [14]:
# A list is denoted by []
# Each item is separated by ,
# The items don't need to be the same type
cool_list = [15, "Hello", ["A list in a list?", "Cool"], b, "Word", 124.34]
cool_list

[15, 'Hello', ['A list in a list?', 'Cool'], False, 'Word', 124.34]

In [15]:
# Use sorted() to sort a list
print(sorted([1,5,2,5,7,45,3,2]))
print(sorted(["these","words","are","sorted","alphabetically"]))

[1, 2, 2, 3, 5, 5, 7, 45]
['alphabetically', 'are', 'sorted', 'these', 'words']


##### Note
In cell [14], the last line was just `cool_list`, and its value showed up below the cell.<br>Putting a variable, or a function call that returns a value, on the last line of a cell will display its value.

In [16]:
# Get a value from a list by using [] around the index
cool_list[1]

'Hello'

##### Note
Notice how it returned the second value.<br> In Python, indices start at 0. So if you want the first value of `cool_list`, do `cool_list[0]`
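Since indices start at 0, the last item of a list is at index `len(list) - 1`. Python also accepts negative indices, which count from the end. A sketch using a made-up list:

```python
letters = ["a", "b", "c", "d"]
print(letters[0])                 # first item: a
print(letters[len(letters) - 1])  # last item by position: d
print(letters[-1])                # negative indices count from the end: d
```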

In [17]:
# Find a value in a list
cool_list.index('Hello')

1

In [18]:
# If the value is not there, .index() raises a ValueError
cool_list.index('Something that isnt there')

ValueError: 'Something that isnt there' is not in list

In [19]:
# Take a slice of a list by using [start index : end index]
# The slice will include the start index but exclude the end index

# If left blank, the start value will default to 0
# If left blank, the end value will default to the length of the list
simple_list = [0,1,2,3,4,5,6,7]
print('[1:3]')
print(simple_list[1:3])
print()
print('[4:]')
print(simple_list[4:])

[1:3]
[1, 2]

[4:]
[4, 5, 6, 7]


In [20]:
# Take a slice of a list by using [start index : end index : step]
# The default value of step is 1
simple_list[::2]

[0, 2, 4, 6]

In [21]:
# If step is negative, it will count backwards
simple_list[6:0:-2]

[6, 4, 2]

In [22]:
# When step is negative, the default start and end are swapped (start defaults to the last element, end to just past the first), so the full list is included
# This means [::-1] returns the list in reverse
print(simple_list[::-1])

[7, 6, 5, 4, 3, 2, 1, 0]
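Slicing builds a new list rather than changing the original. A quick check, using the same values as `simple_list` under a new name:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7]
reversed_numbers = numbers[::-1]   # a new, reversed list
print(reversed_numbers)            # [7, 6, 5, 4, 3, 2, 1, 0]
print(numbers)                     # [0, 1, 2, 3, 4, 5, 6, 7] (unchanged)
```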


In [23]:
# The list slicing described above can be done on strings as well
print("A very strange string:")
print("bAk yspeqcorqents zmzemsjssadgbef!g")
print()

print("Looking at every other letter:")
print("bAk yspeqcorqents zmzemsjssadgbef!g"[1::2])
print()

print('stressed backwards:')
print("stressed"[::-1])

A very strange string:
bAk yspeqcorqents zmzemsjssadgbef!g

Looking at every other letter:
A secret message!

stressed backwards:
desserts


### Sets
A set is like a list, with the following differences:
- Items are unordered, so a set has no indices
- Each value in a set is unique; duplicates are removed automatically

In [24]:
{1,2,3,5,3,2,12,23,3,2,1,1,2,3,2}

{1, 2, 3, 5, 12, 23}

In [25]:
# You can make a set from a list
set(["A","C","D","C"])

{'A', 'C', 'D'}

In [26]:
# You can also turn a set back into a list if needed (the order of the resulting list is not guaranteed)
list(set(["A","C","D","C"]))

['A', 'D', 'C']
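Sets also support the mathematical set operations mentioned earlier. A minimal sketch with two made-up sets; because sets are unordered, the printed order of the union and difference may vary:

```python
a_team = {"Alice", "Bob", "Charlie"}
b_team = {"Bob", "Dana"}

print(a_team | b_team)   # union: everyone on either team
print(a_team & b_team)   # intersection: {'Bob'}
print(a_team - b_team)   # difference: on a_team but not on b_team
print("Dana" in b_team)  # membership test: True
```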

### Dictionaries

In [27]:
# A dictionary is a collection of key–value pairs. Looking up a key returns its value
first_dictionary = {
    "Timmy's Height":65,
    "Jimmy's Height":72
}
# Notice how each pair is key : value and pairs are separated by ,

In [28]:
first_dictionary["Timmy's Height"]

65

In [29]:
# Once a dictionary is defined, we can add new entries by using the following syntax
first_dictionary["Kimmy's Height"] = 84
first_dictionary["Kimmy's Height"]

84

In [30]:
# We can also change the values for existing keys
first_dictionary["Jimmy's Height"] += 2
first_dictionary["Jimmy's Height"]

74

In [31]:
# Values can be any type; keys can be any hashable (immutable) type, such as a string, number, or tuple
first_dictionary[45] = "Forty Five"
print(first_dictionary)

{"Timmy's Height": 65, "Jimmy's Height": 74, "Kimmy's Height": 84, 45: 'Forty Five'}


In [89]:
# You can get the keys and the values
print('Keys: ',first_dictionary.keys())
print()
print('Values: ',first_dictionary.values())

Keys:  dict_keys(["Timmy's Height", "Jimmy's Height", "Kimmy's Height", 45])

Values:  dict_values([65, 74, 84, 'Forty Five'])
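Looking up a key that is not in a dictionary raises a `KeyError`, much like `.index()` raised a `ValueError` for a missing list value. A sketch of two common ways to avoid this, using a smaller, illustrative copy of the heights dictionary:

```python
heights = {"Timmy's Height": 65, "Jimmy's Height": 74}

# Check whether a key exists before using it
print("Kimmy's Height" in heights)        # False

# .get() returns a default value instead of raising a KeyError
print(heights.get("Kimmy's Height", 0))   # 0
print(heights.get("Timmy's Height", 0))   # 65
```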


## Functions in Python

### What is a function?

A **function** is a reusable block of code that performs a specific task. Functions help make programs easier to read, reduce repetition, and organize logic into meaningful units.

### How Functions Work
When a function is called, Python:
1. Takes the input values (called **arguments**) and assigns them to the function's **parameters**,
2. Executes the statements inside the function,
3. Returns an output value (if specified).

Functions allow you to write code once and use it many times with different inputs.

### Defining a Function
To define a function in Python:
- Start with the keyword `def`.
- Give the function a **name** that describes what it does.
- Specify **parameters**, which act as placeholders for input values.
- Write the function’s logic in an indented block below the definition.
- Optionally, specify a **return value**, which is the result produced by the function.

### Key Concepts
- **Parameters** are the variable names used in the function definition.
- **Arguments** are the actual values passed into the function when it is called.
- A function does not run when it is defined; it only runs when it is called.
- Variables created inside a function are **local** to that function: they cannot be accessed outside of it, and they do not affect variables with the same name outside of it. To use a result outside the function, return it.

Functions are a core building block of Python and are essential for writing clean, modular, and maintainable code.
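A minimal illustration of these concepts. `add_tax` is a hypothetical example function made up for this sketch:

```python
def add_tax(price, rate):
    # price and rate are parameters; total is a local variable
    total = price * (1 + rate)
    return total

# Nothing has run yet: defining add_tax does not call it
result = add_tax(100, 0.25)   # 100 and 0.25 are the arguments
print(result)                 # 125.0

# total only existed inside add_tax; to use the value out here,
# we kept the returned value in result instead
```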


In [32]:
# This function does not take any parameters or return any values
# The r prefix makes raw strings, so the backslashes are printed as-is
def first_function():
    print(r" \O/")
    print("  |")
    print(r" / \ ")

In [33]:
first_function()

 \O/
  |
 / \ 


In [34]:
# This function takes in two parameters and returns their sum (or, for strings, their concatenation)
def second_function(a, b):
    return a+b

In [35]:
second_function(9,10)

19

In [36]:
second_function("He","llo")

'Hello'

##### Note
This function works when both inputs are numbers or both are strings; mixing the two (for example, `second_function(9, "10")`) raises a `TypeError`. Typically, we write functions with a specific data type in mind for each parameter and explain this to the user in the function's documentation.

In [37]:
def summarize_scores(name, scores):
    """
    This is called a docstring. It is used to document what the function does and how to use it.
    
    This function takes a person's name (string) and a non-empty list of numeric scores,
    and returns a dictionary summarizing the results.
    """
    total = sum(scores)
    average = total / len(scores)

    summary = {
        "name": name,
        "total_score": total,
        "average_score": average,
        "num_scores": len(scores)
    }

    return summary

In [38]:
summarize_scores("Timmy", [95,90,98,88,0,100,86])

{'name': 'Timmy',
 'total_score': 557,
 'average_score': 79.57142857142857,
 'num_scores': 7}

##### Note
The function above calls other functions inside it. `sum()` and `len()` are built-in functions in Python: `sum()` adds up all values in a list and returns the total, and `len()` returns the number of items in a container (list, string, dictionary, etc.).
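A quick look at `sum()` and `len()` on their own, using Timmy's scores from above plus two other containers:

```python
scores = [95, 90, 98, 88, 0, 100, 86]
print(sum(scores))            # 557
print(len(scores))            # 7
print(len("Hello"))           # 5: len counts the characters of a string
print(len({"a": 1, "b": 2}))  # 2: and the keys of a dictionary
```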

## Conditionals

**Conditionals** allow a program to make decisions based on whether a condition is `True` or `False`.

### How Conditionals Work
Python evaluates a condition:
- If the condition is `True`, the corresponding block of code runs.
- If the condition is `False`, that block is skipped.

### Common Conditional Keywords
- **`if`**: Runs code when a condition is true.
- **`elif`**: Checks an additional condition if the previous one was false.
- **`else`**: Runs code when none of the previous conditions are true.

### Key Concepts
- Conditions are expressions that evaluate to a boolean (`True` or `False`).
- Python checks conditions **from top to bottom** and stops once a true condition is found.
- Indentation is required and determines which code belongs to each condition.
- Comparison operators (such as equality and inequality) and logical operators (such as `and`, `or`, and `not`) are commonly used in conditionals.

Conditionals are essential for controlling program flow and allowing code to respond differently to different inputs.
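A minimal sketch of an `if`/`elif`/`else` chain; `letter_grade` and its cutoffs are made up for illustration. Because conditions are checked from top to bottom, a score of 85 fails the first check and is caught by the second:

```python
def letter_grade(score):
    # Checked top to bottom; the first True condition wins
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

print(letter_grade(95))  # A
print(letter_grade(85))  # B (85 >= 90 is False, 85 >= 80 is True)
print(letter_grade(50))  # F
```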


In [39]:
# Try changing the value of new_variable to see which branch runs
new_variable = 55
if new_variable > 100:
    print("If statement accepted")
elif new_variable < 15:
    print("First elif statement accepted")
elif new_variable < 30:
    print("Second elif statement accepted")
else:
    print("No condition met, so the else block ran")

No condition met, so the else block ran


In [40]:
def check_eligibility(age, has_id, is_member):
    if age < 18:
        return "Not eligible: under 18"
    elif not has_id:
        return "Not eligible: valid ID required"
    elif is_member:
        return "Eligible: member access granted"
    else:
        return "Eligible: guest access"

print(check_eligibility(25, False, True))
print(check_eligibility(17, False, False))
print(check_eligibility(65, True, True))

Not eligible: valid ID required
Not eligible: under 18
Eligible: member access granted


##### Note
Conditions are checked in order, and once a value is returned, the function ends. This is why `check_eligibility(17, False, False)` returns "Not eligible: under 18" and not "Not eligible: valid ID required" or "Eligible: guest access": the first condition, `age < 18`, is already true.

## Loops

**Loops** allow you to execute a block of code repeatedly, either once for each item in a collection or for as long as a condition is met. They are useful when working with collections of data or when an operation needs to be performed multiple times.

### `for` Loops

A **`for` loop** is used to iterate over a collection of values, such as a list, string, set, or dictionary (looping over a dictionary gives its keys), or over a range of numbers.

- The loop runs once for each item in the sequence.
- The loop variable takes on the value of the current item during each iteration.
- `for` loops are commonly used when the number of iterations is known in advance.

### `while` Loops

A **`while` loop** continues to run as long as a condition evaluates to `True`.

- The condition is checked before each iteration.
- If the condition becomes `False`, the loop stops.
- `while` loops are useful when the number of iterations is not known ahead of time.

Loops are a fundamental tool for processing data efficiently and automating repetitive tasks.
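An illustrative comparison of the two loop types (the values are arbitrary): the `for` loop runs once per item, while the `while` loop keeps running until its condition fails, after a number of steps we did not know in advance.

```python
scores = [95, 90, 98, 88]

# for loop: one iteration per item in the list
total = 0
for score in scores:
    total += score
print(total)  # 371

# while loop: keep doubling until the value reaches 1000
doubling = 1
steps = 0
while doubling < 1000:
    doubling *= 2
    steps += 1
print(steps)  # 10, since 2**10 = 1024 is the first power of 2 that is >= 1000
```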


In [41]:
for word in ["First","Second","Third"]:
    print(word)

First
Second
Third


##### Note
The variable `word` is called the **loop variable**. On each pass through the loop, it takes the value of the next item in the list, and the indented code uses that value.

In [42]:
# range(n) can be used to get the integers 0 to n-1
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [94]:
# You can unpack each item into multiple loop variables
# .items() gives (key, value) pairs, so k takes the key and v takes the value
for k, v in first_dictionary.items():
    print('Key: '+str(k))
    print('Value: '+str(v))
    print()

Key: Timmy's Height
Value: 65

Key: Jimmy's Height
Value: 74

Key: Kimmy's Height
Value: 84

Key: 45
Value: Forty Five



In [43]:
a = 1
while a < 100:
    print(a)
    a *= 2

1
2
4
8
16
32
64


##### Note
- The loop starts with `a` equal to 1. Each time the loop runs, the current value of `a` is printed and then doubled.
- The loop condition requires `a` to be less than 100, so the loop continues only while this condition is true.
- Once `a` is doubled to 128, the condition is no longer satisfied, so the loop stops before printing it. No values greater than 100 are printed.


In [44]:
# Moving print() after the update means the new value is printed, including 128
a = 1
while a < 100:
    a *= 2
    print(a)

2
4
8
16
32
64
128


##### Note
When writing while loops, make sure that your loop will eventually end. For example, if we wrote `a/=2` instead of `a*=2`, `a` would always stay below 100 and the loop would never end. If you accidentally start an infinite loop, press the square (stop) button at the top of the screen to interrupt the code.

## Introduction to NumPy

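**NumPy** is a widely used library for numerical computing in Python.

NumPy arrays store values of a single type in a compact block of memory and support operations on the whole array at once, which makes them faster and more convenient than Python lists for numerical work. This matters most when we work with large datasets.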

We will go over:
- Creating arrays
- Reshaping arrays
- Basic operations
- Slicing and indexing
- Filtering arrays
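One way to see the convenience difference: doubling every number takes a loop with a list, but a single operation with an array. Note that `*` means something different for lists, where it repeats the list just as it repeats a string. An illustrative sketch:

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: build a new list one element at a time
doubled_list = []
for v in values:
    doubled_list.append(v * 2)

# NumPy: one vectorized operation on the whole array
doubled_array = np.array(values) * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_array)  # [ 2  4  6  8 10]
print(values * 2)     # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]: list * 2 repeats the list
```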

In [57]:
# Creating NumPy arrays
import numpy as np

# 1D array
arr1 = np.array([1, 2, 3, 4, 5])
print("1D array:", arr1)

# 2D array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:\n", arr2)

# You can also make higher dimensional arrays
# 5D array
arr5 = np.arange(240).reshape((2, 2, 3, 4, 5)) # np.arange(240) makes a 1D array of the numbers 0 to 239, which is then reshaped into a 2x2x3x4x5 array
print()
print("5D array:\n", arr5)

1D array: [1 2 3 4 5]
2D array:
 [[1 2 3]
 [4 5 6]]

5D array:
 [[[[[  0   1   2   3   4]
    [  5   6   7   8   9]
    [ 10  11  12  13  14]
    [ 15  16  17  18  19]]

   [[ 20  21  22  23  24]
    [ 25  26  27  28  29]
    [ 30  31  32  33  34]
    [ 35  36  37  38  39]]

   [[ 40  41  42  43  44]
    [ 45  46  47  48  49]
    [ 50  51  52  53  54]
    [ 55  56  57  58  59]]]


  [[[ 60  61  62  63  64]
    [ 65  66  67  68  69]
    [ 70  71  72  73  74]
    [ 75  76  77  78  79]]

   [[ 80  81  82  83  84]
    [ 85  86  87  88  89]
    [ 90  91  92  93  94]
    [ 95  96  97  98  99]]

   [[100 101 102 103 104]
    [105 106 107 108 109]
    [110 111 112 113 114]
    [115 116 117 118 119]]]]



 [[[[120 121 122 123 124]
    [125 126 127 128 129]
    [130 131 132 133 134]
    [135 136 137 138 139]]

   [[140 141 142 143 144]
    [145 146 147 148 149]
    [150 151 152 153 154]
    [155 156 157 158 159]]

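   [[160 161 162 163 164]
    [165 166 167 168 169]
    [170 171 172 173 174]
    [175 176 177 178 179]]]


  [[[180 181 182 183 184]
    [185 186 187 188 189]
    [190 191 192 193 194]
    [195 196 197 198 199]]

   [[200 201 202 203 204]
    [205 206 207 208 209]
    [210 211 212 213 214]
    [215 216 217 218 219]]

   [[220 221 222 223 224]
    [225 226 227 228 229]
    [230 231 232 233 234]
    [235 236 237 238 239]]]]]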

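A smaller sketch of how reshaping works: the new shape must contain the same total number of elements as the original, and one dimension can be given as `-1` for NumPy to work out from the total size.

```python
import numpy as np

arr = np.arange(12)              # 1D array of the numbers 0 to 11
print(arr.shape)                 # (12,)

grid = arr.reshape(3, 4)         # 3 rows, 4 columns; 3 * 4 must equal 12
print(grid.shape)                # (3, 4)
print(grid.ndim)                 # 2

# -1 lets NumPy infer that dimension: 12 / 2 = 6
print(arr.reshape(2, -1).shape)  # (2, 6)
```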
In [58]:
# Basic NumPy operations
arr = np.array([1, 2, 3, 4, 5])

# Element-wise operations
print("Array + 10:", arr + 10)
print("Array * 2:", arr * 2)

# Aggregation functions
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Max:", np.max(arr))


Array + 10: [11 12 13 14 15]
Array * 2: [ 2  4  6  8 10]
Sum: 15
Mean: 3.0
Max: 5


In [47]:
# Slicing and Indexing
arr = np.array([10, 20, 30, 40, 50])

# Access single element
print("First element:", arr[0])

# Slice elements (indices 1 to 3, i.e. the 2nd through 4th elements)
print("Elements 2 to 4:", arr[1:4])

# Negative indexing
print("Last element:", arr[-1])

# 2D array slicing
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("2D array:\n", arr2d)
print("First row:", arr2d[0])
print("First column:", arr2d[:, 0])


First element: 10
Elements 2 to 4: [20 30 40]
Last element: 50
2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
First row: [1 2 3]
First column: [1 4 7]


In [48]:
# Boolean Masking
arr = np.array([5, 10, 15, 20, 25])

# Create a boolean mask
mask = arr > 15
print("Mask:", mask)

# Use mask to filter array
filtered = arr[mask]
print("Filtered values (greater than 15):", filtered)

# Combining conditions: use & (and) or | (or) rather than the keywords, and wrap each condition in parentheses
combined_mask = (arr >= 10) & (arr <= 20)
print("Values between 10 and 20:", arr[combined_mask])


Mask: [False False False  True  True]
Filtered values (greater than 15): [20 25]
Values between 10 and 20: [10 15 20]
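Two other common uses of a boolean mask, sketched on the same array: counting matches (since `True` counts as 1) and choosing between two values element by element with `np.where`.

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])
mask = arr > 15

# Summing a mask counts how many elements satisfy the condition
print(mask.sum())  # 2

# 'high' where the mask is True, 'low' where it is False
labels = np.where(mask, "high", "low")
print(labels)
```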


## Introduction to Pandas

**Pandas** is a Python library for data manipulation and analysis. It provides two primary data structures:

- **Series**: 1-dimensional labeled array (like a list with labels)
- **DataFrame**: 2-dimensional labeled table (like an Excel spreadsheet or SQL table)

We will go over:
- Creating and modifying dataframes
- Getting statistics
- Merging and concatenating two dataframes

Pandas is widely used in data analysis, machine learning, and finance.
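The labels mentioned above do not have to be 0, 1, 2, ... A sketch of a Series with custom labels (the names and values are illustrative), and of a DataFrame column being a Series:

```python
import pandas as pd

# The index holds the labels; here they are names instead of 0, 1, 2
heights = pd.Series([65, 74, 84], index=["Timmy", "Jimmy", "Kimmy"])
print(heights["Jimmy"])  # 74

# Each column of a DataFrame is a Series that shares the DataFrame's index
df_heights = pd.DataFrame({"height": heights})
print(df_heights)
```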


In [86]:
import pandas as pd

# Series
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s)
print()

# You can convert the type of the Series
print("Series as floats:\n",s.astype(float))

# DataFrame from dictionary
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Premium": [100.5, 200.0, 150.75]
})
print("\nDataFrame:\n", df)

Series:
 0    10
1    20
2    30
3    40
dtype: int64

Series as floats:
 0    10.0
1    20.0
2    30.0
3    40.0
dtype: float64

DataFrame:
       Name  Age  Premium
0    Alice   25   100.50
1      Bob   30   200.00
2  Charlie   35   150.75


In [71]:
# You can get a single column from the DataFrame, which gives you a Series
print(df['Name'])
print()

# You can get a DataFrame with a subset of the columns by passing a list of column names
print(df[['Premium','Age']])

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

   Premium  Age
0   100.50   25
1   200.00   30
2   150.75   35


In [78]:
# Build an example DataFrame from a dictionary (with real data, you might load a CSV file with pd.read_csv instead)
data = {
    "PolicyID": [101, 102, 103, 104],
    "Losses": [500, 0, 1200, 300],
    "Active": [True, False, True, True]
}
df = pd.DataFrame(data)

# Add new columns
df['Expense'] = [10,20,30,40]
df['Useless_Column'] = [1,2,3,4]

# Remove a column
df = df.drop(columns=['Useless_Column'])

# Inspect the first few rows
print("Head of DataFrame:\n", df.head())

# Get column names and shape
print("\nColumns:", df.columns)
print("Shape:", df.shape)


Head of DataFrame:
    PolicyID  Losses  Active  Expense
0       101     500    True       10
1       102       0   False       20
2       103    1200    True       30
3       104     300    True       40

Columns: Index(['PolicyID', 'Losses', 'Active', 'Expense'], dtype='object')
Shape: (4, 4)


In [80]:
# Filter rows where Losses >= 500
high_losses = df[df["Losses"] >= 500]
print("\nPolicies with high losses:\n", high_losses)


Policies with high losses:
    PolicyID  Losses  Active  Expense
0       101     500    True       10
2       103    1200    True       30


In [74]:
# Sum, mean, max of numeric columns
print("Total losses:", df["Losses"].sum())
print("Average loss:", df["Losses"].mean())
print("Max loss:", df["Losses"].max())

# Count of active policies (when summing, True counts as 1 and False as 0)
print("Number of active policies:", df["Active"].sum())

# Describe all numeric columns
print("\nSummary statistics:\n", df.describe())


Total losses: 2000
Average loss: 500.0
Max loss: 1200
Number of active policies: 3

Summary statistics:
          PolicyID       Losses    Expense
count    4.000000     4.000000   4.000000
mean   102.500000   500.000000  25.000000
std      1.290994   509.901951  12.909944
min    101.000000     0.000000  10.000000
25%    101.750000   225.000000  17.500000
50%    102.500000   400.000000  25.000000
75%    103.250000   675.000000  32.500000
max    104.000000  1200.000000  40.000000


In [75]:
# Left DataFrame
df_left = pd.DataFrame({
    "restaurant_id": [1, 2, 3, 4],
    "name": ["Afton", "Cafe 123", "Pizza Place", "Burger Spot"]
})

# Right DataFrame
df_right = pd.DataFrame({
    "restaurant_id": [1, 2, 4],
    "rating": [4.5, 4.0, 3.8]
})

# Left join on restaurant_id
df_merged = pd.merge(
    df_left,
    df_right,
    on="restaurant_id",
    how="left"
)

df_merged

|   | restaurant_id | name | rating |
|---|---|---|---|
| 0 | 1 | Afton | 4.5 |
| 1 | 2 | Cafe 123 | 4.0 |
| 2 | 3 | Pizza Place | NaN |
| 3 | 4 | Burger Spot | 3.8 |
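The `how` argument controls which rows survive a merge. A sketch comparing `how="inner"` with the left join above, on the same data:

```python
import pandas as pd

df_left = pd.DataFrame({
    "restaurant_id": [1, 2, 3, 4],
    "name": ["Afton", "Cafe 123", "Pizza Place", "Burger Spot"],
})
df_right = pd.DataFrame({
    "restaurant_id": [1, 2, 4],
    "rating": [4.5, 4.0, 3.8],
})

# how="inner" keeps only ids found in both tables, so Pizza Place (id 3) is dropped
inner = pd.merge(df_left, df_right, on="restaurant_id", how="inner")
print(inner["name"].tolist())       # ['Afton', 'Cafe 123', 'Burger Spot']

# how="left" keeps every row of df_left and fills the missing rating with NaN
left = pd.merge(df_left, df_right, on="restaurant_id", how="left")
print(left["rating"].isna().sum())  # 1
```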


In [83]:
# Concatenating 2 DataFrames together

df_a = pd.DataFrame({
    "restaurant_id": [1, 2],
    "name": ["Afton", "Cafe 123"]
})

df_b = pd.DataFrame({
    "restaurant_id": [3, 4],
    "name": ["Pizza Place", "Burger Spot"]
})

# Stack rows on top of each other
df_concat_rows = pd.concat([df_a, df_b])

df_concat_rows

|   | restaurant_id | name |
|---|---|---|
| 0 | 1 | Afton |
| 1 | 2 | Cafe 123 |
| 0 | 3 | Pizza Place |
| 1 | 4 | Burger Spot |


In [82]:
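# You can then reset the index so the rows are numbered 0 to n-1 (drop=True discards the old index instead of keeping it as a column)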
df_concat_rows = df_concat_rows.reset_index(drop=True)
df_concat_rows

|   | restaurant_id | name |
|---|---|---|
| 0 | 1 | Afton |
| 1 | 2 | Cafe 123 |
| 2 | 3 | Pizza Place |
| 3 | 4 | Burger Spot |
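As an alternative to calling `reset_index` afterwards, `pd.concat` accepts `ignore_index=True`, which renumbers the stacked rows in one step. A minimal sketch with the same two DataFrames:

```python
import pandas as pd

df_a = pd.DataFrame({"restaurant_id": [1, 2], "name": ["Afton", "Cafe 123"]})
df_b = pd.DataFrame({"restaurant_id": [3, 4], "name": ["Pizza Place", "Burger Spot"]})

# Stack the rows and number them 0 to n-1 in a single call
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3]
```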
