# **MENG 15100** Lab 1

Welcome to the first lab of MENG 15100: Machine Learning and Artificial Intelligence for Molecular Discovery and Engineering.

In this lab, you will learn the basics of coding in a **Jupyter Notebook** using **Google Colab** and get an introduction to **Python programming**, one of the most widely used languages in AI development.

This lab starts from first principles—**no prior coding experience is required.** For every exercise, you will be provided with baseline Python code, and you will only be asked to make minor edits or adaptations (e.g., change a variable’s value, adjust the number of iterations, or modify a plot’s formatting).


## Lab Structure and Grading
The lab is organized as follows:
- **Topics** – Broad units (e.g., *1. Jupyter Notebooks, 2. Introductory Python*).
- **Sections** – Subdivisions within each Topic (e.g., *1.2 Editing and Executing Cells*).
- **Problems** – Each Section ends with a Problem to be completed in the Jupyter Notebook. Problems are marked with the ✅ character and list the number of points available, which reflects the level of effort the solution requires. Problems may involve:
  - Short-answer questions
  - Modifications to existing Python code

**Note:** Many Problems contain multiple tasks.


## Table of Contents
### 1. Jupyter Notebook Crash Course

&emsp; 1.1 Types of Cells in a Jupyter Notebook <br>
&emsp; 1.2 Editing and Executing Cells <br>
&emsp; 1.3 Execution Order Matters! <br>
&emsp; 1.4 The Jupyter Kernel <br>

### 2. Introduction to Python Programming for AI
&emsp; 2.1 Mathematical Operations <br>
&emsp; 2.2 Modules; the `math` Module <br>
&emsp; 2.3 Variables <br>
&emsp; 2.4 Lists <br>
&emsp; 2.5 *For* Loops <br>
&emsp; 2.6 The `numpy` Module <br>
&emsp; 2.7 Logical Comparisons <br>
&emsp; 2.8 *If/else* Statements <br>
&emsp; 2.9 *While* Loops <br>
&emsp; 2.10 Functions <br>
&emsp; 2.11 Plotting; the `matplotlib` Module <br>
&emsp; 2.12 Applications in the Molecular Sciences <br>

**TIP:**
You can access the interactive Table of Contents from the panel on the far left. Use the 'bullet list' icon to expand it, and click on any section to jump straight there.

**Make sure to activate the TOC in the left panel now as it will make navigating this workbook much easier.**

<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/table_of_contents.png?raw=true" width="600">


# 1 Jupyter Notebook Crash Course

## 1.1 Types of Cells in a Jupyter Notebook

In this course, we’ll work in **Jupyter Notebooks**, an interactive coding environment that combines code, explanations, and results in a single place.

A notebook is organized into cells, which come in two main types:

- **Text cells** – for explanations, instructions, or documentation (like this one).

- **Code cells** – for writing and executing Python code (like the one below).

You’ll alternate between reading text cells to understand concepts and using code cells to experiment, test ideas, and solve problems.

In [None]:
# This is a code cell, which contains Python code.

# The lines in this cell are all *comments*.
# Comments do not run when the cell is executed.
# Instead, they are here to help explain the code
# or share important notes with the reader.
#
# In Python, any line starting with the '#' symbol
# is treated as a comment and ignored by the computer.

**Text cells** can be formatted in many ways, including:  

- **Bold text**  
- *Italic text*  
- Bullet point lists  
- [Links](https://www.youtube.com/watch?v=dQw4w9WgXcQ)  
- …and much more!  

The rules for formatting text are called [Markdown](https://www.markdownguide.org/basic-syntax/).  
You don’t need to master Markdown for this course, but it’s here for reference.  

<br>

**Tip:** You can collapse a section by clicking the small **chevron** icon next to any text cell containing a section header. Consider collapsing each sub-section (e.g. *1.1 Types of Cells in a Jupyter Notebook*) after you complete it to track your progress and make navigation easier!

<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/chevron.png?raw=true" width="600">

**Heads up:** Line breaks in Markdown can be tricky at first. Pressing *Enter* in the text editor will **not** create a visible line break when the cell is rendered.
To add a visible break, type the characters `<br>`. See [Line break best practices](https://www.markdownguide.org/basic-syntax/#line-break-best-practices) for more details.  


## 1.2 Editing and Executing Cells



To edit the contents of a **text cell**:  
- **Double-click** the cell to open the text editor.  
- Press **Shift and Return** simultaneously to exit the text editor and render the formatted text.  

To edit the contents of a **code cell**:  
- **Click** inside the cell to open the code editor.  
- Press **Shift and Return** simultaneously to run the code in the cell.  


### ✅ Problem 1: Editing Your First Text Cell [2 points]
<a name='problem1'></a>


**Task:** What is your name? Edit the text cell below by replacing the placeholder text with your answer.  

**Tip:** Double-click the text cell to enter edit mode, type your name, and press **Shift and Return** simultaneously to save your changes and render the formatted text.  

YOUR ANSWER HERE

### ✅ Problem 2: Editing Your First Code Cell [2 points]
<a name='problem2'></a>


**Task:** Modify the python code below to print out "Hello World!".

**Tip:** To print text, we use the **`print()`** function:  
- A **function** performs a specific task (for example, printing text to the output)  
- Functions receive inputs between parentheses:  
  `function(input)`  
- Function inputs are also referred to as **arguments**
- For the function `print`, the input/argument is a string of text:  
  `print("string of text to print out")`
- Strings of text must be enclosed in quotation marks
- Later, in Section 2.10, we will learn how to write our own functions.
- Python includes many **[built-in functions](https://docs.python.org/3/library/functions.html)**, and `print()` is one of the most commonly used.  

The `print` function will take a string of text as input and print it to the **output section** of the code cell. When you run a code cell, the output appears **directly below** the cell after execution.

**To run the code cell:** click to select the cell, then press **Shift + Return** (or click the “Run” ▶ button in the toolbar).  

In [None]:
# Problem 2: modify the Python code below to print "Hello World!"
print("string of text to print out, enclosed by quotation marks")

## 1.3 Execution Order Matters!
  

In Jupyter Notebooks, code cells are usually run from **top to bottom** — but not always!  
You can execute cells in **any order**, and this can affect your results.  

Key points to remember:  
- The output of a cell depends on the **order** in which you run cells, **not** their position in the notebook.  
- Unexpected results can happen if you forget the execution order.  
- You can keep track of the execution order by looking at the **execution counter** in square brackets to the left of the cell (e.g., `[1]`).
- If you have not yet executed a code cell, the execution counter will be empty (e.g., `[ ]`).

<br>

**Example:** Let’s create our first **variable** in Python:  
- Variables can be **named** (in this case, named `x`).  
- Variables are **assigned** a value using the equals sign (`=`).

**Tip:** Run the code cell below to create the variable `x`, assign it a value, and see the output of the `print(x)` function.  

In [None]:
# Example code to create a variable named x,
# and assign it the value 15100
x = 15100

# Print the value stored in the variable x
print(x)

### ✅ Problem 3: Execution Order [2 points]

**Context:** The three code cells below are labeled **A**, **B**, and **C**. Cells A and B assign different values to the variable x, and cell C prints the value stored in x.

**Task:** Determine the order in which you need to run the cells so that cell **C** prints the number `42`. Type your answer in the text cell below.

For example, if you ran the cells from top to bottom, your answer would be: "A, B, C"

**Tip:** Run the cells below in various orders and look at the output to determine the behavior of the code.

**Hint:** In Python, variables can be **overwritten** — if you assign a new value to the same variable name, the old value is replaced.  


YOUR ANSWER HERE

In [None]:
# Cell 'A'

# Assign a number to a variable
x = 42

In [None]:
# Cell 'B'

# Assign a string (text) to a variable
x = "the answer to the ultimate question of life, the universe, and everything"

In [None]:
# Cell 'C'

# Print the value stored in a variable
print(x)

**Tip:** When programmers are unsure about what their code is doing, they often start by **printing variables** to check their values. This technique is called **print debugging**, and it’s a powerful way to understand your program’s internal state. Don’t hesitate to use it!  

**Heads up:** In Jupyter Notebooks, the value of the **last expression** in a code cell is automatically displayed, even without using `print()`. This means the two examples below are equivalent.  


In [None]:
# Explicitly call the print function to display the value of x
x = "MENG 15100"
print(x)

In [None]:
# Or simply place the variable name on the last line of the cell
# Jupyter will automatically display its value
x = "MENG 15100"
x

## 1.4 The Jupyter Kernel



The **kernel** is the “engine” that runs your code.  
It keeps track of two important things:  
- The **order of execution** (which cells were run, and in what order)  
- The **internal state** of all variables in your notebook  

### Restarting the Kernel  
Restarting the kernel will:  
- Clear all variables from memory  
- Reset the execution counter to **0**  

You should **restart the kernel** when:  
- Your variables or notebook are in a confused state (often due to running cells out of order)  
- You want a completely fresh start  

**How to restart the kernel:**  
<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/restart.png?raw=true" width="400">

<br>

### Interrupting the Kernel  
Interrupting the kernel will:  
- Immediately stop execution of the current code cell  
- Keep all variable values and execution history intact  

You should **interrupt the kernel** when:  
- Your code is stuck in an **infinite loop** or a very long computation  
- You want to stop a cell without losing your current variable states  

**How to interrupt the kernel:**  


<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/interrupt.png?raw=true" width="400">

<br>

**Tip:** A restart is like turning your computer off and back on — everything is cleared.  
An interrupt is like pressing the **pause/stop** button — your work stays, but the current task stops.

### ✅ Problem 4: Interrupt Kernel [2 points]

**Task:** The code cell below contains an **infinite loop** that will never stop on its own.  
Run the cell to start the loop, then **interrupt** the kernel to stop it.  

While the loop is running:  
- The kernel will be busy, and no other cells can run.  
- You’ll need to manually stop execution.  

We’ll learn more about `while` loops in **Section 2.9**.  

**Tip:** In Google Colab, you can click the **Stop** ⏹ button on the left of the code cell.


In [None]:
# Infinite loop example — will run forever unless interrupted
while True:
  pass

Congratulations, you have completed Section 1.

**Tip:** Consider collapsing the entire section by clicking the chevron next to the section header titled *1 Jupyter Notebook Crash Course*




# 2 Introduction to Python Programming for AI

Python is one of the most popular programming languages in the world — and a favorite in the fields of **Artificial Intelligence (AI)** and **Machine Learning (ML)**.  

In this section, we'll build your Python skills from the ground up, starting with basic math operations and gradually moving toward concepts and tools that are widely used in AI development.  

Our goal is to make you comfortable writing small pieces of Python code, experimenting with them, and understanding how they work — skills that will carry over into more advanced AI and ML topics later in the course.

For a high-level overview, let's start with a 100-second introduction to Python: run the cell below, then click the Play button. Don't worry if a lot of this is foreign to you! We will gently walk you through the pieces you need to know as we progress through this notebook.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('x7X9w_GIm1s')

## 2.1 Mathematical Operations


Python can be used to perform a wide variety of arithmetic calculations.  

Here are some of the most common operations:  

| Operation       | Description                                           |
|-----------------|-------------------------------------------------------|
| `a + b`         | Addition                                               |
| `a - b`         | Subtraction                                            |
| `a * b`         | Multiplication                                         |
| `a / b`         | Division                                               |
| `a ** b`        | Exponentiation (raise a base to a power)               |
| `a // b`        | Integer division (divide and round **down**)           |
| `a % b`         | Modulo operation (divide and return the remainder)     |
| `abs(a)`        | Absolute value function                                |

<br>

**TIP!** Use parentheses `()` to control the order of arithmetic operations.  
For example, the expression $\frac{10}{3+2}$ is written in Python as `10 / (3+2)`


**Heads Up!** The exponentiation operator is **not** the caret symbol (`^`), as is used in many online calculators.
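
The short sketch below illustrates both points; the variable names are chosen just for illustration. In Python, the caret `^` is a different operator (bitwise XOR), so `2 ^ 3` runs without an error but does not compute a power.

```python
# Parentheses change the order of operations
without_parentheses = 10 / 3 + 2    # division happens first: (10 / 3) + 2
with_parentheses = 10 / (3 + 2)     # addition happens first: 10 / 5

print(without_parentheses)
print(with_parentheses)             # 2.0

# The caret is NOT exponentiation in Python
power = 2 ** 3       # exponentiation: 2 * 2 * 2
not_power = 2 ^ 3    # bitwise XOR, a different operation entirely

print(power)         # 8
print(not_power)     # 1
```

Because `2 ^ 3` produces a number rather than an error message, this mistake is easy to miss, so it is worth double-checking any expression that involves powers.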

<br>

**Below are some examples of arithmetic calculations in Python using the operators introduced above.**

**Run each code cell to observe the output.**

In [None]:
# addition
2 + 3

In [None]:
# subtraction
2 - 3

In [None]:
# multiplication
2 * 3

In [None]:
# division
2 / 3

In [None]:
# exponentiation (i.e. taking a base to a power)
2 ** 3

In [None]:
# Integer division
10 // 3

In [None]:
# Modulo
10 % 3

In [None]:
# Absolute value
abs(-5)

### ✅ Problem 5: Arithmetic Calculations [5 points]

**Task:** Write Python code that evaluates the following expressions. [Each task is worth 1 point]

1. $5 \times 3 + 20$

2. $\frac{2 + 5 }{ 8 - 6}$

3. $\left| 1 - (4-2)^{10-8} \right|$  (don't miss the absolute value)

4. Print the **quotient** (whole number result) of $100 \div 7$

5. Print the **remainder** of $100 \div 7$


In [None]:
# Expression 1
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Expression 2
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Expression 3
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Expression 4
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Expression 5
NotImplemented ### YOUR SOLUTION HERE

## 2.2 Modules; the `math` module


What if you want to do more complicated mathematical operations, like $\sqrt{2}$?

**Functions** for operations like this are part of a **module** named `math`.
- A module is a collection of related functions and tools bundled together.
- We will use many modules throughout this course, so it’s important to get familiar with how to use them early on.

Before you can use these functions, you need to **import** the module:  
- `import math`
- This line of code only needs to be executed **once** in the Jupyter notebook to make the `math` module available in all code cells.

Here are a few commonly used functions from the `math` module. For a full list, check the [official math module documentation](https://docs.python.org/3/library/math.html).

| Math Function Call                | Algebraic Expression             | Description                                         |
|---------------------------------|---------------------------------|-----------------------------------------------------|
| `math.sqrt(x)`                  | $$\sqrt{x}$$                    | Square root of $x$                                  |
| `math.cos(x)`, `math.sin(x)`, `math.tan(x)` | $$\cos(x), \sin(x), \tan(x)$$ | Trigonometric functions (input in radians)         |
| `math.pi`                      | $$\pi = 3.14159 \ldots$$        | The mathematical constant pi                        |
| `math.e`                       | $$e = 2.718 \ldots$$             | Euler’s number (base of natural logarithms)     |
| `math.exp(x)`                  | $$e^{x}$$                       | Exponential function, returns $e^x$                |
| `math.log(x, base)`            | $$\log_{b}(x)$$                 | Logarithm of $x$ to the base $b$ (if `base` is omitted, the natural log is used) |

<br>

**Below are some examples of arithmetic calculations in Python using the `math` module functions introduced above:**

**Run each code cell to observe the output.**

In [None]:
# import the math module (only needs to be executed once)
import math

In [None]:
# square root of 2
math.sqrt(2)

In [None]:
# cosine of 180 degrees (pi radians)
math.cos(math.pi)

In [None]:
# Euler's number e to the power of 2
math.exp(2)

In [None]:
# log base 2 of 8
math.log(8, 2)

### ✅ Problem 6: Arithmetic Calculations with the Math Module [4 points]

**Task:** Write Python code that evaluates the following expressions using functions from the `math` module. [Each task is worth 1 point]

1. Calculate the **area** of a circle with radius $r = 3$.

$$ A = \pi r^2 $$

2. Calculate the **distance** between two values, $y_1 = 2, y_2 = 5$  

  (We will frequently encounter the distance function later in this class, as it is used in AI models to compute the model error or 'loss'.)

$$ d(y_1, y_2) = \sqrt{(y_1 - y_2)^2} $$

3. Calculate the value of the **sigmoid function** for $x=0.5$

 (We will frequently encounter the sigmoid function later in this class, as it is used as an activation function in artificial neural networks.)

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

4. Calculate the cosine of 60 degrees.

  (**Hint:** `math.cos(x)` takes input $x$ in **radians**. Use `math.radians()` to convert degrees to radians.)


$$ \cos(60 ^{\circ}) $$



In [None]:
# import the math module (only needs to be executed once)
import math

In [None]:
# Equation 1
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Equation 2
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Equation 3
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Equation 4
NotImplemented ### YOUR SOLUTION HERE

### Other useful module information

**Tip:** Python provides several different ways to import modules. You’ll encounter all of them in this course, so here’s a quick reference list of common import styles and their typical uses.

| Import Style | Description | Example Usage |
|--------------|-------------|---------------|
| `import math` | Imports the entire module. Access functions using the module name. | `math.sqrt(4)` |
| `import math as my_alias` | Imports a module and assigns it a short alias of your choice. Access functions using the alias. | `my_alias.sqrt(4)` |
| `from math import sqrt` | Imports only a specific function from a module. Access the function directly without reference to the module name| `sqrt(4)` |
| `from math import *` | Imports all module functions. Access functions without reference to the module name. (**not recommended** — can cause conflicts). | `sqrt(4)` |
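
As a quick illustration of the first three styles in the table, the sketch below calls the same square-root function three different ways. The alias `m` is an arbitrary name picked for this example.

```python
# Style 1: import the entire module and use the module name as a prefix
import math
print(math.sqrt(4))

# Style 2: import the module under an alias ("m" is an arbitrary choice)
import math as m
print(m.sqrt(4))

# Style 3: import one specific function and call it directly
from math import sqrt
print(sqrt(4))

# All three calls run the same underlying function
same_result = math.sqrt(4) == m.sqrt(4) == sqrt(4)
print(same_result)
```

The prefixed styles make it obvious where a function comes from, which helps once a notebook uses several modules at the same time.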

## 2.3 Variables

As we saw in section 1.3, **variables** are used to store and manage data in Python. They allow us to label pieces of data so we can use them later in our programs.

### Key Points About Variables

- **Naming:** Variable names should be descriptive and follow [Python’s naming rules](https://www.w3schools.com/python/python_variables_names.asp) (e.g., must start with a letter or underscore, cannot contain spaces, and are case-sensitive).

- **Assigning:** You assign a value to a variable using the equals sign `=`.

- **Data types:** Variables can store different kinds of data, such as:
  - `x = 5` → integer  
  - `x = 3.14159` → float (decimal number)  
  - `x = "Hello World!"` → string (text)  

**Tip:** Choose variable names that clearly describe the data they hold — it makes your code much easier to read and debug.
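
If you are ever unsure what kind of data a variable holds, Python's built-in `type()` function will tell you. `type()` is not otherwise covered in this lab, and the variable names below are made up for illustration, so treat this as an optional sketch.

```python
count = 5                    # integer
ratio = 3.14159              # float (decimal number)
greeting = "Hello World!"    # string (text)

# type() reports the data type stored in each variable
print(type(count))
print(type(ratio))
print(type(greeting))

# Names are case-sensitive: "mass" and "Mass" are two different variables
mass = 18.02
Mass = 44.01
print(mass, Mass)
```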

**Examples of the use of variables are shown below**

**Run the code to see the output**

In [None]:
# Example 1: Variables make calculations easier to read and understand
length = 2
width = 5
area = length * width  # multiply length by width

# print out results
# note: the print() function can take multiple inputs, separated by commas!
# e.g. print(first, second, third)
print("The area of a rectangle with length", length, "and width", width, "is:", area)

In [None]:
# Example 2: Variables can store different data types
name = "John Doe"  # string data type
age = 19           # integer data type
gpa = 3.51         # floating point data type

print("My name is", name)
print("My age is", age)
print("My GPA is", gpa)

### ✅ Problem 7: Using Variables [4 points total]

**Task:** Use Python variables to evaluate two equations commonly used in AI/ML.

#### 1) Equation 1: Linear Regression [2 points]

One of the first machine learning models we will encounter is **linear regression**, which models a linear relationship between inputs $x$ and outputs $y$ with slope $m$ and intercept $b$:

$$ y = m \times x + b $$

Use variables to calculate the value of the linear regression model for:

$$ x = 1, \quad m = 2, \quad b = -1 $$

<div align="center">
<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/linear_regression.png?raw=true" width="600">
</div>


#### 2) Equation 2: Gaussian Probability Distribution [2 points]

The Gaussian function is widely used in machine learning to model a bell-shaped probability distribution for the variable $x$. Its equation is:

$$
f(x) = \frac{1}{\sigma \sqrt{(2 \pi)} } e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$



where $\mu$ is the mean (center) and $\sigma$ is the standard deviation (width).

**Heads up**: both the $2$ and $\pi$ are underneath the square root; sometimes the overbar is not correctly rendered.

<div align="center">
<img src="https://github.com/andrewlferguson/MENG15100/blob/main/labs/L1/gaussian.png?raw=true" width="600">
</div>


This expression is long, so it’s a good idea to use **intermediate variables** to break the calculation into multiple lines. In addition to $x$, $\mu$, and $\sigma$, introduce the variables `coefficient` and `exponent` as shown below:

$$ \text{gaussian} = \text{coefficient} * e^{\text{exponent}} $$

Using these variables, calculate the value of the Gaussian function with the following inputs:
$$x = 1.5, \quad \mu = 2, \quad \sigma = 1$$

In [None]:
# Equation 1
# Replace NotImplemented with your values or expressions
x = NotImplemented    # input value
m = NotImplemented    # slope
b = NotImplemented    # intercept
y = NotImplemented    # result of the equation; use the variables defined above to calculate y
print(y)

In [None]:
# Equation 2
# Replace NotImplemented with your values or expressions
x = NotImplemented            # input value
mu = NotImplemented           # mean (center)
sigma = NotImplemented        # standard deviation (width)
coefficient = NotImplemented  # use the variables defined above to calculate coefficient
exponent = NotImplemented     # use the variables defined above to calculate exponent
gaussian = NotImplemented     # use the variables defined above to calculate gaussian
print(gaussian)

## 2.4 Lists


Suppose we are working with an AI model that uses the **molecular weight** of different compounds as training data. Each **training data point** is the weight (in grams per mole, g/mol) for one molecule.

We **could** store these values in separate variables, for example:

```python
water_weight = 18.02           # g/mol
carbon_dioxide_weight = 44.01   # g/mol
oxygen_weight = 32.0            # g/mol
```

However, this quickly becomes unwieldy as the number of variables increases.

#### Storing data in a list

Instead, we can use **lists** — Python’s built-in way to store multiple items in a single variable:

In [None]:
# store multiple data points in a list
molecule_weights = [18.02, 44.01, 32.0]
print(molecule_weights)

Lists are **ordered collections** of items, separated by commas and enclosed in square brackets `[]`. A single list can hold:
- Numbers (integers or floats)
- Strings
- Even other lists (lists of lists)
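
Here is a small sketch of each kind of list. The list names are invented for this example; the names and weights inside the nested list come from the molecules above.

```python
# A list of numbers (integers and floats can be mixed)
numbers = [1, 2.5, 3]

# A list of strings
names = ["Water", "Oxygen"]

# A list of lists: each inner list pairs a name with a weight in g/mol
nested = [["Water", 18.02], ["Oxygen", 32.0]]

print(numbers)
print(names)
print(nested)
```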

#### Accessing items in a list
You can get individual list elements using **indexing**.

In Python, **indexing starts at 0**, so the element at the beginning of the list has index 0. Let's say that again with emphasis:

🚨 **In Python, indexing starts at 0**! 🚨

This is a common point of confusion for new Python programmers. It might not seem like it right now, but this is actually a very logical convention that makes your Python programming life so much better in so many ways. Trust us!

In [None]:
# use indexes to access each element individually
print(molecule_weights[0])  # First item → 18.02
print(molecule_weights[1])  # Second item → 44.01
print(molecule_weights[2])  # Third item → 32.0

#### Changing items in a list

Lists are **mutable**, meaning you can change their contents:

In [None]:
molecule_weights[1] = 42  # Update the second molecule’s weight
print(molecule_weights)

#### Useful list functions
Python has many **built-in functions** to work with lists:

| Function | Description |
|----------|-------------|
| `len(list)` | Returns the number of items in the list |
| `max(list)` | Returns the largest value in the list |
| `min(list)` | Returns the smallest value in the list |
| `sum(list)` | Returns the sum of all values in the list (works only for numeric lists) |
| `sorted(list)` | Returns a new list with items sorted in ascending order |
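
The sketch below tries each function from the table on a small list of made-up numbers (the list `values` is purely illustrative).

```python
values = [4, 1, 3, 2]    # made-up numbers for illustration

print(len(values))       # number of items: 4
print(max(values))       # largest value: 4
print(min(values))       # smallest value: 1
print(sum(values))       # total of all values: 10

# sorted() returns a NEW list and leaves the original untouched
in_order = sorted(values)
print(in_order)          # [1, 2, 3, 4]
print(values)            # [4, 1, 3, 2]
```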

### ✅ Problem 8: Working with Molecular Weight Lists [5 points]

In this problem, you will practice creating and manipulating Python lists using molecular weight data.

---

**Context:**  We have measured the **molecular weight** (in grams per mole, g/mol) for a number of molecules.

| **Molecule**      | **Formula**  | **Molecular Weight (g/mol)** |
|-------------------|--------------|------------------------------|
| Water             | H₂O          | 18.02                      |
| Carbon Dioxide    | CO₂          | 44.01                        |
| Oxygen            | O₂           | 32.00                        |
| Nitrogen          | N₂           | 28.02                        |
| Methane           | CH₄          | 16.04                        |
| Glucose           | C₆H₁₂O₆      | 180.16                       |
| Ethanol           | C₂H₅OH       | 46.07                        |
| Sodium Chloride   | NaCl         | 58.44                        |
| Ammonia           | NH₃          | 17.03                        |
| Sulfur Dioxide    | SO₂          | 64.06                        |
| Acetone           | C₃H₆O        | 58.08                        |
| Hydrogen Peroxide | H₂O₂         | 34.02                        |
| Methanol          | CH₃OH        | 32.04                        |
| Acetic Acid       | C₂H₄O₂       | 60.05                        |

A list of the molecular weights is provided in the Python variable `weights` in the code cell below.

---

**Tasks:** [Each task is worth 1 point]

1. **Access** the molecular weight of oxygen using list indexing and print the value. (Remember: list indexes start at 0.)
2. **Print** the number of molecules in the list.
3. **Print** the heaviest molecule’s weight.
4. **Print** the lightest molecule’s weight.  
5. **Calculate and print** the **average** molecular weight.

**Hint:** Are there built-in functions that can accomplish these tasks?

In [None]:
# Variables (provided)
# Run this cell to initialize the variables
weights = [18.02, 44.01, 32.00, 28.02, 16.04, 180.16, 46.07, 58.44, 17.03, 64.06, 58.08, 34.02, 32.04, 60.05]
print(weights)

In [None]:
# Task 1: Access the molecular weight of oxygen and print the value
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Task 2: Print the number of molecules in the list
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Task 3: Print the heaviest molecule's weight
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Task 4: Print the lightest molecule's weight
NotImplemented ### YOUR SOLUTION HERE

In [None]:
# Task 5: Calculate and print the average molecular weight
NotImplemented ### YOUR SOLUTION HERE

## 2.5 'For' Loops



When working with lists in Python, you often want to perform the same action for each element in the list.  
Instead of repeating the same line of code many times, we can use a **for loop**.

A `for` loop lets you **iterate** (loop through) each item in a sequence (such as a list) and execute a block of code for each element.

The general structure is:

    for variable in list:
        # code to execute for each item

- `variable` is a name you choose to represent the current item in the sequence.
- `list` is a list variable or other iterable data type.

In [None]:
molecule_weights = [18.02, 28.01, 44.01]  # H2O, N2, CO2

for weight in molecule_weights:
    print("Molecular weight:", weight)

In the example above:
- The loop goes through the list `molecule_weights`.
- On each iteration, the variable `weight` takes the value of the next element in the list.
- The `print` statement runs once for each element.

**Tip:**  **Indentation is critical in Python!**
All the code that is part of the loop must be indented. When the indentation stops, the body of the loop ends. Here is an example of the indentation structure for blocks of python code inside `for` loops:

```python
for item in list:
  # inside the for loop (indented)
  # also part of the for loop (indented)

# outside the for loop (not indented)
```

Either tabs or spaces can be used for indentation, but you can't mix both!

Python's official style guide, PEP 8, strongly recommends using spaces for indentation, specifically four spaces per indentation level.
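
To see the effect of indentation in runnable code, consider the sketch below. The list `values` and the variable `total` are made up for this example. The two indented lines run once per item, while the final `print` runs only once, after the loop has finished.

```python
values = [1, 2, 3]    # made-up numbers for illustration
total = 0

for value in values:
    # indented: these two lines run once for every item in the list
    total = total + value
    print("Running total:", total)

# not indented: this line runs once, after the loop has finished
print("Final total:", total)
```

If the last `print` were indented as well, it would become part of the loop and run three times instead of once.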

There are several common ways to use a `for` loop in Python:

### 1. Looping over items in a list
If you want to access each element in a sequence directly, you can loop using the `for item in list:` syntax.
- Here, the loop variable `molecule` takes the value of each element in the list in turn.

In [None]:
molecules = ["Water", "Ethanol", "Ammonia"]
for molecule in molecules:
    print(molecule)


### 2. Looping over a range of numbers
The built-in `range()` function generates a sequence of numbers, which is often used for looping a specific number of times.
- By default, `range(n)` starts at 0 and ends at `n-1`. Remember, Python indexing starts at zero!
- You can also specify a start, end, and step size: `range(start, stop, step)`.
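
As a quick sketch of the three-argument form: `range(2, 10, 3)` starts at 2, counts up in steps of 3, and stops before reaching 10. Wrapping a range in the built-in `list()` function (not otherwise covered in this lab) is a handy way to see every number it generates. The variable names are chosen just for this example.

```python
# start at 2, stop before 10, count in steps of 3
for i in range(2, 10, 3):
    print("i =", i)

# list() shows all of the numbers a range generates
first_five = list(range(5))
stepped = list(range(2, 10, 3))

print(first_five)    # [0, 1, 2, 3, 4]
print(stepped)       # [2, 5, 8]
```

Note that the `stop` value itself is never included, just as `range(5)` ends at 4.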


In [None]:
# loop 5 times
for i in range(5):
    print("Iteration:", i)

The range-based `for` loop can be used to access elements in a list by *index*:

In [None]:
molecules = ["Water", "Ethanol", "Ammonia"]
number_of_molecules = len(molecules)

for i in range(number_of_molecules):
    print("index:", i, "molecule:", molecules[i])


### 3. Looping over multiple lists in parallel using `zip()`
The `zip()` function combines two or more lists, allowing you to loop over them together.
- On each iteration, `molecule` and `bp` take values from the same index in their respective lists.

In [None]:
molecules = ["Water", "Ethanol", "Ammonia"]
boiling_points = [100.0, 78.4, -33.3]

for molecule, bp in zip(molecules, boiling_points):
    print(molecule, "boils at", bp, "°C")

### 4. Looping with index and value using `enumerate()`
If you need both the index and the value from a sequence, use `enumerate()`.


In [None]:
molecules = ["Water", "Ethanol", "Ammonia"]

for index, molecule in enumerate(molecules):
    print("Molecule #", index, "is", molecule)



**Summary Table of `for` Loop Variants:**

| Syntax                               | Use Case                                 |
|--------------------------------------|------------------------------------------|
| `for item in list:`                  | Loop over items directly                  |
| `for i in range(n):`                 | Loop a fixed number of times              |
| `for a, b in zip(list1, list2):`    | Loop over multiple sequences in parallel  |
| `for i, item in enumerate(list):`   | Loop with both index and value            |


### ✅ Problem 9: Boiling Point AI Model Error [5 points]

**Context:**  
A research group has trained a new AI model to predict the **boiling point** of various compounds based on their chemical structure (e.g. molecular weight).
You are tasked with evaluating how well this model is performing by comparing its predictions to known experimental molecular boiling points.

Below is a table of molecules, their true boiling points (**targets**), and the model’s predicted values (**predictions**):

| Molecule      | True Boiling Point (°C) | Predicted Boiling Point (°C) |
|---------------|------------------------|------------------------------|
| Water         | 100.0                  | 98.0                         |
| Ethanol       | 78.4                   | 79.5                         |
| Ammonia       | -33.3                  | -30.0                        |
| Acetone       | 56.1                   | 60.0                         |
| Benzene       | 80.1                   | 78.0                         |

---

In order to evaluate how well the model is performing, we will calculate two common error metrics:  
- **Mean Absolute Error (MAE)**  
- **Mean Squared Error (MSE)**  

The formulas are:  

$$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $$  

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 $$  

Where:  
- $ y_i $ = true boiling point  
- $ \hat{y}_i $ = predicted boiling point  
- $ n $ = number of data points (i.e. number of molecules in the dataset).  

And:
- $ \left| y_i - \hat{y}_i \right| $ is the absolute error for molecule $i$
- $\left( y_i - \hat{y}_i \right)^2 $ is the squared error for molecule $i$
---
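
As a quick sanity check of the formulas above, consider a made-up dataset of just $n = 2$ points (these numbers are illustrative and are not part of the problem data), with true values $y = (10, 20)$ and predictions $\hat{y} = (12, 17)$:

$$ \text{MAE} = \frac{1}{2} \left( \left| 10 - 12 \right| + \left| 20 - 17 \right| \right) = \frac{2 + 3}{2} = 2.5 $$

$$ \text{MSE} = \frac{1}{2} \left( (10 - 12)^2 + (20 - 17)^2 \right) = \frac{4 + 9}{2} = 6.5 $$

Notice that the larger error (3) contributes more heavily to the MSE than to the MAE, because it is squared.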

**Task:**  
Modify the code below to calculate the **MAE** and **MSE** of the AI model. The calculation of **MAE** and **MSE** involves a summation, which we will accomplish using a `for` loop. Inside the for loop, we will calculate the absolute error and squared error for each molecule. The `for` loop will iterate through all the molecules and sum their errors together. Finally, we will need to divide by $n$, the number of molecules, to obtain an average (mean) error.

The code provided below uses a `for` loop to iterate through the list of true boiling points and the list of predicted boiling points for each molecule. Modify the code below so it accomplishes the following objectives:

1. In the `for` loop, calculate the `absolute_error` for each molecule (each pair of true and predicted boiling points). This value will be accumulated (added) into a running total, `total_absolute_error`.
2. In the `for` loop, calculate the `squared_error` for each molecule (each pair of true and predicted boiling points). This value will be accumulated (added) into a running total, `total_squared_error`.
3. After the `for` loop, divide the `total_absolute_error` and `total_squared_error` by $n$, the number of molecules, to obtain the mean absolute error (MAE) and mean squared error (MSE).

**Tip:** In the code below we introduce a very useful new operator, `+=`, which adds the value on the right-hand side of the operator to the variable on the left. For numbers, `x += y` is equivalent to `x = x + y`, but it is shorter and saves some typing.

**Tip:** The number of data points $n$ is equal to the length of the true boiling point list. Can you use the `len` function to calculate this?

In [None]:
# Variables (provided)
# Run this cell to initialize the variables
true_boiling_points = [100.0, 78.4, -33.3, 56.1, 80.1]   # °C
predicted_boiling_points = [98.0, 79.5, -30.0, 60.0, 78.0] # °C

In [None]:
total_absolute_error = 0
total_squared_error = 0

for true_value, predicted_value in zip(true_boiling_points, predicted_boiling_points):
  # Task 1: calculate and accumulate the absolute error for each boiling point prediction
  absolute_error = NotImplemented ### YOUR SOLUTION HERE
  total_absolute_error += absolute_error

  # Task 2: calculate and accumulate the squared error for each boiling point prediction
  squared_error = NotImplemented ### YOUR SOLUTION HERE
  total_squared_error += squared_error

# Task 3: divide the total absolute and total squared error by n
# Tip: can you use a built-in list function to calculate n?
MAE = NotImplemented ### YOUR SOLUTION HERE
MSE = NotImplemented ### YOUR SOLUTION HERE

print("the MAE is:", MAE)
print("the MSE is:", MSE)

## 2.6 Introduction to `numpy`


**NumPy** (short for *Numerical Python*, pronounced *Num-pie*) is a collection of powerful Python modules for numerical computing.  
Before we see why it’s so useful, let’s look at a limitation of **basic Python lists**.

### Motivation: Python Lists Are Not Mathematical Vectors
In mathematics, adding two vectors means adding their elements component-wise,  

$$\mathbf{a} = [1, 2, 3]$$  
$$\mathbf{b} = [4, 5, 6]$$   
$$\mathbf{a} + \mathbf{b} = [1+4, \; 2+ 5, \; 3 + 6] = [5, \; 7, \; 9]$$


and multiplying a vector by a scalar means scaling each element.

$$c = 10 $$  
$$\mathbf{a} = [1, 2, 3]$$  
$$c \cdot \mathbf{a} = [10 \cdot 1, \; 10 \cdot 2, \; 10 \cdot 3] = [10, 20, 30]$$

However, Python lists don’t behave this way:


In [None]:
# initialize a, b, c
a = [1, 2, 3]
b = [4, 5, 6]
c = 10

In [None]:
# Adding lists in Python concatenates them
print(a + b)

In [None]:
# Multiplying a list by a number repeats it 'c' times
print(c * a)

In [None]:
# Division throws an error! (lists do not support the / operator)
print(a / 3)

This is **not** how mathematical vectors behave, which can be frustrating when doing scientific or AI computations.
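
With plain lists, vector-style arithmetic has to be written out by hand. A minimal sketch using a `for` loop and `zip()` is shown below. The names `elementwise_sum` and `scaled` are chosen for this illustration, and `append` adds an item to the end of a list.

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = 10

elementwise_sum = []   # will hold a + b, component by component
scaled = []            # will hold c * a, component by component

for a_i, b_i in zip(a, b):
    elementwise_sum.append(a_i + b_i)
    scaled.append(c * a_i)

print(elementwise_sum)
print(scaled)
```

This works, but it takes several lines for what mathematics writes as a single expression.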

---

### NumPy to the Rescue
**NumPy** provides the **numpy array** object, which behaves like a true mathematical vector or matrix.
- with NumPy arrays, operations like `+` and `*` work element-wise:

In [None]:
# import numpy module
# typically the alias 'np' is used for convenience
import numpy as np

In [None]:
# then, create a numpy array
# np.array(input_list) takes a python list and returns a numpy array
a_array = np.array(a)
b_array = np.array(b)

print(a_array)
print(b_array)

In [None]:
# Element-wise addition
print(a_array + b_array)

In [None]:
# Scalar multiplication
print(c * a_array)

In [None]:
# Scalar Division
print(a_array / 2)

### Creating NumPy Arrays

NumPy arrays can be created directly from a basic Python list using the `np.array(python_list)` function.
- `np.array(python_list)` returns a numpy array with the same elements as the input `python_list`

In [None]:
# From a Python list
molecular_weights = [18.015, 46.07, 17.031]
weights_array = np.array(molecular_weights)

print(weights_array)

Numpy arrays can also be created using several **built-in** functions:

| Function | Description |
|----------|-------------|
| `np.zeros(shape)` | Create an array filled with zeros. |
| `np.ones(shape)` | Create an array filled with ones. |
| `np.linspace(start, stop, num)` | Create an array with a specified number of evenly spaced values between two points. |
| `np.array(data)` | Create an array from a Python list or tuple. |


In [None]:
# array of zeros
zeros_array = np.zeros(5)
print(zeros_array)

In [None]:
# array of ones
ones_array = np.ones(5)
print(ones_array)

In [None]:
# array of evenly spaced values
linspace_array = np.linspace(0, 1, 5)  # from 0 to 1, 5 points
print(linspace_array)
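
Note that `np.linspace` includes both the start and the stop value, so the spacing between neighbouring points is `(stop - start) / (num - 1)`. A minimal sketch with illustrative numbers (the names `temperatures` and `spacing` are assumptions made for this example):

```python
import numpy as np

# 5 evenly spaced temperatures from 0 to 100 (both endpoints included)
temperatures = np.linspace(0, 100, 5)
print(temperatures)

# elements of an array are accessed by index, just like a list
spacing = temperatures[1] - temperatures[0]
print(spacing)
```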

### NumPy Mathematical Functions (and how they differ from `math`)

Both Python’s built-in `math` module and NumPy offer mathematical functions, but they serve **different purposes**:

- **`math`** works on **single numbers (scalars)**.
- **NumPy** works on **arrays (vectors, matrices)** and applies operations **element-wise** (vectorized), which is faster and more convenient for scientific computing.


### Element-wise operations vs. scalar operations


In [None]:
import math
import numpy as np

x_scalar = 2.0
x_array  = np.array([1.0, 4.0, 9.0, 16.0])

In [None]:
# math → scalar
math.sqrt(x_scalar)      # ✅ works

In [None]:
# math → array
math.sqrt(x_array)       # ❌ TypeError (math expects a number, not an array)

In [None]:
# NumPy → scalar
np.sqrt(x_scalar)         # ✅ works

In [None]:
# NumPy → array
np.sqrt(x_array)          # ✅ works

Here is a list of common mathematical functions in NumPy.
- Notice that they are very similar to the functions in the `math` module, except that they can operate on NumPy arrays.

| NumPy Function      | Mathematical Expression | Description |
|---------------------|-------------------------|-------------|
| `np.sqrt(x)`        | $\sqrt{x}$              | Square root of each element in `x`. |
| `np.square(x)`      | $x^2$                   | Square of each element in `x`. |
| `np.abs(x)`         | $|x|$                   | Absolute value of each element in `x`. |
| `np.exp(x)`         | $e^{x}$                 | Exponential function applied element-wise. |
| `np.log(x)`         | $\ln(x)$                | Natural logarithm applied element-wise. |
| `np.log10(x)`       | $\log_{10}(x)$          | Base-10 logarithm applied element-wise. |
| `np.sin(x)`         | $\sin(x)$               | Sine of each element in radians. |
| `np.cos(x)`         | $\cos(x)$               | Cosine of each element in radians. |
| `np.tan(x)`         | $\tan(x)$               | Tangent of each element in radians. |
| `np.power(x, y)`    | $x^y$                   | Element-wise exponentiation. |
| `np.mean(x)`        | $\frac{1}{n} \sum_{i=1}^{n} x_i$ | Mean (average) of elements in `x`. |
| `np.sum(x)`         | $\sum_{i=1}^{n} x_i$    | Sum of all elements in `x`. |
| `np.min(x)`         | $\min(x)$               | Minimum value in `x`. |
| `np.max(x)`         | $\max(x)$               | Maximum value in `x`. |
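
To see a few of these functions in action, here is a minimal sketch that converts hydrogen-ion concentrations to pH using $\text{pH} = -\log_{10}([\text{H}^+])$. The concentration values and variable names are made up for this illustration.

```python
import numpy as np

# hypothetical hydrogen-ion concentrations in mol/L (illustrative values)
h_concentrations = np.array([1e-1, 1e-7, 1e-13])

# pH = -log10([H+]), applied to every element at once
ph_values = -np.log10(h_concentrations)
print(ph_values)

# functions such as np.max collapse the whole array to a single number
print("highest pH:", np.max(ph_values))
```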

### ✅ Problem 10: Boiling Point AI model revisited with numpy [7 points]

In this problem, we will return to our boiling point AI model and use NumPy arrays along with built-in NumPy functions to re-calculate the MAE and MSE. This will demonstrate both the efficiency and simplicity of NumPy compared to raw Python list processing.

---

We will re-use the same **target** and AI model **prediction** data from Problem 9:

| Molecule      | True Boiling Point (°C) | Predicted Boiling Point (°C) |
|---------------|------------------------|------------------------------|
| Water         | 100.0                  | 98.0                         |
| Ethanol       | 78.4                   | 79.5                         |
| Ammonia       | -33.3                  | -30.0                        |
| Acetone       | 56.1                   | 60.0                         |
| Benzene       | 80.1                   | 78.0                         |

We will also re-calculate the same **MAE** and **MSE** error metrics with the equations reproduced below:

$$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $$  

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 $$  

Where:  
- $ y_i $ = true boiling point  
- $ \hat{y}_i $ = predicted boiling point  
- $ n $ = number of data points (i.e. number of molecules in the dataset).  

And:
- $ y_i - \hat{y}_i $ is the **error** for molecule $i$
- $ \left| y_i - \hat{y}_i \right| $ is the **absolute error** for molecule $i$
- $\left( y_i - \hat{y}_i \right)^2 $ is the **squared error** for molecule $i$

**Task:**  

1. Convert the Python lists `true_boiling_points` and `predicted_boiling_points` to NumPy arrays.

2. Use NumPy to compute an `errors` array, where each element is the **error** between the predicted boiling point and the corresponding true boiling point (not the absolute error or squared error, just the **error**).

**Tip:**  
- You can subtract NumPy arrays directly to get an array of errors in one line of code, without using a `for` loop.

3. From the `errors` array, calculate the **mean absolute error** (**MAE**) and **mean squared error** (**MSE**). Use **NumPy array operations** instead of a `for` loop.  

**Tips:**  
- Use `np.abs()` to take the absolute value of every element in a numpy array
- Use `np.square()` to take the square of every element in a numpy array
- Use `np.mean()` to compute the average of a numpy array.

In [None]:
# Variables (provided)
# Run this cell to initialize the variables
true_boiling_points = [100.0, 78.4, -33.3, 56.1, 80.1]   # °C
predicted_boiling_points = [98.0, 79.5, -30.0, 60.0, 78.0] # °C

In [None]:
# Task 1
true_boiling_points_array = NotImplemented ### YOUR SOLUTION HERE
predicted_boiling_points_array = NotImplemented ### YOUR SOLUTION HERE
print(true_boiling_points_array)
print(predicted_boiling_points_array)

In [None]:
# Task 2
errors = NotImplemented ### YOUR SOLUTION HERE
print(errors)


In [None]:
# Task 3
MAE = NotImplemented ### YOUR SOLUTION HERE
MSE = NotImplemented ### YOUR SOLUTION HERE

print("the MAE is:", MAE)
print("the MSE is:", MSE)

## 2.7 Logical Comparisons


Up to this point, we’ve mostly been using Python like a calculator.  
However, the real power of programming comes from **controlling the flow of a program** based on conditions.

With **logic flow**, we can:
- Evaluate whether a statement is `True` or `False`
- Change what the program does depending on the result

### Boolean Variables
To make this possible, we use **boolean variables** — a special data type that can only hold two values:  
- `True`  
- `False`

Often, boolean values are created by performing a **logical comparison** between two values.

---

### Logical Comparison Operators

| Operator | Example      | Description                  |
|----------|--------------|------------------------------|
| `==`     | `a == b`     | Equal to                     |
| `!=`     | `a != b`     | Not equal to                 |
| `<`      | `a < b`      | Less than                    |
| `>`      | `a > b`      | Greater than                 |
| `<=`     | `a <= b`     | Less than or equal to        |
| `>=`     | `a >= b`     | Greater than or equal to     |
| `and`    | `a and b`    | Logical AND                  |
| `or`     | `a or b`     | Logical OR                   |
| `not`    | `not a`      | Logical NOT (negation)       |

---

**TIP:**  
- The **assignment operator** (`=`) assigns a value to a variable.  
- The **equality operator** (`==`) compares two values and returns `True` or `False`.  


**Below are examples of boolean variables and comparisons**

In [None]:
# a boolean variable assigned to be True
I_am_a_student = True
print(I_am_a_student)

In [None]:
# a boolean variable assigned to be False
I_am_a_nobel_prize_winner = False
print(I_am_a_nobel_prize_winner)

In [None]:
a = 10
b = 20

In [None]:
# a is equal to b
print(a == b)

In [None]:
# a is not equal to b
print(a != b)

In [None]:
# a is less than b
print(a < b)

In [None]:
# a is greater than or equal to b
print(a >= b)
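
The cells above cover the comparison operators. The logical operators `and`, `or`, and `not` combine or invert boolean values. A minimal sketch, assuming a made-up temperature and using the (simplified) rule that water is liquid between 0 °C and 100 °C:

```python
temperature_c = 25.0   # illustrative value, in °C

# 'and' is True only if both comparisons are True
is_liquid_water = (temperature_c > 0.0) and (temperature_c < 100.0)
print(is_liquid_water)

# 'or' is True if at least one comparison is True
is_not_liquid = (temperature_c <= 0.0) or (temperature_c >= 100.0)
print(is_not_liquid)

# 'not' flips a boolean value
print(not is_liquid_water)
```

The parentheses around each comparison are optional here, but they make the grouping easier to read.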

### ✅ Problem 11: Conditional operations [4 points]

Logical comparisons are widely used in AI and Machine Learning — for example, when checking whether a model’s accuracy is high enough to be acceptable.

---
**Context**.
In this problem, you’ll work with two AI models that have been trained to classify molecular structures.  
Their classification accuracies are shown below:

| Model Name | Accuracy |
|------------|----------|
| Model A    | 87 %   |
| Model B    | 92 %    |

---

**Task:**  
Using the variables provided, write logical comparisons to answer the following questions:

1. Check if **Model A’s accuracy** is greater than **Model B’s accuracy**.
2. Check if **Model B’s accuracy** is greater than or equal to `0.90`.
3. Check if **both models** have accuracy above `0.85`.
4. Check if **at least one** of the models has accuracy above `0.90`.

**Tip:** Wrap your logical comparisons inside a `print` statement so that once they evaluate they will print out either `True` or `False`. For example, `print(a >= b)`





In [None]:
# Variables (provided)
# Run this cell to initialize the variables
model_a_accuracy = 0.87
model_b_accuracy = 0.92

In [None]:
# Task 1
print(NotImplemented) ### YOUR SOLUTION HERE

In [None]:
# Task 2
print(NotImplemented) ### YOUR SOLUTION HERE

In [None]:
# Task 3
print(NotImplemented) ### YOUR SOLUTION HERE

In [None]:
# Task 4
print(NotImplemented) ### YOUR SOLUTION HERE

## 2.8 If/Else Statements

Now that we understand boolean variables, we can introduce our first **logical flow control structure**: the **if/else** statement.

An **if/else** statement lets your program make decisions and run different blocks of code depending on whether a **boolean variable** is `True` or `False`.

### Python Syntax Example

    if boolean_condition:
        # Code to run if the condition is True
    else:
        # Code to run if the condition is False

### In Plain English:
- The `if` keyword checks the value of a **boolean condition**.  
- If the condition is `True`, the indented block under `if` executes.  
- If the condition is `False`, the indented block under `else` executes instead.  

**TIP:**  
The following `if` statements are equivalent in Python when `condition` is a boolean variable:  

    if condition == True:
        # Execute this code

    if (condition == True):
        # Execute this code

    if condition:
        # Execute this code

---

**Next:** Let’s look at some practical examples of if/else statements in action.

In [None]:
# A standard if/else statement

I_understand = False # what happens if you change this to True?

if I_understand == True:
  print("Let's move on!")
else:
  print("More practice necessary")

In [None]:
# The 'else' part of the condition is optional!
my_AI_model_is_sentient = False

if my_AI_model_is_sentient:
  print("Terminate program immediately!")

### ✅ Problem 12: Model Evaluation with If/Else [4 points total]

#### **Context**   
In this problem, you’ll work with a model trained to predict molecular solubility.  
You will decide whether the model's predictions are "good" based on its accuracy.

| Model Name   | Accuracy |
|--------------|----------|
| SolubNet-1   | 0.78     |

<br>
  
**Task 1 [2 points]:** Modify the code below so that the **if/else** statement:
- Prints `"Good model!"` if the accuracy is **greater than or equal to 0.8**.  
- Prints `"Model needs improvement."` otherwise.



In [None]:
# Variables (provided)
# Run this cell to initialize the variables
accuracy = 0.78

In [None]:
# Task 1:

if NotImplemented: ### YOUR SOLUTION HERE
  print("Good model!")
else:
  print("Model needs improvement.")

**Task 2 [2 points]:** We would like to be more descriptive of our model's performance. In the code cell below, update the **if/else** statements to print out the correct performance rating for a model with a given accuracy (between 0 and 1). The ratings follow a traditional grading scheme, adapted for model evaluation.

$$ A: [90 \% -100 \%] $$
$$ B: [80 \% -90 \%) $$
$$ C: [70 \% -80 \%) $$
$$ D: [60 \% -70 \%) $$
$$ F: [0 \% -60 \%) $$

Here, square brackets $[,]$ indicate the boundary value is included in the range, and round parentheses $)$ indicate the boundary value is not included. For example, a model with accuracy exactly 0.90 earns an **A**, not a **B**.

**Tip** - If statements can be chained using the `elif` keyword, which is shorthand for `else if`

```
if condition1:
  # code that executes if condition1 is true
elif condition2:
  # code that executes if condition2 is true
else:
  # code that executes if neither condition1 nor condition2 is true
```
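
As an illustration of the `elif` pattern in a different setting, the sketch below classifies the state of water from a made-up temperature. The thresholds are a simplification (pure water at atmospheric pressure), and the variable names are chosen for this example.

```python
temperature_c = 25.0   # illustrative value, in °C

if temperature_c >= 100.0:
    state = "gas"
elif temperature_c > 0.0:
    state = "liquid"
else:
    state = "solid"

print("Water at", temperature_c, "°C is a", state)
```

The conditions are checked from top to bottom, and only the first branch whose condition is `True` is executed. All remaining branches are skipped.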

In [None]:
# Variables (provided)
accuracy = 0.78

In [None]:
# Task 2

if NotImplemented: ### YOUR SOLUTION HERE
    grade = "A"
elif NotImplemented: ### YOUR SOLUTION HERE
    grade = "B"
elif NotImplemented: ### YOUR SOLUTION HERE
    grade = "C"
elif NotImplemented: ### YOUR SOLUTION HERE
    grade = "D"
else:
    grade = "F"

print(grade)

## 2.9 While Loops (Logical Flow)


A **while loop** is another way to control the flow of a program.  
It repeatedly executes a block of code **as long as** a specified boolean condition remains `True`.

The general syntax is:

    while loop_condition == True:
        # This code loops repeatedly

### How to exit a while loop

There are two common ways to break out of a `while` loop:

#### 1. Update the loop condition inside the loop
By changing the value of the condition variable to `False`, the while loop will exit after completing the current iteration.

    while loop_condition == True:
        # some code

        if exit_condition:
            loop_condition = False  # stop the loop after this iteration

        # more code (this code **will** be executed on the last iteration)

#### 2. Use the `break` keyword
The `break` statement immediately stops the loop, regardless of the condition.

    while loop_condition == True:
        # some code

        if exit_condition:
            break  # stop the loop immediately

        # more code (this code **will not** be executed on the last iteration)
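
Both exit styles can be sketched with a small runnable example. Here a solution is repeatedly diluted (its concentration is halved) until it drops below a threshold. The starting concentration, the threshold of 0.1, and all variable names are assumptions made for this illustration. `while True:` simply means "loop until a `break` is reached".

```python
# Style 1: update the loop condition inside the loop
concentration = 1.0      # illustrative starting concentration (mol/L)
dilutions = 0
keep_diluting = True

while keep_diluting == True:
    concentration = concentration / 2    # each dilution halves the concentration
    dilutions += 1
    if concentration < 0.1:
        keep_diluting = False            # the loop ends after this iteration

print("flag version:", dilutions, "dilutions, final concentration", concentration)

# Style 2: use the break keyword
concentration = 1.0
dilutions_with_break = 0

while True:
    concentration = concentration / 2
    dilutions_with_break += 1
    if concentration < 0.1:
        break                            # leave the loop immediately

print("break version:", dilutions_with_break, "dilutions, final concentration", concentration)
```

Both versions perform the same number of dilutions. They would only differ if more code followed the `if` block inside the loop.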

  



### ✅ Problem 13: Model training with while loops [2 points]

**Context:**  
In real AI training scenarios, models are often iteratively improved over many training cycles until they reach a desired accuracy.  
Here, we simulate that process using a `while` loop. Each loop iteration increases the model’s accuracy by 10 percentage points, mimicking gradual improvements over time.

**Tip:** Remember, the operator `+=` adds the value on the right-hand side to the variable on the left.  
For example:

    model_accuracy += 10

is equivalent to:

    model_accuracy = model_accuracy + 10

---

**Task:** Update the `while` statement so that the training process stops when the model accuracy is **exactly 80%**.

**Tip:** If your loop keeps running and doesn't stop as expected, you can interrupt the kernel (see Section 1.4) to halt execution.
If the output is especially long, you can also use "Clear output" from the code cell's output dropdown menu accessible from ⋮ to tidy things up.


In [None]:
# Task
model_accuracy = 0

while NotImplemented: ### YOUR SOLUTION HERE
  model_accuracy += 10     # the model improves with training!
  print("Model accuracy is: ", model_accuracy, "%.")

## 2.10 Functions




Functions let us package a sequence of steps into a reusable block of code.  
They help make programs **clearer**, **shorter**, and **easier to test**.

A function can:
- take **inputs** (called *parameters* or *arguments*),
- do some computation,
- and optionally **return** a result.

```
def function_name(input):
    output = # do some computation with the input.
    return output
```

For example, the simple function below calculates the area of a circle with radius $r$:
```
def area_of_circle(r):
    area =  math.pi * (r ** 2)
    return area
```


### When is the code inside a function executed?
A **function definition** tells Python *what* the function should do when it is used,  
but **does not run that code immediately**.  
The function’s code only runs when you **call** the function by its name followed by parentheses, e.g.:

In [None]:
import math # load math module to make math.pi available

def area_of_circle(r):          # r is a parameter
  area =  math.pi * (r ** 2)
  print("hello, from inside the function!")
  return area                   # value sent back to the caller

# Nothing is executed yet — we have only defined the function.
# Notice how the print statement inside the function definition was not executed.
# Thus, there is no "hello, from inside the function!" output from this cell.

In [None]:
# Now we call the function and store it in a variable.
area_result = area_of_circle(2)

# Notice how the print statement inside the function was executed
# Thus, there is "hello, from inside the function!" visible in the cell output.

In [None]:
# we can also print out the result returned by the function:
print(area_result)

The number of inputs and outputs can also vary; see the examples below:

In [None]:
# Multiple inputs:
def add(a, b):             # a and b are two parameters, separated by a comma
  result = a + b
  return result            # value sent back to the caller

print(add(5, 20))          # call the 'add' function with arguments 5 and 20

In [None]:
# We've already seen an example of multiple inputs:
# The print statement will accept any number of input strings, separated by commas.
print("this is an example", "of calling print with two inputs!")

In [None]:
# No inputs or outputs
def greet():
  print("Hello!")

# notice how even with zero inputs,
# parentheses are still needed to call the function!
greet()

In [None]:
# Multiple outputs
def area_and_perimeter(r):
  # calculates area and perimeter of circle with radius r
  area = math.pi * (r ** 2)
  perimeter = 2 * math.pi * r
  return area, perimeter


# Notice how multiple outputs are returned together as a **tuple**,
# an ordered collection that is indexed just like a list
returned_tuple = area_and_perimeter(1)
print(returned_tuple)

# outputs can be individually accessed from the returned tuple by index
area_returned = returned_tuple[0]
perimeter_returned = returned_tuple[1]
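
The returned values can also be unpacked directly into separate variables, which avoids indexing. A minimal sketch, using a hypothetical helper `to_kelvin_and_fahrenheit` that is not part of the lab:

```python
def to_kelvin_and_fahrenheit(celsius):
    # hypothetical helper: converts a temperature in °C to K and °F
    kelvin = celsius + 273.15
    fahrenheit = celsius * 9 / 5 + 32
    return kelvin, fahrenheit

# unpack the two returned values into two variables in one line
temp_k, temp_f = to_kelvin_and_fahrenheit(100.0)
print(temp_k, "K")
print(temp_f, "°F")
```

The number of variables on the left must match the number of values the function returns.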

### ✅ Problem 14: Implement Linear and Logistic models [4 points total]
**Context:**  
In machine learning, a **model** takes an input (or inputs), applies some learned parameters, and produces an output — called a **prediction**.  
For example, a very simple model could predict a molecular property (such as solubility) based on one or more molecular descriptors.  

Below are two common simple models:  

1. **Linear Regression (1D)** [2 points]
$$  y = m \times x + b $$

where $m$ is the slope, $b$ is the intercept, and $x$ is your input descriptor value.  

2. **Logistic Regression (1D)**  [2 points]
$$ y = \frac{1}{1 + e^{-(m \times x + b)}} $$  
where the output $y$ is between 0 and 1, representing a probability.  

---

**Task:**  
Write two Python functions:  
1. `linear_model(x, m, b)` — computes the output of the linear regression model for a single $x$.  
2. `logistic_model(x, m, b)` — computes the output of the logistic regression model for a single $x$.  

Then evaluate both models for:  
- $x = 2.5$  
- $m = 1.2$  
- $b = -0.5$  

**Tip:** Might it be helpful to call the linear model **within** the logistic function?


In [None]:
# Function 1: Linear model
def linear_model(x, m, b):
  NotImplemented ### YOUR SOLUTION HERE

print(linear_model(2.5, 1.2, -0.5))

In [None]:
# Function 2: Logistic model
def logistic_model(x, m, b):
  NotImplemented ### YOUR SOLUTION HERE

print(logistic_model(2.5, 1.2, -0.5))

## 2.11 Plotting with the `matplotlib` module


So far, we’ve worked with numbers, variables, and functions — but data becomes far more useful when we can **visualize** it.  
In Python, one of the most widely used libraries for creating plots and charts is **[Matplotlib](https://matplotlib.org/)**.

We’ll use Matplotlib’s `pyplot` module (often imported as `plt`) to create figures, plot data, and display the results.

To import: `import matplotlib.pyplot as plt`

### Why visualize data?
- Helps identify **patterns** and **relationships** in the data.
- Makes results **easier to interpret** and explain.
- Is a **critical step** in machine learning, where we often visualize training progress, model predictions, or statistical distributions.

### Example:
The code below displays a simple plot of $ y = x^2 $ between -5 and 5

In [None]:
import matplotlib.pyplot as plt
import numpy as np


x_values = np.linspace(-5, 5, 100)     # x is an array of 100 evenly spaced values from -5 to 5
y_values = x_values ** 2               # y is an array of those 100 x values, squared

plt.plot(x_values, y_values)           # plot a line passing through the 100 (x,y) values
plt.xlabel("x")                        # Label x-axis
plt.ylabel("y")                        # Label y-axis
plt.title("Example Plot: y = x^2")
plt.grid(True)                         # Show a grid
plt.show()                             # Display the plot

Matplotlib’s `pyplot` module works like a **state machine** — it keeps track of the "current" figure and plot settings until you change them or start a new figure.
- Think of it like a painter’s canvas: once you create it, you can keep adding strokes (plots, labels, legends) until you decide to clear it or start a fresh canvas.
- Each call to functions like `plt.plot()` or `plt.title()` adds elements to the current figure.
---

Here is a table of common matplotlib commands:

| Command | Description |
|---------|-------------|
| `import matplotlib.pyplot as plt` | Imports the Matplotlib `pyplot` interface. |
| `plt.figure()` | Creates a new figure (canvas) for plotting. |
| `plt.plot(x, y)` | Plots a line graph of `y` versus `x`. |
| `plt.scatter(x, y)` | Creates a scatter plot of `y` versus `x` |
| `plt.xlabel("Label")` | Adds a label to the x-axis. |
| `plt.ylabel("Label")` | Adds a label to the y-axis. |
| `plt.title("Title")` | Adds a title to the plot. |
| `plt.legend()` | Displays a legend for labeled plot elements. |
| `plt.grid(True)` | Adds a grid to the plot. |
| `plt.xlim(min, max)` | Sets the range of the x-axis. |
| `plt.ylim(min, max)` | Sets the range of the y-axis. |


#### Styling plots

The `plt.plot` command can take in many additional, optional arguments that control the style of the plot, for example:

In [None]:
# change the line color
plt.plot(x_values, y_values, color="red")
plt.title("Change the line color")

In [None]:
# Change the line graph style (dashed line)
plt.plot(x_values, y_values, linestyle='--')
plt.title("Change the line graph style (dashed line)")

In [None]:
# Mark individual data points
plt.plot(x_values, y_values, marker='o')
plt.title("Mark individual data points")

In [None]:
# Multiple style settings at once:
# 1. change line color
# 2. change line graph style (dashed line)
# 3. mark individual data points
plt.plot(x_values, y_values, color="red", linestyle='--', marker='o')
plt.title("Multiple style settings at once")

In [None]:
# Add a legend
plt.plot(x_values, y_values, label='y = x^2')    # label this data for the legend
plt.legend()                                     # display the legend
plt.title("Add a legend")

### ✅ Problem 15: Visualizing Model Predictions [9 points]

**Context:**  
In the previous section, you implemented two simple machine learning “models”:
1. A **linear regression** model:  
   $ y = m \times x + b $
2. A **logistic regression** model:  
   $ y = \frac{1}{1 + e^{-(m \times x + b)}} $

**Task:**  
Using your previously defined functions and Matplotlib:

1. Compute predictions for both models across all `x` values using the following parameters:
$$ m = 1.2 \quad b = -0.5 $$

2. Plot both functions **on the same figure**, with:
   - A **solid blue line** for the linear regression model.
   - A **dashed red line** for the logistic function.
3. Add axis labels, a title, a legend, and a grid.

**Hint:**  
- Reuse the equations from your previous problems.  
- Use `plt.plot()` to draw each curve and `plt.legend()` to distinguish them.  


In [None]:
# Variables (provided)
# Run this cell to initialize the variables
m = 1.2
b = -0.5
x_values = np.linspace(-2, 2, 100)

In [None]:

# Task 1: compute predictions for both models across all x values
y_values_linear = NotImplemented ### YOUR SOLUTION HERE
y_values_logistic = NotImplemented ### YOUR SOLUTION HERE

# Task 2: plot both functions on the same figure,
# with a solid blue line for the linear model
# and a dashed red line for the logistic model
NotImplemented ### YOUR SOLUTION HERE
NotImplemented ### YOUR SOLUTION HERE

# Task 3: add features:
# axis labels
NotImplemented ### YOUR SOLUTION HERE
NotImplemented ### YOUR SOLUTION HERE

# title
NotImplemented ### YOUR SOLUTION HERE

# legend
NotImplemented ### YOUR SOLUTION HERE

# grid
NotImplemented ### YOUR SOLUTION HERE

## 2.12 Applications in the Molecular Sciences

Python has a rich ecosystem for AI and data analysis in the molecular sciences. In this course, we’ll often need to inspect molecules in 3D—examining their geometry—and retrieve key properties.

We’ll primarily use two libraries:
- **RDKit** — load/create molecules, compute descriptors, and query chemical information.
- **py3Dmol** — interactively visualize molecular structures in 3D within a notebook.

Together, these tools let us load compounds, compute properties, and render structures interactively.
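
To give a feel for what "compute properties" means, here is a minimal sketch that assumes RDKit is already installed (the install cell in this section takes care of that in Colab). It builds ethanol from the SMILES string `"CCO"`. SMILES is a text notation for molecules that this lab does not otherwise use, so treat it as an assumption of the example; the problems in this section load molecules from PubChem instead.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

# Build ethanol from a SMILES string (illustrative input, not used elsewhere in this lab)
ethanol = Chem.MolFromSmiles("CCO")

formula = rdMolDescriptors.CalcMolFormula(ethanol)   # molecular formula as text
mol_weight = Descriptors.MolWt(ethanol)              # average molecular weight in g/mol
n_heavy = ethanol.GetNumAtoms()                      # hydrogens are implicit here, so this counts C and O only

print("Formula:", formula)
print("Molecular weight:", round(mol_weight, 2))
print("Heavy atoms:", n_heavy)
```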

**Run the code cell below to install and import these libraries:**


In [None]:
# Install RDKit and py3Dmol in Colab
!pip install rdkit py3Dmol requests

from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
import py3Dmol
import requests


### ✅ Problem 16: Visualizing Molecules with Python [4 points total]

**Tasks**: Modify and execute the code blocks in each task to visualize molecules in different styles.

**Task 1 [1 point].** Execute the code cell below, which visualizes a water molecule. Then, using the following table, change the PubChem CID in the code cell to visualize dopamine.

| Molecule | Formula | PubChem CID |
|---|---|---|
| Water | H₂O | 962 |
| Ethanol | C₂H₆O | 702 |
| Glucose | C₆H₁₂O₆ | 5793 |
| Caffeine | C₈H₁₀N₄O₂ | 2519 |
| Aspirin (acetylsalicylic acid) | C₉H₈O₄ | 2244 |
| Ibuprofen | C₁₃H₁₈O₂ | 3672 |
| Carbon dioxide | CO₂ | 280 |
| Dopamine | C₈H₁₁NO₂ | 681 |
| Penicillin | C₁₆H₁₈N₂O₄S | 2349 |
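
The code cell in this task builds a PubChem download address from the CID using an f-string. The short sketch below shows only that substitution step, with no download; the list name `demo_cids` and the choice of three CIDs from the table are illustrative.

```python
# Three CIDs copied from the table: water, ethanol, aspirin
demo_cids = [962, 702, 2244]

urls = []
for cid in demo_cids:
    # The f-string replaces {cid} with the current value of the variable
    url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/SDF"
    urls.append(url)
    print(url)
```

Changing the value of `cid` is therefore enough to point the rest of the code at a different molecule.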

In [None]:
cid = 962 ### YOUR SOLUTION HERE

url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/SDF"
sdf_data = requests.get(url).text

# Load into an RDKit molecule object
mol = Chem.MolFromMolBlock(sdf_data, removeHs=False)

# Generate 3D coordinates with RDKit's ETKDG method
AllChem.EmbedMolecule(mol, AllChem.ETKDG())

# Convert the molecule to MOL block text that py3Dmol can read
mol_block = Chem.MolToMolBlock(mol)

# Inspect some properties
print("Formula:", Chem.rdMolDescriptors.CalcMolFormula(mol))
print("Molecular Weight:", Descriptors.MolWt(mol))

# Display in 3D
view = py3Dmol.view(width=500, height=400)
view.addModel(mol_block, "sdf")
view.setStyle('sphere')
view.zoomTo()
view.show()

**Task 2 [1 point].** Let's change the visualization style. Previously, we were using the 'sphere' style, which renders each atom as a ball, colored by element type (red=oxygen, grey=carbon, blue=nitrogen). This is good for seeing the shape of the molecule, but it is hard to see how the atoms are connected!

Using the table below, visualize **caffeine** using the 'stick' style, which shows individual chemical bonds as sticks. Don't forget to look up the CID of caffeine in the table from Task 1. You can change the visualization style in the line of Python containing `view.setStyle('sphere')`.

| Style | What it represents | Typical use |
|---|---|---|
| line| Thin lines for bonds; minimalist atoms | Fast overview; small molecules |
| cross | Crosshairs at atom positions (no bonds) | Emphasize atom positions without bonds |
| stick | Cylinders for bonds; small atom spheres (ball-and-stick feel via radius/scale) | General small-molecule work |
| sphere | Atoms as spheres (vdW radii) | Packing/contacts; quick shape |
| cartoon | Protein/nucleic secondary structure (ribbons/tubes, helices, sheets) | Biomacromolecules |


In [None]:
cid = 962 ### YOUR SOLUTION HERE
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/SDF"
sdf_data = requests.get(url).text

# Load into an RDKit molecule object
mol = Chem.MolFromMolBlock(sdf_data, removeHs=False)

# Generate 3D coordinates with RDKit's ETKDG method
AllChem.EmbedMolecule(mol, AllChem.ETKDG())

# Convert the molecule to MOL block text that py3Dmol can read
mol_block = Chem.MolToMolBlock(mol)

# Inspect some properties
print("Formula:", Chem.rdMolDescriptors.CalcMolFormula(mol))
print("Molecular Weight:", Descriptors.MolWt(mol))

# Display in 3D
view = py3Dmol.view(width=500, height=400)
view.addModel(mol_block, "sdf")
view.setStyle('sphere') ### YOUR SOLUTION HERE
view.zoomTo()
view.show()

**Task 3 [1 point]:** Now let's visualize some **proteins**, which are larger molecules that living organisms use to carry out biological functions.

Modify the code below by changing `pdb_id` to visualize the SARS-CoV-2 spike protein.

| Protein | Function | Example PDB ID |
|---|---|---|
| Hemoglobin | Oxygen transport in blood | 4HHB |
| Insulin | Hormone that regulates blood glucose | 3I40 |
| SARS-CoV-2 spike (S) | Viral surface glycoprotein mediating ACE2-dependent entry | 6VSB |
| DNA polymerase I (Klenow fragment) | DNA synthesis/repair | 1KLN |
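
The file fetched in this task is plain text in the PDB format, in which each line beginning with `ATOM` describes one atom. The sketch below counts atoms and chains in a short made-up snippet. The three `ATOM` lines are invented for illustration and do not come from any real PDB entry; the variable names are also placeholders.

```python
# Made-up snippet laid out like PDB text (illustrative only, not from a real entry)
pdb_text = """\
ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  N   ALA B   1      14.200   8.300  -2.100  1.00  0.00           N
END
"""

atom_lines = []
chains = []
for line in pdb_text.splitlines():
    if line.startswith("ATOM"):
        atom_lines.append(line)
        chain_id = line[21]          # column 22 of a PDB ATOM line holds the chain ID
        if chain_id not in chains:
            chains.append(chain_id)

print("Atoms:", len(atom_lines))
print("Chains:", chains)
```

The same loop could be run on the `protein` text downloaded from RCSB, though real entries contain thousands of lines.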

In [None]:
# Choose a PDB ID from the Protein Data Bank
pdb_id = "3I40"  ### YOUR SOLUTION HERE

# Fetch the PDB file text from RCSB
pdb_url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
protein = requests.get(pdb_url).text

# Create the viewer
view = py3Dmol.view(width=600, height=400)
view.addModel(protein, "pdb")                      # load the protein
view.setStyle('stick')
view.zoomTo()
view.show()

**Task 4 [1 point]**: Change the visualization style for the SARS-CoV-2 spike protein to the 'cartoon' style. For large molecules like this, the 'cartoon' style is useful for seeing the overall structure of the protein without drawing each individual bond.


In [None]:
# Choose a PDB ID from the Protein Data Bank
pdb_id = "3I40"  ### YOUR SOLUTION HERE

# Fetch the PDB file text from RCSB
pdb_url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
protein = requests.get(pdb_url).text

# Create the viewer
view = py3Dmol.view(width=600, height=400)
view.addModel(protein, "pdb")                      # load the protein
view.setStyle('stick')  ### YOUR SOLUTION HERE
view.zoomTo()
view.show()

# Congratulations!

You have completed Lab 1 and should now be acquainted with Introductory Python for AI in the Molecular Sciences.