# Chapter 2

## 2.1 Syntactic Sugar

`Syntactic sugar` is a nickname given to any part of a programming language that does not extend the capabilities of the language. \
 If any of these features were suddenly removed from the language, the language would still be just as capable, but the advantage of anything labeled `syntactic sugar` is that it makes the code **quicker/shorter** to write or **easier** to read. 
 Below are a few examples from the Python language that you are likely to come across and find useful.

### 2.1.1 Augmented Assignment

| Augmented Assignment | Regular Assignment | Description       |
|----------------------|--------------------|-------------------|
| x += a               | x = x + a          | Increments the value |
| x -= a               | x = x - a          | Decrements the value |
| x *= a               | x = x * a          | Multiplies the value |
| x /= a               | x = x / a          | Divides the value    |

In [5]:
# This works, but the variable name must be typed twice, which becomes harder to maintain with longer variable names
x = 5
x = x + 1 
x

6

In [6]:
# The augmented assignment for increment can be used instead for the same result
x += 1 
x

7
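The other operators in the table behave the same way. The short sketch below chains three of them; the variable name `volume` and its starting value are made up for illustration.

```python
# Each augmented operator is shorthand for the longer regular assignment.
volume = 10      # hypothetical starting value
volume *= 3      # same as volume = volume * 3  -> 30
volume -= 5      # same as volume = volume - 5  -> 25
volume /= 2      # same as volume = volume / 2  -> 12.5 (true division gives a float)
print(volume)    # 12.5
```

Note that `/=` always produces a float, even when the starting value is an integer.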

### 2.1.2 List Comprehension

It's very common to need a list filled with a series of numbers or calculated values.

* For most cases (where values aren't just evenly spaced integers):
    1.  Start with an empty list.
    2.  Use a `for` loop to go through your desired range or items.
    3.  Inside the loop, calculate each value.
    4.  `append()` that calculated value to your list.

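**Special Case (Easy Way for Evenly Spaced Integers):**

* If you just need evenly spaced integers, the `range()` function combined with `list()` is much simpler.

The first cell below follows the four steps above to build a list of squares. The second cell produces the same list with a **list comprehension**, which condenses the loop, the calculation, and the `append()` into a single line inside square brackets: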

In [7]:
squares = []
for integer in range(10):
    sqr = integer**2
    squares.append(sqr)

squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [8]:
squares = [integer**2 for integer in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
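The special case mentioned earlier, evenly spaced integers, needs neither a loop nor a comprehension. A minimal sketch, with variable names chosen only for illustration:

```python
# list(range(start, stop, step)) builds evenly spaced integers directly.
evens = list(range(0, 10, 2))
print(evens)        # [0, 2, 4, 6, 8]

# A comprehension is still needed once each value has to be calculated,
# for example to get steps of one half.
half_steps = [n / 2 for n in range(5)]
print(half_steps)   # [0.0, 0.5, 1.0, 1.5, 2.0]
```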

### 2.1.3 Compound Assignment

A lot of beginner code sets variables on separate lines - this can instead be done by assigning multiple variables in the same assignment.

In [9]:
H, He, Li = 1.01, 4.00, 6.94
H
# This is known as tuple packing and unpacking
# The values on the right are packed into a tuple, then unpacked into the names on the left
# It's equivalent to (H, He, Li) = (1.01, 4.00, 6.94)

1.01
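The same mechanism can swap two variables without a temporary one. This is a small sketch; the names `reactant` and `product` are hypothetical labels.

```python
# The right-hand side is packed into a tuple first,
# then unpacked into the names on the left.
reactant, product = 'A', 'B'
reactant, product = product, reactant
print(reactant, product)   # B A
```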

### 2.1.4 Lambda Functions

`lambda` functions are small, unnamed (**anonymous**) functions you can define in a **single line**. They're perfect for simple tasks where you need a function quickly, without formally defining it with `def`.

**Core Ideas:**

* **Anonymous:** They don't need a name (variable) to be assigned to them, unlike `def` functions. This helps avoid "cluttering the namespace" if you only need the function once.
* **Single Expression:** A `lambda` can only contain one expression (what comes after the colon `:`) that is automatically returned. No complex logic or multiple lines of code.
* **Concise:** They allow you to define simple functions in very few characters, often inline where they are used.

**Syntax Explained:**

* `lambda` keyword
* `x`: The input variable(s) (what goes in the parentheses of a `def` function).
* `:`: Separates inputs from the function's logic.
* `x**2`: The single expression to be evaluated and returned (what goes in the indented block of a `def` function).

In [10]:
lambda x: x**2

<function __main__.<lambda>(x)>

In [11]:
f = lambda x: x**2
f(9)

81

**The Key Use: Inline with Other Functions (Anonymous Use):**

`lambda` functions truly shine when you need a simple function *temporarily* as an argument to another function. This is common in scientific libraries.

* **Scenario:** Many functions (like `quad()` for integration, or sorting functions) need a small piece of custom logic as an input. Instead of defining a whole `def` function for a one-off use, `lambda` provides it concisely.

* **Example:** Here we use integration to find the probability of finding a particle in the lowest state between 0 and 0.4 in a box of length 1 by performing the following integration.

$$
p = 2 \int_{0}^{0.4} \sin^2(\pi x) dx
$$

In [12]:
from scipy.integrate import quad
import math

quad(lambda x: 2 * math.sin(math.pi * x)**2, 0, 0.4)

(0.30645107162113616, 3.402290356348383e-15)

In [13]:
def particle_box(x):
    return 2 * math.sin(math.pi * x)**2

quad(particle_box, 0, 0.4)

(0.30645107162113616, 3.402290356348383e-15)
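Sorting functions, mentioned in the scenario above, accept a `lambda` in the same way. The sketch below sorts a few (symbol, atomic mass) pairs by mass using the built-in `sorted()` and its `key` parameter; the list name `elements` is illustrative.

```python
# (symbol, atomic mass) pairs in no particular order
elements = [('Li', 6.94), ('H', 1.01), ('Be', 9.01), ('He', 4.00)]

# key= expects a function; the lambda picks the mass out of each pair
by_mass = sorted(elements, key=lambda pair: pair[1])
print(by_mass)
# [('H', 1.01), ('He', 4.0), ('Li', 6.94), ('Be', 9.01)]
```

Without the `key` argument, `sorted()` would compare the pairs by their first element and return them in alphabetical order of the symbols.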

## 2.2 Dictionaries

Dictionaries are a fundamental and incredibly versatile **multi-element Python object type** that stores data as **key-value pairs**.

**Think of them like:**
* A real-world dictionary (word : definition).
* An address book (name : phone number).
* An object full of labeled "variables" (variable_name : variable_value).

They allow you to **access stored values using a `key`** (a unique label) rather than a numerical index (like in lists).

---

#### Key Characteristics:

* **Key-Value Pairs:** Each item in a dictionary consists of a unique `key` linked to a `value`.
* **Unordered by Index, Ordered by Insertion (Python 3.7+):** Historically, dictionaries were unordered. In modern Python (3.7+), they *preserve the order* in which items were added. However, you *still access items by their key*, not by a numerical position.
* **Keys Must Be Unique & Immutable:**
    * Each key in a dictionary must be distinct. If you add a key that already exists, its value will be updated.
    * Keys must be **immutable** data types (e.g., strings, numbers, tuples). Lists, for example, cannot be keys because they are mutable.
* **Values Can Be Anything:** Values can be any Python object type: numbers, strings, lists, other dictionaries, functions, etc.
* **Mutable:** Dictionaries themselves are mutable, meaning you can add, remove, or change key-value pairs after creation.
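The rules about keys can be checked with a short sketch. The dictionary `bp` and its boiling-point values are hypothetical and serve only as an illustration; `try`/`except` is used here just to keep the example running after the expected error.

```python
# Hypothetical boiling points in degrees Celsius, for illustration only
bp = {'water': 100.0, 'ethanol': 78.4}

# Assigning to an existing key replaces its value instead of adding a duplicate
bp['ethanol'] = 78.37
print(len(bp))   # 2

# A tuple is immutable, so it can serve as a key
bp[('water', 'ethanol')] = 78.2   # hypothetical mixture entry

# A list is mutable, so using it as a key raises a TypeError
try:
    bp[['water', 'ethanol']] = 78.2
except TypeError as err:
    message = str(err)
    print(message)
```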

In [14]:
AM = {'H':1.01, 'He':4.00, 'Li':6.94, 'Be':9.01,
      'B':10.81, 'C':12.01, 'N':14.01, 'O':16.00,
      'F':19.00, 'Ne':20.18}

AM['Li']

6.94

| Method Name | Description                                    | What it Returns (Example)                       |
| :---------- | :--------------------------------------------- | :---------------------------------------------- |
| `.keys()`   | Returns a view of all the **keys** in the dictionary. | `dict_keys(['H', 'He', 'Li'])`                  |
| `.values()` | Returns a view of all the **values** in the dictionary. | `dict_values([1.01, 4.00, 6.94])`               |
| `.items()`  | Returns a view of all **key-value pairs** as tuples. | `dict_items([('H', 1.01), ('He', 4.00), ('Li', 6.94)])` |

In [15]:
AM.keys()

dict_keys(['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne'])

In [16]:
for key, values in AM.items():
    print(values)

1.01
4.0
6.94
9.01
10.81
12.01
14.01
16.0
19.0
20.18


In [17]:
# Additional key:value pairs can be added to an existing dictionary by assigning a value to a new key
AM['Na'] = 22.99
AM

{'H': 1.01,
 'He': 4.0,
 'Li': 6.94,
 'Be': 9.01,
 'B': 10.81,
 'C': 12.01,
 'N': 14.01,
 'O': 16.0,
 'F': 19.0,
 'Ne': 20.18,
 'Na': 22.99}

In [18]:
# Another way to generate a dictionary is the dict() function, which takes a sequence of
# pairs (nested lists or tuples) and turns each pair into a key:value entry as follows.

dict([('H',1), ('He',2), ('Li',3)])

{'H': 1, 'He': 2, 'Li': 3}

In [19]:
# Not only can dictionaries be used to store data for calculations, such as atomic masses, they can also be used to store 
# changing data as we perform calculations or operations.

DNA = 'GGGCTCCATTGTCTGCCCGGGCCGGGTGTAGTCTAAGGTT'

dna_bases = {'A':0, 'T':0, 'C':0, 'G':0}
for base in DNA:
    dna_bases[base] += 1

dna_bases

{'A': 4, 'T': 11, 'C': 10, 'G': 15}

## 2.3 Set

Sets are another multi-element Python object, similar to lists, but with a crucial distinguishing feature:

* **Uniqueness:** Every element within a set **must be unique**. Duplicate items are automatically removed.
* **Unordered:** Like older dictionaries, sets do not store items in any particular order. You cannot access elements by index.

---

**Think of a Set as:**
A collection of distinct items, where you only care *what* is present, not *how many* of each or *in what order* they were added.

**Syntax:**
Sets are defined using **curly braces `{}`**, but unlike dictionaries, they contain only values (no key-value pairs).

In [20]:
compounds = {'ethanol', 'sodium chloride', 'water',
             'toluene', 'acetone'}

In [21]:
# We can add additional items to the set using the add() set method
compounds.add('calcium chloride')
compounds

{'acetone',
 'calcium chloride',
 'ethanol',
 'sodium chloride',
 'toluene',
 'water'}

In [22]:
# Notice that when ethanol is added to the set, nothing changes. This is because ethanol is already in the set, and sets do not 
# store redundant copies of elements.
compounds.add('ethanol')
compounds

{'acetone',
 'calcium chloride',
 'ethanol',
 'sodium chloride',
 'toluene',
 'water'}
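Because duplicates are discarded, converting a list to a set is a quick way to find its unique items. The sketch below uses a made-up list of sample labels and also shows one caveat of the curly-brace syntax: empty braces create a dictionary, not a set.

```python
# Hypothetical list of sample labels containing repeats
samples = ['water', 'ethanol', 'water', 'acetone', 'ethanol']

unique_samples = set(samples)   # duplicates are discarded
print(len(unique_samples))      # 3

# Empty braces create a dictionary, so an empty set needs set()
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)   # dict set
```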

### Set Operations: Combining and Comparing Unique Collections

Sets are powerful for performing mathematical set operations, useful for analyzing unique collections of items.

| Operator | Name         | Description                                                               |
| :------- | :----------- | :------------------------------------------------------------------------ |
| `\|`     | **Union** | Combines **all unique elements** from both sets. (A OR B)               |
| `-`      | **Difference** | Returns elements in the **first set** that are **not** in the second set. (A MINUS B) |
| `&`      | **Intersection** | Returns elements present in **both** sets. (A AND B)                    |
| `^`      | **Symmetric Difference** | Returns elements unique to *either* set, but not in both. (Exclusive OR) |

In [23]:
N = {'1s','2s','2p'}
Ca = {'1s','2s','2p', '3s', '3p', '4s'}

N | Ca # returns orbitals in either set

{'1s', '2p', '2s', '3p', '3s', '4s'}

In [24]:
Ca - N  # returns Ca orbitals minus those in common

{'3p', '3s', '4s'}

In [25]:
N & Ca  # returns orbitals in both sets

{'1s', '2p', '2s'}

In [26]:
N ^ Ca

{'3p', '3s', '4s'}
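The operators relate to each other in a way that is easy to verify. The following sketch reuses the `N` and `Ca` sets from above; the `.union()` method is the method form of `|`.

```python
N = {'1s', '2s', '2p'}
Ca = {'1s', '2s', '2p', '3s', '3p', '4s'}

# Symmetric difference = everything in the union that is not in the intersection
print((N ^ Ca) == (N | Ca) - (N & Ca))   # True

# Order matters for difference: N has no orbitals that Ca lacks
print(N - Ca)                            # set()

# The operators have method equivalents
print(N.union(Ca) == N | Ca)             # True
```

Here `N ^ Ca` and `Ca - N` give the same result only because every orbital in `N` is also in `Ca`.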

## 2.4 Python Modules

A **module** in Python is a file that contains a collection of functions, variables, and classes that share a common theme or purpose. Modules allow you to organize code into reusable components and keep your programs clean and manageable.

Python includes many **built-in (native) modules** that come with every Python installation. These modules provide a wide range of functionality, from working with files to generating random numbers, handling dates, and more.

Below is a list of some commonly used Python modules, along with a brief description of what each one does.

| Name       | Description                                       |
|------------|---------------------------------------------------|
| `os`         | Provides access to your computer file system      |
| `itertools`  | Iterator and combinatorics tools                  |
| `random`     | Functions for pseudorandom number generation      |
| `datetime`   | Handling of date and time information (see 2.9)   |
| `csv`        | For writing and reading CSV files                 |
| `pickle`     | Preserves Python objects on the file system       |
| `timeit`     | Times the execution of code                       |
| `audioop`    | Tools for working with raw audio data (deprecated in Python 3.11, removed in 3.13) |
| `statistics` | Statistics functions                              |

* You can find a full index of Python's built-in modules here: [Python Module Index](https://docs.python.org/3/py-modindex.html)

### 2.4.1 `os` Module

The `os` (operating system) module provides a way to use operating system-dependent functionality, primarily for working with files and directories (folders) on your computer.

Up until now, you've likely worked with files in the same location as your Jupyter notebook. The `os` module becomes essential when you need to:
* Access files located **elsewhere** on your computer.
* Work with **multiple files** within a specific folder (e.g., analyzing all experimental data files from a batch).

---

#### Key `os` Module Functions:

| Function      | Description                                                    |
| :------------ | :------------------------------------------------------------- |
| `os.getcwd()` | Get Current Working Directory. Returns the path of the directory Python is currently operating from. |
| `os.chdir()`  | Change Directory. Changes the current working directory to a specified path. |
| `os.listdir()`| Returns a list of all files and subdirectories within a given path (or the current directory if no path is specified). |

#### `os.getcwd()`: Where Am I Right Now?

* **Purpose:** This tells you the exact full path of the directory (folder) where your current Python script or Jupyter notebook is running from. It's your script's "current location."

In [27]:
import os
os.getcwd()

'/Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 02'


#### `os.chdir()`: Go To This Folder

* **Purpose:** This command changes your Python script's "current location" (its CWD) to a different folder. This is useful if you want to access files that are not in your current folder without typing their entire path every time.
* **Think:** It's like using the `cd` (change directory) command in your terminal.
* **Key Concept: Relative Paths:** This is crucial and often where confusion lies. A relative path describes a folder's location *relative to your current location*.

 * **Moving Up (`..`):** `..` means "go up one level (to the parent folder)."


In [28]:
# Assuming you start in 'Chapter 02'
os.chdir('..') # Go up one level
print(f"Moved up one level to: {os.getcwd()}")
# Expected Output: .../Basic Scientific Computing Skills/

os.chdir('..') # Go up another level
print(f"Moved up another level to: {os.getcwd()}")
# Expected Output: .../scientific-computing-for-chemists/

Moved up one level to: /Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills
Moved up another level to: /Users/codiefreeman/Documents/scientific-computing-for-chemists


In [29]:
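# This relative path only works if you are in 'Chapter 02'
# After the previous cell, the CWD is two levels higher, so Python cannot find '../Chapter 01'
os.chdir('../Chapter 01') # Go up one level, then down into 'Chapter 01'
print(f"Moved to Chapter 01: {os.getcwd()}")
# Expected output when run from 'Chapter 02': .../Basic Scientific Computing Skills/Chapter 01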

FileNotFoundError: [Errno 2] No such file or directory: '../Chapter 01'

In [None]:
# Always works, no matter your current CWD
os.chdir('/Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 02')
print(f"Changed to absolute path: {os.getcwd()}")

Changed to absolute path: /Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 02


#### `os.listdir()`: What's in This Folder?

* **Purpose:** Lists the names of all files and subfolders within a given directory. If you don't provide a path, it lists the contents of your current CWD.
* **Think:** Like typing `ls` (macOS/Linux) or `dir` (Windows) in your terminal.

In [None]:
# Make sure your CWD is '/Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 02'
# (You can use os.chdir() to get there if you're not)

contents_of_ch02 = os.listdir()
print(f"Contents of {os.getcwd()}:\n{contents_of_ch02}")
# Expected Output (will show files like chapter_2_intro.ipynb, etc.):
# ['chapter_2_intro.ipynb', 'chapter_2_exercises.ipynb']

Contents of /Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 02:
['Chapter_2.ipynb']


In [None]:
# Assuming your CWD is 'Chapter 02'
contents_of_ch01 = os.listdir('../Chapter 01')
print(f"Contents of Chapter 01:\n{contents_of_ch01}")
# Expected Output: ['chapter_1_exercises.ipynb', 'chapter_1_notes.ipynb', 'data']

Contents of Chapter 01:
['chapter_1.ipynb', 'chapter_1_exercises.ipynb', 'data']


In [None]:
# Ensures your CWD is 'Chapter 01'
os.chdir('/Users/codiefreeman/Documents/scientific-computing-for-chemists/Basic Scientific Computing Skills/Chapter 01')

contents_of_data = os.listdir('data')
print(f"Contents of 'data' (inside Chapter 01):\n{contents_of_data}")

Contents of 'data' (inside Chapter 01):
['docstring.png', '.DS_Store', 'water_density.csv', 'new_file.jpg', 'header_file.csv', 'new_file.csv', 'squares.csv', 'water_density.png']
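A common next step is to pick out only certain files from a folder, such as a batch of data files. The sketch below is one way this might look. It assumes a throwaway folder created with the standard `tempfile` module and made-up file names so that it runs anywhere; `os.path.join()` and `os.path.basename()` are standard `os` functions not covered above.

```python
import os
import tempfile

start_dir = os.getcwd()   # remember where we started

# Build a throwaway folder with hypothetical file names so the sketch is runnable
with tempfile.TemporaryDirectory() as folder:
    for name in ['run_1.csv', 'run_2.csv', 'notes.txt']:
        with open(os.path.join(folder, name), 'w') as f:
            f.write('placeholder')

    # Keep only the CSV files and build a full path to each one
    csv_paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith('.csv')
    )
    csv_names = [os.path.basename(path) for path in csv_paths]

print(csv_names)                  # ['run_1.csv', 'run_2.csv']
print(os.getcwd() == start_dir)   # True, the working directory never changed
```

Passing a path to `os.listdir()` and joining it to each file name avoids calling `os.chdir()` at all, so later cells do not depend on where an earlier cell left the working directory.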



### 2.4.2 `itertools` Module

The `itertools` module provides a collection of tools for looping over data in an efficient manner. It's especially powerful for generating sequences without storing them all in memory at once: its functions return iterators that produce each item only when it is requested.

While it has many functions, we'll focus on the combinatorics functions, which are critical for scenarios involving selections and arrangements of items.

---

#### Key Combinatorics Functions:

* **`itertools.combinations(iterable, n)`**
    * Generates all **`n`-sized combinations** of elements from an `iterable` (like lists, tuples, or `range` objects).
    * **Order does NOT matter:** `(1, 2)` is considered the same as `(2, 1)`.
    * **No Repeats:** Elements within an `n`-sized combination are unique (they are selected without replacement).

* **`itertools.permutations(iterable, n)`**
    * Generates all **`n`-sized ordered permutations** of elements from an `iterable`.
    * **Order DOES matter:** `(2, 1)` is **NOT** the same as `(1, 2)`.
    * **No Repeats:** Elements within an `n`-sized permutation are unique (selected without replacement).
    * **Relevance:** Important in probability and statistics where the sequence of events or items matters.

* **`itertools.product(iterable, repeat=r)`**
    * Generates the **Cartesian product** of an `iterable` repeated `r` times.
    * **Order DOES matter:** `(0, 1)` is different from `(1, 0)`.
    * **Repeats ARE allowed:** Elements can be selected multiple times (with replacement).
    * **Relevance:** Crucial for generating things like passwords, PINs, or any code where items can repeat at different positions (e.g., all 4-digit phone codes).

In [None]:
# Instead of returning a list, it returned a combinations object. You do not need to know much about these except that they can be converted into
# lists or iterated over to extract their elements, and they are single use. Once iterated over they need to be generated again if you need them.

import itertools

numbers = range(5)
itertools.combinations(numbers, 2)

<itertools.combinations at 0x113de4770>

In [None]:
# Each combination is returned in a tuple, and if the combination object is converted to a list, it would be a list of tuples.

for pair in itertools.combinations(numbers, 2):
    print(pair)

(0, 1)
(0, 2)
(0, 3)
(0, 4)
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)


In [None]:
for pair in itertools.permutations(numbers, 2):
    print(pair)

(0, 1)
(0, 2)
(0, 3)
(0, 4)
(1, 0)
(1, 2)
(1, 3)
(1, 4)
(2, 0)
(2, 1)
(2, 3)
(2, 4)
(3, 0)
(3, 1)
(3, 2)
(3, 4)
(4, 0)
(4, 1)
(4, 2)
(4, 3)
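The `product()` function can be used the same way. The sketch below uses a hypothetical two-letter alphabet to keep the output short, then counts how many results each of the three functions gives for the same five numbers.

```python
import itertools

bases = 'AT'   # hypothetical two-letter alphabet to keep the output short

# Cartesian product with repeat=2: order matters and repeats are allowed
pairs = list(itertools.product(bases, repeat=2))
print(pairs)
# [('A', 'A'), ('A', 'T'), ('T', 'A'), ('T', 'T')]

# Counting the results of the three functions on the same five numbers
numbers = range(5)
n_comb = len(list(itertools.combinations(numbers, 2)))    # 10
n_perm = len(list(itertools.permutations(numbers, 2)))    # 20
n_prod = len(list(itertools.product(numbers, repeat=2)))  # 25
print(n_comb, n_perm, n_prod)
```

The counts grow from 10 to 20 to 25 because permutations add the reversed pairs and the product additionally adds the five pairs in which both numbers are the same.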


### 2.4.3 `random` Module

The `random` module provides tools for generating pseudorandom numbers and for performing random selections/shuffling. It's essential for simulations, statistical sampling, and creating variable conditions.

**Note on Ranges:**
* `[x, y)`: Means "inclusive of x, exclusive of y" (includes x, but does NOT include y).

---

**Key Functions:**

| Function                 | Description                                                        |
| :----------------------- | :----------------------------------------------------------------- |
| `random.random()`        | Generates a random **float** between 0.0 (inclusive) and 1.0 (exclusive). |
| `random.uniform(x, y)`   | Generates a random **float** between x and y; y itself may or may not be included because of floating-point rounding. |
| `random.randrange(x, y)` | Generates a random **integer** within the range `[x, y)`.          |
| `random.shuffle(list)`   | **Shuffles** the items of a list **in-place** (modifies the original list). |

---

**Important Limitation:**
* Functions in the `random` module typically generate **one value at a time**.
* For generating large numbers of random values efficiently (e.g., for data arrays in simulations), you will typically use **NumPy's random functions** (covered in Chapter 4).

**Practical Use for Chemists:**
* Simulating molecular motion (e.g., Monte Carlo simulations).
* Randomly selecting samples for analysis.
* Generating random errors in simulated data to test robustness of analysis.

In [None]:
import random
random.random()

0.03531528505634418

In [None]:
random.randrange(0, 10)

1

In [None]:
a = [1,2,3,4,5,6]
random.shuffle(a)
a

[5, 2, 4, 3, 6, 1]
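As a sketch of the "random errors in simulated data" use case, the example below adds a small random error to a few made-up readings with `random.uniform()`. It also uses `random.seed()`, which is not covered above, to make the pseudorandom sequence repeatable; the readings and the error size are assumptions for illustration.

```python
import random

random.seed(42)   # fixing the seed makes the pseudorandom sequence repeatable

# Hypothetical "true" absorbance readings
true_values = [0.20, 0.40, 0.60]

# Add a small random error between -0.01 and 0.01 to each reading
noisy = [value + random.uniform(-0.01, 0.01) for value in true_values]

# Re-seeding with the same number reproduces exactly the same errors
random.seed(42)
noisy_again = [value + random.uniform(-0.01, 0.01) for value in true_values]
print(noisy == noisy_again)   # True
```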

## 2.5 Zipping and Enumeration

There are times where it is necessary to iterate over two lists simultaneously.

**Example:** Say we have two lists of **Atomic Numbers** and relative **Atomic Masses** of the first six elements.

> `AN` = [1, 2, 3, 4, 5, 6] \
> `mass` = [1, 4, 7, 9, 11, 12]

To calculate the number of neutrons in each isotope, we need to subtract the atomic number from the atomic mass, so it is helpful to iterate over both lists simultaneously.

---

### 2.5.1 Zipping

The *simplest* way to iterate over two lists simultaneously is to combine both into a single iterable object and iterate over it once. \
The `zip()` function does this by pairing up the elements of two lists or tuples: the first elements together, the second elements together, and so on, with each pair in its own tuple.
 * Instead of returning a list or tuple, the `zip()` function returns a **single-use** zip object.
  
 * If the two lists are of different lengths, `zip()` stops at the end of the shorter list, so the zip object yields only as many pairs as the shorter list has elements.

In [31]:
AN = [1, 2, 3, 4, 5, 6]
mass = [1, 4, 7, 9, 11, 12]

In [32]:
zipped = zip(AN, mass)

In [33]:
zipped

<zip at 0x111b17780>

In [34]:
for pair in zipped:
    print(pair[1] - pair[0])

0
2
4
5
6
6


In [36]:
# Single use: iterating over the exhausted zip object a second time prints nothing

for pair in zipped:
    print(pair[1] - pair[0])
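The points above can be pulled together in one sketch: unpacking each pair directly in the loop, the single-use behavior, and the truncation to the shorter list. The names `z`, `m`, `first_pass`, and `second_pass` are illustrative.

```python
AN = [1, 2, 3, 4, 5, 6]
mass = [1, 4, 7, 9, 11, 12]

# Unpack each pair directly in the loop instead of indexing with pair[0], pair[1]
neutrons = [m - z for z, m in zip(AN, mass)]
print(neutrons)   # [0, 2, 4, 5, 6, 6]

# A zip object is exhausted after one pass; list() keeps the pairs for reuse
zipped = zip(AN, mass)
first_pass = list(zipped)
second_pass = list(zipped)   # nothing left to iterate over
print(len(first_pass), len(second_pass))   # 6 0

# Unequal lengths: zip stops at the end of the shorter list
print(list(zip(AN, ['H', 'He'])))   # [(1, 'H'), (2, 'He')]
```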

### 2.5.2 Enumeration

Instead of zipping two lists or tuples together, the `enumerate()` function zips a list or tuple to the index values for that list.
* *Similar to `zip()`, it returns a **one-time** use iterable object.*

In [37]:
enum = enumerate(mass)

for pair in enum:
    print(pair)

(0, 1)
(1, 4)
(2, 7)
(3, 9)
(4, 11)
(5, 12)


The `zip()` function can be made to *do the same thing* by zipping a list with a range object of the same length as shown below, but `enumerate()` may be slightly more convenient.

In [38]:
zipped = zip(range(len(mass)), mass)
for item in zipped:
    print(item)

(0, 1)
(1, 4)
(2, 7)
(3, 9)
(4, 11)
(5, 12)
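One convenience of `enumerate()` is its optional `start` argument. In the sketch below, starting the count at 1 makes the counter match the atomic number, so the neutron calculation from earlier needs only the `mass` list. This relies on the list beginning at hydrogen with no gaps, which is an assumption about the data.

```python
mass = [1, 4, 7, 9, 11, 12]

# start=1 makes the counter match the atomic number instead of the list index
neutrons = {}
for z, m in enumerate(mass, start=1):
    neutrons[z] = m - z

print(neutrons)   # {1: 0, 2: 2, 3: 4, 4: 5, 5: 6, 6: 6}
```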


## 2.6 Encoding Numbers

While Python usually handles number storage automatically, understanding encoding is crucial for:
* **Memory Efficiency** (especially with large datasets/images).
* **Numerical Precision** in scientific calculations.
* **Interpreting External Data** (e.g., image files, instrument data).

---

#### Core Concepts: Binary & Bits

* Computers store all data as **binary** (base-two: 0s and 1s).
* Each 0 or 1 is a **bit**.
* Numbers are stored in **fixed-size blocks of bits** (e.g., 8, 16, 32, 64 bits). More bits = larger range or higher precision.

---

#### Integer Storage: `uint` vs. `int`

* **Unsigned Integers (`uint`):**
    * Only store positive whole numbers (0 and up).
    * All bits represent the number's magnitude.
    * *Example:* `uint8` (8 bits) stores 0 to 255.
* **Signed Integers (`int`):**
    * Store both positive and negative whole numbers.
    * One bit (the most significant) indicates the sign: 0 for non-negative, 1 for negative.
    * *Trade-off:* Has about half the positive range of an `uint` of the same bit length.
    * *Example:* `int8` (8 bits) stores -128 to 127.

---

#### Floating-Point Numbers (`float`): For Decimals

* Store numbers with decimal points.
* The **number of bits dictates the precision** (how many significant digits are accurately represented).
* `float64` (64-bit) is standard in Python for high precision.

---

#### Summary of Common Data Types:

| Data Type | Description (Bits)             | Example Range / Precision        |
| :-------- | :----------------------------- | :------------------------------- |
| `uint8`   | Unsigned Integer (8-bit)       | 0 to 255                         |
| `int8`    | Signed Integer (8-bit)         | -128 to 127                      |
| `float32` | Single-Precision Float (32-bit)| ~6-7 decimal digits of precision |
| `float64` | Double-Precision Float (64-bit)| ~15-17 decimal digits of precision|

---

**Why This Matters for Science:**
* **Memory:** Choose smaller types (`uint8`) for data like grayscale image pixels (0-255) to save memory with large datasets.
* **Precision:** Use `float64` (double precision) for critical scientific calculations to minimize rounding errors.
* **Data Import:** Essential when reading data from instruments or files that store numbers in specific encoded formats.
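The ranges and precisions in the table can be reproduced with the standard library alone. This is a rough sketch: the helper `int_range()` is hypothetical, and the `struct` module is used here only to round-trip a value through a 32-bit float.

```python
import struct
import sys

def int_range(bits, signed):
    """Return (min, max) for an integer stored in the given number of bits."""
    if signed:
        return -2**(bits - 1), 2**(bits - 1) - 1
    return 0, 2**bits - 1

print(int_range(8, signed=False))   # (0, 255)
print(int_range(8, signed=True))    # (-128, 127)

# Python's float is a 64-bit double; sys.float_info.dig is the number of
# decimal digits it can always represent faithfully
print(sys.float_info.dig)           # 15

# Round-trip 0.1 through a 32-bit float to see the precision that is lost
as_float32 = struct.unpack('f', struct.pack('f', 0.1))[0]
print(as_float32)                   # 0.10000000149011612
```

The 32-bit value agrees with 0.1 only to about eight significant digits, consistent with the ~6-7 digits of guaranteed precision listed for `float32`.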

## 2.7 Advanced Functions: Flexible Inputs & Recursion

Standard Python functions take a predetermined number of arguments. This section introduces ways to create more flexible functions that can handle an **unspecified number of inputs**, and also explores functions that call themselves.

---

### 2.7.1 Variable Positional Arguments (`*args`)

* **Purpose:** Allows a function to accept **any number of non-keyword (positional) arguments**.
* **How it works:** All extra positional arguments are collected into a single **tuple** inside the function.
* **Syntax:** Use an asterisk `*` before the parameter name in the function definition (e.g., `*g_crops`).
* **Use Case:** When you want to sum, average, or process a flexible number of items.

---

**Example:** Calculating percent yield from a theoretical yield and an unknown number of recrystallization crops.

In [None]:
def per_yield(g_theor, *g_crops):
    # g_theor will get the first number (e.g., 1.32)
    # *g_crops will collect all the *rest* of the numbers into a tuple
    g_total = sum(g_crops)
    percent_yield = 100 * (g_total / g_theor)
    return percent_yield

In [41]:
per_yield(1.32, 0.50, 0.11, 0.27)
# This calculates: ( (0.50 + 0.11 + 0.27) / 1.32 ) * 100
# Which is: (0.88 / 1.32) * 100

66.66666666666666
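To see the tuple directly, the sketch below repeats `per_yield()` with an added `print()` call. It also shows two related cases that are assumptions beyond the example above: unpacking an existing list with `*` at the call site, and calling the function with no crops at all.

```python
def per_yield(g_theor, *g_crops):
    # g_crops arrives as a tuple of every positional argument after the first
    print(type(g_crops).__name__, g_crops)
    return 100 * (sum(g_crops) / g_theor)

# Passing values one by one
per_yield(1.32, 0.50, 0.11, 0.27)   # prints: tuple (0.5, 0.11, 0.27)

# Unpacking an existing list with * at the call site gives the same result
crops = [0.50, 0.11, 0.27]
result = per_yield(1.32, *crops)

# With no crops at all, g_crops is an empty tuple and the yield is zero
print(per_yield(1.32))              # prints "tuple ()" and then 0.0
```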

### 2.7.2 Variable Keyword Arguments (`**kwargs`)

* **Purpose:** Allows a function to accept **any number of keyword-labeled arguments**.
* **How it works:** All extra keyword arguments (`name=value` pairs) are collected into a single **dictionary** inside the function. The argument names become the dictionary **keys** (as strings), and their assigned values become the dictionary **values**.
* **Syntax:** Use two asterisks `**` before the parameter name in the function definition (e.g., `**elements`).
* **Use Case:** When you need to process a flexible set of named parameters or properties, and you don't know all possible names or how many will be provided beforehand. This avoids needing a very long list of optional parameters in your function definition.

In [51]:
def mol_mass(**elements):
    m = {'H':1.008, 'He':4.003, 'Li':6.94, 'Be':9.012,
         'B':10.81, 'C':12.011, 'N':14.007, 'O':15.999,
         'F':18.998}
    masses = []  # mass total from each element
    for key in elements.keys():
        masses.append(elements[key] * m[key])
    return sum(masses)

In [47]:
mol_mass(C=8, H=10, N=4, O=2)

194.194
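The collected keyword arguments form an ordinary dictionary, so all the dictionary methods from section 2.2 apply. The sketch below uses a hypothetical helper, `count_atoms()`, to show this, and also shows the reverse direction: unpacking an existing dictionary into keyword arguments with `**`.

```python
def count_atoms(**elements):
    # elements is an ordinary dictionary, e.g. {'C': 8, 'H': 10, ...}
    return sum(elements.values()), sorted(elements.keys())

total, symbols = count_atoms(C=8, H=10, N=4, O=2)
print(total, symbols)   # 24 ['C', 'H', 'N', 'O']

# A dictionary that already exists can be unpacked into keyword arguments with **
caffeine = {'C': 8, 'H': 10, 'N': 4, 'O': 2}
print(count_atoms(**caffeine) == (total, symbols))   # True
```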


### 2.7.3 Recursive Functions

* **Purpose:** A function that solves a problem by calling **itself** one or more times with smaller inputs until it reaches a "base case."
* **How it works:** The function breaks down a problem into smaller, similar sub-problems.
* **Key Requirement:** Must have a **base case** (a condition) that stops the recursion; otherwise, it will run forever (or until Python hits a recursion limit error).
* **Use Case:** Problems that can be naturally expressed as smaller versions of themselves (e.g., factorials, Fibonacci sequences, tree traversals). Can sometimes simplify code that would otherwise use loops.
    * **Chemistry Example:** Calculating remaining mass after half-lives.

In [None]:
def half_life(mass, hl=1):
    '''(float, hl=int) -> float 
    Takes in mass and number of half-lives and returns 
    remaining mass of material. Half-lives need to be 
    integer values.
    '''
    mass /= 2
    hl -= 1

    # If hl has reached 0 (meaning all specified half-lives have occurred),
    # stop the recursion and return the current mass.
    if hl == 0:
        return mass
    # Otherwise, call the 'half_life' function again with the NEW (halved) mass
    # and the NEW (decremented) half-life count.
    # This repeats the process until the base case (hl == 0) is met.
    else:
        return half_life(mass, hl=hl)

In [43]:
half_life(4.00, hl=2)

1.0

In [44]:
half_life(4.00, hl=4)

0.25

Recursive functions must have a **base case** to stop. However, invalid inputs can prevent this, leading to problems.

**The Problem: Runaway Recursion**
* If input (e.g., `hl=1.5`) never meets the base case (`hl == 0`), the function calls itself endlessly.
* Python's built-in recursion limit (around 1000 calls) prevents an infinite loop, raising a `RecursionError`.

**The Solution: Input Validation**
* **Validate inputs** at the start of your function to ensure they meet requirements.
* Use `isinstance(variable, type)` to check if a variable is of the expected type.
* **Why:** It's always better to return a clear error (or `None`) for invalid input than to crash or produce incorrect results silently.

In [57]:
def half_life(mass, hl=1):
    '''(float, hl=int) -> float
    Takes in mass and number of half-lives and returns
    remaining mass of material. Half-lives need to be
    integer values.
    '''

    if not isinstance(hl, int):
        print('Invalid hl. Integer required.')
        return None
        
    mass /= 2
    hl -= 1
    
    if hl <= 0:
        return mass
    else:
        return half_life(mass, hl=hl)

In [59]:
print(half_life(4.00, hl=1.5))
print(half_life(4.00, hl=2)) 

Invalid hl. Integer required.
None
1.0


As a final note on **recursive functions**, you may have noticed that you could just as easily have accomplished the above task with a `while` or `for` loop.
 
Recursive functions can usually be avoided, but once in a while a recursive function will substantially simplify your code. It is a good technique to have in your back pocket for the moment you need it, but you will not likely use them often.
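For comparison, a loop-based version of the same calculation might look like the sketch below. The name `half_life_loop` is hypothetical, and the function assumes `hl` is a non-negative integer.

```python
def half_life_loop(mass, hl=1):
    """Return the mass remaining after hl half-lives (hl is a whole number)."""
    for _ in range(hl):
        mass /= 2     # one halving per elapsed half-life
    return mass

print(half_life_loop(4.00, hl=2))   # 1.0
print(half_life_loop(4.00, hl=4))   # 0.25
print(half_life_loop(4.00, hl=0))   # 4.0, no half-lives means nothing decays
```

Because `range(0)` is empty, the loop version returns the mass unchanged for zero half-lives without needing a separate base case.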

## 2.8 Error Handling
It doesn’t take long to realize that error messages are an inevitable part of computer programming, so it is helpful to know what the different types of **error messages** mean and how to deal with them.

### 2.8.1 Types of Errors

| Type of Error      | Description                                                               |
| :----------------- | :------------------------------------------------------------------------ |
| `NameError`        | A variable or name being used has not been defined.                       |
| `SyntaxError`      | Invalid syntax in code (e.g., typos in keywords, missing punctuation).    |
| `TypeError`        | An operation or function is being used on an incorrect object type.       |
| `ValueError`       | A value is being used that is not accepted by a function or application (the *type* is correct, but the *value* is not valid). |
| `ZeroDivisionError`| Attempting to divide by zero.                                             |
| `IndentationError` | Invalid indentations are present (Python uses indentation for code blocks).|
| `IndexError`       | An invalid index is being used to access an item in a sequence (e.g., list, tuple).|
| `KeyError`         | An invalid key is being used to access a value in a dictionary or DataFrame. |
| `DeprecationWarning`| Code uses a function or feature that will change or be removed in a future version. (Often a warning, not a full error). |

### `NameError`

A `NameError` means the code has used a variable or function name that has not been defined. \
**Some potential causes:**
*  A mistyped variable name
*  Jupyter notebook cells run out of order (the variable is defined in a cell that has not been run yet)

In [60]:
print(root)

NameError: name 'root' is not defined

### `SyntaxError`

A programming language’s syntax is the set of rules that dictate how the code is formatted, the appropriate symbols, valid values and variables, etc. 

A `SyntaxError` indicates that your code **violated** one of these rules. To be helpful, the error message shows the line of code with the invalid syntax and points to where in the line the problem seems to be occurring.

In [None]:
# In the first example below, the error occurred because `<>` is not a valid operator in Python.
5 <> 6

SyntaxError: invalid syntax (2186603431.py, line 1)

In [None]:
# This example fails because variable names cannot start with a number
5sdq = 52

SyntaxError: invalid decimal literal (848242794.py, line 1)

### `TypeError`

A `TypeError` occurs when using the wrong object type for a particular function or application. For example, Python cannot take the absolute value of a letter, so this generates a `TypeError`.

In [63]:
abs('a')

TypeError: bad operand type for abs(): 'str'

In [65]:
# The error occurs because a whole list cannot be compared with an integer - at least not without a for loop or NumPy
[1,2,3] > 5

TypeError: '>' not supported between instances of 'list' and 'int'
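One way around this error, shown here only as a minimal sketch, is to compare each element with the number individually using a list comprehension. The variable names are chosen just for this illustration.

```python
values = [1, 2, 3]

# Compare each element with 5 individually instead of comparing the whole list
greater_than_5 = [value > 5 for value in values]
print(greater_than_5)   # prints [False, False, False]
```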

### `ValueError`

A `ValueError` is somewhat similar to a `TypeError`, except in this case it indicates that a **value** of the correct type is not valid or appropriate for a particular function. Some functions require that their arguments be within a certain range, such as `math.sqrt()`, which does not accept **negative numbers**. As a result, taking the square root of -1 with this function generates a `ValueError`.

In [66]:
import math
math.sqrt(-1)

ValueError: math domain error

### `ZeroDivisionError`

A `ZeroDivisionError` is what the name says: the code attempted to **divide by zero**.

In [67]:
4 / 0

ZeroDivisionError: division by zero

### `IndentationError`

Python ignores most extra spaces except those at the start of a line, because these spaces, or indentations, have meaning. In the example below, `print(x)` should be indented below the start of the `for` loop, so it generates an `IndentationError`.

In [68]:
for x in range(5):
print(x)

IndentationError: expected an indented block after 'for' statement on line 1 (963956298.py, line 2)

### `IndexError` and `KeyError`

When indexing a composite object like a list, an index value that is outside the valid range results in an `IndexError`. In the list below, the **indices run from 0 to 4**, so using an **index of 5** raises an `IndexError`. Similarly, if the code tries to look up a value using a key **not present in a dictionary**, it raises a `KeyError` as shown below.

In [69]:
lst = [1,5,7,4,3]
lst[5]

IndexError: list index out of range

In [70]:
elements = {'H':1, 'He':2, 'Li':3, 'Be':4, 'B':5, 'C':6}
elements['Li']

3

In [71]:
elements['N']

KeyError: 'N'

### `DeprecationWarning`

A `DeprecationWarning` occurs when code uses a feature that will be removed or changed in a future release of Python or a third-party library. This warning does not stop your code; it is a friendly heads-up that your code may not work in the future.
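To see how this kind of warning behaves, here is a small sketch using the standard `warnings` module. The function `old_mean()` is a made-up stand-in for a deprecated feature and does not come from any real library.

```python
import warnings

def old_mean(values):
    '''Hypothetical legacy function that warns the caller before doing its work.'''
    warnings.warn('old_mean() is deprecated; use statistics.mean() instead',
                  DeprecationWarning, stacklevel=2)
    return sum(values) / len(values)

result = old_mean([1, 2, 3])   # a warning may be displayed, but the code keeps running
print(result)                  # prints 2.0
```

Unlike the errors above, the function still returns its result after issuing the warning.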

### 2.8.2 Working Around Errors with `try` and `except`

While this may seem like a bad idea at first glance, there are times when you may want Python to not come to a grinding halt in the face of an error.

One common situation is when importing a large number of data files from different sources. Not every data source may have formatted data or files the same, and some files may be malformed or there may be other unexpected edge cases. To get Python to not stop at an error message, you can use a `try`/`except` block.

The general structure of a `try`/`except` block is to include the code you originally intend to run under the try statement, and under the following except statement, include what Python should do in the event of a **specific error**.

In [72]:
import math

sqr_nums = [4, 25, 9, 81, 144, 'four', 49]
sqr_root = []

for num in sqr_nums:
    sqr_root.append(math.sqrt(num))

TypeError: must be real number, not str

In [73]:
sqr_nums = [4, 25, 9, 81, 144, 'four', 49]
sqr_root = []

for num in sqr_nums:
    try:
        sqr_root.append(math.sqrt(num))
    except TypeError:
        print(f'{num} is not a float or int')

four is not a float or int


In the second version, the body of the `for` loop has been placed under `try:`, telling Python to make a best attempt at running the code. The code under `except TypeError:` tells Python what to run in the event of a `TypeError`, so the loop continues instead of stopping.

In the example, nothing is done with the **string** except to inform the user that there was a problem. It is a prudent practice to not let unsolved errors pass by silently. If you have a good idea of where errors may turn up and have a solution to them, you can include that code under `except:` as well.

In [74]:
sqr_nums = [4, 25, 9, 81, 144, 'four', 49]
sqr_root = []

txt_to_int = {'one':1, 'two':2, 'three':3, 'four':4, 'five':5, 'six':6}

for num in sqr_nums:
    try:
        sqr_root.append(math.sqrt(num))
    except TypeError:
        integer = txt_to_int[num]
        sqr_root.append(math.sqrt(integer))

sqr_root

[2.0, 5.0, 3.0, 9.0, 12.0, 2.0, 7.0]
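The dictionary lookup can itself fail with a `KeyError` if a word is missing from `txt_to_int`. The sketch below is one possible way to handle both cases by nesting a second `try`/`except`. The extra entry `'seven'` is added to the data only for this illustration.

```python
import math

sqr_nums = [4, 'four', 'seven', 49]   # 'seven' is deliberately missing from the dictionary
txt_to_int = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6}
sqr_root = []

for num in sqr_nums:
    try:
        sqr_root.append(math.sqrt(num))
    except TypeError:
        # The value is not a number, so try to translate it from text
        try:
            sqr_root.append(math.sqrt(txt_to_int[num]))
        except KeyError:
            # The text is not in the dictionary, so report it rather than fail silently
            print(f'{num} could not be converted and was skipped')

print(sqr_root)   # prints [2.0, 2.0, 7.0]
```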

It is worth noting that `try`/`except` blocks can often be replaced with `if`/`else` blocks.

So when should you use `try`/`except` versus `if`/`else`? If you **expect exceptions to occur frequently**, `if`/`else` is likely to be **more efficient**, but **if exceptions are rare**, it may be more efficient to use `try`/`except`.

In [75]:
sqr_nums = [4, 25, 9, 81, 144, 'four', 49]
sqr_root = []

for num in sqr_nums:
    if type(num) in [float, int]:
        sqr_root.append(math.sqrt(num))
    else:
        print(f'{num} is not a float or int')

four is not a float or int


### 2.8.3 Raising Exceptions

One thing worse than **code not running** is code **running and producing incorrect outputs**. At least when code fails to run, the user knows something is wrong whereas code that fails silently can lull the user into false conclusions. 

It is a prudent practice in coding to include checks that important conditions are met. When these conditions are not met, the code should stop and produce an error, which is known as **raising an exception**. 

To include checks in your code, you can use a condition with a `raise` statement followed by some form of error from Table 7 and an error message. The more specific you can be in your error type and message, the better.



As an example, we will write a function below which quantifies the differences between two DNA sequences.

The Hamming distance is one possible metric for determining how different two sequences are and is simply the number of locations where two sequences of the same length are different. 
 * **For example:** `AATGC` and `AATGT` have a Hamming distance of 1 because they are identical except for the last base position.

Because it is critical that the two DNA sequences be the same length, this should be checked before any further calculations. If the sequences have different lengths, the function should not proceed and should instead provide a helpful error message.

In [None]:
def hamming(seq1, seq2):

    if len(seq1) != len(seq2):                                  # These two lines are the raised exception for length check.
        raise ValueError('Sequences must be of equal length')
        
    sequences = zip(seq1, seq2)
    distance = 0
    for position in sequences:
        if position[0] != position[1]:
            distance += 1
    
    return distance

In [None]:
dna1 = 'AACCT'
dna2 = 'ATCCA'
dna3 = 'ATCCTA'

hamming(dna1, dna2) # Runs normally because both sequences are the same length

2

In [None]:
hamming(dna2, dna3) # Raises the ValueError because the sequences have different lengths

ValueError: Sequences must be of equal length
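A raised exception can also be caught by the calling code with `try`/`except`, as in section 2.8.2. The sketch below restates `hamming()` in a compact form so that it runs on its own. The list `pairs` and the choice to record `None` for a failed comparison are assumptions made for this illustration.

```python
def hamming(seq1, seq2):
    # Same length check as above, followed by a compact count of mismatches
    if len(seq1) != len(seq2):
        raise ValueError('Sequences must be of equal length')
    return sum(base1 != base2 for base1, base2 in zip(seq1, seq2))

pairs = [('AACCT', 'ATCCA'), ('ATCCA', 'ATCCTA')]
distances = []

for seq1, seq2 in pairs:
    try:
        distances.append(hamming(seq1, seq2))
    except ValueError as error:
        # Report the problem and keep going with the remaining pairs
        print(f'Skipping {seq1}/{seq2}: {error}')
        distances.append(None)

print(distances)   # prints [2, None]
```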

## 2.9 Date and Time Information

Python's built-in `datetime` module is essential for working with date, time, and timestamp information, whether it's from file names, data logs, or for calculating durations (e.g., in chemical kinetics experiments).

---

#### Core `datetime` Objects:

The `datetime` module provides four primary object types to represent and manipulate date and time information:

| Object Type          | Description                                    | Example of What it Stores         |
| :------------------- | :--------------------------------------------- | :-------------------------------- |
| `datetime.datetime`  | Contains both **date and time** information.   | `2025-07-28 23:59:00`             |
| `datetime.date`      | Contains **date only**, ignoring time.         | `2025-07-28`                      |
| `datetime.time`      | Contains **time only**, ignoring date.         | `23:59:00`                        |
| `datetime.timedelta` | Represents a **duration** (change in time) between two dates or times. | `4 days, 13:00:00`                |

### 2.9.1 Date and Time Data

In [None]:
# You can create a `datetime` object by specifying year, month, day (required), and optionally hour, minute, second, microsecond.

import datetime

pi_day = datetime.datetime(2025, 3, 14, 12, 0, 0) # Noon on Pi Day 2025
print(pi_day)

2025-03-14 12:00:00


In [91]:
# The date and time information can also be provided to datetime() using keyword arguments like below

mario_day = datetime.datetime(year=2025, month=3, day=10, hour=8, minute=10, second=0, microsecond=0)

In [86]:
# Getting the current date and time
now = datetime.datetime.now()
print(f'Current datetime: {now}')

Current datetime: 2025-07-28 22:54:31.735686


In [None]:
# Access individual parts (year, month, day, hour, minute, second, microsecond) as attributes.

print(f'Current hour: {now.hour}')

Current hour: 22


In [90]:
# Creating `datetime.time` objects is similar to creating `datetime.datetime` objects, but only time information is given.

specific_time = datetime.time(hour=5, minute=3, second=32)
print(specific_time)
print(specific_time.second)

05:03:32
32


### 2.9.2 Changes in Date and Time

In [None]:
# Modifying components: .replace() returns a new `datetime` object with the specified components replaced.
# The original object is not changed.

modified_time = now.replace(hour=3, minute=0)
print(f"Modified datetime: {modified_time}")

Modified datetime: 2025-07-28 03:00:31.735686


#### Calculating Differences with `datetime.timedelta`:

* Subtracting two `datetime` objects results in a `timedelta` object, representing the duration between them.
* You can then access its `days` and `seconds` attributes, or convert to total seconds.

In [93]:
delta = pi_day - mario_day
delta

datetime.timedelta(days=4, seconds=13800)

In [92]:
mario_day = datetime.datetime(2025, 3, 10, 8, 10, 0)
delta = pi_day - mario_day # Using pi_day from above

print(f"Timedelta: {delta}")          # Output: 4 days, 3:50:00 (4 days and 13800 seconds)
print(f"Days: {delta.days}")          # Output: 4
print(f"Seconds (of the non-day part): {delta.seconds}") # Output: 13800
print(f"Total seconds: {delta.total_seconds()}") # Output: 359400.0

Timedelta: 4 days, 3:50:00
Days: 4
Seconds (of the non-day part): 13800
Total seconds: 359400.0
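A `timedelta` can also be created directly and added to a `datetime` object, which may be handy for working out when the next measurement is due. The sketch below uses made-up values for the start time and the 90-minute interval.

```python
import datetime

start = datetime.datetime(2025, 3, 14, 12, 0, 0)
interval = datetime.timedelta(minutes=90)      # hypothetical time between readings

# Adding a timedelta to a datetime gives a new datetime
next_reading = start + interval
print(next_reading)                            # prints 2025-03-14 13:30:00

# Subtracting the two datetimes recovers the duration, here converted to minutes
elapsed_minutes = (next_reading - start).total_seconds() / 60
print(elapsed_minutes)                         # prints 90.0
```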


### 2.9.3 Extracting Date and Time Information

Extracting date and time from a file or file name can be accomplished using the **‘string-parsed time’** `strptime()` function and formatting codes shown below.

Formatting Codes for Parsing Date and Time Strings (`.strptime()` and `.strftime()`):

| Code | Example  | Description                                 | Length    |
| :--- | :------- | :------------------------------------------ | :-------- |
| `%y` | `01`     | Year without century (00-99)                | Two digits|
| `%Y` | `2001`   | Year with century (e.g., 2001)              | Four digits|
| `%b` | `Jan`    | Month abbreviation (e.g., Jan, Feb)         | Three letters|
| `%B` | `January`| Month full name (e.g., January)             | Varies    |
| `%m` | `01`     | Month as zero-padded number (01-12)         | Two digits|
| `%d` | `05`     | Day of the month with zero padding (01-31)  | Two digits|
| `%H` | `14`     | Hour in 24-hour time (00-23)                | Two digits|
| `%p` | `AM`     | AM or PM                                    | Two letters|
| `%I` | `02`     | Hour in 12-hour time (01-12)                | Two digits|
| `%M` | `16`     | Minute (00-59)                              | Two digits|
| `%S` | `09`     | Second (00-59)                              | Two digits|
| `%f` | `090000` | Microseconds (000000-999999)                | Six digits|

---

These codes allow you to parse strings into `datetime` objects by providing the `strptime()` function with both the string from the data file and a description of how the date and time information is organized. 

For example, below is a file where the collection time is included in the file name as hour, minutes, seconds separated by hyphens.

In [94]:
file_name_1 = 'Absorbance_12-03-48.txt' 
timestamp = datetime.datetime.strptime(file_name_1[-12:-4], '%H-%M-%S')
timestamp

datetime.datetime(1900, 1, 1, 12, 3, 48)

Because the *date (i.e., year, month, and day)* information was not provided, the **default value of January 1, 1900** is used for the datetime object. 

If you only want the **date** or **time** information, you can access them using the `date()` or `time()` methods, respectively.

In [96]:
timestamp.date()

datetime.date(1900, 1, 1)

In [97]:
timestamp.time()

datetime.time(12, 3, 48)
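The same formatting codes work in the other direction with `strftime()`, which turns a `datetime` object back into a string. The sketch below assumes a hypothetical file name that includes both the date and the time.

```python
import datetime

file_name = 'Absorbance_2025-03-14_12-03-48.txt'   # hypothetical file name

# Slice out '2025-03-14_12-03-48' and describe its layout with formatting codes
collected = datetime.datetime.strptime(file_name[-23:-4], '%Y-%m-%d_%H-%M-%S')
print(collected)   # prints 2025-03-14 12:03:48

# strftime() writes the datetime back out as text in whatever layout is requested
label = collected.strftime('%Y/%m/%d %H:%M')
print(label)       # prints 2025/03/14 12:03
```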

If the values are not formatted the way Python expects, a little extra effort may be required. 

For example, below the time is formatted as hours-minutes-seconds-microseconds, but the microseconds are not written as six digits with zero padding. The `%f` code accepts one to six digits and pads shorter values with zeros on the right, so it would read `215` as 215000 microseconds. To store the value as 215 microseconds instead, the digits are sliced out of the file name, converted to an integer, and added to the `datetime` object using the `replace()` method.

In [99]:
# Problem: the file's microseconds ('215') are not written as six zero-padded digits, so %f would read them as 215000.
# Solution: extract the microseconds via slicing, convert to int, then use .replace() to set them.

file_name_2 = 'glucose_Absorbance_12-03-48-215.txt'

time = datetime.datetime.strptime(file_name_2[-16:-8], '%H-%M-%S')
time.replace(microsecond = int(file_name_2[-7:-4]))

datetime.datetime(1900, 1, 1, 12, 3, 48, 215)
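For comparison, the sketch below parses the same file name with `%f` directly. With `strptime()`, `%f` accepts one to six digits and pads on the right with zeros, so `215` is read as 215000 microseconds (215 milliseconds). Whether that or the slicing approach above is appropriate depends on what the instrument actually recorded, which is an assumption you would need to check.

```python
import datetime

file_name_2 = 'glucose_Absorbance_12-03-48-215.txt'

# Let %f read the trailing digits; '215' is padded on the right to '215000'
parsed = datetime.datetime.strptime(file_name_2[-16:-4], '%H-%M-%S-%f')
print(parsed.microsecond)   # prints 215000
```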