# Programming for Chemistry 2025/2026 @ UniMI

![logo](logo_small.png "Logo")

## Lecture 06: Random numbers, file I/O, text processing

This lecture covers miscellaneous topics: random numbers, file I/O, and text processing. If time permits, we will also finish the exercises from the previous lectures.

## 1. Random numbers
The **`random` module** in Python provides functions to generate random numbers and make random selections. It's built on a **pseudo-random number generator**, which means the numbers it produces are deterministic but appear random. This tutorial will cover its most common functions with practical examples.

### 1.1 Core functions
Here are the most frequently used functions in the `random` module:
* **`random.random()`**: Returns a random float between 0.0 and 1.0 (exclusive of 1.0). This is the fundamental building block for many other functions.
* **`random.randint(a, b)`**: Returns a random integer between a and b, **inclusive**. This is ideal for simulating dice rolls or picking a random number from a specific range.
* **`random.uniform(a, b)`**: Returns a random floating-point number between a and b. Unlike `random.random()`, you can specify the range.
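* You can also generate random numbers according to a probability distribution. The most useful is **`random.gauss(mu, sigma)`**, which draws from a Gaussian (normal) distribution with mean `mu` and standard deviation `sigma`.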


In [None]:
# you need to import the random module
import random

In [None]:
# Generate a random float between 0 and 1. Each time you execute the code, you will get different results
for i in range(10):
    print(random.random())

In [None]:
# generate a random integer between 1 and 6
print(random.randint(1,6))

In [None]:
# generate a random float between -1 and 1
print(random.uniform(-1.0, 1.0))

In [None]:
# generate random gaussian distributed numbers with average 9.0 and standard deviation of 0.2
for i in range(10):
    print(random.gauss(9.0, 0.2))
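
As a quick sanity check on `random.gauss`, the sketch below draws many samples and compares their sample mean and standard deviation with the requested values. The sample size of 10 000 is an arbitrary choice, and the `statistics` module from the standard library is used only to keep the example short.

```python
import random
import statistics

# draw many samples from a Gaussian with mean 9.0 and standard deviation 0.2
samples = [random.gauss(9.0, 0.2) for _ in range(10_000)]

# with this many samples, both estimates should land close to the inputs
sample_mean = statistics.mean(samples)
sample_std = statistics.stdev(samples)
print(f"mean = {sample_mean:.3f}, stdev = {sample_std:.3f}")
```

The printed values change slightly at every run, but they should stay within a few thousandths of 9.0 and 0.2.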

Each time you run the code you get different random numbers. If you need the same sequence of random numbers to check that your results are reproducible, you can initialize a **random seed**.

In [None]:
# each time you execute the cell without changing seed, you'll get the same results
random.seed(1492)

for i in range(10):
    print(random.randint(1, 10))
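
To see what "deterministic" means in practice, here is a minimal sketch that seeds the generator twice with the same value and compares the two sequences. The variable names `first_run` and `second_run` are only illustrative.

```python
import random

random.seed(1492)
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(1492)          # reset the generator to the same starting state
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)   # True
```

The actual numbers depend on the seed, but the two lists are always identical to each other.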

The Python `random` module has some useful functions that work on lists and tuples:
* **`random.choice(sequence)`**: Returns a random item from a non-empty sequence, such as a list or a string.
* **`random.shuffle(x)`**: Shuffles the items of a list in place. This means it modifies the original list and doesn't return a new one. This is perfect for shuffling a deck of cards.
* **`random.sample(population, k)`**: Returns a new list containing **k** unique items chosen from the population sequence or set. It's useful for drawing multiple non-repeating items, like drawing lottery numbers.

In [None]:
# Select a random color from a list
colors = ['red', 'blue', 'green', 'yellow']
random_color = random.choice(colors)
print(f"Random color: {random_color}")

# Select a random letter from a string
random_letter = random.choice("Hello, World!")
print(f"Random letter: {random_letter}")

In [None]:
cards = ['Ace', 'King', 'Queen', 'Jack']
random.shuffle(cards)
print(f"Shuffled cards: {cards}")

In [None]:
# Pick 5 unique numbers from 1 to 90 for a lottery
lottery_numbers = random.sample(range(1, 91), 5)
print(f"Lottery numbers: {lottery_numbers}")
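
A common pitfall is the difference between `random.sample()`, which returns a new list, and `random.shuffle()`, which modifies the list in place and returns `None`. The following sketch illustrates both behaviours on the same small list.

```python
import random

cards = ['Ace', 'King', 'Queen', 'Jack']

# sample() returns a new list and leaves the original untouched
hand = random.sample(cards, 2)
print(hand, cards)

# shuffle() works in place and returns None, so do not assign its result
result = random.shuffle(cards)
print(result)          # None
print(sorted(cards))   # the same four cards, only their order may have changed
```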

### Exercise 1: Roll one die
Roll one die 1000 times and check that the probability of each number is close to 1/6.

In [None]:
# insert code here

### Exercise 2: Roll two dice
Roll two dice at least 1000 times and check that the probability distribution of their sum is peaked in the middle (at 7). Print a simple text histogram of the probabilities.

In [None]:
# insert code here

### Exercise 3: Monte Carlo integration
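Generate pairs of random floats $(x, y)$ in $[0,1]$ and estimate the area of a quarter of the unit circle by counting how many pairs fall inside it, i.e. satisfy $x^2 + y^2 \le 1$. Since that area is $\pi/4$, four times the fraction of points inside approximates $\pi$. Note that this is not the most efficient way to estimate $\pi$.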

In [None]:
import math

def monte_carlo_pi(num_points):
   # insert code here

In [None]:
approx_pi = monte_carlo_pi(10_000_000)
print(f"Approximation of pi: {approx_pi:.15f}")
print(f"Exact pi:            {math.pi}")

### Exercise 4: random password generator
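Generate a password with a mix of character types: letters, digits, and punctuation. You can use **`random.choice()`** and **`random.shuffle()`** to generate one. Note that the `random` module is not cryptographically secure; for real passwords, use the standard-library `secrets` module instead.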

In [None]:
def generate_password(length=12):
    # insert code here

In [None]:
print(generate_password(8))

## 2. File I/O & text processing
Working with files is a fundamental part of programming in Python. Python provides built-in functions to handle files, allowing you to **read from**, **write to**, and **modify** them.

Files can be **text** or **binary** files. Text files are human readable and contain only printable characters and numbers. Binary files are used for images, videos, sound, documents, compressed data. We will mostly work with text files.

### 2.1 Opening and Closing Files

To interact with a file, you first need to **open** it. The `open()` function is used for this purpose. It takes two primary arguments: the **file path** and the **mode** in which you want to open the file. The most common modes are:

  * `'r'` - **Read mode** (default). Used for reading content from a file.
  * `'w'` - **Write mode**. Creates a new file or overwrites an existing one.
  * `'a'` - **Append mode**. Adds new content to the end of a file without erasing the existing content.

It's crucial to **close** a file after you've finished working with it to free up system resources. You can do this with the `file.close()` method. A better and more common practice is to use a `with` statement, which automatically handles the closing of the file for you, even if errors occur.

**Explanation:**
The `with open(...) as f:` block creates a file object and assigns it to the variable `f`. The `'w'` mode ensures that if `notes.txt` already exists, its content is erased and replaced by the new text. The `.write()` method writes the given string to the file. The `\n` is the newline character, which ends the current line; `.write()` does not add it for you. When the `with` block finishes, the file is closed automatically.

In [None]:
# Let's say you want to create a file named `notes.txt` and write some text into it.

# using open/close
f = open('notes.txt', 'w')
f.write('This is the first line.\n')
f.write('This is the second line.\n')
f.write('This is the third line.')
f.close()

# using with open, it is safer
with open('notes.txt', 'w') as f:
    f.write('This is the first line.\n')
    f.write('This is the second line.\n')
    f.write('This is the third line.')

In [None]:
!cat notes.txt
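
The append mode `'a'` listed above is not used in the previous cell, so here is a minimal sketch of it. To avoid touching `notes.txt`, it writes to a hypothetical file `append_demo.txt` inside a temporary directory; both names are assumptions made for this example.

```python
import os
import tempfile

# hypothetical file name, created in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'append_demo.txt')

with open(path, 'w') as f:      # 'w' creates (or overwrites) the file
    f.write('first line\n')

with open(path, 'a') as f:      # 'a' keeps the existing content and adds to the end
    f.write('second line\n')

with open(path) as f:           # the mode defaults to 'r'
    content = f.read()

print(content)
```

Opening the same file again with `'w'` instead of `'a'` would have discarded the first line.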

### 2.2 Reading a file
Python offers several ways to read a text file, including reading the entire content at once or reading it line by line.
* the `read()` method reads the entire file into a single string
* the `readlines()` method returns a list of strings, one for each line
* the `readline()` method reads a single line, returning `''` when the end of the file is reached

Note that the `\n` at the end of each line is retained, so you have to deal with it yourself (for instance with `line.rstrip('\n')`).

In [None]:
with open('notes.txt', 'r') as file:
    content = file.read()
    print(content)

In [None]:
with open('notes.txt', 'r') as file:
    content = file.readlines()
    print(content)

In [None]:
with open('notes.txt', 'r') as file:
    while True:                    # pay attention to possible infinite loop
        line = file.readline()
        if line == '':             # an empty string signals the end of the file
            break
        print(line, end='')        # the line already ends with '\n'
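
A file object can also be used directly in a `for` loop, which yields one line per iteration and stops at the end of the file without any explicit check. The sketch below combines this with `rstrip('\n')` to remove the trailing newlines; the file `lines_demo.txt` and its content are made up for the example.

```python
import os
import tempfile

# hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
with open(path, 'w') as f:
    f.write('alpha\nbeta\ngamma')

stripped = []
with open(path, 'r') as f:
    for line in f:                           # one line per iteration
        stripped.append(line.rstrip('\n'))   # drop the trailing newline, if any

print(stripped)   # ['alpha', 'beta', 'gamma']
```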

### Exercise 1: count characters, words and lines
Open the text file `julius_caesar.txt`. Count the number of characters, words, and lines. Use `.split()` to split the text into words.

In [None]:
# insert code here

In [None]:
print(lines, words, chars)

In [None]:
# the UNIX wc command does the same
!wc julius_caesar.txt

### Exercise 2: mean and standard deviation
Open the file `nacl_MD.dat`. It's a multicolumn text file. Compute the mean value and standard deviation of the second column.

Hint: `value = float(line.split()[1])`
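
To show how the hint works, here is a sketch that parses a short multicolumn text held in a string rather than in a file. The numbers are invented, and the convention that lines starting with `#` are comments is an assumption: the real layout of `nacl_MD.dat` may differ.

```python
# hypothetical two-line excerpt; the real file layout may differ
text = "0.0  -1.25  300.1\n0.5  -1.30  299.8\n"

second_column = []
for line in text.splitlines():
    fields = line.split()            # split on any run of whitespace
    if not fields or fields[0].startswith('#'):
        continue                     # skip blank lines and (assumed) comment lines
    second_column.append(float(fields[1]))

print(second_column)   # [-1.25, -1.3]
```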

In [None]:
# insert code here

In [None]:
print(f'{mean} +/- {stddev}')

### Exercise 3: chemical formula parser
Write a function that converts a chemical formula into a dictionary of key=atom, value=number of atoms. For instance:
* `'NaCl'` => `{'Na':1, 'Cl':1}`
* `'Ca(OH)2'` => `{'Ca':1, 'O':2, 'H':2}`
* `'Ca5Cl10'` => `{'Ca':5, 'Cl':10}`

In the last case, optionally reduce the formula to a single formula unit:
* `'Ca5Cl10'` => `{'Ca':1, 'Cl':2}`

To help you, I provide the list of elements of the periodic table and a function to split the formula into its components using **regular expressions**.

In [None]:
elements = ['H', 'He', 
            'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 
            'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 
            'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr',
            'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe',
            'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn',
            'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'Cn', 'Nh', 'Fl', 'Mc', 'Lv', 'Ts', 'Og']

In [None]:
# split a formula into elements, numbers and parentheses and return a list
# e.g. 'Ca(OH)2' => ['Ca', '(', 'O', 'H', ')', '2']
import re

def _split_formula(formula):
    formula = formula.strip()
    return re.findall(r'([A-Z][a-z]?|\d+|\(|\))', formula)

for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10']:
    print(f, '=>', _split_formula(f))
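
Note that `re.findall` silently skips any character that matches none of the alternatives, so a malformed formula does not raise an error by itself. One possible way to detect this, sketched below, is to join the tokens back together and compare them with the input. The helper name `tokens_cover_formula` is hypothetical and not part of the exercise.

```python
import re

# same alternatives as in _split_formula: element symbol, number, '(' or ')'
TOKEN = r'[A-Z][a-z]?|\d+|\(|\)'

def tokens_cover_formula(formula):
    """Return True if the tokens, joined back together, reproduce the formula."""
    formula = formula.strip()
    tokens = re.findall(TOKEN, formula)
    return ''.join(tokens) == formula

print(tokens_cover_formula('Ca(OH)2'))   # True
print(tokens_cover_formula('Ca[OH]2'))   # False: '[' and ']' are skipped
print(tokens_cover_formula('Wrong'))     # False: only 'Wr' is matched
```

This check only catches stray characters. A token such as `'Wr'` still looks like an element symbol to the regular expression, which is why the `elements` list is needed to validate it.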

In [None]:
# write a function that converts a formula into a list of lists
# e.g. 'Ca(OH)2' => [['Ca'], ['O', 'H'], ['O', 'H']]

def _formula_to_list(formula):
    # insert code here

In [None]:
for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10', 'Wrong']:
    print(f, '=>', _formula_to_list(f))

In [None]:
# write a RECURSIVE function that "flattens" a list
# e.g.: [['Ca'], ['O', 'H'], ['O', 'H']] => ['Ca', 'O', 'H', 'O', 'H']

def _flatten_list(mylist):
    # insert code here

In [None]:
for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10']:
    print(f, '=>', _flatten_list(_formula_to_list(f)))

In [None]:
# write a function that takes the flattened list and returns a dictionary
# e.g. ['Ca', 'O', 'H', 'O', 'H'] => {'Ca':1, 'O':2, 'H': 2}

def _flatlist_to_dict(mylist):
    # insert code here

In [None]:
for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10']:
    print(f, '=>', _flatlist_to_dict(_flatten_list(_formula_to_list(f))))

In [None]:
# finally write a function that converts a formula string into the dictionary,
# optionally dividing by the number of formula units, found with
# math.gcd(*list_of_numbers) (multiple arguments require Python 3.9+)
import math

def formula_to_dict(formula, reduce=False):
    # insert code here

In [None]:
for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10']:
    print(f, '=>', formula_to_dict(f))

In [None]:
for f in ['NaCl', 'Ca(OH)2', 'Fe2Cd(H2O)3Na', 'Ca5Cl10']:
    print(f, '=>', formula_to_dict(f, reduce=True))
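
A remark on `math.gcd`: it takes integers as separate arguments, not a list, and accepts more than two of them only from Python 3.9 onwards. The sketch below uses `functools.reduce` to apply it pairwise, which also works on older versions. The dictionary `counts` is a hand-written example, not the output of the functions above.

```python
import math
from functools import reduce

counts = {'Ca': 5, 'Cl': 10}

# reduce() applies math.gcd pairwise over all the values
divisor = reduce(math.gcd, counts.values())

# integer division keeps the counts as ints
reduced = {atom: n // divisor for atom, n in counts.items()}
print(divisor, reduced)   # 5 {'Ca': 1, 'Cl': 2}
```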

### Exercise 4: calculate possible oxidation states in a molecule
Write a function that, given a molecular formula (as a dictionary of atom counts) and a table of common oxidation states, finds the possible oxidation states of each element such that their sum, weighted by the number of atoms, is equal to the charge of the molecule or ion.

In [None]:
# most common oxidation states, this table is incomplete
valence = { 'H': [1],
            'Li': [1], 'Na': [1], 'K': [1], 'Rb': [1], 'Cs': [1],
            'Be': [2], 'Mg': [2], 'Ca': [2], 'Sr': [2], 'Ba': [2],
            'B': [3], 'Al': [3], 'Ga': [3], 'In': [3],
            'C': [4,2,-4], 'Si': [4,-4], 'Ge': [4,-4], 'Sn': [4,2],
            'N': [5,4,3,2,1,-3], 'P': [5,3,-3], 'As': [5,3,-3], 'Bi': [5,3],
            'O': [2,-2,-1,-0.5], 'S':[6,4,2,-2], 'Se': [6,4,2,-2],
            'F': [-1], 'Cl': [7,6,5,4,3,2,1,-1], 'Br': [5,3,1,-1],
            'Sc': [3], 'Ti': [4,3,2], 'V': [5,4,3,2], 'Cr': [6,5,4,3,2], 'Mn': [7,6,4,3,2],
            'Fe': [3,2], 'Co': [3,2], 'Ni': [2], 'Cu': [2,1], 'Zn': [2],
            'Y': [3] }

In [None]:
# itertools.product(list1, list2, ...) makes the cartesian product of lists
from itertools import product

# Write a function that tries every possible combination of oxidation states and
# prints them when their sum (weighted by the number of atoms) is equal to the charge of the ion
def oxidation_states(formula_dict, charge=0):
    # insert code here
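
To illustrate what `itertools.product` returns, here is a minimal sketch on a hypothetical two-element table, much smaller than `valence`. Each tuple contains one candidate oxidation state per element, in the same order as the dictionary keys.

```python
from itertools import product

# hypothetical two-element example with candidate oxidation states
candidates = {'H': [1], 'O': [-2, -1]}

# the * unpacks the lists so that each one becomes a separate argument
combos = list(product(*candidates.values()))
print(combos)   # [(1, -2), (1, -1)]
```

The number of combinations is the product of the list lengths, so it grows quickly for formulas containing elements with many oxidation states.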

In [None]:
oxidation_states(formula_to_dict('KOH'))

In [None]:
oxidation_states(formula_to_dict('LiO2'))

In [None]:
oxidation_states(formula_to_dict('H3PO4'))

In [None]:
oxidation_states(formula_to_dict('SO4'), -2)

In [None]:
oxidation_states(formula_to_dict('H5O2'), +1)

In [None]:
oxidation_states(formula_to_dict('BaTiO3'))
oxidation_states(formula_to_dict('YTiO3'))

In [None]:
oxidation_states(formula_to_dict('BaBiO3', False))

In [None]:
oxidation_states({'Ba':2, 'Bi1':1, 'Bi2':1, 'O':6})