<a href="https://colab.research.google.com/github/AI4ChemS/CHE-1147/blob/main/tutorials/tutorial_01_python_refresher.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐍 Python Refresher  

In this notebook, we’ll do a quick refresher of Python basics.  
You’ll need this foundation for everything else we’ll do in the course (data analysis, machine learning, and chemical engineering applications).  

---

### How to use this notebook:
- **Code cells** → run them by pressing `Shift + Enter`.  
- **Markdown cells** → contain explanations (like this one).  
- You can edit cells, re-run them, and play around.  


# 0. Coding in 2025!

> ⚠️ **GenAI is your coding assistant, not your substitute!**  
> You are encouraged to use GenAI tools to help you learn and code.  
> However, **you are responsible for understanding, testing, and verifying all code you submit**.  
> Always review and document any GenAI-assisted work according to course policy.

Most IDEs now have GenAI integration. Whether you use Visual Studio Code or Google Colab, you can use a GenAI model to help you with coding.

**GenAI Coding Tips for Beginners:**
- Start with simple questions (e.g., "How do I make a list in Python?").
- Ask for code examples and explanations.
- If you don’t understand the answer, ask for a step-by-step breakdown.
- Try the code yourself and experiment with changes.
- Always check and test the code before using it in your work.


## 🤖 Using Gemini inside Google Colab

Colab now integrates with **Gemini**, Google’s large language model.  
You can ask Gemini questions about your code directly inside Colab.

### How to use it:
- Highlight code → Right click → "Ask Gemini"  
- Or click the **Gemini button (spark icon)** in the sidebar.  
- You can ask things like:
  - "Explain why this error happens."
  - "Refactor this loop into NumPy vectorized code."
  - "Write a docstring and unit tests for this function."
  - "Plot this data with seaborn instead of matplotlib."

---

### ✅ Tips for Colab + Gemini
- Keep prompts **small and focused** (paste the error or 5–10 lines of code).  
- Always **run and test the code yourself**; Gemini may hallucinate APIs.  
- Use it to **learn and debug**, not to paste entire assignment solutions.  
- Document usage in assignments with an **LLM Usage Note** (same policy as ChatGPT).  

---

### ⚡ Example Workflow
1. Write some code and run it.  
2. If you get an error → highlight the cell → "Ask Gemini".  
3. Copy Gemini’s suggestion back into a new cell.  
4. Run again and verify.  


---

Let’s start with the most important first step: saying hello. 👋

In [None]:
print("Hello CHE1147! 🎉")

✅ Great! If you see `Hello CHE1147! 🎉` above, your Python environment is working.  

Next, we’ll dive into Python basics: variables, types, and operations.

# 1. Python Basics

In Python, everything is an object — numbers, text, even functions.  
We’ll start with some core building blocks:

- **Variables**: names that store values.  
- **Types**: integers, floats, strings, booleans.  
- **Casting**: converting between types.  

Let’s play!


In [None]:
# Integers
atoms = 6
print(atoms, type(atoms))

# Float (decimal number)
molecular_weight = 44.01
print(molecular_weight, type(molecular_weight))

# String (text)
name = "Carbon Dioxide"
print(name, type(name))

# Boolean (True/False)
is_gas = True
print(is_gas, type(is_gas))

Notice how `type(...)` tells us the kind of value a variable holds.  
We’ll often move between types, for example when converting numbers to text or rounding floats.

In [None]:
# Converting between types
grams = "88.02"       # stored as string
grams_float = float(grams)   # cast to float
print("Mass in grams:", grams_float)

atoms_str = str(atoms)       # convert int → string
print("We have " + atoms_str + " atoms in our molecule.")
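
The cell above converts between strings and numbers but does not show rounding, so here is a minimal sketch of it alongside a common casting pitfall. The variable names (`mw_precise`, `whole_grams`) are made up for this illustration and are not used elsewhere in the notebook.

```python
# Illustrative value; more digits than we want to display
mw_precise = 44.0098  # g/mol

# round() returns a new number with fewer decimals
mw_rounded = round(mw_precise, 2)
print(mw_rounded)

# An f-string format spec only changes how the number is displayed
print(f"MW = {mw_precise:.1f} g/mol")

# int() truncates toward zero; it does not round
whole_grams = int(88.92)
print(whole_grams)

# int() cannot parse a decimal string directly, so go through float() first
try:
    int("88.02")
except ValueError as err:
    print("Cannot convert:", err)

print(int(float("88.02")))
```

If you need the nearest integer rather than truncation, `round(88.92)` is usually the better choice.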

✅ **Mini Exercise**

- Define a variable `mass = 88.02` (grams of CO₂).  
- Use the molecular weight (MW = 44.01 g/mol) to calculate the number of moles.  
- Print the result with a descriptive message.  

Hint: moles = mass / MW


In [None]:
# mass = 88.02       # grams
# MW = 44.01         # g/mol
# moles = mass / MW
# print(f"Mass = {mass} g, MW = {MW} g/mol → {moles:.2f} mol CO₂")

# 2. Collections

Often we need to store **multiple values** in one variable.  
Python gives us several **collection types**:

- **List** → ordered, changeable (`[]`)  
- **Tuple** → ordered, unchangeable (`()`)  
- **Dictionary** → key–value pairs (`{}`)  

Let’s see how each works.


### Lists

- Created with square brackets `[]`  
- Can store numbers, strings, or even other lists  
- We can **index** items (`list[0]`)  
- We can **slice** (`list[1:3]`)  
- We can **append** or modify items

In [None]:
# Example: list of atom symbols in CO2
atoms = ["C", "O", "O"]

print("Atoms:", atoms)
print("First atom:", atoms[0])       # indexing
print("Last atom:", atoms[-1])      # negative indexing
print("Slice:", atoms[0:2])         # slicing

# Lists are mutable: we can change them
atoms.append("O")  # add another O
print("Modified atoms:", atoms)


### Tuples

- Created with parentheses `()`  
- Similar to lists, but **immutable** (cannot be changed)  
- Useful when you want data to stay fixed

In [None]:
# Example: critical point of CO2 (T, P)
critical_point = (304.2, 73.8)  # Kelvin, bar

print("Critical point (T, P):", critical_point)

# Tuples are immutable:
# critical_point[0] = 310  # ❌ will give an error
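
A small sketch of two things the cell above hints at: unpacking a tuple into separate variables, and what actually happens when you try to modify it. The names `T_c` and `P_c` are assumptions chosen for this example.

```python
critical_point = (304.2, 73.8)  # Kelvin, bar

# Unpacking: assign each tuple element to its own variable
T_c, P_c = critical_point
print("T_c =", T_c, "K")
print("P_c =", P_c, "bar")

# Trying to change an element raises a TypeError instead of modifying the tuple
try:
    critical_point[0] = 310
except TypeError as err:
    message = str(err)
    print("Error:", message)

# The tuple is unchanged
print(critical_point)
```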


### Dictionaries

- Created with curly braces `{}`  
- Store data as **key–value pairs**  
- Keys are unique identifiers, values hold the data  


In [None]:
# Example: properties of CO2
CO2 = {
    "name": "Carbon Dioxide",
    "MW": 44.01,
    "critical_T": 304.2,   # K
    "critical_P": 73.8,    # bar
    "atoms": ["C", "O", "O"]
}

print("Molecule name:", CO2["name"])
print("Molecular weight:", CO2["MW"])
print("Atoms:", CO2["atoms"])
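
Dictionaries are mutable, so beyond looking up values you can add or update keys and loop over the contents. The sketch below uses a shortened copy of the `CO2` dictionary; the `"phase"` and `"boiling_point"` keys are hypothetical and only serve the illustration.

```python
CO2 = {
    "name": "Carbon Dioxide",
    "MW": 44.01,
}

# Add a new key-value pair (hypothetical key for this example)
CO2["phase"] = "gas"

# Update an existing value
CO2["name"] = "Carbon dioxide"

# .get() returns a fallback instead of raising KeyError when a key is missing
boiling = CO2.get("boiling_point", "not recorded")
print("Boiling point:", boiling)

# Loop over all key-value pairs
for key, value in CO2.items():
    print(key, "->", value)
```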


### Mini Exercise

1. Create a list of molecules: `["CO2", "H2O", "CH4"]`.  
2. Turn it into a tuple (so it cannot be changed).  
3. Make a dictionary for `"H2O"` with:  
   - Name: Water  
   - MW: 18.02  
   - Critical temperature: 647 K  
   - Critical pressure: 220 bar  
4. Print out the dictionary values.


# 3. Control Flow

Control flow lets us **make decisions** and **repeat tasks** in code.  
We’ll look at:

- `if / elif / else` → conditional logic  
- `for` loops → repeat over a sequence (e.g., a list of molecules)  
- `while` loops → repeat until a condition is met

### 🟢 If / Elif / Else

An **if statement** allows your code to make decisions:  
- `if` → run only if a condition is `True`  
- `elif` → (else if) add more conditions  
- `else` → run if none of the above are true

In [None]:
temperature = 350  # Kelvin (the thresholds below are approximate values for water at 1 atm)

if temperature < 273:
    print("The substance is solid ❄️")
elif temperature < 373:
    print("The substance is liquid 💧")
else:
    print("The substance is gas 💨")

### 🔁 For Loops

A **for loop** repeats code for each element in a sequence (list, string, etc.).  
- Use `for item in sequence:`  
- Inside the loop, `item` is updated each time.  

In [None]:
molecules = ["CO2", "H2O", "CH4", "O2"]

for mol in molecules:
    print("Analyzing:", mol)
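
Two variations that come up often, sketched here as an optional extra: `enumerate()` when you also need the position of each item, and looping over a string, which is a sequence of characters. The list names `labels` and `characters` are made up for this example.

```python
molecules = ["CO2", "H2O", "CH4", "O2"]

# enumerate() yields (position, item) pairs; start=1 makes the count start at 1
labels = []
for i, mol in enumerate(molecules, start=1):
    labels.append(f"{i}: {mol}")
print(labels)

# A string is also a sequence, so the loop visits one character at a time
characters = []
for ch in "CH4":
    characters.append(ch)
print(characters)
```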

### 🔄 While Loops

A **while loop** repeats code until a condition becomes false.  
⚠️ Be careful: if the condition never becomes false, the loop will run forever!  

In [None]:
pressure = 1  # atm
while pressure < 5:
    print("Current pressure:", pressure, "atm")
    pressure += 1
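
One way to protect against the infinite-loop problem described above is to add a second condition that caps the number of iterations. This is only a sketch; `target`, `max_steps`, and the doubling rule are assumptions made up for the example.

```python
pressure = 1.0    # atm
target = 50.0     # atm, stop once we reach or exceed this
max_steps = 100   # safety cap so the loop always terminates
steps = 0

while pressure < target and steps < max_steps:
    pressure *= 2  # double the pressure each step
    steps += 1

print("Final pressure:", pressure, "atm after", steps, "steps")
```

Here the loop stops because the pressure condition becomes false; if the update rule were wrong (for example, multiplying by 1), the `max_steps` cap would end the loop instead.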

### Mini Exercise

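Write a loop that goes through the list below and classifies each molecular weight as light (< 20 g/mol), medium (< 40 g/mol), or heavy (everything else):

```python
MWs = [44.01, 18.02, 16.04, 32.00]  # CO2, H2O, CH4, O2
```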

In [None]:
## Your code here

Example solution:
```python
MWs = [44.01, 18.02, 16.04, 32.00]

for mw in MWs:
    if mw < 20:
        print(mw, "→ Light molecule")
    elif mw < 40:
        print(mw, "→ Medium molecule")
    else:
        print(mw, "→ Heavy molecule")
```


# 4. Functions

A **function** is a reusable block of code.  
Instead of repeating yourself, you can **define once** and **call many times**.

- Define with `def`  
- Provide **arguments** (inputs)  
- Return a value with `return`  


In [None]:
# Define a function that greets the user
def greet(name):
    """Print a friendly greeting message."""
    print(f"Hello {name}, welcome to CHE1147! 🎉")

# Call the function
greet("Lya")
greet("Mohamad")


In [None]:
# Function to calculate moles from mass and molecular weight
def calc_moles(mass, MW):
    """
    Calculate number of moles given mass (g) and molecular weight (g/mol).
    Returns moles (float).
    """
    return mass / MW

# Example usage
mass = 88.02   # grams
MW = 44.01     # g/mol
print(f"Moles of CO2: {calc_moles(mass, MW):.2f} mol")


In [None]:
# Function with a default molecular weight
def calc_moles_CO2(mass, MW=44.01):
    """
    Calculate number of moles.
    By default, uses MW of CO2 (44.01 g/mol).
    """
    return mass / MW

print("Default (CO2):", calc_moles_CO2(88.02), "mol")
print("Custom MW (H2O):", calc_moles_CO2(18.02, MW=18.02), "mol")


In [None]:
# Function that takes two parallel lists (masses and MWs) and returns a list of moles
def calc_moles_list(masses, MWs):
    """
    Calculate moles for each molecule in lists of masses and MWs.
    Returns a list of moles.
    """
    results = []
    for m, w in zip(masses, MWs):
        results.append(m / w)
    return results

masses = [88.02, 18.02, 16.04]   # g
MWs    = [44.01, 18.02, 16.04]   # g/mol
print("Moles list:", calc_moles_list(masses, MWs))
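
Note that `zip` stops at the shorter list without warning, so mismatched inputs would silently drop values. A possible variant, sketched below under the hypothetical name `calc_moles_checked`, adds a length check and uses a list comprehension in place of the explicit loop.

```python
def calc_moles_checked(masses, MWs):
    """
    Hypothetical variant of calc_moles_list for illustration.
    Raises ValueError if the two lists have different lengths.
    """
    if len(masses) != len(MWs):
        raise ValueError("masses and MWs must have the same length")
    # List comprehension: build the result list in one expression
    return [m / w for m, w in zip(masses, MWs)]

masses = [88.02, 18.02, 16.04]   # g
MWs    = [44.01, 18.02, 16.04]   # g/mol
moles = calc_moles_checked(masses, MWs)
print("Moles list:", moles)

# Mismatched lengths now fail loudly instead of dropping data
try:
    calc_moles_checked([88.02, 18.02], [44.01])
except ValueError as err:
    error_message = str(err)
    print("Error:", error_message)
```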


### Mini Exercise

1. Write a function `density(mass, volume)` that returns density (g/cm³).  
2. Test it with mass = 10 g and volume = 2 cm³.  
3. Extend the function by giving volume a default value of 1 cm³.  


# 5. Modules & Libraries

Python is powerful because we can **import modules** (extra functionality).  

- `import module` → bring in the whole module  
- `from module import function` → bring in just what you need  

We’ll use both **standard library modules** (built into Python) and **external libraries** (installed separately, e.g., NumPy, Pandas, Matplotlib).
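
The two import styles listed above, plus the common `as` alias, can be compared side by side. This sketch uses only standard library modules; the alias `stats` is an arbitrary choice for the example.

```python
# Style 1: import the whole module, then use the module name as a prefix
import math
print(math.sqrt(16))

# Style 2: import only the names you need, then use them without a prefix
from math import sqrt, pi
print(sqrt(16))
print(pi)

# Style 3: import a module under a shorter alias
import statistics as stats
print(stats.mean([1, 2, 3, 4]))
```

Style 1 makes it obvious where each function comes from, which tends to help in longer notebooks.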


In [None]:
import math

# Using functions from math
print("Square root of 16:", math.sqrt(16))
print("Sine of 30 degrees:", math.sin(math.radians(30)))
print("log_e(10):", math.log(10))

In [None]:
import random

# Generate random numbers
print("Random number between 0 and 1:", random.random())
print("Random integer between 1 and 10:", random.randint(1, 10))

# Randomly select from a list
molecules = ["CO2", "H2O", "CH4", "O2"]
print("Random molecule:", random.choice(molecules))


### NumPy

NumPy = **Numerical Python**  
- Efficient arrays (faster than Python lists)  
- Useful for vectorized math, linear algebra, etc.

In [None]:
import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

# Math operations
print("Array * 2:", arr * 2)
print("Array squared:", arr ** 2)

# Statistics
print("Mean:", arr.mean())
print("Standard deviation:", arr.std())
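
To make "vectorized math" concrete, the sketch below computes moles for several molecules twice: once with an explicit Python loop and once with a single array expression. It assumes NumPy is installed (it is on Colab) and reuses the mass and MW values from the functions section.

```python
import numpy as np

masses = np.array([88.02, 18.02, 16.04])   # g
MWs = np.array([44.01, 18.02, 16.04])      # g/mol

# Loop version: divide one pair at a time
moles_loop = []
for m, w in zip(masses, MWs):
    moles_loop.append(m / w)

# Vectorized version: NumPy divides element by element in one expression
moles_vec = masses / MWs

print("Loop:      ", moles_loop)
print("Vectorized:", moles_vec)
```

Both approaches give the same numbers; the vectorized form is shorter and, for large arrays, usually much faster.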


In [None]:
import matplotlib.pyplot as plt

# Create x values from 0 to 2π
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

plt.plot(x, y, label="sin(x)")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Simple Sine Wave")
plt.legend()
plt.show()


### Mini Exercise

1. Use the `math` module to compute the factorial of 5.  
2. Use the `random` module to simulate rolling a 6-sided die 5 times.  
3. Use NumPy to create an array of molecular weights `[44.01, 18.02, 16.04, 32.00]` and calculate:
   - The mean MW  
   - The maximum MW  


### 📦 Installing Packages

Not all libraries come with Python by default.  
We often need to install them first:

- With **pip** (Python’s built-in package manager):  
  ```bash
  pip install numpy
  ```

- With **conda** (if you’re using Anaconda/Miniconda):
  ```bash
  conda install numpy
  ```



### 📦 Installing Packages *inside Jupyter Notebooks*

You don’t always need to leave Jupyter to install a package.  
In a code cell, either prefix the command with `!` (runs it as a shell command) or use the `%pip install` magic command:

- Using pip:
```python
!pip install numpy
# or, using the magic command:
%pip install numpy
```


### 📦 Installing Packages on Google Colab

Google Colab already includes many packages (NumPy, Pandas, Matplotlib, Scikit-learn, etc.).  
But for more specialized libraries (like `rdkit`, `chemprop`, or `pyarrow`), you may need to install them manually.

Use `!pip install` just like in Jupyter:

```python
!pip install rdkit
```


### ✅ Mini Exercise

1. Try importing a package like `pandas` or `seaborn` in a code cell.  
2. If the import fails, install the package with pip or conda.  
3. Once installed, import it and check the version:

```python
import pandas as pd
print(pd.__version__)
```


# 6. Visualizing and Plotting with Matplotlib

Visualization is essential in data science.  
We’ll use **Matplotlib**, the standard plotting library in Python.

- `plt.plot()` → line plots  
- `plt.scatter()` → scatter plots  
- `plt.bar()` → bar charts  


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Temperature data (K)
time = np.arange(0, 10, 1)
temperature = [300, 305, 310, 315, 317, 320, 325, 330, 335, 340]

plt.plot(time, temperature, marker="o")
plt.xlabel("Time (s)")
plt.ylabel("Temperature (K)")
plt.title("Temperature rise over time")
plt.show()

In [None]:
# scatter plot

molecules = ["CO2", "H2O", "CH4", "O2"]
MW = [44.01, 18.02, 16.04, 32.00]        # g/mol
boiling_points = [-78.5, 100, -161.5, -183.0]  # °C

plt.scatter(MW, boiling_points, color="red")

for i, mol in enumerate(molecules):
    plt.text(MW[i] + 0.5, boiling_points[i], mol)

plt.xlabel("Molecular Weight (g/mol)")
plt.ylabel("Boiling Point (°C)")
plt.title("Molecular Weight vs Boiling Point")
plt.grid(True)
plt.show()


In [None]:
# Bar plot

molecules = ["CO2", "H2O", "CH4"]
critical_T = [304.2, 647.0, 190.6]  # Kelvin

plt.bar(molecules, critical_T, color=["green", "blue", "orange"])
plt.ylabel("Critical Temperature (K)")
plt.title("Critical Temperatures of Selected Molecules")
plt.show()

### Mini Exercise

1. Create a list of pressures `[1, 2, 3, 4, 5]` (atm).  
2. Compute volumes using the ideal gas law at T = 298 K, n = 1 mol, R = 0.08206 L·atm/(mol·K).  
   $$
   V = \frac{nRT}{P}
   $$
3. Plot `P` vs `V` as a scatter plot with a line connecting the points.  
4. Label the axes and add a title.
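
As a starting point, here is the formula evaluated for a single pressure. The value P = 10 atm is deliberately not one of the exercise values, so extending this to the list and the plot is still left to you.

```python
# Ideal gas law rearranged for volume: V = n R T / P
n = 1.0        # mol
R = 0.08206    # L·atm/(mol·K)
T = 298.0      # K
P = 10.0       # atm (illustrative value, not from the exercise list)

V = n * R * T / P
print(f"V = {V:.3f} L at P = {P} atm")
```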


# 7. Working with Data (Pandas Intro)

[Pandas](https://pandas.pydata.org/) is the go-to library for working with data in Python.  

- **DataFrame** = table (like Excel, but more powerful)  
- **Series** = a single column of data  


In [None]:
import pandas as pd

# Example: simple molecular dataset
data = {
    "Molecule": ["CO2", "H2O", "CH4", "O2"],
    "MW (g/mol)": [44.01, 18.02, 16.04, 32.00],
    "Boiling Point (°C)": [-78.5, 100, -161.5, -183.0],
    "Critical Temp (K)": [304.2, 647.0, 190.6, 154.6]
}

df = pd.DataFrame(data)
df


In [None]:
print("First few rows:\n", df.head())
print("\nData summary:\n", df.describe())
print("\nColumn names:", df.columns)

In [None]:
# Select a column
print(df["MW (g/mol)"])

# Select multiple columns
print(df[["Molecule", "Boiling Point (°C)"]])

# Select a row by index label (the default labels are 0, 1, 2, ...)
print(df.loc[2])   # third row


In [None]:
# Find molecules heavier than 30 g/mol
heavy = df[df["MW (g/mol)"] > 30]
print(heavy)


In [None]:
df.plot(x="Molecule", y="Boiling Point (°C)", kind="bar", legend=False)
plt.ylabel("Boiling Point (°C)")
plt.title("Boiling Points of Molecules")
plt.show()


### Mini Exercise

1. Add a new column `"Density (g/L)"` with values `[1.98, 0.997, 0.717, 1.429]`.  
2. Filter the DataFrame to show only molecules with density > 1.0 g/L.  
3. Plot `"MW (g/mol)"` vs `"Critical Temp (K)"` as a scatter plot.  
