# 📝 Introduction to Python
In this lesson, we will learn the basics of Python programming. Python is a versatile and powerful programming language that is widely used in various fields, including data science, web development, and automation.

## Why Learn Python?
- Python is easy to read and write.
- It has a large community and many libraries.
- It is used in many industries and applications.

## How We Do This:
- We will start with basic concepts like variables and data types.
- Then, we will move on to more advanced topics like functions and loops.
- Finally, we will apply our knowledge to solve real-world problems.

This is a Jupyter Notebook, which is an interactive way to work with the programming language Python.

Any code inside a code cell can be run here, just as if you were running it in the terminal as a standalone `.py` file.

In [None]:
# this is a comment, preceded by a hashtag - this line will not be treated as code

print("Hello World") # <- this is a print statement, allowing us to output text to the screen

How about if we want to do something a little bit more dynamic?

## Variables: Storing Data  
A **variable** is like a container that holds a value. You can name it anything (e.g., `message`).  
- `message = "Hello World"` creates a variable called `message` and stores the text "Hello World".  
- `print(message)` displays the value stored in `message`.  



In [None]:
message = "Hello World" # <- this is a variable, which stores a value

print(message) # <- this will output the value of the variable to the screen

Values come in lots of different types, which all behave slightly differently.

**Strings** (`"text"`), **integers** (`1`), and **floats** (`1.0`) are different data types.  

In [None]:
a = "Five" # <- this is a string, a sequence of characters
b = 5 # <- this is an integer, a whole number
c = 5.0 # <- this is a float (floating-point number), a number with a decimal point

### ✍️ Student Task: What Happens When You Add Different Types?  
Try adding `b` (an integer) and `c` (a float). Predict the result, then write code below to test it.  
*Hint: It works just like a calculator*

In [None]:
#  Your code here...

We can assign the result of an operation to a new variable.

In [None]:
d = b + c 

print(d) 

### ✍️ Student Task: Errors
Try adding `a` (a string) and `b` (an integer). Predict the result, then write code below to test it.  
*Hint: Python requires compatible types for operations!*

In [None]:
#  Your code here...

## Understanding Errors: What They Are, How to Interpret Them, and Why They're Useful

## What Are Errors?
In programming and data science, an **error** is an indication that 
something unexpected or incorrect has occurred during the execution of 
your code. Errors can arise from a variety of sources, including:

1. **Syntax Errors**: Mistakes in the structure of your code (e.g., typos, 
missing punctuation).
2. **Runtime Errors**: Issues that occur while your code is running (e.g., 
division by zero, accessing a non-existent file).
3. **Logical Errors**: Errors where the code runs without crashing but 
produces incorrect results due to flawed logic or reasoning.
4. **Data-Related Errors**: Issues caused by unexpected or invalid input 
data (e.g., missing values, mismatched data types).

---

## Common Types of Errors in Python
Here are some common error types you might encounter in Python:

1. **`TypeError`**:
   - Occurs when an operation is performed on an object of the incorrect 
type.
   - Example: Trying to add a string and an integer (`"5" + 3`).

2. **`NameError`**:
   - Occurs when Python encounters a variable or function that has not 
been defined.
   - Example: Trying to use `my_variable` before it is assigned.

3. **`ValueError`**:
   - Raised when a function receives an argument of correct type but 
inappropriate value.
   - Example: Passing a negative number to a square root function.
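
As a rough illustration, the three error types above can be triggered on purpose and caught with `try`/`except`, a Python feature that is not otherwise covered in this lesson. The names `errors_seen` and `my_variable` are made up for this sketch.

```python
import math

errors_seen = []  # names of the errors we manage to trigger

try:
    "5" + 3  # adding a string and an integer
except TypeError as err:
    errors_seen.append(type(err).__name__)
    print("TypeError:", err)

try:
    print(my_variable)  # this name has never been assigned
except NameError as err:
    errors_seen.append(type(err).__name__)
    print("NameError:", err)

try:
    math.sqrt(-1)  # right type (a number), but an unsuitable value
except ValueError as err:
    errors_seen.append(type(err).__name__)
    print("ValueError:", err)

print(errors_seen)
```

Each `except` line names the error type it expects, so the message Python prints tells you which category the problem falls into.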

---

## How to Interpret Errors
When an error occurs, Python provides an error message that helps you 
understand what went wrong. For example:


`print("Hello, World!")`

Suppose we misspell `print` as `prnt`:

`prnt("Hello, World!")`

The error message might look like this:

```
NameError: name 'prnt' is not defined
```

This tells you that the function or variable `prnt` does not exist. You 
can then check your code to ensure you spelled it correctly.

## Why Are Errors Useful?
1. **Debugging**:
   - Errors help you identify where and why something went wrong in your 
code. Debugging is the process of finding and fixing errors, and error 
messages are your primary tool for this.

2. **Learning**:
   - Errors are opportunities to learn more about how programming 
languages work. By understanding why an error occurs, you gain a deeper 
appreciation of the language's syntax, semantics, and limitations.

## Best Practices for Handling Errors
1. **Read the Error Message Carefully**: Start by understanding what 
Python is telling you.
2. **Check Your Code Line by Line**: Trace back through your code to 
identify where the error originated.
3. **Test with Smaller Inputs**: Simplify your code or data to isolate the 
problem.

### ✍️🐞 Student Task: Understanding and Correcting Errors (or bugs)

Correct the errors in the examples below.

In [None]:
prnt("Hello World") 

In [None]:
print"Hello World")

We can use Python for more complex calculations as well.
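
The next code cell applies the standard quadratic formula, \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\), to solve \(ax^2 + bx + c = 0\). As a small sketch (the names `discriminant` and `residual_1`/`residual_2` are only for this illustration), we can check the two solutions by substituting them back into the equation. Note that the denominator `2*a` needs its own brackets in Python, because `/ 2*a` would divide by 2 and then multiply by `a`.

```python
a, b, c = 1, 5, 3

# The part under the square root is called the discriminant
discriminant = b**2 - 4*a*c

x_1 = (-b + discriminant**0.5) / (2*a)
x_2 = (-b - discriminant**0.5) / (2*a)

# Substituting a solution back in should give (very nearly) zero
residual_1 = a*x_1**2 + b*x_1 + c
residual_2 = a*x_2**2 + b*x_2 + c

print(x_1, x_2)
print(residual_1, residual_2)
```

The residuals may not be exactly zero because floats are stored with limited precision.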

In [None]:
a = 1
b = 5
c = 3 

x_1 = ( (-b) + (b**2 - 4*a*c)**0.5 ) / (2*a) # <- brackets around 2*a so that we divide by the whole denominator
x_2 = ( (-b) - (b**2 - 4*a*c)**0.5 ) / (2*a)

print(x_1, x_2) # <- this will output the two solutions to the quadratic equation

### ⌨ F-Strings
Sometimes we want an output to look more human-readable. We can combine variables and text using an f-string.

This is just a regular string but with the letter `f` in front of it, like:

`f"Hello World"`

This allows us to also use other code inside the string like:

`f"My favourite number is {6*6}"`
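
An f-string can also control how a number is displayed. In this short sketch (the variable names are chosen just for the example), `:.4f` after the value asks for exactly four digits after the decimal point.

```python
value = 2 / 3

plain = f"Unformatted: {value}"
rounded = f"Four decimal places: {value:.4f}"

print(plain)
print(rounded)
```

The formatting only changes how the number is shown in the text; `value` itself keeps its full precision.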

In [None]:
# this is an f-string, allowing us to insert variables into a string
print(f"The solutions to the quadratic equation are {x_1:.4f} and {x_2:.4f}") 

### ✍️🐞 Student Task: Understanding and Correcting Errors (or bugs)

Correct the *2* errors in the example below.

In [None]:
a = "5"
b = "6"

print(f"{a} + {b} = {a + b})

What about if we want to calculate the solutions to the quadratic equation over and over again? 

We can define a function, which is a wrapper for a specific set of instructions which we need to use frequently

## Functions: Reusable Blocks of Code  
A **function** groups code to perform a specific task.  
- `def function_name(parameters):` defines a function.  
- Use `return` to send a result back.  
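
A minimal sketch of these two points, using a made-up function called `rectangle_area` (it is not part of the lesson's code):

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height  # send the result back to whoever called the function


area = rectangle_area(3, 4)  # the returned value is stored in `area`
print(f"The area is {area}")
```

Because the function returns its result rather than printing it, the value can be reused in later calculations.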

**Example:**  
The `quadratic_solver` function calculates solutions to equations like \(ax^2 + bx + c = 0\).  

In [None]:
def quadratic_solver(a, b, c): # <- this is a function, which takes inputs or arguments
    x_1 = ( (-b) + (b**2 - 4*a*c)**0.5 ) / (2*a) # <- brackets around 2*a so that we divide by the whole denominator
    x_2 = ( (-b) - (b**2 - 4*a*c)**0.5 ) / (2*a)
    print(f"The solutions to the quadratic equation are {x_1:.4f} and {x_2:.4f}") 

In [None]:
quadratic_solver(1, 9, 1) # <- this will call the function

A function can also return values to be used in a subsequent operation

In [None]:
def this_function_makes_pie(): # <- a function doesn't always need to take arguments

    # let's calculate an approximation of pi
    pi = 355/113 

    return pi # <- this will return the value

pi = this_function_makes_pie() 

print(3*pi)

We can do some pretty complicated operations inside a function. For example, let's use a function which generates a simulation of a planet.

Let's download some code that I have already written from the internet. This will save the library into your workspace here so that we can use it in a second.

In [None]:
!wget https://raw.githubusercontent.com/Jools-Clarke/orbyts24/refs/heads/main/planet_visualiser.py

If you are not running this notebook in Google Colab, `wget` might not be installed (for example, on macOS). In that case, use the `curl` command instead, which downloads the file in the same way.

In [None]:
# !curl -O https://raw.githubusercontent.com/Jools-Clarke/orbyts24/refs/heads/main/planet_visualiser.py


We need to import the code from the library we just downloaded, giving it the short name `exo`.

In [None]:
import planet_visualiser as exo # <- this is how we import a module

### 🌍 Generating Planets  
`exo.generate_single_planet()` creates:  
- A random planet image  
- With optional rings/atmosphere  
- Automatically shows the image  
- Returns a planet object that can be saved  

Then we can run the function (try running the cell multiple times)

In [None]:
# this calls the function generate_single_planet; the exo. prefix tells Python to look for it in the module exo
exo.generate_single_planet() 

Let's create a function which returns the solutions of the quadratic equation.

### ✍️ Student Task: Complete the Quadratic Solver  
Write a new version of the `quadratic_solver` function, called `my_quadratic_function`, that **returns** both solutions instead of printing them.  
*Hint: Use `return x_1, x_2` and call the function with `solutions = my_quadratic_function(1, 9, 1)`.*

In [None]:
def my_quadratic_function(
    # Your code here...


Let's test your function.

In [None]:
x1, x2 = my_quadratic_function(1, 9, 1) # <- this will call the function and assign the return values to x1 and x2 variables

print(f'The first solution is {x1:.4f}')
print(f'and the second solution is {x2:.4f}')

## Loops: Repeating Actions  
A **`for` loop** repeats code for each item in a list/array.  

**Example:**  
`for row in parameters:` loops through each row of the `parameters` array.  

In [None]:
for i in [1,2,3,4,7]:
    print(i*3)

Now this works fine for one value at a time, but say we want to do an operation that needs 3 input values, such as the quadratic solver we made above! We need a way to store these sets of 3 values together so we can loop through them.

#### We can do this by putting the inputs into a _matrix_!!


First we need to import a specific set of instructions on how to handle a matrix from an external package called numpy

In [None]:
import numpy as np

We will need some other packages later as well, so let's import them here in advance.

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt
from tqdm import tqdm

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

## NumPy Arrays  
NumPy is a library for working with arrays (grids of numbers).  
- `import numpy as np` imports the library.  
- `parameters = np.array([...])` creates a 2D array.  


# Introduction to NumPy Arrays: Creating, Indexing, and Modifying

## What Are NumPy Arrays?
NumPy  (Numerical Python) is a powerful library for working with arrays of 
numerical data in Python. A **NumPy array** is a grid-like structure 
containing multiple values of the same type.
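
A quick sketch of what "the same type" means in practice (the array names here are illustrative). If whole numbers and decimals are mixed, NumPy converts every value to a float so that the array still holds a single type.

```python
import numpy as np

whole_numbers = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])  # the integers are converted to floats

print(whole_numbers.dtype)  # an integer type (the exact name depends on your system)
print(mixed_numbers.dtype)

# Operations are applied to every element at once
doubled = whole_numbers * 2
print(doubled)
```

The `.dtype` attribute reports the type shared by all of the values in an array.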

## Creating NumPy Arrays with Zeros

One way to create a NumPy array is to initialise it with zeros 
using `numpy.zeros()`, like this:

In [None]:

# Create a 1D array of zeros with shape (5,)
zeros_1d = np.zeros(5)
print(zeros_1d)   # Output: [0. 0. 0. 0. 0.]


In [None]:
# Create a 2D array of zeros with shape (3, 4)
zeros_2d = np.zeros((3, 4))
print(zeros_2d)

The `numpy.zeros()` function takes the desired shape as input and returns 
an array filled with zeros.

---

## Indexing Elements in NumPy Arrays

NumPy arrays can be indexed (searched) using square brackets `[]`. 
For multi-dimensional arrays, you specify indices for each 
dimension separated by commas.

### 1D Array Indexing


In [None]:
# Access the first element of a 1D array (index 0)
print(zeros_1d[0])   # Output: 0.0

### 2D Array Indexing

In [None]:
# Access the element in row 0, column 1
print(zeros_2d[0, 1])   # Output: 0.0

### Modifying Values in NumPy Arrays

You can modify values in a NumPy array by directly assigning new values to 
specific indices.

In [None]:
# Modify the third element (index 2) to 5
zeros_1d[2] = 5
print(zeros_1d)   # Output: [0. 0. 5. 0. 0.]

In [None]:
# Modify the element in row 1, column 2 to 10
zeros_2d[1, 2] = 10
print(zeros_2d)

### Slicing Elements
You can also access subsets of an array using slicing:

In [None]:
# Access the second row and all columns in a 2D array
print(zeros_2d[1, :]) 

In [None]:
# Access all rows and the third column in a 2D array
print(zeros_2d[:, 2])

We can use the `:` (colon) as a placeholder for more than just a full row. We can also write

`:2`, meaning all values before index 2,

or `2:`, meaning all values from index 2 onwards.
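
A small sketch of both forms, using an array of the numbers 0 to 4 so that each value matches its index (the names are only for this example):

```python
import numpy as np

numbers = np.arange(5)  # the values 0, 1, 2, 3, 4

first_two = numbers[:2]  # indices 0 and 1 (index 2 is not included)
from_two = numbers[2:]   # index 2 up to the end

print(first_two)
print(from_two)
```

Together the two slices cover the whole array with no overlap, because the index after the colon in `:2` is excluded while the index before the colon in `2:` is included.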

In [None]:
# Access the first two elements of a 1D array
print(zeros_1d[:2])

### ✍️ Student Task: Changing Numpy Arrays

Modify the element at row 2, column 3 of zeros_2d to 7

In [None]:
zeros_2d[ # Your code here...

print(zeros_2d)

### ✍️ Student Task: Changing Multiple Elements at Once

Set the first row of the new array to `[1,2,3]`.

In [None]:
# Create a new array with zeros
new_array = np.zeros((3, 3))


# Modify specific elements using indexing
new_array[ # Your code here...


print(new_array)

#### Let's get back to the quadratic equations
We define 5 sets of 3 values as a NumPy array.

In [None]:
parameters = np.array([ [1,9,1],
                        [1,5,3],
                        [2,8,3],
                        [1,3,2],
                        [2,6,2] ])

parameters # <- this will output the array to the screen, this is a shortcut for print() we can use in Jupyter

Let's put the loops we covered earlier into use...

In [None]:
for row in parameters: # <- this is a loop, which will iterate over each row in the array
    
    print(row) # <- this will output each row to the screen


### ✍️ Student Task: Solving the quadratic equation all at once
Modify the below code to print the solutions to the quadratic equation instead of the parameters

In [None]:
for row in parameters: # <- this is a loop, which will iterate over each row in the array
    a = row[0]
    b = # Your code here...
    c = # Your code here...

    x1, x2 = # Your code here...
    print(x1, x2) # <- this will output the two solutions for each row to the screen

We can then use the function you created to save these solutions to another array rather than printing them.

In [None]:
# first create an empty matrix in which to store the results
results = np.zeros((5, 2))

results

In [None]:
#then we can input a value into this empty matrix by specifying its position

results[0, 1] = 1 # <- this will input the value 1 into the first row, second column
results[3] = [2,5] # <- this will input the values 2 and 5 into the fourth row

results


### ✍️ Student Task: Fixing a common error 
Suppose you encounter this error:

```
IndexError: list index out of range
```

This means you're trying to access an element in a list using an index 
that is too large. For example:

```python
my_list = [1, 2, 3]
print(my_list[5])  # This will raise IndexError.
```

To fix this:
- Check the length of your list using `len()`.
- Ensure your index is within valid bounds.
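
As a sketch of these two checks (the variable names are illustrative), we can compare an index against `len()` before using it. For a NumPy array, `.shape` gives the size along each dimension.

```python
import numpy as np

my_list = [1, 2, 3]
index = 5

# Valid indices run from 0 up to len(my_list) - 1
if index < len(my_list):
    message = f"Element {index} is {my_list[index]}"
else:
    message = f"Index {index} is too large: valid indices are 0 to {len(my_list) - 1}"
print(message)

# For a 2D array, .shape gives (number of rows, number of columns)
grid = np.zeros((5, 2))
n_rows, n_columns = grid.shape
print(n_rows, n_columns)
```

With NumPy arrays the wording of the error message differs slightly from the list version, but the cause is the same: an index outside the valid range.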

Fix the errors in the code below

_Hint: `print(results.shape)` will tell you the number of rows and columns in `results`_


In [None]:
print(f"the second column of results is {results[:,2]}")

In [None]:
print(f"the third row, first column value of results is {results[0,2]}")

In [None]:
ones = np.ones(5)

print(f'We can modify results by adding 1:\n {results + ones}')

We can also get the information about what iteration we are on inside a loop using `enumerate()` 

In [None]:
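for i, row in enumerate(parameters): # <- this loop will iterate over each row, and also output the index as i
    a, b, c = row
    x1, x2 = my_quadratic_function(a, b, c) # <- solve the equation for this row, using your function from earlier

    print(i, x1, x2)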

Modify this code to save the solutions.

### ✍️ Student Task: Store Results in a Matrix  
Use a loop to calculate and save the quadratic solutions for each row in `parameters` into the `results` array.  
*Hint: Use your `my_quadratic_function` function and assign to `results[i, 0]` and `results[i, 1]`.*

In [None]:
for i, row in enumerate(parameters): # <- this loop will iterate over each row, and also output the index as i
    a, b, c = row
    # Your code here...
    
    results[ # Your code here...

print(results)

We can also do this whole process much faster as one matrix operation, like this. This may look quite off-putting, but because it asks the computer to calculate all of the solutions simultaneously, it is much faster.
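
To see that the two approaches agree, here is a small sketch that solves the same equations once with a loop and once with whole columns at a time, then compares the answers. The names `params`, `loop_roots` and `vector_roots` are chosen for this example only.

```python
import numpy as np

params = np.array([[1, 9, 1],
                   [1, 5, 3],
                   [2, 8, 3]])

# Version 1: one row at a time
loop_roots = np.zeros((len(params), 2))
for i, (a, b, c) in enumerate(params):
    root = (b**2 - 4*a*c) ** 0.5
    loop_roots[i] = [(-b + root) / (2*a), (-b - root) / (2*a)]

# Version 2: a, b and c are whole columns, so every row is solved in one go
a, b, c = params.T
root = (b**2 - 4*a*c) ** 0.5
vector_roots = np.array([(-b + root) / (2*a), (-b - root) / (2*a)]).T

same_answers = np.allclose(loop_roots, vector_roots)
print(same_answers)
```

`np.allclose` compares two arrays element by element, allowing for tiny rounding differences between floats.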

In [None]:
# we can also do this whole process as one matrix transformation like so

p = parameters
results = np.array([( (-p[:,1]) + (p[:,1]**2 - 4*p[:,0]*p[:,2])**0.5 ) / (2*p[:,0]),
              ( (-p[:,1]) - (p[:,1]**2 - 4*p[:,0]*p[:,2])**0.5 ) / (2*p[:,0])]).T

results

We can make it easier to understand by using functions!

In [None]:
def mqs(parameters):
    a, b, c = parameters.T # <- .T is a matrix transpose, this swaps the rows and columns of the matrix, making it easier to work with in this case
    results = np.array([( (-b) + (b**2 - 4*a*c)**0.5 ) / (2*a),
                        ( (-b) - (b**2 - 4*a*c)**0.5 ) / (2*a)]).T # <- then we transpose back to the original shape
    return results

results = mqs(parameters)

results

# 🪐 Using Python as a tool for Exoplanet Analysis

Let's generate a sample of planets. Notice we are using the `exo` module from earlier.

In [None]:
exo.generate_planet_grid()

Without light, we wouldn't be able to see these planets, so let's generate some light.

### 🌟 Generating Stars  
`exo.generate_star_grid()`:  
- Creates a 5x5 grid of stars  
- Each star has random size/position  
- Uses black background for space effect  

In [None]:
exo.generate_single_star()

And now a whole load more...

In [None]:
exo.generate_star_grid()

If we modify the magnification of the star, we can see some of the details of the stellar surface more clearly

In [None]:
exo.generate_star_grid(radius_range=(1, 1))

### ✍️ Student Task: What do you think are the circles we can see on these stars? 

Do you think they will have an effect on our observations? 

Write some quick observations.

Let's generate a single planet, and save this planet to a variable so we can use it later.

In [None]:
my_planet = exo.generate_single_planet()

We can also save it to the disk as a file

In [None]:
my_planet.save("planets/myplanet.png")

We can do the same thing with a star

In [None]:
my_star = exo.generate_single_star(radius_range=(1, 1))

In [None]:
my_star.save("stars/mystar.png")

Now we can load these two files back in and combine them to take a simulated observation.

### 📸 Combining Images  
`exo.generate_combined_image()`:  
- Overlays planet and star images  
- Simulates a planetary system observation  
- Handles image alignment automatically  

In [None]:
my_combined = exo.generate_combined_image("planets/myplanet.png", "stars/mystar.png")


In [None]:
my_combined.save("combined/mycombined.png")

# Real Life Observations

Remember from session 2 that we could not actually observe the star and planet system spatially, so instead we looked at the magnitude across different wavelengths. We can do the same here in the simulation.

In [None]:
colourmap = exo.default_colourmap()

Load the observation we saved back in, and then observe its spectrum.

### 🔬 Spectral Analysis  
`exo.generate_spectrum()`:  
- Converts image colors to wavelength spectrum  
- Plots intensity vs. wavelength  
- Can save data as CSV for analysis  
- Colorbar shows visible light spectrum (400-700nm)  

In [None]:
image_path = "combined/mycombined.png"
exo.generate_spectrum(image_path)

If we change the scale of the y axis to log, we can see more details

In [None]:
my_spectral_analysis = exo.generate_spectrum(image_path, y_axis_scale='log')

### ✍️ Student Task: What is going on?

Can you see where each of these features in the spectra comes from on the planet? Find the wavelength of the planet atmosphere from the graph

---
Now we can save this spectral data as a table.

In [None]:
my_spectral_analysis.save("spectra/combinedspectralanalysis.csv")

If we use numpy, we can then load this table back in

In [None]:
x_ = np.loadtxt("spectra/combinedspectralanalysis.csv", delimiter=",", skiprows=1)
x_

This is quite difficult to interpret, so lets look instead at some summary statistics

In [None]:
x_.shape

This array has 2 columns, wavelength (filter colour) and intensity, and 256 rows, one for each separate wavelength.

We can do the same observation for just the star. This is our observation taken when the planet is completely behind the star.

In [None]:
star_path = "stars/mystar.png"
star_analysis = exo.generate_spectrum(star_path, y_axis_scale='log')
star_analysis.save("spectra/starspectralanalysis.csv")

Instead of loading these back in from the disk, we can also just use the copies of the data that we have just generated directly. This is much faster with big datasets.

# 📈 Visualisation:
We can create our own plots with much more control using a library called matplotlib.

To create a plot, use matplotlib's `plot` function.

**Plot the Data**: Use `plt.plot(x, y)` where `x` and `y` are arrays.

These arrays **must be the same length** for it to work, as they correspond to pairs of coordinates.

**Control the look of the line** Use the syntax `"<colour><linestyle>"` to control how the plot looks.
   
Here are some examples of colours:
- **`k`**: black
- **`r`**: red
- **`g`**: green
- **`b`**: blue

And some linestyles and markers:
- **`-`**: solid line
- **`--`**: dashed line
- **`x`**: crosses
- **`.`**: dots

**Add Titles and Labels**: Label the axes and add a title for clarity using `plt.xlabel()`, `plt.ylabel()` and `plt.title()`

**Display the Plot**: Call `plt.show()` to render the plot.

In [None]:

# Generate mockup data points
x = np.arange(0, 2 * np.pi, 0.1)
y = np.sin(x)

# Create the plot with a black dashed line
plt.plot(x, y, "k--")

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave Plot')

# Display the plot
plt.show()


### We can also plot multiple lines on one graph

If we do this, use `label="<line name>"` to specify which line is which.

Afterwards, also run `plt.legend()` to plot the legend.

In [None]:
# Create the plot with black crosses for sine and red dots for cosine
plt.plot(x, np.sin(x), "kx", label='Sine Wave')
plt.plot(x, np.cos(x), "r.", label='Cosine Wave')

plt.legend()

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Trig Plot')

# Display the plot
plt.show()

### ✍️ Student Task: Visualising Data

Let's use what we have learned to plot `my_spectral_analysis.intensity` (y-axis) against `my_spectral_analysis.wavelength` (x-axis).

In [None]:
plt.plot(# Your code here...
    


plt.yscale('log')
exo.add_colorbar()

### ✍️ Student Task: Background data

Now that we have data for the planet and the star, and also data from just the star on its own, how would we subtract the background data to leave us with just the data from the planet?

_Hint: you can get the intensity data from the star observation with `star_analysis.intensity`_

In [None]:
planet_intensity = # Your code here...

Let's visualise this.

In [None]:

plt.plot(
    my_spectral_analysis.wavelength,
    my_spectral_analysis.intensity,
    "r--",
    label='Raw Observation Data'
    )

plt.plot(
    my_spectral_analysis.wavelength,
    planet_intensity,
    "k-",
    label='Background Removed Data'
    )

plt.legend()

exo.add_colorbar()

plt.yscale('log')

In [None]:
for i in range(3):
    
    print(i)

    exo.generate_single_planet(show_chemistry=True)

Let's generate a few more examples of water worlds.

In [None]:
table_of_results = np.zeros((10, 256))

for i in tqdm(range(10)):

    planet = exo.generate_single_planet(atmosphere_type='h2o', show=False)
    star = exo.generate_single_star(radius_range=(1, 1), show=False)
    planet.save(f"planets/planet_{i}.png")
    star.save(f"stars/star_{i}.png")

    combined = exo.generate_combined_image(f"planets/planet_{i}.png", f"stars/star_{i}.png", show=False)
    combined.save(f"combined/combined_{i}.png")

    sigma_spectral_analysis = exo.generate_spectrum(f"combined/combined_{i}.png", show=False)
    star_spectral_analysis = exo.generate_spectrum(f"stars/star_{i}.png", show=False)

    planet_intensity = sigma_spectral_analysis.intensity - star_spectral_analysis.intensity

    table_of_results[i] = planet_intensity

We can plot the spectra of these all at the same time to see the variation in their spectral features.

In [None]:
for i in range(10):
    plt.plot(
        sigma_spectral_analysis.wavelength,
        table_of_results[i],
        )
    
plt.yscale('log')
exo.add_colorbar()

In [None]:
np.savetxt("spectra/h2o_spectral_analysis.csv", table_of_results, delimiter=',')

### ✍️ Student Task: Wrapping our full experiment into a function.

 Can you write an explanation of what this function is doing? 

In [None]:
def run_experiment(chemistry, n_samples=10):

    table_of_results = np.zeros((n_samples, 256))

    for i in tqdm(range(n_samples)):

        planet = exo.generate_single_planet(atmosphere_type=chemistry, show=False)
        star = exo.generate_single_star(radius_range=(1, 1), show=False)
        planet.save(f"planets/planet_{chemistry}_{i}.png")
        star.save(f"stars/star_{chemistry}_{i}.png")

        combined = exo.generate_combined_image(f"planets/planet_{chemistry}_{i}.png", f"stars/star_{chemistry}_{i}.png", show=False)
        combined.save(f"combined/combined_{chemistry}_{i}.png")

        sigma_spectral_analysis = exo.generate_spectrum(f"combined/combined_{chemistry}_{i}.png", show=False)
        star_spectral_analysis = exo.generate_spectrum(f"stars/star_{chemistry}_{i}.png", show=False)

        planet_intensity = sigma_spectral_analysis.intensity - star_spectral_analysis.intensity
        
        table_of_results[i] = planet_intensity
        
    np.savetxt(f"spectra/{chemistry}_spectral_analysis.csv", table_of_results, delimiter=',')

Let's now run the function and see! 

In [None]:
run_experiment('co2', 15)
run_experiment('ch4')

Now we have generated a few examples of each of the different types of chemistry, let's see if we can teach a computer how to differentiate between them. 

## Machine Learning Basics  
**Classification** is a task where we predict categories (e.g., "h2o", "co2").  
- A **Decision Tree** is an algorithm that makes predictions by learning rules from data.  
- `training_data` contains example spectra, and `labels` are the correct categories.  


### Example Workflow

1. **Data Preparation**: Split your dataset into training and testing sets to evaluate model performance.
2. **Model Training**:
   ```python
   classifier = DecisionTreeClassifier()
   classifier.fit(training_data, training_labels)
   ```
3. **Evaluation**:
   ```python
   print(f"Accuracy after training: {classifier.score(test_data, test_labels)}")
   ```
4. **Prediction**: Use the model to make predictions with `classifier.predict()`
5. **Visualization**: Use libraries like `matplotlib` to visualize the results
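
The workflow above can be sketched end to end on made-up data. Everything in this example is an assumption for illustration: the two classes `"a"` and `"b"`, the random "spectra" with 4 values each, and the use of scikit-learn's `train_test_split` helper to do the splitting (the notebook itself splits the data by hand later on).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# 1. Data preparation: two made-up classes centred on different levels
class_a = rng.normal(loc=0.0, scale=0.5, size=(20, 4))
class_b = rng.normal(loc=3.0, scale=0.5, size=(20, 4))
toy_data = np.vstack((class_a, class_b))
toy_labels = np.array(["a"] * 20 + ["b"] * 20)

train_x, test_x, train_y, test_y = train_test_split(
    toy_data, toy_labels, test_size=0.25, random_state=0
)

# 2. Model training
toy_model = DecisionTreeClassifier(random_state=0)
toy_model.fit(train_x, train_y)

# 3. Evaluation on data the model has not seen
toy_accuracy = toy_model.score(test_x, test_y)
print(f"Accuracy on the test set: {toy_accuracy}")

# 4. Prediction
toy_predictions = toy_model.predict(test_x)
print(toy_predictions)
```

The toy classes are deliberately easy to tell apart, so real spectra are unlikely to score as well.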


First we need to load in the data

In [None]:
h2o = np.loadtxt("spectra/h2o_spectral_analysis.csv", delimiter=',')
co2 = np.loadtxt("spectra/co2_spectral_analysis.csv", delimiter=',')
ch4 = np.loadtxt("spectra/ch4_spectral_analysis.csv", delimiter=',')

training_data = np.vstack((h2o, co2, ch4))
labels = np.array(['h2o']*len(h2o) + ['co2']*len(co2) + ['ch4']*len(ch4))

### ✍️ Student Task: Combining the dataset

Look at the output of the cell below. How many samples do we have? 

In [None]:
print(training_data.shape)
print(labels)

In [None]:
# zip together the data and labels and shuffle
shuffled_data = list(zip(training_data, labels))
np.random.shuffle(shuffled_data)

# unzip the shuffled data
shuffled_data, shuffled_labels = zip(*shuffled_data)

### ✍️ Student Task: Randomness in machine learning

Why might we need to shuffle the dataset? If we trained the machine learning algorithm using the data which was still in order what would be the risk?

In [None]:
print(np.array(shuffled_data).shape) # <- zip() gave us a tuple, so we convert it to an array to get its shape
print(shuffled_labels)

In [None]:
training_data = np.array(shuffled_data)[:20]
training_labels = np.array(shuffled_labels)[:20]

test_data = np.array(shuffled_data)[20:]
test_labels = np.array(shuffled_labels)[20:]

print('train')
print(training_labels)

print('test')
print(test_labels)


#### Next we can create and train the classifier!

In [None]:
# Create the classifier
classifier = DecisionTreeClassifier()

# Train the classifier on the training set
classifier.fit(training_data, training_labels)

To find out how well the classifier has done, we can measure its accuracy on the test data, which it has not seen during training.

In [None]:
# Validate the classifier on the testing set using classification accuracy
print(f"Accuracy after training: {classifier.score(test_data, test_labels)}")

### ✍️ Student Task: Test Train Leakage
Why do we split the data into training and testing sets?  
What happens when we evaluate the model performance on the training set?  
Why might this be? 

#### Evaluate how well the model has performed using the testing dataset.
_Hint: You will need to compare against `test_labels`_

In [None]:
test_predictions = classifier.predict(test_data)

model_accuracy = # Your code here...

# 💯 Model Evaluation:
We can evaluate the performance of a model across different classes using something called a confusion grid (often called a confusion matrix). This shows where the model performed well, and where it performed worse, by comparing the actual value against the predicted one across all samples.
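
As a toy illustration of the idea, scikit-learn provides a `confusion_matrix` helper that does the counting. The five true and predicted labels here are invented for the example. Note that this helper puts the true class on the rows and the predicted class on the columns, whereas the grid plotted in this notebook puts the prediction on the rows.

```python
from sklearn.metrics import confusion_matrix

# Invented labels for five imaginary planets
true_labels = ["h2o", "h2o", "co2", "ch4", "co2"]
predicted_labels = ["h2o", "co2", "co2", "ch4", "co2"]

class_order = ["ch4", "co2", "h2o"]  # fixes the order of the rows and columns
toy_matrix = confusion_matrix(true_labels, predicted_labels, labels=class_order)

print(class_order)
print(toy_matrix)
```

Correct predictions are counted on the diagonal. The single off-diagonal count is the `h2o` planet that was mistaken for `co2`.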

Looking at the grid below, which two types of atmosphere are the most similar to each other and therefore easily confused? 

In [None]:
# Unique classes
classes = np.unique(np.concatenate((test_labels, test_predictions)))
n_classes = len(classes)

# Create a confusion grid
grid = np.zeros((n_classes, n_classes), dtype=int)

# Populate the grid
for gt, pred in zip(test_labels, test_predictions):
    x = np.where(classes == gt)[0][0]  # Ground truth index
    y = np.where(classes == pred)[0][0]  # Predicted value index
    grid[y, x] += 1

# Plot the grid
plt.figure(figsize=(6, 6))
plt.imshow(grid, cmap='Blues', interpolation='nearest')
plt.xticks(range(n_classes), classes, rotation=45)
plt.yticks(range(n_classes), classes)
plt.xlabel("Ground Truth")
plt.ylabel("Predicted")
plt.colorbar(label="Count", shrink=0.74)

for i in range(n_classes):
    for j in range(n_classes):
        plt.text(j, i, grid[i, j], ha='center', va='center', color='black')
plt.tight_layout()
plt.show()


# 🌟 Task 1: Predict a Planet’s Atmosphere   🌟  
---

1. Generate a new planet.  
2. Compute its spectrum.  
3. Use your classifier to predict its atmosphere.  

In [None]:
planet = # generate your planet here...
chemistry = planet.atmosphere_type
print(f"Actual atmosphere: {chemistry}")
star = # generate your star here...
planet.save(f"planets/planet_{chemistry}_temp.png")
star.save(f"stars/star_{chemistry}_temp.png")

In [None]:
combined = exo.generate_combined_image(f"planets/planet_{chemistry}_temp.png", f"stars/star_{chemistry}_temp.png", show=True)
combined.save(f"combined/combined_{chemistry}_temp.png")

sigma_spectral_analysis = exo.generate_spectrum(f"combined/combined_{chemistry}_temp.png", show=True, y_axis_scale='log')
star_spectral_analysis = exo.generate_spectrum(f"stars/star_{chemistry}_temp.png", show=True, y_axis_scale='log')

planet_intensity = # remove the star intensity from the observation

In [None]:
plt.plot(# Your code here...

plt.legend()
exo.add_colorbar()
plt.yscale('log')

In [None]:
prediction = classifier.predict(planet_intensity.reshape(1, -1))

print(# Your code here...

# 🌟 Task 2: Enhancing Our Classifier with Uncertainty Predictions 🌟  
---

## Part 1: Why Uncertainty Matters  
Our current model gives "hard" predictions (e.g., "h2o"), but real science requires understanding **confidence**.  
**Your Task**:  
1. Modify the classifier to output **probabilities** (e.g., 80% "h2o", 15% "co2", 5% "ch4").  

2. Test this new model on edge cases, for example a **star with no planet** to see if it detects something unusual.  

---

## Step 1: Train a Probabilistic Model  
**Replace the Decision Tree with a Random Forest** (a model that can estimate probabilities):  
```python
from sklearn.ensemble import RandomForestClassifier  

classifier = RandomForestClassifier()  
classifier.fit(training_data, training_labels)
```

In [None]:
classifier = RandomForestClassifier(  
    n_estimators=100,  # number of trees  
    max_depth=10,      # Limit tree depth  
    min_samples_split=5 # Require min samples to split  
)  

In [None]:
classifier.fit(training_data, training_labels) 

_Hint: use `classifier.predict_proba(<data>)` to get probabilities out of a classifier_
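
To show the shape of what `predict_proba` returns, here is a sketch on made-up data (the classes `"a"` and `"b"`, the random values and the names `forest` and `new_sample` are all assumptions for this example). Each row of the output corresponds to one input sample, and each column to one class, in the order given by `classes_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Two made-up classes with 3 values per sample
toy_data = np.vstack((rng.normal(0.0, 0.5, size=(15, 3)),
                      rng.normal(3.0, 0.5, size=(15, 3))))
toy_labels = np.array(["a"] * 15 + ["b"] * 15)

forest = RandomForestClassifier(n_estimators=20, random_state=0)
forest.fit(toy_data, toy_labels)

# One new sample, written as a 2D array with a single row
new_sample = np.array([[1.5, 1.5, 1.5]])
toy_probabilities = forest.predict_proba(new_sample)

print(forest.classes_)      # the order of the columns
print(toy_probabilities)    # one probability per class; each row sums to 1
```

The input must be 2D even for a single sample, which is why the notebook uses `.reshape(1, -1)` before calling `predict`.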

In [None]:
probabilities = classifier.predict_proba(# Your code here...

## 🧪 Testing Uncertainty Function
Decide on a threshold for the model to declare that it is uncertain of a prediction.

For example this should return "ambiguous"

`print(calculate_uncertainty([0.3, 0.3, 0.4]))`  

Test on real data




In [None]:
def calculate_uncertainty(probabilities):  
    # Your code here...



In [None]:
# Test it  
print(calculate_uncertainty([0.3, 0.3, 0.4]))  # Should return "ambiguous"  