
# Home assignment (2026)

* Author: Romain Tavenard (@rtavenar)
* License: CC-BY-NC-SA

A home assignment from a course on Deep Learning at EDHEC.

## Problem statement

In this assignment, you will work with a dataset coming from a CNES
(French Space Agency) challenge on automatic analysis of satellite spectra.
The data are provided on the course page.

You will **use the following**:
- `spectra.npy`: main spectral measurements (high-dimensional numerical data)
- `auxiliary.csv`: additional tabular information for each spectrum
- `targets.csv`: target variables for each spectrum

Your objective is to:
1. Load and explore the data.
2. Preprocess the different modalities appropriately (normalization, train/validation split, etc.).
3. Build and train a **neural network with two inputs and two outputs** using Keras.

Concretely, you should:
- Use **two inputs**:
  - One input for the spectra data (loaded from `spectra.npy`),
  - One input for the auxiliary/tabular data (loaded from `auxiliary.csv`).
- Use **two outputs**, each consisting of one of the targets in `targets.csv`.

Your model should be implemented using the **Keras Functional API**, which is
specifically designed to handle models with multiple inputs and multiple outputs.
You should carefully design:
- The architecture of each input branch (spectra branch vs auxiliary-data branch),
- The way these branches are merged,
- The architecture of each output head,
- The choice of loss functions and metrics for each output,
- The strategy for training and evaluating such a model.
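
The design points listed above can be sketched as follows. This is only a minimal illustration, not a recommended architecture: the `build_two_branch_model` helper, the layer sizes, the output names `target_a` and `target_b`, and the `mse` losses are all assumptions made for the example. The losses in particular presume continuous targets and should be adapted to the actual target columns.

```python
import numpy as np
import keras
from keras import layers


def build_two_branch_model(n_bins=52, n_channels=3, n_aux=5):
    """Hypothetical two-input / two-output model (sizes are placeholders)."""
    spectra_in = keras.Input(shape=(n_bins, n_channels), name="spectra_input")
    aux_in = keras.Input(shape=(n_aux,), name="aux_input")

    # Spectra branch: a small 1D convolution followed by global pooling.
    x = layers.Conv1D(16, kernel_size=3, activation="relu")(spectra_in)
    x = layers.GlobalAveragePooling1D()(x)

    # Auxiliary branch: a single dense layer on the tabular features.
    a = layers.Dense(16, activation="relu")(aux_in)

    # Merge both branches, then share one hidden layer between the heads.
    merged = layers.concatenate([x, a])
    shared = layers.Dense(32, activation="relu")(merged)

    # One head per target; linear activations assume regression targets.
    out_a = layers.Dense(1, name="target_a")(shared)
    out_b = layers.Dense(1, name="target_b")(shared)

    model = keras.Model(inputs=[spectra_in, aux_in], outputs=[out_a, out_b])
    model.compile(
        optimizer="adam",
        # Assumed losses: one entry per output, keyed by output-layer name.
        loss={"target_a": "mse", "target_b": "mse"},
        loss_weights={"target_a": 1.0, "target_b": 1.0},
    )
    return model


model = build_two_branch_model()

# Forward pass on dummy inputs to check that the two outputs have the expected shape.
preds = model.predict(
    {
        "spectra_input": np.zeros((4, 52, 3), dtype="float32"),
        "aux_input": np.zeros((4, 5), dtype="float32"),
    },
    verbose=0,
)
print([p.shape for p in preds])
```

Passing dictionaries keyed by layer name to `compile` is what allows each output to have its own loss and its own weight in the total loss.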

To understand how to build such models, you are strongly encouraged to read
the Keras guide on the Functional API, in particular the section on
models with multiple inputs and outputs:
[Keras Functional API – models with multiple inputs and outputs](https://keras.io/guides/functional_api/#models-with-multiple-inputs-and-outputs)

In your notebook, you should:
- Clearly describe the preprocessing steps for each modality,
- Justify the architecture you propose (depth, width, choice of activations, etc.),
- Explain how you combine the different inputs,
- Explain the role of each output and the associated losses,
- Compare several reasonable architectural variants,
- Justify your final choice based on appropriate validation indicators.

## Deadline

The deadline for this home assignment is **March 1st, 11:59pm, Paris time**.
You should use the link on Moodle to hand in your assignment.
A single `ipynb` file should be provided, with execution traces.
This assignment is to be done **in groups of two to three students**, and the names of all
students should be included in the file name.

## Data loading

The code below loads the **training data only** as NumPy arrays and pandas
DataFrames. You should then perform your own preprocessing and build the
requested multi-input / multi-output model.

In [8]:
import numpy as np
import pandas as pd

# Main spectral data (NumPy array)
spectra_path = "spectra.npy"
X_spectra = np.load(spectra_path)

# Auxiliary tabular data (pandas DataFrame)
auxiliary_path = "auxiliary.csv"
X_aux = pd.read_csv(auxiliary_path)

# Targets (pandas DataFrame)
targets_path = "targets.csv"
y = pd.read_csv(targets_path)

print("Spectra shape:", X_spectra.shape)
print("Auxiliary shape:", X_aux.shape)
print("Targets shape:", y.shape)

Spectra shape: (3000, 52, 3)
Auxiliary shape: (3000, 5)
Targets shape: (3000, 3)


At this stage, you should:
- Inspect the columns of `X_aux` and `y`,
- Decide which columns to predict (and thus define clearly your two outputs),
- Prepare train/validation splits,
- Normalize / standardize inputs where appropriate,
- Implement and train a Keras Functional model with two inputs and two outputs,
  as described in the assignment statement above.
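
The split step in the checklist above can be sketched with plain NumPy. The point of the sketch is that a single set of shuffled indices is applied to every array, so that the spectra, auxiliary rows and targets of a given sample always end up in the same split. The `split_indices` helper and the zero-filled placeholder arrays are hypothetical; only the shapes come from the output printed above.

```python
import numpy as np


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Return disjoint (train_idx, val_idx) index arrays covering all samples."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]


# Placeholder arrays with the shapes printed above (the values are synthetic).
n = 3000
spectra = np.zeros((n, 52, 3))
aux = np.zeros((n, 5))
targets = np.zeros((n, 3))

train_idx, val_idx = split_indices(n, val_fraction=0.2, seed=42)

# Indexing every array with the same indices keeps the rows aligned.
spectra_train, spectra_val = spectra[train_idx], spectra[val_idx]
aux_train, aux_val = aux[train_idx], aux[val_idx]
targets_train, targets_val = targets[train_idx], targets[val_idx]

print(spectra_train.shape, aux_val.shape, targets_val.shape)
# (2400, 52, 3) (600, 5) (600, 3)
```

With pandas DataFrames, the same indices would be applied through `.iloc[train_idx]` rather than direct indexing.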


In [9]:
import keras
from keras import layers, ops

In [10]:
print("X_spectra")
print()
print(X_spectra[1,:10,:])
print()
print("X_aux")
print()
print(X_aux.head(5))
print()
print("y")
print()
print(y.head(5))

X_spectra

[[7.27555920e+00 9.28926472e-04 1.45509458e-06]
 [6.81373911e+00 9.28081182e-04 1.30812241e-06]
 [6.38123330e+00 9.28619197e-04 1.13278649e-06]
 [5.97618103e+00 9.28165025e-04 9.86552208e-07]
 [5.59683967e+00 9.27387957e-04 8.63434296e-07]
 [5.24157722e+00 9.26965283e-04 7.58934735e-07]
 [4.90886524e+00 9.27080388e-04 6.69611297e-07]
 [4.59727234e+00 9.26342816e-04 5.92787791e-07]
 [4.30545796e+00 9.27037130e-04 5.26354574e-07]
 [4.03216667e+00 9.25644533e-04 5.04713550e-07]]

X_aux

   star_mass_kg  star_radius_m  star_temperature  planet_mass_kg  \
0  1.570836e+30    494402820.0            5033.0    1.262481e+26   
1  1.710024e+30    591890700.0            5320.0    3.959436e+25   
2  1.153272e+30    382988100.0            3985.0    8.958000e+25   
3  3.777960e+29    146231820.0            2988.0    3.810136e+25   
4  1.371996e+30    445658880.0            4925.0    8.360800e+25   

   semi_major_axis_m  
0       2.277510e+10  
1       1.217744e+10  
2       1.057672e+10  

The two targets to predict are the `water` and `cloud` columns of `y`.

In [11]:
from sklearn.model_selection import train_test_split

# Split data into training and validation sets
X_spectra_train, X_spectra_val, X_aux_train, X_aux_val, y_train, y_val = train_test_split(
    X_spectra, X_aux, y, test_size=0.2, random_state=42
)

print("X_spectra_train shape:", X_spectra_train.shape)
print("X_spectra_val shape:", X_spectra_val.shape)
print("X_aux_train shape:", X_aux_train.shape)
print("X_aux_val shape:", X_aux_val.shape)
print("y_train shape:", y_train.shape)
print("y_val shape:", y_val.shape)

X_spectra_train shape: (2400, 52, 3)
X_spectra_val shape: (600, 52, 3)
X_aux_train shape: (2400, 5)
X_aux_val shape: (600, 5)
y_train shape: (2400, 3)
y_val shape: (600, 3)


The spectra and auxiliary values are clearly not standardized (the printed magnitudes range from about 1e-6 to 1e30), so both inputs need to be standardized before training.
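
Because the auxiliary columns span many orders of magnitude, one possible variant is to take a logarithm before standardizing. The sketch below is an assumption-laden illustration, not the preprocessing used in this notebook: it presumes every auxiliary value is strictly positive (true for the rows printed above, unverified for the full file), and the helper names and the synthetic data are hypothetical. As with any scaler, the statistics are computed on the training split only.

```python
import numpy as np


def fit_log_standardizer(x_train):
    """Per-column mean and std of log10(x_train), from the training split only."""
    logged = np.log10(x_train)
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    # Guard against division by zero for columns that do not vary.
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_log_standardizer(x, mean, std):
    """Apply the log10 transform, then standardize with the training statistics."""
    return (np.log10(x) - mean) / std


# Synthetic positive data whose columns mimic very different magnitudes.
rng = np.random.default_rng(0)
low, high = [29.0, 8.0, 3.0], [31.0, 9.0, 4.0]
aux_train = 10 ** rng.uniform(low=low, high=high, size=(200, 3))
aux_val = 10 ** rng.uniform(low=low, high=high, size=(50, 3))

mean, std = fit_log_standardizer(aux_train)
aux_train_scaled = apply_log_standardizer(aux_train, mean, std)
aux_val_scaled = apply_log_standardizer(aux_val, mean, std)

print(aux_train_scaled.mean(axis=0).round(6), aux_train_scaled.std(axis=0).round(6))
```

Whether this helps compared with plain standardization is an empirical question to settle on the validation split.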

In [12]:
from sklearn.preprocessing import StandardScaler

# Standardize X_spectra

# Reshape spectra data to (num_samples, num_features) for StandardScaler
# Here, num_features will be 52 * 3 = 156
X_spectra_train_flat = X_spectra_train.reshape(X_spectra_train.shape[0], -1)
X_spectra_val_flat = X_spectra_val.reshape(X_spectra_val.shape[0], -1)

spectra_scaler = StandardScaler()

# Fit on training data and transform both train and validation data
X_spectra_train_scaled_flat = spectra_scaler.fit_transform(X_spectra_train_flat)
X_spectra_val_scaled_flat = spectra_scaler.transform(X_spectra_val_flat)

# Reshape back to original 3D shape
X_spectra_train_scaled = X_spectra_train_scaled_flat.reshape(X_spectra_train.shape)
X_spectra_val_scaled = X_spectra_val_scaled_flat.reshape(X_spectra_val.shape)


# Standardize X_aux
aux_scaler = StandardScaler()

# Fit on training data and transform both train and validation data
# We'll use pandas DataFrames to keep column names, then convert to numpy for Keras
X_aux_train_scaled = pd.DataFrame(aux_scaler.fit_transform(X_aux_train), columns=X_aux_train.columns, index=X_aux_train.index)
X_aux_val_scaled = pd.DataFrame(aux_scaler.transform(X_aux_val), columns=X_aux_val.columns, index=X_aux_val.index)


print("Spectra (scaled) training sample (first row of the first sample, 3 values):")
print(X_spectra_train_scaled[0, 0, :])
print("Auxiliary (scaled) training sample (first 5 rows):")
print(X_aux_train_scaled.head())


Spectra (scaled) training sample (first row of the first sample, 3 values):
[ 2.46913601e-13 -8.29719579e-01 -8.30248809e-01]
Auxiliary (scaled) training sample (first 5 rows):
      star_mass_kg  star_radius_m  star_temperature  planet_mass_kg  \
642      -0.216536      -0.250751         -0.189348       -0.177966   
700       0.556297       0.481715          0.693699       -0.169917   
226      -0.173601      -0.372829          0.239527       -0.141667   
1697      0.341621       0.441023          0.753488       -0.174670   
1010     -1.461656      -1.430835         -1.392040       -0.177008   

      semi_major_axis_m  
642           -1.217502  
700           -0.476010  
226            0.254133  
1697          -0.343600  
1010          -0.651925  


Now that the data is standardized, we are ready to build the Keras Functional API model with two inputs and two outputs.
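
Before building the model, it can be worth checking whether some spectral features barely vary across samples, since such features carry no information after standardization. The first scaled value printed above (about 2.5e-13) would be consistent with that situation, although this has not been verified here. The sketch below uses a hypothetical `constant_feature_mask` helper and synthetic data in which the first channel is deliberately identical for every sample.

```python
import numpy as np


def constant_feature_mask(x_train, tol=1e-12):
    """True where a feature's std across samples is negligible relative to its scale."""
    flat = x_train.reshape(x_train.shape[0], -1)
    std = flat.std(axis=0)
    scale = np.maximum(np.abs(flat).max(axis=0), 1.0)
    return (std <= tol * scale).reshape(x_train.shape[1:])


# Synthetic spectra: channel 0 is the same grid for every sample,
# channels 1 and 2 vary from sample to sample.
rng = np.random.default_rng(0)
n_samples, n_bins = 100, 52
grid = np.linspace(7.0, 0.5, n_bins)
spectra = np.empty((n_samples, n_bins, 3))
spectra[:, :, 0] = grid
spectra[:, :, 1:] = rng.normal(size=(n_samples, n_bins, 2))

mask = constant_feature_mask(spectra)
print("constant features per channel:", mask.sum(axis=0))
```

If a whole channel turns out to be constant, dropping it from the model input is one option to consider.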

In [38]:
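# Targets as dicts keyed by output-layer name (despite the `_list` suffix),
# so that Keras can match each target array to the corresponding output.
y_train_list = {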
    "water_output": y_train['water'].values.reshape(-1, 1),
    "cloud_output": y_train['cloud'].values.reshape(-1, 1)
}
y_val_list = {
    "water_output": y_val['water'].values.reshape(-1, 1),
    "cloud_output": y_val['cloud'].values.reshape(-1, 1)
}


In [46]:
spec_in = keras.Input(shape=(52, 3), name="spectra_input")
aux_in = keras.Input(shape=(X_aux_train_scaled.shape[1],), name="aux_input")

# Spectra Branch
x_spec = layers.Conv1D(64, kernel_size=3, activation='relu')(spec_in)
x_spec = layers.MaxPooling1D(2)(x_spec)
x_spec = layers.Flatten()(x_spec)

# Merge & Output (the auxiliary input is concatenated as-is, without a dedicated branch)
merged = layers.concatenate([x_spec, aux_in])
dense = layers.Dense(64, activation='relu')(merged)

water_out = layers.Dense(1, activation='sigmoid', name="water_output")(dense)
cloud_out = layers.Dense(1, activation='sigmoid', name="cloud_output")(dense)

# Build and compile the model
model = keras.Model(inputs=[spec_in, aux_in], outputs=[water_out, cloud_out])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics={"water_output": "accuracy", "cloud_output": "accuracy"}
)
# Train on the standardized inputs and monitor the validation split
history = model.fit(
    x={"spectra_input": X_spectra_train_scaled, "aux_input": X_aux_train_scaled.values},
    y=y_train_list,
    validation_data=({"spectra_input": X_spectra_val_scaled, "aux_input": X_aux_val_scaled.values}, y_val_list),
    epochs=50,
    batch_size=32
)

Epoch 1/50
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - cloud_output_accuracy: 0.4950 - cloud_output_loss: 0.6979 - loss: 1.3948 - water_output_accuracy: 0.4800 - water_output_loss: 0.6968 - val_cloud_output_accuracy: 0.4600 - val_cloud_output_loss: 0.6950 - val_loss: 1.3888 - val_water_output_accuracy: 0.4833 - val_water_output_loss: 0.6936
Epoch 2/50
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - cloud_output_accuracy: 0.5092 - cloud_output_loss: 0.6939 - loss: 1.3871 - water_output_accuracy: 0.5054 - water_output_loss: 0.6932 - val_cloud_output_accuracy: 0.4817 - val_cloud_output_loss: 0.6950 - val_loss: 1.3888 - val_water_output_accuracy: 0.4717 - val_water_output_loss: 0.6938
Epoch 3/50
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - cloud_output_accuracy: 0.4846 - cloud_output_loss: 0.6936 - loss: 1.3868 - water_output_accuracy: 0.5033 - water_output_loss: 0.6932 - val_cloud_output_accuracy: 0.45