**Task 1** of the assignment asks you to **modify the Graph Attention Network (GAT) tutorial** so that it performs a **trajectory prediction task**. Specifically:

---

### 🔧 **What You Need to Do in Task 1:**

#### 1. **Adapt the GAT Tutorial to Predict Future Positions**
Instead of classifying node labels (like in the original tutorial), you now need to **predict the future (x, y) position** of a pedestrian, given their current and previous locations, and their connections to other pedestrians (edges).

- Each **node** in the graph is a pedestrian.
- Each **edge** represents a relationship (like proximity or visibility) between pedestrians.
- Each **node has features** like:
  - `current x`, `current y`
  - `previous x`, `previous y`
- Your **goal** is to predict:
  - `future x`, `future y`

> This turns the problem from **classification** into a **regression** task (predicting continuous values).

---

#### 2. **Use Graph Structure**
Use the `.edges` files to build the graph. Optionally make it **bidirectional** or **visibility-based**, as explained in the instructions.
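
As a minimal sketch of the bidirectional option, assuming the edges are held as an `(E, 2)` integer array of `[target, source]` pairs (the helper name `make_bidirectional` is hypothetical and not part of the tutorial):

```python
import numpy as np


def make_bidirectional(edges: np.ndarray) -> np.ndarray:
    """Return the edges plus their reversed copies, without duplicate rows.

    `edges` is assumed to be an (E, 2) integer array of [target, source] pairs.
    """
    reversed_edges = edges[:, ::-1]
    both = np.concatenate([edges, reversed_edges], axis=0)
    # np.unique with axis=0 drops duplicate rows and returns them sorted
    return np.unique(both, axis=0)


example_edges = np.array([[0, 1], [1, 2], [2, 1]])
bidirectional = make_bidirectional(example_edges)
print(bidirectional)
```

Because the result is sorted by the first column, edges that share a target node end up next to each other.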

---

#### 3. **Train Your Model and Evaluate**
Train your GAT model on a training set and evaluate it on a **separate test set** by:

- Predicting future positions for each pedestrian in the test set.
- Comparing predicted vs actual using **Euclidean distance**:

```python
euclidean_distance = np.sqrt((pred_x - true_x)**2 + (pred_y - true_y)**2)
```

Average this over all test nodes to report a single number, for example:

```python
mean_distance_error = np.mean(euclidean_distances)
```
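
The two snippets above can be combined into one self-contained function. This is a sketch that assumes predictions and ground truth are `(N, 2)` arrays of `(x, y)` values; the function name is hypothetical:

```python
import numpy as np


def mean_euclidean_distance(pred: np.ndarray, true: np.ndarray) -> float:
    """Average Euclidean distance between predicted and true (x, y) rows."""
    distances = np.sqrt(np.sum((pred - true) ** 2, axis=1))
    return float(np.mean(distances))


pred = np.array([[3.0, 4.0], [0.0, 0.0]])
true = np.array([[0.0, 0.0], [0.0, 0.0]])
result = mean_euclidean_distance(pred, true)
print(result)  # distances are 5.0 and 0.0, so the mean is 2.5
```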

---

### Summary of Task 1

| Aspect | What to do |
|--------|------------|
| **Data** | Load `.nodes` and `.edges` files for scenes |
| **Model** | Modify GAT to output 2 values per node: `future x`, `future y` |
| **Loss** | Use **Mean Squared Error (MSE)** or **Euclidean distance loss** |
| **Eval** | Measure average Euclidean distance between prediction and ground truth on test set |


## Import packages

In [177]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd
import os
import warnings

import glob

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 6)
pd.set_option("display.max_rows", 6)
np.random.seed(2)

## Get files

In [178]:
edge_files = glob.glob(os.path.join("dataset", "*.edges"))
node_files = glob.glob(os.path.join("dataset", "*.nodes"))


## Create dataset

In [179]:
edge_dfs = []
for file_path in edge_files:
    df = pd.read_csv(
        file_path,
        sep=", ",
        header=None,
        names=["target", "source"],
        na_values="-1"
    )
    edge_dfs.append(df)

node_dfs = []
for file_path in node_files:
    df = pd.read_csv(
        file_path,
        sep=",",
        header=None,
        names=["node id", "current x", "current y", "previous x", "previous y", "future x", "future y"],
        na_values="_"
    )
    node_dfs.append(df)

edges = pd.concat(edge_dfs, ignore_index=True)
nodes = pd.concat(node_dfs, ignore_index=True)

In [180]:
print(edges)
print(nodes)

        target      source
0     19585800  19590700.0
1     19585800  19595200.0
2     19590700  19592400.0
...        ...         ...
3456  19592800         NaN
3457  20014600         NaN
3458  20015100         NaN

[3459 rows x 2 columns]
       node id  current x  current y  ...  previous y  future x  future y
0     19502500    40972.0   -16957.0  ...    -16957.0   41185.0  -16480.0
1     19585800    12688.0    -6816.0  ...     -6816.0   13381.0   -7427.0
2     19590400   -16367.0    21644.0  ...     21644.0       NaN       NaN
...        ...        ...        ...  ...         ...       ...       ...
2306  20015100   -19196.0     3668.0  ...      3041.0  -19196.0    3668.0
2307  20015600    17568.0   -13258.0  ...    -14736.0   17568.0  -13258.0
2308  20015900    16994.0   -12152.0  ...         NaN   16994.0  -12152.0

[2309 rows x 7 columns]


## Preprocessing

In [181]:

prev_x_diff = nodes["current x"] - nodes["previous x"]
prev_y_diff = nodes["current y"] - nodes["previous y"]

nodes["previous vector x"] = prev_x_diff
nodes["previous vector y"] = prev_y_diff

fut_x_diff = nodes["future x"] - nodes["current x"]
fut_y_diff = nodes["future y"] - nodes["current y"]

nodes["future vector x"] = fut_x_diff # it's easier to predict a relative value
nodes["future vector y"] = fut_y_diff
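
Since the targets are displacements, a predicted vector has to be added back to the current position to obtain a future position. The sketch below uses the values of node `19585800` from the printed table together with a made-up prediction (`pred_vector` is hypothetical), and checks that the Euclidean error is the same whether it is measured on displacements or on absolute positions:

```python
import numpy as np

# Values for node 19585800 taken from the printed node table
current = np.array([[12688.0, -6816.0]])
true_future = np.array([[13381.0, -7427.0]])

true_vector = true_future - current          # relative target: [[693., -611.]]
pred_vector = np.array([[700.0, -600.0]])    # hypothetical model output

# Convert the predicted displacement back into an absolute position
pred_future = current + pred_vector

error_abs = np.linalg.norm(pred_future - true_future, axis=1)
error_rel = np.linalg.norm(pred_vector - true_vector, axis=1)
print(pred_future, error_abs, error_rel)
```

The two errors agree because the same current position is added to both the prediction and the target.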


### Remove edges without a node

In [182]:
# edges_id = pd.concat([edges['target'], edges["source"]]).dropna().unique()
# nodes_filtered = nodes[nodes["node id"].isin(edges_id)]
# nodes_filtered

# Keep only complete edges whose endpoints both appear in the node table
nodes_id = nodes["node id"].unique()
edges = edges.dropna()
edges = edges[edges["target"].isin(nodes_id) & edges["source"].isin(nodes_id)]
edges = edges.astype(int)
edges

        target    source
0     19585800  19590700
1     19585800  19595200
2     19590700  19592400
...        ...       ...
3452  20002900  20004700
3453  20013400  20013402
3454  20015600  20015900


In [183]:
# Move the future columns (absolute and relative) to the end of the DataFrame
last = ["future x", "future y", "future vector x", "future vector y"]
cols = [col for col in nodes.columns if col not in last] + last
nodes = nodes[cols]
nodes

       node id  current x  current y  ...  future y  future vector x  future vector y
0     19502500    40972.0   -16957.0  ...  -16480.0            213.0            477.0
1     19585800    12688.0    -6816.0  ...   -7427.0            693.0           -611.0
2     19590400   -16367.0    21644.0  ...       NaN              NaN              NaN
...        ...        ...        ...  ...       ...              ...              ...
2306  20015100   -19196.0     3668.0  ...    3668.0              0.0              0.0
2307  20015600    17568.0   -13258.0  ...  -13258.0              0.0              0.0
2308  20015900    16994.0   -12152.0  ...  -12152.0              0.0              0.0


## Split the dataset

In [184]:
# Obtain random indices
random_indices = np.random.permutation(range(nodes.shape[0]))

# 50/50 split
train_data = nodes.iloc[random_indices[: len(random_indices) // 2]]
test_data = nodes.iloc[random_indices[len(random_indices) // 2 :]]

# Identify test rows with missing future values
missing_future_mask = test_data[["future x", "future y"]].isna().any(axis=1)
test_rows_with_missing = test_data[missing_future_mask]

# Remove them from test_data and add to train_data
test_data = test_data[~missing_future_mask]
train_data = pd.concat([train_data, test_rows_with_missing], ignore_index=True)
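
Rows whose future position is missing carry `NaN` labels, and a single `NaN` target is enough to make a mean-squared-error value `NaN`. One alternative to moving such rows into the training set, sketched here under the assumption that the labels are an `(N, 2)` float array (the function name `split_labelled_rows` is hypothetical), is to split only the rows that actually have a label:

```python
import numpy as np


def split_labelled_rows(labels: np.ndarray, train_fraction: float = 0.5, seed: int = 2):
    """Return (train_idx, test_idx) over the rows whose labels contain no NaN."""
    labelled = np.flatnonzero(~np.isnan(labels).any(axis=1))
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(labelled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


labels = np.array(
    [[1.0, 2.0], [np.nan, np.nan], [0.5, 0.1], [3.0, np.nan], [0.0, 0.0]]
)
train_idx, test_idx = split_labelled_rows(labels)
print(train_idx, test_idx)
```

The returned values are row positions, so they can index both the label array and a model output that has one row per node.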


In [185]:
print(train_data)
print(test_data)

       node id  current x  current y  ...  future y  future vector x  \
0     19591900     9019.0    -4136.0  ...   -4722.0            574.0   
1     20000700    40669.0   -18422.0  ...  -18350.0           1145.0   
2     19594200     8864.0    -3571.0  ...   -3939.0            276.0   
...        ...        ...        ...  ...       ...              ...   
1180  19591900    46667.0   -23477.0  ...       NaN              NaN   
1181  20000100    22475.0   -13481.0  ...       NaN              NaN   
1182  19594200    12262.0    -6186.0  ...       NaN              NaN   

      future vector y  
0              -586.0  
1                72.0  
2              -368.0  
...               ...  
1180              NaN  
1181              NaN  
1182              NaN  

[1183 rows x 11 columns]
       node id  current x  current y  ...  future y  future vector x  \
544   19585800    19495.0   -12645.0  ...  -13083.0            929.0   
2283  20013400    38201.0   -19607.0  ...  -19822.0          

## Prepare the graph

In [186]:
# Obtain node ids which will be used to gather node states
# from the graph later on when training the model
train_indices = train_data["node id"].to_numpy()
test_indices = test_data["node id"].to_numpy()

# Obtain ground truth targets (future displacement) corresponding to each node id
train_labels = train_data[["future vector x", "future vector y"]].to_numpy()
test_labels = test_data[["future vector x", "future vector y"]].to_numpy()

# Define graph, namely an edge tensor and a node feature tensor
edges = tf.convert_to_tensor(edges[["target", "source"]])
node_states = tf.convert_to_tensor(nodes.sort_values("node id").iloc[:, 1:-4]) # remove all future values

# Print shapes of the graph
print("Edges shape:\t\t", edges.shape)
print("Node features shape:", node_states.shape)
print(edges)
print(nodes)


Edges shape:		 (2931, 2)
Node features shape: (2309, 6)
tf.Tensor(
[[19585800 19590700]
 [19585800 19595200]
 [19590700 19592400]
 ...
 [20002900 20004700]
 [20013400 20013402]
 [20015600 20015900]], shape=(2931, 2), dtype=int64)
       node id  current x  current y  ...  future y  future vector x  \
0     19502500    40972.0   -16957.0  ...  -16480.0            213.0   
1     19585800    12688.0    -6816.0  ...   -7427.0            693.0   
2     19590400   -16367.0    21644.0  ...       NaN              NaN   
...        ...        ...        ...  ...       ...              ...   
2306  20015100   -19196.0     3668.0  ...    3668.0              0.0   
2307  20015600    17568.0   -13258.0  ...  -13258.0              0.0   
2308  20015900    16994.0   -12152.0  ...  -12152.0              0.0   

      future vector y  
0               477.0  
1              -611.0  
2                 NaN  
...               ...  
2306              0.0  
2307              0.0  
2308              0.0  
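
The node features printed above are raw coordinates in the tens of thousands, and some entries (for example a missing `previous y`) are `NaN`. One common preprocessing option, which this notebook does not apply, is to standardize each feature column. The sketch below is an assumption-laden illustration: the function name `standardize` is hypothetical, and replacing the remaining `NaN` entries with 0 after scaling is only one possible imputation choice:

```python
import numpy as np


def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each column while ignoring NaNs, then set remaining NaNs to 0."""
    mean = np.nanmean(features, axis=0)
    std = np.nanstd(features, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant columns
    scaled = (features - mean) / std
    # Assumption: a missing value is replaced by the column mean, which is 0 after scaling
    return np.nan_to_num(scaled, nan=0.0)


features = np.array(
    [[40972.0, -16957.0], [12688.0, np.nan], [-16367.0, 21644.0]]
)
scaled = standardize(features)
print(scaled)
```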



In [187]:
class GraphAttention(layers.Layer):
    def __init__(
        self,
        units,
        kernel_initializer="glorot_uniform",
        kernel_regularizer=None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.units = units
        self.kernel_initializer = keras.initializers.get(kernel_initializer)
        self.kernel_regularizer = keras.regularizers.get(kernel_regularizer)

    def build(self, input_shape):

        self.kernel = self.add_weight(
            shape=(input_shape[0][-1], self.units),
            trainable=True,
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            name="kernel",
        )
        self.kernel_attention = self.add_weight(
            shape=(self.units * 2, 1),
            trainable=True,
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            name="kernel_attention",
        )
        self.built = True

    def call(self, inputs):
        node_states, edges = inputs

        # Linearly transform node states
        node_states_transformed = tf.matmul(node_states, self.kernel)

        # (1) Compute pair-wise attention scores
        node_states_expanded = tf.gather(node_states_transformed, edges)
        node_states_expanded = tf.reshape(
            node_states_expanded, (tf.shape(edges)[0], -1)
        )
        attention_scores = tf.nn.leaky_relu(
            tf.matmul(node_states_expanded, self.kernel_attention)
        )
        attention_scores = tf.squeeze(attention_scores, -1)

        # (2) Normalize attention scores
        attention_scores = tf.math.exp(tf.clip_by_value(attention_scores, -2, 2))
        attention_scores_sum = tf.math.unsorted_segment_sum(
            data=attention_scores,
            segment_ids=edges[:, 0],
            num_segments=tf.reduce_max(edges[:, 0]) + 1,
        )
        attention_scores_sum = tf.repeat(
            attention_scores_sum, tf.math.bincount(tf.cast(edges[:, 0], "int32"))
        )
        attention_scores_norm = attention_scores / attention_scores_sum

        # (3) Gather node states of neighbors, apply attention scores and aggregate
        node_states_neighbors = tf.gather(node_states_transformed, edges[:, 1])
        out = tf.math.unsorted_segment_sum(
            data=node_states_neighbors * attention_scores_norm[:, tf.newaxis],
            segment_ids=edges[:, 0],
            num_segments=tf.shape(node_states)[0],
        )
        return out


class MultiHeadGraphAttention(layers.Layer):
    def __init__(self, units, num_heads=8, merge_type="concat", **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.merge_type = merge_type
        self.attention_layers = [GraphAttention(units) for _ in range(num_heads)]

    def call(self, inputs):
        atom_features, pair_indices = inputs

        # Obtain outputs from each attention head
        outputs = [
            attention_layer([atom_features, pair_indices])
            for attention_layer in self.attention_layers
        ]
        # Concatenate or average the node states from each head
        if self.merge_type == "concat":
            outputs = tf.concat(outputs, axis=-1)
        else:
            outputs = tf.reduce_mean(tf.stack(outputs, axis=-1), axis=-1)
        # Activate and return node states
        return tf.nn.relu(outputs)
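
Step (2) of `GraphAttention.call` normalizes the exponentiated scores so that the weights of all edges sharing a target node sum to one. A small NumPy sketch of that normalization, assuming the edges already use 0-based node indices (the function name `segment_softmax` is hypothetical, and the score clipping used in the layer is omitted):

```python
import numpy as np


def segment_softmax(scores: np.ndarray, targets: np.ndarray, num_nodes: int) -> np.ndarray:
    """Softmax over edge scores, grouped by each edge's target node index."""
    exp_scores = np.exp(scores)
    # Sum exp(score) per target node; np.add.at accumulates over repeated indices
    sums = np.zeros(num_nodes)
    np.add.at(sums, targets, exp_scores)
    # Look up each edge's group sum by its target index
    return exp_scores / sums[targets]


scores = np.array([0.0, 0.0, 1.0])
targets = np.array([0, 0, 2])
weights = segment_softmax(scores, targets, num_nodes=3)
print(weights)
```

Looking up the per-node sum with `sums[targets]` means this sketch does not depend on the order in which the edges are listed.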

In [188]:
class GraphAttentionNetwork(keras.Model):
    def __init__(
        self,
        node_states,
        edges,
        hidden_units,
        num_heads,
        num_layers,
        output_dim,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.node_states = node_states
        self.edges = edges
        self.preprocess = layers.Dense(hidden_units * num_heads, activation="relu")
        self.attention_layers = [
            MultiHeadGraphAttention(hidden_units, num_heads) for _ in range(num_layers)
        ]
        self.output_layer = layers.Dense(output_dim)

    def call(self, inputs):
        node_states, edges = inputs
        x = self.preprocess(node_states)
        for attention_layer in self.attention_layers:
            x = attention_layer([x, edges]) + x
        outputs = self.output_layer(x)
        return outputs

    def train_step(self, data):
        indices, labels = data

        with tf.GradientTape() as tape:
            # Forward pass
            outputs = self([self.node_states, self.edges])
            # Compute loss
            loss = self.compiled_loss(labels, tf.gather(outputs, indices))
        # Compute gradients
        grads = tape.gradient(loss, self.trainable_weights)
        # Apply gradients (update weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        # Update metric(s)
        self.compiled_metrics.update_state(labels, tf.gather(outputs, indices))

        return {m.name: m.result() for m in self.metrics}

    def predict_step(self, data):
        indices = data
        # Forward pass
        outputs = self([self.node_states, self.edges])
        # Return the predicted displacement vectors (regression, so no softmax)
        return tf.gather(outputs, indices)

    def test_step(self, data):
        indices, labels = data
        # Forward pass
        outputs = self([self.node_states, self.edges])
        # Compute loss
        loss = self.compiled_loss(labels, tf.gather(outputs, indices))
        # Update metric(s)
        self.compiled_metrics.update_state(labels, tf.gather(outputs, indices))

        return {m.name: m.result() for m in self.metrics}

### Train and evaluate

In [189]:
# Define hyper-parameters
HIDDEN_UNITS = 100
NUM_HEADS = 8
NUM_LAYERS = 3
OUTPUT_DIM = 2  # Predict the future displacement vector (x, y)

NUM_EPOCHS = 100
BATCH_SIZE = 256
VALIDATION_SPLIT = 0.1
LEARNING_RATE = 3e-1
MOMENTUM = 0.9

# Loss and metrics for regression
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.SGD(LEARNING_RATE, momentum=MOMENTUM)
metrics = [keras.metrics.MeanAbsoluteError(name="mae")]

early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", min_delta=1e-5, patience=5, restore_best_weights=True
)

# Build and compile model
gat_model = GraphAttentionNetwork(
    node_states, edges, HIDDEN_UNITS, NUM_HEADS, NUM_LAYERS, OUTPUT_DIM
)
gat_model.compile(loss=loss_fn, optimizer=optimizer, metrics=metrics)

# Train
gat_model.fit(
    x=train_indices,
    y=train_labels,  # shape: (num_nodes, 2)
    validation_split=VALIDATION_SPLIT,
    batch_size=BATCH_SIZE,
    epochs=NUM_EPOCHS,
    callbacks=[early_stopping],
    verbose=2,
)

# Evaluate
loss, mae = gat_model.evaluate(x=test_indices, y=test_labels, verbose=0)
print("--" * 38 + f"\nTest MAE (per-coordinate absolute error): {mae:.4f}")


Epoch 1/100


InvalidArgumentError: Graph execution error:

Detected at node graph_attention_network_5_1/multi_head_graph_attention_15_1/graph_attention_122_1/GatherV2 defined at (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main

  File "<frozen runpy>", line 88, in _run_code

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel_launcher.py", line 18, in <module>

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\kernelapp.py", line 739, in start

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205, in start

  File "C:\Users\cpick\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 641, in run_forever

  File "C:\Users\cpick\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 1986, in _run_once

  File "C:\Users\cpick\AppData\Local\Programs\Python\Python312\Lib\asyncio\events.py", line 88, in _run

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 545, in dispatch_queue

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 534, in process_one

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 437, in dispatch_shell

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 362, in execute_request

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 778, in execute_request

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 449, in do_execute

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\ipykernel\zmqshell.py", line 549, in run_cell

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3047, in run_cell

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3102, in _run_cell

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\async_helpers.py", line 128, in _pseudo_sync_runner

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3306, in run_cell_async

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3489, in run_ast_nodes

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3549, in run_code

  File "C:\Users\cpick\AppData\Local\Temp\ipykernel_11948\1091928056.py", line 29, in <module>

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 371, in fit

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 219, in function

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 132, in multi_step_on_iterator

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\backend\tensorflow\trainer.py", line 113, in one_step_on_data

  File "C:\Users\cpick\AppData\Local\Temp\ipykernel_11948\3672007774.py", line 34, in train_step

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\layers\layer.py", line 909, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\ops\operation.py", line 52, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 156, in error_handler

  File "C:\Users\cpick\AppData\Local\Temp\ipykernel_11948\3672007774.py", line 25, in call

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\layers\layer.py", line 909, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\ops\operation.py", line 52, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 156, in error_handler

  File "C:\Users\cpick\AppData\Local\Temp\ipykernel_11948\1482980897.py", line 82, in call

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\layers\layer.py", line 909, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 117, in error_handler

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\ops\operation.py", line 52, in __call__

  File "c:\Users\cpick\Documents\GitHub\DL_Assignements\.venv\Lib\site-packages\keras\src\utils\traceback_utils.py", line 156, in error_handler

  File "C:\Users\cpick\AppData\Local\Temp\ipykernel_11948\1482980897.py", line 39, in call

indices[2852,0] = 20002900 is not in [0, 2309)
	 [[{{node graph_attention_network_5_1/multi_head_graph_attention_15_1/graph_attention_122_1/GatherV2}}]] [Op:__inference_multi_step_on_iterator_107237]
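
The last line of the error reports that the edge entry `20002900` lies outside `[0, 2309)`: `tf.gather` expects row positions into the node-feature tensor, whereas the edge tensor holds raw node ids. A minimal sketch of one possible remapping follows. It assumes each id identifies exactly one row; the printed data suggests ids can repeat across scene files, in which case a per-scene key would be needed. The names `remap_ids` and `id_to_index` are hypothetical:

```python
import numpy as np


def remap_ids(node_ids: np.ndarray, edges: np.ndarray):
    """Map raw node ids to 0-based row positions and rewrite the edges to match.

    Edges that mention an id missing from `node_ids` are dropped.
    If an id occurs more than once, the last row with that id wins.
    """
    id_to_index = {int(node_id): row for row, node_id in enumerate(node_ids)}
    keep = np.array(
        [int(t) in id_to_index and int(s) in id_to_index for t, s in edges],
        dtype=bool,
    )
    remapped = np.array(
        [[id_to_index[int(t)], id_to_index[int(s)]] for t, s in edges[keep]],
        dtype=np.int64,
    ).reshape(-1, 2)
    return id_to_index, remapped


node_ids = np.array([19585800, 19590700, 19595200])
raw_edges = np.array(
    [[19585800, 19590700], [19585800, 19595200], [19590700, 19592400]]
)
id_to_index, edge_index = remap_ids(node_ids, raw_edges)
print(edge_index)
```

Every value in `edge_index` is smaller than the number of nodes, so it can safely index a feature matrix with one row per node. The same `id_to_index` lookup could be applied to the training and test ids before they are used to gather model outputs.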