# Exercise 07 Traffic Demand Prediction

In this exercise, write Python code that meets the requirements of each question. The following example is for reference:

- Sample Question: Write a program that takes the user's name as input and prints "Hello, [name]!" where [name] is the user's input.

- Potential Answer:

```python
name = input("Enter your name: ")
print("Hello, " + name + "!")
```
- If you enter 'David', the code prints 'Hello, David!', which satisfies the requirements.

## Attention
- A question usually has more than one valid answer. You don't have to follow the tutorial's instructions strictly, as long as the output of your code meets the requirements of the question.
- If possible, keep your code concise and avoid relying heavily on uncommon libraries.
- You may need to search for information on the Internet to complete the exercise.

### Question 01: A NumPy dataset file named "processed_demand_datasetsMAN.npz" contains a training set (trainX and trainy), a validation set (valX and valy), and a test set (testX and testy). Load the training set and display its shape.

### Write your answer in the following code frame:

In [1]:
import pandas as pd
import numpy as np

# Load the archive and unpack each split into features (X) and targets (y)
datasets = np.load('./processed_demand_datasetsMAN.npz')
X_train, y_train = datasets['trainX'], datasets['trainy']
X_val, y_val = datasets['valX'], datasets['valy']
X_test, y_test = datasets['testX'], datasets['testy']

print('The dimensions of training dataset:', X_train.shape)

The dimensions of training dataset: (63, 886, 42)
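
Before unpacking an `.npz` archive, it can help to list which arrays it holds and what shape each one has. The sketch below does this on a small synthetic archive built in memory; the key names mirror the ones in the question, but the array contents and shapes are made up for illustration and are not the real dataset.

```python
import io
import numpy as np

# Build a tiny in-memory archive (synthetic data, NOT the real dataset).
buffer = io.BytesIO()
np.savez(buffer, trainX=np.zeros((3, 5, 4)), trainy=np.zeros((3, 5)))
buffer.seek(0)

# `files` lists the array names stored in the archive.
archive = np.load(buffer)
shapes = {key: archive[key].shape for key in archive.files}

for key, shape in shapes.items():
    print(key, shape)
```

The same loop should work on the archive loaded from disk, for example to check that `trainX` and `trainy` agree on their leading dimensions.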


### Question 02: Define a function named `mean_absolute_percentage_error` that calculates the Mean Absolute Percentage Error (MAPE). Then, based on `mean_absolute_percentage_error`, define another function named `compute_metric` that uses numpy to calculate evaluation metrics for a machine learning model's predictions. The evaluation metrics include Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).

### Write your answer in the following code frame:

In [10]:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

def mean_absolute_percentage_error(y_true, y_pred):
    # MAPE in percent; y_true must not contain zeros
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def compute_metric(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    # MAPE is evaluated only where y_true > 5: dividing by small or zero
    # true values would make the percentage error unstable.
    mask = y_true > 5
    mape = mean_absolute_percentage_error(y_true[mask], y_pred[mask])
    return mae, rmse, mape
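
Since the question asks for the metrics "using numpy", here is a minimal NumPy-only sketch of the same three metrics. The function name `compute_metric_np` and the `threshold` parameter are hypothetical names introduced for illustration; the default of 5 simply mirrors the `y_true > 5` filter in the answer above, and the input values are made up.

```python
import numpy as np

def compute_metric_np(y_true, y_pred, threshold=5):
    """Return (MAE, RMSE, MAPE in percent) using NumPy only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_true - y_pred

    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))

    # Restrict MAPE to entries whose true value exceeds the threshold.
    mask = y_true > threshold
    mape = np.mean(np.abs(error[mask] / y_true[mask])) * 100
    return mae, rmse, mape

# Made-up values: the third entry (true value 2) is excluded from MAPE.
y_true_demo = np.array([10.0, 20.0, 2.0])
y_pred_demo = np.array([8.0, 25.0, 4.0])
mae_demo, rmse_demo, mape_demo = compute_metric_np(y_true_demo, y_pred_demo)
print(f'{mae_demo:.3f} {rmse_demo:.3f} {mape_demo:.3f}')  # 3.000 3.317 22.500
```

Here the absolute errors are 2, 5, and 2, so MAE is 3 and RMSE is the square root of 11. MAPE averages 2/10 and 5/20 over the two entries that pass the filter, giving 22.5%.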

### Question 03: Fit the traffic demand data using a GradientBoostingRegressor on the training set. Make predictions using the test set and calculate the MAE, RMSE, and MAPE of the predictions. Output these metrics.

### Write your answer in the following code frame:

In [12]:
from sklearn.ensemble import GradientBoostingRegressor
# Fit the GradientBoostingRegressor
gbdt = GradientBoostingRegressor(random_state=666)
gbdt.fit(X_train.reshape(-1, 42), y_train.reshape(-1))

# Predict using the test set
pred = gbdt.predict(X_test.reshape(-1, 42))

# Calculate metrics
mae, rmse, mape = compute_metric(y_test.reshape(-1), pred)
print(f'Test MAE: {mae:.3f}, RMSE: {rmse:.3f}, MAPE: {mape:.3f}')

Test MAE: 2.224, RMSE: 4.432, MAPE: 22.182
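
The `reshape(-1, 42)` and `reshape(-1)` calls above merge the first two axes so that each row becomes one sample for the regressor. The sketch below illustrates this on small synthetic arrays. It assumes the target array has the same two leading dimensions as the feature array, which the answer's `fit` call implies but the source does not state outright.

```python
import numpy as np

# Synthetic stand-ins: 2 x 3 blocks with 4 features each (not the real data).
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
y = np.arange(2 * 3).reshape(2, 3)

# Merge the first two axes; taking the feature count from the array itself
# avoids hard-coding it.
X_flat = X.reshape(-1, X.shape[-1])
y_flat = y.reshape(-1)
print(X_flat.shape, y_flat.shape)  # (6, 4) (6,)

# Row i * 3 + j of the flattened array corresponds to X[i, j],
# so features and targets stay aligned.
i, j = 1, 2
aligned = np.array_equal(X_flat[i * 3 + j], X[i, j]) and y_flat[i * 3 + j] == y[i, j]
print(aligned)  # True
```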


### Question 04: Fit the traffic demand data using RandomForestRegressor and MLPRegressor. Make predictions using the test set for each model and calculate the MAE, RMSE, and MAPE. Organize these metrics into a Pandas DataFrame and display it.

### Write your answer in the following code frame:

In [18]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
# Fit models and calculate metrics

rf = RandomForestRegressor(random_state=666)
mlp = MLPRegressor(random_state=200, max_iter=400)

rf.fit(X_train.reshape(-1, 42), y_train.reshape(-1))
mlp.fit(X_train.reshape(-1, 42), y_train.reshape(-1))

models = [rf, mlp]
model_names = ['Random Forest', 'MLP']
results = []

for model, name in zip(models, model_names):
    pred = model.predict(X_test.reshape(-1, 42))
    mae, rmse, mape = compute_metric(y_test.reshape(-1), pred)
    results.append({'Model': name, 'MAE': mae, 'RMSE': rmse, 'MAPE': mape})

# Create a Pandas DataFrame
results_df = pd.DataFrame(results)

# Display the DataFrame
print(results_df)

           Model       MAE      RMSE       MAPE
0  Random Forest  2.163492  4.382477  21.648005
1            MLP  2.272669  4.587214  23.019059
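
To compare all three models at a glance, the metrics can be gathered into one table indexed by model name. The sketch below reuses the numbers printed in this exercise (the Gradient Boosting row comes from the Question 03 output); the label 'Gradient Boosting' and the rounding and sorting choices are assumptions made for presentation.

```python
import pandas as pd

# Metric values copied from the outputs shown in this exercise.
rows = [
    {'Model': 'Gradient Boosting', 'MAE': 2.224, 'RMSE': 4.432, 'MAPE': 22.182},
    {'Model': 'Random Forest', 'MAE': 2.163492, 'RMSE': 4.382477, 'MAPE': 21.648005},
    {'Model': 'MLP', 'MAE': 2.272669, 'RMSE': 4.587214, 'MAPE': 23.019059},
]

# Index by model name, round for readability, and sort by MAE (lower is better).
table = pd.DataFrame(rows).set_index('Model').round(3).sort_values('MAE')
print(table)
```

With these values, Random Forest has the lowest error on all three metrics, followed by Gradient Boosting and then MLP.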


### Question 05: A NumPy file named "edges_GAman.npy" contains edge information of a road network. Load the file and display the number of edges.

### Write your answer in the following code frame:


In [14]:
# Load the edges
edges = np.load('./edges_GAman.npy')

# Calculate and display the number of edges
num_edges = len(edges)
print(f'Number of edges: {num_edges}')

Number of edges: 325
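
Note that `len(edges)` counts entries along the first axis only, so it equals the number of edges only if each row of the array is one edge. The layout of "edges_GAman.npy" is not described here, so checking `edges.shape` first is a reasonable precaution. The sketch below uses a made-up edge list to show how the two common layouts differ.

```python
import numpy as np

# Hypothetical edge list: 4 edges among 3 nodes, one (source, target) pair per row.
edge_list = np.array([[0, 1], [1, 2], [2, 0], [0, 2]])

# The same edges stored column-wise, as some graph libraries expect.
edge_index = edge_list.T

print(len(edge_list), edge_list.shape)    # 4 (4, 2)
print(len(edge_index), edge_index.shape)  # 2 (2, 4)
```

In the column-wise layout, `len` returns 2 rather than the edge count, and `edge_index.shape[1]` would be the number to report instead.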
