
### **Executive Summary**

To accurately predict project budgets, we are building an Artificial Neural Network (ANN) model. This is a sophisticated analytical tool that learns from historical project data to find complex patterns and make predictions on new projects.

The entire process, from data preparation to model evaluation, can be executed in both the **Python** and **MATLAB** environments. Both approaches rest on the same mathematical foundations, so the core logic matches. The main differences lie in the libraries and syntax used for implementation, plus the output-layer design described in Stage 2.

This document provides a side-by-side comparison, explaining each mathematical formula, its purpose, and how it is implemented in both Python and MATLAB code.

***

### **Synchronous Process and Formula Explanation**

The model development follows four main stages, as outlined in the *Mathematical Equations Analysis Report*.

#### **Stage 1: Data Preprocessing and Feature Scaling**

**Purpose:** Before feeding data into a neural network, we must clean and transform it. This stage is crucial for stabilizing the model, improving its learning speed, and ensuring that all features are treated fairly, regardless of their original scale (e.g., budget in millions vs. number of storeys).

---

**1.1. Log-Transformation of the Target Variable (`log1p`)**

*   **What it is:** This transformation is applied to the project budget (`Budget`) to handle its skewed distribution, which is common in financial data where many projects have low budgets and a few have very high budgets.

*   **The Formula:**
    `y' = ln(1 + y)`
    *   `y` is the original project budget.
    *   `ln` is the natural logarithm.
    *   `y'` is the new, log-transformed budget value.
    *   The `+1` (making it the "log1p" function) is a standard technique to handle cases where a budget might be zero, as `ln(0)` is undefined.

*   **Implementation:**
    *   **Python:** In the `Arch_python_model.ipynb` notebook, this is done using the NumPy library.
        ```python
        # Section 3: Log-Transform the Target Variable
        df_merged['Budget_log'] = np.log1p(df_merged['Budget'])
        ```
    *   **MATLAB:** In `main_script.m`, MATLAB has a built-in function that performs the same operation.
        ```matlab
        % Section 3: Log-Transform the Target Variable
        T_merged.Budget_log = log1p(T_merged.Budget);
        ```
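
As a minimal sketch of the formula above, the following uses only Python's standard library (`math.log1p` is the scalar counterpart of `np.log1p`). The budget values are made up for illustration and are not taken from the project data.

```python
import math

# Hypothetical budgets in PHP, including a zero-budget edge case.
budgets = [0.0, 1_000_000.0, 250_000_000.0]

# y' = ln(1 + y), applied to each budget.
budgets_log = [math.log1p(y) for y in budgets]

# ln(0) is undefined, but log1p(0) is exactly 0.
zero_maps_to_zero = budgets_log[0] == 0.0

# The transform preserves ordering while compressing the range.
is_monotonic = budgets_log == sorted(budgets_log)
```

The raw budgets span more than eight orders of magnitude here, while the transformed values stay below 20, which is the range compression the transform is meant to achieve.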

---

**1.2. Standardization (Standard Scaler / zscore)**

*   **What it is:** This technique is applied to the input features (e.g., estimated costs of materials, number of storeys) to put them on a comparable scale.

*   **The Formula:**
    `x' = (x - μ) / σ`
    *   `x` is the original value of a feature.
    *   `μ` (mu) is the mean (average) of that feature from the training data.
    *   `σ` (sigma) is the standard deviation of that feature from the training data.
    *   `x'` is the new, standardized feature value. On the training data, each standardized feature has a mean of 0 and a standard deviation of 1.

*   **Implementation:**
    *   **Python:** We use the `StandardScaler` from the Scikit-learn library, a standard for machine learning in Python.
        ```python
        # Section 4: Feature and Target Scaling
        scaler_X = StandardScaler()
        X_train_scaled = scaler_X.fit_transform(X_train)
        ```
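    *   **MATLAB:** MATLAB's `zscore` function implements this formula and returns the mean (`mu`) and standard deviation (`sigma`) of the training set, which we store to apply to new data. Note that `zscore` defaults to the sample standard deviation (dividing by `n - 1`), whereas Scikit-learn's `StandardScaler` uses the population standard deviation (dividing by `n`), so the scaled values differ slightly between the two environments unless `zscore(X_train, 1)` is used.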
        ```matlab
        % Section 4: Data Preparation
        [X_train_scaled, scaler_X_mu, scaler_X_sigma] = zscore(X_train);
        ```
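
The key point in both snippets is that `μ` and `σ` come from the training data only and are then reused on new data. A plain-Python sketch of that idea follows; the helper names (`fit_standardizer`, `standardize`) and the storey counts are hypothetical and are not part of the notebook.

```python
import statistics

def fit_standardizer(values, sample_std=False):
    """Return (mu, sigma) estimated from training values only."""
    mu = statistics.fmean(values)
    # pstdev divides by n; stdev divides by n - 1.
    sigma = statistics.stdev(values) if sample_std else statistics.pstdev(values)
    return mu, sigma

def standardize(values, mu, sigma):
    """Apply x' = (x - mu) / sigma with previously fitted statistics."""
    return [(x - mu) / sigma for x in values]

# Hypothetical feature: number of storeys.
storeys_train = [2.0, 4.0, 6.0, 8.0]
storeys_test = [5.0, 10.0]

mu, sigma = fit_standardizer(storeys_train)
train_scaled = standardize(storeys_train, mu, sigma)
# The test set reuses the training statistics; it is never refitted.
test_scaled = standardize(storeys_test, mu, sigma)

# The n - 1 variant gives a slightly larger sigma on the same data.
_, sigma_sample = fit_standardizer(storeys_train, sample_std=True)
```

Reusing the training statistics keeps information about the test set out of the model, which is why the scaled test values are not guaranteed to have a mean of 0.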

---

**1.3. Min-Max Scaling**

*   **What it is:** This technique is applied to our log-transformed target variable (`Budget_log`). It scales the values to fit within a specific range, here `[0, 1]`. This matters for the Python model because its final layer applies the Sigmoid function, whose output always lies between 0 and 1 (it approaches the endpoints without exactly reaching them).

*   **The Formula:**
    `y'' = (y' - min(y'_train)) / (max(y'_train) - min(y'_train))`
    *   `y'` is a log-transformed budget value.
    *   `min(y'_train)` and `max(y'_train)` are the minimum and maximum log-transformed budget values from the training data.
    *   `y''` is the final scaled value between 0 and 1.

*   **Implementation:**
    *   **Python:** We use the `MinMaxScaler` from the Scikit-learn library.
        ```python
        # Section 4: Feature and Target Scaling
        scaler_y = MinMaxScaler()
        # MinMaxScaler expects a 2-D array, so y_train must have shape (n_samples, 1)
        y_train_scaled = scaler_y.fit_transform(y_train)
        ```
    *   **MATLAB:** This is calculated directly using the formula, storing the min and max values from the training data for later use.
        ```matlab
        % Section 4: Data Preparation
        scaler_y_min = min(y_train);
        scaler_y_max = max(y_train);
        y_train_scaled = (y_train - scaler_y_min) / (scaler_y_max - scaler_y_min);
        ```

---

**1.4. Inverse Transformations (`expm1` and Inverse Scaling)**

*   **What they are:** The model's prediction is on the Min-Max-scaled (`[0, 1]`), log-transformed scale, so we must reverse both steps to recover the budget in the original currency (PHP). The order matters: first reverse the Min-Max scaling, then reverse the log-transformation.

*   **The Formula (Inverse Min-Max Scaling):**
    `y' = y'' * (max(y'_train) - min(y'_train)) + min(y'_train)`
    *   This undoes the scaling from Section 1.3 using the minimum and maximum stored from the training data.

*   **The Formula (Inverse Log):**
    `y = e^(y') - 1`
    *   This is the mathematical inverse of the `log1p` function, where `e` is Euler's number.

*   **Implementation:**
    *   **Python:** We first use the scaler's `inverse_transform` method, followed by NumPy's `expm1` function.
        ```python
        # Section 6: Inverse Transform Predictions
        log_predictions = scaler_y.inverse_transform(scaled_predictions)
        final_predictions = np.expm1(log_predictions)
        ```
    *   **MATLAB:** We manually reverse the Min-Max scaling using the stored min/max values, followed by the built-in `expm1` function.
        ```matlab
        % Section 6: Inverse Transform Predictions
        log_predictions = scaled_predictions .* (scaler_y_max - scaler_y_min) + scaler_y_min;
        final_predictions = expm1(log_predictions);
        ```
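
To check that the forward and inverse steps really cancel out, the round trip can be sketched in plain Python. The helper names (`fit_min_max`, `scale`, `unscale`) and the budget figures are assumptions made for this illustration only.

```python
import math

def fit_min_max(values):
    """Return the (min, max) of the training values."""
    return min(values), max(values)

def scale(values, lo, hi):
    """Forward Min-Max scaling: (v - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, lo, hi):
    """Inverse Min-Max scaling: v * (max - min) + min."""
    return [v * (hi - lo) + lo for v in values]

# Hypothetical training budgets in PHP.
budgets_train = [1_000_000.0, 5_000_000.0, 40_000_000.0]

# Forward: log1p first, then Min-Max scaling.
log_train = [math.log1p(y) for y in budgets_train]
lo, hi = fit_min_max(log_train)
scaled = scale(log_train, lo, hi)

# Inverse: undo Min-Max scaling first, then expm1.
recovered = [math.expm1(v) for v in unscale(scaled, lo, hi)]
```

The smallest and largest training budgets map to 0 and 1 respectively, and `recovered` matches `budgets_train` up to floating-point rounding. Applying the inverse steps in the wrong order would not return the original values.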

***

#### **Stage 2: Neural Network Components**

**Purpose:** The neural network is the "brain" of our model. It consists of layers of interconnected "neurons" that process the input features. Activation functions are mathematical gates that decide how information flows through these neurons, allowing the network to learn complex, non-linear patterns.

---

**2.1. Rectified Linear Unit (ReLU) Activation Function**

*   **What it is:** This is the activation function used in the hidden layers of our network. It's a simple but powerful function that helps the model learn efficiently.

*   **The Formula:**
    `f(x) = max(0, x)`
    *   It simply returns the input `x` if it's positive, and 0 otherwise. This introduces non-linearity without being computationally expensive.

*   **Implementation:**
    *   **Python:** In our `RegressionNet` class, this is applied using the PyTorch library's functional module.
        ```python
        # Section 5: RegressionNet Class
        x = F.relu(self.layer1(x))
        ```
    *   **MATLAB:** This is defined as a layer type in the network architecture using the Deep Learning Toolbox.
        ```matlab
        % Section 5: Build and Train the ANN
        reluLayer('Name', 'relu1')
        ```
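
The formula itself is small enough to write out directly. The following is an illustrative scalar version, not the PyTorch or MATLAB implementation, and the input values are arbitrary.

```python
def relu(x):
    """f(x) = max(0, x): pass positive inputs through, clamp the rest to 0."""
    return max(0.0, x)

# Arbitrary pre-activation values for a single neuron.
pre_activations = [-2.5, 0.0, 3.0]
outputs = [relu(x) for x in pre_activations]
```

Here `outputs` is `[0.0, 0.0, 3.0]`: the negative input is zeroed and the positive input is unchanged.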

---

**2.2. Sigmoid Activation Function**

*   **What it is:** This function is used in the **final output layer** of the Python model. It squashes any input value into a range between 0 and 1, perfectly matching the Min-Max scaled target variable we created in Stage 1.

*   **The Formula:**
    `σ(x) = 1 / (1 + e^(-x))`

*   **Implementation:**
    *   **Python:** This is applied in the forward pass of our `RegressionNet` class.
        ```python
        # Section 5: RegressionNet Class
        x = torch.sigmoid(self.output_layer(x))
        ```
    *   **MATLAB:** The MATLAB implementation does not use a sigmoid layer. Its network ends in a `regressionLayer`, which defines the training loss rather than an activation, so the predicted values are not constrained to `[0, 1]` and can fall outside the range of the scaled target. The training principle is the same, but the two architectures are not identical, so their predictions should be expected to differ somewhat. This is a common difference in how deep learning toolboxes are structured.
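
For reference, the sigmoid formula can be sketched as a scalar function. This version is written in a numerically stable form (an implementation choice made for this sketch, since `math.exp` overflows for large arguments) and is not the code PyTorch uses internally.

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x)), arranged to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^x is small, so this equivalent form is safe.
    z = math.exp(x)
    return z / (1.0 + z)

midpoint = sigmoid(0.0)   # the curve is centred at 0.5
low = sigmoid(-5.0)       # close to 0
high = sigmoid(5.0)       # close to 1
```

Large positive inputs push the output toward 1 and large negative inputs toward 0, which is why the target must be scaled into the same range before training.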

***

#### **Stage 3: Model Training (Loss Function)**

**Purpose:** Training is the process of adjusting the network's internal parameters to minimize its prediction error. The "loss function" is the formula we use to measure this error.

---

**3.1. Mean Squared Error (MSE)**

*   **What it is:** MSE is the function used to quantify the model's error during training. It calculates the average of the squared differences between the true values and the predicted values.

*   **The Formula:**
    `MSE = (1/n) * Σ(y_i - ŷ_i)^2`
    *   `n` is the number of data points.
    *   `y_i` is the actual scaled target value.
    *   `ŷ_i` is the model's predicted scaled value.
    *   Squaring the difference penalizes large errors disproportionately, so training prioritizes reducing the biggest misses.

*   **Implementation:**
    *   **Python:** We define this as our criterion for training using PyTorch's built-in `MSELoss`.
        ```python
        # Section 5: Model Initialization and Training Setup
        criterion = nn.MSELoss()
        ```
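    *   **MATLAB:** The `regressionLayer` used in our network definition supplies the loss. It computes the half-mean-squared-error, which for a single predicted value is the MSE scaled by a constant factor of 1/2 and therefore has the same minimizer. The toolbox handles this calculation automatically during the training process initiated by `trainNetwork`.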
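
A small worked example of the MSE formula, using made-up scaled targets and predictions rather than values from the model:

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# Hypothetical scaled targets and predictions in [0, 1].
y_true = [0.2, 0.5, 0.9]
y_pred = [0.25, 0.4, 0.7]

loss = mse(y_true, y_pred)
```

The three errors are 0.05, 0.1 and 0.2, so the squared errors are 0.0025, 0.01 and 0.04, and the loss is 0.0525 / 3 = 0.0175. The largest error contributes more than three quarters of the total.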

***

#### **Stage 4: Model Performance Evaluation**

**Purpose:** Once the model is trained, we need to evaluate its performance on unseen test data. These metrics tell us how accurate and reliable our final model is. **All evaluation is done on the final, inverse-transformed predictions (in PHP)** to make the results interpretable.

---

**4.1. R-squared (R² or Coefficient of Determination)**

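*   **What it is:** Measures the proportion of the variance in the project budget that is predictable from our input features. A value close to 1 indicates a model that explains a large portion of the budget's variability, a value of 0 means the model does no better than always predicting the mean budget, and negative values are possible on test data when it does worse than that.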
*   **Implementation:**
    *   **Python:** Calculated using the `r2_score` function from Scikit-learn.
        ```python
        # Section 6: Calculate and Display Performance Metrics
        r2_ann = r2_score(y_test_actual, final_predictions)
        ```
    *   **MATLAB:** Calculated directly using its mathematical formula.
        ```matlab
        % Section 6: Compute metrics
        r2_ann = 1 - sum((y_test_actual - final_predictions).^2) / sum((y_test_actual - mean(y_test_actual)).^2);
        ```
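
The MATLAB expression above is the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below restates it in plain Python with invented numbers to show the two reference points of the scale; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual budgets (in millions of PHP).
actual = [10.0, 20.0, 30.0]

perfect = r_squared(actual, actual)                 # every prediction exact
baseline = r_squared(actual, [20.0, 20.0, 20.0])    # always predict the mean
```

A perfect model scores 1.0 and the predict-the-mean baseline scores 0.0, so a useful model should land between the two on the test set.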

---

**4.2. Mean Absolute Error (MAE)**

*   **What it is:** Measures the average absolute difference between the predicted and actual budgets. It gives a straightforward interpretation of the average error in currency units (e.g., "on average, the model's prediction is off by X PHP").
*   **Implementation:**
    *   **Python:** Calculated using `mean_absolute_error` from Scikit-learn.
        ```python
        # Section 6: Calculate and Display Performance Metrics
        mae_ann = mean_absolute_error(y_test_actual, final_predictions)
        ```
    *   **MATLAB:** Calculated directly using its formula.
        ```matlab
        % Section 6: Compute metrics
        mae_ann = mean(abs(y_test_actual - final_predictions));
        ```

---

**4.3. Root Mean Squared Error (RMSE)**

*   **What it is:** This is the square root of the MSE. Like MAE, it is in the same units as the budget, but because it's based on squared errors, it gives a higher weight to large prediction errors. A lower RMSE is generally better.
*   **Implementation:**
    *   **Python:** Calculated using `mean_squared_error` from Scikit-learn and then taking the square root.
        ```python
        # Section 6: Calculate and Display Performance Metrics
        rmse_ann = np.sqrt(mean_squared_error(y_test_actual, final_predictions))
        ```
    *   **MATLAB:** Calculated directly using its formula.
        ```matlab
        % Section 6: Compute metrics
        rmse_ann = sqrt(mean((y_test_actual - final_predictions).^2));
        ```
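
To make the difference between MAE and RMSE concrete, the following sketch computes both on the same invented predictions, where one project is missed by a wide margin. The numbers are illustrative only.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical budgets in millions of PHP; the last prediction is far off.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 30.0, 60.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

The absolute errors are 2, 2, 0 and 20, giving an MAE of 6, while the RMSE is about 10.1 because the single 20-million miss dominates the squared errors. A large gap between the two metrics is a hint that a few projects are being predicted poorly.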

***

### **Conclusion**

As demonstrated, both the **Python** and **MATLAB** workflows are grounded in the same mathematical principles. They follow the same logical path: data transformation, network construction, training with a squared-error loss function, and evaluation with standard industry metrics.

The choice between Python and MATLAB is one of tooling and environment preference, not one of mathematical validity. Because the two implementations differ in a few details, such as the output layer, their results will not be numerically identical, but either can produce a usable budget prediction model. How reliable and accurate that model is depends on the provided data and on the evaluation results from Stage 4.
