### **Guide to Adapting the MATLAB ANN Code for a New Project**

#### **1. Introduction**

This guide explains how to modify the provided MATLAB scripts to train an Artificial Neural Network (ANN) model on a different dataset. Following it lets you reuse the existing code structure instead of writing a new pipeline from scratch.

The process involves updating three key files:
1.  **`guide_matlab.m`**: The main script that controls the entire workflow.
2.  **`clean_table.m`**: A helper function for initial data cleaning and preparation.
3.  **`extract_budget.m`**: A helper function that extracts a numeric value (the budget, used as the target) from a text field.

---

#### **2. `guide_matlab.m` (Main Script)**

This is where most of the high-level changes will occur, from loading data to defining the model itself.

##### **Section 2: Data Loading, Cleaning, and Merging**

*   **File Paths (Lines 26-27 & 32-34)**:
    *   **What to change**: The names of your input CSV files.
    *   **Reason**: You will be working with new data files.
    *   **Original Code**:
        ```matlab
        opts_qty = detectImportOptions('Thesis Data - Architectural Quantity Cost.csv', ...);
        T_quantity = readtable('Thesis Data - Architectural Quantity Cost.csv', opts_qty);
        T_unit_cost = readtable('Thesis Data - Achitectural Unit Cost.csv', opts_cost);
        ```
    *   **Example Change (for a housing price project)**:
        ```matlab
        opts_features = detectImportOptions('housing_features.csv', ...);
        T_features = readtable('housing_features.csv', opts_features);
        T_prices = readtable('housing_prices.csv', ...);
        ```

*   **Target Variable Extraction (Lines 46-52)**:
    *   **What to change**: The logic for extracting your target variable (the value you want to predict). The original code uses a custom `extract_budget` function on a specific column. Your new project may have the target in its own clean column already.
    *   **Reason**: Your new dataset will have a different target variable and column structure.
    *   **Original Code**:
        ```matlab
        budgets = rowfun(@extract_budget, T_quantity_cleaned(:, 'Year/Budget'), ...);
        T_quantity_cleaned.Budget = budgets;
        T_quantity_cleaned.('Year/Budget') = [];
        ```
    *   **Example Change (if the target already sits in a clean column, here `Sale_Price`)**:
        ```matlab
        % This section might be simplified or removed if the target is already clean.
        % For example, you might just rename the column once T_merged exists (after the merge step):
        T_merged = renamevars(T_merged, 'Sale_Price', 'Target');
        ```

*   **Table Merging (Line 55)**:
    *   **What to change**: The join key (`'Join_Key'`) or the type of join (`innerjoin`).
    *   **Reason**: You may not need to merge tables, or you might use a different common column (e.g., 'ID', 'Project_Name').
    *   **Original Code**:
        ```matlab
        T_merged = innerjoin(T_quantity_cleaned, T_unit_cost_cleaned, 'Keys', 'Join_Key');
        ```

*   **Data Filtering (Lines 58-60)**:
    *   **What to change**: The filtering conditions. The original code removes rows where `Budget` is missing (`NaN`) and keeps only rows with a `Budget` greater than 100,000.
    *   **Reason**: These filters are specific to the original project's domain and data quality.
    *   **Original Code**:
        ```matlab
        T_merged = T_merged(~isnan(T_merged.Budget), :);
        T_merged = T_merged(T_merged.Budget > 100000, :);
        ```
    *   **Example Change (for housing data)**:
        ```matlab
        T_merged = T_merged(T_merged.Target > 0, :); % Keep only houses with a positive price
        ```
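
The loading, merging, and filtering steps above can be sketched end to end as follows. This is a minimal illustration, not the original script: the file names, the `ID` join key, and the `Sale_Price` column are hypothetical, and the CSV files are generated in a temporary folder only so that the sketch runs on its own.

```matlab
% Hypothetical housing data written to temporary CSV files (illustration only).
features_file = fullfile(tempdir, 'housing_features.csv');
prices_file   = fullfile(tempdir, 'housing_prices.csv');
writetable(table([1; 2; 3; 4], [1200; 850; 1500; 990], ...
    'VariableNames', {'ID', 'Square_Footage'}), features_file);
writetable(table([1; 2; 3; 5], [250000; NaN; 0; 180000], ...
    'VariableNames', {'ID', 'Sale_Price'}), prices_file);

% Detect import options first so column types can be inspected or overridden.
opts_features = detectImportOptions(features_file);
opts_prices   = detectImportOptions(prices_file);
T_features = readtable(features_file, opts_features);
T_prices   = readtable(prices_file, opts_prices);

% Inner join keeps only the IDs present in both tables (IDs 1, 2, and 3 here).
T_merged = innerjoin(T_features, T_prices, 'Keys', 'ID');

% Rename the target, then drop rows with a missing or non-positive price.
T_merged = renamevars(T_merged, 'Sale_Price', 'Target');
T_merged = T_merged(~isnan(T_merged.Target), :);
T_merged = T_merged(T_merged.Target > 0, :);

disp(T_merged);
```

If your data already lives in a single file, the `innerjoin` call can be dropped and the filters applied directly to the table returned by `readtable`.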

##### **Section 3: Granular Feature Engineering & Visualization**

*   **Base Feature Columns (Lines 72-78)**:
    *   **What to change**: The list of column names in `base_feature_cols`.
    *   **Reason**: These are the raw features from your new dataset that you want to use to create calculated features.
    *   **Original Code**:
        ```matlab
        base_feature_cols = {
            'Quantity of plaster (sq.m.)', 'Quantity of glazed tiles (sq.m.)', ...
            ...
        };
        ```
    *   **Example Change**:
        ```matlab
        base_feature_cols = {
            'Square_Footage', 'Num_Bedrooms', 'Num_Bathrooms', ...
        };
        ```

*   **Custom Feature Extraction from Text (Lines 111-144)**:
    *   **What to change**: The entire logic for extracting features like `Num_Storeys` and `Num_Classrooms`, including the column name (`project_description_col`) and the regular expressions (`regexp`).
    *   **Reason**: This logic is highly specific to the project descriptions in the original dataset. You will need to write new rules for your new data.
    *   **Original Code**:
        ```matlab
        project_description_col = 'Project_Name_T_quantity_cleaned';
        storeys_cell = regexp(T_merged.(project_description_col), '(\d+)\s*(?:sty|x)', ...);
        ```

*   **Target Variable Transformation (Line 147)**:
    *   **What to change**: The name of the target variable being transformed (`T_merged.Budget`).
    *   **Reason**: Your target variable will have a different name.
    *   **Original Code**:
        ```matlab
        T_merged.Budget_log = log1p(T_merged.Budget);
        ```
    *   **Example Change**:
        ```matlab
        T_merged.Target_log = log1p(T_merged.Target);
        ```

*   **Correlation Heatmap Columns (Line 151)**:
    *   **What to change**: The list of column names in `heatmap_cols`.
    *   **Reason**: This should be updated to reflect your new engineered features and target variable.
    *   **Original Code**:
        ```matlab
        heatmap_cols = [individual_cost_features, 'Num_Storeys', 'Num_Classrooms', 'Budget'];
        ```
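
As a rough illustration of the text-extraction, log-transform, and heatmap steps above, the sketch below pulls a bedroom count out of a free-text description. The `Description` column, the `bed` keyword pattern, and the choice of 0 as the fallback value are all assumptions made for this example; your own data will need its own rules.

```matlab
% Hypothetical listing descriptions; column names and patterns are illustrative only.
T = table({'3 bed 2 bath house'; '2 bed apartment'; '4 bed 3 bath villa'; 'studio'}, ...
    [250000; 180000; 420000; 95000], ...
    'VariableNames', {'Description', 'Target'});

% Capture the digits that precede the keyword "bed"; 'once' keeps the first match per row.
bed_tokens = regexp(T.Description, '(\d+)\s*bed', 'tokens', 'once');

% Convert tokens to numbers; rows without a match keep the assumed default of 0.
T.Num_Bedrooms = zeros(height(T), 1);
for i = 1:height(T)
    if ~isempty(bed_tokens{i})
        T.Num_Bedrooms(i) = str2double(bed_tokens{i}{1});
    end
end

% Log-transform the target, mirroring the log1p step in the original script.
T.Target_log = log1p(T.Target);

% Correlation matrix over the engineered feature and the transformed target.
heatmap_cols = {'Num_Bedrooms', 'Target_log'};
R = corrcoef(T{:, heatmap_cols});

fig = figure('Visible', 'off');
heatmap(heatmap_cols, heatmap_cols, R);
close(fig);
```

Whether a missing match should become 0, `NaN`, or cause the row to be dropped depends on your data, so treat the default here as a placeholder.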

##### **Section 4: Data Preparation for the ANN Model**

*   **Final Feature & Target Columns (Lines 167-171)**:
    *   **What to change**: The list of columns in `final_feature_columns` and the target variable `y`. **This is one of the most critical steps.**
    *   **Reason**: You must explicitly tell the model which columns are inputs (X) and which one is the output (y).
    *   **Original Code**:
        ```matlab
        final_feature_columns = [individual_cost_features, 'Num_Storeys', 'Num_Classrooms'];
        X = T_merged{:, final_feature_columns};
        y = T_merged.Budget_log;
        ```
    *   **Example Change**:
        ```matlab
        final_feature_columns = {'Square_Footage', 'Num_Bedrooms', 'Year_Built'};
        X = T_merged{:, final_feature_columns};
        y = T_merged.Target_log;
        ```
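
The sketch below shows one way to assemble `X` and `y` defensively. The table contents are made up, and the standardization step is an assumption: the scaler file mentioned later in this guide suggests the original script scales its inputs, but the exact method is not shown here.

```matlab
% Hypothetical merged table (illustration only).
T_merged = table([1200; 850; 1500; 990], [3; 2; 4; 2], [1995; 2005; 2010; 1980], ...
    [250000; 180000; 420000; 150000], ...
    'VariableNames', {'Square_Footage', 'Num_Bedrooms', 'Year_Built', 'Target'});
T_merged.Target_log = log1p(T_merged.Target);

final_feature_columns = {'Square_Footage', 'Num_Bedrooms', 'Year_Built'};

% Fail early if a listed feature is absent from the table.
missing_cols = setdiff(final_feature_columns, T_merged.Properties.VariableNames);
assert(isempty(missing_cols), 'Missing feature columns: %s', strjoin(missing_cols, ', '));

% Inputs (one row per sample, one column per feature) and the log-scaled target.
X = T_merged{:, final_feature_columns};
y = T_merged.Target_log;

% Assumed scaling step: zero mean and unit variance per feature.
% Keep mu and sigma so the same scaling can be reused at prediction time.
mu = mean(X, 1);
sigma = std(X, 0, 1);
sigma(sigma == 0) = 1;  % guard against constant columns
X_scaled = (X - mu) ./ sigma;
```

In a real run, `mu` and `sigma` would normally be computed on the training split only and then applied unchanged to the validation and test splits.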

##### **Section 5: Build and Train the Artificial Neural Network**

*   **Network Architecture (Lines 226-248)**:
    *   **What to change**: The `layers` array. You can add/remove layers, change the number of neurons (e.g., `fullyConnectedLayer(128)`), or adjust the dropout rate (`dropoutLayer(0.3)`).
    *   **Reason**: The optimal network structure depends entirely on the complexity of your new problem. This is the core of model tuning.
    *   **Original Code**:
        ```matlab
        layers = [
            featureInputLayer(input_size, 'Name', 'input')
            fullyConnectedLayer(128, 'Name', 'fc1')
            ...
            fullyConnectedLayer(1, 'Name', 'output')
            regressionLayer('Name', 'regression')
        ];
        ```

*   **Training Hyperparameters (Lines 251-257)**:
    *   **What to change**: The values in `trainingOptions`, such as `InitialLearnRate`, `MaxEpochs`, and `MiniBatchSize`.
    *   **Reason**: These parameters control the training process and need to be tuned to achieve the best performance on your new dataset.
    *   **Original Code**:
        ```matlab
        options = trainingOptions('adam', ...
            'InitialLearnRate', 0.001, ...
            'MaxEpochs', 200, ...
            'MiniBatchSize', 16, ...
        ...
        );
        ```
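
For a small dataset, a lighter network than the original may be a reasonable starting point. The sketch below is one possible variant trained on synthetic data, not the original architecture: the layer sizes, dropout rate, epoch count, and the `Shuffle` and `Verbose` settings are assumptions chosen for illustration. It requires the Deep Learning Toolbox, and depending on your MATLAB release, `trainNetwork` and `regressionLayer` may be flagged as not recommended.

```matlab
% Synthetic regression data: 64 samples, 3 features (illustration only).
rng(0);
X = rand(64, 3);
y = X * [2; -1; 0.5] + 0.05 * randn(64, 1);
input_size = size(X, 2);

% A smaller variant of the layer stack: one hidden layer with dropout.
layers = [
    featureInputLayer(input_size, 'Name', 'input')
    fullyConnectedLayer(32, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    dropoutLayer(0.2, 'Name', 'drop1')
    fullyConnectedLayer(1, 'Name', 'output')
    regressionLayer('Name', 'regression')
];

% Training options; the values are starting points to tune, not recommendations.
options = trainingOptions('adam', ...
    'InitialLearnRate', 0.001, ...
    'MaxEpochs', 50, ...
    'MiniBatchSize', 16, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);

net = trainNetwork(X, y, layers, options);
y_pred = predict(net, X);
```

A common tuning approach is to change one setting at a time (for example the number of neurons in `fc1`) and compare the validation error between runs.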

##### **Sections 6 & 7: Evaluation, Plotting, and Saving**

*   **Plot Labels (Lines 324-325)**:
    *   **What to change**: The text for `xlabel` and `ylabel` on the results plot.
    *   **Reason**: To accurately describe the predicted and actual values for your new project.
    *   **Original Code**:
        ```matlab
        xlabel('Actual Budget (PHP)');
        ylabel('Predicted Budget (PHP)');
        ```
    *   **Example Change**:
        ```matlab
        xlabel('Actual House Price ($)');
        ylabel('Predicted House Price ($)');
        ```

*   **Output File Names (Throughout)**:
    *   **What to change**: The names of all saved files (`.png`, `.mat`).
    *   **Reason**: To give your saved model, scalers, and plots descriptive names that reflect the new project.
    *   **Example Files**: `correlation_matrix.png`, `ann_granular_model.mat`, `scalers_granular.mat`, etc.
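
The sketch below illustrates evaluation on the original price scale with renamed labels and output files. It assumes the model predicts the `log1p`-transformed target, so predictions are converted back with `expm1`; the numbers, metric choices, and file names are hypothetical.

```matlab
% Hypothetical log-scale test targets and model predictions (illustration only).
y_test_log = log1p([250000; 180000; 420000]);
y_pred_log = log1p([240000; 190000; 400000]);

% Undo the log1p transform so errors are expressed in currency units.
y_actual = expm1(y_test_log);
y_pred   = expm1(y_pred_log);

rmse = sqrt(mean((y_pred - y_actual).^2));
mape = mean(abs((y_pred - y_actual) ./ y_actual)) * 100;

% Actual-vs-predicted plot with a dashed reference line for perfect predictions.
fig = figure('Visible', 'off');
scatter(y_actual, y_pred, 'filled');
hold on;
plot([min(y_actual), max(y_actual)], [min(y_actual), max(y_actual)], 'r--');
hold off;
xlabel('Actual House Price ($)');
ylabel('Predicted House Price ($)');

% Descriptive, project-specific output names (hypothetical).
plot_file    = fullfile(tempdir, 'housing_actual_vs_predicted.png');
metrics_file = fullfile(tempdir, 'housing_metrics.mat');
saveas(fig, plot_file);
save(metrics_file, 'rmse', 'mape');
close(fig);
```

Saving the metrics next to the plot makes it easier to compare runs after changing the architecture or hyperparameters.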

---

#### **3. `clean_table.m` (Data Cleaning Helper)**

This function is highly customized for the original dataset. You will likely need to rewrite most of the logic here.

*   **Column Renaming (Lines 9-14)**:
    *   **What to change**: The logic for renaming the first column.
    *   **Reason**: Your new CSV might not have a `Var1` column.

*   **Custom Text Standardization (Lines 16-27)**:
    *   **What to change**: The `regexprep` pattern and replacement. This block is very specific to converting formats like `2x4` into `2 sty 4 cl`.
    *   **Reason**: This logic will not apply to your new data. You should adapt it for your own text cleaning needs or remove it.

*   **Join Key Creation (Lines 30-43)**:
    *   **What to change**: The pattern and logic for creating the `Join_Key`.
    *   **Reason**: This is specific to the project names in the original dataset.

*   **Column Removal (Lines 46-48)**:
    *   **What to change**: The name of the column to be removed (`'Architectural aspect'`).
    *   **Reason**: Your data will have different irrelevant columns.

*   **Feature Cleaning (Lines 50-65)**:
    *   **What to change**: The list of column names in `feature_cols`.
    *   **Reason**: This list must name the columns in your new dataset that hold numeric values but are stored as text containing characters such as commas (`,`) or dashes (`-`), so that they can be converted to numbers.
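
A minimal sketch of the feature-cleaning idea follows. The column names are hypothetical, and treating a lone dash or a blank cell as zero is an assumption; the original function may handle these cases differently.

```matlab
% Hypothetical table whose numeric columns were imported as text (illustration only).
T = table({'1,250.50'; '-'; '3,000'; ''}, {'12'; '7'; '-'; '1,004'}, ...
    'VariableNames', {'Lot_Area', 'Floor_Area'});

feature_cols = {'Lot_Area', 'Floor_Area'};

for k = 1:numel(feature_cols)
    col = feature_cols{k};
    raw = strtrim(string(T.(col)));      % work on a string array
    raw = erase(raw, ',');               % remove thousands separators
    raw(raw == "-" | raw == "") = "0";   % assumption: dash or blank means zero
    T.(col) = str2double(raw);           % replace the text column with numbers
end

disp(T);
```

If a dash means "unknown" rather than "none" in your data, replacing it with `NaN` and filtering those rows later is the safer choice.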

---

#### **4. `extract_budget.m` (Custom Feature Extractor)**

This function is a template for extracting a number from a messy text field. You should rewrite it completely based on your new data's format.

*   **Extraction Logic (Lines 12-22)**:
    *   **What to change**: The entire `regexp` pattern and the `if` condition. The current logic accepts only a matched number string longer than 4 characters, so that a 4-digit year is not mistaken for the budget.
    *   **Reason**: The rules for finding the value you need in your new text data will be different. For example, you might need to find a number that comes after a specific keyword (e.g., "Price: $150,000").
    *   **Original Code**:
        ```matlab
        matches = regexp(text, '[\d,]+\.?\d*', 'match');
        if length(match_str) > 4
            ...
        end
        ```
    *   **Example Change (to find a price after a '$' sign)**:
        ```matlab
        % New function name could be extract_price.m
        price = NaN;  % default when no '$' amount is found
        matches = regexp(text, '\$([\d,]+)', 'tokens', 'once');
        if ~isempty(matches)
            price = str2double(strrep(matches{1}, ',', ''));
        end
        ```
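
Put together, a replacement extractor might look like the sketch below. The function name `extract_price`, the sample strings, and the optional decimal part in the pattern are assumptions for illustration. The local function sits at the end so the block can run as a single script; in a real project it would live in its own `extract_price.m` file.

```matlab
% Example inputs (hypothetical).
samples = {'Price: $150,000', 'Built 1998, sold for $1,250,500', 'no price listed'};

% Apply the extractor to each string; rows without a price become NaN.
prices = cellfun(@extract_price, samples);
disp(prices);

function price = extract_price(text)
    % Return the first dollar amount found in text, or NaN if there is none.
    price = NaN;
    % Anchoring on '$' prevents a bare year such as 1998 from being matched.
    tokens = regexp(text, '\$([\d,]+(?:\.\d+)?)', 'tokens', 'once');
    if ~isempty(tokens)
        price = str2double(strrep(tokens{1}, ',', ''));
    end
end
```

Returning `NaN` for rows without a match lets the existing `~isnan(...)` filter in the main script remove them.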