The `process_csv` function is responsible for processing CSV files in a given folder and creating hourly data from them. Let's go through the implementation step by step:

1. The function takes two parameters: `folder_path` (the path to the folder containing the CSV files) and `output_folder` (the path to the folder where the hourly data will be saved).

2. It uses a `for` loop to iterate over each file in the `folder_path` directory. The `os.listdir` function returns a list of all files and directories in the specified path.

3. The `if` statement checks if the file has a `.csv` extension using the `endswith` method. If it does, the file is considered a CSV file and further processing is performed.

4. The `file_path` is created by joining the `folder_path` and `file_name` using the `os.path.join` function.

5. The CSV file is read into a Pandas DataFrame using the `pd.read_csv` function, and the resulting DataFrame is stored in the `data` variable.

6. The 'datetime' column in the DataFrame is converted to datetime format using the `pd.to_datetime` function.

7. The DataFrame index is set to the '_id' column using the `data.set_index` method. This assumes that '_id' is a unique identifier for each row.

8. The data is resampled by hour using the `data.resample` method with a frequency of 'H' (hourly) and the 'datetime' column as the time reference.

9. The function passed to `apply` on the resampled data is a lambda that calculates the volume-weighted average price and the total volume for each hour. It returns a Pandas Series with two entries, 'weighted_avg_price' and 'total_volume', which become the columns of the hourly DataFrame. If an hour has no rows, the lambda returns 0 for both.

10. The columns in the `hourly_data` DataFrame are rearranged to have 'total_volume' first using the indexing operation `hourly_data[['total_volume', 'weighted_avg_price']]`.

11. The filename without the extension is extracted using the `os.path.splitext` function.

12. An `if` statement checks whether the output directory exists. If it doesn't, the function creates it using `os.makedirs`.

13. The new filename is created by appending '_hourly' to the original filename, and the new file path is created by joining the `output_folder` and the new filename.

14. Another `if` statement checks if the new file path already exists. If it does, hourly data for that file has already been written, so the existing hourly data is read back with `pd.read_csv`, using the file's index column as the DataFrame index.

15. The existing hourly data and the newly calculated hourly data are combined into a single DataFrame.

16. The combined hourly data is then sorted by its index using the `combined_hourly_data.sort_index` method.

17. Finally, the combined hourly data is saved to the new file path using the `combined_hourly_data.to_csv` method. If the new file path doesn't exist, indicating that hourly data for that file doesn't exist yet, the hourly data is simply saved to the new file path.
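
The hourly aggregation in steps 8 to 10 can be tried on a few rows before running it on real files. The following is a minimal sketch, not the notebook's implementation: it assumes the same column names ('datetime', 'v', 'vw'), uses made-up values, and replaces the lambda with two plain sums per hour, which should give the same result when every hour with rows has non-zero volume.

```python
import pandas as pd

# Toy trades: the column names mirror the ones used by process_csv,
# the values are invented for illustration.
trades = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-01-01 00:05", "2023-01-01 00:40", "2023-01-01 01:10",
    ]),
    "v": [10.0, 30.0, 5.0],    # volume
    "vw": [1.00, 2.00, 3.00],  # price
})

# Precompute price * volume so each hour only needs two sums.
trades["pv"] = trades["v"] * trades["vw"]
sums = trades.set_index("datetime")[["pv", "v"]].resample("h").sum()

hourly = pd.DataFrame({
    "total_volume": sums["v"],
    # Hours with zero volume give 0/0 = NaN; replace that with 0,
    # as the lambda in process_csv does for empty hours.
    "weighted_avg_price": (sums["pv"] / sums["v"]).fillna(0),
})
print(hourly)
```

Summing first and dividing once avoids calling a Python function per hour, which tends to matter on large files.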


In [None]:
import pandas as pd
import os

# Root directory containing subdirectories
root_directory = r'C:\Users\aksha\Downloads\data_eng_fin\classification'

# Function to process CSV files and create hourly data
def process_csv(folder_path, output_folder):
    for file_name in os.listdir(folder_path):
        if file_name.endswith('.csv'):
            file_path = os.path.join(folder_path, file_name)

            # Read the CSV file into a Pandas DataFrame
            data = pd.read_csv(file_path)

            # Convert 'datetime' column to datetime format
            data['datetime'] = pd.to_datetime(data['datetime'])

            # Set '_id' as the DataFrame index (assuming it's a unique identifier)
            data.set_index('_id', inplace=True)

            # Resample the data by hour and calculate weighted average price and total volume
            hourly_data = data.resample('H', on='datetime').apply(
                lambda x: pd.Series({
                    'weighted_avg_price': (x['v'] * x['vw']).sum() / x['v'].sum() if not x.empty else 0,
                    'total_volume': x['v'].sum() if not x.empty else 0
                })
            )

            # Rearrange columns to include 'volume' first
            hourly_data = hourly_data[['total_volume', 'weighted_avg_price']]

            # Get filename without extension
            filename_without_extension, extension = os.path.splitext(file_name)
            # Create the output folder if it doesn't exist yet
            if not os.path.exists(output_folder):
                os.makedirs(output_folder)
                print(f"Created {output_folder}")

            # New filename with '_hourly' added
            new_filename = f"{filename_without_extension}_hourly{extension}"
            new_file_path = os.path.join(output_folder, new_filename)
            print(new_file_path)

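            # If the hourly file already exists, add the new rows and sort by the datetime index
            # (the resampled data is indexed by 'datetime', not '_id')
            if os.path.exists(new_file_path):
                existing_hourly_data = pd.read_csv(new_file_path, index_col='datetime', parse_dates=True)
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                combined_hourly_data = pd.concat([existing_hourly_data, hourly_data])
                combined_hourly_data.sort_index(inplace=True)
                combined_hourly_data.to_csv(new_file_path)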
            else:
                # Save the hourly data to a new CSV file
                hourly_data.to_csv(new_file_path)

# Iterate through subdirectories
historical_fx_rates_folder = os.path.join(root_directory, 'historical_fx_rates')
output_directory = os.path.join(root_directory, 'hourly_data')
print(historical_fx_rates_folder)
if os.path.exists(historical_fx_rates_folder):
    print(f"Processing {historical_fx_rates_folder}")
    process_csv(historical_fx_rates_folder, output_directory)


In [None]:

"""
This script calculates and normalizes the fractal dimension (FD) of hourly data files.
It reads each CSV file in the input directory, calculates an FD value for each 6-hour interval
(skipping the first one), and saves the original and normalized FD values to one file per currency.

Parameters:
    input_directory (str): The directory path where the input CSV files are located.
    output_directory (str): The directory path where the output files will be saved.
    new_folder_name (str): The name of the new folder to be created in the output directory.
    
Returns:
    None
"""

import os
import pandas as pd

input_directory = r'C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data'
output_directory = r'C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data/'

# Create a new folder if it doesn't exist
new_folder_name = 'fd_normalized'
new_output_directory = os.path.join(output_directory, new_folder_name)
if not os.path.exists(new_output_directory):
    os.makedirs(new_output_directory)

all_files = [os.path.join(input_directory, file) for file in os.listdir(input_directory) if file.endswith('.csv')]

for file_path in all_files:
    print(f'Operating on file: {file_path}')  # Print the operating file name
    
    df = pd.read_csv(file_path)
    df['datetime'] = pd.to_datetime(df['datetime'])
    
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    
    # Initialize variables to store the mean and volatility for the first 6-hour window
    previous_mean_value = 0
    previous_volatility = 0
    
    # Lists to store FD values for normalization
    fd_values = []
    
    # Iterate through each 6-hour interval
    for idx, interval_start in enumerate(pd.date_range(df['datetime'].min(), df['datetime'].max(), freq='6H')):
        if idx == 0:
            # For the first 6-hour window, don't calculate FD
            continue
        
        interval_end = interval_start + pd.Timedelta(hours=6)
        interval_data = df[(df['datetime'] >= interval_start) & (df['datetime'] < interval_end)]
        
        # Perform calculations for the 6-hour interval using previous mean and volatility
        mean_value = previous_mean_value
        volatility = previous_volatility
        
        if len(interval_data) > 0:
            mean_value = interval_data['weighted_avg_price'].mean()
            max_price = interval_data['weighted_avg_price'].max()
            min_price = interval_data['weighted_avg_price'].min()
            
            # Avoid division by zero by checking if mean_value is not zero
            if mean_value != 0:
                volatility = (max_price - min_price) / mean_value
            else:
                volatility = 0  # Set volatility to 0 if mean_value is 0 to avoid division by zero
        
        max_min_diff = max_price - min_price if len(interval_data) > 0 else 0

        daily_crossings = 0
        for _, row in interval_data.iterrows():
            crossings = sum(((row['weighted_avg_price'] > mean_value + n * 0.025 * volatility) | (row['weighted_avg_price'] < mean_value - n * 0.025 * volatility)) for n in range(1, 101))
            daily_crossings += crossings

        fd = daily_crossings / max_min_diff if max_min_diff != 0 else 0
        fd_values.append(fd)
        
        # Update previous_mean_value and previous_volatility for the next iteration
        previous_mean_value = mean_value
        previous_volatility = volatility
    
    # Normalize FD values from 0 to 1
    min_fd = min(fd_values)
    max_fd = max(fd_values)
    normalized_fd_values = [(x - min_fd) / (max_fd - min_fd) for x in fd_values]
    
    # Create a DataFrame with original and normalized FD values for each 6-hour interval
    intervals = pd.date_range(df['datetime'].min() + pd.Timedelta(hours=6), df['datetime'].max(), freq='6H')
    fd_data = pd.DataFrame({'Interval_Start': intervals, 'Currency': [file_name] * len(intervals), 'FD_Original': fd_values, 'FD_Normalized': normalized_fd_values})
    
    # Save each currency's FD data into a separate file
    normalized_output_filename = os.path.join(new_output_directory, f'{file_name}_fractal_dimension_normalized.csv')
    fd_data.to_csv(normalized_output_filename, mode='a', header=not os.path.exists(normalized_output_filename), index=False)
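
The crossing count in the cell above checks every row against 100 bands around the interval mean, one Python comparison at a time. The following is a hedged sketch of the same count done with NumPy broadcasting; `count_band_crossings` and its parameter names are hypothetical helpers introduced here, not part of the notebook.

```python
import numpy as np

def count_band_crossings(prices, mean_value, volatility, n_bands=100, step=0.025):
    """Count, over all prices, how many bands each price lies outside of.

    Band n spans mean_value +/- n * step * volatility, for n = 1..n_bands.
    """
    prices = np.asarray(prices, dtype=float)[:, None]         # shape (rows, 1)
    offsets = np.arange(1, n_bands + 1) * step * volatility   # shape (n_bands,)
    outside = (prices > mean_value + offsets) | (prices < mean_value - offsets)
    return int(outside.sum())

# Toy interval: with volatility 0.4 the bands are 0.01 apart.
print(count_band_crossings([1.0, 1.055, 0.975], mean_value=1.0, volatility=0.4))
```

In this toy interval the first price sits on the mean and is outside no band, the second is outside five bands and the third outside two, so the call returns 7.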


In [1]:
"""
This code calculates the Fractal Dimension (FD) for each currency in a given directory of hourly data.
The FD is calculated based on the weighted average price of each currency over 6-hour intervals.
The code reads each CSV file in the input directory, performs calculations for each interval, and saves the FD data into separate files.

Parameters:
- input_directory: The directory path where the input CSV files are located.
- output_directory: The directory path where the output FD files will be saved.
- new_folder_name: The name of the new folder to be created within the output directory.
- n: The band index (1 to 100) used in the crossing count; band n lies n * 0.025 * volatility above and below the interval mean. It is a loop variable, not a configurable input.

Returns:
- None

"""

import os
import pandas as pd
input_directory = r'C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data'
output_directory = r'C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data/fd'

# Create a new folder if it doesn't exist
new_folder_name = 'weighted_avg_output_folder'
new_output_directory = os.path.join(output_directory, new_folder_name)
if not os.path.exists(new_output_directory):
    os.makedirs(new_output_directory)

all_files = [os.path.join(input_directory, file) for file in os.listdir(input_directory) if file.endswith('.csv')]

for file_path in all_files:
    print(f'Operating on file: {file_path}')  # Print the operating file name
    
    df = pd.read_csv(file_path)
    df['datetime'] = pd.to_datetime(df['datetime'])
    
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    
    # Initialize variables to store the mean and volatility for the first 6-hour window
    previous_mean_value = 0
    previous_volatility = 0
    
    # Lists to store FD values for normalization
    fd_values = []
    min_prices = []
    max_prices = []
    volatilities = []
    vwap_values = []
    
    # Iterate through each 6-hour interval
    for idx, interval_start in enumerate(pd.date_range(df['datetime'].min(), df['datetime'].max(), freq='6H')):
        if idx == 0:
            # For the first 6-hour window, don't calculate FD
            continue
        
        interval_end = interval_start + pd.Timedelta(hours=6)
        interval_data = df[(df['datetime'] >= interval_start) & (df['datetime'] < interval_end)]
        
        # Perform calculations for the 6-hour interval using previous mean and volatility
        mean_value = previous_mean_value
        volatility = previous_volatility
        vwap = 0
        
        if len(interval_data) > 0:
            mean_value = interval_data['weighted_avg_price'].mean()
            max_price = interval_data['weighted_avg_price'].max()
            min_price = interval_data['weighted_avg_price'].min()
            
            # Calculate VWAP
            if interval_data['total_volume'].sum() != 0:  # Ensure denominator is not zero
                vwap = (interval_data['weighted_avg_price'] * interval_data['total_volume']).sum() / interval_data['total_volume'].sum()
            else:
                vwap = 0  # Set VWAP to 0 if the sum of total_volume is zero
            
            if mean_value != 0:  # Avoid division by zero for volatility calculation
                volatility = (max_price - min_price) / mean_value
            else:
                volatility = 0  # Set volatility to 0 if mean_value is 0 to avoid division by zero
        
        max_min_diff = max_price - min_price if len(interval_data) > 0 else 0

        daily_crossings = 0
        for _, row in interval_data.iterrows():
            crossings = sum(((row['weighted_avg_price'] > mean_value + n * 0.025 * volatility) | (row['weighted_avg_price'] < mean_value - n * 0.025 * volatility)) for n in range(1, 101))
            daily_crossings += crossings

        fd = daily_crossings / max_min_diff if max_min_diff != 0 else 0
        fd_values.append(fd)
        min_prices.append(min_price)
        max_prices.append(max_price)
        volatilities.append(volatility)
        vwap_values.append(vwap)
        
        # Update previous_mean_value and previous_volatility for the next iteration
        previous_mean_value = mean_value
        previous_volatility = volatility
    
    # Normalize FD values from 0 to 1
    min_fd = min(fd_values)
    max_fd = max(fd_values)
    normalized_fd_values = [(x - min_fd) / (max_fd - min_fd) for x in fd_values]
    
    # Create a DataFrame with original and normalized FD values, along with additional stats
    intervals = pd.date_range(df['datetime'].min() + pd.Timedelta(hours=6), df['datetime'].max(), freq='6H')
    fd_data = pd.DataFrame({'Interval_Start': intervals, 'FD_Original': fd_values, 'FD_Normalized': normalized_fd_values,
                            'Min_Weighted_Avg_Price': min_prices, 'Max_Weighted_Avg_Price': max_prices, 
                            'Volatility': volatilities, 'VWAP': vwap_values, 'Currency': [file_name] * len(intervals)})
    
    # Save each currency's FD data into a separate file
    normalized_output_filename = os.path.join(new_output_directory, f'{file_name}_fractal_dimension_normalized_weight.csv')
    fd_data.to_csv(normalized_output_filename, mode='a', header=not os.path.exists(normalized_output_filename), index=False)


Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDCAD_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDCHF_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDHKD_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDJPY_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDNZD_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\AUDSGD_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\CADCHF_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\CADHKD_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data\CADJPY_hourly.csv
Operating on file: C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_dat
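
The per-interval statistics computed in the loop above (minimum, maximum, range-based volatility and VWAP) can also be sketched with a single 6-hour resample. This is an illustration on made-up data, assuming the hourly column names 'datetime', 'weighted_avg_price' and 'total_volume'; unlike the cell above it does not skip the first window or carry values forward, and it does not guard against a zero mean.

```python
import pandas as pd

# Twelve hourly rows starting at 03:00, so two 6-hour windows.
hourly = pd.DataFrame({
    "datetime": pd.date_range("2023-01-01 03:00", periods=12, freq="h"),
    "weighted_avg_price": [1.0, 1.2, 1.1, 1.3, 1.0, 1.4,
                           2.0, 2.0, 2.2, 1.8, 2.0, 2.0],
    "total_volume": [10, 0, 5, 5, 10, 10, 1, 1, 1, 1, 1, 1],
})
hourly["pv"] = hourly["weighted_avg_price"] * hourly["total_volume"]

# origin="start" anchors the first bin at the first timestamp, matching
# windows generated from df['datetime'].min() in 6-hour steps.
bins = hourly.set_index("datetime").resample("6h", origin="start")

stats = bins["weighted_avg_price"].agg(["min", "max", "mean"])
stats["volatility"] = (stats["max"] - stats["min"]) / stats["mean"]

sums = bins[["pv", "total_volume"]].sum()
# Zero total volume gives NaN; use 0 as the cell above does.
stats["vwap"] = (sums["pv"] / sums["total_volume"]).fillna(0)
print(stats)
```

Each row of `stats` corresponds to one 6-hour window labelled by its start time.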

In [2]:
from pycaret.regression import *
# Establishing path for the folder which contains all the csv files.
input_folder = "C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data/fd_normalized/weighted_avg_output_folder"

# Get a list of all csv files in the input folder
csv_files = [csv_file for csv_file in os.listdir(input_folder) if csv_file.endswith('.csv')]
df_list = []

for csv_file in csv_files:
    # Read the full csv file (all columns are kept)
    df_list.append(pd.read_csv(os.path.join(input_folder, csv_file)))
    
df = pd.concat(df_list, ignore_index=True)
df.dropna(subset=['VWAP'], inplace=True)
regression_setup = setup(data=df, target='VWAP', train_size=0.7)
best = compare_models()
scores = []
for i, df_ in enumerate(df_list):
    df_.dropna(subset=['VWAP'], inplace=True)
    # print(csv_files[i])  # Commented the line
    predict_model(best, data = df_)
    scores.append((csv_files[i][:-4], pull()["RMSE"][0]))
    print()

Unnamed: 0,Description,Value
0,Session id,2912
1,Target,VWAP
2,Target type,Regression
3,Original data shape,"(774705, 8)"
4,Transformed data shape,"(774705, 8)"
5,Transformed train set shape,"(542293, 8)"
6,Transformed test set shape,"(232412, 8)"
7,Numeric features,5
8,Categorical features,2
9,Preprocess,True


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
ridge,Ridge Regression,2.5582,951.6786,18.0832,0.9869,0.3633,0.5173,0.853
lar,Least Angle Regression,2.559,951.794,18.0838,0.9869,0.3635,0.5178,0.823
br,Bayesian Ridge,2.5572,951.6811,18.0832,0.9869,0.363,0.5172,0.888
en,Elastic Net,2.43,955.4896,18.1367,0.9868,0.3176,0.4238,5.097
lr,Linear Regression,2.4416,954.5261,18.1565,0.9867,0.3151,0.4117,2.43
lasso,Lasso Regression,2.4375,957.8466,18.1744,0.9867,0.3142,0.4085,4.3
llar,Lasso Least Angle Regression,2.4376,957.862,18.1745,0.9867,0.3142,0.4086,0.814
dt,Decision Tree Regressor,0.311,24070.4922,69.1691,0.8755,0.0034,0.0004,3.852
gbr,Gradient Boosting Regressor,0.5518,24072.0106,69.5714,0.8755,0.022,0.0227,20.034
rf,Random Forest Regressor,0.3598,34812.2514,81.4193,0.8533,0.0035,0.0003,246.798


AUDCAD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.372,0.3926,0.6266,-181.4421,0.2321,0.3854



AUDCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5728,0.6564,0.8102,-49.5231,0.3788,0.7398



AUDHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.8429,2.2264,1.4921,-1.2642,0.1631,0.1235



AUDJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,3.9497,26.1377,5.1125,0.4855,0.0615,0.0472



AUDNZD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,10.1325,719.1094,26.8162,0.9996,1.7336,8.8341



AUDSGD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.9099,2.411,1.5527,-142.066,0.4659,0.8361



CADCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.4192,0.4536,0.6735,-47.4003,0.283,0.5173



CADHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7254,1.9479,1.3957,-2.0118,0.1462,0.1051



CADJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,4.3042,31.8779,5.6461,0.464,0.0639,0.0488



CADSGD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5912,0.8766,0.9363,-83.4593,0.3214,0.5279



CHFHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7765,1.7804,1.3343,-8.976,0.121,0.0941



CHFJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,7.6196,119.5747,10.935,0.5139,0.1034,0.0735



CHFZAR_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,2.0174,6.1199,2.4738,0.4614,0.1892,0.1757



EURAUD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6454,0.666,0.8161,-46.3885,0.2964,0.4339



EURCAD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5609,0.6007,0.775,-89.9034,0.2649,0.3922



EURCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6567,0.7007,0.8371,-67.1704,0.3565,0.574



EURCZK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.8653,1.9247,1.3873,-0.8561,0.0544,0.0332



EURDKK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5585,0.5287,0.7271,-4188.166,0.0794,0.075



EURGBP_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5466,0.6345,0.7965,-277.1694,0.3494,0.6463



EURHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7616,1.7738,1.3318,-1.3018,0.1105,0.0794



EURHUF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,18.5598,683.1535,26.1372,0.3933,0.0815,0.0577



EURJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,6.0467,68.7364,8.2907,0.463,0.0697,0.0501



EURNOK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6579,0.79,0.8888,0.2304,0.0794,0.0702



EURPLN_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7628,2.0734,1.4399,-49.0635,0.1961,0.1779



EURSEK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5144,0.5905,0.7684,-0.1741,0.0632,0.0509



EURTRY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.8984,8.3084,2.8824,0.5984,0.3123,0.3707



EURUSD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.4892,0.6291,0.7931,-47.5705,0.2656,0.4069



EURZAR_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.6086,3.6139,1.901,0.5792,0.1268,0.1167



GBPAUD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6291,0.5783,0.7604,-25.0892,0.2581,0.3562



GBPCAD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6398,0.6294,0.7933,-44.014,0.269,0.3768



GBPCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5048,0.4478,0.6692,-23.0924,0.2485,0.3689



GBPHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7486,0.9474,0.9734,0.2776,0.0773,0.0644



GBPJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,9.3407,139.7596,11.822,0.5326,0.0795,0.0632



GBPNZD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.3672,0.4279,0.6542,-7.8648,0.1553,0.1826



GBPPLN_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7039,0.8336,0.913,-5.6729,0.135,0.1362



GBPUSD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6615,0.7682,0.8765,-33.112,0.3128,0.4633



HKDJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.2475,2.6734,1.6351,0.5138,0.1093,0.0952



NZDCAD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.4568,0.6501,0.8063,-175.7321,0.2949,0.5349



NZDCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6534,0.6435,0.8022,-197.2407,0.4683,0.951



NZDHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6798,1.9947,1.4123,-5.9331,0.16,0.1181



NZDJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,4.259,29.0288,5.3878,0.5892,0.0735,0.0587



NZDSGD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,4.9133,3569.4848,59.7452,0.9906,1.1727,4.6061



NZDUSD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.4564,0.5655,0.752,-112.8556,0.3188,0.6353



SGDCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.8073,2.5955,1.6111,-2625.3887,0.4904,1.1267



SGDJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,6.1298,56.5731,7.5215,0.1701,0.0871,0.069



USDBRL_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.389,4.7811,2.1866,-1.8503,0.3909,0.5004



USDCHF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.6685,0.7934,0.8907,-320.9393,0.4015,0.7007



USDCNY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.0296,3.2869,1.813,-34.2471,0.1864,0.1569



USDCZK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.4209,3.2499,1.8028,0.377,0.0809,0.0662



USDDKK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.559,0.7828,0.8848,-1.1733,0.1013,0.0889



USDHKD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.5574,0.5327,0.7299,-458.7023,0.0727,0.0716



USDHUF_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,27.094,1317.7759,36.3012,0.4604,0.1355,0.1037



USDJPY_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,7.5811,109.3099,10.4551,0.533,0.1035,0.0766



USDMXN_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,1.9883,4.7921,2.1891,0.5747,0.1244,0.1226



USDNOK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.9163,1.237,1.1122,0.41,0.1175,0.1152



USDPLN_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.8077,2.3233,1.5242,-9.6957,0.2318,0.2214



USDSEK_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.7769,1.0035,1.0017,0.3505,0.0983,0.0899



USDSGD_hourly_fractal_dimension_normalized_weight.csv


Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Ridge Regression,0.8315,1.3872,1.1778,-386.8211,0.3859,0.623





In [3]:
def generate_currency_pair_classes_report(scores):
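    """
    Rank currency pairs by prediction error and write a class report.

    The ten pairs with the lowest error are kept: the first three are labelled
    "Forecastable", the next four "Partially Forecastable" and the last three
    "Non Forecastable".

    Args:
        scores (list): A list of (currency_pair_name, RMSE) tuples.

    Returns:
        None. The report is written to a CSV file.
    """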
    pair_errors = sorted(scores, key=lambda a: a[1])
    classes = [*["Forecastable"] * 3, *["Partially Forecastable"] * 4, *["Non Forecastable"] * 3]
    data = {"Currency Pairs": [], "Errors": [], "Class": []}

    for i in range(10):
        data["Currency Pairs"].append(pair_errors[i][0])
        data["Errors"].append(pair_errors[i][1])
        data["Class"].append(classes[i])

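    output_folder = "C:/Users/aksha/Downloads/data_eng_fin/classification/hourly_data/fd_normalized/weighted_avg_output_folder/report"
    # Create the report folder if needed and join the path with a separator
    os.makedirs(output_folder, exist_ok=True)
    pd.DataFrame(data).to_csv(os.path.join(output_folder, "Currency Pair Classes.csv"))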
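
The ranking logic of the function above can be checked without touching the file system. The sketch below is a hypothetical variant, `classify_by_rank`, that returns the report as a DataFrame instead of writing it; the pair names are shortened for illustration and the RMSE values are taken from the first four result tables above.

```python
import pandas as pd

def classify_by_rank(scores, labels):
    """Sort (pair, error) tuples by error and attach one label per rank.

    zip stops at the shorter input, so fewer scores than labels is fine.
    """
    ranked = sorted(scores, key=lambda pair: pair[1])
    rows = [
        {"Currency Pairs": name, "Errors": err, "Class": label}
        for (name, err), label in zip(ranked, labels)
    ]
    return pd.DataFrame(rows, columns=["Currency Pairs", "Errors", "Class"])

labels = (["Forecastable"] * 3
          + ["Partially Forecastable"] * 4
          + ["Non Forecastable"] * 3)

# Deliberately unsorted input to show that ranking is by error.
scores = [("AUDJPY", 5.1125), ("AUDCAD", 0.6266),
          ("AUDHKD", 1.4921), ("AUDCHF", 0.8102)]
report = classify_by_rank(scores, labels)
print(report)
```

Because the labels depend only on rank, the classes are relative to the set of pairs passed in, not to any absolute error threshold.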
