### Importing Libraries and Modules

The cell imports the standard-library modules and the third-party library used to read, group, and average the run logs:

- `os`: Standard-library module for operating system functionality, used here to list files and build paths.
- `pandas`: A data manipulation and analysis library providing data structures and operations for numerical tables and time series.
- `defaultdict`: A `dict` subclass from `collections` that creates a default value for missing keys, used here to build a dictionary of lists.
- `datetime`: Class from the `datetime` module, used to get the current timestamp.
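
As a minimal sketch of why `defaultdict(list)` is convenient for grouping, the example below appends items under keys that do not exist yet. The run names are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical (group, run) pairs, used only for illustration.
runs = [
    ("linux_python", "run_a"),
    ("linux_matlab", "run_b"),
    ("linux_python", "run_c"),
]

# defaultdict(list) creates an empty list the first time a key is accessed,
# so no "if key not in dict" check is needed before appending.
grouped = defaultdict(list)
for group, run in runs:
    grouped[group].append(run)

print(dict(grouped))
```

With a plain `dict`, the first `append` for each group would raise a `KeyError` unless the list were created beforehand.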

In [5]:
import os
import pandas as pd
from collections import defaultdict
from datetime import datetime

### Settings

- `input_folder`: Folder containing the original .csv files.
- `output_folder`: Folder to save the averaged files to.
- `groups`: Filename prefixes used to group the files.
- `columns_to_average`: Columns of the .csv files to average.

In [6]:
input_folder = '../runs/'
output_folder = '../runs_mean/'

groups = ['linux_matlab', 'linux_python', 'windows_matlab', 'windows_python']

columns_to_average = [
    'loadTime', 'loadMem', 'decompTime', 'decompMem',
    'solveTime', 'solveMem', 'relativeError'
]

### Reading data from files

This cell loads the log files into dataframes.
A dataframe is created for each .csv file and stored in the list of the group whose name prefixes the filename.
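
The prefix matching described above can be sketched without touching the file system. The helper `match_group` and the filenames below are hypothetical and are not part of the notebook; only the `groups` list comes from the settings cell.

```python
from collections import defaultdict

groups = ['linux_matlab', 'linux_python', 'windows_matlab', 'windows_python']


def match_group(filename, groups):
    """Return the first group whose name prefixes a .csv filename, else None."""
    if not filename.endswith('.csv'):
        return None
    for group in groups:
        if filename.startswith(group):
            return group
    return None


# Hypothetical filenames, used only for illustration.
filenames = [
    'linux_python_run1.csv',
    'windows_matlab_run2.csv',
    'notes.txt',               # skipped: not a .csv file
    'macos_python_run1.csv',   # skipped: matches no group prefix
]

files_by_group = defaultdict(list)
for name in filenames:
    group = match_group(name, groups)
    if group is not None:
        files_by_group[group].append(name)

print(dict(files_by_group))
```

Files that match no prefix are silently ignored, which mirrors the behavior of the loop in the cell below.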

In [7]:
group_data = defaultdict(list)

for filename in os.listdir(input_folder):
    if filename.endswith('.csv'):
        for group in groups:
            if filename.startswith(group):
                file_path = os.path.join(input_folder, filename)
                try:
                    df = pd.read_csv(file_path)
                    group_data[group].append(df)
                except Exception as e:
                    print(f"Error in file {filename}: {e}")
                break
                
os.makedirs(output_folder, exist_ok=True)

### Calculating mean

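For each group, the columns listed in `columns_to_average` are extracted and averaged element-wise across the group's files.
The remaining columns are copied unchanged from the first file loaded for the group (the order depends on `os.listdir`, so the choice is effectively arbitrary).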
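
A minimal sketch of this averaging on toy data follows. The two dataframes and the `name` column are hypothetical, and the sketch assumes every file in a group has the same rows in the same order.

```python
import pandas as pd

# Two hypothetical runs with identical row order and columns.
run1 = pd.DataFrame({"name": ["A", "B"], "solveTime": [1.0, 3.0]})
run2 = pd.DataFrame({"name": ["A", "B"], "solveTime": [2.0, 5.0]})
dfs = [run1, run2]
columns_to_average = ["solveTime"]

# Convert to numeric (unparseable values become NaN), then average cell by cell.
numeric_dfs = [df[columns_to_average].apply(pd.to_numeric, errors="coerce") for df in dfs]
mean_df = sum(numeric_dfs) / len(numeric_dfs)

# Take the remaining columns from the first dataframe.
static_columns = [col for col in dfs[0].columns if col not in columns_to_average]
static_part = dfs[0][static_columns]

final_df = pd.concat(
    [static_part.reset_index(drop=True), mean_df.reset_index(drop=True)], axis=1
)
print(final_df)
```

Because the dataframes are added cell by cell, a value that cannot be parsed in any one file becomes `NaN` and makes the corresponding mean `NaN` as well.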

In [8]:
for group, dfs in group_data.items():
    if not dfs:
        print(f"No files found for {group}")
        continue

    # Element-wise mean of the columns to average
    try:
        numeric_dfs = [df[columns_to_average].apply(pd.to_numeric, errors='coerce') for df in dfs]
        mean_df = sum(numeric_dfs) / len(numeric_dfs)
    except Exception as e:
        print(f"Error while calculating mean for group {group}: {e}")
        continue

    # Copy the remaining columns from the first dataframe of the group
    non_avg_columns = [col for col in dfs[0].columns if col not in columns_to_average]
    static_part = dfs[0][non_avg_columns].copy()

    # Building final dataframe
    final_df = pd.concat([static_part.reset_index(drop=True), mean_df.reset_index(drop=True)], axis=1)
    
    # Saving dataframe
    timestamp = datetime.now().strftime("%y-%m-%d_%H-%M-%S")
    output_file = os.path.join(output_folder, f"mean_{group}_{timestamp}.csv")
    final_df.to_csv(output_file, index=False)
    print(f"Created: {output_file}")


Created: ../runs_mean/mean_linux_matlab_25-06-09_15-43-28.csv
Created: ../runs_mean/mean_linux_python_25-06-09_15-43-28.csv
Created: ../runs_mean/mean_windows_matlab_25-06-09_15-43-28.csv
Created: ../runs_mean/mean_windows_python_25-06-09_15-43-28.csv
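
The timestamp in the filenames above comes from the `strftime` pattern `%y-%m-%d_%H-%M-%S` (two-digit year, month, day, then hour, minute, second). The sketch below rebuilds one of the paths from a fixed timestamp; the helper `build_output_path` is hypothetical and not part of the notebook.

```python
import os
from datetime import datetime


def build_output_path(output_folder, group, now):
    """Build the output path for a group from a given timestamp."""
    stamp = now.strftime("%y-%m-%d_%H-%M-%S")
    return os.path.join(output_folder, f"mean_{group}_{stamp}.csv")


# A fixed timestamp is used instead of datetime.now() so the result is reproducible.
fixed = datetime(2025, 6, 9, 15, 43, 28)
path = build_output_path("../runs_mean/", "linux_matlab", fixed)
print(path)
```

Since the timestamp has one-second resolution, two runs for the same group within the same second would write to the same file.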
