<a href="https://colab.research.google.com/github/Palaeoprot/ModulAAR/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src='https://drive.google.com/uc?export=view&id=1jmpFC9fmDMGKNuhGMdjMvKMXDOcW5iO2' width=700px align=centre>


MoDuLAAR is a tool for analysing amino acid racemization in protein diagenesis studies. It consists of several modules that work together to process data, perform analyses, and visualize results.


---

## Overview of Notebooks

This program is composed of several interconnected notebooks, each designed to perform specific tasks related to the analysis of amino acid racemization and hydrolysis processes. Below is a brief overview of each notebook and its role in the program.

### 1. Data Processor (`data_processor.ipynb`)
The Data Processor notebook handles the preprocessing and cleaning of raw data. It ensures that the data is in the correct format and structure required for further analysis.

**Key Functions:**
- **Load Data**: Imports raw data from CSV files or Google Sheets.
- **Clean Data**: Cleans the data by handling missing values, correcting inconsistencies, and renaming columns for consistency.
- **Transform Data**: Converts data types and normalizes values as needed.

**Usage:**
This notebook is run first to prepare the raw data for subsequent analysis steps.
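
The load, clean, and transform steps listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in `data_processor.ipynb`: the inline CSV text is made up, the column names `time (h)` and `D/L` are assumptions, and the helper names `load_rows`, `clean_rows`, and `transform_rows` are hypothetical.

```python
import csv
import io

# Hypothetical raw export: untidy header spacing and one missing D/L value.
RAW_CSV = """Amino Acid, temp (°C) ,time (h),D/L
Val,110,24,0.12
Val,110,48,
Phe,140,24,0.31
"""

def load_rows(text):
    """Load: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Clean: strip whitespace from column names and values, drop rows with no D/L."""
    cleaned = []
    for row in rows:
        tidy = {key.strip(): (value or "").strip() for key, value in row.items()}
        if tidy["D/L"]:
            cleaned.append(tidy)
    return cleaned

def transform_rows(rows):
    """Transform: convert the numeric columns from strings to floats."""
    numeric = ("temp (°C)", "time (h)", "D/L")
    return [
        {key: float(value) if key in numeric else value for key, value in row.items()}
        for row in rows
    ]

processed = transform_rows(clean_rows(load_rows(RAW_CSV)))
for row in processed:
    print(row)
```

The row with the empty D/L value is dropped during cleaning, so only two rows reach the transform step.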

### 2. Dehydration Analyser (`dehydration_analyser.ipynb`)
The Dehydration Analyser notebook examines the effects of dehydration on amino acid concentrations and D/L ratios. It processes data to identify trends and patterns related to dehydration.

**Key Functions:**
- **Analyze Dehydration**: Processes data to evaluate the impact of dehydration on the samples.
- **Visualize Results**: Generates plots and charts to visualize dehydration effects.

**Usage:**
Run this notebook after data processing to analyze dehydration-related phenomena.

### 3. Racemization Simulator (`racemization_simulator.ipynb`)
The Racemization Simulator notebook simulates the kinetics of hydrolysis and racemization processes. It models how these processes affect amino acid concentrations and D/L ratios over time.

**Key Functions:**
- **Simulate Kinetics**: Runs simulations to model hydrolysis and racemization.
- **Calculate Ratios**: Computes D/L ratios based on simulation results.
- **Plot Results**: Visualizes the simulation outcomes through various plots.

**Usage:**
Use this notebook to simulate and understand the behavior of amino acids under different conditions.
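
To make the idea of a D/L ratio changing over time concrete, here is a minimal sketch of the simplest textbook case: reversible first-order racemization with equal forward and reverse rate constants. It is an assumption-laden illustration, not the model in `racemization_simulator.ipynb`, which, judging by the parameters passed to it, also tracks hydrolysis and separate internal, terminal, and free pools. The function name `dl_ratio` is hypothetical.

```python
import math

def dl_ratio(k, t, dl0=0.0):
    """D/L after time t for reversible first-order racemization.

    Assumes equal forward and reverse rate constants k and an
    equilibrium D/L of 1. The integrated rate law is
    ln[(1 + D/L) / (1 - D/L)] = 2*k*t + constant,
    which is equivalent to D/L = tanh(k*t + atanh(dl0)).
    """
    return math.tanh(k * t + math.atanh(dl0))

# Illustrative rate constant and time points (arbitrary units).
k = 0.001
for t in (0, 100, 500, 1000, 6000):
    print(f"t = {t:>5}: D/L = {dl_ratio(k, t):.4f}")
```

D/L starts at the initial value, rises monotonically, and approaches 1 as the two enantiomers reach equilibrium.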

### 4. Parameter Optimiser (`parameter_optimiser.ipynb`)
The Parameter Optimiser notebook optimizes the parameters used in the racemization and hydrolysis simulations. It ensures that the model parameters are tuned for accuracy.

**Key Functions:**
- **Optimize Parameters**: Utilizes optimization algorithms to find the best-fit parameters for the model.
- **Evaluate Model**: Assesses the model's performance and adjusts parameters accordingly.

**Usage:**
Run this notebook to fine-tune the parameters before performing detailed simulations.
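
As a rough illustration of what finding best-fit parameters means, the sketch below fits a single rate constant to D/L observations. It assumes the simple reversible first-order law D/L = tanh(k·t), whose linearised form atanh(D/L) = k·t allows a closed-form least-squares fit through the origin. The optimiser notebook tunes many parameters at once and presumably uses a numerical optimiser instead; `fit_rate_constant` and the synthetic data are hypothetical.

```python
import math

def fit_rate_constant(times, dl_values):
    """Least-squares estimate of k from atanh(D/L) = k * t.

    Fits a straight line through the origin: k = sum(t*y) / sum(t*t),
    where y = atanh(D/L). Requires 0 <= D/L < 1 for every observation.
    """
    y = [math.atanh(dl) for dl in dl_values]
    numerator = sum(t * yi for t, yi in zip(times, y))
    denominator = sum(t * t for t in times)
    return numerator / denominator

# Synthetic, noise-free observations generated from a known k (not real data).
true_k = 0.002
times = [100.0, 200.0, 400.0, 800.0]
observed = [math.tanh(true_k * t) for t in times]

fitted_k = fit_rate_constant(times, observed)
print(f"true k = {true_k}, fitted k = {fitted_k:.6f}")
```

With real, noisy measurements the fitted value would only approximate the underlying rate, and values of D/L close to 1 would dominate the fit because atanh grows steeply there.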

### 5. Result Visualiser (`result_visualiser.ipynb`)
The Result Visualiser notebook focuses on presenting the results of the analyses and simulations in an understandable format. It generates comprehensive visualizations to aid in interpreting the data.

**Key Functions:**
- **Visualize Data**: Creates detailed plots and charts to display the analysis and simulation results.
- **Generate Reports**: Compiles the visualizations into reports for easier interpretation.

**Usage:**
Use this notebook to generate and view visual representations of the processed and simulated data.

### Main Notebook (`main.ipynb`)
The main notebook orchestrates the execution of the other notebooks. It ensures that each step is performed in the correct order, starting from data processing to final visualization.

**Key Functions:**
- **Run Notebooks**: Sequentially runs each of the other notebooks using `%run` commands.
- **Manage Workflow**: Ensures that data flows correctly from one notebook to the next, maintaining the integrity of the analysis.

**Usage:**
Execute the main notebook to run the entire program from start to finish. This provides a streamlined process for analysing amino acid racemization and hydrolysis.



In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Import necessary libraries
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Append the path to your custom modules
sys.path.append('/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR')

# Load the color dictionary
def load_color_dictionary():
    color_dict_path = '/content/drive/MyDrive/Colab_Notebooks/Dictionaries/Colours/colors.json'
    with open(color_dict_path, 'r') as file:
        return json.load(file)

# Setup amino acid colors
def setup_amino_acid_colors(color_dict):
    amino_acid_colors = color_dict["amino_acids_colors"]
    one_to_three_letter = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys', 'E': 'Glu', 'Q': 'Gln',
                           'G': 'Gly', 'H': 'His', 'I': 'Ile', 'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe',
                           'P': 'Pro', 'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val'}

    amino_acid_colors_three_letter = {one_to_three_letter[k]: v for k, v in amino_acid_colors.items() if k in one_to_three_letter}
    three_letter_to_Conc = {k: f'[{k}]' for k in amino_acid_colors_three_letter.keys()}
    amino_acid_colors_conc = {three_letter_to_Conc[k]: v for k, v in amino_acid_colors_three_letter.items() if k in three_letter_to_Conc}

    return amino_acid_colors_conc

# Load other notebooks
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/data_processor.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/dehydration_analyser.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/racemization_simulator.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/parameter_optimiser.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/result_visualiser.ipynb"

# Define functions for temperature extraction and slope calculation
def get_middle_temperature(data):
    temperatures = sorted(data['temp (°C)'].unique())
    middle_temp = temperatures[len(temperatures) // 2]
    return middle_temp

def get_low_temperature(data):
    temperatures = data['temp (°C)'].unique()
    low_temp = np.min(temperatures)
    return low_temp

def calculate_slope(mid_temp_params, low_temp_params, mid_temp, low_temp):
    slopes = {}
    for param in mid_temp_params:
        if param in low_temp_params:
            slopes[param] = (mid_temp_params[param] - low_temp_params[param]) / (mid_temp - low_temp)
    return slopes

def save_parameters_to_csv(params, file_path):
    df = pd.DataFrame.from_dict(params, orient='index', columns=['Value'])
    df.to_csv(file_path, index=True)

# Main function
def main():
    # Define file paths
    input_file = "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/ProcessedData/real_DL_output.csv"

    # Load and process data
    processor = DataProcessor()
    real_DL = processor.process_data(input_file)
    print("Data loaded from:", input_file)
    print("Shape of loaded data:", real_DL.shape)
    print("Columns in loaded data:", real_DL.columns)

    # Load color dictionary and set up amino acid colors
    color_dict = load_color_dictionary()
    amino_acid_colors = setup_amino_acid_colors(color_dict)

    # Initial simulation parameters
    simulation_params = {
        'N': 1,
        'fold_water': 8,
        'k_internal': 0.04,
        'k_terminal': 0.01,
        'k_loss': 0.001,
        'racemization_rate_polymer': 0.001,
        'racemization_rate_terminal': 0.002,
        'racemization_rate_free': 0.01,
        'user_defined_max_time': 6000,
        'slow_internal_hydrolysis_fraction': 0.5,
        'slow_internal_hydrolysis_rate': 0.01,
        'slow_racemization_rate_BAA_fraction': 0.5,
        'slow_racemization_rate_BAA_rate': 0.001,
        'slow_racemization_rate_FAA_fraction': 0.5,
        'slow_racemization_rate_FAA_rate': 0.001,
        'num_intervals': 100,
        'initial_int_ratio': 0.9,
        'initial_term_ratio': 0.09,
        'initial_free_ratio': 0.01
    }

    # Run dehydration analysis
    water_generation, water_input, dehydration_rate = run_dehydration_analysis(real_DL, amino_acid_colors, simulation_params['num_intervals'])
    print("Dehydration analysis completed.")

    # Define the order of amino acids to process
    amino_acids = ['Val', 'Phe', 'Ile', 'Glx', 'Asx', 'Ser', 'Ala']

    # Initialize dictionary to store optimized parameters for each amino acid
    optimized_parameters = {}

    # Process each amino acid
    for amino_acid in amino_acids:
        data = real_DL[real_DL['Amino Acid'] == amino_acid]  # Filter data for the current amino acid
        middle_temp = get_middle_temperature(data)
        low_temp = get_low_temperature(data)

        # Optimize parameters for middle temperature
        mid_temp_params = optimize_parameters(data, middle_temp, simulation_params, water_input)

        # Set rates for low temperature using activation energies
        activation_energies = calculate_activation_energies(mid_temp_params)
        low_temp_params = calculate_activation_energies(mid_temp_params, activation_energies)

        # Further optimize the rates for low temperature
        low_temp_params = optimize_parameters(data, low_temp, simulation_params, water_input)

        # Calculate rates for high temperature
        high_temp_params = calculate_high_temperature_rates(mid_temp_params, low_temp_params, middle_temp, low_temp)

        # Save parameters
        save_parameters_to_csv(high_temp_params, f"/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/OptimizedParameters/{amino_acid}_params.csv")

        # Store optimized parameters for the amino acid
        optimized_parameters[amino_acid] = high_temp_params

        # Print progress
        print(f"Completed optimization for {amino_acid}")

    # Plot results
    plot_results(real_DL, optimized_parameters, amino_acid_colors)
    print("Racemization analysis and plotting completed.")

if __name__ == "__main__":
    main()
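
The cell above fits rates at the middle and low temperatures and then derives rates for the high temperature, but `calculate_activation_energies` and `calculate_high_temperature_rates` are defined in other notebooks and are not shown here. One common way to relate rates at different temperatures is the Arrhenius equation, k = A·exp(−Ea / (R·T)). The sketch below assumes that relationship; the function names and the numeric values are hypothetical and are not taken from MoDuLAAR.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def activation_energy(k1, t1_kelvin, k2, t2_kelvin):
    """Arrhenius activation energy (J/mol) from rate constants at two temperatures."""
    return R * math.log(k2 / k1) / (1.0 / t1_kelvin - 1.0 / t2_kelvin)

def extrapolate_rate(k_ref, t_ref_kelvin, ea, t_target_kelvin):
    """Rate constant at t_target_kelvin from a reference rate and activation energy."""
    return k_ref * math.exp(-ea / R * (1.0 / t_target_kelvin - 1.0 / t_ref_kelvin))

# Hypothetical fitted rates at a low (80 °C) and a middle (110 °C) temperature.
low_t, mid_t, high_t = 353.15, 383.15, 413.15  # kelvin
k_low, k_mid = 0.0005, 0.01

ea = activation_energy(k_low, low_t, k_mid, mid_t)
k_high = extrapolate_rate(k_mid, mid_t, ea, high_t)
print(f"Ea = {ea / 1000:.1f} kJ/mol, extrapolated k at {high_t} K = {k_high:.4g}")
```

Temperatures must be in kelvin for this calculation. Note that `calculate_slope` in the cell above instead takes a linear slope of each parameter against temperature, which is a different assumption from the Arrhenius form sketched here.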


Older versions - for debugging only

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sys.path.append('/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR')

def load_color_dictionary():
    color_dict_path = '/content/drive/MyDrive/Colab_Notebooks/Dictionaries/Colours/colors.json'
    with open(color_dict_path, 'r') as file:
        return json.load(file)

def setup_amino_acid_colors(color_dict):
    amino_acid_colors = color_dict["amino_acids_colors"]
    one_to_three_letter = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys', 'E': 'Glu', 'Q': 'Gln',
                           'G': 'Gly', 'H': 'His', 'I': 'Ile', 'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe',
                           'P': 'Pro', 'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val'}

    amino_acid_colors_three_letter = {one_to_three_letter[k]: v for k, v in amino_acid_colors.items() if k in one_to_three_letter}
    three_letter_to_Conc = {k: f'[{k}]' for k in amino_acid_colors_three_letter.keys()}
    amino_acid_colors_conc = {three_letter_to_Conc[k]: v for k, v in amino_acid_colors_three_letter.items() if k in three_letter_to_Conc}

    return amino_acid_colors_conc

# Run the notebooks
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/data_processor.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/dehydration_analyser.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/racemization_simulator.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/parameter_optimiser.ipynb"
%run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/result_visualiser.ipynb"

def main():
    # Define file paths
    input_file = "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/ProcessedData/real_DL_output.csv"

    # Load and process data
    processor = DataProcessor()
    real_DL = processor.process_data(input_file)
    print("Data loaded from:", input_file)
    print("Shape of loaded data:", real_DL.shape)
    print("Columns in loaded data:", real_DL.columns)

    # Load color dictionary and set up amino acid colors
    color_dict = load_color_dictionary()
    amino_acid_colors = setup_amino_acid_colors(color_dict)

    # Initial simulation parameters
    simulation_params = {
        'N': 1,
        'fold_water': 8,
        'k_internal': 0.04,
        'k_terminal': 0.01,
        'k_loss': 0.001,
        'racemization_rate_polymer': 0.001,
        'racemization_rate_terminal': 0.002,
        'racemization_rate_free': 0.01,
        'user_defined_max_time': 6000,
        'slow_internal_hydrolysis_fraction': 0.5,
        'slow_internal_hydrolysis_rate': 0.01,
        'slow_racemization_rate_BAA_fraction': 0.5,
        'slow_racemization_rate_BAA_rate': 0.001,
        'slow_racemization_rate_FAA_fraction': 0.5,
        'slow_racemization_rate_FAA_rate': 0.001,
        'num_intervals': 100,
        'initial_int_ratio': 0.9,
        'initial_term_ratio': 0.09,
        'initial_free_ratio': 0.01
    }

    # Run dehydration analysis
    water_generation, water_input, dehydration_rate = run_dehydration_analysis(real_DL, amino_acid_colors, simulation_params['num_intervals'])
    print("Dehydration analysis completed.")

    # Optimize parameters for middle temperature
    temperatures = sorted(real_DL['temp (K)'].unique())
    middle_temp = temperatures[len(temperatures)//2]
    optimized_params = optimize_parameters(real_DL, middle_temp, simulation_params, water_input)

    # Run simulations for all temperatures
    all_results = {}
    for temp in temperatures:
        params = adjust_params_for_temperature(optimized_params, middle_temp, temp)
        simulation_results, dl_ratios = run_racemization_simulation(real_DL, params, water_input)
        all_results[temp] = {'simulation': simulation_results, 'dl_ratios': dl_ratios}

    # Plot results
    plot_results(real_DL, all_results, amino_acid_colors)
    print("Racemization analysis and plotting completed.")

if __name__ == "__main__":
    main()

Main with looping for each amino acid

In [None]:
import pandas as pd

# Assuming the following functions are defined in the parameter_optimiser and racemization_simulator notebooks:
# - optimize_parameters(data, temp, amino_acid)
# - simulate_racemization(parameters, data, temp, amino_acid)
# - calculate_activation_energies(mid_temp_params, low_temp_params)
# - calculate_high_temp_rates(mid_temp_params, slope, activation_params)
# - save_parameters(parameters, amino_acid, file_path)

def load_data(amino_acid):
    # Placeholder: load the data specific to the given amino acid
    raise NotImplementedError("load_data must be implemented for the data source in use")

def fit_middle_temperature(data, amino_acid):
    mid_temp = get_middle_temperature(data)
    mid_temp_params = optimize_parameters(data, mid_temp, amino_acid)
    return mid_temp_params, mid_temp

def set_rates_for_low_temperature(mid_temp_params, activation_energies):
    low_temp = get_low_temperature()
    low_temp_params = calculate_activation_energies(mid_temp_params, activation_energies)
    return low_temp_params, low_temp

def optimize_low_temperature(data, low_temp, amino_acid):
    # optimize_parameters expects a temperature as its second argument, not a parameter set
    low_temp_params = optimize_parameters(data, low_temp, amino_acid)
    return low_temp_params

def calculate_high_temperature_rates(mid_temp_params, low_temp_params, activation_energies):
    # activation_energies is passed in explicitly; it is not defined at module level
    slope = calculate_slope(mid_temp_params, low_temp_params)
    high_temp_params = calculate_high_temp_rates(mid_temp_params, slope, activation_energies)
    return high_temp_params

def save_parameters(params, amino_acid):
    file_path = f"optimized_parameters_{amino_acid}.csv"
    save_parameters_to_csv(params, file_path)

def process_amino_acid(amino_acid):
    data = load_data(amino_acid)
    mid_temp_params, mid_temp = fit_middle_temperature(data, amino_acid)
    activation_energies = calculate_activation_energies(mid_temp_params)
    # set_rates_for_low_temperature returns (parameters, temperature), so unpack both
    low_temp_params, low_temp = set_rates_for_low_temperature(mid_temp_params, activation_energies)
    low_temp_params = optimize_low_temperature(data, low_temp, amino_acid)
    high_temp_params = calculate_high_temperature_rates(mid_temp_params, low_temp_params, activation_energies)
    save_parameters(high_temp_params, amino_acid)
    return mid_temp_params

# Main loop to process each amino acid in order
amino_acids = ['Val', 'Phe', 'Ile', 'Glx', 'Asx', 'Ser', 'Ala']
mid_temp_params = {}

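for amino_acid in amino_acids:
    if amino_acid == 'Ala':
        # Special handling for Ala: the Ser parameters are intended to inform the Ala fit.
        # fit_middle_temperature takes (data, amino_acid) only, so ser_params is not passed yet.
        ser_params = mid_temp_params['Ser']
        data = load_data('Ala')
        # fit_middle_temperature returns (parameters, temperature); keep only the parameters
        mid_temp_params['Ala'], _ = fit_middle_temperature(data, 'Ala')
    else:
        mid_temp_params[amino_acid] = process_amino_acid(amino_acid)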

# Functions to be defined for actual implementations:
def get_middle_temperature(data):
    # Logic to determine middle temperature from data
    pass

def get_low_temperature():
    # Logic to determine low temperature
    pass

def calculate_slope(mid_temp_params, low_temp_params):
    # Logic to calculate slope between middle and low temperature parameters
    pass

def save_parameters_to_csv(params, file_path):
    # Save a {parameter name: value} dict as a CSV with one row per parameter
    df = pd.DataFrame.from_dict(params, orient='index', columns=['Value'])
    df.to_csv(file_path, index=True)


In [None]:
# # Mount Google Drive
# from google.colab import drive
# drive.mount('/content/drive')

# import pandas as pd
# import numpy as np
# import matplotlib.pyplot as plt
# import json
# import sys
# sys.path.append('/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR')

# def load_color_dictionary():
#     color_dict_path = '/content/drive/MyDrive/Colab_Notebooks/Dictionaries/Colours/colors.json'
#     with open(color_dict_path, 'r') as file:
#         return json.load(file)

# def setup_amino_acid_colors(color_dict):
#     amino_acid_colors = color_dict["amino_acids_colors"]
#     one_to_three_letter = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys', 'E': 'Glu', 'Q': 'Gln',
#                            'G': 'Gly', 'H': 'His', 'I': 'Ile', 'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe',
#                            'P': 'Pro', 'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val'}

#     amino_acid_colors_three_letter = {one_to_three_letter[k]: v for k, v in amino_acid_colors.items() if k in one_to_three_letter}
#     three_letter_to_Conc = {k: f'[{k}]' for k in amino_acid_colors_three_letter.keys()}
#     amino_acid_colors_conc = {three_letter_to_Conc[k]: v for k, v in amino_acid_colors_three_letter.items() if k in three_letter_to_Conc}

#     return amino_acid_colors_conc

# # Run the notebooks
# %run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/data_processor.ipynb"
# %run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/dehydration_module.ipynb"
# %run "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/racemization_module.ipynb"

# def main():
#     # Define file paths
#     input_file = "/content/drive/MyDrive/Colab_Notebooks/MoDuLAAR/ProcessedData/real_DL_output.csv"
#     dehydration_output = input_file.replace('real_DL_output.csv', 'dehydration_output.csv')
#     racemization_output = input_file.replace('real_DL_output.csv', 'racemization_output.csv')

#     # Load the existing processed data
#     real_DL = pd.read_csv(input_file)
#     print("Data loaded from:", input_file)
#     print("Shape of loaded data:", real_DL.shape)
#     print("Columns in loaded data:", real_DL.columns)

#     # Load color dictionary and set up amino acid colors
#     color_dict = load_color_dictionary()
#     amino_acid_colors = setup_amino_acid_colors(color_dict) # Pass the color_dict to the function

#     # Run Dehydration Analysis
#     amino_acids = ['Asx', 'Glx', 'Ser', 'Ala', 'Val', 'Phe', 'Ile']
#     real_DL, water_generation = run_dehydration_analysis(input_file, dehydration_output, amino_acid_colors)
#     print("Dehydration analysis completed.")
#     print("Water generation data saved to:", dehydration_output)

#     # Run Racemization Analysis
#     real_DL, racemization_rates = run_racemization_analysis(input_file, racemization_output, amino_acid_colors) # Pass amino_acid_colors here
#     print("Racemization analysis completed.")
#     print("Racemization data saved to:", racemization_output)

# if __name__ == "__main__":
#     main()