## Import Statements

These import statements include the necessary libraries and modules required for the code.

**Libraries and Modules**

- `pandas`: A data manipulation library for Python, providing data structures and operations for working with numerical, textual, and categorical data.
- `plotly.express`: A high-level API for creating interactive visualizations with Plotly, simplifying the creation of common plot types like scatter plots, line plots, and bar charts.
- `plotly.graph_objects`: A lower-level API for creating interactive visualizations with Plotly, offering more fine-grained control over plot appearance and behavior.
- `plotly.subplots.make_subplots`: A function from the `plotly.subplots` module that creates a figure containing a grid of subplots.
- `re`: The Python regular expressions library, used for pattern matching and searching.
- `plotly.io`: A module for reading and writing Plotly figures, including functions for saving and loading figures in various formats.
- `ipywidgets`: A library for creating interactive widgets in Jupyter notebooks, offering various widget types like sliders, dropdown menus, and text boxes.
- `IPython.display`: A module for displaying objects in Jupyter notebooks, providing functions for displaying text, images, and visualizations.
    - `display`: A function for displaying objects in Jupyter notebooks, including text, images, and visualizations.
    - `clear_output`: A function for clearing the output area of a Jupyter notebook cell.

**Installation**

The third-party libraries are installed with the Python package installer `pip` (https://pypi.org/project/pip/):

- `pip install pandas numpy scipy plotly ipython ipywidgets xlrd`
- `re`, `os`, and `sys` are part of the Python standard library and need no installation.

**Versions**

The versions of the libraries used in this project are:

- Python: `3.11.0`
- pandas: `1.5.1`
- numpy: `1.26.3`
- plotly: `5.13.1`
- re: `2.2.1`
- ipywidgets: `7.8.1`
- IPython: `8.6.0`

After running the import cell, you can check these versions in your environment with the following code (`ipywidgets` is imported as `widgets`):

    print(f"Python: {sys.version}")
    print(f"pandas: {pd.__version__}")
    print(f"plotly: {plotly.__version__}")
    print(f"numpy: {np.__version__}")
    print(f"re: {re.__version__}")
    print(f"ipywidgets: {widgets.__version__}")
    print(f"IPython: {IPython.__version__}")
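
As an alternative that does not depend on each module exposing a `__version__` attribute, the installed versions can be read from package metadata. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the notebook, and the names passed to it are the `pip` distribution names (for example `ipython`, not `IPython`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names as they would be passed to `pip install`
for name in ["pandas", "numpy", "plotly", "ipywidgets", "ipython"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Standard-library modules such as `re` have no distribution metadata, so they are not covered by this check.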

**Usage**

These imports cover the main tasks of the code: data handling, visualization, interactive notebook widgets, and regular expression operations.

```python
# Import these libraries and modules to access their functionality in your code.
```


In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
import ipywidgets as widgets
import plotly.io as pio
import IPython
import plotly
import sys
import re
import os
import xlrd

# graphs template
pio.templates.default = "ggplot2+presentation"

# Disable pandas chained assignment warning
pd.options.mode.chained_assignment = None  # default='warn'

In [2]:
print("Versions:")
print(f"Python: {sys.version}")
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"plotly: {plotly.__version__}")
print(f"re: {re.__version__}")
print(f"ipywidgets: {widgets.__version__}")
print(f"IPython: {IPython.__version__}")

Versions:
Python: 3.11.0 (v3.11.0:deaf509e8f, Oct 24 2022, 14:43:23) [Clang 13.0.0 (clang-1300.0.29.30)]
pandas: 1.5.1
numpy: 1.26.3
plotly: 5.13.1
re: 2.2.1
ipywidgets: 7.8.1
IPython: 8.6.0


## READ/LOAD DATA

This function opens an Excel file and reads its first two sheets, which are expected to be `clumped_export.wke` and `clumped_all_cycles_extra_workin`.

**Parameters:**

* `file`: The path to the Excel file.

**Returns:**

* A tuple containing two Pandas DataFrames:
    * `df_std`: The DataFrame containing the clumped standard data.
    * `df_intensity`: The DataFrame containing the clumped intensity data.

**Example:**

```python
file = "./data/RunXXXX.xls"

df_std, df_intensity = open_excel_file(file)

# Print the shape of the DataFrames
print(df_std.shape)
print(df_intensity.shape)
```

In [3]:
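file = "./data/Run1030.xls"
def open_excel_file(file):
    """Read the first two sheets of an .xls workbook into DataFrames."""
    try:
        # Send xlrd's log messages to the null device and close the handle afterwards
        with open(os.devnull, "w") as devnull:
            workbook = xlrd.open_workbook(file, logfile=devnull)
            # print(workbook.sheet_names())
            sheet_names = workbook.sheet_names()
            df_std = pd.read_excel(workbook, sheet_name=sheet_names[0])
            df_intensity = pd.read_excel(workbook, sheet_name=sheet_names[1])
    except FileNotFoundError as err:
        # Keep the original traceback and report which path was missing
        raise FileNotFoundError(f"File not found: {file}") from err

    return df_std, df_intensity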

In [4]:
df_std, df_intensity = open_excel_file(file)
df_std.head()

Unnamed: 0,Date,Time,Time Code,Row,Line,Sample,Weight (mg),Analysis,Identifier 1,Identifier 2,...,d 49CO2/44CO2 ST Error,d 13C/12C Mean,d 13C/12C Std Dev,d 13C/12C ST Error,d 18O/16O Mean,d 18O/16O Std Dev,d 18O/16O ST Error,Background,Pressadjust,Information
0,01/28/24,15:08:57,2024/01/28 15:08:57,1,1,2,60,28843,CHALK_2,standard_refill,...,2.3,0.56,0.047,0.007,39.154,0.09,0.014,0,1,Acid: 69.9 (°C);LeakRate: 39; 0 x drops;P no...
1,01/28/24,15:45:42,2024/01/28 15:45:42,2,2,2,60,28844,ISOB_2,standard,...,2.638,-11.124,0.009,0.001,21.88,0.017,0.003,0,1,Acid: 69.8 (°C); LeakRate: 54; 0 x drops; ...
2,01/28/24,16:23:26,2024/01/28 16:23:26,3,1,3,58,28845,CHALK,standard,...,2.061,0.753,0.01,0.002,39.546,0.015,0.002,0,1,Acid: 70.0 (°C); LeakRate: 78; 0 x drops; ...
3,01/28/24,17:01:40,2024/01/28 17:01:40,4,2,3,60,28846,ISO A,standard,...,1.915,1.052,0.008,0.001,39.056,0.017,0.003,0,1,Acid: 69.9 (°C); LeakRate: 54; 0 x drops; ...
4,01/28/24,17:40:28,2024/01/28 17:40:28,5,1,4,59,28847,CHALK,standard,...,1.506,0.813,0.009,0.001,39.459,0.014,0.002,0,1,Acid: 70.0 (°C); LeakRate: 107; 0 x drops; ...


## STANDARD ISOTOPE DATA ANALYSIS

### Data Cleaning and Preparation
**The code performs several data cleaning and preparation steps**

**Filtering** It filters the `df_std` and `df_intensity` DataFrames to only include rows where the `Identifier 2` column is either `standard` or `standard_refill`. The results are stored in `df_std_cp` and `df_intensity_cp`.

**Dropping Columns** It drops the columns `Time` and `Date` from both DataFrames, because `Time Code` already combines them. The `Information` column is kept, since the Kiel parameters are extracted from it later.

**Renaming Columns** It renames the `Weight (mg)` column to `Weight` in `df_std_cp`.

**Converting Data Types** It converts the `Time Code` column in both DataFrames to a datetime format using the `pd.to_datetime()` function.

**Converting Text Columns to String** It converts all columns in both DataFrames that have text values (selected with the `select_dtypes()` function) to string type using the `astype()` method.

**Identifying Numeric Columns** The selection of numeric columns with `select_dtypes()` is currently commented out in the function.

These data cleaning and preparation steps ensure that the data is in a consistent and usable format for subsequent analysis.
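
The steps above can be tried on a small toy table before running them on real data. This is only a sketch: the three rows are made up for illustration, and only the column names `Time Code`, `Identifier 1`, `Identifier 2`, and `Weight (mg)` are taken from the notebook.

```python
import pandas as pd

# Made-up rows that mimic the layout of the standards sheet
toy = pd.DataFrame({
    "Time Code": ["2024/01/28 15:08:57", "2024/01/28 15:45:42", "2024/01/28 16:23:26"],
    "Identifier 1": ["CHALK_2", "ISOB_2", "unknown_sample"],
    "Identifier 2": ["standard_refill", "standard", "sample"],
    "Weight (mg)": [60, 60, 58],
})

# Filtering: keep only standards; copy so later assignments do not touch `toy`
kept = toy[toy["Identifier 2"].isin(["standard", "standard_refill"])].copy()

# Renaming: drop the unit from the column name
kept = kept.rename(columns={"Weight (mg)": "Weight"})

# Converting data types: parse the timestamp text into datetime values
kept["Time Code"] = pd.to_datetime(kept["Time Code"], format="%Y/%m/%d %H:%M:%S")

# Converting text columns: switch object columns to the pandas string dtype
text_columns = kept.select_dtypes(include=["object"]).columns
kept[text_columns] = kept[text_columns].astype("string")

print(kept.dtypes)
```

Only the two standard rows remain after filtering, and `Time Code` becomes a datetime column that can later serve as an index.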

In [5]:
def data_type_conversion(df_std: pd.DataFrame, df_intensity: pd.DataFrame):
    # Filter both DataFrames to include only 'standard' and 'standard_refill' identifiers;
    # copy the result so the assignments below do not act on a view of the input
    df_std_cp = df_std[df_std['Identifier 2'].isin(['standard', 'standard_refill'])].copy()
    df_intensity_cp = df_intensity[df_intensity['Identifier 2'].isin(['standard', 'standard_refill'])].copy()

    # Drop unnecessary columns
    df_std_cp = df_std_cp.drop(columns=["Time", "Date"])
    df_intensity_cp = df_intensity_cp.drop(columns=["Time", "Date"])

    # Rename column 'Weight (mg)' to 'Weight'
    df_std_cp = df_std_cp.rename(columns={"Weight (mg)": "Weight"})

    # Convert 'Time Code' to datetime format
    df_std_cp["Time Code"] = pd.to_datetime(df_std_cp["Time Code"])
    df_intensity_cp["Time Code"] = pd.to_datetime(df_intensity_cp["Time Code"])

    # Identify and convert columns containing text values to strings
    text_columns_std = df_std_cp.select_dtypes(include=["object"]).columns
    text_columns_intensity = df_intensity_cp.select_dtypes(include=["object"]).columns

    df_std_cp[text_columns_std] = df_std_cp[text_columns_std].astype("string")
    df_intensity_cp[text_columns_intensity] = df_intensity_cp[text_columns_intensity].astype("string")
    # numeric columns
    # numeric_columns = df_std_cp.select_dtypes(include=["float64", "Int64"]).columns

    return df_std_cp, df_intensity_cp

#### Extract Kiel Data from DataFrame Information column

This code snippet defines a function that extracts Kiel device parameters from the `Information` column of a DataFrame using regular expressions and joins them to the input data. The result is stored in `df_kiel_par`.

**Code Summary**

- `info_keys`: List of keys representing the information to be extracted.
- `key_re_dict`: Dictionary containing the regular expression suffix for each key; the full pattern is the key followed by this suffix.
- `col_rename_dict`: Dictionary that maps each key to its final column name.
- `get_kiel_data(df)`: A function that takes a DataFrame and returns a new DataFrame with the extracted information added. `info_keys`, `key_re_dict`, and `col_rename_dict` are defined inside the function.
- `df_kiel_par`: The resulting DataFrame, indexed by `Time Code`, in which the extracted columns are converted to numeric data types.
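
To illustrate how a key and its suffix pattern combine, here is a minimal sketch that applies the numeric suffix to a single text value. The `info` string is an illustrative, shortened entry in the style of the `Information` column, and `re.escape` is an addition not used in the notebook (it keeps characters such as the dot in `VM1 aftr Trfr.` literal).

```python
import re

# Illustrative entry, shortened from the style of the "Information" column
info = "Acid: 69.9 (°C); LeakRate: 54; 0 x drops"

# Same suffix as in key_re_dict: optional space, colon, whitespace, then a number
number_suffix = r"\s?:\s+([\d.]+)"
keys = ["Acid", "LeakRate", "Total CO2"]

extracted = {}
for key in keys:
    match = re.search(re.escape(key) + number_suffix, info)
    # Keys that do not occur in the text are stored as None
    extracted[key] = float(match.group(1)) if match else None

print(extracted)  # {'Acid': 69.9, 'LeakRate': 54.0, 'Total CO2': None}
```

Missing keys map to `None`, which mirrors how the function leaves gaps that `pd.to_numeric(..., errors='coerce')` later turns into `NaN`.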

**Usage**

You can use this code to extract structured values from a free-text DataFrame column, such as instrument log entries, based on predefined regular expressions.

```python
df_kiel_par = get_kiel_data(df_std_cp)
```


In [6]:
def get_kiel_data(df: pd.DataFrame):
    """
    Extract Kiel parameters from the "Information" column using regular expressions
    and join them to the input DataFrame.

    Parameters:
    df (pandas.DataFrame): The DataFrame containing the data. It must include the
        "Information", "Time Code", and "Background" columns.

    Returns:
    pandas.DataFrame: The input data joined with the extracted numeric parameters,
        indexed by "Time Code".
    """

    col_rename_dict = {
      "Acid": "acid_temperature",
      "LeakRate": "leakrate",
      "P no Acid": "p_no_acid",
      "P gases": "p_gases",
      "RefRe": "reference_refill",
      "Total CO2": "total_CO2",
      "VM1 aftr Trfr.": "vm1_after_transfer",
      "Init int": "initial_intensity",
      "Bellow Pos": "bellow_position",
      "RefI": "reference_intensity",
      "RefPos": "reference_bellow_position"
    }

    info_keys = [
        "Acid", "LeakRate", "P no Acid", "P gases", "RefRe",
        "Total CO2", "VM1 aftr Trfr.", "Init int", "Bellow Pos", "RefI", "RefPos"
    ]
    # Raw strings keep the regex escapes (\s, \d) from being read as string escapes
    key_re_dict = {
        "Acid": r"\s?:\s+([\d.]+)",
        "LeakRate": r"\s?:\s+([\d.]+)",
        "P no Acid": r"\s?:\s+([\d.]+)",
        "P gases": r"\s?:\s+([\d.]+)",
        "Total CO2": r"\s?:\s+([\d.]+)",
        "Init int": r"\s?:\s+([\d.]+)",
        "VM1 aftr Trfr.": r"\s?:\s+([\d.]+)",
        "Bellow Pos": r"\s?:\s+([\d.]+)",
        "RefRe": r"\s?:\s+R\s+mBar\s+([\d.]+)",
        "RefI": r"\s?:\s+mBar\s+r\s+([\d.]+)\s+pos\s+r\s+[\d.]+"
    }

    extracted_values = {key: [] for key in info_keys}
    # List for Kiel Line and Time Code
    time_codes = []  # list for storing Time Code values

    # Iterate over each row in the DataFrame
    for _, row in df.iterrows():
        row_data = {}
        for key, value in key_re_dict.items():
            pattern = f'{key}{value}'
            match = re.search(pattern, row["Information"])
            if match:
                if key != "RefI":
                    row_data[key] = match.group(1)
                else:
                    tmp = match[0].split(" ")
                    row_data["RefI"] = tmp[3]
                    row_data["RefPos"] = tmp[7]
            else:
                row_data[key] = None

        for key in info_keys:
            extracted_values[key].append(row_data.get(key, None))

        # Append Time Code for each row
        time_codes.append(row["Time Code"])

    df_kiel_par_tmp = pd.DataFrame(extracted_values)
    # Add the 'Time Code' values from df as the 'time' column
    df_kiel_par_tmp["time"] = time_codes

    # Convert all columns of DataFrame to numeric, except for Time Code
    numeric_cols = df_kiel_par_tmp.columns.drop('time')
    df_kiel_par_tmp[numeric_cols] = df_kiel_par_tmp[numeric_cols].apply(
        lambda col: pd.to_numeric(col, errors='coerce'))

    # rename cols
    df_kiel_par_tmp.rename(columns=col_rename_dict, inplace=True)

    # Join the extracted parameters to df, matching 'time' against 'Time Code'
    df_kiel_par = df.join(df_kiel_par_tmp.set_index('time'), on='Time Code', how='inner')
    
    # Drop Background column from df_kiel_par
    df_kiel_par.drop(columns=["Background"], inplace=True)

    # Set 'Time Code' as index
    df_kiel_par.set_index('Time Code', inplace=True)

    return df_kiel_par

In [7]:
df_std_cp, df_intensity_cp = data_type_conversion(df_std, df_intensity)
df_kiel_par = get_kiel_data(df_std_cp)

In [8]:
df_kiel_par.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17 entries, 2024-01-28 15:08:57 to 2024-01-29 07:18:17
Data columns (total 45 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Row                        17 non-null     int64  
 1   Line                       17 non-null     int64  
 2   Sample                     17 non-null     int64  
 3   Weight                     17 non-null     int64  
 4   Analysis                   17 non-null     int64  
 5   Identifier 1               17 non-null     string 
 6   Identifier 2               17 non-null     string 
 7   1  Cycle Int  Samp  44     17 non-null     float64
 8   1  Cycle Int  Ref  44      17 non-null     float64
 9   1  Cycle Int  Samp  45     17 non-null     float64
 10  1  Cycle Int  Ref  45      17 non-null     float64
 11  d 45CO2/44CO2  Mean        17 non-null     float64
 12  d 45CO2/44CO2  Std Dev     17 non-null     float64
 13  d 45CO2/44CO2 

### Standards Marker and Colors

Markers and colors are almost identical to the ones used in EASOTOPE.

In [9]:
# Function for Identifier coloring
def standard_marker_color(identifier):
    # Define colors, markers, and identifiers
    colors = ['green', 'violet', 'violet', 
              'blue', 'red', 'lightblue',
              'orange', 'green', 'green',
              'red', 'red', 'blue',
              'blue', 'lightblue', 'orange', "green",
              'green', 'violet', 'violet', 'orange',
              'violet', 'green', 'red',
              'blue', 'green'
              ]

    markers = ['triangle-up', 'circle', 'triangle-down',
           'circle', 'triangle-up', 'circle',
           'circle', 'circle', 'triangle-down',
           'square', 'triangle-down', 'square',
           'triangle-down', 'circle', 'triangle-up', 'square',
           'triangle-up', 'triangle-up', 'square', 'cross',
           'circle-open', 'triangle-down-open', 'square-open',
           'square-open', 'triangle-up-open'
           ]
    
    identifiers = ['Carrara', 'CHALK', 'CHALK_new aliqu', 
                   'Equ Gas 25C', 'Fast Haga', 'Heated gas', 
                   'IAEA C1', 'IAEA C2', 'IAEA C2_new ali', 
                   'ISO A', 'Isolab A_new al', 'ISO B', 
                   'ISO B_new aliq', 'Merck', 'NBS18', 'NBS19', 
                   'Riedel', 'Speleo 2-8E', 'Speleo 9-25G', 'UN_CM12', 
                   'CHALK_2', '_IAEA C2_2', '_Isolab A 2', 
                   'ISOB_2', '_Riedel 2'
                   ] 

    # Print the identifier that the function is trying to find
    # print(f"Looking for identifier: {identifier}")

    # Create dictionaries that map each identifier to a color and a marker
    color_dict = {identifier: colors[i % len(colors)] for i, identifier in enumerate(identifiers)}
    marker_dict = {identifier: markers[i % len(markers)] for i, identifier in enumerate(identifiers)}

    # Return the color and marker corresponding to the given identifier
    # Return 'black' and 'circle' if the identifier is not found
    return color_dict.get(identifier, 'black'), marker_dict.get(identifier, 'circle')

In [10]:
color, marker = standard_marker_color('CHALK')
print(f"Color: {color}")
print(f"Marker: {marker}")

Color: violet
Marker: circle


### Create plots for Kiel parameters

This code snippet creates one subplot for each Kiel parameter column listed in `cols` from the DataFrame `df_kiel_par` and sets a title for each subplot. The subplots are arranged in a two-column grid (5x2 for the ten columns plotted here).

**Code Summary**

- `kiel_par_dict`: A dictionary that maps each column name to a readable subplot title with units.
- `make_subplots`: Creates the two-column subplot grid, using the titles from `kiel_par_dict` as `subplot_titles`.
- Iteration through the columns in `cols` and the standards in `Identifier 1`, creating a scatter trace for each combination with Plotly (`go.Scatter`).
- Adding each trace to the specified position in the subplot grid using `fig.add_trace`.
- Displaying the figure using `fig.show()`.

**Usage**

This code is used to visualize and compare the data in different columns of a DataFrame in separate subplots. The titles for each subplot make it easier to identify the content of each plot.

```python
# Execute the code to create and display the subplots
```


In [11]:
# Set subplot titles
cols = ['acid_temperature', 'leakrate', 'p_no_acid', 'p_gases',
       'reference_refill', 'total_CO2', 'vm1_after_transfer',
       'initial_intensity', 'reference_intensity',
       'reference_bellow_position']

# Custom dictionary of subplot titles per df_kiel_par column
kiel_par_dict = {
        "acid_temperature": "Acid temperature [°C]",
        "leakrate": "Leak Rate [mbar/min]",
        "p_no_acid": "P no Acid [mbar]",
        "p_gases": "P gases [mbar]",
        "total_CO2": "Total CO2 [mbar]",
        "vm1_after_transfer": "VM1 aftr CO2 Transfer. [mbar]",
        "initial_intensity": "Initial Intensity [mV]",
        # "bellow_position": "Bellow Compression [%]",
        "reference_refill": "Reference Refill [mbar]",
        "reference_intensity": "Ref Bellow Pressure [mbar]",
        "reference_bellow_position": "Ref Bellow Compression [%]",
        }

# define subplot height and length 
length = len(cols)
subplot_height = 300

# Create a two-column subplot grid with as many rows as needed
fig = make_subplots(rows=length//2 + length%2, 
                    cols=2, 
                    subplot_titles=[kiel_par_dict[col] for col in cols],
                    vertical_spacing=0.1,)

# Set the height of the figure
fig.update_layout(height=(length//2 + length%2)*subplot_height, 
                  showlegend=False,
                  autosize=True)

# Get unique identifiers in the filtered data
identifiers = df_kiel_par["Identifier 1"].unique()

# Iterate through the columns in df_kiel_par and create separate plots
for i, column in enumerate(cols):
    # Iterate through the identifiers and create a scatter plot for each
    for identifier in identifiers:
        # print(f"Line: {line}, Column: {column}, Identifier: {identifier}")
        identifier_data = df_kiel_par[df_kiel_par["Identifier 1"] == identifier]

        # Get the color and marker for the identifier
        color, marker = standard_marker_color(identifier)

        # print(f"Identifier: {identifier}, Color: {color}, Marker: {marker}")

        # Create a scatter plot trace with custom colors and hover template
        fig.add_trace(
                go.Scatter(x=identifier_data.index, 
                            y=identifier_data[column], 
                            mode='markers', 
                            name=f"{identifier}",  # Set the name of the data series,
                            marker=dict(color=color, symbol=marker),  # Use the color and marker corresponding to the identifier
                            # marker=dict(color='green', symbol='circle'),
                            customdata=identifier_data[["Identifier 1", "Line"]].assign(TimeCode=identifier_data.index).values,  # Add Time Code to customdata
                            hovertemplate=
                                f"<b>{column}</b>: %{{y:.1f}}<br>" +
                                "<b>Line</b>: %{customdata[1]}<br>" +
                                "<b>Datetime</b>: %{customdata[2]}<br>" +
                                "<b>Standard</b>: %{customdata[0]}<br>" +
                                "<extra></extra>"
                    ),
            row = i//2 + 1,
            col = i%2 + 1
        )

# for i, column in enumerate(cols):
#     fig.update_yaxes(title_text=kiel_par_dict[column], row=i//2 + 1, col=i%2 + 1)

# Show the subplot
fig.show()

### Creating Multiple Scatter Plots
The code creates one scatter plot per selected numeric column, plotted against the measurement time (the `Time Code` index) and styled by standard (`Identifier 1`). It follows these steps:

**Define Subplot Height** It defines the height of each subplot using the `subplot_height` variable.

**Calculate Subplot Rows and Columns** It derives the number of rows for a two-column grid from the length of the `numeric_columns` list, which names the columns to be visualized.

**Create Subplot Grid** It creates a subplot grid using the `make_subplots()` function, specifying the calculated rows, columns, vertical spacing, and horizontal spacing between subplots.

**Adjust Figure Height** It sets the height of the entire figure using the `update_layout()` function, ensuring the figure is tall enough to accommodate all subplots.

**Identify Unique Standards** It extracts the unique values in the `Identifier 1` column using the `unique()` function; each standard becomes its own data series.

**Look Up Marker Style** It obtains the color and marker symbol for each standard from `standard_marker_color()`.

**Create Scatter Plots** It iterates through each numeric column (`column`) and each standard (`identifier`) and adds a `go.Scatter()` trace with:

- `x`: The `Time Code` index of the rows for the selected standard.
- `y`: The corresponding values from the selected numeric column.
- `mode`: `'markers'` to display data points as markers.
- `name`: The identifier of the standard.
- `marker`: A dictionary with the color and symbol returned by `standard_marker_color()`.
- `customdata` and `hovertemplate`: The standard, line, and weight shown on hover.
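
The row count and the placement of each subplot follow from integer division and remainder. The sketch below reproduces that arithmetic on its own; the helper names `grid_shape` and `grid_position` are hypothetical and do not appear in the notebook, which computes the same values inline.

```python
def grid_shape(n_plots: int, n_cols: int = 2) -> tuple:
    """Number of rows and columns needed to hold n_plots subplots."""
    rows = n_plots // n_cols + (1 if n_plots % n_cols else 0)
    return rows, n_cols


def grid_position(i: int, n_cols: int = 2) -> tuple:
    """1-based (row, col) of the i-th subplot, filling the grid row by row."""
    return i // n_cols + 1, i % n_cols + 1


n_plots = 11  # length of the numeric_columns list in the notebook
rows, cols = grid_shape(n_plots)
positions = [grid_position(i) for i in range(n_plots)]

print(rows, cols)      # 6 2
print(positions[:3])   # [(1, 1), (1, 2), (2, 1)]
print(positions[-1])   # (6, 1)
```

With an odd number of plots the last row holds a single subplot, which is why the figure height is scaled by the row count rather than by half the number of plots.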

In [12]:
df_kiel_par.columns

Index(['Row', 'Line', 'Sample', 'Weight', 'Analysis', 'Identifier 1',
       'Identifier 2', '1  Cycle Int  Samp  44', '1  Cycle Int  Ref  44',
       '1  Cycle Int  Samp  45', '1  Cycle Int  Ref  45',
       'd 45CO2/44CO2  Mean', 'd 45CO2/44CO2  Std Dev',
       'd 45CO2/44CO2  ST  Error', 'd 46CO2/44CO2  Mean',
       'd 46CO2/44CO2  Std Dev', 'd 46CO2/44CO2  ST  Error',
       'd 47CO2/44CO2  Mean', 'd 47CO2/44CO2  Std Dev',
       'd 47CO2/44CO2  ST  Error', 'd 48CO2/44CO2  Mean',
       'd 48CO2/44CO2  Std Dev', 'd 48CO2/44CO2  ST  Error',
       'd 49CO2/44CO2  Mean', 'd 49CO2/44CO2  Std Dev',
       'd 49CO2/44CO2  ST  Error', 'd 13C/12C  Mean', 'd 13C/12C  Std Dev',
       'd 13C/12C  ST  Error', 'd 18O/16O  Mean', 'd 18O/16O  Std Dev',
       'd 18O/16O  ST  Error', 'Pressadjust', 'Information',
       'acid_temperature', 'leakrate', 'p_no_acid', 'p_gases',
       'reference_refill', 'total_CO2', 'vm1_after_transfer',
       'initial_intensity', 'bellow_position', 'reference_

In [13]:
# Numeric columns to plot
numeric_columns = ['1  Cycle Int  Samp  44', '1  Cycle Int  Ref  44', 
                   '1  Cycle Int  Samp  45', '1  Cycle Int  Ref  45', 
                   'd 45CO2/44CO2  Std Dev', 'd 46CO2/44CO2  Std Dev', 
                   'd 47CO2/44CO2  Std Dev', 'd 48CO2/44CO2  Std Dev', 
                   'd 49CO2/44CO2  Std Dev', 'd 13C/12C  Std Dev', 
                   'd 18O/16O  Std Dev']

# print(numeric_columns)

# Define the height of each subplot (in pixels)
subplot_height = 300
length = len(numeric_columns)

# Create a subplot for each column in the DataFrame
fig1 = make_subplots(rows=length//2 + length%2, cols=2, vertical_spacing=0.08, horizontal_spacing=0.15)

# Set the height of the figure
fig1.update_layout(height=(length//2 + length%2)*subplot_height, showlegend=False)

# Get unique identifiers in the filtered data
identifiers = df_kiel_par["Identifier 1"].unique()

# Iterate through each column and create a scatter plot
for i, column in enumerate(numeric_columns):
    for identifier in identifiers:
        identifier_data = df_kiel_par[df_kiel_par["Identifier 1"] == identifier]

        # Get the color and marker for the identifier
        color, marker = standard_marker_color(identifier)

        fig1.add_trace(
            go.Scatter(x=identifier_data.index, 
                    y=identifier_data[column], 
                    mode='markers', 
                    name=f"{identifier}",  # Set the name of the data series,
                    marker=dict(color=color, symbol=marker),  # Use the color and marker corresponding to the identifier
                    customdata=identifier_data[["Identifier 1", "Line", "Weight"]].values,  # Add identifier, line, and weight to customdata
                    hovertemplate=
                        f"<b>{column}</b>: %{{y:.1f}}<br>" +
                        "<b>Standard</b>: %{customdata[0]}<br>" +
                        "<b>Line</b>: %{customdata[1]}<br>" +
                        "<b>Weight (mg)</b>: %{customdata[2]}<br>" +
                        "<extra></extra>"              
                    ),
            row = i//2 + 1,
            col = i%2 + 1
        )

    # Set the title of the y-axis for each subplot
    fig1.update_yaxes(title_text=column, row=i//2 + 1, col=i%2 + 1)
    fig1.update_xaxes(title_text="", row=i//2 + 1, col=i%2 + 1)

# Save the figure to an HTML file
# fig1.write_html("multiple_scatter_plots.html")

# Show the figure
fig1.show()

### Interactive Scatter Plot with Dropdown Menus
The code creates an interactive scatter plot with dropdown menus to select the x-axis and y-axis variables and a third variable (z) that sets the marker color. It follows these steps:

**Create Dropdowns** It creates three dropdowns using the `widgets.Dropdown()` function:

- `x_dropdown`: For selecting the x-axis variable.
- `y_dropdown`: For selecting the y-axis variable.
- `z_dropdown`: For selecting the variable that colors the markers (labeled "Z-axis").

**Set Dropdown Options** It sets the options of each dropdown using the `options` parameter: all columns of `df_kiel_par` for the x- and y-axis, and `Row`, `Sample`, and `Line` (`z_columns`) for the color variable.

**Set Initial Dropdown Values** It sets the initial selection of each dropdown using the `value` parameter: `Weight` (column index 3) for x, `1  Cycle Int  Samp  44` (column index 7) for y, and `Line` for z.

**Create Plot Output Widget** It creates an output widget using the `widgets.Output()` function to display the interactive plot.

**Define Update Plot Function** It defines a function `update_plot(x, y, z)` that updates the plot based on the selected dropdown values:

- It clears the existing plot output using `plot_output.clear_output()`.
- It creates a `go.Figure` object with a `go.Scatter` plot:
    - `x`: Data from the selected x-axis variable (`x`).
    - `y`: Data from the selected y-axis variable (`y`).
    - `mode`: `'markers'` to display data points as markers.
    - `marker_color`: Sets the marker color based on the selected z variable (`z`).
    - `text`: Adds the z values to the marker text.
    - `hovertemplate`: Defines the hover template for each data point, including x, y, and z values.
- It updates the plot layout with a title, x-axis title, and y-axis title based on the selected variables.
- It displays the updated plot using `fig.show()`.

**Create Update Plot Button** It creates a button using the widgets.Button() function with the description "Update Plot".

**Connect Button to Update Function** It connects the button to the update_plot() function using the on_click() method. Whenever the button is clicked, it triggers the update function, passing the current values of the x_dropdown, y_dropdown, and z_dropdown to update the plot accordingly.

**Display Widgets** It displays the dropdowns, button, and plot output widget using the display() function, arranging them in a suitable layout for user interaction.

This interactive scatter plot allows users to dynamically explore the relationship between different variables in the dataset, providing a visual representation of the data and enabling interactive data exploration.

In [14]:
df_kiel_par.columns

Index(['Row', 'Line', 'Sample', 'Weight', 'Analysis', 'Identifier 1',
       'Identifier 2', '1  Cycle Int  Samp  44', '1  Cycle Int  Ref  44',
       '1  Cycle Int  Samp  45', '1  Cycle Int  Ref  45',
       'd 45CO2/44CO2  Mean', 'd 45CO2/44CO2  Std Dev',
       'd 45CO2/44CO2  ST  Error', 'd 46CO2/44CO2  Mean',
       'd 46CO2/44CO2  Std Dev', 'd 46CO2/44CO2  ST  Error',
       'd 47CO2/44CO2  Mean', 'd 47CO2/44CO2  Std Dev',
       'd 47CO2/44CO2  ST  Error', 'd 48CO2/44CO2  Mean',
       'd 48CO2/44CO2  Std Dev', 'd 48CO2/44CO2  ST  Error',
       'd 49CO2/44CO2  Mean', 'd 49CO2/44CO2  Std Dev',
       'd 49CO2/44CO2  ST  Error', 'd 13C/12C  Mean', 'd 13C/12C  Std Dev',
       'd 13C/12C  ST  Error', 'd 18O/16O  Mean', 'd 18O/16O  Std Dev',
       'd 18O/16O  ST  Error', 'Pressadjust', 'Information',
       'acid_temperature', 'leakrate', 'p_no_acid', 'p_gases',
       'reference_refill', 'total_CO2', 'vm1_after_transfer',
       'initial_intensity', 'bellow_position', 'reference_

In [15]:
# Create a dropdown for the x-axis
x_dropdown = widgets.Dropdown(
    options=df_kiel_par.columns,
    value=df_kiel_par.columns[3],
    description='X-axis:',
)

# Create a dropdown for the y-axis
y_dropdown = widgets.Dropdown(
    options=df_kiel_par.columns,
    value=df_kiel_par.columns[7],
    description='Y-axis:',
)

# Create a dropdown for the z-axis
z_columns = ['Row', 'Sample', 'Line']
z_dropdown = widgets.Dropdown(
    options=z_columns,
    value=z_columns[2],
    description='Z-axis:',
)

# create an output widget for the plot
plot_output = widgets.Output()

# Function to update the plot
def update_plot(x, y, z):
    plot_output.clear_output(wait=True)  # Clear the existing plot output so plots do not stack
    with plot_output:
        fig = go.Figure(data=go.Scatter(x=df_kiel_par[x], 
                                        y=df_kiel_par[y], 
                                        mode='markers',
                                        marker_color=df_kiel_par[z],  # Set marker color to z variable
                                        text=df_kiel_par[z],  # Add z values to text
                                        hovertemplate=
                                            f"<b>{x}</b>: %{{x:.1f}}<br>" +
                                            f"<b>{y}</b>: %{{y:.1f}}<br>" +
                                            f"<b>{z}</b>: %{{text}}<br>"
                            )
                        )
        fig.update_layout(title=f"{y} vs {x} colored by {z}", xaxis_title=x, yaxis_title=y)
        fig.show()

# Create a button that will update the plot when clicked
button = widgets.Button(description="Update Plot")
button.on_click(lambda x: update_plot(x_dropdown.value, y_dropdown.value, z_dropdown.value))

# Display the widgets
display(x_dropdown, y_dropdown, z_dropdown, button, plot_output)

Dropdown(description='X-axis:', index=3, options=('Row', 'Line', 'Sample', 'Weight', 'Analysis', 'Identifier 1…

Dropdown(description='Y-axis:', index=7, options=('Row', 'Line', 'Sample', 'Weight', 'Analysis', 'Identifier 1…

Dropdown(description='Z-axis:', index=2, options=('Row', 'Sample', 'Line'), value='Line')

Button(description='Update Plot', style=ButtonStyle())

Output()

### Delta Values Scatter Plots with Error Bars
The code generates a grid of scatter plots with error bars that show how each mean column varies with sample weight (mg), with points colored by Kiel line. Each point is a mean value, and its error bar is the corresponding standard deviation.

**Identify Error and Mean Columns** Select the columns whose names contain `Std Dev` (`error_columns`) and `Mean` (`mean_columns`).

**Create Error Column Dictionary** Create a dictionary (`error_dict`) that maps each mean column to its corresponding error column.

**Define Subplot Height** Set the height of each subplot, in pixels, using the `subplot_height` variable.

**Calculate Subplot Layout** Determine the number of rows needed for a two-column subplot grid from the length of `mean_columns`.

**Create Subplot Grid** Generate a subplot grid using `make_subplots()` with the calculated rows, columns, vertical spacing, and horizontal spacing.

**Adjust Figure Height** Set the overall figure height using `update_layout()` to accommodate all subplots.

**Identify Kiel Acid Line Types** Determine the unique values in the "Line" column.
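
The layout arithmetic described above can be sketched without any plotting code. This is a minimal illustration, assuming a two-column grid filled row by row; the helper name `subplot_positions` is hypothetical and does not appear in the notebook.

```python
import math

def subplot_positions(n_plots, n_cols=2):
    """Return (n_rows, positions) for a grid filled row by row.

    positions holds 1-based (row, col) pairs, the convention used by
    the row= and col= arguments of add_trace on a make_subplots figure.
    """
    n_rows = math.ceil(n_plots / n_cols)
    positions = [(i // n_cols + 1, i % n_cols + 1) for i in range(n_plots)]
    return n_rows, positions

subplot_height = 250  # pixels per subplot row, as in the cell below

# Five mean columns need three rows in a two-column grid
n_rows, positions = subplot_positions(5)
figure_height = n_rows * subplot_height

print(n_rows, positions, figure_height)
```

For two columns, `math.ceil(n / 2)` gives the same row count as the expression `length // 2 + length % 2` used in the cell.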

In [16]:
# # Get the list of columns that contain the standard deviation and mean values
# error_columns = [column for column in df_std_cp.columns if 'Std Dev' in column]
# mean_columns = [column for column in df_std_cp.columns if 'Mean' in column]

# # Create a dictionary of error columns, keyed by the mean column name
# error_dict = {}
# for mean_column, error_column in zip(mean_columns, error_columns):
#     error_dict[mean_column] = error_column

# # Define the height of each subplot (in pixels)
# subplot_height = 250
# length = len(mean_columns)

# # Create a subplot for each column in the DataFrame
# fig2 = make_subplots(rows=length // 2 + length % 2, cols=2, vertical_spacing=0.1, horizontal_spacing=0.1)

# # Set the height of the figure
# fig2.update_layout(height=(length // 2 + length % 2) * subplot_height, showlegend=False)

# # Get unique values in the "Line" column
# lines = df_std_cp["Line"].unique()

# # Define colors for each line
# colors = ['blue', 'red']  # Add more colors if there are more than two lines

# # Set opacity for mean points
# mean_opacity = 0.5  # Adjust as needed

# # Iterate through each mean column and create a scatter plot
# for i, mean_column in enumerate(mean_columns):
#     error_column = error_dict[mean_column]
#     for j, line in enumerate(lines):
#         data = df_std_cp[df_std_cp["Line"] == line]
#         mean = data[mean_column]
#         error = data[error_column]
#         hovertext = [
#             f"<b>{mean_column}</b>: {m:.1f} ± {e:.5f}<br>" +
#             "<b>Weight (mg)</b>: {:.1f}<br>".format(w) +
#             "<b>Line</b>: " + str(line) + "<br>"
#             for m, e, w in zip(mean, error, data["Weight"])
#         ]

#         fig2.add_trace(
#             go.Scatter(
#                 x=data["Weight"],
#                 y=mean,
#                 mode='markers',
#                 name=f"Line {line}",
#                 marker=dict(color=colors[j % len(colors)], opacity=mean_opacity),
#                 hoverinfo="text",  # Use custom hover text
#                 text=hovertext,
#                 error_y=dict(
#                     type='data',
#                     array=error,
#                     visible=True
#                 )
#             ),
#             row=i // 2 + 1,
#             col=i % 2 + 1
#         )

#     # Set the title of the y-axis for each subplot
#     fig2.update_yaxes(title_text=mean_column, row=i // 2 + 1, col=i % 2 + 1)
#     fig2.update_xaxes(title_text="Weight (mg)", row=i // 2 + 1, col=i % 2 + 1)

# # Show the figure
# fig2.show()

### Delta Values Scatter Plots per Standard vs Time
This code generates one scatter plot per entry in `group_columns`, plotted against "Time Code", with one trace per standard (`Identifier 1`) and custom information on hover. It uses the `plotly.graph_objects` library to create and manipulate the figures.

**Data Preparation**

`Define Group Columns`: The group_columns list specifies the columns to be plotted on the y-axis.

`Filter Data`: For each group column, the code filters the data to the rows of the current `Identifier 1` group.

**Scatter Plot Creation**

`Create Figure`: A go.Figure object is created for each group column to hold the corresponding scatter plot.

`Generate Traces`: For each `Identifier 1` group within the current group column, a scatter plot trace is generated using `go.Scatter`:

- `Data`: The x and y values are taken from the filtered data's "Time Code" and group column, respectively.
- `Appearance`: The name is set to the `Identifier 1` key, and the mode is set to "markers" to remove lines.
- `Customdata`: A list of tuples is created containing the weight, the initial intensity ("1  Cycle Int  Samp  44" column), and the line number of each point. This customdata supplies the additional information shown on hover.
- `Hovertemplate`: A hovertemplate is defined to display the customdata values (initial intensity, weight, and line number) for each point.

`Add Traces to Figure`: Each scatter plot trace is added to the corresponding figure using `fig.add_trace`.
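
How the positions inside each customdata tuple line up with the `%{customdata[i]}` placeholders can be sketched with plain Python. The rows and their values below are hypothetical stand-ins for one `Identifier 1` group; only the column names and the template text are taken from the notebook.

```python
# Hypothetical rows standing in for one "Identifier 1" group
rows = [
    {"Weight": 60.2, "1  Cycle Int  Samp  44": 15091.9, "Line": 1},
    {"Weight": 58.7, "1  Cycle Int  Samp  44": 17284.1, "Line": 2},
]

# One tuple per point; the position inside the tuple is the index
# that the hovertemplate refers to:
#   customdata[0] -> weight, customdata[1] -> initial intensity, customdata[2] -> line
customdata = [
    (r["Weight"], r["1  Cycle Int  Samp  44"], r["Line"]) for r in rows
]

# Adjacent string literals are concatenated, so this is a single template
hovertemplate = (
    "<b>Init. intensity (mV): %{customdata[1]}</b><br>"
    "<b>Weight (mg): %{customdata[0]:.1f}</b><br>"
    "<b>Line: %{customdata[2]}</b><br>"
)

# Both objects would then be passed to the trace, e.g.
# go.Scatter(x=..., y=..., customdata=customdata, hovertemplate=hovertemplate)
print(customdata)
```

Keeping the tuple order next to the template makes it easier to spot a label that points at the wrong index.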

**Figure Layout and Display**

`Update Layout`: The layout of each figure is updated using `fig.update_layout`:

- `Title`: The title is set to the current group column name, in bold.
- `Axis Titles`: The x-axis title is left empty (`""`), and the y-axis title is set to the group column name.
- `Legend`: The legend is displayed.

`Store Figures`: The generated figures are stored in the dictionary `figures`, keyed by group column name, for easy access.

`Display Figures`: Each figure in the figures dictionary is displayed using fig.show().

**Hover Information**

The hovertemplate provides additional information when hovering over a data point. It displays the initial intensity (labeled "Init. intensity (mV)"), the weight in mg, and the line number.

**Overall Structure**

The code iterates through each group column, adds one scatter trace per `Identifier 1` group to that column's figure, and then displays all figures, so each group column can be inspected over time with per-point details on hover.

In [17]:
# Define the group columns to plot
group_columns = ['Weight',
       '1  Cycle Int  Samp  44', 
       'd 45CO2/44CO2  Std Dev',
       'd 46CO2/44CO2  Std Dev',
       'd 47CO2/44CO2  Std Dev',
       'd 48CO2/44CO2  Std Dev',
       'd 49CO2/44CO2  Std Dev', 
       'd 13C/12C  Std Dev', 
       'd 18O/16O  Std Dev']

# Create a dictionary to store figures
figures = {}

# Iterate through each group column
for group_column in group_columns:
    # Create a figure for the current group column
    fig = go.Figure()

    # Iterate through each identifier1 group
    for identifier1_key in df_std_cp["Identifier 1"].unique():
        # Filter data for the current group_column and identifier1_key
        filtered_data = df_std_cp[(df_std_cp["Identifier 1"] == identifier1_key)]

        # Create a scatter plot trace with markers and customdata
        customdata = list(zip(filtered_data["Weight"], filtered_data["1  Cycle Int  Samp  44"], filtered_data["Line"]))
        # customdata = [(str(identifier1_key), weight, line_number) for weight, line_number in zip(filtered_data["Weight (mg)"], filtered_data["Line"])]
        trace = go.Scatter(
            x=filtered_data["Time Code"],
            y=filtered_data[group_column],
            name=identifier1_key,
            mode="markers",  # Set mode to 'markers' to remove lines
            marker=dict(size=10, opacity=0.8),  # Customize marker appearance
            customdata=customdata,  # Add customdata for hover information
            hovertemplate="<b>Init. intensity (mV): %{customdata[1]}</b><br>" +
                          "<b>Weight (mg): %{customdata[0]:.1f}</b><br>" +
                           "<b>Line: %{customdata[2]}</b><br>"
        )

        # Add the trace to the figure
        fig.add_trace(trace)

    # Update the layout
    fig.update_layout(
        title=f"<b>{group_column}</b>",
        xaxis_title="",
        yaxis_title=group_column,
        showlegend=True
    )

    # Add the figure to the figures dictionary
    figures[group_column] = fig

# Show the figures
for fig in figures.values():
   fig.show()

## RAW INTENSITY ANALYSIS

### Rearrange dataframe:
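The raw intensities are grouped by **standard type** (`Identifier 1`), **reference or sample measurement** (`Is Ref _`), and `Time Code`. Per-cycle intensity ratios relative to `rIntensity 44` are then computed and reshaped into long format for plotting.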
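
The group, ratio, melt, and explode sequence used in the next cell can be sketched on a toy frame. This is an illustration only: the identifiers `STD_A` and `STD_B`, the intensity values, and the helper column `Cycle` are hypothetical, and `Time Code` is left out of the grouping keys for brevity.

```python
import pandas as pd

# Toy data: two cycles per measurement (hypothetical values)
toy = pd.DataFrame({
    "Identifier 1": ["STD_A", "STD_A", "STD_B", "STD_B"],
    "Is Ref _": [0, 0, 1, 1],
    "rIntensity 44": [10.0, 20.0, 10.0, 40.0],
    "rIntensity 45": [12.0, 22.0, 11.0, 42.0],
})
keys = ["Identifier 1", "Is Ref _"]

# Collapse each measurement into one row whose cells hold per-cycle lists
grouped = toy.groupby(keys, as_index=False).agg(list)

# Element-wise ratio of every cycle against mass 44
grouped["rIntensity 45_ratio"] = grouped.apply(
    lambda row: [a / b for a, b in zip(row["rIntensity 45"], row["rIntensity 44"])],
    axis=1,
)

# Long format: one row per ratio column, then one row per cycle
long_df = (
    grouped.melt(id_vars=keys, value_vars=["rIntensity 45_ratio"],
                 var_name="Intensity", value_name="Ratio")
    .explode("Ratio")
    .astype({"Ratio": float})
)

# After explode the index repeats per measurement, so a cumulative
# count per index value numbers the cycles 0, 1, 2, ...
long_df["Cycle"] = long_df.groupby(level=0).cumcount()

print(long_df)
```

Because `explode` keeps the original row label for every list element, any later assignment that aligns on the index has to account for those repeated labels.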

In [18]:
group_cols = ["Identifier 1", "Is Ref _", "Time Code"]
group_data = df_intensity_cp.groupby(group_cols, as_index=False).agg(lambda x: list(x))
# Rename the columns for clarity
group_data.columns = ['_'.join(col).strip() if isinstance(col, tuple) else col for col in group_data.columns.values]

# Transform group_data to df_raw_int_ratio
# Intensity columns except for rIntensity 44
intensity_cols = [col for col in group_data.columns if col.startswith("rIntensity") and col != "rIntensity 44"]

# Generate empty dataframe to store results
df_raw_int_ratio = group_data.copy()

# Generate ratio_columns
ratio_columns = [f"{intensity_column}_ratio" for intensity_column in intensity_cols]

# Calculate intensity ratios
for intensity_column, ratio_column in zip(intensity_cols, ratio_columns):
 
    # Update the existing column with new values
    df_raw_int_ratio[ratio_column] = df_raw_int_ratio.apply(lambda row: [a / b for a, b in zip(row[intensity_column], row['rIntensity 44'])], axis=1)

# Reshape the DataFrame for plotting
df_raw_int_ratio = df_raw_int_ratio.melt(id_vars=['Identifier 1', 'Is Ref _', 'Time Code'],
                          value_vars= [f"{col}_ratio" for col in intensity_cols],
                          var_name='Intensity', value_name='Ratio')

# Flatten the lists in the 'Ratio' column
df_raw_int_ratio = df_raw_int_ratio.explode('Ratio')

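# Map 'Is Ref _' codes to labels (0 -> "Reference", 1 -> "Sample").
# NOTE: the intensity_stats cell further below uses the opposite mapping
# ({0: "Sample", 1: "Reference"}); confirm which convention the export uses.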
df_raw_int_ratio['Is Ref _'] = df_raw_int_ratio['Is Ref _'].map({0: "Reference", 1: "Sample"})

# Convert column dtypes to appropriate types
df_raw_int_ratio["Time Code"] = pd.to_datetime(df_raw_int_ratio["Time Code"])
df_raw_int_ratio["Intensity"] = df_raw_int_ratio["Intensity"].astype("string")
df_raw_int_ratio["Ratio"] = df_raw_int_ratio["Ratio"].astype("float")
# Ensure that 'Identifier 1' and 'Is Ref _' are categorical
# ('Time Code' was already converted with pd.to_datetime above)
df_raw_int_ratio['Identifier 1'] = df_raw_int_ratio['Identifier 1'].astype('category')
df_raw_int_ratio['Is Ref _'] = df_raw_int_ratio['Is Ref _'].astype('category')

# df_raw_int_ratio.info()

# Get the first entry of each list in 'rIntensity 44'
start_intensity_m44 = group_data['rIntensity 44'].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)

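# Create new columns in df_raw_int_ratio storing the first-cycle intensity of each measurement
intensity_columns = [f'rIntensity {i}' for i in range(44, 50)]

# melt() stacks one block of len(group_data) rows per ratio column, so the exploded
# index modulo len(group_data) recovers the originating group_data row. Assigning the
# Series directly would align on the index and leave NaN in every block but the first.
source_rows = df_raw_int_ratio.index.to_numpy() % len(group_data)

for col in intensity_columns:
    new_col_name = f'start_{col}'
    group_data[new_col_name] = group_data[col].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)
    df_raw_int_ratio[new_col_name] = group_data[new_col_name].to_numpy()[source_rows]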

df_raw_int_ratio_new = df_raw_int_ratio.copy()

# Replace each repeated index with a running cycle counter (0 to 40) within its measurement
df_raw_int_ratio_new.index = pd.Series(df_raw_int_ratio_new.index).groupby(df_raw_int_ratio_new.index).cumcount()

In [19]:
df_raw_int_ratio_new.head()

| | Identifier 1 | Is Ref _ | Time Code | Intensity | Ratio | start_rIntensity 44 | start_rIntensity 45 | start_rIntensity 46 | start_rIntensity 47 | start_rIntensity 48 | start_rIntensity 49 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199522 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 1 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.19952 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 2 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199506 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 3 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199514 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 4 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199511 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |


In [20]:
group_data.tail()

Unnamed: 0,Identifier 1,Is Ref _,Time Code,Row,Line,Sample,Weight (mg),Analysis,Identifier 2,rIntensity 44,...,d 18O/16O Mean,d 18O/16O Std Dev,Information,Reference Refill,start_rIntensity 44,start_rIntensity 45,start_rIntensity 46,start_rIntensity 47,start_rIntensity 48,start_rIntensity 49
29,ISO B,0,2024-01-29 05:20:33,"[23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 2...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 1...","[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6...","[28865, 28865, 28865, 28865, 28865, 28865, 288...","[standard, standard, standard, standard, stand...","[17284.09, 17101.075, 16919.606, 16740.667, 16...",...,"[21.828, 21.828, 21.828, 21.828, 21.828, 21.82...","[0.021, 0.021, 0.021, 0.021, 0.021, 0.021, 0.0...",[ Acid: 70.0 (°C); LeakRate: 93; 0 x drops;...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",17284.09,20490.783,24082.746,2773.972,2445.709,-27.437
30,ISO B,1,2024-01-28 23:30:08,"[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 1...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ...","[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6...","[28856, 28856, 28856, 28856, 28856, 28856, 288...","[standard, standard, standard, standard, stand...","[15521.088, 15361.653, 15200.794, 15042.725, 1...",...,"[21.866, 21.866, 21.866, 21.866, 21.866, 21.86...","[0.017, 0.017, 0.017, 0.017, 0.017, 0.017, 0.0...",[ Acid: 69.9 (°C); LeakRate: 54; 0 x drops;...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",15521.088,18519.287,21726.398,2519.854,2213.855,-24.072
31,ISO B,1,2024-01-29 05:20:33,"[23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 2...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 1...","[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6...","[28865, 28865, 28865, 28865, 28865, 28865, 288...","[standard, standard, standard, standard, stand...","[17319.376, 17135.578, 16950.254, 16766.928, 1...",...,"[21.828, 21.828, 21.828, 21.828, 21.828, 21.82...","[0.021, 0.021, 0.021, 0.021, 0.021, 0.021, 0.0...",[ Acid: 70.0 (°C); LeakRate: 93; 0 x drops;...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",17319.376,20664.984,24244.349,2812.53,2468.577,-27.747
32,ISOB_2,0,2024-01-28 15:45:42,"[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6...","[28844, 28844, 28844, 28844, 28844, 28844, 288...","[standard, standard, standard, standard, stand...","[16588.972, 16414.159, 16241.674, 16070.896, 1...",...,"[21.88, 21.88, 21.88, 21.88, 21.88, 21.88, 21....","[0.017, 0.017, 0.017, 0.017, 0.017, 0.017, 0.0...",[ Acid: 69.8 (°C); LeakRate: 54; 0 x drops;...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",16588.972,19665.212,23113.592,2661.669,2343.62,-27.383
33,ISOB_2,1,2024-01-28 15:45:42,"[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6...","[28844, 28844, 28844, 28844, 28844, 28844, 288...","[standard, standard, standard, standard, stand...","[16299.718, 16129.411, 15958.874, 15789.783, 1...",...,"[21.88, 21.88, 21.88, 21.88, 21.88, 21.88, 21....","[0.017, 0.017, 0.017, 0.017, 0.017, 0.017, 0.0...",[ Acid: 69.8 (°C); LeakRate: 54; 0 x drops;...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",16299.718,19447.052,22815.434,2646.189,2319.748,-26.236


In [21]:
# group_data['rIntensity 44'].apply(lambda x: x[0] if isinstance(x, list) and len(x) > 0 else None)

In [22]:
# Get the list of intensity columns
intensity_cols = [col for col in group_data.columns if col.startswith("rIntensity")]

# Map the numeric 'Is Ref _' flag to a readable label
is_ref_labels = {0: "Sample", 1: "Reference"}

# Collect one record per intensity column and measurement,
# then build the DataFrame once instead of concatenating inside the loop
records = []
for intensity_column in intensity_cols:
    for identifier, is_ref, time_code, values in zip(group_data['Identifier 1'], group_data['Is Ref _'], group_data["Time Code"], group_data[intensity_column]):
        records.append({
            "Identifier": identifier,
            "Is Ref": is_ref_labels[is_ref],
            "Datetime": time_code,
            "Intensity": intensity_column,
            "Mean": np.mean(values),     # Mean intensity over all cycles
            "Std_Dev": np.std(values),   # Population standard deviation (ddof=0)
        })

intensity_stats = pd.DataFrame(records, columns=["Identifier", "Is Ref", "Datetime", "Intensity", "Mean", "Std_Dev"])

# Plot mean and standard deviation per intensity column
fig = px.scatter(intensity_stats, x="Identifier", y="Mean", color="Intensity",
                 size="Std_Dev", hover_data=["Is Ref", "Datetime", "Std_Dev"],
                 title="Intensity mean and standard deviation",
                 size_max=15
                 )

# Update layout
fig.update_layout(
    xaxis_title="",
    yaxis_title="Intensity [mV]",
    legend_title="Intensity",
    showlegend=True
)

# Update hovertemplate
fig.update_traces(
    hovertemplate=(
        "<b>%{x}</b><br>" +
        "<b>Type</b>: %{customdata[0]}<br>" +
        "<b>Mean</b>: %{y:.2f}<br>" +
        "<b>Std</b>: %{customdata[2]:.2f}<br>" +
        "<b>Datetime</b>: %{customdata[1]}<br>"
    )
)

# Show the plot
fig.show()

### Raw Intensity Linearity

In [23]:
# Missing values?
# group_data.isna().sum()

In [24]:
# Intensity columns except for rIntensity 44
intensity_cols = [col for col in group_data.columns if col.startswith("rIntensity") and col != "rIntensity 44"]

# Generate empty dataframe to store results
df_raw_int_ratio = group_data.copy()

# Generate ratio_columns
ratio_columns = [f"{intensity_column}_ratio" for intensity_column in intensity_cols]

# Calculate intensity ratios
for intensity_column, ratio_column in zip(intensity_cols, ratio_columns):
 
    # Update the existing column with new values
    df_raw_int_ratio[ratio_column] = df_raw_int_ratio.apply(lambda row: [a / b for a, b in zip(row[intensity_column], row['rIntensity 44'])], axis=1)

# Reshape the DataFrame for plotting
df_raw_int_ratio = df_raw_int_ratio.melt(id_vars=['Identifier 1', 'Is Ref _', 'Time Code'],
                          value_vars= [f"{col}_ratio" for col in intensity_cols],
                          var_name='Intensity', value_name='Ratio')

# Flatten the lists in the 'Ratio' column
df_raw_int_ratio = df_raw_int_ratio.explode('Ratio')

# Convert 'Time Code' to datetime format
df_raw_int_ratio["Time Code"] = pd.to_datetime(df_raw_int_ratio["Time Code"])

# Map 'Is Ref' values to 'Sample' and 'Reference'
df_raw_int_ratio['Is Ref _'] = df_raw_int_ratio['Is Ref _'].map({0: "Reference", 1: "Sample"})

# Introduce jitter: shift each timestamp by a random offset of up to
# +/- jitter_seconds so that points sharing a Time Code do not overlap
jitter_seconds = 600  # Maximum jitter in seconds
jitter = pd.to_timedelta(
    np.random.uniform(-jitter_seconds, jitter_seconds, size=len(df_raw_int_ratio)),
    unit="s"
).to_numpy()
df_raw_int_ratio['Jittered Time Code'] = df_raw_int_ratio['Time Code'] + jitter

# Create the scatter plot
fig = px.scatter(df_raw_int_ratio, x='Jittered Time Code', y='Ratio', color='Identifier 1',
               hover_data=['Identifier 1', 'Is Ref _', 'Intensity', 'Ratio'],
               title="Intensity ratios per Identifier",
               opacity=0.5, # Set opacity to 0.5 (50%) to see overlapping points
               size_max=15)

fig.update_layout(
 xaxis_title=" ",
 yaxis_title="Intensity Ratio",
 legend_title="Standard",
 showlegend=True
)

# Update hovertemplate
fig.update_traces(
    hovertemplate=(
        "<b>%{customdata[0]}</b><br>" +
        "<b>Is Ref</b>: %{customdata[1]}<br>" +
        "<b>Intensity</b>: %{customdata[2]}<br>" +
        "<b>Ratio</b>: %{y}<br>" +
        "<extra></extra>"
    )
)

# Show plot
fig.show()

In [25]:
df_raw_int_ratio_new.head()

| | Identifier 1 | Is Ref _ | Time Code | Intensity | Ratio | start_rIntensity 44 | start_rIntensity 45 | start_rIntensity 46 | start_rIntensity 47 | start_rIntensity 48 | start_rIntensity 49 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199522 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 1 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.19952 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 2 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199506 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 3 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199514 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |
| 4 | CHALK | Reference | 2024-01-28 16:23:26 | rIntensity 45_ratio | 1.199511 | 15091.874 | 18103.033 | 21392.983 | 2494.442 | 2206.276 | -23.128 |


### Raw intensity ratio plots

In [26]:
# Get unique intensity values
unique_intensities = df_raw_int_ratio_new['Intensity'].unique()

# Generate a plot for each unique intensity
for intensity in unique_intensities:
    # Filter dataframe for the current intensity
    df_filtered = df_raw_int_ratio_new[df_raw_int_ratio_new['Intensity'] == intensity]
    
    # Create a new figure
    fig = go.Figure()

    # Add a scatter trace for each unique identifier
    for identifier in df_filtered['Identifier 1'].unique():
        df_identifier = df_filtered[df_filtered['Identifier 1'] == identifier]

        # Get the color and marker for the identifier
        color, marker = standard_marker_color(identifier)

        fig.add_trace(go.Scatter(
            x=df_identifier.index,
            y=df_identifier['Ratio'],
            mode='markers',
            # mode='lines+markers',
            name=identifier,
            marker=dict(
                color=color,# standard_marker_color(identifier)[0],  # Set color
                symbol= marker# standard_marker_color(identifier)[1]  # Set marker style
            ),
            customdata=np.stack((df_identifier['Identifier 1'], 
                                 df_identifier['Is Ref _'], 
                                 df_identifier['Intensity'], 
                                 np.datetime_as_string(df_identifier['Time Code'], unit='s')), 
                                 axis=-1),
            hovertemplate="<b>Standard</b>: %{customdata[0]}<br>" +
                          "<b>Is Ref</b>: %{customdata[1]}<br>" +
                          "<b>Intensity</b>: %{customdata[2]}<br>" +
                          "<b>Ratio</b>: %{y}<br>" +
                          "<b>Datetime</b>: %{customdata[3]}<br>" +
                          "<extra></extra>"
        ))

    fig.update_layout(
        title=f"Intensity ratios for Intensity: {intensity}",
        xaxis_title="Cycles",
        yaxis_title="Intensity Ratio",
        legend_title="Standard",
        showlegend=True
    )
    # Show the plot
    fig.show()

In [27]:
len(unique_intensities) // 2 + 1

3

In [28]:
df = df_raw_int_ratio_new.copy()
# Get unique intensity values
unique_intensities = df['Intensity'].unique()

# Determine the number of rows required for subplots based on the number of unique intensities
subplot_rows = len(unique_intensities)

# Create a subplot figure with one column and a row for each unique intensity
fig = make_subplots(rows=subplot_rows, cols=1, shared_xaxes=False, vertical_spacing=0.075)

# Row counter for adding traces to the correct subplot
row = 1

# Generate a plot for each unique intensity
for idx, intensity in enumerate(unique_intensities, start=1):
    # Filter dataframe for the current intensity
    df_filtered = df[df['Intensity'] == intensity]
    
    # Show legend entries only for the traces of the central subplot,
    # so that each standard appears once in the legend
    showlegend = (idx == len(unique_intensities) // 2 + 1)

    # Add a scatter trace for each unique identifier
    for identifier in df_filtered['Identifier 1'].unique():
        df_identifier = df_filtered[df_filtered['Identifier 1'] == identifier]

        # Get the color and marker for the identifier
        color, marker = standard_marker_color(identifier)

        fig.add_trace(go.Scatter(
            x=df_identifier.index,
            y=df_identifier['Ratio'],
            mode='markers',
            # mode='lines+markers',
            name=identifier,
            showlegend=showlegend,  # Only show legend for the central trace
            marker=dict(
                color=color,# standard_marker_color(identifier)[0],  # Set color
                symbol= marker# standard_marker_color(identifier)[1]  # Set marker style
            ),
            
            customdata=np.stack((df_identifier['Identifier 1'], 
                                df_identifier['Is Ref _'], 
                                df_identifier['Intensity'], 
                                np.datetime_as_string(df_identifier['Time Code'], unit='s')), 
                                axis=-1),
            hovertemplate="<b>Standard</b>: %{customdata[0]}<br>" +
                        "<b>Is Ref</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Ratio</b>: %{y}<br>" +
                        "<b>Datetime</b>: %{customdata[3]}<br>" +
                        "<extra></extra>"
        ),
        row=row, col=1)

    fig.update_xaxes(title_text="Cycles", row=row, col=1)
    fig.update_yaxes(title_text=f"{intensity}", row=row, col=1)
    row += 1

fig.update_layout(
    # autosize=False,
    height=300 * subplot_rows,
    title="Intensity Ratios by Intensity and Standard",
    legend_title="Standard",
    showlegend=True,
)
# fig.show()

### Raw Intensity Linearity fit
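
The cell below fits a straight line to the intensity ratio as a function of cycle number for every combination of standard, measurement type, and intensity, using `stats.linregress`. As a rough sketch of what slope, intercept, and R² mean, the closed-form least-squares formulas can be written in plain Python. The function `linear_fit` and the toy ratios are hypothetical; this is not the notebook's implementation.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R²."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R² equals the squared Pearson correlation
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared

# Toy example: a ratio that drops by 0.01 per cycle (hypothetical values)
cycles = [0, 1, 2, 3, 4]
ratios = [1.20, 1.19, 1.18, 1.17, 1.16]
slope, intercept, r_squared = linear_fit(cycles, ratios)
print(f"slope={slope:.2e}, intercept={intercept:.2e}, R²={r_squared:.2e}")
```

A slope close to zero means the ratio stays stable while the signal decays over the cycles, which is the linearity behavior the subplots are meant to reveal.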

In [29]:
# Create a 6x1 subplot layout
fig = make_subplots(rows=6, cols=1)

# Placeholder to store slope and r-squared values
identifier_stats = {}

# Create a color mapping based on unique identifiers
unique_identifiers = df_raw_int_ratio_new['Identifier 1'].unique()

# Placeholder for table data
table_data = {'Identifier': [], 'Is Ref': [], 'Intensity': [], 'Slope': [], 'Intercept': [], 'R²': []}

for identifier in df_raw_int_ratio_new['Identifier 1'].unique():
    identifier_stats[identifier] = []
    for is_ref in df_raw_int_ratio_new['Is Ref _'].unique():
        for intensity in df_raw_int_ratio_new['Intensity'].unique():
            # Filter the dataframe for each identifier, Is Ref, and Intensity
            id_ref_intensity_group = df_raw_int_ratio_new[(df_raw_int_ratio_new['Identifier 1'] == identifier) & 
                                                          (df_raw_int_ratio_new['Is Ref _'] == is_ref) &
                                                           (df_raw_int_ratio_new['Intensity'] == intensity)]

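            # Skip combinations with too few points for a regression
            if len(id_ref_intensity_group) < 2:
                continue

            # Extract x and y values for regression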
            x = id_ref_intensity_group.index.values
            y = id_ref_intensity_group['Ratio'].astype(float)  # Ensure that ratios are floats for regression

            # Perform linear regression
            slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

            # R² value
            r_squared = r_value ** 2

            # Store identifier, is_ref, intensity, slope, and r-squared in the dictionary
            identifier_stats[identifier].append({'is_ref': is_ref, 'intensity': intensity, 'slope': slope, 'intercept': intercept, 'R²': r_squared})

            # Add the fit results to the table data
            table_data['Identifier'].append(identifier) 
            table_data['Is Ref'].append(is_ref)
            table_data['Intensity'].append(intensity)
            table_data['Slope'].append(f'{slope:.2e}')  # Format in scientific notation
            table_data['Intercept'].append(f'{intercept:.2e}')  # Format in scientific notation
            table_data['R²'].append(f'{r_squared:.2e}')  # Format in scientific notation

            # Get the color and marker for the identifier
            color, marker = standard_marker_color(identifier)

            # Custom data for fig hovertemplate
            custom_data = [[identifier, is_ref, intensity, slope, intercept, r_squared, 
                            np.datetime_as_string(id_ref_intensity_group['Time Code'].values[0], unit='s'),
                            ]]

            # Add scatterplot points for slope
            fig.add_trace(
                go.Scatter(
                    y=[identifier], 
                    x=[slope],
                    mode='markers',
                    marker=dict(
                    color=color, # standard_marker_color(identifier)[0],  # Set color
                    symbol= marker # standard_marker_color(identifier)[1]  # Set marker style
                    ),
                    customdata=custom_data, # [[identifier, is_ref, intensity, slope, intercept, r_squared, id_ref_intensity_group['Time Code'].values[0]]],
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=1, col=1
            )
            # Add scatterplot points for intercept
            fig.add_trace(
                go.Scatter(
                    y=[identifier],    
                    x=[intercept],
                    marker=dict(
                    color=color,  # Set color
                    symbol=marker  # Set marker style
                    ),
                    customdata=custom_data,
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=2, col=1
            )
            # Add scatterplot points for R²
            fig.add_trace(
                go.Scatter(
                    y=[identifier],
                    x=[r_squared],
                    marker=dict(
                    color=color,  # Set color
                    symbol=marker  # Set marker style
                    ),
                    customdata=custom_data,
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=3, col=1
            )
            fig.add_trace(
                go.Scatter(
                    y=[slope], 
                    x=id_ref_intensity_group['start_rIntensity 44'],
                    mode='markers',
                    marker=dict(
                        color=color,  # Set color
                        symbol=marker  # Set marker style
                    ),
                    customdata=custom_data,  # Fields: identifier, measurement type, intensity, slope, intercept, R², datetime
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=4, col=1
            )
            fig.add_trace(
                go.Scatter(
                    y=[intercept], 
                    x=id_ref_intensity_group['start_rIntensity 44'],
                    mode='markers',
                    marker=dict(
                        color=color,  # Set color
                        symbol=marker  # Set marker style
                    ),
                    customdata=custom_data,  # Fields: identifier, measurement type, intensity, slope, intercept, R², datetime
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=5, col=1
            )
            fig.add_trace(
                go.Scatter(
                    y=[r_squared], 
                    x=id_ref_intensity_group['start_rIntensity 44'],
                    mode='markers',
                    marker=dict(
                        color=color,  # Set color
                        symbol=marker  # Set marker style
                    ),
                    customdata=custom_data,  # Fields: identifier, measurement type, intensity, slope, intercept, R², datetime
                    hovertemplate=(
                        "<b>Identifier</b>: %{customdata[0]}<br>" +
                        "<b>Measurement type</b>: %{customdata[1]}<br>" +
                        "<b>Intensity</b>: %{customdata[2]}<br>" +
                        "<b>Slope</b>: %{customdata[3]:.2e}<br>" +  # Use scientific notation
                        "<b>Intercept</b>: %{customdata[4]:.2e}<br>" +  # Use scientific notation
                        "<b>R²</b>: %{customdata[5]:.2e}<br>" +  # Use scientific notation
                        "<b>Datetime</b>: %{customdata[6]}<br>" +  # Add Time Code here
                        "<extra></extra>"
                    )
                ),
                row=6, col=1
            )


# Update xaxis and yaxis properties if needed
fig.update_yaxes(title_text="", type='category', row=1, col=1)
fig.update_xaxes(title_text="Slope", row=1, col=1, tickformat='.1e')  # Use scientific notation
fig.update_yaxes(title_text="", type='category', row=2, col=1)
fig.update_xaxes(title_text="Intercept", row=2, col=1, tickformat='.1e')  # Use scientific notation
fig.update_yaxes(title_text="", type='category', row=3, col=1)
fig.update_xaxes(title_text="R²", row=3, col=1, tickformat='.1e')  # Use scientific notation
fig.update_yaxes(title_text="Slope", row=4, col=1, tickformat='.1e')
fig.update_xaxes(title_text="Starting intensity 44 [mV]", row=4)
fig.update_yaxes(title_text="Intercept", row=5, col=1, tickformat='.1e')
fig.update_xaxes(title_text="Starting intensity 44 [mV]", row=5)
fig.update_yaxes(title_text="R²", row=6, col=1, tickformat='.1e')
fig.update_xaxes(title_text="Starting intensity 44 [mV]", row=6)

# Update layout if necessary, e.g., adding titles, adjusting height, etc.
fig.update_layout(height=2500, 
                  title_text="Intensity cycles linear regression slope, intercept and R²",
                  showlegend=False)

# Show the figure
fig.show()

# Create a color mapping based on unique identifiers
color_mapping = {identifier: standard_marker_color(identifier)[0] for identifier in table_data['Identifier']}

# Create color arrays for cells
identifier_colors = [color_mapping.get(identifier, 'white') for identifier in table_data['Identifier']]

# Create 2D color array
cells_fill_color = [identifier_colors] * (len(table_data) - 1)

# Create table
fig_table = go.Figure(data=[go.Table(
    header=dict(values=list(table_data.keys()),
                align='left'),
    cells=dict(values=list(table_data.values()),
               # fill_color=cells_fill_color,
               align='left')
)])

fig_table.show()
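
The commented-out `fill_color` argument suggests that the table cells were meant to be shaded by identifier. The following is a minimal sketch of one way to build that color array, assuming that `go.Table` expects `cells.fill_color` as one list of row colors per column. The helper name `column_fill_colors` and the sample data are hypothetical and only serve the illustration.

```python
def column_fill_colors(table_data, color_mapping, key="Identifier", default="white"):
    """Return one list of row colors per table column, keyed on the identifier column."""
    # One color per row, looked up from the identifier in that row
    row_colors = [color_mapping.get(identifier, default) for identifier in table_data[key]]
    # Repeat the row colors once for every column in the table
    return [list(row_colors) for _ in table_data]


# Hypothetical sample data, for illustration only
table_data = {
    "Identifier": ["STD-A", "STD-B", "STD-A"],
    "Slope": [1.2e-3, 3.4e-3, 1.1e-3],
}
color_mapping = {"STD-A": "lightblue", "STD-B": "lightgreen"}

cells_fill_color = column_fill_colors(table_data, color_mapping)
print(cells_fill_color)
```

The result could then be passed as `fill_color=cells_fill_color` inside `cells=dict(...)`. Identifiers missing from the mapping fall back to the default color.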

### Raw intensity plots

In [30]:
# Create an empty figure
fig = go.Figure()

# Iterate through the rows of the DataFrame
for identifier, is_ref, time_code, values in zip(group_data['Identifier 1'], group_data['Is Ref _'], group_data["Time Code"], group_data['rIntensity 44']):
    
    # Build the legend group and label for this row
    legend_group = f'Is Ref _ {is_ref}'
    legend_name = 'Reference' if is_ref == 1 else 'Sample'
    # Create a line plot for each row
    fig.add_trace(go.Scatter(x=list(range(len(values))), y=values,
                             mode='lines+markers',
                             name=f'{identifier} - {legend_name}',
                             legendgroup=legend_group,
                             text=[f'Value: {v:.2f}' for v in values],
                             hoverinfo='text',
                             hovertemplate= f"<b>Identifier</b>: {identifier}<br>" +
                                     f"<b>Measurement type</b>: {legend_name}<br>" +
                                     "<b>Intensity [mV]</b>: %{y:.2f}<br>" +
                                     "<b>Cycle</b>: %{x}<br>" +
                                     f"<b>Datetime</b>: {time_code}<br>" +
                                     "<extra></extra>" # Hide the extra hover info

                             ))

# Customize the layout of the plot
fig.update_layout(title='Raw Intensities m44',
                  xaxis_title='Cycle',
                  yaxis_title='Raw intensity m44',
                  showlegend=False)

# Show the plot
fig.show()
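
The hover template above mixes two kinds of placeholders: `{identifier}`, `{legend_name}` and `{time_code}` are filled in by Python when the f-string is evaluated, while `%{y:.2f}` and `%{x}` are left in the string for Plotly to fill in per point. The sketch below isolates that pattern in a small helper. The function name `build_hovertemplate` and the sample values are hypothetical.

```python
def build_hovertemplate(identifier, legend_name, time_code):
    """Combine Python-side values with Plotly-side placeholders in one template."""
    return (
        f"<b>Identifier</b>: {identifier}<br>"            # filled in by Python
        f"<b>Measurement type</b>: {legend_name}<br>"     # filled in by Python
        "<b>Intensity [mV]</b>: %{y:.2f}<br>"             # plain string, left for Plotly
        "<b>Cycle</b>: %{x}<br>"                          # plain string, left for Plotly
        f"<b>Datetime</b>: {time_code}<br>"               # filled in by Python
        "<extra></extra>"                                 # hide the secondary hover box
    )


# Hypothetical sample values, for illustration only
template = build_hovertemplate("STD-A", "Reference", "2023-01-01 12:00:00")
print(template)
```

Keeping the Plotly placeholders in plain (non-f) string literals avoids having to double the braces. A helper like this could replace the repeated template in each `go.Scatter` call.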

In [31]:
# Get unique identifiers
unique_identifiers = group_data['Identifier 1'].unique()

# Get a list of all intensity related columns you'd use for plotting
intensity_columns = [col for col in group_data.columns if col.startswith('rIntensity')]

# Iterate through intensities
for intensity in intensity_columns:
    # Create a separate figure for each unique identifier
    for identifier in unique_identifiers:
        # Filter the DataFrame for the current identifier (the intensity column is selected below)
        subset_df = group_data[(group_data['Identifier 1'] == identifier)]

        # Create an empty figure
        fig = go.Figure()

        # Iterate through the rows of the subset DataFrame
        for is_ref, values, time_code in zip(subset_df['Is Ref _'], subset_df[intensity], subset_df['Time Code']):
            # Build the legend group and label for this row
            legend_group = f'Is Ref _ {is_ref}'
            legend_name = 'Reference' if is_ref == 1 else 'Sample'

            # Get the color and marker for the identifier
            line_color = 'darkgrey' if is_ref == 1 else '#FFAA33' # Set line color based on Is Ref
            color, marker = standard_marker_color(identifier)

            # Create a line plot for each row
            fig.add_trace(go.Scatter(x=list(range(len(values))), y=values,
                                     mode='lines+markers',
                                     name=f'{identifier} - {legend_name}',
                                     legendgroup=legend_group,
                                     text=[f'Value: {v:.2f}' for v in values],
                                     hoverinfo='text',
                                     marker=dict(color=color, symbol=marker), # Use the color and marker
                                     line=dict(color=line_color), # Set line color
                                     hovertemplate=f"<b>Identifier</b>: {identifier}<br>" +
                                                   f"<b>Measurement type</b>: {legend_name}<br>" +
                                                   "<b>Intensity [mV]</b>: %{y:.2f}<br>" +
                                                   "<b>Cycle</b>: %{x}<br>" +
                                                   f"<b>Datetime</b>: {time_code}<br>" +
                                                   "<extra></extra>",  # Hide the extra hover information
                                     ))

        # Customize the layout of the plot
        fig.update_layout(title=f'Raw Intensity - Standard: {identifier}, Intensity: {intensity}',
                          xaxis_title='Cycle',
                          yaxis_title='Raw Intensity [mV]',
                          showlegend=False)

        # Show the plot
        # fig.show()

In [32]:
def create_figures(group_data):
    # Get unique identifiers
    unique_identifiers = group_data['Identifier 1'].unique()

    # Get a list of all intensity related columns you'd use for plotting
    intensity_columns = [col for col in group_data.columns if col.startswith('rIntensity')]

    # Calculate the total number of subplots
    total_subplots = len(unique_identifiers) * len(intensity_columns)

    # Create an empty figure with the correct number of subplots
    fig = make_subplots(rows=total_subplots, cols=1)

    # Counter for the current subplot
    subplot_counter = 1

    # Iterate through intensities
    for intensity in intensity_columns:
        # Create a separate subplot for each unique identifier
        for identifier in unique_identifiers:
            # Filter the DataFrame for the current identifier (the intensity column is selected below)
            subset_df = group_data[(group_data['Identifier 1'] == identifier)]

            # Iterate through the rows of the subset DataFrame
            for is_ref, values, time_code in zip(subset_df['Is Ref _'], subset_df[intensity], subset_df['Time Code']):
                # Build the legend group and label for this row
                legend_group = f'Is Ref _ {is_ref}'
                legend_name = 'Reference' if is_ref == 1 else 'Sample'

                # Get the color and marker for the identifier
                line_color = 'black' if is_ref == 1 else '#FFAA33' # Set line color based on Is Ref
                color, marker = standard_marker_color(identifier)

                # Create a line plot for each row
                fig.add_trace(go.Scatter(x=list(range(len(values))), y=values,
                                         mode='lines+markers',
                                         name=f'{identifier} - {legend_name}',
                                         legendgroup=legend_group,
                                         text=[f'Value: {v:.2f}' for v in values],
                                         hoverinfo='text',
                                         marker=dict(color=color, symbol=marker), # Use the color and marker
                                         line=dict(color=line_color), # Set line color
                                         hovertemplate=f"<b>Identifier</b>: {identifier}<br>" +
                                                       f"<b>Measurement type</b>: {legend_name}<br>" +
                                                       "<b>Intensity [mV]</b>: %{y:.2f}<br>" +
                                                       "<b>Cycle</b>: %{x}<br>" +
                                                       f"<b>Datetime</b>: {time_code}<br>" +
                                                       "<extra></extra>",  # Hide the extra hover information
                                         ), row=subplot_counter, col=1)

            # Customize the layout of the subplot
            fig.update_xaxes(title_text='Cycle', row=subplot_counter, col=1)
            fig.update_yaxes(title_text=f'{intensity} - {identifier} [mV]', row=subplot_counter, col=1)
        
            # Increment the subplot counter
            subplot_counter += 1
    
    fig.update_layout(
        # autosize=False,
        height=500 * total_subplots,
        title="Raw Intensity and Standard",
        legend_title="Standard",
        showlegend=False,
    )
    # Return the figure
    return fig
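
`create_figures` assigns one subplot row to each (intensity, identifier) pair by incrementing `subplot_counter` in the inner loop. As a sketch of the same ordering without a manual counter, the row index can be derived with `itertools.product` and `enumerate`. The helper name `subplot_rows` and the sample values are hypothetical.

```python
from itertools import product


def subplot_rows(intensity_columns, identifiers):
    """Map each (intensity, identifier) pair to its 1-based subplot row."""
    # product() iterates intensities in the outer loop and identifiers in the
    # inner loop, which matches the nesting order used in create_figures
    pairs = product(intensity_columns, identifiers)
    return {pair: row for row, pair in enumerate(pairs, start=1)}


# Hypothetical sample values, for illustration only
rows = subplot_rows(["rIntensity 44", "rIntensity 45"], ["STD-A", "STD-B"])
for (intensity, identifier), row in rows.items():
    print(row, intensity, identifier)
```

The number of entries in the mapping equals `total_subplots`, so it could also be used to size the `make_subplots` grid.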

In [33]:
# Build the combined figure; uncomment the last line to display it
figures = create_figures(group_data)
# figures.show()