# AS7341 9.18.23 Dilution Tests
This is an analysis of the AS7341 9.18.23 dilution tests. This file contains the code for each analysis performed on the 512x gain dilution tests, along with the means and ranges of the tests and the RMSE and R-squared values.

## Mean Values
The mean value of each dilution was taken across the 5 trials.

In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy.stats import linregress

from sklearn.metrics import mean_squared_error, r2_score
# Replace with the actual folder path containing your CSV files
folder_path = "/Users/jessiewynne/chla_fluorometer/AS7241 Dillutions 9.18.23 /"

# Initialize empty lists to store data from multiple CSV files
data_points = []

# Create lists to store legend labels and colors
legend_labels = []
colors = plt.cm.viridis(np.linspace(0, 1, len(os.listdir(folder_path))))

# Create a dictionary to store the mean values for each test category
mean_values = {}
min_values = {}
max_values = {}

# Create a dictionary to store the range values for each test category
range_values = {}

for filename, color in zip(os.listdir(folder_path), colors):
    if filename.endswith(".csv"):
        file_path = os.path.join(folder_path, filename)
        
        # Read data from the current CSV file
        df = pd.read_csv(file_path, encoding='utf-8')

        # Filter out rows where the 'Test' column is 'test'
        df = df[df['Test'].str.lower() != 'test']

        # Convert 'F8 (Raw)' column to numeric values
        df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')

        # Get unique test categories (dilution values) in the order of appearance
        categories = df['Test'].unique()

        # Compute the mean value for each test category in the current CSV file
        for category in categories:
            # Exclude the first data point from each category
            category_df = df[df['Test'] == category][1:]
            category_f8_raw_mean = category_df['F8 (Raw)'].mean()
            
            # Check if the 'Test' value can be converted to a float, otherwise, skip it
            try:
                x_value = float(category)
                
                # Store the mean value in the dictionary
                if category not in mean_values:
                    mean_values[category] = []
                mean_values[category].append(category_f8_raw_mean)
                
                legend_labels.append(filename)  # Add legend label corresponding to the CSV file
            except ValueError:
                pass
# Calculate the mean value across all CSV files for each test category
for category, mean_list in mean_values.items():
    mean_values[category] = np.mean(mean_list)

# Create a scatter plot for the mean F8 values
x_values = [float(category) for category in mean_values.keys()]
y_values = list(mean_values.values())

# Calculate the range of values for each test category across all CSV files
for category in mean_values.keys():
    # Get all the values for this category from all CSV files
    all_values = []
    for filename in os.listdir(folder_path):
        if filename.endswith(".csv"):
            file_path = os.path.join(folder_path, filename)
            df = pd.read_csv(file_path, encoding='utf-8')
            df = df[df['Test'].str.lower() != 'test']
            
            # Convert 'F8 (Raw)' column to numeric values
            df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
            
            category_df = df[df['Test'] == category][1:]  # Exclude the first data point
            category_values = category_df['F8 (Raw)'].dropna().tolist()
            all_values.extend(category_values)
    
    # Calculate the range and add it to the range_values dictionary
    range_values[category] = np.max(all_values) - np.min(all_values)
# Sort the range_values dictionary by range (ascending order)
sorted_range_values = dict(sorted(range_values.items(), key=lambda item: item[1]))

# Print each category's mean once, after all ranges are computed;
# mean_values already holds one averaged value per category
for category, mean_value in mean_values.items():
    print(f"{category} - Mean Values: {mean_value:.4f}")

print()

0.0 - Mean Values: 16.9380
0.25 - Mean Values: 18.7622
0.50 - Mean Values: 15.2069
0.75 - Mean Values: 24.1620
1.0 - Mean Values: 25.5368
2.0 - Mean Values: 29.9222
4.0 - Mean Values: 46.3667
6.0 - Mean Values: 67.9944
8.0 - Mean Values: 70.8315
10.0 - Mean Values: 81.5222
20.0 - Mean Values: 165.8222
30.0 - Mean Values: 229.4556
40.0 - Mean Values: 304.5222
50.0 - Mean Values: 362.1050

## Mean Analysis 
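According to the means of each dilution across the 5 trials, the sensitivity of the chlorophyll sensor is 0.75 ug/L. The mean at 0.50 ug/L (15.21) is lower than the mean of the 0.0 ug/L blank (16.94), whereas from 0.75 ug/L upward the mean readings stay above the blank and increase with concentration. This is less sensitive than some of the individual 512x gain tests with 700 ms of integration.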
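One way to make the 0.75 ug/L figure reproducible is to state the criterion explicitly. The sketch below assumes a criterion that is not spelled out in the notebook: the sensitivity is the lowest concentration from which every mean reading exceeds all means at lower concentrations. The function name `lowest_resolved_concentration` and the `means` dictionary are illustrative; the values are the means printed above.

```python
# Mean F8 readings printed above, keyed by concentration in ug/L.
means = {
    0.0: 16.9380, 0.25: 18.7622, 0.50: 15.2069, 0.75: 24.1620,
    1.0: 25.5368, 2.0: 29.9222, 4.0: 46.3667, 6.0: 67.9944,
    8.0: 70.8315, 10.0: 81.5222, 20.0: 165.8222, 30.0: 229.4556,
    40.0: 304.5222, 50.0: 362.1050,
}

def lowest_resolved_concentration(means):
    """Return the lowest concentration from which each mean exceeds
    every mean measured at a lower concentration (assumed criterion)."""
    concs = sorted(means)
    values = [means[c] for c in concs]
    resolved_from = None
    for i in range(1, len(concs)):
        if values[i] > max(values[:i]):
            # Still rising above everything seen so far.
            if resolved_from is None:
                resolved_from = concs[i]
        else:
            # A dip below an earlier mean resets the candidate.
            resolved_from = None
    return resolved_from

print(lowest_resolved_concentration(means))
```

Under this assumed criterion the dip at 0.50 ug/L resets the candidate, so the first concentration that holds is 0.75 ug/L. A different criterion (for example, one based on the spread of the blank) could give a different figure.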

## Range Values

In [6]:
# Print the range for each test category in the same order as the means
for category in mean_values:
    print(f"{category} - Range Values: {sorted_range_values[category]}")

0.0 - Range Values: 15
0.25 - Range Values: 29
0.50 - Range Values: 14
0.75 - Range Values: 19
1.0 - Range Values: 23
2.0 - Range Values: 27
4.0 - Range Values: 38
6.0 - Range Values: 59
8.0 - Range Values: 28
10.0 - Range Values: 31
20.0 - Range Values: 82
30.0 - Range Values: 85
40.0 - Range Values: 117
50.0 - Range Values: 163


## Range Analysis 
There is a rather large range within each dilution over the 5 trials. At 0.25 ug/L, for example, the range (29) is larger than the mean reading (18.76). This could indicate a repeatability issue in the sensor measurements.
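To compare the spread across dilutions with very different signal levels, the range can be expressed as a fraction of the mean. This is a minimal sketch using the means and ranges printed above; the name `relative_range` is illustrative and the ratio itself is not a metric used elsewhere in this notebook.

```python
# Means and ranges printed above, keyed by concentration in ug/L.
means = {
    0.0: 16.9380, 0.25: 18.7622, 0.50: 15.2069, 0.75: 24.1620,
    1.0: 25.5368, 2.0: 29.9222, 4.0: 46.3667, 6.0: 67.9944,
    8.0: 70.8315, 10.0: 81.5222, 20.0: 165.8222, 30.0: 229.4556,
    40.0: 304.5222, 50.0: 362.1050,
}
ranges = {
    0.0: 15, 0.25: 29, 0.50: 14, 0.75: 19, 1.0: 23, 2.0: 27, 4.0: 38,
    6.0: 59, 8.0: 28, 10.0: 31, 20.0: 82, 30.0: 85, 40.0: 117, 50.0: 163,
}

# Range divided by mean: values near or above 1 mean the spread
# is as large as the signal itself.
relative_range = {c: ranges[c] / means[c] for c in means}

for conc, ratio in relative_range.items():
    print(f"{conc} ug/L - range / mean: {ratio:.2f}")
```

The ratio is largest at 0.25 ug/L, where the range exceeds the mean, and stays above roughly a third of the mean even at the highest concentrations.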

## RMSE Values 
RMSE of the linear fit to the mean values, calculated across the 5 trials.

In [14]:
# Calculate the line of best fit parameters (slope and intercept)
slope, intercept, r_value, p_value, std_err = linregress(x_values, y_values)

# Create a function to calculate predicted values using the line of best fit equation
def predict_values(x, slope, intercept):
    return slope * x + intercept

# Calculate R-squared and RMSE
y_predicted = predict_values(np.array(x_values), slope, intercept)
r_squared = r2_score(y_values, y_predicted)
rmse = np.sqrt(mean_squared_error(y_values, y_predicted))
print(rmse)

4.624547095520336


## RMSE Analysis 
The calculated RMSE (about 4.62) was high. This means that the predictions of the linear model are not very accurate. This could be due to the large range of readings at each dilution between the trials.
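The RMSE is in raw F8 units, so it can help to convert it into concentration units by dividing by the slope of the fit. The sketch below is an illustration, not part of the original analysis: it refits the means printed above with `np.polyfit` (swapped in here for `linregress` so the block needs only NumPy), and the name `rmse_as_concentration` is an assumption.

```python
import numpy as np

# Concentrations (ug/L) and mean F8 readings printed above.
conc = np.array([0.0, 0.25, 0.50, 0.75, 1.0, 2.0, 4.0,
                 6.0, 8.0, 10.0, 20.0, 30.0, 40.0, 50.0])
mean_f8 = np.array([16.9380, 18.7622, 15.2069, 24.1620, 25.5368,
                    29.9222, 46.3667, 67.9944, 70.8315, 81.5222,
                    165.8222, 229.4556, 304.5222, 362.1050])

# Least-squares line through the means (degree-1 polynomial).
slope, intercept = np.polyfit(conc, mean_f8, 1)

# RMSE is the square root of the mean squared residual.
residuals = mean_f8 - (slope * conc + intercept)
rmse = np.sqrt(np.mean(residuals ** 2))

# Dividing by the slope expresses the same error in ug/L.
rmse_as_concentration = rmse / slope

print(f"RMSE: {rmse:.4f} raw units")
print(f"RMSE / slope: {rmse_as_concentration:.2f} ug/L")
```

Because the inputs are the rounded means from the output above, the RMSE should closely match the value printed by the notebook. Expressed in ug/L, the error is of the same order as the lowest dilutions tested, which is one way to read "high" here.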

## R-Squared

In [15]:
r_squared = r2_score(y_values, y_predicted)
print(r_squared)

0.9982728869563973


## R-Squared Analysis
The R-squared across all 5 trials is high (about 0.998). This indicates that changes in chlorophyll concentration correspond closely to changes in the AS7341 values.
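For a straight-line fit with an intercept, R-squared equals the square of the Pearson correlation coefficient, so the `r_value` returned by `linregress` carries the same information as `r2_score`. The sketch below checks this on the means printed above using only NumPy; the variable names are illustrative.

```python
import numpy as np

# Concentrations (ug/L) and mean F8 readings printed above.
conc = np.array([0.0, 0.25, 0.50, 0.75, 1.0, 2.0, 4.0,
                 6.0, 8.0, 10.0, 20.0, 30.0, 40.0, 50.0])
mean_f8 = np.array([16.9380, 18.7622, 15.2069, 24.1620, 25.5368,
                    29.9222, 46.3667, 67.9944, 70.8315, 81.5222,
                    165.8222, 229.4556, 304.5222, 362.1050])

slope, intercept = np.polyfit(conc, mean_f8, 1)
predicted = slope * conc + intercept

# R-squared from its definition: 1 - SS_res / SS_tot.
ss_res = np.sum((mean_f8 - predicted) ** 2)
ss_tot = np.sum((mean_f8 - mean_f8.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between concentration and reading.
pearson_r = np.corrcoef(conc, mean_f8)[0, 1]

print(f"R-squared: {r_squared:.6f}")
print(f"Pearson r squared: {pearson_r ** 2:.6f}")
```

Note that this R-squared is computed on the per-dilution means, so it describes how linear the averaged response is and does not reflect the trial-to-trial spread within each dilution.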

## Conclusion
There is a large range of AS7341 values in each dilution across the trials. This is indicative of poor repeatability in the sensor. The high RMSE also points to poor repeatability, as the predictions of the linear model are not very accurate.

## Next Steps
1. Repeat the tests to re-evaluate the repeatability of the sensor.
2. Perform dilutions from 0.25 ug/L to 10.0 ug/L to evaluate the sensor on small amounts of chlorophyll.