# AS7341 8.22.23 Dilution Tests
This is an analysis of the AS7341 8.22.23 dilution tests. This file contains the code for each analysis performed on the 128x Gain, 256x Gain, and 512x Gain dilution tests, complete with confidence intervals, uncertainty percentages, RMSE, and R2 values.

## 95% Confidence Intervals

### 512x Gain, 700ms Integration Confidence Interval

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress, t
from sklearn.metrics import r2_score, mean_squared_error
import math

# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/512x_700ms_2.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')

# Get unique test categories (dilution values) in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category (dilution)
category_f8_raw_dict = {}

# Create dictionaries to store confidence intervals and ranges of uncertainty
confidence_intervals = {}

# Calculate the mean and standard deviation for the entire dataset for 'F8 (Raw)' data
dataset_f8_raw_mean = df['F8 (Raw)'].mean()
dataset_f8_raw_std = df['F8 (Raw)'].std()

# Calculate the sample size for the entire dataset
dataset_sample_size = len(df['F8 (Raw)'])

# Calculate the standard error for the entire dataset's mean
dataset_standard_error = dataset_f8_raw_std / np.sqrt(dataset_sample_size)

# Calculate the margin of error using the t-distribution
confidence_level = 0.95
margin_of_error = t.ppf((1 + confidence_level) / 2, dataset_sample_size - 1) * dataset_standard_error

# Calculate the confidence interval for the entire dataset's mean
confidence_interval = (dataset_f8_raw_mean - margin_of_error, dataset_f8_raw_mean + margin_of_error)
# Calculate the range of uncertainty (95% CI width) for the entire dataset
uncertainty_range = confidence_interval[1] - confidence_interval[0]

# Print the confidence interval and range of uncertainty for the entire dataset
print("\nEntire Dataset:")
print(f"95% Confidence Interval for F8 (Raw): {confidence_interval[0]:.4f} to {confidence_interval[1]:.4f}")
print(f"Range of Uncertainty (95% CI Width) for Entire Dataset: {uncertainty_range:.4f}")
print()


Entire Dataset:
95% Confidence Interval for F8 (Raw): 75.1315 to 96.9403
Range of Uncertainty (95% CI Width) for Entire Dataset: 21.8088



### 256x Gain, 700ms Integration Confidence Interval

In [3]:
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/256x_700ms.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')

# Get unique test categories (dilution values) in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category (dilution)
category_f8_raw_dict = {}

# Create dictionaries to store confidence intervals and ranges of uncertainty
confidence_intervals = {}

# Calculate the mean and standard deviation for the entire dataset for 'F8 (Raw)' data
dataset_f8_raw_mean = df['F8 (Raw)'].mean()
dataset_f8_raw_std = df['F8 (Raw)'].std()

# Calculate the sample size for the entire dataset
dataset_sample_size = len(df['F8 (Raw)'])

# Calculate the standard error for the entire dataset's mean
dataset_standard_error = dataset_f8_raw_std / np.sqrt(dataset_sample_size)

# Calculate the margin of error using the t-distribution
confidence_level = 0.95
margin_of_error = t.ppf((1 + confidence_level) / 2, dataset_sample_size - 1) * dataset_standard_error

# Calculate the confidence interval for the entire dataset's mean
confidence_interval = (dataset_f8_raw_mean - margin_of_error, dataset_f8_raw_mean + margin_of_error)
# Calculate the range of uncertainty (95% CI width) for the entire dataset
uncertainty_range = confidence_interval[1] - confidence_interval[0]

# Print the confidence interval and range of uncertainty for the entire dataset
print("\nEntire Dataset:")
print(f"95% Confidence Interval for F8 (Raw): {confidence_interval[0]:.4f} to {confidence_interval[1]:.4f}")
print(f"Range of Uncertainty (95% CI Width) for Entire Dataset: {uncertainty_range:.4f}")
print()


Entire Dataset:
95% Confidence Interval for F8 (Raw): 42.0202 to 53.6763
Range of Uncertainty (95% CI Width) for Entire Dataset: 11.6561



## Confidence Interval Analysis

Overall, the width of the 95% confidence interval increases with gain: 21.8088 at 512x versus 11.6561 at 256x. This could indicate that as the gain increases, the variability of the chlorophyll measurements increases. This could be desirable, since a wider range of F8 readings across the dilution series could translate into increased sensitivity of the sensor.
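The interval calculation used in both cells can be condensed into one small helper. The sketch below is a minimal version under one assumption that differs from the cells above: it drops NaN readings before counting the sample size, whereas `len(df['F8 (Raw)'])` also counts rows that `pd.to_numeric(..., errors='coerce')` turned into NaN. The function name `mean_confidence_interval` and the sample readings are hypothetical and are not taken from the test data.

```python
import numpy as np
from scipy.stats import t


def mean_confidence_interval(values, confidence=0.95):
    """Return the (low, high) t-based confidence interval for the mean."""
    data = np.asarray(values, dtype=float)
    # Assumption: NaN readings are discarded so that n matches the values
    # actually used for the mean and the standard deviation.
    data = data[~np.isnan(data)]
    n = data.size
    mean = data.mean()
    # Sample standard deviation (ddof=1), matching pandas' default .std()
    std_err = data.std(ddof=1) / np.sqrt(n)
    margin = t.ppf((1 + confidence) / 2, n - 1) * std_err
    return float(mean - margin), float(mean + margin)


# Illustrative readings only (not measurements from these tests)
readings = [1.0, 2.0, 3.0, 4.0, 5.0, float("nan")]
low, high = mean_confidence_interval(readings)
print(f"95% CI: {low:.4f} to {high:.4f} (width {high - low:.4f})")
```

Passing `df['F8 (Raw)']` for each gain setting would yield intervals comparable to the ones printed above, provided the column contains no NaN values.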

## Sensitivity Evaluation

### 512x Gain, 700ms Integration

In [8]:
## Mean, SD and Error
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/512x_700ms_2.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []
for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()
    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])
    # Calculate the standard error for each category's mean
    category_standard_error = category_f8_raw_std / np.sqrt(category_sample_size)

    print(f"{category} ")
    print(f"{category}")
    print(category_f8_raw_mean, "mean value")
    print(category_standard_error, "standard error")
    print(category_f8_raw_std, "standard deviation")
    print()
    # Keep the category only if its label parses as a numeric concentration

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass


0.0 
0.0
10.333333333333334 mean value
0.30249507099101003 standard error
1.2833778958394957 standard deviation

0.25 
0.25
9.11111111111111 mean value
0.290243180538229 standard error
1.2313975269103985 standard deviation

0.50 
0.50
18.27777777777778 mean value
0.3597889319121795 standard error
1.5264551613058026 standard deviation

0.75 
0.75
24.88888888888889 mean value
0.24103384202072908 standard error
1.0226199851298272 standard deviation

2.0 
2.0
30.0 mean value
0.40422604172722165 standard error
1.7149858514250884 standard deviation

4.0 
4.0
49.21052631578947 mean value
0.31137262016313766 standard error
1.3572417850765923 standard deviation

6.0 
6.0
54.36842105263158 mean value
0.3172480933337741 standard error
1.382852378872881 standard deviation

8.0 
8.0
66.89473684210526 mean value
0.24054928422459798 standard error
1.0485300208760655 standard deviation

10.0 
10.0
75.10526315789474 mean value
0.31432408538892836 standard error
1.3701069237311885 standard deviation

20

### 256x Gain, 700ms Integration

In [9]:
## Mean, SD and Error
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/256x_700ms.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []
for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()
    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])
    # Calculate the standard error for each category's mean
    category_standard_error = category_f8_raw_std / np.sqrt(category_sample_size)

    print(f"{category} ")
    print(f"{category}")
    print(category_f8_raw_mean, "mean value")
    print(category_standard_error, "standard error")
    print(category_f8_raw_std, "standard deviation")
    print()
    # Keep the category only if its label parses as a numeric concentration

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass

0.0 
0.0
6.166666666666667 mean value
0.1457457720325344 standard error
0.6183469424008423 standard deviation

0.25 
0.25
11.055555555555555 mean value
0.17096844914416118 standard error
0.7253576985527025 standard deviation

0.50 
0.50
8.222222222222221 mean value
0.172553971012341 standard error
0.7320844981409595 standard deviation

0.75 
0.75
9.391304347826088 mean value
1.2865051018991644 standard error
6.169861722590655 standard deviation

2.0 
2.0
22.0 mean value
0.19802950859533489 standard error
0.8401680504168059 standard deviation

4.0 
4.0
27.333333333333332 mean value
0.16169041669088868 standard error
0.6859943405700354 standard deviation

6.0 
6.0
25.0 mean value
0.20232565955562798 standard error
0.8819171036881969 standard deviation

8.0 
8.0
43.111111111111114 mean value
0.15942890088431988 standard error
0.6763995415945232 standard deviation

10.0 
10.0
48.666666666666664 mean value
0.18687063686046268 standard error
0.8563488385776752 standard deviation

20.0 
20.0


## Sensitivity Analysis

The 512x Gain has a sensitivity of 0.50 ug/L, while the 256x Gain has a sensitivity of 0.75 ug/L. In other words, as the gain increases, the sensitivity of the sensor increases as well.
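One way to make a sensitivity figure reproducible is to state the decision rule explicitly. The sketch below assumes a common rule of thumb, where a concentration counts as detectable once its mean signal exceeds the blank mean plus three blank standard deviations. This rule, the function name `lowest_detectable`, and the factor `k=3.0` are assumptions for illustration and are not necessarily the criterion used to reach the figures quoted above. The inputs are the rounded 512x means and blank statistics printed in the Sensitivity Evaluation output.

```python
def lowest_detectable(concentrations, means, blank_mean, blank_sd, k=3.0):
    """Return the lowest concentration whose mean exceeds blank_mean + k * blank_sd.

    Returns None if no concentration clears the threshold.
    """
    threshold = blank_mean + k * blank_sd
    for conc, mean in sorted(zip(concentrations, means)):
        if mean > threshold:
            return conc
    return None


# Rounded 512x Gain values from the output above (ug/L, mean F8 raw counts)
concs_512 = [0.25, 0.50, 0.75, 2.0]
means_512 = [9.11, 18.28, 24.89, 30.0]

limit = lowest_detectable(concs_512, means_512, blank_mean=10.33, blank_sd=1.28)
print(f"Lowest detectable concentration under the 3-sigma rule: {limit} ug/L")
```

A stricter or looser rule (a different `k`, or a requirement that all higher concentrations also clear the threshold) can give a different answer, so the chosen rule should be reported alongside the sensitivity value.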

## RMSE

### 512x Gain, 700ms Integration RMSE

In [4]:
# RMSE
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/512x_700ms_2.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []

for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()

    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass


# Calculate the line of best fit parameters (slope and intercept)
slope, intercept, r_value, p_value, std_err = linregress(x_values, y_values)

def predict_values(x, slope, intercept):
    return slope * x + intercept
# Calculate the RMSE of the fitted line
y_predicted = predict_values(np.array(x_values), slope, intercept)
rmse = np.sqrt(mean_squared_error(y_values, y_predicted))
print(rmse)

11.364193578479265


### 256x Gain, 700ms Integration RMSE

In [5]:
# RMSE
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/256x_700ms.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []

for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()

    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass


# Calculate the line of best fit parameters (slope and intercept)
slope, intercept, r_value, p_value, std_err = linregress(x_values, y_values)

def predict_values(x, slope, intercept):
    return slope * x + intercept
# Calculate the RMSE of the fitted line
y_predicted = predict_values(np.array(x_values), slope, intercept)
rmse = np.sqrt(mean_squared_error(y_values, y_predicted))
print(rmse)

6.922142169978342


## RMSE Analysis

The RMSE appears to increase with the Gain of the sensor (11.36 at 512x versus 6.92 at 256x), meaning that as the Gain increases, predicting the F8 values from the line of best fit becomes more difficult. Comparing the F8 values to the line of best fit, the F8 value for 40.0 ug/L appears to be the main outlier. This could be due to an error in the dilution itself, since it is the only obvious outlier.
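A leave-one-out refit is one way to check how much a single point contributes to the RMSE. The sketch below refits the line with each point removed in turn and reports the RMSE of the remaining points; the point whose removal lowers the RMSE the most is the strongest outlier candidate. The helper names `fit_rmse` and `leave_one_out_rmse` are hypothetical, and the data are synthetic values chosen for illustration (a straight line with the highest concentration pulled low), not measurements from these tests. It uses `np.polyfit` rather than `linregress`; both perform an ordinary least-squares line fit.

```python
import numpy as np


def fit_rmse(x, y):
    """Fit y = slope * x + intercept by least squares and return the RMSE."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


def leave_one_out_rmse(x, y):
    """Map each x value to the RMSE of the fit obtained without that point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # boolean mask that drops point i
        results[float(x[i])] = fit_rmse(x[keep], y[keep])
    return results


# Synthetic example: y = 5x + 10, except the 40.0 point reads 150 instead of 210
conc = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
signal = np.array([10.0, 35.0, 60.0, 110.0, 150.0])

full_rmse = fit_rmse(conc, signal)
loo = leave_one_out_rmse(conc, signal)
suspect = min(loo, key=loo.get)

print(f"RMSE with all points: {full_rmse:.3f}")
for c, r in loo.items():
    print(f"  without {c:>5.1f}: RMSE = {r:.3f}")
print(f"Strongest outlier candidate: {suspect}")
```

In this synthetic case the largest residual of the full fit falls at 20.0 rather than 40.0, because the highest concentration pulls the fitted line toward itself. That is why the refit is more informative here than inspecting residuals alone.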

## R2 Values

### 512x Gain, 700ms Integration R2

In [6]:
# R_Squared
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/512x_700ms_2.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []

for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()

    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass


# Calculate the line of best fit parameters (slope and intercept)
slope, intercept, r_value, p_value, std_err = linregress(x_values, y_values)

def predict_values(x, slope, intercept):
    return slope * x + intercept
# Calculate R-squared and RMSE
y_predicted = predict_values(np.array(x_values), slope, intercept)
rmse = np.sqrt(mean_squared_error(y_values, y_predicted))
r_squared = r2_score(y_values, y_predicted)
print(r_squared)

0.9833749149962714


### 256x Gain, 700ms Integration R2

In [7]:
# R_Squared
# Replace with the actual file path
file_path = "/Users/jessiewynne/chla_fluorometer/AS7341 Dilutuions 9.14.23/256x_700ms.csv"
# Read the CSV file without skipping any rows
df = pd.read_csv(file_path, encoding='utf-8')

# Filter out rows where the 'Test' column is 'test'
df = df[df['Test'].str.lower() != 'test']

# Convert 'F8 (Raw)' column to numeric values
df['F8 (Raw)'] = pd.to_numeric(df['F8 (Raw)'], errors='coerce')
# Get unique test categories in the order of appearance
categories = df['Test'].unique()

# Create a dictionary to store the 'F8 (Raw)' values for each category
category_f8_raw_dict = {}
# Store unique x-values and their corresponding data points
x_values = []
y_values = []
std_devs = []

for category in categories:
    # Exclude the first data point from each category
    category_df = df[df['Test'] == category][1:]
    category_f8_raw_dict[category] = category_df['F8 (Raw)']

    # Calculate the mean and standard deviation for 'F8 (Raw)' data in each category
    category_f8_raw_mean = category_df['F8 (Raw)'].mean()
    category_f8_raw_std = category_df['F8 (Raw)'].std()

    # Calculate the sample size for each category
    category_sample_size = len(category_df['F8 (Raw)'])

    try:
        x_value = float(category)
        x_values.append(x_value)
        y_values.append(category_f8_raw_mean)
        std_devs.append(category_f8_raw_std)
    except ValueError:
        pass


# Calculate the line of best fit parameters (slope and intercept)
slope, intercept, r_value, p_value, std_err = linregress(x_values, y_values)

def predict_values(x, slope, intercept):
    return slope * x + intercept
# Calculate R-squared and RMSE
y_predicted = predict_values(np.array(x_values), slope, intercept)
rmse = np.sqrt(mean_squared_error(y_values, y_predicted))
r_squared = r2_score(y_values, y_predicted)
print(r_squared)

0.978990299142044


## R2 Analysis

Both R2 values indicate that the changes in F8 values correspond to the changing chlorophyll concentrations. The R2 value is slightly higher for the 512x Gain (0.9834 versus 0.9790).
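For a straight-line least-squares fit, R2 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares, and it equals the square of the correlation coefficient (the `r_value` that `linregress` already returns in the cells above). The sketch below checks that equivalence with NumPy only. The data are synthetic illustration values, not measurements from these tests.

```python
import numpy as np

# Synthetic illustration values (concentration, mean signal)
x = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
y = np.array([10.0, 35.0, 60.0, 110.0, 150.0])

# Least-squares line and its predictions
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# R2 from its definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_from_definition = float(1 - ss_res / ss_tot)

# R2 as the squared Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]
r2_from_correlation = float(r ** 2)

print(f"R2 from definition:  {r2_from_definition:.6f}")
print(f"R2 from correlation: {r2_from_correlation:.6f}")
```

Because the two values agree for a simple linear fit, `r_value ** 2` could stand in for the separate `r2_score` call in the R2 cells.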

## Conclusions

Overall, it appears that increasing the integration time as well as the Gain value leads to a higher sensitivity of the sensor. The resulting sensitivity of 0.50 ug/L meets the goal, established on 8.22.23, of achieving a sensitivity of 0.5 ug/L. The R2 values are similar to those of the dilution tests on 8.22.23, only slightly smaller. The RMSE values of these dilutions are quite high, indicating that the F8 values are difficult to predict and at times do not follow the line of best fit. Comparing the F8 values to the line of best fit at both Gain values, there is one obvious outlier at 40.0 ug/L. Because there is only a single outlier, it could be due to a dilution error, which could be contributing to the high RMSE.

## Future Steps
1. Due to the increased sensitivity, integration of a brighter LED or lens is not needed at this time.
2. Re-running the tests at 512x Gain with a 700ms integration time for 3-5 trials will be necessary to evaluate variability in the measurements.
3. Re-running these tests with new dilutions will be necessary to evaluate whether the 40.0 ug/L F8 measurement is actually an outlier and whether a new dilution for this concentration decreases the RMSE.