# Statistically significant difference analysis

The purpose of this test is to check whether there is a significant difference between two groups.<br>
- [TEST] D11 STD 230912 >>Mo/S2-REF-1 → 'Ref1'
- [TEST] D11 STD 230912 >>Mo/S2-GRP-1 → 'Bernoulli'   → Front side treated with Bernoulli between unloader and wiper 2 (with vacuum).
- [TEST] D11 STD 230912 >>Mo/S2-GRP-2 → 'Sili_noVac'  → Front side treated with Silicone SC between unloader and wiper 2 (no vacuum).
- [TEST] D11 STD 230912 >>Mo/S2-GRP-3 → 'Ber_stacked' → Washed and stacked in pallets with Bernoulli.
- [TEST] D11 STD 230912 >>Mo/S2-GRP-4 → 'Sili_Vac'    → Front side treated with Silicone SC between unloader and wiper 2 (with vacuum).
- [TEST] D11 STD 230912 >>Mo/S2-REF-2 → 'Ref2'

## Extract test data

In [1]:
import pandas as pd
import numpy as np

# package used to connect to sql database
from sqlalchemy import create_engine, text  
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from scipy import stats
rng = np.random.default_rng()

# Very useful function to set up transparency degrees
def rgb_to_rgba(rgb_value, alpha):
    return f"rgba{rgb_value[3:-1]}, {alpha})"

In [2]:
engine = create_engine("mysql://chang.liu:SQpuQYPzRsrG9Rem@192.168.60.223/midsummerdb")

In [3]:
sql_p = text('''
select
    Cell.id,
    Cell.machine_id,
    Cell.serial_number,
    Cell.created as cell_created,
    PyrometerHistory.created as Pyro_created,
    ROUND(AVG(CASE WHEN pyrometer_id = '66' THEN PyrometerHistory.temperature
			  ELSE NULL END),0)  as Pyro_S6S7,
    CellRecipeInfo.name as recipe_name,
	IVSummary.created as IVS_created,
    IVSummary.voltage_open_circuit,
    IVSummary.current_short_circuit,
    IVSummary.fill_factor,
    IVSummary.efficiency,
    IVSummary.resistance_shunt,
    IVSummary.resistance_serie
FROM
    midsummerdb.Cell
JOIN
    midsummerdb.PyrometerHistory ON midsummerdb.PyrometerHistory.Cell_id = midsummerdb.Cell.id
JOIN
    midsummerdb.IVSummary ON midsummerdb.Cell.id = midsummerdb.IVSummary.cell_id
JOIN
    midsummerdb.CellRecipeInfo ON midsummerdb.CellRecipeInfo.id = midsummerdb.Cell.CellRecipeInfo_id
WHERE
    midsummerdb.Cell.created BETWEEN '2023-09-12 16:00:00' AND '2023-09-12 21:00:51' 
AND 
    midsummerdb.Cell.machine_id = 38
group by Cell.id;''')
df_p = pd.read_sql(sql_p, engine)
# midsummerdb.Cell.created BETWEEN '2023-08-29 15:35:00' AND '2023-08-29 17:30:00'

In [4]:
df_p.rename(columns={"voltage_open_circuit": "Voc", "current_short_circuit": "Isc",
                        'fill_factor':'FF', 'efficiency':'Eff',
                        'resistance_shunt':'Rsh', 'resistance_serie':'Rs'}, inplace = True)
df_p2 = df_p.dropna(subset=["Voc", "Isc", 'FF', 'Eff', 'Rsh', 'Rs'])  # Remove NA rows if there is any

df_p3 = df_p2[df_p2['Eff'] > 0] # Remove wrong readings
df_p3 = df_p3[df_p3['Rs'] < 100] # Remove wrong readings

conditions1 = [df_p3['recipe_name'] == '[TEST] D11 STD 230829 >>Mo/S2',
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-REF-1', 
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-GRP-1',
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-GRP-2',
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-GRP-3',
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-GRP-4',
               df_p3['recipe_name'] == '[TEST] D11 STD 230912 >>Mo/S2-REF-2']
values1 = ['Ref1','Ref1', 'Bernoulli','Sili_noVac', 'Ber_stacked', 'Sili_Vac', 'Ref2']

df_p4 = df_p3.copy()

df_p4['category'] = np.select(conditions1, values1) 

df_p4.head(n=1) # display the dataframe

Unnamed: 0,id,machine_id,serial_number,cell_created,Pyro_created,Pyro_S6S7,recipe_name,IVS_created,Voc,Isc,FF,Eff,Rsh,Rs,category
0,2942663,38,102927363,2023-09-12 16:00:10,2023-09-12 16:00:47,446.0,[TEST] D11 STD 230829 >>Mo/S2,2023-09-12 19:25:12,0.671,7.472,75.5,15.848,2868.05,6.49242,Ref1


# Raw data visualisation

In [148]:
batch_names = ['Ref1', 'Bernoulli','Sili_noVac', 'Ber_stacked', 'Sili_Vac', 'Ref2']
batch_colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)', 'rgb(214, 39, 40)', 'rgb(148, 103, 189)', 
                'rgb(140, 86, 75)', 'rgb(227, 119, 194)', 'rgb(127, 127, 127)', 'rgb(188, 189, 34)']
fig_height = 800
fig_width = 1000

def eachRowPlot(date, rowName, n_row):
        # rowName
    fig.add_trace(go.Scatter(x = df_group[date], y = df_group[rowName], 
                                mode = "markers", marker_size = 6, name = batch_name,
                                marker_line_color = batch_color, showlegend = False,
                                marker_color = rgb_to_rgba(batch_color,0.5), 
                                marker_line_width=0.2), 
                                row=n_row, col=1)
    fig.add_trace(go.Box(y = df_group[rowName], name = batch_name,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=2) 
    fig.add_trace(go.Histogram(x = df_group[rowName], name = batch_name,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=3)

fig = make_subplots(rows=8, cols=3, column_widths=[0.2, 0.4, 0.4] , vertical_spacing=0.03, horizontal_spacing= 0.03
                    ) # Set layout for multiple plots

for i in [*range(len(batch_names))]:
    batch_name = batch_names[i]
    batch_color = batch_colors[i]
    df_group = df_p4[df_p4['category'] == batch_name] 
    
    # Pyrometer
    eachRowPlot(date = "cell_created", rowName = 'Pyro_S6S7', n_row = 1)
    
    # Eff
    eachRowPlot(date = "cell_created", rowName = 'Eff', n_row = 2)
    
    # Voc
    eachRowPlot(date = "cell_created", rowName = 'Voc', n_row = 3)
    
    # Isc
    eachRowPlot(date = "cell_created", rowName = 'Isc', n_row = 4)

    # FF
    eachRowPlot(date = "cell_created", rowName = 'FF', n_row = 5)

    # Rs
    eachRowPlot(date = "cell_created", rowName = 'Rs', n_row = 6)

    # Rsh
    eachRowPlot(date = "cell_created", rowName = 'Rsh', n_row = 7)

    fig.add_trace(go.Histogram(x = df_group['category'], 
                                   marker_color = rgb_to_rgba(batch_color,0.5), 
                                   name = batch_name, 
                                   texttemplate= '%{y}', textfont_size=15
                                   ), row=8, col=1)

  
fig.update_layout(height=fig_height, width=fig_width, 
                    margin = dict(l=2, r=2, t=15, b=2),
                    legend=dict(orientation="h", yanchor="bottom", y=1.01, xanchor="left", x=0.01),
                    yaxis = dict(title="PyroS67 [°C]"),
                    yaxis4 = dict(title="Eff [%]"),
                    yaxis7 = dict(title="Voc [V]"),
                    yaxis10 = dict(title="Isc [A]"),
                    yaxis13 = dict(title="FF [%]"),
                    yaxis16 = dict(title="Rs [mΩ]"),
                    yaxis19 = dict(title="Rsh [mΩ]"),
                    yaxis22 = dict(title="n samples")
                    # barmode='overlay', 
                    
                    )
fig.show()

# Theory
## Two-sample/Independent t test
The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal. ANOVA is the extension of the t-test to more than two groups. The t-test is also used to analyse results from A/B tests.
### Hypotheses
- Null hypothesis: the two underlying population means are the same
- Alternative hypothesis: the means of the two populations are not equal
### Assumptions
- Data values must be independent. Measurements for one observation do not affect measurements for any other observation.
- Data in each group must be obtained via a random sample from the population.
- Data in each group are normally distributed.
- Data values are continuous.
- The variances for the two independent groups are equal.
### Result
- T-statistic: A t-test reduces the data to a single value, called the t-statistic, which serves as a measure of evidence against the null hypothesis. A t-statistic close to zero represents the weakest evidence against the null hypothesis. A t-statistic with a larger absolute value represents stronger evidence against it.
- P-value: The p-value is the probability of seeing a t-statistic at least as extreme as the one calculated, assuming the null hypothesis is true. It is represented as a decimal, e.g., a p-value of 0.05 represents a 5% probability.

If the p-value < 0.05, we can reject the null hypothesis and conclude that there is a significant difference between the groups.<br>
If the p-value ≥ 0.05, we cannot reject the null hypothesis: there is not enough evidence to support a significant difference.<br>

Tip: An easy way to understand this judgment: under the null hypothesis, the probability (p-value) of seeing data like these is so low that they would be unlikely to occur. Therefore, the null hypothesis is probably wrong.
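
As a minimal sketch of the decision rule above, the following runs a two-sample t-test with `scipy.stats.ttest_ind` on two small made-up samples. The values in `group_a` and `group_b` are illustrative assumptions, not cell data; the 0.05 threshold is the one used in this report.

```python
from scipy import stats

# Made-up measurements for two independent groups (illustrative values only).
group_a = [15.9, 16.1, 15.7, 16.3, 16.0, 15.8, 16.2, 16.1]
group_b = [15.4, 15.6, 15.9, 15.3, 15.7, 15.5, 15.8, 15.6]

alpha = 0.05  # significance level used in this report

# Two-sided test: the alternative is that the two population means differ.
t_stat, p_two_sided = stats.ttest_ind(group_a, group_b)

# One-sided test: the alternative is that the mean of group_a is greater.
_, p_greater = stats.ttest_ind(group_a, group_b, alternative="greater")

print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_greater:.4f}")
if p_two_sided < alpha:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Cannot reject the null hypothesis.")
```

When the t-statistic is positive, the one-sided ('greater') p-value is half of the two-sided one, which is the conversion applied later in this report.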
### Assumptions not met?
If the assumptions for the independent t-test are not met, the calculated p-value may be incorrect. However, if the two samples are of equal size, the t-test is quite robust to a slight skewness of the data. The t-test is not robust if the variances differ significantly.

If the variables are not normally distributed, the Mann-Whitney U test can be used. The Mann-Whitney U Test is the non-parametric counterpart of the independent t-test.

## Mann-Whitney U test
The Mann-Whitney U test is a non-parametric test. It can be used to test whether there is a difference between two samples (groups), and the data need not be normally distributed.
### Hypotheses
- Null hypothesis: median1 = median2
- Alternative hypothesis: median1 ≠ median2 or median1 < median2 or median1 > median2
### Assumptions
The Mann-Whitney U test is the non-parametric counterpart to the t-test for independent samples and is subject to less stringent assumptions than the t-test. It is therefore commonly used when the t-test's requirement of normally distributed data is not met.
### Results
- U statistic
- p-value
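
To make the U statistic concrete, here is a minimal sketch that computes it by direct pair counting: U for a group is the number of (x, y) pairs in which the x value is larger, with ties counted as one half. The function name `mann_whitney_u` and the sample values are assumptions made for illustration only.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by comparing every value in x with every value in y."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0   # x wins the pair
            elif xi == yj:
                u_x += 0.5   # a tie counts as half
    # The two U values always add up to the number of pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up samples (illustrative values only).
x = [1.2, 3.4, 2.2, 5.0]
y = [0.9, 2.0, 3.0, 1.5, 2.2]

u_x, u_y = mann_whitney_u(x, y)
print(u_x, u_y)  # 14.5 5.5
```

If the two groups came from the same distribution, both U values would be expected to lie near half the number of pairs (here 4 × 5 / 2 = 10). The further U moves from that value, the smaller the p-value.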

## Skewness
Pearson's second coefficient = 3*(mean - median)/standard deviation
- If the skewness is between -0.5 and 0.5, the data are nearly symmetrical.
- If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are slightly skewed.
- If the skewness is lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data are extremely skewed.
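
The coefficient and the rule of thumb above can be sketched with the standard library alone. The helper names `pearson_second_skewness` and `describe_skewness` are hypothetical, the use of the sample standard deviation is an assumption, and the data are made up.

```python
from statistics import mean, median, stdev


def pearson_second_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    # Assumption: the sample standard deviation (n - 1 in the denominator) is used.
    return 3 * (mean(values) - median(values)) / stdev(values)


def describe_skewness(skew):
    """Map a skewness value onto the three categories listed above."""
    if abs(skew) < 0.5:
        return "nearly symmetrical"
    if abs(skew) <= 1:
        return "slightly skewed"
    return "extremely skewed"


symmetric = [1, 2, 3, 4, 5]       # mean equals median, so the coefficient is 0
right_skewed = [1, 1, 1, 2, 10]   # one large value pulls the mean above the median
left_skewed = [1, 9, 10, 10, 10]  # one small value pulls the mean below the median

for data in (symmetric, right_skewed, left_skewed):
    skew = pearson_second_skewness(data)
    print(f"{skew:+.3f} -> {describe_skewness(skew)}")
```

A negative coefficient indicates left (negative) skew and a positive coefficient indicates right (positive) skew.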

# Report explanation

If we conduct only a single statistical test (randomly selecting n_sample samples from each of the Ref and Bernoulli groups), we obtain just one of the many p-values that different random samples would produce, which may lead to a type I or type II error no matter which statistical test (t, ANOVA, non-parametric rank) is used. Therefore, a bootstrapping procedure has been used. The procedure is as follows:
- 1. Randomly take n_sample samples from both the `Reference` group and the `testing` group with replacement.
- 2. Conduct the Mann-Whitney U test or the t/ANOVA test to check if there is a significant difference between the two groups of samples.
- 3. Obtain the p-value of the statistical test.
- 4. Repeat steps 1 to 3 n_test=1000 times.
- 5. Draw p-value distribution diagrams based on the test category and sample group category.

After this procedure, we can compare the p-value distributions of (`Reference` Vs `Reference`) and (`Reference` Vs `testing`). If the two histograms have a similar shape, this is evidence that the two groups have no significant difference, and vice versa.
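
The procedure can be sketched in a self-contained form as follows. This is only an illustration on synthetic data: the function name `bootstrap_p_values`, the normally distributed stand-in groups and the reduced `n_test=200` are assumptions, and the report's own implementation further below works on the database data instead.

```python
import numpy as np
from scipy import stats


def bootstrap_p_values(ref, test, n_sample, n_test, alternative="two-sided", seed=0):
    """Return Mann-Whitney U p-values for (ref vs ref) and (ref vs test) resamples."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(ref)
    test = np.asarray(test)
    p_ref_ref, p_ref_test = [], []
    for _ in range(n_test):
        # Step 1: resample each group with replacement.
        ref_a = rng.choice(ref, size=n_sample, replace=True)
        ref_b = rng.choice(ref, size=n_sample, replace=True)
        test_b = rng.choice(test, size=n_sample, replace=True)
        # Steps 2 and 3: run the test and keep the p-value.
        p_ref_ref.append(stats.mannwhitneyu(ref_a, ref_b, alternative=alternative).pvalue)
        p_ref_test.append(stats.mannwhitneyu(ref_a, test_b, alternative=alternative).pvalue)
    # Step 4 is the loop itself; step 5 (plotting) is left to the caller.
    return np.array(p_ref_ref), np.array(p_ref_test)


# Synthetic stand-ins for the reference and testing groups (not real cell data).
data_rng = np.random.default_rng(1)
ref_values = data_rng.normal(16.0, 0.3, size=300)
test_values = data_rng.normal(15.9, 0.3, size=300)

p_rr, p_rt = bootstrap_p_values(ref_values, test_values,
                                n_sample=120, n_test=200, alternative="greater")
print("share of p < 0.05, ref vs ref: ", np.mean(p_rr < 0.05))
print("share of p < 0.05, ref vs test:", np.mean(p_rt < 0.05))
```

The share of p-values below 0.05 is one simple way to summarise how different the two histograms are; the report itself compares the histogram shapes visually.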

Two statistical tests are used in the following report: the Mann-Whitney U test and the t-test (which gives a result very similar to ANOVA).
In theory, the Mann-Whitney U test is more reliable than the t-test since our cell data are (left) skewed.

# Ref1 Vs Bernoulli (Ref Vs Group1)

In [5]:
df_RB = df_p4[(df_p4['cell_created'] < '2023-09-12 17:46:34')]
batch_names = ['Ref1', 'Bernoulli']
batch_colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)', 'rgb(214, 39, 40)', 'rgb(148, 103, 189)', 
                'rgb(140, 86, 75)', 'rgb(227, 119, 194)', 'rgb(127, 127, 127)', 'rgb(188, 189, 34)']
fig_height = 800
fig_width = 780

def eachRowPlot(date, rowName, n_row):
        # rowName
    fig.add_trace(go.Scatter(x = df_group[date], y = df_group[rowName], 
                                mode = "markers", marker_size = 6, name = batch_name,
                                marker_line_color = batch_color, showlegend = False,
                                marker_color = rgb_to_rgba(batch_color,0.5), 
                                marker_line_width=0.2), 
                                row=n_row, col=1)
    fig.add_trace(go.Box(y = df_group[rowName], name = batch_name,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=2) 
    fig.add_trace(go.Histogram(x = df_group[rowName], name = batch_name, histnorm='probability', nbinsx=20,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=3)

fig = make_subplots(rows=8, cols=3#, shared_xaxes=True #,vertical_spacing=0.01, horizontal_spacing= 0.01
                    ) # Set layout for multiple plots

for i in [*range(len(batch_names))]:
    batch_name = batch_names[i]
    batch_color = batch_colors[i]
    df_group = df_RB[df_RB['category'] == batch_name] 
    
    # Pyrometer
    eachRowPlot(date = "cell_created", rowName = 'Pyro_S6S7', n_row = 1)
    
    # Eff
    eachRowPlot(date = "cell_created", rowName = 'Eff', n_row = 2)
    
    # Voc
    eachRowPlot(date = "cell_created", rowName = 'Voc', n_row = 3)
    
    # Isc
    eachRowPlot(date = "cell_created", rowName = 'Isc', n_row = 4)

    # FF
    eachRowPlot(date = "cell_created", rowName = 'FF', n_row = 5)

    # Rs
    eachRowPlot(date = "cell_created", rowName = 'Rs', n_row = 6)

    # Rsh
    eachRowPlot(date = "cell_created", rowName = 'Rsh', n_row = 7)

    fig.add_trace(go.Histogram(x = df_group['category'], 
                                   marker_color = rgb_to_rgba(batch_color,0.5), 
                                   name = batch_name, 
                                   texttemplate= '%{y}', textfont_size=15
                                   ), row=8, col=1)

  
fig.update_layout(height=fig_height, width=fig_width, 
                    margin = dict(l=2, r=2, t=15, b=2),
                    legend=dict(orientation="h", yanchor="bottom", y=1.01, xanchor="left", x=0.01),
                    yaxis = dict(title="PyroS67 [°C]"),
                    yaxis4 = dict(title="Eff [%]"),
                    yaxis7 = dict(title="Voc [V]"),
                    yaxis10 = dict(title="Isc [A]"),
                    yaxis13 = dict(title="FF [%]"),
                    yaxis16 = dict(title="Rs [mΩ]"),
                    yaxis19 = dict(title="Rsh [mΩ]"),
                    yaxis22 = dict(title="n samples"),
                    barmode='overlay', bargap = 0.05
                    )
fig.show()

## Power Analysis

In [7]:
import statsmodels.api as sm
from statsmodels.stats.power import NormalIndPower

# Define parameters
effect_size = 0.34467120181405897  # Choose an appropriate effect size
alpha = 0.05  # Significance level (usually 0.05 for 95% confidence)
power = 0.80  # Desired power level

# Create a NormalIndPower object
power_analysis = NormalIndPower()

# Calculate the sample size for each group
sample_size = power_analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power, ratio=1, alternative='larger')
power_score = power_analysis.solve_power(effect_size=effect_size, alpha=alpha, nobs1 = 120, ratio=1, alternative='larger')
# Print the sample size for each group
print("Minimum sample size per group:", sample_size)
print("Statistical power", power_score)

Minimum sample size per group: 104.08499866666558
Statistical power 0.8473085024221704
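
As a cross-check, the two numbers above can be reproduced by hand with the usual normal-approximation formulas for a one-sided test with equal group sizes: n per group = 2·((z₁₋α + z_power)/d)², and power = Φ(d·√(n/2) − z₁₋α). The sketch below assumes that this is the approximation behind `NormalIndPower` for these settings; it uses only the standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

effect_size = 0.34467120181405897  # same effect size as in the cell above
alpha = 0.05                       # one-sided significance level
power = 0.80                       # desired power

std_normal = NormalDist()                  # standard normal distribution
z_alpha = std_normal.inv_cdf(1 - alpha)    # critical value of the one-sided test
z_power = std_normal.inv_cdf(power)        # quantile matching the desired power

# Sample size per group needed to reach the desired power.
n_per_group = 2 * ((z_alpha + z_power) / effect_size) ** 2

# Power achieved with 120 samples per group.
power_at_120 = std_normal.cdf(effect_size * sqrt(120 / 2) - z_alpha)

print("Minimum sample size per group:", round(n_per_group, 2))  # approximately 104.08
print("Statistical power:", round(power_at_120, 3))             # approximately 0.847
```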


In [None]:
x = [1,2,3,4,5,6,7,8,9]
y = [11,12,13,14,15,16,17,18,19]

mean_x = np.mean(x)
mean_y = np.mean(y)
std_x = np.std(x, ddof= 1)
std_y = np.std(y, ddof= 1)

n_x = len(x)
n_y = len(y)

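# Numerator shared by both versions: difference between the sample means
t_num = mean_x - mean_y

# Student's t statistic (equal variances assumed), based on the pooled standard deviation
t_deno_equal = np.sqrt( ((n_x - 1)* std_x**2 + (n_y - 1) * std_y**2) / (n_x + n_y -2) ) * np.sqrt(1/n_x + 1/n_y)
t_stat_equal = t_num / t_deno_equal

# Welch's t statistic (unequal variances)
t_deno_unequal = np.sqrt((std_x**2/n_x)+(std_y**2/n_y))
t_stat_unequal = t_num / t_deno_unequal
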
## Define functions

In [161]:
# Firstly define the function to generate p-values from n_test statistical repeated tests.
# n_sample: should be less than the minimum number of samples of the two cell groups in comparison.
# n_test: you can choose how many repeated tests you want to run.
# cate_interest: is one of 'Eff', 'Voc', 'Isc', 'FF', 'Rs', 'Rsh'.
# Halternative: choose among 'two-sided', 'greater', 'less'.
def calculate_p_values(n_sample, n_test, cate_interest, Halternative):
    # Conduct the statistical test for 1000 times by resampling, this is what the p-value distribution should 
    # look like if two groups of data are all from the REF.
    data_test = df_RB[['category', cate_interest]]

    p_values_mannwhitneyu_RR = []
    p_values_mannwhitneyu_RB = []
    p_values_ttest_RR = []
    p_values_ttest_RB = []


    for _ in range(n_test):
        # Bootstrap with replacement: n_sample values are drawn at random from each group,
        # so the same observation can appear more than once in a resample.
        resampled_data1 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))
        resampled_data2 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))

        # Determine the data used for different tests.
        data_RB = resampled_data1[resampled_data1['category'] == 'Ref1'][cate_interest] # data used for both Ref Vs Ref and Ref Vs Bernoulli
        data_R = resampled_data2[resampled_data2['category'] == 'Ref1'][cate_interest] # data only used for Ref Vs Ref
        data_B = resampled_data2[resampled_data2['category'] == 'Bernoulli'][cate_interest] # data only used for Ref Vs Bernoulli

        # Fit the non-normal distribution to normal distribution
        # data_RB_t, lambda_RB_t = stats.boxcox(data_RB)
        # data_R_t , lambda_R_t = stats.boxcox(data_R)
        # data_B_t , lambda_B_t = stats.boxcox(data_B)
        
        # The t test is applied to the untransformed data (the Box-Cox transformation above is disabled).
        tRR, p_value_ttest_RR_ = stats.ttest_ind(data_RB, data_R)
        tRB, p_value_ttest_RB_ = stats.ttest_ind(data_RB, data_B, equal_var = False)

        # TODO: calculate the effect size for the independent t test.

        # The mann-whitney u test is more suitable for skewed data like our solar cell data. 
        uRR, p_value_mannwhitneyu_RR = stats.mannwhitneyu(data_RB, data_R, alternative = Halternative)
        uRB, p_value_mannwhitneyu_RB = stats.mannwhitneyu(data_RB, data_B, alternative = Halternative)

        # Effect size for the Mann-Whitney U test (rank-biserial correlation).
        # Kept for reference only: the values are not used in the plots below.
        d_uRR = 2*uRR/(n_sample*n_sample) - 1
        d_uRB = 2*uRB/(n_sample*n_sample) - 1

        # Append n_test (1000) test results to a list for later visualisation.
        p_values_mannwhitneyu_RR.append(p_value_mannwhitneyu_RR)
        p_values_mannwhitneyu_RB.append(p_value_mannwhitneyu_RB)

        # Change two-tailed p-value to one-tail value by dividing by 2.

        if Halternative == 'greater':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR >= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RB = p_value_ttest_RB_ / 2 if tRB >= 0 else 1 - p_value_ttest_RB_ / 2
        elif Halternative == 'less':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR <= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RB = p_value_ttest_RB_ / 2 if tRB <= 0 else 1 - p_value_ttest_RB_ / 2
        else:
             p_value_ttest_RR = p_value_ttest_RR_
             p_value_ttest_RB = p_value_ttest_RB_
            
        p_values_ttest_RR.append(p_value_ttest_RR)
        p_values_ttest_RB.append(p_value_ttest_RB)

    data1 = {'p_value': p_values_mannwhitneyu_RR + p_values_mannwhitneyu_RB,
            'test': (['MannWhitneyU'] * n_test * 2),
            'category': (['Ref1 Vs Ref1'] * n_test) + (['Ref1 Vs Bernoulli'] * n_test)}

    data2 = {'p_value': p_values_ttest_RR + p_values_ttest_RB,
            'test': (['Ttest'] * n_test * 2),
            'category': (['Ref1 Vs Ref1'] * n_test) + (['Ref1 Vs Bernoulli'] * n_test)}

    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)

    # Concatenate the DataFrames vertically
    combined_df = pd.concat([df1, df2], ignore_index=True)
    combined_df['type'] = combined_df['test'] + '/' + combined_df['category']
    
    # plot the results in a facet graph
    fig = make_subplots(rows=2, cols=2,vertical_spacing=0.03, horizontal_spacing=.02, 
                    shared_xaxes= True, shared_yaxes= True) 
    batch_colors = ['rgb(31, 119, 180)', 'rgba(31, 119, 180, 0.6)', 'rgb(44, 160, 44)', 'rgba(44, 160, 44, 0.6)']
 
    fig = px.histogram(combined_df, x="p_value", color= 'type',
                    histnorm='probability', color_discrete_sequence = batch_colors,
                    facet_row="test", facet_col="category")    
    fig.add_vline(x=0.05,  line_width=1, line_dash="dash", line_color='rgb(214, 39, 40)')
    fig.update_traces(xbins=dict(start=0, size = 0.05))
    fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x", yref="paper", font_size=12, showarrow=False)
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x2", yref="paper", font_size=12, showarrow=False)
    fig.update_layout(title_text=f"<b>Distribution of p-values for " + cate_interest + "</b> (n_sample = " + str(n_sample) + ", n_test = " + str(n_test) + ")", 
                            height=500, width=780, template='plotly',
                            bargap=0.1)
    fig.show()

## Eff

In [162]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='Eff', Halternative = 'greater')

**Interpretation**: The Mann-Whitney U test and the t-test give conflicting results. However, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) shows clear differences in distribution shape for both the Mann-Whitney U test and the t-test. Therefore, it is suggested that we reject the null hypothesis that there is no difference between the two groups, and accept the alternative that the `Ref1` population is greater than the `Bernoulli` population.

## Voc

In [164]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='Voc', Halternative = 'greater')

**Interpretation**: The Mann-Whitney U test and the t-test give conflicting results. However, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) shows clear differences in distribution shape for both the Mann-Whitney U test and the t-test. Therefore, it is suggested that we reject the null hypothesis that there is no difference between the two groups, and accept the alternative that the `Ref1` population is greater than the `Bernoulli` population.

## Isc

In [165]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='Isc', Halternative = 'greater')

**Interpretation**: Both the Mann-Whitney U test and the t-test indicate a significant difference between the two groups. In addition, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) also shows clear differences in distribution shape for both the Mann-Whitney U test and the t-test. Therefore, we reject the null hypothesis that there is no difference between the two groups, and accept the alternative that the `Ref1` population is greater than the `Bernoulli` population.

## FF

In [166]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='FF', Halternative = 'two-sided')

**Interpretation**: Both the Mann-Whitney U test and the t-test fail to reject the null hypothesis that the `Ref1` and `Bernoulli` groups have no significant difference. Furthermore, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) shows no clear differences in distribution shape for either the Mann-Whitney U test or the t-test. Therefore, it is suggested that there is no significant difference between the two populations.<br>
'two-sided' means that this is a two-tailed test. The alternative hypothesis is that the `Ref1` distribution is different from the `Bernoulli` distribution.

## Rs

In [167]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='Rs', Halternative = 'two-sided')

**Interpretation**: Both the Mann-Whitney U test and the t-test fail to reject the null hypothesis that the `Ref1` and `Bernoulli` groups have no significant difference. Furthermore, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) shows no clear differences in distribution shape for either the Mann-Whitney U test or the t-test. Therefore, it is suggested that there is no significant difference between the two groups.<br>
'two-sided' means that this is a two-tailed test. The alternative hypothesis is that the distribution underlying sample `Ref1` is not the same as the distribution underlying sample `Bernoulli`.

## Rsh

In [168]:
calculate_p_values(n_sample=120, n_test=1000, cate_interest='Rsh', Halternative = 'two-sided')

**Interpretation**: Both the Mann-Whitney U test and the t-test fail to reject the null hypothesis that the `Ref1` and `Bernoulli` groups have no significant difference. Furthermore, comparing (`Ref1` Vs `Bernoulli`) with (`Ref1` Vs `Ref1`) shows that the distribution shapes are quite similar for both the Mann-Whitney U test and the t-test. Therefore, it is suggested that there is no significant difference between the two groups.<br>
'two-sided' means that this is a two-tailed test. The alternative hypothesis is that the distribution underlying sample `Ref1` is not the same as the distribution underlying sample `Bernoulli`.

## How is the Mann-Whitney U test calculated?

In [67]:
n_sample = 42
cate_interest = 'Rsh'
data_test = df_RB[['category', cate_interest]]

# Take equal samples from both groups
data_test = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, random_state=4)) # add replace=True to bootstrap with replacement

# Calculate ranks for both groups
data_xy = data_test.copy()
data_xy.loc[:, 'Rank'] = data_xy[cate_interest].rank()

# Calculate the rank sums for both groups
cate_names = data_xy['category'].unique()
[rankSumX, rankSumY] = data_xy.groupby(by=['category'])['Rank'].sum()

# Calculate the U-values
[nX, nY] = data_xy.groupby(by=['category'])['Rank'].count()
Ux = nX*nY + nX*(nX+1)/2 - rankSumX
Uy = nX*nY + nY*(nY+1)/2 - rankSumY

# U value (the smaller of the two U statistics)
U_Wert = min(Ux, Uy)
# Expected value of U
U_ex = nX*nY/2
# Standard error of U
# U_sigma = np.sqrt(nX*nY*(nX+nY+1)/12)

# Standard error of U with tie correction
_, ties = np.unique(data_xy['Rank'], return_counts=True, axis=-1)
tieSum = (ties ** 3 - ties).sum()
U_sigmaTie = np.sqrt( (nX*nY/12) * ( (nX+nY+1) - tieSum/((nX+nY)*(nX+nY-1)) ) )

# z-value
#z = (U_Wert - U_ex)/U_sigma
zTie1 = (Ux - U_ex - 0.5)/U_sigmaTie
p1 = stats.norm.sf(abs(zTie1))
zTie2 = (Uy - U_ex - 0.5)/U_sigmaTie
p2 = stats.norm.sf(zTie2)

# print(f"Ux:{Ux} \nUy:{Uy} \nzTie1:{zTie1}\nzTie2:{zTie2}\np1:{p1}\np2:{p2}")


**Interpretation**: All tests support the conclusion that there is no significant difference between the two groups.
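
The normal approximation used in the calculation above (tie correction plus a 0.5 continuity correction) can be cross-checked against `scipy.stats.mannwhitneyu`. The sketch below uses made-up samples with a few tied values, and it uses the convention U = rank sum − n(n+1)/2, which is the U that SciPy reports for the first sample. This differs from the `Ux` expression in the cell above, which gives the complementary value.

```python
import numpy as np
from scipy import stats

# Made-up samples containing some tied values (illustrative only, not cell data).
x = np.array([2868.0, 3100.0, 2500.0, 2950.0, 3100.0, 2700.0, 3300.0, 2600.0])
y = np.array([2400.0, 2868.0, 2550.0, 2300.0, 2650.0, 2500.0, 2800.0, 2450.0, 2900.0])
n_x, n_y = len(x), len(y)
n = n_x + n_y

# Rank the pooled data (tied values share their average rank) and compute U for x.
ranks = stats.rankdata(np.concatenate([x, y]))
u_x = ranks[:n_x].sum() - n_x * (n_x + 1) / 2

# Standard error of U with tie correction.
_, tie_counts = np.unique(ranks, return_counts=True)
tie_sum = (tie_counts**3 - tie_counts).sum()
sigma = np.sqrt(n_x * n_y / 12 * ((n + 1) - tie_sum / (n * (n - 1))))

# z-value with continuity correction and one-sided p-value (x tends to be greater).
z = (u_x - n_x * n_y / 2 - 0.5) / sigma
p_manual = stats.norm.sf(z)

# Same test through SciPy, forcing the normal approximation.
res = stats.mannwhitneyu(x, y, alternative="greater", method="asymptotic")
print(f"manual: U = {u_x}, p = {p_manual:.6f}")
print(f"scipy:  U = {res.statistic}, p = {res.pvalue:.6f}")
```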

# Ref2 Vs Bernoulli with stacked cells (Ref2 Vs Group 3)

In [175]:
df_RBerStacked = df_p4[(df_p4['cell_created'] > '2023-09-12 18:31:03')]
batch_names = ['Ber_stacked', 'Ref2']
batch_colors = [# 'rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)', 
                'rgb(214, 39, 40)', #'rgb(148, 103, 189)', 
                'rgb(140, 86, 75)', 'rgb(227, 119, 194)', 'rgb(127, 127, 127)', 'rgb(188, 189, 34)']
fig_height = 800
fig_width = 780

def eachRowPlot(date, rowName, n_row):
        # rowName
    fig.add_trace(go.Scatter(x = df_group[date], y = df_group[rowName], 
                                mode = "markers", marker_size = 6, name = batch_name,
                                marker_line_color = batch_color, showlegend = False,
                                marker_color = rgb_to_rgba(batch_color,0.5), 
                                marker_line_width=0.2), 
                                row=n_row, col=1)
    fig.add_trace(go.Box(y = df_group[rowName], name = batch_name,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=2) 
    fig.add_trace(go.Histogram(x = df_group[rowName], name = batch_name, histnorm='probability', nbinsx=20,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=3)

fig = make_subplots(rows=8, cols=3#, shared_xaxes=True #,vertical_spacing=0.01, horizontal_spacing= 0.01
                    ) # Set layout for multiple plots

for i in [*range(len(batch_names))]:
    batch_name = batch_names[i]
    batch_color = batch_colors[i]
    df_group = df_RBerStacked[df_RBerStacked['category'] == batch_name] 
    
    # Pyrometer
    eachRowPlot(date = "cell_created", rowName = 'Pyro_S6S7', n_row = 1)
    
    # Eff
    eachRowPlot(date = "cell_created", rowName = 'Eff', n_row = 2)
    
    # Voc
    eachRowPlot(date = "cell_created", rowName = 'Voc', n_row = 3)
    
    # Isc
    eachRowPlot(date = "cell_created", rowName = 'Isc', n_row = 4)

    # FF
    eachRowPlot(date = "cell_created", rowName = 'FF', n_row = 5)

    # Rs
    eachRowPlot(date = "cell_created", rowName = 'Rs', n_row = 6)

    # Rsh
    eachRowPlot(date = "cell_created", rowName = 'Rsh', n_row = 7)

    fig.add_trace(go.Histogram(x = df_group['category'], 
                                   marker_color = rgb_to_rgba(batch_color,0.5), 
                                   name = batch_name, 
                                   texttemplate= '%{y}', textfont_size=15
                                   ), row=8, col=1)

  
fig.update_layout(height=fig_height, width=fig_width, 
                    margin = dict(l=2, r=2, t=15, b=2),
                    legend=dict(orientation="h", yanchor="bottom", y=1.01, xanchor="left", x=0.01),
                    yaxis = dict(title="PyroS67 [°C]"),
                    yaxis4 = dict(title="Eff [%]"),
                    yaxis7 = dict(title="Voc [V]"),
                    yaxis10 = dict(title="Isc [A]"),
                    yaxis13 = dict(title="FF [%]"),
                    yaxis16 = dict(title="Rs [mΩ]"),
                    yaxis19 = dict(title="Rsh [mΩ]"),
                    yaxis22 = dict(title="n samples"),
                    # barmode='overlay', 
                    bargap = 0.05
                    )
fig.show()

## Define functions

In [176]:
# Firstly define the function to generate p-values from n_test statistical repeated tests.
def calculate_p_values_RBer_stacked(n_sample, n_test, cate_interest, Halternative):
    # Conduct the statistical test for 1000 times by resampling, this is what the p-value distribution should 
    # look like if two groups of data are all from the REF.
    data_test = df_RBerStacked[['category', cate_interest]]

    p_values_mannwhitneyu_RR = []
    p_values_mannwhitneyu_RS = []
    p_values_ttest_RR = []
    p_values_ttest_RS = []


    for _ in range(n_test):
        # Bootstrap resampling with replacement: draw n_sample cells from each category, twice,
        # so that every comparison uses groups of equal size.
        resampled_data1 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))
        resampled_data2 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))

        # Determine the data used for different tests.
        data_RS = resampled_data1[resampled_data1['category'] == 'Ref2'][cate_interest] # data used for both Ref Vs Ref and Ref Vs Ber_stacked
        data_R = resampled_data2[resampled_data2['category'] == 'Ref2'][cate_interest] # data only used for Ref Vs Ref
        data_S = resampled_data2[resampled_data2['category'] == 'Ber_stacked'][cate_interest] # data only used for Ref Vs Ber_stacked
        
        # t test is used after the solar cell data are transformed by Box-Cox.
        tRR, p_value_ttest_RR_ = stats.ttest_ind(data_RS, data_R)
        tRS, p_value_ttest_RS_ = stats.ttest_ind(data_RS, data_S, equal_var = False)

        # The mann-whitney u test is more suitable for skewed data like our solar cell data. 
        _, p_value_mannwhitneyu_RR = stats.mannwhitneyu(data_RS, data_R, alternative = Halternative)
        _, p_value_mannwhitneyu_RS = stats.mannwhitneyu(data_RS, data_S, alternative = Halternative)

        # Append the results of each repetition to a list for later visualisation.
        p_values_mannwhitneyu_RR.append(p_value_mannwhitneyu_RR)
        p_values_mannwhitneyu_RS.append(p_value_mannwhitneyu_RS)

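        # Convert the two-sided t-test p-value to a one-sided p-value: halve it when the sign of the
        # t statistic agrees with the alternative hypothesis, otherwise use 1 - p/2.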

        if Halternative == 'greater':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR >= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RS = p_value_ttest_RS_ / 2 if tRS >= 0 else 1 - p_value_ttest_RS_ / 2
        elif Halternative == 'less':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR <= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RS = p_value_ttest_RS_ / 2 if tRS <= 0 else 1 - p_value_ttest_RS_ / 2
        else:
             p_value_ttest_RR = p_value_ttest_RR_
             p_value_ttest_RS = p_value_ttest_RS_
            
        p_values_ttest_RR.append(p_value_ttest_RR)
        p_values_ttest_RS.append(p_value_ttest_RS)

    # Combine results to a single dataframe
    data1 = {'p_value': p_values_mannwhitneyu_RR + p_values_mannwhitneyu_RS,
            'test': (['MannWhitneyU'] * n_test * 2),
            'category': (['Ref2 Vs Ref2'] * n_test) + (['Ref2 Vs Ber_stacked'] * n_test)}

    data2 = {'p_value': p_values_ttest_RR + p_values_ttest_RS,
            'test': (['Ttest'] * n_test * 2),
            'category': (['Ref2 Vs Ref2'] * n_test) + (['Ref2 Vs Ber_stacked'] * n_test)}

    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)

    # Concatenate the DataFrames vertically
    combined_df = pd.concat([df1, df2], ignore_index=True)
    combined_df['type'] = combined_df['test'] + '/' + combined_df['category']
    
    # Plot the results as a faceted histogram (px.histogram builds its own subplot grid).
    batch_colors = ['rgb(31, 119, 180)', 'rgba(31, 119, 180, 0.6)', 'rgb(44, 160, 44)', 'rgba(44, 160, 44, 0.6)']
 
    fig = px.histogram(combined_df, x="p_value", color= 'type',
                    histnorm='probability', color_discrete_sequence = batch_colors,
                    facet_row="test", facet_col="category")    
    fig.add_vline(x=0.05,  line_width=1, line_dash="dash", line_color='rgb(214, 39, 40)')
    fig.update_traces(xbins=dict(start=0,size=0.05))
    fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x", yref="paper", font_size=12, showarrow=False)
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x2", yref="paper", font_size=12, showarrow=False)
    fig.update_layout(title_text=f"<b>Distribution of p-values for {cate_interest}</b> (n_sample = {n_sample}, n_test = {n_test})", 
                            height=500, width=780, template='plotly',
                            bargap=0.1)
    fig.show()

## Eff

In [114]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='Eff', Halternative='greater')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis that the two populations share the same underlying distribution. The two groups therefore differ significantly from each other.<br>
'greater' means the alternative hypothesis is that the values in the `Ref2` group are larger than those in the `Bernoulli with stacked cells in a pallet` group.
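
The phrase "most of the tests" can be turned into a number by computing the share of resampled tests with a p-value below 0.05. The following is a minimal sketch on synthetic data, not the measured cells: the helper `rejection_rate`, the normal distributions, and their parameters are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

def rejection_rate(p_values, alpha=0.05):
    """Return the fraction of p-values strictly below alpha."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values < alpha))

# Synthetic stand-in pools (hypothetical values, NOT the measured Eff data).
rng = np.random.default_rng(0)
ref_pool = rng.normal(loc=15.0, scale=0.5, size=500)
treated_pool = rng.normal(loc=14.7, scale=0.5, size=500)

p_values = []
for _ in range(200):
    # Bootstrap with replacement: 120 values from each pool per repetition.
    a = rng.choice(ref_pool, size=120, replace=True)
    b = rng.choice(treated_pool, size=120, replace=True)
    p_values.append(stats.mannwhitneyu(a, b, alternative='greater').pvalue)

rate = rejection_rate(p_values)
print(f"Share of resampled tests with p < 0.05: {rate:.1%}")
```

Applied to the p-value lists collected inside the function above, the same helper would give one rejection share per test and comparison, which is easier to compare across parameters than the histograms alone.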

## Voc

In [117]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='Voc', Halternative='greater')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis in favour of the alternative hypothesis.<br>
'greater' means the alternative hypothesis is that the values in the `Ref2` group are larger than those in the `Bernoulli with stacked cells in a pallet` group.
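
The function above obtains one-sided t-test p-values by halving the two-sided p-value according to the sign of the t statistic. As a cross-check, SciPy 1.6 and later accept an `alternative` argument in `stats.ttest_ind` directly. The sketch below compares the two routes on synthetic data; the helper name `one_sided_from_two_sided` and the sample values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

def one_sided_from_two_sided(t_stat, p_two_sided, alternative):
    """Convert a two-sided t-test p-value to a one-sided p-value."""
    if alternative == 'greater':
        return p_two_sided / 2 if t_stat >= 0 else 1 - p_two_sided / 2
    if alternative == 'less':
        return p_two_sided / 2 if t_stat <= 0 else 1 - p_two_sided / 2
    return p_two_sided

# Hypothetical Voc-like samples (NOT the measured data).
rng = np.random.default_rng(1)
a = rng.normal(0.620, 0.01, size=120)
b = rng.normal(0.615, 0.01, size=120)

# Manual conversion from the two-sided Welch t-test.
t_stat, p_two = stats.ttest_ind(a, b, equal_var=False)
p_manual = one_sided_from_two_sided(t_stat, p_two, 'greater')

# Direct one-sided Welch t-test (requires SciPy >= 1.6).
p_direct = stats.ttest_ind(a, b, equal_var=False, alternative='greater').pvalue

print(f"manual: {p_manual:.6g}, direct: {p_direct:.6g}")
```

Both routes should agree up to floating-point precision, so passing `alternative` directly is a possible simplification when the installed SciPy version supports it.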

## Isc

In [118]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='Isc', Halternative='greater')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis in favour of the alternative hypothesis.<br>
'greater' means the alternative hypothesis is that the values in the `Ref2` group are larger than those in the `Bernoulli with stacked cells in a pallet` group.

## FF

In [119]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='FF', Halternative='greater')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis in favour of the alternative hypothesis.<br>
'greater' means the alternative hypothesis is that the values in the `Ref2` group are larger than those in the `Bernoulli with stacked cells in a pallet` group.

## Rs

In [121]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='Rs', Halternative='less')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis in favour of the alternative hypothesis.<br>
'less' means the alternative hypothesis is that the values in the `Ref2` group are smaller than those in the `Bernoulli with stacked cells in a pallet` group.
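
The direction of the one-sided alternative depends on the argument order: in `stats.mannwhitneyu(x, y, alternative='less')` the alternative is that the distribution underlying `x` is stochastically less than the one underlying `y`. A small sketch with synthetic values (assumed numbers, not the measured Rs data) makes the direction explicit.

```python
import numpy as np
from scipy import stats

# Hypothetical samples: `low` is centred below `high` by construction.
rng = np.random.default_rng(2)
low = rng.normal(1.0, 0.2, size=120)
high = rng.normal(1.3, 0.2, size=120)

# First argument plays the role of Ref2, second the treated group.
p_less = stats.mannwhitneyu(low, high, alternative='less').pvalue
p_greater = stats.mannwhitneyu(low, high, alternative='greater').pvalue

print(f"alternative='less':    p = {p_less:.3g}")     # small: low < high is supported
print(f"alternative='greater': p = {p_greater:.3g}")  # large: low > high is not supported
```

Choosing the wrong direction therefore yields p-values near 1 rather than near 0, which is worth keeping in mind when reading the histograms.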

## Rsh

In [123]:
calculate_p_values_RBer_stacked(n_sample=120, n_test=1000, cate_interest='Rsh', Halternative='greater')

**Interpretation**: Most of the tests give p-values below 0.05, which suggests rejecting the null hypothesis in favour of the alternative hypothesis.<br>
'greater' means the alternative hypothesis is that the values in the `Ref2` group are larger than those in the `Bernoulli with stacked cells in a pallet` group.

# Ref2 Vs Silicone using vacuum (Ref2 Vs Group 4)

In [145]:
df_RSili_Vac = df_p4[(df_p4['cell_created'] > '2023-09-12 18:31:03')]
batch_names = ['Sili_Vac', 'Ref2']
batch_colors = [# 'rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)', 'rgb(214, 39, 40)', #
                'rgb(148, 103, 189)', 
                'rgb(140, 86, 75)', 'rgb(227, 119, 194)', 'rgb(127, 127, 127)', 'rgb(188, 189, 34)']
fig_height = 800
fig_width = 780

def eachRowPlot(date, rowName, n_row):
    # Add the scatter, box and histogram traces of one parameter (rowName) to subplot row n_row.
    fig.add_trace(go.Scatter(x = df_group[date], y = df_group[rowName], 
                                mode = "markers", marker_size = 6, name = batch_name,
                                marker_line_color = batch_color, showlegend = False,
                                marker_color = rgb_to_rgba(batch_color,0.5), 
                                marker_line_width=0.2), 
                                row=n_row, col=1)
    fig.add_trace(go.Box(y = df_group[rowName], name = batch_name,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=2) 
    fig.add_trace(go.Histogram(x = df_group[rowName], name = batch_name, histnorm='probability', nbinsx=20,
                            marker_color = rgb_to_rgba(batch_color,0.5), showlegend = False), 
                            row=n_row, col=3)

fig = make_subplots(rows=8, cols=3#, shared_xaxes=True #,vertical_spacing=0.01, horizontal_spacing= 0.01
                    ) # Set layout for multiple plots

for i in [*range(len(batch_names))]:
    batch_name = batch_names[i]
    batch_color = batch_colors[i]
    df_group = df_RSili_Vac[df_RSili_Vac['category'] == batch_name] 
    
    # Pyrometer
    eachRowPlot(date = "cell_created", rowName = 'Pyro_S6S7', n_row = 1)
    
    # Eff
    eachRowPlot(date = "cell_created", rowName = 'Eff', n_row = 2)
    
    # Voc
    eachRowPlot(date = "cell_created", rowName = 'Voc', n_row = 3)
    
    # Isc
    eachRowPlot(date = "cell_created", rowName = 'Isc', n_row = 4)

    # FF
    eachRowPlot(date = "cell_created", rowName = 'FF', n_row = 5)

    # Rs
    eachRowPlot(date = "cell_created", rowName = 'Rs', n_row = 6)

    # Rsh
    eachRowPlot(date = "cell_created", rowName = 'Rsh', n_row = 7)

    fig.add_trace(go.Histogram(x = df_group['category'], 
                                   marker_color = rgb_to_rgba(batch_color,0.5), 
                                   name = batch_name, 
                                   texttemplate= '%{y}', textfont_size=15
                                   ), row=8, col=1)

  
fig.update_layout(height=fig_height, width=fig_width, 
                    margin = dict(l=2, r=2, t=15, b=2),
                    legend=dict(orientation="h", yanchor="bottom", y=1.01, xanchor="left", x=0.01),
                    yaxis = dict(title="PyroS67 [°C]"),
                    yaxis4 = dict(title="Eff [%]"),
                    yaxis7 = dict(title="Voc [V]"),
                    yaxis10 = dict(title="Isc [A]"),
                    yaxis13 = dict(title="FF [%]"),
                    yaxis16 = dict(title="Rs [mΩ]"),
                    yaxis19 = dict(title="Rsh [mΩ]"),
                    yaxis22 = dict(title="n samples"),
                    barmode='overlay', 
                    bargap = 0.05
                    )
fig.show()

## Define functions

In [141]:
# First, define a function that generates p-values from n_test repeated statistical tests.
def calculate_p_values_RSili_Vac(n_sample, n_test, cate_interest, Halternative):
    # Repeat the statistical tests n_test times on resampled data. The 'Ref2 Vs Ref2' results show
    # what the p-value distribution looks like when both samples come from the reference group.
    data_test = df_RSili_Vac[['category', cate_interest]]

    p_values_mannwhitneyu_RR = []
    p_values_mannwhitneyu_RS = []
    p_values_ttest_RR = []
    p_values_ttest_RS = []


    for _ in range(n_test):
        # Bootstrap resampling with replacement: draw n_sample cells from each category, twice,
        # so that every comparison uses groups of equal size.
        resampled_data1 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))
        resampled_data2 = data_test.groupby('category', group_keys=False).apply(lambda x: x.sample(n=n_sample, replace=True))

        # Determine the data used for different tests.
        data_RS = resampled_data1[resampled_data1['category'] == 'Ref2'][cate_interest] # data used for both Ref Vs Ref and Ref Vs Sili_Vac
        data_R = resampled_data2[resampled_data2['category'] == 'Ref2'][cate_interest] # data only used for Ref Vs Ref
        data_S = resampled_data2[resampled_data2['category'] == 'Sili_Vac'][cate_interest] # data only used for Ref Vs Sili_Vac
        
        # t test is used after the solar cell data are transformed by Box-Cox.
        tRR, p_value_ttest_RR_ = stats.ttest_ind(data_RS, data_R)
        tRS, p_value_ttest_RS_ = stats.ttest_ind(data_RS, data_S, equal_var = False)

        # The mann-whitney u test is more suitable for skewed data like our solar cell data. 
        _, p_value_mannwhitneyu_RR = stats.mannwhitneyu(data_RS, data_R, alternative = Halternative)
        _, p_value_mannwhitneyu_RS = stats.mannwhitneyu(data_RS, data_S, alternative = Halternative)

        # Append the results of each repetition to a list for later visualisation.
        p_values_mannwhitneyu_RR.append(p_value_mannwhitneyu_RR)
        p_values_mannwhitneyu_RS.append(p_value_mannwhitneyu_RS)

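        # Convert the two-sided t-test p-value to a one-sided p-value: halve it when the sign of the
        # t statistic agrees with the alternative hypothesis, otherwise use 1 - p/2.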

        if Halternative == 'greater':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR >= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RS = p_value_ttest_RS_ / 2 if tRS >= 0 else 1 - p_value_ttest_RS_ / 2
        elif Halternative == 'less':
             p_value_ttest_RR = p_value_ttest_RR_ / 2 if tRR <= 0 else 1 - p_value_ttest_RR_ / 2
             p_value_ttest_RS = p_value_ttest_RS_ / 2 if tRS <= 0 else 1 - p_value_ttest_RS_ / 2
        else:
             p_value_ttest_RR = p_value_ttest_RR_
             p_value_ttest_RS = p_value_ttest_RS_
            
        p_values_ttest_RR.append(p_value_ttest_RR)
        p_values_ttest_RS.append(p_value_ttest_RS)

    # Combine results to a single dataframe
    data1 = {'p_value': p_values_mannwhitneyu_RR + p_values_mannwhitneyu_RS,
            'test': (['MannWhitneyU'] * n_test * 2),
            'category': (['Ref2 Vs Ref2'] * n_test) + (['Ref2 Vs Sili_Vac'] * n_test)}

    data2 = {'p_value': p_values_ttest_RR + p_values_ttest_RS,
            'test': (['Ttest'] * n_test * 2),
            'category': (['Ref2 Vs Ref2'] * n_test) + (['Ref2 Vs Sili_Vac'] * n_test)}

    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)

    # Concatenate the DataFrames vertically
    combined_df = pd.concat([df1, df2], ignore_index=True)
    combined_df['type'] = combined_df['test'] + '/' + combined_df['category']
    
    # Plot the results as a faceted histogram (px.histogram builds its own subplot grid).
    batch_colors = ['rgb(31, 119, 180)', 'rgba(31, 119, 180, 0.6)', 'rgb(44, 160, 44)', 'rgba(44, 160, 44, 0.6)']
 
    fig = px.histogram(combined_df, x="p_value", color= 'type',
                    histnorm='probability', color_discrete_sequence = batch_colors,
                    facet_row="test", facet_col="category")    
    fig.add_vline(x=0.05,  line_width=1, line_dash="dash", line_color='rgb(214, 39, 40)')
    fig.update_traces(xbins=dict(start=0,size=0.05))
    fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x", yref="paper", font_size=12, showarrow=False)
    fig.add_annotation(text= '0.05', x=0.05, y=-0.07,  font_color= 'rgb(214, 39, 40)',
                    xref="x2", yref="paper", font_size=12, showarrow=False)
    fig.update_layout(title_text=f"<b>Distribution of p-values for {cate_interest}</b> (n_sample = {n_sample}, n_test = {n_test})", 
                            height=500, width=780, template='plotly',
                            bargap=0.1)
    fig.show()

## Eff

In [142]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='Eff', Halternative='less')

**Interpretation**: The t-tests and the Mann-Whitney U tests give contradictory results. Because the Eff data are quite skewed, the Mann-Whitney results are the more reliable of the two. Most of the Mann-Whitney tests reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution, in favour of the alternative hypothesis that the `Ref2` values are smaller than those of the `Silicone with vacuum` group.
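
The claim that the data are skewed can be backed by a number such as the sample skewness from `scipy.stats.skew`, reported next to the two tests. The sketch below uses synthetic left-skewed values (a ceiling minus an exponential tail); the distribution and its parameters are assumptions for illustration and do not represent the measured cells.

```python
import numpy as np
from scipy import stats

# Hypothetical left-skewed "efficiency" values (NOT the measured data):
# most values sit near a ceiling, with a tail towards low values.
rng = np.random.default_rng(3)
ref = 16.0 - rng.exponential(scale=1.0, size=300)
treated = 16.1 - rng.exponential(scale=1.0, size=300)

# Sample skewness: negative values indicate a longer tail on the low side.
skew_ref = stats.skew(ref)
skew_treated = stats.skew(treated)

# One-sided Welch t-test and Mann-Whitney U test on the same samples.
p_t = stats.ttest_ind(ref, treated, equal_var=False, alternative='less').pvalue
p_u = stats.mannwhitneyu(ref, treated, alternative='less').pvalue

print(f"skewness: ref = {skew_ref:.2f}, treated = {skew_treated:.2f}")
print(f"t-test p = {p_t:.3g}, Mann-Whitney U p = {p_u:.3g}")
```

The t-test compares means and is sensitive to long tails, whereas the Mann-Whitney U test works on ranks, which is why the two can disagree on skewed data.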

## Voc

In [143]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='Voc', Halternative='less')

**Interpretation**: The t-tests and the Mann-Whitney U tests give contradictory results. Because the Voc data are quite skewed, the Mann-Whitney results are the more reliable of the two. Most of the Mann-Whitney tests reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution, in favour of the alternative hypothesis that the `Ref2` values are smaller than those of the `Silicone with vacuum` group.

## Isc

In [144]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='Isc', Halternative='less')

**Interpretation**: The t-tests and the Mann-Whitney U tests give similar results when the data are less skewed. Both reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution, in favour of the alternative hypothesis that the `Ref2` values are smaller than those of the `Silicone with vacuum` group.

## FF

In [130]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='FF', Halternative='less')

**Interpretation**: The t-tests and the Mann-Whitney U tests give contradictory results. Because the FF data are also quite skewed, the Mann-Whitney results are the more reliable of the two. Most of the Mann-Whitney tests reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution, in favour of the alternative hypothesis that the `Ref2` values are smaller than those of the `Silicone with vacuum` group.

## Rs

In [132]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='Rs', Halternative='two-sided')

**Interpretation**: Neither the t-tests nor the Mann-Whitney U tests reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution.
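
For comparison, when both samples really come from the same distribution, the p-values are spread roughly uniformly between 0 and 1, so only about 5% of them fall below 0.05. The sketch below illustrates this baseline with synthetic data; the normal distribution and its parameters are assumptions, not measured Rs values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_test, n_sample, alpha = 1000, 120, 0.05

p_values = np.empty(n_test)
for i in range(n_test):
    # Both samples are drawn from the SAME hypothetical distribution.
    a = rng.normal(2.0, 0.3, size=n_sample)
    b = rng.normal(2.0, 0.3, size=n_sample)
    p_values[i] = stats.mannwhitneyu(a, b, alternative='two-sided').pvalue

share_below_alpha = float(np.mean(p_values < alpha))
print(f"Share of p-values below {alpha}: {share_below_alpha:.1%}")
```

A histogram for a treated group that looks like this baseline, or like the `Ref2 Vs Ref2` panel, gives no grounds to reject the null hypothesis.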

## Rsh

In [146]:
calculate_p_values_RSili_Vac(n_sample=120, n_test=1000, cate_interest='Rsh', Halternative='less')

**Interpretation**: Neither the t-tests nor the Mann-Whitney U tests reject the null hypothesis that the `Ref2` and `Silicone with vacuum` groups share the same distribution.