<a href="https://colab.research.google.com/github/chenweioh/GCP-Inspector-Toolkit/blob/main/Site_Selection_SingleMetric_PValue_Calculator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#User Guide: Site Analysis Using Fisher's Exact Test for a Single Metric
###Overview
This Python script evaluates whether a single metric differs significantly across multiple study sites. It uses Fisher's Exact Test to calculate, for each site, a p-value that indicates how strongly that site's metric deviates from the overall metric.

###What Does This Script Do?

It calculates a p-value for a single metric (such as Enrollment/Screened, PD/Enrollment, or AE/Enrollment) at each site. For every site, it builds a 2x2 table that compares the site's counts with the counts from all remaining sites (the overall totals minus that site), runs Fisher's Exact Test on the table, and prints the resulting p-value.
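
As a rough illustration of the 2x2 table described above, the sketch below builds the table for one site in plain Python. The helper name `build_table` is hypothetical and not part of the script; the numbers are the first site and the totals from the script's example data.

```python
def build_table(site_num, site_den, total_num, total_den):
    """Return [[site numerator, site remainder], [other-sites numerator, other-sites remainder]].

    "Remainder" means denominator minus numerator, e.g. enrolled subjects without a PD.
    """
    site_rest = site_den - site_num                   # site subjects not counted in the numerator
    others_num = total_num - site_num                 # numerator count at all other sites
    others_rest = (total_den - total_num) - site_rest # remainder at all other sites
    return [[site_num, site_rest], [others_num, others_rest]]


# Site 1 of the example data: 5 out of 16, with overall totals 4347 out of 7437
table = build_table(5, 16, 4347, 7437)
print(table)  # [[5, 11], [4342, 3079]]
```

The four cells always add up to the overall denominator, which is a quick way to check that a table was built correctly.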

##How to Use It
###Prerequisites
**Local Machine:**
You must have Python installed on your computer. You also need the scipy library, which you can install using pip:

```
pip install scipy
```
**Google Colab**: Alternatively, you can run the script in Google Colab where installing Python or packages is not required. If using Colab, you might receive a warning. Just click "Run Anyway" as it is safe to proceed.
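
If you are unsure whether scipy is available in your environment, a minimal check along the following lines may help. The function name `scipy_available` is made up for this illustration; it only uses the standard library, so it runs even when scipy is missing.

```python
import importlib.util


def scipy_available():
    """Return True if the scipy package can be found in this Python environment."""
    return importlib.util.find_spec("scipy") is not None


if scipy_available():
    print("scipy is installed; the script can run here.")
else:
    print("scipy is missing; run 'pip install scipy' first.")
```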


###Steps to Run the Script
**Input Your Data**: Replace the example data in the `numerator_at_sites` and `denominator_at_sites` variables with your actual data for the specific metric you are investigating.

**Run the Script**: Run the script in your Python environment. If Google Colab shows a warning when you run it, click "Run Anyway"; it's safe.

**Review the Output**: The script prints a 2x2 table and the calculated p-value for each site. At the end, it also lists the two sites with the lowest p-values.
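
Before running the script, it can be worth sanity-checking the data entered in the first step, since inconsistent counts would produce negative cells in the 2x2 table. The sketch below is one possible set of checks, not part of the script; `check_inputs` is a hypothetical helper name.

```python
def check_inputs(numerators, denominators, total_num, total_den):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    if len(numerators) != len(denominators):
        problems.append("numerator and denominator lists differ in length")
    for site, (num, den) in enumerate(zip(numerators, denominators), start=1):
        if num < 0 or den < 0:
            problems.append(f"site {site}: negative count")
        elif num > den:
            problems.append(f"site {site}: numerator exceeds denominator")
    # The totals must include the listed sites, so they cannot be smaller than the sums
    if sum(numerators) > total_num:
        problems.append("total_numerator is smaller than the sum over sites")
    if sum(denominators) > total_den:
        problems.append("total_denominator is smaller than the sum over sites")
    if total_num > total_den:
        problems.append("total_numerator exceeds total_denominator")
    return problems


# The script's example data passes every check
example_num = [5, 10, 4, 1, 0, 1, 2, 1, 0, 1]
example_den = [16, 65, 25, 11, 9, 18, 4, 10, 13, 3]
print(check_inputs(example_num, example_den, 4347, 7437))  # []
```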

###Customizing for Different Metrics
To apply the script to different metrics, you only need to identify which of your variables will act as the numerator and which will be the denominator in the ratio you're investigating.

For example, for the metric PD/Enrollment:

- PD would be the numerator.
- Enrollment would be the denominator.

Replace the values of `numerator_at_sites` and `denominator_at_sites` in the code with the per-site counts for your metric. Make sure to set `total_numerator` and `total_denominator` to the total counts across all sites for the same metric.
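
The sketch below shows one way to switch between metrics without retyping the lists. It assumes the lists cover every site in the study, so the totals are simply the sums; if your totals include sites that are not listed, enter them by hand instead. The `site_counts` dictionary, its numbers, and the `metric_inputs` helper are hypothetical and exist only for illustration.

```python
# Hypothetical per-site counts for three sites; replace with your own data.
site_counts = {
    "Screened":   [30, 80, 40],
    "Enrollment": [16, 65, 25],
    "PD":         [5, 10, 4],
}


def metric_inputs(counts, numerator_name, denominator_name):
    """Return (numerators, denominators, total numerator, total denominator) for one metric."""
    numerators = counts[numerator_name]
    denominators = counts[denominator_name]
    # Assumption: the lists contain every site, so totals are plain sums
    return numerators, denominators, sum(numerators), sum(denominators)


# PD/Enrollment: PD is the numerator, Enrollment is the denominator
(numerator_at_sites, denominator_at_sites,
 total_numerator, total_denominator) = metric_inputs(site_counts, "PD", "Enrollment")
print(total_numerator, total_denominator)  # 19 106
```

Calling `metric_inputs(site_counts, "Enrollment", "Screened")` would prepare the Enrollment/Screened metric in the same way.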

And there you have it! You're now equipped to assess variations of a single metric across different study sites.

In [None]:
from scipy.stats import fisher_exact

# ====== INSTRUCTIONS ======

# This code calculates the Fisher's Exact Test p-values for different metrics (ratios) at multiple clinical trial sites.
# It sorts the sites based on the p-values in ascending order, highlighting the two sites with the smallest p-values.

# 1. Replace the example numbers in `numerator_at_sites` and `denominator_at_sites` with the actual numbers for your metric at each site.
# 2. Update `total_numerator` and `total_denominator` to reflect the total counts for the metric of interest across all sites.
# 3. Run the code. If you run this in Colab, you may receive a warning. Click "Run Anyway," as the code is safe.

# Example metrics:
# - For PD/Enrollment: PD is the numerator, Enrollment is the denominator
# - For AE/Enrollment: AE is the numerator, Enrollment is the denominator
# - For Enrollment/Screened: Enrollment is the numerator, Screened is the denominator

# ==========================

# Number of subjects for the numerator and the denominator for each site
# Replace these numbers with your actual data
numerator_at_sites = [5,10,4,1,0,1,2,1,0,1]  # Example data
denominator_at_sites = [16,65,25,11,9,18,4,10,13,3]  # Example data

# Total counts for the numerator and the denominator across all sites
# (the totals must include the sites listed above)
# Replace these numbers with your actual totals
total_numerator = 4347  # Example data
total_denominator = 7437 # Example data

# Initialize a list to hold p-values and site numbers
p_values = []

# Calculate p-value for each site
for i in range(len(numerator_at_sites)):
    # 2x2 table: rows are [this site, all other sites];
    # columns are [numerator count, denominator count minus numerator count]
    site_rest = denominator_at_sites[i] - numerator_at_sites[i]
    table = [
        [numerator_at_sites[i], site_rest],
        [total_numerator - numerator_at_sites[i], total_denominator - total_numerator - site_rest]
    ]
    print(f"2x2 Table for Site {i + 1}:")
    print(table)

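    # fisher_exact returns the odds ratio and the p-value; only the p-value is used here.
    # The test is two-sided by default.
    _, p_value = fisher_exact(table)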
    print(f"P-value for Site {i + 1}: {p_value}\n")

    p_values.append((i + 1, p_value))

# Sort by p-value
sorted_p_values = sorted(p_values, key=lambda x: x[1])

# Display the two sites with the smallest p-values
# (slicing avoids an IndexError when fewer than two sites are given)
print("Two sites with the smallest p-values:")
for site, site_p_value in sorted_p_values[:2]:
    print(f"Site {site}: p-value = {site_p_value}")


2x2 Table for Site 1:
[[5, 11], [4342, 3079]]
P-value for Site 1: 0.03939285773066778

2x2 Table for Site 2:
[[10, 55], [4337, 3035]]
P-value for Site 2: 1.4472900727476543e-12

2x2 Table for Site 3:
[[4, 21], [4343, 3069]]
P-value for Site 3: 1.742014621158726e-05

2x2 Table for Site 4:
[[1, 10], [4346, 3080]]
P-value for Site 4: 0.001041798030423139

2x2 Table for Site 5:
[[0, 9], [4347, 3081]]
P-value for Site 5: 0.00036650927376602347

2x2 Table for Site 6:
[[1, 17], [4346, 3073]]
P-value for Site 6: 3.500411442040307e-06

2x2 Table for Site 7:
[[2, 2], [4345, 3088]]
P-value for Site 7: 1.0

2x2 Table for Site 8:
[[1, 9], [4346, 3081]]
P-value for Site 8: 0.002296900000414938

2x2 Table for Site 9:
[[0, 13], [4347, 3077]]
P-value for Site 9: 1.0835999227526605e-05

2x2 Table for Site 10:
[[1, 2], [4346, 3088]]
P-value for Site 10: 0.5740673220037045

Two sites with the smallest p-values:
Site 2: p-value = 1.4472900727476543e-12
Site 6: p-value = 3.500411442040307e-06
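
To see where these p-values come from, the sketch below recomputes a two-sided Fisher exact p-value from the textbook definition using only the standard library. It is an illustration under stated assumptions, not SciPy's implementation: `fisher_two_sided` is a hypothetical name, and the small relative tolerance for ties is an assumption added to guard against floating-point rounding.

```python
from math import comb


def fisher_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table of non-negative integers."""
    (a, b), (c, d) = table
    row1 = a + b           # subjects at this site (site denominator)
    col1 = a + c           # numerator events across all sites
    total = a + b + c + d  # subjects across all sites

    def prob(x):
        # Hypergeometric probability of seeing x numerator events at the site
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    observed = prob(a)
    # Add up every possible outcome that is no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-7))


# Site 10 from the output above
print(round(fisher_two_sided([[1, 2], [4346, 3088]]), 4))  # 0.5741
```

For Site 7 the observed count is the most likely outcome, so every possible outcome is included in the sum and the p-value is 1.0, matching the output above.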
