# Balanced Risk Set Matching
Detablan, Paul France M.

# Balanced Risk Set Matching (Li et al., 2001)
The paper studies cystoscopy and hydrodistention as a treatment for interstitial cystitis, a chronic, non-life-threatening condition with persistent symptoms. Because patients receive the treatment at different times, each patient treated at a given time is compared with a patient who had a similar symptom history up to that time but had not yet been treated. The analysis therefore focuses on how treatment, and the timing of treatment, affects subsequent symptoms.

Link to article: [Balanced Risk Set Matching: Journal of the American Statistical Association](https://doi.org/10.1198/016214501753208573)
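
The matching idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: the symptom vectors are made up, the variable names are hypothetical, and the closest control is picked greedily by Mahalanobis distance, whereas the paper formulates matching as an optimization that also balances covariate distributions.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical symptom histories up to time t: (pain, urgency, frequency).
treated = np.array([[6.0, 5.0, 12.0]])  # one patient treated at time t
not_yet_treated = np.array([            # candidates still untreated at time t
    [2.0, 1.0, 4.0],
    [6.0, 4.0, 11.0],
    [9.0, 9.0, 18.0],
    [5.0, 7.0, 8.0],
])

# The Mahalanobis metric needs the inverse covariance of the covariates.
# pinv is used so the sketch also tolerates a near-singular covariance.
pooled = np.vstack([treated, not_yet_treated])
inv_cov = np.linalg.pinv(np.cov(pooled, rowvar=False))

# One row per treated patient, one column per candidate control.
distances = cdist(treated, not_yet_treated, metric="mahalanobis", VI=inv_cov)

# Greedy choice (an assumption of this sketch): take the closest candidate.
best_control = int(np.argmin(distances[0]))
```

Here `best_control` is the row index in `not_yet_treated` of the candidate with the most similar symptom history.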

# Programming Assignment (Data Analytics)

This assignment is a test of your algorithmic thinking. You can use any A.I. tools to assist you in this coding assignment.

Instructions:

- This is done in pairs, preferably with your thesis partner.
- Each person should create a GitHub repo titled 'Assignment_1_Data_Analytics'.
- Read the journal article provided.
- Develop a Python implementation of the procedures in the article.
- The deadline is before pre-midterm week.

## Install packages

In [None]:
pip install numpy pandas scipy matplotlib

## Import required packages

In [3]:
# Import numerical and data analysis libraries  
import numpy as np  
import pandas as pd  

# Include distance computation and utilities  
from scipy.spatial.distance import cdist  
from collections import defaultdict  

# Add plotting capabilities  
import matplotlib.pyplot as plt  


## Initialization of Global Variables and Seed

In [9]:
import numpy as np

# Set the random seed for reproducibility
np.random.seed(25)

# Define key variables
total_patients = 210
years_of_evaluation = 4
months_of_evaluation = years_of_evaluation * 12
MAX_ALLOWED_MATCHES = 100

# Titles for data visualization
PLOT_TITLES = [
    'Initial Assessment',
    'During Treatment',
    '3 Months Post-Treatment',
    '6 Months Post-Treatment',
    'Change (3 Months Post-Treatment)',
    'Change (6 Months Post-Treatment)'
]

# Labels for categorizing groups
GROUP_LABELS = ['Untreated/Treated Later', 'Treated']


## Initialize patient assessment data at the time of entry

In [10]:
# Simulate the baseline assessment recorded when a patient is registered
initial_data = pd.DataFrame({
    "patient_id": np.arange(total_patients),
    "gender": np.random.choice(['M', 'F'], total_patients),
    "pain_level": np.random.randint(0, 10, total_patients),
    "urgency_score": np.random.randint(0, 10, total_patients),
    "frequency_count": np.random.randint(0, 20, total_patients)
})

initial_data.head()

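|   | patient_id | gender | pain_level | urgency_score | frequency_count |
|---|------------|--------|------------|---------------|-----------------|
| 0 | 0 | M | 5 | 4 | 10 |
| 1 | 1 | M | 6 | 0 | 18 |
| 2 | 2 | M | 9 | 3 | 16 |
| 3 | 3 | F | 8 | 9 | 6 |
| 4 | 4 | F | 6 | 0 | 19 |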


Patients undergo evaluations approximately every 3 months for a period of up to 4 years. During each assessment, three key metrics are recorded:

- Pain Level
- Urgency Score
- Nocturnal Frequency

Both pain level and urgency score are subjective ratings measured on a scale from 0 to 9. Nocturnal frequency is a count, stored as `frequency_count` in the code.
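
The schedule and rating scale described above can be written down directly. This is a small sketch, assuming visits fall exactly on 3-month marks (real visits are only approximately quarterly); `visit_months` and `scores_in_range` are hypothetical names introduced for illustration.

```python
import numpy as np

# Assumed schedule: one visit every 3 months for 4 years (months 3, 6, ..., 48).
visit_months = np.arange(3, 4 * 12 + 1, 3)

def scores_in_range(pain, urgency):
    """Return True when both subjective ratings lie on the 0-9 scale."""
    return 0 <= pain <= 9 and 0 <= urgency <= 9
```

Under this assumption each patient has 16 scheduled visits, so every per-patient mean is an average of 16 values.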

## Conduct evaluations at 3-month intervals

In [11]:
# Collect one small DataFrame per evaluation and concatenate once at the end
evaluation_frames = []

for patient in range(total_patients):
    # Draw a treatment month from the 3-month schedule, or None if the patient is never treated
    selected_treatment_time = np.random.choice(list(np.arange(3, months_of_evaluation + 1, 3)) + [None])

    # Conduct evaluations every 3 months for up to 4 years
    for month in range(3, months_of_evaluation + 1, 3):
        pain_level = np.random.randint(0, 10, 1)
        urgency_score = np.random.randint(0, 10, 1)
        frequency_count = np.random.randint(0, 20, 1)
        time_elapsed = month

        # A patient counts as treated from the selected treatment month onward
        if selected_treatment_time is None or month < selected_treatment_time:
            treatment_start = None
            treatment_status = 0
        else:
            treatment_start = selected_treatment_time
            treatment_status = 1

        # Store the evaluation data for the patient
        evaluation_frames.append(pd.DataFrame({
            'patient_id': patient,
            'pain_level': pain_level,
            'urgency_score': urgency_score,
            'frequency_count': frequency_count,
            'time_elapsed': time_elapsed,
            'treatment_start': treatment_start,
            'treatment_status': treatment_status
        }))

# Concatenating once avoids copying a growing DataFrame on every iteration
patient_evaluations = pd.concat(evaluation_frames)

# Compute the average evaluation scores for each patient
patient_evaluations.groupby('patient_id')[['pain_level', 'urgency_score', 'frequency_count']].mean().head()

| patient_id | pain_level | urgency_score | frequency_count |
|------------|------------|---------------|-----------------|
| 0 | 5.125 | 4.375 | 11.5625 |
| 1 | 4.25 | 3.1875 | 9.0 |
| 2 | 5.375 | 4.5 | 11.625 |
| 3 | 4.625 | 3.9375 | 9.0 |
| 4 | 3.9375 | 5.0 | 11.75 |
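
With evaluations stored in this long format, the risk set at a given month (patients who have not yet been treated at that month) can be read off the `treatment_status` column. The following is a minimal sketch on a toy table that reuses the column names above; `first_treatment_month` and `risk_set` are hypothetical helpers, and treating "treated later or never" as the eligibility rule is an assumption of this sketch.

```python
import pandas as pd

# Toy long-format evaluations using the same column names as above.
evals = pd.DataFrame({
    "patient_id":       [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "time_elapsed":     [3, 6, 9, 3, 6, 9, 3, 6, 9],
    "treatment_status": [0, 1, 1, 0, 0, 1, 0, 0, 0],
})

def first_treatment_month(evals):
    """Month of the first treated visit per patient; NaN if never treated."""
    treated_rows = evals[evals["treatment_status"] == 1]
    first = treated_rows.groupby("patient_id")["time_elapsed"].min()
    # Reindex so that never-treated patients appear with NaN.
    return first.reindex(evals["patient_id"].unique())

def risk_set(evals, month):
    """Patient ids not yet treated at `month` (treated later or never)."""
    first = first_treatment_month(evals)
    eligible = first[first.isna() | (first > month)]
    return sorted(eligible.index)

controls_at_6 = risk_set(evals, 6)
```

In the toy table, patient 0 is first treated at month 6, so `controls_at_6` holds patients 1 (treated later, at month 9) and 2 (never treated).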
