# Generate Healthcare Data for Analysis

Create a mock dataset for 12 ambulatory practices including:
- Visit volume
- Average wait time
- Patient satisfaction score
- Appointment no-show rate
- Follow-up adherence rate
- Staff-to-patient ratio
- Provider productivity (visits per FTE)
- Quality measure compliance (e.g., A1C screening for diabetic patients)

Each row represents one month of data from one practice.

Refer to the [Data Dictionary](Practice/sr_data_analyst_project1_data_dictionary.md) for the data variables used in this dataset.

In [None]:
# Import Packages
import pandas as pd
import random
import numpy as np
import math
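Every generator in this notebook draws from either `random` or `numpy.random`, so the dataset changes on each run unless both are seeded. A minimal sketch of seeding them, assuming reproducibility is wanted; the `SEED` name and its value are arbitrary and not part of the original notebook:

```python
import random
import numpy as np

SEED = 42  # assumption: any fixed integer works

def seed_everything(seed=SEED):
    # Seed both generators used by the functions in this notebook.
    random.seed(seed)
    np.random.seed(seed)

seed_everything()
first = (random.randint(500, 5000), np.random.normal(75, 15))

seed_everything()
second = (random.randint(500, 5000), np.random.normal(75, 15))

# Same seed, same draws.
print(first == second)
```

Calling `seed_everything()` once, right after the imports, should be enough for a top-to-bottom run of the notebook.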

## Define functions to generate random values within specific ranges

### Visit Volume

In [None]:
def random_visit_volume():
    return random.randint(500,5000)

### Average Patient Wait Time Before Being Seen (in minutes)

In [None]:
def avg_wait_time_min():
    # Uniform draw between 0 and 60 minutes, rounded to 2 decimals
    return round(random.uniform(0, 60), 2)

### Average patient satisfaction score (0-100 scale)

In [None]:
def patient_satisfaction_score():
    # Uniform draw between 0 and 100, rounded to 2 decimals
    return round(random.uniform(0, 100), 2)

### Percentage of scheduled appointments that patients missed

**Best Approach: Beta Distribution (Adjustable Skew)**

The beta distribution works well for controlling skewness on a bounded scale. With alpha = 1, increasing beta above 1 pushes more of the distribution toward 0.

In [None]:
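def no_show_rate(mean=75, std_dev=15, min_val=0, max_val=100):
    # Note: this draws from a truncated normal distribution (rejection
    # sampling), not the beta distribution described above. The defaults
    # are the same as in followup_adherence_rate.
    while True:
        sample = np.random.normal(mean, std_dev)
        # Keep only samples inside [min_val, max_val]; otherwise redraw
        if min_val <= sample <= max_val:
            return int(round(sample))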
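The beta-distribution approach described in this section could look roughly like the sketch below. The function name `no_show_rate_beta` and the default `beta=5.0` are assumptions chosen for illustration, not values from the original notebook. A Beta(alpha, beta) distribution has mean alpha / (alpha + beta), so these defaults give an average no-show rate of about 16.7%.

```python
import numpy as np

def no_show_rate_beta(alpha=1.0, beta=5.0, rng=None):
    """Draw one no-show percentage (0-100) from a Beta(alpha, beta) distribution."""
    if rng is None:
        rng = np.random.default_rng()
    # rng.beta returns a value in [0, 1]; scale it to a percentage.
    return round(100 * float(rng.beta(alpha, beta)), 2)

# Quick check of the shape, using a fixed seed (hypothetical value).
rng = np.random.default_rng(0)
samples = np.array([no_show_rate_beta(rng=rng) for _ in range(10_000)])
print(samples.mean(), np.median(samples))
```

With alpha = 1 and beta > 1 the median sits below the mean, which is the lean toward 0 that the text describes. Raising `beta` strengthens the skew.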

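### Follow-up Adherence Rate

In [None]:
def followup_adherence_rate(mean=75, std_dev=15, min_val=0, max_val=100):
    # Truncated normal via rejection sampling: redraw until the sample
    # falls inside [min_val, max_val], then round to a whole percentage.
    while True:
        sample = np.random.normal(mean, std_dev)
        if min_val <= sample <= max_val:
            return int(round(sample))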
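The rejection loop above produces one value per call. For larger row counts, a batched variant may be more convenient. The sketch below is one possible way to do that; `truncated_normal_ints` is a hypothetical helper, not part of the original notebook, and it uses NumPy's `default_rng` instead of the global `np.random` state.

```python
import numpy as np

def truncated_normal_ints(n, mean=75, std_dev=15, min_val=0, max_val=100, seed=None):
    """Return n integer percentages from a normal distribution truncated to [min_val, max_val]."""
    rng = np.random.default_rng(seed)
    accepted = np.empty(0)
    # Draw in batches and keep only in-range samples until n are collected.
    while accepted.size < n:
        draws = rng.normal(mean, std_dev, size=n)
        in_range = draws[(draws >= min_val) & (draws <= max_val)]
        accepted = np.concatenate([accepted, in_range])
    return np.rint(accepted[:n]).astype(int)

rates = truncated_normal_ints(5000, seed=1)
print(rates[:5], rates.min(), rates.max())
```

The result can be passed to a DataFrame column directly in place of a list comprehension over the scalar function.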

## Create lists of possible values for the categorical variables

In [None]:
practice_id = ['AP001', 'AP002', 'AP003', 'AP004', 'AP005', 'AP006', 'AP007', 'AP008', 'AP009', 'AP010', 'AP011', 'AP012']
quarter = ['Q1', 'Q2', 'Q3', 'Q4']
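Drawing `Practice_ID` and `Quarter` at random gives an uneven number of rows per practice and period. If exactly one row per practice per month is wanted, as the introduction suggests, one option is a Cartesian product of the two lists. This is only a sketch: the `months` range and the month-to-quarter mapping (calendar quarters) are assumptions, not taken from the data dictionary.

```python
import itertools
import pandas as pd

practice_id = [f"AP{i:03d}" for i in range(1, 13)]  # AP001 .. AP012
months = range(1, 13)  # assumption: one calendar year of monthly rows

# One row for every (practice, month) combination.
grid = pd.DataFrame(
    list(itertools.product(practice_id, months)),
    columns=["Practice_ID", "Month"],
)

# Assumption: calendar quarters, so months 1-3 map to Q1, 4-6 to Q2, and so on.
grid["Quarter"] = "Q" + ((grid["Month"] - 1) // 3 + 1).astype(str)

print(grid.shape)
```

The metric columns can then be generated with `len(grid)` as the row count and attached to this frame.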


In [None]:
data = {
    'Practice_ID': [random.choice(practice_id) for _ in range(5000)],
    'Quarter': [random.choice(quarter) for _ in range(5000)],
    'Visit_Volume': [random_visit_volume() for _ in range(5000)],
    'Avg_Wait_Time_Min': [avg_wait_time_min() for _ in range(5000)],
    'Patient_Satisfaction_Score': [patient_satisfaction_score() for _ in range(5000)],
    'No_Show_Rate': [no_show_rate() for _ in range(5000)],
    'Followup_Adherence_Rate': [followup_adherence_rate() for _ in range(5000)],
    # No generator is defined yet for the columns below; they are
    # commented out so that this cell runs.
    # 'Staff_to_Patient_Ratio':
    # 'Provider_Productivity':
    # 'A1C_Screening_Compliance':
    # 'Total_Providers_FTE':
    # 'Total_Staff_FTE':
    # 'Unique_Patients_Seen':
}
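The remaining columns have no generators in this notebook. The sketch below shows one way they might be filled in. Every numeric range here is a hypothetical placeholder, and the staff-to-patient definition (staff FTE divided by unique patients) is an assumption; the data dictionary should take precedence. Provider productivity follows the "visits per FTE" definition from the introduction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed
n_rows = 5000  # same row count as the `data` dict

# Hypothetical ranges, chosen only for illustration.
visit_volume = rng.integers(500, 5001, size=n_rows)  # mirrors random_visit_volume()
total_providers_fte = rng.uniform(2, 20, size=n_rows).round(1)
total_staff_fte = (total_providers_fte * rng.uniform(2, 4, size=n_rows)).round(1)
# Assumption: each patient accounts for at least one visit.
unique_patients_seen = (visit_volume * rng.uniform(0.4, 0.9, size=n_rows)).astype(int)

extra = pd.DataFrame({
    "Visit_Volume": visit_volume,
    "Total_Providers_FTE": total_providers_fte,
    "Total_Staff_FTE": total_staff_fte,
    "Unique_Patients_Seen": unique_patients_seen,
    "A1C_Screening_Compliance": rng.uniform(0, 100, size=n_rows).round(2),
})

# Derived columns, computed from the base columns so they stay consistent.
extra["Provider_Productivity"] = (
    extra["Visit_Volume"] / extra["Total_Providers_FTE"]
).round(1)
extra["Staff_to_Patient_Ratio"] = (
    extra["Total_Staff_FTE"] / extra["Unique_Patients_Seen"]
).round(4)

print(extra.describe().loc[["min", "max"]])
```

Deriving the two ratio columns from the base columns, rather than sampling them independently, avoids rows where the productivity figure contradicts the visit volume and FTE counts.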