# Step 01: Patient Arrival Model
Here we construct a patient arrival model that randomly generates a set of arrival times. The clinic is assumed to be open from 8:00 AM to 5:00 PM (540 minutes), so arrivals after closing time are discarded.

In [1]:
import numpy as np
def generate_patient_arrivals(mean_interarrival_time=10, total_time=540):
    """
    Generate patient arrivals based on an exponential distribution of interarrival times.

    Parameters:
    mean_interarrival_time (float): The mean time between arrivals, in minutes.
    total_time (int): The total simulation time, in minutes (default is 540 minutes, i.e. the 9 hours from 8:00 to 17:00).

    Returns:
    arrival_times (numpy array): An array of patient arrival times within the specified total time.
    total_patients (int): The total number of patients arriving within the specified total time.
    """

    # Generate interarrival times following an exponential distribution
    # (10000 draws is ample for the mean interarrival times used in this notebook)
    interarrival_times = np.random.exponential(scale=mean_interarrival_time, size=10000)

    # Calculate the arrival time for each patient
    arrival_times = np.cumsum(interarrival_times)

    # Only keep arrival times within the total simulation time
    arrival_times = arrival_times[arrival_times <= total_time]
    
    # Round the arrival times to the nearest integer
    arrival_times = np.round(arrival_times).astype(int)
    
    # Calculate the total number of patients
    total_patients = len(arrival_times)
    
    return arrival_times, total_patients

Below, we will conduct a simple test of this program:

In [2]:
arrival_times_test, total_patients_test = generate_patient_arrivals()
print("Patient Arrival Times Vector:", arrival_times_test)
print("Total Number of Patients:", total_patients_test)

Patient Arrival Times Vector: [ 12  20  27  35  39  50  56  58  60  87  87  99 107 142 146 159 170 175
 181 184 191 197 200 204 207 223 236 237 240 243 245 252 256 262 269 275
 277 283 299 302 307 363 367 385 425 427 428 481 492 501 529 537 539 539]
Total Number of Patients: 54
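
As a rough sanity check on the count above: with exponential interarrival times of mean 10 minutes over a 540-minute day, the expected number of arrivals is about 540 / 10 = 54. The sketch below is not part of the original notebook. It assumes a seeded `numpy.random.default_rng` generator (the notebook itself uses the global `np.random` state) and a hypothetical helper name, `count_arrivals`, to compare the simulated average with that figure.

```python
import numpy as np

def count_arrivals(rng, mean_interarrival_time=10, total_time=540):
    """Count arrivals in [0, total_time] for exponential interarrival times."""
    gaps = rng.exponential(scale=mean_interarrival_time, size=10000)
    arrivals = np.cumsum(gaps)
    return int(np.sum(arrivals <= total_time))

rng = np.random.default_rng(seed=0)  # seeded so the check is repeatable
counts = [count_arrivals(rng) for _ in range(2000)]
average_count = float(np.mean(counts))
expected_count = 540 / 10  # total time divided by mean interarrival time

print(average_count, expected_count)
```

Individual days still vary noticeably around this average, which is why single test runs in this notebook show quite different patient counts.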


To improve the readability of the output, the following method converts the arrival times into the HH:MM time format.

In [3]:
def time_from_minutes(minutes_vector):
    """
    Convert a vector of minutes from 8:00 into a vector of times in HH:MM format.
    
    Parameters:
    minutes_vector (np.ndarray): A NumPy array containing integers representing the minutes passed since 8:00.
    
    Returns:
    np.ndarray: A NumPy array containing strings in HH:MM format representing the current time.
    """
    base_hour = 8
    base_minute = 0
    
    def convert_minutes_to_time(minutes):
        total_minutes = base_hour * 60 + base_minute + minutes
        hours = (total_minutes // 60) % 24
        minutes = total_minutes % 60
        return f"{hours:02}:{minutes:02}"
    
    time_vector = np.vectorize(convert_minutes_to_time)(minutes_vector)
    return time_vector

Then we do a quick test:

In [4]:
arrival_times_modified = time_from_minutes(arrival_times_test)
print("Patient Arrival Times Vector:", arrival_times_modified)

Patient Arrival Times Vector: ['08:12' '08:20' '08:27' '08:35' '08:39' '08:50' '08:56' '08:58' '09:00'
 '09:27' '09:27' '09:39' '09:47' '10:22' '10:26' '10:39' '10:50' '10:55'
 '11:01' '11:04' '11:11' '11:17' '11:20' '11:24' '11:27' '11:43' '11:56'
 '11:57' '12:00' '12:03' '12:05' '12:12' '12:16' '12:22' '12:29' '12:35'
 '12:37' '12:43' '12:59' '13:02' '13:07' '14:03' '14:07' '14:25' '15:05'
 '15:07' '15:08' '16:01' '16:12' '16:21' '16:49' '16:57' '16:59' '16:59']
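
The conversion itself is plain integer arithmetic, which can be checked on single values. The sketch below uses `divmod` and a hypothetical helper name, `minutes_to_clock`; it illustrates the same rule and is not a replacement for the notebook's function.

```python
def minutes_to_clock(minutes, base_hour=8):
    """Convert minutes elapsed since base_hour:00 into an HH:MM string."""
    hours, mins = divmod(base_hour * 60 + int(minutes), 60)
    return f"{hours % 24:02d}:{mins:02d}"

examples = [minutes_to_clock(m) for m in (12, 60, 539)]
print(examples)  # ['08:12', '09:00', '16:59']
```

These three values match the first, ninth, and last entries of the output above.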


# Step 02: All other parameters

Here we list all other parameters we need: 

1. Time spent in Reception or Precheck
2. Scores of PHQ-9
3. Scores of GAD-7
4. Time until the offline (in-person) visit
5. Transition to in-person? (Drawn at random only for patients whose PHQ and GAD scores are both below 10. Patients with a higher score are in-person from the start and are always kept.)
6. Diagnosis time (determined by the PHQ and GAD scores, as shown in the table below)

(The score range refers to the sum of the PHQ and GAD scores.)

| Score range  | Diagnosis time (min) |
|--------------|----------------------|
| 0-8          | 1-15                 |
| 9-18         | 15-25                |
| 19-28        | 25-35                |
| 29-38        | 35-45                |
| 39 and above | 45-60                |

Because the last two parameters depend on the first four, we first give the code that generates parameters 1 to 4.
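
For reference, the score table above can be read as a simple lookup on the summed score. The following is a minimal sketch using the standard-library `bisect` module; the names `BAND_UPPER_EDGES`, `PERIODS`, and `period_range` are hypothetical and only illustrate the table, they are not the notebook's implementation.

```python
import bisect

# Upper edge of every band except the last one, taken from the table
BAND_UPPER_EDGES = [8, 18, 28, 38]
# Diagnosis time range in minutes for each band, in table order
PERIODS = [(1, 15), (15, 25), (25, 35), (35, 45), (45, 60)]

def period_range(phq, gad):
    """Return the (min, max) diagnosis time for a pair of scores."""
    total = phq + gad
    index = bisect.bisect_left(BAND_UPPER_EDGES, total)
    return PERIODS[index]

example = period_range(12, 9)  # summed score 21 falls into the 19-28 band
print(example)  # (25, 35)
```

For example, a patient with PHQ 12 and GAD 9 has a summed score of 21 and therefore a diagnosis time between 25 and 35 minutes.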

- Time spent in Reception or Precheck

In [5]:
import random

def generate_time_spent_in_precheck(n):
    """
    Generate an array of time_spent_in_precheck values.

    Each value is drawn from a uniform distribution between 5 and 10 minutes
    and then rounded to the nearest integer.

    Parameters:
    n (int): The number of values to generate.

    Returns:
    numpy.ndarray: An array of n integers between 5 and 10.
    """
    # Initialize an empty list to store the generated values
    time_spent_in_precheck = []
    
    # Loop n times to generate n values
    for _ in range(n):
        # Generate a random float value between 5 and 10
        value = random.uniform(5, 10)
        # Append the generated value to the list
        time_spent_in_precheck.append(value)
    
    # Round to the nearest integer (this also converts the list to a NumPy array)
    time_spent_in_precheck = np.round(time_spent_in_precheck).astype(int)
    # Return the array of generated values
    return time_spent_in_precheck

Then we do a quick test: 

In [6]:
test_result1 = generate_time_spent_in_precheck(10)
print(test_result1)

[ 7  6  6  7  9 10  7  6  7  8]
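
One property of this approach is worth keeping in mind: rounding a uniform draw on [5, 10] makes the endpoints 5 and 10 only half as likely as the values 6 to 9, because each endpoint collects draws from an interval of width 0.5 instead of 1. The sketch below compares this with drawing integers directly. It assumes a seeded `numpy.random.default_rng` generator, which the notebook does not use, and is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

# Same idea as the notebook: draw uniformly, then round
rounded = np.round(rng.uniform(5, 10, size=n)).astype(int)
# Alternative: draw integers directly (upper bound is exclusive, so 5..10)
direct = rng.integers(5, 11, size=n)

rounded_share = np.bincount(rounded, minlength=11)[5:] / n
direct_share = np.bincount(direct, minlength=11)[5:] / n

print(rounded_share)  # roughly 0.1, 0.2, 0.2, 0.2, 0.2, 0.1
print(direct_share)   # roughly 1/6 each
```

Whether the lighter endpoints matter depends on the intended model; if every duration from 5 to 10 minutes should be equally likely, drawing integers directly is the simpler choice.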


- Time until the offline visit

In [7]:
def generate_time_to_offline():
    """
    Generate a random time to offline value within the range 10 to 540.

    Returns:
    int: A random integer within the range 10 to 540.
    """
    return int(random.uniform(10, 540))

- Scores of PHQ-9 and GAD-7

Here, we assume that both scores (PHQ-9 for depression and GAD-7 for anxiety) follow a normal distribution. This way, we can generate data for different patient populations by adjusting the mean and variance.

In [8]:
def generate_depression_scores(n, mu, sigma_squared, lower_bound, upper_bound):
    """
    Generate an array of questionnaire scores (used for both PHQ-9 and GAD-7).

    Each score follows a normal distribution with the specified mean and variance,
    is clipped to the range [lower_bound, upper_bound], and is rounded to an integer.

    Parameters:
    n (int): The number of scores to generate.
    mu (float): The mean of the normal distribution.
    sigma_squared (float): The variance of the normal distribution.
    lower_bound (float): The smallest allowed score (0 in this notebook).
    upper_bound (float): The largest allowed score (27 for PHQ-9, 21 for GAD-7).

    Returns:
    numpy.ndarray: An array of n integer scores between lower_bound and upper_bound.
    """
    # Calculate the standard deviation from the variance
    sigma = sigma_squared ** 0.5
    
    # Initialize an empty list to store the generated scores
    depression_scores = []
    
    # Loop n times to generate n scores
    for _ in range(n):
        # Generate a random score from the normal distribution
        score = random.gauss(mu, sigma)
        # Constrain the score to the range [lower_bound, upper_bound]
        if score < lower_bound:
            score = lower_bound
        elif score > upper_bound:
            score = upper_bound
        # Append the constrained score to the list
        depression_scores.append(score)
    
    # Round to the nearest integer
    depression_scores= np.round(depression_scores).astype(int)
    # Return the list of generated scores
    return depression_scores

Then we do a quick test on it.

In [9]:
test_result2 = generate_depression_scores(10, 8, 50, 0, 27)
print(test_result2)
test_result3 = generate_depression_scores(10, 8, 50, 0, 21)
print(test_result3)

[ 0  6 21 20 15  5  6 14 17  1]
[ 1  3  4  6  7  0  6 11 11 16]
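
The loop above is equivalent to clipping a normal sample, which can be written in one vectorized step. The sketch below uses `np.clip` with a seeded generator; the function name `clipped_normal_scores` is hypothetical. It also makes one side effect of clipping visible: every draw below the lower bound is moved onto the bound, so with mean 8 and variance 50 a noticeable share of scores (on the order of 14% by the normal CDF) ends up at exactly 0.

```python
import numpy as np

def clipped_normal_scores(rng, n, mu, sigma_squared, lower_bound, upper_bound):
    """Draw normal scores, clip them to the allowed range, and round to integers."""
    raw = rng.normal(loc=mu, scale=np.sqrt(sigma_squared), size=n)
    return np.round(np.clip(raw, lower_bound, upper_bound)).astype(int)

rng = np.random.default_rng(seed=2)
phq = clipped_normal_scores(rng, 100_000, mu=8, sigma_squared=50,
                            lower_bound=0, upper_bound=27)
share_at_zero = float(np.mean(phq == 0))

print(phq.min(), phq.max(), share_at_zero)
```

If that pile-up at the bounds is not wanted, a truncated normal distribution (redrawing values outside the range) would be an alternative; that is a different modelling choice from the one made in this notebook.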


Before moving on to the next step, we design a method to map the sum of PHQ and GAD scores to the diagnosis time.

In [10]:
def map_scores_to_periods(scores):
    """
    Map each depression index score to a corresponding period based on the given table.

    Parameters:
    scores (list): A list of depression index scores.

    Returns:
    list: A list of random values within the corresponding periods for each score.
    """
    # Define the mapping from score ranges to periods
    score_to_period = [
        (0, 8, 1, 15),
        (9, 18, 15, 25),
        (19, 28, 25, 35),
        (29, 38, 35, 45),
        (39, 49, 45, 60)
    ]
    
    # Initialize an empty list to store the mapped periods
    periods = []
    
    # Loop through each score in the input list
    for score in scores:
        # Find the corresponding period range for the score
        for min_score, max_score, min_period, max_period in score_to_period:
            if min_score <= score <= max_score:
                # Generate a random value within the period range
                period = random.uniform(min_period, max_period)
                periods.append(period)
                break
    
    # Round to the nearest integer
    periods= np.round(periods).astype(int)
    # Return the list of mapped periods
    return periods


Then we do a quick test: 

In [11]:
test_scores = [5, 10, 20, 30, 40]
test_result4 = map_scores_to_periods(test_scores)
print(test_result4)

[ 3 21 25 38 57]


# Step 03: Further data processing

Now we will organize the above data into a matrix in this way:

| Person 1 | Person 2 | ... |
|----------|----------|-----|
| Arrival time 1 | Arrival time 2 | ... |
| Time in Reception/Pre-check 1 | Time in Reception/Pre-check 2 | ... |
| PHQ score 1 | PHQ score 2 | ... |
| GAD score 1 | GAD score 2 | ... |


In [12]:
def generate_a_data_matrix(mean=20, mu=8, var=50):
    """
    This function generates a data matrix for patient information in a medical study.
    It includes the following steps:
    1. Generate the arrival times of the patients and the total number of patients.
    2. Generate the time each patient spends in precheck.
    3. Generate PHQ (Patient Health Questionnaire) depression scores for each patient.
    4. Generate GAD (Generalized Anxiety Disorder) scores for each patient.
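    5. Stack the generated data into a matrix and return it.

    Parameters:
        mean (float): Mean interarrival time, in minutes.
        mu (float): Mean of the normal distribution used for both scores.
        var (float): Variance of the normal distribution used for both scores.

    Returns: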
        numpy.ndarray: A 2D array where each row corresponds to a different type of data:
                       - Arrival times
                       - Time spent in precheck
                       - PHQ scores
                       - GAD scores
    """
    arrival_times, total_patients = generate_patient_arrivals(mean)
    time_in_precheck = generate_time_spent_in_precheck(total_patients)
    PHQ = generate_depression_scores(total_patients, mu, var, 0, 27)
    GAD = generate_depression_scores(total_patients, mu, var, 0, 21)
    
    return np.vstack((arrival_times, time_in_precheck, PHQ, GAD))

Then we do a quick test

In [13]:
print(generate_a_data_matrix())

[[ 54  60  94 110 111 122 140 151 157 161 165 174 188 193 205 211 234 238
  263 281 318 343 343 371 391 418 440]
 [  6   6   6   8   8   9   7   8   8   9   8   8   8  10   9   7   6  10
    8   8   7   9   6   6   7   7   8]
 [  0   6  14   9   5  14   4  13  15   6  10   0  16   0   1  11   2  11
    7   4   0  17   4   6  11  15  14]
 [ 12   0  12  15   0   8   7   9   5   9  17   0  10   1   8   0  11  12
   17   4  16   1   6   0   8   6  14]]


Now we simulate patients who move from an online consultation to an offline (in-person) visit. The approach is as follows:

1. We treat a patient who transitions to offline as a new patient. Since such a patient must enter the system through the offline channel, we assign them a mandatory offline tag.
2. In the code, a patient can receive the mandatory offline tag only if both scores (PHQ and GAD) are below 10; an eligible patient receives it with probability `n`.
3. All other patients do not have the mandatory offline tag.
4. The tag is stored as `True` if the patient has it and as `False` otherwise.
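
The tagging rule in the list above can be sketched in vectorized form with boolean masks. The function name `offline_tag` and the seeded generator are assumptions made for this illustration; the notebook's own implementation follows in the next cell and applies the same rule column by column.

```python
import numpy as np

def offline_tag(phq, gad, probability, rng):
    """Tag eligible patients (both scores below 10) with the given probability."""
    eligible = (phq < 10) & (gad < 10)
    coin = rng.random(phq.shape[0]) < probability  # one draw per patient
    return eligible & coin

rng = np.random.default_rng(seed=3)
phq = np.array([3, 14, 7, 9, 22])
gad = np.array([9, 12, 10, 3, 7])

always = offline_tag(phq, gad, 1.0, rng).tolist()
never = offline_tag(phq, gad, 0.0, rng).tolist()
print(always)  # [True, False, False, True, False]
print(never)   # [False, False, False, False, False]
```

With probability 1.0 exactly the eligible patients are tagged; the third patient is not eligible because a GAD score of 10 is not below 10.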

In [14]:
def add_mandatory_offline_tag(matrix, n):
    """
    Add a mandatory offline tag to the given matrix.

    If the values in the third and fourth rows are both less than 10, 
    add a fifth row with a True value with a probability of n, otherwise False.

    Parameters:
    matrix (numpy.ndarray): A 4xN matrix of integers.
    n (float): The probability of adding True if the condition is met.

    Returns:
    numpy.ndarray: The modified matrix with an added fifth row.
    """
    
    # Initialize an empty list to store the fifth row values
    mandatory_offline_tag = []
    
    # Loop through each column in the matrix
    for i in range(matrix.shape[1]):
        # Get the values from the third and fourth rows
        third_row_value = matrix[2, i]
        fourth_row_value = matrix[3, i]
        
        # Check if both values are less than 10
        if third_row_value < 10 and fourth_row_value < 10:
            # Add True with a probability of n, otherwise False
            tag = np.random.rand() < n
        else:
            # Add False if the condition is not met
            tag = False
        
        # Append the tag to the list
        mandatory_offline_tag.append(tag)
    
    # Convert the list to a numpy array and reshape to a row vector
    mandatory_offline_tag = np.array(mandatory_offline_tag).reshape(1, -1)
    
    # Append the fifth row to the input matrix
    modified_matrix = np.vstack((matrix, mandatory_offline_tag))
    
    return modified_matrix

Then we do a quick test: 

In [15]:
print(add_mandatory_offline_tag(generate_a_data_matrix(),0.2))

[[ 29  37  37  81  84 102 137 154 156 172 196 207 260 279 308 328 340 369
  411 433 446 455 497 504 528]
 [  9   8   7   8   9   8  10   8   5   9   6   6  10   8   7   9   7   6
    9   6   6   8   9   5   5]
 [  3  14  17   7  15  14   3  22   4   9   7  16  13   8   7   6  14   6
   10  11  17  16  17   3  20]
 [  9  12   0   8   0  12  13   7  16   3  21   7   0  17   9   6   0   0
   15   7   9  19   4  11   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0
    0   0   0   0   0   0   0]]


Next, for each patient with the mandatory offline tag, we replace the arrival time with the time at which the patient actually visits the clinic, by adding a random delay to the original arrival time.

In [16]:
def update_matrix_with_offline_times(matrix):
    """
    Update the given matrix by adding a time to offline value to the first row 
    if the corresponding column in the last row is marked as True.

    Parameters:
    matrix (numpy.ndarray): A 5xN matrix where the last row contains True/False values.

    Returns:
    numpy.ndarray: The modified matrix with updated values in the first row.
    """
    # Check if the input matrix has 5 rows
    if matrix.shape[0] != 5:
        raise ValueError("The input matrix must have 5 rows.")
    
    # Loop through each column in the matrix
    for i in range(matrix.shape[1]):
        # Check if the last row in the current column is True
        if matrix[4, i]:
            # Generate a random time to offline value
            time_to_offline_value = generate_time_to_offline()
            # Add the generated value to the first row of the current column
            matrix[0, i] += time_to_offline_value
    
    return matrix

Then we do a quick test:

In [17]:
print(update_matrix_with_offline_times(add_mandatory_offline_tag(generate_a_data_matrix(), 0.2)))

[[ 13  82  87  95 102 109 205 220 229 393 393 419 440 719 474 475 502 505
  526 533]
 [  8  10  10   7   6   9   6  10  10   8   7   8   6   9   7   7   7   6
    9   8]
 [ 20  16   4   9   1   5  17   9   2  10   9   0  15   5  20  11  18  14
    3  10]
 [  8   0  12  10  12   6  17   5  21   1   5  18   5   6   3  13  16   2
    1  15]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
    0   0]]


The final step is to remove all patients who meet the conditions for online consultations, unless the patient has a mandatory offline tag.

In [18]:
def remove_online_patients(matrix):
    """
    Delete columns from the matrix where:
    1. Both values in the third and fourth rows are less than 10.
    2. The value in the last row (fifth row) is False.

    Parameters:
    matrix (numpy.ndarray): A 5xN matrix.

    Returns:
    numpy.ndarray: The modified matrix with columns deleted based on the conditions.
    """
    # Check if the input matrix has 5 rows
    if matrix.shape[0] != 5:
        raise ValueError("The input matrix must have 5 rows.")
    
    # Initialize a list to keep track of columns to keep
    columns_to_keep = []
    
    # Loop through each column in the matrix
    for i in range(matrix.shape[1]):
        # Get the values from the third and fourth rows
        third_row_value = matrix[2, i]
        fourth_row_value = matrix[3, i]
        # Get the value from the last row
        last_row_value = matrix[4, i]
        
        # Check the conditions for keeping the column
        if not (third_row_value < 10 and fourth_row_value < 10 and not last_row_value):
            columns_to_keep.append(i)
    
    # Create a new matrix with only the columns to keep
    modified_matrix = matrix[:, columns_to_keep]
    
    return modified_matrix
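
The same filter can be expressed with a single boolean mask. The sketch below uses four columns taken from the test output of the next cell (arrival times 28, 54, 69, and 369); it is an illustration of the rule, not a replacement for the function above.

```python
import numpy as np

matrix = np.array([
    [28, 54, 69, 369],   # arrival time (minutes)
    [ 9, 10,  5,   8],   # time in precheck
    [ 0, 16,  6,   6],   # PHQ score
    [11,  0,  1,   4],   # GAD score
    [ 0,  0,  0,   1],   # mandatory offline tag
])

# A patient stays online only if both scores are below 10 and there is no tag
online_only = (matrix[2] < 10) & (matrix[3] < 10) & (matrix[4] == 0)
kept = matrix[:, ~online_only]

kept_arrivals = kept[0].tolist()
print(kept_arrivals)  # [28, 54, 369]
```

The third patient is removed (both scores below 10, no tag), while the fourth patient has the same kind of scores but is kept because of the tag.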

In [19]:
test_data = update_matrix_with_offline_times(add_mandatory_offline_tag(generate_a_data_matrix(), 0.2))
print(test_data)
test_data2 = remove_online_patients(test_data)
print(test_data2)

[[ 28  54  66  69  73  81  89 120 171 191 200 223 369 241 241 353 279 288
  298 300 328 398 412 430 461 492 536 529]
 [  9  10   7   5   9  10   9  10   9   8   6   7   8   8   7   6   7   9
    8   6   5   8   6   5   9   9  10   5]
 [  0  16   7   6  12  11  14   8  20  12   3   1   6  15  10   7  14   8
    0  18   6   6  12   5  21  13   3  13]
 [ 11   0  10   1   2   5  11  12   3   0   2  17   4  20   9   6  10   0
   11   0   0   5  11   2   0   8   5  17]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0
    0   0   0   0   0   0   0   0   1   0]]
[[ 28  54  66  73  81  89 120 171 191 223 369 241 241 353 279 298 300 412
  461 492 536 529]
 [  9  10   7   9  10   9  10   9   8   7   8   8   7   6   7   8   6   6
    9   9  10   5]
 [  0  16   7  12  11  14   8  20  12   1   6  15  10   7  14   0  18  12
   21  13   3  13]
 [ 11   0  10   2   5  11  12   3   0  17   4  20   9   6  10  11   0  11
    0   8   5  17]
 [  0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0   0   0
    0   0   1   0]]

After this step, the tag row is no longer needed, so we remove it.

In [20]:
def remove_last_row(matrix):
    """
    Remove the last row from the given matrix.

    Parameters:
    matrix (numpy.ndarray): An MxN matrix.

    Returns:
    numpy.ndarray: The modified matrix with the last row removed.
    """
    # Check if the matrix has more than one row
    if matrix.shape[0] < 2:
        raise ValueError("The matrix must have at least two rows to remove the last one.")
    
    # Remove the last row using numpy slicing
    modified_matrix = matrix[:-1, :]
    
    return modified_matrix

Then we do a quick test:

In [21]:
test_data3 = remove_last_row(test_data2)
print(test_data3)

[[ 28  54  66  73  81  89 120 171 191 223 369 241 241 353 279 298 300 412
  461 492 536 529]
 [  9  10   7   9  10   9  10   9   8   7   8   8   7   6   7   8   6   6
    9   9  10   5]
 [  0  16   7  12  11  14   8  20  12   1   6  15  10   7  14   0  18  12
   21  13   3  13]
 [ 11   0  10   2   5  11  12   3   0  17   4  20   9   6  10  11   0  11
    0   8   5  17]]


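Because the offline delay changes some arrival times, we now remove the patients whose new arrival time is greater than 540 minutes and sort the remaining columns by arrival time in ascending order again.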

In [22]:
def remove_columns_and_sort(matrix):
    """
    Remove columns from the matrix where the first row has values greater than 540,
    then sort the matrix based on the first row in ascending order.

    Parameters:
    matrix (numpy.ndarray): An MxN matrix.

    Returns:
    numpy.ndarray: The modified matrix with columns removed and sorted.
    """
    # Remove columns where the first row has values greater than 540
    filtered_matrix = matrix[:, matrix[0, :] <= 540]
    
    # Sort the matrix based on the first row
    sorted_indices = np.argsort(filtered_matrix[0, :])
    sorted_matrix = filtered_matrix[:, sorted_indices]
    
    return sorted_matrix

Then we do a quick test:

In [23]:
test_data4 = remove_columns_and_sort(test_data3)
print(test_data3)
print(test_data4)

[[ 28  54  66  73  81  89 120 171 191 223 369 241 241 353 279 298 300 412
  461 492 536 529]
 [  9  10   7   9  10   9  10   9   8   7   8   8   7   6   7   8   6   6
    9   9  10   5]
 [  0  16   7  12  11  14   8  20  12   1   6  15  10   7  14   0  18  12
   21  13   3  13]
 [ 11   0  10   2   5  11  12   3   0  17   4  20   9   6  10  11   0  11
    0   8   5  17]]
[[ 28  54  66  73  81  89 120 171 191 223 241 241 279 298 300 353 369 412
  461 492 529 536]
 [  9  10   7   9  10   9  10   9   8   7   8   7   7   8   6   6   8   6
    9   9   5  10]
 [  0  16   7  12  11  14   8  20  12   1  15  10  14   0  18   7   6  12
   21  13  13   3]
 [ 11   0  10   2   5  11  12   3   0  17  20   9  10  11   0   6   4  11
    0   8  17   5]]
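
The output above contains two patients with the same arrival time (241). `np.argsort` uses a sort that is not guaranteed to be stable by default, so the relative order of such ties is not guaranteed. If the original order of tied patients should be preserved, `kind="stable"` can be passed, as in this small sketch with made-up values:

```python
import numpy as np

m = np.array([
    [369, 241, 600, 241],  # arrival times, one of them after closing time
    [  8,   8,   6,   7],  # time in precheck
])

m = m[:, m[0] <= 540]                    # drop arrivals after minute 540
order = np.argsort(m[0], kind="stable")  # ties keep their original order
result = m[:, order].tolist()

print(result)  # [[241, 241, 369], [8, 7, 8]]
```

For the waiting-time calculation later in this notebook the order of exact ties only decides which of the two patients is served first.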


Finally, we use a comprehensive method to integrate all the above methods.

In [24]:
def convert_to_offline_patient_data(data0, n):
    """
    This function processes a patient data matrix and returns the patients who visit the clinic offline.
    The steps include:
    1. Add a mandatory offline tag to the data matrix.
    2. Update the arrival times of tagged patients with their offline times.
    3. Remove data for online-only patients from the matrix.
    4. Remove the last row (the tag) of the matrix.
    5. Remove patients arriving after 540 minutes and sort the matrix by arrival time.

    Parameters:
        data0 (numpy.ndarray): The 4xN matrix returned by generate_a_data_matrix.
        n (float): The probability that an eligible patient receives the mandatory offline tag.

    Returns:
        numpy.ndarray: A processed 2D array for offline patients with the above transformations applied.
    """
    data1 = add_mandatory_offline_tag(data0, n)
    data2 = update_matrix_with_offline_times(data1)
    data3 = remove_online_patients(data2)
    data4 = remove_last_row(data3)
    data5 = remove_columns_and_sort(data4)

    return data5

Finally, we do a quick test to finish this step.

In [25]:
final_test_data = generate_a_data_matrix()
print(final_test_data)
print(convert_to_offline_patient_data(final_test_data,0.2))

[[ 17  60  62  70  98 100 174 196 201 204 229 231 249 256 295 306 331 346
  370 382 406 423 425 429 473 536 539]
 [  5   5   5   8   7   8   7  10   7   9   9   9   7   6   7   9   6   8
    5   6  10   8   8   6   7   6   6]
 [ 11   1  18  15  16   0   6   4   0   1   0  12  11   8  17   0   9  13
    0  20   5   2  11   6   1  13   0]
 [ 21   6  13   9   8   0  16  14  13   5  15   1   9  11  20   1   5  11
   16   1  14   8   6   3  10  19  17]]
[[ 17  62  70  98 174 196 201 229 231 249 256 295 346 370 382 406 425 473
  536 539]
 [  5   5   8   7   7  10   7   9   9   7   6   7   8   5   6  10   8   7
    6   6]
 [ 11  18  15  16   6   4   0   0  12  11   8  17  13   0  20   5  11   1
   13   0]
 [ 21  13   9   8  16  14  13  15   1   9  11  20  11  16   1  14   6  10
   19  17]]


# Step 04: Construction of the Queuing Model

Our previous steps have provided a data matrix structured as shown below.

| Person 1 | Person 2 | ... |
|----------|----------|-----|
| Arrival time 1 | Arrival time 2 | ... |
| Time in Reception/Pre-check 1 | Time in Reception/Pre-check 2 | ... |
| PHQ score 1 | PHQ score 2 | ... |
| GAD score 1 | GAD score 2 | ... |

The matrix has this structure both for the purely offline model and for the model that includes online consultations. In the latter case, the matrix is first reduced to the patients who visit offline, using the methods above.

To be precise, for a given data matrix (`data0`):

- Case 1: Purely offline

Use the queuing model directly.

- Case 2: With online

Apply `convert_to_offline_patient_data(data0, n)` $\Rightarrow$ use the queuing model.

First, we transform the matrix into the following form:

| Person 1 | Person 2 | ... |
|----------|----------|-----|
| Arrival time 1 + Time in Reception/Pre-check 1 | Arrival time 2 + Time in Reception/Pre-check 2 | ... |
| Diagnosis time 1 | Diagnosis time 2 | ... |

In this matrix, the first row of data represents the time patients start waiting, and the second row of data represents the time needed for diagnosis.



In [26]:
def final_transformation(matrix):
    """
    Transform the given matrix as per the specified rules.

    1. Sum the first and second rows to get the first row of the new matrix.
    2. Sum the third and fourth rows and apply map_scores_to_periods method to get the second row of the new matrix.
    3. Sort the matrix based on the first row in ascending order.

    Parameters:
    matrix (numpy.ndarray): An MxN matrix.

    Returns:
    numpy.ndarray: The transformed matrix.
    """
    # Check if the matrix has at least 4 rows
    if matrix.shape[0] < 4:
        raise ValueError("The input matrix must have at least 4 rows.")
    
    # Sum the first and second rows
    first_row = matrix[0, :] + matrix[1, :]
    
    # Sum the third and fourth rows
    summed_scores = matrix[2, :] + matrix[3, :]
    
    # Apply map_scores_to_periods to the summed scores
    second_row = map_scores_to_periods(summed_scores)
    
    # Convert second_row to numpy array for consistency
    second_row = np.array(second_row)
    
    # Combine the new first and second rows into a new matrix
    transformed_matrix = np.vstack((first_row, second_row))
    
    # Sort the matrix based on the first row
    sorted_indices = np.argsort(transformed_matrix[0, :])
    sorted_matrix = transformed_matrix[:, sorted_indices]
    
    return sorted_matrix

Then we do a quick test: 

In [27]:
final_test_data2 = final_transformation(final_test_data)
print(final_test_data2)

[[ 22  65  67  78 105 108 181 206 208 213 238 240 256 262 302 315 337 354
  375 388 416 431 433 435 480 542 545]
 [ 44   5  43  29  33  12  33  15  21   2  25  19  33  30  45   8  22  31
   24  31  25  22  16  20  20  39  24]]


In [28]:
def queuing_model(matrix):
    """
    Calculate the actual service start times for each patient based on their arrival and service times.

    Parameters:
    matrix (numpy.ndarray): A 2xN matrix where the first row represents arrival times and the second row represents service times.

    Returns:
    numpy.ndarray: A vector representing the actual service start times for each patient.
    """
    # Check if the matrix has exactly 2 rows
    if matrix.shape[0] != 2:
        raise ValueError("The input matrix must have 2 rows.")
    
    # Extract the arrival and service times
    arrival_times = matrix[0, :]
    service_times = matrix[1, :]
    
    # Initialize an array to store the actual service start times
    actual_service_times = np.zeros_like(arrival_times)
    
    # Initialize the time when the next patient can be served
    next_available_time = 0
    
    # Loop through each patient
    for i in range(arrival_times.shape[0]):
        # Calculate the actual service start time for the current patient
        if arrival_times[i] >= next_available_time:
            # If the patient arrives after or when the server is available
            actual_service_times[i] = arrival_times[i]
        else:
            # If the patient arrives before the server is available
            actual_service_times[i] = next_available_time
        
        # Update the next available time
        next_available_time = actual_service_times[i] + service_times[i]
    
    return actual_service_times

Then we do a quick test:

Here we temporarily discard the above test data and regenerate test data to simulate both high and low patient numbers to ensure this step is correct.

In [29]:
low_patient_numbers_test = final_transformation(generate_a_data_matrix(50))
print(low_patient_numbers_test)
print(queuing_model(low_patient_numbers_test))


[[ 37 130 136 226 239 349 420]
 [ 23  22  25  24  33   7  34]]
[ 37 130 152 226 250 349 420]


It looks good! 
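
The result can be checked by hand. For a single server that handles patients in arrival order, the start time of patient $i$ is $s_i = \max(a_i, s_{i-1} + d_{i-1})$, where $a_i$ is the arrival time and $d_i$ the diagnosis time. The sketch below applies this recursion to the numbers printed above; the function name `service_start_times` is hypothetical and the code is a plain-Python restatement of the rule, not the notebook's function.

```python
def service_start_times(arrivals, services):
    """Start times for a single server that serves patients in arrival order."""
    starts = []
    free_at = 0  # time at which the server becomes available
    for arrival, service in zip(arrivals, services):
        start = max(arrival, free_at)
        starts.append(start)
        free_at = start + service
    return starts

arrivals = [37, 130, 136, 226, 239, 349, 420]
services = [23, 22, 25, 24, 33, 7, 34]

starts = service_start_times(arrivals, services)
waits = [s - a for s, a in zip(starts, arrivals)]
print(starts)  # [37, 130, 152, 226, 250, 349, 420]
print(waits)   # [0, 0, 16, 0, 11, 0, 0]
```

Only the third and fifth patients wait (16 and 11 minutes), because each arrives while the previous diagnosis is still in progress.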

Then we do another one.

In [30]:
high_patient_numbers_test = final_transformation(generate_a_data_matrix(10))
print(high_patient_numbers_test)
print(queuing_model(high_patient_numbers_test))

[[ 10  18  24  30  46  62  69  71  82  90  90  91  94  98 103 108 115 117
  127 130 135 149 156 157 174 188 201 206 252 262 271 277 291 325 335 365
  373 383 385 429 429 451 457 465 479 494 497 500 503 506 513 514 519 525
  526 545]
 [ 18  21  16  38  41  30   9   4  17  25  20   7  26  32  22  25  26  19
   30  29  28  38  39   4  13   7  23  30   2  34  10  22  22  25  26  17
    5  34  16  28  37  17  17  12  20  18  33  26  38  20  22  14  37  19
   31  33]]
[  10   28   49   65  103  144  174  183  187  204  229  249  256  282
  314  336  361  387  406  436  465  493  531  570  574  587  594  617
  647  649  683  693  715  737  762  788  805  810  844  860  888  925
  942  959  971  991 1009 1042 1068 1106 1126 1148 1162 1199 1218 1249]


It looks good as well! 

Next, we move on to the final step, which is calculating the waiting time.

In [31]:
def calculate_waiting_time(matrix):
    """
    Calculate the average waiting time for patients based on their arrival and service times.

    Parameters:
    matrix (numpy.ndarray): A 2xN matrix where the first row represents arrival times and the second row represents service times.

    Returns:
    float: The average waiting time for patients.
    """
    # Extract the first row as the arrival times vector
    arrival_times = matrix[0, :]
    
    # Apply the queuing_model function to get the actual service start times vector
    service_start_times = queuing_model(matrix)
    
    # Calculate the waiting times by subtracting arrival times from service start times
    waiting_times = service_start_times - arrival_times
    
    # Calculate the total waiting time
    total_waiting_time = np.sum(waiting_times)
    
    # Calculate the average waiting time
    average_waiting_time = total_waiting_time / waiting_times.size
    
    return average_waiting_time

Then we do some quick tests: 

In [32]:
print(calculate_waiting_time(low_patient_numbers_test))
print(calculate_waiting_time(high_patient_numbers_test))

3.857142857142857
357.4642857142857
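
The large gap between the two averages can be explained by the offered load, that is, the total requested diagnosis time divided by the minutes the clinic is open. The sketch below computes it for the two tests above. The helper name `offered_load` is hypothetical, and the value 1272 is the sum of the second row of the high-volume test matrix.

```python
def offered_load(total_service_minutes, opening_minutes=540):
    """Requested service time as a fraction of the clinic's opening time."""
    return total_service_minutes / opening_minutes

low_services = [23, 22, 25, 24, 33, 7, 34]  # second row of the low-volume test
low_load = round(offered_load(sum(low_services)), 3)
high_load = round(offered_load(1272), 3)     # sum of the high-volume service times

print(low_load)   # 0.311
print(high_load)  # 2.356
```

A load above 1 means a single server cannot finish within the working day, so the queue keeps growing; this matches the high-volume test, in which the last patient only starts at minute 1249.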


Finally, we will integrate and explain the above code to conclude this step.

In [33]:
def model_to_calculate_average_waiting_time(data):
    """
    This function calculates the average waiting time for patients in a medical study.
    The steps include:
    1. Apply a final transformation to the input data matrix.
    2. Calculate the average waiting time based on the transformed data.

    Parameters:
        data (numpy.ndarray): The input data matrix containing patient information.

    Returns:
        float: The calculated average waiting time for patients.
    """
    data_modified = final_transformation(data)
    average_waiting_time = calculate_waiting_time(data_modified)

    return average_waiting_time

Quick test: 

In [34]:
final_test_data3 = generate_a_data_matrix(30)
print(model_to_calculate_average_waiting_time(final_test_data3))

23.444444444444443


The next step is to analyze the code. Before that, I will provide a user manual for the above code to help with modifications and usage.

# User manual for this notebook

This program has three main methods to calculate the average waiting time:

- Method 1: Generate Data

`def generate_a_data_matrix(mean=20, mu=8, var=50):`

In this function, `mean` is the mean of the exponential distribution of the time between patient arrivals. Increasing it simulates sparser patient traffic.

The `mu` and `var` parameters are the mean and variance of the normal distribution used for the PHQ and GAD scores. Increasing the mean, for instance, simulates a patient population with more severe symptoms overall.

- Method 2 (only for scenarios that include online consultations): Generate Offline Consultation Matrix

`def convert_to_offline_patient_data(data0, n):`

Here, `data0` should be the output value of the preceding method, i.e., the output value of `generate_a_data_matrix`. `n` represents the probability of converting to offline. All model structures are embedded within this method (see the code for specific implementation). This method will return data for patients requiring offline diagnosis to calculate the waiting time.

- Method 3: Queueing Model and Waiting Time Calculation:

`model_to_calculate_average_waiting_time(data)`

Given a patient data matrix, this method will directly return the average waiting time.

A typical procedure to compare the results of two models is as follows:

- Step 1: Generate the data matrix `patient_data = generate_a_data_matrix(mean=20, mu=8, var=50)`

- Step 2: For the online model, generate the offline consultation matrix: `offline_patient_data = convert_to_offline_patient_data(patient_data, 0.5)`

- Step 3: Compare the results of the two models

`print(model_to_calculate_average_waiting_time(patient_data))`
`print(model_to_calculate_average_waiting_time(offline_patient_data))`


In [35]:
patient_data = generate_a_data_matrix(mean=20, mu=8, var=50)
offline_patient_data = convert_to_offline_patient_data(patient_data,0.2)
print(model_to_calculate_average_waiting_time(patient_data))
print(model_to_calculate_average_waiting_time(offline_patient_data))

103.65517241379311
75.68181818181819
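
A single run such as the one above is quite noisy, so a comparison between the two models is more reliable when it is averaged over many replications. The sketch below shows one way to do this. Both function names are hypothetical, and `toy_average_wait` is only a stand-in for one run of the notebook's pipeline (it assumes uniform service times between 15 and 25 minutes); in practice it would be replaced by a call that generates a data matrix and passes it to `model_to_calculate_average_waiting_time`.

```python
import numpy as np

def toy_average_wait(rng, mean_interarrival=20.0, total_time=540.0):
    """Stand-in for one simulation run: single server, assumed service times."""
    arrivals = np.cumsum(rng.exponential(mean_interarrival, size=1000))
    arrivals = arrivals[arrivals <= total_time]
    if arrivals.size == 0:
        return 0.0
    services = rng.uniform(15, 25, size=arrivals.size)  # assumption, not from the notebook
    free_at, total_wait = 0.0, 0.0
    for arrival, service in zip(arrivals, services):
        start = max(arrival, free_at)
        total_wait += start - arrival
        free_at = start + service
    return total_wait / arrivals.size

def replicate(simulate_once, runs=200, seed=0):
    """Average a simulation over several runs; return mean and 95% half-width."""
    rng = np.random.default_rng(seed)
    results = np.array([simulate_once(rng) for _ in range(runs)])
    half_width = 1.96 * results.std(ddof=1) / np.sqrt(runs)
    return float(results.mean()), float(half_width)

mean_wait, half_width = replicate(toy_average_wait)
print(f"average wait: {mean_wait:.1f} +/- {half_width:.1f} minutes")
```

The half-width is a normal-approximation confidence interval; two models can be considered different with some confidence when their intervals do not overlap.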


# Model Analysis