<a href="https://colab.research.google.com/github/beedrumms/eta_/blob/main/absentee_simulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [27]:
# Mount Google Drive in Colab
from google.colab import drive, files
import os
drive.mount("/content/drive")

import pandas as pd
import numpy as np
import seaborn as sns
import random
import matplotlib.pyplot as plt
import statistics as stats 

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


##Ontario’s chief health officer estimates that 20-30% of employees could be absent from work at the beginning of February. 

-	In a normal year, 6% of staff are typically absent at any given time. 
-	In December, when 10% of staff were absent, 50 out of Ontario’s 5,000 schools were closed due to staff absences. 
-	Teacher absences can be covered using substitute teachers. There are many potential substitute teachers (approximately 40% of the number of regular teachers), but it is unclear whether any given absence can actually be covered by a substitute. 

##Estimate of the number of schools that might be closed at the beginning of February due to staff absences

In [2]:
# things we know 
nTeachers = 150000
nSubstitutes = nTeachers * .40  # substitutes are ~40% of the number of regular teachers
nOntario_schools = 5000
nSick_avg = nTeachers * .06  # 6% of staff absent in a normal year
avg_teacher_per_school = nTeachers / nOntario_schools

print("Approximate number of teachers:", nTeachers)
print("Number of substitutes:", nSubstitutes)
print("Number of schools in Ontario:", nOntario_schools)
print("Average number of teachers sick on any day:", nSick_avg)
print("Average number of teachers per school:", avg_teacher_per_school)

Approximate number of teachers: 150000
Number of substitutes: 60000.0
Number of schools in Ontario: 5000
Average number of teachers sick on any day: 9000.0
Average number of teachers per school: 30.0


#Linear Approach
- dependent variable = school closures 
- independent variables = staff sick, substitutions 

In [4]:
# We know that a .04 (4 percentage point) rise in teacher absences, from 6% to 10%, raised closures from 0 to 50 (0% to 1% of schools)
(nTeachers*.10 - nSick_avg) == nTeachers * .04

s = nTeachers * .04

nTeachers / s # 25 intervals of 4% to get through the entire population of teachers 

# so if the relationship were linear (1% of schools closed per interval), only 25% of schools would close even if the entire teacher population were sick 

25.0

In [6]:
print("20% of teachers:", nTeachers*.20)
print("30% of teachers:", nTeachers*.30)

# if a 4-point rise = 1% closures linearly, then each 1-point rise accounts for 0.25% of schools closing
pct_per_point = 0.25

print("10% more accounts for", 10 * pct_per_point, "% closures")
# the first 10% accounts for 1%, the next 10 points account for 2.5%: 3.5% in total
print("schools closed for 20%:", nOntario_schools * (1 + 10 * pct_per_point) / 100)

print("20% more accounts for", 20 * pct_per_point, "% closures")
# the first 10% accounts for 1%, the next 20 points account for 5%: 6% in total
print("schools closed for 30%:", nOntario_schools * (1 + 20 * pct_per_point) / 100)

20% of teachers: 30000.0
30% of teachers: 45000.0
10% more accounts for 2.5 % closures
schools closed for 20%: 175.0
20% more accounts for 5.0 % closures
schools closed for 30%: 300.0
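The linear reasoning above can be wrapped in one small function. This is only a sketch: it assumes a straight line through the two known points (6% absent with 0 closures, 10% absent with 1% of schools closed), and the function name and parameter names are hypothetical, not part of the original notebook.

```python
def linear_closures(absent_pct, n_schools=5000, baseline_pct=6.0,
                    observed_pct=10.0, observed_closure_pct=1.0):
    """Extrapolate school closures linearly from two known points."""
    # Slope in "percent of schools closed" per percentage point of absence (0.25 here)
    slope = observed_closure_pct / (observed_pct - baseline_pct)
    # No closures at or below the normal-year baseline
    closure_pct = max(0.0, (absent_pct - baseline_pct) * slope)
    return n_schools * closure_pct / 100


for pct in (10, 20, 30):
    print(f"{pct}% absent -> about {linear_closures(pct):.0f} schools closed")
```

Writing it this way makes the baseline explicit, so the 1% of closures already seen at 10% absence is carried into the 20% and 30% estimates automatically.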


#Simulation
####Things we assume:
- when there are 6% absences, 100% of positions are filled with substitutes 
- the probability of finding a sub when you are sick does not change 
- the probability of finding a sub is .55 
- everything is equal opportunity (every teacher has an equal chance of getting sick) 
- evenly distributed (the average # of teachers per school is the same) 
- variables are normally distributed 
- if 6% sick is the average, and the distribution is normal, anything outside the bounds of 3%-9% is not normal and is therefore a strain 
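One thing worth checking against these assumptions is whether the substitute pool could ever run dry. The sketch below compares the expected number of substitutes needed (55% of sick teachers) with the number of healthy substitutes. It assumes substitutes fall ill at the same rate as teachers, and `pool_check` and its parameters are hypothetical names used only for illustration.

```python
def pool_check(sick_rate, n_teachers=150_000, n_subs=60_000, p_replacement=0.55):
    """Return (expected substitutes used, healthy substitutes available)."""
    # Assumption: substitutes get sick at the same rate as regular teachers
    healthy_subs = n_subs - round(n_subs * sick_rate)
    # On average, a sub is found for p_replacement of the sick teachers
    expected_demand = p_replacement * round(n_teachers * sick_rate)
    return expected_demand, healthy_subs


for rate in (0.10, 0.20, 0.30):
    demand, supply = pool_check(rate)
    print(f"{rate:.0%} sick: expected subs used = {demand:.0f}, healthy subs = {supply}")
```

Under these numbers the expected demand stays below the healthy pool at all three rates, so the fixed .55 probability, not the size of the pool, is what drives uncovered absences.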


In [18]:
def closures_simulation(sick_rate):
  nSchools = [0] * 5000
  supply_pool = 150000 * .40
  closures = 0
  pReplacement = .55

  sick_rate_T = int(150000 * sick_rate)
  sick_rate_S = int(60000 * sick_rate)

  supply_pool_left = supply_pool - sick_rate_S

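  for i in range(0, sick_rate_T):
    a = (np.random.randint(100))/100 # uniform draw from 0.00, 0.01, ..., 0.99
    if a > pReplacement or supply_pool_left == 0:
      rand_school = (np.random.randint(5000)) # if a sub cannot be found - absence recorded at a random school 
      nSchools[rand_school] += 1  
    elif a < pReplacement: # if a sub is found, -1 from the sub pool
      supply_pool_left -= 1
    # note: a draw of exactly .55 matches neither branch, so that absence is neither covered nor recorded
         
  for i in nSchools:
    if i > 4: # a school closes with more than 4 uncovered absences (4/30 = 0.1333 of the staff at each school) 
      closures += 1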

  # fin = [closures, closures/5000, supply_pool_left, supply_pool_left/60000, sick_rate_T, sick_rate_T/150000] # for more info :) 
  fin = closures

  return fin

In [19]:
closures_simulation(.10) # a single run at 10% absences: this is working and is pretty close to the 50 closures seen in December

69
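The loop above makes one random draw per sick teacher, which gets slow over 10,000 trials. A vectorized version is sketched below under two simplifications that are not in the original: the number of covered absences is drawn in one step from a binomial with probability exactly .55 (rather than the 0.01-step draw), capped by the healthy substitute pool, and the function name `closures_vectorized` is hypothetical.

```python
import numpy as np


def closures_vectorized(sick_rate, rng, n_teachers=150_000, n_subs=60_000,
                        n_schools=5_000, p_replacement=0.55, threshold=4):
    """One simulated day: count schools with more than `threshold` uncovered absences."""
    n_sick = int(n_teachers * sick_rate)
    pool_left = n_subs - int(n_subs * sick_rate)  # healthy substitutes
    # Covered absences: binomial draw, limited by how many subs are available
    covered = min(int(rng.binomial(n_sick, p_replacement)), pool_left)
    uncovered = n_sick - covered
    # Each uncovered absence lands at a uniformly random school
    schools = rng.integers(0, n_schools, size=uncovered)
    per_school = np.bincount(schools, minlength=n_schools)
    return int((per_school > threshold).sum())


rng = np.random.default_rng(0)  # seeded generator so runs are repeatable
print(closures_vectorized(0.10, rng))
```

Because it treats exactly 45% of absences as uncovered on average, this version should give slightly higher closure counts than the loop, so it is best read as a faster approximation rather than a drop-in replacement.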

In [20]:
def trials(n,sick_rate):
  lst = []
  for i in range(0,n):
    j = closures_simulation(sick_rate)
    lst.append(j)
  return lst
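A list of trial results is easier to read once it is reduced to a few summary numbers. The helper below is a minimal sketch of that step; `summarize` is a hypothetical name, and the 2.5th and 97.5th percentiles are just one reasonable choice of interval.

```python
import statistics as stats

import numpy as np


def summarize(results):
    """Mean, spread, and a central 95% range for a list of closure counts."""
    low, high = np.percentile(results, [2.5, 97.5])
    return {
        "mean": stats.mean(results),
        "stdev": stats.stdev(results),  # sample standard deviation
        "p2.5": float(low),
        "p97.5": float(high),
    }


# Usage with any list of simulated closure counts, e.g. summarize(trials(1000, .10))
print(summarize([52, 61, 58, 49, 66, 55, 60, 57]))
```

Reporting a range alongside the mean shows how much a single simulated day can differ from the average.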

In [28]:
#RUNNING THE SIMULATION
ten_df = trials(10000, .10)
twenty_df = trials(10000, .20)
thirty_df = trials(10000, .30)

In [31]:
print("Avg closures when 10% of teachers are sick:", stats.mean(ten_df))
print("Avg closures when 20% of teachers are sick:", stats.mean(twenty_df))
print("Avg closures when 30% of teachers are sick:", stats.mean(thirty_df))

Avg closures when 10% of teachers are sick: 56.5554
Avg closures when 20% of teachers are sick: 641.3786
Avg closures when 30% of teachers are sick: 1816.9549
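These averages can be cross-checked without running any trials. In the simulation, an absence goes uncovered when the draw exceeds .55, which happens for 44 of the 100 possible draw values, and each uncovered absence lands at one of 5,000 schools at random. The count per school is then approximately Poisson, and a school closes when that count exceeds 4. The sketch below uses this Poisson approximation, which is an assumption introduced here (it also assumes the substitute pool never runs out); `expected_closures` is a hypothetical name.

```python
import math


def expected_closures(sick_rate, n_teachers=150_000, n_schools=5_000,
                      p_uncovered=0.44, threshold=4):
    """Approximate the expected number of closed schools with a Poisson model."""
    # Average number of uncovered absences per school
    lam = n_teachers * sick_rate * p_uncovered / n_schools
    # P(count <= threshold) for a Poisson(lam) variable
    p_open = sum(math.exp(-lam) * lam**k / math.factorial(k)
                 for k in range(threshold + 1))
    return n_schools * (1 - p_open)


for rate in (0.10, 0.20, 0.30):
    print(f"{rate:.0%} sick: about {expected_closures(rate):.0f} closures expected")
```

If the approximation holds, the three values should land close to the simulated averages printed above, which is a useful sanity check on the simulation code.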


In [38]:
ten = list(ten_df)
twenty = list(twenty_df)
thirty = list(thirty_df)

df = pd.DataFrame(zip(ten, twenty, thirty))
df.columns = ["10%", "20%", "30%"]

In [41]:
import plotly.graph_objects as go  # needed before the first use of go

fig = go.Figure(data=[go.Histogram(x=df['10%'])])
fig.show()

In [45]:
fig = go.Figure(data=[go.Histogram(x=df['20%'])])
fig.show()

In [48]:
fig = go.Figure(data=[go.Histogram(x=df['30%'])])
fig.show()

In [44]:
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Histogram(x=df['10%'], name='10% Sick'))
fig.add_trace(go.Histogram(x=df['20%'], name='20% Sick'))
fig.add_trace(go.Histogram(x=df['30%'], name='30% Sick'))

# Overlay the three histograms
fig.update_layout(barmode='overlay')
# Reduce opacity so all three histograms stay visible
fig.update_traces(opacity=0.75)
fig.show()