# Construct interaction networks for the calibration schools from basic school statistics

**Note**: teacher <-> student contacts in the contact networks created in this script are of strength "far" (loose).

In [2]:
import networkx as nx
import pandas as pd
from os.path import join
import numpy as np

# network construction utilities
from scseirx import construct_school_network as csn

# parallelisation functionality
from multiprocess import Pool
import psutil
from tqdm import tqdm

This script creates contact networks of "average" Austrian schools for each school type, for the purpose of calibrating the simulation. The characteristics of an average school (mean number of classes, mean students per class) were determined from [statistics about Austrian schools](https://www.bmbwf.gv.at/Themen/schule/schulsystem/gd.html) (year 2017/18, page 10) and confirmed in interviews with a range of Austrian teachers and school directors conducted in December 2020. The school types modeled here are
* Primary schools (Volksschule), ```primary```
* Primary schools with daycare (Volksschule mit Ganztagesbetreuung), ```primary_dc```
* Lower secondary schools (Unterstufe), ```lower_secondary```
* Lower secondary schools with daycare (Unterstufe mit Ganztagesbetreuung), ```lower_secondary_dc```
* Upper secondary schools (Oberstufe), ```upper_secondary```
* Secondary schools (Gymnasium), ```secondary```
* Secondary schools with daycare (Gymnasium mit Ganztagesbetreuung), ```secondary_dc``` 

For every school type, 500 different implementations of the same network are created. Two networks of the same school type with the same characteristics (class size, number of students) can still be different, because the households are created randomly from underlying distributions. This is important since a strong component of disease spread between classes in the same school is the connection between siblings that live in the same household and go to the same school but different classes. These connections will be different for every implementation of the same school network. Here we create 500 different networks, since for the calibration we simulate 500 runs for every parameter combination, and therefore every run is simulated on a slightly different network, sampling different constellations of households.

**NOTE**: A more detailed description of the design decisions entering the modeling of each school type can be found in the document ```school_type_documentation```. In the following, "students" always refers to the number of students per class.

## Background information

### School characteristics

In [4]:
# different age structures in Austrian school types
age_brackets = {'primary':[6, 7, 8, 9],
                'primary_dc':[6, 7, 8, 9],
                'lower_secondary':[10, 11, 12, 13],
                'lower_secondary_dc':[10, 11, 12, 13],
                'upper_secondary':[14, 15, 16, 17],
                'secondary':[10, 11, 12, 13, 14, 15, 16, 17],
                'secondary_dc':[10, 11, 12, 13, 14, 15, 16, 17]
               }

In [5]:
# average number of classes per school type and students per class
school_characteristics = {
    # Primary schools
    # Volksschule: schools 3033, classes: 18245, students: 339382
    'primary':            {'classes':8, 'students':19},
    'primary_dc':         {'classes':8, 'students':19},
    
    # Lower secondary schools
    # Hauptschule: schools 47, classes 104, students: 1993
    # Mittelschule: schools 1131, classes: 10354, students: 205905
    # Sonderschule: schools 292, classes: 1626, students: 14815
    # Total: schools: 1470, classes: 12084, students: 222713
    'lower_secondary':    {'classes':8, 'students':18},
    'lower_secondary_dc': {'classes':8, 'students':18},
    
    # Upper secondary schools
    # Oberstufenrealgymnasium: schools 114, classes 1183, students: 26211
    # BMHS: schools 734, classes 8042, students 187592
    # Total: schools: 848, classes 9225, students: 213803
    'upper_secondary':    {'classes':10, 'students':23}, # rounded down from 10.8 classes
    
    # Secondary schools
    # AHS Langform: schools 281, classes 7610, students 179633
    'secondary':          {'classes':28, 'students':24}, # rounded up from 27.1 classes
    'secondary_dc':       {'classes':28, 'students':24} # rounded up from 27.1 classes
}
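
The raw averages behind these values can be recomputed from the counts quoted in the comments above. The following is a minimal sketch; the names `raw_counts` and `mean_characteristics` are illustrative and not part of the notebook. Note that the values chosen for the model need not equal the raw means: for primary schools the counts give about 6 classes per school, whereas the model uses 8 (see ```school_type_documentation``` for the design decisions).

```python
# Counts copied from the comments in school_characteristics:
# (number of schools, number of classes, number of students)
raw_counts = {
    'primary':         (3033, 18245, 339382),
    'lower_secondary': (1470, 12084, 222713),
    'upper_secondary': (848, 9225, 213803),
    'secondary':       (281, 7610, 179633),
}

def mean_characteristics(schools, classes, students):
    """Return (mean classes per school, mean students per class)."""
    return classes / schools, students / classes

raw_means = {st: mean_characteristics(*counts)
             for st, counts in raw_counts.items()}

for school_type, (classes, students) in raw_means.items():
    print('{}: {:.2f} classes per school, {:.2f} students per class'
          .format(school_type, classes, students))
```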

### Characteristics of Austrian families

Family sizes with children < 18 years old from the [Austrian microcensus 2019](https://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/bevoelkerung/haushalte_familien_lebensformen/familien/index.html) (Note: 63.45 % of all households have no children), file ```familien_nach_familientyp_und_zahl_der_kinder_ausgewaehlter_altersgruppen_```:

* 1 child: 48.15 % (81.95 % two parents, 18.05 % single parents)
* 2 children: 38.12 % (89.70 % two parents, 10.30% single parents)
* 3 children: 10.69 % (88.26 % two parents, 11.74 % single parents)
* 4 or more children: 3.04 % (87.44 % two parents, 12.56 % single parents)

In [6]:
# given the precondition that the family has at least one child, how many
# children does the family have?
p_children = {1:0.4815, 2:0.3812, 3:0.1069, 4:0.0304}

# probability of being a single parent, depending on the number of children
p_parents = {1:{1:0.1805, 2:0.8195},
             2:{1:0.1030, 2:0.8970},
             3:{1:0.1174, 2:0.8826},
             4:{1:0.1256, 2:0.8744}
            }
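
To illustrate how these two distributions could be used together, the sketch below draws random households by first sampling the number of children and then the number of parents conditional on it. This is an assumption about the general approach, not the implementation used inside `scseirx`; the function `sample_household` is a hypothetical helper, and the key `4` stands for "4 or more children".

```python
import random

p_children = {1: 0.4815, 2: 0.3812, 3: 0.1069, 4: 0.0304}
p_parents = {1: {1: 0.1805, 2: 0.8195},
             2: {1: 0.1030, 2: 0.8970},
             3: {1: 0.1174, 2: 0.8826},
             4: {1: 0.1256, 2: 0.8744}}

def sample_household(rng):
    """Draw (number of children, number of parents) for one household."""
    n_children = rng.choices(list(p_children),
                             weights=list(p_children.values()))[0]
    parent_dist = p_parents[n_children]
    n_parents = rng.choices(list(parent_dist),
                            weights=list(parent_dist.values()))[0]
    return n_children, n_parents

rng = random.Random(42)  # fixed seed, only to make the sketch reproducible
households = [sample_household(rng) for _ in range(10000)]
share_one_child = sum(1 for c, _ in households if c == 1) / len(households)
print('share of one-child households: {:.3f}'.format(share_one_child))
```

With enough draws, the sampled share of one-child households should approach the 48.15 % given above.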

General household sizes of households with one family (2.51 % of households have more than one family), from the [Austrian household statistics 2019](https://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/bevoelkerung/haushalte_familien_lebensformen/haushalte/index.html), files
* ```ergebnisse_im_ueberblick_privathaushalte_1985_-_2019```
* ```familien_nach_familientyp_und_zahl_der_kinder_ausgewaehlter_altersgruppen_``` 

Percentages:
* single: $\frac{(3950 - 2388)}{3950}$ = 39.54 %
* couple, no kids: $\frac{1001}{3959}$ = 25.28 %
* single parent with one kid < 18: $\frac{277}{3950} \cdot \frac{87.0}{137.4}$ = 4.44 %
* single parent with two kids < 18: $\frac{277}{3950} \cdot \frac{37.3}{137.4}$ = 1.9 %
* single parent with three or more kids < 18: $\frac{277}{3950} \cdot \frac{13.1}{137.4}$ = 0.67 %
* couples with one kid < 18: $\frac{1050}{3950} \cdot \frac{252.4}{606.7}$ = 11.06 %
* couples with two kids < 18: $\frac{1050}{3950} \cdot \frac{255.5}{606.7}$ = 11.19 %
* couples with three or more kids < 18: $\frac{1050}{3950} \cdot \frac{98.9}{606.7}$ = 4.33 %
* households with three adults (statistic: household with kids > 18 years): 1.59 %

In [7]:
# probability of a household having a certain size, independent of having a child
teacher_p_adults = {1:0.4655, 2:0.5186, 3:0.0159}
teacher_p_children = {1:{0:0.8495, 1:0.0953, 2:0.0408, 3:0.0144},
                      2:{0:0.4874, 1:0.2133, 2:0.2158, 3:0.0835},
                      3:{0:1, 1:0, 2:0, 3:0}}
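
The values in `teacher_p_adults` and `teacher_p_children` appear to follow from the percentages listed above: the shares are summed by number of adults, and the number of children is then expressed conditional on the number of adults. The sketch below reproduces the dictionary entries to within rounding (differences of about 0.0001); the variable names are illustrative. Households with three adults are treated as having no children under 18, as in the dictionary.

```python
# Household shares in percent, copied from the list above
single = 39.54
couple_no_kids = 25.28
single_parent = {1: 4.44, 2: 1.9, 3: 0.67}     # number of kids < 18 -> share
couple_parent = {1: 11.06, 2: 11.19, 3: 4.33}  # number of kids < 18 -> share
three_adults = 1.59

# marginal distribution of the number of adults in a household
share_adults = {
    1: single + sum(single_parent.values()),
    2: couple_no_kids + sum(couple_parent.values()),
    3: three_adults,
}
derived_p_adults = {n: round(s / 100, 4) for n, s in share_adults.items()}

# number of children, conditional on the number of adults
derived_p_children = {
    1: {0: single / share_adults[1],
        **{k: s / share_adults[1] for k, s in single_parent.items()}},
    2: {0: couple_no_kids / share_adults[2],
        **{k: s / share_adults[2] for k, s in couple_parent.items()}},
}

print(derived_p_adults)
for n_adults, dist in derived_p_children.items():
    print(n_adults, {k: round(p, 4) for k, p in dist.items()})
```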

### Link type <-> contact type mapping

The simulation relies on specified contact strengths (close, intermediate, far, very far) to determine infection risk. However, depending on the setting, there is a multitude of different contacts (link types) between different agent groups and during different activities. The dictionary below provides a complete list of all link types that exist in the school setting, and maps every link type to the corresponding contact type.

In [8]:
contact_map = {
    'student_household':'close', 
    'student_student_intra_class':'far',
    'student_student_table_neighbour':'intermediate',
    'student_student_daycare':'far',
    'teacher_household':'close',
    'teacher_teacher_short':'far', 
    'teacher_teacher_long':'intermediate',
    'teacher_teacher_team_teaching':'intermediate',
    'teacher_teacher_daycare_supervision':'intermediate',
    'teaching_teacher_student':'far',
    'daycare_supervision_teacher_student':'far'
}
# Note: student_student_daycare overwrites student_student_intra_class and
# student_student_table_neighbour

# Note: teacher_teacher_daycare_supervision and teacher_teacher_team_teaching 
# overwrite teacher_teacher_short and teacher_teacher_long
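
As a rough illustration of what such a mapping does, the sketch below attaches a contact type to every edge of a small, made-up edge list. It is not the implementation of `csn.map_contacts`: the attribute names `link_type` and `contact_type`, the node names, and the helper `map_link_types` are assumptions, and plain tuples stand in for the edges of a graph.

```python
# excerpt of the mapping defined above
contact_map = {
    'student_household': 'close',
    'student_student_intra_class': 'far',
    'student_student_table_neighbour': 'intermediate',
    'teaching_teacher_student': 'far',
}

# hypothetical edge list: (node, node, attributes); node names are made up
edges = [
    ('S1', 'S2', {'link_type': 'student_student_table_neighbour'}),
    ('S1', 'S3', {'link_type': 'student_student_intra_class'}),
    ('T1', 'S1', {'link_type': 'teaching_teacher_student'}),
    ('S1', 'S4', {'link_type': 'student_household'}),
]

def map_link_types(edges, contact_map):
    """Attach a 'contact_type' to every edge, based on its 'link_type'."""
    for u, v, attrs in edges:
        # a KeyError here signals a link type missing from the mapping
        attrs['contact_type'] = contact_map[attrs['link_type']]
    return edges

for u, v, attrs in map_link_types(edges, contact_map):
    print(u, v, attrs['link_type'], '->', attrs['contact_type'])
```

Failing loudly on an unknown link type is a deliberate choice in this sketch, so that no edge silently ends up without a contact strength.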

### Teacher social contacts

Network density scores from an [article about interactions between teachers](https://academic.oup.com/her/article/23/1/62/834723?login=true) for "socialize with outside of school" (```r_teacher_friend```) and "engage in conversation regularly" (```r_teacher_conversation```).

In [9]:
r_teacher_friend = 0.059
r_teacher_conversation = 0.255
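
To give a feeling for these numbers: in an undirected network, density is the fraction of all possible node pairs that are connected. The sketch below converts the two densities into an expected number of links, assuming a staff of 16 teachers. The staff size and the helper `expected_edges` are illustrative assumptions; how `scseirx` uses the densities internally is not shown here.

```python
from math import comb

r_teacher_friend = 0.059
r_teacher_conversation = 0.255

def expected_edges(n_teachers, density):
    """Expected number of undirected links for a given network density."""
    return density * comb(n_teachers, 2)

n_teachers = 16  # hypothetical staff size, for illustration only
n_conversation = expected_edges(n_teachers, r_teacher_conversation)
n_friend = expected_edges(n_teachers, r_teacher_friend)

print('possible teacher pairs: {}'.format(comb(n_teachers, 2)))
print('conversation links:     {:.1f}'.format(n_conversation))
print('friendship links:       {:.1f}'.format(n_friend))
```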

## Compose calibration schools

In [8]:
def run(params):
    school_type, i, N_floors = params
    
    N_classes = school_characteristics[school_type]['classes']
    class_size = school_characteristics[school_type]['students']
    
    school_name = '{}_classes-{}_students-{}'.format(school_type,\
            N_classes, class_size)
    
    G, teacher_schedule, student_schedule = csn.compose_school_graph(\
                school_type, N_classes, class_size, N_floors, p_children,
                p_parents, teacher_p_adults, teacher_p_children, 
                r_teacher_conversation, r_teacher_friend)

    # map the link types to contact types
    csn.map_contacts(G, contact_map)

    # we do not need family members that are not siblings for calibration
    # purposes -> remove them to have fewer agents in the simulation and
    # speed up the calibration runs
    family_members = [n for n, tp in G.nodes(data='type') \
                if tp in ['family_member_student', 'family_member_teacher']]
    G.remove_nodes_from(family_members)

    # save the graph
    nx.readwrite.gpickle.write_gpickle(G, join(dst,'{}/{}_{}.bz2'\
                        .format(school_type, school_name, i)), protocol=4)

    # extract & save the node list
    node_list = csn.get_node_list(G)
    node_list.to_csv(join(dst,'{}/{}_node_list_{}.csv')\
                        .format(school_type, school_name, i), index=False)
    # save the schedule
    if i==1:
        for s, atype in zip([teacher_schedule, student_schedule],\
                            ['teachers', 'students']):
            s.to_csv(join(dst, '{}/{}_schedule_{}.csv'\
                        .format(school_type, school_name, atype)))
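
The file names produced by `run` follow a fixed pattern. The sketch below rebuilds the three kinds of paths for the primary school type and network index 0, using the values defined in this notebook; it writes nothing to disk. Note that in `run` the schedules are only saved for `i == 1`.

```python
from os.path import join, basename

dst = '../../data/contact_networks/calibration'
school_type, N_classes, class_size, i = 'primary', 8, 19, 0

school_name = '{}_classes-{}_students-{}'.format(
    school_type, N_classes, class_size)

# one graph and one node list per network index i
graph_path = join(dst, school_type, '{}_{}.bz2'.format(school_name, i))
node_list_path = join(dst, school_type,
                      '{}_node_list_{}.csv'.format(school_name, i))
# one schedule per agent type ('teachers' or 'students')
schedule_path = join(dst, school_type,
                     '{}_schedule_{}.csv'.format(school_name, 'teachers'))

for path in (graph_path, node_list_path, schedule_path):
    print(basename(path))
```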

In [21]:
# In principle there is functionality in place to generate contacts
# between students in different classes, depending on the floor the
# classes are on. We currently don't use this functionality, as
# schools all implement measures to keep between-class contacts to
# a minimum. Therefore floor specifications are not important for our
# school layout and we just assume that all classes are on the same
# floor.
N_floors = 1

school_types = ['primary', 'primary_dc', 'lower_secondary',
                'lower_secondary_dc', 'upper_secondary', 'secondary']

dst = '../../data/contact_networks/calibration'

N_networks = 2000

school_params = [(st, i, N_floors) for st in school_types[0:1] \
                                      for i in range(N_networks)]

number_of_cores = psutil.cpu_count(logical=True) - 2
pool = Pool(number_of_cores)

for row in tqdm(pool.imap_unordered(func=run, iterable=school_params),
                total=len(school_params)):
    pass

# turn off your parallel workers 
pool.close()

100%|██████████| 2000/2000 [41:25<00:00,  1.24s/it] 


## Calculate mean degrees

In [16]:
dst = '../../data/contact_networks/calibration'
N_networks = 2000
N_weekdays = 7

school_types = ['primary', 'primary_dc', 'lower_secondary',
                'lower_secondary_dc', 'upper_secondary', 'secondary']
wd_dict = {1:'school', 6:'weekend'}

degree_df = pd.DataFrame()
for school_type in school_types:
    print(school_type)
    N_classes = school_characteristics[school_type]['classes']
    class_size = school_characteristics[school_type]['students']
    
    school_name = '{}_classes-{}_students-{}'.format(school_type,\
            N_classes, class_size)
    
    for i in range(N_networks):
        if i%100 == 0:
            print(i)
        G = nx.readwrite.gpickle.read_gpickle(\
            join(dst, school_type, '{}_{}.bz2'.format(school_name, i)))
        
        weekday_connections = {}
        all_edges = G.edges(keys=True, data='weekday')
        N_weekdays = 7
        for j in [1, 6]:
            wd_edges = [(u, v, k) for (u, v, k, wd) in all_edges if wd == j]
            G_wd = G.edge_subgraph(wd_edges).copy()
            
            students = [x for x,y in G_wd.nodes(data=True) if y['type'] == 'student']
            teachers = [x for x,y in G_wd.nodes(data=True) if y['type'] == 'teacher']
            family_members = [x for x,y in G_wd.nodes(data=True) if y['type'] == 'family_member']

            student_degree = np.asarray([G_wd.degree(s) for s in students]).mean()
            teacher_degree = np.asarray([G_wd.degree(t) for t in teachers]).mean()
            family_member_degree = np.asarray([G_wd.degree(f) for f in family_members]).mean()

            degree_df = degree_df.append({
                'school_type':school_type,
                'network':i,
                'weekday':wd_dict[j],
                'student_degree':student_degree,
                'teacher_degree':teacher_degree,
                'family_member_degree':family_member_degree
            }, ignore_index=True)
            
degree_df.to_csv(join(dst, 'node_degrees.csv'), index=False)


In [17]:
degree_df.groupby(['school_type', 'weekday']).agg('mean')

| school_type | weekday | family_member_degree | network | student_degree | teacher_degree |
|---|---|---|---|---|---|
| lower_secondary | school | 2.814638 | 999.5 | 29.575295 | 80.379 |
| lower_secondary | weekend | 2.814638 | 999.5 | 2.610795 | 1.845816 |
| lower_secondary_dc | school | 2.819571 | 999.5 | 32.281389 | 52.287167 |
| lower_secondary_dc | weekend | 2.819571 | 999.5 | 2.651611 | 1.773107 |
| primary | school | 2.825838 | 999.5 | 23.585434 | 43.149292 |
| primary | weekend | 2.825838 | 999.5 | 2.628789 | 1.873385 |
| primary_dc | school | 2.820116 | 999.5 | 32.375612 | 43.308813 |
| primary_dc | weekend | 2.820116 | 999.5 | 2.615947 | 1.828166 |
| secondary | school | 2.789579 | 999.5 | 33.617989 | 99.580721 |
| secondary | weekend | 2.789579 | 999.5 | 2.696556 | 1.839221 |


In [18]:
# factor by which the node degree of teachers in secondary schools is larger
# than that in primary schools
100/43

2.3255813953488373
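
The factor above uses rounded degrees (100 and 43). As a sketch, the same ratio can be computed from the unrounded school-day means in the table, which gives a slightly lower factor of about 2.31 for secondary schools. The dictionary names are illustrative.

```python
# mean teacher degrees on school days, copied from the table above
teacher_degree_school = {
    'primary': 43.149292,
    'lower_secondary': 80.379,
    'secondary': 99.580721,
}

# factor by which the teacher degree exceeds that of primary schools
ratio_to_primary = {
    school_type: degree / teacher_degree_school['primary']
    for school_type, degree in teacher_degree_school.items()
}

for school_type, ratio in ratio_to_primary.items():
    print('{}: {:.2f}'.format(school_type, ratio))
```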