# Risk of COVID-19 transmission calculator

I'm using this Jupyter notebook to generate all the valid answer combinations for my quiz, which is based on Figure 3 in [Jones N R, Qureshi Z U, Temple R J, Larwood J P J, Greenhalgh T, Bourouiba L et al. Two metres or one: what is the evidence for physical distancing in covid-19? *BMJ* 2020;370:m3223](https://www.bmj.com/content/370/bmj.m3223#F3).

A list of which variable corresponds to which condition is available in **COVID_QandA_logic.xlsx**.

In [1]:
import itertools

# Define the variables for each group
group_1 = ['a', 'b', 'c']
group_2 = ['d', 'e']
group_3 = ['f', 'g']
group_4 = ['h', 'i']
group_5 = ['j', 'k', 'l']

# Generate all possible combinations using itertools.product
combinations = list(itertools.product(group_1, group_2, group_3, group_4, group_5))

# Convert the combinations to a string format, e.g., 'a + d + f + h + j'
combinations_str = [' + '.join(combo) for combo in combinations]

# Display all the combinations
for combo in combinations_str:
    print(combo)

a + d + f + h + j
a + d + f + h + k
a + d + f + h + l
a + d + f + i + j
a + d + f + i + k
a + d + f + i + l
a + d + g + h + j
a + d + g + h + k
a + d + g + h + l
a + d + g + i + j
a + d + g + i + k
a + d + g + i + l
a + e + f + h + j
a + e + f + h + k
a + e + f + h + l
a + e + f + i + j
a + e + f + i + k
a + e + f + i + l
a + e + g + h + j
a + e + g + h + k
a + e + g + h + l
a + e + g + i + j
a + e + g + i + k
a + e + g + i + l
b + d + f + h + j
b + d + f + h + k
b + d + f + h + l
b + d + f + i + j
b + d + f + i + k
b + d + f + i + l
b + d + g + h + j
b + d + g + h + k
b + d + g + h + l
b + d + g + i + j
b + d + g + i + k
b + d + g + i + l
b + e + f + h + j
b + e + f + h + k
b + e + f + h + l
b + e + f + i + j
b + e + f + i + k
b + e + f + i + l
b + e + g + h + j
b + e + g + h + k
b + e + g + h + l
b + e + g + i + j
b + e + g + i + k
b + e + g + i + l
c + d + f + h + j
c + d + f + h + k
c + d + f + h + l
c + d + f + i + j
c + d + f + i + k
c + d + f + i + l
c + d + g + h + j
c + d + g 
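
As a quick cross-check on the output above, the number of combinations should equal the product of the group sizes (3 × 2 × 2 × 2 × 3 = 72). A minimal sketch, re-declaring the groups so that the snippet runs on its own:

```python
import itertools
import math

# Same groups as in the cell above, repeated so this snippet is self-contained
groups = [
    ['a', 'b', 'c'],
    ['d', 'e'],
    ['f', 'g'],
    ['h', 'i'],
    ['j', 'k', 'l'],
]

# One option is picked from each group, so the total is the product of the sizes
expected_count = math.prod(len(group) for group in groups)

all_combinations = [' + '.join(combo) for combo in itertools.product(*groups)]

print(expected_count)         # 72
print(len(all_combinations))  # 72
```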

In [2]:
import pandas as pd

# Create a DataFrame with the combinations
df_combinations = pd.DataFrame(combinations_str, columns=['Combination'])

# Save the DataFrame to a CSV file
df_combinations.to_csv('combinations.csv', index=False)

# Display the first few rows of the DataFrame
df_combinations.head()

Unnamed: 0,Combination
0,a + d + f + h + j
1,a + d + f + h + k
2,a + d + f + h + l
3,a + d + f + i + j
4,a + d + f + i + k


In [3]:
# Create a new dataframe holding both the combinations and their resulting categories.
# I added the categories manually, based on the figure I'm working from.

df = pd.read_csv('combinations_with_categories.csv')

df.head()

Unnamed: 0,Combination,Category
0,a + d + f + h + j,Low risk
1,a + d + f + h + k,Low risk
2,a + d + f + h + l,Low risk
3,a + d + f + i + j,Low risk
4,a + d + f + i + k,Low risk


I think I have just replicated in Python what I had already tried to do in Excel, but at least this way I'm less likely to make errors.

In [5]:
# Let's run a few checks to confirm that I haven't made any basic data transcription errors. 

df['Category'].nunique()

4

In [6]:
df['Category'].value_counts()

Category
High risk          25
Medium risk        24
Low risk           21
Low/Medium risk     2
Name: count, dtype: int64
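
The counts above do add up to 72 (25 + 24 + 21 + 2), but `value_counts()` alone would not catch a mistyped combination or a missing row. A sketch of a stricter check, using a hypothetical helper `check_labels` (not part of the original notebook) and three illustrative rows with deliberate typos rather than the real CSV:

```python
from collections import Counter

def check_labels(rows, expected_combinations, allowed_categories):
    """Return a list of problems found in (combination, category) rows.

    Hypothetical helper for illustration only.
    """
    problems = []
    counts = Counter(combination for combination, _ in rows)

    # Every expected combination should appear exactly once
    for combination in expected_combinations:
        if counts[combination] == 0:
            problems.append(f"missing: {combination}")
        elif counts[combination] > 1:
            problems.append(f"duplicated: {combination}")

    # Nothing outside the expected set should appear
    for combination in counts:
        if combination not in expected_combinations:
            problems.append(f"unexpected: {combination}")

    # Categories should come from the known set of labels
    for combination, category in rows:
        if category not in allowed_categories:
            problems.append(f"unknown category {category!r} for {combination}")

    return problems


allowed = {'Low risk', 'Low/Medium risk', 'Medium risk', 'High risk'}
expected = ['a + d + f + h + j', 'a + d + f + h + k', 'a + d + f + h + l']

# Illustrative rows only: one typo in a category and one in a combination
rows = [
    ('a + d + f + h + j', 'Low risk'),
    ('a + d + f + h + k', 'Low  risk'),  # double space in the label
    ('a + d + f + h + 1', 'Low risk'),   # digit 1 typed instead of letter l
]

for problem in check_labels(rows, expected, allowed):
    print(problem)
```

With the real data, the rows could be built with `list(zip(df['Combination'], df['Category']))` and compared against `combinations_str` from the first cell.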

It was at this point that I went down **a matrix algebra rabbit hole**. I don't recommend following me: it was a bad idea that ultimately didn't work.

In [7]:
# First I need the 72 combinations as a list.

df['Combination'].tolist()

['a + d + f + h + j',
 'a + d + f + h + k',
 'a + d + f + h + l',
 'a + d + f + i + j',
 'a + d + f + i + k',
 'a + d + f + i + l',
 'a + d + g + h + j',
 'a + d + g + h + k',
 'a + d + g + h + l',
 'a + d + g + i + j',
 'a + d + g + i + k',
 'a + d + g + i + l',
 'a + e + f + h + j',
 'a + e + f + h + k',
 'a + e + f + h + l',
 'a + e + f + i + j',
 'a + e + f + i + k',
 'a + e + f + i + l',
 'a + e + g + h + j',
 'a + e + g + h + k',
 'a + e + g + h + l',
 'a + e + g + i + j',
 'a + e + g + i + k',
 'a + e + g + i + l',
 'b + d + f + h + j',
 'b + d + f + h + k',
 'b + d + f + h + l',
 'b + d + f + i + j',
 'b + d + f + i + k',
 'b + d + f + i + l',
 'b + d + g + h + j',
 'b + d + g + h + k',
 'b + d + g + h + l',
 'b + d + g + i + j',
 'b + d + g + i + k',
 'b + d + g + i + l',
 'b + e + f + h + j',
 'b + e + f + h + k',
 'b + e + f + h + l',
 'b + e + f + i + j',
 'b + e + f + i + k',
 'b + e + f + i + l',
 'b + e + g + h + j',
 'b + e + g + h + k',
 'b + e + g + h + l',
 'b + e + 

In [8]:
# Let's try this matrix algebra idea

import numpy as np

# List of all variables
variables = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

# List of combinations
combinations = [
    'a + d + f + h + j',
    'a + d + f + h + k',
    'a + d + f + h + l',
    'a + d + f + i + j',
    'a + d + f + i + k',
    'a + d + f + i + l',
    'a + d + g + h + j',
    'a + d + g + h + k',
    'a + d + g + h + l',
    'a + d + g + i + j',
    'a + d + g + i + k',
    'a + d + g + i + l',
    'a + e + f + h + j',
    'a + e + f + h + k',
    'a + e + f + h + l',
    'a + e + f + i + j',
    'a + e + f + i + k',
    'a + e + f + i + l',
    'a + e + g + h + j',
    'a + e + g + h + k',
    'a + e + g + h + l',
    'a + e + g + i + j',
    'a + e + g + i + k',
    'a + e + g + i + l',
    'b + d + f + h + j',
    'b + d + f + h + k',
    'b + d + f + h + l',
    'b + d + f + i + j',
    'b + d + f + i + k',
    'b + d + f + i + l',
    'b + d + g + h + j',
    'b + d + g + h + k',
    'b + d + g + h + l',
    'b + d + g + i + j',
    'b + d + g + i + k',
    'b + d + g + i + l',
    'b + e + f + h + j',
    'b + e + f + h + k',
    'b + e + f + h + l',
    'b + e + f + i + j',
    'b + e + f + i + k',
    'b + e + f + i + l',
    'b + e + g + h + j',
    'b + e + g + h + k',
    'b + e + g + h + l',
    'b + e + g + i + j',
    'b + e + g + i + k',
    'b + e + g + i + l',
    'c + d + f + h + j',
    'c + d + f + h + k',
    'c + d + f + h + l',
    'c + d + f + i + j',
    'c + d + f + i + k',
    'c + d + f + i + l',
    'c + d + g + h + j',
    'c + d + g + h + k',
    'c + d + g + h + l',
    'c + d + g + i + j',
    'c + d + g + i + k',
    'c + d + g + i + l',
    'c + e + f + h + j',
    'c + e + f + h + k',
    'c + e + f + h + l',
    'c + e + f + i + j',
    'c + e + f + i + k',
    'c + e + f + i + l',
    'c + e + g + h + j',
    'c + e + g + h + k',
    'c + e + g + h + l',
    'c + e + g + i + j',
    'c + e + g + i + k',
    'c + e + g + i + l'
]

# Function to convert each combination into a row of 0s and 1s
def combination_to_row(combination, variables):
    row = [0] * len(variables)
    
    # Replace each ' + ' separator with a space, then split on whitespace
    terms = combination.replace(' + ', ' ').split()
    
    # Mark the variables present in this combination with 1
    for term in terms:
        index = variables.index(term)
        row[index] = 1
    
    return row

# Create a list of rows for the matrix (one row per combination)
matrix_rows = [combination_to_row(combination, variables) for combination in combinations]

# Convert to a numpy array for easier manipulation
matrix = np.array(matrix_rows)

# Print the matrix
print(matrix)

[[1 0 0 1 0 1 0 1 0 1 0 0]
 [1 0 0 1 0 1 0 1 0 0 1 0]
 [1 0 0 1 0 1 0 1 0 0 0 1]
 [1 0 0 1 0 1 0 0 1 1 0 0]
 [1 0 0 1 0 1 0 0 1 0 1 0]
 [1 0 0 1 0 1 0 0 1 0 0 1]
 [1 0 0 1 0 0 1 1 0 1 0 0]
 [1 0 0 1 0 0 1 1 0 0 1 0]
 [1 0 0 1 0 0 1 1 0 0 0 1]
 [1 0 0 1 0 0 1 0 1 1 0 0]
 [1 0 0 1 0 0 1 0 1 0 1 0]
 [1 0 0 1 0 0 1 0 1 0 0 1]
 [1 0 0 0 1 1 0 1 0 1 0 0]
 [1 0 0 0 1 1 0 1 0 0 1 0]
 [1 0 0 0 1 1 0 1 0 0 0 1]
 [1 0 0 0 1 1 0 0 1 1 0 0]
 [1 0 0 0 1 1 0 0 1 0 1 0]
 [1 0 0 0 1 1 0 0 1 0 0 1]
 [1 0 0 0 1 0 1 1 0 1 0 0]
 [1 0 0 0 1 0 1 1 0 0 1 0]
 [1 0 0 0 1 0 1 1 0 0 0 1]
 [1 0 0 0 1 0 1 0 1 1 0 0]
 [1 0 0 0 1 0 1 0 1 0 1 0]
 [1 0 0 0 1 0 1 0 1 0 0 1]
 [0 1 0 1 0 1 0 1 0 1 0 0]
 [0 1 0 1 0 1 0 1 0 0 1 0]
 [0 1 0 1 0 1 0 1 0 0 0 1]
 [0 1 0 1 0 1 0 0 1 1 0 0]
 [0 1 0 1 0 1 0 0 1 0 1 0]
 [0 1 0 1 0 1 0 0 1 0 0 1]
 [0 1 0 1 0 0 1 1 0 1 0 0]
 [0 1 0 1 0 0 1 1 0 0 1 0]
 [0 1 0 1 0 0 1 1 0 0 0 1]
 [0 1 0 1 0 0 1 0 1 1 0 0]
 [0 1 0 1 0 0 1 0 1 0 1 0]
 [0 1 0 1 0 0 1 0 1 0 0 1]
 [0 1 0 0 1 1 0 1 0 1 0 0]
 

In [11]:
matrix.dtype

dtype('int64')
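
Because each combination picks exactly one variable from each of the five groups, every row of the matrix should contain exactly five 1s, one per group. A minimal sketch of that check in plain Python, rebuilding the rows rather than relying on the `matrix` variable above (the helper name `one_hot` is my own):

```python
import itertools

groups = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g'], ['h', 'i'], ['j', 'k', 'l']]
variables = [v for group in groups for v in group]

def one_hot(combo, variables):
    """Encode a tuple of chosen variables as a list of 0s and 1s."""
    chosen = set(combo)
    return [1 if v in chosen else 0 for v in variables]

rows = [one_hot(combo, variables) for combo in itertools.product(*groups)]

# Each row should have exactly one 1 inside each group's block of columns
for row in rows:
    start = 0
    for group in groups:
        block = row[start:start + len(group)]
        assert sum(block) == 1, (row, group)
        start += len(group)

print(len(rows), len(rows[0]))  # 72 12
print(rows[0])                  # [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
```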

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df

Unnamed: 0,Combination,Category
0,a + d + f + h + j,Low risk
1,a + d + f + h + k,Low risk
2,a + d + f + h + l,Low risk
3,a + d + f + i + j,Low risk
4,a + d + f + i + k,Low risk
...,...,...
67,c + e + g + h + k,High risk
68,c + e + g + h + l,High risk
69,c + e + g + i + j,High risk
70,c + e + g + i + k,High risk


In [14]:
category_series = df['Category']

type(category_series)  # Check the type to confirm it's a Series

pandas.core.series.Series

In [15]:
category_series

0      Low risk
1      Low risk
2      Low risk
3      Low risk
4      Low risk
        ...    
67    High risk
68    High risk
69    High risk
70    High risk
71    High risk
Name: Category, Length: 72, dtype: object

In [17]:
X = np.array(matrix)
X

array([[1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
       [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
       [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0],
       [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
       [1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0],
       [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
       [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
       [1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0],
       [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0],
       [1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
       [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0],
       [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
       [1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0],
       [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0],
       [1,

In [18]:
y = pd.Series(category_series)
y

0      Low risk
1      Low risk
2      Low risk
3      Low risk
4      Low risk
        ...    
67    High risk
68    High risk
69    High risk
70    High risk
71    High risk
Name: Category, Length: 72, dtype: object

In [19]:
category_series.value_counts()

Category
High risk          25
Medium risk        24
Low risk           21
Low/Medium risk     2
Name: count, dtype: int64

In [21]:
category_map = {'Low risk': 0, 'Low/Medium risk': 1, 'Medium risk':2, 'High risk': 3}
y_numeric = y.map(category_map)
y_numeric

0     0
1     0
2     0
3     0
4     0
     ..
67    3
68    3
69    3
70    3
71    3
Name: Category, Length: 72, dtype: int64

In [22]:
from sklearn.model_selection import train_test_split

# Split the data (X = features, y = labels) into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_numeric, test_size=0.3, random_state=42)

In [23]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the model
model = RandomForestClassifier(random_state=42)

# Train the model
model.fit(X_train, y_train)

Parameter,Value
n_estimators,100
criterion,'gini'
max_depth,None
min_samples_split,2
min_samples_leaf,1
min_weight_fraction_leaf,0.0
max_features,'sqrt'
max_leaf_nodes,None
min_impurity_decrease,0.0
bootstrap,True


In [24]:
from sklearn.metrics import accuracy_score

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

Model Accuracy: 50.00%
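
For context on that figure: with 72 rows and `test_size=0.3`, the test set is small. As far as I understand scikit-learn's behaviour, a fractional `test_size` is rounded up to a whole number of rows, so the arithmetic works out roughly as follows (a back-of-envelope sketch, not output from the notebook):

```python
import math

n_samples = 72
test_size = 0.3

# Assumption: the fractional test size is rounded up to whole rows
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# 50% accuracy on the test set corresponds to this many correct predictions
n_correct = round(0.50 * n_test)

print(n_train, n_test, n_correct)  # 50 22 11
```

Under that assumption the score reflects 11 of 22 held-out combinations. Since the table already lists every possible combination, each held-out row is one the model never saw during training.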


The matrix approach was a bad idea, not least because I don't understand what is going on under the hood, and 50% accuracy is no use for a calculator. So I'm going to code the calculator another way.

In [25]:
df

Unnamed: 0,Combination,Category
0,a + d + f + h + j,Low risk
1,a + d + f + h + k,Low risk
2,a + d + f + h + l,Low risk
3,a + d + f + i + j,Low risk
4,a + d + f + i + k,Low risk
...,...,...
67,c + e + g + h + k,High risk
68,c + e + g + h + l,High risk
69,c + e + g + i + j,High risk
70,c + e + g + i + k,High risk


I'm hoping there is a way for the quiz to output a combination, look that combination up in the above dataframe (or CSV), and return the category.

**Spoiler:** There was, although it was easiest to convert the CSV to JSON first.
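
A minimal sketch of that lookup, assuming the CSV has `Combination` and `Category` columns as in the dataframe above. The three sample rows are copied from the output shown earlier; the function name `lookup_category` and the in-memory CSV are placeholders for illustration, not the code the quiz actually uses:

```python
import csv
import io
import json

# Stand-in for combinations_with_categories.csv (three rows copied from above)
csv_text = """Combination,Category
a + d + f + h + j,Low risk
a + d + f + h + k,Low risk
c + e + g + i + k,High risk
"""

# Turn the CSV rows into a {combination: category} mapping
reader = csv.DictReader(io.StringIO(csv_text))
lookup = {row['Combination']: row['Category'] for row in reader}

# Serialise to JSON, e.g. for a quiz front end to load
lookup_json = json.dumps(lookup, indent=2)

def lookup_category(answers, table):
    """Join the chosen letters with ' + ' and return the matching category."""
    key = ' + '.join(answers)
    return table.get(key, 'Unknown combination')

table = json.loads(lookup_json)
print(lookup_category(['a', 'd', 'f', 'h', 'j'], table))  # Low risk
print(lookup_category(['c', 'e', 'g', 'i', 'k'], table))  # High risk
```

A plain dictionary works here because all 72 combinations are known in advance, so the lookup is exact and needs no model.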