## Covid Bayes Net


First we create a Pomegranate Bayesian network using the protobuf interface and our utility package, bayes (imported below as sn_bayes), which makes it convenient to build a Bayesian network by hand without filling in every probability.  The same Bayesian network can start from our hand-entered guesses and later learn from data.  Any number of questions can be answered, including none, to get the probability of having covid, the severity of illness, and the chance of going to the hospital.  The network is defined in the file covid_bayes.py.  Our utility package can then be used to query the network with different patient states, which is convenient for tuning hand-entered parameters.
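
To make "any number of questions, including none" concrete, here is a minimal sketch of a two-node network (age feeding severity) in plain Python. It does not use Pomegranate or sn_bayes, and every number in it is made up for illustration.

```python
# Illustrative two-node network: age -> severity. All numbers are invented.
P_AGE = {"elderly": 0.2, "adult": 0.8}
P_SEVERE_GIVEN_AGE = {"elderly": 0.6, "adult": 0.1}


def p_severe(evidence):
    """Return P(severe | evidence), summing over age when it is unanswered."""
    if "age" in evidence:
        return P_SEVERE_GIVEN_AGE[evidence["age"]]
    # No answer for age: weight each age bucket by its prior probability.
    return sum(P_AGE[a] * P_SEVERE_GIVEN_AGE[a] for a in P_AGE)


print(p_severe({}))                  # no questions answered
print(p_severe({"age": "elderly"}))  # one question answered
```

With no evidence the result is the prior-weighted mix of the two age buckets; once age is answered, only that bucket's row is used.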



In [1]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install -e ../covid-bayesnet
#!{sys.executable} -m pip install protobuf

Requirement already up-to-date: pip in /usr/local/lib/python3.6/dist-packages (20.2.2)
Obtaining file:///home/opencog/covid-bayesnet
Installing collected packages: covid-bayes
  Running setup.py develop for covid-bayes
Successfully installed covid-bayes


In [2]:
pwd


'/home/opencog/covid-bayesnet'

In [3]:
!{sys.executable} -m pip install -r requirements.txt
!sh buildproto.sh



### Covid net
Put the name of the network you have written with the protobuf and bayes utilities here.  Our example is printed below.  We created four functions in our utility package, any, all, if_then_else, and avg, which together can express almost any set of rules; they are documented elsewhere.  The last lines of the file create the description of the network, bayesianNetwork, with the protobuf utilities, and then create the Pomegranate network from that description.

In [4]:
# Print the network definition; the context manager closes the file for us.
with open('./sn_bayes/covid_bayes.py') as text_file:
    print(text_file.read())

import sn_bayes
from sn_bayes.utils import any
from sn_bayes.utils import all
from sn_bayes.utils import avg
from sn_bayes.utils import if_then_else
from sn_bayes.utils import bayesInitialize
from sn_bayes.utils import addCpt


import sn_service.service_spec.bayesian_pb2
from sn_service.service_spec.bayesian_pb2 import BayesianNetwork


def covid_bayes():
	bayesianNetwork = BayesianNetwork()



	#probabilities within distributions must sum to 1.0
	#questions left blank or "prefer not to answer" will be computed



	#basics/init

	discreteDistribution = bayesianNetwork.discreteDistributions.add()
	discreteDistribution.name = "acute_medical_condition"
	variable = discreteDistribution.variables.add()
	variable.name = "acute_medical_condition"
	variable.probability = 0.02
	variable = discreteDistribution.variables.add()
	variable.name = "no_acute_medical_condition"
	variable.probability = 0.98

	# basics/demographics questions 


	discreteDistribution = bayesianNetwork.discreteDistribution

Next, run the file.  All routines are stateless.

In [5]:
import sn_bayes
from sn_bayes import covid_bayes
#%run -i './bayes/covid_bayes.py'
#%run -i './bayes/test.py'
bayesianNetwork = covid_bayes.covid_bayes()

To make it easier to create a table, here are all the variables in the net, in order.  They should be mapped into the individual buckets of the rules.  The variables up to "systemically_disadvantaged" are the ones that the user may answer; the rest are computed.

In [6]:
import sn_bayes
from sn_bayes.utils import get_var_positions
var_positions = get_var_positions(bayesianNetwork)
var_positions

{'acute_medical_condition': 0,
 'age': 1,
 'sex': 2,
 'height_in_feet': 3,
 'weight_in_pounds': 4,
 'ethnicity': 5,
 'cardiovascular_disease': 6,
 'diabetes': 7,
 'hypertension': 8,
 'lung_disease': 9,
 'kidney_disease': 10,
 'cancer': 11,
 'immunocompromised': 12,
 'psychological_disorders': 13,
 'body_temperature': 14,
 'low_oxygen_symptoms': 15,
 'shortness_of_breath': 16,
 'cough': 17,
 'colored_spots_on_toes': 18,
 'hx_lung_disease': 19,
 'hx_family_lung_disease': 20,
 'muscle_weakness': 21,
 'difficulty_moving': 22,
 'neck_stiffness': 23,
 'low_urine': 24,
 'frequent_diarrhea': 25,
 'nausea': 26,
 'vomiting': 27,
 'decreased_smell_or_taste': 28,
 'sore_throat': 29,
 'pink_eye': 30,
 'headache': 31,
 'bmi': 32,
 'exposure': 33,
 'isolation_space': 34,
 'leaving_house_per_day': 35,
 'high_risk_place_per_week': 36,
 'deliveries_per_week': 37,
 'mask': 38,
 'public_transportation_per_week': 39,
 'workplace_social_distancing': 40,
 'neighbors_social_distancing': 41,
 'visits_per_week'
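
As a sketch of how these positions could be used to lay answers out in a table row, the snippet below places each answered value at its variable's index and leaves unanswered slots as None. The helper name evidence_to_row is hypothetical, and the dict is only an excerpt of the output above.

```python
# Excerpt of the mapping printed above; the full dict comes from get_var_positions.
var_positions = {"acute_medical_condition": 0, "age": 1, "sex": 2, "height_in_feet": 3}


def evidence_to_row(evidence, positions):
    """Place each answered value at its variable's position; None marks unanswered."""
    row = [None] * len(positions)
    for name, value in evidence.items():
        row[positions[name]] = value
    return row


row = evidence_to_row({"age": "elderly", "sex": "female"}, var_positions)
print(row)  # [None, 'elderly', 'female', None]
```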

Here are the positions of every bucketed response for each of the above variables:

In [7]:
import sn_bayes
from sn_bayes.utils import get_var_val_positions
var_val_positions = get_var_val_positions(bayesianNetwork)
var_val_positions

{'acute_medical_condition': {'acute_medical_condition': 0,
  'no_acute_medical_condition': 1},
 'age': {'elderly': 0, 'adult': 1, 'young_adult': 2, 'teen': 3, 'child': 4},
 'sex': {'male': 0, 'female': 1},
 'height_in_feet': {'height_above_seven': 0,
  'height_six_to_seven': 1,
  'height_five_to_six': 2,
  'height_four_to_five': 3,
  'height_under_four': 4},
 'weight_in_pounds': {'weight_over_250': 0,
  'weight_175_to_220': 1,
  'weight_125_to_175': 2,
  'weight_100_to_125': 3,
  'weight_under_100': 4},
 'ethnicity': {'african_american': 0,
  'hispanic': 1,
  'ethnicity_other': 2,
  'african': 3,
  'middle_eastern': 4,
  'native_american': 5,
  'pacific_islander': 6,
  'asian': 7,
  'caucasian': 8},
 'cardiovascular_disease': {'cardiovascular_disease': 0,
  'no_cardiovascular_disease': 1},
 'diabetes': {'diabetes': 0, 'no_diabetes': 1},
 'hypertension': {'hypertension': 0, 'no_hypertension': 1},
 'lung_disease': {'lung_disease': 0, 'no_lung_disease': 1},
 'kidney_disease': {'kidney_dis
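
One possible use of this mapping is to check evidence for misspelled variables or bucket names before querying. The following is a minimal sketch with a hypothetical helper, check_evidence, run against an excerpt of the output above.

```python
# Excerpt of the mapping printed above; the full dict comes from get_var_val_positions.
var_val_positions = {
    "age": {"elderly": 0, "adult": 1, "young_adult": 2, "teen": 3, "child": 4},
    "sex": {"male": 0, "female": 1},
    "diabetes": {"diabetes": 0, "no_diabetes": 1},
}


def check_evidence(evidence, val_positions):
    """Return a list of problems: unknown variables or unknown bucket names."""
    problems = []
    for var, val in evidence.items():
        if var not in val_positions:
            problems.append("unknown variable: {}".format(var))
        elif val not in val_positions[var]:
            problems.append("unknown value for {}: {}".format(var, val))
    return problems


print(check_evidence({"age": "elderly", "sex": "femal"}, var_val_positions))
# ['unknown value for sex: femal']
```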

We create a spreadsheet of the above for reference:

In [8]:
import pandas as pd

outname = "varvals.csv"

# One row per variable, formatted as "variable(value1,value2,...)".
# Joining the keys avoids trimming a trailing comma, which would also
# eat the opening parenthesis if a variable had no values.
rows_list = [
    {"variable": "{}({})".format(var, ",".join(valdict))}
    for var, valdict in var_val_positions.items()
]

df = pd.DataFrame(rows_list)
df.to_csv(outname, index=False)

Here we create a spreadsheet that takes the Cartesian product of the desired outputs with the remaining variables.  It can be used, for example, to fill in a treatment recommendation for the variables that the explanation module returns as causing an output.

In [9]:
import pandas as pd
#cols = ["leaves","internal","output"]
desired_output =['social_distancing',
                 'emergency_treatment',
                 'covid_risk',
                 'covid_severity']
rows_list = []
outname = "upshot.csv"

#put leaves in the first column, then internal vars along with the vars that feed them in the second.  
#and do a cartesian product with the outvars that are also internal vars (should be all of them)

for output in desired_output:
    for dist in bayesianNetwork.discreteDistributions:
        print(dist.name)
        rowdict = {} 
        rowdict["leaves"]= dist.name
        rowdict["internal"]=""
        rowdict["output"]= output
        rows_list.append(rowdict)
    for table in bayesianNetwork.conditionalProbabilityTables:
        print ("table: {}".format(table.name))
        rowdict = {}
        rowdict["leaves"]=""
        internal_str = table.name + " ("
        for var in table.randomVariables:
            print(var.name)
            internal_str += var.name
            internal_str+= ","
            
        internal_str=internal_str[:-1]+")"
        rowdict["internal"] = internal_str
        rowdict["output"]= output
        rows_list.append(rowdict)
df = pd.DataFrame(rows_list)      
df.to_csv(outname, index = False)

acute_medical_condition
age
sex
height_in_feet
weight_in_pounds
ethnicity
cardiovascular_disease
diabetes
hypertension
lung_disease
kidney_disease
cancer
immunocompromised
psychological_disorders
body_temperature
low_oxygen_symptoms
shortness_of_breath
cough
colored_spots_on_toes
hx_lung_disease
hx_family_lung_disease
muscle_weakness
difficulty_moving
neck_stiffness
low_urine
frequent_diarrhea
nausea
vomiting
decreased_smell_or_taste
sore_throat
pink_eye
headache
bmi
exposure
isolation_space
leaving_house_per_day
high_risk_place_per_week
deliveries_per_week
mask
public_transportation_per_week
workplace_social_distancing
neighbors_social_distancing
visits_per_week
local_govt_social_distancing
wash_hands_per_day
severe_neck_pain
tested
swab_test
antibody_test
saliva_test
hotspot_anomaly
heart_rate_anomaly
heart_rate_variability_anomaly
oxygen_anomaly
table: covid_test
swab_test
antibody_test
saliva_test
table: metabolic_disease
cardiovascular_disease
diabetes
hypertension
table: chronic_
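
The nested loops in the cell above amount to a Cartesian product of outputs and nodes. A minimal sketch of the same pairing with itertools.product, using a short excerpt of the names printed above rather than the protobuf objects:

```python
from itertools import product

desired_output = ["covid_risk", "covid_severity"]
# Excerpt of node names printed above (two leaves and one table).
node_names = ["age", "diabetes", "metabolic_disease"]

# One row per (output, node) pair, in the same order as the nested loops.
rows = [
    {"node": node, "output": output}
    for output, node in product(desired_output, node_names)
]
print(len(rows))  # 6
```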

Here is one more CSV, which uses a utility function to express the net in tree form as a data frame and then prints it out.

In [10]:
import sn_bayes
from sn_bayes.utils import make_tree
import pandas as pd
df = make_tree(bayesianNetwork)
outname = "tree.csv"
df.to_csv(outname, index = False)

In [11]:
df

Unnamed: 0,level0,level1,level2,level3,level4,level5,level6
0,acute_medical_condition),"covid_test(swab_test,antibody_test,saliva_test)","comorbidities(chronic_conditions,metabolic_dis...",serious_shortness_of_breath(shortness_of_breat...,"low_covid(shortness_of_breath,body_temperature...","covid_symptom_level(high_covid,medium_covid,lo...","covid_risk(covid_symptom_level,covid_environme..."
1,age),"metabolic_disease(cardiovascular_disease,diabe...",covid_symptoms(gastrointestinal_covid_symptoms...,"covid_vulnerabilities(covid_symptoms,social_di...","medium_covid(serious_shortness_of_breath,body_...",,"covid_risk_binary(covid_symptom_level,covid_en..."
2,sex),"chronic_conditions(lung_disease,cancer,kidney_...",social_distancing(social_distancing_environmen...,"covid_severity(age,comorbidities)","high_covid(cough,muscle_weakness,low_oxygen_sy...",,
3,height_in_feet),"demographics(age,ethnicity,bmi)",social_distancing_binary(social_distancing_env...,"covid_severity_binary(age,comorbidities)",,,
4,weight_in_pounds),"specific_covid_symptoms(colored_spots_on_toes,...","emergency_treatment(possible_dehydration,possi...",,,,
5,ethnicity),"head_and_neck_covid_symptoms(neck_stiffness,so...",,,,,
6,cardiovascular_disease),"gastrointestinal_covid_symptoms(low_urine,naus...",,,,,
7,diabetes),"personal_social_distancing(isolation_space,del...",,,,,
8,hypertension),social_distancing_connectedness(visits_per_wee...,,,,,
9,lung_disease),social_distancing_environment(workplace_social...,,,,,
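
The level columns look consistent with placing each node one level above its deepest input. The sketch below illustrates that rule; it is an assumption about the layout, not the implementation of make_tree, and the parent lists are partial wherever the output above is truncated.

```python
from functools import lru_cache

# Partial parent lists read from the outputs above; truncated entries are left out.
PARENTS = {
    "metabolic_disease": ["cardiovascular_disease", "diabetes", "hypertension"],
    "chronic_conditions": ["lung_disease", "cancer"],
    "comorbidities": ["chronic_conditions", "metabolic_disease"],
    "covid_severity": ["age", "comorbidities"],
}


@lru_cache(maxsize=None)
def level(node):
    """Leaves (no listed parents) are level 0; others sit one above their deepest parent."""
    parents = PARENTS.get(node, [])
    if not parents:
        return 0
    return 1 + max(level(p) for p in parents)


for name in ["age", "metabolic_disease", "comorbidities", "covid_severity"]:
    print(name, level(name))
```

For these four nodes the computed levels are 0, 1, 2, and 3, matching the columns in which they appear above.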


Here is the description of the Bayesian network in protobuf that we just created by running the Python file.  Only the "leaves" have initial probabilities (DiscreteDistribution); the conditional probabilities (ConditionalProbabilityTable) are to be computed:

In [12]:
bayesianNetwork

discreteDistributions {
  name: "acute_medical_condition"
  variables {
    name: "acute_medical_condition"
    probability: 0.019999999552965164
  }
  variables {
    name: "no_acute_medical_condition"
    probability: 0.9800000190734863
  }
}
discreteDistributions {
  name: "age"
  variables {
    name: "elderly"
    probability: 0.10000000149011612
  }
  variables {
    name: "adult"
    probability: 0.30000001192092896
  }
  variables {
    name: "young_adult"
    probability: 0.20000000298023224
  }
  variables {
    name: "teen"
    probability: 0.10000000149011612
  }
  variables {
    name: "child"
    probability: 0.10000000149011612
  }
}
discreteDistributions {
  name: "sex"
  variables {
    name: "male"
    probability: 0.5
  }
  variables {
    name: "female"
    probability: 0.5
  }
}
discreteDistributions {
  name: "height_in_feet"
  variables {
    name: "height_above_seven"
    probability: 0.05000000074505806
  }
  variables {
    name: "height_six_to_seven"
    prob
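
Note that the hand-entered age priors above sum to 0.8, although the comment in covid_bayes.py says each distribution must sum to 1.0. The age values reported by predict_proba later in this notebook (0.125, 0.375, 0.25, 0.125, 0.125) are consistent with each prior being divided by that total; which layer performs the rescaling is not confirmed here. A minimal sketch of that renormalization, using a plain dict instead of the protobuf message:

```python
# Age priors as entered in the description above (rounded from the float32 printout).
age_priors = {"elderly": 0.1, "adult": 0.3, "young_adult": 0.2, "teen": 0.1, "child": 0.1}


def normalize(dist):
    """Scale a dict of non-negative weights so that the values sum to 1.0."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    return {name: p / total for name, p in dist.items()}


normalized = normalize(age_priors)
print(round(sum(age_priors.values()), 6))  # 0.8
print({name: round(p, 3) for name, p in normalized.items()})
```

A check like this can be run over every hand-entered distribution to catch priors that do not sum to 1.0 before they are silently rescaled.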

Next we build the Pomegranate net from the description, then compile ("bake") it so we can compute probabilities with it.

In [13]:
from sn_bayes.utils import bayesInitialize
covid = bayesInitialize(bayesianNetwork)

In [14]:
covid.bake()

We call a Pomegranate routine that shows the computed probabilities of every variable.  We have also made our own utility that pulls out particular variables.

In [15]:
covid.predict_proba({}) 

array([{
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "acute_medical_condition" :0.01999999918043658,
            "no_acute_medical_condition" :0.9800000008195634
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "elderly" :0.12499999883584717,
            "adult" :0.3750000058207651,
            "young_adult" :0.24999999767169342,
            "teen" :0.12499999883584717,
            "child" :0.12499999883584717
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "male" :0.5,
            "female" :0.5
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
       

Here we use our query utility to enter a patient state as evidence (evidence) and then see the probabilities of the particular variables we are interested in (outvars), in this case the output variables.  It needs the compiled Pomegranate Bayesian network as well as the description of the network.  This routine can be used to enter a particular patient's state one question at a time, updating the probabilities of all states after each answer.  Below, we first look at the probabilities of emergency treatment, covid risk, and covid severity in general.  Then we look at them given that the patient is elderly, and see that the chance of high severity has increased.  Adding more risk factors increases risk further, and adding more severity factors increases severity further: in the runs below, a temperature above 102F raises covid_risk_binary to 1.0, and adding diabetes raises high_covid_severity to 1.0.

In [16]:
import sn_bayes
from sn_bayes.utils import query
evidence = {}
outvars= ["social_distancing", "social_distancing_binary",
          "emergency_treatment",
          "covid_risk","covid_risk_binary",
          "covid_severity","covid_severity_binary"]
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2856514502734508,
  'no_emergency_treatment': 0.7143485497265493},
 'covid_risk': {'covid_risk': 0.123660833764136,
  'no_covid_risk': 0.8763391662358639},
 'covid_risk_binary': {'covid_risk': 0.44838607564917243,
  'no_covid_risk': 0.5516139243508277},
 'covid_severity': {'no_covid_severity': 0.012094859212994448,
  'medium_covid_severity': 0.49999999624452446,
  'low_covid_severity': 0.036284577638982865,
  'high_covid_severity': 0.4516205669034982},
 'covid_severity_binary': {'no_covid_severity': 0.048379436851977146,
  'covid_severity': 0.9516205631480229}}

In [17]:
evidence["age"]= "elderly"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2856514502734508,
  'no_emergency_treatment': 0.7143485497265493},
 'covid_risk': {'covid_risk': 0.12366083376413599,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 0.449334803375233,
  'no_covid_risk': 0.550665196624767},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.06391411890351985,
  'low_covid_severity': 0.0,
  'high_covid_severity': 0.9360858810964796},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}
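
Comparing two result dicts by eye is error-prone. A small helper (hypothetical, not part of sn_bayes) can list only the probabilities that moved; the values below are copied from the covid_severity entries of the two outputs above.

```python
# covid_severity excerpts: before and after adding evidence["age"] = "elderly".
before = {"covid_severity": {"high_covid_severity": 0.4516205669034982,
                             "medium_covid_severity": 0.49999999624452446}}
after = {"covid_severity": {"high_covid_severity": 0.9360858810964796,
                            "medium_covid_severity": 0.06391411890351985}}


def changed(before, after, tol=1e-6):
    """Return {(variable, value): delta} for probabilities that moved by more than tol."""
    deltas = {}
    for var, dist in after.items():
        for val, p in dist.items():
            delta = p - before[var][val]
            if abs(delta) > tol:
                deltas[(var, val)] = delta
    return deltas


for key, delta in changed(before, after).items():
    print(key, round(delta, 4))
```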

In [18]:
evidence["body_temperature"]= "body_temperature_above_102F"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2870088482520748,
  'no_emergency_treatment': 0.7129911517479252},
 'covid_risk': {'covid_risk': 0.12366083376413599,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 1.0, 'no_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.06391411890351985,
  'low_covid_severity': 0.0,
  'high_covid_severity': 0.9360858810964796},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}

In [19]:
evidence["diabetes"]= "diabetes"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2870088482520748,
  'no_emergency_treatment': 0.7129911517479252},
 'covid_risk': {'covid_risk': 0.12366083376413599,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 1.0, 'no_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.0,
  'low_covid_severity': 0.0,
  'high_covid_severity': 1.0},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}

In [20]:
evidence["diabetes_medication"]= "no_diabetes_medication"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2870088482520748,
  'no_emergency_treatment': 0.7129911517479252},
 'covid_risk': {'covid_risk': 0.12366083376413599,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 1.0, 'no_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.0,
  'low_covid_severity': 0.0,
  'high_covid_severity': 1.0},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}

In [21]:
evidence["hotspot_anomaly"]= "hotspot_anomaly"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2870088482520748,
  'no_emergency_treatment': 0.7129911517479252},
 'covid_risk': {'covid_risk': 0.12366083376413602,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 1.0, 'no_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.0,
  'low_covid_severity': 0.0,
  'high_covid_severity': 1.0},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}

In [22]:
evidence["exposure"]= "exposure_in_family_not_isolated"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2870088482520748,
  'no_emergency_treatment': 0.7129911517479252},
 'covid_risk': {'covid_risk': 0.12366083376413599,
  'no_covid_risk': 0.876339166235864},
 'covid_risk_binary': {'covid_risk': 1.0, 'no_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'medium_covid_severity': 0.0,
  'low_covid_severity': 0.0,
  'high_covid_severity': 1.0},
 'covid_severity_binary': {'no_covid_severity': 0.0, 'covid_severity': 1.0}}
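
The cells above repeat the same three lines for each new answer. The pattern can be sketched as a generator that adds one answer at a time; incremental and fake_query are hypothetical names, and fake_query is only a stand-in so the snippet runs without the network.

```python
def incremental(query_fn, answers):
    """Yield (question, results) after adding each answer to the evidence in turn."""
    evidence = {}
    for question, answer in answers:
        evidence[question] = answer
        # Pass a copy so the callee cannot mutate our running evidence.
        yield question, query_fn(dict(evidence))


# Stand-in for the real query; it only counts how many answers it was given.
def fake_query(evidence):
    return {"answered": len(evidence)}


answers = [("age", "elderly"), ("diabetes", "diabetes")]
steps = list(incremental(fake_query, answers))
print(steps)  # [('age', {'answered': 1}), ('diabetes', {'answered': 2})]
```

With the real network, the stand-in would be replaced by something like `lambda ev: query(covid, bayesianNetwork, ev, outvars)`.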

Here is an example of using queries to debug the rules we wrote in covid_bayes.py.  We reset the evidence and see that things we expected to make a difference in risk or severity make little or no difference.  This is important feedback: the reason is the overuse of the routine "avg" and not enough use of the routine "any" when creating the conditional probability tables in covid_bayes.py.

In [23]:
evidence = {}
evidence["ethnicity"]= "african_american"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'social_distancing': {'no_social_distancing': 0.002432639946873795,
  'low_social_distancing': 0.8399619806253042,
  'high_social_distancing': 0.0,
  'medium_social_distancing': 0.15760537942782177},
 'social_distancing_binary': {'no_social_distancing': 0.8423946205721781,
  'social_distancing': 0.1576053794278218},
 'emergency_treatment': {'emergency_treatment': 0.2856514502734508,
  'no_emergency_treatment': 0.7143485497265493},
 'covid_risk': {'covid_risk': 0.12366083376413603,
  'no_covid_risk': 0.8763391662358638},
 'covid_risk_binary': {'covid_risk': 0.45030184629985154,
  'no_covid_risk': 0.5496981537001485},
 'covid_severity': {'no_covid_severity': 0.0038044118212736703,
  'medium_covid_severity': 0.4999999956268381,
  'low_covid_severity': 0.011413235463820547,
  'high_covid_severity': 0.4847823570880676},
 'covid_severity_binary': {'no_covid_severity': 0.015217647285094003,
  'covid_severity': 0.9847823527149059}}
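
To see why averaging can hide a strong factor, compare two simple ways of combining parent probabilities. These are illustrative stand-ins (a plain mean and a noisy-OR), not the sn_bayes implementations of avg and any, and the input numbers are made up.

```python
def avg_rule(probs):
    """Mean of the parent probabilities: one strong factor is diluted by weak ones."""
    return sum(probs) / len(probs)


def any_rule(probs):
    """Noisy-OR: probability that at least one independent factor is present."""
    none_present = 1.0
    for p in probs:
        none_present *= 1.0 - p
    return 1.0 - none_present


# One strong risk factor alongside three weak ones.
factors = [0.9, 0.05, 0.05, 0.05]
print(avg_rule(factors))
print(any_rule(factors))
```

Here the mean comes out near 0.26 while the noisy-OR stays above 0.9, which matches the kind of flattening described above when too many tables use an averaging rule.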

Now start the server from somewhere other than this notebook.  Uncomment and use the first command if you are running in a notebook, or use the second command if you are running from the command line.

In [24]:
#%run -i './service/bayes_service.py' 
#python3 ./service/bayes_service.py
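
Before running the client test, it can help to confirm that something is actually listening on the service port. This is a minimal sketch using only the standard library; HOST and PORT are placeholders, since the address that bayes_service.py listens on is not shown in this notebook.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


# Placeholder address: replace with the host and port the service is configured to use.
HOST, PORT = "localhost", 7003
print(is_listening(HOST, PORT))
```

A False result here means the client would fail to connect regardless of what it sends.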


Here is a test of the server.  The server saves an ID for each network sent to it in a pickle file of JSON.  If you run the service again, it will start at another ID.  The output below shows the failure you get when the server is not reachable.

In [25]:
%run -i './test_bayes_service.py' auto

Exception
<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1598457046.317177605","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3948,"referenced_errors":[{"created":"@1598457046.317175517","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":394,"grpc_status":14}]}"
>
