## Covid Bayes Net


First we create a Pomegranate Bayesian network using the protobuf interface and our utility package, bayes, which makes it convenient to create a Bayesian network by hand without having to fill in every probability.  The same Bayesian network can start with our hand-entered guesses and later learn from data.  Any number of questions can be answered, including none, to get the probability of having covid, the severity of illness, and the chance of going to the hospital.  The Bayesian net is defined in the file covid_bayes.py.  Our utility package can then be used to query the network with different patient states, which is convenient for tuning hand-entered parameters.



In [1]:
cd /home/opencog

/home/opencog


In [2]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install covid_bayesnet
#!{sys.executable} -m pip install protobuf

Requirement already up-to-date: pip in /usr/local/lib/python3.6/dist-packages (20.1.1)
ERROR: Could not find a version that satisfies the requirement covid_bayesnet (from versions: none)
ERROR: No matching distribution found for covid_bayesnet


In [3]:
cd ./covid-bayesnet


/home/opencog/covid-bayesnet


In [4]:
!{sys.executable} -m pip install -r requirements.txt
!sh buildproto.sh



### Covid net
Here put the name of the network you have written with the protobuf and bayes utilities.  Our example is printed out.  We created four functions in our utility package: any, all, if_then_else, and avg, which together can express almost any set of rules; they are documented elsewhere.  The last lines build the description of the network, bayesianNetwork, with the protobuf utilities, and then create the Pomegranate network from that description.

In [5]:
text_file = open('./sn_bayes/covid_bayes.py')
#text_file = open('./sn_bayes/test.py')
file_content = text_file.read()
print(file_content)
text_file.close()

import sn_bayes
from sn_bayes.utils import any
from sn_bayes.utils import all
from sn_bayes.utils import avg
from sn_bayes.utils import if_then_else
from sn_bayes.utils import bayesInitialize
from sn_bayes.utils import addCpt


import sn_service.service_spec.bayesian_pb2
from sn_service.service_spec.bayesian_pb2 import BayesianNetwork


def covid_bayes():
	bayesianNetwork = BayesianNetwork()



	#probabilities within distributions must sum to 1.0
	#questions left blank or "prefer not to answer" will be computed



	#basics/init

	discreteDistribution = bayesianNetwork.discreteDistributions.add()
	discreteDistribution.name = "acute_medical_condition"
	variable = discreteDistribution.variables.add()
	variable.name = "acute_medical_condition"
	variable.probability = 0.02
	variable = discreteDistribution.variables.add()
	variable.name = "no_acute_medical_condition"
	variable.probability = 0.98

	# basics/demographics questions 


	discreteDistribution = bayesianNetwork.discreteDistribution

Next, run the file.  All routines are stateless.

In [6]:
import sn_bayes
from sn_bayes import covid_bayes
#%run -i './bayes/covid_bayes.py'
#%run -i './bayes/test.py'
bayesianNetwork = covid_bayes.covid_bayes()

For convenience when creating a table, here are all the variables in the net, in order.  They should be mapped into the individual buckets of the rules.  The variables up to "systemically_disadvantaged" are the ones that the user may answer; the rest are computed.

In [7]:
import sn_bayes
from sn_bayes.utils import get_var_positions
var_positions = get_var_positions(bayesianNetwork)
var_positions

{'acute_medical_condition': 0,
 'age': 1,
 'sex': 2,
 'heterosome': 3,
 'height_in_feet': 4,
 'weight_in_pounds': 5,
 'ethnicity': 6,
 'education': 7,
 'employment': 8,
 'marital_status': 9,
 'number_of_children': 10,
 'income_in_USD': 11,
 'community': 12,
 'pregnancy_in_months': 13,
 'sleep_quickly': 14,
 'sleep_in_hours': 15,
 'sex_per_month': 16,
 'cigarettes_per_week': 17,
 'cigars_per_week': 18,
 'hookah_per_week': 19,
 'snuff_per_week': 20,
 'vapes_per_week': 21,
 'alchohol_glasses_per_week': 22,
 'adderall_per_week': 23,
 'ritalin_per_week': 24,
 'cocaine_per_week': 25,
 'methamphetamine_per_week': 26,
 'ecstasy_per_week': 27,
 'speed_per_week': 28,
 'amphetamines_per_week': 29,
 'other_substance_per_week': 30,
 'opiods_per_week': 31,
 'depressants_per_week': 32,
 'cannabis_per_week': 33,
 'hallucinogens_per_week': 34,
 'dissociatives_per_week': 35,
 'inhalants_per_week': 36,
 'activity_level': 37,
 'lonely': 38,
 'close_confidants': 39,
 'regular_exams': 40,
 'blood_pressure':

Here are the positions of every bucketed response for each of the above variables:

In [8]:
import sn_bayes
from sn_bayes.utils import get_var_val_positions
var_val_positions = get_var_val_positions(bayesianNetwork)
var_val_positions

{'acute_medical_condition': {'acute_medical_condition': 0,
  'no_acute_medical_condition': 1},
 'age': {'elderly': 0, 'adult': 1, 'young_adult': 2, 'teen': 3, 'child': 4},
 'sex': {'male': 0, 'female': 1},
 'heterosome': {'other': 0, 'X': 1, 'XXY': 2, 'XYY': 3, 'XY': 4, 'XX': 5},
 'height_in_feet': {'height_above_seven': 0,
  'height_six_to_seven': 1,
  'height_five_to_six': 2,
  'height_four_to_five': 3,
  'height_under_four': 4},
 'weight_in_pounds': {'weight_over_250': 0,
  'weight_175_to_220': 1,
  'weight_125_to_175': 2,
  'weight_100_to_125': 3,
  'weight_under_100': 4},
 'ethnicity': {'african_american': 0,
  'hispanic': 1,
  'ethnicity_other': 2,
  'african': 3,
  'middle_eastern': 4,
  'native_american': 5,
  'pacific_islander': 6,
  'asian': 7,
  'caucasian': 8},
 'education': {'education_other': 0,
  'some_high_school': 1,
  'high_school': 2,
  'vocational': 3,
  'bachelors': 4,
  'masters': 5,
  'phd': 6},
 'employment': {'unemployed': 0,
  'employment_other': 1,
  'retired

Here is the description of the Bayesian network in protobuf that we just created by running the Python file.  Only the "leaves" have initial probabilities (DiscreteDistribution); the conditional probabilities (ConditionalProbabilityTable) are to be computed:

In [9]:
bayesianNetwork

discreteDistributions {
  name: "acute_medical_condition"
  variables {
    name: "acute_medical_condition"
    probability: 0.019999999552965164
  }
  variables {
    name: "no_acute_medical_condition"
    probability: 0.9800000190734863
  }
}
discreteDistributions {
  name: "age"
  variables {
    name: "elderly"
    probability: 0.10000000149011612
  }
  variables {
    name: "adult"
    probability: 0.30000001192092896
  }
  variables {
    name: "young_adult"
    probability: 0.20000000298023224
  }
  variables {
    name: "teen"
    probability: 0.10000000149011612
  }
  variables {
    name: "child"
    probability: 0.10000000149011612
  }
}
discreteDistributions {
  name: "sex"
  variables {
    name: "male"
    probability: 0.5
  }
  variables {
    name: "female"
    probability: 0.5
  }
}
discreteDistributions {
  name: "heterosome"
  variables {
    name: "other"
    probability: 0.0005000000237487257
  }
  variables {
    name: "X"
    probability: 0.0005000000237487257
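
The long decimals in the dump above (0.019999999552965164 where covid_bayes.py sets 0.02) are consistent with the probability field being stored as a 32-bit float. That is an inference from the printed values, since the .proto schema is not shown here. A minimal sketch using only the standard library reproduces the same digits; the helper name as_float32 is ours, not part of sn_bayes:

```python
import struct


def as_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Values entered by hand in covid_bayes.py, as they look after a float32 round trip.
for p in (0.02, 0.98, 0.1, 0.3, 0.2):
    print(p, "->", repr(as_float32(p)))
```

The difference is on the order of 1e-8, so it does not matter for hand-entered guesses, but it explains why values read back from the description do not compare equal to the literals that were typed in.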
  

Next we build the Pomegranate net from the description, and then compile (bake) it so that we can compute probabilities with it.

In [10]:
from sn_bayes.utils import bayesInitialize
covid = bayesInitialize(bayesianNetwork)

In [11]:
covid.bake()

We call a Pomegranate routine that shows the computed probabilities of every variable.  We have also made our own utility that pulls out particular variables.

In [12]:
covid.predict_proba({}) 

array([{
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "acute_medical_condition" :0.01999999918043658,
            "no_acute_medical_condition" :0.9800000008195634
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "elderly" :0.12499999883584699,
            "adult" :0.37500000582076554,
            "young_adult" :0.2499999976716935,
            "teen" :0.12499999883584699,
            "child" :0.12499999883584699
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "male" :0.5,
            "female" :0.5
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
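
One detail worth noticing in the output above: the age priors in the description are 0.1, 0.3, 0.2, 0.1 and 0.1, which sum to 0.8, while predict_proba reports 0.125, 0.375, 0.25, 0.125 and 0.125. Those are the values obtained by dividing each prior by 0.8. Where the renormalization happens (in our utilities or inside Pomegranate) is not shown here. The sketch below, with a hypothetical helper named normalize, reproduces the arithmetic and can double as a check for hand-entered distributions:

```python
def normalize(dist):
    """Rescale a {state: probability} dict so that its values sum to 1.0."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    return {state: p / total for state, p in dist.items()}


# Age priors as entered in the description (they sum to 0.8, not 1.0).
age_priors = {"elderly": 0.1, "adult": 0.3, "young_adult": 0.2,
              "teen": 0.1, "child": 0.1}

total = sum(age_priors.values())
normalized = normalize(age_priors)
print("sum of hand-entered priors:", round(total, 6))
print({state: round(p, 6) for state, p in normalized.items()})
```

If the intent was for the hand-entered values to be used as written, the age distribution in covid_bayes.py would need to be adjusted so that it sums to 1.0.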
       

Here we use our query utility to enter a patient's state as evidence (evidence) and then see the probabilities of the particular variables we are interested in (outvars), in this case the output variables.  It needs the compiled Pomegranate Bayesian network as well as the description of the network.  This routine can be used to enter a particular patient's state one question at a time, to see how the probabilities of all states change with each answer.  Below, we first look at the probabilities of emergency treatment, covid risk, and covid severity in general.  Then we look at them given that the patient is elderly, and see that the chance of high severity has increased (from about 0.047 to about 0.094).  Adding more risk factors increases risk further, and adding more severity factors increases severity further.

In [13]:
import sn_bayes
from sn_bayes.utils import query
evidence = {}
outvars= ["emergency_treatment","covid_risk","covid_severity"]
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7143485497265493,
  'emergency_treatment': 0.2856514502734508},
 'covid_risk': {'low_covid_risk': 0.7687638067446652,
  'no_covid_risk': 0.0028869937004576017,
  'high_covid_risk': 0.0055997442631326895,
  'medium_covid_risk': 0.2227494552917446},
 'covid_severity': {'no_covid_severity': 0.11329080319947629,
  'high_covid_severity': 0.04683678341788609,
  'low_covid_severity': 0.33987240959842824,
  'medium_covid_severity': 0.5000000037842095}}

In [14]:
evidence["age"]= "elderly"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7143485497265493,
  'emergency_treatment': 0.2856514502734508},
 'covid_risk': {'low_covid_risk': 0.7687638067446652,
  'no_covid_risk': 0.0028869937004576017,
  'high_covid_risk': 0.0055997442631326895,
  'medium_covid_risk': 0.2227494552917446},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 0.0936735659633689,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.9063264340366306}}
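
With several output variables it is easy to miss which ones actually moved. As a small sketch (the helper changed_vars is hypothetical, not part of sn_bayes.utils), the two result dictionaries above can be compared directly. The numbers are copied from cells 13 and 14:

```python
def changed_vars(before, after, tol=1e-9):
    """Return the output variables whose state probabilities differ by more than tol."""
    changed = []
    for var, states in after.items():
        old_states = before.get(var, {})
        for state, p in states.items():
            if abs(p - old_states.get(state, 0.0)) > tol:
                changed.append(var)
                break
    return changed


baseline = {
    "emergency_treatment": {"no_emergency_treatment": 0.7143485497265493,
                            "emergency_treatment": 0.2856514502734508},
    "covid_risk": {"low_covid_risk": 0.7687638067446652,
                   "no_covid_risk": 0.0028869937004576017,
                   "high_covid_risk": 0.0055997442631326895,
                   "medium_covid_risk": 0.2227494552917446},
    "covid_severity": {"no_covid_severity": 0.11329080319947629,
                       "high_covid_severity": 0.04683678341788609,
                       "low_covid_severity": 0.33987240959842824,
                       "medium_covid_severity": 0.5000000037842095},
}

# Same query after evidence["age"] = "elderly": only covid_severity differs.
elderly = dict(baseline)
elderly["covid_severity"] = {"no_covid_severity": 0.0,
                             "high_covid_severity": 0.0936735659633689,
                             "low_covid_severity": 0.0,
                             "medium_covid_severity": 0.9063264340366306}

print(changed_vars(baseline, elderly))
```

Age alone affects only covid_severity in this network; covid_risk and emergency_treatment stay at their priors.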

In [15]:
evidence["body_temperature"]= "body_temperature_above_102F"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7129911517479252,
  'emergency_treatment': 0.2870088482520748},
 'covid_risk': {'low_covid_risk': 0.14562325440229593,
  'no_covid_risk': 0.0,
  'high_covid_risk': 0.00779897000170426,
  'medium_covid_risk': 0.8465777755959996},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 0.0936735659633689,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.9063264340366306}}

In [16]:
evidence["diabetes"]= "has_diabetes"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7129911517479252,
  'emergency_treatment': 0.2870088482520748},
 'covid_risk': {'low_covid_risk': 0.14562325440229593,
  'no_covid_risk': 0.0,
  'high_covid_risk': 0.00779897000170426,
  'medium_covid_risk': 0.8465777755959996},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 0.9535522593662626,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.04644774063373701}}

In [17]:
evidence["diabetes_medication"]= "no_diabetes_medication"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7129911517479252,
  'emergency_treatment': 0.2870088482520748},
 'covid_risk': {'low_covid_risk': 0.14562325440229593,
  'no_covid_risk': 0.0,
  'high_covid_risk': 0.00779897000170426,
  'medium_covid_risk': 0.8465777755959996},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 1.0,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.0}}

In [18]:
evidence["hotspot"]= "abnormally_high_hotspot"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'emergency_treatment': {'no_emergency_treatment': 0.7129911517479252,
  'emergency_treatment': 0.2870088482520748},
 'covid_risk': {'low_covid_risk': 0.0,
  'no_covid_risk': 0.0,
  'high_covid_risk': 0.12477070140934861,
  'medium_covid_risk': 0.8752292985906509},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 1.0,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.0}}

In [19]:
evidence["exposure"]= "exposure_in_family_not_isolated"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'emergency_treatment': {'no_emergency_treatment': 0.7129911517479252,
  'emergency_treatment': 0.2870088482520748},
 'covid_risk': {'low_covid_risk': 0.0,
  'no_covid_risk': 0.0,
  'high_covid_risk': 1.0,
  'medium_covid_risk': 0.0},
 'covid_severity': {'no_covid_severity': 0.0,
  'high_covid_severity': 1.0,
  'low_covid_severity': 0.0,
  'medium_covid_severity': 0.0}}
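
Cells 13 to 19 repeat the same three lines for each new answer. A minimal sketch of wrapping that pattern in a loop is shown below. The function ask_incrementally and the stand-in fake_query are hypothetical; only the idea of growing the evidence dictionary one answer at a time comes from the cells above.

```python
def ask_incrementally(query_fn, answers, outvars):
    """Add one answer at a time to the evidence and record the query result after each."""
    evidence = {}
    history = []
    for var, value in answers:
        evidence[var] = value
        # Pass copies so a stored result cannot be altered by later steps.
        result = query_fn(dict(evidence), list(outvars))
        history.append((var, value, result))
    return history


def fake_query(evidence, outvars):
    """Stand-in for the real query: reports how many answers have been given."""
    return {name: len(evidence) for name in outvars}


answers = [("age", "elderly"),
           ("body_temperature", "body_temperature_above_102F"),
           ("diabetes", "has_diabetes")]
history = ask_incrementally(fake_query, answers, ["covid_risk", "covid_severity"])
for var, value, result in history:
    print(var, "=", value, "->", result)
```

In this notebook the stand-in would be replaced by a wrapper around the real utility, for example `lambda ev, ov: query(covid, bayesianNetwork, ev, ov)`.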

Here is an example of using queries to debug the rules we wrote in covid_bayes.py.  We reset the evidence and see that things we expected to make a difference in risk or severity do not.  This is important feedback: the reason is overuse of the routine "avg" and too little use of the routine "any" when creating the conditional probability tables in covid_bayes.py.

In [20]:
evidence = {}
evidence["ethnicity"]= "african_american"
results = query(covid,bayesianNetwork,evidence,outvars)
results


{'emergency_treatment': {'no_emergency_treatment': 0.7143485497265493,
  'emergency_treatment': 0.2856514502734508},
 'covid_risk': {'low_covid_risk': 0.7687638067446652,
  'no_covid_risk': 0.0028869937004576017,
  'high_covid_risk': 0.0055997442631326895,
  'medium_covid_risk': 0.2227494552917446},
 'covid_severity': {'no_covid_severity': 0.11325280074670177,
  'high_covid_severity': 0.04698879323181571,
  'low_covid_severity': 0.3397584022401046,
  'medium_covid_severity': 0.500000003781378}}

In [21]:
evidence["education"]= "some_high_school"
evidence["income_in_USD"]= "under_25k_USD"
evidence["employment"]= "unemployed"
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7143485497265493,
  'emergency_treatment': 0.2856514502734508},
 'covid_risk': {'low_covid_risk': 0.7687638067446652,
  'no_covid_risk': 0.0028869937004576017,
  'high_covid_risk': 0.0055997442631326895,
  'medium_covid_risk': 0.2227494552917446},
 'covid_severity': {'no_covid_severity': 0.1131864608124822,
  'high_covid_severity': 0.04725415297363656,
  'low_covid_severity': 0.33955938243744593,
  'medium_covid_severity': 0.5000000037764353}}

In [22]:
outvars.append("social_distancing")
results = query(covid,bayesianNetwork,evidence,outvars)
results

{'emergency_treatment': {'no_emergency_treatment': 0.7143485497265493,
  'emergency_treatment': 0.2856514502734508},
 'covid_risk': {'low_covid_risk': 0.7687638067446652,
  'no_covid_risk': 0.0028869937004576017,
  'high_covid_risk': 0.0055997442631326895,
  'medium_covid_risk': 0.2227494552917446},
 'covid_severity': {'no_covid_severity': 0.1131864608124822,
  'high_covid_severity': 0.04725415297363656,
  'low_covid_severity': 0.33955938243744593,
  'medium_covid_severity': 0.5000000037764353},
 'social_distancing': {'safe_social_distancing': 0.010031699771306418,
  'some_social_distancing': 0.9433590249336508,
  'no_social_distancing': 0.04660927529504287}}
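
To illustrate why averaging can hide the effect of a single factor, here is a toy sketch. It assumes that "avg" behaves like an arithmetic mean over parent probabilities and that "any" behaves like a noisy-OR. The real routines in sn_bayes.utils take different arguments and may combine parents differently, so avg_rule and any_rule below are stand-ins, not the package's functions.

```python
def avg_rule(parent_probs):
    """Arithmetic mean of the parents' probabilities (assumed meaning of avg)."""
    return sum(parent_probs) / len(parent_probs)


def any_rule(parent_probs):
    """Noisy-OR: probability that at least one parent is active (assumed meaning of any)."""
    p_none = 1.0
    for p in parent_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


# Ten parents at a low baseline, then the same ten with one raised by evidence.
baseline = [0.05] * 10
one_raised = [0.9] + [0.05] * 9

avg_shift = avg_rule(one_raised) - avg_rule(baseline)
any_shift = any_rule(one_raised) - any_rule(baseline)
print(f"avg shift: {avg_shift:.3f}")
print(f"any shift: {any_shift:.3f}")
```

Under the mean, one parent out of ten can move the result by at most a tenth of its own change, which matches the very small movements seen in cells 20 to 22.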

Now start the server from somewhere other than this notebook.  Uncomment and use the first command if you are running in a notebook, or the second command if you are running from the command line.

In [23]:
#%run -i './sn_service/bayes_service.py' 
#python3 ./sn_service/bayes_service.py


Here is a test of the server.  It saves an ID for the network sent to it in a pickle file of JSON.  If you run the service again, it will start at another ID.

In [25]:
#This script runs the baseline above, where evidence is {}, to get the priors:
%run -i './test_bayes_service.py' auto

response.id
1
response.error_msg

response.varAnswers
[var_num: 136
varStates {
  probability: 0.28565144538879395
}
varStates {
  state_num: 1
  probability: 0.714348554611206
}
, var_num: 137
varStates {
  state_num: 3
  probability: 0.0028869938105344772
}
varStates {
  probability: 0.005599744152277708
}
varStates {
  state_num: 2
  probability: 0.7687637805938721
}
varStates {
  state_num: 1
  probability: 0.22274945676326752
}
, var_num: 138
varStates {
  state_num: 3
  probability: 0.11329080164432526
}
varStates {
  state_num: 2
  probability: 0.33987241983413696
}
varStates {
  probability: 0.04683678224682808
}
varStates {
  state_num: 1
  probability: 0.5
}
]
response.error_msg
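
The response refers to variables and states by position rather than by name, and protobuf's text output leaves out a field that is at its default value, so a varStates entry with no state_num appears to mean state 0. The sketch below decodes such a response with position maps like those returned by get_var_positions and get_var_val_positions. The positions used here are inferred by matching the probabilities against the query output in cell 13 (the listings above are truncated before these variables), and plain tuples stand in for the protobuf objects.

```python
def invert(mapping):
    """Turn a {name: position} dict into {position: name}."""
    return {pos: name for name, pos in mapping.items()}


def decode(answers, var_positions, var_val_positions):
    """Map (var_num, [(state_num, probability), ...]) pairs back to names."""
    var_names = invert(var_positions)
    decoded = {}
    for var_num, states in answers:
        var = var_names[var_num]
        state_names = invert(var_val_positions[var])
        decoded[var] = {state_names[num]: p for num, p in states}
    return decoded


# Assumed positions, inferred from the matching probabilities.
var_positions = {"emergency_treatment": 136, "covid_risk": 137}
var_val_positions = {
    "emergency_treatment": {"emergency_treatment": 0, "no_emergency_treatment": 1},
    "covid_risk": {"high_covid_risk": 0, "medium_covid_risk": 1,
                   "low_covid_risk": 2, "no_covid_risk": 3},
}

# Stand-in for response.varAnswers, with the omitted state_num written out as 0.
answers = [
    (136, [(0, 0.28565144538879395), (1, 0.714348554611206)]),
    (137, [(3, 0.0028869938105344772), (0, 0.005599744152277708),
           (2, 0.7687637805938721), (1, 0.22274945676326752)]),
]

decoded = decode(answers, var_positions, var_val_positions)
for var, states in decoded.items():
    print(var, states)
```

Decoded this way, the server's answers agree with the baseline query in cell 13 to about seven decimal places.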



In [28]:
# To look for an explanation before an explanation function is implemented:

evidence["hotspot"]= "abnormally_high_hotspot"
covid.predict_proba(evidence) 

array([{
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "acute_medical_condition" :0.01999999918043658,
            "no_acute_medical_condition" :0.9800000008195634
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "elderly" :0.12499999883584699,
            "adult" :0.37500000582076554,
            "young_adult" :0.2499999976716935,
            "teen" :0.12499999883584699,
            "child" :0.12499999883584699
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "male" :0.5,
            "female" :0.5
        }
    ],
    "frozen" :false
},
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
       