<span style="color:red">Aishwarya Supekar</span>

Here is a data set to pull your model parameters from.  For all fields, 0 means False and 1 means True.

Problem (Re)Statement:

* Shortness of breath (dyspnea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. 
* A recent visit to Asia increases the chances of tuberculosis.
* Smoking is known to be a risk factor for both lung cancer and bronchitis. 
* A positive chest X-ray suggests either lung cancer or tuberculosis, but cannot distinguish between them.

In [None]:
import pandas as pd

df = pd.read_csv("asia.csv")

In [None]:
df.head()

<image src="asia.gif" size=200/>



Begin by writing out your model. For example, here are the names of some nodes and the arcs that connect them. The arrow -> points from the parent node(s) to the child.

<pre>
Asia                                   -> Tuberculosis

Tuberculosis, LungCancer, Bronchitis   -> Dyspnea


</pre>

<span style="color:red">
Informally write your model in this cell, using the notation above.
It will determine the parameters you will need to get from the data set.
</span>

In [None]:
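# pomegranate 0.x API: DiscreteDistribution, ConditionalProbabilityTable, Node, BayesianNetwork
from pomegranate import *

# Asia                                 -> Tuberculosis
# Smoking                              -> LungCancer
# Smoking                              -> Bronchitis
# Tuberculosis, LungCancer             -> XRay
# Tuberculosis, LungCancer, Bronchitis -> Dyspnea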
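
The model written in the cell above can also be kept as a plain parent map. The sketch below is illustrative only (the `PARENTS` dictionary and both helper functions are hypothetical, not part of pomegranate): it uses Kahn's algorithm to find an order in which every parent comes before its children, which is the order the distributions have to be defined in because each CPT refers to its parents' distribution objects, and it counts how many free parameters each binary node needs, one per configuration of its parents.

```python
from collections import deque

# Hypothetical parent map for the network above (names follow the Node(...) names used later)
PARENTS = {
    "Asia": [],
    "Smoking": [],
    "Tuberculosis": ["Asia"],
    "Lungcancer": ["Smoking"],
    "Bronchitis": ["Smoking"],
    "Xray": ["Tuberculosis", "Lungcancer"],
    "Dyspnea": ["Tuberculosis", "Lungcancer", "Bronchitis"],
}


def topological_order(parents):
    """Kahn's algorithm: return the nodes so that every parent precedes its children."""
    indegree = {node: len(ps) for node, ps in parents.items()}
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("graph has a cycle")
    return order


def free_parameters(parents):
    """A binary node needs one probability per configuration of its binary parents."""
    return {node: 2 ** len(ps) for node, ps in parents.items()}


order = topological_order(PARENTS)
counts = free_parameters(PARENTS)
print(order)
print(counts, "total:", sum(counts.values()))
```

The counts are exactly the probabilities the crosstabs below have to supply, one value per parent configuration per node.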

In [None]:
(df.VisitToAsia.value_counts() / df.shape[0]).sort_index()

In [None]:
(df.Smoker.value_counts() / df.shape[0]).sort_index()

In [None]:
pd.crosstab(df.Tuberculosis, df.VisitToAsia, normalize='columns')

In [None]:
pd.crosstab(df.LungCancer, df.Smoker, normalize='columns')

In [None]:
pd.crosstab(df.Bronchitis, df.Smoker, normalize='columns')

In [None]:
pd.crosstab(df.Smoker, [df.LungCancer, df.Bronchitis], normalize='columns')

In [None]:
pd.crosstab(df.XRay, [df.Tuberculosis, df.LungCancer], normalize='columns')

In [None]:
pd.crosstab(df.Dyspnea, [df.Tuberculosis, df.LungCancer, df.Bronchitis], normalize='columns')

In [None]:
pd.crosstab(df.LungCancer, df.Tuberculosis, normalize='columns')

In [None]:
(df.Tuberculosis.value_counts() / df.shape[0]).sort_index()

In [None]:
pd.crosstab(df.VisitToAsia, df.LungCancer, normalize='columns')

In [None]:
pd.crosstab(df.XRay, df.LungCancer, normalize='columns')
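
Each `pd.crosstab(child, parent(s), normalize='columns')` call above divides the joint counts by the column totals, so every column is a maximum-likelihood estimate of P(child | parent values). The sketch below writes the same estimate as plain counting. The records are made up for illustration (they are not the asia.csv data), and `estimate_cpt` is a hypothetical helper that returns rows in the `[parent values..., child value, probability]` layout used by the `ConditionalProbabilityTable` calls in this notebook.

```python
from collections import Counter
from itertools import product

# Made-up records for illustration only; 0 means False and 1 means True, as in asia.csv
records = [
    {"Smoker": 1, "LungCancer": 1},
    {"Smoker": 1, "LungCancer": 1},
    {"Smoker": 1, "LungCancer": 0},
    {"Smoker": 1, "LungCancer": 0},
    {"Smoker": 0, "LungCancer": 1},
    {"Smoker": 0, "LungCancer": 0},
    {"Smoker": 0, "LungCancer": 0},
    {"Smoker": 0, "LungCancer": 0},
]


def estimate_cpt(rows, child, parents):
    """Hypothetical helper: maximum-likelihood P(child | parents) by counting.

    Parent configurations that never occur get NaN instead of a division by zero.
    """
    joint = Counter(tuple(r[p] for p in parents) + (r[child],) for r in rows)
    parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
    table = []
    for config in product([1, 0], repeat=len(parents)):
        total = parent_totals[config]
        for value in (1, 0):
            prob = joint[config + (value,)] / total if total else float("nan")
            table.append([*map(bool, config), bool(value), prob])
    return table


for row in estimate_cpt(records, "LungCancer", ["Smoker"]):
    print(row)
```

Passing the real rows, for example via `df.to_dict('records')`, should reproduce the proportions in the crosstabs above.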

Now define your distributions


In [None]:
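# All your distributions in this cell.
# Each CPT row is [parent value(s)..., child value, P(child | parents)],
# with the parents in the order of the list passed as the second argument.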
VisitToAsia = DiscreteDistribution({True : 0.0094, False : 0.9906})

tuberculosis = ConditionalProbabilityTable(
                 [[True, True, 0.074468],
                 [True, False, 0.925532],
                 [False, True, 0.009085],
                 [False, False, 0.990915]], [VisitToAsia])

smoking = DiscreteDistribution({True : 0.4969, False : 0.5031})

lungcancer = ConditionalProbabilityTable(
                 [[True, True, 0.103844],
                 [True, False, 0.896156],
                 [False, True, 0.009143],
                 [False, False, 0.990857]], [smoking])

bronchitis = ConditionalProbabilityTable(
                 [[True, True, 0.600724],
                 [True, False, 0.399276],
                 [False, True, 0.291393],
                 [False, False, 0.708607]], [smoking])

xRay = ConditionalProbabilityTable(
                 [[True,  True,  True,  1.0],
                  [True,  True,  False, 0.0],
                  [True,  False, True,  0.977528],
                  [True,  False, False, 0.022472],
                  [False, True,  True,  0.971119],
                  [False, True,  False, 0.028881],
                  [False, False, True,  0.050273],
                  [False, False, False, 0.949727]], [tuberculosis, lungcancer])

dyspnea = ConditionalProbabilityTable(
                 [[True,  True,  True,  True,  1.0],
                  [True,  True,  True,  False, 0.0],
                  [True,  True,  False, True,  0.571429],
                  [True,  True,  False, False, 0.428571],
                  [True,  False, True,  True,  0.914286],
                  [True,  False, True,  False, 0.085714],
                  [True,  False, False, True,  0.703704],
                  [True,  False, False, False, 0.296296],
                  [False, True,  True,  True,  0.899371],
                  [False, True,  True,  False, 0.100629],
                  [False, True,  False, True,  0.665254],
                  [False, True,  False, False, 0.334746],
                  [False, False, True,  True,  0.800586],
                  [False, False, True,  False, 0.199414],
                  [False, False, False, True,  0.092536],
                  [False, False, False, False, 0.907464]], [tuberculosis, lungcancer, bronchitis])
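
A quick sanity check before building the network: for every parent configuration, the probabilities of the child's values must add up to 1. The sketch below applies that check to rows in the `[parent values..., child value, probability]` layout, using the xRay rows from this cell as input; `check_cpt` is a hypothetical helper, not a pomegranate function.

```python
import math
from collections import defaultdict

# xRay rows copied from the cell above: [Tuberculosis, LungCancer, XRay, probability]
XRAY_ROWS = [
    [True, True, True, 1.0],
    [True, True, False, 0.0],
    [True, False, True, 0.977528],
    [True, False, False, 0.022472],
    [False, True, True, 0.971119],
    [False, True, False, 0.028881],
    [False, False, True, 0.050273],
    [False, False, False, 0.949727],
]


def check_cpt(rows, tol=1e-6):
    """Hypothetical helper: list the parent configurations whose probabilities do not sum to 1."""
    sums = defaultdict(float)
    for *parents, _child, prob in rows:
        sums[tuple(parents)] += prob
    return [config for config, total in sums.items() if not math.isclose(total, 1.0, abs_tol=tol)]


print(check_cpt(XRAY_ROWS))  # an empty list means every parent configuration is normalized
```

The same call works for the other tables once their rows are pulled out into plain lists.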

Next define the nodes in your network

In [None]:
# All your nodes in this cell
Asia = Node(VisitToAsia, "Asia")
Tuberculosis = Node(tuberculosis, "Tuberculosis")
Smoking = Node(smoking, "Smoking")
Lungcancer = Node(lungcancer, "Lungcancer")
Bronchitis = Node(bronchitis, "Bronchitis")
Xray = Node(xRay, "Xray")
Dyspnea = Node(dyspnea, "Dyspnea")


Define your model, adding states and edges

In [None]:
model = BayesianNetwork("Cancer")
model.add_states(Asia, Tuberculosis, Smoking, Lungcancer, Bronchitis, Xray, Dyspnea)
model.add_edge(Asia, Tuberculosis)
model.add_edge(Smoking, Lungcancer)
model.add_edge(Smoking, Bronchitis)
model.add_edge(Lungcancer, Xray)
model.add_edge(Tuberculosis, Xray)
model.add_edge(Tuberculosis, Dyspnea)
model.add_edge(Lungcancer, Dyspnea)
model.add_edge(Bronchitis, Dyspnea)
model.bake()
           
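def nodeIndex(model, nodeName):
    """Position of the state called nodeName in model.states (the order predict_proba returns)."""
    return [s.name for s in model.states].index(nodeName)

def probDist(nodeName, model, evidence):
    """Posterior {value: probability} of an unobserved node, given an evidence dict keyed by node name."""
    return model.predict_proba(evidence)[nodeIndex(model, nodeName)].parameters[0]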

------------------------------------------------

#### Questions

1.  What is the probability that an individual in the sampled population has either lung cancer or tuberculosis or both?

In [None]:
LungCancer_prob = probDist('Lungcancer', model, {})[True]

Tuberculosis_prob = probDist('Tuberculosis', model, {})[True]

# P(TB or LC) = P(TB) + P(LC) - P(TB and LC).
# TB and LC share no ancestor and no common child is observed, so they are
# independent here and P(TB and LC) = P(TB) * P(LC).
Answer = LungCancer_prob + Tuberculosis_prob - (LungCancer_prob * Tuberculosis_prob)

print(Answer)

Answer question 1:- The probability that an individual in the sampled population has either lung cancer or tuberculosis or both is about 6.5%.
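
The same number can be checked by hand from the CPT values in this notebook. The sketch below computes each marginal with the law of total probability and then applies inclusion-exclusion, using P(TB and LC) = P(TB) P(LC) because the two conditions are independent when nothing is observed. It is a hand check, not pomegranate code, and its last digits may differ slightly from the `predict_proba` results.

```python
# CPT values from the distributions cell
P_ASIA = 0.0094
P_TB_GIVEN_ASIA = {True: 0.074468, False: 0.009085}
P_SMOKER = 0.4969
P_LC_GIVEN_SMOKER = {True: 0.103844, False: 0.009143}

# Marginals by the law of total probability
p_tb = P_ASIA * P_TB_GIVEN_ASIA[True] + (1 - P_ASIA) * P_TB_GIVEN_ASIA[False]
p_lc = P_SMOKER * P_LC_GIVEN_SMOKER[True] + (1 - P_SMOKER) * P_LC_GIVEN_SMOKER[False]

# Inclusion-exclusion; the joint is a product because TB and LC are independent a priori
p_either = p_tb + p_lc - p_tb * p_lc
print(round(p_tb, 4), round(p_lc, 4), round(p_either, 4))
```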

2.  What is the probability that an individual in the sampled population will have a positive chest X-ray?  

In [None]:
Positive_Xray =  probDist('Xray', model, {})
print(Positive_Xray[True])

Answer question 2:- The probability that an individual in the sampled population will have a positive chest X-ray is 11.0%.
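
A positive X-ray can arise under any of the four (Tuberculosis, LungCancer) combinations, so its marginal is a weighted sum over them. The sketch below does that sum by hand with the CPT values from this notebook; since the two parents are independent a priori, each combination's weight is a product of the marginals. It is a hand check rather than pomegranate code.

```python
from itertools import product

P_TB = 0.0094 * 0.074468 + 0.9906 * 0.009085   # P(Tuberculosis), summing over Asia
P_LC = 0.4969 * 0.103844 + 0.5031 * 0.009143   # P(LungCancer), summing over Smoking
P_XRAY_POS = {                                  # P(XRay = True | Tuberculosis, LungCancer)
    (True, True): 1.0,
    (True, False): 0.977528,
    (False, True): 0.971119,
    (False, False): 0.050273,
}

# Law of total probability over the four parent combinations
p_xray = 0.0
for tb, lc in product([True, False], repeat=2):
    p_joint = (P_TB if tb else 1 - P_TB) * (P_LC if lc else 1 - P_LC)
    p_xray += p_joint * P_XRAY_POS[(tb, lc)]

print(round(p_xray, 4))
```

Most of the positive X-rays come from people with lung cancer or from the 5% false-positive rate among people with neither disease.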

3.  What is the probability that a smoker with a positive chest X-ray has lung cancer?  Does this probability depend on whether or not the individual has visited Asia?

In [None]:
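# Smoker with a positive X-ray: Asia unknown / visited / not visited
Lungcancer_AsiaUnknown = probDist('Lungcancer', model, {'Smoking': True, 'Xray': True})[True]
Lungcancer_Asia = probDist('Lungcancer', model, {'Smoking': True, 'Xray': True, 'Asia': True})[True]
Lungcancer_NotAsia = probDist('Lungcancer', model, {'Smoking': True, 'Xray': True, 'Asia': False})[True]
print(Lungcancer_AsiaUnknown, Lungcancer_Asia, Lungcancer_NotAsia)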

Answer question 3 - The probability that a smoker with a positive chest X-ray has lung cancer is 65.55%. Yes, it depends on whether the individual has visited Asia: if they have, the probability of lung cancer drops to 48.59% (so there is about a 51.4% chance that they do not have lung cancer). A visit to Asia increases the chance of tuberculosis, which is an alternative explanation for the positive X-ray, so it "explains away" part of the evidence for lung cancer.
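
The explaining-away effect can be reproduced by hand. Given that the person smokes and the X-ray is positive, only Tuberculosis and LungCancer matter (Dyspnea is unobserved and sums out), so the sketch below enumerates those two variables with the CPT values from this notebook and varies only the prior for tuberculosis according to what is known about Asia. The function name is hypothetical, and the exact results may differ slightly from `predict_proba`.

```python
P_LC_GIVEN_SMOKER = 0.103844                    # P(LungCancer | Smoker)
P_TB = {                                        # P(Tuberculosis) under each kind of Asia evidence
    "no Asia evidence": 0.0094 * 0.074468 + 0.9906 * 0.009085,
    "visited Asia": 0.074468,
    "did not visit Asia": 0.009085,
}
P_XRAY_POS = {(True, True): 1.0, (True, False): 0.977528,
              (False, True): 0.971119, (False, False): 0.050273}


def p_lc_given_smoker_and_positive_xray(p_tb):
    """Exact P(LungCancer | Smoker, positive XRay) by enumerating Tuberculosis and LungCancer."""
    weights = {}
    for tb in (True, False):
        for lc in (True, False):
            prior = (p_tb if tb else 1 - p_tb) * (P_LC_GIVEN_SMOKER if lc else 1 - P_LC_GIVEN_SMOKER)
            weights[(tb, lc)] = prior * P_XRAY_POS[(tb, lc)]
    total = sum(weights.values())
    return (weights[(True, True)] + weights[(False, True)]) / total


results = {label: p_lc_given_smoker_and_positive_xray(p) for label, p in P_TB.items()}
for label, p in results.items():
    print(f"{label}: {p:.4f}")
```

Knowing the person did not visit Asia moves the answer only slightly upward, because almost nobody in the population has visited Asia anyway; knowing they did visit lowers it noticeably.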

4.  How much does a trip to Asia affect the likelihood of an individual having Dyspnea?

In [None]:
Asia_visit = probDist('Dyspnea', model, {'Asia': True})[True]
NotAsia = probDist('Dyspnea', model, {'Asia': False})[True]
print(Asia_visit, NotAsia)

# Relative difference, measured against P(Dyspnea | Asia)
effect = (Asia_visit - NotAsia) / Asia_visit
print(effect)

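Answer question 4:- A visit to Asia increases the likelihood of dyspnea only slightly: the difference between P(Dyspnea | Asia) and P(Dyspnea | no Asia) is about 5.2% of P(Dyspnea | Asia).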
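
As a hand check of this answer, the sketch below computes P(Dyspnea | Asia) exactly by summing over Smoking, Tuberculosis, LungCancer and Bronchitis with the CPT values from this notebook (XRay is unobserved and sums out). It prints the change against both baselines: the cell above divides by P(Dyspnea | Asia), while a relative increase is more often stated against P(Dyspnea | no Asia). The helper names are hypothetical, and exact enumeration may differ slightly from the `predict_proba` output.

```python
from itertools import product

P_TB_GIVEN_ASIA = {True: 0.074468, False: 0.009085}
P_SMOKER = 0.4969
P_LC_GIVEN_S = {True: 0.103844, False: 0.009143}
P_BR_GIVEN_S = {True: 0.600724, False: 0.291393}
P_DYSP = {  # P(Dyspnea = True | Tuberculosis, LungCancer, Bronchitis)
    (True, True, True): 1.0, (True, True, False): 0.571429,
    (True, False, True): 0.914286, (True, False, False): 0.703704,
    (False, True, True): 0.899371, (False, True, False): 0.665254,
    (False, False, True): 0.800586, (False, False, False): 0.092536,
}


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1 - p


def p_dyspnea_given_asia(asia):
    """Exact P(Dyspnea | Asia) by summing over Smoking, Tuberculosis, LungCancer, Bronchitis."""
    total = 0.0
    for s, tb, lc, br in product([True, False], repeat=4):
        total += (bern(P_SMOKER, s) * bern(P_TB_GIVEN_ASIA[asia], tb)
                  * bern(P_LC_GIVEN_S[s], lc) * bern(P_BR_GIVEN_S[s], br)
                  * P_DYSP[(tb, lc, br)])
    return total


with_visit, without_visit = p_dyspnea_given_asia(True), p_dyspnea_given_asia(False)
print(round(with_visit, 4), round(without_visit, 4))
print("absolute change:", round(with_visit - without_visit, 4))
print("relative to P(D | Asia):", round((with_visit - without_visit) / with_visit, 4))
print("relative to P(D | no Asia):", round((with_visit - without_visit) / without_visit, 4))
```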

5.  Suppose you are a nonsmoker individual presenting with Dyspnea and you have never been to Asia. Based on this information what are the relative likelihoods that you have (a) Tuberculosis, (b) Lung Cancer, (c) Bronchitis, or (d) none of them?

In [None]:

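# Evidence: non-smoker with dyspnea who has never been to Asia
evidence = {'Smoking': False, 'Dyspnea': True, 'Asia': False}

tuberprob = probDist('Tuberculosis', model, evidence)[True]
print('Tuberculosis', tuberprob)

lung_cancer_prob = probDist('Lungcancer', model, evidence)[True]
print('Lung cancer', lung_cancer_prob)

bronchitis_prob = probDist('Bronchitis', model, evidence)[True]
print('Bronchitis', bronchitis_prob)

# The conditions can co-occur, so P(none) is not 1 minus the sum of the marginals.
# Chain rule: P(no TB, no LC, no B | e) = P(no TB | e) P(no LC | e, no TB) P(no B | e, no TB, no LC)
stayHealthy = (probDist('Tuberculosis', model, evidence)[False]
               * probDist('Lungcancer', model, {**evidence, 'Tuberculosis': False})[False]
               * probDist('Bronchitis', model, {**evidence, 'Tuberculosis': False, 'Lungcancer': False})[False])
print('Stay Healthy', stayHealthy)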


Answer question 5:- Given these conditions, bronchitis is by far the most likely explanation (about 76%), while tuberculosis and lung cancer are each only around 2%. The probability of having none of the three is roughly 20%.
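
With Smoking and Asia both fixed by the evidence, the priors of the three conditions are independent, so the posterior can be enumerated by hand: each (TB, LC, BR) combination gets its prior times P(Dyspnea = True | combination), and the weights are then normalized. The sketch below does this with the CPT values from this notebook, independently of pomegranate. The final print shows why 1 minus the sum of the marginals underestimates the probability of having none of the conditions.

```python
from itertools import product

# CPT values from the distributions cell, with Smoking = False and Asia = False fixed
P_TB = 0.009085   # P(Tuberculosis | no Asia visit)
P_LC = 0.009143   # P(LungCancer | non-smoker)
P_BR = 0.291393   # P(Bronchitis | non-smoker)
P_DYSP = {        # P(Dyspnea = True | Tuberculosis, LungCancer, Bronchitis)
    (True, True, True): 1.0, (True, True, False): 0.571429,
    (True, False, True): 0.914286, (True, False, False): 0.703704,
    (False, True, True): 0.899371, (False, True, False): 0.665254,
    (False, False, True): 0.800586, (False, False, False): 0.092536,
}


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1 - p


# Prior times likelihood of Dyspnea = True; XRay is unobserved and sums out
weights = {
    (tb, lc, br): bern(P_TB, tb) * bern(P_LC, lc) * bern(P_BR, br) * P_DYSP[(tb, lc, br)]
    for tb, lc, br in product([True, False], repeat=3)
}
z = sum(weights.values())
posterior = {combo: w / z for combo, w in weights.items()}

p_tb = sum(p for (tb, _, _), p in posterior.items() if tb)
p_lc = sum(p for (_, lc, _), p in posterior.items() if lc)
p_br = sum(p for (_, _, br), p in posterior.items() if br)
p_none = posterior[(False, False, False)]
print(f"TB {p_tb:.4f}  LC {p_lc:.4f}  BR {p_br:.4f}  none {p_none:.4f}")
print(f"1 - sum of marginals: {1 - (p_tb + p_lc + p_br):.4f}")
```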

6.  In your panic you have a chest XRay done, which comes out negative.   How does that change the relative likelihoods?

In [None]:
# Same evidence as before, plus a negative chest X-ray.
# New variable names avoid overwriting the distribution and node objects defined earlier.
evidence_xneg = {'Smoking': False, 'Dyspnea': True, 'Asia': False, 'Xray': False}

tb_xneg = probDist('Tuberculosis', model, evidence_xneg)[True]
lc_xneg = probDist('Lungcancer', model, evidence_xneg)[True]
br_xneg = probDist('Bronchitis', model, evidence_xneg)[True]

# P(none) via the chain rule, since the conditions are not mutually exclusive
none_xneg = (probDist('Tuberculosis', model, evidence_xneg)[False]
             * probDist('Lungcancer', model, {**evidence_xneg, 'Tuberculosis': False})[False]
             * probDist('Bronchitis', model, {**evidence_xneg, 'Tuberculosis': False, 'Lungcancer': False})[False])

print('Tuberculosis probability is', tb_xneg)
print('Lungcancer probability is', lc_xneg)
print('Bronchitis probability is', br_xneg)
print('Stay Healthy probability is', none_xneg)


Answer question 6:- A negative X-ray makes tuberculosis and lung cancer, the two conditions the X-ray can detect, very unlikely, and bronchitis remains the most likely explanation for the dyspnea. The probability of having none of the three conditions rises only slightly, from roughly 20% to 21.86%.
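
A negative X-ray enters the calculation only through the factor P(XRay = False | Tuberculosis, LungCancer). The sketch below, built only from the xRay CPT values in this notebook, prints how strongly that factor reweights each (Tuberculosis, LungCancer) hypothesis relative to having neither disease; posterior odds are the prior odds multiplied by this likelihood ratio.

```python
# P(XRay = False | Tuberculosis, LungCancer), from the xRay CPT
P_XRAY_NEG = {
    (True, True): 0.0,
    (True, False): 0.022472,
    (False, True): 0.028881,
    (False, False): 0.949727,
}

baseline = P_XRAY_NEG[(False, False)]
# Likelihood ratio of each hypothesis against "neither disease"
ratios = {combo: p / baseline for combo, p in P_XRAY_NEG.items()}
for (tb, lc), ratio in ratios.items():
    print(f"TB={tb!s:5} LC={lc!s:5} likelihood ratio vs neither: {ratio:.4f}")
```

A tuberculosis-only or lung-cancer-only state therefore loses roughly a factor of 30 to 45 in weight relative to the no-disease state, the both-diseases state is ruled out, and states that differ only in bronchitis keep their relative weights because Bronchitis is not a parent of XRay.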

7.  On the basis of this information, should you seek medical attention?

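Answer question 7:- The negative X-ray makes tuberculosis and lung cancer very unlikely, so there seems to be no urgent need to seek medical attention for those conditions. The model still gives bronchitis a high probability, however, so seeing a doctor about the dyspnea itself may still be worthwhile.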