<center><img src="https://javier.rodriguez.org.mx/itesm/2014/tecnologico-de-monterrey-blue.png" width="450" align="center"></center>
<br><p><center><h1><b>Bayesian Network Analysis of Cardiopulmonary Diseases and Symptoms Caused by Smoking and COVID-19</b></h1></center></p><br>
<p style="text-align: right;">Friday, August 23rd, 2024</p>
<p style="text-align: right;">Alejandro Santiago Baca Eyssautier</p>

<br>

---

<p><h4> <b>1. Abstract</b> </h4></p>

This study presents an analysis of cardiopulmonary diseases and their associated symptoms using a Bayesian network. The network models the probabilistic relationships between smoking, various health conditions such as heart disease, lung cancer, and symptoms associated with COVID-19. The analysis demonstrates the impact of smoking on the likelihood of developing these conditions and extends to case studies of COVID-19 diagnosis based on symptom analysis. The results, derived from conditional probability tables, highlight the use of Bayesian networks as a powerful tool for medical diagnosis and decision-making.

*Keywords*: Bayesian Network, Cardiopulmonary Diseases, Smoking, COVID-19, Probabilistic Inference, Medical Diagnosis

<br>

---

<p><h4> <b>2. Introduction</b> </h4></p>

Cardiopulmonary diseases and COVID-19 are among the leading causes of morbidity and mortality worldwide. Smoking is a significant risk factor for cardiopulmonary diseases, while COVID-19 presents a complex set of symptoms that overlap with other respiratory illnesses like the flu and pneumonia. Understanding the relationships between smoking, heart disease, lung cancer, and COVID-19 symptoms is crucial for early diagnosis and effective treatment. Bayesian networks offer a probabilistic framework to model these relationships, allowing for the integration of uncertainty and the prediction of outcomes based on observed evidence.

This study employs a Bayesian network to model the relationships between smoking, heart disease, lung cancer, and COVID-19 symptoms. The network structure is designed based on clinical knowledge, and the conditional probabilities are derived from existing medical data. Through this model, we aim to quantify the impact of smoking on various health outcomes and extend the model to COVID-19 diagnosis, providing a tool for medical decision-making.

<br>

---

<p><h4> <b>3. Theoretical Framework</b> </h4></p><br>

**3.1 Bayesian Networks**

A Bayesian network is a graphical model that represents a set of variables and their conditional dependencies through a directed acyclic graph (DAG). Each node in the graph represents a variable, and the edges denote the conditional dependencies between these variables. The strength of these dependencies is quantified using conditional probability tables (CPTs).
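
One consequence of this definition is worth stating explicitly. In standard notation (the symbols $X_i$ and $\mathrm{Pa}(X_i)$ are introduced here for illustration and are not used elsewhere in the report), the joint distribution over all variables factorizes into one CPT per node:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$

Here $\mathrm{Pa}(X_i)$ denotes the parents of node $X_i$ in the DAG. For a node without parents, the factor reduces to the prior $P(X_i)$.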

- **Core Principle**: Bayesian networks rely on Bayes' Theorem, which allows for the computation of the posterior probability of a hypothesis given observed evidence. In the context of medical diagnosis, Bayesian networks can update the probability of a disease given observed symptoms (for example, those of COVID-19) or risk factors (for example, smoking) (Jensen, 1996; Korb & Nicholson, 2010; Pearl, 1988).

The formula for updating the probability is:

$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$

Where:
- $P(H \mid E)$ is the posterior probability of the hypothesis $H$ (e.g., presence of disease) given evidence $E$ (e.g., symptoms).
- $P(E \mid H)$ is the likelihood of observing the evidence given the hypothesis.
- $P(H)$ is the prior probability of the hypothesis.
- $P(E)$ is the marginal probability of the evidence.
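
As a minimal sketch of this update, the snippet below applies Bayes' theorem to numbers taken from Tables 1 and 2 in the Results section: the hypothesis $H$ is "the individual smokes" and the evidence $E$ is "heart disease is present". The function name `posterior` and its parameter names are illustrative choices, not part of the original analysis.

```python
def posterior(prior: float, likelihood: float, likelihood_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Marginal probability of the evidence, by the law of total probability.
    evidence = likelihood * prior + likelihood_given_not_h * (1.0 - prior)
    return likelihood * prior / evidence


# P(Fumar = si) = 0.3, P(Corazón = si | Fumar = si) = 0.3,
# P(Corazón = si | Fumar = no) = 0.1 (Tables 1 and 2).
p_smoker_given_heart = posterior(prior=0.3, likelihood=0.3, likelihood_given_not_h=0.1)
print(f"P(Fumar = si | Corazón = si) = {p_smoker_given_heart:.4f}")
```

With these inputs, $P(E) = 0.3 \cdot 0.3 + 0.1 \cdot 0.7 = 0.16$, so the posterior is $0.09 / 0.16 = 0.5625$: observing heart disease raises the probability that the individual smokes from 30% to about 56%.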

<br>

---

<p><h4> <b>4. Methodology</b> </h4></p>

**4.1 Probabilities and Conditional Probability Tables (CPTs)**

The Bayesian network is constructed using the following variables:

1. **Smoking (Fumar)**: Indicates whether the individual smokes.
2. **Heart Disease (Corazón)**: Indicates the presence of heart disease.
3. **Lung Cancer (Cáncer Pulmón)**: Indicates the presence of lung cancer.
4. **Cough (Tos)**: Indicates whether the individual has a cough.
5. **Fatigue (Cansancio)**: Indicates whether the individual experiences fatigue.
6. **Tachycardia (Taquicardia)**: Indicates whether the individual has tachycardia.
7. **COVID-19 Symptoms**: Includes symptoms such as fever, dry cough, difficulty breathing, and loss of taste or smell.

The conditional probabilities for these variables are defined as follows:

- **P(Fumar)**: The probability of smoking.
- **P(Corazón | Fumar)**: The probability of heart disease given the smoking status.
- **P(Cáncer Pulmón | Fumar)**: The probability of lung cancer given the smoking status.
- **P(Tos | Fumar, Cáncer Pulmón)**: The probability of cough given smoking status and lung cancer.
- **P(Cansancio | Corazón, Cáncer Pulmón)**: The probability of fatigue given heart disease and lung cancer.
- **P(Taquicardia | Corazón)**: The probability of tachycardia given heart disease.
- **P(COVID-19 Symptoms)**: The probability of presenting symptoms related to COVID-19 based on various conditions.
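
The dependency structure implied by the six cardiopulmonary CPTs above can be sketched as a parent map. The snippet below is an illustration only: it uses the ASCII node names from the Python implementation in Section 10, the helper `factor` is hypothetical, and the COVID-19 symptom node is left out because its parents are not listed in the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Parents of each node, read off the conditioning sets of the CPTs.
PARENTS = {
    "Fumar": [],
    "Corazon": ["Fumar"],
    "CancerPulmon": ["Fumar"],
    "Tos": ["Fumar", "CancerPulmon"],
    "Cansancio": ["Corazon", "CancerPulmon"],
    "Taquicardia": ["Corazon"],
}

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so obtaining an order also confirms that the structure is a DAG.
order = list(TopologicalSorter(PARENTS).static_order())


def factor(node: str) -> str:
    """Render the CPT of one node as a text factor."""
    parents = PARENTS[node]
    return f"P({node} | {', '.join(parents)})" if parents else f"P({node})"


# The joint distribution is the product of one factor per node.
factorization = " * ".join(factor(node) for node in order)
print(order)
print(factorization)
```

Any topological order lists every node after its parents, which is the order in which the variables would be sampled or summed out.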

<br>

**4.2 Case Study: COVID-19 Symptom Analysis**

The World Health Organization (WHO) established the criteria to recognize COVID-19 symptoms. In this study, three cases were analyzed where patients' symptoms were recorded to determine whether they had the flu, pneumonia, or COVID-19. The symptoms of each patient were documented, and a Bayesian network was applied to estimate the probability of each disease based on the observed symptoms. This approach allowed for a probabilistic assessment of the likelihood of each condition, providing a quantitative basis for diagnosis.

The specific cases analyzed were as follows:

1. **Case 1**: The patient presented with a headache, fatigue, and difficulty breathing. Additional tests showed no fever, high fever, or dry cough, no speech or mobility impairment, and preserved taste and smell.
2. **Case 2**: The patient arrived alone and was unable to speak or move. The symptoms presented included fever, high fever, dry cough, and difficulty breathing. Other symptoms were unknown due to the patient's condition.
3. **Case 3**: The patient presented with a mild fever, fatigue, dry cough, and difficulty breathing, without motor impairment or loss of taste or smell.

The Bayesian network was employed to analyze the symptoms presented by these patients and to determine the probability of flu, pneumonia, or COVID-19 based on the symptom profile provided.


---

<p><h4> <b>5. Results</b> </h4></p>

- **5.1 Example**:
Bayesian network of cardiopulmonary diseases and their symptoms caused by smoking.

<center><img src="assets/Figure1.png" width="450" align="center"></center>
<center><b>Figure 1:</b> Bayesian network representing the relationships between smoking, heart disease, lung cancer, and related symptoms.</center>

<br>

<p>Probabilities associated with the Bayesian network in Figure 1:</p>
 
<br>

$$\begin{vmatrix} 
P(fumar) & fumar = si & fumar = no \\
  & 0.3  & 0.7
\end{vmatrix}$$

<p><center><b>Table 1:</b> Probability of smoking (Fumar).</center></p><br>

$$\begin{vmatrix} 
P(corazón) & corazón = si & corazón = no \\
fumar = si  & 0.3  & 0.7 \\
fumar = no  & 0.1  & 0.9
\end{vmatrix}$$

<p><center><b>Table 2:</b> Probability of heart disease (Corazón) given smoking status.</center></p><br>

$$\begin{vmatrix} 
P(cáncer pulmón) & cáncer pulmón = si & cáncer pulmón = no \\
fumar = si  & 0.35  & 0.65 \\
fumar = no  & 0.05 & 0.95
\end{vmatrix}$$

<p><center><b>Table 3:</b> Probability of lung cancer (Cáncer Pulmón) given smoking status.</center></p><br>

$$\begin{vmatrix} 
P(tos) & tos = si & tos = no \\
fumar = si, cáncer pulmón = si  & 0.99  & 0.01 \\
fumar = si, cáncer pulmón = no & 0.95 & 0.05 \\
fumar = no, cáncer pulmón = si & 0.7 & 0.3 \\
fumar = no, cáncer pulmón = no & 0.1  & 0.9
\end{vmatrix}$$

<p><center><b>Table 4:</b> Probability of cough (Tos) given smoking status and lung cancer.</center></p><br>

$$\begin{vmatrix} 
P(cansancio) & cansancio = si & cansancio = no \\
corazón = si, cáncer pulmón = si  & 0.99  & 0.01 \\
corazón = si, cáncer pulmón = no  & 0.9  & 0.1 \\
corazón = no, cáncer pulmón = si  & 0.8  & 0.2 \\
corazón = no, cáncer pulmón = no  & 0.1  & 0.9 \\
\end{vmatrix}$$

<p><center><b>Table 5:</b> Probability of fatigue (Cansancio) given heart disease and lung cancer.</center></p><br>

$$\begin{vmatrix} 
P(taquicardia) & taquicardia = si & taquicardia = no \\
corazón = si  & 0.95  & 0.05 \\
corazón = no  & 0.05 & 0.95
\end{vmatrix}$$

<p><center><b>Table 6:</b> Probability of tachycardia (Taquicardia) given heart disease.</center></p><br>
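
To illustrate how Tables 1 to 6 combine, the following sketch computes marginal and posterior probabilities by brute-force enumeration of the joint distribution. It is a plain-Python illustration rather than the tool used to produce the figures; the names `bern`, `joint`, and `prob` are hypothetical helpers.

```python
from itertools import product

SI, NO = "si", "no"

# CPTs transcribed from Tables 1 to 6; each value is P(variable = si | parents).
P_FUMAR = 0.3
P_CORAZON = {SI: 0.3, NO: 0.1}                                               # keyed by fumar
P_CANCER = {SI: 0.35, NO: 0.05}                                              # keyed by fumar
P_TOS = {(SI, SI): 0.99, (SI, NO): 0.95, (NO, SI): 0.7, (NO, NO): 0.1}       # (fumar, cancer)
P_CANSANCIO = {(SI, SI): 0.99, (SI, NO): 0.9, (NO, SI): 0.8, (NO, NO): 0.1}  # (corazon, cancer)
P_TAQUICARDIA = {SI: 0.95, NO: 0.05}                                         # keyed by corazon

NAMES = ("fumar", "corazon", "cancer", "tos", "cansancio", "taquicardia")


def bern(p_si: float, state: str) -> float:
    """Probability of a binary state, given P(state = si)."""
    return p_si if state == SI else 1.0 - p_si


def joint(fumar, corazon, cancer, tos, cansancio, taquicardia) -> float:
    """Joint probability as the product of the six CPT entries."""
    return (bern(P_FUMAR, fumar)
            * bern(P_CORAZON[fumar], corazon)
            * bern(P_CANCER[fumar], cancer)
            * bern(P_TOS[(fumar, cancer)], tos)
            * bern(P_CANSANCIO[(corazon, cancer)], cansancio)
            * bern(P_TAQUICARDIA[corazon], taquicardia))


def prob(**fixed) -> float:
    """Sum the joint over every assignment consistent with the fixed states."""
    total = 0.0
    for states in product((SI, NO), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, states))
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(**assignment)
    return total


p_tos = prob(tos=SI)
p_cancer_given_tos = prob(cancer=SI, tos=SI) / p_tos
print(f"P(Tos = si) = {p_tos:.4f}")
print(f"P(Cáncer Pulmón = si | Tos = si) = {p_cancer_given_tos:.4f}")
```

Under these tables, the marginal probability of cough is 0.3802, and observing a cough raises the probability of lung cancer from its marginal value of 0.14 to about 0.338. Enumeration is exponential in the number of variables, so it is only practical for small networks such as this one.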

- **5.2 Activity:** Figure 2 extends the analysis to a second Bayesian network over five events ($a$, $b$, $c$, $d$, and $e$), used to compute conditional probabilities such as $P(B \mid E = presente)$:

<p><center><img src="assets/Figure2.png" width="450" align="center"></center> <br>
<center><b>Figure 2:</b> Detailed analysis of events a, b, c and d, on the conditional probability of a disease given the presence of symptoms, using Bayesian inference.</p> </center>

<br><p>Associated probabilities and calculations are as follows:</p>

<br>

$$\begin{vmatrix} 
P(a) & a = presente & a = ausente \\
& 0.2  & 0.8
\end{vmatrix}$$

<p><center><b>Table 7:</b> Probability of event a.</center></p><br>

$$\begin{vmatrix} 
P(c | a,b) & c = presente & c = ausente \\
a = presente, b = presente  & 0.3  & 0.7 \\
a = presente, b = ausente  & 0.85  & 0.15 \\
a = ausente, b = presente  & 0.7  & 0.3 \\
a = ausente, b = ausente  & 0.6  & 0.4
\end{vmatrix}$$

<p><center><b>Table 8:</b> Probability of event c given a and b.</center></p><br>

$$\begin{vmatrix} 
P(e | c, d) & e = presente & e = ausente \\
c = presente, d = presente  & 0.11  & 0.89 \\
c = presente, d = ausente  & 0.3 & 0.7 \\
c = ausente, d = presente  & 0.1  & 0.9 \\
c = ausente, d = ausente  & 0.9  & 0.1
\end{vmatrix}$$

<p><center><b>Table 9:</b> Probability of event e given c and d.</center></p><br>

$$\begin{vmatrix} 
P(b) & b = presente & b = ausente \\
& 0.65  & 0.35
\end{vmatrix}$$

<p><center><b>Table 10:</b> Probability of event b.</center></p><br>

$$\begin{vmatrix} 
P(d | c) & d = presente & d = ausente \\
c = presente  & 0.25  & 0.75 \\
c = ausente  & 0.9 & 0.1
\end{vmatrix}$$

<p><center><b>Table 11:</b> Probability of event d given c.</center></p><br>

In the analysis, $P(B | E = presente) = 0.648$ as shown in Figure 3, and $P(B | E = ausente) = 0.651$ as depicted in Figure 4. Furthermore, Figure 5 presents the calculation for $P(E | A = presente, C = ausente) = 0.18$.

<p><center><img src="assets/Figure3.png" width="450" align="center"></center>
<br><p><center><b>Figure 3:</b> Calculation of the conditional probability P(B|E=presente)=0.648 using the Bayesian network.</center></p><br> 

<p><center><img src="assets/Figure4.png" width="450" align="center"></center> 
<br><p><center><b>Figure 4:</b> Calculation of the conditional probability P(B|E=ausente)=0.651 using the Bayesian network.</center></p><br>

<p><center><img src="assets/Figure5.png" width="450" align="center"></center> 
<br><p><center><b>Figure 5:</b> Calculation of the conditional probability P(E|A=presente, C=ausente)=0.18 using the Bayesian network.</center></p>
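
The three values reported in Figures 3 to 5 can be cross-checked from Tables 7 to 11 by summing out one variable at a time. The sketch below assumes the structure implied by those tables ($a, b \rightarrow c$; $c \rightarrow d$; $c, d \rightarrow e$). It is an independent check written for illustration, not the software that generated the figures, and all identifiers are hypothetical.

```python
PRES, AUS = "presente", "ausente"
STATES = (PRES, AUS)

# CPTs transcribed from Tables 7 to 11; each value is P(variable = presente | parents).
P_A = 0.2
P_B = 0.65
P_C = {(PRES, PRES): 0.3, (PRES, AUS): 0.85, (AUS, PRES): 0.7, (AUS, AUS): 0.6}  # (a, b)
P_D = {PRES: 0.25, AUS: 0.9}                                                     # keyed by c
P_E = {(PRES, PRES): 0.11, (PRES, AUS): 0.3, (AUS, PRES): 0.1, (AUS, AUS): 0.9}  # (c, d)


def bern(p_pres: float, state: str) -> float:
    """Probability of a binary state, given P(state = presente)."""
    return p_pres if state == PRES else 1.0 - p_pres


# Sum out a: P(c = presente | b).
p_c_given_b = {b: sum(bern(P_A, a) * P_C[(a, b)] for a in STATES) for b in STATES}
# Sum out d: P(e = presente | c).
p_e_given_c = {c: sum(bern(P_D[c], d) * P_E[(c, d)] for d in STATES) for c in STATES}
# Sum out c: P(e = presente | b).
p_e_given_b = {
    b: sum(bern(p_c_given_b[b], c) * p_e_given_c[c] for c in STATES)
    for b in STATES
}


def posterior_b(e_state: str) -> float:
    """P(b = presente | e = e_state) by Bayes' theorem."""
    weights = {b: bern(P_B, b) * bern(p_e_given_b[b], e_state) for b in STATES}
    return weights[PRES] / sum(weights.values())


p_b_given_e_pres = posterior_b(PRES)
p_b_given_e_aus = posterior_b(AUS)
# Once c is known, e no longer depends on a, so
# P(e = presente | a = presente, c = ausente) = P(e = presente | c = ausente).
p_e_given_a_pres_c_aus = p_e_given_c[AUS]

print(f"P(B | E = presente) = {p_b_given_e_pres:.3f}")
print(f"P(B | E = ausente) = {p_b_given_e_aus:.3f}")
print(f"P(E | A = presente, C = ausente) = {p_e_given_a_pres_c_aus:.2f}")
```

The intermediate values are $P(c \mid b = presente) = 0.62$, $P(c \mid b = ausente) = 0.65$, $P(e \mid c = presente) = 0.2525$, and $P(e \mid c = ausente) = 0.18$, which reproduce the reported 0.648, 0.651, and 0.18.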

<br>

The results of the Bayesian network analysis provide insights into the relationships between smoking, heart disease, lung cancer, and associated symptoms. The network allows for the computation of posterior probabilities given observed symptoms or risk factors.

For example:
- $P(B \mid E = presente) = 0.648$: the probability that condition $B$ is present, given that symptom $E$ is present, is 64.8%.
- $P(B \mid E = ausente) = 0.651$: the probability that condition $B$ is present, given that symptom $E$ is absent, is 65.1%.

Both values are close to the prior $P(B = presente) = 0.65$ from Table 10, so in this network observing $E$ carries almost no information about $B$.

Additionally, the probability of developing lung cancer given smoking status and other variables can be computed using the CPTs, allowing for a detailed assessment of health risks.

<br>

- **5.3 Case Study**:
For the analysis of COVID-19 symptoms, the following case study is presented.

<p><center><img src="assets/Figure6_CaseStudy.png" width="550" align="center"></center> 
<br><center><b>Figure 6:</b> Bayesian network analysis for case studies involving COVID-19 symptoms.</p> </center><br>

<p>The database allowed us to extract the following insights for the patients under study:</p> 
<p><center><img src="assets/Figure7_Database.png" width="800" align="center"></center> 
<center><br><b>Figure 7:</b> Symptom database for case studies.</p></center><br>

The report provided for each case shows the network's predictions and accuracy metrics.

<p><center><img src="assets/Figure8_Case1_Report.png" width="450" align="center"></center> 
<br><center><b>Figure 8:</b> Case 1: Patient report showing the network's predictions and accuracy for COVID-19 diagnosis.</p></center><br>

<p><center><img src="assets/Figure9_Case1.png" width="550" align="center"></center> 
<br><center><b>Figure 9:</b> Bayesian network analysis for Case 1, indicating the likelihood of flu, pneumonia, and COVID-19.</p> </center><br>

<p><center><img src="assets/Figure10_Case2_Report.png" width="450" align="center"></center> 
<br><center><b>Figure 10:</b> Case 2: Patient report showing the network's predictions and accuracy for COVID-19 diagnosis.</p></center><br>

 <p><center><img src="assets/Figure11_Case2.png" width="550" align="center"></center> 
 <br><center><b>Figure 11:</b> Bayesian network analysis for Case 2, indicating the likelihood of flu, pneumonia, and COVID-19.</p> </center><br>
 
 <p><center><img src="assets/Figure12_Case3_Report.png" width="450" align="center"></center> 
 <br><center><b>Figure 12:</b> Case 3: Patient report showing the network's predictions and accuracy for COVID-19 diagnosis.</p> </center><br>
 
 <p><center><img src="assets/Figure13_Case3.png" width="550" align="center"></center> 
 <br><center><b>Figure 13:</b> Bayesian network analysis for Case 3, indicating the likelihood of flu, pneumonia, and COVID-19.</p></center><br>

<br>

- **5.4 Case Study: COVID-19 Symptom Analysis Results**

The results of the Bayesian network analysis for the three cases are as follows:

1. **Case 1**:
    - 33.3% probability of flu
    - 61.1% probability of pneumonia
    - 30.1% probability of COVID-19
2. **Case 2**:
    - 38.1% probability of flu
    - 60.3% probability of pneumonia
    - 57.1% probability of COVID-19
3. **Case 3**:
    - 33.3% probability of flu
    - 61.1% probability of pneumonia
    - 30.1% probability of COVID-19

These cases illustrate how Bayesian networks can be utilized to diagnose COVID-19 and other respiratory conditions based on symptom analysis. By using the network, clinicians can quantitatively assess the likelihood of different diseases and make more informed decisions regarding diagnosis and treatment plans.

<br>

---

<p><h4> <b>6. Discussion</b> </h4></p>

The Bayesian network model provides a structured approach to understanding the impact of smoking and COVID-19 on cardiopulmonary diseases. The model's ability to update probabilities based on new evidence makes it a valuable tool for medical practitioners, particularly in diagnosing complex cases where multiple factors interact.

The CPTs encode a strong influence of smoking: the probability of heart disease rises from 0.1 for non-smokers to 0.3 for smokers, and the probability of lung cancer rises from 0.05 to 0.35 (Tables 2 and 3). Furthermore, the model's application in diagnosing COVID-19 illustrates its flexibility in addressing emerging health threats. The network's predictions about symptoms such as cough, fatigue, and tachycardia can help guide clinical decision-making and support more personalized treatment plans.

However, the model is limited by the assumptions inherent in the CPTs, which are based on existing data. Future work could involve refining these probabilities with more recent or specific clinical data to enhance the model's accuracy.

---

<p><h4> <b>7. Conclusion</b> </h4></p>

This study demonstrates the utility of Bayesian networks in modeling the relationships between smoking, COVID-19, and cardiopulmonary diseases. The network's ability to integrate multiple variables and update predictions based on evidence highlights its potential as a tool for medical diagnosis and decision-making.

Further research is needed to refine the model and validate its predictions in clinical settings. By incorporating more detailed and up-to-date data, the Bayesian network could become an even more powerful tool for understanding and mitigating the health risks associated with smoking and COVID-19.

---

<p><h4> <b>8. References</b> </h4></p>

- Pearl, J. (1988). *Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference*. Morgan Kaufmann.
- Korb, K. B., & Nicholson, A. E. (2010). *Bayesian Artificial Intelligence*. CRC Press.
- Jensen, F. V. (1996). *An Introduction to Bayesian Networks*. UCL Press.

---

<p><h4> <b>9. Appendix</b> </h4></p>

The appendix includes the Python code used to implement the Bayesian network analysis and the scripts used for probability calculations. It should be noted that while the provided code is a potential approach, it could not be executed correctly in the current environment. Therefore, this implementation is presented as a theoretical example.

---

<p><h4> <b>10. Python Implementation</b> </h4></p>

Below is a Python implementation of the Bayesian network described in the report:

In [None]:
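# Newer pgmpy releases expose this class as DiscreteBayesianNetwork; fall back
# to the older BayesianNetwork name when it is unavailable.
try:
    from pgmpy.models import DiscreteBayesianNetwork as BayesianNetwork
except ImportError:
    from pgmpy.models import BayesianNetwork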
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Define the structure of the Bayesian Network
model = BayesianNetwork([('Fumar', 'Corazon'),
                         ('Fumar', 'CancerPulmon'),
                         ('Fumar', 'Tos'),
                         ('CancerPulmon', 'Tos'),
                         ('Corazon', 'Cansancio'),
                         ('CancerPulmon', 'Cansancio'),
                         ('Corazon', 'Taquicardia')])

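# Define the Conditional Probability Distributions (CPDs)
# Row 0 of each table holds the "si" state and row 1 the "no" state.
# Columns follow the evidence order, with the last evidence variable varying fastest.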
cpd_fumar = TabularCPD(variable='Fumar', variable_card=2, values=[[0.3], [0.7]])
cpd_corazon = TabularCPD(variable='Corazon', variable_card=2,
                         values=[[0.3, 0.1],
                                 [0.7, 0.9]],
                         evidence=['Fumar'],
                         evidence_card=[2])
cpd_cancer_pulmon = TabularCPD(variable='CancerPulmon', variable_card=2,
                               values=[[0.35, 0.05],
                                       [0.65, 0.95]],
                               evidence=['Fumar'],
                               evidence_card=[2])
cpd_tos = TabularCPD(variable='Tos', variable_card=2,
                     values=[[0.99, 0.95, 0.7, 0.1],
                             [0.01, 0.05, 0.3, 0.9]],
                     evidence=['Fumar', 'CancerPulmon'],
                     evidence_card=[2, 2])
cpd_cansancio = TabularCPD(variable='Cansancio', variable_card=2,
                           values=[[0.99, 0.9, 0.8, 0.1],
                                   [0.01, 0.1, 0.2, 0.9]],
                           evidence=['Corazon', 'CancerPulmon'],
                           evidence_card=[2, 2])
cpd_taquicardia = TabularCPD(variable='Taquicardia', variable_card=2,
                             values=[[0.95, 0.05],
                                     [0.05, 0.95]],
                             evidence=['Corazon'],
                             evidence_card=[2])

# Add CPDs to the model
model.add_cpds(cpd_fumar, cpd_corazon, cpd_cancer_pulmon, cpd_tos, cpd_cansancio, cpd_taquicardia)

# Check if the model is correctly defined
assert model.check_model()

# Perform inference
infer = VariableElimination(model)

# Example query: P(Fumar | Taquicardia = si)
# Without explicit state names, states are indexed by CPD row: 0 = si, 1 = no.
result = infer.query(variables=['Fumar'], evidence={'Taquicardia': 0})
print(result)

This code implements the cardiopulmonary network of Figure 1 using the pgmpy library. Because it could not be executed in the current environment, it should be read as an untested reference implementation rather than a verified result.
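
Since the script could not be run, the value that a query of `Fumar` given `Taquicardia` should return can be derived by hand from Tables 1, 2, and 6. The derivation below is a sanity check under the assumption that integer state 0 corresponds to "si" and state 1 to "no", which follows from the row order of the CPDs; the abbreviations $F$ (Fumar), $C$ (Corazón), and $T$ (Taquicardia) are used only here.

$$\begin{aligned}
P(T = si \mid F = si) &= P(C = si \mid F = si) \cdot 0.95 + P(C = no \mid F = si) \cdot 0.05 = 0.3 \cdot 0.95 + 0.7 \cdot 0.05 = 0.32 \\
P(T = si \mid F = no) &= 0.1 \cdot 0.95 + 0.9 \cdot 0.05 = 0.14 \\
P(F = si \mid T = si) &= \frac{0.3 \cdot 0.32}{0.3 \cdot 0.32 + 0.7 \cdot 0.14} = \frac{0.096}{0.194} \approx 0.495 \\
P(F = si \mid T = no) &= \frac{0.3 \cdot 0.68}{0.3 \cdot 0.68 + 0.7 \cdot 0.86} = \frac{0.204}{0.806} \approx 0.253
\end{aligned}$$

Evidence `{'Taquicardia': 0}` should therefore yield a probability of about 0.495 for the smoking state, and evidence `{'Taquicardia': 1}` about 0.253.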