<a href="https://colab.research.google.com/github/PabloAguirreSolana/Bayesian-Beleif-Networks/blob/main/Bayesian_Belief_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Using Bayesian Belief Networks to test support for autocratic and democratic government in México with Latinobarómetro surveys (2018-2020)**

- This notebook provides the full code for a series of experiments that test hypotheses about the propensity of Mexican society to support an authoritarian regime rather than a democratic one.

- The databases used for these experiments were taken from the Latinobarómetro 2018 and 2020 surveys. The data frames used in this notebook were previously curated and processed for these experiments *(a notebook in this repository shows the wrangling we did)*.

- The models in this notebook are not meant to be an exhaustive probabilistic Bayesian network; they are a baseline against which different evidence can be tested, depending on the variables included in the model.

- As probabilistic models, their results depend on the structure of the network, that is, on its conditional probabilities and the independence assumptions it encodes; further evidence is needed before treating them as causal models.

- We suggest interpreting these models in terms of their directional (conditional) probabilities given a set of conditions, which can change if those conditions change.

In [None]:
#Import the necessary libraries
!pip install pybbn
import pandas as pd
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController



In [None]:
#Load files
df18 = pd.read_excel('/content/df18.xlsx')
df20 = pd.read_excel('/content/df20.xlsx')

In [None]:
#Sanity check: percentage of missing values per column. The BBN probabilities must be estimated from complete data, since BBNs cannot handle NaNs.
percentage_missing_values18 = (df18.isnull().sum() / len(df18)) * 100
print(percentage_missing_values18)
percentage_missing_values20 = (df20.isnull().sum() / len(df20)) * 100
print(percentage_missing_values20)

Ciudad           0.0
Edad             0.0
Sexo             0.0
DemAut           0.0
Satdem           0.0
Aprobpres        0.0
ApoyoDem         0.0
Partido          0.0
Class            0.0
Educ             0.0
Benef            0.0
Ciudad_Label     0.0
Region           0.0
Estado           0.0
Localidad        0.0
DemAut_cat       0.0
Satdem_cat       0.0
Aprobpres_cat    0.0
ApoyoDem_cat     0.0
Partido_cat      0.0
dtype: float64
Ciudad           0.0
Edad             0.0
Sexo             0.0
DemAut           0.0
Satdem           0.0
Aprobpres        0.0
ApoyoDem         0.0
Partido          0.0
Class            0.0
Educ             0.0
Benef            0.0
Ciudad_Label     0.0
Region           0.0
Estado           0.0
Localidad        0.0
DemAut_cat       0.0
Satdem_cat       0.0
Aprobpres_cat    0.0
ApoyoDem_cat     0.0
Partido_cat      0.0
dtype: float64
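If this check ever reports missing values, one option is to keep only the rows that are complete for the columns the network actually uses. A minimal sketch, assuming the model columns are the four `*_cat` columns used in the crosstabs below; the helper name `complete_cases` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd

# Columns used by the network in this notebook (assumption: only these feed the CPTs)
MODEL_COLS = ['Satdem_cat', 'ApoyoDem_cat', 'Partido_cat', 'DemAut_cat']

def complete_cases(df: pd.DataFrame, cols=MODEL_COLS) -> pd.DataFrame:
    """Hypothetical helper: drop rows with a NaN in any of the model columns."""
    return df.dropna(subset=list(cols)).reset_index(drop=True)

# Toy example (not survey data): the second respondent has no party label
toy = pd.DataFrame({
    'Satdem_cat': ['Satisfecho', 'No_Satisfecho', 'Satisfecho'],
    'ApoyoDem_cat': ['De_Acuerdo', 'Desacuerdo', 'De_Acuerdo'],
    'Partido_cat': ['MORENA', np.nan, 'ALIANZA'],
    'DemAut_cat': ['Democracia', 'Autoritario', 'Democracia'],
})
clean = complete_cases(toy)
print(len(clean), list(clean['Partido_cat']))
```

Dropping incomplete rows keeps the contingency tables simple; imputation would be the alternative if many rows were affected.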


- First, we calculate the probabilities for each relation established in the network. These probabilities are relative frequencies taken from contingency tables: for a node with parents, the distribution of its states within each combination of its parents' states; for a root node, the distribution of its states over the whole sample.

- The probabilities differ for each dataset and for each combination of variables included in the model.
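Read off the crosstabs in the cells that follow, the model factorizes the joint distribution as below, writing $S$, $A$, $P$, $D$ for `Satdem_cat`, `ApoyoDem_cat`, `Partido_cat` and `DemAut_cat`. `prob_a` estimates $\Pr(S)$, `prob_b` estimates $\Pr(A \mid S)$, `prob_d` estimates $\Pr(P)$ and `prob_c` estimates $\Pr(D \mid A, P)$, each as a ratio of counts $N(\cdot)$ in the survey data (the maximum-likelihood estimate). The factorization also makes the independence assumptions explicit: for instance, $S$ and $P$ are marginally independent, because $D$ is a collider between them.

$$
\Pr(S, A, P, D) = \Pr(S)\,\Pr(A \mid S)\,\Pr(P)\,\Pr(D \mid A, P)
$$

$$
\hat{\Pr}(S=s) = \frac{N(s)}{N},\qquad
\hat{\Pr}(A=a \mid S=s) = \frac{N(s,a)}{N(s)},\qquad
\hat{\Pr}(P=p) = \frac{N(p)}{N},\qquad
\hat{\Pr}(D=d \mid A=a, P=p) = \frac{N(a,p,d)}{N(a,p)}
$$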

# Model_2020

In [None]:
# Probabilities for Model_1 2020
# Each probability table is flattened into a plain Python list, which is the format the BbnNode constructor takes


prob_a20 = pd.crosstab(df20['Satdem_cat'], 'Empty', margins = False,
            normalize='columns').sort_index().to_numpy().reshape(-1).tolist()

prob_b20 = pd.crosstab(df20['Satdem_cat'],df20['ApoyoDem_cat'],
                 margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()


prob_c20 = pd.crosstab([df20['ApoyoDem_cat'],df20['Partido_cat']],df20['DemAut_cat'],
                 margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()

prob_d20 = pd.crosstab(df20['Partido_cat'], 'Empty', margins = False,
            normalize='columns').sort_index().to_numpy().reshape(-1).tolist()
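The `'Empty'` argument in the marginal crosstabs is a constant placeholder column, so `normalize='columns'` simply returns each category's relative frequency; the conditional crosstabs use `normalize='index'`, so each row (one combination of parent states) sums to 1. The check below runs on toy data (hypothetical labels, not the survey) and also shows the row-major order produced by `.to_numpy().reshape(-1)`. Note that the crosstab columns come out in sorted label order (`'De_Acuerdo'` sorts before `'Desacuerdo'` because `'_'` precedes `'s'`), so a node's state list has to follow the order of the crosstab that produced its probabilities; whether this matters here depends on the actual category strings in the curated files, which are not shown:

```python
import pandas as pd

# Toy data standing in for two survey columns
toy = pd.DataFrame({
    'Satdem_cat':   ['No_Satisfecho', 'No_Satisfecho', 'Satisfecho', 'No_Satisfecho'],
    'ApoyoDem_cat': ['Desacuerdo', 'De_Acuerdo', 'De_Acuerdo', 'De_Acuerdo'],
})

# Marginal via the constant-column trick used in the notebook
marg_trick = pd.crosstab(toy['Satdem_cat'], 'Empty',
                         normalize='columns').sort_index().to_numpy().reshape(-1).tolist()
# Equivalent, more direct formulation
marg_direct = toy['Satdem_cat'].value_counts(normalize=True).sort_index().tolist()

# Conditional P(ApoyoDem | Satdem): one row per parent state, each row sums to 1
cond = pd.crosstab(toy['Satdem_cat'], toy['ApoyoDem_cat'], normalize='index').sort_index()
# Row-major flattening: parent state is the outer loop, child state the inner loop
flat = cond.to_numpy().reshape(-1).tolist()

print(marg_trick, marg_direct)
print(cond)
print(flat)
```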

In [None]:
# Round the probabilities to two decimal places
prob_a20 = [ round(elem, 2) for elem in prob_a20]
prob_b20 = [ round(elem, 2) for elem in prob_b20]
prob_c20 = [ round(elem, 2) for elem in prob_c20]
prob_d20 = [ round(elem, 2) for elem in prob_d20]
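Rounding each entry to two decimals can leave a distribution that no longer sums exactly to 1 (three categories at 1/3 each round to 0.33 + 0.33 + 0.33 = 0.99). A minimal sketch that rounds and then renormalizes each conditional distribution; `round_cpt` is a hypothetical helper and `n_states` is the number of states of the child node:

```python
def round_cpt(probs, n_states, decimals=2, renormalize=True):
    """Round a flattened CPT and (optionally) renormalize each block of n_states values.

    probs uses the notebook's layout: one block of n_states child probabilities
    per combination of parent states.
    """
    if len(probs) % n_states != 0:
        raise ValueError('length of probs is not a multiple of n_states')
    out = []
    for i in range(0, len(probs), n_states):
        block = [round(p, decimals) for p in probs[i:i + n_states]]
        if renormalize:
            total = sum(block)
            block = [p / total for p in block]
        out.extend(block)
    return out

thirds = [1/3, 1/3, 1/3]
print(sum(round(p, 2) for p in thirds))    # rounding alone: the block no longer sums to 1
print(sum(round_cpt(thirds, n_states=3)))  # renormalized block
```

Rounding here is only cosmetic, so passing the unrounded lists to the nodes is an equally valid alternative.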

In [None]:
# Print the probability lists that will serve as the network inputs
print(prob_a20)
print(prob_b20)
print(prob_c20)
print(prob_d20)

[0.6, 0.4]
[0.38, 0.62, 0.3, 0.7]
[0.46, 0.54, 0.43, 0.57, 0.42, 0.58, 0.3, 0.7, 0.26, 0.74, 0.29, 0.71]
[0.15, 0.28, 0.57]
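The 12 entries of `prob_c20` are `P(DemAut | ApoyoDem, Partido)` flattened row by row: the outer loop runs over the 2 states of `ApoyoDem_cat`, the middle loop over the 3 states of `Partido_cat`, and each consecutive pair is a distribution over the 2 states of `DemAut_cat`. A short sketch that reshapes the printed list to make that layout explicit; the axis order is read off the crosstab's row index `[ApoyoDem_cat, Partido_cat]`:

```python
import numpy as np

# Printed output of prob_c20 (2020), copied from above
prob_c20 = [0.46, 0.54, 0.43, 0.57, 0.42, 0.58, 0.3, 0.7, 0.26, 0.74, 0.29, 0.71]

# axes: (ApoyoDem_cat state, Partido_cat state, DemAut_cat state)
cpt = np.array(prob_c20).reshape(2, 3, 2)

# Each (ApoyoDem, Partido) combination is one distribution over DemAut
print(cpt.sum(axis=-1))
# Probability of the first DemAut column for each (ApoyoDem, Partido) combination
print(cpt[..., 0])
```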


In [None]:
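# Create the nodes. Each state list must follow the order of the crosstab that produced the node's probabilities.
# prob_c20 is P(DemAut | ApoyoDem, Partido) (2 x 3 x 2 = 12 entries), so node 2 is Aut_Dem;
# prob_d20 is the 3-entry marginal of Partido, so node 3 is the root node Partido.
a_20 = BbnNode(Variable(0, 'Sat_Dem', ['No_Satisfecho', 'Satisfecho']), prob_a20)
b_20 = BbnNode(Variable(1, 'Apoyo_Dem', ['Desacuerdo', 'De_Acuerdo']), prob_b20)
c_20 = BbnNode(Variable(2, 'Aut_Dem', ['Autoritario', 'Democracia']), prob_c20)
d_20 = BbnNode(Variable(3, 'Partido', ['ALIANZA', 'MORENA', 'Nosabe']), prob_d20)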
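pybbn takes each node's probabilities as one flat list, so its length must equal the node's number of states times the number of combinations of its parents' states. A minimal pure-Python check of that rule (it does not call pybbn), using hypothetical `expected_cpt_length` and `check_cpt` helpers and the state counts of this model:

```python
from math import prod

def expected_cpt_length(n_states, parent_state_counts):
    """Entries needed in a flat CPT: n_states for every combination of parent states."""
    return n_states * prod(parent_state_counts)  # prod([]) == 1 for root nodes

def check_cpt(name, probs, n_states, parent_state_counts):
    expected = expected_cpt_length(n_states, parent_state_counts)
    if len(probs) != expected:
        raise ValueError(f'{name}: got {len(probs)} probabilities, expected {expected}')

# State counts: Sat_Dem 2, Apoyo_Dem 2, Partido 3, Aut_Dem 2 (2020 values printed above)
check_cpt('Sat_Dem', [0.6, 0.4], 2, [])                    # root
check_cpt('Apoyo_Dem', [0.38, 0.62, 0.3, 0.7], 2, [2])     # parent: Sat_Dem
check_cpt('Partido', [0.15, 0.28, 0.57], 3, [])            # root
check_cpt('Aut_Dem', [0.46, 0.54, 0.43, 0.57, 0.42, 0.58,
                      0.3, 0.7, 0.26, 0.74, 0.29, 0.71], 2, [2, 3])  # parents: Apoyo_Dem, Partido
print('all CPT lengths consistent')
```

For example, `check_cpt('Aut_Dem', [0.15, 0.28, 0.57], 2, [])` raises, because a 2-state root node needs exactly 2 entries.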


In [None]:
#Create the network structure
bbn = Bbn() \
    .add_node(a_20) \
    .add_node(b_20) \
    .add_node(c_20) \
    .add_node(d_20) \
    .add_edge(Edge(a_20, b_20, EdgeType.DIRECTED)) \
    .add_edge(Edge(b_20, c_20, EdgeType.DIRECTED)) \
    .add_edge(Edge(d_20, c_20, EdgeType.DIRECTED))

join_tree = InferenceController.apply(bbn)

In [None]:
#Print the posterior probabilities for the baseline network
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

Apoyo_Dem : Desacuerdo=0.34050, De_Acuerdo=0.65950
Sat_Dem : No_Satisfecho=0.59937, Satisfecho=0.40063
Partido : ALIANZA=0.36497, MORENA=0.28499, Nosabe=0.35004
Aut_Dem : Autoritario=0.29599, Democracia=0.70401
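Because the network has only four nodes, the marginals can also be computed by summing out the parents directly, which gives an independent check on the join-tree results. A minimal NumPy sketch, assuming the structure implied by the crosstabs (Sat_Dem → Apoyo_Dem → Aut_Dem ← Partido), the crosstabs' state order, and the rounded 2020 lists printed above; `forward_marginals` is a hypothetical helper. Passing a one-hot vector for Sat_Dem instead of its prior reproduces hard evidence on that root node:

```python
import numpy as np

# Rounded 2020 tables printed above (state order as produced by the crosstabs)
p_sat = np.array([0.6, 0.4])                                   # P(Sat_Dem)
cpt_apoyo = np.array([0.38, 0.62, 0.3, 0.7]).reshape(2, 2)     # P(Apoyo_Dem | Sat_Dem)
p_partido = np.array([0.15, 0.28, 0.57])                       # P(Partido)
cpt_autdem = np.array([0.46, 0.54, 0.43, 0.57, 0.42, 0.58,
                       0.3, 0.7, 0.26, 0.74, 0.29, 0.71]).reshape(2, 3, 2)  # P(Aut_Dem | Apoyo_Dem, Partido)

def forward_marginals(p_s):
    """Hypothetical helper: sum out the parents along Sat_Dem -> Apoyo_Dem -> Aut_Dem <- Partido."""
    p_a = p_s @ cpt_apoyo                                       # P(Apoyo_Dem)
    p_d = np.einsum('a,p,apd->d', p_a, p_partido, cpt_autdem)   # P(Aut_Dem)
    return p_a, p_d

# Baseline: Sat_Dem at its prior
p_a_base, p_d_base = forward_marginals(p_sat)
# Hard evidence on the root: Sat_Dem fixed to its second state
p_a_ev, p_d_ev = forward_marginals(np.array([0.0, 1.0]))

print(p_a_base.round(5), p_d_base.round(5))
print(p_a_ev.round(5), p_d_ev.round(5))
```

With the nodes specified consistently with these tables, exact join-tree inference should give the same marginals up to floating-point error, so this is a cheap way to catch a mis-specified node.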


In [None]:
#Insert observed evidence: pick any node and any one of its state labels
#Substitute the node name in get_bbn_node_by_name('') and the state label in with_evidence('', 1.0)

ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('Sat_Dem')) \
    .with_evidence('Satisfecho', 1.0) \
    .build()
join_tree.set_observation(ev)


In [None]:
#Print the posterior probabilities with new evidence
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

Apoyo_Dem : Desacuerdo=0.29307, De_Acuerdo=0.70693
Sat_Dem : No_Satisfecho=0.00000, Satisfecho=1.00000
Partido : ALIANZA=0.36609, MORENA=0.28364, Nosabe=0.35027
Aut_Dem : Autoritario=0.29370, Democracia=0.70630
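The same printing loop is repeated after every query. A small helper that turns the dictionary returned by `join_tree.get_posteriors()` (node name → {state: probability}, as the loop above iterates it) into a tidy DataFrame makes it easier to compare the baseline with an evidence scenario. `posteriors_to_frame` is a hypothetical name, and the example feeds it values copied from the printed 2020 outputs rather than a live join tree:

```python
import pandas as pd

def posteriors_to_frame(posteriors, scenario):
    """Flatten {node: {state: prob}} into rows of (scenario, node, state, prob)."""
    rows = [
        {'scenario': scenario, 'node': node, 'state': state, 'prob': prob}
        for node, dist in posteriors.items()
        for state, prob in dist.items()
    ]
    return pd.DataFrame(rows)

# Values copied from the printed 2020 outputs (baseline, and Sat_Dem = Satisfecho)
baseline = {'Apoyo_Dem': {'Desacuerdo': 0.34050, 'De_Acuerdo': 0.65950},
            'Aut_Dem': {'Autoritario': 0.29599, 'Democracia': 0.70401}}
evidence = {'Apoyo_Dem': {'Desacuerdo': 0.29307, 'De_Acuerdo': 0.70693},
            'Aut_Dem': {'Autoritario': 0.29370, 'Democracia': 0.70630}}

both = pd.concat([posteriors_to_frame(baseline, 'baseline'),
                  posteriors_to_frame(evidence, 'Satisfecho')], ignore_index=True)
# One column per scenario, plus the shift induced by the evidence
table = both.pivot_table(index=['node', 'state'], columns='scenario', values='prob')
table['change'] = table['Satisfecho'] - table['baseline']
print(table)
```

With a live network, `posteriors_to_frame(join_tree.get_posteriors(), 'baseline')` would take the place of the copied dictionaries.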


# Model_2018


In [None]:
# Probabilities for Model_1 2018
# Each probability table is flattened into a plain Python list, which is the format the BbnNode constructor takes
# Same variables as Model_2020; 'Aprobpres_cat' could be substituted for 'Partido_cat' in a variant of this model

prob2_a18 = pd.crosstab(df18['Satdem_cat'], 'Empty', margins = False,
            normalize='columns').sort_index().to_numpy().reshape(-1).tolist()

prob2_b18 = pd.crosstab(df18['Satdem_cat'],df18['ApoyoDem_cat'],
                 margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()


prob2_c18 = pd.crosstab([df18['ApoyoDem_cat'],df18['Partido_cat']],df18['DemAut_cat'],
                 margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()

prob2_d18 = pd.crosstab(df18['Partido_cat'], 'Empty', margins = False,
            normalize='columns').sort_index().to_numpy().reshape(-1).tolist()

In [None]:
# Round the probabilities to two decimal places
prob2_a18 = [ round(elem, 2) for elem in prob2_a18]
prob2_b18 = [ round(elem, 2) for elem in prob2_b18]
prob2_c18 = [ round(elem, 2) for elem in prob2_c18]
prob2_d18 = [ round(elem, 2) for elem in prob2_d18]

In [None]:
# Print the probability lists that will serve as the network inputs
print(prob2_a18)
print(prob2_b18)
print(prob2_c18)
print(prob2_d18)

[0.79, 0.21]
[0.39, 0.61, 0.2, 0.8]
[0.24, 0.76, 0.29, 0.71, 0.33, 0.67, 0.2, 0.8, 0.16, 0.84, 0.18, 0.82]
[0.22, 0.34, 0.44]


In [None]:
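# Create the nodes. Each state list must follow the order of the crosstab that produced the node's probabilities.
# prob2_c18 is P(DemAut | ApoyoDem, Partido) (2 x 3 x 2 = 12 entries), so node 2 is Aut_Dem;
# prob2_d18 is the 3-entry marginal of Partido, so node 3 is the root node Partido.
a2_18 = BbnNode(Variable(0, 'Sat_Dem', ['No_Satisfecho', 'Satisfecho']), prob2_a18)
b2_18 = BbnNode(Variable(1, 'Apoyo_Dem', ['Desacuerdo', 'De_Acuerdo']), prob2_b18)
c2_18 = BbnNode(Variable(2, 'Aut_Dem', ['Autoritario', 'Democracia']), prob2_c18)
d2_18 = BbnNode(Variable(3, 'Partido', ['ALIANZA', 'MORENA', 'Nosabe']), prob2_d18)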


In [None]:
# create the network structure
bbn = Bbn() \
    .add_node(a2_18) \
    .add_node(b2_18) \
    .add_node(c2_18) \
    .add_node(d2_18) \
    .add_edge(Edge(a2_18, b2_18, EdgeType.DIRECTED)) \
    .add_edge(Edge(b2_18, c2_18, EdgeType.DIRECTED)) \
    .add_edge(Edge(d2_18, c2_18, EdgeType.DIRECTED))

join_tree = InferenceController.apply(bbn)

In [None]:
#Print the posterior probabilities for the baseline network
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

Apoyo_Dem : Desacuerdo=0.34605, De_Acuerdo=0.65395
Sat_Dem : No_Satisfecho=0.78944, Satisfecho=0.21056
Partido : ALIANZA=0.36238, MORENA=0.28786, Nosabe=0.34976
Aut_Dem : Autoritario=0.30298, Democracia=0.69702


In [None]:
#Insert observed evidence: pick any node and any one of its state labels
#Substitute the node name in get_bbn_node_by_name('') and the state label in with_evidence('', 1.0)

ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('Apoyo_Dem')) \
    .with_evidence('De_Acuerdo', 1.0) \
    .build()
join_tree.set_observation(ev)


In [None]:
#Print the posterior probabilities with new evidence
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

Apoyo_Dem : Desacuerdo=0.00000, De_Acuerdo=1.00000
Sat_Dem : No_Satisfecho=0.00000, Satisfecho=1.00000
Partido : ALIANZA=0.37421, MORENA=0.26930, Nosabe=0.35649
Aut_Dem : Autoritario=0.28974, Democracia=0.71026
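Evidence on a child node propagates back to its parent through Bayes' rule: observing `Apoyo_Dem = a` gives P(Sat_Dem | a) ∝ P(Sat_Dem) · P(a | Sat_Dem). A minimal hand check of that update with the rounded 2018 lists printed above, assuming the observed state is the second column of the `ApoyoDem_cat` crosstab; it is plain NumPy, not a pybbn call:

```python
import numpy as np

# Rounded 2018 tables printed above (state order as produced by the crosstabs)
p_s = np.array([0.79, 0.21])                             # P(Sat_Dem)
cpt_a = np.array([0.39, 0.61, 0.2, 0.8]).reshape(2, 2)   # rows: Sat_Dem, cols: Apoyo_Dem

observed = 1                                   # index of the observed Apoyo_Dem state
unnormalized = p_s * cpt_a[:, observed]        # P(Sat_Dem) * P(Apoyo_Dem = observed | Sat_Dem)
posterior_s = unnormalized / unnormalized.sum()  # normalize by P(Apoyo_Dem = observed)
print(posterior_s.round(5))
```

The result is a shift in belief rather than a certainty: hard evidence on Apoyo_Dem alone can only make Sat_Dem certain if one of the entries P(a | Sat_Dem) is 0.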


------------------------------------------------------------------------------------------------------------------------------------------------------------------------