# Solution Template

Use this notebook as a guide to implement your solution. Keep in mind that some cells should remain as they are so that your code works properly, for instance the following cell, in which the required libraries are imported.

In [None]:
import pandas as pd
import numpy as np
import networkx as nx # for drawing graphs
import matplotlib.pyplot as plt # for drawing graphs
from pybbn.graph.dag import Bbn # for creating Bayesian Belief Networks (BBN)
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

Just run the next cell to load the data.

In [None]:
diabetes = pd.read_csv('diabetes-dataset.csv')
diabetes.head()

Create a new column called `Overweight` in which a person whose `BMI` is above 25 will be tagged as a one, and zero otherwise.

In [None]:
diabetes['Overweight'] =
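
As a hedged illustration of the tagging rule above, the sketch below applies it to a small made-up dataframe (`toy` and its BMI values are assumptions, not rows of the real dataset). A BMI of exactly 25 is not "above 25", so it is tagged as zero.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the rule.
toy = pd.DataFrame({'BMI': [18.5, 25.0, 31.2]})

# The comparison yields booleans; astype(int) turns True/False into 1/0.
toy['Overweight'] = (toy['BMI'] > 25).astype(int)
```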

You are to code the next function, which discretizes all the variables of the dataset except `Outcome` and `Overweight`. Remember that you will discretize with respect to the quartiles (Q1, Q2, Q3) of each variable: if a value is less than Q1, it is replaced by a **zero**; if it is greater than or equal to Q1 but less than Q2, it is replaced by a **one**; if it is greater than or equal to Q2 but less than Q3, it is replaced by a **two**; finally, if it is greater than or equal to Q3, it is replaced by a **three**.

In [None]:
def discretize(df):
    
    """
    This function receives a dataframe as input and returns a dataframe in which each variable has been 
    discretized. 
    """
    
    "INSERT YOUR CODE HERE"
    
    return discretized_df

In [None]:
discrete_df = discretize(diabetes)
discrete_df
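
One possible way to code the quartile rule for a single column is sketched below. The helper name `discretize_column` and the toy series are assumptions made for illustration, and the sketch assumes that a value equal to Q3 is coded as three. It is not the full `discretize` function, which still has to loop over the right columns.

```python
import numpy as np
import pandas as pd

def discretize_column(series):
    """Map each value of a numeric column to 0, 1, 2 or 3 by quartile."""
    # Q1, Q2 and Q3 of the column.
    edges = series.quantile([0.25, 0.5, 0.75]).to_numpy()
    # With side='right', each value gets the number of quartiles that are
    # less than or equal to it: 0 below Q1, 1 in [Q1, Q2), 2 in [Q2, Q3), 3 otherwise.
    codes = np.searchsorted(edges, series.to_numpy(), side='right')
    return pd.Series(codes, index=series.index, name=series.name)

# Hypothetical column whose quartiles are 3, 5 and 7.
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], name='Glucose')
toy_codes = discretize_column(toy)
```

If two quartiles of a column coincide (for example when many values are identical), some codes will simply never appear for that column.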

In the following cell you are to create two dictionaries. `graph` stores the topology of the Bayesian network: each key is a node, and its value is a list with the names of that node's parents. `values` stores the values that each variable of the network takes, which are the discrete values that were computed above.

In [None]:
graph = {'Overweight': [], 
         'DiabetesPedigreeFunction': [], 
         'Age': [], 
         'Pregnancies': [],
         'SkinThickness' : [], 
         'BMI': [],
         'Outcome': [],
         'BloodPressure': [],
         'Insulin': [],
         'Glucose': []}

values = {'Overweight': [], 
          'DiabetesPedigreeFunction': [], 
          'Age': [], 
          'Pregnancies': [],
          'SkinThickness' : [], 
          'BMI': [],
          'Outcome': [],
          'BloodPressure': [],
          'Insulin': [],
          'Glucose': []}
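
To make the expected shape of the two dictionaries concrete, here is a small sketch on a three-node network. The structure shown (`Age` and `BMI` as parents of `Outcome`) is a hypothetical example, not the intended topology, and `check_network` is an illustrative helper that is not part of the template.

```python
# Hypothetical topology: each key lists the names of its parents.
toy_graph = {'Age': [], 'BMI': [], 'Outcome': ['Age', 'BMI']}

# Values each variable can take after discretization.
toy_values = {'Age': [0, 1, 2, 3], 'BMI': [0, 1, 2, 3], 'Outcome': [0, 1]}

def check_network(graph, values):
    """Return a list of simple consistency problems (empty if none found)."""
    problems = []
    for node, parents in graph.items():
        # Every node needs a non-empty list of values.
        if not values.get(node):
            problems.append(f"{node}: no values listed")
        # Every parent must itself be a node of the graph.
        for parent in parents:
            if parent not in graph:
                problems.append(f"{node}: unknown parent {parent}")
    return problems

problems = check_network(toy_graph, toy_values)
```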

The next function obtains the probabilities of a given node. This function will be used later to create a dictionary in which each element contains a node and its list of probabilities.

In [None]:
def probabilities(df, node):
    
    """
    This function computes the probabilities of a given node. It receives a dataframe and the name of the node,
    and relies on the dictionaries graph and values defined above. The probabilities should be stored in a list
    and returned in probabilities_list.
    """
    
    probabilities_list = []
    
    "INSERT YOUR CODE HERE"
    
    return probabilities_list
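
A minimal sketch of how such a list could be estimated from relative frequencies follows. It makes several assumptions: the helper `cpt_as_list` takes the parents and values explicitly instead of reading the global dictionaries, the list is ordered by parent combination with the node's own values varying fastest, and a parent combination that never occurs in the data falls back to a uniform distribution. Check the ordering against what your network library expects before relying on it.

```python
from itertools import product

import pandas as pd

def cpt_as_list(df, node, parents, values):
    """Estimate P(node | parents) from counts and flatten it into a list."""
    probs = []
    # With no parents, product() yields one empty combination (the marginal).
    for combo in product(*(values[p] for p in parents)):
        subset = df
        for parent, value in zip(parents, combo):
            subset = subset[subset[parent] == value]
        for v in values[node]:
            if len(subset) == 0:
                # Assumption: unseen parent combination, use a uniform distribution.
                probs.append(1.0 / len(values[node]))
            else:
                probs.append(float((subset[node] == v).mean()))
    return probs

# Hypothetical discretized data with two binary variables.
toy = pd.DataFrame({'A': [0, 0, 1, 1], 'B': [0, 1, 1, 1]})
toy_values = {'A': [0, 1], 'B': [0, 1]}

marginal_a = cpt_as_list(toy, 'A', [], toy_values)
b_given_a = cpt_as_list(toy, 'B', ['A'], toy_values)
```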

The following function must create a dictionary in which each item is a node and its corresponding list of probabilities.

In [None]:
def tables(df):
    
    """
    This function returns a dictionary in which each element is a node and its list of probabilities. It should 
    call the above function, probabilities, which computes the probabilities of a given node. 
    """
    
    probabilities_tables = {}
    
    "INSERT YOUR CODE HERE"
        
    return probabilities_tables

Create the nodes of the network in this cell. For each line, replace `"node index"` with a unique integer index for the node and the empty list with the values that the variable takes.

In [None]:
overweight = BbnNode(Variable("node index", 'Overweight', []), overweight_probabilities)
diabetes_pedigree_function = BbnNode(Variable("node index", 'DiabetesPedigreeFunction', []), diabetes_pedigree_function_probabilities)
age = BbnNode(Variable("node index", 'Age', []), age_probabilities)
pregnancies = BbnNode(Variable("node index", 'Pregnancies', []), pregnancies_probabilities)
skin_thickness = BbnNode(Variable("node index", 'SkinThickness', []), skin_thickness_probabilities)
bmi = BbnNode(Variable("node index", 'BMI', []), bmi_probabilities)
outcome = BbnNode(Variable("node index", 'Outcome', []), outcome_probabilities)
blood_pressure = BbnNode(Variable("node index", 'BloodPressure', []), blood_pressure_probabilities)
insulin = BbnNode(Variable("node index", 'Insulin', []), insulin_probabilities)
glucose = BbnNode(Variable("node index", 'Glucose', []), glucose_probabilities)

Implement your graph in the following cell. Add as many nodes and edges as necessary. Replace the strings by the proper variables.

In [None]:
bbn = Bbn() \
    .add_node("node name") \
    .add_edge(Edge("origin node", "destination node", EdgeType.DIRECTED))

Do not forget to run this cell, and do not modify it: inferences depend on it.

In [None]:
# Convert the BBN to a join tree. Do not modify this cell.

join_tree = InferenceController.apply(bbn)

The following cell is very useful for visualizing your Bayesian network. It is strongly recommended that you make the necessary changes and run it to verify that your network was implemented correctly.

In [None]:
# Set node positions.

pos = {}

# Set options for graph looks. You might have to adjust these parameters.

options = {"font_size" : 16, "node_size" : 11000, "node_color" : "yellow", 
           "edgecolors" : "black", "edge_color" : "red", "linewidths" : 5, 
           "width": 5}
    
# Generate graph.

n, d = bbn.to_nx_graph()
nx.draw(n, with_labels=True, labels=d, pos=pos, **options)

# Update margins and print the graph.

ax = plt.gca()
ax.margins(0.3)
plt.axis("off")
plt.show()
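
The `pos` dictionary maps each node of the drawn graph to an `(x, y)` coordinate. One possible way to generate a layered layout from a parents dictionary is sketched below. The helper `layered_positions` and the toy topology are assumptions for illustration, and the sketch requires an acyclic graph. The result is keyed by node name; if the networkx graph returned by `to_nx_graph` uses node indices as keys (which is an assumption to verify by printing `n.nodes`), the keys need to be translated accordingly.

```python
def layered_positions(parents):
    """Place roots on the top row and each node one row below its deepest parent."""
    depth = {}

    def node_depth(node):
        # Depth is 0 for roots, otherwise 1 + the largest parent depth.
        if node not in depth:
            depth[node] = 1 + max((node_depth(p) for p in parents[node]), default=-1)
        return depth[node]

    for node in parents:
        node_depth(node)

    # Group nodes by row.
    layers = {}
    for node, d in depth.items():
        layers.setdefault(d, []).append(node)

    # Center each row horizontally around x = 0; lower rows get smaller y.
    pos = {}
    for d, members in layers.items():
        for i, node in enumerate(members):
            pos[node] = (i - (len(members) - 1) / 2, -d)
    return pos

# Hypothetical topology, not the intended answer.
toy_pos = layered_positions({'Age': [], 'BMI': [], 'Outcome': ['Age', 'BMI']})
```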

The goal of `print_probs` is to print out the probability distributions of all the nodes of the network. You can modify this code to print only the distributions of certain nodes if you find that helpful.

In [None]:
# Define a function for printing marginal probabilities.

def print_probs():
    for node in join_tree.get_bbn_nodes():
        potential = join_tree.get_bbn_potential(node)
        print("Node:", node)
        print("Values:")
        print(potential)
        print('----------------')
    
# Use the above function to print marginal probabilities.

print_probs()

The function `evidence` helps you create evidence that will be used for making inferences. Please do not modify this cell.

In [None]:
# To add evidence of events that happened so probability distribution can be recalculated.

def evidence(ev, nod, val, like):
    ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name(nod)) \
    .with_evidence(val, like) \
    .build()
    join_tree.set_observation(ev)

Now you are ready to add evidence and print out the new distributions of your network. 

In [None]:
# Use above function to add evidence.

evidence('ev1', 'node name', 'value', 1)

# Print marginal probabilities.

print_probs()
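
To see what adding evidence does to the distributions, it can help to work a tiny case by hand. The sketch below applies Bayes' rule to a hypothetical two-node network in which `a` is the parent of `b`; all numbers are made up for illustration. It is plain arithmetic, not a call into the inference library.

```python
# Hypothetical prior of the parent node a.
prior_a = {'on': 0.2, 'off': 0.8}

# Hypothetical conditional probabilities P(b = on | a).
p_b_on_given_a = {'on': 0.9, 'off': 0.3}

# Joint probability of each value of a together with the evidence b = on.
joint = {a: prior_a[a] * p_b_on_given_a[a] for a in prior_a}

# Probability of the evidence itself, P(b = on).
evidence_prob = sum(joint.values())

# Updated distribution of a after observing b = on.
posterior_a = {a: joint[a] / evidence_prob for a in joint}
```

Here observing `b = on` raises the probability of `a = on` from 0.2 to about 0.43, which is the kind of shift to look for in the printed distributions after adding evidence.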

If you need to reset the Bayesian network, rerun this line of code or rerun the above cell twice.

In [None]:
join_tree = InferenceController.apply(bbn)