# 2019-01-04 Exam Simulation

### General Instructions:

The following set of exercises is meant to give you a glimpse of what you will be asked to solve during an _actual_ examination session of **Python Programming (for Data Science)**. More specifically, each exercise below asks you either to answer some questions or to implement a certain function (you may also need to implement your own "helper" functions, if doing so makes the overall task simpler to achieve). 
In addition, each exercise is worth a certain number of points, which you will earn **if and only if** the answer you provide is correct or the implementation you come up with successfully passes **all** the tests (both those that are visible to you and those that are hidden).<br />

To write down your implementation, make sure to fill in any place that says <code style="color:green">**_# YOUR CODE HERE_**</code>. Note also that you should **either comment out or delete** any <code style="color:green">**raise NotImplementedError()**</code> statement.<br />
Once you are done, save this notebook and rename it as follows:

<code>**YOURUSERNAME_2019-01-04.ipynb**</code>

where <code>**YOURUSERNAME**</code> is your actual username. To be consistent, we expect your username to be composed of your first name's initial, followed by your full last name. As an example, in my case this notebook must be saved as <code>**gtolomei_2019-01-04.ipynb**</code> (remember to insert an underscore <code>**'_'**</code> between your username and the date).<br />

Finally, go back to the [Moodle page](https://elearning.unipd.it/math/mod/assign/view.php?id=13250) and check for the "**2019-01-04 Exam Simulation**" item; there, you will be able to upload your notebook file for grading.
**NOTE:** As this is just a simulation, the grade you will get from it is "virtual", i.e., it will not affect your final mark; in fact, you may also opt not to submit your notebook for grading, although in that case you will not get meaningful feedback from this simulation.

Note that there is no limit on the number of submissions; however, be careful when you upload a new version of this notebook because each submission overwrites the previous one. 
After the due date indicated above, the latest uploaded notebook will be considered as the one to be graded.

The archive you have downloaded (<code style="color:magenta">**2019-01-04-exam-simulation.tar**</code>) is organized according to the following directory structure:

<code style="color:red">**2019-01-04-exam-simulation**</code> (root)<br />
|----<code style="color:green">**2019-01-04.ipynb**</code> (_this_ notebook)<br />
|----<code>**dataset.csv**</code><br />
|----<code>**README.txt**</code>

**First Name** = Your _first name_ here

**Last Name** = Your _last name_ here

In [None]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# The following line lets Jupyter Notebook display plots
# produced by matplotlib directly below the code cell that generated them.
%matplotlib inline
import seaborn as sns
from nose.tools import assert_equal
from operator import itemgetter

EPSILON = .000001 # tiny tolerance for managing subtle differences resulting from floating point operations

DATASET_FILE = "dataset.csv"

# Part 1: General Coding (16 points)

For **Part 1**, you will be asked to use the dictionary below - called <code>**friends**</code> - which represents a small social network graph extracted from **10** characters of the _Friends_ TV series. Each entry of this dictionary contains the name of the character as key and a list of tuples as value; each tuple, in turn, is composed of two items: the name of a character and a float in the range **(0, 1]** which measures the strength of this relationship.<br /> 
Please, execute the cell right below and answer the following questions.

In [None]:
friends = {
    'Rachel Green': [('Ross Geller', 1.0), 
                     ('Paul Stevens', 0.46),
                     ('Chandler Bing', 0.73),
                     ('Monica Geller', 0.92),
                     ('Phoebe Buffay', 0.94),
                     ('Joey Tribbiani', 0.97),
                     ('Gunther', 0.01)
                    ],
    'Monica Geller': [('Ross Geller', 0.85),
                      ('Phoebe Buffay', 0.89),
                      ('Rachel Green', 0.89),
                      ('Chandler Bing', 1.0),
                      ('Joey Tribbiani', 0.90),
                      ('Janice Hosenstein', 0.07),
                      ('Gunther', 0.51),
                      ('Ursula Buffay', 0.42),
                      ('Paul Stevens', 0.27)
                     ],
    'Phoebe Buffay': [('Ursula Buffay', 0.01),
                      ('Joey Tribbiani', 0.96),
                      ('Monica Geller', 0.81),
                      ('Rachel Green', 0.88),
                      ('Chandler Bing', 0.64),
                      ('Ross Geller', 0.83),
                      ('Gunther', 0.25)
                     ],
    'Joey Tribbiani': [('Phoebe Buffay', 0.98),
                       ('Chandler Bing', 1.0),
                       ('Ross Geller', 0.99),
                       ('Rachel Green', 0.97),
                       ('Monica Geller', 0.95)
                      ],
    'Chandler Bing': [('Monica Geller', 1.0),
                      ('Joey Tribbiani', 1.0),
                      ('Ross Geller', 0.97),
                      ('Phoebe Buffay', 0.81),
                      ('Rachel Green', 0.68),
                      ('Gunther', 0.37),
                      ('Janice Hosenstein', 0.16)
                     ],
    'Ross Geller': [('Monica Geller', 0.95),
                    ('Joey Tribbiani', 0.96),
                    ('Chandler Bing', 0.96),
                    ('Phoebe Buffay', 0.91),
                    ('Rachel Green', 1.0),
                    ('Paul Stevens', 0.04)],
    'Gunther': [('Rachel Green', 1.0)],
    'Ursula Buffay': [],
    'Janice Hosenstein': [('Chandler Bing', 1.0),
                          ('Monica Geller', 0.01)
                         ],
    'Paul Stevens': [('Rachel Green', 0.88),
                     ('Ross Geller', 0.29),
                     ('Monica Geller', 0.57)
                    ]
}

## Exercise 1.1 (2 points)

Implement the function <code>**is_friend_of**</code> which takes as input two strings representing two characters, i.e., <code>**u**</code> and <code>**v**</code>, and returns <code>**True**</code> iff <code>**u**</code> is friend of <code>**v**</code>, <code>**False**</code> otherwise.

(**NOTE:** The friendship relation is not symmetric, and therefore <code>**u**</code> being a friend of <code>**v**</code> **does not** necessarily imply that <code>**v**</code> is a friend of <code>**u**</code>.)
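
To make the asymmetry concrete, here is a minimal sketch on a hypothetical toy graph that has the same shape as <code>**friends**</code>. The names `toy_friends` and `lists_as_friend` are illustrative assumptions and are not part of the exam material.

```python
# Hypothetical toy graph: each key maps to a list of (name, score) tuples
toy_friends = {
    'A': [('B', 0.9), ('C', 0.2)],
    'B': [('C', 0.5)],
    'C': [],
}

def lists_as_friend(graph, u, v):
    """Return True iff v appears in u's friend list (a directed edge u -> v)."""
    return any(name == v for name, _ in graph.get(u, []))

print(lists_as_friend(toy_friends, 'A', 'B'))  # True: A lists B
print(lists_as_friend(toy_friends, 'B', 'A'))  # False: B does not list A
```

Since the edge from `'A'` to `'B'` exists while the reverse one does not, the order of the two arguments matters.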

In [None]:
def is_friend_of(u, v):
    """
    Return True iff u is friend of v, False otherwise.
    """
    ### BEGIN SOLUTION
    if u in friends:
        friends_u = {f_u[0] for f_u in friends[u]}
        return v in friends_u
    return False
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `is_friend_of` function
"""

# Tests
assert_equal(True, is_friend_of('Rachel Green', 'Ross Geller'))
assert_equal(False, is_friend_of('Paul Stevens', 'Joey Tribbiani'))
assert_equal(True, is_friend_of('Chandler Bing', 'Gunther'))
assert_equal(False, is_friend_of('Gunther', 'Chandler Bing'))
### BEGIN HIDDEN TESTS
assert_equal(True, is_friend_of('Joey Tribbiani', 'Phoebe Buffay'))
assert_equal(True, is_friend_of('Paul Stevens', 'Rachel Green'))
assert_equal(False, is_friend_of('Gunther', 'Janice Hosenstein'))
assert_equal(False, is_friend_of('Joey Tribbiani', 'Ursula Buffay'))
### END HIDDEN TESTS

## Exercise 1.2 (2 points)

Implement the function <code>**n_input_friends_of**</code> which takes as input a character <code>**u**</code> and returns the number of characters <code>**v**</code> that have <code>**u**</code> as one of their friends. In other words, this function computes the _indegree_ of the input node of the social graph.

(**HINT:** You can take advantage of the <code>**is_friend_of**</code> function implemented above...)
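
As an illustration of what _indegree_ means here, the sketch below counts incoming edges on a hypothetical toy graph. Both `toy_friends` and `indegree` are assumed names used only for this example.

```python
# Hypothetical toy graph: each key maps to a list of (name, score) tuples
toy_friends = {
    'A': [('B', 0.9), ('C', 0.2)],
    'B': [('C', 0.5)],
    'C': [],
}

def indegree(graph, u):
    """Count the nodes (other than u) whose friend list contains u."""
    return sum(
        1
        for v, edges in graph.items()
        if v != u and any(name == u for name, _ in edges)
    )

print(indegree(toy_friends, 'C'))  # 2: both A and B list C
print(indegree(toy_friends, 'A'))  # 0: nobody lists A
```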

In [None]:
def n_input_friends_of(u):
    """
    Return the number of characters that have u as one of their friends.
    """
    ### BEGIN SOLUTION
    n = 0
    for f in friends:
        if f != u:
            if is_friend_of(f, u):
                n += 1
    return n
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `n_input_friends_of` function
"""

# Tests
assert_equal(7, n_input_friends_of('Rachel Green'))
assert_equal(2, n_input_friends_of('Janice Hosenstein'))
assert_equal(5, n_input_friends_of('Joey Tribbiani'))
### BEGIN HIDDEN TESTS
assert_equal(2, n_input_friends_of('Ursula Buffay'))
assert_equal(3, n_input_friends_of('Paul Stevens'))
assert_equal(4, n_input_friends_of('Gunther'))
### END HIDDEN TESTS

## Exercise 1.3 (3 points)

Implement the function <code>**best_friend_of**</code> which takes as input a string representing a character <code>**u**</code>, and returns the name of its best friend. The best friend of any character <code>**u**</code> is the character <code>**v**</code> in <code>**u**</code>'s friend list having the **highest friendship score**. If more than one best friend is found, the function should return the first one in alphabetical order. Finally, the function should return <code>**None**</code> if <code>**u**</code> has no (best) friends at all.

(**HINT:** You can use <code>**itemgetter**</code> to sort a list of tuples according to a specific item within each tuple. For example: <code>**sorted(list_of_tuples, key=itemgetter(1))**</code> sorts <code>**list_of_tuples**</code> by the **second** element of each tuple.)
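
One possible way to combine two sorting criteria is to rely on the fact that Python's sort is stable, even with `reverse=True`. The sketch below uses made-up data (`pairs`) and shows a two-pass sort next to an equivalent single-pass one.

```python
from operator import itemgetter

# Made-up (name, score) pairs; 'Zoe' and 'Adam' tie on the score
pairs = [('Zoe', 0.9), ('Adam', 0.9), ('Mia', 0.4)]

# Pass 1: secondary criterion (name, ascending)
by_name = sorted(pairs, key=itemgetter(0))
# Pass 2: primary criterion (score, descending); ties keep the order of pass 1
by_score_desc = sorted(by_name, key=itemgetter(1), reverse=True)

# Equivalent single pass with a composite key
single_pass = sorted(pairs, key=lambda p: (-p[1], p[0]))

print(by_score_desc)  # [('Adam', 0.9), ('Zoe', 0.9), ('Mia', 0.4)]
```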

In [None]:
def best_friend_of(u):
    """
    Return the name of u's best friend, or None if this doesn't exist.
    """
    ### BEGIN SOLUTION
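    # `.get` also covers characters that are not in the graph at all
    if friends.get(u):
        # Sort by name first, then (stably) by descending score, so that
        # ties on the score are resolved in alphabetical order
        friends_u = sorted(friends[u], key=itemgetter(0))
        return sorted(friends_u, key=itemgetter(1), reverse=True)[0][0]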
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `best_friend_of` function
"""

# Tests
assert_equal('Ross Geller', best_friend_of('Rachel Green'))
assert_equal('Joey Tribbiani', best_friend_of('Chandler Bing'))
assert_equal(None, best_friend_of('Ursula Buffay'))
### BEGIN HIDDEN TESTS
assert_equal('Rachel Green', best_friend_of('Paul Stevens'))
assert_equal('Chandler Bing', best_friend_of('Janice Hosenstein'))
assert_equal('Joey Tribbiani', best_friend_of('Phoebe Buffay'))
assert_equal('Rachel Green', best_friend_of('Ross Geller'))
### END HIDDEN TESTS

## Exercise 1.4 (4 points)

Implement the function <code>**friendship_stats**</code> which returns a custom data structure, i.e., a dictionary, where each key is a character and each value is a tuple containing the <code>**n_friends**</code>, <code>**min**</code>, <code>**max**</code>, <code>**avg**</code>, and <code>**median**</code> friendship scores (in this exact order), computed over the scores in that character's own friend list. For any character who has no friends, the corresponding value in this dictionary will be the default tuple <code>(0, -1, -1, -1, -1)</code>.<br />
(Note that <code>**n_friends**</code> corresponds to the _outdegree_ of the node representing each character).
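
The shape of each value can be sketched for a single list of scores as follows. The helper name `summarize` and the sample scores are assumptions made for illustration only.

```python
import numpy as np

def summarize(scores):
    """Return (n, min, max, avg, median) for a list of scores,
    or the default tuple when the list is empty."""
    if not scores:
        return (0, -1, -1, -1, -1)
    return (len(scores),
            float(np.min(scores)),
            float(np.max(scores)),
            float(np.mean(scores)),
            float(np.median(scores)))

print(summarize([0.2, 0.8, 0.5]))
print(summarize([]))  # (0, -1, -1, -1, -1)
```

Note that the empty case has to be handled explicitly, because NumPy reductions such as `np.min` raise an error on an empty sequence.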

In [None]:
def friendship_stats():
    """
    Return a dictionary where each key is a character and each value is a tuple containing 
    the number of (outgoing) friends, min, max, avg, and median values of friendship score for that character.
    """
    friendship_stats = {} # This is the variable that you shall return
    ### BEGIN SOLUTION
    for f in friends:
        friendship_stats[f] = [u[1] for u in friends[f]]
    
    for f in friendship_stats:
        n_friends_f = len(friendship_stats[f])
        if n_friends_f > 0:
            stats = (n_friends_f,
                     np.min(friendship_stats[f]), 
                     np.max(friendship_stats[f]), 
                     np.mean(friendship_stats[f]), 
                     np.median(friendship_stats[f])
                    )
        else:
            stats = (n_friends_f, -1, -1, -1, -1)
        friendship_stats[f] = stats
    ### END SOLUTION
    return friendship_stats

In [None]:
"""
Test the correctness of the implementation of the `friendship_stats` function
"""

# Call the `friendship_stats` function
stats = friendship_stats()

# Tests
# number of (outgoing) friends of 'Ross Geller'
assert_equal(6, stats['Ross Geller'][0]) 
# min friendship score of (outgoing) friends of 'Rachel Green'
assert_equal(True, np.abs(0.01 - stats['Rachel Green'][1]) < EPSILON)
# max friendship score of (outgoing) friends of 'Ursula Buffay'
assert_equal(-1, stats['Ursula Buffay'][2])
# avg friendship score of (outgoing) friends of 'Chandler Bing'
assert_equal(True, np.abs(0.71285714285714286 - stats['Chandler Bing'][3]) < EPSILON)
# median friendship score of (outgoing) friends of 'Joey Tribbiani'
assert_equal(True, np.abs(0.97999999999999998 - stats['Joey Tribbiani'][4]) < EPSILON)
### BEGIN HIDDEN TESTS
# number of (outgoing) friends of 'Paul Stevens'
assert_equal(3, stats['Paul Stevens'][0]) 
# min friendship score of (outgoing) friends of 'Monica Geller'
assert_equal(True, np.abs(0.070000000000000007 - stats['Monica Geller'][1]) < EPSILON)
# max friendship score of (outgoing) friends of 'Phoebe Buffay'
assert_equal(True, np.abs(0.95999999999999996 - stats['Phoebe Buffay'][2]) < EPSILON)
# avg friendship score of (outgoing) friends of 'Gunther'
assert_equal(1.0, stats['Gunther'][3])
# median friendship score of (outgoing) friends of 'Janice Hosenstein'
assert_equal(True, np.abs(0.505 - stats['Janice Hosenstein'][4]) < EPSILON)
# check return type
assert_equal(dict, type(stats))
### END HIDDEN TESTS

## Exercise 1.5 (5 points)

Implement the function <code>**most_likely_friendship_chain**</code>, which takes as input **two lists of strings** representing two _paths_ on the social network graph above, i.e., <code>**path_i**</code> and <code>**path_j**</code>. For example, <code>**path_i = ['Ross Geller', 'Chandler Bing', 'Janice Hosenstein']**</code> and <code>**path_j = ['Monica Geller', 'Phoebe Buffay']**</code>.<br />
Instead of using the original dictionary <code>**friends**</code>, as you did in the exercises above, here you will use another data structure called <code>**markovian_friends**</code>, created in the cell right below. This new data structure is a dictionary of dictionaries (rather than a dictionary of lists of tuples). Each key of <code>**markovian_friends**</code> is still the name of a _Friends_ character <code>**n**</code>, and the value associated with <code>**n**</code> is a nested dictionary whose keys are <code>**n**</code>'s friends and whose values represent the _probability_ of <code>**n**</code> reaching each of her/his friends, i.e., the friendship score divided by the sum of all of <code>**n**</code>'s scores.

The goal of the <code>**most_likely_friendship_chain**</code> function is to return the more likely of <code>**path_i**</code> and <code>**path_j**</code>, along with its associated probability. <br />
You can take the likelihood of a path to be the **product of the individual probabilities** of moving from each character to the next one along the path (_Markov property_):
$$
\texttt{likelihood(['Ross Geller', 'Chandler Bing', 'Janice Hosenstein'])} = \\
P(\texttt{'Ross Geller'}, \texttt{'Chandler Bing'}) * P(\texttt{'Chandler Bing'}, \texttt{'Janice Hosenstein'})
$$

Note that if a path contains two **consecutive** characters that are not friends with each other, its overall likelihood falls back to <code>**0**</code>. Plus, if the likelihood of **both** <code>**path_i**</code> and <code>**path_j**</code> is <code>**0**</code>, the function should return <code>**None**</code>. Finally, if the two input paths have exactly the same likelihood, yet different from <code>**0**</code>, you can return either one or the other.

(**SUGGESTION:** Implement a helper function, e.g., <code>**path_likelihood**</code>, which measures the overall likelihood of a **single** path; then, delegate to this helper function inside the body of <code>**most_likely_friendship_chain**</code>, and return a tuple containing the most likely path along with the computed likelihood, or <code>**None**</code> if both paths have 0-likelihood).

In [None]:
import json

markovian_friends = {} # This is the data structure you will be using for solving this exercise

for f in friends:
    tot_friendship_score_f = np.sum([u[1] for u in friends[f]])
    markovian_friends[f] = dict([(u[0], u[1]/tot_friendship_score_f) for u in friends[f]])
    
print(json.dumps(markovian_friends, indent=4, sort_keys=True))
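
As a worked example of the normalization and of the product rule, the sketch below uses a hypothetical three-node graph. The names `toy_scores`, `toy_markov`, and `chain_probability` are assumptions for illustration; the loop is an iterative take on the same idea, not the reference solution.

```python
# Hypothetical toy graph with raw (unnormalized) scores
toy_scores = {
    'A': [('B', 3.0), ('C', 1.0)],
    'B': [('C', 2.0)],
    'C': [],
}

# Normalize each node's outgoing scores so that they sum to 1
toy_markov = {}
for node, edges in toy_scores.items():
    total = sum(score for _, score in edges)
    toy_markov[node] = {name: score / total for name, score in edges}

def chain_probability(graph, path):
    """Multiply the transition probabilities of consecutive pairs in path.
    A missing edge contributes 0, which zeroes the whole product."""
    prob = 1.0
    for src, dst in zip(path, path[1:]):
        prob *= graph.get(src, {}).get(dst, 0.0)
    return prob

print(chain_probability(toy_markov, ['A', 'B', 'C']))  # 0.75 = 0.75 * 1.0
print(chain_probability(toy_markov, ['A', 'C']))       # 0.25
print(chain_probability(toy_markov, ['C', 'A']))       # 0.0 (no such edge)
```

A node with no outgoing edges, such as `'C'`, yields an empty inner dictionary, so no division by zero takes place.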

In [None]:
### SUGGESTION: Implement the function `path_likelihood` below, which will return the likelihood of a single path
### If you don't want to use this approach and would rather implement
### the function `most_likely_friendship_chain` directly, please remember to delete this helper function first!
def path_likelihood(path):
    ### BEGIN SOLUTION
    if len(path) == 1:
        return 1
    else:
        source = path[0]
        # `.get` guards against a source character that is not in the graph
        if path[1] in markovian_friends.get(source, {}):
            return markovian_friends[source][path[1]] * path_likelihood(path[1:])
        else:
            return 0
    ### END SOLUTION
        
def most_likely_friendship_chain(path_i, path_j):
    """
    Return a tuple (path, likelihood) where path is the most likely path between path_i and path_j 
    and likelihood is the actual likelihood value of the most likely path.
    Return None if both path_i and path_j represent 0-likelihood paths.
    """
    ### BEGIN SOLUTION
    # Compute each likelihood once instead of on every comparison
    likelihood_i = path_likelihood(path_i)
    likelihood_j = path_likelihood(path_j)
    if likelihood_i != 0 or likelihood_j != 0:
        if likelihood_i >= likelihood_j:
            return (path_i, likelihood_i)
        else:
            return (path_j, likelihood_j)
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `most_likely_friendship_chain` function
"""

# Tests
assert_equal(['Monica Geller', 'Chandler Bing', 'Janice Hosenstein'], 
            most_likely_friendship_chain(['Joey Tribbiani', 'Ross Geller', 'Paul Stevens'], 
                                         ['Monica Geller', 'Chandler Bing', 'Janice Hosenstein'])[0])

assert_equal(True, np.abs(0.0055282979752608674 - 
            most_likely_friendship_chain(['Joey Tribbiani', 'Ross Geller', 'Paul Stevens'], 
                                         ['Monica Geller', 'Chandler Bing', 'Janice Hosenstein'])[1]) < EPSILON)

assert_equal(['Rachel Green', 'Ross Geller', 'Chandler Bing', 'Janice Hosenstein'], 
            most_likely_friendship_chain(['Ursula Buffay', 'Rachel Green'], 
                                         ['Rachel Green', 'Ross Geller', 'Chandler Bing', 'Janice Hosenstein'])[0])

assert_equal(True, np.abs(0.00126962553007 - 
            most_likely_friendship_chain(['Ursula Buffay', 'Rachel Green'], 
                                         ['Rachel Green', 'Ross Geller', 'Chandler Bing', 'Janice Hosenstein'])[1]) < EPSILON)

assert_equal(['Gunther'], 
            most_likely_friendship_chain(['Gunther'], 
                                         ['Rachel Green', 'Chandler Bing'])[0])

assert_equal(1, 
            most_likely_friendship_chain(['Gunther'], 
                                         ['Rachel Green', 'Chandler Bing'])[1])

assert_equal(None, 
            most_likely_friendship_chain(['Joey Tribbiani', 'Phoebe Buffay', 'Paul Stevens'], 
                                         ['Rachel Green', 'Janice Hosenstein']))
### BEGIN HIDDEN TESTS
assert_equal(['Rachel Green', 'Ross Geller', 'Monica Geller', 'Chandler Bing'], 
            most_likely_friendship_chain(['Rachel Green', 'Ross Geller', 'Monica Geller', 'Chandler Bing'], 
                                         ['Phoebe Buffay', 'Ursula Buffay'])[0])

assert_equal(True, np.abs(0.0067558591788800752 - 
                          most_likely_friendship_chain(['Rachel Green', 'Ross Geller', 'Monica Geller', 'Chandler Bing'], 
                                         ['Phoebe Buffay', 'Ursula Buffay'])[1]) < EPSILON)

assert_equal(None, 
            most_likely_friendship_chain(['Joey Tribbiani', 'Paul Stevens'], 
                                         ['Gunther', 'Janice Hosenstein', 'Chandler Bing']))

assert_equal(1, 
            most_likely_friendship_chain(['Gunther', 'Rachel Green'], 
                                         ['Ursula Buffay'])[1])
### END HIDDEN TESTS

# Part 2: Data Science (16 points)

In this part, you will be working with the dataset file <code>**dataset.csv**</code>. For a complete description of this data source, please refer to the <code>**README.txt**</code>.
In a nutshell, this dataset has a total of **303 instances** about cardiological patients.
Each patient (i.e., row in the file) is described by **13** features (i.e., columns) and labeled with an integer value (i.e., the 14th and last column called <code>**disease**</code>), ranging from **0** (meaning that the patient has no disease) to **4** (high-disease).<br />
The cell below is responsible for correctly loading the dataset from the <code>**dataset.csv**</code> file. Once this is executed, you can start answering the questions below.

In [None]:
# Load the dataset stored at `DATASET_FILE` using "," as field separator and '?' to detect NAs
columns = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 
           'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'disease']
data = pd.read_csv(DATASET_FILE, 
                   sep=",", 
                   header=None,
                   na_values='?',
                   names=columns)
data.head()
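
To see what the `na_values='?'` argument does, here is a small sketch that parses an in-memory CSV instead of <code>**dataset.csv**</code>. The three rows and the reduced column list are made up for this example.

```python
import io
import pandas as pd

# Made-up rows; the '?' marks a missing blood pressure value
raw = "63,1,145\n67,0,?\n41,0,130\n"

toy = pd.read_csv(io.StringIO(raw),
                  sep=",",
                  header=None,
                  na_values='?',
                  names=['age', 'sex', 'trestbps'])

print(toy)
print(toy['trestbps'].isnull().sum())  # 1 missing value
```

Because the `'?'` is turned into `NaN`, the `trestbps` column is parsed as floating point rather than as strings.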

## Exercise 2.1 (2 points)

Implement the function <code>**idx_of_female_instances**</code> below, which takes as input a <code>**pandas.DataFrame**</code> object and returns the list of index values which correspond to instances (i.e., rows) representing **female** patients.
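
The general pattern of selecting rows with a boolean mask and then reading off their index can be sketched on a made-up frame (`toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Made-up frame; sex uses the same 0/1 encoding assumed for the dataset
toy = pd.DataFrame({'sex': [1, 0, 0, 1], 'age': [63, 41, 56, 57]})

mask = toy['sex'] == 0               # boolean Series, one entry per row
selected_idx = toy[mask].index.tolist()

print(selected_idx)  # [1, 2]
```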

In [None]:
def idx_of_female_instances(data):
    """
    Return the list of index values which correspond to instances (i.e., rows) 
    representing female patients.
    """
    ### BEGIN SOLUTION
    # `.tolist()` returns a plain Python list, as stated in the docstring
    return data[data.sex == 0].index.tolist()
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `idx_of_female_instances` function
"""

assert_equal(4, idx_of_female_instances(data)[0])
assert_equal(7, idx_of_female_instances(data)[2])
assert_equal(21, idx_of_female_instances(data)[5])
### BEGIN HIDDEN TESTS
assert_equal(97, len(idx_of_female_instances(data)))
assert_equal(301, idx_of_female_instances(data)[-1])
assert_equal(294, idx_of_female_instances(data)[-3])
assert_equal(231, idx_of_female_instances(data)[73])
### END HIDDEN TESTS

## Exercise 2.2 (2 points)

Implement the function <code>**max_cholesterol**</code> below. This takes as input a <code>**pandas.DataFrame**</code> object and a categorical value <code>**level_of_disease**</code> (i.e., **0**, **1**, **2**, **3** or **4**, the possible levels of disease), and returns the **maximum** value of cholesterol among patients having that level of disease.
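
As an illustration on made-up data, the maximum within one category can be obtained either by filtering first or by grouping. The frame `toy` and its values are assumptions for this sketch.

```python
import pandas as pd

# Made-up frame with two disease levels
toy = pd.DataFrame({'disease': [0, 0, 1, 1],
                    'chol': [200, 250, 300, 180]})

# Filter on one level, then reduce
max_level_1 = toy[toy['disease'] == 1]['chol'].max()

# Or compute the maximum for every level at once
max_per_level = toy.groupby('disease')['chol'].max().to_dict()

print(max_level_1)    # 300
print(max_per_level)  # {0: 250, 1: 300}
```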

In [None]:
def max_cholesterol(data, level_of_disease):
    """
    Return the maximum value of cholesterol for patients at a specific level of disease.
    """
    ### BEGIN SOLUTION
    return data[data.disease == level_of_disease].chol.max()
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `max_cholesterol` function
"""

assert_equal(564, max_cholesterol(data, 0)) # max cholesterol for patients with no disease
assert_equal(407, max_cholesterol(data, 4)) # max cholesterol for patients with maximum level of disease
### BEGIN HIDDEN TESTS
assert_equal(335, max_cholesterol(data, 1)) # max cholesterol for patients with level-1 disease
assert_equal(409, max_cholesterol(data, 2)) # max cholesterol for patients with level-2 disease
assert_equal(353, max_cholesterol(data, 3)) # max cholesterol for patients with level-3 disease
### END HIDDEN TESTS

## Exercise 2.3 (3 points)

Implement the function <code>**trestbps_stats**</code> below. This takes as input a <code>**pandas.DataFrame**</code> object and returns a tuple containing the min, max, avg, and median value of <code>**trestbps**</code>, yet computed on a _slice_ of the input <code>**pandas.DataFrame**</code>.<br />
The sliced dataset represents the subpopulation containing **male** patients whose age is **between 47 and 52 years old** (extremes included) and whose value of cholesterol (<code>**chol**</code>) is above the _overall_ average.
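
Combining several conditions on a <code>**pandas.DataFrame**</code> can be sketched as below, on made-up data. Each comparison needs its own parentheses because `&` binds more tightly than `==` or `>`; `Series.between` includes both endpoints by default.

```python
import pandas as pd

# Made-up frame; the overall mean of chol is 250
toy = pd.DataFrame({'sex':  [1, 1, 0, 1],
                    'age':  [47, 52, 50, 60],
                    'chol': [300, 100, 300, 300]})

mask = ((toy['sex'] == 1)
        & toy['age'].between(47, 52)             # 47 <= age <= 52
        & (toy['chol'] > toy['chol'].mean()))    # mean over ALL rows

print(toy[mask].index.tolist())  # [0]
```

Note that the mean is computed on the whole frame, not on the already filtered rows.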

In [None]:
def trestbps_stats(data):
    """
    Return a tuple containing the min, max, avg, and median value of `trestbps` feature,
    yet limited to a slice of the input DataFrame (data). 
    In particular, this slice will contain instances referring to male patients (sex = 1)
    whose age is between 47 and 52 (extremes included) and having a value of cholesterol above the overall average.
    """
    ### BEGIN SOLUTION
    sliced_data = data[(data.sex == 1) & (data.age >= 47) & (data.age <= 52) & (data.chol > data.chol.mean())]
    #sliced_data = data[(data.sex == 1) & (data.age >= 47) & (data.age <= 52) & 
                        #(data.chol.map(lambda x: True if x > np.mean(data.chol) else False))]
    return (sliced_data.trestbps.min(), 
            sliced_data.trestbps.max(), 
            sliced_data.trestbps.mean(), 
            sliced_data.trestbps.median())
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `trestbps_stats` function
"""

# Call the `trestbps_stats` function
stats = trestbps_stats(data)

assert_equal(110, stats[0]) # assess minimum trestbps
assert_equal(152, stats[1]) # assess maximum trestbps
### BEGIN HIDDEN TESTS
assert_equal(True, np.abs(131.23076923076923 - stats[2]) < EPSILON) # assess average trestbps
assert_equal(130, stats[3]) # assess median trestbps
assert_equal(4, len(stats))
assert_equal(tuple, type(stats))
### END HIDDEN TESTS

## Exercise 2.4 (4 points)

Implement the function <code>**thalach_stats**</code> below. This takes as input a <code>**pandas.DataFrame**</code> object and returns a tuple containing the min, max, avg, and median value of <code>**thalach**</code>, yet computed on a _slice_ of the input <code>**pandas.DataFrame**</code>.<br />
The sliced dataset represents the subpopulation containing **female** patients whose age **is not yet 50 years old** **or** (female patients) having blood pressure (<code>**trestbps**</code>) below the median computed across female patients aged **25 or more**.
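
When `&` and `|` are mixed, explicit parentheses make the intended grouping visible. The sketch below uses made-up data and computes the reference median on its own subpopulation before building the mask; all names and values are assumptions for illustration.

```python
import pandas as pd

# Made-up frame
toy = pd.DataFrame({'sex':      [0, 0, 0, 1],
                    'age':      [45, 60, 70, 40],
                    'trestbps': [150, 110, 140, 100]})

female = toy['sex'] == 0

# Reference value: median over female rows aged 25 or more (150, 110, 140)
ref_median = toy.loc[female & (toy['age'] >= 25), 'trestbps'].median()

# female AND (young OR low blood pressure)
mask = female & ((toy['age'] < 50) | (toy['trestbps'] < ref_median))

print(ref_median)                # 140.0
print(toy[mask].index.tolist())  # [0, 1]
```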

In [None]:
def thalach_stats(data):
    """
    Return a tuple containing the min, max, avg, and median value of `thalach` feature,
    yet limited to a slice of the input DataFrame (data). 
    In particular, this slice will contain instances referring to female patients (sex = 0)
    who are not yet 50 years old, or (female patients) having a value of blood pressure below the median of 
    female patients aged 25 or more.
    """
    ### BEGIN SOLUTION
    sliced_data = data[(data.sex == 0) & (data.age < 50) | 
                       (data.sex == 0) & (data.trestbps < data[(data.sex == 0) & (data.age >= 25)].trestbps.median())]
    
    return (sliced_data.thalach.min(), 
            sliced_data.thalach.max(), 
            sliced_data.thalach.mean(), 
            sliced_data.thalach.median())
    ### END SOLUTION

In [None]:
"""
Test the correctness of the implementation of the `thalach_stats` function
"""

# Call the `thalach_stats` function
stats = thalach_stats(data)

assert_equal(96, stats[0]) # assess minimum thalach
assert_equal(192, stats[1]) # assess maximum thalach
# ### BEGIN HIDDEN TESTS
assert_equal(True, np.abs(151.8 - stats[2]) < EPSILON) # assess average thalach
assert_equal(159, stats[3]) # assess median thalach
assert_equal(4, len(stats))
assert_equal(tuple, type(stats))
# ### END HIDDEN TESTS

## Exercise 2.5 (5 points)

This exercise is made of **3** main questions, which you can answer independently of each other.

### Question 1 (1 point)

The feature labeled <code>**thal**</code> is a categorical variable which can take on **3** distinct values: 
-  **3** = normal 
-  **6** = fixed defect
-  **7** = reversible defect

Assign to the variable <code>**count_reversable_defect**</code> below the total number of patients in the dataset whose <code>**thal**</code> value indicates a reversible defect.

In [None]:
count_reversable_defect = None

### BEGIN SOLUTION
count_reversable_defect = data.thal.value_counts()[7]
### END SOLUTION

In [None]:
"""
Test the correctness of the `count_reversable_defect`
"""

assert_equal(False, (count_reversable_defect is None))
### BEGIN HIDDEN TESTS
assert_equal(117, count_reversable_defect)
### END HIDDEN TESTS

### Question 2 (1 point)

Check whether the numerical feature <code>**trestbps**</code> (blood pressure) has any _outliers_ using a box plot; assign the result of the plot to the variable <code>**box_plot_trestbps**</code>. In addition, set the variable <code>**has_trestbps_outliers**</code> to either <code>**True**</code> or <code>**False**</code> depending on whether or not you observe outliers in the box plot.

In [None]:
box_plot_trestbps = None
has_trestbps_outliers = None

### BEGIN SOLUTION
# Create a Figure containing 1x1 subplots
fig, ax = plt.subplots(1, 1, figsize=(8,6))
# Box plot of 'trestbps' (passed explicitly as `x=` so that the box is drawn horizontally)
box_plot_trestbps = sns.boxplot(x=data.loc[data.trestbps.notnull(), 'trestbps'], color='#0099cc', ax=ax)
has_trestbps_outliers = True
### END SOLUTION

In [None]:
"""
Test the correctness of `box_plot_trestbps` and `has_trestbps_outliers`
"""

assert_equal(False, (box_plot_trestbps is None))
assert_equal(False, (has_trestbps_outliers is None))
### BEGIN HIDDEN TESTS
assert_equal(True, has_trestbps_outliers)
### END HIDDEN TESTS

### Question 3 (3 points)

Implement the function <code>**compute_trestbps_fences**</code> below, which takes as input a <code>**pandas.DataFrame**</code> object and returns a two-item tuple (i.e., a pair), with the left and right **fence** value of <code>**trestbps**</code>.<br />
The left (resp., right) fence is the value corresponding to the left (resp., right) vertical bar displayed on a box plot, which determines the left (resp., right) boundary between "expected" observations and outliers. Both fences are empirically computed as follows:

$$
F_\textrm{left} = Q_1 - 1.5 * \texttt{IQR};~~F_\textrm{right} = Q_3 + 1.5 * \texttt{IQR}
$$

where $Q_1$ and $Q_3$ represent the 1st and 3rd quartiles of the distribution of interest, and $\texttt{IQR} = Q_3 - Q_1$.

(**HINT:** You can either invoke the <code>**quantile**</code> function defined on a <code>**pandas.Series**</code> object **or** use the <code>**numpy.percentile**</code> function which takes as input a <code>**pandas.Series**</code> object or, more generally, any object that can easily be converted into a <code>**numpy.array**</code>).
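
As a worked example of the two formulas above, the sketch below computes both fences on a made-up series using the two approaches mentioned in the hint. The values in `values` are assumptions chosen so that the quartiles fall exactly on data points.

```python
import numpy as np
import pandas as pd

# Made-up observations with one obvious outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Quartiles via pandas ...
q1, q3 = values.quantile([0.25, 0.75])
# ... or, equivalently, via NumPy
q1_np, q3_np = np.percentile(values, [25, 75])

iqr = q3 - q1
fence_left = q1 - 1.5 * iqr
fence_right = q3 + 1.5 * iqr

outliers = values[(values < fence_left) | (values > fence_right)].tolist()

print(q1, q3)                   # 3.0 7.0
print(fence_left, fence_right)  # -3.0 13.0
print(outliers)                 # [100]
```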

In [None]:
def compute_trestbps_fences(data):
    """
    Return a two-item tuple containing the left and right fence of `trestbps` feature, respectively
    """
    trestbps_fence_left = None # 1st item of the tuple to be returned
    trestbps_fence_right = None # 2nd item of the tuple to be returned
    # Assign the two variables above the correct values...
    ### BEGIN SOLUTION
    trestbps_q1, trestbps_q3 = data.loc[data.trestbps.notnull(), 'trestbps'].quantile([.25, .75])
    trestbps_IQR = (trestbps_q3 - trestbps_q1)
    trestbps_fence_left = trestbps_q1 - 1.5 * trestbps_IQR
    trestbps_fence_right = trestbps_q3 + 1.5 * trestbps_IQR
    ### END SOLUTION
    # Finally, return the tuple
    return (trestbps_fence_left, trestbps_fence_right)


In [None]:
"""
Test the correctness of the `compute_trestbps_fences`
"""

# Call the function `compute_trestbps_fences`
trestbps_fence_left, trestbps_fence_right = compute_trestbps_fences(data)

assert_equal(False, trestbps_fence_left is None)
assert_equal(False, trestbps_fence_right is None)
assert_equal(80, (trestbps_fence_right - trestbps_fence_left))
### BEGIN HIDDEN TESTS
assert_equal(90, trestbps_fence_left)
assert_equal(170, trestbps_fence_right)
### END HIDDEN TESTS