![alt text][top-banner]

[top-banner]: ./callysto-top-banner.jpg

In [73]:
%%html

<script>
 function code_toggle() {
   if (code_shown){
     $('div.input').hide('500');
     $('#toggleButton').val('Show Code')
   } else {
     $('div.input').show('500');
     $('#toggleButton').val('Hide Code')
   }
   code_shown = !code_shown
 }

 $( document ).ready(function(){
   code_shown=false;
   $('div.input').hide()
 });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>

In [74]:
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from ipywidgets import interact, widgets, Button, Layout

from scipy import stats
from collections import Counter
from array import array
from statistics import mode
import IPython
import pandas

# Effect of Outliers on Central Tendency
This notebook(CC-66) will explore the effect of outliers on central tendency (mean, median, and mode).

This notebook is organized as follows. In section 1, we discuss the preliminaries, definition of outliers and followed by two primary examples. In the next section,  we first compute the basic central tendency without outliers by importing data set from local folder. In section 3, once we've covered the basics, you'll have the opportunity to use some basic Python tools to adjust a dataset by adding one or more outliers, and then observe the effect on mean, median, and mode. In the following section, we use python library as a data source. In section 5, we summarized the notebook. Section 6, represents the exercises problem for students. 

## <font color='Blue'> 1 Preliminaries </font> 
To get the very basic ideas of central tendency you may visit another notebook [CC-65](CC65-Central_Tendency.ipynb) 



### 1.1 Outlier : Out, liar
A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

##### 1.1.1 Example 
Data set $= 2, 26, 23, 27, 25, 28, 29, 24, 99 $
<br> $2$ and $99$ are samller and larger from other data respectively.
<br>Therefore, the outlier of the data set are $2$ and $99$


##### 1.1.2 Example 
In math-00 course, ten students obtained following marks (out of $100$): </br>
Marks $= 78, 87, 85, 96, 84, 92, 102, 79, 81, 97$

In the data set there is no data that is too smaller or larger than the other values. But as we are given the full marks of the exam was out of $100$ that means an individual cen get at best $100$ mark in the course. So, the mark that exceed this maximum range would be fraud value in the data set. Therefore, $102$ is the outlier in the data set.  

##   <font color='Blue'> 2 Central Tendency Computation: Sodium content </font>

Now we will calculate the mean, median, and mode for a dataset stored in a local text (csv) file. This particular data happens to be the sodium content per serving in a selection of supermarket items. You'll notice one number in the list below that's far greater than the others: this is the outlier in our dataset. (The product? In case you were wondering, it's soy sauce.)

### 2.1 Import Data
The block of code below imports our data. The first line imports the file. The next lines tell Jupyter how to read the file, and the last line tells Jupyter to display the information it read from the file.

In [75]:
import csv
data = []
with open('exampleData.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data.append(int(row[1]))
print("Data read from the file: ", data)

Data read from the file:  [871, 1250, 6458, 270, 250, 205, 340, 843, 482, 521, 450, 360, 780, 340, 510, 460, 380, 290, 458, 335, 355, 590, 600, 595, 547, 1036, 410, 190, 530]


Now that we've loaded our data, we need to process it. We're using the Python language, which relies on a number of preconfigured programs called *libraries* to accomplish tasks. The first lines of code below import the tools we'll need. 

### 2.2 Calcualtion : General function
After that, we define a *function*, named "computeCenTendency", which we will use to calculate the mean, median, and mode of our dataset. (Together, the mean, median, and mode statistics are known as *measures of central tendency* since they are used to measure the extent to which our data is concentrated around a central value.)

In [76]:
def computeCenTendency(dataset):
    
    #mean value
    mean= np.mean(dataset)
    print("Mean: ", round(mean,3))

    #median value
    # First we need to sort the data in ascending order
    dataset.sort()
    median = np.median(dataset)
    print("Median: ", round(median,3))
    
    # Mode
    hits = []
    for item in dataset:
        tally = dataset.count(item)
        #Makes a tuple that is the number of huts paired with the relevant number
        values = (tally, item)
        # Only add one entry for each number in the set
        if values not in hits:
            hits.append(values)
    hits.sort(reverse=True)
    if hits[0][0]>hits[1][0]:
        print("Mode:", round(hits[0][1],3), "(appeared", hits[0][0], "times.)")
    else:
        print("There is no mode")

    Counter(dataset)
    return mean, median, hits[0][1]


### 2.3 Calculation : Using the function
Now we call the above defined function. The require parameters of the function is the list of data. 

In [77]:
centralTendency = []
centralTendency = computeCenTendency(data)

Mean:  714.0
Median:  460.0
Mode: 340 (appeared 2 times.)


### 2.4 Histogram

Next, we will produce a **histogram** of our data. A histogram is a useful plot for visualizing how the values in our dataset are distributed. Again, we begin by importing the Python libraries needed to accomplish the task. Then we define our histogram plot. The definition depends on a number of parameters, such the number of 'bins'. When there are a lot of data points, we don't plot each point individually. Instead, we group together nearby values. Each such grouping, given by a range of values, is called a *bin*. The height of each bar in the histogram is given by the number of data points in the corresponding bin.

#### 2.4.1 General function
First, we will imlement a general function for plotting histogram so that we can use it for every example. 

In [78]:
def plotHistogram(x_values, num_bins, xLabel, yLabel, histTitle):
    n, bins, patches = plt.hist(x_values, num_bins, facecolor='blue', alpha=0.5)

    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(histTitle)
    plt.show()

Now we define another function to get the value of *bin* value size from the user and generate a histogram based on the value.

In [79]:

def callPlottingFunction(num_bins):
    print("Generating... plot for :", num_bins)
    plotHistogram(data, num_bins , 'Sodium Content', 'values', 'Histogram of 30 products in Australian supermarkets')



#### 2.4.2 Interactive histogram
To present the histogram in a interactive way, here we have used python widgets called slider. We invoke a built in function named *interact()*, that automatically creates user interface controls for exploring code and data interactively. 

In the following slider, we can adjust the *bin* size (labeled as *num_bins* ) and generate histogram with the value of num_bins. Student can change the value between the range of $0$ to $100$. By default the value of *num_bins* is set to 50. To generate the histogram for different values click anywhere on the slider. 

N.B. The interactivity here is a little rough: the graph disappears and reappears each time the slider value changes!

In [80]:
interact(callPlottingFunction, num_bins=widgets.IntSlider(min=0,max=100,step=1,value=50));

## <font color='Blue'> 3 Exploring the effect of outliers </font> 

In the example above, the sodium content in the soy sauce produced a clear outlier. Next, we will begin with a dataset that does not initially contain any outliers, and compute its mean, median, and mode. You will then have the opportunity to add additional data points to the set, and then compute their effect on these statistics.

### 3.1 Dataset without outlier
We remove the entry of sodium content in the soy sauce from the above used csv file. Now we perform another iteration to compute every task that we defined in section 2.  

In [81]:
dataWithoutOutlier = []
with open('exampleData_NoOutlier.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        dataWithoutOutlier.append(int(row[1]))
print("Data read from the file: ", dataWithoutOutlier)

centralTendencyWithoutOutlier = []
centralTendencyWithoutOutlier = computeCenTendency(dataWithoutOutlier)

def callPlottingFunctionNoOutlier(num_bins):
    print("Generating... plot for :", num_bins)
    plotHistogram(dataWithoutOutlier, num_bins , 'Sodium Content', 'values', 'Histogram of 30 products in Australian supermarkets')

interact(callPlottingFunctionNoOutlier, num_bins=widgets.IntSlider(min=0,max=100,step=1,value=50));

Data read from the file:  [871, 1250, 270, 250, 205, 340, 843, 482, 521, 450, 360, 780, 340, 510, 460, 380, 290, 458, 335, 355, 590, 600, 595, 547, 1036, 410, 190, 530]
Mean:  508.857
Median:  459.0
Mode: 340 (appeared 2 times.)


### 3.2 Adding Outlier
Here, we provide an opportunity so that student can add outlier by their own choice. First, they have to enter how many outlier(s) they want to add. Then the program will take the input outlier(s).  

In [82]:
inputs = []
numbers = int(input("Number(s) of Outlier : "))
inputs = [input("Outlier " + str(i+1) + ": ")  for i in range(numbers)]
inputs = list(map(int, inputs))

datasetWithOutlier = []
datasetWithOutlier = dataWithoutOutlier + inputs

if(numbers == 1):
    print("Dataset with outlier (last", numbers, "entry was added) : ", datasetWithOutlier)
else:
    print("Dataset with outliers (last", numbers, "entries were added) : ", datasetWithOutlier)

Number(s) of Outlier : 2
Outlier 1: 23234
Outlier 2: 23232
Dataset with outliers (last 2 entries were added) :  [190, 205, 250, 270, 290, 335, 340, 340, 355, 360, 380, 410, 450, 458, 460, 482, 510, 521, 530, 547, 590, 595, 600, 780, 843, 871, 1036, 1250, 23234, 23232]


Now we will compute the central tendency and produce histogram of the dataset with outlier.

In [83]:
centralTendencyWithOutlier = []
centralTendencyWithOutlier = computeCenTendency(datasetWithOutlier)

Mean:  2023.8
Median:  471.0
Mode: 340 (appeared 2 times.)


### 3.3 Historgram with Outliers
Here, we plot a histogram of the data that contain outlier(s). Student can also define the size of *bins* on the slider and observe the change of grpah representation. 

In [84]:
def callPlottingFunctionWithOutlier(num_bins):
    print("Generating... plot for :", num_bins)
    plotHistogram(datasetWithOutlier, num_bins, 'Sodium Content', 'values', 'Histogram with Outliers')
interact(callPlottingFunctionWithOutlier, num_bins=widgets.IntSlider(min=0,max=100,step=1,value=50));

**Effects of Outlier :** From the above results and histogram, we can see after adding outliers with the dataset the mean value changed dramatically(based on the number and weight of outlier). Because to determine the mean value we need to add all values.  Otliers are the numbers that does not belong into the regular values. Generally it would be a large number. That is why we sum up all numbers it produce the result so high. On the contrary, to find out the median value we need to considere the middle number(s), that is why it moves to the right and will increase a bit. But the mode is very unlikely to update, because we define mode by the repetition. Input a very large number as an outlier and observe the histogra. You will notice all regular data should be in one particular area and the inputted value is far away. 

In [85]:
import plotly.plotly as py
import plotly.graph_objs as go

import pandas as pd
import colorlover as cl

criteriaArray = ['Mean', 'Median', 'Mode']
centralTendencyCriteriaArray = np.asarray(criteriaArray)
centralTendencyWithoutOutlierArray = np.around(np.asarray(centralTendencyWithoutOutlier), 3)
centralTendencyWithOutlierArray = np.around(np.asarray(centralTendencyWithOutlier), 3) 
tableValues = [centralTendencyCriteriaArray, centralTendencyWithoutOutlierArray,centralTendencyWithOutlierArray]


colors = cl.scales['3']['div']['RdYlGn']
data = {'cenTendencyValues' : tableValues,
        'Color' : colors}

df = pd.DataFrame(data)


trace0 = go.Table(
  header = dict(
    values = [" ","Before adding outlier", "After adding outlier"],
    line = dict(color = 'white'),
    fill = dict(color = 'white'),
    align = ['center'],
    font = dict(color = 'black', size = 15)
  ),
  cells = dict(
    values = df.cenTendencyValues,
    line = dict(color = '#506784'),
    fill = dict(color = [df.Color]),
    align = 'center',
    font = dict(color = 'black', size = 13)
    ))

data = [trace0]

py.iplot(data, filename = "Effets of Outliers")

## <font color='Blue'> 4 Example Using Python Library </font>
In this section we discuss about pandas dataframe. We use **nba_2013.csv** file as dataset that contains $30$ different criteria of $480$ nba players of 2013.

First, we import the Python library (pandas) and then read the csv rows. 

In [86]:
nba = pandas.read_csv("nba_2013.csv")
print(round(nba.head(10),3))

              player pos  age bref_team_id   g  gs    mp   fg   fga    fg.  \
0         Quincy Acy  SF   23          TOT  63   0   847   66   141  0.468   
1       Steven Adams   C   20          OKC  81  20  1197   93   185  0.503   
2        Jeff Adrien  PF   27          TOT  53  12   961  143   275  0.520   
3      Arron Afflalo  SG   28          ORL  73  73  2552  464  1011  0.459   
4      Alexis Ajinca   C   25          NOP  56  30   951  136   249  0.546   
5       Cole Aldrich   C   25          NYK  46   2   330   33    61  0.541   
6  LaMarcus Aldridge  PF   28          POR  69  69  2498  652  1423  0.458   
7        Lavoy Allen  PF   24          TOT  65   2  1072  134   300  0.447   
8          Ray Allen  SG   38          MIA  73   9  1936  240   543  0.442   
9         Tony Allen  SG   32          MEM  55  28  1278  204   413  0.494   

      ...      drb  trb  ast  stl  blk  tov   pf   pts     season  season_end  
0     ...      144  216   28   23   26   30  122   171  2013-

Determining the dimensions of nba dataset.

In [87]:
print("Total number of rows : ", nba.shape[0])
print("Total number of colums : ", nba.shape[1])

Total number of rows :  481
Total number of colums :  31


###  4.1 Mean

In [88]:
#Mean 
fullDataframeMean = []
fullDataframeMean = nba.mean()
print(round(fullDataframeMean.head(10),3))

age       26.509
g         53.254
gs        25.572
mp      1237.387
fg       192.881
fga      424.464
fg.        0.436
x3p       39.613
x3pa     110.131
x3p.       0.285
dtype: float64


###  4.2 Median 

In [89]:
#Median
fullDataframeMedian = []
fullDataframeMedian = nba.median()
print(round(fullDataframeMedian.head(10),3))

age       26.000
g         61.000
gs        10.000
mp      1141.000
fg       146.000
fga      332.000
fg.        0.438
x3p       16.000
x3pa      48.000
x3p.       0.331
dtype: float64


###  4.3 Mode

In [90]:
#Mode
fullDataFrameMode = []
fullDataFrameMode = nba.mode()
print(round(fullDataFrameMode.head(5),3))

          player  pos   age bref_team_id     g   gs      mp   fg  fga    fg.  \
0     A.J. Price   SG  25.0          TOT  82.0  0.0    15.0  0.0  1.0  0.429   
1   Aaron Brooks  NaN   NaN          NaN   NaN  NaN   392.0  NaN  NaN    NaN   
2     Aaron Gray  NaN   NaN          NaN   NaN  NaN  1416.0  NaN  NaN    NaN   
3  Adonis Thomas  NaN   NaN          NaN   NaN  NaN     NaN  NaN  NaN    NaN   
4  Al Harrington  NaN   NaN          NaN   NaN  NaN     NaN  NaN  NaN    NaN   

      ...      drb  trb  ast  stl  blk  tov   pf  pts     season  season_end  
0     ...      0.0  3.0  0.0  0.0  0.0  0.0  1.0  0.0  2013-2014      2013.0  
1     ...      NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN        NaN         NaN  
2     ...      NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN        NaN         NaN  
3     ...      NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN        NaN         NaN  
4     ...      NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN        NaN         NaN  

[5 rows x 31 columns]


###  4.4 Histogram 

In [91]:
def callPlottingFunctionNBA(num_bins):
    print("Generating... plot for :", num_bins)
    plotHistogram(nba[["ast"]], num_bins, "nba players", "values", "Histogram of 2013 nba players")
interact(callPlottingFunctionNBA, num_bins = widgets.IntSlider(min=0,max=100,step=1,value=10));

## <font color='Blue'> 5 Conclusion </font>
The three most common statistical averages are (arithmetic) mean, median and mode. We observed the effect of outliers on mean, median, and mode. Mean updated very quickly for adding outlier(s),median changes very slowly and most cases mode would remain same. 

## <font color='Blue'>6 Test yourself </font>

### 6.1 Practice Problem : Easy
Consider the following GPAs of	students	from two	semesters	of	Stat-0000 course: 
<br>
Semester $1: 3.1, 3.2, 2.8,  2.9, 3.0, 3.4,	2.3, 3.2, 2.1,	3.5$ <br>
Semester $2: 2.2, 4.1,	2.6, 2.7, 3.8, 2.8,	2.4, 3.2, 2.7,	2.9$


In [92]:
def display(question, answerList):
    print(question)
    IPython.display.display(answerList)

In [93]:
question611 = "6.1.1 What will be the mean of semester 1?"

answer611 = widgets.RadioButtons(options=['Select best one','2.95', '3', '3.05', '3.10', 'None of the above'],
                             value= 'Select best one', description='Choices:')


def checkAnswer(a):
    IPython.display.clear_output(wait=False)
    display(question611, answer611)
    if answer611.value == '3.05':
        print("Correct Answer!")
    else:
        if answer611.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")

display(question611, answer611)

answer611.observe(checkAnswer, 'value')

6.1.1 What will be the mean of semester 1?


In [94]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question611">
 <button id = "611"
onclick="toggle('answer611');">Solution</button> 
</div>
<div style="display:none" id="answer611">
To find out the mean of semester 1 we divide the sum of all data by the numbers of data.<br/>
Therefore, Mean $=  \frac{3.1 + 3.2 + 2.8 + 2.9 + 3.0 + 3.4 + 2.3 + 3.2 + 2.1 + 3.5}{10} = 3.05$ <br/>


</div>

</body>
</html>

In [95]:
question612 = "6.1.2 What will be the median of semester 2?"

answer612 = widgets.RadioButtons(options=['Select best one', '2.43', '2.5', '2.67', '2.75', 'None of the above'],
                              value = 'Select best one', description='Choices:')

def check612(b):
    IPython.display.clear_output(wait=False)
    display(question612, answer612)
    if answer612.value == '2.75':
        print("Correct Answer!")
    else:
        if answer612.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")

IPython.display.clear_output(wait=False)
display(question612, answer612)
answer612.observe(check612, 'value')

6.1.2 What will be the median of semester 2?


In [96]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question612">
 <button id = "612"
onclick="toggle('answer612');">Solution</button> 
</div>
<div style="display:none" id="answer612">
To determine the median at first we need to sort the data. <br/>
Sorted data,<br/>
Semester  $2: 2.2,2.4,2.6,2.7,2.7,2.8,2.9,3.2,3.8,4.1$ <br/>
There are $10$ numbers in the data set, so the average of $5^{th}$ and $6^{th}$ will be the median. <br/>
So, the Median $= \frac{2.7 + 2.8}{2} = 2.75$ <br/>


</div>

</body>
</html>

### 6.2 Practice Problem : Medium


#### 6.2.1 Distribution type
We want to know how the values are distributed. To determine the type, if $mean<median<mode$ then it is called negatively skewed and if $mean > median >mode$ then it is positively skewed. 

In [97]:
question621 = "6.2.1 What is the distribution type of semester 1? (Hint:first, calcualte mean, median, mode)"

answer621 = widgets.RadioButtons(options=['Select best one', 'Negatively Skewed', 'Positively skewed', 'both of them', 'None of the above'],
                              value = 'Select best one', description='Choices:')

def check621(c):
    IPython.display.clear_output(wait=False)
    display(question621, answer621)
    if answer621.value == 'Negatively Skewed':
        print("Correct Answer!")
    else:
        if answer621.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")
        
IPython.display.clear_output(wait=False)
display(question621, answer621)
answer621.observe(check621, 'value')

6.2.1 What is the distribution type of semester 1? (Hint:first, calcualte mean, median, mode)


In [98]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question621">
 <button id = "621"
onclick="toggle('answer621');">Solution</button> 
</div>
<div style="display:none" id="answer621">
At first we have to calculate the mean, median and mode of semester 1 <br/> 
Mean $=  \frac{3.1 + 3.2 + 2.8 + 2.9 + 3.0 + 3.4 + 2.3 + 3.2 + 2.1 + 3.5}{10} = 2.95$ <br/>
For finding the median we need to sort the data. <br/>
Sorted data,<br/>
Semester  $1: 2.1,2.3,2.8,2.9,3.0,3.1,3.2,3.2,3.4,3.5$ <br/>
There are $10$ numbers in the data set, so the average of $5^{th}$ and $6^{th}$ will be the median. <br/>
Median $= \frac{3.0 + 3.1}{2} = 3.05$ <br/>
From the above sorted data we can see that only $3.2$ repeats twice. <br/>
So, the mode is $3.2$ <br/>

The mean is less than median, and median is less than mode. That means, mean$<$ median $<$ mode. <br/>
Therefore the distribution type of semester 1 data is Negatively Skewed. <br/>

</div>

</body>
</html>

#### 6.2.2 Median of medians
Now, we learn how to calculate the median of medians. We define the term as quartile. When we compute the median (M) of a dataset we divide them into two groups. Let say, in group $1$ all values should be less than or equal of median.
<br>
**First quartile $(M_1):$** the median of first group. Now, we have smaller data sets and we can find the median as we discussed in preliminaries.   <br>
**Third quartile $(M_3)$** the median of second group where every value is equal or greater than median. <br>
The **Second quartile** is our median (M).
<br>
For example, our data set is: 
$S_1 = {2, 5, 7, 8, 9}$<br>
So the median (M) is $: 7$<br>
and now we have two groups:
<br>
Group $1: {2,5}$ <br>
Group $2: {8,9}$ <br>
So the first quartile $(M_1)$ is: $\frac{2+5}{2} = 3.5$ <br>
similarly, third quartile, $(M_3) = 8.5$

<br>
    Based on the discussion, answer the following questions. 

In [99]:
answer622 = widgets.RadioButtons(options=[  'Select best one','2.5 and 3.1', '2.6 and 3.1', '2.6 and 3.2', '2.8 and 3.2', 'None of the above'],
                             value = 'Select best one', description='Choices:')

question622 = "6.2.2 What is the first and third quartiles of semester 2?"


def check622(d):
    IPython.display.clear_output(wait=False)
    display(question622, answer622)
    if answer622.value == '2.6 and 3.2':
        print("Correct Answer!")
    else:
        if answer622.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")
        
IPython.display.clear_output(wait=False)
display(question622,answer622 )
answer622.observe(check622, 'value')

6.2.2 What is the first and third quartiles of semester 2?


In [100]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question622">
 <button id = "622"
onclick="toggle('answer622');">Solution</button> 
</div>
<div style="display:none" id="answer622">
At first we have to sort the data of semester 2 <br/> 
Sorted data,
Semester  2: 2.2,2.4,2.6,2.7,2.7,2.8,2.9,3.2,3.8,4.1
Data for first Median :  2.2, 2.4, 2.6, 2.7, 2.7 <br/>
    So the first median or first quartile is 2.6 <br/>
Similarly, third quartile $M_3$ is 3.2 <br/>

</div>

</body>
</html>

### 6.3 Practice Problem : Hard
A student has gotten the following marks on his tests: 87, 95, 76, and 88. He wants an 85 or better overall.

In [101]:
answer631 = widgets.RadioButtons(options=[ 'Select best one','78', '78.5', '79', '80', 'None of the above'],
                             value = 'Select best one', description='Choices:')

question631 = "6.3.1 What is the minimum mark he must get on the last test in order to achieve that average?"

def check631(e):
    IPython.display.clear_output(wait=False)
    display(question631, answer631)
    if answer631.value == '79':
        print("Correct Answer!")
    else:
        if answer631.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")
        
IPython.display.clear_output(wait=False)
display(question631, answer631)
answer631.observe(check631, 'value')

6.3.1 What is the minimum mark he must get on the last test in order to achieve that average?


In [102]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question631">
 <button id = "631"
onclick="toggle('answer631');">Solution</button> 
</div>
<div style="display:none" id="answer631">
The minimum mark is what we need to find. To find the average of all his marks (the known ones, plus the unknown one), 
we have to add up all the grades, and then divide by the number of marks. Since we do not have a score for the last test yet, 
we will use a variable to stand for this unknown value: "x". Then computation to find the desired average is: <br />

the first step : (87 + 95 + 76 + 88 + x) รท 5 = 85 <br />
Multiplying through by 5 and simplifying, we get: <br/>
the next step : 87 + 95 + 76 + 88 + x = 425 <br />
the next step : 346 + x = 425 <br />
the final step : x = 79 <br />
so, he needs to get at least a 79 on the last test.
</div>

</body>
</html>

### 6.4 Practice Problem: Miscellaneous
In this section we discuss how to determine the outliers based on first and third quartile. First, we need to find out the inter quartile range (IQR). And then we compute the lower and upper limit for outliers by the following formulae: <br>
Lower limit: $$M_1 - 1.5 \times \text{IQR}$$
and upper limit: $$M_3 + 1.5 \times \text{IQR}$$
where,<br>
IQR $= (M_3 - M_1)$ <br>
So, all the data that are not belong to this range are outliers.

Now, answer the following question:

In [103]:
answer641 = widgets.RadioButtons(options=['Select best one','1.6 and 4.1', '1.7 and 3.9', '2.2 and 3.8', '1.7 and 4.1', 'None of the above'],
                              value='Select best one',description='Choices:')

question641 = "6.4.1 What is the lower and upper limit of outliers of semester 2?"


def checkAnswer641(f):
    IPython.display.clear_output(wait=False)
    display(question641, answer641 )
    if answer641.value == '1.7 and 4.1':
        print("Correct Answer!")
    else:
        if answer641.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")

IPython.display.clear_output(wait=False)
display(question641, answer641)
answer641.observe(checkAnswer641, 'value')

6.4.1 What is the lower and upper limit of outliers of semester 2?


In [104]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question641">
 <button id = "641"
onclick="toggle('answer641');">Solution</button> 
</div>
<div style="display:none" id="answer641">
At first we have to determine the first median$(M_1)$ and third median $(M_3)$ of semester 2.<br/> 
Data for first Median :  2.2, 2.4, 2.6, 2.7, 2.7 <br/>
    So the first median or first quartile is 2.6 <br/>
Similarly, $M_3$ is 3.2 <br/>
So, the inter median/quartile range (IQR) = 3.2 - 2.6 = 0.6 <br/>
Now we can compute the limits for outliers. <br/>

Lower limit : $M_1 - 1.5\times IQR = 2.6 - 1.5 \times 0.6 = 1.7$ <br/>
Similarly, using second equation we can determine the upper limit, <br/>
Upper limit : $ 3.2 + 1.5 \times 0.6 = 4.1$
</div>

</body>
</html>

In [105]:
answer642 = widgets.RadioButtons(options=['Select best one', '2.2', '3.8', '1.7', '2.1', 'None of the above'],
                             value = 'Select best one',  description='Choices:')

question642 = "6.4.2 What is the Potential outliers  of semester 1?"

def check642(g):
    IPython.display.clear_output(wait=False)
    display(question642, answer642)
    if answer642.value == '2.1':
        print("Correct Answer!")
    else:
        if answer642.value == 'Select best one':
            pass
        else:
            print("Wrong answer! Try again.")

IPython.display.clear_output(wait=False)
display(question642, answer642)
answer642.observe(check642, 'value')

6.4.2 What is the Potential outliers  of semester 1?


In [106]:
%%html
<html>
<head>
<script type="text/javascript">
<!--
function toggle(id) {
var e = document.getElementById(id);
if(e.style.display == 'none')
e.style.display = 'block';
else
e.style.display = 'none';
}
//-->
</script>
</head>

<body>
<div id="question642">
 <button id = "642"
onclick="toggle('answer642');">Solution</button> 
</div>
<div style="display:none" id="answer642">
In our previous question we determined the lower and upper limit of semester 2. Similarly if we calculate the limits for semester 1 
we find that the lower and upper limits are 2.2 and 3.8 respectively. So as per the definition of outliers, the numbers that are 
lower than the lower limit and higher than the upper limit should be the potential outliers.<br />

So from the given data of semester 1 we can find that 2.1 is less than the lower limit (2.2)and all other GPAs are in between lower
and upper limit.  <br/>Hence, 2.1 is potential outlier.
</div>

</body>
</html>

![alt text][bottom-banner]

[bottom-banner]: ./callysto-bottom-banner.jpg