## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
```

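Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`. No random seed is set, so the values change on every run, and the outputs below were not all produced from the same draw.

```python
# Arguments: mean (loc), standard deviation (scale), number of samples (size)
samples = np.random.normal(100, 15, 1000)
samples
```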

```
array([ 63.32914946,  98.40034699, 115.10708889, 103.29293904,
        85.68257523, 116.46523002,  89.99492933,  99.22932458,
       101.25284865,  98.07237195, 107.20059074,  83.64726513,
        86.00478986,  97.03847807, 118.07634779, 105.03355248,
       110.41657454, 109.34095828,  99.9961096 ,  82.70276849,
       101.68192698, 130.73319953,  85.40024242, 100.61639334,
        81.88032125, 132.33744527,  89.99555533, 104.28232207,
       107.42471821,  88.38923002, 119.52593036,  89.09439494,
        83.84680825, 104.5596799 ,  86.76849912, 116.50046187,
       110.88491483,  92.81707432, 108.23785818, 114.20625973,
       103.98167637, 124.56035795,  99.70848799, 107.85478778,
       102.43447071,  78.9469326 , 107.08288745,  92.78790714,
       104.20302947,  94.95054031,  95.22261782, 126.06588582,
        88.43980714,  88.12802665,  73.37558774,  89.19317848,
       105.76256275, 105.6684754 ,  90.43715823, 127.88267933,
        99.11806181, 115.43506271,  90.97513626,  69.09
```
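To make the draw reproducible, a seeded random number generator can be used instead. The sketch below is a minimal illustration, assuming NumPy's `Generator` API (`np.random.default_rng`); the seed value `42` is an arbitrary, hypothetical choice.

```python
import numpy as np

# Hypothetical seed chosen for illustration; any fixed integer works.
SEED = 42

# Two generators seeded identically produce the same stream of numbers.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

# Generator.normal(loc, scale, size) mirrors np.random.normal(mean, std, size).
samples_a = rng_a.normal(loc=100, scale=15, size=1000)
samples_b = rng_b.normal(loc=100, scale=15, size=1000)

print(np.array_equal(samples_a, samples_b))  # True: identical draws
print(samples_a.shape)                        # (1000,)
```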

Compute the **mean**, **median**, and **mode**

In [12]:
mean =np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(mean)
print(median)
print(mode)

```
99.63333123830938
99.82905026937968
ModeResult(mode=array([47.68199818]), count=array([1]))
```
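As a sanity check on what `np.mean()` and `np.median()` compute, the following sketch reproduces both from their definitions. It draws its own seeded sample (the seed `0` is an arbitrary assumption) so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
samples = rng.normal(100, 15, 1000)

# Mean: sum of the values divided by how many there are.
manual_mean = samples.sum() / samples.size

# Median: middle of the sorted data; with an even count (1,000 here)
# it is the average of the two middle values.
ordered = np.sort(samples)
mid = samples.size // 2
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(np.isclose(manual_mean, np.mean(samples)))      # True
print(np.isclose(manual_median, np.median(samples)))  # True
```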


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

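```python
# Use names that do not shadow Python's built-in min() and max()
min_val = np.min(samples)
print(min_val)
max_val = np.max(samples)
print(max_val)
q1 = np.percentile(samples, 25)  # first quartile (25th percentile)
print(q1)
q3 = np.percentile(samples, 75)  # third quartile (75th percentile)
print(q3)
iqr = stats.iqr(samples)         # interquartile range, Q3 - Q1
print(iqr)
```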

```
53.35965548784427
154.56851407218454
89.78181328668701
110.55903752567548
20.777224238988467
```
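The quartiles and the IQR can be checked on a tiny array where the answer is easy to verify by eye. This sketch assumes NumPy's default linear interpolation in `np.percentile()`, which `stats.iqr()` also uses by default.

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Linear interpolation: position = p * (n - 1), so 25% -> index 2, 75% -> index 6.
q1, q3 = np.percentile(data, [25, 75])
iqr_manual = q3 - q1

print(q1, q3)                         # 3.0 7.0
print(iqr_manual == stats.iqr(data))  # True
```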


Compute the **variance** and **standard deviation**

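```python
# np.var and np.std default to ddof=0 (population formulas);
# pass ddof=1 for the sample variance and sample standard deviation.
variance = np.var(samples)
print(variance)
std_dev = np.std(samples)
print(std_dev)
```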

```
229.47567418525577
15.1484545147436
```
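The population variance divides the sum of squared deviations from the mean by n (`ddof=0`), while the sample variance divides by n − 1 (`ddof=1`). The difference is easiest to see on a small, hand-checkable array; the values below are illustrative only.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean is 5.0
n = data.size
squared_dev = (data - data.mean()) ** 2  # sums to 32.0

pop_var = squared_dev.sum() / n           # what np.var computes by default (ddof=0)
sample_var = squared_dev.sum() / (n - 1)  # what np.var(..., ddof=1) computes

print(pop_var, np.var(data))                         # 4.0 4.0
print(np.isclose(sample_var, np.var(data, ddof=1)))  # True
print(np.sqrt(pop_var), np.std(data))                # 2.0 2.0
```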


Compute the **skewness** and **kurtosis**

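```python
# Skewness: about 0 for symmetric data such as a normal sample
skewness = stats.skew(samples)
print(skewness)
# stats.kurtosis returns *excess* kurtosis by default (fisher=True),
# so a normal distribution gives a value close to 0
kurtosis = stats.kurtosis(samples)
print(kurtosis)
```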


```
-0.05312808314313685
-0.03354154183399993
```
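Skewness and kurtosis are standardized central moments. The sketch below recomputes them by hand, assuming SciPy's defaults (`bias=True` for both functions, and `fisher=True` for kurtosis, which subtracts 3); the sample and its seed are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed for illustration
x = rng.normal(100, 15, 1000)

dev = x - x.mean()
m2 = np.mean(dev ** 2)  # second central moment (population variance)
m3 = np.mean(dev ** 3)  # third central moment
m4 = np.mean(dev ** 4)  # fourth central moment

skew_manual = m3 / m2 ** 1.5           # Fisher-Pearson coefficient (biased)
excess_kurt_manual = m4 / m2 ** 2 - 3  # Fisher definition: 0 for a normal distribution

print(np.isclose(skew_manual, stats.skew(x)))             # True
print(np.isclose(excess_kurt_manual, stats.kurtosis(x)))  # True
```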


## NumPy Correlation Calculation

Create an array `x` of the integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

```python
x = np.arange(10, 20)
x
```

```
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
```

Then use `np.array()` to create a second array `y` containing 10 arbitrary integers.

```python
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y
```

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
```

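Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`. It returns a 2 × 2 correlation matrix: the diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entries are Pearson's r between `x` and `y`. Here every entry is 1 because `y` is exactly `x - 9`, a perfect linear relationship.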

```python
r = np.corrcoef(x, y)
r
```

```
array([[1., 1.],
       [1., 1.]])
```
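Pearson's r is the covariance of `x` and `y` divided by the product of their standard deviations. The sketch below computes it directly and compares it with the off-diagonal entry of `np.corrcoef()`; it reuses the less regular `y` values from the pandas section so that r is not simply 1.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Deviations from the mean; the 1/n factors cancel in the ratio.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

corr_matrix = np.corrcoef(x, y)
print(corr_matrix.shape)                        # (2, 2)
print(np.isclose(r_manual, corr_matrix[0, 1]))  # True
print(round(r_manual, 4))                       # 0.7586
```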

## Pandas Correlation Calculation

Run the code below to create two pandas Series, `x` and `y`, of the same length.

```python
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
```

Call the relevant method to calculate Pearson's r correlation between `x` and `y`.

```python
# Series.corr() correlates two Series element by element; Pearson is the default.
# Building pd.DataFrame([x, y]) instead would place x and y in *rows*, and
# DataFrame.corr() would then correlate the 10 columns with each other
# (two points each), producing a meaningless 10x10 matrix of ±1 values.
r = x.corr(y, method="pearson")
r
```

```
0.7586402890911869
```

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

```python
rho = x.corr(y, method="spearman")
print(rho)
```

```
0.9757575757575757
```
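Spearman's rho is Pearson's r applied to the ranks of the data. This sketch checks that relationship with `Series.rank()` and compares the pandas results with `scipy.stats.pearsonr()` and `scipy.stats.spearmanr()`, which additionally return a p-value.

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho is Pearson's r computed on the ranks of the data.
rho_from_ranks = x.rank().corr(y.rank())

# SciPy equivalents unpack to (coefficient, p-value).
r_scipy, p_r = stats.pearsonr(x, y)
rho_scipy, p_rho = stats.spearmanr(x, y)

print(np.isclose(rho_from_ranks, x.corr(y, method="spearman")))  # True
print(np.isclose(rho_scipy, rho_from_ranks))                     # True
print(np.isclose(r_scipy, x.corr(y)))                            # True
```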


## Seaborn Dataset Tips

Import the **Seaborn** library.

```python
import seaborn as sns
```

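Load the `tips` dataset from Seaborn. `sns.load_dataset()` fetches it from the online seaborn-data repository, so the first call needs an internet connection.

```python
tips = sns.load_dataset("tips")
```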

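Generate descriptive statistics that summarize the central tendency and dispersion of each numeric column. By default `describe()` skips non-numeric columns, and its `std` row is the sample standard deviation (`ddof=1`).

```python
tips.describe()
```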

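|       | total_bill | tip      | size     |
|-------|-----------:|---------:|---------:|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |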
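One detail worth knowing about the `std` row is that pandas uses the sample formula (`ddof=1`), while `np.std()` defaults to the population formula (`ddof=0`). The small DataFrame below is hypothetical and only serves to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column DataFrame standing in for a column such as total_bill.
df = pd.DataFrame({"value": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]})
values = df["value"].to_numpy()

summary = df.describe()
print(summary.loc["std", "value"])  # sample std (ddof=1), about 2.138
print(np.std(values))               # 2.0, population std (ddof=0)
print(np.isclose(summary.loc["std", "value"], np.std(values, ddof=1)))  # True
```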


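Compute the **mode** of every column, including the categorical ones.

```python
# stats.mode returns the smallest most-frequent value per column with its count.
# Non-numeric input to stats.mode was deprecated in SciPy 1.9 and has since been
# removed; pandas' tips.mode() handles mixed column types instead.
stats.mode(tips)
```

```
ModeResult(mode=array([[13.42, 2.0, 'Male', 'No', 'Sat', 'Dinner', 2]], dtype=object), count=array([[  3,  33, 157, 151,  87, 176, 156]]))
```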
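pandas offers `DataFrame.mode()`, which works column by column on mixed data types and returns every tied mode rather than only the smallest one. The toy DataFrame here is hypothetical; columns with fewer modes are padded with `NaN`.

```python
import pandas as pd

# Hypothetical data: column "a" has a tie (1 and 2), column "b" has a single mode.
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 3],
    "b": ["x", "y", "y", "z", "y"],
})

modes = df.mode()
print(modes)
```

Each row of `modes` holds one mode per column, so a column with a tie contributes several rows.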

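Call the relevant method to calculate the pairwise Pearson's r correlation of the numeric columns.

```python
# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# pandas 2.0 and later raise an error without it.
tips.corr(method='pearson', numeric_only=True)
```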

|            | total_bill | tip      | size     |
|------------|-----------:|---------:|---------:|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
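Because the `tips` DataFrame also holds categorical columns (`sex`, `smoker`, `day`, `time`), recent pandas versions need to be told to restrict the correlation to numeric columns. This sketch shows two equivalent ways on a hypothetical miniature table; `numeric_only` assumes pandas 1.5 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table: two numeric columns and one categorical column.
df = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.5, 2.9, 4.2],
    "day": pd.Categorical(["Thur", "Fri", "Sat", "Sun"]),
})

# Option 1: let corr() drop non-numeric columns itself.
corr = df.corr(method="pearson", numeric_only=True)
print(corr)

# Option 2: select the numeric columns explicitly first.
corr_alt = df.select_dtypes(include="number").corr()
print(np.allclose(corr.to_numpy(), corr_alt.to_numpy()))  # True
```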
