## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [35]:
import numpy as np
from scipy import stats
import pandas as pd


Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [19]:
samples = np.random.normal(100,15,1000)
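Because the samples are random, the numbers printed in the cells below will differ on every run. A minimal sketch of a reproducible draw, assuming NumPy's newer `Generator` interface is available; the seed value 42 is an arbitrary choice for illustration and is not the one used to produce the outputs in this notebook:

```python
import numpy as np

# Seeded generator: the same seed always yields the same samples.
# The seed (42) is an arbitrary illustrative value.
rng = np.random.default_rng(seed=42)

# loc = mean, scale = standard deviation, size = number of samples
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)
print(samples[:5])
```

Naming the `loc`, `scale`, and `size` arguments also makes the call easier to read than the positional form `(100, 15, 1000)`.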

Compute the **mean**, **median**, and **mode**

In [21]:
mean = np.mean(samples)
mean


99.84006258528028

In [22]:
median = np.median(samples)
median

100.36052457339514

In [33]:
a = stats.mode(samples)
a

ModeResult(mode=array([36.75240714]), count=array([1]))
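The reported mode has a count of 1 and equals the sample minimum: in continuous data almost every value is unique, so `stats.mode()` simply returns the smallest of the tied values. One common workaround, sketched here under the assumption that a binned estimate is acceptable, is to take the most populated histogram bin. The seed and the bin count of 20 are illustrative choices, not part of the original exercise:

```python
import numpy as np

rng = np.random.default_rng(0)           # illustrative seed
samples = rng.normal(100, 15, 1000)

# Count how many samples fall into each of 20 equal-width bins.
counts, edges = np.histogram(samples, bins=20)

# The modal bin is the one with the highest count.
i = np.argmax(counts)
modal_bin = (edges[i], edges[i + 1])
modal_mid = 0.5 * (edges[i] + edges[i + 1])

print("modal bin:", modal_bin)
print("approximate mode:", modal_mid)
```

The result depends on the bin width, so it should be read as an estimate of where the data are densest rather than as an exact value.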

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [58]:
stats.describe(samples)

DescribeResult(nobs=1000, minmax=(36.75240714246372, 138.51335195690444), mean=99.84006258528028, variance=227.74429199033693, skewness=-0.24458560864920129, kurtosis=0.051361819336270376)

In [26]:
min_ = np.min(samples)
min_

36.75240714246372

In [27]:
max_ = np.max(samples)
max_

138.51335195690444

In [40]:
q1 = np.percentile(samples, 25)
q1

89.9325090824205

In [46]:
q2 = np.percentile(samples, 50)
q2

100.36052457339514

In [42]:
q3 = np.percentile(samples, 75)
q3

110.50315477440476

In [44]:
iqr = q3 - q1
iqr

20.570645691984268

In [52]:
stats.iqr(samples)

20.570645691984268
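The three quartiles can also be requested in a single call by passing a list of percentiles. The sketch below uses its own seeded samples (an assumption, so its numbers will not match the outputs above) and checks the result against `stats.iqr()`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # illustrative seed
samples = rng.normal(100, 15, 1000)

# One call returns Q1, the median (Q2), and Q3 together.
q1, q2, q3 = np.percentile(samples, [25, 50, 75])
iqr = q3 - q1

print(q1, q2, q3, iqr)
print(np.isclose(iqr, stats.iqr(samples)))   # both approaches agree
```

For a normal distribution the interquartile range is about 1.349 standard deviations, roughly 20.2 when the standard deviation is 15, which is consistent with the value of about 20.57 computed above.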

Compute the **variance** and **standard deviation**

In [50]:
variance = np.var(samples)
variance


227.5165476983466

In [51]:
std_dev = np.std(samples)
std_dev

15.083651669882416
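The variance from `np.var()` (about 227.52) is slightly smaller than the one reported by `stats.describe()` earlier (about 227.74). The difference comes from the divisor: `np.var()` and `np.std()` divide by n by default (`ddof=0`, the population formula), while `stats.describe()` divides by n - 1 (the sample formula). A short sketch of the two conventions, using separately seeded samples as an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # illustrative seed
samples = rng.normal(100, 15, 1000)
n = samples.size

pop_var = np.var(samples)                # divides by n      (ddof=0)
samp_var = np.var(samples, ddof=1)       # divides by n - 1  (ddof=1)
samp_std = np.std(samples, ddof=1)

print(pop_var, samp_var, samp_std)

# The sample variance matches stats.describe() and is the
# population variance rescaled by n / (n - 1).
print(np.isclose(samp_var, stats.describe(samples).variance))
print(np.isclose(pop_var * n / (n - 1), samp_var))
```

With 1,000 samples the two versions differ by only about 0.1%, but the gap grows as the sample gets smaller.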

Compute the **skewness** and **kurtosis**

In [48]:
skewness = stats.skew(samples)
skewness


-0.24458560864920129

In [59]:
kurtosis = stats.kurtosis(samples)
kurtosis

0.051361819336270376
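To see what these two numbers measure, they can be reproduced from standardized values. This is a sketch assuming the SciPy defaults (`bias=True`, and `fisher=True` for kurtosis, which subtracts 3 so that a normal distribution scores 0); the seed is illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # illustrative seed
samples = rng.normal(100, 15, 1000)

# Standardize with the population standard deviation (ddof=0).
z = (samples - samples.mean()) / samples.std()

skew_manual = np.mean(z ** 3)            # third standardized moment
kurt_manual = np.mean(z ** 4) - 3        # excess (Fisher) kurtosis

print(np.isclose(skew_manual, stats.skew(samples)))
print(np.isclose(kurt_manual, stats.kurtosis(samples)))
```

Both values should be close to 0 for normally distributed data, as they are in the outputs above.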

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [61]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [79]:
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
y

array([ 2,  1,  4,  5,  8, 12, 18, 25, 96, 48])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [80]:
r = np.corrcoef(x,y)
r

array([[1.        , 0.75864029],
       [0.75864029, 1.        ]])
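`np.corrcoef()` returns a 2x2 correlation matrix: the diagonal holds each array's correlation with itself (always 1) and the off-diagonal entries hold the coefficient between x and y. A brief sketch of extracting that single value and checking it against the textbook definition of Pearson's r; the names `r_xy` and `r_manual` are introduced here for illustration:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_matrix = np.corrcoef(x, y)
r_xy = r_matrix[0, 1]                    # off-diagonal entry: corr(x, y)

# Pearson's r from its definition: the sum of products of deviations,
# divided by the square root of the product of summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_xy, r_manual)
```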

## Pandas Correlation Calculation

Run the code below

In [68]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

In [100]:
xy = pd.DataFrame([x,y]).T
xy.columns=["X","Y"]
xy

|   | X | Y |
|---|---|---|
| 0 | 10 | 2 |
| 1 | 11 | 1 |
| 2 | 12 | 4 |
| 3 | 13 | 5 |
| 4 | 14 | 8 |
| 5 | 15 | 12 |
| 6 | 16 | 18 |
| 7 | 17 | 25 |
| 8 | 18 | 96 |
| 9 | 19 | 48 |


Call the relevant method to calculate Pearson's r correlation.

In [103]:
xy.corr()

|   | X | Y |
|---|---|---|
| X | 1.0 | 0.75864 |
| Y | 0.75864 | 1.0 |


In [73]:
r,p = stats.pearsonr(x,y)
r

0.7586402890911869
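When only one pair of columns is of interest, a full correlation matrix is more than is needed. A small sketch, assuming the same x and y as above, that uses `Series.corr()` for the single coefficient and also keeps the p-value that `stats.pearsonr()` returns alongside r:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Series.corr() defaults to Pearson's r and returns one float.
r_series = x.corr(y)

# pearsonr() returns the coefficient and a two-sided p-value.
r_scipy, p_value = stats.pearsonr(x, y)

print(r_series, r_scipy, p_value)
```

The p-value tests the null hypothesis that the two variables are uncorrelated; with only 10 points it is worth reporting next to r.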

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [104]:
xy.corr(method="spearman")

|   | X | Y |
|---|---|---|
| X | 1.0 | 0.975758 |
| Y | 0.975758 | 1.0 |


In [74]:
rho = stats.spearmanr(x,y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
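Spearman's rho is much higher than Pearson's r here (about 0.98 versus 0.76) because it depends only on rank order: y rises almost monotonically with x, but not linearly, and the large value 96 pulls Pearson's r down. A sketch showing that Spearman's rho is Pearson's r computed on ranks, assuming the same data:

```python
import numpy as np
from scipy import stats

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank (1 = smallest).
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)

# Pearson's r on the ranks is Spearman's rho.
rho_manual = np.corrcoef(rank_x, rank_y)[0, 1]

# Unpack the result as a tuple, which works across SciPy versions.
rho_scipy, p_value = stats.spearmanr(x, y)

print(rho_manual, rho_scipy)
```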

## Seaborn Dataset Tips

Import Seaborn Library

In [69]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [75]:
tips = sns.load_dataset("tips")

In [77]:
tips

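|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |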


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [81]:
tips.describe()

|   | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


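Call the relevant method to calculate the pairwise Pearson's r correlation of the numeric columns.

In [82]:
tips.corr(numeric_only=True)  # skip the categorical columns; pandas 2.0+ raises an error without this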

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
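Only `total_bill`, `tip`, and `size` appear in the correlation matrix because the other columns are categorical. One version-independent way to make that selection explicit is `select_dtypes()`. The sketch below runs without downloading the dataset by rebuilding the first five rows shown earlier (with the `smoker`, `day`, and `time` columns left out for brevity), so its coefficients are for illustration only and will not match the full 244-row result:

```python
import pandas as pd

# First five rows of the tips dataset as displayed above
# (smoker, day, and time omitted for brevity).
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson's r.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()

print(corr)
```

The same two lines can be applied to the full `tips` DataFrame in place of `sample`.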
