## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [26]:
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [3]:
samples = np.random.normal(100, 15, 1000)
samples

array([107.37005782, 103.20202849,  68.68237823, 104.62808392,
        82.89052923,  87.02218425,  96.87390566, 119.23791148,
       115.72492597,  80.03052679, 110.17413638,  96.50075095,
        95.85339946,  90.49454016, 110.2747068 ,  94.4711567 ,
       105.15266727, 100.28964051,  98.44837953, 106.24053985,
        84.19394694, 111.10830316, 113.76700496,  83.79240836,
        90.83220502, 113.09237389,  99.01117438,  94.89662979,
        94.62411607,  79.440125  , 108.16620819,  91.74347231,
       101.71382139, 105.08553963, 132.54979005, 103.925558  ,
       102.12773713, 114.0262936 , 117.03945291,  99.08857425,
       102.06974452,  90.53261664, 105.84721132, 124.39710172,
       109.40020226, 113.44052892, 106.84184465, 104.88467756,
       108.11637266, 124.09051619, 111.55606933,  91.81560614,
       110.27155141, 117.09269435,  97.34373182, 104.94938978,
       110.54388449, 100.95785144,  90.99561207, 111.44860204,
       101.35657869,  85.12616217, 114.4444045 , 113.94...
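`np.random.normal()` draws from NumPy's legacy global generator, so the numbers above change on every run. A minimal sketch of a reproducible alternative, assuming NumPy 1.17 or later (the version that introduced `np.random.default_rng`); the seed value 42 is an arbitrary choice, not part of the original notebook.

```python
import numpy as np

# A seeded Generator: the same seed always yields the same 1,000 draws.
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)   # (1000,)
print(samples[:5])     # first five draws, identical on every run with seed 42
```

Creating a fresh `default_rng(42)` and drawing again reproduces exactly the same array, which makes the statistics computed below repeatable.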

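Compute the **mean**, **median**, and **mode**. Because the samples come from a continuous distribution, every value almost surely occurs only once, so `stats.mode()` simply reports the smallest value with a count of 1 (it matches the minimum computed further down).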

In [15]:
mean = samples.mean()
print("mean   :", mean)
median = np.median(samples)
print("median :", median)
mode = sp.stats.mode(samples)
print("mode   :", mode)

mean   : 100.5530007452375
median : 100.79176448458554
mode   : ModeResult(mode=array([55.66244208]), count=array([1]))
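Since `stats.mode()` is not informative for continuous data, a common workaround is to estimate the mode as the midpoint of the most populated histogram bin. The sketch below does that and also recomputes the mean and median by hand to show what the library calls do. The helper name `histogram_mode`, the bin count of 30, and the seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def histogram_mode(data, bins=30):
    """Hypothetical helper: estimate the mode as the centre of the fullest histogram bin."""
    counts, edges = np.histogram(data, bins=bins)
    i = np.argmax(counts)                    # index of the most populated bin
    return (edges[i] + edges[i + 1]) / 2     # midpoint of that bin

rng = np.random.default_rng(0)
samples = rng.normal(100, 15, 1000)

# Mean: sum of the values divided by how many there are.
manual_mean = samples.sum() / samples.size

# Median: middle of the sorted data (average of the two middle values when n is even).
s = np.sort(samples)
n = s.size
manual_median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

print(manual_mean, samples.mean())
print(manual_median, np.median(samples))
print(histogram_mode(samples))   # should land near 100 for N(100, 15) data
```

The estimate depends on the bin count: too few bins blur the peak, too many make it noisy.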


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [19]:
# min_val / max_val avoid shadowing Python's built-in min() and max()
min_val = samples.min()
print("min :", min_val)
max_val = samples.max()
print("max :", max_val)
q1 = np.quantile(samples, .25)
print("q1 :", q1)
q3 = np.quantile(samples, .75)
print("q3 :", q3)
# The IQR is the distance between the quartiles, not the 0.50 quantile (the median)
iqr = q3 - q1
print("iqr :", round(iqr, 4))

min : 55.66244207921135
max : 143.87421096040654
q1 : 90.73692499393975
q3 : 110.68864849877345
iqr : 19.9517
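SciPy also provides `scipy.stats.iqr()`, which computes Q3 − Q1 directly; with its defaults (the 25th to 75th percentile range, linear interpolation) it should agree with the NumPy quantile difference. A sketch that checks this and gets the whole five-number summary from a single `np.percentile()` call; the seed is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = rng.normal(100, 15, 1000)

# Five-number summary: min, Q1, median, Q3, max in one call.
five_num = np.percentile(samples, [0, 25, 50, 75, 100])
q1, q3 = five_num[1], five_num[3]

iqr_numpy = q3 - q1              # spread of the middle 50% of the data
iqr_scipy = stats.iqr(samples)   # defaults: rng=(25, 75), linear interpolation

print("five-number summary:", five_num)
print("IQR (NumPy):", iqr_numpy, " IQR (SciPy):", iqr_scipy)
```

For normal data the IQR is roughly 1.349 standard deviations, so values around 20 are expected when σ = 15.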


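Compute the **variance** and **standard deviation**. NumPy's `var()` and `std()` default to `ddof=0`, so they divide by n and return the population versions.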

In [24]:
variance = samples.var()
print("variance: ", variance)
std_dev = samples.std()
print("std_dev :", std_dev)

variance:  212.9477610487781
std_dev : 14.592729732602399
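The unbiased sample variance divides the sum of squared deviations by n − 1 instead of n; NumPy gives it with `ddof=1`, and pandas uses it by default (including the `std` row of `DataFrame.describe()`). A sketch comparing both definitions with the library calls; the seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(100, 15, 1000)
n = samples.size
sq_dev = (samples - samples.mean()) ** 2   # squared deviations from the mean

pop_var = sq_dev.sum() / n            # divide by n      -> samples.var()
sample_var = sq_dev.sum() / (n - 1)   # divide by n - 1  -> samples.var(ddof=1)

print(pop_var, samples.var())
print(sample_var, samples.var(ddof=1))
print(np.sqrt(sample_var), samples.std(ddof=1))
```

With n = 1,000 the two differ by only about 0.1%, but the gap matters for small samples.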


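Compute the **skewness** and **kurtosis**. `kurtosis()` returns excess (Fisher) kurtosis by default, so values near 0 are expected for normally distributed data.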

In [32]:
skewness = sp.stats.skew(samples)
print("skewness :",skewness)
kurtosis = sp.stats.kurtosis(samples)
print("kurtosis :",kurtosis)

skewness : -0.028750096854409067
kurtosis : -0.021586738329131716
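With default arguments, `scipy.stats.skew()` returns the biased sample skewness g1 = m3 / m2^(3/2) and `scipy.stats.kurtosis()` returns the excess kurtosis g2 = m4 / m2² − 3, where mk is the k-th central moment. A sketch that recomputes both from the moments; the seed is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
samples = rng.normal(100, 15, 1000)

dev = samples - samples.mean()
m2 = np.mean(dev ** 2)   # 2nd central moment (population variance)
m3 = np.mean(dev ** 3)   # 3rd central moment
m4 = np.mean(dev ** 4)   # 4th central moment

skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3   # subtract 3 so a normal distribution scores about 0

print(skewness, stats.skew(samples))
print(excess_kurtosis, stats.kurtosis(samples))              # fisher=True by default
print(m4 / m2 ** 2, stats.kurtosis(samples, fisher=False))   # Pearson (non-excess) kurtosis
```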


## NumPy Correlation Calculation

Create an array x of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [34]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

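Then create a second array y containing 10 arbitrary integers. Here they are drawn at random from 0 to 9 with `np.random.randint()`; hand-picked values passed to `np.array()` would work just as well.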

In [37]:
y = np.random.randint(10, size = 10)
y

array([0, 2, 1, 4, 5, 2, 2, 9, 4, 9])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y with `np.corrcoef()`.

In [38]:
r = np.corrcoef(x, y)  # 2x2 matrix; r[0, 1] is Pearson's r between x and y
r

array([[1.        , 0.74107391],
       [0.74107391, 1.        ]])
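Pearson's r is the covariance of x and y divided by the product of their standard deviations: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). A sketch that applies this definition directly and checks it against `np.corrcoef()`, reusing the y values printed above.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([0, 2, 1, 4, 5, 2, 2, 9, 4, 9])   # the y values printed above

dx = x - x.mean()   # deviations from the mean of x
dy = y - y.mean()   # deviations from the mean of y
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_numpy = np.corrcoef(x, y)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(r_manual, r_numpy)            # both approx. 0.741
```

By hand: Σdx·dy = 63, Σdx² = 82.5, Σdy² = 87.6, so r = 63 / √7227 ≈ 0.741.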

## Pandas Correlation Calculation

Run the code below

In [39]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call `Series.corr()` to calculate Pearson's r.

In [44]:
r = x.corr(y, method="pearson")
r

0.7586402890911867

OPTIONAL. Call `Series.corr()` with `method="spearman"` to calculate Spearman's rho.

In [45]:
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
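Spearman's rho is Pearson's r computed on the ranks of the data. It depends only on the order of the values, so the extreme value 96 and the non-linear growth of y do not pull it down the way they affect Pearson's r (0.976 versus 0.759 above). A sketch that reproduces the result by ranking first; when there are no ties it also matches the shortcut ρ = 1 − 6Σd² / (n(n² − 1)), where d is the rank difference for each pair.

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho = Pearson's r of the ranks (ties would get average ranks).
rho_ranks = x.rank().corr(y.rank(), method="pearson")

# Shortcut formula, valid only when there are no tied values.
d = x.rank() - y.rank()
n = len(x)
rho_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(rho_ranks, rho_formula, x.corr(y, method="spearman"))   # all approx. 0.9758
```

Here only two pairs of ranks are swapped, so Σd² = 4 and ρ = 1 − 24/990 ≈ 0.9758.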

## Seaborn Dataset Tips

Import the Seaborn library

In [46]:
import seaborn as sns

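Load the "tips" dataset from Seaborn (`load_dataset()` downloads it from the seaborn-data repository, so the first call needs network access)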

In [47]:
tips = sns.load_dataset("tips")
tips

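|     | total_bill | tip  | sex    | smoker | day | time   | size |
|-----|------------|------|--------|--------|-----|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ... | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat | Dinner | 2    |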


Generate descriptive statistics that summarize the central tendency, dispersion, and shape of each numeric column's distribution with `DataFrame.describe()`.

In [48]:
tips.describe()

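|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |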
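`describe()` summarizes only the numeric columns by default, and its `std` row is the sample standard deviation (`ddof=1`). Passing `include="all"` adds `unique`, `top`, and `freq` rows for the non-numeric columns. A sketch using the first five rows displayed above as a small stand-in for the full dataset, so it runs offline; in the real dataset the text columns are `category` dtype rather than plain strings, which `describe()` treats the same way. The name `tips_head` is hypothetical.

```python
import pandas as pd

# First five rows of the tips table shown above (stand-in for the full 244-row dataset).
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

numeric_summary = tips_head.describe()            # numeric columns only
full_summary = tips_head.describe(include="all")  # adds unique/top/freq for text columns

print(numeric_summary)
print(full_summary)

# The std row uses the sample standard deviation (ddof=1).
print(numeric_summary.loc["std", "tip"], tips_head["tip"].std(ddof=1))
```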


Call `Series.corr()` to calculate Pearson's r between the `total_bill` and `tip` columns

In [50]:
r = tips.total_bill.corr(tips.tip, method="pearson")
r

0.6757341092113641
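For a full pairwise matrix, `DataFrame.corr()` correlates every pair of numeric columns. pandas 2.0 and later raise an error on non-numeric columns unless they are excluded, so the sketch selects the numeric columns first. It again uses the five rows shown above as hypothetical stand-in data, so its numbers will not match the full-dataset r of 0.676; on the real data the equivalent call would be `tips.select_dtypes("number").corr()`.

```python
import pandas as pd

# First five rows of the tips table shown above (stand-in data).
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute Pearson's r for every pair of them.
corr_matrix = tips_head.select_dtypes("number").corr(method="pearson")
print(corr_matrix)

# Each off-diagonal entry equals the corresponding Series-to-Series call.
print(corr_matrix.loc["total_bill", "tip"], tips_head["total_bill"].corr(tips_head["tip"]))
```

The matrix is symmetric with 1.0 on the diagonal, since every column is perfectly correlated with itself.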