## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [19]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [20]:
samples = np.random.normal(loc = 100, scale = 15, size = 1000)
samples

array([125.20050744,  97.55268427, 107.90280542,  87.10547081,
        86.19302948,  74.00478853, 103.71027009, 100.79803558,
        84.51656748,  86.9621688 , 124.29803987,  74.91333954,
       104.36658919,  82.30133301, 107.87437834, 117.83299242,
       115.56088016,  93.58372225,  79.18341206, 106.20924282,
       106.95469732,  88.76728152,  71.04336919, 105.34134266,
        95.02409208,  85.72841147,  72.4226461 ,  81.67780077,
        89.39211575, 109.82516828, 104.45296196,  99.88326325,
        93.34758143,  92.36123584,  97.12864271,  60.45397306,
        95.83104088, 107.00815677,  93.02205028, 101.06043193,
        97.19449555,  80.85281452, 112.92398484, 100.59318968,
        86.58875711, 126.71843867,  83.09192101, 116.69049431,
       107.35567284, 101.1640209 , 123.51466752, 104.06605482,
       111.61831011, 101.35953664, 106.34027265, 107.72094691,
        78.37663653, 124.72455528,  78.57642544,  61.81264734,
        72.1893945 ,  76.03257151, 106.56601565, 111.07
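
Because the call above is unseeded, the values change on every run, so the outputs shown in this notebook will not be reproduced exactly. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator API is available; the seed `42` and the names `samples_seeded` and `samples_again` are illustrative and not part of the original notebook:

```python
import numpy as np

# Seeded generator: the same seed yields the same draws on every run.
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary, illustrative seed
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# A second generator with the same seed reproduces the array exactly.
rng_again = np.random.default_rng(seed=42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(samples_seeded.shape)
print(np.array_equal(samples_seeded, samples_again))
```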

Compute the **mean**, **median**, and **mode**

In [21]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)  # continuous draws are all unique, so this returns the smallest value with a count of 1

In [22]:
mean

99.57831247365986

In [23]:
median

99.96446423522144

In [24]:
mode

ModeResult(mode=array([52.2603512]), count=array([1]))
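
The reported mode equals the minimum of the samples and has a count of 1, which says little about where the data cluster. One possible workaround, sketched here under the assumption that a binned mode is acceptable, is to take the most populated histogram bin. The seed, the bin count of 20, and the names `data`, `modal_bin`, and `modal_midpoint` are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
data = rng.normal(loc=100, scale=15, size=1000)

# Bin the data and locate the bin holding the most observations.
counts, edges = np.histogram(data, bins=20)  # 20 bins is an arbitrary choice
k = np.argmax(counts)
modal_bin = (edges[k], edges[k + 1])
modal_midpoint = 0.5 * (edges[k] + edges[k + 1])

print(modal_bin, counts[k])
```

The result depends on the bin width, so it is best read as a rough location of the peak rather than a precise statistic.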

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [25]:
samples.min()
# np.min(samples)

52.260351200538224

In [26]:
samples.max()
# np.max(samples)

151.89516277605512

In [27]:
np.percentile(samples, 25)

89.76207643014058

In [28]:
np.percentile(samples, 75)

108.72832997333317

In [29]:
np.percentile(samples, [25, 75]) # for both of them at once

array([ 89.76207643, 108.72832997])

In [30]:
iqr = np.percentile(samples, 75)-np.percentile(samples, 25)
iqr

# stats.iqr(samples)

18.966253543192593
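
The commented-out `stats.iqr(samples)` should give the same number as the percentile difference. A small check on made-up values (the array `data` is hypothetical), using the default linear interpolation in both functions:

```python
import numpy as np
from scipy import stats

data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0])  # made-up values

# Q1 and Q3 in one call, then the IQR as their difference.
q1, q3 = np.percentile(data, [25, 75])
iqr_manual = q3 - q1

# SciPy computes the same quantity directly.
iqr_scipy = stats.iqr(data)

print(q1, q3, iqr_manual, iqr_scipy)
```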

Compute the **variance** and **standard deviation**

In [35]:
variance = np.var(samples)  # population variance (ddof=0); use np.var(samples, ddof=1) for the sample variance
std_dev = np.std(samples)  # population standard deviation (ddof=0); use np.std(samples, ddof=1) for the sample version

In [36]:
variance

212.49425658205593

In [37]:
std_dev

14.577182738171869
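
The `ddof` argument only changes the divisor: `ddof=0` divides the summed squared deviations by n, `ddof=1` by n - 1. A short illustration on a made-up array (`data` is hypothetical) that writes both versions out from the definition:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values
n = data.size

pop_var = np.var(data)             # ddof=0: divide by n
sample_var = np.var(data, ddof=1)  # ddof=1: divide by n - 1

# Same quantities written out from the definition.
squared_dev = (data - data.mean()) ** 2
assert np.isclose(pop_var, squared_dev.sum() / n)
assert np.isclose(sample_var, squared_dev.sum() / (n - 1))

print(pop_var, sample_var)
```

With 1,000 samples the two estimates differ by a factor of 1000/999, so the choice matters far more for small data sets.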

Compute the **skewness** and **kurtosis**

In [42]:
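skewness = stats.skew(samples)  # biased estimate by default; use stats.skew(samples, bias=False) for the bias-corrected sample estimate
kurtosis = stats.kurtosis(samples)  # excess (Fisher) kurtosis, biased by default; use stats.kurtosis(samples, bias=False) for the sample estimate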

In [43]:
skewness

0.047176924401094764

In [44]:
kurtosis

0.2094385899465192
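
Both values are close to 0, as expected for normal data, because `stats.kurtosis()` reports excess kurtosis by default. A sketch of how the default (biased) estimates relate to the central moments, on a made-up right-skewed array (`data` is hypothetical):

```python
import numpy as np
from scipy import stats

data = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 10.0])  # made-up, right-skewed

# Central moments (population form, matching the bias=True default).
dev = data - data.mean()
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)
m4 = np.mean(dev ** 4)

skew_manual = m3 / m2 ** 1.5
excess_kurt_manual = m4 / m2 ** 2 - 3.0  # Fisher definition: normal -> 0

print(skew_manual, stats.skew(data))
print(excess_kurt_manual, stats.kurtosis(data))
print(stats.kurtosis(data, fisher=False))  # Pearson definition: normal -> 3
```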

## NumPy Correlation Calculation

Create an array `x` of integers between 10 (inclusive) and 20 (exclusive) using `np.arange()`.

In [72]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [71]:
y = np.array([20, 48, 34, 13, 55, 27, 30, 33, 41, 18])
y

array([20, 48, 34, 13, 55, 27, 30, 33, 41, 18])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y. `np.corrcoef()` returns the 2×2 correlation matrix; the off-diagonal entries are Pearson's r.

In [73]:
r = np.corrcoef(x, y)
r

array([[ 1.        , -0.06741507],
       [-0.06741507,  1.        ]])
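
To work with the coefficient as a single number, index the matrix. The sketch below reuses the same `x` and `y` and checks the result against the textbook definition of Pearson's r; the names `r_matrix`, `r_xy`, and `r_manual` are illustrative:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([20, 48, 34, 13, 55, 27, 30, 33, 41, 18])

r_matrix = np.corrcoef(x, y)
r_xy = r_matrix[0, 1]  # off-diagonal entry: correlation between x and y

# Pearson's r from its definition: summed cross-deviations divided by
# the square root of the product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_xy, r_manual)
```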

## Pandas Correlation Calculation

Run the code below

In [74]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [75]:
r = x.corr(y)
r

0.7586402890911867

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [77]:
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
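
Spearman's rho is Pearson's r computed on the ranks of the data, which is why it is much higher here: `y` increases almost monotonically with `x`, but not linearly. A brief check using the same two series (the name `rho_from_ranks` is illustrative):

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

rho = x.corr(y, method="spearman")

# Rank both series, then apply the default Pearson correlation.
rho_from_ranks = x.rank().corr(y.rank())

# With no ties, rho also equals 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the difference between paired ranks.
d = x.rank() - y.rank()
n = len(x)
rho_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(rho, rho_from_ranks, rho_formula)
```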

## Seaborn Dataset Tips

Import Seaborn Library

In [78]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [79]:
tips = sns.load_dataset("tips")
tips

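|  | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |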


Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [80]:
tips.describe()

|  | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |
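
Note that `describe()` skips the non-numeric columns by default and that its `std` row is the sample standard deviation (`ddof=1`), unlike the `np.std()` default. A small sketch that confirms this on the first five rows of the numeric columns from the preview above, so that no download is needed; the frame `df` is an illustrative stand-in for `tips`:

```python
import numpy as np
import pandas as pd

# First five rows of the numeric columns shown in the preview above.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
})

summary = df.describe()

# describe() reports the sample standard deviation (ddof=1) ...
assert np.isclose(summary.loc["std", "tip"], np.std(df["tip"], ddof=1))
# ... and its 50% row is the column median.
assert np.isclose(summary.loc["50%", "total_bill"], df["total_bill"].median())

print(summary)
```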


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [82]:
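tips.corr(numeric_only=True)  # numeric_only exists in pandas >= 1.5; pandas 2.x raises on the non-numeric columns without it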

|  | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
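
Only the numeric columns appear in the result; `sex`, `smoker`, `day`, and `time` are left out. One version-independent way to make that selection explicit is `select_dtypes()`, sketched below on the first five preview rows (the frame `df` is an illustrative stand-in for `tips`, and the resulting values are not those of the full dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only the numeric columns, then compute pairwise correlations.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr()  # method="pearson" is the default

print(corr_matrix)
```

Each off-diagonal entry matches the pairwise call, for example `df["total_bill"].corr(df["tip"])`.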
