## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [2]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`. The samples are drawn here as a 50 × 20 array.

In [None]:
samples = np.random.normal(loc=100, scale=15, size=(50, 20))
samples

array([[100.58963895,  88.59258087,  92.84014501,  73.39016186,
         79.078682  , 102.56934264,  90.35482732, 103.97094887,
         83.45895863,  98.97185783,  97.93813979,  91.64964437,
         70.12496478, 112.53617104, 117.45568275,  90.69485353,
         76.1882795 ,  86.66041165, 119.79168568, 129.18103091],
       [113.34726089, 102.19350058,  95.40635533,  86.76307089,
         73.69987143,  81.46620418,  96.03918051,  82.12140122,
         89.39165578, 103.82654515,  85.2025918 , 102.82239272,
        108.63275188, 118.61776129,  90.83351309,  79.19260472,
        120.17491804,  89.2658761 , 103.76597042,  94.49560868],
       [ 84.44314473, 112.27694619,  78.90692477,  96.23937125,
        106.12944044,  88.7174902 ,  89.0680381 , 119.5076971 ,
         83.56120745,  91.43479101, 107.06656688, 130.79812066,
         90.07061548,  85.65408275, 109.69999953,  97.4766943 ,
        110.60912626,  91.19046978,  85.69543302, 112.23635674],
       [117.8098832 ,  94.62574368, ...
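Because `np.random.normal()` draws new values on every run, the numbers above will not be reproduced exactly. A minimal sketch of a reproducible draw follows; the seed `42` and the names `rng`, `seeded_samples`, and `flat_samples` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Seeded generator so that every run draws the same values (the seed is arbitrary).
rng = np.random.default_rng(42)

# Same distribution and shape as above: mean 100, standard deviation 15, 50 x 20.
seeded_samples = rng.normal(loc=100, scale=15, size=(50, 20))

# Flatten to one 1-D vector of 1,000 values when a single overall sample is wanted.
flat_samples = seeded_samples.ravel()

print(seeded_samples.shape, flat_samples.shape)
```

Working with the flattened vector avoids the per-column behaviour that some SciPy functions apply to 2-D input by default.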

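Compute the **mean**, **median**, and **mode**. Note that `stats.mode()` works column by column on a 2-D array, and because the samples are continuous every value occurs only once, so it returns the smallest value in each column with a count of 1.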

In [None]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(mode)
print(mean)
print(median)


ModeResult(mode=array([[62.29331007, 58.62629848, 69.26606554, 63.08039399, 71.7576469 ,
        66.21357399, 74.38966581, 61.15524278, 76.48113926, 55.81287343,
        55.51287328, 60.35813314, 59.87587079, 60.37586946, 76.30318679,
        69.90342109, 76.1882795 , 69.46804167, 62.65993221, 72.41209605]]), count=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]))
100.20977650628524
99.67225787357158
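A mode is only informative when values can repeat. One possible approach, sketched below under the assumption that rounding to whole numbers is acceptable, is to round the samples first and count the distinct values with `np.unique()`. The seed and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)

# Round to whole numbers so that repeated values (and hence a mode) can exist.
rounded = np.round(values).astype(int)

# np.unique returns the sorted distinct values and how often each one occurs.
distinct, counts = np.unique(rounded, return_counts=True)

# argmax picks the first (smallest) value among those with the highest count.
mode_value = distinct[np.argmax(counts)]
mode_count = counts.max()

print(mode_value, mode_count)
```

For normally distributed data the rounded mode should usually land near the mean, although ties and sampling noise can move it by a few units.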


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
# Avoid the names `min` and `max`, which would shadow Python's built-in functions
min_val = np.min(samples)
max_val = np.max(samples)
q1 = np.quantile(samples, .25)
q3 = np.quantile(samples, .75)
iqr = q3 - q1
print(min_val, max_val, q1, q3, iqr)

55.51287328335808 150.62389011872256 89.78011703506675 110.57058800758224 20.790470972515493
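As a cross-check, the quartiles can also be obtained with `np.percentile()`, and SciPy offers `stats.iqr()` for the interquartile range directly. The sketch below uses a freshly generated sample (arbitrary seed, illustrative names), so its numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)

# Percentiles use a 0-100 scale; quantiles use 0-1. Both give the same quartiles.
q1, q3 = np.percentile(values, [25, 75])
iqr_manual = q3 - q1

# stats.iqr computes the 75th minus the 25th percentile in one call.
iqr_scipy = stats.iqr(values)

print(q1, q3, iqr_manual, iqr_scipy)
```

For a normal distribution the IQR is roughly 1.35 standard deviations, so a value near 20 is expected when the standard deviation is 15.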


Compute the **variance** and **standard deviation**

In [None]:
variance = np.var(samples)
std_dev = np.std(samples)
print(variance, std_dev)

223.5735964340794 14.952377618094033
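By default `np.var()` and `np.std()` divide by n (the population formulas, `ddof=0`). If the sample estimates that divide by n - 1 are wanted, `ddof=1` can be passed. The following sketch compares the two on an illustrative sample; the seed and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)
n = values.size

# Population variance: sum of squared deviations divided by n.
pop_var = np.var(values)

# Sample variance and standard deviation: divided by n - 1 instead.
sample_var = np.var(values, ddof=1)
sample_std = np.std(values, ddof=1)

print(pop_var, sample_var, sample_std)
```

With 1,000 values the two variances differ only by a factor of n / (n - 1), so the distinction matters mainly for small samples.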


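Compute the **skewness** and **kurtosis**. `stats.skew()` and `stats.kurtosis()` default to `axis=0`, so on the 50 × 20 array they return one value per column (20 each); passing `axis=None` treats all 1,000 values as a single sample.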

In [None]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)
print(skewness, kurtosis)

[-0.15804031 -0.156227    0.07469833 -0.22971525 -0.09808847  0.49624829
  0.57736128 -0.27071782  0.10753128 -0.58322704 -0.74941786  0.04256784
  0.12960781 -0.35816993  0.14618354  0.01898185  0.54448957 -0.10875378
  0.24134259  0.40647518] [-8.95663424e-01  1.87923949e-01  1.18364481e-01 -7.27639504e-01
 -8.43722252e-01  1.01773988e+00  4.32698543e-01 -3.21636328e-01
 -9.68299212e-01  3.63906531e-01  6.62386331e-01 -3.00062677e-01
  1.35752099e-01 -2.85323498e-01 -4.49803424e-01 -1.00018398e+00
  1.00492417e-01 -7.62958732e-01  8.80071397e-04 -3.40260618e-01]
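To get one skewness and one kurtosis value for the whole sample instead of one per column, `axis=None` can be passed. The sketch below also shows the `fisher` flag of `stats.kurtosis()`; the seed and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed, for illustration only
grid = rng.normal(loc=100, scale=15, size=(50, 20))

# Default axis=0: one skewness value per column, i.e. an array of length 20.
per_column_skew = stats.skew(grid)

# axis=None: the array is flattened and a single value is returned.
overall_skew = stats.skew(grid, axis=None)

# Fisher's definition (the default) subtracts 3, so a normal distribution gives 0.
excess_kurtosis = stats.kurtosis(grid, axis=None)

# Pearson's definition keeps the 3, so a normal distribution gives 3.
pearson_kurtosis = stats.kurtosis(grid, axis=None, fisher=False)

print(per_column_skew.shape, overall_skew, excess_kurtosis, pearson_kurtosis)
```

For normally distributed data, both the overall skewness and the excess kurtosis should be close to 0.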


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [None]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [None]:
y = np.array([7,8,9,10,11,12,13,14,15,16])
y

array([ 7,  8,  9, 10, 11, 12, 13, 14, 15, 16])

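Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y. Note that `np.correlate()`, used below, returns the cross-correlation of the two sequences (with default arguments, their dot product), not Pearson's r; `np.corrcoef(x, y)[0, 1]` returns the correlation coefficient.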

In [None]:
r = np.correlate(x,y)
r

array([1750])
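The value 1750 is the dot product of `x` and `y`, which is what `np.correlate()` returns with its default mode. A minimal sketch of computing Pearson's r with `np.corrcoef()` follows, checked against the textbook formula; the names `corr_matrix` and `manual_r` are illustrative.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([7, 8, 9, 10, 11, 12, 13, 14, 15, 16])

# np.corrcoef returns a 2 x 2 matrix; the off-diagonal entry is r for (x, y).
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]

# Same quantity from the definition: covariance over the product of spreads.
dx = x - x.mean()
dy = y - y.mean()
manual_r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# For comparison: np.correlate with default arguments is just the dot product.
cross = np.correlate(x, y)

print(r, manual_r, cross)
```

Because `y` equals `x - 3` here, the relationship is perfectly linear and r is 1 up to floating-point rounding.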

## Pandas Correlation Calculation

Run the code below

In [4]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
r, p = stats.pearsonr(x, y) 

Call the relevant method to calculate Pearson's r correlation.

In [5]:
r

0.758640289091187
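The cells above obtain r from `stats.pearsonr()`. With pandas itself, the corresponding method is `Series.corr()`, whose default is Pearson's r. The sketch below compares the two; the names `r_pandas`, `r_scipy`, and `p_value` are illustrative.

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# pandas: Series.corr defaults to method="pearson".
r_pandas = x.corr(y)

# SciPy: returns the coefficient together with a two-sided p-value.
r_scipy, p_value = stats.pearsonr(x, y)

print(r_pandas, r_scipy, p_value)
```

Both calls should agree up to floating-point rounding; SciPy additionally reports the p-value.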

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [6]:
stats.spearmanr(x, y)

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
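Spearman's rho is Pearson's r applied to the ranks of the data, which explains why it is much higher than Pearson's r here: `y` increases almost monotonically with `x`, but not linearly. A short sketch using pandas methods follows; the variable names are illustrative.

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in route: ask Series.corr for the Spearman method.
rho_pandas = x.corr(y, method="spearman")

# Manual route: rank both series, then take Pearson's r of the ranks.
rho_from_ranks = x.rank().corr(y.rank())

print(rho_pandas, rho_from_ranks)
```

Only two adjacent pairs of `y` values are out of order, so the ranks line up almost perfectly and rho is close to 1.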

## Seaborn Dataset Tips

Import Seaborn Library

In [7]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [8]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize the central tendency and dispersion of the dataset

In [9]:
tips.head()

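|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |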


In [10]:
tips.describe()

|   | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |
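One detail worth knowing when comparing `describe()` with the NumPy results earlier: the `std` row in pandas is the sample standard deviation (`ddof=1`), whereas `np.std()` defaults to the population formula (`ddof=0`). The sketch below illustrates this with the five `tip` values shown by `tips.head()`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The five tip values from the head() output, used as a small stand-in sample.
tip = pd.Series([1.01, 1.66, 3.50, 3.31, 3.61], name="tip")

summary = tip.describe()

# pandas divides by n - 1 by default; NumPy divides by n unless ddof=1 is given.
pandas_std = tip.std()
numpy_pop_std = np.std(tip.to_numpy())
numpy_sample_std = np.std(tip.to_numpy(), ddof=1)

print(summary["std"], pandas_std, numpy_pop_std, numpy_sample_std)
```

The `50%` row of `describe()` is the median, and the `25%` and `75%` rows are Q1 and Q3.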


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [12]:
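# numeric_only=True skips the categorical columns (required in pandas 2.0 and later)
tips.corr(numeric_only=True)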

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
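The correlation matrix covers only the numeric columns. One way to make that explicit, sketched below, is to select the numeric columns with `select_dtypes()` before calling `corr()`. The small DataFrame reuses a few columns of the five rows shown by `tips.head()` so that the example runs without downloading the dataset; the name `sample` is illustrative.

```python
import pandas as pd

# Stand-in for the tips dataset: a few columns of the five rows printed by head().
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = sample.select_dtypes(include="number")
pearson = numeric.corr()

# The same call with rank-based (Spearman) correlations.
spearman = numeric.corr(method="spearman")

print(pearson)
print(spearman)
```

With only five rows these coefficients are not meaningful estimates; the point is the shape of the result, a symmetric matrix with ones on the diagonal.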
