## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [124]:
import numpy as np  
import pandas as pd
from scipy import stats as st

Randomly generate 100 samples from a normal distribution with mean 0 (the default `loc`) and standard deviation 15 using `np.random.normal()`

In [125]:
sample = np.random.normal(scale = 15, size = 100)
sample

array([ 1.81016930e+01, -6.10165450e+00,  6.98017296e+00, -1.43780812e+01,
        1.62184667e+01,  2.81141450e+00, -2.79398508e+00, -1.10976400e+01,
        1.98262023e+01, -2.21002125e+01, -1.71446046e+01,  3.32330741e+00,
        1.47725220e+01, -4.87303276e+00, -1.20294061e+01,  7.16158344e+00,
        5.29041754e+00,  3.59719384e+00,  9.68680029e+00, -9.33952292e+00,
        3.45027237e+00,  8.04196122e+00,  1.02361974e+01, -3.44008237e-01,
       -1.33401071e+01,  9.16510997e+00,  1.39236757e+01,  1.04817952e+01,
       -9.60573239e+00, -1.23496526e+01,  4.76014506e+00, -7.25576989e+00,
       -3.58687806e+01,  4.68129309e+01,  4.37571910e+01,  2.10202221e+01,
       -1.51310020e+01, -1.15688982e+01,  1.15753915e+01,  1.74999003e+01,
        9.73960142e+00, -1.35159354e+01, -2.69211726e+01, -3.56517872e+00,
        2.35609813e+01,  1.57618054e+01, -6.93554501e+00,  1.16745632e-01,
        2.23546163e+01,  1.26502271e+01,  8.01085514e+00,  1.11840793e+01,
        1.93299268e+01,  
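
The cell above keeps NumPy's default mean of 0 and draws only 100 values, and because it is unseeded, its numbers change on every run. Below is a minimal sketch for drawing 1,000 reproducible samples with mean 100 and standard deviation 15 using NumPy's `Generator` API. The seed value 42 is an arbitrary assumption.

```python
import numpy as np

# Seeded generator so the draws are reproducible (seed 42 is arbitrary)
rng = np.random.default_rng(42)

# 1,000 draws from a normal distribution with mean 100 and standard deviation 15
sample = rng.normal(loc=100, scale=15, size=1000)

print(sample.shape)              # (1000,)
print(round(sample.mean(), 1))   # close to 100; exact value depends on the seed
print(round(sample.std(), 1))    # close to 15
```

With 1,000 draws, the standard error of the mean is 15 / √1000 ≈ 0.47, so the sample mean should land within about a unit of 100.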

Compute the **mean**, **median**, and **mode**

In [126]:
print("mean :", np.mean(sample))
print("median :", np.median(sample))
print("mode :", st.mode(sample))

mean : 2.0789601131755227
median : 3.2741284782252285
mode : ModeResult(mode=array([-37.48631377]), count=array([1]))
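
The reported mode is the same value as the sample minimum, with a count of 1. With continuous data, every value is almost surely unique, so all values tie, and `scipy.stats.mode` returns the smallest of the tied values. The sketch below reproduces this on a fresh sample. It then shows one common workaround: take the most populated histogram bin as the "modal" region. The seed and the choice of 20 bins are arbitrary assumptions.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.normal(scale=15, size=100)

# Every value occurs once, so scipy reports the smallest value with count 1
result = st.mode(sample)
print(result)

# A more informative summary for continuous data: the fullest histogram bin
counts, edges = np.histogram(sample, bins=20)  # 20 bins is an arbitrary choice
i = np.argmax(counts)
print(f"modal bin: {edges[i]:.2f} to {edges[i + 1]:.2f} ({counts[i]} values)")
```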


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [127]:
print("min :", np.min(sample))
print("max :", np.max(sample))
q1 = np.percentile(sample, 25)
print("Q1 :", round(q1, 2))
q3 = np.percentile(sample, 75)
print("Q3 :", round(q3, 2))
print("IQR :", round(st.iqr(sample), 2))

min : -37.486313768117164
max : 46.81293087097648
Q1 : -9.62
Q3 : 13.47
IQR : 23.08
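
The IQR is simply Q3 − Q1. Here Q1 and Q3 are rounded before printing, so 13.47 − (−9.62) = 23.09 differs slightly from the reported 23.08. The sketch below, on a fresh seeded sample (the seed is an arbitrary assumption), checks that `st.iqr` agrees with the percentile difference. Both use linear interpolation by default.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(1)  # arbitrary seed
sample = rng.normal(scale=15, size=100)

# Unrounded quartiles and their difference
q1, q3 = np.percentile(sample, [25, 75])
iqr_manual = q3 - q1

# scipy.stats.iqr also interpolates linearly by default, like np.percentile
print(iqr_manual, st.iqr(sample))
print(np.isclose(iqr_manual, st.iqr(sample)))  # True
```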


Compute the **variance** and **standard deviation**

In [128]:
print("variance :", np.var(sample))
print("standard deviation :", np.std(sample))

variance : 235.0835809933186
standard deviation : 15.33243558581997
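
`np.var` and `np.std` default to the population formulas, which divide by n (`ddof=0`). pandas' `var` and `std` default to the sample formulas, which divide by n − 1 (`ddof=1`). The two libraries can therefore report different numbers for the same data. A small sketch on a fresh seeded sample (the seed is an arbitrary assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed
sample = rng.normal(scale=15, size=100)
n = len(sample)

pop_var = np.var(sample)           # divides by n (ddof=0, NumPy's default)
samp_var = np.var(sample, ddof=1)  # divides by n - 1 (unbiased sample variance)
pd_var = pd.Series(sample).var()   # pandas defaults to ddof=1

print(pop_var, samp_var, pd_var)
print(np.isclose(samp_var, pd_var))                   # True
print(np.isclose(samp_var, pop_var * n / (n - 1)))    # True
```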


Compute the **skewness** and **kurtosis**

In [129]:
df = pd.DataFrame(sample)
print("skewness :", df.skew())
print("kurtosis :", df.kurt())

skewness : 0    0.139594
dtype: float64
kurtosis : 0    0.223558
dtype: float64
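
Because `sample` was wrapped in a one-column DataFrame, each result prints as a one-element Series labelled `0`. Calling the same methods on a `pd.Series` returns plain scalars instead. pandas reports the bias-corrected sample skewness and the bias-corrected *excess* kurtosis, which is 0 for a normal distribution. The sketch below, on a fresh seeded sample (the seed is an arbitrary assumption), shows that `scipy.stats` gives the same numbers once `bias=False` is set.

```python
import numpy as np
import pandas as pd
from scipy import stats as st

rng = np.random.default_rng(3)  # arbitrary seed
sample = rng.normal(scale=15, size=100)
s = pd.Series(sample)

# pandas: scalar results, bias-corrected, kurtosis reported as excess kurtosis
pd_skew, pd_kurt = s.skew(), s.kurt()

# scipy: bias=False applies the same correction; fisher=True subtracts 3
sp_skew = st.skew(sample, bias=False)
sp_kurt = st.kurtosis(sample, fisher=True, bias=False)

print(np.isclose(pd_skew, sp_skew), np.isclose(pd_kurt, sp_kurt))  # True True
```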


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [130]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [131]:
random_10 = np.random.randint(100, size=10)
y = np.array(random_10)
y

array([35, 88, 55, 62, 37, 98, 25, 82, 41, 68])

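Once you have two arrays of the same length, you can compute the Pearson **correlation coefficient** between x and y with `np.corrcoef()`. It returns a 2×2 correlation matrix: the diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entries are r.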

In [132]:
r = np.corrcoef(x, y)
r

array([[1.        , 0.03920898],
       [0.03920898, 1.        ]])
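
Pearson's r is the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it by hand for the `x` and `y` printed above and compares it with the off-diagonal entry of `np.corrcoef`.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([35, 88, 55, 62, 37, 98, 25, 82, 41, 68])  # the y values printed above

# Deviations from each array's mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the 2x2 matrix; r is the [0, 1] (or [1, 0]) entry
r_matrix = np.corrcoef(x, y)
print(r_manual, r_matrix[0, 1])  # both about 0.0392
```

A value this close to 0 means the randomly drawn `y` shows essentially no linear relationship with `x`.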

## Pandas Correlation Calculation

Run the code below

In [133]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant `Series` method to calculate Pearson's r correlation.

In [134]:
x.corr(y, method = 'pearson')

0.7586402890911867

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [135]:
x.corr(y, method="spearman")

0.9757575757575757
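
Spearman's rho is Pearson's r computed on the ranks of the data. The outlier 96 therefore counts only as "the largest value" and no longer drags the coefficient down. This is why rho (0.976) is much higher than Pearson's r (0.759) for this nearly monotonic series. The sketch below checks this in two ways: with ranked Pearson, and with the no-ties shortcut formula 1 − 6Σd² / (n(n² − 1)), where d is the rank difference.

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho as Pearson's r on the ranks
rho_from_ranks = x.rank().corr(y.rank(), method="pearson")

# Shortcut formula, valid when there are no tied ranks (true here)
d = x.rank() - y.rank()
n = len(x)
rho_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(rho_from_ranks, rho_formula, x.corr(y, method="spearman"))
# all three are about 0.9758 (only two pairs of adjacent ranks are swapped)
```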

## Seaborn Dataset Tips

Import Seaborn Library

In [136]:
import seaborn as sns 

Load "tips" dataset from Seaborn

In [137]:
tips = sns.load_dataset("tips")
tips

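|     | total_bill | tip  | sex    | smoker | day | time   | size |
|-----|------------|------|--------|--------|-----|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ... | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat | Dinner | 2    |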


Generate descriptive statistics that summarize the central tendency and dispersion of each numeric column with `DataFrame.describe()`

In [138]:
tips.describe()

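|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |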
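
`describe()` skips the non-numeric columns (`sex`, `smoker`, `day`, `time`) by default. Its `std` row is the sample standard deviation (`ddof=1`), and its quartile rows use linear interpolation, the same default as `np.percentile`. The sketch below checks both points offline on a small stand-in frame built from the first five rows shown earlier. It does not use the full dataset, so its numbers differ from the table above.

```python
import numpy as np
import pandas as pd

# Stand-in frame: the first five total_bill/tip values from the tips preview
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})
summary = df.describe()

# "std" matches the sample standard deviation (ddof=1)
print(np.isclose(summary.loc["std", "tip"], df["tip"].std(ddof=1)))          # True
# "25%" matches np.percentile with its default linear interpolation
print(np.isclose(summary.loc["25%", "tip"], np.percentile(df["tip"], 25)))   # True
```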


Call the relevant method to calculate pairwise Pearson's r correlation between the numeric columns

In [139]:
tips.corr(method="pearson", numeric_only=True)

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
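
`tips` mixes numeric columns with categorical ones (`sex`, `smoker`, `day`, `time`). Since pandas 2.0, `DataFrame.corr` defaults to `numeric_only=False` and fails on such columns. The non-numeric columns therefore have to be excluded. The sketch below shows two equivalent ways to do that on a small hypothetical frame that mimics the tips layout. The values are the first five preview rows, so the coefficients differ from the table above.

```python
import pandas as pd

# Hypothetical stand-in with numeric and categorical columns, like tips
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
    "sex": pd.Categorical(["Female", "Male", "Male", "Male", "Female"]),
})

# Option 1: keep only numeric columns, then correlate
corr_a = df.select_dtypes("number").corr(method="pearson")

# Option 2 (pandas >= 1.5): let corr drop non-numeric columns itself
corr_b = df.corr(method="pearson", numeric_only=True)

print(list(corr_a.columns))  # ['total_bill', 'tip', 'size']
print(corr_a.round(3))
```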
