## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [9]:
import numpy as np 
import pandas as pd 
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [17]:
s_mean = 100
s_std = 15
samples = np.random.normal(s_mean, s_std, 1000)
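
Because `np.random.normal()` draws fresh values on every run, the numbers in the cells that follow will differ from one execution to the next. A minimal sketch of a repeatable draw, assuming NumPy's `default_rng` generator and a hypothetical seed of 42 (neither appears in the original notebook):

```python
import numpy as np

# Hypothetical seed; any fixed integer makes the draw repeatable.
rng = np.random.default_rng(seed=42)

s_mean = 100
s_std = 15
samples = rng.normal(loc=s_mean, scale=s_std, size=1000)

# A second generator built with the same seed yields identical samples.
samples_again = np.random.default_rng(seed=42).normal(s_mean, s_std, 1000)
print(np.array_equal(samples, samples_again))
```

The seeded values will not match the outputs recorded in this notebook, which came from an unseeded run.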

Compute the **mean**, **median**, and **mode**

In [28]:
samples.mean()

99.5357675331404

In [29]:
np.median(samples)

99.39062603504792

In [30]:
stats.mode(samples)

ModeResult(mode=array([49.68503962]), count=array([1]))
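
The `count` of 1 in the result above suggests that every sample is unique, which is expected for continuous data; in that case the reported mode is simply the smallest value. One possible workaround, sketched here with a hypothetical seed and an arbitrary choice of rounding to whole numbers, is to group nearby values before counting:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # hypothetical seed for repeatability
samples = rng.normal(100, 15, 1000)

# Raw floats: each value is expected to occur exactly once.
_, raw_counts = np.unique(samples, return_counts=True)
print(raw_counts.max())

# Rounding to integers groups nearby values, so counts become meaningful.
rounded = np.round(samples).astype(int)
values, counts = np.unique(rounded, return_counts=True)
approx_mode = values[counts.argmax()]  # most frequent rounded value
print(approx_mode, counts.max())
```

The bin width (here, 1 unit) is a judgment call; a wider or narrower grouping can move the approximate mode.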

In [None]:
mean = 99.5357675331404
median = 99.39062603504792
# Every sample occurs once (count = 1), so the reported mode is just the smallest value.
mode = 49.68503962

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [31]:
samples.min()

49.68503962455603

In [32]:
samples.max()

153.53477055751227

In [35]:
q1 = np.percentile(samples, 25)
q1

89.66249824929668

In [36]:
q3 = np.percentile(samples, 75)
q3

110.38360336728357

In [37]:
iqr = q3 - q1
iqr

20.721105117986895
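
As a cross-check on the manual subtraction, both quartiles can be requested in one call, and SciPy offers `stats.iqr()` for the range itself. A short sketch, using a hypothetical seeded sample rather than the notebook's own draw:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # hypothetical seed
samples = rng.normal(100, 15, 1000)

# Passing a list of percentiles returns both quartiles at once.
q1, q3 = np.percentile(samples, [25, 75])
iqr_manual = q3 - q1

# scipy.stats.iqr computes the same 75th-minus-25th percentile range.
iqr_scipy = stats.iqr(samples)

print(iqr_manual, iqr_scipy)
```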

In [None]:
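min_value = 49.68503962455603  # renamed to avoid shadowing the built-in min()
max_value = 153.53477055751227  # renamed to avoid shadowing the built-in max()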
q1 = 89.66249824929668
q3 = 110.38360336728357
iqr = 20.721105117986895

Compute the **variance** and **standard deviation**

In [39]:
np.var(samples)

231.0962369225779

In [40]:
np.std(samples)

15.201849786212792
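
By default, `np.var()` and `np.std()` divide by n (the population formulas). When the data are a sample from a larger population, the n - 1 version is often preferred; passing `ddof=1` selects it. A small sketch on hypothetical data chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)             # divides by n = 8
sample_var = np.var(data, ddof=1)  # divides by n - 1 = 7

pop_std = np.std(data)
sample_std = np.std(data, ddof=1)

print(pop_var, sample_var)
print(pop_std, sample_std)
```

With 1,000 samples the two versions differ only slightly, so the values above would change little either way.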

In [None]:
variance = 231.0962369225779
std_dev = 15.201849786212792

Compute the **skewness** and **kurtosis**

In [41]:
stats.skew(samples)

-0.02664276688737525

In [42]:
stats.kurtosis(samples)

0.03254384542821853
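
`stats.kurtosis()` returns excess (Fisher) kurtosis by default, for which a normal distribution scores 0; that is why the value above is close to zero rather than 3. A brief sketch of the two conventions, on a hypothetical seeded sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # hypothetical seed
samples = rng.normal(100, 15, 1000)

excess = stats.kurtosis(samples)                 # Fisher definition, normal -> 0
pearson = stats.kurtosis(samples, fisher=False)  # Pearson definition, normal -> 3

# The two definitions differ by exactly 3.
print(excess, pearson, pearson - excess)
```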

In [None]:
skewness = -0.02664276688737525
kurtosis = 0.03254384542821853

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [43]:
x = np.arange(10, 20)

In [47]:
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then create a second array y containing 10 arbitrary integers. The exercise suggests `np.array()`; the cell here draws them with `np.random.randint()` instead.

In [51]:
y = np.random.randint(0, 100, 10)

In [52]:
y

array([ 8, 84, 12, 31, 89, 15, 61,  8,  2, 38])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y with `np.corrcoef()`, which returns a 2×2 correlation matrix.

In [54]:
r = np.corrcoef(x, y)
r

array([[ 1.        , -0.17401163],
       [-0.17401163,  1.        ]])
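
The diagonal of the matrix is each array's correlation with itself; the coefficient of interest is the off-diagonal entry. A sketch that pulls it out as a scalar and checks it against the textbook formula, with the y values printed above fixed by hand so the result is repeatable (the names `r_xy` and `r_manual` are illustrative):

```python
import numpy as np

x = np.arange(10, 20)
# The y values printed above, hard-coded so the result is repeatable.
y = np.array([8, 84, 12, 31, 89, 15, 61, 8, 2, 38])

corr_matrix = np.corrcoef(x, y)
r_xy = corr_matrix[0, 1]  # off-diagonal entry: correlation between x and y

# Same quantity from the definition: co-deviation over the product of spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_xy, r_manual)
```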

## Pandas Correlation Calculation

Run the code below

In [55]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [64]:
r = x.corr(y) 
r

0.7586402890911867
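
`Series.corr()` returns only the coefficient. If a p-value is also wanted, `scipy.stats.pearsonr()` is one option; the sketch below assumes the same x and y as above and should reproduce the same r:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# pearsonr returns the coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(x, y)

print(r, p_value)
```

With only 10 points and one large value (96), the p-value should be read with caution.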

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [62]:
rho = x.corr(y, method='spearman')
rho

0.9757575757575757
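
Spearman's rho is Pearson's r computed on the ranks of the data, which is why it is less affected by the outlying value 96 and comes out higher than r here. A sketch of that equivalence on the same two series (`rho_manual` is an illustrative name):

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho_builtin = x.corr(y, method="spearman")

# Equivalent: rank each series, then take the ordinary Pearson correlation.
rho_manual = x.rank().corr(y.rank())

print(rho_builtin, rho_manual)
```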

## Seaborn Dataset Tips

Import Seaborn Library

In [58]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [60]:
tips = sns.load_dataset("tips")
tips.head()

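| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |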


Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [61]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate the pairwise Pearson's r correlation between columns.

In [72]:
r1 = tips["total_bill"].corr(tips["tip"])
r1

0.6757341092113645

In [73]:
r2 = tips["total_bill"].corr(tips["size"])
r2

0.5983151309049017

In [74]:
r3 = tips["tip"].corr(tips["size"])
r3

0.48929877523035786
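
The three calls above compute one pair at a time. `DataFrame.corr()` returns all pairwise coefficients as a matrix in a single call. The sketch below uses only the five rows shown by `tips.head()` as a stand-in (so it runs without downloading the dataset), which means its numbers will not match r1, r2, and r3; on the full dataset, `tips[["total_bill", "tip", "size"]].corr()` should reproduce them.

```python
import pandas as pd

# Stand-in data: the five rows displayed by tips.head() above.
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
})

# Pairwise Pearson correlation of the numeric columns (Pearson is the default).
corr_matrix = tips_head[["total_bill", "tip", "size"]].corr()
print(corr_matrix)

# Each entry matches the corresponding Series-to-Series call.
pair = tips_head["total_bill"].corr(tips_head["tip"])
print(corr_matrix.loc["total_bill", "tip"], pair)
```

Selecting the numeric columns explicitly avoids errors from the categorical columns (`sex`, `smoker`, `day`, `time`) in the full dataset.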