## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [4]:
import numpy as np
from scipy import stats  # the statistics functions live in the scipy.stats submodule
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [52]:
samples = np.random.normal(100,15,1000); samples

array([ 98.79556238, 118.34119514, 138.78338556, 107.97399583,
        78.43565676, 127.47610952, 104.12853561,  82.05634023,
       104.8659041 ,  98.69184139, 128.41837946,  88.30960074,
       105.18291248,  77.7573443 ,  88.97093544, 103.55815542,
        86.71063015, 117.0348299 ,  91.76274134,  91.5549522 ,
       114.07624735,  96.25775702, 116.14747561,  76.36491725,
       112.45300965, 103.02544078, 108.05581611, 110.80702156,
        93.47444436,  97.64632935, 101.68246224,  81.83920986,
        88.82171758,  83.58520566,  89.23201514,  76.85915483,
        99.35655525, 110.87335355, 120.31505792,  92.38720636,
        88.35025593,  91.70202746,  89.67941294, 114.07741142,
        86.71676469,  84.93041068, 140.0142188 ,  87.78317921,
       128.32914727,  84.13970379, 103.99992165, 111.39825578,
       105.36286653,  80.31456187, 106.02030704,  88.14110228,
       106.56204297, 114.72427544, 117.63396247, 127.9149914 ,
       102.30926699, 106.12599924,  86.64778941, 106.84
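
Because `np.random.normal()` draws new values on every run, the numbers above will differ each time the cell is executed. A minimal sketch of a repeatable draw, assuming NumPy's `Generator` API is acceptable here; the seed value `42` and the names `seeded_samples` and `repeat` are illustrative and not part of the original notebook:

```python
import numpy as np

# Hypothetical seed: any fixed integer makes the draw repeatable
rng = np.random.default_rng(seed=42)

# loc = mean, scale = standard deviation, size = number of samples
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# A second generator created with the same seed reproduces the same values
repeat = np.random.default_rng(seed=42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat))  # True
```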

Compute the **mean**, **median**, and **mode**

In [57]:
from scipy import stats

mean = np.mean(samples)
median = np.median(samples)
# stats.mode returns the most frequent value; np.mod is the element-wise remainder, not the mode
mode = stats.mode(samples); mean

98.91355704559257

In [58]:
median

98.91960407533483

In [59]:
mode

Because the samples come from a continuous distribution, repeated values are very unlikely. In that case `stats.mode` reports a value that occurs only once, so the mode is not an informative measure of central tendency here.

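The mode is the most frequently occurring value, which for continuous samples is only informative after nearby values are grouped together. A minimal sketch, assuming that rounding to whole numbers is an acceptable grouping; the seed and the names `demo`, `rounded`, and `rounded_mode` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # hypothetical seed
demo = rng.normal(loc=100, scale=15, size=1000)

# Round to whole numbers so that values can repeat
rounded = np.round(demo).astype(int)

# np.unique returns the sorted distinct values and how often each one occurs
values, counts = np.unique(rounded, return_counts=True)
rounded_mode = values[np.argmax(counts)]
print(rounded_mode, counts.max())
```

If several values tie for the highest count, `np.argmax` picks the smallest of them.
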
Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [60]:
# Avoid the names min and max, which would shadow Python's built-in functions
minimum = samples.min()
maximum = samples.max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1; iqr

20.262415454123527
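
`np.percentile()` also accepts a list of percentiles, so the whole five-number summary can be obtained in one call. A small sketch on hypothetical data (the array `demo` is made up so the result is easy to check by hand):

```python
import numpy as np

demo = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data

# One call returns several percentiles at once
minimum, q1, median, q3, maximum = np.percentile(demo, [0, 25, 50, 75, 100])
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)  # 1.0 3.0 5.0 7.0 9.0 4.0
```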

Compute the **variance** and **standard deviation**

In [61]:
variance = np.var(samples); variance

230.84712422214517

In [62]:
std_dev = np.std(samples); std_dev

15.193654077349041
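
By default, `np.var()` and `np.std()` divide by n (the population formulas). Passing `ddof=1` divides by n - 1 instead, which is the usual choice when the data are a sample from a larger population. A short sketch on hypothetical data chosen so the arithmetic is easy to follow:

```python
import numpy as np

demo = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean = 5

pop_var = np.var(demo)             # sum of squared deviations (32) divided by n = 8
sample_var = np.var(demo, ddof=1)  # same sum divided by n - 1 = 7
print(pop_var, np.std(demo))       # 4.0 2.0
print(sample_var)
```

With 1,000 samples the two versions differ only slightly, but the gap grows as the sample gets smaller.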

Compute the **skewness** and **kurtosis**

In [63]:
from scipy.stats import kurtosis
from scipy.stats import skew

In [64]:
skewness = skew(samples)
# Use a different name so the imported kurtosis() function is not overwritten
kurt = kurtosis(samples); kurt

0.005689896801513328
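
A value near 0 is expected here because `scipy.stats.kurtosis()` uses Fisher's definition by default, which subtracts 3 so that a normal distribution scores 0. A brief sketch comparing it with Pearson's definition (`fisher=False`); the seed and the names `demo`, `excess`, and `pearson` are illustrative:

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(seed=1)  # hypothetical seed
demo = rng.normal(loc=100, scale=15, size=1000)

excess = kurtosis(demo)                 # Fisher definition: about 0 for normal data
pearson = kurtosis(demo, fisher=False)  # Pearson definition: about 3 for normal data
print(np.isclose(pearson - excess, 3.0))  # True
print(skew(demo))  # close to 0 for a symmetric distribution
```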

## NumPy Correlation Calculation

Create an array x of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [51]:
x=np.arange(10,20); x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [33]:
y = np.array(np.random.normal(10, 10, 10), dtype=int); y

array([ 6,  9,  8,  8,  0, 18,  4, -8,  9, 16])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y with `np.corrcoef()`.

In [50]:
r = np.corrcoef(x,y); r

array([[1.        , 0.75864029],
       [0.75864029, 1.        ]])
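
`np.corrcoef()` returns a symmetric 2×2 matrix: the diagonal holds each array's correlation with itself (always 1), and the off-diagonal entries hold Pearson's r between the two arrays. A small sketch of pulling out the single coefficient, using hypothetical arrays `a` and `b` where `b` is an exact linear function of `a`:

```python
import numpy as np

a = np.arange(10, 20)
b = 2 * a + 1  # hypothetical: an exact linear function of a

matrix = np.corrcoef(a, b)
r_ab = matrix[0, 1]  # off-diagonal entry = correlation between a and b
print(np.isclose(r_ab, 1.0))          # True
print(np.allclose(matrix, matrix.T))  # True: the matrix is symmetric
```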

## Pandas Correlation Calculation

Run the code below

In [35]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [36]:
r = x.corr(y); r

0.7586402890911867
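
Pearson's r is the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. A sketch that applies this formula by hand to the same two Series and compares it with `Series.corr()`; the names `dx`, `dy`, and `r_manual` are illustrative:

```python
import numpy as np
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Deviations of each value from its Series mean
dx = x - x.mean()
dy = y - y.mean()

r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(round(r_manual, 6), round(x.corr(y), 6))  # 0.75864 0.75864
```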

Optional: call the relevant method to calculate Spearman's rho correlation.

In [49]:
rho = x.corr(y, method='spearman'); rho

0.9757575757575757
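
Spearman's rho is Pearson's r computed on the ranks of the values rather than on the values themselves, which is why it is higher here: y rises almost monotonically with x even though the relationship is far from linear. A short sketch checking that equivalence on the same data; the names `rho_direct` and `rho_from_ranks` are illustrative:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

rho_direct = x.corr(y, method="spearman")
rho_from_ranks = x.rank().corr(y.rank())  # Pearson's r on the ranks
print(round(rho_direct, 6), round(rho_from_ranks, 6))  # 0.975758 0.975758
```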

## Seaborn Dataset Tips

Import the Seaborn library.

In [41]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [48]:
tips = sns.load_dataset("tips"); tips

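|     | total_bill | tip  | sex    | smoker | day | time   | size |
|-----|------------|------|--------|--------|-----|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ... | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat | Dinner | 2    |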


Generate descriptive statistics, including those that summarize the central tendency and dispersion of the numeric columns.

In [43]:
tips.describe()

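|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |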
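
By default, `describe()` summarizes only the numeric columns, and each row of its result matches the corresponding single-statistic method. A minimal sketch on a small hypothetical DataFrame (the columns `bill`, `party`, and `day` are made up so the example runs without downloading the tips dataset):

```python
import pandas as pd

# Hypothetical data with two numeric columns and one text column
df = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "party": [1, 2, 2, 3],
    "day": ["Sat", "Sun", "Sat", "Sun"],
})

summary = df.describe()  # the text column "day" is left out by default
print(list(summary.columns))  # ['bill', 'party']
print(summary.loc["mean", "bill"], summary.loc["50%", "bill"])  # 25.0 25.0

# The "std" row is the sample standard deviation (ddof=1), like Series.std()
bill_std = df["bill"].std()
```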


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [47]:
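# Keep only the numeric columns; pandas 2.0 and later raise an error if text or categorical columns are included
result = tips.select_dtypes("number").corr(); result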

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
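
Each off-diagonal cell of the pairwise result is the same number that `Series.corr()` gives for that pair of columns. A minimal sketch on a small hypothetical DataFrame (the columns `bill`, `party`, and `day` are made up for illustration), assuming the non-numeric column should be left out of the correlation:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two numeric columns and one text column
df = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "party": [1, 2, 2, 3],
    "day": ["Sat", "Sun", "Sat", "Sun"],
})

# Pairwise Pearson's r across the numeric columns only
pairwise = df.select_dtypes(include="number").corr()

# The same coefficient for one pair of columns
single = df["bill"].corr(df["party"])
print(np.isclose(pairwise.loc["bill", "party"], single))  # True
```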
