## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [60]:
import numpy as np
import scipy.stats  # import the submodule explicitly so scipy.stats.* calls resolve
import pandas as pd

 Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [25]:
samples = np.random.normal(100, 15, 1000)  # np.random.normal(mu, sigma, size)
samples

array([ 85.72137491, 107.85276706,  87.2252371 ,  98.51079928,
        62.30888886,  71.32158498,  94.79884158, 112.43602834,
       109.06083004, 104.50017274, 112.45098305,  79.63309343,
       106.79147249,  89.97643454,  79.78060985, 105.48720994,
        89.12231479,  91.28839366,  75.36578115,  89.65958209,
        94.92563316, 107.1275961 , 105.8849873 , 102.28272677,
       103.05604719,  77.05467835,  79.26309832, 139.05173074,
        82.2824849 , 101.9309109 ,  94.58343498,  94.30034326,
        84.6778876 ,  98.55944863, 103.573757  ,  89.09812628,
       114.439042  ,  79.81627806, 100.67590844,  90.09968776,
       124.33355279, 129.61544451,  98.74656257,  88.21280299,
        94.83892907,  93.19773618, 101.56927604,  84.20218545,
        86.66133889,  98.29902995,  99.37649059,  90.53911974,
        98.97029582,  92.55990518, 117.2547394 ,  83.81383082,
        92.22583084, 113.96712002,  82.41254538,  96.7599849 ,
        92.72576635, 100.30823677, 108.6355962 ,  86.54
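
Because the samples are random, the numbers printed in this notebook change on every run. A minimal sketch of one way to make the draw reproducible, assuming NumPy's `default_rng` generator is acceptable in place of `np.random.normal()`; the seed value 42 is an arbitrary choice for illustration, not something taken from the original notebook:

```python
import numpy as np

# The seed 42 is arbitrary; any fixed integer gives a repeatable stream.
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(samples.shape)
print(np.array_equal(samples, samples_again))
```

With a fixed seed, every statistic computed below would be identical from one run to the next.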

Compute the **mean**, **median**, and **mode**

In [32]:
mean = np.mean(samples)
median = np.median(samples)
mode = scipy.stats.mode(samples)


In [33]:
mean

99.72197988395116

In [34]:
median

98.95030273246786

In [35]:
mode

ModeResult(mode=array([51.59063458]), count=array([1]))
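
The reported mode has a count of 1 and equals the minimum of the sample: in continuous data almost every value is unique, so all values tie and the smallest one is returned. A rough sketch of one common workaround, binning the data and reporting the fullest bin; the seed and the choice of 20 bins are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for a repeatable example
samples = rng.normal(100, 15, 1000)

# Count how often each distinct value occurs; for continuous draws
# every value is expected to appear exactly once.
values, counts = np.unique(samples, return_counts=True)
print(counts.max())

# Bin the data and take the midpoint of the fullest bin as a "modal" value.
counts_per_bin, edges = np.histogram(samples, bins=20)
fullest = np.argmax(counts_per_bin)
modal_bin_midpoint = (edges[fullest] + edges[fullest + 1]) / 2
print(modal_bin_midpoint)
```

The midpoint depends on the bin count, so it is an estimate of where the data is densest rather than an exact statistic.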

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [30]:
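min_val = np.min(samples)  # named min_val to avoid shadowing the built-in min()
max_val = np.max(samples)  # named max_val to avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q2 = np.percentile(samples, 50)  # Q2 is the median
q3 = np.percentile(samples, 75)
iqr = q3 - q1
min_val, max_val, q1, q2, q3, iqr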

(51.590634579012374,
 144.53305588370307,
 89.75934424589943,
 98.95030273246786,
 109.33082145084775,
 19.571477204948323)
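
The three quartiles can also be requested in a single call, and the interquartile range can be cross-checked against `scipy.stats.iqr`. A small sketch on freshly generated data (the seed is an arbitrary assumption, so the values differ from the output above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for a repeatable example
samples = rng.normal(100, 15, 1000)

# np.percentile accepts a list of percentiles and returns them in order.
q1, q2, q3 = np.percentile(samples, [25, 50, 75])
iqr = q3 - q1

# Q2 is the median, and scipy.stats.iqr computes Q3 - Q1 by default.
print(np.isclose(q2, np.median(samples)))
print(np.isclose(iqr, stats.iqr(samples)))
```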

Compute the **variance** and **standard deviation**

In [31]:
variance = np.var(samples)  # population variance (ddof=0 is NumPy's default)
std_dev = np.std(samples)   # population standard deviation (ddof=0)
variance, std_dev

(215.30198897878412, 14.673172423807475)
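
`np.var` and `np.std` divide by n by default, which gives the population versions. When the data is a sample and an unbiased variance estimate is wanted, pass `ddof=1` to divide by n - 1 instead. A short worked example on a small made-up array (the values are illustrative only):

```python
import numpy as np

# Illustrative data: mean is 5, squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)             # 32 / 8
pop_std = np.std(data)             # sqrt(32 / 8)
sample_var = np.var(data, ddof=1)  # 32 / 7
sample_std = np.std(data, ddof=1)  # sqrt(32 / 7)

print(pop_var, pop_std)
print(sample_var, sample_std)
```

For 1,000 samples the two versions differ by a factor of 1000/999 in the variance, so the choice matters mostly for small samples.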

Compute the **skewness** and **kurtosis**

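You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html). Note that `scipy.stats.kurtosis` returns excess (Fisher) kurtosis by default, so a normal distribution scores about 0.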

In [38]:
skewness = scipy.stats.skew(samples)
kurtosis = scipy.stats.kurtosis(samples)
skewness, kurtosis

(0.12215985457252111, -0.012639766935673258)
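
Both values are close to 0, as expected for normally distributed data. To make the definitions concrete, the sketch below recomputes them from central moments and compares against SciPy's defaults (biased estimators, Fisher kurtosis). The seed is an arbitrary assumption, so the numbers differ from the output above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for a repeatable example
samples = rng.normal(100, 15, 1000)

# Central moments m_k = mean((x - mean)^k).
centered = samples - samples.mean()
m2 = np.mean(centered**2)
m3 = np.mean(centered**3)
m4 = np.mean(centered**4)

skew_manual = m3 / m2**1.5            # skewness
excess_kurt_manual = m4 / m2**2 - 3.0  # excess (Fisher) kurtosis

print(np.isclose(skew_manual, stats.skew(samples)))
print(np.isclose(excess_kurt_manual, stats.kurtosis(samples)))

# fisher=False returns Pearson kurtosis, which is 3 for a normal distribution.
pearson_kurt = stats.kurtosis(samples, fisher=False)
print(np.isclose(pearson_kurt, excess_kurt_manual + 3.0))
```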

## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [39]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [44]:
y = np.array([2,4,7,5,9,11,9,3,6,10])
y

array([ 2,  4,  7,  5,  9, 11,  9,  3,  6, 10])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y with `np.corrcoef()`, which returns the full 2×2 correlation matrix.

In [48]:
r = np.corrcoef(x,y)
r

array([[1.        , 0.47377937],
       [0.47377937, 1.        ]])
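
The diagonal entries are each array's correlation with itself (always 1), and the two off-diagonal entries both hold the coefficient between x and y. A brief sketch of extracting that single number and checking it against the textbook formula, covariance divided by the product of the standard deviations:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([2, 4, 7, 5, 9, 11, 9, 3, 6, 10])

# The off-diagonal element of the matrix is the coefficient itself.
r_xy = np.corrcoef(x, y)[0, 1]

# Manual check: mean of the products of deviations over the product of stds.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_manual = cov_xy / (x.std() * y.std())

print(round(r_xy, 8))
print(np.isclose(r_xy, r_manual))
```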

## Pandas Correlation Calculation

Run the code below

In [50]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
x,y

(0    10
 1    11
 2    12
 3    13
 4    14
 5    15
 6    16
 7    17
 8    18
 9    19
 dtype: int64,
 0     2
 1     1
 2     4
 3     5
 4     8
 5    12
 6    18
 7    25
 8    96
 9    48
 dtype: int64)

Call `scipy.stats.pearsonr()` to calculate Pearson's r correlation; it returns the coefficient together with a p-value.

In [53]:
r = scipy.stats.pearsonr(x, y)  # Pearson correlation coefficient and p-value for testing non-correlation.
r

(0.758640289091187, 0.010964341301680813)

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [55]:
rho = scipy.stats.spearmanr(x, y)  # Spearman correlation coefficient with associated p-value.
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
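
Spearman's rho is much higher than Pearson's r here because y rises almost monotonically with x but not linearly (the jump to 96 distorts the linear fit). Since x and y are already pandas Series, the same two coefficients can also be obtained with the `Series.corr()` method, which returns only the coefficient and no p-value. A sketch, assuming SciPy is installed since pandas relies on it for the Spearman method:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

pearson_r = x.corr(y)                        # method="pearson" is the default
spearman_rho = x.corr(y, method="spearman")  # rank-based correlation

print(pearson_r)
print(spearman_rho)
```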

## Seaborn Dataset Tips

Import the Seaborn library

In [56]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [59]:
tips = sns.load_dataset("tips")
tips


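| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |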


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [61]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call `DataFrame.corr()` to calculate the pairwise Pearson's r correlation of the numeric columns.

In [62]:
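tips.corr(numeric_only=True)  # Pairwise correlation of columns, excluding NA/null values; numeric_only=True (pandas 1.5+) skips the non-numeric columns, which pandas 2.x no longer drops silently.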

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
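
The matrix covers only the numeric columns; text and categorical columns such as `sex` or `day` have no Pearson correlation. The sketch below shows the same pattern on a small hypothetical table (the values are made up and stand in for the tips data so the example runs without downloading anything). It assumes pandas 1.5 or later, where `corr()` accepts `numeric_only`:

```python
import pandas as pd

# Hypothetical mini-table shaped like the tips data; values are made up.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
    "day": ["Sun", "Sat", "Sun", "Sat"],
})

# numeric_only=True restricts the matrix to the numeric columns.
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix)

# A single pair can be read from the matrix or computed directly.
pair = df["total_bill"].corr(df["tip"])
print(pair)
```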
