## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
import scipy as sp
import pandas as pd

In [2]:
from scipy import stats

# import warnings filter
from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`. The default call is shown below and must be modified:

np.random.normal(loc=0.0, scale=1.0, size=None)  # you need to modify this code.

Set `loc` to the mean, `scale` to the standard deviation, and `size` to the sample size.

In [3]:
samples = np.random.normal(loc=100, scale=15, size=1000)
# print(samples)
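
Because the samples are drawn at random, the numbers printed in this notebook change on every run. A minimal sketch of a reproducible draw, assuming NumPy's `Generator` API is acceptable in place of `np.random.normal()`; the seed value 42 and the name `seeded_samples` are arbitrary choices for illustration:

```python
import numpy as np

# A seeded Generator returns the same draws on every run.
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary, assumed seed
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

print(seeded_samples.shape)
```

Re-creating the generator with the same seed reproduces the same array, which makes the statistics below comparable across runs.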

Compute the **mean**, **median**, and **mode**

In [4]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)

print("mean   : ", mean)
print("median : ", median)
print("mode   : ", mode)

mean   :  99.375492219258
median :  99.55369105513144
mode   :  ModeResult(mode=array([54.95970295]), count=array([1]))
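
With continuous data every value is typically unique, so `stats.mode` reports a count of 1, as in the output above, and the value it returns says little about where the data peak. One possible alternative is to estimate the mode from a histogram. The sketch below is an assumption rather than part of the exercise; the data, the bin count of 20, and the variable names are illustrative:

```python
import numpy as np

# Illustrative data; a fixed seed keeps the example repeatable.
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)

# Count observations per bin and take the center of the fullest bin.
counts, edges = np.histogram(data, bins=20)
k = np.argmax(counts)
modal_bin_center = (edges[k] + edges[k + 1]) / 2

print("binned mode estimate : ", modal_bin_center)
```

The estimate depends on the number of bins, so it should be read as a rough location of the peak rather than an exact value.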


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [5]:
# Use names that do not shadow Python's built-in min() and max()
minimum = np.min(samples)
maximum = np.max(samples)
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = stats.iqr(samples)

print("minimum : ", minimum)
print("maximum : ", maximum)
print("Q1      : ", q1)
print("Q3      : ", q3)
print("IQR     : ", iqr)


minimum :  54.95970294906619
maximum :  163.0835345088989
Q1      :  89.09436585225608
Q3      :  109.15717646167715
IQR     :  20.062810609421078
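
The interquartile range is simply Q3 minus Q1, which the output above confirms (109.157 - 89.094 ≈ 20.063). A small sketch on made-up data shows the relationship and one common use of the IQR, Tukey's 1.5 × IQR rule for flagging outliers; the data and the 1.5 multiplier are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Small illustrative data set with one obviously large value.
data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0, 40.0])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                      # same value that stats.iqr returns
assert np.isclose(iqr, stats.iqr(data))

# Tukey's fences: points beyond 1.5 * IQR from the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print("Q1, Q3, IQR : ", q1, q3, iqr)
print("outliers    : ", outliers)
```

For this data Q1 is 4.0, Q3 is 10.0, and the only flagged value is 40.0.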


Compute the **variance** and **standard deviation**

In [6]:
variance = np.var(samples)
std_dev = np.std(samples)

print("variance : ", variance)
print("standard_deviation : ", std_dev)

variance :  231.144547402685
standard_deviation :  15.203438670336556
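
Note that `np.var` and `np.std` default to `ddof=0`, which divides by n and gives the population variance. For the unbiased sample estimate, which divides by n - 1, pass `ddof=1`. A brief sketch on a made-up array (the data are an assumption chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean is 5 and the squared deviations sum to 32.
pop_var = np.var(data)             # 32 / 8 = 4.0   (ddof=0, default)
pop_std = np.std(data)             # sqrt(4.0) = 2.0
sample_var = np.var(data, ddof=1)  # 32 / 7         (divides by n - 1)
sample_std = np.std(data, ddof=1)

print("population variance : ", pop_var)
print("sample variance     : ", sample_var)
```

With 1,000 samples the two versions differ only slightly, but the distinction matters for small data sets. pandas methods such as `Series.var()` use `ddof=1` by default, so NumPy and pandas results differ unless `ddof` is set explicitly.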


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

In [7]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)

print("skewness : ", skewness)
print("kurtosis : ", kurtosis)

skewness :  0.1046282783968026
kurtosis :  0.22511906395961256
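
By default `scipy.stats.kurtosis` uses Fisher's definition (excess kurtosis), under which a normal distribution scores 0; this is why the value above is close to 0 rather than 3. Passing `fisher=False` returns Pearson's definition, which is larger by exactly 3. A short sketch with illustrative data (the seed and sample size are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(size=10_000)

excess = stats.kurtosis(data)                 # Fisher: normal -> about 0
pearson = stats.kurtosis(data, fisher=False)  # Pearson: normal -> about 3

print("excess kurtosis  : ", excess)
print("pearson kurtosis : ", pearson)
```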


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [8]:
x = np.arange(10,20)
print(x)

[10 11 12 13 14 15 16 17 18 19]


Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [9]:
y = np.array([10, 24, 32, 45, 48, 85, 37, 98, 101, 12])
print(y)

[ 10  24  32  45  48  85  37  98 101  12]


Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [10]:
r = stats.pearsonr(x, y)   # first element is r, second is the p-value
r1 = np.corrcoef(x, y)     # 2x2 correlation matrix
print("\ncorrelation_coefficient_pearson : ", r[0])
print("correlation_coefficient_numpy   : ", r1[0, 1])


correlation_coefficient_pearson :  0.4866181649075422
correlation_coefficient_numpy   :  0.4866181649075422
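
Both functions compute the same quantity: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The sketch below recomputes it by hand for the same `x` and `y`; the names `dx`, `dy`, and `r_manual` are introduced here for illustration:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([10, 24, 32, 45, 48, 85, 37, 98, 101, 12])

# Deviations of each value from its own mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r from its definition.
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print("r_manual : ", r_manual)
```

The result agrees with `np.corrcoef(x, y)[0, 1]` up to floating-point rounding.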


## Pandas Correlation Calculation

Run the code below

In [11]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

In [12]:
print(x)
print(y)

0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64
0     2
1     1
2     4
3     5
4     8
5    12
6    18
7    25
8    96
9    48
dtype: int64


Call the relevant method to calculate Pearson's r correlation.

In [13]:
r = x.corr(y)
r2 = y.corr(x)
print(r)
print(r2)

0.7586402890911867
0.7586402890911869


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [14]:
rho = x.corr(y,method="spearman")

In [15]:
print("spearman_correlation : ",rho)

spearman_correlation :  0.9757575757575757
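
Spearman's rho is Pearson's r applied to the ranks of the values, which is why it is much higher than Pearson's r here: `y` increases almost monotonically with `x`, but not linearly. A minimal sketch that checks this on the same series (the names `rho_manual` and `rho_builtin` are introduced for illustration):

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Pearson correlation of the ranks equals Spearman's rho.
rho_manual = x.rank().corr(y.rank())
rho_builtin = x.corr(y, method="spearman")

print("rho_manual  : ", rho_manual)
print("rho_builtin : ", rho_builtin)
```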


## Seaborn Dataset Tips

Import Seaborn Library

In [16]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [17]:
tips = sns.load_dataset("tips")
tips.head(10)

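| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| 5 | 25.29 | 4.71 | Male | No | Sun | Dinner | 4 |
| 6 | 8.77 | 2.00 | Male | No | Sun | Dinner | 2 |
| 7 | 26.88 | 3.12 | Male | No | Sun | Dinner | 4 |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 |
| 9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 |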


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [18]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [19]:
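# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# pandas 2.0+ raises an error on them otherwise (parameter added in pandas 1.5)
tips.corr(numeric_only=True)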

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
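
The correlation matrix covers only the numeric columns; the categorical ones (`sex`, `smoker`, `day`, `time`) are left out. One way to make that selection explicit, and independent of the pandas version, is to filter with `select_dtypes` before calling `corr()`. The sketch below is an assumption about how one might do this, using a small hand-built frame made of the first five rows shown above rather than the full dataset:

```python
import pandas as pd

# First five rows of the tips data, typed in by hand for illustration.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr()

print(corr_matrix)
```

The resulting matrix is symmetric with ones on the diagonal, and the `sex` column does not appear in it.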
