## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [7]:
import numpy as np
import pandas as pd
import scipy as sp

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

In [35]:
mean, standard_deviation = 100, 15  # mean and standard deviation
samples = np.random.normal(mean, standard_deviation, 1000)
np.set_printoptions(threshold=10)  # abbreviate long arrays when printing
samples

array([115.9529522 , 100.71342346, 107.97323426, ...,  84.03669148,
        92.46306569, 100.12490597])
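The array above changes on every run because the generator is not seeded. A minimal sketch of a reproducible draw, assuming NumPy's `default_rng` Generator API; the seed `42` and the names `samples_seeded` and `samples_again` are arbitrary choices for illustration, not part of the original notebook:

```python
import numpy as np

# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(np.array_equal(samples_seeded, samples_again))
```

With a fixed seed, the statistics computed in later cells stay the same between runs.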

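Compute the **mean**, **median**, and **mode**. Because the samples are continuous, each value occurs only once, so the reported mode is simply the smallest sample (count 1) and is not informative here.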

In [33]:
from scipy import stats

mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(" mean : ", mean, "\n", "median : ", median, "\n", "mode : ", mode)

 mean :  100.01720337065751 
 median :  100.68342810206516 
 mode :  ModeResult(mode=array([52.10894942]), count=array([1]))
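With continuous data every sample is typically unique, which is why the mode above has a count of 1. One possible workaround, sketched here under the assumption that rounding to whole numbers is an acceptable binning, is to count the rounded values instead. The seed and the names `values`, `rounded`, and `binned_mode` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
values = rng.normal(100, 15, 1000)

# Round to whole numbers so that repeated values can occur.
rounded = np.round(values).astype(int)

# Count how often each rounded value appears and pick the most frequent one.
unique_values, counts = np.unique(rounded, return_counts=True)
binned_mode = unique_values[np.argmax(counts)]

print("most frequent rounded value:", binned_mode, "count:", counts.max())
```

The result depends on the bin width, so it should be read as a rough indication of where the samples cluster rather than an exact mode.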


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [10]:
# Use names that do not shadow the built-in min() and max().
min_value = np.min(samples)
max_value = np.max(samples)
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1

print("max :", max_value,
      "\nmin :", min_value,
      "\nq1  :", q1,
      "\nq3  :", q3,
      "\niqr :", iqr)

max : 149.75180604188478 
min : 52.108949420306715 
q1  : 90.66801423774936 
q3  : 110.82371805005101 
iqr : 20.155703812301653
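A common use of the interquartile range is flagging potential outliers. The sketch below assumes the conventional 1.5 × IQR rule, which the original notebook does not mention; the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
values = rng.normal(100, 15, 1000)

# np.percentile accepts a list of percentiles and returns them in order.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences (an assumption, not from the notebook).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("fences:", lower_fence, upper_fence)
print("number of flagged values:", len(outliers))
```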


Compute the **variance** and **standard deviation**

In [11]:
variance = np.var(samples)  # population variance (ddof=0 by default)
std_dev = np.std(samples)   # population standard deviation (ddof=0 by default)
print("variance :", variance, "\n", "std_dev :", std_dev)

variance : 232.08349815405217 
 std_dev : 15.234286926339944
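`np.var()` and `np.std()` divide by n by default, which gives the population variance. For the sample (unbiased) variance, pass `ddof=1`. A small sketch with made-up data chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Made-up data: mean is 5 and the squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_variance = np.var(data)      # 32 / 8
sample_variance = np.var(data, ddof=1)  # 32 / 7
population_std = np.std(data)           # square root of 32 / 8

print(population_variance, sample_variance, population_std)
```

For 1,000 samples the two definitions differ only by a factor of 1000/999, but the difference matters for small data sets.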


Compute the **skewness** and **kurtosis**

In [34]:
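from scipy.stats import kurtosis, skew

skewness = skew(samples)
# Store the result under a new name so the kurtosis() function is not overwritten.
excess_kurtosis = kurtosis(samples)  # Fisher definition: 0 for a normal distribution
print(" skewness :", skewness, "\n", "kurtosis :", excess_kurtosis)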

 skewness : -0.1474350607916441 
 kurtosis : -0.07411654125135936
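The kurtosis above is close to 0 rather than 3 because `scipy.stats.kurtosis()` returns the excess (Fisher) kurtosis by default. A short sketch comparing it with the Pearson definition via `fisher=False`; the seed and variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed, for illustration only
values = rng.normal(100, 15, 1000)

excess_kurtosis = stats.kurtosis(values)                 # Fisher: normal -> 0
pearson_kurtosis = stats.kurtosis(values, fisher=False)  # Pearson: normal -> 3

# The two definitions differ by exactly 3.
print(pearson_kurtosis - excess_kurtosis)
```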


## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [13]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [14]:
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

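Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`. `np.corrcoef()` returns the full correlation matrix; the off-diagonal entries are Pearson's r.

In [25]:
r = np.corrcoef(x, y)
r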

array([[1.        , 0.75864029],
       [0.75864029, 1.        ]])
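To see where the off-diagonal value comes from, Pearson's r can be computed directly from its definition. This sketch uses the same `y` values as the Pandas section of this notebook; the names `x_dev`, `y_dev`, `r_manual`, and `r_numpy` are illustrative:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Deviations of each value from its mean.
x_dev = x - x.mean()
y_dev = y - y.mean()

# Pearson's r: sum of cross-products over the root of the product of sums of squares.
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# The same value is the off-diagonal entry of the correlation matrix.
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```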

## Pandas Correlation Calculation

Run the code below

In [16]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [27]:
r = pd.concat([x, y], axis=1).corr(method="pearson")
print("r :", r)

r :          0        1
0  1.00000  0.75864
1  0.75864  1.00000
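When only the single coefficient is needed rather than the full matrix, `Series.corr()` returns it directly. A minimal sketch using the same data; `r_value` is an illustrative name:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Series.corr() uses Pearson's method by default and returns a single float.
r_value = x.corr(y)

print(r_value)
```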


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [28]:
rho = pd.concat([x, y], axis=1).corr(method="spearman")
rho

          0         1
0  1.000000  0.975758
1  0.975758  1.000000
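Spearman's rho is Pearson's r applied to the ranks of the data, which explains why it is much higher than Pearson's r here: `y` increases almost monotonically with `x`, but not linearly. A sketch that checks this equivalence; `rho_direct` and `rho_from_ranks` are illustrative names:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho computed directly.
rho_direct = x.corr(y, method="spearman")

# The same value obtained as Pearson's r of the ranks.
rho_from_ranks = x.rank().corr(y.rank(), method="pearson")

print(rho_direct, rho_from_ranks)
```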


## Seaborn Dataset Tips

Import the Seaborn library.

In [19]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [20]:
tips = sns.load_dataset("tips")

In [21]:
tips

     total_bill   tip     sex smoker  day    time  size
0         16.99  1.01  Female     No  Sun  Dinner     2
1         10.34  1.66    Male     No  Sun  Dinner     3
2         21.01  3.50    Male     No  Sun  Dinner     3
3         23.68  3.31    Male     No  Sun  Dinner     2
4         24.59  3.61  Female     No  Sun  Dinner     4
..          ...   ...     ...    ...  ...     ...   ...
239       29.03  5.92    Male     No  Sat  Dinner     3
240       27.18  2.00  Female    Yes  Sat  Dinner     2
241       22.67  2.00    Male    Yes  Sat  Dinner     2
242       17.82  1.75    Male     No  Sat  Dinner     2


Generate descriptive statistics that summarize the central tendency and dispersion of the numeric columns.

In [22]:
tips.describe()

       total_bill         tip        size
count       244.0       244.0       244.0
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638      0.9511
min          3.07         1.0         1.0
25%       13.3475         2.0         2.0
50%        17.795         2.9         2.0
75%       24.1275      3.5625         3.0
max         50.81        10.0         6.0
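`describe()` can also be applied per group, which gives the same summary split by a categorical column. The sketch below uses a small made-up frame with tips-like column names so that it runs without downloading the dataset; the values and the name `by_time` are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the tips data (values are made up).
frame = pd.DataFrame({
    "tip": [1.0, 2.0, 3.0, 4.0],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
})

# One row of summary statistics per value of "time".
by_time = frame.groupby("time")["tip"].describe()

print(by_time)
```

On the real dataset the same pattern would be `tips.groupby("time")["tip"].describe()`.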


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [23]:
tip = tips["tip"]
size = tips["size"]
r = pd.concat([tip, size], axis=1).corr(method="pearson")
r

           tip      size
tip   1.000000  0.489299
size  0.489299  1.000000
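The cell above correlates two columns. To get the pairwise correlation of every numeric column at once, one option is to select the numeric columns first and call `corr()` on the result. This sketch uses a small made-up frame with tips-like column names; the values and the name `corr_matrix` are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the tips data (values are made up).
frame = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "size": [1, 2, 2, 4],
    "day": ["Sat", "Sat", "Sun", "Sun"],
})

# Keep only numeric columns, then compute all pairwise Pearson correlations.
corr_matrix = frame.select_dtypes(include="number").corr(method="pearson")

print(corr_matrix)
```

Selecting the numeric columns explicitly avoids errors from categorical columns such as `day`.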
