## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
from scipy import stats
import pandas as pd

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

np.random.normal(loc=0.0, scale=1.0, size=None)  #you need to modify this code.

`loc` is the mean, `scale` is the standard deviation, and `size` is the number of samples to draw.

In [3]:
samples = np.random.normal(loc=100, scale=15, size=1000)
samples

array([ 90.17362021,  75.10471693,  86.75155182,  85.57042625,
        98.85448313,  88.28387933,  95.73368611,  93.09195727,
        96.88827343, 107.48963759,  87.80269371, 108.53053889,
       110.54741472,  99.68124815, 125.48845353, 108.18287556,
       103.31472594,  69.75778398,  99.80830662, 118.91097302,
        94.80693326, 105.30897739,  91.69679543,  99.45407196,
       105.08405399, 101.35398175,  99.49033916, 121.94459116,
        98.67929174,  84.03063677,  59.54282549, 129.96593839,
       110.97191579,  86.3635765 ,  98.61800715,  76.98270924,
        80.97398195, 112.09869228, 130.11901651,  92.29483247,
       108.76562146,  96.22420526, 104.13834594, 117.59849752,
        98.91615612, 103.989267  , 111.53588602, 103.11884157,
       111.46292466,  97.23789248, 102.5928086 , 100.96581819,
       102.37732449,  93.05959285,  78.71342548,  77.24637432,
        92.79086795,  94.2299062 , 125.35326013, 107.50696757,
        96.67316202,  93.82113443,  89.19145303,  91.70
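
Because the draws are random, the numbers above change on every run. If reproducible output is wanted, one option is NumPy's `Generator` API with a fixed seed. The sketch below assumes a seed of 42, which is an arbitrary choice and not part of the original exercise.

```python
import numpy as np

# The seed value 42 is an arbitrary assumption, used only for illustration.
rng = np.random.default_rng(seed=42)

# Same parameters as the exercise: mean 100, standard deviation 15, 1,000 draws.
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)   # (1000,)
print(samples[:5])     # first five draws
```

Re-running this cell with the same seed yields the same array, which makes the statistics in later cells comparable across runs.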

Compute the **mean**, **median**, and **mode**

In [4]:
print("mean = ", np.mean(samples))
print("median = ", np.median(samples))
print("mode = ", stats.mode(samples))

mean =  99.92090867516313
median =  99.65622458513863
mode =  ModeResult(mode=array([46.91442191]), count=array([1]))
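
Note that the reported mode has a count of 1 and equals the smallest sample: with continuous data every value is almost surely unique, so `stats.mode` simply returns the minimum. A rough alternative, sketched here under the assumption that rounding to the nearest integer is an acceptable bin width, is to take the mode of the binned values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for illustration only
samples = rng.normal(loc=100, scale=15, size=1000)

# Bin the continuous values by rounding to the nearest integer (an assumed bin width).
rounded = np.round(samples).astype(int)

# Count how often each integer occurs and pick the most frequent one.
values, counts = np.unique(rounded, return_counts=True)
binned_mode = values[np.argmax(counts)]

print("binned mode = ", binned_mode, "count = ", counts.max())
```

The result depends on the bin width, so it is best read as an approximate location of the peak rather than an exact statistic.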


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [5]:
# print() returns None, so its result is not assigned; this also avoids shadowing the built-ins min and max.
print("min = ", np.min(samples))
print("max = ", np.max(samples))
print("Q1 = ", np.percentile(samples, 25))
print("Q3 = ", np.percentile(samples, 75))
print("IQR:", stats.iqr(samples))

min =  46.91442190544459
max =  148.16516763161914
Q1 =  89.89471617020739
Q3 =  110.32855469344577
IQR: 20.43383852323838
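
The interquartile range is just Q3 minus Q1, which can be checked directly. The sketch below uses its own seeded sample (the seed is an assumption) and gets the whole five-number summary from a single `np.percentile` call.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for illustration only
samples = rng.normal(loc=100, scale=15, size=1000)

# One call returns min, Q1, median, Q3, and max.
minimum, q1, median, q3, maximum = np.percentile(samples, [0, 25, 50, 75, 100])

# IQR computed by hand should agree with scipy.stats.iqr.
iqr = q3 - q1
print("IQR by hand = ", iqr)
print("stats.iqr   = ", stats.iqr(samples))
```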


Compute the **variance** and **standard deviation**

In [6]:
# print() returns None, so there is nothing useful to assign.
print("Variance: ", np.var(samples))
print("Std deviation: ", np.std(samples))

Variance:  221.02912973617288
Std deviation:  14.867048454087074
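
By default `np.var` and `np.std` divide by n (the population formulas, `ddof=0`). When the data are a sample and an unbiased variance estimate is wanted, passing `ddof=1` divides by n - 1 instead. A minimal comparison, using a seeded sample as an assumption:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for illustration only
samples = rng.normal(loc=100, scale=15, size=1000)
n = samples.size

pop_var = np.var(samples)             # divides by n (ddof=0, the default)
sample_var = np.var(samples, ddof=1)  # divides by n - 1
sample_std = np.std(samples, ddof=1)

print("population variance: ", pop_var)
print("sample variance:     ", sample_var)
print("sample std deviation:", sample_std)
```

With 1,000 observations the two versions differ by a factor of 1000/999, so the gap is small here but grows for small samples.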


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

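In [8]:
from scipy.stats import kurtosis
from scipy.stats import skew

In [9]:
# The imports above must run first. print() returns None, so its result is not
# assigned; assigning it to `kurtosis` would also overwrite the imported function.
print("skewness: ", skew(samples))
print("kurtosis: ", kurtosis(samples))

skewness:  -0.047478635653329285
kurtosis:  0.07197750609153575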
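
A kurtosis near 0 may look surprising for normal data. `scipy.stats.kurtosis` uses Fisher's definition by default (`fisher=True`), which subtracts 3 so that a normal distribution scores 0. Passing `fisher=False` gives Pearson's definition, where a normal distribution scores 3. The sketch below uses a seeded sample as an assumption.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(seed=0)  # arbitrary seed, for illustration only
samples = rng.normal(loc=100, scale=15, size=1000)

excess = kurtosis(samples)                 # Fisher definition: normal -> about 0
pearson = kurtosis(samples, fisher=False)  # Pearson definition: normal -> about 3

print("skewness:         ", skew(samples))
print("excess kurtosis:  ", excess)
print("Pearson kurtosis: ", pearson)
```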

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [10]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [13]:
y = np.array(np.random.randint(10,20,10))
print("y :", y) 

y : [18 13 12 17 16 15 13 15 16 13]


Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [14]:
r = np.corrcoef(x, y)
print("correlation: ", r)

correlation:  [[ 1.         -0.20297414]
 [-0.20297414  1.        ]]
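
`np.corrcoef` returns the full 2x2 correlation matrix: the diagonal is always 1, and the off-diagonal entry is the coefficient between x and y. The sketch below pulls out that single value and reproduces it from the definition of Pearson's r, using the `y` values printed above.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([18, 13, 12, 17, 16, 15, 13, 15, 16, 13])  # the y values printed above

# The off-diagonal element of the matrix is the x-y correlation.
r = np.corrcoef(x, y)[0, 1]

# Same quantity from the definition: covariance over the product of the spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print("r from corrcoef: ", r)
print("r from formula:  ", r_manual)
```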


## Pandas Correlation Calculation

Run the code below

In [19]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
x


0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64

Call the relevant method to calculate Pearson's r correlation.

In [16]:
r = stats.pearsonr(x, y)
r   

(0.758640289091187, 0.010964341301680813)
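
The cell above uses SciPy, which returns the coefficient together with a p-value. Since this section is about Pandas, it may help to see that `Series.corr` gives the same coefficient directly (Pearson is its default method). A minimal sketch:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Pandas: returns only the coefficient.
r_pandas = x.corr(y)

# SciPy: returns the coefficient and the two-sided p-value.
r_scipy, p_value = stats.pearsonr(x, y)

print("pandas r: ", r_pandas)
print("scipy r:  ", r_scipy, " p-value: ", p_value)
```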

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [20]:
rho = stats.spearmanr(x, y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
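
Spearman's rho is much higher than Pearson's r here because y increases almost monotonically with x but not linearly. Spearman's rho is Pearson's r computed on the ranks of the data, which the following sketch checks with `Series.rank` and `Series.corr`.

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho = x.corr(y, method="spearman")

# Equivalent: Pearson correlation of the rank-transformed series.
rho_from_ranks = x.rank().corr(y.rank())

print("rho:            ", rho)
print("rho from ranks: ", rho_from_ranks)
```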

## Seaborn Dataset Tips

Import Seaborn Library

In [21]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [24]:
tips = sns.load_dataset("tips")
tips.head()

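|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |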


Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [25]:
tips.describe()

|   | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate pairwise Pearson's r correlation of columns (plus heatmap)

In [26]:
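# Restrict to numeric columns: recent pandas versions raise an error if
# corr() is called on a frame that still contains categorical columns.
tips.select_dtypes(include="number").corr()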

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
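
The heatmap mentioned in the task is not shown above. One way to draw it is `sns.heatmap` on the correlation matrix. To stay runnable offline, the sketch below does not call `sns.load_dataset`; as an assumption it rebuilds a tiny stand-in frame (`tips_head`, a hypothetical name) from the five rows printed by `tips.head()`, so its coefficients will differ from the full-dataset values above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for the full dataset: the five rows shown by tips.head().
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = tips_head.select_dtypes(include="number").corr()

# Fix the color scale to [-1, 1] so colors are comparable across plots.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Pearson correlation")
```

In the notebook itself, replacing `tips_head` with the loaded `tips` frame and calling `plt.show()` would display the heatmap for the full dataset.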
