## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [16]:
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated private path

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

In [5]:
samples = np.random.normal(loc=100, scale=15, size=1000)
samples

array([ 90.79511774,  98.04533992, 118.14385204, 101.07675034,
        89.59476087,  91.46655641,  83.87400076,  91.36685804,
        98.63242464, 105.39887298,  88.56209262, 124.82903722,
       106.49219712,  90.99334219, 107.564706  ,  91.76695969,
       118.24175517,  83.99842568,  83.09715273, 102.81445636,
        90.4594741 , 112.76011823, 117.85546976,  96.44534428,
       101.43867117,  87.93907892, 100.16586323, 102.31936833,
       108.69833814, 129.15315844,  95.6020885 ,  88.35595623,
       104.67559768,  93.25506864, 111.26503016, 100.31914166,
       122.17127101, 129.29838204,  97.66999545, 121.41232297,
        69.1432624 ,  76.09986749, 115.90351506,  96.80794414,
       110.94756853, 113.31273182,  38.78401805, 100.75102262,
        67.21738409, 101.54818387, 120.93326796, 116.83932377,
        74.53108448, 108.40272913, 105.96070293,  97.53517806,
       120.02322392, 116.65672674, 127.92790585,  92.43961107,
       113.3615319 , 118.63643412,  96.66084968, 116.39

Compute the **mean**, **median**, and **mode**

In [14]:
mean = np.mean(samples)
median = np.median(samples)
mode = st.mode(samples)

print(mean)
print(median)
print(mode)

100.03377311189415
100.20941155005609
ModeResult(mode=array([38.78401805]), count=array([1]))
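The mode reported above has a count of 1 because every value in a continuous sample is effectively unique, so `st.mode` simply returns the smallest value. One common workaround, sketched here as an assumption rather than as part of the original exercise, is to bin the data and report the midpoint of the most populated bin. The seed and the choice of 20 bins are arbitrary.

```python
import numpy as np

# Assumption: a seeded generator is used only so the sketch is reproducible.
rng = np.random.default_rng(0)
samples = rng.normal(loc=100, scale=15, size=1000)

# Count how many samples fall into each of 20 equal-width bins.
counts, edges = np.histogram(samples, bins=20)

# The modal bin is the one with the highest count; report its midpoint.
modal_bin = np.argmax(counts)
binned_mode = (edges[modal_bin] + edges[modal_bin + 1]) / 2
print(binned_mode)
```

The result depends on the bin width, so it is an estimate of where the data is densest rather than an exact mode.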


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [24]:
df = pd.DataFrame(samples)

In [23]:
# Use names that do not shadow the built-in min() and max()
min_val = df.min()
max_val = df.max()
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

print(min_val)
print(max_val)
print(q1)
print(q3)
print(iqr)

0    38.784018
dtype: float64
0    149.762142
dtype: float64
0    89.037231
Name: 0.25, dtype: float64
0    110.919218
Name: 0.75, dtype: float64
0    21.881987
dtype: float64
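Each result above is printed as a one-element Series because the statistics were computed on a DataFrame. The sketch below, which assumes a seeded sample rather than the notebook's own data, shows how a `pd.Series` yields plain scalars and checks the quartiles against `np.percentile`.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator stands in for the notebook's samples.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(loc=100, scale=15, size=1000))

# quantile() with a list returns a Series; unpacking yields two scalars.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1

# np.percentile uses the same default (linear) interpolation.
q1_np, q3_np = np.percentile(s, [25, 75])
print(q1, q3, iqr)
```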


Compute the **variance** and **standard deviation**

In [33]:
import statistics

In [36]:
variance = df.var()
std_dev = df.std()

print(variance)
print(std_dev)

0    235.866445
dtype: float64
0    15.357944
dtype: float64
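Pandas and NumPy use different defaults here: `DataFrame.var()` and `DataFrame.std()` divide by n - 1 (`ddof=1`, the sample estimate), while `np.var()` and `np.std()` divide by n (`ddof=0`). A small sketch on a hand-picked array (an illustrative assumption, not the notebook's data) makes the difference visible.

```python
import numpy as np
import pandas as pd

# Illustrative data: mean is 5 and the squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)               # divides by n      -> 32 / 8 = 4.0
sample_var = pd.Series(data).var()   # divides by n - 1  -> 32 / 7
sample_var_np = np.var(data, ddof=1) # NumPy matches pandas once ddof=1

print(pop_var, sample_var, sample_var_np)
```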


Compute the **skewness** and **kurtosis**

In [28]:
skewness = st.skew(samples)
kurtosis = st.kurtosis(samples)  # excess (Fisher) kurtosis: about 0 for normal data

print(skewness)
print(kurtosis)

-0.03626201896317395
-0.004400726031717372
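Both values are close to 0, as expected for normal data. With their default arguments, `st.skew` and `st.kurtosis` are computed from central moments. The sketch below reproduces them by hand on a seeded sample (an assumption made for reproducibility).

```python
import numpy as np
import scipy.stats as st

# Assumption: a seeded generator stands in for the notebook's samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=100, scale=15, size=1000)

# Central moments of order 2, 3, and 4.
dev = samples - samples.mean()
m2 = np.mean(dev**2)
m3 = np.mean(dev**3)
m4 = np.mean(dev**4)

skew_manual = m3 / m2**1.5   # biased sample skewness
kurt_manual = m4 / m2**2 - 3 # excess kurtosis (normal distribution -> 0)

print(skew_manual, st.skew(samples))
print(kurt_manual, st.kurtosis(samples))
```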


## NumPy Correlation Calculation

Create an array x of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [37]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [63]:
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
y

array([ 2,  1,  4,  5,  8, 12, 18, 25, 96, 48])

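Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y with `np.corrcoef()`. It returns a 2×2 correlation matrix whose off-diagonal entries are Pearson's r.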

In [9]:
r = np.corrcoef(x, y)
r

array([[1.        , 0.75864029],
       [0.75864029, 1.        ]])
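To pull the single coefficient out of the matrix, index the off-diagonal entry. The sketch below also recomputes r from its definition as a sanity check. It assumes y holds the integer values used in the Pandas section of this notebook.

```python
import numpy as np

x = np.arange(10, 20)
# Assumption: y reuses the integer values from the Pandas section.
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_matrix = np.corrcoef(x, y)
r = r_matrix[0, 1]  # off-diagonal entry is the correlation of x with y

# Same value from the definition: sum of co-deviations over the
# square root of the product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(r, r_manual)
```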

## Pandas Correlation Calculation

Run the code below to build two Series and combine them into a two-column DataFrame.

In [26]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
df3 = pd.DataFrame([x, y]).T

In [27]:
df3

|   | 0 | 1 |
|---|---|---|
| 0 | 10 | 2 |
| 1 | 11 | 1 |
| 2 | 12 | 4 |
| 3 | 13 | 5 |
| 4 | 14 | 8 |
| 5 | 15 | 12 |
| 6 | 16 | 18 |
| 7 | 17 | 25 |
| 8 | 18 | 96 |
| 9 | 19 | 48 |


Call the relevant method to calculate Pearson's r correlation: `pearsonr()` from SciPy for the two Series, or `DataFrame.corr()` for the whole DataFrame.

In [17]:
pearsonr(x, y)

(0.7586402890911869, 0.010964341301680832)
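The pair returned by `pearsonr` is the correlation coefficient followed by the two-sided p-value. A minimal sketch of unpacking it, using NumPy arrays with the same values as the Series above:

```python
import numpy as np
from scipy.stats import pearsonr

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# The result unpacks into (statistic, p-value).
r, p_value = pearsonr(x, y)
print(f"r = {r:.4f}, p = {p_value:.4f}")
```

A p-value of about 0.011 means a correlation this strong would be unlikely if x and y were uncorrelated, although with only 10 points the estimate of r itself is still imprecise.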

In [29]:
df3.corr()

|   | 0 | 1 |
|---|---|---|
| 0 | 1.0 | 0.75864 |
| 1 | 0.75864 | 1.0 |


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [32]:
rho = df3.corr(method ="spearman")
rho

|   | 0 | 1 |
|---|---|---|
| 0 | 1.0 | 0.975758 |
| 1 | 0.975758 | 1.0 |
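Spearman's rho is Pearson's r applied to the ranks of the data, which is why it is much higher here: y increases almost monotonically with x even though the relationship is far from linear. A short sketch that checks this equivalence on the same values:

```python
import numpy as np
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho = x.corr(y, method="spearman")

# Pearson correlation of the ranks gives the same number.
rho_from_ranks = x.rank().corr(y.rank())

# With no ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
# rank difference; here sum(d^2) = 4 and n = 10.
rho_formula = 1 - 6 * 4 / (10 * (10**2 - 1))

print(rho, rho_from_ranks, rho_formula)
```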


## Seaborn Dataset Tips

Import the Seaborn library

In [33]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [35]:
tips = sns.load_dataset("tips")
tips

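|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |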


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [37]:
tips.describe().T

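|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_bill | 244.0 | 19.785943 | 8.902412 | 3.07 | 13.3475 | 17.795 | 24.1275 | 50.81 |
| tip | 244.0 | 2.998279 | 1.383638 | 1.0 | 2.0 | 2.9 | 3.5625 | 10.0 |
| size | 244.0 | 2.569672 | 0.9511 | 1.0 | 2.0 | 2.0 | 3.0 | 6.0 |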


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [38]:
tips.corr(numeric_only=True)  # pandas >= 2.0 raises on non-numeric columns otherwise

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
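Only the three numeric columns appear in the matrix, because correlation is undefined for categorical columns such as `sex` or `day`. A version-independent way to make that selection explicit is `select_dtypes`. The sketch below uses the first five rows shown earlier as a stand-in DataFrame (an assumption, so it runs without downloading the dataset), which means its coefficients will differ from the full-dataset values above.

```python
import numpy as np
import pandas as pd

# Assumption: the first five rows of the tips data stand in for the full set.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```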
