## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [None]:
import numpy as np
from scipy import stats
import pandas as pd

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

The default call is `np.random.normal(loc=0.0, scale=1.0, size=None)`. Modify its arguments so that `loc` is the mean, `scale` is the standard deviation, and `size` is the number of samples.

In [None]:
samples = np.random.normal(loc=100, scale=15, size=1000)
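Because `samples` is drawn without a seed, the numbers printed in the cells below will differ on every run. Here is a minimal sketch of a reproducible alternative. It assumes NumPy 1.17 or later, which introduced `np.random.default_rng`. The seed 42 and the name `seeded_samples` are arbitrary illustrative choices, and the sketch deliberately does not overwrite `samples`:

```python
import numpy as np

# Create a seeded Generator; the seed (42) is arbitrary, any fixed integer works
rng = np.random.default_rng(42)

# Generator.normal takes the same loc/scale/size arguments as np.random.normal
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating a Generator with the same seed reproduces exactly the same draws
samples_again = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, samples_again))  # True
```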

Compute the **mean**, **median**, and **mode**

In [None]:
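mean = np.mean(samples)
print("mean:", mean)
median = np.median(samples)
print("median:", median)
# For continuous data almost every value is unique, so stats.mode returns the
# smallest value with a count of 1; it is not a meaningful mode here
mode = stats.mode(samples)
print("mode:", mode)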

mean: 99.78444746744474
median: 100.19276022566262
mode: ModeResult(mode=array([48.24429899]), count=array([1]))
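For continuous data, the mode computed above is not meaningful. A common workaround is to round (or bin) the values first and then take the most frequent rounded value. The sketch below is a minimal version of that idea. The helper name `approximate_mode` and the default of rounding to whole numbers are illustrative assumptions, not part of the exercise:

```python
import numpy as np

def approximate_mode(data, decimals=0):
    """Return the most frequent value after rounding `data` to `decimals` places."""
    rounded = np.round(np.asarray(data), decimals)
    # np.unique returns the sorted distinct values and how often each occurs
    values, counts = np.unique(rounded, return_counts=True)
    # argmax picks the first (smallest) value among ties, like stats.mode
    return values[np.argmax(counts)]

print(approximate_mode([1.1, 2.2, 2.3, 2.4, 3.9]))  # 2.0
```

On the notebook's data, `approximate_mode(samples)` should land somewhere near 100. The exact value depends on the random draw.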


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
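# Avoid naming variables `min`/`max`, which would shadow Python's built-ins
min_val = np.min(samples)
print("Min:", min_val)
max_val = np.max(samples)
print("Max:", max_val)
q1 = np.percentile(samples, 25)
print("Q1:", q1)
q3 = np.percentile(samples, 75)
print("Q3:", q3)
iqr = stats.iqr(samples)  # equals q3 - q1 under the default linear interpolation
print("IQR:", iqr)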

Min: 48.244298985948056
Max: 146.20602498419137
Q1: 89.40270659910324
Q3: 109.74099972293683
IQR: 20.338293123833594
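The IQR is simply Q3 minus Q1. The short sketch below uses a small hypothetical array so the result can be checked by hand. It confirms that `stats.iqr` matches the difference of the two `np.percentile` values under NumPy's default linear interpolation:

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical example values

# np.percentile accepts a list of percentiles and returns them together
q1, q3 = np.percentile(data, [25, 75])
iqr_manual = q3 - q1

print(q1, q3, iqr_manual)  # 3.0 7.0 4.0
print(stats.iqr(data))     # 4.0
```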


Compute the **variance** and **standard deviation**

In [None]:
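# np.var and np.std default to ddof=0 (population formulas, dividing by n);
# pass ddof=1 for the sample variance/standard deviation (dividing by n - 1)
variance = np.var(samples)
print("variance:", variance)
std_dev = np.std(samples)
print("standard deviation:", std_dev)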

variance: 209.69845715324303
standard deviation: 14.48096879194355
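`np.var` and `np.std` divide by n by default (`ddof=0`). pandas' `Series.var()` and `Series.std()`, and therefore the `std` column of `describe()`, divide by n − 1 (`ddof=1`) instead. The small sketch below uses hypothetical data to show the two conventions side by side and how they relate:

```python
import numpy as np
import pandas as pd

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical values
n = len(data)

pop_var = np.var(data)             # divides by n:     32 / 8 = 4.0
sample_var = np.var(data, ddof=1)  # divides by n - 1: 32 / 7 ≈ 4.571

# pandas defaults to ddof=1, so it matches the sample variance
print(pd.Series(data).var())

# The two estimates differ by the factor n / (n - 1)
print(np.isclose(sample_var, pop_var * n / (n - 1)))  # True
```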


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

In [None]:
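skewness = stats.skew(samples)
print("skewness:", skewness)
# By default stats.kurtosis returns *excess* kurtosis (Fisher's definition),
# so a normal distribution gives a value near 0 rather than 3
kurtosis = stats.kurtosis(samples)
print("kurtosis:", kurtosis)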

skewness: -0.07602993316602473
kurtosis: 0.19658682413004858
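With their default arguments (`bias=True`, plus `fisher=True` for kurtosis), `stats.skew` and `stats.kurtosis` are built from the central moments `m_k = mean((x - mean(x))**k)`. Skewness is `m3 / m2**1.5`, and excess kurtosis is `m4 / m2**2 - 3`. The sketch below recomputes both by hand and checks them against SciPy. The seeded generator and sample size are arbitrary choices made for reproducibility:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
data = rng.normal(loc=100, scale=15, size=1000)

# Central moments m_k = mean((x - mean)^k)
dev = data - data.mean()
m2 = np.mean(dev**2)
m3 = np.mean(dev**3)
m4 = np.mean(dev**4)

skew_manual = m3 / m2**1.5
excess_kurt_manual = m4 / m2**2 - 3  # subtract 3 so a normal distribution scores ~0

print(np.isclose(skew_manual, stats.skew(data)))             # True
print(np.isclose(excess_kurt_manual, stats.kurtosis(data)))  # True
```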


## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [None]:
x = np.arange(10,20,1)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [None]:
y = np.array([1, 2, 10, 4, 5, 6, 7, 5, 9, 3])

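Once you have two arrays of the same length, compute the **Pearson correlation coefficient** between `x` and `y` with `np.corrcoef()`. It returns a 2×2 correlation matrix. The diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entries are r between `x` and `y`.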

In [None]:
r = np.corrcoef(x, y)
r

array([[1.        , 0.32921944],
       [0.32921944, 1.        ]])
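Pearson's r is the sum of the products of the deviations of `x` and `y` from their means, divided by the square root of the product of their sums of squared deviations. The short sketch below recomputes it for the same `x` and `y` and compares the result with the off-diagonal entry of the `np.corrcoef` matrix:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 2, 10, 4, 5, 6, 7, 5, 9, 3])

# r = sum((x - x_mean) * (y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2))
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual)
print(np.isclose(r_manual, r_numpy))  # True
```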

## Pandas Correlation Calculation

Run the code below

In [None]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Calculate Pearson's r with `scipy.stats.pearsonr()`, which returns the correlation coefficient and a two-sided p-value.

In [None]:
r = stats.pearsonr(x, y)
r

(0.758640289091187, 0.010964341301680813)
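pandas can compute the same coefficient directly with `Series.corr()`, which defaults to `method="pearson"`. Unlike `pearsonr`, it returns only the coefficient and no p-value. A brief sketch with the same `x` and `y`:

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_pandas = x.corr(y)  # Pearson is the default method
r_scipy, p_value = stats.pearsonr(x, y)

print(r_pandas)
print(np.isclose(r_pandas, r_scipy))  # True
```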

OPTIONAL. Calculate Spearman's rho with `scipy.stats.spearmanr()`.

In [None]:
rho = stats.spearmanr(x,y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
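Spearman's rho is Pearson's r computed on the ranks of the data rather than on the raw values. It therefore measures monotonic rather than strictly linear association. The sketch below ranks both Series with `Series.rank()`, passes the ranks to `pearsonr`, and cross-checks the result against pandas' `method="spearman"` option:

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank (1 = smallest); ties would get average ranks
rho_manual, _ = stats.pearsonr(x.rank(), y.rank())

rho_pandas = x.corr(y, method="spearman")

print(rho_manual)
print(np.isclose(rho_manual, rho_pandas))  # True
```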

## Seaborn Tips Dataset

Import the Seaborn library

In [None]:
import seaborn as sns

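Load the `tips` dataset from Seaborn. `sns.load_dataset()` downloads it from Seaborn's online example-data repository, so the first call needs an internet connection.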

In [None]:
tips = sns.load_dataset("tips")

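Generate descriptive statistics that summarize the central tendency, dispersion, and shape of each numeric column's distribution. The `.T` transposes the result so that each column becomes a row. Note that the `std` column is the sample standard deviation (`ddof=1`).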

In [None]:
tips.describe().T

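| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_bill | 244.0 | 19.785943 | 8.902412 | 3.07 | 13.3475 | 17.795 | 24.1275 | 50.81 |
| tip | 244.0 | 2.998279 | 1.383638 | 1.0 | 2.0 | 2.9 | 3.5625 | 10.0 |
| size | 244.0 | 2.569672 | 0.9511 | 1.0 | 2.0 | 2.0 | 3.0 | 6.0 |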
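The quartile columns of `describe()` can be combined to recover the interquartile range. Because `sns.load_dataset` needs a network connection, the sketch below uses a small hypothetical DataFrame that borrows two of the tips column names, with invented values. With the real data, the same expression would be applied to `tips.describe().T`:

```python
import pandas as pd

# Hypothetical stand-in for the tips data (values invented for illustration)
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.0, 2.0, 3.0, 4.0, 10.0],
})

summary = df.describe().T  # one row per column, as in the table above

# IQR = 75th percentile - 25th percentile, computed per row
summary["IQR"] = summary["75%"] - summary["25%"]
print(summary[["25%", "75%", "IQR"]])
```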
