## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np

from scipy import stats
import pandas as pd


Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

np.random.normal(loc=0.0, scale=1.0, size=None)  # default arguments; modify them for this task

Set `loc` to the mean, `scale` to the standard deviation, and `size` to the number of samples.
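For results that are identical on every run, here is a minimal sketch using NumPy's `Generator` interface (`np.random.default_rng`), which NumPy recommends over the legacy `np.random.*` functions for new code. The seed `42` is an arbitrary choice and not part of the exercise; the `loc`, `scale`, and `size` arguments mean the same thing as above.

```python
import numpy as np

# Seeded generator: 42 is an arbitrary seed; any fixed integer makes runs repeatable.
rng = np.random.default_rng(42)

# Same parameters as the exercise: mean (loc) 100, standard deviation (scale) 15, 1,000 draws.
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)  # (1000,)
print(samples[:5])    # first five draws; identical on every run with the same seed
```

The legacy `np.random.normal()` call used in this notebook is still valid; it simply draws from a hidden global state, so its output changes each time the cell runs unless `np.random.seed()` is called first.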


In [8]:
samples = np.random.normal(loc=100, scale=15, size=1000)
samples

array([ 74.9040499 , 101.10335587,  98.91760573,  90.22880039,
        96.2529715 ,  80.75393997,  86.45952166,  97.25218815,
       114.37123701, 109.9629513 , 107.89926184, 118.54344808,
        98.63888883, 114.24775187,  85.03462122,  82.68666714,
        99.02058073,  92.56572605,  72.39622075,  93.85665518,
       115.40482795,  81.57052565, 122.10446564, 111.49568242,
       107.65490008, 105.20666838,  94.64721167, 105.2381689 ,
       128.79985217,  67.93181256, 111.14394777, 107.12604133,
       102.15605926,  93.88558913,  84.66154477,  95.30835248,
       100.2478269 ,  99.35126815, 119.74795331, 104.48150906,
        83.054332  , 102.93925534,  90.81529132,  95.62143486,
       102.36938956, 106.25610546,  98.48743817,  98.94758839,
       134.03928954,  80.98218772,  91.50227943, 110.75339472,
       122.07876371,  89.88780937, 103.32608503, 110.89002653,
        78.27946244,  88.87472284,  90.64669109, 107.84431161,
       113.79080614, 104.63505139, 110.28714875,  90.07

Compute the **mean**, **median**, and **mode**

In [10]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(mean)
print(median)
print(mode)

99.46469828545806
99.38684464544204
ModeResult(mode=array([56.81444966]), count=array([1]))
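Because the samples are continuous floats, every value is almost surely unique, so `stats.mode()` returns the smallest of the tied values with a count of 1 (the mode reported above is the same number as the minimum computed in the next step). A sketch of two common workarounds: rounding before counting, and taking the midpoint of the fullest histogram bin. The seed, the rounding to whole numbers, and the 30 bins are arbitrary choices for illustration, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the sketch is repeatable
samples = rng.normal(loc=100, scale=15, size=1000)

# Raw floats: all values are distinct, so no value repeats.
print(len(np.unique(samples)) == samples.size)  # True

# Workaround 1: round to whole numbers, then pick the most frequent value.
values, counts = np.unique(np.round(samples), return_counts=True)
rounded_mode = values[np.argmax(counts)]

# Workaround 2: bin the data and take the midpoint of the most populated bin.
hist_counts, edges = np.histogram(samples, bins=30)
peak = np.argmax(hist_counts)
binned_mode = (edges[peak] + edges[peak + 1]) / 2

print(rounded_mode, counts.max())
print(binned_mode)
```

Both estimates depend on the chosen resolution (rounding precision or bin width), so they should be read as rough locations of the peak rather than exact values.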


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [13]:
min_val = np.min(samples)  # avoid shadowing the built-in min()
max_val = np.max(samples)  # avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = stats.iqr(samples)

print(min_val)
print(max_val)
print(q1)
print(q3)
print(iqr)



56.81444965872258
143.766883877925
89.29464794416687
110.38843963250105
21.09379168833418


In [15]:
iqr = q3 - q1  # manual check: should match stats.iqr(samples)
print(iqr)

21.09379168833418
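`stats.iqr()` uses the same linear interpolation as `np.percentile()` by default, which is why both approaches give the same value. A sketch that computes the five-number summary (min, Q1, median, Q3, max) in a single `np.percentile()` call, on a small made-up array so the numbers can be checked by hand:

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # illustrative data, not from the exercise

# One call returns the 0th, 25th, 50th, 75th, and 100th percentiles.
five_num = np.percentile(data, [0, 25, 50, 75, 100])
print(five_num)  # [1. 3. 5. 7. 9.]

q1, q3 = five_num[1], five_num[3]
print(q3 - q1)          # 4.0
print(stats.iqr(data))  # 4.0 (same linear interpolation by default)
```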


Compute the **variance** and **standard deviation**

In [21]:
variance = np.var(samples)
std_dev = np.std(samples)
print(variance)
print(std_dev)

218.29078574374248
14.7746670264931


In [23]:
std = np.sqrt(np.var(samples))
print(std)

14.7746670264931
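`np.var()` and `np.std()` default to `ddof=0`: they divide the sum of squared deviations by n and give the *population* variance. When the data are treated as a sample, passing `ddof=1` divides by n - 1 instead (Bessel's correction). A worked sketch on a small made-up array where the arithmetic is easy to follow:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean is 5.0
squared_dev_sum = np.sum((data - data.mean()) ** 2)        # 32.0

pop_var = np.var(data)             # ddof=0: 32 / 8 = 4.0
sample_var = np.var(data, ddof=1)  # ddof=1: 32 / 7, about 4.571
pop_std = np.std(data)             # sqrt(4.0) = 2.0
sample_std = np.std(data, ddof=1)  # sqrt(32 / 7), about 2.138

print(pop_var, sample_var)
print(pop_std, sample_std)
```

Note that pandas' `Series.var()` and `Series.std()` default to `ddof=1`, so the same data can give slightly different numbers in NumPy and pandas unless `ddof` is set explicitly.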


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

In [25]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)
print(skewness)
print(kurtosis)


-0.07203497449251407
-0.3057627354485142
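By default `stats.skew()` returns the biased sample skewness `m3 / m2**1.5`, and `stats.kurtosis()` returns Fisher's *excess* kurtosis `m4 / m2**2 - 3`, where `mk` is the k-th central moment (mean of the k-th power of deviations from the mean). That is why a normal sample gives values near 0 for both, rather than a kurtosis near 3. A sketch that reproduces both from the moment formulas on a small made-up array, and shows `fisher=False` for Pearson's (non-excess) definition:

```python
import numpy as np
from scipy import stats

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative data

# Central moments divided by n, matching SciPy's default bias=True
dev = data - data.mean()
m2 = np.mean(dev ** 2)  # 4.0
m3 = np.mean(dev ** 3)  # 5.25
m4 = np.mean(dev ** 4)  # 44.5

skew_manual = m3 / m2 ** 1.5           # 0.65625
excess_kurt_manual = m4 / m2 ** 2 - 3  # -0.21875 (Fisher: normal -> 0)

print(skew_manual, stats.skew(data))
print(excess_kurt_manual, stats.kurtosis(data))
print(stats.kurtosis(data, fisher=False))  # Pearson definition: excess + 3
```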


## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [28]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array `y` containing 10 arbitrary integers.

In [33]:
y = np.array([1, 3, 5, 6, 8, 5, 7, 6, 8, 9])
y

array([1, 3, 5, 6, 8, 5, 7, 6, 8, 9])

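Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`. It returns a 2×2 correlation matrix: the diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entries are Pearson's r between `x` and `y`.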

In [35]:
r = np.corrcoef(x, y)
r

array([[1.        , 0.84212907],
       [0.84212907, 1.        ]])
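Pearson's r is the covariance of `x` and `y` divided by the product of their standard deviations. A sketch that computes it directly from that definition for the same two arrays and compares it with the off-diagonal entry `r[0, 1]` of the `np.corrcoef()` matrix:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 3, 5, 6, 8, 5, 7, 6, 8, 9])

# Center both arrays on their means.
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)); the 1/n factors cancel.
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(r_manual, r_numpy)
```

For these arrays the three sums are 56, 82.5, and 53.6, so r = 56 / sqrt(82.5 × 53.6), which matches the 0.84212907 shown above.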

## Pandas Correlation Calculation

Run the code below.

In [37]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
print(x)
print(y)

0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64
0     2
1     1
2     4
3     5
4     8
5    12
6    18
7    25
8    96
9    48
dtype: int64


Call `Series.corr()` to calculate Pearson's r.

In [64]:
r = x.corr(y, method="pearson")
r

0.7586402890911867
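If a significance test is also needed, `scipy.stats.pearsonr()` returns the same r together with a two-sided p-value for the null hypothesis of zero correlation. A minimal sketch with the same two Series; converting them to NumPy arrays first is only a precaution:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# pearsonr returns (statistic, p-value); tuple unpacking works across SciPy versions.
r_scipy, p_value = stats.pearsonr(x.to_numpy(), y.to_numpy())

print(r_scipy, x.corr(y, method="pearson"))  # same r from both libraries
print(p_value)
```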

OPTIONAL. Pass `method="spearman"` to `Series.corr()` to calculate Spearman's rho.

In [67]:
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
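Spearman's rho is Pearson's r computed on the ranks of the data. Because it only uses ranks, it measures how consistently `y` rises with `x` and ignores how far apart the values are, which is why rho (0.98) is much higher than Pearson's r (0.76) for this strongly nonlinear `y` with its outlying 96. A sketch that ranks both Series and also checks the no-ties shortcut `1 - 6 * sum(d**2) / (n * (n**2 - 1))`, where `d` is the difference between paired ranks:

```python
import numpy as np
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho = Pearson's r of the ranks.
rho_from_ranks = x.rank().corr(y.rank())

# Shortcut valid only when there are no ties.
d = x.rank() - y.rank()
n = len(x)
rho_formula = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(rho_from_ranks, rho_formula, x.corr(y, method="spearman"))
```

Here only two pairs of ranks are swapped (values 2/1 and 96/48), so the squared rank differences sum to 4 and rho = 1 - 24 / 990.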

## Seaborn Dataset: Tips

Import the **Seaborn** library

In [42]:
import seaborn as sns

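Load the `tips` dataset from Seaborn (`sns.load_dataset()` fetches it from the online seaborn-data repository, so the first call needs an internet connection)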

In [44]:
tips = sns.load_dataset("tips")
tips.head()

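|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |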


Generate descriptive statistics, including those that summarize the central tendency, dispersion, and shape of each numeric column's distribution

In [48]:
tips.describe()

|   | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |
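By default `describe()` summarizes only numeric columns, which is why `sex`, `smoker`, `day`, and `time` are missing from the table above. Passing `include="all"` (or `exclude=["number"]` for just the non-numeric columns) adds count, unique, top (most frequent value), and freq. A sketch on the first four rows of `tips`, typed in by hand with `sex` stored as plain strings (in the Seaborn dataset it is a categorical column, but `describe()` treats it the same way):

```python
import pandas as pd

# First four rows of the tips table above, entered by hand (sex as plain strings).
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "sex": ["Female", "Male", "Male", "Male"],
})

numeric_summary = df.describe()                  # numeric columns only
text_summary = df.describe(exclude=["number"])   # count, unique, top, freq
full_summary = df.describe(include="all")        # union of both, NaN where not applicable

print(numeric_summary)
print(text_summary)
```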


Call `DataFrame.corr()` to calculate the pairwise Pearson's r correlation between the numeric columns

In [49]:
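tips.corr(numeric_only=True)  # pandas 2.0+ raises an error on the non-numeric columns (sex, smoker, day, time) without this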

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |


In [50]:
tips.corr(method="spearman", numeric_only=True)

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.678968 | 0.604791 |
| tip | 0.678968 | 1.0 | 0.468268 |
| size | 0.604791 | 0.468268 | 1.0 |


In [51]:
tips.corr(method="kendall", numeric_only=True)

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.517181 | 0.484342 |
| tip | 0.517181 | 1.0 | 0.378185 |
| size | 0.484342 | 0.378185 | 1.0 |
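Kendall's tau is based on pairs of observations: a pair is concordant if both variables move in the same direction and discordant if they move in opposite directions, and with no ties tau = (concordant - discordant) / total pairs. It typically has a smaller magnitude than Spearman's rho, as the two tables above illustrate. A sketch on the `x` and `y` Series from the Pandas section, counting pairs by brute force and checking against `scipy.stats.kendalltau()` and `Series.corr(method="kendall")`:

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

concordant = discordant = 0
for i, j in combinations(range(len(x)), 2):
    # The sign of the product tells whether x and y move the same way for this pair.
    s = np.sign((x[i] - x[j]) * (y[i] - y[j]))
    if s > 0:
        concordant += 1
    elif s < 0:
        discordant += 1  # s == 0 would be a tie; none occur in this data

n_pairs = len(x) * (len(x) - 1) // 2
tau_manual = (concordant - discordant) / n_pairs

tau_scipy, _ = stats.kendalltau(x, y)
print(concordant, discordant)  # 43 2
print(tau_manual, tau_scipy, x.corr(y, method="kendall"))
```

Only the pairs (2, 1) and (96, 48) are discordant, so tau = (43 - 2) / 45. The brute-force loop is O(n²) and is meant only to show the definition; the library functions should be used on real data, especially since they also handle ties.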
