## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [4]:
import numpy as np
from scipy import stats 
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [5]:
samples = np.random.normal(100, 15, 1000)
samples

array([131.85404985, 109.59978544,  98.70052877, 127.1005821 ,
       106.68508239, 104.3381121 ,  87.30421952, 113.24168764,
       100.22517811,  96.08030944,  78.79073708,  86.34562313,
       114.03124276,  98.71594343, 108.19721227, 103.29765509,
       108.79107035,  96.34264413,  93.75978142,  98.77605581,
        93.31267333, 100.89001383, 108.17478099, 105.88967564,
       120.14279377,  83.11225249, 129.21520294,  95.84880123,
       105.86950106,  87.3700774 ,  91.81344219, 106.10101198,
        98.61994213, 102.45154812, 120.5971232 , 113.04819591,
       100.46379352, 112.04763161,  90.27094323, 102.21047918,
        94.50175431, 100.20054962,  83.36081393, 108.16247449,
       111.32346193,  92.49810249,  94.59575922,  76.40296957,
        82.94955088, 102.32851224, 109.98593931,  63.00119755,
        86.24298078,  68.80656463,  95.26434364,  98.1982624 ,
        80.11789954,  94.43461291, 121.18722404,  97.22224527,
       106.87413245,  98.64990716,  98.82263756,  80.23
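Because `np.random.normal()` draws new values on every run, the numbers printed in this notebook will not match a re-run. A minimal sketch of one way to make the draw repeatable, assuming NumPy's `np.random.default_rng` generator is acceptable here; the seed value 42 and the names `samples_seeded` and `samples_again` are arbitrary choices for illustration, not part of the original exercise:

```python
import numpy as np

# Seed 42 is an arbitrary illustrative choice.
rng = np.random.default_rng(42)
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# A second generator built from the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(samples_seeded.shape)
print(np.array_equal(samples_seeded, samples_again))
```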

Compute the **mean**, **median**, and **mode**

In [4]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)

print(mean)
print(median)
print(mode)

100.21604236075771
100.45506196444086
ModeResult(mode=array([56.53368999]), count=array([1]))
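The reported mode has a count of 1 because continuous samples almost never repeat, so `stats.mode` has nothing to distinguish between values. One possible workaround, sketched here under the assumption that rounding to the nearest integer is an acceptable binning, is to count rounded values instead. The seeded generator and the names `rounded` and `binned_mode` are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for illustration
samples = rng.normal(100, 15, 1000)

# Round to whole numbers so that values can actually repeat.
rounded = np.round(samples).astype(int)

# Count how often each rounded value occurs and pick the most frequent one.
values, counts = np.unique(rounded, return_counts=True)
binned_mode = values[np.argmax(counts)]

print(binned_mode, counts.max())
```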


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [6]:
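# Use names that do not shadow the built-in min() and max()
min_val = np.min(samples)
max_val = np.max(samples)
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr1 = q3 - q1               # IQR computed by hand
iqr2 = stats.iqr(samples)    # IQR computed by SciPy

print(min_val)
print(max_val)
print(q1)
print(q3)
print(iqr1)
print(iqr2)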


56.53368999241994
152.74854074641433
91.27524839962855
109.61077363860036
18.33552523897181
18.33552523897181
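The two IQR values agree because `stats.iqr` is the difference between the 75th and 25th percentiles. As a compact alternative, `np.percentile` accepts a list of percentiles, so the whole five-number summary can be obtained in one call. A small sketch, using a seeded generator and illustrative variable names that are not part of the original notebook:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # arbitrary seed for illustration
samples = rng.normal(100, 15, 1000)

# One call returns every requested cut point, in the order requested.
minimum, q1, median, q3, maximum = np.percentile(samples, [0, 25, 50, 75, 100])
iqr = q3 - q1

print(minimum, q1, median, q3, maximum)
print(iqr, stats.iqr(samples))
```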


Compute the **variance** and **standard deviation**

In [7]:
variance = np.var(samples)
std_dev = np.std(samples)

print(variance)
print(std_dev)

202.44843099168654
14.228437405129439
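Note that `np.var` and `np.std` divide by n by default (`ddof=0`), which gives the population variance. If the sample variance (dividing by n - 1) is wanted, `ddof=1` can be passed. A brief sketch of the difference, with a seeded generator and variable names chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for illustration
samples = rng.normal(100, 15, 1000)
n = samples.size

pop_var = np.var(samples)               # divides by n (ddof=0, the default)
sample_var = np.var(samples, ddof=1)    # divides by n - 1
sample_std = np.std(samples, ddof=1)    # square root of the sample variance

print(pop_var, sample_var, sample_std)
```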


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

In [6]:
from scipy.stats import kurtosis, skew
from scipy import stats

In [7]:
# Store the results under names that differ from the imported functions.
# Assigning a result to `kurtosis` rebinds that name to a float, so a later
# call to kurtosis(samples) fails with: TypeError: 'float' object is not callable
skewness = stats.skew(samples)
skewness1 = skew(samples)
kurt = stats.kurtosis(samples)
kurt1 = kurtosis(samples)
print(kurt)
print(kurt1)
print(skewness)
print(skewness1)
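By default `scipy.stats.kurtosis` returns excess kurtosis (Fisher's definition, where a normal distribution scores 0); passing `fisher=False` returns Pearson's definition, where a normal distribution scores 3. The sketch below compares the two and checks both statistics against the moment formulas. The seeded generator and the variable names are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # arbitrary seed for illustration
samples = rng.normal(100, 15, 1000)

excess_kurt = stats.kurtosis(samples)                 # Fisher: normal -> 0
pearson_kurt = stats.kurtosis(samples, fisher=False)  # Pearson: normal -> 3

# Central moments computed by hand (biased estimators, matching SciPy's defaults).
centered = samples - samples.mean()
m2 = np.mean(centered ** 2)
m3 = np.mean(centered ** 3)
m4 = np.mean(centered ** 4)
manual_skew = m3 / m2 ** 1.5
manual_pearson_kurt = m4 / m2 ** 2

print(excess_kurt, pearson_kurt)
print(manual_skew, manual_pearson_kurt)
```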

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [8]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [11]:
y = np.array([0,1,2,34,-24,5,324,7,-3,9])
y

array([  0,   1,   2,  34, -24,   5, 324,   7,  -3,   9])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [12]:
r = np.corrcoef(x, y)
print(r)

[[1.         0.17516223]
 [0.17516223 1.        ]]
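`np.corrcoef` returns the full 2x2 correlation matrix; the coefficient between x and y is the off-diagonal entry. A short sketch that extracts it and checks it against the textbook Pearson formula (the names `r_xy` and `r_manual` are illustrative):

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([0, 1, 2, 34, -24, 5, 324, 7, -3, 9])

r_matrix = np.corrcoef(x, y)
r_xy = r_matrix[0, 1]                   # off-diagonal entry: correlation of x with y

# Pearson's r by hand: covariance term divided by the product of the spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(r_xy, r_manual)
```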


## Pandas Correlation Calculation

Run the code below

In [14]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
print(x)
print(y)

0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64
0     2
1     1
2     4
3     5
4     8
5    12
6    18
7    25
8    96
9    48
dtype: int64


Call the relevant method to calculate Pearson's r correlation.

In [15]:
r = stats.pearsonr(x,y)
print(r)

(0.758640289091187, 0.010964341301680813)
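`stats.pearsonr` returns the coefficient together with a p-value, and the result can be unpacked into two names. Since x and y are pandas Series, `Series.corr` is another option that returns only the coefficient. A minimal sketch comparing the two (variable names are illustrative):

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# SciPy: coefficient and p-value.
r_scipy, p_value = stats.pearsonr(x, y)

# pandas: coefficient only (Pearson is the default method).
r_pandas = x.corr(y)

print(r_scipy, p_value)
print(r_pandas)
```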


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [16]:
rho = stats.spearmanr(x,y)
print(rho)

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
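Spearman's rho is Pearson's r applied to the ranks of the data, which is why it is much higher here: y increases almost monotonically with x even though the relationship is far from linear. A sketch that reproduces the value from ranks, using `Series.rank` and illustrative variable names:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank, then compute Pearson's r on the ranks.
x_ranks = x.rank()
y_ranks = y.rank()
rho_manual = x_ranks.corr(y_ranks)

rho_scipy, p_value = stats.spearmanr(x, y)

print(rho_manual, rho_scipy)
```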


## Seaborn Dataset Tips

Import the Seaborn library.

In [17]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [None]:
tips = sns.load_dataset("tips")
tips

     total_bill   tip     sex  smoker  day    time  size
0         16.99  1.01  Female      No  Sun  Dinner     2
1         10.34  1.66    Male      No  Sun  Dinner     3
2         21.01  3.50    Male      No  Sun  Dinner     3
3         23.68  3.31    Male      No  Sun  Dinner     2
4         24.59  3.61  Female      No  Sun  Dinner     4
...         ...   ...     ...     ...  ...     ...   ...
239       29.03  5.92    Male      No  Sat  Dinner     3
240       27.18  2.00  Female     Yes  Sat  Dinner     2
241       22.67  2.00    Male     Yes  Sat  Dinner     2
242       17.82  1.75    Male      No  Sat  Dinner     2


Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [None]:
tips.describe()


       total_bill       tip      size
count       244.0     244.0     244.0
mean    19.785943  2.998279  2.569672
std      8.902412  1.383638    0.9511
min          3.07       1.0       1.0
25%       13.3475       2.0       2.0
50%        17.795       2.9       2.0
75%       24.1275    3.5625       3.0
max         50.81      10.0       6.0
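The rows of `describe()` map onto the statistics computed by hand earlier, with one difference worth knowing: the `std` row is the sample standard deviation (`ddof=1`), not NumPy's default population version. A small sketch on five `total_bill` values copied from the rows shown above; this subset and the variable names are purely illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative subset: the first five total_bill values from the table above.
bills = pd.Series([16.99, 10.34, 21.01, 23.68, 24.59], name="total_bill")
summary = bills.describe()

# Recompute two of the rows with NumPy for comparison.
manual_std = np.std(bills.to_numpy(), ddof=1)    # sample standard deviation
manual_q1 = np.percentile(bills.to_numpy(), 25)  # first quartile

print(summary)
print(manual_std, manual_q1)
```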


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [None]:
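# Keep only numeric columns first; recent pandas versions raise an error
# if corr() is called on a frame that still contains text/categorical columns
a = tips.select_dtypes(include="number").corr()
print(a)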

            total_bill       tip      size
total_bill    1.000000  0.675734  0.598315
tip           0.675734  1.000000  0.489299
size          0.598315  0.489299  1.000000
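Only the numeric columns appear in the correlation matrix. A sketch of the same computation on a tiny hand-built frame, so it runs without downloading the dataset; the five rows are copied from the table shown earlier, and the frame name `sample` is a hypothetical stand-in for `tips`:

```python
import pandas as pd

# Hypothetical stand-in for `tips`: five rows copied from the output above.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Drop non-numeric columns, then compute pairwise correlations.
numeric = sample.select_dtypes(include="number")
pearson = numeric.corr()                      # Pearson is the default
spearman = numeric.corr(method="spearman")    # rank-based alternative

print(pearson)
print(spearman)
```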


In [21]:
a = np.arange(6).reshape(3, 2)
b = np.arange(6, 15).reshape(3, 3)
print(a)
print(b)

# vertical stacking (row-wise) needs equal column counts, so it fails for a (3x2) and b (3x3)
# np.vstack((a, b))
# horizontal stacking (column-wise) needs equal row counts
np.hstack((a, b))

[[0 1]
 [2 3]
 [4 5]]
[[ 6  7  8]
 [ 9 10 11]
 [12 13 14]]


array([[ 0,  1,  6,  7,  8],
       [ 2,  3,  9, 10, 11],
       [ 4,  5, 12, 13, 14]])
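The shapes decide which stacking direction is valid: `np.hstack` needs matching row counts, while `np.vstack` needs matching column counts. A short sketch with the same two arrays that shows the horizontal stack succeeding and the vertical stack raising `ValueError` (the names `side_by_side`, `vstack_ok`, and `stacked_rows` are illustrative):

```python
import numpy as np

a = np.arange(6).reshape(3, 2)
b = np.arange(6, 15).reshape(3, 3)

# Both arrays have 3 rows, so they can be placed side by side.
side_by_side = np.hstack((a, b))

# The column counts differ (2 vs 3), so stacking them vertically fails.
try:
    np.vstack((a, b))
    vstack_ok = True
except ValueError:
    vstack_ok = False

# Vertical stacking works when the column counts match.
stacked_rows = np.vstack((a, a))

print(side_by_side.shape, vstack_ok, stacked_rows.shape)
```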

In [2]:
# Iterate over the typed string one character at a time
for ch in input():
    print(ch)

a
l
o
