## Descriptive Statistics

In [2]:
import numpy as np
import pandas as pd
import scipy.stats as stats

In [None]:
"""
np.random.normal()

Draw random samples from a normal (Gaussian) distribution.

loc : float or array_like of floats
    Mean ("centre") of the distribution.
scale : float or array_like of floats
    Standard deviation (spread or "width") of the distribution. Must be
    non-negative.
size : int or tuple of ints, optional
    Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
    ``m * n * k`` samples are drawn.  If size is ``None`` (default),
    a single value is returned if ``loc`` and ``scale`` are both scalars.
    Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.
"""

In [6]:
# 1000 draws from a normal distribution with mean 100 and standard deviation 15
samples = np.random.normal(100, 15, 1000)
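Because np.random.normal() draws from NumPy's global random state, rerunning the cell above gives different numbers each time, so the outputs shown below will not reproduce exactly. A minimal sketch of a reproducible draw, assuming NumPy 1.17 or later; the seed 42 and the names rng and samples_seeded are arbitrary choices for illustration, not part of the original notebook:

```python
import numpy as np

# A seeded Generator produces the same stream of numbers on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# A second generator with the same seed reproduces the draw exactly.
rng_again = np.random.default_rng(42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(np.array_equal(samples_seeded, samples_again))
```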

Compute the mean, median, and mode

In [7]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(mean)
print(median)
print(mode)

100.13069116638482
99.73388940360094
ModeResult(mode=array([57.25981244]), count=array([1]))
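Note that the reported mode has a count of 1 and is the same value as the sample minimum printed in the next section: in a continuous sample every value typically occurs once, and stats.mode() then returns the smallest of the tied values. One common workaround is to bin the data and report the midpoint of the fullest bin. The sketch below assumes 20 bins and a seeded demo sample; both are arbitrary choices, and the names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
demo_samples = rng.normal(100, 15, 1000)

# Count how many values fall into each of 20 equal-width bins.
counts, edges = np.histogram(demo_samples, bins=20)

# The modal bin is the one with the highest count; use its midpoint
# as a rough estimate of the mode.
top = np.argmax(counts)
modal_midpoint = (edges[top] + edges[top + 1]) / 2
print(modal_midpoint)
```

The estimate depends on the bin width, so it is a rough summary rather than an exact statistic.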


Compute the **min, max, Q1, Q3**, and **interquartile range**

In [8]:
mini = np.min(samples)
print(mini)
maxi = np.max(samples)
print(maxi)
q1 = np.percentile(samples,25)
print(q1)
q3 = np.percentile(samples,75)
print(q3)
iqr = q3-q1
iqr2 = stats.iqr(samples)
print(iqr)
print(iqr2)

57.25981243769809
142.57077227596434
90.43544824633024
110.75940947282237
20.32396122649213
20.32396122649213


Compute the **variance** and **standard deviation**

In [9]:
variance = np.var(samples)
print(variance)
std_dev = np.std(samples)
print(std_dev)

221.5913588663021
14.885945010858467
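By default np.var() and np.std() divide by n (the population formulas, ddof=0), whereas pandas methods such as Series.std() divide by n - 1 (the sample formulas, ddof=1). A small sketch of the difference, using made-up illustrative values rather than the notebook's samples:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values

# Population formulas: divide the sum of squared deviations by n.
pop_var = np.var(data)   # 32 / 8
pop_std = np.std(data)

# Sample formulas: divide by n - 1 instead.
sample_var = np.var(data, ddof=1)   # 32 / 7
sample_std = np.std(data, ddof=1)

print(pop_var, pop_std)        # 4.0 2.0
print(sample_var, sample_std)
```

With 1000 samples the two versions differ only slightly, but the choice matters when comparing NumPy results with pandas results.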


Compute the **skewness** and **kurtosis**

In [10]:
skewness = stats.skew(samples)
print(skewness)
kurtosis = stats.kurtosis(samples)
print(kurtosis)

0.015511014133762456
-0.09970080215015154
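stats.kurtosis() uses Fisher's definition by default, so it reports excess kurtosis, which is 0 for a normal distribution; that is why the value above is close to 0 rather than 3. The sketch below, using a seeded demo sample (an assumption for reproducibility), shows that passing fisher=False shifts the result by exactly 3:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed for this sketch
demo_samples = rng.normal(100, 15, 1000)

excess = stats.kurtosis(demo_samples)                 # Fisher (default): normal -> 0
pearson = stats.kurtosis(demo_samples, fisher=False)  # Pearson: normal -> 3

print(excess, pearson)
print(np.isclose(pearson - excess, 3.0))
```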


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive) using np.arange().

In [11]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use np.array() to create a second array y containing 10 arbitrary integers.

In [12]:
y = np.array([1,2,3,4,5,6,7,8,9,10])

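Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y. np.corrcoef() returns the full 2×2 correlation matrix; the coefficient between x and y is the off-diagonal entry.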

In [13]:
r = np.corrcoef(x, y)
r

array([[1., 1.],
       [1., 1.]])
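Since np.corrcoef() returns a matrix, the single coefficient usually has to be indexed out. A minimal sketch with the same x and y as above; the names corr_matrix and r_xy are chosen here for illustration:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

corr_matrix = np.corrcoef(x, y)

# Row 0 / column 1 holds the correlation between x and y;
# the diagonal holds each array's correlation with itself.
r_xy = corr_matrix[0, 1]
print(r_xy)
```

Here y is an exact linear function of x (y = x - 9), which is why the coefficient comes out as 1.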

## Pandas Correlation Calculation

Run the code below to create two pandas Series.

In [14]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [15]:
r = stats.pearsonr(x, y)
print(r)

(0.758640289091187, 0.010964341301680813)
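The cell above uses SciPy's stats.pearsonr(), which returns the coefficient together with a p-value. Since x and y are pandas Series, the coefficient alone can also be obtained with the Series.corr() method, as sketched here:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Series.corr() defaults to method="pearson" and returns only the coefficient.
r_pandas = x.corr(y)
print(r_pandas)
```

This should agree with the first element of the tuple printed above, up to floating-point rounding.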


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [16]:
rho = stats.spearmanr(x, y)
print(rho)

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
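Spearman's rho is Pearson's r computed on the ranks of the data, which explains why it is much higher than Pearson's r here: y increases almost monotonically with x, but not linearly. A pandas-only sketch of that definition, with rho_from_ranks as an illustrative name:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank, then take the ordinary Pearson correlation.
rho_from_ranks = x.rank().corr(y.rank())
print(rho_from_ranks)
```

The result should match the correlation field of the SpearmanrResult above; unlike stats.spearmanr(), this sketch does not compute a p-value.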


## Seaborn "tips" Dataset

Import the Seaborn library.

In [17]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [18]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [19]:
tips.describe()

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000


In [21]:
stats.mode(tips["total_bill"])

ModeResult(mode=array([13.42]), count=array([3]))

In [22]:
stats.mode(tips["tip"])

ModeResult(mode=array([2.]), count=array([33]))

In [23]:
tips["total_bill"].std()

8.902411954856856

In [26]:
print(tips.agg(["min","median","std", "var"]))

        total_bill       tip      size
min       3.070000  1.000000  1.000000
median   17.795000  2.900000  2.000000
std       8.902412  1.383638  0.951100
var      79.252939  1.914455  0.904591


Call the relevant method to calculate the pairwise Pearson's r correlation of the columns.

In [27]:
tips.head(1)

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2


In [28]:
stats.pearsonr(tips["total_bill"], tips["tip"])

(0.6757341092113647, 6.6924706468630016e-34)

In [29]:
stats.pearsonr(tips["total_bill"], tips["size"])

(0.5983151309049013, 4.393510142477069e-25)

In [31]:
stats.pearsonr(tips["tip"], tips["size"])

(0.4892987752303571, 4.3005433272249695e-16)
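The three stats.pearsonr() calls above compute the pairs one at a time. pandas can produce the whole pairwise matrix in one call with DataFrame.corr(). The sketch below uses a small hand-written frame shaped like the tips data (the values and the name demo are illustrative, and loading the real dataset is avoided so the example runs offline); non-numeric columns are dropped first with select_dtypes():

```python
import pandas as pd

# A few illustrative rows with the same kind of columns as the tips data.
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr_matrix = demo.select_dtypes("number").corr()
print(corr_matrix)
```

Applied to the real dataset, the same call would be tips.select_dtypes("number").corr(), and its off-diagonal entries should correspond to the coefficients printed above.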