## Descriptive Statistics

Import **NumPy**, **SciPy**, and **pandas**.

In [24]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [9]:
samples = np.random.normal(100, 15, size=1000)
samples


array([ 97.06002959,  96.62874138, 120.07990446,  99.88174994,
       115.12863585, 119.70508231, 114.76232587, 119.71346032,
       114.95713525,  73.21371382, 106.29630477,  83.13715284,
        86.12168816,  94.57883494, 119.80373298, 119.83559575,
        59.63036944,  99.56946483,  83.56463233, 111.35976492,
       117.88028899,  92.83728442,  79.99282454, 107.0383351 ,
        73.62900343,  85.51745333, 102.00761099,  77.43746205,
       124.42769042, 106.13495366,  88.13658077,  89.23383164,
        83.47323212,  88.03411655, 109.66664512,  70.64120457,
        99.6287586 ,  83.51648395,  83.67799702,  96.33127033,
        80.7685881 , 122.9481353 , 102.01589735, 103.84379737,
       102.25597135,  83.97214585, 114.30425099, 111.16803828,
        75.5375053 , 100.34425047,  84.78779701,  95.78576371,
        76.62507801,  98.63913811,  86.04270464, 118.14425314,
       101.4612277 , 103.04557744,  75.71726921,  89.79975037,
        89.18970979,  93.53179102,  97.0029452 ,  81.40
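
Because `np.random.normal()` draws fresh values on every run, the numbers above will differ each time the cell is executed. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator and an arbitrary seed of 42 (the seed and the variable names are assumptions, not something the notebook uses):

```python
import numpy as np

# Seeded generator: the seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

# loc is the mean, scale is the standard deviation.
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the same draws.
repeat = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat))
```

Fixing the seed makes the statistics computed in later cells stable from run to run, which helps when comparing results.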

Compute the **mean**, **median**, and **mode**

In [26]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print("mean:", mean)
print("median:", median)
print("mode:", mode)

mean: 99.4654343440834
median: 99.22467444726891
mode: ModeResult(mode=array([55.39852712]), count=array([1]))
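
For continuous data almost every value is unique, which is why the result above reports a count of 1 and simply returns the smallest sample. One way to get a more informative mode is to round the values into bins first. The sketch below uses a small hypothetical array and NumPy only; rounding to the nearest integer is an assumption about the desired bin width:

```python
import numpy as np

# Hypothetical data, chosen so that several values cluster near 100.
values = np.array([99.6, 100.2, 100.4, 87.3, 112.9, 99.8, 100.1])

# Round to the nearest integer so that nearby values fall into the same bin.
binned = np.round(values).astype(int)

# Count occurrences of each bin and pick the most frequent one.
bins, counts = np.unique(binned, return_counts=True)
binned_mode = bins[np.argmax(counts)]
print(binned_mode, counts.max())
```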


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [31]:
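min = np.min(samples)  # note: shadows the built-in min() for the rest of the session
max = np.max(samples)  # note: shadows the built-in max()
# Caution: samples is unsorted, so these are medians of two arbitrary halves,
# not the quartiles; np.percentile(samples, [25, 75]) returns the actual Q1 and Q3.
q1 = np.median(samples[:500])
q3 = np.median(samples[500:])
iqr = q3 - q1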

min
max
q1
q3
iqr

1.0902334039643051

In [32]:
min

55.39852711527713

In [33]:
max

148.8081608505441

In [34]:
q1

98.68304959679753

In [35]:
q3

99.77328300076184
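
The values of `q1` and `q3` above are nearly identical because splitting an unsorted array in half does not separate low values from high ones. A sketch of the usual approach with `np.percentile()`, using a freshly generated seeded sample (the seed and the variable names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
data = rng.normal(loc=100, scale=15, size=1000)

# np.percentile sorts internally, so the input order does not matter.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# For a normal distribution the IQR is roughly 1.35 standard deviations,
# so a value near 20 is expected here.
print(q1, q3, iqr)
```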

Compute the **variance** and **standard deviation**

In [38]:
variance = np.var(samples)
std_dev = np.std(samples)
variance

221.70283856316976

In [39]:
std_dev

14.889689001559763
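
`np.var()` and `np.std()` divide by n by default (the population formulas). When the data are a sample and an unbiased variance estimate is wanted, passing `ddof=1` divides by n - 1 instead. A small sketch on hypothetical data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean is 5

pop_var = np.var(data)             # divides by n      -> 32 / 8
sample_var = np.var(data, ddof=1)  # divides by n - 1  -> 32 / 7
pop_std = np.std(data)             # square root of the population variance

print(pop_var, sample_var, pop_std)
```

With 1,000 samples the two versions differ only slightly, but the gap grows as the sample gets smaller.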

Compute the **skewness** and **kurtosis**

In [45]:
skewness = stats.skew(samples)


skewness

0.02749570541005768

In [51]:
kurtosis = stats.kurtosis(samples)
kurtosis

-0.2363575809161076
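
`stats.kurtosis()` reports excess kurtosis by default (Fisher's definition), so a normal distribution scores about 0 rather than 3, which is consistent with the small value above. A sketch comparing the two conventions on a seeded sample (the seed and the variable names are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
data = rng.normal(loc=100, scale=15, size=1000)

excess = stats.kurtosis(data)                 # Fisher: normal distribution -> about 0
pearson = stats.kurtosis(data, fisher=False)  # Pearson: normal distribution -> about 3
skewness = stats.skew(data)                   # symmetric distribution -> about 0

print(excess, pearson, skewness)
```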

## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [55]:
x = np.arange(10,20, dtype=int)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [67]:
y = np.array(samples[:10], dtype=int)
y

array([ 97,  96, 120,  99, 115, 119, 114, 119, 114,  73])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`.

In [71]:
r = np.corrcoef(x,y)
r

array([[ 1.        , -0.05581308],
       [-0.05581308,  1.        ]])
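
`np.corrcoef()` returns the full 2 x 2 correlation matrix: the diagonal is each array's correlation with itself, and the off-diagonal entries hold the coefficient between `x` and `y`. A short sketch that pulls out the scalar, reusing the values printed above (the names `corr_matrix` and `r_xy` are illustrative):

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([97, 96, 120, 99, 115, 119, 114, 119, 114, 73])

corr_matrix = np.corrcoef(x, y)

# Row 0, column 1 is the correlation between x and y.
r_xy = corr_matrix[0, 1]
print(r_xy)
```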

## Pandas Correlation Calculation

Run the code below

In [4]:
import pandas as pd

In [5]:

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
x
y

0     2
1     1
2     4
3     5
4     8
5    12
6    18
7    25
8    96
9    48
dtype: int64

Call the relevant method to calculate Pearson's r correlation.

In [6]:
import numpy as np
import matplotlib.pyplot as plt



In [11]:
from scipy.stats import pearsonr

In [13]:
corr, _ = pearsonr(x, y)

corr

0.758640289091187
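
`scipy.stats.pearsonr` works here, but pandas also provides the calculation directly: `Series.corr()` computes Pearson's r by default. A sketch using the same two series:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# method="pearson" is the default; it is spelled out here for clarity.
r = x.corr(y, method="pearson")
print(r)
```

Unlike `pearsonr`, this returns only the coefficient and no p-value.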

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [None]:
rho = x.corr(y, method="spearman")
rho

## Seaborn Dataset Tips

## Import Seaborn Library

In [14]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [15]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [28]:
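df = tips.mean(numeric_only=True)  # assumption: df is undefined in the cells shown; the outputs match the numeric column means
df.mean()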

8.451297814207651

In [29]:
df.median()

2.9982786885245902

In [30]:
df.mode()

0     2.569672
1     2.998279
2    19.785943
dtype: float64
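
For a per-column summary, `DataFrame.describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column in one call. The sketch below uses a small hypothetical frame that borrows some of the tips column names, so it runs without downloading the dataset; the values are made up for illustration:

```python
import pandas as pd

# Hypothetical rows; these are not taken from the real tips dataset.
frame = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 3.0, 5.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# describe() skips the non-numeric "day" column by default.
summary = frame.describe()
print(summary)

# Individual statistics can be read from the summary by label.
mean_tip = summary.loc["mean", "tip"]
median_bill = summary.loc["50%", "total_bill"]
```

On the real data the same call would be `tips.describe()`.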

Call the relevant method to calculate the pairwise Pearson's r correlation between columns.
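
A sketch of the pairwise calculation with `DataFrame.corr()`, on a small hypothetical frame with tips-style column names rather than the real dataset. Non-numeric columns are dropped first with `select_dtypes()`, which avoids depending on how a given pandas version treats them:

```python
import pandas as pd

# Hypothetical rows; these are not taken from the real tips dataset.
frame = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.5, 6.0],   # rises in step with total_bill
    "size": [4, 3, 2, 1],          # falls as total_bill rises
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

numeric = frame.select_dtypes(include="number")
corr_matrix = numeric.corr()  # Pearson's r is the default method
print(corr_matrix)
```

The result is a symmetric matrix with 1.0 on the diagonal. In this constructed example `total_bill` and `tip` correlate perfectly, while `total_bill` and `size` are perfectly negatively correlated.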