## Descriptive Statistics in Python
First, load the NumPy (numerical computing), Matplotlib (plotting), and SciPy (mathematics and science) libraries and check their versions.

In [18]:
import numpy as np 
import matplotlib as mpl 
import scipy as sp 

print("NumPy version", np.__version__)
print("SciPy version", sp.__version__)
print("Matplotlib version", mpl.__version__)

NumPy version 1.17.3
SciPy version 1.3.1
Matplotlib version 3.1.1


In Python we often use lists. If a list contains only numbers, we can do some numerical calculations on it, such as computing the sum.

In [19]:
test_list=[11,12,17,9,17]
print(sum(test_list))
print(len(test_list))

66
5
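
The sum and the length are enough to compute an arithmetic mean without any library. The sketch below reuses the same list; the name `list_mean` is introduced here only for illustration.

```python
# Arithmetic mean of a plain Python list using only built-ins
test_list = [11, 12, 17, 9, 17]

list_mean = sum(test_list) / len(test_list)  # 66 / 5
print(list_mean)  # 13.2
```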


However, lists can be too general for numerical computations. For example, a list can mix numbers and strings, which results in an error when we try to compute the sum.

In [None]:
test_list2=[11,12,17,9,17,'a']
print(sum(test_list2))  # raises TypeError: an int and a str cannot be added
print(len(test_list2))  # not reached because of the error above
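
To see the failure without stopping the notebook, the sum of a mixed list can be wrapped in `try`/`except`. This is a minimal sketch; the variable `total` is an illustrative name, not part of the original cells.

```python
# Summing a list that mixes numbers and a string raises TypeError
test_list2 = [11, 12, 17, 9, 17, 'a']

try:
    total = sum(test_list2)
except TypeError as err:
    total = None  # no meaningful sum exists for this list
    print("Cannot sum the list:", err)

# len() still works, because it does not care about element types
print(len(test_list2))  # 6
```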

NumPy arrays are a better option for numerical computation. Let's create an array of numbers using NumPy.

In [20]:
test_array=np.array([11,12,17,9,17])
print(type(test_array))

<class 'numpy.ndarray'>
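
Unlike a list, a NumPy array stores all elements with a single data type, which can be inspected through the `dtype` attribute. The sketch below also builds a hypothetical `mixed_array` to show what happens when a string is included: NumPy converts every element to a string rather than keeping the numbers numeric.

```python
import numpy as np

# An array built from integers gets an integer dtype
test_array = np.array([11, 12, 17, 9, 17])
print(test_array.dtype)  # an integer dtype; the exact width depends on the platform

# Mixing in a string makes NumPy store every element as a string
mixed_array = np.array([11, 12, 'a'])
print(mixed_array.dtype.kind)  # 'U' means Unicode strings
```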


Now we can run some calculations on the array.

In [21]:
test_array=np.array([11,12,17,9,17])

print("max is: ", test_array.max())
print("min is: ", test_array.min())

max is:  17
min is:  9
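
Arithmetic on an array is applied element by element, which is one reason arrays suit numerical work. A short sketch follows; the names `doubled` and `deviations` are chosen here for illustration.

```python
import numpy as np

test_array = np.array([11, 12, 17, 9, 17])

# Multiply every element by 2 without writing a loop
doubled = test_array * 2
print(doubled)

# Subtract the mean from every element (deviations from the mean)
deviations = test_array - test_array.mean()
print(deviations)

# Arrays also provide a sum method
print(test_array.sum())  # 66
```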


We can calculate measures of the typical value (central tendency): the mean, the median, and the mode.

In [31]:
from scipy import stats  # import the stats submodule explicitly

print("mean is: ", test_array.mean())
print("median is: ", np.median(test_array))
print("mode is: ", stats.mode(test_array))

mean is:  13.2
median is:  12.0
mode is:  ModeResult(mode=array([17]), count=array([2]))
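
The exact form of the `ModeResult` printed above can differ between SciPy versions. As an alternative sketch that relies only on NumPy, the mode can be found by counting unique values; `mode_value` and `mode_count` are illustrative names. If several values tie, this approach returns the smallest of them.

```python
import numpy as np

test_array = np.array([11, 12, 17, 9, 17])

# np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(test_array, return_counts=True)

# The mode is the value with the highest count
mode_value = values[counts.argmax()]
mode_count = counts.max()
print("mode is:", mode_value, "with count", mode_count)
```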


We can also load data directly from a .csv file using the pandas library.

In [22]:
import pandas as pd
from IPython.display import display

data_pandas=pd.read_csv('credit_risk.csv')
print(type(data_pandas))
display(data_pandas.head())

<class 'pandas.core.frame.DataFrame'>


    debt  salary  repaid
0  20000   15000      no
1  30000   45000      no
2  15000   41000     yes
3      0   80000     yes
4  12000   27000     yes
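
If `credit_risk.csv` is not at hand, an equivalent DataFrame can be built directly from the five rows displayed above. This sketch assumes the file contains only these rows and the three columns shown, which is consistent with the count of 5 reported by `describe` later in this section.

```python
import pandas as pd

# Rebuild the displayed rows by hand (assumption: the CSV has no other rows or columns)
data_pandas = pd.DataFrame({
    "debt": [20000, 30000, 15000, 0, 12000],
    "salary": [15000, 45000, 41000, 80000, 27000],
    "repaid": ["no", "no", "yes", "yes", "yes"],
})

print(data_pandas.shape)   # (5, 3)
print(data_pandas.dtypes)  # debt and salary are integers, repaid holds text
```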


Once the data is in a pandas DataFrame, it is easy to calculate statistics.

In [27]:
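# numeric_only=True skips the text column 'repaid', which newer pandas versions require
display(data_pandas.mean(numeric_only=True))
display(data_pandas.median(numeric_only=True))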

debt      15400.0
salary    41600.0
dtype: float64

debt      15000.0
salary    41000.0
dtype: float64

The describe method is a convenient way to display several summary statistics at once.

In [28]:
data_pandas.describe()

               debt        salary
count           5.0           5.0
mean        15400.0       41600.0
std    10990.905331  24531.612258
min             0.0       15000.0
25%         12000.0       27000.0
50%         15000.0       41000.0
75%         20000.0       45000.0
max         30000.0       80000.0
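
The `std` row is the sample standard deviation (dividing by n - 1), which is the pandas default, whereas NumPy divides by n unless told otherwise. The sketch below reproduces the `debt` column's figures with NumPy alone; the values are typed in from the table shown earlier, and the variable names are illustrative.

```python
import numpy as np

# debt values copied from the displayed rows
debt = np.array([20000, 30000, 15000, 0, 12000])

# ddof=1 divides by n - 1, matching the pandas default used by describe()
sample_std = debt.std(ddof=1)
print(round(sample_std, 6))  # 10990.905331

# NumPy's default (ddof=0) divides by n and gives a smaller value
population_std = debt.std()
print(population_std)

# Quartiles, corresponding to the 25%, 50%, and 75% rows
q1, q2, q3 = np.percentile(debt, [25, 50, 75])
print(q1, q2, q3)  # 12000.0 15000.0 20000.0
```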
