# Mathematics & Statistics for Data Science

## Descriptive vs Inferential Statistics

**In short, descriptive statistics summarize the data you actually have (for example, its mean, median, or spread), whereas inferential statistics use a sample to draw conclusions or make predictions about the larger population it came from.**
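
To make the distinction concrete, here is a minimal sketch on made-up numbers. The `population` array, the seed, and the sample size are all hypothetical, and the interval uses a simple normal approximation (mean plus or minus 1.96 standard errors), which is only one of several ways to express the uncertainty.

```python
import numpy as np

# Hypothetical data: a synthetic "population" of 10,000 monthly incomes.
rng = np.random.default_rng(seed=0)
population = rng.normal(loc=6000, scale=1500, size=10_000)

# Descriptive: summarize the data we actually hold (here, a sample of 100).
sample = rng.choice(population, size=100, replace=False)
sample_mean = sample.mean()

# Inferential: use the sample to estimate the population mean, with uncertainty.
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean: {sample_mean:.0f}")
print(f"approx. 95% interval for the population mean: [{ci_low:.0f}, {ci_high:.0f}]")
```

The sample mean on its own is a descriptive fact about 100 rows; the interval is an inferential claim about the 10,000 rows that were not all observed.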

## Median, Mean, Mode, Percentile

In [13]:
import pandas as pd
import numpy as np

In [14]:
df = pd.read_csv('/kaggle/input/maths-stats-practice/income.csv')

In [15]:
df

Unnamed: 0,Name,Monthly Income ($)
0,Rob,5000
1,Rafiq,6000
2,Nina,4000
3,Sofia,7500
4,Mohan,8000
5,Tao,7000
6,Elon Musk,10000000


In [20]:
# Rename columns

df = df.rename(columns={'Name':'name',
                        'Monthly Income ($)':'income'})

**Mean and median are commonly used for descriptive analysis and for data cleaning (filling NA values).**

The mean is the average of the values in a dataset; the median is the middle value once the values are sorted. The median is much less sensitive to extreme values than the mean.
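
As a small illustration of how differently the two behave, the sketch below rebuilds the income values from the table shown earlier (typed in by hand rather than read from the CSV) and compares mean and median with and without the largest value.

```python
import pandas as pd

# Incomes copied from the table shown earlier in this notebook.
income = pd.Series([5000, 6000, 4000, 7500, 8000, 7000, 10_000_000])

print(income.mean())    # pulled far upward by the single 10,000,000 value
print(income.median())  # 7000.0, the middle of the sorted values

# Dropping the largest value shows how much the mean depends on it.
without_max = income[income < income.max()]
print(without_max.mean())    # 6250.0
print(without_max.median())  # 6500.0
```

The median moves by only 500 when the extreme value is dropped, while the mean falls from over 1.4 million to 6250.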

**Percentiles are commonly used for outlier removal and general data analysis.**

In [21]:
# Take statistics from the dataset

df.describe()

Unnamed: 0,income
count,7.0
mean,1433929.0
std,3777283.0
min,4000.0
25%,5500.0
50%,7000.0
75%,7750.0
max,10000000.0


In [22]:
# Find the 50th percentile (the median) of the 'income' column

df['income'].quantile(0.50)

7000.0

In [23]:
# Find the 100th percentile (the maximum) of the 'income' column, which here is also an outlier

df['income'].quantile(1)

10000000.0

In [24]:
# Create a new df without the outlier by keeping rows at or below the 99th percentile

percentile_99 = df['income'].quantile(0.99)

new_df = df[df['income'] <= percentile_99]
new_df

Unnamed: 0,name,income
0,Rob,5000
1,Rafiq,6000
2,Nina,4000
3,Sofia,7500
4,Mohan,8000
5,Tao,7000
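
The 99th-percentile threshold used above is not one of the observed incomes. By default, pandas interpolates linearly between the two nearest sorted values. The sketch below reproduces that calculation by hand on the same seven incomes (typed in manually here) to show where the cutoff lands; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

income = pd.Series([5000, 6000, 4000, 7500, 8000, 7000, 10_000_000])

# Manual linear interpolation between the two nearest sorted values.
sorted_vals = np.sort(income.to_numpy())
q = 0.99
position = q * (len(sorted_vals) - 1)  # fractional index, about 5.94 here
lower = int(np.floor(position))
upper = int(np.ceil(position))
fraction = position - lower
manual = sorted_vals[lower] + fraction * (sorted_vals[upper] - sorted_vals[lower])

print(manual)              # lies between 8000 and 10,000,000
print(income.quantile(q))  # pandas result, same value up to rounding
```

Because the cutoff falls between the two largest values, the filter keeps the six ordinary incomes and drops only the largest. Note that with a unique maximum, any quantile below 1 sits under that maximum, so this filter always removes the top row in a small dataset, whether or not it is truly an outlier.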


In [28]:
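# Fill a NaN value
# First, create a NaN value to practice on
# (cast to float first, since an integer column cannot hold NaN)

df['income'] = df['income'].astype(float)
df.loc[3, 'income'] = np.nan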

In [29]:
df

Unnamed: 0,name,income
0,Rob,5000.0
1,Rafiq,6000.0
2,Nina,4000.0
3,Sofia,
4,Mohan,8000.0
5,Tao,7000.0
6,Elon Musk,10000000.0


In [30]:
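# Fill the NaN value with the median of the 'income' column

median_income = df['income'].median()

# Passing a dict restricts the fill to the 'income' column
df_new2 = df.fillna({'income': median_income})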

df_new2

Unnamed: 0,name,income
0,Rob,5000.0
1,Rafiq,6000.0
2,Nina,4000.0
3,Sofia,6500.0
4,Mohan,8000.0
5,Tao,7000.0
6,Elon Musk,10000000.0
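
To see why the median is a reasonable choice for this fill, the sketch below compares it with a mean-based fill on the same data. The DataFrame is rebuilt by hand from the table above, and `median_fill` and `mean_fill` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Rob", "Rafiq", "Nina", "Sofia", "Mohan", "Tao", "Elon Musk"],
    "income": [5000, 6000, 4000, np.nan, 8000, 7000, 10_000_000],
})

# Both statistics skip NaN by default, so they are computed from the six known values.
median_fill = df["income"].fillna(df["income"].median())
mean_fill = df["income"].fillna(df["income"].mean())

print(median_fill[3])  # 6500.0
print(mean_fill[3])    # about 1.67 million, dominated by the outlier
```

With the outlier still present, a mean-based fill would assign Sofia an income of roughly 1.67 million, which is far from any typical value in the column.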

