# Machine Learning

Machine learning is programming computers to optimize a performance criterion using example
data or past experience. We have a model defined up to some parameters, and learning is the
execution of a computer program to optimize the parameters of the model using the training data or
past experience. 
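As a rough illustration of "a model defined up to some parameters", the sketch below fits the one-parameter model `y = w * x` by gradient descent on a tiny, made-up dataset. The data, the learning rate, and the function names (`mean_squared_error`, `fit_slope`) are assumptions chosen for this example, not something prescribed by the text.

```python
# Toy training data generated by the rule y = 2 * x (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mean_squared_error(w, xs, ys):
    """Performance criterion: average squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_slope(xs, ys, learning_rate=0.01, steps=100):
    """Optimize the single parameter w with plain gradient descent."""
    w = 0.0
    history = [mean_squared_error(w, xs, ys)]
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        gradient = 2 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
        history.append(mean_squared_error(w, xs, ys))
    return w, history


w, history = fit_slope(xs, ys)
print(f"learned w = {w:.4f}")
print(f"error before = {history[0]:.2f}, error after = {history[-1]:.6f}")
```

Here the performance criterion is the mean squared error, the parameter is `w`, and "learning" is the loop that repeatedly adjusts `w` to reduce that error on the training data.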

The model may be predictive, to make predictions about the future; descriptive, to gain
knowledge from data; or both.

A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.

# Machine Learning vs Artificial Intelligence

Machine learning is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed.

The goal of AI is to build smart computer systems that, like humans, can solve complex problems.

The goal of ML is to allow machines to learn from data so that they can produce accurate output.

# Scope of Machine Learning

“Machine learning is the process of automatically getting insights from data that can drive business value”, says Lavanya Tekumalla, founder and mentor. 

This typically involves the following steps:

1) Gathering and preparing large volumes of data that the machine will use to teach itself.
2) Feeding the data into ML models and training them to make the right decisions through supervision and correction.
3) Deploying the model to make analytical predictions, or feeding it new kinds of data to expand its capabilities.
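The three steps above can be sketched end to end on a toy problem. Everything here is hypothetical: the `(hours_studied, passed)` records, the `train` and `predict` helpers, and the choice of a simple threshold model are assumptions made only to keep the example short.

```python
# 1) Gather and prepare: hypothetical (hours_studied, passed) records.
#    None marks a missing measurement, which we drop during preparation.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
data = [(x, label) for x, label in raw if x is not None]


# 2) Train: learn a decision threshold halfway between the two class means.
def train(data):
    negatives = [x for x, label in data if label == 0]
    positives = [x for x, label in data if label == 1]
    mean_negative = sum(negatives) / len(negatives)
    mean_positive = sum(positives) / len(positives)
    return (mean_negative + mean_positive) / 2


threshold = train(data)


# 3) Deploy: apply the trained model to inputs it has not seen before.
def predict(x, threshold):
    return 1 if x >= threshold else 0


print(threshold, predict(2.5, threshold), predict(6.5, threshold))
```

The labels in step 2 play the role of "supervision": the threshold is placed where it best separates the examples marked 0 from those marked 1.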


# Types of ML

In general, machine learning algorithms can be classified into three types:

1) Supervised learning
    - Regression
    - Classification
2) Unsupervised learning
    - Clustering
    - Market Basket Analysis
3) Reinforcement learning
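To make the unsupervised branch of this list concrete, here is a minimal sketch of clustering without any labels: a basic k-means loop with two clusters on one-dimensional data. The values, the initialization rule, and the function name `two_means_1d` are assumptions for illustration; real projects would normally use a library implementation.

```python
def two_means_1d(values, iterations=10):
    """Cluster 1-D values into two groups with a basic k-means loop (k = 2)."""
    centers = [min(values), max(values)]  # simple deterministic initialization
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, groups = two_means_1d(values)
print(centers)
print(groups)
```

Unlike a supervised method, the algorithm is never told which group a value belongs to; it discovers the grouping from the distances between the values alone.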


# Real-World Applications


1. Image Recognition
2. Speech Recognition
3. Traffic Prediction
4. Product Recommendations
5. Self-Driving Cars
6. Email Spam and Malware Filtering
7. Virtual Personal Assistant
8. Online Fraud Detection
9. Stock Market Trading
10. Medical Diagnosis
11. Automatic Language Translation

# Data Wrangling

Data wrangling is a broad term, often used informally, for the process of transforming raw data into a clean, organized format that is ready for use.

Wrangling is only one step in processing our data.

The most common data structure used to wrangle data is the data frame, which is both intuitive and incredibly versatile.

Data frames are tabular, meaning that they are organized in rows and columns.
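A minimal sketch of the row-and-column structure, using a small hand-made frame. The passenger records are hypothetical; only the column names are borrowed from the Titanic data used in this section.

```python
import pandas as pd

# Hypothetical passenger records, one row per passenger.
frame = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [29.0, 2.0, None],
    "Survived": [1, 0, 0],
})

print(frame.shape)              # (number of rows, number of columns)
print(list(frame.columns))      # column labels
print(frame.loc[0, "Age"])      # one cell, addressed by row label and column label
print(frame["Survived"].sum())  # a whole column can be aggregated at once
```

Each column holds one variable and each row holds one observation, which is why the same frame can be sliced by row, by column, or by both.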

In [1]:
import pandas as pd
 
url = 'https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/titanic.csv'
dataframe = pd.read_csv(url)
dataframe

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.00,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.00,female,0,1
2,"Allison, Mr Hudson Joshua Creighton",1st,30.00,male,0,0
3,"Allison, Mrs Hudson JC (Bessie Waldo Daniels)",1st,25.00,female,0,1
4,"Allison, Master Hudson Trevor",1st,0.92,male,1,0
...,...,...,...,...,...,...
1308,"Zakarian, Mr Artun",3rd,27.00,male,0,0
1309,"Zakarian, Mr Maprieder",3rd,26.00,male,0,0
1310,"Zenni, Mr Philip",3rd,22.00,male,0,0
1311,"Lievens, Mr Rene",3rd,24.00,male,0,0


In [2]:
dataframe.head(5)

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1
2,"Allison, Mr Hudson Joshua Creighton",1st,30.0,male,0,0
3,"Allison, Mrs Hudson JC (Bessie Waldo Daniels)",1st,25.0,female,0,1
4,"Allison, Master Hudson Trevor",1st,0.92,male,1,0


### 1. Data Inspection and Exploration

In [3]:
dataframe.describe()

Unnamed: 0,Age,Survived,SexCode
count,756.0,1313.0,1313.0
mean,30.397989,0.342727,0.351866
std,14.259049,0.474802,0.477734
min,0.17,0.0,0.0
25%,21.0,0.0,0.0
50%,28.0,0.0,0.0
75%,39.0,1.0,1.0
max,71.0,1.0,1.0


In [4]:
dataframe.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
1308    False
1309    False
1310    False
1311    False
1312    False
Length: 1313, dtype: bool
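A long boolean Series like the one above is hard to read at a glance. One common follow-up, sketched here on a small hypothetical frame rather than the Titanic data, is to count the flagged rows and, if appropriate, drop them.

```python
import pandas as pd

# Hypothetical records in which the third row repeats the second.
records = pd.DataFrame({
    "PClass": ["1st", "3rd", "3rd", "2nd"],
    "Sex": ["female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0],
})

duplicate_mask = records.duplicated()     # True for each repeat of an earlier row
n_duplicates = int(duplicate_mask.sum())  # each True counts as 1
deduplicated = records.drop_duplicates()  # keeps the first occurrence of each row

print(n_duplicates)
print(deduplicated)
```

Whether identical rows are genuine duplicates or simply distinct observations that share the same values depends on the dataset, so dropping them is a judgment call rather than an automatic step.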

In [5]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      1313 non-null   object 
 1   PClass    1313 non-null   object 
 2   Age       756 non-null    float64
 3   Sex       1313 non-null   object 
 4   Survived  1313 non-null   int64  
 5   SexCode   1313 non-null   int64  
dtypes: float64(1), int64(2), object(3)
memory usage: 61.7+ KB


In [6]:
dataframe.nunique()

Name        1310
PClass         4
Age           75
Sex            2
Survived       2
SexCode        2
dtype: int64

In [7]:
# Categorical columns (stored with the generic 'object' dtype)
cat_col = [col for col in dataframe.columns if dataframe[col].dtype == 'object']
print('Categorical columns :', cat_col)
# Numerical columns (every column that is not 'object')
num_col = [col for col in dataframe.columns if dataframe[col].dtype != 'object']
print('Numerical columns :', num_col)

Categorical columns : ['Name', 'PClass', 'Sex']
Numerical columns : ['Age', 'Survived', 'SexCode']
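The same split can also be expressed with `DataFrame.select_dtypes`, which avoids comparing dtypes against the string `'object'`. This is an alternative sketch on a small hypothetical frame, not the approach used in the cell above.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
frame = pd.DataFrame({
    "PClass": ["1st", "3rd"],
    "Age": [29.0, 27.0],
    "Survived": [1, 0],
})

# "number" matches every numeric dtype (integers and floats alike).
num_col = frame.select_dtypes(include="number").columns.tolist()
cat_col = frame.select_dtypes(exclude="number").columns.tolist()

print("Categorical columns :", cat_col)
print("Numerical columns :", num_col)
```

Note that a numeric dtype does not guarantee a numeric meaning: a 0/1 column such as `Survived` is stored as an integer but encodes a category.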


In [8]:
dataframe[cat_col].nunique()

Name      1310
PClass       4
Sex          2
dtype: int64
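A count of 4 distinct values for `PClass` is worth a closer look if only first, second, and third class are expected. One way to investigate, sketched on hypothetical values (the stray `"?"` label is invented for this example), is to list the frequency of each label.

```python
import pandas as pd

# Hypothetical column; "?" stands in for whatever unexpected label a real column may hold.
pclass = pd.Series(["1st", "2nd", "3rd", "3rd", "?", "1st", "3rd"], name="PClass")

counts = pclass.value_counts()  # frequency of each distinct label, most common first
print(counts)

# Labels that fall outside the set we expect to see.
expected = {"1st", "2nd", "3rd"}
unexpected = sorted(set(pclass.unique()) - expected)
print(unexpected)
```

Rare or unexpected labels found this way are candidates for correction or removal during cleaning.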

### 2. Removal of Unwanted Observations

In [9]:
dataframe['Name'].unique()[:50]

array(['Allen, Miss Elisabeth Walton', 'Allison, Miss Helen Loraine',
       'Allison, Mr Hudson Joshua Creighton',
       'Allison, Mrs Hudson JC (Bessie Waldo Daniels)',
       'Allison, Master Hudson Trevor', 'Anderson, Mr Harry',
       'Andrews, Miss Kornelia Theodosia', 'Andrews, Mr Thomas, jr',
       'Appleton, Mrs Edward Dale (Charlotte Lamson)',
       'Artagaveytia, Mr Ramon', 'Astor, Colonel John Jacob',
       'Astor, Mrs John Jacob (Madeleine Talmadge Force)',
       'Aubert, Mrs Leontine Pauline', 'Barkworth, Mr Algernon H',
       'Baumann, Mr John D',
       'Baxter, Mrs James (Helene DeLaudeniere Chaput)',
       'Baxter, Mr Quigg Edmond', 'Beattie, Mr Thomson',
       'Beckwith, Mr Richard Leonard',
       'Beckwith, Mrs Richard Leonard (Sallie Monypeny)',
       'Behr, Mr Karl Howell', 'Birnbaum, Mr Jakob',
       'Bishop, Mr Dickinson H', 'Bishop, Mrs Dickinson H (Helen Walton)',
       'Bjornstrm-Steffansson, Mr Mauritz Hakan',
       'Blackwell, Mr Stephen Weart'

In [10]:
dataframe['Survived'].unique()[:50]

array([1, 0], dtype=int64)

In [11]:
df1 = dataframe.drop(columns=['Name'])
df1.shape

(1313, 5)

### 3. Handling Missing Values

In [12]:
round((df1.isnull().sum()/df1.shape[0])*100,2)

PClass       0.00
Age         42.42
Sex          0.00
Survived     0.00
SexCode      0.00
dtype: float64
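An equivalent and slightly shorter way to get the same percentages relies on the fact that the mean of a boolean column is the fraction of `True` values. The frame below is hypothetical and only illustrates the idiom.

```python
import pandas as pd

# Hypothetical frame in which half of the ages are missing.
frame = pd.DataFrame({
    "Age": [29.0, None, None, 25.0],
    "Sex": ["female", "male", "male", "female"],
})

# isnull() yields booleans; their mean is the fraction of missing entries per column.
missing_pct = frame.isnull().mean().mul(100).round(2)
# Order columns from most to least incomplete.
ranked = missing_pct.sort_values(ascending=False)

print(ranked)
```

Sorting the result puts the most incomplete columns first, which helps when deciding whether to impute a column or drop it.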

In [13]:
# Mean imputation: fill missing ages with the mean of the Age column only
df2 = df1.fillna({'Age': df1['Age'].mean()})
# Let's check the null values again
df2.isnull().sum()

PClass      0
Age         0
Sex         0
Survived    0
SexCode     0
dtype: int64
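With a large share of `Age` missing, mean imputation places many passengers at exactly the same age. A possible variation, shown here as a sketch on hypothetical data rather than as the notebook's method, is to impute with the median and keep an indicator column recording which values were filled in. The column name `AgeMissing` is an assumption.

```python
import pandas as pd

# Hypothetical frame with two missing ages and one unusually high age.
frame = pd.DataFrame({
    "Age": [20.0, None, 30.0, None, 70.0],
    "Survived": [1, 0, 0, 1, 0],
})

imputed = frame.copy()
# Record which ages were originally missing before overwriting them.
imputed["AgeMissing"] = frame["Age"].isnull().astype(int)
# Median imputation: less sensitive to extreme ages than the mean.
imputed["Age"] = frame["Age"].fillna(frame["Age"].median())

print(imputed)
```

The indicator column lets a downstream model distinguish observed ages from imputed ones, at the cost of one extra feature.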