# Table of Contents
1. Introduction to Machine Learning
2. Machine Learning in Real Life
3. Supervised Learning: Regression
4. Feature Engineering
5. Supervised Learning: Classification

        

# What is Machine Learning?

### Machine Learning is the capability of Artificial Intelligence systems to learn by extracting patterns from data. It is a branch of AI that enables computers to improve at a task from experience instead of following explicitly programmed rules.
![image.png](attachment:image.png)


# Features of Machine Learning
* Machine Learning is computationally intensive and generally requires a large amount of training data.
* It involves repetitive training to improve the learning and decision making of algorithms.
* As more data is added, training can be automated so that the model learns new data patterns and adapts to them.

# Difference between Traditional Programming and Machine Learning

![image.png](attachment:image.png)
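
As a rough illustration of the difference (a minimal sketch; the Celsius-to-Fahrenheit task and every name below are chosen here for illustration only): in traditional programming the programmer writes the rule, whereas in machine learning the rule is estimated from example inputs and outputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: the programmer writes the rule by hand.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: only example inputs and outputs are given,
# and the algorithm estimates the rule from them.
celsius = np.array([[-40.0], [0.0], [10.0], [25.0], [37.0], [100.0]])
fahrenheit = np.array([-40.0, 32.0, 50.0, 77.0, 98.6, 212.0])

model = LinearRegression().fit(celsius, fahrenheit)
learned_slope = model.coef_[0]
learned_intercept = model.intercept_

print(celsius_to_fahrenheit(20.0))   # rule written by hand
print(model.predict([[20.0]])[0])    # rule learned from data
print(learned_slope, learned_intercept)
```

Because the examples lie exactly on a straight line, the learned slope and intercept should come out very close to the hand-written 9/5 and 32.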

# Types of Machine Learning


1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Reinforcement Machine Learning 

# Steps to Perform Machine Learning
1. Problem Definition
2. Analyse Data
3. Prepare Data
4. Evaluate Algorithms
5. Improve Results
6. Present Results
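
The steps above can be sketched end to end on a small example. This is only an illustration under stated assumptions: the data is synthetic, the two candidate models (`LinearRegression` and `Ridge`) are arbitrary choices, and "improving results" is reduced to keeping the better-scoring candidate.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# 1. Problem definition: predict a real-valued target from three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic, illustrative data
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# 2. Analyse data: look at basic summary statistics.
print("feature means:", X.mean(axis=0))

# 3. Prepare data: hold out a test set that is never used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 4. Evaluate algorithms: compare candidates with 5-fold cross-validation.
candidates = {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)}
cv_scores = {
    name: cross_val_score(estimator, X_train, y_train, cv=5).mean()
    for name, estimator in candidates.items()
}

# 5. Improve results: here, simply keep the best-scoring candidate.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)

# 6. Present results: report R^2 on the held-out test set.
test_r2 = best_model.score(X_test, y_test)
print(best_name, round(test_r2, 3))
```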

# Applications of Supervised Machine Learning
* Bioinformatics
* Quantitative structure-activity relationship (QSAR)
* Database marketing
* Handwriting recognition
* Information retrieval
* Learning to rank
* Information extraction
* Object recognition in computer vision
* Optical character recognition
* Spam detection
* Pattern recognition

# Machine Learning Techniques

![image.png](attachment:image.png)

# Supervised Machine Learning
Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into **regression** and **classification** problems.

#### Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.

#### Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.

Some common types of problems built on top of classification and regression include recommendation and time series prediction respectively.

Some popular examples of supervised machine learning algorithms are:

* Linear regression for regression problems.
* Random forest for classification and regression problems.
* Support vector machines for classification problems.
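
To make the distinction between a categorical and a real-valued output concrete, here is a minimal sketch that uses scikit-learn's random forest in both roles. The tiny dataset, the labels and the forest size are made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# One input variable, two different kinds of output variable.
X = [[1], [2], [3], [10], [11], [12]]
y_category = ["blue", "blue", "blue", "red", "red", "red"]  # classification target
y_value = [1.0, 2.1, 2.9, 10.2, 11.1, 11.8]                 # regression target

classifier = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_category)
regressor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_value)

predicted_category = classifier.predict([[2.5], [11.5]])
predicted_value = regressor.predict([[2.5], [11.5]])

print(predicted_category)  # categories
print(predicted_value)     # real values
```

The same inputs yield a label from the classifier and a number from the regressor; only the type of the output variable differs.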

# Unsupervised Machine Learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables.

The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association problems.

#### Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
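
A minimal sketch of clustering with k-means, one common clustering algorithm, via scikit-learn's `KMeans`. The customer numbers are invented for illustration, and asking for exactly two clusters is an assumption made here, not something the data dictates.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one customer: [purchases per month, average basket value].
# The numbers are made up for illustration.
customers = np.array([
    [1.0, 10.0], [2.0, 12.0], [1.5, 9.0],     # occasional, small baskets
    [9.0, 80.0], [10.0, 95.0], [8.5, 90.0],   # frequent, large baskets
])

# No labels are given; k-means is asked to find two groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print(labels)                   # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # centre of each discovered group
```

Which group receives index 0 and which receives index 1 is arbitrary; only the grouping itself is meaningful.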
#### Association:  An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
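
A rule such as "people that buy X also tend to buy Y" is usually scored by its support and confidence. The sketch below computes both for pairs of items in plain Python. It is not the Apriori algorithm itself, only the two measures; the baskets and the helper names `support` and `confidence` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Each set is one shopping basket (illustrative data).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# How many baskets contain each item, and each (sorted) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(x, y):
    """Fraction of all baskets that contain both x and y."""
    return pair_counts[tuple(sorted((x, y)))] / len(baskets)

def confidence(x, y):
    """Fraction of the baskets containing x that also contain y."""
    return pair_counts[tuple(sorted((x, y)))] / item_counts[x]

print(support("bread", "butter"), confidence("bread", "butter"))
```

Here three of the five baskets contain both bread and butter, and three of the four baskets with bread also contain butter.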
Some popular examples of unsupervised learning algorithms are:

* k-means for clustering problems.
* Apriori algorithm for association rule learning problems.

# Reinforcement Machine Learning
Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.
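
The reward-driven trial and error described above can be sketched with a deliberately simplified problem: a three-armed bandit, where each decision is independent and there is no environment state. The reward values, the epsilon-greedy strategy and all names are assumptions made for this illustration; full reinforcement learning problems also involve states and delayed rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Environment: three possible actions with different average rewards.
# The agent does not know these values (illustrative numbers).
true_mean_reward = np.array([1.0, 2.0, 3.0])

def step(action):
    """Return a noisy reward for the chosen action."""
    return true_mean_reward[action] + rng.normal(scale=0.1)

estimates = np.zeros(3)  # agent's current estimate of each action's reward
counts = np.zeros(3)     # how often each action has been tried
epsilon = 0.1            # probability of exploring a random action
total_reward = 0.0

for _ in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))       # explore: trial and error
    else:
        action = int(np.argmax(estimates))  # exploit: best action so far
    reward = step(action)
    total_reward += reward
    counts[action] += 1
    # Incremental average of the rewards seen for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = int(np.argmax(estimates))
print(best_action, np.round(estimates, 2), round(total_reward, 1))
```

The agent is never told which action is best; it finds out from the rewards it collects.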

![image.png](attachment:image.png)

## Supervised learning uses regression techniques and classification algorithms to develop predictive models.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

1. **Simple Linear Regression model:** *Simple linear regression is a statistical method for summarising and studying the relationship between two continuous (quantitative) variables. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variables (x) and the single output variable (y), so that y can be calculated from a linear combination of the input variables (x). When there is a single input variable (x), the method is called simple linear regression. When there are multiple input variables, the procedure is referred to as multiple linear regression.*

![image.png](attachment:image.png)
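
For a single input variable, the least-squares line can be computed directly. The following is a minimal sketch with made-up data that lies exactly on y = 2x + 1; the variable names are illustrative.

```python
import numpy as np

# Illustrative data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares estimates for the model y = intercept + slope * x.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(slope, intercept)
```

The slope is the ratio of the co-variation of x and y to the variation of x, and the intercept makes the line pass through the point of means.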

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
train = pd.read_csv("../input/random-linear-regression/train.csv") 
test = pd.read_csv("../input/random-linear-regression/test.csv") 
train = train.dropna()
test = test.dropna()
train.head()

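# Model plot and goodness of fit
# Assumes a two-column dataset: the feature first, the target last.
X_train = train.iloc[:, :-1].values
y_train = train.iloc[:, -1].values
X_test = test.iloc[:, :-1].values
y_test = test.iloc[:, -1].values
model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# For a regressor, score() returns R^2 (coefficient of determination), not a classification accuracy
r2 = model.score(X_test, y_test)

# Plot the training points together with the fitted line
plt.scatter(X_train, y_train, s=5, label='training data')
plt.plot(X_train, model.predict(X_train), color='green', label='fitted line')
plt.legend()
plt.show()
print(r2)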

### Applications:
* Studying engine performance from test data in automobiles
* Least squares regression is used to model causal relationships between parameters in biological systems
* OLS regression can be used in weather data analysis
* Linear regression can be used in market research studies and customer survey results analysis
* Linear regression is commonly used in observational astronomy. A number of statistical tools and methods are used in astronomical data analysis, and there are entire libraries in languages like Python for data analysis in astrophysics.

2. **Lasso Regression**: *LASSO stands for Least Absolute Shrinkage and Selection Operator, where shrinkage is a constraint on the model parameters. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The algorithm operates by imposing a constraint on the model parameters that causes the regression coefficients of some variables to shrink toward zero.*

      *Variables whose regression coefficient equals zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are most strongly associated with the response variable. Explanatory variables can be quantitative, categorical or both. Lasso regression is thus both a shrinkage and a variable selection method, and it helps analysts determine which of the predictors are most important.*
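
The shrinkage-to-zero behaviour can be seen on synthetic data. This is a hedged sketch: the data is generated here so that only the first of five predictors matters, and `alpha=0.5` is an arbitrary constraint strength chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: five predictors, but only the first one drives the response.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)  # alpha sets the strength of the constraint

print(ols.coef_)    # ordinary least squares: every predictor gets some weight
print(lasso.coef_)  # lasso: irrelevant predictors are shrunk to exactly zero

selected = np.flatnonzero(lasso.coef_)  # indices of predictors kept by the lasso
print(selected)
```

Note that the lasso also shrinks the coefficient of the relevant predictor somewhat below its true value of 3; that is the price of the constraint.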

In [None]:
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
# The same data as for LGB is used: `train` is assumed to be a DataFrame with 'item_price' and 'item_id' columns
X = train.drop(columns=['item_price', 'item_id'])
y = train['item_price']
X.head()

# Model & R^2 on the training data
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.0005, random_state=1))
lasso.fit(X, y)
# r2_score expects (y_true, y_pred), in that order
r2_score(y, lasso.predict(X))

### Applications:
Lasso regression algorithms have been widely used in financial networks and economics. In finance, lasso is applied to forecasting probabilities of default, and lasso-based forecasting models are used in assessing enterprise-wide risk frameworks. Lasso-type regressions are also used in stress-testing platforms to analyze multiple stress scenarios.

Please note: I am collecting the best material for understanding ML from various sources.

## I will discuss all techniques in detail in another notebook so that everything can be understood more thoroughly.

3. **Logistic Regression:** *Logistic regression is one of the most commonly used regression techniques in industry, extensively applied in fraud detection, credit card scoring and clinical trials, wherever the response is binary. One major upside of this popular algorithm is that one can include more than one independent variable, each of which can be continuous or dichotomous. The other major advantage of this supervised machine learning algorithm is that it provides a quantified value measuring the strength of association of each variable with the response, adjusted for the rest of the variables. Despite its popularity, researchers have pointed out its limitations, citing a lack of robustness and a strong dependence on the model specification.*
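
One common reading of the "quantified value" mentioned above is the odds ratio obtained by exponentiating a fitted coefficient. The sketch below shows this on synthetic data with a single input variable; the data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary response: larger x makes the outcome 1 more likely.
x = rng.normal(size=(500, 1))
y = (x[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(x, y)

coefficient = model.coef_[0, 0]   # change in log-odds per unit increase in x
odds_ratio = np.exp(coefficient)  # multiplicative change in the odds

# Predicted probability of class 1 for two new inputs.
probabilities = model.predict_proba([[-2.0], [2.0]])[:, 1]
print(coefficient, odds_ratio, probabilities)
```

An odds ratio above 1 means that the odds of the outcome increase as the variable increases. Note that scikit-learn's `LogisticRegression` applies L2 regularisation by default, so the coefficient is slightly shrunk compared with an unpenalised fit.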




### Applications: 
Today enterprises deploy logistic regression to predict categorical outcomes, such as whether a customer will or will not buy. Continuous outcomes, such as house values in the real estate business or customer lifetime value in the insurance sector, call for linear regression instead.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


train = pd.read_csv("../input/titanic/train.csv")
test  = pd.read_csv('../input/titanic/test.csv')
train.head()


# One-hot encode the port of embarkation
ports = pd.get_dummies(train.Embarked, prefix='Embarked')
train = train.join(ports)
train = train.drop(columns=['Embarked'])
# Encode sex as 0/1
train['Sex'] = train['Sex'].map({'male': 0, 'female': 1})
y = train.Survived.copy()
# Drop the target together with the free-text and identifier columns
X = train.drop(columns=['Survived', 'Cabin', 'Ticket', 'Name', 'PassengerId'])
# Fill missing ages with the median; assigning back avoids a chained inplace modification
X['Age'] = X['Age'].fillna(X['Age'].median())

# Model and accuracy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
model = LogisticRegression(max_iter=500000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# For a classifier, score() returns the mean accuracy on the given test data
accuracy = model.score(X_test, y_test)
print(accuracy)


# Support Vector Machine:
Support Vector Machines are perhaps one of the most popular and talked-about machine learning algorithms. SVM is primarily a classifier that performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different class labels. SVM supports both regression and classification tasks and can handle multiple continuous and categorical variables.

Example: Suppose one class is linearly separable from the others. If we only had two features, such as the height and hair length of an individual, we would first plot these two variables in two-dimensional space, where each point has two coordinates. The SVM then looks for the line that separates the classes with the widest margin.
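
The height and hair-length example can be sketched with a linear-kernel `SVC`. All measurements and class labels below are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two features per person: [height in cm, hair length in cm].
# Values and class labels are made up for illustration.
X = np.array([
    [180, 5], [175, 8], [185, 3], [178, 6],      # class 0
    [165, 30], [160, 35], [170, 28], [162, 40],  # class 1
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel looks for a separating straight line (a hyperplane in 2-D).
classifier = SVC(kernel="linear").fit(X, y)

# The separating line is w[0] * height + w[1] * hair_length + b = 0.
w = classifier.coef_[0]
b = classifier.intercept_[0]

predictions = classifier.predict([[182, 4], [163, 33]])
print(w, b, predictions)
```

The `coef_` attribute is only available for the linear kernel; with a non-linear kernel such as `rbf` the boundary is no longer a straight line in the original feature space.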

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
data_svm = pd.read_csv("../input/svm-classification/UniversalBank.csv")
data_svm.head()



# Model & accuracy

# NOTE: these positional slices depend on the column order of UniversalBank.csv
X = data_svm.iloc[:, 1:13].values
y = data_svm.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
# Mean accuracy over 10-fold cross-validation on the training split
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
accuracies.mean()

# Work in Progress :-)