# Q1 : How to Choose a Machine Learning Model – Some Guidelines?

The overall steps for Machine Learning/Deep Learning are:

* Collect data
* Check for anomalies, missing data and clean the data
* Perform statistical analysis and initial visualization
* Build models
* Check the accuracy
* Present the results
 
**Machine learning tasks can be classified into**

* Supervised learning
* Unsupervised learning
* Semi-supervised learning
* Reinforcement learning    

Below are some approaches on choosing a model for Machine Learning/Deep Learning:

**OVERALL APPROACHES:** 

* Dealing with unbalanced data: Use resampling strategies            
* Create new features : Principal component analysis (PCA) to reduce dimensionality, Autoencoders to create a latent space and possibly Clustering to create new features
* To reduce overfitting and sensitivity to noise in linear regression, use regularization techniques such as lasso (L1) and ridge (L2)
* Overcoming the black-box AI problem: consider strategies for building interpretable models
* Algorithms less sensitive to outliers: tree-based models such as Random Forest are a common choice when the data contains outliers
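
To make the regularization point concrete, here is a minimal sketch of ridge regression written directly in NumPy. The data, the `fit_ridge` helper and the `alpha` value are all made up for illustration; in practice you would normally reach for a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 samples, 5 features, two of them nearly collinear.
X = rng.normal(size=(30, 5))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=30)
true_coef = np.array([2.0, -1.0, 0.5, 1.0, 1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=30)


def fit_ridge(X, y, alpha):
    """Closed-form ridge solution (X^T X + alpha * I)^-1 X^T y, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


ols_coef = fit_ridge(X, y, alpha=0.0)    # alpha = 0 is ordinary least squares
ridge_coef = fit_ridge(X, y, alpha=5.0)  # alpha > 0 shrinks the coefficients

print("OLS coefficients:  ", np.round(ols_coef, 2))
print("Ridge coefficients:", np.round(ridge_coef, 2))
print("Coefficient norms (OLS, ridge):",
      np.linalg.norm(ols_coef), np.linalg.norm(ridge_coef))
```

The ridge penalty pulls the coefficient vector towards zero, which is what makes the fit less sensitive to noise and to nearly duplicated features.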

**MACHINE LEARNING MODELS**

* Predicting continuous values: Linear Regression is generally a good first approach (e.g., prices)
* Binary classification: Logistic regression is a good starting point. Support Vector Machines (SVM) are also a good choice for two-class classification
* Multi-class classification: Random forest is a common choice. See SVM vs Random Forest usage
* Is there a simplest or easiest model category to start off with? Decision trees are often seen as simple to understand and use. Ensembles such as Random forest and Gradient boosting are built from many decision trees.
* Which models are used on Kaggle? For supervised learning: Random forest and XGBoost. See note on Gradient boosted trees
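
One way to apply these guidelines is to start with a simple baseline and compare it against a stronger model using cross-validation. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset purely as a stand-in for your own data; the choice of models and settings is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Candidate models: a simple linear baseline and a tree ensemble.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on 5 folds
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {mean_scores[name]:.3f}")
```

If the simple baseline is already close to the more complex model, it is usually the easier one to explain and maintain.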

**DEEP LEARNING MODELS**

* Complex features that cannot be easily specified, when you have a large number of labelled examples: Multi-layer perceptrons
* Vision-based Machine Learning (image classification, object detection, image segmentation): Convolutional Neural Networks
* Sequence modelling tasks, e.g. text classification or language translation: RNNs (typically LSTM)
 

#  What is the difference between AI, Data Science, ML, and DL?

## Artificial Intelligence:
AI began as a purely mathematical and scientific exercise, but once it became computational it started to solve
human problems and was formalized as a subset of computer science. Artificial intelligence has changed
the original computational statistics paradigm into the modern idea that machines can mimic
actual human capabilities, such as decision making and performing more “human” tasks. Modern
AI falls into two categories:

1. General AI - planning, decision making, identifying objects, recognizing sounds, social & business transactions
2. Applied AI - driverless autonomous cars or machines that smartly trade stocks

![](./i/dsday1-1-768x432.jpg)
## Machine Learning:
Instead of engineers programming computers with everything they need to carry out
tasks, the idea is that computers can teach themselves: learn something without being explicitly
programmed to do so. ML is a form of AI in which systems, given more data, change their actions
and responses, which makes them more efficient, adaptable and scalable, e.g., navigation apps and recommendation engines.
![](./i/1_y5iOXWzUaA78akU7dbf3cw.jpeg)

## Data Science:

Data science draws on many tools, techniques, and algorithms culled from these fields, plus others, to handle big data.

The goal of data science, somewhat similar to machine learning, is to make accurate predictions and to automate and perform transactions in real-time, such as purchasing internet traffic or automatically generating content.

Data science relies less on math and coding and more on data and building new systems to
process the data. Drawing on the fields of data integration, distributed architecture, automated machine learning, data visualization, data engineering, and automated data-driven decisions, data
science can cover an entire spectrum of data processing, not only the algorithms or statistics related to data.

## Deep Learning:

It is a technique for implementing ML.

Classical ML maps a given input to the desired output using features that are specified up front, whereas DL learns its own internal representation of the raw input and can then apply it to
new data. In ML, we can easily classify a flower when its features are given.

Suppose you want
a machine to look at an image and determine what it represents to the human eye, whether a face,
flower, landscape, truck, building, etc.
Classical machine learning is often not sufficient for this task, because it produces an output from a data set either according to a known algorithm or based on the inherent structure
of the data, using features that someone has to define. You might be able to use machine learning to determine whether an image was of an
“X” (a flower, say), and it would learn and get more accurate. But that output is a narrow yes/no answer
and depends heavily on the hand-chosen features and algorithm rather than on the raw data. In the general image recognition case, the outcome is not
binary, and the useful features have to be learned from the data itself.


A neural network performs many small calculations spread across many layers. Neural
networks also support weighting data for ‘confidence’. This results in a probabilistic system,
rather than a deterministic one, that can handle tasks we think of as requiring more ‘human-like’ judgment.
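
As a rough illustration of "many small calculations on many layers" ending in a confidence score per class, here is a tiny two-layer forward pass in NumPy. The layer sizes and the random weights are arbitrary assumptions; a real network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)


def forward(x):
    hidden = relu(x @ W1 + b1)        # layer 1: weighted sum + nonlinearity
    return softmax(hidden @ W2 + b2)  # layer 2: scores turned into probabilities


batch = rng.normal(size=(2, 4))       # two made-up input examples
probs = forward(batch)
print(probs)                          # one row of class probabilities per example
```

Each output row sums to 1, so the network reports how confident it is in each class instead of a single hard answer.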

# What is the difference between supervised, unsupervised and reinforcement learning?

Machine learning is the scientific study of algorithms and statistical models that computer
systems use to effectively perform a specific task without using explicit instructions, relying on
patterns and inference instead.

In practice, this means building a model that learns the patterns and relationships in historical data in order to make data-driven predictions.

Types of Machine Learning:
* Supervised Learning
* Unsupervised Learning
* Reinforcement Learning

## Supervised learning
In a supervised learning model, the algorithm learns from a labeled dataset so that it can generate reasonable predictions for new data (forecasting the outcome for new data). The two main problem types are:
* Regression
* Classification

## Unsupervised learning

In unsupervised learning, in contrast, the algorithm is given unlabelled data and tries to make
sense of it by extracting features, co-occurrences and underlying patterns on its own. We use unsupervised learning for
* Clustering
* Anomaly detection
* Association
* Autoencoders

## Reinforcement learning
Reinforcement learning is less supervised: there are no labelled answers. A learning agent interacts with an environment, tries
different possible actions, and uses the rewards it receives to arrive at the best possible solution.
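
A very small example of this trial-and-reward loop is an epsilon-greedy agent on a multi-armed bandit. This is only a sketch: the `run_bandit` function, the win probabilities and the `epsilon` value are invented for illustration, and real reinforcement learning problems usually also involve states and long-term rewards.

```python
import random


def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with 0/1 rewards."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)
    values = [0.0] * len(win_probs)  # running estimate of each arm's reward

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(win_probs))                        # explore
        else:
            arm = max(range(len(win_probs)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]            # incremental mean
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print("estimated rewards:", [round(v, 2) for v in values])
print("pulls per arm:", counts)
print("best arm:", best_arm)
```

The agent is never told which arm is best. It finds out by occasionally exploring and otherwise repeating whatever has paid off so far.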

**What is Machine Learning?** Two definitions of Machine Learning are offered. Arthur Samuel
described it as: “the field of study that gives computers the ability to learn without being explicitly
programmed.” This is an older, informal definition.

**Tom Mitchell provides a more modern definition**: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.”

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

**In general, most machine learning problems can be assigned to one of two broad classifications:**
supervised learning and unsupervised learning.

In supervised learning, we are given a data set and already know what our correct output should
look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into “regression” and “classification” problems. In
a regression problem, we are trying to predict results within a continuous output, meaning that
we are trying to map input variables to some continuous function.

In a classification problem, we
are instead trying to predict results in a discrete output. In other words, we are trying to map
input variables into discrete categories.

Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a
function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about
whether the house “sells for more or less than the asking price.” Here we are classifying the
houses based on price into two discrete categories.
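
Example 1 can be sketched in a few lines with scikit-learn. All numbers below are made up (size in thousands of square feet, price in thousands of dollars), and the "sold above asking" labels are invented just to show what a discrete target looks like.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: size in thousands of square feet, price in thousands of dollars.
size = np.array([[0.8], [1.0], [1.2], [1.5], [1.8], [2.0], [2.4], [3.0]])
price = np.array([160, 200, 240, 300, 360, 400, 480, 600])

# Regression: the target is a continuous number (the price).
regressor = LinearRegression().fit(size, price)
predicted_price = regressor.predict([[1.6]])[0]
print(f"predicted price for 1.6k sq ft: {predicted_price:.0f}k")

# Classification: the target is a discrete label.
# 1 = sold above the asking price, 0 = sold at or below it (labels are invented).
sold_above_asking = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classifier = LogisticRegression().fit(size, sold_above_asking)
predicted_label = classifier.predict([[2.6]])[0]
print("2.6k sq ft sells above asking?", bool(predicted_label))
```

The input is the same in both cases. Only the kind of output (a number versus a category) decides whether the problem is regression or classification.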

Example 2:

* (a) Regression - Given a picture of a person, we have to predict their age on the basis of the
given picture
* (b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

**Unsupervised learning** allows us to approach problems with little or no
idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in
the data.
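
Clustering can be tried out with k-means from scikit-learn. The two groups of points below are synthetic, and the choice of two clusters is an assumption that happens to match how the data was generated; with real data the number of clusters is usually not known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two made-up groups of 2-D points; no labels are given to the algorithm.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_

print("cluster sizes:", np.bincount(labels))
print("cluster centers:")
print(kmeans.cluster_centers_.round(2))
```

The algorithm recovers the two groups from the positions of the points alone, which is the sense in which structure is derived from the data.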

# Describe the general architecture of Machine learning.
## Business understanding:

![](./i/dsday1-2-768x524.jpg)

Understand the given use case. It is also good to know more about the domain for which the
use case is built.

## Data Acquisition and Understanding:

Gather data from different sources and understand it. This includes cleaning the data, handling
missing data if any, data wrangling, and EDA (exploratory data analysis).
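
A first pass at this step with pandas might look like the sketch below. The table is made up, and filling with the median or an "unknown" marker is just one possible strategy for missing values; the right choice depends on the data.

```python
import numpy as np
import pandas as pd

# Made-up raw data with a missing number, a missing category and a duplicate row.
raw = pd.DataFrame({
    "age": [25, np.nan, 47, 31, 31],
    "city": ["Pune", "Delhi", None, "Pune", "Pune"],
    "income": [40000, 52000, 61000, 45000, 45000],
})

print(raw.isna().sum())   # how many values are missing per column
print(raw.describe())     # quick statistical summary (part of EDA)

clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())  # numeric: fill with median
clean["city"] = clean["city"].fillna("unknown")            # categorical: explicit marker

print(clean)
```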

## Modeling:

* Feature engineering: scaling the data and feature selection, since not all features are important. We use the backward elimination method, correlation factors, PCA and domain knowledge to select the features.
* Model training: based on trial and error or on experience, we select the algorithm and train it with the selected features.
* Model evaluation: accuracy of the model, confusion matrix and cross-validation.
* Tuning: if the accuracy is not high enough, we tune the model, either by changing the algorithm used, by revisiting feature selection, by gathering more data, etc.
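
The modeling steps can be chained together in one scikit-learn pipeline. This is a sketch under assumptions: the bundled breast cancer dataset stands in for real project data, and the choice of 10 PCA components and logistic regression is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling, PCA and the model live in one pipeline, so the preprocessing
# is fitted only on the training folds during cross-validation.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(max_iter=1000),
)

cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"cross-validated accuracy: {cv_scores.mean():.3f}")

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
print(matrix)  # rows: true class, columns: predicted class
```

If the scores are not good enough, the tuning step amounts to changing one part of the pipeline (features, algorithm, or data) and running the same evaluation again.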

## Deployment:

Once the model has good accuracy, we deploy it in the cloud, on a Raspberry Pi, or anywhere
else. Once deployed, we monitor the performance of the model. If it is good, we go live
with the model; otherwise we repeat the whole process until the model performance is good.

It’s not done yet!!!

What if, after a few days, our model performs badly because of new data? In that case, we go through
the whole process again: collect new data, retrain, and redeploy the model.
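
Monitoring can start as simply as tracking accuracy over the most recent predictions and raising a flag when it drops. The `AccuracyMonitor` class, the window size and the threshold below are hypothetical, and the sketch assumes that the true outcomes become available some time after each prediction.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions of a deployed model."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(1 if predicted == actual else 0)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Only raise the flag once the window is full, to avoid noisy early alerts.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=10, threshold=0.8)

# Simulated traffic: the model is right at first, then new data makes it fail.
for predicted, actual in [(1, 1)] * 10:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())

for predicted, actual in [(1, 0)] * 5:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```

When the flag is raised, that is the signal to collect fresh data and go through the training and deployment steps again.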

# List of Python packages that I can use for ML?

1- seaborn

2- scikit-learn (sklearn)

3- pandas

4- numpy