<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/MLPG-Book-Cover-Small.png"><br>

This notebook contains an excerpt from the **`Machine Learning Project Guidelines - For Beginners`** book written by *Balasubramanian Chandran*; the content is available [on GitHub](https://github.com/BalaChandranGH/Books/ML-Project-Guidelines).

<br>
<!--NAVIGATION-->

<[ [Contents and Acronyms](00.00-mlpg-Contents-and-Acronyms.ipynb) | [Grouping of ML Algorithms](02.00-mlpg-Grouping-of-ML-Algorithms.ipynb) ]>

# 1. Introduction to Machine Learning

## 1.1. What is Machine Learning?
* Machine Learning is the science of teaching machines how to learn by themselves
* Machines can do high-frequency repetitive tasks with high accuracy without getting bored
* Machines need a way to think and this is precisely where machine learning models help
* The machines capture data from the environment and feed it to the machine learning models

## 1.2. Automation vs Machine Learning vs Statistical modeling
* **Automation**: Rule-driven and _`does not evolve`_
* **Machine Learning**: An algorithm that _`does evolve`_: it learns from past data and adjusts its decisions/performance accordingly, without relying on rules-based programming
* **Statistical modeling**: Formalization of relationships between variables in the form of mathematical equations<br>
_`The only relation between Automation and ML is that ML enables better automation.`_
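
To make the contrast concrete, here is a minimal sketch, assuming a toy spam check based only on the number of links in a message. The function names, the fixed threshold of 5, and the `history` data are all hypothetical and chosen purely for illustration.

```python
# Automation: a hand-written rule that never changes.
def rule_based_is_spam(num_links):
    return num_links > 5  # fixed threshold chosen by a person (assumed value)


# Machine learning: the threshold is estimated from labelled past data.
def learn_threshold(samples):
    """samples: list of (num_links, is_spam) pairs.

    Returns the midpoint between the mean link count of the spam
    examples and the mean link count of the non-spam examples.
    """
    spam = [x for x, is_spam in samples if is_spam]
    ham = [x for x, is_spam in samples if not is_spam]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2


# Hypothetical history of past messages: (number of links, was it spam?)
history = [(0, False), (1, False), (2, False), (6, True), (8, True), (10, True)]
threshold = learn_threshold(history)


def learned_is_spam(num_links):
    return num_links > threshold
```

If new labelled messages arrive, calling `learn_threshold` again moves the decision boundary, whereas the rule-based check stays the same until someone edits it by hand.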

## 1.3. Differences between Machine Learning and Statistical modeling
![](figures/MLPG-DifferencesMLSM.png)

## 1.4. Types of Machine Learning
**Supervised Learning:**
* The training data includes a target/outcome variable (dependent variable) that is to be predicted from a given set of predictors (independent variables)
* Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data
* Examples: Linear Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
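
As an illustration of learning a mapping from predictors to a target, the following is a minimal sketch of k-nearest neighbors (KNN) in plain Python. The toy points and the labels `"A"` and `"B"` are made-up data, and the helper name `knn_predict` is hypothetical; a real project would more likely use a library implementation.

```python
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (squared Euclidean distance)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Predictors (independent variables) and the target (dependent variable)
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(train_X, train_y, (1.2, 1.4))
prediction_near_b = knn_predict(train_X, train_y, (5.2, 5.1))
```

The labels in `train_y` are what make this supervised: the algorithm is told the correct answer for every training example.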

**Semi-Supervised Learning:**
* It refers to a learning problem (and algorithms designed for the learning problem) that involves a small portion of labeled examples and a large number of unlabeled examples from which a model must learn and make predictions on new examples

**Unsupervised Learning:**
* Here, there is no target or outcome variable to predict/estimate
* It is used for clustering a population into different groups, for example, segmenting customers into groups for specific interventions
* Examples: Apriori algorithm (association rules), K-means (clustering)
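
The customer-segmentation idea can be sketched with a one-dimensional K-means in plain Python. This is a minimal sketch under stated assumptions: the `spend` values and the starting centroids are invented, and the helper name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Minimal K-means on one numeric feature.

    Repeats two steps: assign every value to its nearest centroid,
    then move each centroid to the mean of the values assigned to it.
    """
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; note that there are no labels
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 50.0])
```

No target variable is supplied; the two groups (low and high spenders) emerge from the data alone.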

**Reinforcement Learning:**
* Using this algorithm, the machine is trained to make specific decisions
* It works this way: the machine is exposed to an environment where it trains itself continually using trial and error
* The machine learns from experience and tries to capture the best possible knowledge to make accurate business decisions
* Example: problems formulated as a Markov Decision Process (MDP)
* Reinforcement Learning is said to be the hope of true artificial intelligence. It is a somewhat more complex topic than traditional ML, but an equally crucial one for the future
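
The trial-and-error loop can be sketched with tabular Q-learning, one common reinforcement learning method (named here as an illustration; the text above does not prescribe a specific algorithm). The corridor environment, the reward of 1.0 at the goal, and the hyperparameters below are all assumptions made for this toy example.

```python
import random

N_STATES = 4      # corridor cells 0..3; cell 3 is the goal (toy environment)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # trial and error: explore at random
            next_state, reward = step(state, action)
            # Move the estimate toward reward + discounted best future value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: the best-valued action in each non-goal cell
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, `policy` holds the action the agent would pick in each non-goal cell; in this corridor it learns to move right toward the goal.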

## 1.5. Differences between Supervised & Unsupervised Learning
![](figures/MLPG-DifferencesSLUSL.png)

## 1.6. Difference between ML Algorithms and ML Models
**ML Algorithms:**
* Procedures that are implemented in code and are run on data
* Learn from data or fit on a dataset
* Can be described in pseudocode
* Can be implemented by programming languages (Python, R, etc.)

**ML Models:**
* Outputs that are generated by algorithms and are comprised of model data and a prediction algorithm, i.e., <br>
_`Models = Algorithms + data`_
* The output of ML algorithms that run on data
* Represents what was learned by an ML algorithm
* An ML model is what gets saved after running an ML algorithm on training data. For example:
  - An LR algorithm results in a model comprised of a vector of coefficients with specific values
  - A DT algorithm results in a model comprised of a tree of if-then statements with specific values
  - The NN/backpropagation/gradient descent algorithms together result in a model comprised of a graph structure with vectors or matrices of weights with specific values
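
The distinction can be sketched in a few lines of plain Python: the fitting procedure is the algorithm, and the coefficients it returns are the model. This is a minimal sketch assuming ordinary least squares with a single predictor; the function names and the toy data are hypothetical.

```python
def fit_linear_regression(xs, ys):
    """The *algorithm*: ordinary least squares for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The *model*: nothing more than the learned coefficient values
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Prediction only needs the saved model, not the training data."""
    return model["intercept"] + model["slope"] * x


# Toy training data generated from y = 2x + 1
model = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

Once `model` is saved, the training data and the fitting code are no longer needed to make predictions such as `predict(model, 10)`.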

## 1.7. Is Machine Learning a complete black box?
* No, it is not. There are methods/algorithms within ML that can be interpreted well. These methods can help us understand which relationships are significant and why the machine has taken a particular decision
* On the other hand, certain algorithms are difficult to interpret. With these methods, even if we achieve very high accuracy, we may struggle with explanations
* The good thing is that depending on the application or the problem we are trying to solve – we can choose the right method. This is also a very active field of research and development

## 1.8. Challenges in the adoption of Machine Learning
* While ML has made tremendous progress in the last few years, some big challenges still need to be solved
* It is an area of active research, and you can expect a lot of effort to go into solving these problems in the coming years. The challenges are:
  - _**Huge data requirement**_: It takes a huge amount of data to train a model today. For example – if you want to classify Cats vs. Dogs based on images (and you don’t use an existing model) – you would need the model to be trained on thousands of images. Compare that to a human – we typically explain the difference between Cat and Dog to a child by using 2 or 3 photos
  - _**High computing requirement**_: As of now, machine learning and deep learning models require huge computations to achieve simple tasks (simple according to humans). This is why the use of special hardware including GPUs and TPUs is required. The cost of computations needs to come down for machine learning to make a next-level impact
  - _**Interpretation of models is difficult at times**_: Some modeling techniques can give us high accuracy but are difficult to explain. This can leave the business owners frustrated. Imagine being a bank, but you cannot tell why you declined a loan for a customer!
  - _**New and better algorithms requirement**_: Researchers are consistently looking out for new and better algorithms to address some of the problems mentioned above
  - _**Need for more Data Scientists**_: Since the domain has grown so quickly, there aren’t many people with the skill sets required to solve the vast variety of problems. This is expected to remain so for the next few years. So, if you are thinking about building a career in machine learning, you are well placed!

## 1.9. Applications of ML in day-to-day life (most common UCs)
* Applications in Smartphones
  - Voice Assistants (e.g., Apple's Siri, Google Assistant, Amazon's Alexa, Microsoft's Cortana, etc.)
  - Cameras (e.g., Object detection and filling missing parts in a picture, etc.)
  - App Store and Play store recommendations (e.g., Google Play Store)
  - Face Unlock
* Transportation Optimization
  - Dynamic pricing in travel (e.g., Depending on location, Time of day, Weather, Demand, etc.)
  - Google maps (e.g., Routes, Estimated time of travel, Traffic details, Explore Nearby feature, etc.)
* Popular Web Services
  - Email filtering (e.g., Spam detection)
  - Google Search
  - Google Translate
  - LinkedIn and Facebook recommendations and Ads
* Sales and Marketing
  - Recommendation Engines
    - E-commerce sites like Amazon and Flipkart
    - Movie services like IMDb and Netflix
    - Hospitality sites like MakeMyTrip, Booking.com, etc.
    - Food aggregators like Zomato and Uber Eats
  - Personalized Marketing
  - Customer Support Queries (and Chatbots)
* Security
  - Video Surveillance
  - Cyber Security (CAPTCHAs: Completely Automated Public Turing test to tell Computers and Humans Apart, e.g., the "I am not a robot" check)
* Financial Domain
  - Fraud Detection in banking transactions, and Personalized Banking
* Other Popular Use Cases (e.g., Self-driving cars)

## 1.10. Why is Machine Learning getting so much attention recently?
This development is driven by a few underlying forces:<br>
**Force 1**: The amount of data generation is increasing significantly with a reduction in the cost of sensors<br>
**Force 2**: The cost of storing this data has been reduced significantly<br>
**Force 3**: The cost of computing has come down significantly<br>
**Force 4**: Cloud has democratized Compute for the masses

## 1.11. 7 must-know regression techniques
![](figures/MLPG-7RegTechs1.png)
![](figures/MLPG-7RegTechs2.png)

## 1.12. How to select the right regression technique?
* There are various kinds of regression techniques available to make predictions. The choice among them is mostly driven by three factors:
  - Number of independent variables
    - Can range from tens to hundreds of thousands
  - The type of the dependent variable(s)
    - If continuous, use Linear Regression. If categorical, use Logistic Regression
  - The shape of the regression line
    - Can be a straight line or Gaussian bell shape or any non-linear shape
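
The role of the dependent variable's type can be illustrated with a minimal sketch: a linear model returns an unbounded continuous value, while a logistic model passes the same linear combination through the sigmoid function to get a probability between 0 and 1. The coefficient values used below are hypothetical, not fitted from data.

```python
import math


def linear_output(x, slope, intercept):
    """Linear Regression: predicts a continuous value."""
    return intercept + slope * x


def logistic_output(x, slope, intercept):
    """Logistic Regression: predicts the probability of the positive class."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


# Same (hypothetical) coefficients, two very different kinds of output
continuous_prediction = linear_output(3, slope=2.0, intercept=1.0)
class_probability = logistic_output(3, slope=2.0, intercept=1.0)
```

A categorical prediction is then obtained by thresholding the probability, for example predicting the positive class when it exceeds 0.5.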
