# Week 1 - Exploratory data analysis

## 1. Introduction to Machine Learning

### A bit of history

- The history starts in 1943, when Walter Pitts and Warren McCulloch presented the first mathematical model of a neural network.
- The term "machine learning" was coined in 1959 by Arthur Samuel, who developed a computer checkers program that analyzed the data and chose the best available move.
- In 1967, Thomas Cover and Peter E. Hart published an article on the nearest neighbour algorithm.
- In 1986, Paul Smolensky introduced the Restricted Boltzmann Machine (RBM).
- On 10 February 1996, Deep Blue won a game against Garry Kasparov.


### Model development

The development of a machine learning model generally involves the following steps:

1. **Exploratory Data Analysis** (or **EDA**). We aim to understand the data and perform the initial (statistical) analysis. This can involve visualization (histograms, etc.) and statistical measures (interquartile range, variance, etc.).

2. **Data Pre-processing**. Prior to model training, the data has to be pre-processed. In most cases, this includes checking for and dealing with missing values, detecting outliers, balancing the dataset, etc.

3. **Feature Selection**. In order to train a model, one has to choose the features that are meaningful for the dependent variable.

4. **Building Model**. Choosing the model that is the most appropriate for the desired task (linear regression, K-Means, etc.).

5. **Training Model**. Training the model with the features prepared in the previous steps.

6. **Evaluating Model**. By evaluating the model(s) against chosen metrics, we can select the best-performing combination.
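
As a rough illustration of the first step (EDA), the sketch below computes a few of the statistical measures mentioned above and builds a simple histogram using only Python's standard library. The `ages` sample and the `text_histogram` helper are made up for this example, and the 1.5 × IQR outlier rule is one common convention, assumed here rather than taken from these notes.

```python
import statistics

# Hypothetical sample (made-up values, for illustration only).
ages = [22, 25, 27, 29, 31, 34, 36, 41, 45, 70]

mean_age = statistics.mean(ages)
variance = statistics.variance(ages)               # sample variance
q1, median, q3 = statistics.quantiles(ages, n=4)   # quartile cut points
iqr = q3 - q1                                      # interquartile range

# Assumed convention: flag values more than 1.5 * IQR outside the quartiles.
outliers = [v for v in ages if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


def text_histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = {}
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts


print("mean:", mean_age, "variance:", variance, "IQR:", iqr)
print("possible outliers:", outliers)
for bin_start, count in sorted(text_histogram(ages, 10).items()):
    print(f"{bin_start:>3}-{bin_start + 9:<3} {'#' * count}")
```

In practice, libraries such as pandas and matplotlib are typically used for this step; the standard library is used here only to keep the sketch self-contained.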


### Data Pre-processing

Data pre-processing is one of the most important and most overlooked steps in ML model development. Therefore, we will look at some of the most widely used methods.

#### Missing data

When dealing with missing data, there are a few options available:
- Delete the rows containing missing data (this risks losing important data).
- Insert the mean in place of the missing value (useful for numeric data such as age, salary, year, etc.).
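
Both options can be sketched in a few lines of plain Python. The `rows` data, the use of `None` as the missing-value marker, and the helper names `drop_missing` and `impute_mean` are assumptions made for this example.

```python
from statistics import mean

# Hypothetical records; None marks a missing value (made-up data).
rows = [
    {"age": 25, "salary": 3000},
    {"age": None, "salary": 4200},
    {"age": 41, "salary": 5100},
    {"age": 30, "salary": None},
]


def drop_missing(rows):
    """Option 1: keep only the rows in which every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows, column):
    """Option 2: replace missing values in `column` with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


complete_rows = drop_missing(rows)                             # two rows survive
filled_rows = impute_mean(impute_mean(rows, "age"), "salary")  # all four rows kept

print(complete_rows)
print(filled_rows)
```

Dropping rows halves this tiny dataset, while mean imputation keeps every row at the cost of inserting values that were never observed.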


#### Encoding data

When dealing with non-numeric (text, etc.) or categorical (e.g., positive/negative) data, we frequently want to convert it into numerical values so that the model can process it mathematically. There are a few methods to do this:

- **One-hot encoding**. Each category gets its own binary feature; for a given sample, the feature matching its category is set to 1, while the others remain 0.
- **Embedding**. Mapping each item (for example, a word) to a vector of numeric values; this is usually applied to textual data.
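
A minimal sketch of one-hot encoding in plain Python follows. The `one_hot_encode` helper and the sample `labels` (including the extra "neutral" category) are hypothetical, and the alphabetical column order is an arbitrary choice made for this example.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1      # only the matching category is set to 1
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical feature (made-up data).
labels = ["positive", "negative", "positive", "neutral"]
categories, encoded = one_hot_encode(labels)

print(categories)
for label, row in zip(labels, encoded):
    print(label, row)
```

Each row contains exactly one 1, so the encoding does not impose an artificial ordering on the categories.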


#### Feature scaling

The goal of feature scaling is to bring the independent variables into a specific range, which reduces the computational work. In addition, it helps when the independent variables differ by orders of magnitude: without scaling, features with larger values would dominate features with smaller ones.

- **Standardization**
![standardization](https://static.javatpoint.com/tutorial/machine-learning/images/data-preprocessing-machine-learning-9.png)


- **Normalization**
![normalization](https://static.javatpoint.com/tutorial/machine-learning/images/data-preprocessing-machine-learning-10.png)
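
Assuming the usual definitions, standardization computes `(x - mean) / standard deviation` and normalization (min-max scaling) computes `(x - min) / (max - min)`. The sketch below implements both for a single feature. The `salaries` values are made up, and the use of the population standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)      # would be 0 for a constant feature
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)   # hi == lo for a constant feature
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature (made-up data).
salaries = [2000, 3000, 4000, 5000, 6000]

print(standardize(salaries))
print(normalize(salaries))
```

Both functions divide by zero for a constant feature, so a real pipeline would need to handle that case separately.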


### Classification of ML

Machine learning models are classified into three major categories, depending on the nature of the learning "signal" or "response" available to the learning system.

- **Supervised Learning**. The algorithm is given example data together with the associated target outputs. After training, the model is able to predict outputs for new example data.
![supervised](https://www.onemodel.co/hs-fs/hubfs/supervised%20learning.png?width=500&name=supervised%20learning.png)



- **Unsupervised Learning**. The algorithm learns from example data **without** the associated target outputs, so the model has to determine patterns on its own. This leads to a restructuring of the data into new features that might represent a class or a new series of (un)correlated values. Common examples in this category are the various clustering algorithms, in which data is "grouped" according to feature similarities.
![clustering](https://www.ecloudvalley.com/wp-content/uploads/2019/09/Unsupervised-learning.png)


- **Reinforcement Learning**. It is similar to unsupervised learning in that we present examples without an associated target output. However, each example is accompanied by positive or negative feedback based on the model's performance.
![reinforcement learning](https://cdn-images-1.medium.com/max/1600/1*vz3AN1mBUR2cr_jEG8s7Mg.png)
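
To make the supervised setting concrete, here is a minimal sketch of a 1-nearest-neighbour classifier: it is given examples with target outputs and predicts the label of a new example from the closest known one. The `training_data` points, the labels "A" and "B", and the `predict_nearest` helper are hypothetical.

```python
import math

# Hypothetical labelled examples: (feature vector, target output). Made-up data.
training_data = [
    ((1.0, 1.2), "A"),
    ((1.5, 0.8), "A"),
    ((5.0, 5.5), "B"),
    ((6.0, 5.0), "B"),
]


def predict_nearest(training_data, point):
    """Return the label of the training example closest to `point`."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], point))
    return label


print(predict_nearest(training_data, (1.2, 1.0)))
print(predict_nearest(training_data, (5.5, 5.2)))
```

If the labels were removed from `training_data`, the same points could only be grouped by similarity, which is the unsupervised setting described above.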


### ML categorization (based on output)

In addition to the classification based on the learning signal, machine learning models can be divided into three categories based on the desired output.

![models](https://www.researchgate.net/profile/Frank_Nielsen2/publication/314626729/figure/fig1/AS:810830673244160@1570328505835/The-three-pillars-of-learning-in-data-science-clustering-flat-or-hierarchical.ppm)


- **Classification**. The inputs are divided into two or more classes, and the goal of training is to produce a model that assigns unseen inputs to one or more of these classes. It is usually a supervised learning technique.


- **Regression**. These models work with continuous outputs rather than discrete ones. An example is a simple linear regression model that tries to predict a housing price based on the floor area.


- **Clustering**. The goal is to divide a set of inputs into groups, even though the groups are unknown at the start of the learning.
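
The sketch below illustrates the regression and clustering cases with the standard library only: a least-squares line for price versus floor area, and a one-dimensional K-Means with two groups. All data values, the starting centroids, and the helper names `fit_line` and `k_means_1d` are assumptions made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Simple linear regression: least-squares slope and intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Regression: hypothetical, perfectly linear data (price = 3 * floor area).
floor_area = [50, 70, 90, 110]
price = [150, 210, 270, 330]
slope, intercept = fit_line(floor_area, price)
predicted_price = slope * 100 + intercept      # continuous output for an unseen input

# Clustering: hypothetical values and starting centroids; no labels are given.
centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[1, 12])

print(slope, intercept, predicted_price)
print(centroids, clusters)
```

The regression output is a continuous number, whereas the clustering output is a set of groups that were not known before the algorithm ran.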