### Target:
 - Skills:
  - Regression
  - Classification
  - Clustering
  - Scikit Learn
  - SciPy
  - Supervised Learning and Unsupervised Learning
 
 
Machine Learning is the subfield of computer science that gives "computers the ability to learn without being explicitly programmed." - Arthur Samuel

Major machine learning techniques:
 - Regression/Estimation: Predicting continuous values
     -> Ex: * Predicting things like the price of a house based on its characteristics
            * Estimating CO2 emission from a car's engine
 - Classification: Predicting the item class/category of a case
     -> Ex: * Whether a cell is benign or malignant, or whether or not a customer will churn
 - Clustering: Finding the structure of data; summarization
     -> Ex: * Grouping similar cases, such as finding similar patients
            * Can be used for customer segmentation in the banking field
 - Associations: Associating frequently co-occurring items/events
     -> Ex: * Grocery items that are usually bought together by a particular customer
 - Anomaly detection: Discovering abnormal and unusual cases
     -> Ex: * Used for credit card fraud detection
 - Sequence mining: Predicting next events; click-stream in website (Markov Model, HMM)
 - Dimension Reduction: Reducing the size of data (PCA)
 - Recommendation systems: Recommending items
     -> Ex: * Associates people's preferences with others who have similar tastes, and recommends new items to them, such as books or movies


Difference between artificial intelligence, machine learning, and deep learning
 - AI Components: Computer Vision, Language Processing, Creativity, Summarization, etc. -> make computers intelligent in order to mimic the cognitive functions of humans
 - Machine Learning: Classification, Clustering, Neural Network, etc. -> the branch of AI that covers the statistical part of artificial intelligence; it teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations
 - Revolution in ML: Deep Learning -> computers can actually learn and make intelligent decisions on their own; deep learning involves a deeper level of automation in comparison with most machine learning algorithms
 
#### Review:
 - Supervised Learning: 
     - Is the machine learning task of learning a function that maps an input to an output based on example input-output pairs
     - It infers a function from labeled training data consisting of a set of training examples; the inferred function can then be used for mapping new examples
     - Most widely used learning algorithms:
         - Support Vector Machines (SVM) for Classification
         - Linear Regression
         - Logistic Regression
         - Naive Bayes
         - Linear Discriminant Analysis
         - Decision Trees
         - K-nearest neighbor
         - Neural Networks (Multilayer perceptron)
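
As a rough illustration of learning a function from labeled input-output pairs, the sketch below fits a K-nearest neighbor classifier with Scikit Learn. The Iris dataset, the 75/25 split and `n_neighbors=5` are assumptions chosen only for the example, not part of the notes above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: X holds the inputs, y holds the known outputs (labels).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Learn a mapping from inputs to labels using the example pairs.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Apply the learned mapping to examples the model has not seen.
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Any of the other algorithms in the list could be swapped in for `KNeighborsClassifier`, since Scikit Learn estimators share the same `fit`/`predict` interface.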
     
 - Unsupervised Learning:
     - Is where we only have input data X and no corresponding output variables
     - The goal is to model the underlying structure or distribution in the data in order to learn more about the data
     - Most widely used learning algorithms:
         - Clustering
         - Association
         - K-means
         - Apriori
         - Dimension reduction
         - Density Estimation
         - Market basket analysis
         
![picture_alt](https://image.ibb.co/f4xJ49/image.png)
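
A minimal sketch of the unsupervised setting, assuming synthetic data from `make_blobs` and a K-means model with three clusters (both are illustrative choices). Only the inputs X are given to the model; no labels are used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Only inputs X are used; the generated labels are discarded on purpose.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# K-means groups the points into 3 clusters based on their structure alone.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one 2D center per discovered cluster
```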
         
 - Semi-Supervised Learning:
     - Is where we have input data X and only some of the data is labeled Y
     - Problems sit between supervised and unsupervised learning
     - Can use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new unseen data
     - Can use unsupervised learning techniques to discover and learn the structure in the input variables
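
The "predict, then feed back" idea above is often called pseudo-labeling. The following is a simplified sketch under assumed settings: synthetic data, a logistic regression base model, and a hypothetical split in which only the first 50 examples keep their labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Pretend only the first 50 examples are labeled; the rest are unlabeled.
X_labeled, y_labeled = X[:50], y[:50]
X_unlabeled = X[50:]

# Step 1: train a supervised model on the labeled part.
base_model = LogisticRegression(max_iter=1000)
base_model.fit(X_labeled, y_labeled)

# Step 2: predict labels for the unlabeled part (pseudo-labels).
pseudo_labels = base_model.predict(X_unlabeled)

# Step 3: feed labeled and pseudo-labeled data back in as training data.
X_combined = np.vstack([X_labeled, X_unlabeled])
y_combined = np.concatenate([y_labeled, pseudo_labels])
final_model = LogisticRegression(max_iter=1000)
final_model.fit(X_combined, y_combined)

# The final model can now be used on new, unseen data.
new_predictions = final_model.predict(X[:5])
```

In practice one would usually keep only the pseudo-labels the model is confident about; Scikit Learn's `SelfTrainingClassifier` in `sklearn.semi_supervised` automates a variant of this loop.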

#### Python libraries for machine learning
 - NumPy: a math library for working with N-dimensional arrays; it enables efficient and effective computation (Ex usage: arrays, dictionaries, functions, datatypes)
 - SciPy: a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, etc.; suited to high-performance computation
 - Matplotlib: provides 2D plotting, as well as 3D plotting
 - Pandas: a high-level library that provides high-performance, easy-to-use data structures; it has many functions for data importing, manipulation and analysis, and offers data structures and operations for manipulating numerical tables and time series
 - __Scikit Learn__: a collection of algorithms and tools for machine learning. It has most of the Classification, Regression and Clustering algorithms, is designed to work with the Python numerical and scientific libraries NumPy and SciPy, includes very good documentation, and requires less Python code. Most of the tasks that need to be done in a machine learning pipeline are already implemented in Scikit Learn, including pre-processing of data, feature selection, feature extraction, train/test splitting, defining the algorithms, fitting models, tuning parameters, prediction, evaluation and exporting the model
![picture_alt](https://image.ibb.co/fntDMp/image.png)
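
To make the pipeline steps listed for Scikit Learn concrete, here is a short sketch covering splitting, pre-processing, defining the algorithm, fitting, prediction and evaluation. The breast cancer dataset, `StandardScaler` and `LogisticRegression` are assumptions picked for illustration; feature selection, parameter tuning and model export are left out to keep it short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Train/test splitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Pre-processing and the algorithm chained into a single estimator
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Fitting the model, then predicting on held-out data
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluation
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.2f}")
```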



### Linear Regression

 - Simple Linear Regression
 - Multiple Linear Regression

#### Review Regression Metrics
 - Mean Squared Error (MSE)
 - Root Mean Squared Error (RMSE)
 - Mean Absolute Error (MAE)
 - R Square ($R^2$)
 - Mean Square Percentage Error (MSPE)
 - Mean Absolute Percentage Error (MAPE)
 - Root Mean Squared Logarithmic Error (RMSLE)
 
###### Mean Squared Error (MSE): 
\begin{equation*}
MSE = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2  
\end{equation*}
where $y_i$ is the actual (expected) output and $\hat{y}_i$ is the model's prediction
 - The simplest and most common metric for regression evaluation, but probably the least useful
 - Measures the average squared error of our predictions. For each point, it calculates the squared difference between the prediction and the target, and then averages those values
 - The higher this value, the worse the model is. It would be zero for a perfect model
 - Pros: 
     - Useful if we have unexpected values that we should care about
     - Highlights very high or very low values that we should pay attention to
 - Cons:
     - If we make a single very bad prediction, the squaring makes the error even larger, which may skew the metric towards overestimating how bad the model is. This is particularly problematic if we have noisy data (that is, data that for whatever reason is not entirely reliable)
     - Even a “perfect” model may have a high MSE in that situation, so it becomes hard to judge how well the model is performing. On the other hand, if all the errors are smaller than 1, then the opposite effect is felt: we may underestimate how bad the model is.
 - Note: if we want a constant prediction, the best one is the __mean value of the target values__. It can be found by setting the derivative of the total error with respect to that constant $c$ to zero: $\frac{d}{dc}\sum_{i=1}^N (y_i - c)^2 = -2\sum_{i=1}^N (y_i - c) = 0$, which gives $c = \frac{1}{N}\sum_{i=1}^N y_i$.
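
The points above can be checked numerically. This is a small NumPy sketch with made-up target and prediction values: it computes MSE from the formula, shows how one bad prediction inflates it, and searches a grid of constants to confirm that the mean of the targets gives the lowest MSE.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
print(mse(y_true, y_pred))          # 0.5625

# A single very bad prediction dominates the metric.
y_pred_outlier = np.array([2.5, 5.0, 3.0, 17.0])
print(mse(y_true, y_pred_outlier))  # 25.3125

# Best constant prediction: try many constants and keep the best one.
candidates = np.linspace(0.0, 10.0, 1001)
errors = [mse(y_true, np.full_like(y_true, c)) for c in candidates]
best_constant = candidates[int(np.argmin(errors))]
print(best_constant, y_true.mean())  # both are (approximately) 4.25
```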

###### Root Mean Squared Error (RMSE)
\begin{equation*}
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2} = \sqrt{MSE} 
\end{equation*}
 - The square root is introduced to make the scale of the errors the same as the scale of the targets
 - $MSE$ and $RMSE$ are similar in terms of their minimizers: every minimizer of $MSE$ is also a minimizer of $RMSE$ and vice versa, since the square root is a strictly increasing function. For example, if we have two sets of predictions, $A$ and $B$, and the $MSE$ of $A$ is greater than the $MSE$ of $B$, then we can be sure that the $RMSE$ of $A$ is greater than the $RMSE$ of $B$. It also works in the opposite direction.
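
A quick numerical check of the ordering argument, using two hypothetical prediction sets `pred_a` and `pred_b` (the values are invented for the example): whichever set has the larger MSE also has the larger RMSE.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def rmse(y_true, y_pred):
    # RMSE is the square root of MSE, so it is on the same scale as the targets.
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([3.0, 5.0, 2.0, 7.0])
pred_a = np.array([1.0, 5.0, 4.0, 9.0])
pred_b = np.array([2.5, 5.0, 3.0, 8.0])

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
rmse_a, rmse_b = rmse(y_true, pred_a), rmse(y_true, pred_b)

print(mse_a, mse_b)    # 3.0 0.5625
print(rmse_a, rmse_b)  # about 1.732 and exactly 0.75

# The ranking of the two prediction sets is the same under both metrics.
same_ranking = (mse_a > mse_b) == (rmse_a > rmse_b)
```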
 
###### Mean Absolute Error (MAE)
###### R Square ($R^2$)
###### Mean Square Percentage Error (MSPE)
###### Mean Absolute Percentage Error (MAPE)
###### Root Mean Squared Logarithmic Error (RMSLE)

 

### Logistic Regression
 - __Logistic Regression__ is a classification algorithm for categorical variables
![picture_alt](https://image.ibb.co/e5xsBz/image.png)
 - Logistic regression is analogous to linear regression, but tries to predict a categorical or discrete target field instead of a numeric one. That means predicting a binary variable like Yes/No, True/False, or anything that can be coded as 1 or 0
 - The independent variables in logistic regression should be continuous
 - If they are categorical, they should be dummy or indicator coded, which means we have to transform them to some numeric value
 - Common applications:
     - Predicting the probability of a person having a heart attack within a specified time period based on our knowledge of the person's age, sex and body mass index
     - Predicting the chance of mortality in an injured patient, or predicting whether a patient has a given disease such as diabetes, based on observed characteristics of that patient such as weight, height, blood pressure and results of various blood tests
     - Predicting a customer's propensity to purchase a product or halt a subscription
     - Predicting the probability of failure of a given process, system or product
     - Predicting the likelihood of a homeowner defaulting on a mortgage
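
The dummy (indicator) coding of categorical inputs mentioned above can be sketched with Pandas. The tiny `age`/`sex` table is a made-up example, and `pd.get_dummies` is just one common way to do this.

```python
import pandas as pd

# Hypothetical patient data with one categorical feature.
df = pd.DataFrame({
    "age": [34, 51, 46],
    "sex": ["F", "M", "F"],
})

# Replace the categorical column with 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["sex"], dtype=int)
print(encoded)
#    age  sex_F  sex_M
# 0   34      1      0
# 1   51      0      1
# 2   46      1      0
```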

#### When should we use logistic regression?
![picture_alt](https://image.ibb.co/e7BG4K/image.png)
 - If your data is binary: $0/1$, $Yes/No$, $True/False$, $+/-$
 - If you need probabilistic results. Logistic Regression returns a probability score between zero and one for a given sample of data, and we map the cases to a discrete class based on that probability
 - If your data is linearly separable or when you need a linear decision boundary. The decision boundary of logistic regression is a line, a plane, or a hyperplane. A classifier will classify all the points on one side of the decision boundary as belonging to one class, and all those on the other side as belonging to the other class. E.g., if we have just two features and do not apply any polynomial processing, we obtain an inequality like $\theta_0 + \theta_1x_1 + \theta_2x_2 > 0$, which describes a half-plane that is easy to plot. _Note:_ with logistic regression we can also achieve a complex decision boundary using polynomial processing; decision boundaries become clearer once you understand how logistic regression works
 - If you need to understand the impact of a feature. You can select the best features based on the statistical significance of the logistic regression model coefficients or parameters. That is, after finding the optimum parameters, a feature $x_1$ with a weight $\theta_1$ close to $0$ has a smaller effect on the prediction than features with large absolute weights. Indeed, it allows us to understand the impact an independent variable has on the dependent variable while controlling for the other independent variables.
![picture_alt](https://image.ibb.co/jwVb4K/image.png)
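
The sketch below ties these points together with Scikit Learn's `LogisticRegression`, assuming a synthetic two-feature dataset from `make_classification` and a 0.5 probability threshold. It reads off the probabilities, maps them to classes, and rebuilds the linear decision rule $\theta_0 + \theta_1x_1 + \theta_2x_2 > 0$ from the fitted coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with two features (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression()
model.fit(X, y)

# Probability of class 1 for each sample, always between 0 and 1.
proba = model.predict_proba(X)[:, 1]

# Map probabilities to discrete classes with a 0.5 threshold.
labels = (proba > 0.5).astype(int)

# Linear decision boundary: theta_0 + theta_1*x_1 + theta_2*x_2 > 0
theta_0 = model.intercept_[0]
theta_1, theta_2 = model.coef_[0]
scores = theta_0 + theta_1 * X[:, 0] + theta_2 * X[:, 1]
boundary_labels = (scores > 0).astype(int)

# Weights far from zero indicate features with a larger effect.
print(theta_1, theta_2)
```

Comparing weights this way is only meaningful when the features are on comparable scales, so standardizing the inputs first is a common precaution.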

#### Logistic regression vs. Linear Regression
Basis for comparison|Linear Regression|Logistic Regression
--------------------|-----------------|-------------------
Basic|The data is modelled using a straight line|The probability of an event is modelled through a linear function of a combination of predictor variables (the log-odds are linear in the predictors)
Linear relationship between dependent and independent variables|Is required|Not required
The independent variables|Could be correlated with each other (especially in multiple linear regression)|Should not be correlated with each other (no multicollinearity should exist)

