# Introduction to Scikit-Learn: Machine Learning with Python
This session will cover the basics of Scikit-Learn, a popular package containing a collection of tools for machine learning written in Python. See more at http://scikit-learn.org.

# 1. Introduction

<img style="float: right;" src="Images/1.png" width="100">

- Have you ever wondered how Amazon/Jumia suggest items for you to buy?  
 
- How Gmail filters your emails into spam and non-spam categories?  
 
- How Netflix predicts the shows of your liking?  

- How predictive text on your phone works?  (Let’s try this now)  

- Or how your phone recognises your face or voice (Siri, Alexa, Cortana)?    





# 2. What is Machine Learning? 


Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-learn” from training data and improve over time, without being explicitly programmed. 

Machine learning algorithms are able to detect patterns in data and learn from them, in order to make their own predictions.


While artificial intelligence (AI) and machine learning (ML) are often used interchangeably, they are two different concepts. AI is the broader concept – machines making decisions, learning new skills, and solving problems in a similar way to humans – whereas machine learning is a subset of AI that enables intelligent systems to autonomously learn new things from data. 


# 3. Applications of Machine Learning


<img style="float: centre;" src="Images/8.png" width="750">





# 4. Types of Machine Learning

### <u> Supervised Learning  </u> 


In supervised learning, the algorithm learns from a labeled dataset, where the response for each example is known. The labels act as a ‘supervisor’: they provide an answer key that the algorithm can use to evaluate its accuracy on the training data. The trained model is then used to predict values for future or unseen data. Use supervised learning when you know exactly what you want to predict (the target, or dependent variable) and have a set of independent variables whose influence on the target you want to understand. This is a task-driven technique.

<img style="float: centre;" src="Images/3.png" width="450">


##### Classification:    
A classification problem is one where the output variable is a category, such as “red” or “blue”, or “yes” or “no”.

- Logistic Regression (yes, I know it’s a confusing name: despite the word “regression”, it is a classification method)
- Decision Tree  
- Random Forest  
- K-Nearest Neighbors  
- Naive Bayes  
- Support Vector Machines  
- Artificial Neural Networks (ANN)  
- Ensemble Techniques (Bagging, Random Forest, AdaBoost, GBM, XGBoost) 
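
To make the idea of classification concrete, here is a minimal sketch using the first algorithm in the list, logistic regression. The dataset is synthetic and the split size and `random_state` values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset (illustrative only, not real data).
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=123)

# Hold out 25% of the rows so we can evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123
)

clf = LogisticRegression()
clf.fit(X_train, y_train)              # learn from the labeled training rows

predictions = clf.predict(X_test)      # predicted category (0 or 1) per test row
accuracy = clf.score(X_test, y_test)   # fraction of test rows predicted correctly
print(predictions[:5], accuracy)
```

Most of the other classifiers listed above (for example `DecisionTreeClassifier` or `KNeighborsClassifier`) follow the same `fit` / `predict` / `score` pattern.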

##### Regression: 
A regression problem is one where the output variable is a continuous numeric value, such as an amount in cedis or a weight.

- Simple Linear Regression 
- Multiple Linear Regression
- Lasso Regression
- Bayesian Linear Regression
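
As a rough illustration of regression, the sketch below fits a simple linear regression to made-up data. The relationship `price = 3 * hours + 5` and the variable names are assumptions invented for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up data: one input feature and a numeric target with a little noise.
hours = rng.uniform(0, 10, size=(50, 1))
price = 3.0 * hours[:, 0] + 5.0 + rng.normal(0, 0.1, size=50)

reg = LinearRegression()
reg.fit(hours, price)

slope = reg.coef_[0]               # should be close to 3
intercept = reg.intercept_         # should be close to 5
predicted = reg.predict([[4.0]])   # predicted price for 4 hours
print(slope, intercept, predicted)
```

Unlike a classifier, the model returns a real number rather than a category.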



### <u> Unsupervised Learning </u>

In unsupervised learning, models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data.


<img style="float: centre;" src="Images/4.png" width="450">   

##### Clustering:   
A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.

- K-means clustering  
- Hierarchical clustering  
- Fuzzy clustering  
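
Here is a minimal K-means sketch on synthetic data. The number of clusters (3) is an assumption that matches how the toy data was generated; with real data you usually have to choose it yourself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points drawn around 3 centres; the true labels are discarded
# because clustering does not use them.
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)    # cluster index (0, 1 or 2) for every point

print(labels[:10])
print(kmeans.cluster_centers_)    # one (x, y) centre per cluster
```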


##### Association:   
An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.

- Apriori  
- Eclat
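
Scikit-learn does not ship Apriori or Eclat, so the sketch below is plain Python. It is not an implementation of either algorithm; it only computes the two measures such rules are usually judged by, support and confidence, on a hypothetical list of shopping baskets.

```python
# Hypothetical transactions, each one a set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule "people who buy bread also tend to buy milk":
# bread appears in 4 of 5 baskets, and 3 of those 4 also contain milk.
print(support({"bread"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
```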



### <u> Semi-Supervised Learning </u>

Semi-supervised learning is a machine learning technique that combines supervised and unsupervised learning.  
- It uses a small amount of labelled data and a large amount of unlabelled data during training.


<img style="float: centre;" src="Images/5.png" width="450">
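
A small sketch of the idea, using scikit-learn's `LabelSpreading` as one possible semi-supervised method. Hiding all but 20 labels, and the `knn` kernel with 7 neighbours, are arbitrary assumptions for illustration. Scikit-learn marks unlabelled rows with `-1`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=123)

# Pretend we only know the labels of 20 rows; -1 means "unlabelled".
rng = np.random.default_rng(123)
y_partial = np.full_like(y_true, -1)
labelled_idx = rng.choice(len(y_true), size=20, replace=False)
y_partial[labelled_idx] = y_true[labelled_idx]

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

inferred = model.transduction_   # a label for every row, including unlabelled ones
print((y_partial == -1).sum(), "rows were unlabelled")
print("agreement with true labels:", (inferred == y_true).mean())
```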

### <u> Reinforcement Learning </u>


Reinforcement learning is a machine learning technique that uses a reward and punishment mechanism.
- Positive values (rewards) are assigned to desired actions to encourage the agent, and negative values (penalties) are assigned to undesirable actions.
- This method trains the agent to seek the maximum overall reward in order to reach an optimal solution.

<img style="float: centre;" src="Images/6.png" width="450">
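
Reinforcement learning is outside the scope of scikit-learn, so the toy sketch below is plain Python. The two-action environment and its rewards are hypothetical, and the epsilon-greedy rule is just one simple strategy, chosen here for brevity.

```python
import random

random.seed(0)

def reward(action):
    """Hypothetical environment: action 1 is desired (+1), action 0 is penalised (-1)."""
    return 1.0 if action == 1 else -1.0

values = [0.0, 0.0]   # the agent's running estimate of each action's reward
counts = [0, 0]       # how many times each action has been tried
epsilon = 0.1         # probability of exploring a random action

for step in range(500):
    if random.random() < epsilon:
        action = random.randrange(2)          # explore: pick a random action
    else:
        action = values.index(max(values))    # exploit: pick the best estimate so far
    r = reward(action)
    counts[action] += 1
    # Incremental mean of the rewards seen so far for this action.
    values[action] += (r - values[action]) / counts[action]

best_action = values.index(max(values))
print(best_action, values, counts)
```

After a few steps the agent's estimate for action 1 is higher than for action 0, so it mostly repeats the rewarded action.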

# 5. Data Types

Machine learning is about creating models from data: for that reason, we'll start by discussing how data can be represented in order to be understood by the computer. Data can come in many forms, but machine learning models rely on four primary data types.

<img style="float: centre;" src="Images/7.png" width="650">

https://mldoodles.com/statistical-data-types-used-in-machine-learning/
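
The sketch below shows one way these data types might be represented in pandas. It assumes the four types are nominal, ordinal, discrete, and continuous (the usual statistical breakdown); the column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Accra", "Kumasi", "Accra"],          # nominal: categories with no order
    "shirt_size": ["small", "large", "medium"],    # ordinal: categories with an order
    "children": [0, 2, 1],                         # discrete: whole-number counts
    "weight_kg": [61.5, 72.0, 68.3],               # continuous: any real value
})

# Tell pandas which columns are categorical, and which category order applies.
df["city"] = df["city"].astype("category")
df["shirt_size"] = pd.Categorical(
    df["shirt_size"], categories=["small", "medium", "large"], ordered=True
)

print(df.dtypes)
print(df["shirt_size"].max())   # ordering makes comparisons like max() meaningful
```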

# 6. Machine Learning Lifecycle 


<img style="float: left;" src="Images/2.png" width="350">

<div style="text-align: justify"> 
    
    - Data Gathering
    
    - Data Preparation
    
    - Data Wrangling
    
    - Data Analysis
    
    - Train Model
    
    - Test Model
    
    - Deployment
</div>
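
A compressed sketch of how the middle steps of this lifecycle might look in scikit-learn, using the small iris dataset bundled with the library. The scaler, the K-nearest-neighbours model, and the 70/30 split are assumptions picked for illustration; deployment is left out.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data gathering: load a dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Data preparation: keep 30% of the rows aside for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train model: scale the features, then fit a K-nearest-neighbours classifier.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Test model: accuracy on rows the model has never seen.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```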


# 7. Scikit-Learn

Open source machine learning libraries offer collections of pre-made models and components that developers can use to build their own applications, instead of having to code from scratch. They are free, flexible, and can be customized to meet specific needs.

Some of the most popular open-source libraries for machine learning include:

- Scikit-learn  
- PyTorch  
- TensorFlow  
- Keras (runs on TensorFlow)  
- NLTK  

### Scikit-learn  

Scikit-learn is a popular Python library and a great option for those who are just starting out with machine learning. It is built on NumPy, SciPy, and Matplotlib. The sklearn library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. Find the documentation at http://scikit-learn.org. 
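
Dimensionality reduction is the one tool family in that list not discussed earlier in this session, so here is a minimal sketch using PCA on the bundled iris dataset. Reducing to 2 components is an arbitrary choice, handy for plotting.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)    # 150 rows, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)          # the same 150 rows, now described by 2 features

print(X.shape, "->", X_2d.shape)
print(pca.explained_variance_ratio_)  # share of the variance kept by each component
```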

In [7]:
# Check which libraries are installed
!pip list

Package                       Version
----------------------------- --------------------
alabaster                     0.7.12
anaconda-client               1.11.0
anaconda-navigator            2.3.1
anaconda-project              0.11.1
anyio                         3.5.0
appdirs                       1.4.4
applaunchservices             0.3.0
appnope                       0.1.2
appscript                     1.1.2
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.2
astroid                       2.11.7
astropy                       5.1
atomicwrites                  1.4.0
attrs                         21.4.0
Automat                       20.2.0
autopep8                      1.6.0
Babel                         2.9.1
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile            1.0
backports.weakref             1.0.post1
bcrypt                        3.2.0
beautifulsoup4                4.11.1
bi

pyzmq                         23.2.0
QDarkStyle                    3.0.2
qstylizer                     0.1.10
QtAwesome                     1.0.3
qtconsole                     5.3.2
QtPy                          2.2.0
queuelib                      1.5.0
regex                         2022.7.9
requests                      2.28.1
requests-file                 1.5.1
rope                          0.22.0
Rtree                         0.9.7
ruamel-yaml-conda             0.15.100
s3transfer                    0.6.0
scikit-image                  0.19.2
scikit-learn                  1.0.2
scikit-learn-intelex          2021.20221004.121333
scipy                         1.9.1
Scrapy                        2.6.2
seaborn                       0.11.2
Send2Trash                    1.8.0
service-identity              18.1.0
setuptools                    63.4.1
sip                           6.6.2
six                           1.16.0
smart-open                    5.2.1
sniffio                       1.2.

In [1]:
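# Generate a small synthetic classification dataset with scikit-learn

from sklearn.datasets import make_classification
import pandas as pd

# 10 samples, 4 features, 2 classes; random_state makes the result reproducible
X, y = make_classification(n_samples=10, n_features=4, n_classes=2, random_state=123)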