# A Quick Machine Learning Modelling Tutorial with Python and Scikit-Learn

#### Resources
- https://github.com/mrdbourke/zero-to-mastery-ml/blob/81492352d12d7a52caef57bba7744cbdc34af33f/section-2-data-science-and-ml-tools/introduction-to-scikit-learn.ipynb

## Overview
[Scikit-Learn](https://scikit-learn.org) (`sklearn`) is an open-source Python machine learning library built on NumPy, SciPy, and Matplotlib.
- provides ready-made tools for common ML tasks: data preprocessing, model fitting and prediction, evaluation, and model selection

```python
import pandas as pd
from sklearn.model_selection import train_test_split
```

### End-to-end Scikit-learn Workflow
> **Note:** this notebook focuses on supervised learning.

 1. Prepare data (cleaning, split into features & labels, split into training and testing, etc.)
 2. Choose the right model for the problem (e.g. linear regression for regression, a random forest classifier for classification, etc.)
 3. Fit the model to the data and use it to make predictions
 4. Evaluate the model (and iterate!)
 5. Prepare for deployment & sharing
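The five steps above can be sketched end to end in a few lines. This is a minimal, self-contained sketch that assumes a synthetic dataset from `make_classification` (in place of real data) and a `RandomForestClassifier` as an illustrative model; neither choice is prescribed by the workflow itself. Step 5 is reduced here to saving and reloading the fitted model with `joblib`.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Prepare data: a synthetic binary-classification dataset stands in for real data (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Choose a model (a random forest is an illustrative choice, not a recommendation)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 3. Fit the model on the training data, then predict labels for the unseen test data
model.fit(X_train, y_train)
y_preds = model.predict(X_test)

# 4. Evaluate: fraction of test labels predicted correctly
test_accuracy = accuracy_score(y_test, y_preds)
print(f"Test accuracy: {test_accuracy:.2f}")

# 5. Prepare for sharing: persist the fitted model to disk, then reload it
model_path = os.path.join(tempfile.gettempdir(), "rf_model.joblib")
joblib.dump(model, model_path)
loaded_model = joblib.load(model_path)

# The reloaded model should reproduce the original predictions exactly
same_preds = (loaded_model.predict(X_test) == y_preds).all()
print(f"Reloaded model gives identical predictions: {same_preds}")
```

In practice step 4 usually loops back to step 2 or 3 (trying other models or hyperparameters) before anything is saved.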

## 1. Prepare Data

The main data preparation steps are:
- splitting data columns into features & labels (conventionally named `X` and `y`)
- splitting the data records into training, validation, and test subsets
- filling (aka imputing) or dropping missing values
- converting non-numerical data into a numerical format (**feature-encoding**)
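The last two bullets (imputing and encoding) can be handled with scikit-learn's `SimpleImputer` and `OneHotEncoder`, applied per column through a `ColumnTransformer`. The sketch below uses a small, hypothetical car-sales DataFrame (the `make`, `odometer` and `price` columns are made up for illustration) and assumes constant-filling for the categorical column and mean-filling for the numerical one; other imputation strategies are equally valid.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column, one numerical column, both with a missing value
df = pd.DataFrame({
    "make": ["Honda", "Toyota", np.nan, "Honda"],
    "odometer": [35000.0, np.nan, 120000.0, 80000.0],
    "price": [9000, 7500, 4000, 6000],
})

# Split columns into features (X) and label (y)
X = df.drop("price", axis=1)
y = df["price"]

# Categorical column: fill missing values with the string "missing", then one-hot encode
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
# Numerical column: fill missing values with the column mean
numerical = SimpleImputer(strategy="mean")

# Route each column to its transformer; sparse_threshold=0 always returns a dense array
preprocessor = ColumnTransformer(
    [("cat", categorical, ["make"]), ("num", numerical, ["odometer"])],
    sparse_threshold=0,
)

# Output columns: one-hot "make" (Honda, Toyota, missing), then imputed "odometer"
X_transformed = preprocessor.fit_transform(X)
print(X_transformed)
```

On real data, fit the preprocessor on the training set only and call `transform` on the test set, so statistics such as the imputation mean do not leak from the test data into training.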

```python
# Load the data directly from GitHub (read_csv needs the raw-file URL, not the HTML page)
# Source: https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/data/heart-disease.csv
heart_disease = pd.read_csv("https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/data/heart-disease.csv")

# Features: every column except the label; label: the 'target' column
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
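The cell above only creates training and test sets. When a validation set is also needed, one common approach is to call `train_test_split` twice. This sketch assumes a hypothetical 100-row DataFrame in place of `heart_disease` and a 60/20/20 train/validation/test ratio; `random_state` is fixed only so the split is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for heart_disease: 100 rows, 3 random features, binary target
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
data["target"] = rng.integers(0, 2, size=100)

X = data.drop("target", axis=1)
y = data["target"]

# First split: hold out 20% of all rows as the test set
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Second split: 25% of the remaining 80% gives a validation set of 20% of the original rows
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set is used to compare models and tune hyperparameters, so the test set can stay untouched until the final evaluation.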