# Introduction to Python for Machine Learning  

**Setup: Click the 'Kernels' or 'Detecting Kernels' button at the top right of this notebook. Click 'Python Environments', then select our 'storehouse' environment. You will need to do this for every Jupyter Notebook you open throughout this course.**
  
This notebook provides a basic introduction to Python for Machine Learning. We will cover the following topics:  
  
1. Importing libraries  
2. Loading a dataset  
3. Exploratory data analysis  
4. Data preprocessing  
5. Training a simple machine learning model  
6. Evaluating the model  

Note: Don't worry too much if you don't fully understand what's happening in the cells below. We will explore these concepts in more depth in future modules. For now, use this as a brief, hands-on introduction to the high-level concepts of the machine learning process.

In [None]:
# Import the Python libraries used in this notebook

import pandas as pd  
import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.neighbors import KNeighborsClassifier  
from sklearn.metrics import accuracy_score  

## Loading a Dataset  
  
We will use the Iris dataset, which is available in the `sklearn.datasets` module.  

In [None]:
from sklearn.datasets import load_iris  
iris = load_iris()  
# Save the dataset features to the variable 'X'
X = iris.data  
# Save the target outcome to the target variable 'y'
y = iris.target  
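
Before moving on, it can help to check what was actually loaded. The cell below is a minimal sketch (not part of the original notebook) that prints the shape of the features and targets, along with the feature and class names that `load_iris()` provides.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Shape of the feature matrix: (number of samples, number of features)
print(X.shape)
# Shape of the target vector: one integer label per sample
print(y.shape)
# Names of the feature columns
print(iris.feature_names)
# Class names that the integer labels 0, 1, 2 stand for
print(list(iris.target_names))
```

Each row of `X` is one flower, and the matching entry in `y` is its species encoded as an integer.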

## Exploratory Data Analysis  
  
Let's take a look at the dataset using Pandas.  

In [None]:
# Load the data into a pandas DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)  
# Add the target variable as the last column
df['species'] = y  
df.head()  
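
Beyond the first few rows, a couple of extra pandas calls give a quick overview of the data. The following is an illustrative sketch: the `species_name` column is a hypothetical helper added here for readability and does not appear elsewhere in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Hypothetical helper column: translate the integer labels into readable names
label_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
df['species_name'] = df['species'].map(label_to_name)

# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
summary = df.describe()
print(summary)

# Number of samples in each class
class_counts = df['species_name'].value_counts()
print(class_counts)
```

Checking the class counts early is useful because a heavily imbalanced dataset would make plain accuracy a misleading metric later on.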
 

## Data Preprocessing  
  
We will split the dataset into training and testing sets.  

In [None]:
# Split the features and targets into training and testing sets (30% of the samples are held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  
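
To see what the split produces, you can inspect the sizes of the resulting arrays. The sketch below also passes `stratify=y`, an optional argument that is not used in the cell above; it is shown here only as one possible way to keep the class proportions the same in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y is an assumption added for illustration; the original split omits it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Number of samples and features in each set
print(X_train.shape, X_test.shape)
# Number of samples per class in the training and testing targets
print(np.bincount(y_train), np.bincount(y_test))
```

Fixing `random_state` makes the split reproducible, so rerunning the cell gives the same training and testing samples.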

## Training a Simple Machine Learning Model  
  
We will train a simple K-Nearest Neighbors classifier on the training set.  
For more information on the K-Nearest Neighbors algorithm, see [this page](https://scikit-learn.org/stable/modules/neighbors.html#classification). 

In [None]:
model = KNeighborsClassifier(n_neighbors=3)  
model.fit(X_train, y_train)  
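
Once fitted, the model can classify a flower it has not seen. The sketch below repeats the training steps so that it runs on its own, then predicts the species for one set of measurements; the values in `new_flower` are illustrative and chosen only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Illustrative measurements in cm:
# sepal length, sepal width, petal length, petal width
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict returns an integer label; look up the matching class name
predicted_label = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_label]
print(predicted_name)

# Fraction of the 3 nearest neighbors that belong to each class
probabilities = model.predict_proba(new_flower)
print(probabilities)
```

With `n_neighbors=3`, the prediction is a majority vote among the three training samples closest to the new point.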

## Evaluating the Model  
  
We will evaluate the model using the accuracy score, which is the fraction of test samples the model labels correctly.  

In [None]:
# Get predictions on the feature test set
y_pred = model.predict(X_test)  
# Calculate the accuracy by comparing the true targets (y_test) with the predicted targets (y_pred)
accuracy = accuracy_score(y_test, y_pred)  
print(f"Accuracy: {np.round(accuracy * 100, 2)}%")  
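
Accuracy is a single number, so it does not show which classes the model confuses. As a possible next step, the sketch below uses `confusion_matrix` and `classification_report` from `sklearn.metrics`; neither is used in the original notebook, and the setup is repeated so that the cell runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score for each class
report = classification_report(
    y_test, y_pred, target_names=list(iris.target_names)
)
print(report)
```

The diagonal of the confusion matrix counts the correct predictions, so dividing its sum by the total number of test samples gives the same accuracy as `accuracy_score`.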