<a href="https://colab.research.google.com/github/Apoak/Deep-Learning-Projects/blob/main/Binary_Linear_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Lab 1.2: Binary Linear Classifier

In this lab you will try making a binary linear classifier using the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/).

You will need to install the packages ``scikit-learn`` (imported as ``sklearn``), ``palmerpenguins``, and ``mlxtend``.  In the following code block, the ``!`` indicates a shell command.

In [None]:
!pip install scikit-learn palmerpenguins mlxtend

In [None]:
import sklearn.linear_model  # import the submodule explicitly so sklearn.linear_model.LogisticRegression resolves
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions

The dataset is loaded as a [Pandas dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).  

In [None]:
df = load_penguins()
df.head()

For simplicity we will drop any rows with missing values (encoded as NaNs).

In [None]:
df.dropna(inplace=True)
df.head()

Let's select just the Adelie and Chinstrap penguins.

In [None]:
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

Now we will grab the flipper length and bill length to be the features (stored in ``X``) and the species as the labels (stored in ``y``).

In [None]:
X = df[['flipper_length_mm','bill_length_mm']].values
y = df['species'].map({'Adelie':0,'Chinstrap':1}).values

## Exercises

1. Fit a binary linear classifier using scikit-learn (see ``sklearn.linear_model.LogisticRegression``).

Plot the resulting classifier using ``plot_decision_regions(X, y, clf=model)``.


In [None]:
model = sklearn.linear_model.LogisticRegression()
model.fit(X,y)
plot_decision_regions(X, y, clf=model)

2. Print out the coefficients of the line (``model.coef_``).  Interpret these values (in terms of the direction of the line and also what they tell us about how the classifier operates).

In [None]:
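model.coef_
# output: array([[-0.15845623,  1.17936568]])
# The coefficient vector is perpendicular to the decision line.
# The weight on bill length (the y-axis feature) is positive and much larger in magnitude,
# so each extra mm of bill length pushes the prediction strongly toward Chinstrap (label 1).
# The weight on flipper length (the x-axis feature) is small and negative,
# so longer flippers push the prediction only slightly toward Adelie (label 0).
# Bill length therefore matters most for this classifier, and the decision line is close to horizontal.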
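
One way to read these coefficients is to convert them into the slope of the decision line. The sketch below uses only the two printed coefficients. The intercept `b` is a hypothetical placeholder, since the notebook does not print `model.intercept_`, so only the slope (not the position of the line) is grounded in the output above.

```python
import math

# Coefficients copied from the printed model.coef_ output.
w_flipper, w_bill = -0.15845623, 1.17936568

# Hypothetical intercept: model.intercept_ is not printed in the notebook,
# so this value is a placeholder for illustration only.
b = -20.0

# The decision line is w_flipper * x + w_bill * y + b = 0,
# with x = flipper length and y = bill length. Solving for y:
slope = -w_flipper / w_bill
offset = -b / w_bill


def chinstrap_probability(flipper_mm, bill_mm):
    """Logistic-regression probability of label 1 (Chinstrap)."""
    z = w_flipper * flipper_mm + w_bill * bill_mm + b
    return 1.0 / (1.0 + math.exp(-z))


print(f"slope of decision line: {slope:.4f} mm of bill per mm of flipper")
```

The small positive slope matches the nearly horizontal boundary in the plot: points above the line (longer bills) are assigned to Chinstrap, and points below it to Adelie.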

3. Calculate and print out the accuracy of the classifier using the `.score` function.  Interpret this value.

In [None]:
model.score(X, y)
# 0.9579439252336449
# The score is the fraction of samples the classifier labels correctly (about 95.8%).
# It is measured on the same data the model was fit on, so it is a training accuracy.
# It is below 100% because a few points fall on the wrong side of the decision line.
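
As a rough check on what the score means, accuracy is the number of correct predictions divided by the number of samples. The sketch below assumes 214 rows remain after dropping missing values and keeping only Adelie and Chinstrap. That count is not printed in the notebook, but it is consistent with the score above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Tiny hypothetical example: 4 of 5 labels match.
print(accuracy([0, 0, 1, 1, 1], [0, 0, 1, 0, 1]))

# Assumed sample count (not printed in the notebook).
n_samples = 214
reported_score = 0.9579439252336449

# Recover how many samples the reported score corresponds to.
n_correct = round(reported_score * n_samples)
n_wrong = n_samples - n_correct
print(n_correct, n_wrong)
```

Under that assumption, the reported score corresponds to 205 correct predictions and 9 misclassified penguins.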

4. Try different combinations of features and print out the accuracy for each one.  Interpret your results.
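
A systematic way to try combinations is to enumerate every pair of numeric columns. The sketch below only generates the pairs, using the four column names that appear in this notebook. Fitting and scoring each pair is not shown.

```python
from itertools import combinations

# Numeric columns used somewhere in this notebook.
numeric_features = [
    'bill_length_mm',
    'bill_depth_mm',
    'flipper_length_mm',
    'body_mass_g',
]

# All unordered pairs of features: 4 choose 2 = 6 combinations.
feature_pairs = list(combinations(numeric_features, 2))

for pair in feature_pairs:
    # In the notebook, df[list(pair)].values would give the feature matrix.
    print(pair)
```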

In [None]:
# Features: flipper length and bill depth (both in mm)
Xx = df[['flipper_length_mm','bill_depth_mm']].values
yy = df['species'].map({'Adelie':0,'Chinstrap':1}).values

model.fit(Xx,yy)  # refits the same model object on the new features
plot_decision_regions(Xx, yy, clf=model)
model.coef_
# array([[ 0.14777008, -0.28166971]])
model.score(Xx, yy)
# 0.7383177570093458





In [None]:
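# Features: body mass (g) and flipper length (mm)
Xxx = df[['body_mass_g', 'flipper_length_mm']].values
yyy = df['species'].map({'Adelie':0,'Chinstrap':1}).values

model.fit(Xxx,yyy)  # refits the same model object on the new features
plot_decision_regions(Xxx, yyy, clf=model)
model.coef_
# array([[ 0.14777008, -0.28166971]])
model.score(Xxx, yyy)
# 0.7383177570093458
# Both recorded outputs are identical to those of the previous cell, so they may be stale; re-run this cell to confirm.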

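Swapping bill length for bill depth (flipper length and bill depth) lowered the accuracy from about 0.96 to about 0.74. The combination of body mass and flipper length was also less accurate than the original pair. Of the combinations tried, only the one that includes bill length separates Adelie from Chinstrap well, which agrees with the large bill-length coefficient found in Exercise 2.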
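
One caveat when comparing coefficients across these feature pairs is that body mass is in grams while the other features are in millimetres, so the raw weights are on different scales. A common remedy, which the lab does not use, is to standardize each feature before fitting. The sketch below shows z-score standardization on a few hypothetical measurements, not values taken from the dataset.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements for illustration only.
body_mass_g = [3750.0, 3800.0, 3250.0, 3450.0]
flipper_length_mm = [181.0, 186.0, 195.0, 193.0]

z_mass = standardize(body_mass_g)
z_flipper = standardize(flipper_length_mm)

# After standardization both features are unitless and on a comparable scale.
print([round(z, 2) for z in z_mass])
print([round(z, 2) for z in z_flipper])
```

With standardized inputs, the magnitude of each coefficient reflects the effect of a one-standard-deviation change in that feature, which makes the weights easier to compare.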