This notebook builds a model to classify the species of Iris flowers.

General workflow:
- Import relevant libraries/modules
- Import data
- Transform data:
    - train/test split
- Run model
- Test model

In [11]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

---

In [2]:
data = pd.read_csv("iris.csv")

In [3]:
print(data.head())
print(data.info())
print(data.describe())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.00000

In [4]:
data.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


---

The dependent variable to be predicted in this dataset is categorical, so this is a classification problem. Logistic regression is used here.

In [5]:
# label encode the target variable (not necessary)
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data.species = le.fit_transform(data.species)

# create a dictionary to look up the label-encoded values later
mapping = dict(zip(list(le.classes_),range(len(le.classes_))))

mapping

{'setosa': 0, 'versicolor': 1, 'virginica': 2}
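
To turn encoded integers back into species names later, the mapping can be inverted. This is a minimal sketch that assumes the `mapping` shown above. The names `inverse_mapping` and `encoded_predictions` are illustrative and not part of the original notebook. When the fitted encoder is still available, `le.inverse_transform` does the same job.

```python
# Mapping as produced by the label-encoding cell above
mapping = {'setosa': 0, 'versicolor': 1, 'virginica': 2}

# Invert it so integer codes can be turned back into species names
inverse_mapping = {code: name for name, code in mapping.items()}

# Hypothetical encoded predictions, for illustration only
encoded_predictions = [0, 2, 1, 1]
decoded = [inverse_mapping[code] for code in encoded_predictions]
print(decoded)
```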

Separate the data into train and test sets

In [17]:
# scikit-learn classifiers encode string target labels automatically, so the label encoding above is optional
# Split the dependent and independent variables
x = data.iloc[:,:-1]
y = data.iloc[:,-1]

In [18]:
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=42)
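
The split above is purely random, so the three species need not be equally represented in the test set. If balanced classes are wanted, `train_test_split` accepts a `stratify` argument. This is a minimal sketch that assumes scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for `iris.csv`. The name `test_counts` is illustrative.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for iris.csv
x, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many test rows belong to each class
test_counts = Counter(y_test.tolist())
print(test_counts)
```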

In [19]:
model = LogisticRegression(solver='lbfgs', multi_class='auto')
model.fit(x_train, y_train)
# the solver and multi_class arguments are set explicitly only to silence FutureWarning messages

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [20]:
# curious about the model coefficients?
print(model.coef_)

[[-0.39673082  0.95339799 -2.37667929 -1.01215199]
 [ 0.51231342 -0.24809    -0.21410119 -0.76297273]
 [-0.1155826  -0.705308    2.59078048  1.77512472]]
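
The coefficient matrix has one row per class and one column per feature. As a rough illustration of how these numbers become predictions, the sketch below recomputes class probabilities by hand and compares them with `predict_proba`. It assumes the bundled `load_iris` data in place of `iris.csv`, and it assumes the multinomial (softmax) formulation that `lbfgs` uses by default for more than two classes. `max_iter=1000` is chosen here only to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the bundled Iris data stands in for iris.csv
x, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(x, y)

# One linear score per class: x @ coef_.T + intercept_
scores = x @ model.coef_.T + model.intercept_

# Softmax turns the scores into probabilities that sum to 1 per row
scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
manual_proba = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Compare the hand-computed probabilities with scikit-learn's
matches = np.allclose(manual_proba, model.predict_proba(x))
print(matches)
```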


In [9]:
predictions = model.predict(x_test)

In [13]:
# check accuracy of the model
print(classification_report(y_test,predictions))
print('The accuracy score of the model is: ',accuracy_score(y_test, predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

The accuracy score of the model is:  1.0
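
For reference, the precision and recall columns in the report can be computed by hand. The sketch below uses toy labels that are not taken from the notebook, and `precision_recall` is a hypothetical helper written for this illustration. The "support" column corresponds to the number of true samples per class.

```python
# Toy labels for illustration only (not the notebook's test set)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]


def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for a single class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # true + false positives
    actual = sum(t == label for t in y_true)     # support for this class
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


for label in sorted(set(y_true)):
    p, r = precision_recall(y_true, y_pred, label)
    print(label, round(p, 2), round(r, 2))
```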


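The model scores perfectly on the test set. Is this overfitting, or just an easy dataset with a small (30-row) test set? A single train/test split cannot answer that; cross-validation would give a more reliable estimate.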
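
One way to probe that question is k-fold cross-validation, which scores the model on several different held-out folds instead of a single 30-row test set. This is a minimal sketch that assumes the bundled `load_iris` data in place of `iris.csv`. The choices `cv=5` and `max_iter=1000` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: the bundled Iris data stands in for iris.csv
x, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)

# Accuracy on each of 5 held-out folds
scores = cross_val_score(model, x, y, cv=5)
print(scores)
print(scores.mean())
```

If the fold scores are consistently high, the perfect test score is more likely a sign of an easy, well-separated dataset than of overfitting.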