# Multinomial logistic regression  
For supervised classification of 3 classes based on 4 properties

## Data set
[UCI Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris)

Contains 3 classes of Iris flowers. Each record holds four measurements and a class label:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      - Iris Setosa
      - Iris Versicolour
      - Iris Virginica

Example data:  
```
5.1,3.5,1.4,0.2,Iris-setosa  
4.9,3.0,1.4,0.2,Iris-setosa  
4.7,3.2,1.3,0.2,Iris-setosa  
6.0,2.2,4.0,1.0,Iris-versicolor  
6.1,2.9,4.7,1.4,Iris-versicolor  
5.6,2.9,3.6,1.3,Iris-versicolor  
6.7,3.1,4.4,1.4,Iris-versicolor  
6.4,2.7,5.3,1.9,Iris-virginica  
6.8,3.0,5.5,2.1,Iris-virginica  
5.7,2.5,5.0,2.0,Iris-virginica  
5.8,2.8,5.1,2.4,Iris-virginica  
```

## Reading a file of comma-separated values
- A custom reader can be written
- The Pandas package can be installed, imported and used
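The first option, a custom reader, could look roughly like the sketch below. It uses only the standard-library `csv` module; the function name `read_iris` and the inline `SAMPLE` text are hypothetical stand-ins for illustration, not part of the original notebook.

```python
import csv
import io

# Hypothetical sample standing in for the real data file.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
6.0,2.2,4.0,1.0,Iris-versicolor
6.4,2.7,5.3,1.9,Iris-virginica
"""

def read_iris(stream):
    """Parse rows made of four floats followed by a class label."""
    features, labels = [], []
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        *values, label = row
        features.append([float(v) for v in values])
        labels.append(label.strip())
    return features, labels

features, labels = read_iris(io.StringIO(SAMPLE))
print(features[0], labels[0])
```

To read from disk, the same function can be passed an open file object, for example `open(path, newline="")`.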

### Installing Pandas

In [3]:
import sys
print(sys.executable)

c:\python\python37\python.exe


In [4]:
!{sys.executable} -m pip install pandas

Collecting pandas
  Using cached pandas-1.0.1-cp37-cp37m-win_amd64.whl (9.0 MB)
Collecting pytz>=2017.2
  Using cached pytz-2019.3-py2.py3-none-any.whl (509 kB)
Installing collected packages: pytz, pandas
Successfully installed pandas-1.0.1 pytz-2019.3


## Importing required modules

In [5]:
import numpy as np
import pandas as pd

from sklearn import linear_model
from sklearn import metrics
from sklearn.model_selection import train_test_split

## Reading and formatting data

In [None]:
col_names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']

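# Set the data file path before running!
# A raw string keeps the backslashes in the Windows path from being read as escape sequences.
iris = pd.read_csv(r"..\Data\iris.data", header=None, names=col_names)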

## Create train and test data sets

In [None]:
attribute_cols = ['sepal length', 'sepal width', 'petal length', 'petal width']
X = iris[attribute_cols]
y = iris.label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
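The `stratify=y` argument keeps the class proportions equal in both splits. The sketch below checks this, assuming scikit-learn's bundled copy of the data (`load_iris`, where the labels are the integers 0, 1, 2 rather than class names) instead of the CSV file, and a fixed `random_state` added only to make the run repeatable.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled copy of the data set: 150 rows, 50 per class (assumption: used
# here in place of the CSV file so the example is self-contained).
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Count how many samples of each class ended up in each split.
train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

With 50 samples per class and `test_size=0.2`, each class contributes 40 rows to the training set and 10 to the test set.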

## Create and train the classifier

In [None]:
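classifier = linear_model.LogisticRegression(solver='lbfgs', multi_class='multinomial')
# Note: fitting on the full data set (X, y) includes the test rows, so the
# classification accuracy reported below is optimistic. Fit on X_train, y_train
# for a true held-out estimate.
classifier.fit(X, y)
print("Learning accuracy: {0:.2f}%".format(classifier.score(X, y) * 100))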
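The cell above fits the classifier on the full data set (`X`, `y`), so the rows later used for testing have already been seen during training. A minimal sketch of the held-out variant follows. It assumes scikit-learn's bundled `load_iris` data in place of the CSV file, a fixed `random_state`, and a raised `max_iter` to avoid convergence warnings. The `multi_class` argument is omitted on the assumption that a recent scikit-learn version is installed, where the `lbfgs` solver fits a multinomial model by default for multi-class targets.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit on the training rows only, so the test rows stay unseen.
classifier = LogisticRegression(solver="lbfgs", max_iter=1000)
classifier.fit(X_train, y_train)

train_acc = classifier.score(X_train, y_train)
test_acc = classifier.score(X_test, y_test)
print("Learning accuracy: {0:.2f}%".format(train_acc * 100))
print("Classification accuracy: {0:.2f}%".format(test_acc * 100))
```

The exact percentages depend on the random split, so they will generally differ from the figures recorded in this notebook.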

## Create a prediction and evaluate the classifier

In [10]:
y_pred = classifier.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Classification accuracy: {0:.2f}%".format(accuracy * 100))

Learning accuracy: 97.33%
Classification accuracy: 96.67%
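A single accuracy figure does not show which classes are confused with each other. One possible follow-up is a confusion matrix, sketched below under the same assumptions as a self-contained run: the bundled `load_iris` data instead of the CSV file, a fixed `random_state`, a raised `max_iter`, and a classifier fitted on the training split only.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

classifier = LogisticRegression(solver="lbfgs", max_iter=1000)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

Off-diagonal entries count misclassified test samples. Because the split is stratified, every row of the matrix sums to 10.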
