# Best Accuracy Model for Deployment (Final Exam Assignment)

## Read Me!


> Final exam assignment: find the best model for the dataset used by the website below.

---


▶ Visit the [Website](https://muhammadkurniasani2342.pythonanywhere.com/)


**❗ Rules**


* 👉  Download the [dataset](https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset)
* 👉  Run *preprocessing*
* 👉  *Split* the data into training data and test data
* 👉  Compute the *accuracy* of each of the following models:
    * Gaussian Naive Bayes
    * Linear Regression
    * Random Forest
    * Decision Tree
    * KNN
* 👉  Pick the model with the best *accuracy*
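
The rules above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual run: it uses synthetic data from `make_classification` as a stand-in for the Dry Bean features, and it assumes `LogisticRegression` as the linear model, because plain linear regression is a regressor and does not produce class labels that accuracy can score directly. The names `candidates`, `accuracies`, and `best_model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Dry Bean data (assumption: 16 numeric features).
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Candidate models; LogisticRegression stands in for the linear model (assumption).
candidates = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

accuracies = {}
for name, model in candidates.items():
    # The pipeline fits the scaler on the training split only.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, pipeline.predict(X_test))

# Report the models from best to worst and keep the winner's name.
best_model = max(accuracies, key=accuracies.get)
for name, acc in sorted(accuracies.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
print("Best:", best_model)
```

Swapping the synthetic `X, y` for the real feature matrix and encoded labels should give the comparison the assignment asks for; the ranking on synthetic data says nothing about the ranking on the bean data.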





### Data Description


1.   Data Set Name: Dry Bean Dataset

2.  Abstract:
    * Images of 13,611 grains
    * 7 different registered varieties of dry beans
    * Taken with a high-resolution camera
    * A total of 16 features:
        * 12 dimensions
        * 4 shape forms

3.  Source:
    * Murat KOKLU
        * Faculty of Technology,
        * Selcuk University,
        * TURKEY.
        * ORCID : 0000-0002-2737-2360
        * mkoklu@selcuk.edu.tr

    *  Ilker Ali OZKAN
        *  Faculty of Technology,
        *  Selcuk University,
        *  TURKEY.
        *  ORCID : 0000-0002-5715-1040
        *  ilkerozkan@selcuk.edu.tr

4.  Data Type : Multivariate
5.  Task : Classification
6.  Attribute Type:
    *  Categorical
    *  Integer
    *  Real
7.  Area : CS / Engineering
8.  Format Type : Matrix
    *   Does your data set contain missing values? ***No***
    *   Number of Instances (records in your data set): ***13611***
    *   Number of Attributes (fields within each record): ***17***

9.   Relevant Information:


> Seven different types of dry beans were used in this research, taking into account the features such as form, shape, type, and structure by the market situation. A computer vision system was developed to distinguish seven different registered varieties of dry beans with similar features in order to obtain uniform seed classification. For the classification model, images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. Bean images obtained by computer vision system were subjected to segmentation and feature extraction stages, and a total of 16 features; 12 dimensions and 4 shape forms, were obtained from the grains.

### Attribute Information:


1.  **Area (*A*)**: 
    *   The area of a bean zone and the number of pixels within its boundaries.
2.  **Perimeter (*P*)**: 
    *    Bean circumference is defined as the length of its border.
3.  **Major axis length (*L*)**: 
    *    The distance between the ends of the longest line that can be drawn from a bean.
4.  **Minor axis length (*l*)**: 
    *    The longest line that can be drawn from the bean while standing perpendicular to the main axis.
5.  **Aspect ratio (*K*)**: 
    *    Defines the relationship between L and l.
6.  **Eccentricity (*Ec*)**: 
    *    Eccentricity of the ellipse having the same moments as the region.
7.  **Convex area (*C*)**: 
    *    Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8.  **Equivalent diameter (*Ed*)**: 
    *    The diameter of a circle having the same area as a bean seed area.
9.  **Extent (*Ex*)**: 
    *    The ratio of the pixels in the bounding box to the bean area.
10.  **Solidity (*S*)**: 
    *    Also known as convexity. The ratio of the pixels in the convex shell to those found in beans.
11.  **Roundness (*R*)**: 
    *    Calculated with the following formula: 
$ R = \frac{4 \pi A}{P^2} $
12.  **Compactness (*CO*)**: 
    *    Measures the roundness of an object: 
$ CO = \frac{Ed}{L} $
13.  **ShapeFactor1 (*SF1*)**
14.  **ShapeFactor2 (*SF2*)**
15.  **ShapeFactor3 (*SF3*)**
16.  **ShapeFactor4 (*SF4*)**
17.  **Class**: Seker, Barbunya, Bombay, Cali, Dermason, Horoz and Sira
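
As a quick sanity check, the derived features above can be recomputed from the raw measurements. The sketch below uses the first row of the `df.head(10)` output further down and assumes that the aspect ratio is `L / l` and that the equivalent diameter is `sqrt(4A / pi)` (the diameter of a circle with the same area); both assumptions reproduce the stored values for that row.

```python
import math

# Measurements of the first row shown in the data preview (in pixels).
area = 28395.0
perimeter = 610.291
major_axis = 208.178117
minor_axis = 173.888747

aspect_ratio = major_axis / minor_axis            # K = L / l (assumed form)
equiv_diameter = math.sqrt(4 * area / math.pi)    # Ed: circle with the same area
roundness = 4 * math.pi * area / perimeter ** 2   # R = 4*pi*A / P^2
compactness = equiv_diameter / major_axis         # CO = Ed / L

print(round(aspect_ratio, 6), round(equiv_diameter, 6))
print(round(roundness, 6), round(compactness, 6))
```

The results agree with the stored columns `AspectRation` (1.197191), `EquivDiameter` (190.141097), `roundness` (0.958027), and `Compactness` (0.913358) to within rounding of the inputs.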

### Relevant Paper


>   KOKLU, M. and OZKAN, I.A., (2020), “Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques.” Computers and Electronics in Agriculture, 174, 105507.
DOI: https://doi.org/10.1016/j.compag.2020.105507


---


>   Citation Requests / Acknowledgements:
KOKLU, M. and OZKAN, I.A., (2020), “Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques.” Computers and Electronics in Agriculture, 174, 105507.
DOI: https://doi.org/10.1016/j.compag.2020.105507

### Import Library

In [74]:
### Data Wrangling 
import pandas as pd
import numpy as np
from scipy.io import arff

### Modelling 
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

### Remove unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

### Load Data

In [26]:
# Load the ARFF file from the mounted Google Drive; loadarff returns a (data, meta) tuple
dataset_url = "/content/drive/MyDrive/datamining/tugas/notebooks-assignement/asset/Dry_Bean_Dataset.arff"
data = arff.loadarff(dataset_url)

In [27]:
df = pd.DataFrame(data[0])

In [28]:
print(df.shape)

(13611, 17)


In [29]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13611 entries, 0 to 13610
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Area             13611 non-null  float64
 1   Perimeter        13611 non-null  float64
 2   MajorAxisLength  13611 non-null  float64
 3   MinorAxisLength  13611 non-null  float64
 4   AspectRation     13611 non-null  float64
 5   Eccentricity     13611 non-null  float64
 6   ConvexArea       13611 non-null  float64
 7   EquivDiameter    13611 non-null  float64
 8   Extent           13611 non-null  float64
 9   Solidity         13611 non-null  float64
 10  roundness        13611 non-null  float64
 11  Compactness      13611 non-null  float64
 12  ShapeFactor1     13611 non-null  float64
 13  ShapeFactor2     13611 non-null  float64
 14  ShapeFactor3     13611 non-null  float64
 15  ShapeFactor4     13611 non-null  float64
 16  Class            13611 non-null  object 
dtypes: float64(1

In [76]:
df.head(10)

Unnamed: 0,Area,Perimeter,MajorAxisLength,MinorAxisLength,AspectRation,Eccentricity,ConvexArea,EquivDiameter,Extent,Solidity,roundness,Compactness,ShapeFactor1,ShapeFactor2,ShapeFactor3,ShapeFactor4,Class
0,28395.0,610.291,208.178117,173.888747,1.197191,0.549812,28715.0,190.141097,0.763923,0.988856,0.958027,0.913358,0.007332,0.003147,0.834222,0.998724,b'SEKER'
1,28734.0,638.018,200.524796,182.734419,1.097356,0.411785,29172.0,191.27275,0.783968,0.984986,0.887034,0.953861,0.006979,0.003564,0.909851,0.99843,b'SEKER'
2,29380.0,624.11,212.82613,175.931143,1.209713,0.562727,29690.0,193.410904,0.778113,0.989559,0.947849,0.908774,0.007244,0.003048,0.825871,0.999066,b'SEKER'
3,30008.0,645.884,210.557999,182.516516,1.153638,0.498616,30724.0,195.467062,0.782681,0.976696,0.903936,0.928329,0.007017,0.003215,0.861794,0.994199,b'SEKER'
4,30140.0,620.134,201.847882,190.279279,1.060798,0.33368,30417.0,195.896503,0.773098,0.990893,0.984877,0.970516,0.006697,0.003665,0.9419,0.999166,b'SEKER'
5,30279.0,634.927,212.560556,181.510182,1.171067,0.520401,30600.0,196.347702,0.775688,0.98951,0.943852,0.923726,0.00702,0.003153,0.85327,0.999236,b'SEKER'
6,30477.0,670.033,211.050155,184.03905,1.146768,0.489478,30970.0,196.988633,0.762402,0.984081,0.85308,0.933374,0.006925,0.003242,0.871186,0.999049,b'SEKER'
7,30519.0,629.727,212.996755,182.737204,1.165591,0.51376,30847.0,197.12432,0.770682,0.989367,0.967109,0.92548,0.006979,0.003158,0.856514,0.998345,b'SEKER'
8,30685.0,635.681,213.534145,183.157146,1.165852,0.514081,31044.0,197.659696,0.771561,0.988436,0.95424,0.925658,0.006959,0.003152,0.856844,0.998953,b'SEKER'
9,30834.0,631.934,217.227813,180.897469,1.200834,0.553642,31120.0,198.139012,0.783683,0.99081,0.970278,0.912125,0.007045,0.003008,0.831973,0.999061,b'SEKER'


### Preprocessing Data

In [77]:
### CONSTANT
FIRST_IDX = 0

#### Split Class Labels & Feature Values




```
# X for values
# y for labels
```



In [80]:
X = df.drop(columns=["Class"])

In [81]:
print(X)

          Area  Perimeter  MajorAxisLength  MinorAxisLength  AspectRation  \
0      28395.0    610.291       208.178117       173.888747      1.197191   
1      28734.0    638.018       200.524796       182.734419      1.097356   
2      29380.0    624.110       212.826130       175.931143      1.209713   
3      30008.0    645.884       210.557999       182.516516      1.153638   
4      30140.0    620.134       201.847882       190.279279      1.060798   
...        ...        ...              ...              ...           ...   
13606  42097.0    759.696       288.721612       185.944705      1.552728   
13607  42101.0    757.499       281.576392       190.713136      1.476439   
13608  42139.0    759.321       281.539928       191.187979      1.472582   
13609  42147.0    763.779       283.382636       190.275731      1.489326   
13610  42159.0    772.237       295.142741       182.204716      1.619841   

       Eccentricity  ConvexArea  EquivDiameter    Extent  Solidity  roundne

In [82]:
y = df['Class'].values

***Label Set***

*   b'BOMBAY'
*   b'HOROZ'
*   b'DERMASON'
*   b'BARBUNYA'
*   b'SIRA'
*   b'SEKER'
*   b'CALI'



In [84]:
labelset = set(y)

In [85]:
print(labelset)

{b'BOMBAY', b'HOROZ', b'DERMASON', b'BARBUNYA', b'SIRA', b'SEKER', b'CALI'}


In [86]:
le = preprocessing.LabelEncoder()
le.fit(y)
y = le.transform(y)

In [87]:
print(y)

[5 5 5 ... 3 3 3]


In [88]:
encoded_labelset = set(y)



`LabelEncoder` assigns the numbers in sorted label order:

*   Number ***0*** as *b'BARBUNYA'*
*   Number ***1*** as *b'BOMBAY'*
*   Number ***2*** as *b'CALI'*
*   Number ***3*** as *b'DERMASON'*
*   Number ***4*** as *b'HOROZ'*
*   Number ***5*** as *b'SEKER'*
*   Number ***6*** as *b'SIRA'*



In [89]:
print(encoded_labelset)

{0, 1, 2, 3, 4, 5, 6}
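
`LabelEncoder` numbers the classes in sorted order, not in the arbitrary order in which a Python set happens to print them. That matches the encoded `y` printed above, which starts with 5 because the first rows are SEKER. The following pure-Python sketch mirrors that ordering without calling scikit-learn; `label_to_code` is an illustrative name.

```python
# Labels as scipy's ARFF loader returns them (bytes), in the order the set printed.
labels = [b'BOMBAY', b'HOROZ', b'DERMASON', b'BARBUNYA', b'SIRA', b'SEKER', b'CALI']

# Sorting the unique labels and numbering them reproduces the encoder's scheme.
classes = sorted(set(labels))
label_to_code = {label: code for code, label in enumerate(classes)}

for label, code in label_to_code.items():
    print(code, label.decode("utf-8"))
```

In the notebook itself, `le.classes_` holds the same sorted list, so `le.classes_[code]` looks a label up directly.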


#### Normalization

In [90]:
scaler = preprocessing.StandardScaler().fit(X)

In [91]:
X = scaler.transform(X)

In [92]:
print(X[FIRST_IDX])

[-0.84074853 -1.1433189  -1.30659814 -0.63115304 -1.56505251 -2.18572039
 -0.84145059 -1.0633406   0.28908744  0.36761343  1.42386707  1.8391164
  0.68078638  2.40217287  1.92572347  0.83837103]
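
`StandardScaler` rescales every column to zero mean and unit variance, computing `z = (x - mean) / std` with the population standard deviation. The notebook fits the scaler on all rows before splitting; a common alternative is to learn the statistics from the training rows only, so that no information from the test rows leaks into preprocessing. The sketch below shows that variant in plain Python. The helpers `fit_standardizer` and `standardize` are hypothetical, and the toy column reuses the first five `Area` values from the preview.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return (mean, population std) learned from one feature column."""
    return fmean(column), pstdev(column)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(x - mean) / std for x in column]

# Toy stand-in for one feature column: four training rows, one test row.
train_area = [28395.0, 28734.0, 29380.0, 30008.0]
test_area = [30140.0]

mean, std = fit_standardizer(train_area)   # statistics come from training rows only
train_scaled = standardize(train_area, mean, std)
test_scaled = standardize(test_area, mean, std)

print(train_scaled)
print(test_scaled)
```

With scikit-learn, the same idea is `scaler.fit(X_train)` followed by `scaler.transform` on both splits.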


#### Split Train Data & Test Data

In [93]:
# Fraction (not a percentage) of the rows held out for testing: 0.3 = 30%
amount_of_test_data_in_percent = 0.3

In [94]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=amount_of_test_data_in_percent, random_state=123)

In [95]:
print(y_train)

[2 0 0 ... 5 3 1]
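
To make the effect of `test_size=0.3` concrete, the sketch below shuffles the 13,611 row indices and holds out 30% of them, assuming the test count is rounded up. `shuffle_split` is a hypothetical helper; it illustrates the idea only and does not reproduce the exact rows that `train_test_split` selects with `random_state=123`.

```python
import math
import random

def shuffle_split(n_samples, test_fraction, seed):
    """Hypothetical helper: shuffle row indices and hold out a test fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * n_samples)   # test count is rounded up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split(n_samples=13611, test_fraction=0.3, seed=123)
print(len(train_idx), len(test_idx))   # 9527 4084
```

Because the classes are not equally frequent, passing `stratify=y` to `train_test_split` is worth considering so that both splits keep the same class proportions.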


#### Define the Decision Tree Model

In [96]:
# Create the Decision Tree model
tree_model = DecisionTreeClassifier()

In [97]:
# Train the model on the training data
tree_model = tree_model.fit(X_train, y_train)

#### Model Evaluation


In [98]:
y_pred = tree_model.predict(X_test)

In [99]:
acc_score = round(accuracy_score(y_test, y_pred), 3)

In [100]:
print('Accuracy: ', acc_score)

Accuracy:  0.894
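
Accuracy is the share of test rows whose predicted class equals the true class. A minimal sketch of that definition, using made-up encoded labels rather than the notebook's test split:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Hypothetical encoded labels, not taken from the notebook's test split.
y_true = [5, 5, 3, 3, 0, 1, 2, 4, 6, 6]
y_pred = [5, 5, 3, 6, 0, 1, 2, 4, 6, 3]

print(accuracy(y_true, y_pred))   # 8 of 10 match
print(accuracy(y_pred, y_true))   # same value: accuracy is symmetric
```

The argument order does not change accuracy, but scikit-learn's signature is `accuracy_score(y_true, y_pred)`, and metrics such as precision and recall do depend on that order, so the convention is worth keeping. Note also that `DecisionTreeClassifier()` is created without a `random_state`, so the score may vary slightly between runs.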


#### Model Prediction

**Attributes**


*   Area
*   Perimeter
*   MajorAxisLength
*   MinorAxisLength
*   AspectRation
*   Eccentricity
*   ConvexArea
*   EquivDiameter
*   Extent
*   Solidity
*   roundness
*   Compactness
*   ShapeFactor1
*   ShapeFactor2
*   ShapeFactor3
*   ShapeFactor4


```
# Values
[
    -0.84074853,
    -1.1433189 ,
    -1.30659814,
    -0.63115304,
    -1.56505251,
    -2.18572039,
    -0.84145059,
    -1.0633406 ,
    0.28908744,
    0.36761343,
    1.42386707,
    1.8391164 ,
    0.68078638,
    2.40217287,
    1.92572347,
    0.83837103
]
```



In [101]:
# Predict with tree_model.predict([[Area, Perimeter, ..., ShapeFactor4]]); the 16 values must already be standardized
print(tree_model.predict([[
        -0.84074853, -1.1433189, -1.30659814, -0.63115304, -1.56505251,
        -2.18572039, -0.84145059, -1.0633406,  0.28908744,  0.36761343,
        1.42386707,  1.8391164 ,  0.68078638,  2.40217287,  1.92572347, 
        0.83837103]]
))

[5]
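
The model returns an encoded class, so `[5]` still has to be mapped back to a bean variety. The input here is the standardized first row of the dataset, which the preview shows as SEKER. A small sketch of the reverse lookup, assuming the encoder numbered the seven labels in sorted order; `decode_prediction` is a hypothetical helper.

```python
# Class names in sorted order, which is how the encoder numbered them
# (assumption: the same seven labels as in the label set above).
classes = sorted(
    [b'BOMBAY', b'HOROZ', b'DERMASON', b'BARBUNYA', b'SIRA', b'SEKER', b'CALI']
)

def decode_prediction(code):
    """Hypothetical helper: map an encoded class back to a readable name."""
    return classes[code].decode("utf-8")

print(decode_prediction(5))   # SEKER
```

In the notebook, `le.inverse_transform(tree_model.predict(...))` performs the same lookup. For a deployed model, raw measurements would first need to pass through the fitted `scaler.transform` before being handed to `predict`.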
