<a href="https://colab.research.google.com/github/Yolantele/ML-data-clasifier/blob/master/NL_SpaCy_ML_Classifier_for_Waste_Data_Augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Dutch Waste Data Classification and Augmentation** 

Performs text analysis on Dutch waste descriptions with spaCy and builds machine learning models with scikit-learn.


scikit-learn :
https://scikit-learn.org/stable/

spaCy Language Models:
https://spacy.io/usage/models

scikit-learn + spaCy:
https://www.dataquest.io/blog/tutorial-text-classification-in-python-using-spacy/

Eural Code reference: 
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:02000D0532-20150601


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
# mount data from drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install -U spacy
!pip install pandas
!python -m spacy download nl_core_news_md

## **Loading the Language Model and Data Frames**

In [55]:
import spacy
import pandas as pd
from spacy.lang.nl import Dutch
import nl_core_news_md
spacy.prefer_gpu()

nlp = nl_core_news_md.load()  # medium-size language model (same results as the large-size language model)

In [185]:
path = '/content/drive/My Drive/data/'

# train data frames (path already ends with a slash):
all_data = pd.read_csv(path + 'nlData.csv')


# test data frames:
materials_test = pd.read_csv(path + 'nlWithoutMaterialData.csv')

# set data frame: 
df = all_data


df.shape
df.info()
df.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5848 entries, 0 to 5847
Data columns (total 27 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   reason            45 non-null     object 
 1   origin            130 non-null    object 
 2   color             61 non-null     object 
 3   state             1098 non-null   object 
 4   size              190 non-null    object 
 5   consistency       5390 non-null   object 
 6   otherCode         973 non-null    object 
 7   material4         28 non-null     object 
 8   material3         490 non-null    object 
 9   material2         986 non-null    object 
 10  material          3696 non-null   object 
 11  mType             946 non-null    object 
 12  composite2        56 non-null     object 
 13  composite1        2793 non-null   object 
 14  cType             354 non-null    object 
 15  indirectProduct   3683 non-null   object 
 16  directProduct     923 non-null    object 


Unnamed: 0,reason,origin,color,state,size,consistency,otherCode,material4,material3,material2,material,mType,composite2,composite1,cType,indirectProduct,directProduct,pType,mixedOrPure,cleanOrDirty,euralDescription,euralCode,description,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26
0,,,,,,,,,,,,,,,,,,,,,slib van wassen en schoonmaken,20101,SLIB VAN WASSEN EN SCHOONMAKEN,,,,
1,afgekeurd,,,,,vast,,,,,organisch materiaal,,,GFT,,GFT,,,gemengd,,afval van dierlijke weefsels,20102,GFT Afgekeurd,,,,
2,,,,,,vast,categorie 3,,,,organisch materiaal,,,GFT,,GFT,,,puur,,afval van dierlijke weefsels,20102,GFT Categorie 3,,,,
3,,,,,,,2,,,,,,,,,,,,,,afval van plantaardige weefsels,20103,2,,,,
4,,,,,,vast,200,,,,hout,stobben,,,,hout,,,puur,,afval van plantaardige weefsels,20103,200 Boomstobben,,,,


## **Tokenizing the Data With spaCy**

We create a custom tokenizer function using spaCy to automatically strip unnecessary information such as stop words and punctuation.

In [186]:
from spacy.lang.nl.stop_words import STOP_WORDS


parser = Dutch()
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''  # raw string avoids the invalid "\," escape warning
stopwords = STOP_WORDS
# stopwords = [] # in some cases stopwords help (increase accuracy by 1%)

def spacy_tokenize(sentence):
    # Creating our token object, which is used to create documents with linguistic annotations.
    sentence = sentence.strip().lower()
    mytokens = parser(sentence)

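    # Lemmatizing each token and lowercasing it. "-PRON-" is the spaCy v2 pronoun
    # placeholder; if no lemma is available (the blank Dutch() pipeline has no
    # lemmatizer component), fall back to the lowercased token text.
    mytokens = [ (word.lemma_ or word.lower_).lower().strip() if word.lemma_ != "-PRON-" else word.lower_ for word in mytokens ]

    # Removing punctuation and stop words
    mytokens = [ word for word in mytokens if word not in punctuations and word not in stopwords]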

    # return preprocessed list of tokens
    return mytokens
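
To see what the tokenizer is meant to produce without loading spaCy, here is a rough stand-in. It is only a sketch under stated assumptions: it splits on whitespace instead of using spaCy's tokenizer, skips lemmatization, and uses a tiny hand-picked stop word set (`DEMO_STOP_WORDS`, a hypothetical name) rather than spaCy's `STOP_WORDS`.

```python
import string

# Hypothetical, hand-picked stop words for this demo only;
# the notebook itself uses spaCy's Dutch STOP_WORDS.
DEMO_STOP_WORDS = {"met", "van", "en", "voor", "de", "het"}

def simple_tokenize(sentence):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = []
    for raw in sentence.strip().lower().split():
        word = raw.strip(string.punctuation)  # remove leading/trailing punctuation
        if word and word not in DEMO_STOP_WORDS:
            tokens.append(word)
    return tokens

print(simple_tokenize("Gras vervuild met Japanse Duizendknoop"))
```

The real `spacy_tokenize` may split and normalize differently, so treat this only as an illustration of the filtering idea.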


### **Vectorization Feature Engineering: TF-IDF, Bag of Words and N-grams**

Text classification starts from text snippets paired with their labels. A machine learning model cannot work on raw text, so each snippet must first be converted into a numeric representation (a feature vector).

- **TF-IDF (Term Frequency-Inverse Document Frequency)** - a way of normalizing the Bag of Words (BoW) counts by weighing how often a word occurs in one document against how many documents contain it.

- **N-grams** - combinations of adjacent words in a given text. For example, for "who will win":
 1. when n = 1, the n-grams are "who", "will", "win";
 2. when n = 2, they are "who will", "will win".
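
The n-gram example can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper name `ngrams` is chosen here for illustration and is not part of scikit-learn.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "who will win".split()
print(ngrams(tokens, 1))  # ['who', 'will', 'win']
print(ngrams(tokens, 2))  # ['who will', 'will win']
```

In `CountVectorizer`, `ngram_range=(1, 1)` keeps only single words, while `ngram_range=(1, 2)` would use both single words and word pairs as features.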

In [187]:
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
from sklearn.base import TransformerMixin

bow_vector = CountVectorizer(tokenizer=spacy_tokenize, ngram_range=(1,1))

tfidf_vector = TfidfVectorizer(tokenizer=spacy_tokenize)

# print(bow_vector)
# print(tfidf_vector)
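
To make the TF-IDF idea concrete, the sketch below computes the textbook weight `tf * log(N / df)` by hand for three made-up, already tokenized descriptions. Note the assumption: scikit-learn's `TfidfVectorizer` by default uses a smoothed IDF and L2-normalizes each row, so its numbers differ from these, but the ranking intuition is the same.

```python
import math
from collections import Counter

# Made-up toy corpus, already tokenized.
docs = [["puin", "gemengd"], ["puin", "fijn"], ["bermgras"]]

def tf_idf(docs):
    """Textbook TF-IDF: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

for doc, w in zip(docs, tf_idf(docs)):
    print(doc, w)
```

"puin" appears in two of the three documents and therefore gets a lower weight than "gemengd" or "fijn", which each appear in only one.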

## **Splitting The Data into Training and Validation Sets**


### material classification

In [108]:
from sklearn.model_selection import train_test_split

df = df[df['material'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['material'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

1                                           GFT Afgekeurd
2                                         GFT Categorie 3
4                                         200 Boomstobben
5                                                BERMGRAS
6                          BLAUWMAANZAAD TER VERNIETIGING
                              ...                        
5838                                          fruitresten
5839                             gft vloeibare reststrome
5840            consumptie ongeschikt mat. (palmvetzuren)
5842                                              zetmeel
5845    voedings-en genotmiddelen ongeschikt voor cons...
Name: description, Length: 3696, dtype: object
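
The `test_size=0.3` argument used in the split above holds out roughly 30% of the rows for validation. The sketch below mimics that idea in plain Python; `simple_split` is a hypothetical helper, and rounding the test share up is a choice made for this sketch, not a statement about scikit-learn's internals.

```python
import math
import random

def simple_split(items, test_size=0.3, seed=0):
    """Shuffle a copy of items and cut it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

train, test = simple_split(range(10))
print(len(train), len(test))
```

With `train_test_split` itself, passing a fixed `random_state` has the same effect as the seed here: the same rows end up in the validation set on every run, which makes accuracy figures comparable between runs.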


### mixedOrPure classification

In [120]:
from sklearn.model_selection import train_test_split


df = df[df['mixedOrPure'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['mixedOrPure'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

1                                   GFT Afgekeurd
2                                 GFT Categorie 3
4                                 200 Boomstobben
5                                        BERMGRAS
6                  BLAUWMAANZAAD TER VERNIETIGING
                          ...                    
5841    sediment uit plantaardige olie & bijprod.
5842                                      zetmeel
5843                             beddingmateriaal
5844                                bedding afval
5846           000 rum ongeschikt voor consumptie
Name: description, Length: 4796, dtype: object


### consistency classification

In [128]:
from sklearn.model_selection import train_test_split


df = df[df['consistency'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['consistency'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

1                                   GFT Afgekeurd
2                                 GFT Categorie 3
4                                 200 Boomstobben
5                                        BERMGRAS
6                  BLAUWMAANZAAD TER VERNIETIGING
                          ...                    
5841    sediment uit plantaardige olie & bijprod.
5842                                      zetmeel
5843                             beddingmateriaal
5844                                bedding afval
5846           000 rum ongeschikt voor consumptie
Name: description, Length: 5390, dtype: object


### cleanOrDirty classification

In [136]:
from sklearn.model_selection import train_test_split


df = df[df['cleanOrDirty'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['cleanOrDirty'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

11                 Gras vervuild met Japanse Duizendknoop
12                    Groenafval met Japanse duizendknoop
13                Groenafval vervuild met invasieve exoot
14                  Rozenafval verontreinigd met steenwol
41                           431 Organisch afval vervuild
                              ...                        
5805                          grond met zuur (drugsafval)
5806                                   verontreinigd zand
5816                          selectief verwijderd asbest
5817    metaalafval dat met gevaarlijke stoffen is ver...
5818                            metaalafval verontreinigd
Name: description, Length: 971, dtype: object


### directProduct classification

In [143]:
from sklearn.model_selection import train_test_split


df = df[df['directProduct'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['directProduct'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

46                                  Digestaat plantaardig
47                                 VOT/VGW-Monsanto/zaden
48                           VOT/Zaden/Mil.Expr.-Monsanto
58                                 PVC - buizen/profielen
75                                   Bestrijdingsmiddelen
                              ...                        
5843                                     beddingmateriaal
5844                                        bedding afval
5845    voedings-en genotmiddelen ongeschikt voor cons...
5846                   000 rum ongeschikt voor consumptie
5847                         afval van dierlijke weefsels
Name: description, Length: 923, dtype: object


### indirectProduct classification

In [153]:
from sklearn.model_selection import train_test_split


df = df[df['indirectProduct'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['indirectProduct'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

1                                           GFT Afgekeurd
2                                         GFT Categorie 3
4                                         200 Boomstobben
5                                                BERMGRAS
6                          BLAUWMAANZAAD TER VERNIETIGING
                              ...                        
5843                                     beddingmateriaal
5844                                        bedding afval
5845    voedings-en genotmiddelen ongeschikt voor cons...
5846                   000 rum ongeschikt voor consumptie
5847                         afval van dierlijke weefsels
Name: description, Length: 3683, dtype: object


### cType classification

In [160]:
from sklearn.model_selection import train_test_split

df = df[df['cType'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['cType'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

119            Uitgepakte voedingsmiddelen supermarktmix
281                              Verpakte levensmiddelen
301       Consumptie ongeschikt mat. (marinades, sauzen)
331     02.03.05 - Afvalwaterslib plantaardige oorsprong
467                                           papierslib
                              ...                       
5769                                       bentonietslib
5777                                        puin gemengd
5780                      gemengd- en containerpuin < 70
5830                          onges. huishoud batterijen
5832                             fietsbatterijen gemengd
Name: description, Length: 354, dtype: object


### composite1 classification

In [167]:
from sklearn.model_selection import train_test_split

df = df[df['composite1'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['composite1'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

1                                           GFT Afgekeurd
2                                         GFT Categorie 3
11                 Gras vervuild met Japanse Duizendknoop
14                  Rozenafval verontreinigd met steenwol
16                         5430 perliet / organisch afval
                              ...                        
5834                                    wit- en bruingoed
5835                                industrieel slib (ba)
5836    voedings en genotmiddelen ongeschikt voor cons...
5839                             gft vloeibare reststrome
5841            sediment uit plantaardige olie & bijprod.
Name: description, Length: 2793, dtype: object


### state classification

In [178]:
from sklearn.model_selection import train_test_split

df = df[df['state'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['state'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)
print(ylabels)

15          5430 organisch afval ongehakse
36                      Meloenen in netjes
54      1407 Tuinbouwfolie / landbouwfolie
55                           Landbouwfolie
76              Bestrijdingsmiddelen (kvp)
                       ...                
5792             3240 puin gemengd < 70 cm
5799               verontreinigd puin fijn
5802             verontreinigde grond/puin
5812                         plastic draad
5819                   folies/kunststoffen
Name: description, Length: 1098, dtype: object
15          ongehakseld
36            in netjes
54                folie
55                folie
76      kleinverpakking
             ...       
5792               puin
5799         puin, fijn
5802               puin
5812              draad
5819              folie
Name: state, Length: 1098, dtype: object


### mType classification

In [188]:
from sklearn.model_selection import train_test_split

df = df[df['mType'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['mType'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

4                                         200 Boomstobben
5                                                BERMGRAS
6                          BLAUWMAANZAAD TER VERNIETIGING
7                                             Bloembollen
8                                  Boomstronken (stobben)
                              ...                        
5810              > 28% koper-pit teerhoudende grondkabel
5826    geslagen aluminium, oude en nieuwe aluminium p...
5837                                      plantaardig vet
5838                                          fruitresten
5840            consumptie ongeschikt mat. (palmvetzuren)
Name: description, Length: 946, dtype: object


### otherCode classification

In [62]:
from sklearn.model_selection import train_test_split

df = df[df['otherCode'].notna()]

X = df['description'] # the features we want to analyze
ylabels = df['otherCode'] # the labels, or answers, we want to test against

X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)

print(X)

2                                         GFT Categorie 3
3                                                       2
4                                         200 Boomstobben
11                 Gras vervuild met Japanse Duizendknoop
15                         5430 organisch afval ongehakse
                              ...                        
5786                                        1480 dakgrind
5789                        3242 puin mineraal vervuild <
5792                            3240 puin gemengd < 70 cm
5816                          selectief verwijderd asbest
5826    geslagen aluminium, oude en nieuwe aluminium p...
Name: description, Length: 973, dtype: object


## **Creating a Pipeline and Generating the Model**

### We'll create a pipeline with three components: a **cleaner, a vectorizer, and a classifier**.

- The cleaner uses our `predictors` class to clean and preprocess the text.

- The vectorizer uses the `CountVectorizer` object (`bow_vector`) to create the bag-of-words matrix for our text.

- The classifier is a logistic regression model that assigns each description to a category of the chosen target column.
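
The first two stages can be traced by hand. The sketch below is a simplified illustration, assuming whitespace tokenization instead of `spacy_tokenize`; `bag_of_words` is a hypothetical helper that stands in for what `CountVectorizer` produces.

```python
def clean_text(text):
    """Cleaner stage: trim surrounding spaces and lowercase."""
    return text.strip().lower()

def bag_of_words(texts):
    """Vectorizer stage: return (vocabulary, count matrix)."""
    tokenized = [clean_text(t).split() for t in texts]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    matrix = [[doc.count(term) for term in vocabulary] for doc in tokenized]
    return vocabulary, matrix

vocab, counts = bag_of_words(["  Puin gemengd ", "verontreinigd puin fijn"])
print(vocab)   # ['fijn', 'gemengd', 'puin', 'verontreinigd']
print(counts)  # [[0, 1, 1, 0], [1, 0, 1, 1]]
```

Each row of the count matrix is the numeric vector the classifier is trained on, with one column per vocabulary term.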

In [189]:
# Custom transformer using spaCy
class predictors(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}


def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()

In [190]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Creating Logistic Regression Classifier
classifier = LogisticRegression()

# Create pipeline using Bag of Words (BoW)
model = Pipeline([("cleaner", predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])

# model generation
model.fit(X_train,y_train)

Pipeline(memory=None,
         steps=[('cleaner', <__main__.predictors object at 0x7f36242d9dd8>),
                ('vectorizer',
                 CountVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.int64'>, encoding='utf-8',
                                 input='content', lowercase=True, max_df=1.0,
                                 max_features=None, min_df=1,
                                 ngram_range=(1, 1), preprocessor=None,
                                 stop_words=None, strip_accents=None,
                                 t...\\b\\w\\w+\\b',
                                 tokenizer=<function spacy_tokenize at 0x7f3624a710d0>,
                                 vocabulary=None)),
                ('classifier',
                 LogisticRegression(C=1.0, class_weight=None, dual=False,
                                    fit_intercept=True, intercept_scaling=1,
             

### **Model Predictions - Classifications**

In [191]:
from sklearn import metrics

row = 3

# Predicting for a test dataset
predicted = model.predict(X_test)
n = 0
print('Using validation set:')
print('predicted is ----->', predicted[0 + n], '(descr: ', X_test.iloc[0 + n], ')')
print('predicted is ----->', predicted[1 + n], '(descr: ', X_test.iloc[1 + n], ')')
print('predicted is ----->', predicted[2 + n], '(descr: ', X_test.iloc[2 + n], ')')
print('predicted is ----->', predicted[3 + n], '(descr: ', X_test.iloc[3 + n], ')')
# print('predicted is ----->', predicted[4 + n], '(descr: ', X_test.iloc[4 + n], ')')
print('predicted is ----->', predicted[5 + n], '(descr: ', X_test.iloc[5 + n], ')')
print('predicted is ----->', predicted[6 + n], '(descr: ', X_test.iloc[6 + n], ')')
print('predicted is ----->', predicted[7 + n], '(descr: ', X_test.iloc[7 + n], ')')
print('predicted is ----->', predicted[8 + n], '(descr: ', X_test.iloc[8 + n], ')')


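# predict using unclassified set
# (pass only the description column: iterating over a whole DataFrame
#  yields its column names, not its rows)
pred = model.predict(materials_test['description'].astype(str))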
print('')
print('Using unclassified set:')
print('predicted is ----->', pred[0 + n], '(descr: ', materials_test.iloc[0 + n].description, ')')
print('predicted is ----->', pred[1 + n], '(descr: ', materials_test.iloc[1 + n].description, ')')
print('predicted is ----->', pred[2 + n], '(descr: ', materials_test.iloc[2 + n].description, ')')
print('predicted is ----->', pred[3 + n], '(descr: ', materials_test.iloc[3 + n].description, ')')
# print('predicted is ----->', pred[4 + n], '(descr: ', materials_test.iloc[4 + n].description, ')')
print('predicted is ----->', pred[5 + n], '(descr: ', materials_test.iloc[5 + n].description, ')')
print('predicted is ----->', pred[6 + n], '(descr: ', materials_test.iloc[6 + n].description, ')')
print('predicted is ----->', pred[7 + n], '(descr: ', materials_test.iloc[7 + n].description, ')')
print('predicted is ----->', pred[8 + n], '(descr: ', materials_test.iloc[8 + n].description, ')')


# print(model.predict(materials_test)[row])



Using validation set:
predicted is -----> B (descr:  B-Hout, J. Blokdijk en Zn )
predicted is -----> takken (descr:  Takken,Stronken,Stamhout )
predicted is -----> takken (descr:  bananen in dozen, verwerkt onder toezicht klant )
predicted is -----> B (descr:  Hout (A   B Kwaliteit) )
predicted is -----> bont (descr:  bont papier, niet route )
predicted is -----> B (descr:  I B-HOUT )
predicted is -----> B (descr:  HOUT A-KWALITEIT )
predicted is -----> non-ferro (descr:  Non-ferro metalen )

Using unclassified set:
predicted is -----> takken (descr:  SLIB VAN WASSEN EN SCHOONMAKEN )
predicted is -----> takken (descr:  2 )
predicted is -----> takken (descr:  ? )
predicted is -----> takken (descr:  Niet onder 170503 vallende grond en stenen )
predicted is -----> takken (descr:  agrochemisch afval dat gevaarlijke stoffen bevat )
predicted is -----> takken (descr:  agrochemisch afval dat gevaarlijke stoffen bevat ( )
predicted is -----> takken (descr:  niet onder 02 01 08 vallend agrochem

### **Model Accuracy Reports**

**Accuracy** refers to the percentage of the total predictions our model makes that are completely correct.

**Precision** describes the ratio of true positives to true positives plus false positives in our predictions.

**Recall** describes the ratio of true positives to true positives plus false negatives in our predictions.
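
These three definitions can be checked on a tiny example. The labels below are made up for illustration, and the helper names are hypothetical; the sketch computes per-class precision and recall directly from the true-positive, false-positive, and false-negative counts.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, label):
    """Precision and recall for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels, not taken from the data set.
y_true = ["hout", "hout", "grond", "metalen"]
y_pred = ["hout", "grond", "grond", "hout"]

print(accuracy(y_true, y_pred))                   # 0.5
print(precision_recall(y_true, y_pred, "hout"))   # (0.5, 0.5)
print(precision_recall(y_true, y_pred, "grond"))  # (0.5, 1.0)
```

With `average='weighted'`, scikit-learn computes such per-class scores and averages them weighted by how many true samples each class has.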


In [None]:
# Model Accuracy Reports

print("Accuracy:",metrics.accuracy_score(y_test, predicted ))
print("Precision:",metrics.precision_score(y_test, predicted, average='weighted'))
print("Recall:",metrics.recall_score(y_test, predicted, average='weighted'))
print( metrics.classification_report(y_test, predicted))

### **Plotting the Classification Outcomes (description and material dependencies)**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

df = materials_test

#chosen column field option
chosen_material = 'slakken'
chosen_mixedOrPure = 'puur'
chosen_consistency = 'vast'
chosen_cleanOrDirty = 'vervuild'
chosen_indirectProduct = 'slib'

# set graph: 
outcome = chosen_indirectProduct
x_axis = 'indirectProduct'
x_data = df.indirectProduct

data = df.head(120).loc[x_data ==outcome]
sns.set_style('ticks')
fig, ax = plt.subplots()
fig.set_size_inches(1, 15)
sns.regplot(x=x_axis, y='description', data=data, ax=ax)


### **Save the Trained Model**

In [193]:
import pickle

# save the model to disk
name = 'nl_mType_classification_model.sav'
filename = path + name
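# use a context manager so the file is flushed and closed after writing
with open(filename, 'wb') as model_file:
    pickle.dump(model, model_file)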
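
A save-and-load round trip can be tried without touching Drive. In this sketch a plain dictionary (`demo_model`, a stand-in) takes the place of the fitted pipeline, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

# Stand-in object; in the notebook this would be the fitted pipeline `model`.
demo_model = {"name": "demo", "classes": ["hout", "grond"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_filename = os.path.join(tmp_dir, "demo_model.sav")
    with open(demo_filename, "wb") as fh:
        pickle.dump(demo_model, fh)
    with open(demo_filename, "rb") as fh:
        restored = pickle.load(fh)

print(restored == demo_model)  # True
```

Two caveats apply to the real pipeline. Pickle stores functions and classes by reference, so `spacy_tokenize` and `predictors` must be defined in the session that loads the model. Also, only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.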
 

In [217]:
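# load the saved models
# note: X_test and y_test come from the most recent split, so each score below
# is only meaningful for the model trained on that same target column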
cType_name = 'nl_c_type_classification_model.sav'
cleanOrDirty_name = 'nl_clean_or_dirty_classification_model.sav'
consistency_name = 'nl_consistency_classification_model.sav'
composite1_name = 'nl_composite1_classification_model.sav'
directProduct_name = 'nl_direct_product_classification_model.sav'
indirectProduct_name = 'nl_indirect_product_classification_model.sav'
material_name = 'nl_material_classification_model.sav'
mixedOrPure_name = 'nl_mixed_or_pure_classification_model.sav'
mType_name = 'nl_mType_classification_model.sav'
otherCode_name = 'nl_other_code_classification_model.sav'
state_name = 'nl_state_classification_model.sav'

state_model = pickle.load(open(path + state_name, 'rb'))
state_result = state_model.score(X_test, y_test)

cType_model = pickle.load(open(path + cType_name, 'rb'))
cType_result = cType_model.score(X_test, y_test)

cleanOrDirty_model = pickle.load(open(path + cleanOrDirty_name, 'rb'))
cleanOrDirty_result = cleanOrDirty_model.score(X_test, y_test)

consistency_model = pickle.load(open(path + consistency_name, 'rb'))
consistency_result = consistency_model.score(X_test, y_test)

composite1_model = pickle.load(open(path + composite1_name, 'rb'))
composite1_result = composite1_model.score(X_test, y_test)

directProduct_model = pickle.load(open(path + directProduct_name, 'rb'))
directProduct_result = directProduct_model.score(X_test, y_test)

indirectProduct_model = pickle.load(open(path + indirectProduct_name, 'rb'))
indirectProduct_result = indirectProduct_model.score(X_test, y_test)

material_model = pickle.load(open(path + material_name, 'rb'))
material_result = material_model.score(X_test, y_test)

mixedOrPure_model = pickle.load(open(path + mixedOrPure_name, 'rb'))
mixedOrPure_result = mixedOrPure_model.score(X_test, y_test)

mType_model = pickle.load(open(path + mType_name, 'rb'))
mType_result = mType_model.score(X_test, y_test)

otherCode_model = pickle.load(open(path + otherCode_name, 'rb'))
otherCode_result = otherCode_model.score(X_test, y_test)


# models' classifications
# overall_score = _model.score(X_test, y_test)
row = 7

def print_classification_results(row=0, with_accuracy=False):
  print('row', row)
  print( 'description is --->', X_test.iloc[row])
  print( 'cType ------------>', cType_model.predict(X_test)[row], '(OMS - overall model score: 0.40)' if with_accuracy else '')
  print( 'clean or dirty --->', cleanOrDirty_model.predict(X_test)[row], '(OMS: 0.79)' if with_accuracy else '')
  print( 'consistency ------>', consistency_model.predict(X_test)[row], '(OMS: 0.96)' if with_accuracy else '')
  print( 'composite 1 ------>', composite1_model.predict(X_test)[row], '(OMS: 0.78)' if with_accuracy else '')
  print( 'direct product --->', directProduct_model.predict(X_test)[row], '(OMS: 0.71)' if with_accuracy else '')
  print( 'indirect product ->', indirectProduct_model.predict(X_test)[row], '(OMS: 0.27)' if with_accuracy else '')
  print( 'material --------->', material_model.predict(X_test)[row], '(OMS: 0.85)' if with_accuracy else '')
  print( 'state ------------>', state_model.predict(X_test)[row], '(OMS: 0.82)' if with_accuracy else '')
  print( 'mixed or pure ---->', mixedOrPure_model.predict(X_test)[row], '(OMS: 0.86)' if with_accuracy else '')
  print( 'm type ----------->', mType_model.predict(X_test)[row], '(OMS: 0.54)' if with_accuracy else '')
  print( 'other code ------->', otherCode_model.predict(X_test)[row], '(OMS: 0.44)' if with_accuracy else '')
  print('')


print_classification_results(row+1, with_accuracy=True)
print_classification_results(row+2)
print_classification_results(row+3)

row 8
description is ---> Non-ferro metalen
cType ------------> teerhoudend (OMS - overall model score: 0.40)
clean or dirty ---> vervuild (OMS: 0.79)
consistency ------> vast (OMS: 0.96)
composite 1 ------> metalen (OMS: 0.78)
direct product ---> 0 (OMS: 0.71)
indirect product -> non-ferro (OMS: 0.27)
material ---------> metalen (OMS: 0.85)
state ------------> puin (OMS: 0.82)
mixed or pure ----> gemengd (OMS: 0.86)
m type -----------> non-ferro (OMS: 0.54)
other code -------> 9340 (OMS: 0.44)

row 9
description is ---> extr reinigbare grond (in)
cType ------------> teerhoudend 
clean or dirty ---> vervuild 
consistency ------> vast 
composite 1 ------> composiet 
direct product ---> grond 
indirect product -> composiet 
material ---------> grond 
state ------------> puin 
mixed or pure ----> gemengd 
m type -----------> extractief reinigbaar 
other code -------> niet toepasbaar 

row 10
description is ---> Klei niet toepasbaar
cType ------------> teerhoudend 
clean or dirty ---> vervui