# Project Details

**Batch**                -  May 2019 Batch (Group 10 A)  
**Project Type**         -  Capstone project  
**Project Domain**       -  NLP  
**Project Name**         -  Automatic Ticket Assignment  
**Submission Date**      -  17-May-2020      
**Submitted By**         -  Group10A  
**Delivery Type**        -  Milestone 2,3

## The Real Problem
One of the key activities of any IT function is to “keep the lights on” so that there is no impact on business operations. IT uses the Incident Management process to achieve this objective. An incident is an unplanned interruption to an IT service, or a reduction in the quality of an IT service, that affects users and the business. The main goal of the Incident Management process is to provide a quick fix, workaround or solution that resolves the interruption and restores the service to full capacity, so that there is no business impact.

In most organizations, incidents are created by various business and IT users, by end users and vendors if they have access to the ticketing system, and by integrated monitoring systems and tools. Assigning each incident to the appropriate person or unit in the support team is critical for improving user satisfaction and for allocating support resources well. In many IT organizations, the assignment of incidents to the appropriate IT groups is still a manual process.

Manual assignment of incidents is time consuming and requires human effort. Human error leads to mistakes, and misaddressed incidents waste resources. Manual assignment also increases response and resolution times, which degrades user satisfaction and customer service.

# Business Domain Value
In the support process, incoming incidents are analyzed and assessed by the organization’s support teams to fulfill the request. In many organizations, better allocation and effective usage of valuable support resources directly results in substantial cost savings.

Currently, incidents are created by various stakeholders (Business Users, IT Users and Monitoring Tools) within the IT Service Management tool and are assigned to Service Desk teams (L1 / L2 teams). These teams review the incidents for correct ticket categorization and priority, and then carry out an initial diagnosis to see whether they can resolve them. Around 54% of the incidents are resolved by L1 / L2 teams. If L1 / L2 is unable to resolve an incident, it escalates or assigns the ticket to the Functional teams from Applications and Infrastructure (L3 teams). Some incidents are assigned directly to L3 teams by either monitoring tools or callers / requestors. L3 teams carry out a detailed diagnosis and resolve the incidents. Around 56% of incidents are resolved by Functional / L3 teams. If vendor support is needed, they reach out to the vendor to close the incident.

L1 / L2 teams need to spend time reviewing Standard Operating Procedures (SOPs) before assigning incidents to Functional teams: at least 25-30% of incidents need an SOP review before ticket assignment, and about 15 minutes is spent on the SOP review for each incident. At least ~1 FTE of effort is needed for incident assignment to L3 teams alone.

During assignment by L1 / L2 teams to functional groups, there were multiple instances of incidents being assigned to the wrong functional group. Around 25% of incidents are wrongly assigned to functional teams, and additional effort is needed for Functional teams to re-assign them to the right groups. During this process, some incidents sit in a queue and are not addressed in time, resulting in poor customer service.

Guided by powerful AI techniques that classify incidents to the right functional groups, organizations can reduce the resolution time of issues and focus on more productive tasks.

# Project Description

In this capstone project, the goal is to build a classifier that can classify tickets by analysing their text. Details about the data and the dataset files are given at the link below: https://drive.google.com/file/d/1OZNJm81JXucV3HmZroMq6qCT2m7ez7IJ 

## Milestone 1: 
● Pre-Processing, Data Visualisation and EDA Overview  
● Exploring the given Data files  
● Understanding the structure of data  
● Missing points in data  
● Finding inconsistencies in the data  
● Visualizing different patterns   
● Visualizing different text features  
● Dealing with missing values  
● Text preprocessing   
● Creating word vocabulary from the corpus of report text data  
● Creating tokens as required   
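
The last two bullets (vocabulary and tokens) can be sketched in plain Python. This is a minimal illustration only, not the notebook's implementation, which relies on gensim's `simple_preprocess` and the Keras `Tokenizer`. The helper names `tokenize`, `build_vocab` and `encode`, and the sample tickets, are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens of 3 or more characters."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 3]

def build_vocab(docs, max_words=None):
    """Map each token to an integer id, most frequent first; id 0 is left free for padding."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    most_common = counts.most_common(max_words)
    return {tok: i for i, (tok, _) in enumerate(most_common, start=1)}

def encode(text, vocab):
    """Convert text to a list of ids, dropping tokens outside the vocabulary."""
    return [vocab[t] for t in tokenize(text) if t in vocab]

# Hypothetical sample tickets, for illustration only
tickets = [
    "Unable to login to VPN",
    "VPN connection drops after login",
    "Printer not working",
]
vocab = build_vocab(tickets)
print(vocab)
print(encode("login to the VPN failed", vocab))
```

Words that never appeared in the corpus ("failed" here) are simply dropped at encoding time, which is why the vocabulary should be built from the full training corpus.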

## Milestone 2:
● Model Building Overview  
● Building a model architecture which can classify.  
● Trying different model architectures by researching state of the art for similar tasks.  
● Train the model  
● To deal with large training time, save the weights so that you can use them when training the model for the second time without starting from scratch.  
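
The last bullet, saving weights so that a later run can resume rather than restart, can be illustrated without any deep-learning library. The sketch below is a toy under stated assumptions: a one-parameter least-squares model trained by gradient descent, a hypothetical `train` helper, and a JSON checkpoint file. In Keras the same idea is usually served by the `ModelCheckpoint` callback and `model.load_weights`.

```python
import json
import os
import tempfile

def train(xs, ys, epochs, ckpt_path, lr=0.05):
    """Fit y ~ w * x by gradient descent, resuming from ckpt_path if it exists."""
    state = {"epoch": 0, "w": 0.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)          # resume instead of starting from scratch
    for epoch in range(state["epoch"], epochs):
        w = state["w"]
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        state = {"epoch": epoch + 1, "w": w - lr * grad}
        with open(ckpt_path, "w") as f:
            json.dump(state, f)           # checkpoint after every epoch
    return state

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data with true slope 2
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

first = train(xs, ys, epochs=10, ckpt_path=ckpt)    # initial run
resumed = train(xs, ys, epochs=30, ckpt_path=ckpt)  # continues from epoch 10
print(first["epoch"], resumed["epoch"], round(resumed["w"], 3))
```

The second call performs only the 20 remaining epochs because it starts from the saved state.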

## Milestone 3:  
● Test the Model, Fine-tuning and Repeat Overview  
● Test the model and report as per evaluation metrics.  
● Try different models.  
● Try different evaluation metrics.   
● Set different hyperparameters for these models to fine-tune them, by trying different optimizers, loss functions, epochs, learning rates, batch sizes, checkpointing, early stopping, etc.  
● Report evaluation metrics for these models along with your observation on how changing different hyper parameters leads to change in the final evaluation metric.  
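
Because the ticket groups are heavily imbalanced, the choice of evaluation metric matters. The following sketch computes accuracy, macro-averaged F1 and Cohen's kappa from scratch on a small hypothetical label set. The notebook itself relies on `sklearn.metrics`, so this is only meant to show what those numbers measure; the function names and the labels are assumptions made for the example.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance agreement."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)

# Hypothetical labels: a classifier that always predicts the majority group
y_true = ["GRP_0"] * 8 + ["GRP_1", "GRP_2"]
y_pred = ["GRP_0"] * 10

print(accuracy(y_true, y_pred))     # high, driven by the majority group
print(macro_f1(y_true, y_pred))     # low, the two rare groups score zero
print(cohen_kappa(y_true, y_pred))  # zero, no better than chance
```

Note that the notebook's `metric_update` passes `labels=np.unique(y_pred)` to the scikit-learn scorers, which restricts the macro average to groups the model actually predicted. The sketch averages over every group, which gives a stricter score for a model that ignores rare groups.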

In [11]:
# # Let us first start with the date to note down timing
# import warnings
# warnings.filterwarnings('always')

from datetime import datetime
def time_now():
    return datetime.now().strftime("%H:%M:%S")

def print_msg(*msg):
    print(time_now(),":",*msg)

## Import all the required modules
print_msg("importing modules")

## most used packages
import numpy as np
import pandas as pd
import re,os,io,json

### NLP packages
import nltk
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords,wordnet
from nltk.stem import WordNetLemmatizer

## Keras packages
from tensorflow.keras.preprocessing.text import Tokenizer,tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import LSTM, Activation, Dense, Dropout, Input ,Flatten,BatchNormalization
from tensorflow.keras.layers import Embedding,GlobalMaxPool1D,Bidirectional,SpatialDropout1D
from tensorflow.keras.models import Sequential,load_model

# sklearn packages
from sklearn.feature_extraction.text import CountVectorizer,TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier   
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,BaggingClassifier,GradientBoostingClassifier
from sklearn import metrics,preprocessing,svm
from sklearn.model_selection import train_test_split

    
print_msg("completed importing modules")

print_msg("loading functions")
          
## Functions to loading data, build models and data preprocessing
## New Log table to capture metrics from different model iterations 
try:
    len(log)
except NameError:  # first run: the log table does not exist yet
    print("create log table")
    log_cols = ["groups","model_name","model_column","data_set", "Accuracy","Precision Score","Recall Score","F1-Score","kappa_score"]
    log = pd.DataFrame(columns=log_cols)

# Function to capture metrics in the log table, save table to log.xlsx file with every run
def metric_update(y_test,y_pred):
    global model_column,model_name,data_set
    global log,itr_cnt
    accuracy = metrics.accuracy_score(y_test,y_pred)
    precision = metrics.precision_score(y_test,y_pred,average='macro',labels=np.unique(y_pred))
    recall = metrics.recall_score(y_test,y_pred,average='macro',labels=np.unique(y_pred))
    f1_score = metrics.f1_score(y_test,y_pred,average='macro',labels=np.unique(y_pred))
    kappa_score=metrics.cohen_kappa_score(y_test,y_pred)
    col_data=[msg_grp,model_name,model_column,data_set,accuracy,precision,recall,f1_score,kappa_score]
    log_entry = pd.DataFrame([col_data], columns=log_cols)
    log = pd.concat([log, log_entry], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
    itr_cnt=itr_cnt+1
    print_msg("completed iteration ={}".format(itr_cnt))
    print_msg(metrics.classification_report(y_test,y_pred))


# Function to load data and perform preprocessing,add new columns in dataframe    
def load_data(file,prefix):        
    print_msg("loading file=",file)
    df = pd.read_excel(file)
    df= df.drop("Caller" , axis=1)
    df.columns=["short_description","long_description","assigned_group"]
    df["combined_description"]=df["short_description"]+" "+df["long_description"]
    df.dropna(inplace=True) ## Not many null ,so can safely drop the rows
    df,name=preprocess_column(df,"short_description")
    df,name=preprocess_column(df,"combined_description")
    df["assigned_group_org"]=df["assigned_group"]
    df.dropna(subset=['combined_description_text'],inplace=True)
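    # fall back to the combined text where the short description is empty (.loc avoids chained assignment)
    missing = df["short_description_text"].isnull()
    df.loc[missing, "short_description_text"] = df.loc[missing, "combined_description_text"]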
    df.to_excel(prefix+file)
    return df
    

# Data preprocessing 
# When we set the flag deacc=True , the function removes punctuations also 
def sent_to_words(sentences):
    for sentence in sentences:
        yield(simple_preprocess(str(sentence), deacc=True , min_len=3, max_len=20))  # deacc=True removes punctuations

# covert list to sentences
def words_to_sent(sentences):
    final=[]
    for sentence in sentences:
        local=" ".join(sentence)
        final.append(local)
    return final

## Stopword and duplicate word removal
stop_words = set(stopwords.words('english'))
stop_words.update(['from', 'subject', 're', 'edu', 'use', 'received'])

def remove_stopwords_duplicate(texts):
    final=[]
    for doc in texts:
        local=[]
        for word in simple_preprocess(str(doc)):
            # keep the first occurrence of each word that is not a stopword
            if word not in stop_words and word not in local:
                local.append(word)
        final.append(local)
    return final

## find POS for each word for lemmatizer
def get_wordnet_pos(word):
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ, "N": wordnet.NOUN,  "V": wordnet.VERB, "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def preprocess_document(documents):
    ## Convert to lower case
    documents = [sent.lower() for sent in documents]
    # Remove emails (raw strings avoid invalid-escape warnings)
    documents = [re.sub(r'\S*@\S*\s?', ' ', sent) for sent in documents]
    # Remove new line characters
    documents = [re.sub(r'\s+', ' ', sent) for sent in documents]
    # Remove _
    documents = [re.sub('_', ' ', sent) for sent in documents]
    # Remove numbers
    documents = [re.sub(r'\d+', ' ', sent) for sent in documents]
    # Remove distracting single quotes
    documents = [re.sub(r"'", ' ', sent) for sent in documents]
    # Remove all non-word characters
    documents = [re.sub(r'\W', ' ', sent) for sent in documents]
    
    document_words = list(sent_to_words(documents))
    document_words = remove_stopwords_duplicate(document_words)
    
    # Init Lemmatizer
    lemmatizer = WordNetLemmatizer()
#     print_msg("lemmatization started")
    hl_lemmatized = []
    for tokens in document_words:
        lemm = [lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in tokens]
        hl_lemmatized.append(lemm)
    document_words=hl_lemmatized   
#     print_msg("lemmatization ended")
    
    return document_words

def preprocess_column(df,column_name):
    new_column=column_name+"_list"
    new_column1=column_name+"_text"
    if new_column in df.columns:
        print_msg("pre-processing was already done")
    else:
        print_msg("process started for "+column_name)
        documents = df[column_name].values.tolist()
        document_words=preprocess_document(documents)
        df[new_column]=document_words
        df[new_column1]=words_to_sent(document_words)
        print_msg("process finished for "+column_name)
    return df,new_column

## sklearn models with default parameters
def sklearn_model(column_name):
    global model_column
    global model_name
    global data_set
    print_msg("fitting all classic models with column_name=",column_name)
    model_column=column_name
    
    X = df[model_column]
    y = df.assigned_group.astype('category')
 
    vectorizer = CountVectorizer()
    X_bow = vectorizer.fit_transform(X)
    
    tfidf_transformer = TfidfTransformer()
    X_tfidf = tfidf_transformer.fit_transform(X_bow)

    model = LogisticRegression()
    model_name = "LogisticRegression"
    sklearn_model_fit(model,X_bow,X_tfidf,y)

    model = DecisionTreeClassifier(criterion = 'entropy' )
    model_name ="DecisionTreeClassifier"
    sklearn_model_fit(model,X_bow,X_tfidf,y)

    model= RandomForestClassifier()
    model_name = "RandomForestClassifier"
    sklearn_model_fit(model,X_bow,X_tfidf,y)

    model = AdaBoostClassifier(n_estimators= 20)
    model_name = "AdaBoostClassifier"
    sklearn_model_fit(model,X_bow,X_tfidf,y)

    model = BaggingClassifier()
    model_name = "BaggingClassifier"
    sklearn_model_fit(model,X_bow,X_tfidf,y)
   
#     model = GradientBoostingClassifier()
#     name = "GradientBoostingClassifier"
#     classic_model_fit(name,model,X_bow,X_tfidf,y)

    model = GaussianNB()
    X_array_bow  = X_bow.toarray()
    X_array_tfidf  = X_tfidf.toarray()
    model_name = "GaussianNB"
    sklearn_model_fit(model,X_array_bow,X_array_tfidf,y)

    model = svm.SVC()
    model_name = "svm.svc"
    sklearn_model_fit(model,X_bow,X_tfidf,y)
    
def sklearn_model_fit(model,X_bow,X_tfidf,y):
    global model_column,model_name,data_set
    data_set="Bow"
    X_train, X_test, y_train, y_test = train_test_split(X_bow,y, test_size=0.10, random_state=42,stratify=y)
    print_msg("working on",msg_grp,model_column,model_name,data_set)
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    metric_update(y_test,y_pred)
    
    data_set="TFIDF"
    print_msg("working on",msg_grp,model_column,model_name,data_set)
    X_train, X_test, y_train, y_test = train_test_split(X_tfidf,y, test_size=0.10, random_state=42,stratify=y)
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    metric_update(y_test,y_pred) 
    
try:
    len(embeddings)
except NameError:  # first run: no embeddings cached yet
    embeddings = {}
    
def embedding(glove_file):
    global embeddings
    
    if len(embeddings) < 1000:
        print_msg("embedding length",len(embeddings))
        print_msg("Building Embeddings from "+glove_file)

        # the context manager closes the file once all vectors are read
        with open(glove_file,encoding="utf8") as f:
            for line in f:
                values = line.rstrip().split(" ")
                # first token is the word, the rest is its vector
                embeddings[values[0]] = np.asarray(values[1:], dtype='float32')
    else:
        print_msg("Embeddings from ",glove_file,"already exists")
    print_msg("No. of embeddings = ", len(embeddings))  

def keras_model(column_name,epoch=5,fit=0):
    global model_column,model_name,data_set
    model_column=column_name
    model_name="Keras LSTM Model"
    data_set="processed"
    
    print_msg("working on",msg_grp,model_column,model_name,data_set)
    
    max_features = 10000
    #epoch = 20
    batch_size = 100
    
    documents = df[model_column].values.tolist()
    max_allowed=40   # hard cap on the sequence length, in tokens
    maxlen = max(df[model_column].apply(lambda x: len(x)))
    #print_msg("maxlen before=",maxlen) 
    if maxlen>max_allowed:
        maxlen=max_allowed
    print_msg("maxlen after=",maxlen)    

    tokenizer = Tokenizer(num_words=max_features)
    tokenizer.fit_on_texts(list(documents))
    sequence = tokenizer.texts_to_sequences(documents)
    tokenizer_json = tokenizer.to_json()
    with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
        f.write(json.dumps(tokenizer_json, ensure_ascii=False))
        
    word_index = tokenizer.word_index
    vocab_size = len(word_index)+1
    
    print_msg("My vocabulary size = ",vocab_size)
    X = pad_sequences(sequence, maxlen = maxlen, padding='post',truncating='post') 
    
    groups = list(df.assigned_group.unique())
    with open('group.txt', 'w') as filehandle:
        for listitem in groups:
            filehandle.write('%s\n' % listitem)
        
    le = preprocessing.LabelEncoder()
    le.fit(groups)
    y=to_categorical(le.transform(df.assigned_group))   

    X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.1, random_state = 42,stratify=y)   
    
    glove_file='glove.6B.200d.txt'
    embedding(glove_file)
    embedding_size=embeddings['the'].shape[0]
    
    embedding_matrix = np.zeros((vocab_size, embedding_size))
    for word, i in tokenizer.word_index.items():
        embedding_vector = embeddings.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
        
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_size, weights = [embedding_matrix],input_length=maxlen))
    model.add(SpatialDropout1D(0.2))
    model.add(BatchNormalization())
    model.add(Bidirectional(LSTM(df.assigned_group.nunique()*2, return_sequences = True,recurrent_dropout=0.1, dropout=0.1)))
    model.add(GlobalMaxPool1D())
    model.add(Dropout(0.4))
    model.add(BatchNormalization())
    model.add(Dense(df.assigned_group.nunique()*2, activation="relu"))
    model.add(Dropout(0.4))
    model.add(BatchNormalization())
    model.add(Dense(df.assigned_group.nunique(), activation="softmax"))

    model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['acc'])
    model.summary()   # summary() prints the table itself and returns None
    
    if fit>0:
        print_msg('running model.fit')
        history = model.fit(X_train, y_train, epochs = epoch, batch_size=batch_size, validation_split=0.05,shuffle= True,verbose = 1)
        model.save('my_model.h5')
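        # predict_classes is no longer available in recent TensorFlow releases; take the argmax of the softmax output
        y_predict=np.argmax(model.predict(X_test),axis=1)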
        y_test1=np.argmax(y_test,axis=1)
        
        metric_update(y_test1, y_predict)
        
def model_load():
    with open('tokenizer.json') as f:
        data = json.load(f)
    tokenizer = tokenizer_from_json(data)
    model = load_model('my_model.h5')
    
    groups = []
    with open('group.txt', 'r') as filehandle:
        for line in filehandle:
            # strip only the trailing newline written by keras_model
            groups.append(line.rstrip("\n"))
        
    le = preprocessing.LabelEncoder()
    le.fit(groups)
    
    return le,tokenizer,model

def model_predict(text):
    document=[text]
    documents=preprocess_document(document)
#     print_msg(documents)
    sequence = tokenizer.texts_to_sequences(documents)
    maxlen=model.input.shape[1]
    X = pad_sequences(sequence, maxlen = maxlen, padding='post',truncating='post')
#     print_msg(X)
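    # predict_classes is no longer available in recent TensorFlow releases; take the argmax of the softmax output
    y=np.argmax(model.predict(X),axis=1)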
    print_msg("text=",document,"       predicted_class=",le.inverse_transform(y))

print_msg("Imported modules and new functions completed")

17:33:24 : importing modules
17:33:24 : completed importing modules
17:33:24 : loading functions
17:33:24 : Imported modules and new functions completed


In [None]:
df.to_excel("1657data.xlsx",index=False)

In [2]:
os.chdir("C:\\work\\capstone")
file="1657data.xlsx"
df=pd.read_excel(file)
# rebuild the token lists, because Excel stores them as plain strings
df["short_description_list"]=list(sent_to_words(df.short_description_text.values.tolist()))
df["combined_description_list"]=list(sent_to_words(df.combined_description_text.values.tolist()))
# df=load_data("data.xlsx","1657")   # load data and backup processed data into a file prefix with 1657

In [12]:
# Start model building with all 74 groups, then reduce the number of groups by aggregating low-sample groups into one large group (GRP_74)
## Perform 3 iterations with all models
itr_cnt=0   
for i in [2,30,100]:  
    df["assigned_group"]=df["assigned_group_org"]
    f=( df.assigned_group.value_counts() < i ) 
    j=f[f].index
    df.loc[df.assigned_group.isin(j),"assigned_group"]="GRP_74"   # .loc avoids chained assignment
    new_groups=df.assigned_group.nunique()
    # note: the value printed here is the number of distinct groups after merging
    print_msg("total count for GRP_74={}".format(new_groups))
    msg_grp="grp_74 with <{} rows, total groups={}".format(i,new_groups)
    model_column=" "
    model_name=" "
    data_set=" "
    sklearn_model("short_description_text")
#     sklearn_model("combined_description_text")
    keras_model("short_description_list",20,1)
#     keras_model("combined_description_list",20,1)
    log.to_excel("log.xlsx")

17:33:31 : total count for GRP_74=69
17:33:31 : fitting all classic models with column_name= short_description_text
17:33:31 : working on grp_74 with <2 rows, total groups=69 short_description_text LogisticRegression Bow




17:33:32 : completed iteration =1
17:33:32 :               precision    recall  f1-score   support

       GRP_0       0.72      0.96      0.82       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.71      0.36      0.48        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.65      0.65      0.65        26
      GRP_13       0.42      0.33      0.37        15
      GRP_14       0.40      0.33      0.36        12
      GRP_15       1.00      0.25      0.40         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.50      0.33      0.40         9
      GRP_19       0.50      0.14      0.21        22
       GRP_2       0.55      0.25      0.34        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:34 : completed iteration =2
17:33:34 :               precision    recall  f1-score   support

       GRP_0       0.62      0.99      0.76       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.80      0.29      0.42        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.69      0.42      0.52        26
      GRP_13       0.44      0.27      0.33        15
      GRP_14       0.80      0.33      0.47        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       1.00      0.11      0.20         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.33      0.08      0.13        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       1.00      0.50   



17:33:34 : completed iteration =3
17:33:35 :               precision    recall  f1-score   support

       GRP_0       0.69      0.83      0.76       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.40      0.43      0.41        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.42      0.54      0.47        26
      GRP_13       0.10      0.07      0.08        15
      GRP_14       0.50      0.25      0.33        12
      GRP_15       0.50      0.25      0.33         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.50      0.22      0.31         9
      GRP_19       0.31      0.23      0.26        22
       GRP_2       0.38      0.33      0.36        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:36 : completed iteration =4
17:33:36 :               precision    recall  f1-score   support

       GRP_0       0.72      0.84      0.78       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.50      0.36      0.42        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.52      0.54      0.53        26
      GRP_13       0.12      0.13      0.12        15
      GRP_14       0.40      0.33      0.36        12
      GRP_15       1.00      0.25      0.40         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.22      0.22      0.22         9
      GRP_19       0.10      0.09      0.10        22
       GRP_2       0.44      0.29      0.35        24
      GRP_20       0.33      0.25      0.29         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:39 : completed iteration =6
17:33:39 :               precision    recall  f1-score   support

       GRP_0       0.68      0.93      0.78       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.57      0.29      0.38        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.61      0.65      0.63        26
      GRP_13       0.31      0.27      0.29        15
      GRP_14       0.67      0.33      0.44        12
      GRP_15       0.25      0.25      0.25         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.40      0.22      0.29         9
      GRP_19       0.43      0.14      0.21        22
       GRP_2       0.43      0.12      0.19        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:42 : completed iteration =8
17:33:42 :               precision    recall  f1-score   support

       GRP_0       0.53      1.00      0.69       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        15
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:46 : completed iteration =9
17:33:46 :               precision    recall  f1-score   support

       GRP_0       0.71      0.90      0.79       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.33      0.29      0.31        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.57      0.65      0.61        26
      GRP_13       0.40      0.27      0.32        15
      GRP_14       0.43      0.25      0.32        12
      GRP_15       0.33      0.50      0.40         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.40      0.22      0.29         9
      GRP_19       0.33      0.23      0.27        22
       GRP_2       0.40      0.25      0.31        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00   



17:33:53 : completed iteration =10
17:33:53 :               precision    recall  f1-score   support

       GRP_0       0.70      0.93      0.80       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.50      0.29      0.36        14
      GRP_11       1.00      0.33      0.50         3
      GRP_12       0.59      0.73      0.66        26
      GRP_13       0.29      0.27      0.28        15
      GRP_14       0.50      0.33      0.40        12
      GRP_15       0.67      0.50      0.57         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.30      0.33      0.32         9
      GRP_19       0.31      0.23      0.26        22
       GRP_2       0.44      0.17      0.24        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       1.00      0.33      0.50         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00  



17:33:53 : working on grp_74 with <2 rows, total groups=69 short_description_text GaussianNB Bow




17:34:00 : completed iteration =11
17:34:00 :               precision    recall  f1-score   support

       GRP_0       0.62      0.34      0.43       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.62      0.36      0.45        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.50      0.38      0.43        26
      GRP_13       0.44      0.47      0.45        15
      GRP_14       0.33      0.42      0.37        12
      GRP_15       0.29      0.50      0.36         4
      GRP_16       0.14      0.22      0.17         9
      GRP_17       0.88      0.88      0.88         8
      GRP_18       0.33      0.22      0.27         9
      GRP_19       0.13      0.18      0.15        22
       GRP_2       0.14      0.25      0.18        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.14      0.33      0.20         3
      GRP_23       0.00      0.00  



17:34:07 : completed iteration =12
17:34:07 :               precision    recall  f1-score   support

       GRP_0       0.62      0.34      0.44       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.56      0.36      0.43        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.40      0.38      0.39        26
      GRP_13       0.50      0.47      0.48        15
      GRP_14       0.38      0.42      0.40        12
      GRP_15       0.29      0.50      0.36         4
      GRP_16       0.14      0.22      0.17         9
      GRP_17       0.88      0.88      0.88         8
      GRP_18       0.40      0.22      0.29         9
      GRP_19       0.19      0.27      0.23        22
       GRP_2       0.14      0.25      0.18        24
      GRP_20       0.10      0.25      0.14         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.17      0.33      0.22         3
      GRP_23       0.00      0.00  



17:34:19 : completed iteration =14
17:34:19 :               precision    recall  f1-score   support

       GRP_0       0.47      1.00      0.64       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        15
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.00      0.00      0.00         9
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_21       0.00      0.00      0.00         3
      GRP_22       0.00      0.00      0.00         3
      GRP_23       0.00      0.00  



 : working on grp_74 with <2 rows, total groups=69 short_description_list Keras LSTM Model processed
17:34:19 : maxlen after= 40
17:34:19 : My vocabulary size =  5258
17:34:20 : Embeddings from  glove.6B.200d.txt already exists
17:34:20 : No. of embeddings =  400000
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 40, 200)           1051600   
_________________________________________________________________
spatial_dropout1d_2 (Spatial (None, 40, 200)           0         
_________________________________________________________________
batch_normalization_6 (Batch (None, 40, 200)           800       
_________________________________________________________________
bidirectional_2 (Bidirection (None, 40, 276)           374256    
_________________________________________________________________
global_max_pooling1d_2 (Glob (None, 276)           



17:47:22 : total count for GRP_74=36
17:47:22 : fitting all classic models with column_name= short_description_text
17:47:22 : working on grp_74 with <30 rows, total groups=36 short_description_text LogisticRegression Bow




17:47:23 : completed iteration =16
17:47:23 :               precision    recall  f1-score   support

       GRP_0       0.71      0.96      0.81       397
       GRP_1       1.00      0.33      0.50         3
      GRP_10       0.70      0.50      0.58        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.62      0.50      0.55        26
      GRP_13       0.56      0.36      0.43        14
      GRP_14       0.71      0.42      0.53        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.80      0.50      0.62         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.33      0.11      0.17         9
      GRP_19       0.62      0.24      0.34        21
       GRP_2       0.43      0.25      0.32        24
      GRP_20       1.00      0.25      0.40         4
      GRP_22       1.00      0.33      0.50         3
      GRP_24       0.83      0.86      0.85        29
      GRP_25       0.50      0.17  



17:47:24 : completed iteration =17
17:47:24 :               precision    recall  f1-score   support

       GRP_0       0.61      0.98      0.75       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       1.00      0.43      0.60        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.42      0.31      0.36        26
      GRP_13       0.75      0.21      0.33        14
      GRP_14       1.00      0.25      0.40        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.50      0.12      0.20         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.50      0.10      0.16        21
       GRP_2       0.29      0.08      0.13        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.87      0.69      0.77        29
      GRP_25       0.00      0.00  



17:47:25 : completed iteration =18
17:47:25 :               precision    recall  f1-score   support

       GRP_0       0.68      0.84      0.75       397
       GRP_1       1.00      0.33      0.50         3
      GRP_10       0.50      0.50      0.50        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.40      0.46      0.43        26
      GRP_13       0.50      0.36      0.42        14
      GRP_14       0.50      0.33      0.40        12
      GRP_15       1.00      0.25      0.40         4
      GRP_16       0.67      0.50      0.57         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.40      0.22      0.29         9
      GRP_19       0.13      0.10      0.11        21
       GRP_2       0.31      0.21      0.25        24
      GRP_20       0.50      0.25      0.33         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.80      0.69      0.74        29
      GRP_25       0.30      0.25  



17:47:26 : completed iteration =19
17:47:26 :               precision    recall  f1-score   support

       GRP_0       0.70      0.85      0.77       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.43      0.43      0.43        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.25      0.19      0.22        26
      GRP_13       0.38      0.36      0.37        14
      GRP_14       0.50      0.33      0.40        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.60      0.38      0.46         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.33      0.33      0.33         9
      GRP_19       0.12      0.10      0.11        21
       GRP_2       0.21      0.17      0.19        24
      GRP_20       0.50      0.25      0.33         4
      GRP_22       0.25      0.33      0.29         3
      GRP_24       0.86      0.83      0.84        29
      GRP_25       0.38      0.42  



17:47:27 : completed iteration =20
17:47:27 :               precision    recall  f1-score   support

       GRP_0       0.68      0.91      0.78       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.78      0.50      0.61        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.42      0.50      0.46        26
      GRP_13       0.33      0.21      0.26        14
      GRP_14       0.67      0.33      0.44        12
      GRP_15       0.67      0.50      0.57         4
      GRP_16       0.40      0.25      0.31         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.50      0.33      0.40         9
      GRP_19       0.27      0.14      0.19        21
       GRP_2       0.21      0.12      0.16        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.87      0.69      0.77        29
      GRP_25       0.38      0.42  



17:47:29 : completed iteration =21
17:47:29 :               precision    recall  f1-score   support

       GRP_0       0.67      0.94      0.78       397
       GRP_1       1.00      0.33      0.50         3
      GRP_10       0.64      0.50      0.56        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.41      0.42      0.42        26
      GRP_13       0.50      0.43      0.46        14
      GRP_14       1.00      0.42      0.59        12
      GRP_15       1.00      0.25      0.40         4
      GRP_16       0.50      0.12      0.20         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.56      0.24      0.33        21
       GRP_2       0.44      0.17      0.24        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.92      0.76      0.83        29
      GRP_25       0.67      0.17  



17:47:30 : completed iteration =22
17:47:30 :               precision    recall  f1-score   support

       GRP_0       0.54      0.99      0.70       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.67      0.25      0.36         8
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        21
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.50      0.33      0.40         3
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00  



17:47:31 : completed iteration =23
17:47:31 :               precision    recall  f1-score   support

       GRP_0       0.55      0.99      0.71       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.67      0.25      0.36         8
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        21
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       1.00      0.33      0.50         3
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00  



17:47:36 : completed iteration =24
17:47:36 :               precision    recall  f1-score   support

       GRP_0       0.68      0.89      0.77       397
       GRP_1       1.00      0.33      0.50         3
      GRP_10       0.50      0.57      0.53        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.44      0.42      0.43        26
      GRP_13       0.67      0.43      0.52        14
      GRP_14       1.00      0.42      0.59        12
      GRP_15       0.50      0.25      0.33         4
      GRP_16       1.00      0.38      0.55         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       1.00      0.22      0.36         9
      GRP_19       0.23      0.14      0.18        21
       GRP_2       0.38      0.33      0.36        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.81      0.59      0.68        29
      GRP_25       0.57      0.33  



17:47:42 : completed iteration =25
17:47:42 :               precision    recall  f1-score   support

       GRP_0       0.68      0.93      0.79       397
       GRP_1       1.00      0.33      0.50         3
      GRP_10       0.78      0.50      0.61        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.44      0.46      0.45        26
      GRP_13       0.50      0.43      0.46        14
      GRP_14       0.62      0.42      0.50        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       1.00      0.25      0.40         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.43      0.33      0.38         9
      GRP_19       0.40      0.19      0.26        21
       GRP_2       0.38      0.12      0.19        24
      GRP_20       0.33      0.25      0.29         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.85      0.79      0.82        29
      GRP_25       0.50      0.08  



17:47:43 : working on grp_74 with <30 rows, total groups=36 short_description_text GaussianNB Bow
17:47:47 : completed iteration =26
17:47:47 :               precision    recall  f1-score   support

       GRP_0       0.61      0.29      0.40       397
       GRP_1       0.07      0.33      0.12         3
      GRP_10       0.67      0.57      0.62        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.62      0.38      0.48        26
      GRP_13       0.50      0.50      0.50        14
      GRP_14       0.12      0.25      0.16        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.21      0.50      0.30         8
      GRP_17       0.89      1.00      0.94         8
      GRP_18       0.38      0.33      0.35         9
      GRP_19       0.14      0.24      0.17        21
       GRP_2       0.13      0.25      0.17        24
      GRP_20       0.10      0.25      0.14         4
      GRP_22       0.10      0.67      0.17  



17:47:55 : completed iteration =28
17:47:55 :               precision    recall  f1-score   support

       GRP_0       0.47      1.00      0.64       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.00      0.00      0.00         8
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        21
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00  



17:47:59 : completed iteration =29
17:47:59 :               precision    recall  f1-score   support

       GRP_0       0.47      1.00      0.64       397
       GRP_1       0.00      0.00      0.00         3
      GRP_10       0.00      0.00      0.00        14
      GRP_11       0.00      0.00      0.00         3
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_15       0.00      0.00      0.00         4
      GRP_16       0.00      0.00      0.00         8
      GRP_17       0.00      0.00      0.00         8
      GRP_18       0.00      0.00      0.00         9
      GRP_19       0.00      0.00      0.00        21
       GRP_2       0.00      0.00      0.00        24
      GRP_20       0.00      0.00      0.00         4
      GRP_22       0.00      0.00      0.00         3
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00  



17:47:59 : My vocabulary size =  5258
17:48:00 : Embeddings from  glove.6B.200d.txt already exists
17:48:00 : No. of embeddings =  400000
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 40, 200)           1051600   
_________________________________________________________________
spatial_dropout1d_3 (Spatial (None, 40, 200)           0         
_________________________________________________________________
batch_normalization_9 (Batch (None, 40, 200)           800       
_________________________________________________________________
bidirectional_3 (Bidirection (None, 40, 144)           157248    
_________________________________________________________________
global_max_pooling1d_3 (Glob (None, 144)               0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 144)        



17:56:08 : working on grp_74 with <100 rows, total groups=17 short_description_text LogisticRegression Bow




17:56:08 : completed iteration =31
17:56:08 :               precision    recall  f1-score   support

       GRP_0       0.73      0.92      0.81       397
      GRP_10       1.00      0.29      0.44        14
      GRP_12       0.41      0.27      0.33        26
      GRP_13       0.29      0.14      0.19        14
      GRP_14       0.67      0.17      0.27        12
      GRP_19       0.33      0.05      0.08        22
       GRP_2       0.64      0.38      0.47        24
      GRP_24       0.90      0.90      0.90        29
      GRP_25       0.50      0.17      0.25        12
       GRP_3       0.45      0.25      0.32        20
      GRP_33       0.83      0.45      0.59        11
       GRP_4       0.00      0.00      0.00        10
       GRP_5       1.00      0.38      0.56        13
       GRP_6       0.57      0.22      0.32        18
      GRP_74       0.52      0.49      0.50       136
       GRP_8       0.56      0.89      0.69        66
       GRP_9       0.50      0.08  



17:56:09 : completed iteration =32
17:56:09 :               precision    recall  f1-score   support

       GRP_0       0.69      0.94      0.80       397
      GRP_10       1.00      0.21      0.35        14
      GRP_12       0.40      0.23      0.29        26
      GRP_13       0.50      0.07      0.12        14
      GRP_14       0.00      0.00      0.00        12
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.50      0.17      0.25        24
      GRP_24       0.92      0.76      0.83        29
      GRP_25       0.00      0.00      0.00        12
       GRP_3       1.00      0.05      0.10        20
      GRP_33       1.00      0.27      0.43        11
       GRP_4       0.00      0.00      0.00        10
       GRP_5       0.83      0.38      0.53        13
       GRP_6       0.67      0.22      0.33        18
      GRP_74       0.47      0.47      0.47       136
       GRP_8       0.55      0.86      0.67        66
       GRP_9       1.00      0.04  



17:56:09 : completed iteration =33
17:56:09 :               precision    recall  f1-score   support

       GRP_0       0.74      0.83      0.78       397
      GRP_10       0.62      0.36      0.45        14
      GRP_12       0.37      0.38      0.38        26
      GRP_13       0.14      0.07      0.10        14
      GRP_14       0.00      0.00      0.00        12
      GRP_19       0.25      0.14      0.18        22
       GRP_2       0.40      0.42      0.41        24
      GRP_24       0.75      0.83      0.79        29
      GRP_25       0.50      0.33      0.40        12
       GRP_3       0.33      0.20      0.25        20
      GRP_33       0.20      0.09      0.13        11
       GRP_4       0.50      0.40      0.44        10
       GRP_5       0.73      0.62      0.67        13
       GRP_6       0.31      0.28      0.29        18
      GRP_74       0.44      0.40      0.42       136
       GRP_8       0.60      0.80      0.68        66
       GRP_9       0.33      0.24  



17:56:11 : completed iteration =35
17:56:11 :               precision    recall  f1-score   support

       GRP_0       0.71      0.90      0.79       397
      GRP_10       0.71      0.36      0.48        14
      GRP_12       0.38      0.38      0.38        26
      GRP_13       0.25      0.07      0.11        14
      GRP_14       0.50      0.08      0.14        12
      GRP_19       0.30      0.14      0.19        22
       GRP_2       0.53      0.38      0.44        24
      GRP_24       0.82      0.79      0.81        29
      GRP_25       0.50      0.08      0.14        12
       GRP_3       0.33      0.15      0.21        20
      GRP_33       0.40      0.18      0.25        11
       GRP_4       0.50      0.10      0.17        10
       GRP_5       0.80      0.62      0.70        13
       GRP_6       0.32      0.33      0.32        18
      GRP_74       0.50      0.36      0.42       136
       GRP_8       0.59      0.82      0.69        66
       GRP_9       0.54      0.28  



17:56:14 : completed iteration =38
17:56:14 :               precision    recall  f1-score   support

       GRP_0       0.55      0.99      0.70       397
      GRP_10       0.00      0.00      0.00        14
      GRP_12       0.25      0.04      0.07        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.00      0.00      0.00        24
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00      0.00        12
       GRP_3       0.00      0.00      0.00        20
      GRP_33       0.00      0.00      0.00        11
       GRP_4       0.00      0.00      0.00        10
       GRP_5       1.00      0.15      0.27        13
       GRP_6       0.43      0.17      0.24        18
      GRP_74       0.00      0.00      0.00       136
       GRP_8       0.50      0.88      0.63        66
       GRP_9       0.00      0.00  



17:56:18 : completed iteration =39
17:56:18 :               precision    recall  f1-score   support

       GRP_0       0.75      0.86      0.80       397
      GRP_10       0.83      0.36      0.50        14
      GRP_12       0.41      0.42      0.42        26
      GRP_13       0.25      0.14      0.18        14
      GRP_14       1.00      0.08      0.15        12
      GRP_19       0.27      0.14      0.18        22
       GRP_2       0.41      0.38      0.39        24
      GRP_24       0.79      0.76      0.77        29
      GRP_25       0.43      0.25      0.32        12
       GRP_3       0.53      0.40      0.46        20
      GRP_33       0.33      0.36      0.35        11
       GRP_4       0.40      0.20      0.27        10
       GRP_5       1.00      0.62      0.76        13
       GRP_6       0.33      0.22      0.27        18
      GRP_74       0.44      0.40      0.42       136
       GRP_8       0.59      0.82      0.68        66
       GRP_9       0.50      0.28  



17:56:31 : completed iteration =43
17:56:31 :               precision    recall  f1-score   support

       GRP_0       0.47      1.00      0.64       397
      GRP_10       0.00      0.00      0.00        14
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.00      0.00      0.00        24
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00      0.00        12
       GRP_3       0.00      0.00      0.00        20
      GRP_33       0.00      0.00      0.00        11
       GRP_4       0.00      0.00      0.00        10
       GRP_5       0.00      0.00      0.00        13
       GRP_6       0.00      0.00      0.00        18
      GRP_74       0.00      0.00      0.00       136
       GRP_8       0.00      0.00      0.00        66
       GRP_9       0.00      0.00  



17:56:35 : completed iteration =44
17:56:35 :               precision    recall  f1-score   support

       GRP_0       0.47      1.00      0.64       397
      GRP_10       0.00      0.00      0.00        14
      GRP_12       0.00      0.00      0.00        26
      GRP_13       0.00      0.00      0.00        14
      GRP_14       0.00      0.00      0.00        12
      GRP_19       0.00      0.00      0.00        22
       GRP_2       0.00      0.00      0.00        24
      GRP_24       0.00      0.00      0.00        29
      GRP_25       0.00      0.00      0.00        12
       GRP_3       0.00      0.00      0.00        20
      GRP_33       0.00      0.00      0.00        11
       GRP_4       0.00      0.00      0.00        10
       GRP_5       0.00      0.00      0.00        13
       GRP_6       0.00      0.00      0.00        18
      GRP_74       0.00      0.00      0.00       136
       GRP_8       0.00      0.00      0.00        66
       GRP_9       0.00      0.00  



17:56:35 : maxlen after= 40
17:56:35 : My vocabulary size =  5258
17:56:35 : Embeddings from  glove.6B.200d.txt already exists
17:56:35 : No. of embeddings =  400000
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 40, 200)           1051600   
_________________________________________________________________
spatial_dropout1d_4 (Spatial (None, 40, 200)           0         
_________________________________________________________________
batch_normalization_12 (Batc (None, 40, 200)           800       
_________________________________________________________________
bidirectional_4 (Bidirection (None, 40, 68)            63920     
_________________________________________________________________
global_max_pooling1d_4 (Glob (None, 68)                0         
_________________________________________________________________
dropout_8 (Dropout) 



In [22]:
log.sort_values(by=['kappa_score'],ascending=False).head(10) 

| Unnamed: 0 | groups | model_name | model_column | data_set | Accuracy | Precision Score | Recall Score | F1-Score | kappa_score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | grp_74 with <2 rows, total groups=69 | Keras LSTM Model | short_description_list | processed | 0.677267 | 0.525336 | 0.395995 | 0.427204 | 0.552353 |
| 0 | grp_74 with <2 rows, total groups=69 | LogisticRegression | short_description_text | Bow | 0.67609 | 0.590135 | 0.371567 | 0.426145 | 0.531342 |
| 0 | grp_74 with <2 rows, total groups=69 | LogisticRegression | short_description_text | Bow | 0.67609 | 0.590135 | 0.371567 | 0.426145 | 0.531342 |
| 0 | grp_74 with <100 rows, total groups=17 | BaggingClassifier | short_description_text | TFIDF | 0.663133 | 0.537706 | 0.39347 | 0.4307 | 0.523767 |
| 0 | grp_74 with <30 rows, total groups=36 | LogisticRegression | short_description_text | Bow | 0.669022 | 0.732875 | 0.400116 | 0.479534 | 0.518314 |
| 0 | grp_74 with <30 rows, total groups=36 | LogisticRegression | short_description_text | Bow | 0.669022 | 0.732875 | 0.400116 | 0.479534 | 0.518314 |
| 0 | grp_74 with <100 rows, total groups=17 | LogisticRegression | short_description_text | Bow | 0.664311 | 0.582486 | 0.355186 | 0.403217 | 0.511237 |
| 0 | grp_74 with <2 rows, total groups=69 | Keras LSTM Model | short_description_list | processed | 0.641932 | 0.489569 | 0.404149 | 0.4202 | 0.508811 |
| 0 | grp_74 with <30 rows, total groups=36 | Keras LSTM Model | short_description_list | processed | 0.656066 | 0.574363 | 0.389137 | 0.422317 | 0.506353 |
| 0 | grp_74 with <100 rows, total groups=17 | Keras LSTM Model | short_description_list | processed | 0.664311 | 0.686335 | 0.3804 | 0.415979 | 0.504596 |
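
The ranking above is sorted by `kappa_score` rather than accuracy. As a rough illustration of why the two can diverge on imbalanced classes, the sketch below computes Cohen's kappa in plain Python. The function names `cohen_kappa` and `accuracy` and the toy labels are assumptions made for this example; they are not the notebook's own code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    # Chance agreement: product of the two marginal frequencies, summed per class.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Only one class on both sides: kappa is undefined; 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels (hypothetical, not taken from the ticket data).
y_true = ["GRP_0"] * 5 + ["GRP_8"] * 3 + ["GRP_24"] * 2
always_grp0 = ["GRP_0"] * 10  # a model that only ever predicts the majority class
mixed = ["GRP_0"] * 5 + ["GRP_8", "GRP_8", "GRP_0"] + ["GRP_24", "GRP_0"]

print(accuracy(y_true, always_grp0), cohen_kappa(y_true, always_grp0))
print(accuracy(y_true, mixed), cohen_kappa(y_true, mixed))
```

In the toy data the majority-only predictor is right half the time, yet its kappa is zero because that agreement is exactly what chance would produce. Several iterations in the logs above appear to behave the same way (GRP_0 at 0.47 precision and 1.00 recall, every other visible group at zero), which is presumably why kappa is a more informative ranking key here than accuracy.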


In [23]:
os.chdir("C:\\work\\capstone")
le,tokenizer,model=model_load()

In [24]:
model_predict("crm add-in is getting disabled from outlook ")

19:58:20 : text= ['crm add-in is getting disabled from outlook ']        predicted_class= ['GRP_0']


In [25]:
log.shape

(74, 9)

## Best model across all iterations: Keras LSTM with 69 groups, kappa_score = 55%

Summary:  
The data is not sufficient to build a model with high accuracy and kappa score.  
Most of the classes are imbalanced.  
Even after merging the small classes into one larger group, the best accuracy achieved was 69% and the best kappa_score was 55%.  
Further data cleanup (removal of duplicate words, etc.) improved the score by only about 1%.  
A Keras deep model with regularization and dropout layers improved the score by at least 2-3%.  
A total of 74 iterations were run with multiple models and tuning parameters.  
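
The grouping step mentioned in the summary can be sketched as follows. This is a minimal illustration, assuming that log lines such as `working on grp_74 with <30 rows, total groups=36` mean that every assignment group with fewer than the given number of rows was relabelled as `GRP_74`. The helper name `collapse_rare_groups` and the toy labels are hypothetical and do not come from the notebook.

```python
from collections import Counter


def collapse_rare_groups(labels, min_rows, catch_all="GRP_74"):
    """Relabel every group with fewer than `min_rows` rows as `catch_all`."""
    counts = Counter(labels)
    return [
        group if group == catch_all or counts[group] >= min_rows else catch_all
        for group in labels
    ]


# Toy labels (hypothetical): two large groups and two singleton groups.
labels = ["GRP_0"] * 5 + ["GRP_8"] * 3 + ["GRP_1", "GRP_11"]
merged = collapse_rare_groups(labels, min_rows=2)

print(Counter(labels))
print(Counter(merged))
```

Raising `min_rows` reduces the number of target classes, but it also makes the catch-all group larger and more heterogeneous, so the threshold is a trade-off rather than a free improvement.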


# As a next step, let's try an LDA model to reduce the number of groups so that efficiency can be improved.