# Classical Machine Learning Approach

In this notebook we will learn to:
  1. Create a naive TF-IDF-based Bag of Words representation of text.
  2. Use classical ML models to solve text classification.
  3. Use a One-vs-Rest strategy to solve multi-label text classification.


  **HOT TIP** : *Save intermediate results, such as trained models and prediction DataFrames, as pickle files so they can be reloaded quickly in later experiments.*

  This Notebook uses code from https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Multi%20label%20text%20classification.ipynb


In [1]:
# Installing packages.
!pip install contractions
!pip install textsearch
!pip install tqdm

# Importing packages.
import nltk
nltk.download('punkt')
nltk.download('stopwords')
%matplotlib inline
import re
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import pickle
import ast
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone package
from datetime import datetime
from sklearn.preprocessing import MultiLabelBinarizer




In [6]:
# Let's mount our G-Drive.

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [0]:
# Data read and preparation.
# Location of our data on G-Drive. Make sure to adjust the path to match your own setup.
path = '/content/drive/My Drive/ICDMAI_Tutorial/notebook/'
data = 'filtered_data/question_tag_text_mapping.pkl'
ml_model = path + 'ml_model/'

In [8]:
# Let us quickly load our question tag data
question_tag = pd.read_pickle(path+data)
question_tag.head(3)

Unnamed: 0,Id,OwnerUserId,CreationDate,ClosedDate,Score,Title,Body,CreationMonth,CreationYear,Tag
0,120,83.0,2008-08-01 15:50:08,,21,ASP.NET Site Maps,<p>Has anyone got experience creating <strong>...,8,2008,"[asp.net, sql]"
1,260,91.0,2008-08-01 23:22:08,,49,Adding scripting functionality to .NET applica...,<p>I have a little game written in C#. It uses...,8,2008,"[c#, .net]"
2,330,63.0,2008-08-02 02:51:36,,29,Should I use nested classes in this case?,<p>I am working on a collection of classes use...,8,2008,[c++]


### Creating one hot encoding from multilabelled tagged data

In [9]:
# To use the one-vs-rest strategy, we need to one-hot encode each tag across all documents.
mlb = MultiLabelBinarizer()
question_tag['Tag_pop'] = question_tag['Tag']
question_tag = question_tag.join(pd.DataFrame(mlb.fit_transform(question_tag.pop('Tag_pop')),
                          columns=mlb.classes_,
                          index=question_tag.index))
question_tag.head(3)

Unnamed: 0,Id,OwnerUserId,CreationDate,ClosedDate,Score,Title,Body,CreationMonth,CreationYear,Tag,.net,agile,ajax,amazon-web-services,android,android-studio,angular2,angularjs,apache,apache-spark,api,asp.net,asp.net-web-api,azure,bash,c,c#,c++,cloud,codeigniter,css,devops,django,docker,drupal,eclipse,elasticsearch,embedded,entity-framework,excel,...,qt,r,react-native,reactjs,redis,redux,regex,rest,ruby,ruby-on-rails,sass,scala,selenium,shell,spring,spring-boot,spring-mvc,sql,sql-server,swift,tdd,testing,twitter-bootstrap,twitter-bootstrap-3,typescript,ubuntu,unity3d,unix,vb.net,vba,visual-studio,vue.js,wcf,web-services,windows,wordpress,wpf,xamarin,xcode,xml
0,120,83.0,2008-08-01 15:50:08,,21,ASP.NET Site Maps,<p>Has anyone got experience creating <strong>...,8,2008,"[asp.net, sql]",0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,260,91.0,2008-08-01 23:22:08,,49,Adding scripting functionality to .NET applica...,<p>I have a little game written in C#. It uses...,8,2008,"[c#, .net]",1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,330,63.0,2008-08-02 02:51:36,,29,Should I use nested classes in this case?,<p>I am working on a collection of classes use...,8,2008,[c++],0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
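
To make the encoding step concrete, here is a minimal sketch on a toy frame. The three tag lists are copied from the rows shown above, while the names `toy`, `toy_mlb`, `indicator` and `recovered` are illustrative and not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in for the 'Tag' column (three rows only).
toy = pd.DataFrame({"Tag": [["asp.net", "sql"], ["c#", ".net"], ["c++"]]})

toy_mlb = MultiLabelBinarizer()
indicator = pd.DataFrame(
    toy_mlb.fit_transform(toy["Tag"]),  # one 0/1 column per distinct tag
    columns=toy_mlb.classes_,           # column names are the sorted tag names
    index=toy.index,
)
print(indicator)

# inverse_transform maps each indicator row back to a tuple of tag names.
recovered = toy_mlb.inverse_transform(indicator.values)
print(recovered)
```

Each row of `indicator` has a 1 exactly in the columns of its own tags, which is the per-tag target a one-vs-rest classifier trains on.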


In [0]:
# Creating a list of all existing 'Tags'
dummy = question_tag.drop(['Id', 'OwnerUserId', 'CreationDate', 'ClosedDate', 'Score', 'Title','Body', 'CreationMonth', 'CreationYear','Tag'], axis=1)
categories = list(dummy.columns.values)

### Text preprocessing

In [0]:
# Let us create a very basic text preprocessor which we will use for cleaning text.
def clean_text(text):
    text = text.lower()
    text = re.sub(r"what's", "what is ", text)
    # Handle 'scuse before the generic 's rule, otherwise it would never match.
    text = re.sub(r"\'scuse", " excuse ", text)
    text = re.sub(r"\'s", " ", text)
    text = re.sub(r"\'ve", " have ", text)
    text = re.sub(r"can't", "can not ", text)
    text = re.sub(r"n't", " not ", text)
    text = re.sub(r"i'm", "i am ", text)
    text = re.sub(r"\'re", " are ", text)
    text = re.sub(r"\'d", " would ", text)
    text = re.sub(r"\'ll", " will ", text)
    # Raw strings avoid invalid-escape warnings for \W and \s.
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = text.strip(' ')
    return text

question_tag['Body'] = question_tag['Body'].map(clean_text)
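
To see what these substitutions do to a question body, here is a small sketch that expresses the same rules as an ordered list and runs them on one made-up sentence. `RULES`, `clean_text_sketch` and `sample` are hypothetical names used only for this illustration. The `'scuse` rule is placed ahead of the generic `'s` rule here, since the generic rule would otherwise consume the apostrophe first.

```python
import re

# Ordered (pattern, replacement) pairs; order matters because the
# specific contractions must be expanded before the generic ones.
RULES = [
    (r"what's", "what is "),
    (r"'scuse", " excuse "),
    (r"'s", " "),
    (r"'ve", " have "),
    (r"can't", "can not "),
    (r"n't", " not "),
    (r"i'm", "i am "),
    (r"'re", " are "),
    (r"'d", " would "),
    (r"'ll", " will "),
    (r"\W", " "),   # replace every non-word character with a space
    (r"\s+", " "),  # collapse runs of whitespace
]

def clean_text_sketch(text):
    text = text.lower()
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text.strip(" ")

sample = "<p>What's the best way? I can't tell.</p>"
cleaned = clean_text_sketch(sample)
print(cleaned)  # p what is the best way i can not tell p
```

Note that the HTML tags are not removed as such: only their angle brackets and slashes are replaced, so tag names like `p` survive as stray tokens in the cleaned text.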

### Creating a 70/30 Train-Test Split

In [12]:
train, test = train_test_split(question_tag, random_state=42, test_size=0.30, shuffle=True)

X_train = train.Body
X_test = test.Body
print(X_train.shape)
print(X_test.shape)

(736394,)
(315598,)


# Creating a Bag of Words representation using TF-IDF
  1. Initialize the vectorizer object.
  2. Learn the vocabulary from the training data.
  3. Create a document-term matrix.

In [0]:
# Initialize the vectorizer object.
# Recent scikit-learn releases expect stop_words to be a list (or 'english'), not a set.
tfidf = TfidfVectorizer(stop_words=list(stop_words))

# Learn the vocabulary from the training data and
# create the document-term matrix (DTM) of the training data.
X_train_dtm = tfidf.fit_transform(X_train)

# Create the DTM of the test data using the vocabulary learned from the training data.
# Note that the columns of the test DTM are therefore based on the training data only.
X_test_dtm = tfidf.transform(X_test)
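
The fit/transform split can be checked on a toy corpus. This is a minimal sketch with made-up documents; `train_docs`, `test_docs` and `toy_tfidf` are illustrative names, and the vectorizer is left at its defaults rather than using the notebook's stop-word list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["how to parse json in python", "python list comprehension syntax"]
test_docs = ["parse xml in python"]

toy_tfidf = TfidfVectorizer()
train_dtm = toy_tfidf.fit_transform(train_docs)  # learns the vocabulary, then encodes
test_dtm = toy_tfidf.transform(test_docs)        # reuses the learned vocabulary

print(sorted(toy_tfidf.vocabulary_))
print(train_dtm.shape, test_dtm.shape)
```

The word `xml` appears only in the test document, so it has no column: both matrices share the columns learned from the training data.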

## Pipeline
scikit-learn provides a Pipeline utility that chains data transformations and a final estimator into a single object with one `fit` and one `predict` call. Pipelines are common in machine learning systems because there is usually a lot of data to manipulate and many transformations to apply, so we use a pipeline to train every classifier.
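
As an illustration of what a pipeline buys us, the sketch below chains a vectorizer and a classifier so that raw text goes in and predictions come out. The documents, labels and step names are made up for the example. It differs from this notebook, which vectorizes once up front and puts only the classifier in the pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

docs = [
    "python list comprehension",
    "python dict iteration",
    "css flexbox layout",
    "css grid columns",
]
labels = [1, 1, 0, 0]  # 1 = tagged 'python', 0 = not tagged 'python'

text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),  # step 1: text -> TF-IDF features
    ("clf", MultinomialNB()),      # step 2: features -> label
])

text_clf.fit(docs, labels)  # fits every step in order
pred = text_clf.predict(["python dict question"])
print(pred)
```

Keeping the vectorizer inside the pipeline means a saved model can score raw text directly. Vectorizing once outside, as done here in the notebook, avoids recomputing TF-IDF for every tag.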

## OneVsRest multilabel strategy
A multi-label classifier accepts a binary mask over multiple labels. Each prediction is an array of 0s and 1s marking which class labels apply to the corresponding input sample.

The OneVsRest strategy can be used for multi-label learning, where a classifier predicts multiple labels for a single instance. **Naive Bayes**, **SVM** and **Logistic Regression** support multi-class classification, but we are in a multi-label scenario, so we wrap them in the OneVsRestClassifier.
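
A minimal sketch of the binary-mask idea, using made-up documents and two hypothetical labels (`python`, `pandas`). Here the whole indicator matrix is passed to `OneVsRestClassifier` in one call. The notebook instead loops over tags and fits one binary column at a time, which fits the same kind of per-label binary model.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

docs = [
    "python list comprehension",
    "python pandas dataframe merge",
    "pandas groupby aggregate",
    "pandas pivot table",
]
# Binary mask: one row per document, columns = [python, pandas].
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])

X = TfidfVectorizer().fit_transform(docs)
ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(X, Y)  # fits one binary classifier per label column

pred = ovr.predict(X)
print(pred.shape)            # one row per document, one column per label
print(len(ovr.estimators_))  # one fitted estimator per label
```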

### We create a Training Pipeline and a Scoring Pipeline

In [0]:
def tag_level_training_pipeline(X_train, train, X_test, test, classifier_pipeline, output_directory):
  # Create one binary classifier per tag.
  for category in categories:
    print('... Processing {}'.format(category))
    # 1. Train the model on the document-term matrix and this tag's 0/1 column.
    classifier_pipeline.fit(X_train, train[category])
    # 2. Save the model to disk.
    filename = ml_model + output_directory + str(category) + '_model.pkl'
    joblib.dump(classifier_pipeline, filename, compress=1)
    # 3. Compute the test accuracy and the per-class report.
    prediction = classifier_pipeline.predict(X_test)
    print('Test accuracy is {}'.format(accuracy_score(test[category], prediction)))
    print(classification_report(test[category], prediction))

def tag_level_predict(X_train, train, X_test, test, model_directory):
  prediction_df = pd.DataFrame(columns=['dummy1'])
  #Score the document across classifier for each Tag
  for category in categories:
    print('... Processing {}'.format(category))
    # 1. load the model
    filename = ml_model + model_directory +str(category)+ '_model.pkl'
    classifier_pipeline = joblib.load(filename)
    # 2. predict on the test data.
    prediction = classifier_pipeline.predict(X_test)
    prediction_df[str(category)] = prediction

  # Remember, we had encoded the labels. It is time to bring them back to their original form.
  for category in categories:
    prediction_df.loc[prediction_df[str(category)] == 1, str(category)] = category
  prediction_df['predicted_labels'] = prediction_df[[str(i) for i in categories]].values.tolist()
  prediction_df['predicted_labels'] =  prediction_df['predicted_labels'].apply(lambda x : list(set(x)))
  
  # We create a result holding the original and predicted labels for metric evaluation.
  final_pred_df = pd.concat([test[['Id','Tag']].reset_index(), prediction_df[['predicted_labels']].reset_index()], axis=1)
  final_pred_df['original_labels'] = final_pred_df['Tag']
  # prediction_df[['Id']] = test[['Id']]
  final_pred_df_result = final_pred_df[['Id','original_labels','predicted_labels']]
  return final_pred_df_result
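
The save-then-reload pattern used by the two functions above can be sketched in isolation. This example assumes the standalone `joblib` package and writes to a temporary directory instead of Google Drive. The toy documents and the file name `python_model.pkl` are illustrative only.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["python list comprehension", "css flexbox layout", "python dict", "css grid"]
labels = [1, 0, 1, 0]  # 1 = tagged 'python'

X = TfidfVectorizer().fit_transform(docs)
model = MultinomialNB().fit(X, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "python_model.pkl")
    joblib.dump(model, filename, compress=1)  # training side: persist the fitted model
    restored = joblib.load(filename)          # scoring side: load it back

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Saving one file per tag lets the scoring step run later, or in another session, without retraining.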

# Evaluating our results

In [0]:
# Here we define precision, recall and F1 measure at a single-document level.
def document_evaluation_metrics(prd_grp, grp, metric="precision"):
    pred_group = prd_grp
    # Drop the 0 placeholder left by tags that were not predicted (edits the list in place).
    if 0 in pred_group: pred_group.remove(0)
    group = grp

    set_pred_group = set(pred_group)
    set_group = set(group)
    intrsct = set_group.intersection(set_pred_group)
    # Fall back to a denominator of 1 so that empty sets do not divide by zero.
    precision = len(intrsct) / float(len(set_pred_group) if len(set_pred_group) > 1 else 1)
    recall = len(intrsct) / float(len(set_group) if len(set_group) > 1 else 1)
    if metric == "precision":
        return precision
    elif metric == "recall":
        return recall
    elif metric == "f1_measure":
        if precision == 0 or recall == 0:
            return 0
        return 2 * precision * recall / float(precision + recall)

    # Unknown metric name.
    return -1

# Provide overall average stats and populate document level metrics.
def model_evaluation_stats(final_pred_df, model_name="default"):
  final_pred_df['doc_precision'] = final_pred_df.apply(lambda x: document_evaluation_metrics(x.predicted_labels, x.original_labels, "precision"), axis=1)
  final_pred_df['doc_recall'] = final_pred_df.apply(lambda x: document_evaluation_metrics(x.predicted_labels, x.original_labels, "recall"), axis=1)
  final_pred_df['doc_f1_measure'] = final_pred_df.apply(lambda x: document_evaluation_metrics(x.predicted_labels, x.original_labels, "f1_measure"), axis=1)
  
  print('Average precision across documents is {}'.format(final_pred_df['doc_precision'].mean()))
  print('Average recall across documents is {}'.format(final_pred_df['doc_recall'].mean()))
  print('Average f1 measure across documents is {}'.format(final_pred_df['doc_f1_measure'].mean()))
  # Persist the document-level results; the with-block closes the file handle.
  with open(ml_model + model_name + ".pkl", 'wb') as f:
    pickle.dump(final_pred_df, f)
  # final_pred_df.to_csv(ml_model + 'SVM_Tag_predictions.txt',sep='\t',index=False)
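
A worked example may help to see how the document-level metrics behave. The helper `doc_scores` below is a compact, hypothetical restatement of the same set arithmetic, not the notebook's function, and the tag lists are made up.

```python
def doc_scores(predicted, actual):
    """Return (precision, recall, f1) for one document's tag sets."""
    pred_set = set(predicted) - {0}  # 0 marks tags that were not predicted
    true_set = set(actual)
    hits = len(pred_set & true_set)
    precision = hits / max(len(pred_set), 1)  # share of predicted tags that are right
    recall = hits / max(len(true_set), 1)     # share of true tags that were found
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# One of two predicted tags is correct, and one of two true tags is found.
p, r, f = doc_scores([0, "c#", "wpf"], ["c#", ".net"])
print(p, r, f)  # 0.5 0.5 0.5

# A document for which no tag was predicted scores zero on all three.
print(doc_scores([0], ["c++"]))  # (0.0, 0.0, 0.0)
```

Because a document with no predicted tags contributes zeros, these averages drop sharply when the per-tag classifiers rarely predict the positive class, even if per-tag accuracy looks high.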

# Let us train, score and evaluate Naive Bayes

In [0]:
#Naive Bayes Classifier
NB_pipeline = Pipeline([
                ('clf', OneVsRestClassifier(MultinomialNB(
                    fit_prior=True, class_prior=None))),
            ])

tag_level_training_pipeline(X_train_dtm, train, X_test_dtm, test, NB_pipeline, 'NaiveBayes/')
result = tag_level_predict(X_train_dtm, train, X_test_dtm, test, 'NaiveBayes/')
model_evaluation_stats(result, "NaiveBayes")

... Processing .net
Test accuracy is 0.9770689294608964
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    308362
           1       0.00      0.00      0.00      7236

    accuracy                           0.98    315598
   macro avg       0.49      0.50      0.49    315598
weighted avg       0.95      0.98      0.97    315598

... Processing agile
Test accuracy is 0.9999207853028219


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315573
           1       0.00      0.00      0.00        25

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing ajax
Test accuracy is 0.985275572088543
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    310952
           1       0.00      0.00      0.00      4646

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.97      0.99      0.98    315598

... Processing amazon-web-services
Test accuracy is 0.9969739985677982


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314643
           1       0.00      0.00      0.00       955

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing android
Test accuracy is 0.9357251947097257
              precision    recall  f1-score   support

           0       0.93      1.00      0.97    288322
           1       0.99      0.26      0.41     27276

    accuracy                           0.94    315598
   macro avg       0.96      0.63      0.69    315598
weighted avg       0.94      0.94      0.92    315598

... Processing android-studio
Test accuracy is 0.9968028948218937


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314589
           1       0.00      0.00      0.00      1009

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing angular2
Test accuracy is 0.9977819884790144


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314898
           1       0.00      0.00      0.00       700

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing angularjs
Test accuracy is 0.9804244640333589
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    309420
           1       0.50      0.00      0.00      6178

    accuracy                           0.98    315598
   macro avg       0.74      0.50      0.50    315598
weighted avg       0.97      0.98      0.97    315598

... Processing apache
Test accuracy is 0.9936374755226586


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313590
           1       0.00      0.00      0.00      2008

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing apache-spark
Test accuracy is 0.9980576556251941


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314985
           1       0.00      0.00      0.00       613

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing api
Test accuracy is 0.9952344438177682


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314094
           1       0.00      0.00      0.00      1504

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing asp.net
Test accuracy is 0.97169183581645
              precision    recall  f1-score   support

           0       0.97      1.00      0.99    306664
           1       0.50      0.00      0.00      8934

    accuracy                           0.97    315598
   macro avg       0.74      0.50      0.49    315598
weighted avg       0.96      0.97      0.96    315598

... Processing asp.net-web-api
Test accuracy is 0.9981020158556138


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314999
           1       0.00      0.00      0.00       599

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing azure
Test accuracy is 0.9965747564940208


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314517
           1       0.00      0.00      0.00      1081

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing bash
Test accuracy is 0.9927534395021514


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313311
           1       0.00      0.00      0.00      2287

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing c
Test accuracy is 0.9781050576999855
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    308691
           1       0.00      0.00      0.00      6907

    accuracy                           0.98    315598
   macro avg       0.49      0.50      0.49    315598
weighted avg       0.96      0.98      0.97    315598

... Processing c#
Test accuracy is 0.903535510364451
              precision    recall  f1-score   support

           0       0.90      1.00      0.95    285147
           1       0.77      0.00      0.00     30451

    accuracy                           0.90    315598
   macro avg       0.8

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315459
           1       0.00      0.00      0.00       139

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing codeigniter
Test accuracy is 0.995411884739447


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314150
           1       0.00      0.00      0.00      1448

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing css
Test accuracy is 0.9599046888763554
              precision    recall  f1-score   support

           0       0.96      1.00      0.98    302936
           1       0.72      0.00      0.00     12662

    accuracy                           0.96    315598
   macro avg       0.84      0.50      0.49    315598
weighted avg       0.95      0.96      0.94    315598

... Processing devops
Test accuracy is 0.9999524711816932


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315583
           1       0.00      0.00      0.00        15

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing django
Test accuracy is 0.9877312277010627


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    311726
           1       0.00      0.00      0.00      3872

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.98      0.99      0.98    315598

... Processing docker
Test accuracy is 0.9985075951051654


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315127
           1       0.00      0.00      0.00       471

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing drupal
Test accuracy is 0.9983048054803896


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315063
           1       0.00      0.00      0.00       535

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing eclipse
Test accuracy is 0.9907889150121357


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312691
           1       0.00      0.00      0.00      2907

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing elasticsearch
Test accuracy is 0.9979879466916773


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314963
           1       0.00      0.00      0.00       635

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing embedded
Test accuracy is 0.9994138112408824


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315413
           1       0.00      0.00      0.00       185

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing entity-framework
Test accuracy is 0.9942997103910671


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313799
           1       0.00      0.00      0.00      1799

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing excel
Test accuracy is 0.9900316225071135


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    312452
           1       0.00      0.00      0.00      3146

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing excel-vba
Test accuracy is 0.9949777881989113


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    314013
           1       0.00      0.00      0.00      1585

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing express
Test accuracy is 0.9974144322841083


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314782
           1       0.00      0.00      0.00       816

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing flask
Test accuracy is 0.9985931469781177


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315154
           1       0.00      0.00      0.00       444

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing git
Test accuracy is 0.9927977997325712


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313325
           1       0.00      0.00      0.00      2273

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing github
Test accuracy is 0.9980893415040653


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314995
           1       0.00      0.00      0.00       603

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing go
Test accuracy is 0.9982224221953244


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315037
           1       0.00      0.00      0.00       561

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing hadoop
Test accuracy is 0.9971134164348316


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314687
           1       0.00      0.00      0.00       911

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing haskell
Test accuracy is 0.9970405389134278


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314664
           1       0.00      0.00      0.00       934

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing hibernate
Test accuracy is 0.9945246801310528


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313870
           1       0.00      0.00      0.00      1728

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing html
Test accuracy is 0.9442360217745359
              precision    recall  f1-score   support

           0       0.94      1.00      0.97    297994
           1       0.63      0.00      0.00     17604

    accuracy                           0.94    315598
   macro avg       0.79      0.50      0.49    315598
weighted avg       0.93      0.94      0.92    315598

... Processing html5
Test accuracy is 0.9909726931095888


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312749
           1       0.00      0.00      0.00      2849

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing ionic-framework
Test accuracy is 0.9985899783902306


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315153
           1       0.00      0.00      0.00       445

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing ios
Test accuracy is 0.9559217739022428
              precision    recall  f1-score   support

           0       0.96      1.00      0.98    301642
           1       0.78      0.00      0.01     13956

    accuracy                           0.96    315598
   macro avg       0.87      0.50      0.49    315598
weighted avg       0.95      0.96      0.93    315598

... Processing iphone
Test accuracy is 0.979676677291998
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    309185
           1       0.00      0.00      0.00      6413

    accuracy                           0.98    315598
   macro avg    

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314792
           1       0.00      0.00      0.00       806

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing javascript
Test accuracy is 0.8813237092757242
              precision    recall  f1-score   support

           0       0.88      1.00      0.94    278136
           1       0.83      0.00      0.00     37462

    accuracy                           0.88    315598
   macro avg       0.86      0.50      0.47    315598
weighted avg       0.88      0.88      0.83    315598

... Processing jenkins
Test accuracy is 0.9981717247891305


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315021
           1       0.00      0.00      0.00       577

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing jquery
Test accuracy is 0.9250977509363177
              precision    recall  f1-score   support

           0       0.93      1.00      0.96    291958
           1       0.60      0.00      0.00     23640

    accuracy                           0.93    315598
   macro avg       0.76      0.50      0.48    315598
weighted avg       0.90      0.93      0.89    315598

... Processing json
Test accuracy is 0.983124100913187
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    310273
           1       0.00      0.00      0.00      5325

    accuracy                           0.98    315598
   macro avg   

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314281
           1       0.00      0.00      0.00      1317

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing laravel
Test accuracy is 0.9952439495814296


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314097
           1       0.00      0.00      0.00      1501

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing less
Test accuracy is 0.9993979683014468


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315408
           1       0.00      0.00      0.00       190

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing linq
Test accuracy is 0.994220495693889


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313774
           1       0.00      0.00      0.00      1824

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing linux
Test accuracy is 0.987297131160527


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    311589
           1       0.00      0.00      0.00      4009

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.97      0.99      0.98    315598

... Processing machine-learning
Test accuracy is 0.9987674193119095


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315209
           1       0.00      0.00      0.00       389

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing matlab
Test accuracy is 0.9937959049170146


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313640
           1       0.00      0.00      0.00      1958

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing maven
Test accuracy is 0.9955481340185933


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314193
           1       0.00      0.00      0.00      1405

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing mongodb
Test accuracy is 0.9931621873395903


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313440
           1       0.00      0.00      0.00      2158

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing mysql
Test accuracy is 0.9593596917597703
              precision    recall  f1-score   support

           0       0.96      1.00      0.98    302773
           1       0.00      0.00      0.00     12825

    accuracy                           0.96    315598
   macro avg       0.48      0.50      0.49    315598
weighted avg       0.92      0.96      0.94    315598

... Processing nginx
Test accuracy is 0.997873877527741


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314927
           1       0.00      0.00      0.00       671

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing node.js
Test accuracy is 0.9864574553704396


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    311324
           1       0.00      0.00      0.00      4274

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.97      0.99      0.98    315598

... Processing objective-c
Test accuracy is 0.9746322853756995
              precision    recall  f1-score   support

           0       0.97      1.00      0.99    307593
           1       0.00      0.00      0.00      8005

    accuracy                           0.97    315598
   macro avg       0.49      0.50      0.49    315598
weighted avg       0.95      0.97      0.96    315598

... Processing oracle
Test accuracy is 0.9928960259570719


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313356
           1       0.00      0.00      0.00      2242

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing osx
Test accuracy is 0.9932984366187365


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313483
           1       0.00      0.00      0.00      2115

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing perl
Test accuracy is 0.9950696772476378


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314042
           1       0.00      0.00      0.00      1556

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing photoshop
Test accuracy is 0.9998130533146597


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315539
           1       0.00      0.00      0.00        59

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing php
Test accuracy is 0.9066629066090406
              precision    recall  f1-score   support

           0       0.91      1.00      0.95    285890
           1       0.97      0.01      0.02     29708

    accuracy                           0.91    315598
   macro avg       0.94      0.50      0.48    315598
weighted avg       0.91      0.91      0.86    315598

... Processing plsql
Test accuracy is 0.9987008789662799


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315188
           1       0.00      0.00      0.00       410

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing postgresql
Test accuracy is 0.994271193100083


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313790
           1       0.00      0.00      0.00      1808

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing powershell
Test accuracy is 0.9964036527481163


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314463
           1       0.00      0.00      0.00      1135

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing python
Test accuracy is 0.938690992972072
              precision    recall  f1-score   support

           0       0.94      1.00      0.97    296188
           1       0.94      0.00      0.01     19410

    accuracy                           0.94    315598
   macro avg       0.94      0.50      0.49    315598
weighted avg       0.94      0.94      0.91    315598

... Processing qt
Test accuracy is 0.9949841253746855


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    314015
           1       0.00      0.00      0.00      1583

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing r
Test accuracy is 0.9854086527798022
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    310983
           1       0.92      0.00      0.00      4615

    accuracy                           0.99    315598
   macro avg       0.95      0.50      0.50    315598
weighted avg       0.98      0.99      0.98    315598

... Processing react-native
Test accuracy is 0.9992807305496233


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315371
           1       0.00      0.00      0.00       227

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing reactjs
Test accuracy is 0.9975443443874803


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314823
           1       0.00      0.00      0.00       775

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing redis
Test accuracy is 0.9990240749307664


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315290
           1       0.00      0.00      0.00       308

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing redux
Test accuracy is 0.9996229380414324


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315479
           1       0.00      0.00      0.00       119

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing regex
Test accuracy is 0.9854213271313507
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    310998
           1       0.00      0.00      0.00      4600

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.97      0.99      0.98    315598

... Processing rest
Test accuracy is 0.9952376124056553


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314095
           1       0.00      0.00      0.00      1503

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing ruby
Test accuracy is 0.9838592133029994
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    310503
           1       1.00      0.00      0.00      5095

    accuracy                           0.98    315598
   macro avg       0.99      0.50      0.50    315598
weighted avg       0.98      0.98      0.98    315598

... Processing ruby-on-rails
Test accuracy is 0.9751867882559458
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    307768
           1       0.00      0.00      0.00      7830

    accuracy                           0.98    315598
   macr

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315302
           1       0.00      0.00      0.00       296

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing scala
Test accuracy is 0.9945848833009081


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313889
           1       0.00      0.00      0.00      1709

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing selenium
Test accuracy is 0.9966222853123277


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314532
           1       0.00      0.00      0.00      1066

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing shell
Test accuracy is 0.9952344438177682


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314094
           1       0.00      0.00      0.00      1504

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing spring
Test accuracy is 0.9906431599693281


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312645
           1       0.00      0.00      0.00      2953

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing spring-boot
Test accuracy is 0.9987008789662799


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315188
           1       0.00      0.00      0.00       410

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing spring-mvc
Test accuracy is 0.9964892046210686


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314490
           1       0.00      0.00      0.00      1108

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing sql
Test accuracy is 0.9659250058618876
              precision    recall  f1-score   support

           0       0.97      1.00      0.98    304845
           1       0.00      0.00      0.00     10753

    accuracy                           0.97    315598
   macro avg       0.48      0.50      0.49    315598
weighted avg       0.93      0.97      0.95    315598

... Processing sql-server
Test accuracy is 0.9826868357847641
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    310135
           1       0.00      0.00      0.00      5463

    accuracy                           0.98    315598
   macro av

              precision    recall  f1-score   support

           0       0.99      1.00      0.99    312095
           1       0.00      0.00      0.00      3503

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.98      0.99      0.98    315598

... Processing tdd
Test accuracy is 0.9995563976958028


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315458
           1       0.00      0.00      0.00       140

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing testing
Test accuracy is 0.9977091109576106


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314875
           1       0.00      0.00      0.00       723

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing twitter-bootstrap
Test accuracy is 0.9931843674548001


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313447
           1       0.00      0.00      0.00      2151

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing twitter-bootstrap-3
Test accuracy is 0.9984220432322132


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315100
           1       0.00      0.00      0.00       498

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing typescript
Test accuracy is 0.9983491657108093


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315077
           1       0.00      0.00      0.00       521

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing ubuntu
Test accuracy is 0.9969739985677982


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314643
           1       0.00      0.00      0.00       955

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing unity3d
Test accuracy is 0.9980671613888554


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314988
           1       0.00      0.00      0.00       610

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing unix
Test accuracy is 0.9969771671556854


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314644
           1       0.00      0.00      0.00       954

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      1.00    315598

... Processing vb.net
Test accuracy is 0.9901330173195014


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312484
           1       0.00      0.00      0.00      3114

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing vba
Test accuracy is 0.9933554712007047


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313501
           1       0.00      0.00      0.00      2097

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing visual-studio
Test accuracy is 0.9941824726392436


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313762
           1       0.00      0.00      0.00      1836

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing vue.js
Test accuracy is 0.999844739193531


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315549
           1       0.00      0.00      0.00        49

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing wcf
Test accuracy is 0.9955956628369002


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314208
           1       0.00      0.00      0.00      1390

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      1.00      0.99    315598

... Processing web-services
Test accuracy is 0.9949175850290559


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    313994
           1       0.00      0.00      0.00      1604

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.99      0.99      0.99    315598

... Processing windows
Test accuracy is 0.9904181902293424


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312574
           1       0.00      0.00      0.00      3024

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing wordpress
Test accuracy is 0.9906463285572152


              precision    recall  f1-score   support

           0       0.99      1.00      1.00    312646
           1       0.00      0.00      0.00      2952

    accuracy                           0.99    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       0.98      0.99      0.99    315598

... Processing wpf
Test accuracy is 0.9883427651632773


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    311919
           1       0.00      0.00      0.00      3679

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.98      0.99      0.98    315598

... Processing xamarin
Test accuracy is 0.9983840201775677


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315088
           1       0.00      0.00      0.00       510

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing xcode
Test accuracy is 0.9899333962826127


              precision    recall  f1-score   support

           0       0.99      1.00      0.99    312421
           1       0.00      0.00      0.00      3177

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.98      0.99      0.98    315598

... Processing xml
Test accuracy is 0.9863402176186161
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    311288
           1       0.00      0.00      0.00      4310

    accuracy                           0.99    315598
   macro avg       0.49      0.50      0.50    315598
weighted avg       0.97      0.99      0.98    315598

... Processing .net
... Processing agile
... Processing ajax
... Processing amazon-web-services
... Processing android
... Processing android-studio
... Processing angular2
... Processing angularjs
... Processing apache
... Processing apache-spark
... Processing api
... Processing asp.net
.
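
Many of the reports above pair a test accuracy close to 1.0 with a recall of 0.00 on class 1. With tags this rare, a model that never predicts the tag already scores that well on accuracy. A minimal sketch, using the `ios` support counts from the report above (301642 negatives, 13956 positives); the never-predict baseline is a hypothetical comparison point, not one of the notebook's models:

```python
# Support counts copied from the `ios` classification report above.
n_negative, n_positive = 301642, 13956

# Hypothetical baseline: never predict the tag, so every negative row is
# classified correctly and every positive row is missed.
true_negatives = n_negative
true_positives = 0

baseline_accuracy = (true_negatives + true_positives) / (n_negative + n_positive)
baseline_recall = true_positives / n_positive

print(f"baseline accuracy = {baseline_accuracy:.4f}")
print(f"baseline recall   = {baseline_recall:.2f}")
```

The baseline accuracy comes out at roughly 0.956, almost identical to the test accuracy reported for `ios`, so the recall and f1-score columns for class 1 say more about a tag model than the accuracy line does.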

# Let us train, score and evaluate Support Vector Machines
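
One Vs Rest fits one binary classifier per tag and stacks the per-tag predictions into a multi-label output. A minimal sketch of that idea with the same `OneVsRestClassifier(LinearSVC())` wrapper, assuming made-up titles and tags (the `texts` and `tags` lists are hypothetical and are not the notebook's data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy titles and tags, only to make the sketch runnable.
texts = [
    "how to merge two dataframes in pandas",
    "pandas groupby then plot with matplotlib",
    "center a div with css flexbox",
    "css grid layout inside a react component",
    "react state not updating after fetch",
    "plot a histogram with matplotlib",
]
tags = [["python"], ["python"], ["css"], ["css", "javascript"], ["javascript"], ["python"]]

# Turn the tag lists into a binary indicator matrix: one column per tag.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# TF-IDF bag-of-words features for the titles.
X = TfidfVectorizer().fit_transform(texts)

# One LinearSVC is fitted per column of Y.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)

pred = clf.predict(X)  # indicator matrix with the same shape as Y
print(mlb.classes_)
print(len(clf.estimators_), "binary classifiers fitted")
```

`mlb.inverse_transform(pred)` maps the predicted indicator rows back to tuples of tag names.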

In [16]:
# SVM classifier: a linear SVM wrapped in a One Vs Rest classifier.
SVC_pipeline = Pipeline([
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
])

# Train per tag, predict per tag, then report the evaluation statistics.
tag_level_training_pipeline(X_train_dtm, train, X_test_dtm, test, SVC_pipeline, 'SVM/')
result = tag_level_predict(X_train_dtm, train, X_test_dtm, test, 'SVM/')
model_evaluation_stats(result, "SVM")

... Processing .net
Test accuracy is 0.9771893358006071
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    308362
           1       0.51      0.09      0.15      7236

    accuracy                           0.98    315598
   macro avg       0.75      0.54      0.57    315598
weighted avg       0.97      0.98      0.97    315598

... Processing agile
Test accuracy is 0.9999429654180318
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315573
           1       0.89      0.32      0.47        25

    accuracy                           1.00    315598
   macro avg       0.94      0.66      0.74    315598
weighted avg       1.00      1.00      1.00    315598

... Processing ajax
Test accuracy is 0.9887356700612805
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    310952
           1       0.70      0.41      0.52      4646

    accuracy 

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315583
           1       0.00      0.00      0.00        15

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing django
Test accuracy is 0.9972718458291878
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    311726
           1       0.96      0.81      0.88      3872

    accuracy                           1.00    315598
   macro avg       0.98      0.90      0.94    315598
weighted avg       1.00      1.00      1.00    315598

... Processing docker
Test accuracy is 0.9996546239203037
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315127
           1       0.93      0.83      0.88       471

    accuracy                           1.00    315598
   macro avg

# Let us train, score and evaluate Logistic Regression
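
Unlike `LinearSVC`, logistic regression exposes per-tag probabilities, so the cut-off for assigning a tag can be moved away from the default. A minimal sketch on made-up data; the `texts` and `tags` lists, `max_iter=1000`, `random_state=0` and the 0.3 threshold are all assumptions chosen for illustration and are not values used in this notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy titles and tags, only to make the sketch runnable.
texts = [
    "how to merge two dataframes in pandas",
    "pandas groupby then plot with matplotlib",
    "center a div with css flexbox",
    "css grid layout inside a react component",
    "react state not updating after fetch",
    "plot a histogram with matplotlib",
]
tags = [["python"], ["python"], ["css"], ["css", "javascript"], ["javascript"], ["python"]]

X = TfidfVectorizer().fit_transform(texts)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# max_iter and random_state are assumptions so the toy fit is reproducible.
clf = OneVsRestClassifier(
    LogisticRegression(solver='sag', max_iter=1000, random_state=0)
)
clf.fit(X, Y)

proba = clf.predict_proba(X)   # one independent probability per (row, tag)
default_pred = clf.predict(X)  # default rule: tag assigned when probability > 0.5

# Hypothetical lower cut-off: assigns at least the same tags as the default rule.
threshold = 0.3
lenient_pred = (proba >= threshold).astype(int)

print(proba.round(2))
print(lenient_pred)
```

Lowering the threshold can only add tag assignments, which usually raises recall at some cost in precision; whether that trade is worthwhile has to be checked on held-out data.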

In [0]:
# Logistic Regression classifier (SAG solver) wrapped in a One Vs Rest classifier.
LogReg_pipeline = Pipeline([
    ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'), n_jobs=1)),
])

# Train per tag, predict per tag, then report the evaluation statistics.
tag_level_training_pipeline(X_train_dtm, train, X_test_dtm, test, LogReg_pipeline, 'LogisticRegression/')
result = tag_level_predict(X_train_dtm, train, X_test_dtm, test, 'LogisticRegression/')
model_evaluation_stats(result, "LogisticRegression")

... Processing .net
Test accuracy is 0.9771069525155419
              precision    recall  f1-score   support

           0       0.98      1.00      0.99    308362
           1       0.50      0.08      0.15      7236

    accuracy                           0.98    315598
   macro avg       0.74      0.54      0.57    315598
weighted avg       0.97      0.98      0.97    315598

... Processing agile
Test accuracy is 0.9999207853028219


              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315573
           1       0.00      0.00      0.00        25

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing ajax
Test accuracy is 0.9888465706373297
              precision    recall  f1-score   support

           0       0.99      1.00      0.99    310952
           1       0.70      0.42      0.53      4646

    accuracy                           0.99    315598
   macro avg       0.85      0.71      0.76    315598
weighted avg       0.99      0.99      0.99    315598

... Processing amazon-web-services
Test accuracy is 0.9979562608128062
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    314643
           1       0.79      0.45      0.57       955

    accuracy                           1.00    315598
 

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315583
           1       0.00      0.00      0.00        15

    accuracy                           1.00    315598
   macro avg       0.50      0.50      0.50    315598
weighted avg       1.00      1.00      1.00    315598

... Processing django
Test accuracy is 0.9966412968396504
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    311726
           1       0.96      0.75      0.85      3872

    accuracy                           1.00    315598
   macro avg       0.98      0.88      0.92    315598
weighted avg       1.00      1.00      1.00    315598

... Processing docker
Test accuracy is 0.999534217580593
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    315127
           1       0.95      0.72      0.82       471

    accuracy                           1.00    315598
   macro avg 