### Naive Bayes
We investigate Naive Bayes methods, which assume conditional independence between every pair of features (words) given the class.
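For reference, the independence assumption leads to the standard Naive Bayes decision rule. The notation is introduced here for illustration and is not from the notebook: $y$ is the class and $x_1, \dots, x_n$ are the word features.

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The Bernoulli, Multinomial, and Complement variants differ mainly in how they model and estimate the per-feature terms $P(x_i \mid y)$.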

We fit Bernoulli and Multinomial NB, both common choices for text classification. We skip Gaussian NB because the features are not continuous. We also try the complement naive Bayes (CNB) algorithm, an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets (https://scikit-learn.org/stable/modules/naive_bayes.html), on the non-downsampled data.

We use RandomizedSearchCV instead of GridSearchCV, which "can be computationally expensive, especially if you are searching over a large hyperparameter space and dealing with multiple hyperparameters. A solution to this is to use RandomizedSearchCV, in which not all hyperparameter values are tried out. Instead, a fixed number of hyperparameter settings is sampled from specified probability distributions."
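A minimal sketch of what the randomized search does, run on toy data rather than the project data. Several things here are assumptions for illustration: the random count matrix `X_toy`, the labels `y_toy`, the import of `loguniform` from `scipy.stats`, and the `scoring='average_precision'` setting (the cells below rely on the default scorer, which is accuracy for classifiers). `n_iter=10` and `cv=5` are written out explicitly but match the scikit-learn defaults.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import MultinomialNB

# Toy count data standing in for a vectorized corpus (hypothetical)
rng = np.random.default_rng(22)
X_toy = rng.integers(0, 5, size=(200, 20))
y_toy = rng.integers(0, 2, size=200)

# alpha is drawn log-uniformly between 1e-4 and 1
param_dist = {'alpha': loguniform(1e-4, 1e0)}

search = RandomizedSearchCV(
    MultinomialNB(),
    param_distributions=param_dist,
    n_iter=10,                    # number of sampled settings
    cv=5,                         # 5-fold cross-validation
    scoring='average_precision',  # assumption: not what the notebook cells use
    random_state=22,
)
search.fit(X_toy, y_toy)

# Every alpha that was actually tried, plus the winner
sampled_alphas = [p['alpha'] for p in search.cv_results_['params']]
print(len(sampled_alphas), search.best_params_)
```

Only ten values of `alpha` are evaluated per search, so with a fixed `random_state` the same candidates are drawn every time, which is one plausible reason the same best `alpha` shows up for several vectorizers in the outputs below.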

In [1]:
import pickle
import pandas as pd
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# loguniform ships with SciPy >= 1.4; sklearn.utils.fixes only provided a backport
from scipy.stats import loguniform
from sklearn.metrics import average_precision_score, roc_auc_score, precision_score, recall_score
import random
random.seed(22)

In [2]:
# Importing labels
with open('../data/train_labels.pckl', 'rb') as f:
    train_labels = pickle.load(f)

with open('../data/dev_labels.pckl', 'rb') as f:
    dev_labels = pickle.load(f)

In [3]:
def get_data(dataset, vectorizer):
    '''
    returns feature matrix for specified dataset and vectorizer
    @param dataset: string specifying dataset, "train","dev",etc
    @param vectorizer: string specifying vectorizer "binary","count",etc

    '''
    with open(f'../data/{dataset}_{vectorizer}_subsampled_data.pckl', 'rb') as f:
        return pickle.load(f)

### Multinomial
"Empirical comparisons provide evidence that the multinomial model tends to outperform the multi-variate Bernoulli model if the vocabulary size is relatively large [13]. However, the performance of machine learning algorithms is highly dependent on the appropriate choice of features. In the case of naive Bayes classifiers and text classification, large differences in performance can be attributed to the choices of stop word removal, stemming, and token-length [14]." (Citation: https://sebastianraschka.com/Articles/2014_naive_bayes_1.html) 

In [7]:
from sklearn.naive_bayes import MultinomialNB

vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data('train', vectorizer)
    dev = get_data('dev', vectorizer)

    nb_multi = MultinomialNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_multi, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    print(random_search.best_params_)
    
    nb_train_auc = roc_auc_score(train_labels, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels, nb_train)
    nb_train_prec = precision_score(train_labels, nb_train)
    nb_dev_auc = roc_auc_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels, nb_dev)
    nb_dev_prec = precision_score(dev_labels, nb_dev)

    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
{'alpha': 0.2733556118022408}
Train AUC:        0.9991
Train AP:         0.9991
Train Precision:  0.9832
Train Recall:     0.9914
Dev   AUC:        0.7248
Dev   AP:         0.2031
Dev   Precision:  0.1930
Dev   Recall:     0.6143
-----  tfidf  -----
{'alpha': 0.17693089816649998}
Train AUC:        0.9678
Train AP:         0.9810
Train Precision:  0.9831
Train Recall:     0.9428
Dev   AUC:        0.7499
Dev   AP:         0.2294
Dev   Precision:  0.1920
Dev   Recall:     0.7297
-----  binary  -----
{'alpha': 0.2733556118022408}
Train AUC:        0.9991
Train AP:         0.9991
Train Precision:  0.9837
Train Recall:     0.9916
Dev   AUC:        0.7250
Dev   AP:         0.2032
Dev   Precision:  0.1905
Dev   Recall:     0.6225
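The `alpha` being tuned is the additive smoothing parameter of `MultinomialNB`. As a sanity check, the sketch below uses a tiny hypothetical count matrix (not the project data) to confirm that the smoothed estimate $(N_{yi} + \alpha) / (N_y + \alpha n)$ reproduces the fitted `feature_log_prob_`, where $N_{yi}$ is the count of feature $i$ in class $y$, $N_y$ is the total count in class $y$, and $n$ is the number of features.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical document-term matrix: 4 documents, 3 words
X = np.array([[2, 1, 0],
              [1, 0, 1],
              [0, 3, 1],
              [0, 1, 2]])
y = np.array([0, 0, 1, 1])
alpha = 0.5

clf = MultinomialNB(alpha=alpha).fit(X, y)

# Per-class feature counts N_yi, one row per class
N_yi = np.vstack([X[y == c].sum(axis=0) for c in (0, 1)])

# Smoothed estimate: (N_yi + alpha) / (N_y + alpha * n_features)
theta = (N_yi + alpha) / (N_yi.sum(axis=1, keepdims=True) + alpha * X.shape[1])

print(np.allclose(np.log(theta), clf.feature_log_prob_))  # True
```

A small `alpha` keeps the estimates close to the raw training frequencies, which may contribute to the large gap between train and dev scores.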


In [5]:
# Looking at a sample confusion matrix
# https://stats.stackexchange.com/questions/95209/how-can-i-interpret-sklearn-confusion-matrix
y_true = pd.Series(dev_labels)
y_pred = pd.Series(nb_dev)

pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)

# ~62% of positive cases (2271 of 3648) classified correctly, ~70% of negative cases (22618 of 32270) classified correctly

| True \ Predicted | 0 | 1 | All |
|---|---|---|---|
| 0 | 22618 | 9652 | 32270 |
| 1 | 1377 | 2271 | 3648 |
| All | 23995 | 11923 | 35918 |
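The crosstab counts can be turned into rates by hand. The short sketch below copies the four cell counts from the table above; the variable names are ours.

```python
# Counts copied from the dev-set crosstab (rows = true class, columns = predicted)
tn, fp = 22618, 9652
fn, tp = 1377, 2271

recall = tp / (tp + fn)        # share of true positives that were found
precision = tp / (tp + fp)     # share of predicted positives that are correct
specificity = tn / (tn + fp)   # share of true negatives that were found

print(f'Recall:      {recall:.4f}')       # 0.6225
print(f'Precision:   {precision:.4f}')    # 0.1905
print(f'Specificity: {specificity:.4f}')  # 0.7009
```

The recall and precision agree with the dev metrics printed for the binary vectorizer, the last one fitted in the loop, so the `nb_dev` used in the crosstab comes from that model.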


### Bernoulli

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with Naive Bayes – Which Naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

In [4]:
from sklearn.naive_bayes import BernoulliNB

vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data('train', vectorizer)
    dev = get_data('dev', vectorizer)

    nb_bern = BernoulliNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_bern, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    print(random_search.best_params_)
    
    nb_train_auc = roc_auc_score(train_labels, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels, nb_train)
    nb_train_prec = precision_score(train_labels, nb_train)
    nb_dev_auc = roc_auc_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels, nb_dev)
    nb_dev_prec = precision_score(dev_labels, nb_dev)


    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
{'alpha': 0.0007614091062416327}
Train AUC:        0.9899
Train AP:         0.9801
Train Precision:  0.9518
Train Recall:     1.0000
Dev   AUC:        0.6734
Dev   AP:         0.1608
Dev   Precision:  0.1442
Dev   Recall:     0.7733
-----  tfidf  -----
{'alpha': 0.0007614091062416327}
Train AUC:        0.9899
Train AP:         0.9801
Train Precision:  0.9518
Train Recall:     1.0000
Dev   AUC:        0.6734
Dev   AP:         0.1608
Dev   Precision:  0.1442
Dev   Recall:     0.7733
-----  binary  -----
{'alpha': 0.0007614091062416327}
Train AUC:        0.9899
Train AP:         0.9801
Train Precision:  0.9518
Train Recall:     1.0000
Dev   AUC:        0.6734
Dev   AP:         0.1608
Dev   Precision:  0.1442
Dev   Recall:     0.7733
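The three vectorizers give identical Bernoulli results. This is consistent with the `binarize` parameter of `BernoulliNB` (default `0.0`), which maps every feature value greater than the threshold to 1 before fitting and predicting. Assuming the count, tf-idf, and binary matrices share the same vocabulary and the same nonzero pattern, they collapse to the same presence/absence matrix. A minimal sketch on toy data (hypothetical, not the project data):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(22)
X_count = rng.integers(0, 4, size=(100, 10))   # toy word counts
X_binary = (X_count > 0).astype(int)           # presence/absence version
y = rng.integers(0, 2, size=100)

# binarize=0.0 (the default) thresholds both inputs to the same 0/1 matrix
proba_count = BernoulliNB(alpha=0.5).fit(X_count, y).predict_proba(X_count)
proba_binary = BernoulliNB(alpha=0.5).fit(X_binary, y).predict_proba(X_binary)

print(np.allclose(proba_count, proba_binary))  # True
```

In practice this means the Bernoulli model only needs to be run on one of the three feature sets.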


## Non-downsampled data explorations

### Complement NB
The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. [sklearn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.ComplementNB.html#sklearn.naive_bayes.ComplementNB)
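A minimal sketch of how `ComplementNB` can be compared against `MultinomialNB` on imbalanced data. Everything about the data is an assumption made for illustration: 900 negatives, 100 positives, Poisson word counts, and a slightly higher rate on the first five features for the positive class. It says nothing about how the two models behave on the project data.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.naive_bayes import ComplementNB, MultinomialNB

rng = np.random.default_rng(22)
n_neg, n_pos, n_feat = 900, 100, 30

# Hypothetical toy counts: positives use the first 5 "words" more often
X_neg = rng.poisson(1.0, size=(n_neg, n_feat))
X_pos = rng.poisson(1.0, size=(n_pos, n_feat))
X_pos[:, :5] += rng.poisson(1.0, size=(n_pos, 5))

X = np.vstack([X_neg, X_pos])
y = np.array([0] * n_neg + [1] * n_pos)

# Fit both models with the same smoothing and compare recall on the minority class
recalls = {}
for model in (MultinomialNB(alpha=0.1), ComplementNB(alpha=0.1)):
    pred = model.fit(X, y).predict(X)
    recalls[type(model).__name__] = recall_score(y, pred)
    print(f'{type(model).__name__}: recall={recalls[type(model).__name__]:.3f}')
```

Both estimators expose the same `fit`, `predict`, and `predict_proba` interface, so `ComplementNB` can be dropped into the same randomized search used for the other models.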

<b>This section uses non-downsampled data, generated by skipping downsample.ipynb, editing the file names, and re-running vectorize-count.ipynb and concat-features.ipynb.</b>

In [6]:
def get_data_not_ds(dataset, vectorizer):
    '''
    get data that is not downsampled
    @param dataset: string specifying dataset, "train","dev",etc
    @param vectorizer: string specifying vectorizer "binary","count",etc

    '''
    with open(f'../data/{dataset}_{vectorizer}_NOT_downsampled_data.pckl', 'rb') as f:
        return pickle.load(f)

In [7]:
# Importing labels NOT DOWNSAMPLED

with open('../data/train_labels_nods.pckl', 'rb') as f:
    train_labels_not_ds = pickle.load(f)

with open('../data/dev_labels_nods.pckl', 'rb') as f:
    dev_labels_not_ds = pickle.load(f)

In [10]:
from sklearn.naive_bayes import ComplementNB

vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data_not_ds('train', vectorizer)
    dev = get_data_not_ds('dev', vectorizer)

    nb_comp = ComplementNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_comp, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels_not_ds)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    print(random_search.best_params_)
    
    nb_train_auc = roc_auc_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels_not_ds, nb_train)
    nb_train_prec = precision_score(train_labels_not_ds, nb_train, zero_division=0)
    nb_dev_auc = roc_auc_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels_not_ds, nb_dev)
    nb_dev_prec = precision_score(dev_labels_not_ds, nb_dev, zero_division=0)
    
    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
{'alpha': 0.2733556118022408}
Train AUC:        0.9976
Train AP:         0.9854
Train Precision:  0.9894
Train Recall:     0.9120
Dev   AUC:        0.7008
Dev   AP:         0.2143
Dev   Precision:  0.4737
Dev   Recall:     0.0148
-----  tfidf  -----
{'alpha': 0.2733556118022408}
Train AUC:        0.8328
Train AP:         0.5154
Train Precision:  1.0000
Train Recall:     0.0222
Dev   AUC:        0.7182
Dev   AP:         0.2085
Dev   Precision:  0.0000
Dev   Recall:     0.0000
-----  binary  -----
{'alpha': 0.2733556118022408}
Train AUC:        0.9977
Train AP:         0.9859
Train Precision:  0.9909
Train Recall:     0.9101
Dev   AUC:        0.7016
Dev   AP:         0.2142
Dev   Precision:  0.4673
Dev   Recall:     0.0137


### Trying the other two NB models with non-downsampled data
Checking whether downsampling worsens performance with the other models too.
#### Bernoulli

In [11]:
# Bernoulli with raw (non-downsampled) data
vectorizers = ['count', 'tfidf', 'binary'] # 'hashing', 'hashing_binary'

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data_not_ds('train', vectorizer)
    dev = get_data_not_ds('dev', vectorizer)

    nb_bern = BernoulliNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_bern, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels_not_ds)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)

    nb_train_auc = roc_auc_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels_not_ds, nb_train)
    nb_train_prec = precision_score(train_labels_not_ds, nb_train, zero_division=0)
    nb_dev_auc = roc_auc_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels_not_ds, nb_dev)
    nb_dev_prec = precision_score(dev_labels_not_ds, nb_dev, zero_division=0)
    
    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
Train AUC:        0.9844
Train AP:         0.9115
Train Precision:  0.9786
Train Recall:     0.5183
Dev   AUC:        0.7127
Dev   AP:         0.2142
Dev   Precision:  0.5641
Dev   Recall:     0.0060
-----  tfidf  -----
Train AUC:        0.9844
Train AP:         0.9115
Train Precision:  0.9786
Train Recall:     0.5183
Dev   AUC:        0.7127
Dev   AP:         0.2142
Dev   Precision:  0.5641
Dev   Recall:     0.0060
-----  binary  -----
Train AUC:        0.9844
Train AP:         0.9115
Train Precision:  0.9786
Train Recall:     0.5183
Dev   AUC:        0.7127
Dev   AP:         0.2142
Dev   Precision:  0.5641
Dev   Recall:     0.0060


#### Multinomial

In [13]:
from sklearn.naive_bayes import MultinomialNB

# Multinomial with raw (non-downsampled) data
vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data_not_ds('train', vectorizer)
    dev = get_data_not_ds('dev', vectorizer)

    nb_multi = MultinomialNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_multi, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels_not_ds)

    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    nb_train_auc = roc_auc_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels_not_ds, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels_not_ds, nb_train)
    nb_train_prec = precision_score(train_labels_not_ds, nb_train, zero_division=0)
    nb_dev_auc = roc_auc_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels_not_ds, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels_not_ds, nb_dev)
    nb_dev_prec = precision_score(dev_labels_not_ds, nb_dev, zero_division=0)
    
    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
Train AUC:        0.9976
Train AP:         0.9855
Train Precision:  0.9934
Train Recall:     0.8907
Dev   AUC:        0.7008
Dev   AP:         0.2143
Dev   Precision:  0.5172
Dev   Recall:     0.0123
-----  tfidf  -----
Train AUC:        0.8794
Train AP:         0.6699
Train Precision:  1.0000
Train Recall:     0.0357
Dev   AUC:        0.7231
Dev   AP:         0.2141
Dev   Precision:  1.0000
Dev   Recall:     0.0003
-----  binary  -----
Train AUC:        0.9977
Train AP:         0.9860
Train Precision:  0.9938
Train Recall:     0.8888
Dev   AUC:        0.7016
Dev   AP:         0.2141
Dev   Precision:  0.5244
Dev   Recall:     0.0118


## Upsampled data explorations

Labels and vectorizers were created using Upsample.ipynb.

In [14]:
# Importing labels
with open('../data/train_labels_us.pckl', 'rb') as f:
    train_labels = pickle.load(f)

with open('../data/dev_labels_us.pckl', 'rb') as f:
    dev_labels = pickle.load(f)

In [15]:
def get_data(dataset, vectorizer):
    '''
    returns feature matrix for specified dataset and vectorizer
    @param dataset: string specifying dataset, "train", "dev", etc
    @param vectorizer: string specifying vectorizer "binary", "count", etc

    '''
    with open(f'../data/{dataset}_{vectorizer}_subsampled_data_us.pckl', 'rb') as f:
        return pickle.load(f)

In [16]:
from sklearn.naive_bayes import MultinomialNB

vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data('train', vectorizer)
    dev = get_data('dev', vectorizer)

    nb_multi = MultinomialNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_multi, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    print(random_search.best_params_)
        
    nb_train_auc = roc_auc_score(train_labels, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels, nb_train)
    nb_train_prec = precision_score(train_labels, nb_train)
    nb_dev_auc = roc_auc_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels, nb_dev)
    nb_dev_prec = precision_score(dev_labels, nb_dev)
    
    
    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9995
Train AP:         0.9993
Train Precision:  0.9913
Train Recall:     0.9965
Dev   AUC:        0.6769
Dev   AP:         0.1916
Dev   Precision:  0.2964
Dev   Recall:     0.0800
-----  tfidf  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9992
Train AP:         0.9993
Train Precision:  0.9917
Train Recall:     0.9903
Dev   AUC:        0.7199
Dev   AP:         0.2120
Dev   Precision:  0.2963
Dev   Recall:     0.0973
-----  binary  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9995
Train AP:         0.9993
Train Precision:  0.9913
Train Recall:     0.9965
Dev   AUC:        0.6758
Dev   AP:         0.1913
Dev   Precision:  0.2958
Dev   Recall:     0.0800


In [18]:
from sklearn.naive_bayes import BernoulliNB

vectorizers = ['count', 'tfidf', 'binary']

# specify parameters and distributions to sample from
param_dist = {'alpha': loguniform(1e-4, 1e0)}

for vectorizer in vectorizers:
    print('----- ', vectorizer, ' -----')
    train = get_data('train', vectorizer)
    dev = get_data('dev', vectorizer)

    nb_bern = BernoulliNB()  
        
    # run randomized search
    random_search = RandomizedSearchCV(nb_bern, param_distributions=param_dist, random_state=22)
    
    random_search.fit(train, train_labels)
    
    nb_train = random_search.predict(train)
    nb_dev = random_search.predict(dev)
    nb_train_proba = random_search.predict_proba(train)
    nb_dev_proba = random_search.predict_proba(dev)
    
    print(random_search.best_params_)
    
    nb_train_auc = roc_auc_score(train_labels, nb_train_proba[:, 1])
    nb_train_ap = average_precision_score(train_labels, nb_train_proba[:, 1])
    nb_train_recall = recall_score(train_labels, nb_train)
    nb_train_prec = precision_score(train_labels, nb_train)
    nb_dev_auc = roc_auc_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_ap = average_precision_score(dev_labels, nb_dev_proba[:, 1])
    nb_dev_recall = recall_score(dev_labels, nb_dev)
    nb_dev_prec = precision_score(dev_labels, nb_dev)

    print(f'Train AUC:        {nb_train_auc:.4f}\n'
          f'Train AP:         {nb_train_ap:.4f}\n'
          f'Train Precision:  {nb_train_prec:.4f}\n'
          f'Train Recall:     {nb_train_recall:.4f}\n'
          f'Dev   AUC:        {nb_dev_auc:.4f}\n'
          f'Dev   AP:         {nb_dev_ap:.4f}\n'
          f'Dev   Precision:  {nb_dev_prec:.4f}\n'
          f'Dev   Recall:     {nb_dev_recall:.4f}')

-----  count  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9900
Train AP:         0.9804
Train Precision:  0.9550
Train Recall:     1.0000
Dev   AUC:        0.6658
Dev   AP:         0.1764
Dev   Precision:  0.1985
Dev   Recall:     0.3152
-----  tfidf  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9900
Train AP:         0.9804
Train Precision:  0.9550
Train Recall:     1.0000
Dev   AUC:        0.6658
Dev   AP:         0.1764
Dev   Precision:  0.1985
Dev   Recall:     0.3152
-----  binary  -----
{'alpha': 0.00048377811104736193}
Train AUC:        0.9900
Train AP:         0.9804
Train Precision:  0.9550
Train Recall:     1.0000
Dev   AUC:        0.6658
Dev   AP:         0.1764
Dev   Precision:  0.1985
Dev   Recall:     0.3152
