# InferSent
[Facebook's InferSent](https://github.com/facebookresearch/InferSent) is described as follows:
> InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language inference data and generalizes well to many different tasks.


In [1]:
import pandas as pd
from data_loader import DataLoader

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import LinearSVC

#### Loading the InferSent data through the `DataLoader` class

In [2]:
data = DataLoader()
infersent = data.get_infersent()

infersent['train'].head()

| id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4087 | 4088 | 4089 | 4090 | 4091 | 4092 | 4093 | 4094 | 4095 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2635.json | 0.020727 | -0.029814 | 0.067043 | 0.031574 | 0.001203 | -0.02519 | -0.012615 | -0.008706 | -0.030712 | -0.029765 | ... | 0.014876 | -0.001332 | -0.022097 | 0.000505 | 0.005025 | -0.024432 | -0.041104 | -0.00621 | -0.009475 | false |
| 10540.json | 0.032673 | -0.018673 | 0.061216 | 0.031574 | 0.001203 | -0.007593 | -0.021861 | -0.008706 | -0.031638 | -0.029765 | ... | 0.014876 | -0.001332 | -0.022097 | 0.011968 | 0.007136 | -0.024432 | -0.041104 | -0.00621 | -0.000431 | half-true |
| 324.json | 0.014887 | -0.019057 | 0.05771 | 0.031574 | 0.001203 | -0.022052 | -0.000649 | -0.008706 | -0.031638 | -0.029765 | ... | 0.014876 | -0.001332 | -0.022097 | 0.00284 | 0.006011 | -0.024432 | -0.041104 | -0.00621 | -0.011754 | mostly-true |
| 1123.json | 0.017577 | -0.024319 | 0.069689 | 0.031574 | 0.001203 | -0.023412 | -0.015143 | -0.008706 | -0.023898 | -0.029765 | ... | 0.014876 | -0.001332 | -0.022097 | 0.000505 | 0.005025 | -0.024432 | -0.041104 | -0.00621 | 0.001519 | false |
| 9028.json | 0.032344 | -0.023073 | 0.077455 | 0.031574 | 0.006226 | -0.02519 | -0.008017 | -0.008706 | -0.031638 | -0.029765 | ... | 0.014876 | -0.001332 | -0.022097 | 0.000505 | 0.005025 | -0.024432 | -0.041104 | -0.00621 | -0.010983 | half-true |


In [3]:
X_train, y_train =  infersent['train'].drop(['label'], axis = 1), infersent['train']['label']
X_test, y_test = infersent['test'].drop(['label'], axis = 1), infersent['test']['label']
X_validation, y_validation = infersent['validation'].drop(['label'], axis = 1), infersent['validation']['label']
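
The three splits above follow the same pattern: drop the `label` column to get the features, and keep it as the target. As a minimal sketch, that pattern could be factored into a helper. Note that `split_features_labels` is a hypothetical name, and the small random frame below is only a stand-in for one split returned by `data.get_infersent()` (assumed here to hold numeric embedding columns plus a string `label` column, as in the table above).

```python
import numpy as np
import pandas as pd


def split_features_labels(frame, label_col='label'):
    """Return (X, y): every column except label_col, and label_col itself."""
    return frame.drop(columns=[label_col]), frame[label_col]


# Synthetic stand-in for one data split (assumption: numeric embedding
# columns plus a string 'label' column; the real frame has 4096 features).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(5, 8)))
toy['label'] = ['false', 'half-true', 'mostly-true', 'false', 'half-true']

X_toy, y_toy = split_features_labels(toy)
print(X_toy.shape, y_toy.shape)
```

With the real data, the same call would be applied to `infersent['train']`, `infersent['test']`, and `infersent['validation']` in turn.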

<hr>
## Applying InferSent
### Logistic Regression

In [6]:
clf = make_pipeline(GridSearchCV(LogisticRegression(random_state = 2),
                     param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]},
                     cv = 10,
                     refit = True))

# Fit grid search
best_model = clf.fit(X_train, y_train)

# Extract the best hyperparameters
print('Best Penalty:', best_model.get_params()['gridsearchcv'].best_estimator_.get_params()['penalty'])
print('Best C:', best_model.get_params()['gridsearchcv'].best_estimator_.get_params()['C'])
print('Test score accuracy:', best_model.score(X_test, y_test))
print('Validation score accuracy:', best_model.score(X_validation, y_validation))

Best Penalty: l2
Best C: 10
Test score accuracy: 0.24388318863456984
Validation score accuracy: 0.2468847352024922
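
The cell above reaches the tuned model through `get_params()['gridsearchcv']` because the search is wrapped in a one-step pipeline. As a hedged alternative, the same information can be read directly from a fitted `GridSearchCV` through its `best_params_`, `best_score_`, and `best_estimator_` attributes. The sketch below uses synthetic data from `make_classification` as a stand-in for the InferSent features (an assumption made for illustration), and a smaller grid so it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic multi-class data standing in for the InferSent features (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=10, n_classes=3,
                                     random_state=2)

search = GridSearchCV(LogisticRegression(random_state=2, max_iter=1000),
                      param_grid={'C': [0.01, 0.1, 1, 10]},
                      cv=5,
                      refit=True)
search.fit(X_demo, y_demo)

# best_params_ holds the winning grid point, best_score_ its mean CV score,
# and best_estimator_ the model refit on all of the training data.
print('Best C:', search.best_params_['C'])
print('Mean CV accuracy for best C:', search.best_score_)
print('C of refit model:', search.best_estimator_.C)
```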


### SVM

In [5]:
clf = make_pipeline(GridSearchCV(LinearSVC(random_state = 2),
                     param_grid = {'C': [0.001, 0.01, 0.1, 1, 10]},
                     cv = 10,
                     refit = True,
                     scoring = 'accuracy',
                     verbose = 10))

# Fit grid search
best_model = clf.fit(X_train, y_train)

# Extract the best hyperparameters
print('Best Penalty:', best_model.get_params()['gridsearchcv'].best_estimator_.get_params()['penalty'])
print('Best C:', best_model.get_params()['gridsearchcv'].best_estimator_.get_params()['C'])
print('Test score accuracy:', best_model.score(X_test, y_test))
print('Validation score accuracy:', best_model.score(X_validation, y_validation))

Fitting 10 folds for each of 6 candidates, totalling 60 fits
[CV] C=0.001 .........................................................
[CV] ............... C=0.001, score=0.22687439143135346, total=   4.5s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    4.7s remaining:    0.0s
[CV] ............... C=0.001, score=0.21226874391431352, total=   4.3s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    9.2s remaining:    0.0s
[CV] ............... C=0.001, score=0.22514619883040934, total=   4.4s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   13.7s remaining:    0.0s
[CV] ............... C=0.001, score=0.22417153996101363, total=   4.4s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   18.3s remaining:    0.0s
[CV] ...................... C=0.001, score=0.2294921875, total=   4.3s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   22.8s remaining:    0.0s
[CV] ............... C=0.001, score=0.22580645161290322, total=   4.5s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   27.5s remaining:    0.0s
[CV] ................ C=0.001, score=0.2299412915851272, total=   4.3s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   32.0s remaining:    0.0s
[CV] ............... C=0.001, score=0.21330724070450097, total=   4.3s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   36.5s remaining:    0.0s
[CV] ............... C=0.001, score=0.22211350293542073, total=   4.3s
[CV] C=0.001 .........................................................
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   40.9s remaining:    0.0s
[CV] ............... C=0.001, score=0.22820763956904996, total=   4.4s
[CV] C=0.01 ..........................................................
[CV] ................. C=0.01, score=0.2385589094449854, total=   5.3s
[CV] C=0.01 ..........................................................
[CV] ................ C=0.01, score=0.22590068159688412, total=   5.4s
[CV] C=0.01 ..........................................................
[CV] ................ C=0.01, score=0.23976608187134502, total=   5.3s
[CV] C=0.01 ..........................................................
[CV] ................ C=0.01, score=0.23976608187134502, total=   5.3s
[CV] C=0.01 ..........................................................
[CV] ....................... C=0.01, score=0.2548828125, total=   5.4s
[CV] C=0.01 ..........................................................
[CV] ................ C=0.01, score=0.25317693059628543, total=   5.3s
[CV] C=0.01 ..........................................................
[CV] .

KeyboardInterrupt: 
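
The verbose log above prints one accuracy score per fold, which is hard to compare across values of `C` by eye. Once a grid search runs to completion, the same per-candidate information is available in aggregated form through the `cv_results_` attribute. The sketch below illustrates this on synthetic data from `make_classification` (a stand-in, since the run above was interrupted before finishing) with a reduced grid and fewer folds, both chosen only to keep the example fast.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic multi-class data standing in for the InferSent features (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=10, n_classes=3,
                                     random_state=2)

search = GridSearchCV(LinearSVC(random_state=2),
                      param_grid={'C': [0.001, 0.01, 0.1]},
                      cv=5,
                      scoring='accuracy')
search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per grid candidate;
# a DataFrame gives one row per value of C.
summary = pd.DataFrame(search.cv_results_)[
    ['param_C', 'mean_test_score', 'std_test_score', 'rank_test_score']
]
print(summary.sort_values('rank_test_score'))
```

Each row gives the mean and standard deviation of the fold accuracies for one value of `C`, so candidates can be ranked without reading the fold-by-fold log.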