# ☣️ Jigsaw - Early Ensemble



# If it is interesting, you like it, or you fork...
# 🙌🙏 Please, _DO_ upvote !! 🙏🙌


---


## Simple weighted sum of the following 7 public models:

It scored much higher on the public LB than I expected, so it may be overfitting the public leaderboard badly. Keep that in mind before relying on it.

|Number |Model| Author/s| LB |
|--|--|--|--:|
|1 |[[0.816] Jigsaw Inference](https://www.kaggle.com/debarshichanda/0-816-jigsaw-inference)| [Debarshi Chanda](https://www.kaggle.com/debarshichanda)|`0.816`|
|2 |[JRSoTC - RidgeRegression (ensemble of 3)](https://www.kaggle.com/adityasharma01/jrsotc-ridgeregression-ensemble-of-3)|[steubk](https://www.kaggle.com/steubk) / [Aditya Sharma](https://www.kaggle.com/adityasharma01/) |`0.825`|
|3 | [Pytorch RoBERTa Ranking Baseline JRSTC [Infer]](https://www.kaggle.com/manabendrarout/pytorch-roberta-ranking-baseline-jrstc-infer) | [Manav](https://www.kaggle.com/manabendrarout)|`0.807` |
|4 | [JRSTC \| INFER \| LB : 0.806 🎃](https://www.kaggle.com/kishalmandal/jrstc-infer-lb-0-806)|[Kishal](https://www.kaggle.com/kishalmandal)|`0.806`|
|5 |[☣️ Jigsaw - 🤗 HF hub out-of-the-box models](https://www.kaggle.com/julian3833/jigsaw-hf-hub-out-of-the-box-models)|[dataista0 (Julián Peller)](https://www.kaggle.com/julian3833/) |`0.782`|
|6 |[☣️ Jigsaw - Incredibly Simple Naive Bayes [0.768]](https://www.kaggle.com/julian3833/jigsaw-incredibly-simple-naive-bayes-0-768)|[dataista0 (Julián Peller)](https://www.kaggle.com/julian3833/)|`0.768`|
|7 |[*#&@ the Benchmark [0.81+] - TFIDF - Ridge](https://www.kaggle.com/samarthagarwal23/the-benchmark-0-81-tfidf-ridge)|[Samarth Agarwal](https://www.kaggle.com/samarthagarwal23) |`0.812`|
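
One way to gauge the overfitting risk mentioned above is to score a blend on `validation_data.csv`, which holds `less_toxic` / `more_toxic` comment pairs, using the same pairwise accuracy that Model 2 prints as `val_acc`. The sketch below assumes scores have already been computed for both columns; `pairwise_accuracy` and the toy arrays are hypothetical names and values, not part of the original notebooks.

```python
import numpy as np

def pairwise_accuracy(less_toxic_scores, more_toxic_scores):
    """Fraction of pairs where the 'more toxic' comment gets the strictly higher score."""
    less = np.asarray(less_toxic_scores, dtype=float)
    more = np.asarray(more_toxic_scores, dtype=float)
    return float((less < more).mean())

# Toy scores for four annotated pairs (illustrative values only)
less = np.array([0.1, 0.4, 0.3, 0.9])
more = np.array([0.5, 0.2, 0.8, 0.95])

acc = pairwise_accuracy(less, more)
print(f"pairwise accuracy: {acc:.2f}")
```

Comparing this number across weight choices gives a second opinion that does not depend on the public leaderboard.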

---

## Changelog

|Best|Version | Description | LB |
|--|-- | -- | --: |
||V1 | Model 1 + Model 2. Linear weighting of MinMax-ed predictions | `0.823` |
||[**V2**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-829?scriptVersionId=80257580) | Model 1 + Model 2. Rank as sum of ranks | `0.828` |
|| [**V4**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-829?scriptVersionId=80260646) | Model 1 + 0.5 * Model 2. Combination of ranks (as V2) | `0.825` |
|| [**V5**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-829?scriptVersionId=80272370) | 7 models. Weighted combination of ranks. Weights=`[1, 1, 1, 1, 0.5, 0.5, 1]` | `0.829` |
|_Best_| [**V7**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-836?scriptVersionId=80279395) | 7 models (as V5). Weights=`[1.25, 1.5, 1.25, 1, 0.25, 0.25, 1]` | `0.836` |
|_Best_| [**V8**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-836?scriptVersionId=80279726) / [**Current**](https://www.kaggle.com/julian3833/jigsaw-early-ensemble-lb-0-836) | w=`[0.39, 0.39, 0.06, 0.06, 0.02, 0.02, 0.06]` | `0.836` |
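
The changelog switches between two ways of putting predictions on a common scale: MinMax scaling (V1) and ranks (V2 onwards). The sketch below, on made-up predictions from two hypothetical models, shows how the two can order the same comments differently when one model produces an outlier. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Toy predictions: model_a has one extreme outlier in the last row
preds = pd.DataFrame({
    "model_a": [0.1, 0.2, 0.3, 10.0],
    "model_b": [0.9, 0.1, 0.5, 0.2],
})
weights = {"model_a": 1.0, "model_b": 1.0}

# V1-style: MinMax-scale each column to [0, 1], then take the weighted sum
minmax = (preds - preds.min()) / (preds.max() - preds.min())
blend_minmax = sum(minmax[c] * w for c, w in weights.items())

# V2-style: replace each column by its ranks, then take the weighted sum
ranks = preds.rank()
blend_rank = sum(ranks[c] * w for c, w in weights.items())

print(blend_minmax.tolist())
print(blend_rank.tolist())
```

With MinMax scaling the outlier squeezes the other `model_a` values towards zero, so `model_b` dominates their ordering; with ranks every model contributes evenly spaced values.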


# Model 1: [[0.816] Jigsaw Inference](https://www.kaggle.com/debarshichanda/0-816-jigsaw-inference)
`LB=0.816`

In [None]:
%%time

import os
import gc
import cv2
import copy
import time
import random

# For data manipulation
import numpy as np
import pandas as pd

# Pytorch Imports
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# For Transformer Models
from transformers import AutoTokenizer, AutoModel

# Utils
from tqdm import tqdm

# For descriptive error messages
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

CONFIG = dict(
    seed = 42,
    model_name = '../input/roberta-base',
    test_batch_size = 64,
    max_length = 128,
    num_classes = 1,
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
)

CONFIG["tokenizer"] = AutoTokenizer.from_pretrained(CONFIG['model_name'])

MODEL_PATHS = [
    '../input/pytorch-w-b-jigsaw-starter/Loss-Fold-0.bin',
    '../input/pytorch-w-b-jigsaw-starter/Loss-Fold-1.bin',
    '../input/pytorch-w-b-jigsaw-starter/Loss-Fold-2.bin',
    '../input/pytorch-w-b-jigsaw-starter/Loss-Fold-3.bin',
    '../input/pytorch-w-b-jigsaw-starter/Loss-Fold-4.bin'
]

def set_seed(seed = 42):
    '''Sets the seed of the entire notebook so results are the same every time we run.
    This is for REPRODUCIBILITY.'''
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # When running on the CuDNN backend, two further options must be set
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Set a fixed value for the hash seed
    os.environ['PYTHONHASHSEED'] = str(seed)
    
    
class JigsawDataset(Dataset):
    def __init__(self, df, tokenizer, max_length):
        self.df = df
        self.max_len = max_length
        self.tokenizer = tokenizer
        self.text = df['text'].values
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        text = self.text[index]
        inputs = self.tokenizer.encode_plus(
                        text,
                        truncation=True,
                        add_special_tokens=True,
                        max_length=self.max_len,
                        padding='max_length'
                    )
        
        ids = inputs['input_ids']
        mask = inputs['attention_mask']        
        
        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long)
        }    

    
class JigsawModel(nn.Module):
    def __init__(self, model_name):
        super(JigsawModel, self).__init__()
        self.model = AutoModel.from_pretrained(model_name)
        self.drop = nn.Dropout(p=0.2)
        self.fc = nn.Linear(768, CONFIG['num_classes'])
        
    def forward(self, ids, mask):        
        out = self.model(input_ids=ids,attention_mask=mask,
                         output_hidden_states=False)
        out = self.drop(out[1])
        outputs = self.fc(out)
        return outputs
    
@torch.no_grad()
def valid_fn(model, dataloader, device):
    model.eval()
    
    dataset_size = 0
    running_loss = 0.0
    
    PREDS = []
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        
        outputs = model(ids, mask)
        PREDS.append(outputs.view(-1).cpu().detach().numpy()) 
    
    PREDS = np.concatenate(PREDS)
    gc.collect()
    
    return PREDS


def inference(model_paths, dataloader, device):
    final_preds = []
    for i, path in enumerate(model_paths):
        model = JigsawModel(CONFIG['model_name'])
        model.to(CONFIG['device'])
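        # map_location lets the checkpoint load on CPU-only machines as well
        model.load_state_dict(torch.load(path, map_location=CONFIG['device']))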
        
        print(f"Getting predictions for model {i+1}")
        preds = valid_fn(model, dataloader, device)
        final_preds.append(preds)
    
    final_preds = np.array(final_preds)
    final_preds = np.mean(final_preds, axis=0)
    return final_preds


set_seed(CONFIG['seed'])
df = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
df.head()

test_dataset = JigsawDataset(df, CONFIG['tokenizer'], max_length=CONFIG['max_length'])
test_loader = DataLoader(test_dataset, batch_size=CONFIG['test_batch_size'],
                         num_workers=2, shuffle=False, pin_memory=True)

preds1 = inference(MODEL_PATHS, test_loader, CONFIG['device'])

In [None]:
preds1[:5]

# Model 2: [JRSoTC - RidgeRegression (ensemble of 3)](https://www.kaggle.com/adityasharma01/jrsotc-ridgeregression-ensemble-of-3)
`LB=0.825`

In [None]:
%%time
import pandas as pd
import numpy as np

from sklearn.linear_model import Ridge
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import mean_squared_error
from scipy.stats import rankdata

def ridge_cv (vec, X, y, X_test, folds, stratified ):
    # Use the `folds` argument instead of silently reading the global FOLDS
    kf = StratifiedKFold(n_splits=folds,shuffle=True,random_state=12)
    val_scores = []
    rmse_scores = []
    X_less_toxics = []
    X_more_toxics = []

    preds = []
    for fold, (train_index,val_index) in enumerate(kf.split(X,stratified)):
        X_train, y_train = X[train_index], y[train_index]
        X_val, y_val = X[val_index], y[val_index]
        model = Ridge()
        model.fit(X_train, y_train)

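        # RMSE on the held-out fold; np.sqrt avoids the `squared=False` argument,
        # which is deprecated in newer scikit-learn releases
        rmse_score = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))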
        rmse_scores.append (rmse_score)

        X_less_toxic = vec.transform(df_val['less_toxic'])
        X_more_toxic = vec.transform(df_val['more_toxic'])

        p1 = model.predict(X_less_toxic)
        p2 = model.predict(X_more_toxic)

        X_less_toxics.append ( p1 )
        X_more_toxics.append ( p2 )

        # Validation Accuracy
        val_acc = (p1< p2).mean()
        val_scores.append(val_acc)

        pred = model.predict (X_test)
        preds.append (pred)

        print(f"FOLD:{fold}, rmse_fold:{rmse_score:.5f}, val_acc:{val_acc:.5f}")

    mean_val_acc = np.mean (val_scores)
    mean_rmse_score = np.mean (rmse_scores)

    p1 = np.mean ( np.vstack(X_less_toxics), axis=0 )
    p2 = np.mean ( np.vstack(X_more_toxics), axis=0 )

    val_acc = (p1< p2).mean()

    print(f"OOF: val_acc:{val_acc:.5f}, mean val_acc:{mean_val_acc:.5f}, mean rmse_score:{mean_rmse_score:.5f}")
    
    preds = np.mean ( np.vstack(preds), axis=0 )
    
    return p1, p2, preds

toxic = 1.0
severe_toxic = 2.0
obscene = 1.0
threat = 1.0
insult = 1.0
identity_hate = 2.0

def create_train (df):
    df['y'] = df[["toxic","severe_toxic","obscene","threat","insult","identity_hate"]].max(axis=1)
    df['y'] = df["y"]+df['severe_toxic']*severe_toxic
    df['y'] = df["y"]+df['obscene']*obscene
    df['y'] = df["y"]+df['threat']*threat
    df['y'] = df["y"]+df['insult']*insult
    df['y'] = df["y"]+df['identity_hate']*identity_hate
    
    
    
    df = df[['comment_text', 'y', 'toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].rename(columns={'comment_text': 'text'})

    # Undersample non-toxic comments from the Toxic Comment Classification Challenge data
    min_len = (df['y'] >= 1).sum()
    df_y0_undersample = df[df['y'] == 0].sample(n=int(min_len*1.5),random_state=201)
    df = pd.concat([df[df['y'] >= 1], df_y0_undersample])
                                                
    return df
 

df_val = pd.read_csv("../input/jigsaw-toxic-severity-rating/validation_data.csv")
df_test = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")




jc_train_df = pd.read_csv("../input/jigsaw-toxic-comment-classification-challenge/train.csv")
print(f"jc_train_df:{jc_train_df.shape}")

jc_train_df = create_train(jc_train_df)
                           
df = jc_train_df
print(df['y'].value_counts())


FOLDS = 7

vec = TfidfVectorizer(analyzer='char_wb', max_df=0.5, min_df=3, ngram_range=(4, 6) )
X = vec.fit_transform(df['text'])
y = df["y"].values
X_test = vec.transform(df_test['text'])

stratified = np.around ( y )
jc_p1, jc_p2, jc_preds =  ridge_cv (vec, X, y, X_test, FOLDS, stratified )


juc_train_df = pd.read_csv("../input/jigsaw-unintended-bias-in-toxicity-classification/train.csv")
print(f"juc_train_df:{juc_train_df.shape}")
# .copy() so the column assignments below act on an independent frame, not a view
juc_train_df = juc_train_df.query ("toxicity_annotator_count > 5").copy()
print(f"juc_train_df:{juc_train_df.shape}")

juc_train_df['y'] = juc_train_df[[ 'severe_toxicity', 'obscene', 'sexual_explicit','identity_attack', 'insult', 'threat']].sum(axis=1)

# Keep `target` for comments at or below 0.5; otherwise keep the summed sub-scores
juc_train_df['y'] = np.where(juc_train_df['target'] <= 0.5, juc_train_df['target'], juc_train_df['y'])
juc_train_df = juc_train_df[['comment_text', 'y']].rename(columns={'comment_text': 'text'})
min_len = (juc_train_df['y'] > 0.5).sum()
df_y0_undersample = juc_train_df[juc_train_df['y'] <= 0.5].sample(n=int(min_len*1.5),random_state=201)
juc_train_df = pd.concat([juc_train_df[juc_train_df['y'] > 0.5], df_y0_undersample])

df = juc_train_df
print(df['y'].value_counts())

FOLDS = 7

vec = TfidfVectorizer(analyzer='char_wb', max_df=0.5, min_df=3, ngram_range=(4, 6) )
X = vec.fit_transform(df['text'])
y = df["y"].values
X_test = vec.transform(df_test['text'])

stratified = (np.around ( y, decimals = 1  )*10).astype(int)
juc_p1, juc_p2, juc_preds =  ridge_cv (vec, X, y, X_test, FOLDS, stratified )





rud_df = pd.read_csv("../input/ruddit-jigsaw-dataset/Dataset/ruddit_with_text.csv")
print(f"rud_df:{rud_df.shape}")
rud_df['y'] = rud_df['offensiveness_score'].map(lambda x: 0.0 if x <=0 else x)
rud_df = rud_df[['txt', 'y']].rename(columns={'txt': 'text'})
min_len = (rud_df['y'] < 0.5).sum()
print(rud_df['y'].value_counts())

FOLDS = 7
df = rud_df
vec = TfidfVectorizer(analyzer='char_wb', max_df=0.5, min_df=3, ngram_range=(4, 6) )
X = vec.fit_transform(df['text'])
y = df["y"].values
X_test = vec.transform(df_test['text'])

stratified = (np.around ( y, decimals = 1  )*10).astype(int)
rud_p1, rud_p2, rud_preds =  ridge_cv (vec, X, y, X_test, FOLDS, stratified )

jc_max = max(jc_p1.max() , jc_p2.max())
juc_max = max(juc_p1.max() , juc_p2.max())
rud_max = max(rud_p1.max() , rud_p2.max())


p1 = jc_p1/jc_max + juc_p1/juc_max + rud_p1/rud_max
p2 = jc_p2/jc_max + juc_p2/juc_max + rud_p2/rud_max

val_acc = (p1< p2).mean()
print(f"Ensemble: val_acc:{val_acc:.5f}")

preds2 = jc_preds/jc_max + juc_preds/juc_max + rud_preds/rud_max  

In [None]:
preds2[:5]
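
`StratifiedKFold` needs discrete labels, which is why the cell above rounds the continuous target before passing it as `stratified`. The sketch below reproduces that binning on synthetic data; the random target and feature matrix are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=200)   # toy continuous target in [0, 1]
X = rng.normal(size=(200, 3))         # toy features

# Same binning as above: round to one decimal, then map to integers 0..10
bins = (np.around(y, decimals=1) * 10).astype(int)

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12)
fold_means, val_sizes = [], []
for _, val_idx in kf.split(X, bins):
    # Each validation fold should see a similar spread of target values
    fold_means.append(y[val_idx].mean())
    val_sizes.append(len(val_idx))

print(np.round(fold_means, 3), val_sizes)
```

The regression itself still trains on the unrounded `y`; the bins only control how rows are distributed across folds.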

# Model 3: [Pytorch RoBERTa Ranking Baseline JRSTC [Infer]](https://www.kaggle.com/manabendrarout/pytorch-roberta-ranking-baseline-jrstc-infer)
`LB=0.807`

In [None]:
%%time
# Aesthetics
import warnings
import sklearn.exceptions
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings("ignore", category=sklearn.exceptions.UndefinedMetricWarning)

# General
from tqdm.auto import tqdm
from bs4 import BeautifulSoup
from collections import defaultdict
import pandas as pd
import numpy as np
import os
import re
import random
import gc
import glob
pd.set_option('display.max_columns', None)
np.seterr(divide='ignore', invalid='ignore')
gc.enable()

# Deep Learning
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import OneCycleLR
# NLP
from transformers import AutoTokenizer, AutoModel

# Random Seed Initialize
RANDOM_SEED = 42

def seed_everything(seed=RANDOM_SEED):
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    # benchmark=True lets cuDNN pick kernels non-deterministically, which defeats the seeding above
    torch.backends.cudnn.benchmark = False
    
seed_everything()

# Device Optimization
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
    
print(f'Using device: {device}')

data_dir = '../input/jigsaw-toxic-severity-rating'
models_dir = '../input/jrstc-models/roberta_base'
test_file_path = os.path.join(data_dir, 'comments_to_score.csv')
print(f'Test file: {test_file_path}')

test_df = pd.read_csv(test_file_path)

# Text Cleaning

def text_cleaning(text):
    '''
    Cleans text into a basic form for NLP. Operations include the following:
    1. Removes special characters like &, #, etc.
    2. Removes extra spaces
    3. Removes embedded URL links
    4. Removes HTML tags
    5. Removes emojis
    
    text - Text piece to be cleaned.
    '''
    template = re.compile(r'https?://\S+|www\.\S+') #Removes website links
    text = template.sub(r'', text)
    
    soup = BeautifulSoup(text, 'lxml') #Removes HTML tags
    only_text = soup.get_text()
    text = only_text
    
    emoji_pattern = re.compile("["
                               u"\U0001F600-\U0001F64F"  # emoticons
                               u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                               u"\U0001F680-\U0001F6FF"  # transport & map symbols
                               u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                               u"\U00002702-\U000027B0"
                               u"\U000024C2-\U0001F251"
                               "]+", flags=re.UNICODE)
    text = emoji_pattern.sub(r'', text)
    
    text = re.sub(r"[^a-zA-Z\d]", " ", text) #Remove special characters
    text = re.sub(' +', ' ', text) #Remove Extra Spaces
    text = text.strip() # remove spaces at the beginning and at the end of string

    return text

tqdm.pandas()
test_df['text'] = test_df['text'].progress_apply(text_cleaning)

test_df.sample(10)

# CFG

params = {
    'device': device,
    'debug': False,
    'checkpoint': '../input/roberta-base',
    'output_logits': 768,
    'max_len': 256,
    'batch_size': 32,
    'dropout': 0.2,
    'num_workers': 2
}

if params['debug']:
    # Inference-only cell: subsample the test set (train_df is never defined here)
    test_df = test_df.sample(frac=0.01).reset_index(drop=True)
    print('Reduced test data size for debugging purposes')

# Dataset

class BERTDataset:
    def __init__(self, text, max_len=params['max_len'], checkpoint=params['checkpoint']):
        self.text = text
        self.max_len = max_len
        self.checkpoint = checkpoint
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.num_examples = len(self.text)

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        text = str(self.text[idx])

        tokenized_text = self.tokenizer(
            text,
            add_special_tokens=True,
            truncation=True,
            padding='max_length',
            max_length=self.max_len,
            return_attention_mask=True,
            return_token_type_ids=True,
        )

        ids = tokenized_text['input_ids']
        mask = tokenized_text['attention_mask']
        token_type_ids = tokenized_text['token_type_ids']

        return {'ids': torch.tensor(ids, dtype=torch.long),
                'mask': torch.tensor(mask, dtype=torch.long),
                'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long)}

# NLP Model

class ToxicityModel(nn.Module):
    def __init__(self, checkpoint=params['checkpoint'], params=params):
        super(ToxicityModel, self).__init__()
        self.checkpoint = checkpoint
        self.bert = AutoModel.from_pretrained(checkpoint, return_dict=False)
        self.layer_norm = nn.LayerNorm(params['output_logits'])
        self.dropout = nn.Dropout(params['dropout'])
        self.dense = nn.Sequential(
            nn.Linear(params['output_logits'], 256),
            nn.LeakyReLU(negative_slope=0.01),
            nn.Dropout(params['dropout']),
            nn.Linear(256, 1)
        )

    def forward(self, input_ids, token_type_ids, attention_mask):
        _, pooled_output = self.bert(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
        pooled_output = self.layer_norm(pooled_output)
        pooled_output = self.dropout(pooled_output)
        preds = self.dense(pooled_output)
        return preds

# Prediction

predictions_nn = None
for model_name in glob.glob(models_dir + '/*.pth'):
    model = ToxicityModel()
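    # map_location lets the checkpoint load on CPU-only machines as well
    model.load_state_dict(torch.load(model_name, map_location=params['device']))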
    model = model.to(params['device'])
    model.eval()

    test_dataset = BERTDataset(
        text = test_df['text'].values
    )
    test_loader = DataLoader(
        test_dataset, batch_size=params['batch_size'],
        shuffle=False, num_workers=params['num_workers'],
        pin_memory=True
    )

    temp_preds = None
    with torch.no_grad():
        for batch in tqdm(test_loader, desc=f'Predicting. '):
            ids= batch['ids'].to(device)
            mask = batch['mask'].to(device)
            token_type_ids = batch['token_type_ids'].to(device)
            predictions = model(ids, token_type_ids, mask).to('cpu').numpy()
            
            if temp_preds is None:
                temp_preds = predictions
            else:
                temp_preds = np.vstack((temp_preds, predictions))

    if predictions_nn is None:
        predictions_nn = temp_preds
    else:
        predictions_nn += temp_preds
        
predictions_nn /= (len(glob.glob(models_dir + '/*.pth')))

preds3 = predictions_nn
preds3 = preds3.squeeze(-1)

In [None]:
preds3[:5]

# Model 4: [JRSTC | INFER | LB : 0.806 🎃](https://www.kaggle.com/kishalmandal/jrstc-infer-lb-0-806)
`LB=0.806`

In [None]:
%%time
import os
import gc
import copy
import time
import random
import string

import nltk
from nltk.stem import SnowballStemmer, WordNetLemmatizer
import re
from nltk.corpus import stopwords

from tqdm import tqdm
from collections import defaultdict

import torch
import torch.nn as nn

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from transformers import AdamW, get_linear_schedule_with_warmup
from transformers import AutoTokenizer, AutoModel
from torch.utils.data import Dataset, DataLoader
from torch.optim import lr_scheduler

class Config:
    num_classes=1
    epochs=10
    margin=0.5
    model_name = '../input/robertalarge'
    batch_size = 8
    lr = 1e-4
    weight_decay=0.01
    scheduler = 'CosineAnnealingLR'
    max_length = 196
    accumulation_step = 1
    patience = 1

class ToxicDataset(Dataset):
    def __init__(self, comments, tokenizer, max_length):
        self.comment = comments
        self.tokenizer = tokenizer
        self.max_len = max_length
        
    def __len__(self):
        return len(self.comment)
    
    def __getitem__(self, idx):

        inputs_more_toxic = self.tokenizer.encode_plus(
                                self.comment[idx],
                                truncation=True,
                                add_special_tokens=True,
                                max_length=self.max_len,
                                padding='max_length'
                            )

        
        input_ids = inputs_more_toxic['input_ids']
        attention_mask = inputs_more_toxic['attention_mask']

       
        return {
            'input_ids': torch.tensor(input_ids, dtype=torch.long),
            'attention_mask': torch.tensor(attention_mask, dtype=torch.long),
        }
       

class ToxicModel(nn.Module):
    def __init__(self, model_name, args):
        super(ToxicModel, self).__init__()
        self.args = args
        self.model = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(p=0.2)
        self.output = nn.Linear(1024, self.args.num_classes)
    
        
    def forward(self, toxic_ids, toxic_mask):
        
        out = self.model(
            input_ids=toxic_ids,
            attention_mask=toxic_mask,
            output_hidden_states=False
        )
        
        out = self.dropout(out[1])
        outputs = self.output(out)

        return outputs
        

def get_predictions(model, dataloader):
    model.eval()
    
    PREDS=[]
    with torch.no_grad():
        bar = tqdm(enumerate(dataloader), total=len(dataloader))
        for step, data in bar:        
            input_ids = data['input_ids'].cuda()
            attention_mask = data['attention_mask'].cuda()

            outputs = model(input_ids, attention_mask)

            PREDS.append(outputs.view(-1).cpu().detach().numpy())

            bar.set_postfix(Stage='Inference')  
        
        PREDS = np.hstack(PREDS)
        gc.collect()

        return PREDS

df = pd.read_csv('../input/jigsaw-toxic-severity-rating/validation_data.csv')

args = Config()

tokenizer = AutoTokenizer.from_pretrained(args.model_name)

def washing_machine(comments):
    # Build the stemmer, lemmatizer and stop-word set once, not once per comment
    stemmer = SnowballStemmer('english')
    lemmatizer = WordNetLemmatizer()
    all_stopwords = set(stopwords.words('english'))
    corpus=[]
    for i in tqdm(range(len(comments))):
        # Keep letters only, lowercase, and split on whitespace
        comment = re.sub('[^a-zA-Z]', ' ', comments[i])
        comment = comment.lower()
        comment = comment.split()
        # Drop stop words, then stem and lemmatize the remaining tokens
        comment = [stemmer.stem(word) for word in comment if word not in all_stopwords]
        comment = [lemmatizer.lemmatize(word) for word in comment]
        comment = ' '.join(comment)
        corpus.append(comment)

    return corpus




def inference(dataloader):
    final_preds = []
    args = Config()
    base_path='../input/large-jigsaw-kishal-v1/'
    for fold in range(1):
        model = ToxicModel(args.model_name, args)
        model = model.cuda()
        path = base_path + f'model_fold_{fold}.bin'
        model.load_state_dict(torch.load(path))
        
        print(f"Getting predictions for model {fold+1}")
        preds = get_predictions(model, dataloader)
        final_preds.append(preds)
    
    final_preds = np.array(final_preds)
    final_preds = np.mean(final_preds, axis=0)
    return final_preds

# Prediction and submission

sub = pd.read_csv('../input/jigsaw-toxic-severity-rating/comments_to_score.csv')

sub.head(1)

sub_dataset = ToxicDataset(washing_machine(sub['text'].values), tokenizer, max_length=args.max_length)
sub_loader = DataLoader(sub_dataset, batch_size=2*args.batch_size,
                        num_workers=2, shuffle=False, pin_memory=True)

preds4 = inference(sub_loader)

In [None]:
preds4[:5]

# Model 5: [☣️ Jigsaw - 🤗 HF hub out-of-the-box models](https://www.kaggle.com/julian3833/jigsaw-hf-hub-out-of-the-box-models)
`LB=0.782`

In [None]:
%%time
import os; os.environ['TOKENIZERS_PARALLELISM'] = 'false'
import torch
import pandas as pd
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.preprocessing import MinMaxScaler


class Dataset:
    """
    For comments_to_score.csv (the submission), get only one comment per row
    """
    def __init__(self, text, tokenizer, max_len):
        self.text = text
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.text)

    def __getitem__(self, item):
        text = str(self.text[item])
        inputs = self.tokenizer(
            text, 
            max_length=self.max_len, 
            padding="max_length", 
            truncation=True
        )

        ids = inputs["input_ids"]
        mask = inputs["attention_mask"]

        return {
            "input_ids": torch.tensor(ids, dtype=torch.long),
            "attention_mask": torch.tensor(mask, dtype=torch.long)
        }
    
def generate_predictions(model_path, max_len, is_multioutput):
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model.to("cuda")
    model.eval()
    
    df = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
    
    dataset = Dataset(text=df.text.values, tokenizer=tokenizer, max_len=max_len)
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=32, num_workers=2, pin_memory=True, shuffle=False
    )

    final_output = []

    for data in data_loader:
        with torch.no_grad():
            for key, value in data.items():
                data[key] = value.to("cuda")
            output = model(**data)
            
            if is_multioutput:
                # Sum the logits for all the toxic labels
                # One strategy out of various possible
                output = output.logits.sum(dim=1)
            else:
                # Classifier. Get logits for "toxic"
                output = output.logits[:, 1]
            
            output = output.detach().cpu().numpy().tolist()
            final_output.extend(output)
    
    torch.cuda.empty_cache()
    return np.array(final_output)

preds_bert = generate_predictions("../input/toxic-bert", max_len=192, is_multioutput=True)
preds_rob1 = generate_predictions("../input/roberta-base-toxicity", max_len=192, is_multioutput=False)
preds_rob2 = generate_predictions("../input/roberta-toxicity-classifier", max_len=192, is_multioutput=False)



df_sub = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
df_sub["score_bert"] = preds_bert
df_sub["score_rob1"] = preds_rob1
df_sub["score_rob2"] = preds_rob2
df_sub[["score_bert", "score_rob1", "score_rob2"]] = MinMaxScaler().fit_transform(df_sub[["score_bert", "score_rob1", "score_rob2"]])

preds5 = df_sub[["score_bert", "score_rob1", "score_rob2"]].sum(axis=1)

In [None]:
preds5[:5]
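
The two branches of `generate_predictions` reduce a logits matrix to one score per comment in different ways. The NumPy sketch below mirrors both on made-up logits. It also shows the logit margin `logits[:, 1] - logits[:, 0]`, which for a two-class softmax orders comments exactly as the class-1 probability does; that margin is an alternative worth considering, not what the cell above computes.

```python
import numpy as np

# Multi-label head (as with toxic-bert): one logit per toxicity label
multi_label_logits = np.array([[2.0, -1.0, 0.5],
                               [-3.0, -2.0, -4.0]])
score_multi = multi_label_logits.sum(axis=1)      # sum over all labels

# Binary classifier head: column 0 = neutral, column 1 = toxic
binary_logits = np.array([[1.0, 3.0],
                          [2.5, -0.5]])
score_binary = binary_logits[:, 1]                # what the cell above uses

# Alternative (assumption, not in the original): the logit margin
score_margin = binary_logits[:, 1] - binary_logits[:, 0]

print(score_multi, score_binary, score_margin)
```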

# Model 6: [☣️ Jigsaw - Incredibly Simple Naive Bayes [0.768]](https://www.kaggle.com/julian3833/jigsaw-incredibly-simple-naive-bayes-0-768)
`LB=0.768`

In [None]:
%%time
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
df = pd.read_csv("../input/jigsaw-toxic-comment-classification-challenge/train.csv")
df['y'] = (df[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].sum(axis=1) > 0 ).astype(int)
df = df[['comment_text', 'y']].rename(columns={'comment_text': 'text'})
min_len = (df['y'] == 1).sum()
df_y0_undersample = df[df['y'] == 0].sample(n=min_len, random_state=201)
df = pd.concat([df[df['y'] == 1], df_y0_undersample])
vec = TfidfVectorizer()
X = vec.fit_transform(df['text'])
model = MultinomialNB()
model.fit(X, df['y'])
df_sub = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
X_test = vec.transform(df_sub['text'])
preds6 = model.predict_proba(X_test)[:, 1]

In [None]:
preds6[:5]

# Model 7: [*#&@ the Benchmark [0.81+] - TFIDF - Ridge](https://www.kaggle.com/samarthagarwal23/the-benchmark-0-81-tfidf-ridge)
`LB=0.812`

In [None]:
%%time
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
import scipy
pd.options.display.max_colwidth=300

df = pd.read_csv("../input/jigsaw-toxic-comment-classification-challenge/train.csv")

# Give more weight to severe toxic 
df['severe_toxic'] = df.severe_toxic * 2
df['y'] = (df[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']].sum(axis=1) ).astype(int)
df = df[['comment_text', 'y']].rename(columns={'comment_text': 'text'})


df = pd.concat([df[df.y>0] , 
                df[df.y==0].sample(int(len(df[df.y>0])*1.5)) ], axis=0).sample(frac=1)

pipeline = Pipeline(
    [
        ("vect", TfidfVectorizer(min_df= 3, max_df=0.5, analyzer = 'char_wb', ngram_range = (3,5))),
        ("clf", Ridge()),
    ]
)
pipeline.fit(df['text'], df['y'])
df_sub = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
preds7 = pipeline.predict(df_sub['text'])


In [None]:
preds7[:5]

# Ensemble

In [None]:
df = pd.read_csv("../input/jigsaw-toxic-severity-rating/comments_to_score.csv")
df['model1'] = preds1
df['model2'] = preds2
df['model3'] = preds3
df['model4'] = preds4
df['model5'] = preds5
df['model6'] = preds6
df['model7'] = preds7

cols = [c for c in df.columns if c.startswith('model')]

# Put all predictions on the same scale.
# Option A (disabled): ranks, which make the gaps between consecutive predictions uniform
#df[cols] = df[cols].rank(method='first').astype(int)
# Option B (used): MinMax scaling, which keeps each model's relative gaps
df[cols] = MinMaxScaler().fit_transform(df[cols])

# Weights of each model
weights = {
    'model1': 0.44, # 0.816
    'model2': 0.45, # 0.825
    'model3': 0.05, # 0.807
    'model4': 0.045, # 0.806
    'model5': 0.005, # 0.782
    'model6': 0.005, # 0.768
    'model7': 0.005  # 0.812
}

# A weighted sum of the scaled predictions determines the final position.
# The weights sum to 1, so this is a weighted average; only the resulting rank is submitted.
df['score'] = pd.DataFrame([df[c] * weights[c] for c in cols]).T.sum(axis=1).rank(method='first').astype(int)
df.head()

In [None]:
df.sort_values("score", ascending=True).head(3)

In [None]:
df.sort_values("score", ascending=True).tail(3)

In [None]:
df.sort_values("score", ascending=True).sample(3)
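
Because only the final rank is submitted, multiplying every weight by the same positive constant leaves the submission unchanged, so a weighted sum and a weighted average give the same ordering. The sketch below checks this on toy scaled predictions for two hypothetical models; the values are made up.

```python
import pandas as pd

# Toy predictions, already scaled to [0, 1]
scaled = pd.DataFrame({
    "model1": [0.0, 0.4, 1.0, 0.7],
    "model2": [0.2, 0.0, 0.9, 1.0],
})

w_raw = pd.Series({"model1": 1.25, "model2": 1.5})   # unnormalized weights
w_norm = w_raw / w_raw.sum()                         # same weights, summing to 1

# Multiplying a DataFrame by a Series aligns the weights on column names
rank_raw = (scaled * w_raw).sum(axis=1).rank(method="first").astype(int)
rank_norm = (scaled * w_norm).sum(axis=1).rank(method="first").astype(int)

print(rank_raw.tolist(), rank_norm.tolist())
```

Only the ratios between weights matter, which is why the V7 and V8 weight vectors in the changelog can look very different in magnitude.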

# Model Correlation

In [None]:
import seaborn as sns
sns.set(rc={'figure.figsize':(10,8)})
# Drop the non-numeric columns first; recent pandas versions raise on them in corr()
sns.heatmap(df.drop(columns=["comment_id", "text"]).corr(), annot=True, fmt='.2f', cmap='Greens')
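
The heatmap above uses pandas' default Pearson correlation. Since the competition metric only depends on how comments are ordered, Spearman (rank) correlation may be the more informative view of how similar two models are. A small sketch on made-up predictions, where the two hypothetical models agree perfectly on the ordering but not on the scale:

```python
import pandas as pd

toy = pd.DataFrame({
    "model_a": [1.0, 2.0, 3.0, 4.0],
    "model_b": [1.0, 4.0, 9.0, 100.0],   # same ordering, very different scale
})

pearson = toy.corr(method="pearson").loc["model_a", "model_b"]
spearman = toy.corr(method="spearman").loc["model_a", "model_b"]

print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

Models with low rank correlation to the rest of the ensemble are the ones most likely to add new information when blended.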

# Submission

In [None]:
df.head()

In [None]:
df[['comment_id', 'score']].to_csv("submission.csv", index=False)