## Averaging only two models

<a class="anchor" id="0"></a>
# [Mechanisms of Action (MoA) Prediction](https://www.kaggle.com/c/lish-moa)

### I use the notebook [Pytorch CV|0.0145| LB| 0.01839 |](https://www.kaggle.com/riadalmadani/pytorch-cv-0-0145-lb-0-01839) from [riadalmadani](https://www.kaggle.com/riadalmadani) as a basis and will try to tune its various parameters.

# Acknowledgements

* [MoA | Pytorch | 0.01859 | RankGauss | PCA | NN](https://www.kaggle.com/kushal1506/moa-pytorch-0-01859-rankgauss-pca-nn)
* [[MoA] Pytorch NN+PCA+RankGauss](https://www.kaggle.com/nayuts/moa-pytorch-nn-pca-rankgauss)
* [Pytorch CV|0.0145| LB| 0.01839 |](https://www.kaggle.com/riadalmadani/pytorch-cv-0-0145-lb-0-01839)
* [[New Baseline] Pytorch | MoA](https://www.kaggle.com/namanj27/new-baseline-pytorch-moa)
* [Deciding (n_components) in PCA](https://www.kaggle.com/kushal1506/deciding-n-components-in-pca)
* [Titanic - Featuretools (automatic FE&FS)](https://www.kaggle.com/vbmokin/titanic-featuretools-automatic-fe-fs)
* Tuning and visualization from [Higher LB score by tuning mloss - upgrade & visual](https://www.kaggle.com/vbmokin/higher-lb-score-by-tuning-mloss-upgrade-visual)
* [Data Science for tabular data: Advanced Techniques](https://www.kaggle.com/vbmokin/data-science-for-tabular-data-advanced-techniques)

### My upgrade:

* PCA parameters
* Feature Selection methods
* Dropout
* Structuring of the notebook
* Tuning visualization
* Number of folds

I used code from the sources listed above, and I plan to keep developing this notebook: there are still promising directions for improvement and for further parameter tuning.

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [My upgrade](#2)
    -  [Commit now](#2.1)
    -  [Previous commits](#2.2)
    -  [Parameters and LB score visualization](#2.3)
1. [Download data](#3)
1. [FE & Data Preprocessing](#4)
    - [RankGauss](#4.1)
    - [Seed](#4.2)    
    - [PCA features](#4.3)
    - [Feature selection](#4.4)
    - [CV folds](#4.5)
    - [Dataset Classes](#4.6)
    - [Smoothing](#4.7)
    - [Preprocessing](#4.8)
1. [Modeling](#5)
1. [Prediction & Submission](#6)


## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

In [None]:
import sys
sys.path.append('../input/iterativestratification')

import numpy as np
import random
import pandas as pd
import os
import copy
import gc

import matplotlib.pyplot as plt 
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

from sklearn import preprocessing
from sklearn.metrics import log_loss
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer
from sklearn.feature_selection import VarianceThreshold, SelectKBest
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn.modules.loss import _WeightedLoss

import warnings
warnings.filterwarnings('ignore')

os.listdir('../input/lish-moa')

pd.set_option('display.max_columns', 2000)

## 2. My upgrade <a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

### 2.1. Commit now <a class="anchor" id="2.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
n_comp_GENES = 463
n_comp_CELLS = 60
VarianceThreshold_for_FS = 0.9
Dropout_Model = 0.25
LEARNING_RATE_NEW = 5e-4
print('n_comp_GENES', n_comp_GENES, 'n_comp_CELLS', n_comp_CELLS, 'total', n_comp_GENES + n_comp_CELLS)

### 2.2 Previous commits <a class="anchor" id="2.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
commits_df = pd.DataFrame(columns = ['n_commit', 'n_comp_GENES', 'n_comp_CELLS', 'train_features','VarianceThreshold_for_FS', 'Dropout_Model', 'LB_score', 'CV_logloss'])

### Commit 0 (parameters from https://www.kaggle.com/riadalmadani/pytorch-cv-0-0145-lb-0-01839, commit 8)

In [None]:
n=0
commits_df.loc[n, 'n_commit'] = 0                       # Number of commit
commits_df.loc[n, 'n_comp_GENES'] = 600                 # Number of output features for PCA for g-features
commits_df.loc[n, 'n_comp_CELLS'] = 50                  # Number of output features for PCA for c-features
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.8     # Threshold for VarianceThreshold for feature selection
commits_df.loc[n, 'train_features'] = 1245              # Number of features in the training dataframe after FE and before modeling
commits_df.loc[n, 'Dropout_Model'] = 0.2619422201258426 # Dropout in Model
commits_df.loc[n, 'CV_logloss'] = 0.01458269555140327   # Resulting CV log loss
commits_df.loc[n, 'LB_score'] = 0.01839                 # LB score after submitting

### Commit 4

In [None]:
n=1
commits_df.loc[n, 'n_commit'] = 4
commits_df.loc[n, 'n_comp_GENES'] = 610
commits_df.loc[n, 'n_comp_CELLS'] = 55
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.82
commits_df.loc[n, 'train_features'] = 1240
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] =  0.014584545081734047
commits_df.loc[n, 'LB_score'] = 0.01839

### Commit 5

In [None]:
n=2
commits_df.loc[n, 'n_commit'] = 5
commits_df.loc[n, 'n_comp_GENES'] = 670
commits_df.loc[n, 'n_comp_CELLS'] = 67
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.67
commits_df.loc[n, 'train_features'] = 1298
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] =  0.014588561242139069
commits_df.loc[n, 'LB_score'] = 0.01840

### Commit 6

In [None]:
n=3
commits_df.loc[n, 'n_commit'] = 6
commits_df.loc[n, 'n_comp_GENES'] = 450
commits_df.loc[n, 'n_comp_CELLS'] = 45
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.67
commits_df.loc[n, 'train_features'] = 1297
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] =  0.014586229676302227
commits_df.loc[n, 'LB_score'] = 0.01840

### Commit 9

In [None]:
n=4
commits_df.loc[n, 'n_commit'] = 9
commits_df.loc[n, 'n_comp_GENES'] = 463
commits_df.loc[n, 'n_comp_CELLS'] = 60
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.9
commits_df.loc[n, 'train_features'] = 1219
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] =  0.014572358066092783
commits_df.loc[n, 'LB_score'] = 0.01839

### Commit 10

In [None]:
n=5
commits_df.loc[n, 'n_commit'] = 10
commits_df.loc[n, 'n_comp_GENES'] = 463
commits_df.loc[n, 'n_comp_CELLS'] = 80
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.92
commits_df.loc[n, 'train_features'] = 1214
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] =  0.014571552074579226
commits_df.loc[n, 'LB_score'] = 0.01841

### Commit 12

In [None]:
n=6
commits_df.loc[n, 'n_commit'] = 12
commits_df.loc[n, 'n_comp_GENES'] = 450
commits_df.loc[n, 'n_comp_CELLS'] = 65
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.9
commits_df.loc[n, 'train_features'] = 1219
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] = 0.01458043214513875
commits_df.loc[n, 'LB_score'] = 0.01840

### Commit 13

In [None]:
n=7
commits_df.loc[n, 'n_commit'] = 13
commits_df.loc[n, 'n_comp_GENES'] = 463
commits_df.loc[n, 'n_comp_CELLS'] = 60
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.9
commits_df.loc[n, 'train_features'] = 1219
commits_df.loc[n, 'Dropout_Model'] = 0.4
commits_df.loc[n, 'CV_logloss'] = 0.014625250378417162
commits_df.loc[n, 'LB_score'] = 0.01844

### Commit 14

In [None]:
n=8
commits_df.loc[n, 'n_commit'] = 14
commits_df.loc[n, 'n_comp_GENES'] = 463
commits_df.loc[n, 'n_comp_CELLS'] = 60
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.01
commits_df.loc[n, 'train_features'] = 1604
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] = 0.014713482787703418
commits_df.loc[n, 'LB_score'] = 0.01849

### Commit 18

In [None]:
n=9
commits_df.loc[n, 'n_commit'] = 18
commits_df.loc[n, 'n_comp_GENES'] = 363
commits_df.loc[n, 'n_comp_CELLS'] = 60
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.9
commits_df.loc[n, 'train_features'] = 1219
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] = 0.014568689235607534
commits_df.loc[n, 'LB_score'] = 0.01841

### Commit 19

In [None]:
n=10
commits_df.loc[n, 'n_commit'] = 19
commits_df.loc[n, 'n_comp_GENES'] = 550
commits_df.loc[n, 'n_comp_CELLS'] = 60
commits_df.loc[n, 'VarianceThreshold_for_FS'] = 0.91
commits_df.loc[n, 'train_features'] = 1218
commits_df.loc[n, 'Dropout_Model'] = 0.25
commits_df.loc[n, 'CV_logloss'] = 0.014577509066710863
commits_df.loc[n, 'LB_score'] = 0.01841

### 2.3 Parameters and LB score visualization <a class="anchor" id="2.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
commits_df['n_comp_total'] = commits_df['n_comp_GENES'] + commits_df['n_comp_CELLS']
commits_df['seed'] = 42
commits_df['l_rate'] = 1e-3

In [None]:
# Find and mark the minimum LB score
commits_df['LB_score'] = pd.to_numeric(commits_df['LB_score'])
commits_df = commits_df.sort_values(by=['LB_score', 'CV_logloss'], ascending = True).reset_index(drop=True)
commits_df['min'] = 0
commits_df.loc[0, 'min'] = 1
commits_df

In [None]:
commits_df.sort_values(by=['CV_logloss'], ascending = True)

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(commits_df, x='n_comp_GENES', y='n_comp_CELLS', z='LB_score', color = 'min', 
                    symbol = 'Dropout_Model',
                    title='Parameters and LB score visualization of MoA solutions')
fig.update(layout=dict(title=dict(x=0.1)))

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(commits_df, x='train_features', y='VarianceThreshold_for_FS', z='LB_score', color = 'min', 
                    symbol = 'seed',
                    title='Parameters and LB score visualization of MoA solutions')
fig.update(layout=dict(title=dict(x=0.1)))

In [None]:
# Interactive plot with results of parameters tuning
fig = px.scatter_3d(commits_df, x='train_features', y='CV_logloss', z='LB_score', color = 'min', 
                    symbol = 'l_rate',
                    title='Parameters and LB score visualization of MoA solutions')
fig.update(layout=dict(title=dict(x=0.1)))

In [None]:
# Interactive plot with results of parameters tuning
commits_df_1841 = commits_df[commits_df.LB_score <= 0.01841]
fig = px.scatter_3d(commits_df_1841, x='train_features', y='CV_logloss', z='LB_score', color = 'min', 
                    symbol = 'l_rate',
                    title='Parameters and LB score visualization of MoA solutions')
fig.update(layout=dict(title=dict(x=0.1)))

### Recommendations based on the tuning results above:
* a smaller **n_comp_GENES**,
* a larger **n_comp_CELLS**,
* a larger **VarianceThreshold_for_FS**, so that **train_features** is smaller.

## 3. Download data<a class="anchor" id="3"></a>

[Back to Table of Contents](#0.1)

In [None]:
train_features = pd.read_csv('../input/lish-moa/train_features.csv')
train_targets_scored = pd.read_csv('../input/lish-moa/train_targets_scored.csv')
train_targets_nonscored = pd.read_csv('../input/lish-moa/train_targets_nonscored.csv')

test_features = pd.read_csv('../input/lish-moa/test_features.csv')
sample_submission = pd.read_csv('../input/lish-moa/sample_submission.csv')

## 4. FE & Data Preprocessing <a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

In [None]:
GENES = [col for col in train_features.columns if col.startswith('g-')]
CELLS = [col for col in train_features.columns if col.startswith('c-')]

### 4.1 RankGauss<a class="anchor" id="4.1"></a>

[Back to Table of Contents](#0.1)

In [None]:
# RankGauss - transform to Gauss

for col in (GENES + CELLS):

    transformer = QuantileTransformer(n_quantiles=100,random_state=0, output_distribution="normal")
    vec_len = len(train_features[col].values)
    vec_len_test = len(test_features[col].values)
    raw_vec = train_features[col].values.reshape(vec_len, 1)
    transformer.fit(raw_vec)

    train_features[col] = transformer.transform(raw_vec).reshape(1, vec_len)[0]
    test_features[col] = transformer.transform(test_features[col].values.reshape(vec_len_test, 1)).reshape(1, vec_len_test)[0]
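
The idea behind RankGauss can be sketched with NumPy and the standard library: replace each value by its rank, rescale the ranks into (0, 1), and push them through the inverse standard-normal CDF. This only illustrates the principle. `QuantileTransformer(output_distribution="normal")` as used above interpolates between `n_quantiles=100` reference quantiles and clips the tails, so its output will differ slightly. The helper name `rank_gauss` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def rank_gauss(x):
    """Rank-based Gaussianization of a 1-D array (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Ranks 0..n-1; ties are broken by position, unlike a proper quantile fit
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    # Midpoint rescaling keeps every value strictly inside (0, 1)
    u = (ranks + 0.5) / n
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])


rng = np.random.default_rng(0)
skewed = rng.exponential(size=1000)   # strongly right-skewed input
z = rank_gauss(skewed)
print(round(z.mean(), 6), round(z.std(), 3))
```

The transform is monotonic, so the ordering of the samples is unchanged; only the shape of the distribution becomes approximately normal.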

### 4.2 Seed<a class="anchor" id="4.2"></a>

[Back to Table of Contents](#0.1)

In [None]:
def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    
seed_everything(seed=42)

### 4.3 PCA features<a class="anchor" id="4.3"></a>

[Back to Table of Contents](#0.1)

In [None]:
len(GENES)

In [None]:
# GENES

data = pd.concat([pd.DataFrame(train_features[GENES]), pd.DataFrame(test_features[GENES])])
data2 = (PCA(n_components=n_comp_GENES, random_state=42).fit_transform(data[GENES]))
train2 = data2[:train_features.shape[0]]; test2 = data2[-test_features.shape[0]:]

train2 = pd.DataFrame(train2, columns=[f'pca_G-{i}' for i in range(n_comp_GENES)])
test2 = pd.DataFrame(test2, columns=[f'pca_G-{i}' for i in range(n_comp_GENES)])

train_features = pd.concat((train_features, train2), axis=1)
test_features = pd.concat((test_features, test2), axis=1)

In [None]:
len(CELLS)

In [None]:
# CELLS

data = pd.concat([pd.DataFrame(train_features[CELLS]), pd.DataFrame(test_features[CELLS])])
data2 = (PCA(n_components=n_comp_CELLS, random_state=42).fit_transform(data[CELLS]))
train2 = data2[:train_features.shape[0]]; test2 = data2[-test_features.shape[0]:]

train2 = pd.DataFrame(train2, columns=[f'pca_C-{i}' for i in range(n_comp_CELLS)])
test2 = pd.DataFrame(test2, columns=[f'pca_C-{i}' for i in range(n_comp_CELLS)])

train_features = pd.concat((train_features, train2), axis=1)
test_features = pd.concat((test_features, test2), axis=1)
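
Both cells above fit PCA on the row-wise concatenation of train and test, then split the scores back by row count. Below is a NumPy sketch of that pattern using an SVD of the centred matrix. The explained-variance ratios it returns are one way to reason about `n_comp_GENES` and `n_comp_CELLS` (compare the "Deciding (n_components) in PCA" notebook in the acknowledgements). scikit-learn may flip the sign of some components, and the function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project rows of X onto its top principal components (SVD-based sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA works on centred data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T        # equal to U[:, :k] * S[:k]
    ratio = S**2 / np.sum(S**2)              # explained-variance ratio per component
    return scores, ratio[:n_components]


rng = np.random.default_rng(42)
train_part = rng.normal(size=(50, 10))
test_part = rng.normal(size=(20, 10))

data = np.vstack([train_part, test_part])   # fit on train + test together
scores, ratio = pca_scores(data, n_components=3)
train_scores = scores[: train_part.shape[0]]
test_scores = scores[-test_part.shape[0]:]
print(train_scores.shape, test_scores.shape, ratio.cumsum())
```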

In [None]:
train_features.shape

In [None]:
train_features.head(5)

### 4.4 Feature selection<a class="anchor" id="4.4"></a>

[Back to Table of Contents](#0.1)

In [None]:
data = pd.concat([train_features, test_features])
data

In [None]:
var_thresh = VarianceThreshold(VarianceThreshold_for_FS)
data = pd.concat([train_features, test_features])
data_transformed = var_thresh.fit_transform(data.iloc[:, 4:])

train_features_transformed = data_transformed[ : train_features.shape[0]]
test_features_transformed = data_transformed[-test_features.shape[0] : ]


train_features = pd.DataFrame(train_features[['sig_id','cp_type','cp_time','cp_dose']].values.reshape(-1, 4),\
                              columns=['sig_id','cp_type','cp_time','cp_dose'])

train_features = pd.concat([train_features, pd.DataFrame(train_features_transformed)], axis=1)


test_features = pd.DataFrame(test_features[['sig_id','cp_type','cp_time','cp_dose']].values.reshape(-1, 4),\
                             columns=['sig_id','cp_type','cp_time','cp_dose'])

test_features = pd.concat([test_features, pd.DataFrame(test_features_transformed)], axis=1)

train_features.shape
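
`VarianceThreshold` keeps only the columns whose variance (computed here over train + test, skipping the four metadata columns) is above the threshold. Raising the threshold therefore drops more low-variance columns, for example trailing PCA components. This is consistent with the smaller `train_features` counts logged for higher thresholds in section 2.2, although that reading of the logs is mine, not a verified breakdown. The sketch below is NumPy-only, and `variance_select` is a hypothetical name.

```python
import numpy as np


def variance_select(X, threshold):
    """Keep columns whose population variance exceeds `threshold` (sketch)."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)     # ddof=0, the same convention as VarianceThreshold
    mask = variances > threshold
    return X[:, mask], mask


rng = np.random.default_rng(0)
# Three synthetic columns with standard deviations 2, 1 and 0.1
X = rng.normal(size=(10_000, 3)) * np.array([2.0, 1.0, 0.1])
X_kept, mask = variance_select(X, threshold=0.9)
print(mask, X_kept.shape)
```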

In [None]:
train_features.head(5)

In [None]:
train = train_features.merge(train_targets_scored, on='sig_id')
train = train[train['cp_type']!='ctl_vehicle'].reset_index(drop=True)
test = test_features[test_features['cp_type']!='ctl_vehicle'].reset_index(drop=True)

target = train[train_targets_scored.columns]

In [None]:
train = train.drop('cp_type', axis=1)
test = test.drop('cp_type', axis=1)

In [None]:
train.head(5)

In [None]:
target_cols = target.drop('sig_id', axis=1).columns.values.tolist()

### 4.5 CV folds<a class="anchor" id="4.5"></a>

[Back to Table of Contents](#0.1)

In [None]:
folds = train.copy()

mskf = MultilabelStratifiedKFold(n_splits=7)
#mskf = MultilabelStratifiedKFold(n_splits=2) # changed

for f, (t_idx, v_idx) in enumerate(mskf.split(X=train, y=target)):
    folds.loc[v_idx, 'kfold'] = int(f)

folds['kfold'] = folds['kfold'].astype(int)
folds
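
A quick way to check that `MultilabelStratifiedKFold` spread the targets evenly is to count the positives per fold and target. The helper below is hypothetical and NumPy-only. With the notebook's objects it could be called as `fold_label_counts(folds['kfold'].values, folds[target_cols].values, 7)`.

```python
import numpy as np


def fold_label_counts(kfold, Y, n_folds):
    """Return an (n_folds, n_targets) matrix of positive counts per fold."""
    kfold = np.asarray(kfold)
    Y = np.asarray(Y)
    counts = np.zeros((n_folds, Y.shape[1]), dtype=int)
    for f in range(n_folds):
        counts[f] = Y[kfold == f].sum(axis=0)
    return counts


kfold = np.array([0, 1, 0, 1])
Y = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 0]])
counts = fold_label_counts(kfold, Y, n_folds=2)
print(counts)
```

Roughly equal counts down each column, especially for rare targets, suggest that the stratification did its job.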

In [None]:
print(train.shape)
print(folds.shape)
print(test.shape)
print(target.shape)
print(sample_submission.shape)

### 4.6 Dataset Classes<a class="anchor" id="4.6"></a>

[Back to Table of Contents](#0.1)

In [None]:
class MoADataset:
    def __init__(self, features, targets):
        self.features = features
        self.targets = targets
        
    def __len__(self):
        return (self.features.shape[0])
    
    def __getitem__(self, idx):
        dct = {
            'x' : torch.tensor(self.features[idx, :], dtype=torch.float),
            'y' : torch.tensor(self.targets[idx, :], dtype=torch.float)            
        }
        return dct
    
class TestDataset:
    def __init__(self, features):
        self.features = features
        
    def __len__(self):
        return (self.features.shape[0])
    
    def __getitem__(self, idx):
        dct = {
            'x' : torch.tensor(self.features[idx, :], dtype=torch.float)
        }
        return dct

In [None]:
def train_fn(model, optimizer, scheduler, loss_fn, dataloader, device):
    model.train()
    final_loss = 0
    
    for data in dataloader:
        optimizer.zero_grad()
        inputs, targets = data['x'].to(device), data['y'].to(device)
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        loss.backward()
        optimizer.step()
        scheduler.step()
        
        final_loss += loss.item()
        
    final_loss /= len(dataloader)
    
    return final_loss


def valid_fn(model, loss_fn, dataloader, device):
    model.eval()
    final_loss = 0
    valid_preds = []
    
    for data in dataloader:
        inputs, targets = data['x'].to(device), data['y'].to(device)
        with torch.no_grad():  # no gradients are needed during validation
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
        
        final_loss += loss.item()
        valid_preds.append(outputs.sigmoid().detach().cpu().numpy())
        
    final_loss /= len(dataloader)
    valid_preds = np.concatenate(valid_preds)
    
    return final_loss, valid_preds

def inference_fn(model, dataloader, device):
    model.eval()
    preds = []
    
    for data in dataloader:
        inputs = data['x'].to(device)

        with torch.no_grad():
            outputs = model(inputs)
        
        preds.append(outputs.sigmoid().detach().cpu().numpy())
        
    preds = np.concatenate(preds)
    
    return preds

### 4.7 Smoothing<a class="anchor" id="4.7"></a>

[Back to Table of Contents](#0.1)

In [None]:
class SmoothBCEwLogits(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth(targets:torch.Tensor, n_labels:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = targets * (1.0 - smoothing) + 0.5 * smoothing
        return targets

    def forward(self, inputs, targets):
        targets = SmoothBCEwLogits._smooth(targets, inputs.size(-1),
            self.smoothing)
        loss = F.binary_cross_entropy_with_logits(inputs, targets,self.weight)

        if  self.reduction == 'sum':
            loss = loss.sum()
        elif  self.reduction == 'mean':
            loss = loss.mean()

        return loss
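
`SmoothBCEwLogits` shrinks every target toward 0.5 as `t * (1 - s) + 0.5 * s` before applying the usual binary cross-entropy on logits. With `smoothing=0.001`, as used in `run_training`, the targets become 0.0005 and 0.9995, so extreme logits no longer drive the loss to zero. The pure-Python sketch below covers both pieces and uses the standard numerically stable form of BCE-with-logits. The function names are hypothetical.

```python
import math


def smooth_targets(targets, smoothing):
    """Shrink binary targets toward 0.5, as SmoothBCEwLogits._smooth does."""
    assert 0 <= smoothing < 1
    return [t * (1.0 - smoothing) + 0.5 * smoothing for t in targets]


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for a single logit."""
    # Equivalent to -(t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x)))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


smoothed = smooth_targets([0.0, 1.0], smoothing=0.001)
print(smoothed)
# With a smoothed target, a very confident logit is now penalised
print(bce_with_logits(20.0, smoothed[1]))
```

The loss is minimised where `sigmoid(logit)` equals the smoothed target, which keeps predictions slightly away from exactly 0 and 1.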

### 4.8 Preprocessing<a class="anchor" id="4.8"></a>

[Back to Table of Contents](#0.1)

In [None]:
def process_data(data):
    data = pd.get_dummies(data, columns=['cp_time','cp_dose'])
    return data

In [None]:
feature_cols = [c for c in process_data(folds).columns if c not in target_cols]
feature_cols = [c for c in feature_cols if c not in ['kfold','sig_id']]
len(feature_cols)

## 5. Modeling<a class="anchor" id="5"></a>

[Back to Table of Contents](#0.1)

In [None]:
# HyperParameters

DEVICE = ('cuda' if torch.cuda.is_available() else 'cpu')
EPOCHS = 25
#EPOCHS = 2 # changed
BATCH_SIZE = 128
LEARNING_RATE = LEARNING_RATE_NEW
WEIGHT_DECAY = 1e-5
NFOLDS = 7
#NFOLDS = 2 # changed

EARLY_STOPPING_STEPS = 10
EARLY_STOP = False

num_features=len(feature_cols)
num_targets=len(target_cols)
hidden_size=1500

In [None]:
class Model(nn.Module):
    def __init__(self, num_features, num_targets, hidden_size):
        super(Model, self).__init__()
        self.batch_norm1 = nn.BatchNorm1d(num_features)
        self.dense1 = nn.utils.weight_norm(nn.Linear(num_features, hidden_size))
        
        self.batch_norm2 = nn.BatchNorm1d(hidden_size)
        self.dropout2 = nn.Dropout(Dropout_Model)
        self.dense2 = nn.utils.weight_norm(nn.Linear(hidden_size, hidden_size))
        
        self.batch_norm3 = nn.BatchNorm1d(hidden_size)
        self.dropout3 = nn.Dropout(Dropout_Model)
        self.dense3 = nn.utils.weight_norm(nn.Linear(hidden_size, num_targets))
    
    def forward(self, x):
        x = self.batch_norm1(x)
        x = F.leaky_relu(self.dense1(x))
        
        x = self.batch_norm2(x)
        x = self.dropout2(x)
        x = F.leaky_relu(self.dense2(x))
        
        x = self.batch_norm3(x)
        x = self.dropout3(x)
        x = self.dense3(x)
        
        return x
    
class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1):
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        pred = pred.log_softmax(dim=self.dim)
        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

In [None]:
def run_training(fold, seed):
    
    seed_everything(seed)
    
    train = process_data(folds)
    test_ = process_data(test)
    
    trn_idx = train[train['kfold'] != fold].index
    val_idx = train[train['kfold'] == fold].index
    
    train_df = train[train['kfold'] != fold].reset_index(drop=True)
    valid_df = train[train['kfold'] == fold].reset_index(drop=True)
    
    x_train, y_train  = train_df[feature_cols].values, train_df[target_cols].values
    x_valid, y_valid =  valid_df[feature_cols].values, valid_df[target_cols].values
    
    train_dataset = MoADataset(x_train, y_train)
    valid_dataset = MoADataset(x_valid, y_valid)
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
    validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=False)
    
    model = Model(
        num_features=num_features,
        num_targets=num_targets,
        hidden_size=hidden_size,
    )
    
    model.to(DEVICE)
    
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer=optimizer, pct_start=0.1, div_factor=1e3, 
                                              max_lr=1e-2, epochs=EPOCHS, steps_per_epoch=len(trainloader))
    
    loss_fn = nn.BCEWithLogitsLoss()
    loss_tr = SmoothBCEwLogits(smoothing =0.001)
    
    early_stopping_steps = EARLY_STOPPING_STEPS
    early_step = 0
   
    oof = np.zeros((len(train), target.iloc[:, 1:].shape[1]))
    best_loss = np.inf
    
    for epoch in range(EPOCHS):
        
        train_loss = train_fn(model, optimizer,scheduler, loss_tr, trainloader, DEVICE)
        print(f"FOLD: {fold}, EPOCH: {epoch}, train_loss: {train_loss}")
        valid_loss, valid_preds = valid_fn(model, loss_fn, validloader, DEVICE)
        print(f"FOLD: {fold}, EPOCH: {epoch}, valid_loss: {valid_loss}")
        
        if valid_loss < best_loss:
            
            best_loss = valid_loss
            oof[val_idx] = valid_preds
            torch.save(model.state_dict(), f"FOLD{fold}_.pth")
        
        elif(EARLY_STOP == True):
            
            early_step += 1
            if (early_step >= early_stopping_steps):
                break
            
    
    #--------------------- PREDICTION---------------------
    x_test = test_[feature_cols].values
    testdataset = TestDataset(x_test)
    testloader = torch.utils.data.DataLoader(testdataset, batch_size=BATCH_SIZE, shuffle=False)
    
    model = Model(
        num_features=num_features,
        num_targets=num_targets,
        hidden_size=hidden_size,

    )
    
    model.load_state_dict(torch.load(f"FOLD{fold}_.pth"))
    model.to(DEVICE)
    
    predictions = np.zeros((len(test_), target.iloc[:, 1:].shape[1]))
    predictions = inference_fn(model, testloader, DEVICE)
    
    return oof, predictions
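
`run_training` builds `OneCycleLR(max_lr=1e-2, div_factor=1e3, pct_start=0.1)`. As I understand PyTorch's scheduler, it resets the optimizer's learning rate to `max_lr / div_factor` when it is constructed. The `lr=LEARNING_RATE` passed to `Adam` (and hence `LEARNING_RATE_NEW`) is therefore effectively ignored, and `max_lr` is the parameter that shapes the schedule. The pure-Python function below approximates the default two-phase cosine schedule from the documented formulas (`final_div_factor=1e4` is PyTorch's default). It is a sketch for inspecting the curve, not PyTorch's implementation, and `one_cycle_lr` is a hypothetical name.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-2, pct_start=0.1,
                 div_factor=1e3, final_div_factor=1e4):
    """Approximate learning rate at `step` for a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor          # 1e-5 with the notebook's values
    min_lr = initial_lr / final_div_factor    # 1e-9 with the default final_div_factor
    peak_step = pct_start * total_steps - 1
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= peak_step:                      # warm-up: initial_lr -> max_lr
        return cos_anneal(initial_lr, max_lr, step / peak_step)
    # annealing: max_lr -> min_lr
    return cos_anneal(max_lr, min_lr, (step - peak_step) / (last_step - peak_step))


steps_per_epoch = 100          # hypothetical; the notebook uses len(trainloader)
total = 25 * steps_per_epoch   # EPOCHS = 25
for s in (0, 249, 1000, total - 1):
    print(s, f"{one_cycle_lr(s, total):.3e}")
```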

## 6. Prediction & Submission <a class="anchor" id="6"></a>

[Back to Table of Contents](#0.1)

In [None]:
def run_k_fold(NFOLDS, seed):
    oof = np.zeros((len(train), len(target_cols)))
    predictions = np.zeros((len(test), len(target_cols)))
    
    for fold in range(NFOLDS):
        oof_, pred_ = run_training(fold, seed)
        
        predictions += pred_ / NFOLDS
        oof += oof_
        
    return oof, predictions

In [None]:
# Averaging on multiple SEEDS

SEED = [0, 1, 2, 3, 4, 5, 6]
oof = np.zeros((len(train), len(target_cols)))
predictions = np.zeros((len(test), len(target_cols)))

for seed in SEED:
    
    oof_, predictions_ = run_k_fold(NFOLDS, seed)
    oof += oof_ / len(SEED)
    predictions += predictions_ / len(SEED)

train[target_cols] = oof
test[target_cols] = predictions
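
The loops above combine predictions in two different ways. In `run_k_fold`, each fold's OOF array is non-zero only on that fold's validation rows, so summing over folds fills every row exactly once. Test predictions, by contrast, are averaged over folds and then over `SEED`. The NumPy sketch below checks both facts on synthetic arrays. All shapes and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_targets, n_folds, n_seeds, n_test = 12, 3, 4, 2, 5
fold_of_row = np.arange(n_rows) % n_folds           # synthetic fold assignment

all_test = rng.random((n_seeds, n_folds, n_test, n_targets))
oof_target = rng.random((n_seeds, n_rows, n_targets))

full_oof = np.zeros((n_rows, n_targets))
test_pred = np.zeros((n_test, n_targets))

for s in range(n_seeds):
    seed_oof = np.zeros((n_rows, n_targets))
    for f in range(n_folds):
        fold_oof = np.zeros((n_rows, n_targets))
        mask = fold_of_row == f
        fold_oof[mask] = oof_target[s][mask]         # only validation rows are filled
        seed_oof += fold_oof                         # summing fills each row once
        test_pred += all_test[s, f] / n_folds / n_seeds
    full_oof += seed_oof / n_seeds
```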

In [None]:
train_targets_scored

In [None]:
len(target_cols)

In [None]:
valid_results = train_targets_scored.drop(columns=target_cols).merge(train[['sig_id']+target_cols], on='sig_id', how='left').fillna(0)

y_true = train_targets_scored[target_cols].values
y_pred = valid_results[target_cols].values

score = 0
for i in range(len(target_cols)):
    score_ = log_loss(y_true[:, i], y_pred[:, i])
    score += score_ / len(target_cols)  # target also holds sig_id, so divide by the number of scored targets
    
print("CV log_loss: ", score)
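
The CV score is the mean, over the scored target columns, of each column's binary log loss. Below is a NumPy version of that metric. The clipping constant `eps` is an assumption (scikit-learn's `log_loss` clipping default has changed between versions), so the last digits may differ from the cell above. The function name is hypothetical.

```python
import math
import numpy as np


def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Average of per-column binary log loss over all target columns."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_cell = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    per_column = per_cell.mean(axis=0)   # log loss of each target
    return per_column.mean()             # averaged over targets


y_true = np.array([[1, 0], [0, 1], [0, 0]])
y_pred = np.full((3, 2), 0.5)
print(mean_columnwise_log_loss(y_true, y_pred), math.log(2))
```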

In [None]:
sub = sample_submission.drop(columns=target_cols).merge(test[['sig_id']+target_cols], on='sig_id', how='left').fillna(0)
sub.to_csv('submission.csv', index=False)

In [None]:
sub.shape

[Go to Top](#0)

# MoA Predictions 🧬: Overfitting with TabNet Ver7



In [None]:
!pip install --no-index --find-links /kaggle/input/pytorchtabnet/pytorch_tabnet-2.0.0-py3-none-any.whl pytorch-tabnet
!pip install /kaggle/input/iterative-stratification/iterative-stratification-master/

In [None]:
### General ###
import os
import sys
import copy
import tqdm
import pickle
import random
import warnings
warnings.filterwarnings("ignore")
sys.path.append("../input/rank-gauss")
os.environ["CUDA_LAUNCH_BLOCKING"] = '1'

### Data Wrangling ###
import numpy as np
import pandas as pd
from scipy import stats
from gauss_rank_scaler import GaussRankScaler

### Data Visualization ###
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

### Machine Learning ###
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.preprocessing import QuantileTransformer
from sklearn.feature_selection import VarianceThreshold
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

### Deep Learning ###
import torch
from torch import nn
import torch.optim as optim
from torch.nn import functional as F
from torch.nn.modules.loss import _WeightedLoss
from torch.utils.data import DataLoader, Dataset
from torch.optim.lr_scheduler import ReduceLROnPlateau
# Tabnet 
from pytorch_tabnet.metrics import Metric
from pytorch_tabnet.tab_model import TabNetRegressor

### Colored console output for prettier prints ###
from colorama import Fore
c_ = Fore.CYAN
m_ = Fore.MAGENTA
r_ = Fore.RED
b_ = Fore.BLUE
y_ = Fore.YELLOW
g_ = Fore.GREEN

In [None]:
seed = 42

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
set_seed(seed)

In [None]:
# Parameters
data_path = "../input/lish-moa/"
no_ctl = True
scale = "rankgauss"
variance_threshould = 0.7
decompo = "PCA"
ncompo_genes = 80
ncompo_cells = 10
encoding = "dummy"

In [None]:
train = pd.read_csv(data_path + "train_features.csv")
#train.drop(columns = ["sig_id"], inplace = True)

targets = pd.read_csv(data_path + "train_targets_scored.csv")
#train_targets_scored.drop(columns = ["sig_id"], inplace = True)

#train_targets_nonscored = pd.read_csv(data_path + "train_targets_nonscored.csv")

test = pd.read_csv(data_path + "test_features.csv")
#test.drop(columns = ["sig_id"], inplace = True)

submission = pd.read_csv(data_path + "sample_submission.csv")

In [None]:
if no_ctl:
    # cp_type == ctl_vehicle
    print(b_, "not_ctl")
    train = train[train["cp_type"] != "ctl_vehicle"]
    test = test[test["cp_type"] != "ctl_vehicle"]
    targets = targets.iloc[train.index]
    train.reset_index(drop = True, inplace = True)
    test.reset_index(drop = True, inplace = True)
    targets.reset_index(drop = True, inplace = True)

In [None]:
def distributions(num, graphs, items, features, gorc):
    """
    Plot the distributions of gene expression or cell viability data
    """
    for i in range(0, num - 1, 7):
        if i >= 3:
            break
        idxs = list(np.array([0, 1, 2, 3, 4, 5, 6]) + i)
    
        fig, axs = plt.subplots(1, 7, sharey = True)
        for k, item in enumerate(idxs):
            if item >= items:
                break
            graph = sns.distplot(train[features].values[:, item], ax = axs[k])
            graph.set_title(f"{gorc}-{item}")
            graphs.append(graph)

In [None]:
GENES = [col for col in train.columns if col.startswith("g-")]
CELLS = [col for col in train.columns if col.startswith("c-")]

In [None]:
gnum = train[GENES].shape[1]
graphs = []

distributions(gnum, graphs, 771, GENES, "g")

In [None]:
cnum = train[CELLS].shape[1]
graphs = []

distributions(cnum, graphs, 100, CELLS, "c")

In [None]:
gnum = test[GENES].shape[1]
graphs = []

distributions(gnum, graphs, 771, GENES, "g")

In [None]:
cnum = test[CELLS].shape[1]
graphs = []

distributions(cnum, graphs, 100, CELLS, "c")

In [None]:
data_all = pd.concat([train, test], ignore_index = True)
cols_numeric = [feat for feat in list(data_all.columns) if feat not in ["sig_id", "cp_type", "cp_time", "cp_dose"]]
mask = (data_all[cols_numeric].var() >= variance_threshould).values
tmp = data_all[cols_numeric].loc[:, mask]
data_all = pd.concat([data_all[["sig_id", "cp_type", "cp_time", "cp_dose"]], tmp], axis = 1)
cols_numeric = [feat for feat in list(data_all.columns) if feat not in ["sig_id", "cp_type", "cp_time", "cp_dose"]]

In [None]:
def scale_minmax(col):
    return (col - col.min()) / (col.max() - col.min())

def scale_norm(col):
    return (col - col.mean()) / col.std()

if scale == "boxcox":
    print(b_, "boxcox")
    data_all[cols_numeric] = data_all[cols_numeric].apply(scale_minmax, axis = 0)
    trans = []
    for feat in cols_numeric:
        trans_var, lambda_var = stats.boxcox(data_all[feat].dropna() + 1)
        trans.append(scale_minmax(trans_var))
    data_all[cols_numeric] = np.asarray(trans).T
    
elif scale == "norm":
    print(b_, "norm")
    data_all[cols_numeric] = data_all[cols_numeric].apply(scale_norm, axis = 0)
    
elif scale == "minmax":
    print(b_, "minmax")
    data_all[cols_numeric] = data_all[cols_numeric].apply(scale_minmax, axis = 0)
    
elif scale == "rankgauss":
    ### Rank Gauss ###
    print(b_, "Rank Gauss")
    scaler = GaussRankScaler()
    data_all[cols_numeric] = scaler.fit_transform(data_all[cols_numeric])
    
else:
    pass

In [None]:
# PCA
if decompo == "PCA":
    print(b_, "PCA")
    GENES = [col for col in data_all.columns if col.startswith("g-")]
    CELLS = [col for col in data_all.columns if col.startswith("c-")]
    
    pca_genes = PCA(n_components = ncompo_genes,
                    random_state = seed).fit_transform(data_all[GENES])
    pca_cells = PCA(n_components = ncompo_cells,
                    random_state = seed).fit_transform(data_all[CELLS])
    
    pca_genes = pd.DataFrame(pca_genes, columns = [f"pca_g-{i}" for i in range(ncompo_genes)])
    pca_cells = pd.DataFrame(pca_cells, columns = [f"pca_c-{i}" for i in range(ncompo_cells)])
    data_all = pd.concat([data_all, pca_genes, pca_cells], axis = 1)
else:
    pass
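`ncompo_genes` and `ncompo_cells` are set elsewhere in the notebook. One common heuristic for choosing such values is the cumulative explained-variance ratio, the same quantity sklearn's `PCA` exposes as `explained_variance_ratio_`. Below is a NumPy-only sketch on synthetic data. `explained_variance_ratio` and `n_components_for` are hypothetical helpers.

```python
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance carried by each principal component (descending)."""
    Xc = X - X.mean(axis=0)                          # PCA operates on centred data
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    var = singular_values ** 2
    return var / var.sum()

def n_components_for(X: np.ndarray, target: float = 0.95) -> int:
    """Smallest number of components whose cumulative explained variance reaches `target`."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(42)
# 500 samples of 50 features that really live on a 5-dimensional subspace, plus small noise
latent = rng.normal(size=(500, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(500, 50))
print(n_components_for(X, 0.95), n_components_for(X, 0.999))
```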

In [None]:
# Encoding
if encoding == "lb":
    print(b_, "Label Encoding")
    for feat in ["cp_time", "cp_dose"]:
        data_all[feat] = LabelEncoder().fit_transform(data_all[feat])
elif encoding == "dummy":
    print(b_, "One-Hot")
    data_all = pd.get_dummies(data_all, columns = ["cp_time", "cp_dose"])

In [None]:
GENES = [col for col in data_all.columns if col.startswith("g-")]
CELLS = [col for col in data_all.columns if col.startswith("c-")]

# `stat`, not `stats`: reusing `stats` would shadow scipy.stats, which the Box-Cox branch above needs
for stat in tqdm.tqdm(["sum", "mean", "std", "kurt", "skew"]):
    data_all["g_" + stat] = getattr(data_all[GENES], stat)(axis = 1)
    data_all["c_" + stat] = getattr(data_all[CELLS], stat)(axis = 1)
    data_all["gc_" + stat] = getattr(data_all[GENES + CELLS], stat)(axis = 1)
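The loop above uses `getattr(frame, name)` to look up a `DataFrame` reduction method by its string name. Passing `axis = 1` makes each reduction run across columns, so it produces one value per sample. Here is a toy illustration with made-up values. Kurtosis and skewness are left out because pandas needs at least four and three values per row, respectively, to compute them.

```python
import pandas as pd

# Two toy samples with three "gene" and two "cell" columns (values are made up)
toy = pd.DataFrame({
    "g-0": [1.0, 4.0], "g-1": [2.0, 5.0], "g-2": [3.0, 9.0],
    "c-0": [0.5, -1.0], "c-1": [1.5, 1.0],
})
# Column lists are fixed before the loop, so the new summary columns never feed back in
genes = [c for c in toy.columns if c.startswith("g-")]
cells = [c for c in toy.columns if c.startswith("c-")]

for stat in ["sum", "mean", "std"]:
    # getattr(frame, "mean") returns the bound DataFrame.mean; axis=1 reduces each row
    toy["g_" + stat] = getattr(toy[genes], stat)(axis=1)
    toy["c_" + stat] = getattr(toy[cells], stat)(axis=1)

print(toy[["g_sum", "g_mean", "g_std", "c_mean"]])
```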

In [None]:
def distributions(num, graphs, items, features, gorc):
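    """
    Plot the distributions of gene expression ("g") or cell viability ("c") features.

    Only the first row of 7 subplots is drawn (the loop breaks once i >= 3),
    and feature indices >= `items` are skipped.
    """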
    for i in range(0, num - 1, 7):
        if i >= 3:
            break
        idxs = list(np.array([0, 1, 2, 3, 4, 5, 6]) + i)
    
        fig, axs = plt.subplots(1, 7, sharey = True)
        for k, item in enumerate(idxs):
            if item >= items:
                break
            graph = sns.distplot(data_all[features].values[:, item], ax = axs[k])
            graph.set_title(f"{gorc}-{item}")
            graphs.append(graph)

In [None]:
gnum = data_all[GENES].shape[1]
graphs = []

distributions(gnum, graphs, 771, GENES, "g")

In [None]:
cnum = data_all[CELLS].shape[1]
graphs = []

distributions(cnum, graphs, 100, CELLS, "c")


In [None]:
with open("data_all.pickle", "wb") as f:
    pickle.dump(data_all, f)

In [None]:
# Split data_all back into train_df and test_df
features_to_drop = ["sig_id", "cp_type"]
data_all.drop(features_to_drop, axis = 1, inplace = True)
try:
    targets.drop("sig_id", axis = 1, inplace = True)
except KeyError:
    # "sig_id" has already been dropped (e.g. when this cell is re-run)
    pass
train_df = data_all[: train.shape[0]].reset_index(drop = True)
# In my opinion the following line is bad practice: it would put the targets into the training features
#train_df = pd.concat([train_df, targets], axis = 1)
test_df = data_all[train_df.shape[0]: ].reset_index(drop = True)

In [None]:
print(f"{b_}train_df.shape: {r_}{train_df.shape}")
print(f"{b_}test_df.shape: {r_}{test_df.shape}")

In [None]:
X_test = test_df.values
print(f"{b_}X_test.shape: {r_}{X_test.shape}")

In [None]:
MAX_EPOCH = 200
#MAX_EPOCH = 2 # changed

# n_d and n_a differ from the original work: 32 instead of 24.
# This is the first change from the original code.
tabnet_params = dict(
    n_d = 32,
    n_a = 32,
    n_steps = 1,
    gamma = 1.3,
    lambda_sparse = 0,
    optimizer_fn = optim.Adam,
    optimizer_params = dict(lr = 2e-2, weight_decay = 1e-5),
    mask_type = "entmax",
    scheduler_params = dict(
        mode = "min", patience = 5, min_lr = 1e-5, factor = 0.9),
    scheduler_fn = ReduceLROnPlateau,
    seed = seed,
    verbose = 10
)

In [None]:
class LogitsLogLoss(Metric):
    """
    LogLoss with sigmoid applied
    """

    def __init__(self):
        self._name = "logits_ll"
        self._maximize = False

    def __call__(self, y_true, y_pred):
        """
        Compute LogLoss of predictions.

        Parameters
        ----------
        y_true: np.ndarray
            Target matrix or vector
        y_pred: np.ndarray
            Raw model outputs (logits), matrix or vector

        Returns
        -------
        float
            LogLoss of predictions vs targets.
        """
        probs = 1 / (1 + np.exp(-y_pred))  # sigmoid: logits -> probabilities
        aux = (1 - y_true) * np.log(1 - probs + 1e-15) + y_true * np.log(probs + 1e-15)
        return np.mean(-aux)
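`LogitsLogLoss` applies a sigmoid and then adds `1e-15` inside the logarithms. For extreme logits the sigmoid saturates, the `1e-15` term caps the loss, and `np.exp` may emit overflow warnings. An algebraically equivalent form, `log(1 + exp(z)) - y * z`, works on the logits directly and stays finite. This is a sketch with hypothetical function names, compared against the notebook's formula on random data.

```python
import numpy as np

def logits_log_loss_stable(y_true: np.ndarray, logits: np.ndarray) -> float:
    """Mean binary cross-entropy computed directly from logits.

    log(1 + exp(z)) is evaluated as np.logaddexp(0, z), so very large positive or
    negative logits neither overflow nor hit log(0).
    """
    return float(np.mean(np.logaddexp(0.0, logits) - y_true * logits))

def logits_log_loss_naive(y_true: np.ndarray, logits: np.ndarray) -> float:
    """Same formula as LogitsLogLoss.__call__: sigmoid, then log loss with a 1e-15 guard."""
    probs = 1 / (1 + np.exp(-logits))
    aux = (1 - y_true) * np.log(1 - probs + 1e-15) + y_true * np.log(probs + 1e-15)
    return float(np.mean(-aux))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(100, 8)).astype(float)   # toy multilabel targets
z = rng.normal(scale=3.0, size=(100, 8))               # toy logits
print(logits_log_loss_stable(y, z), logits_log_loss_naive(y, z))
```

For moderate logits the two functions agree to many decimal places. They diverge only when the naive version's `1e-15` guard starts to dominate.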

In [None]:
scores_auc_all = []
test_cv_preds = []

NB_SPLITS = 10 # 7
#NB_SPLITS = 2 # changed

mskf = MultilabelStratifiedKFold(n_splits = NB_SPLITS, random_state = 0, shuffle = True)

oof_preds = []
oof_targets = []
scores = []
scores_auc = []
for fold_nb, (train_idx, val_idx) in enumerate(mskf.split(train_df, targets)):
    print(b_,"FOLDS: ", r_, fold_nb + 1)
    print(g_, '*' * 60, c_)
    
    X_train, y_train = train_df.values[train_idx, :], targets.values[train_idx, :]
    X_val, y_val = train_df.values[val_idx, :], targets.values[val_idx, :]
    ### Model ###
    model = TabNetRegressor(**tabnet_params)
        
    ### Fit ###
    # Another change to the original code
    # virtual_batch_size of 32 instead of 128
    model.fit(
        X_train = X_train,
        y_train = y_train,
        eval_set = [(X_val, y_val)],
        eval_name = ["val"],
        eval_metric = ["logits_ll"],
        max_epochs = MAX_EPOCH,
        patience = 20,
        batch_size = 1024, 
        virtual_batch_size = 32,
        num_workers = 1,
        drop_last = False,
        # To use binary cross entropy because this is not a regression problem
        loss_fn = F.binary_cross_entropy_with_logits
    )
    print(y_, '-' * 60)
    
    ### Predict on validation ###
    preds_val = model.predict(X_val)
    # Apply sigmoid to the predictions
    preds = 1 / (1 + np.exp(-preds_val))
    score = np.min(model.history["val_logits_ll"])
    
    ### Save OOF for CV ###
    oof_preds.append(preds)  # store sigmoid probabilities, not raw logits (AUC is unaffected)
    oof_targets.append(y_val)
    scores.append(score)
    
    ### Predict on test ###
    preds_test = model.predict(X_test)
    test_cv_preds.append(1 / (1 + np.exp(-preds_test)))

oof_preds_all = np.concatenate(oof_preds)
oof_targets_all = np.concatenate(oof_targets)
test_preds_all = np.stack(test_cv_preds)

In [None]:
aucs = []
for task_id in range(oof_preds_all.shape[1]):
    aucs.append(roc_auc_score(y_true = oof_targets_all[:, task_id],
                              y_score = oof_preds_all[:, task_id]
                             ))
print(f"{b_}Overall AUC: {r_}{np.mean(aucs)}")
print(f"{b_}Average CV: {r_}{np.mean(scores)}")
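`roc_auc_score` raises a `ValueError` when a target column contains only one class. This can happen with very rare targets if the arrays cover only part of the data. One safeguard is to skip columns where AUC is undefined. The sketch below swaps in a rank-based (Mann-Whitney) AUC so that it needs only NumPy and pandas. `rank_auc` and `mean_column_auc` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

def rank_auc(y_true, y_score) -> float:
    """ROC AUC via the Mann-Whitney U statistic; NaN when only one class is present."""
    y_true = np.asarray(y_true).astype(bool)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")                     # AUC is undefined without both classes
    ranks = pd.Series(y_score).rank(method="average").to_numpy()  # ties get the mean rank
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

def mean_column_auc(targets: np.ndarray, preds: np.ndarray) -> float:
    """Average AUC over target columns, skipping columns where AUC is undefined."""
    aucs = [rank_auc(targets[:, j], preds[:, j]) for j in range(targets.shape[1])]
    return float(np.nanmean(aucs))

# Column 1 contains only positives, so it is skipped
targets = np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]])
preds = np.array([[0.1, 0.9, 0.2], [0.8, 0.7, 0.1], [0.3, 0.6, 0.4], [0.7, 0.8, 0.3]])
print(mean_column_auc(targets, preds))
```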

In [None]:
all_feat = [col for col in submission.columns if col not in ["sig_id"]]
# To obtain the same length for test_preds_all and submission
test = pd.read_csv(data_path + "test_features.csv")
sig_id = test[test["cp_type"] != "ctl_vehicle"].sig_id.reset_index(drop = True)
tmp = pd.DataFrame(test_preds_all.mean(axis = 0), columns = all_feat)
tmp["sig_id"] = sig_id

submission = pd.merge(test[["sig_id"]], tmp, on = "sig_id", how = "left")
submission.fillna(0, inplace = True)

#submission[all_feat] = tmp.mean(axis = 0)

# Set control to 0
#submission.loc[test["cp_type"] == 0, submission.columns[1:]] = 0
submission.to_csv("submission2.csv", index = None)
submission.head()
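The cell above builds predictions only for non-control rows. It then left-merges them onto every test `sig_id` and fills the gaps with 0, so control rows end up with zero probabilities. The same pattern is shown here on a tiny made-up test set. All ids, target names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy test set: two treated samples and one control sample
test_toy = pd.DataFrame({"sig_id": ["id_a", "id_b", "id_c"],
                         "cp_type": ["trt_cp", "ctl_vehicle", "trt_cp"]})
target_cols = ["target_1", "target_2"]   # made-up target names

# Model predictions exist only for the non-control rows, in their original order
preds = np.array([[0.2, 0.7], [0.4, 0.1]])
treated_ids = test_toy.loc[test_toy["cp_type"] != "ctl_vehicle", "sig_id"].reset_index(drop=True)
pred_df = pd.DataFrame(preds, columns=target_cols)
pred_df["sig_id"] = treated_ids

# Left-merge onto every test sig_id: control rows get NaN, which fillna turns into 0
sub_toy = pd.merge(test_toy[["sig_id"]], pred_df, on="sig_id", how="left").fillna(0)
print(sub_toy)
```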

In [None]:
print(f"{b_}submission2.shape: {r_}{submission.shape}")

### pca_var_cv_simple_nn fold6


In [None]:
import sys
sys.path.append('../input/iterative-stratification/iterative-stratification-master/')

import numpy as np
import random
import pandas as pd
import os
from matplotlib import pyplot as plt 
%matplotlib inline
import seaborn as sns
sns.set_style('ticks')
sns.set_context("poster")
sns.set_palette('colorblind')
from sklearn.metrics import log_loss
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer
from sklearn.feature_selection import VarianceThreshold
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn.modules.loss import _WeightedLoss
from torch.utils.data import Dataset, DataLoader
import warnings
warnings.filterwarnings('ignore')
# os.listdir('../input/lish-moa')

In [None]:
plt.rcParams['figure.figsize'] = (20.0, 10.0)
device = ('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
params = {'device': device,
          'n_comp_g': 450, 
          'n_comp_c': 45, 
          'var_thresh': 0.67,
          'epochs': 25,
          'batch_size': 128,
          'lr': 1e-3,
          'weight_decay': 1e-5, 
          #'n_folds': 7,
          'n_folds': 6, # changed
          
          'early_stopping_steps': 10,
          'early_stop': False,
          'in_size': None,
          'out_size': None,
          'hidden_size': 1500}

In [None]:
train_features = pd.read_csv('../input/lish-moa/train_features.csv') # ../input/lish-moa/
train_targets = pd.read_csv('../input/lish-moa/train_targets_scored.csv') # ../input/lish-moa/

test_features = pd.read_csv('../input/lish-moa/test_features.csv') # ../input/lish-moa/
sample_submission = pd.read_csv('../input/lish-moa/sample_submission.csv') # ../input/lish-moa/

In [None]:
g_features = [col for col in train_features.columns if col.startswith('g-')]
c_features = [col for col in train_features.columns if col.startswith('c-')]

g_c_features = g_features + c_features

In [None]:
transformer = QuantileTransformer(n_quantiles=100, random_state=0, output_distribution="normal")


In [None]:
trans_train_features = transformer.fit_transform(train_features[g_c_features])
trans_test_features = transformer.transform(test_features[g_c_features])

trans_train_df = pd.DataFrame(trans_train_features, columns = g_c_features)
trans_test_df = pd.DataFrame(trans_test_features, columns = g_c_features)

train_features = pd.concat([train_features.drop(columns=g_c_features), trans_train_df], axis=1)
test_features = pd.concat([test_features.drop(columns=g_c_features), trans_test_df], axis=1)

In [None]:
g_sample = random.sample(g_features, 3)
c_sample = random.sample(c_features, 3)

In [None]:
colors = ['navy', 'r', 'g']
for col, color in zip(g_sample, colors):
    plt.hist(test_features[col], bins=50, alpha=0.5, label=col)
    plt.axvline(np.median(test_features[col]), linewidth=3, color=color, label='median_{}'.format(col))
plt.xlim(-7, 7)
plt.legend();

In [None]:
colors = ['navy', 'r', 'g']
for col, color in zip(c_sample, colors):
    plt.hist(test_features[col], bins=50, alpha=0.5, label=col)
    plt.axvline(np.median(test_features[col]), linewidth=3, color=color, label='median_{}'.format(col))
plt.xlim(-7, 7)
plt.legend();

In [None]:
def transfrom_all_data(transformer, train, test, feature_list):
    
    data = pd.concat([train[feature_list], test[feature_list]], axis=0).reset_index(drop=True)
    n = train.shape[0]
    
    data_trans = transformer.fit_transform(data)
    train_trans = data_trans[:n, :]
    test_trans = data_trans[n:, :]
    return train_trans, test_trans

In [None]:
def make_pca_features(n_comp, train, test, feature_list, name, normalize=False, scaler=None):
    
    pca = PCA(n_comp)
    
    train_pca, test_pca = transfrom_all_data(pca, train, test, feature_list)
    
    if normalize and scaler is not None:
        train_pca = scaler.fit_transform(train_pca)
        test_pca = scaler.transform(test_pca)
    
    for i in range(n_comp):
        train['{0}_{1}'.format(name, i)] = train_pca[:, i]
        test['{0}_{1}'.format(name, i)] = test_pca[:, i]
        
    return train, test

In [None]:
def preprocess(data):
    data['cp_time'] = data['cp_time'].map({24:0, 48:1, 72:2})
    data['cp_dose'] = data['cp_dose'].map({'D1':0, 'D2':1})
    return data

In [None]:
train_features, test_features = make_pca_features(params['n_comp_g'], train_features, test_features, g_features, 'g_pca')


In [None]:
train_features, test_features = make_pca_features(params['n_comp_c'], train_features, test_features, c_features, 'c_pca')

In [None]:
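var_thresh = VarianceThreshold(params['var_thresh'])
# The first four columns are sig_id, cp_type, cp_time and cp_dose; only the numeric columns after them are thresholded
to_thresh = train_features.columns[4:]
cat_features = train_features.columns[:4]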

In [None]:
train_thresh, test_thresh = transfrom_all_data(var_thresh, train_features, test_features, to_thresh)

In [None]:
train_features = pd.concat([train_features[cat_features], pd.DataFrame(train_thresh)], axis=1)
test_features = pd.concat([test_features[cat_features], pd.DataFrame(test_thresh)], axis=1)

In [None]:
train_mask = train_features['cp_type'] != 'ctl_vehicle'
train_sig_ids = train_features.loc[train_mask]['sig_id']
train = train_features.loc[train_mask].reset_index(drop=True)

test_mask = test_features['cp_type'] != 'ctl_vehicle'
test_sig_ids = test_features.loc[test_mask]['sig_id']
test = test_features.loc[test_mask].reset_index(drop=True)

train_target_sigids = train_targets[['sig_id']]
y_true  = train_targets.copy()

train_targets = train_targets[train_targets['sig_id'].isin(train_sig_ids)].reset_index(drop=True)
train_targets.drop(columns=['sig_id'], inplace=True)
train_targets.reset_index(drop=True, inplace=True)

In [None]:
params['in_size'] = train.shape[1] - 2
params['out_size'] = train_targets.shape[1]

In [None]:
mskf = MultilabelStratifiedKFold(n_splits=params['n_folds'])

In [None]:
folds = train.copy()

for f, (t_idx, v_idx) in enumerate(mskf.split(X=train, y=train_targets)):
    folds.loc[v_idx, 'kfold'] = int(f)

folds['kfold'] = folds['kfold'].astype(int)
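The cell above records, for every row, the fold in which that row is used for validation. `run_training` then trains on rows with `kfold != fold` and validates on rows with `kfold == fold`. The sketch below reproduces the same bookkeeping with a plain shuffled split. The hypothetical `assign_kfold` helper is *not* multilabel-stratified like `MultilabelStratifiedKFold`. It only illustrates how the `kfold` column is built.

```python
import numpy as np
import pandas as pd

def assign_kfold(df: pd.DataFrame, n_folds: int, seed: int = 0) -> pd.DataFrame:
    """Return a copy of `df` with a 'kfold' column; each row lands in exactly one validation fold.

    Plain shuffled K-fold, NOT stratified: a stand-in for illustration only.
    """
    order = np.random.default_rng(seed).permutation(len(df))
    kfold = np.empty(len(df), dtype=int)
    for fold, idx in enumerate(np.array_split(order, n_folds)):
        kfold[idx] = fold                 # rows in this chunk are validated in `fold`
    out = df.copy()
    out["kfold"] = kfold
    return out

toy = pd.DataFrame({"x": range(20)})
folded = assign_kfold(toy, n_folds=6)
print(folded["kfold"].value_counts().sort_index())
```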

In [None]:
class TabularDataset:
    
    def __init__(self, X, y):
        self.X = X
        self.y = y
    
    def __len__(self):
        return(self.X.shape[0])
    
    def __getitem__(self, i):
        
        X_i = torch.tensor(self.X[i, :], dtype=torch.float)
        y_i = torch.tensor(self.y[i, :], dtype=torch.float)
        
        return X_i, y_i
    
    

class TabularDatasetTest:
    
    def __init__(self, X):
        self.X = X
    
    def __len__(self):
        return(self.X.shape[0])
    
    def __getitem__(self, i):
        
        X_i = torch.tensor(self.X[i, :], dtype=torch.float)        
        return X_i

In [None]:
def train_func(model, optimizer, scheduler, loss_func, dataloader, device):
    
    train_loss = 0
    
    model.train()  
    for inputs, labels in dataloader:        
        optimizer.zero_grad()
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        
        train_loss += loss.item()
        
    train_loss /= len(dataloader)
    
    return train_loss

In [None]:
def valid_func(model, loss_func, dataloader, device):
    
    model.eval()
    
    valid_loss = 0
    valid_preds = []
    
    # Validation needs no gradients; torch.no_grad() avoids building the autograd graph
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = loss_func(outputs, labels)
            
            valid_loss += loss.item()
            valid_preds.append(outputs.sigmoid().cpu().numpy())
        
    valid_loss /= len(dataloader)
    valid_preds = np.concatenate(valid_preds)
    
    return valid_loss, valid_preds

In [None]:
def inference_fn(model, dataloader, device):
    model.eval()
    preds = []
    
    for data in dataloader:
        inputs = data.to(device)

        with torch.no_grad():
            outputs = model(inputs)
        
        preds.append(outputs.sigmoid().detach().cpu().numpy())
        
    preds = np.concatenate(preds)
    
    return preds

In [None]:
class SmoothBCEwLogits(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth(targets:torch.Tensor, n_labels:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = targets * (1.0 - smoothing) + 0.5 * smoothing
        return targets

    def forward(self, inputs, targets):
        targets = SmoothBCEwLogits._smooth(targets, inputs.size(-1),
            self.smoothing)
        loss = F.binary_cross_entropy_with_logits(inputs, targets,self.weight)

        if  self.reduction == 'sum':
            loss = loss.sum()
        elif  self.reduction == 'mean':
            loss = loss.mean()

        return loss
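`SmoothBCEwLogits` moves each 0/1 target towards 0.5 by `smoothing`, using t' = t(1 - s) + 0.5 s, and then applies the usual BCE with logits. With `smoothing=0.001`, as in `run_training`, the targets become 0.0005 and 0.9995. Validation still uses the unsmoothed `nn.BCEWithLogitsLoss`. Below is a NumPy sketch of the same rule with hypothetical function names.

```python
import numpy as np

def smooth_targets(targets: np.ndarray, smoothing: float) -> np.ndarray:
    """Same rule as SmoothBCEwLogits._smooth: pull 0/1 labels towards 0.5 by `smoothing`."""
    assert 0 <= smoothing < 1
    return targets * (1.0 - smoothing) + 0.5 * smoothing

def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy on logits (soft targets allowed), written with np.logaddexp."""
    return float(np.mean(np.logaddexp(0.0, logits) - targets * logits))

y = np.array([[0.0, 1.0, 1.0, 0.0]])
z = np.array([[-4.0, 3.0, 0.5, -0.5]])
for s in (0.0, 0.001, 0.1):
    print(s, smooth_targets(y, s).ravel(), bce_with_logits(z, smooth_targets(y, s)))
```

Because smoothed targets are never exactly 0 or 1, the training loss can no longer be driven to zero by ever larger logits. This discourages over-confident predictions.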

In [None]:
class Model(nn.Module):
    def __init__(self, num_features, num_targets, hidden_size):
        super(Model, self).__init__()
        self.batch_norm1 = nn.BatchNorm1d(num_features)
        self.dense1 = nn.utils.weight_norm(nn.Linear(num_features, hidden_size))
        
        self.batch_norm2 = nn.BatchNorm1d(hidden_size)
        self.dropout2 = nn.Dropout(0.25)
        self.dense2 = nn.utils.weight_norm(nn.Linear(hidden_size, hidden_size))
        
        self.batch_norm3 = nn.BatchNorm1d(hidden_size)
        self.dropout3 = nn.Dropout(0.25)
        self.dense3 = nn.utils.weight_norm(nn.Linear(hidden_size, num_targets))
    
    def forward(self, x):
        x = self.batch_norm1(x)
        x = F.leaky_relu(self.dense1(x))
        
        x = self.batch_norm2(x)
        x = self.dropout2(x)
        x = F.leaky_relu(self.dense2(x))
        
        x = self.batch_norm3(x)
        x = self.dropout3(x)
        x = self.dense3(x)
        
        return x


In [None]:
def run_training(fold, seed):
    
    seed_everything(seed)
    
    train = preprocess(folds.drop(columns = ['sig_id', 'cp_type']))
    
    train_mask = train['kfold'] != fold
    valid_idc = train.loc[~train_mask].index
    
    X_train = train.loc[train_mask].reset_index(drop=True)
    y_train = train_targets.loc[train_mask].reset_index(drop=True)

    
    X_val = train.loc[~train_mask].reset_index(drop=True)
    y_val = train_targets.loc[~train_mask].reset_index(drop=True)
    
    X_train.drop(columns=['kfold'], inplace=True)
    X_val.drop(columns=['kfold'], inplace=True)
    
    test_ = preprocess(test.drop(columns = ['sig_id', 'cp_type']))

    
    train_ds = TabularDataset(X_train.values, y_train.values)
    valid_ds = TabularDataset(X_val.values, y_val.values)
    test_ds = TabularDatasetTest(test_.values)
    
    train_dl = DataLoader(train_ds, batch_size=params['batch_size'], shuffle=True)
    valid_dl = DataLoader(valid_ds, batch_size=params['batch_size'], shuffle=False)
    test_dl = DataLoader(test_ds, batch_size=params['batch_size'], shuffle=False)
    
    
    model = Model(num_features=params['in_size'], num_targets=params['out_size'], 
                  hidden_size=params['hidden_size'] )
    
    model.to(params['device'])
    
    optimizer = torch.optim.Adam(model.parameters(), lr=params['lr'], weight_decay=params['weight_decay'])
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer=optimizer, pct_start=0.1, div_factor=1e3, 
                                              max_lr=1e-2, epochs=params['epochs'], steps_per_epoch=len(train_dl))
    
    loss_fn = nn.BCEWithLogitsLoss()
    loss_tr = SmoothBCEwLogits(smoothing=0.001)
    
    early_stopping_steps = params['early_stopping_steps']
    early_step = 0
   
    oof = np.zeros((train.shape[0], params['out_size']))
    best_loss = np.inf
    
    for epoch in range(params['epochs']):
        
        train_loss = train_func(model, optimizer,scheduler, loss_tr, train_dl, params['device'])
        print(f"FOLD: {fold}, EPOCH: {epoch}, train_loss: {train_loss}")
        valid_loss, valid_preds = valid_func(model, loss_fn, valid_dl, params['device'])
        print(f"FOLD: {fold}, EPOCH: {epoch}, valid_loss: {valid_loss}")
        
        if valid_loss < best_loss:
            
            best_loss = valid_loss
            oof[valid_idc] = valid_preds
            torch.save(model.state_dict(), f"FOLD{fold}_.pth")
        
        elif(params['early_stop'] == True):
            
            early_step += 1
            if (early_step >= early_stopping_steps):
                break
            
    
    #--------------------- PREDICTION---------------------

    
    model = Model(num_features=params['in_size'], num_targets=params['out_size'], 
                  hidden_size=params['hidden_size'] )
    model.load_state_dict(torch.load(f"FOLD{fold}_.pth"))
    model.to(params['device'])
    
    
    predictions = np.zeros((test.shape[0], params['out_size']))
    predictions = inference_fn(model, test_dl, params['device'])
    
    return oof, predictions

In [None]:
def run_k_fold(n_folds, seed):
    oof = np.zeros((train.shape[0], params['out_size']))
    predictions = np.zeros((test.shape[0], params['out_size']))
    
    for fold in range(n_folds):
        oof_, pred_ = run_training(fold, seed)
        
        predictions += pred_ / n_folds
        oof += oof_
        
    return oof, predictions

In [None]:
def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    
seed_everything(seed=42)

In [None]:
#seeds = [0, 1, 2, 3, 4, 5, 6]
seeds = [0, 1, 2, 3, 4, 5] # changed

oof = np.zeros((train.shape[0], params['out_size']))
predictions = np.zeros((test.shape[0], params['out_size']))

for seed in seeds:
    
    oof_, predictions_ = run_k_fold(params['n_folds'], seed)
    oof += oof_ / len(seeds)
    predictions += predictions_ / len(seeds)

In [None]:
valid_results = pd.concat([train_target_sigids[train_target_sigids['sig_id'].isin(train_sig_ids)].reset_index(drop=True), pd.DataFrame(oof)], axis=1)

In [None]:
test_results = pd.concat([test[['sig_id']], pd.DataFrame(predictions, columns = sample_submission.columns[1:])], axis=1)

In [None]:
valid_full = train_target_sigids.merge(valid_results, on='sig_id', how='left').fillna(0)


In [None]:
y_true = y_true.drop(columns=['sig_id']).values
y_pred = valid_full.drop(columns=['sig_id']).values

score = 0
for i in range(y_true.shape[1]):
    score_ = log_loss(y_true[:, i], y_pred[:, i])
    score += score_ / y_true.shape[1]
    
print("CV log_loss: ", score)    
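The loop above averages the per-target log losses. Every column has the same number of rows, so the same number can be computed in one vectorised expression. The sketch below uses a hypothetical helper with explicit clipping. Depending on the scikit-learn version, `log_loss` may clip with a different epsilon, so tiny differences are possible for hard 0/1 predictions.

```python
import numpy as np

def mean_columnwise_log_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-15) -> float:
    """Average over target columns of the per-column binary log loss.

    Predictions are clipped to [eps, 1 - eps] so that hard 0/1 predictions
    (e.g. control rows filled with 0) do not produce log(0).
    """
    p = np.clip(y_pred, eps, 1 - eps)
    per_element = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    per_column = per_element.mean(axis=0)   # one log loss per target
    return float(per_column.mean())

rng = np.random.default_rng(1)
y_toy = rng.integers(0, 2, size=(50, 4)).astype(float)
p_toy = rng.uniform(0.01, 0.99, size=(50, 4))
p_toy[0, 0] = 0.0                           # a hard 0 prediction, like a filled control row
print(mean_columnwise_log_loss(y_toy, p_toy))
```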

In [None]:
sub = sample_submission[['sig_id']].merge(test_results, on='sig_id', how='left').fillna(0)
sub.to_csv('submission3.csv', index=False)

Take the average of the submissions

In [None]:
sub1 = pd.read_csv('submission.csv')
sub2 = pd.read_csv('submission2.csv')
sub3 = pd.read_csv('submission3.csv')

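sub_id = sub1[['sig_id']] # keep only the id column

sub1 = sub1[sub1.columns[sub1.columns != 'sig_id']] # keep only the numeric prediction columns
sub2 = sub2[sub2.columns[sub2.columns != 'sig_id']] # keep only the numeric prediction columns
sub3 = sub3[sub3.columns[sub3.columns != 'sig_id']] # keep only the numeric prediction columns

sub4 = (sub1 + sub2 + sub3) / 3 # average; rows are aligned by index position, so all three files must list sig_id in the same order

sub_mean = pd.concat([sub_id, sub4], axis=1) # join the ids back onto the averaged values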

sub_mean.to_csv('submission.csv', index=False)
sub_mean.head()
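The averaging cell adds the three frames by index position. It therefore silently assumes that `submission.csv`, `submission2.csv` and `submission3.csv` list the `sig_id`s in the same order and share the same target columns. A more defensive variant aligns the frames on `sig_id` first. The sketch below uses a hypothetical `average_submissions` helper and made-up frames.

```python
import pandas as pd

def average_submissions(frames, id_col="sig_id"):
    """Average submission frames after aligning them on `id_col`.

    Rows are matched by id rather than by position; the result keeps the
    row and column order of the first frame.
    """
    first = frames[0].set_index(id_col)
    aligned = [f.set_index(id_col).reindex(index=first.index, columns=first.columns) for f in frames]
    # reindex inserts NaN for ids or columns missing from a frame, so any NaN signals a mismatch
    if any(a.isna().to_numpy().any() for a in aligned):
        raise ValueError("submissions do not share the same ids and target columns")
    mean = sum(aligned) / len(aligned)
    return mean.reset_index()

a = pd.DataFrame({"sig_id": ["x", "y"], "t1": [0.2, 0.4], "t2": [0.0, 1.0]})
b = pd.DataFrame({"sig_id": ["y", "x"], "t1": [0.6, 0.0], "t2": [1.0, 0.5]})  # same ids, other row order
avg = average_submissions([a, b])
print(avg)
```

With the notebook's files this would be called as `average_submissions([pd.read_csv(p) for p in ["submission.csv", "submission2.csv", "submission3.csv"]])`.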