# **Shopee Code League** - *Product Detection*

This is my team's solution to the [Student] Shopee Code League 2020 - Product Detection competition. It placed 38th (top 5%) on the private leaderboard with a score of 0.82362.

https://www.kaggle.com/c/shopee-product-detection-student/

This project uses the FastAI2 API to train several models and then ensembles them by comparing the probabilities of the labels that each model predicts.

# Initialization

## Checking GPU

In [None]:
!nvidia-smi

---

## Mounting Google Drive

If you're using Google Colab, here is the code to mount your Google Drive so that the dataset can be read from it

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!unzip "....." # Your file path

---

# Importing Packages Needed

## Common library

In [None]:
import matplotlib.pyplot as plt
import os
import numpy as np
import pandas as pd
import torchvision.models as models

## Installing FastAI2

In [None]:
!pip install git+https://github.com/fastai/fastai2 
from fastai2.vision.all import *

---

# Data Loading and Preprocessing

## Load the data

Reading Files

In [None]:
train = pd.read_csv("...") # Your train.csv path
test = pd.read_csv("...") # Your test.csv path

Setting the paths to the image data directories

In [None]:
train_path = Path("...") # Your train image dataset path
test_path = Path("...") # Your test image dataset path

Prefixing each filename in the train dataset with its zero-padded category folder, so that FastAI2 can locate the image relative to `train_path`

In [None]:
train['filename'] = train.apply(lambda x: str(x.category).zfill(2) + '/' + x.filename, axis=1)
train

| | filename | category |
|---|---|---|
| 0 | 03/45e2d0c97f7bdf8cbf3594beb6fdcda0.jpg | 3 |
| 1 | 03/f74d1a5fc2498bbbfa045c74e3cc333e.jpg | 3 |
| 2 | 03/f6c172096818c5fab10ecae722840798.jpg | 3 |
| 3 | 03/251ffd610399ac00fea7709c642676ee.jpg | 3 |
| 4 | 03/73c7328b8eda399199fdedec6e4badaf.jpg | 3 |
| ... | ... | ... |
| 105387 | 25/047a60001de0331608ba64092cc7ae2b.jpg | 25 |
| 105388 | 25/ea39ac66ccdc4b4d4c6443f6c54d8ae3.jpg | 25 |
| 105389 | 25/6215f8c52c5bbcfe3e63e0f3ac6265f8.jpg | 25 |
| 105390 | 25/1733d8286f6658149c7b7cdeb40d6461.jpg | 25 |
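
The prefixing step can be sketched in plain Python. The helper name `prefixed_filename` is hypothetical; it mirrors the lambda applied to each row, assuming the training images are stored in one folder per zero-padded category.

```python
def prefixed_filename(category: int, filename: str) -> str:
    """Return the image path relative to the training image directory."""
    # zfill(2) pads single-digit categories with a leading zero: 3 -> "03"
    return str(category).zfill(2) + "/" + filename


print(prefixed_filename(3, "45e2d0c97f7bdf8cbf3594beb6fdcda0.jpg"))
# 03/45e2d0c97f7bdf8cbf3594beb6fdcda0.jpg
```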


## Data Processing

The images are loaded using FastAI2, augmented (random resized crop plus the default `aug_transforms`), and normalized with the ImageNet statistics (`imagenet_stats`)

In [None]:
item_tfms = [RandomResizedCrop(224, 
                               min_scale=0.9 # Change this min_scale according to model
                               )]
batch_tfms = [*aug_transforms(), 
              Normalize.from_stats(*imagenet_stats)]
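
As a rough illustration of what the normalization step does to each pixel, the sketch below applies the standard ImageNet channel means and standard deviations to a single RGB pixel scaled to [0, 1]. The function name `normalize_pixel` is made up for this example; the library applies the same arithmetic to whole batches.

```python
# Standard ImageNet per-channel statistics (R, G, B)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Subtract the channel mean and divide by the channel std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# A pixel equal to the ImageNet mean maps to zero in every channel
print(normalize_pixel(IMAGENET_MEAN))  # (0.0, 0.0, 0.0)
```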

Next, we define a function that builds the training `DataLoaders` from a dataframe using `ImageDataLoaders`

In [None]:
def get_dls_from_df(df):
    df = df.copy()
    options = {
        "item_tfms": item_tfms,
        "batch_tfms": batch_tfms,
        "bs": 32, # Change this batch size for different result
    }
    dls = ImageDataLoaders.from_df(df, train_path, **options)
    return dls

In [None]:
dls = get_dls_from_df(train)

---

# Training and modeling

* Model 1: Densenet-201 | 4 Epochs | 0.95 Min Scale
* Model 2: Densenet-201 | 5 Epochs | 0.75 Min Scale
* Model 3: Densenet-169 | 4 Epochs | 0.75 Min Scale
* Model 4: Densenet-161 | 5 Epochs | 0.8  Min Scale
* Model 5: Densenet-121 | 4 Epochs | 0.95 Min Scale
* Model 6: Resnet-152   | 5 Epochs | 0.75 Min Scale
* Model 7: Resnet-101   | 4 Epochs | 0.9  Min Scale
* Model 8: Resnet-50    | 4 Epochs | 0.75 Min Scale



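The `min_scale` of each model is set in `RandomResizedCrop` in the data processing step, so rebuild `dls` with the listed value before training the corresponding model. Changing the epochs and `min_scale` may give a better result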

In [None]:
learn1 = cnn_learner(dls, densenet201, metrics=accuracy)
learn1.fine_tune(4)

In [None]:
learn2 = cnn_learner(dls, densenet201, metrics=accuracy)
learn2.fine_tune(5)

In [None]:
learn3 = cnn_learner(dls, densenet169, metrics=accuracy)
learn3.fine_tune(4)

In [None]:
learn4 = cnn_learner(dls, densenet161, metrics=accuracy)
learn4.fine_tune(5)

In [None]:
learn5 = cnn_learner(dls, densenet121, metrics=accuracy)
learn5.fine_tune(4)

In [None]:
learn6 = cnn_learner(dls, resnet152, metrics=accuracy)
learn6.fine_tune(5)

In [None]:
learn7 = cnn_learner(dls, resnet101, metrics=accuracy)
learn7.fine_tune(4)

In [None]:
learn8 = cnn_learner(dls, resnet50, metrics=accuracy)
learn8.fine_tune(4)
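
The eight training cells follow one pattern, so they could also be driven from a table of settings. The following is only a sketch: `MODEL_CONFIGS`, `train_all`, `build_dls` and `build_learner` are hypothetical names, and the two builder callables are assumed to wrap `get_dls_from_df` (with the given `min_scale`) and `cnn_learner` respectively.

```python
# (architecture name, epochs, min_scale), copied from the model list above
MODEL_CONFIGS = [
    ("densenet201", 4, 0.95),
    ("densenet201", 5, 0.75),
    ("densenet169", 4, 0.75),
    ("densenet161", 5, 0.8),
    ("densenet121", 4, 0.95),
    ("resnet152", 5, 0.75),
    ("resnet101", 4, 0.9),
    ("resnet50", 4, 0.75),
]


def train_all(configs, build_dls, build_learner):
    """Train one learner per config and return them in order.

    build_dls(min_scale) and build_learner(dls, arch_name) are hypothetical
    callables supplied by the caller.
    """
    learners = []
    for arch_name, epochs, min_scale in configs:
        dls = build_dls(min_scale)  # data loaders using this model's min_scale
        learn = build_learner(dls, arch_name)
        learn.fine_tune(epochs)
        learners.append(learn)
    return learners
```

Passing the builders in as arguments keeps the loop independent of any particular library call, so the same function can be dry-run with stand-in objects before committing GPU time.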

---

# Prediction

Building the full path of every test image and creating a test `DataLoader` from them

In [None]:
test_images = test.filename.apply(lambda fn: test_path/fn)
test_dl = dls.test_dl(test_images)

Making a function to get the prediction out of each model

In [None]:
def get_prediction(learner, dl):
  # with_decoded=True also returns the decoded label predictions
  preds = learner.get_preds(dl=dl, with_decoded=True)
  return preds

The result will be a tuple where:
* preds[0] = Probabilities for every category
* preds[1] = Ground truth (typically `None` here, because the test set has no labels)
* preds[2] = Label predictions
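
To make the relationship between the probabilities and the label predictions concrete, here is a small sketch that uses a plain Python list in place of a tensor row. It assumes the decoded prediction is the index of the highest probability, which is how a single-label classifier is usually decoded; the helper `top1` is hypothetical.

```python
def top1(probabilities):
    """Return (label_index, probability) of the most likely class."""
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label, probabilities[label]


row = [0.1, 0.7, 0.2]  # probabilities for three classes
print(top1(row))  # (1, 0.7)
```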

In [None]:
preds1 = get_prediction(learn1, test_dl)
preds2 = get_prediction(learn2, test_dl)
preds3 = get_prediction(learn3, test_dl)
preds4 = get_prediction(learn4, test_dl)
preds5 = get_prediction(learn5, test_dl)
preds6 = get_prediction(learn6, test_dl)
preds7 = get_prediction(learn7, test_dl)
preds8 = get_prediction(learn8, test_dl)

---

# Ensemble Learning

## Preparing for ensemble learning

First, we build a dataframe containing the label prediction and its (highest) probability for each model

In [None]:
def get_all_df(preds):
  # Copy so that adding columns does not touch (or warn about) the `test` dataframe
  df = test[["filename"]].copy()

  # Getting probabilities: keep only the highest probability of each image
  proba = preds[0].tolist()
  max_proba = [max(prob) for prob in proba]

  # Getting prediction: the decoded label of each image
  prediction = preds[2].tolist()

  df['probability'] = max_proba
  df['prediction'] = prediction

  return df

In [None]:
df1 = get_all_df(preds1)
df2 = get_all_df(preds2)
df3 = get_all_df(preds3)
df4 = get_all_df(preds4)
df5 = get_all_df(preds5)
df6 = get_all_df(preds6)
df7 = get_all_df(preds7)
df8 = get_all_df(preds8)

---

## Doing the ensemble Learning

Making sure that we selected only the suitable columns needed for the ensemble learning

In [None]:
df1 = df1[['filename', 'prediction', 'probability']]
df2 = df2[['filename', 'prediction', 'probability']]
df3 = df3[['filename', 'prediction', 'probability']]
df4 = df4[['filename', 'prediction', 'probability']]
df5 = df5[['filename', 'prediction', 'probability']]
df6 = df6[['filename', 'prediction', 'probability']]
df7 = df7[['filename', 'prediction', 'probability']]
df8 = df8[['filename', 'prediction', 'probability']].copy()  # copy: columns are added to df8 below

Combining all dataframes into a single dataframe. The unsuffixed columns hold model 8, and the columns suffixed `_2` to `_8` hold models 1 to 7

In [None]:
df8['prediction_2'] = df1['prediction']
df8['probability_2'] = df1['probability']
df8['prediction_3'] = df2['prediction']
df8['probability_3'] = df2['probability']
df8['prediction_4'] = df3['prediction']
df8['probability_4'] = df3['probability']
df8['prediction_5'] = df4['prediction']
df8['probability_5'] = df4['probability']
df8['prediction_6'] = df5['prediction']
df8['probability_6'] = df5['probability']
df8['prediction_7'] = df6['prediction']
df8['probability_7'] = df6['probability']
df8['prediction_8'] = df7['prediction']
df8['probability_8'] = df7['probability']

In [None]:
df8

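For each image, every model votes for its predicted label with a weight equal to its probability. The weights are summed per label (42 categories), and the label with the highest total becomes the final prediction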

In [None]:
final_pred = []

for row in df8.iterrows():
  dicts = {}
  for i in range(42):
    dicts[i] = 0

  a = row[1].probability
  b = row[1].probability_2
  c = row[1].probability_3
  d = row[1].probability_4
  e = row[1].probability_5
  f = row[1].probability_6
  g = row[1].probability_7
  h = row[1].probability_8

  al = int(row[1].prediction) 
  bl = int(row[1].prediction_2)
  cl = int(row[1].prediction_3)
  dl = int(row[1].prediction_4)
  el = int(row[1].prediction_5)
  fl = int(row[1].prediction_6)
  gl = int(row[1].prediction_7)
  hl = int(row[1].prediction_8)

  dicts[al] += a
  dicts[bl] += b
  dicts[cl] += c
  dicts[dl] += d
  dicts[el] += e
  dicts[fl] += f
  dicts[gl] += g
  dicts[hl] += h

  max_label = max(dicts, key=dicts.get)

  final_pred.append(max_label)
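
The voting rule for a single image can be condensed into a short function. This is a sketch of the same idea rather than a drop-in replacement: `weighted_vote` is a hypothetical helper, and the three votes in the example are made-up numbers.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Pick the label with the highest summed probability.

    votes: iterable of (label, probability) pairs, one pair per model.
    """
    scores = defaultdict(float)
    for label, probability in votes:
        scores[label] += probability
    return max(scores, key=scores.get)


# One confident model says 3, two less confident models agree on 7:
# label 3 scores 0.9, label 7 scores 0.6 + 0.5 = 1.1
votes = [(3, 0.9), (7, 0.6), (7, 0.5)]
print(weighted_vote(votes))  # 7
```

One difference from the loop above: on an exact tie this sketch returns the tied label that was voted for first, while the loop returns the tied label with the lowest index.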

Making a dataframe out of the final prediction

In [None]:
ensemble_df = test[['filename']].copy()  # copy so that new columns can be added safely
ensemble_df['category'] = final_pred

# Zero-padding
ensemble_df["category"] = ensemble_df.category.apply(lambda c: str(c).zfill(2))

# Submission

In [None]:
ensemble_df.to_csv('submission.csv', index=False)

## Best Score: 0.82528