# Severstal Steel Defect Detection

## Dataset Description

Steel is one of the most important building materials of modern times.<br>
Steel buildings are resistant to natural and man-made wear, which has made the material ubiquitous around the world.<br>
To help make production of steel more efficient, this competition will help identify defects.<br>
Severstal is leading the charge in efficient steel mining and production.<br>
They believe the future of metallurgy requires development across the economic, ecological, and social aspects of the industry, and they take corporate responsibility seriously.<br>
The company recently created the country’s largest industrial data lake, with petabytes of data that were previously discarded.<br>
Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.<br>

The production process of flat sheet steel is especially delicate.<br>
From heating and rolling to drying and cutting, several machines touch flat steel by the time it’s ready to ship.<br>
Today, Severstal uses images from high-frequency cameras to power a defect detection algorithm.<br>

In this competition, you’ll help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet.<br>
If successful, you’ll help keep manufacturing standards for steel high and enable Severstal to continue their innovation, leading to a stronger, more efficient world all around us.
    

## Import Libraries

In [2]:
# Basic libraries for Python data analysis
import os
import gc    # garbage collection
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm    # show progress bar
from sklearn.model_selection import train_test_split

# OpenCV module
import cv2

# Keras modules
import keras
from keras import backend as K
from keras import layers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model, load_model
from keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
from keras.optimizers import Adam
from keras.callbacks import Callback, ModelCheckpoint

Using TensorFlow backend.


## Preprocessing Phase

In [3]:
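train_df = pd.read_csv('./train.csv')
# 'ImageId_ClassId' has the form '<image file>_<class id>'; split it into two columns.
train_df['ImageId'] = train_df['ImageId_ClassId'].apply(lambda x: x.split('_')[0])
train_df['ClassId'] = train_df['ImageId_ClassId'].apply(lambda x: x.split('_')[1])
# A row has a mask only when its run-length encoding is present.
train_df['hasMask'] = ~train_df['EncodedPixels'].isna()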

print(train_df.shape)
train_df.head()

(50272, 5)


|   | ImageId_ClassId | EncodedPixels | ImageId | ClassId | hasMask |
|---|---|---|---|---|---|
| 0 | 0002cc93b.jpg_1 | 29102 12 29346 24 29602 24 29858 24 30114 24 3... | 0002cc93b.jpg | 1 | True |
| 1 | 0002cc93b.jpg_2 | NaN | 0002cc93b.jpg | 2 | False |
| 2 | 0002cc93b.jpg_3 | NaN | 0002cc93b.jpg | 3 | False |
| 3 | 0002cc93b.jpg_4 | NaN | 0002cc93b.jpg | 4 | False |
| 4 | 00031f466.jpg_1 | NaN | 00031f466.jpg | 1 | False |
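
The `EncodedPixels` column stores each mask as a run-length encoding: space-separated pairs of a start pixel and a run length. A minimal sketch of decoding such a string into a binary mask is shown here. It assumes 1-based start positions, pixels numbered top to bottom and then left to right, and 256 × 1600 images. `rle_to_mask` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np

def rle_to_mask(rle, height=256, width=1600):
    """Decode a run-length string into a (height, width) binary mask.

    Assumes 1-based start positions and column-major pixel order.
    """
    mask = np.zeros(height * width, dtype=np.uint8)
    # Missing encodings (NaN) or empty strings yield an all-zero mask.
    if isinstance(rle, str) and rle.strip():
        values = np.array(rle.split(), dtype=int)
        starts, lengths = values[0::2] - 1, values[1::2]
        for start, length in zip(starts, lengths):
            mask[start:start + length] = 1
    # Column-major reshape: consecutive pixels run down a column first.
    return mask.reshape((height, width), order='F')

# Tiny example on a hypothetical 3 x 4 image: pixels 2-3 and 7 are set.
demo = rle_to_mask('2 2 7 1', height=3, width=4)
print(demo)
```

Reshaping with `order='F'` is what implements the assumed top-to-bottom ordering. If the data were stored row by row, the default `order='C'` would be the one to use.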


In [4]:
# Count how many of the four defect classes have a mask in each image.
mask_count_df = train_df.groupby('ImageId')['hasMask'].sum().reset_index()
mask_count_df.sort_values('hasMask', ascending=False, inplace=True)
print(mask_count_df.shape)
mask_count_df.head()

(12568, 2)


|   | ImageId | hasMask |
|---|---|---|
| 10803 | db4867ee8.jpg | 3.0 |
| 11776 | ef24da2ba.jpg | 3.0 |
| 6284 | 7f30b9c64.jpg | 2.0 |
| 9421 | bf0c81db6.jpg | 2.0 |
| 9615 | c314f43f3.jpg | 2.0 |
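
To make the per-image mask count concrete, here is a small sketch on made-up data. The image names and encodings are hypothetical. Each image contributes one row per defect class, and summing the boolean `hasMask` flag within an image gives the number of classes that have a mask.

```python
import pandas as pd

# Hypothetical two-image training table: four class rows per image.
toy = pd.DataFrame({
    'ImageId_ClassId': ['a.jpg_1', 'a.jpg_2', 'a.jpg_3', 'a.jpg_4',
                        'b.jpg_1', 'b.jpg_2', 'b.jpg_3', 'b.jpg_4'],
    'EncodedPixels': ['1 3', None, '10 2', None, None, None, None, None],
})
toy['ImageId'] = toy['ImageId_ClassId'].str.split('_').str[0]
toy['hasMask'] = toy['EncodedPixels'].notna()

# Summing booleans counts the True values within each group.
counts = toy.groupby('ImageId')['hasMask'].sum().reset_index()
print(counts)
```

Here `a.jpg` ends up with a count of 2 and `b.jpg` with 0, so `b.jpg` would be treated as an image without defects.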


In [6]:
sub_df = pd.read_csv('./sample_submission.csv')
sub_df['ImageId'] = sub_df['ImageId_ClassId'].apply(lambda x: x.split('_')[0])
# One row per unique test image (the submission file has one row per image and class).
test_imgs = pd.DataFrame(sub_df['ImageId'].unique(), columns=['ImageId'])
test_imgs.head()

|   | ImageId |
|---|---|
| 0 | 004f40c73.jpg |
| 1 | 006f39c41.jpg |
| 2 | 00b7fb703.jpg |
| 3 | 00bbcd9af.jpg |
| 4 | 0108ce457.jpg |


In [7]:
# Keep only the training images that have at least one defect mask.
non_missing_train_idx = mask_count_df[mask_count_df['hasMask'] > 0]
non_missing_train_idx.head()

|   | ImageId | hasMask |
|---|---|---|
| 10803 | db4867ee8.jpg | 3.0 |
| 11776 | ef24da2ba.jpg | 3.0 |
| 6284 | 7f30b9c64.jpg | 2.0 |
| 9421 | bf0c81db6.jpg | 2.0 |
| 9615 | c314f43f3.jpg | 2.0 |


## Remove test images without defects

In [8]:
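def load_img(code, base, resize=True):
    """Read image `code` from directory `base` as RGB, optionally resized to 256x256."""
    path = f'{base}/{code}'
    img = cv2.imread(path)
    if img is None:
        # cv2.imread returns None instead of raising when the file cannot be read.
        raise FileNotFoundError(f'Could not read image: {path}')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    if resize:
        img = cv2.resize(img, (256, 256))

    return img

def validate_path(path):
    """Create the directory `path` if it does not already exist."""
    os.makedirs(path, exist_ok=True)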

In [9]:
BATCH_SIZE = 64

def create_test_gen():
    # Stream test images in a fixed order (shuffle=False) so predictions line up with test_imgs.
    return ImageDataGenerator(rescale=1/255.).flow_from_dataframe(
        test_imgs,
        directory='./test_images',
        x_col='ImageId',
        class_mode=None,
        target_size=(256, 256),
        batch_size=BATCH_SIZE,
        shuffle=False
    )

test_gen = create_test_gen()

Found 1769 images.
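
The number of batches this generator yields, which is the value `len(test_gen)` passes as `steps` in the prediction cell, is the image count divided by the batch size, rounded up. A quick check of that arithmetic, using the 1769 images reported above and the notebook's `BATCH_SIZE` of 64:

```python
import math

n_images = 1769   # from the "Found 1769 images." output
batch_size = 64   # BATCH_SIZE in the notebook

# The last batch is partial, so round up rather than down.
steps = math.ceil(n_images / batch_size)
last_batch = n_images - (steps - 1) * batch_size
print(steps, last_batch)
```

This gives 28 batches, the last of which holds 41 images, so every test image is predicted exactly once.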


In [13]:
# Load a previously trained model; './model.h5' must exist in the working directory.
remove_model = load_model('./model.h5')
remove_model.summary()

OSError: Unable to open file (unable to open file: name = './model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

In [11]:
test_missing_pred = remove_model.predict_generator(
    test_gen,
    steps=len(test_gen),
    verbose=1
)

test_imgs['allMissing'] = test_missing_pred
test_imgs.head()

NameError: name 'remove_model' is not defined
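
Both cells above failed in this run because `./model.h5` was not found, so no predictions were produced. Once predictions are available, the step named in this section's heading could be sketched roughly as follows. The probabilities are made up, and the sketch assumes `allMissing` is the predicted probability that an image has no defect masks. The 0.5 threshold and the name `filtered_test_imgs` are assumptions for illustration, not values from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for test_imgs and a model output of shape (N, 1).
test_imgs = pd.DataFrame({'ImageId': ['a.jpg', 'b.jpg', 'c.jpg']})
test_missing_pred = np.array([[0.92], [0.10], [0.55]])

# Flatten to 1-D so each image receives a single probability.
test_imgs['allMissing'] = test_missing_pred.ravel()

# Keep only images NOT predicted to be defect-free (assumed threshold).
THRESHOLD = 0.5
filtered_test_imgs = test_imgs[test_imgs['allMissing'] < THRESHOLD]
print(filtered_test_imgs)
```

Only `b.jpg` survives the filter in this toy example. A suitable threshold would depend on how the classifier trades off missed defects against wasted segmentation work.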