# Detecting Steel Defects - Main Report Notebook 
This notebook details the entire CRISP-DM process for this project. 

## 1. Business Context

Terms like ‘AI’, ‘Machine Learning’ and ‘Neural Networks’ usually conjure images of IBM’s Watson defeating some of humankind’s most knowledgeable representatives at Jeopardy, or Google’s AlphaGo beating the world’s top-ranked player at Go, a game widely respected as one of the most complex strategy games of all time. What probably doesn’t spring to mind is the deployment of these technologies in an industry like steel manufacturing. However, just as computers have become ubiquitous in businesses over the last 70 years, organizations of all kinds are recognizing the power of machine learning solutions. In a shift sometimes called ‘Industry 4.0’, manufacturers are adopting AI, deep learning, and computer vision to improve product quality, reduce costs, and increase efficiency. AI has made its way into many stages of the manufacturing process, from supply chain to inventory management; the focus of this project, however, is computer vision for defect detection.

The dataset for this project comes from a Kaggle competition hosted by the Russian steel producer Severstal, which is looking to use machine learning to ‘improve automation, increase efficiency, and maintain quality’ throughout its production process. Severstal is part of the global movement of manufacturers towards greater use of AI, and it recognizes the value that applied machine learning can unlock. The specific application targeted by this competition is detecting defects in images of sheet steel taken from the production process. Computer-vision defect detection could be integrated into the manufacturing pipeline to help reduce costs and material waste.

Neural networks are a natural choice for computer vision tasks, and they are the tool used in this project: even simple artificial neural networks can perform well at image classification because they can learn directly from unstructured data such as raw pixel values. The specific business use cases for the different network architectures are laid out below, under Data Understanding (EDA).


## 2. Data Preparation and Environment Setup



The number of images in the dataset and their relatively high resolution made this project computationally intensive, so all steps were performed in the cloud on AWS. The images and CSVs were downloaded locally, then uploaded to an Amazon S3 bucket. In Amazon SageMaker, an ml.m5.2xlarge notebook instance was created with a 100 GB volume. Several separate notebooks were created inside this instance, all using the built-in conda_tensorflow2_p36 environment.

Before the neural networks can interpret the training images, the images must undergo several transformations. The first is to convert the .jpg files into raw pixel arrays. The dataset was fairly clean: all of the images share the same 256 x 1600 (height x width) format and have a single color channel, meaning they are grayscale. To reduce compute time, the images were resized to 256 x 256 on import. The pixel values were then scaled from the range 0 to 255 to the range 0 to 1. For the artificial neural networks, the arrays were also flattened to one dimension; in the convolutional networks notebook, they are instead reshaped to 256 x 256 x 1.
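The preprocessing steps above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: it starts from an image that has already been decoded to a 256 x 1600 `uint8` array (for example with Pillow's `Image.open(path).convert("L")` followed by `np.asarray`), and it uses a nearest-neighbour resize because the notebook does not say which interpolation was used. The function names are hypothetical.

```python
import numpy as np

TARGET = 256  # side length the images were resized to


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D grayscale array (interpolation is an assumption)."""
    in_h, in_w = img.shape
    # Map each output row/column back to a source row/column index.
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return img[rows[:, None], cols[None, :]]


def preprocess(img: np.ndarray, flatten: bool = True) -> np.ndarray:
    """Resize to 256 x 256, scale 0-255 to 0-1, then flatten or add a channel axis."""
    small = resize_nearest(img, TARGET, TARGET)
    scaled = small.astype(np.float32) / 255.0
    if flatten:
        return scaled.reshape(-1)             # 1-D vector of 65,536 values for the dense networks
    return scaled.reshape(TARGET, TARGET, 1)  # 256 x 256 x 1 for the convolutional networks


# Usage with a synthetic image standing in for a decoded .jpg (not real data).
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(256, 1600), dtype=np.uint8)
flat = preprocess(fake)
cube = preprocess(fake, flatten=False)
print(flat.shape, cube.shape)  # (65536,) (256, 256, 1)
```

Because both outputs come from the same scaled array, the flattened vector can be reshaped back to `(256, 256, 1)` without any loss, which is what lets the dense and convolutional notebooks share one preprocessing path.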

## 3. Data Understanding (EDA)

The original dataset includes 18,074 images: 12,568 in the training set and 5,506 in the test set. All the images are grayscale, with a resolution of 1600 pixels wide by 256 pixels tall. Alongside the images is a CSV listing every training image that contains a defect: each row is keyed by the .jpg filename and gives the class of defect in that image, along with data describing which pixels make up that defect. An image listed more than once in this CSV contains more than one class of defect, and a training image not listed at all contains no defect. Because the dataset comes from a Kaggle competition, ground truth labels are not available for the test set, so from here on the 12,568 training images are treated as the entire dataset.
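In the competition's train.csv, the pixel data are stored as run-length encoded strings (an `EncodedPixels` column): space-separated pairs of a 1-indexed start pixel and a run length, with pixels numbered top to bottom, then left to right. Assuming that format, a minimal decoder to a binary mask might look like the following; `rle_to_mask` is a hypothetical helper name, and the toy string is not real data.

```python
import numpy as np


def rle_to_mask(rle: str, height: int = 256, width: int = 1600) -> np.ndarray:
    """Decode an assumed 'start length start length ...' string into a 0/1 mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    # Consecutive pairs are (1-indexed start pixel, run length).
    for start, length in zip(values[0::2], values[1::2]):
        mask[start - 1:start - 1 + length] = 1
    # Pixels are numbered down each column first, so rebuild in column-major order.
    return mask.reshape((height, width), order="F")


# Toy example on a 4 x 3 image: pixels 2-3 (column 0) and pixel 6 (column 1).
m = rle_to_mask("2 2 6 1", height=4, width=3)
print(m)
```

Getting the ordering wrong (row-major instead of column-major) still produces a mask of the right shape, so a quick visual check against an image with an obvious defect is worthwhile.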

Of the 12,568 training images, 5,902 (47%) show no defects, while 6,666 (53%) show at least one class of defect. Although some images contain multiple classes of defect, 97% of the images contain either no defect or exactly one class of defect.
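Figures like these come from grouping the CSV rows by image. The following is a minimal standard-library sketch, assuming the CSV has `ImageId` and `ClassId` columns as in the competition's train.csv. The function name and the toy rows are hypothetical.

```python
import csv
import io
from collections import Counter


def classes_per_image(csv_text: str, all_images: list) -> Counter:
    """Return a Counter mapping 'number of distinct defect classes' -> number of images."""
    # Every training image starts with no classes; unlisted images stay at zero.
    defects = {name: set() for name in all_images}
    for row in csv.DictReader(io.StringIO(csv_text)):
        defects.setdefault(row["ImageId"], set()).add(int(row["ClassId"]))
    return Counter(len(classes) for classes in defects.values())


# Toy rows and filenames (not real data).
csv_text = "ImageId,ClassId,EncodedPixels\na.jpg,1,1 3\na.jpg,3,10 2\nb.jpg,3,5 5\n"
images = ["a.jpg", "b.jpg", "c.jpg"]
counts = classes_per_image(csv_text, images)
share = (counts[0] + counts[1]) / sum(counts.values())  # fraction with zero or one class
print(counts, share)
```

On the real data, `counts[0]` would be the 5,902 defect-free images, and `share` is the quantity behind the 97% figure.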

There are four classes of defect, labelled in the provided CSV simply as 1 through 4; these class labels should not be confused with the number of defect classes present in each image.

### Binary Classification

With a 53/47 split between images with and without defects, one way to frame this problem is as binary classification. In this application, as steel moves through the production process, pieces with defects are identified, removed from the production line, and dealt with accordingly.

### Multiclass Classification

Since the defect classes are not mutually exclusive, this task is technically a multilabel classification problem, not a multiclass one. However, it will be treated as a multiclass classification problem for three reasons: 97% of the images contain either no defect or a single class of defect, the task already requires significant training time, and there is significant class imbalance among the images with defects.
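The difference between the two framings shows up in the target each image receives. In this minimal sketch with hypothetical helper names, a multilabel target is a multi-hot vector over the four defect classes, whereas a multiclass target is a single integer. Treating 'no defect' as class 0 and dropping multi-defect images are assumptions for illustration; the notebook does not say how those images were handled.

```python
import numpy as np

NUM_DEFECT_CLASSES = 4


def multi_hot(classes: set) -> np.ndarray:
    """Multilabel target: one independent 0/1 flag per defect class 1-4."""
    vec = np.zeros(NUM_DEFECT_CLASSES, dtype=np.float32)
    for c in classes:
        vec[c - 1] = 1.0
    return vec


def single_label(classes: set):
    """Multiclass target: 0 = no defect (assumed), 1-4 = the one defect class present.

    Returns None for images with two or more classes, which a single-label
    model cannot represent; dropping them is an assumption of this sketch.
    """
    if len(classes) > 1:
        return None
    return next(iter(classes), 0)


print(multi_hot({1, 3}), single_label({3}), single_label(set()), single_label({1, 3}))
```

A multi-hot target would typically be paired with sigmoid outputs and binary cross-entropy, while a single label pairs with a softmax output and categorical cross-entropy.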

Multiclass classification could be deployed in production much like binary classification, with steel showing certain classes of defect redirected from the main production line into separate production streams.

## 4. Binary Classification with an Artificial Neural Network

Once the image processing steps were complete and the resulting arrays had been flattened to one dimension, the images were ready for input into the artificial neural networks. The image arrays are not all that is required, however: this is a supervised learning project, so the networks also need ground truth labels to train on. For binary classification, the images were sorted into two classes: '0', meaning no defect is present, or '1', meaning one or more classes of defect are present. These labels were generated from 'train.csv', a CSV file provided with the dataset. Once the labels were organized into a dictionary mapping each .jpg filename to a class, they were one-hot encoded, the format Keras expects when training with a categorical cross-entropy loss.
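The labelling step can be sketched as follows, assuming train.csv has an `ImageId` column as in the competition data. An image counts as defective if it appears in the CSV at all. The helper names and toy rows are hypothetical; in Keras itself, `tf.keras.utils.to_categorical(y, num_classes=2)` performs the same one-hot conversion.

```python
import csv
import io

import numpy as np


def binary_labels(csv_text: str, all_images: list) -> dict:
    """Map each .jpg filename to 1 if it appears in train.csv (defect present), else 0."""
    defective = {row["ImageId"] for row in csv.DictReader(io.StringIO(csv_text))}
    return {name: int(name in defective) for name in all_images}


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """One-hot encode integer labels, e.g. 1 -> [0, 1] for two classes."""
    return np.eye(num_classes, dtype=np.float32)[labels]


# Toy rows and filenames (not real data).
csv_text = "ImageId,ClassId,EncodedPixels\na.jpg,1,1 3\na.jpg,3,10 2\nb.jpg,3,5 5\n"
images = ["a.jpg", "b.jpg", "c.jpg"]
labels = binary_labels(csv_text, images)
y = one_hot(np.array([labels[n] for n in images]), num_classes=2)
print(labels)  # {'a.jpg': 1, 'b.jpg': 1, 'c.jpg': 0}
```

Because the label depends only on whether a filename appears in the CSV, an image with several defect classes still receives a single '1', which matches the binary framing.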

The next step was to perform two train/test splits. The first split divided the 12,568 images into training and holdout sets of 90% and 10% respectively. The holdout set was set aside until the end of model iteration and training, for final performance evaluation. The training set was then split again into train and validation sets, so that the networks could be evaluated on unseen data after each epoch.
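The two-stage split can be sketched with shuffled indices. This is a minimal sketch: the 10% holdout comes from the text, but the 20% validation fraction, the seed, and the helper name are hypothetical, since the notebook does not state them. `sklearn.model_selection.train_test_split` with its `stratify=` argument is a common alternative that also preserves the 53/47 class ratio in each split.

```python
import numpy as np


def two_stage_split(n: int, holdout_frac: float = 0.10, val_frac: float = 0.20, seed: int = 42):
    """Shuffle indices 0..n-1, carve off a holdout set, then split the rest into train/validation.

    val_frac and seed are assumed values for illustration.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_hold = int(round(n * holdout_frac))
    holdout, rest = idx[:n_hold], idx[n_hold:]
    n_val = int(round(len(rest) * val_frac))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, holdout


train_idx, val_idx, hold_idx = two_stage_split(12568)
print(len(train_idx), len(val_idx), len(hold_idx))  # 9049 2262 1257
```

Splitting indices rather than the image arrays themselves keeps memory use low and makes it easy to apply the same split to images and labels.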

The first network had the simplest architecture possible, and subsequent networks increased in complexity; the aim was to find the minimum network complexity that could still learn the task. The accuracy curves of the first network did trend upwards, but it was clear that the model was struggling to capture the complexity of the problem.

The second network had more hidden layers and more nodes per layer, and it was trained for more epochs. Its accuracy curves were smoother, and its validation accuracy thrashed (fluctuated from epoch to epoch) significantly less.

To reduce the thrashing of the validation accuracy curve further, the third network was identical to the second except that regularization was added to each hidden layer. Surprisingly, this lowered the overall accuracy without reducing the thrashing of the validation accuracy.
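The notebook does not say which kind of regularization was used. If it was an L2 weight penalty on each hidden layer (what Keras adds when `regularizers.l2(λ)` is passed as a layer's `kernel_regularizer`), the training objective becomes the cross-entropy loss plus a penalty on the squared hidden-layer weights, where $\lambda$ is an assumed regularization strength and $W^{(\ell)}$ is the weight matrix of hidden layer $\ell$:

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}}(y, \hat{y}) + \lambda \sum_{\ell \in \text{hidden}} \sum_{i,j} \left( W^{(\ell)}_{ij} \right)^2
$$

Larger values of $\lambda$ push the weights towards zero. If $\lambda$ is large relative to the data loss, the network can underfit, which is one possible explanation for the lower accuracy, though the notebook does not test this.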

The second network was chosen for final evaluation, as it performed best of the three, and it achieved 77.3% accuracy on the holdout set. This is well above the roughly 53% that always predicting the majority 'defect' class would score, which is a strong result for a simple neural network tackling a problem of this difficulty. The following notebook implements convolutional networks in both binary and multiclass classification contexts.
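To put a holdout accuracy like 77.3% in context, it can be compared with a majority-class baseline. This minimal sketch assumes one-hot ground truth and per-class predicted probabilities, such as those returned by Keras's `model.predict` on the holdout arrays. The helper names and toy arrays are hypothetical and are not the project's results.

```python
import numpy as np


def accuracy_from_one_hot(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Fraction of rows where the predicted class (argmax) matches the one-hot truth."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


def majority_baseline(y_true: np.ndarray) -> float:
    """Accuracy of always predicting the most frequent class."""
    counts = y_true.sum(axis=0)
    return float(counts.max() / counts.sum())


# Toy arrays (not real results): 5 samples, 2 classes.
y_true = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [0, 1]])
y_prob = np.array([[0.2, 0.8], [0.3, 0.7], [0.9, 0.1], [0.4, 0.6], [0.1, 0.9]])
print(accuracy_from_one_hot(y_true, y_prob), majority_baseline(y_true))
```

Reporting both numbers side by side makes it clear how much the model adds beyond the class imbalance alone.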


## 5. Multiclass Classification with a Convolutional Neural Network

## 6. Instance Segmentation with a Mask R-CNN Network

## 7. Deployment

## 8. Conclusions