# How to design, implement, test and deploy an image classifier


## An introduction

In this article, I explore how to build a complete image classifier and make it available to users at large through two interfaces: a website and a Telegram bot.  
This is a well-known subject, and there are several excellent articles about it on the Internet:
1. TensorFlow and Keras
   - https://www.tensorflow.org/tutorials/keras/classification
   - https://keras.io/examples/vision/image_classification_from_scratch/
   - https://keras.io/examples/vision/bit/
2. PyTorch
   - https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
   - https://github.com/cfotache/pytorch_imageclassifier
   - https://towardsdatascience.com/how-to-train-an-image-classifier-in-pytorch-and-use-it-to-perform-basic-inference-on-single-images-99465a1e9bf5
3. HuggingFace
   - https://huggingface.co/google/vit-base-patch16-224
   - https://huggingface.co/microsoft/resnet-50
   - https://huggingface.co/facebook/deit-small-distilled-patch16-224

Why write another one? **Just for learning.**

## What is a neural network?

A neural network is a function $y = f(x, w)$. In this case, $x$ is the image, $y$ is the image class encoded as a vector, and $w$ holds the weights of the neural network. Each of $x$, $y$, and $w$ is an array of numbers, and each has its own shape.

1. We get the images $x$ from a digital camera.
2. Suppose we have $n$ image classes (for example, 3 classes: 'apple', 'orange', and 'banana'). One-hot encoding represents each class with a vector of length $n$ whose elements are all 0 except for a single 1. For example, apple is $[1, 0, 0]^{t}$, orange is $[0, 1, 0]^{t}$, and banana is $[0, 0, 1]^{t}$. See https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/
3. The function $f$ is a composition of nested functions, called layers.
4. We obtain the weights $w$ by minimizing a loss function evaluated on a set of $M$ examples $(x_{k}, y_{k})$, $1 \leq k \leq M$.
5. We can use well-known and tested frameworks (Keras, TensorFlow, PyTorch) for building and training a neural network.
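To make the points above concrete, here is a minimal, framework-free sketch of one-hot encoding, of $f$ as two nested layers, and of a loss evaluated on one example. Everything in it is an assumption made for illustration: the 4-pixel "image" `x`, the hand-picked weights in `w`, the layer sizes, and the choice of cross-entropy as the loss. A real classifier would use one of the frameworks mentioned above.

```python
import math

CLASSES = ["apple", "orange", "banana"]  # the three example classes from the text


def one_hot(label, classes=CLASSES):
    """Return a list with a 1 at the position of `label` and 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]


def linear(x, weights, bias):
    """One layer: a weighted sum of the inputs for each output unit, plus a bias."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn arbitrary scores into positive numbers that sum to 1."""
    top = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def f(x, w):
    """y = f(x, w) as nested functions: softmax(linear(relu(linear(x))))."""
    hidden = relu(linear(x, w["W1"], w["b1"]))
    return softmax(linear(hidden, w["W2"], w["b2"]))


def cross_entropy(y_true, y_pred):
    """Loss for one example: minus the log-probability given to the true class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t)


# Hypothetical 4-pixel "image" and hand-picked weights (not trained).
x = [0.2, 0.4, 0.6, 0.8]
w = {
    "W1": [[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]],
    "b2": [0.0, 0.0, 0.0],
}

y_true = one_hot("orange")  # [0, 1, 0]
y_pred = f(x, w)            # three probabilities, one per class
loss = cross_entropy(y_true, y_pred)
print(y_true, y_pred, loss)
```

Training means searching for the values in `w` that make this loss small on average over all $M$ examples.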


### Software

We need to obtain a neural network capable of classifying images. In other words, we need software with four functions (build_model, train, test, and predict) and some data (the weights or parameters needed for predict and test, or a set of examples to be used as inputs for training). 
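As a rough sketch of that software, the block below implements the four functions for the smallest possible network, a single softmax layer, in plain Python. It is an illustration under stated assumptions, not the implementation used later: the `Classifier` class, the toy 2-pixel "images", the learning rate, and the number of epochs are all hypothetical. Here `build_model` is a function, while `train`, `test`, and `predict` are methods of the object that holds the weights.

```python
import math
import random


class Classifier:
    """A single-layer softmax classifier; `weights` and `biases` are its data."""

    def __init__(self, weights, biases):
        self.weights = weights  # one row of input weights per class
        self.biases = biases    # one bias per class

    def _probabilities(self, x):
        scores = [sum(w * v for w, v in zip(row, x)) + b
                  for row, b in zip(self.weights, self.biases)]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        """Return the index of the most probable class for one input."""
        probs = self._probabilities(x)
        return probs.index(max(probs))

    def train(self, examples, epochs=200, learning_rate=0.5):
        """Stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, label in examples:
                probs = self._probabilities(x)
                for k, p in enumerate(probs):
                    # Gradient of the loss with respect to the score of class k.
                    grad = p - (1.0 if k == label else 0.0)
                    self.biases[k] -= learning_rate * grad
                    for j, v in enumerate(x):
                        self.weights[k][j] -= learning_rate * grad * v

    def test(self, examples):
        """Return the accuracy: the fraction of examples predicted correctly."""
        hits = sum(1 for x, label in examples if self.predict(x) == label)
        return hits / len(examples)


def build_model(n_inputs, n_classes, seed=0):
    """Create an untrained classifier with small random weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
               for _ in range(n_classes)]
    return Classifier(weights, [0.0] * n_classes)


# Toy data: class 0 is "bright left pixel", class 1 is "bright right pixel".
train_set = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
test_set = [([0.8, 0.2], 0), ([0.2, 0.8], 1)]

model = build_model(n_inputs=2, n_classes=2)
model.train(train_set)
accuracy = model.test(test_set)
print("accuracy:", accuracy, "prediction:", model.predict([0.7, 0.3]))
```

The split matters for the options discussed next: `predict` and `test` only need the weights, while `train` also needs a set of labeled examples.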

There are four options for obtaining a neural network capable of classifying images. They differ in how the weights of the neural network are obtained.

1. Use a pre-trained neural network as is (least expensive: no dataset building, no training).
2. Use a ready-to-train neural network with a given dataset (no dataset building, not-so-hard training).
3. Adapt a pre-trained neural network ("transfer learning" or "fine-tuning": not-so-hard dataset building, not-so-hard training).
4. Build a neural network from scratch (most expensive: hard dataset building, hard training).

Explanations:
1. If we use a pre-trained neural network, we save the effort of both the training task and the dataset-building task. The real savings come from not having to build a dataset, that is, from not having to put together a set of examples good enough for training the neural network. Building a dataset requires intensive human work that is expensive and hard to automate.
2. Using a ready-to-train neural network with a given dataset is what the framework demos and tutorials do.
3. Adapting a pre-trained neural network requires only a small number of examples.
4. Building a neural network from scratch is the most expensive option, because both the dataset building and the training are hard.

### Data
Building a dataset is difficult to automate, and difficult for one person to carry out because of the amount of data required.

Data labeling is the process by which raw images, video, or audio files are identified and annotated individually for machine learning models, which in turn use that data to make predictions that can be applied to the real world. For example, a correctly labeled dataset for a self-driving car can help a model distinguish between a stop sign and a pedestrian, but a mislabeled one can have catastrophic consequences.

Manual labeling requires the highest level of human intervention when building out a dataset. In this approach, humans manually annotate objects in each image or video to build a training dataset for a machine learning model. Though time-consuming and costly, manual data labeling does have advantages for certain types of projects, for example when expert judgment or careful annotation is needed.

Teams looking to build their models entirely onsite can rely on everyone from data scientists and engineers to ML engineers and even interns for simple use cases to label the thousands of images needed to create an effective training dataset. Utilizing their own teams is advantageous when an expert opinion is necessary. Tech enterprises such as Tesla will often use their own in-house teams to build out their datasets. 

In-house operations teams can oversee every step in the data labeling process from start to finish. When building out datasets that involve careful annotation, in-house teams use their expertise to create more accurate datasets. Building datasets in-house has the advantage of letting the data reside with the experts who know it inside and out and understand each use case for their dataset. **In many circumstances, datasets need to be constantly updated to match the ever-changing landscape of real-world scenarios**. Keeping the data in-house lets teams update their datasets quickly and easily. In the example of self-driving cars, the vehicles on the road are constantly changing, so the images in the dataset should also be updated frequently to avoid data drift and other associated issues. In addition, keeping the data in-house ensures that proprietary information is kept close to the source, reducing the risk of leaks and breaches.

While keeping datasets in-house limits any hiccups that outsourcing may cause, it also takes up valuable resources within the company. The most time-consuming aspect of building out a machine learning model is data labeling. Utilizing in-house data scientists and ML engineers to label hundreds of thousands of images cuts into valuable time that could be spent on more pressing company needs. Not to mention, it’s hideously expensive. Engineers are some of the highest-paid employees at tech companies, meaning that the process of labeling data is costly and prohibitive for smaller outfits. In-house data labeling simply isn’t possible for smaller startups with limited resources.

Using in-house resources isn't the only option for manual data labeling. Some companies opt for a hybrid or crowdsourced approach. Choosing one of these methods depends entirely on the business's needs, and there are many reasons to choose one avenue over another.


When crowdsourcing, companies use freelancers to complete the data labeling process through programs like Amazon Mechanical Turk. Labeling is conducted on a small scale by a large set of labelers, reducing the workload individually and company-wide. This is a good option for outfits that do not have the resources to implement in-house operations. 



Crowdsourcing has both its benefits and limitations. One of the main draws for companies to crowdsource data labeling is cost. Utilizing inexpensive freelancers is much less of a financial burden than turning to ML engineers. In addition, it takes much less time than relying on a small cohort of employees to build out a dataset. Crowdsourcing data labeling appeals to smaller companies looking for an efficient way to build their machine learning models, but it has its drawbacks. 


Relying on a crowdsourced team allows large amounts of data to be labeled quickly and cheaply, but accuracy is always a concern. When fragments of datasets are annotated from hundreds or even thousands of sources, the method for doing so varies widely between freelancers, meaning that inconsistencies in the datasets are inevitable. For example, if a company is looking to label cars and trucks accurately, one person might consider an SUV a truck while another might consider it a car. Inconsistent labels can affect the overall accuracy and performance of datasets. Relying on others also makes it challenging to manage workflows and conduct quality assurance checks. 
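One common way to deal with such inconsistencies is to collect several labels per image and keep the majority vote, sending low-agreement images back for review. The sketch below illustrates that idea; it is a technique I am adding for illustration, not something prescribed above, and the file names, the three-annotator setup, and the `min_agreement` threshold of 0.7 are all hypothetical.

```python
from collections import Counter


def aggregate_labels(votes):
    """Return (majority_label, agreement) for one image's list of crowd labels."""
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    return label, top_count / len(votes)


def consolidate(annotations, min_agreement=0.7):
    """Accept majority labels with enough agreement; queue the rest for review."""
    accepted, needs_review = {}, []
    for image_id, votes in annotations.items():
        label, agreement = aggregate_labels(votes)
        if agreement >= min_agreement:
            accepted[image_id] = label
        else:
            needs_review.append(image_id)
    return accepted, needs_review


# Hypothetical labels from three freelancers per image.
annotations = {
    "img_001.jpg": ["car", "car", "car"],
    "img_002.jpg": ["truck", "car", "truck"],  # an SUV: the annotators disagree
    "img_003.jpg": ["truck", "truck", "truck"],
}

accepted, needs_review = consolidate(annotations)
print(accepted)      # {'img_001.jpg': 'car', 'img_003.jpg': 'truck'}
print(needs_review)  # ['img_002.jpg']
```

Majority voting does not remove the underlying ambiguity (is an SUV a car or a truck?), but it makes the disagreement visible so that the labeling guidelines can be fixed.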

For those looking for a third option, outsourcing data labeling is a common route that companies take. In this instance, outside teams are hired specifically to label data manually. They’re often trained by QA specialists and devote their full attention to labeling.

Outsourcing is a common practice for companies looking to save time and money, as relying on outside teams to assist in building datasets is far cheaper than using in-house ML engineers. Using outsourced teams is advantageous for projects with large volumes of data that need to be completed in short periods of time. Outsourcing is an optimal choice for temporary projects that don’t need consistent updating. 


Outsourced data labeling is often sent to teams overseas, so ML engineers have limited control over the workflow. A centralized team is devoted to your project, and since fewer people are generally working on it, outsourcing is slower than crowdsourcing. That said, outsourcing tends to yield more accurate datasets than crowdsourcing, which is often the deciding factor when choosing this route.


Aside from manual data labeling, automatic labeling is also an option for different types of projects and is a more viable choice for many companies. While there is a lot of variability in the various forms of automated labeling, it generally involves either an AI system labeling raw data for you or AI being implemented within the annotation UI itself to speed up manual processes (like converting a bounding box to a segmentation). In either case, trained professionals are used to review the data for accuracy and quality.

Data that is correctly labeled is then fed through the system, creating a data pipeline of sorts. Though data labeling cannot be completely automated, as the human touch is often needed for highly complex projects and to validate the AI’s performance, some tools and strategies can significantly streamline and speed up the process. 

Model-assisted labeling (MAL) generally involves labeling a small initial dataset and training an AI system in parallel solely for the purpose of labeling, which then uses this information to predict annotations for unlabeled data. Alternatively, a pre-existing production model is used within the labeling loop to make predictions for you. Then, a human must generally audit the pre-labeled data and correct any errors that could affect the dataset (while feeding the corrected labels back to the model). Some solutions allow you to complete this process within the UI itself, but others only support the uploading of pre-labeled data (i.e., the model makes predictions using your existing technology stack, and you upload those pre-labels to the solution).
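The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_model` and `toy_reviewer` are hypothetical stand-ins for a real labeling model and a human auditor, and the file names and the ground-truth table are made up.

```python
def model_assisted_labeling(items, model_predict, human_review):
    """Pre-label every item with the model, then let a human audit each label.

    Returns the final labels and the list of corrections, which can be fed
    back to the model as additional training examples.
    """
    labeled, corrections = [], []
    for item in items:
        pre_label = model_predict(item)
        final_label = human_review(item, pre_label)
        labeled.append((item, final_label))
        if final_label != pre_label:
            corrections.append((item, final_label))
    return labeled, corrections


# Hypothetical ground truth, known only to the human reviewer.
TRUTH = {
    "apple_01.jpg": "apple",
    "orange_01.jpg": "orange",
    "banana_01.jpg": "banana",
    "banana_02.jpg": "banana",
}


def toy_model(item):
    """Stand-in model that has never seen a banana and calls it an orange."""
    return "apple" if item.startswith("apple") else "orange"


def toy_reviewer(item, pre_label):
    """Stand-in human: approves correct pre-labels and fixes wrong ones."""
    return TRUTH[item]


labeled, corrections = model_assisted_labeling(list(TRUTH), toy_model, toy_reviewer)
correction_rate = len(corrections) / len(labeled)
print(corrections)      # [('banana_01.jpg', 'banana'), ('banana_02.jpg', 'banana')]
print(correction_rate)  # 0.5
```

Tracking `correction_rate` is useful in practice: if most pre-labels need fixing, approving them is no longer faster than labeling by hand.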

Using this method to build out training datasets for your computer vision models can, in theory, be a highly effective way to get a lot of labels quickly – as approving pre-labels is generally faster than manual annotation. In addition, this method provides early indicators of model weaknesses, mainly when working with existing production models, giving you the chance to make corrections earlier in the process. It also cuts down on the need for project managers to oversee crowdsourced or outsourced labeling, a major bottleneck in manual labeling. 

On the flip side, MAL has its drawbacks. While much more automated than manual labeling, it still requires a human-in-the-loop (HITL) element to oversee the labeling process, precisely because no model is perfect. Without a human to catch specific errors, automated models can make mistakes that a person would easily avoid. MAL is also only as good as your pre-existing models or the model you are training, so it is paramount that both your model and datasets are as accurate as possible before or in the early stages of automated labeling. Using time and resources to fix each error is costly, but it is unavoidable without a perfect way to automate machine learning algorithms. Practitioners often report that they end up spending more time fixing errors in their pre-labels than they would have spent labeling manually from the start.

Another form of automatic data labeling that some companies choose to implement is an AI-assisted annotation system. In this case, AI-assisted software helps the labeler perform manual tasks more efficiently, for example by drawing an outline from only a small set of points or by making predictions based on previous experience.

AI-assisted labeling speeds up the process of building datasets with human oversight, meaning that more labels can be completed in a shorter period of time compared to purely manual labeling. In the medical field, for example, specialists often use AI-assisted annotation to more quickly build out ML models trained to identify diseases in a group of patients. Once sufficient labels have been created, the AI software can help determine which objects within a specific image or video frame should be annotated.

With AI-assisted labeling, teams can annotate data and build their models quickly and more efficiently than manually. However, it still generally requires a decent amount of human involvement for each piece of data, and the labels still need to be reviewed after the fact by a QA team or other auditing group.




### Hardware

We also have to consider the hardware we use. For example, we can train our neural network on a computer with a fast GPU and lots of RAM, and use a small computer "at the edge" for prediction. However, attempting to train a neural network on an IoT device makes no sense. So, we have the following options:

#### Dataset building task
1. Local machine: how much storage is available?
2. Cloud: how much does it cost?

#### Train and test tasks
1. Local machine: CPU or GPU? How much RAM? What kind of data storage?
2. Cloud: how much does it cost?

#### Predict task
1. Local machine
2. Web server
3. Cloud
4. Edge device
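
A back-of-the-envelope calculation helps answer the storage and RAM questions above. The sketch below is an estimate under stated assumptions, not a measurement: it assumes a small fully connected network with layer sizes 784, 128, and 10, weights stored as 32-bit floats (4 bytes each), and a dataset of 70,000 uncompressed grayscale $28 \times 28$ images at 1 byte per pixel. It ignores the extra memory that training needs for gradients, optimizer state, and activations.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def weights_size_bytes(n_parameters, bytes_per_parameter=4):
    """Memory needed to hold the weights (4 bytes each for 32-bit floats)."""
    return n_parameters * bytes_per_parameter


def dataset_size_bytes(n_images, height, width, channels=1, bytes_per_pixel=1):
    """Storage for uncompressed images (1 byte per pixel for 8-bit grayscale)."""
    return n_images * height * width * channels * bytes_per_pixel


params = dense_parameter_count([28 * 28, 128, 10])
model_bytes = weights_size_bytes(params)
data_bytes = dataset_size_bytes(70_000, 28, 28)

print(params)       # 101770
print(model_bytes)  # 407080
print(data_bytes)   # 54880000
print(f"model: {model_bytes / 2**20:.2f} MiB, dataset: {data_bytes / 2**20:.2f} MiB")
```

Under these assumptions both the model and the dataset fit comfortably on a laptop. The same arithmetic with millions of parameters and high-resolution color images is what pushes training toward a GPU machine or the cloud.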


## Some examples


Let us consider some examples:

1. https://www.tensorflow.org/tutorials/keras/classification
   - TensorFlow with Keras
   - Prebuilt, preprocessed dataset (Fashion MNIST, 70,000 grayscale low-resolution $28 \times 28$ images)
   - Very simple training: no check for underfitting or overfitting, no model saving or loading, no callbacks
   - No deployment
2. https://www.tensorflow.org/tutorials/keras/save_and_load
   - TensorFlow with Keras
   - Prebuilt, preprocessed dataset (MNIST, 70,000 grayscale low-resolution $28 \times 28$ images)
   - Training with model saving and loading, but no check for underfitting or overfitting
   - No deployment
3. https://www.tensorflow.org/tutorials/load_data/images
   - TensorFlow with Keras
   - Prebuilt dataset, downloaded from a well-known URL, decompressed, and loaded from a directory
   - Some data processing
  
