A JupyterLab instance was created on Google Cloud to act as a shared workspace and code hub, and it was linked to GitHub. The data pipeline was laid out as a directory structure, which allowed several people to work on different stages of the pipeline at the same time.
- Using OpenCV (cv2), we captured pictures of the MASK, NO MASK, and BAD MASK classes with our web cameras.
- We saved the images in the respective class folders and labeled them by drawing rectangular bounding boxes around the face and key points around the eyes.
- We merged our peers' mask, no mask, and bad mask images to create a larger dataset for training the model.
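
The merge step in the list above can be sketched roughly as follows. This is a minimal illustration and not the script we actually ran: the function name `merge_datasets`, the per-contributor folder layout, and the file-name prefixing are all assumptions made for the example.

```python
import shutil
from pathlib import Path

# Class folder names follow the three labels used in this project.
CLASSES = ("MASK", "NO_MASK", "BAD_MASK")

def merge_datasets(contributor_dirs, merged_dir):
    """Copy each contributor's images into merged_dir/<class>/.

    File names are prefixed with the contributor folder name so that two
    people who both saved "img0.jpg" do not overwrite each other.
    Returns the number of images copied per class.
    """
    merged_dir = Path(merged_dir)
    counts = {}
    for cls in CLASSES:
        target = merged_dir / cls
        target.mkdir(parents=True, exist_ok=True)
        counts[cls] = 0
        for contributor in map(Path, contributor_dirs):
            source = contributor / cls
            if not source.is_dir():
                continue  # this contributor has no images for this class
            for image in sorted(source.iterdir()):
                if image.is_file():
                    shutil.copy2(image, target / f"{contributor.name}_{image.name}")
                    counts[cls] += 1
    return counts
```

Keeping one subfolder per class also matches the layout that folder-based data loaders expect.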
We started our classification task with PyTorch and transfer learning.
PyTorch makes it easy to load pretrained models, and we chose ResNet18 for our first approach. ResNet-18 is a convolutional neural network that is 18 layers deep. A version pretrained on more than a million images from the ImageNet database is available, and that pretrained network classifies images into 1000 object categories.
Since our classification has only 3 possible outputs ('MASK', 'BAD_MASK', 'NO_MASK'), we froze the pretrained weights of all existing layers and replaced the last fully connected layer with a new, trainable head.
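
    # `model` is the pretrained ResNet18 loaded earlier.
    # Freeze every pretrained parameter so that only the new head is trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer (512 input features in ResNet18)
    # with a small head that outputs log-probabilities for our 3 classes.
    model.fc = nn.Sequential(nn.Linear(512, 10),
                             nn.ReLU(),
                             nn.Dropout(0.2),
                             nn.Linear(10, 3),
                             nn.LogSoftmax(dim=1))

Since the network is pretrained, it expects input in a specific format, so we had to define a transform for that.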

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

In the first line, we instantiate a `transforms.Compose` object, which chains the steps listed inside it.
On line 2, we resize the image to the 224×224 input size that ResNet expects.
On line 3, we convert the image to a tensor, since that is the data type PyTorch requires.
On line 4, we normalize each channel with the mean and standard deviation that are fixed for the pretrained model.
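To make the last two steps concrete, here is a small plain-Python sketch of the arithmetic they perform on a single RGB pixel. It only mimics the math (scale to [0, 1], then subtract the mean and divide by the standard deviation per channel); the helper name `normalize_pixel` is made up for this example and is not part of torchvision.

```python
# Per-channel statistics copied from the transform above.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb_uint8):
    """Mimic ToTensor followed by Normalize for one (R, G, B) pixel."""
    # ToTensor rescales 8-bit values from [0, 255] to [0.0, 1.0].
    scaled = [value / 255.0 for value in rgb_uint8]
    # Normalize then computes (x - mean) / std for each channel.
    return [(s - m) / sd for s, m, sd in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

print(normalize_pixel((255, 255, 255)))  # white pixel
print(normalize_pixel((0, 0, 0)))        # black pixel
```

A white pixel ends up at roughly 2.25, 2.43, and 2.64 in the three channels, and a black pixel at roughly -2.12, -2.04, and -1.80, which is the value range the pretrained weights were trained on.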
After training the model and getting good results, we saved it in ONNX format so that we could load it in OpenCV and make predictions in real time.
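Because the new head ends in `LogSoftmax`, the exported network returns three log-probabilities per image rather than a label. A minimal sketch of the decoding step follows; the class order in `CLASS_NAMES` is an assumption and has to match the order used during training, and `decode_prediction` is a hypothetical helper name.

```python
import math

# Assumed class order; it must match the label order used during training.
CLASS_NAMES = ("MASK", "BAD_MASK", "NO_MASK")

def decode_prediction(log_probs):
    """Turn the three LogSoftmax outputs into (label, probability)."""
    probs = [math.exp(lp) for lp in log_probs]  # exp undoes the log
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Example input built by hand, not taken from a real model run.
example = [math.log(0.7), math.log(0.2), math.log(0.1)]
label, prob = decode_prediction(example)
print(label, round(prob, 2))
```

In a live loop, the returned probability can also serve as a threshold, so that low-confidence frames are not labeled at all.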
We then decided to try another approach based on Fast.AI, a higher-level library built on PyTorch.
To start, we had to put all the data in one folder, with each class in its own subfolder. After that, the library needs only one line to load all the data:

    data = ImageDataBunch.from_folder(PATH, ds_tfms=get_transforms(), size=sz, bs=bs, valid_pct=0.2).normalize(imagenet_stats)

We have to define the path where the data is stored, the required image size, the batch size, the transformations, and the normalization. Although we use a different version of ResNet here, it is also pretrained on ImageNet, so the normalization statistics are conveniently stored in `imagenet_stats`. The call also automatically splits the data into training and validation sets, holding out 20% for validation (`valid_pct=0.2`).
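Conceptually, `valid_pct=0.2` asks for a random 80/20 split of the image files. The sketch below shows the idea in plain Python; it is not the library's own implementation, and `split_train_valid` and the file names are made up for illustration.

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=None):
    """Randomly hold out a fraction of the items for validation."""
    rng = random.Random(seed)      # a fixed seed makes the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * valid_pct)
    return shuffled[cut:], shuffled[:cut]  # (train, valid)

files = [f"img_{i:03d}.jpg" for i in range(100)]  # placeholder file names
train, valid = split_train_valid(files, valid_pct=0.2, seed=42)
print(len(train), len(valid))  # 80 20
```

Fixing the seed matters when two models are compared, because otherwise each run is validated on a different subset.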
One line is enough to create the model. As you can see, we decided to try ResNet34 instead of ResNet18; it has 34 layers instead of 18.

    learn = cnn_learner(data, models.resnet34, metrics=accuracy)

Finding a suitable learning rate also takes only two lines of code:

    learn.lr_find()
    learn.recorder.plot()

This plots the loss against the learning rate, allowing us to choose the best range.
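One common way to read such a curve is to look for the region where the loss falls fastest, well before it starts to blow up. The sketch below implements that heuristic in plain Python. It is only an illustration of the idea: the helper `steepest_descent_lr` is hypothetical, and the numbers are invented for the example rather than taken from our run.

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the learning rate at the end of the segment where the loss
    drops fastest per unit of log10(lr), or None if it never drops."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr

# Made-up curve for illustration only; these are not values from our run.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.10, 1.05, 0.80, 0.45, 0.60, 2.50]
print(steepest_descent_lr(lrs, losses))
```

A range for training is then usually picked around that point and below the learning rate where the loss starts rising again.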
After this, the training stage is again done in just one line:

    learn.fit_one_cycle(4, max_lr=slice(1e-3,1e-1))

The output is nicely formatted, and the score was very good.
Once the training was done, we saved this model too, so that we could integrate the better of the two models into the live detection.
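A note on the `max_lr=slice(1e-3, 1e-1)` argument used for training: when a slice with both ends is passed, fastai v1 spreads the learning rates across the learner's layer groups in evenly spaced multiplicative steps, so earlier layers train more slowly than the head. The sketch below reproduces that spacing in plain Python as an illustration. It assumes three layer groups, which was not verified against our learner, and `even_multiples` is a stand-in name rather than the library function.

```python
def even_multiples(start, stop, n):
    """Return n values spaced evenly on a log scale from start to stop."""
    if n == 1:
        return [stop]  # guard added for this sketch: a single group gets the top rate
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

# Assumption: the learner is split into three layer groups.
print(even_multiples(1e-3, 1e-1, 3))
```

With three groups this yields roughly 0.001, 0.01, and 0.1, so the pretrained early layers are changed only slightly while the new head learns at the highest rate.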
As we wanted the detection functionality to be available on other devices, we also decided to use the pretrained Face-Api.js models.
Face.Api.js

    const video = document.getElementById('video');
    // Fall back to vendor-prefixed versions of getUserMedia on older browsers.
    navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia;

    // All model weights are served from the same static folder.
    const MODEL_URL = "http://127.0.0.1:5000/static/models/";

    // Load every model in parallel, then start the video stream.
    Promise.all([
      faceapi.loadFaceLandmarkModel(MODEL_URL),
      faceapi.loadFaceRecognitionModel(MODEL_URL),
      faceapi.loadTinyFaceDetectorModel(MODEL_URL),
      faceapi.loadFaceLandmarkTinyModel(MODEL_URL),
      faceapi.loadFaceExpressionModel(MODEL_URL),
    ])
      .then(startVideo)
      .catch(err => console.error(err));

These models run in the browser rather than on the server, which improves performance. The web app will be deployed in a Docker container, with the mask detection implemented as a web service called from the JavaScript. The models find the contours of the face and estimate the age, gender, and emotion of the people in the camera view. They work for individuals as well as for groups of people.
The front end has been implemented using Flask, HTML, and JavaScript in a Docker container, and it calls separately hosted Flask web services that perform the machine learning inference. Web sockets allow remote viewing of multiple cameras. The inference was implemented as a web service because this allows quick and easy scalability through a service like "AWS Lambda" or "Google Cloud Functions", and because it leaves scope for integration with other systems and with IoT technology that already has internet connectivity built in, e.g. security and building access.
The front-end functionality was also implemented using OpenCV, allowing it to run on any computer with a webcam.
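What makes this design easy to move to a serverless platform is that each inference request can be handled by a stateless function. The sketch below illustrates the shape of such a function using only the standard library. Everything in it is an assumption made for the example: the event format (a JSON body with a base64 `image` field, loosely modeled on typical serverless HTTP events), the names `handler` and `classify_image`, and the placeholder classifier, which stands in for the real model call.

```python
import base64
import json

def classify_image(image_bytes):
    """Hypothetical stand-in for the real model call; always answers NO_MASK."""
    return {"label": "NO_MASK", "confidence": 0.0}

def handler(event, context=None, classify=classify_image):
    """Stateless request handler: JSON in, JSON out, nothing kept between calls."""
    try:
        payload = json.loads(event["body"])
        image_bytes = base64.b64decode(payload["image"], validate=True)
    except (KeyError, TypeError, ValueError):
        # Missing body, invalid JSON, missing field, or invalid base64.
        error = {"error": "expected a JSON body with a base64 'image' field"}
        return {"statusCode": 400, "body": json.dumps(error)}
    return {"statusCode": 200, "body": json.dumps(classify(image_bytes))}
```

Because the function keeps no state between calls, the platform can run as many copies in parallel as there are cameras sending frames.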
- Create a SaaS company selling access to the face classification API to low-code developers who want to use the functionality; the serverless design already accounts for scalability issues.
- Create a startup that allows security companies to bolt this classification functionality onto existing IP camera systems (additional hardware might be needed); the emotion detection could be used to alert security staff to potential issues.
- Age detection could help verify the age of an individual. Supermarkets and shops could use this functionality in addition to the mask detector to help enforce regulations.
- An automatic registration system for schools, which could also be used to enforce mask-wearing policies.
- Automatic entry systems that grant access only if a mask is worn or if the person is over a certain age.
- Big stores like IKEA could use recognition to collect data on facial expressions and on how long people spend looking at certain items, and combine it with sales data to train a neural network that helps estimate the sales potential of new items and decide when to restock, discount, or withdraw items.










