### Crowd Counting Model using Deep Learning
#### Introduction
Artificial Intelligence and Machine Learning are going to be our biggest helpers in the coming decade!

This morning, I was reading an article which reported that an AI system won against 20 lawyers, and the lawyers were actually happy that AI can take care of the repetitive parts of their roles and help them work on complex topics. They were happy that AI will enable them to have more fulfilling roles.

Today, I will be sharing a similar example: how to count the number of people in a crowd using Deep Learning and Computer Vision (see the [Analytics Vidhya online course](https://trainings.analyticsvidhya.com/courses/course-v1:AnalyticsVidhya+CVDL101+CVDL101_T1/about)). But before we do that, let us develop a sense of how easy life is for a Crowd Counting Scientist.

**P.S.** This article assumes that you have basic knowledge of how convolutional neural networks (CNNs) work.

#### Act like a Crowd Counting Scientist
Let’s start!

Can you help me count or estimate the number of people attending the event in this picture?

![Alt text](https://github.com/5267/ML/blob/master/resources/scenario_pics/crowdcounting/crowd-at-a-stadium-in-johannesburg-south-africa-for-rugby-768x514.jpg?raw=true)

Ok – how about this one?

![Alt text](https://github.com/5267/ML/blob/master/resources/scenario_pics/crowdcounting/IMG_2-850x592.jpg?raw=true)

You get the hang of it. By the end of this tutorial, we will create an algorithm for Crowd Counting with impressive accuracy (compared to humans like you and me). Would you use such an assistant?

### Table of Contents
1. What is Crowd Counting?
2. Why is Crowd Counting useful?
3. Understanding the Different Computer Vision Techniques for Crowd Counting
4. Understanding the Architecture and Training Method of CSRNet
5. Building your own Crowd Counting model in Python

#### 1. What is Crowd Counting?
Crowd Counting is a technique to count or estimate the number of people in an image. Take a moment to analyze the below image:

![Alt text](https://github.com/5267/ML/blob/master/resources/scenario_pics/crowdcounting/IMG_2-850x592%20(1).jpg?raw=true)

Can you give me an approximate number of how many people are in the frame? Yes, including the ones way in the background. The most direct method is to manually count each person, but does that make practical sense? It’s nearly impossible when the crowd is this big!

Crowd scientists (yes, that’s a real job title!) count the number of people in certain parts of an image and then extrapolate to come up with an estimate. More commonly, we have had to rely on crude metrics to estimate this number for decades.

Surely there must be a better, more exact approach?

Yes, there is!

While we don’t yet have algorithms that can give us the EXACT number, modern computer vision techniques can produce impressively close estimates. Let’s first understand why crowd counting is important before diving into the algorithm behind it.

#### 2. Why is Crowd Counting useful?
Let’s understand the usefulness of crowd counting using an example. Picture this – your company just finished hosting a huge data science conference. Plenty of different sessions took place during the event.

You are asked to analyze and estimate the number of people who attended each session. This will help your team understand what kind of sessions attracted the biggest crowds (and which ones failed in that regard). This will shape next year’s conference, so it’s an important task!

![Alt text](https://github.com/5267/ML/blob/master/resources/scenario_pics/crowdcounting/IMG_2-850x592%20(1).jpg?raw=true)

There were hundreds of people at the event, and counting them manually would take days! That’s where your data scientist skills kick in. You manage to get photos of the crowd from each session and build a computer vision model to do the rest!

There are plenty of other scenarios where crowd counting algorithms are changing the way industries work:

- Counting the number of people attending a sporting event
- Estimating how many people attended an inauguration or a march (political rallies, perhaps)
- Monitoring of high-traffic areas
- Helping with staffing allocation and resource allotment

Can you come up with some other use cases? Let me know in the [issues](https://github.com/5267/ML/issues) section! We can connect and try to figure out how we can use crowd counting techniques in your scenario.

#### 3. Understanding the Different Computer Vision Techniques for Crowd Counting
Broadly speaking, there are currently four methods we can use for counting the number of people in a crowd:

1. **Detection-based methods**

   Here, we slide a window-like detector across the image to identify people and count how many there are. These detectors require well-trained classifiers that can extract low-level features. Although such methods work well for detecting faces, they do not perform well on crowded images, where most of the target objects are not clearly visible.

2. **Regression-based methods**

   Detection breaks down when people are occluded, and regression-based methods come up trumps here because they skip explicit detection. We first crop patches from the image, extract low-level features from each patch, and then learn a regression that maps those features directly to a count.

3. **Density estimation-based methods**

   We first create a density map for the objects. The algorithm then learns a linear mapping between the extracted features and their object density maps. We can also use random forest regression to learn a non-linear mapping.

4. **CNN-based methods**

   Ah, good old reliable convolutional neural networks (CNNs). Instead of looking at patches of an image, we build an end-to-end regression method using CNNs. It takes the entire image as input and directly generates the crowd count. CNNs work really well for regression and classification tasks, and they have also proved their worth in generating density maps.
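To make the density-map idea concrete, here is a minimal NumPy sketch of one common way to build a ground-truth density map from head annotations: each annotated head contributes a 2D Gaussian whose values sum to 1, so summing the whole map gives the crowd count. The fixed `sigma`, the function names `gaussian_kernel` and `density_map`, and the toy head positions are assumptions for illustration; real pipelines, including the one described in the CSRNet paper, may choose kernel widths differently (for example, geometry-adaptive kernels in very dense scenes).

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a size x size 2D Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(height: int, width: int, heads, sigma: float = 4.0) -> np.ndarray:
    """Build a density map with one unit-mass Gaussian per head at (x, y)."""
    dmap = np.zeros((height, width), dtype=np.float64)
    size = int(6 * sigma) | 1          # odd kernel size covering roughly +/- 3 sigma
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    for x, y in heads:
        x, y = int(round(x)), int(round(y))
        if not (0 <= x < width and 0 <= y < height):
            continue                   # ignore annotations outside the image
        # Clip the kernel window at the image borders.
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        x0, x1 = max(0, x - half), min(width, x + half + 1)
        patch = kernel[y0 - (y - half):y1 - (y - half), x0 - (x - half):x1 - (x - half)]
        # Renormalize the clipped patch so every head still contributes exactly 1.
        dmap[y0:y1, x0:x1] += patch / patch.sum()
    return dmap


# Three hypothetical head annotations, one of them in a corner.
dmap = density_map(64, 64, [(10, 10), (50, 40), (0, 0)])
print(round(dmap.sum(), 6))  # prints 3.0: the count is the sum of the map
```

Renormalizing the clipped patch is a design choice: without it, people standing near the image border would contribute less than 1 to the count.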

CSRNet, a technique we will implement in this article, deploys a deeper CNN for capturing high-level features and generating high-quality density maps without expanding the network complexity. Let’s understand what CSRNet is before jumping to the coding section.

#### 4. Understanding the Architecture and Training Method of CSRNet
CSRNet uses the first ten convolutional layers of VGG-16 as its front end because of VGG's strong transfer learning ability. This front end keeps three 2x2 max-pooling layers, so its output is 1/8 of the original input's height and width. CSRNet then uses dilated convolutional layers in the back end.
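To see where the 1/8 comes from, the sketch below walks a CSRNet-style layer configuration and tracks the spatial size of the feature map. The layer lists are my reading of the public CSRNet configuration (VGG-16 front end, six 3x3 back-end convolutions with dilation rate 2, then a 1x1 output convolution); treat them as an assumption and check them against the paper before relying on them. The helper names `conv_out` and `trace` are hypothetical.

```python
# Assumed CSRNet-style configuration: integers are 3x3 conv output channels,
# "M" is a 2x2 max-pool with stride 2.
FRONTEND = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512]
BACKEND = [512, 512, 512, 256, 128, 64]   # 3x3 convs with dilation rate 2
BACKEND_DILATION = 2


def conv_out(size: int, kernel: int = 3, dilation: int = 1, padding: int = 1, stride: int = 1) -> int:
    """Standard output-size formula for a convolution along one dimension."""
    effective = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective) // stride + 1


def trace(height: int, width: int):
    """Return (height, width, channels) after the front end and after the back end."""
    h, w, c = height, width, 3
    for layer in FRONTEND:
        if layer == "M":
            h, w = h // 2, w // 2                      # each pooling halves the resolution
        else:
            h, w, c = conv_out(h), conv_out(w), layer  # padding 1 keeps the size
    front = (h, w, c)
    for _ in BACKEND:
        # Padding equal to the dilation keeps the size for a dilated 3x3 conv;
        # channel counts do not affect the spatial size.
        h = conv_out(h, dilation=BACKEND_DILATION, padding=BACKEND_DILATION)
        w = conv_out(w, dilation=BACKEND_DILATION, padding=BACKEND_DILATION)
    # A final 1x1 convolution maps the 64 channels to a 1-channel density map.
    return front, (h, w, 1)


print(trace(512, 768))  # ((64, 96, 512), (64, 96, 1))
```

Only the three pooling layers shrink the map; the dilated back end enlarges the receptive field while keeping the 1/8 resolution.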

But what in the world are dilated convolutions? It’s a fair question to ask. Consider the below image:

![Alt text](https://github.com/5267/ML/blob/master/resources/scenario_pics/crowdcounting/Screenshot-from-2019-02-01-16-49-21.png?raw=true)

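The basic idea of a dilated convolution is to enlarge the area the kernel covers without increasing the number of parameters. With a dilation rate of 1, it is an ordinary convolution: the kernel is applied to contiguous pixels as it slides over the image. If we increase the dilation rate to 2, gaps of one pixel are inserted between the kernel's weights, so the kernel extends as shown in the above image (follow the labels below each image); a 3x3 kernel then covers a 5x5 region while still having only 9 weights. Because it grows the receptive field without reducing resolution, a dilated convolution can be an alternative to pooling layers.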
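A minimal NumPy sketch of a single-channel dilated convolution, written as cross-correlation with no padding (the way deep learning frameworks implement "convolution"). The function name `dilated_conv2d` is hypothetical; the point is that the same 9 weights cover a 3x3 region at dilation 1 and a 5x5 region at dilation 2.

```python
import numpy as np


def dilated_conv2d(image: np.ndarray, kernel: np.ndarray, dilation: int = 1) -> np.ndarray:
    """Valid (no padding) 2D cross-correlation with a dilated kernel."""
    kh, kw = kernel.shape
    # Effective extent of the kernel once gaps of (dilation - 1) pixels are inserted.
    eff_h = kh + (kh - 1) * (dilation - 1)
    eff_w = kw + (kw - 1) * (dilation - 1)
    out_h = image.shape[0] - eff_h + 1
    out_w = image.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sample every `dilation`-th pixel inside the effective window.
            patch = image[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))                                 # 9 weights in both cases
print(dilated_conv2d(image, kernel, dilation=1).shape)  # (5, 5): kernel spans 3x3
print(dilated_conv2d(image, kernel, dilation=2).shape)  # (3, 3): kernel spans 5x5
```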

#### 4.1 Underlying Mathematics
I’m going to take a moment to explain how the mathematics works. This will come in handy when you need to tweak or modify your model.
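Following the notation of the CSRNet paper listed in the references, a 2D dilated convolution of an input $x(m, n)$ with a filter $w(i, j)$ of size $M \times N$ and dilation rate $r$ is:

$$
y(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} x(m + r \cdot i,\; n + r \cdot j)\, w(i, j)
$$

With $r = 1$ this is an ordinary convolution. A $k \times k$ kernel with dilation rate $r$ covers an effective size of

$$
k_{\text{eff}} = k + (k - 1)(r - 1),
$$

so $k = 3$ and $r = 2$ give $k_{\text{eff}} = 5$ with the same 9 parameters. For training, the paper uses the Euclidean distance between the predicted density map $Z(X_i; \Theta)$ for image $X_i$ and its ground truth $Z_i^{GT}$; I write $N_t$ for the number of training images to avoid a clash with the filter size $N$:

$$
L(\Theta) = \frac{1}{2N_t} \sum_{i=1}^{N_t} \left\lVert Z(X_i; \Theta) - Z_i^{GT} \right\rVert_2^2
$$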



### Prerequisites
#### A Comprehensive Tutorial to learn Convolutional Neural Networks from Scratch

#### An Introductory Guide to Deep Learning and Neural Networks #1
#### Table of Contents
1. Understanding the Course Structure
2. Course 1: Neural Networks and Deep Learning
    - Module 1: Introduction to Deep Learning
    - Module 2: Neural Network Basics
        - Logistic Regression as a Neural Network
        - Python and Vectorization
    - Module 3: Shallow Neural Networks
    - Module 4: Deep Neural Networks

#### 1. Understanding the Course Structure

This deep learning specialization is made up of 5 courses in total. Course #1, our focus in this article, is further divided into the 4 sub-modules listed above.

#### 2. Course 1: Neural Networks and Deep Learning
#### 2.1 Module 1: Introduction to Deep Learning

**What is a Neural Network?**

Consider an example where we have to predict the price of a house. The variables we are given are the size of the house in square feet (or square meters) and the price of the house. Now assume we have 6 houses. So first let’s pull up a plot to visualize what we’re looking at:

![Alt text]()

On the x-axis we have the size of the house, and on the y-axis its corresponding price. A linear regression model will try to draw a straight line to fit the data:
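As a quick illustration of that straight-line fit, here is a minimal NumPy sketch using `np.polyfit` with degree 1 (ordinary least squares). The six sizes and prices are made-up numbers, not data from the course.

```python
import numpy as np

# Six hypothetical houses: size in square feet and price in $1000s (made-up numbers).
size = np.array([650, 800, 1000, 1200, 1500, 1800], dtype=float)
price = np.array([110, 140, 170, 200, 250, 290], dtype=float)

# Least-squares straight line: price ~= slope * size + intercept
slope, intercept = np.polyfit(size, price, deg=1)
print(f"price ~= {slope:.3f} * size + {intercept:.1f}")
print("predicted price for 1100 sq ft:", round(slope * 1100 + intercept, 1))
```

Note that nothing stops this line from predicting a negative price for a very small house, which is one motivation for the ReLU-based neuron discussed next.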

![Alt text]()

So, the input (x) here is the size of the house and the output (y) is the price. Now let’s look at how we can solve this using a simple neural network:

![Alt text]()

Here, a neuron takes an input, applies an activation function to it, and generates an output. One of the most commonly used activation functions is **ReLU** (Rectified Linear Unit):

![Alt text]()

ReLU takes a real number as input and returns the maximum of 0 and that number. So, if we pass 10, the output will be 10, and if the input is -10, the output will be 0.

For now, let’s stick to our example. If we use the ReLU activation function to predict the price of a house based on its size, this is how the predictions may look:
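A minimal sketch of that single neuron: a linear step `w * size + b` followed by ReLU, so predicted prices can never go below zero. The weight `w` and bias `b` are hypothetical values picked for illustration, not trained parameters.

```python
import numpy as np


def relu(z):
    """ReLU: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


def predict_price(size, w=0.2, b=-50.0):
    """A single neuron: linear step w * size + b followed by ReLU.

    w and b are hypothetical values chosen for illustration, not trained weights.
    """
    return relu(w * np.asarray(size, dtype=float) + b)


print(relu(10), relu(-10))              # 10.0 0.0
print(predict_price([100, 250, 1000]))  # the two smaller houses are clipped to 0
```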

![Alt text]()

So far, we have seen a neural network with a single neuron, i.e., we only had one feature (the size of the house) to predict the house price. In reality, we’ll have to consider multiple features, like the number of bedrooms and the postal code. House prices can also depend on family size, the neighbourhood, or school quality. How can we define a neural network in such cases?

![Alt text]()

It gets a bit complicated here. Refer to the above image as you read: we pass 4 features as input x to the neural network, it automatically identifies some hidden features from the input, and finally it generates the output y. This is what a neural network with 4 inputs, a single hidden layer and one output looks like:
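Here is a minimal NumPy sketch of the forward pass through such a network: 4 input features, one hidden layer with ReLU, and a single linear output. The 3 hidden units, the random untrained weights and the feature values are all assumptions for illustration, so the printed predictions are meaningless until the weights are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 4 input features, 3 hidden units, 1 output.
n_x, n_h, n_y = 4, 3, 1
W1 = rng.normal(scale=0.1, size=(n_h, n_x))   # hidden-layer weights (untrained)
b1 = np.zeros((n_h, 1))
W2 = rng.normal(scale=0.1, size=(n_y, n_h))   # output-layer weights (untrained)
b2 = np.zeros((n_y, 1))


def forward(X):
    """X has shape (4, m): one column per house. Returns predictions of shape (1, m)."""
    H = np.maximum(0.0, W1 @ X + b1)   # hidden features computed from the inputs (ReLU)
    return W2 @ H + b2                 # linear output for a regression target such as price


# Two made-up houses, one per column: size, bedrooms and two already-encoded scores.
X = np.array([[2000.0, 1500.0], [3, 2], [0.7, 0.4], [0.9, 0.5]])
print(forward(X).shape)  # (1, 2)
```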

![Alt text]()

Now that we have an intuition of what neural networks are, let’s see how we can use them for supervised learning problems.

**Supervised Learning with Neural Networks**

Supervised learning refers to a task where we need to find a function that can map input to corresponding outputs (given a set of input-output pairs). We have a defined output for each given input and we train the model on these examples. Below is a pretty handy table that looks at the different applications of supervised learning and the different types of neural networks that can be used to solve those problems:

| Input (X) | Output (y) | Application | Type of Neural Network |
| --- | --- | --- | --- |
| Home Features | Price | Real Estate | Standard Neural Network |
| Ad, user info | Click prediction (0/1) | Online Advertising | Standard Neural Network |
| Image | Image Class | Photo Tagging | CNN |
| Audio | Text Transcript | Speech Recognition | RNN |
| English | Chinese | Machine Translation | RNN |
| Image, Radar info | Position of car | Autonomous Driving | Custom / Hybrid NN |

Below is a visual representation of the most common Neural Network types:

![Alt text]()

As you might be aware, supervised learning can be used on both structured and unstructured data.

In our house price prediction example, the given data tells us the size and the number of bedrooms. This is **structured data**, meaning that each feature, such as the size of the house, the number of bedrooms, etc. has a very well defined meaning.

In contrast, **unstructured data** refers to things like **raw audio, images, or text**, where you might want to recognize what’s in the image or the text (as in object detection). Here, the features might be the pixel values in an image or the individual words in a piece of text. It’s not really clear what each individual pixel represents, so this falls under the unstructured data umbrella.

Simple machine learning algorithms work well with structured data. But when it comes to unstructured data, their performance tends to take quite a dip. This is where neural networks have proven to be so effective and useful. They **perform exceptionally well on unstructured data**. Most of the ground-breaking research these days has neural networks at its core.

**Why is Deep Learning Taking off?**

To understand this, take a look at the below graph:

![Alt text]()

As the amount of data increases, the performance of traditional learning algorithms, like SVM and logistic regression, does not improve by a whole lot. In fact, it tends to plateau after a certain point. In the case of neural networks, the performance of the model increases with an increase in the data you feed to the model.

There are basically three scales that drive a typical deep learning process:

1. Data
2. Computation Time
3. Algorithms

The activation function plays an important role in how quickly a model trains. If we use a sigmoid activation function, this is what we end up with:

![Alt text]()

The slope, or gradient, of this function is close to zero at the extreme ends. The parameters are therefore updated very slowly, which makes learning very slow. Switching from the sigmoid activation function to ReLU (Rectified Linear Unit) has been one of the biggest breakthroughs we have seen in neural networks: ReLU's slope is 1 for every x > 0, so gradients do not shrink in that region and gradient descent updates the parameters much faster. This is a major reason why models now train so much faster.
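A small NumPy sketch comparing the two gradients makes the point numerically: the sigmoid's derivative peaks at 0.25 and is tiny for large inputs, while ReLU's derivative stays at 1 for every positive input. The function names are my own.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25 (at z = 0)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, 0 otherwise (0 chosen at z = 0 by convention)."""
    return (np.asarray(z) > 0).astype(float)


for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.6f}  relu'={float(relu_grad(z)):.0f}")
```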

#### 2.2 Module 2: Neural Network Basics
This module is further divided into two parts:

- Part I: Logistic Regression as a Neural Network
- Part II: Python and Vectorization

**Part I: Logistic Regression as a Neural Network**

**Binary Classification**

In a binary classification problem, we have an input x, say an image, and we have to classify it as having a cat or not. If it is a cat, we will assign it a 1, else 0. So here, we have only two outputs – either the image contains a cat or it does not. This is an example of a binary classification problem.

We can, of course, use one of the most popular classification techniques, logistic regression, in this case.

**Logistic Regression**

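We have an input x (an image) and we want to know the probability that the image belongs to class 1 (i.e., a cat). For a given input vector x, the output is this probability, $\hat{y} = P(y = 1 \mid x)$, which logistic regression computes as:

$$
\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

where $w$ is a weight vector and $b$ is a bias term.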
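A minimal NumPy sketch of the logistic regression output $\hat{y} = \sigma(w^\top x + b)$, assuming a hypothetical 64x64 RGB image flattened into a vector of 12288 features and untrained all-zero weights. The function name `predict_proba` is my own choice here, not a library call.

```python
import numpy as np


def predict_proba(x, w, b):
    """Logistic regression output: y_hat = sigmoid(w^T x + b) = P(y = 1 | x)."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical 64x64 RGB image flattened into a vector of 64 * 64 * 3 = 12288 features.
rng = np.random.default_rng(0)
x = rng.random(64 * 64 * 3)
w = np.zeros(64 * 64 * 3)   # untrained (all-zero) weights
b = 0.0
print(predict_proba(x, w, b))  # 0.5: with zero weights the model is undecided
```

To turn the probability into a label, a common choice is to predict "cat" (1) when $\hat{y} \ge 0.5$ and "not cat" (0) otherwise.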




### References
- [A Comprehensive Tutorial to learn Convolutional Neural Networks from Scratch](https://www.analyticsvidhya.com/blog/2018/12/guide-convolutional-neural-network-cnn/?utm_source=blog&utm_medium=crowd-counting)
- [CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes](https://arxiv.org/abs/1802.10062)