# 📖 👆🏻 Printed Links Detection Using TensorFlow 2 Object Detection API

![Links Detector Cover](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/01-banner.png)

## 📃 TL;DR

_In this article we will start solving the problem of making printed links (e.g. in a book or a magazine) clickable via your smartphone camera._

We will use the TensorFlow 2 [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) to train a custom object detection model that finds the positions and bounding boxes of sub-strings like `https://` in an image of text (e.g. a frame from a smartphone camera stream).

The text of each link (the right-side continuation of the `https://` bounding box) will be recognized using the [Tesseract](https://tesseract.projectnaptha.com/) library. The recognition part is not covered in this article, but you may find the complete code of the application in the [links-detector repository](https://github.com/trekhleb/links-detector).

> 🚀 [**Launch Links Detector demo**](https://trekhleb.github.io/links-detector/) from your smartphone to see the final result.

> 📝 [**Open links-detector repository**](https://github.com/trekhleb/links-detector) on GitHub to see the complete source code of the application.

Here is what the final solution looks like:

![Links Detector Demo](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/03-links-detector-demo.gif)

> ⚠️ Currently the application is in an _experimental_ _Alpha_ stage and has [many issues and limitations](https://github.com/trekhleb/links-detector/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement). So don't raise your expectations too high until these issues are resolved 🤷🏻‍. Also, the purpose of this article is more about learning how to work with the TensorFlow 2 Object Detection API than coming up with a production-ready model.

## 🤷🏻‍♂️ The Problem

I work as a software engineer and on my own time I learn Machine Learning as a hobby. But this is not the problem yet.

I recently bought a printed book about Machine Learning, and while I was reading through the first several chapters I encountered many printed links in the text that looked like `https://tensorflow.org/` or `https://some-url.com/which/may/be/even/longer?and_with_params=true`.

![Printed Links](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/02-printed-links.jpg)

I saw all these links, but I couldn't click on them since they were printed (thanks, cap!). To visit these links I needed to type them character by character in the browser's address bar, which was pretty annoying and error-prone.

## 💡 Possible Solution

So, what if, similarly to QR-code detection, we try to "teach" the smartphone to _(1)_ _detect_ and _(2)_ _recognize_ printed links for us and also to make them _clickable_? This way you'd make just one click instead of multiple keystrokes. The operational complexity goes from `O(N)` to `O(1)`, where `N` is the length of the link.

This is what the final workflow will look like:

![Links Detector Demo](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/03-links-detector-demo.gif)

## 📝 Solution Requirements

As I've mentioned earlier, I'm just studying Machine Learning as a hobby. Thus the purpose of this article is more about _learning_ how to work with the TensorFlow 2 Object Detection API than coming up with a production-ready application.

With that being said, I simplified the solution requirements to the following:

1. The detection and recognition processes should have **close-to-real-time** performance (i.e. `0.5-1` frames per second) on a device like iPhone X. It means that the whole _detection + recognition_ process should take up to `2` seconds per frame (pretty bearable for an amateur project).
2. Only **English** links should be supported.
3. Only **dark text** (i.e. black or dark-grey) on **light background** (i.e. white or light-grey) should be supported.
4. Only `https://` links should be supported for now (it is OK if our model does not recognize `http://`, `ftp://`, `tcp://` or other types of links).
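Requirement 1 is simple arithmetic: a frame rate of `f` frames per second leaves `1/f` seconds for processing a single frame. A minimal sketch of that check in Python; the function names and the example timings below are made up for illustration and are not measurements from the app:

```python
def frame_budget_seconds(min_fps: float) -> float:
    """Return the maximum time (in seconds) one frame may take at the given frame rate."""
    if min_fps <= 0:
        raise ValueError("min_fps must be positive")
    return 1.0 / min_fps


def meets_realtime_requirement(detection_s: float, recognition_s: float, min_fps: float = 0.5) -> bool:
    """Check whether detection + recognition of one frame fits into the per-frame budget."""
    return detection_s + recognition_s <= frame_budget_seconds(min_fps)


# At 0.5 fps the whole pipeline gets 2 seconds per frame, at 1 fps only 1 second.
print(frame_budget_seconds(0.5))  # 2.0
print(frame_budget_seconds(1))    # 1.0
# Hypothetical timings, only to show how the check behaves.
print(meets_realtime_requirement(detection_s=0.3, recognition_s=1.2))  # True
print(meets_realtime_requirement(detection_s=0.3, recognition_s=2.5))  # False
```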

## 🧩 Solution Breakdown

### High-level breakdown

Let's see how we could approach the problem on a high level.

#### Option 1: Detection model on the back-end

**The flow:**

1. Get camera stream (frame by frame) on the client side.
2. Send each frame one by one over the network to the back-end.
3. Do links detection and recognition on the back-end and send the response back to the client.
4. Client draws the detection boxes with the clickable links.

![Model on the back-end](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/04-frontend-backend.jpg)

**Pros:**

- 💚 The detection performance is not limited by the client's device. We may speed the detection up by scaling the service horizontally (adding more instances) and vertically (adding more cores/GPUs).
- 💚 The model might be bigger since there is no need to ship it to the client side. Downloading a `~10Mb` model to the client side may be OK, but a `~100Mb` model might be a big issue for the client's network and the application UX (user experience).
- 💚 It is possible to control who is using the model. The model is guarded behind the API, so we would have complete control over its callers/clients.

**Cons:**

- 💔 System complexity grows. The application tech stack grows from just `JavaScript` to, let's say, `JavaScript + Python`. We also need to take care of autoscaling.
- 💔 Offline mode for the app is not possible since it needs an internet connection to work.
- 💔 Too many HTTP requests between the client and the server may become a bottleneck at some point. Imagine we wanted to improve the detection performance from, let's say, `1` to `10+` frames per second. This means that each client would send `10+` requests per second. For `10` simultaneous clients it is already `100+` requests per second. `HTTP/2` bidirectional streaming and `gRPC` might be useful in this case, but then we're back to increased system complexity.
- 💔 The system becomes more expensive. Almost all points from the Pros section need to be paid for.

#### Option 2: Detection model on the front-end

**The flow:**

1. Get camera stream (frame by frame) on the client side.
2. Do links detection and recognition on the client side (without sending anything to the back-end).
3. Client draws the detection boxes with the clickable links.

![Model on the front-end](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/05-frontend-only.jpg)

**Pros:**

- 💚 The system is less complex. We don't need to set up servers, build the API, or introduce an additional Python stack to the system.
- 💚 Offline mode is possible. The app doesn't need an internet connection to work since the model is loaded on the device. So a Progressive Web Application ([PWA](https://web.dev/progressive-web-apps/)) might be built to support that.
- 💚 The system is "kind of" scaling automatically. The more clients you have, the more cores and GPUs they bring. This is not a proper scaling solution though (more about that in the Cons section below).
- 💚 The system is cheaper. We only need a server for static assets (`HTML`, `JS`, `CSS`, model files, etc.). This may be done for free, let's say, on GitHub.
- 💚 No issue with the growing number of HTTP requests per second to the server side.

**Cons:**

- 💔 Only horizontal scaling is possible (each client has its own CPU/GPU). Vertical scaling is not possible since we can't influence the client's device performance. As a result, we can't guarantee fast detection on low-performance devices.
- 💔 It is not possible to guard the model usage and control the callers/clients of the model. Anyone could download the model and reuse it.
- 💔 Battery consumption of the client's device might become an issue. For the model to work it needs computational resources. So clients might not be happy with their iPhone getting warmer and warmer while the app is working.

#### High-level conclusion

Since the purpose of the project was more about learning than coming up with a production-ready solution, _I decided to go with the second option of serving the model from the client side_. This made the whole project much cheaper (actually, with GitHub it was free to host) and I could focus more on Machine Learning than on the autoscaling back-end infrastructure.


### Lower level breakdown

OK, so we've decided to go with the serverless solution. Now we have an image from the camera stream as an input that looks something like this:

![Printed Links Input](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/06-printed-links-clean.jpg)

We need to solve two sub-tasks for this image:

1. Links **detection** (finding the position and bounding boxes of the links)
2. Links **recognition** (recognizing the text of the links)

#### Option 1: Tesseract based solution

The first and most obvious approach would be to solve the _Optical Character Recognition_ ([OCR](https://en.wikipedia.org/wiki/Optical_character_recognition)) task by recognizing the whole text of the image using, let's say, the [Tesseract.js](https://github.com/naptha/tesseract.js) library. As a pleasant bonus, it returns the bounding boxes of the paragraphs, text lines and text blocks along with the recognized text.

![Recognized text with bounding boxes](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/07-printed-links-boxes.jpg)

Then we may try to extract the links from the recognized text lines or text blocks with a regular expression like [this one](https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url):

```typescript
const URL_REG_EXP = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_+.~#?&/=]*)/gi;

const extractLinkFromText = (text: string): string | null => {
  const urls: string[] | null = text.match(URL_REG_EXP);
  if (!urls || !urls.length) {
    return null;
  }
  return urls[0];
};
```

💚 It seems like the issue is solved in a pretty straightforward and simple way:

- We know the bounding boxes of the links
- And we also know the text of the links to make them clickable

💔 The thing is that the _recognition + detection_ time may vary from `2` to `20+` seconds depending on the size of the text, the amount of "something that looks like text" in the image, the image quality and other factors. So it will be really hard to achieve those `0.5-1` frames per second to make the user experience at least _close_ to real-time.

💔 Also, if we think about it, we're asking the library to recognize the **whole** text from the image even though it might contain only one or two links (i.e. only ~10% of the text might be useful for us), or it may not contain any links at all. In this case it sounds like a waste of computational resources.

#### Option 2: Tesseract + TensorFlow based solution

We could make Tesseract work faster if we used an _additional "adviser" algorithm_ prior to the link text recognition. This "adviser" algorithm should detect, but not recognize, _the leftmost position_ of each link in the image, if there are any. This would allow us to speed up the recognition part by following these rules:

1. If the image does not contain any links, we should not call Tesseract detection/recognition at all.
2. If the image does have links, then we need to ask Tesseract to recognize only those parts of the image that contain the links. We're not interested in spending time on recognizing irrelevant text that doesn't contain links.
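In code, rule 1 is a small filtering step between the detector and Tesseract, and its output is exactly the list of regions that rule 2 needs. A minimal sketch, assuming the detector returns parallel lists of boxes and confidence scores, with boxes in the normalized `[ymin, xmin, ymax, xmax]` order used by the TensorFlow Object Detection API's `detection_boxes` output; the function names and the `0.5` score threshold are hypothetical:

```python
from typing import List, Sequence, Tuple

# Assumed box format: normalized (ymin, xmin, ymax, xmax), values in [0, 1].
Box = Tuple[float, float, float, float]


def select_link_prefixes(
    boxes: Sequence[Box], scores: Sequence[float], min_score: float = 0.5
) -> List[Box]:
    """Keep only the `https://` detections that the "adviser" model is confident about."""
    return [box for box, score in zip(boxes, scores) if score >= min_score]


def should_run_ocr(
    boxes: Sequence[Box], scores: Sequence[float], min_score: float = 0.5
) -> bool:
    """Rule 1: call Tesseract only if at least one `https://` prefix was detected."""
    return len(select_link_prefixes(boxes, scores, min_score)) > 0


# Two candidate detections: a confident one and a likely false positive.
boxes = [(0.10, 0.05, 0.14, 0.20), (0.50, 0.30, 0.54, 0.45)]
scores = [0.92, 0.12]
print(select_link_prefixes(boxes, scores))  # [(0.1, 0.05, 0.14, 0.2)]
print(should_run_ocr([], []))               # False
```

The threshold trades missed links against wasted OCR calls, so in practice it would be tuned on real camera frames.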

The "adviser" algorithm that runs before Tesseract should work in constant time regardless of the image quality or the presence/absence of text in the image. It also should be pretty fast and detect the leftmost positions of the links in less than `1s` so that we could satisfy the "close-to-real-time" requirement (e.g. on iPhone X).

> 💡 So what if we use a separate object detection model to help us find all occurrences of the `https://` substring (every secure link has this prefix, doesn't it?) in the image? Then, having these `https://` bounding boxes, we may extract their right-side continuations and send only those to Tesseract for text recognition.

Take a look at the picture below:

![Tesseract and TensorFlow based solution](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/08-tesseract-vs-tensorflow.jpg)

You may notice that Tesseract needs to do **much less** work if it has some hints about where the links might be located (compare the number of blue boxes in both pictures).
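The "right-side continuation" idea can be sketched as a small geometry helper. This is a minimal illustration, not the links-detector implementation: it assumes normalized `[ymin, xmin, ymax, xmax]` boxes and that the whole link sits on one text line to the right of `https://` (wrapped links are not handled); the function name and the `padding` parameter are hypothetical.

```python
import math
from typing import Tuple

# Assumed box format: normalized (ymin, xmin, ymax, xmax), values in [0, 1].
Box = Tuple[float, float, float, float]


def link_region_pixels(
    prefix_box: Box, image_width: int, image_height: int, padding: float = 0.1
) -> Tuple[int, int, int, int]:
    """Convert an `https://` prefix box into the pixel region Tesseract should read.

    The region starts at the left edge of the prefix and extends to the right edge
    of the image, because the rest of the link is printed to the right of `https://`.
    `padding` grows the region vertically by a fraction of the box height to tolerate
    slightly imprecise detections. Returns (left, top, right, bottom) in pixels.
    """
    ymin, xmin, ymax, _xmax = prefix_box
    pad = (ymax - ymin) * padding
    left = math.floor(xmin * image_width)
    top = max(0, math.floor((ymin - pad) * image_height))
    right = image_width
    bottom = min(image_height, math.ceil((ymax + pad) * image_height))
    return left, top, right, bottom


print(link_region_pixels((0.25, 0.125, 0.375, 0.25), image_width=800, image_height=400, padding=0.0))
# (100, 100, 800, 150)
```

The returned tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop()` expects, so cropping the region before passing it to OCR could be a one-liner.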

So the question now is which object detection model we should choose and how to re-train it to detect the custom `https://` objects.

> Finally! We've got closer to the TensorFlow part of the article 😀


## 🤖 Selecting the object detection model

Training a new object detection model from scratch is not a reasonable option in our context for the following reasons:

- 💔 The training process might take days/weeks and a lot of money.
- 💔 We most probably won't be able to collect hundreds of thousands of _labeled_ images of book pages that have links in them (we might try to generate them though, but more about that later).

So instead of creating a new model, we'd better teach an existing object detection model to detect our custom objects (i.e. do [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)). In our case the "custom objects" would be images with the `https://` text drawn in them. This approach has the following benefits:

- 💚 The dataset might be much smaller. We don't need to collect hundreds of thousands of labeled images. Instead, we may take `~100` pictures and label them manually. This is because the model is already pre-trained on a general dataset like the [COCO dataset](https://cocodataset.org/#home) and has already learned how to extract general image features.
- 💚 The training process will be much faster (minutes/hours on GPU instead of days/weeks). Again, this is because of a smaller dataset and because of fewer trainable parameters.

We may choose an existing model from the [TensorFlow 2 Detection Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md), which provides a collection of detection models pre-trained on the [COCO 2017 dataset](https://cocodataset.org/#home). For now it contains `~40` model variations to choose from.

To re-train and fine-tune the model on the custom dataset we will use the [TensorFlow 2 Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). The TensorFlow Object Detection API is an open source framework built on top of [TensorFlow](https://www.tensorflow.org/) that makes it easy to construct, train and deploy object detection models.

If you follow the [Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) link you will find the _detection speed_ and _accuracy_ for each model.

![Model Zoo](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/09-model-zoo.jpg)

Of course we would want to find the right balance between the detection **speed** and **accuracy** while picking the model. But what might be even more important in our case is the **size** of the model since it will be loaded to the client side.

The size of the archived model might vary drastically, from `~20Mb` to `~1.4Gb`. Here are several examples:

- `1386 (Mb)` `centernet_hg104_1024x1024_kpts_coco17_tpu-32`
- ` 330 (Mb)` `centernet_resnet101_v1_fpn_512x512_coco17_tpu-8`
- ` 195 (Mb)` `centernet_resnet50_v1_fpn_512x512_coco17_tpu-8`
- ` 198 (Mb)` `centernet_resnet50_v1_fpn_512x512_kpts_coco17_tpu-8`
- ` 227 (Mb)` `centernet_resnet50_v2_512x512_coco17_tpu-8`
- ` 230 (Mb)` `centernet_resnet50_v2_512x512_kpts_coco17_tpu-8`
- `  29 (Mb)` `efficientdet_d0_coco17_tpu-32`
- `  49 (Mb)` `efficientdet_d1_coco17_tpu-32`
- `  60 (Mb)` `efficientdet_d2_coco17_tpu-32`
- `  89 (Mb)` `efficientdet_d3_coco17_tpu-32`
- ` 151 (Mb)` `efficientdet_d4_coco17_tpu-32`
- ` 244 (Mb)` `efficientdet_d5_coco17_tpu-32`
- ` 376 (Mb)` `efficientdet_d6_coco17_tpu-32`
- ` 376 (Mb)` `efficientdet_d7_coco17_tpu-32`
- ` 665 (Mb)` `extremenet`
- ` 427 (Mb)` `faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8`
- ` 424 (Mb)` `faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8`
- ` 337 (Mb)` `faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8`
- ` 337 (Mb)` `faster_rcnn_resnet101_v1_640x640_coco17_tpu-8`
- ` 343 (Mb)` `faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8`
- ` 449 (Mb)` `faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8`
- ` 449 (Mb)` `faster_rcnn_resnet152_v1_640x640_coco17_tpu-8`
- ` 454 (Mb)` `faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8`
- ` 202 (Mb)` `faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8`
- ` 202 (Mb)` `faster_rcnn_resnet50_v1_640x640_coco17_tpu-8`
- ` 207 (Mb)` `faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8`
- ` 462 (Mb)` `mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8`
- `  86 (Mb)` `ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8`
- `  44 (Mb)` `ssd_mobilenet_v2_320x320_coco17_tpu-8`
- `  20 (Mb)` `ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8`
- `  20 (Mb)` `ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8`
- ` 369 (Mb)` `ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8`
- ` 369 (Mb)` `ssd_resnet101_v1_fpn_640x640_coco17_tpu-8`
- ` 481 (Mb)` `ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8`
- ` 480 (Mb)` `ssd_resnet152_v1_fpn_640x640_coco17_tpu-8`
- ` 233 (Mb)` `ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8`
- ` 233 (Mb)` `ssd_resnet50_v1_fpn_640x640_coco17_tpu-8`
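Since the download size matters most for a client-side model, the list above can be narrowed down by a size budget before comparing speed and accuracy. A small sketch using a few of the archive sizes listed above; the helper name and the `30Mb` budget are arbitrary examples, not requirements of this project:

```python
# Archive sizes (Mb) of a few models, copied from the list above.
MODEL_SIZES_MB = {
    'efficientdet_d0_coco17_tpu-32': 29,
    'efficientdet_d1_coco17_tpu-32': 49,
    'ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8': 86,
    'ssd_mobilenet_v2_320x320_coco17_tpu-8': 44,
    'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8': 20,
    'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8': 20,
    'centernet_resnet50_v1_fpn_512x512_coco17_tpu-8': 195,
}


def models_within_budget(sizes_mb, max_size_mb):
    """Return the names of models whose archive fits into the budget, smallest first."""
    fitting = sorted((size, name) for name, size in sizes_mb.items() if size <= max_size_mb)
    return [name for _size, name in fitting]


print(models_within_budget(MODEL_SIZES_MB, max_size_mb=30))
# ['ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8', 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8', 'efficientdet_d0_coco17_tpu-32']
```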

The **`ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8`** model might be a good fit for our case:

- 💚 It is relatively lightweight: `20Mb` archived.
- 💚 It is pretty fast: `39ms` for the detection (as reported in the Model Zoo table).
- 💚 It uses the MobileNet v2 network as a feature extractor, which is optimized for mobile devices and lower energy consumption.
- 💚 It detects all objects in the whole image **in one go**, regardless of the image content.
- 💔 It is not the most accurate model though (everything is a tradeoff ⚖️).

The model name encodes several of its important characteristics that you may read more about if you want:

- The expected image input size is `640x640px`.
- The model implements [Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) (SSD) and [Feature Pyramid Network](https://arxiv.org/abs/1612.03144) (FPN).
- [MobileNet v2](https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html) convolutional neural network ([CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network)) is used as a feature extractor.
- The model was trained on the [COCO dataset](https://cocodataset.org/#home).
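The `640x640` part of the name can also be read programmatically, which may be handy when resizing camera frames before inference. A hypothetical helper (not part of the Object Detection API) that extracts it; note that not every Model Zoo name contains an input size (e.g. `efficientdet_d0_coco17_tpu-32` and `extremenet` don't), so the helper returns `None` in that case and makes no claim about which number is the width:

```python
import re
from typing import Optional, Tuple

# Matches the "<number>x<number>" part of names like "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8".
_INPUT_SIZE_RE = re.compile(r'_(\d+)x(\d+)_')


def input_size_from_model_name(model_name: str) -> Optional[Tuple[int, int]]:
    """Return the two image dimensions encoded in a Model Zoo name, or None if absent.

    The numbers are returned in the order they appear in the name.
    """
    match = _INPUT_SIZE_RE.search(model_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(input_size_from_model_name('ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8'))  # (640, 640)
print(input_size_from_model_name('efficientdet_d0_coco17_tpu-32'))  # None
```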



##### Scripts

```python
import os
import pathlib
import tensorflow as tf

def _model_zoo():
    CACHE_FOLDER = './models_zoo'
    # tf.keras.utils.get_file() puts downloads into the 'datasets' sub-folder by default.
    DATASETS_FOLDER = os.path.join(CACHE_FOLDER, 'datasets')

    MODEL_NAMES = [
        # 'centernet_hg104_512x512_kpts_coco17_tpu-3',
        'centernet_hg104_1024x1024_kpts_coco17_tpu-32',
        'centernet_resnet50_v1_fpn_512x512_coco17_tpu-8',
        'centernet_resnet50_v1_fpn_512x512_kpts_coco17_tpu-8',
        'centernet_resnet101_v1_fpn_512x512_coco17_tpu-8',
        'centernet_resnet50_v2_512x512_coco17_tpu-8',
        'centernet_resnet50_v2_512x512_kpts_coco17_tpu-8',
        'efficientdet_d0_coco17_tpu-32',
        'efficientdet_d1_coco17_tpu-32',
        'efficientdet_d2_coco17_tpu-32',
        'efficientdet_d3_coco17_tpu-32',
        'efficientdet_d4_coco17_tpu-32',
        'efficientdet_d5_coco17_tpu-32',
        'efficientdet_d6_coco17_tpu-32',
        'efficientdet_d7_coco17_tpu-32',
        'ssd_mobilenet_v2_320x320_coco17_tpu-8',
        'ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8',
        'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8',
        'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8',
        'ssd_resnet50_v1_fpn_640x640_coco17_tpu-8',
        'ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8',
        'ssd_resnet101_v1_fpn_640x640_coco17_tpu-8',
        'ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8',
        'ssd_resnet152_v1_fpn_640x640_coco17_tpu-8',
        'ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8',
        'faster_rcnn_resnet50_v1_640x640_coco17_tpu-8',
        'faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8',
        'faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8',
        'faster_rcnn_resnet101_v1_640x640_coco17_tpu-8',
        'faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8',
        'faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8',
        'faster_rcnn_resnet152_v1_640x640_coco17_tpu-8',
        'faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8',
        'faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8',
        'faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8',
        'faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8',
        'mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8',
        'extremenet',
    ]

    def create_cache(folder_name):
        os.makedirs(folder_name, exist_ok=True)

    def download_tf_model(model_name, cache_path):
        TF_MODELS_BASE_PATH = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/'
        model_url = TF_MODELS_BASE_PATH + model_name + '.tar.gz'
        # Keep the archive packed (untar=False): we only need its size here.
        return tf.keras.utils.get_file(
            fname=model_name,
            origin=model_url,
            untar=False,
            cache_dir=pathlib.Path(cache_path).absolute()
        )

    def download_models(model_names, cache_path):
        for model_name in model_names:
            download_tf_model(model_name, cache_path)

    def ls(base_dir):
        # Print the size of every downloaded archive in the same format as the list above.
        MB = 1024 * 1024
        for name in sorted(os.listdir(base_dir)):
            size_mb = round(os.stat(os.path.join(base_dir, name)).st_size / MB)
            print('- `{:>4} (Mb)` `{}`'.format(size_mb, name))

    create_cache(CACHE_FOLDER)
    download_models(MODEL_NAMES, CACHE_FOLDER)
    ls(DATASETS_FOLDER)

_model_zoo()
```


## 🛠 Installing Object Detection API 

In this article we're going to install the TensorFlow 2 Object Detection API _as a Python package_. It is convenient if you're experimenting in [Google Colab](https://colab.research.google.com/) or in [Jupyter](https://jupyter.org/try) (no local installation is needed, you may experiment right in your browser).

You may also follow the [official documentation](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md) if you would prefer to install the Object Detection API via Docker.

First, let's clone the [API repository](https://github.com/tensorflow/models):

```bash
git clone --depth 1 https://github.com/tensorflow/models
```

_output →_

```
Cloning into 'models'...
remote: Enumerating objects: 2301, done.
remote: Counting objects: 100% (2301/2301), done.
remote: Compressing objects: 100% (2000/2000), done.
remote: Total 2301 (delta 561), reused 922 (delta 278), pack-reused 0
Receiving objects: 100% (2301/2301), 30.60 MiB | 13.90 MiB/s, done.
Resolving deltas: 100% (561/561), done.
```

Now, let's compile the [API proto files](https://github.com/tensorflow/models/tree/master/research/object_detection/protos) into Python files by using [protoc](https://grpc.io/docs/protoc-installation/) tool:

```bash
cd ./models/research
protoc object_detection/protos/*.proto --python_out=.
```

And finally, let's install the TF2 version of [setup.py](https://github.com/tensorflow/models/blob/master/research/object_detection/packages/tf2/setup.py) via `pip`:

```bash
cp ./object_detection/packages/tf2/setup.py .
pip install . --quiet
```

> It is possible that the last step will fail because of some dependency errors. In this case you might want to run `pip install . --quiet` one more time.

We may verify that the installation was successful by running the following tests:

```bash
python object_detection/builders/model_builder_tf2_test.py
```

You should see the logs that end with something similar to this:

```
[       OK ] ModelBuilderTF2Test.test_unknown_ssd_feature_extractor
----------------------------------------------------------------------
Ran 20 tests in 45.072s

OK (skipped=1)
```
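The test suite above takes a while to run. For a quicker check inside a notebook, here is a minimal sketch that only confirms the `object_detection` package is visible to the current Python interpreter; it does not verify that the protos were compiled or that TensorFlow works, and the `is_installed` helper name is hypothetical:

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found, without actually importing it."""
    return importlib.util.find_spec(package_name) is not None


# Expected to print True after a successful installation.
print(is_installed('object_detection'))
```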

The TensorFlow Object Detection API is installed! You may now use the scripts that the API provides for model [inference](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb), [training](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md) or [fine-tuning](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb).

##### Scripts

```bash
git clone --depth 1 https://github.com/tensorflow/models
```



```bash
cd ./models/research
# Compile protos.
protoc object_detection/protos/*.proto --python_out=.
# Install TensorFlow Object Detection API.
cp ./object_detection/packages/tf2/setup.py .
# If the following step fails with dependency errors,
# try to launch it a second time.
pip install . --quiet
```

```bash
cd ./models/research
python object_detection/builders/model_builder_tf2_test.py
```


## ⬇️ Downloading the Pre-Trained Model

Let's download our selected `ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8` model from the TensorFlow Model Zoo and check how it does general object detection (detection of the COCO dataset classes like "cat", "dog", "car", etc.).

We will use the [get_file()](https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file) TensorFlow helper to download the archived model from the URL and unpack it.

```python
import tensorflow as tf
import pathlib

MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8'
TF_MODELS_BASE_PATH = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/'
CACHE_FOLDER = './cache'

def download_tf_model(model_name, cache_folder):
    model_url = TF_MODELS_BASE_PATH + model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name, 
        origin=model_url,
        untar=True,
        cache_dir=pathlib.Path(cache_folder).absolute()
    )
    return model_dir

# Start the model download.
model_dir = download_tf_model(MODEL_NAME, CACHE_FOLDER)
print(model_dir)
```

_output →_

```
/content/cache/datasets/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8
```

Here is how our folder structure looks so far:

![Cache Folder](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/10-cache-folder.jpg)

The `checkpoint` folder contains a snapshot of the pre-trained model.

The `pipeline.config` file contains the detection settings of the model. We'll come back to this file later when we need to fine-tune the model.
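To double-check the folder structure from a notebook instead of a file browser, here is a small standard-library sketch (the `list_files` helper name is hypothetical) that lists every file inside the unpacked model folder:

```python
import pathlib
from typing import List


def list_files(folder_path) -> List[str]:
    """Return the paths of all files under folder_path, relative to it and sorted."""
    root = pathlib.Path(folder_path)
    return sorted(
        path.relative_to(root).as_posix() for path in root.rglob('*') if path.is_file()
    )
```

For example, `print('\n'.join(list_files(model_dir)))`, with `model_dir` returned by `download_tf_model()` above, should print `pipeline.config` together with the files of the `checkpoint` folder.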

##### Scripts

```bash
mkdir -p ./cache
```

```python
import tensorflow as tf
import pathlib

MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8'
TF_MODELS_BASE_PATH = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/'
CACHE_FOLDER = './cache'

def download_tf_model(model_name, cache_folder):
    model_url = TF_MODELS_BASE_PATH + model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=model_url,
        untar=True,
        cache_dir=pathlib.Path(cache_folder).absolute()
    )
    return model_dir

# Start the model download.
model_dir = download_tf_model(MODEL_NAME, CACHE_FOLDER)
print(model_dir)
```



```python
import pathlib

def get_folder_size(folder_path):
    # Decimal megabytes (10^6 bytes).
    mB = 1000000
    root_dir = pathlib.Path(folder_path)
    sizeBytes = sum(f.stat().st_size for f in root_dir.glob('**/*') if f.is_file())
    return f'{sizeBytes//mB} MB'

print(f'Unpacked model size: {get_folder_size(model_dir)}')
```


_output →_

```
Unpacked model size: 31 MB
```


## 🏄🏻‍♂️ Trying the Model (Inference)

- Show that model works for general purpose classes
- Show that model doesn't work for custom objects (links)

## 📝 Creating the Dataset Manually

- Making pictures of the book
- What tools to use to add bounding boxes
- How to convert to protobuf
- Issues with custom dataset (fonts, colors, bolds, underlined, etc.)
- Train/test split approach

### 🌅 Preprocessing the data

- Data preprocessing: resize, crop square, color adjustment

### 🔖 Labeling the dataset

- How to use LabelImg

### 🗜 Exporting the dataset

- Protobuf (the way of storing the dataset)

## 📚 Generating the Dataset Automatically (?)

- Automated way of generating the dataset
- Train/test split approach

## 📖 Exploring the Dataset

- Preview images with detection boxes
- Number of images (why is this enough)
- Do we need to preprocess the images

## 📈 Setting Up TensorBoard

- Why do we need it (for debugging)
- What we will monitor

## 👨‍🎓 Transfer Learning

- What is transfer learning
- Why don't we train the model from scratch
- Allows us to use small dataset

### ⚙️ Configuring the Detection Pipeline

- Performance issues: batch size
- Starting not from scratch: checkpoints

### 🏋🏻‍♂️ Model Training

- Error prone: saving checkpoints
- How many epochs
- Monitoring the performance while training

### 🚀 Evaluating the Model

- Checking how accurate our model is on test dataset
- Are we good with performance, should we save the model?
- It is not a general purpose anymore, does it recognize our custom objects?

## 🗜 Exporting the Model

- Saving the model to the file for further re-use
- Show the list of files and what the model looks like on disk
- What the size of the model is

## 🚀 Evaluating the Exported Model

- Example of how to use the trained model

## 🗜 Converting the Model for Web

- What formats are suitable for the web
- Few words about TensorFlow.js
- Show list of exported files - what the model looks like on disk
- What the size of the model is
- Why it is split into chunks and how they are connected (via model.json)

```bash
pip install tensorflowjs --quiet
```

[?25l[K     |█████▎                          | 10kB 26.6MB/s eta 0:00:01[K     |██████████▌                     | 20kB 12.7MB/s eta 0:00:01[K     |███████████████▊                | 30kB 9.5MB/s eta 0:00:01[K     |█████████████████████           | 40kB 8.3MB/s eta 0:00:01[K     |██████████████████████████▏     | 51kB 4.7MB/s eta 0:00:01[K     |███████████████████████████████▍| 61kB 5.3MB/s eta 0:00:01[K     |████████████████████████████████| 71kB 4.1MB/s 
[?25h[?25l[K     |███▏                            | 10kB 23.4MB/s eta 0:00:01[K     |██████▍                         | 20kB 20.6MB/s eta 0:00:01[K     |█████████▌                      | 30kB 16.4MB/s eta 0:00:01[K     |████████████▊                   | 40kB 14.5MB/s eta 0:00:01[K     |███████████████▉                | 51kB 11.1MB/s eta 0:00:01[K     |███████████████████             | 61kB 11.1MB/s eta 0:00:01[K     |██████████████████████▏         | 71kB 7.4MB/s eta 0:00:01[K     |██████████████████████

## 🤔 Conclusions

- I'm just an amateur
- Links to demo app
- Issues and limitations of this approach
- Links to my ML repositories that they might like