# 📖 👆🏻 Printed Links Detection Using TensorFlow 2 Object Detection API

![Links Detector Cover](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/01-banner.png)

## 📃 TL;DR

_In this article we will start solving the problem of making printed links (e.g. in a book or a magazine) clickable via your smartphone camera._

We will use the [TensorFlow 2 Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) to train a custom object detection model that finds the positions and bounding boxes of the `https://` substrings in an image of text (e.g. in a frame of the smartphone camera stream).

The text of each link will be recognized using the [Tesseract](https://tesseract.projectnaptha.com/) library. The recognition part is not covered in this article, but you may find the complete code example in the 📝 [**links-detector repository**](https://github.com/trekhleb/links-detector).

> 🚀 [**Launch Links Detector demo**](https://trekhleb.github.io/links-detector/) from your smartphone to see the final result.

Here is how the final solution works:

![Links Detector Demo](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/03-links-detector-demo.gif)

> ⚠️ Currently the application is in an _experimental_ _Alpha_ stage and has [many issues and limitations](https://github.com/trekhleb/links-detector/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement). So don't raise your expectations bar too high until these issues are resolved 🤷🏻‍. The purpose of the article is more about learning how to work with the TensorFlow 2 Object Detection API than about coming up with a production-ready application.

## 🤷🏻‍♂️ The Problem

So you read a book or a magazine and see links like `https://tensorflow.org/` or `https://some-url.com/which/may/be/longer?and_with_params=true`, but you can't click on them since they are printed. To visit these links you need to type them character by character into the browser's address bar, which may be pretty annoying and error-prone.

![Printed Links](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/02-printed-links.jpg)

## 💡 Possible Solution

Similarly to QR-code detection we may try to "teach" the smartphone to _detect_ and _recognize_ printed links for us and also to make them _clickable_. This way you'll do just one click instead of multiple keystrokes. Your operational complexity goes from `O(N)` to `O(1)`.

![Links Detector Demo](https://raw.githubusercontent.com/trekhleb/links-detector/master/articles/printed_links_detection/assets/03-links-detector-demo.gif)

**Solution requirements**:

- Detection and recognition should have **close**-to-real-time performance (i.e. at least 0.5-1 frames per second)
- Only English text for now
- Only black text on a white background for now
- Only `https://` links for now (not `http://`, not `ftp://`, ...)

## 🧩 Solution Breakdown

- Technical details on how the solution may be achieved
- Why TensorFlow, why the Object Detection API
- Problems that need to be solved (custom objects, well-known models don't work, small objects, real-time)
- Serverless solution that works on mobile (why no backend? why a lightweight model?)
- Review of possible models, why MobileNet
- The issue with the dataset (there is none)
- I'm just learning and don't have much experience, wanted to experiment



Let's assume that we want to achieve close-to-real-time performance.

The task of finding the links in the image and making them clickable may be split into two parts:

1. Links **detection** (finding the position of the links)
2. Links **recognition** (recognizing the text of the links)
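
The two-part split above can be sketched roughly as follows. This is only an illustration of the control flow: `find_links`, `fake_detect` and `fake_recognize` are hypothetical names, and the two stand-in functions take the place of the real detection model and the OCR engine.

```python
from typing import Callable, List, NamedTuple


class Box(NamedTuple):
    """Bounding box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def find_links(
    image,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> List[dict]:
    """Run detection first, then recognition only inside the detected boxes."""
    links = []
    for box in detect(image):            # part 1: where are the links?
        text = recognize(image, box)     # part 2: what do they say?
        if text.startswith("https://"):  # requirement: only https:// links for now
            links.append({"box": box, "url": text})
    return links


# Stand-ins so the sketch runs without a model or an OCR engine (hypothetical).
def fake_detect(image) -> List[Box]:
    return [Box(10, 20, 200, 40), Box(10, 60, 200, 80)]


def fake_recognize(image, box: Box) -> str:
    return "https://tensorflow.org/" if box.top == 20 else "not a link"


result = find_links(None, fake_detect, fake_recognize)
print(result)
```

Keeping the two stages behind separate callables means the expensive recognition step only runs on the small regions that the detector returns.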

## 📝 Creating the Dataset Manually

- Making pictures of the book
- What tools to use to add bounding boxes
- How to convert to protobuf
- Issues with custom dataset (fonts, colors, bolds, underlined, etc.)
- Train/test split approach
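
One possible train/test split approach is a seeded shuffle of the image file names. The sketch below is an assumption, not necessarily the approach the final article takes; the `split_dataset` helper, the 20% test fraction and the file names are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    filenames: Sequence[str], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Shuffle file names deterministically and split them into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = sorted(filenames)           # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


images = [f"IMG_{i:04d}.jpg" for i in range(10)]  # hypothetical file names
train_files, test_files = split_dataset(images)
print(len(train_files), len(test_files))
```

Fixing the seed means the same images end up in the test set on every run, which keeps evaluation results comparable between training attempts.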

### 🌅 Preprocessing the data

- Data preprocessing: resize, crop square, color adjustment
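
The crop and resize steps could look like the sketch below. It assumes the image object behaves like a Pillow `Image` (with `.size`, `.crop()` and `.resize()`); the target size of 1024 pixels is a hypothetical value and the color adjustment step is left out.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def preprocess(image, size: int = 1024):
    """Crop to a centered square, then resize to size x size.

    `image` is expected to behave like a Pillow `Image`; Pillow itself is
    not imported here, so any object with the same methods will work.
    """
    width, height = image.size
    return image.crop(center_square_box(width, height)).resize((size, size))


# A 4032x3024 photo loses 504 pixels on the left and on the right.
print(center_square_box(4032, 3024))
```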

### 🔖 Labeling the dataset

- How to use LabelImg
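
LabelImg can save annotations as Pascal VOC XML files, one per image. Assuming that format is used, the boxes could be read back with the standard library as sketched below; the `https` label name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical annotation in the Pascal VOC layout that LabelImg writes.
SAMPLE_XML = """
<annotation>
  <filename>IMG_0001.jpg</filename>
  <size><width>1024</width><height>1024</height><depth>3</depth></size>
  <object>
    <name>https</name>
    <bndbox><xmin>120</xmin><ymin>340</ymin><xmax>260</xmax><ymax>372</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text: str) -> Dict:
    """Extract the image size and all labeled boxes from one VOC XML document."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bndbox.findtext("xmin")),
            "ymin": int(bndbox.findtext("ymin")),
            "xmax": int(bndbox.findtext("xmax")),
            "ymax": int(bndbox.findtext("ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


annotation = parse_voc_annotation(SAMPLE_XML)
print(annotation)
```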

### 🗜 Exporting the dataset

- Protobuf (the way of storing the dataset)
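
The Object Detection API reads datasets from TFRecord files, where each record is a `tf.train.Example` protocol buffer with a fixed set of feature keys and box coordinates normalized to the `[0, 1]` range. The sketch below only prepares those values as a plain dictionary; the `to_example_features` helper and the sample data are hypothetical, and the `image/encoded` and `image/format` features that carry the image bytes are omitted.

```python
from typing import Dict, List


def to_example_features(
    filename: str,
    width: int,
    height: int,
    boxes: List[Dict],
    label_ids: Dict[str, int],
) -> Dict[str, list]:
    """Build the per-image feature values, normalizing pixel boxes to [0, 1]."""
    return {
        "image/filename": [filename],
        "image/width": [width],
        "image/height": [height],
        "image/object/bbox/xmin": [box["xmin"] / width for box in boxes],
        "image/object/bbox/xmax": [box["xmax"] / width for box in boxes],
        "image/object/bbox/ymin": [box["ymin"] / height for box in boxes],
        "image/object/bbox/ymax": [box["ymax"] / height for box in boxes],
        "image/object/class/text": [box["label"] for box in boxes],
        "image/object/class/label": [label_ids[box["label"]] for box in boxes],
    }


features = to_example_features(
    filename="IMG_0001.jpg",
    width=1024,
    height=1024,
    boxes=[{"label": "https", "xmin": 128, "ymin": 256, "xmax": 512, "ymax": 320}],
    label_ids={"https": 1},  # class ids start at 1; 0 is reserved for background
)
print(features)
```

To produce the actual dataset file, each dictionary would then be wrapped into a `tf.train.Example` and written with `tf.io.TFRecordWriter`.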

## 📚 Generating the Dataset Automatically (?)

- Automated way of generating the dataset
- Train/test split approach

## 📖 Exploring the Dataset

- Preview images with detection boxes
- Number of images (why is this enough)
- Do we need to preprocess the images

## 🛠 Installing Object Detection API 

- What the Object Detection API is
- Why it will simplify our lives
- How it may be used

```bash
!git clone --depth 1 https://github.com/tensorflow/models
```


```bash
ls -la models
```

```text
total 72
drwxr-xr-x  8 root root  4096 Nov 21 17:22 ./
drwxr-xr-x  1 root root  4096 Nov 21 17:24 ../
-rw-r--r--  1 root root   337 Nov 21 17:22 AUTHORS
-rw-r--r--  1 root root  1015 Nov 21 17:22 CODEOWNERS
drwxr-xr-x  2 root root  4096 Nov 21 17:22 community/
-rw-r--r--  1 root root   390 Nov 21 17:22 CONTRIBUTING.md
drwxr-xr-x  8 root root  4096 Nov 21 17:22 .git/
drwxr-xr-x  3 root root  4096 Nov 21 17:22 .github/
-rw-r--r--  1 root root  1104 Nov 21 17:22 .gitignore
-rw-r--r--  1 root root  1115 Nov 21 17:22 ISSUES.md
-rw-r--r--  1 root root 11405 Nov 21 17:22 LICENSE
drwxr-xr-x 12 root root  4096 Nov 21 17:22 official/
drwxr-xr-x  3 root root  4096 Nov 21 17:22 orbit/
-rw-r--r--  1 root root  3668 Nov 21 17:22 README.md
drwxr-xr-x 23 root root  4096 Nov 21 17:22 research/
```


```bash
%%bash
cd ./models/research
protoc object_detection/protos/*.proto --python_out=.
```

```bash
%%bash
cd ./models/research
cp ./object_detection/packages/tf2/setup.py .
pip install . --quiet
```

```text
ERROR: multiprocess 0.70.10 has requirement dill>=0.3.2, but you'll have dill 0.3.1.1 which is incompatible.
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.0 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
ERROR: apache-beam 2.25.0 has requirement avro-python3!=1.9.2,<1.10.0,>=1.8.1; python_version >= "3.0", but you'll have avro-python3 1.10.0 which is incompatible.
```


## ⬇️ Downloading Pre-Trained Model

- Detection Model Zoo review
- Which models we could possibly use
- Why I've picked the MobileNet model
- Diagram of the model architecture
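
Pre-trained models from the zoo are distributed as `.tar.gz` archives. The sketch below shows one way to download and unpack such an archive with the standard library. Both the base URL (including the date folder) and the concrete MobileNet variant are assumptions made for illustration, so check the zoo page for the current links.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed values: verify them against the TF2 Detection Model Zoo page.
ZOO_BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"
MODEL_NAME = "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8"


def model_archive_url(model_name: str, base_url: str = ZOO_BASE_URL) -> str:
    """Build the download URL of a model archive."""
    return f"{base_url}/{model_name}.tar.gz"


def download_model(model_name: str, cache_dir: str = "cache") -> Path:
    """Download the archive (unless it is cached) and unpack it next to it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    archive = cache / f"{model_name}.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(model_archive_url(model_name), str(archive))
    with tarfile.open(archive) as tar:
        tar.extractall(cache)
    return cache / model_name


# Only the URL is built here; call download_model(MODEL_NAME) to fetch the files.
print(model_archive_url(MODEL_NAME))
```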

## 🏄🏻‍♂️ Trying the Model (Inference)

- Show that model works for general purpose classes
- Show that model doesn't work for custom objects (links)

## 📈 Setting Up TensorBoard

- Why do we need it (for debugging)
- What we will monitor

## 👨‍🎓 Transfer Learning

- What is transfer learning
- Why don't we train the model from scratch
- Allows us to use a small dataset

### ⚙️ Configuring the Detection Pipeline

- Performance issues: batch size
- Starting not from scratch: checkpoints
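
Both settings live in the `pipeline.config` text file that ships with each pre-trained model. A minimal sketch of patching it is shown below, assuming the usual field names (`num_classes`, `batch_size`, `fine_tune_checkpoint`, `fine_tune_checkpoint_type`); the sample config is shortened and the concrete values are hypothetical.

```python
import re

# Shortened, hypothetical excerpt of a pipeline.config file.
SAMPLE_CONFIG = """
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
eval_config {
  batch_size: 1
}
"""


def configure_pipeline(
    config_text: str, num_classes: int, batch_size: int, checkpoint_path: str
) -> str:
    """Patch the fields needed for fine-tuning on a custom dataset."""
    # count=1 touches only the first match; this assumes train_config comes before eval_config.
    config_text = re.sub(
        r"num_classes: \d+", lambda _: f"num_classes: {num_classes}", config_text, count=1
    )
    config_text = re.sub(
        r"batch_size: \d+", lambda _: f"batch_size: {batch_size}", config_text, count=1
    )
    # Start from the pre-trained weights instead of training from scratch.
    config_text = re.sub(
        r'fine_tune_checkpoint: ".*?"',
        lambda _: f'fine_tune_checkpoint: "{checkpoint_path}"',
        config_text,
    )
    # Restore the whole detection model, not only the classification backbone.
    config_text = re.sub(
        r'fine_tune_checkpoint_type: ".*?"',
        lambda _: 'fine_tune_checkpoint_type: "detection"',
        config_text,
    )
    return config_text


patched = configure_pipeline(
    SAMPLE_CONFIG, num_classes=1, batch_size=16, checkpoint_path="cache/model/checkpoint/ckpt-0"
)
print(patched)
```

A smaller batch size lowers memory use at the cost of noisier gradient updates, which is usually the first knob to turn when training runs out of memory.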

### 🏋🏻‍♂️ Model Training

- Error prone: saving checkpoints
- How many epochs
- Monitoring the performance while training
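
The Object Detection API counts training in steps (batches) rather than epochs, and training is started with the `model_main_tf2.py` script from the cloned repository. The sketch below converts epochs to steps and assembles the command line; the dataset size, batch size, epoch count and directory names are hypothetical.

```python
import math
from typing import List


def steps_for_epochs(num_images: int, batch_size: int, epochs: int) -> int:
    """One epoch is ceil(num_images / batch_size) steps."""
    return math.ceil(num_images / batch_size) * epochs


def training_command(
    pipeline_config_path: str, model_dir: str, num_train_steps: int
) -> List[str]:
    """Arguments for launching training, e.g. via subprocess.run(...)."""
    return [
        "python",
        "models/research/object_detection/model_main_tf2.py",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",  # checkpoints and TensorBoard logs are written here
        f"--num_train_steps={num_train_steps}",
    ]


steps = steps_for_epochs(num_images=100, batch_size=16, epochs=20)
command = training_command("pipeline.config", "logs", steps)
print(steps)
print(" ".join(command))
```

Because checkpoints are written to `--model_dir`, pointing a restarted run at the same directory lets training continue from the last saved checkpoint.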

### 🚀 Evaluating the Model

- Checking how accurate our model is on test dataset
- Are we good with performance, should we save the model?
- It is not a general-purpose model anymore: does it recognize our custom objects?
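
Detection accuracy metrics such as COCO mAP are built on the intersection over union (IoU) between a predicted box and a ground-truth box. A minimal sketch of that measure is shown below, assuming boxes in the `(ymin, xmin, ymax, xmax)` order; it is an illustration and not the evaluation code of the Object Detection API.

```python
from typing import Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_height = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_width = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_height <= 0 or inter_width <= 0:
        return 0.0  # the boxes do not overlap
    intersection = inter_height * inter_width
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)


predicted = (0, 1, 2, 3)
ground_truth = (0, 0, 2, 2)
print(iou(predicted, ground_truth))
```

A prediction is typically counted as correct when its IoU with a ground-truth box is above a threshold such as 0.5.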

## 🗜 Exporting the Model

- Saving the model to the file for further re-use
- Show the list of files, how the model looks on disk
- What the size of the model is
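
Exporting is done with the `exporter_main_v2.py` script from the cloned repository. The sketch below assembles its command line and lists the files of a directory with their sizes. The directory names are hypothetical, and the demo runs against a temporary directory with fake files instead of a real exported model.

```python
import os
import tempfile
from typing import Dict, List


def export_command(
    pipeline_config_path: str, trained_checkpoint_dir: str, output_directory: str
) -> List[str]:
    """Arguments for exporting a trained checkpoint as a SavedModel."""
    return [
        "python",
        "models/research/object_detection/exporter_main_v2.py",
        "--input_type=image_tensor",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--trained_checkpoint_dir={trained_checkpoint_dir}",
        f"--output_directory={output_directory}",
    ]


def list_files_with_sizes(root: str) -> Dict[str, int]:
    """Map every file under `root` (relative path) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, root)] = os.path.getsize(path)
    return sizes


# Stand-in directory with fake files, only to make the sketch runnable.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "saved_model"))
    with open(os.path.join(tmp, "saved_model", "saved_model.pb"), "wb") as f:
        f.write(b"\0" * 2048)
    with open(os.path.join(tmp, "pipeline.config"), "wb") as f:
        f.write(b"\0" * 512)
    demo_sizes = list_files_with_sizes(tmp)

total_kb = sum(demo_sizes.values()) / 1024
print(demo_sizes, f"total: {total_kb:.1f} KB")
```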

## 🚀 Evaluating the Exported Model

- Example of how to use the trained model
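
An exported detection model returns, among other things, `detection_boxes` (normalized `[ymin, xmin, ymax, xmax]` values) and `detection_scores`. The sketch below shows one way to post-process such output with plain Python lists: drop low-confidence detections and convert the rest to pixel coordinates. The `filter_detections` helper, the 0.5 threshold and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def filter_detections(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    image_width: int,
    image_height: int,
    min_score: float = 0.5,
) -> List[Dict]:
    """Keep confident detections and scale their boxes to pixel coordinates."""
    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip detections the model is not confident about
        results.append({
            "score": score,
            "left": round(xmin * image_width),
            "top": round(ymin * image_height),
            "right": round(xmax * image_width),
            "bottom": round(ymax * image_height),
        })
    return results


# Hypothetical model output for a 640x480 frame.
sample_boxes = [[0.125, 0.25, 0.25, 0.75], [0.5, 0.5, 0.75, 0.75]]
sample_scores = [0.9, 0.3]
detections = filter_detections(sample_boxes, sample_scores, image_width=640, image_height=480)
print(detections)
```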

## 🗜 Converting the Model for Web

- What formats are suitable for the web
- A few words about TensorFlow.js
- Show the list of exported files: how the model looks on disk
- What the size of the model is
- Why it is split into chunks and how they are connected (via `model.json`)
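
The `tensorflowjs` pip package provides the `tensorflowjs_converter` command, which writes a `model.json` file plus binary weight shards. The shards are tied together by the `weightsManifest` section of `model.json`. The sketch below assembles the converter command and lists the shard files from a manifest; the directory names and the sample manifest are hypothetical.

```python
import json
from typing import List


def conversion_command(saved_model_dir: str, output_dir: str) -> List[str]:
    """Arguments for converting a SavedModel to a TensorFlow.js graph model."""
    return [
        "tensorflowjs_converter",
        "--input_format=tf_saved_model",
        "--output_format=tfjs_graph_model",
        saved_model_dir,
        output_dir,
    ]


def weight_shards(model_json_text: str) -> List[str]:
    """Return the weight file names referenced by model.json, in loading order."""
    manifest = json.loads(model_json_text)["weightsManifest"]
    return [path for group in manifest for path in group["paths"]]


# Hypothetical, heavily shortened model.json content.
SAMPLE_MODEL_JSON = json.dumps({
    "format": "graph-model",
    "weightsManifest": [
        {
            "paths": ["group1-shard1of3.bin", "group1-shard2of3.bin", "group1-shard3of3.bin"],
            "weights": [],
        }
    ],
})

shards = weight_shards(SAMPLE_MODEL_JSON)
print(" ".join(conversion_command("exported/saved_model", "web_model")))
print(shards)
```

In the browser only the URL of `model.json` is passed to the loader, and the shard files are then fetched relative to it.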

```bash
pip install tensorflowjs --quiet
```

## 🤔 Conclusions

- I'm just an amateur
- Links to the demo app
- Issues and limitations of this approach
- Links to my ML repositories that readers might like