First, install Tesseract, the OCR engine that the script uses to extract the data, along with its Python wrapper pytesseract, in the Colab notebook.

In [0]:
!sudo apt install tesseract-ocr
!pip install pytesseract
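
Before moving on, it can help to confirm that the install cell actually put a tesseract executable on the PATH. The sketch below uses only the standard library; the helper name tesseract_available is hypothetical and not part of this notebook.

```python
import shutil

def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on the PATH."""
    return shutil.which("tesseract") is not None

# Prints True after a successful install, False otherwise.
print("tesseract found:", tesseract_available())
```

If this prints False, rerun the install cell before continuing.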

Import the libraries the script uses and define the extractor class, SpeedExtractor, which does the actual work.

In [0]:
import cv2
import numpy as np
import pandas as pd
import pytesseract
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
import random
import json

class SpeedExtractor:
  def __init__(self, video_file):
    self.video_file = video_file
    self.capture = cv2.VideoCapture(video_file)
    self.data_array = []

  def get_frames_quantity(self):
    return self.capture.get(cv2.CAP_PROP_FRAME_COUNT)

  def set_capture_to_frame(self, frame):
    self.capture.set(cv2.CAP_PROP_POS_FRAMES, frame)
  
  def get_random_frame(self):
    current_frame_position = self.capture.get(cv2.CAP_PROP_POS_FRAMES)
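    # The frame count comes back as a float and randint is inclusive, so cast and stop at the last valid index
    random_frame_position = random.randint(0, max(int(self.get_frames_quantity()) - 1, 0))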
    self.set_capture_to_frame(random_frame_position)
    success, image = self.capture.read()
    self.set_capture_to_frame(current_frame_position)
    return image

  def set_roi(self, x0, x1, y0, y1):   #region of interest
    self.x0 = x0
    self.x1 = x1
    self.y0 = y0
    self.y1 = y1

  def get_roi(self, image):   #region of interest
    return image[self.y0:self.y1, self.x0:self.x1]

  def pre_process_image(self, image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((1, 1), np.uint8)
    image = cv2.dilate(image, kernel, iterations=1)
    image = cv2.erode(image, kernel, iterations=1)
    ret, image = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV)
    return image

  def get_speed_from_image(self, image):
    speed = pytesseract.image_to_string(image, config='--psm 10 --oem 3 digits')
    try:
      speed = int(speed)
    except ValueError:  # OCR returned no digits or non-numeric text
      speed = ''
    return speed

  def extract_speed(self, frame):
    roi = self.get_roi(frame)
    pre_processed_roi = self.pre_process_image(roi)
    speed = self.get_speed_from_image(pre_processed_roi)
    return speed, pre_processed_roi

  def export_xls(self, filename):
    print(self.data_array)
    pd.DataFrame(self.data_array).to_excel(filename + ".xlsx")

  def run(self, interval, decimal_digits):
    last_time = 0.0
    self.set_capture_to_frame(0)
    while self.capture.isOpened():
      ret, current_frame = self.capture.read()
      if ret:
        time = self.capture.get(cv2.CAP_PROP_POS_MSEC)/1000
        if time - last_time > interval:
          speed, analyzed_image = self.extract_speed(current_frame)
          print("-------------")
          cv2_imshow(analyzed_image)
          
          rounded_time = round(time, decimal_digits)
          print("Time:", rounded_time)
          print("Speed:", speed)

          data_object = {
            'time': rounded_time,
            'speed': speed
          }
          self.data_array.append(data_object)
          
          last_time = time
        cv2.waitKey(1) & 0xff
      else:
        break
    print("-- JSON DATA REPORT --")
    print(self.data_array)

print("OK!")
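
The last step of pre_process_image is an inverted binary threshold at 130. As a rough illustration of that rule only, the NumPy sketch below maps pixels brighter than the threshold to 0 and everything else to 255, which is how THRESH_BINARY_INV is defined. The function name invert_threshold is hypothetical, and this is not a replacement for the cv2.threshold call in the class.

```python
import numpy as np

def invert_threshold(gray, thresh=130, max_value=255):
    """Pixels above `thresh` become 0; all others become `max_value`."""
    return np.where(gray > thresh, 0, max_value).astype(np.uint8)

# One row of grayscale values around the threshold.
sample = np.array([[0, 130, 131, 255]], dtype=np.uint8)
print(invert_threshold(sample))  # → [[255 255   0   0]]
```

If the digits in your video come out too faint or too thick, the threshold value is probably the first thing to adjust.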

Upload the video you want to extract data from to the (Colab) notebook: drag it into the Files panel on the left side and wait for the upload to finish.
<br/>
![How to import file on notebook](https://github.com/ViniciusGambi/jupyter-waze-speed-extractor/blob/master/.github/import.JPG?raw=true)

Change the file name passed to SpeedExtractor. After running the cell you should see a random frame of the video. If no frame is displayed, the script was unable to access the file, so check that the upload has finished.

In [0]:
extractor = SpeedExtractor('file_example.mp4')

first_frame = extractor.get_random_frame()
graph = plt.imshow(cv2.cvtColor(first_frame, cv2.COLOR_BGR2RGB))
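
cv2.VideoCapture does not raise an error when the file is missing; the failure only shows up later, when no frame can be read. A small standard-library check, sketched below, can make that failure explicit before the extractor is created. The helper name check_video_file is hypothetical.

```python
from pathlib import Path

def check_video_file(path):
    """Return the file size in bytes, or raise FileNotFoundError if the file is missing."""
    video = Path(path)
    if not video.is_file():
        raise FileNotFoundError(f"{path} not found; has the upload finished?")
    return video.stat().st_size

try:
    print("Video size in bytes:", check_video_file("file_example.mp4"))
except FileNotFoundError as error:
    print(error)
```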

Put the x0, x1, y0 and y1 coordinates of the image area that contains only the speed into their respective variables. You can rerun this cell as many times as needed until the crop is right. The image below shows what an ideal result looks like.
<br/>
![Goal Image](https://github.com/ViniciusGambi/jupyter-waze-speed-extractor/blob/master/.github/example.png?raw=true)



In [0]:
x0 = 55
x1 = 136
y0 = 1109
y1 = 1170

first_frame = extractor.get_random_frame()
extractor.set_roi(x0, x1, y0, y1)
first_frame_roi = extractor.get_roi(first_frame)
height, width, channels = first_frame.shape
print("Frame dimensions:", width, height)
cv2_imshow(first_frame_roi)
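
Note that get_roi slices rows (y) first and columns (x) second, and that NumPy silently clips a slice that runs past the edge of the frame, which can leave you with an empty or truncated crop. The sketch below shows the same slicing with a bounds check, using a synthetic array in place of a real frame. The function name crop_roi and the 720x1280 frame size are assumptions made for illustration.

```python
import numpy as np

def crop_roi(image, x0, x1, y0, y1):
    """Crop image[y0:y1, x0:x1] after checking that the box lies inside the frame."""
    height, width = image.shape[:2]
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"ROI ({x0}, {x1}, {y0}, {y1}) does not fit a {width}x{height} frame")
    return image[y0:y1, x0:x1]

# Synthetic stand-in for a 720-wide, 1280-tall BGR video frame (assumed size).
frame = np.zeros((1280, 720, 3), dtype=np.uint8)
roi = crop_roi(frame, 55, 136, 1109, 1170)
print(roi.shape)  # → (61, 81, 3)
```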

Set the time interval (in seconds) between samples and the number of decimal digits to keep in the time attribute. Then run the cell and wait until the whole video has been analyzed.

In [0]:
extract_interval = 0.1
decimal_digits_of_time = 1

extractor.run(extract_interval, decimal_digits_of_time)
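
To see how the interval interacts with the video's frame rate, the sketch below mimics the sampling rule used in run(): a frame is kept once more than the interval has passed since the last kept frame. The function name sampled_times and the 25 fps timestamps are assumptions made for illustration; real timestamps come from the video itself.

```python
def sampled_times(frame_times, interval, decimal_digits):
    """Keep a frame once more than `interval` seconds have passed since the last kept frame."""
    kept = []
    last_time = 0.0
    for frame_time in frame_times:
        if frame_time - last_time > interval:
            kept.append(round(frame_time, decimal_digits))
            last_time = frame_time
    return kept

# Hypothetical 25 fps video: frame i is stamped at i / 25 seconds.
frame_times = [i / 25 for i in range(26)]
print(sampled_times(frame_times, 0.1, 1))
```

With these assumed timestamps the kept frames end up 0.12 s apart, because samples can only land on whole frames, so some rounded times (0.3 and 0.9 here) never appear in the output. Keep this in mind if you expect evenly spaced time values.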

Run this cell to export the data to an Excel (.xlsx) file. Change the filename to whatever you like.

In [0]:
extractor.export_xls('filename')

Refresh the Files panel and download the .xlsx file (right-click the file and choose Download).
<br/>
![](https://github.com/ViniciusGambi/jupyter-waze-speed-extractor/blob/master/.github/download.JPG?raw=true)

Done! If you made it this far, you will have something like this.
<br/>
![XLS File](https://github.com/ViniciusGambi/jupyter-waze-speed-extractor/blob/master/.github/spreadsheet.JPG?raw=true)

Now you can work with this data however you need: filtering, plotting, and so on.
<br/>
![Plot example](https://github.com/ViniciusGambi/jupyter-waze-speed-extractor/blob/master/.github/calibration.png?raw=true)
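
As a starting point for filtering, the sketch below drops the samples where OCR failed. In this notebook those samples carry an empty string as the speed. The records are made-up values in the same shape as extractor.data_array, not real measurements.

```python
import pandas as pd

# Hypothetical records shaped like extractor.data_array;
# an empty string marks a frame where OCR returned no number.
records = [
    {"time": 0.1, "speed": 42},
    {"time": 0.2, "speed": ""},
    {"time": 0.4, "speed": 44},
]

df = pd.DataFrame(records)
# Turn anything that is not a number (here the empty string) into NaN.
df["speed"] = pd.to_numeric(df["speed"], errors="coerce")
# Drop the failed samples and renumber the rows.
clean = df.dropna(subset=["speed"]).reset_index(drop=True)
print(clean)
```

The same steps should work on the exported spreadsheet after loading it into a DataFrame.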

I hope this helps!

<br/>
Developed by Vinicius Gambi.