<a href="https://colab.research.google.com/github/adriaanslechten/colabs/blob/main/build_a_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## source: https://colab.research.google.com/github/Giffy/CarCrashDetector/blob/master/1_Building_a_Dataset.ipynb

# Building a Dataset

Building a dataset is the first challenge. Sometimes large datasets are ready to download (for example at https://www.kaggle.com, or from public organizations such as http://governobert.gencat.cat/en/dades_obertes/). Unfortunately, no suitable dataset was available. Where could we get thousands of videos recorded with a dashboard camera? On www.youtube.com.

The first task was to download the candidate videos so that the images could be processed and the data homogenized.
We used the OpenCV library to extract the frames and scikit-image to modify and resize them. Frames are resized to a width of 640 pixels and converted from color to grayscale.
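
The width-based resize described above can be sketched as a small helper. This is a minimal illustration that assumes the 640-pixel target width stated in the text; the function name `target_shape` is hypothetical and does not appear in the original notebook.

```python
import math

def target_shape(height, width, target_width=640):
    """Return (rows, cols) for a frame scaled to target_width, keeping the aspect ratio."""
    scale = target_width / width
    return (math.floor(height * scale), target_width)

# A 1280x720 frame is halved in both dimensions.
print(target_shape(720, 1280))  # (360, 640)
```

The returned tuple is in (rows, columns) order, which is the order scikit-image's `resize` expects for its output shape.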

### **INPUT required**: a video in MP4 format stored in a Google Drive folder called "content"

# Index
<ol>
    <li><a href="#env_setup">Environment setup</a></li>
    <li><a href="#drive_setup">Connection to Google Drive</a></li>
    <li><a href="#variables">Constants and variables</a></li>
    <li><a href="#video_frame">Transform videos to frames</a></li>
</ol>

<a id="env_setup"> </a>
## 1. Environment setup and library import

In [None]:
!pip install --upgrade pip > /dev/null
!pip install scikit-image==0.13.1
!pip install opencv-python==3.4.0.12

Collecting scikit-image==0.13.1
  Downloading scikit-image-0.13.1.tar.gz (26.1 MB)
     |████████████████████████████████| 26.1 MB 113 kB/s 
Building wheels for collected packages: scikit-image
  Building wheel for scikit-image (setup.py) ... error
  ERROR: Failed building wheel for scikit-image
  Running setup.py clean for scikit-image
Failed to build scikit-image
Installing collected packages: scikit-image
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.16.2
    Uninstalling scikit-image-0.16.2:
      Successfully uninstalled scikit-image-0.16.2
    Running setup.py install for scikit-image ... error
  Rolling back uninstall of scikit-image
  Moving to /usr/local/bin/skivi
   from /tmp/pip-uninstall-v4tisjnh/skivi
  Moving to /usr/local/lib/python3.7/dist-packages/scikit_image-0.16.2.dist-info/
   from /usr/local/lib/python3.7/dist-packages/~cikit_image-0.16.2.dist-info
  Moving to /usr/local/lib/python3.7/dist-

In [None]:
import cv2
from skimage.color import rgb2gray
from skimage.transform import resize
import matplotlib.pyplot as plt
import math

<a id="drive_setup"> </a>
## 2. Link Google Drive with Colab

Run the code and follow the link to get an authentication key, then copy it and paste it into the box that appears in the notebook. After the first key, the script will ask for a second authentication key; repeat the same process.

Source : https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d 

In [None]:
# Check whether the link to Drive is already set up
google = !if [ -d 'GDrive/' ]; then echo "1" ; else echo "0"; fi
if google[0] == '0':                      # compare by value; `is` tests object identity
  from google.colab import drive
  drive.mount('/content/GDrive/')
!if [ -d 'GDrive/' ]; then echo "Connection to Google drive successful" ; else echo "Error connecting to Google drive"; fi

Connection to Google drive successful


<a id="variables"> </a>
## 3. Constants and variables

In [None]:
# Make temporary directories in Google Colab
!mkdir -p /content/pushups/Sources/frames
!mkdir -p /content/pushups/frames


# Copy the videos stored in the Google Drive folder "content" to Colab
!cp GDrive/My\ Drive/content/*.mp4 pushups/Sources/

In [None]:
INPUT_VIDEOS_PATH = 'pushups/Sources'              # Path to folder with videos 
OUTPUT_FRAMES_PATH = 'pushups/frames'      # Location of extracted images

frame_name = 'frame'                                                  # Prefix for saved frame files
one_frame_each = 4                                                    # Keep one frame out of every 4


!if [ -d {OUTPUT_FRAMES_PATH} ]; then echo "Output to be stored in "{OUTPUT_FRAMES_PATH} ; else mkdir {OUTPUT_FRAMES_PATH} && echo "Output directory created"; fi

files = !ls {INPUT_VIDEOS_PATH}/*.mp4                                 # Video file names in INPUT_VIDEOS_PATH
videofile = files[0]                                                  # Only the first video in the list is processed

Output to be stored in pushups/frames
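
The `!ls` call above only works inside IPython. As a hedged alternative, the same listing can be sketched in plain Python with `pathlib`; the helper name `list_videos` is hypothetical and not part of the original notebook, and the folder path is the one assumed by the cell above.

```python
from pathlib import Path

def list_videos(folder, pattern="*.mp4"):
    """Return the matching video paths in folder as sorted strings (empty list if none)."""
    return sorted(str(p) for p in Path(folder).glob(pattern))

videos = list_videos("pushups/Sources")
if videos:
    print("First video:", videos[0])
else:
    print("No .mp4 files found")
```

Checking for an empty list before indexing avoids an `IndexError` when the folder contains no videos.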


<a id="video_frame"> </a>
## 4. Transform videos to frames

In [None]:
count = 0

vidcap = cv2.VideoCapture(videofile)
while True:
    success, image = vidcap.read()                                    # reads next frame
    if not success:                                                   # end of video or read error: stop before touching `image`
        break
    if count % one_frame_each == 0:                                   # keeps one frame out of every one_frame_each
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)            # OpenCV returns BGR; rgb2gray expects RGB
        image_gray = rgb2gray(image_rgb)                              # grayscale image
        new_height = math.floor(640 / image_gray.shape[1] * image_gray.shape[0])
        tmp = resize(image_gray, (new_height, 640), mode='constant')  # resize to 640 px width, keeping aspect ratio
        plt.imsave(f"{OUTPUT_FRAMES_PATH}/{frame_name}{count}.png", tmp, cmap=plt.cm.gray)  # saves image to frame folder
        print(count, end=" ")
    count += 1

vidcap.release()                                                      # free the video file handle

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 104 108 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220 224 228 232 236 240 244 248 252 256 260 264 268 272 276 280 284 288 292 296 300 304 308 312 316 320 324 328 332 336 340 344 348 352 356 360 364 368 372 376 380 384 388 392 396 400 404 408 412 416 420 424 428 432 436 440 444 448 452 456 460 464 468 472 476 480 484 488 492 496 500 504 508 512 516 520 524 528 532 536 540 544 548 552 556 560 564 568 572 576 580 584 588 592 596 600 604 608 612 616 620 624 628 632 636 640 644 648 652 656 660 664 668 672 676 680 684 688 692 696 700 704 708 712 716 720 724 728 732 736 740 744 748 752 756 760 764 768 772 776 780 784 788 792 796 800 804 808 812 816 820 824 828 832 836 840 844 848 852 856 860 864 868 872 876 880 884 888 892 896 900 904 908 912 916 920 924 928 932 936 940 944 948 952 956 960 964 968 972 976 980 984 988 992 996 1000 1004 1008 1012 1016 102

SystemError: ignored

In [None]:
!mkdir -p GDrive/My\ Drive/content/frames
!cp pushups/frames/* GDrive/My\ Drive/content/frames

In [None]:
!ls GDrive/My\ Drive/content/frames

frame0.png     frame2596.png  frame4196.png  frame5792.png  frame7392.png
frame1000.png  frame2600.png  frame4200.png  frame5796.png  frame7396.png
frame1004.png  frame2604.png  frame4204.png  frame5800.png  frame7400.png
frame1008.png  frame2608.png  frame4208.png  frame5804.png  frame7404.png
frame100.png   frame260.png   frame420.png   frame5808.png  frame7408.png
frame1012.png  frame2612.png  frame4212.png  frame580.png   frame740.png
frame1016.png  frame2616.png  frame4216.png  frame5812.png  frame7412.png
frame1020.png  frame2620.png  frame4220.png  frame5816.png  frame7416.png
frame1024.png  frame2624.png  frame4224.png  frame5820.png  frame7420.png
frame1028.png  frame2628.png  frame4228.png  frame5824.png  frame7424.png
frame1032.png  frame2632.png  frame4232.png  frame5828.png  frame7428.png
frame1036.png  frame2636.png  frame4236.png  frame5832.png  frame7432.png
frame1040.png  frame2640.png  frame4240.png  frame5836.png  frame7436.png
frame1044.png  frame2644.png  frame4244
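
The listing above is in lexicographic order, so `frame100.png` appears after `frame1008.png`. If the frames need to be processed in temporal order, a natural sort key is one option. The following is a minimal sketch; the helper name `natural_key` and the sample file names are illustrative assumptions, not part of the original notebook.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so 'frame100' sorts before 'frame1000'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", filename)]

names = ["frame1000.png", "frame0.png", "frame100.png", "frame1004.png"]
print(sorted(names, key=natural_key))
# ['frame0.png', 'frame100.png', 'frame1000.png', 'frame1004.png']
```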