# Data preparation

## 1. Prepare raw data

Download raw data.

In [1]:
!mkdir -p datasets
!wget https://raw.githubusercontent.com/ifzhang/FairMOT/master/videos/MOT16-03.mp4 -O datasets/MOT16-03.mp4

--2023-05-18 15:14:26--  https://raw.githubusercontent.com/ifzhang/FairMOT/master/videos/MOT16-03.mp4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25038034 (24M) [application/octet-stream]
Saving to: ‘datasets/MOT16-03.mp4’


2023-05-18 15:14:27 (267 MB/s) - ‘datasets/MOT16-03.mp4’ saved [25038034/25038034]
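
As a quick sanity check, the size of the downloaded file can be compared with the `Length` that `wget` reported above (25038034 bytes). This is a minimal sketch, not part of the original notebook; the helper name `check_file_size` is hypothetical.

```python
import os

def check_file_size(path, expected_bytes):
    """Return True if `path` is a file that is exactly `expected_bytes` long."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

# 25038034 is the Length value taken from the wget log above.
video_path = "datasets/MOT16-03.mp4"
if os.path.exists(video_path):
    print(check_file_size(video_path, 25038034))
```

A mismatch usually points to an interrupted or truncated download, in which case re-running the `wget` cell is the simplest fix.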



Split the video into clips of up to 200 frames each.

In [4]:
!pip install opencv-python

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting opencv-python
  Downloading opencv_python-4.7.0.72-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.8/61.8 MB 33.4 MB/s eta 0:00:00
Installing collected packages: opencv-python
Successfully installed opencv-python-4.7.0.72


In [5]:
import os
import cv2

def split_video(video_path, clip_dir="./datasets/clips", frames_per_clip=200):
    """Split a video into consecutive clips of at most `frames_per_clip` frames."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Error opening video stream or file")
        return

    os.makedirs(clip_dir, exist_ok=True)

    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Keep the frame rate as a float so fractional rates are not truncated.
    fps = cap.get(cv2.CAP_PROP_FPS)

    fourcc = cv2.VideoWriter_fourcc(*"mp4v")

    out = None
    frame_cnt = 0
    clip_cnt = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Start a new clip every `frames_per_clip` frames.
        if frame_cnt % frames_per_clip == 0:
            if out is not None:
                out.release()
            clip_path = f"{clip_dir}/sample_{clip_cnt}.mp4"
            out = cv2.VideoWriter(clip_path, fourcc, fps, (frame_width, frame_height))
            print(f"save clip: {clip_path}")
            clip_cnt += 1
        out.write(frame)
        frame_cnt += 1

    # Release the last writer (if any frame was read) and the capture.
    if out is not None:
        out.release()
    cap.release()

clip_dir = "./datasets/clips"
split_video('./datasets/MOT16-03.mp4', clip_dir=clip_dir)

save clip: ./datasets/clips/sample_0.mp4
save clip: ./datasets/clips/sample_1.mp4
save clip: ./datasets/clips/sample_2.mp4
save clip: ./datasets/clips/sample_3.mp4
save clip: ./datasets/clips/sample_4.mp4
save clip: ./datasets/clips/sample_5.mp4
save clip: ./datasets/clips/sample_6.mp4
save clip: ./datasets/clips/sample_7.mp4
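
The number of clips follows from the total frame count and the 200-frame clip length. The sketch below works through that arithmetic. The helper names are hypothetical, and the 1500-frame input is an illustrative figure that is consistent with the eight clips above but is not stated in the notebook. The real value can be read with `cap.get(cv2.CAP_PROP_FRAME_COUNT)`.

```python
import math

def expected_clip_count(total_frames, frames_per_clip=200):
    """Number of clips when a new clip starts every `frames_per_clip` frames."""
    return math.ceil(total_frames / frames_per_clip)

def clip_lengths(total_frames, frames_per_clip=200):
    """Frame count of each clip; only the last clip can be shorter."""
    full, rest = divmod(total_frames, frames_per_clip)
    return [frames_per_clip] * full + ([rest] if rest else [])

# Illustrative input: 1500 frames (an assumption, see the text above).
print(expected_clip_count(1500))  # 8
print(clip_lengths(1500))         # [200, 200, 200, 200, 200, 200, 200, 100]
```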


In [6]:
# Replace with the name of your own S3 bucket.
bucket_name = "sagemaker-us-east-1-822507008821"
prefix = "sm-bytetrack"
sample_data_s3uri = f"s3://{bucket_name}/{prefix}/sample-data"

In [7]:
!aws s3 cp --recursive $clip_dir $sample_data_s3uri

upload: datasets/clips/sample_7.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_7.mp4
upload: datasets/clips/sample_0.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_0.mp4
upload: datasets/clips/sample_4.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_4.mp4
upload: datasets/clips/sample_2.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_2.mp4
upload: datasets/clips/sample_3.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_3.mp4
upload: datasets/clips/sample_1.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_1.mp4
upload: datasets/clips/sample_5.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_5.mp4
upload: datasets/clips/sample_6.mp4 to s3://sagemaker-us-east-1-822507008821/sm-bytetrack/sample-data/sample_6.mp4
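
The same upload can also be done from Python instead of the AWS CLI. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured. The helper names `list_uploads` and `upload_clips` are hypothetical. Unlike `aws s3 cp --recursive`, the sketch only picks up `.mp4` files at the top level of the clip directory.

```python
import os

def list_uploads(clip_dir, prefix):
    """Map each local .mp4 file in `clip_dir` to its S3 object key under `prefix`."""
    pairs = []
    for name in sorted(os.listdir(clip_dir)):
        if name.endswith(".mp4"):
            pairs.append((os.path.join(clip_dir, name), f"{prefix}/sample-data/{name}"))
    return pairs

def upload_clips(clip_dir, bucket_name, prefix):
    """Upload the clips with boto3 (needs AWS credentials and the boto3 package)."""
    import boto3  # imported lazily so list_uploads works without boto3 installed
    s3 = boto3.client("s3")
    for local_path, key in list_uploads(clip_dir, prefix):
        s3.upload_file(local_path, bucket_name, key)
        print(f"upload: {local_path} to s3://{bucket_name}/{key}")

# Example call, using the variables defined in the cells above:
# upload_clips("./datasets/clips", bucket_name, prefix)
```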


## 2. Label raw data

Follow the [SageMaker Ground Truth guide](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-getting-started.html) to complete the steps below:
- Step-1: [Create a private workforce (team)](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-cognito.html)
- Step-2: Add a worker to the private team you created. The worker receives an email with a title like `You're invited by aws to work on a labeling project.`, which includes the `User name`, the `Temporary password`, and the login link.
- Step-3: Create a labeling job in the SageMaker Ground Truth console, using the clips uploaded to `sample_data_s3uri` as the input dataset.
    - Choose `Automated data setup` as `Input data setup`
    - Set `S3 location for input datasets` to the value of `sample_data_s3uri`
    - In `S3 location for output datasets`, choose `Specify a new location` and enter an S3 path such as `s3://{bucket_name}/{prefix}/sample-data-gt`
    - In `IAM Role`, choose an IAM role, or create a new one, that can access the S3 bucket
    - Click `Complete data setup` to complete your input data setup. This step takes a few minutes. You will see `Input data connection successful.` once it is done.
    - In `Data type`, choose `Video->Video files`
    - In `Frame extraction`, choose `Use every 5 frame from a video to create a labeling task.`
    - In `Task type->Task category`, choose `Video - Object tracking` and select `Bounding box`
    - Click `Next` to go to `Select workers and configure tool`, and choose `Private` in `Worker types`
    - In `Private teams`, choose the private team you created earlier
    - Leave the default value for `Task timeout`
    - In `Video object tracking`, fill in `Task description` with a description such as `this is a labelling task video tracking`
    - In `Label values`, add `person` and `car` as labels, then create the labeling job.
    - You will see that the status of the labeling job you created is `In progress`
    - Go to `Ground Truth->Labeling workforces` and choose `Private`. Under `Labeling portal sign-in URL` in `Private workforce summary`, you will see a link like `https://xxxxx.labeling.us-east-1.sagemaker.aws`. Open this link.
    - You can see the labeling job you just created, titled `Track objects across video frames: this is a labelling task video tracking`
    - Click the `Start working` button to start labeling.
    
- Step-4: Label data by following [the guide](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-object-tracking.html). To accelerate labeling, you can use the `Predict` function to predict the boxes in the current frame.
<img align="center" src="img/label_video.png"></img>
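
The `Frame extraction` setting in Step-3 determines how many frames of each clip need labeling. The sketch below estimates that number for a 200-frame clip. It assumes that sampling starts at the first frame and keeps every 5th frame, which is an assumption about the extraction behavior and not something the Ground Truth guide is quoted as stating here. The helper name is hypothetical.

```python
def sampled_frame_indices(num_frames, step=5):
    """Indices kept when every `step`-th frame is used, starting at frame 0 (assumed)."""
    return list(range(0, num_frames, step))

indices = sampled_frame_indices(200)
print(len(indices))  # 40
print(indices[:4])   # [0, 5, 10, 15]
```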


Once the labeling job is finished, the following annotation directory appears under the S3 output path you specified.

<div align="center">
    <img width=300 src="img/gt_structure.png">
    <figcaption>Ground Truth Structure</figcaption>
</div>

Under the `manifest` directory, an `out` folder is created once all files have been labeled.
<div align="center">
    <img width=300 src="img/gt_manifest_structure.png">
    <figcaption>Manifest in Ground Truth Structure</figcaption>
</div>

You will see a file `output.manifest` like this:
<div align="center">
    <img width=600 src="img/out_manifest.png">
    <figcaption>output.manifest</figcaption>
</div>
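
Ground Truth manifests use the JSON Lines format, with one JSON object per line, so `output.manifest` can be inspected with a few lines of Python once it is downloaded locally. This is a minimal sketch; the helper name `read_manifest` is hypothetical, and the keys in each record depend on the label attribute name of your labeling job, so none are assumed here.

```python
import json

def read_manifest(path):
    """Parse a JSON Lines manifest into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Example usage after downloading the file locally:
# for record in read_manifest("output.manifest"):
#     print(sorted(record.keys()))
```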

Refer to [Use Amazon SageMaker Ground Truth to Label Data](https://docs.aws.amazon.com/sagemaker/latest/dg/sms.html) for a guide to labeling data. You can label either video files or frame files.