# **Setup**
Requirement 1: A Python Jupyter notebook residing in a GitHub repository

User Story 1: As a USER, I want to be able to clone the GitHub repo successfully, so that I can use the online reference.

User Story 2: As a USER, I want to be able to load the notebook on localhost successfully, so that I can run it locally.

User Story 3: As a USER, I want a browser in a Linux, macOS or Windows environment that has Python 3.x, Jupyter and the other necessary dependencies installed, so that the notebook runs successfully on any of the three platforms.


In [None]:
# @title Initialize paths test
# @markdown The code in this cell sets up 5 path variables:
# @markdown
# @markdown 1. home_path: defines the path for the root directory of the project
# @markdown 2. data_path: defines the path for storing video data
# @markdown 3. checkpoint_path: defines the path for saving and loading of model checkpoints
# @markdown 4. skeleton_path: defines the path for storing output related to skeleton data
# @markdown 5. model_path: defines the path for storing output related to the project's model

import os
%cd /content

home_path = "/content/ICT3104Project/FollowYourPose"
data_path = os.path.join(home_path, "data")
checkpoint_path = os.path.join(home_path, "checkpoints")
skeleton_path = os.path.join(home_path, "outputSkeleton")
model_path = os.path.join(home_path, "output")

/content
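
The path variables above are plain strings, and nothing checks that the directories exist. As a minimal sketch, the same five paths can be built from one root by a small helper. The function name `build_paths` and the optional `create` flag are assumptions for illustration, not part of the project code.

```python
import os

def build_paths(home_path, create=False):
    """Return the five project paths derived from home_path."""
    paths = {
        "home_path": home_path,
        "data_path": os.path.join(home_path, "data"),
        "checkpoint_path": os.path.join(home_path, "checkpoints"),
        "skeleton_path": os.path.join(home_path, "outputSkeleton"),
        "model_path": os.path.join(home_path, "output"),
    }
    if create:
        # Create any directory that does not exist yet
        for path in paths.values():
            os.makedirs(path, exist_ok=True)
    return paths
```

Calling `build_paths("/content/ICT3104Project/FollowYourPose")` returns the same values as the cell above without touching the file system.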


The cell below uses Git to clone the project repository from GitHub. Git may first need to be configured with the credentials of a GitHub account (name, email, and so on) so that it is able to clone the repo.

In [None]:
#@title Clone Repository

!git clone https://github.com/2102673/ICT3104ProjectTeam03.git ICT3104Project

Cloning into 'ICT3104Project'...
remote: Enumerating objects: 231, done.
remote: Counting objects: 100% (83/83), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 231 (delta 20), reused 68 (delta 15), pack-reused 148
Receiving objects: 100% (231/231), 95.92 MiB | 40.19 MiB/s, done.
Resolving deltas: 100% (29/29), done.
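
When no target directory is passed, `git clone` names the directory after the last component of the URL with any `.git` suffix removed. The rest of the notebook expects the repository under `/content/ICT3104Project`, so the directory name matters. The helper below is a simplified sketch of that naming rule (the function name `default_clone_dir` is hypothetical, and Git's own rule covers more cases than this):

```python
import posixpath
from urllib.parse import urlparse

def default_clone_dir(url):
    """Approximate the directory name `git clone <url>` picks by default."""
    name = posixpath.basename(urlparse(url).path.rstrip("/"))
    # Git strips a trailing ".git" from the last path component
    return name[:-4] if name.endswith(".git") else name
```

Comparing this result with the folder used in `home_path` is a quick way to spot a mismatch before later cells fail on a missing directory.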


In [None]:
#@title Install requirements
#@markdown This cell installs the Python libraries and packages that the project needs, pinned to suitable versions where required.

!pip install -q diffusers[torch]==0.11.1 transformers==4.26.0 bitsandbytes==0.35.4 decord accelerate omegaconf einops ftfy imageio-ffmpeg xformers
!pip install -q gradio==3.50.2
!pip install -q av

!apt-get install -y imagemagick > /dev/null 2>&1

!pip uninstall -q -y torch torchvision triton
!pip install -q torch==2.0.0 torchvision -f https://download.pytorch.org/whl/cu118/torch_stable.html
!pip install -q xformers==0.0.19 triton==2.0.0 -U
!pip uninstall -q -y nvidia-cudnn-cu11
!pip install -q nvidia-cudnn-cu11==8.6.0.163


In [None]:
#@title Download pretrained model
#@markdown The code in this cell downloads a pretrained model for the project and stores it in the checkpoint_path directory.
%cd $home_path

#@markdown Name/Path of the initial model.
MODEL_NAME = "YueMafighting/FollowYourPose_v1" #@param {type:"string"}

!git lfs install
!git clone https://huggingface.co/$MODEL_NAME $checkpoint_path

/content/ICT3104Project/FollowYourPose
Updated git hooks.
Git LFS initialized.
Cloning into '/content/ICT3104Project/FollowYourPose/checkpoints'...
remote: Enumerating objects: 42, done.
remote: Total 42 (delta 0), reused 0 (delta 0), pack-reused 42
Unpacking objects: 100% (42/42), 584.62 KiB | 8.73 MiB/s, done.
Filtering content: 100% (9/9), 9.75 GiB | 52.15 MiB/s, done.


# **Data Exploration**
Requirement 2: A Data Exploration section in the notebook that can load, list and display video data from the Charades project

Related User Stories:
User Story 4: As a USER, I want to interact with a Data Exploration section within the notebook, so that it can load, list and display video data from the Charades project.

User Story 5: As a USER, I want to see video data organized in subfolders within a data folder in the repository, so that I can easily access and select the videos for evaluation.

User Story 6: As a USER, I want to be able to play the selected video in an output cell within the notebook, so that I can conveniently view and interact with the video content directly.

User Story 39: As a USER, I want to be able to load videos from another dataset, so that I can produce different kinds of results.

User Story 34: As a USER, I want the uploaded video to be resized automatically to 512 by 512 pixels, so that I can use it for training and inference.

The code in this cell uploads a Charades video into the data folder (data_path), where each video is stored in a subfolder named after the file.

After running the cell, users are prompted to upload a video from a local directory by clicking the "Choose Files" button.

In [None]:
#@title Upload Charades video into Data folder
#@markdown Execute the cell to select a video to upload. The uploaded video will be saved into the data folder.

auto_resize_to_512_512 = True #@param {type:"boolean"}

%cd $home_path
from EditVideo import EditVideo

#Change directory into data folder
%cd $data_path

import os
import shutil
from google.colab import files

# Create a function to upload and organize files
def upload_and_organize_files():
    # Upload the files
    uploaded = files.upload()

    for file_name, file_content in uploaded.items():
      print(file_name)
      # Extract the file name without the extension
      file_name_without_extension = os.path.splitext(file_name)[0]

      # Create a directory with the same name as the file
      file_directory = os.path.join(data_path, file_name_without_extension)
      os.makedirs(file_directory, exist_ok=True)

      # Save the file in its respective directory
      file_path = os.path.join(file_directory, file_name)
      if auto_resize_to_512_512:
        # Resize to 512x512, save the result, then delete the raw upload
        editVideo = EditVideo(file_name)
        editVideo.resize(512, 512)
        editVideo.save(file_path)

        os.remove(file_name)
      else:
        # files.upload() has already written the file to the current
        # directory, so moving it into place is sufficient
        shutil.move(file_name, file_path)

        print(f'Saved file {file_name} to {file_path}')


# Call the function to upload and organize files
upload_and_organize_files()

/content/ICT3104Project/FollowYourPose
/content/ICT3104Project/FollowYourPose/data


Saving 0AGCS.mp4 to 0AGCS.mp4
0AGCS.mp4
Original Before: width:480 height:270
Skeleton Before: width:480 height:270
Original After: width:512 height:512
Skeleton After: width:512 height:512
Moviepy - Building video /content/ICT3104Project/FollowYourPose/data/0AGCS/0AGCS.mp4.
MoviePy - Writing audio in 0AGCSTEMP_MPY_wvf_snd.mp3




MoviePy - Done.
Moviepy - Writing video /content/ICT3104Project/FollowYourPose/data/0AGCS/0AGCS.mp4






Moviepy - Done !
Moviepy - video ready /content/ICT3104Project/FollowYourPose/data/0AGCS/0AGCS.mp4
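
The folder layout produced by the upload cell (one subfolder per video, named after the file without its extension) can be reproduced outside Colab. The sketch below leaves out the upload widget and the `EditVideo` resizing step and only shows the move into `data/<name>/<name>.mp4`. The function name `organize_file` is hypothetical.

```python
import os
import shutil

def organize_file(src_path, data_dir):
    """Move src_path into data_dir/<stem>/<filename> and return the new path."""
    file_name = os.path.basename(src_path)
    stem = os.path.splitext(file_name)[0]
    # One folder per video, named after the file without its extension
    target_dir = os.path.join(data_dir, stem)
    os.makedirs(target_dir, exist_ok=True)
    target_path = os.path.join(target_dir, file_name)
    shutil.move(src_path, target_path)
    return target_path
```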


In [None]:
#@title Create Dropdown class


#@markdown The code in this cell defines a dropdown class with two linked dropdowns: the first selects the folder to retrieve videos from, and the second selects a video stored in that folder.

#@markdown After a folder and a video are selected, the cell displays the selected video when toDisplayVideo is enabled.

import os
import ipywidgets as widgets
from ipywidgets import Video, HBox
from IPython.display import display, clear_output

# A dropdown class to get all folders and videos within each folder
class FolderDropdown:
    def __init__(self, data_path, toDisplayVideo=False):
        self.data_path = data_path
        self.folder_names = [f for f in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, f))]
        self.dropdown = widgets.Dropdown(
            options=['--Select a folder--'] + self.folder_names,
            description='Select a folder:'
        )
        self.video_dropdown = widgets.Dropdown(
            options=['--Select a video--'],
            description='Select a video:'
        )
        self.output = widgets.Output()
        self.dropdown.observe(self.on_dropdown_change)
        self.selected_folder = '--Select a folder--'
        self.selected_video = '--Select a video--'
        self.toDisplay = toDisplayVideo

    def display(self):
        display(self.dropdown)
        display(self.video_dropdown)
        display(self.output)

    def on_dropdown_change(self, change):
        with self.output:
            self.selected_folder = self.dropdown.get_interact_value()
            clear_output(wait=True)

            if change['type'] == 'change' and change['name'] == 'value' and not str(change['new']) == "--Select a folder--":
                selected_folder = str(change['new'])
                folder_path = os.path.join(self.data_path, selected_folder)
                condition = lambda file_name: not file_name.startswith(".ipynb") and file_name.endswith(".mp4")
                video_files = [f for f in os.listdir(folder_path) if condition(f)]

                self.video_dropdown.options = ['--Select a video--'] + video_files
                self.video_dropdown.observe(self.on_video_dropdown_change)

            elif str(change['new']) == "--Select a folder--":
              self.video_dropdown.options = ['--Select a video--']

    def on_video_dropdown_change(self, change):
        with self.output:
          self.selected_video = self.video_dropdown.get_interact_value()

          if change['type'] == 'change' and change['name'] == 'value' and not str(change['new']) == "--Select a video--":
            # Build the path from the current folder and video selections
            video_path = os.path.join(self.data_path, self.selected_folder, self.selected_video)
            if self.toDisplay:
              video_player = Video.from_file(video_path)
              display(video_player)

    def get_Selected_Folder(self):
        return self.selected_folder

    def get_Selected_Video(self):
        return self.selected_video

    def get_Full_Video_Path(self):
        return os.path.join(self.data_path, self.selected_folder, self.selected_video)
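
The listing logic behind the two dropdowns can be checked without any widgets. The sketch below separates it into two plain functions. The names `list_video_folders` and `list_videos` are hypothetical, and skipping every hidden entry (names starting with ".") is an assumption that is slightly broader than the `.ipynb` check used in the class.

```python
import os

def list_video_folders(data_path):
    """Return the sorted sub-folder names of data_path, skipping hidden ones."""
    return sorted(
        name for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name)) and not name.startswith(".")
    )

def list_videos(data_path, folder, extension=".mp4"):
    """Return the sorted video file names inside data_path/folder."""
    folder_path = os.path.join(data_path, folder)
    return sorted(
        name for name in os.listdir(folder_path)
        if name.endswith(extension) and not name.startswith(".")
    )
```

Keeping the file-system queries separate from the widget callbacks makes them easy to test on a temporary directory.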


# **MM Pose Section**
Creates skeleton poses based on input videos.



In [None]:
#@title Load API
%cd $home_path
from inference_followyourpose import *
config_runner = merge_config_then_run()

/content/ICT3104Project/FollowYourPose




Fetching Space from: https://huggingface.co/spaces/YueMafighting/mmpose-estimation
Loaded as API: https://yuemafighting-mmpose-estimation.hf.space ✔


In [None]:
#@title Select Video
#@markdown 1. Select the folder containing the video
#@markdown 2. Select the video to display
#@markdown 3. Video is displayed

#Select the original video
dropdown = FolderDropdown(data_path, toDisplayVideo=True)
dropdown.display()

Dropdown(description='Select a folder:', options=('--Select a folder--', '1114', 'X8AP2'), value='--Select a f…

Dropdown(description='Select a video:', options=('--Select a video--',), value='--Select a video--')

Output()

In [None]:
# @title
# @markdown Prints the full path of the selected video, which includes the folder name and
# @markdown the video name
# Get the full path, folder name and video name from the FolderDropdown object
# (Run this after dropdown selection)
video_path=dropdown.get_Full_Video_Path()
folder_path = dropdown.get_Selected_Folder()
video_name = dropdown.get_Selected_Video()
print(video_path)

/content/ICT3104Project/FollowYourPose/data/1114/1114.mp4


In [None]:
#@title Edit Config file (configs/pose_sample.yaml)

%cd $home_path
#@markdown Setup Config file for inference
from omegaconf import OmegaConf

CONFIG_NAME = "configs/pose_sample.yaml" #@param {type:"string"}

pretrained_model_path = "./checkpoints/stable-diffusion-v1-4" #@param {type:"string"}
output_dir = "outputSkeleton" #@param {type:"string"}
prompts = "Iron man on the beach" #@param {type:"string"}
video_length = 24 #@param {type:"number"}
width = 512 #@param {type:"number"}
height = 512 #@param {type:"number"}
num_inference_steps = 50 #@param {type:"number"}
guidance_scale = 12.5 #@param {type:"number"}
num_inv_steps = 50 #@param {type:"number"}
train_batch_size = 1 #@param {type:"number"}
validation_steps = 100 #@param {type:"number"}
resume_from_checkpoint = "./checkpoints/followyourpose_checkpoint-1000" #@param {type:"string"}


config = {
  "pretrained_model_path": pretrained_model_path,
  "output_dir": output_dir,
  "validation_data": {
    "prompts": [
      prompts
    ],
    "video_length": video_length,
    "width": width,
    "height": height,
    "num_inference_steps": num_inference_steps,
    "guidance_scale": guidance_scale,
    "use_inv_latent": False,
    "num_inv_steps": num_inv_steps,
    "dataset_set": "val"
  },
  "train_batch_size": train_batch_size,
  "validation_steps": validation_steps,
  "resume_from_checkpoint": resume_from_checkpoint,
  "seed": 33,
  "mixed_precision": "no",
  "gradient_checkpointing": False,
  "enable_xformers_memory_efficient_attention": True
}

OmegaConf.save(config, CONFIG_NAME)





/content/ICT3104Project/FollowYourPose


In [None]:
#Define function parameters
#@markdown function parameters for MM Pose
target_prompt="A boy dancing on the street"#@param {type: "string"}
num_steps=50 #@param {type: "number"}
guidance_scale=12.5 #@param {type: "number"}
video_type="Raw Video" #@param {type: "string"}
user_input_video=None
start_sample_frame=0 #@param {type: "number"}
n_sample_frame=8 #@param {type: "number"}
stride=1 #@param {type: "number"}
left_crop=0 #@param {type: "number"}
right_crop=0 #@param {type: "number"}
top_crop=0 #@param {type: "number"}
bottom_crop=0 #@param {type: "number"}

In [None]:
#@title Generate Skeleton
#@markdown - Generate the skeleton video based on the config file parameters
#@markdown - The generated video will be saved in outputSkeleton folder
%cd $home_path
output = config_runner.run(video_path, target_prompt, num_steps, guidance_scale, video_type, user_input_video, start_sample_frame, n_sample_frame, stride, left_crop, right_crop, top_crop, bottom_crop)

#Move created skeleton and config file to folder
import shutil
rename_video_name = os.path.splitext(video_name)[0] + ".mp4"
config_name = "pose.yaml"
#path to the new location
renamed_video_path = os.path.join("./", output_dir, folder_path, rename_video_name)
renamed_config_path = os.path.join("./", output_dir, folder_path, config_name)
#Rename current output to input name
os.rename(output, rename_video_name)
#Make Directory
os.makedirs(os.path.join("./", output_dir, folder_path), exist_ok=True)
#Move output video and config to new directory
shutil.move(rename_video_name, renamed_video_path)
shutil.move(config_name, renamed_config_path)
print(f"Video saved to {renamed_video_path}")
print(f"Config saved to {renamed_config_path}")

/content/ICT3104Project/FollowYourPose
video rate is OK
Moviepy - Building video ./video_resized.mp4.
MoviePy - Writing audio in video_resizedTEMP_MPY_wvf_snd.mp3




MoviePy - Done.
Moviepy - Writing video ./video_resized.mp4





Moviepy - Done !
Moviepy - video ready ./video_resized.mp4
video resized to 512 height
video fps: 24.0
broke the video into frames
video is shorter than the cut value
set stop frames to: 120
frame ./raw_frames/kang0.jpg/120: done;
frame ./raw_frames/kang1.jpg/120: done;
frame ./raw_frames/kang2.jpg/120: done;
frame ./raw_frames/kang3.jpg/120: done;
frame ./raw_frames/kang4.jpg/120: done;
frame ./raw_frames/kang5.jpg/120: done;
frame ./raw_frames/kang6.jpg/120: done;
frame ./raw_frames/kang7.jpg/120: done;
frame ./raw_frames/kang8.jpg/120: done;
frame ./raw_frames/kang9.jpg/120: done;
frame ./raw_frames/kang10.jpg/120: done;
frame ./raw_frames/kang11.jpg/120: done;
frame ./raw_frames/kang12.jpg/120: done;
frame ./raw_frames/kang13.jpg/120: done;
frame ./raw_frames/kang14.jpg/120: done;
frame ./raw_frames/kang15.jpg/120: done;
frame ./raw_frames/kang16.jpg/120: done;
frame ./raw_frames/kang17.jpg/120: done;
frame ./raw_frames/kang18.jpg/120: done;
frame ./raw_frames/kang19.jpg/120: done;



Moviepy - Done !
Moviepy - video ready mmpose_result.mp4
Video saved to ./outputSkeleton/1114/1114.mp4
Config saved to ./outputSkeleton/1114/pose.yaml
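
The renaming step above can be isolated into a small pure function, which makes the target paths easy to check before any file is moved. This is a sketch only: the function name `skeleton_output_paths` is hypothetical, and the default `pose.yaml` is taken from the cell above.

```python
import os

def skeleton_output_paths(output_dir, folder_name, video_name, config_name="pose.yaml"):
    """Return (video_path, config_path) for a generated skeleton video."""
    # splitext removes only the final extension, so dots inside the name survive
    stem = os.path.splitext(video_name)[0]
    target_dir = os.path.join(".", output_dir, folder_name)
    return (
        os.path.join(target_dir, stem + ".mp4"),
        os.path.join(target_dir, config_name),
    )
```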


# **Training Section**
Requirement 4: A Training section in the notebook that can train a new genAI model based on the Charades project.

**Required Dropdown class (Run: Data Exploration > Create Dropdown class)**

User Story 12: As a USER, I want to access the Training section in the notebook, so that I can train new genAI models based on the Charades project.

User Story 13: As a USER, I want to select a dataset subfolder using intuitive UI elements within the data folder, so that it can be used for training purposes.

User Story 14: As a USER, I want to specify a name for the new model using user-friendly UI elements, so that I can easily name the new model.

User Story 15: As a USER, I want to set the batch size and number of training epochs conveniently using appropriate UI components before initiating the training sequence, so that I can tune the training parameters.

User Story 16: As a USER, I want to monitor the progress of model training through visual elements within the notebook, so that I can see the progress of the model.

User Story 17: As a USER, I want to be able to choose the best-trained model from a list of models so that it can be saved and any other model that is generated during the training process will be deleted.

User Story 18: As a USER, I want to have access to a designated folder where pretrained models are stored for future use, so that the models can be easily found when needed.

User Story 19: As a USER, I want the option to cycle through the videos in a folder consecutively, so that fine-tuning can be done on multiple videos.



In [None]:
#@title Add video properties into CharadesVideo.json
# @markdown Set the video name and caption of the selected video
import ipywidgets as widgets
from IPython.display import display
import json

%cd $home_path

# Define the text fields
id_field = widgets.Text(description='video name:')
scripts_field = widgets.Text(description='script:')

# Define help text using Label widgets
id_help = widgets.Label(value='Enter the video name (Without .mp4)')
scripts_help = widgets.Label(value='Enter a short description of the video')

error_label = widgets.HTML(value='')

# Define a button
button = widgets.Button(description='Submit')

# Define a function to be executed on button click
def on_button_click(b):
    # Use video_id rather than id to avoid shadowing the built-in id()
    video_id = id_field.value
    scripts = scripts_field.value
    # Reject empty or whitespace-only input
    if video_id is None or video_id.strip() == "":
      error_label.value = f"""<span style="color: red; font-size: 18px;">Invalid ID</span>"""
      return
    if scripts is None or scripts.strip() == "":
      error_label.value = f"""<span style="color: red; font-size: 18px;">Invalid scripts</span>"""
      return

    # Load existing JSON data
    with open('./data/CharadesVideo.json', 'r') as json_file:
        data = json.load(json_file)

    # Create a new JSON object
    new_data = {
        "id": video_id,
        "script": scripts
    }

    # Append the new object to the existing JSON list
    data.append(new_data)

    # Write the updated JSON data back to the file
    with open('./data/CharadesVideo.json', 'w') as json_file:
        json.dump(data, json_file, indent=4)
    error_label.value = f"""<span style="color: green; font-size: 18px;">Successfully Added Video ID: {video_id}</span>"""
    id_field.value = ""
    scripts_field.value = ""


# Attach the function to the button's click event
button.on_click(on_button_click)

# Display the widgets
display(id_help, id_field, scripts_help, scripts_field, button)
display(error_label)

/content/ICT3104Project/FollowYourPose


Label(value='Enter the video name (Without .mp4)')

Text(value='', description='video name:')

Label(value='Enter a short description of the video')

Text(value='', description='script:')

Button(description='Submit', style=ButtonStyle())

HTML(value='')
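
The submit handler above appends a new entry without checking whether the same video name is already listed. As a sketch, the JSON update can be written as a plain function with a duplicate guard. The function name `add_video_entry` is hypothetical, and treating ids as unique is an assumption rather than a documented rule of the project.

```python
import json

def add_video_entry(json_path, video_id, script):
    """Append {"id", "script"} to the JSON list unless the id already exists."""
    with open(json_path, "r") as json_file:
        data = json.load(json_file)
    # Assumption: each video id should appear at most once
    if any(item["id"] == video_id for item in data):
        return False
    data.append({"id": video_id, "script": script})
    with open(json_path, "w") as json_file:
        json.dump(data, json_file, indent=4)
    return True
```

The return value tells the caller whether the file changed, which a button handler could use to choose between a success and an error message.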

In [None]:
#@title Remove video properties from CharadesVideo.json (Optional)

#@markdown *The Dropdown class under the Data Exploration section must be run before the Training section can be run.

%cd $home_path

import ipywidgets as widgets
from IPython.display import display
import json

# Create a function to display IDs and remove on button click
def display_and_remove_id():
    # Load existing JSON data
    json_file_path = f'{data_path}/CharadesVideo.json'
    with open(json_file_path, 'r') as json_file:
        data = json.load(json_file)

    # Define a dropdown widget to display IDs
    id_dropdown = widgets.Dropdown(options=[item["id"] for item in data], description='Select ID:')

    # Define a button to remove the selected ID
    remove_button = widgets.Button(description='Remove ID')

    # Define an output widget for displaying success messages
    success_output = widgets.Output()

    # Define Error message label
    error_label = widgets.HTML(value='')

    # Define a function to remove the selected ID and display a success message
    def on_remove_button_click(b):
        selected_id = id_dropdown.value
        data[:] = [item for item in data if item["id"] != selected_id]
        id_dropdown.options = [item["id"] for item in data]

        # Save the updated data to the JSON file
        with open(json_file_path, 'w') as json_file:
            json.dump(data, json_file, indent=4)

        # Display a success message
        with success_output:
            error_label.value = f"""<span style="color: green; font-size: 18px;">Successfully removed ID: {selected_id}</span>"""

    # Attach the remove function to the remove button's click event
    remove_button.on_click(on_remove_button_click)

    # Display the widgets
    display(id_dropdown, remove_button, success_output)
    display(error_label)

# Call the function to display IDs and the remove button
display_and_remove_id()



/content/ICT3104Project/FollowYourPose


Dropdown(description='Select ID:', options=('1114',), value='1114')

Button(description='Remove ID', style=ButtonStyle())

Output()

HTML(value='')

In [None]:
#@title Edit Config File
%cd $home_path

#@markdown Setup Config file for training

from omegaconf import OmegaConf
#@markdown Config File Path
CONFIG_NAME = "configs/pose_train1.yaml"     #@param {type:"string"}


#@markdown Pretrained Model  Path
pretrained_model_path = "./checkpoints/stable-diffusion-v1-4" #@param {type:"string"}

#@markdown model Output Path
#Location and name of newly trained model
output_dir = "output/model1"                        #@param {type:"string"}

#@markdown train_data
train_prompt = "A boy dancing on the street"                    #@param {type:"string"}
n_sample_frames = 4                                 #@param {type:"number"}
# width = 512                                       #@param {type:"number"}
# height = 512                                      #@param {type:"number"}

learning_rate = 3e-5                                #@param {type:"number"}
# This might help determine the number of epochs (refer to train_followyourpose.py)
train_batch_size = 1                                #@param {type:"number"}
# The training code might need to be changed to take the batch size from the config file
max_train_steps = 10                                #@param {type:"number"}
checkpointing_steps = 10                            #@param {type:"number"}
validation_steps = 10                              #@param {type:"number"}

skeleton_path = "./outputSkeleton/1114/1114.mp4"  #@param {type:"string"}

config = {
  "pretrained_model_path": pretrained_model_path,
  "output_dir": output_dir,
  "train_data": {
    "video_path": data_path,
    "prompt": train_prompt,
    "n_sample_frames": n_sample_frames,
    "width": 512,
    "height": 512,
    "sample_start_idx": 0,
    "sample_frame_rate": 4,
    "dataset_set": "train"
  },
  "validation_data": {
    "prompts": [
      "A Spider man on the snow"
    ],
    "video_length": 1,
    "width": 512,
    "height": 512,
    "num_inference_steps": 1,
    "guidance_scale": 12.5,
    "use_inv_latent": False,
    "num_inv_steps": 1,
    "dataset_set": "val"
  },
  "learning_rate": learning_rate,
  "train_batch_size": train_batch_size,
  "max_train_steps": max_train_steps,
  "checkpointing_steps": checkpointing_steps,
  "validation_steps": validation_steps,
  "trainable_modules": [
    "attn1.to_q",
    "attn2.to_q",
    "attn_temp",
    "conv_temporal"
  ],
  "skeleton_path": skeleton_path,
  "seed": 33,
  "mixed_precision": "no",
  "use_8bit_adam": False,
  "gradient_checkpointing": False,
  "enable_xformers_memory_efficient_attention": True
}

OmegaConf.save(config, CONFIG_NAME)

/content/ICT3104Project/FollowYourPose


In [None]:
#@title Start Training
# @markdown Runs the training script and shows a progress bar to illustrate the training process

# Using a custom progress bar
%cd $home_path

from tqdm import tqdm
import subprocess
from omegaconf import OmegaConf

# Define the training script and configuration file
training_script = "train_followyourpose.py"
config_file = CONFIG_NAME  # Use the updated config file name

# Get the number of training steps from the config
config = OmegaConf.load(CONFIG_NAME)
train_steps = config.max_train_steps

# Define ANSI escape codes for blue color
blue_color = "\033[94m"  # Blue text color
end_color = "\033[0m"    # Reset to default color

# Create a colored progress bar with blue bars and custom formatting
bar_format = f"Training Progress: |{blue_color}{{bar}}{end_color}| {{percentage:3.0f}}% {{n_fmt}}/{{total_fmt}}"
with tqdm(total=train_steps, bar_format=bar_format, ncols=100, dynamic_ncols=True) as pbar:
    for step in range(train_steps):
        # Run the training script
        command = f"TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch {training_script} --config={config_file}"
        subprocess.run(command, shell=True)

        # Update the progress bar
        pbar.update(1)

#!TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch train_followyourpose.py --config=$CONFIG_NAME

/content/ICT3104Project/FollowYourPose


Training Progress: |██████████| 100% 10/10
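
The loop above launches the whole training script once per progress-bar tick, so the bar counts launches rather than training steps. One alternative is to launch the script once and advance the bar from its output. The sketch below assumes that the script prints a line containing "step N" for each step. That output format is a guess, since the real output of `train_followyourpose.py` is not shown here, and `run_with_progress` and `step_pattern` are hypothetical names.

```python
import re
import subprocess
import sys

def run_with_progress(command, total_steps, step_pattern=r"step\s+(\d+)", on_step=print):
    """Run command once and report progress parsed from its output lines."""
    regex = re.compile(step_pattern)
    last_step = 0
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Read the output line by line while the process is still running
    for line in process.stdout:
        match = regex.search(line)
        if match:
            last_step = min(int(match.group(1)), total_steps)
            on_step(last_step, total_steps)
    return_code = process.wait()
    return return_code, last_step
```

In the notebook, `on_step` could set `pbar.n` on the tqdm bar and call `pbar.refresh()`, and `command` would be the `accelerate launch` invocation given as a list of arguments.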


In [None]:
#@title Generate folders to simulate trained models (testing purposes)
import os
import random
import string

# Specify the directory where you want to create the folders
folder_path = model_path

# Function to generate a random folder name
def random_folder_name(length=10):
    return ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(length))

# Create 10 random folders
for _ in range(10):
    folder_name = random_folder_name()
    folder_full_path = os.path.join(folder_path, folder_name)
    try:
        os.makedirs(folder_full_path)
        print(f"Created folder: {folder_name}")
    except Exception as e:
        print(f"Error creating folder: {e}")

Created folder: pDSoliA17J
Created folder: 5HKWErhgvL
Created folder: ouvchQyHFZ
Created folder: RNPw5uQmvH
Created folder: HoEBnUzPxT
Created folder: envQPxfldH
Created folder: VRLnjqnSRw
Created folder: cuDv1XUdjh
Created folder: SFRmFFjEwY
Created folder: ySW5NxLcIh


In [None]:
#@title Delete multiple models
# @markdown Delete multiple unwanted models
import os
import shutil
import ipywidgets as widgets
from IPython.display import display

class FolderDeleter:
    def __init__(self, path):
        self.path = path
        self.folder_list = self.get_folders()
        self.folder_selection = widgets.SelectMultiple(options=self.folder_list, description='Folders:')
        self.delete_button = widgets.Button(description='Delete Selected Folders')
        self.output_label = widgets.HTML(value='')
        self.delete_button.on_click(self.delete_folders)

    def get_folders(self):
        return [f for f in os.listdir(self.path) if os.path.isdir(os.path.join(self.path, f)) and f != '.ipynb_checkpoints']

    def delete_folders(self, button):
        selected_folders = self.folder_selection.value
        if selected_folders:
            self.output_label.value = ""
            for folder in selected_folders:
                folder_path = os.path.join(self.path, folder)
                try:
                    shutil.rmtree(folder_path)
                    self.output_label.value += f'<p style="color: green; font-size: 18px;">Folder "{folder}" deleted successfully.</p>'
                except Exception as e:
                    self.output_label.value += f'<p style="color: red; font-size: 18px;">Error deleting folder "{folder}": {e}</p>'
            self.folder_list = self.get_folders()
            self.folder_selection.options = self.folder_list
            self.folder_selection.value = ()
        else:
            self.output_label.value = '<p style="color: red; font-size: 18px;">Please select folders to delete.</p>'

    def display_ui(self):
        display(self.folder_selection)
        display(self.delete_button)
        display(self.output_label)

folder_deleter = FolderDeleter(model_path)
folder_deleter.display_ui()

SelectMultiple(description='Folders:', options=('LPZK8BeGbR', 'model1', 'cux2S6qYJT', 'envQPxfldH', 'RNPw5uQmv…

Button(description='Delete Selected Folders', style=ButtonStyle())

HTML(value='')
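
The deletion logic in `FolderDeleter` can also be exercised without the widgets. The sketch below removes a list of sub-folders and reports which ones failed. The function name `delete_folders` and the guard against names that point outside the base path are additions for illustration, not part of the class above.

```python
import os
import shutil

def delete_folders(base_path, folder_names):
    """Delete the named sub-folders of base_path; return (deleted, errors)."""
    deleted, errors = [], {}
    base = os.path.abspath(base_path)
    for name in folder_names:
        target = os.path.abspath(os.path.join(base, name))
        # Refuse names that would escape base_path (for example "..")
        if os.path.dirname(target) != base:
            errors[name] = "outside base path"
            continue
        try:
            shutil.rmtree(target)
            deleted.append(name)
        except OSError as exc:
            errors[name] = str(exc)
    return deleted, errors
```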

# **Inference Section**
Requirement 3: An Inference section in the notebook that can perform inference using a pretrained genAI model based on the Charades project.

User Story 7: As a USER, I want to load pretrained genAI models conveniently through an appropriate UI component, so that it can be used as an inference model.

User Story 8: As a USER, I want to choose an input video from the Charades project, so that a text prompt can be added.

User Story 9: As a USER, I want to provide text prompts through user-friendly UI components for the genAI model, so that the caption of the video can be added.

User Story 10: As a USER, I want to view the inference results, including captions depicting actions or activities, in the form of output videos, so that the output can be reviewed.

In [None]:
#@title Edit Config File

#@markdown Setup Config file for inference
#@markdown - Change the inputs accordingly.

%cd $home_path

from omegaconf import OmegaConf

CONFIG_NAME = "configs/pose_sample.yaml" #@param {type:"string"}


pretrained_model_path = "./checkpoints/stable-diffusion-v1-4" #@param {type:"string"}
use_checkpoint = True #@param {type: "boolean"}
resume_from_checkpoint = "./checkpoints/followyourpose_checkpoint-1000" #@param {type:"string"}
if not use_checkpoint:
  resume_from_checkpoint = None
skeleton_video = "./outputSkeleton/1114/1114.mp4" #@param {type:"string"}
#@markdown For multiple prompts, separate each with a ";" (example: prompts1;prompts2)
prompts = "Iron man on the beach" #@param {type:"string"}
prompts = prompts.split(";")
video_length = 24 #@param {type:"number"}
width = 512 #@param {type:"number"}
height = 512 #@param {type:"number"}

config = {
  "pretrained_model_path": pretrained_model_path,
  "output_dir": "outputInference",
  "validation_data": {
    "prompts": prompts,
    "video_length": video_length,
    "width": width,
    "height": height,
    "num_inference_steps": 100,
    "guidance_scale": 12.5,
    "use_inv_latent": False,
    "num_inv_steps": 100,
    "dataset_set": "val"
  },
  "train_batch_size": 1,
  "validation_steps": 100,
  "resume_from_checkpoint": resume_from_checkpoint,
  "seed": 33,
  "mixed_precision": "no",
  "gradient_checkpointing": False,
  "enable_xformers_memory_efficient_attention": True
}

OmegaConf.save(config, CONFIG_NAME)


/content/ICT3104Project/FollowYourPose
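
The form above splits the prompt string on ";" only, so a trailing separator or stray spaces would end up as empty or padded prompts. A minimal sketch of a more forgiving parser follows. `parse_prompts` is a hypothetical helper, not part of the project, and the second prompt in the example is made up for illustration.

```python
from typing import List


def parse_prompts(raw: str, separator: str = ";") -> List[str]:
    """Split a separator-delimited string into clean, non-empty prompts."""
    # Strip surrounding whitespace from each piece
    parts = (part.strip() for part in raw.split(separator))
    # Drop pieces that are empty after stripping (e.g. from a trailing ";")
    return [part for part in parts if part]


print(parse_prompts("Iron man on the beach; Batman in the snow;"))
# ['Iron man on the beach', 'Batman in the snow']
```

The result could be assigned to `prompts` in place of the plain `split(";")` call before the config dictionary is built.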


In [None]:
#@title Running Inference
# @markdown Code for running inference
%cd $home_path

!TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch txt2video.py --config=$CONFIG_NAME --skeleton_path=$skeleton_video

/content/ICT3104Project/FollowYourPose
2023-11-17 06:17:13.119770: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-17 06:17:13.119818: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-17 06:17:13.119854: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-17 06:17:13.130966: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
The followi

# **Testing Section**
Requirement 5: A Testing section in the notebook that will evaluate a trained model based on the Charades project.

User Story 20: As a USER, I want to access the Testing Section in the notebook, so that I can evaluate a trained model based on the Charades project.

User Story 21: As a USER, I want to choose a dataset subfolder from the data folder, so that the chosen dataset can be used for testing.

User Story 22: As a USER, I want to run the testing sequence, so that I can see all the results and visually assess the quality of the output content.

User Story 23: As a USER, I want to see visual elements, so that I have an indication of the testing progress in the notebook.

User Story 24: As a USER, I want to save the results to a results folder in the repo, so that the results are organized neatly.

User Story 25: As a USER, I should be able to select up to 5 trained models, so that results can be quickly obtained.

User Story 26: As a USER, I should obtain the testing results, so that the results can be utilized to train the model.

User Story 27: As a USER, I should be able to represent each testing result as part of a bar graph, so that I can easily visualize the results.

User Story 34: As a USER, I want to automatically resize the uploaded video to a resolution of 512 pixels by 512 pixels so that I can use it for training the inference.

User Story 36: As a USER, I want to have a function to cut the input video length so that the length of the video can be customized to the length that is appropriate.

User Story 37: As a USER, I want to be able to view the resultant gif side by side with the original input video so that I can compare the performance of the 2 videos.


In [None]:
#@title Select models for batch testing
%cd $home_path
import os
import shutil
import ipywidgets as widgets
from IPython.display import display

allModel = [f for f in os.listdir(model_path) if os.path.isdir(os.path.join(model_path, f)) and f != '.ipynb_checkpoints']
model_selection = widgets.SelectMultiple(options=allModel, description='Models:')
preview = widgets.Button(description='Preview')
output_label = widgets.HTML(value='')
def get_selected_model():
  # .value holds the tuple of currently selected options
  return model_selection.value

def show_selected_model(button):
  # Render the currently selected models below the Preview button
  text = '<p style="color: green; font-size: 24px;">Selected Models</p>'
  for each in get_selected_model():
    text += f'<p style="font-size: 18px;">{each}</p>'
  output_label.value = text
display(model_selection)
display(preview)
display(output_label)

preview.on_click(show_selected_model)


SelectMultiple(description='Models:', options=('ixsFA93Rud', 'SpbQPyTRMh', '7ODOezTgdo', 'YJWbh80Z0n', 'ZEwwJ1…

Button(description='Preview', style=ButtonStyle())

HTML(value='')
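
User Story 25 caps the selection at 5 trained models, but the widget itself accepts any number of entries. The following is a minimal sketch of a guard that could be applied to the selection before batch testing. `limit_selection` and `MAX_MODELS` are hypothetical names introduced here for illustration.

```python
from typing import Sequence, Tuple

MAX_MODELS = 5  # limit taken from User Story 25


def limit_selection(selected: Sequence[str], max_models: int = MAX_MODELS) -> Tuple[str, ...]:
    """Return the selection unchanged, or raise if too many models are chosen."""
    if len(selected) > max_models:
        raise ValueError(
            f"Select at most {max_models} models (got {len(selected)})."
        )
    return tuple(selected)


print(limit_selection(["model1", "model2"]))
# ('model1', 'model2')
```

One possible use is `models = limit_selection(get_selected_model())` at the start of the Testing cell, so that an oversized selection fails early instead of starting a long batch run.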

In [None]:
#@title Select skeleton
%cd $home_path
skeletonDropdown = FolderDropdown(skeleton_path, toDisplayVideo=True)
skeletonDropdown.display()


Dropdown(description='Select a folder:', options=('--Select a folder--', 'X8AP2'), value='--Select a folder--'…

Dropdown(description='Select a video:', options=('--Select a video--',), value='--Select a video--')

Output()

The section below runs batch inference.

*Run the cell that defines the Dropdown class under the Data Exploration section above before running the Testing section.


In [None]:
#@title Testing
# Run inference once for each selected model

import shlex
import subprocess
from tqdm import tqdm

%cd $home_path

from omegaconf import OmegaConf

inference_script = "txt2video.py"

models = get_selected_model()

# Define the total number of iterations (models in this case)
total_iterations = len(models)

# The config file and skeleton video are the same for every model
configFile = os.path.join(skeleton_path, skeletonDropdown.get_Selected_Folder(), "pose.yaml")
skeleton_video = skeletonDropdown.get_Full_Video_Path()

with tqdm(total=total_iterations, desc="Testing Progress") as pbar:
  for model in models:
    # Load the config file, point it at the selected model, and save it
    config = OmegaConf.load(configFile)
    config.pretrained_model_path = os.path.join(model_path, model)
    OmegaConf.save(config, configFile)

    # Quote the paths so that spaces do not break the shell command
    command = (
        f"TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch {inference_script} "
        f"--config={shlex.quote(configFile)} --skeleton_path={shlex.quote(skeleton_video)}"
    )
    process = subprocess.run(command, shell=True, capture_output=True, text=True)
    if process.returncode != 0:
      print(f"Inference failed for {model}:\n{process.stderr}")
    else:
      print(f"Inference output for {model}:\n{process.stdout}")
    # Update the progress bar
    pbar.update(1)



/content/ICT3104Project/FollowYourPose


Testing Progress:   0%|          | 0/4 [00:00<?, ?it/s]

/content/ICT3104Project/FollowYourPose/outputSkeleton/X8AP2/pose.yaml


Testing Progress:  25%|██▌       | 1/4 [00:18<00:54, 18.24s/it]

/content/ICT3104Project/FollowYourPose/outputSkeleton/X8AP2/pose.yaml


Testing Progress:  50%|█████     | 2/4 [00:36<00:36, 18.11s/it]

/content/ICT3104Project/FollowYourPose/outputSkeleton/X8AP2/pose.yaml


Testing Progress:  75%|███████▌  | 3/4 [00:54<00:18, 18.11s/it]

/content/ICT3104Project/FollowYourPose/outputSkeleton/X8AP2/pose.yaml


Testing Progress: 100%|██████████| 4/4 [01:12<00:00, 18.09s/it]
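
The Testing cell builds its launch command as a single shell string. As an alternative, here is a minimal sketch that builds the same `accelerate launch` call as an argument list, which avoids shell quoting issues when a path contains spaces. `build_inference_command` and `build_env` are hypothetical helpers, and the file name `my video.mp4` is made up for illustration. The script name and the `--config` and `--skeleton_path` flags are the ones used in the cell above.

```python
import os
from typing import Dict, List, Optional


def build_inference_command(config_file: str, skeleton_video: str,
                            script: str = "txt2video.py") -> List[str]:
    """Build the launch command as a list, so no shell quoting is needed."""
    return [
        "accelerate", "launch", script,
        f"--config={config_file}",
        f"--skeleton_path={skeleton_video}",
    ]


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and add the debug variable used in the notebook."""
    env = dict(os.environ if base is None else base)
    env["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
    return env


cmd = build_inference_command(
    "outputSkeleton/X8AP2/pose.yaml",
    "outputSkeleton/X8AP2/my video.mp4",  # hypothetical path with a space
)
print(cmd)
```

The list could then be passed to `subprocess.run(cmd, env=build_env(), capture_output=True, text=True)` without `shell=True`.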


## Video Editing and display ##

In [None]:
#@title Create EditVideo class
#@markdown The EditVideo class is defined in EditVideo.py. It supports the following functions:

#@markdown - Edit the FPS of the original and skeleton videos to keep them in sync with the GIF
#@markdown - Resize the original and skeleton videos to a given width and height
#@markdown - Trim the original video by start and end frame
#@markdown - Trim the original video by start and end time in seconds
#@markdown - Superimpose the skeleton on the original video and on the GIF
#@markdown - Preview the edited video, or a given video if the user supplies a video path
#@markdown - Place the edited original and edited GIF videos side by side and save the result
#@markdown - Save the edited original video
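
Trimming by frame and trimming by seconds describe the same cut in different units. The sketch below shows the conversion between them. It is an assumption for illustration only: `frames_to_seconds` is a hypothetical helper, and how EditVideo.py performs the conversion internally is not shown in this notebook.

```python
from typing import Tuple


def frames_to_seconds(start_frame: int, end_frame: int, fps: float) -> Tuple[float, float]:
    """Convert a frame range to the equivalent time range in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not 0 <= start_frame <= end_frame:
        raise ValueError("expected 0 <= start_frame <= end_frame")
    # A frame index divided by the frame rate gives its timestamp
    return start_frame / fps, end_frame / fps


print(frames_to_seconds(0, 12, 24.0))
# (0.0, 0.5)
```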


In [None]:
#@title Display Super-imposed

%cd $home_path

from EditVideo import EditVideo

original_video_path = "./data/1114/1114.mp4" #@param{type:"string"}
skeleton_video_path = "./outputSkeleton/1114/1114.mp4" #@param{type:"string"}
gif_path = "./outputInference/stable-diffusion-v1-4/1114/inference/sample-1000-33-2023-11-17 06:18:27.095904/Iron man on the beach.gif" #@param{type:"string"}
width = 512 #@param {type:"integer"}
height = 512 #@param {type:"integer"}
output_super_impose_path = "./Combine.mp4" #@param {type:"string"}

#Create an EditVideo object
editVideo = EditVideo(original_video_path, skeleton_video_path, gif_path)

#Resize original and skeleton video to 512 by 512
editVideo.resize(width, height)

#Overlay skeleton video onto original and gif video
editVideo.perform_superimpose()

#Place both edited original and edited gif side by side and save it
editVideo.combine_and_save_videos(output_super_impose_path)

#Display the saved side by side super-imposed video
editVideo.preview(output_super_impose_path)

/content/ICT3104Project/FollowYourPose
Original Before: width:512 height:512
Skeleton Before: width:512 height:512
Original After: width:512 height:512
Skeleton After: width:512 height:512
Set Original Video fps:24.0 -> 8
Original Video fps successfully set to 8
Set Skeleton Video fps:24.0 -> 8
Skeleton Video fps successfully set to 8
Original Before: width:512 height:512
Skeleton Before: width:512 height:512
Original After: width:512 height:512
Skeleton After: width:512 height:512
Moviepy - Building video ./Combine.mp4.
Moviepy - Writing video ./Combine.mp4





Moviepy - Done !
Moviepy - video ready ./Combine.mp4
Moviepy - Building video __temp__.mp4.
Moviepy - Writing video __temp__.mp4



                                                            

Moviepy - Done !
Moviepy - video ready __temp__.mp4
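
The log above shows the original and skeleton videos being changed from 24 fps to 8 fps so that they line up with the GIF. One common way to lower a frame rate is to keep only the source frame nearest to each output timestamp. The sketch below illustrates that idea under stated assumptions: `downsample_indices` is a hypothetical helper, and EditVideo may use a different method internally.

```python
from typing import List


def downsample_indices(num_frames: int, src_fps: float, dst_fps: float) -> List[int]:
    """Return the source frame indices kept when lowering the frame rate."""
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    duration = num_frames / src_fps      # clip length in seconds
    num_out = int(duration * dst_fps)    # number of frames in the output clip
    step = src_fps / dst_fps             # source frames per output frame
    # Clamp to the last valid index to guard against rounding at the end
    return [min(int(i * step), num_frames - 1) for i in range(num_out)]


print(downsample_indices(24, 24.0, 8))
# [0, 3, 6, 9, 12, 15, 18, 21]
```

For a one-second clip at 24 fps, every third frame is kept, which yields the 8 frames per second reported in the log.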




In [None]:
#@title Video Trimming by Frame (Optional)

%cd $home_path
from EditVideo import EditVideo

video_path = "./data/1114/1114.mp4" #@param{type:"string"}
start_frame = 0 #@param {type:"integer"}
end_frame = 10 #@param {type:"integer"}
output_path = "./TrimmedByFrame.mp4" #@param {type:"string"}

#Create an EditVideo object from the video path set in the form above
editVideo = EditVideo(video_path)

#Trim the video
editVideo.trimByFrame(start_frame, end_frame)
editVideo.save(output_path)

In [None]:
#@title Video Trimming by seconds (Optional)

%cd $home_path
from EditVideo import EditVideo

video_path = "./data/1114/1114.mp4" #@param{type:"string"}
start_seconds = 0 #@param {type:"number"}
end_seconds = 2 #@param {type:"number"}
output_path = "./TrimmedBySeconds.mp4" #@param {type:"string"}

#Create an EditVideo object from the video path set in the form above
editVideo = EditVideo(video_path)

#Trim the video
editVideo.trimBySeconds(start_seconds, end_seconds)
editVideo.save(output_path)

# **Evaluation Metric**
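FID (Fréchet Inception Distance): a metric for quantifying the realism and diversity of images produced by generative models such as generative adversarial networks (GANs). It compares the distribution of Inception-network features of the generated images with that of real images; a lower score means the two distributions are closer.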

User Story 38: As a USER, I want to have an evaluation metric so that I can use it to gauge the performance of the resultant video.
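
For reference, the score computed by `calculate_fid` in the cell below corresponds to the standard FID formula, where $\mu_1, \Sigma_1$ are the mean and covariance of the real features and $\mu_2, \Sigma_2$ those of the generated features. The symbols follow the variable names `mu1`, `sigma1`, `mu2`, `sigma2` in the code:

```latex
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2} \right)
```

Note that a covariance matrix estimated from $N$ frames has rank at most $N - 1$. With only a few dozen frames and 2048-dimensional pooled InceptionV3 features, both covariance estimates are rank-deficient, so the matrix square root can be numerically unstable and the resulting score should be read with caution.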

In [None]:
%cd $home_path

import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
from scipy.linalg import sqrtm
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
import cv2
from PIL import Image, ImageSequence
from EditVideo import EditVideo

# Functions to extract features from videos and GIFs
def extract_features_from_video(video_path, model, target_size=(299, 299)):
    cap = cv2.VideoCapture(video_path)
    frame_features = []

    # Use model.predict, as in the GIF function below, so that both feature sets
    # come from the same loaded weights. Running global_variables_initializer()
    # in a fresh session can reset the model's variables to their initial values
    # instead of the loaded weights.
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, target_size)
        frame = preprocess_input(frame.astype(np.float32))
        frame = np.expand_dims(frame, axis=0)
        features = model.predict(frame)
        frame_features.append(features)

    # Release the video file handle once all frames are read
    cap.release()
    return frame_features

def extract_features_from_gif(gif_path, model, target_size=(299, 299)):
    gif = Image.open(gif_path)
    frames = []

    for frame in ImageSequence.Iterator(gif):
        frame = frame.convert('RGB')
        frame = frame.resize(target_size)
        frame = np.array(frame)
        frame = preprocess_input(frame)
        frame = np.expand_dims(frame, axis=0)
        features = model.predict(frame)
        frames.append(features)

    return frames

# Calculate the FID between two sets of feature vectors
def calculate_fid(real_features, generated_features):
    mu1, sigma1 = real_features.mean(axis=0), np.cov(real_features, rowvar=False)
    mu2, sigma2 = generated_features.mean(axis=0), np.cov(generated_features, rowvar=False)

    # Calculate the FID score
    sum_squared_diff = np.sum((mu1 - mu2)**2)
    cov_mean_sqrt = sqrtm(sigma1.dot(sigma2))
    if np.iscomplexobj(cov_mean_sqrt):
        cov_mean_sqrt = cov_mean_sqrt.real

    fid = sum_squared_diff + np.trace(sigma1 + sigma2 - 2.0 * cov_mean_sqrt)

    return max(fid, 0)  # Clamp the FID to be non-negative

# Paths to real video (MP4) and generated GIF
real_video_path = './data/X8AP2/X8AP2.mp4' #@param{type:"string"}
generated_gif_path = './outputInference/stable-diffusion-v1-4/X8AP2/inference/sample-0-33-2023-11-06/IronmanOntheBeach.gif' #@param{type:"string"}

# Change the FPS of the original video to 8 to match the GIF
editVideo = EditVideo(real_video_path)
editVideo.set_fps(8)
editVideo.save("./Original_FID.mp4")
real_video_path = "./Original_FID.mp4"

print("Starting FID calculation")

# Load the InceptionV3 model (you may need to install Keras and TensorFlow)
model = InceptionV3(include_top=False, pooling='avg')

# Extract features from real video (MP4) and generated GIF
real_features = extract_features_from_video(real_video_path, model)
generated_features = extract_features_from_gif(generated_gif_path, model)

# Calculate the FID
fid_score = calculate_fid(np.concatenate(real_features), np.concatenate(generated_features))
print(f'FID between real video and generated GIF: {fid_score}')


/content/ICT3104Project/FollowYourPose
FID between real video and generated GIF: 1.2586505645766333e+126
