6/29/2023 - Melody

**Note:** This was exported from [Google Colab](https://colab.research.google.com/drive/1VyTkcKwy17Bg6Eblc0wtV9GwJo3oEjU6#scrollTo=sFuyni5jWDlV). Shell and magic command lines need a `!` or `%` prefix to run in a notebook.

I modified it to work on a series of frames. The output is the same series of frames with only the masked (needed) parts kept.


Object masks from prompts with SAM

https://github.com/facebookresearch/segment-anything/blob/main/notebooks/predictor_example.ipynb


Segment Anything

https://segment-anything.com/

### Overall workflow

1. Object Detection (YOLOv8)

**2. Human figure segmentation (Segment Anything)**

3. Pose Estimation (MoveNet)

4. Pose Estimation Correction

5. Define and classify fitness poses

6. Output the text description

The Segment Anything Model (SAM) predicts object masks given prompts that indicate the desired object. The model first converts the image into an image embedding that allows high quality masks to be efficiently produced from a prompt.

The `SamPredictor` class provides an easy interface for prompting the model. The user first sets an image with the `set_image` method, which calculates the necessary image embeddings. Prompts can then be passed to the `predict` method to efficiently predict masks from them. The model accepts both point and box prompts, as well as masks from the previous iteration of prediction.
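
As a rough sketch of the prompt formats described above: `predict` takes point prompts as an N x 2 array of (x, y) pixel coordinates with a matching array of labels (1 for foreground, 0 for background), and a box prompt as four values in XYXY order. The helper names `make_point_prompt` and `make_box_prompt` are hypothetical and not part of `segment_anything`; they only shape and sanity-check the arrays.

```python
import numpy as np

def make_point_prompt(points, labels):
    """Hypothetical helper: return (point_coords, point_labels) arrays for point prompts."""
    # N x 2 array of (x, y) pixel coordinates
    coords = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    # Length-N array: 1 = foreground point, 0 = background point
    labs = np.asarray(labels, dtype=np.int32).reshape(-1)
    if len(coords) != len(labs):
        raise ValueError("need exactly one label per point")
    if not np.isin(labs, (0, 1)).all():
        raise ValueError("labels must be 0 (background) or 1 (foreground)")
    return coords, labs

def make_box_prompt(x0, y0, x1, y1):
    """Hypothetical helper: return a length-4 box array in XYXY pixel format."""
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box must have positive width and height")
    return np.array([x0, y0, x1, y1], dtype=np.float32)

# A single foreground click at x=280, y=400
coords, labs = make_point_prompt([[280, 400]], [1])
box = make_box_prompt(10, 20, 110, 220)
```

The resulting arrays could then be passed as `point_coords`, `point_labels`, and `box` to `predictor.predict`.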

In [None]:
from IPython.display import display, HTML
display(HTML(
"""
<a target="_blank" href="https://colab.research.google.com/github/facebookresearch/segment-anything/blob/main/notebooks/predictor_example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
"""
))

## Environment Set-up

If running locally using Jupyter, first install `segment_anything` in your environment using the [installation instructions](https://github.com/facebookresearch/segment-anything#installation) in the repository. If running from Google Colab, set `using_colab=True` below and run the cell. In Colab, be sure to select 'GPU' under 'Edit'->'Notebook Settings'->'Hardware accelerator'.

In [None]:
using_colab = True

In [None]:
if using_colab:
    import torch
    import torchvision
    print("PyTorch version:", torch.__version__)
    print("Torchvision version:", torchvision.__version__)
    print("CUDA is available:", torch.cuda.is_available())
    import sys
    !{sys.executable} -m pip install opencv-python matplotlib
    !{sys.executable} -m pip install 'git+https://github.com/facebookresearch/segment-anything.git'

    !mkdir images
    !wget -P images https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/truck.jpg
    !wget -P images https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/groceries.jpg

    !wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

### Setup

Necessary imports and helper functions for displaying points, boxes, and masks.

In [None]:
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2

In [None]:
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis = 0)
    else:
        color = np.array([30/255, 144/255, 255/255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

def show_points(coords, labels, ax, marker_size = 375):
    pos_points = coords[labels==1]
    neg_points = coords[labels==0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)

def show_box(box, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0,0,0,0), lw=2))

### Example image

If running in Google Colab, the images need to be uploaded each time.

In [None]:
from __future__ import annotations
import os
from typing import Any, Dict, List
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import cv2
import numpy as np
import matplotlib.pyplot as plt

def show_anns(anns):
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)
    for ann in sorted_anns:
        m = ann['segmentation']
        img = np.ones((m.shape[0], m.shape[1], 3))
        color_mask = np.random.random((1, 3)).tolist()[0]
        for i in range(3):
            img[:,:,i] = color_mask[i]
        ax.imshow(np.dstack((img, m*0.35)))

def write_masks_to_png(masks: List[Dict[str, Any]], image, path: str) -> None:
    # Draw the image with all masks overlaid and save it as masks.png inside `path`
    plt.figure(figsize=(20,20))
    plt.imshow(image)
    show_anns(masks)
    plt.axis('off')
    #plt.show()
    filename = "masks.png"
    plt.savefig(os.path.join(path, filename))

In [None]:
# Melody - 6/30/2023
# Change all the non-masked (non-human) parts to white

def mask_by_white(mask, row, col, img):
  # If the pixel at (row, col) lies outside the mask (mask value is False),
  # set that pixel to white in img (img is modified in place)
  if not mask[row][col]:
    img[row, col] = [255, 255, 255]
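
Calling a function once per pixel is slow on full-size frames. A minimal sketch of the same whitening step with NumPy boolean indexing follows, assuming the mask is a boolean H x W array and the image is an H x W x 3 `uint8` array. The name `whiten_background` is hypothetical, and unlike `mask_by_white` it returns a copy instead of modifying the image in place.

```python
import numpy as np

def whiten_background(image, mask):
    """Return a copy of `image` with every pixel outside `mask` set to white."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must have the same height and width as the image")
    out = image.copy()
    # ~mask selects the non-masked pixels; 255 is broadcast over all three channels
    out[~mask] = 255
    return out

# Tiny demo: a 2 x 3 black image where only two pixels belong to the mask
image = np.zeros((2, 3, 3), dtype=np.uint8)
mask = np.array([[True, False, False],
                 [False, True, False]])
result = whiten_background(image, mask)
```

Pixels inside the mask keep their original colour, and everything else becomes `[255, 255, 255]`.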

In [None]:
# Pixel coordinates (x, y) of the simulated click used as the point prompt
star_x = 280
star_y = 400

The order of the files returned by `os.listdir()` is arbitrary, so the list is sorted before the frames are processed. Please refer to the link below. (7/9/2023 - Melody)

https://stackoverflow.com/questions/66537490/image-data-is-being-stored-in-different-random-order-in-array-after-reading-from
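
A plain `sorted()` on file names compares them as text, which matches frame order only when the numbers are zero-padded (as in `ezgif-frame-001.jpg`). If the frames were named without padding, a natural sort key is one possible fix. The sketch below uses a hypothetical `natural_key` helper and made-up file names.

```python
import re

def natural_key(name):
    """Sort key that compares digit runs as integers, so 'frame-10' sorts after 'frame-2'."""
    # re.split with a capturing group alternates text and digit chunks:
    # even indices are text, odd indices are digit runs
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Made-up, non-padded frame names
frames = ["frame-10.jpg", "frame-2.jpg", "frame-1.jpg"]
text_order = sorted(frames)                      # frame-10 lands before frame-2
natural_order = sorted(frames, key=natural_key)  # frame-1, frame-2, frame-10
```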

Video to JPG: https://ezgif.com/video-to-jpg

Create **images/video_frames** and put the JPG frames in it.

Create **images/output2** for the output frames.
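
The two folders can also be created from Python instead of by hand. This is a minimal sketch with a hypothetical `ensure_frame_dirs` helper; the folder names come from the text above, and the `base` argument is an assumption so the same code works outside Colab.

```python
from pathlib import Path

def ensure_frame_dirs(base="images"):
    """Create <base>/video_frames and <base>/output2 if missing and return both paths."""
    frames_dir = Path(base) / "video_frames"
    output_dir = Path(base) / "output2"
    for folder in (frames_dir, output_dir):
        # parents=True also creates <base>; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
    return frames_dir, output_dir
```

Calling `ensure_frame_dirs()` from the notebook's working directory creates both folders; the JPG frames still have to be copied or uploaded into `images/video_frames` afterwards.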

In [None]:
import sys
from PIL import Image

path = 'images/video_frames/'

sam_checkpoint = "sam_vit_h_4b8939.pth"
model_type = "vit_h"

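# Use the GPU when available, otherwise fall back to the (much slower) CPU
device = "cuda" if torch.cuda.is_available() else "cpu"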

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device = device)

predictor = SamPredictor(sam)

# Change directory to the path
os.chdir(path)

counter = 0

sorted_list = sorted(os.listdir())

for each_frame in sorted_list:

  image = cv2.imread(each_frame)  # e.g. 'ezgif-frame-001.jpg'
  print(each_frame)

  # Get the row and col number
  height, width, channel = image.shape
  #height = 1136
  #width = 640

  image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

  '''
  plt.figure(figsize=(10,10))
  plt.imshow(image)
  plt.axis('on')
  plt.show()
  '''
  sys.path.append("..")

  predictor.set_image(image)

  input_point = np.array([[star_x, star_y]])
  input_label = np.array([1])

  '''
  plt.figure(figsize=(10,10))
  plt.imshow(image)
  show_points(input_point, input_label, plt.gca())
  plt.axis('on')
  plt.show()
  '''
  masks, scores, logits = predictor.predict(
    point_coords = input_point,
    point_labels = input_label,
    multimask_output = True,
  )

  #masks.shape  # (number_of_masks) x H x W

  '''
  for i, (mask, score) in enumerate(zip(masks, scores)):
    plt.figure(figsize = (10,10))
    plt.imshow(image)
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize = 18)
    plt.axis('off')
    plt.show()
  '''

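  # With multimask_output = True, predict returns three candidate masks;
  # the third one (index 2) is used as the human mask here
  human_mask = masks[2]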

  for row in range(height):
    for col in range(width):
      mask_by_white(human_mask, row, col, image)

  # Save the third mask
  #write_masks_to_png(masks, image, "segmented")

  # Save image to file

  new_image = Image.fromarray(image)
  # The folder images/output2 must already exist before saving
  output_path = "/content/images/output2/" + str(counter) + ".jpeg"
  new_image.save(output_path)

  counter += 1
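
The loop above always takes the third candidate mask (`masks[2]`). Since `predict` also returns a quality score per mask, an alternative is to pick the highest-scoring candidate. The sketch below shows that selection on made-up arrays, with `pick_best_mask` as a hypothetical helper. Note that the top-scoring mask may cover only part of the person (clothing, for instance), so the fixed index may well be the better choice for these frames.

```python
import numpy as np

def pick_best_mask(masks, scores):
    """Return (mask, index) for the candidate with the highest score."""
    masks = np.asarray(masks)
    scores = np.asarray(scores)
    if masks.shape[0] != scores.shape[0]:
        raise ValueError("need exactly one score per mask")
    best = int(np.argmax(scores))
    return masks[best], best

# Made-up stand-ins for the (number_of_masks) x H x W masks and their scores
masks = np.zeros((3, 2, 2), dtype=bool)
masks[1, 0, 0] = True
scores = np.array([0.71, 0.93, 0.88])
best_mask, best_index = pick_best_mask(masks, scores)
```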

In [None]:
# Download the folder
!zip -r /content/SAM_output2.zip /content/images/output2

from google.colab import files
files.download("/content/SAM_output2.zip")