# Drone Control Using Gestures


### Setting Up
This is the main file and should be run on the Jetson Nano.
To tunnel Jupyter Notebook from the Nano to your laptop, follow [this guide](https://www.digitalocean.com/community/tutorials/how-to-install-run-connect-to-jupyter-notebook-on-remote-server).

We use Jupyter Notebook so that we can stream the camera input and our model's outputs to the laptop. Make sure both the Jetson Nano and your laptop are on the same Wi-Fi network. This is especially important for communicating gestures between the two.

If you haven't wiped the Nano, everything should already be installed. Otherwise:

- Install Python packages 
    - cv2 - version 4.1.1
    - PIL - version 6.2.2
    - numpy - version 1.17.1
    
- Build and Install Tensorflow Lite Runtime 2.4.0
    - https://qengineering.eu/install-tensorflow-2-lite-on-jetson-nano.html
    - This might be a pain, but it is a crucial step.

In [50]:
import os
from collections import defaultdict
import socket
import time
import re
import numpy as np
import PIL
from PIL import Image
from PIL import ImageDraw
import io
from IPython.display import display
from IPython.display import clear_output
import ipywidgets
from base64 import b64decode , b64encode
import cv2
import pathlib

In [13]:
import json

In [14]:
cv2.__file__

'/usr/lib/python3.6/dist-packages/cv2/python-3.6/cv2.cpython-36m-aarch64-linux-gnu.so'

In [15]:
cv2.__version__

'4.1.1'

In [16]:
PIL.__version__

'6.2.2'

In [17]:
np.__version__

'1.17.1'

### Creating GStreamer Pipeline

We use GStreamer, a pipeline-based multimedia
framework that links together a wide variety of media processing systems to complete
complex workflows. For instance, GStreamer can be used to build a system that reads
files in one format, processes them, and exports them in another. The formats and
processes can be swapped in a plug-and-play fashion.

[Note]

NVIDIA GStreamer is pre-installed on the Jetson Nano.

To test that it is available and working as expected, you can run the following from an SSH terminal on the Jetson Nano:

`gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=3280, height=2464, format=NV12, framerate=30/1' ! fakesink`

[End Note]


We use GStreamer for two broad tasks:

1. **Moving Image Frames from GPU Cache to Shared Memory**

    Once captured by the camera, image frames are placed in the GPU cache. We move them to shared system memory so that our Python programs can read them.


2. **Image Pre-processing**
    
    a. We pre-process the image stream by resizing frames to the size the machine learning model needs.
    
    b. We also flip the image and adjust the frame rate.
    
    c. We lower the frame rate because the models we use are not meant for the high frame rates the camera supports, such as 60 fps or 120 fps.

![image](https://user-images.githubusercontent.com/6872080/118668202-a84ba400-b7c2-11eb-9c81-8e551706ab62.png)


More on GStreamer is in the NVIDIA development guide [here](https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/accelerated_gstreamer.html).
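
To see how the stages line up before handing a pipeline string to OpenCV, a small sketch like the one below can split the description on `!` and list its elements. The helper name `pipeline_stages` and the example string are illustrative assumptions; the element names are the ones used in this notebook.

```python
def pipeline_stages(pipeline):
    """Split a gst-launch style description into its '!'-separated stages."""
    # Collapse newlines and repeated spaces so multi-line strings print cleanly.
    return [" ".join(stage.split()) for stage in pipeline.split("!") if stage.strip()]


# Illustrative pipeline string modelled on the one built in this notebook.
example = """nvarguscamerasrc !
    video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)21/1 !
    nvvidconv flip-method=2 ! video/x-raw, width=(int)640, height=(int)480, format=BGRx !
    videoconvert ! video/x-raw, format=(string)BGR !
    appsink"""

for number, stage in enumerate(pipeline_stages(example), start=1):
    print(number, stage)
```

Reading the stages in order makes the data flow explicit: camera source, NVMM caps, `nvvidconv` (flip and resize), BGRx caps, `videoconvert`, BGR caps, and finally `appsink`, where OpenCV picks up the frames.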

In [18]:

SRC_WIDTH ,SRC_HEIGHT  =1280,720

def gstreamer_pipeline (capture_width=1280, capture_height=720 , display_width=640, 
     display_height=480, framerate=21, flip_method=2) :   
     return f"""nvarguscamerasrc ! 
     video/x-raw(memory:NVMM),width=(int){capture_width}, height=(int){capture_height}, format=(string)NV12, framerate=(fraction){framerate}/1 !
     nvvidconv  flip-method={flip_method} ! video/x-raw,width=(int){display_width}, height=(int){display_height},  format=BGRx ! 
     videoconvert ! video/x-raw,format=(string)BGR !
     appsink wait-on-eos=false max-buffers=2 drop=True
     """
    #
    
camera2 = cv2.VideoCapture(gstreamer_pipeline(display_width = SRC_WIDTH , display_height=SRC_HEIGHT))#, cv2.CAP_GSTREAMER)

In [19]:
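def cam_read():
    # Read one frame; `ok` is False and `img` is None when the capture fails.
    ok, img = camera2.read()
    if not ok or img is None:
        return False, None
    # OpenCV delivers BGR frames; convert to RGB for PIL and the model.
    return ok, cv2.cvtColor(img, cv2.COLOR_BGR2RGB)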

### Camera Test

To view the camera feed from within Jupyter Notebook, we will leverage *ipywidgets*.

In [20]:
image_widget = ipywidgets.Image(format='jpg' , height=256 ,width=256)
display(image_widget)

Image(value=b'', format='jpg', height='256', width='256')

The code below will create an infinite loop and stream images from the camera to the container created above. To exit the loop use

*kernel -> interrupt* option from the toolbar

In [21]:
try:
    while True:
        image_widget.value =  cv2.imencode('.jpg',cam_read()[1])[1].tobytes()
except KeyboardInterrupt:
    print("Breaking")

Breaking


## Loading Pose Estimation Model

We use *TensorFlow Lite* for our pose estimation model.

The appeal of these *lite* models is that they are heavily compressed in size yet still reasonably efficient.

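We use `wget` to download the model. Note that the command below passes both `-O` and `-P`; `-O` takes precedence, so the file is saved as `test_model_05.tflite` in the notebook's current working directory. Run the notebook from `/home/unccv/drone_project`, or move the file there, so that the path used when loading the model resolves.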

Alternatively, you can download the model from [here](http://59.36.11.51/dataset/workspace/mindspore_dataset/mslite/models/hiai/posenet_mobilenet_float_075_1_default_1.tflite)
and place it in the `/home/unccv/drone_project` directory.

You can also place it anywhere else, as long as you update the path below to match.



In [27]:
import tflite_runtime.interpreter as tflite

In [35]:
import tflite_runtime

In [36]:
tflite_runtime.__version__

'2.5.0'

In [28]:
import pathlib
import os

In [29]:
path = pathlib.Path("/home/unccv/drone_project")

In [25]:
!wget -O test_model_05.tflite http://59.36.11.51/dataset/workspace/mindspore_dataset/mslite/models/hiai/posenet_mobilenet_float_075_1_default_1.tflite -P /home/unccv/drone_project

--2021-05-18 10:32:23--  http://59.36.11.51/dataset/workspace/mindspore_dataset/mslite/models/hiai/posenet_mobilenet_float_075_1_default_1.tflite
Connecting to 59.36.11.51:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5048148 (4.8M)
Saving to: ‘test_model_05.tflite’


2021-05-18 10:32:28 (1.07 MB/s) - ‘test_model_05.tflite’ saved [5048148/5048148]



In [26]:
#interpreter = tflite.Interpreter(model_path=os.path.join(path , "posenet_mobilenet_float_075_1_default_1.tflite"))
interpreter = tflite.Interpreter(model_path=os.path.join(path , "test_model_05.tflite"))

## Testing loaded Model

Below we run a simple test to check that the model loaded correctly. We also inspect the model's `inputs` and `outputs`.

In [31]:
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
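
The detail dictionaries returned by the interpreter are verbose. A sketch like the following condenses each entry to its name, shape, and dtype. The helper `summarize_details` is hypothetical, and the stand-in list below copies the input values printed in this notebook, with Python's `float` standing in for `numpy.float32` so the example runs without TensorFlow Lite.

```python
def summarize_details(details):
    """Return (name, shape, dtype name) for each tensor description."""
    summary = []
    for tensor in details:
        shape = tuple(int(dim) for dim in tensor["shape"])
        # numpy dtypes such as numpy.float32 expose __name__; fall back to str().
        dtype = getattr(tensor["dtype"], "__name__", str(tensor["dtype"]))
        summary.append((tensor["name"], shape, dtype))
    return summary


# Stand-in for interpreter.get_input_details(); `float` replaces numpy.float32 here.
fake_input_details = [
    {"name": "sub_2", "index": 97, "shape": [1, 353, 257, 3], "dtype": float},
]
print(summarize_details(fake_input_details))
```

On the Nano, the same helper could be called as `summarize_details(input_details)` and `summarize_details(output_details)`.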

In [32]:
input_details

[{'name': 'sub_2',
  'index': 97,
  'shape': array([  1, 353, 257,   3], dtype=int32),
  'shape_signature': array([  1, 353, 257,   3], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

The `shape_signature` field above is `[1, 353, 257, 3]`, so the model expects a batch of one 353x257 (height x width) image with three `rgb` channels.
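
As a quick illustration of that input layout, the sketch below turns an RGB frame into a tensor of the expected shape. It assumes the frame has already been resized to 353x257 and uses the same 127.5 mean and scale that the `Posenet` class in this notebook applies; the function name `to_model_input` and the random frame are illustrative only.

```python
import numpy as np


def to_model_input(rgb_image, mean=127.5, std=127.5):
    """Scale an HxWx3 uint8 RGB array to float32 in [-1, 1] and add a batch axis."""
    batch = np.expand_dims(rgb_image, axis=0)  # (H, W, 3) -> (1, H, W, 3)
    return (batch.astype(np.float32) - mean) / std


# Synthetic frame already at the model's input size (height 353, width 257).
frame = np.random.randint(0, 256, size=(353, 257, 3), dtype=np.uint8)
input_data = to_model_input(frame)
print(input_data.shape, input_data.dtype)
```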

In [16]:
output_details

[{'name': 'float_heatmaps',
  'index': 93,
  'shape': array([ 1, 23, 17, 17], dtype=int32),
  'shape_signature': array([ 1, 23, 17, 17], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'float_short_offsets',
  'index': 96,
  'shape': array([ 1, 23, 17, 34], dtype=int32),
  'shape_signature': array([ 1, 23, 17, 34], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'float_mid_offsets',
  'index': 94,
  'shape': array([ 1, 23, 17, 64], dtype=int32),
  'shape_signature': array([ 1, 23, 17, 64], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': a

To make sense of these tensors, you need to understand how the model works.

[Here](https://blog.tensorflow.org/2019/08/track-human-poses-in-real-time-on-android-tensorflow-lite.html) is a TensorFlow blog post on pose estimation with `TensorFlow Lite`. We port their code to Python below.

In [33]:
HEIGHT, WIDTH  = input_details[0]["shape"][1:3]

In [34]:
print(WIDTH);print(HEIGHT)

257
353


Let's create some helper classes to use with the pose estimation model.

1. `BodyPart` class: names the keypoints so the model's outputs can be interpreted.
2. `Position` class: represents the x, y position of an identified keypoint.
3. `KeyPoint` class: represents an identified keypoint, such as an elbow or the nose.
4. `Person` class: stores all the keypoints of the identified person.

In [38]:
import enum

class BodyPart(enum.IntEnum):
    __order__ = "NOSE LEFT_EYE RIGHT_EYE LEFT_EAR RIGHT_EAR LEFT_SHOULDER RIGHT_SHOULDER LEFT_ELBOW RIGHT_ELBOW LEFT_WRIST RIGHT_WRIST LEFT_HIP RIGHT_HIP LEFT_KNEE RIGHT_KNEE LEFT_ANKLE RIGHT_ANKLE"
    NOSE = 0
    LEFT_EYE = 1
    RIGHT_EYE= 2
    LEFT_EAR= 3
    RIGHT_EAR= 4
    LEFT_SHOULDER= 5
    RIGHT_SHOULDER = 6
    LEFT_ELBOW = 7
    RIGHT_ELBOW = 8
    LEFT_WRIST= 9
    RIGHT_WRIST= 10
    LEFT_HIP= 11
    RIGHT_HIP= 12
    LEFT_KNEE= 13
    RIGHT_KNEE = 14
    LEFT_ANKLE = 15
    RIGHT_ANKLE = 16

class Position:
    def __init__(self, x=0,y=0):
        self.x = x
        self.y = y
    
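class KeyPoint:
    def __init__(self, bodypart=BodyPart.NOSE, position=None, score=0.0):
        self.bodyPart = bodypart
        # Create a fresh Position per instance instead of sharing one default object.
        self.position = position if position is not None else Position()
        self.score = score

    def toJson(self):
        return json.dumps(self, default=lambda o: o.__dict__)

class Person:
    def __init__(self, keypoints=None, score=0.0, bodyScore=0.0):
        # Avoid sharing one mutable default list across Person instances.
        self.keyPoints = keypoints if keypoints is not None else []
        self.score = score
        self.bodyScore = bodyScore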

Below is the core logic for decoding a `Person` from the model's output. It is based on the blog post above.

<img src="https://user-images.githubusercontent.com/6872080/118673031-b00d4780-b7c6-11eb-958a-28441874fcfb.png" width="50%"/>


#### Posenet Paper is [here](https://arxiv.org/abs/1803.08225)

In [39]:
class Posenet:

    def __init__(self,model_path="posenet_model.tflite"):
        self.lastInferenceTimeNanos = -1
        self.interpreter = None
        self.gpuDelegate = None
        self.model_path = model_path
        self.NUM_LITE_THREADS  = 4


    def getInterpreter(self):
        if self.interpreter is not None:
            return self.interpreter
        interpreter = tflite.Interpreter(model_path=self.model_path , num_threads = self.NUM_LITE_THREADS)
        interpreter.allocate_tensors()
        self.input_details = interpreter.get_input_details()
        self.output_details = interpreter.get_output_details()
        self.interpreter = interpreter
        return interpreter

    def close(self):
        # The TFLite Python Interpreter has no close() method; dropping the reference releases it.
        self.interpreter = None

    def sigmoid(self , x):
        return (1 / (1 + np.exp(-x)))

    def getKeyPointLocations(self, heatmaps):
        height , width , numKeyPoints = heatmaps.shape
        keypointPositions = [None]*numKeyPoints
        for keypoint in range(numKeyPoints):
            maxVal  = heatmaps[0][0][keypoint ]
            maxRow  , maxCol = 0,0
            for row in range(height):
                for col in range(width):
                     if (heatmaps[row][col][keypoint] > maxVal):
                         maxVal = heatmaps[row][col][keypoint]
                         maxRow = row
                         maxCol = col

            keypointPositions[keypoint] = (maxRow, maxCol)

        return keypointPositions

    def getConfidenceScores(self,heatmaps ,offsets,keypointPositions , height , width, HEIGHT , WIDTH):
        numKeyPoints = len(keypointPositions)
        xCoords = np.zeros(numKeyPoints)
        yCoords = np.zeros(numKeyPoints)
        confidenceScores  = np.zeros(numKeyPoints)

        for idx ,position in enumerate(keypointPositions):
            positionY  = keypointPositions[idx][0]
            positionX = keypointPositions[idx][1]
            yCoords[idx] = int( position[0] / float(height - 1) * HEIGHT + offsets[positionY][positionX][idx])
            xCoords[idx] = int( position[1] / float(width - 1) * WIDTH + offsets[positionY][positionX][idx + numKeyPoints])
            confidenceScores[idx] = self.sigmoid(heatmaps[positionY][positionX][idx])

        return xCoords , yCoords , confidenceScores

    def getPersonDetails(self , numKeyPoints , xCoords , yCoords,confidenceScores):
        person = Person()
        keypointList = []
        totalScore = 0
        bodyScore = 0
        for idx,it in enumerate(BodyPart):
            kp = KeyPoint()
            kp.bodyPart = it
            kp.position = Position(xCoords[idx],yCoords[idx]) 
            kp.score  = confidenceScores[idx]
            keypointList.append(kp)
            
            if idx > 4:
                bodyScore += confidenceScores[idx]
            totalScore += confidenceScores[idx]

        person.keyPoints = keypointList
        person.score = totalScore / numKeyPoints
        #print(bodyScore)
        person.bodyScore = bodyScore / (numKeyPoints - 5.0)
        return person

    def estimateSinglePose(self, image):
        self.getInterpreter()
        
        HEIGHT, WIDTH  = self.input_details[0]["shape"][1:3]
        input_data = np.expand_dims(image.resize((WIDTH ,HEIGHT) , Image.HAMMING), axis=0)
        input_mean , input_std = 127.5  ,127.5
        input_data = (np.float32(input_data) - input_mean) / input_std

        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()

        heatmaps  = self.interpreter.get_tensor(self.output_details[0]['index'])
        heatmaps  = np.squeeze(heatmaps)

        offsets   = self.interpreter.get_tensor(self.output_details[1]['index'])
        offsets   = np.squeeze(offsets )

        height , width , numKeyPoints = heatmaps.shape

        keypointPositions = self.getKeyPointLocations(heatmaps )
        
        xCoords , yCoords , confidenceScores = (self.getConfidenceScores(heatmaps
                                                    , offsets
                                                    ,keypointPositions
                                                    , height
                                                    , width
                                                    , HEIGHT
                                                    , WIDTH))
        
        
        #print(xCoords , yCoords,confidenceScores)
        
        person = self.getPersonDetails( numKeyPoints , xCoords , yCoords,confidenceScores)
        return person
    
    def getDrawnImage(self, image):
        person = self.estimateSinglePose(image)
        # Use the model's own input size rather than the notebook-level globals.
        HEIGHT, WIDTH = self.input_details[0]["shape"][1:3]
        out_img = np.array(image.resize((WIDTH ,HEIGHT)))
        for keypoint in person.keyPoints:
            out_img = cv2.circle( out_img , (int(keypoint.position.x) , int(keypoint.position.y)) , 10 , (42, 157, 143))
        return out_img

In [40]:
#pnet = Posenet(os.path.join(path , "posenet_mobilenet_float_075_1_default_1.tflite"))
pnet = Posenet(os.path.join(path , "test_model_05.tflite"))
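
The nested loops in `getKeyPointLocations` and the per-keypoint arithmetic in `getConfidenceScores` can also be written with NumPy indexing. The sketch below is one possible vectorised variant, not the notebook's own method; `decode_single_pose` and the synthetic tensors are assumptions made for illustration. It keeps the same conventions: y offsets in the first 17 channels, x offsets in the next 17, and a sigmoid over the raw heatmap value.

```python
import numpy as np


def decode_single_pose(heatmaps, offsets, input_height, input_width):
    """Return x, y (input-image pixels) and confidence score per keypoint."""
    grid_h, grid_w, num_keypoints = heatmaps.shape
    # Flatten the spatial grid and pick the strongest cell for each keypoint channel.
    flat_best = heatmaps.reshape(-1, num_keypoints).argmax(axis=0)
    rows, cols = np.unravel_index(flat_best, (grid_h, grid_w))
    kp = np.arange(num_keypoints)
    # Map grid cells to image coordinates, then add the per-cell offset correction.
    y = rows / (grid_h - 1) * input_height + offsets[rows, cols, kp]
    x = cols / (grid_w - 1) * input_width + offsets[rows, cols, kp + num_keypoints]
    scores = 1.0 / (1.0 + np.exp(-heatmaps[rows, cols, kp]))  # sigmoid
    return x.astype(int), y.astype(int), scores


# Synthetic tensors with the shapes reported above (batch axis already squeezed).
heatmaps = np.full((23, 17, 17), -10.0)
offsets = np.zeros((23, 17, 34))
heatmaps[11, 8, 0] = 5.0    # strong NOSE response at grid cell (row 11, col 8)
offsets[11, 8, 0] = 2.5     # y offset for keypoint 0
offsets[11, 8, 17] = -1.5   # x offset for keypoint 0

x, y, scores = decode_single_pose(heatmaps, offsets, 353, 257)
print(x[0], y[0], round(float(scores[0]), 4))
```

For the synthetic NOSE peak, the grid cell maps to y = 11/22 * 353 + 2.5 = 179 and x = 8/16 * 257 - 1.5 = 127, which is the same arithmetic the loop-based methods perform one keypoint at a time.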

## Drawing Estimated Positions

Once we have extracted the positions, we draw them back onto the input image along with a stick figure. We use a confidence threshold of 50%: if the model is not confident about a point, we don't draw it.

To get the stick figure, we use `OpenCV`'s `line` function to draw lines between the points.

In [42]:
class StickMan:
    
    def lineBetweenPoints(self,image,pointA , pointB):
        if pointA.score > 0.5 and pointB.score > 0.5:
            return cv2.line(image 
                            , (int(pointA.position.x) , int(pointA.position.y)) 
                            , (int(pointB.position.x) , int(pointB.position.y))
                           , (42, 157, 143) , 2)
        return image
    
    def get_val(self , key_point):
        return key_point.position if key_point.score > 0.5 else Position(np.nan,np.nan)
    
    def get_bounding_box(self,person):
        keypoints = person.keyPoints
        xmin = self.get_val(keypoints[int(BodyPart.LEFT_WRIST)]).x
        xmax = self.get_val(keypoints[int(BodyPart.RIGHT_WRIST)]).x
        ymax = self.get_val(keypoints[int(BodyPart.RIGHT_ANKLE)]).y
        ymin = self.get_val(keypoints[int(BodyPart.NOSE)]).y
        return [xmin , ymin , xmax , ymax]
        
    def draw(self ,image , person , direction=None):
        keypoints = person.keyPoints
        #print(keypoints)
        out_img = np.array(image.resize((WIDTH ,HEIGHT)))
        for keypoint in person.keyPoints:
            if keypoint.score > 0.5:
                out_img = cv2.circle( out_img , (int(keypoint.position.x) , int(keypoint.position.y)) , 5 , (251, 133, 0) , -1)
        
#         font = cv.FONT_HERSHEY_SIMPLEX
#         cv.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv.LINE_AA)
        out_img = cv2.putText(out_img , str(person.score) , (20,20) 
                             , cv2.FONT_HERSHEY_SIMPLEX ,0.5, (0, 0, 0), 1, cv2.LINE_AA)
        
        if direction is not None:
            out_img = cv2.putText(out_img , direction , (20,40) 
                             , cv2.FONT_HERSHEY_SIMPLEX ,0.5, (0, 0, 0), 1, cv2.LINE_AA)
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.LEFT_WRIST)] , keypoints[int(BodyPart.LEFT_ELBOW)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.LEFT_ELBOW)] , keypoints[int(BodyPart.LEFT_SHOULDER)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.LEFT_SHOULDER)] , keypoints[int(BodyPart.RIGHT_SHOULDER)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.RIGHT_SHOULDER)] , keypoints[int(BodyPart.RIGHT_ELBOW)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.RIGHT_ELBOW)] , keypoints[int(BodyPart.RIGHT_WRIST)])
        
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.LEFT_HIP)] , keypoints[int(BodyPart.LEFT_KNEE)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.LEFT_KNEE)] , keypoints[int(BodyPart.LEFT_ANKLE)])
        
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.RIGHT_HIP)] , keypoints[int(BodyPart.RIGHT_KNEE)])
        out_img =  self.lineBetweenPoints(out_img , keypoints[int(BodyPart.RIGHT_KNEE)] , keypoints[int(BodyPart.RIGHT_ANKLE)])
        
        return out_img
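
The repeated `lineBetweenPoints` calls above encode a fixed list of limb connections. One way to make that list explicit is to keep it as data and filter it by confidence, as in the sketch below. `Point`, `SKELETON_EDGES`, and `drawable_edges` are hypothetical names, and the sketch returns pixel segments instead of drawing so that it runs without OpenCV.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y score")  # hypothetical stand-in for KeyPoint

# The same limb connections that StickMan.draw spells out call by call.
SKELETON_EDGES = [
    ("LEFT_WRIST", "LEFT_ELBOW"), ("LEFT_ELBOW", "LEFT_SHOULDER"),
    ("LEFT_SHOULDER", "RIGHT_SHOULDER"),
    ("RIGHT_SHOULDER", "RIGHT_ELBOW"), ("RIGHT_ELBOW", "RIGHT_WRIST"),
    ("LEFT_HIP", "LEFT_KNEE"), ("LEFT_KNEE", "LEFT_ANKLE"),
    ("RIGHT_HIP", "RIGHT_KNEE"), ("RIGHT_KNEE", "RIGHT_ANKLE"),
]


def drawable_edges(points, threshold=0.5):
    """Return pixel segments for edges whose two endpoints both exceed the threshold."""
    segments = []
    for name_a, name_b in SKELETON_EDGES:
        a, b = points.get(name_a), points.get(name_b)
        if a is None or b is None:
            continue
        if a.score > threshold and b.score > threshold:
            segments.append(((int(a.x), int(a.y)), (int(b.x), int(b.y))))
    return segments


points = {
    "LEFT_WRIST": Point(137.0, 137.0, 0.9),
    "LEFT_ELBOW": Point(143.0, 118.0, 0.8),
    "LEFT_SHOULDER": Point(137.0, 80.0, 0.3),  # below threshold, so its edges are skipped
}
print(drawable_edges(points))
```

Each returned segment could then be passed to `cv2.line` with the colour and thickness used in `lineBetweenPoints`.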

Next we test the model. We first create an `Image` widget and then, inside an infinite loop, run captured images through the model and display the results in the widget.

In [43]:
image_widget = ipywidgets.Image(format='jpg' , height=HEIGHT ,width=WIDTH)
display(image_widget)

Image(value=b'', format='jpg', height='353', width='257')

Again, to break out of the loop below, use `kernel -> interrupt`.

In [44]:
stick_man = StickMan()
try:
    while True:
        img = cam_read()[1]
        if img is None:
            break
            
        pil_img = Image.fromarray(img)
        person = pnet.estimateSinglePose(pil_img)
        #if person.bodyScore > 0.5:
        img = stick_man.draw(pil_img,person)
        
        image_widget.value = cv2.imencode('.jpg',img)[1].tobytes()
        clear_output(wait=True)
        print(person.bodyScore)
except KeyboardInterrupt:
    print("Breaking")


0.015070840181832851
Breaking


## Generic Image Pipeline

We keep repeating the same steps: create a display widget, then run a loop. We encapsulate all of that in an `ImagePipeline` class, which we will use later to set up this boilerplate quickly.

In [45]:
class ImagePipeline:
    
    def __init__(self ,camera , create_display=True , height=256 , width=256):
        self.camera = camera
        self.create_display = create_display
        if self.create_display:
            self.create_display_frame(height , width)
    
    def create_display_frame(self , height , width):
        self.image_widget = ipywidgets.Image(format='jpg' , height=height ,width=width)
        display(self.image_widget)
    
    def _process(self , img , **kwargs):
        return img
        
    def run(self, **kwargs):
        while True:
            img = self.camera.read()[1]
            # Check for a missing frame first: cvtColor raises on None.
            if img is None:
                break
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = self._process(img , **kwargs)
            
            if self.create_display:
                self.image_widget.value = cv2.imencode('.jpg',img)[1].tobytes()
                clear_output(wait=True)
            

# Mapping Keypoints to Gesture

Below we create a rule-based system to identify four simple gestures. These gestures are simple enough to be distinguished by a handful of `if-else` statements.

|-|-|-|-|
|----|----|---|----|
|<img src="https://user-images.githubusercontent.com/6872080/118677163-092aaa80-b7ca-11eb-8f59-8d692e7c9148.png" width="50%">|<img src="https://user-images.githubusercontent.com/6872080/118677572-51e26380-b7ca-11eb-9968-24f2280e9197.png" width="50%" >|<img src="https://user-images.githubusercontent.com/6872080/118677377-38411c00-b7ca-11eb-9a85-d6ebdb8a7adc.png" width="50%">|<img src="https://user-images.githubusercontent.com/6872080/118677650-6161ac80-b7ca-11eb-9241-0cbfc0ea925e.png" width="50%">|



In [47]:
class PoseToDirection:
    
    def x_dist(self,point_a , point_b):
        return abs(point_a.position.x - point_b.position.x)
    
    def is_descend(self, points):
        if np.all([points[c].score < 0.25 for c in ["RIGHT_WRIST" , "LEFT_WRIST" ]]):
            return False
        
        if ((points["RIGHT_WRIST"].position.x > points["RIGHT_SHOULDER"].position.x)
           and (points["LEFT_WRIST"].position.x < points["LEFT_SHOULDER"].position.x)):
            return True
        else:
            return False
    
    def is_left(self, points):
        if np.all([points[c].score < 0.25 for c in ["LEFT_WRIST" , "LEFT_SHOULDER" ]]):
            return False
        if ((points["LEFT_WRIST"].position.x < points["LEFT_SHOULDER"].position.x)
           and (points["LEFT_WRIST"].position.y < points["LEFT_ELBOW"].position.y)):
            return True
        else:
            return False
    
    def is_right(self, points):
        if np.all([points[c].score < 0.25 for c in ["RIGHT_WRIST" , "RIGHT_SHOULDER" ]]):
            return False
        
        if ((points["RIGHT_WRIST"].position.x > points["RIGHT_SHOULDER"].position.x)
            and (points["RIGHT_WRIST"].position.y < points["RIGHT_ELBOW"].position.y)):
            return True
        else:
            return False
    
    def is_ascend(self, points):
        if np.all([points[c].score < 0.25 for c in ["RIGHT_WRIST" , "LEFT_WRIST" ]]):
            return False
        
        if ((points["RIGHT_WRIST"].position.x < points["RIGHT_SHOULDER"].position.x)
           and (points["LEFT_WRIST"].position.x > points["LEFT_SHOULDER"].position.x)):
            return True
        else:
            return False
        
    def keypoints_to_direction(self , estimated_pose):
        points = {}
        for point in estimated_pose.keyPoints:
            # Enum .name is stable across Python versions, unlike parsing str(member).
            points[point.bodyPart.name] = point
        
        print(points)
        if  (self.is_ascend(points)):
            return "ASCEND"
        elif (estimated_pose.score > 0.5 ) and self.is_descend(points):
            return "DESCEND"
        elif self.is_left(points):
            return "LEFT"
        elif self.is_right(points):
            return "RIGHT"
        else:
            return "NONE"
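
To make the rules concrete, here is a small worked example of the `LEFT` check on hand-written keypoints. It mirrors the comparisons in `is_left` but uses plain dictionaries so it runs on its own; the coordinate values are invented for illustration. Keep in mind that image coordinates grow downward, so a smaller `y` means higher in the frame.

```python
def is_left(points, min_score=0.25):
    """Mirror of the LEFT rule: wrist x below shoulder x, and wrist above elbow."""
    wrist = points["LEFT_WRIST"]
    elbow = points["LEFT_ELBOW"]
    shoulder = points["LEFT_SHOULDER"]
    # Skip the rule when both the wrist and the shoulder are low-confidence.
    if wrist["score"] < min_score and shoulder["score"] < min_score:
        return False
    # y grows downward in image coordinates, so "<" on y means "higher up".
    return wrist["x"] < shoulder["x"] and wrist["y"] < elbow["y"]


# Invented keypoints: arm raised, wrist above the elbow.
raised_arm = {
    "LEFT_SHOULDER": {"x": 137.0, "y": 80.0, "score": 0.7},
    "LEFT_ELBOW": {"x": 120.0, "y": 60.0, "score": 0.6},
    "LEFT_WRIST": {"x": 110.0, "y": 30.0, "score": 0.6},
}
# Invented keypoints: arm hanging down, wrist directly below the shoulder.
lowered_arm = {
    "LEFT_SHOULDER": {"x": 137.0, "y": 80.0, "score": 0.7},
    "LEFT_ELBOW": {"x": 143.0, "y": 118.0, "score": 0.5},
    "LEFT_WRIST": {"x": 137.0, "y": 137.0, "score": 0.6},
}
print(is_left(raised_arm), is_left(lowered_arm))
```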
        

# Client Server Architecture

Now that we have code that goes from image to gesture, we need to send the identified gestures to our laptop.
For this we use a `client-server` architecture.

**Server:**
The Jetson Nano, with the model running on it, acts as the server and sends
the identified gestures to the client.

**Client:**
A laptop acts as the client, with the drone controller script and Mission Planner
running on it. The drone controller script accepts the stream of identified
gestures and translates them into control commands for the drone. 

![image](https://user-images.githubusercontent.com/6872080/118680434-bb637180-b7cc-11eb-9c77-155c17da5139.png)


In [48]:
class MessengerServer:
    
    def __init__(self , host="localhost" , port=42425):
        self.host = host
        self.port = port
        self.s = socket.socket()
        self.s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.s.bind((host, port))
        
    
    def wait_for_connection(self):
        self.s.listen(1)
        self.connection, addr = self.s.accept()
    
    def send(self , msg):
        self.connection.send(msg.encode("utf-8"))
    
    def close(self):
        self.connection.close()
        
class MessengerClient:
    def __init__(self , host="localhost" , port=42425):
        self.host = host
        self.port = port
        self.s = socket.socket()
    
    def connect(self):
        self.s.connect((self.host,self.port))
    
    def relay_msg(self , function):
        while True:
            data = self.s.recv(1024)
            # recv() returns empty bytes, not None, once the server closes the connection.
            if not data:
                break
            function(data)
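
Because TCP is a byte stream, two gestures sent back to back can arrive in a single `recv` call (for example as `LEFTRIGHT`). The sketch below shows one common remedy, newline-delimited messages. This framing is an assumption added for illustration and is not what `MessengerServer.send` does; `send_gesture` and `read_gestures` are hypothetical helpers, and `socket.socketpair()` stands in for the Nano-to-laptop connection so the example runs on one machine.

```python
import socket


def send_gesture(conn, gesture):
    """Send one gesture terminated by a newline so the receiver can split messages."""
    conn.sendall((gesture + "\n").encode("utf-8"))


def read_gestures(conn):
    """Read until the peer closes, then split the byte stream back into gestures."""
    buffer = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:  # empty bytes means the peer closed the connection
            break
        buffer += chunk
    return [line for line in buffer.decode("utf-8").split("\n") if line]


# A connected pair of sockets stands in for the server and client ends.
server_end, client_end = socket.socketpair()
for gesture in ["LEFT", "LEFT", "ASCEND"]:
    send_gesture(server_end, gesture)
server_end.close()

received = read_gestures(client_end)
client_end.close()
print(received)
```

A long-running client would split on newlines as data arrives instead of waiting for the connection to close, but the framing idea is the same.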
        

## Communicating with the Client


To send the identified direction to the drone controller, we use a batched (queued) approach. We queue all the gestures recognized during a fixed time interval, then send the one with the largest count to the `client`. Once the `client` receives the gesture, it translates it into drone control signals.

In [None]:
class DirectionToControl:
    
    def __init__(self , messenger_server , time_interval = 1):
        self.trace = []
        self.time_stamp = time.time()
        self.messenger = messenger_server
        self.time_interval = time_interval
    
    def find_max(self):
        counts = defaultdict(int)
        current_max = "NONE"
        for i in self.trace:
            counts[i] += 1
            if counts[i] > counts[current_max]:
                current_max = i
        
        return current_max
            
    def track_and_send(self, direction):
        self.trace.append(direction)
        print((time.time() - self.time_stamp) , self.time_interval)
        if (time.time() - self.time_stamp) > self.time_interval:
            print("Sending")
            self.messenger.send(self.find_max())
            self.time_stamp = time.time()
            self.trace = []
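
The counting step in `find_max` is a majority vote over the gestures queued during one interval. A compact way to express the same idea uses `collections.Counter`, as sketched below; `most_frequent_gesture` is a hypothetical helper, and its behaviour when two gestures tie may differ from `find_max`.

```python
from collections import Counter


def most_frequent_gesture(trace, default="NONE"):
    """Return the gesture that occurs most often in the window, or `default` if empty."""
    if not trace:
        return default
    return Counter(trace).most_common(1)[0][0]


# One interval's worth of per-frame classifications (invented for illustration).
window = ["LEFT", "NONE", "LEFT", "RIGHT", "LEFT", "NONE"]
print(most_frequent_gesture(window))  # LEFT
```

Voting over a window smooths out single-frame misclassifications, at the cost of up to one interval of latency before the drone reacts.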

## Complete Pipeline

Below we integrate all the pieces developed so far. We inherit from the `ImagePipeline` class created above and override only the `_process` method.

The pipeline performs the following steps:

1. Convert the received image to a `PIL` image.
2. Extract the keypoints using the `Posenet` pose estimator.
3. Classify the gesture based on the extracted keypoints.
4. Draw the gesture (if enabled via the `create_display` attribute).
5. Send the identified gesture to the `DirectionToControl` class, to be sent to the `client`.

In [49]:
class CompletePipeline(ImagePipeline):
    
    def __init__(self,vid_src , messenger_server , create_display=True , height=256 , width=256):
        super().__init__(vid_src, create_display , height , width)
        
        #self.create_display = create_display
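        # ObjectDetector (and BoundTracker below) must already be defined in the session;
        # they are not part of the cells shown here.
        self.object_detector = ObjectDetector(vid_src , "/tmp/detect.tflite" , "/tmp/coco_labels.txt")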
        self.pose_estimator = Posenet(os.path.join(path , "posenet_mobilenet_float_075_1_default_1.tflite"))
        self.direction_estimator = PoseToDirection()
        self.direction_control = DirectionToControl(messenger_server)
        self.stick_man = StickMan()
        self.bound_tracker = BoundTracker()
        self.itr = 0
        self.trace = []
    
    def _process(self,image , threshold):
        self.itr += 1
        img = None

        pil_img = Image.fromarray(image)
        person = self.pose_estimator.estimateSinglePose(pil_img)
        direction = self.direction_estimator.keypoints_to_direction(person)
        
        if self.create_display:
            img = self.stick_man.draw(pil_img,person , direction)
        
        self.direction_control.track_and_send(direction)
        return img
            
            
    def dump(self , fname):
        d = {"fname" : str(fname) , "data" : self.trace}
        with open(fname ,"w") as fp:
            json.dump(d,fp)

In [34]:
port = 42425
messenger_server = MessengerServer(port = port)

Once you run the code below, the server will wait for a client to connect. Switch to your client machine and run the client scripts.

In [35]:
messenger_server.wait_for_connection()

## Set Up the Messenger Client Before Running the Code Below

Now that your client is set up, run the code below to bring everything together. Your client script should print the series of recognized gestures. If you have hooked up Mission Planner via MAVProxy, you should see the drone move in real time.

In [36]:
cmpipe = CompletePipeline(camera2 ,messenger_server=messenger_server, height=HEIGHT ,width=WIDTH )

Image(value=b'', format='jpg', height='353', width='257')

In [None]:
cmpipe.run(threshold=0.55)

In [62]:
cmpipe.trace

[{'keyPoints': ['{"bodyPart": 0, "position": {"x": 118.0, "y": 26.0}, "score": 0.8567788626791317}',
   '{"bodyPart": 1, "position": {"x": 125.0, "y": 17.0}, "score": 0.891833170924189}',
   '{"bodyPart": 2, "position": {"x": 112.0, "y": 20.0}, "score": 0.8476605656514117}',
   '{"bodyPart": 3, "position": {"x": 134.0, "y": 25.0}, "score": 0.590182399789272}',
   '{"bodyPart": 4, "position": {"x": 104.0, "y": 28.0}, "score": 0.3752255137360918}',
   '{"bodyPart": 5, "position": {"x": 137.0, "y": 80.0}, "score": 0.6957582805034676}',
   '{"bodyPart": 6, "position": {"x": 99.0, "y": 79.0}, "score": 0.9458621737091122}',
   '{"bodyPart": 7, "position": {"x": 143.0, "y": 118.0}, "score": 0.5022092943234557}',
   '{"bodyPart": 8, "position": {"x": 93.0, "y": 116.0}, "score": 0.5801301498336171}',
   '{"bodyPart": 9, "position": {"x": 137.0, "y": 137.0}, "score": 0.27683238004469657}',
   '{"bodyPart": 10, "position": {"x": 83.0, "y": 195.0}, "score": 0.29424426551143823}',
   '{"bodyPart": 