# Real-Time Object Detection for Blind People

This project aims to develop an assistive technology solution that enhances the independence and safety of visually impaired individuals. By combining object detection with depth estimation, the system gives real-time audio feedback about the user's surroundings, helping them navigate and interact with their environment more effectively. The process is divided into four steps:

### Step 1: Download and import dependencies

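Install the packages imported below (`transformers`, `torch`, `numpy`, and `opencv-python`, which provides `cv2`), then run the cell. `winsound` is part of the Python standard library on Windows only, so the notebook as written runs on Windows.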

In [None]:
from transformers import DPTImageProcessor, DPTForDepthEstimation, DetrImageProcessor, DetrForObjectDetection
import torch
import numpy as np
import cv2
import winsound
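
On platforms without `winsound`, the import above raises an error. One possible workaround, sketched here as an assumption rather than as part of the original notebook, wraps the beep in a helper that falls back to the terminal bell character. The `beep` helper and its return values are hypothetical names chosen for illustration.

```python
import sys

try:
    import winsound  # standard library, Windows only
except ImportError:
    winsound = None


def beep(frequency=2500, duration=1000):
    """Play a tone via winsound if available, else ring the terminal bell.

    Returns the name of the backend that was used.
    """
    if winsound is not None:
        winsound.Beep(frequency, duration)  # blocks for `duration` milliseconds
        return "winsound"
    # Fallback: the ASCII bell; whether it is audible depends on the terminal.
    sys.stdout.write("\a")
    sys.stdout.flush()
    return "bell"


backend = beep(2500, 200)
```

With a helper like this, the call to `winsound.Beep` in the prediction logic could be swapped for `beep(frequency, duration)`.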

### Step 2: Initialise the models

Run the cell below to load the models. If you have limited compute, browse the Hugging Face model hub (https://huggingface.co/models) for smaller models and replace the model names specified below. A replacement must be compatible with the classes used here: a DPT checkpoint for depth estimation and a DETR checkpoint for object detection.

In [None]:
depth_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
depth_model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
objd_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
objd_model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

### Step 3: Write the Logic

The cell below contains the prediction logic. The function works as follows:
1. Detect all the objects in the image.
2. Estimate a depth map for the whole image.
3. Look at the depth values only inside the bounding box of each detected object.
4. Play a beep if an object is close.
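
Steps 3 and 4 can be sketched in isolation with NumPy alone. This is a minimal illustration, not the notebook's exact code: the helper name `is_object_close`, the synthetic depth map, and the example boxes are made up for the demonstration. The threshold of 200 mirrors the value used in `model_logic`, and larger values are assumed to mean closer.

```python
import numpy as np


def is_object_close(depth_map, box, threshold=200):
    """Return True if any pixel inside `box` reaches `threshold`.

    `depth_map` is a 2-D uint8 array scaled to 0-255 (larger is assumed closer).
    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    height, width = depth_map.shape
    x_min, y_min, x_max, y_max = (int(v) for v in box)
    # Clamp the box to the image so the slice is never out of range.
    x_min, x_max = max(x_min, 0), min(x_max, width)
    y_min, y_max = max(y_min, 0), min(y_max, height)
    region = depth_map[y_min:y_max, x_min:x_max]
    # An empty region (degenerate box) cannot contain a close object.
    return region.size > 0 and bool(region.max() >= threshold)


# Synthetic 100x100 depth map: far background (50) with one near patch (230).
depth_map = np.full((100, 100), 50, dtype=np.uint8)
depth_map[20:40, 30:60] = 230

near_box = (25.0, 15.0, 65.0, 45.0)  # overlaps the near patch
far_box = (70.0, 70.0, 95.0, 95.0)   # background only
print(is_object_close(depth_map, near_box))  # True
print(is_object_close(depth_map, far_box))   # False
```

Clamping the box and checking for an empty region matters because detectors can return coordinates slightly outside the frame, and calling `max()` on an empty array raises an error.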

In [None]:
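def model_logic(image):
    # OpenCV delivers frames in BGR order; both models expect RGB input.
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    height, width = image.shape[:2]
    depth_inputs = depth_processor(images=rgb_image, return_tensors="pt")
    objd_inputs = objd_processor(images=rgb_image, return_tensors="pt")
    # Inference only, so run both models without tracking gradients.
    with torch.no_grad():
        depth_outputs = depth_model(**depth_inputs)
        predicted_depth = depth_outputs.predicted_depth
        objd_outputs = objd_model(**objd_inputs)

    # Resize the depth map back to the frame's resolution.
    prediction = torch.nn.functional.interpolate(
        predicted_depth.unsqueeze(1),
        size=(height, width),
        mode="bicubic",
        align_corners=False,
    )

    objd_target = torch.tensor([[height, width]])
    results = objd_processor.post_process_object_detection(objd_outputs, target_sizes=objd_target, threshold=0.9)[0]
    output = prediction.squeeze().cpu().numpy()
    # Scale to 0-255; clip first because bicubic resizing can overshoot below zero.
    formatted = np.clip(output * 255 / np.max(output), 0, 255).astype("uint8")
    for box in results["boxes"]:
        # Boxes are (x_min, y_min, x_max, y_max); clamp them to the frame.
        column1, row1, column2, row2 = [int(i) for i in box.tolist()]
        row1, row2 = max(row1, 0), min(row2, height)
        column1, column2 = max(column1, 0), min(column2, width)
        cv2.rectangle(formatted, (column1, row1), (column2, row2), color=(0, 0, 255), thickness=1)
        new_image = formatted[row1:row2, column1:column2]
        if new_image.size == 0:
            continue  # skip degenerate boxes, for which max() would fail
        # High values are treated as "close"; 200 is the alert threshold.
        if new_image.max() >= 200:
            frequency = 2500  # Hz
            duration = 1000   # ms; winsound.Beep blocks until the tone ends
            winsound.Beep(frequency, duration)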

### Step 4: Start the camera

Run the cell below to turn on your camera and feed its frames to the prediction function. Point the camera at your surroundings and listen for the beep.

In [None]:
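vid = cv2.VideoCapture(0)

while True:
    ret, frame = vid.read()
    if not ret:
        # Stop if the camera could not deliver a frame.
        break
    model_logic(frame)
    # cv2.waitKey only receives key presses while an OpenCV window is open.
    cv2.imshow("Camera", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

vid.release()
cv2.destroyAllWindows()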
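
Running two large models on every frame can be slow, and the one-second beep blocks the loop while it plays. A common mitigation, sketched here as an assumption rather than as part of the original notebook, is to run the models only on every Nth frame. The `process_every_nth` helper and the stand-in callback are hypothetical; in the notebook the callback would be `model_logic`.

```python
def process_every_nth(frames, callback, n=5):
    """Call `callback` on every n-th frame of an iterable and count the calls."""
    processed = 0
    for index, frame in enumerate(frames):
        if index % n == 0:
            callback(frame)
            processed += 1
    return processed


# Stand-ins for camera frames and for model_logic, to show the behaviour.
seen = []
count = process_every_nth(range(12), seen.append, n=5)
print(count, seen)  # 3 [0, 5, 10]
```

A larger `n` lowers the compute load at the cost of slower reaction to obstacles, so the value is a trade-off to tune on the target hardware.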