# Object Detection - YOLOS from Hugging Face

This model identifies objects in images and returns their scores, labels, and bounding boxes.


We use a dataset from [UCF](https://www.crcv.ucf.edu/data/GMCP_Geolocalization/#Dataset) and the [YOLOS](https://huggingface.co/hustvl/yolos-base) model from [Hugging Face](https://huggingface.co/).
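As a rough illustration of working with scores, labels, and bounding boxes, the sketch below filters a list of detections by confidence. The structure of `detections` is an assumption modeled on the output of the Hugging Face object-detection pipeline; the deployed endpoint may return a different shape. The sample values and the `filter_detections` and `box_area` helpers are hypothetical.

```python
# Hypothetical detections: a list of dicts with a score, a label and a box.
# This layout is an assumption; check what the endpoint actually returns.
detections = [
    {"score": 0.98, "label": "car",
     "box": {"xmin": 12, "ymin": 40, "xmax": 180, "ymax": 150}},
    {"score": 0.35, "label": "person",
     "box": {"xmin": 200, "ymin": 60, "xmax": 230, "ymax": 140}},
    {"score": 0.91, "label": "traffic light",
     "box": {"xmin": 300, "ymin": 10, "xmax": 320, "ymax": 50}},
]


def filter_detections(detections, min_score=0.9):
    """Keep only the detections whose score is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]


def box_area(box):
    """Return the area of a box in square pixels (0 for degenerate boxes)."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


confident = filter_detections(detections)
for d in confident:
    print(d["label"], d["score"], box_area(d["box"]))
```

The 0.9 threshold is arbitrary and would normally be tuned for the use case.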

<a href="https://colab.research.google.com/github/VertaAI/examples/blob/main/deployment/huggingface/yolo-object-detection/yolo-predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Dependencies

This notebook has been tested with **Python 3.8.15** and the following package versions:

In [None]:
%%capture
!pip install beautifulsoup4==4.6.3
!pip install verta==0.21.1

## Imports

In [None]:
import concurrent.futures
import multiprocessing
import os
import requests
import time

from bs4 import BeautifulSoup
from verta import Client

## Verta Set Up

In [None]:
# Fill in your Verta host and credentials before running the cells below.
os.environ['VERTA_HOST'] = ''
os.environ['VERTA_EMAIL'] = ''
os.environ['VERTA_DEV_KEY'] = ''

In [None]:
client = Client(os.environ['VERTA_HOST'], debug=True)

In [None]:
endpoint = client.get_or_create_endpoint('yolos')

In [None]:
model = endpoint.get_deployed_model()

## Get URLs

In [None]:
url = 'http://www.cs.ucf.edu/~aroshan/index_files/Dataset_PitOrlManh/images/'

In [None]:
req = requests.get(url)

In [None]:
soup = BeautifulSoup(req.text, 'lxml')

In [None]:
urls = []

for link in soup.find_all('a'):
    href = link.get('href')

    # Skip anchors without an href and keep only links to JPEG images.
    if href and href.endswith('.jpg'):
        image_url = f"http://www.cs.ucf.edu/~aroshan/index_files/Dataset_PitOrlManh/images/{href}"
        urls.append(image_url)

In [None]:
len(urls)
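An alternative way to build the absolute image URLs is `urllib.parse.urljoin`, which avoids repeating the base URL in an f-string. The sketch below is self-contained and works on a plain list of `href` values; the `collect_image_urls` helper and the sample file names are hypothetical and are not taken from the dataset listing.

```python
from urllib.parse import urljoin


def collect_image_urls(base_url, hrefs, suffix=".jpg"):
    """Return absolute URLs for the hrefs that end with the given suffix."""
    image_urls = []
    for href in hrefs:
        # An anchor tag without an href attribute yields None, so skip it.
        if href and href.endswith(suffix):
            image_urls.append(urljoin(base_url, href))
    return image_urls


base_url = "http://www.cs.ucf.edu/~aroshan/index_files/Dataset_PitOrlManh/images/"

# Hypothetical href values, including a missing href and a sorting link.
sample_hrefs = ["000001_0.jpg", None, "?C=N;O=D", "000001_1.jpg"]

print(collect_image_urls(base_url, sample_hrefs))
```

Because the base URL ends with a slash, `urljoin` appends each relative file name to it.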

## Tests

In [None]:
# Choose how many URLs to send to the endpoint by uncommenting one line.
# n_urls = urls[:100]
# n_urls = urls[:1000]
n_urls = urls[:10000]

In [None]:
def process_image(url):
    return model.predict(url)

In [None]:
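def show_metrics(version, n_urls, start_time, end_time):
    # Format the elapsed seconds as hours, minutes and seconds.
    # Note: the hour field wraps around for runs of 24 hours or longer.
    total_time = end_time - start_time
    total_time = time.strftime('%Hh %Mm %Ss', time.gmtime(total_time))

    print(f"Processing Time (v{version}): {total_time} for {len(n_urls)} URLs.")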

### v0 - Single Process

In [None]:
results = []
start_time = time.time()

for url in n_urls:
    result = model.predict(url)
    results.append([url, result])

end_time = time.time()
show_metrics(0, n_urls, start_time, end_time)

**Results:**

- Processing Time: 00h 01m 31s for 100 URLs.

### v1 - ThreadPoolExecutor (concurrent.futures)

In [None]:
results = []
start_time = time.time()

with concurrent.futures.ThreadPoolExecutor() as executor:
    for url, result in zip(n_urls, executor.map(process_image, n_urls)):
        results.append([url, result])

end_time = time.time()
show_metrics(1, n_urls, start_time, end_time)

**Results:**

- Processing Time: 00h 00m 39s for 100 URLs.
- Processing Time: 00h 05m 17s for 1000 URLs.
- Processing Time: 00h 38m 18s for 10000 URLs.
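With `executor.map`, an exception raised for one URL surfaces while iterating and ends the loop, which can be costly in a 10000-URL run. A minimal sketch of one way to tolerate individual failures uses `submit` together with `as_completed`. The `predict_all` helper, the `fake_predict` stand-in for `model.predict`, and the sample URLs are hypothetical and exist only so the example runs without an endpoint.

```python
import concurrent.futures


def predict_all(predict_fn, urls, max_workers=None):
    """Run predict_fn over urls in a thread pool, keeping failures separate."""
    results, failures = [], []

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for.
        future_to_url = {executor.submit(predict_fn, url): url for url in urls}

        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results.append([url, future.result()])
            except Exception as exc:
                # Record the failure and keep processing the remaining URLs.
                failures.append([url, repr(exc)])

    return results, failures


def fake_predict(url):
    """Stand-in for model.predict (hypothetical): fails for one sample URL."""
    if url.endswith("bad.jpg"):
        raise ValueError("could not fetch image")
    return {"url": url, "detections": []}


sample_urls = [
    "http://example.com/a.jpg",
    "http://example.com/bad.jpg",
    "http://example.com/b.jpg",
]

results, failures = predict_all(fake_predict, sample_urls, max_workers=4)
print(len(results), "succeeded,", len(failures), "failed")
```

Results arrive in completion order rather than input order, so each one is stored alongside its URL. Passing `max_workers` explicitly also makes it easy to experiment with the number of concurrent requests.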

### v2 - ProcessPoolExecutor (concurrent.futures)

In [None]:
results = []
start_time = time.time()

with concurrent.futures.ProcessPoolExecutor() as executor:
    for url, result in zip(n_urls, executor.map(process_image, n_urls)):
        results.append([url, result])

end_time = time.time()
show_metrics(2, n_urls, start_time, end_time)

**Results:**

- Processing Time: 00h 00m 47s for 100 URLs.
- Processing Time: 00h 08m 27s for 1000 URLs.
- Processing Time: 01h 31m 47s for 10000 URLs.

### v3 - Pool (multiprocessing)

In [None]:
results = []
n_cpu = multiprocessing.cpu_count()
start_time = time.time()

# Create the pool outside the try block so `pool` is always defined in `finally`.
pool = multiprocessing.Pool(processes=n_cpu)

try:
    for url, result in zip(n_urls, pool.map(process_image, n_urls)):
        results.append([url, result])
finally:
    pool.close()
    pool.join()
    end_time = time.time()

show_metrics(3, n_urls, start_time, end_time)

**Results:**

- Processing Time: 00h 00m 48s for 100 URLs.
- Processing Time: 00h 08m 48s for 1000 URLs.
- Processing Time: 01h 38m 02s for 10000 URLs.
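To compare the four versions more directly, the timings reported in the Results lists above can be converted to throughput (URLs per second) and, for the 100-URL runs, to a speedup over v0. The sketch below only re-expresses those reported numbers; the `to_seconds` helper and the `timings` dictionary are hypothetical names introduced for illustration.

```python
import re


def to_seconds(text):
    """Convert a string such as '01h 31m 47s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)h (\d+)m (\d+)s", text)
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the Results lists: {version: {number of URLs: elapsed}}.
timings = {
    "v0": {100: "00h 01m 31s"},
    "v1": {100: "00h 00m 39s", 1000: "00h 05m 17s", 10000: "00h 38m 18s"},
    "v2": {100: "00h 00m 47s", 1000: "00h 08m 27s", 10000: "01h 31m 47s"},
    "v3": {100: "00h 00m 48s", 1000: "00h 08m 48s", 10000: "01h 38m 02s"},
}

# The single-process run on 100 URLs is the baseline for the speedup.
baseline = to_seconds(timings["v0"][100])

for version, runs in timings.items():
    for n, elapsed in runs.items():
        seconds = to_seconds(elapsed)
        line = f"{version}: {n:>5} URLs in {seconds:>4}s = {n / seconds:.2f} URLs/s"
        if n == 100:
            line += f" ({baseline / seconds:.2f}x vs v0)"
        print(line)
```

On these numbers, the thread pool (v1) is the fastest at every batch size: 2298 seconds for 10000 URLs, against 5507 seconds for v2 and 5882 seconds for v3.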