# Object Detection

- Install the following:

```
    !pip install transformers
    !pip install gradio
    !pip install timm
    !pip install inflect
    !pip install phonemizer
```

**Note:** `py-espeak-ng` is only available on Linux operating systems.

To run locally on a Linux machine, use these commands:
```
    sudo apt-get update
    sudo apt-get install espeak-ng
    pip install py-espeak-ng
```

### Build the `object-detection` pipeline using 🤗 Transformers Library

- This model was released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Carion et al. (2020).

In [None]:
from helper import load_image_from_url, render_results_in_image

In [None]:
from transformers import pipeline

- Here is some code that suppresses warning messages.

In [None]:
from transformers.utils import logging
logging.set_verbosity_error()

from helper import ignore_warnings
ignore_warnings()

In [None]:
od_pipe = pipeline("object-detection", "./models/facebook/detr-resnet-50")

Info about [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50)

Explore the [Hugging Face Hub for more object detection models](https://huggingface.co/models?pipeline_tag=object-detection&sort=trending).

### Use the Pipeline

In [None]:
from PIL import Image

In [None]:
raw_image = Image.open('')  # Provide the path to the image file you want to use.
# resize() returns a resized copy for display; raw_image itself is unchanged.
raw_image.resize((569, 491))

In [None]:
pipeline_output = od_pipe(raw_image)

- Draw the pipeline results onto the image using the helper function `render_results_in_image`.

In [None]:
processed_image = render_results_in_image(
    raw_image, 
    pipeline_output)

In [None]:
processed_image
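
The source of `render_results_in_image` is not shown in this notebook. As a rough sketch of what such a helper might do, the function below draws each detection onto a copy of the image with Pillow. It assumes the usual output format of the `object-detection` pipeline: a list of dicts with `score`, `label`, and a `box` dict holding `xmin`, `ymin`, `xmax`, `ymax`. The name `draw_detections` and the sample data are hypothetical, and the real helper may look different.

```python
from PIL import Image, ImageDraw


def draw_detections(image, predictions, color="red"):
    """Return a copy of `image` with one labeled box per prediction.

    Hypothetical stand-in for the notebook's helper, not its actual code.
    """
    annotated = image.copy()  # leave the caller's image untouched
    draw = ImageDraw.Draw(annotated)
    for pred in predictions:
        box = pred["box"]
        corners = (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
        draw.rectangle(corners, outline=color, width=2)
        # Write the label and confidence just inside the top-left corner.
        caption = f"{pred['label']} {pred['score']:.2f}"
        draw.text((box["xmin"] + 4, box["ymin"] + 4), caption, fill=color)
    return annotated


# Made-up data in the same shape as the pipeline output.
sample_image = Image.new("RGB", (100, 100), "white")
sample_predictions = [
    {"score": 0.98, "label": "cat",
     "box": {"xmin": 10, "ymin": 10, "xmax": 60, "ymax": 60}},
]
annotated_sample = draw_detections(sample_image, sample_predictions)
```

With a real image, the call would be `draw_detections(raw_image, pipeline_output)`.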

### Using `Gradio` as a Simple Interface

- Use [Gradio](https://www.gradio.app) to create a demo for the object detection app.
- The demo gives the app a friendly, easy-to-use interface.
- You can share the demo with your friends and colleagues as well.

In [None]:
import os
import gradio as gr

In [None]:
def get_pipeline_prediction(pil_image):
    
    # First get the pipeline output for the given pil_image
    pipeline_output = od_pipe(pil_image)

    # Then process the image using the pipeline output
    processed_image = render_results_in_image(pil_image,
                                            pipeline_output)
    return processed_image

In [None]:
demo = gr.Interface(
  fn=get_pipeline_prediction,
  inputs=gr.Image(label="Input image", 
                  type="pil"),
  outputs=gr.Image(label="Output image with predicted instances",
                   type="pil")
)

- `share=True` provides a public link for accessing the demo.

In [None]:
demo.launch(share=True, server_port=int(os.environ['PORT1']))

### Close the app
- Remember to call `.close()` on the Gradio app when you're done using it.

In [None]:
demo.close()

### Make an AI Powered Audio Assistant

- Combine the object detector with a text-to-speech model that narrates what is inside the image.

- Inspect the output of the object detection pipeline.

In [None]:
pipeline_output

In [None]:
od_pipe

In [None]:
from helper import summarize_predictions_natural_language

In [None]:
text = summarize_predictions_natural_language(pipeline_output)

In [None]:
text
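
The implementation of `summarize_predictions_natural_language` is not shown here. A minimal sketch of the idea is to count the detections per label and join the counts into one sentence. The function name `summarize_detections`, the `min_score` parameter, and the sample data are assumptions made for illustration. Pluralization is done naively by appending "s"; the actual helper may well use `inflect` (installed above) to handle irregular plurals and number words.

```python
from collections import Counter


def summarize_detections(predictions, min_score=0.0):
    """Turn object-detection output into one English sentence.

    Hypothetical sketch; not the notebook's actual helper.
    """
    # Count labels, ignoring detections below the confidence threshold.
    counts = Counter(
        pred["label"] for pred in predictions if pred["score"] >= min_score
    )
    if not counts:
        return "No objects were detected in this image."

    parts = []
    for label, n in counts.items():
        noun = label if n == 1 else label + "s"  # naive plural
        parts.append(f"{n} {noun}")

    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"The image contains {listing}."


# Made-up predictions in the pipeline's output shape (boxes omitted).
sample_predictions = [
    {"score": 0.99, "label": "dog"},
    {"score": 0.95, "label": "dog"},
    {"score": 0.90, "label": "person"},
    {"score": 0.30, "label": "cat"},
]
summary = summarize_detections(sample_predictions, min_score=0.5)
```

Filtering by score before counting keeps low-confidence detections out of the narration.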

### Generate Audio Narration of an Image

In [None]:
tts_pipe = pipeline("text-to-speech",
                    model="./models/kakao-enterprise/vits-ljs")

More info about [kakao-enterprise/vits-ljs](https://huggingface.co/kakao-enterprise/vits-ljs).

In [None]:
narrated_text = tts_pipe(text)

### Play the Generated Audio

In [None]:
from IPython.display import Audio as IPythonAudio

In [None]:
IPythonAudio(narrated_text["audio"][0],
             rate=narrated_text["sampling_rate"])
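
To keep the narration outside the notebook, the waveform can also be written to a WAV file. The sketch below uses only the standard library and assumes the audio is a one-dimensional sequence of floats in the range [-1, 1], which is how `narrated_text["audio"][0]` is treated in the cell above. The function name `save_wav` and the output filename are hypothetical.

```python
import sys
import wave
from array import array


def save_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data must be little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())
    return path
```

In this notebook the call would look like `save_wav("narration.wav", narrated_text["audio"][0], narrated_text["sampling_rate"])`.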

### Try it yourself! 
- Try these models with other images!