# Deep Learning for Business Applications course

## TOPIC 5: Object detection problem. YOLO training

### 1. Libraries and configurations

#### 1.1. Streamlit

[Streamlit](https://streamlit.io/) is a framework for building and sharing data applications quickly. It turns data scripts into shareable web apps in minutes, is written in pure Python, and requires no front-end experience. Installation in our environment is simple: run the command below in a terminal or in the next cell:

In [1]:
!pip install streamlit


#### 1.2. Other libraries

In [2]:
!pip install opencv-python ultralytics transformers deepface


In [3]:
import os

#### 1.3. Free space

In [4]:
!df -h | grep dev

tmpfs                                      64M     0   64M   0% /dev
/dev/vdg                                   12G  2.4G  9.4G  20% /home/jovyan
/dev/vda2                                  95G   53G   39G  58% /etc/hosts
shm                                        64M     0   64M   0% /dev/shm


In [5]:
# a cache dir for Hugging Face Hub models
!ls -ls ~/.cache/huggingface/hub

# a cache dir for PyTorch models
!ls -ls ~/.cache/torch/hub/

# a cache dir for DeepFace models
!ls -la ~/.deepface/weights

total 12
4 drwxr-sr-x 6 jovyan users 4096 Nov 27 17:41 models--openai--clip-vit-base-patch16
4 drwxr-sr-x 6 jovyan users 4096 Nov 27 17:41 models--Salesforce--blip-image-captioning-base
4 -rw-r--r-- 1 jovyan users    1 Nov 27 17:41 version.txt
ls: cannot access '/home/jovyan/.cache/torch/hub/': No such file or directory
total 566504
drwxr-sr-x 2 jovyan users      4096 Nov 27 17:43 .
drwxr-sr-x 3 jovyan users      4096 Nov 27 17:41 ..
-rw-r--r-- 1 jovyan users 580085408 Nov 27 17:43 vgg_face_weights.h5
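
The same check can be done from Python, which is convenient inside a script. The following is a minimal sketch, assuming the three cache locations listed above; `dir_size_bytes` and `human_size` are hypothetical helper names, not part of any library.

```python
import os


def dir_size_bytes(path):
    """Return the total size of regular files under `path` (0 if it is missing)."""
    path = os.path.expanduser(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # skip symlinks so that linked files are not counted twice
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if size < 1024:
            return f'{size:.1f} {unit}'
        size /= 1024
    return f'{size:.1f} TiB'


# cache locations taken from the cell above
CACHE_DIRS = [
    '~/.cache/huggingface/hub',
    '~/.cache/torch/hub',
    '~/.deepface/weights',
]

for cache_dir in CACHE_DIRS:
    print(cache_dir, human_size(dir_size_bytes(cache_dir)))
```

A missing directory is reported as `0.0 B` instead of raising an error, so the loop is safe to run before any model has been downloaded.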


In [6]:
# use `rm -rf` !!! WITH CARE !!!

!rm -rf ~/.cache/huggingface/hub
!rm -rf ~/.cache/torch/hub/checkpoints
!rm -rf ~/.deepface/weights

### 2. How Streamlit works

According to the [Main concepts](https://docs.streamlit.io/library/get-started/main-concepts) guide, you write a normal Python script that contains all the elements of your future app and launch it with `streamlit run`, for example `streamlit run your_script.py [-- script args]`.
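
Arguments placed after `--` are passed to the script itself rather than to Streamlit. Below is a minimal sketch of how a script might read them with `argparse`; the function name `parse_script_args` and the options `--title` and `--max-items` are made up for illustration.

```python
import argparse


def parse_script_args(argv):
    """Parse the arguments that follow `--` on the `streamlit run` command line."""
    parser = argparse.ArgumentParser(description='Demo app options')
    # hypothetical options, not defined by Streamlit
    parser.add_argument('--title', default='Demo app')
    parser.add_argument('--max-items', type=int, default=10)
    return parser.parse_args(argv)


# inside a real app you would pass `sys.argv[1:]` instead of this sample list
args = parse_script_args(['--title', 'Fruits', '--max-items', '3'])
print(args.title, args.max_items)
```

With this in place, the app could be started as `streamlit run your_script.py -- --title Fruits --max-items 3`.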

#### 2.1. Python script with app

Streamlit's architecture allows you to write apps the same way you write plain Python scripts. Let's create a sample script with the `%%writefile` magic command:

In [None]:
%%writefile stapp.py

import streamlit as st

# Title of our demo app
st.title('Meet the first Streamlit application')

#### 2.2. Run application

Running the application is easy. Open a terminal in the folder that contains your Python script `stapp.py` and type:

`streamlit run stapp.py --server.port 20000 --browser.gatherUsageStats False` 

Your Streamlit application will be available at the following URL:

In [4]:
print('Streamlit available at:',
      'https://jhas01.gsom.spbu.ru{}proxy/{}/'.format(
          os.environ['JUPYTERHUB_SERVICE_PREFIX'], 20000))

Streamlit available at: https://jhas01.gsom.spbu.ru/user/st136973/proxy/20000/
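
The URL above is assembled from the JupyterHub service prefix and the port. A small sketch that wraps this in a function follows; `proxy_url` is a hypothetical helper, and the fallback to `localhost` when the variable is not set is an assumption for running outside JupyterHub.

```python
import os


def proxy_url(port, host='https://jhas01.gsom.spbu.ru', env=None):
    """Build the address of an app served on `port` behind the JupyterHub proxy."""
    env = os.environ if env is None else env
    prefix = env.get('JUPYTERHUB_SERVICE_PREFIX')
    if prefix is None:
        # assumption: outside JupyterHub the app is reachable locally
        return f'http://localhost:{port}/'
    return f'{host}{prefix}proxy/{port}/'


print('Streamlit available at:', proxy_url(20000))
```

Unlike direct indexing of `os.environ`, this version does not raise `KeyError` when the notebook is run on a local machine.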


#### 2.3. Basic examples

##### 2.3.1. Nice headers

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

import streamlit as st

st.header('Nice looking header string', divider='rainbow')
st.header('_Here is header under the line_ :fire:')

st.subheader('Subheader is also here', divider='rainbow')
st.subheader(':blue[_We like Streamlit_] :star:')

##### 2.3.2. Text

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

import streamlit as st

st.header('Just a header', divider='rainbow')
st.text('Just a text under the header')

##### 2.3.3. Write

Along with magic commands, `st.write()` is Streamlit's "Swiss Army knife". You can pass almost anything to `st.write()`: text, data, Matplotlib figures, charts and more.

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

import streamlit as st
import numpy as np
import pandas as pd

st.header('Demo of write function', divider='rainbow')
st.subheader('Table and plot at one application')

st.divider()

st.write("Here's demo table from the dataframe:")
fruits_data = pd.DataFrame(
    {
        'fruits': ['apple', 'peach', 'pineapple', 'watermelon'],
        'color': ['green', 'orange', 'yellow', 'stripes'],
        'weight': [1, 2, 5, 10]
    }
)
st.write(fruits_data)

st.divider()

st.write("Here's demo chart for fruits:")
chart_data = pd.DataFrame(
     np.random.randn(20, 4),
     columns=['apple', 'peach', 'pineapple', 'watermelon']
)
st.line_chart(chart_data)

### 3. AI with Streamlit

#### 3.1. Upload pipeline for the data

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

import io
import streamlit as st
from PIL import Image

st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and plot image')
st.divider()

st.write('#### Upload your image')
uploaded_file = st.file_uploader('Select an image file (JPEG format)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        bytes_data = uploaded_file.read()
        img = Image.open(io.BytesIO(bytes_data))
        st.divider()
        st.image(img, caption='Uploaded image')
    else:
        st.error('File read error', icon='⚠️')
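
A file name says little about the actual content of an upload. As a complement to the extension check, the sketch below inspects the first bytes of the data: JPEG files start with the bytes `FF D8 FF`. The helper name `looks_like_jpeg` is hypothetical, and the byte strings are made-up samples.

```python
JPEG_SIGNATURE = b'\xff\xd8\xff'  # start-of-image marker plus the next marker prefix


def looks_like_jpeg(data):
    """Return True if `data` starts with the JPEG signature bytes."""
    return data[:3] == JPEG_SIGNATURE


# made-up byte strings for illustration
print(looks_like_jpeg(b'\xff\xd8\xff\xe0' + b'\x00' * 16))  # JPEG-like header
print(looks_like_jpeg(b'\x89PNG\r\n\x1a\n'))  # PNG header
```

In the app, such a check could be applied to `bytes_data` right after `uploaded_file.read()` and before `Image.open`.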

#### 3.2. Add some AI

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

# use of AI model for image captioning

import io
import streamlit as st
from PIL import Image
from transformers import (
    BlipProcessor, 
    BlipForConditionalGeneration
)


def img_caption(model, processor, img, text=None):
    """
    Uses BLIP model to caption image.
    
    """
    res = None
    if text:
        # conditional image captioning
        inputs = processor(img, text, return_tensors='pt')
    else:
        # unconditional image captioning
        inputs = processor(img, return_tensors='pt')
    out = model.generate(**inputs)
    res = processor.decode(out[0], skip_special_tokens=True)
    return res


with st.spinner('Please wait, application is initializing...'):
    MODEL_CAP_NAME = 'Salesforce/blip-image-captioning-base'
    PROCESSOR_CAP = BlipProcessor.from_pretrained(MODEL_CAP_NAME)
    MODEL_CAP = BlipForConditionalGeneration.from_pretrained(MODEL_CAP_NAME)

st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and plot image with AI caption')
st.divider()

st.write('#### Upload your image')
uploaded_file = st.file_uploader('Select an image file (JPEG format)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        # input text for conditional image captioning
        text = st.text_input(
            'Input text for conditional image captioning (if needed)', 
            ''
        )
        with st.spinner('Please wait...'):
            bytes_data = uploaded_file.read()
            img = Image.open(io.BytesIO(bytes_data))
            
            # use image caption model for uploaded image
            caption = img_caption(
                model=MODEL_CAP, 
                processor=PROCESSOR_CAP, 
                img=img, 
                text=text
            )
            st.divider()
            st.image(img, caption=caption)
    else:
        st.error('File read error', icon='⚠️')

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

# Use of AI models for image captioning
# and object detection in the image

import io
import streamlit as st
from PIL import Image
from transformers import (
    BlipProcessor, 
    BlipForConditionalGeneration
)
from ultralytics import YOLO


def img_caption(model, processor, img, text=None):
    """
    Uses BLIP model to caption image.
    
    """
    res = None
    if text:
        # conditional image captioning
        inputs = processor(img, text, return_tensors='pt')
    else:
        # unconditional image captioning
        inputs = processor(img, return_tensors='pt')
    out = model.generate(**inputs)
    res = processor.decode(out[0], skip_special_tokens=True)
    return res


def img_detect(model, img, plot=False):
    """
    Run YOLO inference on an image.
    
    """
    result = model(img)[0]
    boxes = result.boxes  # boxes object for bounding box outputs
    names = model.names
    objs = []
    for c, p in zip(boxes.cls, boxes.conf):
        objs.append({names[int(c)]: p.item()})
    img_bgr = result.plot()  # BGR-order numpy array
    img_rgb = Image.fromarray(img_bgr[..., ::-1])  # RGB-order PIL image
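    if plot:
        # matplotlib is only needed for debugging in a notebook,
        # so it is imported here rather than at the top of the app
        import matplotlib.pyplot as plt
        plt.figure(figsize=(16, 8))
        plt.imshow(img_rgb)
        plt.show()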
    return objs, img_rgb


with st.spinner('Please wait, application is initializing...'):
    MODEL_CAP_NAME = 'Salesforce/blip-image-captioning-base'
    PROCESSOR_CAP = BlipProcessor.from_pretrained(MODEL_CAP_NAME)
    MODEL_CAP = BlipForConditionalGeneration.from_pretrained(MODEL_CAP_NAME)

    MODEL_DET_NAME = 'yolov8n.pt'
    MODEL_DET = YOLO(MODEL_DET_NAME)

st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and plot image with AI caption and YOLO detection')
st.divider()

st.write('#### Upload your image')
uploaded_file = st.file_uploader('Select an image file (JPEG format)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        # input text for conditional image captioning
        text = st.text_input(
            'Input text for conditional image captioning (if needed)', 
            ''
        )
        with st.spinner('Please wait...'):
            bytes_data = uploaded_file.read()
            img = Image.open(io.BytesIO(bytes_data))
            
            # image caption model for uploaded image
            caption = img_caption(
                model=MODEL_CAP, 
                processor=PROCESSOR_CAP, 
                img=img, 
                text=text
            )
            st.divider()
            st.image(img, caption=caption)
            
            # object detection model for uploaded image
            objs, img_det = img_detect(
                model=MODEL_DET, 
                img=img
            )
            st.divider()
            st.image(img_det, caption='object detection', width=800)
            st.divider()
            st.caption('Objects dictionary:')
            st.write(objs)
    else:
        st.error('File read error', icon='⚠️')
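
The `objs` list returned by `img_detect` holds one single-key dictionary per detected box, which gets long for crowded images. The sketch below groups such a list by class; `summarize_detections` is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict


def summarize_detections(objs):
    """Count detections per class and keep the best confidence for each class."""
    summary = defaultdict(lambda: {'count': 0, 'max_conf': 0.0})
    for obj in objs:
        for name, conf in obj.items():
            summary[name]['count'] += 1
            summary[name]['max_conf'] = max(summary[name]['max_conf'], conf)
    return dict(summary)


# made-up sample in the format produced by `img_detect`
sample = [{'person': 0.91}, {'dog': 0.78}, {'person': 0.55}]
print(summarize_detections(sample))
```

The resulting dictionary can be passed to `st.write()` in place of the raw list.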

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

# Use of AI model for zero-shot
# image classification

import io
import streamlit as st
from PIL import Image
from transformers import pipeline
import pandas as pd


def zeroshot(classifier, classes, img):
    scores = classifier(
        img,
        candidate_labels=classes
    )
    return scores


with st.spinner('Please wait, application is initializing...'):
    MODEL_ZERO_NAME = 'openai/clip-vit-base-patch16'
    CLASSIFIER_ZERO = pipeline('zero-shot-image-classification', model=MODEL_ZERO_NAME)
    CLASSES = [
        'a photo of nature',
        'a photo of a cat',
        'a photo of a party',
        'a photo of food'
    ]

st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and classifying it with zero-shot')
st.divider()

st.write('#### Upload your image')
uploaded_file = st.file_uploader('Select an image file (JPEG format)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        with st.spinner('Please wait...'):
            bytes_data = uploaded_file.read()
            img = Image.open(io.BytesIO(bytes_data))
            
            # classifying image with zero-shot model
            scores = zeroshot(
                classifier=CLASSIFIER_ZERO, 
                classes=CLASSES, 
                img=img
            )
            st.divider()
            st.image(img, caption='zero-shot classification')
            
            # plot a diagram with scores and scores output
            st.divider()
            df = pd.DataFrame(scores)
            df = df.set_index('label')
            st.bar_chart(df)
            st.divider()
            st.caption('Scores dictionary:')
            st.write(scores)
    else:
        st.error('File read error', icon='⚠️')
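
The app above displays all scores, but an application often needs a single decision. The sketch below picks the best label from a list of `{'score': ..., 'label': ...}` dictionaries and returns `None` when the model is not confident enough. The helper `top_label`, the threshold of 0.5 and the sample scores are assumptions made for illustration.

```python
def top_label(scores, threshold=0.5):
    """Return the label with the highest score, or None if it is below `threshold`."""
    if not scores:
        return None
    best = max(scores, key=lambda item: item['score'])
    return best['label'] if best['score'] >= threshold else None


# made-up scores in the format consumed by `pd.DataFrame(scores)` above
sample_scores = [
    {'score': 0.82, 'label': 'a photo of cat'},
    {'score': 0.10, 'label': 'a photo of nature'},
    {'score': 0.05, 'label': 'a photo of a party'},
    {'score': 0.03, 'label': 'a photo of a food'},
]
print(top_label(sample_scores))
print(top_label(sample_scores, threshold=0.9))
```

A suitable threshold depends on the number of candidate labels and should be chosen on your own images.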

#### 3.3. Add some OCR

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

# Use of OCR model for text extracting 
# from image or PDF file

import io
import os
import streamlit as st
from PIL import Image
import pytesseract
from pdf2image import convert_from_bytes
import pandas as pd


def pdf2img(pdf_file):
    """
    Turns an uploaded PDF file into a list of PIL images, one per page.

    """
    images = convert_from_bytes(pdf_file.read())
    return images


def ocr_text(img, lang='eng'):
    """
    Extracts the text from an image.
    
    :lang: language is `eng` by default,
           use `eng+rus` for two languages in document

    """
    text = str(pytesseract.image_to_string(
        img,
        lang=lang
    ))
    return text


def ocr_text_dir(img_dir, lang='eng'):
    """
    Extracts the text from all JPEG images in a folder.

    """
    text = ''
    for img_name in sorted(os.listdir(img_dir)):
        if img_name.lower().endswith(('.jpg', '.jpeg')):
            img = Image.open(os.path.join(img_dir, img_name))
            text_tmp = ocr_text(img, lang=lang)
            text = ' '.join([text, text_tmp])
    return text


st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and extracting text from it')
st.divider()

st.write('#### Upload your file or image')
uploaded_file = st.file_uploader('Select a file (JPEG or PDF)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    lang = st.selectbox(
        'Select language to extract',
        ('eng', 'rus', 'eng+rus')
    )
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        with st.spinner('Please wait...'):
            bytes_data = uploaded_file.read()
            img = Image.open(io.BytesIO(bytes_data))
            
            # OCR model for uploaded image
            text = ocr_text(img, lang=lang)
            st.divider()
            st.write('#### Text extracted')
            st.write(text)
    elif file_name.lower().endswith('.pdf'):
        with st.spinner('Please wait...'):
            imgs = pdf2img(uploaded_file)
            text = ''
            for img in imgs:
                text_tmp = ocr_text(img, lang=lang)
                text = ' '.join([text, text_tmp])
            st.divider()
            st.write('#### Text extracted')
            st.write(text)
    else:
        st.error('File read error', icon='⚠️')
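
Raw OCR output usually contains irregular whitespace, page separators and words split across lines. The sketch below shows one possible clean-up step before the text is displayed; `clean_ocr_text` is a hypothetical helper, and the rules are assumptions that may need tuning for your documents.

```python
import re


def clean_ocr_text(text):
    """Normalize whitespace in OCR output and re-join words split by a hyphen."""
    text = text.replace('\x0c', '\n')  # form feed (page separator) -> newline
    text = re.sub(r'(\w)-\n(\w)', r'\1\2', text)  # "learn-\ning" -> "learning"
    text = re.sub(r'[ \t]+', ' ', text)  # collapse runs of spaces and tabs
    text = re.sub(r' ?\n ?', '\n', text)  # trim spaces around line breaks
    text = re.sub(r'\n{3,}', '\n\n', text)  # keep at most one blank line
    return text.strip()


# made-up sample of noisy OCR output
raw = 'Deep  learn-\ning   for\n\n\n\nbusiness \x0c'
print(clean_ocr_text(raw))
```

Note that the hyphen rule also joins genuine compound words that happen to break at a line end, so it is a heuristic rather than an exact fix.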

#### 3.4. Add some faces

In [None]:
%%writefile stapp.py
#!/usr/bin/env python
# coding: utf-8

# Use of DeepFace framework
# for face detection and recognition

import io
import cv2
import streamlit as st
from PIL import Image
from deepface import DeepFace
from transformers import pipeline
import pandas as pd
import numpy as np


def zeroshot(classifier, classes, img):
    scores = classifier(
        img,
        candidate_labels=classes
    )
    return scores


with st.spinner('Please wait, application is initializing...'):
    MODEL_ZERO_NAME = 'openai/clip-vit-base-patch16'
    CLASSIFIER_ZERO = pipeline('zero-shot-image-classification', model=MODEL_ZERO_NAME)
    CLASSES = [
        'a photo of nature',
        'a photo of a cat',
        'a photo of a party',
        'a photo of food'
    ]
    DEEPFACE_MODELS = [
        'VGG-Face',
        'Facenet',
        'Facenet512',
        'OpenFace',
        'DeepFace',
        'DeepID',
        'ArcFace',
        'Dlib',
        'SFace',
        'GhostFaceNet'
    ]
    DB_PATH = '/home/jovyan/dlba/topic_09/app/data/db'

st.header('Demo of image uploading', divider='rainbow')
st.subheader('Uploading file and classifying it with zero-shot and face recognition')
st.divider()

st.write('#### Upload your image')
uploaded_file = st.file_uploader('Select an image file (JPEG format)')
if uploaded_file is not None:
    file_name = uploaded_file.name
    if file_name.lower().endswith(('.jpg', '.jpeg')):
        with st.spinner('Please wait...'):
            bytes_data = uploaded_file.read()
            img = Image.open(io.BytesIO(bytes_data))
            
            # classifying image with zero-shot model
            scores = zeroshot(
                classifier=CLASSIFIER_ZERO, 
                classes=CLASSES, 
                img=img
            )
            st.divider()
            st.image(img, caption='zero-shot classification')
            
            # plot a diagram with scores and scores output
            st.divider()
            df = pd.DataFrame(scores)
            df = df.set_index('label')
            st.bar_chart(df)
            st.divider()
            st.caption('Scores dictionary:')
            st.write(scores)
            
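            # face detection and recognition
            # DeepFace expects NumPy images in BGR order, while PIL gives RGB
            img_bgr = cv2.cvtColor(np.array(img.convert('RGB')), cv2.COLOR_RGB2BGR)
            results = DeepFace.find(
                img_path=img_bgr,  # face to find
                db_path=DB_PATH,  # path to directory with faces
                model_name=DEEPFACE_MODELS[0],
                enforce_detection=False
            )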
            st.divider()
            st.caption('Face recognition:')
            found = []
            for result in results:
                # each result holds the matches for one detected face
                names = result.identity.values
                if len(names) > 0:
                    found.append(
                        names[0].replace(f'{DB_PATH}/', '').replace('.jpg', '')
                    )
            if found:
                st.write(f'Found: {", ".join(found)}')
            else:
                st.write('No known faces found')
    else:
        st.error('File read error', icon='⚠️')
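
The app turns a matched file path into a person's name with chained `str.replace` calls. A slightly more robust sketch uses `pathlib`, assuming (as in the code above) that each file in the database folder is named after the person; `identity_to_name` and `unique_names` are hypothetical helpers and the sample paths are made up.

```python
from pathlib import PurePosixPath


def identity_to_name(identity_path):
    """Return the file name without folder and extension, e.g. '.../db/alice.jpg' -> 'alice'."""
    return PurePosixPath(identity_path).stem


def unique_names(identity_paths):
    """Convert paths to names, dropping duplicates while keeping the original order."""
    return list(dict.fromkeys(identity_to_name(p) for p in identity_paths))


# made-up paths for illustration
paths = ['/data/db/alice.jpg', '/data/db/bob.JPEG', '/data/db/alice.jpg']
print(', '.join(unique_names(paths)))
```

This version works for any extension and does not depend on the exact value of `DB_PATH`.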

## 4. Move to developing app

Let's move out of Jupyter notebooks and into a real development process...

### <font color='red'>HOME ASSIGNMENT (Final project)</font>

Your final project tasks are:
1. Increase the number of categories for classifying uploaded images
2. Replace YOLO detection with a segmentation model (use a newer version of YOLO; version 11 has been released)
3. Apply an emotion detection model and add emotions to the face recognition pipeline for your friends
4. Implement an application page for OCR with the Tesseract library
5. Add a text summarization option for the text extracted on the OCR page

<font color='red'>Use the application code snippets for the Final project, not the notebooks!</font> To run the application, use the following command from the terminal:

`streamlit run Main_page.py --server.port 20000 --browser.gatherUsageStats False`