# AI Bauchi 6 Weeks Computer Vision Bootcamp

<div style="display: flex; justify-content: space-evenly; align-items: center; width: 100%;">
<img src="../../logos/aib.png" width='100px'/>
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQeyMRtudTwUIhRHGT1VKvVbnRYTu8VaQtaHg&s" width='100px'/>
<img src="https://miro.medium.com/v2/resize:fit:800/0*qa3Uh-1JZUhCuBVK.png" width='100px'/>
</div>

---

## Session 13: Optical Character Recognition (OCR)

**Overview:**
Optical Character Recognition (OCR) is the process of converting images of text into machine-encoded text. This is useful for extracting text from images, such as scanned documents, photos, or screenshots. In this session, we will learn how to use the Tesseract OCR engine to extract text from images.

![image-2.png](attachment:image-2.png)

**Topics Covered:**
- Introduction to OCR
- Applications of OCR in Computer Vision
- Understanding the Tesseract OCR engine
- Installing Tesseract OCR
- Implementing OCR with Tesseract


**Learning Objectives:**
- Understand the basics of OCR
- Learn how to use the Tesseract OCR engine
- Implement OCR on images using Tesseract
- Extract text from images and display the results
- Apply OCR to real-world applications

**Key Concepts:**
1. **Introduction to OCR**:
    - **Optical Character Recognition (OCR)**: A technology that recognizes text within digital images. It converts documents such as scanned paper pages, PDF files, or photos taken with a digital camera into editable and searchable data.
    - **History and Evolution of OCR**: OCR technology has evolved over the years, from early systems that required manual intervention to modern systems that use machine learning and deep learning algorithms to recognize text accurately.
    - **Importance of OCR in Computer Vision**: OCR plays a crucial role in various applications, such as document processing, text extraction, data entry automation, and image-to-text conversion.

2. **Applications of OCR in Computer Vision**:
    - **Document Processing**: OCR is used to extract text from scanned documents, invoices, receipts, and forms.
    - **Text Recognition**: OCR can recognize text in images, videos, and real-world scenes.
    - **Data Entry Automation**: OCR automates the process of entering data from documents into digital systems.
    - **Image-to-Text Conversion**: OCR converts images containing text into editable and searchable text.

3. **How OCR Works**:
    - **Preprocessing**: 
        - **Image Acquisition**: The input image containing text is captured using a camera or scanner.
        - **Image Enhancement**: The image is preprocessed to improve its quality, such as removing noise, adjusting brightness, and enhancing contrast.
        - **Image Binarization**: The image is converted to binary format (black and white) to separate text from the background.
        - **Skew Correction**: The image is rotated to correct any skew or tilt in the text.
    - **Text Detection and Recognition**:
        - **Segmentation**: Identify regions of interest (ROIs) containing text in the image.
        - **Character Recognition**: Recognize individual characters within the ROIs using OCR algorithms.
    - **Postprocessing**:
        - **Text Correction**: Apply spell checking and language models to correct errors in the recognized text.
        - **Text Extraction**: Extract the recognized text from the image and output the results.
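
To make the binarization step above more concrete, here is a minimal pure-Python sketch of Otsu's method, the automatic thresholding technique that OpenCV exposes through `cv2.THRESH_OTSU`. The helper names `otsu_threshold` and `binarize` and the tiny sample image are illustrative assumptions, not part of any library; in practice you would call OpenCV rather than write this by hand.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of 8-bit gray values (0-255)."""
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]           # number of pixels with value <= t
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue                   # no background pixels yet
        weight_fg = total - weight_bg  # number of pixels with value > t
        if weight_fg == 0:
            break                      # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_variance:
            best_threshold, best_variance = t, between
    return best_threshold


def binarize(image, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Hypothetical 2x3 grayscale image: three dark pixels and three bright ones.
sample_image = [[20, 25, 200],
                [22, 210, 205]]
flat_pixels = [p for row in sample_image for p in row]
threshold = otsu_threshold(flat_pixels)
binary_image = binarize(sample_image, threshold)
print(threshold, binary_image)
```

The threshold is chosen so that the dark and bright groups are separated as cleanly as possible, which is why no manual threshold value has to be supplied.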

4. **Tesseract OCR Engine**:
![image.png](attachment:image.png)
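    - **Overview**: Tesseract is an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google. It supports over 100 languages and, since version 4, uses an LSTM-based neural network as its default recognizer, which works well on clean printed text.
    - **Installation**: Tesseract runs on Windows, macOS, and Linux. The `pytesseract` Python package installed below is only a wrapper; the Tesseract engine itself must be installed separately.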


In [1]:
!pip install pytesseract

Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10


- **Installing Tesseract OCR**:
    - **Windows**: Download a Tesseract installer (for example, the UB Mannheim build linked from the Tesseract documentation) and run the setup wizard.
    - **macOS**: Install Tesseract with the Homebrew package manager by running `brew install tesseract`.
    - **Linux**: Install Tesseract with your distribution's package manager, for example `sudo apt-get install tesseract-ocr` on Ubuntu.

- **Example Workflow**
    - **Step 1**: Load the input image containing text with OpenCV or PIL (Pillow).
    - **Step 2**: Preprocess the image by converting it to grayscale, applying thresholding, and performing other enhancements.
    - **Step 3**: Use Tesseract OCR to recognize text in the preprocessed image.
    - **Step 4**: Extract the recognized text and display the results, or save them to a file in the desired format (e.g., text file or JSON).
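
Before running the workflow, it can help to confirm that the Tesseract engine is actually installed, since `pytesseract` only calls the external `tesseract` program. The sketch below uses the standard library to look for it on the system `PATH`; the helper name `find_tesseract` is an illustrative assumption.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


tesseract_path = find_tesseract()
if tesseract_path is None:
    # On Windows the installer may not add Tesseract to PATH. In that case
    # pytesseract can be pointed at the binary explicitly, e.g.
    # pytesseract.pytesseract.tesseract_cmd = r"<full path to tesseract.exe>"
    print("Tesseract not found on PATH; install it or set the path manually.")
else:
    print(f"Tesseract found at: {tesseract_path}")
```

If the check reports that Tesseract is missing, revisit the installation steps for your platform before moving on.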

In [8]:
import cv2
import pytesseract

# Load the image (cv2.imread returns None if the file cannot be read)
image = cv2.imread('images/image.jpeg')
if image is None:
    raise FileNotFoundError('Could not read images/image.jpeg')

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize with Otsu's method, which picks the threshold automatically
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Use pytesseract to recognize the text in the binarized image
text = pytesseract.image_to_string(threshold_img)

# Print the text
print(text)
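
Step 4 of the workflow (saving the result) could look roughly like the sketch below. It uses only the standard library; the function name `save_ocr_result`, the record layout, and the sample string standing in for real OCR output are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_ocr_result(text, source_image, output_path):
    """Write recognized text to a JSON file and return the saved record."""
    record = {
        "source_image": source_image,
        "text": text.strip(),
        # Keep non-empty lines separately so they are easy to process later.
        "lines": [line.strip() for line in text.splitlines() if line.strip()],
    }
    Path(output_path).write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return record


# Placeholder text; in the notebook this would be the string returned by Tesseract.
sample_text = "Hello OCR\n\nSecond line\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    output_file = Path(tmp_dir) / "ocr_result.json"
    record = save_ocr_result(sample_text, "images/image.jpeg", output_file)
    reloaded = json.loads(output_file.read_text(encoding="utf-8"))

print(record["lines"])
```

Storing the source file name next to the text makes it easier to trace each result back to its image when many images are processed.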


5. **Challenges and Limitations of OCR**:
    - **Complex Layouts**: OCR may struggle with complex document layouts, multiple fonts, or overlapping text.
    - **Handwriting Recognition**: OCR accuracy may vary for handwritten text, especially cursive or stylized handwriting.
    - **Low-Quality Images**: OCR performance may degrade with low-resolution, noisy, or distorted images.
    - **Language Support**: An OCR engine can only recognize languages and scripts it has been trained on. With Tesseract, the matching language data must be installed and selected, otherwise accuracy drops sharply.
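
One common way to cope with these limitations is to look at per-word confidence scores and discard words the engine is unsure about. `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns a dictionary of parallel lists that includes `"text"` and `"conf"`. The sketch below filters such a dictionary; the sample data is hand-made to mimic that shape, and the cutoff of 60 is an arbitrary assumption that should be tuned per dataset.

```python
def confident_words(ocr_data, min_conf=60.0):
    """Keep words whose confidence is at least min_conf.

    ocr_data is expected to hold parallel "text" and "conf" lists.
    Entries with a confidence of -1 are layout elements, not words.
    """
    words = []
    for word, conf in zip(ocr_data["text"], ocr_data["conf"]):
        # Confidence may arrive as a number or a string, so normalize it.
        if word.strip() and float(conf) >= min_conf:
            words.append(word)
    return words


# Hand-made sample mimicking the shape of image_to_data output (not real OCR output).
sample_data = {
    "text": ["", "Invoice", "N0", "1234", " "],
    "conf": ["-1", "96", "41", "88", "-1"],
}
kept = confident_words(sample_data)
print(kept)
```

Words that fall below the cutoff can be flagged for manual review instead of being silently dropped.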

6. **Advancements in OCR Technology**:
   - **Deep Learning-based OCR:** Modern OCR systems often use deep learning, typically Convolutional Neural Networks (CNNs) for visual features and recurrent networks such as LSTMs for reading character sequences, which improves accuracy and robustness.
   - **End-to-End OCR Models:** These models can handle the entire OCR pipeline, from text detection to recognition, in a single integrated system.
   - **Multilingual OCR:** OCR systems are now capable of recognizing text in multiple languages, making them versatile for global applications.

#### Practical Assignment
- **Task:** Implement an OCR system that extracts text from a provided set of images.
- **Steps:**
  1. Preprocess the images using OpenCV techniques.
  2. Use Tesseract to perform OCR on the preprocessed images.
  3. Evaluate the accuracy of the OCR output and refine the preprocessing steps if needed.
  4. Submit the extracted text in a structured format (e.g., CSV or JSON).
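
For step 3, one widely used accuracy measure is the character error rate (CER): the edit distance between the OCR output and a manually typed reference, divided by the reference length. The sketch below is one possible implementation using only plain Python; the function names and the sample strings are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical ground truth and OCR output with two misread characters.
reference_text = "invoice 1234"
ocr_text = "inv0ice 1284"
cer = character_error_rate(reference_text, ocr_text)
print(f"CER: {cer:.3f}")
```

A lower CER after changing a preprocessing step suggests the change helped; comparing CER before and after each change gives a simple way to guide the refinement.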

#### Conclusion
OCR is a powerful tool that bridges the gap between the physical and digital worlds by converting images of text into machine-readable data. Understanding OCR’s underlying concepts and implementation methods is crucial for anyone working in the field of computer vision, document processing, or data entry automation.