You can install Tesseract OCR and the pytesseract Python wrapper using the following steps:

1. Install Tesseract OCR: Download and install Tesseract OCR for your operating system:
Windows: https://github.com/UB-Mannheim/tesseract/wiki

MacOS: Install Homebrew (https://brew.sh/), then run brew install tesseract

Linux: Use your package manager (e.g. sudo apt-get install tesseract-ocr on Ubuntu)

2. Install the pytesseract package: Install the pytesseract package with pip, the Python package installer. Either run pip install pytesseract in a terminal window, or uncomment the line below to run it from a notebook cell:

In [4]:
# !pip install pytesseract
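
To confirm that the install worked before going further, you can check whether the modules are importable. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of pytesseract.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# pytesseract is the wrapper; PIL (Pillow) is used later to open images
for name in ("pytesseract", "PIL"):
    print(name, "installed" if is_installed(name) else "missing")
```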

The path to the Tesseract OCR executable file may depend on your operating system and the method you used to install Tesseract. Here are some common ways to find the path:

1. Windows: If you installed Tesseract using the installer, the default installation directory is C:\Program Files\Tesseract-OCR, so the executable is C:\Program Files\Tesseract-OCR\tesseract.exe. If you installed Tesseract using Chocolatey, the installation directory is typically C:\ProgramData\chocolatey\bin.

2. MacOS: If you installed Tesseract using Homebrew, the executable file is typically located at /usr/local/Cellar/tesseract/{version}/bin/tesseract on Intel Macs, where {version} is the version number of Tesseract. On Apple Silicon Macs, Homebrew installs under /opt/homebrew instead of /usr/local. You can find the version number by running brew info tesseract in the terminal, or print the resolved path directly with which tesseract.

3. Linux: If you installed Tesseract using your package manager, the executable file is typically located at /usr/bin/tesseract or /usr/local/bin/tesseract.
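
The search described above can also be done from Python. The sketch below first looks on the PATH and then falls back to a few of the locations listed above. `find_tesseract` and `COMMON_PATHS` are hypothetical names introduced here for illustration, and the candidate list is an assumption that you should adjust for your own machine.

```python
import os
import shutil

# Candidate locations taken from the list above (assumed defaults, not exhaustive)
COMMON_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
]


def find_tesseract(name="tesseract", candidates=COMMON_PATHS):
    """Return the path to the Tesseract executable, or None if it is not found."""
    # shutil.which searches the directories on PATH for the executable
    found = shutil.which(name)
    if found:
        return found
    # Fall back to the known install locations
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_tesseract())
```

If the function returns None, Tesseract is probably not installed, or it lives somewhere outside the PATH and the candidate list.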

Once you have found the path to the Tesseract OCR executable file, you can set the tesseract_cmd variable in your Python code to the path. For example, if the path is /usr/bin/tesseract, you can set the variable as follows:

import pytesseract

# Set the path to the Tesseract OCR executable file
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'


For example, if Tesseract OCR is installed in the default directory C:\Program Files\Tesseract-OCR on a Windows machine, set the tesseract_cmd variable in your Python code to the following path:

In [2]:
import pytesseract

# Set the path to the Tesseract OCR executable file
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
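
A wrong path only shows up as an error when the first OCR call runs, so it can help to check the path up front. This is a minimal sketch that runs the executable with --version through the standard library; `tesseract_version` is a hypothetical helper, and the Windows path is just the example from above.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # The path does not exist, is not executable, or the call hung
        return None
    # Check both streams in case the version is written to stderr
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version(r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
```

A result of None means the path could not be executed. pytesseract also provides get_tesseract_version(), which performs a similar check against the configured tesseract_cmd.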


Next, test Tesseract by extracting the text from a sample image:

In [14]:
import pytesseract
from PIL import Image

# Open the image using PIL
image = Image.open('fy.png')

# Use pytesseract to extract the text from the image
text = pytesseract.image_to_string(image)

# Print the extracted text
print(text)


Importance of Quick Recovery codes in their various application areas:

1. Marketing: QR codes can be used as a tool for marketing by including them in print or digital advertising materials such as
posters, flyers, brochures, and websites. When scanned, the QR code can direct the user to a landing page with more information
about the product or service being advertised.

2. Retail: Retailers can use QR codes to provide additional information about products, such as pricing, product specifications,
and customer reviews. Customers can scan the QR code using their mobile device to access this information quickly and easily.

3. Education: QR codes can be used in educational materials, such as textbooks, to provide additional resources and multimedia
content. For example, a QR code in a history textbook could link to a video that provides additional context and background
information.

4. Events: QR codes can be used at events to provide additional information, such as schedules, maps, an

And there we have it: the text extracted from the image!
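
The extracted text keeps the line breaks from the image, so each paragraph arrives hard-wrapped. If you want one line per paragraph, the wrapped lines can be joined again. This is a minimal sketch, assuming that blank lines separate paragraphs; `unwrap_paragraphs` is a hypothetical helper, not part of pytesseract, and the sample string is a shortened stand-in for real OCR output.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines into paragraphs separated by one blank line."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


sample = (
    "1. Marketing: QR codes can be used as a tool\n"
    "for marketing.\n"
    "\n"
    "2. Retail: Retailers can use QR codes.\n"
)
print(unwrap_paragraphs(sample))
```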

To read an image's orientation and script, you can use the image_to_osd method of pytesseract, which returns the orientation and script detection (OSD) information for the image. Note that this output does not include the font name or font size; it reports the page number, the orientation in degrees, the rotation needed to correct it, the orientation confidence, the detected script, and the script confidence.

Here's how to extract the rotation and orientation confidence from an image using pytesseract:

In [15]:
import pytesseract
from PIL import Image

# Set the path to the Tesseract OCR executable file
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load the image and run orientation and script detection (OSD)
img = Image.open('fy.png')
osd = pytesseract.image_to_osd(img)

# The OSD output is a series of "Key: value" lines;
# the third line holds the rotation and the fourth the orientation confidence
lines = osd.split('\n')
rotate = lines[2].split(':')[1].strip()
orientation_confidence = lines[3].split(':')[1].strip()

# Print the orientation information
print(f"Rotate: {rotate}")
print(f"Orientation confidence: {orientation_confidence}")


Rotate: 0
Orientation confidence: 11.87
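
The string returned by image_to_osd is a series of "Key: value" lines covering the page number, orientation, rotation, orientation confidence, script, and script confidence, so reading fields by name is less fragile than indexing lines by position. This is a minimal sketch; `parse_osd` is a hypothetical helper, and the sample string only mirrors the layout of the OSD output. Its Rotate and Orientation confidence values match the run above, while the remaining values are illustrative.

```python
def parse_osd(osd):
    """Parse OSD text made of 'Key: value' lines into a dictionary of strings."""
    fields = {}
    for line in osd.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample in the layout of image_to_osd output
sample_osd = (
    "Page number: 0\n"
    "Orientation in degrees: 0\n"
    "Rotate: 0\n"
    "Orientation confidence: 11.87\n"
    "Script: Latin\n"
    "Script confidence: 2.50\n"
)

info = parse_osd(sample_osd)
print(info["Rotate"], info["Orientation confidence"], info["Script"])
```

With a real image you would pass the result of pytesseract.image_to_osd(img) to the helper in place of the sample string.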


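Finally, a small ipywidgets interface lets you upload an image and extract its text from within the notebook:

In [16]: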
import pytesseract
import io
import ipywidgets as widgets
from IPython.display import display
from PIL import Image

# Set the path to the Tesseract OCR executable file
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Define the GUI elements
uploader = widgets.FileUpload()
output = widgets.Output()
button = widgets.Button(description='Extract Text')
status = widgets.HTML(value='<i>Upload an image to begin</i>')

# Define the function to extract text from the image
def extract_text(image):
    with output:
        # Load the image and extract the text
        img = Image.open(io.BytesIO(image['content']))
        text = pytesseract.image_to_string(img)

        # Print the extracted text
        print(text)

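# Define the function to handle button clicks
def on_button_click(button):
    files = uploader.value
    if files:
        # ipywidgets 7 stores uploads in a dict keyed by file name;
        # ipywidgets 8 stores them in a tuple of dicts
        first = next(iter(files.values())) if isinstance(files, dict) else files[0]
        extract_text(first)
        status.value = '<font color="green">Image successfully uploaded</font>'
    else:
        status.value = '<font color="red">No image uploaded</font>'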

# Attach the event handler to the button
button.on_click(on_button_click)

# Display the GUI elements
display(widgets.VBox([uploader, button, status, output]))

