## **Development of Document Processing System Using OCR, LLM, Text Summarization Techniques and Image Generation**

### **Objective:**

To design and implement a system that extracts text from image-based PDFs, builds a conversational chatbot on the extracted text, and summarizes the extracted text.

### **User Instructions:**
1. The Python code is in notebook format (`.ipynb`) and can be run in `Google Colab`.
2. Go to Colab, click `upload notebook` and upload the notebook file.
3. Upload the PDF file that the text should be extracted from into the `/content/` folder, then change the PDF name in the call to `extract_image_from_pdf`.
4. Create an `images` folder inside the `/content/` folder; the extracted images are saved there.
5. Change the `query` variable accordingly.
6. Run the cells by clicking `runtime -> run all`. This runs all the cells and shows the output of each cell.
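
The `images` folder can also be created from a notebook cell instead of the Colab file browser. The following is a minimal sketch, assuming the default Colab working directory `/content`; the helper name `ensure_images_dir` is hypothetical and not part of the notebook.

```python
import os

def ensure_images_dir(base_dir="/content"):
    """Create <base_dir>/images if it does not exist and return its path."""
    images_dir = os.path.join(base_dir, "images")
    os.makedirs(images_dir, exist_ok=True)  # no error if the folder already exists
    return images_dir
```

Calling `ensure_images_dir()` once before the image-extraction cell should be enough, and running it again is harmless.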

### **Implementation Details**:

1. **Import necessary libraries:** The first step is to install and import the required libraries: `opencv`, `easyocr`, `PyPDF2` and `groq`.
2. **Read and extract images from the PDF:** Read the given PDF file with the `PdfReader` class from the `PyPDF2` library and save the image from each page as a separate image file inside the `images` folder.
3. **Extract text from the images:** Iterate through all the image files, extract the text from each one and accumulate it in a single variable.
4. **Generating Answer:** The extracted text, along with the user query, is used as input to generate an answer using the `llama2-70b-4096` model, which is available through the `Groq` API.



### **Install and Import Modules**

In [1]:
# Installing the required modules
!pip install opencv-python
!pip install matplotlib
!pip install numpy



In [7]:
# Installing PyTorch (CUDA 11.8 build), EasyOCR and PyPDF2
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install easyocr
!pip install PyPDF2

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


### **Importing all the necessary libraries**

In [8]:
# Importing the different libraries
import cv2
import numpy as np
import easyocr
import matplotlib.pyplot as plt
%matplotlib inline
import os
import PyPDF2 # For processing PDF

### **Extracting all the images from the PDF and saving them in a separate folder**

In [None]:
def extract_image_from_pdf(pdf_path, out_dir='./images'):
    # Make sure the output folder exists before writing into it
    os.makedirs(out_dir, exist_ok=True)

    with open(pdf_path, 'rb') as file:
        # Create a PDF reader object
        pdf_reader = PyPDF2.PdfReader(file)

        image_count = 1

        # Loop through each page in the PDF
        for page in pdf_reader.pages:
            images = page.images

            # Skip pages that contain no embedded image
            if not images:
                continue

            # Each page is expected to hold a single scanned image,
            # so only the first image is taken
            image = images[0]

            # Write the raw image bytes to <out_dir>/image<N>.jpg
            with open(os.path.join(out_dir, f'image{image_count}.jpg'), "wb") as fp:
                fp.write(image.data)
            image_count += 1

# Calling the function creates one image file for each image in the PDF.
extract_image_from_pdf('session5.pdf')
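
The function above assumes one JPEG image per page. If a PDF has several images per page, or stores them in another format, a variant along the following lines may help. This is a sketch, not the notebook's method: `save_page_images` and its file-naming scheme are hypothetical, and it only relies on each page exposing `.images` whose items have `.name` and `.data`, as the objects returned by `PyPDF2.PdfReader(...).pages` do.

```python
import os

def save_page_images(pages, out_dir="./images"):
    """Write every embedded image of every page to out_dir; return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for page_no, page in enumerate(pages, start=1):
        for img_no, image in enumerate(page.images, start=1):
            # Keep the extension reported by the PDF (e.g. .png); fall back to .jpg
            ext = os.path.splitext(image.name)[1] or ".jpg"
            path = os.path.join(out_dir, f"page{page_no}_img{img_no}{ext}")
            with open(path, "wb") as fp:
                fp.write(image.data)
            saved.append(path)
    return saved

# Usage with PyPDF2 (assumes 'session5.pdf' is in the working directory):
# import PyPDF2
# with open("session5.pdf", "rb") as f:
#     save_page_images(PyPDF2.PdfReader(f).pages)
```

Note that the listing step later in this notebook keeps only `.jpg` files, so its filter would need to be widened if other extensions are written.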

### **Defining the path of images**

In [45]:
# Folder that holds the extracted images, and a helper that builds the full path of a file in it
img_path = "./images/"
create_path = lambda f : os.path.join(img_path, f)

# listing all the .jpg images in the directory as a sorted list
test_image_files = [f for f in os.listdir(img_path) if f.lower().endswith('.jpg')]
test_image_files.sort()

for f in test_image_files:
    print(f)

image1.jpg
image2.jpg
image3.jpg
image4.jpg
image5.jpg
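
`test_image_files.sort()` orders the names lexicographically. That is fine for the five files listed here, but a PDF with ten or more pages would place `image10.jpg` before `image2.jpg`, and the extracted text would then be concatenated out of page order. A minimal sketch of a numeric-aware sort key follows; the name `natural_key` is hypothetical.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so 'image10' sorts after 'image2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]

files = ["image10.jpg", "image2.jpg", "image1.jpg"]
print(sorted(files))                   # ['image1.jpg', 'image10.jpg', 'image2.jpg']
print(sorted(files, key=natural_key))  # ['image1.jpg', 'image2.jpg', 'image10.jpg']
```

In the notebook this would amount to calling `test_image_files.sort(key=natural_key)`.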


### **Defining a function that recognizes text from a single image file**

In [54]:
# Creating a reader object for extracting English ('en') text
reader = easyocr.Reader(['en'])

# Function that returns the recognized text as a list of
# (bbox, text, prob) tuples, one per detected text region
def recognize_text(img_path):
    ''' Loads an image and recognizes the text in it. '''
    return reader.readtext(img_path)



### **Defining function to extract text from image**

In [60]:
# Extracting text from the image
def ocr_text(img_path):
    # recognizing text from an image
    result = recognize_text(img_path)
    res = ""  # accumulates all the recognized lines of a single image file

    # Iterating over all the recognized lines along with their confidence
    for (bbox, text, prob) in result:
        # Keep every line with a non-zero confidence (no threshold is applied).
        # Lines are appended without a separator.
        if prob:
            res += text

    # returning the extracted text of the image file
    return res
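
Because `ocr_text` appends the recognized lines without a separator, words from adjacent lines run together in the result. A possible variant is sketched below, assuming a confidence cut-off of 0.2 and a single space as separator; both values and the helper name `join_ocr_lines` are illustrative choices, not part of the notebook. It takes the `(bbox, text, prob)` tuples returned by `reader.readtext`.

```python
def join_ocr_lines(results, min_conf=0.2, sep=" "):
    """Join OCR text fragments whose confidence is at least min_conf."""
    kept = [text.strip() for (_bbox, text, conf) in results if conf >= min_conf]
    return sep.join(t for t in kept if t)

# Hypothetical results in the (bbox, text, confidence) shape returned by reader.readtext
sample = [
    (None, "Artificial", 0.98),
    (None, "intelligence", 0.91),
    (None, "~~", 0.05),   # low-confidence noise, dropped by the threshold
    (None, "system", 0.87),
]
print(join_ocr_lines(sample))  # Artificial intelligence system
```

Inside `ocr_text`, the loop could then be replaced by `return join_ocr_lines(result)`.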

### **Extract text from all the images in the PDF**

In [None]:
# Accumulator for the text extracted from all the images
extracted_text = ""

# Iterating over the list of image files extracted from the PDF:
for image_file in test_image_files:
  # creating the complete path of the current image
  path = create_path(image_file)

  # Extracting the text from that image and concatenating it with the result
  extracted_text += ocr_text(path)

# Now the `extracted_text` variable contains the text from all the images in the PDF

### **Printing the extracted text from the images**

In [62]:
extracted_text

'ArtificialintelligencesystemLearns fronUses theRecognisesSolvesUnderstandsCreatesexperiencelearningmagescomplexlanguageperspectivest0 reasonproblemsits nuancesArtificial intelligence (AI), sometimes called machine intelligence,is intelligence demonstrated by machines_in contrast to the naturalintelligence displayed by humans and other animals, such as "learningand "problem solving_In computer science AI research is defined as the study of"intelligent agentsany device that perceives its environment andtakes actions that maximize its chance of successfully achieving itsgoals:auldWays that People Think and LearnAboutThingsIf you havea problem, think of a past situationwhere you solveda similar problem_If you take an action, anticipate what might happennext.Iffail at something, imagine how you mighthave done things differently:If you observe an event,to infer what prior eventmightcaused it_If you see an object, wonder if anyone owns it:If someone does something, ask yourself what theperso

## **Developing chatbot based on the interpreted text**

In [64]:
# Installing groq to use the Llama model through the Groq API
!pip install groq



### **Summarizing the text in the images**

In [None]:
from groq import Groq

# setting environment variable for groq
os.environ['GROQ_API_KEY'] = 'Your Groq API Key Here'

# creating groq client to use the model `llama2-70b-4096`
client = Groq()


# giving prompt for summarizing the content
query = "Give me a summary of the context"
prompt = f'''
    You are a chatbot you must generate a good summarised answer from the context.
    Use the following provided context to answer the query enclosed within triple backticks.
    Context: {extracted_text}
    User Query: ```{query}```
    Answer:
'''

# generating answer from the context and query using 'llama2-70b-4096' model:
completion = client.chat.completions.create(
    model="llama2-70b-4096",
    messages=[
        {
            "role": "user",
            "content": f""" {prompt}
            """
        }
    ],
    temperature=0.5,
    max_tokens=1024,
    top_p=1,
    stream=True,
    stop=None,
)

# printing the result
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

Artificial intelligence (AI) is a field of study in computer science that focuses on creating machines that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making. AI systems can be used for a variety of tasks, including facial recognition, internet searches, and driving cars. The long-term goal of many researchers is to create general AI (AGI) that can outperform humans at nearly every thinking task. One recent development in AI is Google's Duplex system, which can conduct natural conversations over the phone to carry out real-world tasks, such as scheduling appointments.
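
The prompt above embeds the whole of `extracted_text`. The `4096` in the model name suggests a 4096-token context window, so the text of a longer PDF may not fit alongside the 1024 tokens reserved for the answer. A simple way to stay within a budget is to trim the context before building the prompt. The sketch below is an assumption-laden illustration: `build_prompt` is a hypothetical helper, and the 8000-character default is an arbitrary budget, not a measured limit.

```python
def build_prompt(context, query, max_context_chars=8000):
    """Build the notebook's prompt, trimming the context to max_context_chars."""
    fence = "`" * 3  # the triple backticks that enclose the query in the prompt
    trimmed = context[:max_context_chars]
    return (
        "You are a chatbot you must generate a good summarised answer from the context.\n"
        "Use the following provided context to answer the query enclosed within triple backticks.\n"
        f"Context: {trimmed}\n"
        f"User Query: {fence}{query}{fence}\n"
        "Answer:\n"
    )
```

Plain truncation drops everything after the cut-off, so answers can only draw on the first part of the document. Summarizing the text in chunks would be one alternative, at the cost of more API calls.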

In [66]:
# giving prompt for query
query = "What are the ways that people think and learn about things?"
prompt = f'''
    You are a chatbot you must generate a good summarised answer from the context.
    Use the following provided context to answer the query enclosed within triple backticks.
    Context: {extracted_text}
    User Query: ```{query}```
    Answer:
'''

# generating answer from the context and query using 'llama2-70b-4096' model:
completion1 = client.chat.completions.create(
    model="llama2-70b-4096",
    messages=[
        {
            "role": "user",
            "content": f""" {prompt}
            """
        }
    ],
    temperature=0.5,
    max_tokens=1024,
    top_p=1,
    stream=True,
    stop=None,
)

# printing the result
for chunk in completion1:
    print(chunk.choices[0].delta.content or "", end="")

There are several ways that people think and learn about things. One way is by using past experiences to solve similar problems. People can also use anticipation and imagination to think about what might happen next or how they could have done things differently. Observation is another way people learn, by inferring what might have caused an event or who might own an object. Additionally, people can use automated reasoning to answer questions and draw new conclusions, and machine learning to adapt to new circumstances and detect patterns. Finally, natural language processing and computer vision allow people to communicate with machines and process visual information. These ways of thinking and learning are being used to develop artificial intelligence (AI) systems that can perform tasks such as facial recognition, internet searches, and driving a car. The long-term goal of many researchers is to create general AI that can outperform humans at nearly every thinking task.
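
The two query cells differ only in the `query` string, so the request can be wrapped in one reusable function that returns the answer as a string instead of printing it chunk by chunk. This is a sketch under the same settings as the cells above; the function name `ask` is hypothetical, and `prompt` is expected to be built as in the notebook.

```python
def ask(client, prompt, model="llama2-70b-4096"):
    """Send one prompt to the chat completions API and return the full answer."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.5,
        max_tokens=1024,
        top_p=1,
        stream=True,
    )
    # Each streamed chunk carries a text delta, which may be None
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)

# Usage (assumes `client` and `prompt` from the cells above):
# answer = ask(client, prompt)
# print(answer)
```

Keeping the answer in a variable makes it possible to store it or pass it to a later step, which printing the stream directly does not allow.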