# Using Llama 2 as OCR Vision AI
---

    Author: Amit Shukla

[https://github.com/AmitXShukla](https://github.com/AmitXShukla)

[https://twitter.com/ashuklax](https://github.com/AShuklaX)

[https://youtube.com/@Amit.Shukla](https://youtube.com/@Amit.Shukla)


Meta has recently released Llama 2, a large language model trained with up to 70B parameters, positioning it as the fastest and most advanced solution available. This model is expected to outperform other tools in terms of both speed and accuracy.

In this blog post, I will demonstrate some automation use cases I have been working on. 

It's important to note that these use cases/models will work best when trained on "in-house" data. However, training such models is a rigorous task that requires significant computing hours and resources.

To make things more accessible and easier to utilize in production, using "off the shelf", language models like ChatGPT and Llama 2 is a viable solution.

Below, I present some examples of use cases I've been working on. 

*While these examples are not meant for production*, they still showcase the powerful capabilities of the language models.

`Upon completing this blog, you will acquire the skills to build Llama 2 and ChatGPT APIs and harness the capabilities of large language models for practical data analytics tasks.`

## Table of content
---

- Introduction
- Installation (Linux and Windows)
- Efficient Time and Expense Monitoring with Llama 2
- Using Llama 2 as OCR Vision AI
- Streamlining 3-Way Receipt Match and Duplicate Voucher Invoices with Llama 2
- Enhancing Fraud Detection: Utilizing Llama 2 as an Advanced Alert System for Monitoring Transactions
- Maximizing Tax Savings, Ensuring Compliance, and Streamlining Audits with Llama 2

## Introduction
---

#### About me
I'm Amit Shukla, and I specialize in training neural networks for Finance Supply Chain analysis, enabling them to identify data patterns and make accurate predictions.

During the challenges posed by the COVID-19 pandemic, I successfully trained GL and Supply Chain neural networks to anticipate supply chain shortages. The valuable insights gained from this effort have significantly influenced the content of this tutorial series.
	
#### Objective:
By delving into this powerful tool, we will master the fundamental techniques of utilizing large language models to predict hazards. 
This knowledge is crucial in preparing finance and supply chain data for advanced analytics, visualization, and predictive modeling using neural networks and machine learning.
	
#### Subject
It is crucial to emphasize that this specific series will focus exclusively on presenting `production-like examples that demonstrate certain use cases`. It is not intended for production applications. 

Nevertheless, these examples illustrate highly potent techniques that have practical applications in real-world Data Analytics.
	
#### Following
In future installments, we will explore Data Analytics and delve into the realm of Data Analytics and machine learning for predictive analytics.

Thank you for joining me, and I'm excited to embark on this educational journey together.
	
Let's get started.

---

## Installation
---

In a previous video, I demonstrated the process of activating the Open ChatGPT and Llama environments. 

In this section, I will guide you through the steps to install Llama 2 on a Windows operating system. 

While the installation process is quite similar to that on Linux, there are a few minor changes that need to be considered. 

Let's get started!

- Step 1: `download miniconda windows installer` [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html)
- Step 2: create a new conda environment (say llamaConda)

In [None]:
# before you setup your machine for llama 2, check if you have cuda on your machine

import torch
torch.cuda.current_device()
# if you don't have cuda and torch on your machine, please move to next step and download cuda

In [None]:
# download pytorch cuda
# https://pytorch.org/get-started/locally/
# uncomment and run this command in Terminal to monitor download progress and debug any error

# !conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

In [None]:
# run these command again and make sure you have cuda available on your machine
import torch
torch.cuda.set_device(0)
torch.cuda.is_available(),torch.cuda.get_device_name()

In [None]:
# use this case if you see CUDA out of memory error
# also, try to reduce your << --max_batch_size 1 >>, max_split_size_mb:512 and work with "lowest memory size" model
# !torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 1
# clear cache

import torch
torch.cuda.empty_cache()
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:24"

# import gc
# del variables
# gc.collect()

torch.cuda.memory_summary(device=None, abbreviated=False)

signup and receive model download link from meta.

[Mete AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
>>>
    do not use the 'Copy link address' option when you right click the URL. If the copied URL text starts with: https://download.llamameta.net, you copied it correctly. If the copied URL text starts with: https://l.facebook.com, you copied it the wrong way.

In [None]:
# Step 3: uncomment and run this command on your machine
# make sure, you have a Git installed on your machine if not
# for linux run 
# sudo apt install git-all

# for windows, download git from this link
# https://git-scm.com/download/win

####### clone Meta llama repo ##########
git clone https://github.com/facebookresearch/llama.git

In [None]:
# browse to root of your llama repo

## LINUX
# cd llama
# chmod +x # ./download.sh
# ./download.sh

## WINDOWS
# bash ./download.sh
# if this commands error out, download wget.exe from below link and copy wget.ex to C:\amit.la\llama
# https://eternallybored.org/misc/wget/
# make sure, you include << C:\amit.la\llama >> to windows environment path so that windows can find it

In [None]:
# make sure, latest conda env is selected as kernel
# before activating and installing dependencies

!pip install -e .
!python setup.py install

In [None]:
# only for windows
# change line #62 on llama/generate.py
# from 
# << torch.distributed.init_process_group("gloo|nccl") >>
# to
# << torch.distributed.init_process_group("gloo|nccl") >>

In [None]:
# make sure, latest conda env (llamaConda) is selected as kernel
# before activating and installing dependencies

# !pip install -e .
# !python setup.py install

In [None]:
# ./example_text_completion.py
# ./example_chat_completion.py

# change prompts | dialogue 
#   prompts = [
#         # For these prompts, the expected answer is the natural continuation of the prompt
#         "meaning of life is",
#     ]

In [None]:
# you are ready to use llama
# !torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 1

## Using Llama 2 as OCR Vision AI
---

`What incentive do I have to develop yet another OCR Vision AI tool when there are already thousands of options readily available in market?`

- The majority of the offerings in the market or available wrappers are built upon Open Source OCR packages.
- In contrast, Vision AI solutions by large companies, which are not reliant on Open Source technologies, often offer low upfront costs but impose significant charges later on, frequently requiring customers to pay per image used.

`Why pay for inferior results from a service not trained on your data, when building an in-house, cost-effective, on-premise solution with customized capabilities is feasible, without risking data exposure.`

Here are some use cases you can consider optimizing this OCR Vision API by training it on "in-house" data.

1. Document classifier
2. Digital Invoice scanner
3. Digital Private signature matching
4. Scanning for confidential information like PHI, Private health, or Personal data
5. Train on "in-house" data for classifying "Secured information" in contracts, spends, etc.
7. Label or Hash documents
8. Sorting Receipts, Vendor Invoices or Matching Employee Expenses
9. Duplicate Invoice
....


This OCR Vision AI code below, is just one of many ways while numerous smarter optimizations are possible.
and `Fine tuning these models with in-house knowledge base will results into more accurate results.`

## OCR, Vision AI automated build process
---

Step 1: A receipt or invoice document is uploaded / dropped to a folder
       or you system takes a screenshot of an image or document

Step 2: Automated Code to call script is executed as soon as file is dropped

Step 3: Scripts read text from images

Step 4: Scripts to build prompts

Step 5: Send prompts to Llama 2 | ChatGPT

Step 6: Store results back to Application

In [None]:
#########################################################################
# Step 1: System takes a screenshot of a webpage
#########################################################################
# pip install Pillow
# pip install selenium

from selenium import webdriver
from PIL import Image

# Define the URL of the web page we want to screenshot

url = 'https://finance.yahoo.com/quote/AAPL?p=AAPL&.tsrc=fin-srch'

# Define the path to the webdriver executable (e.g., chromedriver.exe)

# webdriver_path = '/path/to/webdriver/executable'
webdriver_path = 'C:\amit.la\WIP\RPA\downloads\chromedriver.exe'

# Set up the webdriver

options = webdriver.ChromeOptions()
options.headless = True # Run the browser in headless mode to prevent a window from popping up
driver = webdriver.Chrome(executable_path=webdriver_path, options=options)

# Load the web page

driver.get(url)

# Take a screenshot of the entire page

# screenshot = driver.find_element_by_tag_name('body').screenshot_as_png
screenshot = driver.save_screenshot('../downloads/screenshot.png')

# Close the webdriver

driver.quit()

# Save the screenshot to a file

# with open('../SampleData/screenshot.png', 'wb') as file:
#     file.write(screenshot)

# Open the screenshot with Pillow to display it (optional)

img = Image.open('../downloads/screenshot.png')
img.show()

In [None]:
#########################################################################
# Step 1: A receipt or invoice document is uploaded / dropped to a folder
#########################################################################

#   $ sftp remote_username@server_ip_or_hostname
# >> Connected to remote_username@server_ip_or_hostname.

# $ sftp> pwd
# >> Remote working directory: /home/remote_username

# $ sftp> lpwd
# >> Local working directory: /downloads

# $ sftp> put screenshot.png


## TAKING A SCREENSHOT ##

In [None]:
##############################################################################
# Step 2: Automated Code to call script is executed as soon as file is dropped
##############################################################################
import time
import os

# Define the directory to watch

watch_dir = '/downloads/AAPL.png'

# Define the command to run when a new file is added

command = 'python c:/amit.la/WIP/RPA/ocr.py'

# Define the set of already seen files

seen_files = set()

# Start watching the directory for new files

while True:
    # Get the list of files in the directory
    files = os.listdir(watch_dir)

    # Check each file in the directory
    for file in files:
        # If the file is new, trigger the command
        if file not in seen_files:
            seen_files.add(file)
            os.system(command) # add to your DB...

    # Wait for a little bit before checking again
    time.sleep(10)

In [None]:
#######################################
# Step 3: Scripts read text from images
#######################################
# py -m pip install pytesseract : open source
# py -m pip install PIL

import pytesseract
from PIL import Image

##############################################################################
# in case if tesseract is not included in PATH
pytesseract.pytesseract.tesseract_cmd = r'C:\amit.la\WIP\RPA\downloads\ts\tesseract.exe'
##############################################################################

def read_image_text(image_path):
    """
    Reads text from an image file using Tesseract OCR.

    Args:
        image_path (str): The file path to the input image.

    Returns:
        str: The extracted text from the image.
    """
    # Load the image file
    image = Image.open(image_path)

    # Use Tesseract OCR to extract the text from the image
    text = pytesseract.image_to_string(image)

    return text

# Example usage
image_path = "../downloads/AAPL.png"
# image_path = "../downloads/medical_form.png"
# image_path = "../downloads/email.png"
# image_path = "../downloads/vaccine.png"
# image_path = "../downloads/blurry_1.png"
# image_path = "../downloads/blurry_2.png"
text = read_image_text(image_path)
print(text)

In [None]:
#######################################
# Step 4: Scripts to build prompts
#######################################

promptText = read_image_text(image_path)
print(promptText)

In [None]:
###########################################
# Step 5: Send prompts to Llama 2 | ChatGPT
###########################################

def callChatGPT(prompt):
    ###########################################
    # uncomment this code to call ChatGPT API #
    ###########################################
    # completion = openai.ChatCompletion.create(
    # model=model_engine,
    # messages=[
    #     {"role": "user", "content": prompt}
    # ])
    # return completion.choices[0].message.content
    return "Yes"

def callLlama(prompt):
    ###########################################
    # uncomment this code to call Llama API #
    ###########################################
    # results = generator.chat_completion(
    #     dialogs,  # type: ignore
    #     max_gen_len=max_gen_len,
    #     temperature=temperature,
    #     top_p=top_p,
    # )
    # return result['generation']['content']
    return "Yes"

# build dynamic prompt per use case


prompt = "respond in one word: average volume of stock in this text."
prompt += promptText    


In [None]:
# print(promptText)
print(prompt)

In [23]:
###########################################
# Step 6: Store results back to Application
###########################################
# res2 = callLlama(promptText)
# res = callChatGPT(prompt)
# res2 = callLlama(promptText)
# res = 70,701,586
print(res2)

# respon

It seems like you've provided some financial data related to the stock of Apple Inc. (AAPL). Here's a breakdown of the information:

    Stock Symbol: AAPL (Apple Inc.)
    Current Stock Price: $159.74
        Change Today: +$0.46 (+0.29%)
    Market Cap: $252.71 billion
    Previous Close: $159.28
    Day's Range: $159.12 - $160.51
    52-Week Range: $124.27 - $179.61
    Volume: 23,782,692 shares
    Average Volume: 70,701,586 shares
    Beta: Not provided in the snippet
    PE Ratio (TTM - Trailing Twelve Months): 160.15x800
    EPS (TTM - Trailing Twelve Months): $5.89
    Earnings Date: April 26, 2023
    Ex-Dividend Date: February 10, 2023
    Dividend (Annual): Not provided in the snippet
    Forward P/E Ratio: Not provided in the snippet
    Target Estimate: $169.23

Please note that this data appears to be from April 26, 2023, and market conditions may have changed since then. It's always a good idea to verify the latest data through reliable financial sources before making an

#### Conclusion:

The use case discussed above exemplify sophisticated business processes and there is certainly lot more which is not covered. 

This use case merely scratch the surface of what can be achieved with these advanced tools. 

You may argue that the same results can be attained using simple algebraic mathematics with these datasets, and I fully support and agree with this observation.

In essence, the entire field, encompassing Data Science, Python, Llama, and ChatGPT, revolves around uncovering statistical associations within data.

However, it is crucial to recognize that the deployment of Llama or ChatGPT-like models does not surpass the importance of traditional statistics,
instead, they should be employed to streamline specific tasks.