#### Running Llama 3.1 8B Locally on Windows with LlamaIndex: A Practical Guide

In this blog post, we'll explore how to run LlamaIndex with the Llama 3.1 8B model locally on a Windows machine. LlamaIndex is a framework for connecting large language models to external data sources, and we will demonstrate how to use a PDF document and a vector store for efficient information retrieval. Whether you're new to LlamaIndex or looking for a practical guide to running Llama locally, this post walks you through the setup and a first query against your own document.


#### Setting up the Llama 3.1 8B model locally on Windows

1. [Download](https://ollama.com/download/windows) the Ollama installer for Windows and run it.
2. [Search](https://ollama.com/search) for Llama 3.1 and copy the command from the model page.
3. Run the command in the Windows command line:
    ollama run llama3.1

4. Once the model is up and running, the terminal looks like this:

![Llama 3.1](./llama3.1.PNG)

5. Alternatively, open http://localhost:11434/ in a browser to confirm that the server is up.

    The browser shows "Ollama is running".
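
The browser check in step 5 can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `ollama_is_running` and its default timeout are assumptions made for illustration, and the URL is the one shown above.

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434/", timeout=5.0):
    """Return True if the server at base_url answers with the Ollama banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a broken HTTP reply
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, the server is most likely not started yet or is listening on a different port.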


NOTE: I am running this on an Intel Core i9 vPro machine with 64 GB of RAM and an NVIDIA RTX 3500 Ada GPU (12 GB of dedicated memory). If your resources are limited, you can try a smaller model such as TinyLlama.

In [1]:
import fitz  # PyMuPDF
import re
import copy
import warnings
warnings.filterwarnings("ignore")

def get_page_range(pdf_path, out_path, start_page, end_page):
    """
    Copies a range of pages (zero-based, inclusive) from a PDF file
    into a new PDF file using PyMuPDF.
    """
    document = fitz.open(pdf_path)

    # Create a new, empty PDF for the output
    output_document = fitz.open()

    # Copy the specified page range in a single call
    output_document.insert_pdf(document, from_page=start_page, to_page=end_page)

    # Save the extracted pages to the output file and release both documents
    output_document.save(out_path)
    output_document.close()
    document.close()

    print(f"Pages {start_page + 1} to {end_page + 1} have been extracted and saved to '{out_path}'.")


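def parse_pdf_sections(document):
    """
    Groups the text of a PDF into a nested dictionary
    (main section -> section -> sub-section -> text), using the font size
    of each span to tell headings from body text.
    """
    parsed_dict = {}
    main_section_key = ""
    section_key = ""
    sub_section_key = ""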
    for page in range(document.page_count):
        #print(f"Page {page + 1}")
        text_dict = document[page].get_text("dict")

        for block in text_dict.get("blocks", []):
            #print(f"Block {block['number']}")

            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    #print(span)
                    text = span.get("text", "").strip()

                    if len(text) > 0:
                        # Skip page numbers and the running chapter headers
                        if not text.isnumeric() and text.lower() not in ["Chapter 1: Introduction to Apache Spark: A Unified Analytics Engine".lower(), "CHAPTER 1".lower()]:
                            font_size = span.get("size", 0)

                            # Main section heading: start a new top-level entry
                            if 17 < font_size < 19:
                                main_section_key = text
                                parsed_dict[main_section_key] = {}
                                section_key = ""
                                sub_section_key = ""

                            # Section heading
                            if 15 < font_size < 16:
                                section_key = text
                                sub_section_key = ""
                                if main_section_key:
                                    parsed_dict[main_section_key][section_key] = {}

                            # Sub-section heading
                            if 11 < font_size < 12:
                                sub_section_key = text
                                if main_section_key and section_key:
                                    parsed_dict[main_section_key][section_key][sub_section_key] = []

                            # Body text: append to the current sub-section
                            if 10 < font_size < 11:
                                if main_section_key and section_key and sub_section_key:
                                    parsed_dict[main_section_key][section_key][sub_section_key].append(text)

                                # Body text directly under a section heading is stored under "content"
                                if main_section_key and section_key and not sub_section_key:
                                    sub_section_key = "content"
                                    parsed_dict[main_section_key][section_key][sub_section_key] = [text]

    # Join the text fragments collected for each sub-section into one string
    for main_section_key in parsed_dict:
        for section_key in parsed_dict[main_section_key]:
            for sub_section_key in parsed_dict[main_section_key][section_key]:
                content = parsed_dict[main_section_key][section_key][sub_section_key]
                parsed_dict[main_section_key][section_key][sub_section_key] = " ".join(content)
                
    return parsed_dict


def fixed_size_chunking(text, metadata, chunk_size, overlap, char=False):
    """
    Splits the input text into chunks of a fixed size with optional overlap.

    Parameters:
    text (str): The input text to be chunked.
    metadata (list): Words (for example, section headings) prepended to every
        word-based chunk. Ignored when char is True.
    chunk_size (int): The size of each chunk, excluding the metadata prefix.
    overlap (int): The number of overlapping elements between consecutive chunks.
        Must be smaller than chunk_size.
    char (bool): If True, chunk by characters. If False, chunk by words. Default is False.

    Returns:
    list: A list of strings if char is True, otherwise a list of word lists,
        each starting with the metadata words.
    """
    step = chunk_size - overlap

    if char:
        return [text[i:i+chunk_size] for i in range(0, len(text), step)]
    else:
        words = text.split()
        # Iterate over the full word list so that short sections are not dropped
        return [metadata + words[i:i+chunk_size] for i in range(0, len(words), step)]
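
To make the word-based branch of `fixed_size_chunking` easier to picture, here is a small self-contained sketch of the same sliding-window idea on a toy sentence. `word_chunks` is a simplified, hypothetical stand-in rather than the function above, and the sizes are deliberately tiny so the overlap is visible.

```python
def word_chunks(text, metadata, chunk_size, overlap):
    """Toy sliding-window chunker: word windows of chunk_size that share
    `overlap` words, each prefixed with the metadata words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(metadata + words[i:i + chunk_size])
            for i in range(0, len(words), step)]


sample = "one two three four five six seven eight"
for chunk in word_chunks(sample, ["Spark", "Intro"], chunk_size=4, overlap=1):
    print(chunk)
# Spark Intro one two three four
# Spark Intro four five six seven
# Spark Intro seven eight
```

Each chunk repeats the last word of the previous one, and every chunk carries the heading words, so a retrieved chunk still says which section it came from.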

### PDF preprocessing

#### Get the sample pages from the PDF

I took the Learning Spark 2.0 PDF and extracted a single chapter for this post. The function ``` get_page_range ``` copies a range of pages (zero-based indices) and saves it to an output PDF. In this case I have already created ``` sample.pdf ```. Use the following code snippet to create the sample PDF yourself.

```python
get_page_range("./LearningSpark2.0.pdf", "sample.pdf", 24, 41)
```
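
The confirmation message printed by `get_page_range` adds 1 to each index, because PyMuPDF counts pages from zero while PDF viewers count from one. If you prefer to think in viewer page numbers, a tiny conversion helper can avoid off-by-one mistakes. This is only a sketch, and `viewer_pages_to_indices` is a hypothetical name.

```python
def viewer_pages_to_indices(first_page, last_page):
    """Convert a 1-based, inclusive page range (as shown in a PDF viewer)
    into the zero-based, inclusive indices that get_page_range expects."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return first_page - 1, last_page - 1


start_page, end_page = viewer_pages_to_indices(25, 42)
print(start_page, end_page)  # 24 41
```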

#### Extract text from the PDF

I wrote a small piece of parsing logic that keeps the text of each section together in order to preserve its context.
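
The core idea in `parse_pdf_sections` is that the font size of each text span reveals whether it is a heading or body text. The sketch below isolates that classification step. The size ranges are the ones used in the function and are specific to this book's layout; `FONT_LEVELS`, `classify_span`, and the sample spans are hypothetical and only illustrate the shape of the data.

```python
# (lower bound, upper bound, level) with exclusive bounds, taken from
# the ranges used in parse_pdf_sections for this particular book
FONT_LEVELS = [
    (17, 19, "main_section"),
    (15, 16, "section"),
    (11, 12, "sub_section"),
    (10, 11, "body"),
]


def classify_span(font_size):
    """Map a span's font size to a structural level, or None if no range matches."""
    for low, high, level in FONT_LEVELS:
        if low < font_size < high:
            return level
    return None


# Made-up spans shaped like the ones in PyMuPDF's get_text("dict") output
spans = [
    {"text": "A main heading", "size": 18.0},
    {"text": "A section heading", "size": 15.5},
    {"text": "A sub-section heading", "size": 11.5},
    {"text": "Some body text.", "size": 10.5},
    {"text": "A footnote", "size": 8.0},
]

for span in spans:
    print(classify_span(span["size"]), "|", span["text"])
```

For a different PDF, you would first print the sizes of a few spans and adjust the ranges accordingly.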


In [2]:
document = fitz.open("./sample.pdf")
parsed_content = parse_pdf_sections(document)
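
It helps to know the shape of `parsed_content` before chunking it. The sketch below uses a made-up miniature dictionary with the same nesting (main section, section, sub-section, text) and walks it. `toy_parsed` and `iter_leaves` are hypothetical names, and the texts are placeholders rather than content from the book.

```python
# Hypothetical miniature of the nested dictionary returned by parse_pdf_sections:
# main section -> section -> sub-section -> joined text
toy_parsed = {
    "Main heading": {
        "Section A": {
            "content": "Text that appeared directly under the section heading.",
            "Sub-section A.1": "Text under the sub-section heading.",
        },
        # A section without any body text stays an empty dictionary
        "Section B": {},
    }
}


def iter_leaves(parsed):
    """Yield (heading_path, text) pairs for every sub-section in the nested dict."""
    for main_key, sections in parsed.items():
        for section_key, sub_sections in sections.items():
            for sub_key, text in sub_sections.items():
                yield (main_key, section_key, sub_key), text


for path, text in iter_leaves(toy_parsed):
    print(" > ".join(path), "|", text)
```

The heading path of each leaf is what later gets prepended to the chunks as metadata.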

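#### Chunking the content

Each sub-section is split into chunks of up to 1,000 words with a 200-word overlap, and the headings it belongs to are prepended to every chunk so that the context travels with the text.

In [3]: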

chunks = []

for main_section_key, main_section_value in parsed_content.items():
    for section_key, section_value in main_section_value.items():
        for sub_section_key, sub_section_value in section_value.items():
            metadata = (main_section_key + " " + section_key + " " + sub_section_key).strip().split()
            chnk = fixed_size_chunking(sub_section_value, metadata, 1000, 200, char=False)
            txt_chnk = [' '.join(c) for c in chnk]
            chunks.extend(txt_chnk)

# Lowercase all chunks once, after every section has been processed
lowercased_list = [str(item).lower() for item in chunks]

In [4]:


import warnings
warnings.filterwarnings("ignore")

from llama_index.core.schema import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex, Settings
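
# Embedding model with the library's default settings
embed_model = HuggingFaceEmbedding()

# Register it globally so that indexing and querying use the same embeddings
Settings.embed_model = embed_model

# Start with an empty index and insert one Document per chunk
index = VectorStoreIndex([])
for chunk in lowercased_list:
    index.insert(Document(text=chunk, extra_info={}))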
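
Conceptually, a vector index stores one vector per chunk and answers a query by returning the chunks whose vectors are closest to the query's vector. The sketch below illustrates that idea in plain Python. It is not how LlamaIndex is implemented: the word-count "embedding" replaces a real embedding model, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]


toy_chunks = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat and a dog played on the mat",
]
print(top_k("where is the cat", toy_chunks, k=2))
```

The two cat-related chunks are returned and the unrelated one is left out, which is the behaviour the real index provides with far better embeddings.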

In [7]:
# Use locally running Ollama Server for querying the index
from llama_index.llms.ollama import Ollama
# Generous timeout (in seconds), since local generation can be slow
llm = Ollama(model="llama3.1", request_timeout=420.0)

query_engine = index.as_query_engine(llm=llm)

# Let's run one query
response = query_engine.query("Give me the names of Apache spark components in bullets")
print(response)

• Spark SQL
• Spark MLlib
• Spark Structured Streaming
• GraphX
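
Behind a query like this, a retrieval-based query engine generally fetches the most relevant chunks and hands them to the LLM together with the question. The sketch below shows only that assembly step in plain Python. The template wording and `build_prompt` are hypothetical and are not the prompt LlamaIndex actually uses.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Give me the names of Apache spark components in bullets",
    ["<retrieved chunk 1>", "<retrieved chunk 2>"],
)
print(prompt)
```

Because the answer is grounded in the retrieved chunks, its quality depends on how well the parsing and chunking steps preserved each section's context.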
