# Setup

In [None]:
%pip install --upgrade pip
%pip install --upgrade langchain langchain-community sentence-transformers transformers faiss-cpu tiktoken
%pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-25.1.1-py3-none-any.whl (1.8 MB)
   ---------------------------------------- 1.8/1.8 MB 20.0 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.0.1
    Uninstalling pip-25.0.1:
      Successfully uninstalled pip-25.0.1
Successfully installed pip-25.1.1
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cpu/torchaudio-2.7.0%2Bcpu-cp310-cp310-win_amd64.whl.metadata (6.7 kB)
INFO: pip is looking at 

In [7]:
import importlib
import pathlib

import pymupdf4llm
import langchain
from tqdm import tqdm
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

import chunking_utils

# Reload the local helper module so edits to chunking_utils.py take effect
# without restarting the kernel
importlib.reload(chunking_utils)

<module 'chunking_utils' from 'c:\\Users\\ankum\\OneDrive\\Desktop\\vivado_rag\\chunking_utils.py'>

# Exploration using General PyMuPDF4LLM Methods

In [6]:
vivado_documentation_md = pymupdf4llm.to_markdown("vivado_documentation.pdf")

In [9]:
pathlib.Path("markdown_files/vivado_documentation.md").write_bytes(vivado_documentation_md.encode())

869140
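
The write above assumes that the `markdown_files/` directory already exists. A minimal sketch of a more defensive variant follows; the helper name `save_markdown` is hypothetical (it is not part of PyMuPDF4LLM), and it simply creates the parent directory before writing UTF-8 bytes.

```python
import pathlib


def save_markdown(text: str, out_path: str) -> int:
    """Write text as UTF-8, creating parent directories; return bytes written."""
    path = pathlib.Path(out_path)
    # Create e.g. markdown_files/ if it is missing; no error if it exists.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.write_bytes(text.encode("utf-8"))


# Hypothetical usage in this notebook:
# save_markdown(vivado_documentation_md, "markdown_files/vivado_documentation.md")
```

Like `write_bytes` in the cell above, the helper returns the number of bytes written, so the cell output stays comparable.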

# Exploration using PyMuPDF4LLM + LlamaIndex

In [2]:
md_read = pymupdf4llm.LlamaMarkdownReader()
vivado_documentation_llama = md_read.load_data("vivado_documentation.pdf")

Successfully imported LlamaIndex


In [3]:
# Page 9 (index 8) mixes text and images, which makes it a hard case for extraction
print(vivado_documentation_llama[8].to_dict()['text'])

*Chapter 1:* Introduction

     - Appendix D: JTAG Cables and Devices Supported by hw_server

    Appendix F: Configuration Memory Support
### **Getting Started**

After successfully implementing your design, the next step is to run it in hardware by
programming the FPGA or ACAP and debugging the design in-system. All of the necessary
commands to perform programming of FPGAs and in-system debugging of the design are in the
**Program and Debug** section of the **Flow Navigator** in the Vivado [®] Integrated Design
Environment (IDE) (see the following figure).

*Figure 1:* **Program and Debug Section of the Flow Navigator Panel**

UG908 (v2022.1) April 26, 2022 [www.xilinx.com](https://www.xilinx.com)
[Send Feedback](https://www.xilinx.com/about/feedback/document-feedback.html?docType=User_Guides&docId=UG908&Title=%20Vivado%20Design%20Suite%20User%20Guide&releaseVersion=2022.1&docPage=9)
Vivado Design Suite User Guide: Programming and Debugging 9


-----


In [4]:
# page 10 (index 9) for further exploration
print(vivado_documentation_llama[9].to_dict()['text'])

*Chapter 1:* Introduction
### **Debug Terminology**
##### **ILA**

The Integrated Logic Analyzer (ILA) feature allows you to perform in-system debugging of postimplemented designs on an FPGA, SoC, or Versal [®] device. This feature should be used when
there is a need to monitor signals in the design. You can also use this feature to trigger on
hardware events and capture data at system speeds.

The ILA core can be instantiated in your RTL code or inserted post synthesis in the Vivado design
flow. Detailed documentation on the ILA core IP can be found in the *Integrated Logic Analyzer*
*LogiCORE IP Product Guide* [(PG172).](https://www.xilinx.com/cgi-bin/docs/ipdoc?c=ila;v=latest;d=pg172-ila.pdf)

**Related Information**

In-System Logic Design Debugging Flows

Debugging Logic Designs in Hardware
##### **VIO**

The Virtual Input/Output (VIO) debug feature can both monitor and drive internal FPGA, SoC, or
Versal ACAP signals in real time. In the absence of physical access to the target h

# Creating Chunks

In [3]:
# Chunk pages 9 through 358 (indices 8 to 357, inclusive)
# Tables are ignored for now
all_chunks = []
all_pages = []
start_index = 8
end_index = 357
for index in tqdm(range(start_index, end_index + 1)):
    curr_page_number = index + 1
    try:
        curr_text = vivado_documentation_llama[index].to_dict()['text']
        new_chunks = chunking_utils.chunk_markdown(curr_text)
    except Exception as e:
        print(f"Skipping page {curr_page_number}: {e}")
        continue
    
    all_chunks.extend(new_chunks)
    all_pages.extend([curr_page_number] * len(new_chunks))

print(f"{len(all_chunks)} chunks created")
print(f"There are {len(set(all_pages))} pages")

100%|██████████| 350/350 [00:00<00:00, 5361.99it/s]

1119 chunks created
There are 350 pages
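
The source of `chunking_utils.chunk_markdown` is not shown in this notebook, so the following is only a rough sketch of what a paragraph-level Markdown chunker could look like, not the implementation used above. The function name `split_markdown_paragraphs` and the `min_chars` threshold are assumptions made for illustration.

```python
import re


def split_markdown_paragraphs(text: str, min_chars: int = 20) -> list[str]:
    """Split Markdown into paragraph chunks, dropping headings and rules."""
    chunks = []
    # Paragraphs are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        # Remove heading lines ("# ...") and page-separator rules ("-----").
        kept = [
            line
            for line in block.splitlines()
            if not line.lstrip().startswith("#") and line.strip() != "-----"
        ]
        paragraph = "\n".join(kept).strip()
        # Skip fragments that are too short to be useful on their own.
        if len(paragraph) >= min_chars:
            chunks.append(paragraph)
    return chunks
```

Unlike the chunks printed in this notebook, this sketch produces no overlap between neighbouring chunks and does not filter page headers or footers.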





In [4]:
# Preview the first five chunks alongside their page numbers
(all_chunks[:5], all_pages[:5])

(['- Appendix D: JTAG Cables and Devices Supported by hw_server\n\n    Appendix F: Configuration Memory Support',
  'Appendix F: Configuration Memory Support\n\nAfter successfully implementing your design, the next step is to run it in hardware by\nprogramming the FPGA or ACAP and debugging the design in-system. All of the necessary\ncommands to perform programming of FPGAs and in-system debugging of the design are in the\n**Program and Debug** section of the **Flow Navigator** in the Vivado [®] Integrated Design\nEnvironment (IDE) (see the following figure).',
  '*Figure 1:* **Program and Debug Section of the Flow Navigator Panel**',
  'The Integrated Logic Analyzer (ILA) feature allows you to perform in-system debugging of postimplemented designs on an FPGA, SoC, or Versal [®] device. This feature should be used when\nthere is a need to monitor signals in the design. You can also use this feature to trigger on\nhardware events and capture data at system speeds.',
  'The ILA core can 

In [None]:
# Wrap each chunk in a Document tagged with its page number, then embed and store in FAISS
all_docs = [
    Document(page_content=chunk, metadata={"page_number": page_num})
    for chunk, page_num in zip(all_chunks, all_pages)
]
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(all_docs, embeddings)
vectorstore.save_local("faiss_index")

RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
No module named 'torch.distributed.tensor'

In [6]:
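import torch
print(torch.__version__)                 # a 2.x release is expected; an older 1.x build lacks torch.distributed.tensor
print(torch.distributed.is_available())  # True only means the distributed package is built in, not that torch.distributed.tensor exists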

1.13.1+cpu
True
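
The traceback names `torch.distributed.tensor` as the missing module, and the version check reports torch 1.13.1, which suggests the installed torch is older than what the installed transformers expects. A small sketch of a pre-flight check that could run before building the embeddings is shown below; the helper name `module_available` is hypothetical, and it relies only on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. torch itself) cannot be imported.
        return False


if not module_available("torch.distributed.tensor"):
    print("torch.distributed.tensor not found; upgrade torch before embedding.")
```

Running this check right after the install cell (and after a kernel restart) would surface the mismatch before the embedding cell fails.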
