PDF Reference Counter with DSPy

This project provides a Python script that counts the number of references to a specific author within a PDF document. It is a practical demonstration of Hybrid LLM Coding, where the script combines the declarative power of the DSPy framework to handle unstructured data with traditional Python code for deterministic tasks.

Core Concepts

Hybrid LLM Coding: This approach combines the strengths of large language models (LLMs) and traditional code. The LLM is used for "fuzzy" tasks that require natural language understanding (like parsing messy citation formats), while Python code is used for "rigid" tasks that require precise logic (like counting, file handling, and section identification).
DSPy: The DSPy framework is used to program the LLM calls. It lets us declare typed input and output "signatures" for each task, so the LLM behaves as a more predictable, structured component of the overall pipeline.
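The split described above can be sketched as follows. This is a minimal illustration, not the code in count_references.py: the names make_dspy_extractor, ExtractAuthors, count_author_references and stub_extractor are hypothetical, and the DSPy part assumes a recent DSPy release (2.5 or later) talking to Ollama at its default address. The fuzzy step is passed in as a callable, so the counting logic can be exercised without an LLM.

```python
from typing import Callable, Iterable


def make_dspy_extractor(
    model: str = "ollama_chat/llama3.1",
    api_base: str = "http://localhost:11434",
) -> Callable[[str], list[str]]:
    """Fuzzy step: build an LLM-backed author extractor.

    Needs dspy installed and a running Ollama server (assumed setup).
    """
    import dspy  # imported lazily so the rigid code below runs without it

    class ExtractAuthors(dspy.Signature):
        """List the author names that appear in one bibliography entry."""

        citation: str = dspy.InputField(desc="a single reference-list entry")
        authors: list[str] = dspy.OutputField(desc="author names as written")

    dspy.configure(lm=dspy.LM(model, api_base=api_base, api_key=""))
    predict = dspy.Predict(ExtractAuthors)
    return lambda citation: list(predict(citation=citation).authors)


def count_author_references(
    entries: Iterable[str],
    author: str,
    extract_authors: Callable[[str], list[str]],
) -> int:
    """Rigid step: count entries whose extracted authors mention `author`."""
    target = author.casefold()
    return sum(
        any(target in name.casefold() for name in extract_authors(entry))
        for entry in entries
    )


def stub_extractor(citation: str) -> list[str]:
    """Stand-in for the LLM: assumes 'Name; Name (year). Title.' entries."""
    return citation.split(" (")[0].split("; ")
```

Calling count_author_references(entries, "Smith", stub_extractor) exercises only the deterministic half; swapping in make_dspy_extractor() would route each entry through the model instead.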

Getting Started

Prerequisites

You need to have Python installed, along with the following libraries:

pip install dspy-ai langchain-community pypdf

You also need to have a local LLM server running via Ollama.

Install Ollama: Follow the instructions on the Ollama website.
Pull the LLM Model: This script is configured to use Llama 3.1. Open a terminal and run:

ollama pull llama3.1:latest

Start Ollama: Ensure the Ollama server is running in the background.
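Before running the script, it can help to confirm that the server is up and the model has been pulled. The sketch below is not part of the project; it assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/tags endpoint, which lists locally installed models. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumption)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask the Ollama server which models are installed locally.

    Raises an OSError subclass (e.g. URLError) if the server is unreachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names such as 'llama3.1:latest' from the /api/tags reply."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def model_is_available(model: str, tags_payload: dict) -> bool:
    """True if `model` appears among the locally installed models."""
    return model in installed_models(tags_payload)
```

For example, model_is_available("llama3.1:latest", fetch_tags()) should be true once the pull above has finished; a connection error from fetch_tags() suggests the server is not running.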

Usage

Save the code provided below as a Python file (e.g., count_references.py).
Place the PDF file you want to analyze in the same directory and name it Paper.pdf.
Run the script from your terminal:

python count_references.py
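For orientation, the deterministic part of such a run (reading the PDF and isolating the reference list) might look like the sketch below. It is an illustration under stated assumptions, not the project's code: it uses pypdf's PdfReader directly, assumes the reference list follows a heading line reading "References" or "Bibliography", and assumes entries are numbered in the "[1]" style. Entries in messier formats are the cases the LLM step is meant to handle. All function names are hypothetical.

```python
import re

# A heading line containing only "References" or "Bibliography" (assumption).
_HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE)
# The start of a numbered entry such as "[12] " (assumption about the style).
_ENTRY_START = re.compile(r"^\s*\[\d+\]\s*", re.MULTILINE)


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    from pypdf import PdfReader  # imported lazily; only needed for real PDFs

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def references_section(text: str) -> str:
    """Return everything after the last references heading, or '' if none."""
    matches = list(_HEADING.finditer(text))
    if not matches:
        return ""
    return text[matches[-1].end():].strip()


def split_entries(section: str) -> list[str]:
    """Split a '[n]'-numbered reference list into one string per entry."""
    parts = _ENTRY_START.split(section)
    # Collapse line breaks inside an entry into single spaces.
    return [" ".join(part.split()) for part in parts if part.strip()]
```

Each string returned by split_entries(references_section(extract_pdf_text("Paper.pdf"))) would then be a candidate input for the LLM-backed author extraction.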

Customization

You can easily modify the script for your specific needs by changing the variables at the bottom of the file:

pdf_file_path: Change this to the path of your PDF document.
author_to_find: Change this to the name of the author you want to count references for.
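If editing the file for every run becomes tedious, the same two settings could be exposed as command-line options. The sketch below is a possible extension, not something the script is stated to support; the --pdf and --author flags are hypothetical, while the destination names mirror the variables listed above.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two settings that the script otherwise hard-codes."""
    parser = argparse.ArgumentParser(
        description="Count references to an author in a PDF."
    )
    parser.add_argument(
        "--pdf",
        dest="pdf_file_path",
        default="Paper.pdf",  # same default file name as in the Usage section
        help="path to the PDF document",
    )
    parser.add_argument(
        "--author",
        dest="author_to_find",
        required=True,
        help="author name to count references for",
    )
    return parser.parse_args(argv)
```

With this in place, parse_args(["--author", "Smith"]) yields pdf_file_path "Paper.pdf" and author_to_find "Smith", which can be passed to the rest of the script unchanged.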

More Info

Some useful information about prompt engineering and DSPy, with a focus on mapping data.

Repository: PeterLawrence/PDFReferenceCounter

Folders and files

Latest commit

History

Repository files navigation

PDF Reference Counter with DSPy

Core Concepts

Getting StartedPrerequisites

Usage

Customization

More Info

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages