In [2]:
from langchain_ollama import OllamaLLM

In [3]:
CONNECTION_URI = "http://localhost:19530"

In [4]:
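# Note: OllamaLLM is a text-generation wrapper, not an embedding function, despite the variable name.
ollama_embedding_func = OllamaLLM(model="moondream")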

In [None]:
import os
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

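# Ollama setup: load a PDF, split it into chunks, embed the chunks, and index them in Chroma.
# "moondream" is a vision-language model; a dedicated embedding model
# (for example "nomic-embed-text") is likely a better fit for retrieval.
def setup_ollama_pdf_vectorstore(file_path, embedding_model="moondream"):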
    try:
        # 1. Load PDF
        loader = PyMuPDFLoader(file_path)
        docs = loader.load()

        # 2. Text Splitting (Optional but recommended)
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents(docs)

        # 3. Ollama Embeddings
        embeddings = OllamaEmbeddings(
            model=embedding_model,  # You can change this to other Ollama models
            num_gpu=16  # Optional: forwarded to Ollama's num_gpu option (GPU offload setting)
        )

        # 4. Create Chroma Vector Store
        vector_store = Chroma.from_documents(
            documents=splits,
            embedding=embeddings,
            collection_name="ollama_pdf_collection"
        )

        return vector_store

    except Exception as e:
        print(f"Error processing PDF: {e}")
        return None

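# Consider only PDF files; sorting makes the choice of the first file deterministic.
files = sorted(f for f in os.listdir("papers") if f.lower().endswith(".pdf"))
print(files)

vector_store = setup_ollama_pdf_vectorstore(os.path.join("papers", files[0]))
# The helper returns None on failure, so report success only when a store exists.
if vector_store is not None:
    print(f"Vector store for {files[0]} created successfully")
print("")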

['2303.05510v1.pdf']
Vector store for 2303.05510v1.pdf created successfully
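
The `chunk_size=1000` and `chunk_overlap=200` settings used above can be illustrated with a simplified fixed-width splitter. This is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph and line boundaries, and the helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-width chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample text
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still fully present in at least one chunk.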



In [8]:
# Perform similarity search and collect the retrieved excerpts
contexts = ""
query = input("Enter your query: ")

if vector_store:
    results = vector_store.similarity_search(query, k=3)

    for doc in results:
        print("\n--- Relevant Document Excerpt ---")
        print(doc.page_content)
        # Separate excerpts with a blank line so chunks do not run together
        contexts = contexts + doc.page_content + "\n\n"


--- Relevant Document Excerpt ---
function. To evaluate the quality of the automatically-generated test cases, we compute the strict
accuracies of the sample solutions (correct solutions written by human programmers) on these test
cases. Clearly, the strict accuracies of the sample solutions on the ground-truth test cases should
be 100%. We evaluate the sample solutions on the automatically-generated test cases and find the
corresponding strict accuracy is 72.56%, which confirms that the automatically-generated test cases
are mostly correct.
Can PG-TD take advantage of the automatically-generated test cases? We report the average
strict accuracy of the generated programs on a subset of HumanEval problems in Table 11. We
confirm that the strict accuracy of PG-TD is higher as it uses high-quality automatically-generated
test cases to verify the generated programs.
C
PLANNING FOR OTHER CODE GENERATION OBJECTIVES
Besides the default reward function that optimizes the pass rate, we show the v
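
Conceptually, `similarity_search(query, k=3)` embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The sketch below shows that ranking step with cosine similarity over hand-made toy vectors. The function names and vectors are hypothetical, and Chroma's actual distance metric and index depend on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the texts of the k entries most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Toy "vector store": (chunk text, embedding) pairs with made-up 3-d embeddings
toy_index = [
    ("chunk about test cases", [1.0, 0.0, 0.0]),
    ("chunk about reward functions", [0.0, 1.0, 0.0]),
    ("chunk about planning", [0.7, 0.7, 0.0]),
    ("unrelated chunk", [0.0, 0.0, 1.0]),
]
print(top_k([0.9, 0.1, 0.0], toy_index, k=2))
```

A real vector store replaces the linear scan with an approximate nearest-neighbor index, but the contract is the same: vectors in, the closest `k` documents out.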

In [9]:
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize Ollama Chat Model
chat_model = ChatOllama(
    model="llama3.2:1b",  # You can change to other models like mistral, phi3
    temperature=0.7,  # Sampling temperature: higher values give more varied output
)

# Simple Chat Interaction
messages = [
    SystemMessage(content="You are a helpful assistant for my research."),
    HumanMessage(content=f"Answer the query using the context.\n\nQuery: {query}\n\nContext:\n{contexts}")
]

# Generate Response
response = chat_model.invoke(messages)
print("--- Ollama Response ---")
print(response.content)

--- Ollama Response ---
Based on the provided text, here are three potential research ideas for this problem:

1. **Investigating the Effectiveness of PG-TD-generated Test Cases in Code Generation**: Building upon the success of using automatically-generated test cases to evaluate code generation algorithms like PG-TD, this research could explore whether these test cases can be effectively used as input to fine-tune transformer models for improved performance. This might involve analyzing the impact of different types of test cases (e.g., error-free vs. noisy) on the generated solutions and their ability to improve the model's performance.

2. **Analyzing the Impact of Automatically-Generated Test Cases on Code Generation Optimization Algorithms**: Given that PG-TD generates high-quality test cases, this research could investigate how these test cases affect optimization algorithms for code generation tasks. For example, it might examine whether using automatically-generated test cases