# Git LLM Training Prototype

A retrieval-based QA prototype for personalized Git training using LlamaIndex and Library Carpentry content.


## Introduction

This notebook demonstrates a prototype for a personalized, LLM-assisted Git training system. 
It uses the [LlamaIndex](https://www.llamaindex.ai/) framework and a small dataset of questions and answers based on the [Library Carpentry Git Lesson](https://librarycarpentry.github.io/lc-git/instructor/aio.html).
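
The loading code in this notebook reads a `prompt` and a `response` field from every entry, so the data file is assumed to be a JSON list of such objects. The following is a minimal sketch of a sanity check for that layout; the function name `validate_qa_pairs` and the sample entry are hypothetical and not part of the original dataset.

```python
import json


def validate_qa_pairs(raw_json: str) -> list:
    """Parse a JSON string and check that each entry has non-empty
    'prompt' and 'response' strings (the two keys the notebook reads)."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of Q&A objects")
    for i, item in enumerate(data):
        for key in ("prompt", "response"):
            value = item.get(key) if isinstance(item, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {i} is missing a non-empty '{key}'")
    return data


# Hypothetical sample entry, for illustration only
sample = json.dumps([
    {
        "prompt": "What does 'git status' do?",
        "response": "It shows the state of the working tree and the staging area.",
    }
])
pairs = validate_qa_pairs(sample)
print(len(pairs), "valid Q&A pair(s)")
```

Running a check like this before indexing would surface a malformed entry early, rather than as a `KeyError` while the documents are being built.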


In [1]:
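# Install if needed (the HuggingFace embedding and Ollama connectors are separate packages)
# !pip install llama-index llama-index-embeddings-huggingface llama-index-llms-ollama sentence-transformers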

In [2]:
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import json

# Load training data: a JSON list of objects with "prompt" and "response" keys
with open("git_llm_training_data.json", "r", encoding="utf-8") as f:
    qa_pairs = json.load(f)

# Convert each Q&A pair into one document
documents = [
    Document(text=f"Q: {item['prompt']}\nA: {item['response']}")
    for item in qa_pairs
]

# Set up embedding model (globally via Settings)
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
Settings.embed_model = embed_model
Settings.llm = None  # No real LLM: LlamaIndex falls back to MockLLM, which echoes the prompt

# Create index and query engine
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Test query
response = query_engine.query("Why do I need to use git add before committing?")
print("Answer:", response.response.strip())

# Interactive query loop (stop with 'exit')
#while True:
#    user_input = input("Ask a Git question (or type 'exit'): ")
#    if user_input.lower().strip() == "exit":
#        break
#    response = query_engine.query(user_input)
#    print("Answer:", response.response.strip())

LLM is explicitly disabled. Using MockLLM.
Answer: Context information is below.
---------------------
Q: Why do I need to use 'git add' before 'git commit'?
A: 'git add' tells Git which changes you want to include in the next commit. This allows you to control what goes into the project history.

Q: What is Git and why is it useful in research?
A: Git is a version control system that helps track changes in files. In research, it ensures reproducibility, documents development history, and facilitates collaboration.
---------------------
Given the context information and not prior knowledge, answer the query.
Query: Why do I need to use git add before committing?
Answer:


## Part 2: LLM-Generated Answers using Phi via Ollama

This section demonstrates how to integrate a local open-source LLM (Phi-2) into the prototype using [Ollama](https://ollama.com/).
The LLM generates context-aware answers based on the same retrieval mechanism used before.

Unlike the retrieval-only prototype, which only echoes the retrieved Q&A pairs inside the prompt template, the model now generates a full answer from that context.
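
The prompt printed by the retrieval-only run shows how the retrieved Q&A pairs reach the model: they are placed between two separator lines, followed by the instruction and the query. The sketch below reproduces that layout in plain Python to make the mechanism concrete. `build_prompt` is a hypothetical helper that mirrors the printed output; it is not LlamaIndex's actual implementation.

```python
SEPARATOR = "-" * 21  # same separator line as in the printed prompt


def build_prompt(context_chunks, query: str) -> str:
    """Assemble a question-answering prompt from retrieved text chunks."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Context information is below.\n"
        f"{SEPARATOR}\n"
        f"{context}\n"
        f"{SEPARATOR}\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {query}\n"
        "Answer: "
    )


# Retrieved chunk copied from the retrieval-only output
chunks = [
    "Q: Why do I need to use 'git add' before 'git commit'?\n"
    "A: 'git add' tells Git which changes you want to include in the next commit. "
    "This allows you to control what goes into the project history.",
]
prompt = build_prompt(chunks, "Why do I need to use git add before committing?")
print(prompt)
```

Because the model is told to rely on the context rather than prior knowledge, the quality of the generated answer depends on which Q&A pairs the retriever returns.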



In [None]:
# Import LLM connector for Ollama
from llama_index.llms.ollama import Ollama

# Switch to LLM mode: use phi via Ollama
# (requires a running Ollama server with the model pulled, e.g. `ollama pull phi`)
Settings.llm = Ollama(model="phi")

# Re-create the query engine so it picks up the new LLM
query_engine = index.as_query_engine()

# Test: LLM-generated response
response = query_engine.query("Why do I need to use git add before committing?")
print("LLM-generated answer:", response.response.strip())

# Interactive loop using phi (stop with 'exit')
while True:
    user_input = input("Ask a Git question (or type 'exit'): ")
    if user_input.lower().strip() == "exit":
        break
    response = query_engine.query(user_input)
    print("LLM-generated answer:", response.response.strip())


LLM-generated answer: Git add tells Git which changes you want to include in the next commit. This allows you to control what goes into the project history. Without using 'git add', your commits may contain outdated or unwanted information, leading to confusion and errors when reviewing previous versions of your work. Additionally, using 'git add' helps ensure that all relevant files are included in your commits, making it easier for others to understand and maintain your codebase.


Ask a Git question (or type 'exit'):  What does git log do?


LLM-generated answer: Git log is a command that displays a history of all the changes made to the repository over time. It shows a chronological record of every commit, including the author's name, timestamp, message, and the files involved in the commit. This information helps researchers to understand how the code has evolved and to identify which parts may need further testing or refinement. Additionally, git log can be used to revert changes if necessary or roll back to previous versions if a problem arises.
