# Document Chat

In [1]:
from langchain.document_loaders import TextLoader

# Load the whole file as a single Document; metadata["source"] records the path.
loader = TextLoader("./resources/commit.txt")
docs = loader.load()
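Conceptually, `TextLoader` reads the entire file into one `Document` whose `metadata` stores the path under `source`, which matches the output of `In [3]`. The following is a minimal stand-in, assuming a plain UTF-8 file. `SimpleDocument` and `load_text` are hypothetical names, not LangChain's own classes:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> List[SimpleDocument]:
    """Read a whole text file into a one-element list, mirroring TextLoader.load()."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```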

In [2]:
from langchain.text_splitter import CharacterTextSplitter

# chunk_size and chunk_overlap are measured in tiktoken tokens, not characters.
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_documents(docs)

Created a chunk of size 238, which is longer than the specified 100
Created a chunk of size 156, which is longer than the specified 100
Created a chunk of size 145, which is longer than the specified 100
Created a chunk of size 102, which is longer than the specified 100
Created a chunk of size 112, which is longer than the specified 100
Created a chunk of size 105, which is longer than the specified 100
Created a chunk of size 104, which is longer than the specified 100
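These warnings are expected rather than errors. `CharacterTextSplitter` first splits on a single separator (`"\n\n"` by default) and then greedily merges the pieces into chunks of up to `chunk_size` tokens, so any paragraph that is already longer than 100 tokens is kept whole. `RecursiveCharacterTextSplitter`, which falls back to finer separators, is the usual remedy. The simplified sketch below reproduces the behavior. It assumes no overlap and counts whitespace-separated words as a stand-in for tiktoken tokens; `split_and_merge` is a hypothetical name:

```python
import warnings
from typing import Callable, List


def split_and_merge(
    text: str,
    chunk_size: int,
    separator: str = "\n\n",
    length: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedy separator-based chunking (simplified sketch, no overlap).

    `length` approximates the token count by counting words; the real
    splitter created with from_tiktoken_encoder counts tiktoken tokens.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        if length(piece) > chunk_size:
            # An oversized piece cannot be split further at this separator,
            # so flush what we have, warn, and emit the piece as its own chunk.
            if current:
                chunks.append(separator.join(current))
                current = []
            warnings.warn(
                f"Created a chunk of size {length(piece)}, "
                f"which is longer than the specified {chunk_size}"
            )
            chunks.append(piece)
            continue
        candidate = current + [piece]
        if current and length(separator.join(candidate)) > chunk_size:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current = candidate
    if current:
        chunks.append(separator.join(current))
    return chunks
```

For example, `split_and_merge("a b c\n\nd e\n\nf g h i j k\n\nl", chunk_size=5)` merges the first two paragraphs, keeps the six-word paragraph whole with one warning, and leaves `"l"` on its own.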


In [3]:
texts

[Document(page_content='Introduction: Why good commit messages matter\nIf you browse the log of any random Git repository, you will probably find its commit messages are more or less a mess. For example, take a look at these gems from my early days committing to Spring:\n\n$ git log --oneline -5 --author cbeams --before "Fri Mar 26 2009"', metadata={'source': './resources/commit.txt'}),
 Document(page_content='e5f4b49 Re-adding ConfigurationPostProcessorTests after its brief removal in r814. @Ignore-ing the testCglibClassesAreLoadedJustInTimeForEnhancement() method as it turns out this was one of the culprits in the recent build breakage. The classloader hacking causes subtle downstream effects, breaking unrelated tests. The test method is still useful, but should only be run on a manual basis to ensure CGLIB is not prematurely classloaded, and should not be run as part of the automated build.\n2db0f12 fixed two build-breaking issues: + reverted ClassMetadataReadingVisitor to revision 

In [4]:
from langchain.embeddings import SentenceTransformerEmbeddings

# A sentence-transformers model trained for question/passage retrieval with dot-product scoring.
embedding = SentenceTransformerEmbeddings(
    model_name="multi-qa-mpnet-base-dot-v1",
)



In [5]:
from langchain.vectorstores import Chroma

# Embed every chunk and index the resulting vectors in a Chroma collection.
vectorstore = Chroma.from_documents(documents=texts, embedding=embedding)

In [10]:
my_prompt = "Best practices?"

In [11]:
# Return the 5 chunks whose embeddings are closest to the query's embedding.
vectorstore.similarity_search(my_prompt, k=5)

[Document(page_content='In this post, I am addressing just the most basic element of keeping a healthy commit history: how to write an individual commit message. There are other important practices like commit squashing that I am not addressing here. Perhaps I’ll do that in a subsequent post.', metadata={'source': './resources/commit.txt'}),
 Document(page_content='Tips\nLearn to love the command line. Leave the IDE behind.\nFor as many reasons as there are Git subcommands, it’s wise to embrace the command line. Git is insanely powerful; IDEs are too, but each in different ways. I use an IDE every day (IntelliJ IDEA) and have used others extensively (Eclipse), but I have never seen IDE integration for Git that could begin to match the ease and power of the command line (once you know it).', metadata={'source': './resources/commit.txt'}),
 Document(page_content='7. Use the body to explain what and why vs. how\nThis commit from Bitcoin Core is a great example of explaining what changed a
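Under the hood, the vector store compares the query's embedding with each chunk's embedding and returns the `k` best matches. Below is a minimal sketch of that ranking. It assumes dot-product scoring, which is what the `-dot-` model was trained for, and uses hand-made toy vectors. In the notebook, the real vectors would come from `embedding.embed_documents(...)` and `embedding.embed_query(...)`. Chroma's own default distance metric may differ, so treat this as an illustration of the idea, not Chroma's implementation. `dot` and `top_k` are hypothetical helper names:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[T],
    k: int = 5,
) -> List[T]:
    """Return the k documents whose vectors score highest against the query."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: dot(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]


# Toy 2-dimensional vectors standing in for the model's 768-dimensional embeddings.
chunks = ["subject line", "command line", "squashing"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
query_vec = [1.0, 0.2]
print(top_k(query_vec, chunk_vecs, chunks, k=2))  # ['subject line', 'squashing']
```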

In [12]:
import os
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Read the API key from the environment rather than hard-coding it in the notebook.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
qa_chain({"query": my_prompt})

{'query': 'Best practices?',
 'result': 'Here are some best practices for writing commit messages:\n\n1. Keep it concise: Try to keep your commit messages short and to the point. A good rule of thumb is to aim for around 50 characters or less for the subject line.\n\n2. Use the imperative mood: Start the subject line with a verb in the imperative mood, such as "Fix", "Add", "Update", etc. This helps to clearly communicate what the commit does.\n\n3. Separate subject and body: If your commit message requires more explanation, use the body section to provide additional details. This can include why the change was made, any relevant context, or any potential side effects.\n\n4. Use proper grammar and punctuation: While commit messages don\'t need to be perfect prose, it\'s still important to use proper grammar and punctuation. This helps to maintain readability and professionalism.\n\n5. Reference relevant issues or tickets: If your commit is related to a specific issue or ticket, include
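`RetrievalQA.from_chain_type` uses the `"stuff"` chain type unless told otherwise. The retrieved chunks are concatenated into a single prompt along with the question, and the model answers from that context. The following is a rough sketch of the prompt assembly. `build_stuff_prompt` is a hypothetical helper, and the instruction wording is illustrative, not LangChain's exact template:

```python
from typing import Sequence


def build_stuff_prompt(question: str, contexts: Sequence[str], separator: str = "\n\n") -> str:
    """Concatenate ("stuff") all retrieved chunks into a single prompt string.

    The instruction text is illustrative, not LangChain's exact template.
    """
    context = separator.join(contexts)
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt("Best practices?", ["chunk one", "chunk two"])
print(prompt)
```

Every retrieved chunk goes into one prompt. As a result, the chunk size and the number of retrieved chunks together determine how much of the model's context window the retrieval step uses.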