# LangChain - QA with code

## Install dependencies

In [1]:
!pip install "langchain[llms]>=0.0.218"
!pip install chromadb
!pip install sentence_transformers
!pip install tiktoken
!pip install GitPython
!pip install python-dotenv

## Load environment variables

Change the path below if necessary. Read `README.md` first and follow its instructions to set up the `.env` file:

In [2]:
import os
os.chdir("/home/jovyan/work/")

In [3]:
%load_ext dotenv
%dotenv
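The `%dotenv` magic only works inside IPython/Jupyter. In a plain script, python-dotenv's `load_dotenv()` does the same job. The sketch below shows roughly what loading a `.env` file involves, using only the standard library.

It makes these assumptions:

- The file holds simple `KEY=VALUE` lines: no `export`, no multi-line values, no inline comments, no variable expansion.
- The file defines `OPENAI_API_KEY`, which the OpenAI embeddings and chat model below read.
- The helper name `load_env_file` is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Hypothetical minimal .env loader: parse KEY=VALUE lines into os.environ.

    Blank lines, '#' comments and lines without '=' are skipped.
    Existing environment variables are kept unless override=True.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, e.g. KEY="abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

For example, call `load_env_file()`. Then check `"OPENAI_API_KEY" in os.environ` before building the index, so a missing key fails early rather than at the first embeddings call.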

## Load a repository

In [4]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language
from git import Repo

In [5]:
repo = Repo.clone_from(
    "https://github.com/hwchase17/langchain", to_path="/tmp/test_repo"
)
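The loader in the next cell walks the cloned tree with `glob="**/*"` and keeps only files whose suffix is listed in `suffixes`. Before paying for embeddings, it can help to check which files would be indexed. The sketch below is a rough standard-library approximation of that selection, not `GenericLoader`'s actual implementation; the helper name `list_source_files` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def list_source_files(
    root: str, pattern: str = "**/*", suffixes: Iterable[str] = (".py",)
) -> List[Path]:
    """Hypothetical helper: files under `root` matching `pattern` with an allowed suffix."""
    allowed = set(suffixes)
    return sorted(
        p for p in Path(root).glob(pattern)
        if p.is_file() and p.suffix in allowed
    )
```

For example, `len(list_source_files("/tmp/test_repo/langchain"))` counts the candidate files. The number of loaded documents can be higher, because large files are split into several documents.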

In [6]:
loader = GenericLoader.from_filesystem(
    "/tmp/test_repo/langchain",
    glob="**/*",
    suffixes=[".py"],
    parser=LanguageParser(language=Language.PYTHON, parser_threshold=500)
)
documents = loader.load()
len(documents)

1071
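With `parser_threshold=500`, `LanguageParser` treats files by size:

- Short files, up to roughly that many lines, stay as single documents.
- Longer files are split into their top-level functions and classes, plus one document holding the remaining code with those definitions replaced by placeholders.

The sketch below approximates that idea with Python's `ast` module. It is a simplified illustration, not LangChain's segmenter. The name `split_python_source` and the exact `# Code for:` placeholder format are assumptions.

```python
import ast
from typing import List


def split_python_source(code: str, threshold: int = 500) -> List[str]:
    """Hypothetical sketch: split Python source into top-level defs plus the rest.

    Files with at most `threshold` lines are returned unchanged as one chunk.
    """
    lines = code.splitlines()
    if len(lines) <= threshold:
        return [code]

    tree = ast.parse(code)
    spans = []  # (start, end, header) as 0-based, end-exclusive line ranges
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorator lines so each chunk is self-contained
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            spans.append((start, node.end_lineno, lines[node.lineno - 1].strip()))

    # One chunk per top-level function or class
    chunks = ["\n".join(lines[s:e]) for s, e, _ in spans]

    # Remaining code, with each definition replaced by a one-line placeholder;
    # replace bottom-up so earlier line indices stay valid
    simplified = list(lines)
    for s, e, header in reversed(spans):
        simplified[s:e] = [f"# Code for: {header}"]
    chunks.append("\n".join(simplified))
    return chunks
```

Keeping a "simplified" remainder means module-level code (imports, constants) is still retrievable. Meanwhile, each function or class becomes its own focused document.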

## Index

In [7]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

In [8]:
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=2000, chunk_overlap=200
)
texts = python_splitter.split_documents(documents)
len(texts)

2934
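`RecursiveCharacterTextSplitter.from_language` tries a language-specific list of separators in order: class and function boundaries first, then blank lines, newlines and spaces. It falls back to finer separators only for pieces that are still longer than `chunk_size`.

The sketch below illustrates that recursive strategy, with these simplifications:

- It omits `chunk_overlap`.
- It measures length in characters.
- Its `PYTHON_SEPARATORS` list is an assumption in the spirit of LangChain's Python separators, not a copy of them.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters (no overlap).

    Tries separators in order; pieces still too long are split again with the
    remaining, finer separators. Concatenating the chunks gives back the input.
    """
    if len(text) <= chunk_size:
        return [text]

    # Pick the first separator that occurs in the text ("" means: none left)
    sep, finer = "", []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    if sep == "":
        # Fall back to fixed-size character windows
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    parts = text.split(sep)
    # Keep each separator at the start of the piece that follows it
    pieces = [parts[0]] + [sep + p for p in parts[1:]]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too big on its own: flush the current chunk, then recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, finer, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            # Greedily merge small pieces up to chunk_size
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Assumed separator list: class/def boundaries first, then progressively finer splits
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", ""]
```

LangChain's splitter additionally lets consecutive chunks share roughly `chunk_overlap` characters (200 here). As a result, code near a chunk boundary appears in both neighbours.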

In [9]:
embeddings = OpenAIEmbeddings(disallowed_special=())

db = Chroma.from_documents(texts, embeddings)
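Conceptually, `Chroma.from_documents` stores one embedding vector per chunk. A `"similarity"` search then returns the `k` chunks whose vectors are closest to the query's vector.

The sketch below shows that ranking with cosine similarity over plain Python lists. It assumes toy vectors stand in for OpenAI embeddings. Chroma's actual distance metric and index structure are not modeled.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 8) -> List[int]:
    """Indices of the k stored vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True
    )
    return ranked[:k]
```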

In [10]:
retriever = db.as_retriever(
    search_type="mmr",  # You can also experiment with "similarity"
    search_kwargs={"k": 8},
)
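With `search_type="mmr"`, the retriever uses maximal marginal relevance. It first fetches a larger candidate set by similarity. It then picks documents one at a time, trading off similarity to the query against similarity to the documents already picked. This tends to reduce near-duplicate chunks, which are common when many files share boilerplate.

The sketch below is a plain-Python illustration of that selection rule, not LangChain's implementation. `lambda_mult` mirrors the name of the parameter LangChain's MMR search accepts. Treat the exact defaults as something to check in your installed version.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 8,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedy maximal marginal relevance; returns candidate indices in pick order.

    Each step picks the candidate maximizing
        lambda_mult * sim(query, d) - (1 - lambda_mult) * max_{s in selected} sim(d, s)
    so lambda_mult=1.0 is pure relevance and smaller values favor diversity.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a near-duplicate pair in the candidates, `lambda_mult=0.5` skips the duplicate in favour of a less similar but more diverse chunk. With `lambda_mult=1.0` it behaves like plain similarity ranking.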

## Set up the QA chain

In [11]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

In [12]:
llm = ChatOpenAI(temperature=0)

In [13]:
qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)
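`ConversationalRetrievalChain` works in two steps:

1. When there is chat history, it asks the LLM to rewrite the follow-up into a standalone question.
2. It retrieves documents for that question and asks the LLM to answer from them.

The sketch below shows that control flow with plain callables standing in for the model and the retriever. The prompt wording, the `Human:`/`Assistant:` history format, and all function names are assumptions, not LangChain's actual prompts.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def format_history(chat_history: History) -> str:
    """Render (question, answer) pairs as a plain transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def conversational_qa(
    question: str,
    chat_history: History,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Hypothetical sketch of condense-question-then-retrieve QA."""
    standalone = question
    if chat_history:
        # Step 1: turn a follow-up like "and for JavaScript?" into a self-contained question
        standalone = llm(
            "Given the conversation below, rephrase the follow-up question "
            "as a standalone question.\n\n"
            f"{format_history(chat_history)}\nFollow-up: {question}\nStandalone question:"
        )
    # Step 2: retrieve context for the standalone question and answer from it
    context = "\n\n".join(retrieve(standalone))
    return llm(f"Use the context to answer.\n\n{context}\n\nQuestion: {standalone}\nAnswer:")
```

Retrieving with the rewritten question, rather than the raw follow-up, is what lets short follow-ups still find relevant code.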

## Tests

In [14]:
chat_history = []
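Each test cell follows the same pattern:

1. Call the chain with the question and the current history.
2. Append the `(question, answer)` pair so that follow-up questions have context.

A small helper can keep that pattern in one place. It is hypothetical and not part of LangChain. It assumes only what the cells here rely on: `qa` accepts a dict with `question` and `chat_history` keys and returns a dict with an `answer` key.

```python
from typing import Any, Callable, Dict, List, Tuple


def ask(
    qa: Callable[[Dict[str, Any]], Dict[str, Any]],
    question: str,
    chat_history: List[Tuple[str, str]],
) -> str:
    """Query the chain, record the exchange in chat_history, and return the answer."""
    result = qa({"question": question, "chat_history": chat_history})
    answer = result["answer"]
    chat_history.append((question, answer))
    return answer
```

A usage example: `print(ask(qa, "What is LanguageParser used for?", chat_history))`.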

In [15]:
question = "How can I load source code as documents for QA over code, splitting the code into classes and functions?"
result = qa({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result["answer"])

To load a source code as documents for a QA over code, splitting the code into classes and functions, you can use the `LanguageParser` class from the `langchain.text_splitter` module. Here's an example:

```python
from langchain.text_splitter import Language
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./code",
    glob="**/*",
    suffixes=[".py", ".js"],
    parser=LanguageParser()
)
docs = loader.load()
```

In this example, the `GenericLoader` is used to load the source code files from the specified directory (`"./code"`) with the specified file extensions (`.py` and `.js`). The `LanguageParser` is used as the parser to split the code into separate documents based on classes and functions. The resulting documents are stored in the `docs` variable.

You can also specify the language explicitly by passing the `language` parameter to the `LanguageParser` construc