# Nextdoor Holdings, Inc. 2023 10-K Q&A

## Steps to use
1. Click the ▶️ icon at the top left of the code block below to set up the Q&A tool.
2. The next code block contains a sample query. Click its ▶️ button to get the most recent net loss.
3. To run your own query, either replace the text of the sample query **or** create a new code block below it using the same format, `qa.run("YOUR QUERY HERE")`, then click the ▶️ button.
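The setup block needs credentials for OpenAI and Apify. The sketch below is one possible way to supply them without writing them into the notebook, assuming the LangChain classes used here pick up the `OPENAI_API_KEY` and `APIFY_API_TOKEN` environment variables when no key is passed explicitly. The helper name `ensure_env` is hypothetical and not part of any library.

```python
import os
from getpass import getpass


def ensure_env(name, prompt=getpass, env=os.environ):
    """Return env[name]; if it is missing or empty, prompt once and store the value."""
    if not env.get(name):
        env[name] = prompt(f"Enter {name}: ")
    return env[name]


# Usage in the notebook (uncomment to run; prompts only when the variable is unset):
# ensure_env("OPENAI_API_KEY")
# ensure_env("APIFY_API_TOKEN")
```

Because `getpass` hides the typed value, the key does not appear in the cell output or in a saved copy of the notebook.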

## Data provenance
This data comes from the company's filing on the [SEC site](https://www.sec.gov/ix?doc=/Archives/edgar/data/1846069/000184606923000010/kind-20221231.htm).

In [None]:
%pip install langchain
%pip install openai
%pip install chromadb
%pip install apify-client
%pip install tiktoken

from langchain.docstore.document import Document
from langchain.document_loaders import ApifyDatasetLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

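# Load the scraped filing from an Apify dataset; each item becomes one Document
# whose metadata records the source URL.
loader = ApifyDatasetLoader(
    dataset_id="69qelODwlYpWY8zgc",
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)

documents = loader.load()

# Split the documents into chunks of at most 200 characters with a 20-character overlap.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20
)

docs = text_splitter.split_documents(documents)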

# The API key is read from the OPENAI_API_KEY environment variable; never hard-code it in the notebook.
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory="db")

db.persist()

from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI

# Both clients read the API key from the OPENAI_API_KEY environment variable.
llm = ChatOpenAI(model_name='gpt-4', temperature=0.8)
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory="db", embedding_function=embedding)

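# Retrieve 10 chunks per query with maximal marginal relevance (MMR) search,
# then combine them with a map_reduce chain to produce the answer.
qa = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 10}),
    chain_type="map_reduce"
)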

In [None]:
qa.run("what was the most recent net loss?")

'The most recent net loss was $(137.9) million in 2022.'
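To ask several questions in one cell instead of creating a new code block per query, a small wrapper can loop over a list of queries. This is a minimal sketch, not part of the original notebook: `run_queries` is a hypothetical helper, and it only assumes that the object passed in exposes a `run(query)` method, as `qa` does above.

```python
def run_queries(qa_chain, queries):
    """Run each query through qa_chain.run and return a {query: answer} dict."""
    answers = {}
    for query in queries:
        answers[query] = qa_chain.run(query)
    return answers


# Usage in the notebook, once the setup block has defined `qa`:
# for question, answer in run_queries(qa, ["what was the most recent net loss?"]).items():
#     print(question, "->", answer)
```

Each query triggers its own retrieval and model calls, so a long list takes correspondingly longer and costs more.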