# Introduction
In this tutorial, we explore how large language models (LLMs), particularly those developed by OpenAI, can help automate accounting workflows. With these models, accountants can extract and summarize information from large, unstructured documents such as SEC filings without reading them line by line. Let's dive in.

# Environment Setup
## Setting up the Conda Environment
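This command activates a conda environment named `llm`. Conda environments let you manage multiple versions of software packages and their dependencies side by side. Using a dedicated environment pins the required libraries to specific versions, which makes the code reproducible. Note that a running notebook kernel is already bound to one environment, so in practice the environment usually needs to be activated in the shell before launching Jupyter.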

In [None]:
%conda activate llm

## Installing Necessary Libraries

### Installation of Libraries

Here, we are installing several Python libraries:
- `pypdf`: Helps in working with PDF files.
- `chromadb`: A vector store for storing and searching document embeddings.
- `langchain`: A framework for composing LLM calls, document loaders, and retrievers into chains.
- `unstructured`: Parses unstructured documents such as HTML and PDF files into plain text.
- `openai`: The official library to interface with OpenAI's models.
- `tiktoken`: Helps in counting tokens in a string without making an API call.
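
Before or after running the install cell, it can help to confirm which of these packages are present in the active environment and at which version. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of any of the libraries listed above.

```python
from importlib import metadata

# Package names taken from the install list above.
PACKAGES = ["pypdf", "chromadb", "langchain", "unstructured", "openai", "tiktoken"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Recording the printed versions alongside the notebook makes it easier to recreate the same environment later.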


In [None]:
%pip install pypdf chromadb langchain unstructured openai tiktoken

## API Key Configuration

### Setting OpenAI API Key

In this block, we set up our OpenAI API key. This key is essential for authentication when making requests to the OpenAI service.


In [None]:
import os
from getpass import getpass
# Prompt for the key rather than hardcoding a secret in the notebook.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
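
A key written directly into a notebook can leak when the file is shared or committed. As a minimal sketch of a fail-fast alternative, assuming the key was exported in the shell before launching Jupyter, the helpers below read it from the environment and mask it for logging. `require_api_key` and `mask` are hypothetical names introduced here for illustration.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook.")
    return key


def mask(key, visible=4):
    """Hide all but the last `visible` characters so the key is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

In the notebook, `print(mask(require_api_key()))` confirms that a key is present without revealing it in the cell output.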

## Generating Custom Content

### Using the Model for Custom Content Generation

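This block demonstrates how versatile LLMs can be. Not only can they process and analyze documents, but they can also generate creative content. The first cell imports the LangChain components used throughout the tutorial. We then create the model with a temperature of 0.9, which makes the output more varied, and ask it to write a rap about forensic accounting in the style of Eminem.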


In [None]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import UnstructuredHTMLLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [None]:
llm = OpenAI(temperature=0.9)

In [None]:
print(llm("Write a rap about forensic accounting in the style of Eminem."))

## Loading and Processing Documents

### Extracting Information from an HTML Document

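In this block, we load an HTML document named `extract.html`. The `UnstructuredHTMLLoader` reads the document and extracts its text. The `CharacterTextSplitter` then breaks the text into chunks with a target size of 1000 characters; because it splits on a separator (a blank line by default), a chunk can exceed that size when a passage contains no separator. The 500-character overlap repeats the end of one chunk at the start of the next, so information that straddles a boundary is not lost. The chunks are then embedded with `OpenAIEmbeddings`, stored in a `Chroma` vector store, and wrapped in a `RetrievalQA` chain that answers questions from the retrieved chunks.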
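
To make the chunk size and overlap concrete, here is a simplified sketch of fixed-window chunking in plain Python. It is a stand-in for illustration only: `CharacterTextSplitter` splits on a separator and merges the pieces, so its chunk boundaries will differ. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=500):
    """Cut `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small sizes make the overlap easy to see.
for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
```

With a 50% overlap, as in this tutorial, every character away from the edges appears in two chunks, which roughly doubles the number of embeddings to compute and store.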


In [None]:
loader = UnstructuredHTMLLoader("extract.html")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=500)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever())

## Querying the Document

### Using RetrievalQA to Extract Relevant Information

The `RetrievalQA` chain answers questions from the loaded document. For each question, the retriever finds the chunks most similar to the query, and the `stuff` chain type places them all into a single prompt for the model. This lets us extract specific pieces of information, such as summaries, financial metrics, and other data points of interest to accountants. In the examples below, we ask the model for net revenue, net income, issuer purchases of equity securities, and the revenue increase for a specific year.
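
The retrieve-then-answer flow can be sketched without any API calls. The toy example below ranks chunks by word-overlap cosine similarity, which is a crude stand-in for the OpenAI embeddings and Chroma search used in this tutorial, and then builds one prompt from the top results in the spirit of the `stuff` chain type. All function names, the prompt wording, and the sample sentences are assumptions made up for illustration; they are not LangChain's internals or text from any filing.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens; a toy replacement for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_stuff_prompt(query, chunks, k=2):
    """Place the top-k chunks into a single prompt, followed by the question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Made-up sample chunks, not taken from a real document.
sample_chunks = [
    "Net revenue for the six months was higher than the prior year.",
    "The company repurchased shares of common stock.",
    "Legal proceedings are described in the notes.",
]
print(build_stuff_prompt("six months net revenue", sample_chunks, k=1))
```

Because only the retrieved chunks reach the model, a question whose answer sits in a chunk that is not retrieved will be answered poorly, which is one reason to check extracted figures against the source document.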


In [None]:
query = """Summarize key information from this document that might be relevant for an accountant."""
print(qa.run(query))

In [None]:
query = """List the six months net revenue by year from the consolidated statement of operations table"""
print(qa.run(query))

In [None]:
query = """List the six months net income by year from the consolidated statement of operations table"""
print(qa.run(query))

In [None]:
query = """What were the issuer purchases of equity securities? Make a bulleted list."""
print(qa.run(query))

In [None]:
query = """How much did revenue increase in 2023?"""
print(qa.run(query))
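
The five cells above repeat the same pattern, so they can also be run as a batch. The sketch below assumes a callable that takes a query string and returns an answer; in this notebook that would be `qa.run`. Here a stub is used so the example runs without an API key. `run_queries` and `echo_stub` are hypothetical names.

```python
# The queries used in this tutorial.
QUERIES = [
    "Summarize key information from this document that might be relevant for an accountant.",
    "List the six months net revenue by year from the consolidated statement of operations table",
    "List the six months net income by year from the consolidated statement of operations table",
    "What were the issuer purchases of equity securities? Make a bulleted list.",
    "How much did revenue increase in 2023?",
]


def run_queries(ask, queries):
    """Call `ask` on each query and return (query, answer) pairs in order."""
    return [(query, ask(query)) for query in queries]


def echo_stub(query):
    # Stand-in for qa.run so the sketch runs offline; it does not call a model.
    return f"(answer to: {query})"


for query, answer in run_queries(echo_stub, QUERIES):
    print(f"Q: {query}\nA: {answer}\n")
```

Replacing `echo_stub` with `qa.run` sends each query through the retrieval chain and keeps the questions and answers paired for later review.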

You should now have a clearer understanding of how large language models can streamline accounting workflows by automating information retrieval from complex documents.