In [1]:
import os
from dotenv import load_dotenv
import openai

_ = load_dotenv(override=True)

In [2]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPEN_AI_KEY")

# A - Step By Step

1. First, create a document loader instance for the CSV file.

In [15]:
from langchain.document_loaders import CSVLoader
file_path = "../data/Marvel_Comics.csv"
loader = CSVLoader(
    file_path=file_path,
    encoding="utf8"
)

2. Then load the documents from the loader.

In [16]:
docs = loader.load()

In [17]:
print(len(docs))
docs[0]

101


Document(page_content=": 0\ncomic_name: A Year of Marvels: April Infinite Comic (2016)\nissue_description: The Infinite Comic that will have everyone talking! Full of fun, heart, and pranks, this is one folks'll be talking about for years!", metadata={'source': '../data/Marvel_Comics.csv', 'row': 0})
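
To make the row-to-document mapping concrete, here is a minimal standard-library sketch that mimics the shape of the output above: one document per CSV row, `column: value` lines as content, and the source path plus row index as metadata. The `rows_to_docs` helper and the sample data are hypothetical and only for illustration; this is not `CSVLoader`'s actual implementation.

```python
import csv
import io


def rows_to_docs(csv_text, source):
    """Turn each CSV row into a dict shaped like a loaded document."""
    docs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines.
        content = "\n".join(f"{name}: {value}" for name, value in row.items())
        docs.append({
            "page_content": content,
            "metadata": {"source": source, "row": row_index},
        })
    return docs


# Made-up sample data. The first column has an empty header,
# which is why the content starts with ": 0".
sample_csv = (
    ",comic_name,issue_description\n"
    "0,Sample Comic (2016),A sample description.\n"
    "1,Another Comic (2003),Another description.\n"
)

sample_docs = rows_to_docs(sample_csv, source="sample.csv")
print(len(sample_docs))
print(sample_docs[0]["page_content"])
```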

3. The idea is to create an embedding for each one of them. An embedding is a numerical vector representation that captures the semantic meaning of the text.

![Embeddings](../assets/embeddings.jpg "Embeddings")


In [18]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embed = embeddings.embed_query("hi, ich bin Cristhian")

print(len(embed))
embed[:5]

1536


[-0.000682526733726263,
 0.008164625614881516,
 -0.011010359972715378,
 -0.020723117515444756,
 -0.01268696691840887]
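
Embeddings are useful because texts with similar meaning end up with vectors that point in similar directions. A common way to measure this is cosine similarity. The sketch below uses hand-made 3-dimensional vectors as an assumption for illustration; they are not real model outputs, which have 1536 dimensions as shown above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example.
greeting_en = [0.9, 0.1, 0.0]
greeting_de = [0.8, 0.2, 0.1]
comic_text = [0.0, 0.2, 0.9]

similar = cosine_similarity(greeting_en, greeting_de)
different = cosine_similarity(greeting_en, comic_text)

# The two greetings are closer to each other than to the comic text.
print(similar > different)
```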

4. An embedding is computed for every chunk of the text. Since this is a CSV file, each row is a chunk. The embedded chunks are stored in a vector database, so here we create that vector database to hold the embedding of every document:

![verctorDatabase](../assets/verctorDatabase.jpg "verctorDatabase")




In [19]:
from langchain.vectorstores import DocArrayInMemorySearch
vectorDB = DocArrayInMemorySearch.from_documents(
    documents=docs,
    embedding=embeddings
)

5. Now we can query the new documents. The input query is embedded, and that embedding is compared with the ones stored in the vector database.

![index](../assets/index.jpg "index")

In [20]:
query = "Please suggest Spider-Man comics"
result_docs = vectorDB.similarity_search(query)

print(len(result_docs))
result_docs[0]

4


Document(page_content=": 85\ncomic_name: Actor Presents Spider-Man and the Incredible Hulk (2003)\nissue_description: Marvel Comics and A.C.T.O.R. team up to bring you two stories about Marvel's most popular characters in this special one-shot benefit book for the holiday season. Proceeds from this book benefit A.C.T.O.R., the charitable organization dedicated to providing financial assistance to disadvantaged comic book creators. 32 PGS./MARVEL PSR...$2.50", metadata={'source': '../data/Marvel_Comics.csv', 'row': 85})
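
The search step can be sketched without any external service: embed the query, score it against every stored vector, and keep the top `k`. Everything here is a toy stand-in. `toy_embed` (word counts over a tiny hypothetical vocabulary) replaces the real embedding model, and `ToyVectorStore` is not how `DocArrayInMemorySearch` is implemented. The default of `k=4` mirrors the four results returned above.

```python
import math

# Hypothetical toy vocabulary; a real embedding model needs no such list.
VOCABULARY = ["spider-man", "hulk", "zombies", "avengers"]


def toy_embed(text):
    """Count vocabulary words in the text (stand-in for a real embedding)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Texts without any vocabulary word have a zero vector; score them 0.
    return dot / norms if norms else 0.0


class ToyVectorStore:
    def __init__(self, texts):
        # Embed every text once, at insertion time.
        self.entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=4):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "Spider-Man and Beast fight zombies",
    "Hulk smashes everything",
    "Avengers assemble against zombies",
])
top = store.similarity_search("Please suggest Spider-Man comics", k=1)
print(top)
```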

6. The most similar documents are retrieved so that the LLM can take them into account when processing the prompt. So we first need the LLM.

![processWithLLMs](../assets/processWithLLMs.jpg "processWithLLMs")

In [21]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_key=os.getenv("OPEN_AI_KEY"),  # `api_key` is not recognized here and ends up in model_kwargs
    model=os.getenv("LLM_MODEL"),
    temperature=0.0
)


7. Then join the page contents of all documents returned by the similarity search into a single text.

In [22]:
qdocs = "\n".join([doc.page_content for doc in result_docs])
print(qdocs)

: 85
comic_name: Actor Presents Spider-Man and the Incredible Hulk (2003)
issue_description: Marvel Comics and A.C.T.O.R. team up to bring you two stories about Marvel's most popular characters in this special one-shot benefit book for the holiday season. Proceeds from this book benefit A.C.T.O.R., the charitable organization dedicated to providing financial assistance to disadvantaged comic book creators. 32 PGS./MARVEL PSR...$2.50
: 29
comic_name: A+X (2012 - 2014)
issue_description: Spider-Man and Beast fight zombies, while Iron Man, Kitty Pryde, and Lockheed battle the Brood!
: 22
comic_name: A+X (2012 - 2014)
issue_description: SPIDER-WOMAN & KITTY PRYDE (with Lockheed in tow, of course) investigate some unfinished alien business! Adam Warren returns to Marvel with an amazing short story so chock full of awesome that we can't even attempt to sum it up in one sentence!
: 2
comic_name: A Year of Marvels: February Infinite Comic (2016)
issue_description: Join us in a brand new Marvel

8. Create the prompt and ask the LLM, using only `result_docs` as context.

In [23]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
Spider-Man comics in a table in markdown and summarize each one.")
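
The prompt above simply places the retrieved text in front of the question. A slightly more explicit layout, which labels the context and the question separately, could look like the sketch below. The `build_prompt` helper and its template wording are assumptions for illustration, not a template that LangChain uses.

```python
def build_prompt(context_docs, question):
    """Assemble a prompt that separates the retrieved context from the question."""
    context = "\n\n".join(context_docs)
    return (
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Two made-up documents in the same "column: value" layout as the loader output.
retrieved = [
    "comic_name: Sample Comic A\nissue_description: Spider-Man fights zombies.",
    "comic_name: Sample Comic B\nissue_description: Spider-Woman investigates.",
]

prompt = build_prompt(retrieved, "Please list all your Spider-Man comics.")
print(prompt)
```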

In [None]:
from IPython.display import display, Markdown
display(Markdown(response))

| Comic Name                                    | Issue Description                                                                                                      |
|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| Actor Presents Spider-Man and the Incredible Hulk (2003) | Marvel Comics and A.C.T.O.R. team up to bring you two stories about Marvel's most popular characters in this special one-shot benefit book for the holiday season. Proceeds from this book benefit A.C.T.O.R., the charitable organization dedicated to providing financial assistance to disadvantaged comic book creators. 32 PGS./MARVEL PSR...$2.50 |
| A+X (2012 - 2014)                             | Spider-Man and Beast fight zombies, while Iron Man, Kitty Pryde, and Lockheed battle the Brood!                             |
| A+X (2012 - 2014)                             | SPIDER-WOMAN & KITTY PRYDE (with Lockheed in tow, of course) investigate some unfinished alien business! Adam Warren returns to Marvel with an amazing short story so chock full of awesome that we can't even attempt to sum it up in one sentence! |
| A+X (2012 - 2014)                             | CAPTAIN AMERICA + CYCLOPS continue their quest to root out Cadre K! Plus: Two former villains...SUPERIOR SPIDER-MAN + MAGNETO! Max Bemis of the band Say Anything makes his Marvel Comics writing debut!                                                                                                      |

1. **Actor Presents Spider-Man and the Incredible Hulk (2003):**
   - Description: Marvel Comics and A.C.T.O.R. team up to bring you two stories about Marvel's most popular characters in this special one-shot benefit book for the holiday season. Proceeds from this book benefit A.C.T.O.R., the charitable organization dedicated to providing financial assistance to disadvantaged comic book creators.
   - Price: $2.50

2. **A+X (2012 - 2014):**
   - Description: Spider-Man and Beast fight zombies, while Iron Man, Kitty Pryde, and Lockheed battle the Brood!

3. **A+X (2012 - 2014):**
   - Description: SPIDER-WOMAN & KITTY PRYDE (with Lockheed in tow, of course) investigate some unfinished alien business! Adam Warren returns to Marvel with an amazing short story so chock full of awesome that we can't even attempt to sum it up in one sentence!

4. **A+X (2012 - 2014):**
   - Description: CAPTAIN AMERICA + CYCLOPS continue their quest to root out Cadre K! Plus: Two former villains...SUPERIOR SPIDER-MAN + MAGNETO! Max Bemis of the band Say Anything makes his Marvel Comics writing debut!

# B - Encapsulated with a retriever:

1. First, create the document loader instance for the CSV file and load the documents.

In [None]:
from langchain.document_loaders import CSVLoader
file_path = "../data/Marvel_Comics.csv"
loader = CSVLoader(
    file_path=file_path,
    encoding="utf8"
)

docs = loader.load()

2. Create the embedding instance

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

3. Then, we create the Vector Database with the documents and the embedding instance

In [None]:
from langchain.vectorstores import DocArrayInMemorySearch
vectorDB = DocArrayInMemorySearch.from_documents(
    documents=docs,
    embedding=embeddings
)

4. Create a retriever, a generic interface that takes a query and returns documents. It is backed by the vector store that holds the embeddings of all documents.

In [None]:
retriever = vectorDB.as_retriever()
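
The value of the retriever abstraction is that the code consuming documents only depends on "query in, documents out", not on how the documents are found. The sketch below illustrates that idea with a hypothetical `KeywordRetriever` that uses plain word overlap instead of embeddings. The method name `get_relevant_documents` follows the LangChain retriever interface of this version, but the classes here are illustrative stand-ins, not LangChain code.

```python
from typing import List, Protocol


class Retriever(Protocol):
    """Anything that maps a query string to a list of documents."""

    def get_relevant_documents(self, query: str) -> List[str]:
        ...


class KeywordRetriever:
    """Hypothetical retriever: returns texts sharing at least one word with the query."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def get_relevant_documents(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        return [t for t in self.texts if terms & set(t.lower().split())]


def fetch_context(retriever: Retriever, query: str) -> str:
    # The caller never needs to know which retriever it was given.
    return "\n".join(retriever.get_relevant_documents(query))


keyword_retriever = KeywordRetriever([
    "Spider-Man and Beast fight zombies",
    "Hulk smashes everything",
])
context = fetch_context(keyword_retriever, "spider-man comics")
print(context)
```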

5. Instantiate the LLM, then instantiate the QnA chain using the retriever and the LLM.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(
    openai_api_key=os.getenv("OPEN_AI_KEY"),  # `api_key` is not recognized here and ends up in model_kwargs
    model = os.getenv("LLM_MODEL"),
    temperature=0.0
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

`chain_type` tells the chain how to combine the document chunks:
- `stuff`: concatenates all chunks and passes them to the LLM in a single prompt
- `map_reduce`: runs the LLM on each chunk separately, then combines the partial results in a final call
- `refine`: builds the answer iteratively, updating it with each subsequent chunk (slower, since the calls run sequentially)
- `map_rerank`: runs the LLM on each chunk separately, asks it to score each answer, and returns the highest-scoring one

![chain_type](../assets/chain_type.jpg "chain_type")
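
The difference between the first three strategies is easiest to see in how many LLM calls each one makes. The sketch below is a simplified illustration under stated assumptions: `CountingLLM` is a fake model that only counts calls, and the prompt wording is invented. It is not how LangChain implements these chains.

```python
def stuff(llm, chunks, question):
    # One call: every chunk goes into a single prompt.
    return llm("\n".join(chunks) + f"\nQuestion: {question}")


def map_reduce(llm, chunks, question):
    # One call per chunk (map), then one call to combine them (reduce).
    partials = [llm(f"{chunk}\nQuestion: {question}") for chunk in chunks]
    return llm("\n".join(partials) + f"\nCombine these into one answer to: {question}")


def refine(llm, chunks, question):
    # Sequential calls: each one updates the previous answer with a new chunk.
    answer = llm(f"{chunks[0]}\nQuestion: {question}")
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Refine the answer to: {question}"
        )
    return answer


class CountingLLM:
    """Fake LLM that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"response {self.calls}"


chunks = ["chunk A", "chunk B", "chunk C"]
call_counts = {}
for strategy in (stuff, map_reduce, refine):
    fake_llm = CountingLLM()
    strategy(fake_llm, chunks, "Which comics feature Spider-Man?")
    call_counts[strategy.__name__] = fake_llm.calls

print(call_counts)  # {'stuff': 1, 'map_reduce': 4, 'refine': 3}
```

With three chunks, `stuff` needs a single call but the largest prompt, while `map_reduce` and `refine` trade more calls for smaller prompts.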

6. Run the query through the QnA chain.

In [None]:
from IPython.display import display, Markdown

query =  "Please list all your Spider-Man comics in a table \
in markdown and summarize each one."

response = qa.run(query)
display(Markdown(response))



> Entering new RetrievalQA chain...

> Finished chain.


| Issue | Comic Name                                        | Summary                                                                                                                         |
|-------|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
| 85    | Actor Presents Spider-Man and the Incredible Hulk | A special one-shot benefit book for the holiday season. Proceeds from this book support A.C.T.O.R., a charitable organization.    |
| 29    | A+X                                              | Spider-Man and Beast fight zombies, while Iron Man, Kitty Pryde, and Lockheed battle the Brood!                                 |
| 2     | A Year of Marvels: February Infinite Comic        | Peter Parker's hot date gets interrupted by the bank-robbing villain The Vulture. It's a tale of romance, adventure, and punching! |
| 22    | A+X                                              | Spider-Woman and Kitty Pryde, along with Lockheed, investigate unfinished alien business.                                        |

Please note that the summaries provided are brief and may not capture all the details of each comic.

# C - The more encapsulated way with Index

1. Create the embedding instance

In [3]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    openai_api_key = os.getenv("OPEN_AI_KEY")
)

2. Create the loader for the CSV file

In [4]:
from langchain.document_loaders import CSVLoader
file_path = "../data/Marvel_Comics.csv"
loader = CSVLoader(
    file_path=file_path,
    encoding="utf8"
)

3. Create an index with `VectorstoreIndexCreator`. It takes the vector store class and the embeddings, and we pass the loaders directly to it, so loading, embedding, and storing happen in one step.

In [5]:
from langchain.vectorstores import DocArrayInMemorySearch # helps to avoid using an external DB
from langchain.indexes import VectorstoreIndexCreator

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings
).from_loaders([loader]) # add chain_type="stuff" for langchain >= 0.3.0

4. Query using the LLM and the RetrievalQA Chain

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(
    openai_api_key=os.getenv("OPEN_AI_KEY"),  # `api_key` is not recognized here and ends up in model_kwargs
    model=os.getenv("LLM_MODEL"),
    temperature=0.0
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
)

query =  "Please list all your Spider-Man comics in a table \
in markdown and summarize each one."
response = qa.run(query)




> Entering new RetrievalQA chain...

> Finished chain.


In [7]:
from IPython.display import display, Markdown
display(Markdown(response))

| Issue | Comic Name                                     | Summary                                                                                                 |
|-------|------------------------------------------------|---------------------------------------------------------------------------------------------------------|
| 85    | Actor Presents Spider-Man and the Incredible Hulk (2003) | A special one-shot benefit book for the holiday season featuring two stories about Spider-Man and the Incredible Hulk. Proceeds from this book benefit A.C.T.O.R., a charitable organization for disadvantaged comic book creators. |
| 29    | A+X (2012 - 2014)                             | Spider-Man and Beast fight zombies, while Iron Man, Kitty Pryde, and Lockheed battle the Brood!          |
| 2     | A Year of Marvels: February Infinite Comic (2016) | Peter Parker's hot date is interrupted by the bank-robbing villain The Vulture. A tale of romance, adventure, and punching! |
| 22    | A+X (2012 - 2014)                             | Spider-Woman and Kitty Pryde, with Lockheed, investigate unfinished alien business in an amazing short story by Adam Warren. |

Note: These are just a few Spider-Man comics mentioned in the given context. There may be more Spider-Man comics not mentioned here.