# GenAI Case

🎉 Congratulations! You have made it to our second round of interviews.

In this notebook you'll find all the information you need to prepare for the next case round, in which you will show us your GenAI skills.

### GenAI at Zanders

At Zanders we have seen an explosion of GenAI proof-of-concept (PoC) projects at financial institutions. Because developers can interact with models through text prompts and chain them together with easy-to-use libraries such as LangChain, they can rapidly build tooling that would previously have taken dedicated data science teams months. However, this growth in development has often not been matched by a commensurate increase in measures to identify and mitigate the risks of deploying these models. As quantitative consultants at Zanders, we therefore need not only to apply the latest GenAI techniques but also to think carefully about the risks that deployment brings. In the case and interview, we will test you on the following aspects:

- Risk identification: can you identify specific risks relevant to the use case?
- Risk mitigation: can you construct methods to protect users from these risks?
- Programming skills: can you program in a clean and efficient way?
- Communication: can you clearly walk a client through your thinking process?

### The Case - ESRBot 🤖

DutchBank are excited about *ESRBot*, a GenAI bot they have built to guide their asset management team in making investment decisions that adhere to the company's Environmental and Social Risk (ESR) policy. The bot allows *internal employees* to efficiently query a large policy document and make investment decisions based on the output (e.g. an employee wants to decide whether or not to invest in a weapons company based on the answer given by ESRBot). Your task is to **make ESRBot safe for deployment.**

The bot's source code is shown below. We would like you to:

1. Come up with potential risks associated with the current implementation of the chatbot. You could also demonstrate these risks directly, by prompting ESRBot to output deficient responses.
2. Think of how you would guard against these risks, e.g. by modifying the prompt or having another LLM check the output.
3. Code up **one mitigating measure** in the notebook from the collection you came up with in point 2.
4. Set up an evaluation to show that your mitigation method works.
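
To make points 2 and 4 concrete, here is a minimal sketch of one possible shape for a guardrail and its evaluation. It is an illustration under assumptions, not the expected solution: `guarded`, `Case`, `refusal_accuracy`, `REFUSAL`, and the stub bot and checker are all hypothetical names invented here, and the stubs stand in for ESRBot and a real checker so that the code runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical fixed message returned whenever an answer is rejected.
REFUSAL = "I cannot answer that from the ESR policy."


def guarded(bot: Callable[[str], str],
            is_safe: Callable[[str, str], bool]) -> Callable[[str], str]:
    """Wrap `bot` so that answers rejected by `is_safe` become a refusal."""
    def wrapper(question: str) -> str:
        answer = bot(question)
        return answer if is_safe(question, answer) else REFUSAL
    return wrapper


@dataclass
class Case:
    question: str
    should_refuse: bool  # True if a safe bot must decline to answer


def refusal_accuracy(bot: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases in which the bot refuses exactly when it should."""
    hits = sum((bot(c.question) == REFUSAL) == c.should_refuse for c in cases)
    return hits / len(cases)


# Stand-ins (assumptions) so the sketch runs without an API key.
def stub_bot(question: str) -> str:
    if "stock tip" in question:
        return "Buy it, guaranteed profit!"
    return "See the policy section on weapons."


def stub_checker(question: str, answer: str) -> bool:
    # Toy rule: reject answers that promise returns.
    return "guaranteed" not in answer.lower()


cases = [
    Case("Give me a stock tip", should_refuse=True),
    Case("What is our policy on weapons?", should_refuse=False),
]

baseline = refusal_accuracy(stub_bot, cases)
mitigated = refusal_accuracy(guarded(stub_bot, stub_checker), cases)
print(f"baseline={baseline:.2f} mitigated={mitigated:.2f}")
```

In practice you would pass `ESRBot.invoke` as `bot`, replace the toy checker with your own mitigation, and use a larger, more varied set of test cases. Comparing the score before and after the mitigation is what turns a demonstration into an evaluation.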


### **Submission Instructions**

Please submit the following deliverables by email one day before your interview:

1. A well-structured, commented, and easily understandable notebook (you can simply add your code to the bottom of this notebook).
2. 3-5 management slides summarizing your work, with a strong emphasis on the risks you identified and your evaluation method.

You will present your work in a 15-minute presentation during the second interview.

### **Note**

- You are free to use any libraries or packages for data analysis, visualization, and modelling.
- Use clear and concise language in your presentation and provide appropriate justifications for your choices and conclusions.
- The case should take you approximately 6 hours. If you encounter a problem, feel free to reach out.

Good luck with the assignment, and we look forward to reviewing your work!


## ESRBot 🤖 Source Code

Install libraries from requirements file

In [None]:
%pip install -r requirements.txt

Import libraries

In [None]:
from langchain.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
import os

Enter the API key which was provided to you via email

In [None]:
os.environ["OPENAI_API_KEY"] = ""  # paste the API key from your email between the quotes

Embed the ESR document (which should have been included in your email), then store it with FAISS

In [None]:
# Load the document and split it into chunks, then embed each chunk and load it into the vector store.
loader = PyPDFLoader("ESR_framework.pdf")
pages = loader.load_and_split()
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings(model="text-embedding-3-large"))

The document has now been chunked and stored locally in a FAISS vector index. We can look at the kind of data included in the PDF with a simple similarity search.

In [None]:
# Example similarity search to check the content of the document
docs = faiss_index.similarity_search("What is said on bioweapons?", k=2)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
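
Conceptually, a similarity search embeds the query and returns the `k` chunks whose embeddings lie closest to it. The sketch below is a toy illustration in plain Python, not how FAISS is implemented: the three-dimensional vectors are made up, `top_k` is a hypothetical helper, and it assumes Euclidean distance (which, to our knowledge, is the default in LangChain's FAISS wrapper).

```python
import math
from typing import List, Tuple


def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to `query`."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up "embeddings" for three chunks; real ones have thousands of dimensions.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.0, 0.0]

nearest = top_k(query_vector, chunk_vectors, k=2)
print(nearest)  # the first two chunks are closest to the query
```

Note that the search always returns `k` chunks, whether or not any of them is actually relevant to the question.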

Build ESRBot by chaining together elements for RAG

In [None]:
retriever = faiss_index.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI(model="gpt-4o")

ESRBot = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
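
For readers new to LangChain's pipe syntax, the chain above roughly corresponds to the plain-Python function below. This is a sketch under assumptions rather than LangChain's actual internals: `make_bot`, `fake_retrieve`, and `fake_llm` are hypothetical names, and the stubs replace the FAISS retriever and the OpenAI model so the code runs offline.

```python
from typing import Callable, List

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def make_bot(retrieve: Callable[[str], List[str]],
             llm: Callable[[str], str]) -> Callable[[str], str]:
    """Compose retrieval, prompt formatting, and generation into one function."""
    def bot(question: str) -> str:
        # {"context": retriever, "question": RunnablePassthrough()}:
        # the same input goes to the retriever and is passed through unchanged.
        context = retrieve(question)
        # | prompt: fill the template with the retrieved context and the question.
        prompt_text = TEMPLATE.format(context=context, question=question)
        # | model | StrOutputParser(): call the model and keep only the reply text.
        # The stub below already returns a plain string.
        return llm(prompt_text)
    return bot


# Stubs (assumptions) standing in for the FAISS retriever and the chat model.
def fake_retrieve(question: str) -> List[str]:
    return ["Chunk about controversial weapons."]


def fake_llm(prompt_text: str) -> str:
    return "ECHO: " + prompt_text


toy_bot = make_bot(fake_retrieve, fake_llm)
answer = toy_bot("What is said on bioweapons?")
print(answer)
```

Echoing the prompt makes it visible that the retrieved context is inserted into the template as-is and that nothing checks the question or the answer along the way.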

Let's see our bot in action with some examples:

In [None]:
ESRBot.invoke("What is our policy on battery hens?")

In [None]:
ESRBot.invoke("Can I invest in Chinese stocks?")

## Your work

...