In [6]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
import PyPDF2
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

I used LangChain Community for the project and the Hugging Face API to load the models.

In [93]:
loader=PyPDFLoader("blade runner 2049 script.pdf")

documents=loader.load()

text_splitter=RecursiveCharacterTextSplitter(chunk_size=1500,chunk_overlap=200)

final_documents=text_splitter.split_documents(documents)
final_documents[0]

Document(page_content='B L A D E  R U N N E R  2 0 4 9   story by  Hampton Fancher   screenplay by  Hampton Fancher and  Michael Green                     FINAL SHOOTING SCRIPT', metadata={'source': 'blade runner 2049 script.pdf', 'page': 0})

Here I load the PDF and split it into chunks of up to 1,500 characters, with 200 characters of overlap between neighbouring chunks.
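To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size windows with overlap. It is a simplification and an assumption on my part: RecursiveCharacterTextSplitter prefers to cut at natural separators (paragraphs, lines, words) rather than at fixed offsets, and the helper name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.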

In [95]:
huggingface_embeddings=HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",  # alternative: sentence-transformers/all-MiniLM-l6-v2
    model_kwargs={'device':'cpu'},
    encode_kwargs={'normalize_embeddings':True}

)

Above I load the embedding model, which maps each chunk of text (not each individual word) to a single vector.
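The normalize_embeddings option matters for the similarity search later on: once vectors have length 1, their dot product equals their cosine similarity. A small sketch with made-up two-dimensional vectors (the helpers normalize and dot are hypothetical and not part of LangChain):

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = normalize([3.0, 4.0])  # becomes roughly [0.6, 0.8]
b = normalize([4.0, 3.0])  # becomes roughly [0.8, 0.6]

print(dot(a, a))  # a unit vector dotted with itself: approximately 1
print(dot(a, b))  # cosine similarity of the two directions: approximately 0.96
```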

In [96]:
import numpy as np
embedding = np.array(huggingface_embeddings.embed_query(final_documents[0].page_content))
print(embedding)
print(embedding.shape)

[-2.74000373e-02  1.81391779e-02 -6.59105778e-02 -6.72502294e-02
  4.97535169e-02  7.92941004e-02 -5.05550615e-02 -2.99223755e-02
  3.70618626e-02 -8.51508579e-04  3.02055217e-02  6.40994385e-02
 -3.68723422e-02 -3.12499609e-03 -4.14474634e-03  1.82840489e-02
  5.80300838e-02  1.86404046e-02 -6.06797338e-02  2.20932532e-02
 -1.26203317e-02 -3.50464098e-02  1.97747927e-02 -4.82525043e-02
  1.00891152e-02 -2.58416161e-02  3.32756490e-02 -2.85460074e-02
 -8.77346545e-02 -1.25143915e-01 -5.71055859e-02 -2.43593454e-02
  2.59657390e-02 -5.83184697e-03 -1.41546074e-02 -1.49750896e-02
 -3.39519680e-02 -9.65889450e-03  3.78935337e-02 -3.44488360e-02
 -8.38961173e-03 -2.01605111e-02  6.12755939e-02 -6.37894347e-02
  2.13994551e-02  1.54967951e-02  5.64960353e-02 -2.65328243e-04
  5.61415553e-02 -1.01365037e-02 -6.15878664e-02 -6.82107210e-02
 -7.23364800e-02 -1.21605098e-02 -3.47959734e-02 -6.56084670e-03
 -8.78924876e-03  6.10667095e-02  3.23836803e-02  3.64663242e-03
 -8.59288592e-03  1.97645

In [97]:
vectorstore=FAISS.from_documents(final_documents,huggingface_embeddings)

Here I embed every chunk and store the resulting vectors in a FAISS index.
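Conceptually, querying such an index means comparing the query vector against the stored vectors and keeping the closest ones. The sketch below does this exhaustively with an inner-product score on toy unit vectors. It illustrates the idea only, under my own assumptions; it is not FAISS's implementation, and the function top_k is hypothetical.

```python
def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k stored vectors with the highest inner product."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [idx for _, idx in scored[:k]]


# Four toy "chunk" vectors, all of unit length.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
query_vec = [1.0, 0.0]

print(top_k(query_vec, doc_vecs, k=3))  # [0, 3, 2]
```

The retriever configured later in the notebook uses k=3 in the same sense: only the three best-matching chunks are passed on to the LLM.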

In [98]:
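import os
from getpass import getpass

# Read the token interactively so the secret is not stored in the notebook.
os.environ['HUGGINGFACEHUB_API_TOKEN']=getpass("Hugging Face API token: ")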

In [99]:
from langchain_community.llms import HuggingFaceEndpoint

hf=HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    temperature = 0.1,
    model_kwargs={"max_length":500}

)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


Here I load the LLM that generates the answers in the RAG pipeline, using the Hugging Face API. I used the Mistral-7B-Instruct-v0.2 model.

In [100]:
retriever=vectorstore.as_retriever(search_type="similarity",search_kwargs={"k":3})

In [101]:

prompt_template="""
Use the following piece of context to answer the question asked.
Please try to provide the answer only based on the context

{context}
Question:{question}

Helpful Answers:
 """

In [102]:
prompt=PromptTemplate(template=prompt_template,input_variables=["context","question"])

In [103]:
retrievalQA=RetrievalQA.from_chain_type(
    llm=hf,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt":prompt}
)
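With chain_type="stuff", the retrieved chunks are placed together into the {context} slot of the prompt, and the combined text is sent to the LLM in a single call. A rough sketch of that assembly step, using a shortened toy template and placeholder chunks (build_stuff_prompt is a hypothetical helper, and the blank-line separator is an assumption):

```python
def build_stuff_prompt(template, chunks, question):
    """Join all retrieved chunks and substitute them into the prompt template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(context=context, question=question)


toy_template = "Context:\n{context}\nQuestion:{question}\nAnswer:"
toy_chunks = ["chunk one text", "chunk two text", "chunk three text"]

final_prompt = build_stuff_prompt(toy_template, toy_chunks, "Who is K?")
print(final_prompt)
```

Because everything goes into one prompt, the chunk size and k together determine how much of the model's context window the retrieved text takes up.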

Below are all the queries I ran

In [104]:
query="""Explain the theme of the movie?"""

In [105]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

 The movie explores the theme of identity and reality. Throughout the film, K, the protagonist, is on a quest to discover his true identity and the reality of the world around him. He is haunted by memories of his past and the fear of being hunted by replicants, which causes him to question his own existence. The movie also raises questions about the nature of reality and what makes us human. The use of advanced technology and artificial intelligence adds to the exploration of these themes. Ultimately, the movie suggests that our identity and reality are shaped by our experiences and memories, and that what makes us human is our ability to feel emotions and connect with others.


In [106]:
query="""Who is the main protagonist of the movie?"""

In [107]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

 The main protagonist of the movie is K, as evidenced by the context which focuses on his experiences, memories, and actions throughout the scene. The context describes K's childhood memory of being hunted and hiding in the boiler room of an orphanage, and later finding a hidden toy from his past. Ana, who is present during some of these events, is not the main protagonist as the context primarily revolves around K's perspective and experiences.


In [108]:
query="""How many male and female characters are in the movie?"""

In [109]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

1. Based on the context provided, there are at least 5 male characters mentioned: Wallace, Rick Deckard, Officer K, Man's Voice (off-screen), and a Replicant. There are also at least 2 female characters mentioned: Luv and Woman's Voice (off-screen).
 2. The context does not provide enough information to determine the gender of some characters, such as the Replicant or the voices heard during the Voight-Kampff test.
 3. It's important to note that the context is from a screenplay, and the final film may include additional characters or depict the characters differently.


In [110]:
query="""Does the script pass the Bechdel test?"""

In [111]:
result = retrievalQA.invoke({"query": query})
print(result['result'])


The Bechdel test is a measure of the representation of women in media. It requires that a work of fiction have at least two women who talk to each other about something other than a man.

In the provided context, there is a character named Ana who is described as working in a memory lab and creating a birthday cake. Another character named K visits her lab and they have a conversation. However, their conversation does not meet the requirements of the Bechdel test as they are discussing the case that K is working on and Ana's role in it, which is related to a man (Wallace). Therefore, the script does not pass the Bechdel test based on the provided context.


In [112]:
query="""What is the role of Deckard in the movie?"""

In [113]:
result = retrievalQA.invoke({"query": query})
print(result['result'])


Deckard is a blade runner, a police officer in a futuristic Los Angeles, whose job is to retire (i.e., kill) rogue androids. In the context provided, Deckard is pursuing a rogue replicant named Roy Batty (also known as K) and his associates. During this pursuit, they engage in a series of violent encounters, culminating in a final confrontation between Deckard and Batty. Throughout the chase, Deckard is trying to understand why Batty is targeting him and what his true motivations are.


In [114]:
query="""What happens to K at the end of the movie?"""

In [115]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

 At the end of the movie, K is destroyed by Luv when she crushes the emanator, but he manages to see something in the dataflow before he dies. After his death, K's spinner is attacked by scavengers, but he manages to survive when he comes to, surprising them.


In [116]:
query="""Who is the antagonist of the movie?"""

In [117]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

 The antagonist of the movie is Luv, an advanced replicant created by Niander Wallace, who is determined to capture Deckard and bring him back to Wallace for examination and potential use in creating more advanced replicants. Luv is a ruthless and calculating adversary, using various means to track down and capture Deckard, including manipulating other replicants and using lethal force when necessary. She is also shown to have a sadistic streak, enjoying the pain and suffering she inflicts on her victims. Ultimately, Luv's actions drive the plot of the movie forward and create the conflict that Deckard must overcome in order to survive.


In [118]:
query = """Is Niander Wallace a negative character in the movie?"""

In [119]:
result = retrievalQA.invoke({"query": query})
print(result['result'])

 Yes, Niander Wallace is a negative character in the movie. He is a brilliant and manipulative scientist who creates replicants, but he is also a cold and ruthless man who is willing to kill innocent people to achieve his goals. He is obsessed with the idea of creating a new, superior form of replicant and is willing to sacrifice anything and anyone to do so. He is also shown to be cruel and callous, as evidenced by his treatment of Rachael and his willingness to kill her in front of Deckard. Overall, Wallace is a complex and dangerous character who poses a significant threat to the protagonist and the world around him.
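Since the chain was built with return_source_documents=True, each result also carries the chunks the answer was based on, which helps when checking an answer against the script. Below is a minimal sketch of how they could be listed. FakeDocument and fake_result are hypothetical stand-ins with placeholder text so the snippet runs on its own; in the notebook the real result dictionary would be passed instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result, preview_chars=60):
    """Return one line per source chunk: page number and a short text preview."""
    lines = []
    for doc in result.get("source_documents", []):
        page = doc.metadata.get("page", "?")
        preview = " ".join(doc.page_content.split())[:preview_chars]  # collapse whitespace
        lines.append(f"page {page}: {preview}")
    return lines


# Placeholder data, not taken from the script.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        FakeDocument("first   retrieved\nchunk", {"page": 0}),
        FakeDocument("second retrieved chunk", {"page": 3}),
    ],
}

for line in summarize_sources(fake_result):
    print(line)
```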
