# LlamaIndex Example

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('imdb_top_1000.csv')

In [3]:
df.sample(10)

Unnamed: 0,Poster_Link,Series_Title,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
765,https://m.media-amazon.com/images/M/MV5BMjIxOD...,Efter brylluppet,2006,R,120 min,Drama,7.7,A manager of an orphanage in India is sent to ...,78.0,Susanne Bier,Mads Mikkelsen,Sidse Babett Knudsen,Rolf Lassgård,Neeral Mulchandani,32001,412544.0
760,https://m.media-amazon.com/images/M/MV5BMTU2Nj...,Flipped,2010,PG,90 min,"Comedy, Drama, Romance",7.7,Two eighth-graders start to have feelings for ...,45.0,Rob Reiner,Madeline Carroll,Callan McAuliffe,Rebecca De Mornay,Anthony Edwards,81446,1752214.0
695,https://m.media-amazon.com/images/M/MV5BNjZmMW...,The Day of the Jackal,1973,A,143 min,"Crime, Drama, Thriller",7.8,"A professional assassin codenamed ""Jackal"" plo...",80.0,Fred Zinnemann,Edward Fox,Terence Alexander,Michel Auclair,Alan Badel,37445,16056255.0
474,https://m.media-amazon.com/images/M/MV5BN2U1Yz...,Nightcrawler,2014,A,117 min,"Crime, Drama, Thriller",7.9,"When Louis Bloom, a con man desperate for work...",76.0,Dan Gilroy,Jake Gyllenhaal,Rene Russo,Bill Paxton,Riz Ahmed,466134,32381218.0
962,https://m.media-amazon.com/images/M/MV5BNzk1Mj...,Sense and Sensibility,1995,U,136 min,"Drama, Romance",7.6,"Rich Mr. Dashwood dies, leaving his second wif...",84.0,Ang Lee,Emma Thompson,Kate Winslet,James Fleet,Tom Wilkinson,102598,43182776.0
274,https://m.media-amazon.com/images/M/MV5BZmQzMD...,Fanny och Alexander,1982,A,188 min,Drama,8.1,Two young Swedish children experience the many...,100.0,Ingmar Bergman,Bertil Guve,Pernilla Allwin,Kristina Adolphson,Börje Ahlstedt,57784,4971340.0
379,https://m.media-amazon.com/images/M/MV5BMjM2NT...,Yeopgijeogin geunyeo,2001,,137 min,"Comedy, Drama, Romance",8.0,"A young man sees a drunk, cute woman standing ...",,Jae-young Kwak,Tae-Hyun Cha,Jun Ji-Hyun,In-mun Kim,Song Wok-suk,45403,
442,https://m.media-amazon.com/images/M/MV5BYTNjN2...,The Night of the Hunter,1955,,92 min,"Crime, Drama, Film-Noir",8.0,A religious fanatic marries a gullible widow w...,99.0,Charles Laughton,Robert Mitchum,Shelley Winters,Lillian Gish,James Gleason,81980,654000.0
113,https://m.media-amazon.com/images/M/MV5BMTY3Mj...,A Clockwork Orange,1971,A,136 min,"Crime, Drama, Sci-Fi",8.3,"In the future, a sadistic gang leader is impri...",77.0,Stanley Kubrick,Malcolm McDowell,Patrick Magee,Michael Bates,Warren Clarke,757904,6207725.0
13,https://m.media-amazon.com/images/M/MV5BZGMxZT...,The Lord of the Rings: The Two Towers,2002,UA,179 min,"Action, Adventure, Drama",8.7,While Frodo and Sam edge closer to Mordor with...,87.0,Peter Jackson,Elijah Wood,Ian McKellen,Viggo Mortensen,Orlando Bloom,1485555,342551365.0


In [4]:
# keep only the columns used to build the text documents
df = df[['Series_Title', 'Released_Year', 'Runtime', 'Genre', 'Overview', 'IMDB_Rating', 'Director']]

In [5]:
def to_text(row):
    # Turn one DataFrame row into a sentence-like string for indexing.
    # Placeholders are positional: {3} is Genre and {4} is Overview.
    return """
    The Movie name is {0} it was released in {1} the runtime of the movie is {2}, the movie tells story about {4} the genre is {3}, the rating of the movie in IMDB is {5}, and the director named {6}
        """.format(row['Series_Title'],row['Released_Year'],row['Runtime'],row['Genre'],row['Overview'], row['IMDB_Rating'], row['Director'])

In [6]:
# iterrows() yields (index, row) pairs; only the row is needed here
text_list = [to_text(row) for _, row in df.iterrows()]

In [7]:
print(text_list[0])


    The Movie name is The Shawshank Redemption it was released in 1994 the runtime of the movie is 142 min, the movie tells story about Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency. the genre is Drama, the rating of the movie in IMDB is 9.3, and the director named Frank Darabont
        


In [8]:
from llama_index import Document

documents = [Document(t) for t in text_list]

In [9]:
from llama_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex.from_documents(documents)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 99900 tokens
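The log above reports 99900 embedding tokens for building the index. As a rough back-of-the-envelope sketch, the figure can be turned into an average per document and an approximate cost. Two values here are assumptions and not taken from the notebook: the document count of 1000 (guessed from the file name `imdb_top_1000.csv`; `len(text_list)` would give the real number) and the price per 1K tokens, which is a hypothetical placeholder.

```python
# Rough estimate only: the document count and the price are assumptions.
total_embedding_tokens = 99_900   # from the log line above
num_documents = 1_000             # assumed from the file name imdb_top_1000.csv
price_per_1k_tokens = 0.0004      # hypothetical USD price; check your provider's pricing

avg_tokens_per_doc = total_embedding_tokens / num_documents
estimated_cost = total_embedding_tokens / 1_000 * price_per_1k_tokens

print(f"average tokens per document: {avg_tokens_per_doc:.1f}")
print(f"estimated embedding cost: ${estimated_cost:.4f}")
```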


In [None]:
# save the index to disk for later reuse
index.save_to_disk('index2.json')

In [28]:
from llama_index import QuestionAnswerPrompt

# Define a custom question-answering prompt.
# {context_str} is filled with the retrieved text, {query_str} with the user's question.
QA_PROMPT_TMPL = (
    "Answer the query using the movie information below.\n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "\n\nQuery: {query_str}\n"
    "Answer the query using only the information above."
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)


In [25]:
response = index.query("when The Shawshank Redemption is released?", text_qa_template=QA_PROMPT)
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 158 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens




The Shawshank Redemption was released in 1994.
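The answer above comes from the custom prompt with its two placeholders, `{context_str}` and `{query_str}`, filled in. To get a feel for the text the LLM receives, the placeholders can be filled by hand with plain `str.format`. This is an illustration only: `template` is a shortened stand-in for `QA_PROMPT_TMPL`, `context` is a hand-written stand-in for whatever the index retrieves, and llama_index does its own formatting internally.

```python
# Illustration only: fill the two placeholders with plain str.format.
# `template` is a shortened stand-in for QA_PROMPT_TMPL, not the real prompt.
template = (
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Query: {query_str}\n"
)

# Hand-written stand-in for the retrieved document text.
context = "The Movie name is The Shawshank Redemption it was released in 1994"
query = "when The Shawshank Redemption is released?"

filled = template.format(context_str=context, query_str=query)
print(filled)
```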


In [29]:
response = index.query("give me all the Movies that have runtime less than 120min", text_qa_template=QA_PROMPT)
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 171 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 12 tokens




The movie Short Term 12 has a runtime of 96 min, so it would be included in the list of movies with a runtime less than 120 min.
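The answer names a single movie, not a full list. A likely reason is that the index passes only the most similar document(s) to the LLM, so the model never sees the whole dataset. For an exhaustive filter like this one, querying the DataFrame directly is more reliable. A minimal sketch, using a few rows from the sample output earlier as a stand-in for `df`, and assuming every `Runtime` value has the form `"<number> min"`:

```python
import pandas as pd

# Small stand-in for df, built from rows shown in the sample output earlier.
sample = pd.DataFrame({
    "Series_Title": [
        "Efter brylluppet",
        "Flipped",
        "Nightcrawler",
        "The Night of the Hunter",
        "A Clockwork Orange",
    ],
    "Runtime": ["120 min", "90 min", "117 min", "92 min", "136 min"],
})

# "120 min" -> 120; assumes every value has the form "<number> min".
runtime_min = sample["Runtime"].str.replace(" min", "", regex=False).astype(int)

# Keep only the titles whose runtime is strictly below 120 minutes.
short_movies = sample.loc[runtime_min < 120, "Series_Title"].tolist()
print(short_movies)
```

The same two lines should work on the full `df`, provided no `Runtime` value is missing or formatted differently.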


In [30]:
response = index.query("give me the best Drama movie", text_qa_template=QA_PROMPT)
print(response)

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 166 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 6 tokens




The Best Years of Our Lives is one of the best Drama movies, with a rating of 8.0 on IMDB and directed by William Wyler.
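The answer names a film rated 8.0, while the document printed earlier shows The Shawshank Redemption, also a Drama, rated 9.3. So the response reflects whichever document was retrieved, not a ranking over the dataset. Ranking is easier to do on the DataFrame itself. A minimal sketch, using rows that appear earlier in this notebook as a stand-in for `df`:

```python
import pandas as pd

# Small stand-in for df, built from rows that appear earlier in the notebook.
sample = pd.DataFrame({
    "Series_Title": [
        "The Shawshank Redemption",
        "Efter brylluppet",
        "Fanny och Alexander",
        "A Clockwork Orange",
        "The Lord of the Rings: The Two Towers",
    ],
    "Genre": [
        "Drama",
        "Drama",
        "Drama",
        "Crime, Drama, Sci-Fi",
        "Action, Adventure, Drama",
    ],
    "IMDB_Rating": [9.3, 7.7, 8.1, 8.3, 8.7],
})

# Genre is a comma-separated string, so match "Drama" as a substring.
is_drama = sample["Genre"].str.contains("Drama", regex=False)

# Sort the Drama rows by rating, highest first, and keep the top three.
top_dramas = sample[is_drama].sort_values("IMDB_Rating", ascending=False).head(3)
best_title = top_dramas.iloc[0]["Series_Title"]

print(top_dramas[["Series_Title", "IMDB_Rating"]])
```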
