**Objective: Performing RAG-SQL on a movie dataset**

### Importing essential libraries

In [1]:
from langchain.utilities import SQLDatabase
from langchain.llms import OpenAI
from langchain_experimental.sql import SQLDatabaseChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

### Setting up an environment

In [2]:
OPENAI_API_KEY = "ENTER_OPENAI_API_KEY_HERE"
llm = ChatOpenAI(temperature=1, openai_api_key=OPENAI_API_KEY)

The API key is issued when you sign up for OpenAI's services and is required to access its language models (such as GPT-3 or GPT-4). It is used here to construct the ChatOpenAI client that talks to those models. The temperature parameter controls the randomness ("creativity") of the model's output: lower values make responses more deterministic, higher values make them more varied.
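To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the usual textbook formulation of sampling temperature. The function name and the scores are hypothetical, and how OpenAI applies the parameter internally is an assumption here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, 0.2)  # low temperature
flat = softmax_with_temperature(logits, 1.0)   # the value used in this notebook

print("temperature 0.2:", [round(p, 3) for p in sharp])
print("temperature 1.0:", [round(p, 3) for p in flat])
```

At the lower temperature almost all probability mass sits on the top-scoring token, which is why lower values are often preferred when the model has to produce SQL rather than free text.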

### Database Setup

In [4]:
host = 'localhost'
port = '3306'
username = 'root'
#password = ''
database_schema = 'hwdmoviesql'
mysql_uri = f"mysql+pymysql://{username}@{host}:{port}/{database_schema}"
#mysql_uri = f"mysql+pymysql://{username}:{password}@{host}:{port}/{database_schema}"
db = SQLDatabase.from_uri(mysql_uri, include_tables=['hollywood_movies'], sample_rows_in_table_info=2)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

This code constructs a MySQL connection URI from the credentials and connects to the hwdmoviesql database on a local MySQL server. It then creates a SQLDatabase instance restricted to the hollywood_movies table, with sample_rows_in_table_info=2 so that two sample rows are included in the table description given to the model. Finally, it sets up an SQLDatabaseChain that links the language model (llm) to the database for querying. The chain is verbose, so it prints the generated SQL, the raw result, and the answer.
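When the account does have a password, special characters such as `@` or `/` need to be percent-encoded before they go into the URI. The helper below is a sketch using only the standard library; `build_mysql_uri` and the sample password are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus

def build_mysql_uri(username, password, host, port, database):
    """Build a mysql+pymysql URI, percent-encoding the credentials."""
    credentials = quote_plus(username)
    if password:
        # Encode the password so characters like '@' or '/' do not break parsing.
        credentials += ":" + quote_plus(password)
    return f"mysql+pymysql://{credentials}@{host}:{port}/{database}"

# Same settings as the notebook, without a password.
uri_no_pw = build_mysql_uri("root", "", "localhost", "3306", "hwdmoviesql")

# Hypothetical password containing reserved characters.
uri_pw = build_mysql_uri("root", "p@ss/word", "localhost", "3306", "hwdmoviesql")

print(uri_no_pw)
print(uri_pw)
```

The resulting string can be passed to `SQLDatabase.from_uri` in place of `mysql_uri`.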

In [5]:
def retrieve_from_db(query: str) -> str:
    db_context = db_chain(query)
    db_context = db_context['result'].strip()
    return db_context

The retrieve_from_db function processes a query by passing it through a db_chain. It extracts the 'result' from the chain's output and removes any leading/trailing whitespace using .strip(). Finally, it returns the cleaned result as a string.
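The same logic can be tried without a database or an API key by swapping in a stand-in for the chain. In this sketch `fake_db_chain` is hypothetical; it assumes the chain returns a dictionary whose `'result'` key holds the answer text, as the function above relies on, and the extra whitespace is added only to show the effect of `.strip()`.

```python
def fake_db_chain(query: str) -> dict:
    """Hypothetical stand-in for db_chain: returns a dict with a 'result' key."""
    return {
        "query": query,
        "result": "  There are 127 movies with the genre 'Comedy'.\n",
    }

def retrieve_from_db(query: str, chain=fake_db_chain) -> str:
    """Run the query through the chain and return the trimmed answer text."""
    output = chain(query)
    return output["result"].strip()

context = retrieve_from_db("how many no. of movies has comedy genre")
print(repr(context))
```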

### Function to Generate Movie Information Response Based on User Query

In [6]:
def generate(query: str) -> str:
    db_context = retrieve_from_db(query)
    
    system_message = """You are a movie database expert specializing in Telugu cinema.
        Your task is to answer users' questions by providing relevant information from a database of Telugu movies.
        The database contains the following information for each movie:
        - Name of the movie
        - Year of release
        - Certificate given by the censor board
        - Movie genre
        - A brief description or plot of the movie
        - Duration in minutes
        - IMDb Rating
        - Number of people who rated the movie
        
        Example:
        
        Input:
        Which Telugu movies released in 2021 have the highest IMDb ratings?
        
        Context:
        The Telugu movies released in 2021 with the highest IMDb ratings are:
        1. Movie A - IMDb Rating: 8.5
        2. Movie B - IMDb Rating: 8.3
        
        Output:
        The highest-rated Telugu movies released in 2021 are Movie A (IMDb Rating: 8.5) and Movie B (IMDb Rating: 8.3).
        """
    
    human_qry_template = HumanMessagePromptTemplate.from_template(
        """Input:
        {human_input}
        
        Context:
        {db_context}
        
        Output:
        """
    )
    messages = [
      SystemMessage(content=system_message),
      human_qry_template.format(human_input=query, db_context=db_context)
    ]
    response = llm(messages).content
    return response

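The generate function first retrieves the relevant database context using the retrieve_from_db function. It then builds a prompt for the language model: a system message describing the movie data, followed by a human message containing the user's query and the retrieved context. The language model is then invoked, and its response to that query and context is returned. Note that the system message describes Telugu cinema while the queried table is hollywood_movies, which is why several of the final answers below refer to "Telugu movies".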
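The retrieve-then-prompt flow can be sketched with plain strings to show exactly what the model receives. Everything here is a simplified stand-in: `build_messages`, `generate_with`, `echo_llm`, and `stub_retrieve` are hypothetical names, and plain tuples replace the LangChain message classes.

```python
HUMAN_TEMPLATE = "Input:\n{human_input}\n\nContext:\n{db_context}\n\nOutput:\n"

def build_messages(system_message, query, db_context):
    """Assemble the system message and the filled-in human message."""
    human = HUMAN_TEMPLATE.format(human_input=query, db_context=db_context)
    return [("system", system_message), ("human", human)]

def generate_with(llm_fn, retrieve_fn, system_message, query):
    """Retrieve context for the query, build the prompt, and call the model."""
    db_context = retrieve_fn(query)
    return llm_fn(build_messages(system_message, query, db_context))

def stub_retrieve(query):
    # Stand-in for retrieve_from_db: returns a fixed context string.
    return "There are 127 movies with the genre 'Comedy'."

def echo_llm(messages):
    # Stand-in for the chat model: returns the human message it was given.
    return messages[-1][1]

prompt_seen = generate_with(
    echo_llm,
    stub_retrieve,
    "You are a movie database expert.",
    "how many no. of movies has comedy genre",
)
print(prompt_seen)
```

Printing the prompt this way makes it easy to see that the final answer can only use what appears under `Context:`, so any detail the chain's answer omits is unavailable to the second model call.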

### Generating the queries and obtaining responses

In [7]:
generate("how many no. of movies has comedy genre")

> Entering new SQLDatabaseChain chain...
how many no. of movies has comedy genre
SQLQuery:SELECT COUNT(*) AS num_movies
FROM hollywood_movies
WHERE genre = 'Comedy'
SQLResult: [(127,)]
Answer:There are 127 movies with the genre 'Comedy'.
> Finished chain.


'There are 127 movies with the comedy genre.'

In [8]:
generate("which movies has low no.of rarting")

> Entering new SQLDatabaseChain chain...
which movies has low no.of rarting
SQLQuery:SELECT `movie`, `rating`
FROM `hollywood_movies`
WHERE `rating` < 5
ORDER BY `rating` ASC
LIMIT 5;
SQLResult: [('Movie 43', Decimal('1.0')), ('Movie 416', Decimal('1.0')), ('Movie 386', Decimal('1.0')), ('Movie 328', Decimal('1.0')), ('Movie 344', Decimal('1.0'))]
Answer:Movies with low ratings are:
1. Movie 43 - Rating: 1.0
2. Movie 416 - Rating: 1.0
3. Movie 386 - Rating: 1.0
4. Movie 328 - Rating: 1.0
5. Movie 344 - Rating: 1.0
> Finished chain.


'Movies with low ratings are:\n1. Movie 43 - Rating: 1.0\n2. Movie 416 - Rating: 1.0\n3. Movie 386 - Rating: 1.0\n4. Movie 328 - Rating: 1.0\n5. Movie 344 - Rating: 1.0'

In [9]:
generate(" provide me a movie which has very sad description and high no. of ratings ")

> Entering new SQLDatabaseChain chain...
 provide me a movie which has very sad description and high no. of ratings 
SQLQuery:SELECT `movie`, `rating`
FROM `hollywood_movies`
WHERE `description` LIKE '%sad%' AND `rating` >= 8.0
LIMIT 1;
SQLResult: 
Answer:There are no movies that have a very sad description and a rating above 8.0.
> Finished chain.


'There are currently no movies in the database with a very sad description and a rating above 8.0.'

In [10]:
generate("provide me movies which has comedy genre and high runtime with max ratings")

> Entering new SQLDatabaseChain chain...
provide me movies which has comedy genre and high runtime with max ratings
SQLQuery:SELECT
    `id`,
    `movie`,
    `genre`,
    `runtime`,
    `rating`
FROM
    `hollywood_movies`
WHERE
    `genre` = 'Comedy' AND `runtime` > 120
ORDER BY
    `rating` DESC
LIMIT 5;
SQLResult: [(3, 'Movie 3', 'Comedy', 166, Decimal('9.8')), (104, 'Movie 104', 'Comedy', 161, Decimal('9.7')), (764, 'Movie 764', 'Comedy', 170, Decimal('9.7')), (1104, 'Movie 1104', 'Comedy', 169, Decimal('9.4')), (545, 'Movie 545', 'Comedy', 132, Decimal('9.2'))]
Answer:Movie 3, Movie 104, Movie 764, Movie 1104, and Movie 545 are the movies with the comedy genre, high runtime (>120), and the highest ratings.
> Finished chain.


'The Telugu movies with comedy genre, high runtime (>120 minutes), and the highest ratings are Movie 3, Movie 104, Movie 764, Movie 1104, and Movie 545.'

In [11]:
generate("provide me movies which has sci-fi genre with high runtime")

> Entering new SQLDatabaseChain chain...
provide me movies which has sci-fi genre with high runtime
SQLQuery:SELECT `movie`, `genre`, `runtime`
FROM `hollywood_movies`
WHERE `genre` = 'Sci-Fi'
ORDER BY `runtime` DESC
LIMIT 5;
SQLResult: [('Movie 1020', 'Sci-Fi', 180), ('Movie 634', 'Sci-Fi', 180), ('Movie 688', 'Sci-Fi', 180), ('Movie 1217', 'Sci-Fi', 178), ('Movie 721', 'Sci-Fi', 178)]
Answer:Movie 1020, Movie 634, Movie 688, Movie 1217, Movie 721
> Finished chain.


'The Telugu movies with a sci-fi genre and high runtime are:\n1. Movie 1020 - Runtime: XXX minutes\n2. Movie 634 - Runtime: XXX minutes\n3. Movie 688 - Runtime: XXX minutes\n4. Movie 1217 - Runtime: XXX minutes\n5. Movie 721 - Runtime: XXX minutes'

In [13]:
generate("provide me movies with horror genre and released in the year between 2000 and 2010")

> Entering new SQLDatabaseChain chain...
provide me movies with horror genre and released in the year between 2000 and 2010
SQLQuery:SELECT `movie`, `year`
FROM `hollywood_movies`
WHERE `genre` = 'Horror' AND `year` BETWEEN 2000 AND 2010
LIMIT 5;
SQLResult: [('Movie 29', 2006), ('Movie 47', 2002), ('Movie 190', 2006), ('Movie 252', 2007), ('Movie 375', 2010)]
Answer:Movie 29, Movie 47, Movie 190, Movie 252, Movie 375
> Finished chain.


'The Telugu horror movies with the genre released between the years 2000 and 2010 are:\n1. Movie 29\n2. Movie 47\n3. Movie 190\n4. Movie 252\n5. Movie 375'

In [14]:
generate(" provide me the movies with comedy genre")

> Entering new SQLDatabaseChain chain...
 provide me the movies with comedy genre
SQLQuery:SELECT `id`, `movie`, `year`
FROM `hollywood_movies`
WHERE `genre` = 'Comedy'
LIMIT 5;
SQLResult: [(2, 'Movie 2', 2013), (3, 'Movie 3', 2021), (16, 'Movie 16', 1992), (17, 'Movie 17', 1983), (53, 'Movie 53', 1992)]
Answer:Movie 2 from the year 2013.
Movie 3 from the year 2021.
Movie 16 from the year 1992.
Movie 17 from the year 1983.
Movie 53 from the year 1992.
> Finished chain.


'The Telugu movies with the comedy genre are:\n1. Movie 2 (2013) - Comedy genre\n2. Movie 3 (2021) - Comedy genre'