# **AI ASSIGNMENT SUBMISSION**
**Installing dependencies and setting up the environment to host an LLM**

In [None]:
!pip install langchain

In [9]:
import os
from getpass import getpass
from langchain import HuggingFaceHub

In [10]:
HUGGINGFACE_API_KEY = getpass()

··········


In [11]:
os.environ['HUGGINGFACE_API_KEY'] = HUGGINGFACE_API_KEY

In [24]:
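# A `!export` runs in a throwaway subshell and would only store the literal
# text "HUGGINGFACE_API_KEY", so set the variable from Python instead
os.environ['HUGGINGFACEHUB_API_TOKEN'] = HUGGINGFACE_API_KEY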

Due to resource limitations, the model is loaded directly from HuggingFaceHub, so this submission includes no fine-tuning of the model.

In [359]:
llm = HuggingFaceHub(repo_id='mistralai/Mistral-7B-Instruct-v0.2', huggingfacehub_api_token=HUGGINGFACE_API_KEY)

In [233]:
llm

HuggingFaceHub(client=<InferenceClient(model='google/flan-t5-base', timeout=None)>, repo_id='google/flan-t5-base', task='text2text-generation', huggingfacehub_api_token='hf_***')

# **Step 1 - Data Extraction**

In [243]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.2.0-py3-none-any.whl (290 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 290.4/290.4 kB 6.4 MB/s eta 0:00:00
Installing collected packages: pypdf
Successfully installed pypdf-4.2.0


In [244]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/Big Mac Index.pdf")

#Load the document by calling loader.load()
pages = loader.load()

In [None]:
print(pages[0].page_content)

In [None]:
### Creating a single list of all the data
data = []
for page in pages:
  data.append(page.page_content)

In [263]:
print(data)

['Big\nMac\nIndex\nThe\nBig\nMac\nIndex\nis\na\nprice\nindex\npublished\nsince\n1986\nby\nThe\nEconomist\nas\nan\ninformal\nway\nof\nmeasuring\nthe\npurchasing\npower\nparity\n(PPP)\nbetween\ntwo\ncurrencies\nand\nproviding\na\ntest\nof\nthe\nextent\nto\nwhich\nmarket\nexchange\nrates\nresult\nin\ngoods\ncosting\nthe\nsame\nin\ndifferent\ncountries.\nIt\n"seeks\nto\nmake\nexchange-rate\ntheory\na\nbit\nmore\ndigestible."\nThe\nindex\ncompares\nthe\nrelative\nprice\nworldwide\nto\npurchase\nthe\nBig\nMac,\na\nhamburger\nsold\nat\nMcDonald\'s\nrestaurants.\nOverview\nThe\nBig\nMac\nindex\nwas\nintroduced\nin\nThe\nEconomist\nin\nSeptember\n1986\nby\nPam\nWoodall\nas\na\nsemi-humorous\nillustration\nof\nPPP\nand\nhas\nbeen\npublished\nby\nthat\npaper\nannually\nsince\nthen.\nAlthough\nthe\nBig\nMac\nIndex\nwas\nnot\nintended\nto\nbe\na\nlegitimate\ntool\nfor\nexchange\nrate\nevaluation,\nit\nis\nnow\nglobally\nrecognised\nand\nfeatured\nin\nmany\nacademic\ntextbooks\nand\nreports.\nThe\ni

**Cleaning of data**

In [292]:
import re
# Replace newlines, pipes and forward slashes with a space
cleaned_data = [re.sub(r'[\n|/]', ' ', item) for item in data]

In [293]:
cleaned_data

['Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries. It "seeks to make exchange-rate theory a bit more digestible." The index compares the relative price worldwide to purchase the Big Mac, a hamburger sold at McDonald\'s restaurants. Overview The Big Mac index was introduced in The Economist in September 1986 by Pam Woodall as a semi-humorous illustration of PPP and has been published by that paper annually since then. Although the Big Mac Index was not intended to be a legitimate tool for exchange rate evaluation, it is now globally recognised and featured in many academic textbooks and reports. The index also gave rise to the word burgernomics. The theory underpinning the Big Mac index stems from the concept of PPP, which states that the ex
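The effect of the cleaning step can be checked on a small string. The sketch below is illustrative only: `sample` is a made-up string imitating the one-word-per-line PDF extraction, and the whitespace-collapsing step is an optional extra that is an assumption, not part of the cell above.

```python
import re

# Hypothetical sample imitating the raw PDF extraction (not taken from the PDF)
sample = "Big\nMac\n\nIndex|PPP\nUSD/EUR"

# Character class matching the same three characters as the cleaning cell:
# newline, pipe and forward slash
cleaned = re.sub(r'[\n|/]', ' ', sample)

# Optional extra step (an assumption): collapse the runs of spaces that
# consecutive separators leave behind
normalized = re.sub(r'\s+', ' ', cleaned).strip()

print(repr(cleaned))
print(repr(normalized))
```

Replacing each separator with a single space keeps word boundaries intact, but two adjacent separators produce a double space, which is why a normalization pass may be worth adding before sentence tokenization.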

**Data Visualization**

In [266]:
### Data type conversion for tokenization
str_form_data = " ".join([str(item) for item in cleaned_data])

In [267]:
str_form_data

'Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries. It "seeks to make exchange-rate theory a bit more digestible." The index compares the relative price worldwide to purchase the Big Mac, a hamburger sold at McDonald\'s restaurants. Overview The Big Mac index was introduced in The Economist in September 1986 by Pam Woodall as a semi-humorous illustration of PPP and has been published by that paper annually since then. Although the Big Mac Index was not intended to be a legitimate tool for exchange rate evaluation, it is now globally recognised and featured in many academic textbooks and reports. The index also gave rise to the word burgernomics. The theory underpinning the Big Mac index stems from the concept of PPP, which states that the exc

In [268]:
### Tokenize the input text into sentences and store them in a list
import nltk
nltk.download('punkt')
df_data = []
tokens = nltk.sent_tokenize(str_form_data)
for t in tokens:
  df_data.append(t)
  print(t, "\n")

Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries. 

It "seeks to make exchange-rate theory a bit more digestible." 

The index compares the relative price worldwide to purchase the Big Mac, a hamburger sold at McDonald's restaurants. 

Overview The Big Mac index was introduced in The Economist in September 1986 by Pam Woodall as a semi-humorous illustration of PPP and has been published by that paper annually since then. 

Although the Big Mac Index was not intended to be a legitimate tool for exchange rate evaluation, it is now globally recognised and featured in many academic textbooks and reports. 

The index also gave rise to the word burgernomics. 

The theory underpinning the Big Mac index stems from the concept of PPP, which states th

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [294]:
(df_data)

['Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries.',
 'It "seeks to make exchange-rate theory a bit more digestible."',
 "The index compares the relative price worldwide to purchase the Big Mac, a hamburger sold at McDonald's restaurants.",
 'Overview The Big Mac index was introduced in The Economist in September 1986 by Pam Woodall as a semi-humorous illustration of PPP and has been published by that paper annually since then.',
 'Although the Big Mac Index was not intended to be a legitimate tool for exchange rate evaluation, it is now globally recognised and featured in many academic textbooks and reports.',
 'The index also gave rise to the word burgernomics.',
 'The theory underpinning the Big Mac index stems from the concept of PPP, w

**Conversion of data from short text pieces into meaningful chunks**

In [295]:
tokens1 = df_data.copy()

### Check: if a chunk is too small (<= 250 characters),
###        append the following sentences until it is long enough
def update_appended_list(tokens):
    appended_list = []
    i = 0
    # Use the function argument rather than the global list
    while i < len(tokens):
        appended_text = tokens[i]
        while len(appended_text) <= 250 and i < len(tokens)-1:
            appended_text += " " + tokens[i+1]
            i += 1
        appended_list.append(appended_text)
        i += 1
    return appended_list

appended_list = update_appended_list(tokens1)
print(appended_list)

['Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries.', 'It "seeks to make exchange-rate theory a bit more digestible." The index compares the relative price worldwide to purchase the Big Mac, a hamburger sold at McDonald\'s restaurants. Overview The Big Mac index was introduced in The Economist in September 1986 by Pam Woodall as a semi-humorous illustration of PPP and has been published by that paper annually since then.', 'Although the Big Mac Index was not intended to be a legitimate tool for exchange rate evaluation, it is now globally recognised and featured in many academic textbooks and reports. The index also gave rise to the word burgernomics. The theory underpinning the Big Mac index stems from the concept of PPP, which states that 

In [298]:
appended_list[3]

'However, in reality, sourcing an identical basket of goods in every country provides a complex challenge. According to the Organisation for Economic Co-operation and Development (OECD), over "3,000 consumer goods and services, 30 occupations in government, 200 types of equipment goods and about 15 construction projects" are included in the current PPP calculations.'
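The merging behaviour is easier to see on toy input. The following is a minimal sketch of the same greedy idea, assuming a configurable threshold; the name `merge_short_sentences`, the `min_chars` parameter and the toy sentences are hypothetical and chosen only for illustration.

```python
def merge_short_sentences(sentences, min_chars=250):
    """Greedily append following sentences until a chunk exceeds min_chars."""
    chunks = []
    i = 0
    while i < len(sentences):
        chunk = sentences[i]
        # Keep absorbing the next sentence while the chunk is still too short
        while len(chunk) <= min_chars and i < len(sentences) - 1:
            i += 1
            chunk += " " + sentences[i]
        chunks.append(chunk)
        i += 1
    return chunks


# Toy data with a small threshold so the effect is visible
toy = [
    "Short one.",
    "Another short one.",
    "A third sentence that is a bit longer.",
    "Tail.",
]
chunks = merge_short_sentences(toy, min_chars=25)
for c in chunks:
    print(len(c), c)
```

Note that the final chunk can still fall below the threshold, because there is nothing left to append to it. Whether to merge such a tail into the previous chunk is a design choice this sketch leaves open.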

**Prompt Engineering**

In [312]:
template = """Generate question answer pairs from the context.
Context : {context}
Question :
Answer :
"""

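To preview what the model receives, the placeholder can be filled by hand. The sketch below uses plain `str.format` as a stand-in for the prompt template class, which is an assumption made to keep the example dependency-free; `qa_template`, `build_prompt` and the sample context are hypothetical names and data.

```python
# Stand-alone copy of the prompt, with a single {context} placeholder
qa_template = """Generate question answer pairs from the context.
Context : {context}
Question :
Answer :
"""


def build_prompt(context: str) -> str:
    """Fill the {context} placeholder with one text chunk."""
    return qa_template.format(context=context)


prompt_text = build_prompt("The Big Mac Index was introduced in 1986.")
print(prompt_text)
```

Printing the filled prompt for one chunk before sending it to the model is a cheap way to catch a misspelled label or a missing placeholder.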
# **Step 2 - Creation of a Question Chain**

In [367]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Build the prompt and chain once; they do not change between chunks
prompt = PromptTemplate(input_variables=['context'],
                        template=template)
chain = LLMChain(llm=llm, prompt=prompt)

# Open the file once so results from every chunk are kept
# (reopening with 'w' inside the loop would overwrite earlier results)
with open('output.txt', 'w') as f:
  for chunk in appended_list:
    result = chain.invoke({'context': chunk})
    for key, value in result.items():
      print(f'{key} : {value}') # Generated QA pairs (a single context can yield multiple questions)
      f.write(f'{key} : {value}\n') # Saving output

context : Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries.
text : Generate question answer pairs from the context. 
Context : Big Mac Index The Big Mac Index is a price index published since 1986 by The Economist as an informal way of measuring the purchasing power parity (PPP) between two currencies and providing a test of the extent to which market exchange rates result in goods costing the same in different countries.
Qusetion :
Answer : 
What is the Big Mac Index and what is its purpose?
The Big Mac Index is a price index published by The Economist to measure the purchasing power parity (PPP) between two currencies and test the extent to which market exchange rates result in goods costing the same in different countries.
context : It "s