**Prompts**

• List major food insecurity reasons in 2023

• Explain malnutrition in war zones

• Explain how increasing prices impact food security

**Install necessary packages**

In [None]:
!pip install pymupdf
!pip install pdfplumber transformers
!pip install sentence-transformers


**1. Import libraries**

In [None]:
import fitz  # PyMuPDF
import pdfplumber
from google.colab import drive
import re


**2. Extract text from PDF**

In [None]:
drive.mount('/content/drive')

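def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    page_texts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() may return None for a page without text (older pdfplumber versions)
            page_texts.append(page.extract_text() or "")
    # Join with newlines so the last word of one page is not fused with the first word of the next
    return "\n".join(page_texts)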

# Open the PDF file
pdf_path = '/content/drive/My Drive/Colab Notebooks/SOFI-2023.pdf'
pdf_text = extract_text_from_pdf(pdf_path)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**3. Text processing**

In [None]:
def clean_text(text):
    # Collapse every run of whitespace (spaces, tabs, newlines) into a single space
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

cleaned_text = clean_text(pdf_text)
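
For retrieval, a long string such as `cleaned_text` is usually split into smaller passages first. The sketch below shows one simple way to do that with overlapping word windows. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names that are not part of this notebook, and the window sizes are arbitrary.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping windows of words (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

# Stand-in for cleaned_text so the example runs on its own
sample_text = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_text(sample_text, chunk_size=4, overlap=2)
print(sample_chunks)
```

In the notebook this would be called as `chunk_text(cleaned_text)`. The overlap keeps a sentence that straddles a window boundary visible in both neighbouring chunks.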

**4. Integrate a Language Model**

In [None]:
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# Initialize the model and tokenizer
model_name = "EleutherAI/gpt-neo-2.7B"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPTNeoForCausalLM.from_pretrained(model_name)

# GPT-Neo has no pad token, so the EOS token is used as the pad token during generation
def generate_response(prompt, max_new_tokens=100):
    # Truncate the prompt so that prompt + generated tokens fit in the model's context window
    max_prompt_tokens = tokenizer.model_max_length - max_new_tokens
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=max_prompt_tokens)

    # Pass the attention mask, and limit the output with max_new_tokens only (not max_length as well)
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
        num_return_sequences=1,
    )

    # The decoded text contains the prompt followed by the generated continuation
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Prompts based on your specific questions
prompts = [
    "List major Food insecurity reasons in 2023",
    "Explain malnutrition in war zones",
    "Explain how increasing prices impact food security"
]

# Generating responses for each prompt
responses = {prompt: generate_response(prompt) for prompt in prompts}

# Printing the responses
for prompt, response in responses.items():
    print(f"Prompt: {prompt}\nResponse: {response}\n")



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
The attention mask and the pad token id were not set. As a consequence, you may observ

Prompt: List major Food insecurity reasons in 2023
Response: List major Food insecurity reasons in 2023

The UK government has published its latest food security report, which shows that the number of people in the UK who are food insecure has risen to 1.6 million.

The report, which was published on Monday, shows that the number of people in the UK who are food insecure has risen to 1.6 million.

The report, which was published on Monday, shows that the number of people in the UK who are food insecure has risen to 1.6 million.


Prompt: Explain malnutrition in war zones
Response: Explain malnutrition in war zones

The United Nations has declared that the world is facing a “global emergency” over the lack of food and water in war zones.

The UN’s Food and Agriculture Organization (FAO) has warned that the world is facing a “global emergency” over the lack of food and water in war zones.

The UN’s Food and Agriculture Organization (FAO) has warned that the world is facing a “global emer
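
Both responses above repeat the same sentence, a common failure mode of greedy decoding, which is what `generate` uses when no sampling options are given. One possible adjustment, offered here as an assumption rather than as this notebook's method, is to pass sampling and repetition controls to `model.generate`. The parameter names are standard `generate` arguments in `transformers`; the values are illustrative and untuned.

```python
# Illustrative, untuned decoding settings (assumed values, not from the notebook)
generation_kwargs = {
    "max_new_tokens": 100,       # same budget as generate_response
    "do_sample": True,           # sample instead of always taking the most likely token
    "top_p": 0.9,                # nucleus sampling: keep the top 90% of probability mass
    "temperature": 0.7,          # values below 1.0 sharpen the distribution
    "no_repeat_ngram_size": 3,   # never emit the same 3-gram twice
    "repetition_penalty": 1.2,   # down-weight tokens that were already generated
}

for name, value in generation_kwargs.items():
    print(f"{name} = {value}")
```

Inside `generate_response`, these would be passed as `model.generate(input_ids, **generation_kwargs)`. Because sampling is random, the output then differs from run to run.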

**5. Enhance Retrieval System**

In [None]:
from sentence_transformers import SentenceTransformer, util

# Initialize the embedding model under its own name so it does not overwrite the GPT-Neo `model` from step 4
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Define sections based on thematic analysis of the PDF
sections = [
    "Detailed analysis on food insecurity reasons in 2023, including climate change impacts, economic factors, and geopolitical tensions.",
    "Comprehensive review on malnutrition challenges in war zones, focusing on accessibility, supply chain disruptions, and humanitarian aid effectiveness.",
    "Discussion on how increasing prices are impacting food security globally, with a focus on price volatility, income disparities, and government policy responses."
]

# Encode the sections to create embeddings
embeddings = embedder.encode(sections)

def retrieve_relevant_section(query):
    # Encode the query to create its embedding
    query_embedding = embedder.encode(query)
    # Calculate cosine similarities between the query and all section embeddings (shape: 1 x len(sections))
    scores = util.cos_sim(query_embedding, embeddings)
    # Find the index of the section with the highest similarity score, as a plain int
    top_idx = int(scores.argmax())
    # Return the most relevant section
    return sections[top_idx]

# Queries aligned with the specified prompts
queries = [
    "List major Food insecurity reasons in 2023",
    "Explain malnutrition in war zones",
    "Explain how increasing prices impact food security"
]

# Retrieve and print relevant sections for each query
for query in queries:
    relevant_section = retrieve_relevant_section(query)
    print(f"Query: {query}\nRelevant Section: {relevant_section}\n")
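
To make the ranking step concrete, the sketch below computes cosine similarity by hand on toy 3-dimensional vectors and picks the best match, which is what `util.cos_sim` followed by `argmax` does for the real embeddings. The vectors and the names `cosine_similarity`, `toy_sections`, and `toy_query` are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": each section points along its own axis
toy_sections = {
    "food insecurity reasons": [1.0, 0.0, 0.0],
    "malnutrition in war zones": [0.0, 1.0, 0.0],
    "price impacts": [0.0, 0.0, 1.0],
}
# A query vector that points mostly along the third axis
toy_query = [0.1, 0.2, 0.9]

scores = {name: cosine_similarity(toy_query, vec) for name, vec in toy_sections.items()}
best_match = max(scores, key=scores.get)
print(best_match)  # price impacts
```

Cosine similarity depends only on the angle between vectors, not their length, so a long section and a short query can still score as close matches.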

