## Library installation

In [1]:
!pip install transformers sentence-transformers langchain torch faiss-cpu numpy
!pip install langchain_community
!pip install pypdf

Collecting transformers
  Downloading transformers-4.41.0-py3-none-any.whl (9.1 MB)
[K     |████████████████████████████████| 9.1 MB 4.0 MB/s eta 0:00:01
[?25hCollecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)
[K     |████████████████████████████████| 171 kB 135.1 MB/s eta 0:00:01
[?25hCollecting langchain
  Downloading langchain-0.2.0-py3-none-any.whl (973 kB)
[K     |████████████████████████████████| 973 kB 119.1 MB/s eta 0:00:01
[?25hCollecting torch
  Downloading torch-2.3.0-cp39-cp39-manylinux1_x86_64.whl (779.1 MB)
[K     |████████████▊                   | 310.7 MB 172.8 MB/s eta 0:00:03

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



[K     |███████████████████████████████▌| 768.3 MB 174.2 MB/s eta 0:00:01

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
[K     |████████████████████████████████| 27.0 MB 145.8 MB/s eta 0:00:01
[?25hCollecting numpy
  Downloading numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
[K     |████████████████████████████████| 18.2 MB 133.4 MB/s eta 0:00:01
[?25hCollecting filelock
  Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
Collecting safetensors>=0.4.1
  Downloading safetensors-0.4.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[K     |████████████████████████████████| 1.2 MB 133.7 MB/s eta 0:00:01
Collecting huggingface-hub<1.0,>=0.23.0
  Downloading huggingface_hub-0.23.0-py3-none-any.whl (401 kB)
[K     |████████████████████████████████| 401 kB 130.5 MB/s eta 0:00:01
[?25hCollecting regex!=2019.12.17
  Downloading regex-2024.5.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (774 kB)
[K     |███████████████████

## Library configuration

In [2]:
import os
from urllib.request import urlretrieve
import numpy as np
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

## Documents choice

## Water Footprint Network - Morocco and Netherlands
Authored by A.Y. Hoekstra and A.K. Chapagain, this report examines the water footprints of Morocco and the Netherlands. It highlights Morocco's significant external water footprint and dependence on foreign water resources, focusing on the virtual water trade in agricultural products like oil crops, fruits, cereals, and livestock. The report emphasizes the importance of considering water resources in international trade and water sustainability.

## Millennium Challenge Corporation - Irrigation Evaluation Brief
This brief from the Millennium Challenge Corporation (MCC) evaluates the effectiveness of irrigation projects in Morocco. It typically analyzes project outcomes, focusing on agricultural productivity, water use efficiency, and environmental sustainability, and includes recommendations for future irrigation investments.

## IQPC Sponsor Edition
Without access to the content, a detailed description is unavailable. IQPC (Intelligence Quotient Property Company) usually publishes conference proceedings and industry-specific reports, summarizing presentations, discussions, and case studies to share knowledge among professionals.

## HAL - Geosciences Article
This article from Geosciences discusses water management in Morocco's Souss-Massa basin. It covers rainfall, dam infrastructure, and water usage, highlighting the imbalance between water supply and demand. The article also touches on national programs aimed at improving water management and addressing water scarcity.

## World Bank - Water Scarcity in Morocco
This World Bank report analyzes Morocco's water challenges, including scarcity, inefficient use, and climate change impacts. It likely provides an overview of water management strategies, investment needs, and policy recommendations to address these issues.


## Split documents to smaller chunks

In [3]:
# Load pdf files in the local directory
loader = PyPDFDirectoryLoader("./docs")

docs_before_split = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap  = 50,
)
docs_after_split = text_splitter.split_documents(docs_before_split)

docs_after_split[0]

Document(page_content='MOROCCO\nCLIMATE RISK COUNTRY PROFILE', metadata={'source': 'docs/doc.pdf', 'page': 0})

In [4]:
avg_doc_length = lambda docs: sum([len(doc.page_content) for doc in docs])//len(docs)
avg_char_before_split = avg_doc_length(docs_before_split)
avg_char_after_split = avg_doc_length(docs_after_split)

print(f'Before split, there were {len(docs_before_split)} documents loaded, with average characters equal to {avg_char_before_split}.')
print(f'After split, there were {len(docs_after_split)} documents (chunks), with average characters equal to {avg_char_after_split} (average chunk length).')

Before split, there were 205 documents loaded, with average characters equal to 2270.
After split, there were 586 documents (chunks), with average characters equal to 797 (average chunk length).


## Text Embeddings with Hugging Face models

In [5]:
import os
import torch
# Print available CUDA GPUs
if torch.cuda.is_available():
    print("CUDA is available. Number of GPUs:", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("CUDA is not available.")

CUDA is available. Number of GPUs: 1
GPU 0: NVIDIA A100-SXM4-40GB


In [6]:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print("CUDA_VISIBLE_DEVICES set to:", os.environ["CUDA_VISIBLE_DEVICES"])

CUDA_VISIBLE_DEVICES set to: 0


In [7]:
huggingface_embeddings = HuggingFaceBgeEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",  # alternatively use "sentence-transformers/all-MiniLM-l6-v2" for a light and faster experience.
    encode_kwargs={'normalize_embeddings': True}
)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [8]:
sample_embedding = np.array(huggingface_embeddings.embed_query(docs_after_split[0].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [ 2.13129763e-02 -6.72915876e-02 -3.18347290e-02  1.94846299e-02
 -2.18812209e-02  1.67022049e-02  3.33516598e-02 -4.52170670e-02
  2.75168978e-02 -2.33039055e-02  3.66176628e-02 -2.37919367e-03
  3.44308205e-02 -1.23412088e-02  4.55992967e-02 -5.49793020e-02
  6.02602772e-02 -3.68011668e-02 -1.96892265e-02  2.58359611e-02
  4.57998700e-02  1.75656341e-02  4.53522243e-02  4.84684529e-03
  2.04574335e-02  3.06395255e-02 -3.26061733e-02  1.32259727e-02
  4.76542376e-02 -1.02408491e-02  5.65457419e-02 -1.05057620e-02
  1.08465813e-02 -8.40618089e-02  1.55770294e-06 -1.99244972e-02
 -8.54423121e-02  6.21012365e-03  9.01253074e-02  1.34018445e-02
  3.49129923e-02  3.64515930e-02 -8.29524547e-03  9.05552972e-03
 -1.51977250e-02  3.38855125e-02 -3.40571329e-02 -2.83595268e-02
  3.46285962e-02  1.11265751e-02  3.01981829e-02 -1.61561358e-03
 -5.44068329e-02  3.80321704e-02  4.80201207e-02  1.23540880e-02
  1.47848506e-03  2.63242647e-02  1.96637288e-02  3

## Retrieval System for vector embeddings

FAISS (Facebook AI Similarity Search) is a library that allows developers to quickly search for embeddings of multimedia documents that are similar to each other. It solves limitations of traditional query search engines that are optimized for hash-based searches, and provides more scalable similarity search functions (nearest-neighbor search implementations).

In [9]:
vectorstore = FAISS.from_documents(docs_after_split, huggingface_embeddings)

In [10]:
query = """I'm a farmer from Morocco, can you give recommendations based on the weather on what crops I can grow?"""  

relevant_documents = vectorstore.similarity_search(query)
print(f'There are {len(relevant_documents)} documents retrieved which are relevant to the query. Display the first one:\n')
print(relevant_documents[0].page_content)

There are 4 documents retrieved which are relevant to the query. Display the first one:

15
CLIMATE RISK COUNTRY PROFILE: MOROCCOcountry’s economic growth has overall become more resilient, agriculture remains dependent on the climate and 
thus remains highly vulnerable. Cereals are a predominant crop in Morocco, planted on nearly 43% of all agricultural 
areas, however cereals are less dominant in terms of value as compared to Morocco’s other agriculture outputs. 
Key agricultural exports include citrus fruit (especially oranges), vegetables (e.g., pepper, tomato, green bean), 
almonds, table olives and olive oil, dairy products, and, more recently, blueberries, cherries and asparagus. Early 
season vegetables and specialty crops such as Argan, have the highest value for export.
Morocco has made progress in recent years to expand irrigation for commercial agriculture. Over the last 15 years, 
significant efforts have been made to increase water productivity in agriculture. This has le

## Create a retriever interface using vector store, we’ll use it later to construct Q & A chain using LangChain.

In [11]:
# Use similarity searching algorithm and return 3 most relevant documents.
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

Now we have our vector store and retrieval system ready. We then need a large language model (LLM) to process information and answer the question.

In [12]:
!pip install accelerate
!pip install --upgrade transformers torch langchain_community


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting accelerate
  Downloading accelerate-0.30.1-py3-none-any.whl (302 kB)
[K     |████████████████████████████████| 302 kB 4.2 MB/s eta 0:00:01
Installing collected packages: accelerate
Successfully installed accelerate-0.30.1


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




Hugging Face models can be run locally through the HuggingFacePipeline class.

In [None]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="gradientai/Llama-3-8B-Instruct-Gradient-1048k",
    task="text-generation",
    pipeline_kwargs={"temperature": 0.2, "max_new_tokens": 1000
                    }
)

llm = hf 
llm.invoke(query)

At a glance, our LLM generates some output that might seem plausible but not accurate or factual. That is because it has not been trained on the forcasting and does not have relevent data to make plausable recommendations.

## Q & A chain

In [None]:
# Define the temperature, humidity, and precipitation values
temperature = "25°C"
humidity = "60%"
precipitation = "10mm"

# Define the prompt template with f-strings
prompt_template = f"""
    We have provided context information below :
    ---------------------
    {{context}}
    ---------------------
     We have provided context information about prediction values for the following metrics:
    Temperature = {temperature}
    Humidity = {humidity}
    Precipitation = {precipitation}
    Take these metrics into consideration while giving your recommendation and you must hightlight them in your answer
    Given this information, please answer the question: {{question}}
"""


# Define the input variables including context, question, temperature, humidity, and precipitation
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)



## Use RetrievalQA invoke method to execute the chain

In [15]:
# Create RetrievalQA instance
retrievalQA = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)


In [None]:
# Call the QA chain with our query.
import time
temperature = "25°C"
humidity = "60%"
precipitation = "10mm"
t0=time.time()
result = retrievalQA.invoke({"query": query})
print(f"Time: {time.time()-t0}")
print(result['result'])

In [20]:
# Define functions to pretty print and visualize retrieved nodes
def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))

def visualize_retrieved_nodes(nodes) -> None:
    result_dicts = []
    for node in nodes:
        result_dict = {"Score": node.score, "Text": node.node.get_text()}
        result_dicts.append(result_dict)

    pretty_print(pd.DataFrame(result_dicts))

# Print the response
print(response.response)

# Visualize the source nodes
nodes = response.source_nodes
visualize_retrieved_nodes(nodes)



As a farmer in Morocco, the changing climate and increasing water scarcity pose significant challenges for your agricultural operations. According to the World Bank report, the country's water resources have been declining, with the available water resources decreasing from 3.1 billion m3/year (58% of satisfaction) in the past to an unknown level currently. This has led to water restrictions for collective irrigation schemes, with an average of 64% water restrictions over the last four irrigation seasons, resulting in only 36% of the theoretical needs being provided.

Based on the weather patterns and projections, it is recommended that you consider shifting some of your traditional rain-fed cereal production to more resilient crops such as olive trees or almonds, especially in fragile areas. This is because increased temperatures, prolonged dry periods, and droughts are likely to increase soil erosion and exacerbate land degradation, making it more difficult to grow cereals.

Additi

Unnamed: 0,Score,Text
0,0.041924,"15 CLIMATE RISK COUNTRY PROFILE: MOROCCOcountry’s economic growth has overall become more resilient, agriculture remains dependent on the climate and thus remains highly vulnerable. Cereals are a predominant crop in Morocco, planted on nearly 43% of all agricultural areas, however cereals are less dominant in terms of value as compared to Morocco’s other agriculture outputs. Key agricultural exports include citrus fruit (especially oranges), vegetables (e.g., pepper, tomato, green bean), almonds, table olives and olive oil, dairy products, and, more recently, blueberries, cherries and asparagus. Early season vegetables and specialty crops such as Argan, have the highest value for export. Morocco has made progress in recent years to expand irrigation for commercial agriculture. Over the last 15 years, significant efforts have been made to increase water productivity in agriculture. This has led to the integration of localized, on-farm irrigation (drip and sprinkler). At the end of 2018, the agricultural areas using modern on-farm irrigation techniques increased 3.5 times as compared to a baseline on 2008; increased water productivity no reaches 560,000 ha. The main irrigated areas are the Gharb and Loukkos in the northwest, the Tadla in the center- north of the Atlas Mountain region, Al Haouz in the Marrakech region, the Souss-Massa (SM) in the Agadir region, the Ouarzazate and Tafilalet south of the Atlas Mountains, and the Low Moulouya in the northeast. Moroccan agriculture and livestock also remain vulnerable to droughts. Additional challenges for the sector result from farmer’s inability to formally own land or are unable to provide notarized land titles, making it difficult to obtain credit or permits (e.g., for digging wells), and thereby limiting investment for irrigation and other needed inputs.47 Perhaps the most significant challenges for the sector include increasing water restrictions for collective irrigation schemes. For example, in Oum Er Bia and Tensift basins, over the last four irrigation seasons, water restrictions averaged 64% (resulting in only 36% of the theoretical needs were provided), with an extreme value of 72% in 2020–21."
1,0.163892,"Climate Change Impacts Faced with increasing climate variability, Moroccan agriculture has adapted through diversification and rising yields. Although cereal production remains dominant, there is an increasing trend towards horticulture and livestock production.48 Agriculture remains a key sector for Morocco’s economy, food security and rural livelihoods. However, the sector has suffered due to population pressures and increasingly erratic rainfall, which have pushed production to fragile and degraded land. 87% of the country’s crop total production remains primarily rainfed and thus highly vulnerable to increased rainfall variability (particularly barley and wheat). For example, the 2016 winter grain harvest saw harvested yields 70% lower than in 2015 due to widespread drought. Hotter, drier conditions are expected to increase crops’ water requirements by up to 12%, increasing demand for irrigation and further stressing limited water resources. Drought also promotes proliferation of the Hessian fly, increasing risk of damage to wheat yields. Rising temperatures are expected to reduced yields by 50%–75% of rainfed drops during dry years. Erratic precipitation and increased aridity and drought conditions will result in shortened growing seasons, reduced yields and lower productivity. Decreased water availability will continue to impact irrigation potential and in turn, reduce profitability of irrigated agriculture as alternatives require the pumping of groundwater.49 Morocco’s theoretical water allocation is 5.3 billion m3 per year, however, the average of water actually allocated over the last 11 years was 3.1 billion m3/year (58% of satisfaction). 47 World Bank (2018). Climate Variability, Drought, and Drought Management in Morocco’s Agricultural Sector. URL: http:/ / documents.worldbank.org/curated/en/353801538414553978/pdf/130404-WP-P159851-Morocco-WEB.pdf 48 World Bank (2018). Climate Variability, Drought, and Drought Management in Morocco’s Agricultural Sector. URL: http:/ / documents.worldbank.org/curated/en/353801538414553978/pdf/130404-WP-P159851-Morocco-WEB.pdf 49 USAID (2016). Climate Change Risk Profile – Morocco."


## UI

In [None]:
!pip install gradio

In [None]:
import gradio as gr

# Define the function to answer the question
def answer_question(prompt):
    # Call the QA chain with the provided prompt
    result = retrievalQA.invoke({"query": prompt})
    return result['result']

# Create Gradio Interface
gr.Interface(
    fn=answer_question,
    inputs=gr.inputs.Textbox(lines=5, label="Enter Prompt"),
    outputs="text",
    title="Question Answering System",
    description="Enter a prompt to get the answer from the model."
).launch()
