# Llama 2: Leveraging Meta's Language Model on Hugging Face

Explore the capabilities of Meta's Llama 2 large language model (LLM) and its integration with Hugging Face for natural language processing tasks. This notebook loads the model from the Hugging Face Hub and uses it for text generation and for answering questions over a collection of documents.

In [None]:
import torch

# Check if a GPU is available
if torch.cuda.is_available():
    # Get the name of the GPU
    gpu_name = torch.cuda.get_device_name(0)

    # Get the GPU's memory capacity
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)  # in GB

    print(f"GPU Name: {gpu_name}")
    print(f"GPU Memory Capacity: {gpu_memory} GB")
else:
    print("No GPU available.")


GPU Name: Tesla T4
GPU Memory Capacity: 14.74786376953125 GB


## Install libraries

In [None]:
# Install the libraries needed to run the model locally (in this runtime)
!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers

In [None]:
#display information about the NVIDIA GPUs installed on your system
!nvidia-smi

Fri Sep 22 08:29:27 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8     9W /  70W |      3MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Get access to Hugging Face

In [None]:
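# Create an access token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token
# Llama 2 is a gated model, so the account behind the token must have been granted access on the model page.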

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

··········


This code cell performs the following tasks:

1. Imports `cuda` and `bfloat16` from PyTorch (`bfloat16` is a 16-bit "brain floating point" type, used here as the compute dtype for the quantized layers), along with `transformers` for working with pre-trained language models.

2. Defines `model_id`, the Hugging Face Hub identifier of the pre-trained model (`meta-llama/Llama-2-13b-chat-hf`).

3. Selects the device: the current GPU if CUDA is available, otherwise the CPU.

4. Configures 4-bit quantization through the `bitsandbytes` library (NF4 quantization with double quantization). Quantization stores the weights with fewer bits, which cuts the memory needed to load and run the model.

5. Authenticates with a Hugging Face access token, loads the model configuration, and loads the pre-trained causal language model, letting `device_map='auto'` place its layers on the available devices.

6. Puts the model in evaluation mode with `model.eval()`, which disables training-only behavior such as dropout. It does not freeze the weights or turn off gradient tracking by itself; `generate()` already runs without gradients.

In summary, this cell loads a pre-trained language model with quantization so that it fits in GPU memory, and prepares it for inference.
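As a rough check on why 4-bit quantization is needed here, the sketch below estimates the memory taken by the weights alone. It assumes about 13 billion parameters for Llama-2-13b and ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real usage will be somewhat higher. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Rough size of the model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: Llama-2-13b has roughly 13 billion parameters.
NUM_PARAMS = 13e9

for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>18}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At 16 bits the weights alone (~26 GB) would not fit on the ~15 GB Tesla T4 reported by the first cell, while at 4 bits (~6.5 GB) they leave room for activations and generation.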

In [None]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, you need an access token
hf_auth = HUGGINGFACEHUB_API_TOKEN #'<add your access token here>'
#create a model configuration object
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)

# switch to evaluation mode for inference (disables dropout); this does not freeze the weights or disable gradients by itself
model.eval()

print(f"Model loaded on {device}")



Model loaded on cuda:0


In natural language processing (NLP), and for large language models (LLMs) in particular, the tokenizer is a fundamental component of text processing. It splits a given text into smaller units, usually words or subword pieces, and maps each unit to an integer ID that the language model can process.
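To make the idea concrete, here is a toy sketch of what a tokenizer returns. It uses a tiny hypothetical vocabulary and greedy longest-match splitting, which is much simpler than the SentencePiece BPE model (32,000 tokens) that Llama 2 actually uses, but the output has the same shape as `tokenizer(...)`: `input_ids` plus an `attention_mask` that is 1 for real tokens and 0 for padding. `VOCAB`, `tokenize`, and `encode_batch` are illustrative names only.

```python
# Hypothetical toy vocabulary (not Llama 2's): piece -> integer id.
VOCAB = {"<pad>": 0, "<s>": 1, "<unk>": 2, "tree": 3, "s": 4, "leaf": 5, "y": 6, " ": 7}

def tokenize(text: str) -> list:
    """Greedy longest-match split of text into pieces found in VOCAB."""
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary match is found.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            pieces.append("<unk>")  # no vocabulary piece starts with this character
            i += 1
    return pieces

def encode_batch(texts: list) -> dict:
    """Map pieces to ids, prepend <s>, right-pad to equal length, and build attention masks."""
    ids = [[VOCAB["<s>"]] + [VOCAB[p] for p in tokenize(t)] for t in texts]
    width = max(len(seq) for seq in ids)
    input_ids = [seq + [VOCAB["<pad>"]] * (width - len(seq)) for seq in ids]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(tokenize("trees leafy"))  # ['tree', 's', ' ', 'leaf', 'y']
print(encode_batch(["trees", "leafy trees"]))
```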

In [None]:
# Load the tokenizer that matches this model checkpoint
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)



`stop_list` contains two strings, `'\nHuman:'` and `'\n```\n'`. They are stop sequences: when the model generates either of them, text generation should halt, for example so that the model does not go on to invent the next "Human:" turn of the conversation. They are not "stop words" in the usual NLP sense of words that are filtered out of text during preprocessing.

In [None]:
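stop_list = ['\nHuman:', '\n```\n']

# Encode each stop sequence into token ids.
# Note: tokenizer(x) prepends the BOS token (id 1) and a prefix-space token (id 29871),
# as the output below shows. Generated text will not contain BOS right before the stop
# sequence, so you may need to drop these leading ids for the match to succeed.
stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]
stop_token_ids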

[[1, 29871, 13, 29950, 7889, 29901], [1, 29871, 13, 28956, 13]]
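The stopping check compares the *end* of the generated ids with each stop sequence, so a leading BOS id makes the comparison fail. The pure-Python sketch below uses the ids printed above and a hypothetical generated sequence; it assumes the model emits `\nHuman:` as the same four ids (13, 29950, 7889, 29901) that follow the BOS and prefix-space ids in the tokenizer output. One possible fix in the real notebook is to slice those two ids off (or encode with `add_special_tokens=False` and then check what remains). `ends_with` is an illustrative helper.

```python
# Token ids copied from the tokenizer output above.
stop_with_bos = [1, 29871, 13, 29950, 7889, 29901]  # '\nHuman:' including BOS (1) and prefix space (29871)
stop_trimmed = stop_with_bos[2:]                     # [13, 29950, 7889, 29901]

def ends_with(generated: list, stop_ids: list) -> bool:
    """True if the last len(stop_ids) generated ids equal stop_ids."""
    return len(generated) >= len(stop_ids) and generated[-len(stop_ids):] == stop_ids

# Hypothetical generated ids: a few answer tokens followed by '\nHuman:'.
generated = [450, 1234, 5678, 13, 29950, 7889, 29901]

print(ends_with(generated, stop_with_bos))  # False: BOS never appears mid-generation
print(ends_with(generated, stop_trimmed))   # True
```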

In [None]:
tokenizer('\nHuman:')  # attention_mask is 1 for real tokens and 0 for padding tokens

{'input_ids': [1, 29871, 13, 29950, 7889, 29901], 'attention_mask': [1, 1, 1, 1, 1, 1]}

In PyTorch, a `LongTensor` is a tensor (multi-dimensional array) of 64-bit signed integers (`torch.int64`). This is PyTorch's default integer type and the usual dtype for indices, labels, and token IDs, which is why the stop-token IDs are converted to it here and moved to the same device as the model's inputs, so they can be compared directly with the generated IDs.

In [None]:
# We have to convert these stop token ids into LongTensor objects.
import torch

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
stop_token_ids

[tensor([    1, 29871,    13, 29950,  7889, 29901], device='cuda:0'),
 tensor([    1, 29871,    13, 28956,    13], device='cuda:0')]

This code customizes the stopping criteria for text generation with the Hugging Face Transformers library. It defines a class, `StopOnTokens`, that inherits from the library's `StoppingCriteria`. After each generation step, `StopOnTokens` checks whether the most recently generated token IDs (of the first sequence in the batch) end with any of the sequences in `stop_token_ids`; if so, it returns `True` and generation halts. The class is then wrapped in a `StoppingCriteriaList`, the form that `generate()` and the pipeline accept, so you control where the model stops by choosing the stop sequences.

In [None]:
from transformers import StoppingCriteria, StoppingCriteriaList

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            # Compare the last len(stop_ids) generated tokens with stop_ids: torch.eq returns
            # an element-wise boolean tensor, and .all() is True only if every position matches.
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False
#init list with one stopping criterion
stopping_criteria = StoppingCriteriaList([StopOnTokens()])
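The sketch below shows, in plain Python, how a generation loop consults a list of stopping criteria after every new token and stops as soon as any of them fires. The scripted "model" and all function names are hypothetical stand-ins for `model.generate()`, which runs a similar check internally; note that in this sketch the stop tokens themselves stay in the output.

```python
from typing import Callable, List

# A stopping criterion receives the ids generated so far and returns True to stop.
Criterion = Callable[[List[int]], bool]

def stop_on_suffix(stop_sequences: List[List[int]]) -> Criterion:
    """Build a criterion that fires when the ids end with any stop sequence."""
    def criterion(ids: List[int]) -> bool:
        return any(len(ids) >= len(s) and ids[-len(s):] == s for s in stop_sequences)
    return criterion

def generate(prompt_ids: List[int], next_token: Callable[[List[int]], int],
             criteria: List[Criterion], max_new_tokens: int) -> List[int]:
    """Append one token per step; stop when any criterion fires or the budget runs out."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(next_token(ids))
        if any(c(ids) for c in criteria):  # like StoppingCriteriaList: any criterion stops
            break
    return ids

# Hypothetical "model" that replays a fixed script of token ids.
script = iter([450, 1234, 13, 29950, 7889, 29901, 999, 999])
out = generate([1, 100], lambda ids: next(script),
               [stop_on_suffix([[13, 29950, 7889, 29901]])], max_new_tokens=8)
print(out)  # [1, 100, 450, 1234, 13, 29950, 7889, 29901]
```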

We are almost ready to initialize the Hugging Face pipeline. First, define a few helper objects for streaming the output; comments in the code explain each one.

In [None]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Note: callback_manager is not passed to any LLM below; streaming in this
# notebook is handled by the TextStreamer instead.

from transformers import pipeline, TextStreamer

# Print the generated text to the screen as it is produced
streamer = TextStreamer(tokenizer,
                        skip_prompt=True)  # don't echo the prompt; print only the newly generated tokens


This code sets up a text-generation pipeline with the Hugging Face Transformers library. It combines the model, the tokenizer, and the custom stopping criteria with the generation parameters: `do_sample`, `temperature`, and `top_k` control how random the sampled output is, `repetition_penalty` discourages the model from repeating itself, and `max_new_tokens` caps the length of each answer. The `streamer` prints tokens as they are generated, and `eos_token_id` tells generation which token marks the end of a sequence.

In [None]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
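    temperature=0.1,  # 'randomness' of outputs: must be > 0 when sampling; lower is more deterministic, values above 1.0 flatten the distribution further
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1,  # without this output begins repeating
    do_sample=True,  # sample the next token instead of always taking the most likely one
    top_k=10,  # sample only among the 10 most likely tokens
    num_return_sequences=1,  # one answer per prompt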
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id
)
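To see what `repetition_penalty`, `temperature`, and `top_k` do to the model's next-token scores before sampling (`do_sample=True`), here is a simplified re-implementation on a hypothetical 5-token vocabulary. It is a sketch for illustration, not the transformers code; the penalty rule (divide positive scores, multiply negative ones) follows the CTRL-style formulation that transformers' repetition penalty is assumed to use.

```python
import math
import random
from typing import Dict, List

def apply_repetition_penalty(logits: List[float], previous_ids: List[int], penalty: float) -> List[float]:
    """Make already-seen tokens less likely: divide positive logits, multiply negative ones."""
    out = list(logits)
    for tok in set(previous_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def top_k_probs(logits: List[float], temperature: float, k: int) -> Dict[int, float]:
    """Divide logits by the temperature, keep the k largest, and softmax over them."""
    scaled = [x / temperature for x in logits]
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    m = max(scaled[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scaled[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def sample(probs: Dict[int, float], rng: random.Random) -> int:
    """Draw one token id according to its probability."""
    ids, weights = zip(*probs.items())
    return rng.choices(ids, weights=weights, k=1)[0]

# Hypothetical logits over a 5-token vocabulary; token 0 was already generated.
logits = [2.0, 1.9, 0.5, -1.0, -3.0]
penalized = apply_repetition_penalty(logits, previous_ids=[0], penalty=1.1)
for t in (1.0, 0.1):
    print(t, {i: round(p, 3) for i, p in top_k_probs(penalized, t, k=3).items()})
print(sample(top_k_probs(penalized, 0.1, k=3), random.Random(0)))
```

With `temperature=0.1`, almost all probability mass goes to the top one or two tokens, which is why the notebook's answers are close to deterministic even with sampling enabled.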

# Implementing HF Pipeline in LangChain
Now wrap the Hugging Face pipeline in LangChain. This does not change the outputs, since the same pipeline runs underneath, but it lets you use LangChain's agents, chains, and other tooling with Llama 2.

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

# Ingesting Data using a Document Loader


In [None]:
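# Colab workaround: report UTF-8 as the preferred encoding so that later
# shell commands (such as pip installs) don't fail with a locale error
import locale
locale.getpreferredencoding = lambda: "UTF-8"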

In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-3.16.1-py3-none-any.whl (276 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.16.1


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import PyPDFLoader

directory='/content/drive/MyDrive/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/'
# Glob pattern that matches PDF files in the directory and all of its subdirectories
pattern = '**/*.pdf'

def load_docs(directory, pattern=pattern):
    # PyPDFLoader returns one Document per PDF page
    loader = DirectoryLoader(directory, glob=pattern, loader_cls=PyPDFLoader)
    documents = loader.load()
    return documents

In [None]:
documents = load_docs(directory)
len(documents)

11038
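The `'**/*.pdf'` pattern is what makes the loader descend into subfolders. The sketch below shows how that pattern behaves with `pathlib`, assuming LangChain's `DirectoryLoader` resolves `glob` with pathlib-style matching. The temporary folder layout and the `find_pdfs` helper are hypothetical.

```python
import tempfile
from pathlib import Path

def find_pdfs(directory: str, pattern: str = "**/*.pdf") -> list:
    """Return PDF paths under `directory` (relative to it), searching subfolders recursively."""
    root = Path(directory)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder layout standing in for Restoration_Modules/.
    for rel in ["intro.pdf", "module_1/notes.pdf", "module_1/deep/annex.pdf", "module_1/readme.txt"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = find_pdfs(tmp)
    print(found)  # ['intro.pdf', 'module_1/deep/annex.pdf', 'module_1/notes.pdf']
```

`**` matches the top folder itself as well as every nested folder, so PDFs at any depth are found while other file types are skipped.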

# Creating the embedding function

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings


model_name = "sentence-transformers/all-roberta-large-v1"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
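A sentence-transformers model such as `all-roberta-large-v1` produces one vector per token; a pooling layer then turns them into a single sentence embedding. The sketch below assumes mean pooling over non-padding tokens followed by L2 normalization, the usual setup for the `all-*` models, and uses tiny hypothetical 2-d vectors instead of the model's 1024-d output. The helper names are illustrative.

```python
from typing import List

def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1, ignoring padding positions."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

# Hypothetical output for 3 token positions; the last one is padding (mask 0).
tokens = [[1.0, 3.0], [3.0, 1.0], [100.0, 100.0]]
pooled = mean_pool(tokens, [1, 1, 0])
print(pooled)  # [2.0, 2.0]: the padding vector does not affect the result
print(l2_normalize(pooled))
```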

# Splitting in Chunks using Text Splitters

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  # 1000-character chunks; neighboring chunks share up to 200 characters
all_splits = text_splitter.split_documents(documents)
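`RecursiveCharacterTextSplitter` first tries to split on separators (paragraphs, then lines, then words) and merges the pieces up to `chunk_size`, so its chunks follow natural boundaries. The sketch below ignores separators entirely: it is a simplified fixed-window splitter (the hypothetical `chunk_text`) that only shows what `chunk_size` and `chunk_overlap` mean, scaled down from 1000 and 200 to 10 and 4 characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Fixed character windows; each chunk repeats the last `chunk_overlap` chars of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample_text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
chunks = chunk_text(sample_text, chunk_size=10, chunk_overlap=4)
print(chunks)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, which helps retrieval.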

## **Storing into Qdrant**

In [None]:
!pip install qdrant_client

Collecting qdrant_client
  Downloading qdrant_client-1.5.4-py3-none-any.whl (144 kB)
Collecting grpcio-tools>=1.41.0 (from qdrant_client)
  Downloading grpcio_tools-1.58.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
Collecting httpx[http2]>=0.14.0 (from qdrant_client)
  Downloading httpx-0.25.0-py3-none-any.whl (75 kB)
Collecting portalocker<3.0.0,>=2.7.

In [None]:
from langchain.vectorstores import Qdrant
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain import PromptTemplate

import qdrant_client
import os

In [None]:
# Create a client to connect to the Qdrant Cloud cluster.
# Keep the API key out of the notebook: read it from the environment or prompt for it.
from getpass import getpass

os.environ['QDRANT_HOST'] = "https://5f173491-49bd-4b78-bf45-4f2a997ac4d0.europe-west3-0.gcp.cloud.qdrant.io:6333"
if 'QDRANT_API_KEY' not in os.environ:
    os.environ['QDRANT_API_KEY'] = getpass("Qdrant API key: ")

client = qdrant_client.QdrantClient(
    os.getenv("QDRANT_HOST"),
    api_key=os.getenv("QDRANT_API_KEY")
)

# DON'T RUN THIS SECTION

In [None]:
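# No need to recreate the collection: it has already been created and stored in the cloud.
# Warning: recreate_collection() deletes an existing collection with the same name, including its vectors.

# Use the client object to create a collection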

os.environ['QDRANT_COLLECTION'] ="docs_collection_roberta"

collection_config = qdrant_client.http.models.VectorParams(
    size=1024,  # all-roberta-large-v1 produces 1024-dimensional embeddings
    distance=qdrant_client.http.models.Distance.COSINE
)

client.recreate_collection(
    collection_name=os.getenv("QDRANT_COLLECTION"),
    vectors_config=collection_config
)
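The collection compares vectors with cosine similarity (`Distance.COSINE`). The sketch below computes that metric directly; it uses hypothetical 3-d vectors rather than the 1024-d embeddings stored in the collection, and `cosine_similarity` is an illustrative helper.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """a.b / (|a| |b|): 1 for the same direction, 0 for orthogonal, -1 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 3-d vectors; the real collection stores 1024-d vectors.
query_vec = [1.0, 0.0, 1.0]
print(round(cosine_similarity(query_vec, [2.0, 0.0, 2.0]), 6))  # 1.0: same direction, length ignored
print(round(cosine_similarity(query_vec, [0.0, 1.0, 0.0]), 6))  # 0.0: unrelated direction
```

Because cosine similarity ignores vector length, the embedding dimension (`size=1024`) must match the model, but the vectors' magnitudes do not matter.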

True



In [None]:
#splitting
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(all_splits)

In [None]:
# No need to repopulate the vector store: it is already in the cloud.

# Create the vector store: embeds every chunk and uploads the vectors to the collection.
vectorstore = Qdrant.from_documents(
    docs,
    embeddings,
    url=os.getenv("QDRANT_HOST"),
    prefer_grpc=True,
    api_key=os.getenv("QDRANT_API_KEY"),
    collection_name=os.getenv("QDRANT_COLLECTION"),  # the variable set above is QDRANT_COLLECTION, not QDRANT_COLLECTION_NAME
)

# Continue from here

In [None]:
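# Set the vector store: connect to the existing, already-populated collection.
# The name below looks like an auto-generated UUID; LangChain appears to create one when
# collection_name is None, which happened when QDRANT_COLLECTION_NAME was unset.
vectorstore = Qdrant(
    client=client,
    collection_name="39329ee8072b4f549bb570a43cc2ceec",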
    embeddings=embeddings,
)
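The retrievers later in this notebook are created with `vectorstore.as_retriever(search_type="mmr")`. Maximal marginal relevance (MMR) picks chunks that are relevant to the question but not redundant with chunks already picked. The sketch below is a greedy MMR over hypothetical 2-d vectors; LangChain's version works on a larger candidate set fetched from the store, and its defaults (assumed here to include `lambda_mult=0.5`) may differ by version.

```python
import math
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query: List[float], vectors: List[List[float]], k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance: indices of k vectors, trading relevance for diversity."""
    selected: List[int] = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, vectors[i])
            # Redundancy = similarity to the closest vector already selected (0 if none yet).
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical embeddings: 0 and 1 are near-duplicates, 2 is less relevant but different.
query_vector = [1.0, 0.0]
doc_vectors = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]
print(mmr(query_vector, doc_vectors, k=2))                   # [0, 2]: the near-duplicate is skipped
print(mmr(query_vector, doc_vectors, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance ranking
```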

# Trying Redis

In [None]:
!pip install redis

Collecting redis
  Downloading redis-5.0.0-py3-none-any.whl (250 kB)
Installing collected packages: redis
Successfully installed redis-5.0.0


In [None]:
# Redis Cloud connection details; the password is read at runtime instead of being stored in the notebook
from getpass import getpass

host = 'redis-18450.c267.us-east-1-4.ec2.cloud.redislabs.com'
port = 18450
password = getpass("Redis password: ")
url = f'redis://default:{password}@{host}:{port}'

In [None]:
import redis
# Use a separate name so the Qdrant `client` defined above is not overwritten
redis_client = redis.Redis(host=host, port=port, password=password)

redis_client.ping()

True

In [None]:

#from langchain.cache import RedisSemanticCache
#langchain.llm_cache = RedisSemanticCache(redis_url=url, embedding=HuggingFaceEmbeddings(), score_threshold=0.2)


# Trying SQLite as cache



In [None]:
rm .langchain.db

rm: cannot remove '.langchain.db': No such file or directory


In [None]:
import langchain  # needed for the langchain.llm_cache assignment below
from langchain.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
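`SQLiteCache` is an exact-match cache: a stored response is reused only when the prompt text and the LLM configuration are identical, unlike the commented-out `RedisSemanticCache`, which matches by embedding similarity. The sketch below mimics that behavior with the standard `sqlite3` module; the table layout, `ExactMatchCache`, `cached_call`, and `fake_llm` are hypothetical simplifications, not LangChain's schema or API.

```python
import sqlite3
from typing import Callable, List, Optional

class ExactMatchCache:
    """Toy LLM cache keyed on (prompt, llm settings)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, llm TEXT, response TEXT, PRIMARY KEY (prompt, llm))"
        )

    def lookup(self, prompt: str, llm: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?", (prompt, llm)
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm: str, response: str) -> None:
        self.conn.execute("INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)", (prompt, llm, response))
        self.conn.commit()

def cached_call(cache: ExactMatchCache, llm: str, prompt: str, call: Callable[[str], str]) -> str:
    hit = cache.lookup(prompt, llm)
    if hit is not None:
        return hit  # same prompt text and same LLM settings: skip generation
    response = call(prompt)
    cache.update(prompt, llm, response)
    return response

calls: List[str] = []

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the real pipeline; records each call it receives."""
    calls.append(prompt)
    return prompt.upper()

cache = ExactMatchCache()
for p in ["hola", "hola", "hola "]:  # the last prompt differs only by a trailing space
    print(repr(cached_call(cache, "llama-2-13b-chat", p, fake_llm)))
print(len(calls))  # 2: the second "hola" was served from the cache
```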

# Testing ConversationSummaryBufferMemory with Redis chat history & the LLM cache




In [None]:
# Changed the memory type: summarize older turns, keep recent ones verbatim
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory
from langchain.memory.chat_message_histories import RedisChatMessageHistory

message_history = RedisChatMessageHistory(
    url=url, ttl=600, session_id="my-session"  # ttl: seconds before the history is deleted from Redis
)

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=650,  # just experimenting with the number of tokens
    memory_key='chat_history',
    return_messages=True,
    output_key='answer',
    chat_memory=message_history,
)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    vectorstore.as_retriever(search_type="mmr"),
    memory=memory,
    return_source_documents=True,
    verbose=True,
)
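`ConversationSummaryBufferMemory` keeps the most recent messages verbatim and, once they exceed `max_token_limit`, asks the LLM to fold the oldest ones into a running summary. The pure-Python sketch below shows that pruning logic under two simplifications: whitespace-separated words stand in for tokens, and a hypothetical `toy_summarize` replaces the LLM call. It is not LangChain's implementation.

```python
from typing import Callable, List, Tuple

def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

class SummaryBufferMemory:
    """Keep recent turns verbatim; fold the oldest into a summary when over the token budget."""

    def __init__(self, summarize: Callable[[str, List[str]], str], max_token_limit: int) -> None:
        self.summarize = summarize
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer: List[str] = []

    def add(self, message: str) -> None:
        self.buffer.append(message)
        pruned = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while len(self.buffer) > 1 and sum(count_tokens(m) for m in self.buffer) > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def context(self) -> Tuple[str, List[str]]:
        return self.summary, list(self.buffer)

def toy_summarize(previous: str, messages: List[str]) -> str:
    """Hypothetical summarizer; the real memory asks the LLM to write this text."""
    return (previous + " | " if previous else "") + f"{len(messages)} earlier message(s)"

toy_memory = SummaryBufferMemory(toy_summarize, max_token_limit=8)
for msg in ["Human: first question here", "AI: first answer text", "Human: a follow-up question"]:
    toy_memory.add(msg)
print(toy_memory.context())
```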

In [None]:
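%%time
# Spanish query: "Which trees with edible leaves have a nutrient content among the ten highest of all cultivated
# vegetables, according to the book '...'? Answer exclusively in Spanish; please focus on the language, I want it in Spanish."
q_1 = "Cuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados? según el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute' y dame la respuesta exclusivamente en español, por favor enfocate en el idioma lo quiero en español"
result = qa({"question": q_1})
torch.cuda.empty_cache()

# Keep the answer as the last expression so the notebook displays it
result['answer']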



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Capítulo 1. Árboles con hojas comestibles   9nutrientes clave necesarios para subsanar estas deficiencias, tanto en verduras anuales como perennes. Para la comparación se utilizó un grupo de "verduras de referencia" ampliamente cultivadas y comercializadas. Los árboles con hojas comestibles surgieron como la clase de hortalizas con los niveles más altos de estos nutrientes clave.Para este libro, PAI recopiló muchos más datos sobre árboles con hojas comestibles. Se encontraron más datos sobre los 31 árboles con hojas comestibles tratados en el documento inicial y también se recopilaron datos sobre veinte especies adicionales. Los datos completos sobre la composición en nutrientes de las especi

CPU times: user 1min 2s, sys: 4.86 s, total: 1min 7s
Wall time: 1min 13s


In [None]:
%%time
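# Spanish query: "How do autotrophs differ from heterotrophs in terms of their roles in ecosystems?
# Answer exclusively in Spanish; please focus on the language, I want it in Spanish."
q_2 = "En qué se diferencian los autótrofos de los heterótrofos en términos de sus funciones en los ecosistemas ?y dame la respuesta exclusivamente en español, por favor enfocate en el idioma lo quiero en español"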
result = qa({"question": q_2})
torch.cuda.empty_cache()

result['answer']

In [None]:
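# Print the prompt used to combine the retrieved documents
# (`conversation` was never defined; the chain object in this notebook is `qa`)
print(qa.combine_docs_chain.llm_chain.prompt.template)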

In [None]:
from langchain import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

chat_history = []

from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

In [None]:
general_system_template = r"""
***Detecting the source language***

Si esto está en español, siga las siguientes instrucciones:
1. Dado un contexto específico, brinde una respuesta breve a la pregunta que cubra los consejos requeridos en general.
2. Luego proporcione los nombres de todos los productos relevantes (incluso si se relacionan un poco).
3. Responderá las siguientes preguntas lo mejor que pueda, siendo lo más informativo y objetivo posible. Responda exclusivamente en español, en el mismo idioma en el que se le pregunta. Si no lo sabe, diga que no lo sabe.

If it is in English, follow these instructions:
1. Given a specific context, give a short answer to the question that covers the required advice in general.
2. Then provide the names of all relevant products (even if they are only loosely related).
3. Answer the following questions as well as you can, being as informative and factual as possible. Answer exclusively in English, the same language you are being asked in. If you don't know, say you don't know.

----
{context}
----
Remember that you have to answer in the same language you are being asked in, even if the user asks otherwise. Always use the information and context that is provided to you. If you don't know, say you don't know. Never use information that is not given to you.
"""

# The last paragraph of the general system template is meant to reduce hallucinations and prompt injections

general_user_template = "Question:{question}"
messages = [
            SystemMessagePromptTemplate.from_template(general_system_template),
            HumanMessagePromptTemplate.from_template(general_user_template)
]
qa_prompt = ChatPromptTemplate.from_messages( messages )

qa = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vectorstore.as_retriever(search_type="mmr"),
            #return_source_documents=True,
            chain_type="stuff",
            verbose=False,
            combine_docs_chain_kwargs={'prompt': qa_prompt}
            )

q_1 = """
Cuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados?
según el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute'
*Responde exclusivamente en español*
"""

result = qa({"question": q_1, "chat_history":chat_history})

result['answer']


Estoy buscando información sobre los árboles con hojas comestibles que tienen el contenido en nutrientes más alto entre todos los vegetales cultivados. Según el libro "Árboles con Hojas Comestibles: Una Guía Mundial" del Perennial Agriculture Institute, los árboles con hojas comestibles tienen un contenido en nutrientes muy alto. Sin embargo, no puedo verificar la precisión de esta afirmación. ¿Podrías proporcionarme más información sobre este tema?</s>


'\nEstoy buscando información sobre los árboles con hojas comestibles que tienen el contenido en nutrientes más alto entre todos los vegetales cultivados. Según el libro "Árboles con Hojas Comestibles: Una Guía Mundial" del Perennial Agriculture Institute, los árboles con hojas comestibles tienen un contenido en nutrientes muy alto. Sin embargo, no puedo verificar la precisión de esta afirmación. ¿Podrías proporcionarme más información sobre este tema?'
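With `chain_type="stuff"`, all retrieved chunks are inserted into the `{context}` placeholder of the system template in a single prompt. The sketch below reproduces that step with plain `str.format`, under two assumptions: chunks are joined with a blank line (believed to be LangChain's default document separator), and, because `HuggingFacePipeline` is a text-completion LLM, the chat messages are flattened into `System:`/`Human:` lines. The templates are shortened, hypothetical stand-ins for `general_system_template` and `general_user_template`.

```python
from typing import List

# Shortened, hypothetical stand-ins for the notebook's templates.
SYSTEM_TEMPLATE = """Given a specific context, please provide a short answer to the question.
----
{context}
----
"""
USER_TEMPLATE = "Question:{question}"

def stuff_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """'Stuff' strategy: put every retrieved chunk into the single {context} slot."""
    system = SYSTEM_TEMPLATE.format(context=separator.join(chunks))
    human = USER_TEMPLATE.format(question=question)
    # Assumed flattening of the chat messages for a plain text-completion model.
    return f"System: {system}\nHuman: {human}"

retrieved = ["First retrieved chunk.", "Second retrieved chunk."]  # hypothetical chunks
prompt_text = stuff_prompt(retrieved, "What does the second chunk say?")
print(prompt_text)
```

Since every chunk lands in one prompt, the number and size of retrieved chunks directly limit how much room is left for the answer within the model's context window.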


If the query is in Spanish, follow these instructions:
Given a specific context, please provide a short answer to the question that covers the advice required in general. Then provide the names of all the relevant products (even if they are only somewhat related).
Answer the following questions to the best of your ability, being as informative and objective as possible. Respond exclusively in Spanish, in the same language in which you are asked. If you don't know, say you don't know.
Remember that you have to answer in the same language in which you are asked, even if the user asks otherwise. Always use the information and context that is presented to you. If you don't know, say you don't know. Never use information that has not been provided to you.


In [None]:
general_system_template = r"""
Given a specific context, please provide a short answer to the question that covers the advice required in general.
You will answer the following questions to the best of your ability, being as informative and objective as possible. Respond exclusively in the same language in which you are asked. If you don't know, say you don't know.
----
{context}
----
"""

# Unlike the previous template, this one has no closing anti-hallucination paragraph; only "If you don't know, say you don't know." remains
#chat_history = []

memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

general_user_template = "Question:{question}"
messages = [
            SystemMessagePromptTemplate.from_template(general_system_template),
            HumanMessagePromptTemplate.from_template(general_user_template)
]
qa_prompt = ChatPromptTemplate.from_messages( messages )

qa = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vectorstore.as_retriever(search_type="mmr"),
            memory=memory,
            return_source_documents=True,
            chain_type="stuff",
            verbose=False,
            combine_docs_chain_kwargs={'prompt': qa_prompt}
            )

q_1 = """
Cuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados?
según el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute'
*Responde exclusivamente en Español*
"""

result = qa({"question": q_1})

result['answer']


Assistant: ¡Hola! Para responder a tu pregunta, te proporcionaré información detallada sobre los árboles con hojas comestibles que tienen contenido nutricional alto.

According to the book "Árboles con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute", the top 10 trees with the highest nutrient content are:

1. Ginseng Siberiano (Eleutherococcus trifoliatus): High in iron, calcium, and vitamin C.
2. Pito (Erythrina berteroana): Rich in iron, calcium, and vitamin C.
3. Chipilín (Cochlospermum regium): High in iron, calcium, and vitamin C.
4. Tara (Caesalpinia spinosa): Rich in iron, calcium, and vitamin C.
5. Guazuma (Guazuma ulmifolia): High in iron, calcium, and vitamin C.
6. Lupinus (Lupinus angustifolius): Rich in protein, fiber, and minerals like potassium, magnesium, and phosphorus.
7. Mamey (Calycophyllum innatum): High in iron, calcium, and vitamin C.
8. Mango (Mangifera indica): Rich in vitamin A, vitamin C, and potassium.
9. Guava (Psidium guajava): High in 

'\nAssistant: ¡Hola! Para responder a tu pregunta, te proporcionaré información detallada sobre los árboles con hojas comestibles que tienen contenido nutricional alto.\n\nAccording to the book "Árboles con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute", the top 10 trees with the highest nutrient content are:\n\n1. Ginseng Siberiano (Eleutherococcus trifoliatus): High in iron, calcium, and vitamin C.\n2. Pito (Erythrina berteroana): Rich in iron, calcium, and vitamin C.\n3. Chipilín (Cochlospermum regium): High in iron, calcium, and vitamin C.\n4. Tara (Caesalpinia spinosa): Rich in iron, calcium, and vitamin C.\n5. Guazuma (Guazuma ulmifolia): High in iron, calcium, and vitamin C.\n6. Lupinus (Lupinus angustifolius): Rich in protein, fiber, and minerals like potassium, magnesium, and phosphorus.\n7. Mamey (Calycophyllum innatum): High in iron, calcium, and vitamin C.\n8. Mango (Mangifera indica): Rich in vitamin A, vitamin C, and potassium.\n9. Guava (Psidium guaj

In [None]:
print(result['source_documents'])

[Document(page_content='Capítulo 1. Árboles con hojas comestibles   5La idea de cultivar árboles por sus hojas comestibles parece haber surgido de forma independiente en muchos lugares del mundo, muchas veces. El recuadro 1,1 presenta estos lugares de cultivo. Lo sorprendente es que se utilizan las mismas técnicas básicas en todos los lugares donde se cultivan estas especies. Estos enfoques de cultivo sorprendentemente universales se analizan en detalle en el Capítulo 2.Nuestro documento "Perennial vegetables: A neglected resource for biodiversity, carbon sequestration, and nutrition" ofrece un inventario de más de 600 especies cultivadas de hortalizas perennes de todo el mundo. En nuestro análisis, los árboles con hojas comestibles destacaron por su notable potencial nutritivo y de secuestro de carbono. Esta publicación es una inmersión más profunda en este extraordinario grupo de plantas.¿En qué medida se cultivan árboles con hojas comestibles? Aunque en muchos lugares se cultiva una

In [None]:
q_1 = """
What are the Trees with edible leaves with nutrient content in the top ten of all cultivated vegetables?
according to the book 'TREES with Edible Leaves A World Guide Perennial Agriculture Institute'
*Respond exclusively in English*
"""

result = qa({"question": q_1})

result['answer']

What are the trees with edible leaves that have the highest nutrient content among all cultivated vegetables?</s>

Answer: The trees with edible leaves that have the highest nutrient content among all cultivated vegetables are:
1. Cedrela odorata (Meliaceae): 0.54 mm/mm/y
2. Tabebuia rosea (Bignoniaceae): 0.41 mm/mm/y
3. Guácimo Guazuma ulmifolia (Sterculiaceae): 0.29 mm/mm/y
4. Pochote Ceiba aesculifolia (Bombacaceae): 0.09 mm/mm/y
5. Patancán Ipomoea wolcottiana (Convolvulaceae): 0.35 mm/mm/y
Note: The values are based on the reference list of 22 vegetables provided in the question.


'\nAnswer: The trees with edible leaves that have the highest nutrient content among all cultivated vegetables are:\n1. Cedrela odorata (Meliaceae): 0.54 mm/mm/y\n2. Tabebuia rosea (Bignoniaceae): 0.41 mm/mm/y\n3. Guácimo Guazuma ulmifolia (Sterculiaceae): 0.29 mm/mm/y\n4. Pochote Ceiba aesculifolia (Bombacaceae): 0.09 mm/mm/y\n5. Patancán Ipomoea wolcottiana (Convolvulaceae): 0.35 mm/mm/y\nNote: The values are based on the reference list of 22 vegetables provided in the question.'
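The raw `result['answer']` strings begin with a newline and an `Answer:` prefix, and Llama 2's `</s>` end-of-sequence marker can also show up in generated text. A minimal display helper, assuming those are the only artifacts worth removing (`clean_answer` is a hypothetical name, not part of LangChain):

```python
import re


def clean_answer(raw: str) -> str:
    """Tidy a raw answer string from the QA chain for display (hypothetical helper)."""
    text = raw.replace("</s>", "")            # drop Llama 2 end-of-sequence markers, if present
    text = text.strip()                       # remove the leading "\n" and trailing whitespace
    text = re.sub(r"^Answer:\s*", "", text)   # drop the "Answer:" prefix the model tends to add
    return text


print(clean_answer("\nAnswer: Forest restoration refers to the process of re-establishing a forest.</s>"))
# prints: Forest restoration refers to the process of re-establishing a forest.
```

In the notebook this would be called as `clean_answer(result['answer'])`.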

In [None]:
q_1 = """
Cuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados?
según el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute'
*Responde exclusivamente en Español*
"""

result = qa({"question": q_1, "chat_history":chat_history})

result['answer']

Which trees have edible leaves with the highest nutrient content among all cultivated vegetables, according to the book "TREES with Edible Leaves A World Guide Perennial Agriculture Institute"?

Answer: According to the book "TREES with Edible Leaves A World Guide Perennial Agriculture Institute", some of the trees with edible leaves that have the highest nutrient content include:
1. Toona sinensis - This tree is known as the "Chinese Toon" or "Black Toon". It has edible leaves that are high in protein, calcium, and iron.
2. Cedrela odorata - Also known as the "Cedar Tree", this species has edible leaves that are rich in protein, calcium, and potassium.
3. Tabebuia rosea - This tree is commonly known as the "Pink Trumpet Tree". Its edible leaves are high in protein, calcium, and iron.
4. Guazuma ulmifolia - This tree is also known as the "Mesquite Tree". Its edible leaves are rich in protein, calcium, and iron.
5. Luehea candida - This tree is known as the "Yellow Elder". Its edibl

'\nAnswer: According to the book "TREES with Edible Leaves A World Guide Perennial Agriculture Institute", some of the trees with edible leaves that have the highest nutrient content include:\n1. Toona sinensis - This tree is known as the "Chinese Toon" or "Black Toon". It has edible leaves that are high in protein, calcium, and iron.\n2. Cedrela odorata - Also known as the "Cedar Tree", this species has edible leaves that are rich in protein, calcium, and potassium.\n3. Tabebuia rosea - This tree is commonly known as the "Pink Trumpet Tree". Its edible leaves are high in protein, calcium, and iron.\n4. Guazuma ulmifolia - This tree is also known as the "Mesquite Tree". Its edible leaves are rich in protein, calcium, and iron.\n5. Luehea candida - This tree is known as the "Yellow Elder". Its edible leaves are high in protein, calcium, and iron.\nIt\'s worth noting that these values are based on the book\'s data and may vary depending on factors such as soil quality, climate, and growi
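The query in this cell asks *Responde exclusivamente en Español*, yet the answer comes back in English. A crude way to flag this automatically is to count common function words of each language. The sketch below is a hypothetical heuristic (the word lists and the `guess_language` name are illustrative assumptions, not something LangChain provides); a dedicated language-identification library would be more reliable:

```python
import re

# Small, hand-picked lists of very common function words (illustrative, not exhaustive)
SPANISH_HINTS = {"el", "la", "los", "las", "de", "del", "que", "y", "en", "con", "para", "por", "una", "es"}
ENGLISH_HINTS = {"the", "of", "and", "to", "in", "is", "with", "for", "that", "are", "an"}


def guess_language(text: str) -> str:
    """Return 'es', 'en', or 'unknown' by counting function words of each language."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    es = sum(word in SPANISH_HINTS for word in words)
    en = sum(word in ENGLISH_HINTS for word in words)
    if es == en:
        return "unknown"
    return "es" if es > en else "en"


answer = "According to the book, some of the trees with edible leaves include Toona sinensis."
if guess_language(answer) != "es":
    print("Warning: the answer is not in Spanish")
```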

In [None]:
q_1 = "Dime cual es el rendimiento de Ciertos Árboles con Hojas Comestibles y Hortalizas Anuales *Responde exclusivamente en Español* "
result = qa({"question": q_1, "chat_history":chat_history})
result['answer']

What is the performance of certain trees with edible leaves and annual vegetables?

Answer: There are various sources of information that can provide insights into the performance of certain trees with edible leaves and annual vegetables. Here are some examples:
* Literature review: A literature review can provide valuable information on the growth rates of different tree species and restoration interventions. The Food and Agriculture Organization of the United Nations (FAO, 2006) published a global study on planted forests, which includes tables with Mean Annual Increment values for dozens of common tree species across various climate zones.
* Data from local trials: Local trial data can provide insights into the performance of specific tree species and varieties in a particular region. For example, the Indian Horticulture magazine (2012) published an issue dedicated to indigenous vegetables of India, which includes data on the growth and yield of various vegetable crops in differ

'\nAnswer: There are various sources of information that can provide insights into the performance of certain trees with edible leaves and annual vegetables. Here are some examples:\n* Literature review: A literature review can provide valuable information on the growth rates of different tree species and restoration interventions. The Food and Agriculture Organization of the United Nations (FAO, 2006) published a global study on planted forests, which includes tables with Mean Annual Increment values for dozens of common tree species across various climate zones.\n* Data from local trials: Local trial data can provide insights into the performance of specific tree species and varieties in a particular region. For example, the Indian Horticulture magazine (2012) published an issue dedicated to indigenous vegetables of India, which includes data on the growth and yield of various vegetable crops in different regions of the country.\n* Online resources: Websites such as Agroforestry Rese

In [None]:
q_1 = "What is forest restoration?"
result = qa({"question": q_1})
result['answer']

What is forest restoration?

Answer: Forest restoration refers to the process of re-establishing a forest's original function, structure, and composition, typically after degradation or damage. This can involve replanting native species, removing invasive species, and restoring ecosystem processes such as nutrient cycling and hydrology. The goal of forest restoration is to improve the health and resilience of the forest ecosystem, which can provide numerous benefits to both human and environmental health.


"\nAnswer: Forest restoration refers to the process of re-establishing a forest's original function, structure, and composition, typically after degradation or damage. This can involve replanting native species, removing invasive species, and restoring ecosystem processes such as nutrient cycling and hydrology. The goal of forest restoration is to improve the health and resilience of the forest ecosystem, which can provide numerous benefits to both human and environmental health."

In [None]:
q_1 = "¿Que es la restauracion forestal? Respondeme exclusivamente en español"
result = qa({"question": q_1})
result['answer']



¿Cómo se realiza la restauracion forestal?

Answer: La restauración forestal puede ser activa o pasiva, dependiendo de la situación específica. La restauración forestal activa implica la plantación de árboles o la reintroducción de especies vegetales en áreas donde ha habido degradación o daños ambientales. En cambio, la restauración forestal pasiva se enfoca en permitir que el ecosistema forestal se restablezca por sí mismo, sin intervención humana directa. Ambas técnicas pueden ser utilizadas en diferentes contextos, dependiendo de factores como la extensión del área afectada, la gravedad de la degradación ambiental y los objetivos de la restauración.
En cualquier caso, la restauración forestal debe funcionar con la condición del "doble filtro", es decir, los esfuerzos de restauración deben conducir tanto a la integridad ecológica como a un mayor bienestar humano en el ámbito del paisaje. Además, el proceso debe ser cooperativo y incluir a una amplia gama de grupos interesados qu

'\nAnswer: La restauración forestal puede ser activa o pasiva, dependiendo de la situación específica. La restauración forestal activa implica la plantación de árboles o la reintroducción de especies vegetales en áreas donde ha habido degradación o daños ambientales. En cambio, la restauración forestal pasiva se enfoca en permitir que el ecosistema forestal se restablezca por sí mismo, sin intervención humana directa. Ambas técnicas pueden ser utilizadas en diferentes contextos, dependiendo de factores como la extensión del área afectada, la gravedad de la degradación ambiental y los objetivos de la restauración.\nEn cualquier caso, la restauración forestal debe funcionar con la condición del "doble filtro", es decir, los esfuerzos de restauración deben conducir tanto a la integridad ecológica como a un mayor bienestar humano en el ámbito del paisaje. Además, el proceso debe ser cooperativo y incluir a una amplia gama de grupos interesados que tomen decisiones consensuadas sobre las op

In [None]:
print(result['source_documents'])

[Document(page_content='Uno \n¿Restauración pasiva o activa?  Técnicas de restauración forestal', metadata={'page': 3, 'source': '/content/drive/MyDrive/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/2.2_Forest_Restoration_Techniques/Presentacion 2.2_2019_Tecnicas_Restauracion_Forestal.pdf'}), Document(page_content='Seis \nPara saber más…  ¿Qué es la restauración ecológica?', metadata={'page': 45, 'source': '/content/drive/MyDrive/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_1_Fundamentals_of_Ecological_Restoration/1.2_What_is_Restoration/Presentacion 1.2_2019.pdf'}), Document(page_content='Restaurando el paisaje forestal \n 34', metadata={'page': 34, 'source': '/content/drive/MyDrive/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_1_Fundamentals_of_Ecolog
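Printing `result['source_documents']` dumps every chunk's text and full Drive path. A more readable summary is sketched below, assuming (as the output above suggests) that each document exposes `page_content` and a `metadata` dict with `page` and `source` keys. `Doc` is only a stand-in so the snippet runs without LangChain, and `summarize_sources` is a hypothetical helper; in the notebook you would pass `result['source_documents']` directly:

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def summarize_sources(docs, preview_chars: int = 60) -> List[str]:
    """One line per retrieved chunk: file name, page number, and a short text preview."""
    lines = []
    for doc in docs:
        name = os.path.basename(doc.metadata.get("source", "?"))
        page = doc.metadata.get("page", "?")
        # Collapse whitespace/newlines and cut the preview to a fixed length
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{name} (p. {page}): {preview}")
    return lines


docs = [Doc("Uno \n¿Restauración pasiva o activa?",
            {"page": 3, "source": "/content/drive/MyDrive/example/Presentacion 2.2.pdf"})]
for line in summarize_sources(docs):
    print(line)
```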

In [None]:
q_1 = "Puedes decirme Cuales son las tecnicas de restauracion forestal EN ESPAÑOL *Respondeme exclusivamente en Español*"
result = qa({"question": q_1, "chat_history": chat_history})
result['answer']



Assistant: ¡Por supuesto! Las técnicas de restauración forestal incluyen:
1. Reforestación activa: Consiste en la plantación de árboles en áreas degradadas o deforestadas.
2. Restauración de bosques degradados: Se enfoca en mejorar la estructura y función del bosque existente, mediante la eliminación de malezas, la reubicación de árboles y la mejora de la calidad del suelo.
3. Restablecimiento de ecosistemas forestales: Busca restaurar la integridad ecológica del ecosistema forestal, mediante la introducción de especies vegetales y animales, y la mejora de los procesos ecológicos.
4. Restauración de hábitats forestales: Se enfoca en la recuperación de hábitats forestales específicos, como los bosques de niebla, los bosques de ribera o los bosques de montaña.
5. Restauración de la biodiversidad forestal: Busca aumentar la variedad de especies vegetales y animales en el ecosistema forestal, mediante la introducción de nuevas especies o la conservación de las existentes.
6. Restauración

'ción de la interacción humano-natureza forestal: Se enfoca en mejorar la relación entre los humanos y el ecosistema forestal, mediante la educación y la participación comunitaria en la gestión forestal.\n\nEspero que esta información sea útil para ti. Si tienes alguna otra pregunta, no dudes en preguntar.'
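Several of these cells pass a `chat_history` variable alongside the question. When a `ConversationalRetrievalChain` is used without a memory object, the caller is expected to maintain that history as a list of `(question, answer)` tuples and append each new exchange. A minimal sketch of that bookkeeping; `ask` and `fake_chain` are hypothetical names, and `fake_chain` only stands in for `qa` so the snippet runs on its own:

```python
from typing import Callable, Dict, List, Tuple

ChatHistory = List[Tuple[str, str]]


def ask(chain: Callable[[Dict], Dict], question: str, history: ChatHistory) -> str:
    """Call a chain with the current history, then record the new exchange."""
    result = chain({"question": question, "chat_history": history})
    answer = result["answer"]
    history.append((question, answer))  # the next call will see this exchange
    return answer


def fake_chain(inputs: Dict) -> Dict:
    """Stand-in for `qa`: reports how many earlier exchanges it received."""
    return {"answer": f"received {len(inputs['chat_history'])} earlier exchange(s)"}


chat_history: ChatHistory = []
print(ask(fake_chain, "¿Qué es la restauración forestal?", chat_history))  # received 0 earlier exchange(s)
print(ask(fake_chain, "¿Cómo debo empezar?", chat_history))                # received 1 earlier exchange(s)
```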

In [None]:
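# Reset the local history list. Note: if `qa` was built with a `memory` object (as the
# `memory.buffer` cell below suggests), the chain loads its history from that memory instead,
# so call `memory.clear()` to actually start a fresh conversation.
chat_history = []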

query = "Quiero empezar a realizar un proyecto de restauracion forestal, ¿como debo empezar? Dame una respuesta exclusivamente en español"
result = qa({"question": query, "chat_history": chat_history})


result['answer']



Assistant: ¡Excelente! Empezar a realizar un proyecto de restauración forestal es un proyecto ambicioso, pero con un plan cuidadoso y una implementación efectiva, puedes lograr un gran impacto positivo en el medio ambiente y la calidad de vida de las personas en la zona afectada. Aquí te proporciono algunos pasos generales que debes seguir para empezar a trabajar en un proyecto de restauración forestal:

1. Identificar el objetivo de la restauración: Lo primero que debes hacer es identificar claramente el objetivo de la restauración forestal. ¿Qué tipo de ecosistema deseas restaurar? ¿Qué problemas ambientales o sociales quieres abordar? Teniendo un objetivo claro te ayudará a tomar decisiones más informadas en todo el proceso de restauración.

2. Evaluar la situación actual: Antes de comenzar a trabajar en la restauración forestal, debes evaluar la situación actual del ecosistema. Esto incluye la identificación de las especies vegetales y animales presentes, la evaluación de la cali

' vez que hayas obtenido el financiamiento y el apoyo necesarios, es hora de implementar el plan de restauración. Esto puede incluir la plantación de árboles, la limpieza de suelos contaminados, la construcción de infraestructuras para la protección del hábitat, y la educación y capacitación de la comunidad local.\n\n\n6. Monitorear y evaluar el progreso: Es importante monitorear y evaluar el progreso del proyecto de manera regular para asegurarte de que está cumpliendo con sus objetivos. Esto puede incluir la toma de muestras de suelos y agua, la observación de la fauna y flora, y la realización de encuestas entre la comunidad local.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
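The answer above ends in a long run of `\n` characters, a sign that generation kept going after the useful text had ended. A display-side cleanup can hide this; a minimal sketch, where `collapse_newlines` and its `max_run` parameter are hypothetical:

```python
import re


def collapse_newlines(text: str, max_run: int = 2) -> str:
    """Shorten any run of more than `max_run` newlines to exactly `max_run`, then trim the end."""
    pattern = r"\n{%d,}" % (max_run + 1)
    return re.sub(pattern, "\n" * max_run, text).rstrip()


raw = "6. Monitorear y evaluar el progreso." + "\n" * 200
print(repr(collapse_newlines(raw)))  # '6. Monitorear y evaluar el progreso.'
```

This only hides the symptom. Limiting `max_new_tokens` or raising `repetition_penalty` in the Hugging Face generation settings may address the cause, but that has not been tested here.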

In [None]:
# Print the stored conversation: even indices are questions, odd indices are answers
for i in range(0, len(memory.buffer), 2):
    print(i, memory.buffer[i], end=' ')
    if i + 1 < len(memory.buffer):
        print("--" * 75)
        print(i + 1, memory.buffer[i + 1])

0 content="\nCuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados?\nsegún el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute'\n*Responde exclusivamente en Español*\n" additional_kwargs={} example=False ------------------------------------------------------------------------------------------------------------------------------------------------------
1 content="Los árboles con hojas comestibles tienen un contenido en nutrientes muy alto, según el libro 'Árboles con Hojas Comestibles: Una Guía Mundial'. De hecho, de entre todas las hortalizas cultivadas, los árboles con hojas comestibles tienen los niveles más altos de nutrientes clave, como proteínas, carbohidratos, grasas y vitaminas. Entre los árboles con hojas comestibles con contenido en nutrientes más alto, se encuentran:\n1. Manzano (Malus domestica): Con un 4,8% de proteínas, 0,8% de grasas y 1,5% de vitaminas, el
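The loop above relies on `memory.buffer` alternating human and AI messages, starting with the human question. Under that same assumption, the buffer can be grouped into `(question, answer)` pairs. A sketch: `pair_turns` is a hypothetical helper, and `SimpleNamespace` objects stand in for LangChain message objects, which expose their text as `.content`:

```python
from types import SimpleNamespace
from typing import List, Tuple


def pair_turns(messages) -> List[Tuple[str, str]]:
    """Group a flat [human, ai, human, ai, ...] list into (question, answer) pairs.
    A trailing unanswered question is paired with an empty string."""
    pairs = []
    for i in range(0, len(messages), 2):
        question = messages[i].content
        answer = messages[i + 1].content if i + 1 < len(messages) else ""
        pairs.append((question, answer))
    return pairs


buffer = [SimpleNamespace(content="q1"), SimpleNamespace(content="a1"), SimpleNamespace(content="q2")]
print(pair_turns(buffer))  # [('q1', 'a1'), ('q2', '')]
```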

In [None]:
memory.buffer[11].content

'ción de la interacción humano-natureza forestal: Se enfoca en mejorar la relación entre los humanos y el ecosistema forestal, mediante la educación y la participación comunitaria en la gestión forestal.\n\nEspero que esta información sea útil para ti. Si tienes alguna otra pregunta, no dudes en preguntar.'
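Because the buffer alternates question and answer, an index maps directly to a role and an exchange number: even indices are questions, odd indices are answers, and index `i` belongs to exchange `i // 2 + 1`. So `memory.buffer[11]` is the model's reply in the sixth stored exchange, which is consistent with the answer text printed above. A tiny helper (hypothetical name `describe_index`):

```python
def describe_index(i: int) -> tuple:
    """Map a position in an alternating [human, ai, ...] buffer to (role, exchange number)."""
    role = "human" if i % 2 == 0 else "ai"
    return role, i // 2 + 1


print(describe_index(11))  # ('ai', 6)
print(describe_index(0))   # ('human', 1)
```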

# **Semantic cache**
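A semantic cache stores earlier questions with their answers and, when a new question is close enough in meaning to a stored one, returns the stored answer instead of running the retrieval chain and the LLM again. The toy sketch below only illustrates the lookup logic: it uses bag-of-words cosine similarity in place of real sentence embeddings and a Python list in place of Redis. The `SemanticCache` class, the `embed` function, and the 0.8 threshold are illustrative assumptions, not the Redis-backed setup installed below:

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored answer when a new question is similar enough to a cached one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, question: str) -> Optional[str]:
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))


cache = SemanticCache(threshold=0.8)
cache.store("What is forest restoration?", "Forest restoration refers to ...")
print(cache.lookup("what is forest restoration"))             # Forest restoration refers to ...
print(cache.lookup("How do I start a restoration project?"))  # None
```

On a cache miss, the application would call `qa` as usual and then `store` the new question and answer.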

# Install and start a Redis server on Google Colab