# Llama 2: Leveraging the Meta Language Model on Hugging Face

This notebook loads Meta's Llama 2 chat model through Hugging Face Transformers, wraps it in LangChain, and uses it for text generation and for answering questions over a small set of PDF documents about forest restoration.

## Install libraries

In [None]:
!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers

In [None]:
!nvidia-smi

Sun Sep 10 12:18:33 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Get access to Hugging Face

In [None]:
# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

··········
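As an optional variation, the token can be read from an environment variable first, with the interactive prompt as a fallback. The sketch below is an assumption, not part of the original notebook: the helper name `resolve_token` is hypothetical, and the variable name `HUGGINGFACEHUB_API_TOKEN` is reused here only for consistency with the cell above.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_token(
    env: Mapping[str, str] = os.environ,
    var_name: str = "HUGGINGFACEHUB_API_TOKEN",  # assumed variable name
    prompt: Callable[[], str] = getpass,
) -> str:
    """Return the token from the environment, or prompt for it if unset."""
    token = env.get(var_name, "").strip()
    if token:
        return token
    # Fall back to an interactive prompt; the typed input is not echoed.
    return prompt().strip()
```

In the notebook this would be used as `HUGGINGFACEHUB_API_TOKEN = resolve_token()`, which avoids retyping the token on every run when the variable is already set.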


The next cell does the following:

1. Imports the necessary libraries: `cuda` (for GPU operations), `bfloat16` (a 16-bit floating-point data type, used here as the compute dtype), and `transformers` (for working with pre-trained language models).

2. Defines `model_id`, the identifier of the pre-trained language model on the Hugging Face Hub.

3. Determines the device for model execution. If a GPU is available, the current CUDA device is used; otherwise the code falls back to the CPU.

4. Configures 4-bit quantization using the bitsandbytes library. Quantization stores the weights in fewer bits, which reduces the memory needed to load the model.

5. Initializes the Hugging Face (HF) items: it authenticates with the access token, loads the model configuration, and loads the pre-trained model for causal language modeling.

6. Puts the model in evaluation mode, which disables training-only behavior such as dropout so the model can be used for inference.

In summary, this code loads a pre-trained language model, reduces its GPU memory usage through quantization, and prepares it for inference.

In [None]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-7b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, you need an access token
hf_auth = HUGGINGFACEHUB_API_TOKEN #'<add your access token here>'
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)

# enable evaluation mode to allow model inference
model.eval()

print(f"Model loaded on {device}")




Model loaded on cuda:0
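To see why 4-bit quantization matters on a 15 GB GPU, a rough back-of-the-envelope estimate of the weight size helps. The sketch below assumes roughly 6.74 billion parameters for the 7B checkpoint (an approximation, not a figure from this notebook) and counts only the weights; it ignores quantization constants, activations, and the KV cache, so real usage is higher.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: approximate parameter count of the 7B checkpoint.
NUM_PARAMS = 6.74e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_size_gb(NUM_PARAMS, bits):.2f} GB")
```

Under this assumption the 16-bit estimate is consistent with the checkpoint's two weight shards (about 9.98 GB and 3.50 GB), while the 4-bit estimate is roughly a quarter of that, which leaves headroom on a Tesla T4.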


A tokenizer converts text into the sequence of integer token ids that the language model consumes, and converts generated ids back into text. It breaks the text into smaller units, usually words or subword tokens, and maps each unit to an id from the model's vocabulary. The tokenizer must match the model, so it is loaded from the same `model_id`.

In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
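To make the idea of subword tokenization concrete, here is a toy greedy longest-match encoder. It is only an illustration: the vocabulary `TOY_VOCAB` and the functions `encode` and `decode` are hypothetical, and this is not the algorithm or vocabulary the Llama 2 tokenizer actually uses.

```python
from typing import Dict, List


def encode(text: str, vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Greedy longest-match encoding of `text` into token ids."""
    ids: List[int] = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No piece matched: emit the unknown id and skip one character.
            ids.append(unk_id)
            i += 1
    return ids


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Map ids back to their pieces and join them."""
    id_to_piece = {idx: piece for piece, idx in vocab.items()}
    return "".join(id_to_piece.get(i, "?") for i in ids)


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"<unk>": 0, "re": 1, "forest": 2, "ation": 3, " ": 4, "for": 5, "est": 6}

ids = encode("reforestation forest", TOY_VOCAB)
print(ids)                     # [1, 2, 3, 4, 2]
print(decode(ids, TOY_VOCAB))  # reforestation forest
```

The point of the sketch is that a word such as "reforestation" does not need its own vocabulary entry; it is covered by smaller pieces that are shared with other words.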




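`stop_list` contains two strings, '\nHuman:' and '\n```\n'. They are stop sequences: when the model produces one of them, generation should halt, which keeps a chat model from continuing the conversation on the user's behalf. The next cell converts each string to token ids. Note in the output that the tokenizer prepends the beginning-of-sequence id (1) to each list, so a comparison against the tail of the generated ids will only match if those leading ids also appear in the generated text.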

In [None]:
stop_list = ['\nHuman:', '\n```\n']

stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]
stop_token_ids

[[1, 29871, 13, 29950, 7889, 29901], [1, 29871, 13, 28956, 13]]

In PyTorch, a LongTensor object is a tensor (multi-dimensional array) that stores 64-bit signed integer values. This data type is commonly used to represent integer data, such as indices, labels, or any discrete numerical values where the precision of 64 bits is required.

In [None]:
# We have to convert these stop token ids into LongTensor objects.
import torch

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
stop_token_ids

[tensor([    1, 29871,    13, 29950,  7889, 29901], device='cuda:0'),
 tensor([    1, 29871,    13, 28956,    13], device='cuda:0')]

The next cell defines custom stopping criteria for text generation. The `StopOnTokens` class inherits from the Transformers `StoppingCriteria` class and, after each generated token, checks whether the most recent token ids match one of the sequences in `stop_token_ids`. If they do, generation halts. The class is then wrapped in a `StoppingCriteriaList`, which is the form the generation pipeline accepts.

In [None]:
from transformers import StoppingCriteria, StoppingCriteriaList

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])
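The tail comparison at the heart of `StopOnTokens` can be sketched in plain Python, independent of PyTorch. The function name `ends_with_any` and the token ids below are illustrative placeholders, not values from the Llama 2 tokenizer. The sketch also skips stop sequences that are longer than the text generated so far, a case in which an element-wise tensor comparison of unequal lengths would not be well defined.

```python
from typing import List, Sequence


def ends_with_any(generated: Sequence[int], stop_sequences: List[Sequence[int]]) -> bool:
    """Return True if `generated` ends with any of the stop sequences."""
    for stop in stop_sequences:
        n = len(stop)
        # Skip stop sequences that are empty or longer than the text so far.
        if n == 0 or n > len(generated):
            continue
        if list(generated[-n:]) == list(stop):
            return True
    return False


# Placeholder ids, for illustration only.
stops = [[7, 8, 9], [4, 4]]

print(ends_with_any([1, 2, 7, 8, 9], stops))  # True: ends with [7, 8, 9]
print(ends_with_any([1, 2, 7, 8], stops))     # False: no stop sequence at the tail
print(ends_with_any([9], stops))              # False: shorter than every stop sequence
```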

We are almost ready to initialize the Hugging Face pipeline. The next cell first sets up streaming so that generated tokens are printed as they are produced. Comments are included in the code for further explanation.

In [None]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# LangChain callback manager for token-wise streaming to stdout
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

from transformers import TextStreamer

# Print generated tokens on the screen as they are produced, omitting the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)

The next cell builds the `generate_text` pipeline from the model, the tokenizer, and the custom stopping criteria. The sampling parameters (`do_sample`, `temperature`, `top_k`) control how random the output is, `repetition_penalty` discourages repeated text, and `max_new_tokens` caps the length of the output. The pipeline also receives the streamer and the tokenizer's end-of-sequence token id.

In [None]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.1,  # 'randomness' of outputs; lower values make sampling more deterministic
    max_new_tokens=812,  # max number of tokens to generate in the output
    repetition_penalty=1.1,  # without this output begins repeating
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id
)
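To illustrate what `temperature` and `top_k` do to the next-token distribution, here is a minimal sketch with made-up scores. The function `top_k_temperature_probs` and the toy logits are hypothetical; the sketch ignores `repetition_penalty` and is not the Transformers implementation.

```python
import math
from typing import Dict


def top_k_temperature_probs(
    logits: Dict[str, float], temperature: float, top_k: int
) -> Dict[str, float]:
    """Keep the top_k tokens, scale by temperature, and apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the top_k highest-scoring tokens.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Divide by the temperature, then softmax (subtract the max for stability).
    scaled = [score / temperature for _, score in kept]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for (token, _), e in zip(kept, exps)}


# Made-up scores for four candidate tokens.
toy_logits = {"forest": 2.0, "tree": 1.0, "river": 0.5, "car": -1.0}

sharp = top_k_temperature_probs(toy_logits, temperature=0.1, top_k=3)
flat = top_k_temperature_probs(toy_logits, temperature=1.0, top_k=3)
print(sharp)
print(flat)
```

With a temperature of 0.1, almost all probability mass lands on the highest-scoring token, so sampling behaves nearly deterministically; with a temperature of 1.0 the alternatives keep a meaningful share. The lowest-scoring token is removed by `top_k` in both cases.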

## Test the model

In [None]:
# Run this code to confirm that everything is working fine.
res = generate_text("What is the restoration forest?")
res[0]["generated_text"]


everybody knows that the forest is a place of great beauty and diversity, but it is also a complex ecosystem that provides many important benefits to society. Restoration forests are areas where degraded or damaged forests are restored to their natural state through reforestation, habitat reconstruction, and other measures. These efforts can help to improve biodiversity, stabilize soil, and even mitigate climate change by sequestering carbon in trees and soil. In this article, we will explore the concept of restoration forests and how they can benefit both people and the environment.
What are restoration forests?
Restoration forests are areas of land that have been degraded or damaged through human activities such as deforestation, logging, mining, or urbanization. These areas are often characterized by reduced tree cover, altered hydrological cycles, and disrupted ecosystem processes. Restoration forests aim to restore these damaged ecosystems to their natural state by planting nativ

'What is the restoration forest?\n everybody knows that the forest is a place of great beauty and diversity, but it is also a complex ecosystem that provides many important benefits to society. Restoration forests are areas where degraded or damaged forests are restored to their natural state through reforestation, habitat reconstruction, and other measures. These efforts can help to improve biodiversity, stabilize soil, and even mitigate climate change by sequestering carbon in trees and soil. In this article, we will explore the concept of restoration forests and how they can benefit both people and the environment.\nWhat are restoration forests?\nRestoration forests are areas of land that have been degraded or damaged through human activities such as deforestation, logging, mining, or urbanization. These areas are often characterized by reduced tree cover, altered hydrological cycles, and disrupted ecosystem processes. Restoration forests aim to restore these damaged ecosystems to t

## Implementing HF Pipeline in LangChain
The next cell wraps the Hugging Face pipeline in LangChain's `HuggingFacePipeline` class. The model and generation settings are unchanged, although the sampled text can differ from run to run because `do_sample=True`. The wrapper makes it possible to use LangChain's agent tooling, chains, and so on with Llama 2.

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

# checking again that everything is working fine
llm(prompt="What is the restoration forest?")


nobody knows.

The restoration forest is a place where people can come to heal, grow and learn. It is a sanctuary for those who have been hurt or damaged by the world around them. The forest is a place of peace and tranquility, where one can find solace and comfort in the beauty of nature.
Inside the forest, there are many different areas that cater to different needs. There is a stream that runs through the center of the forest, providing fresh water for all who visit. There are also fields of wildflowers, where people can sit and watch the flowers bloom and change with the seasons.
There are also many different types of trees within the forest, each one offering its own unique benefits. Some trees provide shade and shelter from the sun, while others offer protection from the wind and rain. Some trees even have special properties that can help to heal the body and mind.
One of the most interesting aspects of the restoration forest is the way it changes over time. As people come and g

'\n nobody knows.\n\nThe restoration forest is a place where people can come to heal, grow and learn. It is a sanctuary for those who have been hurt or damaged by the world around them. The forest is a place of peace and tranquility, where one can find solace and comfort in the beauty of nature.\nInside the forest, there are many different areas that cater to different needs. There is a stream that runs through the center of the forest, providing fresh water for all who visit. There are also fields of wildflowers, where people can sit and watch the flowers bloom and change with the seasons.\nThere are also many different types of trees within the forest, each one offering its own unique benefits. Some trees provide shade and shelter from the sun, while others offer protection from the wind and rain. Some trees even have special properties that can help to heal the body and mind.\nOne of the most interesting aspects of the restoration forest is the way it changes over time. As people co

## Ingesting Data using Document Loader


In [None]:
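import locale
# Workaround commonly used in Colab when shell commands fail with a
# "UTF-8 locale is required" error: report UTF-8 as the preferred encoding.
locale.getpreferredencoding = lambda: "UTF-8"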

In [None]:
!sudo -H pip install pypdf

Collecting pypdf
  Downloading pypdf-3.15.5-py3-none-any.whl (272 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.15.5


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from langchain.document_loaders import PyPDFLoader

PDF_paths = ["/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/Tarea_Modulo_2/1. Migracion_asistida_como_herramienta_restauración_bosques.pdf",
             #"/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/2.2_Forest_Restoration_Techniques/Identification_manuals/Manual para la identificación de especies forestales de la Amazonía peruana. volumen I.pdf",
             "/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/2.2_Forest_Restoration_Techniques/Identification_manuals/arboles-con-hojas-comestibles.pdf"]

loaders = [PyPDFLoader(PDF_path) for PDF_path in PDF_paths]
documents = [loader.load() for loader in loaders]

## Splitting in Chunks using Text Splitters

The next cells prepare the documents for retrieval. First, `RecursiveCharacterTextSplitter` splits the documents into chunks of at most 1000 characters with a 20-character overlap. Then `HuggingFaceEmbeddings` loads the `sentence-transformers/all-mpnet-base-v2` model on the GPU to embed the text. Finally, `FAISS.from_documents` embeds the chunks and stores the resulting vectors in a FAISS vector store for later retrieval.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from functools import reduce

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = [text_splitter.split_documents(document) for document in documents]

# Flatten the per-document lists of chunks into a single list
all_splits = reduce(lambda x, y: x + y, all_splits)
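The roles of `chunk_size` and `chunk_overlap` can be seen in a simplified fixed-width splitter. This is only a sketch under stated assumptions: the function `split_with_overlap` is hypothetical, and unlike `RecursiveCharacterTextSplitter` it does not try to break on separators such as paragraphs or spaces.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into fixed-width chunks that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a sentence that straddles a boundary is not lost entirely to either side. A small overlap such as 20 out of 1000 characters keeps that benefit while adding little duplicated text to the index.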

In [None]:
display(len(all_splits))
all_splits[-100]

397

Document(page_content='capeva. Forma: Arbusto semileñoso.Origen: Trópico americano.Clima y Suelos: Especie amante de la sombra de los trópicos húmedos, tanto de tierras bajas como altas.', metadata={'source': '/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/2.2_Forest_Restoration_Techniques/Identification_manuals/arboles-con-hojas-comestibles.pdf', 'page': 75})

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(all_splits, embeddings)
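Conceptually, a vector store answers the question "which stored chunks are closest to this query vector?". The toy sketch below does a brute-force search with cosine similarity over hand-written three-dimensional vectors. The function names and vectors are hypothetical, and cosine similarity is just one common choice: the metric a FAISS index actually uses depends on how the index is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-written toy "embeddings", for illustration only.
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
hits = top_k_similar([1.0, 0.0, 0.0], toy_docs, k=2)
print([index for index, _ in hits])  # [0, 1]
```

A real embedding model produces vectors with hundreds of dimensions, and a library such as FAISS replaces the brute-force loop with an optimized index, but the retrieval step follows the same idea.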

## Initializing Chain

In [None]:
from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)

In [None]:
chat_history = []

query = "Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados? y dame la respuesta exclusivamente en español, por favor enfocate en el idioma lo quiero en español"
result = chain({"question": query, "chat_history": chat_history})

result['answer']


Los árboles con hojas comestibles tienen contenido en nutrientes entre los diez más altos de todos los vegetales cultivados. Según la publicación "Trees for a Future" (2020), las especies de árboles con hojas comestibles con mayor contenido en nutrientes son:
1. Morera (Morus alba): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
2. Orgaza (Atriplex halimus): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
3. Higuerón (Ficus carica): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
4. Palma datilera (Phoenix dactylifera): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
5. Tamarindo (Tamarindus indica): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
6. Guayabo (Guaiacum officinale): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.
7. Guava (Prunus gua

'\nLos árboles con hojas comestibles tienen contenido en nutrientes entre los diez más altos de todos los vegetales cultivados. Según la publicación "Trees for a Future" (2020), las especies de árboles con hojas comestibles con mayor contenido en nutrientes son:\n1. Morera (Morus alba): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n2. Orgaza (Atriplex halimus): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n3. Higuerón (Ficus carica): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n4. Palma datilera (Phoenix dactylifera): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n5. Tamarindo (Tamarindus indica): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n6. Guayabo (Guaiacum officinale): Contiene entre 1,5 y 2,5% de proteínas, 0,5 a 1,5% de grasas y 6 a 8% de carbohidratos.\n7. Guava (P

In [None]:
query = "Te estoy preguntando por los datos en el libro 'ÁRBOLES con Hojas Comestibles Una Guía Mundial Perennial Agriculture Institute' "
result = chain({"question": query, "chat_history": chat_history})

result['answer']

The book "TREES with Edible Leaves A Global Guide" by the Perennial Agriculture Institute provides information on 102 species of trees, shrubs, and cacti that are grown for their edible leaves and shoots. The book covers topics such as the origins of these plants, their potential to address nutrient deficiencies, their benefits for climate change mitigation and adaptation, and other advantages. It also provides an overview of the cultivation of these species, including basic techniques and how they can be integrated into agroforestry systems. Additionally, it offers information on propagation and care of the trees with edible leaves.</s>


' The book "TREES with Edible Leaves A Global Guide" by the Perennial Agriculture Institute provides information on 102 species of trees, shrubs, and cacti that are grown for their edible leaves and shoots. The book covers topics such as the origins of these plants, their potential to address nutrient deficiencies, their benefits for climate change mitigation and adaptation, and other advantages. It also provides an overview of the cultivation of these species, including basic techniques and how they can be integrated into agroforestry systems. Additionally, it offers information on propagation and care of the trees with edible leaves.'

In [None]:
query = "¿Cuales son los Árboles con hojas comestibles con contenido en nutrientes entre los diez más altos de todos los vegetales cultivados, para las deficiencias de la dieta industrial enfocados en el calcio?"
result = chain({"question": query, "chat_history": chat_history})

result['answer']

According to the provided data from PAI, the following tree species have high levels of calcium among all vegetables cultivated:

' According to the provided data from PAI, the following tree species have high levels of calcium among all vegetables cultivated:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n

In [None]:
chat_history

[]
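The empty list above shows that `chat_history` is never updated by the earlier cells, so each question is answered without conversational context. A minimal sketch of how turns could be recorded follows, assuming the chain accepts the history as a list of (question, answer) tuples. The helper `ask` and the stand-in `fake_chain` are hypothetical; `fake_chain` exists only so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, List, Tuple

ChatHistory = List[Tuple[str, str]]


def ask(
    chain: Callable[[Dict[str, Any]], Dict[str, Any]],
    question: str,
    chat_history: ChatHistory,
) -> str:
    """Query the chain, then record the turn so follow-ups can see it."""
    result = chain({"question": question, "chat_history": chat_history})
    answer = result["answer"]
    chat_history.append((question, answer))
    return answer


def fake_chain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real chain (hypothetical); echoes the turn number."""
    turns = len(inputs["chat_history"])
    return {"answer": f"answer #{turns + 1} to: {inputs['question']}"}


history: ChatHistory = []
ask(fake_chain, "first question", history)
ask(fake_chain, "follow-up", history)
print(history)
```

In the notebook, the call would be `ask(chain, query, chat_history)` with the real `chain` object instead of `fake_chain`.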

In [None]:
chat_history = []

query = "Que es la migración asistida? y dame la respuesta exclusivamente en español"
result = chain({"question": query, "chat_history": chat_history})

result['answer']

La migración asistida es una técnica de restaución de bosques que implica la translocación de especies a nuevas localidades fuera de sus rangos de distribución. Es decir, se trata de movilizar a las especies a áreas donde no han sido antes, con el objetivo de prevenir posibles extinciones debido al cambio climático.



</s>


' La migración asistida es una técnica de restaución de bosques que implica la translocación de especies a nuevas localidades fuera de sus rangos de distribución. Es decir, se trata de movilizar a las especies a áreas donde no han sido antes, con el objetivo de prevenir posibles extinciones debido al cambio climático.\n\n\n\n'

In [None]:
chat_history = []

query = "What is assisted migration? Provide the answer exclusively in English."
result = chain({"question": query, "chat_history": chat_history})

result['answer']

Assisted migration refers to the intentional movement of individuals or populations of a species to new locations outside their natural range, with the goal of preventing extinctions due to climate change. This strategy involves human intervention to facilitate the movement of plants or animals to areas where they can better adapt to changing environmental conditions. (Seddon 2010, Thomas 2011; Williams & Dumroese 2013)</s>


' Assisted migration refers to the intentional movement of individuals or populations of a species to new locations outside their natural range, with the goal of preventing extinctions due to climate change. This strategy involves human intervention to facilitate the movement of plants or animals to areas where they can better adapt to changing environmental conditions. (Seddon 2010, Thomas 2011; Williams & Dumroese 2013)'

In [None]:
print(result['source_documents'])

[Document(page_content='Actividades  Migración Asistida como herramienta de restauración de bosques', metadata={'source': '/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/Forest_and_Landscape_Restoration/Restoration_Modules/Module_2_Forest_and_Jungle_Restoration/Tarea_Modulo_2/1. Migracion_asistida_como_herramienta_restauración_bosques.pdf', 'page': 9}), Document(page_content='Migración Asistida como herramienta de restauración de bosques  \nIntroducción  \nLa migración  asistida  se refiere  a la translocación  de \nespecies  a nuevas  localidades , fuera  de sus rangos  de \ndistribución . \n \nSe considera  que la migración  asistida  es una \nestrategia  de mitigación  ante  el cambio  climático  \npara  prevenir  posibles  extinciones .  \n(Seddon  2010, Thomas 2011; Williams & Dumroese  2013)', metadata={'source': '/content/drive/MyDrive/omdena_local_chapter_colombia/Development of a Forest Restoration Chatbot using NLP/

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

In [None]:
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory = memory,
    verbose=True
)

In [None]:
%%time
conversation.predict(input="Que es la restauracion forestal? Dame tu respuesta en español", stop=["Q:", "\n"])

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Que es la restauracion forestal?
AI:  Oh, that's a great question! *thinks for a moment* Restauracion forestal is the process of restoring or rehabilitating damaged or degraded forests. It involves various techniques such as planting native tree species, removing invasive species, and controlling pests and diseases. The goal is to improve the health and biodiversity of the forest ecosystem, which can help to mitigate climate change, protect wildlife habitats, and provide benefits for local communities. *smiling* Did you know that some of the most successful forest restoration projects have been in areas where the local community is actively 



Oh, ¡claro! *thinks for a moment* La restauracion forestal es el proceso de restaurar o rehabilitar bosques dañados o degradados. Involucra técnicas como el plantado de especies nativas, la eliminación de especies invasoras y el control de plagas y enfermedades. El objetivo es mejorar la salud y la diversidad del ecosistema forestal, lo que puede ayudar a frenar el cambio climático, proteger los hábitats de vida silvestre y proporcionar beneficios para las comunidades locales. *sonriendo* ¿Sabías que algunos de los proyectos de restauracion forestal más exitosos han sido en áreas donde la comunidad local está activamente involucrada en el proceso? Es importante involucrar a la comunidad y involucrarla en el decision-making para asegurarse de que el proyecto tenga éxito.</s>

> Finished chain.
CPU times: user 20.3 s, sys: 644 ms, total: 20.9 s
Wall time: 21.6 s


' Oh, ¡claro! *thinks for a moment* La restauracion forestal es el proceso de restaurar o rehabilitar bosques dañados o degradados. Involucra técnicas como el plantado de especies nativas, la eliminación de especies invasoras y el control de plagas y enfermedades. El objetivo es mejorar la salud y la diversidad del ecosistema forestal, lo que puede ayudar a frenar el cambio climático, proteger los hábitats de vida silvestre y proporcionar beneficios para las comunidades locales. *sonriendo* ¿Sabías que algunos de los proyectos de restauracion forestal más exitosos han sido en áreas donde la comunidad local está activamente involucrada en el proceso? Es importante involucrar a la comunidad y involucrarla en el decision-making para asegurarse de que el proyecto tenga éxito.'

In [None]:
print(memory.buffer)

Human: Que es la restauracion forestal?
AI:  Oh, that's a great question! *thinks for a moment* Restauracion forestal is the process of restoring or rehabilitating damaged or degraded forests. It involves various techniques such as planting native tree species, removing invasive species, and controlling pests and diseases. The goal is to improve the health and biodiversity of the forest ecosystem, which can help to mitigate climate change, protect wildlife habitats, and provide benefits for local communities. *smiling* Did you know that some of the most successful forest restoration projects have been in areas where the local community is actively involved in the process? It's important to engage with the community and involve them in decision-making to ensure the project's success.
Human: Que es la restauracion forestal? Dame tu respuesta en español
AI:  Oh, ¡claro! *thinks for a moment* La restauracion forestal es el proceso de restaurar o rehabilitar bosques dañados o degradados. In

In [None]:
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# Prompt: answer factually and reply in the language of the question.
template = """You will answer the following questions as best you can, being as informative and factual as possible. Detect the source language, whether it is Spanish or English,
and respond in the same language you are asked in.
If you don't know, say you don't know.

Current conversation:
{history}
Human: {input}
AI Assistant:"""

# Only the last 4 exchanges are fed back into the {history} slot.
memory = ConversationBufferWindowMemory(k=4)

PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    memory=memory,
    return_final_only=True,
    verbose=False,
)

In [None]:
%%time
conversation.predict(input="Que es la restauracion forestal? Dame tu respuesta en español")

Entiendo que estás buscando información sobre la reforestación. La reforestación es el proceso de volver a plantar árboles en un área donde antes había bosques o hados. Es una práctica importante para proteger y mejorar el medio ambiente, ya que ayuda a reducir la cantidad de dióxido de carbono en la atmósfera, protege a las especies y ecosistemas, y proporciona recursos económicos y sociales para las comunidades locales. Además, la reforestación puede ayudar a prevenir desastres naturales como inundaciones, erosión del suelo y tormentas, y también puede mejorar la calidad del agua y la calidad del aire. ¿Quieres saber más sobre este tema?</s>
CPU times: user 17.7 s, sys: 454 ms, total: 18.1 s
Wall time: 18.4 s


' Entiendo que estás buscando información sobre la reforestación. La reforestación es el proceso de volver a plantar árboles en un área donde antes había bosques o hados. Es una práctica importante para proteger y mejorar el medio ambiente, ya que ayuda a reducir la cantidad de dióxido de carbono en la atmósfera, protege a las especies y ecosistemas, y proporciona recursos económicos y sociales para las comunidades locales. Además, la reforestación puede ayudar a prevenir desastres naturales como inundaciones, erosión del suelo y tormentas, y también puede mejorar la calidad del agua y la calidad del aire. ¿Quieres saber más sobre este tema?'

In [None]:
conversation.predict(input="Si, si quiero saber mas sobre este tema. Actualmente quiero hacer un proyecto de restauracion forestal. Por donde deberia empezar? que me recomiendas?")

¡Excelente! La reforestación es un proyecto muy interesante y valioso. Para empezar, te recomiendo que investigues sobre las diferentes técnicas de reforestación, como la plantación de semillas, la siembra de árboles jardín, la regeneración natural, entre otras. También es importante considerar factores como el clima, la topografía y la vegetación existente en la zona donde quieres realizar la reforestación. Además, es recomendable que consultes con expertos en el campo y que tengas en cuenta los costos y los recursos necesarios para llevar a cabo el proyecto. ¿Quieres saber más sobre cómo empezar?</s>


' ¡Excelente! La reforestación es un proyecto muy interesante y valioso. Para empezar, te recomiendo que investigues sobre las diferentes técnicas de reforestación, como la plantación de semillas, la siembra de árboles jardín, la regeneración natural, entre otras. También es importante considerar factores como el clima, la topografía y la vegetación existente en la zona donde quieres realizar la reforestación. Además, es recomendable que consultes con expertos en el campo y que tengas en cuenta los costos y los recursos necesarios para llevar a cabo el proyecto. ¿Quieres saber más sobre cómo empezar?'

In [None]:
print(memory.buffer)

Human: Que es la restauracion forestal? Dame tu respuesta en español
AI:  Entiendo que estás buscando información sobre la reforestación. La reforestación es el proceso de volver a plantar árboles en un área donde antes había bosques o hados. Es una práctica importante para proteger y mejorar el medio ambiente, ya que ayuda a reducir la cantidad de dióxido de carbono en la atmósfera, protege a las especies y ecosistemas, y proporciona recursos económicos y sociales para las comunidades locales. Además, la reforestación puede ayudar a prevenir desastres naturales como inundaciones, erosión del suelo y tormentas, y también puede mejorar la calidad del agua y la calidad del aire. ¿Quieres saber más sobre este tema?
Human: Si, si quiero saber mas sobre este tema. Actualmente quiero hacer un proyecto de restauracion forestal. Por donde deberia empezar? que me recomiendas?
AI:  ¡Excelente! La reforestación es un proyecto muy interesante y valioso. Para empezar, te recomiendo que investigues 
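
The buffer printed above still holds both exchanges because `ConversationBufferWindowMemory(k=4)` keeps the last four. The sketch below is a toy stand-in, not the LangChain class: `WindowBuffer` and `memory_sketch` are hypothetical names, and it only illustrates how a window of size `k` drops the oldest exchange once it is full.

```python
from collections import deque


class WindowBuffer:
    """Toy sliding-window chat memory (illustrative only, not LangChain's class)."""

    def __init__(self, k):
        # Each entry is one (human, ai) exchange; deque discards the oldest when full.
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        self.turns.append((human, ai))

    @property
    def buffer(self):
        # Render the kept exchanges in a "Human:/AI:" layout.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory_sketch = WindowBuffer(k=2)
for i in range(1, 4):
    memory_sketch.save(f"question {i}", f"answer {i}")

# Only exchanges 2 and 3 remain; exchange 1 fell out of the window.
print(memory_sketch.buffer)
```

A smaller `k` keeps the prompt short, which matters here because every remembered exchange is sent to the model again on the next call.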

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
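
The interactive prompt above can be awkward in scripted runs. A minimal sketch of a non-interactive alternative follows, assuming the token is stored in an environment variable; the helper name `login_from_env` and the default variable name are assumptions for illustration, while `huggingface_hub.login(token=...)` is the library's own function.

```python
import os


def login_from_env(var_name="HF_TOKEN"):
    """Log in to the Hugging Face Hub with a token read from the environment.

    Returns False without contacting the Hub when the variable is unset.
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported here so the helper can be defined even if the package is missing.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` before loading a gated model would replace the `huggingface-cli login` step, and it keeps the token out of the notebook itself.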


#Second method

In [None]:
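from transformers import AutoTokenizer
import transformers
import torch

# Hub id of a community-uploaded copy of the Llama 2 7B chat weights.
model = "daryl149/llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model, use_auth_token=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # half precision halves memory use versus float32
    device_map="auto",          # let accelerate decide where the weights go
)

sequences = pipeline(
    'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n',
    do_sample=True,             # sample tokens instead of decoding greedily
    top_k=10,                   # sample only from the 10 most likely tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,             # counts prompt plus generated tokens, so long answers are cut off
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")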

Result: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

I'm glad you liked "Breaking Bad" and "Band of Brothers". They are both highly acclaimed and widely popular shows. Here are some other shows that you might enjoy, based on your interest in drama and crime:

1. "The Sopranos" - This HBO series is a classic drama that follows the life of a New Jersey mob boss, Tony Soprano, as he navigates the criminal underworld and deals with personal and family issues.

2. "The Wire" - This HBO series is a gritty and realistic portrayal of the drug trade in Baltimore, from the perspective of both law enforcement and the criminals involved. It's known for its complex characters and nuanced storytelling.

3. "Mad Men"
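
The call above samples with `do_sample=True` and `top_k=10`. As a rough illustration of what top-k sampling does at each step, here is a pure-Python sketch; `top_k_sample` and the toy logits are made up for this example and are not how `transformers` implements it internally.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors only (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index in proportion to its renormalised probability.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # made-up scores for a 5-token vocabulary
rng = random.Random(0)
draws = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(1000)]

# With k=2 only the two best-scoring ids (0 and 2) can ever be drawn.
print(sorted(set(draws)))
```

A larger `top_k` admits less likely tokens and makes the text more varied; `top_k=1` reduces to greedy decoding.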


In [None]:
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'What is the restoration forest?\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Result: What is the restoration forest?

The Restoration Forest is a new feature in the Forest setting that allows you to restore your forest to its former glory after it has been damaged by a natural disaster or other event.

When you first enter the Restoration Forest, you will see a map of your forest with a number of red dots indicating areas that need restoration. You can click on these dots to view the damage and see what needs to be done to restore the area.

Once you have identified the areas that need restoration, you can start working on them by clicking on the "Restore" button. This will take you to a new screen where you can choose the type of restoration you want to perform, such as planting trees or repairing damaged buildings.

As you restore the forest, you will see the damage gradually disappear and the forest begin to flourish once again. You will also earn rewards and
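
The answer above reads like a continuation of arbitrary web text, which is what a raw prompt tends to produce. The Llama 2 chat checkpoints were fine-tuned on prompts wrapped in `[INST] ... [/INST]`, with an optional `<<SYS>>` block for a system message. The helper below is a sketch of that wrapping; the name `build_llama2_chat_prompt` is hypothetical, and this notebook did not measure whether it improves this particular answer.

```python
def build_llama2_chat_prompt(user_message, system_prompt=None):
    """Wrap a single user turn in the Llama 2 chat instruction format.

    The tokenizer normally adds the leading <s> token, so it is left out here.
    """
    user_message = user_message.strip()
    if system_prompt:
        return (
            f"[INST] <<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"


prompt = build_llama2_chat_prompt(
    "What is forest restoration?",
    system_prompt="Answer factually. If you don't know, say you don't know.",
)
print(prompt)
```

The resulting string can be passed to `pipeline(...)` in place of the bare question.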


In [None]:
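def remove_duplicates(text):
    """Drop lines whose stripped text already appeared earlier in `text`.

    Blank lines count as duplicates too, so only the first one is kept.
    """
    seen = set()
    kept = []

    for line in text.split('\n'):
        key = line.strip()
        if key not in seen:
            seen.add(key)
            kept.append(line)

    # Every kept line ends with a newline.
    return ''.join(line + '\n' for line in kept)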

In [None]:
sequences = pipeline(
    'Que es la restauracion forestal?\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    cleaned_text = remove_duplicates(seq['generated_text'])
    print(cleaned_text)

Que es la restauracion forestal?

La restauracion forestal es el proceso de reconstrucción o recuperación de un ecosistema forestal que ha sido dañado o degradado. Esto puede ocurrir por diferentes razones, como la deforestación, la degradación del suelo, la enfermedad o los incendios forestales. La restauracion forestal tiene como objetivo mejorar la calidad del ecosistema forestal, proteger la biodiversidad, mejorar la calidad del agua y del aire, y reducir la vulnerabilidad de los comunidades forestales a los desastres naturales.
La restauracion forestal puede incluir diferentes técnicas y estrategias, como la plantación de árboles, la reintroducción de especies nativas, la mejora de la estructura del suelo, la gestión de inc
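
In the output above there is no blank line between the two paragraphs. One likely reason is that `remove_duplicates` treats every blank line after the first as a duplicate. The variant below is a sketch that keeps paragraph breaks while still dropping repeated text lines; `dedupe_lines_keep_blanks` is a hypothetical name, not part of the original notebook.

```python
def dedupe_lines_keep_blanks(text):
    """Remove repeated non-blank lines but preserve single blank-line separators."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        key = line.strip()
        if key == "":
            # Keep a blank line only if the previous kept line is not blank.
            if kept and kept[-1].strip() != "":
                kept.append(line)
            continue
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return "\n".join(kept)


sample = "Title\n\nFirst paragraph.\n\nFirst paragraph.\n\nSecond paragraph."
print(dedupe_lines_keep_blanks(sample))
```

The repeated "First paragraph." is removed, and the blank lines around it collapse into a single separator.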

