In [1]:
# %pip install llama-index
# %pip install transformers accelerate bitsandbytes
# %pip install llama-index-readers-web
# %pip install llama-index-llms-huggingface
# %pip install llama-index-embeddings-huggingface
# %pip install llama-index-program-openai
# %pip install llama-index-agent-openai

Setup

Loading Stage

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.response.notebook_utils import display_response

# Load every readable file in the local "data" directory as documents.
documents = SimpleDirectoryReader("data").load_data()

Indexing Stage

LLM

In [4]:
# Hugging Face API token, needed to download the gated Llama 2 weights
hf_token = "your_token"
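A token typed directly into a cell is easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token has been stored in an environment variable named `HF_TOKEN`; the helper `get_hf_token` is hypothetical and not part of any library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable.

    Hypothetical helper: the variable name HF_TOKEN is an assumption
    about how the token was stored.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

With this in place, `hf_token = get_hf_token()` could replace the string literal, and the notebook fails early with a clear message when the variable is missing.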

In [5]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

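# Load the weights in 4-bit NF4 with double quantization to reduce GPU memory use;
# computation still runs in float16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    # Wrap each prompt in the [INST] ... [/INST] tags that Llama 2 chat models expect.
    query_wrapper_prompt=PromptTemplate(" [INST] {query_str} [/INST] "),
    # Stay below Llama 2's 4096-token context limit to leave some headroom.
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)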
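A rough back-of-the-envelope estimate of what 4-bit loading saves, assuming the two downloaded checkpoint shards (about 9.98 GB and 3.50 GB) hold float16 weights at 16 bits each. The helper `estimate_weight_memory_gb` is hypothetical, and the result is a lower bound on weight memory only: it ignores quantization constants, any layers left unquantized, activations, and the KV cache.

```python
def estimate_weight_memory_gb(fp16_checkpoint_gb: float, bits_per_weight: float) -> float:
    """Scale an fp16 checkpoint size (16 bits per weight) to another bit width."""
    return fp16_checkpoint_gb * bits_per_weight / 16.0


# Shard sizes reported when the checkpoint is downloaded.
fp16_gb = 9.98 + 3.50
four_bit_gb = estimate_weight_memory_gb(fp16_gb, bits_per_weight=4)

print(f"fp16 weights:  ~{fp16_gb:.2f} GB")
print(f"4-bit weights: ~{four_bit_gb:.2f} GB (lower bound)")
```

Under these assumptions the weights shrink from roughly 13.5 GB to roughly 3.4 GB, which is what lets a 7B model fit on a single small GPU.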


Embedding Model

In [6]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = "local:BAAI/bge-small-en-v1.5"

Storing Stage

In [7]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)

Query Stage

In [8]:
query_engine = vector_index.as_query_engine()

response = query_engine.query("What is data responsibility in humanitarian action?")

display_response(response)

**`Final Response:`** Based on the provided context information, the answer to the query "What is data responsibility in humanitarian action?" is:

Data responsibility in humanitarian action refers to the safe, ethical, and effective management of personal and non-personal data for operational response. It involves ensuring that data is managed in a way that protects the right to privacy, avoids harm to already vulnerable people and communities, and leads to increased trust among humanitarian actors and the people they serve.

The Centre for Humanitarian Data at the Office for the Coordination of Humanitarian Affairs (OCHA) plays a key role in promoting data responsibility in the humanitarian sector. The Centre has developed guidelines for data responsibility, which apply to all operational data managed directly by OCHA, managed on its behalf, or managed by humanitarian actors within activities coordinated by OCHA. These guidelines address both data protection (safeguards that preserve the right to privacy) and data security (measures that aim to preserve the confidentiality, availability, and integrity of personal and non-personal data).

The Centre also works with partners to publish guidance

In [9]:
# Retrieve the 5 most similar chunks as context instead of the library default.
query_engine = vector_index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is data responsibility in humanitarian action?")

display_response(response)

**`Final Response:`** Based on the provided context information, the answer to the query "What is data responsibility in humanitarian action?" is:

Data responsibility in humanitarian action refers to the safe, ethical, and effective management of personal and non-personal data for operational response. It is a critical issue for the sector to address, as the disclosure of sensitive data can lead to further harm to already vulnerable people and communities, while responsible data management can lead to increased trust among humanitarian actors and with the people they seek to serve.

Data responsibility encompasses both data protection (safeguards that preserve the right to privacy with respect to the processing of personal data) and data security (measures that aim to preserve the confidentiality, availability, and integrity of personal and non-personal data).

In recent years, the humanitarian sector has seen the development of principles, policies, and strategies for data responsibility, including system-wide guidance such as the Inter-Agency Standing Committee Operational Guidance on Data Responsibility in Humanitarian Action and organization-specific guidance such as the ICRC Handbook on Data Protection in Humanitarian Action or the UN Secret
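The two queries differ only in `similarity_top_k`, which sets how many of the most similar chunks the retriever hands to the LLM as context. The idea can be sketched in plain Python with made-up 3-dimensional vectors standing in for real embeddings; `cosine_similarity` and `top_k` are hypothetical helpers for illustration, not the LlamaIndex implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" (invented values, for illustration only).
chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.8, 0.6, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

print(top_k(query, chunks, k=2))  # ['chunk_a', 'chunk_b']
print(top_k(query, chunks, k=3))  # ['chunk_a', 'chunk_b', 'chunk_c']
```

A larger `k` gives the model more context to draw on, at the cost of a longer prompt that must still fit within `context_window`.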