In [1]:
# pip install torch transformers ctransformers sentencepiece
# Apple silicon:
# CT_METAL=1 pip install ctransformers --no-binary ctransformers

In [2]:
from ctransformers import AutoModelForCausalLM, AutoConfig
from transformers import AutoTokenizer
from typing import List, Dict


- Retrieval
    - RAGatouille / ColBERT
    - normal DPR (ChromaDB)
    - manual/old-school methods
- Generation
    - the original RAG paper uses an encoder-decoder (BART)
        - encode all documents + query
        - decode from the encoder state
    - the best current models are all decoder-only (GPT/Llama/Mistral/...)
        - put everything in a prompt
- Evaluation
    - metrics
        - faithfulness
        - quality of answers (embedding-based)
        - BLEU/ROUGE, code quality + reproducibility
        - originality
    - Synthetic dataset
        - https://nlp.stanford.edu/pubs/wang-berant-liang-acl2015.pdf
        - ChatGPT to generate queries/answers?
    - We should get queries/answers from the hackathon
        - 10 beforehand
- Fine-tuning
    - ColBERT/DPR is easy to fine-tune
    - Mistral is very hard to fine-tune (in a day)
        - Fine-tuning the prompt is the easiest/fastest way to get improvements
     
- UX
 
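The notes list "manual/old-school methods" as a retrieval option. As one example of such a method (BM25 is our choice here, not something the notes commit to), here is a minimal Okapi BM25 ranker in plain Python. It assumes the documents are already plain-text chunks; `tokenize` and `bm25_rank` are hypothetical helpers, and `k1=1.5`, `b=0.75` are common defaults rather than tuned values.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75
) -> List[Tuple[int, float]]:
    """Score every document against the query with Okapi BM25.

    Returns (doc_index, score) pairs sorted from best to worst match.
    """
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # IDF with "+1" inside the log so common terms never score negatively.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

The top-ranked chunks would then be joined into the `context` that the system prompt below receives, in place of the hard-coded page.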

# Load model

### Quantized Mistral 7B, finetuned on code instructions
- https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-16k-GGUF#provided-files

In [3]:
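# Number of layers to offload to the GPU (0 = run entirely on the CPU)
gpu_layers = 0

# 4-bit (Q4_K_M) GGUF quantization of OpenHermes 2.5 Mistral 7B, 16k context
model_file = "openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf"

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/OpenHermes-2.5-Mistral-7B-16k-GGUF",
    model_file=model_file,
    model_type="mistral",
    gpu_layers=gpu_layers,
    max_new_tokens=4000,
    context_length=16_000,
)

# Tokenizer and chat template come from a full-precision copy of the same model
tokenizer = AutoTokenizer.from_pretrained("NurtureAI/OpenHermes-2.5-Mistral-7B-16k")

def format_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into one prompt string using the model's chat template."""
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)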

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
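The `stop=["<|im_end|>"]` argument used when generating answers only makes sense if the prompt is in ChatML, the format the OpenHermes 2.5 model card documents. Assuming the tokenizer's chat template follows ChatML, `format_prompt` should produce roughly the string below. `to_chatml` is a hypothetical helper; the real template may differ in details such as a BOS token or whitespace, so compare it against `format_prompt` rather than relying on it.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the ChatML layout (assumed to match the model's template)."""
    # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here; it should
        # close the turn with <|im_end|>, which is why generation stops there.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```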


# RAG Query

In [4]:
facebook_threat_exchange = """
Facebook API, Threat Exchange

ThreatExchange
==============
Most threat-intelligence solutions suffer because the data is too hard to standardize and verify. Meta created the ThreatExchange platform so that participating organizations can share threat data using a convenient, structured, and easy-to-use API that provides privacy controls to enable sharing with only desired groups. You can [apply for membership](https://developers.facebook.com/products/threat-exchange) today!

|     |     |
| --- | --- |
| #### [Get Started](https://developers.facebook.com/docs/threat-exchange/getting-started/)<br><br>Learn how ThreatExchange is structured and walk through steps to get started.<br><br>#### [Get Access](https://developers.facebook.com/docs/threat-exchange/getting-access/)<br><br>Learn the steps required to apply to ThreatExchange and how to add the product.<br><br>#### [API Structure](https://developers.facebook.com/docs/threat-exchange/api-structure/)<br><br>Learn how the ThreatExchange API makes use of the Graph API and try a few calls to get started. | #### [Best Practices](https://developers.facebook.com/docs/threat-exchange/best-practices)<br><br>Make the most effective use of the API by following these tips.<br><br>#### [Reference Code on GitHub](https://github.com/facebook/ThreatExchange/)<br><br>Use freely available libraries in JavaScript, PHP, Python, and Ruby to speed up your integration and development.<br><br>#### [Terms and Conditions](https://www.facebook.com/legal/threatexchange_terms/)<br><br>Learn about terms of use. |

ThreatExchange UI
-----------------

The UI contains point-and-click management for tags, privacy groups, small-scale descriptor download and upload, and more—while bulk data-processing is best deferred to the API.

|     |     |
| --- | --- |
| ### [UI Overview](https://developers.facebook.com/docs/threat-exchange/ui)<br><br>Essential information about the UI. | ### [UI Reference](https://developers.facebook.com/docs/threat-exchange/reference/ui)<br><br>Full documentation for the ThreatExchange UI. |

ThreatExchange API
------------------

Your one-stop shop for ThreatExchange resources. The API contains the complete feature-set of what is possible in ThreatExchange.

|     |     |
| --- | --- |
| ### [API Reference](https://developers.facebook.com/docs/threat-exchange/reference/apis)<br><br>Full documentation for all of the ThreatExchange APIs. | ### [Privacy Controls](https://developers.facebook.com/docs/threat-exchange/reference/privacy)<br><br>Understand the ways you control who sees the data you publish. |
| ### [Resharing Controls](https://developers.facebook.com/docs/threat-exchange/reference/resharing)<br><br>Understand how to permit re-sharing of your data. | ### [Submit Data](https://developers.facebook.com/docs/threat-exchange/reference/submitting)<br><br>Learn how to submit your data. |
| ### [Submit Connections](https://developers.facebook.com/docs/threat-exchange/reference/submitting-connections)<br><br>Learn how to create connections between pieces of data to express relationships. | ### [Edit Data](https://developers.facebook.com/docs/threat-exchange/reference/editing)<br><br>Learn how to edit your data. |
| ### [Delete Data](https://developers.facebook.com/docs/threat-exchange/reference/deleting)<br><br>Learn how to delete your data. |     |

Learn More
----------

Look here for anything else you may be missing.

|     |     |
| --- | --- |
| ### [Terms and Conditions](https://www.facebook.com/legal/threatexchange_terms)<br><br>The complete terms and conditions for ThreatExchange. | ### [Changelog](https://developers.facebook.com/docs/threat-exchange/reference/changelog)<br><br>See how the ThreatExchange API has changed. |
| ### [Open Source Terms of Use](https://opensource.facebook.com/legal/terms)<br><br>Terms of use for general Facebook Open Source projects. | ### [Open Source Privacy Policy](https://opensource.facebook.com/legal/privacy)<br><br>Privacy Policy for general Facebook Open Source projects. |
"""

In [5]:
system_message = """
You are a friendly QA bot that answers questions based only on the context you are given.
- Give an answer based only on the context, without using prior knowledge.
- If you cannot come up with an answer to the user's question, say that you do not know the answer.
- You can be very strict about what you do and don't answer. If the query is completely unrelated to the context you are given, reply that you do not know the answer.
- The user does not know what context you are given; it comes from a semantic search over a large body of API references.
- Smileys and fun interactions are encouraged, but remember that you are a professional chatbot that wants to help the user with their question.

------------------
CONTEXT
{context}
"""

In [6]:
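from typing import Optional
from IPython.display import display, Markdown

def init_chat_log(context: str) -> List[Dict[str, str]]:
    """Start a conversation whose system message embeds the retrieved context."""
    return [
        {"role": "system", "content": system_message.format(context=context)},
    ]

def generate_answer(chat_log: List[Dict[str, str]], query: Optional[str] = None) -> List[Dict[str, str]]:
    """Append `query` (if given) and the model's reply to `chat_log`.

    chat_log is modified in place and also returned.
    """
    if query:
        chat_log.append({"role": "user", "content": query})
    if chat_log[-1]["role"] != "user":
        raise ValueError("query required")

    # Build the prompt from the chat_log argument, not from a global variable
    prompt = format_prompt(chat_log)
    # Stop at the ChatML end-of-turn marker so the model doesn't write the next turn itself
    output = llm(prompt, stop=["<|im_end|>"])
    chat_log.append({"role": "assistant", "content": output})
    return chat_log

def print_chat(chat_log: List[Dict[str, str]]) -> None:
    """Render every non-system message as Markdown in the notebook."""
    for entry in chat_log:
        if entry["role"] != "system":
            display(Markdown(f"**{entry['role'].capitalize()}:** \n{entry['content']}\n"))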
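Every turn is appended to `chat_log`, and the whole log is re-sent on each call, so a long conversation will eventually exceed the `context_length=16_000` configured above (minus the room reserved for `max_new_tokens`). A minimal sketch of one mitigation, assuming the first message is always the system message (as `init_chat_log` guarantees): keep it, then keep as many of the most recent turns as fit in a token budget. `trim_chat_log` is hypothetical, and the default word-count estimator is only a stand-in; passing `lambda t: len(tokenizer.encode(t))` would count real tokens.

```python
from typing import Callable, Dict, List


def trim_chat_log(
    chat_log: List[Dict[str, str]],
    budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[Dict[str, str]]:
    """Return a new list holding the system message plus the most recent
    turns whose combined (estimated) token count fits within `budget`."""
    system, turns = chat_log[:1], chat_log[1:]
    used = sum(count_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    # Walk backwards from the newest turn and stop at the first one that doesn't fit.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

The budget ignores the chat-template markers around each message, so in practice it should be set somewhat below `context_length - max_new_tokens`.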

In [9]:
context = facebook_threat_exchange
query = "Can you tell me about the Facebook Threat Exchange API? What does it do?"

chat = init_chat_log(context)
chat = generate_answer(chat, query)
print_chat(chat)

**User:** 
Can you tell me about the Facebook Threat Exchange API? What does it do?


**Assistant:** 
The Facebook Threat Exchange API is a platform designed for participating organizations to share threat data using a structured and easy-to-use API that provides privacy controls, enabling them to share information with desired groups. It makes use of the Graph API and offers libraries in JavaScript, PHP, Python, and Ruby to speed up integration and development. The complete feature-set of what is possible in ThreatExchange can be found through the provided documentation, which includes features like submitting, editing, and deleting data, as well as privacy controls for managing who sees the published data.


In [10]:
chat = generate_answer(chat, query="Where can I find more information?")

In [11]:
print_chat(chat)

**User:** 
Can you tell me about the Facebook Threat Exchange API? What does it do?


**Assistant:** 
The Facebook Threat Exchange API is a platform designed for participating organizations to share threat data using a structured and easy-to-use API that provides privacy controls, enabling them to share information with desired groups. It makes use of the Graph API and offers libraries in JavaScript, PHP, Python, and Ruby to speed up integration and development. The complete feature-set of what is possible in ThreatExchange can be found through the provided documentation, which includes features like submitting, editing, and deleting data, as well as privacy controls for managing who sees the published data.


**User:** 
Where can I find more information?


**Assistant:** 
You can find more information about the Facebook Threat Exchange API by visiting the following resources:
- [Get Started](https://developers.facebook.com/docs/threat-exchange/getting-started/) to learn how it's structured and walk through steps to get started.
- [UI Overview](https://developers.facebook.com/docs/threat-exchange/ui) for point-and-click management options.
- [API Reference](https://developers.facebook.com/docs/threat-exchange/reference/apis) for full documentation of the ThreatExchange API.
- [Terms and Conditions](https://www.facebook.com/legal/threatexchange_terms/) for the complete terms and conditions related to Threat Exchange.
- [Changelog](https://developers.facebook.com/docs/threat-exchange/reference/changelog) for information on API changes.
- [Open Source Terms of Use](https://opensource.facebook.com/legal/terms) and [Open Source Privacy Policy](https://opensource.facebook.com/legal/privacy) for general Facebook Open Source projects.


In [13]:
print_chat(generate_answer(chat, query="where is the closest train station?"))



**User:** 
where is the closest train station?


**Assistant:** 
I do not know the answer to this question, as the context provided is about the Facebook Threat Exchange API and not related to a specific location or transportation information. Please provide more context for me to assist you with finding the closest train station.


**User:** 
where is the closest train station?


**Assistant:** 
I do not know the answer to this question, as the context provided is about the Facebook Threat Exchange API and not related to a specific location or transportation information. Please provide more context for me to assist you with finding the closest train station.


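The train-station question appears twice in the output above, most likely because the cell was executed twice (the execution counter skips from 11 to 13) and `generate_answer` appends to `chat_log` in place. A minimal sketch of a non-mutating variant that makes re-running a cell harmless: it returns a new list and leaves the input untouched. `generate_answer_pure` and its `complete` callback are hypothetical; in this notebook `complete` could be `lambda log: llm(format_prompt(log), stop=["<|im_end|>"])`.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def generate_answer_pure(
    chat_log: List[Message],
    query: Optional[str],
    complete: Callable[[List[Message]], str],
) -> List[Message]:
    """Like generate_answer, but returns a new list instead of mutating chat_log."""
    new_log = list(chat_log)  # shallow copy; the message dicts themselves are not modified
    if query:
        new_log.append({"role": "user", "content": query})
    if not new_log or new_log[-1]["role"] != "user":
        raise ValueError("query required")
    answer = complete(new_log)
    return new_log + [{"role": "assistant", "content": answer}]
```

With this version, `chat2 = generate_answer_pure(chat, "...", complete)` can be re-run any number of times and always yields the same conversation, because `chat` itself never grows.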
In [14]:
print_chat(generate_answer(chat, query="I'm in amsterdam. I need to go to the train station!"))


**User:** 
I'm in amsterdam. I need to go to the train station!


**Assistant:** 
In Amsterdam, there are multiple train stations. If you are looking for Amsterdam Centraal Station, it is one of the largest and most important stations in the city, located at Stationsplein 10, 1012 AB Amsterdam, Netherlands. It serves as a major transportation hub with connections to various trains, trams, buses, and metros. Please provide more details about your specific needs or location for further assistance.

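The last answer gives a street address for Amsterdam Centraal, which does not appear anywhere in the context, even though the system prompt restricts answers to the context. The notes list faithfulness as an evaluation metric; a crude lexical proxy is the fraction of the answer's content words that also occur in the context. This is a minimal sketch, not an established metric: `lexical_support`, `content_words` and the tiny stopword list are hypothetical, and word overlap cannot recognize paraphrases, which is where the embedding-based scoring mentioned in the notes would come in.

```python
import re
from typing import Set

# Deliberately tiny stopword list, for illustration only.
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "it", "of", "to", "in", "and", "for",
    "with", "you", "are", "at", "on", "as", "or",
}


def content_words(text: str) -> Set[str]:
    """Distinct lowercased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def lexical_support(answer: str, context: str) -> float:
    """Fraction of the answer's content words that also occur in the context.

    Low values suggest the answer relies on material outside the context.
    """
    answer_words = content_words(answer)
    if not answer_words:
        return 1.0  # nothing to check
    return len(answer_words & content_words(context)) / len(answer_words)
```

Applied as `lexical_support(chat[-1]["content"], context)`, it gives one number per answer, which could be tracked across a set of test queries such as the ones to be collected before the hackathon.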