<div style="width: 30%; float: right; margin: 10px; margin-right: 5%;">
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/FHNW_Logo.svg/2560px-FHNW_Logo.svg.png" width="500" style="float: left; filter: invert(50%);"/>
</div>

# Phi-2 Few-Shot Learning

In this notebook we build a chatbot for Swiss real estate recommendations using few-shot learning. <br>
We use Microsoft's phi-2 LLM for this.



---
Prepared by Si Ben Tran, Yannic Lais, and Rami Tarabishi in the autumn semester (HS) 2023.<br>
Bachelor of Science FHNW in Data Science.

## Introduction

### General Approach

- Run named entity recognition (NER) on the prompt
- Extract the entities needed for the database query
- Send the prompt to the Phi-2 model together with the few-shot examples and the result of the database query

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import spacy

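# Use the GPU when CUDA is available, otherwise fall back to the CPU.
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")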


## Phi-2 Model

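In [None]:
# Load on the GPU when CUDA is available, otherwise fall back to the CPU.
# With device_map hard-coded to "cuda", this cell failed on a machine without CUDA:
# AssertionError: Torch not compiled with CUDA enabled
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map=device, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)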

In [None]:
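# Tokenize a test prompt and move the tensors to the model's device.
inputs = tokenizer('''Hello, explain to me how I can chat with you.''', return_tensors="pt", return_attention_mask=False).to(model.device)

# Generate up to 200 tokens (prompt included) and decode the result.
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)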

## Named Entity Recognition (NER)

In [None]:
nlp = spacy.load("en_core_web_sm")

In [None]:
prompt = "Hey, I'm looking for an apartment in Bern that costs less than 700'000 CHF. Can you help me?"

In [None]:
doc = nlp(prompt)
entities = {ent.label_: ent.text for ent in doc.ents}
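
The recognized entities are plain strings, whereas a price filter on a database needs numbers. The following is a minimal sketch of one way to convert them, assuming amounts use apostrophes or commas as thousands separators and have no decimal part. `parse_chf_amount` is a hypothetical helper, not part of spaCy or of this notebook.

```python
import re

def parse_chf_amount(text):
    """Return the first amount found in `text` as an int, or None.

    Hypothetical helper: assumes apostrophes or commas are thousands
    separators and that amounts have no decimal part.
    """
    # Either grouped digits such as 700'000 or 1,200,000, or a plain digit run.
    match = re.search(r"\d{1,3}(?:['’,]\d{3})+|\d+", text)
    if match is None:
        return None
    # Drop the separators and convert the remaining digits.
    return int(re.sub(r"\D", "", match.group()))

print(parse_chf_amount("less than 700'000CHF"))  # 700000
print(parse_chf_amount("CHF 1,200,000"))         # 1200000
```

The function returns only the first number in the string, so a prompt that mentions a room count before the price would need a stricter pattern.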

## Training Examples

In [None]:
# Every example uses the keys "Prompt" and "Response", which process_prompt reads.
few_shot_examples = [
    {
        "Prompt": "I am looking for an apartment in Zurich under 1'000'000 CHF.",
        "Response": "Here are some options for apartments in Zurich under 1'000'000 CHF: [Query]"
    },
    {
        "Prompt": "Are there terraced houses in Bern in the CHF 500,000 to 700,000 range?",
        "Response": "Yes, there are terraced houses in Bern in the CHF 500,000 to 700,000 range: [Query]"
    },
    {
        "Prompt": "I need a detached house in Lucerne with a garden for around CHF 1,200,000.",
        "Response": "In Lucerne you can find detached houses with a garden for around CHF 1,200,000: [Query]"
    },
    {
        "Prompt": "Are modern apartments available in Basel for under CHF 900,000?",
        "Response": "Modern apartments in Basel under 900'000 CHF are available: [Query]"
    },
    {
        "Prompt": "I am looking for a large house in Lausanne, at least 5 rooms, up to 1'500'000 CHF.",
        "Response": "Large houses in Lausanne with at least 5 rooms up to 1'500'000 CHF can be found here: [Query]"
    }
]


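def search_link(df, stadt, min_preis, max_preis, typ):
    """Return the links of all listings in `df` matching city, price range, and type.

    Expects a pandas DataFrame with the columns 'Stadt', 'Preis', 'Typ', and 'Link'.
    """
    filtered_df = df[(df['Stadt'] == stadt) &
                     (df['Preis'] >= min_preis) &
                     (df['Preis'] <= max_preis) &
                     (df['Typ'] == typ)]
    return filtered_df['Link'].to_numpy()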
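
The few-shot answers end in a `[Query]` placeholder, which presumably stands for the links returned by the database query. A minimal sketch of that substitution follows. `fill_query_placeholder`, the fallback text, and the example URLs are assumptions made for illustration.

```python
def fill_query_placeholder(answer_template, links, placeholder="[Query]"):
    """Replace the placeholder with a comma-separated list of links.

    Hypothetical helper; inserts a short notice when no listing matched.
    """
    if len(links) > 0:
        replacement = ", ".join(links)
    else:
        replacement = "no matching listings found"
    return answer_template.replace(placeholder, replacement)

template = "Here are some options for apartments in Zurich under 1'000'000 CHF: [Query]"
links = ["https://example.com/listing/1", "https://example.com/listing/2"]  # made-up URLs
print(fill_query_placeholder(template, links))
```

Because the helper only relies on `len` and iteration, it should also accept the array that `search_link` returns.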


## Process Prompt

In [None]:
def process_prompt(prompt):
    # Perform NER on the prompt
    doc = nlp(prompt)
    entities = {ent.label_: ent.text for ent in doc.ents}

    # Build a query based on recognized entities (customize this part based on your needs)
    query = ""
    if "GPE" in entities:  # GPE represents countries, cities, states
        query += f"Location: {entities['GPE']}; "
    if "MONEY" in entities:  # MONEY represents monetary values, including prices
        query += f"Price: {entities['MONEY']}; "

    # Combine few-shot examples with the current prompt and query
    combined_prompt = "\n".join([f"Prompt: {ex['Prompt']}\nResponse: {ex['Response']}" for ex in few_shot_examples])
    combined_prompt += f"\nPrompt: {prompt}\nResponse: {query}"

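    # Send the combined prompt to Phi-2 and decode only the newly generated tokens.
    # Assumes `model` and `tokenizer` were loaded in the Phi-2 Model section; max_new_tokens=100 is an arbitrary limit.
    inputs = tokenizer(combined_prompt, return_tensors="pt", return_attention_mask=False).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    model_response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    return model_response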

In [None]:
# Example usage
user_prompt = "I want to find a house in Geneva for around 800,000 CHF."
response = process_prompt(user_prompt)
print(response)
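
The phi-2 model card describes a QA prompt format of the form `Instruct: <prompt>` followed by `Output:`. As a possible alternative to the `Prompt:`/`Response:` layout used above, the few-shot prompt could be assembled in that format. This is a sketch only: `build_few_shot_prompt` and the extra `Context:` line are assumptions, and whether the format improves the answers here has not been tested.

```python
def build_few_shot_prompt(examples, user_prompt, context=""):
    """Assemble a few-shot prompt in an "Instruct: ... / Output: ..." layout.

    Hypothetical helper: `examples` is a list of dicts with the keys
    "Prompt" and "Response"; `context` carries the extracted query, if any.
    """
    blocks = [f"Instruct: {ex['Prompt']}\nOutput: {ex['Response']}" for ex in examples]
    # Append the extracted entities as an extra line of the final instruction.
    instruct = f"{user_prompt}\nContext: {context}" if context else user_prompt
    # Leave the last "Output:" open so the model completes it.
    blocks.append(f"Instruct: {instruct}\nOutput:")
    return "\n\n".join(blocks)

examples = [
    {"Prompt": "I am looking for an apartment in Zurich under 1'000'000 CHF.",
     "Response": "Here are some options for apartments in Zurich under 1'000'000 CHF: [Query]"},
]
print(build_few_shot_prompt(examples,
                            "I want to find a house in Geneva for around 800,000 CHF.",
                            context="Location: Geneva; Price: 800,000"))
```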