Using embeddings beyond search
- use case
    - Detecting user intent
    - Frequently asked questions (FAQ) set

=> A trigger that runs a different function depending on the user's input<br>

---

In [9]:
import os
import openai
from openai import OpenAI
from sklearn.cluster import KMeans
from utils import cosine_similarity

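# initialize openai
# The key is left blank here; fill in your own key before running.
os.environ['OPENAI_API_KEY'] = ""
# Legacy module-level setting; the OpenAI() client used below reads OPENAI_API_KEY from the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]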
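
The `utils` module imported above is not shown in this notebook. As a rough sketch of what its `cosine_similarity` helper might look like, assuming it takes two equal-length vectors and returns a float (this is an assumption, not the actual helper), a pure-Python version could be:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The score ranges from -1 (opposite direction) to 1 (same direction), which is why a fixed threshold such as 0.5 can be used later as a cut-off.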

## 1. Detecting user intent

In [10]:
politics = ["What are the key policies of the main political parties in the upcoming election?",
            "Who do you vote for the next presedent?",
            "I love the current Democratic Party.",
            "What is your opinion on the president's current political move?",
            "I love politics. Don't you?"]

ml = ["How does supervised learning differ from unsupervised learning in machine learning models?",
      "What are the ethical considerations of using machine learning in predictive policing?",
    "How do neural networks mimic the human brain in processing data and recognizing patterns?",
    "What are some examples of natural language processing?",
    "Can you describe how machine learning is being utilized in personalized medicine and healthcare?"]

In [12]:
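def create_embeddings(txt_list):
    """Embed a list of strings; returns one embedding vector (list of floats) per input."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.embeddings.create(
        input=txt_list,
        model="text-embedding-3-small",
    )
    # response.data holds one item per input string
    return [r.embedding for r in response.data]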

In [13]:
# NOTE: despite its name, `embeddings` holds the raw sentences; `emb` holds their vectors.
embeddings = politics+ml
emb = create_embeddings(embeddings)

In [14]:
embeddings

['What are the key policies of the main political parties in the upcoming election?',
 'Who do you vote for the next presedent?',
 'I love the current Democratic Party.',
 "What is your opinion on the president's current political move?",
 "I love politics. Don't you?",
 'How does supervised learning differ from unsupervised learning in machine learning models?',
 'What are the ethical considerations of using machine learning in predictive policing?',
 'How do neural networks mimic the human brain in processing data and recognizing patterns?',
 'What are some examples of natural language processing?',
 'Can you describe how machine learning is being utilized in personalized medicine and healthcare?']

#### Using clustering

In [15]:
n_clusters = 2  # one cluster per expected intent (politics, machine learning)
kmeans = KMeans(n_clusters=n_clusters)
clusters = kmeans.fit_predict(emb)

In [16]:
clusters

array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=int32)

When the user asks a politics-related question

In [17]:
input_sentence = "I would like to have a talk about politics."
sent_emb = create_embeddings([input_sentence])

In [18]:
kmeans.predict(sent_emb)

array([0], dtype=int32)

When the user asks a machine learning-related question

In [19]:
input_sentence = "Tell me about machine learning."
sent_emb = create_embeddings([input_sentence])

In [20]:
kmeans.predict(sent_emb)

array([1], dtype=int32)
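
KMeans cluster ids are arbitrary: nothing guarantees that cluster 0 is always "politics", and the ids may swap between runs. One way to get stable intent names is to label each cluster by the majority source of its members. The sketch below assumes the cluster assignments shown above; `name_clusters` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter


def name_clusters(cluster_ids, known_labels):
    """Map each cluster id to the most common known label among its members."""
    votes = {}
    for cid, label in zip(cluster_ids, known_labels):
        votes.setdefault(int(cid), Counter())[label] += 1
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}


# Cluster assignments copied from the `clusters` output above.
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# The first five sentences came from `politics`, the last five from `ml`.
known_labels = ["politics"] * 5 + ["ml"] * 5

cluster_names = name_clusters(cluster_ids, known_labels)
print(cluster_names)  # {0: 'politics', 1: 'ml'}
```

With this mapping, `cluster_names[int(kmeans.predict(sent_emb)[0])]` would give the intent name for a new sentence instead of a bare cluster id.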

#### Using similarity search

In [21]:
politics_emb = create_embeddings(politics)
ml_emb = create_embeddings(ml)

In [22]:
def route_selection(emb_list, query_emb, threshold=0.5):
    """Return True if any embedding in emb_list has cosine similarity to query_emb above threshold."""
    return any(cosine_similarity(i, query_emb) > threshold for i in emb_list)

In [23]:
input_sentence = "I would like to have a talk about politics."
sent_emb = create_embeddings([input_sentence])

print("{} for politics, {} for machine learning".format(route_selection(politics_emb, sent_emb[0]), route_selection(ml_emb, sent_emb[0])))

True for politics, False for machine learning


In [24]:
input_sentence = "How is the weather today?"
sent_emb = create_embeddings([input_sentence])

print("{} for politics, {} for machine learning".format(route_selection(politics_emb, sent_emb[0]), route_selection(ml_emb, sent_emb[0])))

False for politics, False for machine learning


In [25]:
input_sentence = "What is the best way to learn machine learning?"
sent_emb = create_embeddings([input_sentence])

# Note: the machine learning route is checked with a lower threshold (0.4) than the default 0.5.
print("{} for politics, {} for machine learning".format(route_selection(politics_emb, sent_emb[0]), route_selection(ml_emb, sent_emb[0], threshold=0.4)))

False for politics, True for machine learning


Because embeddings are used, clustering becomes possible with only a minimal set of input examples. <br>
##### __=> Identify the user's goal and run the function that matches it__ (guardrails or a semantic router)
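
The routing idea can be sketched as a small dispatcher. This is a variation on the notebook's `route_selection`, not the same logic: instead of a True/False per route, it picks the single best-scoring route above the threshold and falls back to a default handler otherwise. The handler functions, `pick_route`, `dispatch`, and the 2-D toy vectors are all hypothetical and stand in for real functions and real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pick_route(query_emb, route_embs, threshold=0.5):
    """Return the route whose best example beats the threshold by the most, or None."""
    best_route, best_score = None, threshold
    for route, examples in route_embs.items():
        score = max(cosine_similarity(e, query_emb) for e in examples)
        if score > best_score:
            best_route, best_score = route, score
    return best_route


# Hypothetical handlers; in practice these would call the real functions.
def handle_politics(text):
    return "[politics handler] " + text


def handle_ml(text):
    return "[ml handler] " + text


def handle_fallback(text):
    return "[fallback handler] " + text


HANDLERS = {"politics": handle_politics, "ml": handle_ml}


def dispatch(text, query_emb, route_embs, threshold=0.5):
    """Run the handler for the matched route, or the fallback when nothing matches."""
    route = pick_route(query_emb, route_embs, threshold)
    return HANDLERS.get(route, handle_fallback)(text)


# Toy 2-D vectors standing in for real embeddings (assumption for illustration).
toy_routes = {
    "politics": [[1.0, 0.0], [0.9, 0.1]],
    "ml": [[0.0, 1.0], [0.1, 0.9]],
}
print(dispatch("Who should I vote for?", [0.95, 0.05], toy_routes))
print(dispatch("How is the weather today?", [-1.0, -1.0], toy_routes))
```

With the notebook's data, `route_embs` would be `{"politics": politics_emb, "ml": ml_emb}` and `query_emb` would be `create_embeddings([text])[0]`.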

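## 2. Frequently asked questions (FAQ) list

1. Store frequently asked questions by category, in the same way as above
2. Set a threshold and search for similar questions
3. Provide the information linked to the similar question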
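
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: `find_faq`, the placeholder answers in `FAQ_ANSWERS`, and the 2-D toy vectors are hypothetical, and real entries would carry embeddings from `create_embeddings`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_faq(query_emb, faq_entries, threshold=0.5):
    """Return (category, question, score) for the closest stored question above threshold, else None."""
    best = None
    for category, question, emb in faq_entries:
        score = cosine_similarity(emb, query_emb)
        if score > threshold and (best is None or score > best[2]):
            best = (category, question, score)
    return best


# Step 3: information linked to each category (placeholder text, not real answers).
FAQ_ANSWERS = {
    "password_reset": "placeholder: password reset instructions",
    "service_request": "placeholder: plans and offers overview",
}

# Step 1: stored questions by category, with toy 2-D vectors in place of real embeddings.
toy_entries = [
    ("password_reset", "How can I reset my password?", [1.0, 0.0]),
    ("service_request", "Are there any special offers or discounts currently available?", [0.0, 1.0]),
]

# Step 2: threshold search for the most similar stored question.
match = find_faq([0.9, 0.2], toy_entries)
if match is None:
    print("No similar FAQ found")
else:
    print(FAQ_ANSWERS[match[0]])
```

Returning `None` when nothing clears the threshold leaves room for a fallback, such as passing the question to a general-purpose handler.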

In [26]:
password_reset = ["What steps should I take to recover my account without access to my registered email?",
                  "Is there a way to authenticate my identity for password reset without security questions?",
                  "How can I reset my password?"]
service_request = ["Are there any special offers or discounts currently available?",
                   "How can I compare the different plans to find one that suits my needs?",
                   "Where can I see user reviews or testimonials about your services?"]

---

--END--