In [38]:
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)
import faiss
import openai

In [39]:
hotlines = [
    {"name": "COP2COP", "description": "Support for police officers dealing with trauma."},
    {"name": "Suicide Prevention Hotline", "description": "Support for individuals with suicidal thoughts."},
    {"name": "NJ Addiction Hotline", "description": "Support for individuals struggling with substance use."},
]

# Create documents to index
documents = [f"{entry['name']}: {entry['description']}" for entry in hotlines]


In [40]:
# Load the DPR context encoder and its tokenizer
tokenizer = DPRContextEncoderTokenizer.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
model = DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

# Tokenize and encode the documents (max_length is set explicitly so truncation takes effect)
inputs = tokenizer(documents, return_tensors='pt', padding=True, truncation=True, max_length=512)
embeddings = model(**inputs).pooler_output.detach().numpy()

# Index the embeddings with FAISS. IndexFlatL2 is an exact (brute-force) L2 index;
# DPR was trained with dot-product similarity, so faiss.IndexFlatIP may be the closer match.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizer'.
Some weights of the model checkpoint at facebook/dpr-ctx_encoder-single-nq-base were not used when initializing DPRContextEncoder: ['ctx_encoder.bert_model.pooler.dense.bias', 'ctx_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRContextEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRContextEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
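
The cell above ranks documents by L2 distance, whereas DPR encoders were trained to score by dot product. The following is a minimal sketch, using made-up 2-D vectors rather than real DPR embeddings, of how the two rankings can disagree when vector norms differ. The helper names `rank_by_l2` and `rank_by_inner_product` are hypothetical and not part of FAISS.

```python
import numpy as np

def rank_by_l2(query_vec, doc_vecs):
    """Return document indices ordered by increasing squared L2 distance."""
    dists = ((doc_vecs - query_vec) ** 2).sum(axis=1)
    return np.argsort(dists, kind="stable")

def rank_by_inner_product(query_vec, doc_vecs):
    """Return document indices ordered by decreasing dot product."""
    scores = doc_vecs @ query_vec
    return np.argsort(-scores, kind="stable")

# Toy vectors, made up for illustration (not real DPR embeddings).
toy_docs = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 1.0]], dtype="float32")
toy_query = np.array([1.0, 0.2], dtype="float32")

l2_order = rank_by_l2(toy_query, toy_docs)
ip_order = rank_by_inner_product(toy_query, toy_docs)
print("L2 order:", l2_order.tolist())
print("Inner-product order:", ip_order.tolist())
```

In this toy case the long vector at index 1 ranks last under L2 but first under the dot product, so the choice of index type can change which hotline comes out on top.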

In [41]:
query = "James is a police officer experiencing work-related trauma."

# Encode the query using DPRQuestionEncoder
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_model = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')

# Tokenize and encode the query (max_length is set explicitly so truncation takes effect)
query_inputs = question_tokenizer(query, return_tensors='pt', padding=True, truncation=True, max_length=512)
query_embedding = question_model(**query_inputs).pooler_output.detach().numpy()

# Search the FAISS index. Only 3 documents are indexed, so k=3 returns all of them, ranked by distance.
D, I = index.search(query_embedding, k=3)  # D: distances, I: document indices
retrieved_hotlines = [documents[i] for i in I[0]]
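
When `k` exceeds the number of indexed vectors, FAISS pads the index array with `-1`, and the list comprehension above would then silently pick `documents[-1]`. A minimal sketch of a guard follows, assuming the arrays have the shape returned by `index.search` for a single query. The helper `collect_hits`, its `max_distance` parameter, and the `demo_*` values are hypothetical and only for illustration.

```python
import numpy as np

def collect_hits(distances, indices, docs, max_distance=None):
    """Pair each retrieved document with its distance, skipping -1 padding."""
    hits = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx < 0:
            continue  # padding entry: fewer than k results were available
        if max_distance is not None and dist > max_distance:
            continue  # optional cutoff for weak matches
        hits.append((docs[int(idx)], float(dist)))
    return hits

# Stand-in arrays shaped like index.search output for one query (values are made up).
demo_D = np.array([[0.5, 1.2, 3.4e38]], dtype="float32")
demo_I = np.array([[0, 2, -1]], dtype="int64")
demo_docs = ["doc A", "doc B", "doc C"]

demo_hits = collect_hits(demo_D, demo_I, demo_docs, max_distance=1.0)
print(demo_hits)
```

In the notebook this would be called as `collect_hits(D, I, documents)`. Any distance cutoff would need to be tuned on real data.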

In [42]:
retrieved_text = "\n".join(retrieved_hotlines)
prompt = f"The user is experiencing: {query}\nHere are some suggested hotlines:\n{retrieved_text}\nPlease explain why these hotlines are appropriate for the user's situation."

import os

# Read the API key from the environment instead of hardcoding it in the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]

# Note: openai.ChatCompletion is the legacy interface (openai < 1.0)
response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=[
    {"role": "system", "content": "You are a helpful assistant recommending hotline services."},
    {"role": "user", "content": prompt}
  ],
  temperature=0.7
)
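
The prompt in this cell is assembled inline. As a sketch of one way to make that step reusable and easier to check, the hypothetical helper `build_prompt` below numbers the retrieved hotlines and returns the same three-part message. Its name and signature are assumptions, not part of the original notebook.

```python
def build_prompt(user_situation, hotline_docs):
    """Assemble the user message from the situation and the retrieved hotline strings."""
    listing = "\n".join(f"{n}. {doc}" for n, doc in enumerate(hotline_docs, start=1))
    return (
        f"The user is experiencing: {user_situation}\n"
        f"Here are some suggested hotlines:\n{listing}\n"
        "Please explain why these hotlines are appropriate for the user's situation."
    )

example_prompt = build_prompt(
    "James is a police officer experiencing work-related trauma.",
    ["COP2COP: Support for police officers dealing with trauma."],
)
print(example_prompt)
```

In the notebook the call would be `build_prompt(query, retrieved_hotlines)`.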

In [37]:
print("GPT-4's recommended hotlines and reasoning:")
print(response['choices'][0]['message']['content'])

print("\nRetrieved Hotlines:")
for hotline in retrieved_hotlines:
    print(hotline)


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Some weights of the model checkpoint at facebook/dpr-question_encoder-single-nq-base were not used when initializing DPRQuestionEncoder: ['question_encoder.bert_model.pooler.dense.bias', 'question_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing DPRQuestionEncoder from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DPRQuestionEncoder from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

GPT-4's recommended hotlines and reasoning:
1. COP2COP: This hotline is specifically designed to provide support for police officers dealing with work-related trauma, stress, and other mental health issues. Given that James is a police officer who is experiencing work-related trauma, it's highly suitable. The operators are often former law enforcement officers themselves, and they are familiar with the unique pressures and struggles that come with the job.

2. Suicide Prevention Hotline: This hotline provides immediate assistance to individuals in suicidal crisis by connecting them to the nearest available crisis center. People experiencing trauma, like James, may have suicidal thoughts and this service could provide immediate help and support.

3. NJ Addiction Hotline: It is not uncommon for individuals undergoing trauma to turn to substance use as a coping mechanism. If James has developed or is at risk of developing a substance use disorder due to his work-related trauma, this hotli