In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from langchain_community.document_loaders import PyPDFLoader

def load_pdf_data(pdf_path):
    """
    this function loads a PDF file and returns its text as a list of documents, one per page
    """
    loader = PyPDFLoader(file_path=pdf_path)
    documents = loader.load()
    return documents

In [3]:
general = load_pdf_data(pdf_path = '/Users/sayo/personal_projects/Usafe_bot/data/general.pdf')

In [4]:

print(f"number of loaded pages: {len(general)}")

number of loaded pages: 9


In [5]:
print("________________")
print(general[0].page_content)

________________
Hate Crime Definition
● A hate crime (also known as a bias crime) is a crime where a perpetrator targets a victim due to their physical appearance or perceived membership in a specific social group. Such groups may include race, ethnicity, disability, language, nationality, political views, age, religion, sex, gender identity, or sexual orientation. Non-criminal actions motivated by these biases are often termed “bias incidents.”
• Examples of hate crimes include:
• Physical assault, homicide, damage to property
• Bullying, harassment, verbal abuse, offensive graffiti, or hate mail
History of Hate Crimes
• Term Origin: The term “hate crime” gained common usage in the U.S. during the 1980s, although similar crimes have historical roots.
• Historical Examples:
• Roman persecution of Christians, Nazi genocide of Jews, European colonial violence against indigenous peoples
• In the U.S., lynching of African Americans, cross burnings, and attacks on minority ethnic and L
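PDF text extraction can leave irregular line breaks and stray spaces before punctuation (for example `ethnicity ,`). The following is a minimal sketch of a cleanup step that could run before chunking. The helper name `normalize_whitespace` is hypothetical and is not part of the notebook or of LangChain.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs (including newlines) into single spaces
    and drop the space that extraction can leave before punctuation."""
    collapsed = " ".join(text.split())
    return re.sub(r"\s+([,.;:!?])", r"\1", collapsed)

# hypothetical sample mimicking text extracted with one token per line
sample = "Such\ngroups\nmay\ninclude\nrace,\nethnicity ,\ndisability ,\nlanguage,"
print(normalize_whitespace(sample))
```

Note that this also removes paragraph breaks, so it may not suit documents where blank lines carry structure.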

## Split Document into Chunks

>- the whole document cannot be fed into the LLM at once because the context window is finite
>- even models with large context windows may struggle to locate information in very long inputs, and answer quality can degrade
>- chunking the document into pieces lets the retriever return only the passages relevant to a query
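To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, which this sketch does not attempt. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

text = "abcdefghij" * 3  # 30 characters of toy input
chunks = chunk_text(text, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.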

In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=800, chunk_overlap=80):
    """
    this function splits documents into chunks of given size and overlap
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(documents=documents)
    return chunks

In [7]:
general_chunks = split_documents(general)
print(f"number of chunks: {len(general_chunks)}")

number of chunks: 21


In [8]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def create_embedding_vector_db(chunks, db_name):
    """
    this function embeds the chunks with the open-source model
    sentence-transformers/all-mpnet-base-v2 (loaded via HuggingFaceEmbeddings)
    and stores the embeddings in a FAISS vector store,
    which allows for efficient similarity search
    """
    # instantiate embedding model
    embedding = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2'
    )
    # create the vector store 
    vectorstore = FAISS.from_documents(
        documents=chunks,
        embedding=embedding
    )
    # save vector database locally
    vectorstore.save_local(f"./vector_databases/vector_db_{db_name}")

In [9]:
create_embedding_vector_db(chunks=general_chunks, db_name='general')

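The idea behind similarity search can be sketched with toy vectors: each chunk is stored next to its embedding, and a query is answered by ranking chunks by how close their vectors are to the query vector. The three-dimensional vectors and the helper names below are made up for illustration. Cosine similarity is used here for simplicity; the metric the FAISS store actually applies depends on how it is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# toy "vector database": (chunk text, made-up embedding)
index = [
    ("definition of hate crime", [0.9, 0.1, 0.0]),
    ("how to report to the police", [0.1, 0.9, 0.1]),
    ("counseling and support services", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # made-up embedding of a definition-style question
print(top_k(query, index, k=2))
```

A real index avoids the full scan shown here, but the ranking principle is the same.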


## Retrieve from Vector Database

In [10]:
def retrieve_from_vector_db(vector_db_path):
    """
    this function returns a retriever object built from a local vector database
    """
    # instantiate the same embedding model that was used to build the index
    embeddings = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2'
    )
    # load_local unpickles stored data, so only enable
    # allow_dangerous_deserialization for files you created yourself
    vectorstore = FAISS.load_local(
        folder_path=vector_db_path,
        embeddings=embeddings,
        allow_dangerous_deserialization=True
    )
    retriever = vectorstore.as_retriever()
    return retriever

In [11]:
general_retriever = retrieve_from_vector_db(vector_db_path='./vector_databases/vector_db_general')

In [12]:
type(general_retriever)

langchain_core.vectorstores.base.VectorStoreRetriever

#### Load the prompt

In [13]:
with open('/Users/sayo/personal_projects/Usafe_bot/data/usafe_prompt.txt', 'r') as file:
    user_prompt = file.read()

print(user_prompt)


   Usafe ChatBot Guide:

   Initial Introduction:
   Introduce Usafe to the user. Explain that their information is confidential and encourage them to share what hate crime happened to them. Ensure the message is supportive and non-judgmental because they must be traumatized. Don't be too long, be concise and to the point.

   Fallback Response:
   Provide a supportive fallback response if the user’s input is unclear or doesn’t clearly indicate a hate crime. Acknowledge their trust, ask for additional information, and offer general guidance if needed.
   Add that if they are unsure, they can ask for more information or help from a human. Explain you are a bot and sometimes you don't grasp the full context.

   Experience Acknowledgment:
   Acknowledge the user's experience empathetically. Based on the input crime type, describe the type of hate crime they may have experienced: Gender-based hate crime involves discrimination or violence directed at individuals based on gender identity 

## Generation

[`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html#langchain.chains.combine_documents.stuff.create_stuff_documents_chain)

- takes a list of documents, formats them all into a single prompt, then passes that prompt to an LLM
- passes ALL documents, so make sure the combined prompt fits within the context window of the LLM being used
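The "stuff" strategy described above can be sketched in plain Python: concatenate every document into one context block and fail early when the result is too long. This is an illustration under assumptions, not the LangChain implementation. The function name, the prompt wording, and the character budget `max_chars` are hypothetical, and a real check would count tokens rather than characters.

```python
def stuff_documents(docs, question, max_chars=6000):
    """Join all documents into one prompt and enforce a rough size budget."""
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    if len(prompt) > max_chars:
        raise ValueError(
            f"prompt has {len(prompt)} characters, budget is {max_chars}"
        )
    return prompt

docs = [
    "A hate crime is also known as a bias crime.",
    "Non-criminal actions motivated by bias are often termed bias incidents.",
]
prompt = stuff_documents(docs, "what is hate crime")
print(prompt)
```

Because nothing is dropped or summarized, the number of retrieved chunks and the chunk size together determine how large the prompt can get.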

In [14]:
from langchain import hub
from langchain.chains.combine_documents import create_stuff_documents_chain

**chain passing user inquiry to retriever object**

[`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html#langchain.chains.retrieval.create_retrieval_chain)

- takes in a user inquiry, which is then passed to the retriever to fetch relevant documents
- those documents (and original inputs) are then passed to an LLM to generate a response
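The two steps above can be sketched as a small pipeline with stand-in components. The keyword-overlap retriever and the echo-style generator below are hypothetical placeholders for the vector store retriever and the LLM. The result dictionary uses the keys `input`, `context`, and `answer`, matching the keys the notebook's chain output shows further down.

```python
def make_retrieval_chain(retrieve, generate):
    """Build a callable that retrieves documents, then generates an answer."""
    def invoke(inputs):
        docs = retrieve(inputs["input"])
        answer = generate(inputs["input"], docs)
        # keep the original inputs and add the retrieved context and the answer
        return {**inputs, "context": docs, "answer": answer}
    return invoke

corpus = [
    "A hate crime is also known as a bias crime.",
    "Reports can be filed at a local police station.",
]

def keyword_retriever(query, k=1):
    """Stand-in retriever: rank documents by number of shared lowercase words."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def echo_llm(question, docs):
    """Stand-in generator: returns the top document instead of calling a model."""
    return docs[0] if docs else "No relevant context found."

chain = make_retrieval_chain(keyword_retriever, echo_llm)
result = chain({"input": "what is a hate crime"})
print(result["answer"])
```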

In [15]:
from langchain.chains.retrieval import create_retrieval_chain

**connect chains**

In [16]:
def connect_chains(retriever):
    """
    this function connects stuff_documents_chain with retrieval_chain;
    it relies on the module-level `llm`, which must be defined before the function is called
    """
    stuff_documents_chain = create_stuff_documents_chain(
        llm=llm,
        prompt=hub.pull("langchain-ai/retrieval-qa-chat")
    )
    retrieval_chain = create_retrieval_chain(
        retriever=retriever,
        combine_docs_chain=stuff_documents_chain
    )
    return retrieval_chain

In [17]:
import warnings
warnings.filterwarnings("ignore")
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.02,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

**output generation**

In [18]:
react_retrieval_chain = connect_chains(general_retriever)

In [31]:
def print_output(
    inquiry,
    retrieval_chain=react_retrieval_chain
):
    result = retrieval_chain.invoke({"input": inquiry})
    print(result['answer'].strip("\n"))

In [32]:
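# `output` is not defined in the cells shown; assumed here to be a raw chain result
output = react_retrieval_chain.invoke({"input": "what is hate crime"})
type(output)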

dict

In [33]:
output.keys()

dict_keys(['input', 'context', 'answer'])

In [36]:
print_output("what is hate crime")

According to the provided context, a hate crime (also known as a bias crime) is a crime where a perpetrator targets a victim due to their physical appearance or perceived membership in a specific social group. These groups may include:

* Race
* Ethnicity
* Disability
* Language
* Nationality
* Political views
* Age
* Religion
* Sex
* Gender identity
* Sexual orientation

Examples of hate crimes include:

* Physical assault
* Homicide
* Damage to property
* Bullying
* Harassment
* Verbal abuse
* Offensive graffiti
* Hate mail


In [37]:
print_output("quais sao os crimes de odio")

De acordo com o contexto fornecido, os crimes de ódio (ou crimes motivados por preconceito) são crimes que envolvem a perseguição ou agressão a uma pessoa ou grupo de pessoas com base em suas características físicas, sociais, políticas, religiosas, sexuais, de gênero, identidade de gênero, orientação sexual, deficiência, idade, nacionalidade, etnia, raça, língua, religião, crença ou opinião política.

Exemplos de crimes de ódio incluem:

* Ataques físicos, homicídios, danos a propriedade
* Bullying, assédio, abuso verbal, grafite ofensivo, envio de correspondência ofensiva
* Lynching, queimadas de cruz, ataques a comunidades minoritárias
* Perseguição religiosa, incluindo anti-semitismo e islamofobia
* Xenofobia, crimes contra pessoas percebidas como estrangeiras ou pertencentes a uma nacionalidade ou etnia diferente

Esses crimes podem ter efeitos psicológicos negativos significativos nas vítimas, incluindo trauma, depressão, baixa autoestima e medo.


In [38]:
print_output("what is the advice you can give to someone who is a victim of hate crime")

Based on the provided context, here are some pieces of advice that can be given to someone who is a victim of a hate crime:

1. Report the incident: It is essential to report the hate crime to the authorities as soon as possible. This can be done by contacting the police or other relevant authorities in your area.

2. Document everything: Keep a record of what happened, including the time, location, and any identifying features of the perpetrator. Also, save any online messages, emails, or social media posts related to the incident.

3. Preserve digital evidence: Take screenshots of any online messages, emails, or social media posts that contain hate speech or threats. These can be valuable evidence in the investigation.

4. Prepare for language barriers: If you don't speak the local language, consider bringing someone who speaks it to assist with translation at the police station. Alternatively, you can contact organizations that provide translation support for hate crime victims.

5.

In [39]:
print_output("i was victim of a hate crime, what should i do?")

I'm so sorry to hear that you've been a victim of a hate crime. It's important to take immediate action to report the incident and seek support. Here are the steps you can follow:

1. Document the incident:
	* Gather as much information as possible about the event, including:
		+ Photos or videos: Visual evidence of the incident or its aftermath.
		+ Witness statements: Contact information and statements from any witnesses.
		+ Descriptions: Detailed accounts of what happened, including time, location, and any identifying features of the perpetrator.
2. Visit Your Local Police Station:
	* Bring all collected documentation with you.
	* Explain the details of the incident, and let the officer know that you believe it to be a hate crime.
	* The police will create an official report based on your statement and evidence.
3. Report Online (Optional):
	* If you're unable to visit the police station, you may be able to file a report online through local authorities' websites or specific online

In [40]:
print_output("i neeed therapy")

I'm so sorry to hear that you're going through a tough time. It's completely understandable that you might need some support and guidance to cope with the aftermath of a hate crime.

According to the context, you can consider seeking mental health support from professionals or communities who can help you through this difficult time. You can find more information by clicking on the 🧩 Connecting with Local Support link, which provides assistance for individuals facing hate crime incidents.

Additionally, you can also reach out to ReachOut Berlin, which offers support for individuals facing hate crime incidents. You can contact them through their email (info@reachoutberlin.de) or use their contact form available on their website.

Remember, you are not alone, and there are people who care and want to help.


In [41]:
print_output("give me the list of all resources")

Here is the list of resources mentioned in the context:

1. Online Strafanzeige (Online Criminal Complaint)
2. Meldestelle Respect!
3. Antidiskriminierungsstelle des Bundes (Federal Anti-Discrimination Agency)
4. Verband der Beratungsstellen für Betroffene rechter, rassistischer und antisemitischer Gewalt (VBRG)
5. GLADT e.V.
6. Hydra e.V.
7. LesMigraS
8. Antidiskriminierungsverband Deutschland (advd)
9. Neuemedienmacher
10. Gesellschaft für Freiheitsrechte – Marie Munk Initiative

These resources include:

* Reporting mechanisms and platforms
* NGOs (Non-Governmental Organizations)
* Legal aid
* Counseling services
* Advocacy organizations
* Support groups for specific communities (e.g. LGBTQ+, sex workers, Black and People of Color)
