In [12]:
from dotenv import load_dotenv
load_dotenv()

True

In [13]:
from langchain_community.document_loaders import PyPDFLoader

def load_pdf_data(pdf_path):
    """
    this function loads text data from pdf file
    """
    loader = PyPDFLoader(file_path=pdf_path)
    documents = loader.load()
    return documents

In [14]:
general = load_pdf_data(pdf_path='/Users/sayo/personal_projects/Usafe_bot/data/general.pdf')

In [15]:

print(f"number of loaded pages: {len(general)}")

number of loaded pages: 9


In [16]:
print("________________")
print(general[0].page_content)

________________
Hate Crime Definition
● A hate crime (also known as a bias crime) is a crime where a perpetrator targets a victim due to their physical appearance or perceived membership in a specific social group. Such groups may include race, ethnicity, disability, language, nationality, political views, age, religion, sex, gender identity, or sexual orientation. Non-criminal actions motivated by these biases are often termed “bias incidents.”
• Examples of hate crimes include:
• Physical assault, homicide, damage to property
• Bullying, harassment, verbal abuse, offensive graffiti, or hate mail
History of Hate Crimes
• Term Origin: The term “hate crime” gained common usage in the U.S. during the 1980s, although similar crimes have historical roots.
• Historical Examples:
• Roman persecution of Christians, Nazi genocide of Jews, European colonial violence against indigenous peoples
• In the U.S., lynching of African Americans, cross burnings, and attacks on minority ethnic and L

## Split Document into Chunks

>- The whole document cannot be fed into the LLM at once because the context window is finite.
>- Even models with large context windows may struggle to find information in very long inputs, and answer quality can degrade badly.
>- Chunking the document into pieces lets the retriever return only the relevant parts of the corpus.
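
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding-window splitter in plain Python. It is a simplification for illustration only: `split_with_overlap` is a hypothetical helper, and it does not reproduce how `RecursiveCharacterTextSplitter` picks its split points (that class prefers to break on separators such as paragraphs and sentences).

```python
def split_with_overlap(text, chunk_size=800, chunk_overlap=80):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not the LangChain splitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.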

In [17]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=800, chunk_overlap=80):
    """
    this function splits documents into chunks of given size and overlap
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(documents=documents)
    return chunks

In [18]:
general_chunks = split_documents(general)
print(f"number of chunks: {len(general_chunks)}")

number of chunks: 21


In [19]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def create_embedding_vector_db(chunks, db_name):
    """
    this function uses the open-source embedding model HuggingFaceEmbeddings 
    to create embeddings and store those in a vector database called FAISS, 
    which allows for efficient similarity search
    """
    # instantiate embedding model
    embedding = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2'
    )
    # create the vector store 
    vectorstore = FAISS.from_documents(
        documents=chunks,
        embedding=embedding
    )
    # save vector database locally
    vectorstore.save_local(f"./vector_databases/vector_db_{db_name}")

Note to self: I can combine all of the steps above. In addition to the descriptions, add the definitions (one PDF each).

In [20]:
create_embedding_vector_db(chunks=general_chunks, db_name='general')

## Retrieve from Vector Database

In [21]:
def retrieve_from_vector_db(vector_db_path):
    """
    Load a local FAISS vector database and return a retriever for it.
    """
    # the embedding model must match the one used to build the index
    embeddings = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-mpnet-base-v2'
    )
    # load_local unpickles data from disk, so only enable
    # allow_dangerous_deserialization for index files you created yourself
    react_vectorstore = FAISS.load_local(
        folder_path=vector_db_path,
        embeddings=embeddings,
        allow_dangerous_deserialization=True
    )
    retriever = react_vectorstore.as_retriever()
    return retriever

In [22]:
general_retriever = retrieve_from_vector_db(vector_db_path='./vector_databases/vector_db_general')

In [23]:
type(general_retriever)

langchain_core.vectorstores.base.VectorStoreRetriever

#### Load the prompt

In [24]:
with open('/Users/sayo/personal_projects/Usafe_bot/data/usafe_prompt.txt', 'r') as file:
    user_prompt = file.read()

print(user_prompt)


   Usafe ChatBot Guide:

   Initial Introduction:
   Introduce Usafe to the user. Explain that their information is confidential and encourage them to share what hate crime happened to them. Ensure the message is supportive and non-judgmental because they must be traumatized. Don't be too long, be concise and to the point.

   Fallback Response:
   Provide a supportive fallback response if the user’s input is unclear or doesn’t clearly indicate a hate crime. Acknowledge their trust, ask for additional information, and offer general guidance if needed.
   Add that if they are unsure, they can ask for more information or help from a human. Explain you are a bot and sometimes you don't grasp the full context.

   Experience Acknowledgment:
   Acknowledge the user's experience empathetically. Based on the input crime type, describe the type of hate crime they may have experienced: Gender-based hate crime involves discrimination or violence directed at individuals based on gender identity 

## Generation

[`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html#langchain.chains.combine_documents.stuff.create_stuff_documents_chain)

- takes a list of documents and formats them all into a prompt, then passes that prompt to an LLM
- passes ALL documents, so make sure they fit within the context window of the LLM being used
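
The "stuffing" idea can be sketched in a few lines of plain Python. This is not the LangChain implementation: `stuff_documents`, the prompt wording, and the `max_chars` limit are all assumptions made for illustration, and a character count is only a crude stand-in for a real token budget.

```python
def stuff_documents(documents, question, max_chars=6000):
    """Join every document into a single prompt string.

    Hypothetical helper: raises if the prompt exceeds a rough character
    budget, which stands in for the model's context window.
    """
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    if len(prompt) > max_chars:
        raise ValueError(
            f"prompt has {len(prompt)} characters, budget is {max_chars}"
        )
    return prompt


docs = [
    "A hate crime is also known as a bias crime.",
    "Non-criminal actions motivated by bias are termed bias incidents.",
]
stuffed_prompt = stuff_documents(docs, "what is a hate crime")
print(stuffed_prompt)
```

Because every document is included, the only ways to stay under the budget are to retrieve fewer chunks or to make the chunks smaller.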

In [25]:
from langchain import hub
from langchain.chains.combine_documents import create_stuff_documents_chain

**chain passing user inquiry to retriever object**

[`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html#langchain.chains.retrieval.create_retrieval_chain)

- takes in a user inquiry, which is then passed to the retriever to fetch relevant documents
- those documents (and original inputs) are then passed to an LLM to generate a response
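
The flow described above can be mimicked with a toy retriever and a stand-in for the LLM. Everything in this sketch is hypothetical (`make_toy_retriever`, `run_retrieval_chain`, `echo_llm`, the word-overlap scoring); it only illustrates the retrieve-then-generate order and the `input` / `context` / `answer` keys of the result, not how LangChain builds the chain internally.

```python
def make_toy_retriever(corpus, k=2):
    """Return a function that ranks passages by shared words with the query."""
    def retrieve(query):
        query_words = set(query.lower().split())
        ranked = sorted(
            corpus,
            key=lambda doc: len(query_words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve


def run_retrieval_chain(inquiry, retrieve, generate):
    """Fetch relevant passages first, then hand them plus the inquiry to the model."""
    context = retrieve(inquiry)
    answer = generate(inquiry, context)
    return {"input": inquiry, "context": context, "answer": answer}


def echo_llm(inquiry, context):
    # stand-in for a real model call: only reports how much context it received
    return f"answered '{inquiry}' using {len(context)} passages"


corpus = [
    "a hate crime targets a victim because of perceived group membership",
    "report the incident at your local police station",
    "chunking splits long documents into smaller pieces",
]
result = run_retrieval_chain(
    "what is a hate crime", make_toy_retriever(corpus, k=1), echo_llm
)
print(result["answer"])
```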

In [26]:
from langchain.chains.retrieval import create_retrieval_chain

connect chains

In [27]:
def connect_chains(retriever):
    """
    this function connects stuff_documents_chain with retrieval_chain
    """
    stuff_documents_chain = create_stuff_documents_chain(
        llm=llm,
        prompt=hub.pull("langchain-ai/retrieval-qa-chat")
    )
    retrieval_chain = create_retrieval_chain(
        retriever=retriever,
        combine_docs_chain=stuff_documents_chain
    )
    return retrieval_chain

In [28]:
import warnings
warnings.filterwarnings("ignore")
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.02,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

output generation

In [29]:
react_retrieval_chain = connect_chains(general_retriever)

In [30]:
def print_output(
    inquiry,
    retrieval_chain=react_retrieval_chain
):
    result = retrieval_chain.invoke({"input": inquiry})
    print(result['answer'].strip("\n"))

Rakib: instead of printing only `result['answer']`, print the whole `result`. It contains all the information, including which chunks were retrieved and which PDF each chunk comes from.

Plan: 3 PDFs for each hate crime type. The bot should be able to find the correct hate crime type: first embedding, then RAG.
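
A minimal sketch of how the retrieved chunks and their source PDFs could be listed. It assumes the chain result is a dict with a `context` list of document objects whose `metadata` carries `source` and `page` entries (as `PyPDFLoader` typically sets them); `format_sources` and the fake result below are hypothetical stand-ins so the snippet runs without calling the real chain.

```python
from types import SimpleNamespace


def format_sources(result):
    """Return one line per retrieved chunk: source file, page, short preview."""
    lines = []
    for i, doc in enumerate(result.get("context", []), start=1):
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        # collapse whitespace so one-word-per-line PDF text stays readable
        preview = " ".join(doc.page_content.split())[:60]
        lines.append(f"[{i}] {source} (page {page}): {preview}")
    return lines


# stand-in for the dict returned by retrieval_chain.invoke({"input": ...})
fake_result = {
    "input": "what is hate crime",
    "context": [
        SimpleNamespace(
            page_content="A hate crime (also known as a bias crime) is a crime ...",
            metadata={"source": "general.pdf", "page": 0},
        )
    ],
    "answer": "A hate crime is ...",
}
source_lines = format_sources(fake_result)
for line in source_lines:
    print(line)
```

With the real chain, the same helper could be called on the value returned by `retrieval_chain.invoke(...)` inside `print_output`, provided the result has the shape assumed here.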

In [31]:
print_output("what is hate crime")

According to the provided context, a hate crime (also known as a bias crime) is a crime where a perpetrator targets a victim due to their physical appearance or perceived membership in a specific social group. These groups may include:

* Race
* Ethnicity
* Disability
* Language
* Nationality
* Political views
* Age
* Religion
* Sex
* Gender identity
* Sexual orientation

Examples of hate crimes include:

* Physical assault
* Homicide
* Damage to property
* Bullying
* Harassment
* Verbal abuse
* Offensive graffiti
* Hate mail


In [32]:
print_output("quais sao os crimes de odio")

De acordo com o contexto fornecido, os crimes de ódio (ou crimes motivados por preconceito) são crimes que envolvem a perseguição ou agressão a uma pessoa ou grupo de pessoas com base em suas características físicas, sociais, políticas, religiosas, sexuais, de gênero, identidade de gênero, orientação sexual, deficiência, idade, raça, etnia, nacionalidade, língua, religião, crença ou opinião política.

Exemplos de crimes de ódio incluem:

* Ataques físicos, homicídios, danos a propriedade
* Bullying, assédio, abuso verbal, grafite ofensivo, envio de correspondência ofensiva
* Lynching, queimadas de cruz, ataques a comunidades minoritárias
* Perseguição religiosa, incluindo anti-semitismo e islamofobia
* Xenofobia, crimes contra pessoas percebidas como estrangeiras ou pertencentes a uma nacionalidade ou etnia diferente

Esses crimes podem ter efeitos psicológicos negativos significativos nas vítimas, incluindo trauma, depressão, baixa autoestima e medo.


In [33]:
print_output("What are the laws in Germany in regards to hate crime against trans people")

Based on the provided context, there is no specific mention of laws in Germany regarding hate crimes against trans people. However, Germany's criminal code (StGB) does consider bias motivation during sentencing (Section 46 StGB), and there are laws that address bias motivations in sentencing, such as:

* Section 46 StGB: Enhanced sentencing for discriminatory motivations.
* Equal Treatment Act (AGG): Prohibits discrimination in public and private sectors.
* Section 130 StGB: Addresses hate speech targeting gender and sexual minorities.

It is possible that these laws may be applicable to hate crimes against trans people, but a more specific analysis would be required to determine the exact legal framework.


In [34]:
print_output("what is the advice you can give to someone who is a victim of hate crime")

Based on the provided context, here are some advice I can give to someone who is a victim of a hate crime:

1. Report the incident: Report the incident to the police as soon as possible. Make sure to provide as much detail as possible, including the time, location, and any identifying features of the perpetrator.
2. Preserve digital evidence: Save any online messages, emails, or social media posts related to the incident. Screenshots can be valuable evidence if hate speech or threats were made online.
3. Prepare for language barriers: If you don't speak the local language, consider bringing someone who speaks it to assist with translation at the police station. Alternatively, you can contact organizations that provide translation support for hate crime victims.
4. Seek support: Reach out to organizations that provide support for hate crime victims, such as ReachOut Berlin. They can offer assistance and provide a safe and confidential space to talk about your experience.
5. Document eve

In [35]:
print_output("i was victim of a hate crime, what should i do?")

I'm so sorry to hear that you've been a victim of a hate crime. It's important to take immediate action to report the incident and seek support. Here are the steps you can follow:

1. Document the incident:
	* Gather as much information as possible about the event, including:
		+ Photos or videos: Visual evidence of the incident or its aftermath.
		+ Witness statements: Contact information and statements from any witnesses.
		+ Descriptions: Detailed accounts of what happened, including time, location, and any identifying features of the perpetrator.
2. Visit Your Local Police Station:
	* Bring all collected documentation with you.
	* Explain the details of the incident, and let the officer know that you believe it to be a hate crime.
	* The police will create an official report based on your statement and evidence.
3. Report Online (Optional):
	* If you're unable to visit the police station, you may be able to file a report online through local authorities' websites or specific online

In [36]:
print_output("How do I report a hate crime in Germany?")

To report a hate crime in Germany, you can follow these steps:

1. Document the incident:
	* Gather as much information as possible about the event, including:
		+ Photos or videos: Visual evidence of the incident or its aftermath.
		+ Witness statements: Contact information and statements from any witnesses.
		+ Descriptions: Detailed accounts of what happened, including time, location, and any identifying features of the perpetrator.
2. Visit your local police station:
	* Bring all collected documentation with you.
	* Explain the details of the incident to the officer and let them know you believe it to be a hate crime.
	* The police will create an official report based on your statement and evidence.
3. Report online (optional):
	* If you're unable to visit the police station, you may be able to file a report online through local authorities' websites or specific online reporting platforms.
	* For Berlin residents, you can use the Berlin Police Online Reporting Portal to report inci

In [37]:
print_output("give me the list of all resources")

Here is the list of resources mentioned in the context:

1. Online Strafanzeige (Online Criminal Complaint)
2. Meldestelle Respect!
3. Antidiskriminierungsstelle des Bundes (Federal Anti-Discrimination Agency)
4. Verband der Beratungsstellen für Betroffene rechter, rassistischer und antisemitischer Gewalt (VBRG)
5. GLADT e.V.
6. Hydra e.V.
7. LesMigraS
8. Antidiskriminierungsverband Deutschland (advd)
9. Neuemedienmacher
10. Gesellschaft für Freiheitsrechte – Marie Munk Initiative

These resources include:

* Reporting mechanisms and platforms
* NGOs (Non-Governmental Organizations)
* Legal aid
* Counseling services
* Advocacy organizations
* Support groups for specific communities (e.g. LGBTQ+, sex workers, Black and People of Color)


In [42]:
print_output("what is the history of hate crime")

According to the provided context, the term "hate crime" gained common usage in the U.S. during the 1980s, although similar crimes have historical roots. Some historical examples of hate crimes include:

* Roman persecution of Christians
* Nazi genocide of Jews
* European colonial violence against indigenous peoples
* Lynching of African Americans in the U.S.
* Cross burnings and attacks on minority ethnic and LGBTQ+ communities in the U.S.

In recent times, hate crimes have been documented in various forms, such as:

* Anti-Chinese violence surging during the COVID-19 pandemic, as documented by organizations like the "NEVER AGAIN" Association in Poland.

It's worth noting that the concept of hate crimes has evolved over time, and the term "hate crime" has become more widely used in recent decades to describe crimes motivated by bias or prejudice against specific groups.


In [44]:
print_output('what are the consequences of hate crime')

According to the provided context, the consequences of hate crimes include:

1. Impacts on Individuals:
	* Trauma
	* Depression
	* Low self-esteem
2. Targeted Group:
	* Increased fear and vulnerability
3. Broader Community:
	* Division
	* Weakened multicultural society
4. Victim Reactions:
	* Symptoms may include PTSD, depression, and avoidance behaviors
5. Educational and Socioeconomic Outcomes:
	* Studies suggest that hate crimes also negatively impact educational and socioeconomic outcomes for affected groups


In [46]:
print_output("i was assaulted because i'm black")

I'm so sorry to hear that you were assaulted. It's unacceptable that you were targeted because of your race.
