# Metaphor Identification via RAG with a Local Ollama Model

This notebook walks through metaphor identification via retrieval-augmented generation (RAG), using a local model served by Ollama.

The following packages are needed to run this notebook:

In [None]:
!pip install pandas langchain langchain-community langchain-text-splitters langchain-ollama ollama scikit-learn tiktoken

Import packages.

In [None]:
import pandas as pd
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import SKLearnVectorStore
from langchain_ollama import OllamaEmbeddings
from ollama import chat

Please input a test text here.

In [None]:
key_text=""

Alternatively, you may choose a test text from our metaphor corpus.

(Here, we choose the first text as an example.)

In [None]:
ds_fp="data/metaphor_dataset.csv"
ds_df=pd.read_csv(ds_fp)
# Keep only the ID and text columns, and rename "plain" to "context".
ds_df=ds_df[["textid","plain"]].rename(columns={"plain":"context"})

key_text=ds_df.loc[0,"context"]

Load the metaphor protocol as plain text. It serves as the knowledge base for context retrieval in the steps below.

In [None]:
total_context_fp="data/rag_context.txt"
# Assumes the protocol file is UTF-8 encoded.
with open(total_context_fp,"r",encoding="utf-8") as f:
    total_context=f.read()

Wrap the text in a Document.

In [None]:
docs_list=[Document(page_content=total_context)]

Split the document into chunks, which are the units of retrieval.

In [None]:
text_splitter=RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=0)
doc_splits=text_splitter.split_documents(docs_list)
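
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-window chunking. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter works recursively on separators and, via `from_tiktoken_encoder`, measures length in tokens). Whitespace-separated words stand in for tokens here, and `window_chunks` is a hypothetical helper introduced only for illustration.

```python
def window_chunks(text, chunk_size, chunk_overlap=0):
    """Split text into windows of at most chunk_size words.

    Words stand in for tokens here (an assumption for illustration);
    consecutive windows share chunk_overlap words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "a b c d e f g"
no_overlap = window_chunks(sample, chunk_size=3)
with_overlap = window_chunks(sample, chunk_size=3, chunk_overlap=1)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in this notebook, no word appears in more than one chunk. A positive overlap repeats the boundary words so that a sentence cut at a chunk edge is still seen whole in at least one chunk.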

Embed the chunks and store the resulting vectors.

In [None]:
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
)

Construct a retriever and use it to fetch the chunk most relevant to the test text.

In [None]:
# The number of chunks to return is set through search_kwargs.
retriever=vectorstore.as_retriever(search_kwargs={"k":1})
retrieved_documents=retriever.invoke(key_text)
context=retrieved_documents[0].page_content
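
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The ranking step can be sketched as follows, assuming cosine similarity as the distance measure and using hand-made 3-dimensional vectors in place of real `nomic-embed-text` embeddings. `cosine` and `top_k` are hypothetical helpers, not part of LangChain.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, chunk_vecs, k=1):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings (illustrative values only).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

best = top_k(query_vec, chunk_vecs, k=1)
top_two = top_k(query_vec, chunk_vecs, k=2)
print(best, top_two)
```

With `k=1`, only the single best-matching chunk is passed to the model, which keeps the prompt short for small models at the cost of missing relevant material that lies in other chunks.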

Build the chat messages from the retrieved context and the test text.

In [None]:
p_strat=[
    {"role":"system",
     "content":"You are a helpful AI assistant. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say you don't know. DO NOT try to make up an answer. If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\n"},
    {'role': 'system',
    'content': 'You are a linguistic expert trained in metaphor identification. When the user provides a text, follow this protocol:\n• Identify all metaphorical expressions.\n• Wrap each one in <Metaphor> and </Metaphor> tags.\n• Reproduce the rest of the text exactly as written.\n• Do not include any explanation, commentary, or extra content in this message.'},
    {'role': 'user',
    'content': 'Can you please identify and tag the metaphors in the following text?\n'},
]
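# Copy each message dict so that the template p_strat itself stays unchanged.
ct=[dict(m) for m in p_strat]
# Append the retrieved context to the first system message
# and the test text to the user message.
ct[0]["content"]+=context
ct[-1]["content"]+=key_text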
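
When tagging more than one text, it helps to keep the prompt template untouched and build a fresh message list for each text. Note that `list.copy()` is shallow: the copied list still shares its message dictionaries with the original. A minimal sketch using `copy.deepcopy` follows. `build_messages` is a hypothetical helper, and the template below is a shortened stand-in for the three-message prompt in this notebook.

```python
import copy


def build_messages(template, context, text):
    """Return a new message list with context and text filled in.

    The template is deep-copied, so it is left unchanged and can be
    reused for the next text.
    """
    messages = copy.deepcopy(template)
    messages[0]["content"] += context   # retrieved protocol excerpt
    messages[-1]["content"] += text     # text to be tagged
    return messages


# Shortened stand-in for the three-message template (illustrative only).
template = [
    {"role": "system", "content": "Use the following context.\n"},
    {"role": "system", "content": "Tag metaphors with <Metaphor> tags."},
    {"role": "user", "content": "Tag the metaphors in this text:\n"},
]

first = build_messages(template, "CONTEXT A", "Time is money.")
second = build_messages(template, "CONTEXT B", "He attacked my argument.")
print(second[0]["content"])
```

Because each call starts from an unmodified template, the context retrieved for one text never leaks into the prompt for the next.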

Next, as the last step before running the model, specify which model to use.

The models we used in our paper are:

llama3.2:1b

llama3.2:3b

llama3.1:8b

deepseek-r1:8b


In [None]:
modelid="llama3.2:1b"

Note: to use the model you specify, Ollama must be installed and running. You can download Ollama here:

https://ollama.com/

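If Ollama is not running, start it with the command below. The server runs in the foreground, so you may prefer to run the command in a separate terminal rather than in a notebook cell.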

In [None]:
!ollama start

Also, if you haven't downloaded the model you specified, you can pull it with the following command.

(Here, we use llama3.2:1b as an example.)

In [None]:
!ollama pull llama3.2:1b

Send the chat to the model for inference and retrieve the result.

In [None]:
cr=chat(model=modelid, messages=ct)
rs=cr.message.content

View the result.

In [None]:
print(rs)
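
Since the prompt asks the model to wrap each metaphor in `<Metaphor>` tags, the tagged expressions can be pulled out of the response with a regular expression. A minimal sketch, assuming the model follows the tag format exactly. `extract_metaphors` and `strip_tags` are hypothetical helpers, and the example string is made up for illustration rather than taken from real model output.

```python
import re

# Non-greedy match so that each tag pair is captured separately.
METAPHOR_RE = re.compile(r"<Metaphor>(.*?)</Metaphor>", re.DOTALL)


def extract_metaphors(tagged):
    """Return the tagged expressions in order of appearance."""
    return [m.strip() for m in METAPHOR_RE.findall(tagged)]


def strip_tags(tagged):
    """Remove the <Metaphor> tags, keeping the enclosed words."""
    return METAPHOR_RE.sub(r"\1", tagged)


# Hypothetical model output, made up for illustration.
example = (
    "She <Metaphor>attacked</Metaphor> every "
    "<Metaphor>weak point</Metaphor> in my argument."
)

metaphors = extract_metaphors(example)
plain = strip_tags(example)
print(metaphors)
print(plain)
```

Comparing `strip_tags(rs)` with `key_text` gives a quick check of whether the model reproduced the text exactly, as the prompt requests.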