In [1]:
from langchain_core.prompts import ChatPromptTemplate

from chroma import vector_store_manager
from config import template, model
from process_pdfs import load_tokenized_documents
from retrieval_methods import hybrid_retrieval
import re

No sentence-transformers model found with name dicta-il/dictabert. Creating a new one with mean pooling.
Some weights of BertModel were not initialized from the model checkpoint at dicta-il/dictabert and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# Analysis Metrics:

In the following analysis, we focus on five qualitative metrics:

1. **Relevance of the Answer** - How relevant the answer is, based on the relevant document.
2. **Completeness of Information** - Whether the model used all the details in the provided context.
3. **Clarity and Fluency** - How readable and coherent the response is.
4. **Accuracy compared to the document** - Whether the model hallucinated any information that is not in the document.
5. **Overall Assessment** - A general evaluation of how well the response performs across the previous four metrics.
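
The five metrics can also be kept as a structured record, one per question, which makes the notes easier to compare later. This is only a sketch: the `QuestionAssessment` class, its field names, and the placeholder notes are hypothetical and are not part of the notebook's code.

```python
from dataclasses import dataclass, asdict


@dataclass
class QuestionAssessment:
    """Free-text notes for one question, one field per metric (hypothetical structure)."""
    question_id: int
    relevance: str
    completeness: str
    clarity: str
    accuracy: str
    overall: str


# Placeholder notes for illustration only; they are not results from this notebook.
example = QuestionAssessment(
    question_id=1,
    relevance="note on relevance",
    completeness="note on completeness",
    clarity="note on clarity and fluency",
    accuracy="note on accuracy compared to the document",
    overall="overall note",
)

# asdict() turns the record into a plain dict, e.g. for building a summary table.
print(asdict(example))
```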

In [2]:
tokenized_documents_clean = load_tokenized_documents()

Loaded precomputed tokenized data from tokenized_pdfs_clean.pkl


In [103]:
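def get_answer(question, tokenized_documents=tokenized_documents_clean):
    """Retrieve context for `question` and return (answer, retrieved_docs)."""
    # Use the DictaBERT embedding model for retrieval.
    vector_store_manager.update_embedding_model('dicta-il/dictabert')

    # Each retrieved item is indexed with [0] to reach the document itself.
    retrieved_docs = hybrid_retrieval(question, tokenized_documents)
    docs_content = "\n\n".join(doc[0].page_content for doc in retrieved_docs)

    # Fill the prompt template with the question and the joined context.
    prompt_template = ChatPromptTemplate.from_template(template)
    prompt = prompt_template.invoke({"question": question, "context": docs_content})

    # Send the rendered prompt text to the model.
    answer = model.invoke(prompt.messages[0].content)

    return answer, retrieved_docs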

In [5]:
def remove_think_tags(text: str) -> str:
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()
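
A quick check of what `remove_think_tags` does may help here. The sketch below repeats the helper so that it runs on its own; the sample strings are invented for illustration and are not model outputs from this notebook.

```python
import re


def remove_think_tags(text: str) -> str:
    # Same pattern as the helper above: drop every <think>...</think> span,
    # with DOTALL so that the span may contain newlines.
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()


# Invented sample: a reasoning block followed by the final answer.
raw = "<think>\nThe context mentions a final project.\n</think>\n\nFinal answer."
print(remove_think_tags(raw))  # Final answer.

# An unclosed tag does not match the pattern, so the text is left as-is.
unclosed = "<think>reasoning that was cut off"
print(remove_think_tags(unclosed))  # <think>reasoning that was cut off
```

The non-greedy `.*?` matters when a response contains more than one `<think>` block: each block is removed separately instead of everything between the first opening tag and the last closing tag.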

In [76]:
def print_nicely(question, answer, relevant_documents):
    """Print the question, the answer without <think> blocks, and the chunks from the relevant PDF."""
    print(f"Question: \n{question} \n\nAnswer: \n{remove_think_tags(answer)}")
    print('\nOut of the top 10 retrieved documents, This is the ones that came from the relevant pdf:')
    if relevant_documents:
        for i, document in enumerate(relevant_documents, start=1):
            print(f"\nRelevant Document number {i}: \n{document}")
    else:
        print("\nNo relevant documents found")

In [104]:
def answer_and_print(question, source=None):
    """Answer `question` and print the retrieved chunks that come from `source`."""
    answer, docs_retrieved = get_answer(question)

    if source is not None:
        # Keep only the chunks whose metadata points at the expected PDF.
        documents = [
            doc[0].page_content
            for doc in docs_retrieved
            if doc[0].metadata['source'] == source
        ]
    else:
        documents = []

    print_nicely(question, answer, documents)

# Question 1

In [95]:
question = "כיצד בנוי הפרויקט המסכם בשנה הרביעית להנדסת מערכות מידע?"
source = '372-2025.pdf'

answer_and_print(question, source)

Question: 
כיצד בנוי הפרויקט המסכם בשנה הרביעית להנדסת מערכות מידע? 

Answer: 
הפרויקט המסכם בשנה הרביעית להנדסת מערכות מידע בנוי כעבודה מקיפה ואינטגרטיבית, שבה הסטודנטים נדרשים לבנות את הידע והיכולת שלהם ממגוון קורסים. הוא כולל מרכיבי אפליקציה מוליכים ולימדנויות בתחום, כמו פיתוח מערכות וניתוח נתונים.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

Relevant Document number 1: 
טטיסטיקה ועוד. החל מהשנה
השנייה התכנית כוללת מגוון רחב של קורסים בתחום של הנדסת מערכות מידע ותוכנה. השנה הרביעית כוללת
מגוון של קורסי בחירה במערכות מידע וכן פרויקט מסכם שנתי שבו נדרש התלמיד לבצע עבודה מקיפה
ואינטגרטיבית. המחלקה מעודדת את הסטוד

Relevant Document number 2: 
כותית )למשל בתחום הרובוטיקה(. הסטודנטים יהיו רשאים לקחת קורסים אלו
בכפוף לאישור ועדת ההוראה של המחלקה.
32
 ד. עבודת התזה בתחום הבינה המלאכותית
כל תלמיד לתואר השני בהנדסת מערכות מידע, ובכלל זה בלימודי המגמה לבינה מלאכותית ומערכות
תוכנה אוטונומיות, חייב לבצע ת

Relevant Document number 3: 
" ניתן למצוא

### Question 1 Analysis

1. **Relevance of the Answer** - The answer describes the final project, as asked.
2. **Completeness of Information** - The model understood the context properly and presented all the relevant information.
3. **Clarity and Fluency** - Except for one typo, "ולימדנויות", the answer is very clear.
4. **Accuracy compared to the document** - The answer is well grounded in the context and is accurate.
5. **Overall Assessment** - Overall, the model did a good job on this question, answering with relevant and accurate information.

# Question 2

In [77]:
question = "מהי הגישה העיקרית שמתוארת בתחום הנדסת מערכות?"
source = '383-2025.pdf'

answer_and_print(question, source)

Question: 
מהי הגישה העיקרית שמתוארת בתחום הנדסת מערכות? 

Answer: 
הגישה העיקרית שמתוארת בתחום הנדסת מערכות היא גישה מולטי-דיסיפלינארית, המורכבת משילוב תחומים שונים כמו דינמיקה, אnergיה, טכנולוגיות מידע וסטטיסטיקה. הגישה הזו מאפשרת לבנות מערכות מורכבות יותר ולפתוח פתרונות מתקדמים לתפקוד והפעלה של מערכות אלו.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

Relevant Document number 1: 
נות כלים בתחום היזמות והמצוינות בניהול פרויקטים. מקצועות הנדסיים רבים עברו שינויים רציפים – מעבר
מהנדסה קלאסית במכונות ,חשמל ואלקטרוניקה לדיספלינות מערכתיות כגון רובוטיקה ,מערכות אנרגיה ,
טכנולוגיות רפואיות ,מערכות מידע ותקשורת ועוד. הצורך לנתח ולתכנן


### Question 2 Analysis

1. **Relevance of the Answer** - The answer is very relevant to the context and addresses the question properly.
2. **Completeness of Information** - The answer covers only some of the disciplines listed in the document. It also does not address how the disciplines changed.
3. **Clarity and Fluency** - The answer is mostly readable, except for a single typo: "אnergיה" instead of "אנרגיה".
4. **Accuracy compared to the document** - The model mentions disciplines such as statistics and dynamics, which do not appear in the relevant document.
5. **Overall Assessment** - Overall, the answer contains some hallucinations and a typo.
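
Typos like "אnergיה", where Latin letters appear inside a Hebrew word, can be flagged automatically. The sketch below is one possible check, not something the notebook does: `scripts_in` and `mixed_script_tokens` are hypothetical helpers, and taking the first word of each character's Unicode name as its script is a rough heuristic.

```python
import unicodedata

PUNCTUATION = ".,;:!?\"'()[]"


def scripts_in(token: str) -> set[str]:
    """Return the scripts of the letters in `token`, e.g. {'HEBREW', 'LATIN'}."""
    # The first word of a character's Unicode name is used as its script (heuristic).
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in token if ch.isalpha()}


def mixed_script_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens whose letters come from more than one script."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in tokens if len(scripts_in(tok)) > 1]


# Fragment of the answer to Question 2.
sample = "שילוב תחומים שונים כמו דינמיקה, אnergיה, טכנולוגיות מידע וסטטיסטיקה."
print(mixed_script_tokens(sample))  # ['אnergיה']
```

A check like this only catches mixed-script tokens; an ordinary misspelling written entirely in Hebrew letters would still need a manual read or a spell checker.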

# Question 3

In [93]:
question = "אילו תחומי התמחות מוצעים במסלול לניהול ויזמות בהנדסה כימית?"
source = '363-2025.pdf'

answer_and_print(question,source)

Question: 
אילו תחומי התמחות מוצעים במסלול לניהול ויזמות בהנדסה כימית? 

Answer: 
The Management and Entrepreneurship program offers several specializations:

1. **Management and Entrepreneurship**: Prepares students for leadership roles in both traditional industries and startups, focusing on marketing, entrepreneurship, and organizational aspects.

2. **Energy, Water, and Advanced Technologies**: Focuses on environmental engineering and green technologies, preparing students for work in energy and water sectors.

3. **Biochemicals**: Involves development of biochemical processes for the pharmaceutical and food industries.

4. **Research Focus**: Provides early exposure to academic research skills and topics.

Additionally, there's a dual degree program combining Chemical Engineering with Chemistry: Nanotechnology.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

Relevant Document number 1: 
לקתית.
מסלולי ההתמחות :
מסלול לניהול ויזמות
מהנדסי כ

### Question 3 Analysis

1. **Relevance of the Answer** - The answer gives exactly what was asked for, discussing the different programs in the management and entrepreneurship program.
2. **Completeness of Information** - The model used almost all the information. It discussed the different programs, but it left out the nanotechnology program.
3. **Clarity and Fluency** - The answer is clear, but it is not in the same language as the question.
4. **Accuracy compared to the document** - The model did not give wrong information and stuck to the context.
5. **Overall Assessment** - Overall, this is a good answer; the model did a good job.
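
The language mismatch noted above can be checked mechanically by comparing the dominant script of the question and of the answer. This is a minimal sketch under stated assumptions: `dominant_script` and `same_language` are hypothetical helpers, and counting Hebrew versus Latin letters is only a proxy for language that suits a Hebrew/English setting like this one.

```python
def dominant_script(text: str) -> str:
    """Return 'hebrew', 'latin', or 'unknown' by counting letters of each script."""
    hebrew = sum(1 for ch in text if "\u0590" <= ch <= "\u05ff")  # Hebrew Unicode block
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if hebrew == 0 and latin == 0:
        return "unknown"
    return "hebrew" if hebrew >= latin else "latin"


def same_language(question: str, answer: str) -> bool:
    """True when the question and the answer are dominated by the same script."""
    return dominant_script(question) == dominant_script(answer)


# Question 3 and the first line of its answer.
question = "אילו תחומי התמחות מוצעים במסלול לניהול ויזמות בהנדסה כימית?"
answer = "The Management and Entrepreneurship program offers several specializations:"
print(same_language(question, answer))  # False
```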

# Question 4

In [79]:
question = "מהם כיווני הקריירה הפתוחים בפני בוגרי הנדסת ביוטכנולוגיה? "
source = "369-2025.pdf"

answer_and_print(question, source)

Question: 
מהם כיווני הקריירה הפתוחים בפני בוגרי הנדסת ביוטכנולוגיה? 

Answer: 
For Biotechnology Engineering graduates, career opportunities include roles in renewable energy sectors such as geothermal, solar, hydroelectric, and wind, where they can work in consulting and planning. Additionally, they can pursue advanced studies in master's or doctoral programs, leading to research and development roles or management positions in both private and public sectors. Graduates may also explore high-tech industries or academic careers, leveraging opportunities for interdisciplinary knowledge and specialized expertise.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

No relevant documents found


### Question 4 Analysis

1. **Relevance of the Answer** - There are no relevant documents, so this metric is not applicable.
2. **Completeness of Information** - There are no relevant documents, so this metric is not applicable.
3. **Clarity and Fluency** - The answer is not in the same language as the question.
4. **Accuracy compared to the document** - Highly inaccurate. No relevant documents were found, yet the model still produced an answer. This happened because of the following retrieved context, which describes the energy engineering program:

עוד .לאור דרישה גוברת למהנדסים ומנהלים בתחומי אנרגיה גיאותרמית,
סולרית ,הידרואלקטרית, ואנרגית רוח, יוכלו בוגרי התוכנית להשתלב בחברות יעוץ ותכנון בתחומים חדשניים אלו.
תנאי הרשמה
הלימודים בתוכנית להנדסת אנרגיה פתוחים בפני מסיימי תואר ראשון בהנדסה (B.Sc) ממוסד

This context was provided to the model, but the model did not recognize that it is not relevant to the question.

5. **Overall Assessment** - Overall, the model was completely wrong.
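
The failure described here (an answer produced even though no chunk from the expected PDF was retrieved) can be separated from other outcomes with a small labelling helper. The sketch below is hypothetical: `classify_outcome`, its parameters, and its label strings are assumptions introduced for illustration, and deciding whether a response counts as a refusal is left to the caller.

```python
def classify_outcome(source_expected: bool, n_relevant_docs: int, refused: bool) -> str:
    """Label one evaluated question (hypothetical labels).

    source_expected: True when the question has a known source PDF.
    n_relevant_docs: how many retrieved chunks came from that PDF.
    refused: True when the model said it does not know.
    """
    if n_relevant_docs > 0:
        return "refused despite relevant context" if refused else "answered from retrieved context"
    if refused:
        return "refused (no relevant context)"
    # No relevant chunk was retrieved, yet the model answered anyway.
    if source_expected:
        return "retrieval miss, ungrounded answer"
    return "out-of-scope question, ungrounded answer"


# The situation in Question 4: a source PDF was expected, none of its chunks
# were retrieved, and the model still answered.
print(classify_outcome(source_expected=True, n_relevant_docs=0, refused=False))
# retrieval miss, ungrounded answer
```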

# Question 5

In [105]:
question = "האם טיוואן זוהי מדינה?"

answer_and_print(question)

Question: 
האם טיוואן זוהי מדינה? 

Answer: 
אין די מידע כדי לענות על השאלה. אני לא יודע.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

No relevant documents found


### Question 5 Analysis

1. **Relevance of the Answer** - The model didn't answer, so this metric is not applicable.
2. **Completeness of Information** - No relevant documents, so this metric is not applicable.
3. **Clarity and Fluency** - The model answered in Hebrew, the language of the question. The answer is readable.
4. **Accuracy compared to the document** - No relevant documents, so this metric is not applicable.
5. **Overall Assessment** - The question has no relevant information in the PDFs, so "אני לא יודע" is the answer we wanted from the model. However, it should not have added "אין די מידע כדי לענות על השאלה".
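
Whether a response is an "I don't know" style answer can be checked with simple phrase matching. This is a rough sketch, not part of the notebook: `is_refusal` and the `REFUSAL_MARKERS` list are hypothetical, and the English markers are assumptions about how the model might phrase a refusal.

```python
# Hypothetical marker list; extend it with the phrasings the model actually uses.
REFUSAL_MARKERS = (
    "אני לא יודע",
    "i don't know",
    "i do not know",
    "i do not have information",
)


def is_refusal(answer: str) -> bool:
    """True when the answer contains one of the known refusal phrases."""
    normalized = answer.lower()  # lower() only affects the Latin-script markers
    return any(marker in normalized for marker in REFUSAL_MARKERS)


# The answer to Question 5.
print(is_refusal("אין די מידע כדי לענות על השאלה. אני לא יודע."))  # True
```

Substring matching is deliberately loose: it accepts a refusal that is wrapped in extra text, which is the case discussed in the overall assessment above.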

# Question 6

In [82]:
question = "אילו יישומים רפואיים נכללים בתחום ההנדסה הגרעינית?"
source = "366-2025.pdf"

answer_and_print(question, source)

Question: 
אילו יישומים רפואיים נכללים בתחום ההנדסה הגרעינית? 

Answer: 
The field of nuclear engineering encompasses several medical applications, including diagnostic tools like MRI machines, advanced drug delivery systems utilizing nanoparticles, high-resolution biological imaging techniques, bio-electronic devices, and developments in nuclear medicine such as radiation therapy for treating conditions like cancer. These innovations highlight the significant role nuclear engineering plays in advancing healthcare solutions.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

Relevant Document number 1: 
בוגרינו השכלה
רחבה ככל שניתן בתחום הכללי של הנדסה גרעינית ומדעי הגרעין )כולל פיזיקה יישומית ואלגוריתמיקה(, ולהכין
אותם למשרות בתחומים נרחבים בתעשייה, במחקר וברפואה.
2
 חברי סגל היחידה
פרופסור חבר
פרופ' ארז גלעד - ראש היחידה
פרופ' יצחק אוריון
מרצה בכירד"ר ל

Relevant Document number 2: 
ית היחידה במדינת ישראל המכשירה חוקרים ומהנדסים בתחומים
החיוני

### Question 6 Analysis

1. **Relevance of the Answer** - The answer is not very relevant to the relevant documents; it gives more information than they contain.
2. **Completeness of Information** - The relevant information is mentioned in the answer, but not completely: the answer notes radiation therapy but gives no further details on it.
3. **Clarity and Fluency** - The answer is, once again, not in the same language as the question. It is fluent, though.
4. **Accuracy compared to the document** - Largely inaccurate. The retrieved documents don't contain much information for the answer, but the model found information elsewhere that it considered relevant. I suspect it came from the context printed in the next code cell.
5. **Overall Assessment** - Overall, the model was not completely wrong, because it did mention radiation therapy, but it gave the wrong overall impression of the correct answer.

# Question 7

In [78]:
question = "מהם התחומים המרכזיים אותם מכסה המחלקה להנדסה אזרחית וסביבתית?"
source = "374-2025.pdf"

answer_and_print(question, source)

Question: 
מהם התחומים המרכזיים אותם מכסה המחלקה להנדסה אזרחית וסביבתית? 

Answer: 
The Department of Civil and Environmental Engineering covers several key areas:

1. **Environmental Engineering**: Focused on addressing national infrastructure needs and future challenges through a unique double-degree program in building structures and earth sciences.

2. **Structural Engineering**: Specializes in the design and analysis of structures within the built environment.

3. **Industrial Engineering**: Includes tracks such as performance research, algorithms, systems, learning systems, and data mining.

4. **Bioengineering/Computational Medicine**: A joint program with health sciences applying computational methods to medical fields.

5. **Secure Systems**: Offers advanced studies in electrical and computer engineering for cybersecurity expertise.

These areas reflect a comprehensive approach to both traditional civil engineering and emerging interdisciplinary fields.

Out of the top 10 retr

### Question 7 Analysis

1. **Relevance of the Answer** - The answer is relevant, but it was given in English instead of Hebrew.
2. **Completeness of Information** - The model took the parts of the documents it needed and expanded on them based on its own knowledge.
3. **Clarity and Fluency** - The answer is not in the same language as the question.
4. **Accuracy compared to the document** - The answer is actually longer than the documents provided. The model also hallucinated topics such as industrial engineering and secure systems, which were not mentioned in the context.
5. **Overall Assessment** - The model mostly gave relevant and true information, but it generated the answer in the wrong language and hallucinated some topics.

# Question 8

In [106]:
question = "מדוע חשוב לשלב קורסים באנגלית בתכנית להנדסת תוכנה?"
source = "373-2025.pdf"

answer_and_print(question, source)

Question: 
מדוע חשוב לשלב קורסים באנגלית בתכנית להנדסת תוכנה? 

Answer: 
השלב של קורסים באנגלית בתכנית להנדסת תוכנה חשוב כדי לספק לתלמידים את הכשרה הדרושה לע世ת העולמית, כולל יכולת קריאה וכתיבה באנגלית בהקשר טכנולוגי. בנוסף, הקורסים בשפה הזו משפרים את המומחיות של הסטודנטים, מה שמאפשר להם להבין וללavor עם חומרテכני מובgliון. לבסוף, זוהי דרישה רשמית לתואר הראשון במדעי המחשב והנדסה.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

Relevant Document number 1: 
יד שלא השלים את הקורסים העודפים, אך השלים תכנית לימודים מלאה של הנדסת תוכנה ועמד ביתר
התנאים של המגמה, יוכל לקבל תואר ראשון בהנדסת תוכנה במגמת פסגות.
13
 

Relevant Document number 2: 
הנדסת תוכנה.
קורסים בשפה האנגלית: החל משנת הלימודים תשפ "ב, חייב כל סטודנט ללמוד במסגרת לימודיו לתואר ראשון
במחלקה לפחות שני הקורסים בשפה האנגלית, בהיקף של לפחות 2 נק"ז. כדי לעמוד בחובה זו , תלמידי
התוכנית להכנסת תוכנה ילמדו את הקורס מבוא לתקשורת נת

Relevant Document number 3: 

והרוח. השנה הרביעית מתמקדת בקורס

### Question 8 Analysis

1. **Relevance of the Answer** - The answer describes exactly why it is important to take courses in English in the software engineering program.
2. **Completeness of Information** - The answer mentions that students need to take courses in English, but it doesn't mention how many.
3. **Clarity and Fluency** - The answer is not clear: it has multiple major typos and even contains some Chinese characters. The model did return most of the answer in the language of the question.
4. **Accuracy compared to the document** - The answer is not well grounded in the context, as the context doesn't mention anything about how reading and writing skills help with understanding technical material.
5. **Overall Assessment** - Overall, the answer was relevant, but it wasn't accurate and had multiple major typos.

# Question 9

In [108]:
question = "מי ניצח בפרסי הגראמי בשנת 2025?"

answer_and_print(question)

Question: 
מי ניצח בפרסי הגראמי בשנת 2025? 

Answer: 
I do not have information about the 2025 Grammy Awards winners based on the provided context.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

No relevant documents found


### Question 9 Analysis

1. **Relevance of the Answer** - No relevant document, so this metric is not applicable.
2. **Completeness of Information** - No relevant document, so this metric is not applicable.
3. **Clarity and Fluency** - The answer is readable, but it is not in the same language as the question.
4. **Accuracy compared to the document** - No relevant document, so this metric is not applicable.
5. **Overall Assessment** - The model answered as we hoped, saying it doesn't know. A better answer, though, would have been a short "I don't know".

# Question 10

In [109]:
question = "איך קראו למטוס הסודי של חיל האוויר הישראלי שהיה מסוגל לטוס במהירות פי 5 ממהירות הקול בשנות ה-80?"

answer_and_print(question)

Question: 
איך קראו למטוס הסודי של חיל האוויר הישראלי שהיה מסוגל לטוס במהירות פי 5 ממהירות הקול בשנות ה-80? 

Answer: 
I do not have information on the specific Israeli stealth fighter jet from the 1980s.

Out of the top 10 retrieved documents, This is the ones that came from the relevant pdf:

No relevant documents found


### Question 10 Analysis

1. **Relevance of the Answer** - No relevant document, so this metric is not applicable.
2. **Completeness of Information** - No relevant document, so this metric is not applicable.
3. **Clarity and Fluency** - The answer is readable, but it is not in the same language as the question.
4. **Accuracy compared to the document** - No relevant document, so this metric is not applicable.
5. **Overall Assessment** - The model answered as we hoped, saying it doesn't know. A better answer, though, would have been a short "I don't know".