# Search Engine
## **Mana Saleh Al Aqeel**

## Import Necessary Libraries

In [None]:
import pandas as pd
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import pytorch_cos_sim as cos_sim
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pyarabic.araby as ar
import re
from tashaphyne.stemming import ArabicLightStemmer

Load the Arabic embedding model:

In [None]:
model = SentenceTransformer("acayir64/arabic-embedding-model-only-pairs")

## Preprocessing Data

In [3]:
df = pd.read_csv('CompanyReviews.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,review_description,rating,company
0,0,رائع,1,talbat
1,1,برنامج رائع جدا يساعد على تلبيه الاحتياجات بشك...,1,talbat
2,2,التطبيق لا يغتح دائما بيعطيني لا يوجد اتصال با...,-1,talbat
3,3,لماذا لا يمكننا طلب من ماكدونالدز؟,-1,talbat
4,4,البرنامج بيظهر كل المطاعم و مغلقه مع انها بتكو...,-1,talbat


In [4]:
def preprocess_arabic_text(text):
    if not isinstance(text, str):
        raise ValueError("Input must be a string.")
    
    if not text.strip():
        return ""  

    stemmer = ArabicLightStemmer()
    
    text = ar.normalize_hamza(text)
    text = ar.normalize_ligature(text)
    text = ar.strip_tashkeel(text)
    text = ar.strip_tatweel(text)
    text = ar.strip_harakat(text)

    text = re.sub(r'[^أ-يa-zA-Z0-9\s]', '', text)
    
    tokens = ar.tokenize(text)
    
    processed_tokens = [stemmer.light_stem(token) for token in tokens]
    
    processed_text = ' '.join(processed_tokens)
    
    return processed_text

### `preprocess_arabic_text` Function

This function prepares Arabic text for analysis by performing the following steps:

1. **Input Validation**
   - Ensures the input is a string. If it isn't, an error is raised.

2. **Empty String Check**
   - Returns an empty string if the input consists only of whitespace.

3. **Initialization**
   - Sets up a stemmer to reduce words to their root forms.

4. **Normalization and Cleaning**
   - Standardizes the text by normalizing variations of specific characters and removing diacritics and elongation marks.

5. **Remove Punctuation**
   - Strips out all non-essential characters, retaining only Arabic letters, English letters, digits, and spaces.

6. **Tokenization**
   - Splits the text into individual words or tokens.

7. **Stemming**
   - Applies stemming to each token to reduce it to its base or root form.

8. **Reconstruct Processed Text**
   - Joins the processed tokens back into a single string.

9. **Return Processed Text**
   - Outputs the cleaned and processed text.
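The cleaning steps above can be illustrated without `pyarabic`. The following is a minimal sketch using only the standard library, not the notebook's actual implementation: the code-point range used for diacritics is an assumption written out by hand, `clean_arabic` is a hypothetical name, and hamza normalization and stemming are left out. The character class mirrors the notebook's `أ-ي` range (U+0623 to U+064A), which does not include the standalone hamza `ء` (U+0621) or `آ` (U+0622).

```python
import re

# Arabic short-vowel and related marks (fathatan .. sukun), assumed range U+064B-U+0652
DIACRITICS = re.compile(r'[\u064B-\u0652]')
TATWEEL = '\u0640'  # elongation character
# Same character class as the notebook: Arabic letters U+0623-U+064A,
# Latin letters, digits, and whitespace
NOT_ALLOWED = re.compile(r'[^\u0623-\u064Aa-zA-Z0-9\s]')

def clean_arabic(text):
    """Strip diacritics and tatweel, then drop characters outside the allowed set."""
    text = DIACRITICS.sub('', text)
    text = text.replace(TATWEEL, '')
    text = NOT_ALLOWED.sub('', text)
    return ' '.join(text.split())  # collapse repeated whitespace

# A word written with diacritics and trailing punctuation is reduced to bare letters
print(clean_arabic('\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627!!'))
```

Tatweel is removed before the character filter because U+0640 falls inside the allowed letter range and would otherwise survive.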


In [5]:
text = df['review_description'].fillna('')

In [6]:
df['preprocessed_text'] = text.apply(preprocess_arabic_text)

In [7]:
df[['review_description', 'preprocessed_text']].head()

Unnamed: 0,review_description,preprocessed_text
0,رائع,راع
1,برنامج رائع جدا يساعد على تلبيه الاحتياجات بشك...,رنامج راع جد ساعد على لب احتياج شكل سرع
2,التطبيق لا يغتح دائما بيعطيني لا يوجد اتصال با...,تطبيق لا يغتح دام يعطين لا وجد تصال شبكةمع نه ...
3,لماذا لا يمكننا طلب من ماكدونالدز؟,ماذ لا مك طلب من ماكدونالدز
4,البرنامج بيظهر كل المطاعم و مغلقه مع انها بتكو...,برنامج يظهر كل مطاعم و مغلق مع نه تكون تح قال ...


In [8]:
doc = df['preprocessed_text'].fillna('').tolist()

In [9]:
len(doc)

40046

### Encoding Document into Embeddings

In [None]:
doc_embedding = model.encode(doc , convert_to_tensor= True)

In [None]:
doc_embedding.shape

### Saving Tensor to File

- **Purpose**: To persist the tensor for future use, avoiding the need to recompute it.

In [None]:
torch.save(doc_embedding, 'doc_embedding.pt')

In [None]:
embedding = torch.load('doc_embedding.pt' , map_location=torch.device('cpu'))

---

## Semantic search and TFIDF

In [12]:
def semantic_search(query, top_k=3):
    query_embedding = model.encode(preprocess_arabic_text(query) , convert_to_tensor=True)
    
    cosine_scores = cos_sim(query_embedding, embedding)[0]
    
    top_indices = np.argpartition(-cosine_scores.cpu().numpy(), range(top_k))[:top_k]
    
    print(f"Query: {query}\nTop {top_k} most similar documents:")
    for idx in top_indices:
        print(f"Document: {text[idx]}, Score: {cosine_scores[idx].item():.4f}")

### `semantic_search` Function

The `semantic_search` function performs a search to find the most similar documents to a given query based on semantic embeddings.

1. **Encode Query**: The function starts by converting the preprocessed query into an embedding vector using a model. This embedding represents the semantic content of the query.

2. **Compute Similarities**: It then calculates the cosine similarity between the query embedding and the embeddings of all documents. Cosine similarity measures how similar the query is to each document.

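3. **Identify Top-k Results**: The function selects the `top_k` highest-scoring documents with `np.argpartition` on the negated scores. Passing `range(top_k)` as the partition positions leaves the first `top_k` indices in descending order of similarity without sorting the whole array.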

4. **Display Results**: Finally, it prints the query and the top-k most similar documents along with their similarity scores. This helps in understanding which documents are most relevant to the query based on their semantic content.

**Purpose**: This function is used to retrieve and rank documents that are semantically closest to a given query, facilitating effective information retrieval.
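The ranking step can be sketched in pure Python on toy vectors. This is an illustration under stated assumptions rather than the notebook's code: the 2-D "embeddings" are made up, and `cosine` and `top_k_similar` are hypothetical helpers that use `heapq.nlargest` in place of `np.argpartition`.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_similar(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents, best first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    best = heapq.nlargest(top_k, range(len(scores)), key=scores.__getitem__)
    return [(i, scores[i]) for i in best]

# Toy 2-D "embeddings" (made up for illustration)
docs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.0)]
for idx, score in top_k_similar((1.0, 0.0), docs, top_k=2):
    print(idx, round(score, 4))
```

Like the partition-based selection, this avoids a full sort when only a few results are needed.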


In [13]:
while True:
    user_query = input("Enter your query (or type 'exit' to quit): ")
    if user_query.strip().lower() in ('q', 'exit'):
        break
    semantic_search(user_query)

Enter your query (or type 'exit' to quit):  يساعد على تلبيه الاحتياجات


Query: يساعد على تلبيه الاحتياجات
Top 3 most similar documents:
Document: يساعدني علي سداد فواتيري, Score: 0.6929
Document: التطبيق إلى قبل افضل, Score: 0.6752
Document: ساعد أكثر, Score: 0.6697


Enter your query (or type 'exit' to quit):  يساعد على تلبيه 


Query: يساعد على تلبيه 
Top 3 most similar documents:
Document: كدبين ولاد كلب, Score: 0.7150
Document: يساعد, Score: 0.7140
Document: الحمد الله بدون دف, Score: 0.7004


Enter your query (or type 'exit' to quit):  q


---

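TF-IDF converts each document into a sparse vector of term weights that reflect how important each term is within a document relative to the whole collection. The vectorizer is fitted on the preprocessed documents, so queries must pass through the same preprocessing to share its vocabulary.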

In [14]:
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(doc)
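For intuition about what `TfidfVectorizer` produces, here is a small pure-Python sketch. It assumes scikit-learn's default weighting (smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization) and plain whitespace tokenization, which is simpler than the vectorizer's real tokenizer; `tfidf_vectors` is a hypothetical helper and the English toy documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

vecs = tfidf_vectors(["good app", "good service", "bad app"])
# "service" occurs in one document and "good" in two, so "service" gets the larger weight
print(vecs[1])
```

Because every vector has unit length, the cosine similarity between two documents reduces to the dot product over their shared terms.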

In [15]:
def hybrid_search(query, top_k=3, weight_st=.5, weight_tfidf=.5):
    preprocessed_query = preprocess_arabic_text(query)
    query_embedding = model.encode(preprocessed_query , convert_to_tensor=True)
    
    cosine_scores = cos_sim(query_embedding, embedding)[0]
    
    query_tfidf = tfidf_vectorizer.transform([preprocessed_query])
    tfidf_similarities = cosine_similarity(query_tfidf, tfidf_matrix)[0]

    combined_scores = (weight_st * cosine_scores.cpu().numpy() + weight_tfidf * tfidf_similarities) / (weight_st + weight_tfidf)
    #normalize_scores = (combined_scores - np.min(combined_scores)) / (np.max(combined_scores) - np.min(combined_scores))

    top_results = np.argpartition(-combined_scores, range(top_k))[:top_k]

    print(f"Query: {query}\nTop {top_k} most similar documents:")
    for idx in top_results:
        print(f"Document: {text[idx]}, Combined Score: {combined_scores[idx]:.4f}")


### `hybrid_search` Function

The `hybrid_search` function performs a semantic and TF-IDF based search to find the most relevant documents to a query.

1. **Preprocess Query**: The query is preprocessed using `preprocess_arabic_text`.

2. **Compute Semantic Similarity**:
   - The preprocessed query is encoded into an embedding vector.
   - Cosine similarity is computed between the query embedding and document embeddings.

3. **Compute TF-IDF Similarity**:
   - The query is transformed into a TF-IDF vector.
   - Cosine similarity is calculated between the query's TF-IDF vector and the TF-IDF matrix of the documents.

4. **Combine Scores**:
   - The semantic and TF-IDF similarities are combined using weighted averages.
   - **`weight_st`** and **`weight_tfidf`** determine the contribution of semantic and TF-IDF scores, respectively.

5. **Identify Top Results**:
   - The top-k documents are selected based on the combined scores.

6. **Display Results**:
   - The function prints the query and the top-k most similar documents along with their combined scores.

**Purpose**: To retrieve documents that are most similar to the query by combining semantic and TF-IDF based similarities for improved relevance.
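The score combination in step 4 can be sketched on made-up numbers. The helper names `combine_scores` and `min_max` are hypothetical; `min_max` corresponds to the normalization that is commented out in `hybrid_search`, shown here only to illustrate what enabling it would do.

```python
def combine_scores(semantic, tfidf, weight_st=0.5, weight_tfidf=0.5):
    """Weighted average of two equal-length score lists."""
    total = weight_st + weight_tfidf
    return [(weight_st * s + weight_tfidf * t) / total for s, t in zip(semantic, tfidf)]

def min_max(scores):
    """Rescale scores to [0, 1]; returns zeros if all scores are equal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

semantic = [0.70, 0.60, 0.20]   # made-up cosine similarities
tfidf = [0.00, 0.60, 0.90]      # made-up TF-IDF similarities
combined = combine_scores(semantic, tfidf)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking, min_max(combined))
```

In this toy case the document that scores moderately on both signals ranks first, ahead of documents that are strong on only one of them.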


In [16]:
while True:
    user_query = input("Enter your query (or type 'exit' to quit): ")
    if user_query.strip().lower() in ('q', 'exit'):
        break
    hybrid_search(user_query)

Enter your query (or type 'exit' to quit):  يساعد على تلبيه الاحتياجات


Query: يساعد على تلبيه الاحتياجات
Top 3 most similar documents:
Document: برنامج رائع جدا يساعد على تلبيه الاحتياجات بشكل اسرع, Combined Score: 0.6480
Document: يساعد, Combined Score: 0.5833
Document: ساعد أكثر, Combined Score: 0.5286


Enter your query (or type 'exit' to quit):  يساعد على تلبيه 


Query: يساعد على تلبيه 
Top 3 most similar documents:
Document: يساعد, Combined Score: 0.6789
Document: الب, Combined Score: 0.6560
Document: كلب, Combined Score: 0.6560


Enter your query (or type 'exit' to quit):  q


---

## Faiss

In [17]:
faiss.Index

faiss.swigfaiss_avx2.Index

In [18]:
print(text[84])
print(doc[84])
print(embedding[84])

كثير العلانات على فاضي طورو حالكم
ثير علان على ضي طور حال
tensor([-3.1479e-02, -5.7117e-01,  3.1469e-01, -1.1622e-01,  8.7758e-02,
        -1.6345e-01, -3.2968e-01,  5.8213e-01,  1.1912e-01,  3.6421e-01,
        -7.7369e-01, -1.6374e-02,  3.5413e-01,  3.7312e-01, -6.4841e-01,
         4.4944e-01, -1.8668e-01, -2.9280e-01, -2.0952e-01,  3.6817e-01,
        -2.4352e-01, -4.6627e-01, -1.2433e+00, -4.9719e-01, -1.0315e+00,
         3.0023e-01,  2.0854e-01,  3.5614e-01,  1.3048e+00,  7.0860e-02,
         3.4594e-01, -2.4703e-03,  3.0673e-01, -9.1111e-01, -5.8031e-01,
         1.2260e-01,  8.2773e-01,  2.4389e-01,  3.9348e-01, -2.7214e-01,
         2.5922e-01,  6.0404e-01, -8.6121e-02, -3.7856e-01, -1.2655e-01,
         5.8635e-01, -2.7992e-02,  5.2387e-01, -2.2740e-01, -4.9450e-01,
         3.6209e-01,  9.4444e-02,  2.7760e-02, -1.3656e+00, -3.1027e-01,
        -9.1427e-01, -2.8379e-01,  9.4310e-01,  3.2024e-01, -1.5282e-01,
         1.6006e-01,  7.0945e-01,  7.0040e-01,  5.6697e-02, -1.654

- **Dimension**:
  - `dimension = embedding.shape[1]` extracts the dimensionality of the embeddings. This is the number of features each embedding has.

- **Initialize FAISS Index**:
  - `index = faiss.IndexFlatL2(dimension)` creates a FAISS index for L2 (Euclidean) distance search.
    - **`faiss.IndexFlatL2`**: An index type in FAISS that performs exact nearest neighbor search using L2 distance.
    - **`dimension`**: Specifies the number of dimensions in the embeddings.

**Purpose**: To create an index structure that enables efficient nearest neighbor searches based on L2 distance, suitable for similarity searches and clustering.
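Conceptually, a flat index compares the query with every stored vector. The sketch below imitates that in pure Python on toy data; `flat_l2_search` is a hypothetical helper, not a FAISS function. It returns squared L2 distances, which is also what `IndexFlatL2` reports.

```python
def flat_l2_search(vectors, query, k):
    """Exhaustive search: return (distances, indices) of the k nearest vectors.

    Distances are squared L2 values, sorted from nearest to farthest.
    """
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, query)), i)
        for i, v in enumerate(vectors)
    )
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

vectors = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]  # toy data
D, I = flat_l2_search(vectors, (1.0, 0.5), k=2)
print(D, I)
```

The cost grows linearly with the number of stored vectors, which is the motivation for partitioned indexes on larger collections.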

In [19]:
dimension = embedding.shape[1]
index = faiss.IndexFlatL2(dimension)

### L2 Distance (Euclidean Distance)

- **Definition**: L2 distance, also known as Euclidean distance, measures the straight-line distance between two points in a multi-dimensional space.

- **Formula**: For two points $\mathbf{x}$ and $\mathbf{y}$ in an n-dimensional space, the L2 distance is calculated as:
  $$
  \text{L2 Distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
  $$
  where $x_i$ and $y_i$ are the coordinates of the points in the i-th dimension.

- **Characteristics**:
  - **Euclidean Geometry**: It is the most familiar distance metric, akin to measuring distance with a ruler.
  - **Sensitivity**: Because each coordinate difference is squared, a few large differences dominate the result, and the distance depends on vector magnitude as well as direction.

- **Use Cases**:
  - Commonly used in clustering algorithms, nearest neighbor searches, and many machine learning algorithms where spatial relationships are important.

**Purpose**: To quantify the similarity or dissimilarity between vectors based on their spatial distance in a multi-dimensional feature space.
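The earlier searches rank by cosine similarity while the FAISS index ranks by L2 distance, so it may help to see how the two relate. For unit-length vectors, $\lVert \mathbf{x} - \mathbf{y} \rVert^2 = 2 - 2\cos(\mathbf{x}, \mathbf{y})$, so both measures give the same ordering; for unnormalized vectors the orderings can differ. The sketch below checks the identity numerically on toy vectors; the helper names are hypothetical.

```python
import math

def l2_distance(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def normalize(x):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in x))
    return tuple(a / norm for a in x)

x, y = normalize((3.0, 4.0)), normalize((1.0, 0.0))
squared = l2_distance(x, y) ** 2
# For unit vectors the two quantities agree up to floating-point error
print(squared, 2 - 2 * cosine(x, y))
```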


The `is_trained` attribute of a FAISS index indicates whether the index has been trained. A flat index such as `IndexFlatL2` needs no training, so it reports `True` immediately.

In [20]:
index.is_trained

True

The `index.add(embedding)` method adds vectors (embeddings) to a FAISS index.

- **Purpose**:
  - **Index Population**: This method is used to populate the index with the vectors you want to search. It stores these vectors so that you can later query the index for nearest neighbors.


- **Note**:
  - Ensure that the dimensions of `embedding` match the dimensionality of the FAISS index.

In [21]:
index.add(embedding)

To check if vectors have been successfully added to the index, you can compare the value of `index.ntotal` before and after adding vectors.

In [22]:
index.ntotal

40046

In [23]:
def first_faiss(query, k=10):
    preprocessed_query = preprocess_arabic_text(query)
    # FAISS expects a 2-D array of shape (n_queries, dimension)
    query_embedding = model.encode(preprocessed_query, convert_to_tensor=True).unsqueeze(0).cpu().numpy()

    D, I = index.search(query_embedding, k)

    # Look up each document by the index FAISS returned, not by its rank position;
    # FAISS pads with -1 when fewer than k neighbors are found
    results = [f'{i}: {text[i]}' for i in I[0] if 0 <= i < len(text)]

    for result in results:
        print(result)

    return results

### `first_faiss` Function

The `first_faiss` function performs a search using a FAISS index to find the most similar documents to a given query.

1. **Preprocess Query**:
   - The query is preprocessed using `preprocess_arabic_text` to clean and prepare it for embedding.

2. **Generate Query Embedding**:
   - The preprocessed query is converted into an embedding vector using the model. The embedding is then reshaped and converted to a NumPy array.

3. **Search the FAISS Index**:
   - `index.search(query_embedding, k)` performs a search in the FAISS index to find the top `k` most similar documents.
     - **`D`**: Distances of the nearest neighbors.
     - **`I`**: Indices of the nearest neighbors.

4. **Format Results**:
   - Each returned index is paired with the original review text stored at that index.

**Purpose**: To find and display the top `k` most similar documents to the query using FAISS for efficient nearest neighbor search.


In [24]:
while True:
    user_query = input("Enter your query (or type 'q' to quit): ")
    if user_query.lower() == 'q':
        break
    first_faiss(user_query)

Enter your query (or type 'q' to quit):  يساعد على تلبيه الاحتياجات


38305: يساعدني علي سداد فواتيري
19573: ساعد أكثر
28545: التطبيق إلى قبل افضل
12515: طلبات فقط للامارات
11068: احب طلبات وايد ويسعدني اذا يوعانه
38347: Yyy help with the remedy for abroken
15357: يساعد
31381: رووووووووووعه الله يعطيكم العافي
18475: طلبات ساعدني كثيرا احببته
25478: يعطيكم العافيه عل  مجهودكم الأكثر من رائع


Enter your query (or type 'q' to quit):  q


In [25]:
nlist = 100
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)

### Creating an IVF FAISS Index

- **Number of Lists**:
  - `nlist = 100` sets the number of clusters or lists in the IVF index. This value determines how many partitions the data is divided into for searching.

- **Quantizer**:
  - `quantizer = faiss.IndexFlatL2(dimension)` creates the coarse quantizer. It holds the `nlist` cluster centroids and uses exact L2 search to assign each vector, and later each query, to its nearest centroid.

- **Index Initialization**:
  - `index = faiss.IndexIVFFlat(quantizer, dimension, nlist)` creates an IVF index that stores the full, uncompressed vectors in the inverted list of their nearest centroid.
    - **`quantizer`**: The index used to assign vectors to clusters.
    - **`dimension`**: The dimensionality of the vectors.
    - **`nlist`**: The number of partitions or clusters in the index.

**Purpose**: To create an IVF index that partitions data into multiple lists for efficient nearest neighbor searches, improving search speed and scalability compared to a flat index.
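The partition-then-probe idea can be sketched in pure Python. This is a toy model under stated assumptions, not how FAISS is implemented: the centroids are hand-picked (FAISS learns them with k-means during `train`), the 2-D points are made up, and `build_lists` and `ivf_search` are hypothetical helpers. It shows that probing a single list can miss the true nearest neighbor when it sits just across a partition boundary.

```python
def sq_l2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_lists(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: sq_l2(query, vectors[vid]))[:k]

# Hand-picked centroids and points for illustration
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(3.0, 0.0), (0.0, 0.0), (5.2, 0.0), (10.0, 0.0)]
lists = build_lists(vectors, centroids)
query = (4.9, 0.0)

print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # probes one list
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # probes every list
```

With one probe the search returns vector 0, while probing both lists finds vector 2, which is closer to the query but stored in the other partition.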


In [26]:
index.is_trained

False

In [27]:
index.train(embedding)
index.is_trained

True

In [28]:
index.add(embedding)
index.ntotal

40046

In [29]:
query = preprocess_arabic_text("يساعد على تلبيه الاحتياجات")

In [30]:
query

'ساعد على لب احتياج'

In [31]:
k = 10
xq = model.encode(query , convert_to_tensor=True).unsqueeze(0).numpy()

In [32]:
%%time
D, I = index.search(xq, k) 
print(I)

[[28545 18094 22516 15620 22798 10595 23623 27043 13297   807]]
CPU times: user 745 µs, sys: 13 µs, total: 758 µs
Wall time: 627 µs


In [33]:
[f'{i}: {text[i]}' for i in I[0]]

['28545: التطبيق إلى قبل افضل',
 '18094: احب اقولكم تطبيقم مابي خير',
 '22516: معرفت اطلب ساعدوني',
 '15620: لانة يساعدني لاكل',
 '22798: حلووو انصحكم فيه 😍😍',
 '10595: جميل فعال بنصح بتستعمالو بينجز',
 '23623: انصحكم تحملو😍😍😍',
 '27043: انصحكم تحملو',
 '13297: انصحكم بهذا البرناج',
 '807: افضل تجربه على الإطلاق']

In [34]:
index.nprobe =10000

### Setting `nprobe` in FAISS

- **Definition**:
  - `index.nprobe = 10000` sets the number of clusters (or partitions) to be searched during a query. The FAISS default is `nprobe = 1`, which is what the previous search used.

- **Purpose**:
  - **Control Search Scope**: `nprobe` determines how many of the `nlist` clusters are examined during the search. A higher `nprobe` value means more clusters are searched, potentially improving recall but increasing search time.
  - **Balance**: Adjusting `nprobe` allows you to balance between search accuracy and performance.

- **Note**:
  - This index has only `nlist = 100` clusters, so any `nprobe` of 100 or more scans every cluster. With `nprobe = 10000` the search is effectively exhaustive: it returns the same indices the flat index found, and the wall time rises from 627 µs to 6.9 ms.


In [35]:
%%time
D, I = index.search(xq, k)  
print(I)

[[38305 19573 28545 12515 11068 38347 15357 31381 18475 25478]]
CPU times: user 6.27 ms, sys: 2.97 ms, total: 9.24 ms
Wall time: 6.9 ms


In [36]:
[f'{i}: {text[i]}' for i in I[0]]

['38305: يساعدني علي سداد فواتيري',
 '19573: ساعد أكثر',
 '28545: التطبيق إلى قبل افضل',
 '12515: طلبات فقط للامارات',
 '11068: احب طلبات وايد ويسعدني اذا يوعانه',
 '38347: Yyy help with the remedy for abroken',
 '15357: يساعد',
 '31381: رووووووووووعه الله يعطيكم العافي',
 '18475: طلبات ساعدني كثيرا احببته',
 '25478: يعطيكم العافيه عل  مجهودكم الأكثر من رائع']

---

## Faiss and TFIDF

In [37]:
def faiss_tfidf(query, k=10):
    preprocessed_query = preprocess_arabic_text(query)    
    query_embedding = model.encode(preprocessed_query, convert_to_tensor=True).unsqueeze(0).numpy()
    
    query_tfidf = tfidf_vectorizer.transform([preprocessed_query])
    tfidf_similarities = cosine_similarity(query_tfidf, tfidf_matrix)[0]
    
    index.nprobe = 10000
    D, I = index.search(query_embedding, k)
    
    results = []
    for idx in I[0]:
        if idx < len(text):
            results.append(f'{idx}: {text[idx]}')
    
    for result in results:
        print(result)
    
    return results

### `faiss_tfidf` Function

The first version of `faiss_tfidf` computes both a semantic embedding and TF-IDF similarities for the query, but it ranks documents using the FAISS search alone; the TF-IDF similarities are computed and not yet used.

1. **Preprocess Query**:
   - The query is preprocessed using `preprocess_arabic_text` to ensure it is clean and suitable for embedding.

2. **Generate Query Embedding**:
   - The preprocessed query is converted into an embedding vector using the model, reshaped, and converted to a NumPy array.

3. **Compute TF-IDF Similarity**:
   - The query is transformed into a TF-IDF vector.
   - Cosine similarity is calculated between the query’s TF-IDF vector and the TF-IDF matrix of the documents.

4. **Search FAISS Index**:
   - `index.nprobe = 10000` sets the number of clusters to be searched in the FAISS index.
   - `index.search(query_embedding, k)` performs a search to find the top `k` most similar documents based on the semantic embedding.

5. **Format and Display Results**:
   - The function collects the results by retrieving the text of the documents corresponding to the indices found.
   - Results are printed and returned as a list of formatted strings showing the document index and content.

**Purpose**: To retrieve and display the top `k` documents nearest to the query in embedding space. Because the TF-IDF similarities do not enter the ranking in this version, the output matches the plain FAISS search with `nprobe = 10000`; the second version below adds the weighted combination.


In [38]:
query = 'يساعد على تلبيه الاحتياجات'
results = faiss_tfidf(query)

38305: يساعدني علي سداد فواتيري
19573: ساعد أكثر
28545: التطبيق إلى قبل افضل
12515: طلبات فقط للامارات
11068: احب طلبات وايد ويسعدني اذا يوعانه
38347: Yyy help with the remedy for abroken
15357: يساعد
31381: رووووووووووعه الله يعطيكم العافي
18475: طلبات ساعدني كثيرا احببته
25478: يعطيكم العافيه عل  مجهودكم الأكثر من رائع


In [39]:
def faiss_tfidf(query, k=10, weight_st=.5, weight_tfidf=.8):
    preprocessed_query = preprocess_arabic_text(query)    
    query_embedding = model.encode(preprocessed_query, convert_to_tensor=True).unsqueeze(0).numpy()
    
    query_tfidf = tfidf_vectorizer.transform([preprocessed_query])
    tfidf_similarities = cosine_similarity(query_tfidf, tfidf_matrix)[0]
    
    index.nprobe = 10000
    D, I = index.search(query_embedding, k)
    
    results = []
    # Pair each returned index with its distance; FAISS pads missing results with -1
    for dist, idx in zip(D[0], I[0]):
        if 0 <= idx < len(text):
            # FAISS similarity: 1 / (distance + epsilon) to avoid division by zero
            faiss_similarity = 1 / (dist + 1e-2)
            # TF-IDF similarity
            tfidf_similarity = tfidf_similarities[idx]
            # Compute weighted score
            weighted_score = weight_st * faiss_similarity + weight_tfidf * tfidf_similarity
            results.append((weighted_score, idx, text[idx]))
    
    results.sort(reverse=True, key=lambda x: x[0])
    
    for score, idx, doc in results:
        print(f'Score: {score:.4f} - Index: {idx} - Document: {doc}')
    
    return results

### The second `faiss_tfidf` Function

The `faiss_tfidf` function performs a hybrid search using both FAISS and TF-IDF methods to retrieve the most relevant documents for a given query.

1. **Preprocess Query**:
   - The query is preprocessed with `preprocess_arabic_text` to ensure it is clean and ready for embedding.

2. **Generate Query Embedding**:
   - The preprocessed query is converted into an embedding vector using the model, reshaped, and converted to a NumPy array.

3. **Compute TF-IDF Similarity**:
   - The query is transformed into a TF-IDF vector.
   - Cosine similarity is computed between the query’s TF-IDF vector and the TF-IDF matrix of the documents.

4. **Search FAISS Index**:
   - `index.nprobe = 10000` specifies the number of clusters to be searched in the FAISS index.
   - `index.search(query_embedding, k)` performs the search to find the top `k` most similar documents based on semantic embeddings.

5. **Compute Weighted Scores**:
   - For each retrieved document:
     - **FAISS Similarity**: Calculated as `1 / (distance + epsilon)`, where `distance` is the squared L2 distance returned by FAISS and `epsilon = 0.01` avoids division by zero.
     - **TF-IDF Similarity**: Directly from the computed TF-IDF similarities.
     - **Weighted Score**: Combined using specified weights (`weight_st` for FAISS and `weight_tfidf` for TF-IDF).

6. **Sort and Display Results**:
   - Results are sorted by weighted score in descending order.
   - The function prints and returns the results, showing the score, index, and document content.

**Purpose**: To find and display the top `k` documents that are most relevant to the query by combining FAISS-based semantic similarity with TF-IDF similarity, using weighted scoring to balance the two methods.
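The effect of the `1 / (distance + epsilon)` conversion can be seen on made-up numbers. The sketch below is illustrative only: `rerank` is a hypothetical helper, and the document ids, distances, and TF-IDF values are invented. When distances are in the hundreds, the FAISS term contributes only about 0.003 to the score, so any nonzero TF-IDF similarity decides the order.

```python
def rerank(candidates, tfidf_scores, weight_st=0.5, weight_tfidf=0.8, eps=1e-2):
    """Re-rank (doc_id, squared_distance) candidates by a weighted score, best first."""
    scored = []
    for doc_id, dist in candidates:
        faiss_similarity = 1.0 / (dist + eps)  # shrinks quickly as distance grows
        score = weight_st * faiss_similarity + weight_tfidf * tfidf_scores.get(doc_id, 0.0)
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return scored

candidates = [(7, 170.0), (3, 180.0), (9, 200.0)]  # made-up ids and distances
tfidf_scores = {3: 0.5, 9: 0.25}                    # made-up TF-IDF similarities
for score, doc_id in rerank(candidates, tfidf_scores):
    print(doc_id, round(score, 4))
```

Document 7 is the nearest in embedding space but ends up last because it has no term overlap. If a more balanced blend is wanted, one option is to rescale both signals to a common range before weighting.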


In [40]:
query = 'يساعد على تلبيه الاحتياجات'
results = faiss_tfidf(query)

Score: 0.4029 - Index: 15357 - Document: يساعد
Score: 0.3128 - Index: 19573 - Document: ساعد أكثر
Score: 0.2412 - Index: 38305 - Document: يساعدني علي سداد فواتيري
Score: 0.2091 - Index: 18475 - Document: طلبات ساعدني كثيرا احببته
Score: 0.0029 - Index: 28545 - Document: التطبيق إلى قبل افضل
Score: 0.0028 - Index: 12515 - Document: طلبات فقط للامارات
Score: 0.0027 - Index: 11068 - Document: احب طلبات وايد ويسعدني اذا يوعانه
Score: 0.0027 - Index: 38347 - Document: Yyy help with the remedy for abroken
Score: 0.0026 - Index: 31381 - Document: رووووووووووعه الله يعطيكم العافي
Score: 0.0026 - Index: 25478 - Document: يعطيكم العافيه عل  مجهودكم الأكثر من رائع


---

## Results Overview

### **Query**: يساعد على تلبيه الاحتياجات

#### **Semantic Search**
  - يساعدني علي سداد فواتيري
  - التطبيق إلى قبل افضل
  - ساعد أكثر

#### **Hybrid Search**
  - برنامج رائع جدا يساعد على تلبيه الاحتياجات بشكل اسرع
  - يساعد
  - ساعد أكثر

#### **First FAISS Search**
  - يساعدني علي سداد فواتيري
  - ساعد أكثر
  - التطبيق إلى قبل افضل
  - طلبات فقط للامارات
  - احب طلبات وايد ويسعدني اذا يوعانه
  - Yyy help with the remedy for abroken
  - يساعد
  - رووووووووووعه الله يعطيكم العافي
  - طلبات ساعدني كثيرا احببته
  - يعطيكم العافيه عل  مجهودكم الأكثر من رائع

#### **FAISS with nlist**
- التطبيق إلى قبل افضل
- احب اقولكم تطبيقم مابي خير
- معرفت اطلب ساعدوني
- لانة يساعدني لاكل
- حلووو انصحكم فيه 😍😍
- جميل فعال بنصح بتستعمالو بينجز
- انصحكم تحملو😍😍😍
- انصحكم تحملو
- انصحكم بهذا البرناج
- افضل تجربه على الإطلاق

#### **FAISS with nprobe**
- يساعدني علي سداد فواتيري
- ساعد أكثر
- التطبيق إلى قبل افضل
- طلبات فقط للامارات
- احب طلبات وايد ويسعدني اذا يوعانه
- Yyy help with the remedy for abroken
- يساعد
- رووووووووووعه الله يعطيكم العافي
- طلبات ساعدني كثيرا احببته
- يعطيكم العافيه عل  مجهودكم الأكثر من رائع

#### **FAISS with TF-IDF**
- يساعدني علي سداد فواتيري
- ساعد أكثر
- التطبيق إلى قبل افضل
- طلبات فقط للامارات
- احب طلبات وايد ويسعدني اذا يوعانه
- Yyy help with the remedy for abroken
- يساعد
- رووووووووووعه الله يعطيكم العافي
- طلبات ساعدني كثيرا احببته
- يعطيكم العافيه عل  مجهودكم الأكثر من رائع

#### **FAISS with TF-IDF and Weights**
- يساعد
- ساعد أكثر
- يساعدني علي سداد فواتيري
- طلبات ساعدني كثيرا احببته
- التطبيق إلى قبل افضل
- طلبات فقط للامارات
- احب طلبات وايد ويسعدني اذا يوعانه
- Yyy help with the remedy for abroken
- رووووووووووعه الله يعطيكم العافي
- يعطيكم العافيه عل  مجهودكم الأكثر من رائع

- **Best Method**: For this single query, judged by manual inspection, **FAISS with TF-IDF and Weights** gives the most relevant ranking. Adding the weighted TF-IDF score moves the reviews that share terms with the query (forms of "يساعد") to the top of the FAISS candidates.

- **Comparison**:
  - **FAISS with nlist** and **FAISS with nprobe** rank by embedding distance alone, so reviews with no term overlap with the query can appear near the top.
  - **Hybrid Search** averages cosine and TF-IDF similarity over all documents; it returned the review containing the full query phrase as its top result.
  - **Semantic Search** focuses on meaning but may miss documents that match the query's exact wording.

Overall, **FAISS with TF-IDF and Weights** was the most effective method in this comparison.