# Keyword Search

Keyword search ranks documents by how well their terms match the terms in the query, using a scoring function such as BM25 or TF-IDF, much like the Elasticsearch or Solr search engines.
The Weaviate database supports both keyword search and vector search.
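
To make the scoring idea concrete, here is a minimal sketch of Okapi BM25 in plain Python. It is an illustration only, not Weaviate's implementation: the function name `bm25_scores`, the whitespace tokenizer, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a naive
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is length-normalized (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for this example).
docs = [
    "the super bowl is a televised event",
    "a history of the printing press",
    "televised debates and televised sports",
]
scores = bm25_scores("televised event", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document matches both query terms and scores highest, the third matches only "televised", and the second shares no terms with the query and scores zero. Matching is purely lexical: a document has to contain the query's exact words to score at all.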

## Setup

In [1]:
!pip install cohere > /dev/null
!pip install weaviate-client > /dev/null
!pip install python-dotenv > /dev/null

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read API keys from a local .env file, if one exists

Let's start by importing the Weaviate client to access the Wikipedia database.

The dataset contains 10 million articles in multiple languages.

In [2]:
import weaviate
auth_config = weaviate.auth.AuthApiKey(
    api_key=os.environ['WEAVIATE_API_KEY'])

In [4]:
client = weaviate.Client(
    url="https://cohere-demo.weaviate.network/",
    auth_client_secret=auth_config,
    additional_headers = {
        'X-Cohere-Api-Key': os.environ['COHERE_API_KEY']
    }
)

In [5]:
client.is_ready()

True

In [7]:
def keyword_search(query,
                   results_lang="en",
                   properties=['title', 'url', 'text'],
                   num_results=3):
    """Run a BM25 keyword search over the Articles class."""
    # Restrict results to articles written in the requested language.
    where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": results_lang
    }

    response = (
        client.query.get("Articles", properties)
        .with_bm25(
            query=query
        )
        .with_where(where_filter)
        .with_limit(num_results)
        .do()
    )

    result = response['data']['Get']['Articles']
    return result

In [8]:
query = "What is the most viewed televised event?"
keyword_search_results = keyword_search(query)
keyword_search_results

[{'text': 'The most active Gamergate supporters or "Gamergaters" said that Gamergate was a movement for ethics in games journalism, for protecting the "gamer" identity, and for opposing "political correctness" in video games and that any harassment of women was done by others not affiliated with Gamergate. They argued that the close relationships between journalists and developers demonstrated a conspiracy among reviewers to focus on progressive social issues. Some supporters pointed to what they considered disproportionate praise for games such as "Depression Quest" and "Gone Home", which feature unconventional gameplay and stories with social implications, while they viewed traditional AAA games as downplayed. False claims of the "ethics in game journalism" had started as early as 2012, when Geoff Keighley was accused of such unethical behavior when he was presenting information about "Halo 4" among advertisements for Mountain Dew and Doritos, an event called "Doritosgate" in the gam

In [9]:
def print_result(result):
    """Print each result item with one property per line."""
    for i, item in enumerate(result):
        print(f'item {i}')
        for key, value in item.items():
            print(f"{key}:{value}")
            print()
        print()

In [10]:
print_result(keyword_search_results)

item 0
text:The most active Gamergate supporters or "Gamergaters" said that Gamergate was a movement for ethics in games journalism, for protecting the "gamer" identity, and for opposing "political correctness" in video games and that any harassment of women was done by others not affiliated with Gamergate. They argued that the close relationships between journalists and developers demonstrated a conspiracy among reviewers to focus on progressive social issues. Some supporters pointed to what they considered disproportionate praise for games such as "Depression Quest" and "Gone Home", which feature unconventional gameplay and stories with social implications, while they viewed traditional AAA games as downplayed. False claims of the "ethics in game journalism" had started as early as 2012, when Geoff Keighley was accused of such unethical behavior when he was presenting information about "Halo 4" among advertisements for Mountain Dew and Doritos, an event called "Doritosgate" in the ga

In [13]:
# Arabic query, roughly: "Most famous Arabic TV series?"
query = "اشهر مسلسلات عربي?"
keyword_search_results = keyword_search(query, results_lang="ar")
print_result(keyword_search_results)

item 0
text:فواز بركات الزعبي الجيلاني (1868 - 1931)، أحد اشهر مشايخ منطقة الرمثا وحوران في اواخر الحقبة العثمانية وفترة الانتداب، ولد في مدينة الرمثا عام 1868 و نشأ في كنف والديه، تعلم القراءة والكتابة وقراءة القرآن الكريم ومبادئ الحساب على يد شيوخ المساجد وما كان يسمى بالكتّاب أو الكتاتيب التي كانت سائدة في ذلك الوقت. و قد رعاه والده الشيخ بركات رعاية أبناء الشيوخ.

title:فواز بركات الزعبي

url:https://ar.wikipedia.org/wiki?curid=1222161


item 1
text:إياد نحاس، مخرج سوري. بدأ عمله مساعد مخرج في مسلسلات تلفزيونية منها كسر الخواطر، سقف العالم 2007م، كما عمل مخرجاً منفذاً في مسلسلات بيت جدي، الدوامة)، كما ساهم أيضاً كتعاون فني في المسلسل التاريخي القعقاع بن عمرو التميمي، أخرج للتلفزيون عدة مسلسلات منها الشام العدية، أيام الدراسة، ما بتخلص حكاياتنا.

title:إياد نحاس

url:https://ar.wikipedia.org/wiki?curid=3752496


item 2
text:يُعتبر الإقبال على مسلسلات الأطفال المسيحية مثل حكايات الخضروات والمنزل الطائر وموكب العصور ويحكى أن كأداة تستخدمها الشبكات المسيحية كأداة للقوة المسيحية الناعمة