## Challenge 4

1. Use the URL and API key to connect to the Weaviate instance.
2. Find out how many vectors are stored in this database.
3. Perform a search over them to find concepts you are interested in!
4. See if you can filter for a particular language that you understand, then perform a vector search to see if you get back relevant results.


### 1. Use the URL and API key to connect to the Weaviate instance
```python
url="https://cohere-demo.weaviate.network/"
api_key="76320a90-53d8-42bc-b41d-678647c6672e"
```

In [19]:
import json
import os

import weaviate

# The Cohere key is read from the environment and forwarded to Weaviate in a request header
cohere_api_key = os.getenv("COHERE_API_KEY")

auth_config = weaviate.auth.AuthApiKey(api_key="76320a90-53d8-42bc-b41d-678647c6672e")

client = weaviate.Client(
    url="https://cohere-demo.weaviate.network/",
    auth_client_secret=auth_config,
    additional_headers={
        "X-Cohere-Api-Key": cohere_api_key,
    }
)

client.is_ready()  # should return True

True
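
One thing to watch in the cell above: `os.getenv` returns `None` when the variable is unset, so a missing key tends to surface only later, when a query fails. A minimal sketch of a guard follows; `build_cohere_headers` is a hypothetical helper name, not part of the Weaviate client.

```python
import os


def build_cohere_headers(env=None):
    """Return the extra headers for the Weaviate client, or raise if the key is missing.

    Hypothetical helper for illustration; not part of the Weaviate client.
    """
    # Fall back to the real environment when no mapping is passed in
    env = os.environ if env is None else env
    key = env.get("COHERE_API_KEY")
    if not key:
        raise RuntimeError("COHERE_API_KEY is not set")
    return {"X-Cohere-Api-Key": key}
```

The returned dictionary could then be passed as `additional_headers=build_cohere_headers()` when creating the client.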

### 2. How many vectors are stored in this database?

In [18]:
response = client.query.aggregate("Articles").with_meta_count().do()
print(json.dumps(response, indent=2))

{
  "data": {
    "Aggregate": {
      "Articles": [
        {
          "meta": {
            "count": 9436199
          }
        }
      ]
    }
  }
}
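
The count sits several levels deep in the response. A small sketch for pulling it out as an integer, assuming the response has the shape printed above (`get_object_count` is a hypothetical helper name):

```python
def get_object_count(response, class_name):
    """Extract the meta count for `class_name` from an Aggregate response dict."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


# Sample response copied from the output above
sample = {"data": {"Aggregate": {"Articles": [{"meta": {"count": 9436199}}]}}}
print(f"{get_object_count(sample, 'Articles'):,} vectors")
```

With a live client, the same helper could be applied to the aggregate response directly instead of the sample dictionary.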


### 3. Perform a search over them to find concepts you are interested in!

In [21]:
response = (
    client.query
    .get("Articles", ["text", "title", "url", "views", "lang"])
    .with_near_text({"concepts": ["vacation spots in california"]})
    .with_limit(5)
    .do()
)

print(json.dumps(response, indent=2))

{
  "data": {
    "Get": {
      "Articles": [
        {
          "lang": "en",
          "text": "Many locals and tourists frequent the Southern California coast for its beaches. Some of southern California's most popular beaches are Malibu, Laguna Beach, La Jolla, Manhattan Beach, and Hermosa Beach. Southern California is also known for its mountain resort communities, such as Big Bear Lake, Lake Arrowhead, and Wrightwood, and their ski resorts, like Bear Mountain, Snow Summit, Snow Valley Mountain Resort, and Mountain High. The inland desert city of Palm Springs is also popular.",
          "title": "Southern California",
          "url": "https://en.wikipedia.org/wiki?curid=62520",
          "views": 2000
        },
        {
          "lang": "fr",
          "text": "Les plages et parcs c\u00f4tiers principaux sont \"Trinidad State Beach\", \"Torrey Pines State Reserve\", le \"Cabrillo National Monument\". Les touristes se dirigent aussi vers les missions espagnoles, le \"Donner 

### 4. See if you can filter for a particular language that you understand, then perform a vector search to see if you get back relevant results.

In [30]:
nearText = {
        "concepts": ['easy to cook tasty meals'],
    }

properties = [
        "text", "title", "url", "views", "lang",
        "_additional {distance}"
    ]

where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": 'en'  # language code to keep; try 'ja' for Japanese
        }
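
Filters can also be combined. The sketch below builds the same language filter and, optionally, nests it with a minimum-views condition under an `And` operator. The helper name `build_where_filter` and the `min_views` parameter are assumptions for illustration. It also assumes `views` is an integer property, as the outputs suggest; if it is a `number` property in the schema, `valueNumber` would be needed instead of `valueInt`.

```python
def build_where_filter(lang, min_views=None):
    """Build a Weaviate `where` filter dict for a language and an optional views threshold."""
    lang_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": lang,
    }
    if min_views is None:
        return lang_filter

    # Assumption: `views` is stored as an integer property
    views_filter = {
        "path": ["views"],
        "operator": "GreaterThanEqual",
        "valueInt": min_views,
    }
    # Nested filters go under "operands" and are joined by the outer operator
    return {"operator": "And", "operands": [lang_filter, views_filter]}


combined_filter = build_where_filter("en", min_views=1000)
```

The result is a plain dictionary, so it could be passed to `.with_where(...)` in the same way as the single-condition filter.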

In [31]:
response = (
            client.query
            .get("Articles", properties)
            .with_where(where_filter)
            .with_near_text(nearText)
            .with_limit(5)
            .do()
        )

result = response['data']['Get']['Articles']

print(json.dumps(response,indent=2))

{
  "data": {
    "Get": {
      "Articles": [
        {
          "_additional": {
            "distance": -144.8925
          },
          "lang": "en",
          "text": "Lawson has adopted a casual approach to cooking, stating, \"I think cooking should be about fun and family. ... I think part of my appeal is that my approach to cooking is really relaxed and not rigid. There are no rules in my kitchen.\" One editor, highlighting the technical simplicity of Lawson's recipes, noted that \"her dishes require none of the elaborate preparation called for by most TV chefs\".",
          "title": "Nigella Lawson",
          "url": "https://en.wikipedia.org/wiki?curid=153232",
          "views": 2000
        },
        {
          "_additional": {
            "distance": -144.7637
          },
          "lang": "en",
          "text": "In October 2010, Seinfeld launched a website for beginner cooks called \"Do it Delicious.\" The website teaches at-home viewers how to prepare particular di

### Below is a function that puts all of this together so you can explore interesting multilingual searches

In [32]:
def semantic_search(query, results_lang=''):
    """
    Query the vector database and return the top results.

    Parameters
    ----------
        query: str
            The search query

        results_lang: str (optional)
            Retrieve results only in the specified language.
            The demo dataset contains these languages:
            en, de, fr, es, it, ja, ar, zh, ko, hi

    """

    nearText = {"concepts": [query]}
    properties = ["text", "title", "url", "views", "lang", "_additional {distance}"]

    # Base query; without a filter it searches all languages
    query_builder = client.query.get("Articles", properties)

    # To filter by language
    if results_lang != '':
        where_filter = {
            "path": ["lang"],
            "operator": "Equal",
            "valueString": results_lang
        }
        query_builder = query_builder.with_where(where_filter)

    response = (
        query_builder
        .with_near_text(nearText)
        .with_limit(5)
        .do()
    )

    result = response['data']['Get']['Articles']

    return result


def print_result(result):
    """ Print results with colorful formatting """
    for item in result:
        # \033[95m = bright magenta, \033[4m = underline, \033[0m = reset
        print(f"\033[95m{item['title']} ({item['views']}) {item['_additional']['distance']}\033[0m")
        print(f"\033[4m{item['url']}\033[0m")
        print(item['text'])
        print()

In [34]:
query_result = semantic_search("easy to cook tasty meals", results_lang='ja')

# Print out the result
print_result(query_result)

ハンバーグ (1000) -144.0292
https://ja.wikipedia.org/wiki?curid=25707
また、レトルト食品のハンバーグは調理が簡単である。一度焼いたハンバーグをそのまま、またはソースとともに封入することで、パックごと湯煎するだけで食卓に出すことができ、少々の材料面における味の不備も漬け込むソースでフォローできること、衛生的な生産工場（セントラルキッチン）による大量生産によって非常に安価に製造できるメリットが大きいため、家庭用・業務用ともに広く普及している。

電子レンジ (900) -142.92491
https://ja.wikipedia.org/wiki?curid=17051
野菜、とくに火が通りづらい根菜類でも、温野菜を作ることができる。これは食材の下拵えとしても行われる。レンジパックなどの、より簡単に温野菜をつくれる調理グッズも出てきている。ケーキのようなものも、電子レンジを用いて作ることができる。食感は蒸しケーキに似る。

キャセロール (500) -142.91748
https://ja.wikipedia.org/wiki?curid=170018
キャセロールは1950年代に多くの理由により非常に普及した家庭料理になった。材料すなわちツナ缶、缶詰の野菜、缶詰のスープ、およびエッグヌードルが安価で手に入りやすいこと、そして35分あれば作ることができるのが主な理由である。また、残りを冷凍または冷蔵し、翌日に温めなおして食べることもできる。ポットラック（持ち寄り食事会）や病人のお見舞い品としてもたいへん普及している。ツナキャセロールは一皿ごとに異なるが、歴史的には、エッグヌードル、刻みタマネギ、卸したチェダーチーズ、冷凍グリーンピース、漬け油を切ったツナ缶、缶詰の濃縮マッシュルームスープ、缶詰の薄切りマッシュルーム、砕いたポテトチップスを材料とする。ゆでた麺、タマネギ、チーズ、グリーンピース、ツナ、マッシュルームを耐熱皿に入れて混ぜ合わせ、ポテトチップとチーズを天面に振りかけてオーブンで焼く。

チキンライス (500) -142.36076
https://ja.wikipedi