# Query MongoDB Atlas Using Custom Embeddings

In this Python notebook, we'll be making use of the search indexes we've just created to query our custom embeddings!

## Load Settings
As usual, we'll start by loading our settings from the `.env` file.

In [1]:
# Load settings from .env file
import os
import sys

# Change system path to root directory
sys.path.insert(0, '../')

from dotenv import find_dotenv, dotenv_values

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# For debugging purposes
# print (config)

ATLAS_URI = config.get('ATLAS_URI')

if not ATLAS_URI:
    raise Exception("'ATLAS_URI' is not set. Please set it in your .env file to continue...")
else:
    print("ATLAS_URI Connection string found:", ATLAS_URI)

ATLAS_URI Connection string found: mongodb+srv://yongtaufoo:mucjOuDXLysFfEGA@cluster0.ds8hjdi.mongodb.net/?retryWrites=true&w=majority
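Note that the cell above prints the full connection string, password included. If you share your notebook output, you may prefer to mask the password first. Here is a minimal sketch of one way to do that with the standard library; `mask_uri` is a hypothetical helper that is not part of this Quest's code.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_uri(uri: str) -> str:
    """Return the URI with its password replaced by '***' (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.password is None:
        # Nothing to hide, return the URI unchanged
        return uri
    # netloc looks like 'user:password@host'; keep the user and host only
    userinfo, _, host = parts.netloc.rpartition('@')
    user = userinfo.split(':', 1)[0]
    return urlunsplit(parts._replace(netloc=f'{user}:***@{host}'))


# Example with a made-up connection string
print(mask_uri('mongodb+srv://user:secret@cluster0.example.mongodb.net/?retryWrites=true&w=majority'))
```

You could then call `print("ATLAS_URI Connection string found:", mask_uri(ATLAS_URI))` instead of printing the raw value.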


In [2]:
# Our variables
DB_NAME = 'sample_mflix'
COLLECTION_NAME = 'embedded_movies'

## Initialize Mongo Atlas Client
Then, we initialize a connection to MongoDB Atlas using our unique `ATLAS_URI` value.

In [3]:
from AtlasClient import AtlasClient

atlas_client = AtlasClient (ATLAS_URI, DB_NAME)
print("Connected to the Mongo Atlas database!")

Connected to the Mongo Atlas database!


Once that's done, we'll create a mapping `model_mappings` that defines the names of the embedding attribute and search index for each model. Since we're using 3 models, we'll have 3 sets of keys and values in our mapping.

In [4]:
model_mappings = {
    'BAAI/bge-small-en-v1.5' : {'embedding_attr' : 'plot_embedding_bge_small', 'index_name' : 'idx_plot_embedding_bge_small'},
    'sentence-transformers/all-mpnet-base-v2' : {'embedding_attr' : 'plot_embedding_mpnet_base_v2', 'index_name' : 'idx_plot_embedding_mpnet_base_v2'},
    'sentence-transformers/all-MiniLM-L6-v2' : {'embedding_attr' : 'plot_embedding_minilm_l6_v2', 'index_name' : 'idx_plot_embedding_minilm_l6_v2'},
}

## Time to Query!
Now that we've updated the collection, let's try making some queries!

In [5]:
# LlamaIndex will download embedding models as needed.
# Set the LlamaIndex cache dir to ../cache here (the default is the system tmp dir).
# This way, we can easily see the downloaded artifacts.
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath('../'), 'cache')

# from llama_index.embeddings import HuggingFaceEmbedding
# Uncomment the line above and comment the line below if you face an import error
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

import time

# This is a handy function to run a query given a model
def run_vector_query(query: str, model_name: str):
    model_mapping = model_mappings.get(model_name)
    if model_mapping is None:
        raise Exception("Unknown model : " + model_name)
    embedding_attr = model_mapping['embedding_attr']
    index_name = model_mapping['index_name']

    # Generate embeddings
    embed_model = HuggingFaceEmbedding(model_name=model_name)
    query_embeddings = embed_model.get_text_embedding(query)

    # Now let's query Atlas
    t1a = time.perf_counter()
    movies = atlas_client.vector_search(
        collection_name=COLLECTION_NAME,
        index_name=index_name,
        attr_name=embedding_attr,
        embedding_vector=query_embeddings,
        limit=5,
    )
    t1b = time.perf_counter()
    print(f'Atlas query returned in {(t1b-t1a)*1000} ms')
    return movies

# This function helps to print the queried movies
def print_movies(movies):
    print (f"Found {len (movies)} movies")
    for idx, movie in enumerate (movies):
        print(f'{idx+1}\nid: {movie["_id"]}\ntitle: {movie["title"]},\nyear: {movie["year"]}' +
            f'\nsearch_score(meta):{movie["search_score"]}\nplot: {movie["plot"]}\n')

Let's start with the `BAAI/bge-small-en-v1.5` model! We'll be using the query **fatalistic sci-fi movies**. Behind the scenes, MongoDB runs an Atlas Vector Search to find the movies whose plot embeddings most closely match the semantic meaning of our query.
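The `AtlasClient` helper isn't shown in this notebook, so we can't see exactly what `vector_search` does. As a rough sketch, a vector search like this is usually expressed as an aggregation pipeline with a `$vectorSearch` stage. The function name `build_vector_search_pipeline`, the projected fields, and the `num_candidates` default of 50 are assumptions made for illustration, not the Quest's actual implementation.

```python
def build_vector_search_pipeline(index_name, attr_name, embedding_vector,
                                 limit=5, num_candidates=50):
    """Build an Atlas Vector Search aggregation pipeline (illustrative sketch)."""
    if num_candidates < limit:
        # Atlas requires numCandidates to be at least as large as limit
        raise ValueError('num_candidates must be >= limit')
    return [
        {
            '$vectorSearch': {
                'index': index_name,              # name of the vector search index
                'path': attr_name,                # document field holding the embedding
                'queryVector': embedding_vector,  # embedding of the query text
                'numCandidates': num_candidates,  # nearest neighbours considered
                'limit': limit,                   # number of documents returned
            }
        },
        {
            # Keep only the fields we print, plus the similarity score
            '$project': {
                '_id': 1,
                'title': 1,
                'year': 1,
                'plot': 1,
                'search_score': {'$meta': 'vectorSearchScore'},
            }
        },
    ]


# Example with a tiny made-up vector (real embeddings have hundreds of dimensions)
pipeline = build_vector_search_pipeline(
    index_name='idx_plot_embedding_bge_small',
    attr_name='plot_embedding_bge_small',
    embedding_vector=[0.1, 0.2, 0.3],
)
print(pipeline[0]['$vectorSearch']['index'])
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`.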

In [6]:
query = 'fatalistic sci-fi movies'
model_name = 'BAAI/bge-small-en-v1.5'

movies = run_vector_query(query=query, model_name=model_name)

print (f'========== model = {model_name} ======')
print_movies(movies)

Atlas query returned in 116.21091599226929 ms
Found 5 movies
1
id: 573a1397f29313caabce61a5
title: Logan's Run,
year: 1976
search_score(meta):0.5782828330993652
plot: An idyllic sci-fi future has one major drawback: life must end at 30.

2
id: 573a13bff29313caabd5de30
title: Journey to Saturn,
year: 2008
search_score(meta):0.5679124593734741
plot: A danish crew of misfits travel to Saturn in search for natural resources. However, the planet is colonized by a ruthless army of Aliens that turn their eye on Earth and invade Denmark. ...

3
id: 573a13a6f29313caabd1898d
title: Forklift Driver Klaus: The First Day on the Job,
year: 2000
search_score(meta):0.5652276277542114
plot: Short film depicting a fictional educational film about fork lift truck operational safety. The dangers of unsafe operation are presented in gory details.

4
id: 573a13a8f29313caabd1ccea
title: Forklift Driver Klaus: The First Day on the Job,
year: 2000
search_score(meta):0.5652276277542114
plot: Short film depictin

As you can see from the plots of the movies returned, the search did surface movies with sci-fi and/or fatalistic themes. Cool stuff! Now let's try the `sentence-transformers/all-mpnet-base-v2` model instead and see what results we get.

In [7]:
query = 'fatalistic sci-fi movies'
model_name = 'sentence-transformers/all-mpnet-base-v2'

movies = run_vector_query (query=query, model_name=model_name)

print (f'========== model = {model_name} ======')
print_movies (movies)

Atlas query returned in 17.516541003715247 ms
Found 5 movies
1
id: 573a13b5f29313caabd4473e
title: Wristcutters: A Love Story,
year: 2006
search_score(meta):0.4835144877433777
plot: A film set in a strange afterlife way station that has been reserved for people who have committed suicide.

2
id: 573a139af29313caabcf0aff
title: Meet Joe Black,
year: 1998
search_score(meta):0.46951407194137573
plot: Death, who takes the form of a young man, asks a media mogul to act as a guide to teach him about life on Earth and in the process he falls in love with his guide's daughter.

3
id: 573a13aff29313caabd321a1
title: DOA: Dead or Alive,
year: 2006
search_score(meta):0.4645453691482544
plot: The movie adaptation of the best selling video game series Dead or Alive.

4
id: 573a13bff29313caabd5fdf0
title: Blood River,
year: 2009
search_score(meta):0.45820653438568115
plot: A psychological thriller, which explores the destruction of a young couple's seemingly perfect marriage.

5
id: 573a1397f29313ca

As you can see from the results returned above, using different embedding models can indeed yield pretty different results! For easier comparison between the search results of the different models, let's use the function below that'll only print out the titles of movies.

In [9]:
# This function helps to print the titles of queried movies
def print_movie_title(movies):
    for idx, movie in enumerate (movies):
        print(f'{idx+1} - {movie["title"]}')

Now it's time for you to play around with entering your own search query! **Replace the `query` value below with your own search query** and see what results you get from our 3 models!

In [11]:
# TODO: enter your query here
query = 'superheroes fighting aliens and protecting earth'

for model_name in model_mappings:
    movies = run_vector_query(query=query, model_name=model_name)

    print(f'========== model = {model_name} ======')
    print_movie_title(movies)

Atlas query returned in 21.524082985706627 ms
1 - Independence Day
2 - Enemy Mine
3 - Starship Troopers
4 - V: The Final Battle
5 - Falling Skies
Atlas query returned in 17.406083992682397 ms
1 - Falling Skies
2 - V: The Final Battle
3 - The Watch
4 - Justice League: The New Frontier
5 - Ultimate Avengers
Atlas query returned in 14.993084012530744 ms
1 - Justice League: The New Frontier
2 - Superman
3 - Mystery Men
4 - Super Capers: The Origins of Ed and the Missing Bullion
5 - The Avengers
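To put a number on how much the models agree, we can count the titles that each pair of models has in common. The sketch below hard-codes the three top-5 lists from the output above, assuming the result blocks appear in the same order as the keys of `model_mappings`. `title_overlap` is a hypothetical helper added for illustration.

```python
from itertools import combinations

# Top-5 titles copied from the output above (block order assumed to follow model_mappings)
top5 = {
    'BAAI/bge-small-en-v1.5': [
        'Independence Day', 'Enemy Mine', 'Starship Troopers',
        'V: The Final Battle', 'Falling Skies',
    ],
    'sentence-transformers/all-mpnet-base-v2': [
        'Falling Skies', 'V: The Final Battle', 'The Watch',
        'Justice League: The New Frontier', 'Ultimate Avengers',
    ],
    'sentence-transformers/all-MiniLM-L6-v2': [
        'Justice League: The New Frontier', 'Superman', 'Mystery Men',
        'Super Capers: The Origins of Ed and the Missing Bullion', 'The Avengers',
    ],
}


def title_overlap(results):
    """Return the sorted titles shared by each pair of models."""
    overlaps = {}
    for (model_a, titles_a), (model_b, titles_b) in combinations(results.items(), 2):
        overlaps[(model_a, model_b)] = sorted(set(titles_a) & set(titles_b))
    return overlaps


for (model_a, model_b), shared in title_overlap(top5).items():
    print(f'{model_a} vs {model_b}: {len(shared)} shared -> {shared}')
```

For this query, no pair of models shares more than two of its five titles, which fits the earlier observation that the choice of embedding model changes the results.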


And we're done with this notebook! Please **head back to the Quest page on StackUp now** and refer to the instructions for how you can prepare your deliverable for this Quest.