<a href="https://colab.research.google.com/github/datastax/genai-cookbook/blob/main/raffle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Raffle ticket writer/picker with DataStax Astra DB vector database and RAGStack



This notebook walks through building a simple GenAI application that stores raffle entries and draws the winners. Conference attendees' names are entered one at a time, and each name is stored in Astra DB together with its vector embedding. When the raffle is over, a random phrase is entered to generate the winning embedding, and the stored entries are returned in order of their similarity to that phrase.

Requirements:
 - Access to the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) 384-dimensional sentence transformer on [Hugging Face](https://huggingface.co/).
 - A free account and vector database with [Astra DB](https://astra.datastax.com/).
     - Sufficient resources for this notebook to build a collection named `raffle_data` with:
         - a 384-dimensional vector structure.
         - the cosine similarity metric.
 - An API endpoint for Astra DB.
 - An access token for Astra DB.

*Note: This notebook will create the `raffle_data` collection if it does not exist.*
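
As background for the cosine metric listed above, here is a minimal sketch of how cosine similarity compares two vectors. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the notebook; Astra DB computes the score server-side, and the value it reports in `$similarity` may be a rescaled form of this quantity, so treat the sketch as an aid to intuition about the ranking only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for the 384-dimensional embeddings.
phrase = [1.0, 0.0, 0.0]
ticket_a = [2.0, 0.0, 0.0]  # same direction as the phrase, different length
ticket_b = [0.0, 1.0, 0.0]  # orthogonal to the phrase

print(cosine_similarity(phrase, ticket_a))  # same direction -> 1.0
print(cosine_similarity(phrase, ticket_b))  # orthogonal -> 0.0
```

Because only the direction of the vectors matters, `ticket_a` scores as a perfect match even though it is twice as long as `phrase`.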

## Install the DataStax RAGStack and SentenceTransformers libraries

In [None]:
!pip install ragstack-ai sentence-transformers

Collecting ragstack-ai
  Downloading ragstack_ai-0.10.0-py3-none-any.whl (4.3 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)
Collecting astrapy<0.8.0,>=0.7.0 (from ragstack-ai)
  Downloading astrapy-0.7.7-py3-none-any.whl (32 kB)
Collecting cassio<0.2.0,>=0.1.3 (from ragstack-ai)
  Downloading cassio-0.1.7-py3-none-any.whl (44 kB)
Collecting langchain==0.1.12 (from ragstack-ai)
  Downloading langchain-0.1.12-py3-none-any.whl (809 kB)
Collecting langchain-astradb==0.1.0 (from ragstack-ai)
  Downloading langchain_astradb-0.1.0-py3-none-any.whl (25 kB)

## Library Imports

In [None]:
import json
import os

from getpass import getpass
from astrapy.db import AstraDB
from sentence_transformers import SentenceTransformer

## Environment Variables

In [None]:
ASTRA_DB_TOKEN = getpass('Your Astra DB Token ("AstraCS:..."): ')

Your Astra DB Token ("AstraCS:..."): ··········


In [None]:
ASTRA_DB_ENDPOINT = input('Your Astra DB API endpoint: ')
ASTRA_DB_NAMESPACE='default_keyspace'
ASTRA_DB_COLLECTION_NAME='raffle_data'

Your Astra DB API endpoint: https://cab00884-ea42-4e4e-a426-e4199fb25536-us-east1.apps.astra.datastax.com


## Connect to Astra DB

In [None]:
db = AstraDB(
    token=ASTRA_DB_TOKEN,
    api_endpoint=ASTRA_DB_ENDPOINT,
    namespace=ASTRA_DB_NAMESPACE,
)
collection = db.create_collection(ASTRA_DB_COLLECTION_NAME, dimension=384, metric="cosine")

## Initialize Sentence Transformer "all-MiniLM-L6-v2" model locally

In [None]:
# initialize the all-MiniLM-L6-v2 model locally
model = SentenceTransformer('all-MiniLM-L6-v2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Enter attendee's name
A vector embedding will be generated for the attendee's name, and it will be stored in Astra DB.

In [None]:
name = input("Enter attendee's name: ")

vector_embedding = model.encode(name)
# Build the document as a dict rather than parsing a hand-built JSON string,
# which would break for names containing quotes or backslashes.
doc = {
    "_id": name.replace(" ", ""),
    "name": name,
    "$vector": vector_embedding.tolist(),
}

collection.insert_one(doc)

# show vector embedding
print(vector_embedding)

Enter attendee's name: Aaron Ploetz
[-5.80854202e-03  1.98981334e-02 -4.53410000e-02 -8.91272426e-02
  5.88703668e-04  5.81747815e-02  6.70010820e-02  5.52987717e-02
  2.60656681e-02  2.17344537e-02 -4.62731868e-02  7.08673596e-02
 -2.05244925e-02  7.43010826e-03 -1.52908191e-02  7.16852695e-02
 -1.01234943e-01  5.56432195e-02  3.35503295e-02 -7.93277938e-03
  3.27195935e-02  2.78403126e-02  2.96607874e-02 -3.68330516e-02
  2.03201734e-02  1.02989733e-01 -1.05247619e-02  9.28064585e-02
  3.41139138e-02 -6.93955123e-02  5.00926152e-02 -4.34350818e-02
  3.20865400e-02 -8.87438189e-03 -8.32176507e-02  2.29182728e-02
  3.88654098e-02 -3.54470424e-02 -1.81065425e-02  5.02906069e-02
  2.10909564e-02 -4.39909752e-03 -1.00354053e-01  2.03691851e-02
 -8.05872604e-02 -7.93375596e-02  3.22596468e-02 -7.25677563e-03
 -2.55408306e-02  3.16945463e-02 -4.47137430e-02 -5.31715900e-02
  7.98002537e-03 -1.60526242e-02  9.95994136e-02  2.20959298e-02
  1.45271504e-02  1.94442160e-02 -1.71078015e-02 -2.19
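
The cell above handles one attendee per run. As a hedged sketch of how several names could be prepared in one pass, the helper below builds documents in the same shape (`_id`, `name`, `$vector`). The names `build_raffle_docs` and `fake_encode` are hypothetical and not part of the notebook; `fake_encode` is a stand-in so the example runs without the model.

```python
def build_raffle_docs(names, encode):
    """Build one raffle document per non-blank name (hypothetical helper).

    `encode` is any callable that maps a string to a sequence of floats.
    """
    docs = []
    for raw_name in names:
        name = raw_name.strip()
        if not name:
            continue  # skip blank entries
        docs.append({
            "_id": name.replace(" ", ""),  # same ID scheme as the cell above
            "name": name,
            "$vector": list(encode(name)),
        })
    return docs


def fake_encode(text):
    """Stand-in encoder for illustration only; not a real embedding."""
    return [float(len(text)), 0.0, 1.0]


docs = build_raffle_docs(["Aaron Ploetz", "  ", "Emily Ploetz"], fake_encode)
for doc in docs:
    print(doc["_id"], doc["name"])
```

In the notebook itself, `encode` could be `lambda n: model.encode(n).tolist()`, with each resulting document passed to `collection.insert_one(doc)`. Note that because `_id` is derived from the name, two attendees with the same name would map to the same document ID.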

## Draw winning names

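Enter a random phrase and generate its vector embedding. The stored names are then ranked by their similarity to that embedding.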

In [None]:
search_string = input("Enter a phrase to generate an embedding: ")

Enter a phrase to generate an embedding: no one ever gets what they want


In [None]:
winner_embedding = model.encode(search_string)
# Return up to 9 entries, ranked by similarity to the phrase embedding
results = collection.vector_find(
    winner_embedding.tolist(),
    limit=9,
    include_similarity=True,
    fields=["name"],
)
print(str(results).replace("}, {","},\n{"))

[{'_id': 'AaronPloetz', 'name': 'Aaron Ploetz', '$similarity': 0.5509486},
{'_id': 'EmilyPloetz', 'name': 'Emily Ploetz', '$similarity': 0.5207323}]
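
To turn a result list like the one above into a list of winners, one possible approach is to sort by `$similarity` and keep the top entries. This is a minimal sketch: `pick_winners` is a hypothetical helper, not part of the notebook, and the sample data simply reuses the two results printed above (deliberately out of order to show the sort).

```python
def pick_winners(results, count=1):
    """Return the names of the `count` most similar entries (hypothetical helper)."""
    ranked = sorted(results, key=lambda r: r["$similarity"], reverse=True)
    return [r["name"] for r in ranked[:count]]


# Sample data copied from the output above, listed out of order on purpose.
results = [
    {"_id": "EmilyPloetz", "name": "Emily Ploetz", "$similarity": 0.5207323},
    {"_id": "AaronPloetz", "name": "Aaron Ploetz", "$similarity": 0.5509486},
]

print(pick_winners(results, count=1))  # ['Aaron Ploetz']
```

Sorting explicitly keeps the helper correct even if the list has been reordered or merged from several queries; asking for more winners than there are entries returns all of them.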
