<a href="https://colab.research.google.com/github/datastax/genai-cookbook/blob/main/Airbyte_xkcd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Querying an Astra DB collection with an Airbyte Cloud data pipeline

This notebook walks through building a simple application on top of a data pipeline from Airbyte Cloud. The Airbyte Cloud pipeline pulls data from the XKCD API and stores it in Astra DB. The application then runs a similarity search over the stored vector embeddings and returns the data for the most similar XKCD comic.

This notebook is a companion to this blog post: [Airbyte and DataStax simplify GenAI and RAG app development](https://www.datastax.com/blog/airbyte-and-datastax-simplify-genai-and-rag-app-development).

Requirements:
*   A free account and vector database with [Astra DB](https://astra.datastax.com).
*   A free account with [Airbyte](https://www.airbyte.com), and an Airbyte Cloud pipeline that has synchronized data into Astra DB.
*   An [OpenAI](https://openai.com/) API key.

## Setup

Install the DataStax RAGStack and Matplotlib libraries.

In [None]:
!pip install ragstack-ai matplotlib

## Library Imports

In [25]:
from astrapy.db import AstraDB
from langchain_openai import OpenAIEmbeddings
from getpass import getpass
from PIL import Image
from matplotlib import pyplot as plt
from urllib.request import urlopen

## Environment Variables

In [None]:
ASTRA_DB_APPLICATION_TOKEN = getpass('Your Astra DB Token ("AstraCS:..."): ')

In [None]:
ASTRA_DB_API_ENDPOINT = input('Your Astra DB API endpoint: ')
ASTRA_DB_KEYSPACE_NAME = 'default_keyspace'
ASTRA_DB_COLLECTION_NAME = 'airbyte'

In [None]:
OPENAI_API_KEY = getpass('Your OpenAI API Key ("sk-..."): ')
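The prompts above hint at the expected shape of each secret (`AstraCS:...` for the Astra DB token, `sk-...` for the OpenAI key). A minimal sketch of a sanity check you could run before connecting is shown here. It assumes those prefixes hold for your credentials and that the API endpoint is an `https://` URL; `check_credentials` is a hypothetical helper, not part of any library.

```python
def check_credentials(astra_token, api_endpoint, openai_key):
    """Return a list of problems found; an empty list means every check passed.

    The prefixes tested here are assumptions taken from the input prompts,
    not an official validation rule.
    """
    problems = []
    if not astra_token.startswith("AstraCS:"):
        problems.append("Astra DB token should start with 'AstraCS:'")
    if not api_endpoint.startswith("https://"):
        problems.append("Astra DB API endpoint should be an https:// URL")
    if not openai_key.startswith("sk-"):
        problems.append("OpenAI API key should start with 'sk-'")
    return problems
```

In this notebook you could call `check_credentials(ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT, OPENAI_API_KEY)` and print any messages it returns before moving on.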

## Define the OpenAI "text-embedding-ada-002" embedding model.

In [8]:
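# Name the model explicitly so the code matches the heading above.
model = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=OPENAI_API_KEY,
)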

## Connect to Astra DB

In [10]:
db = AstraDB(
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    namespace=ASTRA_DB_KEYSPACE_NAME,
)
collection = db.collection(ASTRA_DB_COLLECTION_NAME)

## Define your query

In [None]:
query = input('Enter your query ("Kepler" is the default): ').strip()

# Fall back to the default when nothing (or only whitespace) is entered.
if not query:
    query = "Kepler"

## Generate a vector embedding of your query's text

In [None]:
print(f'query="{query}"')
vector = model.embed_query(query)
print(f'vector="{vector}"')
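To build some intuition for what a similarity search does with this vector, here is a minimal sketch using cosine similarity on toy three-dimensional vectors. It assumes the collection compares vectors by cosine similarity; the actual metric depends on how the collection was created. The names `cosine_similarity`, `toy_docs`, and `toy_query` are illustrative only, and the real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for stored comics (made-up values).
toy_docs = {
    "orbits": [0.9, 0.1, 0.0],
    "cats": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.1]

# The best match is the stored vector with the highest similarity to the query.
best = max(toy_docs, key=lambda name: cosine_similarity(toy_query, toy_docs[name]))
print(best)
```

The database performs the equivalent ranking over every stored embedding, so the notebook only needs to send the query vector.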

## Store and print the result

Before running this cell, make sure the Airbyte Cloud data pipeline has already loaded data from the XKCD API into Astra DB.

In [None]:
# Fetch the single closest document, returning only the fields we display.
result = collection.vector_find_one(vector, fields=['title', 'img', 'alt'])

print(result)
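If the pipeline has not synchronized yet, the lookup may come back empty. A small sketch of a guard around the result is shown here. It assumes an empty lookup yields `None` and that a similarity score, when present, is stored under the `$similarity` key; `summarize_match` is a hypothetical helper.

```python
def summarize_match(result):
    """Build a one-line summary of a lookup result, tolerating missing data."""
    if not result:
        return "No match found. Has the Airbyte sync completed?"
    title = result.get("title", "(untitled)")
    # Assumption: the similarity score, if returned, lives under "$similarity".
    similarity = result.get("$similarity")
    if similarity is None:
        return f"Closest comic: {title}"
    return f"Closest comic: {title} (similarity {similarity:.3f})"
```

Calling `print(summarize_match(result))` gives a readable message either way, instead of an error later when the fields are accessed.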

## Display the xkcd webcomic

In [None]:
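# Download the comic image and render it under its title.
plt.title(result['title'])
pil_image = Image.open(urlopen(result['img']))
plt.imshow(pil_image)
plt.show()
# The alt text is the comic's hover caption.
print(result['alt'])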