### Embedding-Based Retrieval with Deep Lake and OpenAI

Augmented generation is the third pipeline component. We use the data
retrieved from the vector store to augment the user input. This component
processes the user input, queries the vector store, augments the input with
the top result, and calls an OpenAI chat model (gpt-4o in this notebook).


In [12]:
from deeplake.core.vectorstore.deeplake_vectorstore import VectorStore
import deeplake.util
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
# Initialize OpenAI client
client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable


In [19]:
vector_store_path = "./space_exploration_v1"
vector_store = VectorStore(path=vector_store_path)

Deep Lake Dataset in ./space_exploration_v1 already exists, loading from the storage


### Input and Query Retrieval

In [20]:
def embedding_function(texts, model="text-embedding-3-small"):
    """Generates embeddings for input texts using OpenAI API."""

    if isinstance(texts, str):
        texts = [texts]  # Convert single string to list

    texts = [t.replace("\n", " ") for t in texts]  # Replace newlines with spaces

    try:
        response = client.embeddings.create(
            model=model,
            input=texts
        )

        embeddings = [data.embedding for data in response.data]  # Extract embeddings
        return embeddings

    except Exception as e:
        print(f"Error generating embeddings: {e}")
        return None
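
The search in the next cell ranks stored chunks by how close their embeddings are to the query embedding. As a minimal sketch of that idea, assuming cosine similarity is the metric (the notebook does not state which metric the vector store uses), the comparison between two vectors could look like this. The toy vectors are hypothetical and far shorter than real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for illustration only
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]
similarity = cosine_similarity(query_vec, doc_vec)
print(round(similarity, 3))
```

A score near 1 means the two vectors point in nearly the same direction; a score near 0 means they are unrelated under this measure.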


In [None]:
def get_user_prompt():
    # Request user input for the search prompt
    return input("Enter your search query: ")

def search_query(prompt):
    # Assumes `vector_store` and `embedding_function` are already defined
    search_results = vector_store.search(embedding_data=prompt, embedding_function=embedding_function)
    return search_results

# Get the user's search query
#user_prompt = get_user_prompt()

user_prompt="Tell me about space exploration on the Moon and Mars."

# Perform the search
search_results = search_query(user_prompt)

# Print the search results
print(search_results)

{'score': [0.5991874933242798, 0.5692877173423767, 0.566982626914978, 0.5609710216522217], 'id': ['11fc102b-ebe6-11ef-ba44-34cff639d662', '11fc8f80-ebe6-11ef-a4bc-34cff639d662', '11fc103d-ebe6-11ef-b621-34cff639d662', '11fdd598-ebe6-11ef-9519-34cff639d662'], 'metadata': [{'source': 'llm.txt'}, {'source': 'llm.txt'}, {'source': 'llm.txt'}, {'source': 'llm.txt'}], 'text': ['Exploration of space, planets, and moons "Space Exploration" redirects here. For the company, see SpaceX . For broader coverage of this topic, see Exploration . Buzz Aldrin taking a core sample of the Moon during the Apollo 11 mission Self-portrait of Curiosity rover on Mars \'s surface Part of a series on Spaceflight History History of spaceflight Space Race Timeline of spaceflight Space probes Lunar missions Mars missions Applications Communications Earth observation Exploration Espionage Military Navigation Colonization Habitation Exploration Telescopes Tourism Spacecraft Robotic spacecraft Satellite Space probe Ca
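
The printed dictionary holds parallel lists: the entry at index `i` of `score`, `id`, `metadata`, and `text` all describe the same chunk. A small sketch of reading it row by row follows. The `example_results` dictionary and the `summarize_results` helper are hypothetical stand-ins, shaped like the output above so the code runs without a vector store.

```python
# Hypothetical data shaped like the search output above
example_results = {
    "score": [0.599, 0.569],
    "id": ["a", "b"],
    "metadata": [{"source": "llm.txt"}, {"source": "llm.txt"}],
    "text": ["First chunk of text about the Moon.", "Second chunk of text about Mars."],
}

def summarize_results(results, preview_chars=20):
    """Return one short line per retrieved chunk: score, source, text preview."""
    rows = []
    for score, meta, text in zip(results["score"], results["metadata"], results["text"]):
        rows.append(f"{score:.3f} | {meta['source']} | {text[:preview_chars]}")
    return rows

for row in summarize_results(example_results):
    print(row)
```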

In [22]:
# Function to wrap text to a specified width
def wrap_text(text, width=80):
    lines = []
    while len(text) > width:
        split_index = text.rfind(' ', 0, width)
        if split_index == -1:
            split_index = width
        lines.append(text[:split_index])
        text = text[split_index:].strip()
    lines.append(text)
    return '\n'.join(lines)
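
For comparison, Python's standard library offers `textwrap.fill`, which performs a similar word-boundary wrap. This is only a sketch of an alternative, not a drop-in replacement: its whitespace handling may differ from `wrap_text` in edge cases. The sample string is made up for illustration.

```python
import textwrap

# Hypothetical sample text; any long single-line string works
sample = "Exploration of space, planets, and moons. " * 5
wrapped = textwrap.fill(sample.strip(), width=40)
print(wrapped)
```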

In [23]:

# Assuming the search results are ordered with the top result first
top_score = search_results['score'][0]
top_text = search_results['text'][0].strip()
top_metadata = search_results['metadata'][0]['source']

# Print the top search result
print("Top Search Result:")
print(f"Score: {top_score}")
print(f"Source: {top_metadata}")
print("Text:")
print(wrap_text(top_text))

Top Search Result:
Score: 0.5991874933242798
Source: llm.txt
Text:
Exploration of space, planets, and moons "Space Exploration" redirects here.
For the company, see SpaceX . For broader coverage of this topic, see
Exploration . Buzz Aldrin taking a core sample of the Moon during the Apollo 11
mission Self-portrait of Curiosity rover on Mars 's surface Part of a series on
Spaceflight History History of spaceflight Space Race Timeline of spaceflight
Space probes Lunar missions Mars missions Applications Communications Earth
observation Exploration Espionage Military Navigation Colonization Habitation
Exploration Telescopes Tourism Spacecraft Robotic spacecraft Satellite Space
probe Cargo spacecraft Crewed spacecraft Apollo Lunar Module Space capsules
Space Shuttle Space stations Spaceplanes Vostok Space launch Spaceport Launch
pad Expendable and reusable launch vehicles Escape velocity Non-rocket
spacelaunch Spaceflight types Sub-orbital Orbital Interplanetary Interstellar
Intergalactic 

### Augmented Input

In [24]:
augmented_input = user_prompt + " " + top_text

In [25]:
print(augmented_input)

Tell me about space exploration on the Moon and Mars. Exploration of space, planets, and moons "Space Exploration" redirects here. For the company, see SpaceX . For broader coverage of this topic, see Exploration . Buzz Aldrin taking a core sample of the Moon during the Apollo 11 mission Self-portrait of Curiosity rover on Mars 's surface Part of a series on Spaceflight History History of spaceflight Space Race Timeline of spaceflight Space probes Lunar missions Mars missions Applications Communications Earth observation Exploration Espionage Military Navigation Colonization Habitation Exploration Telescopes Tourism Spacecraft Robotic spacecraft Satellite Space probe Cargo spacecraft Crewed spacecraft Apollo Lunar Module Space capsules Space Shuttle Space stations Spaceplanes Vostok Space launch Spaceport Launch pad Expendable and reusable launch vehicles Escape velocity Non-rocket spacelaunch Spaceflight types Sub-orbital Orbital Interplanetary Interstellar Intergalactic List of space
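
The concatenation above uses only the top result and leaves no boundary between the question and the retrieved text. One possible variation, sketched here under assumptions, combines several results and labels each part. The helper name `build_augmented_input`, the `top_k` parameter, the label format, and the sample data are all hypothetical and not part of the notebook.

```python
def build_augmented_input(user_prompt, search_results, top_k=2):
    """Combine the query with the first top_k retrieved chunks, labeling each part."""
    chunks = [text.strip() for text in search_results["text"][:top_k]]
    context = "\n\n".join(
        f"[Context {i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return f"Question: {user_prompt}\n\n{context}"

# Hypothetical results shaped like the search output dictionary
example_results = {
    "score": [0.59, 0.56],
    "text": ["  The Moon was visited by Apollo 11. ", "Curiosity is a Mars rover."],
    "metadata": [{"source": "llm.txt"}, {"source": "llm.txt"}],
}
example_input = build_augmented_input("Tell me about the Moon and Mars.", example_results)
print(example_input)
```

Whether more context improves the answer depends on the data; longer inputs also cost more tokens, so `top_k` is a trade-off to tune.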

### Generation and Output
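
The generation cell in this section reports how long the model call takes. As a hedged sketch of the same measurement, the timing can be wrapped in a small helper so that it covers only the call itself. `timed_call` and `fake_model_call` are hypothetical names; the fake function stands in for the API request so the example runs without a key.

```python
import time

def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for the model call so the sketch runs offline
def fake_model_call(text):
    time.sleep(0.01)
    return text.upper()

result, elapsed = timed_call(fake_model_call, "moon and mars")
print(f"Response Time: {elapsed:.2f} seconds")
```

`time.perf_counter()` is used here because it is intended for measuring short intervals.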

In [26]:
import time
from openai import OpenAI

client = OpenAI()
gpt_model = "gpt-4o"
start_time = time.time()  # Start timing before the request

def call_gpt4_with_full_text(itext):
    # Accept a single string or a list of lines. Joining a plain string
    # with '\n' would insert a newline between every character.
    text_input = itext if isinstance(itext, str) else '\n'.join(itext)
    prompt = f"Please summarize or elaborate on the following content:\n{text_input}"

    try:
        response = client.chat.completions.create(
            model=gpt_model,
            messages=[
                {"role": "system", "content": "You are a space exploration expert."},
                {"role": "assistant", "content": "You can read the input and answer in detail."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1  # Adjust parameters as needed
        )
        return response.choices[0].message.content
    except Exception as e:
        return str(e)

gpt4_response = call_gpt4_with_full_text(augmented_input)

response_time = time.time() - start_time  # Measure response time
print(f"Response Time: {response_time:.2f} seconds")  # Print response time

print(gpt_model, "Response:", gpt4_response)

Response Time: 7.73 seconds
gpt-4o Response: Space exploration on the Moon and Mars has been a significant focus of scientific and technological efforts, driven by the desire to understand more about our solar system and the potential for human habitation beyond Earth.

### Moon Exploration:
1. **Historical Context**: The Moon was the first celestial body beyond Earth to be visited by humans. The Apollo missions, particularly Apollo 11 in 1969, marked a monumental achievement with astronauts like Buzz Aldrin collecting lunar samples.
2. **Scientific Goals**: Lunar exploration aims to study the Moon's geology, understand its formation, and use it as a platform for further space exploration. The Moon's surface provides insights into the early solar system.
3. **Current and Future Missions**: Recent missions have focused on mapping the Moon's surface, searching for water ice, and preparing for potential human return. NASA's Artemis program aims to land "the first woman and the next man" o