Create the content definition to load into the index.
There are two options for loading web content:
1. List the URLs you want to load, as in the `artemis_url_list` cell below.
2. Provide a sitemap URL in the "location" field, along with optional "whitelist" and/or "blacklist" arrays to filter the site's pages.

The content definition is a list, which lets you load multiple sources and data types into the OpenSearch index in one pass.
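
The sitemap option is not shown in this notebook, so here is a rough sketch of what such a definition and its filtering might look like. Only the "location", "whitelist" and "blacklist" field names come from the description above. The example.com URLs, the reuse of `"type": "Website"`, the `filter_urls` helper, and the substring-matching rule are all assumptions made for illustration; the loader may match patterns differently.

```python
# Hypothetical sitemap-based content source. Field names follow the text above;
# every value here is an illustrative assumption, not a tested configuration.
sitemap_source = {
    "name": "Example sitemap",
    "type": "Website",
    "location": "https://www.example.com/sitemap.xml",
    "whitelist": ["/artemis"],
    "blacklist": ["/image-"],
}


def filter_urls(urls, whitelist=None, blacklist=None):
    """Keep URLs that match any whitelist entry and no blacklist entry.

    Assumption: entries are plain substrings. The real loader may use
    regular expressions or path prefixes instead.
    """
    kept = []
    for url in urls:
        # An empty or missing whitelist means "allow everything"
        if whitelist and not any(entry in url for entry in whitelist):
            continue
        # Any blacklist match excludes the URL
        if blacklist and any(entry in url for entry in blacklist):
            continue
        kept.append(url)
    return kept


# Stand-in for the URLs a sitemap might list
sample_urls = [
    "https://www.example.com/missions/artemis/artemis-iii/",
    "https://www.example.com/missions/artemis/image-gallery/",
    "https://www.example.com/missions/apollo/",
]

filtered = filter_urls(
    sample_urls, sitemap_source["whitelist"], sitemap_source["blacklist"]
)
print(filtered)
```

Under these assumptions, only the first sample URL survives: the second matches the blacklist and the third never matches the whitelist.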

In [None]:
artemis_url_list = [
    "https://en.wikipedia.org/wiki/Artemis_program",
    "https://www.nasa.gov/humans-in-space/artemis/",
    "https://www.nasa.gov/mission/artemis-i/",
    "https://www.nasa.gov/mission/artemis-ii/",
    "https://www.nasa.gov/directorates/esdmd/common-exploration-systems-development-division/space-launch-system/rocket-propellant-tanks-for-nasas-artemis-iii-mission-take-shape/",
    "https://www.nasa.gov/centers-and-facilities/hq/splashdown-nasas-orion-returns-to-earth-after-historic-moon-mission/",
    "https://www.nasa.gov/missions/artemis/nasas-first-flight-with-crew-important-step-on-long-term-return-to-the-moon-missions-to-mars/",
    "https://www.nasa.gov/missions/artemis/artemis-iii/",
    "https://www.rmg.co.uk/stories/topics/nasa-moon-mission-artemis-program-launch-date",
    "https://www.asc-csa.gc.ca/eng/astronomy/moon-exploration/artemis-missions.asp",
    "https://www.space.com/nasa-artemis-2-moon-mission-delay-september-2025"
]

content_source = {"name": "Reports", "type": "Website", "items": artemis_url_list}

content_sources = [content_source]

my_open_search_domain_name = 'my-opensearch-domain'
my_index_name = 'index-artemis-mission'


Now load this content into the OpenSearch vector DB, identified by its domain name and index name.
- You can omit the content sources from the constructor and pass them to the load command instead. This lets you load multiple content sources into the same index.

In [None]:
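from aws_opensearch_vector_database import OpenSearchVectorDBLoader

# Point the loader at the target domain and index, and pass in the content sources defined above
vectordb_loader = OpenSearchVectorDBLoader(
    domain_name=my_open_search_domain_name,
    index_name=my_index_name,
    data_sources=content_sources,
)

# Load the content sources into the index
vectordb_loader.load()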

Query the results

In [None]:
from aws_opensearch_vector_database import OpenSearchVectorDBQuery

# Query the same domain and index that were loaded above
vectordb = OpenSearchVectorDBQuery(domain_name=my_open_search_domain_name, index_name=my_index_name)
results = vectordb.query("Who are the astronauts that are going to land on the moon")
print(f"Returned {len(results)} documents. Summary below:")
for doc in results:
    print(f"   Doc length: {len(doc.page_content)} for URL: {doc.metadata['source']}")
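
When a query returns several chunks from the same page, it can help to aggregate the per-document summary by source URL. The following is a minimal sketch, assuming each result exposes `page_content` and `metadata['source']` as in the loop above. `FakeDoc` and `summarize_by_source` are hypothetical names introduced here so the snippet runs without an OpenSearch domain; they are not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a query result (not part of the library)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_by_source(docs):
    """Return {source_url: (chunk_count, total_characters)}."""
    totals = defaultdict(lambda: [0, 0])
    for doc in docs:
        source = doc.metadata.get("source", "<unknown>")
        totals[source][0] += 1                      # number of chunks from this URL
        totals[source][1] += len(doc.page_content)  # total characters from this URL
    return {source: tuple(counts) for source, counts in totals.items()}


# Made-up results for illustration; in the notebook, pass `results` instead
sample_results = [
    FakeDoc("a" * 120, {"source": "https://www.nasa.gov/mission/artemis-ii/"}),
    FakeDoc("b" * 80, {"source": "https://www.nasa.gov/mission/artemis-ii/"}),
    FakeDoc("c" * 50, {"source": "https://www.nasa.gov/missions/artemis/artemis-iii/"}),
]

summary = summarize_by_source(sample_results)
for source, (count, chars) in summary.items():
    print(f"   {count} chunk(s), {chars} characters from {source}")
```

In the notebook itself, calling `summarize_by_source(results)` should work the same way, provided the returned objects have the attributes assumed here.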
