In [3]:
%pip install -r requirements.txt

In [4]:
from langchain_community.document_loaders import UnstructuredURLLoader

In [5]:
urls = ['https://www.asyncapi.com/en', 'https://www.asyncapi.com/about', 'https://www.asyncapi.com/about#faqs']
loader = UnstructuredURLLoader(urls=urls)
data = loader.load()
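
Note: the third URL differs from the second only by its `#faqs` fragment. Fragments are resolved by the client and are not sent to the server, so the loader most likely fetches the same page twice and the index ends up with duplicated chunks. A minimal sketch of deduplicating a URL list before loading, using only the standard library (the `dedupe_urls` helper and the `candidate_urls` name are hypothetical, not part of LangChain):

```python
from urllib.parse import urldefrag


def dedupe_urls(urls):
    """Strip URL fragments and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        base, _fragment = urldefrag(url)  # split off anything after '#'
        if base not in seen:
            seen.add(base)
            unique.append(base)
    return unique


candidate_urls = [
    "https://www.asyncapi.com/en",
    "https://www.asyncapi.com/about",
    "https://www.asyncapi.com/about#faqs",
]
print(dedupe_urls(candidate_urls))
```

Passing the deduplicated list to the loader would change the chunk count reported further down, so the outputs in this notebook reflect the original three-URL list.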

In [6]:
data

[Document(metadata={'source': 'https://www.asyncapi.com/en'}, page_content='AsyncAPI Conference\n\nParis Edition\n\n9th - 11th of December, 2025 | Paris, France\n\n31 days until the end of Call for Speakers\n\nApply To Speak\n\nBuilding the future of Event-Driven Architectures (EDA)\n\nOpen-Source tools to easily build and maintain your event-driven architecture. All powered by the AsyncAPI specification, the industry standard for defining asynchronous APIs.\n\nRead the docs\n\nProud to be part of the Linux Foundation\n\nasyncapi.yaml\n\nPlay with it!\n\nOpen this example on AsyncAPI Studio to get a better taste of the specification. No signup is required!\n\nOpen in Studio\n\nAccount Service Documentation\n\nAccount Service 1.0.0\n\nThis service is in charge of processing user signups 🚀\n\nRECEIVES user/signedup\n\nAccepts the following message:\n\nPayload Object\n\ndisplayName\n\nString\n\nName of the user\n\nemail\n\nStringemail\n\nEmail of the user\n\nAdditional properties are allo

In [7]:
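# The splitter lives in the langchain-text-splitters package;
# the older langchain.text_splitter import path is deprecated.
from langchain_text_splitters import RecursiveCharacterTextSplitter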

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(data)

print(f"Total number of docs: {len(docs)}")

Total number of docs: 29
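
To build intuition for what `chunk_size` controls, here is a deliberately simplified sliding-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter works recursively on separators such as paragraph and line breaks); the `chunk_text` helper and its `overlap` parameter are illustrative assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. This is a toy
    fixed-width chunker, not LangChain's recursive splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "a" * 2500
chunks = chunk_text(sample, chunk_size=1000, overlap=200)
print([len(c) for c in chunks])
```

Smaller chunks give the retriever more precise matches but less surrounding context per hit; larger chunks do the opposite.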


In [8]:
docs[0]

Document(metadata={'source': 'https://www.asyncapi.com/en'}, page_content='AsyncAPI Conference\n\nParis Edition\n\n9th - 11th of December, 2025 | Paris, France\n\n31 days until the end of Call for Speakers\n\nApply To Speak\n\nBuilding the future of Event-Driven Architectures (EDA)\n\nOpen-Source tools to easily build and maintain your event-driven architecture. All powered by the AsyncAPI specification, the industry standard for defining asynchronous APIs.\n\nRead the docs\n\nProud to be part of the Linux Foundation\n\nasyncapi.yaml\n\nPlay with it!\n\nOpen this example on AsyncAPI Studio to get a better taste of the specification. No signup is required!\n\nOpen in Studio\n\nAccount Service Documentation\n\nAccount Service 1.0.0\n\nThis service is in charge of processing user signups 🚀\n\nRECEIVES user/signedup\n\nAccepts the following message:\n\nPayload Object\n\ndisplayName\n\nString\n\nName of the user\n\nemail\n\nStringemail\n\nEmail of the user\n\nAdditional properties are allow

In [9]:
from langchain_chroma import Chroma
# OpenAI alternatives, kept for reference:
# from langchain_openai import OpenAIEmbeddings
# from langchain_openai import OpenAI
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

In [21]:
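import getpass
import os

# Prompt for the key only if it is not already set in the environment.
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")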

Enter API key for Google Gemini: ··········


In [12]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)

In [13]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

In [14]:
retrieved_docs = retriever.invoke("What is their product?")
len(retrieved_docs)

3
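
Conceptually, a similarity retriever with `k=3` embeds the query, scores it against every stored chunk vector, and returns the three best matches. The sketch below illustrates that idea with toy 2-D vectors and cosine similarity. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration; the metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _score, idx in scored[:k]]


# Toy "embeddings" standing in for real chunk vectors.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], doc_vecs, k=3))
```

A real vector store avoids scoring every vector by using an approximate nearest-neighbour index, but the ranking idea is the same.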

In [15]:
print(retrieved_docs[0].page_content)

IBM: Company that manufactures and markets hardware and software. It has operations in over 170 countries and provides hosting and consulting services in many areas.

SAP: Company dedicated to the design of computer products for business management. Develops business software to manage operations and business-to-customer relationships. It’s a large company with 100,330 employees.

IQVIA: Company providing services for the combined health information technology and clinical research industries. It employs more than 58.000 people in over 100 countries.

Values of AsyncAPI

Innovative. There is no other specification that covers the messaging needs in event-driven architecture that AsyncAPI is covering. What it tries to do is to integrate with the existing tools and remove walls for communication.

Free. It’s a free software project: it seeks the user's liberty by offering a tool that can be used and enhanced without restrictions.


In [16]:
# llm = OpenAI(temperature=0.4, max_tokens=500)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,  # keep sampling randomness to a minimum
    max_tokens=None,  # no explicit cap on output length
)

In [17]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know and ask for a relevant question. Answer the "
    "question in brief, using at most five sentences."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [18]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
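
The "stuff" strategy is the simplest way to combine documents: the text of every retrieved chunk is concatenated and substituted into the `{context}` placeholder of the prompt. A rough plain-Python sketch of that idea, assuming a blank line between chunks (the `stuff_context` helper and the sample strings are hypothetical, and this is not LangChain's implementation):

```python
def stuff_context(template, chunks, separator="\n\n"):
    """Join all chunk texts and insert them into the template's {context} slot."""
    return template.format(context=separator.join(chunks))


template = "Use the following context to answer the question.\n\n{context}"
sample_chunks = [
    "AsyncAPI is a specification.",
    "It targets event-driven architectures.",
]
print(stuff_context(template, sample_chunks))
```

Because every chunk goes into a single prompt, this approach only works while `k` chunks of `chunk_size` characters fit comfortably in the model's context window.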

In [19]:
response = rag_chain.invoke({"input": "What kind of services do they provide?"})
print(response["answer"])

The context describes services provided by several companies associated with AsyncAPI. IBM manufactures and markets hardware and software, offering hosting and consulting services. SAP designs computer products for business management and develops business software. IQVIA provides services for the combined health information technology and clinical research industries. Solace powers event-driven architectures, integrations, and AI.


In [20]:
response = rag_chain.invoke({"input": "What is this about?"})
print(response["answer"])

This content is about AsyncAPI, an Apache License 2.0 library under the Linux Foundation. It's an initiative providing a specification and open-source tools to help developers define, build, and maintain asynchronous APIs and Event-Driven Architectures (EDAs). AsyncAPI aims to make working with EDAs as easy as working with REST APIs, similar to how OpenAPI (Swagger) works for RESTful APIs. It supports various message brokers like Apache Kafka and RabbitMQ, and languages including Python, Java, and Nodejs.
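
Besides `"answer"`, the dictionary returned by the retrieval chain also carries the retrieved documents under `"context"`, which makes it possible to show where an answer came from. A small sketch of listing the source URLs follows; the `unique_sources` helper is hypothetical, and `fake_response` is a hand-built stand-in with made-up content so the snippet runs without an API key.

```python
from types import SimpleNamespace


def unique_sources(response):
    """Collect distinct 'source' metadata values from the retrieved documents."""
    sources = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


# Stand-in for a real chain response; SimpleNamespace mimics a Document here.
fake_response = {
    "input": "What is this about?",
    "answer": "placeholder answer",
    "context": [
        SimpleNamespace(metadata={"source": "https://www.asyncapi.com/about"}, page_content="chunk one"),
        SimpleNamespace(metadata={"source": "https://www.asyncapi.com/en"}, page_content="chunk two"),
        SimpleNamespace(metadata={"source": "https://www.asyncapi.com/about"}, page_content="chunk three"),
    ],
}
print(unique_sources(fake_response))
```

With the real chain, `unique_sources(response)` could be called directly on the result of `rag_chain.invoke(...)`.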
