### Generating Guides from OpenAPI Specification

*Iteratively build more complex examples to prototype the generation of high-quality guides.*

### (1) Rudimentary Example

```mermaid
graph LR
    alloy.com.yaml --> DocumentLoader
    DocumentLoader --> Chat
    Query[How do I create a journey in Python?] --> Chat
    Chat --> Guide
```

In [2]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0.2)

In [6]:
from langchain.document_loaders.text import TextLoader

def init_docs():
    # Load the raw OpenAPI spec as a single LangChain document
    file_path = 'alloy.com.yaml'
    loader = TextLoader(file_path)
    return loader.load()

In [7]:
docs = init_docs()

### gpt-4-turbo Pricing

- Input: $10.00 / 1M tokens

- Output: $30.00 / 1M tokens

### gpt-3.5-turbo Pricing

- Input: $0.50 / 1M tokens

- Output: $1.50 / 1M tokens

In [4]:
import tiktoken

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4")

with open("alloy.com.yaml") as f:
    doc = f.read()

tokens = enc.encode(doc)
len(tokens)
gpt4_price = len(tokens) * 10 / 1_000_000
gpt35_price = len(tokens) * 0.5 / 1_000_000
print("gpt-4-turbo: Price of passing in alloy.com.yaml 1 time: ${}".format(gpt4_price))
print("gpt-4-turbo: Price of passing in alloy.com.yaml 20 times: ${}".format(gpt4_price * 20))
print("gpt-3.5-turbo: Price of passing in alloy.com.yaml 1 time: ${}".format(gpt35_price))
print("gpt-3.5-turbo: Price of passing in alloy.com.yaml 20 times: ${}".format(gpt35_price * 20))

gpt-4-turbo: Price of passing in alloy.com.yaml 1 time: $0.50122
gpt-4-turbo: Price of passing in alloy.com.yaml 20 times: $10.0244
gpt-3.5-turbo: Price of passing in alloy.com.yaml 1 time: $0.025061
gpt-3.5-turbo: Price of passing in alloy.com.yaml 20 times: $0.50122
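
The arithmetic above can be wrapped in a small helper, and the same token count also shows whether the spec fits in a model's context window. This is a minimal sketch, not part of the original notebook: `estimate_input_cost` and `fits_in_context` are hypothetical names, the prices are the input prices listed above, and the 16,385-token context window for gpt-3.5-turbo-0125 is an assumption to verify against the current OpenAI model documentation.

```python
# Input prices in USD per 1M tokens, copied from the pricing notes above.
INPUT_PRICE_PER_1M = {"gpt-4-turbo": 10.00, "gpt-3.5-turbo": 0.50}

# Assumed context window for gpt-3.5-turbo-0125; check the current model docs.
ASSUMED_CONTEXT_WINDOW = {"gpt-3.5-turbo": 16_385}


def estimate_input_cost(num_tokens: int, model: str, calls: int = 1) -> float:
    """Return the USD cost of sending `num_tokens` input tokens `calls` times."""
    return num_tokens * INPUT_PRICE_PER_1M[model] / 1_000_000 * calls


def fits_in_context(num_tokens: int, model: str) -> bool:
    """Return True if the prompt alone is within the assumed context window."""
    return num_tokens <= ASSUMED_CONTEXT_WINDOW[model]


spec_tokens = 50_122  # token count implied by the $0.50122 figure above
print(round(estimate_input_cost(spec_tokens, "gpt-4-turbo"), 5))
print(fits_in_context(spec_tokens, "gpt-3.5-turbo"))
```

Output tokens are priced separately and are not included in this estimate.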


### ⛔️ Problem #1: Passing entire OpenAPI Specification is too expensive
### ⛔️ Problem #2: Entire OpenAPI Specification does not fit in gpt-3.5-turbo
### 🤔 Solution: Chunk the OpenAPI Specification, retrieve from vector store, and pass in as context instead

#### Data Pipeline

```mermaid
graph LR
    alloy.com.yaml --> Chunk1
    alloy.com.yaml --> Chunk2
    alloy.com.yaml --> Chunk3
    Chunk1 -->|embed| VectorStore
    Chunk2 -->|embed| VectorStore
    Chunk3 -->|embed| VectorStore
    VectorStore -->|How do I create a journey in Python?| Chunk1Out[Chunk1]
    VectorStore -->|How do I create a journey in Python?| Chunk3Out[Chunk3]
    Chunk1Out --> Context
    Chunk3Out --> Context
    Context --> Chat
    Query[How do I create a journey in Python?] --> Chat
    Chat --> Guide
```
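
To make the embed-and-retrieve step in the diagram concrete, here is a toy sketch. It is not how `OpenAIEmbeddings` or Chroma work internally: the word-count "embedding", the `retrieve` helper, and the three sample chunks are all hypothetical, chosen only to show how a query selects the top-k most similar chunks.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, loosely modelled on pieces of an OpenAPI spec.
sample_chunks = [
    "post /journeys create a journey",
    "get /entities list entities",
    "journey properties journey_name journey_type",
]
top = retrieve("How do I create a journey in Python?", sample_chunks)
print(top)
```

Only the selected chunks, not the whole spec, are then passed to the chat model as context.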

#### RAG Chain

```mermaid
graph LR
    alloy.com.yaml --> TextSplitter
    TextSplitter --> Chroma[Chroma 'Vector Store']
    Chroma --> Retriever
    Retriever --> Context
    Context --> Chat
    Query[How do I create a journey in Python?] --> Chat
    Chat --> Guide
```

In [5]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# store OAS in vector DB
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve the relevant snippets of the OpenAPI spec.
retriever = vectorstore.as_retriever()

In [6]:
query = "How do I create a journey in Python?"

In [7]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | chat
    | StrOutputParser()
)

output = rag_chain.invoke(query)

In [9]:
from IPython.display import display, Markdown

display(Markdown(output))

To create a journey in Python, you can use the properties provided in the object, such as journey_name, journey_type, journey_token, and journey_version_num. You can also access additional information like journey_batch applications by following the specified links. Make sure to include the necessary parameters like journey_token and journey_application_token when making post requests for notes or batches related to the journey.

### 👎 Results 

The output is nearly useless: it provides neither environment setup instructions nor example code.

### 🤔 Maybe provide system instructions that are more explicit?

In [10]:
from langchain_core.prompts import ChatPromptTemplate

better_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an assistant for question-answering tasks. If you don't know the answer, just say that you don't know. Use markdown to ensure code blocks and commands are properly formatted. Make sure to always include environment setup instructions and code blocks that are helpful for a developer to copy-paste. Always explain inputs and outputs of API requests. Be as detailed as possible. Answer the user's questions based on the below context:\n\n{context}",
        ),
        (
            "user",
            "{question}"
        )
    ]
)

In [11]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | better_prompt
    | chat
    | StrOutputParser()
)

In [12]:
output = rag_chain.invoke(query)

In [13]:
display(Markdown(output))

To create a journey in Python, you typically need to make an API request to the endpoint responsible for creating journeys. Below are the general steps to create a journey in Python using an API:

1. **Install Requests Library**: You can use the `requests` library in Python to make HTTP requests. If you don't have it installed, you can install it using pip:

```bash
pip install requests
```

2. **Make an API Request**: You need to make a POST request to the endpoint that creates journeys. Ensure you have the necessary authentication and permissions to create journeys.

3. **Provide Required Data**: You may need to provide data such as the journey name, type, version number, etc., depending on the API requirements.

4. **Handle the Response**: After making the request, you should handle the response to check if the journey was created successfully.

Here is a basic example of how you can create a journey in Python using the `requests` library:

```python
import requests

url = "https://api.example.com/journeys"  # Replace this with the actual API endpoint for creating journeys
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
}

data = {
    "journey_name": "My New Journey",
    "journey_type": "application",
    "journey_version_num": "1.0"
    # Add any other required data here
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 201:
    print("Journey created successfully")
    print(response.json())
else:
    print("Failed to create journey. Status code:", response.status_code)
    print(response.text)
```

In this example:
- Replace `"https://api.example.com/journeys"` with the actual API endpoint for creating journeys.
- Replace `"YOUR_ACCESS_TOKEN"` with your actual access token for authentication.
- Customize the `data` dictionary with the required information for creating a journey.

Remember to refer to the API documentation of the service you are using to create journeys for specific details on the request format and required parameters.

### 👎 Results

The result lacks the correct base URL, the proper parameters, and an explanation of inputs and outputs, so the answer is still nearly useless. The likely cause is that the model was not given enough context.

### 🤔 Let's investigate what our vector store is returning

In [14]:
results = retriever.invoke(query)

for doc in results:
    print(doc.page_content)

type: object
                                  properties:
                                    href:
                                      type: string
                      journey:
                        type: object
                        properties:
                          journey_name:
                            type: string
                          journey_type:
                            type: string
                            enum:
                              - application
                              - alert
                          journey_token:
                            type: string
                          journey_version_num:
                            type: string
                          _links:
                            type: object
                            properties:
                              self:
                                type: object
                                properties:
properties:
                          self:
            

### 👎 Useless chunk results

The results contain nothing about the relevant API endpoint, only fragments of response schemas.
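
A toy example shows why fixed-size splitting produces chunks like these. The sketch below is a simplified stand-in for `RecursiveCharacterTextSplitter` (it cuts purely by character count, which the real splitter does not), and the YAML fragment is made up for illustration. It shows how a schema fragment ends up separated from the path and method it belongs to.

```python
def split_by_chars(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Naive fixed-size splitter with overlap (simplified stand-in, see lead-in)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Made-up OpenAPI fragment for illustration.
sample_spec = """paths:
  /journeys/{journey_token}/applications:
    post:
      summary: Create Journey Application
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                entities:
                  type: array
"""

pieces = split_by_chars(sample_spec, chunk_size=120, overlap=20)
# Only chunks that still mention the path can tell the model which endpoint to call.
with_path = [p for p in pieces if "/journeys/" in p]
print(len(pieces), len(with_path))
```

Every piece after the first carries schema details but no hint of which endpoint they describe.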

### 🤔 Can we improve the chunk results by doing smarter chunking?

If we chunk the OpenAPI spec by operation, we may be able to give the LLM better context. Each chunk becomes a document that contains only the information relevant to one operation while keeping all the contextual information (everything besides `paths`).

In [1]:
import yaml
from jsonref import replace_refs
from langchain_core.documents.base import Document
from copy import deepcopy

with open("alloy.com.yaml") as f:
    spec = f.read()

def chunk_openapi_by_operation(openapi: str) -> list[Document]:
    parsed = yaml.safe_load(openapi)

    # 1) list all operations by (path, HTTP method)
    # NOTE: every key of a path item is treated as a method, so path-level keys
    # such as `parameters` produce chunks too
    operations: list[tuple[str, str]] = []
    for path, methods in parsed['paths'].items():
        for method in methods.keys():
            operations.append((path, method))

    # 2) create a chunk for every operation

    # 2.a) Dereference entire OpenAPI Spec
    dereferenced = replace_refs(parsed, lazy_load=False)

    documents = []
    tag_name = ""  # operations without tags reuse the previous operation's tag
    for path, method in operations:
        chunk = deepcopy(dereferenced)
        operation = chunk['paths'][path][method]

        # first tag if possible
        if 'tags' in operation and operation['tags']:
            tag_name = operation['tags'][0]

        # delete all tags on OAS except tag for this operation
        if 'tags' in chunk:
            chunk['tags'] = [tag for tag in chunk['tags'] if tag['name'] == tag_name]

        summary = operation['summary'] if 'summary' in operation else ""
        description = operation['description'] if 'description' in operation else ""

        # delete other paths and other operations under the same path
        chunk['paths'] = {path: {method: operation}}

        # delete all components (should be inlined from 2.a)
        if 'components' in chunk:
            del chunk['components']

        documents.append(Document(
            page_content=yaml.dump(chunk),
            metadata={
                "path": path,
                "method": method,
                "tag": tag_name,
                "summary": summary,
                "description": description,
            },
        ))
    return documents

chunks = chunk_openapi_by_operation(spec)
# for chunk in chunks:
#     print(len(yaml.safe_load(chunk.page_content)['paths']))
print(len(chunks))

96
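
One caveat with iterating over every key of a path item: OpenAPI path items can also hold non-operation keys such as `parameters`, `summary`, `description`, and `servers`, so some of these chunks may not be real operations. A hedged sketch of a stricter listing step follows. `HTTP_METHODS` and `list_operations` are hypothetical names that do not appear in the notebook, and the sample paths are made up for illustration.

```python
# HTTP methods that OpenAPI 3.x allows as operation keys on a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(paths: dict) -> list[tuple[str, str]]:
    """Return (path, method) pairs, skipping non-operation keys like `parameters`."""
    return [
        (path, method)
        for path, item in paths.items()
        for method in item
        if method.lower() in HTTP_METHODS
    ]


# Made-up path items for illustration only.
sample_paths = {
    "/journeys/{journey_token}/applications": {
        "parameters": [{"name": "journey_token", "in": "path"}],
        "post": {"summary": "Create Journey Application"},
    },
    "/entities": {"get": {"summary": "List entities"}},
}
print(list_operations(sample_paths))
```

Swapping this in for step 1 would likely reduce the chunk count, since path-level `parameters` entries would no longer become documents of their own.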


In [2]:
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.vectorstores.chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

query = "How do I create a journey application in Python?"

chat = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0.2)

# Embed one document per operation; the chunks are not split any further
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

In [3]:
metadata_field_info = [
    AttributeInfo(
        name="path",
        description="The subpath for this operation",
        type="string",
    ),
    AttributeInfo(
        name="method",
        description="The HTTP Method for this operation",
        type="string",
    ),
    AttributeInfo(
        name="tag",
        description="The logical grouping for this API operation",
        type="string",
    ),
    AttributeInfo(
        name="summary",
        description="A short description of this operation's functionality",
        type="string",
    ),
    AttributeInfo(
        name="description",
        description="A more detailed description of this operation's functionality",
        type="string",
    ),
]
document_content_description = "The pruned OpenAPI specification which includes only the relevant information for a particular operation in the OpenAPI specification."

# Note (Dylan): this only worked after upgrading langchain
retriever = SelfQueryRetriever.from_llm(
    chat, vectorstore, document_content_description, metadata_field_info, verbose=True, search_kwargs={"k": 4}
)

In [4]:
relevant_docs = retriever.invoke(query)
print(list(map(lambda doc: doc.metadata, relevant_docs)))

[{'description': '', 'method': 'parameters', 'path': '/journeys/{journey_token}/applications/{journey_application_token}', 'summary': '', 'tag': 'Journeys'}, {'description': 'If a journey application has the status `pending_journey_application_review`, this endpoint can be used to inform the system of the outcome of the manual review and submit review notes. The outcome submitted here will be the final outcome of the journey application.', 'method': 'post', 'path': '/journeys/{journey_token}/applications/{journey_application_token}/review', 'summary': 'Manual Review Journey Application', 'tag': 'Journeys'}, {'description': 'Create a note associated with the specified Journey Application', 'method': 'post', 'path': '/journeys/{journey_token}/applications/{journey_application_token}/notes', 'summary': 'Create Journey Application Note', 'tag': 'Journeys'}, {'description': 'Create a journey application for one or more entities.\n', 'method': 'post', 'path': '/journeys/{journey_token}/appli

In [13]:
import tiktoken

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4")


tokens = []
for doc in relevant_docs:
    tokens += enc.encode(doc.page_content)
len(tokens)
gpt4_price = len(tokens) * 10 / 1_000_000
gpt35_price = len(tokens) * 0.5 / 1_000_000
print("gpt-4-turbo: Price of passing in the retrieved chunks 1 time: ${}".format(gpt4_price))
print("gpt-4-turbo: Price of passing in the retrieved chunks 20 times: ${}".format(gpt4_price * 20))
print("gpt-3.5-turbo: Price of passing in the retrieved chunks 1 time: ${}".format(gpt35_price))
print("gpt-3.5-turbo: Price of passing in the retrieved chunks 20 times: ${}".format(gpt35_price * 20))

gpt-4-turbo: Price of passing in the retrieved chunks 1 time: $0.08301
gpt-4-turbo: Price of passing in the retrieved chunks 20 times: $1.6602000000000001
gpt-3.5-turbo: Price of passing in the retrieved chunks 1 time: $0.0041505
gpt-3.5-turbo: Price of passing in the retrieved chunks 20 times: $0.08301


In [15]:
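# Compares the full spec on gpt-4-turbo ($0.50122) with the retrieved chunks on gpt-3.5-turbo ($0.0041505),
# so the ratio includes the 20x model price difference. On the same model, RAG alone
# cuts input tokens from 50122 to 8301, roughly a 6x saving.
print("Using RAG is {}x less expensive".format(0.50122 / 0.0041505))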

Using RAG is 120.76135405372848x less expensive


In [5]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

better_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an assistant for question-answering tasks. If you don't know the answer, just say that you don't know. Use markdown to ensure code blocks and commands are properly formatted. Make sure to always include environment setup instructions and code blocks that are helpful for a developer to copy-paste. Always explain inputs and outputs of API requests. Be as detailed as possible. Answer the user's questions based on the below context:\n\n{context}",
        ),
        (
            "user",
            "{question}"
        )
    ]
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | better_prompt
    | chat
    | StrOutputParser()
)

In [6]:
output = rag_chain.invoke(query)

In [7]:
from IPython.display import display, Markdown

display(Markdown(output))

To create a journey application using the Alloy API in Python, you need to send a POST request to the `/journeys/{journey_token}/applications` endpoint with the required payload containing the entities to be processed in the application.

Here is a step-by-step guide on how to create a journey application in Python:

### Step 1: Install the `requests` library
Make sure you have the `requests` library installed. You can install it using pip:

```bash
pip install requests
```

### Step 2: Make a POST request to create a journey application
Use the following Python script to create a journey application:

```python
import requests
import json

# Define the endpoint URL
url = "https://demo-qasandbox.alloy.co/v1/journeys/{journey_token}/applications"

# Define the journey token
journey_token = "J-VCQoADBJxeHtmdAvFqoS"

# Define the payload with the entities to be processed
payload = {
    "do_await_additional_entities": False,
    "entities": [
        {
            "branch_name": "persons",
            "data": {
                "addresses": [
                    {
                        "city": "New York",
                        "country_code": "US",
                        "line_1": "41 E. 11th",
                        "line_2": "2nd floor",
                        "postal_code": "10003",
                        "state": "NY",
                        "type": "primary"
                    }
                ],
                "birth_date": "1990-01-25",
                "document_ssn": "111223333",
                "email_address": "john@alloy.com",
                "ip_address_v4": "42.206.213.70",
                "meta": {
                    "user_type": "vip"
                },
                "name_first": "John",
                "name_last": "Doe",
                "name_middle": "Franklin",
                "phone_number": "8443825569"
            },
            "entity_type": "person",
            "external_entity_id": "my_system_entity_id_123"
        }
    ]
}

# Convert the payload to JSON
payload_json = json.dumps(payload)

# Make the POST request
response = requests.post(url.format(journey_token=journey_token), json=payload_json)

# Print the response
print(response.json())
```

In this script:
- Replace `{journey_token}` with your actual journey token.
- Update the `payload` dictionary with the entities you want to process in the journey application.

### Step 3: Handle the response
The response will contain information about the created journey application, including the `journey_application_token` that can be used for further interactions with the application.

That's it! By following these steps, you can create a journey application using the Alloy API in Python.
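
One detail in the generated script is worth checking before copy-pasting: it serializes the payload with `json.dumps` and then passes the resulting string to `requests.post(..., json=...)`. Because the `json=` argument serializes its value again, the server would receive a JSON string rather than a JSON object. The sketch below reproduces the effect with the standard library only; the payload is a trimmed, illustrative subset of the one above, and the variable names are hypothetical.

```python
import json

# Trimmed, illustrative subset of the payload from the generated script.
payload = {"entities": [{"entity_type": "person", "branch_name": "persons"}]}

# What the generated script does: serialize first, then hand the string to `json=`,
# which serializes it a second time.
payload_json = json.dumps(payload)
double_encoded_body = json.dumps(payload_json)

# What passing the dict directly to `json=` would send.
single_encoded_body = json.dumps(payload)

print(type(json.loads(double_encoded_body)).__name__)  # str
print(type(json.loads(single_encoded_body)).__name__)  # dict
```

Passing the dictionary itself (`json=payload`) avoids the double encoding. The generated script also sends no credentials, which a real API call would presumably need.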

### (2) Add documentation and custom configuration to input

```mermaid
graph LR
    alloy.com.yaml --> DocumentLoader
    Documentation --> DocumentLoader
    JSON[Custom JSON Configuration] --> DocumentLoader
    DocumentLoader --> Chat
    Query[How do I create a journey in Python?] --> Chat
    Chat --> Guide
```