# Solution: Deploying a simple RAG Application using an API

[![open in colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LinkedInLearning/generative-ai-and-llmops-deploying-and-managing-llms-in-production-4465782/blob/solution/ch-03/challenge_deploy_RAG_using_API.ipynb)

In [None]:
!pip install graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20240515002-py3-none-any.whl (138 kB)
Collecting httpx (from graphlit-client)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting websockets (from graphlit-client)
  Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)
Collecting httpcore==1.* (from httpx->graphlit-client)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (

In [None]:
from graphlit import Graphlit
# The star import supplies the input types, enums, and GraphQLClientError used below.
from graphlit_api import *

In [None]:
graphlit = Graphlit(organization_id="", environment_id="", jwt_secret="")
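
The constructor call above leaves the three credentials blank, to be filled in by hand. One way to avoid pasting secrets into a notebook is to read them from environment variables. The following is a minimal sketch under stated assumptions: the helper `load_graphlit_credentials` and the three variable names are hypothetical, chosen for illustration, and are not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
_ENV_VARS = {
    "organization_id": "GRAPHLIT_ORGANIZATION_ID",
    "environment_id": "GRAPHLIT_ENVIRONMENT_ID",
    "jwt_secret": "GRAPHLIT_JWT_SECRET",
}


def load_graphlit_credentials(env=None):
    """Return the three constructor keyword arguments read from `env`.

    Raises ValueError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    creds = {arg: env.get(var, "") for arg, var in _ENV_VARS.items()}
    missing = [_ENV_VARS[arg] for arg, value in creds.items() if not value]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return creds


# Usage in the notebook (not executed in this sketch):
# graphlit = Graphlit(**load_graphlit_credentials())
```

Failing early with the names of the missing variables is usually easier to debug than an authentication error returned by the first API call.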

In [None]:
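async def create_feed(graphlit, uri):
    """Create a web feed for `uri`; return the feed ID, or the error text on failure."""
    # Named feed_input to avoid shadowing the built-in `input`.
    feed_input = FeedInput(
        name=uri,
        type=FeedTypes.WEB,
        web=WebFeedPropertiesInput(
            uri=uri,
            readLimit=10
        )
    )

    try:
        response = await graphlit.client.create_feed(feed_input)

        feed_id = response.create_feed.id
    except GraphQLClientError as e:
        # On failure the error message is returned in place of the ID.
        return str(e)

    return feed_id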

async def create_specification(graphlit):
    """Create a Claude 3 Haiku completion specification; return its ID, or the error text on failure."""
    # Named spec_input to avoid shadowing the built-in `input`.
    spec_input = SpecificationInput(
        name="Summarization",
        type=SpecificationTypes.COMPLETION,
        serviceType=ModelServiceTypes.ANTHROPIC,
        searchType=SearchTypes.VECTOR,
        anthropic=AnthropicModelPropertiesInput(
            model=AnthropicModels.CLAUDE_3_HAIKU,
            temperature=0.1,
            probability=0.2,
            completionTokenLimit=2048,
        )
    )

    try:
        response = await graphlit.client.create_specification(spec_input)

        spec_id = response.create_specification.id
    except GraphQLClientError as e:
        # On failure the error message is returned in place of the ID.
        return str(e)

    return spec_id

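async def create_conversation(graphlit, spec_id):
    """Create a conversation bound to `spec_id`; return its ID, or the error text on failure."""
    # Named conv_input to avoid shadowing the built-in `input`.
    conv_input = ConversationInput(
        name="Conversation",
        specification=EntityReferenceInput(
            id=spec_id
        )
    )

    try:
        response = await graphlit.client.create_conversation(conv_input)

        conv_id = response.create_conversation.id
    except GraphQLClientError as e:
        # On failure the error message is returned in place of the ID.
        return str(e)

    return conv_id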

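async def prompt_conversation(graphlit, conv_id, prompt):
    """Send `prompt` to the conversation; return (message, citations), or (None, error text) on failure."""
    try:
        response = await graphlit.client.prompt_conversation(prompt, conv_id)

        message = response.prompt_conversation.message.message
        citations = response.prompt_conversation.message.citations

        return message, citations
    except GraphQLClientError as e:
        # The error text takes the place of the citations in the returned tuple.
        return None, str(e)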
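
Because the three `create_*` helpers above return the error message in place of the ID when a call fails, a caller cannot tell the two apart by type. The following is a minimal sketch of one way to check the returned value before passing it on. It assumes that valid IDs are always UUID strings, which matches the IDs shown in this notebook's outputs but is not confirmed here. The helper names `looks_like_id` and `require_id` are hypothetical.

```python
import uuid


def looks_like_id(value):
    """Return True if `value` parses as a UUID (assumed ID format)."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True


def require_id(value, what):
    """Return `value` unchanged if it looks like an ID; otherwise raise.

    `what` is a short label (for example "feed") used in the error message.
    """
    if not looks_like_id(value):
        raise RuntimeError(f"Could not create {what}: {value}")
    return value


# Usage in the notebook (not executed in this sketch):
# feed_id = require_id(await create_feed(graphlit, uri), "feed")
```

Raising at the point of creation keeps an error string from being sent to a later call as if it were an ID.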

In [None]:
uri = 'https://lilianweng.github.io/posts/2023-06-23-agent/'
feed_id = await create_feed(graphlit, uri)
feed_id

'80700023-4aa2-4e44-9ab2-e5fc008c4958'

In [None]:
spec_id = await create_specification(graphlit)
spec_id

'dabb0ba5-2063-46a2-b205-406e8666ab90'

In [None]:
conv_id = await create_conversation(graphlit, spec_id)
conv_id

'9ae4fe1a-3ac0-40d2-a06b-b3b5086412cd'

In [None]:
response = await prompt_conversation(graphlit, conv_id, "What is the difference between chain of thought and tree of thought and who created them?")
response

("Chain of Thought (CoT) and Tree of Thoughts (ToT) are two different prompting techniques for enhancing the performance of large language models on complex tasks.\n\nChain of Thought, proposed by Wei et al. in 2022, instructs the model to 'think step-by-step' to decompose a hard task into smaller, more manageable steps. This allows the model to utilize more test-time computation and provide an interpretation of its thinking process.\n\nTree of Thoughts, proposed by Yao et al. in 2023, extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be breadth-first or depth-first, with each state evaluated by a classifier or majority vote.\n\nIn summary, CoT focuses on sequential step-by-step reasoning, while ToT explores a tree of possible reasoning paths, providing more flexibility and exploration of alternative solutions to c
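
The raw tuple printed above is hard to read, since the message is shown as a single escaped string. The following is a minimal sketch of a formatter for the `(message, citations)` tuple that `prompt_conversation` returns, including its `(None, error text)` failure case. The helper name `format_response` is hypothetical, and the sketch assumes only that `citations` is either `None` or a collection supporting `len()`. It does not inspect individual citation objects.

```python
import textwrap


def format_response(response, width=80):
    """Render the (message, citations) tuple as wrapped, readable text."""
    message, second = response
    if message is None:
        # Failure path: the second element carries the error text.
        return f"Request failed: {second}"
    # Wrap each paragraph separately so blank-line breaks are preserved.
    paragraphs = [textwrap.fill(p, width=width) for p in message.split("\n\n")]
    count = len(second) if second else 0
    return "\n\n".join(paragraphs) + f"\n\n[{count} citation(s)]"


# Usage in the notebook (not executed in this sketch):
# print(format_response(response))
```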