# Creating a Social Security Question answering Chatbot with Beautiful Soup, Pinecone, ChatGPT, Langchain


While reading a book about homeless healthcare I was surprised to hear of the difficulty that people have with determining whether they qualify for social security in the US. The "Blue Book" of social security, which the US Social Security Agency publishes for the intention of doctors and nurses to use to determine whether their patients qualify, uses a lot of medical jargon that may be difficult for the normal person to understand. After doing some digging, I realized that the blue book is published online. I wanted to explore the idea of scraping all blue book content using Beautiful Soup, then using that content to create a vector database and store it in Pinecone with semantic embeddings. Once I have my database that I can use to ground the language model, I want to use Langchain and ChatGPT to make a chatbot which is able to look to the Blue Book as a primary source(aka [retrieval augmented generation](https://arxiv.org/abs/2005.11401)). 

## 1 Scraping the Social Security Agency Website

#### 1.1 First, explore the Social Security Agency Blue Book website's sitemap and get a list of urls to crawl

#

In [2]:

sitemap_url = "https://www.ssa.gov/sitemap.xml"

import urllib.request
urllib.request.urlretrieve(sitemap_url, "bluebook_sitemap.xml")
#save sitemap as an xml file


('bluebook_sitemap.xml', <http.client.HTTPMessage at 0x1037596f0>)

In [40]:

#iterate through sitemap url, find urls matching pattern "https://www.ssa.gov/disability/professionals/bluebook/*"

bluebook_prefix = "https://www.ssa.gov/disability/professionals/bluebook/"

import inspect
import xml.etree.ElementTree as ET
tree = ET.parse('bluebook_sitemap.xml')
i=0
site_urls_to_crawl = []

for elem in tree.iter():
    #print(inspect.getmembers(elem))
    if(elem.text.startswith(bluebook_prefix) and elem.text.endswith("htm")):
        #print(elem.text)
        if i%5==0:
            print("on iteration:"+ str(i) +", current url is:"+elem.text)
        site_urls_to_crawl.append(elem.text)
        i+=1

on iteration:0, current url is:https://www.ssa.gov/disability/professionals/bluebook/general-info.htm
on iteration:5, current url is:https://www.ssa.gov/disability/professionals/bluebook/index.htm
on iteration:10, current url is:https://www.ssa.gov/disability/professionals/bluebook/110.00-MultipleBody-Childhood.htm
on iteration:15, current url is:https://www.ssa.gov/disability/professionals/bluebook/104.00-Cardiovascular-Childhood.htm
on iteration:20, current url is:https://www.ssa.gov/disability/professionals/bluebook/106.00-Genitourinary-Childhood.htm
on iteration:25, current url is:https://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm
on iteration:30, current url is:https://www.ssa.gov/disability/professionals/bluebook/8.00-Skin-Adult.htm


In [143]:
#print list of site urls for sanity check
site_urls_to_crawl

['https://www.ssa.gov/disability/professionals/bluebook/general-info.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/listing-impairments.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/AdultListings.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/evidentiary.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/ChildhoodListings.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/index.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/7.00-HematologicalDisorders-Adult.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/9.00-Endocrine-Adult.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/11.00-Neurological-Adult.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/105.00-Digestive-Childhood.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/110.00-MultipleBody-Childhood.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/1.00-Musculoskeletal-Adult.htm',
 'https:/

In [7]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import tiktoken
encoding = tiktoken.encoding_for_model("text-embedding-ada-002")
import pinecone

  from tqdm.autonotebook import tqdm


#### 1.2: Scrape each page and break it up into text chunks


I have a few ideas on how to improve on the chunking method, for now I'm just doing v1: naive chunking.

v1: naive chunking by page
- Iterate through the pages
- convert all content into text
- chunk into n-gram chunks

v2: chunk by topic
(find the corrrect tag, and insert a labeling tag right before it indicating that it should be a chunk)

v3: enable recursive lookup of other references

In [149]:
def process_url(url_to_process, dataframe_to_add_to):
    # URL = "https://realpython.github.io/fake-jobs/"
    page = requests.get(url_to_process)

    soup = BeautifulSoup(page.content, "html.parser")

    results = soup.find_all("article", {"class": "cell w-100 m-w-80"})
    print("length of results object: " + str(len(results)))
    print("I think we want results[1]")
    document_content = results[1].text

    # print(results[1].text)

    document_words = document_content.split(' ')
    print(len(document_words))

    document_title = document_content.split('\n')[1]
    print("document title: " + document_title)

    #Now combine every 200 entries in this list, and treat each as a chunk.
    chunk_size = 200
    chunked_word_lists = [document_words[i:i+chunk_size] for i in range(0,len(document_words),chunk_size)]
    text_chunks = []
    text_content_ids = []
    for sublist in chunked_word_lists:
        text_content = " ".join(sublist)
        title_prepended = "title: " + document_title + text_content
        text_chunks.append(title_prepended)
        text_content_ids.append(ascii(text_content[0:30]))


    current_dataframe = pd.DataFrame({'text_chunk': text_chunks, 'id':text_content_ids}).head()

    # current_dataframe = pd.DataFrame(c,columns=['text_chunk'])
    title_dataframe = pd.DataFrame({'title': [document_title] * len(current_dataframe), 'url': [url_to_process] * len(current_dataframe)})
    combined_dataframe = pd.concat([current_dataframe, title_dataframe], axis=1)
    print("combined dataframe:")
    print(combined_dataframe)
    # len(encoding.encode("tiktoken is great!"))
    dataframe_to_add_to = pd.concat([dataframe_to_add_to, combined_dataframe], ignore_index=True)
    return dataframe_to_add_to





first_url = "https://www.ssa.gov/disability/professionals/bluebook/2.00-SpecialSensesandSpeech-Adult.htm"
second_url = "https://www.ssa.gov/disability/professionals/bluebook/110.00-MultipleBody-Childhood.htm"
df_result = pd.DataFrame(columns=['text_chunk'])


# single_page_df = process_url(first_url,df_result)
page_errors = []

for i in range(len(site_urls_to_crawl)):
    try:
        df_result = process_url(site_urls_to_crawl[i],df_result)
    except IndexError:
        print("error on page:" + site_urls_to_crawl[i])
        page_errors.append(site_urls_to_crawl[i])

df_result['tokenized'] = df_result.apply(lambda row: encoding.encode(row.text_chunk), axis = 1)
df_result['num_tokens'] = df_result.apply(lambda row: len(row.tokenized), axis = 1)

print("df_result:")
display(df_result)
# for i in empty_df:
#     print("encoding of this string:")
#     print(encoding.encode(i))


print("max embedding length: " + str(df_result["num_tokens"].max()))
print("number of chunks in the document:" + str(len(df_result)))

length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/general-info.htm
length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/listing-impairments.htm
length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/AdultListings.htm
length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/evidentiary.htm
length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/ChildhoodListings.htm
length of results object: 1
I think we want results[1]
error on page:https://www.ssa.gov/disability/professionals/bluebook/index.htm
length of results object: 2
I think we want results[1]
5906
document title: 7.00 Hematological Disorders
combined dataframe:
                              

Unnamed: 0,text_chunk,id,title,url,tokenized,num_tokens
0,title: 7.00 Hematological Disorders\n7.00 Hema...,'\n7.00 Hematological Disorders\n',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",276
1,title: 7.00 Hematological Disorderstest that e...,'test that establishes a hemato',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",121
2,title: 7.00 Hematological Disorders explain h...,' explain how your diagnosis w',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",144
3,title: 7.00 Hematological DisordersHemolytic d...,'Hemolytic disorders include\r\n ',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",164
4,title: 7.00 Hematological Disordersintravascul...,'intravascular patches).\r\n ',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",179
...,...,...,...,...,...,...
135,title: \n \n107.00 Hematological Disorders\n\...,'\n \n107.00 Hematological Disord',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 2355, 720, 7699, 13, 410, 33924, 26...",292
136,title: that states you have the disorder; or\...,'that states you have the disor',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 430, 5415, 499, 617, 279, 1982...",279
137,title: of congenital hemolytic anemias includ...,'of congenital hemolytic anemia',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 315, 83066, 2223, 17728, 5849,...",294
138,title: complications of your hemolytic anemia...,'complications of your hemolyti',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 36505, 315, 701, 17728, 5849, ...",287


max embedding length: 319
number of chunks in the document:140


In [150]:
#print the pages that we ran into errors(probably because the html tags were not formatted as expected)

page_errors

['https://www.ssa.gov/disability/professionals/bluebook/general-info.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/listing-impairments.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/AdultListings.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/evidentiary.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/ChildhoodListings.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/index.htm',
 'https://www.ssa.gov/disability/professionals/bluebook/9.00-Endocrine-Adult.htm']

## 2.0 Convert text into embeddings; create the embedding using OpenAI's [embedding api](https://openai.com/blog/new-and-improved-embedding-model), upload to pinecone database

#### 2.1: Convert the text into embeddings

1. Reference documentation: https://platform.openai.com/docs/api-reference/embeddings/create, https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

In [14]:
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")


def get_openai_embedding_for_text(text_input):
  response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=text_input
  )
  embeddings = response['data'][0]['embedding']
  return embeddings

In [152]:
df_result['embedding'] = df_result.apply(lambda row: get_openai_embedding_for_text(row.text_chunk), axis = 1)

df_result.to_csv('second_test_df.csv')

In [153]:
display(df_result)

Unnamed: 0,text_chunk,id,title,url,tokenized,num_tokens,embedding
0,title: 7.00 Hematological Disorders\n7.00 Hema...,'\n7.00 Hematological Disorders\n',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",276,"[-0.006755029316991568, 0.0069112153723835945,..."
1,title: 7.00 Hematological Disorderstest that e...,'test that establishes a hemato',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",121,"[-0.016914399340748787, 0.021009182557463646, ..."
2,title: 7.00 Hematological Disorders explain h...,' explain how your diagnosis w',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",144,"[-4.470459316507913e-05, 0.027205660939216614,..."
3,title: 7.00 Hematological DisordersHemolytic d...,'Hemolytic disorders include\r\n ',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",164,"[-0.013731321319937706, 0.016009090468287468, ..."
4,title: 7.00 Hematological Disordersintravascul...,'intravascular patches).\r\n ',7.00 Hematological Disorders,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 22, 13, 410, 33924, 266, 5848,...",179,"[-0.014100857079029083, 0.00984108354896307, 0..."
...,...,...,...,...,...,...,...
135,title: \n \n107.00 Hematological Disorders\n\...,'\n \n107.00 Hematological Disord',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 2355, 720, 7699, 13, 410, 33924, 26...",292,"[-0.0060523455031216145, 0.010665039531886578,..."
136,title: that states you have the disorder; or\...,'that states you have the disor',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 430, 5415, 499, 617, 279, 1982...",279,"[-0.016128357499837875, 0.036826860159635544, ..."
137,title: of congenital hemolytic anemias includ...,'of congenital hemolytic anemia',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 315, 83066, 2223, 17728, 5849,...",294,"[-0.011032802984118462, 0.014102972112596035, ..."
138,title: complications of your hemolytic anemia...,'complications of your hemolyti',,https://www.ssa.gov/disability/professionals/b...,"[2150, 25, 220, 36505, 315, 701, 17728, 5849, ...",287,"[-0.013251912780106068, -0.012627680785953999,..."


#### 2.2: Create Pinecone database and upload embedded chunks

In [8]:
api_key = os.getenv("PINECONE_API_KEY") or "your-api-key"

# find environment next to your API key in the Pinecone console
env = os.getenv("PINECONE_ENVIRONMENT") or "us-central1-gcp"

pinecone.init(api_key=api_key, environment=env)
pinecone.whoami()

WhoAmIResponse(username='d79c7b6', user_label='default', projectname='cb3649f')

In [9]:
# # Check whether the index with the same name already exists - if so, delete it
# # if index_name in pinecone.list_indexes():


index_name = "social-security"

# pinecone.delete_index(index_name)
# # Creates new index
# pinecone.create_index(name=index_name, dimension=1536)
index = pinecone.Index(index_name=index_name)

# Confirm our index was created
pinecone.list_indexes()

['social-security']

In [156]:

from tqdm.auto import tqdm

# Add the text embeddings to Pinecone

batch_size = 1  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(df_result), batch_size)):
    # find end of batch
    i_end = min(len(df_result), i+batch_size)
    meta_batch = df_result[i:i_end]
    # get ids
    ids_batch = [x['id'] for _, x in meta_batch.iterrows()]
    # get texts to encode
    texts = [x['text_chunk'] for _, x in meta_batch.iterrows()]
    # add embeddings
    embeds = [x['embedding'] for _, x in meta_batch.iterrows()]
    # cleanup metadata
    meta_batch = [{
        'title': x['title'],
        'text_chunk': x['text_chunk'],
        'url': x['url']
    } for _, x in meta_batch.iterrows()]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)



100%|██████████| 140/140 [00:12<00:00, 10.92it/s]


#### 2.3: Test the created embeddings

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

In [10]:


# Configuring the embeddings to be used by our retriever to be OpenAI Embeddings, matching our embedded corpus
embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))


# Loads a docsearch object from an existing Pinecone index so we can retrieve from it
docsearch = Pinecone.from_existing_index(index_name,embeddings,text_key='text_chunk')

retriever = docsearch.as_retriever()



In [114]:
#we can try retrieving relevant documents from the vector database. This works by vectorizing the query, then running K-nearest neighbor search on the vector database's vector space to find the entries in the database that are semantically similar to the query.

query_docs = retriever.get_relevant_documents("I have a child with down syndrome. What do I need to know to determine if he qualifies for social security?")

In [12]:
query_docs

[Document(page_content='title: 10.00 Congenital Disorders that Affect Multiple Body Systems             We evaluate non-mosaic Down syndrome under 10.06.\r\n    If you have non-mosaic Down syndrome documented as described in 10.00C, we consider you disabled from birth.\r\n\nC. What evidence do we need to document non-mosaic Down syndrome under 10.06?\n1. Under 10.06A, we will find you disabled based on laboratory findings.\na. To find that your disorder meets 10.06A,\r\n    we need a copy of the laboratory report of karyotype analysis, which is the definitive test to establish non-mosaic Down syndrome. We will not purchase karyotype analysis.\r\n    We will not accept a fluorescence in situ hybridization (FISH) test because it does not distinguish between the mosaic and non-mosaic forms of Down syndrome.\r\n\nb. If a physician (see §§404.1513(a)(1) \r\n                and 416.913(a)(1) of this chapter) \r\n                has not signed the laboratory report of karyotype analysis, the 

## 3: Create the Langchain agent

#### 3.1: Create agent, its prompt and its set of tools

In [25]:


from langchain.chains import RetrievalQA

retrieval_llm = OpenAI(temperature=0,openai_api_key=os.getenv("OPENAI_API_KEY"))

social_security_retriever = RetrievalQA.from_chain_type(llm=retrieval_llm, chain_type="stuff", retriever=docsearch.as_retriever())



In [106]:
# Set up the prompt with input variables for tools, user input and a scratchpad for the model to record its workings
template = """Answer the following questions as best you can, speaking as if you are a nurse with extensive medical knowledge giving advice to a patient. You have access to the following tools:

{tools}

Use the following format and never deviate from this structure. The structure before each colon is also imperative to include:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question. This answer should consolidate the chain of thought from earlier and explain the answer in a clear way that could be understood by someone at the reading comprehension level of 7th grade. The answer should be phrased as if speaking directly to the patient. If you don't know the answer to the question, return the final answer as "Final Answer: I don't know the answer to this question"

Begin!

Question: {input}
{agent_scratchpad}"""


In [107]:
#right now we only have a single tool which is the vector database

tools = [
    Tool(
        name = 'Knowledge Base',
        func=social_security_retriever.run,
        description="Useful for getting answers for social security-related questions, as well as information on diseases that qualify for social security. Input should be a fully formed question."
    )
]

In [108]:
import datetime
import json
import openai
import os
import pandas as pd
import pinecone
import re
from tqdm.auto import tqdm
from typing import List, Union
import zipfile

# Langchain imports
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import BaseChatPromptTemplate, ChatPromptTemplate
from langchain import SerpAPIWrapper, LLMChain
from langchain.schema import AgentAction, AgentFinish, HumanMessage, SystemMessage
# LLM wrapper
from langchain.chat_models import ChatOpenAI
from langchain import OpenAI
# Conversational memory
from langchain.memory import ConversationBufferWindowMemory
# Embeddings and vectorstore
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Vectorstore Index
index_name = 'podcasts'

In [109]:
class CustomPromptTemplate(BaseChatPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]

    def format_messages(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)

        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "

        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts

        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])

        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        formatted = self.template.format(**kwargs)
        return [HumanMessage(content=formatted)]

prompt = CustomPromptTemplate(
    template=template,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps"]
)

In [110]:


class CustomOutputParser(AgentOutputParser):

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:

        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )

        # Parse out the action and action input
        regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)

        # If it can't parse the output it raises an error
        # You can add your own logic here to handle errors in a different way i.e. pass to a human, give a canned response
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)

        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()



In [111]:
# Initiate our LLM - default is 'gpt-3.5-turbo'
llm = ChatOpenAI(temperature=0,openai_api_key=os.getenv("OPENAI_API_KEY"))

# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

# Using tools, the LLM chain and output_parser to make an agent
tool_names = [tool.name for tool in tools]

agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    # We use "Observation" as our stop sequence so it will stop when it receives Tool output
    # If you change your prompt template you'll need to adjust this as well
    stop=["\nObservation:"],
    allowed_tools=tool_names
)

In [112]:
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

#### 4: Now we have our agent and we can test it!

I found some questions on the [social security subreddit](https://www.reddit.com/r/SocialSecurity/) which I thought would be a good source for testing since it has real questions from people along with their answers. In the future I could scrape this as well and use it to ground the model.

In [115]:
agent_executor.run("I have a child with down syndrome. What do I need to know to determine if he qualifies for social security?")



[1m> Entering new  chain...[0m
[32;1m[1;3mThought: To determine if a child with Down syndrome qualifies for social security, I need to gather information about the eligibility criteria and requirements.
Action: Knowledge Base
Action Input: "Eligibility criteria for social security benefits for children with Down syndrome"[0m

Observation:[36;1m[1;3m To be eligible for social security benefits for children with Down syndrome, we need a copy of the laboratory report of karyotype analysis, which is the definitive test to establish non-mosaic Down syndrome. We may also need a physician’s report stating that the child has the distinctive facial or other physical features of Down syndrome, and evidence that their functioning is consistent with a diagnosis of non-mosaic Down syndrome.[0m
[32;1m[1;3mI now know the eligibility criteria for social security benefits for children with Down syndrome.
Final Answer: To determine if your child qualifies for social security benefits, we wil

"To determine if your child qualifies for social security benefits, we will need a copy of the laboratory report of karyotype analysis, a physician's report confirming the physical features of Down syndrome, and evidence of their functioning consistent with a diagnosis of non-mosaic Down syndrome."

In [93]:
agent_executor.run("What is SSDI vs SSI?")



[1m> Entering new  chain...[0m
[32;1m[1;3mThought: I need to explain the difference between SSDI and SSI to the patient.
Action: Knowledge Base
Action Input: "What is the difference between SSDI and SSI?"[0m

Observation:[36;1m[1;3m SSDI stands for Social Security Disability Insurance and SSI stands for Supplemental Security Income. SSDI is a program that provides benefits to people who have worked and paid into Social Security, while SSI is a program that provides benefits to people who have limited income and resources.[0m
[32;1m[1;3mI now know the final answer.
Final Answer: SSDI stands for Social Security Disability Insurance and SSI stands for Supplemental Security Income. SSDI is a program that provides benefits to people who have worked and paid into Social Security, while SSI is a program that provides benefits to people who have limited income and resources.[0m

[1m> Finished chain.[0m


'SSDI stands for Social Security Disability Insurance and SSI stands for Supplemental Security Income. SSDI is a program that provides benefits to people who have worked and paid into Social Security, while SSI is a program that provides benefits to people who have limited income and resources.'

In [113]:
agent_executor.run("My mom came to the US in 2016. In 2017 she&#039;s got a good job elderly care services. All of a sudden she couldn&#039;t do her job because of her physical condition. We found out that back in her country she was on disability program already. She had two strokes in 1995 and in 2007, three breast surgeries (cancer), osteoporosis etc.She is a green card holder, have an alien number, and SSN. Here is my question: Does my mom qualify for disability?")



[1m> Entering new  chain...[0m
[32;1m[1;3mThought: To determine if the patient's mom qualifies for disability, I need to gather more information about her medical conditions and the requirements for disability eligibility.
Action: Knowledge Base
Action Input: "What medical conditions qualify for social security disability?"[0m

Observation:[36;1m[1;3m Social Security disability is based on the severity of an individual's impairment(s). To qualify for disability benefits, an individual must have a medically determinable physical or mental impairment that results in the inability to do any substantial gainful activity and is expected to last for at least 12 months or result in death.[0m
[32;1m[1;3mBased on the information provided, it seems that your mom may meet the criteria for disability. However, I would need more specific information about her medical conditions and how they affect her ability to work in order to give a definitive answer.
Final Answer: Based on the infor

'Based on the information provided, it is possible that your mom may qualify for disability benefits. However, a thorough evaluation of her medical conditions and their impact on her ability to work would be necessary to determine her eligibility. I recommend contacting the Social Security Administration or a disability attorney for further assistance.'

## 5. In progress: add an online endpoint using streamlit and make it publicly available