# Custom Chatbot Project

I chose to scrape text from the official CDC website's timeline of the COVID-19 pandemic, so that the chatbot can draw on more up-to-date information about the pandemic. Other sources may cover more recent dates, but this was definitely the most reputable one I found in my research. Here is the link: [COVID-19 Timeline](https://www.cdc.gov/museum/timeline/covid19.html)

In [17]:
import openai

# Fill in your own OpenAI API key before running the notebook.
openai.api_key = ""
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
BATCH_SIZE = 100
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

## Data Wrangling


In [2]:
import requests
from bs4 import BeautifulSoup

In [3]:
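def fetch_page(url: str) -> str:
    """Download a page and return its HTML as text."""
    # Send a browser-like user agent with the request.
    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    if r.status_code != 200:
        # Report non-200 responses; the body is still returned for inspection.
        print(r.status_code)
    return r.text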

In [4]:
test_url = "https://www.cdc.gov/museum/timeline/covid19.html"
test_result = fetch_page(test_url)
print(test_result)


<!DOCTYPE html>
<html lang="en-us" class="cdc-2022 theme-cyan cdc-page-type-content" >
<head>
	
<!-- Global / universal meta tags -->
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="theme-color" content="#075290">

<link rel="shortcut icon" href="/TemplatePackage/4.0/assets/imgs/favicon.ico?_=202112">
<link rel="icon" type="image/png" sizes="32x32" href="/TemplatePackage/4.0/assets/imgs/favicon-32x32.png">
<link rel="apple-touch-icon" sizes="60x60" href="/TemplatePackage/4.0/assets/imgs/apple-touch-icon-60x60.png">
<link rel="apple-touch-icon" sizes="180x180" href="/TemplatePackage/4.0/assets/imgs/apple-touch-icon-180x180.png">

<link rel="stylesheet prefetch" href="/TemplatePackage/contrib/libs/bootstrap/latest/css/bootstrap

In [5]:
# Fetch the timeline page and parse it.
home = "https://www.cdc.gov/museum/timeline/covid19.html"
home_page = fetch_page(home)
soup = BeautifulSoup(home_page, 'html.parser')

# The code treats each 'card-body' div as one timeline entry.
divs = soup.find_all('div', class_='card-body')
data = []
for div in divs:
    date_element = div.find('h5', {'class': 'card-title'})
    text_element = div.find('p')
    # Skip cards that lack either a date heading or a paragraph.
    if date_element and text_element:
        current_date = date_element.get_text()
        current_text = text_element.get_text()
        data.append(f"{current_date} -- {current_text}")
data

['December 12, 2019 -- A cluster of patients in China’s Hubei Province, in the city of Wuhan, begin to experience the symptoms of an atypical pneumonia-like illness that does not respond well to standard treatments.',
 'December 31, 2019 -- The World Health Organization (WHO) Country Office in China is informed of several cases of a pneumonia of unknown etiology (cause) with symptoms including shortness of breath and fever occurring in Wuhan, China. All initial cases seem connected to the Huanan Seafood Wholesale Market.',
 'January1, 2020 -- The Huanan Seafood Wholesale Market in Wuhan is closed amid worries in China of a reprise of the 2002–2004 SARS (Severe Acute Respiratory Syndrome Coronavirus or SARS-CoV-1) outbreak.',
 'January 2, 2020 -- WHO activates its Incident Management Support Team (IMST) across all three organizational levels: Country Office, Regional Office, and Headquarters.',
 'January 3, 2020 -- China informs WHO that they have identified over 40 cases of pneumonia o

In [6]:
import pandas as pd

pd.set_option('display.max_colwidth', None)  
pd.set_option('display.max_rows', None)  

df = pd.DataFrame(data, columns=['text'])
df

Unnamed: 0,text
0,"December 12, 2019 -- A cluster of patients in China’s Hubei Province, in the city of Wuhan, begin to experience the symptoms of an atypical pneumonia-like illness that does not respond well to standard treatments."
1,"December 31, 2019 -- The World Health Organization (WHO) Country Office in China is informed of several cases of a pneumonia of unknown etiology (cause) with symptoms including shortness of breath and fever occurring in Wuhan, China. All initial cases seem connected to the Huanan Seafood Wholesale Market."
2,"January1, 2020 -- The Huanan Seafood Wholesale Market in Wuhan is closed amid worries in China of a reprise of the 2002–2004 SARS (Severe Acute Respiratory Syndrome Coronavirus or SARS-CoV-1) outbreak."
3,"January 2, 2020 -- WHO activates its Incident Management Support Team (IMST) across all three organizational levels: Country Office, Regional Office, and Headquarters."
4,"January 3, 2020 -- China informs WHO that they have identified over 40 cases of pneumonia of unknown etiology."
5,"January 5, 2020 -- CDC’s National Center for Immunization and Respiratory Diseases (NCIRD) activates a center-level response to investigate this novel pneumonia of unknown etiology."
6,"January 7, 2020 -- Public health officials in China identify a novel coronavirus as the causative agent of the outbreak."
7,"January 10, 2020 -- WHO begins using the phrase “2019 Novel Coronavirus” or “2019-nCoV” to refer to disease causing the outbreak in Wuhan, China."
8,"January 11, 2020 -- WHO tweets that it has received the genetic sequences of the novel coronavirus from China and expects that the information will shortly become publicly available."
9,"January 13, 2020 -- The Thailand Ministry of Public Health confirms the first laboratory-confirmed case of the SARS-CoV-2 virus outside of China."
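One row in the scraped output above begins with `January1, 2020`, without a space after the month. The following is a minimal sketch of how such date prefixes could be normalized, assuming every entry keeps the `date -- text` shape built in the scraping cell. `split_entry` is a hypothetical helper and the sample entry text is a placeholder; neither is part of the original notebook.

```python
import re
from datetime import date, datetime

# Hypothetical helper: split a "date -- text" entry and parse the date part.
def split_entry(entry: str) -> tuple[date, str]:
    raw_date, _, text = entry.partition(" -- ")
    # Insert the missing space in dates such as "January1, 2020".
    fixed = re.sub(r"([A-Za-z])(\d)", r"\1 \2", raw_date.strip())
    parsed = datetime.strptime(fixed, "%B %d, %Y").date()
    return parsed, text

when, what = split_entry("January1, 2020 -- placeholder text")
print(when.isoformat())  # 2020-01-01
```

Parsing the date into a `date` object would also make it possible to sort or filter rows by date, which the plain string prefix does not allow directly.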


## Generating Embeddings


In [7]:


def create_embeddings(input_data, model_name):
    # Request embeddings for a list of strings (pre-1.0 `openai` package interface).
    return openai.Embedding.create(
        input=input_data,
        engine=model_name
    )

def extract_embeddings(response):
    # Pull the embedding vector out of each item in the response.
    return [data["embedding"] for data in response["data"]]

def generate_embeddings(df, batch_size):
    # Embed the "text" column in batches of `batch_size` rows.
    embeddings = []
    for i in range(0, len(df), batch_size):
        response = create_embeddings(df.iloc[i:i+batch_size]["text"].tolist(), EMBEDDING_MODEL_NAME)
        embeddings.extend(extract_embeddings(response))
    return embeddings

# Add embeddings list to dataframe
df["embeddings"] = generate_embeddings(df, BATCH_SIZE)
df

Unnamed: 0,text,embeddings
0,"December 12, 2019 -- A cluster of patients in China’s Hubei Province, in the city of Wuhan, begin to experience the symptoms of an atypical pneumonia-like illness that does not respond well to standard treatments.","[0.0012409056071192026, -0.006781655363738537, 0.017395567148923874, -0.02466769888997078, -0.011836833320558071, 0.020940078422427177, -0.01773563213646412, -0.005042098928242922, -0.01155562698841095, -0.011509849689900875, -0.005784353241324425, 0.02611950971186161, -0.0008289053221233189, 0.024824652820825577, 0.021934110671281815, -0.0038387964013963938, 0.0381917729973793, -0.01700318604707718, 0.01079048402607441, -0.012909342534840107, -0.012268452905118465, 0.007873782888054848, -0.016009153798222542, -0.011706040240824223, -0.019605981186032295, -0.0008015204221010208, 0.025060079991817474, -0.0154336616396904, 0.0017804298549890518, 0.006977845914661884, -0.01108476985245943, -0.0062682898715138435, -0.008370799012482166, 0.00045655190479010344, -0.007245973218232393, -0.011065150611102581, 0.007154417689889669, -0.027937542647123337, 0.014348073862493038, -0.013419438153505325, 0.03688383474946022, 0.014753534458577633, 0.015080518089234829, 0.006958227138966322, -0.027126621454954147, -0.006866671610623598, 0.023320524021983147, -0.005751654971390963, -0.0007520639919675887, 0.0026338589377701283, 0.005166352726519108, -0.003992478828877211, -0.023634428158402443, -0.0073767672292888165, -0.002857843181118369, -0.016506170853972435, 0.0022888905368745327, -0.0016161202220246196, -0.015551376156508923, -0.011647182516753674, 0.002620779676362872, -0.017487123608589172, -0.026394177228212357, 0.00040035147685557604, -0.00994032435119152, 0.012072262354195118, -0.01753944158554077, 0.028225289657711983, 0.017277853563427925, 0.013282104395329952, 0.03241068869829178, 0.009711435064673424, -0.01155562698841095, 0.002579906489700079, 0.03662224858999252, -0.0007667783065699041, -0.004911304917186499, -0.004368511028587818, 
-0.008704323321580887, -0.006137496326118708, 0.015172073617577553, -0.022522682324051857, -0.007638354320079088, 0.018128013238310814, 0.024039888754487038, -0.00280389073304832, -0.01232730969786644, 0.02137169800698757, -0.006088448688387871, -0.011182865127921104, 0.018180329352617264, 0.011280960403382778, 0.003181557869538665, 0.014112644828855991, 0.006964766886085272, 0.014988962560892105, 0.009037847630679607, 0.012889723293483257, -0.008442736230790615, -0.05514264106750488, ...]"
1,"December 31, 2019 -- The World Health Organization (WHO) Country Office in China is informed of several cases of a pneumonia of unknown etiology (cause) with symptoms including shortness of breath and fever occurring in Wuhan, China. All initial cases seem connected to the Huanan Seafood Wholesale Market.","[0.019597971811890602, -0.013627123087644577, 0.006134165450930595, -0.01618792489171028, -0.020995961502194405, 0.0003403106238692999, -0.038020066916942596, -0.006963812746107578, -0.015521594323217869, -0.006663310341536999, -0.0077085355296730995, 0.017076365649700165, -0.00690501881763339, 0.017167823389172554, 0.012836671434342861, 0.007630143780261278, 0.02520299144089222, -0.02019897662103176, 0.005056276917457581, -0.01571757346391678, -0.013849233277142048, 0.013757776468992233, -0.016801994293928146, 0.004732910078018904, -0.007969842292368412, 0.007212053518742323, 0.028142688795924187, -0.01852661557495594, -0.006859290413558483, -0.01819998398423195, 0.016945714130997658, -0.0017883148975670338, -0.02313867211341858, 0.0034231124445796013, 0.0024726109113544226, 0.011961295269429684, 0.009694463573396206, -0.01381003763526678, 0.003357785986736417, -0.023543696850538254, 0.021858271211385727, 0.039195943623781204, 0.008551248349249363, 0.021884402260184288, -0.017612043768167496, -0.0058107986114919186, 0.022407013922929764, 0.0029560273978859186, -0.007506023161113262, 0.014437172561883926, 0.009910041466355324, 0.00020945332653354853, -0.022576862946152687, -0.0033332884777337313, 0.0026849221903830767, -0.014228127896785736, -0.01441104244440794, 0.007937178947031498, 0.005026879720389843, -0.01927133835852146, -0.0007912681321613491, -0.015965813770890236, -0.032219067215919495, 0.0007859603501856327, -0.01625325158238411, -0.0030785147100687027, -0.015351744368672371, 0.03352559730410576, -0.0014118705876171589, 0.021766813471913338, 0.034048210829496384, 0.02253766730427742, 0.00419723242521286, 0.019023098051548004, 0.040032126009464264, 
-0.007767329458147287, 0.0003027478524018079, 0.017585912719368935, -0.013117576017975807, -0.0036909515038132668, 0.010164815001189709, -0.0333949439227581, -0.005369844380766153, 0.022106513381004333, 0.01463315263390541, -0.008224615827202797, -0.011249235831201077, 0.015417071059346199, -0.010243207216262817, -0.01595274917781353, 0.02103515714406967, 0.010478381998836994, -0.0036582881584763527, 0.0044324081391096115, -0.0012950992677360773, 0.0014306519879028201, -0.004523865412920713, 0.035903483629226685, 0.008322605863213539, -0.05827130377292633, ...]"
2,"January1, 2020 -- The Huanan Seafood Wholesale Market in Wuhan is closed amid worries in China of a reprise of the 2002–2004 SARS (Severe Acute Respiratory Syndrome Coronavirus or SARS-CoV-1) outbreak.","[0.006242129951715469, -0.03615302965044975, -0.0016255199443548918, 0.01914297603070736, -0.03212713822722435, -0.0132041210308671, -0.026328256353735924, 0.013244112953543663, -0.0127842016518116, -0.02762134000658989, 0.0089982645586133, 0.025075165554881096, -0.007785165682435036, 0.0008856625645421445, 0.00045449569006450474, -0.006815352477133274, 0.023942049592733383, -0.02432864159345627, 0.0033793484326452017, -0.02415534295141697, -0.022315697744488716, 0.0035693116951733828, -0.01746330037713051, -0.008498361334204674, -0.011877709999680519, 0.014730493538081646, 0.031433939933776855, -0.014063955284655094, -0.017010053619742393, -0.01517040841281414, 0.00034535006852820516, -0.0003145226801279932, -0.025688380002975464, 0.01944958232343197, -0.0009056587005034089, 0.002037940314039588, 0.010231360793113708, -0.00955149158835411, 0.0176632609218359, -0.029381001368165016, 0.0138773238286376, 0.026314925402402878, 0.007825157605111599, 0.002232902916148305, -0.00989142619073391, -0.019462913274765015, 0.02815457060933113, 0.0035993061028420925, -0.02812791056931019, 0.019796183332800865, 0.03234042972326279, -0.00186297413893044, -0.009984741918742657, -0.0055822571739554405, 0.0005657242727465928, -0.002356212353333831, 0.008051780983805656, 0.0028494505677372217, 0.007885145954787731, -0.022249042987823486, 0.010091387666761875, -0.0033226925879716873, -0.040792133659124374, 0.011044536717236042, 0.010084722191095352, -0.012297628447413445, -0.014370562508702278, -0.0033210262190550566, -0.008085107430815697, 0.026328256353735924, 0.02486187219619751, 0.011797725223004818, 0.0060988240875303745, -0.010471314191818237, 0.027154764160513878, -0.0017213347600772977, 0.0006023838650435209, -0.007951799780130386, -0.00028827774804085493, 
-0.03268703073263168, 0.017543284222483635, -0.0380193367600441, -0.006592062301933765, 0.013957308605313301, 0.0304474625736475, -0.005255653522908688, -0.013943977653980255, 0.014410554431378841, -0.0018279808573424816, -0.0046824305318295956, 0.03135395422577858, 0.02718142606317997, -0.0025545074604451656, 0.0010131379822269082, -0.018223153427243233, 0.01651681587100029, -0.015197069384157658, 0.029887570068240166, -0.0021179248578846455, -0.028714463114738464, ...]"
3,"January 2, 2020 -- WHO activates its Incident Management Support Team (IMST) across all three organizational levels: Country Office, Regional Office, and Headquarters.","[-0.01466497965157032, -0.02162177674472332, -0.024225717410445213, -0.019328754395246506, -0.019950591027736664, 0.036895640194416046, -0.02002832107245922, -0.006040235981345177, 0.017204146832227707, 0.010992257855832577, 0.00569692999124527, -0.01798144169151783, -0.011724211275577545, -0.01140033733099699, 0.006328483112156391, -0.0068725901655852795, 0.018914196640253067, -0.00725476024672389, -0.01290958747267723, 0.006036996841430664, -0.022308388724923134, -0.012514461763203144, 0.017838938161730766, 0.01625843718647957, 0.0053568631410598755, 0.0031431897077709436, 0.0005242697661742568, -0.02611713670194149, 0.025870993733406067, -0.000650175497867167, 0.004100235179066658, -0.02054651826620102, -0.001566736726090312, -0.006283140741288662, -0.010616564191877842, -0.004984409082680941, 0.027594000101089478, -0.03285370022058487, -0.012669920921325684, -0.0273089911788702, 0.027801278978586197, 0.03039226494729519, 0.0136544955894351, -0.00524350767955184, -0.02495119348168373, 0.022334298118948936, 0.012682875618338585, -0.0067495182156562805, -0.023707520216703415, 0.028397204354405403, 0.0034168625716120005, 0.029407689347863197, -0.013576765544712543, -0.0015343494014814496, -0.02173837088048458, -0.0009821455460041761, -0.022671125829219818, -0.002037162659689784, 0.002202338073402643, -0.01970444805920124, -0.026039408519864082, -0.03124728985130787, -0.02751627005636692, 0.03298325091600418, 0.011678868904709816, -0.00730657996609807, -0.018590323626995087, 0.016232525929808617, -0.003669483819976449, 0.02663533389568329, 0.031169559806585312, 0.012391390278935432, -0.004589283838868141, 0.002156995702534914, 0.03298325091600418, -0.0099040437489748, -0.0008704093052074313, 0.0017246250063180923, 0.02793082781136036, -0.013155730441212654, 0.005742272362112999, 
-0.02536575123667717, -0.02568962424993515, 0.021660642698407173, 0.01449656579643488, 0.02171246148645878, 0.027049891650676727, -0.0019545750692486763, 0.0029682982712984085, 0.006286379415541887, 0.022554531693458557, 0.03161002695560455, 0.01450952049344778, 0.003533456940203905, -0.021557003259658813, 0.010623042471706867, -0.015468185767531395, 0.033760543912649155, 0.01647867076098919, -0.0395643524825573, ...]"
4,"January 3, 2020 -- China informs WHO that they have identified over 40 cases of pneumonia of unknown etiology.","[0.026982763782143593, -0.012238883413374424, 0.003897368907928467, -0.0233206357806921, 0.002497049979865551, 0.010077210143208504, -0.03868122771382332, -0.020396018400788307, -0.016085390001535416, 0.017471402883529663, 0.0052484143525362015, 0.01866668090224266, -0.002611491596326232, 0.017255235463380814, 0.010121715255081654, -0.00228247232735157, 0.01927703619003296, -0.017026353627443314, 0.005458224099129438, 0.0003210720024071634, -0.013402371667325497, -0.002272935351356864, -0.03156042471528053, -0.007247962057590485, -0.020294293761253357, 0.00924433022737503, 0.022354241460561752, -0.01837421953678131, 0.010337882675230503, -0.012575849890708923, -0.004380566533654928, -0.010019989684224129, -0.016428714618086815, 0.006516807712614536, -0.006510450039058924, -0.006341966800391674, 0.013593107461929321, -0.01887013204395771, 0.0061671254225075245, -0.012353324331343174, 0.022290661931037903, 0.03323889896273613, 0.012029074132442474, 0.014839248731732368, -0.020891932770609856, -0.004399640019983053, 0.012722080573439598, -0.012823806144297123, -0.021120814606547356, 0.0228883009403944, 0.011355140246450901, -0.004336061421781778, -0.03306087851524353, -0.004653954412788153, -0.0035540445242077112, -0.0232443418353796, -0.017204372212290764, 0.01907358318567276, -0.003369666635990143, -0.008430523797869682, -0.012327893637120724, -0.035883769392967224, -0.029424183070659637, 0.001560854958370328, -0.020357871428132057, 0.007419624365866184, -0.014877395704388618, 0.025406014174222946, 0.025164416059851646, 0.01953135058283806, 0.026804743334650993, 0.0008845374686643481, 0.016835616901516914, -0.0036494124215096235, 0.04168213903903961, -0.012022715993225574, -0.002393734874203801, -0.0040277051739394665, -0.011075394228100777, 0.013478666543960571, 0.00972752831876278, -0.024757511913776398, -0.0121307997033, 0.01852680742740631, 
0.01200364250689745, -0.0018390114419162273, -0.0019041794585064054, 0.035629455000162125, 0.001440850319340825, -0.005302456207573414, 0.022227084264159203, 0.006399187259376049, 0.0051880148239433765, -0.003280656412243843, 0.0015266814734786749, 0.011088110506534576, 0.00012050133955199271, 0.03662128001451492, 0.006151231005787849, -0.044479597359895706, ...]"
5,"January 5, 2020 -- CDC’s National Center for Immunization and Respiratory Diseases (NCIRD) activates a center-level response to investigate this novel pneumonia of unknown etiology.","[0.0037111290730535984, -0.011207289062440395, -0.013520719483494759, -0.028275270015001297, 0.030922863632440567, 0.02404683083295822, -0.02422676607966423, -0.019175773486495018, -0.010275489650666714, 0.024907942861318588, 0.006548295263200998, 0.015602807514369488, 0.0024740861263126135, 0.01289095263928175, 0.008906709961593151, -0.010281916707754135, 0.02114219032227993, -0.013919143937528133, 0.021373532712459564, -0.007004555314779282, -0.019342854619026184, -0.005802856292575598, -0.01998547464609146, -0.021707694977521896, -0.006326591596007347, 0.004610796924680471, 0.013302229344844818, -0.029611919075250626, 0.009221593849360943, -0.008758907206356525, -6.722404941683635e-05, -0.014446092769503593, -0.004366601351648569, 0.00413525803014636, -0.017530666664242744, -0.008906709961593151, 0.0015342546394094825, -0.02052527479827404, 0.02110363356769085, -0.010468276217579842, 0.03089715912938118, 0.004234864376485348, -0.01301947608590126, 0.006888884119689465, 0.00015543366316705942, -0.00940795335918665, 0.02168199047446251, -0.014009110629558563, -0.015962675213813782, 0.01835322007536888, 0.018546005710959435, -0.0057096765376627445, -0.017414996400475502, -0.010031295008957386, -0.011740663088858128, -0.014420387335121632, -0.017993353307247162, 0.02396971732378006, -0.01107876468449831, -0.037657517939805984, -0.013597833923995495, -0.04292700067162514, -0.02647593431174755, 0.01715794764459133, 0.017890533432364464, 0.017067981883883476, -0.01273029763251543, 0.024650894105434418, -0.001802548416890204, 0.016553884372115135, 0.023802636191248894, 0.02657875418663025, 0.014446092769503593, 0.009665001183748245, 0.035883888602256775, 0.0011020929086953402, -0.00595708517357707, 0.002387332497164607, -0.0006008495111018419, 0.004491912201046944, 
0.0074865203350782394, -0.0022716608364135027, -0.014433239586651325, 0.030948568135499954, 0.023455621674656868, -0.013469310477375984, 0.009440083988010883, 0.0328507237136364, -0.010089130140841007, -0.007190915290266275, 0.027427012100815773, 0.034393008798360825, 0.008244811557233334, 0.01316085271537304, -0.01089883130043745, 0.018661677837371826, 0.01602693647146225, 0.027401307597756386, 0.0013944848906248808, -0.03380180150270462, ...]"
6,"January 7, 2020 -- Public health officials in China identify a novel coronavirus as the causative agent of the outbreak.","[0.007948236539959908, -0.016862206161022186, -0.0001463291991967708, -0.017713574692606926, -0.010991565883159637, -0.0017726282821968198, -0.026862625032663345, -0.0068109589628875256, -0.018488703295588493, 0.005330591928213835, 0.015248415060341358, 0.025375904515385628, -0.005584732163697481, 0.009250705130398273, -0.0005916702793911099, -0.005095512140542269, 0.01911134645342827, -0.037866897881031036, 0.027116764336824417, 0.0022809088695794344, -0.021881476044654846, -0.011366423219442368, -0.009981358423829079, 0.007459016516804695, -0.014981567859649658, 0.008774192072451115, 0.012097075581550598, -0.03822269290685654, 0.019467143341898918, -0.015388192608952522, -0.01911134645342827, -0.008710657246410847, -0.010908970609307289, -0.0011483962880447507, 0.0028146032709628344, -0.002164957346394658, 0.0008331034914590418, -0.017472142353653908, 0.0018218678887933493, -0.025045521557331085, 0.028209568932652473, 0.02709135040640831, -0.001899698399938643, -0.007033331319689751, -0.025693578645586967, -0.004939851351082325, 0.02256765402853489, -0.011334654875099659, -0.020877622067928314, 0.005346475634723902, 0.03954422473907471, 0.005063744727522135, -0.012039894238114357, -0.018399754539132118, -0.014409752562642097, -0.015972714871168137, 0.006715656258165836, -0.005314708221703768, -0.006931675598025322, -0.02859077788889408, -0.002744714729487896, -0.026252688840031624, -0.02620185911655426, 0.008748778142035007, -0.012255913577973843, 0.006785544566810131, -0.0192257110029459, 0.035757534205913544, 0.003083038842305541, 0.03479180112481117, 0.02211020141839981, 0.020318513736128807, 0.018755551427602768, 0.009841580875217915, 0.03250453993678093, 0.0042314352467656136, -0.019022397696971893, -0.018844500184059143, -0.0037866898346692324, 0.008691596798598766, 0.020483704283833504, -0.006064421962946653, -0.001277848961763084, 
0.01847599633038044, 0.02061077393591404, -0.015400899574160576, 0.004491928964853287, 0.024054374545812607, -0.000568638788536191, -0.016989275813102722, 0.013367777690291405, 0.025426732376217842, 0.005622853059321642, 0.009352360852062702, -0.01715446636080742, 0.027930013835430145, 0.00991782359778881, 0.024397464469075203, -0.0008402512175962329, -0.0348934568464756, ...]"
7,"January 10, 2020 -- WHO begins using the phrase “2019 Novel Coronavirus” or “2019-nCoV” to refer to disease causing the outbreak in Wuhan, China.","[0.004341480787843466, -0.024509891867637634, 0.00988628901541233, -0.003929815720766783, -0.003013069974258542, 0.0031033195555210114, -0.019455913454294205, 0.013755938969552517, -0.015541930682957172, -0.0018303252290934324, 0.0197345782071352, 0.01191294752061367, -0.005367476027458906, 0.01766992174088955, 0.008378962986171246, 0.004863978363573551, 0.01977257803082466, -0.01285660918802023, -0.00032339440076611936, -0.012083946727216244, -0.0302731990814209, 0.008967960253357887, -0.010253621265292168, -0.00531997624784708, -0.019392579793930054, 0.009664623998105526, 0.028955871239304543, -0.029158536344766617, -0.0038474828470498323, -0.002492155646905303, -0.025510553270578384, -0.0005379351205192506, -0.005114144179970026, -0.0036036507226526737, -0.006998302415013313, 6.486690108431503e-05, 0.0006891427910886705, -0.017340589314699173, 0.009005960077047348, -0.028069209307432175, 0.01663125865161419, 0.022989897057414055, 0.00025155095499940217, -0.0055416421964764595, -0.03642917051911354, -0.0066309706307947636, 0.010373953729867935, -0.015073266811668873, -0.0171379242092371, 0.026523882523179054, 0.020899906754493713, 0.019975244998931885, -0.012178946286439896, -0.022647898644208908, -0.03541584312915802, 0.0018699084175750613, -0.019037915393710136, 0.009246625937521458, 0.00682730320841074, -0.01394593808799982, -0.00834729615598917, -0.02027924358844757, -0.015934595838189125, 0.008942627348005772, -0.014528602361679077, -0.012102946639060974, -0.025662552565336227, 0.02738521248102188, -0.004116648342460394, 0.016783257946372032, 0.019987910985946655, 0.01977257803082466, 0.015060599893331528, 0.005231310147792101, 0.014617268927395344, 0.014845267869532108, -0.014262603595852852, -0.0010758702410385013, 0.013223941437900066, 0.0024858221877366304, 0.004775312263518572, -0.025067221373319626, 
-0.02282523177564144, 0.015845930203795433, 0.030653197318315506, -0.00041958148358389735, -0.0015880762366577983, 0.019861245527863503, -0.017061924561858177, -0.02277456596493721, 0.02022857777774334, 0.030729196965694427, 0.0016529926797375083, 0.00830929633229971, -0.002506405580788851, 0.007523966487497091, -0.014249936677515507, 0.021545903757214546, -0.008119297213852406, -0.04055848717689514, ...]"
8,"January 11, 2020 -- WHO tweets that it has received the genetic sequences of the novel coronavirus from China and expects that the information will shortly become publicly available.","[-0.021071095019578934, -0.0220038965344429, -0.015742624178528786, -0.016164302825927734, 0.005759730469435453, 0.006721283309161663, -0.023831164464354515, 0.0036737050395458937, -0.02972187101840973, 0.011985862627625465, 0.020048845559358597, 0.014899269677698612, -0.013084779493510723, -0.0027472926303744316, 0.016023743897676468, -0.005849177483469248, 0.014132583513855934, -0.034679777920246124, 0.00943663064390421, -0.0009831154020503163, -0.006855453364551067, -0.007622139528393745, -0.017953237518668175, -0.012688658200204372, -0.020444966852664948, 0.009845529682934284, 0.011883636936545372, -0.035497575998306274, -0.0034373102243989706, -0.021071095019578934, -0.014375368133187294, -0.0009887058986350894, -0.02521120011806488, -0.00831854622811079, 0.003574674716219306, -0.0013249297626316547, 0.01461815182119608, -0.02660401351749897, 0.009449408389627934, -0.02481507882475853, 0.016470976173877716, 0.024866191670298576, -0.0021195681765675545, 0.011097784154117107, -0.013710906729102135, -0.0006313182529993355, 0.026552900671958923, -0.02054719254374504, -0.012790882959961891, 0.0009272112511098385, 0.007398522458970547, 0.024316733703017235, -0.027984049171209335, -0.02421450801193714, -0.011021114885807037, -0.006881009321659803, -0.004207191057503223, -0.004609701223671436, -0.010279985144734383, -0.0010070743737742305, 0.01461815182119608, -0.018285468220710754, -0.035012006759643555, 0.01869436725974083, -0.01942271925508976, -0.0013201379915699363, -0.019652724266052246, 0.02348615601658821, 0.013966468162834644, 0.037925414741039276, 0.02923630364239216, -0.0046544247306883335, 0.011736689135432243, 0.016036521643400192, 0.03276306018233299, 0.01696932315826416, -0.002378324745222926, 0.002528467448428273, -0.0005977757391519845, -0.004782205913215876, 
0.013915356248617172, -0.013966468162834644, -0.0004679980920627713, 0.015269835479557514, 0.025441206991672516, 0.01557650975883007, 0.016202636063098907, 0.02503230795264244, -0.014004802331328392, -0.020189406350255013, 0.027370700612664223, 0.02181222476065159, 1.0357252904213965e-05, -0.0058843172155320644, -0.03639204055070877, 0.0076476954855024815, -0.020112736150622368, 0.03128080070018768, 0.0008513412321917713, -0.05448583886027336, ...]"
9,"January 13, 2020 -- The Thailand Ministry of Public Health confirms the first laboratory-confirmed case of the SARS-CoV-2 virus outside of China.","[0.00293144048191607, -0.02092132717370987, -0.005906939040869474, -0.006841601338237524, 0.007017833646386862, 0.01479095034301281, -0.018781360238790512, 0.007288476452231407, -0.022620713338255882, 0.002498726360499859, 0.0009621984791010618, 0.009780908934772015, -0.011864230036735535, 0.0016631950857117772, 0.007571707479655743, -0.02302352897822857, 0.02751746028661728, -0.028625208884477615, 0.01375873014330864, -0.019599582999944687, -0.019838755950331688, -0.00721924239769578, -0.016628803685307503, 0.008276637643575668, 0.007382886949926615, 0.021273791790008545, 0.010548779740929604, -0.03763824701309204, 0.004792897030711174, -0.009453619830310345, -0.003870823187753558, -0.018000900745391846, -0.008836805820465088, -0.006234228145331144, 0.005598532035946846, -0.0063821375370025635, 0.006394725758582354, -0.007880114950239658, 0.004078525584191084, -0.025679606944322586, 0.008421400561928749, 0.011322944425046444, 0.0013941257493570447, 0.0012768995948135853, -0.012688746675848961, -0.00546321040019393, 0.02719017118215561, -0.011593586765229702, -0.03376112878322601, 0.031293872743844986, 0.0262083038687706, 0.008037465624511242, -0.03240162134170532, -0.009705380536615849, -0.02041780576109886, -0.005381388124078512, -0.011083771474659443, 8.683782652951777e-05, 0.013305560685694218, -0.03373595327138901, 0.001590027124620974, -0.021777313202619553, -0.035221341997385025, 0.003398771397769451, 0.019033120945096016, -0.005173685494810343, -0.003244567895308137, 0.019272293895483017, 0.016301514580845833, 0.03398771584033966, 0.011322944425046444, 0.011367002502083778, 0.0032791851554065943, 0.0029282933101058006, 0.016855388879776, 0.0057527353055775166, -0.010278136469423771, 0.008112993091344833, 0.00723183061927557, -0.003748089773580432, 0.011637645773589611, -0.00928997527807951, 
-0.014639893546700478, 0.011763526126742363, 0.013355913572013378, 0.00010935861791949719, 0.021336732432246208, 0.019687699154019356, -0.00840251799672842, -0.03622838482260704, 0.01764843612909317, 0.035649336874485016, -0.009648734703660011, -0.0014759480254724622, -0.01500494685024023, -0.005529297515749931, -0.012254459783434868, 0.03874599561095238, -0.005343623925000429, -0.054229285567998886, ...]"
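`generate_embeddings` sends the text column to the API in slices of `BATCH_SIZE` rows instead of making one request per row. Here is a minimal sketch of just the slicing step on a plain list, with no API call. `iter_batches` and the 250 placeholder rows are hypothetical and only serve to show how the batch boundaries fall.

```python
from typing import Iterator, List, Sequence

def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

# Placeholder rows standing in for the scraped text column.
texts = [f"row {i}" for i in range(250)]
sizes = [len(batch) for batch in iter_batches(texts, 100)]
print(sizes)  # [100, 100, 50]
```

The last batch is simply shorter when the row count is not a multiple of the batch size, so no padding is needed.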


In [8]:
# save df to csv
df.to_csv("embeddings.csv")
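A CSV file stores every cell as text, so a list of floats written to CSV comes back as a string when the file is read again. The sketch below illustrates this with the standard-library `csv` module standing in for pandas, and uses `ast.literal_eval` to turn the string back into a list. The three-value vector is made up for illustration.

```python
import ast
import csv
import io

# Made-up row: a real embedding has many more values.
row = {"text": "example entry", "embeddings": [0.1, -0.2, 0.3]}

# Write the row to an in-memory CSV; the list is stored as its string form.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
writer.writerow(row)

# Read it back: the embeddings cell is now a str, not a list.
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
print(type(loaded["embeddings"]).__name__)  # str

# Parse the string back into a list of floats.
restored = ast.literal_eval(loaded["embeddings"])
print(restored)  # [0.1, -0.2, 0.3]
```

With pandas, the same idea would likely amount to applying `ast.literal_eval` to the `embeddings` column after reading `embeddings.csv` back in.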

## Custom Query Completion


In [9]:
# Note: `openai.embeddings_utils` is part of the pre-1.0 `openai` package.
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def calculate_distances(question_embeddings, df):
    # Cosine distance between the question embedding and every row's embedding.
    return distances_from_embeddings(
        question_embeddings,
        df["embeddings"].values,
        distance_metric="cosine"
    )

def get_rows_sorted_by_relevance(question, df):
    # Embed the question, then sort a copy of df from smallest to largest distance.
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    df_copy = df.copy()
    df_copy["distances"] = calculate_distances(question_embeddings, df_copy)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
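The ranking above relies on cosine distance. The following plain-Python sketch shows what that metric measures, assuming the `cosine` option follows the usual definition of one minus cosine similarity. It is not the library's implementation, and the two-dimensional vectors are toy values chosen for illustration.

```python
import math
from typing import Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors: one question and three candidate rows.
question = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Smaller distance means more similar, so sort ascending.
ranked = sorted(rows, key=lambda name: cosine_distance(question, rows[name]))
print(ranked)  # ['same direction', 'orthogonal', 'opposite']
```

Because the metric depends only on the angle between vectors, the length of each vector does not affect the ranking.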

In [10]:
# Testing related text
get_rows_sorted_by_relevance("When did the New York Department of Health say that everyone should wear a mask?", df)

Unnamed: 0,text,embeddings,distances
304,"July 8, 2022 -- New York Department of Health recommends that all people should wear N95, KN95, or KF94 masks in all public indoor settings and when in crowded outdoor areas due to high community transmission of COVID-19.","[0.010357880964875221, -0.002179683418944478, -0.013266251422464848, -0.013967831619083881, 0.02208065427839756, -0.001712494413368404, -0.020486153662204742, 0.009241729974746704, -0.01662108302116394, 0.007213524077087641, 0.0043115317821502686, 0.0006605222006328404, 0.011435763910412788, 0.005899655167013407, -0.009936932474374771, -0.016939982771873474, 0.018700312823057175, -0.01417192816734314, 0.024172641336917877, -0.027910152450203896, -0.022093411535024643, 0.029058191925287247, -0.02464461326599121, -0.027527472004294395, -0.008648575283586979, 0.022986331954598427, 0.010976547375321388, -0.01738644205033779, 0.012545536272227764, -0.03094608150422573, 0.024517053738236427, -0.0023885630071163177, -0.0050386241637170315, -0.01600879430770874, -0.011238045990467072, 0.002594253746792674, -0.006387572269886732, -0.007934238761663437, 0.018343143165111542, 0.012079942040145397, 0.018241096287965775, -0.01696549542248249, 0.0011887007858604193, -0.004722913261502981, 0.00047874904703348875, 0.011301825754344463, 0.0008096080855466425, -0.013189715333282948, -0.03632912039756775, 0.0016112435841932893, 0.02162143774330616, -0.005877331830561161, -0.00898023135960102, -0.013087667524814606, -0.007691874634474516, -5.770101779489778e-05, -0.022029630839824677, -0.00337077584117651, -0.027935663238167763, -0.037298575043678284, -0.013049399480223656, -0.02464461326599121, -0.043651070445775986, 0.018725823611021042, 0.016557302325963974, -0.007685496471822262, -0.030665449798107147, 0.029134728014469147, -0.016365962103009224, 0.025091072544455528, 0.03217066079378128, 0.01765431836247444, 0.03426264598965645, 0.010842609219253063, 0.01440153643488884, -0.004927009344100952, 0.013929563574492931, 0.004742046818137169, 
-0.012213880196213722, -0.022323019802570343, 0.004199916496872902, -0.02437673695385456, -0.02571611851453781, 0.014146416448056698, 0.02288428321480751, 0.012558292597532272, 0.0021127143409103155, 0.018292119726538658, 0.01115513127297163, -0.0051151602528989315, 0.01585572212934494, 0.009962444193661213, 0.011824822053313255, -0.011665372177958488, -0.03599746152758598, 0.03888032212853432, 0.012092698365449905, 0.013470347970724106, 0.017794635146856308, -0.013266251422464848, ...]",0.113393
51,"April 3, 2020 -- At a White House press briefing, CDC announces new mask wearing guidelines and recommends that all people wear a mask when outside of the home.","[-0.009821628220379353, -0.0032139751128852367, -0.02507568523287773, -0.02882876805961132, 0.03971907123923302, 0.005292483605444431, -0.006882773246616125, 0.0007351113599725068, 0.00615442031994462, -0.0078115020878612995, 0.010610411874949932, 0.002549234079197049, -0.0007685074233449996, 0.007989614270627499, -0.008593924343585968, -0.002911820076406002, 0.0203557051718235, -0.010584967210888863, 0.025304686278104782, -0.01441438402980566, -0.0010965046240016818, -0.007754251826554537, -0.02323094941675663, -0.008682981133460999, -0.027047645300626755, -0.008015058934688568, 0.007315332069993019, -0.02100454457104206, 0.030253667384386063, -0.03943917900323868, 0.009446320123970509, -0.01819291152060032, 0.007875113748013973, -0.00027015042724087834, -0.011144748888909817, -0.007518888916820288, -0.006020836066454649, -0.018103856593370438, 0.01768402010202408, -0.000795144762378186, 0.008962871506810188, 0.0014503441052511334, 0.008460340090095997, -0.022251330316066742, -0.02461768127977848, 0.004242892377078533, -0.006408866960555315, -0.01572478376328945, -0.03765805438160896, -0.003045404562726617, 0.032467350363731384, 0.01483422052115202, -0.009910684078931808, -0.018943529576063156, -0.0036194990389049053, -0.004099766258150339, -0.002064195927232504, 0.02286200225353241, -0.01772218570113182, -0.027912762016057968, -0.011749058961868286, -0.021258991211652756, -0.033154357224702835, -0.004287420772016048, 0.024032454937696457, -0.00507938489317894, -0.03086433932185173, 0.0273020900785923, 0.001156140468083322, 0.041449304670095444, 0.02596624754369259, 0.00863209180533886, 0.03432480990886688, 0.004157016985118389, 0.01790029928088188, 0.008104115724563599, -0.015368558466434479, 0.006717382930219173, -0.016551733016967773, -0.0058999741449952126, -0.004382838029414415, 
-0.0016030118567869067, -0.02623341605067253, 0.022989224642515182, 0.02124626748263836, 0.014592496678233147, 0.00198468123562634, 0.026640530675649643, -0.0070990524254739285, -0.002981792902573943, 0.028955992311239243, 0.0024824419524520636, -0.0014988478505983949, -0.004885369446128607, -0.008116837590932846, 0.0333070233464241, 0.0018447358161211014, 0.04809035733342171, 0.011628197506070137, -0.013829157687723637, ...]",0.129629
286,"May 3, 2022 -- CDC recommends that everyone continue to wear a mask while in indoor transportation hubs to prevent the spread of COVID-19 – but this is no longer legally enforceable.","[-0.009401852265000343, -0.0117507204413414, -0.003465855959802866, -0.016761211678385735, 0.015982510522007942, 0.0008425285341218114, -0.024880122393369675, -0.018586689606308937, -0.004372212570160627, -0.0020887686405330896, 0.028007691726088524, 0.0009829499758780003, -0.006944477558135986, -0.007608287967741489, -0.003216927172616124, 0.0022036589216440916, 0.024586515501141548, -0.015382528305053711, 0.026986444368958473, -0.010231615975499153, 0.00020315230358392, -0.011150737293064594, -0.04891771823167801, -0.024037593975663185, -0.02431843802332878, 0.01372300274670124, 0.020565355196595192, -0.020067498087882996, 0.0239354707300663, -0.029590623453259468, 0.029079999774694443, -0.010518841445446014, -0.003218522761017084, -0.006701931823045015, -0.022186584770679474, -0.04166686534881592, -0.011929438449442387, -0.013991080224514008, 0.016582492738962173, 0.007365741766989231, 0.022148288786411285, 0.003711593570187688, 0.0013866615481674671, -0.016314417123794556, -0.01765480265021324, 0.022186584770679474, 9.848448826232925e-05, -0.004426466301083565, -0.03724997490644455, 0.017399491742253304, 0.026475820690393448, 0.0013595346827059984, -0.0033509659115225077, -0.024624811485409737, 0.006271093152463436, -0.003816909622400999, -0.03543725982308388, 0.010640114545822144, -0.016761211678385735, -0.03308839350938797, -0.00686150137335062, -0.013761299662292004, -0.02765025570988655, 0.002060046186670661, 0.029641686007380486, -0.0031163981184363365, -0.023399315774440765, 0.03344583138823509, -0.0005904082790948451, 0.018624987453222275, 0.01932709477841854, 0.027956629171967506, 0.031071431934833527, -0.011961352080106735, 0.019301563501358032, -0.026169447228312492, -0.005297717172652483, 0.004994534887373447, -0.02790556661784649, -0.017795223742723465, 
0.000590807176195085, -0.0218291487544775, -0.027343880385160446, 0.033420298248529434, 0.030841650441288948, 0.01406767312437296, 0.01281664613634348, 0.011859227903187275, -0.005182826891541481, 0.0001794162963051349, 0.0023839727509766817, 0.005067936610430479, -0.004710500594228506, -0.003679679473862052, 0.007199789397418499, 0.03209267929196358, 0.001649951678700745, 0.02479076385498047, 0.01383789349347353, 0.014131501317024231, ...]",0.144178
201,"July 27, 2021 -- Amid a Delta variant surge, CDC releases updated masking guidance recommending that everyone in areas with substantial or high transmission wear a mask indoors.","[-0.004507143050432205, 0.0024336660280823708, -0.0009450975921936333, -0.013833805918693542, 0.034195639193058014, 0.014394808560609818, -0.03095712512731552, -0.00745878042653203, -0.013668055646121502, 0.007981532253324986, 0.019571328535676003, 0.004733456764370203, -0.01056341826915741, 0.0012072705430909991, -0.0027444486040621996, -0.013011427596211433, 0.019890081137418747, -0.015287311747670174, 0.027642112225294113, -0.03144162893295288, -0.015134311281144619, 0.0008606284973211586, -0.025882605463266373, -0.021904589608311653, -0.015070561319589615, 0.0032831383869051933, 0.0143438084051013, -0.03011562116444111, 0.020463833585381508, -0.016779068857431412, 0.025028351694345474, -0.010149041190743446, -0.006505713798105717, -0.003777202917262912, -0.012877551838755608, -0.017034068703651428, -0.01101604476571083, -0.012526925653219223, 0.022937342524528503, 0.004022641107439995, 0.016766317188739777, 0.0020830396097153425, -0.008478784002363682, -0.022057589143514633, 0.0020671021193265915, 0.016536816954612732, -0.006674651987850666, 0.006081774830818176, -0.03360913693904877, 0.01379555556923151, 0.028177613392472267, -0.0009251756127923727, -0.009696414694190025, -0.024288848042488098, 0.006416463293135166, -0.007069903425872326, -0.012297424487769604, 0.008128157816827297, -0.007930532097816467, -0.029580119997262955, -0.011347546242177486, -0.016893818974494934, -0.03289513289928436, -0.00022312589862849563, 0.018933827057480812, 0.004876894876360893, -0.021738838404417038, 0.030064621940255165, -0.003193887881934643, 0.03289513289928436, 0.031059125438332558, 0.027081109583377838, 0.03281863406300545, 0.0025276977103203535, -0.0034839515574276447, -0.003193887881934643, 0.008325783535838127, 0.00022073526633903384, -0.0027141673490405083, -0.013668055646121502, 
0.0016830068780109286, -0.013999557122588158, -0.01648581773042679, 0.01283930242061615, 0.02805011346936226, 0.01155154686421156, 0.005316771566867828, 0.01690656878054142, -0.012934927828609943, 0.008791160769760609, 0.011985048651695251, 0.00896966177970171, 0.0008893160847947001, -0.002451197477057576, -0.01076104398816824, 0.03156912699341774, 0.019749829545617104, 0.030651124194264412, 0.01303692813962698, 0.0053358967415988445, ...]",0.147304
93,"July 14, 2020 -- CDC again calls on all people to wear cloth face masks when leaving their homes to prevent the spread of COVID-19, calling masks “a critical tool in the fight against COVID-19.”","[-0.014334137551486492, 0.00615291902795434, -0.014346504583954811, -0.027357300743460655, 0.035742584615945816, -0.0041648149490356445, -0.014878314919769764, 0.0010079656494781375, 0.01188533753156662, -0.0024627135135233402, -0.0014377423794940114, 0.009986898861825466, 0.006081805098801851, 0.005955036263912916, -0.0013952285517007113, 0.007272194139659405, 0.019132796674966812, -0.0010442957282066345, -0.005166596733033657, -0.021730007603764534, -0.02992977760732174, 0.0007992610917426646, -0.01714159920811653, -0.019689342007040977, -0.02898983471095562, 0.010537260212004185, 0.018699927255511284, -0.025329001247882843, 0.02454983815550804, -0.024821927770972252, -0.009683891199529171, -0.013715753331780434, -0.013270516879856586, -0.00993124395608902, -0.022645216435194016, 0.006517765577882528, -0.005280998069792986, -0.01155140995979309, 0.00666617788374424, -0.005936484783887863, 0.036212556064128876, 0.0069444505497813225, -0.0012228540144860744, -0.008589351549744606, -0.0261947400867939, 0.010679489001631737, -0.02349858544766903, -0.01397547498345375, -0.0408628024160862, 0.005667488090693951, 0.028940362855792046, 0.008898543193936348, -0.004396709147840738, -0.009473640471696854, 0.00011430438462411985, -0.0027533541433513165, -0.01351787056773901, 0.01872466318309307, -0.01121129933744669, -0.02918771654367447, -0.008774866349995136, -0.015274080447852612, -0.026417357847094536, -0.0014624777249991894, 0.018229955807328224, -0.00545414537191391, -0.032205428928136826, 0.013926004059612751, -0.0006384812877513468, 0.031834401190280914, 0.03195807710289955, 0.028841422870755196, 0.04454836994409561, -0.0026080338284373283, 0.008255423977971077, 0.0048233941197395325, -0.00568294757977128, 0.007711246609687805, -0.0095292953774333, 
-0.009652971290051937, 0.00019140912627335638, -0.0019138979259878397, -0.018205219879746437, 0.017215805128216743, 0.027184152975678444, 0.007655591703951359, -0.007717430125921965, 0.023275967687368393, -0.010648570023477077, -0.0023977833334356546, 0.01794549822807312, 0.0012584110954776406, -0.009337595663964748, 0.0032867100089788437, -0.022880202159285545, 0.03178492933511734, 0.007167068775743246, 0.03282381221652031, 0.0045605809427797794, -0.014569123275578022, ...]",0.148118
61,"April 20, 2020 -- As the COVID-19 pandemic grows, shortages of personal protective equipment (PPE) like gowns, eye shields, masks, and even body bags, become dire– particularly in New York","[-0.005619813688099384, -0.023654036223888397, 0.010355240665376186, -0.03516425937414169, 0.014612174592912197, 0.012256011366844177, -0.004524230491369963, 0.004613329190760851, -0.011992014944553375, -0.015839755535125732, 0.007233488839119673, 0.023667234927415848, 0.014823371544480324, 0.017278533428907394, 0.0003415447426959872, -0.008533668704330921, 0.01669774204492569, -0.010929431766271591, 0.005732011515647173, -0.012328609824180603, -0.0070684910751879215, 0.004243734758347273, -0.008091475814580917, 0.022004060447216034, -0.0013282295549288392, 0.004263534210622311, 0.00180507218465209, -0.024855216965079308, 0.006837494671344757, -0.00958965253084898, -0.016460146754980087, -0.015773756429553032, -0.012104213237762451, -0.01490257028490305, 0.01323279645293951, -0.006194004788994789, -0.007121290545910597, -0.011206627823412418, 0.016090551391243935, -0.003943439107388258, 0.022254858165979385, 0.012150413356721401, 0.01071163546293974, 0.014255780726671219, -0.028168367221951485, 0.006230304017663002, -0.019456500187516212, -0.02750837616622448, -0.0478624626994133, 0.01684294082224369, 0.01684294082224369, -0.01580015756189823, -0.020750081166625023, -0.01139802485704422, 0.004903724417090416, 0.004355933051556349, -0.006811094935983419, -0.013542991131544113, -0.016011353582143784, -0.01814972050487995, -0.018664512783288956, -0.04643688350915909, -0.04031217843294144, 0.00719388946890831, -0.0029732543043792248, -0.008447869680821896, -0.025686804205179214, 0.02523801103234291, 0.008012277074158192, 0.01388618629425764, 0.0360354445874691, 0.009754649363458157, 0.011846818029880524, -0.0029270548839122057, 0.01657894439995289, 0.00017015362391248345, -0.006065306719392538, 0.023706834763288498, 0.002496411558240652, 0.006662597414106131, 
0.0017423732206225395, -0.036774635314941406, -0.018704112619161606, -0.002116917399689555, 0.0119590163230896, -0.0035870447754859924, -0.014572575688362122, 0.029197949916124344, -0.004316333681344986, 0.0061643049120903015, 0.02771957404911518, -0.011794018559157848, 0.009952646680176258, 0.009114460088312626, -0.03331628814339638, 0.038965798914432526, -0.0037025429774075747, 0.03363308310508728, 0.010124243795871735, -0.03249790146946907, ...]",0.152255
161,"January 30, 2021 -- As part of the Biden Administration’s Executive Order on Promoting COVID-19 Safety in Domestic and International Travel, CDC requires face masks to be worn by all travelers while on public transportation and inside transportation hubs to prevent the spread of COVID-19 effective February 2, 2021.","[-0.015050387009978294, -0.013441064395010471, -0.007383572869002819, -0.01060221903026104, 0.01618335023522377, -0.015217756852507591, -0.02757735550403595, -0.011670809239149094, 0.009462818503379822, 0.003836625488474965, 0.02331586927175522, 0.007351386360824108, -0.01570698991417885, -0.0013220587279647589, -0.006694782990962267, -0.011065703816711903, 0.015964481979608536, -0.012353162281215191, 0.015050387009978294, -0.003226692322641611, -0.0005564233870245516, -0.020161595195531845, -0.0342978872358799, -0.01932474784553051, -0.018281906843185425, 0.008407101966440678, 0.009366258978843689, -0.03138823062181473, 0.02610965259373188, -0.021217312663793564, 0.016376469284296036, -0.0076475017704069614, -0.006318201310932636, -0.013099887408316135, -0.011188012547791004, -0.02535005286335945, -0.009005770087242126, -0.02384372614324093, 0.020843949168920517, 0.003701442386955023, 0.027294114232063293, 0.009623750112950802, -0.001823362777940929, 0.01045416109263897, -0.017084570601582527, 0.013035514391958714, -0.0013140121009200811, -0.017792673781514168, -0.021899664774537086, 0.014458156190812588, 0.031001994386315346, 0.0012327412841841578, -0.01902863197028637, -0.0277833491563797, 0.004869810771197081, -0.0057903435081243515, -0.02404971979558468, 0.010801774449646473, -0.005297890864312649, -0.031001994386315346, -0.015192007645964622, -0.018925637006759644, -0.034993115812540054, 0.009353384375572205, 0.016981573775410652, -0.006804216653108597, -0.020316090434789658, 0.030847499147057533, -0.008716092444956303, 0.03846925124526024, 0.01897713541984558, 0.010917645879089832, 0.028118088841438293, -0.017895668745040894, 
0.030590007081627846, -0.029405545443296432, -0.017367811873555183, -0.0007676469977013767, -0.01199911069124937, -0.02567191794514656, 0.007724749390035868, -0.02196403779089451, -0.03424638882279396, 0.0284013282507658, 0.02141042985022068, 0.0165695883333683, 0.0008159266435541213, 0.01474139653146267, -0.01765105314552784, -0.00715826777741313, 0.010917645879089832, 0.02420421503484249, -0.006669033784419298, 0.021770918741822243, -0.005207768641412258, 0.030590007081627846, -0.002066370565444231, 0.037902772426605225, 0.021358931437134743, -0.001085488242097199, ...]",0.153563
247,"January 14, 2022 -- CDC updates guidelines on masks to emphasize fit, comfort, and consistent wear.","[-0.017453836277127266, 0.00806998461484909, -0.011373976245522499, -0.02336624450981617, 0.028209522366523743, 0.007715755142271519, -0.006736794486641884, -0.004096177406609058, -0.0029336614534258842, -0.014452549628913403, 0.025040781125426292, 0.012758690863847733, 0.0005514703807421029, -0.0009829859482124448, -0.001698689884506166, -0.015122365206480026, 0.03217688947916031, 0.005529195070266724, 0.016036920249462128, -0.020017167553305626, -0.0014837371418252587, -0.004527692683041096, -0.040369242429733276, -0.013950188644230366, -0.02171746827661991, 0.002260626060888171, 0.014220690354704857, -0.021305274218320847, 0.031687408685684204, -0.029136959463357925, 0.019257185980677605, 0.006607984192669392, -0.0016906391829252243, -0.017312144860625267, -0.009937738068401814, -0.020996129140257835, -0.017428074032068253, -0.014091880060732365, 0.029033910483121872, -0.00019693933427333832, 0.011554311029613018, -0.0033812783658504486, 0.003970586694777012, -0.0038031330332159996, -0.01606268249452114, 0.012005148455500603, -0.00025097941397689283, -0.002663159277290106, -0.03972519189119339, -0.0030238288454711437, 0.01945040188729763, -0.005145983770489693, -0.013834258541464806, -0.018729062750935555, 0.009016741998493671, -0.01687418855726719, -0.012713606469333172, 0.013409184291958809, -0.012835976667702198, -0.030167443677783012, -0.007065261714160442, -0.029033910483121872, -0.024847565218806267, 0.001006332808174193, 0.02109917625784874, -0.015856586396694183, -0.03655644878745079, 0.05404892936348915, 0.0010184088023379445, 0.031069118529558182, 0.025040781125426292, 0.017685696482658386, 0.024821802973747253, 0.008797764778137207, 0.023404886946082115, -0.007921852171421051, 0.0013500961940735579, 0.0005623387987725437, -0.005377842579036951, 0.005384283140301704, 0.008198794908821583, 0.006936450954526663, -0.014336620457470417, 
0.015096602961421013, 0.00923572015017271, 0.006247314158827066, 0.00014028280565980822, 0.0252468790858984, 0.004038212355226278, 0.008050662465393543, 0.022361520677804947, 0.006459851749241352, 0.0033200932666659355, 0.0019273286452516913, -0.029523391276597977, 0.03271789476275444, 0.024100463837385178, 0.029961347579956055, 0.022812359035015106, -0.007632028311491013, ...]",0.154947
296,"May 31, 2022 -- The U.S. Department of Justice (DOJ) asks a federal appeals court to overturn the order that declared CDC’s mandate requiring individuals to wear masks on public transportation unlawful.","[-0.003928762394934893, -0.004284143913537264, -0.023018307983875275, -0.0027876279782503843, 0.02188369445502758, 0.020097004249691963, 0.00014641160669270903, -0.02337042987346649, -0.007811879273504019, -0.0027941488660871983, 0.02710030786693096, 0.0011867795838043094, 0.00521987397223711, -0.019953547045588493, -0.00521987397223711, -0.008431351743638515, 0.030386775732040405, -0.01541509386152029, 0.02039695903658867, -0.00957900658249855, -0.00035151009797118604, -0.015323802828788757, -0.04003750905394554, -0.019144972786307335, -0.025548366829752922, 0.016249751672148705, 0.01297632697969675, -0.024909330531954765, 0.012695933692157269, -0.016967035830020905, 0.028117548674345016, -0.008248770609498024, -0.009089949540793896, -0.021870654076337814, -0.0221966914832592, -0.018075566738843918, -0.01332192774862051, -0.01781473681330681, 0.03427315130829811, -0.009859399870038033, 0.038420360535383224, 0.0029457565397024155, 0.0027240505442023277, -0.02193586155772209, -0.02676122821867466, 0.021088160574436188, 0.01525859534740448, -0.0035440369974821806, -0.025691822171211243, 0.013889234513044357, 0.031508348882198334, 0.00676855631172657, -0.026995975524187088, -0.048514507710933685, 0.02047520875930786, -0.006644662003964186, -0.01930147036910057, -0.005523089785128832, -0.0299172792583704, -0.020436083897948265, 0.004036354832351208, 0.0005734199658036232, -0.01585850678384304, 0.00763581832870841, 0.022900935262441635, -0.02487020567059517, -0.019027598202228546, 0.02068387344479561, 0.008679141290485859, 0.00972246378660202, 0.021727196872234344, 0.02989119663834572, 0.022209733724594116, -0.011906920932233334, 0.028873957693576813, -0.020370876416563988, -0.023305222392082214, 0.011952565982937813, 0.0010840775212273002, -0.014006608165800571, 
0.00330602889880538, -0.010804911144077778, -0.023683426901698112, 0.015362927690148354, 0.02661777101457119, 0.013354531489312649, 0.0006308842566795647, 0.02884787507355213, 0.0021437022369354963, 0.0015323803527280688, -0.005748056340962648, 0.005503527354449034, -0.0020132868085056543, 0.010028939694166183, 0.0029799905605614185, 0.030647605657577515, -0.019627509638667107, 0.03390799090266228, 0.015936754643917084, -0.009807233698666096, ...]",0.162032
263,"March 10, 2022 -- On CDC’s recommendation, the Transportation Security Administration (TSA) extends the mask requirement for all public transportation and transportation hubs through April 18, 2022.","[-0.019024644047021866, -0.028367798775434494, -0.0071374946273863316, -0.02917459048330784, 0.016734398901462555, -0.005087986122816801, -0.019415026530623436, -0.01299973949790001, -0.00825659092515707, 0.004417829215526581, 0.028497926890850067, 0.004216131754219532, -0.004560969769954681, -0.007534383330494165, -0.006675541866570711, -0.02208263985812664, 0.018400032073259354, -0.0013891112757846713, 0.019506115466356277, -0.018842464312911034, 0.0039005724247545004, -0.022199755534529686, -0.03859582170844078, -0.017085744068026543, -0.013533261604607105, 0.015185881406068802, 0.02437288500368595, -0.03063201904296875, 0.018087726086378098, -0.015914594754576683, 0.014366078190505505, -0.011301575228571892, -0.022121679037809372, -0.010130427777767181, -0.015732416883111, -0.03201137110590935, -0.007384736556559801, -0.015146843157708645, 0.024047566577792168, 0.005182328633964062, -0.0018608234822750092, 0.01104131992906332, 0.02124982327222824, 0.003057996742427349, -0.006428299471735954, 0.023240774869918823, -0.01929791085422039, -0.013702427968382835, -0.03274008259177208, 0.021757321432232857, 0.02642889879643917, 0.0026497216895222664, -0.03799723461270332, -0.023292826488614082, -0.00991571694612503, 0.004388550762087107, -0.027821263298392296, 0.01997457444667816, -0.012928169220685959, -0.021080657839775085, 0.005436077248305082, -0.021119697019457817, -0.03529058396816254, 0.014327039942145348, 0.012199454940855503, -0.014821524731814861, -0.029252666980028152, 0.03463994711637497, 0.00307100941427052, 0.040729913860559464, 0.01700766757130623, 0.030658043920993805, 0.03214149922132492, -0.01895957998931408, 0.02325378730893135, -0.022980520501732826, 0.00166725879535079, 0.015003702603280544, -0.013819542713463306, 0.0013720319839194417, 
-0.00038936594501137733, -0.017111768946051598, -0.022551098838448524, 0.020495085045695305, 0.01958419196307659, 0.01060539297759533, 0.002552126068621874, 0.014366078190505505, -0.01823086477816105, 0.01651318185031414, 0.02082040347158909, 0.00414130836725235, -0.007235090248286724, 0.020156752318143845, -0.0019340203143656254, 0.0264419112354517, -0.007215571124106646, 0.04557065665721893, 0.014899601228535175, -0.011737502180039883, ...]",0.164449
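
The `distances` column above is sorted in ascending order, so the rows nearest the question come first. The sketch below shows one way such a ranking could be computed using cosine distance on toy two-dimensional vectors. It is an illustration only: `cosine_distance`, `sort_by_distance`, and the toy data are hypothetical names and values, not the notebook's actual `get_rows_sorted_by_relevance`, and the real embeddings come from the OpenAI embedding model rather than hand-written lists.

```python
import numpy as np
import pandas as pd


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sort_by_distance(query_embedding, frame):
    """Hypothetical helper: add a 'distances' column and sort ascending."""
    out = frame.copy()
    out["distances"] = [
        cosine_distance(query_embedding, emb) for emb in out["embeddings"]
    ]
    # Smaller distance means the row is closer to the query.
    return out.sort_values("distances", ascending=True)


# Toy stand-ins for real embeddings (assumed values, for illustration only).
toy = pd.DataFrame({
    "text": ["mask guidance", "vaccine rollout", "travel rules"],
    "embeddings": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
})
ranked = sort_by_distance([1.0, 0.1], toy)
print(ranked[["text", "distances"]])
```

A small distance only means the text is close to the question in embedding space; it does not guarantee that the row actually answers it.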


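The `Unnamed: 0` column in these outputs is what pandas typically produces when a DataFrame is written to CSV with its index and then read back without telling pandas which column holds the index. I am assuming that is what happened here, since the loading step is not shown in this part of the notebook. A minimal sketch of the effect and one way to avoid it, using an in-memory buffer instead of a real file:

```python
import io

import pandas as pd

frame = pd.DataFrame({"text": ["first row", "second row"]})

# Writing with the default index=True stores the index as a nameless column.
buffer = io.StringIO()
frame.to_csv(buffer)

# Reading it back naively yields an extra "Unnamed: 0" column.
buffer.seek(0)
with_extra = pd.read_csv(buffer)

# Passing index_col=0 restores the index instead.
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

Writing the file with `index=False` in the first place would have the same effect.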
In [12]:
# Test 2: rank rows by relevance to a question about who needs to wear a mask
get_rows_sorted_by_relevance("What age people need to wear a mask?", df)

Unnamed: 0,text,embeddings,distances
304,"July 8, 2022 -- New York Department of Health recommends that all people should wear N95, KN95, or KF94 masks in all public indoor settings and when in crowded outdoor areas due to high community transmission of COVID-19.","[0.010357880964875221, -0.002179683418944478, -0.013266251422464848, -0.013967831619083881, 0.02208065427839756, -0.001712494413368404, -0.020486153662204742, 0.009241729974746704, -0.01662108302116394, 0.007213524077087641, 0.0043115317821502686, 0.0006605222006328404, 0.011435763910412788, 0.005899655167013407, -0.009936932474374771, -0.016939982771873474, 0.018700312823057175, -0.01417192816734314, 0.024172641336917877, -0.027910152450203896, -0.022093411535024643, 0.029058191925287247, -0.02464461326599121, -0.027527472004294395, -0.008648575283586979, 0.022986331954598427, 0.010976547375321388, -0.01738644205033779, 0.012545536272227764, -0.03094608150422573, 0.024517053738236427, -0.0023885630071163177, -0.0050386241637170315, -0.01600879430770874, -0.011238045990467072, 0.002594253746792674, -0.006387572269886732, -0.007934238761663437, 0.018343143165111542, 0.012079942040145397, 0.018241096287965775, -0.01696549542248249, 0.0011887007858604193, -0.004722913261502981, 0.00047874904703348875, 0.011301825754344463, 0.0008096080855466425, -0.013189715333282948, -0.03632912039756775, 0.0016112435841932893, 0.02162143774330616, -0.005877331830561161, -0.00898023135960102, -0.013087667524814606, -0.007691874634474516, -5.770101779489778e-05, -0.022029630839824677, -0.00337077584117651, -0.027935663238167763, -0.037298575043678284, -0.013049399480223656, -0.02464461326599121, -0.043651070445775986, 0.018725823611021042, 0.016557302325963974, -0.007685496471822262, -0.030665449798107147, 0.029134728014469147, -0.016365962103009224, 0.025091072544455528, 0.03217066079378128, 0.01765431836247444, 0.03426264598965645, 0.010842609219253063, 0.01440153643488884, -0.004927009344100952, 0.013929563574492931, 0.004742046818137169, 
-0.012213880196213722, -0.022323019802570343, 0.004199916496872902, -0.02437673695385456, -0.02571611851453781, 0.014146416448056698, 0.02288428321480751, 0.012558292597532272, 0.0021127143409103155, 0.018292119726538658, 0.01115513127297163, -0.0051151602528989315, 0.01585572212934494, 0.009962444193661213, 0.011824822053313255, -0.011665372177958488, -0.03599746152758598, 0.03888032212853432, 0.012092698365449905, 0.013470347970724106, 0.017794635146856308, -0.013266251422464848, ...]",0.143186
51,"April 3, 2020 -- At a White House press briefing, CDC announces new mask wearing guidelines and recommends that all people wear a mask when outside of the home.","[-0.009821628220379353, -0.0032139751128852367, -0.02507568523287773, -0.02882876805961132, 0.03971907123923302, 0.005292483605444431, -0.006882773246616125, 0.0007351113599725068, 0.00615442031994462, -0.0078115020878612995, 0.010610411874949932, 0.002549234079197049, -0.0007685074233449996, 0.007989614270627499, -0.008593924343585968, -0.002911820076406002, 0.0203557051718235, -0.010584967210888863, 0.025304686278104782, -0.01441438402980566, -0.0010965046240016818, -0.007754251826554537, -0.02323094941675663, -0.008682981133460999, -0.027047645300626755, -0.008015058934688568, 0.007315332069993019, -0.02100454457104206, 0.030253667384386063, -0.03943917900323868, 0.009446320123970509, -0.01819291152060032, 0.007875113748013973, -0.00027015042724087834, -0.011144748888909817, -0.007518888916820288, -0.006020836066454649, -0.018103856593370438, 0.01768402010202408, -0.000795144762378186, 0.008962871506810188, 0.0014503441052511334, 0.008460340090095997, -0.022251330316066742, -0.02461768127977848, 0.004242892377078533, -0.006408866960555315, -0.01572478376328945, -0.03765805438160896, -0.003045404562726617, 0.032467350363731384, 0.01483422052115202, -0.009910684078931808, -0.018943529576063156, -0.0036194990389049053, -0.004099766258150339, -0.002064195927232504, 0.02286200225353241, -0.01772218570113182, -0.027912762016057968, -0.011749058961868286, -0.021258991211652756, -0.033154357224702835, -0.004287420772016048, 0.024032454937696457, -0.00507938489317894, -0.03086433932185173, 0.0273020900785923, 0.001156140468083322, 0.041449304670095444, 0.02596624754369259, 0.00863209180533886, 0.03432480990886688, 0.004157016985118389, 0.01790029928088188, 0.008104115724563599, -0.015368558466434479, 0.006717382930219173, -0.016551733016967773, -0.0058999741449952126, -0.004382838029414415, 
-0.0016030118567869067, -0.02623341605067253, 0.022989224642515182, 0.02124626748263836, 0.014592496678233147, 0.00198468123562634, 0.026640530675649643, -0.0070990524254739285, -0.002981792902573943, 0.028955992311239243, 0.0024824419524520636, -0.0014988478505983949, -0.004885369446128607, -0.008116837590932846, 0.0333070233464241, 0.0018447358161211014, 0.04809035733342171, 0.011628197506070137, -0.013829157687723637, ...]",0.143795
93,"July 14, 2020 -- CDC again calls on all people to wear cloth face masks when leaving their homes to prevent the spread of COVID-19, calling masks “a critical tool in the fight against COVID-19.”","[-0.014334137551486492, 0.00615291902795434, -0.014346504583954811, -0.027357300743460655, 0.035742584615945816, -0.0041648149490356445, -0.014878314919769764, 0.0010079656494781375, 0.01188533753156662, -0.0024627135135233402, -0.0014377423794940114, 0.009986898861825466, 0.006081805098801851, 0.005955036263912916, -0.0013952285517007113, 0.007272194139659405, 0.019132796674966812, -0.0010442957282066345, -0.005166596733033657, -0.021730007603764534, -0.02992977760732174, 0.0007992610917426646, -0.01714159920811653, -0.019689342007040977, -0.02898983471095562, 0.010537260212004185, 0.018699927255511284, -0.025329001247882843, 0.02454983815550804, -0.024821927770972252, -0.009683891199529171, -0.013715753331780434, -0.013270516879856586, -0.00993124395608902, -0.022645216435194016, 0.006517765577882528, -0.005280998069792986, -0.01155140995979309, 0.00666617788374424, -0.005936484783887863, 0.036212556064128876, 0.0069444505497813225, -0.0012228540144860744, -0.008589351549744606, -0.0261947400867939, 0.010679489001631737, -0.02349858544766903, -0.01397547498345375, -0.0408628024160862, 0.005667488090693951, 0.028940362855792046, 0.008898543193936348, -0.004396709147840738, -0.009473640471696854, 0.00011430438462411985, -0.0027533541433513165, -0.01351787056773901, 0.01872466318309307, -0.01121129933744669, -0.02918771654367447, -0.008774866349995136, -0.015274080447852612, -0.026417357847094536, -0.0014624777249991894, 0.018229955807328224, -0.00545414537191391, -0.032205428928136826, 0.013926004059612751, -0.0006384812877513468, 0.031834401190280914, 0.03195807710289955, 0.028841422870755196, 0.04454836994409561, -0.0026080338284373283, 0.008255423977971077, 0.0048233941197395325, -0.00568294757977128, 0.007711246609687805, -0.0095292953774333, 
-0.009652971290051937, 0.00019140912627335638, -0.0019138979259878397, -0.018205219879746437, 0.017215805128216743, 0.027184152975678444, 0.007655591703951359, -0.007717430125921965, 0.023275967687368393, -0.010648570023477077, -0.0023977833334356546, 0.01794549822807312, 0.0012584110954776406, -0.009337595663964748, 0.0032867100089788437, -0.022880202159285545, 0.03178492933511734, 0.007167068775743246, 0.03282381221652031, 0.0045605809427797794, -0.014569123275578022, ...]",0.14721
201,"July 27, 2021 -- Amid a Delta variant surge, CDC releases updated masking guidance recommending that everyone in areas with substantial or high transmission wear a mask indoors.","[-0.004507143050432205, 0.0024336660280823708, -0.0009450975921936333, -0.013833805918693542, 0.034195639193058014, 0.014394808560609818, -0.03095712512731552, -0.00745878042653203, -0.013668055646121502, 0.007981532253324986, 0.019571328535676003, 0.004733456764370203, -0.01056341826915741, 0.0012072705430909991, -0.0027444486040621996, -0.013011427596211433, 0.019890081137418747, -0.015287311747670174, 0.027642112225294113, -0.03144162893295288, -0.015134311281144619, 0.0008606284973211586, -0.025882605463266373, -0.021904589608311653, -0.015070561319589615, 0.0032831383869051933, 0.0143438084051013, -0.03011562116444111, 0.020463833585381508, -0.016779068857431412, 0.025028351694345474, -0.010149041190743446, -0.006505713798105717, -0.003777202917262912, -0.012877551838755608, -0.017034068703651428, -0.01101604476571083, -0.012526925653219223, 0.022937342524528503, 0.004022641107439995, 0.016766317188739777, 0.0020830396097153425, -0.008478784002363682, -0.022057589143514633, 0.0020671021193265915, 0.016536816954612732, -0.006674651987850666, 0.006081774830818176, -0.03360913693904877, 0.01379555556923151, 0.028177613392472267, -0.0009251756127923727, -0.009696414694190025, -0.024288848042488098, 0.006416463293135166, -0.007069903425872326, -0.012297424487769604, 0.008128157816827297, -0.007930532097816467, -0.029580119997262955, -0.011347546242177486, -0.016893818974494934, -0.03289513289928436, -0.00022312589862849563, 0.018933827057480812, 0.004876894876360893, -0.021738838404417038, 0.030064621940255165, -0.003193887881934643, 0.03289513289928436, 0.031059125438332558, 0.027081109583377838, 0.03281863406300545, 0.0025276977103203535, -0.0034839515574276447, -0.003193887881934643, 0.008325783535838127, 0.00022073526633903384, -0.0027141673490405083, -0.013668055646121502, 
0.0016830068780109286, -0.013999557122588158, -0.01648581773042679, 0.01283930242061615, 0.02805011346936226, 0.01155154686421156, 0.005316771566867828, 0.01690656878054142, -0.012934927828609943, 0.008791160769760609, 0.011985048651695251, 0.00896966177970171, 0.0008893160847947001, -0.002451197477057576, -0.01076104398816824, 0.03156912699341774, 0.019749829545617104, 0.030651124194264412, 0.01303692813962698, 0.0053358967415988445, ...]",0.151218
286,"May 3, 2022 -- CDC recommends that everyone continue to wear a mask while in indoor transportation hubs to prevent the spread of COVID-19 – but this is no longer legally enforceable.","[-0.009401852265000343, -0.0117507204413414, -0.003465855959802866, -0.016761211678385735, 0.015982510522007942, 0.0008425285341218114, -0.024880122393369675, -0.018586689606308937, -0.004372212570160627, -0.0020887686405330896, 0.028007691726088524, 0.0009829499758780003, -0.006944477558135986, -0.007608287967741489, -0.003216927172616124, 0.0022036589216440916, 0.024586515501141548, -0.015382528305053711, 0.026986444368958473, -0.010231615975499153, 0.00020315230358392, -0.011150737293064594, -0.04891771823167801, -0.024037593975663185, -0.02431843802332878, 0.01372300274670124, 0.020565355196595192, -0.020067498087882996, 0.0239354707300663, -0.029590623453259468, 0.029079999774694443, -0.010518841445446014, -0.003218522761017084, -0.006701931823045015, -0.022186584770679474, -0.04166686534881592, -0.011929438449442387, -0.013991080224514008, 0.016582492738962173, 0.007365741766989231, 0.022148288786411285, 0.003711593570187688, 0.0013866615481674671, -0.016314417123794556, -0.01765480265021324, 0.022186584770679474, 9.848448826232925e-05, -0.004426466301083565, -0.03724997490644455, 0.017399491742253304, 0.026475820690393448, 0.0013595346827059984, -0.0033509659115225077, -0.024624811485409737, 0.006271093152463436, -0.003816909622400999, -0.03543725982308388, 0.010640114545822144, -0.016761211678385735, -0.03308839350938797, -0.00686150137335062, -0.013761299662292004, -0.02765025570988655, 0.002060046186670661, 0.029641686007380486, -0.0031163981184363365, -0.023399315774440765, 0.03344583138823509, -0.0005904082790948451, 0.018624987453222275, 0.01932709477841854, 0.027956629171967506, 0.031071431934833527, -0.011961352080106735, 0.019301563501358032, -0.026169447228312492, -0.005297717172652483, 0.004994534887373447, -0.02790556661784649, -0.017795223742723465, 
0.000590807176195085, -0.0218291487544775, -0.027343880385160446, 0.033420298248529434, 0.030841650441288948, 0.01406767312437296, 0.01281664613634348, 0.011859227903187275, -0.005182826891541481, 0.0001794162963051349, 0.0023839727509766817, 0.005067936610430479, -0.004710500594228506, -0.003679679473862052, 0.007199789397418499, 0.03209267929196358, 0.001649951678700745, 0.02479076385498047, 0.01383789349347353, 0.014131501317024231, ...]",0.156604
247,"January 14, 2022 -- CDC updates guidelines on masks to emphasize fit, comfort, and consistent wear.","[-0.017453836277127266, 0.00806998461484909, -0.011373976245522499, -0.02336624450981617, 0.028209522366523743, 0.007715755142271519, -0.006736794486641884, -0.004096177406609058, -0.0029336614534258842, -0.014452549628913403, 0.025040781125426292, 0.012758690863847733, 0.0005514703807421029, -0.0009829859482124448, -0.001698689884506166, -0.015122365206480026, 0.03217688947916031, 0.005529195070266724, 0.016036920249462128, -0.020017167553305626, -0.0014837371418252587, -0.004527692683041096, -0.040369242429733276, -0.013950188644230366, -0.02171746827661991, 0.002260626060888171, 0.014220690354704857, -0.021305274218320847, 0.031687408685684204, -0.029136959463357925, 0.019257185980677605, 0.006607984192669392, -0.0016906391829252243, -0.017312144860625267, -0.009937738068401814, -0.020996129140257835, -0.017428074032068253, -0.014091880060732365, 0.029033910483121872, -0.00019693933427333832, 0.011554311029613018, -0.0033812783658504486, 0.003970586694777012, -0.0038031330332159996, -0.01606268249452114, 0.012005148455500603, -0.00025097941397689283, -0.002663159277290106, -0.03972519189119339, -0.0030238288454711437, 0.01945040188729763, -0.005145983770489693, -0.013834258541464806, -0.018729062750935555, 0.009016741998493671, -0.01687418855726719, -0.012713606469333172, 0.013409184291958809, -0.012835976667702198, -0.030167443677783012, -0.007065261714160442, -0.029033910483121872, -0.024847565218806267, 0.001006332808174193, 0.02109917625784874, -0.015856586396694183, -0.03655644878745079, 0.05404892936348915, 0.0010184088023379445, 0.031069118529558182, 0.025040781125426292, 0.017685696482658386, 0.024821802973747253, 0.008797764778137207, 0.023404886946082115, -0.007921852171421051, 0.0013500961940735579, 0.0005623387987725437, -0.005377842579036951, 0.005384283140301704, 0.008198794908821583, 0.006936450954526663, -0.014336620457470417, 
0.015096602961421013, 0.00923572015017271, 0.006247314158827066, 0.00014028280565980822, 0.0252468790858984, 0.004038212355226278, 0.008050662465393543, 0.022361520677804947, 0.006459851749241352, 0.0033200932666659355, 0.0019273286452516913, -0.029523391276597977, 0.03271789476275444, 0.024100463837385178, 0.029961347579956055, 0.022812359035015106, -0.007632028311491013, ...]",0.158555
240,"December 27, 2021 -- CDC shortens the recommended isolation period for people with COVID-19 to 5 days, followed by 5 days of wearing a mask around others if they are asymptomatic or if their symptoms are resolving (resolving is defined as without a fever for 24 hours).","[-0.0010809698142111301, 0.0178608987480402, 0.021124688908457756, -0.036312878131866455, 0.004307817202061415, -0.01029250305145979, -0.012823867611587048, 0.000991825945675373, -0.01363339088857174, 0.017501110211014748, 0.006501880940049887, 0.04014204815030098, -0.0004850711557082832, 0.025454992428421974, -0.0021346344146877527, -0.0023900195956230164, 0.022345397621393204, 0.0027514134999364614, 0.016999976709485054, -0.03289489075541496, -0.03240660950541496, -0.0044138263911008835, -0.04427960887551308, -0.00860278494656086, -0.014442913234233856, -0.005917225498706102, -0.003806684399023652, -0.00936733465641737, 0.011018503457307816, -0.02847464010119438, 0.02726678177714348, -0.03333177790045738, 0.007594094146043062, -0.004217870533466339, -0.0140317277982831, 0.007851085625588894, -0.001232755370438099, -0.007831810973584652, 0.004262844100594521, -0.0036942504812031984, 0.011731654405593872, 0.010504521429538727, -0.018593324348330498, 0.01205289363861084, 0.0036685513332486153, 0.0028654534835368395, -0.0012006314937025309, -0.03081326186656952, -0.024452727288007736, 0.006498668808490038, 0.01644744537770748, -0.019030209630727768, -0.011885849758982658, 0.019762635231018066, -0.012117141857743263, 0.0006545248324982822, -0.002483178861439228, -3.2098821975523606e-05, -0.01536808256059885, -0.025262249633669853, 0.0012680916115641594, -0.017192721366882324, -0.01676868461072445, 0.008731280453503132, 0.019454244524240494, -0.012984487228095531, -0.027960658073425293, 0.02541644312441349, -0.0024558736477047205, 0.04363713040947914, 0.027883561328053474, 0.02070065215229988, 0.012997337616980076, -0.0024285682011395693, 0.019402846693992615, -0.014532860368490219, 
-0.00175717833917588, 0.0025988249108195305, -0.0051623135805130005, 0.006970890332013369, 0.014378665946424007, -0.013671939261257648, -0.04237787425518036, 0.02031516656279564, 0.026418710127472878, 0.013877532444894314, -0.015702171251177788, 0.009444432333111763, -0.015046843327581882, -0.000595898658502847, 0.01036317553371191, 0.004680454730987549, 0.005194437690079212, 0.019479943439364433, -0.004995269235223532, 0.03533630818128586, -0.0013002156047150493, 0.027575170621275902, -0.015547975897789001, -0.01423732005059719, ...]",0.16813
175,"March 19, 2021 -- CDC updates its guidance on social distancing in K-12 schools: most elementary students can safely socially distance from at least 3 feet instead of 6 feet inside the classroom with universal masking, but middle and high school students should still maintain at least 6 feet apart in communities where the transmission level is high.","[0.012880055233836174, 0.0274911280721426, -0.006571456789970398, -0.03326118737459183, 0.021156884729862213, 0.010039903223514557, -0.023657243698835373, -0.013809675350785255, -0.005782881751656532, 0.006712502799928188, 0.017579447478055954, 0.021118417382240295, -0.021272286772727966, 0.003561408957466483, -0.004481412936002016, 0.007340798154473305, 0.009879624471068382, -0.011937611736357212, 0.020605523139238358, -0.008238363079726696, -0.007610067259520292, 0.0064656720496714115, -0.010815655812621117, -0.024977946653962135, -0.019592557102441788, 0.000976101728156209, 0.029158033430576324, -0.0016877424204722047, 0.013283959589898586, -0.04374987259507179, 0.040826376527547836, -0.020618345588445663, -0.0069304825738072395, -0.012642841786146164, -0.012290227226912975, 0.013425005599856377, -0.012963400222361088, -0.024375295266509056, 0.029081100597977638, 0.0010161716490983963, 0.007366442587226629, 0.001316695474088192, -0.01784871704876423, -0.0020002874080091715, 0.014271280728280544, 0.013809675350785255, -0.013360893353819847, -0.014309748075902462, -0.030491558834910393, 0.0028513711877167225, 0.021144062280654907, 0.013732741586863995, -0.01548940408974886, -0.004971867892891169, -0.011315727606415749, -0.02011827379465103, -0.00274077826179564, -0.0012942564208060503, -0.01495086494833231, -0.03515889495611191, -0.02015674114227295, -0.019605379551649094, -0.02964528277516365, 0.004439740441739559, -0.006440027616918087, 0.00043836425174959004, -0.006956127472221851, 0.01625874638557434, -0.011232382617890835, 0.028491271659731865, 0.021939048543572426, 0.026119135320186615, 
0.028516916558146477, 0.003085379023104906, 0.01670752838253975, 0.009219272993505001, 0.022169850766658783, 0.01414305716753006, -0.007340798154473305, -0.01216200366616249, -0.011084925383329391, -0.025349793955683708, -0.013976366259157658, -0.009091049432754517, 0.010148894041776657, 0.003824267303571105, 0.01718195527791977, 0.01225817110389471, 0.002373738447204232, 0.015425292775034904, 8.509836334269494e-05, 0.013873787596821785, 0.013950721360743046, 0.009982203133404255, -0.02123381942510605, 0.030286401510238647, 0.003574231406673789, 0.02537543885409832, 0.015745852142572403, -0.007629300933331251, ...]",0.168998
161,"January 30, 2021 -- As part of the Biden Administration’s Executive Order on Promoting COVID-19 Safety in Domestic and International Travel, CDC requires face masks to be worn by all travelers while on public transportation and inside transportation hubs to prevent the spread of COVID-19 effective February 2, 2021.","[-0.015050387009978294, -0.013441064395010471, -0.007383572869002819, -0.01060221903026104, 0.01618335023522377, -0.015217756852507591, -0.02757735550403595, -0.011670809239149094, 0.009462818503379822, 0.003836625488474965, 0.02331586927175522, 0.007351386360824108, -0.01570698991417885, -0.0013220587279647589, -0.006694782990962267, -0.011065703816711903, 0.015964481979608536, -0.012353162281215191, 0.015050387009978294, -0.003226692322641611, -0.0005564233870245516, -0.020161595195531845, -0.0342978872358799, -0.01932474784553051, -0.018281906843185425, 0.008407101966440678, 0.009366258978843689, -0.03138823062181473, 0.02610965259373188, -0.021217312663793564, 0.016376469284296036, -0.0076475017704069614, -0.006318201310932636, -0.013099887408316135, -0.011188012547791004, -0.02535005286335945, -0.009005770087242126, -0.02384372614324093, 0.020843949168920517, 0.003701442386955023, 0.027294114232063293, 0.009623750112950802, -0.001823362777940929, 0.01045416109263897, -0.017084570601582527, 0.013035514391958714, -0.0013140121009200811, -0.017792673781514168, -0.021899664774537086, 0.014458156190812588, 0.031001994386315346, 0.0012327412841841578, -0.01902863197028637, -0.0277833491563797, 0.004869810771197081, -0.0057903435081243515, -0.02404971979558468, 0.010801774449646473, -0.005297890864312649, -0.031001994386315346, -0.015192007645964622, -0.018925637006759644, -0.034993115812540054, 0.009353384375572205, 0.016981573775410652, -0.006804216653108597, -0.020316090434789658, 0.030847499147057533, -0.008716092444956303, 0.03846925124526024, 0.01897713541984558, 0.010917645879089832, 0.028118088841438293, -0.017895668745040894, 
0.030590007081627846, -0.029405545443296432, -0.017367811873555183, -0.0007676469977013767, -0.01199911069124937, -0.02567191794514656, 0.007724749390035868, -0.02196403779089451, -0.03424638882279396, 0.0284013282507658, 0.02141042985022068, 0.0165695883333683, 0.0008159266435541213, 0.01474139653146267, -0.01765105314552784, -0.00715826777741313, 0.010917645879089832, 0.02420421503484249, -0.006669033784419298, 0.021770918741822243, -0.005207768641412258, 0.030590007081627846, -0.002066370565444231, 0.037902772426605225, 0.021358931437134743, -0.001085488242097199, ...]",0.171808
136,"December 3, 2020 -- ACIP recommends that healthcare professionals and older people living in long-term care facilities be offered a vaccine first in the initial phases of the COVID-19 vaccination program. CDC also notes that people ages 70 years and older who live in multi-generational households should be given priority as soon as more vaccine doses are available","[-0.015077455900609493, -0.02451351471245289, -0.009689036756753922, -0.04280378296971321, 0.0022230392787605524, 0.031622182577848434, -0.007171910721808672, 0.020579716190695763, -0.017189817503094673, 0.01355959102511406, 0.02200903743505478, 0.015596060082316399, -0.020111707970499992, -0.0034752776846289635, -0.008860534988343716, 0.0021503083407878876, 0.0307114627212286, -0.006406654138118029, 0.01798669621348381, 0.003974908031523228, -0.017695773392915726, 0.034910887479782104, 0.011029817163944244, -0.014482959173619747, -0.005771047901362181, 0.010574457235634327, -0.0007407495868392289, -0.025044767186045647, -0.00018765787535812706, -0.016139961779117584, 0.006378193851560354, -0.014331172220408916, -0.01230735331773758, 0.0044365921057760715, 0.0007265196181833744, -0.004594702739268541, 0.009790227748453617, -0.0356445237994194, 0.0033709246199578047, -0.0038579062093049288, 0.027448052540421486, 0.000154256951645948, 0.003282382385805249, -0.007058070972561836, 0.01191523764282465, 0.017645176500082016, -0.03394957259297371, -0.013913759961724281, -0.030433187261223793, 0.017012733966112137, 0.0004905390669591725, 0.008828912861645222, 0.005138604436069727, -0.017708420753479004, 0.017796963453292847, -0.029598360881209373, -0.007671541068702936, 0.0199093259871006, -0.0025819512084126472, -0.007652567699551582, 0.00044310581870377064, -0.012610926292836666, -0.016076717525720596, -0.005167064256966114, 0.0009510371019132435, 0.017910802736878395, -0.0015313040930777788, 0.03066086769104004, 0.01850530132651329, 0.01860649138689041, 0.039236802607774734, 0.017075978219509125, 
0.017607230693101883, 0.033494215458631516, 0.028384070843458176, 0.010226613841950893, -0.010302506387233734, 0.0019589941948652267, 0.01191523764282465, 0.006141027435660362, 0.0008126900647766888, -0.012952445074915886, -0.00886685959994793, -0.00039369615842588246, -0.0019336964469403028, 0.02835877239704132, 0.0045124851167202, 0.00870874896645546, -0.0007901592762209475, 0.0036681729834526777, 0.007608296815305948, 0.005717290565371513, 0.015431624837219715, 0.04098234698176384, -0.012990391813218594, 0.013319263234734535, 0.023653391748666763, 0.011137331835925579, -0.015115402638912201, -0.02257823757827282, ...]",0.172726
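
The last column in the rows above is the distance between each row's embedding and the question's embedding, with smaller values ranked first. The following is a minimal sketch of that kind of ranking, assuming the column holds cosine distance. The helper names `cosine_distance` and `rank_by_distance` and the toy two-dimensional vectors are hypothetical and are not taken from this notebook.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means same direction, 1 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_by_distance(query_vec, rows):
    # rows is a list of (text, embedding) pairs; the closest rows come first.
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[0])

# Toy data standing in for real 1536-dimensional embeddings (assumption).
toy_rows = [
    ("orthogonal", [0.0, 3.0]),
    ("same direction", [2.0, 0.0]),
    ("diagonal", [1.0, 1.0]),
]
ranked_rows = rank_by_distance([1.0, 0.0], toy_rows)
for distance, text in ranked_rows:
    print(f"{distance:.4f}  {text}")
```

Because cosine distance ignores vector length, the row pointing in the same direction as the query ranks first even though its magnitude differs.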


In [13]:
import tiktoken

def create_prompt(question, df, max_token_count):
    """Build the final prompt: the template plus the most relevant rows that fit the token budget."""
    tokenizer = tiktoken.get_encoding("cl100k_base")
    prompt_template = get_prompt_template()
    # Tokens already spent on the fixed template and on the question itself.
    current_token_count = len(tokenizer.encode(prompt_template)) + len(tokenizer.encode(question))
    context = format_context(question, df, current_token_count, max_token_count, tokenizer)
    return prompt_template.format(context, question)

def get_prompt_template():
    # Two positional placeholders: the context first, then the question.
    return """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

{}



Question: {}
Answer:"""

def format_context(question, df, current_token_count, max_token_count, tokenizer):
    """Join the most relevant rows, in order, until the token budget is used up."""
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        current_token_count += len(tokenizer.encode(text))
        # Stop at the first row that would push the prompt over the budget.
        if current_token_count > max_token_count:
            break
        context.append(text)
    # Note: the separator's own tokens are not counted against the budget.
    return "\n\n###\n\n".join(context)
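
The budget loop in `format_context` can be tried without `tiktoken` or the dataframe. This is a minimal sketch of the same greedy selection, assuming a crude whitespace word count in place of `len(tokenizer.encode(text))`. The names `select_within_budget` and `word_count` are hypothetical and exist only for this illustration.

```python
def select_within_budget(texts, used_tokens, max_tokens, count_tokens):
    # Mirrors format_context: take texts in ranked order and stop at the
    # first one that would exceed the budget.
    selected = []
    for text in texts:
        used_tokens += count_tokens(text)
        if used_tokens > max_tokens:
            break
        selected.append(text)
    return selected

def word_count(text):
    # Stand-in for a real tokenizer (assumption); real token counts differ.
    return len(text.split())

ranked = ["alpha beta gamma", "delta epsilon zeta eta theta", "iota"]
chosen = select_within_budget(ranked, used_tokens=2, max_tokens=6, count_tokens=word_count)
print(chosen)
```

With these toy inputs only the first text is kept. The third text would have fit in the remaining budget, but the loop stops at the first overflow, so a shorter, lower-ranked row is never used to fill leftover space.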

In [14]:
custom_covid_prompt = create_prompt("When did the New York Department of Health say that everyone should wear a mask?", df, 400)
print(custom_covid_prompt)


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

July 8, 2022 -- New York Department of Health recommends that all people should wear N95, KN95, or KF94 masks in all public indoor settings and when in crowded outdoor areas due to high community transmission of COVID-19.

###

April 3, 2020 -- At a White House press briefing, CDC announces new mask wearing guidelines and recommends that all people wear a mask when outside of the home.

###

May 3, 2022 -- CDC recommends that everyone continue to wear a mask while in indoor transportation hubs to prevent the spread of COVID-19 – but this is no longer legally enforceable.

###

July 27, 2021 -- Amid a Delta variant surge, CDC releases updated masking guidance recommending that everyone in areas with substantial or high transmission wear a mask indoors.

###

July 14, 2020 -- CDC again calls on all people to wear cloth face masks when leaving their h

In [35]:
import openai

def complete_prompt(query, max_prompt_tokens=1800, max_answer_tokens=200):
    """Answer `query` using context retrieved from the global `df`."""
    try:
        prompt = create_prompt(query, df, max_prompt_tokens)
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        # Report the failure and return an empty answer instead of raising.
        print(e)
        return ""

In [36]:
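def basic_complete_prompt(query, max_prompt_tokens=1800, max_answer_tokens=200):
    """Baseline: send `query` to the model with no retrieved context."""
    # max_prompt_tokens is unused here; it is kept so the signature matches complete_prompt.
    response = openai.Completion.create(
        model=COMPLETION_MODEL_NAME,
        prompt=query,
        max_tokens=max_answer_tokens
    )
    return response["choices"][0]["text"].strip()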
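
Both completion helpers call `openai.Completion.create`, which belongs to the pre-1.0 `openai` package. As a hedged sketch, assuming the 1.x client interface (`client.completions.create`, with the text at `response.choices[0].text`), the same request can be written with the client passed in as an argument. The function name `complete_with_client` and the stub client are hypothetical; the stub exists only so the example runs without an API key.

```python
from types import SimpleNamespace

def complete_with_client(client, prompt, model="gpt-3.5-turbo-instruct", max_tokens=200):
    # With the real library: from openai import OpenAI; client = OpenAI()
    # (the client reads OPENAI_API_KEY from the environment).
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
    )
    return response.choices[0].text.strip()

class StubCompletions:
    # Hypothetical stand-in that echoes the prompt instead of calling the API.
    def create(self, model, prompt, max_tokens):
        return SimpleNamespace(choices=[SimpleNamespace(text=f"  echo: {prompt}  ")])

stub_client = SimpleNamespace(completions=StubCompletions())
stub_answer = complete_with_client(stub_client, "hello")
print(stub_answer)
```

Passing the client in also makes the helper easy to exercise offline, as the stub shows.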

## Custom Performance Demonstration


### Question 1

In [46]:
print("Basic Completion Model:")
print(basic_complete_prompt("Has the US gone over a million covid deaths? If so when?"))

Basic Completion Model:
As of early December 2021, the US has recorded over 925,000 deaths due to Covid-19. However, the actual number of deaths is likely higher due to unreported or misreported cases. It is possible that the US will reach a million deaths by the end of 2021 if the current trends continue.


In [47]:
print("Custom Query:")
print(complete_prompt("Has the US gone over a million covid deaths? If so when?"))

Custom Query:
Yes, the US has gone over a million COVID-19 deaths. According to the context, the US recorded over 1 million deaths on June 1, 2022.


### Question 2

In [48]:
print("Basic Completion Model:")
print(basic_complete_prompt("What are some updates on omnicron variants"))

Basic Completion Model:
1. Increased transmissibility: The Omicron variant is reported to be more transmissible than previous variants of COVID-19. According to preliminary data, it has a higher reproduction number (R0) than the Delta variant, meaning that it can spread more easily from person to person.

2. Different mutations: The Omicron variant has a significantly higher number of mutations compared to other variants. Some of these mutations are in the spike protein, which is the part of the virus that allows it to enter and infect human cells.

3. Vaccine effectiveness: Initial studies suggest that current COVID-19 vaccines may have reduced effectiveness against the Omicron variant. However, health experts still recommend getting fully vaccinated as it can still provide protection against severe illness and hospitalization.

4. Geographic spread: The Omicron variant was first identified in South Africa and has since been reported in multiple countries around the world. The variant

In [49]:
print("Custom Query:")
print(complete_prompt("What are some updates on omnicron variants"))

Custom Query:
On March 14, 2022, CDC estimated that 23% of all current COVID-19 infections in the U.S. are caused by the Omicron BA.2 subvariant, and on March 26, 2022, CDC estimated that about 55% of all current COVID-19 cases in the U.S. are caused by the Omicron subvariant BA.2. On June 30, 2022, the FDA called for Omicron-specific updates to COVID-19 vaccine boosters from Pfizer-BioNTech and Moderna in fall 2022. Additionally, on December 2, 2021, a second case of the Omicron variant was detected in the U.S. by the Minnesota and the New York City Departments of Health, and on July 6, 2022, CDC data showed that Omicron subvariants BA.4 and BA.5 had become dominant in the U.S., making up over 70
