### Name: Syed Asad Rizvi

### ERP ID: 25365

### Project: Healthcare Q&A Bot Using OpenAI Word Embeddings

_______________________________________________________________________________________________________________________________

Importing libraries

In [2]:
import pandas as pd
import openai
from getpass import getpass
from youtube_transcript_api import YouTubeTranscriptApi
from openai.embeddings_utils import get_embedding, cosine_similarity

import warnings
warnings.filterwarnings('ignore')

Initializing OpenAI Models & API Key

In [10]:
# Text generation model
COMPLETIONS_MODEL = "text-davinci-003"

# Word embedding model
EMBEDDINGS_MODEL = "text-embedding-ada-002"

# Entering OpenAI API key
openai.api_key = getpass("Enter your OpenAI API Key")

Fetching YouTube Healthcare podcast episodes

In [3]:
# get_transcript expects a bare video ID, so the playlist parameters
# (&list=..., &index=...) that followed each ID have been dropped

# Flu & Covid
flu_covid_ep = YouTubeTranscriptApi.get_transcript("FMTQCsv-row")

# Back pain
back_pain_ep = YouTubeTranscriptApi.get_transcript("r7hf-9mac3Y")

# Anxiety
anxiety_ep = YouTubeTranscriptApi.get_transcript("PIOWM1KunQ4")

# Exercise
exercise_ep = YouTubeTranscriptApi.get_transcript("4cwSdRer9bU")

# Better Sleep
better_sleep_ep = YouTubeTranscriptApi.get_transcript("8lLtuVUMmMc")

# Stress & Burnouts
stress_burnout_ep = YouTubeTranscriptApi.get_transcript("OfeIOI8ov2A")

# Head Migraine
head_mig_ep = YouTubeTranscriptApi.get_transcript("1IBPG58aizU")

# Seasonal disorders
seasonal_dis_ep = YouTubeTranscriptApi.get_transcript("3Lj7XGQspbI")

# Suicide & Depression
suicide_dep_ep = YouTubeTranscriptApi.get_transcript("xPVIw2IZxBA")

# Diabetes
diabetes_ep = YouTubeTranscriptApi.get_transcript("3U-F1bWKbqQ")
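
The `url` column of the spreadsheet holds full playlist URLs, while `get_transcript` is meant to receive only the video ID. A minimal sketch of pulling the ID out of such a URL with the standard library is shown below; the helper name `video_id_from_url` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url: str) -> str:
    """Return the value of the 'v' query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"no video ID found in {url!r}")
    return query["v"][0]


playlist_url = (
    "https://www.youtube.com/watch?v=FMTQCsv-row"
    "&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=25"
)
print(video_id_from_url(playlist_url))  # FMTQCsv-row
```

With a helper like this, the ten episode IDs could be derived from the spreadsheet instead of being typed by hand.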

In [4]:
# Putting each transcript in a list

transcripts = [flu_covid_ep, back_pain_ep, anxiety_ep, exercise_ep, better_sleep_ep, stress_burnout_ep, head_mig_ep,
              seasonal_dis_ep, suicide_dep_ep, diabetes_ep]

for transcript in transcripts:
    # loop through each dictionary in the transcript and convert the 'start' values to integers
    for caption in transcript:
        caption['start'] = int(caption['start'])

Loading the Excel sheets and attaching to each 'start' to 'end' range the transcript text spoken in that range

In [5]:
# Load the excel file
qna_bot = pd.read_excel('Healthcare Bot.xlsx', sheet_name=None)

# Iterate through the sheets and transcripts together;
# the sheets are assumed to be in the same order as the transcripts list
for each_sheet, transcript in zip(qna_bot, transcripts):
    for i, row in qna_bot[each_sheet].iterrows():
        # Collect every caption whose start time falls inside this segment
        text = [t['text'] for t in transcript if row['start'] <= t['start'] <= row['end']]
        context = ' '.join(text)
        qna_bot[each_sheet].at[i, 'context'] = context
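
The matching step above can be tried in isolation. The sketch below uses made-up captions in the same dictionary shape the transcript API returns (`text`, `start`, `duration`); the function name `segment_text` and the toy data are assumptions for illustration only.

```python
def segment_text(transcript, start, end):
    """Join the text of every caption whose start time lies in [start, end]."""
    return " ".join(c["text"] for c in transcript if start <= c["start"] <= end)


# Toy captions (made-up text), with 'start' already converted to integers
toy_transcript = [
    {"text": "welcome back", "start": 0, "duration": 2.0},
    {"text": "today we talk about sleep", "start": 3, "duration": 4.0},
    {"text": "first question", "start": 8, "duration": 2.5},
]

print(segment_text(toy_transcript, 3, 8))  # today we talk about sleep first question
```

Both bounds are inclusive, so a caption that starts exactly on a boundary shared by two consecutive segments (for example 320 in Episode 1) ends up in both contexts.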

Dropping the leftover 'Unnamed: 0' index column from each episode's dataframe

In [6]:
for sheet_name in qna_bot.keys():
    df = qna_bot[sheet_name]
    df = df.drop('Unnamed: 0', axis=1)
    qna_bot[sheet_name] = df

Merging the dataframes of all episodes into a single dataframe

In [7]:
# Create an empty list to store all the dataframes
dfs = []

# Loop through each sheet in the qna_bot dictionary and append its dataframe to the list
for sheet_name in qna_bot.keys():
    dfs.append(qna_bot[sheet_name])

# Concatenate all the dataframes in the list into a single dataframe
merged_df = pd.concat(dfs, ignore_index=True)

In [8]:
merged_df

| | episode | url | start | end | context |
|---|---|---|---|---|---|
| 0 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 90 | 320 | finally i feel like i i haven't seen you throu... |
| 1 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 320 | 466 | yeah absolutely um and i think what's interest... |
| 2 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 466 | 744 | yeah and i think that's the thing that we need... |
| 3 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 744 | 895 | got i'm going to get my flu shot my both of my... |
| 4 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 895 | 960 | maybe have not gotten the flu shot in their li... |
| ... | ... | ... | ... | ... | ... |
| 133 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 990 | 1447 | want to talk about another thing and that is f... |
| 134 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1447 | 1624 | being utilized what what's the holdup here pro... |
| 135 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1624 | 1712 | first discovered in the 1920s is first deliver... |
| 136 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1712 | 1763 | in fact I have patients that choose to use vio... |


Generating an embedding for each context and storing it in a new 'embedding' column

In [11]:
# One API call per row; reuse the model name defined at the top of the notebook
merged_df['embedding'] = merged_df['context'].apply(lambda text: get_embedding(text, engine=EMBEDDINGS_MODEL))

In [12]:
merged_df

| | episode | url | start | end | context | embedding |
|---|---|---|---|---|---|---|
| 0 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 90 | 320 | finally i feel like i i haven't seen you throu... | [-0.0333162285387516, -0.00646474352106452, 0.... |
| 1 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 320 | 466 | yeah absolutely um and i think what's interest... | [-0.030882058665156364, 0.0020020385272800922,... |
| 2 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 466 | 744 | yeah and i think that's the thing that we need... | [-0.01247295830398798, 0.011332061141729355, 0... |
| 3 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 744 | 895 | got i'm going to get my flu shot my both of my... | [-0.021822135895490646, 0.013645283877849579, ... |
| 4 | Episode 1 | https://www.youtube.com/watch?v=FMTQCsv-row&li... | 895 | 960 | maybe have not gotten the flu shot in their li... | [-0.013866368681192398, 0.014302095398306847, ... |
| ... | ... | ... | ... | ... | ... | ... |
| 133 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 990 | 1447 | want to talk about another thing and that is f... | [-0.004034609533846378, -0.007231032010167837,... |
| 134 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1447 | 1624 | being utilized what what's the holdup here pro... | [0.004235747270286083, -0.0019685362931340933,... |
| 135 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1624 | 1712 | first discovered in the 1920s is first deliver... | [-0.01571071334183216, 0.013606120832264423, 0... |
| 136 | Episode 10 | https://www.youtube.com/watch?v=3U-F1bWKbqQ&li... | 1712 | 1763 | in fact I have patients that choose to use vio... | [-0.018342748284339905, -0.0028241740074008703... |


Saving the dataframe (contexts and their embeddings) as a pickle file

In [13]:
import pickle

# Specify the file path where you want to save the pickle file
file_path = 'model.pkl'

# Save the DataFrame as a pickle file
with open(file_path, 'wb') as f:
    pickle.dump(merged_df, f)
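
Saving the embeddings pays off only if they are loaded back later instead of being recomputed through the API. The round trip can be sketched as follows; a small list of dictionaries stands in for `merged_df`, and a temporary directory is used so the example does not overwrite the real `model.pkl`.

```python
import os
import pickle
import tempfile

# Stand-in rows (made-up values); in the notebook this would be merged_df
records = [{"context": "example text", "embedding": [0.1, 0.2, 0.3]}]

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Write the object to disk
with open(path, "wb") as f:
    pickle.dump(records, f)

# Read it back in a later session
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == records)  # True
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.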

Calculating the cosine similarity between the question and each context and storing it in a new 'similarities' column

In [192]:
# Example question
question = "How good is morning walk for a person's health?"

# Generate the embedding of the question with the same model used for the contexts
question_vector = get_embedding(question, engine=EMBEDDINGS_MODEL)

# Compute the cosine similarity between the question vector and each context embedding
merged_df["similarities"] = merged_df['embedding'].apply(lambda x: cosine_similarity(x, question_vector))

# Sort by similarity, highest first, and keep the five most similar contexts
sorted_merged_df = merged_df.sort_values("similarities", ascending=False).head(5)

sorted_merged_df

Unnamed: 0,episode,url,start,end,context,embedding,similarities
49,Episode 4,https://www.youtube.com/watch?v=4cwSdRer9bU&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=39,151,235,both audiences today what do you think sure let's just start off with some of the facts how much exercise is really recommended for an adults on a weekly basis sure most of the institutional guidelines suggest that we do 30 minutes of moderate-intensity exercise 5 days a week and most of the health benefits attributed to exercise can be had from that volume of exercise so think 150 minutes of exercise accumulated over the course of the week there seems to be added benefit working up to about an hour a day of exercise ok so when you say that I'm thinking of the treadmills in the gym that's a 20 minute limit so are you talking about 30 minutes of cardio exercise or just exercise in general most of the conventional exercise recommendations are focused on aerobic exercise walking cycling swimming for example but it doesn't have to happen all at the same time there can be an accumulation of activity throughout the course of the day so imagine 10 minute walk before work in the morning 10 minute walk during a lunch break and it walk in the evening before or after dinner that's your thirty minutes so your exercise physiologist right what exactly is that and how is that different than being a trainer okay is it an exercise physiologist is supposed to know the the the physiologic effects of exercise on the human body and most,"[0.012411648407578468, 0.007749965880066156, 0.014446664601564407, -0.05099893733859062, -0.01690428890287876, 0.00869920663535595, -0.010818745009601116, -0.019010823220014572, -0.01116333156824112, 0.0007074607419781387, -0.013146335259079933, 0.03302837908267975, -0.012756235897541046, -0.00398876192048192, -0.009167325682938099, 0.012359635904431343, 0.04025821387767792, -0.0020268892403692007, 0.005594669375568628, -0.008010031655430794, 0.000450239225756377, 0.010896764695644379, -0.007847490720450878, -0.011403893120586872, 
-0.003975758794695139, 0.01962197758257389, 0.03063577227294445, -0.013094321824610233, 4.4571854232344776e-05, -0.032248180359601974, 0.011436400935053825, -0.011735477484762669, -0.00822458602488041, -0.010415642522275448, 0.0012255609035491943, 0.015694981440901756, -0.011410394683480263, 0.0121125727891922, 0.013627457432448864, -0.01258719339966774, 0.0036929368507117033, -0.012782243080437183, -0.009264850057661533, 0.004525147844105959, 0.00283309374935925, 0.008263596333563328, 0.013913529925048351, -0.02800910547375679, -0.0012873265659436584, 0.04098639637231827, 0.007847490720450878, -0.0025177637580782175, -0.017684485763311386, 0.0010110065340995789, -0.014095575548708439, 0.0037579534109681845, -0.023522967472672462, 0.0314679816365242, -0.002647796645760536, -0.013913529925048351, -0.005838481243699789, -0.009316863492131233, -0.019595971331000328, 0.009453398175537586, -0.017372407019138336, -0.002025263849645853, 0.007333860732614994, 0.012886269018054008, -0.005324850790202618, 0.0015238240594044328, 0.01012306846678257, 0.012697720900177956, 0.0006591047276742756, 0.0112283481284976, 0.03235220909118652, -0.036981381475925446, -0.00015177288150880486, -0.014537688344717026, -0.0021861796267330647, 0.008647194132208824, -0.007288348861038685, -0.020220128819346428, -0.005887243431061506, 0.028399204835295677, 0.03042771853506565, -0.007743464317172766, 0.0006176566821523011, -0.012301120907068253, -0.00674221059307456, 0.025304419919848442, 0.007457391824573278, -0.002524265320971608, 0.024810293689370155, 0.01973900757730007, 0.013666466809809208, 0.01096828281879425, -0.03128593787550926, 0.003063902258872986, 0.009752473793923855, -0.006885246839374304, ...]",0.809692
59,Episode 4,https://www.youtube.com/watch?v=4cwSdRer9bU&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=39,1205,1425,what about heart rate is there a certain link is there a benefit to a heart rate goal so like during aerobic exercise yeah me personally that's that's another obstacle to exercise is just complicated complicating things there are people who have certain health conditions that they're perhaps do need to monitor their heart but my preference is to go by what we like to call perceived exertion how does the effort feel to you okay most of the health benefits attributed to exercise can be head from moderate intensity exercise the way I like to define that to my clients if you can imagine you have an appointment somewhere you're running a bit late you drive your car you park your car in the parking lot so you're not too late you're gonna walk a little bit faster than the otherwise would it's that intensity of exercise where we get the most health benefit so you some people say like if you can't talk while you're exercising that that's an indication yeah and if you can sing to the person you could probably pick it up a little bit that's good to know okay so I'm gonna ask some like real quick questions so just like a quick answer sure um what's the best exercise to help improve posture strength training what's the best full-body workout I would say cross-country skiing cross-country skiing what if you're not a skier learn how about rowing or or swimming okay what's the best exercise for weight loss whatever you want to do most often okay very good so for people that are older so 65 and older many people kind of stop their exercising and their mode of exercising might be you know walking or something like that what's the best physical activity for people who are 65 and older and you know kind of what's the best exercise for seniors like for joint protection and that kind of thing this is the population that's starting to see the consequences of that age 
related loss of muscle mass and muscle strength and so I have really advocate strength training for those folks so now my last question is really about children because the evidence shows that kids are sitting and watching screen time whether it's TV video games computer for seven or more hours a day and that was according to the HHS Health and Human Services what's the best exercise for kids or how do you really get kids involved in exercise I don't think children should be exercising I think they should be playing okay so we do need to create those opportunities for them the data that most familiar with finds that we call it screen time so time spent in front of computers televisions and so on there's no correlation between body weight and screen time up to 14 hours per week think 2 hours a day once we start getting more than that then we see a body weight increase so we really need to I think if we just limit our children to two hours of screen time whatever else they're going to be doing is going to be somewhat more physically act so than what they're doing in front of the screens that might be a good place to start I also encourage parents to incorporate their children into some of their physical activity so it becomes a family thing as opposed to just making the child do something yeah that's it that's a great idea especially instead of doing the family going out to dinner maybe everyone go rollerskating or walk or ride your bicycles to dinner that's all okay yeah that's awesome all right so,"[-0.011920389719307423, 0.004172797314822674, 0.01776164583861828, -0.036897704005241394, -0.007176022045314312, 0.0247658658772707, 0.012217739596962929, -0.03991084173321724, -0.013268372043967247, -0.013228725641965866, -0.0032394519075751305, 0.027990451082587242, -0.009792692959308624, 0.005487740505486727, -0.014246320351958275, 0.003723470028489828, 0.052148401737213135, -0.016294723376631737, 0.020219730213284492, -0.006105565931648016, -0.021726299077272415, 
0.010539369657635689, -0.004295040853321552, -0.009984318166971207, -0.0250698234885931, 0.019241781905293465, 0.03155863657593727, -0.018303481861948967, -0.00177583412732929, -0.024197598919272423, 0.035629015415906906, -0.006587931886315346, -0.024105090647935867, -0.010955657809972763, -0.021039092913269997, 0.014854233711957932, -0.014352044090628624, -0.011087813414633274, 0.004863307811319828, -0.0014718774473294616, 0.01163625717163086, 0.0009333454072475433, -0.02134304866194725, -0.0010778900468721986, -0.0002529531193431467, 0.024197598919272423, -0.002121089491993189, -0.02849264070391655, -0.0012893382227048278, 0.01587182842195034, 0.011173713952302933, 0.013037100434303284, -0.016717620193958282, -0.0029024563264101744, -0.0008755275630392134, -0.004483361728489399, 0.004784014541655779, 0.023285729810595512, -0.002215249929577112, -0.010248628444969654, 0.011814665980637074, 0.008484357967972755, -0.014629569835960865, 0.010010749101638794, -0.016360800713300705, 0.012310247868299484, 0.0059998417273163795, 0.014946741983294487, -0.008616512641310692, 0.015885043889284134, 0.03422817215323448, 0.027964020147919655, 0.0006525157950818539, 0.029206277802586555, 0.022228488698601723, -0.003267534775659442, 0.010195765644311905, -0.023999366909265518, 0.002155780093744397, 0.017021577805280685, -0.012904945760965347, -0.02306106500327587, -0.021210893988609314, 0.028175467625260353, 0.03607834130525589, -0.008722236379981041, 0.004073680844157934, 0.0038159785326570272, 0.013849854469299316, 0.013493035919964314, -0.01100191194564104, -0.00973983108997345, 0.020193299278616905, 0.02991991490125656, 0.0071958452463150024, 0.014497414231300354, -0.01429918222129345, 0.016730835661292076, -0.019598601385951042, -0.001266211038455367, ...]",0.804405
52,Episode 4,https://www.youtube.com/watch?v=4cwSdRer9bU&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=39,408,506,started so let's break down the exercise recommendations into a few components one is the frequency think 5 days a week that is correct I encourage our patients to don't worry about the 30 minutes don't worry about the moderate intensity but let's just focus on the 5 days a week just kind of checking days off on the calendar can I go for a short walk during my lunch break at work or maybe park in a different parking structure so that requires that I do that extra five-minute walk before and after work so now we're establishing the habit of dedicating time to exercise and then over time progressively increasing the volume of exercise and then once we're at that point then we start focusing on the intensity of exercise and I think once we get to the point where we're routinely accumulating 30 minutes of exercise throughout the course of the day the rest of its going to fall into place pretty well and I don't mean to suggest that this is easy but I think it's very doable and again if you look at all of the benefits associated with physical activity it's kind of hard to to say I don't have enough time to do such a thing I like the way that you said that you break it up because I think even when I'm talking to my patients and we say okay at least thirty minutes five days a week people think of I have to physically go to a gym and work out on some kind of equipment for 30 minutes and then you know drive back home and then you know eat my dinner get my kids ready for bed and it just seems like very overwhelming so what you're suggesting is just kind of in like weaving it into your daily routine which i think is pretty doable yeah I mean if,"[0.008435215801000595, 0.009937926195561886, 0.0279704537242651, -0.03913726285099983, -0.024844815954566002, 0.007767344359308481, -0.021478744223713875, -0.04346507042646408, -0.009784315712749958, -0.007326549384742975, 
-0.009303448721766472, 0.03972499072551727, 0.004184214398264885, 0.0034729312174022198, -0.002658128272742033, 0.008535396307706833, 0.045335110276937485, -0.0033961262088268995, 0.014626383781433105, -0.019381627440452576, -0.011574211530387402, 0.007406693883240223, -0.012242082506418228, -0.021799322217702866, -0.010265183635056019, 0.02388308197259903, 0.015507973730564117, -0.004030603915452957, -0.01880725845694542, -0.01337746437638998, 0.031817395240068436, -0.023549145087599754, -0.0003894525289069861, -0.00997799914330244, -0.0013265595771372318, 0.014987034723162651, -0.0007956018089316785, -0.0015945429913699627, 0.02122495323419571, -0.02839789167046547, 0.010726015083491802, -0.0072464048862457275, -0.01601555570960045, 0.01837982051074505, -0.021051306277513504, 0.006441619712859392, 0.0044380053877830505, -0.01630941964685917, -0.022280190140008926, 0.010512296110391617, -0.008635577745735645, -0.014065371826291084, -0.030080927535891533, 0.007239725906401873, -0.005359668284654617, 0.02436394989490509, -0.0029403038788586855, 0.020396793261170387, -1.7531623598188162e-05, -0.02215997315943241, -0.020169716328382492, 0.006785573437809944, -0.007607055362313986, 0.01902097836136818, -0.015868624672293663, -0.004965623840689659, 0.017979098483920097, 0.0011462343391031027, 0.009944605641067028, -0.004301091656088829, 0.013397500850260258, 0.02927948348224163, -0.0036565959453582764, 0.02181268110871315, 0.02960006147623062, -0.014773315750062466, 0.011273669078946114, -0.024190302938222885, 0.01214190199971199, -0.013497681356966496, -0.007326549384742975, 0.012555982917547226, -0.02218668907880783, 0.030668655410408974, 0.03742751479148865, -0.018473323434591293, 0.0006895772530697286, -0.012442444451153278, -0.006561836693435907, 0.01971556432545185, -0.0032274886034429073, -0.016376206651329994, 0.012502552941441536, 0.022948062047362328, 0.015280897729098797, 0.018446607515215874, -0.01417223084717989, -0.002449418418109417, 0.00659523019567132, 
-0.01347096636891365, ...]",0.799465
51,Episode 4,https://www.youtube.com/watch?v=4cwSdRer9bU&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=39,347,408,yeah and I think let's talk a little bit about some of the outcomes since you brought it up so like you said most of the time we think of exercise it's looking better and maybe feeling better but there are so many more benefits to exercising like improved sleep quality sleep quality stress management reduced prevalence of heart disease and people who are regularly physically active I think the biggest thing and a component that we haven't talked about so far with regard to exercise appropriately programmed exercise delays the age-related loss of muscle mass and muscle tissue and if we remain strong as we age that gives us the ability to remain healthy and physically active being able to do the things that we really want to be doing right and also things like osteoporosis right right and then even cancer prevention some forms of cancer there there seems to be a decreased prevalence and folks that are physically active for someone who's listening and they're like okay I get it there's a lot of benefits how do I get started so let's break down the exercise,"[-0.0007731420919299126, -0.021638160571455956, 0.025015441700816154, -0.03673120588064194, -0.016572238877415657, 0.01777654141187668, -0.0036979918368160725, -0.006469849497079849, -0.011676491238176823, -0.0014513799687847495, -0.016035540029406548, 0.03251614794135094, 0.012959334068000317, 0.00030946137849241495, -0.013574575074017048, 0.006198226939886808, 0.05356524884700775, -0.006394580472260714, 0.017737271264195442, -0.017645638436079025, -0.010014031082391739, -0.017554007470607758, 0.002308790571987629, -0.01259935274720192, -0.010118752717971802, 0.021638160571455956, 0.0281178280711174, -0.02696588821709156, -0.003508183406665921, -0.011375416070222855, 0.0018555410206317902, -0.0014947414165362716, -0.025800855830311775, -0.01704348810017109, -0.01460870448499918, 0.007107998710125685, 
-0.013928011991083622, -0.014713426120579243, 0.011309964582324028, -0.038485296070575714, 0.011159426532685757, -0.002321880776435137, -0.009261342696845531, -0.0019569904543459415, 0.0034787303302437067, 0.021834515035152435, 0.0003299148811493069, -0.014163636602461338, -0.0033543731551617384, 0.014098185114562511, 0.016493698582053185, 0.02030295692384243, -0.017985984683036804, -0.0008349116542376578, 0.011663400568068027, 0.010426373220980167, -0.01653296872973442, 0.015839185565710068, -0.001336840447038412, -0.0042248740792274475, -0.019216466695070267, 0.002356242621317506, -0.009025718085467815, 0.005805519875138998, -0.01765872910618782, 0.012507720850408077, -0.006839648820459843, 0.006214589811861515, 0.0206694845110178, 0.007441799622029066, 0.029322130605578423, 0.033144477754831314, -0.021022919565439224, 0.01822160929441452, 0.01655915006995201, -0.020054243505001068, -0.007055637426674366, -0.023038817569613457, 0.025656864047050476, -0.0016870042309165, 0.010675088502466679, -0.028196370229125023, -0.007121088914573193, 0.018941571936011314, 0.019164105877280235, -0.0037568979896605015, -0.003841984551399946, -0.007042547222226858, 0.01027583610266447, 0.010969618335366249, -0.01006639190018177, 0.010374012403190136, -0.004705939907580614, 0.03597196936607361, 0.003665266325697303, 0.011781212873756886, 0.0031465657521039248, -0.011552133597433567, -0.023457704111933708, -0.0017393651651218534, ...]",0.798051
53,Episode 4,https://www.youtube.com/watch?v=4cwSdRer9bU&list=PL_OlobI2SUirmdJ68fTIBmkSsrQbr-0pY&index=39,506,698,you could imagine signing up for a marathon but without preparing for that marathon is kind of overwhelming we can train for it so that by the time of that event were prepared for it so many of us have jobs now that requires sitting like sitting all day long and there's a lot of evidence that sitting is the new smoking so I mean why sitting so dangerous and then for those that are sitting most of the day and people who are in their car right now listening to the podcast or at their desk sitting what are some things that they can do to be more active exercise the studies that find that they're that sitting is harmful and found that the consequences of sitting or or just a sedentary lifestyle are mitigated by exercise so if you do have a sedentary occupation where maybe you are sitting at your desk eight hours a day as long as you're exercising sitting isn't you know like if you're supposed to be you know emailing or sitting in meetings I mean people say do walking meetings or you know get the stand desk and stuff but I don't see a lot of people actually enforcing it or doing it it's not practical I don't think it's going to happen but again as long as we exercise during our non-work hours we're going to be okay so when I started doing an exercise program the people in my class were saying you need to measure your waist you have to weigh yourself every day you need to take pictures of your yourself and all of those things and I kind of found that very one takes up a lot of time to kind of discouraging if the way doesn't change so what's the recommendation should you weigh yourself daily or not and then you know what about like waist measurements yeah I think may if one of your goals with exercise is weight loss then yes weighing yourself would be a measure of progress I'm reminded of a woman in our program and I was counseling her earlier this year she 
she started exercising and January like a lot of us do and she stopped exercising toward the end of February and I asked her why did you start exercising and she said to lose weight and then I asked her why did you stop exercising and she said well I didn't lose weight well that made sense to me I continued doing something I forget not getting the desired outcome right and then I asked her do you feel that you did get any benefit from your exercise and her routine was stopping at the gym on the way home from work mm-hmm and and she paused for a moment she said yeah you know it reduced my stress apparently she had a an occupation that was very stressful and I don't think she's really fond of it she's exercising now but she's got a more realistic goal in mind and something that's easy for her to measure is stress her measurement of progress is if she's feeling stressed then she's not exercising enough if she is exercising then she's not feeling stressed so yeah if it's to mood is just amazing yeah oh the way we measure progress should be dependent on what our goals are so let's pretend the goal is,"[-0.0023253774270415306, -0.010357576422393322, 0.023815656080842018, -0.03753098472952843, -0.025738239288330078, -0.0037910083774477243, -0.026916159316897392, -0.014988022856414318, 0.0036826939322054386, -0.006360093597322702, 0.008929179050028324, 0.024709250777959824, -0.00972799863666296, 0.019022738561034203, -0.012435861863195896, -0.007954347878694534, 0.06006040424108505, -0.003777469042688608, 0.009484291076660156, -0.037910085171461105, -0.014974483288824558, -0.002308453433215618, 0.0031123501248657703, -0.008069432340562344, -0.026753688231110573, 0.016517965123057365, 0.04273008182644844, -0.012374934740364552, -0.0289741363376379, -0.018454087898135185, 0.011400103569030762, -0.001678028958849609, -0.009193195030093193, -0.025995485484600067, -0.023247005417943, 0.002215370535850525, -0.0035439159255474806, 0.0044036624021828175, 0.001146610826253891, 
-0.04857906326651573, -0.0021645980887115, 0.005788057576864958, -0.020945321768522263, 0.000301249761832878, -0.013383613899350166, 0.016125325113534927, -0.0063194758258759975, -0.029759416356682777, -0.0075955563224852085, 0.00858392659574747, 0.022664815187454224, 0.012869119644165039, -0.028486719354987144, 0.0022661429829895496, -0.004271653946489096, 0.0013395460555329919, -0.008299600332975388, 0.008495920337736607, -0.0036691545974463224, -0.023869814351201057, -0.01766880787909031, 0.011359485797584057, -0.02166290581226349, 0.015326505526900291, -0.010966845788061619, -0.008042353205382824, -0.036068737506866455, 0.0032646674662828445, -0.006065613590180874, 0.00034123306977562606, 0.006075767800211906, 0.04256760701537132, -0.0052431002259254456, 0.018860267475247383, 0.021893072873353958, -0.01068252045661211, -0.003489758586511016, -0.005554504226893187, -0.002022435190156102, -0.0026553983334451914, 0.001337007386609912, -0.026767227798700333, -0.016721054911613464, 0.036014579236507416, 0.024533240124583244, -0.005141555331647396, -0.0019835096318274736, -0.010384655557572842, -0.005737285129725933, 0.0013014667201787233, -0.006021610461175442, 0.005683127790689468, 0.0038180870469659567, 0.024546779692173004, 0.022028466686606407, 0.01775004342198372, 0.00038227412733249366, -0.0005893833586014807, -0.0036623848136514425, -0.011095468886196613, ...]",0.790754
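
The ranking above rests on cosine similarity, i.e. the dot product of two vectors divided by the product of their lengths. A dependency-free sketch with tiny made-up 3-dimensional vectors (real ada-002 embeddings are much longer) shows the same score-then-sort idea; `cosine` and `top_k` are illustrative names, not the helpers imported from `openai.embeddings_utils`.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(question_vec, rows, k=5):
    """Score every (text, vector) row against the question and keep the k best."""
    scored = [(cosine(vec, question_vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up contexts and vectors, for illustration only
rows = [
    ("walking and exercise", [1.0, 0.0, 0.0]),
    ("sleep hygiene", [0.0, 1.0, 0.0]),
    ("exercise and sleep", [1.0, 1.0, 0.0]),
]
question_vec = [1.0, 0.0, 0.0]

for score, text in top_k(question_vec, rows, k=2):
    print(round(score, 3), text)
# 1.0 walking and exercise
# 0.707 exercise and sleep
```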


In [127]:
pd.set_option('display.max_colwidth', None)

Collecting the five most similar contexts obtained above into a list

In [193]:
context = sorted_merged_df['context'].tolist()

context

["both audiences today what do you think sure let's just start off with some of the facts how much exercise is really recommended for an adults on a weekly basis sure most of the institutional guidelines suggest that we do 30 minutes of moderate-intensity exercise 5 days a week and most of the health benefits attributed to exercise can be had from that volume of exercise so think 150 minutes of exercise accumulated over the course of the week there seems to be added benefit working up to about an hour a day of exercise ok so when you say that I'm thinking of the treadmills in the gym that's a 20 minute limit so are you talking about 30 minutes of cardio exercise or just exercise in general most of the conventional exercise recommendations are focused on aerobic exercise walking cycling swimming for example but it doesn't have to happen all at the same time there can be an accumulation of activity throughout the course of the day so imagine 10 minute walk before work in the morning 10 m

Joining all contexts together for the prompt

In [194]:
text = "\n".join(context)
text

"both audiences today what do you think sure let's just start off with some of the facts how much exercise is really recommended for an adults on a weekly basis sure most of the institutional guidelines suggest that we do 30 minutes of moderate-intensity exercise 5 days a week and most of the health benefits attributed to exercise can be had from that volume of exercise so think 150 minutes of exercise accumulated over the course of the week there seems to be added benefit working up to about an hour a day of exercise ok so when you say that I'm thinking of the treadmills in the gym that's a 20 minute limit so are you talking about 30 minutes of cardio exercise or just exercise in general most of the conventional exercise recommendations are focused on aerobic exercise walking cycling swimming for example but it doesn't have to happen all at the same time there can be an accumulation of activity throughout the course of the day so imagine 10 minute walk before work in the morning 10 mi

In [195]:
context = text
context

"both audiences today what do you think sure let's just start off with some of the facts how much exercise is really recommended for an adults on a weekly basis sure most of the institutional guidelines suggest that we do 30 minutes of moderate-intensity exercise 5 days a week and most of the health benefits attributed to exercise can be had from that volume of exercise so think 150 minutes of exercise accumulated over the course of the week there seems to be added benefit working up to about an hour a day of exercise ok so when you say that I'm thinking of the treadmills in the gym that's a 20 minute limit so are you talking about 30 minutes of cardio exercise or just exercise in general most of the conventional exercise recommendations are focused on aerobic exercise walking cycling swimming for example but it doesn't have to happen all at the same time there can be an accumulation of activity throughout the course of the day so imagine 10 minute walk before work in the morning 10 mi

Building the prompt and calling the OpenAI completion model with its sampling parameters

In [196]:
prompt = f"""Answer the following question using only the context below. If you don't know the answer, just reply I don't know.

Context:
{context}

Q: {question}
A:"""

response = openai.Completion.create(
    model=COMPLETIONS_MODEL,
    prompt=prompt,
    temperature=1,
    max_tokens=500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# Keep only the generated text and trim surrounding spaces and newlines
response["choices"][0]["text"].strip(" \n")

'Morning walks are a great way to get the recommended 30 minutes of moderate-intensity exercise 5 days a week. They can help with improved sleep quality, stress management, reduced prevalence of heart disease, delayed age-related muscle loss, and even cancer prevention.'
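
The notebook joins all five contexts into the prompt unconditionally. Because a completion model's context window is shared between the prompt and the `max_tokens` reserved for the answer, long transcript segments may need trimming. The sketch below is one possible guard, not the notebook's method: it keeps the best-ranked contexts until a rough word budget is used up. The function name `build_prompt` and the default of 1500 words are assumptions, and word counts are only a crude proxy for tokens.

```python
def build_prompt(question, contexts, max_context_words=1500):
    """Concatenate ranked contexts until a rough word budget is reached."""
    kept, used = [], 0
    for ctx in contexts:
        n_words = len(ctx.split())
        # Always keep the best context; stop once the budget would be exceeded
        if kept and used + n_words > max_context_words:
            break
        kept.append(ctx)
        used += n_words
    context = "\n".join(kept)
    return (
        "Answer the following question using only the context below. "
        "If you don't know the answer, just reply I don't know.\n\n"
        f"Context:\n{context}\n\nQ: {question}\nA:"
    )


# Made-up contexts, ordered from most to least similar
demo_contexts = ["a b c", "d e f", "g h i"]
demo_prompt = build_prompt("What is this about?", demo_contexts, max_context_words=6)
print(demo_prompt)
```

Here the first two contexts fit the six-word budget and the third is left out, so the lowest-ranked material is what gets dropped first.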

-------------------------------------------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-------------------------------------------