# Doctor Finder Tutorial: Webscraping and Basic Search Engine with Python, part 2
Author:  Colby Carter    
    
Last modified: 2/11/2023    
    
Key steps:    
1. Fit tf-idf to UTMC profiles and tune to a reasonable feature set.    
2. Calculate cosine similarity between a health request and each profile.    

## 3. Build a simple search engine

#### Import Libraries

In [6]:
import numpy as np
import pandas as pd
import pickle

import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import Pipeline
from sklearn.decomposition import TruncatedSVD

import nltk
from nltk.stem import WordNetLemmatizer

from uszipcode import SearchEngine
# import math
from haversine import haversine, Unit

# import seaborn as sns
import matplotlib.pyplot as plt

### Data

In [7]:
output_file_path = "./"

In [8]:
# pulled on 1/7/2023
fullDoctorDF = pd.read_json(output_file_path+'Data/UTMC_doctor_profiles.json', dtype={'Zip': str})
print(fullDoctorDF.shape)
fullDoctorDF.head(20)

(1178, 12)


Unnamed: 0,Name,Profession,Provider,Website,Phone,Zip,Gender,Languages,Bio,About,Clinical_Interest,Specialties
0,"Todd B. Abel, MD",Neurological Surgery,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,865-524-1869,37920.0,Male,English,Dr. Abel has special interests in spine surger...,Dr. Todd B. Abel is a board certified neurosur...,spine trauma,"Brain and Spinal Cord Injury, Neurological Sur..."
1,"Julia A. Abraham, MD",Family medicine physician,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,865-531-1300,37919.0,Female,English,Primary care physician at Rocky Hill Family Ph...,Dr. Abraham is a Wisconsin native but grew up ...,"Preventative Care, Chronic Disease Management...",Family Medicine
2,"Wala Abusalah, MD",Transplant Nephrologist,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920.0,Female,"Arabic, English","Family, travelling and self development!",A great source of happiness is seeing dialysis...,"Transplant medicine, immunosuppression, chron...","Transplant Surgery, Nephrology"
3,"John H. Acker, MD","Cardiologist, Clinical Assistant Professor, D...",UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920.0,Male,English,I enjoy spending time with my family.,Dr. Acker enjoys patient care and improving a ...,Cardiovascular Disease,Cardiology
4,"Brittany L. Adams, FNP-C",Nurse Practitioner,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920.0,Female,English,"I live for vacationing, outdoors, family time ...",I have been lucky to call East Tennessee my ho...,I see patients with blood disorders such as bl...,"Hematology/Oncology, Medical Oncology"
5,"Theresa A. Adams, NP",Nurse Practitioner,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,(865) 305-9749,37920.0,Female,English,,,,Neonatology
6,"Lauren C. Ade, APRN",Advance Practice Registered Nurse,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920.0,Female,English,Enthusiastic lover of all things outdoors.,Lauren has been with UTMC since 2012. She foun...,General cardiology\nCardiovascular risk assess...,Cardiology
7,"Michial A. Adkins, CRNA",Certified Registered Nurse Anesthetist,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,(865) 305-9220,37920.0,Male,English,,,,Anesthesia
8,"Fatima Ahmed, DO",Family Medicine Physician,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,(865) 305-9350,37920.0,Female,"English, Urdu",,,Family Medicine,Family Medicine
9,"Shaun B. Ajinkya, MD",Neurologist,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,865-521-6174,37919.0,Male,English,,,"General Neurology, Neurology",Neurology


In [9]:
fullDoctorDF["Profession"].describe()

count     1178
unique     380
top           
freq       202
Name: Profession, dtype: object

In [10]:
miniDoctorDF = fullDoctorDF.head(10)

### Key Processing Functions    

In [11]:
fullDoctorDF.columns

Index(['Name', 'Profession', 'Provider', 'Website', 'Phone', 'Zip', 'Gender',
       'Languages', 'Bio', 'About', 'Clinical_Interest', 'Specialties'],
      dtype='object')

In [12]:
textColumns = ["Provider","Profession","Gender","Languages","Bio","About","Clinical_Interest","Specialties","Zip"]

In [13]:
def concatenate(columns):
    """Concatenate the relevant text features for a given doctor
    columns: row of relevant text columns
    y: concatenated string
    """

    y = str()
    
    for field in columns:
        # append every field; a missing value is appended as the string 'nan'
        y = ' '.join([y,str(field).strip()])

    return y
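Because `concatenate` appends every field, missing values end up in the text as the literal token `nan`. A minimal sketch of a variant that skips missing or blank fields instead is shown below. The name `concatenate_non_empty` is hypothetical and not part of the original notebook.

```python
import math

def concatenate_non_empty(fields):
    """Join text fields with single spaces, skipping missing or blank values."""
    parts = []
    for field in fields:
        # skip None and float NaN (how pandas usually represents missing values)
        if field is None or (isinstance(field, float) and math.isnan(field)):
            continue
        text = str(field).strip()
        if text:  # skip empty strings
            parts.append(text)
    return " ".join(parts)

print(concatenate_non_empty(["UTMC", " Neurologist ", float("nan"), "", "37919"]))
# UTMC Neurologist 37919
```

With this variant the `'nan'` entry in the custom stop word list would no longer be needed, but the rest of the pipeline is unchanged.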

In [14]:
def remove_name(columns):
    """Remove the doctor's name from concatenated text field
    columns: row including [name, text] columns
    """
    
    # remove credentials, then split full name into a list
    # (.iloc gives explicit positional access on the row Series)
    name_list = columns.iloc[0].split(",")[0].split()
    # print(name_list)
    
    text = columns.iloc[1]
    
    for name in name_list:
        text = text.replace(name, " ")
    
    # drop 'Dr.'
    text = text.replace("Dr.", " ")
    
    text = text.strip()
    
    return text
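Note that `str.replace` matches substrings, so a name token such as `Abel` is also removed from inside longer words. The sketch below shows one possible alternative that only removes whole tokens, using regular-expression lookarounds. The helper `remove_name_tokens` and the sample sentence are made up for illustration.

```python
import re

def remove_name_tokens(full_name, text):
    """Remove whole-token occurrences of a name's parts from text."""
    # drop credentials after the first comma, then split into tokens
    tokens = full_name.split(",")[0].split()
    if not tokens:
        return text.strip()
    # escape each token; the lookarounds stop a match from starting or ending inside a longer word
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in tokens) + r")(?!\w)"
    cleaned = re.sub(pattern, " ", text)
    # collapse the runs of spaces left behind
    return " ".join(cleaned.split())

sample = "Dr. Todd B. Abel treats Abelson syndrome"
print(remove_name_tokens("Todd B. Abel, MD", sample))
# Dr. treats Abelson syndrome
```

The original `remove_name` would turn `Abelson` into `son` in this sentence, because `Abel` is replaced wherever it appears.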

In [15]:
def lemmatize(text, lemmatizer):
    """Take a string of words, remove punctuation and apply NLTK lemmatizer.
        This function can apply to doctor bios as well as user request
        
        text (str): String of preprocessed tokens
        lemmatizer: NLTK WordNetLemmatizer() object
        
        Returns: processed string
    """
    # remove punctuation, extra spaces, tokens to lowercase, split into a list for lemmatizer
    text_list = re.sub(r'[^\w\s]', '', text).strip().lower().split()
    
    # lemmatize text, return to string
    lemmatized_list = [lemmatizer.lemmatize(word) for word in text_list]
    lemmatized_text = " ".join(lemmatized_list)
    
    return lemmatized_text

In [16]:
lemmatizer = WordNetLemmatizer()

lem_test = lemmatize("This is my text, with a comma, and with an exclamation!", lemmatizer)
lem_test

'this is my text with a comma and with an exclamation'
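The punctuation step deletes characters rather than replacing them with spaces, so words joined by a slash, hyphen, or apostrophe are fused (the processed profiles later in this notebook contain tokens such as `obgyn`, `worklife`, and `im`). The sketch below isolates that regex and compares it with a hypothetical variant, `punctuation_to_space`, that keeps the parts separate. Which behavior is preferable depends on how users are expected to phrase requests.

```python
import re

def strip_punctuation(text):
    """Same regex as in lemmatize(): delete anything that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()

def punctuation_to_space(text):
    """Hypothetical variant: replace punctuation with a space so joined words stay separate."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).lower().split())

print(strip_punctuation("OB/GYN"))                # obgyn
print(strip_punctuation("work-life balance"))     # worklife balance
print(punctuation_to_space("OB/GYN"))             # ob gyn
print(punctuation_to_space("work-life balance"))  # work life balance
```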

In [17]:
def remove_duplicates(text):
    """Drop repeated tokens from a string, keeping the first occurrence of each in order."""
    deduped = ' '.join(dict.fromkeys(text.split()))
    return deduped

In [18]:
remove_duplicates(lem_test)

'this is my text with a comma and an exclamation'
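One consequence of removing duplicates before vectorizing is that every remaining term occurs exactly once per profile, so the term-frequency part of tf-idf becomes binary and only the idf weight separates terms. A small sketch with a made-up profile string illustrates this.

```python
from collections import Counter

def remove_duplicates(text):
    """Drop repeated tokens, keeping the first occurrence of each in order."""
    return " ".join(dict.fromkeys(text.split()))

text = "family medicine physician family medicine clinic"
deduped = remove_duplicates(text)

print(deduped)                                 # family medicine physician clinic
print(max(Counter(text.split()).values()))     # 2
print(max(Counter(deduped.split()).values()))  # 1
```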

In [19]:
miniDoctorProcDF = miniDoctorDF.copy()

# concatenate key text fields
miniDoctorProcDF["Raw_Text"] = miniDoctorProcDF[textColumns].apply(concatenate, axis=1)

# remove doctor's name from Text field
miniDoctorProcDF["Text"] = miniDoctorProcDF[["Name", "Raw_Text"]].apply(remove_name, axis=1)

# drop punctuation and lemmatize
lemmatizer = WordNetLemmatizer()
miniDoctorProcDF["Text"] = miniDoctorProcDF["Text"].apply(lemmatize, args=(lemmatizer,))

# remove dups
miniDoctorProcDF["Text"] = miniDoctorProcDF["Text"].apply(remove_duplicates)

miniDoctorProcDF[["Name","Raw_Text","Text"]].head(5)

Unnamed: 0,Name,Raw_Text,Text
0,"Todd B. Abel, MD",UTMC Neurological Surgery Male English Dr. Ab...,utmc neurological surgery male english ha spec...
1,"Julia A. Abraham, MD",UTMC Family medicine physician Female English...,utmc family medicine physician female english ...
2,"Wala Abusalah, MD","UTMC Transplant Nephrologist Female Arabic, E...",utmc transplant nephrologist female arabic eng...
3,"John H. Acker, MD","UTMC Cardiologist, Clinical Assistant Profess...",utmc cardiologist clinical assistant professor...
4,"Brittany L. Adams, FNP-C",UTMC Nurse Practitioner Female English I live...,utmc nurse practitioner female english i live ...


In [20]:
# preprocess full dataset
fullDoctorProcDF = fullDoctorDF.copy()

# on full doctor dataframe
fullDoctorProcDF["Raw_Text"] = fullDoctorProcDF[textColumns].apply(concatenate, axis=1)

fullDoctorProcDF["Text"] = fullDoctorProcDF[["Name", "Raw_Text"]].apply(remove_name, axis=1)  # remove name from Text

# lemmatize
lemmatizer = WordNetLemmatizer()
fullDoctorProcDF["Text"] = fullDoctorProcDF["Text"].apply(lemmatize, args=(lemmatizer,))

# remove dups
fullDoctorProcDF["Text"] = fullDoctorProcDF["Text"].apply(remove_duplicates)

print(fullDoctorProcDF.shape)

(1178, 14)


In [21]:
fullDoctorProcDF[["Name","Raw_Text","Text"]].head(5)

Unnamed: 0,Name,Raw_Text,Text
0,"Todd B. Abel, MD",UTMC Neurological Surgery Male English Dr. Ab...,utmc neurological surgery male english ha spec...
1,"Julia A. Abraham, MD",UTMC Family medicine physician Female English...,utmc family medicine physician female english ...
2,"Wala Abusalah, MD","UTMC Transplant Nephrologist Female Arabic, E...",utmc transplant nephrologist female arabic eng...
3,"John H. Acker, MD","UTMC Cardiologist, Clinical Assistant Profess...",utmc cardiologist clinical assistant professor...
4,"Brittany L. Adams, FNP-C",UTMC Nurse Practitioner Female English I live...,utmc nurse practitioner female english i live ...


In [58]:
fullDoctorProcDF.to_csv(output_file_path+'Data/UTMC_doctor_profiles_processed.csv', index=False)

### Fit TfidfVectorizer with Concatenated Text Field    
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

#### Remove Stop Words

In [22]:
# nltk standard english stopwords
NLTK_stopwords = nltk.corpus.stopwords.words('english')
print("NLTK 'english' stopwords:", len(NLTK_stopwords))
print(NLTK_stopwords[:10])

NLTK 'english' stopwords: 153
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your']


In [23]:
# add stopwords from exploration of UTMC profile text - don't need to run
newStopWords = ['nan','dr','university','college','medicine', 'medical','tennessee','tn','tenn','knoxville','ut','us','team','vols','football','academy',
                'practice','practicing','hospital','professional','american','school','undergraduate','graduate','graduated','cum','laude','honors','pllc',
                'received','serves','served','serving','brings','enjoy','enjoys','enjoying','excited','friends','son','daughter','wife','husband','grandchildren','children',
                'work','working','worked','try','dog','following','interest','interests','joined','making','completed','movies','music','needs',
                'numerous','passion','originally','part','playing','participated','places','problems','prior','proud','provide','provides','providing',
                'recently','since','snow','spare','specific','spending','taking','things','throughout','going','help','helping','include','including',
                'term','traveling','way','well','area','areas','began','important',
                'first','second','third','one','two','three','four','five','six','many','also','new','says','time',
                '2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016',
                'years','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16',
                'indiana','georgia','carolina','texas','north','south','east','west']

# print("New words:", len(newStopWords))
# newStopWords.sort()
# NLTK_stopwords.extend(newStopWords)
# print("Last:")
# stopwords[-10:]

In [None]:
# save list of added stopwords -- ONE TIME -- UPDATE CSV GOING FORWARD
# pd.DataFrame({"Stop_Words": newStopWords}).to_csv(output_file_path+"Data/Stop_words_list.csv", index=False)

In [24]:
# read in CSV of UTMC-specific stopwords
stopWordsDF = pd.read_csv(output_file_path+"Data/Stop_words_list.csv", sep=',', dtype={'Stop_Words': str})
print(stopWordsDF.shape)
stopWordsDF.head()

(169, 1)


Unnamed: 0,Stop_Words
0,1
1,2
2,3
3,4
4,5


In [25]:
newStopWords = stopWordsDF["Stop_Words"].tolist()
stopWords = NLTK_stopwords + newStopWords
print(len(stopWords))

322
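The stop words are matched against text that has already been lemmatized, so a plural entry such as `children` or `years` may never match: the processed profiles further down still contain `child`, `grandchild`, and `year`. One possible remedy, sketched here, is to pass each stop word through the same normalization as the profile text and keep both forms. `normalize_stop_words` and the toy lemma table are assumptions for illustration. In this notebook the callable could be `lambda w: lemmatize(w, lemmatizer)`.

```python
def normalize_stop_words(stop_words, normalize):
    """Add the normalized form of each stop word, then drop duplicates (order preserved)."""
    expanded = []
    for word in stop_words:
        expanded.append(word)                      # keep the original form
        expanded.extend(normalize(word).split())   # add the normalized form(s)
    return list(dict.fromkeys(expanded))

# toy stand-in for the lemmatizer, for illustration only
toy_lemmas = {"children": "child", "years": "year", "interests": "interest"}

def toy_normalize(word):
    return toy_lemmas.get(word, word)

print(normalize_stop_words(["children", "years", "interest", "interests"], toy_normalize))
# ['children', 'child', 'years', 'year', 'interest', 'interests']
```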


In [26]:
# initialize vectorizer
vectorizer = TfidfVectorizer(strip_accents = 'ascii',
                             lowercase = True,
                             stop_words = stopWords, 
                             # ngram_range = (1,2),  # just use unigram to reduce dimensions
                             max_df = .3,
                             # min_df = .001,          # want rare words for matching user needs
                             max_features = None,    # optional: top max_features ordered by term frequency
                             norm = 'l2',            # standard
                            )

In [27]:
# fit and create sparse vector matrix
tfidfDF = vectorizer.fit_transform(fullDoctorProcDF["Text"])
tfidfDF.shape

(1178, 4083)
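To make the weighting concrete, here is a from-scratch sketch of tf-idf on three made-up profiles. It uses the smoothed idf formula documented for scikit-learn's default settings, `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, followed by l2 normalization. It is a simplification: it splits on whitespace only and ignores stop words, `max_df`, and accent stripping, so it will not reproduce `TfidfVectorizer` output exactly on real text.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (sorted vocabulary, list of l2-normalized {term: weight} dicts)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # smoothed inverse document frequency
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return sorted(df), vectors

docs = ["neurological surgery spine", "family medicine", "cardiology family"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print(vectors[2])  # 'cardiology' (one profile) outweighs 'family' (two profiles)
```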

In [28]:
# terms dropped because of max_df, min_df or max_features (FOR TESTING)
print("# stop words:", len(vectorizer.stop_words_))
vectorizer.stop_words_

# stop words: 5


{'37920', 'english', 'female', 'male', 'utmc'}
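The five dropped terms are those that appear in more than 30% of profiles (`max_df = .3`). A minimal sketch of that rule on a made-up four-profile corpus follows. `terms_dropped_by_max_df` is a hypothetical helper that mimics the document-frequency cutoff, not the vectorizer's internal code.

```python
from collections import Counter

def terms_dropped_by_max_df(docs, max_df):
    """Return terms whose document frequency exceeds max_df * number of documents."""
    n_docs = len(docs)
    # count each term once per document
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term for term, count in df.items() if count > max_df * n_docs}

docs = [
    "utmc male english neurological surgery",
    "utmc female english family medicine",
    "utmc female english cardiology",
    "utmc male english neonatology",
]
print(sorted(terms_dropped_by_max_df(docs, max_df=0.3)))
# ['english', 'female', 'male', 'utmc']
```

One side effect worth keeping in mind: a request that mentions `female` or `english` cannot influence the score, because those terms are no longer in the vocabulary.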

In [29]:
# number of features (unigrams only)
len(vectorizer.vocabulary_)

4083

In [30]:
# example
vectorizer.vocabulary_['neurological']

2493

In [94]:
# save vectorizer model object
with open(output_file_path+"Models/UTMC_Tfidf_vectorizer_unigram.pkl", 'wb') as f:
    pickle.dump(vectorizer, f)

### Calculate Geographical Distance

In [31]:
# for a more extensive list of zip codes, set simple_zipcode=False
def distance_zip(doc_zip, user_zip):
    """Great-circle (haversine) distance in miles between two zip codes.
    Returns np.nan if either zip code is not found."""
    # print(doc_zip, user_zip)
    search = SearchEngine()

    # look up both zip codes
    doc_zip = search.by_zipcode(doc_zip)
    user_zip = search.by_zipcode(user_zip)

    if doc_zip is not None and user_zip is not None:
        doc_lat = doc_zip.lat
        doc_long = doc_zip.lng

        user_lat = user_zip.lat
        user_long = user_zip.lng

        return haversine((doc_lat, doc_long), (user_lat, user_long), unit=Unit.MILES)
    else:
        return np.nan

In [32]:
distance_zip('37920', '02215')  # Boston

814.9786792035693

In [34]:
distance_zip('37831', '37920')  # UTMC, 6.24

23.74898292494684

In [35]:
distance_zip('x', '37920')

nan

In [36]:
distance_zip('39722', '37920')  # 39722 is missing...

nan
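For reference, the haversine formula behind these distances can be written in a few lines of standard-library Python. This is a sketch, not the `haversine` package's code: the Earth radius constant below is an approximate mean value chosen for illustration, so results may differ slightly from the package.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius; an assumption of this sketch

def haversine_miles(point1, point2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # haversine of the central angle between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

print(haversine_miles((0.0, 0.0), (0.0, 0.0)))   # 0.0
print(haversine_miles((0.0, 0.0), (0.0, 90.0)))  # a quarter of the equator
```

Because zip codes are reduced to a single latitude/longitude point each, the result is a centroid-to-centroid distance rather than a travel distance.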

### Cosine Similarity Function    
Measure how similar a search is to a doctor profile    
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
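Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it only rewards terms that the request and a profile share. The sketch below computes it for sparse vectors stored as `{term: weight}` dicts. The weights are made up, and `cosine` is an illustrative helper rather than the scikit-learn function used later.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # this sketch scores an all-zero vector as 0
    return dot / (norm_u * norm_v)

request = {"holistic": 0.6, "care": 0.8}
profile_a = {"holistic": 1.0}   # shares one term with the request
profile_b = {"surgery": 1.0}    # shares none

print(cosine(request, profile_a))  # 0.6
print(cosine(request, profile_b))  # 0.0
```

Since the vectorizer above uses `norm = 'l2'`, every non-empty tf-idf row already has length 1, and the cosine reduces to a plain dot product.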

In [37]:
# Load Model
with open(output_file_path+"Models/UTMC_Tfidf_vectorizer_unigram.pkl", 'rb') as f:
    loaded_vectorizer = pickle.load(f)
loaded_vectorizer

TfidfVectorizer(max_df=0.3,
                stop_words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours',
                            'ourselves', 'you', 'your', 'yours', 'yourself',
                            'yourselves', 'he', 'him', 'his', 'himself', 'she',
                            'her', 'hers', 'herself', 'it', 'its', 'itself',
                            'they', 'them', 'their', 'theirs', 'themselves',
                            'what', ...],
                strip_accents='ascii')

In [38]:
# Generate Text column on full doctor dataset
# read in
fullDoctorDF = pd.read_json(output_file_path+"Data/UTMC_doctor_profiles.json",dtype={'Zip': str})
fullDoctorProcDF = fullDoctorDF.copy()

# preprocess
fullDoctorProcDF["Raw_Text"] = fullDoctorProcDF[textColumns].apply(concatenate, axis=1)

fullDoctorProcDF["Text"] = fullDoctorProcDF[["Name", "Raw_Text"]].apply(remove_name, axis=1)  # remove name from Text

# lemmatize
lemmatizer = WordNetLemmatizer()
fullDoctorProcDF["Text"] = fullDoctorProcDF["Text"].apply(lemmatize, args=(lemmatizer,))

# remove dups
fullDoctorProcDF["Text"] = fullDoctorProcDF["Text"].apply(remove_duplicates)
print(fullDoctorProcDF.shape)

fullDoctorProcDF.head()

(1178, 14)


Unnamed: 0,Name,Profession,Provider,Website,Phone,Zip,Gender,Languages,Bio,About,Clinical_Interest,Specialties,Raw_Text,Text
0,"Todd B. Abel, MD",Neurological Surgery,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,865-524-1869,37920,Male,English,Dr. Abel has special interests in spine surger...,Dr. Todd B. Abel is a board certified neurosur...,spine trauma,"Brain and Spinal Cord Injury, Neurological Sur...",UTMC Neurological Surgery Male English Dr. Ab...,utmc neurological surgery male english ha spec...
1,"Julia A. Abraham, MD",Family medicine physician,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,865-531-1300,37919,Female,English,Primary care physician at Rocky Hill Family Ph...,Dr. Abraham is a Wisconsin native but grew up ...,"Preventative Care, Chronic Disease Management...",Family Medicine,UTMC Family medicine physician Female English...,utmc family medicine physician female english ...
2,"Wala Abusalah, MD",Transplant Nephrologist,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Female,"Arabic, English","Family, travelling and self development!",A great source of happiness is seeing dialysis...,"Transplant medicine, immunosuppression, chron...","Transplant Surgery, Nephrology","UTMC Transplant Nephrologist Female Arabic, E...",utmc transplant nephrologist female arabic eng...
3,"John H. Acker, MD","Cardiologist, Clinical Assistant Professor, D...",UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Male,English,I enjoy spending time with my family.,Dr. Acker enjoys patient care and improving a ...,Cardiovascular Disease,Cardiology,"UTMC Cardiologist, Clinical Assistant Profess...",utmc cardiologist clinical assistant professor...
4,"Brittany L. Adams, FNP-C",Nurse Practitioner,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Female,English,"I live for vacationing, outdoors, family time ...",I have been lucky to call East Tennessee my ho...,I see patients with blood disorders such as bl...,"Hematology/Oncology, Medical Oncology",UTMC Nurse Practitioner Female English I live...,utmc nurse practitioner female english i live ...


In [39]:
fullDoctorProcDF["Zip"].describe()  # 8 missing zips...

count      1178
unique       19
top       37920
freq        758
Name: Zip, dtype: object
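With only 19 distinct zip codes across 1,178 profiles, the distance to a user's zip code needs to be looked up once per distinct value rather than once per row. This matters because `distance_zip` creates a new `SearchEngine` on every call. A sketch of the idea follows: `distances_by_unique_zip` is a hypothetical helper, and `fake_distance` is a meaningless stand-in used only to count calls. In the notebook, `distance_zip` could be passed instead.

```python
def distances_by_unique_zip(zips, user_zip, distance_fn):
    """Call distance_fn(zip, user_zip) once per distinct zip and map the result back to every row."""
    cache = {z: distance_fn(z, user_zip) for z in set(zips)}
    return [cache[z] for z in zips]

# stand-in distance function that records its calls; the numbers it returns are not real distances
calls = []
def fake_distance(doc_zip, user_zip):
    calls.append(doc_zip)
    return float(abs(int(doc_zip) - int(user_zip)))

zips = ["37920"] * 5 + ["37919"] * 2
distances = distances_by_unique_zip(zips, "37831", fake_distance)
print(len(distances))  # 7 rows
print(len(calls))      # 2 lookups
```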

In [40]:
# transform into tfidf sparse matrix
tfidfMatrix = loaded_vectorizer.transform(fullDoctorProcDF["Text"])
tfidfMatrix.shape  # checks

(1178, 4086)

In [41]:
sum(tfidfMatrix[0])

<1x4086 sparse matrix of type '<class 'numpy.float64'>'
	with 22 stored elements in Compressed Sparse Row format>

In [42]:
def top_cosine_similarity(request, zip5, model, vectorizedDF, doctorDF, topX=5, milesMax=100):
    """Calculate cosine similarity between input/request and all doctor text tf-idf vectors.
        
        request:        free-form input text describing health professional needs
        zip5:           user's 5-digit zip code (str), used to compute distance to each doctor
        model:          tf-idf vectorizer model object
        vectorizedDF:   doctor tf-idf vectorized sparse matrix
        doctorDF:       pre-vectorized doctor dataframe of key attributes
        topX:           number of top most-similar health professionals
        milesMax:       maximum distance in miles between zip5 and a doctor's zip code
        
        Return: dataframe of topX most-similar health professionals within milesMax, with key attributes from profile
    """
    # Copy input and print shapes
    print("Doctor dataframe:   ", doctorDF.shape)
    outDF = doctorDF.copy()
    print("Doctor tfidf DF:", vectorizedDF.shape)
    print()
    
    # calculate distance between zip5 and doctorDF["Zip"]
    outDF["Distance"] = outDF["Zip"].apply(distance_zip, args=(zip5,))
    print("Distance:")
    print(outDF["Distance"].describe())
    
    # transform request text into tfidf vectorized sparse matrix
    requestVector = model.transform([request])
    print("Request vector:", requestVector.shape, "\nSum:", requestVector.sum())
    
    # calculate cosine similarity
    cosineArray = cosine_similarity(requestVector, vectorizedDF, dense_output=True)[0]
    
    # append scores to doctorDF
    outDF["Score"] = cosineArray
    print("\nScores:")
    print(outDF["Score"].describe())
    
    # keep only doctors within milesMax miles; rows with a NaN distance are dropped too
    print("Columns:", outDF.columns)
    outDF = outDF[outDF["Distance"] <= milesMax]
    print("Within radius:", outDF.shape)
    
    # sort doctorDF by desc cosine similarities
    outDF = outDF.sort_values(by=["Score"], axis=0, ascending=False, inplace=False).head(topX).reset_index()
    
    return outDF
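The filter-then-rank logic can be illustrated without pandas. The sketch below uses made-up rows and a hypothetical helper, `rank_within_radius`, to show one detail that is easy to miss: a NaN distance fails the `<=` comparison, so a profile whose zip code cannot be resolved is excluded even when it has the best score.

```python
import math

def rank_within_radius(rows, top_x=5, miles_max=100):
    """rows: list of (name, distance, score). Keep rows within miles_max, highest score first."""
    # NaN <= miles_max is False, so rows with an unknown distance are dropped here
    nearby = [row for row in rows if row[1] <= miles_max]
    return sorted(nearby, key=lambda row: row[2], reverse=True)[:top_x]

rows = [
    ("A", 23.7, 0.20),
    ("B", math.nan, 0.90),  # unknown location: excluded despite the best score
    ("C", 150.0, 0.50),     # outside the radius
    ("D", 12.7, 0.14),
]
print(rank_within_radius(rows, top_x=2, miles_max=100))
# [('A', 23.7, 0.2), ('D', 12.7, 0.14)]
```

If profiles with unknown locations should still be searchable, one option would be to treat a NaN distance as passing the filter. That is a design decision the original function does not make.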

In [81]:
sample_request = "holistic integrative care"
zip5 = '37831'

In [82]:
docRankedDF = top_cosine_similarity(sample_request, zip5, loaded_vectorizer, tfidfMatrix, fullDoctorProcDF, topX=5)
docRankedDF

Doctor dataframe:    (1178, 14)
Doctor tfidf DF: (1178, 4086)

Distance:
count    860.000000
mean      23.406434
std        3.810078
min        9.053126
25%       23.748983
50%       23.748983
75%       23.748983
max       50.346442
Name: Distance, dtype: float64
Request vector: (1, 4086) 
Sum: 1.6285281530208984

Scores:
count    1178.000000
mean        0.007047
std         0.018759
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         0.201788
Name: Score, dtype: float64
Columns: Index(['Name', 'Profession', 'Provider', 'Website', 'Phone', 'Zip', 'Gender',
       'Languages', 'Bio', 'About', 'Clinical_Interest', 'Specialties',
       'Raw_Text', 'Text', 'Distance', 'Score'],
      dtype='object')
Within radius: (860, 16)


Unnamed: 0,index,Name,Profession,Provider,Website,Phone,Zip,Gender,Languages,Bio,About,Clinical_Interest,Specialties,Raw_Text,Text,Distance,Score
0,648,"Richard H. Mays, MD",Physician,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Male,English,Faith Based Family and Integrative Health Care,Dr. Mays is a board certified Family Physician...,"Family Practice, Integrative Medicine",Family Medicine,UTMC Physician Male English Faith Based Famil...,utmc physician male english faith based family...,23.748983,0.201788
1,192,"Cara C. Connors, MD",Family Medicine and Obesity Medicine Physician,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37701,Female,English,"Faith, Family, Holistic Living",I am passionate about helping patients with th...,obesity medicine,Family Medicine,UTMC Family Medicine and Obesity Medicine Phy...,utmc family medicine and obesity physician fem...,21.422726,0.191448
2,403,"Christopher D. Harris, MD, FACS","Integrative and Functional Medicine , Consultant",UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Male,English,"I’m a life long learner with many interests, i...",I practiced urology in Knoxville for 29 years....,"Gut Health Restoration, balancing hormones, \...","Urology, Integrative Health","UTMC Integrative and Functional Medicine , Co...",utmc integrative and functional medicine consu...,23.748983,0.165594
3,1024,"Christina M. Stockwell, D.O.",OB/GYN,UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37920,Female,"English, Japanese","I love to travel, watch TN football, and spend...",Raised in a military family had the privilege ...,"minimally invasive gynecologic surgery, PCOS,...","Gynecology, Obstetrics","UTMC OB/GYN Female English, Japanese I love t...",utmc obgyn female english japanese i love to t...,23.748983,0.144521
4,110,"Jennifer H. Brinkmann, MD","Family Medicine Physician, Pediatrician",UTMC,https://www.utmedicalcenter.org/find-a-doctor/...,,37922,Female,"English, Spanish","Dog lover, avid reader (always looking for a g...","I'm native to East TN, born in Chattanooga, an...",Primary holistic care of families,"Internal Medicine, Pediatrics","UTMC Family Medicine Physician, Pediatrician ...",utmc family medicine physician pediatrician fe...,12.68041,0.137843


In [83]:
docRankedDF["Text"][0]

'utmc physician male english faith based family and integrative health care is a board certified practicing in south knoxville he married with 3 child 5 grandchild enjoys organic gardening hiking photography also medicine strives to incorporate natural product diet into his practice ha strong god very active church 37920'

In [84]:
docRankedDF["Text"][1]

'utmc family medicine and obesity physician female english faith holistic living i am passionate about helping patient with their wellness a life long learner enjoy time friend love being outside in nature by water 37701'

In [85]:
docRankedDF["Text"][2]

'utmc integrative and functional medicine consultant male english im a life long learner with many interest including travel gardening reading i practiced urology in knoxville for 29 year graduated from 2 fellowship 2018 board certified 2019 practicing full time since gut health restoration balancing hormone bioidentical replacement female 37920'

In [86]:
docRankedDF["Text"][3]

'utmc obgyn female english japanese i love to travel watch tn football and spending time with loved one raised in a military family had the privilege world understand all different aspect of medicine subsequently pursued do degree for it holistic approach manipulation being able care my patient through various stage life minimally invasive gynecologic surgery pcos lgbtq infertility hormone replacement obstetrics gynecology 37920'

In [87]:
docRankedDF["Text"][4]

'utmc family medicine physician pediatrician female english spanish dog lover avid reader always looking for a good book recommendation love to cook garden hike travel and any activity involving water im native east tn born in chattanooga i trained here with medical school residency memphis moved knoxville be closer live my husband our three daughter worklife balance that ut allows me the flexibility able present when need primary holistic care of internal pediatrics 37922'