In [1]:
#import libraries
import pandas as pd
import numpy as np
from datascience import *
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use("fivethirtyeight")

# Infodemiology: Covid and Babies
by: Sylvia Guendelman, Chaya Bakshi, Chloe Chu, Dharaa Upadhyaya, and Leena Usman

<img src="babies-face-masks.png" width="50%">

### Abstract

<b>Background:</b> In December 2019, a novel strain of coronavirus, SARS-CoV-2, was first detected in Wuhan, China, and quickly spread to over 200 countries across the globe; as of December 6, 2020, there were over 14.8 million cases of COVID-19 in the U.S. alone. During this worldwide pandemic, individuals are increasingly turning to search engines such as Google for specific health information and resources.

<b>Objective</b>: The purpose of this project is to investigate what concerns Google users across the United States and Canada have about the coronavirus and babies. We applied a tested protocol that retrieves information from three Google APIs: Google Trends, Google Health Trends, and Google Custom Search.

<b>Methods</b>: Our protocol consisted of six steps: (1) brainstorming an initial search term, region, and time frame; (2) developing a master list of top search queries for the initial search term using Google Trends; (3) obtaining the relative search volume of each query with Google Health Trends by running the simulation 30 times and taking the mean; (4) pulling follow-up queries by repeating steps two and three; (5) comparing relative search volumes of similar terms with a neighboring country; and (6) determining popular websites using Google Custom Search.

<b>Results:</b> We successfully tested the methodology on the initial search term <em>covid babies</em>. We identified the top search queries for <em>covid babies</em>, of which <em>covid in babies</em> was the most popular, and obtained the relative search volume for the top queries: the relative search volume for <em>covid in babies</em> was 0.000776, and the most popular website for this search term was the Mayo Clinic.

### Background

We started our research by focusing on the intersection between pregnancy and COVID-19. Because the implications of COVID-19 infection during pregnancy remain unclear, many pregnant women have turned to Google for answers. Our exploration showed that from March 2020 onward, information seekers grew increasingly interested in how COVID-19 affects pregnancy. Many of the top searches on the Google Trends website concerned COVID-19 symptoms during pregnancy; other pregnancy concerns related to healthcare access, including whether it was safe to go to the hospital for delivery, home birth, and whether telehealth was safe.

According to the Centers for Disease Control and Prevention (CDC) in September 2020, pregnant women might be at increased risk for severe COVID-19 compared to non-pregnant women. Specifically, “pregnant women are 5.4 times more likely to be hospitalized” (CDC, 2020). Pregnant women with COVID-19 could also be more susceptible to adverse pregnancy outcomes such as preterm birth and compromised maternal health, but the relationship to COVID-19 is unknown. In addition, systematic reviews and case reports found through PubMed searches of the research literature from June 2020 to the present reveal inconclusive and insufficient evidence of vertical transmission of COVID-19 from mother to baby. In general, the literature on pregnancy and coronavirus tells us the following: common maternal complications are premature rupture of membranes and intrauterine distress, neonatal symptoms include shortness of breath, and there is no conclusive evidence of vertical transmission.

In another literature search (using PubMed and Google Scholar) on home deliveries and telehealth, literature reviews identified themes such as health care provider support for telehealth delivery, environmental attributes, and provider-patient interactions during telehealth delivery. However, most of these topics were either not accessible through the Google APIs, or the tools did not allow for more in-depth analysis of pregnancy symptoms or concerns by geolocation.

As a result, we turned our attention to babies, since healthy birth outcomes are among the most important outcomes parents-to-be hope for. During the Zika epidemic, the virus's ability to pass from a pregnant woman to her fetus had vast psychological impacts on pregnant women, and similar heightened anxiety and emotional stress have emerged during the coronavirus pandemic (NBC News). We chose to explore the intersection between babies and COVID-19 because children under the age of one appear to be at higher risk of severe illness with COVID-19 than older children. This is attributed to their immature immune systems and smaller airways, which make them more likely to develop breathing problems when they have respiratory infections (Mayo Clinic).
Current evidence indicates that babies under 1 year old and children with certain underlying conditions may be more likely to have severe illness from COVID-19. According to the CDC, while most children have mild symptoms, some can become severely ill and die (CDC, 2020). The possible underlying conditions include asthma, diabetes, heart disease, and immunosuppression. Search data suggest that parents’ main concerns about COVID-19 in babies revolve around breastfeeding (“Is it safe to breastfeed if the mother or baby tests positive?”) and rashes (“Is my baby’s rash linked to COVID-19?”). The public health literature team did further research (literature reviews, news articles, and other sources) to better understand these concerns. 


Once the data team switched to babies and coronavirus, the literature team expanded its search to dig deeper into the concerns behind popular follow-up searches such as “texas babies covid”, “kawasaki disease” and its focus in New York, “toxic shock syndrome”, and the names “john travolta” and “jett travolta”. The John and Jett Travolta searches seemed largely irrelevant: John Travolta is a celebrity whose son, Jett, was diagnosed with Kawasaki disease (itself a popular search term) at a young age. Much more relevant was the “texas babies covid” search term; articles and literature revealed that 85 babies under the age of 1 tested positive for COVID-19 in Nueces County, Texas. More broadly, a study of children with COVID-19 in China found that 90% had mild symptoms, while 10% became very ill (Dong et al, 2020). Furthermore, the CDC is currently investigating how Kawasaki disease, toxic shock syndrome, and rash symptoms relate to COVID-19, especially in children. The CDC calls the condition in which different organs become inflamed Multisystem Inflammatory Syndrome in Children (MIS-C). Because there have been case reports of children arriving at the ER with toxic shock-like and Kawasaki-like symptoms, notably in New York City, we hypothesize that growing concern about children and babies with these inflammatory symptoms is reflected in Google searches. The information on rashes, Kawasaki disease, and toxic shock syndrome can help us interpret the main concerns and top queries more fully.

<b>Canada vs. United States Health Systems</b>

In the United States, individuals generally must pay for their own health insurance unless they qualify for a government program such as Medicare, Medicaid, or the Veterans Health Administration. Additionally, health insurance in the US is often tied to employment, with employers offering their employees a choice of insurance packages. The type of coverage varies, and coverage is not guaranteed.
In Canada, on the other hand, the federal government funds health care in each province, as long as the province follows the regulations set out in the Canada Health Act. This funding provides all Canadian citizens with health insurance through the national health care system.
Overall, Canada provides universal access to health care for its citizens, whereas nearly one in five non-elderly Americans is uninsured. In Canada, coverage is not tied to a person's job or income; everyone is in the same system and enjoys equal access (Materia Socio-Medica).



<b>How the US versus Canada responded to the pandemic</b>

Canada and the United States are similar in many ways. With respect to COVID-19, the two have similar risk profiles: similar population ages and similar distances from earlier hotspots such as Europe and East Asia. Yet the outbreak has been dramatically worse in the United States than in Canada. The United States is currently seeing around twice as many coronavirus cases as Canada and 30 percent more deaths. Canada also performed better at crucial moments. In the early stages of the pandemic, Canada ramped up testing quickly, enabling it to better isolate the sick, trace contacts, and limit spread. This is largely because Canada’s single-payer national health care system lets people seek care and testing for COVID-19 without fear of out-of-pocket costs, which has kept Canadian testing rates consistently higher than those in the United States. Additionally, Canadians have been less divided and more disciplined. Once lockdown measures were announced, they were strict, broadly uniform, and widely followed, unlike the response to lockdown measures in the United States. In Canada, there is a consensus that COVID-19 is a very serious health problem, with support from leaders across the political spectrum; in the United States, by contrast, the Republican Party has many coronavirus-skeptic members (NY Times).



### Research Question

What concerns do Google users in the United States and Canada have when it comes to coronavirus and babies?

Initially, our team set out to explore COVID-19 and pregnancy symptoms. After weeks of uninteresting results that gave little indication of what users were concerned about regarding pregnancy and COVID-19, we decided to focus on COVID-19 and babies instead. We included the United States and Canada to understand how searches in these two English-speaking countries compare and contrast, and to quantify the differences through their relative search volumes.

### Methodology

<b>Data Science Methodology</b>

We obtained our data from three Google APIs: Google Trends, Google Health Trends, and Google Custom Search.

The Google Trends API provides the top search topics and top search queries for an initial search term over a specified time period and location. It also returns the corresponding Relative Search Index, which measures search interest relative to all searches during a specific period of time.

The Google Health Trends API generates the Relative Search Volume for a list of top queries in a specific region and time period. Because each API call draws a random sample, we ran it 30 times for the same search term and took the mean across all runs as the final relative search volume for the master list.
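The averaging step can be sketched in pandas as follows. This is a minimal illustration, not the team's code: the per-run values in `runs` are hypothetical, only three runs are shown instead of 30, and the column names `run`, `query`, and `rsv` are assumptions.

```python
import pandas as pd

# Hypothetical per-run results: each Google Health Trends call returns a
# random sample, so the same query gets a slightly different value each run.
# Only 3 runs are shown here; the study used 30.
runs = pd.DataFrame({
    "run":   [1, 1, 2, 2, 3, 3],
    "query": ["covid in babies", "covid 19 babies"] * 3,
    "rsv":   [0.00080, 0.00066, 0.00078, 0.00067, 0.00079, 0.00066],
})


def average_runs(runs: pd.DataFrame) -> pd.DataFrame:
    """Mean relative search volume per query across all runs, largest first."""
    return (
        runs.groupby("query", as_index=False)["rsv"]
        .mean()
        .rename(columns={"rsv": "Relative Search Volume"})
        .sort_values("Relative Search Volume", ascending=False, ignore_index=True)
    )


master_list = average_runs(runs)
print(master_list)
```

Grouping by query before taking the mean keeps each query's runs together. This way the final value does not depend on the order in which the runs were collected.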

Google Custom Search provides the list of top websites that people are shown when searching Google for the initial search term. 

When brainstorming the initial search term, it is important to keep it as generic and simple as possible so that the API can gather more data efficiently. Because we wanted to explore what concerns users have about coronavirus and babies, we entered <em><b>covid babies</b></em>, <b><em>covid 19 babies</em></b>, <em><b>coronavirus babies</b></em>, and <em><b>corona babies</b></em> as initial search terms into the API. After comparing the results, we concluded that they all returned similar queries, with covid babies returning the most specific ones. Therefore, we chose covid babies as our initial search term, the U.S. as the geolocation, January 1, 2020 as the start date, and October 1, 2020 as the end date. 
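One way to make "they all returned similar queries" concrete is to measure the overlap between the query sets that each candidate seed returns. The sketch below uses Jaccard similarity (shared queries divided by all distinct queries). The query sets are hypothetical, not the API's actual output, and the source does not say how the team judged similarity.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of queries two result sets have in common (0 = none, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical related-query sets for three candidate seed terms (illustrative only).
results = {
    "covid babies": {"covid in babies", "covid 19 babies",
                     "covid symptoms babies", "can babies get covid"},
    "coronavirus babies": {"covid in babies", "covid 19 babies", "coronavirus symptoms"},
    "corona babies": {"covid in babies", "corona symptoms"},
}

# Compare every pair of seeds once.
for (seed_a, qa), (seed_b, qb) in combinations(results.items(), 2):
    print(f"{seed_a!r} vs {seed_b!r}: {jaccard(qa, qb):.2f}")
```

A high score between two seeds suggests that either one would lead to much the same master list. The seed that also returns the most specific queries is then the natural choice.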

Using covid babies as our initial search term, we ran the API 30 times simultaneously and took the means of the relative search index and relative search volume. The queries the API returned were then used as a new set of initial search terms, and the process was repeated until the API yielded no further results. After building a master list and gathering the corresponding relative search indices and relative search volumes, we created a Graph Visualization to portray user search behavior. 
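The repeat-until-no-new-results loop amounts to a breadth-first expansion, where each query's "level" is its distance from the seed. The sketch below assumes a hypothetical `fetch_related` function in place of the live Google Trends call. Its `TOY_RELATED` links are invented for illustration and are not the study's results; only the seed's two children come from the level-one table.

```python
from collections import deque

# Hypothetical stand-in for the related-queries API call. The links below are
# a toy example, not the study's results; a live API call would go here instead.
TOY_RELATED = {
    "covid babies": ["covid in babies", "covid rash in babies"],
    "covid in babies": ["follow-up query A"],
    "covid rash in babies": ["follow-up query A", "follow-up query B"],
    "follow-up query B": ["follow-up query C"],
}


def fetch_related(query: str) -> list:
    """Return related queries; an empty list plays the role of 'no further results'."""
    return TOY_RELATED.get(query, [])


def expand_queries(seed: str, fetch, max_level: int = 10) -> dict:
    """Breadth-first expansion from a seed term.

    Returns {query: level}, with the seed at level 0 and its related queries
    at level 1. Stops when no new queries come back (or at max_level).
    """
    levels = {seed: 0}
    frontier = deque([seed])
    while frontier:
        query = frontier.popleft()
        if levels[query] >= max_level:
            continue
        for related in fetch(query):
            if related not in levels:  # each query is expanded only once
                levels[related] = levels[query] + 1
                frontier.append(related)
    return levels


levels = expand_queries("covid babies", fetch_related)
for query, level in levels.items():
    print(level, query)
```

Tracking already-seen queries prevents the loop from cycling when two queries list each other as related. The `max_level` cap is an added safeguard that the original procedure does not mention.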

While we wanted to look at region-specific searches, the API repeatedly returned a NoneType error, meaning the data were not available due to privacy restrictions. We therefore compared U.S. searches with those in a neighboring English-speaking country, Canada. We repeated the same procedure for Canada and finally extracted similar top-level queries that the U.S. and Canada shared, in order to produce a bar plot comparing their relative search volumes in the two countries.

Lastly, because the Custom Search API can only yield data relevant to the U.S., we extracted a list of popular U.S. websites and displayed it in a bar plot as well.


#### Google Trends Methodology

During this pandemic, with many new mothers worried about their newborns, we saw through the Google Trends website that searches for terms such as “covid in babies”, “covid in babies symptoms”, “covid rash in babies”, and “covid test for babies” increased from March onward.

This led us to look at interest over time for the search term “covid babies” across different geolocations. We found an interesting pattern: most states with high interest over time for “covid babies” were red states, and most states with low interest were blue states. 

Next, we compared the search term “covid babies” between two geolocations, the United States and Canada. Many more related queries appeared for the United States, but most asked about the symptoms of COVID-19 in babies. Canada had 6 related queries, similar to those from the US (e.g., symptoms of covid, can babies get covid). Comparing interest over time for “covid babies” in Canada and the US, there was initially, around March, significantly more interest in Canada than in the US. Then, in July, there was a huge peak for the term in the US. Overall, however, interest over time for the term was higher in Canada than in the US.


<b>Literature Search Methodology</b>

We communicated closely with the data science team and based our literature searches on follow-up search terms with high relative search volume, while also gathering preliminary information on the main topics, Babies and COVID-19 and Pregnancy and COVID-19. The public health literature team mainly used PubMed and Google Scholar, filtering for publication dates as early as August 2019, though we consulted some earlier articles to understand the concepts and symptoms of other linked diseases. For both the pregnancy and babies topics, we entered key search terms in the search bar to find the most useful and relevant research articles on the concerns raised by the follow-up search terms that the data team generated. For search terms such as ‘texas babies and covid’ and ‘jett travolta’, we used Advanced Google Search to find relevant local news articles explaining the context of the searches. Most of the literature and articles we reviewed are from 2018, 2019, and 2020. We focused on the United States (for the babies and COVID-19 topic) but eventually expanded our searches to Canada. The research we found relevant also includes studies from China, the U.S., Canada, and some regions of Europe.

To consolidate our findings, we created a Literature Search Spreadsheet categorizing each study and tabulating important details and conclusions. Then, we summarized the main findings on a document, categorizing by search term. 

### The Data

To start off, the following information was entered into the API:


- <b>Initial Search Term:</b> covid babies
- <b>Geolocation:</b> US
- <b>Timeline:</b> January 1, 2020 - October 1, 2020 

After running the simulation 30 times and taking the mean of `Relative Search Volume` across all 30 runs, the following table of level one queries was returned:

In [2]:
raw_level_one_queries_US = pd.read_csv("raw_level_one_queries.csv").drop(columns=['Unnamed: 0'])
raw_level_one_queries_US

| | Related Query | Relative Search Volume |
|---|---|---|
| 0 | covid 19 | 0.772093 |
| 1 | covid symptoms | 0.119134 |
| 2 | covid 19 symptoms | 0.05311 |
| 3 | symptoms of covid | 0.029 |
| 4 | symptoms of covid 19 | 0.015203 |
| 5 | signs of covid | 0.007375 |
| 6 | covid in babies | 0.00079 |
| 7 | covid 19 babies | 0.000664 |
| 8 | covid symptoms babies | 0.000427 |
| 9 | covid in babies symptoms | 0.000342 |


We then filtered these terms by their relevance to our ultimate objective and by their similarity to other queries in the dataset. The resulting cleaned dataset served as our <b>master list</b>, or in other words, our <b>level one queries</b>, which sit at the top of our Graph Visualization.
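The source does not specify the filtering rules, so the following is only a hypothetical heuristic that approximates the two criteria. It keeps queries that mention "babies" (relevance), and it treats queries with the same set of non-stopwords as duplicates, keeping the one with the higher relative search volume (similarity). The data are the ten rows from the raw table above. The `STOPWORDS` set and the keyword are assumptions. The team's hand-curated master list draws on more results and keeps some variants this rule would merge, so the output will not reproduce that list exactly.

```python
import pandas as pd

# The ten raw level-one queries shown in the table above (U.S.).
raw = pd.DataFrame({
    "Related Query": [
        "covid 19", "covid symptoms", "covid 19 symptoms", "symptoms of covid",
        "symptoms of covid 19", "signs of covid", "covid in babies",
        "covid 19 babies", "covid symptoms babies", "covid in babies symptoms",
    ],
    "Relative Search Volume": [
        0.772093, 0.119134, 0.05311, 0.029, 0.015203,
        0.007375, 0.00079, 0.000664, 0.000427, 0.000342,
    ],
})

STOPWORDS = {"in", "and", "with", "of", "for"}  # hypothetical choice


def clean_queries(df: pd.DataFrame, keyword: str = "babies") -> pd.DataFrame:
    """Keep queries mentioning `keyword`; drop word-order/stopword variants.

    Two queries count as duplicates when they share the same set of
    non-stopwords; the one with the higher relative search volume is kept.
    """
    relevant = df[df["Related Query"].str.contains(keyword, regex=False)]
    key = relevant["Related Query"].map(
        lambda q: " ".join(sorted(w for w in q.split() if w not in STOPWORDS))
    )
    return (
        relevant.assign(_key=key)
        .sort_values("Relative Search Volume", ascending=False)
        .drop_duplicates("_key")  # keeps the first, i.e. highest-volume, variant
        .drop(columns="_key")
        .reset_index(drop=True)
    )


cleaned = clean_queries(raw)
print(cleaned)
```

Here "covid in babies symptoms" is dropped as a variant of "covid symptoms babies", which has the higher volume.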

In [3]:
cleaned_level_one_queries_US = pd.read_csv("cleaned_level_one_queries.csv").drop(columns=['Unnamed: 0'])
cleaned_level_one_queries_US

| | Related Query | Relative Search Volume |
|---|---|---|
| 0 | covid in babies | 0.00079 |
| 1 | covid 19 babies | 0.000664 |
| 2 | covid symptoms babies | 0.000427 |
| 3 | covid 19 in babies | 0.000283 |
| 4 | babies and covid | 0.000266 |
| 5 | babies with covid | 0.000225 |
| 6 | can babies get covid | 0.000116 |
| 7 | texas babies covid | 7.8e-05 |
| 8 | covid rash in babies | 6.6e-05 |
| 9 | do babies get covid | 6.5e-05 |


This set of level one queries was then passed into the API, and the same process was followed to pull the follow-up terms (level two, level three, and so on) until the API yielded no further data.

Following the same protocol, let's take a look at Canada's level one queries.

In [4]:
cleaned_level_one_queries_CA = pd.read_csv("cleaned_level_one_queries_CA.csv").drop(columns=['Unnamed: 0', 'Country'])
cleaned_level_one_queries_CA

| | Related Query | Relative Search Volume |
|---|---|---|
| 0 | babies and covid 19 | 0.002843 |
| 1 | can babies get covid | 0.002399 |
| 2 | can babies get covid 19 | 0.001155 |
| 3 | covid 19 babies | 0.012589 |
| 4 | covid 19 symptoms | 0.959913 |
| 5 | covid and babies | 0.005159 |
| 6 | covid in babies | 0.009931 |
| 7 | covid in babies symptoms | 0.004036 |
| 8 | covid symptoms in babies | 0.004036 |
| 9 | signs of covid 19 in babies | 0.000857 |
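A minimal way to line up the two countries' level-one queries is an inner join on the query text. The values below are copied from the U.S. and Canada level-one tables. The column names `US`, `CA`, and `CA / US` are choices made for this sketch. Matching on exact strings is an assumption: it misses near-variants such as "babies and covid" versus "covid and babies", so the team's notion of "similar" terms may have been broader.

```python
import pandas as pd

# Values copied from the cleaned level-one tables for the U.S. and Canada.
us = pd.DataFrame({
    "Related Query": ["covid in babies", "covid 19 babies", "covid symptoms babies",
                      "covid 19 in babies", "babies and covid", "babies with covid",
                      "can babies get covid", "texas babies covid",
                      "covid rash in babies", "do babies get covid"],
    "US": [0.00079, 0.000664, 0.000427, 0.000283, 0.000266,
           0.000225, 0.000116, 7.8e-05, 6.6e-05, 6.5e-05],
})
ca = pd.DataFrame({
    "Related Query": ["babies and covid 19", "can babies get covid",
                      "can babies get covid 19", "covid 19 babies", "covid 19 symptoms",
                      "covid and babies", "covid in babies", "covid in babies symptoms",
                      "covid symptoms in babies", "signs of covid 19 in babies"],
    "CA": [0.002843, 0.002399, 0.001155, 0.012589, 0.959913,
           0.005159, 0.009931, 0.004036, 0.004036, 0.000857],
})

# Exact-string inner join: only queries that appear verbatim in both lists survive.
shared = us.merge(ca, on="Related Query", how="inner")
# How many times larger the Canadian relative search volume is.
shared["CA / US"] = shared["CA"] / shared["US"]
print(shared)
```

For a comparison bar plot, `shared.plot.bar(x="Related Query", y=["US", "CA"])` draws the two countries side by side for each shared query.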


Lastly, the websites for the U.S. level-one queries were pulled, as shown below.

In [5]:
websites_US = pd.read_csv('websites.csv').drop(columns = ['Unnamed: 0'])
websites_US

| | query | link | displayLink | site_probability |
|---|---|---|---|---|
| 0 | covid in babies | https://www.mayoclinic.org/diseases-conditions... | www.mayoclinic.org | 0.000276 |
| 1 | covid in babies | https://www.hopkinsmedicine.org/health/conditi... | www.hopkinsmedicine.org | 0.000158 |
| 2 | covid in babies | https://www.cdc.gov/coronavirus/2019-ncov/need... | www.cdc.gov | 0.000118 |
| 3 | covid 19 in babies | https://www.mayoclinic.org/diseases-conditions... | www.mayoclinic.org | 0.000103 |
| 4 | babies and covid | https://www.mayoclinic.org/diseases-conditions... | www.mayoclinic.org | 9.4e-05 |
| 5 | babies with covid | https://www.mayoclinic.org/diseases-conditions... | www.mayoclinic.org | 8.1e-05 |
| 6 | covid in babies | https://www.healthline.com/health/baby/coronav... | www.healthline.com | 6.3e-05 |
| 7 | covid 19 in babies | https://www.hopkinsmedicine.org/health/conditi... | www.hopkinsmedicine.org | 5.9e-05 |
| 8 | covid in babies | https://www.cdc.gov/coronavirus/2019-ncov/hcp/... | www.cdc.gov | 5.5e-05 |
| 9 | babies and covid | https://www.cdc.gov/coronavirus/2019-ncov/need... | www.cdc.gov | 5.4e-05 |
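To rank sites rather than individual pages, one option is to group the rows by domain. This sketch uses only the ten rows shown in the table above, not the full `websites.csv`. It assumes that summing `site_probability` per `displayLink` is a sensible way to pool pages and queries; the source does not say how the bar plot aggregated them.

```python
import pandas as pd

# The ten rows of the U.S. websites table shown above.
websites = pd.DataFrame({
    "query": ["covid in babies", "covid in babies", "covid in babies",
              "covid 19 in babies", "babies and covid", "babies with covid",
              "covid in babies", "covid 19 in babies", "covid in babies",
              "babies and covid"],
    "displayLink": ["www.mayoclinic.org", "www.hopkinsmedicine.org", "www.cdc.gov",
                    "www.mayoclinic.org", "www.mayoclinic.org", "www.mayoclinic.org",
                    "www.healthline.com", "www.hopkinsmedicine.org", "www.cdc.gov",
                    "www.cdc.gov"],
    "site_probability": [0.000276, 0.000158, 0.000118, 0.000103, 9.4e-05,
                         8.1e-05, 6.3e-05, 5.9e-05, 5.5e-05, 5.4e-05],
})

# Assumption: summing site_probability per domain pools every query and
# page that points to the same site.
by_site = (
    websites.groupby("displayLink")["site_probability"]
    .sum()
    .sort_values(ascending=False)
)
print(by_site)
```

Calling `by_site.plot.barh()` would draw a horizontal bar chart of these totals. On these ten rows, Mayo Clinic comes out on top, which is consistent with the Results in the abstract.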


### Results

### Data Team

#### User Behavior (US)
link: https://drive.google.com/file/d/1Y81BINhnHP0whHnrAoYweuIHKA83WFoF/view?usp=sharing

<img src="covid_babies_final.png" width="100%">

#### User Behavior (Canada)

link: https://drive.google.com/file/d/10_ViQsiQKdBtLpBLfhsIR_coeLo5Zb0l/view?usp=sharing

<img src="canada.png" width="100%">

#### Top Websites (US)

<img src="websites.png" width="100%">

### Google Trends

<img src="image.png" width="100%">

<img src="image (1).png" width="100%">

<img src="image (2).png" width="100%">

### Literature Search 

In [9]:
lit_search = pd.read_csv("Literature_Review_Babies_COVID.csv")
lit_search

| | Topic | Author | Year | Reseach Question | Study Design | Population (Who's participating) | Sample Size (N) | Main Results (Findings) | Conclusions (what did the study find) | Comments (issues, limitations...) | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Covid-19 in pregnant women and babies: What pe... | Henry Rozycki and Sailesh Kotecha | 6/13/20 | What are medical considerations for mothers an... | Literature Review | | | - Covid-19 infection in babies and infants of ... | -viral carriage orevalence based on universal ... | -no risks for breastfeding and for milk banks ... | https://www.sciencedirect.com/science/article/... |
| 1 | COVID-19 in babies: Knowledge for neonatal care | JanetGreenaJuliaPettybPatriciaBromleycKarenWal... | 1-Oct-20 | What are the paths for neonatal care for babie... | Literature Review | lit review, no particular population | N/A; lit review | -To date, emerging evidence to support vertica... | -While babies have been infected, the naivete ... | | https://www.sciencedirect.com/science/article/... |
| 2 | Coronavirus (COVID 19) Infection in Pregnancy | Edgar Iván Ortiz et al. | Jun-20 | What are the general guidelines for managing c... | | | | -There are no intrauterine infection confirmed... | -others and newborn babies might be allowed to... | some findings are different from most studies ... | http://www.scielo.org.co/scielo.php?script=sci... |
| 3 | Manifestations in Neonates Born to COVID-19 Po... | Parul Jain, Anup Thakur, Neelam Kler and Panka... | 5-Jun-20 | | lit reviews of cases studies | 2 babies | 2 | #NAME? | | | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7... |
| 4 | Breastfeeding mothers with COVID-19 infection:... | Augusto Pereira et al | 8-Aug-20 | What should infected mothers do about breastfe... | 22 case studies of newborns to mothers with CO... | newborns to mothers with covid-19 | 22 | - Out of 22 mothers, 20 (90.9%) chose to breas... | - breastfeeding in newborns of mothers with CO... | | https://internationalbreastfeedingjournal.biom... |
| 5 | texas babies covid' search term -> there wer... | https://www.cnn.com/2020/07/18/health/texas-in... | the county had a huge spike in sept 21 and cas... | | | | | | | | https://abcnews.go.com/Health/covid-19-cases-i... |
| 6 | Clinical characteristics and intrauterine vert... | Huijun Chen, PhD et al | 13-Mar-20 | evaluate the clinical characteristics of COVID... | case review -tested amniotic fluid, cord blo... | 9 pregnant women with laboratory-confirmed COV... | 9 | -All nine patients had a caesarean section in ... | no evidence for intrauterine infection caused ... | | https://www.sciencedirect.com/science/article/... |
| 7 | Novel coronavirus infection in newborn babie... | Zhi-Jiang Zhang, Xue-Jie Yu, Tao Fu, Yu Liu, Y... | 8-Apr-20 | How does the infection look in newborns? | Retrospective study | 4 infections in newborn babies in China as of ... | 4 | -4 nucleic acid-confirmed neonatal infections ... | newborn babies are susceptible to SARS-CoV-2 i... | Informed consent was waived as part of a publi... | https://erj.ersjournals.com/content/early/2020... |
| 8 | Possible coronavirus disease 2019 pandemic and... | Marzollo, Roberto MD, et al | Sep-20 | Describing a vertical transmission case | Case Study | case of early-onset neonatal COVID-19 infectio... | 1 | -pregnant mother was admitted to hospital for ... | -This case confirms the possibility of transmi... | im asumming this was a vaginal birth and not a... | https://journals.lww.com/pidj/Fulltext/2020/09... |
| 9 | Epidemiology of COVID-19 Among Children in China | Yuanyuan Dong, et al | 1-Jun-20 | What are the epidemiological characteristics a... | Case Series | 2135 patients with COVID-19 reported to the CD... | 2135 | - 728 (34.1%) laboratory-confirmed cases and 1... | - Children of all ages appeared susceptible to... | - Unable to assess clinical characteristics of... | https://pediatrics.aappublications.org/content... |


### Notes

<b>Challenges:</b>
- NoneType error → prevented us from extending our research to the state level.
- Ambiguous goal
    - We initially set out to explore pregnancy and covid, but after weeks of trying to find interesting information, we decided to pursue babies and covid instead.
    - Had we chosen babies and covid from the start, we might have been able to develop our research further.
- Relative Search Index vs. Relative Search Volume
    - Struggled to understand the difference between the two and how to extract each.

<b>Interesting Information:</b>
- Much of the concern revolves around rashes in babies.
- Travolta cases
- Trend of covid and baby searches in TX
- Kawasaki disease
- Trend of these searches in NY
- Canada exceeded the US in relative search volume for all similar terms compared.


### Next Steps

- Formulate a hypothesis; literature team to research Canada.
