# Search Engine For Candidate Sentences

### What do we know about non-pharmaceutical interventions?
##### COVID-19 Open Research Dataset Challenge (CORD-19)

Task Details
What do we know about the effectiveness of non-pharmaceutical interventions? 

What is known about equity and barriers to compliance for non-pharmaceutical interventions?

Specifically, we want to know what the literature reports about:
- Guidance on ways to scale up NPIs in a more coordinated way (e.g., establish funding, infrastructure and authorities to support real time, authoritative (qualified participants) collaboration with all states to gain consensus on consistent guidance and to mobilize resources to geographic areas where critical shortfalls are identified) to give us time to enhance our health care delivery system capacity to respond to an increase in cases.
- Rapid design and execution of experiments to examine and compare NPIs currently being implemented. DHS Centers for Excellence could potentially be leveraged to conduct these experiments.
- Rapid assessment of the likely efficacy of school closures, travel bans, bans on mass gatherings of various sizes, and other social distancing approaches.
- Methods to control the spread in communities, barriers to compliance and how these vary among different populations.
- Models of potential interventions to predict costs and benefits that take account of such factors as race, income, disability, age, geographic location, immigration status, housing status, employment status, and health insurance status.
- Policy changes necessary to enable the compliance of individuals with limited resources and the underserved with NPIs.
- Research on why people fail to comply with public health advice, even if they want to do so (e.g., social or financial costs may be too high).
- Research on the economic impact of this or any pandemic. This would include identifying policy and programmatic alternatives that lessen/mitigate risks to critical government services, food distribution and supplies, access to critical household supplies, and access to health diagnoses, treatment, and needed care, regardless of ability to pay.


## Demonstration of how to use the simple search engine for fetching relevant sentences

Let's import our search engine from the `src` directory.

First, the Python source directory has to be exposed to Jupyter Notebook through environment variables. If you haven't done this yet, please run these two commands BEFORE starting Jupyter Notebook:
1. `export PYTHONPATH=/path/to/covid19/src`
2. `export JUPYTER_PATH=/path/to/covid19/src`
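
If setting the environment variables before launching Jupyter is inconvenient, a similar effect on imports can usually be had from inside the notebook by prepending the source directory to `sys.path`. The sketch below assumes the same placeholder path as in the commands above; the helper name `add_src_to_path` is hypothetical and not part of this project.

```python
import os
import sys


def add_src_to_path(src_dir):
    """Prepend src_dir to sys.path (only once) so modules in it can be imported."""
    src_dir = os.path.abspath(src_dir)
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    return src_dir


# Placeholder path, same as in the export commands above.
add_src_to_path("/path/to/covid19/src")
```

This only changes the import path of the running kernel; it does not set `JUPYTER_PATH`.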

In [2]:
from searchengine import SearchEngine
from pprint import pprint
import os

In [3]:
data_dir = "/Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/"

Initialize our `SearchEngine` object with:
1. Sentences metadata
2. bi-gram model
3. tri-gram model
4. Trained FastText vectors
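
Since loading these four artifacts can take a while, it may be worth checking that they exist before constructing the object. The following is a small sketch of such a check, using the file names that appear in this notebook; `missing_files` and `REQUIRED_FILES` are hypothetical names introduced here for illustration.

```python
import os


def missing_files(data_dir, relative_paths):
    """Return the relative paths that do not exist as files under data_dir."""
    return [p for p in relative_paths if not os.path.isfile(os.path.join(data_dir, p))]


# File names as used in this notebook.
REQUIRED_FILES = [
    "sentences_with_metadata.csv",
    "covid_bigram_model_v0.pkl",
    "covid_trigram_model_v0.pkl",
    "fasttext_no_subwords_trigrams/word-vectors-100d.txt",
]

# Example call; replace "." with the notebook's data_dir.
for path in missing_files(".", REQUIRED_FILES):
    print(f"Missing: {path}")
```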

In [4]:
search_engine = SearchEngine(
    os.path.join(data_dir, "sentences_with_metadata.csv"),
    os.path.join(data_dir, "covid_bigram_model_v0.pkl"),
    os.path.join(data_dir, "covid_trigram_model_v0.pkl"),
    os.path.join(data_dir, "fasttext_no_subwords_trigrams/word-vectors-100d.txt"),
)

Loading CSV: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/sentences_with_metadata.csv and building mapping dictionary...
Finished loading CSV: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/sentences_with_metadata.csv and building mapping dictionary
Loaded 217389 sentences
Loading bi-gram model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/covid_bigram_model_v0.pkl


Finished loading bi-gram model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/covid_bigram_model_v0.pkl
Loading tri-gram model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/covid_trigram_model_v0.pkl
Finished loading tri-gram model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/covid_trigram_model_v0.pkl
Loading fasttext model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/fasttext_no_subwords_trigrams/word-vectors-100d.txt
Finished loading fasttext model: /Users/Sarah/Documents/VIPProjects/POC/COVID/Corpus/data/fasttext_no_subwords_trigrams/word-vectors-100d.txt


A simple wrapper function that takes a list of keywords to search for and prints the results:

In [5]:
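def search(keywords, optional_keywords=None, top_n=10, synonyms_threshold=0.8, only_sentences=False):
    """Run a query against the global `search_engine` and print the results.

    With only_sentences=True only the matching sentences are printed;
    otherwise the full result records are pretty-printed.
    """
    print(f"\nSearch for terms {keywords}\n\n")
    results = search_engine.search(
        keywords, optional_keywords=optional_keywords, top_n=top_n, synonyms_threshold=synonyms_threshold
    )
    print("\nResults:\n")

    if only_sentences:
        # Each result exposes its text under the 'sentence' key.
        for result in results:
            print(result['sentence'] + "\n")
    else:
        pprint(results)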
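
The `synonyms_threshold` argument suggests that query terms are expanded with neighbours from the FastText vectors whose similarity exceeds the threshold. How `SearchEngine` does this internally is not shown in this notebook, so the following is only a minimal sketch of the idea, using cosine similarity over a toy, made-up vocabulary. The function `expand_with_synonyms` and the vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_synonyms(terms, vectors, threshold=0.8):
    """Return terms plus every vocabulary word whose similarity to a term is >= threshold."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are kept but not expanded
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[term], vec) >= threshold:
                expanded.append(word)
    return expanded


# Toy 2-d vectors, invented for illustration only.
toy_vectors = {
    "guidance": [1.0, 0.1],
    "guidelines": [0.9, 0.2],
    "school": [0.0, 1.0],
}

print(expand_with_synonyms(["guidance", "consensus"], toy_vectors, threshold=0.8))
```

In this sketch a higher threshold yields fewer, closer expansions, and a lower one yields a broader but noisier term list.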


### Guidance on ways to scale up NPIs in a more coordinated way (e.g., establish funding, infrastructure and authorities to support real time, authoritative (qualified participants)

In [6]:
search(keywords=["non-pharmaceutical","equity","equal","compliance","balance"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=20, only_sentences=True)


Search for terms ['non-pharmaceutical', 'equity', 'equal', 'compliance', 'balance']





Search terms after cleaning, bigrams, trigrams and synonym expansion: ['nonpharmaceutical', 'equity', 'equal', 'compliance', 'balance', 'school_closures_social_distancing', 'pharmaceutical_measures', 'nonpharmaceutical_measures', 'community_mitigation', 'nonpharmaceutical_public_health_interventions', 'nonpharmaceutical_intervention', 'nonpharmaceutical_interventions_eg', 'nonpharmaceutical_interventions_npi', 'nonpharmacological_interventions', 'use_nonpharmaceutical_interventions', 'health_equity', 'social_equity', 'demandside', 'public_goods', 'financing', 'financial_risk', 'access_essential_medicines', 'sustainability']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_corona

#### Collaboration with all states to gain consensus on consistent guidance

In [21]:
search(keywords=["non-pharmaceutical","national","consensus","guidance"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=20, only_sentences=True)


Search for terms ['non-pharmaceutical', 'national', 'consensus', 'guidance']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['nonpharmaceutical', 'national', 'consensus', 'guidance', 'school_closures_social_distancing', 'pharmaceutical_measures', 'nonpharmaceutical_measures', 'community_mitigation', 'nonpharmaceutical_public_health_interventions', 'nonpharmaceutical_intervention', 'nonpharmaceutical_interventions_eg', 'nonpharmaceutical_interventions_npi', 'nonpharmacological_interventions', 'use_nonpharmaceutical_interventions', 'guidelines', 'international_guidelines']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

According

#### Mobilize resources to geographic areas where critical shortfalls are identified

In [29]:
search(["mobilize resource","delivery system", "health care"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=20, only_sentences=True)


Search for terms ['mobilize resource', 'delivery system', 'health care']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['mobilize', 'resource', 'delivery_system', 'health_care', 'resources', 'health_services', 'health_care_delivery', 'medical_care', 'healthcare', 'delivery_health_care', 'health_care_system', 'care', 'health_care_services']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

[7] [8] [9] [10] Infection transmission between COVID19 patients and healthcare workers has also been documented. 11 Given the current status of the COVID19 outbreak, the US Surgeon General, 12 Centers for Disease Control and Prevention (CDC), 
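
The cleaned search terms for this query show that `delivery system` and `health care` were joined into single tokens while `mobilize resource` stayed as two words, which is what one would expect if only word pairs known to the bi-gram model are merged. A minimal sketch of that behaviour, assuming a plain set of known pairs in place of the pickled bi-gram model (the function name and the set are hypothetical):

```python
def merge_known_phrases(tokens, known_bigrams):
    """Greedily join adjacent tokens with '_' when the pair is a known bigram."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in known_bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both tokens were consumed by the merged phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical phrase inventory; a trained bi-gram model would supply this.
known = {("delivery", "system"), ("health", "care")}

for query in ["mobilize resource", "delivery system", "health care"]:
    print(query, "->", merge_known_phrases(query.split(), known))
```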

#### Rapid design and execution of experiments to examine and compare NPIs currently being implemented.

In [45]:
search(["non-pharmaceutical", "rapid experiment","multidisciplinary","homeland security science"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=20, only_sentences=True)


Search for terms ['non-pharmaceutical', 'rapid experiment', 'multidisciplinary', 'homeland security science']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['nonpharmaceutical', 'rapid', 'experiment', 'multidisciplinary', 'homeland_security', 'science', 'school_closures_social_distancing', 'pharmaceutical_measures', 'nonpharmaceutical_measures', 'community_mitigation', 'nonpharmaceutical_public_health_interventions', 'nonpharmaceutical_intervention', 'nonpharmaceutical_interventions_eg', 'nonpharmaceutical_interventions_npi', 'nonpharmacological_interventions', 'use_nonpharmaceutical_interventions', 'interdisciplinary', 'collaborative_approach', 'pccm', 'multidisciplinary_teams', 'specialty_society', 'applied_epidemiology', 'multidisciplinary_research', 'interdisciplinary_team', 'community_practice', 'multidisciplinary_team', 's_emergency_operations', 'departments_health', 'response_system_mmrs', 'citizen_corps', 'euroatlantic', 'australian_cbrn_data', 'offic

#### Rapid assessment of the likely efficacy of school closures, travel bans, bans on mass gatherings of various sizes, and other social distancing approaches.

In [57]:
search(['travel_ban',"school",'social_distance'],
       optional_keywords=["efficacy", "effective", "assesment","control"],
       top_n=10, only_sentences=True)


Search for terms ['travel_ban', 'school', 'social_distance']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['travelban', 'school', 'socialdistance']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['efficacy', 'effective', 'assesment', 'control', 'effectiveness', 'most_effective', 'very_effective', 'highly_effective', 'symptom_scoring', 'yos', 'sinus_imaging', 'determination_etiology', 'prehct_screening', 'feasible_clinical_practice', 'rulingout', 'quantify_degree', 'can_used_processofcare', 'need_intravenous_fluid']

Results:

Incorporating social mixing patterns in different contexts and at different times of the week (weekend vs weekday) into mathematical models, it is possible to evaluate the potential effectiveness of a range of control measures targeting respiratory infections, including school closures [4, 14] and social distancing [11] .

Earlier studies on the effectiveness of spread control measures during infectious d
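
The cleaned terms for this query (`travelban`, `socialdistance`) and for earlier ones (`nonpharmaceutical`, `newcoronavirus`) are consistent with a cleaning step that lowercases each keyword and drops punctuation, including underscores and hyphens. The actual cleaning code is not shown in this notebook; the sketch below is an assumption that merely reproduces those observed outputs, and `clean_term` is a hypothetical name.

```python
import re


def clean_term(term):
    """Lowercase a term and drop every character except letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", term.lower()).strip()


for raw in ["travel_ban", "social_distance", "non-pharmaceutical", "travel ban"]:
    print(repr(raw), "->", repr(clean_term(raw)))
```

Under this assumption, writing a multi-word keyword with a space (as in `"delivery system"`) keeps its words separable for the n-gram step, whereas `travel_ban` collapses to `travelban`, for which the output of this query lists no expansions.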

#### Methods to control the spread in communities, barriers to compliance and how these vary among different populations

In [60]:
search(['community',"compliance",'population',"effective","control"],
       optional_keywords=[ "new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['community', 'compliance', 'population', 'effective', 'control']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['community', 'compliance', 'population', 'effective', 'control', 'communities', 'within_community', 'populations', 'most_effective', 'very_effective', 'highly_effective']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Population migration from rural communities to urban areas for employment, as well as the wild animal protection policy changes in China in recent years, have led to a perceived overall reduction in activities such as household animal raising and wildlife trade. 30, 31 Protective atti

#### Models of potential interventions to predict costs and benefits that take account of such factors as race, income, disability, age, geographic location, immigration status, housing status, employment status, and health insurance status.

In [67]:
search(["intervention", "benefit_cost","housing", "demographic","income", "age", "immigration","census"],
       optional_keywords=["model", "principal_component","data"],
       top_n=10, only_sentences=True)


Search for terms ['intervention', 'benefit_cost', 'housing', 'demographic', 'income', 'age', 'immigration', 'census']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['intervention', 'benefitcost', 'housing', 'demographic', 'income', 'age', 'immigration', 'census', 'interventions', 'formative_phase', 'fixed_transshipment_costs', 'industryspecific_factors', 'issue_relates', 'energy_carbon', 'feasibility_cost', 'public_pension_entitlement_unmet', 'tourism_demand_volatility', 'perceptionbased', 'airport_performance', 'demographics', 'demography', 'sociodemographic', 'incomes', 'household_income', 'earnings', 'premiums', 'disposable_income', 'individual_income', 'purchasing_power', 'chinese_visitors', 'outofpocket', 'family_income', 'emigration']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['model', 'principalcomponent', 'data', 'models']

Results:

Ratio from aggregate case data Assuming homogeneous attack rates across the differ

#### Policy changes necessary to enable the compliance of individuals with limited resources and the underserved with NPIs.

In [74]:
search([ "access","policy", "compliance","equitable"],
       optional_keywords=["NPIs", "non-pharmaceutical_intervention","npis"],
       top_n=10, only_sentences=True)


Search for terms ['access', 'policy', 'compliance', 'equitable']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['access', 'policy', 'compliance', 'equitable', 'policies', 'sustainable_financing', 'equitable_distribution', 'aff_ordable_prices', 'equitable_access_health_care', 'promotion_health', 'fi_nancing', 'universal_access', 'access_affordable', 'propoor', 'shared_responsibility']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['npis', 'nonpharmaceuticalintervention', 'npis', 'social_distancing', 'social_distancing_measures', 'pharmaceutical_interventions', 'school_closure_restraint', 'children_madang_attend_school', 'nonpharmaceutical_interventions', 'public_health_interventions', 'closing_schools', 'selfimposed_measures', 'quarantine_isolation', 'social_distancing', 'social_distancing_measures', 'pharmaceutical_interventions', 'school_closure_restraint', 'children_madang_attend_school', 'nonpharmaceutical_interventions', '

#### Research on why people fail to comply with public health advice, even if they want to do so (e.g., social or financial costs may be too high).

In [107]:
search(["fail_to_comply","social_burden", "health_advice","financial_cost", "socioeconomic"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['fail_to_comply', 'social_burden', 'health_advice', 'financial_cost', 'socioeconomic']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['failtocomply', 'socialburden', 'healthadvice', 'financialcost', 'socioeconomic', 'social_determinants', 'poverty_social', 'social_inequality', 'social_economic', 'societal_factors', 'socioeconomical', 'social_economic_factors', 'political_socioeconomic', 'socioeconomic_cultural', 'social_cultural_economic']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Every location has a different socio-economic profile such that the growth rate of the epidemic (and hence R 0 ) might diffe

#### Research on the economic impact of this or any pandemic.

In [133]:
search(["basic service","economic_impact", "total_cost","financial_cost","million"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['basic service', 'economic_impact', 'total_cost', 'financial_cost', 'million']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['basic', 'service', 'economicimpact', 'totalcost', 'financialcost', 'million', 'services', 'billion', '25_million', '10_million']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

It had allocated €2800 million to all regions for health services and created a new fund with €1000 million for priority health interventions. 4 However, these amounts need to be seen against the background of almost a decade of austerity from which the health system has yet to recover. 5 Third, in service deli

#### This would include identifying policy and programmatic alternatives that lessen/mitigate risks to critical government services, food distribution and supplies, access to critical household supplies, and access to health diagnoses, treatment, and needed care, regardless of ability to pay.

In [140]:
search(["mitigate risk", "access to basic","food security","necessary resource", "gini coefficient"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['mitigate risk', 'access to basic', 'food security', 'necessary resource', 'gini coefficient']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['mitigate_risk', 'access_basic', 'food_security', 'necessary', 'resource', 'gini_coefficient', 'need_implemented', 'measures_should_taken', 'mitigating_strategies', 'limit_spread_disease', 'reduce_threat', 'reduce_risks_associated', 'measures_needed', 'should_implement', 'implement_preventive', 'avoid_spread', 'inequitable_access', 'weak_infrastructure', 'basic_primary_care', 'affordable_health_care', 'lack_investment', 'basic_health_services', 'basic_health_care', 'public_financing', 'clean_water_sanitation', 'safe_drinking_water_sanitation', 'livelihoods', 'human_health_wellbeing', 'environmental_sustainability', 'ecosystem_services', 'livelihood', 'food_security_nutrition', 'human_welfare', 'human_wellbeing', 'food_supply', 'biodiversity_conservation', 'required', 'needed', 'essential', 'resources',