# Search Engine For Candidate Sentences

## Demonstration of how to use the simple search engine for fetching relevant sentences

Let's import our search engine from the `src` directory.

First, make the project's Python sources visible to Jupyter Notebook by setting two environment variables. If you haven't done so already, run these two commands BEFORE starting Jupyter Notebook:
1. `export PYTHONPATH=/path/to/covid19/src`
2. `export JUPYTER_PATH=/path/to/covid19/src`
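
If exporting the variables is inconvenient, imports can usually be made to work from inside an already running notebook by prepending the directory to `sys.path`. The following is a minimal sketch under that assumption; `add_src_to_path` is a hypothetical helper that is not part of the project, and the path is the same placeholder used in the commands above.

```python
import os
import sys


def add_src_to_path(src_dir):
    """Prepend src_dir to sys.path so that its packages become importable."""
    src_dir = os.path.abspath(src_dir)
    # Avoid adding the same directory twice when the cell is re-run.
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    return src_dir


# Placeholder path taken from the export commands above; adjust to your checkout.
add_src_to_path("/path/to/covid19/src")
```

This only changes the import path of the current kernel. It does not set `JUPYTER_PATH`, so the export commands remain the documented setup.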

In [1]:
from search.searchengine import SearchEngine
from pprint import pprint
import os

unable to import 'smart_open.gcs', disabling that module


In [2]:
data_dir = "../../../workspace/kaggle/covid19/data"

Initialize our `SearchEngine` object with:
1. Sentences metadata
2. bi-gram model
3. tri-gram model
4. Trained FastText vectors

In [3]:
search_engine = SearchEngine(
    os.path.join(data_dir, "sentences_with_metadata.csv"),
    os.path.join(data_dir, "covid_bigram_model_v0.pkl"),
    os.path.join(data_dir, "covid_trigram_model_v0.pkl"),
    os.path.join(data_dir, "fasttext_no_subwords_trigrams/word-vectors-100d.txt"),
)

Loading CSV: ../../../workspace/kaggle/covid19/data/sentences_with_metadata.csv and building mapping dictionary...
Finished loading CSV: ../../../workspace/kaggle/covid19/data/sentences_with_metadata.csv and building mapping dictionary
Loaded 249343 sentences
Loading bi-gram model: ../../../workspace/kaggle/covid19/data/covid_bigram_model_v0.pkl
Finished loading bi-gram model: ../../../workspace/kaggle/covid19/data/covid_bigram_model_v0.pkl
Loading tri-gram model: ../../../workspace/kaggle/covid19/data/covid_trigram_model_v0.pkl
Finished loading tri-gram model: ../../../workspace/kaggle/covid19/data/covid_trigram_model_v0.pkl
Loading fasttext model: ../../../workspace/kaggle/covid19/data/fasttext_no_subwords_trigrams/word-vectors-100d.txt
Finished loading fasttext model: ../../../workspace/kaggle/covid19/data/fasttext_no_subwords_trigrams/word-vectors-100d.txt
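
The bi-gram and tri-gram models loaded above are what turn multi-word keywords such as "Sampling methods" into single tokens such as `sampling_methods` in the search logs further down. How the pickled models do this internally is not shown in this notebook, so the following is only an illustrative sketch: it assumes a fixed, hand-written set of known phrases (`KNOWN_PHRASES`, hypothetical) and joins adjacent tokens greedily, trying tri-grams before bi-grams.

```python
def merge_phrases(tokens, known_phrases):
    """Greedily join adjacent tokens that form a known phrase, longest first."""
    merged = []
    i = 0
    while i < len(tokens):
        for size in (3, 2):  # try tri-grams before bi-grams
            candidate = tuple(tokens[i:i + size])
            if len(candidate) == size and candidate in known_phrases:
                merged.append("_".join(candidate))
                i += size
                break
        else:
            # No phrase starts at this position: keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


# Toy phrase inventory (an assumption for illustration only);
# the real models learn their phrases from the corpus.
KNOWN_PHRASES = {
    ("sampling", "methods"),
    ("public", "health"),
    ("novel", "coronavirus", "outbreak"),
}

print(merge_phrases("sampling methods".split(), KNOWN_PHRASES))
print(merge_phrases("public health guidelines".split(), KNOWN_PHRASES))
```

Tokens that do not form a known phrase stay separate, which matches what the logs show for keywords like "rapid influenza test".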


A simple wrapper function that takes a list of keywords, runs the search, and prints the results:

In [4]:
def search(keywords, optional_keywords=None, top_n=10, synonyms_threshold=0.8, only_sentences=False):
    """Run a search and print either the matching sentences or the full result records."""
    print(f"\nSearch for terms {keywords}\n\n")
    results = search_engine.search(
        keywords, optional_keywords=optional_keywords, top_n=top_n, synonyms_threshold=synonyms_threshold
    )
    print("\nResults:\n")
    
    if only_sentences:
        for result in results:
            print(result['sentence'] + "\n")
    else:
        pprint(results)
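
The wrapper passes `synonyms_threshold` straight to `search_engine.search`; the notebook does not show how the engine uses it. One plausible reading, given the trained word vectors and the "synonym expansion" mentioned in the logs, is that a vocabulary word is added to the query when its cosine similarity to a search term meets the threshold. The sketch below illustrates that idea with hand-made two-dimensional vectors; `expand_with_synonyms` and `TOY_VECTORS` are hypothetical and are not the project's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_synonyms(terms, vectors, threshold=0.8):
    """Return the terms plus every vocabulary word similar enough to one of them."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are kept but not expanded
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[term], vec) >= threshold:
                expanded.append(word)
    return expanded


# Toy vectors (assumption); the real ones are the 100-dimensional FastText vectors.
TOY_VECTORS = {
    "capacity": [1.0, 0.1],
    "capability": [0.9, 0.2],
    "mutation": [0.0, 1.0],
}

print(expand_with_synonyms(["capacity"], TOY_VECTORS, threshold=0.8))
```

Under this reading, raising the threshold keeps the query closer to the original keywords, while lowering it pulls in more loosely related terms.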

Let's see some examples:

In [9]:
search(["demographic", "Sampling methods", "asymptomatic", "serosurveys", "convalescent samples", "screening", "ELISAs"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['demographic', 'Sampling methods', 'asymptomatic', 'serosurveys', 'convalescent samples', 'screening', 'ELISAs']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['demographic', 'sampling_methods', 'asymptomatic', 'serosurveys', 'convalescent_samples', 'screening', 'elisas', 'demographics', 'demography', 'sociodemographic', 'sampling_techniques', 'symptomatic', 'seroprevalence_studies', 'serological_surveys', 'serologic_surveys', 'seroepidemiological_studies', 'serological_surveillance', 'serologic_investigations', 'serosurvey', 'serological_studies', 'serological_investigations', 'convalescentphase_serum_sample', '≥_4fold_increase', 'increase_igg_titer', 'elisa_macelisa', 'acutephase_samples', 'convalescent_serology', 'serum_samples_taken', '4fold_increase_antibody_titer', 'fourfold_increase_titer', 'between_acuteand', 'elisa_tests', 'elisa', 'indirect_elisas', 'enzymelinked_immunosorbent_assays_elisas', 'two_elisas', 'enzymelinked_immunosorbe

In [12]:
search(["Denominators", "testing", "demographics", "sharing information"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['Denominators', 'testing', 'demographics', 'sharing information']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['denominators', 'testing', 'demographics', 'sharing_information', 'numerators', 'denominator_calculate', 'using_multiple_imputation', 'random_effects_models', 'burden_influenza_ah1n1pdm09', 'appropriate_denominators', 'fit_lognormal_distribution', 'reporting_completeness_proportions', 'sex_age_groups', 'national_estimates', 'demographic', 'demographic_characteristics', 'demographic_data', 'sociodemographic', 'questionnaire_addressed', 'demographic_features', 'informationsharing', 'communication_cooperation', 'electronic_communications', 'information_sharing', 'sharing_surveillance', 'datasharing', 'international_coordination', 'nhcmoh', 'sharing_information_between', 'ihr_2005_implementation']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19'

In [14]:
search(["mitigation", "government", "strategies"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['mitigation', 'government', 'strategies']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['mitigation', 'government', 'strategies', 'local_government', 'governments', 'central_government', 'national_government', 'governmental', 'local_agencies', 'government_s', 'stateowned_enterprises_soes', 'chinese_government', 'central_local_governments', 'strategy']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

As the virus spreads globally it is likely that government strategies will shift from containment and delay towards mitigation (4) .

4 As a result, some of the public health precautionary strategies are selfiniti

In [21]:
search(["existing diagnostic platforms", "burden"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['existing diagnostic platforms', 'burden']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['existing', 'diagnostic_platforms', 'burden', 'naatbased', 'tests_nats', 'nucleic_acidbased_amplification', 'simultaneously_detect_multiple', 'rapid_turnaround_times', 'dipstick_assays', 'detection_platforms', 'over_conventional_methods', 'detection_capabilities', 'poc_diagnosis']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Taken together, our observations suggest that any conclusion drawn, at present, about existing lineages and direction of viral spread, based on phylogenetic analysis of SARS-CoV-2 sequence data, i

In [27]:
search(["Recruit", "expertise", "capacity"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['Recruit', 'expertise', 'capacity']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['recruit', 'expertise', 'capacity', 'recruited', 'recruits', 'capability', 'capacities', 'capabilities']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

South Korea, as of writing, has the most extensive capabilities of testing individuals with a capacity of around 20,000 tests a day.

We assessed the required expertise and capacity for molecular detection of 2019-nCoV in specialised laboratories in 30 European Union/European Economic Area (EU/EEA) countries.

Organisations such as the Global Outbreak Alert and Response Network

In [29]:
search(["government", "best practices", "guidelines", "public", "public health"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['government', 'best practices', 'guidelines', 'public', 'public health']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['government', 'best_practices', 'guidelines', 'public', 'public_health', 'local_government', 'governments', 'central_government', 'national_government', 'governmental', 'local_agencies', 'government_s', 'stateowned_enterprises_soes', 'chinese_government', 'central_local_governments', 'recommendations', 'international_guidelines', 'national_guidelines', 'published_guidelines', 'guideline', 'guidance', 'standards_guidelines', 'newspapers_internet', 'medical_public_health']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuh

In [30]:
search(["point-of-care test", "rapid influenza test", "speed", "accuracy"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['point-of-care test', 'rapid influenza test', 'speed', 'accuracy']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['pointofcare_test', 'rapid', 'influenza', 'test', 'speed_accuracy', 'liat', 'alere_i', 'filmarray_respiratory_panel', 'verigene_respiratory', 'xpress_flursv', 'easytoperform', 'cliawaived', 'panel_rp', 'pointofcare_settings', 'cepheid_sunnyvale_ca_usa', 'influenza_virus', 'influenza_a', 'infl_uenza', 'influenza_a_b', 'tests', 'economical_method', 'speed_cost', 'accuracy_sensitivity', 'efficiency_speed', 'multiplexing_capacity', 'metagenomics_datasets', 'standardization_automation', 'automatization', 'sophisticated_tools', 'fast_turnaround_time']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_c

In [54]:
search(["PCR", "test", "longitudinal study"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['PCR', 'test', 'longitudinal study']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['pcr_test', 'longitudinal_study', 'pcr_tests']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

In subsequent research in this project, a longitudinal study should be conducted that uses a wider sample and measures the mental health status of medical personnel from multiple dimensions, which can help better identify the mutual influence between demographic data and mental health status.

346 347 The availability of rapid PCR tests would also be beneficial for case identification at arrival, and 348 would address concerns with f

In [59]:
search(["assays", "development", "issues", "private sector"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['assays', 'development', 'issues', 'private sector']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['assays', 'development', 'issues', 'private_sector', 'assay', 'developing', 'issues_related', 'concerns', 'issue', 'public_sector', 'privatesector', 'private_health_care', 'publicsector', 'forprofit', 'public_sectors', 'nongovernmental_organisations', 'fi_nanced', 'bilateral_aid', 'corporations']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

The Lancet Infectious Diseases Commission will discuss disruptive factors and how preparedness planning must consider this new ecology by exploring current preparedness p

In [60]:
search(["track", "tracking", "evolution", "mutations"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['track', 'tracking', 'evolution', 'mutations']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['track', 'tracking', 'evolution', 'mutations', 'evolutionary_process', 'mutation', 'point_mutations', 'substitutions', 'point_mutation', 'mutations_within']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Mutation plays an important role in the evolution of antibiotic resistance mechanisms in bacteria either by refining existing horizontally acquired genetic determinants (e.g. those encoding b-lactamases), or by giving rise to variant drug targets with decreased affinity for antibiotics through point mutations (e.g. 

In [62]:
search(["Latency", "viral load", "pathogen", "sampling issue"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['Latency', 'viral load', 'pathogen', 'sampling issue']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['latency', 'viral_load', 'pathogen', 'sampling', 'issue', 'viral_loads', 'virus_load', 'pathogens', 'issues']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

As a result, relying on R 0 alone is often misleading when comparing different pathogens or outbreaks of the same pathogen in different settings [13] [14] [15] .

The low level of genetic variation in SARS-CoV-2 could merely be a sampling issue due to the short time this virus has circulated in humans.

Thus, positive pathogen results (especially virus a

In [67]:
search(["cytokines", "severe"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['cytokines', 'severe']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['cytokines', 'severe', 'cytokines_chemokines', 'cytokine', 'other_cytokines', 'cytokines_including', 'inflammatory_cytokines']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

The proportion of CD8 + T reduction in the mild and severe group was 28.43% and 61.9%, 43 respectively; The proportion of B cell reduction was 25.49% and 28.57%; The proportion of NK 44 cell reduction was 34.31% and 47.62%; The detection value of IL-6 was 0 in 55.88% of the mild 45 group, mild group has a significantly lower proportion of patients with IL-6 higher than

In [68]:
search(["protocols", "screening", "testing"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=15, only_sentences=True)


Search for terms ['protocols', 'screening', 'testing']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['protocols', 'screening', 'testing', 'procedures', 'protocol']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Interventions: Screening and management of patients using a hospital-specific protocol, which included fever triage, monitoring visitors and patients, emergency response, personnel training for healthcare team members, health education for patients and family, medical materials management, disinfection and wastes disposal protocols. :

One important lesson gained from our recent experience suggests the need of updatin

In [76]:
search(["supplies", "tests"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['supplies', 'tests']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['supplies', 'tests', 'supply', 'test']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

If such tests were fast, there may be 349 potential to test suspected cases in real time based on questionnaire responses, travel origin, or 350 borderline symptoms.

11.20034363 doi: medRxiv preprint to the much larger efforts to test the population, with a ratio of close to 4 tests per 1000 inhabitants, compared with 0.066 tests per 1000 inhabitants for Japan.

Wang et al. also reported that some patients were negative in the first three tests and turned 

In [78]:
search(["Technology", "product", "development", "roadmap", "diagnostics"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['Technology', 'product', 'development', 'roadmap', 'diagnostics']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['technology', 'product_development', 'roadmap', 'diagnostics', 'technologies', 'development_manufacturing', 'road_map', 'blueprint', 'gvap', 'progress_challenges', 'mcmi', 'apsed_2010', 'strategic_goals', 'accelerate_research', 'roadmaps', 'address_gaps']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

77 An updated roadmap for MERS-CoV product development lists all available diagnostics and other diagnostics in the developmental phase.

Since the initial description of LAMP, a number of advancemen

In [81]:
search(["market forces", "diagnostic", "Coalition for Epidemic Preparedness Innovations", "funding"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['market forces', 'diagnostic', 'Coalition for Epidemic Preparedness Innovations', 'funding']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['market_forces', 'diagnostic', 'coalition_epidemic_preparedness_innovations', 'funding', 'free_market', 'insurance_medicare', 'international_intellectual_property', 'economic_policy', 'policy_priorities', 'cost_cutting', 'policy_goal', 'government_involvement', 'patent_system', 'revitalise', 'coalition_epidemic_preparedness_innovation', 'cepi_has', 'global_alliance', 'vaccine_institute_ivi', 'launched_global_health', 'global_health_security_agenda', 'us_agency_international', 'global_fund_fight_aids', 'who_rotary_international', 'vaccine_initiative', 'funds', 'funding_support', 'financial_support', 'research_funding', 'government_funding', 'fund', 'grants']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbrea

In [82]:
search(["technology", "platforms", "CRISPR", "response times", "holistic approaches"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['technology', 'platforms', 'CRISPR', 'response times', 'holistic approaches']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['technology_platforms', 'crispr', 'response', 'times', 'holistic_approaches', 'proteomic_technologies', 'enabling_technology', 'advances_nextgeneration', 'unbiased_nextgeneration_sequencing_ngs', 'manufacturing_technologies', 'advances_genomics', 'recent_technological', 'much_popularity', 'high_throughput_sequencing_technology', 'analytical_systems', 'crisprcas9', 'genome_editing', 'responses', 'collaborative_transdisciplinary', 'real_time_problem_solving', 'callaghan_2016', 'eids_biological_invasions', 'evidencedriven', 'collaborative_interdisciplinary', 'societal_relevance', 'seeks_address', 'challenge_lies', 'evolutionarily_enlightened']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019nco

In [84]:
search(["Coupling genomics", "large scale"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['Coupling genomics', 'large scale']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['coupling', 'genomics', 'large_scale', 'largescale']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

Emerging infectious diseases (EIDs) such as Ebola, influenza, SARS, MERS, and, most recently, coronavirus (2019-nCoV) cause large-scale mortality and morbidity, disrupt trade and travel networks, and stimulate civil unrest (9) .

Emerging infectious diseases (EIDs) such as Ebola, influenza, SARS, MERS, and, most recently, coronavirus (2019-nCoV) cause large-scale mortality and morbidity, disrupt trade and travel networks, and st

In [85]:
search(["rapid sequencing", "bioinformatics", "genome", "specificity"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['rapid sequencing', 'bioinformatics', 'genome', 'specificity']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['rapid', 'sequencing_bioinformatics', 'genome', 'specificity', 'highthroughput_sequencing_techniques', 'microarrays_deep_sequencing', 'advances_nextgeneration', 'highthroughput_sequencing_methods', 'combined_next_generation', 'massive_parallel_sequencing', 'sample_preparation_sequencing', 'benchtop_sequencing', 'highthroughput_sequencing_technologies', 'high_throughput_dna_sequencing', 'genomes']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

However, the fifth gene in the Betacoronavirus core genome

In [86]:
search(["sequencing", "analytics", "unknown pathogen", "distinguishing", "naturally-occurring pathogens", "intentional"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['sequencing', 'analytics', 'unknown pathogen', 'distinguishing', 'naturally-occurring pathogens', 'intentional']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['sequencing', 'analytics', 'unknown', 'pathogen', 'distinguishing', 'naturallyoccurring', 'pathogens', 'intentional', 'dna_sequencing', 'sanger_sequencing', 'direct_sequencing', 'realtime_online', 'smart_healthcare', 'pathogens', 'distinguishing_between', 'differentiating', 'potential_pathogens', 'infectious_agents', 'viruses_bacteria', 'viral_pathogens', 'bacterial_pathogens', 'microbes', 'bacteria_viruses', 'other_pathogens', 'pathogen', 'pathogenic_agents']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'nove

In [87]:
search(["One Health", "surveillance", "spillover", "bats", "wildlife", "future"],
       optional_keywords=["new_coronavirus", "coronavirus", "covid19"],
       top_n=10, only_sentences=True)


Search for terms ['One Health', 'surveillance', 'spillover', 'bats', 'wildlife', 'future']


Search terms after cleaning, bigrams, trigrams and synonym expansion: ['one_health', 'surveillance', 'spillover', 'bats', 'wildlife', 'future', 'onehealth', 'ecohealth', 'spillover_events', 'spillover_from', 'spillover_humans', 'spillover_infections', 'bat_species', 'fruit_bats', 'insectivorous_bats', 'species_bats', 'bat', 'wildlife_species', 'domestic_animal', 'wildlife_populations', 'wildlife_conservation', 'wild_animals']
Optional search terms after cleaning, bigrams, trigrams and synonym expansion: ['newcoronavirus', 'coronavirus_covid19', '2019ncov_covid19', 'outbreak_2019_novel', 'sarscov2_2019ncov', 'coronavirus_2019ncov', 'ongoing_outbreak_novel_coronavirus', 'since_late_december', 'ongoing_outbreak_covid19', 'originating_wuhan_china', 'novel_coronavirus_outbreak', 'wuhan_coronavirus']

Results:

To prevent the next epidemic and pandemic related to these interfaces, we call for resear