In [1]:
!wget https://github.com/neuml/txtai/releases/download/v1.1.0/tests.gz
!gunzip tests.gz
!mv tests ../articles.sqlite


In [2]:
!pip install txtai sentence-transformers pandas regex



In [3]:
import sqlite3

import regex as re

from txtai.embeddings import Embeddings
from txtai.pipeline import Tokenizer


def stream():
  # Connection to database file
  db = sqlite3.connect("../articles.sqlite")
  cur = db.cursor()

  # Select tagged sentences without an NLP label. NLP labels are set for non-informative sentences.
  cur.execute("SELECT Id, Name, Text FROM sections WHERE (labels is null or labels NOT IN ('FRAGMENT', 'QUESTION')) AND tags is not null")

  count = 0
  for row in cur:
    # Unpack row
    uid, name, text = row

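    # Skip background, introduction and reference sections, plus discussion
    # sections whose name does not mention "results" before "discussion"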
    if not name or not re.search(r"background|(?<!.*?results.*?)discussion|introduction|reference", name.lower()):
      # Tokenize text
      tokens = Tokenizer.tokenize(text)

      document = (uid, tokens, None)

      count += 1
      if count % 1000 == 0:
        print("Streamed %d documents" % (count), end="\r")

      # Skip documents with no tokens parsed
      if tokens:
        yield document

  print("Iterated over %d total rows" % (count))

  # Free database resources
  db.close()

# BM25 scoring + sentence-transformers vectors
embeddings = Embeddings({
    "method": "sentence-transformers",
    "path": "all-MiniLM-L6-v2",
    "scoring": "bm25"
})

embeddings.score(stream())
embeddings.index(stream())

Iterated over 21499 total rows
Iterated over 21499 total rows
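
The section filter in stream() relies on a variable-length lookbehind, which the third-party regex module accepts but the standard library re module does not. The sketch below expresses the same rule with the standard library only, assuming section names are single-line strings. The skip_section helper is hypothetical and is not part of txtai.

```python
import re


def skip_section(name):
    """Return True when stream() would skip a section with this name."""
    if not name:
        # Rows without a section name are always processed
        return False

    name = name.lower()
    if re.search(r"background|introduction|reference", name):
        return True

    # "discussion" only triggers a skip when "results" does not appear before it
    return any("results" not in name[:m.start()] for m in re.finditer("discussion", name))


for name in ["Introduction", "Discussion", "Results and Discussion", "Methods", None]:
    print(name, skip_section(name))
```

Under this rule a combined "Results and Discussion" section is kept, while a standalone "Discussion" section is skipped.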


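The "scoring": "bm25" setting asks txtai to build BM25 term weights from the streamed tokens. As a rough illustration of what such weights look like, here is a minimal Okapi BM25 sketch in plain Python. The bm25_weights function, the toy documents and the k1 and b values are assumptions for illustration; txtai's own implementation and parameters may differ.

```python
import math
from collections import Counter


def bm25_weights(documents, k1=1.2, b=0.75):
    """Compute a BM25 weight for every token in every tokenized document."""
    total = len(documents)
    avgdl = sum(len(tokens) for tokens in documents) / total

    # Document frequency: number of documents that contain each token
    df = Counter(token for tokens in documents for token in set(tokens))

    weights = []
    for tokens in documents:
        tf = Counter(tokens)

        # Length normalization term shared by all tokens in this document
        norm = k1 * (1 - b + b * len(tokens) / avgdl)

        weights.append({
            token: math.log(1 + (total - df[token] + 0.5) / (df[token] + 0.5)) * freq * (k1 + 1) / (freq + norm)
            for token, freq in tf.items()
        })

    return weights


docs = [["risk", "factors", "covid"], ["covid", "symptoms"], ["covid", "mental", "health"]]
weights = bm25_weights(docs)
print(weights[0])
```

Tokens that appear in every document, such as "covid" here, receive a much lower weight than rarer tokens such as "risk".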
In [7]:
import pandas as pd

pd.set_option("display.max_colwidth", None)

def search(query: str, topn: int = 5) -> pd.DataFrame:
    db = sqlite3.connect("../articles.sqlite")
    cur = db.cursor()

    results = []
    for uid, score in embeddings.search(query, topn):
      # Look up the parent article id and text of the matching section
      cur.execute("SELECT article, text FROM sections WHERE id = ?", [uid])
      article, text = cur.fetchone()

      # Combine article metadata with the matched text
      cur.execute("SELECT Title, Published, Reference FROM articles WHERE id = ?", [article])
      results.append(cur.fetchone() + (text,))

    db.close()
    df = pd.DataFrame(results, columns=["Title", "Published", "Reference", "Match"])
    return df


search("risk factors")

Unnamed: 0,Title,Published,Reference,Match
0,Prudently conduct the engineering and synthesis of the SARS-CoV-2 virus,2020-04-02 00:00:00,https://doi.org/10.1016/j.synbio.2020.03.002,"Furthermore, the ratio of benefits to risks is not only a scientific issue, but also involves differences in risk perception and value judgment of different subjects; the views about the risks involved often differ between the experts and the public."
1,"Security, Privacy and Risks Within Smart Cities: Literature Review and Development of a Smart City Interaction Framework",2020-07-21 00:00:00,https://doi.org/10.1007/s10796-020-10044-1,The study identified that some of the categories of threats such as socio-political risks have an impact on each other and argue that risk management and risk mitigation strategies are required to take a more holistic view of all threats and their interconnections instead of focusing on each type of risk separately.
2,"Associations with covid-19 hospitalisation amongst 406,793 adults: the UK Biobank prospective cohort study",2020-05-11 00:00:00,http://medrxiv.org/cgi/content/short/2020.05.06.20092957v1?rss=1,"In addition, many risk factors for covid-19 documented in the literature are highly correlated and it is not clear which may be independently related to risk."
3,"Associations with covid-19 hospitalisation amongst 406,793 adults: the UK Biobank prospective cohort study",2020-05-11 00:00:00,http://medrxiv.org/cgi/content/short/2020.05.06.20092957v1?rss=1,"The large numbers of covariables available in this cohort also enabled multivariable adjustment, permitting assessment of independent risk factors."
4,Adapting for Unique Settings,2020-06-14 00:00:00,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7292961/,The risk paradigm had not been challenged.
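
embeddings.search returns (id, score) pairs ordered from most to least similar to the query. The ranking step can be sketched with a toy example, assuming cosine similarity as the measure. The hand-written vectors and the top_n helper are hypothetical stand-ins for model output and the vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_n(query, vectors, n=5):
    """Return the n (id, score) pairs most similar to the query vector."""
    scores = [(uid, cosine(query, vector)) for uid, vector in vectors.items()]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:n]


# Toy 2-dimensional vectors keyed by section id
vectors = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
results = top_n([1.0, 0.1], vectors, n=2)
print(results)
```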


In [13]:
search("symptoms of covid")

Unnamed: 0,Title,Published,Reference,Match
0,"Risk factors associated with mortality in hospitalized patients with SARS-CoV-2 infection. A prospective, longitudinal, unicenter study in Reus, Spain",2020-05-29 00:00:00,https://doi.org/10.1101/2020.05.29.122986,Certain 4 60 clinical symptoms of COVID-19 have been reported previously.
1,COVID-19: what has been learned and to be learned about the novel coronavirus disease,2020-03-15 00:00:00,https://doi.org/10.7150/ijbs.45134,"Fever is often the major and initial symptom of COVID-19, which can be accompanied by no symptom or other symptoms such as dry cough, shortness of breath, muscle ache, dizziness, headache, sore throat, rhinorrhea, chest pain, diarrhea, nausea, and vomiting."
2,Data analytics for novel coronavirus disease,2020-06-15 00:00:00,https://doi.org/10.1016/j.imu.2020.100374,"The clinical symptoms, features, and parameters of COVID-19 are being investigated in a number of experiments and studies [ ]."
3,Being a front-line dentist during the Covid-19 pandemic: a literature review,2020-04-24 00:00:00,https://www.ncbi.nlm.nih.gov/pubmed/32341913/,Typical clinical manifestations of Covid-2019 do not comprise ocular symptoms.
4,Teleconsultation in primary ophthalmic emergencies during the COVID-19 lockdown in Paris: experience with 500 patients in March and April 2020,2020-06-10 00:00:00,https://www.ncbi.nlm.nih.gov/pubmed/32564983/,"Nearly all patients (480, 96%) had no suspicion of COVID-19 infection and 20 (4%) were suspected with symptoms or confirmed cases."


In [10]:
search("how covid affects mental health")

Unnamed: 0,Title,Published,Reference,Match
0,COVID‐19‐related self‐harm and suicidality among individuals with mental disorders,2020-07-30 00:00:00,https://www.ncbi.nlm.nih.gov/pubmed/32659855/,"describe how the COVID-19 pandemic may affect mental health and psychiatric care, and predict that suicide rates may increase because of the pandemic."
1,COVID‐19‐related self‐harm and suicidality among individuals with mental disorders,2020-07-30 00:00:00,https://www.ncbi.nlm.nih.gov/pubmed/32659855/,"2 Although the COVID-19 pandemic may affect the risk of suicide in populations at large, individuals living with a mental disorder may be at particularly elevated risk."
2,COVID-19 from the perspective of urban and rural general adult mental health services,2020-05-21 00:00:00,https://doi.org/10.1017/ipm.2020.62,"We should now also be considering how the mental health of the population will be affected following resolution of the COVID-19 crisis (Das, 2020)."
3,COVID-19 from the perspective of urban and rural general adult mental health services,2020-05-21 00:00:00,https://doi.org/10.1017/ipm.2020.62,"We should now also be considering how the mental health of the population will be affected following resolution of the COVID-19 crisis (Das, 2020) ."
4,Work-related and Personal Factors Associated with Mental Well-being during COVID-19 Response: A Survey of Health Care and Other Workers,2020-06-11 00:00:00,http://medrxiv.org/cgi/content/short/2020.06.09.20126722v1?rss=1,Prevention of exposure to COVID-19 and increased supervisor support are modifiable risk factors that may protect mental health and well-being.
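
Results 2 and 3 above are the same sentence and differ only by a space before the final period. A minimal sketch of one way to drop such near-duplicates, assuming each result row is a tuple whose last element is the match text, as built in search(). The normalize and dedupe helpers are hypothetical, and the sample rows are abbreviated.

```python
import re


def normalize(text):
    """Collapse whitespace, lowercase, and remove spaces before punctuation."""
    text = re.sub(r"\s+", " ", text).strip().lower()
    return re.sub(r"\s+([.,;:!?])", r"\1", text)


def dedupe(rows):
    """Keep the first row for each distinct normalized match text."""
    seen, unique = set(), []
    for row in rows:
        key = normalize(row[-1])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    return unique


rows = [
    ("Article A", "affected following resolution of the COVID-19 crisis (Das, 2020)."),
    ("Article A", "affected following resolution of the COVID-19 crisis (Das, 2020) ."),
    ("Article B", "Prevention of exposure to COVID-19 may protect mental health."),
]
unique = dedupe(rows)
print(len(unique))
```

Inside search(), the same function could be applied to results before the DataFrame is built. Requesting more than topn matches from embeddings.search would then keep the final row count stable.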
