# BioGPT Local

This repo contains a demo for running BioGPT on a local machine.

A BioGPT-Large model with 1.5B parameters is coming. It is currently available for the PubMedQA task, with state-of-the-art (SOTA) accuracy of 81%. See Question Answering on PubMedQA for the evaluation.


## Getting Started

1. Open [http://192.168.206.126:6611/lab](http://192.168.206.126:6611/lab) in your browser (e.g., Chrome).

2. Pre-run: from the top panel, choose `Run` -> `Run all Markdown cells` to display the instructions.

3. Load the model: click into the `Block01 - loading model` cell, then choose `Run` -> `Run Selected Cells`.

4. Try your own text: click into the `Block02 - Run` cell, then choose `Run` -> `Run Selected Cells`.

**Cite**: BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

## Section 1. Using BioGPT pre-trained model

For an alternative way to try BioGPT, see [`microsoft/biogpt`](https://huggingface.co/microsoft/biogpt) on Hugging Face.
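A minimal sketch of the Hugging Face route, assuming `transformers` (a version that includes BioGPT), `torch` and `sacremoses` are installed. `BioGptTokenizer` and `BioGptForCausalLM` are the `transformers` classes for this model; the helper names are hypothetical, and mapping Block01's `min_len`/`max_len_b` and Block02's `beam=5` onto `min_length`/`max_length`/`num_beams` is our assumption, so outputs need not match the fairseq ones exactly.

```python
from typing import Dict, Tuple


def hf_generate_kwargs(beam: int = 5, min_len: int = 100, max_len: int = 1024) -> Dict[str, object]:
    """Translate the fairseq settings of Block01/Block02 into generate() keyword arguments.

    Assumed mapping: beam -> num_beams, min_len -> min_length, max_len_b -> max_length.
    """
    return {"num_beams": beam, "min_length": min_len, "max_length": max_len, "early_stopping": True}


def load_biogpt_hf(model_name: str = "microsoft/biogpt") -> Tuple[object, object]:
    """Hypothetical helper: load tokenizer and model once, then reuse them."""
    # Imported lazily so hf_generate_kwargs() works without transformers installed.
    from transformers import BioGptForCausalLM, BioGptTokenizer

    tokenizer = BioGptTokenizer.from_pretrained(model_name)  # needs the `sacremoses` package
    model = BioGptForCausalLM.from_pretrained(model_name)    # downloads weights on first use
    model.eval()
    return tokenizer, model


def generate_hf(tokenizer, model, prompt: str, **overrides) -> str:
    """Hypothetical helper: beam-search continuation of `prompt`."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt")
    kwargs = {**hf_generate_kwargs(), **overrides}
    with torch.no_grad():
        output_ids = model.generate(**inputs, **kwargs)
    # output_ids[0] is the returned sequence; it includes the prompt tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the model from the Hugging Face Hub):
# tokenizer, model = load_biogpt_hf()
# print(generate_hf(tokenizer, model, "COVID-19 is"))
```

Loading is split from generation so the model is loaded once, mirroring how Block01 keeps the fairseq model in `m`.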

Or run the following commands to run BioGPT on your local machine.

### 1.1 Loading model

Run the following cell to load the `BioGPT` model: **Block01 - loading model**.

Congratulations! If you see the following log, the model has loaded properly.

If not, post your log to ... (to do)

```
Loading codes from /home/wm/work/biogpt/BioGPT/data/bpecodes ...
Read 40000 codes from the codes file.
```
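If this log does not appear, a quick pre-flight check of the paths passed to `from_pretrained` in Block01 can narrow the problem down. The sketch below is hypothetical (the helper names are ours). It assumes the data directory holds fairseq's dictionary file `dict.txt` next to `bpecodes`, and that the fastBPE codes file stores one merge per line, so the count should match the `Read 40000 codes` line of the log.

```python
from pathlib import Path
from typing import List


def missing_biogpt_files(checkpoint_dir: str, data_dir: str,
                         checkpoint_file: str = "checkpoint.pt") -> List[str]:
    """Return the expected BioGPT files that are absent (hypothetical helper)."""
    expected = [
        Path(checkpoint_dir) / checkpoint_file,  # 1st and 2nd from_pretrained arguments
        Path(data_dir) / "dict.txt",             # assumption: dictionary read by the language_modeling task
        Path(data_dir) / "bpecodes",             # the bpe_codes argument
    ]
    return [str(path) for path in expected if not path.is_file()]


def count_bpe_codes(bpecodes_path: str) -> int:
    """Count merge operations, assuming one non-empty line per merge (fastBPE codes format)."""
    with open(bpecodes_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


# Usage with the paths from Block01:
# print(missing_biogpt_files("/home/wm/work/biogpt/BioGPT/checkpoints/Pre-trained-BioGPT",
#                            "/home/wm/work/biogpt/BioGPT/data"))
# print(count_bpe_codes("/home/wm/work/biogpt/BioGPT/data/bpecodes"))  # the log reports 40000
```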

```python
## Block01 - loading model
## Takes 1-2 minutes.
## Run this block only once per session; the loaded model is kept in `m`.

import torch
from fairseq.models.transformer_lm import TransformerLanguageModel

m = TransformerLanguageModel.from_pretrained(
        "/home/wm/work/biogpt/BioGPT/checkpoints/Pre-trained-BioGPT",  # directory holding the checkpoint
        "checkpoint.pt",                                               # checkpoint file name
        "/home/wm/work/biogpt/BioGPT/data",                            # data directory (dictionary)
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="/home/wm/work/biogpt/BioGPT/data/bpecodes",
        min_len=100,      # minimum generation length
        max_len_b=1024)   # maximum generation length (with the default max_len_a=0)
m.cuda()  # move the model to the GPU; requires a CUDA device
```

```
2023-02-14 16:44:02 | INFO | fairseq.file_utils | loading archive file /home/wm/work/biogpt/BioGPT/checkpoints/Pre-trained-BioGPT
2023-02-14 16:44:02 | INFO | fairseq.file_utils | loading archive file /home/wm/work/biogpt/BioGPT/data
2023-02-14 16:44:04 | INFO | fairseq.tasks.language_modeling | dictionary: 42384 types
2023-02-14 16:44:09 | INFO | fairseq.models.fairseq_model | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': True, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, '

GeneratorHubInterface(
  (models): ModuleList(
    (0): TransformerLanguageModel(
      (decoder): TransformerDecoder(
        (dropout_module): FairseqDropout()
        (embed_tokens): Embedding(42384, 1024, padding_idx=1)
        (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
        (layers): ModuleList(
          (0): TransformerDecoderLayerBase(
            (dropout_module): FairseqDropout()
            (self_attn): MultiheadAttention(
              (dropout_module): FairseqDropout()
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (activation_dropout_module): FairseqDropout()
            (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
```
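The printout above is cut off after the first layer's self-attention LayerNorm, so the sketch below only counts the parameters of the modules that are visible, using plain arithmetic on the printed shapes (the helper names are hypothetical). A `padding_idx` does not remove a row from an embedding table, so it does not change the count. The 1026 positional rows are consistent with fairseq's learned positional embeddings, which allocate `padding_idx + 1` extra rows: 1024 + 1 + 1 = 1026.

```python
def embedding_params(num_embeddings: int, dim: int) -> int:
    # One learnable dim-sized vector per row; the padding row is still stored.
    return num_embeddings * dim


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    # Weight matrix plus an optional bias vector.
    return in_features * out_features + (out_features if bias else 0)


def layer_norm_params(dim: int, elementwise_affine: bool = True) -> int:
    # A weight and a bias vector of size dim when elementwise_affine=True.
    return 2 * dim if elementwise_affine else 0


# Shapes copied from the printed GeneratorHubInterface.
visible = {
    "embed_tokens": embedding_params(42384, 1024),
    "embed_positions": embedding_params(1026, 1024),
    "layer0.self_attn": 4 * linear_params(1024, 1024),  # k_proj, v_proj, q_proj, out_proj
    "layer0.self_attn_layer_norm": layer_norm_params(1024),
}

for name, count in visible.items():
    print(f"{name:30s} {count:>12,d}")
```

This gives 43,401,216 parameters for `embed_tokens`, 1,050,624 for `embed_positions`, 4,198,400 for the first layer's attention projections and 2,048 for its LayerNorm. With the model loaded, `sum(p.numel() for p in m.models[0].parameters())` would give the full total, including the layers not shown.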

### 1.2 Input your text 

**You only need to run the following cell: Block02 - Run**

Prepare your text (English only) and use it to replace the example input, `"Omicron variants of SARS-CoV-2"`.

To try different text, update `my_input` and rerun the following cell; the model does not need to be reloaded.

1. Click into the following cell: **Block02 - Run**.

2. From the top panel, choose `Run` -> `Run Selected Cells` (keyboard shortcut: `Ctrl` + `Enter`).

```python
## Block02 - Run
my_input = "Omicron variants of SARS-CoV-2"  # replace with your own text

src_tokens = m.encode(my_input)                 # Moses-tokenize, apply BPE, map to token ids
generate = m.generate([src_tokens], beam=5)[0]  # beam search; hypotheses for the single input
output = m.decode(generate[0]["tokens"])        # turn the top hypothesis back into text
print(output)
```

```
Omicron variants of SARS-CoV-2 have been isolated from patients with severe acute respiratory syndrome (SARS) and have been shown to be highly pathogenic in mice and ferrets, suggesting that they may play a role in the pathogenesis of SARS-CoV-2 infection and the development of severe disease in patients with SARS-CoV-2 infection. (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14).
```
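Block02 handles one prompt at a time. To produce several outputs, such as the examples below, the same three calls can be wrapped in a loop. A minimal sketch: `generate_continuations` is a hypothetical helper, and `hub` stands for the `m` object loaded in Block01. It only uses the `encode`, `generate` and `decode` calls already shown in Block02 (fairseq's hub interface also has a `sample()` method that wraps these steps, if your version provides it).

```python
from typing import Any, Iterable, List, Tuple


def generate_continuations(hub: Any, prompts: Iterable[str], beam: int = 5) -> List[Tuple[str, str]]:
    """Run the Block02 steps (encode -> generate -> decode) for each prompt.

    `hub` is the object returned by TransformerLanguageModel.from_pretrained (`m` in Block01).
    """
    results = []
    for prompt in prompts:
        src_tokens = hub.encode(prompt)
        # generate() takes a list of token tensors and returns one list of hypotheses per input.
        hypotheses = hub.generate([src_tokens], beam=beam)[0]
        # As in Block02, keep the first hypothesis.
        results.append((prompt, hub.decode(hypotheses[0]["tokens"])))
    return results


# Usage in the notebook, after Block01:
# for prompt, text in generate_continuations(m, ["COVID-19", "SARS-CoV-2", "Bicalutamide"]):
#     print(prompt, "->", text)
```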


**Examples**


| # | Input | Output |
| - | ----- | ------ |
| 1 | COVID-19 | COVID-19 is a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), which has spread to more than 200 countries and territories, including the United States (US), Canada, Australia, New Zealand, the United Kingdom (UK) and the United States of America (USA), as of March 11, 2020, with more than 800 000 confirmed cases and more than 800 000 deaths. | 
| 2 | SARS-CoV-2 | SARS-CoV-2 is the causative agent of COVID-19, a severe acute respiratory syndrome (SARS) that has infected more than 390 000 people worldwide and killed more than 250 000 people. | 
| 3 | The treatment of COVID-19 | The treatment of COVID-19 with remdesivir (remdesivir for COVID-19) has been approved by the US Food and Drug Administration (FDA) for the treatment of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in patients with confirmed SARS-CoV-2 infection and is currently being evaluated in clinical trials for the treatment of COVID-19 in patients with confirmed SARS-CoV-2 infection, as well as in patients with suspected or confirmed SARS-CoV-2 infection. | 
| 4 | The drug that can treat COVID-19 is | The drug that can treat COVID-19 is hydroxychloroquine (HCQ), which has been shown to inhibit the replication of the SARS-CoV-2 virus in cell culture and in animal models of COVID-19, and has been approved by the US Food and Drug Administration (FDA) for the treatment of patients with COVID-19 in March 2020, and by the European Medicines Agency (EMA) for the treatment of patients with COVID-19 in April 2020, and by the European Medicines Agency (EMA) for the treatment of patients with COVID-19 in May 2020. | 
| 5 | Omicron variants of SARS-CoV-2 | Omicron variants of SARS-CoV-2 have been isolated from patients with severe acute respiratory syndrome (SARS) and have been shown to be highly pathogenic in mice and ferrets, suggesting that they may play a role in the pathogenesis of SARS-CoV-2 infection and the development of severe disease in patients with SARS-CoV-2 infection. | 
| 6 | Bicalutamide | Bicalutamide (Casodex) is an androgen receptor (AR) antagonist approved for the treatment of metastatic castration-resistant prostate cancer (mCRPC) in patients who have progressed on or are ineligible for docetaxel chemotherapy, as well as for the treatment of early-stage prostate cancer in men who have not progressed on or are ineligible for docetaxel chemotherapy, as well as for the treatment of metastatic castration-sensitive prostate cancer (mCSPC) in men who have not progressed on or are ineligible for docetaxel chemotherapy. | 
| 7 | Janus kinase 3 (JAK-3) | Janus kinase 3 (JAK-3) is a member of the Janus kinase (JAK) family of non-receptor tyrosine kinases and plays an important role in the regulation of cell proliferation, differentiation, survival, migration and angiogenesis. | 
| 8 | Apricitabine | Apricitabine is an oral prodrug of 5-aza-2 ’-deoxycytidine (5-aza-CdR), a DNA methyltransferase (DNMT) inhibitor, which has been approved by the US Food and Drug Administration (FDA) for the treatment of myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML) in combination with low-dose cytarabine (Ara-C) and granulocyte colony-stimulating factor (G-CSF) for patients with intermediate-2 or high-risk MDS or AML. | 
| 9 | Xylazine | Xylazine is an alpha 2-adrenoceptor agonist which has been used as a sedative and analgesic in veterinary medicine for many years, but its effects on the cardiovascular system have not been extensively studied in the dog, and its effects on the central nervous system (CNS) have not been well characterized in the dog, despite the fact that xylazine has been widely used as a sedative and analgesic in veterinary medicine for more than 30 years. | 
| 10 | Psoralen | Psoralen photochemotherapy (PUVA) is a well-established treatment for psoriasis, but its use is limited by the risk of skin cancer, particularly squamous cell carcinoma (SCC) of the head and neck (H & N), which is the most common site of skin cancer in the United States (US) | 
| 11 | CP-673451 | CP-673451 is a potent, selective, and orally active inhibitor of human neutrophil elastase (HNE) and human cathepsin G (CatG) with in vitro and in vivo anti-inf lammatory activity in a variety of animal models of inf lammation and in a model of acute lung injury (ALI) in the rat induced by intratracheal instillation of lipopolysaccharide (LPS) and tumor necrosis factor-alpha (TNF-alpha), a model of acute lung injury (ALI) in which neutrophils play an important role. | 
| 12 | BIIB-021 | BIIB-021 is a novel, orally active, non-peptide bradykinin B2 receptor antagonist with potent and long-lasting anti-inf lammatory activity in animal models of acute and chronic inf lammation and in a rat model of adjuvant-induced arthritis (AIA), an animal model of rheumatoid arthritis (RA) and in a rat model of collagen-induced arthritis (CIA), an animal model of collagen-induced arthritis (CIA), in which arthritis is induced by immunization with bovine type II collagen (CII). | 
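The raw output printed by Block02 ends with a run of `(1) (2) ... (14).` markers, whereas row 5 of the table shows the same text without them. The document does not say how the table was produced; the sketch below is one hypothetical way to strip such trailing markers and format rows in the table's `| # | Input | Output |` layout. The helper names are ours, and the regular expression targets any run of parenthesized numbers at the very end of the text, so a genuine trailing number such as `(2019).` would be removed as well.

```python
import re

# A run of "(1) (2) ... (n)" markers, plus an optional final period, at the end of the text.
_TRAILING_MARKERS = re.compile(r"(?:\s*\(\d+\))+\.?\s*$")


def strip_trailing_markers(text: str) -> str:
    """Remove trailing numbered markers such as ' (1) (2) (3).' (hypothetical cleanup step)."""
    return _TRAILING_MARKERS.sub("", text)


def _escape_cell(value: str) -> str:
    # A literal pipe would split a Markdown table cell, so escape it.
    return value.replace("|", "\\|")


def to_markdown_row(index: int, prompt: str, output: str) -> str:
    """Format one row in the | # | Input | Output | layout used by the examples table."""
    return f"| {index} | {_escape_cell(prompt)} | {_escape_cell(strip_trailing_markers(output))} |"
```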




