<a href="https://colab.research.google.com/github/cosmoshsv/CharCnn_Keras/blob/master/QA%20Model%20BioBERT%2BSQuAD2.0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Adapted from BERT TPU training tutorial by [Pragnakalp Techlabs](https://www.pragnakalp.com/).


## **BioBERT Fine-tuning on SQuAD 2.0 and Prediction on COVID-19 Biomedical Datasets using Cloud TPU**
**[Inferences on COVID-19 Open Research Dataset Challenge (CORD-19) Dataset](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge)**

---

[BioASQ](http://bioasq.org/participate/challenges): BioBERT was the winning model in the 7th BioASQ challenge. The fine-tuned BioBERT model was also trained on BioASQ data in this notebook, but only for a few epochs, so those inferences are not reported here.

### **Overview**
**BERT**, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. The academic paper can be found here: https://arxiv.org/abs/1810.04805.

**SQuAD** (Stanford Question Answering Dataset) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage; in SQuAD 2.0 a question may also be unanswerable.
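To make the span idea concrete, here is a minimal sketch of one paragraph in the SQuAD 2.0 layout. The context and questions are hand-written for illustration (they are not taken from the dataset), and `answer_span` is a hypothetical helper name.

```python
# A hand-written paragraph in the SQuAD 2.0 layout (illustrative, not from the dataset).
context = "BioBERT is pre-trained on PubMed abstracts and PMC full-text articles."
ANSWER = "PubMed abstracts and PMC full-text articles"

qas = [
    {
        "question": "What is BioBERT pre-trained on?",
        "is_impossible": False,
        # answer_start is a character offset into the context.
        "answers": [{"text": ANSWER, "answer_start": context.index(ANSWER)}],
    },
    {
        # SQuAD 2.0 adds questions that the passage cannot answer.
        "question": "Who funded BioBERT?",
        "is_impossible": True,
        "answers": [],
    },
]


def answer_span(context, qa):
    """Return the answer text cut out of the context, or None if unanswerable."""
    if qa["is_impossible"] or not qa["answers"]:
        return None
    first = qa["answers"][0]
    start = first["answer_start"]
    return context[start:start + len(first["text"])]


for qa in qas:
    print(qa["question"], "->", answer_span(context, qa))
```

The answer is never free text: it is always recoverable as a slice of the passage, which is why the model only has to predict a start and an end position.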

**[BioBERT](https://github.com/dmis-lab/biobert)**: The BioBERT repository provides code for fine-tuning BioBERT, a biomedical language representation model designed for biomedical text mining tasks such as biomedical named entity recognition, relation extraction, and question answering. See the paper [BioBERT: a pre-trained biomedical language representation model for biomedical text mining](https://academic.oup.com/bioinformatics/article/36/4/1234/5566506) for more details. BioBERT is developed by [DMIS-Lab](https://dmis.korea.ac.kr/).


### **Motivation**

> The main idea is to be able to ask questions about COVID-19 transmission, cures, therapeutics, vaccines, risk factors, and social and economic impacts.


### **Download the BioBERT Pretrained Model**


The BioBERT repo provides five versions of pre-trained weights. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google, and the training details are described in the BioBERT paper. The currently available versions of pre-trained weights are:

* **[BioBERT-Base v1.1 (+ PubMed 1M)](https://drive.google.com/file/d/1R84voFKHfWV9xjzeLzWBbmY1uOMYpnyD/view?usp=sharing)** - based on BERT-base-Cased (same vocabulary)
* **[BioBERT-Large v1.1 (+ PubMed 1M)](https://drive.google.com/file/d/1GJpGjQj6aZPV-EfbiQELpBkvlGtoKiyA/view?usp=sharing)** - based on BERT-large-Cased (custom 30k vocabulary), [NER/QA Results](https://github.com/dmis-lab/biobert/wiki/BioBERT-Large-Results)
* **[BioBERT-Base v1.0 (+ PubMed 200K)](https://drive.google.com/file/d/17j6pSKZt5TtJ8oQCDNIwlSZ0q5w7NNBg/view?usp=sharing)** - based on BERT-base-Cased (same vocabulary)
* **[BioBERT-Base v1.0 (+ PMC 270K)](https://drive.google.com/file/d/1LiAJklso-DCAJmBekRTVEvqUOfm0a9fX/view?usp=sharing)** - based on BERT-base-Cased (same vocabulary)
* **[BioBERT-Base v1.0 (+ PubMed 200K + PMC 270K)](https://drive.google.com/file/d/1jGUu2dWB1RaeXmezeJmdiPKQp3ZCmNb7/view?usp=sharing)** - based on BERT-base-Cased (same vocabulary)


I have used **[BioBERT-Large v1.1 (+ PubMed 1M)](https://drive.google.com/file/d/1GJpGjQj6aZPV-EfbiQELpBkvlGtoKiyA/view?usp=sharing)**

---




### **Clone the BERT GitHub repository**

**The BioBERT model has already been downloaded and uploaded to the GCS bucket. The BERT repository is still needed for its SQuAD-related files (such as `run_squad.py`).**



In [0]:
!git clone https://github.com/google-research/bert.git

Cloning into 'bert'...
remote: Enumerating objects: 340, done.
remote: Total 340 (delta 0), reused 0 (delta 0), pack-reused 340
Receiving objects: 100% (340/340), 300.28 KiB | 3.90 MiB/s, done.
Resolving deltas: 100% (185/185), done.


In [0]:
ls -l

total 62376
-rw-r--r-- 1 root root     2666 Apr 18 01:01 adc.json
drwxr-xr-x 3 root root     4096 Apr 18 01:05 bert/
-rw-r--r-- 1 root root   284109 Apr 18 00:59 BioASQ-test-factoid-4b-1.json
-rw-r--r-- 1 root root   161553 Apr 18 00:59 BioASQ-test-factoid-5b-1.json
-rw-r--r-- 1 root root   210897 Apr 18 00:59 BioASQ-test-factoid-6b-1.json
-rw-r--r-- 1 root root  6650199 Apr 18 00:56 BioASQ-train-factoid-4b.json
-rw-r--r-- 1 root root 10042717 Apr 18 00:58 BioASQ-train-factoid-5b.json
-rw-r--r-- 1 root root  4370528 Mar 23 02:33 dev-v2.0.json
drwxr-xr-x 1 root root     4096 Apr  3 16:24 sample_data/
-rw-r--r-- 1 root root 42123633 Mar 23 02:33 train-v2.0.json


In [0]:
cd bert

/content/bert


In [0]:
!pip install tensorflow==1.15.0

Collecting tensorflow==1.15.0
  Downloading https://files.pythonhosted.org/packages/3f/98/5a99af92fb911d7a88a0005ad55005f35b4c1ba8d75fba02df726cd936e6/tensorflow-1.15.0-cp36-cp36m-manylinux2010_x86_64.whl (412.3MB)
     |████████████████████████████████| 412.3MB 25kB/s 
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading https://files.pythonhosted.org/packages/1e/e9/d3d747a97f7188f48aa5eda486907f3b345cd409f0a0850468ba867db246/tensorboard-1.15.0-py3-none-any.whl (3.8MB)
     |████████████████████████████████| 3.8MB 38.6MB/s 
Collecting tensorflow-estimator==1.15.1
  Downloading https://files.pythonhosted.org/packages/de/62/2ee9cd74c9fa2fa450877847ba560b260f5d0fb70ee0595203082dafcc9d/tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503kB)
     |████████████████████████████████| 512kB 49.4MB/s 
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Buildi

### **BERT repository files**




In [0]:
ls -l

total 400
-rw-r--r-- 1 root root  1323 Apr 18 01:05 CONTRIBUTING.md
-rw-r--r-- 1 root root 16475 Apr 18 01:05 create_pretraining_data.py
-rw-r--r-- 1 root root 13898 Apr 18 01:05 extract_features.py
-rw-r--r-- 1 root root   616 Apr 18 01:05 __init__.py
-rw-r--r-- 1 root root 11358 Apr 18 01:05 LICENSE
-rw-r--r-- 1 root root 37922 Apr 18 01:05 modeling.py
-rw-r--r-- 1 root root  9191 Apr 18 01:05 modeling_test.py
-rw-r--r-- 1 root root 11242 Apr 18 01:05 multilingual.md
-rw-r--r-- 1 root root  6258 Apr 18 01:05 optimization.py
-rw-r--r-- 1 root root  1721 Apr 18 01:05 optimization_test.py
-rw-r--r-- 1 root root 66488 Apr 18 01:05 predicting_movie_reviews_with_bert_on_tf_hub.ipynb
-rw-r--r-- 1 root root 50519 Apr 18 01:05 README.md
-rw-r--r-- 1 root root   110 Apr 18 01:05 requirements.txt
-rw-r--r-- 1 root root 34783 Apr 18 01:05 run_classifier.py
-rw-r--r-- 1 root root 11426 Apr 18 01:05 run_classifier_with_tfhub.py
-rw-r--r-- 1 root root 18667 Apr 18 01:05 run_pretraining.py
-rw-r--r-

### **Download the SQuAD 2.0 Dataset**

In [0]:
# Download the SQuAD 2.0 train and dev datasets
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json

--2020-04-18 01:02:45--  https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42123633 (40M) [application/json]
Saving to: ‘train-v2.0.json’


2020-04-18 01:02:46 (57.6 MB/s) - ‘train-v2.0.json’ saved [42123633/42123633]

--2020-04-18 01:02:46--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.111.153, 185.199.108.153, 185.199.109.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.111.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4370528 (4.2M) [application/json]
Saving to: ‘dev-v2.0.json’


2020-04-18 01:02:46 (50.1 MB/s) - ‘dev-v2.0.json’ saved [4370528/4370528]
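A quick sanity check on the downloaded files is to count how many questions are marked unanswerable. The sketch below walks the SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`); `count_questions` is a hypothetical helper, the tiny `sample` dictionary is made up so the block runs on its own, and the file path is assumed to be the current directory.

```python
import json
import os


def count_questions(squad):
    """Return (total questions, questions flagged is_impossible)."""
    total = 0
    impossible = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                if qa.get("is_impossible", False):
                    impossible += 1
    return total, impossible


# Made-up miniature dataset so the sketch is self-contained.
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Some passage.",
            "qas": [
                {"id": "q1", "question": "Answerable?", "is_impossible": False},
                {"id": "q2", "question": "Unanswerable?", "is_impossible": True},
            ],
        }],
    }],
}
print(count_questions(sample))

# If the real file is present (path is an assumption), report its counts too.
if os.path.exists("train-v2.0.json"):
    with open("train-v2.0.json") as f:
        print(count_questions(json.load(f)))
```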



### **Set up TPU environment**


In [0]:
import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf

assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is => ', TPU_ADDRESS)

from google.colab import auth
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print('TPU devices:')
  pprint.pprint(session.list_devices())

  # Upload credentials to TPU.
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # tfio.gcs.configure_colab_session(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.

TPU address is =>  grpc://10.1.117.58:8470
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 2615474174161180305),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 16474022337975193864),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 14788761283896059928),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 17331643448068154837),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 5266438037643047317),
 _DeviceAttributes(/job:tpu_worker/replica:0/tas
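The TPU address printed above belongs to one Colab session and changes when the runtime is restarted. A minimal sketch of a helper that derives it from the environment and returns `None` instead of failing, so a caller can decide how to react; `tpu_address_from_env` is a hypothetical name, not part of Colab or TensorFlow.

```python
import os


def tpu_address_from_env(environ=None):
    """Build the grpc:// address from COLAB_TPU_ADDR, or return None if unset."""
    if environ is None:
        environ = os.environ
    addr = environ.get("COLAB_TPU_ADDR")
    if not addr:
        return None
    return "grpc://" + addr


address = tpu_address_from_env()
if address is None:
    print("No TPU runtime detected; switch the Colab runtime type to TPU.")
else:
    print("TPU address is =>", address)
```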

### **Create output directory** 


> The output directory is created in a GCS (Google Cloud Storage) bucket; the fine-tuned model is written there once training completes.

> The pre-trained model must also be copied to the GCS bucket, because the TPU cannot read from the local file system.




In [0]:
BUCKET = 'bertsquadcovid' #@param {type:"string"}
assert BUCKET, '*** Must specify an existing GCS bucket name ***'
output_dir_name = 'biobert_output' #@param {type:"string"}
BUCKET_NAME = 'gs://{}'.format(BUCKET)
OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET,output_dir_name)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

***** Model output directory: gs://bertsquadcovid/biobert_asq_output *****
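Since the pre-trained files have to live in the bucket, it can help to list what will be copied where before doing the upload. The sketch below only plans the copy; `plan_upload` is a hypothetical helper, the `biobert_large` prefix and file names mirror those referenced elsewhere in this notebook, and the actual transfer would be done separately (for example with `gsutil cp` or `tf.gfile.Copy`).

```python
import os
import tempfile


def plan_upload(local_dir, bucket, prefix):
    """Return (local_path, gcs_path) pairs for every regular file in local_dir."""
    pairs = []
    for name in sorted(os.listdir(local_dir)):
        src = os.path.join(local_dir, name)
        if os.path.isfile(src):
            pairs.append((src, "gs://{}/{}/{}".format(bucket, prefix, name)))
    return pairs


# Demo with empty placeholder files standing in for the real model files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["vocab_cased_pubmed_pmc_30k.txt",
                 "bert_config_bio_58k_large.json",
                 "bio_bert_large_1000k.ckpt.index"]:
        open(os.path.join(demo_dir, name), "w").close()
    for src, dst in plan_upload(demo_dir, "bertsquadcovid", "biobert_large"):
        print(os.path.basename(src), "->", dst)
```

Note that a TensorFlow checkpoint is several files sharing one prefix (`.index`, `.meta`, `.data-*`), so all of them need to be uploaded together.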


### **Training**





--init_checkpoint=$BUCKET_NAME/biobert_large/bio_bert_large_1000k.ckpt \

In [0]:
!python run_squad.py \
  --vocab_file=$BUCKET_NAME/biobert_large/vocab_cased_pubmed_pmc_30k.txt \
  --bert_config_file=$BUCKET_NAME/biobert_large/bert_config_bio_58k_large.json \
  --init_checkpoint=$OUTPUT_DIR/bio_bert_large_1000k.ckpt \
  --do_train=True \
  --train_file=/content/train-v2.0.json \
  --do_predict=True \
  --predict_file=/content/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=100 \
  --use_tpu=True \
  --tpu_name=$TPU_ADDRESS \
  --max_seq_length=384 \
  --doc_stride=128 \
  --version_2_with_negative=True \
  --output_dir=$OUTPUT_DIR




W0418 02:08:18.804683 140413433636736 module_wrapper.py:139] From run_squad.py:1127: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0418 02:08:18.804908 140413433636736 module_wrapper.py:139] From run_squad.py:1127: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0418 02:08:18.805089 140413433636736 module_wrapper.py:139] From /content/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0418 02:08:20.137794 140413433636736 module_wrapper.py:139] From run_squad.py:1133: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related op
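The batch size and epoch count determine how many optimizer steps training runs for, and therefore the step number in the final checkpoint name. The sketch below reproduces the arithmetic as I understand it from the BERT repository's `run_squad.py` (steps are the truncated product of examples / batch size × epochs, with a default warmup proportion of 0.1). The figure of 130,319 training questions for SQuAD 2.0 is an assumption taken from the dataset's published statistics, and `train_schedule` is a hypothetical helper.

```python
def train_schedule(num_examples, batch_size, epochs, warmup_proportion=0.1):
    """Return (total training steps, warmup steps), both truncated to integers."""
    steps = int(num_examples / batch_size * epochs)
    warmup = int(steps * warmup_proportion)
    return steps, warmup


SQUAD2_TRAIN_QUESTIONS = 130319  # assumption: published SQuAD 2.0 training set size

for epochs in (2, 100):
    steps, warmup = train_schedule(SQUAD2_TRAIN_QUESTIONS, batch_size=24, epochs=epochs)
    print("epochs={:>3}  steps={:>6}  warmup={:>5}".format(epochs, steps, warmup))
```

Under these assumptions, 2 epochs at batch size 24 give 10859 steps, which is the step number of the `model.ckpt-10859` checkpoint used in the prediction cell; 100 epochs would give 542995 steps.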

### **Create Input File**


> We create input_file.json as a blank JSON file and then write the data into it in SQuAD format.

> The context is taken from the CORD-19 corpus. The steps were:
  1. Topic modeling was run on the abstracts; once the model converged to coherent topics, the dominant topic was assigned to each abstract.
  2. For this example, an article that fell into the same topic cluster was selected.


In [0]:
!touch input_file.json

In [0]:
%%writefile input_file.json
{
    "version": "v2.0",
    "data": [
        {
            "title": "Abstract",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "Where did SARS-CoV-2 originate?",
                            "id": "ebbe2413-0ff7-4d4e-bdf7-defc98e60afc",
                            "is_impossible": ""
                        },
                        {
                            "question": "Results of Fisher Test?",
                            "id": "e4be707c-9216-461e-8b1e-0227289f78fd",
                            "is_impossible": ""
                        },
                        {
                            "question": "Are there any vaccines?",
                            "id": "304e0818-c263-44e5-a258-5f0d02a29874",
                            "is_impossible": ""
                        },
                        {
                            "question": "Possible Sources of HBoV are?",
                            "id": "4bd45c08-c593-4edf-b591-b6ed9f5760d4",
                            "is_impossible": ""
                        },
                        {
                            "question": "What are the possible signs of HBoV?",
                            "id": "7f4aa4aa-a241-4876-8485-ab547ae312dd",
                            "is_impossible": ""
                        },
                        {
                            "question": "What are the risk factors?",
                            "id": "683ff326-162d-48d5-8ff7-8544626b0e77",
                            "is_impossible": ""
                        }
                    ],
                    "context": "NPA samples were collected from 235 children hospitalized with ARTI at the First Hospital of Lanzhou University, Gansu Province, China during December 2007-November 2008. All patients were <15 years of age, and informed consent was obtained from their parents. Demographic data and clinical fi ndings were recorded. The study protocol was approved by the hospital ethics committee.\n DNA and RNA were extracted from the NPAs by using QIAamp DNA and QIAamp viral RNA mini kits (QIA-GEN, Beijing, China). The cDNA sample was synthesized by using random hexamer primers. A standard reverse transcription-PCR was used to screen for human rhinovirus, respiratory synctial virus (RSV), influenza virus A, influenza virus B, parainfluenza virus 1-3, human metapneumovirus, human coronavirus (HCoV)-NL63, and HCoV-HKU1, and PCR method was used to screen for adenovirus (ADV) .To screen for HBoV, PCR was performed by using primers 188F and 542R, as described by Allander et al. . For HBoV2, nested PCR was performed with sf1/sr1 and sf2/sr2 primers, which amplified a 455-bp fragment of the partial NS1 gene, as described .In addition, we designed HBoV2 forward and reverse primers, which produced a 563-bp fragment of the NP1 gene of HBoV2. Positive and negative controls were included for each PCR. Purifi ed PCR products were sequenced by using SinoGenoMax. ClustalX (ftp://ftp-igbmc.u-strasbg.fr/pub/ ClustalX) was used to align the obtained sequences with sequences available in GenBank.\n In total, 260 viruses were identifi ed in 196 (83.4%) of the 235 children. Using nested PCR, we found 21 positive specimens; further nucleotide sequence analysis showed that 10 (4.3%) were HBoV2 and 11 were HBoV (Figure, panel A). All 11 HBoV strains detected by using HBoV2 nested-PCR were included in the 18 HBoV-positive patients as determined by PCR using primers 188F and 542R. 
Of the 10 HBoV2-positive patients, 7 (70%) were co-infected with other respiratory viruses, including 4 patients with RSV. Of the 18 HBoV-positive patients, 12 (66.7%) displayed co-infections. There were no statistically signifi cant differences in the HBoV2 and HBoV detection (p = 0.119 by χ 2 test) and co-infection (p = 1.000 by Fisher exact test) rates.\n Of the 10 HBoV2-positive patients, 9 were male and 1 was female (χ 2 = 1.957, p = 0.162). The median age was 8.5 months, and 9/10 (90%) were <3 years old. HBoV2 infections were detected throughout the year. Of the 18 patients who were HBoV positive, 11 were male and 7 were female (χ 2 = 0.084, p = 0.772). The median age of patients was 11.5 months, and 16/18 (88.9%) were <3 years old.\n HBoV infections were detected in every month except August, with peaks in December (3 cases) and January (4 cases). The main diagnoses of the 3 patients with HBoV2 monoinfection were acute asthmatic bronchopneumonia, bronchopneumonia, and acute upper respiratory tract infection in 1 patient each. For the 6 patients with HBoV monoinfection, the main diagnoses were acute asthmatic bronchopneumonia (4 cases) and bronchopneumonia (2 cases). The clinical signs and symptoms of HBoV2 and HBoV positive patients included cough, fever, sputum production, crack- les, wheezing, rhinorrhea, cyanosis, vomiting, and diarrhea . For patients with HBoV2 monoinfection, the median hospital stay was 11.3 days (range 4-23 days), and 2 had underlying illnesses (idiopathic pulmonary hemosiderosis and iron defi ciency anemia). The chest radiograph of 1 patient showed upper middle zone air-space shadows.\n For patients with HBoV monoinfection, the median hospital stay was 7.8 days (range 6-10 days), and none had underlying illnesses. Chest radiographs of 2 patients showed shadows in the left lung zone. 
Ten HBoV2 NS-1 sequences (455 bp) shared 98%-99% and 95%-96% nucleotide sequence identity and 99%-100% and 98%-99% deduced amino acid sequence identity with HBoV2 strain PK-2255 (FJ170279) and HBoV2 strain W153 (EU082213), respectively; These sequences also shared 81%-82% and 83.3%-84.4% nucleotide sequence identity and 90% and 88% deduced amino acid sequence identity with the HBoV prototype strain ST1 or ST2 (DQ000495 and DQ000496) and human bocavirus 3 strain W471 (EU918736), respectively. The 4 HBoV2 NP-1 sequences shared 98%-99% and 97.6%-98.3% nucleotide sequence identity and 98%-100% and 97%-100% deduced amino acid sequence identity with HBoV2 strain PK-2255 and HBoV2 strain W153, respectively, and shared 74%-78% and 69.7%-70.3% nucleotide sequence identity and 69%-73% and 58%-62% deduced amino acid sequence identity with the prototype strain ST1 or ST2 and human bocavirus 3 strain W471, respectively. The nucleotide and deduced amino acid sequences of NS-1 and NP-1 shared high identities (>97%) with the HBoV2 and HBoV sequences ( ). Phylogenetic analysis indicated that HBoV2 is more closely related to HBoV .\n Using nested PCR and sequencing, we identifi ed HBoV2 infections in 10 (4.3%) of 235 NPAs from children hospitalized with ARTI. Most of the patients were <3 years old. In HBoV2-positive patients, co-infection was high (70%), with RSV being the most common co-pathogen. Primers SN1 and SN2 were designed to detect the NP1 gene in the 10 HBoV2-positive patients. However, only 4 gave positive results, which occurred because of the low PCR sensitivity with this pair of primers and because the NP1 gene of HBoV2 is divergent, as described . Furthermore, as previous studies pointed out, potential recombination upstream from the NP1 gene may explain the lower detection. 
Phylogenetic analysis showed that the NS-1 region of the HBoV2 strain (LZ480 and LZ578) clustered closely with that of the HBoV2 PK-2255 strain (FJ170279), and the NP-1 region clustered closely with that of the HBoV2 W153 strain (EU082213), suggesting potential recombination in the HBoV2 strains ( . In addition, 11 HBoV sequences were amplifi ed by using nested-PCR for HBoV2. In the future, HBoV2-specifi c primers should be designed to investigate the prevalence of HBoV2 and its potential association with disease.\n We found no difference in the clinical symptoms or length of hospital stay between the groups with HBoV2 and HBoV monoinfection, as well as between the groups with HBoV2 monoinfection and HBoV2 co-infection ). Statistical analysis indicated that HBoV2 and HBoV coinfection obviously did not correlate with disease severity (data not shown). Two of 3 patients with HBoV2 monoinfection had diarrhea with no vomiting , and only 1 of 10 patients who were HBoV2 positive vomited. Further investigation is needed to exclude oral or inhaled gastric viruses as possible sources of NPA-associated HBoV2. Phylogenetic analysis showed a high degree of similarity between HBoV2 sequences found in China and those in other areas . Our results suggest that like HBoV, HBoV2 is distributed worldwide and may be associated with respiratory and enteric diseases. Additional studies are needed to confi rm the association between human bocavirus species (HBoV2 and HBoV) and respiratory tract infections or other diseases."
                 }
            ]
        }
    ]
}

Overwriting input_file.json
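Writing the file by hand works, but JSON strings may not contain raw line breaks or unescaped quotes, which is easy to get wrong with long pasted passages. A minimal sketch of generating the same structure with the `json` module, which escapes such characters automatically; `make_squad_input` is a hypothetical helper, the short context is a placeholder, and using a boolean for `is_impossible` follows the SQuAD 2.0 files rather than the cell above.

```python
import json
import os
import tempfile
import uuid


def make_squad_input(context, questions, title="Abstract"):
    """Wrap one context passage and a list of questions in the SQuAD 2.0 layout."""
    qas = [
        {"question": q, "id": str(uuid.uuid4()), "is_impossible": False}
        for q in questions
    ]
    return {
        "version": "v2.0",
        "data": [{"title": title, "paragraphs": [{"qas": qas, "context": context}]}],
    }


# Placeholder passage with a line break, to show that it is escaped on output.
doc = make_squad_input(
    "First paragraph of the article.\nSecond paragraph of the article.",
    ["Are there any vaccines?", "What are the risk factors?"],
)

# Write to a temporary location here; in the notebook this would be input_file.json.
path = os.path.join(tempfile.mkdtemp(), "input_file.json")
with open(path, "w") as f:
    json.dump(doc, f, indent=4)

with open(path) as f:
    assert json.load(f) == doc  # the file parses back to the same structure
print("wrote", path)
```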


### **Prediction**



In [0]:
!python run_squad.py \
  --vocab_file=$BUCKET_NAME/biobert_large/vocab_cased_pubmed_pmc_30k.txt \
  --bert_config_file=$BUCKET_NAME/biobert_large/bert_config_bio_58k_large.json \
  --init_checkpoint=$OUTPUT_DIR/model.ckpt-10859 \
  --do_train=False \
  --max_query_length=30  \
  --do_predict=True \
  --use_tpu=True \
  --tpu_name=$TPU_ADDRESS \
  --predict_file=input_file.json \
  --predict_batch_size=8 \
  --n_best_size=3 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=$OUTPUT_DIR/output/




W0418 01:59:38.910346 140328320477056 module_wrapper.py:139] From run_squad.py:1127: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0418 01:59:38.910575 140328320477056 module_wrapper.py:139] From run_squad.py:1127: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0418 01:59:38.910758 140328320477056 module_wrapper.py:139] From /content/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0418 01:59:40.186986 140328320477056 module_wrapper.py:139] From run_squad.py:1133: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related op
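As far as I know, `run_squad.py` writes its results into the output directory as `predictions.json` (best answer per question id) and `nbest_predictions.json` (a ranked list of candidates per id, each with `text` and `probability`). The sketch below pairs each question with its most probable candidate; `top_answers` is a hypothetical helper, and the `nbest` values are made-up placeholders rather than real model output. In the notebook the files sit in the GCS bucket, so they would first need to be copied locally or read through TensorFlow's file API.

```python
def top_answers(squad_input, nbest):
    """Return (question, best answer text, probability) for every question."""
    rows = []
    for article in squad_input["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                candidates = nbest.get(qa["id"], [])
                if candidates:
                    best = max(candidates, key=lambda c: c["probability"])
                    rows.append((qa["question"], best["text"], best["probability"]))
                else:
                    rows.append((qa["question"], None, None))
    return rows


# Placeholder data in the assumed layout (not real predictions).
squad_input = {"data": [{"paragraphs": [{"qas": [
    {"id": "q1", "question": "Example question one?"},
    {"id": "q2", "question": "Example question two?"},
]}]}]}
nbest = {
    "q1": [
        {"text": "candidate a", "probability": 0.7},
        {"text": "candidate b", "probability": 0.2},
    ],
}

for question, text, prob in top_answers(squad_input, nbest):
    print(question, "->", text, prob)
```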

In [0]:
from google.colab import drive
drive.mount('/content/drive')