From fe8054d088b0011cc95ca1de1e9c54c7e8be95a3 Mon Sep 17 00:00:00 2001
From: Arman Cohan
Date: Fri, 22 Mar 2019 00:53:20 -0700
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index d4875df..93829ad 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,12 @@ Two datasets of long and structured documents (scientific papers) are provided.
 ArXiv dataset: [Download](https://drive.google.com/file/d/1K2kDBTNXS2ikx9xKmi2Fy0Wsc5u_Lls0/view?usp=sharing)
 
 PubMed dataset: [Download](https://drive.google.com/file/d/1Sa3kip8IE0J1SkMivlgOwq1jBgOnzeny/view?usp=sharing)
 
-The datasets are rather large. You need about 5G disk space to download and about 15G additional space when extracting the files. Each `tar` file consists of 4 files. `train.txt`, `val.txt`, `test.txt` respectively correspond to the training, validation, and test sets. These files are text files where each line is a json object corresponding to one scientific paper from ArXiv or PubMed. Use the the following script: `scripts/json_to_bin.py` to convert these files into Tensorflow bin files that are used for training. The `vocab` file is a plaintext file for the vocabulary.
+The datasets are rather large. You need about 5 GB of disk space to download them and about 15 GB of additional space to extract the files. Each `tar` file contains 4 files: `train.txt`, `val.txt`, and `test.txt` correspond to the training, validation, and test sets. These are text files in which each line is a JSON object describing one scientific paper from ArXiv or PubMed. The `vocab` file is a plaintext vocabulary file.
 
 #### Code
 The code is based on the pointer-generator network code by [See et al. (2017)](https://github.com/abisee/pointer-generator). Refer to their repo for documentation about the structure of the code.
-You will need `python 3.6` and `Tensorflow 1.5` to run the code. The code might run with later versions of Tensorflow but it is not tested. Checkout other dependencies in `requirements.txt` file. To run the code unzip the files in the `data` directory and simply execute the run script: `./run.sh`.
+You will need `python 3.6` and `Tensorflow 1.5` to run the code. The code might run with later versions of Tensorflow, but this has not been tested. Check the other dependencies in the `requirements.txt` file. A small sample of the dataset is already provided in this repo. To run the code with the sample data, unzip the files in the `data` directory and execute the run script: `./run.sh`. To train the model on the full dataset, first convert the jsonlines files to binary using the following script: `scripts/json_to_bin.py`, then update the corresponding training data path in the `run.sh` script.
 
 #### References
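The new README text says each line of `train.txt`, `val.txt`, and `test.txt` is one JSON object (one paper). A minimal sketch of reading such a jsonlines file before handing it to `scripts/json_to_bin.py`; the diff does not specify the per-paper field names, so only generic parsing is shown:

```python
import json


def iter_papers(path):
    """Yield one paper (a dict) per non-empty line of a jsonlines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

For example, `list(iter_papers("data/train.txt"))` would load the sample training set into memory; for the full ArXiv or PubMed files, iterating lazily as above avoids holding all papers in RAM at once.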