initial commit
armancohan committed Oct 29, 2018
1 parent 6c54644 commit c767d75
Showing 23 changed files with 4,990 additions and 2 deletions.
115 changes: 115 additions & 0 deletions .gitignore
@@ -0,0 +1,115 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
.DS_Store
# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
.idea
18 changes: 16 additions & 2 deletions README.md
@@ -1,3 +1,4 @@
This repository contains data and code for the NAACL 2018 paper ["A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents"](https://arxiv.org/abs/1804.05685). Please note that the code is not actively maintained.

#### Data

@@ -8,10 +9,23 @@ PubMed dataset: [Download](https://drive.google.com/file/d/1Sa3kip8IE0J1SkMivlgO

The datasets are rather large. You need about 5 GB of disk space to download them and about 15 GB of additional space to extract the files. Each `tar` file contains 4 files. `train.txt`, `val.txt`, and `test.txt` correspond to the training, validation, and test sets, respectively. These are text files in which each line is a JSON object corresponding to one scientific paper from arXiv or PubMed. The `vocab` file is a plaintext file containing the vocabulary.
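
As a minimal sketch of reading the one-JSON-object-per-line format described above, the snippet below streams a split file line by line instead of loading it whole, which matters given the file sizes. The helper names `iter_papers` and `count_papers` are hypothetical and not part of this repo, and the field names inside each JSON object are not listed here, so the sketch does not assume any.

```python
import json
from typing import Dict, Iterator


def iter_papers(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a dataset file.

    Hypothetical helper (not part of this repo); assumes UTF-8 text.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)


def count_papers(path: str) -> int:
    """Count the papers in a file without holding them all in memory."""
    return sum(1 for _ in iter_papers(path))
```

To see which fields a paper actually carries, inspect the first record, for example `sorted(next(iter_papers("train.txt")).keys())`, assuming `train.txt` has been extracted into the working directory.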

#### Code

The code is based on the pointer-generator network code by [See et al. (2017)](https://github.com/abisee/pointer-generator). Refer to their repo for documentation about the structure of the code.
You will need `Python 3.6` and `TensorFlow 1.5` to run the code. The code might run with later versions of TensorFlow, but this has not been tested. Check the other dependencies in the `requirements.txt` file. To run the code, unzip the files in the `data` directory and execute the run script: `./run.sh`.

#### References

If you find this paper or repo useful, please cite:
```
"A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents"
Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian
NAACL-HLT 2018
```

Another relevant reference is the pointer-generator network by See et al. (2017):
```
"Get to the point: Summarization with pointer-generator networks."
Abigail See, Peter J. Liu, and Christopher D. Manning.
ACL (2017).
```
Empty file added __init__.py
Empty file.
