
First things first:

conda activate p36

pip install --upgrade pip --user

Outcome: Customizes the popular repo of the same name (PreSumm) to do abstractive summarization of one's own text files.

Methods: Python 3 on Jupyter / Google Colab.

Do not mount Drive until you install the dependency in Step 2. (Under Google Drive > 'My Drive' is the 'PreSumm' clone; it's the original repo. Under 'bert_data' is pre-processed data. Under 'models', a pretrained model. Under 'stanford', the Stanford CoreNLP files. See the layout sketch below.)
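
For reference, a sketch of the expected Drive layout (folder names taken from this README; comments are illustrative):

PreSumm/
├── bert_data/      # pre-processed data
├── models/         # pretrained model (model_step_148000.pt)
├── stanford/       # Stanford CoreNLP files
├── raw_data/cnn/   # test.txt; outputs under bert/
├── logs/
└── src/            # run the Step 5 commands from here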

Step 1 - Open Colab. Use the account and password sent separately.

Training numbers are small for debugging. But if you need a GPU: Runtime -> Change runtime type -> Hardware accelerator -> GPU.

To cross-check the GPU (run in a Python cell):

import tensorflow as tf

tf.test.gpu_device_name()   # you should get '/device:GPU:0'
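
Since PreSumm itself runs on PyTorch, an equivalent check (a minimal sketch, not from the original repo):

import torch

torch.cuda.is_available()      # True when a GPU is visible
torch.cuda.get_device_name(0)  # e.g. 'Tesla K80'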

Step 2 - Upload pyrouge. It's the '.ipynb' sent separately. File > "Upload notebook". Then Runtime > "Run all". When successful it says "ran ~11 tests in ~5s." Then mount Google Drive (use the account and password sent separately), as shown below.
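
The standard Colab mount snippet ('/content/drive' is the Colab default mount point):

from google.colab import drive

drive.mount('/content/drive')   # follow the authorization prompt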

Step 3 - Requirements. In Colab, prefix shell commands with !.

Set the environment to Python 3.6 (e.g., conda activate p36).

pip install multiprocess --user
pip install tensorboardX --user
pip install pytorch-transformers==1.1.0 --user
pip install torch==1.1.0 --user
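
To sanity-check the installs afterwards (a minimal sketch; not part of the original steps):

import torch, tensorboardX, pytorch_transformers

print(torch.__version__)                 # expect 1.1.0
print(pytorch_transformers.__version__)  # expect 1.1.0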

Step 4 - Stanford CoreNLP. Use the bash cell below. Edit the path if needed.

%%bash
# Point CLASSPATH at the CoreNLP jar; the commented line is the Google Drive path.
#export CLASSPATH='/content/drive/My Drive/PreSumm/stanford/stanford-corenlp-3.9.2.jar'
export CLASSPATH=~/o3/PreSumm/stanford/stanford-corenlp-3.9.2.jar
# Smoke test: the tokenizer should print the tokens of this sentence.
echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer

Skip preprocessing; we're running on the pre-processed data and pretrained model above.

Step 5 - Test. Run the following under the src folder. In Colab, prefix the commands with !.

python process_test.py ~/o3/PreSumm/raw_data/cnn/test.txt

Note: clean out the .pt files under raw_data/cnn/bert, the logs, and any other .pt files, except model_step_148000.pt (see the sketch below).
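
A hedged cleanup sketch (paths assume the layout above and that you run from src; verify before deleting anything):

import glob, os

# Stale tensors from earlier runs, plus old logs.
for p in glob.glob('../raw_data/cnn/bert/*.pt') + glob.glob('../logs/*'):
    if os.path.isfile(p):
        os.remove(p)

# Old checkpoints, keeping the one we test from.
for p in glob.glob('../models/*.pt'):
    if os.path.basename(p) != 'model_step_148000.pt':
        os.remove(p)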

cd src

python train.py -task abs -mode test \
    -test_from ../models/model_step_148000.pt \
    -batch_size 3000 -test_batch_size 50 \
    -bert_data_path ../bert_data/cnndm \
    -log_file ../logs/val_abs_bert_cnndm \
    -sep_optim true -use_interval true -visible_gpus 0 \
    -max_pos 512 -max_length 800 -alpha 0.95 -min_length 200 \
    -result_path ../logs/abs_test_result
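
When it finishes, the generated summaries land at the -result_path. A sketch for peeking at them (the filename pattern, with the step number and a '.candidate' suffix, is an assumption; check ../logs/ for the exact names):

# Print the first few generated summaries.
with open('../logs/abs_test_result.148000.candidate') as f:
    for i, line in enumerate(f):
        print(line.strip())
        if i >= 2:
            break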