data-iitd/TechTrack

Source Code: The base code for this repository has been taken from https://github.com/allenai/propara

TechTrack

The TechTrack dataset tracks properties of a diverse set of entities in technical procedural documents. To understand the format of the dataset, refer to the ProPara dataset:

    Reasoning about Actions and State Changes by Injecting Commonsense Knowledge, Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark, EMNLP 2018

The models in this repository are built using AllenNLP, a PyTorch-based deep-learning NLP library.

  • ProLocal: A simple local model that takes a sentence and entity as input and predicts state changes happening to the entity.
  • Bert: A BERT-based classifier that takes a natural query and the step text as input and applies a linear classifier on top of the CLS embedding.
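
To make the "linear classifier on top of the CLS embedding" idea concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, the toy 2-dimensional embedding, and the label names are all hypothetical, and the `[CLS] query [SEP] step [SEP]` layout is the usual BERT sentence-pair convention, assumed here rather than confirmed by this README.

```python
import math
from typing import List, Sequence


def format_pair(query: str, step: str) -> str:
    # Assumed input layout: the standard BERT sentence-pair convention.
    return f"[CLS] {query} [SEP] {step} [SEP]"


def linear_head(cls_embedding: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    # One logit per class: dot product of a weight row with the CLS vector, plus bias.
    return [sum(w * x for w, x in zip(row, cls_embedding)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum logit first for numerical stability.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(cls_embedding: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float],
            labels: Sequence[str]) -> str:
    # Return the label whose class probability is highest.
    probs = softmax(linear_head(cls_embedding, weights, bias))
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


# Toy usage with made-up numbers and hypothetical label names.
toy_labels = ["label_a", "label_b"]
toy_prediction = predict([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0], toy_labels)
```

In the actual model the CLS vector would come from the BERT encoder and the weights would be learned during training; only the final classification step is illustrated here.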

The ProLocal and Bert models are described in our paper. The setups are also described briefly in dataset/README.md.

Setup Instruction

  1. Create the techtrack environment using Anaconda:
conda create -n techtrack python=3.7
  2. Activate the environment:
source activate techtrack
  3. Install the requirements in the environment:
pip install -r requirements.txt

Train your own models

Detailed instructions are given in the following READMEs:

Before training, copy the data files for the desired setup from dataset/ to data/Inputs. See the "Scripts to use" section for more instructions.

Scripts to use

Use the scripts in the root folder to train and test the models on the different datasets.

Helper script to move data files

For each setup i and model M, the data files are in dataset/setup_{i}/{M}/ and have to be placed in data/Inputs/. This can also be done with the helper script transport_data.sh, whose synopsis is:

./transport_data.sh setup-num model

where

  • setup-num is the setup number from {1,2,3}.
  • model is the model name. Choose from {bert,prolocal} for setup-num = 1 and {bert} for setup-num = 2, 3.
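
For readers who prefer to see the copy step spelled out, the following is a minimal Python sketch of what such a helper could do. It is not the contents of transport_data.sh: the function name `transport_data`, the `root` parameter, and the choice to copy only top-level files are assumptions made for illustration. The folder layout and the allowed setup/model combinations are taken from the description above.

```python
import shutil
from pathlib import Path
from typing import List

# Allowed models per setup, as listed above.
ALLOWED_MODELS = {1: {"bert", "prolocal"}, 2: {"bert"}, 3: {"bert"}}


def transport_data(setup_num: int, model: str, root: Path = Path(".")) -> List[str]:
    """Copy files from dataset/setup_{i}/{M}/ into data/Inputs/ (hypothetical helper)."""
    if setup_num not in ALLOWED_MODELS:
        raise ValueError(f"setup-num must be one of {sorted(ALLOWED_MODELS)}")
    if model not in ALLOWED_MODELS[setup_num]:
        raise ValueError(f"model {model!r} is not available for setup {setup_num}")

    source = root / "dataset" / f"setup_{setup_num}" / model
    target = root / "data" / "Inputs"
    if not source.is_dir():
        raise FileNotFoundError(f"missing source folder: {source}")
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    # Assumption: only top-level files are copied; subfolders are ignored.
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

Under these assumptions, calling `transport_data(1, "prolocal")` from the repository root would mirror `./transport_data.sh 1 prolocal`, while `transport_data(2, "prolocal")` would raise a `ValueError` because only bert is listed for setups 2 and 3.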

Data Processing Scripts

Use the scripts in the "data processing scripts" folder to parse data from WikiHow pages and to convert Brat output files into usable dataset formats:

  • State change type dataset for ProLocal model
  • Natural Query, step and change type dataset for Bert model

The raw subfolder also contains some undocumented intermediate and raw scripts. They are not required, but are kept in case they are needed.

About

Tracking Entity States in WikiHow Dataset
