Convert the WinoBias dataset from .txt format to .conll format
Based on the Berkeley Coref system (please check their website for more information)
- Extract the sentences from winobias.txt (here, winobias.txt stands for any of the data files, e.g. anti_stereotyped_type1.txt.dev):
mkdir wino_sentences
python toSentences.py data/anti_stereotyped_type1.txt.dev wino_sentences/
- Run the Berkeley Coref preprocessing script on the extracted sentences (refer to the "preprocessing" section of the Berkeley Coref documentation).
- Add all the side information obtained from Berkeley Coref to our data:
mkdir wino_berkeley
python addCoref.py data/winobias.txt data/wino_preprocess/ wino_berkeley/
mkdir wino_conll
python toWino.py wino_berkeley/ wino_conll/
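
To make the sentence-extraction step more concrete, here is a minimal sketch of what turning one annotated line into a plain sentence could look like. It is not the code of toSentences.py. It assumes each input line starts with a numeric index and marks the coreferent mentions with square brackets, and the function name `strip_annotations` is hypothetical.

```python
import re


def strip_annotations(line: str) -> str:
    """Return the plain sentence from one annotated line.

    Assumption: the line looks like "<index> <text with [bracketed] mentions>".
    """
    # Drop the leading sentence index, e.g. "1 ".
    text = re.sub(r"^\s*\d+\s+", "", line.strip())
    # Remove the square brackets that mark the coreferent mentions.
    return text.replace("[", "").replace("]", "")


example = "1 [The developer] argued with the designer because [she] did not like the design."
print(strip_annotations(example))
# prints: The developer argued with the designer because she did not like the design.
```

The bracket positions are what the later steps need in order to write the coreference columns of the .conll files, so a real implementation would record them before stripping.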