XML (from PDF) to txt breakdown
Soup_final.ipynb takes a directory of XML files (PDFs parsed with GROBID) and creates one folder per XML file. The files inside each folder correspond to the different sections of the paper.
Tools used: Beautiful Soup
Output Data: used in https://github.com/vmm221313/LongSumm
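
The breakdown described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names (split_sections, safe_name, breakdown), the output file naming scheme, and the use of the "html.parser" backend are all assumptions. It also assumes the GROBID output stores each section as a <div> whose direct children are a <head> (the section title) and <p> paragraphs.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup


def split_sections(xml_text):
    """Return (heading, text) pairs, one per <div> that has a direct <head> child."""
    # Assumption: "html.parser" is used to avoid an extra dependency.
    # Beautiful Soup's "xml" parser (which requires lxml) is another option.
    soup = BeautifulSoup(xml_text, "html.parser")
    sections = []
    for div in soup.find_all("div"):
        head = div.find("head", recursive=False)
        if head is None:
            continue  # skip divs without a section title
        heading = head.get_text(" ", strip=True)
        # Only direct <p> children, so nested divs are not counted twice.
        paragraphs = [
            p.get_text(" ", strip=True)
            for p in div.find_all("p", recursive=False)
        ]
        sections.append((heading, "\n\n".join(paragraphs)))
    return sections


def safe_name(heading):
    """Turn a section heading into a string that is safe to use as a file name."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", heading).strip("_")
    return name or "untitled"


def breakdown(xml_dir, out_dir):
    """Create one folder per XML file, with one .txt file per section."""
    xml_dir, out_dir = Path(xml_dir), Path(out_dir)
    for xml_path in sorted(xml_dir.glob("*.xml")):
        paper_dir = out_dir / xml_path.stem
        paper_dir.mkdir(parents=True, exist_ok=True)
        sections = split_sections(xml_path.read_text(encoding="utf-8"))
        for i, (heading, text) in enumerate(sections, start=1):
            # Hypothetical naming scheme: running index plus cleaned heading.
            out_path = paper_dir / f"{i:02d}_{safe_name(heading)}.txt"
            out_path.write_text(text, encoding="utf-8")
```

Calling breakdown("xml_files", "sections") with these hypothetical directory names would write, for each xml_files/NAME.xml, a folder sections/NAME containing one text file per section.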