Parsers

XML (from PDF) to TXT breakdown

Soup_final.ipynb takes a directory of XML files (PDFs parsed with GROBID) and creates one folder per XML file. Each folder contains text files, one for each section of the paper.
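The breakdown described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (split_sections, safe_name, breakdown), the output file naming scheme, and the assumption that each top-level div in the TEI body is one section with an optional head element are all hypothetical. It uses Beautiful Soup with the built-in "html.parser" so that no extra parser is required; passing "xml" instead (which needs lxml installed) is the more usual choice for TEI files.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup


def split_sections(xml_text, parser="html.parser"):
    """Return (heading, text) pairs found in one GROBID TEI document."""
    soup = BeautifulSoup(xml_text, parser)
    sections = []

    # The abstract lives in the TEI header, outside the body.
    abstract = soup.find("abstract")
    if abstract is not None:
        text = abstract.get_text(" ", strip=True)
        if text:
            sections.append(("abstract", text))

    # Assumption: every direct <div> child of <body> is one section.
    body = soup.find("body")
    if body is not None:
        for div in body.find_all("div", recursive=False):
            head = div.find("head")
            title = head.get_text(" ", strip=True) if head else "untitled"
            paragraphs = [p.get_text(" ", strip=True) for p in div.find_all("p")]
            sections.append((title, "\n".join(paragraphs)))
    return sections


def safe_name(title):
    """Turn a section heading into a file-system friendly name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower() or "section"


def breakdown(xml_dir, out_dir):
    """Create one folder per XML file, with one .txt file per section."""
    xml_dir, out_dir = Path(xml_dir), Path(out_dir)
    for xml_path in sorted(xml_dir.glob("*.xml")):
        # Strip ".tei.xml" (or plain ".xml") to get the folder name.
        stem = xml_path.name
        for suffix in (".tei.xml", ".xml"):
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
                break
        paper_dir = out_dir / stem
        paper_dir.mkdir(parents=True, exist_ok=True)

        sections = split_sections(xml_path.read_text(encoding="utf-8"), )
        for index, (title, text) in enumerate(sections):
            target = paper_dir / f"{index:02d}_{safe_name(title)}.txt"
            target.write_text(text, encoding="utf-8")
```

Under these assumptions, calling breakdown("xml_dir", "out_dir") on a directory containing sample.tei.xml would produce out_dir/sample/ with one numbered text file per section.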

Tools used: Beautiful Soup

Output Data: used in https://github.com/vmm221313/LongSumm


Repository: dchandak99/Parsers

Folders and files

Latest commit

History

Repository files navigation

Parsers

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages