TopcoderToPDF Crawler

Downloads algorithmic problems from the Topcoder problem archive and compiles them into PDF files (one for SRMs and one for non-SRMs) for portability.

Note: This project is no longer maintained. If it does not work for you, you will need to fix it yourself.

Library requirements

pdfkit
PyPDF2

Note: After cloning the repository, run git submodule update --init --recursive to fetch the PyPDF2 submodule. Install the version of PyPDF2 provided in this repository; newer versions do not seem to work with this project.

Documentation and working steps

topcoderParse.py crawls the Topcoder archive and saves the HTML files in the htmls folder.
Downloading all the problems can take a long time and may fail partway. If it does, stop the program and rerun it. Before rerunning, set done = x in topcoderParse.py, where x is the number of problems to skip. The program prints the number of each problem as it downloads it, so set done to the number of the last successful download; the problems already downloaded are then skipped.
topcoderGenPdf.py cleans the HTML files and uses pdfkit to generate a PDF for each of them in the PDFs folder.
filemerger.py merges the PDFs into two files: srmmerged.pdf for SRMs and othermerged.pdf for non-SRMs.
createindex.py generates the LaTeX code for the two final PDFs, including a generated index for easy navigation.
Running pdflatex Topcoder<X>.tex, where X is SRMs or Others, compiles the LaTeX documents into the final PDFs, TopcoderSRMs.pdf and TopcoderOthers.pdf.
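
The resume behaviour described for topcoderParse.py can be sketched roughly as follows. This is only an illustration, not the actual code of the script: the function download_remaining, its fetch callback, and the assumption that problem numbers start at 1 are all hypothetical. Only the done variable comes from the description above.

```python
from typing import Callable, Iterable, List


def download_remaining(
    problem_urls: Iterable[str],
    fetch: Callable[[str], object],
    done: int = 0,
) -> List[int]:
    """Fetch every problem after the first `done`; return the numbers fetched.

    `fetch` is a hypothetical callback that downloads and saves one problem.
    """
    fetched = []
    # Assumption: problems are numbered from 1 in archive order.
    for number, url in enumerate(problem_urls, start=1):
        if number <= done:
            continue  # downloaded in an earlier run, so skip it
        print(number)  # number of the problem being downloaded
        fetch(url)
        fetched.append(number)
    return fetched
```

With done = 2 and four problems in the list, only problems 3 and 4 would be fetched. If a run fails while problem n is being downloaded, the last successful download is n - 1, so that is the value to use for done.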

To make this work behind a proxy, uncomment the proxy option in the following lines of the file topcoderGenPdf.py and set the proper proxy address.

options = {
    'page-size': 'A5',
    'margin-top': '0.30in',
    'margin-right': '0.0in',
    'margin-bottom': '0.30in',
    'margin-left': '0.0in',
    'cache-dir': 'html_cache',
    # 'proxy': '10.3.100.207:8080'
}
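
As an alternative to commenting and uncommenting the line by hand, the proxy entry could be added conditionally. The sketch below assumes two hypothetical helpers, build_options and render, which are not part of topcoderGenPdf.py; the option keys and values are the ones shown above, and pdfkit.from_file is the standard pdfkit call for converting an HTML file.

```python
from typing import Dict, Optional


def build_options(proxy: Optional[str] = None) -> Dict[str, str]:
    """Return the options shown above, adding 'proxy' only when one is given."""
    options = {
        'page-size': 'A5',
        'margin-top': '0.30in',
        'margin-right': '0.0in',
        'margin-bottom': '0.30in',
        'margin-left': '0.0in',
        'cache-dir': 'html_cache',
    }
    if proxy:
        options['proxy'] = proxy  # e.g. '10.3.100.207:8080'
    return options


def render(html_path: str, pdf_path: str, proxy: Optional[str] = None) -> None:
    """Hypothetical wrapper: convert one HTML file to PDF with these options."""
    import pdfkit  # imported here so build_options works without pdfkit installed

    pdfkit.from_file(html_path, pdf_path, options=build_options(proxy))
```

Calling build_options() with no argument gives the same dictionary as the snippet above with the proxy line commented out.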

