Crawls/downloads algorithmic problems from the Topcoder problem archive and compiles them into a single PDF file for portability.
Note: This project is no longer active. If you are using this and it does not work, please try to fix it yourself.
- pdfkit
- PyPDF2
Note: After cloning the repository, you need to run git submodule update --init --recursive
to fetch the PyPDF2 submodule. Install the version of PyPDF2 that is provided in this repository. The newer versions do not seem to work in this situation.
-
topcoderParse.py
crawls the topcoder archive and saves the htmls in the folderhtmls
. -
Downloading all the problems can take a lot of time and can even fail. In that case one might stop and rerun the program. Before re-running the program, set
done = x
intopcoderParse.py
, wherex
denotes the number of problems to skip downloading. Note that the program prints the problem number of the problem being downloaded, so setdone
as the problem number of the last successful download. This way it will skip downloading the problems already downloaded. -
topcoderGenPdf.py
cleans the htmls and usespdfkit
to generate pdfs for all the files into thePDFs
folder. -
filemerger.py
merges the pdfs into single files. This produces two filessrmmerged.pdf
andothermerged.pdf
for SRMs and non-SRMs respectively. -
createindex.py
generates the LaTeX code for the final pdfs of the two files. This also includes a generated index for easy navigation. -
The command
pdflatex Topcoder<X>.tex
compiles the LaTeX documents to the final PDFs.X
stands for SRMs and Others. The final PDFs are named asTopcoderSRMs.pdf
andTopcoderOthers.pdf
. -
To make this work behind a proxy, uncomment the proxy option in the following lines of the file
topcoderGenPdf.py
and set the proper proxy address.options = { 'page-size': 'A5', 'margin-top': '0.30in', 'margin-right': '0.0in', 'margin-bottom': '0.30in', 'margin-left': '0.0in', 'cache-dir': 'html_cache', # 'proxy': '10.3.100.207:8080' }