Script to parse accepted codes from Codeforces
Create an object of the class `CodeforcesScraper` by passing the username to the constructor. To scrape all the accepted codes, call the function below with the number of submission pages and the name of the directory where the code should be saved. A user can have more than one submission page; for example, I currently have 90. The function starts from page 1 and finishes on the page specified.
`comparison` can be either `time` or `memory`. It is an optional parameter and defaults to `time`.
cf = CodeforcesScraper(username)
cf.get_all_accepted_sol(total_page_of_submission, root_directory_name_to_save_codes, comparison)
- You can't access all the codes on Codeforces. For example, you can't check what others have submitted in the Gym. Usually, in the submission table, the submission link is in the first cell of a row, inside an `<a></a>` element. If the code is hidden, the link is not present; instead the cell contains a `<span></span>` from which we can only get the submission id.
- A single problem may have one or more accepted submissions.
- `Div.1 A` problems are `Div.2 C` problems when the contest is held for both divisions.
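The difference between a visible and a hidden submission in the first cell can be sketched as follows. This is only an illustration, not the project's code: the project uses bs4, whereas this sketch uses the standard library's `html.parser` so that it runs without extra dependencies. The HTML fragment is a simplified stand-in for the real submission table, and `FirstCellParser` and `first_cell_ids` are hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Records, for each table row, the text and the tag found in the first cell."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one (submission_id, has_link) tuple per row
        self._cell_index = -1   # index of the current <td> within the row
        self._tag = None        # "a" or "span" seen inside the first cell
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row starts: reset the per-row state.
            self._cell_index, self._tag, self._text = -1, None, ""
        elif tag == "td":
            self._cell_index += 1
        elif tag in ("a", "span") and self._cell_index == 0 and self._tag is None:
            self._tag = tag

    def handle_data(self, data):
        # Only collect text that belongs to the first cell.
        if self._cell_index == 0 and self._tag is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "tr" and self._tag is not None:
            # has_link is True for <a> (code visible), False for <span> (hidden).
            self.rows.append((self._text.strip(), self._tag == "a"))


# Simplified, made-up fragment: one visible and one hidden submission.
SAMPLE = """
<table>
  <tr><td><a href="/contest/1000/submission/111">111</a></td><td>A</td></tr>
  <tr><td><span>222</span></td><td>B</td></tr>
</table>
"""


def first_cell_ids(html):
    parser = FirstCellParser()
    parser.feed(html)
    return parser.rows


print(first_cell_ids(SAMPLE))
```

Rows whose second value is `False` have only a submission id and no link, so their code cannot be fetched.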
- All the submission pages are requested and scraped using `bs4`.
- The submission id is retrieved.
- The name, problem link, language, time and memory are also retrieved.
- We can check whether the problem belongs to a gym from the problem link, as the URL will contain `gym`.
- By parsing the link we determine the problem level (e.g. A, B, C...) and the contest id.
- If a problem has more than one submission, only the most efficient one (by `comparison`) is kept.
- After we have all the details about the problem and submission, we can call another URL to get the code.
- `util.py` has a dictionary that maps programming languages to file extensions. Not all the languages supported by Codeforces are included; they can be added if required.
- After getting the code we have a `CodeInfo` object.
- We create separate folders for A, B, C, etc. and write the code to files with the appropriate extensions. The naming convention is like `1000A. problem_name`.
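The link parsing, selection, and naming steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this repository: the `Submission` class, the `EXTENSIONS` dictionary, the function names, the link patterns, and the `txt` fallback extension are all hypothetical and chosen for illustration.

```python
import os
import re
from dataclasses import dataclass

# Hypothetical subset of a language-to-extension mapping (the real one lives in util.py).
EXTENSIONS = {"GNU C++17": "cpp", "Python 3": "py", "Java 8": "java"}


@dataclass
class Submission:
    submission_id: str
    name: str        # problem name
    link: str        # problem link, assumed to look like /contest/1000/problem/A
    language: str
    time_ms: int
    memory_kb: int


def parse_problem_link(link):
    """Return (contest_id, level, is_gym) for a problem link."""
    is_gym = "gym" in link  # gym problems contain "gym" in the URL
    match = re.search(r"/(?:contest|gym)/(\d+)/problem/(\w+)", link)
    if match is None:
        raise ValueError(f"unrecognised problem link: {link}")
    return match.group(1), match.group(2), is_gym


def best_per_problem(submissions, comparison="time"):
    """Keep one submission per problem link: the fastest or the lightest."""
    if comparison == "time":
        key = lambda s: s.time_ms
    else:
        key = lambda s: s.memory_kb
    best = {}
    for sub in submissions:
        current = best.get(sub.link)
        if current is None or key(sub) < key(current):
            best[sub.link] = sub
    return best


def target_path(root, sub):
    """Build a path such as <root>/A/1000A. problem_name.cpp."""
    contest_id, level, _ = parse_problem_link(sub.link)
    ext = EXTENSIONS.get(sub.language, "txt")  # assumed fallback for unmapped languages
    return os.path.join(root, level, f"{contest_id}{level}. {sub.name}.{ext}")


subs = [
    Submission("1", "Sample Problem", "/contest/1000/problem/A", "GNU C++17", 46, 100),
    Submission("2", "Sample Problem", "/contest/1000/problem/A", "Python 3", 31, 4000),
]
fastest = best_per_problem(subs, "time")["/contest/1000/problem/A"]
print(target_path("codes", fastest))
```

With `comparison="memory"` the same two submissions would yield the C++ one instead, since it uses less memory in this made-up data.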
- Save all the accepted submissions for a particular problem.
- Write functions to retrieve `wrong answer`, `time limit exceeded`, `runtime error`, etc.
If you find a bug, please create an issue 🙂