# Codeforces Scraper: Step-by-Step Explanation

This scraper extracts problem details and editorial content from Codeforces. It uses Python with Selenium for web automation and BeautifulSoup for HTML parsing. The scraped data is stored locally as plain-text files, with problem metadata saved as JSON.

---

## 1. **Configuration**
   - **Directories:**
     - `codeforces_scraper/problems` to store problem statements.
     - `codeforces_scraper/editorials` to store editorial content.
   - These directories are created using `os.makedirs()` to ensure they exist before storing data.
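A minimal sketch of the directory setup. It assumes `exist_ok=True` is passed so the script can be re-run without raising `FileExistsError`; the function name `ensure_directories` and its `base_dir` parameter are hypothetical.

```python
import os


def ensure_directories(base_dir="codeforces_scraper"):
    """Create the problems/ and editorials/ directories (hypothetical helper)."""
    problems_dir = os.path.join(base_dir, "problems")
    editorials_dir = os.path.join(base_dir, "editorials")
    for directory in (problems_dir, editorials_dir):
        # exist_ok=True: no error if the directory already exists from a previous run
        os.makedirs(directory, exist_ok=True)
    return problems_dir, editorials_dir
```

Calling `ensure_directories()` once at startup is enough; later writes can assume both paths exist.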

---

## 2. **Selenium WebDriver Setup**
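   - **Headless Mode:** Chrome runs headless (without a visible window), which is lighter and works on machines without a display.
   - **Driver Installation:** `webdriver_manager` downloads a ChromeDriver that matches the installed Chrome version.
   - **Binary Location:** The path to the Chrome browser binary is set explicitly, so the driver does not fail to find Chrome when it is installed in a non-default location.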
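The setup above can be sketched as follows, assuming Selenium 4 and the `webdriver_manager` package. The function name `make_driver`, the `--headless=new` flag choice, and the example binary path are assumptions, not taken from the original script. Imports sit inside the function only so the sketch can be loaded where Selenium is not installed.

```python
def make_driver(chrome_binary=None):
    """Build a headless Chrome WebDriver (hypothetical helper, Selenium 4 API)."""
    # Imported lazily so this sketch loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    if chrome_binary:
        # Point Selenium at a specific Chrome executable,
        # e.g. "/usr/bin/google-chrome" (example path, adjust per system).
        options.binary_location = chrome_binary

    # ChromeDriverManager().install() downloads a matching driver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Usage: `driver = make_driver("/usr/bin/google-chrome")`, then pass `driver` to the scraping functions and call `driver.quit()` when done.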

---

## 3. **Helper Functions**
### `save_to_file(directory, filename, content)`
   - Saves content (HTML or text) to a specified file.
   - Parameters:
     - `directory`: Target directory.
     - `filename`: Name of the file.
     - `content`: The content to save.
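A minimal sketch of `save_to_file`, assuming UTF-8 output (problem statements often contain characters such as `≤`). Creating the directory inside the helper and returning the written path are assumptions added for convenience.

```python
import os


def save_to_file(directory, filename, content):
    """Write text or HTML content to directory/filename and return the path."""
    os.makedirs(directory, exist_ok=True)  # harmless if the directory already exists
    path = os.path.join(directory, filename)
    # Explicit UTF-8 so non-ASCII math symbols survive on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path
```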

### `save_metadata(directory, metadata)`
   - Saves problem metadata as a JSON file.
   - Parameters:
     - `directory`: Target directory.
     - `metadata`: A dictionary containing metadata such as title, tags, time limit, and memory limit.
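`save_metadata` can be sketched with the standard `json` module. The fixed `metadata.json` filename follows the Output section; the `filename` keyword, `indent=2`, and `ensure_ascii=False` are assumptions chosen to keep the file human-readable.

```python
import json
import os


def save_metadata(directory, metadata, filename="metadata.json"):
    """Serialize a metadata dict (title, tags, limits, ...) to JSON."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes characters like "≤" as-is instead of \u escapes
        json.dump(metadata, f, ensure_ascii=False, indent=2)
    return path
```

For example, `save_metadata("codeforces_scraper/problems", {"title": "...", "tags": ["math"]})` writes `codeforces_scraper/problems/metadata.json`.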

---

## 4. **Scraping Functions**
### `scrape_problem(driver, url)`
   - **Process:**
     1. Open the problem URL in the browser using Selenium.
     2. Wait for the `problem-statement` element to load.
     3. Parse the page with BeautifulSoup to extract:
        - Title
        - Problem statement
        - Tags
        - Time and memory limits
     4. Save the problem statement in `.txt` format.
     5. Save metadata in `metadata.json`.
   - **Error Handling:** Catches and logs any exceptions encountered during scraping.

### `scrape_editorial(driver, url)`
   - **Process:**
     1. Open the editorial URL in the browser using Selenium.
     2. Wait for the `content` element to load.
     3. Parse the page with BeautifulSoup to extract:
        - Editorial title
        - Main content
     4. Save the editorial in `.txt` format.
   - **Error Handling:** Catches and logs any exceptions encountered during scraping.

---

## 5. **Main Program**
   - **Problem URL:** Example problem URL from Codeforces.
   - **Editorial URL:** Corresponding editorial URL.
   - **Execution:**
     1. Call `scrape_problem()` to extract problem details.
     2. Call `scrape_editorial()` to extract editorial content.
     3. Ensure the driver quits at the end, even if an error occurs.

---

## 6. **Output**
   - **Problem Statement:** Stored as a `.txt` file in the `problems` directory.
   - **Metadata:** Stored as `metadata.json` in the `problems` directory.
   - **Editorial Content:** Stored as a `.txt` file in the `editorials` directory.

---

## 7. **Dependencies**
   - **Libraries:**
     - `os` and `json` for file handling.
     - `BeautifulSoup` (pip package `beautifulsoup4`) for HTML parsing.
     - `Selenium` for web automation.
     - `webdriver_manager` (pip package `webdriver-manager`) for managing the ChromeDriver.
   - **System Requirements:**
     - Chrome browser installed.
     - The Python packages above installed via `pip`.

---

## 8. **Error Handling**
   - Each scraping function catches exceptions and logs them for debugging instead of crashing the whole run.
   - The browser driver is shut down with `driver.quit()` at the end, even when an exception occurs.

---

## 9. **Example Usage**
To scrape a problem and its editorial:
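```python
# Assumes `driver` is the WebDriver configured in section 2 and that
# scrape_problem() / scrape_editorial() are defined as described in section 4.
problem_url = "https://codeforces.com/problemset/problem/1/A"
editorial_url = "https://codeforces.com/blog/entry/1"

try:
    scrape_problem(driver, problem_url)
    scrape_editorial(driver, editorial_url)
finally:
    driver.quit()  # always release the browser, even if scraping fails
```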