Description
While reviewing the `backend/apps/owasp/` module in preparation for GSoC 2026, I noticed that the current `OwaspScraper` in `scraper.py` has a few limitations in page-structure parsing and production observability.
If the scraper encounters a non-standard page layout or a request fails, it logs the exception locally but provides no database-level visibility for maintainers to track scrape health over time.
Proposed Changes
To make the data ingestion pipeline more robust, I propose the following enhancements:
- **Flexible selectors** (`scraper.py`): Update `get_urls()` and other parsing methods to use more resilient XPath/CSS selectors, falling back to alternative content containers if `div[@class='sidebar']` is missing.
- **Scrape logging model** (`scrape_log.py`): Create a lightweight Django model to record the status, duration, and specific errors of automated scraping tasks.
- **Admin integration** (`admin.py`): Register the new logging model in the Django Admin panel so maintainers can visually monitor scraper health and identify failing project URLs without digging through server logs.
Note to Maintainers: I already have my local environment fully synced and am actively testing these XPath fallback strategies. I would love to be assigned this issue so I can submit a Pull Request with the implementation!
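For the logging model and admin integration, a minimal sketch might look like the following. All field names, choices, and the admin list configuration are assumptions for discussion, not a final schema:

```python
# scrape_log.py -- hypothetical ScrapeLog sketch; fields are illustrative
# assumptions, not a finalized schema.
from django.db import models


class ScrapeLog(models.Model):
    """Records the outcome of one automated scrape run."""

    class Status(models.TextChoices):
        SUCCESS = "success"
        FAILURE = "failure"

    url = models.URLField()                       # page that was scraped
    status = models.CharField(max_length=10, choices=Status.choices)
    duration = models.DurationField(null=True)    # wall-clock scrape time
    error_message = models.TextField(blank=True)  # summary/traceback on failure
    created_at = models.DateTimeField(auto_now_add=True)


# admin.py -- register the model so maintainers can filter failing runs
# directly in the Django Admin instead of reading server logs.
from django.contrib import admin


@admin.register(ScrapeLog)
class ScrapeLogAdmin(admin.ModelAdmin):
    list_display = ("url", "status", "duration", "created_at")
    list_filter = ("status",)
```

Keeping the model this small means one row per scrape run, which is cheap to write from the existing exception handler and easy to prune with a periodic cleanup task if volume becomes a concern.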