
Fix DAC (Dacorum) scraper #346

Merged
symroe merged 1 commit into master from fix/DAC-scraper
Jun 16, 2026

Conversation

symroe commented Jun 8, 2026

Member

What broke

The certificate served by https://democracy.dacorum.gov.uk is not trusted by wreq's embedded BoringSSL CA bundle, so every request to the ModGov ASMX endpoint fails immediately with CERTIFICATE_VERIFY_FAILED. The dashboard reported "Request timeout after 30+ seconds", but local reproduction shows the underlying cause is a TLS certificate validation failure: the request fails in under 1 second, so it is not a timeout.
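To tell a certificate rejection apart from a genuine timeout when reproducing locally, something like the following sketch may help. It uses only Python's standard library, so it checks against the system trust store rather than wreq's embedded BoringSSL bundle, and its verdict can differ from what the scraper sees. The function names `classify_failure` and `probe_tls` are illustrative and not part of the project.

```python
import socket
import ssl


def classify_failure(exc: BaseException) -> str:
    """Map a connection error to a coarse cause label."""
    # Check the certificate case first: SSLCertVerificationError is a
    # subclass of ssl.SSLError, which is itself a subclass of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, OSError):
        return "network"
    return "unknown"


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and report how it ended."""
    # Assumption: the default context loads the system CA store, which is
    # not the same trust store that wreq embeds.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except Exception as exc:
        return classify_failure(exc)
```

Calling `probe_tls("democracy.dacorum.gov.uk")` from a shell with network access gives a quick label for how the handshake ended; a fast "certificate" result would match the failure described above, while "timeout" would point at the network instead.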

What was fixed

  • Added verify_requests = False to the Scraper class in councillors.py
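The shape of the change is roughly as sketched below. Only `verify_requests = False`, the `Scraper` class name and the `ModGovCouncillorScraper` base class name come from this PR; the stub base class, its default value, the `base_url` attribute and the `request_options` helper are assumptions added so the example runs on its own.

```python
class ModGovCouncillorScraper:
    """Hypothetical stand-in for the project's real base class."""

    base_url = ""
    verify_requests = True  # assumed default: validate TLS certificates

    def request_options(self) -> dict:
        # Hypothetical helper: the options handed to the HTTP client.
        return {"verify": self.verify_requests}


class Scraper(ModGovCouncillorScraper):
    base_url = "https://democracy.dacorum.gov.uk"
    # The one-line fix from this PR: skip certificate validation for this
    # council only, leaving the default untouched for every other scraper.
    verify_requests = False
```

Setting the flag as a class attribute on the council's own `Scraper` keeps the bypass scoped to Dacorum instead of weakening validation globally.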

Scrape results

| Metric | Count |
| --- | --- |
| Councillors found | 50 |
| With email address | 0 |
| With photo | 50 |

Note: Dacorum does not publish email addresses via their ModGov XML service — the <emailaddress> field is absent from all councillor records.


Generated by Claude Code

The https://democracy.dacorum.gov.uk certificate is not trusted by wreq's
embedded BoringSSL CA bundle, causing CERTIFICATE_VERIFY_FAILED on every
request. Adding verify_requests = False bypasses TLS certificate validation,
allowing the ModGov ASMX endpoint to respond with councillor data. We scrape
read-only public data so MITM risk is acceptable.

Locally verified: 50 councillors, 50 with photos. The council does not
publish email addresses via the ModGov XML service.

symroe commented Jun 8, 2026

Member Author

Re-scrape after 7d88f41

Initial fix: verify_requests = False to bypass BoringSSL cert rejection on democracy.dacorum.gov.uk.

Verified locally with uv run manage.py councillors --council DAC -v — completed in 4 seconds with no errors.

| Metric | Count |
| --- | --- |
| Councillors found | 50 |
| With email address | 0 |
| With photo | 50 |

Emails: the <emailaddress> field is absent from all councillor records — Dacorum does not publish email addresses via this service.


Generated by Claude Code


symroe commented Jun 16, 2026

Member Author

Re-scrape after 7d88f41 — emails ARE available ✅

Re-ran the scraper on this branch. Correction to the PR body: Dacorum does publish email addresses via the ModGov XML service. The <email> tag is populated for every councillor that has one — the earlier "0 emails" note was incorrect (it appears to have looked for an <emailaddress> tag rather than the actual <email> tag, which the ModGovCouncillorScraper base class already handles).

No code change required — the existing verify_requests = False fix is sufficient and emails are captured automatically.

| Metric | Count |
| --- | --- |
| Councillors found | 51 |
| With email address | 50 |
| With photo | 51 |

Emails captured cleanly, e.g. colette.wyatt-lowe@dacorum.gov.uk. The single councillor without an email (Hugo Hardy, id 1588) genuinely has no <email> tag in the council's XML — likely newly elected.

Delta vs PR body: emails 0 → 50.
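The tag mix-up is easy to reproduce. The sketch below counts councillors by tag name over a made-up, simplified XML fragment; the real ModGov response is larger and may nest these elements differently, and the first councillor's id and display name are placeholders. `count_with_tag` is an illustrative helper, not project code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: only the <email> tag name, the example
# address and councillor id 1588 come from the PR discussion.
SAMPLE_XML = """\
<councillors>
  <councillor>
    <councillorid>0000</councillorid>
    <fullusername>Councillor Colette Wyatt-Lowe</fullusername>
    <email>colette.wyatt-lowe@dacorum.gov.uk</email>
  </councillor>
  <councillor>
    <councillorid>1588</councillorid>
    <fullusername>Councillor Hugo Hardy</fullusername>
  </councillor>
</councillors>
"""


def count_with_tag(xml_text: str, tag: str) -> int:
    """Count <councillor> records holding a non-empty descendant `tag`."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for councillor in root.iter("councillor")
        if (councillor.findtext(f".//{tag}") or "").strip()
    )
```

On this fragment, `count_with_tag(SAMPLE_XML, "emailaddress")` finds no records while `count_with_tag(SAMPLE_XML, "email")` finds one, which mirrors how a lookup on the wrong tag name could report zero emails even though the data is present.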

symroe merged commit d8528bf into master on Jun 16, 2026
1 check passed