AI Policy #5
shelltr
announced in
Announcements
-
slug: ai-policy
description: Our AI policy
AI Policy
The following is intended to be a living document of our AI policy. Comments, questions, and concerns should be added below. You can also reach maintainers at the following locations:
1. How CivicPatch uses AI
Maintaining a separate scraper for each of the ~34,000 municipal websites in the US would be an enormous burden for our small team. Instead, we use a single scraper pipeline that converts each page to markdown and sends it to an LLM (large language model) via a third-party API to extract structured data: name, email, phone, role, and so on.
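As a rough illustration of the extraction step, here is a minimal sketch in Python. It is not the actual CivicPatch pipeline: the prompt wording, the `FIELDS` tuple, and the `call_llm` parameter (a caller-supplied function that sends a prompt to whichever LLM API is in use and returns its text response) are all assumptions made for the example.

```python
import json
from typing import Callable

# Hypothetical field list, taken from the fields named in this policy.
FIELDS = ("name", "email", "phone", "role")


def build_prompt(page_markdown: str) -> str:
    """Build an extraction prompt for one page (the wording is illustrative)."""
    return (
        "Extract every elected official from the page below. "
        "Respond with a JSON array of objects with the keys "
        f"{', '.join(FIELDS)}. Use null for any value not on the page.\n\n"
        f"PAGE:\n{page_markdown}"
    )


def extract_officials(
    page_markdown: str, call_llm: Callable[[str], str]
) -> list[dict]:
    """Send the page to the LLM and parse its JSON reply.

    `call_llm` is a stand-in for the third-party API call; it takes the
    prompt text and returns the model's raw text response.
    """
    raw = call_llm(build_prompt(page_markdown))
    people = json.loads(raw)
    # Keep only the expected keys so stray model output is dropped.
    return [{f: person.get(f) for f in FIELDS} for person in people]
```

Passing the API call in as a function keeps the sketch independent of any particular provider and makes it easy to test with a fake model.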
The hallucination problem
LLMs are prone to hallucinating facts when prompted for information that doesn't exist or is ambiguous. For factual fields like contact details, we check the LLM's output against the source page text. If a value can't be found there, the model is re-prompted. Fields that require interpretation (like how a role is titled or whether a seat is ward-based) have more room for judgment, which is why human review matters.
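The check against the source page can be sketched as follows. This is an illustration under stated assumptions, not the production code: the function names, the choice of `name`, `email`, and `phone` as the factual fields, and the normalization rules (case-insensitive matching, whitespace collapsing, digit-only comparison for phone numbers) are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def digits_only(text: str) -> str:
    """Strip everything except digits, e.g. '(555) 010-0199' -> '5550100199'."""
    return re.sub(r"\D", "", text)


def ungrounded_fields(
    record: dict,
    page_text: str,
    factual_fields: tuple = ("name", "email", "phone"),
) -> list:
    """Return the factual fields whose values do not appear in the page text."""
    page_norm = normalize(page_text)
    # Concatenating all digits on the page is a simplification: it can match
    # a number that spans two unrelated digit runs.
    page_digits = digits_only(page_text)
    missing = []
    for field in factual_fields:
        value = record.get(field)
        if not value:
            continue  # nothing was extracted, so there is nothing to verify
        if field == "phone":
            digits = digits_only(value)
            found = bool(digits) and digits in page_digits
        else:
            found = normalize(value) in page_norm
        if not found:
            missing.append(field)
    return missing
```

A non-empty result would be the signal to re-prompt the model for those fields. Interpretive fields such as `role` are deliberately left out of the check, since they may not appear verbatim on the page.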
Once a scrape finishes, results are published as pull requests to our open-data repository. Nothing reaches the public dataset until a volunteer has reviewed and approved it.
Models in Use
Planned: We don't yet have solid numbers on how often volunteers correct the LLM's output. We plan to start measuring that correction rate as we bring on more volunteers through channels such as Unified.
2. Transparency
We want people to trust our data and processes. The goal of CivicPatch is to produce a centralized dataset that other organizations and developers can build on, so that data-gathering work isn't duplicated across the civic tech space.
To that end, we maintain transparency about how data is collected and how we verify it.
Each person record includes a list of source URLs (the pages our scraping pipeline used) and a timestamp for when it was scraped. Users of our data can manually fact-check any record by visiting its source URLs.
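For illustration, a person record with this provenance attached might look like the sketch below. The class name, field names, and sample values are assumptions made for the example (the URL and timestamp are placeholders), not the actual CivicPatch schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PersonRecord:
    """Hypothetical shape of one person record, with its provenance."""

    name: str
    role: str
    email: Optional[str] = None
    phone: Optional[str] = None
    # Pages the scraper read to produce this record.
    source_urls: list = field(default_factory=list)
    # When the record was scraped, as an ISO 8601 string in UTC.
    scraped_at: str = ""


# Placeholder values for illustration only.
record = PersonRecord(
    name="Jane Doe",
    role="Mayor",
    email="jane.doe@example.gov",
    source_urls=["https://example.gov/city-council"],
    scraped_at=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
)

raw = asdict(record)  # plain dict, ready to serialize as raw data
```

Keeping `source_urls` as a list rather than a single string allows a record to cite more than one page.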
Active: Source URLs in every person record, visible on site and in raw data; scrape timestamps in raw data
Planned:
3. Accuracy & Verification
Every record in CivicPatch has been reviewed by a human volunteer before it reaches the public dataset. Volunteers look at the diff between the current scrape and the previous one, resolve any issues flagged by our heuristics, and publish when the data looks correct.
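The diff a volunteer reviews can be thought of along these lines. This is a simplified sketch, not the real review tooling: the function name is hypothetical, and matching records by `name` is an assumption, since a real system would need a more stable identifier.

```python
def diff_scrapes(previous: list, current: list, key: str = "name") -> dict:
    """Compare two scrapes of one jurisdiction, matching records on `key`.

    Returns records that were added, records that were removed, and for
    records present in both, the fields whose values changed as
    (old_value, new_value) pairs.
    """
    prev_by_key = {r[key]: r for r in previous}
    curr_by_key = {r[key]: r for r in current}

    added = [r for k, r in curr_by_key.items() if k not in prev_by_key]
    removed = [r for k, r in prev_by_key.items() if k not in curr_by_key]

    changed = {}
    for k in sorted(curr_by_key.keys() & prev_by_key.keys()):
        old, new = prev_by_key[k], curr_by_key[k]
        fields = {
            f: (old.get(f), new.get(f))
            for f in sorted(old.keys() | new.keys())
            if old.get(f) != new.get(f)
        }
        if fields:
            changed[k] = fields

    return {"added": added, "removed": removed, "changed": changed}
```

Grouping the result into added, removed, and changed lets a reviewer focus on what moved between scrapes instead of re-reading every record.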
What human review doesn't cover: a volunteer can't catch data that goes stale between scrape cycles, and like any system that accepts human input, we're not immune to malicious actors. We're working on moderation filters and a changelog to address both.
Active: Human review required before publish; volunteers can correct records after the fact
Planned: Moderation filters for malicious edits; changelog queue for tracking changes
4. Coverage bias
Not all municipal websites are created equal. Low-population jurisdictions tend to have older, harder-to-scrape sites. Some municipalities publish their elected officials in PDFs or non-standard formats the pipeline can't read. Others list officials in multiple places, with no clear signal about which is current.
We know lower-population jurisdictions have more data quality issues, but we don't yet have a clear picture of whether those gaps follow geographic or demographic patterns. That's something we want to understand better.
The long-term answer to non-standard formats isn't building scrapers for every edge case. It's pushing for data standards that make local government data consistently machine-readable.
Planned:
5. Human oversight
No scraped data reaches the public dataset without a human volunteer reviewing and approving it. Both volunteers and maintainers can edit records after the fact. When there's a disagreement about a record, maintainers have final say.
The process for handling disputed records is currently informal. We're working on formalizing it.
Active: Human review required before publish; maintainers can override volunteer decisions
Planned: Formal process for disputed records
6. Data privacy
The contact information we publish about elected officials is public record, sourced directly from municipal websites. We don't publish anything that wasn't already published by the municipality.
For volunteers, signing in with GitHub grants CivicPatch read access to your GitHub profile (name and email) and your CivicPatch organization team membership, which determines your role on the platform. We store your GitHub ID, display name, email, and team assignments. We don't request or store anything beyond that.
We recognize that requiring a GitHub account is a barrier for some volunteers. We're exploring Google SSO and other email-based options to make signing up more accessible.
Active: GitHub OAuth with minimal scopes; only profile, email, and team membership stored
Planned: Google SSO; additional email-based sign-in options
7. Misuse
The data CivicPatch publishes is intentionally public. Elected officials' contact information is already on municipal websites, and we're centralizing it, not creating new exposure. We believe the public benefit of a shared, maintained dataset outweighs the risk of misuse.
That said, we're aware that a public dataset can be bulk-downloaded and used in ways we didn't intend. We're considering whether terms of use would help set community expectations around acceptable use, even if they can't prevent bad actors outright.
Planned: Terms of use covering acceptable use of the dataset
8. Accessibility
We haven't done a formal accessibility review of the site yet. We're working on it!
Planned:
9. Accountability
Every review and edit on CivicPatch is attributed to a GitHub account, so there's already an audit trail for all changes to the dataset.
We don't yet have a formal moderation policy. We're working on one.
Planned:
Notes
This policy is a set of guiding principles and a living document. It will change as AI and its regulation evolve and as we gain experience.