AI Policy #5

shelltr · 2026-05-13T03:04:00Z

shelltr
May 13, 2026
Maintainer

slug: ai-policy
description: Our AI policy

AI Policy

This is intended to be a living document describing our AI policy. Comments, questions, and concerns should be added below. You can also reach maintainers at the following locations:

email
community (you will need an invite from someone at Unified)

1. How CivicPatch uses AI

Maintaining a separate scraper for each of the ~34,000 municipal websites in the US would be an enormous burden for our small team. Instead, we use a single scraper pipeline that converts each page to markdown and sends it to an LLM (large language model) via a third-party API to extract structured data: name, email, phone, role, and so on.

The hallucination problem

LLMs are prone to hallucinating facts when prompted for information that doesn't exist or is ambiguous. For factual fields like contact details, we check the LLM's output against the source page text. If a value can't be found there, the model is re-prompted. Fields that require interpretation (like how a role is titled or whether a seat is ward-based) have more room for judgment, which is why human review matters.
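A minimal sketch of that check, assuming the extracted record is a flat dictionary of strings and the page text is available as plain text. The function name `ungrounded_fields`, the field names, and the normalization rules are illustrative assumptions, not CivicPatch's actual implementation.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def digits_only(text: str) -> str:
    """Keep only digits, so '(555) 010-2000' and '555-010-2000' compare equal."""
    return re.sub(r"\D", "", text)


def ungrounded_fields(record: dict[str, str], page_text: str) -> list[str]:
    """Return the factual fields whose values cannot be found in the page text."""
    haystack = normalize(page_text)
    haystack_digits = digits_only(page_text)
    missing = []
    for field in ("name", "email", "phone"):  # hypothetical field names
        value = record.get(field)
        if not value:
            continue  # nothing was extracted, so there is nothing to verify
        if field == "phone":
            digits = digits_only(value)
            found = bool(digits) and digits in haystack_digits
        else:
            found = normalize(value) in haystack
        if not found:
            missing.append(field)
    return missing
```

Any field returned here would be a candidate for re-prompting. Comparing phone numbers on digits alone is a simplification: concatenating every digit on the page can produce false matches across unrelated numbers.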

Once a scrape finishes, results are published as pull requests to our open-data repository. Nothing reaches the public dataset until a volunteer has reviewed and approved it.

Models in Use

Google Gemini
- Used for any task that would require a Google search (for example, searching for the latest domain for a municipality), or for filling out our expected officials list (for the very first scrape of a municipality).
- If you are aware of any alternative APIs for this functionality, let us know! If this service becomes prohibitively expensive or is otherwise unavailable to us, some features of our pipelines would be at risk.
DeepSeek v3.2
- Used in extraction tasks to convert unstructured markdown -> structured data (example: parsing out names, emails, phones, and roles from text and grouping them under the proper person record).

Planned: We don't yet have solid numbers on how often volunteers correct the LLM's output. That's a number we're going to capture as we bring on more volunteers through various channels, like Unified.

2. Transparency

We want people to trust our data and processes. The goal of CivicPatch is to produce a centralized dataset that other organizations and developers can build on, deduplicating data gathering efforts across the civic tech space.

To that end, we maintain transparency about how data is collected and how we verify it.

Each person record includes a list of source URLs (the page our scraping pipelines used) and a timestamp for when it was scraped. Users of our data can manually fact-check any record using the source URL.
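As an illustration, a person record with this provenance could be modeled as below. The class name, field names, and the `is_traceable` helper are assumptions made for this sketch; the actual CivicPatch schema may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PersonRecord:
    """Hypothetical shape of a person record with provenance fields."""
    name: str
    role: str
    email: Optional[str] = None
    phone: Optional[str] = None
    # Pages the scraping pipeline used to build this record.
    source_urls: list[str] = field(default_factory=list)
    # When the record was scraped (UTC).
    scraped_at: Optional[datetime] = None


def is_traceable(record: PersonRecord) -> bool:
    """A record can be fact-checked only if it has a source URL and a scrape time."""
    return bool(record.source_urls) and record.scraped_at is not None


example = PersonRecord(
    name="Jane Doe",
    role="Mayor",
    source_urls=["https://example.gov/council"],
    scraped_at=datetime(2026, 5, 13, 3, 4, tzinfo=timezone.utc),
)
```

A consumer of the data could filter on a check like `is_traceable` before relying on a record.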

Active: Source URLs in every person record, visible on site and in raw data; scrape timestamps in raw data
Planned:
- Surface timestamps (scraped at, reviewed at, reviewed by) on the site
- Disclosure on jurisdiction/person pages that data is machine-scraped and volunteer-reviewed

3. Accuracy & Verification

Every record in CivicPatch is reviewed by a human volunteer before it reaches the public dataset. Volunteers look at the diff between the current scrape and the previous one, resolve any issues flagged by our heuristics, and publish when the data looks correct.
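The diff a volunteer reviews could be computed roughly as follows. This is a sketch under stated assumptions: records are plain dictionaries and are matched by the `name` field, which is a simplification (a real pipeline would likely need a more stable key). The function name `diff_scrapes` is hypothetical.

```python
def diff_scrapes(previous: list[dict], current: list[dict]) -> dict:
    """Compare two scrapes of one jurisdiction, matching people by name (an assumption)."""
    prev = {person["name"]: person for person in previous}
    curr = {person["name"]: person for person in current}

    added = sorted(curr.keys() - prev.keys())
    removed = sorted(prev.keys() - curr.keys())

    changed = {}
    for name in sorted(prev.keys() & curr.keys()):
        old, new = prev[name], curr[name]
        # Map each differing field to an (old value, new value) pair.
        fields = {
            key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }
        if fields:
            changed[name] = fields

    return {"added": added, "removed": removed, "changed": changed}
```

Showing only added, removed, and changed people keeps the reviewer's attention on what actually moved between scrape cycles.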

What human review doesn't cover: a volunteer can't catch data that goes stale between scrape cycles, and like any system that accepts human input, we're not immune to malicious actors. We're working on moderation filters and a changelog to address both.

Active: Human review required before publish; volunteers can correct records after the fact
Planned: Moderation filters for malicious edits; changelog queue for tracking changes

4. Coverage bias

Not all municipal websites are created equal. Low-population jurisdictions tend to have older, harder-to-scrape sites. Some municipalities publish their elected officials in PDFs or non-standard formats the pipeline can't read. Others list officials in multiple places, with no clear signal about which is current.

We know lower-population jurisdictions have more data quality issues, but we don't yet have a clear picture of whether those gaps follow geographic or demographic patterns. That's something we want to understand better.

The long-term answer to non-standard formats isn't building scrapers for every edge case. It's pushing for data standards that make local government data consistently machine-readable.

Planned:
- Support for PDFs and non-conventional formats
- Advocacy for open data standards at the local government level (Civic Data Tech)

5. Human oversight

No scraped data reaches the public dataset without a human volunteer reviewing and approving it. Both volunteers and maintainers can edit records after the fact. When there's a disagreement about a record, maintainers have final say.

The process for handling disputed records is currently informal. We're working on formalizing it.

Active: Human review required before publish; maintainers can override volunteer decisions
Planned: Formal process for disputed records

6. Data privacy

The contact information we publish about elected officials is public record, sourced directly from municipal websites. We don't publish anything that wasn't already published by the municipality.

For volunteers, signing in with GitHub grants CivicPatch read access to your GitHub profile (name and email) and your CivicPatch organization team membership, which determines your role on the platform. We store your GitHub ID, display name, email, and team assignments. We don't request or store anything beyond that.

We recognize that requiring a GitHub account is a barrier for some volunteers. We're exploring Google SSO and other email-based options to make signing up more accessible.

Active: GitHub OAuth with minimal scopes; only profile, email, and team membership stored
Planned: Google SSO; additional email-based sign-in options

7. Misuse

The data CivicPatch publishes is intentionally public. Elected officials' contact information is already on municipal websites, and we're centralizing it, not creating new exposure. We believe the public benefit of a shared, maintained dataset outweighs the risk of misuse.

That said, we're aware that a public dataset can be bulk-downloaded and used in ways we didn't intend. We're considering whether a terms of use would help set community expectations around acceptable use, even if it can't prevent bad actors outright.

Planned: Terms of use covering acceptable use of the dataset

8. Accessibility

We haven't done a formal accessibility review of the site yet. We're working on it!

Planned:
- WCAG audit of the review and queue UI
- Keyboard navigation support
- Screen reader compatibility for custom components

9. Accountability

Every review and edit on CivicPatch is attributed to a GitHub account, so there's already an audit trail for all changes to the dataset.

We don't yet have a formal moderation policy. We're working on one.

Planned:
- Moderation queue for flagged edits
- Formal removal process: maintainers can revoke access by removing a volunteer from the GitHub organization

Notes

This policy is a guiding set of principles and a living document. It will change as AI and regulations change and as we gain experience.
