A Rails application for navigating PDF accessibility audits. We use traditional NLP and LLM processes to prioritize and stratify documents, guiding stakeholders through corrective action decision-making. In the future we hope to build in more accessibility auditing and remediation. For additional documentation, see the docs folder.
Fill out this Google Form to connect with Code for America and learn more.
Before you begin, ensure you have the following installed:
- Ruby 3.2.2 (we recommend using a version manager like
rbenv
orrvm
) - Node.js 18.17.0 (we recommend using
nvm
for version management) - Yarn (latest version)
- Redis (for Sidekiq background jobs)
- PostgreSQL locally or in a container.
- Docker and Docker Compose (for LocalStack S3 in development)
-
Clone the repository:
git clone https://github.com/codeforamerica/asap_pdf.git cd asap_pdf
-
Install Ruby dependencies:
bundle install
-
Install JavaScript dependencies:
yarn install
-
Setup the database:
bin/rails db:setup
Start the development server and all required processes:
bin/dev
This command starts the following processes (defined in Procfile.dev
):
- Rails server
- JavaScript build process (with esbuild)
- CSS build process (with Tailwind CSS)
- Sidekiq worker for background jobs
The application will be available at http://localhost:3000
- Frontend: Built with Hotwired (Turbo + Stimulus) and Tailwind CSS
- Backend: Ruby on Rails 7.0
- Background Jobs: Sidekiq with Redis
- Testing: RSpec
The application includes several Python components for PDF processing:
- Site Crawler: Downloads PDF files and their metadata from government websites
- Document Classifier: Determines document types using ML
- Document Inference: LLM summary and exception check
- Evaluation: Automated LLM evaluation suite.
To set up the Python components, follow these instructions.
Also see "Setting API credentials" below.
Run the test suite:
bundle exec rails test:prepare
bundle exec rspec
The project includes several development tools:
- Brakeman: Security analysis (
bin/brakeman
) - RuboCop: Code style checking (
bin/rubocop
) - Overcommit: Git hooks management
- Better Errors: Enhanced error pages in development
- Bullet: N+1 query detection
For the Python components to use the Rails API, an API username and password must be provided in AWS secrets manager for production or in the Rails credential file for local development.
To set credentials for local development add an api_user
and api_password
configuration via editing the credentials file:
EDITOR="Your editor" rails credentials:edit --environment development
After setting the API credentials, navigate to the AI configuration screen (http://localhost:3000/configuration/edit) and save. This will set the secrets value in the LocalStack secretsmanager service.
Some basic API endpoints are currently provided.
-
GET /api/v1/sites
- Lists all sites
- Returns site details excluding user_id, created_at, updated_at
- Includes s3_endpoint for each site
-
GET /api/v1/sites/:id
- Retrieves a specific site
- Returns site details excluding user_id, created_at, updated_at
- Includes s3_endpoint
POST /api/v1/documents/inference
- Adds or updates a document inference record.
- Ideally used for storing AI results for documents.
When the database is set up (bin/rails db:setup), it is populated with a few sample sites, sandbox documents and an admin user. See documents.rake for details. Currently, the easiest way to create your own sites, documents and users would be via a custom rake task or by modifying the one provided. To scrape a list of your website's PDFs, check out the documentation in the python_components directory.
- If contributing in Ruby, ensure all tests pass and no new RuboCop violations are introduced. If contributing in Python, ensure all tests pass and that no new python linting violations are introduced.
- Update documentation as needed
- Follow the existing code style and conventions
This project's code is licensed under the Apache 2.0 License. All assets, such as images provided by Code for America are licensed under the CC-BY-4.0 license.