A modern, enterprise-grade document processing platform built with .NET 8 and Blazor that leverages AWS Bedrock AI for intelligent document analysis, classification, and metadata extraction.
- AI-Powered Processing: Integrates with AWS Bedrock (Claude 3 models) for intelligent document analysis
- Real-time Dashboard: Monitor document processing statistics, queue status, and system health
- Smart Classification: Automatically categorize documents using AI-driven classification
- Metadata Extraction: Extract and store structured metadata from unstructured documents
- Background Processing: Asynchronous document processing with queue management
- Security-First: Built-in authentication with ASP.NET Core Identity
- Responsive UI: Modern Blazor Server-Side Rendering with Bootstrap 5
- Real-time Updates: SignalR integration for live processing status updates
- Analytics & Charts: Visual insights with Chart.js integration
The application follows Clean Architecture principles with clear separation of concerns:
```
DocumentProcessor/
├── src/
│   ├── DocumentProcessor.Core/              # Domain entities and interfaces
│   ├── DocumentProcessor.Infrastructure/    # Data access, AI services, external integrations
│   ├── DocumentProcessor.Application/       # Business logic and services
│   └── DocumentProcessor.Web/               # Blazor UI and API endpoints
└── tests/
    └── DocumentProcessor.Tests/             # Unit and integration tests
```
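In Clean Architecture, dependencies point inward: the outer layers depend on Core, and Core depends on nothing. The sketch below illustrates that rule under stated assumptions. `IAIProcessor`, `IDocumentRepository` and `DocumentProcessingService` are names taken from the project structure, but every member shown, plus `ClassificationResult`, `FakeAIProcessor` and `InMemoryDocumentRepository`, is hypothetical and only stands in for the real Bedrock and EF Core implementations.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// ---- DocumentProcessor.Core: entities and contracts, no outward dependencies ----
// Hypothetical result shape; the real Classification entity may differ.
public sealed record ClassificationResult(string Category, double Confidence);

// Interface name from Core/Interfaces; this member is assumed.
public interface IAIProcessor
{
    Task<ClassificationResult> ClassifyAsync(string content, CancellationToken ct = default);
}

// Interface name from Core/Interfaces; these members are assumed.
public interface IDocumentRepository
{
    Task SaveClassificationAsync(Guid documentId, ClassificationResult result, CancellationToken ct = default);
    Task<ClassificationResult?> GetClassificationAsync(Guid documentId, CancellationToken ct = default);
}

// ---- DocumentProcessor.Infrastructure: implements Core contracts ----
// Stand-in for the Bedrock-backed processor: a keyword rule instead of a model call.
public sealed class FakeAIProcessor : IAIProcessor
{
    public Task<ClassificationResult> ClassifyAsync(string content, CancellationToken ct = default) =>
        Task.FromResult(content.Contains("invoice", StringComparison.OrdinalIgnoreCase)
            ? new ClassificationResult("Invoice", 0.9)
            : new ClassificationResult("Other", 0.5));
}

// Stand-in for an EF Core repository.
public sealed class InMemoryDocumentRepository : IDocumentRepository
{
    private readonly ConcurrentDictionary<Guid, ClassificationResult> _store = new();

    public Task SaveClassificationAsync(Guid documentId, ClassificationResult result, CancellationToken ct = default)
    {
        _store[documentId] = result;
        return Task.CompletedTask;
    }

    public Task<ClassificationResult?> GetClassificationAsync(Guid documentId, CancellationToken ct = default) =>
        Task.FromResult<ClassificationResult?>(_store.TryGetValue(documentId, out var r) ? r : null);
}

// ---- DocumentProcessor.Application: use cases written against Core abstractions only ----
public sealed class DocumentProcessingService
{
    private readonly IAIProcessor _ai;
    private readonly IDocumentRepository _repository;

    public DocumentProcessingService(IAIProcessor ai, IDocumentRepository repository)
    {
        _ai = ai;
        _repository = repository;
    }

    // Hypothetical use case: classify a document's text and persist the result.
    public async Task<ClassificationResult> ClassifyAndStoreAsync(Guid documentId, string content, CancellationToken ct = default)
    {
        var result = await _ai.ClassifyAsync(content, ct);
        await _repository.SaveClassificationAsync(documentId, result, ct);
        return result;
    }
}
```

Because the Application service only sees interfaces, the Web project can swap implementations at registration time, for example with `builder.Services.AddScoped<IAIProcessor, BedrockAIProcessor>()`, without touching business logic.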
- .NET 8.0 SDK or later
- SQL Server (LocalDB or full instance)
- AWS Account with Bedrock access (for AI features)
- Visual Studio 2022 or VS Code
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/document-processor.git
  cd document-processor
  ```
- Configure AWS credentials using one of these methods:
  - AWS CLI: run `aws configure`
  - Environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
  - IAM roles (for EC2 deployment)
- Configure application settings in `src/DocumentProcessor.Web/appsettings.json`:
  ```json
  {
    "ConnectionStrings": {
      "DefaultConnection": "Server=(localdb)\\mssqllocaldb;Database=DocumentProcessorDB;Trusted_Connection=True;"
    },
    "BedrockOptions": {
      "Region": "us-west-2",
      "ClassificationModelId": "anthropic.claude-3-haiku-20240307-v1:0",
      "ExtractionModelId": "anthropic.claude-3-sonnet-20240229-v1:0",
      "MaxTokens": 2000,
      "Temperature": 0.3
    }
  }
  ```
- Set up the database (requires the EF Core CLI tool, installable with `dotnet tool install --global dotnet-ef`):
  ```bash
  dotnet ef database update -p src/DocumentProcessor.Infrastructure -s src/DocumentProcessor.Web
  ```
- Run the application:
  ```bash
  dotnet run --project src/DocumentProcessor.Web
  ```
- Access the application by navigating to https://localhost:7266 or http://localhost:5197
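The `BedrockOptions` section configured above maps naturally onto a strongly typed options class. The following is a minimal sketch, assuming a `BedrockOptions` class whose property names match the JSON keys; the validation rules and the `BedrockOptionsLoader` helper are illustrative, not the project's actual code. It uses only `System.Text.Json` so it runs without the ASP.NET Core configuration packages.

```csharp
using System;
using System.Text.Json;

// Assumed to mirror the "BedrockOptions" section of appsettings.json.
public sealed class BedrockOptions
{
    public string Region { get; set; } = "us-west-2";
    public string ClassificationModelId { get; set; } = "";
    public string ExtractionModelId { get; set; } = "";
    public int MaxTokens { get; set; } = 2000;
    public double Temperature { get; set; } = 0.3;
}

// Hypothetical helper: reads and validates the section from raw appsettings JSON.
public static class BedrockOptionsLoader
{
    public static BedrockOptions Load(string appSettingsJson)
    {
        using var doc = JsonDocument.Parse(appSettingsJson);
        if (!doc.RootElement.TryGetProperty("BedrockOptions", out var section))
            throw new InvalidOperationException("Missing 'BedrockOptions' section.");

        // Property names in the JSON match the C# properties exactly (case-sensitive by default).
        var options = section.Deserialize<BedrockOptions>()
            ?? throw new InvalidOperationException("BedrockOptions could not be read.");

        if (string.IsNullOrWhiteSpace(options.ClassificationModelId) ||
            string.IsNullOrWhiteSpace(options.ExtractionModelId))
            throw new InvalidOperationException("Both model IDs must be configured.");
        if (options.MaxTokens <= 0)
            throw new InvalidOperationException("MaxTokens must be positive.");
        if (options.Temperature is < 0.0 or > 1.0)
            throw new InvalidOperationException("Temperature must be between 0 and 1.");

        return options;
    }
}
```

Inside the running web app, the same section would more typically be bound through the options pattern, e.g. `builder.Services.Configure<BedrockOptions>(builder.Configuration.GetSection("BedrockOptions"))`, with validation failing fast at startup.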
- Multi-format Support: PDF, DOCX, TXT, RTF, ODT, JPG, PNG, XLSX
- Drag-and-drop Upload: Intuitive file upload interface
- Batch Processing: Queue multiple documents for processing
- Document Viewer: Preview documents directly in the browser
- Search & Filter: Find documents by metadata, type, or content
- Intelligent Classification: Automatically categorize documents into predefined types
- Content Extraction: Extract text from various document formats including PDFs
- Metadata Generation: Create structured metadata from unstructured content
- Multi-model Support: Configurable AI models for different tasks:
  - Classification: Claude 3 Haiku for fast categorization
  - Extraction: Claude 3 Sonnet for detailed content analysis
  - Summarization: Claude 3 Haiku for quick summaries
- Processing Statistics: Total documents, processed, queued, and failed counts
- Activity Charts: 7-day processing activity visualization
- Document Type Distribution: Doughnut chart showing document categories
- Queue Monitoring: Real-time processing queue status
- System Health: Monitor database, storage, and AI processor status
- Storage Usage: Track storage consumption with visual indicators
- Async Queue Processing: Non-blocking document processing
- Priority Management: Process documents based on priority levels
- Retry Logic: Automatic retry with exponential backoff
- Status Tracking: Real-time status updates via SignalR
- Auto-refresh: Dashboard updates every 10 seconds
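The Priority Management and Retry Logic bullets can be illustrated together. This is a sketch under assumptions, not the project's implementation: `ProcessingJob`, `RetryPolicy` and `PriorityProcessor` are hypothetical names, a lower `Priority` value is assumed to mean more urgent, and feeding it the `MaxRetries` and `RetryDelayMilliseconds` values from the `BedrockOptions` settings shown later in this README is an assumed mapping. With `MaxRetries = 3` and `RetryDelayMilliseconds = 1000`, the waits between attempts would be 1 s, 2 s and 4 s.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical work item; lower Priority values are processed first.
public sealed record ProcessingJob(string DocumentName, int Priority);

public static class RetryPolicy
{
    // Runs an operation, retrying up to maxRetries times with waits of
    // baseDelay, 2*baseDelay, 4*baseDelay, ... between attempts.
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxRetries,
        TimeSpan baseDelay,
        CancellationToken ct = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            catch (Exception) when (attempt < maxRetries && !ct.IsCancellationRequested)
            {
                // Exponential backoff: baseDelay * 2^attempt.
                var delay = TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
                await Task.Delay(delay, ct);
            }
        }
    }
}

public static class PriorityProcessor
{
    // Drains jobs in priority order, running each one through the retry policy.
    public static async Task<List<string>> ProcessAllAsync(
        IEnumerable<ProcessingJob> jobs,
        Func<ProcessingJob, CancellationToken, Task<string>> process,
        int maxRetries,
        TimeSpan baseDelay,
        CancellationToken ct = default)
    {
        var queue = new PriorityQueue<ProcessingJob, int>();
        foreach (var job in jobs)
            queue.Enqueue(job, job.Priority);

        var results = new List<string>();
        while (queue.TryDequeue(out var job, out _))
        {
            results.Add(await RetryPolicy.ExecuteAsync(token => process(job, token), maxRetries, baseDelay, ct));
        }
        return results;
    }
}
```

Note that `PriorityQueue<TElement, TPriority>` does not guarantee first-in-first-out order among items with equal priority; a real queue that needs fairness would add a sequence number to the priority.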
- Backend:
  - .NET 8 with C# 12
  - Entity Framework Core 8
  - ASP.NET Core Identity
- Frontend:
  - Blazor Server-Side Rendering
  - Bootstrap 5
  - Chart.js
- Database:
  - SQL Server
  - Temporal tables for audit trails
- AI/ML:
  - AWS Bedrock
  - Claude 3 Haiku & Sonnet models
- Real-time:
  - SignalR for live updates
- Document Processing:
  - PdfPig for PDF extraction
  - DocumentFormat.OpenXml for Office documents
- Background Jobs:
  - IHostedService
  - Custom Background Task Queue
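A custom background task queue is commonly built on `System.Threading.Channels`. The sketch below shows that common pattern; it is not necessarily how this project's `BackgroundTasks/` code is written. `BackgroundTaskQueue`, `RunConsumerAsync` and `Complete` are hypothetical names, and in the app the consumer loop would typically run inside a `BackgroundService`/`IHostedService`.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded in-memory queue of work items; producers wait when it is full.
public sealed class BackgroundTaskQueue
{
    private readonly Channel<Func<CancellationToken, ValueTask>> _queue;

    public BackgroundTaskQueue(int capacity)
    {
        var options = new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait
        };
        _queue = Channel.CreateBounded<Func<CancellationToken, ValueTask>>(options);
    }

    public ValueTask QueueAsync(Func<CancellationToken, ValueTask> workItem) =>
        _queue.Writer.WriteAsync(workItem ?? throw new ArgumentNullException(nameof(workItem)));

    public ValueTask<Func<CancellationToken, ValueTask>> DequeueAsync(CancellationToken ct) =>
        _queue.Reader.ReadAsync(ct);

    // Signals that no more work will be queued, letting the consumer loop finish.
    public void Complete() => _queue.Writer.Complete();

    // Consumer loop; a hosted service's ExecuteAsync would call this.
    public async Task RunConsumerAsync(CancellationToken ct)
    {
        await foreach (var workItem in _queue.Reader.ReadAllAsync(ct))
        {
            try
            {
                await workItem(ct);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // One failing document must not stop the loop; real code would log it and mark it Failed.
                Console.Error.WriteLine($"Work item failed: {ex.Message}");
            }
        }
    }
}
```

The bounded channel with `BoundedChannelFullMode.Wait` applies backpressure: an upload burst slows producers down instead of growing memory without limit.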
```
src/
├── DocumentProcessor.Core/                  # Domain layer
│   ├── Entities/                            # Domain models
│   │   ├── Document.cs                      # Main document entity
│   │   ├── Classification.cs                # Classification results
│   │   ├── DocumentMetadata.cs              # Extracted metadata
│   │   └── ProcessingQueue.cs               # Queue management
│   └── Interfaces/                          # Core contracts
│       ├── IDocumentProcessor.cs
│       ├── IAIProcessor.cs
│       └── IDocumentRepository.cs
│
├── DocumentProcessor.Infrastructure/        # Infrastructure layer
│   ├── AI/                                  # AI processing services
│   │   ├── BedrockAIProcessor.cs            # AWS Bedrock integration
│   │   └── DocumentContentExtractor.cs      # Content extraction
│   ├── Data/                                # EF Core context
│   │   └── ApplicationDbContext.cs
│   ├── Repositories/                        # Data access
│   └── BackgroundTasks/                     # Queue processing
│
├── DocumentProcessor.Application/           # Application layer
│   └── Services/                            # Business logic
│       ├── DocumentProcessingService.cs
│       └── BackgroundDocumentProcessingService.cs
│
└── DocumentProcessor.Web/                   # Presentation layer
    ├── Components/                          # Blazor components
    │   ├── Pages/                           # Page components
    │   │   ├── Dashboard.razor              # Main dashboard
    │   │   ├── DocumentUpload.razor         # Upload interface
    │   │   └── DocumentList.razor           # Document management
    │   └── Layout/                          # Layout components
    ├── Hubs/                                # SignalR hubs
    └── wwwroot/                             # Static assets
```
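`BedrockAIProcessor.cs` is where the Bedrock calls live. Claude 3 models on Bedrock's `InvokeModel` accept a JSON body in the Anthropic Messages format (`anthropic_version` set to `bedrock-2023-05-31`, `max_tokens`, optional `system`, and a `messages` array). The sketch below builds such a body with `System.Text.Json`; the record types, `ClaudeRequestBuilder` and the prompt wording are hypothetical. Sending it, for example with `AmazonBedrockRuntimeClient.InvokeModelAsync` from the AWSSDK.BedrockRuntime package, is left out so the example has no external dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Anthropic Messages payload shape accepted by Claude 3 models on Bedrock InvokeModel.
public sealed record ClaudeContent(
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("text")] string Text);

public sealed record ClaudeMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] IReadOnlyList<ClaudeContent> Content);

public sealed record ClaudeRequest(
    [property: JsonPropertyName("anthropic_version")] string AnthropicVersion,
    [property: JsonPropertyName("max_tokens")] int MaxTokens,
    [property: JsonPropertyName("temperature")] double Temperature,
    [property: JsonPropertyName("top_p")] double TopP,
    [property: JsonPropertyName("system")] string SystemPrompt,
    [property: JsonPropertyName("messages")] IReadOnlyList<ClaudeMessage> Messages);

public static class ClaudeRequestBuilder
{
    // Builds the JSON body for a classification prompt; defaults mirror BedrockOptions.
    public static string BuildClassificationBody(
        string documentText, int maxTokens = 2000, double temperature = 0.3, double topP = 0.9)
    {
        var request = new ClaudeRequest(
            AnthropicVersion: "bedrock-2023-05-31",
            MaxTokens: maxTokens,
            Temperature: temperature,
            TopP: topP,
            // Illustrative instruction; the project's real prompts are not shown in this README.
            SystemPrompt: "Classify the document into one category and reply with the category name only.",
            Messages: new[]
            {
                new ClaudeMessage("user", new[] { new ClaudeContent("text", documentText) })
            });

        return JsonSerializer.Serialize(request);
    }
}
```

The resulting string would be sent as the request body with content type `application/json` and the configured `ClassificationModelId` as the model ID.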
Configure storage in appsettings.json:
```json
{
  "DocumentStorage": {
    "Provider": "LocalFileSystem",
    "LocalFileSystem": {
      "RootPath": "uploads",
      "MaxFileSizeInMB": 100,
      "AllowedExtensions": [".pdf", ".doc", ".docx", ".txt", ".rtf", ".odt"]
    },
    "S3": {
      "BucketName": "document-processor-bucket",
      "Region": "us-east-1",
      "UsePresignedUrls": true
    },
    "FileShare": {
      "NetworkPath": "\\\\fileserver\\documents",
      "MaxFileSizeInMB": 100
    }
  }
}
```

Configure the AWS Bedrock AI options in appsettings.json:
```json
{
  "BedrockOptions": {
    "Region": "us-west-2",
    "ClassificationModelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "ExtractionModelId": "anthropic.claude-3-sonnet-20240229-v1:0",
    "SummarizationModelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "MaxTokens": 2000,
    "Temperature": 0.3,
    "TopP": 0.9,
    "MaxRetries": 3,
    "RetryDelayMilliseconds": 1000,
    "EnableDetailedLogging": true,
    "UseSimulatedResponses": false
  }
}
```

Run the test suite:
```bash
# Run all tests
dotnet test

# Run with coverage (requires the coverlet.msbuild package in the test project)
dotnet test /p:CollectCoverage=true /p:CoverletOutputFormat=opencover

# Run a specific test project
dotnet test tests/DocumentProcessor.Tests
```

- Virtualization: Efficient rendering of large document lists
- Lazy Loading: Load data on demand
- Caching: In-memory caching for frequently accessed data
- Connection Pooling: Optimized database connections
- Async/Await: Non-blocking I/O operations throughout
- Batch Processing: Process multiple documents efficiently
- Optimized Queries: EF Core query optimization
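For the Batch Processing and Async/Await points, one way to process many documents efficiently is to run them concurrently with an upper bound, so a large batch cannot exhaust Bedrock quotas or database connections. This sketch uses `Parallel.ForEachAsync` (available since .NET 6); `BatchProcessor`, `ProcessBatchAsync` and the `maxConcurrency` parameter are hypothetical, and the project may batch work differently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchProcessor
{
    // Processes documents concurrently, never running more than maxConcurrency at once.
    public static async Task<IReadOnlyDictionary<string, string>> ProcessBatchAsync(
        IEnumerable<string> documentIds,
        Func<string, CancellationToken, ValueTask<string>> processOne,
        int maxConcurrency,
        CancellationToken ct = default)
    {
        // Thread-safe because several workers write results at the same time.
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrency,
            CancellationToken = ct
        };

        await Parallel.ForEachAsync(documentIds, options, async (id, token) =>
        {
            results[id] = await processOne(id, token);
        });

        return results;
    }
}
```

Awaiting I/O inside each worker keeps threads free while calls to Bedrock or SQL Server are in flight, and the degree-of-parallelism cap is the single knob for tuning throughput against external rate limits.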
- Authentication: ASP.NET Core Identity integration
- Role-based Access: Configurable user roles and permissions
- Input Validation: Comprehensive validation on all inputs
- File Type Validation: Whitelist-based file extension filtering
- Secure File Storage: Files stored outside web root
- SQL Injection Prevention: Parameterized queries via EF Core
- XSS Protection: Built-in Blazor security features
- CSRF Protection: Anti-forgery tokens
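The File Type Validation and Secure File Storage points can be combined into one upload check. This is a minimal sketch, assuming the `AllowedExtensions` and `MaxFileSizeInMB` values from the `DocumentStorage:LocalFileSystem` configuration; `UploadValidator`, `Validate` and `CreateStorageFileName` are hypothetical names rather than the project's API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class UploadValidator
{
    private readonly HashSet<string> _allowedExtensions;
    private readonly long _maxBytes;

    // Values assumed to come from DocumentStorage:LocalFileSystem in appsettings.json.
    public UploadValidator(IEnumerable<string> allowedExtensions, int maxFileSizeInMB)
    {
        _allowedExtensions = new HashSet<string>(allowedExtensions, StringComparer.OrdinalIgnoreCase);
        _maxBytes = maxFileSizeInMB * 1024L * 1024L;
    }

    // Returns null when the upload is acceptable, otherwise the reason for rejecting it.
    public string? Validate(string fileName, long sizeInBytes)
    {
        // Path.GetFileName strips directory parts using the host OS separator rules.
        var safeName = Path.GetFileName(fileName);
        if (string.IsNullOrWhiteSpace(safeName))
            return "File name is empty.";

        var extension = Path.GetExtension(safeName);
        if (!_allowedExtensions.Contains(extension))
            return $"Extension '{extension}' is not allowed.";

        if (sizeInBytes <= 0 || sizeInBytes > _maxBytes)
            return $"File size must be between 1 byte and {_maxBytes / (1024 * 1024)} MB.";

        return null;
    }

    // Generates a random server-side name so user input never becomes part of a storage path.
    public static string CreateStorageFileName(string fileName) =>
        $"{Guid.NewGuid():N}{Path.GetExtension(Path.GetFileName(fileName)).ToLowerInvariant()}";
}
```

An extension whitelist only checks the name a client supplies; inspecting the file's leading bytes (e.g. `%PDF` for PDFs) would give stronger assurance about the actual content.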
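```dockerfile
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app
# .NET 8 container images listen on port 8080 by default
EXPOSE 8080

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
# Copy project files first so the restore layer is cached; paths keep the repo's src/ layout
# (the non-Web .csproj names are assumed to match their folder names)
COPY ["src/DocumentProcessor.Web/DocumentProcessor.Web.csproj", "src/DocumentProcessor.Web/"]
COPY ["src/DocumentProcessor.Application/DocumentProcessor.Application.csproj", "src/DocumentProcessor.Application/"]
COPY ["src/DocumentProcessor.Infrastructure/DocumentProcessor.Infrastructure.csproj", "src/DocumentProcessor.Infrastructure/"]
COPY ["src/DocumentProcessor.Core/DocumentProcessor.Core.csproj", "src/DocumentProcessor.Core/"]
RUN dotnet restore "src/DocumentProcessor.Web/DocumentProcessor.Web.csproj"
COPY . .
WORKDIR "/src/src/DocumentProcessor.Web"
RUN dotnet build "DocumentProcessor.Web.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "DocumentProcessor.Web.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "DocumentProcessor.Web.dll"]
```

Deploy to AWS using Elastic Beanstalk or ECS:
```bash
# Using the Elastic Beanstalk CLI (eb)
eb init -p docker document-processor
eb create production
# Deploy subsequent updates
eb deploy
```

The application includes built-in monitoring capabilities:
- Health Checks: `/health` endpoint for monitoring
- Logging: Structured logging with configurable levels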
- Metrics: Processing statistics and system metrics
- Dashboard: Real-time monitoring via the web interface
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Follow C# coding conventions
- Write unit tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
- Add meaningful commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing issues before creating new ones
- Provide detailed information for bug reports
- Add support for additional AI providers (OpenAI, Azure OpenAI)
- Implement OCR for scanned documents
- Add batch export functionality
- Enhanced search with full-text search
- Document versioning and change tracking
- Multi-tenant support
- REST API for external integrations
- Mobile-responsive design improvements
- Workflow automation features
- Machine learning model training on classified documents
- Advanced analytics and reporting
- Plugin architecture for custom processors
Real-time dashboard with processing statistics, activity charts, and system health monitoring
Intuitive drag-and-drop interface for uploading documents with progress tracking
Comprehensive document list with status indicators and quick actions
Powerful search capabilities to find documents by various criteria
View and edit extracted metadata from processed documents
Additional documentation can be found in the /docs directory:
- Built with .NET 8
- AI powered by AWS Bedrock
- UI components from Bootstrap
- Charts by Chart.js
Built with ❤️ using .NET 8 and AWS Bedrock AI
For more information, visit our documentation or contact the maintainers.