A production-ready FastAPI service that converts files in a range of formats to JSON, with optional validation and mapping against database schemas.
- Multi-Format Support: Process CSV, PDF, TXT, DOCX, PPTX, XLSX, HTML, Python files, and Images
- Schema Validation: Automatic validation against 8+ database schemas
- Intelligent Mapping: Smart data extraction and field mapping
- Industry Recognition: Auto-detection of 15+ industry types
- Data Transformation: Automatic normalization of phone numbers, emails, currencies
- RESTful API: Clean, documented API endpoints
- Production Ready: Error handling, logging, and comprehensive testing
- business_data - Complete business profile with products/services
- customer_profile - Customer information and behavior tracking
- conversation_data - Chat conversation history and analytics
- appointment_data - Appointment booking information
- embedding_data - Vector embeddings for RAG systems
- feedback_data - Customer feedback and ratings
- escalation_data - Issue escalation tracking
- token_usage_data - API token usage and cost tracking
- E-commerce
- Healthcare
- Real Estate
- Restaurants
- Education
- Financial Services
- Travel & Hospitality
- Events & Entertainment
- Logistics & Delivery
- Professional Services
- Beauty & Wellness
- Enterprise Telecoms
- Enterprise Banking
- Manufacturing & FMCG
- Retail Chains
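The service's actual detection logic is not shown in this README. As a rough illustration of how auto-detection over the industries above might work, the sketch below scores text against per-industry keyword lists. The `INDUSTRY_KEYWORDS` table, the `detect_industry` function, and the keywords themselves are hypothetical and not taken from the project.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword table; the real service may use a different approach.
INDUSTRY_KEYWORDS = {
    "healthcare": {"clinic", "patient", "hospital", "pharmacy"},
    "real_estate": {"property", "rent", "apartment", "mortgage"},
    "restaurants": {"menu", "restaurant", "chef", "dining"},
    "professional_services": {"consulting", "firm", "advisory", "audit"},
}


def detect_industry(text: str) -> Optional[str]:
    """Return the industry whose keywords occur most often, or None."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        industry: sum(words[keyword] for keyword in keywords)
        for industry, keywords in INDUSTRY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_industry("Leading IT consulting firm"))  # professional_services
```

Returning `None` when nothing matches lets a caller tell "no industry detected" apart from a low-confidence guess.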
# Clone the repository
git clone <repository-url>
cd file_processor_project
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the server
python -m app.main
The API will be available at http://localhost:8005
curl -X POST "http://localhost:8005/process-file/" \
  -F "file=@business_data.csv"
curl -X POST "http://localhost:8005/process-file/?target_schema=business_data" \
  -F "file=@business_data.csv"
Process an uploaded file with optional schema mapping.
Parameters:
- file (form-data, required): The file to process
- target_schema (query, optional): Target database schema
Example Response (business_data schema):
{
  "business_profile": {
    "business_name": "Tech Solutions Ltd",
    "industry": "professional_services",
    "description": "Leading IT consulting firm",
    "contact_info": {
      "email": "info@techsolutions.com",
      "phone": "+2348012345678",
      "website": "https://techsolutions.com",
      "address": "123 Lagos Street"
    }
  },
  "products_services": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "IT Consulting",
      "description": "Expert IT consulting services",
      "price": 50000.0,
      "currency": "NGN",
      "category": "Services",
      "availability": true
    }
  ],
  "faqs": [],
  "policies": null,
  "schema_applied": "business_data"
}
Get the list of supported database schemas.
Health check endpoint.
API information and available endpoints.
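On the client side, it can help to load a business_data response into typed objects so that missing or unexpected fields fail early. The sketch below mirrors the example response shown earlier. The service itself is described as validating with Pydantic models; this stand-in uses standard-library dataclasses to stay dependency-free. The class names, and the choice of which fields are required, are assumptions.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ContactInfo:
    email: Optional[str] = None
    phone: Optional[str] = None
    website: Optional[str] = None
    address: Optional[str] = None


@dataclass
class BusinessProfile:
    business_name: str
    industry: str
    description: Optional[str] = None
    contact_info: Optional[ContactInfo] = None


@dataclass
class ProductService:
    id: str
    name: str
    price: float
    currency: str
    description: Optional[str] = None
    category: Optional[str] = None
    availability: bool = True


def parse_business_data(payload: dict) -> Tuple[BusinessProfile, List[ProductService]]:
    """Split a business_data response into a typed profile and product list.

    Raises TypeError if a required field is missing or an unknown field appears.
    """
    profile_raw = dict(payload["business_profile"])
    contact_raw = profile_raw.pop("contact_info", None)
    contact = ContactInfo(**contact_raw) if contact_raw else None
    profile = BusinessProfile(contact_info=contact, **profile_raw)
    products = [ProductService(**p) for p in payload.get("products_services", [])]
    return profile, products


# Trimmed copy of the example response.
sample = json.loads("""
{
  "business_profile": {
    "business_name": "Tech Solutions Ltd",
    "industry": "professional_services",
    "contact_info": {"email": "info@techsolutions.com", "phone": "+2348012345678"}
  },
  "products_services": [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "name": "IT Consulting",
     "price": 50000.0, "currency": "NGN", "availability": true}
  ],
  "schema_applied": "business_data"
}
""")

profile, products = parse_business_data(sample)
print(profile.business_name, len(products))  # Tech Solutions Ltd 1
```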
python tests/test_schema_mapping.py
# Make sure the server is running first
python tests/test_api.py
The test suite automatically creates sample files:
- business_data.csv - Business information in CSV format
- company_info.txt - Business details in text format
- products.json - Products/services in JSON format
import requests

# Process file without schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files
    )
print(response.json())

# Process with business_data schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    params = {"target_schema": "business_data"}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files,
        params=params
    )
data = response.json()
print(f"Business: {data['business_profile']['business_name']}")
print(f"Products: {len(data['products_services'])}")
const formData = new FormData();
formData.append('file', fileInput.files[0]);
fetch('http://localhost:8005/process-file/?target_schema=business_data', {
  method: 'POST',
  body: formData
})
  .then(response => response.json())
  .then(data => {
    console.log('Business:', data.business_profile.business_name);
    console.log('Products:', data.products_services.length);
  })
  .catch(error => console.error('Error:', error));
- CSV: .csv
- PDF: .pdf
- Text: .txt
- Word: .docx
- PowerPoint: .pptx
- Excel: .xlsx, .xls
- HTML: .html, .htm
- Python: .py
- Images: .jpg, .jpeg, .png, .gif, .bmp
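A client can check a file's extension against the list above before uploading, which avoids a round trip for files the service will not accept. This is a minimal sketch: `SUPPORTED_EXTENSIONS` and `detect_format` are hypothetical names, and treating extensions as case-insensitive is an assumption about the server's behavior.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format table copied from the supported formats list.
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".pdf": "PDF",
    ".txt": "Text",
    ".docx": "Word",
    ".pptx": "PowerPoint",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".html": "HTML",
    ".htm": "HTML",
    ".py": "Python",
    ".jpg": "Images",
    ".jpeg": "Images",
    ".png": "Images",
    ".gif": "Images",
    ".bmp": "Images",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format label for a filename, or None if it is unsupported."""
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


print(detect_format("Report.PDF"))   # PDF
print(detect_format("archive.zip"))  # None
```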
# Server configuration
HOST=0.0.0.0
PORT=8005
# File upload limits
MAX_FILE_SIZE=10485760 # 10MB
# Enable debug mode
DEBUG=False
file_processor_project/
├── app/
│   ├── api/          # API endpoints
│   ├── core/         # Core processing logic
│   ├── schemas/      # Database schemas
│   ├── utils/        # Utility functions
│   └── main.py       # Application entry point
├── tests/            # Test files
├── requirements.txt  # Dependencies
└── README.md         # Documentation
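The configuration variables listed above (HOST, PORT, MAX_FILE_SIZE, DEBUG) arrive as strings and need converting before use. The sketch below shows one way to do that with the standard library. The `Settings` class and `load_settings` function are hypothetical, and using the documented values as defaults is an assumption about how the application behaves.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    max_file_size: int  # bytes
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the documented values."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8005")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        # bool("False") is True, so compare against known truthy strings instead.
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
    )


settings = load_settings({"PORT": "9000", "DEBUG": "False"})
print(settings.port, settings.debug)  # 9000 False
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.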
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License
For issues and questions:
- Create an issue on GitHub
- Email: support@example.com
- Added schema validation and mapping
- Support for 8 database schemas
- Intelligent industry detection
- Enhanced data transformation
- Comprehensive testing suite
- Initial release
- Basic file processing
- Multi-format support
Key Improvements in This Version
- Schema Validation: All extracted data is validated against Pydantic models
- Intelligent Mapping: Smart extraction of business info, products, and contacts
- Industry Detection: Automatic industry classification from content
- Data Transformation: Phone numbers, emails, and currencies are normalized
- Flexible API: Optional schema parameter for targeted mapping
- Comprehensive Testing: Unit tests and integration tests included
- Production Ready: Error handling, validation, and documentation
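To make the Data Transformation item concrete, the sketch below shows what normalizing phone numbers, emails, and currency amounts could look like. It is not the project's implementation: the function names, the default country code `234`, and the default currency `NGN` are assumptions chosen to match the example response.

```python
import re
from typing import Optional, Tuple


def normalize_phone(raw: str, default_country_code: str = "234") -> str:
    """Return a '+<country><number>' string; local numbers get the default code."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("0"):
        # Assumed convention: a leading 0 marks a local number.
        return "+" + default_country_code + digits[1:]
    # Assumption: anything else already carries its country code.
    return "+" + digits


def normalize_email(raw: str) -> Optional[str]:
    """Lower-case and trim an address; return None if it does not look like one."""
    email = raw.strip().lower()
    return email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None


def normalize_price(raw: str, default_currency: str = "NGN") -> Tuple[float, str]:
    """Split a price string such as 'NGN 50,000' into (amount, currency code)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no amount found in {raw!r}")
    amount = float(match.group().replace(",", ""))
    code = re.search(r"\b[A-Z]{3}\b", raw)
    return amount, (code.group() if code else default_currency)


print(normalize_phone("0801 234 5678"))             # +2348012345678
print(normalize_email("  Info@TechSolutions.com "))  # info@techsolutions.com
print(normalize_price("NGN 50,000"))                # (50000.0, 'NGN')
```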