File Format to JSON Converter with Schema Validation

A production-ready FastAPI service that converts files in a range of formats to JSON, with optional validation and mapping against predefined database schemas.

🚀 Features

Multi-Format Support: Process CSV, PDF, TXT, DOCX, PPTX, XLSX, HTML, Python files, and Images
Schema Validation: Automatic validation against 8 database schemas
Intelligent Mapping: Smart data extraction and field mapping
Industry Recognition: Auto-detection of 15+ industry types
Data Transformation: Automatic normalization of phone numbers, emails, currencies
RESTful API: Clean, documented API endpoints
Production Ready: Error handling, logging, and comprehensive testing
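
The data transformation feature can be pictured with a minimal sketch like the one below. It is not the project's actual code: the function names, the default country code (234, chosen only because the example response in this README uses a +234 number), and the assumption that prices use a comma as thousands separator are all illustrative.

```python
import re
from typing import Optional, Tuple


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()


def normalize_phone(raw: str, default_country_code: str = "234") -> str:
    """Return '+<digits>'. A leading 0 is treated as a local number (assumption)."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if raw.strip().startswith("+"):
        return "+" + digits  # already in international form
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]
    return "+" + digits


def normalize_price(raw: str) -> Tuple[float, Optional[str]]:
    """Split 'NGN 50,000.00' into (50000.0, 'NGN'); currency is None if absent."""
    match = re.search(r"[A-Z]{3}", raw)  # first three-letter uppercase code
    currency = match.group(0) if match else None
    number = re.sub(r"[^\d.]", "", raw)  # drop commas, spaces, letters
    return float(number), currency
```

For example, `normalize_phone("0801 234 5678")` yields `+2348012345678` under the assumed default country code.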

📋 Supported Database Schemas

business_data - Complete business profile with products/services
customer_profile - Customer information and behavior tracking
conversation_data - Chat conversation history and analytics
appointment_data - Appointment booking information
embedding_data - Vector embeddings for RAG systems
feedback_data - Customer feedback and ratings
escalation_data - Issue escalation tracking
token_usage_data - API token usage and cost tracking

🏗️ Supported Industries

E-commerce
Healthcare
Real Estate
Restaurants
Education
Financial Services
Travel & Hospitality
Events & Entertainment
Logistics & Delivery
Professional Services
Beauty & Wellness
Enterprise Telecoms
Enterprise Banking
Manufacturing & FMCG
Retail Chains
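
This README does not describe how industries are detected. One simple way such a classifier could work is keyword scoring, sketched below. The keyword sets and most of the labels are hypothetical; only `professional_services` appears in this README (in the example response).

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword sets; a real classifier would cover all listed industries.
INDUSTRY_KEYWORDS = {
    "healthcare": {"clinic", "patient", "hospital", "pharmacy"},
    "real_estate": {"property", "rent", "apartment", "lease"},
    "restaurants": {"menu", "restaurant", "dish", "dining"},
    "professional_services": {"consulting", "consultant", "advisory", "audit"},
}


def detect_industry(text: str) -> Optional[str]:
    """Return the label with the most keyword hits, or None if nothing matches."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for label, keywords in INDUSTRY_KEYWORDS.items():
        scores[label] = sum(1 for word in words if word in keywords)
    label, score = scores.most_common(1)[0]
    return label if score > 0 else None
```

Keyword counting is easy to inspect and extend, but it misses synonyms and context, so treat it as a baseline rather than a description of the service's behaviour.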

📦 Installation

# Clone the repository
git clone <repository-url>
cd file_processor_project

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🎯 Quick Start

1. Start the Server

python -m app.main

The API will be available at http://localhost:8005

2. Process a File (No Schema)

curl -X POST "http://localhost:8005/process-file/" \
  -F "file=@business_data.csv"

3. Process with Schema Validation

curl -X POST "http://localhost:8005/process-file/?target_schema=business_data" \
  -F "file=@business_data.csv"

📚 API Documentation

Endpoints

`POST /process-file/`

Process an uploaded file with optional schema mapping.

Parameters:

file (form-data, required): The file to process
target_schema (query, optional): Target database schema
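
A client may want to reject an unknown schema name before uploading anything. The sketch below builds the request URL from the two parameters above and checks `target_schema` against the eight names listed under Supported Database Schemas. `build_process_url` is a hypothetical helper, not part of the service.

```python
from typing import Optional
from urllib.parse import urlencode

# The eight schema names listed under "Supported Database Schemas".
SUPPORTED_SCHEMAS = {
    "business_data", "customer_profile", "conversation_data", "appointment_data",
    "embedding_data", "feedback_data", "escalation_data", "token_usage_data",
}


def build_process_url(base_url: str, target_schema: Optional[str] = None) -> str:
    """Hypothetical helper: return the POST /process-file/ URL for a client call."""
    url = base_url.rstrip("/") + "/process-file/"
    if target_schema is None:
        return url  # no schema: plain conversion
    if target_schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"Unknown schema: {target_schema!r}")
    return url + "?" + urlencode({"target_schema": target_schema})
```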

Example Response (business_data schema):

{
  "business_profile": {
    "business_name": "Tech Solutions Ltd",
    "industry": "professional_services",
    "description": "Leading IT consulting firm",
    "contact_info": {
      "email": "info@techsolutions.com",
      "phone": "+2348012345678",
      "website": "https://techsolutions.com",
      "address": "123 Lagos Street"
    }
  },
  "products_services": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "IT Consulting",
      "description": "Expert IT consulting services",
      "price": 50000.0,
      "currency": "NGN",
      "category": "Services",
      "availability": true
    }
  ],
  "faqs": [],
  "policies": null,
  "schema_applied": "business_data"
}
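
On the client side, a response like the one above can be checked and summarised with a few lines of Python. This sketch assumes the top-level keys in the example are always present for the `business_data` schema, which this README does not guarantee. Both helper names are illustrative.

```python
from typing import Any, Dict, List

# Top-level keys taken from the example response; treating them as required is an assumption.
EXPECTED_KEYS = ("business_profile", "products_services", "faqs", "policies", "schema_applied")


def missing_keys(payload: Dict[str, Any]) -> List[str]:
    """List the expected top-level keys that are absent from the payload."""
    return [key for key in EXPECTED_KEYS if key not in payload]


def total_by_currency(payload: Dict[str, Any]) -> Dict[str, float]:
    """Sum product prices per currency, tolerating a null or missing product list."""
    totals: Dict[str, float] = {}
    for item in payload.get("products_services") or []:
        currency = item.get("currency") or "UNKNOWN"
        totals[currency] = totals.get(currency, 0.0) + float(item.get("price") or 0.0)
    return totals
```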

`GET /schemas/`

Get list of supported database schemas.

`GET /health`

Health check endpoint.

`GET /`

API information and available endpoints.

🧪 Testing

Run Schema Mapping Tests

python tests/test_schema_mapping.py

Run API Tests

# Make sure the server is running first
python tests/test_api.py

Sample Test Files

The test suite automatically creates sample files:

business_data.csv - Business information in CSV format
company_info.txt - Business details in text format
products.json - Products/services in JSON format

💡 Usage Examples

Python Client Example

import requests

# Process file without schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files
    )
    print(response.json())

# Process with business_data schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    params = {"target_schema": "business_data"}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files,
        params=params
    )
    data = response.json()
    print(f"Business: {data['business_profile']['business_name']}")
    print(f"Products: {len(data['products_services'])}")

JavaScript/Fetch Example

const formData = new FormData();
formData.append('file', fileInput.files[0]);

fetch('http://localhost:8005/process-file/?target_schema=business_data', {
  method: 'POST',
  body: formData
})
.then(response => response.json())
.then(data => {
  console.log('Business:', data.business_profile.business_name);
  console.log('Products:', data.products_services.length);
})
.catch(error => console.error('Error:', error));

🔧 Configuration

Supported File Extensions

CSV: .csv
PDF: .pdf
Text: .txt
Word: .docx
PowerPoint: .pptx
Excel: .xlsx, .xls
HTML: .html, .htm
Python: .py
Images: .jpg, .jpeg, .png, .gif, .bmp
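
The extension list above maps naturally onto a lookup table. The sketch below shows one way a caller could check whether a file is supported before uploading it. The format labels and the `detect_format` helper are illustrative and not taken from the project's code.

```python
from pathlib import Path
from typing import Optional

# Extensions from the list above; the format labels on the right are illustrative.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".pdf": "pdf",
    ".txt": "text",
    ".docx": "word",
    ".pptx": "powerpoint",
    ".xlsx": "excel", ".xls": "excel",
    ".html": "html", ".htm": "html",
    ".py": "python",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image", ".bmp": "image",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format label for a filename, or None if the extension is unsupported."""
    return EXTENSION_TO_FORMAT.get(Path(filename).suffix.lower())
```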

Environment Variables

# Server configuration
HOST=0.0.0.0
PORT=8005

# File upload limits
MAX_FILE_SIZE=10485760  # 10MB

# Enable debug mode
DEBUG=False
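
As a rough sketch, the variables above could be read into a typed settings object as follows. The `Settings` class and `load_settings` function are hypothetical, and the defaults simply mirror the values shown in this section.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    max_file_size: int  # bytes
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the values shown above."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8005")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        # Accept a few common truthy spellings; anything else counts as False.
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.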

🏗️ Project Structure

file_processor_project/
├── app/
│   ├── api/          # API endpoints
│   ├── core/         # Core processing logic
│   ├── schemas/      # Database schemas
│   ├── utils/        # Utility functions
│   └── main.py       # Application entry point
├── tests/            # Test files
├── requirements.txt  # Dependencies
└── README.md         # Documentation

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

MIT License

📞 Support

For issues and questions:

Create an issue on GitHub
Email: support@example.com

🔄 Version History

v2.0.0 (Current)

Added schema validation and mapping
Support for 8 database schemas
Intelligent industry detection
Enhanced data transformation
Comprehensive testing suite

v1.0.0

Initial release
Basic file processing
Multi-format support

Key Improvements in This Version

Schema Validation: All extracted data is validated against Pydantic models
Intelligent Mapping: Smart extraction of business info, products, contacts
Industry Detection: Automatic industry classification from content
Data Transformation: Phone numbers, emails, currencies normalized
Flexible API: Optional schema parameter for targeted mapping
Comprehensive Testing: Unit tests and integration tests included
Production Ready: Error handling, validation, and documentation
