EngrIBGIT/file_processor_model

File Format to JSON Converter with Schema Validation

A production-ready FastAPI service that converts files in a range of formats to JSON, with optional validation and mapping against database schemas.

🚀 Features

  • Multi-Format Support: Process CSV, PDF, TXT, DOCX, PPTX, XLSX, HTML, Python files, and Images
  • Schema Validation: Automatic validation against 8 database schemas
  • Intelligent Mapping: Smart data extraction and field mapping
  • Industry Recognition: Auto-detection of 15 industry types
  • Data Transformation: Automatic normalization of phone numbers, emails, and currencies
  • RESTful API: Clean, documented API endpoints
  • Production Ready: Error handling, logging, and comprehensive testing
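
The README does not show how the data transformation step is implemented. As a rough illustration only, the sketch below normalizes phone numbers and emails with the standard library. The function names `normalize_phone` and `normalize_email`, the default country code `234`, and the length limits are all assumptions made for this example, not the project's actual code.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "234") -> str:
    """Return the number as '+<digits>' (hypothetical helper, not the project's API)."""
    stripped = raw.strip()
    digits = re.sub(r"\D", "", stripped)  # drop spaces, dashes, brackets, '+'
    if stripped.startswith("+"):
        pass  # already written in international form
    elif digits.startswith("00"):
        digits = digits[2:]  # '00' international prefix
    elif digits.startswith("0"):
        # National format: replace the leading 0 with the assumed country code.
        digits = default_country_code + digits[1:]
    # Any other input is assumed to already include a country code.
    if not 8 <= len(digits) <= 15:
        raise ValueError(f"Unexpected phone number length: {raw!r}")
    return "+" + digits


def normalize_email(raw: str) -> str:
    """Trim whitespace and lower-case an email address (hypothetical helper)."""
    return raw.strip().lower()


print(normalize_phone("0801 234 5678"))
print(normalize_email("  Info@TechSolutions.com "))
```

A production service would more likely rely on a dedicated phone-number library, since national numbering rules vary widely.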

📋 Supported Database Schemas

  1. business_data - Complete business profile with products/services
  2. customer_profile - Customer information and behavior tracking
  3. conversation_data - Chat conversation history and analytics
  4. appointment_data - Appointment booking information
  5. embedding_data - Vector embeddings for RAG systems
  6. feedback_data - Customer feedback and ratings
  7. escalation_data - Issue escalation tracking
  8. token_usage_data - API token usage and cost tracking

🏗️ Supported Industries

  • E-commerce
  • Healthcare
  • Real Estate
  • Restaurants
  • Education
  • Financial Services
  • Travel & Hospitality
  • Events & Entertainment
  • Logistics & Delivery
  • Professional Services
  • Beauty & Wellness
  • Enterprise Telecoms
  • Enterprise Banking
  • Manufacturing & FMCG
  • Retail Chains
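
How the service classifies content into one of these industries is not documented here. One simple approach is keyword scoring, sketched below for a handful of industries. The keyword sets, the `detect_industry` name, and the `"unknown"` fallback are assumptions for illustration. Only the `professional_services` label is taken from the example response in this README; the other labels are guesses at the same naming style.

```python
import re
from collections import Counter

# Hypothetical keyword sets; a real classifier would need far broader coverage.
INDUSTRY_KEYWORDS = {
    "healthcare": {"clinic", "hospital", "patient", "pharmacy"},
    "restaurants": {"menu", "restaurant", "dish", "chef"},
    "real_estate": {"property", "apartment", "rent", "realtor"},
    "professional_services": {"consulting", "consultant", "advisory", "audit"},
}


def detect_industry(text: str, default: str = "unknown") -> str:
    """Return the industry whose keywords occur most often in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        industry: sum(words[keyword] for keyword in keywords)
        for industry, keywords in INDUSTRY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return best if scores[best] > 0 else default


print(detect_industry("Leading IT consulting firm"))
```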

📦 Installation

# Clone the repository
git clone <repository-url>
cd file_processor_project

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🎯 Quick Start

1. Start the Server

python -m app.main

The API will be available at http://localhost:8005

2. Process a File (No Schema)

curl -X POST "http://localhost:8005/process-file/" \
  -F "file=@business_data.csv"

3. Process with Schema Validation

curl -X POST "http://localhost:8005/process-file/?target_schema=business_data" \
  -F "file=@business_data.csv"

📚 API Documentation

Endpoints

POST /process-file/

Process an uploaded file with optional schema mapping.

Parameters:

  • file (form-data, required): The file to process
  • target_schema (query, optional): Target database schema

Example Response (business_data schema):

{
  "business_profile": {
    "business_name": "Tech Solutions Ltd",
    "industry": "professional_services",
    "description": "Leading IT consulting firm",
    "contact_info": {
      "email": "info@techsolutions.com",
      "phone": "+2348012345678",
      "website": "https://techsolutions.com",
      "address": "123 Lagos Street"
    }
  },
  "products_services": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "IT Consulting",
      "description": "Expert IT consulting services",
      "price": 50000.0,
      "currency": "NGN",
      "category": "Services",
      "availability": true
    }
  ],
  "faqs": [],
  "policies": null,
  "schema_applied": "business_data"
}
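
A client may want to confirm that a response has the shape shown above before using it. The sketch below does this with plain Python type checks. Which keys count as required is an assumption inferred from the example response, and `find_shape_errors` is a hypothetical helper, not part of the service.

```python
from typing import Any

# Assumed required keys and types, inferred from the example response only.
REQUIRED_TOP_LEVEL = {
    "business_profile": dict,
    "products_services": list,
    "faqs": list,
    "schema_applied": str,
}
REQUIRED_PRODUCT_FIELDS = {
    "id": str,
    "name": str,
    "price": (int, float),
    "currency": str,
}


def find_shape_errors(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    errors = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if not isinstance(payload.get(key), expected):
            errors.append(f"{key}: expected {expected.__name__}")
    products = payload.get("products_services")
    for i, product in enumerate(products if isinstance(products, list) else []):
        if not isinstance(product, dict):
            errors.append(f"products_services[{i}]: expected an object")
            continue
        for field, expected in REQUIRED_PRODUCT_FIELDS.items():
            if not isinstance(product.get(field), expected):
                errors.append(f"products_services[{i}].{field}: missing or wrong type")
    return errors
```

Calling `find_shape_errors(response.json())` on the example response above would return an empty list under these assumptions.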

GET /schemas/

Get list of supported database schemas.

GET /health

Health check endpoint.

GET /

API information and available endpoints.

🧪 Testing

Run Schema Mapping Tests

python tests/test_schema_mapping.py

Run API Tests

# Make sure the server is running first
python tests/test_api.py

Sample Test Files

The test suite automatically creates sample files:

  • business_data.csv - Business information in CSV format
  • company_info.txt - Business details in text format
  • products.json - Products/services in JSON format

💡 Usage Examples

Python Client Example

import requests

# Process file without schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files
    )
    print(response.json())

# Process with business_data schema
with open("business_info.csv", "rb") as f:
    files = {"file": f}
    params = {"target_schema": "business_data"}
    response = requests.post(
        "http://localhost:8005/process-file/",
        files=files,
        params=params
    )
    data = response.json()
    print(f"Business: {data['business_profile']['business_name']}")
    print(f"Products: {len(data['products_services'])}")

JavaScript/Fetch Example

const formData = new FormData();
formData.append('file', fileInput.files[0]);

fetch('http://localhost:8005/process-file/?target_schema=business_data', {
  method: 'POST',
  body: formData
})
.then(response => response.json())
.then(data => {
  console.log('Business:', data.business_profile.business_name);
  console.log('Products:', data.products_services.length);
})
.catch(error => console.error('Error:', error));

🔧 Configuration

Supported File Extensions

  • CSV: .csv
  • PDF: .pdf
  • Text: .txt
  • Word: .docx
  • PowerPoint: .pptx
  • Excel: .xlsx, .xls
  • HTML: .html, .htm
  • Python: .py
  • Images: .jpg, .jpeg, .png, .gif, .bmp
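
The extension list above could be turned into a lookup that picks a processor for each upload. The following is a minimal sketch of that idea. The type labels and the `detect_file_type` name are assumptions for illustration; the README does not show how the service dispatches on file type.

```python
from pathlib import Path

# Extensions from the list above, mapped to assumed processor labels.
EXTENSION_TO_TYPE = {
    ".csv": "csv",
    ".pdf": "pdf",
    ".txt": "text",
    ".docx": "word",
    ".pptx": "powerpoint",
    ".xlsx": "excel",
    ".xls": "excel",
    ".html": "html",
    ".htm": "html",
    ".py": "python",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".gif": "image",
    ".bmp": "image",
}


def detect_file_type(filename: str) -> str:
    """Map a filename to a processor label, ignoring the case of the extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix or '(none)'}") from None


print(detect_file_type("Report.PDF"))
```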

Environment Variables

# Server configuration
HOST=0.0.0.0
PORT=8005

# File upload limits
MAX_FILE_SIZE=10485760  # 10MB

# Enable debug mode
DEBUG=False
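
One way an application could read these variables is sketched below, using the values above as defaults. The `Settings` class and `load_settings` function are hypothetical names for this example; the project's actual configuration code is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    max_file_size: int  # bytes
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, falling back to the defaults above."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8005")),
        max_file_size=int(env.get("MAX_FILE_SIZE", str(10 * 1024 * 1024))),
        # Treat common truthy spellings as True; everything else is False.
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
    )


settings = load_settings()
print(settings.port, settings.max_file_size)
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real environment.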

🏗️ Project Structure

file_processor_project/
├── app/
│   ├── api/          # API endpoints
│   ├── core/         # Core processing logic
│   ├── schemas/      # Database schemas
│   ├── utils/        # Utility functions
│   └── main.py       # Application entry point
├── tests/            # Test files
├── requirements.txt  # Dependencies
└── README.md         # Documentation

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License

📞 Support

For issues and questions:

🔄 Version History

v2.0.0 (Current)

  • Added schema validation and mapping
  • Support for 8 database schemas
  • Intelligent industry detection
  • Enhanced data transformation
  • Comprehensive testing suite

v1.0.0

  • Initial release
  • Basic file processing
  • Multi-format support

Key Improvements in This Version

  1. Schema Validation: All extracted data is validated against Pydantic models

  2. Intelligent Mapping: Smart extraction of business info, products, contacts

  3. Industry Detection: Automatic industry classification from content

  4. Data Transformation: Phone numbers, emails, and currencies are normalized

  5. Flexible API: Optional schema parameter for targeted mapping

  6. Comprehensive Testing: Unit tests and integration tests included

  7. Production Ready: Error handling, validation, and documentation
