Excel Processor

AI-powered Excel file standardization for commodity classification

Transform unstructured Excel files into standardized formats with automatic column mapping and AI-driven commodity classification.

What it does

Input: Non-standardized Excel files with varying column names and formats
Output: Standardized Excel files with consistent schema and AI-classified commodities

Core features

Automatic column detection - Uses OpenAI to map your columns to standard fields
Smart commodity classification - AI embeddings classify products into categories
Level3 optimization - If your file already has LEVEL3_DESC, only determines scope (saves tokens)
Automatic AI correction - Analyzes and corrects commodity assignments for top EAR items
Remapping capability - Adjust column mappings and commodity classifications
Asynchronous processing - Upload and get notified when complete
Real-time status - Watch processing progress
Export functionality - Download processed files in multiple formats
PDF reports - Generate detailed analysis reports
Admin controls - File deletion and user management
Cross-reference lookup - Integration with external SQL Server database
Pagination - Handle large file lists efficiently

Supported formats

.xlsx (Excel 2007+)
.xls (Excel 97-2003)
.csv (Comma-separated values)

Quick start

Prerequisites

Ruby 3.2+
PostgreSQL (for main database)
SQL Server (for cross-reference lookups, optional with mock mode)
OpenAI API key

Installation

# Clone and setup
git clone 
cd excel-processor
bundle install

# Database setup
rails db:create db:migrate

# Configure credentials (required)
rails credentials:edit
# Add:
# openai:
#   api_key: your_openai_api_key
# sqlserver:
#   username: your_sql_username
#   password: your_sql_password

# Create database config (not tracked in git)
cp config/database.yml.example config/database.yml
# Edit config/database.yml with your database settings

# Optional: Enable mock mode for development
echo "MOCK_OPENAI=true" >> .env
echo "MOCK_SQL_SERVER=true" >> .env

# Start server
rails server

Basic usage

Upload commodity references (one-time setup)
- Go to /commodity_references/upload
- Upload CSV with: GLOBAL_COMM_CODE_DESC,LEVEL1_DESC,LEVEL2_DESC,LEVEL3_DESC,Infinex Scope Status
Process your Excel file
- Go to root URL
- Upload your Excel/CSV file
- Wait for processing (runs in background)
Download results
- Excel file: Standardized data with AI-generated Commodity and Scope columns
- PDF report: Detailed analysis and classification summary
- Export options: Multiple format support for downstream processing

How it works

Standard output schema

Every processed file gets these columns:

Column	Description	Type
`SFDC QUOTE NUMBER`	Item identifier	String
`ITEM`	Item code	String
`MFG_PARTNO`	Manufacturer part number	String
`GLOBAL_MFG_NAME`	Manufacturer name	String
`DESCRIPTION`	Item description	Text
`SITE`	Location/facility	String
`STD_COST`	Standard cost	Number
`LAST_PURCHASE_PRICE`	Last purchase price	Number
`LAST_PO`	Last PO price	Number
`EAU`	Estimated Annual Usage	Integer
`Commodity`	AI-generated classification	String
`Scope`	In scope / Out of scope	String

Processing logic

Column identification: OpenAI analyzes your headers and maps them
Commodity classification:
- If file has LEVEL3_DESC → use existing commodities, determine scope only
- If no LEVEL3_DESC → generate embeddings and find similar commodities
Scope determination: Multi-tier logic determines "In scope" or "Out of scope"
Cross-reference override: Items with SQL Server cross-references become "In scope"
Automatic correction: AI analyzes top EAR items and corrects commodity assignments with high confidence
Excel generation: Create standardized output file

Scope Determination Business Logic

The system uses a hierarchical approach to determine scope status:

1. Base Scope Determination

Commercial Mode (default): Uses infinex_scope_status from commodity references
Auto Mode (when enabled): Uses autograde_scope from commodity references if present, falls back to infinex_scope_status

2. Cross-Reference Override (Highest Priority)

Any MPN found in SQL Server cross-reference database automatically becomes "In scope"
Important: This override applies regardless of mode (Commercial or Auto)
Business Rule: Cross-references indicate confirmed vendor relationships

3. Mode-Specific Logic

Commercial Mode:

Scope = infinex_scope_status from commodity_references table
If cross-reference exists → Force "In scope"

Auto Mode (include_medical_auto_grades = true):

Scope = autograde_scope (if present) OR infinex_scope_status (fallback)
If cross-reference exists → Force "In scope"

4. Fallback Rules

Unknown commodities: Default to "Out of scope"
Missing scope data: Default to "Out of scope"
Insufficient context: Marked as "Requires Review"

5. Priority Order (Highest to Lowest)

Cross-reference override: Always "In scope" if MPN exists in cross-reference database
Auto Mode scope: Uses autograde_scope when Auto Mode enabled
Commercial scope: Uses infinex_scope_status as default/fallback
Default fallback: "Out of scope" for unknown items

AI optimization

Token savings: Files with existing LEVEL3_DESC skip embedding generation
Batch processing: Reduces API calls by processing items in groups
Embedding cache: Avoids regenerating embeddings for similar descriptions
Automatic correction: Improves accuracy by analyzing highest revenue items
Conservative corrections: Only applies changes with high AI confidence
Smart fallbacks: Handles unknown commodities gracefully

Automatic AI Correction

The system automatically analyzes and corrects commodity assignments for items with the highest business impact:

How it works

Target identification: After processing, identifies top items by Estimated Annual Revenue (EAR)
AI analysis: Uses same detailed analysis as manual review but returns structured JSON
Conservative correction: Only applies changes when AI has high confidence
Background processing: Runs automatically without user intervention

Configuration

# app/models/excel_processor_config.rb
TOP_EAR_ANALYSIS_COUNT = 1  # Number of top EAR items to analyze

Correction criteria

High confidence required: Only applies corrections with "high" confidence level
Evidence-based: Analyzes MPN patterns, descriptions, and manufacturer data
Similarity matching: Compares against top 5 most similar commodities
Conservative approach: Prefers false negatives over false positives

Logging

All automatic corrections are logged with details:

Original vs. new commodity assignment
AI confidence level and reasoning
Evidence supporting the decision

Remapping

Fix incorrect mappings without re-uploading:

Access remap page from processed file details
Adjust column mappings - Choose from all available source columns
Change commodities - Bulk update commodity classifications
Reprocess - Apply changes and get updated file

Configuration

Environment setup

# config/credentials.yml.enc (encrypted, safe to commit)
openai:
  api_key: your_openai_api_key
sqlserver:
  username: your_sql_username  
  password: your_sql_password

# config/database.yml (NOT tracked in git)
default: &default
  adapter: postgresql
  encoding: unicode
  username: your_pg_username
  password: your_pg_password
  host: localhost

# .env (optional, for development)
MOCK_OPENAI=true          # Use mock OpenAI responses
MOCK_SQL_SERVER=true      # Use mock SQL Server data
OPENAI_SSL_BYPASS=true    # Skip SSL verification if needed

Models used

text-embedding-3-small - For commodity similarity matching
gpt-4-turbo - For column identification and analysis

Performance tuning

# config/environments/production.rb
config.active_job.queue_adapter = :async

Testing

Simple integration tests that verify end-to-end functionality:

# Run tests
rails test

# Specific test files
rails test test/integration/file_upload_integration_test.rb
rails test test/integration/file_remap_integration_test.rb

Tests simulate OpenAI responses to avoid API costs and ensure reliability.

Architecture decisions

Technical decisions made

NO Redis - Keep infrastructure simple, avoid external dependencies
NO Sidekiq - Use Active Job with async adapter (Rails native)
NO Hotwire - Maintain simplicity with traditional request/response
NO bundle exec - Use bundle directly, ignore psych warnings
YES Active Storage - Store original files for remapping capability
YES PostgreSQL - JSONB support for embeddings and column mappings
YES Tailwind CSS - Manual setup, no cssbundling-rails complexity
YES Kaminari - Simple pagination without extra configuration

Why these choices

Rails 7.1: Mature, productive framework
PostgreSQL: JSONB support for embeddings and metadata
Active Storage: Built-in file handling
Active Job: Async processing without external dependencies
Tailwind CSS: Utility-first styling
No Redis/Sidekiq: Keep infrastructure simple
No Hotwire: Maintain simplicity with traditional requests

OpenAI optimization decisions

Batch processing - Process items in groups to reduce API calls
Embedding cache - In-memory cache to avoid duplicate API requests
Character limits - Truncate long descriptions to save tokens
LEVEL3_DESC detection - Skip AI classification when exact commodity exists
Scope-only processing - When commodities exist, only determine scope
Automatic correction - Target highest EAR items for maximum business impact
High confidence threshold - Only apply corrections with high AI certainty

File processing flow

Upload → Active Storage → Background Job → OpenAI API → Database → Auto-Correction → Excel Generation

Data storage

ProcessedFile: Metadata, status, column mappings
ProcessedItem: Individual row data with AI classifications
CommodityReference: Classification reference database

Known limitations

Large files: Processing time scales with file size and AI API latency
OpenAI dependency: Requires internet connection and valid API key
Column variety: Works best with recognizable English column names
Single tenant: No multi-user/organization support

Troubleshooting

Common issues

Processing fails immediately

Check OpenAI API key in credentials
Verify file format is supported

Commodity references missing

Upload reference CSV file first
Ensure CSV has required columns

Column mapping incorrect

Use remap functionality to adjust
Ensure source columns have clear names

Memory issues with large files

Increase server memory
Process files in smaller batches

Getting help

Check logs: tail -f log/development.log
Rails console: rails console → ProcessedFile.last.status
Reset database: rails db:reset

Contributing

This is a focused application for a specific use case. If you want to contribute:

Keep it simple - no over-engineering
Test your changes
Update documentation
Follow existing patterns

License

MIT License - Use freely for commercial and personal projects.

Name		Name	Last commit message	Last commit date
Latest commit History 183 Commits
app		app
bin		bin
config		config
db		db
lib		lib
log		log
public		public
storage		storage
test		test
tmp		tmp
vendor		vendor
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.rspec		.rspec
.ruby-version		.ruby-version
Dockerfile		Dockerfile
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
Procfile.dev		Procfile.dev
README.md		README.md
README_MOCK_SETUP.md		README_MOCK_SETUP.md
Rakefile		Rakefile
config.ru		config.ru
demo_logs.sh		demo_logs.sh
package.json		package.json
postcss.config.js		postcss.config.js
sed_items		sed_items
tailwind.config.js		tailwind.config.js
test_change1.rb		test_change1.rb
test_change2.rb		test_change2.rb
test_change3.rb		test_change3.rb
yarn.lock		yarn.lock

JuanFLex/excel_processor

Folders and files

Latest commit

History

Repository files navigation

Excel Processor

What it does

Core features

Supported formats

Quick start

Prerequisites

Installation

Basic usage

How it works

Standard output schema

Processing logic

Scope Determination Business Logic

1. Base Scope Determination

2. Cross-Reference Override (Highest Priority)

3. Mode-Specific Logic

4. Fallback Rules

5. Priority Order (Highest to Lowest)

AI optimization

Automatic AI Correction

How it works

Configuration

Correction criteria

Logging

Remapping

Configuration

Environment setup

Models used

Performance tuning

Testing

Architecture decisions

Technical decisions made

Why these choices

OpenAI optimization decisions

File processing flow

Data storage

Known limitations

Troubleshooting

Common issues

Getting help

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages