BuzzGoMax/excel-to-csv

Excel To CSV

📊 Convert Excel files (XLSX, XLS) to CSV format. Handle multiple sheets, large files, and batch processing. Perfect for ETL pipelines and data integration.

Apify Actor License: MIT

🎯 What This Actor Does

A robust Excel to CSV converter that:

  • Converts XLSX and XLS files to standard CSV format
  • Handles multiple sheets - extract all or specific sheets
  • Processes large files - memory-efficient streaming
  • Preserves data types - dates, numbers, text formatted correctly
  • Batch processing - convert multiple files at once

🚀 Use Cases

| Use Case | Description |
|---|---|
| ETL Pipelines | Transform Excel exports for data warehouses |
| Data Migration | Convert legacy Excel databases |
| API Integration | Excel → CSV → JSON for APIs |
| Reporting | Standardize financial reports |
| Data Analysis | Prepare data for Pandas, R, SQL |
| Automation | Process daily Excel email attachments |

📥 Input Options

Upload Directly

Drag and drop your Excel file in the Apify Console.

Provide URL

{
    "fileUrl": "https://example.com/report.xlsx"
}

Batch Processing

{
    "fileUrls": [
        "https://example.com/report-q1.xlsx",
        "https://example.com/report-q2.xlsx",
        "https://example.com/report-q3.xlsx"
    ]
}
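
When the file list is assembled programmatically, a small helper can pick between the single-file and batch input shapes. The sketch below is only illustrative: `build_input` is a hypothetical helper name, and it assumes the Actor accepts either `fileUrl` or `fileUrls` exactly as shown above, with any other options passed through unchanged.

```python
from typing import Any


def build_input(urls: list[str], **options: Any) -> dict[str, Any]:
    """Build an Actor input dict from one or more Excel file URLs.

    Hypothetical helper: uses `fileUrl` for a single URL and `fileUrls`
    for several, mirroring the two input shapes documented above.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    payload: dict[str, Any] = dict(options)
    if len(urls) == 1:
        payload["fileUrl"] = urls[0]
    else:
        payload["fileUrls"] = list(urls)
    return payload
```

The returned dict can be passed as `run_input` to the Python client call shown in the Quick Start.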

⚙️ Configuration

File Input

| Parameter | Type | Description |
|---|---|---|
| file | string | Upload file directly |
| fileUrl | string | URL to Excel file |
| fileUrls | array | Multiple file URLs |

Sheet Selection

| Parameter | Type | Default | Description |
|---|---|---|---|
| allSheets | boolean | true | Convert all sheets |
| sheets | array | [] | Specific sheet names or indices |

CSV Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| delimiter | string | , | Field separator: , ; \t |
| includeHeaders | boolean | true | First row is headers |
| dateFormat | string | YYYY-MM-DD | Date format (dayjs tokens) |
| skipEmptyRows | boolean | true | Omit blank rows |
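
To make the effect of these options concrete, here is a minimal Python sketch that approximates them with the standard `csv` module. It is not the Actor's implementation: `rows_to_csv` is a hypothetical function, and it assumes that `includeHeaders` controls whether the first row is written as a header line and that a row counts as empty when every cell is blank.

```python
import csv
import io


def rows_to_csv(rows, delimiter=",", include_headers=True, skip_empty_rows=True):
    """Serialize rows (first row = headers) to CSV text.

    Illustrative only: approximates what `delimiter`, `includeHeaders`,
    and `skipEmptyRows` are described as doing; the Actor's actual
    behavior may differ in detail.
    """
    header, *body = rows
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if include_headers:
        writer.writerow(header)
    for row in body:
        # Assumption: a row is empty when every cell is None or blank text.
        is_empty = all(cell is None or str(cell).strip() == "" for cell in row)
        if skip_empty_rows and is_empty:
            continue
        writer.writerow(row)
    return buffer.getvalue()
```

For example, calling it with `delimiter=";"` on a sheet that contains a blank row yields semicolon-separated output with that row omitted.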

Output Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| outputToDataset | boolean | false | Also push rows as JSON |

📤 Output

Dataset (Metadata)

{
    "fileName": "sales-report.xlsx",
    "sheetName": "Q1 Sales",
    "sheetIndex": 0,
    "rowCount": 1523,
    "columnCount": 12,
    "csvUrl": "https://api.apify.com/v2/key-value-stores/.../records/sales_q1.csv",
    "status": "success",
    "convertedAt": "2024-01-15T10:30:00.000Z"
}
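
Since one metadata record is produced per converted sheet, a run can be summarized by aggregating these records. The sketch below assumes records shaped like the example above; `summarize` is a hypothetical helper, and because only the `"success"` status is documented here, any other status value is treated as a failure.

```python
def summarize(items):
    """Split metadata records by status and total the converted rows."""
    succeeded = [item for item in items if item.get("status") == "success"]
    # Assumption: any status other than "success" means the sheet failed.
    failed = [item for item in items if item.get("status") != "success"]
    return {
        "sheets": len(succeeded),
        "rows": sum(item.get("rowCount", 0) for item in succeeded),
        "failed": [(item.get("fileName"), item.get("sheetName")) for item in failed],
    }
```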

Key-Value Store (CSV Files)

Download CSV files directly from the Key-Value Store.

🚀 Quick Start

Using Apify Console

  1. Upload your Excel file or enter URL
  2. Configure sheet and CSV options
  3. Click Start
  4. Download CSVs from Storage → Key-Value Store

Using API

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~excel-to-csv/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "fileUrl": "https://example.com/data.xlsx",
    "allSheets": true,
    "delimiter": ","
  }'

Using JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('YOUR_USERNAME/excel-to-csv').call({
    fileUrl: 'https://example.com/quarterly-report.xlsx',
    allSheets: true,
    delimiter: ',',
    dateFormat: 'YYYY-MM-DD'
});

// Get conversion results
const { items } = await client.dataset(run.defaultDatasetId).listItems();

for (const item of items) {
    console.log(`Sheet: ${item.sheetName}`);
    console.log(`Rows: ${item.rowCount}`);
    console.log(`Download: ${item.csvUrl}`);
}

Using Python

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient('YOUR_TOKEN')

# Convert Excel to CSV
run = client.actor('YOUR_USERNAME/excel-to-csv').call(run_input={
    'fileUrl': 'https://example.com/data.xlsx'
})

# Get CSV URLs
items = client.dataset(run['defaultDatasetId']).list_items().items

# Load into pandas
for item in items:
    if item['status'] == 'success':
        df = pd.read_csv(item['csvUrl'])
        print(f"Loaded {item['sheetName']}: {len(df)} rows")

💡 Advanced Examples

Extract Specific Sheets

{
    "fileUrl": "https://example.com/workbook.xlsx",
    "allSheets": false,
    "sheets": ["Summary", "Data", "0"]
}

Note: Sheets can be selected by name or by zero-based index.
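
How the Actor resolves a selector that could be read either way (a sheet literally named "0" versus index 0) is not documented here. The following Python sketch shows one plausible rule, purely as an assumption: an exact name match wins, otherwise a numeric selector is treated as a zero-based index. `resolve_sheets` is a hypothetical function, not part of the Actor.

```python
def resolve_sheets(selectors, sheet_names):
    """Map selectors (names or zero-based indices) to sheet names, in order."""
    resolved = []
    for selector in selectors:
        text = str(selector)
        if text in sheet_names:
            name = text  # an exact name match wins, even for numeric names
        elif text.isdecimal() and int(text) < len(sheet_names):
            name = sheet_names[int(text)]  # fall back to zero-based index
        else:
            raise KeyError(f"no sheet matches selector {selector!r}")
        if name not in resolved:
            resolved.append(name)  # skip duplicates such as "Summary" and "0"
    return resolved
```

Under this rule, the selectors `["Summary", "Data", "0"]` applied to a workbook whose first sheet is "Summary" select only two distinct sheets.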

European CSV Format

{
    "fileUrl": "https://example.com/report.xlsx",
    "delimiter": ";",
    "dateFormat": "DD.MM.YYYY"
}
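
A CSV produced with these settings needs matching settings on the reading side. As a minimal sketch using only the Python standard library, the function below parses semicolon-delimited text and converts `DD.MM.YYYY` values back into dates. `read_european_csv` and its `date_columns` argument are hypothetical names introduced for illustration.

```python
import csv
import io
from datetime import datetime


def read_european_csv(text, date_columns=()):
    """Parse semicolon-delimited CSV text, converting DD.MM.YYYY columns to dates."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    records = []
    for record in reader:
        for column in date_columns:
            if record.get(column):
                # "%d.%m.%Y" is the strptime counterpart of dayjs "DD.MM.YYYY".
                record[column] = datetime.strptime(record[column], "%d.%m.%Y").date()
        records.append(record)
    return records
```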

Output as JSON Dataset

{
    "fileUrl": "https://example.com/customers.xlsx",
    "outputToDataset": true,
    "includeHeaders": true
}

This adds each row as a JSON object to the Dataset:

{
    "_sheet": "Customers",
    "_file": "customers.xlsx",
    "Name": "John Doe",
    "Email": "john@example.com",
    "SignupDate": "2024-01-15"
}
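
Because every row object carries `_sheet` and `_file`, rows from several sheets can be separated again after download. The sketch below groups them and strips the two bookkeeping fields. `group_rows_by_sheet` is a hypothetical helper, and it assumes that records without a `_sheet` key (for example per-sheet metadata) should be skipped.

```python
from collections import defaultdict


def group_rows_by_sheet(items):
    """Group row objects by (_file, _sheet) and drop those two bookkeeping keys."""
    grouped = defaultdict(list)
    for item in items:
        if "_sheet" not in item:
            continue  # assumed to be a non-row record, e.g. per-sheet metadata
        key = (item.get("_file"), item["_sheet"])
        row = {k: v for k, v in item.items() if k not in ("_sheet", "_file")}
        grouped[key].append(row)
    return dict(grouped)
```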

📊 Supported Formats

| Format | Extension | Support |
|---|---|---|
| Excel 2007+ | .xlsx | ✅ Full |
| Excel 97-2003 | .xls | ✅ Full |
| OpenDocument | .ods | ✅ Full |
| CSV (input) | .csv | ✅ Full |
| Numbers | .numbers | ⚠️ Limited |

💰 Cost Estimation

| File Size | Sheets | Approx. Time | Compute Units |
|---|---|---|---|
| 1 MB | 3 | ~5 seconds | ~0.002 |
| 10 MB | 5 | ~15 seconds | ~0.008 |
| 50 MB | 10 | ~45 seconds | ~0.03 |
| 100 MB | 20 | ~2 minutes | ~0.08 |
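
For a rough estimate of your own runs, recall that one Apify compute unit corresponds to 1 GB of memory used for one hour. The sketch below applies that definition. `compute_units` is a hypothetical helper, and the memory settings behind the table above are not stated, so results will only approximate those figures.

```python
def compute_units(memory_mb, duration_seconds):
    """Estimate Apify compute units: memory in GB multiplied by run time in hours."""
    return (memory_mb / 1024) * (duration_seconds / 3600)
```

For instance, a two-minute run at 2048 MB comes to roughly 0.067 compute units.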

🔧 Technical Details

  • Node.js: 22.x
  • Library: SheetJS (xlsx)
  • Max File Size: ~200MB recommended
  • Memory: 512MB-2GB depending on file size

⚠️ Limitations

  • Formulas: Values only (not formula text)
  • Formatting: Lost in CSV conversion
  • Merged Cells: Unmerged, value in first cell
  • Images/Charts: Not extracted
  • Password Protected: Not supported

❓ FAQ

How are dates handled?

Dates are converted according to the dateFormat parameter (default: YYYY-MM-DD). The format string uses dayjs tokens.

What about number formatting?

Numbers are extracted as raw values. Currency symbols and formatting are removed.

Can I convert password-protected files?

No, password-protected Excel files are not currently supported.

What's the maximum file size?

The recommended maximum is ~200MB. Larger files may time out or run out of memory.

🔗 Integration Pipeline

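import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// fetchLatestReport, database, and sendSlackNotification are placeholders
// for your own code; they are not part of this Actor or of apify-client.

// 1. Fetch Excel from email/S3/API
const excelUrl = await fetchLatestReport();

// 2. Convert to CSV
const convertRun = await client.actor('YOUR_USERNAME/excel-to-csv').call({
    fileUrl: excelUrl,
    outputToDataset: true
});

// 3. Load into database
const { items } = await client.dataset(convertRun.defaultDatasetId).listItems();
// Keep only row objects (they carry a `_sheet` field); this assumes per-sheet
// metadata records share the same dataset and should not be inserted.
const rows = items.filter((item) => '_sheet' in item);
await database.insertMany(rows);

// 4. Notify completion
await sendSlackNotification(`Imported ${rows.length} rows`);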

📄 License

MIT License - see LICENSE for details.
