<a href="https://colab.research.google.com/github/deanopatoni/patoni/blob/main/Excel_to_CSV_Multi_Sheet_Converter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Excel to CSV Multi-Sheet Converter

This Python script converts every sheet of an Excel workbook (.xlsx) into a single CSV file, labeling each sheet's rows with the sheet number and tab name.

## What It Does

* **Input:** Any Excel workbook (.xlsx file)
* **Output:** A single CSV file (`all_tabs_data.csv`) containing the data from every sheet
* **Process:** Reads each sheet in order, adds `Sheet_Number` and `Tab` columns to its rows, and appends them to the output file

## Key Features

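* Preserves original sheet names as section headers in the CSV and in a `Tab` column on every row
* Shows a preview of each sheet's data (first 5 rows, up to 10 columns) during processing
* Reports per-sheet statistics (rows, columns, memory usage)
* Opens the workbook in openpyxl's read-only mode and processes one sheet at a time
* Logs progress and errors; a sheet that fails to process is reported and skipped without stopping the run
* Asks for confirmation before downloading the result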
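
The section-header idea can be sketched with the standard library alone. This is an illustration rather than the script's own code: `write_sections` and the sample data are hypothetical, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io


def write_sections(sheets, out):
    """Write each sheet's rows to `out`, preceded by a banner naming the tab.

    `sheets` maps a tab name to a list of rows whose first row is the header.
    Hypothetical helper, for illustration only.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
    for number, (name, rows) in enumerate(sheets.items(), start=1):
        if not rows:
            continue  # nothing to write for an empty tab
        # Banner rows mark where each tab's data begins
        writer.writerow(["=" * 50])
        writer.writerow([f"Data from tab '{name}'"])
        writer.writerow(["=" * 50])
        header, *body = rows
        # Prefix every row with the sheet number and tab name
        writer.writerow(["Sheet_Number", "Tab", *header])
        for row in body:
            writer.writerow([number, name, *row])


sample = {
    "Sheet1": [["item", "qty"], ["apple", 3]],
    "Sheet2": [["city"], ["Oslo"], ["Lima"]],
}
buffer = io.StringIO()
write_sections(sample, buffer)
combined_text = buffer.getvalue()
print(combined_text)
```

With `csv.QUOTE_NONNUMERIC`, text fields are wrapped in double quotes and numbers are written bare, which makes it easier for a reader of the file to tell the two apart.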

## How to Use

1. Run the script
2. Click 'Upload' when prompted
3. Select your Excel file
4. Review the data previews that appear
5. Type 'yes' when asked to download
6. Your browser downloads the combined file, `all_tabs_data.csv`
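
The download prompt accepts only `yes` or `no` and asks again on anything else. A minimal sketch of that loop follows; `ask_yes_no` and its `read` parameter are hypothetical names, introduced here so the loop can be exercised without a keyboard.

```python
def ask_yes_no(prompt, read=input):
    """Return True for 'yes' and False for 'no', asking again on any other reply."""
    answer = read(prompt).lower().strip()
    while answer not in ("yes", "no"):
        answer = read("Please enter 'yes' or 'no': ").lower().strip()
    return answer == "yes"


# Simulated replies stand in for keyboard input: an invalid answer, then " Yes "
replies = iter(["maybe", " Yes "])
wants_download = ask_yes_no(
    "Do you want to download the CSV file? (yes/no): ",
    read=lambda _prompt: next(replies),
)
print(wants_download)
```

Because the reply is lowercased and stripped before comparison, inputs such as `YES` or ` yes ` are accepted as well.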

## Output Format

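The resulting CSV places each tab's rows under a banner that names the tab, and every data row starts with the added `Sheet_Number` and `Tab` columns. Text fields are quoted; numbers are not. A processing summary (tab counts, processing time, and the list of processed tabs) is appended after the last tab.

```
"=================================================="
"Data from tab 'Sheet1'"
"=================================================="
"Sheet_Number","Tab","column1","column2","column3"
1,"Sheet1","data","data","data"
1,"Sheet1","data","data","data"
"=================================================="
"Data from tab 'Sheet2'"
"=================================================="
"Sheet_Number","Tab","column1","column2","column3"
2,"Sheet2","data","data","data"
2,"Sheet2","data","data","data"
```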
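
A combined file in this layout can be split back into per-tab tables by looking for the marker rows. The sketch below is one possible approach, not part of the script: `split_by_tab` and `TAB_MARKER` are hypothetical names, and it keys on the `Data from tab '...'` text itself so that it does not depend on the decoration around the marker.

```python
import csv
import io
import re

# Matches the marker row regardless of surrounding decoration
TAB_MARKER = re.compile(r"Data from tab '(.*)'")


def split_by_tab(text):
    """Group the rows of a combined CSV by the tab named in each marker row."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank separator line
        match = TAB_MARKER.search(row[0]) if len(row) == 1 else None
        if match:
            current = sections.setdefault(match.group(1), [])
        elif len(row) == 1 and set(row[0]) == {"="}:
            continue  # decorative rule around a marker
        elif current is not None:
            current.append(row)
    return sections


example = '''\
"=========="
"Data from tab 'Sheet1'"
"=========="
"Sheet_Number","Tab","column1"
1,"Sheet1","a"
"=========="
"Data from tab 'Sheet2'"
"=========="
"Sheet_Number","Tab","column1"
2,"Sheet2","b"
'''
tables = split_by_tab(example)
print(tables)
```

Note that `csv.reader` returns every field as a string, so numeric columns need converting afterwards. The sketch also does not separate the processing summary that the script appends after the last tab; those rows would land in the final section.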

## Common Use Cases

* Combining multiple Excel sheets into one file
* Converting Excel data for database imports
* Creating text-based backups of Excel workbooks
* Sharing data with CSV-only systems
* Analyzing multiple sheets of data together

## Requirements

* Python 3.x
* Required libraries: openpyxl, pandas
* Google Colab environment (for the upload/download functionality)

---
*Note: This script is designed to run in Google Colab and uses Colab's built-in file handling capabilities.*
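
Before running the script, it can help to confirm that the required modules are available. The following is a small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names are the ones the script imports.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names when the parent package is absent
            found = False
        if not found:
            missing.append(name)
    return missing


required = ["openpyxl", "pandas", "google.colab"]
print("Missing modules:", missing_modules(required))
```

Outside Colab, `google.colab` is expected to appear in the list, since the upload and download steps rely on it.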

In [2]:
import openpyxl
import pandas as pd
from google.colab import files
import time
import io
import csv
import logging

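# Set up logging. Notebook environments such as Colab may pre-configure the root logger,
# in which case basicConfig() is a no-op unless force=True (Python 3.8+) is passed,
# and INFO-level messages would not be shown.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)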

# Set pandas display options for better output readability
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.expand_frame_repr', False)

def write_to_csv(file_path, data, mode='w'):
    """
    Helper function to write data to CSV file.
    Args:
        file_path: Path to the output file
        data: List of rows to write to the CSV file
        mode: Mode for file writing ('w' for write, 'a' for append)
    """
    try:
        with open(file_path, mode, newline='', encoding='utf-8') as f:
            writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
            for row in data:
                writer.writerow(row)
    except Exception as e:
        logging.error(f"Error writing to CSV: {str(e)}")
        return False
    return True

def process_excel_sheet(sheet, sheet_name, sheet_number):
    """
    Process a single Excel sheet and return its data.
    Args:
        sheet: The Excel sheet object
        sheet_name: Name of the sheet
        sheet_number: 1-based index of the sheet
    """
    try:
        logging.info(f"Reading data from '{sheet_name}'...")

        data = list(sheet.values)
        if not data:
            logging.warning(f"No data found in tab '{sheet_name}'")
            return None

        # Extract column names and data
        columns = data[0]
        rows = data[1:]

        # Create a pandas DataFrame to easily handle data
        df = pd.DataFrame(rows, columns=columns)

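        # Clean column names. Header cells may be empty (None) or non-string (numbers, dates),
        # so convert each one to a string before stripping extra spaces.
        df.columns = [
            str(col).strip() if col is not None else f"Unnamed_{i}"
            for i, col in enumerate(df.columns)
        ]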

        # Insert sheet number and tab name columns at the beginning
        df.insert(0, 'Sheet_Number', sheet_number)
        df.insert(1, 'Tab', sheet_name)

        logging.info(f"Extracted {len(df)} rows and {len(df.columns)} columns from '{sheet_name}'")
        return df

    except Exception as e:
        logging.error(f"Error processing tab '{sheet_name}': {str(e)}")
        return None

def preview_dataframe(df, sheet_name, num_rows=5):
    """
    Display a preview of the DataFrame with basic statistics.
    Args:
        df: DataFrame to preview
        sheet_name: Name of the sheet
        num_rows: Number of rows to preview
    """
    total_columns = len(df.columns)
    preview_columns = min(10, total_columns)

    logging.info(f"\nPreview of '{sheet_name}' (first {num_rows} rows{', first 10 columns' if total_columns > 10 else ''}):")
    print(df.iloc[:num_rows, :preview_columns])

    if total_columns > 10:
        logging.info(f"\nNote: {total_columns - 10} additional columns not shown in preview")

    logging.info(f"\nTotal rows: {len(df)}")
    logging.info(f"Total columns: {total_columns}")
    logging.info(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024 / 1024:.2f} MB")
    logging.info("\n" + "="*50)

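def save_to_csv(df, output_file, sheet_name, first_sheet=False):
    """
    Save DataFrame to CSV file, preceded by a banner naming the tab.
    The file is truncated only when first_sheet is True; otherwise data is appended.
    """
    try:
        # Only the banner write may truncate the file
        mode = 'w' if first_sheet else 'a'
        if not write_to_csv(output_file, [
            ['='*50],
            [f"Data from tab '{sheet_name}'"],
            ['='*50],
        ], mode):
            return False

        # Always append the data: writing it with mode 'w' would overwrite the banner.
        # Each sheet gets its own header row because columns can differ between sheets.
        df.to_csv(output_file, index=False, mode='a', header=True, quoting=csv.QUOTE_NONNUMERIC)

    except Exception as e:
        logging.error(f"Error saving data for sheet '{sheet_name}': {str(e)}")
        return False
    return True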

def write_summary_to_csv(output_file, processed_tabs, total_tabs, processing_time):
    """
    Write summary of processed tabs to the CSV file in a CSV-compatible format.
    """
    try:
        summary_data = [
            [],
            ['Processing Summary'],
            ['Metric', 'Value'],
            ['Total tabs in workbook', total_tabs],
            ['Processed tabs', len(processed_tabs)],
            ['Processing time (seconds)', f"{processing_time:.2f}"],
            [],
            ['Sheet Number', 'Tab Name'],
        ]

        for idx, tab in enumerate(processed_tabs, 1):
            summary_data.append([idx, tab])

        write_to_csv(output_file, summary_data, mode='a')

    except Exception as e:
        logging.error(f"Error writing summary: {str(e)}")
        return False
    return True

def process_excel_file(file_path, preview=True):
    """
    Process Excel file with optional preview and return success status.
    """
    logging.info(f"Opening workbook: {file_path}")
    start_time = time.time()

    try:
        workbook = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
        output_file = "all_tabs_data.csv"

        processed_tabs = []
        failed_tabs = []
        total_tabs = len(workbook.sheetnames)

        for idx, sheet_name in enumerate(workbook.sheetnames):
            logging.info(f"\n--- Processing tab '{sheet_name}' ({idx+1}/{total_tabs}) ---")

            sheet = workbook[sheet_name]
            df = process_excel_sheet(sheet, sheet_name, idx + 1)  # 1-based sheet numbering
            if df is not None:
                if preview:
                    preview_dataframe(df, sheet_name)

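                # Start a fresh file on the first sheet actually written, even if earlier sheets were empty
                if save_to_csv(df, output_file, sheet_name, first_sheet=not processed_tabs):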
                    processed_tabs.append(sheet_name)
                else:
                    failed_tabs.append(sheet_name)

                logging.info(f"Finished processing tab '{sheet_name}'")
            else:
                failed_tabs.append(sheet_name)

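        # A workbook opened with read_only=True keeps its file handle open until closed explicitly
        workbook.close()

        end_time = time.time()
        processing_time = end_time - start_time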

        logging.info(f"\nTotal processing time: {processing_time:.2f} seconds")
        logging.info(f"\nProcessed tabs ({len(processed_tabs)}/{total_tabs}):")
        for tab in processed_tabs:
            logging.info(f"- {tab}")

        if failed_tabs:
            logging.warning(f"\nFailed tabs ({len(failed_tabs)}):")
            for tab in failed_tabs:
                logging.warning(f"- {tab}")

        # Write summary to CSV
        if write_summary_to_csv(output_file, processed_tabs, total_tabs, processing_time):
            return True, output_file
        else:
            return False, None

    except Exception as e:
        logging.error(f"Error processing workbook: {str(e)}")
        return False, None

def main():
    try:
        logging.info("Please upload your Excel file:")
        uploaded = files.upload()

        if not uploaded:
            logging.error("No file uploaded. Exiting.")
            return

        file_path = list(uploaded.keys())[0]
        success, output_file = process_excel_file(file_path)

        if success:
            download = input("\nDo you want to download the CSV file? (yes/no): ").lower().strip()
            while download not in ['yes', 'no']:
                download = input("Please enter 'yes' or 'no': ").lower().strip()

            if download == 'yes':
                files.download(output_file)
                logging.info(f"File '{output_file}' has been prepared for download.")
            else:
                logging.info("Skipping download.")
        else:
            logging.error("Failed to process data. Exiting.")

    except Exception as e:
        logging.error(f"An unexpected error occurred: {str(e)}")

if __name__ == "__main__":
    main()


Saving Book1.xlsx to Book1.xlsx
   Sheet_Number     Tab  Blank
0             1  Sheet1  Blank
   Sheet_Number     Tab Blank2
0             2  Sheet2  Blank
1             2  Sheet2  Blank
2             2  Sheet2  Blank
3             2  Sheet2  Blank
4             2  Sheet2  Blank

Do you want to download the CSV file? (yes/no): yes

