Skip to content

ChenAI-TGF/Table_Comparison_Program

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

README

Introduction

This is a GUI-based tool built with Python Tkinter for Excel/CSV table comparison and table cleaning. It addresses two core pain points in daily table processing:

  1. Comparing two tables to extract unique data (A has but B doesn't, B has but A doesn't) and common data, with full support for preserving images in Excel files.
  2. Cleaning tables where a single row of data is split across multiple rows (merging fragmented data based on a key column while retaining images).

The tool uses multi-threaded processing to avoid GUI freezes and supports both .xlsx (with image handling) and .csv formats.

Tool Interface Screenshot

Main Window of Table Compare & Clean Tool Figure: The main interface of the tool, showing file upload, comparison configuration, table cleaning, and progress bar areas.

Features

1. Table Comparison

  • Upload two Excel/CSV files (File A and File B)
  • Select specific sheets and comparison columns for each file (supports cross-column comparison)
  • Extract three types of results:
    • Data unique to File A (A has, B doesn't)
    • Data unique to File B (B has, A doesn't)
    • Common data (exists in both files), exportable in either A/B table format
  • Preserve images in Excel files during comparison and export
  • Real-time progress bar for comparison tasks

2. Table Cleaning

  • Fix the issue where a single row of data occupies multiple rows in a table
  • Define a "key column" (non-empty value = start of a new row) to merge fragmented data
  • Preserve original images and map them to correct rows in the cleaned table
  • Export cleaned tables with both merged data and original images

3. General Features

  • Support for .xlsx (with image handling) and .csv (no image support) formats
  • User-friendly GUI with clear file info display (row/column count, image count)
  • Multi-threaded processing to prevent GUI freezing
  • Automatic Chinese font configuration (Windows) to avoid garbled text
  • Safe image copying and anchor mapping to ensure correct position in exported files

Requirements

  • Python 3.7+
  • Required Python packages:
    • pandas (data processing)
    • openpyxl (Excel image handling and file I/O)
    • tkinter (GUI, usually pre-installed with Python)

Installation

Install the required dependencies using pip:

pip install pandas openpyxl

Note: Tkinter is included with standard Python installations on Windows/macOS. For Linux, install it via sudo apt-get install python3-tk (Debian/Ubuntu) or equivalent.

Usage

1. Run the Tool

Execute the script directly:

python main.py

2. Table Comparison Workflow

  1. Upload Files: Click "选择文件" (Select File) for File A and File B respectively
  2. View File Info: The tool displays file format, sheet count, row/column count, and image count
  3. Configure Comparison:
    • Select target sheets from the dropdowns for File A/File B
    • Select comparison columns (the column used to judge data uniqueness)
  4. Start Comparison: Click "开始比对" (Start Comparison)
  5. Export Results: After completion, use the export buttons to save:
    • "导出A有B无的数据" (Export A-only data)
    • "导出B有A无的数据" (Export B-only data)
    • "导出共有数据" (Export common data, choose A/B format)

3. Table Cleaning Workflow

  1. Upload File: Click "选择文件" (Select File) in the "表格整理功能" section
  2. Configure Cleaning:
    • Select the target sheet from the dropdown
    • Select a "新行判断列" (New Row Judgment Column) (non-empty value = new row)
  3. Start Cleaning: Click "开始整理表格" (Start Table Cleaning)
  4. Export Cleaned Data: Click "导出整理后表格" (Export Cleaned Table) to save the merged data

Notes

  1. Image Support: Only .xlsx files support images – CSV files do not store images, so image-related features are disabled for CSV.
  2. Row Number Mapping: Excel rows are 1-based, while pandas DataFrames are 0-based. The tool automatically maps image positions (Excel row = DataFrame row + 2, accounting for headers).
  3. Multi-threading: Comparison/cleaning tasks run in background threads to keep the GUI responsive.
  4. Data Type Handling: The tool converts comparison column values to strings and fills NaN values to avoid type mismatch errors.

Troubleshooting

  1. Garbled Chinese Text: The tool automatically sets the "SimHei" font for Windows. For other OS, manually configure a Chinese-supported font if needed.
  2. File Read Errors: Ensure the file is not open in another program (e.g., Excel) and the format is valid (.xlsx/.csv).
  3. Image Export Issues: Corrupted images in the original Excel file may cause export failures – check the image integrity first.

中文使用说明(表格比对&整理工具)

工具简介

这是一款基于Python Tkinter开发的GUI桌面工具,专注于Excel/CSV表格比对表格整理两大核心功能,解决日常表格处理中的常见痛点:

  1. 比对两个表格,提取A有B无、B有A无的唯一数据,以及共有数据,全程支持保留Excel文件中的图片。
  2. 整理“一行数据分散到多行”的表格(基于关键列合并碎片化数据),同时保留图片并映射到正确行。

工具采用多线程处理避免界面卡顿,支持.xlsx(含图片处理)和.csv格式。

工具界面截图

表格比对&整理工具主界面 图示:工具主界面,包含文件上传、比对配置、表格整理和进度条等功能区域

功能特性

1. 表格比对功能

  • 上传两个Excel/CSV文件(文件A、文件B)
  • 支持为每个文件选择指定子表格和比对列(跨列比对)
  • 提取三类结果:
    • A有B无的数据
    • B有A无的数据
    • 共有数据(可选择按A/B表格格式导出)
  • 比对和导出过程中保留Excel中的图片
  • 实时进度条展示比对进度

2. 表格整理功能

  • 修复“一行数据占多行”的表格问题
  • 自定义“新行判断列”(关键列非空=新行开始),自动合并碎片化数据
  • 保留原始图片,并映射到整理后表格的正确行
  • 导出包含合并数据和图片的整理后表格

3. 通用特性

  • 支持.xlsx(含图片)和.csv(无图片)格式
  • 友好的GUI界面,清晰展示文件信息(行列数、图片数)
  • 多线程处理,避免界面卡死
  • 自动配置Windows中文字体(避免乱码)
  • 安全的图片复制和锚点映射,确保导出后图片位置正确

环境要求

  • Python 3.7及以上版本
  • 依赖Python包:
    • pandas:数据处理核心
    • openpyxl:Excel图片处理和文件读写
    • tkinter:GUI界面(Python默认自带)

安装依赖

使用pip安装所需依赖包:

pip install pandas openpyxl

注意:Windows/macOS的Python通常自带tkinter;Linux系统需手动安装,如Debian/Ubuntu:sudo apt-get install python3-tk

使用说明

1. 运行工具

直接执行脚本:

python main.py

2. 表格比对操作流程

  1. 上传文件:分别点击文件A、文件B对应的“选择文件”按钮,上传待比对的Excel/CSV文件
  2. 查看文件信息:工具自动显示文件格式、子表格数、行列数、图片数等信息
  3. 配置比对参数
    • 从下拉框选择文件A/文件B的目标子表格
    • 选择比对列(用于判断数据唯一性的列)
  4. 开始比对:点击“开始比对”按钮
  5. 导出结果:比对完成后,点击对应按钮导出:
    • “导出A有B无的数据”
    • “导出B有A无的数据”
    • “导出共有数据”(可选择按A/B表格格式导出)

3. 表格整理操作流程

  1. 上传文件:在“表格整理功能”区域点击“选择文件”,上传待整理的Excel/CSV文件
  2. 配置整理参数
    • 从下拉框选择目标子表格
    • 选择“新行判断列”(该列非空值视为新行的开始)
  3. 开始整理:点击“开始整理表格”按钮
  4. 导出整理结果:点击“导出整理后表格”保存合并后的数据

注意事项

  1. 图片支持:仅.xlsx文件支持图片功能,CSV文件无图片存储能力,因此图片相关功能对CSV禁用。
  2. 行号映射:Excel行号为1起始,pandas DataFrame为0起始,工具自动映射图片位置(Excel行 = DataFrame行 + 2,兼容表头行)。
  3. 多线程处理:比对/整理任务在后台线程运行,确保GUI界面保持响应。
  4. 数据类型处理:工具自动将比对列值转换为字符串并填充空值,避免类型不匹配错误。

常见问题排查

  1. 中文乱码:工具自动为Windows系统设置“SimHei”字体,其他系统若出现乱码,需手动配置支持中文的字体。
  2. 文件读取失败:确保文件未被其他程序(如Excel)占用,且格式为有效的.xlsx/.csv
  3. 图片导出异常:原始Excel文件中的图片损坏可能导致导出失败,需先检查图片完整性。

About

这是一个基于图形界面的工具,使用 Python Tkinter 构建,用于 Excel/CSV 表格比较和表格清理。它解决了日常表格处理中的两个核心痛点: 比较两个表格以提取唯一数据(A有但B没有,B有但A没有)和通用数据,并完全支持保存Excel文件中的图片。 清理单行数据被拆分到多行的表(根据关键列合并碎片数据,同时保留图像)。

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages