This is a GUI-based tool built with Python Tkinter for Excel/CSV table comparison and table cleaning. It addresses two core pain points in daily table processing:
- Comparing two tables to extract unique data (A has but B doesn't, B has but A doesn't) and common data, with full support for preserving images in Excel files.
- Cleaning tables where a single row of data is split across multiple rows (merging fragmented data based on a key column while retaining images).
The tool uses multi-threaded processing to avoid GUI freezes and supports both .xlsx (with image handling) and .csv formats.
Figure: The main interface of the tool, showing file upload, comparison configuration, table cleaning, and progress bar areas.
- Upload two Excel/CSV files (File A and File B)
- Select specific sheets and comparison columns for each file (supports cross-column comparison)
- Extract three types of results:
- Data unique to File A (A has, B doesn't)
- Data unique to File B (B has, A doesn't)
- Common data (exists in both files), exportable in either A/B table format
- Preserve images in Excel files during comparison and export
- Real-time progress bar for comparison tasks
- Fix the issue where a single row of data occupies multiple rows in a table
- Define a "key column" (non-empty value = start of a new row) to merge fragmented data
- Preserve original images and map them to correct rows in the cleaned table
- Export cleaned tables with both merged data and original images
- Support for
.xlsx(with image handling) and.csv(no image support) formats - User-friendly GUI with clear file info display (row/column count, image count)
- Multi-threaded processing to prevent GUI freezing
- Automatic Chinese font configuration (Windows) to avoid garbled text
- Safe image copying and anchor mapping to ensure correct position in exported files
- Python 3.7+
- Required Python packages:
pandas(data processing)openpyxl(Excel image handling and file I/O)tkinter(GUI, usually pre-installed with Python)
Install the required dependencies using pip:
pip install pandas openpyxlNote: Tkinter is included with standard Python installations on Windows/macOS. For Linux, install it via
sudo apt-get install python3-tk(Debian/Ubuntu) or equivalent.
Execute the script directly:
python main.py- Upload Files: Click "选择文件" (Select File) for File A and File B respectively
- View File Info: The tool displays file format, sheet count, row/column count, and image count
- Configure Comparison:
- Select target sheets from the dropdowns for File A/File B
- Select comparison columns (the column used to judge data uniqueness)
- Start Comparison: Click "开始比对" (Start Comparison)
- Export Results: After completion, use the export buttons to save:
- "导出A有B无的数据" (Export A-only data)
- "导出B有A无的数据" (Export B-only data)
- "导出共有数据" (Export common data, choose A/B format)
- Upload File: Click "选择文件" (Select File) in the "表格整理功能" section
- Configure Cleaning:
- Select the target sheet from the dropdown
- Select a "新行判断列" (New Row Judgment Column) (non-empty value = new row)
- Start Cleaning: Click "开始整理表格" (Start Table Cleaning)
- Export Cleaned Data: Click "导出整理后表格" (Export Cleaned Table) to save the merged data
- Image Support: Only
.xlsxfiles support images – CSV files do not store images, so image-related features are disabled for CSV. - Row Number Mapping: Excel rows are 1-based, while pandas DataFrames are 0-based. The tool automatically maps image positions (Excel row = DataFrame row + 2, accounting for headers).
- Multi-threading: Comparison/cleaning tasks run in background threads to keep the GUI responsive.
- Data Type Handling: The tool converts comparison column values to strings and fills NaN values to avoid type mismatch errors.
- Garbled Chinese Text: The tool automatically sets the "SimHei" font for Windows. For other OS, manually configure a Chinese-supported font if needed.
- File Read Errors: Ensure the file is not open in another program (e.g., Excel) and the format is valid (
.xlsx/.csv). - Image Export Issues: Corrupted images in the original Excel file may cause export failures – check the image integrity first.
这是一款基于Python Tkinter开发的GUI桌面工具,专注于Excel/CSV表格比对和表格整理两大核心功能,解决日常表格处理中的常见痛点:
- 比对两个表格,提取A有B无、B有A无的唯一数据,以及共有数据,全程支持保留Excel文件中的图片。
- 整理“一行数据分散到多行”的表格(基于关键列合并碎片化数据),同时保留图片并映射到正确行。
工具采用多线程处理避免界面卡顿,支持.xlsx(含图片处理)和.csv格式。
图示:工具主界面,包含文件上传、比对配置、表格整理和进度条等功能区域
- 上传两个Excel/CSV文件(文件A、文件B)
- 支持为每个文件选择指定子表格和比对列(跨列比对)
- 提取三类结果:
- A有B无的数据
- B有A无的数据
- 共有数据(可选择按A/B表格格式导出)
- 比对和导出过程中保留Excel中的图片
- 实时进度条展示比对进度
- 修复“一行数据占多行”的表格问题
- 自定义“新行判断列”(关键列非空=新行开始),自动合并碎片化数据
- 保留原始图片,并映射到整理后表格的正确行
- 导出包含合并数据和图片的整理后表格
- 支持
.xlsx(含图片)和.csv(无图片)格式 - 友好的GUI界面,清晰展示文件信息(行列数、图片数)
- 多线程处理,避免界面卡死
- 自动配置Windows中文字体(避免乱码)
- 安全的图片复制和锚点映射,确保导出后图片位置正确
- Python 3.7及以上版本
- 依赖Python包:
pandas:数据处理核心openpyxl:Excel图片处理和文件读写tkinter:GUI界面(Python默认自带)
使用pip安装所需依赖包:
pip install pandas openpyxl注意:Windows/macOS的Python通常自带tkinter;Linux系统需手动安装,如Debian/Ubuntu:
sudo apt-get install python3-tk
直接执行脚本:
python main.py- 上传文件:分别点击文件A、文件B对应的“选择文件”按钮,上传待比对的Excel/CSV文件
- 查看文件信息:工具自动显示文件格式、子表格数、行列数、图片数等信息
- 配置比对参数:
- 从下拉框选择文件A/文件B的目标子表格
- 选择比对列(用于判断数据唯一性的列)
- 开始比对:点击“开始比对”按钮
- 导出结果:比对完成后,点击对应按钮导出:
- “导出A有B无的数据”
- “导出B有A无的数据”
- “导出共有数据”(可选择按A/B表格格式导出)
- 上传文件:在“表格整理功能”区域点击“选择文件”,上传待整理的Excel/CSV文件
- 配置整理参数:
- 从下拉框选择目标子表格
- 选择“新行判断列”(该列非空值视为新行的开始)
- 开始整理:点击“开始整理表格”按钮
- 导出整理结果:点击“导出整理后表格”保存合并后的数据
- 图片支持:仅
.xlsx文件支持图片功能,CSV文件无图片存储能力,因此图片相关功能对CSV禁用。 - 行号映射:Excel行号为1起始,pandas DataFrame为0起始,工具自动映射图片位置(Excel行 = DataFrame行 + 2,兼容表头行)。
- 多线程处理:比对/整理任务在后台线程运行,确保GUI界面保持响应。
- 数据类型处理:工具自动将比对列值转换为字符串并填充空值,避免类型不匹配错误。
- 中文乱码:工具自动为Windows系统设置“SimHei”字体,其他系统若出现乱码,需手动配置支持中文的字体。
- 文件读取失败:确保文件未被其他程序(如Excel)占用,且格式为有效的
.xlsx/.csv。 - 图片导出异常:原始Excel文件中的图片损坏可能导致导出失败,需先检查图片完整性。