Merged
169 changes: 169 additions & 0 deletions guides/excel-extraction.mdx
@@ -0,0 +1,169 @@
---
title: 'Excel Extraction Best Practices'
description: 'Learn how to prepare Excel files for optimal extraction and analysis'
---

## Excel File Preparation Guidelines

Gurubase can interpret a variety of Excel layouts, but well-structured tables still give the most accurate extraction and analysis results. Follow these guidelines to structure your data for processing.

## 1. Complete Headers and Columns

**Always include complete headers and columns for all data sections.**

**Good Example**
<Frame>
<img src="/images/excel-extraction/1-good.png" alt="" />
</Frame>

**Bad Example**
<Frame>
<img src="/images/excel-extraction/1-bad.png" alt="" />
</Frame>

*Missing Price and Stock columns*
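
As a rough pre-upload check, the rule above can be sketched in plain Python. This is only an illustration: it assumes the header row has already been read into a list, and `missing_headers` and the sample column names are hypothetical, not part of Gurubase.

```python
def missing_headers(header_row, required):
    """Return the required column names that are absent from header_row."""
    # Normalize the cells so that " price " and "Price" count as the same header.
    present = {str(cell).strip().lower() for cell in header_row if cell is not None}
    return [name for name in required if name.lower() not in present]


# Hypothetical header row resembling the bad example: two columns have no header.
header = ["Product", "Category", None, None]
missing = missing_headers(header, ["Product", "Category", "Price", "Stock"])
print(missing)
```

An empty result means every expected column has a header.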

## 2. Use Clear Structures

**Avoid complex nested structures and multiple tables on the same sheet when possible.**

**Good Example**
<Frame>
<img src="/images/excel-extraction/2-good.png" alt="" />
</Frame>

**Bad Example**
<Frame>
<img src="/images/excel-extraction/2-bad.png" alt="" />
</Frame>


**For Sub-tables**:

When you have sub-tables or multiple data sections, repeat headers and columns for each section:

<Frame>
<img src="/images/excel-extraction/2-sub-table.png" alt="" />
</Frame>
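
If the sheet is generated by a script, repeating the headers can be sketched as follows. This is a minimal sketch under assumptions: each section is given as a title plus its data rows, and the single-cell title row before each header is a layout choice made for this illustration. `repeat_headers` is a hypothetical helper.

```python
def repeat_headers(header, sections):
    """Build sheet rows in which every section starts with its own header row."""
    rows = []
    for title, data_rows in sections.items():
        rows.append([title])        # section title (an assumption of this sketch)
        rows.append(list(header))   # headers repeated for each sub-table
        rows.extend(list(row) for row in data_rows)
    return rows


# Hypothetical sections sharing the same columns.
sections = {
    "Fruits": [["Apple", 1.2]],
    "Vegetables": [["Carrot", 0.8]],
}
layout = repeat_headers(["Product", "Price"], sections)
for row in layout:
    print(row)
```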

## 3. Clean Empty Rows and Columns

**Remove all empty rows and columns to keep files as small as possible.**

**Before Cleaning:**

<Frame>
<img src="/images/excel-extraction/3-bad.png" alt="" />
</Frame>

**After Cleaning:**

<Frame>
<img src="/images/excel-extraction/3-good.png" alt="" />
</Frame>
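
The cleaning step can be sketched in plain Python, assuming the sheet has been loaded as a list of rows in which empty cells are `None` or blank strings. `drop_empty_rows_and_columns` is a hypothetical helper, not a Gurubase function.

```python
def is_empty(cell):
    """Treat None and whitespace-only strings as empty cells."""
    return cell is None or str(cell).strip() == ""


def drop_empty_rows_and_columns(rows):
    """Return a copy of rows without fully empty rows or fully empty columns."""
    # Pad short rows so that every row has the same number of cells.
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    kept_rows = [row for row in padded if not all(is_empty(c) for c in row)]
    kept_cols = [i for i in range(width)
                 if any(not is_empty(row[i]) for row in kept_rows)]
    return [[row[i] for i in kept_cols] for row in kept_rows]


# Hypothetical sheet with one empty row and one empty column.
grid = [
    ["Product", None, "Price"],
    [None, None, None],
    ["Apple", None, 1.2],
]
cleaned = drop_empty_rows_and_columns(grid)
print(cleaned)
```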

## 4. Split into Smaller Sheets

**Divide large datasets into multiple smaller, focused sheets.**

**Example Sheet Structure:**

- **Sheet 1**: Customer Information
- **Sheet 2**: Product Catalog
- **Sheet 3**: Sales Transactions
- **Sheet 4**: Inventory Levels

**Benefits:**

- Faster processing
- Better organization
- Easier to maintain
- Reduced file size

## 5. Use Table Structure Over Form Structure

**Prefer tabular data layout instead of form-based layouts.**

**Good Example (Table Structure)**

<Frame>
<img src="/images/excel-extraction/5-good.png" alt="" />
</Frame>


**Bad Example (Form Structure)**

<Frame>
<img src="/images/excel-extraction/5-bad.png" alt="" />
</Frame>
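
Converting a form layout into a table can be sketched as below. The sketch assumes each form has already been read as a list of `(label, value)` pairs. `form_to_table` and the sample data are hypothetical.

```python
def form_to_table(records):
    """Turn form-style records into one header row followed by one row per record."""
    # Collect labels in first-seen order so that the column order is stable.
    header = []
    for record in records:
        for label, _ in record:
            if label not in header:
                header.append(label)
    rows = []
    for record in records:
        values = dict(record)
        # Fields that a form does not contain become empty (None) cells.
        rows.append([values.get(label) for label in header])
    return [header] + rows


# Two hypothetical forms; the second has an extra field.
forms = [
    [("Name", "Alice"), ("City", "Paris")],
    [("Name", "Bob"), ("City", "Rome"), ("Phone", "555-0100")],
]
table = form_to_table(forms)
for row in table:
    print(row)
```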

## 6. Proper Nested Structure

**When nesting is necessary, ensure each level completely encompasses the levels beneath it, and merge headers around grouped content.**

**Good Example (Proper Nested Structure)**

<Frame>
<img src="/images/excel-extraction/6-good.png" alt="" />
</Frame>

**Good Example (More Complex Nested Structure)**

<Frame>
<img src="/images/excel-extraction/6-good-more-complex.png" alt="" />
</Frame>

**Bad Example**

<Frame>
<img src="/images/excel-extraction/6-bad.png" alt="" />
</Frame>

Here is its fixed version:

<Frame>
<img src="/images/excel-extraction/6-bad-fixed.png" alt="" />
</Frame>

**Key Principles for Nested Structures**

1. **Complete Coverage**: Each nested level should fully encompass the data below it
2. **Merged Headers**: Use merged cells to group related columns under main categories
3. **Consistent Structure**: Maintain the same pattern throughout the sheet
4. **Clear Hierarchy**: Make the relationship between levels obvious
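
To show how a well-formed two-level header maps onto flat column names, here is a minimal sketch. It assumes a merged header cell is read as its value in the first column of the span and as empty cells for the rest of the span. `flatten_headers` is a hypothetical helper and does not describe how Gurubase parses sheets internally.

```python
def flatten_headers(group_row, column_row, sep=" / "):
    """Combine a merged group-header row with the column-header row beneath it."""
    flat = []
    current_group = None
    for group, column in zip(group_row, column_row):
        # A non-empty cell starts a new group; empty cells continue the merged span.
        if group not in (None, ""):
            current_group = group
        flat.append(f"{current_group}{sep}{column}" if current_group else str(column))
    return flat


# Hypothetical headers: "Product" spans two columns, "Sales" spans the next two.
group_row = ["Product", None, "Sales", None]
column_row = ["Name", "SKU", "Q1", "Q2"]
flat = flatten_headers(group_row, column_row)
print(flat)
```

The sketch only gives sensible names when every column sits under a group header. With this fill-forward logic, an ungrouped column that follows a group would inherit that group's name, which is why complete coverage matters.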

## 7. Column-Oriented Tables

Gurubase can also handle column-oriented Excel files. Just make sure you include proper headers above the data cells:

**Good Example (Column-Oriented Table)**

<Frame>
<img src="/images/excel-extraction/7-good.png" alt="" width="300" />
</Frame>


## 8. File Size Optimization

Keep files as small as possible for better performance.

- Remove unused worksheets
- Delete empty rows and columns
- Use appropriate data types
- Compress images if present
- Avoid unnecessary formatting

## 9. Common Mistakes to Avoid

1. **Missing headers** - Always include column headers
2. **Unclear header hierarchy** - Make nested header relationships obvious
3. **Inconsistent header spanning** - Use merged cells consistently for grouped columns
4. **Mixed data types** - Keep consistent formats within columns
5. **Excessive nesting** - Prefer flat structures when possible
6. **Large single sheets** - Split into multiple focused sheets
7. **Unnecessary formatting** - Remove complex styling
8. **Hidden data** - Ensure all relevant data is visible
9. **Inconsistent naming** - Use clear, consistent naming conventions
10. **Ambiguous header names** - Use descriptive, specific header labels
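
Some of these mistakes can be caught with a quick script before uploading. As one illustration, the sketch below flags columns with mixed data types (item 4). It assumes the sheet has been loaded as a header list plus a list of rows, and `mixed_type_columns` is a hypothetical helper.

```python
def mixed_type_columns(header, rows):
    """Map each column name to its value types when more than one type appears."""
    mixed = {}
    for index, name in enumerate(header):
        kinds = {
            type(row[index]).__name__
            for row in rows
            if index < len(row) and row[index] is not None  # ignore empty cells
        }
        if len(kinds) > 1:
            mixed[name] = sorted(kinds)
    return mixed


# Hypothetical data: one price was entered as text instead of a number.
header = ["Product", "Price"]
rows = [["Apple", 1.2], ["Banana", "0.50 USD"], ["Cherry", None]]
report = mixed_type_columns(header, rows)
print(report)
```

This check is strict: a column that mixes whole numbers and decimals is reported as well, since Python reads them as `int` and `float`.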

Following these guidelines will significantly improve the quality of your Excel data extraction and analysis results.
Binary file added images/excel-extraction/1-bad.png
Binary file added images/excel-extraction/1-good.png
Binary file added images/excel-extraction/2-bad.png
Binary file added images/excel-extraction/2-good.png
Binary file added images/excel-extraction/2-sub-table.png
Binary file added images/excel-extraction/3-bad.png
Binary file added images/excel-extraction/3-good.png
Binary file added images/excel-extraction/5-bad.png
Binary file added images/excel-extraction/5-good.png
Binary file added images/excel-extraction/6-bad-fixed.png
Binary file added images/excel-extraction/6-bad.png
Binary file added images/excel-extraction/6-good-more-complex.png
Binary file added images/excel-extraction/6-good.png
Binary file added images/excel-extraction/7-good.png
3 changes: 2 additions & 1 deletion mint.json
@@ -65,7 +65,8 @@
 "guides/create-guru",
 "guides/analyze-usage",
 "guides/prompting-your-guru",
-"guides/pii-masking"
+"guides/pii-masking",
+"guides/excel-extraction"
 ]
 },
 {