---
title: 'Excel Extraction Best Practices'
description: 'Learn how to prepare Excel files for optimal extraction and analysis'
---

## Excel File Preparation Guidelines

While Gurubase can interpret a variety of Excel structures, formatting your tables carefully still gives the best extraction and analysis results. Follow these guidelines to structure your data for optimal processing.

## 1. Complete Headers and Columns

**Always include complete headers and columns for all data sections.**

✅ **Good Example**

<Frame>
<img src="/images/excel-extraction/1-good.png" alt="" />
</Frame>

❌ **Bad Example**

<Frame>
<img src="/images/excel-extraction/1-bad.png" alt="" />
</Frame>

*Missing Price and Stock columns*

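If you generate or preprocess spreadsheets programmatically, incomplete headers can be caught before upload. This is a minimal sketch, not part of Gurubase: the hypothetical `find_header_problems` helper assumes the sheet has already been read into a list of rows (for example with openpyxl's `iter_rows(values_only=True)`), that the first row is the header, and that empty cells are `None` or blank strings. It reports unnamed columns and values that sit to the right of the last header.

```python
from typing import Any, Sequence


def is_blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or (isinstance(value, str) and not value.strip())


def find_header_problems(rows: Sequence[Sequence[Any]]) -> list[str]:
    """Report blank header cells and data in columns that have no header.

    Hypothetical helper: assumes rows[0] is the header row.
    """
    if not rows:
        return ["sheet is empty"]
    header = list(rows[0])
    problems: list[str] = []
    # A blank header cell leaves its column without a name.
    for col, name in enumerate(header):
        if is_blank(name):
            problems.append(f"column {col + 1} has no header")
    # Values beyond the last header cell belong to no named column at all.
    for row_num, row in enumerate(rows[1:], start=2):
        for col in range(len(header), len(row)):
            if not is_blank(row[col]):
                problems.append(f"row {row_num}, column {col + 1}: value without a header")
    return problems


sheet = [
    ["Product", "Category", None],
    ["Laptop", "Electronics", 999, 12],
]
print(find_header_problems(sheet))
# ['column 3 has no header', 'row 2, column 4: value without a header']
```

An empty result means every populated column has a name, which is the situation the good example illustrates.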
## 2. Use Clear Structures

**Avoid complex nested structures and multiple tables in a single sheet when possible.**

✅ **Good Example**

<Frame>
<img src="/images/excel-extraction/2-good.png" alt="" />
</Frame>

❌ **Bad Example**

<Frame>
<img src="/images/excel-extraction/2-bad.png" alt="" />
</Frame>

**For Sub-tables:**

When a sheet contains sub-tables or multiple data sections, repeat the headers and columns for each section:

<Frame>
<img src="/images/excel-extraction/2-sub-table.png" alt="" />
</Frame>

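Repeating the header makes every section self-describing, so each block can be read as an independent table. As an illustration only (this is not how Gurubase is documented to parse sheets), the hypothetical `split_on_repeated_header` helper below starts a new section whenever a row exactly matches the first header row. It assumes every sub-table repeats the identical header and that rows are already loaded as Python sequences.

```python
from typing import Any, Sequence


def split_on_repeated_header(rows: Sequence[Sequence[Any]]) -> list[dict[str, Any]]:
    """Split a sheet into sections, starting a new one at each repeated header row.

    Hypothetical helper: assumes rows[0] is the header and that every
    sub-table repeats exactly the same header row.
    """
    if not rows:
        return []
    header = list(rows[0])
    sections: list[dict[str, Any]] = []
    for row in rows:
        row = list(row)
        if row == header:
            # A repeated header marks the start of a new sub-table.
            sections.append({"header": header, "rows": []})
        else:
            sections[-1]["rows"].append(row)
    return sections


sheet = [
    ["Product", "Units", "Revenue"],
    ["Laptop", 3, 2997],
    ["Product", "Units", "Revenue"],
    ["Phone", 5, 2995],
    ["Tablet", 2, 798],
]
for section in split_on_repeated_header(sheet):
    print(section["rows"])
# [['Laptop', 3, 2997]]
# [['Phone', 5, 2995], ['Tablet', 2, 798]]
```

Without the repeated header rows, the second block's values would be indistinguishable from a continuation of the first.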
## 3. Clean Empty Rows and Columns

**Remove all empty rows and columns to keep files as small as possible.**

**Before Cleaning:**

<Frame>
<img src="/images/excel-extraction/3-bad.png" alt="" />
</Frame>

**After Cleaning:**

<Frame>
<img src="/images/excel-extraction/3-good.png" alt="" />
</Frame>

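The cleanup above can also be scripted when files are generated automatically. This minimal sketch assumes the sheet's values are already loaded as a list of rows and that `None` or whitespace-only strings mark empty cells; the `drop_empty_rows_and_columns` name is hypothetical. Note that a literal `0` counts as data and is kept.

```python
from typing import Any, Sequence


def is_blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or (isinstance(value, str) and not value.strip())


def drop_empty_rows_and_columns(rows: Sequence[Sequence[Any]]) -> list[list[Any]]:
    """Remove every row and every column in which all cells are blank."""
    # Pad ragged rows so every row has the same number of columns.
    width = max((len(r) for r in rows), default=0)
    grid = [list(r) + [None] * (width - len(r)) for r in rows]
    # Keep only rows that contain at least one non-blank cell.
    grid = [r for r in grid if not all(is_blank(v) for v in r)]
    # Keep only columns that contain at least one non-blank cell.
    keep = [c for c in range(width) if any(not is_blank(r[c]) for r in grid)]
    return [[r[c] for c in keep] for r in grid]


sheet = [
    [None, None, None, None],
    [None, "Name", None, "Price"],
    [None, "Pen", None, 1.5],
    [None, "", None, None],
]
print(drop_empty_rows_and_columns(sheet))
# [['Name', 'Price'], ['Pen', 1.5]]
```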
## 4. Split into Smaller Sheets

**Divide large datasets into multiple smaller, focused sheets.**

**Example Sheet Structure:**

- **Sheet 1**: Customer Information
- **Sheet 2**: Product Catalog
- **Sheet 3**: Sales Transactions
- **Sheet 4**: Inventory Levels

**Benefits:**

- Faster processing
- Better organization
- Easier to maintain
- Reduced file size

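When one large sheet can be partitioned by a column such as a year or region, the split can be scripted as well. This is a minimal sketch, not a Gurubase feature: the hypothetical `split_by_column` helper assumes the header and data rows are already loaded as Python lists, and returns one table per distinct value of the chosen column, each starting with its own copy of the header so it can be written to a separate worksheet with the spreadsheet library of your choice.

```python
from typing import Any, Sequence


def split_by_column(
    header: Sequence[str], rows: Sequence[Sequence[Any]], key: str
) -> dict[Any, list[list[Any]]]:
    """Group rows into separate tables keyed by the value in column `key`.

    Each resulting table starts with a copy of the header row.
    """
    index = list(header).index(key)  # raises ValueError if the column is missing
    sheets: dict[Any, list[list[Any]]] = {}
    for row in rows:
        group = row[index]
        if group not in sheets:
            sheets[group] = [list(header)]  # every sheet repeats the header
        sheets[group].append(list(row))
    return sheets


header = ["Year", "Product", "Units"]
rows = [[2023, "Pen", 10], [2024, "Pen", 12], [2023, "Ink", 4]]
sheets = split_by_column(header, rows, "Year")
print(list(sheets))
# [2023, 2024]
```

Splitting by entity type, as in the example structure above, follows the same idea but usually also means keeping only the columns relevant to each entity.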
## 5. Use Table Structure Over Form Structure

**Prefer a tabular data layout over a form-based layout.**

✅ **Good Example (Table Structure)**

<Frame>
<img src="/images/excel-extraction/5-good.png" alt="" />
</Frame>

❌ **Bad Example (Form Structure)**

<Frame>
<img src="/images/excel-extraction/5-bad.png" alt="" />
</Frame>

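Converting a form layout into a table amounts to turning each block of label/value pairs into one row. The sketch below is a hypothetical conversion helper, not part of Gurubase. It assumes column A holds field labels and column B holds values, and that a record ends when a label reappears; spacer rows are skipped. Fields missing from a record become `None`.

```python
from typing import Any, Sequence


def form_to_table(rows: Sequence[Sequence[Any]]) -> list[list[Any]]:
    """Convert repeated label/value form blocks into a header row plus data rows.

    Hypothetical helper: assumes column A holds labels and column B values.
    """
    records: list[dict[str, Any]] = []
    current: dict[str, Any] = {}
    for row in rows:
        if not row or row[0] is None or str(row[0]).strip() == "":
            continue  # skip spacer rows between form blocks
        label = str(row[0]).strip()
        value = row[1] if len(row) > 1 else None
        if label in current:
            # Seeing a label again means the next record has started.
            records.append(current)
            current = {}
        current[label] = value
    if current:
        records.append(current)

    # Columns follow the order in which labels first appear.
    header: list[str] = []
    for record in records:
        for label in record:
            if label not in header:
                header.append(label)
    return [header] + [[record.get(label) for label in header] for record in records]


form = [
    ["Name", "Alice"],
    ["Email", "alice@example.com"],
    [None, None],
    ["Name", "Bob"],
    ["Email", "bob@example.com"],
]
print(form_to_table(form))
# [['Name', 'Email'], ['Alice', 'alice@example.com'], ['Bob', 'bob@example.com']]
```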
## 6. Proper Nested Structure

**When nesting is necessary, make sure each level completely encompasses the levels nested inside it, and merge header cells across the grouped content they describe.**

✅ **Good Example (Proper Nested Structure)**

<Frame>
<img src="/images/excel-extraction/6-good.png" alt="" />
</Frame>

✅ **Good Example (More Complex Nested Structure)**

<Frame>
<img src="/images/excel-extraction/6-good-more-complex.png" alt="" />
</Frame>

❌ **Bad Example**

<Frame>
<img src="/images/excel-extraction/6-bad.png" alt="" />
</Frame>

Here is the corrected version:

<Frame>
<img src="/images/excel-extraction/6-bad-fixed.png" alt="" />
</Frame>

**Key Principles for Nested Structures**

1. **Complete Coverage**: Each nested level should fully encompass the data below it
2. **Merged Headers**: Use merged cells to group related columns under main categories
3. **Consistent Structure**: Maintain the same pattern throughout the sheet
4. **Clear Hierarchy**: Make the relationship between levels obvious

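Merged headers matter because spreadsheet libraries usually expose a merged cell's value only in its top-left cell; openpyxl, for example, returns `None` for the other cells of a merged range. The hypothetical sketch below shows why complete coverage helps: it forward-fills each group label across the blank cells to its right and combines it with the sub-header to produce flat column names. That forward-fill is an assumption that only holds when groups fully cover their columns; for exact ranges, openpyxl also exposes them through the worksheet's `merged_cells` attribute.

```python
from typing import Any, Optional, Sequence


def flatten_two_level_header(
    top: Sequence[Any], bottom: Sequence[Any], sep: str = " / "
) -> list[str]:
    """Combine a merged group header row and a sub-header row into flat names.

    Assumes a blank cell in `top` belongs to the nearest group label on its left.
    """
    names: list[str] = []
    group: Optional[str] = None
    for col, sub in enumerate(bottom):
        label = top[col] if col < len(top) else None
        if label is not None and str(label).strip():
            group = str(label).strip()  # a new merged group starts here
        sub_name = "" if sub is None else str(sub).strip()
        if group and sub_name:
            names.append(f"{group}{sep}{sub_name}")
        else:
            names.append(group or sub_name)
    return names


top = ["Product", "Q1", None, "Q2", None]
bottom = [None, "Sales", "Units", "Sales", "Units"]
print(flatten_two_level_header(top, bottom))
# ['Product', 'Q1 / Sales', 'Q1 / Units', 'Q2 / Sales', 'Q2 / Units']
```

A standalone column placed right after a group without its own top-level label would be wrongly attached to that group, which is one concrete way an incompletely merged header becomes ambiguous.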
## 7. Column-Oriented Tables

Gurubase can also handle column-oriented Excel files. Just make sure to include proper headers above the data cells:

✅ **Good Example (Column-Oriented Table)**

<Frame>
<img src="/images/excel-extraction/7-good.png" alt="" width="300" />
</Frame>

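Conceptually, a column-oriented table is the transpose of a row-oriented one: each column holds one record. If you want to inspect such a sheet as ordinary rows before uploading, the hypothetical sketch below transposes a grid of values. It assumes field labels sit in the first column and one record per column, which may not match every layout; ragged rows are padded with `None` via `itertools.zip_longest` so no cell is dropped.

```python
from itertools import zip_longest
from typing import Any, Sequence


def transpose(rows: Sequence[Sequence[Any]]) -> list[list[Any]]:
    """Turn a column-oriented table into a row-oriented one."""
    return [list(col) for col in zip_longest(*rows, fillvalue=None)]


column_oriented = [
    ["Field", "Laptop", "Phone"],
    ["Price", 999, 599],
    ["Stock", 12, 40],
]
print(transpose(column_oriented))
# [['Field', 'Price', 'Stock'], ['Laptop', 999, 12], ['Phone', 599, 40]]
```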
## 8. File Size Optimization

Keep files as small as possible for better performance:

- Remove unused worksheets
- Delete empty rows and columns
- Use appropriate data types
- Compress images if present
- Avoid unnecessary formatting

## 9. Common Mistakes to Avoid

1. **Missing headers** - Always include column headers
2. **Unclear header hierarchy** - Make nested header relationships obvious
3. **Inconsistent header spanning** - Use merged cells consistently for grouped columns
4. **Mixed data types** - Keep consistent formats within columns
5. **Excessive nesting** - Prefer flat structures when possible
6. **Large single sheets** - Split into multiple focused sheets
7. **Unnecessary formatting** - Remove complex styling
8. **Hidden data** - Ensure all relevant data is visible
9. **Inconsistent naming** - Use clear, consistent naming conventions
10. **Ambiguous header names** - Use descriptive, specific header labels

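Some of these mistakes can be detected automatically. Mixed data types (mistake 4) often come from numbers stored as text, which libraries such as openpyxl return as strings. The hypothetical checker below, not a Gurubase feature, assumes the first row is the header, ignores blank cells, treats `int` and `float` alike as "number", and reports each column that holds more than one kind of value. Columns with duplicate header names are merged in the result, so fix ambiguous headers (mistake 10) first.

```python
from typing import Any, Sequence


def find_mixed_type_columns(rows: Sequence[Sequence[Any]]) -> dict[str, set[str]]:
    """Return header -> set of type names for columns holding more than one type."""
    if not rows:
        return {}
    header = [str(h) for h in rows[0]]
    kinds: dict[str, set[str]] = {h: set() for h in header}
    for row in rows[1:]:
        for name, value in zip(header, row):
            if value is None or (isinstance(value, str) and not value.strip()):
                continue  # blank cells do not count as a type
            if isinstance(value, bool):
                kind = "boolean"  # bool is a subclass of int, so check it first
            elif isinstance(value, (int, float)):
                kind = "number"
            else:
                kind = type(value).__name__
            kinds[name].add(kind)
    return {name: found for name, found in kinds.items() if len(found) > 1}


sheet = [["Product", "Price"], ["Pen", 1.5], ["Ink", "2.00"], ["Pad", 3]]
mixed = find_mixed_type_columns(sheet)
print({name: sorted(found) for name, found in mixed.items()})
# {'Price': ['number', 'str']}
```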
Following these guidelines will significantly improve the quality of your Excel data extraction and analysis results.