Skip to content

acidlemon/Data-XLSX-Parser

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME

Data::XLSX::Parser - faster XLSX parser

SYNOPSIS

use Data::Dumper;
use Data::XLSX::Parser;

my $parser = Data::XLSX::Parser->new;
$parser->add_row_event_handler(sub {
    my ($row, $rowDetail) = @_;
    # array of cell values in parsed row
    print Dumper $row;
    # array of hashes with cell details (reference, value, column, row, style, etc.) in parsed row
    print Dumper $rowDetail;
});
$parser->open('foo.xlsx');

# parse sheet with sheet name
$parser->sheet_by_rid( $parser->workbook->sheet_rid( 'Sheet1' ) );

# .. or parse sheet with sheet Id
$parser->sheet_by_id(1);

# -----------
# print values of all sheets on the commandline
use Text::ASCIITable;

# get names of all sheets in the workbook
my @rows;

my $xlsx_parser = Data::XLSX::Parser->new;
$xlsx_parser->add_row_event_handler( sub{
    push @rows, $_[0];
});

$xlsx_parser->open( 'test.xlsx' );
my @names = $xlsx_parser->workbook->names;

for my $name ( @names ) {
    say "Table $name:";

    my $table = Text::ASCIITable->new;
    my $rid   = $xlsx_parser->workbook->sheet_id( $name );
    $xlsx_parser->sheet_by_rid( $rid );

    my $headers = shift @rows;
    $table->setCols( @{ $headers || [] } );

    for my $row ( @rows ) {
        $table->addRow( @{ $row || [] } );
    }
    
    print $table;

    @rows = ();
}

DESCRIPTION

Data::XLSX::Parser provides a fast way to parse Microsoft Excel's .xlsx files. The implementation of this module is highly inspired from Python's FastXLSX library.

The module uses a SAX based parser, so you can parse very large XLSX file with lower memory usage.

METHODS

new

Create new parser object.

add_row_event_handler

Add sub reference to row handler. Two arguments are returned, the first is an array with the cell values of the parsed row, the second is an array of hashes with the details of the parsed row cells:

|key |Content  
-------------------------
| i  |STYLE_INDEX        
| s  |STYLE OF CELL      
| f  |FORMAT OF CELL     
| r  |REFERENCE          
| c  |COLUMN OF CELL     
| v  |VALUE OF CELL      
| t  |TYPE OF CELL       
| s  |TYPE_SHARED_STRING 
| g  |GENERATED_CELL     
| row|ROW OF CELL        

Cell values are returned 'as is', except date values (where the format tag indicates this) are converted to epoch values.

open

Open a workbook to be parsed.

sheet_by_id

Start parsing of sheet identified by sheet Id.

sheet_by_rid

Start parsing of sheet identified by sheet relation Id.

workbook

returns the Data::XLSX::Parser::Workbook object (representation of xl/workbook.xml, used to get sheets).

shared_strings

returns the Data::XLSX::Parser::SharedStrings object (representation of xl/sharedStrings.xml).

styles

returns the Data::XLSX::Parser::Styles object (representation of xl/styles.xml).

relationships

returns the Data::XLSX::Parser::Relationships object (representation of xl/_rels/workbook.xml.rels).

AUTHOR

Daisuke Murase typester@cpan.org

About

faster XLSX parser inspired by Python's FastXLSX

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Perl 100.0%