ashane9/orc_file

ORCFILE

Ruby gem for reading and writing Apache Optimized Row Columnar (ORC) files. This gem can also be paired with the factory_girl gem.

Installation

This gem requires JRuby.

Add this line to your application’s Gemfile:

gem 'orc_file'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install orc_file

Usage

OrcFileWriter

To write a file, initialize the OrcFileWriter class. It takes a table schema, your dataset, the path where the file will be stored, and an optional configuration hash.

OrcFileWriter.new(table_schema, data_set, path, options = {})

table_schema

The table_schema must be a hash that maps each column name to its datatype.

Valid datatypes are:

  • integer

  • decimal

  • float

  • date

  • datetime

  • time

  • string

    table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
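A schema can be checked against the list of valid datatypes before it is handed to the writer. The following is a minimal sketch in plain Ruby; `VALID_TYPES` and `invalid_columns` are hypothetical names used for illustration and are not part of the orc_file gem.

```ruby
# Hypothetical helper, not part of the orc_file gem: returns the columns
# whose datatype is not one of the seven types listed above.
VALID_TYPES = %i[integer decimal float date datetime time string].freeze

def invalid_columns(table_schema)
  table_schema.reject { |_column, type| VALID_TYPES.include?(type) }.keys
end

good_schema = {:id => :integer, :amount => :decimal, :rate => :float}
bad_schema  = {:id => :integer, :flag => :boolean}

puts invalid_columns(good_schema).inspect
puts invalid_columns(bad_schema).inspect
```

An empty result means every column uses a listed datatype.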
    

data_set

The data_set must be a hash (for a single row) or an array of hashes (for multiple rows). Each hash maps column names to data values.

For one row in the dataset:

data_set = {:id => 1, :amount => 1000.01, :rate => 0.0005}

For multiple rows in the dataset:

data_set = [{:id => 1, :amount => 1000.01, :rate => 0.0005},
            {:id => 2, :amount => 2500.5, :rate => 0.1},
            {:id => 3, :amount => 10.12, :rate => 10.0134}]
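Because a data_set can be either one hash or an array of hashes, it can help to normalize it and compare each row's keys with the schema before writing. This sketch assumes every row should supply every column, which this README does not state; `mismatched_rows` is a hypothetical helper, not a gem method.

```ruby
# Hypothetical helper, not part of the orc_file gem: accepts one row (Hash)
# or many rows (Array of Hashes) and returns the rows whose keys differ
# from the schema's column names.
def mismatched_rows(table_schema, data_set)
  rows = data_set.is_a?(Hash) ? [data_set] : Array(data_set)
  expected = table_schema.keys.sort
  rows.reject { |row| row.keys.sort == expected }
end

schema = {:id => :integer, :amount => :decimal, :rate => :float}
rows   = [{:id => 1, :amount => 1000.01, :rate => 0.0005},
          {:id => 2, :amount => 2500.5}] # :rate is missing

puts mismatched_rows(schema, rows).inspect
```

The second row is reported because it lacks the :rate column.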

path

The path can be absolute or relative to your working directory, and it must include the file name.

path = '/temp/orc_file.orc'

options

Options is an optional hash of configurable settings for writing an ORC file. The documented settings are:

  • :stripe_size defines the size of the stripe; defaults to 67,108,864 bytes

  • :row_index_stride defines the number of rows between row index entries; defaults to 10,000

  • :buffer_size defines the ORC buffer size; defaults to 262,144 bytes

  • :compression defines the compression codec (NONE, ZLIB, SNAPPY, LZO); defaults to ZLIB

Define the options parameter as a hash:

options = {:stripe_size => 70000000, :compression => 'SNAPPY'}
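To see which settings end up in effect when only some options are supplied, the documented defaults can be merged with the custom hash. This is a plain Ruby illustration; `DEFAULT_OPTIONS` is a hypothetical constant, and the gem may combine options differently internally.

```ruby
# Default values as documented above. The constant name is hypothetical
# and is not taken from the orc_file gem.
DEFAULT_OPTIONS = {
  :stripe_size      => 67_108_864,
  :row_index_stride => 10_000,
  :buffer_size      => 262_144,
  :compression      => 'ZLIB'
}.freeze

options   = {:stripe_size => 70000000, :compression => 'SNAPPY'}
effective = DEFAULT_OPTIONS.merge(options) # supplied keys override defaults

puts effective.inspect
```

Keys that are not supplied, here :row_index_stride and :buffer_size, keep their default values.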

write_to_orc

Once the OrcFileWriter object is initialized, call write_to_orc to write out the file:

OrcFileWriter.new(table_schema, data_set, path, options).write_to_orc
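Putting the pieces together, a complete write might look like the sketch below. It assumes the gem is loaded with `require 'orc_file'` (inferred from the gem name, not confirmed here) and that the script runs under JRuby. The guard around the call is only there so the script does not fail where the gem is unavailable.

```ruby
# Assumption: the gem is loaded with require 'orc_file'. If it cannot be
# loaded (for example, outside JRuby), the write is skipped.
begin
  require 'orc_file'
rescue LoadError
  # Gem not installed or not loadable on this Ruby.
end

table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
data_set = [{:id => 1, :amount => 1000.01, :rate => 0.0005},
            {:id => 2, :amount => 2500.5, :rate => 0.1}]
path = 'orc_file.orc' # relative to the working directory
options = {:compression => 'SNAPPY'}

if defined?(OrcFileWriter)
  OrcFileWriter.new(table_schema, data_set, path, options).write_to_orc
else
  puts 'orc_file is not loaded; skipping the write'
end
```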

OrcFileReader

To read a file, initialize the OrcFileReader class. It takes a table schema and the path of the file to be read.

OrcFileReader.new(table_schema, path)

table_schema

The table_schema must be a hash that maps each column name to its datatype.

Valid datatypes are:

  • integer

  • decimal

  • float

  • date

  • datetime

  • time

  • string

    table_schema = {:id => :integer, :amount => :decimal, :rate => :float}
    

path

The path can be absolute or relative to your working directory, and it must include the file name.

path = '/temp/orc_file.orc'
