Skip to content

Open Google Docs spreadsheets, local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and shapefiles. Returns an Array of Arrays or Hashes, depending on whether there are headers.

License

btucker/remote_table

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

remote_table

Open Google Docs spreadsheets, local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), SHP (ESRI shapefiles), other delimited, fixed-width files.

Tested on MRI 1.8, MRI 1.9, and JRuby 1.6.7+. Thread-safe.

Real-world usage

Brighter Planet logo

We use remote_table for data science at Brighter Planet and in production at

It's also a big part of

Example

>> require 'remote_table'
=> true
>> t = RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'
=> #<RemoteTable:0x00000101b87390 @download_count_mutex=#<Mutex:0x00000101b87228>, @extend_bang_mutex=#<Mutex:0x00000101b871d8>, @errata_mutex=#<Mutex:0x00000101b871b0>, @cache=[], @download_count=0, @url="http://www.fueleconomy.gov/FEG/epadata/98guide6.zip", @format=nil, @headers=:first_row, @compression=:zip, @packing=nil, @streaming=false, @warn_on_multiple_downloads=true, @delimiter=",", @sheet=nil, @keep_blank_rows=false, @form_data=nil, @skip=0, @internal_encoding="UTF-8", @row_xpath=nil, @column_xpath=nil, @row_css=nil, @column_css=nil, @glob=nil, @filename="98guide6.csv", @transform_settings=nil, @cut=nil, @crop=nil, @schema=nil, @schema_name=nil, @pre_select=nil, @pre_reject=nil, @errata_settings=nil, @other_options={}, @transformer=#<RemoteTable::Transformer:0x00000101b8c2f0 @t=#<RemoteTable:0x00000101b87390 ...>, @legacy_transformer_mutex=#<Mutex:0x00000101b8c2a0>>, @local_copy=#<RemoteTable::LocalCopy:0x00000101b8bf58 @t=#<RemoteTable:0x00000101b87390 ...>, @encoded_io_mutex=#<Mutex:0x00000101b8be18>, @generate_mutex=#<Mutex:0x00000101b8bdc8>>>
>> t.rows.length
=> 806
>> t.rows.first.length
=> 26
>> require 'pp'
=> true
>> pp t[23]
{"Class"=>"TWO SEATERS",
 "Manufacturer"=>"PORSCHE",
 "carline name"=>"BOXSTER",
 "displ"=>"2.5",
 "cyl"=>"6",
 "trans"=>"Manual(M5)",
 "drv"=>"R",
 "cty"=>"19",
 "hwy"=>"26",
 "cmb"=>"22",
 "ucty"=>"21.2",
 "uhwy"=>"33.9499",
 "ucmb"=>"25.5114",
 "fl"=>"P",
 "G"=>"",
 "T"=>"",
 "S"=>"",
 "2pv"=>"",
 "2lv"=>"",
 "4pv"=>"",
 "4lv"=>"",
 "hpv"=>"",
 "hlv"=>"",
 "fcost"=>"956",
 "eng dscr"=>"",
 "trans dscr"=>""}

Columns and rows

  • If there are headers, you get an Array of Hashes with string keys.
  • If you set :headers => false, then you get an Array of Arrays.

Row keys

Row keys are strings. Row keys are NOT symbolized.

row['foobar'] # correct
row[:foobar]  # incorrect

You can call symbolize_keys yourself, but we don't do it automatically to avoid creating loads of garbage symbols.

Supported formats

Format Notes Library
Delimited (CSV, TSV, etc.) All RemoteTable::Delimited::PASSTHROUGH_CSV_SETTINGS, for example :col_sep, are passed directly to fastercsv. fastercsv (1.8); stdlib (1.9)
Fixed width You have to set up a :schema. fixed_width-multibyte
HTML See XML. nokogiri
ODS roo
XLS roo
XLSX roo
XML The idea is to set up a :row_[xpath|css] and (optionally) a :column_[xpath|css]. nokogiri

Compression and packing

You can directly pick a file out of a remote archive using :filename or use a :glob.

  • zip
  • tar
  • bz2
  • gz
  • exe (treated as zip)

Encoding

Everything is forced into UTF-8. You can improve the quality of the conversion by specifying the original encoding with :encoding

  • ASCII-8BIT and BINARY are equal
  • ISO-8859-1 and Latin1 are equal

More examples

RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdHRNaVpSUWw2Z2VhN3RUV25yYWdQX2c&output=csv')

# aircraft fuel use equations derived from EMEP/EEA and ICAO
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEhYenF3dGt1T0Y1cTdneUNsNjV0dEE&output=csv')

# distance classes from the WRI business travel tool and UK DEFRA/DECC GHG Conversion Factors for Company Reporting
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFBKM0xWaUhKVkxDRmdBVkE3VklxY2c&hl=en&gid=0&output=csv')

# seat classes used in the WRI GHG Protocol calculation tools
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdG9EdmxybG1wdC1iU3JRYXNkMGhvSnc&output=csv')

# pure automobile fuels
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdE9xTEdueFM2R0diNTgxUlk1QXFSb2c&gid=0&output=csv')

# blended automobile fuels
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEswNGIxM0U4U0N1UUppdWw2ejJEX0E&gid=0&output=csv')

# A list of hybrid make model years derived from the EPA fuel economy guide
RemoteTable.new('https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AoQJbWqPrREqdGtzekE4cGNoRGVmdmZMaTNvOWluSnc&output=csv')

# BTS aircraft type lookup table
RemoteTable.new("http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_AIRCRAFT_TYPE",
                :errata => { :url => RemoteTable.new('https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdEZ2d3JQMzV5T1o1T3JmVlFyNUZxdEE&output=csv' })

# aircraft made by whitelisted manufacturers whose ICAO code starts with 'B' from the FAA
# for definition of `Aircraft::Guru` and `manufacturer_whitelist?` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb
RemoteTable.new("http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-B.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => { :url => 'https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGVBRnhkRGhSaVptSDJ5bXJGbkpUSWc&output=csv', :responder => Aircraft::Guru.new },
                :select => proc { |record| manufacturer_whitelist? record['Manufacturer'] })

# OpenFlights.org airports database
RemoteTable.new('https://openflights.svn.sourceforge.net/svnroot/openflights/openflights/data/airports.dat',
                :headers => %w{ id name city country_name iata_code icao_code latitude longitude altitude timezone daylight_savings },
                :select => proc { |record| record['iata_code'].present? },
                :errata => { :url => RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdFc2UzhQYU5PWEQ0N21yWFZGNmc2a3c&gid=0&output=csv', :responder => Airport::Guru.new }) # see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/aircraft/data_miner.rb

# T100 flight segment data for #{month.strftime('%B %Y')}
# for definition of `form_data` and `FlightSegment::Guru` see https://github.com/brighterplanet/earth/blob/master/lib/earth/air/flight_segment/data_miner.rb
RemoteTable.new('http://www.transtats.bts.gov/DownLoad_Table.asp',
                :form_data => form_data,
                :compression => :zip,
                :glob => '/*.csv',
                :errata => { :url => RemoteTable.new('https://spreadsheets.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdGxpYU1qWFR3d0syTVMyQVVOaDd0V3c&output=csv', :responder => FlightSegment::Guru.new },
                :select => proc { |record| record['DEPARTURES_PERFORMED'].to_i > 0 })

# 1995 Fuel Economy Guide
# for definition of `:fuel_economy_guide_b` and `AutomobileMakeModelYearVariant::ParserB` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
RemoteTable.new("http://www.fueleconomy.gov/FEG/epadata/95mfgui.zip",
                :filename => '95MFGUI.DAT',
                :format => :fixed_width,
                :cut => '13-',
                :schema_name => :fuel_economy_guide_b,
                :select => proc { |row| row['model'].present? and (row['suppress_code'].blank? or row['suppress_code'].to_f == 0) and row['state_code'] == 'F' },
                :transform => { :class => AutomobileMakeModelYearVariant::ParserB, :year => 1995 },
                :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new })

# 1998 Fuel Economy Guide
# for definition of `AutomobileMakeModelYearVariant::ParserC` see https://github.com/brighterplanet/earth/blob/master/lib/earth/automobile/automobile_make_model_year_variant/data_miner.rb
RemoteTable.new('http://www.fueleconomy.gov/FEG/epadata/98guide6.zip',
                :filename => '98guide6.csv',
                :transform => { :class => AutomobileMakeModelYearVariant::ParserC, :year => 1998 },
                :errata => { :url => "https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDkxTElWRVlvUXB3Uy04SDhSYWkzakE&output=csv", :responder => AutomobileMakeModelYearVariant::Guru.new },
                :select => proc { |row| row['model'].present? })

# annual corporate average fuel economy data for domestic and imported vehicle fleets from the NHTSA
RemoteTable.new('https://spreadsheets.google.com/pub?key=0AoQJbWqPrREqdEdXWXB6dkVLWkowLXhYSFVUT01sS2c&hl=en&gid=0&output=csv',
                :errata => { 'url' => 'http://static.brighterplanet.com/science/data/transport/automobiles/make_fleet_years/errata.csv' },
                :select => proc { |row| row['volume'].to_i > 0 })

# total vehicle miles travelled by gasoline passenger cars from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-87.csv',
                :skip => 1,
                :select => proc { |row| row['Year'].to_i.to_s == row['Year'] })

# total vehicle miles travelled from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-87.csv',
                :skip => 1,
                :select => proc { |row| row['Year'].to_i.to_s == row['Year'] })

# total travel distribution from the 2010 EPA GHG Inventory
RemoteTable.new('http://www.epa.gov/climatechange/emissions/downloads10/2010-Inventory-Annex-Tables.zip',
                :filename => 'Annex Tables/Annex 3/Table A-93.csv',
                :skip => 1,
                :select => proc { |row| row['Vehicle Age'].to_i.to_s == row['Vehicle Age'] })

# building characteristics from the 2003 EIA Commercial Building Energy Consumption Survey
RemoteTable.new('http://www.eia.gov/emeu/cbecs/cbecs2003/public_use_2003/data/FILE02.csv',
                :skip => 1,
                :headers => ["PUBID8","REGION8","CENDIV8","SQFT8","SQFTC8","YRCONC8","PBA8","ELUSED8","NGUSED8","FKUSED8","PRUSED8","STUSED8","HWUSED8","ONEACT8","ACT18","ACT28","ACT38","ACT1PCT8","ACT2PCT8","ACT3PCT8","PBAPLUS8","VACANT8","RWSEAT8","PBSEAT8","EDSEAT8","FDSEAT8","HCBED8","NRSBED8","LODGRM8","FACIL8","FEDFAC8","FACACT8","MANIND8","PLANT8","FACDST8","FACDHW8","FACDCW8","FACELC8","BLDPLT8","ADJWT8","STRATUM8","PAIR8"])

# 2003 CBECS C17 - Electricity Consumption and Intensity - New England Division
# for definition of `CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER` see https://github.com/brighterplanet/earth/blob/master/lib/earth/industry/cbecs_energy_intensity/data_miner.rb
RemoteTable.new("http://www.eia.gov/emeu/cbecs/cbecs2003/detailed_tables_2003/2003set10/2003excel/C17.xls",
                :headers => false,
                :select => proc { |row| CbecsEnergyIntensity::NAICS_CODE_SYNTHESIZER.call(row) },
                :crop => (21..37))

# U.S. Census 2002 NAICS code list
RemoteTable.new('http://www.census.gov/epcd/naics02/naicod02.txt',
                :skip => 4,
                :headers => false,
                :delimiter => '	')

# MECS table 3.2 Total US
RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table3_2.xls",
                :crop => (15..94),
                :headers => ["NAICS Code", "Subsector and Industry", "Total", "BLANK", "Net Electricity", "BLANK", "Residual Fuel Oil", "Distillate Fuel Oil", "Natural Gas", "BLANK", "LPG and NGL", "BLANK", "Coal", "Coke and Breeze", "Other"])

# MECS table 6.1 Midwest
RemoteTable.new("http://205.254.135.24/emeu/mecs/mecs2006/excel/Table6_1.xls",
                :crop => (184..263),
                :headers => ["NAICS Code", "Subsector and Industry", "Consumption per Employee", "Consumption per Dollar of Value Added", "Consumption per Dollar of Value of Shipments"])

# U.S. Census Geographic Terms and Definitions
RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
                :skip => 6,
                :headers => %w{ Region Division FIPS Name },
                :select => proc { |row| row['Division'].to_i > 0 and row['FIPS'].to_i == 0 })

# state census divisions from the U.S. Census
RemoteTable.new('http://www.census.gov/popest/about/geo/state_geocodes_v2009.txt',
                :skip => 8,
                :headers => ['Region', 'Division', 'State FIPS', 'Name'],
                :select => proc { |row| row['State FIPS'].to_i > 0 })

# OpenGeoCode.org's Country Codes to Country Names list
RemoteTable.new('http://opengeocode.org/download/countrynames.txt',
                :format => :delimited,
                :delimiter => ';',
                :headers => false,
                :skip => 22)

# heating degree day data from WRI CAIT
RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDN4MkRTSWtWRjdfazhRdWllTkVSMkE&output=csv',
                :select => Proc.new { |record| record['country'] != 'European Union (27)' },
                :errata => { :url => RemoteTable.new('https://docs.google.com/spreadsheet/pub?key=0AoQJbWqPrREqdDNSMUtCV0h4cUF4UnBKZlNkczlNbFE&output=csv' })

# US average grid loss factor derived eGRID 2007 data
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
                :sheet => 'USGC',
                :skip => 5)

# eGRID 2010 regions and loss factors
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010V1_1_STIE_USGC.xls',
                :sheet => 'STIE07',
                :skip => 4,
                :select => proc { |row| row['eGRID2010 year 2007 file state sequence number'].to_i.between?(1, 51) })

# eGRID 2010 subregions and electricity emission factors
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/eGRID2010_Version1-1_xls_only.zip',
                :filename => 'eGRID2010V1_1_year07_AGGREGATION.xls',
                :sheet => 'SRL07',
                :skip => 4,
                :select => proc { |row| row['SEQSRL07'].to_i.between?(1, 26) })

# U.S. Census State ANSI Code file
RemoteTable.new('http://www.census.gov/geo/www/ansi/state.txt',
                :delimiter => '|',
                :select => proc { |record| record['STATE'].to_i < 60 })

# Mapping Hacks zipcode database
RemoteTable.new('http://mappinghacks.com/data/zipcode.zip',
                :filename => 'zipcode.csv')

# zipcode states and eGRID Subregions from the US EPA
RemoteTable.new('http://www.epa.gov/cleanenergy/documents/egridzips/Power_Profiler_Zipcode_Tool_v3-2.xlsx',
                :sheet => 'Zip-subregion')

# horse breeds
RemoteTable.new('http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8')

# Brighter Planet's list of cat and dog breeds, genders, and weights
RemoteTable.new('http://static.brighterplanet.com/science/data/consumables/pets/breed_genders.csv',
                :encoding => 'ISO-8859-1',
                :select => proc { |row| row['gender'].present? })

# residential electricity prices from the EIA
RemoteTable.new('http://www.eia.doe.gov/cneaf/electricity/page/sales_revenue.xls',
                :select => proc { |row| row['Year'].to_s.first(4).to_i > 1989 })

# residential natural gas prices from the EIA
# for definition of `NaturalGasParser` see https://github.com/brighterplanet/earth/blob/master/lib/earth/residence/residence_fuel_price/data_miner.rb
RemoteTable.new('http://tonto.eia.doe.gov/dnav/ng/xls/ng_pri_sum_a_EPG0_FWA_DMcf_a.xls',
                :sheet => 'Data 1',
                :skip => 2,
                :select => proc { |row| row['year'].to_i > 1989 },
                :transform => { :class => NaturalGasParser })

# 2005 EIA Residential Energy Consumption Survey microdata
RemoteTable.new('http://www.eia.doe.gov/emeu/recs/recspubuse05/datafiles/RECS05alldata.csv',
                :headers => :upcase)

# ...and more from the tests...

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA&single=true&gid=0'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA', :skip => 1, :headers => false

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw'

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw', :headers => %w{ col1 col2 col3 }

RemoteTable.new 'http://spreadsheets.google.com/pub?key=tujrgUOwDSLWb-P4KCt1qBg'

RemoteTable.new 'http://tonto.eia.doe.gov/dnav/pet/xls/PET_PRI_RESID_A_EPPR_PTA_CPGAL_M.xls', :transform => { :class => FuelOilParser }

RemoteTable.new 'http://www.freebase.com/type/exporttypeinstances/base/horses/horse_breed?page=0&filter_mode=type&filter_view=table&show%01p%3D%2Ftype%2Fobject%2Fname%01index=0&show%01p%3D%2Fcommon%2Ftopic%2Fimage%01index=1&show%01p%3D%2Fcommon%2Ftopic%2Farticle%01index=2&sort%01p%3D%2Ftype%2Fobject%2Ftype%01p%3Dlink%01p%3D%2Ftype%2Flink%2Ftimestamp%01index=false&=&exporttype=csv-8'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/02data.zip', :filename => 'guide_jan28.xls'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :filename => '2008_FE_guide_ALL_rel_dates_-no sales-for DOE-5-1-08.csv'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/08data.zip', :glob => '/*.csv'

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/98guide6.zip', :filename => '98guide6.csv'

RemoteTable.new 'http://www.worldmapper.org/data/opendoc/2_worldmapper_data.ods', :sheet => 'Data', :keep_blank_rows => true

RemoteTable.new 'https://spreadsheets.google.com/pub?key=t5HM1KbaRngmTUbntg8JwPA'

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx'

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => %w{foo bar baz}

RemoteTable.new 'www.customerreferenceprogram.org/uploads/CRP_RFP_template.xlsx', :headers => false

RemoteTable.new 'http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=293&Has_Group=3&Is_Zipped=0', :form_data => 'UserTableName=T_100_Segment__All_Carriers&[...]', :compression => :zip, :glob => '/*.csv'

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-E.htm",
                :encoding => 'US-ASCII',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td'

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => Errata.new(:url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
                                      :responder => AircraftGuru.new)

RemoteTable.new "http://www.faa.gov/air_traffic/publications/atpubs/CNT/5-2-G.htm",
                :encoding => 'windows-1252',
                :row_xpath => '//table/tr[2]/td/table/tr',
                :column_xpath => 'td',
                :errata => { :url => 'http://spreadsheets.google.com/pub?key=tObVAGyqOkCBtGid0tJUZrw',
                             :responder => AircraftGuru.new }

RemoteTable.new 'http://www.fueleconomy.gov/FEG/epadata/00data.zip',
                :filename => 'Gd6-dsc.txt',
                :format => :fixed_width,
                :crop => 21..26, # inclusive
                :cut => '2-',
                :select => proc { |row| /\A[A-Z]/.match row['code'] },
                :schema => [[ 'code',   2, { :type => :string }  ],
                            [ 'spacer', 2 ],
                            [ 'name',   52, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'header4', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header5', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header6', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/test2.fixed_width.txt',
                :format => :fixed_width,
                :keep_blank_rows => true,
                :skip => 1,
                :schema => [[ 'header4', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header5', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header6', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'header1', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header2', 10, { :type => :string } ],
                            [ 'spacer',  12 ],
                            [ 'header3', 10, { :type => :string } ]]

RemoteTable.new 'http://cloud.github.com/downloads/seamusabshere/remote_table/remote_table_row_hash_test.alternate_order.fixed_width.txt',
                :format => :fixed_width,
                :skip => 1,
                :schema => [[ 'spacer',  11 ],
                            [ 'header2', 10, { :type => :string }  ],
                            [ 'spacer',  1 ],
                            [ 'header3', 10, { :type => :string } ],
                            [ 'spacer',  1 ],
                            [ 'header1', 10, { :type => :string } ]]

# ESRI Shapefile from NREL
require 'geo_ruby'
require 'dbf'
RemoteTable.new 'http://www.nrel.gov/gis/cfm/data/GIS_Data_Technology_Specific/United_States/Solar/High_Resolution/Lower_48_DNI_High_Resolution.zip',
  :format => :shp

Requirements

  • Unix tools like curl, iconv, perl, cat, cut, tail, etc. accessible from your $PATH
  • geo_ruby and dbf gems if you plan on fetching shapefiles

Wishlist

  • Win32 compat
  • The new "custom parser" syntax (aka transformer) hasn't been defined yet... only the old-style syntax is available

Authors

Copyright

Copyright (c) 2012 Brighter Planet. See LICENSE for details.

About

Open Google Docs spreadsheets, local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and shapefiles. Returns an Array of Arrays or Hashes, depending on whether there are headers.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 100.0%