Apply corrections from a YAML file to an array of records. Useful in scraping/parsing, when you need to apply errata to the scraped data.
errata.yaml:
- city: "LasVegas"
  ~city: "Las Vegas"
- city: "LosAngeles"
  ~city: "Los Angeles"
apply_errata.rb:
records = [ { city: 'LasVegas', population: '596424' },
{ city: 'LosAngeles', population: '3857799' } ]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [ { city: 'Las Vegas', population: '596424' },
{ city: 'Los Angeles', population: '3857799' } ]
errata.yaml:
- city: "Los Angeles"
  country: "USA"
  ~state: "California"
- city: "Las Vegas"
  country: "USA"
  ~state: "Nevada"
apply_errata.rb:
records = [ { city: 'Las Vegas', country: 'USA' },
{ city: 'Los Angeles', country: 'USA' } ]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [ { city: 'Las Vegas', country: 'USA', state: 'Nevada' },
{ city: 'Los Angeles', country: 'USA', state: 'California' } ]
Add this line to your application's Gemfile:
gem 'errata_slip'
And then execute:
$ bundle
Or install it yourself as:
$ gem install errata_slip
The input is expected to be an array of hashes; corrections are applied to it.
Create an ErrataSlip from a YAML file with errata:
e = ErrataSlip::load_file('errata.yaml')
An errata file is an array of hashes, each of which has 'match' fields and 'correct' fields. 'Match' fields are used to find the record to correct; 'correct' fields describe the changes to apply to that record. 'Correct' fields are prefixed with a tilde (~):
- fieldname: "Value of fieldname to find"
  ~fieldname: "Value of fieldname to replace with"
For example, if your records have a 'name' key, the errata file might look like this:
- name: "Name to find"
  ~name: "Name to replace with"
'Correct' fields can introduce new fields to your records:
- name: "Name to find"
  ~name: "Name to replace with"
  ~applied_errata: true
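To make the match/correct distinction concrete, here is a minimal sketch that parses an errata entry with Ruby's standard YAML library and separates the two kinds of fields by the tilde prefix. The helper name `split_errata_entry` and the constant `ERRATA_YAML` are hypothetical and are not part of ErrataSlip; the gem's internals may differ. The squiggly heredoc assumes Ruby 2.3 or newer.

```ruby
require 'yaml'

# Hypothetical helper (not ErrataSlip's API): splits one errata entry into
# its 'match' fields and its 'correct' fields, based on the '~' key prefix.
def split_errata_entry(entry)
  correct, match = entry.partition { |key, _value| key.start_with?('~') }
  # Drop the leading tilde so 'correct' keys line up with record keys.
  correct = correct.map { |key, value| [key[1..-1], value] }
  { match: match.to_h, correct: correct.to_h }
end

# Illustrative errata content, inlined so the sketch is self-contained.
ERRATA_YAML = <<~YAML
  - name: "Name to find"
    ~name: "Name to replace with"
    ~applied_errata: true
YAML

entries = YAML.safe_load(ERRATA_YAML)
p split_errata_entry(entries.first)
```

Here the entry has one 'match' field (`name`) and two 'correct' fields (`name` and `applied_errata`), the second of which would be a new key on the record.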
Use the correct! method to correct all records in place:
scraped_records = [ { :name => 'Adam'}, { :name => 'Eve' } ]
ErrataSlip::load_file('errata.yaml').correct!(scraped_records)
Use the correct_item! method to correct a single hash in place:
scraped_records = [ { :name => 'Adam'}, { :name => 'Eve' } ]
errata = ErrataSlip::load_file('errata.yaml')
scraped_records.each { |record| errata.correct_item!(record) }
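The relationship between the two methods can be illustrated with a small stand-in written in plain Ruby. This is a sketch, not the gem's implementation: the names `sketch_correct_item!` and `sketch_correct!` are hypothetical, rules are assumed to be already split into `:match` and `:correct` hashes with string keys, records are assumed to be string-keyed, and rules are assumed to be applied in file order.

```ruby
# Hypothetical stand-in for correcting one record in place.
def sketch_correct_item!(rules, record)
  rules.each do |rule|
    # A rule applies only when every 'match' field equals the record's value.
    next unless rule[:match].all? { |key, value| record[key] == value }
    # merge! mutates the record: existing keys are overwritten, new keys added.
    record.merge!(rule[:correct])
  end
  record
end

# Hypothetical stand-in for correcting a whole array: one call per record.
def sketch_correct!(rules, records)
  records.each { |record| sketch_correct_item!(rules, record) }
end

rules = [{ match: { 'name' => 'Adaam' }, correct: { 'name' => 'Adam' } }]
records = [{ 'name' => 'Adaam' }, { 'name' => 'Eve' }]
sketch_correct!(rules, records)
p records
```

Because both stand-ins mutate the hashes they receive, the caller's array reflects the corrections without any reassignment.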
While the errata file is written with string hash keys, correction works on both string-keyed hashes and symbol-keyed hashes.
So it doesn't matter if you have
scraped_records = [ { :name => 'Adam'}, { :name => 'Eve' } ]
or
scraped_records = [ { 'name' => 'Adam'}, { 'name' => 'Eve' } ]
ErrataSlip will autodetect the key format and apply the errata correctly.
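One way such detection could work is sketched below. It is an assumption for illustration only: the source does not describe how ErrataSlip detects the key format, and the helpers `symbol_keyed?`, `record_key` and `apply_correction!` are hypothetical names. The idea is to look at a record's existing keys and convert each errata field name to the same type before comparing or assigning.

```ruby
# Hypothetical: treat a record as symbol-keyed if any of its keys is a Symbol.
def symbol_keyed?(record)
  record.keys.any? { |key| key.is_a?(Symbol) }
end

# Convert an errata field name (a String) to the record's key type.
def record_key(record, name)
  symbol_keyed?(record) ? name.to_sym : name.to_s
end

# Apply one rule (string-keyed 'match' and 'correct' hashes) to one record.
def apply_correction!(record, match, correct)
  matches = match.all? { |name, value| record[record_key(record, name)] == value }
  return record unless matches
  correct.each { |name, value| record[record_key(record, name)] = value }
  record
end

match   = { 'name' => 'Adaam' }
correct = { 'name' => 'Adam' }

p apply_correction!({ :name => 'Adaam' }, match, correct)
p apply_correction!({ 'name' => 'Adaam' }, match, correct)
```

In this sketch, fields added by a correction get the same key type as the rest of the record, so a symbol-keyed record stays symbol-keyed.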
In this example we change all names from 'Adaam' to 'Adam'.
errata.yaml
- name: "Adaam"
  ~name: "Adam"
apply_errata.rb
records = [
{ name: 'Adaam' },
{ name: 'Andrew' }
]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [
{ name: 'Adam' },
{ name: 'Andrew' }
]
We search for all records with name 'Hillary' and surname 'Clinton' and change them to 'Monika' and 'Lewinsky' respectively.
errata.yaml
- name: "Hillary"
  surname: "Clinton"
  ~name: "Monika"
  ~surname: "Lewinsky"
apply_errata.rb
records = [
{ name: 'Bill', surname: 'Clinton' },
{ name: 'Hillary', surname: 'Clinton' }
]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [
{ name: 'Bill', surname: 'Clinton' },
{ name: 'Monika', surname: 'Lewinsky' }
]
This example finds all records with name 'Adam' and changes surname to 'Smith' and book to 'The Wealth of Nations'.
errata.yaml
- name: "Adam"
  ~surname: "Smith"
  ~book: "The Wealth of Nations"
apply_errata.rb
records = [
{ name: 'Adam', surname: 'Mansbach', book: 'Go the F**k to Sleep' }
]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [
{ name: 'Adam', surname: 'Smith', book: 'The Wealth of Nations' }
]
The syntax for adding a new field to matching records is the same.
errata.yaml
- name: "Adam"
  surname: "Smith"
  ~book: "The Wealth of Nations"
apply_errata.rb
records = [
{ name: 'Adam', surname: 'Smith' },
{ name: 'Adam', surname: 'Mansbach', book: 'Go the F**k to Sleep' }
]
ErrataSlip::load_file('errata.yaml').correct!(records)
p records
=> [
{ name: 'Adam', surname: 'Smith', book: 'The Wealth of Nations' },
{ name: 'Adam', surname: 'Mansbach', book: 'Go the F**k to Sleep' }
]
ErrataSlip follows Semantic Versioning.
ErrataSlip is tested against MRI 1.9.3, 2.0.0 and 2.1.0.
Artem Fedorov: artemf at mail dot ru