sul-dlss-deprecated/harvestdor

Harvestdor

A gem to harvest data from a Stanford PURL page, with convenience methods that return Nokogiri::XML::Document objects and raise errors when pieces are missing.

Installation

Add this line to your application's Gemfile:

gem 'harvestdor'

And then execute:

$ bundle

Or install it yourself as:

$ gem install harvestdor

Usage

Configuration

Possible configuration options (shown with their default values unless otherwise indicated):

client = Harvestdor::Client.new({ # Example with all possible options
  :log_dir => File.join(File.dirname(__FILE__), "..", "logs"),
  :log_name => 'harvestdor.log',
  :purl => 'https://purl.stanford.edu'
})

Option 1: use a yaml file

For the contents of the YAML file, see spec/config/example.yml.

client = Harvestdor::Client.new({:config_yml_path => path_to_my_yml})
client.mods('oo111oo2222')
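
As a rough illustration of Option 1, the sketch below writes a small YAML file and reads it back with Ruby's standard YAML library. The keys are an assumption that simply mirrors the hash options listed under Configuration, and the temp-file location is hypothetical; spec/config/example.yml remains the authoritative reference for what the gem actually reads.

```ruby
require 'yaml'
require 'tmpdir'

# Assumption: the yml keys mirror the option names accepted by Harvestdor::Client.new.
# Check spec/config/example.yml for the keys the gem actually reads.
settings = {
  'log_dir'  => 'logs',
  'log_name' => 'harvestdor.log',
  'purl'     => 'https://purl.stanford.edu'
}

# Hypothetical location for the config file.
path_to_my_yml = File.join(Dir.mktmpdir, 'harvestdor.yml')
File.write(path_to_my_yml, settings.to_yaml)

# Read the file back to confirm it parses into the expected hash.
loaded = YAML.load_file(path_to_my_yml)
puts loaded['purl']

# The path would then be handed to the client:
#   client = Harvestdor::Client.new({:config_yml_path => path_to_my_yml})
```

Keeping the settings in a file like this lets the same code run against different PURL hosts without edits, provided the key names match what the gem expects.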

Option 2: pass in non-default configurations as a hash

client = Harvestdor::Client.new({:purl => 'https://my_purl.org'})
client.mods('oo111oo2222')

Option 3: set the attributes explicitly in your code

client = Harvestdor::Client.new
client.config.purl = 'https://my_purl.org'
client.mods('oo111oo2222')

XML from PURL pages

You can get, for example, the contentMetadata for a druid:

it "#content_metadata retrieves contentMetadata as a Nokogiri::XML::Document" do
  cm = Harvestdor.content_metadata('bb375wb8869', 'https://purl-test.stanford.edu')
  cm.should be_kind_of(Nokogiri::XML::Document)
  cm.root.name.should == 'contentMetadata'
  cm.root.attributes['objectId'].text.should == @druid
end

Or the MODS metadata:

it "#mods returns a Nokogiri::XML::Document from the purl mods" do
  x = Harvestdor.mods('bb375wb8869', 'https://purl-test.stanford.edu')
  x.should be_kind_of(Nokogiri::XML::Document)
  x.root.name.should == 'mods'
  x.root.namespace.href.should == Harvestdor::MODS_NAMESPACE
end

The same pattern works for each of the following methods:

  • mods
  • public_xml (all of it)
  • content_metadata
  • identity_metadata
  • rights_metadata
  • rdf
  • dc

You can also do this from a Harvestdor::Client object, and it will use the purl from the Client.config:

client = Harvestdor::Client.new({purl: 'https://thisone.org'})
client.identity_metadata('bb375wb8869')

Contributing

  • Fork it
  • Create your feature branch (git checkout -b my-new-feature)
  • Write code and tests.
  • Commit your changes (git commit -am 'Added some feature')
  • Push to the branch (git push origin my-new-feature)
  • Create new Pull Request

Releases

  • 0.0.14 Bug fix for compatibility with jruby
  • 0.0.13 Updated to work with Faraday 0.9, releases via rubygems instead of sul-gems
  • 0.0.11 better error handling, and better testing for errors
  • 0.0.10 tweak specs to test that unnecessary fetching isn't done.
  • 0.0.9 allows public xml to be passed as Nokogiri::XML::Document to content_metadata, etc. to avoid unnecessary fetching
  • 0.0.8 avoid undefined method 'size' from scrub_oai_args when using a non-nil default date param
  • 0.0.7 add oai client timeout overrides, update README
  • 0.0.6 refactoring oai_harvest for greater simplicity and passing errors through, add oai_record (get_record OAI request)
  • 0.0.5 don't send empty string arguments to OAI server so you can get actual results
  • 0.0.4 add integration spec and get it working with actual OAI server
  • 0.0.3 add method to get mods from purl
  • 0.0.2 tidy up README
  • 0.0.1 initial commit
