modified scrapper to work with website changes
cgibsonmm committed Apr 5, 2020
1 parent 2b90ed4 commit 7afd1a9
Showing 4 changed files with 6 additions and 5 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -211,4 +211,4 @@ RUBY VERSION
    ruby 2.4.2p198

 BUNDLED WITH
-   1.17.1
+   2.1.4
5 changes: 3 additions & 2 deletions app/models/curiosity_scraper.rb
@@ -1,5 +1,6 @@
 class CuriosityScraper
   require "open-uri"
+  require 'json'
   BASE_URL = "https://mars.jpl.nasa.gov/msl/multimedia/raw/"

   attr_reader :rover
@@ -13,11 +14,11 @@ def scrape

   # grabs the HTML from the main page of the curiosity rover image gallery
   def main_page
-    Nokogiri::HTML(open("https://mars.jpl.nasa.gov/msl/multimedia/raw/"))
+    Nokogiri::HTML(open("https://mars.nasa.gov/msl/multimedia/raw-images/?order=sol+desc%2Cinstrument_sort+asc%2Csample_type_sort+asc%2C+date_taken+desc&per_page=50&page=0&mission=msl"))
   end

   def collect_links
-    latest_sol_available = main_page.css("#listImagesContentTxt").attr('value').value.to_i
+    latest_sol_available = JSON.parse(main_page.css('[data-react-props]').last.attr('data-react-props'))["header_counts"]["latest_sol"].to_i
     latest_sol_scraped = rover.photos.maximum(:sol).to_i
     sols_to_scrape = latest_sol_scraped..latest_sol_available
     sols_to_scrape.map { |sol|
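
The new `main_page` URL is hard-coded with its query string already percent-encoded, which makes the sort order and paging parameters hard to read. As a sketch only, the same string can be assembled from a hash with `URI.encode_www_form`. The names `RAW_IMAGES_URL` and `raw_images_url` are hypothetical and do not appear in the commit, and whether the site honours other `page` or `per_page` values is an assumption.

```ruby
require "uri"

# Hypothetical constant: the path portion of the URL used in the new main_page.
RAW_IMAGES_URL = "https://mars.nasa.gov/msl/multimedia/raw-images/"

# Hypothetical helper, not part of the commit. Builds the raw-images URL from
# readable parameters; the defaults reproduce the URL hard-coded in the diff.
def raw_images_url(page: 0, per_page: 50)
  query = URI.encode_www_form(
    order: "sol desc,instrument_sort asc,sample_type_sort asc, date_taken desc",
    per_page: per_page,
    page: page,
    mission: "msl"
  )
  "#{RAW_IMAGES_URL}?#{query}"
end

puts raw_images_url
```

With the default arguments the helper yields the same URL string that the commit passes to `open`.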
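
In `collect_links`, the latest sol is now read from the JSON stored in a `data-react-props` attribute. The chained `["header_counts"]["latest_sol"]` lookup raises `NoMethodError` if a key is missing, so another site change would fail noisily inside the scraper. A more defensive variant might look like the sketch below. `latest_sol_from_props` is a hypothetical helper that is not in the commit, and the sample payloads are invented for illustration; only the key names come from the diff.

```ruby
require "json"

# Hypothetical helper, not part of the commit. raw_props is the string value
# of a data-react-props attribute. Returns the latest sol as an Integer, or
# nil when the payload is missing, malformed, or shaped differently.
def latest_sol_from_props(raw_props)
  props = JSON.parse(raw_props)
  return nil unless props.is_a?(Hash)

  sol = props.dig("header_counts", "latest_sol")
  sol.nil? ? nil : Integer(sol.to_s, 10)
rescue JSON::ParserError, TypeError, ArgumentError
  nil
end

# Invented sample payloads; the real attribute carries more keys.
p latest_sol_from_props('{"header_counts":{"latest_sol":2724}}')
p latest_sol_from_props('{"unexpected":"shape"}')
```

Returning `nil` leaves it to the caller to decide whether to skip the run or raise, instead of silently treating a missing value as sol 0 the way `nil.to_i` would.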
2 changes: 1 addition & 1 deletion db/schema.rb
@@ -10,7 +10,7 @@
 #
 # It's strongly recommended that you check this file into your version control system.

-ActiveRecord::Schema.define(version: 20160929212810) do
+ActiveRecord::Schema.define(version: 2016_09_29_212810) do

   # These are extensions that must be enabled in order to support this database
   enable_extension "plpgsql"
2 changes: 1 addition & 1 deletion spec/models/curiosity_scraper_spec.rb
@@ -11,7 +11,7 @@

   describe ".main_page" do
     it "should return a Nokogiri page" do
-      expect(scraper.main_page.title).to eq "Raw Images - NASA Mars"
+      expect(scraper.main_page.title).to eq "Raw Images | Multimedia – NASA’s Mars Exploration Program "
     end
   end
