Initial tutorial, with Mechanize, the Markdown and the HTML
Updated tutorial, with Capybara, the Markdown, and the HTML
Notes from second time teaching (Capybara)
Maybe rename the topic to just "Automating the web"; Capybara is useful for far more than just scraping.
Pry fucked up for 3 or 4 students, IDK why, but their console input and output just became invisible. Spent an hour or two trying to debug it after the class, documented my failure here.
Try the Denver Post example again. If it works fine, we can keep it, but I hadn't found the config option to turn off the JS errors at that point, so that example wound up crashing Capybara for a lot of people.
I had to change up the lesson and switch it over to Amazon instead of isbnsearch.org, whose database suddenly seems empty -.-
On Amazon, there's an intermediate page of results, and they have to figure out how to click the link. This tripped a lot of them up, but I think it's good to have that in there. Maybe add something like this to the "let's do it together" portion. I got around it with:
browser.click_on browser.find('#resultsCol h3 a').text
IDK if there's a better way.
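One possible alternative, as a minimal sketch: ask the session for the first matching element and click it directly, instead of reading the link's text and handing it back to click_on. Capybara sessions respond to first(selector) and the element it returns responds to click. The helper name open_first_result is hypothetical, and the selector is just the one from the line above; I have not checked it against Amazon's current markup.

```ruby
# Hypothetical helper: click the first element matching the selector.
# `browser` is assumed to be a Capybara session (e.g. Capybara.current_session).
def open_first_result(browser, selector = '#resultsCol h3 a')
  # first() returns the first match even when several results are listed,
  # so there is no need to round-trip through the link text.
  browser.first(selector).click
end
```

In the loop this would replace the click_on line with open_first_result(browser). Depending on the Capybara version, first either returns nil or raises when nothing matches, so an ISBN with no results still needs handling.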
If Pry and Capybara hadn't kept fucking up, doing everything in pry would have been a good exercise, but with those failures, it was probably much harder. I'm hoping that turning off the JS errors is good enough; everyone's problems reduced dramatically after that.
Now that we have Phantom, we can expand the exercise. This was maybe the fourth one I tried, and it was chosen b/c it didn't require js. But other things might be more interesting, IDK.
I spent the last 15 minutes doing it for them in class. Didn't finish, but this is what I came up with:
# Setup poltergeist
require 'json'
require 'capybara/poltergeist'
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, js_errors: false)
end
Capybara.default_driver = :poltergeist
browser = Capybara.current_session

# go to amazon
isbns = %w[
  1405232501 082172388X 0764222228 0590474235 0672320835 0439539439
  0375434461 0752859978 0752860224 2745945475 0425032337 074459040X
  1860393225 1405232501 082172388X 0764290762 0590474235 0672320835
  3826672429 0375434461 0752859978 038079392X 2745945475 0425032337
  0701184361 1860393225 0758238614 0152049215 3826672429 1921656573
  0747203873 0701184361 0764222228 0439539439 0152049215 0752860224
  074459040X 0747203873 0764290762 038079392X 1921656573 0758238614
]

isbns.each do |isbn|
  # TODO: Skip if I already pulled this
  browser.visit 'http://amazon.com'
  browser.fill_in 'field-keywords', with: isbn
  browser.click_button 'Go'
  browser.click_on browser.find('#resultsCol h3 a').text

  # Each detail bullet looks like "Key: value"; collect them into a hash
  lis   = browser.all('#detail-bullets h3 li')
  texts = lis.map { |li| li.text }
  data  = texts.map { |text| text.split(":") }
               .each_with_object({}) { |(key, value), attributes| attributes[key.downcase] = value.strip }

  # Append one JSON object per line
  File.open('isbn-results', 'a') { |f| f.puts JSON.dump(data) }
end
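A rough sketch of how the TODO above might be filled in, using only the standard library. It assumes each line of the results file is one JSON object and that the ISBN is stored under an "isbn" key, which the loop above does not do yet. The helper names parse_bullets, pulled_isbns, and record are all hypothetical. The parser also splits on the first colon only, so a value that itself contains a colon is kept whole.

```ruby
require 'json'
require 'set'

# Hypothetical helper: turn "Key: value" strings into a hash.
# The limit of 2 keeps any later colons inside the value.
def parse_bullets(texts)
  texts.each_with_object({}) do |text, attributes|
    key, value = text.split(':', 2)
    next if value.nil? # skip bullets that have no colon at all
    attributes[key.strip.downcase] = value.strip
  end
end

# Hypothetical helper: the set of ISBNs already saved in the results file.
# Assumes one JSON object per line, each with an "isbn" key.
def pulled_isbns(path)
  return Set.new unless File.exist?(path)

  File.readlines(path, chomp: true)
      .reject(&:empty?)
      .map { |line| JSON.parse(line)['isbn'] }
      .compact
      .to_set
end

# Hypothetical helper: append one record, tagged with its ISBN, as a line of JSON.
def record(path, isbn, data)
  File.open(path, 'a') { |f| f.puts JSON.dump(data.merge('isbn' => isbn)) }
end
```

Inside the loop, this would look something like seen = pulled_isbns('isbn-results') before isbns.each, next if seen.include?(isbn) as the first line of the block, and record('isbn-results', isbn, data) in place of the File.open line.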
Switch from open-uri to RestClient
not common to use open-uri in prod, plus it monkey-patches Kernel#open...
though, to be fair, Kernel#open isn't something you'd use in maintainable code,
it really is intended for scripts
My plan going in this last time
1st Hour
Learning Objectives
Understand the internet
if we do, we could create such a tool
Scraping with Nokogiri
for the clone wars project
Increase familiarity with pry and CSS selectors
Imagine
What could you do with such a tool?
Goals (we'll do all of these together except the last one)
GET a webpage with Nokogiri
Find all the page's methods that deal with links
Look at its links
Select the links we want, click them
Look at its forms
Select the forms we want, click them
Use this information to take the list of ISBNs and extract the book data
Given that you know this
2nd Hour
Work on the project
I'll be around if anyone has
I ran out of time and did not get to go extensively through the Mechanize portion, otherwise they wouldn't have gotten to play with it themselves.
I also had them think about what they could use such a tool to do, hoping to spike their imaginations so they would have some context or hypothetical goals in mind when we went through it. IDK if this was valuable or not.
Rachel's feedback (mostly applies to my teaching style)
Rachel's feedback
Good:
* giving students time to catch up
* "cold-calling" student to explain what's happening
* having students think about the thing, then doing it
Bad:
* cohesiveness between example and result (pressing return too quickly after typing)
- throw a semicolon on the end so I can try it out
- if I'm going to go off exploring, tell them we're exploring, not following
* confusion between why we have a text file and a pry session
- start in pry, then as we show that something does what we think,
copy/paste it into the editor, so it feels like we're exploring and learning
and then aggregating our findings into a file that we can then reference
and use later on.
Resources
Notes from first time teaching (Mechanize)
What I would change next time:
Which will open up many more possibilities.
I approximately figured out how to do it, here: https://gist.github.com/JoshCheek/1ef1c6fbe7ff7ee28de4#file-using_capybara_with_poltergeist_to_get_the_data-rb
but haven't updated the material yet.