README

How to scrape

The basic development flow is:

marginalrevolution.com is scraped by a scrapy Python script, which outputs JSON.

That JSON is the input for rake db:seed, which creates incomplete Book objects.

rake amazon_scrape hits the Amazon API and outputs JSON with more book info.

rake seed_amazon uses both the existing Books and that JSON file to update the Book records.
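To make the two seeding passes concrete, here is a minimal plain-Ruby sketch of the idea, not the actual rake tasks. It assumes both JSON files are arrays of objects keyed by `asin`; the `title` and `author` fields, the `BookStub` struct, and the method names `seed_books` and `fill_from_amazon` are hypothetical stand-ins for whatever the real tasks use.

```ruby
require "json"

# Hypothetical stand-in for the Book model. Only `asin` and `post_title`
# appear in this README; `title` and `author` are assumed field names.
BookStub = Struct.new(:asin, :post_title, :title, :author, keyword_init: true)

# Roughly what db:seed does: build incomplete books from the scrapy JSON.
def seed_books(scraped_json)
  JSON.parse(scraped_json).map do |row|
    BookStub.new(asin: row["asin"], post_title: row["post_title"])
  end
end

# Roughly what seed_amazon does: fill in missing fields from the Amazon
# JSON, matching existing books on ASIN.
def fill_from_amazon(books, amazon_json)
  details = JSON.parse(amazon_json).to_h { |row| [row["asin"], row] }
  books.each do |book|
    info = details[book.asin]
    next unless info # leave the book incomplete if Amazon returned nothing

    book.title = info["title"]
    book.author = info["author"]
  end
end
```

Books without a matching Amazon record stay incomplete, which is consistent with the note under deployment that the site is broken between the two seeding steps.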

Update spiderX.py with the latest month/year in the marginalrevolution.com URL, then install scrapy and run the spider:

    $ python --version
    Python 2.7.14
    $ sudo apt-get install python-pip
    $ pip install scrapy
    $ cd scrapy    (within the repo for marginalrevolutionbooks)
    $ /home/dennis/.local/bin/scrapy runspider spider5.py -o 2017-12-03-mr.json

Edit the filename in db/seeds.rb, then seed the database:

    $ be rake db:seed

This creates incomplete Book objects in the db.

Question: what is the difference between db:seed and rake seed_amazon?

To run the Amazon scrape rake tasks you need to set aws_secret_access_key and aws_access_key_id. See the vacuum gem docs.

Now the output file is ready to be imported into postgres, on development or production:

    $ be rake seed_amazon
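The manual steps above could be chained so that a failure stops the run early. The following is only a sketch under assumptions: it treats `be` as a shell alias for `bundle exec`, reuses the example spider and output filenames from this README, and glosses over the fact that the spider runs from the scrapy directory while the rake tasks run from the repo root. `PIPELINE` and `run_pipeline` are hypothetical names.

```ruby
# The four manual steps, in the order given in the development flow.
# Filenames are the examples used in this README, not fixed values.
PIPELINE = [
  ["scrapy", "runspider", "spider5.py", "-o", "2017-12-03-mr.json"],
  ["bundle", "exec", "rake", "db:seed"],
  ["bundle", "exec", "rake", "amazon_scrape"],
  ["bundle", "exec", "rake", "seed_amazon"],
].freeze

# Runs each command in order and stops at the first failure.
# `runner` is injectable so the sequencing can be checked without shelling out.
# Returns the failing command, or nil when every step succeeded.
def run_pipeline(steps = PIPELINE, runner: ->(cmd) { system(*cmd) })
  steps.each do |cmd|
    return cmd unless runner.call(cmd)
  end
  nil
end
```

Calling `run_pipeline` with no arguments would shell out to the real commands; passing a lambda as `runner:` gives a dry run.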

Deployment of new books

heroku run rake db:seed

The site is broken in between running these two commands.

heroku run rake seed_amazon

Make secrets.txt look like:

    export AWS_ACCESS_KEY_ID=key
    export AWS_SECRET_ACCESS_KEY=secret

Then source secrets.txt before running rake tasks:

    $ source secrets.txt
    # now can run rake tasks
    $ be rake amazon_scrape
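Forgetting to source secrets.txt is easy to do, so a rake task could check for the two variables up front. This is a minimal sketch, not code from the repo; `REQUIRED_AWS_VARS`, `missing_aws_vars`, and `ensure_aws_credentials!` are hypothetical names.

```ruby
# Names of the variables exported by secrets.txt.
REQUIRED_AWS_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

# Returns the required variables that are unset or empty in `env`.
def missing_aws_vars(env = ENV)
  REQUIRED_AWS_VARS.select { |name| env[name].to_s.empty? }
end

# Raises a descriptive error up front instead of letting the
# Amazon request fail later with a less obvious message.
def ensure_aws_credentials!(env = ENV)
  missing = missing_aws_vars(env)
  return true if missing.empty?

  raise "Missing #{missing.join(', ')}; run `source secrets.txt` first"
end
```

Calling `ensure_aws_credentials!` at the top of the amazon_scrape task would be one place to use it.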

Sorted Action Items

  • Add Most Posted

    asins = Book.group(:asin).count.sort_by(&:last).last(10).reverse.to_h.keys
    b = Book.where(asin: asins.first)
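For reference, the same chain can be spelled out in plain Ruby on an array of ASIN strings, which makes each step easier to check outside the Rails console. This is a sketch of the logic only; `most_posted_asins` is a hypothetical helper, not something in the app.

```ruby
# Plain-Ruby equivalent of
#   Book.group(:asin).count.sort_by(&:last).last(10).reverse.to_h.keys
# working on an array of ASIN strings instead of the books table.
def most_posted_asins(asins, limit = 10)
  # Count occurrences per ASIN, like group(:asin).count.
  counts = asins.each_with_object(Hash.new(0)) { |asin, h| h[asin] += 1 }

  counts.sort_by(&:last) # ascending by count
        .last(limit)     # keep the `limit` most frequent
        .reverse         # most posted first
        .to_h
        .keys
end
```

Ties between ASINs with the same count come back in no guaranteed order, since `sort_by` is not stable.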

  • Figure out why some books aren't parsing: https://twitter.com/nyarlathotepesq/status/871479869946957830
  • automate scrape + seeding -> fix trailing comma on rake amazon_scrape
  • write rails cron job to run scrapy, rake db:seed, rake amazon_scrape, rake seed_amazon
  • add Previous Year Next Year buttons
  • experiment with a grid of book cover images instead of text-heavy rows, Book#show
  • Make site beautiful on mobile
  • Make site beautiful on desktop
  • send out email (1st and 15th)
  • write blog post analyzing data set, link to mrbooks
  • write blog post post-mortem on traffic surge, tech stack, etc.
  • submit to Product Hunt
  • submit to Hacker News
  • Make a "Tyler's Favorites" section, Book.where("post_title LIKE '%best of%'"), my favorites, something like that
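On the trailing comma item: the README does not show the amazon_scrape output, so the following is a guess at the problem, namely that records are written one at a time with a comma after each, leaving `,]` at the end of the file. Under that assumption, collecting the records and serializing once avoids the issue, and a small cleanup handles files that already exist. `books_to_json` and `strip_trailing_comma` are hypothetical helpers.

```ruby
require "json"

# Writing records by hand with a comma after each one leaves invalid JSON:
#   [{"asin":"A1"},{"asin":"B2"},]
# Collecting them first and serializing once cannot produce a trailing comma.
def books_to_json(records)
  JSON.pretty_generate(records)
end

# Stopgap for files that already have the problem. Assumes the shape is
# a comma, optional whitespace, then the closing bracket at end of file.
def strip_trailing_comma(text)
  text.sub(/,\s*\]\s*\z/, "]")
end
```

With the output valid, the seeding step could read the file with `JSON.parse` directly, which is a prerequisite for the cron job item.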

Set up postgres on debian for rails

https://wiki.debian.org/PostgreSql

    $ psql -U postgres
    postgres=# CREATE USER mrbooks WITH PASSWORD 'mrbooks';
    postgres=# ALTER USER mrbooks CREATEDB;

Docs

scrapy