GitHub - SocialScienceDataLab/Intro-to-web-scraping-with-R

Code and data for SSDL session "Three easy-to-learn tools to scrape data from the Web with R", June 15 2016

This repository contains code and data to implement basic scraping routines with R. In particular, the code shows how to

use regular expressions to extract data from raw text (or websites)
use XPath for static webpage scraping
tap APIs from within R
scrape data from dynamic webpages (i.e. JavaScript-generated content) using AJAX and Selenium

Obviously, these are four tools, not three. However, regular expressions are never easy to learn, so the title is still valid.
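
To give a feel for the first two tools before opening the scripts, here is a minimal sketch that is not taken from the repository. It runs a regular expression and an XPath query against a small, made-up HTML string (the `html` content, the e-mail addresses, and the variable names are all assumptions for illustration). The regular expression part uses base R only; the XPath part uses the `XML` package from the setup list and is skipped if that package is not installed.

```r
# Toy HTML standing in for a downloaded page (hypothetical content).
html <- '<html><body>
  <p class="contact">Mail alice@example.org or bob@example.com.</p>
  <ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>
</body></html>'

# 1. Regular expressions (base R): extract every e-mail-like string
#    from the raw text. The pattern is deliberately simple, not RFC-complete.
email_pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
emails <- regmatches(html, gregexpr(email_pattern, html))[[1]]
print(emails)

# 2. XPath (XML package): parse the HTML and select nodes by their
#    position in the document tree instead of by text pattern.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(html, asText = TRUE)
  link_text <- XML::xpathSApply(doc, "//ul/li/a", XML::xmlValue)
  link_href <- XML::xpathSApply(doc, "//ul/li/a", XML::xmlGetAttr, "href")
  print(data.frame(text = link_text, href = link_href))
}
```

The regular expression works on the page as one long string, while the XPath query relies on the parsed structure, which is usually the more robust choice when the data sits in well-formed HTML elements.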

Technical setup

The scripts were tested on a Mac running R version 3.3.0. To run the code, follow these instructions:

make sure that the newest version of R (currently 3.3.0; available here) is installed on your computer
install the newest stable version of RStudio (available here)

install the required packages with the following R code:

  pkgs <- c('RCurl', 'XML', 'stringr', 'jsonlite', 'httr',
  'rvest', 'pdftools', 'devtools', 'RSelenium', 'plyr',
  'dplyr', 'wikipediatrend', 'twitteR', 'streamR', 'd3Network')
  install.packages(pkgs)

make sure Firefox is installed on your machine (available here)
install Java (available here)
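
If some of the packages are already present, reinstalling all of them is unnecessary. The following is a minimal sketch, not part of the original scripts, that installs only what is missing. The helper name `missing_pkgs` is hypothetical; `installed.packages()` and `install.packages()` are standard R functions. Installation is only triggered in an interactive session, so sourcing the snippet from a script has no side effects.

```r
# Return the entries of `wanted` that do not appear in `installed`.
# `missing_pkgs` is a hypothetical helper name used for illustration.
missing_pkgs <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

pkgs <- c('RCurl', 'XML', 'stringr', 'jsonlite', 'httr',
          'rvest', 'pdftools', 'devtools', 'RSelenium', 'plyr',
          'dplyr', 'wikipediatrend', 'twitteR', 'streamR', 'd3Network')

to_install <- missing_pkgs(pkgs)

if (length(to_install) == 0) {
  message("All packages are already installed.")
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Only install when run interactively (e.g. from the RStudio console).
  if (interactive()) install.packages(to_install)
}
```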

