
arizonee/twitter_scraper

 
 


Twitter web scraper for specific pages

A Python web scraper that uses the Selenium and BeautifulSoup modules to extract text from Twitter pages.

The program uses Selenium (with ChromeDriver) to automate user behaviour in a browser session: it opens a specific Twitter page (no login required) and scrolls so that the dynamically loaded content is fetched. Once the page is rendered, the HTML is extracted and parsed with BeautifulSoup. Note: scraping continues until either 1) the end of the feed is reached, or 2) the run is interrupted manually by killing the connection.
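The scroll-until-end behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `scroll_to_end` and the `pause` and `max_rounds` parameters are assumptions, and the end of the feed is approximated by the page height no longer growing. It relies only on `execute_script` and `page_source`, both of which a Selenium WebDriver provides.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=None):
    """Scroll a page until its height stops growing, then return its HTML.

    `driver` is expected to behave like a Selenium WebDriver, i.e. to offer
    `execute_script()` and `page_source`. `pause` (seconds to wait for new
    content) and `max_rounds` (optional safety cap) are hypothetical knobs.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Jump to the bottom so the page requests the next batch of tweets.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the dynamic content time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # nothing new was loaded: treat this as the end of the feed
        last_height = new_height
    return driver.page_source
```

With Selenium installed, the function would typically be called after `driver = webdriver.Chrome()` and `driver.get(url)`, and the returned HTML passed to `BeautifulSoup(html, "html.parser")`. A fixed pause is the simplest choice; a slow connection may need a longer one to avoid stopping early.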

The program extracts the following fields and writes them to a CSV file, with punctuation and other non-text characters removed:

  • full tweet text from each Twitter page
  • date
  • header
  • url
  • user name
  • popularity metrics (string containing retweets/favourites)
  • like_fave: integer value for number of times 'favorited'
  • share_rtwt: integer value for number of times 'retweeted'
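As a rough sketch of how the fields above might be cleaned and written out, the example below uses only the standard library. Apart from `like_fave` and `share_rtwt`, the column names are assumptions, as are the helper names `clean_text`, `parse_metrics` and `write_rows`. The format of the popularity string (for example `"12 retweets 34 likes"`) is also an assumption, since the README does not specify it.

```python
import csv
import io
import re

# Column names are hypothetical except like_fave and share_rtwt.
FIELDS = ["text", "date", "header", "url", "user_name",
          "metrics", "like_fave", "share_rtwt"]


def clean_text(text):
    """Replace punctuation and other non-word characters, collapse spaces."""
    no_punct = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", no_punct).strip()


def parse_metrics(metrics):
    """Pull integer counts out of an assumed '<n> retweets <m> likes' string."""
    def count(pattern):
        match = re.search(rf"(\d[\d,]*)\s+(?:{pattern})", metrics, re.IGNORECASE)
        # Missing counts default to 0; thousands separators are dropped.
        return int(match.group(1).replace(",", "")) if match else 0

    return {
        "share_rtwt": count(r"retweets?"),
        "like_fave": count(r"likes?|favou?rites?"),
    }


def write_rows(rows, handle):
    """Write scraped rows (dicts) to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        out["text"] = clean_text(row["text"])
        out.update(parse_metrics(row["metrics"]))
        writer.writerow(out)


if __name__ == "__main__":
    sample = [{
        "text": "Hello, world! #python",
        "date": "2019-01-01",
        "header": "Example header",
        "url": "https://twitter.com/example",
        "user_name": "example",
        "metrics": "12 retweets 1,034 likes",
    }]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
```

Only the tweet text is cleaned here; whether the other columns should also be stripped of punctuation is left open, because doing so would break values such as URLs.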

Twitter

Selenium Browser Automation
