Apple News Scraper
This repository provides code and data used in the following paper:
Bandy, Jack and Nicholas Diakopoulos. "Auditing News Curation Systems: A Case Study Examining Algorithmic and Editorial Logic in Apple News." To Appear in Proceedings of the Fourteenth International AAAI Conference on Web and Social Media (ICWSM 2020).
Installation and Setup Instructions
Download appium-desktop: https://github.com/appium/appium-desktop/releases/latest
(You can try the brew/npm installation - https://appium.io - but those releases have been buggier in my experience)
After cloning this repository onto your computer, list the available devices by running the following in your terminal:

    instruments -s devices
- Choose a device, something like
iPhone XS Max (12.1) [XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX] (Simulator)
Open get_stories.py and replace the first few lines with your device information. Afterwards, it may look something like:

    # user-defined variables
    device_name_and_os = 'iPhone XS Max (12.1)'
    device_os = '12.1'
    udid = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
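The mapping from a device line to the three variables above can be sketched in Python. This is only an illustration: the helper name parse_device_line and the regular expression are assumptions based on the single example line shown here, not part of get_stories.py, and other device lines may be formatted differently.

```python
import re

# Assumed line format, inferred from the example in this README:
#   <name> (<os version>) [<UDID>] (Simulator)
DEVICE_LINE = re.compile(
    r"^(?P<name>.+?) \((?P<os>[\d.]+)\) \[(?P<udid>[0-9A-Za-z-]+)\]"
)


def parse_device_line(line):
    """Return (device_name_and_os, device_os, udid), or None if the line does not match."""
    match = DEVICE_LINE.match(line.strip())
    if match is None:
        return None
    device_os = match.group("os")
    device_name_and_os = f"{match.group('name')} ({device_os})"
    return device_name_and_os, device_os, match.group("udid")


example = "iPhone XS Max (12.1) [XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX] (Simulator)"
print(parse_device_line(example))
```

Copying the three returned values into the top of get_stories.py by hand works just as well; the sketch only makes explicit which part of the line goes where.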
- Change the output folder, where you want to save the data:
    # easy relative path, keep data in repository
    output_folder = 'data_output/'

    # put data in a folder on the desktop
    output_folder = '~/apple_news_data/'
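One thing to watch with the second option: Python does not expand '~' on its own when opening files, and this README does not say whether get_stories.py does so. If it does not, a small helper like the following could be used to turn either form into an absolute path. The name expand_output_folder is hypothetical and not part of the repository.

```python
import os


def expand_output_folder(output_folder):
    """Expand a leading '~' and return an absolute path with a trailing separator."""
    path = os.path.abspath(os.path.expanduser(output_folder))
    # abspath drops the trailing slash; add it back in case the
    # caller builds file names by string concatenation (an assumption).
    return os.path.join(path, "")


print(expand_output_folder("~/apple_news_data/"))
print(expand_output_folder("data_output/"))
```

An absolute path is also what cron needs, since cron jobs do not start in the repository directory.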
Execution should be as easy as:

    python get_stories.py
To run repeatedly, I recommend cron. Just make sure you use absolute paths. For example, to run collection every five minutes, add something like this to your crontab:
*/5 * * * * /usr/local/bin/python /Users/jack/dev/apple-news-scraper/get_stories.py
If you're in a hurry, you can also just hack out a shell script:
    while true
    do
        python get_stories.py &
        sleep 300
    done
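The same loop can be sketched in Python if you would rather not use a shell script. Note one deliberate difference: the shell version starts each run in the background with &, while this sketch waits for each run to finish before sleeping, so runs never overlap. The function name run_repeatedly and its parameters are illustrative assumptions.

```python
import subprocess
import sys
import time


def run_repeatedly(command, interval_seconds, max_runs=None, sleep=time.sleep):
    """Run `command`, wait for it to finish, sleep, and repeat.

    max_runs=None loops until interrupted; an integer stops after that
    many runs. Returns the number of runs completed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # Unlike `python get_stories.py &`, this blocks until the run ends.
        subprocess.run(command)
        runs += 1
        sleep(interval_seconds)
    return runs


# Example: collect every five minutes until interrupted with Ctrl-C.
# run_repeatedly([sys.executable, "get_stories.py"], 300)
```

Because the sleep starts after each run completes, the actual period is the run time plus the interval; cron remains the better choice when collection must start on a fixed schedule.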