UTMediaCAT/post-processor

Post-processor

Handles post-processing of results from the domain or Twitter crawlers.

  • processor/ contains the post-processor scripts
  • archived/ contains the old back-end scripts
  • scripts/ contains automation scripts that prepare data for post-processing:
    • cleaner removes unnecessary headers and duplicate records from the Twitter crawler's output CSV files
    • metascraper can be used to populate the title_metascraper, author_metascraper, date, html_content, and article_text fields in the JSON output produced by the domain crawler
    • url_expander expands shortened URLs in tweets
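
To illustrate the kind of cleanup the cleaner step describes, here is a minimal sketch in Python. It is not the repository's actual script: the function name `clean_csv` and the sample columns `id` and `text` are hypothetical, and it assumes that "unnecessary headers" means repeated copies of the first header row and that duplicates are rows identical in every field.

```python
import csv
import io


def clean_csv(text: str) -> str:
    """Drop repeated header rows and exact duplicate records from CSV text.

    Hypothetical helper for illustration; the first row is treated as the header.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ""  # empty input, nothing to clean

    seen = set()  # rows already written, stored as tuples
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)

    for row in reader:
        # Skip blank lines and any repeated copy of the header row.
        if not row or row == header:
            continue
        key = tuple(row)
        if key in seen:
            continue  # exact duplicate record
        seen.add(key)
        writer.writerow(row)

    return out.getvalue()


# Hypothetical sample: two concatenated crawler outputs with a repeated header.
sample = "id,text\n1,hello\nid,text\n1,hello\n2,world\n"
cleaned = clean_csv(sample)
print(cleaned)
```

The sketch keeps the first occurrence of each record and preserves input order, which keeps the output stable across repeated runs on the same files.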