
July 28, 2022

alejandropaz edited this page Jul 28, 2022 · 2 revisions

Agenda

  • flagging issue -- any insights?
  • documentation on the new order of postprocessor input: conversion to pandas now happens before conversion to Dask
    • next week consult with Nat about this
  • on Jul 19: resume the WaPo/Foxnews twitter
  • visualization
  • Alejandro will send scope for Israeli and Palestinian news domains
  • Twitter embedding issue - this week
  • to discuss next meeting:
    • how to cut a release
    • writing a paper about MediaCAT and architecture

Flagging

  • the URL expander seems to have set off alarm bells; the question is whether there is something we could do differently

Crawls

  • WaPo/Foxnews
  • postprocessing NYT archive politics

Backburner

  • Apify pre-navigation: probably need a blacklist for each domain, but could look into it in the future
  • using crawler proxies
  • adding to regular postprocessor output:
    1. any non-scope domain hyperlink that ends in .co.il
    2. any link to a tweet or twitter handle
    • This is a bit outside our normal functionality, so I will put it on the backburner for now.
  • what to do with htz.li
  • finding language function
  • image_reference function
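
For reference when this comes off the backburner, the two proposed additions to the postprocessor output (non-scope `.co.il` hyperlinks, and links to tweets or Twitter handles) could be sketched roughly as below. This is only an illustration, not the postprocessor's actual code: the function names, the `TWITTER_HOSTS` set, and the shape of `scope_domains` are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical set of hosts treated as Twitter; adjust as needed.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def hostname(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def is_out_of_scope_co_il(url, scope_domains):
    """True if the URL's host ends in .co.il and is not in the crawl scope."""
    host = hostname(url)
    if host.startswith("www."):
        host = host[4:]
    # A host is in scope if it equals a scope domain or is a subdomain of one.
    in_scope = any(host == d or host.endswith("." + d) for d in scope_domains)
    return host.endswith(".co.il") and not in_scope


def is_twitter_link(url):
    """True if the URL points at a tweet or a Twitter handle page."""
    return hostname(url) in TWITTER_HOSTS


def extra_links(urls, scope_domains):
    """Keep only the links matching either of the two proposed rules."""
    return [
        u for u in urls
        if is_out_of_scope_co_il(u, scope_domains) or is_twitter_link(u)
    ]


if __name__ == "__main__":
    # Made-up sample input, purely for illustration.
    sample = [
        "https://www.haaretz.co.il/news/1",
        "https://www.ynet.co.il/a",
        "https://twitter.com/user/status/123",
        "https://example.com/x",
    ]
    print(extra_links(sample, {"haaretz.co.il"}))
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives when `.co.il` or `twitter.com` appears only in a path or query string.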