Replication material for a paper on copyright control, coauthored with Pierre Gueydier.
Repository contents:

  • afsp2013
  • data
  • google-trends
  • plots
  • tables
  • README.md
  • functions.r
  • make.r


Code related to a working paper that was first presented at the AFSP Annual Meeting in Paris, 2013. See Section 1 of this paper and its appendix, or read the HOWTO below for a technical summary.

DATA

The scraper currently collects slightly over 6,300 articles from:

  • ecrans.fr (including articles from liberation.fr)
  • lemonde.fr (first lines only for paid content)
  • lesechos.fr (left-censored to December 2011)
  • lefigaro.fr (first lines only for paid content)
  • numerama.com (including old articles from ratiatum.com)
  • zdnet.fr

HOWTO

The entry point is make.r (a usage sketch follows the list):

  • get_articles will scrape the news sources (adjust the page counters to each website's current search results to update the data)
  • get_corpus will extract all entities and list the most common ones (set the minimum frequency with threshold; defaults to 10)
  • get_ranking will export the top 15 central nodes of the co-occurrence network to the tables folder, in Markdown format
  • get_network will return the co-occurrence network, optionally trimmed to its top quantile of weighted edges (set the cutoff with threshold; defaults to 0)
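
A minimal run might then look like the sketch below. The function names and the threshold arguments come from the list above, but sourcing functions.r first and the exact calling conventions are assumptions, so check the scripts before running.

    # load the helper functions (assumed to define the four functions above)
    source("functions.r")

    # scrape the news sources; adjust the page counters beforehand
    get_articles()

    # extract entities, keeping those that appear at least 10 times
    get_corpus(threshold = 10)

    # export the top 15 central nodes to the tables folder as Markdown
    get_ranking()

    # build the network, keeping only the top 5% of weighted edges (assumed
    # reading of the quantile cutoff; threshold = 0 keeps everything)
    net <- get_network(threshold = 0.95)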

Tables

  • corpus.terms.csv – a list of all entities, ordered by their raw counts
  • corpus.freqs.csv – a list of entities found in each article
  • corpus.edges.csv – a list of undirected weighted network ties
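
These files are plain CSV, so they can be inspected directly. A quick look in R might run as below, assuming the files are written to the tables folder (adjust the paths if they live elsewhere); the column names are not documented here.

    terms <- read.csv("tables/corpus.terms.csv") # entities by raw counts
    freqs <- read.csv("tables/corpus.freqs.csv") # entities found in each article
    edges <- read.csv("tables/corpus.edges.csv") # undirected weighted ties

    head(terms)  # most frequent entities first
    str(edges)   # check the (undocumented) column layout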

Notes

  • The weighting scheme is inversely proportional to the number of entity pairs in each article: each pair of entities co-occurring in an article that mentions n entities is weighted in proportion to 1 / (n(n − 1)/2).
  • The weighted degree formula is by Tore Opsahl and uses an alpha parameter of 1, so the measure reduces to node strength, i.e. the sum of a node's edge weights.
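
For concreteness, the sketch below reimplements both notes in base R on a toy article; the entity names are made up, and this illustrates the formulas rather than quoting functions.r.

    # one toy article mentioning three entities yields choose(3, 2) = 3 pairs,
    # each weighted 1/3 under the inverse pair-count scheme
    entities <- c("hadopi", "sacem", "quadrature")  # hypothetical entities
    pairs <- t(combn(entities, 2))
    edges <- data.frame(i = pairs[, 1], j = pairs[, 2],
                        w = 1 / nrow(pairs))

    # Opsahl's weighted degree: k^(1 - alpha) * s^alpha, where k is a node's
    # number of ties and s the sum of their weights; alpha = 1 gives s
    alpha <- 1
    nodes <- unique(c(edges$i, edges$j))
    sapply(nodes, function(n) {
      inc <- edges$i == n | edges$j == n
      k <- sum(inc)           # binary degree
      s <- sum(edges$w[inc])  # strength
      k^(1 - alpha) * s^alpha
    })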