This repository was archived by the owner on Jun 24, 2019. It is now read-only.

Replication material for a paper on copyright control, coauthored with Pierre Gueydier.

briatte/afsp2013

Code related to a working paper that was first presented at the AFSP Annual Meeting in Paris, 2013. See Section 1 of this paper and its appendix, or read the HOWTO below for a technical summary.

DATA

The scraper currently collects slightly over 6,300 articles from the following sources:

  • ecrans.fr (including articles from liberation.fr)
  • lemonde.fr (first lines only for paid content)
  • lesechos.fr (left-censored: articles from December 2011 onward only)
  • lefigaro.fr (first lines only for paid content)
  • numerama.com (including old articles from ratiatum.com)
  • zdnet.fr

HOWTO

The entry point is make.r:

  • get_articles scrapes the news sources; to update the data, adjust the page counters to match the current number of search result pages on each website
  • get_corpus extracts all entities and lists the most common ones (set the minimum frequency with threshold; defaults to 10)
  • get_ranking exports the 15 most central nodes of the co-occurrence network to the tables folder, in Markdown format
  • get_network returns the co-occurrence network, optionally trimmed to its top quantile of weighted edges (set the quantile with threshold; defaults to 0, which keeps all edges)
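The two threshold parameters above can be sketched as follows. This is a minimal Python illustration, not the repository's R code: the names frequent_entities, quantile and trim_edges, the list-of-lists input and the dict-of-pairs edge format are all hypothetical. The quantile helper follows R's default (type 7) interpolation rule; whether get_network uses that rule, or keeps edges at or above rather than strictly above the cutoff, is an assumption.

```python
import math
from collections import Counter


def frequent_entities(articles, threshold=10):
    """Return (entity, count) pairs seen at least `threshold` times, most frequent first.

    `articles` is a list of entity lists, one per article (hypothetical input format).
    """
    counts = Counter(entity for entities in articles for entity in entities)
    return [(entity, n) for entity, n in counts.most_common() if n >= threshold]


def quantile(values, p):
    """Linear-interpolation quantile, the same rule as R's default quantile (type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def trim_edges(edges, threshold=0.0):
    """Keep edges whose weight is at or above the `threshold` quantile of all weights.

    `edges` maps (entity_a, entity_b) tuples to weights; threshold=0 keeps every edge.
    """
    if not edges:
        return {}
    cutoff = quantile(edges.values(), threshold)
    return {pair: w for pair, w in edges.items() if w >= cutoff}
```

For instance, trim_edges(edges, 0.9) would keep roughly the top 10% of ties by weight, while the default of 0 returns the network unchanged.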

Tables

  • corpus.terms.csv – a list of all entities, ordered by their raw counts
  • corpus.freqs.csv – a list of entities found in each article
  • corpus.edges.csv – a list of undirected weighted network ties
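To illustrate how the edge list relates to the per-article entity lists, here is a minimal Python sketch (the repository itself is in R). It assumes that the weighting described in the Notes gives each co-occurring pair 1 / (number of entity pairs in the article) and that these weights are summed across articles; the function name cooccurrence_edges and its list-of-lists input are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(articles):
    """Build undirected weighted ties from per-article entity lists.

    Assumption: each pair of entities co-occurring in an article receives
    1 / (number of entity pairs in that article), summed over all articles.
    """
    edges = defaultdict(float)
    for entities in articles:
        unique = sorted(set(entities))  # dedupe; sorting gives a canonical (a, b) key
        pairs = list(combinations(unique, 2))
        if not pairs:
            continue  # articles with fewer than two entities add no ties
        weight = 1.0 / len(pairs)
        for pair in pairs:
            edges[pair] += weight
    return dict(edges)
```

With this scheme, an article naming only two entities contributes a full unit of weight to their tie, whereas an article naming three entities spreads one unit over its three pairs.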

Notes

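  • Tie weights are inversely proportional to the number of entity pairs in each article: the more entities an article mentions, the less each of its co-occurrences counts.
  • The weighted degree formula is by Tore Opsahl and uses an alpha parameter of 1, which reduces it to node strength, i.e. the sum of a node's tie weights.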
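A minimal Python sketch of Opsahl's generalized degree, assuming that is the formula meant: for node i with k_i ties and strength s_i (the sum of its tie weights), the score is k_i^(1 - alpha) * s_i^alpha. With alpha = 1 it equals strength; with alpha = 0 it equals the unweighted degree. The helper names opsahl_degree and top_nodes and the dict-of-pairs edge format are hypothetical; top_nodes only illustrates how a top-15 ranking like the one exported by get_ranking could be derived.

```python
from collections import defaultdict


def opsahl_degree(edges, alpha=1.0):
    """Generalized weighted degree: k_i ** (1 - alpha) * s_i ** alpha.

    k_i is the number of ties of node i, s_i the sum of their weights.
    `edges` maps undirected (node_a, node_b) tuples to weights.
    """
    k = defaultdict(int)
    s = defaultdict(float)
    for (a, b), w in edges.items():
        for node in (a, b):  # undirected: each tie counts for both ends
            k[node] += 1
            s[node] += w
    return {node: k[node] ** (1 - alpha) * s[node] ** alpha for node in k}


def top_nodes(edges, n=15, alpha=1.0):
    """Return the n nodes with the highest generalized degree, highest first."""
    scores = opsahl_degree(edges, alpha)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Intermediate values of alpha trade off the number of ties against their total weight; alpha = 1, as used here, ranks entities purely by accumulated co-occurrence weight.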
