This repository has been archived by the owner. It is now read-only.

README.md

Code related to a working paper that was first presented at the AFSP Annual Meeting in Paris, 2013. See Section 1 of this paper and its appendix, or read the HOWTO below for a technical summary.

DATA

The scraper currently collects slightly over 6,300 articles from the following sources:

  • ecrans.fr (including articles from liberation.fr)
  • lemonde.fr (first lines only for paid content)
  • lesechos.fr (left-censored: coverage starts in December 2011)
  • lefigaro.fr (first lines only for paid content)
  • numerama.com (including old articles from ratiatum.com)
  • zdnet.fr

HOWTO

The entry point is make.r:

  • get_articles scrapes the news sources (to update the data, adjust the page counters to match each website's current search results)
  • get_corpus extracts all entities and lists the most common ones (set the minimum frequency with threshold; defaults to 10)
  • get_ranking exports the top 15 central nodes of the co-occurrence network to the tables folder, in Markdown format
  • get_network returns the co-occurrence network, optionally trimmed to its top quantile of weighted edges (set with threshold; defaults to 0)
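
The two threshold settings can be illustrated with a short Python sketch. It is not the repository's R code: the helper names frequent_entities and trim_edges, the toy data, and the reading of "top quantile" as "keep edges whose weight is at or above the given quantile of all weights" are assumptions made for illustration only.

```python
import math
from collections import Counter


def frequent_entities(articles, threshold=10):
    """Return (entity, raw count) pairs with count >= threshold, most common first."""
    counts = Counter(entity for article in articles for entity in article)
    return [(entity, n) for entity, n in counts.most_common() if n >= threshold]


def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def trim_edges(edges, threshold=0.0):
    """Keep edges whose weight is at or above the `threshold` quantile.

    Assumption: a threshold of 0 keeps every edge, i.e. no trimming.
    """
    if not edges:
        return {}
    cutoff = quantile(edges.values(), threshold)
    return {pair: w for pair, w in edges.items() if w >= cutoff}


# Toy data (hypothetical): each article is a list of entity mentions.
articles = [["A", "B"], ["A", "C"], ["A"]]
common = frequent_entities(articles, threshold=2)

# Toy weighted edge list (hypothetical): (entity, entity) -> weight.
edges = {("A", "B"): 1.0, ("A", "C"): 0.5, ("B", "C"): 0.25, ("C", "D"): 0.75}
untrimmed = trim_edges(edges)        # default threshold of 0
upper_half = trim_edges(edges, 0.5)  # median cutoff
```

With a threshold of 0 the cutoff equals the smallest weight, so the network is returned untrimmed, which matches the default described above.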

Tables

  • corpus.terms.csv – a list of all entities, ordered by their raw counts
  • corpus.freqs.csv – a list of entities found in each article
  • corpus.edges.csv – a list of undirected weighted network ties
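
How the three tables relate to one another can be sketched in Python. This is an illustration under assumptions, not the repository's export code: the column names (entity, count, article, source, target, weight), the article identifiers, and the toy entities are all hypothetical, and the actual files may be laid out differently.

```python
import csv
import io
from collections import Counter


def to_csv(header, rows):
    """Serialize a header and rows to CSV text (in memory, no files written)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Toy corpus (hypothetical): article id -> entities found in that article.
articles = {"a1": ["A", "B"], "a2": ["A", "C"]}

# Terms table: every entity with its raw count, most frequent first.
counts = Counter(e for entities in articles.values() for e in entities)
terms = to_csv(["entity", "count"], counts.most_common())

# Freqs table: one row per (article, entity) occurrence.
freqs = to_csv(
    ["article", "entity"],
    [(aid, e) for aid, entities in articles.items() for e in entities],
)

# Edges table: one row per undirected weighted tie between two entities.
edges = to_csv(["source", "target", "weight"], [("A", "B", 1.0), ("A", "C", 1.0)])
```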

Notes

  • Edge weights are inversely proportional to the number of entity pairs in each article.
  • The weighted degree formula is by Tore Opsahl and uses an alpha parameter of 1, so it reduces to the sum of a node's tie weights.
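
Both notes can be sketched in a few lines of Python, which again is not the repository's R code. Assumptions: each article adds 1 / (number of entity pairs in that article) to every pair it contains, which is only one possible reading of "inversely proportional"; the helper names and toy articles are hypothetical. The weighted degree is computed as k^(1 - alpha) * s^alpha, where k is a node's number of ties and s is the sum of their weights.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(articles):
    """Build undirected weighted edges from per-article entity lists.

    Assumed scheme: an article with p distinct entity pairs adds 1/p to each pair.
    """
    edges = defaultdict(float)
    for entities in articles:
        pairs = list(combinations(sorted(set(entities)), 2))
        for pair in pairs:
            edges[pair] += 1.0 / len(pairs)
    return dict(edges)


def weighted_degree(edges, alpha=1.0):
    """Opsahl-style weighted degree: k ** (1 - alpha) * s ** alpha per node."""
    k = defaultdict(int)    # number of ties per node
    s = defaultdict(float)  # sum of tie weights per node
    for (a, b), w in edges.items():
        for node in (a, b):
            k[node] += 1
            s[node] += w
    return {node: k[node] ** (1 - alpha) * s[node] ** alpha for node in k}


# Toy corpus (hypothetical): two articles and the entities they mention.
articles = [["A", "B"], ["A", "B", "C"]]
edges = cooccurrence_edges(articles)
degree = weighted_degree(edges)  # alpha = 1: sum of tie weights

# The 15 highest-scoring nodes, most central first.
top = sorted(degree.items(), key=lambda item: item[1], reverse=True)[:15]
```

In this toy corpus the first article holds one pair (weight 1) and the second holds three pairs (weight 1/3 each), so an article that mentions many entities spreads its contribution thinly across its ties.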

About

Replication material for a paper on copyright control, coauthored with Pierre Gueydier.
