Skip to content
This repository has been archived by the owner on Jun 24, 2019. It is now read-only.

briatte/afsp2013

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Code related to a working paper that was first presented at the AFSP Annual Meeting in Paris, 2013. See Section 1 of this paper and its appendix, or read the HOWTO below for a technical summary.

DATA

The scraper currently collects slightly over 6,300 articles from

  • ecrans.fr (including articles from liberation.fr)
  • lemonde.fr (first lines only for paid content)
  • lesechos.fr (left-censored to December 2011)
  • lefigaro.fr (first lines only for paid content)
  • numerama.com (including old articles from ratiatium.com)
  • zdnet.fr

HOWTO

The entry point is make.r:

  • get_articles will scrape the news sources (adjust page counters to current website search results to update the data)
  • get_corpus will extract all entities and list the most common ones (set minimum frequency with threshold; defaults to 10)
  • get_ranking will export the top 15 central nodes of the co-occurrence network to the tables folder, in Markdown format
  • get_network returns the co-occurrence network, optionally trimmed to its top quantile of weighted edges (set with threshold; defaults to 0)

Tables

  • corpus.terms.csv – a list of all entities, ordered by their raw counts
  • corpus.freqs.csv – a list of entities found in each article
  • corpus.edges.csv – a list of undirected weighted network ties

Notes

  • The weighting scheme is inversely proportional to the number of entity pairs in each article.
  • The weighted degree formula is by Tore Opsahl and uses an alpha parameter of 1.

About

Replication material for a paper on copyright control, coauthored with Pierre Gueydier.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages