Big Data Project 2015 - A study of linguistic drift on Le Temps Newspaper Corpus
TFIDF
filter
metric/src/ch/epfl/bigdata
ngram
ngramTopic
ngramcorrection
ocr-synonyms
report
stats
webapp
.gitignore
README.md
coding_guide
gruntfile.js


A-study-of-linguistic-drift-on-Le-Temps-Newspaper-Corpus

EPFL - Big Data Project 2015 - A study of linguistic drift on Le Temps Newspaper Corpus

Project Description :

We have access to the archives of the newspaper Le Temps, which cover nearly 200 years of publication (from 1816 to 1998). The goal of this project is to use these archives to quantify, or otherwise represent, linguistic drift over the years. Language evolves and changes: some words appear while others disappear, and we want to interpret this phenomenon scientifically.

Project goals :

The first main goal of the project is to find a way to exploit the data we have and to define a good distance metric that lets us quantify and represent the drift between years and how it evolves.
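
To make the idea of a distance between years concrete, here is a minimal sketch of one possible candidate, the Jensen-Shannon distance between the word-frequency distributions of two years. It is an illustration only and not necessarily the metric used in this project; the function names `word_distribution` and `jensen_shannon_distance` and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter
import math


def word_distribution(text):
    """Relative frequency of each lowercase, whitespace-separated token.

    Assumption: naive tokenization; a real pipeline would also handle
    punctuation and OCR noise.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base-2 logarithm) between two distributions.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions that share no word.
    """
    vocabulary = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}

    def kl_to_mixture(dist):
        # Kullback-Leibler divergence from dist to the mixture;
        # words with zero probability contribute nothing.
        return sum(v * math.log2(v / mixture[w]) for w, v in dist.items() if v > 0)

    divergence = 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical toy texts standing in for all articles of two given years.
year_a = word_distribution("la diligence arrive au relais")
year_b = word_distribution("le train arrive en gare")
drift = jensen_shannon_distance(year_a, year_b)
```

Computing this value for every pair of years would give a symmetric matrix of drift values. The Jensen-Shannon distance is chosen here because it stays finite when a word appears in one year but not in the other, which is exactly the case of interest when words appear or disappear.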

The second goal of the project is to apply machine learning techniques to part of the corpus (a training set) and then, given a text, estimate the year it was written in, within a given precision threshold.
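
As a rough illustration of the dating task, the sketch below builds one word-count profile per year from a training set and assigns a new text to the year whose profile is most similar under cosine similarity. This is a deliberately simple baseline chosen for the example, not the method used in this project; `train`, `predict_year`, `within_tolerance` and the toy data are all hypothetical.

```python
from collections import Counter
import math


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(texts_by_year):
    """Build one word-count profile per year from the training texts."""
    return {
        year: Counter(" ".join(texts).lower().split())
        for year, texts in texts_by_year.items()
    }


def predict_year(profiles, text):
    """Return the training year whose profile is most similar to the text."""
    query = Counter(text.lower().split())
    return max(profiles, key=lambda year: cosine_similarity(profiles[year], query))


def within_tolerance(predicted, actual, tolerance):
    """True if the prediction is at most `tolerance` years from the truth."""
    return abs(predicted - actual) <= tolerance


# Hypothetical toy training set: a list of article texts for each year.
profiles = train({
    1850: ["la diligence arrive au relais"],
    1990: ["l'ordinateur calcule vite"],
})
guess = predict_year(profiles, "une diligence au relais")
```

The `within_tolerance` helper reflects the precision threshold mentioned above: a prediction can be counted as correct when it falls within a chosen number of years of the true date, rather than only when it matches exactly.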

Team members :

  • Cynthia Oeschger (Team leader)
  • Farah Bouassida
  • Tao Lin
  • Jéremy Weber
  • Nicolas Bornand
  • Marc Schär
  • Gil Brechbühler
  • Malik Bougacha