alvarosc99/Newspapers-comparison

Newspapers-comparison

About the project

The goal of this project was to compare two Spanish newspapers with opposing ideological lines, Público and La Razón, using Natural Language Processing and Text Mining techniques, in order to identify the main differences in language usage. Our primary hypothesis was that the left-right divide, together with the political context, clearly influences the way articles are written. We therefore expected La Razón to use more critical and more negative language than Público, since recent years have been marked by a left-wing government.

As for the results, by analyzing word frequencies and bigrams and by applying Sentiment Analysis and TF-IDF, we concluded that the proportion of articles with an overall negative lexicon is almost 20% higher in news from La Razón, and that the pandemic was a major focus of the conservative publication's criticism of the government's legislative and executive measures. Additionally, we found that La Razón tends to mention leftist political leaders more often, and that Público frames its economic articles at the global and transnational level, while La Razón does so at the local level (mainly hospitality and small businesses).
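To make the techniques named above more concrete, the following is a minimal Python sketch of bigram counting, TF-IDF across newspapers, and a lexicon-based share of negative articles. It is not the project's actual code: the function names, the toy English sentences, and the two tiny word lists are all hypothetical, and each newspaper is treated as a single document for TF-IDF, which is only one of several reasonable choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (accented letters are kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def bigram_counts(articles):
    """Count adjacent word pairs, article by article, so pairs never span two articles."""
    counts = Counter()
    for text in articles:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def tf_idf(corpus):
    """corpus maps a newspaper name to its articles; each newspaper is one document.

    Returns {newspaper: {term: tf * idf}} with tf = relative frequency
    and idf = ln(number of newspapers / newspapers containing the term).
    """
    tokens_by_paper = {
        name: [tok for text in articles for tok in tokenize(text)]
        for name, articles in corpus.items()
    }
    doc_freq = Counter()
    for tokens in tokens_by_paper.values():
        doc_freq.update(set(tokens))
    n_docs = len(tokens_by_paper)
    scores = {}
    for name, tokens in tokens_by_paper.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def negative_share(articles, negative, positive):
    """Fraction of articles containing more negative than positive lexicon words."""
    flagged = 0
    for text in articles:
        tokens = tokenize(text)
        neg = sum(tok in negative for tok in tokens)
        pos = sum(tok in positive for tok in tokens)
        flagged += neg > pos
    return flagged / len(articles)


# Hypothetical toy data, invented purely for illustration.
toy_corpus = {
    "paper_a": ["The government fails small businesses", "Crisis hits small businesses"],
    "paper_b": ["Global markets grow", "The government approves global reform"],
}
toy_negative = {"fails", "crisis"}
toy_positive = {"grow", "approves"}

if __name__ == "__main__":
    for paper, articles in toy_corpus.items():
        print(paper, bigram_counts(articles).most_common(1))
        print(paper, negative_share(articles, toy_negative, toy_positive))
    print(tf_idf(toy_corpus))
```

In this sketch, words shared by both newspapers (such as "government" in the toy data) receive a TF-IDF score of zero, so the highest-scoring terms are those distinctive of one publication. A real analysis would also need Spanish stop-word removal and a proper sentiment lexicon, neither of which is shown here.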

Future work could include more newspapers to test the hypothesis and apply other techniques, such as correlation analysis, Topic Modelling, or Supervised Machine Learning to classify news articles as critical or not.

Data and requirements

All data was collected from public sources, through a dataset published by Muñiz Peña (2021) on Kaggle. This dataset contains a total of 58424 articles from La Razón (31477) and Público (26948). Due to its large size, the file is not included in the repository, so it must be downloaded and placed in the project's folder for the code to run properly.

About

Final project for Text Mining, part of UC3M's Master in Computational Social Science