News Classifier

This project aims to classify news articles obtained from the arquivo.pt API along the political spectrum.

The project is divided into three main parts: data collection, data visualization, and model training.

Data Collection (collect.ipynb)

Data collection uses the arquivo.pt API and a parlamento.pt JSON file with information about the X Legislature of Portugal. The API is queried by the names of deputies and political parties to obtain news articles from different Portuguese newspapers.

arquivo.pt returns only the URLs of the articles, so the articles must be scraped to get their text. The scraping is done using the newspaper3k library.
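The query step can be sketched roughly as follows. This is a minimal illustration, not the code in collect.ipynb: the endpoint, the `q`, `siteSearch` and `maxItems` parameters, and the `response_items` and `linkToArchive` response fields are assumptions based on the public Arquivo.pt text search API and should be checked against its documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Arquivo.pt full-text search API.
ARQUIVO_SEARCH = "https://arquivo.pt/textsearch"


def build_query_url(term, site=None, max_items=50):
    """Build a search URL for one term (a deputy or party name)."""
    params = {"q": term, "maxItems": max_items}
    if site:
        # Restrict results to a single newspaper domain (assumed parameter).
        params["siteSearch"] = site
    return f"{ARQUIVO_SEARCH}?{urlencode(params)}"


def extract_urls(term, payload):
    """Collect term/URL rows from an already decoded JSON response."""
    rows = []
    for item in payload.get("response_items", []):
        url = item.get("linkToArchive")  # assumed field name
        if url:
            rows.append({"term": term, "url": url})
    return rows
```

Each returned URL would then be passed to newspaper3k to download and parse the article text and title.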

After that, the data is saved to a CSV file with the following columns: term, url, text, title.
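A minimal sketch of writing and reading that file with the standard library is shown below. The function names are hypothetical, and the notebook may well use pandas instead. The point of the example is that article text contains commas and line breaks, so a real CSV writer that quotes fields is needed rather than plain string joining.

```python
import csv
import io

COLUMNS = ["term", "url", "text", "title"]


def write_articles(rows, handle):
    """Write article rows to an open text handle using the four columns."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty strings so every row has all columns.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})


def read_articles(handle):
    """Read the rows back as a list of dictionaries."""
    return list(csv.DictReader(handle))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_articles([{"term": "PS", "url": "http://example.org", "title": "T"}], buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.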

Exploratory Data Analysis (EDA.ipynb)

The EDA uses the data collected in the previous step. We plot some statistics about the data, such as the distribution of article lengths, and we filter the articles by political category.

We also plot the most common words in the articles, as well as the most common bigrams and trigrams, and a word cloud of the most common words.
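As an illustration of the n-gram counts behind those plots, here is a small sketch using only the standard library. The tokenizer is a simplifying assumption (lowercased word characters, no stop-word removal), and the function names are hypothetical. EDA.ipynb may use a different tokenizer or library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(texts, n):
    """Count n-grams (as tuples of tokens) across a collection of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Zip n shifted copies of the token list to form sliding windows.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


if __name__ == "__main__":
    sample = ["o governo aprovou o orçamento", "o governo aprovou a lei"]
    print(ngram_counts(sample, 2).most_common(3))
```

Calling `ngram_counts(texts, 1)` gives single-word counts, which is the input a word cloud needs.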

AI Analysis (model.ipynb)

The AI analysis consists of sentiment analysis of the articles. To do that, we first translate the articles to English using the googletrans library in the [translate.py](translate.py) file.

After all the text is translated, we use VADER from the nltk library to classify the sentiment of each article into one of two categories: positive or negative. Then we search for the politicians' names mentioned in the articles and correlate the sentiment of each article with the politicians it mentions. With this information we can visualize the sentiment of the newspapers towards the politicians and their parties.
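The aggregation step might look like the sketch below. It is an assumption about the approach, not the code in model.ipynb: the cut-off at 0 for the two labels, the plain substring match on names, and the function names are all illustrative. The scoring function is passed in as a parameter so the example runs without nltk installed.

```python
from collections import defaultdict


def label(compound):
    """Map a compound score in [-1, 1] to one of two labels (assumed cut-off at 0)."""
    return "positive" if compound >= 0 else "negative"


def sentiment_by_politician(articles, politicians, score):
    """Count positive and negative articles that mention each politician.

    `score` is any callable returning a compound sentiment score for a text.
    With NLTK it could be, after nltk.download("vader_lexicon"):
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        sia = SentimentIntensityAnalyzer()
        score = lambda text: sia.polarity_scores(text)["compound"]
    """
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for text in articles:
        verdict = label(score(text))
        lowered = text.lower()
        for name in politicians:
            # Simple case-insensitive substring match (an assumption).
            if name.lower() in lowered:
                totals[name][verdict] += 1
    return dict(totals)
```

Note that this attributes the sentiment of the whole article to every politician it mentions, so an article that praises one person while criticizing another counts the same for both.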
