# Classify news by political spectrum and show trends in newspapers and journalists by political spectrum
Classify news articles within the political spectrum: does an article favor more libertarian or authoritarian, free-market or closed-market policies?

For each newspaper, show its trend within the political spectrum, based on the number of news articles and their textual sentiment (positive, neutral or negative).

Show the same information for each journalist.
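
A minimal sketch of the per-newspaper and per-journalist summary, assuming each article has already been tagged with its outlet, journalist and sentiment. The records below are hypothetical toy values, not results; `sentiment_profile` is an illustrative helper, not part of the project.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (outlet, journalist, sentiment).
# Real values would come from the classified arquivo.pt articles.
articles = [
    ("Outlet A", "Journalist 1", "positive"),
    ("Outlet A", "Journalist 1", "negative"),
    ("Outlet A", "Journalist 2", "neutral"),
    ("Outlet B", "Journalist 3", "negative"),
]

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_profile(records, key_index):
    """Count articles and the share of each sentiment per key.

    key_index=0 groups by outlet, key_index=1 groups by journalist.
    """
    counts = defaultdict(Counter)
    for record in records:
        counts[record[key_index]][record[2]] += 1
    profile = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        profile[key] = {"n_articles": total}
        for s in SENTIMENTS:
            # Counter returns 0 for sentiments that never occurred
            profile[key][s] = counter[s] / total
    return profile


by_outlet = sentiment_profile(articles, key_index=0)
by_journalist = sentiment_profile(articles, key_index=1)
print(by_outlet["Outlet A"])
```

Using shares rather than raw counts keeps outlets with very different publication volumes comparable.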

## How to reach this classification?
- Get the topic of the news article.
- Identify the main character of the article (the person it is mostly about).
- If the main character is a politician, get the politician's political party.
- Classify the sentiment of the article (positive, neutral or negative) and use it to infer the article's position on the political spectrum.
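
One possible way to combine the last two steps, sketched under explicit assumptions: each party has a hypothetical position on a two-axis compass (economic: -1 closed market to +1 free market; social: -1 libertarian to +1 authoritarian), sentiment is scored +1/0/-1, and an article's position is the sentiment score times the party's position. This is a simplification chosen for illustration, not the project's method; the party names and coordinates are placeholders.

```python
from typing import Dict, List, Optional, Tuple

# (economic, social): economic -1 = closed market .. +1 = free market,
#                     social   -1 = libertarian   .. +1 = authoritarian
Position = Tuple[float, float]
SENTIMENT_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def article_position(party: Optional[str], sentiment: str,
                     party_positions: Dict[str, Position]) -> Optional[Position]:
    """Map (main character's party, article sentiment) to a compass point.

    Positive coverage is read as leaning toward the party's position,
    negative coverage as leaning away from it, neutral as no lean.
    Returns None when there is no politician or the party is unknown.
    """
    if party is None or party not in party_positions:
        return None
    score = SENTIMENT_SCORE[sentiment]
    econ, social = party_positions[party]
    return (score * econ, score * social)


def average_position(points: List[Optional[Position]]) -> Optional[Position]:
    """Mean compass point over the articles that could be classified."""
    valid = [p for p in points if p is not None]
    if not valid:
        return None
    return (sum(p[0] for p in valid) / len(valid),
            sum(p[1] for p in valid) / len(valid))


# Hypothetical party coordinates, for illustration only.
positions = {"PARTY_X": (0.5, 0.25), "PARTY_Y": (-0.5, -0.25)}
points = [
    article_position("PARTY_X", "positive", positions),  # (0.5, 0.25)
    article_position("PARTY_Y", "negative", positions),  # (0.5, 0.25)
    article_position(None, "neutral", positions),        # None: no politician
]
print(average_position(points))  # (0.5, 0.25)
```

Averaging these points per newspaper or journalist gives a single compass position. Note that neutral articles pull the average toward the centre, which may or may not be desired.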


<img src="./images/political_spectrum.png" alt="political_spectrum" width="400"/>
<img src="./images/PT_Political_Compass.png" alt="PT_Political_Compass" width="550"/>


## Similar Works:
- [politiquices](https://github.com/politiquices): 2nd place in the [2021 Arquivo.pt Awards](https://sobre.arquivo.pt/pt/conheca-os-vencedores-do-premio-arquivo-pt-2021/)

## Data Sources:
- [arquivo.pt](https://arquivo.pt/)
  - [Público](https://www.publico.pt/) 
  - [Jornal de Notícias](https://www.jn.pt/) 
  - [Diário de Notícias](https://www.dn.pt/)
  - [Expresso](https://expresso.pt/)
  - [Observador](https://observador.pt/)
  - [Sapo](https://www.sapo.pt/)
  - [RTP](https://www.rtp.pt/)
  - [TVI](https://tvi.iol.pt/)
  - [Correio da Manhã](https://www.cmjornal.pt/)
  - [Jornal i](https://ionline.sapo.pt/)
  - [Sol](https://sol.sapo.pt/)
  - [Jornal Económico](https://jornaleconomico.sapo.pt/)
  - [Notícias ao Minuto](https://www.noticiasaominuto.com/)
  - [SIC Notícias](https://sicnoticias.pt/)
  - [Renascença](https://rr.sapo.pt/)
  - [Jornal de Negócios](https://www.jornaldenegocios.pt/)
  - [Visão](https://visao.sapo.pt/) 
  - [Sábado](https://www.sabado.pt/) 
- [wikidata.org](https://www.wikidata.org/)
- [dados.gov.pt](https://dados.gov.pt/)
- [parlamento.pt](https://www.parlamento.pt/Cidadania/Paginas/DadosAbertos.aspx)
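
For politicians who are not covered by the parlamento.pt data, the party could be looked up in Wikidata. A minimal sketch, assuming the public SPARQL endpoint and the properties P31 (instance of) = Q5 (human) and P102 (member of political party). Exact label matching is case-sensitive and common names can match several people, so the upper-case names from parlamento.pt would need normalizing and the results filtering. `party_query` and `fetch_parties` are hypothetical helpers, and the User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def party_query(label: str, lang: str = "pt") -> str:
    """Build a SPARQL query for the parties (P102) of humans labelled `label`."""
    # Escape characters that would break the SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?partyLabel WHERE {\n"
        f'  ?person rdfs:label "{escaped}"@{lang} ;\n'
        "          wdt:P31 wd:Q5 ;\n"
        "          wdt:P102 ?party .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en". }}\n'
        "}"
    )


def fetch_parties(label: str, lang: str = "pt") -> list:
    """Run the query against the public endpoint (requires network access)."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode(
        {"query": party_query(label, lang), "format": "json"})
    # Wikidata asks clients to send a descriptive User-Agent (placeholder value).
    req = urllib.request.Request(
        url, headers={"User-Agent": "political-spectrum-notebook/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return [b["partyLabel"]["value"] for b in data["results"]["bindings"]]
```

For example, `fetch_parties("Ana Drago")` would return the party labels of any matching people; the returned list depends on the current state of Wikidata.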


```python
import json
from pprint import pprint

# Legislature X (2005-2009), from the parlamento.pt open data
with open('./Legislaturas/X.json', encoding='utf-8') as json_file:
    legislature_json = json.load(json_file)

legislature = legislature_json['Legislatura']

l_init_date = legislature['DetalheLegislatura']['dtini']  # 2005-03-10
l_end_date = legislature['DetalheLegislatura']['dtfim']  # 2009-10-14

# Deputies who served in this legislature
deputies = legislature['Deputados']['pt_ar_wsgode_objectos_DadosDeputadoSearch']

pprint(deputies[50])
```

```
{'depCPDes': 'EUROPA',
 'depCPId': '21',
 'depCadId': '1458',
 'depGP': {'pt_ar_wsgode_objectos_DadosSituacaoGP': {'gpDtFim': '2009-10-14',
                                                     'gpDtInicio': '2005-03-10',
                                                     'gpId': '237',
                                                     'gpSigla': 'PSD'}},
 'depId': '2516',
 'depNomeCompleto': 'CARLOS ALBERTO SILVA GONÇALVES',
 'depNomeParlamentar': 'CARLOS ALBERTO GONÇALVES',
 'depSituacao': {'pt_ar_wsgode_objectos_DadosSituacaoDeputado': [{'sioDes': 'Suspenso(Eleito)',
                                                                  'sioDtFim': '2005-03-12',
                                                                  'sioDtInicio': '2005-03-10'},
                                                                 {'sioDes': 'Efetivo',
                                                                  'sioDtFim': '2009-10-15',
```

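In the record above, `depSituacao` holds a list while `depGP` holds a single dict. This looks like the usual XML-to-JSON pattern where one child becomes a dict and several become a list, so a deputy who changed parliamentary group would presumably have a list under `depGP`; that is an assumption, not something shown here. A small sketch that reads the party acronym (`gpSigla`) either way, with hypothetical helper names:

```python
def as_list(value):
    """Normalize a field that may be missing, a single dict, or a list of dicts."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def deputy_groups(dep):
    """Return (acronym, start, end) for each parliamentary group of a deputy."""
    gp = dep.get("depGP") or {}
    groups = as_list(gp.get("pt_ar_wsgode_objectos_DadosSituacaoGP"))
    return [(g["gpSigla"], g["gpDtInicio"], g["gpDtFim"]) for g in groups]


# The depGP part of the record printed above (deputies[50]).
dep = {
    "depNomeParlamentar": "CARLOS ALBERTO GONÇALVES",
    "depGP": {"pt_ar_wsgode_objectos_DadosSituacaoGP": {
        "gpDtFim": "2009-10-14", "gpDtInicio": "2005-03-10",
        "gpId": "237", "gpSigla": "PSD"}},
}
print(deputy_groups(dep))  # [('PSD', '2005-03-10', '2009-10-14')]
```

Applied to the whole legislature, `{d['depNomeParlamentar']: deputy_groups(d) for d in deputies}` would give each deputy's party history.
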
```python
import requests
import pandas as pd

# Dates without dashes (YYYYMMDD) for the arquivo.pt API
init_date = l_init_date.replace("-", "")
end_date = l_end_date.replace("-", "")
# Only the estimated number of hits is used, so one result item is enough
max_items = 1

# Current and historical domains of the news sites to search
domains = [
    "publico.pt", "www.publico.pt", "jornal.publico.pt", "dossiers.publico.pt",
    "desporto.publico.pt", "www.publico.clix.pt", "digital.publico.pt",
    "blogues.publico.pt", "economia.publico.pt", "m.publico.pt", "ultimahora.publico.pt",
    "observador.pt",
    "www.dn.pt", "dn.sapo.pt", "www.dn.sapo.pt",
    "expresso.pt", "aeiou.expresso.pt", "expresso.sapo.pt",
    "www.correiomanha.pt", "www.correiodamanha.pt", "www.cmjornal.xl.pt", "www.cmjornal.pt",
    "www.jn.pt", "jn.pt", "jn.sapo.pt",
    "abola.pt", "www.abola.pt", "abola.pt:80",
    "www.sabado.pt", "www.sabado.pt:80", "www.sabado.xl.pt", "www.sabado.xl.pt:80", "sabado.pt",
    "visaoonline.clix.pt:80", "visao.clix.pt:80", "aeiou.visao.pt", "visao.sapo.pt",
]

n_news_per_deputy = {}

# Estimate the number of news articles mentioning each deputy during the legislature
# (only the first 20 deputies, to keep the number of requests small)
for dep in deputies[:20]:
    dep_name = dep['depNomeParlamentar']

    payload = {
        "q": dep_name,
        "maxItems": max_items,
        "siteSearch": ",".join(domains),
        "from": init_date,
        "to": end_date,
    }

    r = requests.get('https://arquivo.pt/textsearch', params=payload, timeout=30)
    r.raise_for_status()  # fail on HTTP errors instead of on r.json()

    n_news_per_deputy[dep_name] = r.json()['estimated_nr_results']

df = pd.DataFrame(list(n_news_per_deputy.items()), columns=['Deputy', 'N_news'])
```



|   | Deputy | N_news |
|---|---|---|
| 0 | ABEL BAPTISTA | 13 |
| 1 | ABÍLIO DIAS FERNANDES | 54 |
| 2 | ADÃO SILVA | 9542 |
| 3 | AFONSO CANDAL | 8 |
| 4 | AGOSTINHO BRANQUINHO | 13 |
| 5 | AGOSTINHO GONÇALVES | 2453 |
| 6 | AGOSTINHO LOPES | 2948 |
| 7 | ALBERTO ANTUNES | 2472 |
| 8 | ALBERTO ARONS DE CARVALHO | 9547 |
| 9 | ALBERTO COSTA | 10452 |

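Some counts (around 10,000 for ALBERTO COSTA or ADÃO SILVA) are far above the rest. A plausible cause is that the unquoted query also matches pages where the words of a common name appear separately. A hedged sketch of the request parameters with the name wrapped in double quotes, assuming arquivo.pt's `/textsearch` honours quoted phrase search as its web interface does; `build_payload` and `count_results` are hypothetical helpers, and the counts in the table above were obtained without quotes.

```python
def build_payload(name, sites, from_date, to_date, max_items=1, exact_phrase=True):
    """Build the arquivo.pt /textsearch parameters used in the loop above."""
    query = f'"{name}"' if exact_phrase else name
    return {
        "q": query,
        "maxItems": max_items,
        "siteSearch": ",".join(sites),
        "from": from_date,
        "to": to_date,
    }


def count_results(response_json):
    """Read the hit estimate, treating a missing field as zero hits."""
    return int(response_json.get("estimated_nr_results", 0))


# Usage inside the loop, with requests as in the notebook:
# r = requests.get("https://arquivo.pt/textsearch",
#                  params=build_payload(dep_name, domains, init_date, end_date),
#                  timeout=30)
# n_news_per_deputy[dep_name] = count_results(r.json())
```

Comparing the quoted and unquoted counts for a few deputies would show how much of the gap comes from partial name matches.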

```python
# Rank deputies by estimated number of news articles
df.sort_values(by='N_news', ascending=False, inplace=True)
print(df)
```

```
                          Deputy  N_news
9                  ALBERTO COSTA   10452
10               ALBERTO MARTINS   10405
16                ÁLVARO SARAIVA    9574
8      ALBERTO ARONS DE CARVALHO    9547
2                     ADÃO SILVA    9542
6                AGOSTINHO LOPES    2948
7                ALBERTO ANTUNES    2472
5            AGOSTINHO GONÇALVES    2453
17  ANA CATARINA MENDONÇA MENDES     570
19                     ANA DRAGO      82
1          ABÍLIO DIAS FERNANDES      54
18                     ANA COUTO      41
12                   ALDA MACEDO      23
4           AGOSTINHO BRANQUINHO      13
0                  ABEL BAPTISTA      13
3                  AFONSO CANDAL       8
15        ÁLVARO CASTELLO-BRANCO       7
14                  ALTINO BESSA       4
11                 ALCÍDIA LOPES       0
13                ALDEMIRA PINHO       0
```

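Since the goal is trends by political spectrum, a natural next step is to roll the per-deputy counts up to parties, for example using each deputy's `depGP` acronym (`gpSigla`). A minimal sketch using three counts from the output above and a hypothetical party mapping; `news_per_party` is an illustrative helper. Any bias from common names in the per-deputy counts carries over into these totals.

```python
from collections import defaultdict


def news_per_party(n_news_per_deputy, party_of):
    """Sum news counts per party; deputies with no known party go under 'unknown'."""
    totals = defaultdict(int)
    for deputy, n_news in n_news_per_deputy.items():
        totals[party_of.get(deputy, "unknown")] += n_news
    # Sort parties by total coverage, largest first
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


counts = {"ANA DRAGO": 82, "ALDA MACEDO": 23, "ABEL BAPTISTA": 13}
parties = {"ANA DRAGO": "PARTY_X", "ALDA MACEDO": "PARTY_X"}  # hypothetical mapping
print(news_per_party(counts, parties))  # {'PARTY_X': 105, 'unknown': 13}
```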