# Classify news by political spectrum and show trends in newspapers and journalists by political spectrum
Classify news within the political spectrum. (How to reach this classification? News favors more libertarian, authoritarian, free-market or closed-market policies)

Show newspapers, within the political spectrum, what their trend is based on the number of news articles and their textual sentiment (positive, neutral or negative).

Show the same information by journalist.

## How to reach this classification?
- Get the topic of the news.
- Try to get who is the main character of the news.
- If the main character is a politician, try to get the political party of the politician.
- If the sentiment of the news is positive, neutral or negative, try to get the political spectrum of the sentiment.

<img src="./images/political_spectrum.png" alt="political_spectrum" width="400"/>
<img src="./images/PT_Political_Compass.png" alt="PT_Political_Compass" width="550"/>

## Similar Works:
- [politiquices](https://github.com/politiquices) - [2021 arquivo awards](https://sobre.arquivo.pt/pt/conheca-os-vencedores-do-premio-arquivo-pt-2021/), 2nd place

## Data Sources:
- [arquivo.pt](https://arquivo.pt/)
  - [Público](https://www.publico.pt/)
  - [Jornal de Notícias](https://www.jn.pt/)
  - [Diário de Notícias](https://www.dn.pt/)
  - [Expresso](https://expresso.pt/)
  - [Observador](https://observador.pt/)
  - [Sapo](https://www.sapo.pt/)
  - [RTP](https://www.rtp.pt/)
  - [TVI](https://tvi.iol.pt/)
  - [Correio da Manhã](https://www.cmjornal.pt/)
  - [Jornal i](https://ionline.sapo.pt/)
  - [Sol](https://sol.sapo.pt/)
  - [Jornal Económico](https://jornaleconomico.sapo.pt/)
  - [Notícias ao Minuto](https://www.noticiasaominuto.com/)
  - [SIC Notícias](https://sicnoticias.pt/)
  - [Renascença](https://rr.sapo.pt/)
  - [Jornal de Negócios](https://www.jornaldenegocios.pt/)
  - [Visão](https://visao.sapo.pt/)
  - [Sábado](https://www.sabado.pt/)
- [wikidata.org](https://www.wikidata.org/)
- [dados.gov.pt](https://dados.gov.pt/)
- [parlamento.pt](https://www.parlamento.pt/Cidadania/Paginas/DadosAbertos.aspx)

In [1]:
import pandas as pd
import requests
import json
from pprint import pprint
with open("./Legislaturas/X.json") as json_file:
    legislature_json = json.load(json_file)

legislature = legislature_json["Legislatura"]

l_init_date = legislature["DetalheLegislatura"]["dtini"]  # 2005-03-10
l_end_date = legislature["DetalheLegislatura"]["dtfim"]  # 2009-10-14

deputies = legislature["Deputados"]["pt_ar_wsgode_objectos_DadosDeputadoSearch"]
parties = legislature["GruposParlamentares"]["pt_gov_ar_objectos_GPOut"]

pprint(len(deputies))

352


In [2]:
init_date = l_init_date.replace("-", "")
end_date = l_end_date.replace("-", "")
maxItems = 100
domains = [
    "publico.pt",
    "www.publico.pt",
    "jornal.publico.pt",
    "dossiers.publico.pt",
    "desporto.publico.pt",
    "www.publico.clix.pt",
    "digital.publico.pt",
    "blogues.publico.pt",
    "economia.publico.pt",
    "m.publico.pt",
    "ultimahora.publico.pt",
    "observador.pt",
    "www.dn.pt",
    "dn.sapo.pt",
    "www.dn.sapo.pt",
    "expresso.pt",
    "aeiou.expresso.pt",
    "expresso.sapo.pt",
    "www.correiomanha.pt",
    "www.correiodamanha.pt",
    "www.cmjornal.xl.pt",
    "www.cmjornal.pt",
    "www.jn.pt",
    "jn.pt",
    "jn.sapo.pt",
    "abola.pt",
    "www.abola.pt",
    "abola.pt:80",
    "www.sabado.pt",
    "www.sabado.pt:80",
    "www.sabado.xl.pt",
    "www.sabado.xl.pt:80",
    "sabado.pt",
    "visaoonline.clix.pt:80",
    "visao.clix.pt:80",
    "aeiou.visao.pt",
    "visao.sao.pt",
]

news_per_deputy = {}
total_dep = len(deputies)
depts = []

# search for news for each deputy in the years of the legislature
for index, dep in enumerate(deputies):
    dep_id = dep["depId"]
    dep_name = dep["depNomeParlamentar"]
    deputy = {
        "id": dep_id,
        "name": dep_name
    }
    depts.append(deputy)
    query = f"{dep_name}"

    print(f"{index + 1}/{total_dep} - {dep_name}")
    print(f"Searching news for {dep_name}...")

    payload = {
        "q": query,
        "maxItems": maxItems,
        "siteSearch": ",".join(domains),
        "from": init_date,
        "to": end_date,
    }

    r = requests.get("https://arquivo.pt/textsearch", params=payload)

    json_res = r.json()
    items = json_res["response_items"]

    news_per_deputy[dep_name] = {
        "estimated_nr_results": json_res["estimated_nr_results"],
        "items": items,
    }

    print(f"Found {json_res['estimated_nr_results']} news for {dep_name}.\n")

df = pd.DataFrame(
    [
        (dep, news_per_deputy[dep]["estimated_nr_results"])
        for dep in news_per_deputy
    ],
    columns=["Deputy", "N_news"],
)
df.sort_values(by="N_news", ascending=False, inplace=True)

1/352 - ABEL BAPTISTA
Searching news for ABEL BAPTISTA...
Found 13 news for ABEL BAPTISTA.

2/352 - ABÍLIO DIAS FERNANDES
Searching news for ABÍLIO DIAS FERNANDES...
Found 54 news for ABÍLIO DIAS FERNANDES.

3/352 - ADÃO SILVA
Searching news for ADÃO SILVA...
Found 9542 news for ADÃO SILVA.

4/352 - AFONSO CANDAL
Searching news for AFONSO CANDAL...
Found 8 news for AFONSO CANDAL.

5/352 - AGOSTINHO BRANQUINHO
Searching news for AGOSTINHO BRANQUINHO...
Found 13 news for AGOSTINHO BRANQUINHO.

6/352 - AGOSTINHO GONÇALVES
Searching news for AGOSTINHO GONÇALVES...
Found 2453 news for AGOSTINHO GONÇALVES.

7/352 - AGOSTINHO LOPES
Searching news for AGOSTINHO LOPES...
Found 2948 news for AGOSTINHO LOPES.

8/352 - ALBERTO ANTUNES
Searching news for ALBERTO ANTUNES...
Found 2472 news for ALBERTO ANTUNES.

9/352 - ALBERTO ARONS DE CARVALHO
Searching news for ALBERTO ARONS DE CARVALHO...
Found 9547 news for ALBERTO ARONS DE CARVALHO.

10/352 - ALBERTO COSTA
Searching news for ALBERTO COSTA...
Fo

Found 421 news for EUGÉNIO ROSA.

82/352 - FÁTIMA PIMENTA
Searching news for FÁTIMA PIMENTA...
Found 9617 news for FÁTIMA PIMENTA.

83/352 - FELICIANO BARREIRAS DUARTE
Searching news for FELICIANO BARREIRAS DUARTE...
Found 9541 news for FELICIANO BARREIRAS DUARTE.

84/352 - FERNANDA ASSEICEIRA
Searching news for FERNANDA ASSEICEIRA...
Found 2 news for FERNANDA ASSEICEIRA.

85/352 - FERNANDO ANTUNES
Searching news for FERNANDO ANTUNES...
Found 2506 news for FERNANDO ANTUNES.

86/352 - FERNANDO CABRAL
Searching news for FERNANDO CABRAL...
Found 9609 news for FERNANDO CABRAL.

87/352 - FERNANDO GOMES
Searching news for FERNANDO GOMES...
Found 11194 news for FERNANDO GOMES.

88/352 - FERNANDO JESUS
Searching news for FERNANDO JESUS...
Found 9591 news for FERNANDO JESUS.

89/352 - FERNANDO MONIZ
Searching news for FERNANDO MONIZ...
Found 9668 news for FERNANDO MONIZ.

90/352 - FERNANDO NEGRÃO
Searching news for FERNANDO NEGRÃO...
Found 404 news for FERNANDO NEGRÃO.

91/352 - FERNANDO PRATAS

Found 12040 news for JORGE PEREIRA.

163/352 - JORGE SEGURO SANCHES
Searching news for JORGE SEGURO SANCHES...
Found 133 news for JORGE SEGURO SANCHES.

164/352 - JORGE STRECHT
Searching news for JORGE STRECHT...
Found 1108 news for JORGE STRECHT.

165/352 - JORGE TADEU MORGADO
Searching news for JORGE TADEU MORGADO...
Found 0 news for JORGE TADEU MORGADO.

166/352 - JORGE VARANDA
Searching news for JORGE VARANDA...
Found 1141 news for JORGE VARANDA.

167/352 - JOSÉ ALBERTO FATEIXA
Searching news for JOSÉ ALBERTO FATEIXA...
Found 0 news for JOSÉ ALBERTO FATEIXA.

168/352 - JOSÉ ALBERTO LOURENÇO
Searching news for JOSÉ ALBERTO LOURENÇO...
Found 9635 news for JOSÉ ALBERTO LOURENÇO.

169/352 - JOSÉ AMARAL LOPES
Searching news for JOSÉ AMARAL LOPES...
Found 11340 news for JOSÉ AMARAL LOPES.

170/352 - JOSÉ APOLINÁRIO
Searching news for JOSÉ APOLINÁRIO...
Found 256 news for JOSÉ APOLINÁRIO.

171/352 - JOSÉ AUGUSTO DE CARVALHO
Searching news for JOSÉ AUGUSTO DE CARVALHO...
Found 10319 news f

Found 0 news for MARIA IDALINA TRINDADE.

240/352 - MARIA IRENE SILVA
Searching news for MARIA IRENE SILVA...
Found 37 news for MARIA IRENE SILVA.

241/352 - MARIA JOÃO FONSECA
Searching news for MARIA JOÃO FONSECA...
Found 338 news for MARIA JOÃO FONSECA.

242/352 - MARIA JOSÉ GAMBOA
Searching news for MARIA JOSÉ GAMBOA...
Found 12 news for MARIA JOSÉ GAMBOA.

243/352 - MARIA JÚLIA CARÉ
Searching news for MARIA JÚLIA CARÉ...
Found 0 news for MARIA JÚLIA CARÉ.

244/352 - MARIA MANUEL OLIVEIRA
Searching news for MARIA MANUEL OLIVEIRA...
Found 11703 news for MARIA MANUEL OLIVEIRA.

245/352 - MARIA OFÉLIA MOLEIRO
Searching news for MARIA OFÉLIA MOLEIRO...
Found 7 news for MARIA OFÉLIA MOLEIRO.

246/352 - MARIANA AIVECA
Searching news for MARIANA AIVECA...
Found 2 news for MARIANA AIVECA.

247/352 - MÁRIO ALBUQUERQUE
Searching news for MÁRIO ALBUQUERQUE...
Found 96 news for MÁRIO ALBUQUERQUE.

248/352 - MÁRIO FONTEMANHA
Searching news for MÁRIO FONTEMANHA...
Found 0 news for MÁRIO FONTEMAN

Found 962 news for RITA MIGUEL.

320/352 - RITA NEVES
Searching news for RITA NEVES...
Found 225 news for RITA NEVES.

321/352 - ROSA MARIA ALBERNAZ
Searching news for ROSA MARIA ALBERNAZ...
Found 0 news for ROSA MARIA ALBERNAZ.

322/352 - ROSALINA MARTINS
Searching news for ROSALINA MARTINS...
Found 5 news for ROSALINA MARTINS.

323/352 - ROSÁRIO ÁGUAS
Searching news for ROSÁRIO ÁGUAS...
Found 47 news for ROSÁRIO ÁGUAS.

324/352 - RUI CUNHA
Searching news for RUI CUNHA...
Found 10359 news for RUI CUNHA.

325/352 - RUI GOMES DA SILVA
Searching news for RUI GOMES DA SILVA...
Found 12028 news for RUI GOMES DA SILVA.

326/352 - RUI MORAIS
Searching news for RUI MORAIS...
Found 9866 news for RUI MORAIS.

327/352 - RUI VIEIRA
Searching news for RUI VIEIRA...
Found 9812 news for RUI VIEIRA.

328/352 - SÉRGIO LIPARI
Searching news for SÉRGIO LIPARI...
Found 0 news for SÉRGIO LIPARI.

329/352 - SÉRGIO VIEIRA
Searching news for SÉRGIO VIEIRA...
Found 11230 news for SÉRGIO VIEIRA.

330/352 - SÓN

ConnectionError: HTTPSConnectionPool(host='arquivo.pt', port=443): Max retries exceeded with url: /textsearch?q=VITALINO+CANAS&maxItems=100&siteSearch=publico.pt%2Cwww.publico.pt%2Cjornal.publico.pt%2Cdossiers.publico.pt%2Cdesporto.publico.pt%2Cwww.publico.clix.pt%2Cdigital.publico.pt%2Cblogues.publico.pt%2Ceconomia.publico.pt%2Cm.publico.pt%2Cultimahora.publico.pt%2Cobservador.pt%2Cwww.dn.pt%2Cdn.sapo.pt%2Cwww.dn.sapo.pt%2Cexpresso.pt%2Caeiou.expresso.pt%2Cexpresso.sapo.pt%2Cwww.correiomanha.pt%2Cwww.correiodamanha.pt%2Cwww.cmjornal.xl.pt%2Cwww.cmjornal.pt%2Cwww.jn.pt%2Cjn.pt%2Cjn.sapo.pt%2Cabola.pt%2Cwww.abola.pt%2Cabola.pt%3A80%2Cwww.sabado.pt%2Cwww.sabado.pt%3A80%2Cwww.sabado.xl.pt%2Cwww.sabado.xl.pt%3A80%2Csabado.pt%2Cvisaoonline.clix.pt%3A80%2Cvisao.clix.pt%3A80%2Caeiou.visao.pt%2Cvisao.sao.pt&from=20050310&to=20091014 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7cc8acd8c320>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

In [None]:
# write news_per_deputy to csv
with open("data.json", "w") as f:
    for dep in news_per_deputy:
        json.dump({
            "deputy": dep,
            "estimated_nr_results": news_per_deputy[dep]["estimated_nr_results"],
            "items": news_per_deputy[dep]["items"],
        }, f)
        f.write("\n")

# write df to csv
df.to_csv("news_per_deputy.csv", index=False)

In [None]:
def search(query, maxItems, _from=init_date, _to=end_date):
    payload = {
        "q": query,
        "maxItems": maxItems,
        "siteSearch": ",".join(domains),
        "from": _from,
        "to": _to,
    }

    return requests.get("https://arquivo.pt/textsearch", params=payload)


publico = ["publico.pt", "www.publico.pt", "jornal.publico.pt", "dossiers.publico.pt", "desporto.publico.pt",
           "www.publico.clix.pt", "digital.publico.pt", "blogues.publico.pt", "economia.publico.pt", "m.publico.pt", "ultimahora.publico.pt"]
observador = ["observador.pt"]
dn = ["www.dn.pt", "dn.sapo.pt", "www.dn.sapo.pt"]
expresso = ["expresso.pt", "aeiou.expresso.pt", "expresso.sapo.pt"]
cm = ["www.correiomanha.pt", "www.correiodamanha.pt", "www.cmjornal.xl.pt", "www.cmjornal.pt"]
jn = ["www.jn.pt", "jn.pt", "jn.sapo.pt"]
abola = ["abola.pt", "www.abola.pt", "abola.pt:80"]
visao = ["aeiou.visao.pt", "visao.sapo.pt"]
sabado = ["www.sabado.pt", "www.sabado.xl.pt", "www.sabado.xl.pt:80", "sabado.pt"]

terms = []
for dep in deputies:
    terms.append(dep.name)

for party in parties:
    terms.append(party["sigla"])
    terms.append(party["nome"])

print(terms)

item_df = pd.DataFrame()
for term in terms:
    item = search(term, 2000)  # use the max per dep