# Classify news by political spectrum and show trends by newspaper and journalist
Classify news articles on the political spectrum according to whether they favor more libertarian or authoritarian positions, and free-market or closed-market policies.

For each newspaper, show its trend on the political spectrum, based on the number of news articles it publishes and their textual sentiment (positive, neutral or negative).

Show the same information by journalist.
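
As a minimal sketch of the aggregation described above, the snippet below counts articles per sentiment label for each newspaper; passing a different key groups by journalist instead. The record fields (`newspaper`, `journalist`, `sentiment`) and the sample rows are hypothetical placeholders, not data from this project.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("positive", "neutral", "negative")


def tally_sentiment(records, key):
    """Count articles per sentiment label, grouped by records[key]."""
    tallies = defaultdict(Counter)
    for record in records:
        tallies[record[key]][record["sentiment"]] += 1
    # Return plain dicts with every label present, so missing labels show as 0
    return {
        group: {label: counts[label] for label in SENTIMENTS}
        for group, counts in tallies.items()
    }


# Hypothetical, hand-written records used only to illustrate the data shape
sample_records = [
    {"newspaper": "Newspaper A", "journalist": "Journalist 1", "sentiment": "positive"},
    {"newspaper": "Newspaper A", "journalist": "Journalist 2", "sentiment": "negative"},
    {"newspaper": "Newspaper B", "journalist": "Journalist 1", "sentiment": "neutral"},
]

by_newspaper = tally_sentiment(sample_records, key="newspaper")
by_journalist = tally_sentiment(sample_records, key="journalist")
print(by_newspaper)
print(by_journalist)
```

The same function serves both views because only the grouping key changes.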

## How to reach this classification?
- Get the topic of the news.
- Try to identify the main figure of the news.
- If the main figure is a politician, try to get their political party.
- Use the sentiment of the news (positive, neutral or negative) toward that politician or party to estimate where the news falls on the political spectrum.
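
One way to read the steps above, sketched under strong assumptions: place each party at a point on the two axes (economic and social) and let the sentiment decide whether an article pulls toward or away from that point. The party names, coordinates and scoring rule below are illustrative placeholders, not measured values or a settled method.

```python
# Sentiment label -> sign of the article's pull toward the party's position
SENTIMENT_SIGN = {"positive": 1, "neutral": 0, "negative": -1}

# Hypothetical compass coordinates: (economic, social), each in [-1, 1].
# economic: -1 = closed market, +1 = free market
# social:   -1 = libertarian,   +1 = authoritarian
PARTY_POSITION = {
    "PARTY_X": (0.5, 0.25),
    "PARTY_Y": (-0.5, -0.25),
}


def article_leaning(party, sentiment):
    """Return the (economic, social) leaning of one article, or None if unknown."""
    if party not in PARTY_POSITION or sentiment not in SENTIMENT_SIGN:
        return None
    sign = SENTIMENT_SIGN[sentiment]
    economic, social = PARTY_POSITION[party]
    return (sign * economic, sign * social)


def mean_leaning(leanings):
    """Average a list of (economic, social) pairs, skipping None entries."""
    known = [pair for pair in leanings if pair is not None]
    if not known:
        return None
    n = len(known)
    return (sum(e for e, _ in known) / n, sum(s for _, s in known) / n)


# PARTY_Z is deliberately missing from the table to show the None path
articles = [("PARTY_X", "positive"), ("PARTY_Y", "negative"), ("PARTY_Z", "neutral")]
print(mean_leaning([article_leaning(p, s) for p, s in articles]))
```

Under this rule a negative article about a party counts as a pull toward the opposite quadrant, which is a simplification worth validating before relying on it.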


<img src="./images/political_spectrum.png" alt="political_spectrum" width="400"/>
<img src="./images/PT_Political_Compass.png" alt="PT_Political_Compass" width="550"/>


## Similar Works:
- [politiquices](https://github.com/politiquices), 2nd place in the [Prémio Arquivo.pt 2021](https://sobre.arquivo.pt/pt/conheca-os-vencedores-do-premio-arquivo-pt-2021/)

## Data Sources:
- [arquivo.pt](https://arquivo.pt/)
  - [Público](https://www.publico.pt/) 
  - [Jornal de Notícias](https://www.jn.pt/) 
  - [Diário de Notícias](https://www.dn.pt/)
  - [Expresso](https://expresso.pt/)
  - [Observador](https://observador.pt/)
  - [Sapo](https://www.sapo.pt/)
  - [RTP](https://www.rtp.pt/)
  - [TVI](https://tvi.iol.pt/)
  - [Correio da Manhã](https://www.cmjornal.pt/)
  - [Jornal i](https://ionline.sapo.pt/)
  - [Sol](https://sol.sapo.pt/)
  - [Jornal Económico](https://jornaleconomico.sapo.pt/)
  - [Notícias ao Minuto](https://www.noticiasaominuto.com/)
  - [SIC Notícias](https://sicnoticias.pt/)
  - [Renascença](https://rr.sapo.pt/)
  - [Jornal de Negócios](https://www.jornaldenegocios.pt/)
  - [Visão](https://visao.sapo.pt/) 
  - [Sábado](https://www.sabado.pt/) 
- [wikidata.org](https://www.wikidata.org/)
- [dados.gov.pt](https://dados.gov.pt/)
- [parlamento.pt](https://www.parlamento.pt/Cidadania/Paginas/DadosAbertos.aspx)
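
Wikidata can supply a politician's party through its "member of political party" property (P102). The helper below only builds the SPARQL text and sends nothing over the network. The function name is hypothetical, and exact label matching is an assumption that would likely need loosening, since parliamentary names such as `PAULO PORTAS` are stored in upper case.

```python
def build_party_query(person_label, lang="pt"):
    """Build a SPARQL query for the parties of a person with this exact label."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string
    escaped = person_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?person ?party ?partyLabel WHERE {{
  ?person rdfs:label "{escaped}"@{lang} ;
          wdt:P102 ?party .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en". }}
}}
""".strip()


print(build_party_query("Paulo Portas"))
```

The resulting text can be pasted into the Wikidata Query Service to inspect the matches before automating anything.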


In [14]:
import json

# Load the open-data file for the X Legislature
with open("./Legislaturas/X.json", encoding="utf-8") as json_file:
    legislature_json = json.load(json_file)

legislature = legislature_json["Legislatura"]

# Start and end dates of the legislature, as YYYY-MM-DD strings
l_init_date = legislature["DetalheLegislatura"]["dtini"]  # 2005-03-10
l_end_date = legislature["DetalheLegislatura"]["dtfim"]  # 2009-10-14

# Deputy records for this legislature
deputies = legislature["Deputados"]["pt_ar_wsgode_objectos_DadosDeputadoSearch"]

print(len(deputies))

352


In [15]:
import requests
import pandas as pd

init_date = l_init_date.replace("-", "")
end_date = l_end_date.replace("-", "")
maxItems = 100
domains = [
    "publico.pt",
    "www.publico.pt",
    "jornal.publico.pt",
    "dossiers.publico.pt",
    "desporto.publico.pt",
    "www.publico.clix.pt",
    "digital.publico.pt",
    "blogues.publico.pt",
    "economia.publico.pt",
    "m.publico.pt",
    "ultimahora.publico.pt",
    "observador.pt",
    "www.dn.pt",
    "dn.sapo.pt",
    "www.dn.sapo.pt",
    "expresso.pt",
    "aeiou.expresso.pt",
    "expresso.sapo.pt",
    "www.correiomanha.pt",
    "www.correiodamanha.pt",
    "www.cmjornal.xl.pt",
    "www.cmjornal.pt",
    "www.jn.pt",
    "jn.pt",
    "jn.sapo.pt",
    "abola.pt",
    "www.abola.pt",
    "abola.pt:80",
    "www.sabado.pt",
    "www.sabado.pt:80",
    "www.sabado.xl.pt",
    "www.sabado.xl.pt:80",
    "sabado.pt",
    "visaoonline.clix.pt:80",
    "visao.clix.pt:80",
    "aeiou.visao.pt",
    "visao.sapo.pt",
]

news_per_deputy = {}
total_dep = len(deputies)

# Search for news about each deputy within the dates of the legislature
for index, dep in enumerate(deputies):
    dep_id = dep["depId"]
    dep_name = dep["depNomeParlamentar"]
    query = dep_name

    print(f"{index+1}/{total_dep} - {dep_name}")
    print(f"Searching news for {dep_name}...")

    payload = {
        "q": query,
        "maxItems": maxItems,
        "siteSearch": ",".join(domains),
        "from": init_date,
        "to": end_date,
    }

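    # Fail fast on network or HTTP errors instead of parsing an error response
    r = requests.get("https://arquivo.pt/textsearch", params=payload, timeout=30)
    r.raise_for_status()

    json_res = r.json()
    items = json_res["response_items"]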

    news_per_deputy[dep_name] = {
        "estimated_nr_results": json_res["estimated_nr_results"],
        "items": items,
    }

    print(f"Found {json_res['estimated_nr_results']} news for {dep_name}.\n")

df = pd.DataFrame(
    [
        (dep, news_per_deputy[dep]["estimated_nr_results"])
        for dep in news_per_deputy
    ],
    columns=["Deputy", "N_news"],
)
df.sort_values(by="N_news", ascending=False, inplace=True)

1/352 - ABEL BAPTISTA
Searching news for ABEL BAPTISTA...
Found 13 news for ABEL BAPTISTA.

2/352 - ABÍLIO DIAS FERNANDES
Searching news for ABÍLIO DIAS FERNANDES...
Found 54 news for ABÍLIO DIAS FERNANDES.

3/352 - ADÃO SILVA
Searching news for ADÃO SILVA...
Found 9542 news for ADÃO SILVA.

4/352 - AFONSO CANDAL
Searching news for AFONSO CANDAL...
Found 8 news for AFONSO CANDAL.

5/352 - AGOSTINHO BRANQUINHO
Searching news for AGOSTINHO BRANQUINHO...
Found 13 news for AGOSTINHO BRANQUINHO.

6/352 - AGOSTINHO GONÇALVES
Searching news for AGOSTINHO GONÇALVES...
Found 2453 news for AGOSTINHO GONÇALVES.

7/352 - AGOSTINHO LOPES
Searching news for AGOSTINHO LOPES...
Found 2948 news for AGOSTINHO LOPES.

8/352 - ALBERTO ANTUNES
Searching news for ALBERTO ANTUNES...
Found 2472 news for ALBERTO ANTUNES.

9/352 - ALBERTO ARONS DE CARVALHO
Searching news for ALBERTO ARONS DE CARVALHO...
Found 9547 news for ALBERTO ARONS DE CARVALHO.

10/352 - ALBERTO COSTA
Searching news for ALBERTO COSTA...
Fo
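
Common names such as ADÃO SILVA return thousands of hits, which suggests the unquoted query may be matching the words independently rather than as a full name. A possible refinement, assuming the API treats double-quoted text as an exact phrase the way the arquivo.pt search page does, is to quote the name before sending it. `build_payload` is a hypothetical helper; its keys mirror the payload used above.

```python
def build_payload(name, domains, init_date, end_date, max_items=100, exact_phrase=True):
    """Build the query parameters for one deputy, optionally as an exact phrase."""
    # Drop any double quotes already in the name so the phrase stays well formed
    cleaned = name.replace('"', "").strip()
    query = f'"{cleaned}"' if exact_phrase else cleaned
    return {
        "q": query,
        "maxItems": max_items,
        "siteSearch": ",".join(domains),
        "from": init_date,
        "to": end_date,
    }


payload = build_payload("ADÃO SILVA", ["publico.pt", "www.dn.pt"], "20050310", "20091014")
print(payload["q"])
```

Comparing the counts with and without `exact_phrase` for a few deputies would show whether the quoting actually changes what the API returns.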

In [16]:
df

| | Deputy | N_news |
|---|---|---|
| 153 | JORGE ALMEIDA | 29345 |
| 141 | JOÃO PORTUGAL | 18239 |
| 297 | PAULO PORTAS | 15654 |
| 221 | MANUEL JOSÉ RODRIGUES | 15592 |
| 258 | MIGUEL ALMEIDA | 15111 |
| ... | ... | ... |
| 310 | RENATO LEAL | 0 |
| 77 | ESMERALDA SALERO RAMIRES | 0 |
| 71 | DELMAR PALAS | 0 |
| 299 | PEDRO FARMHOUSE | 0 |
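
To move from per-deputy counts toward per-newspaper trends, each result can be attributed to an outlet by its host name. The sketch below assumes every item carries an `originalURL` field (an assumption about the response format, to be checked against real responses) and uses a small, illustrative alias table that folds some of the historical host names from the domain list into one outlet.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative subset: historical host names folded into one outlet label
OUTLET_BY_HOST = {
    "publico.pt": "Público",
    "www.publico.pt": "Público",
    "www.publico.clix.pt": "Público",
    "www.dn.pt": "Diário de Notícias",
    "dn.sapo.pt": "Diário de Notícias",
}


def outlet_of(url):
    """Map a URL to an outlet label, falling back to the bare host name."""
    # URLs may lack a scheme; add "//" so urlsplit still finds the host
    if "://" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    return OUTLET_BY_HOST.get(host, host)


def count_by_outlet(items, url_field="originalURL"):
    """Count result items per outlet, ignoring items without a URL."""
    return Counter(outlet_of(item[url_field]) for item in items if item.get(url_field))


# Hand-written items for illustration only
sample_items = [
    {"originalURL": "http://www.publico.clix.pt/politica/"},
    {"originalURL": "http://publico.pt:80/"},
    {"originalURL": "dn.sapo.pt/2008/"},
    {"title": "no URL here"},
]
print(count_by_outlet(sample_items))
```

Using `hostname` rather than `netloc` drops the `:80` suffix, so entries such as `abola.pt:80` and `abola.pt` collapse to the same host.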


In [18]:
# write news_per_deputy to data.json as JSON Lines (one JSON object per deputy per line)
with open("data.json", "w") as f:
    for dep in news_per_deputy:
        json.dump({
            "deputy": dep,
            "estimated_nr_results": news_per_deputy[dep]["estimated_nr_results"],
            "items": news_per_deputy[dep]["items"],
        }, f)
        f.write("\n")

# write df to csv
df.to_csv("news_per_deputy.csv", index=False)
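
Because `data.json` holds one JSON object per line rather than a single JSON document, calling `json.load` on the whole file would fail once it contains more than one deputy; it has to be read line by line. A minimal reader sketch follows (the function names `read_json_lines` and `load_news_per_deputy` are hypothetical):

```python
import json


def read_json_lines(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_news_per_deputy(path):
    """Rebuild the deputy -> results mapping from the JSON Lines file."""
    return {
        record["deputy"]: {
            "estimated_nr_results": record["estimated_nr_results"],
            "items": record["items"],
        }
        for record in read_json_lines(path)
    }
```

In a later session, `news_per_deputy = load_news_per_deputy("data.json")` restores the dictionary without repeating the searches.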
