## Focus on NRA and LA, two more conservative media outlets

### Retrieving the GDELT links
As a first step, we do not filter on the environment theme.

In [1]:
from urllib.request import urlopen
import pandas as pd
from gdeltdoc import GdeltDoc, Filters

In [2]:
partial_articles_dfs = []
gd = GdeltDoc()  # a single client is enough for every query

for domain in ['nra.lv', 'la.lv']:
    for year in [2022, 2023, 2024]:
        f = Filters(
            # keyword="climate change",  # no keyword/theme filter at this stage
            start_date=f"{year}-01-01",
            end_date=f"{year}-12-31",
            country="LG",  # FIPS country code for Latvia
            domain=domain,
        )

        # Search for articles matching the filters
        partial_articles_df = gd.article_search(f)
        print(f"{len(partial_articles_df)} articles found for domain {domain}, in {year}")
        if partial_articles_df.empty:
            continue
        partial_articles_dfs.append(partial_articles_df)

articles_df = pd.concat(partial_articles_dfs)

250 articles found for domain nra.lv, in 2022
250 articles found for domain nra.lv, in 2023
250 articles found for domain nra.lv, in 2024
0 articles found for domain la.lv, in 2022
250 articles found for domain la.lv, in 2023
250 articles found for domain la.lv, in 2024
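
Every non-empty query above returns exactly 250 rows, which matches the per-request cap of `num_records` in gdeltdoc, so the yearly counts are probably truncated rather than exhaustive. A minimal sketch of one possible workaround, assuming shorter date windows are acceptable: the hypothetical helper `monthly_windows` yields one `(start_date, end_date)` pair per month, and each pair could be passed to `Filters` in place of the full-year range.

```python
import calendar
from datetime import date


def monthly_windows(year: int) -> list[tuple[str, str]]:
    """Return (start_date, end_date) ISO strings for each month of `year`."""
    windows = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        start = date(year, month, 1).isoformat()
        end = date(year, month, last_day).isoformat()
        windows.append((start, end))
    return windows


windows_2024 = monthly_windows(2024)
print(windows_2024[1])  # February of a leap year: ('2024-02-01', '2024-02-29')
```

Results from adjacent windows would still need de-duplicating on `url` after concatenation.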


In [103]:
articles_df.head()

| Unnamed: 0 | url | url_mobile | title | seendate | socialimage | domain | language | sourcecountry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://nra.lv/izklaide/muzika/393706-summer-s... | | Summer Sound jau izziņo pirmos nākamā gada ... | 20221007T174500Z | https://zinas.nra.lv/_mm/photos/2022-10/860px/... | nra.lv | Latvian | Latvia |
| 1 | https://nra.lv/pasaule/393758-austrija-apstrid... | | Austrija apstrīd Eiropas Savienības lēmumu kla... | 20221007T174500Z | https://zinas.nra.lv/_mm/photos/2022-10/860px/... | nra.lv | Latvian | Latvia |
| 2 | https://nra.lv/latvija/393747-skolenus-aicina-... | | Skolēnus aicina sagatavot labāko Valsts prezid... | 20221007T174500Z | https://zinas.nra.lv/_mm/photos/2022-10/860px/... | nra.lv | Latvian | Latvia |
| 3 | https://nra.lv/vakara-zinas/393713-vakara-zina... | | VAKARA ZIŅAS . Šipkēvics atzīst - jūtas noguris | 20221007T174500Z | https://zinas.nra.lv/_mm/photos/2022-10/860px/... | nra.lv | Latvian | Latvia |
| 4 | https://nra.lv/latvija/393742-gulbe-depozita-s... | | Gulbe : Depozīta sistēma liek tirgotājiem celt... | 20221007T174500Z | https://zinas.nra.lv/_mm/photos/2022-10/860px/... | nra.lv | Latvian | Latvia |


### Scraping the articles
The titles returned by GDELT are truncated, so we need to scrape the articles before we can analyze them.

In [47]:
import os
import sys
path_root = os.path.abspath('../../climateguard')
sys.path.append(path_root)
from climateguard.news_scrapper import NewsScraper, scrape_single_article

In [49]:
import multiprocessing
from functools import partial


scraper = NewsScraper()
scrape_func = partial(scrape_single_article, scraper)
# Cap the pool at 4 workers, or fewer on machines with fewer cores
num_workers = min(4, multiprocessing.cpu_count())

with multiprocessing.Pool(processes=num_workers) as pool:
    # Map the scraping function to the URLs in parallel
    results = pool.map(scrape_func, articles_df.url)

Scraped: No piektdienas būs pieejams biļetes uz “Depeche Mode” koncertu Tallinā
Content length: 2276
Date: 
---
Scraped: RECENZIJA: Robbie Williams “XXV”
Content length: 3084
Date: 
---
Scraped: Ukrainas kara 223. diena. Jaunākā informācija [papildināts 21:27]
Content length: 8340
Date: 
---
Scraped: "Summer Sound" jau izziņo pirmos nākamā gada festivāla mūziķus
Content length: 1379
Date: 
---
Scraped: Putins parakstījis dekrētu par Zaporižjas AES nodošanu Krievijas valsts īpašumā
Content length: 643
Date: 
---
Scraped: "The Times": Putins uz Ukrainu sūta militāro kodolvilcienu
Content length: 1730
Date: 
---
Scraped: Austrija apstrīd Eiropas Savienības lēmumu klasificēt kodolenerģiju kā ilgtspējīgu
Content length: 1444
Date: 
---
Scraped: Rekordists: Tukumā aizturēts vīrietis, pret kuru ierosināts jau 39.kriminālprocess
Content length: 561
Date: 
---
Scraped: Ukrainas kara 224. diena. Jaunākā informācija [papildināts 20:56]
Content length: 7514
Date: 
---
Scraped: Tiks pastiprinātas m
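
Log lines from parallel workers can interleave when several of them write to stdout at the same time. A minimal sketch of one way to avoid that, assuming the worker can return its log text instead of printing it: `fake_scrape` is a hypothetical stand-in for `scrape_single_article`, and a thread pool is used here in place of `multiprocessing.Pool` only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(url: str) -> tuple[str, str]:
    """Hypothetical stand-in for scrape_single_article: returns (url, log text)."""
    content = f"content of {url}"  # placeholder for the downloaded article body
    log = f"Scraped: {url}\nContent length: {len(content)}\n---"
    return url, log


urls = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]

# Workers only build their log text; the parent is the single writer to stdout.
with ThreadPoolExecutor(max_workers=4) as executor:
    outcomes = list(executor.map(fake_scrape, urls))

for _, log in outcomes:
    print(log)
```

Because `executor.map` returns results in input order, the printed log also follows the order of `urls`.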

In [50]:
# Separate the results into scraped articles and failed URLs
scraped_articles = [article for article, _ in results if article]
failed_urls = [url for _, url in results if url]

In [71]:
scraped_df = pd.DataFrame([a.model_dump() for a in scraped_articles])

In [202]:
len(scraped_df)

843

In [106]:
scraped_df.to_csv('../../data/scraped_articles/nra_la_notheme.csv')

### Translate title and flag as potentially mentioning climate

In [111]:
from openai import AsyncOpenAI

# Async client, so that the requests below can be awaited concurrently
client = AsyncOpenAI()

In [79]:
from pydantic import BaseModel, Field

class FirstFilterResult(BaseModel):
    title_en: str = Field(description="English translation of the title")
    p_interesting: float = Field(description="Probability inferred from the title that the article mentions topics related to climate change, the environment, resource use, etc")

In [113]:
async def flag_article(title: str) -> FirstFilterResult:
    prompt = f"""

    You are an expert in disinformation on environmental topics and in climate science, and you know everything about the IPCC.
    I will give you the title of an article from a conservative Latvian media outlet.
    From this title, you must infer whether the article could contain disinformation on topics related to climate and the environment.
    This step is a first filter before the articles of interest are analyzed in more detail.
    Do not limit yourself to titles that mention these topics explicitly: for example, an article on the economy, industry, or technology can just as well contain
    false claims on the subject (techno-solutionism, climate relativism, etc.).
    Conversely, an article about sports, personal development, or celebrity news is unlikely to cover these topics.
    The requested output is the English translation of the title and the probability that the article contains content of interest for our task.

    <title>
    {title}
    </title>
        """

    completion = await client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        response_format=FirstFilterResult,
    )
    return completion.choices[0].message.parsed

In [116]:
import asyncio # replace with tqdm.asyncio.tqdm.gather
tasks = [flag_article(title) for title in scraped_df.title]
results = await asyncio.gather(*tasks)
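
`asyncio.gather` starts every request at once, which for several hundred titles may run into API rate limits. A hedged sketch of capping concurrency with `asyncio.Semaphore`: `fake_flag` is a hypothetical stand-in for `flag_article`, `gather_limited` is a hypothetical helper, and the limit of 5 is an arbitrary assumption, not a documented quota.

```python
import asyncio


async def fake_flag(title: str) -> str:
    """Hypothetical stand-in for flag_article; sleeps instead of calling an API."""
    await asyncio.sleep(0.01)
    return title.upper()


async def gather_limited(coro_func, items, limit: int = 5):
    """Run coro_func over items with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await coro_func(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(run_one(item) for item in items))


titles = [f"title {i}" for i in range(20)]
flagged = asyncio.run(gather_limited(fake_flag, titles, limit=5))
print(len(flagged), flagged[0])
```

Inside a notebook, where an event loop is already running, `await gather_limited(...)` would replace the `asyncio.run(...)` call.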

In [118]:
new_df = pd.DataFrame([r.model_dump() for r in results])

In [129]:
scraped_df = pd.concat([scraped_df, new_df], axis=1)

In [150]:
scraped_df.to_csv('../../data/scraped_articles/nra_la_notheme_processed.csv')

In [200]:
for i in range(len(scraped_df)):
    r = scraped_df.iloc[i]
    print(f"{r['p_interesting']} \t {r['title_en']}")

0.05 	 "Summer Sound" announces the first musicians for next year's festival
0.75 	 Austria challenges the European Union's decision to classify nuclear energy as sustainable
0.1 	 Students are invited to prepare the best speech of the President of the Republic to the nation on November 18
0.05 	 EVENING NEWS. Šipkēvics admits - feels tired
0.4 	 Gulbe: Deposit system forces retailers to raise food prices
0.1 	 Ukraine war 226th day. Latest information [updated at 20:54]
0.1 	 "It is a myth that cancer treatment is covered by the state and is free for the patient," cancer patients raise alarm about expensive treatment
0.1 	 Lithuania raises the minimum wage to 840 euros
0.1 	 Biden pardons those convicted of marijuana possession
0.05 	 The supporters of Rosļikovs, Šlesers, and Rinkēvičs received the highest percentage of votes from specific election lists
0.1 	 226th Day of Ukraine War. Latest Information [updated 20:54]
0.2 	 China plans to appoint a compliant religious leader in Tibe

### Detect claims in flagged articles

In [152]:
df_to_analyze = scraped_df[scraped_df.p_interesting > 0.4]
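
The comparison is strict, so an article scored exactly 0.4 is left out of the selection. A small sketch using scores copied from the output above (plain tuples instead of a DataFrame, to keep it self-contained) shows the effect; `select_above` is a hypothetical helper.

```python
scored_titles = [
    (0.05, '"Summer Sound" announces the first musicians for next year\'s festival'),
    (0.75, "Austria challenges the European Union's decision to classify nuclear energy as sustainable"),
    (0.4, "Gulbe: Deposit system forces retailers to raise food prices"),
    (0.1, "Lithuania raises the minimum wage to 840 euros"),
]


def select_above(rows, threshold):
    """Keep rows whose score is strictly greater than threshold."""
    return [(p, title) for p, title in rows if p > threshold]


selected = select_above(scored_titles, 0.4)
for p, title in selected:
    print(p, title)
```

Only the Austrian nuclear article passes here; the deposit-system article, scored at exactly 0.4, would need `>=` or a slightly lower threshold to be kept.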

In [169]:
from tqdm.asyncio import tqdm
from climateguard.models import Article
from climateguard.detect_claims import adetect_claims

In [165]:
articles = [Article(**r) for r in df_to_analyze.to_dict(orient='records')]

In [170]:
tasks = [adetect_claims(art, "Latvian") for art in articles]
claim_results = await tqdm.gather(*tasks)

100%|██████████| 37/37 [00:10<00:00,  3.49it/s]


In [173]:
claim_results[0]

(Claims(article_needs_fact_checking=True, claims=[Claim(quote='Austrijas enerģētikas ministre Leonore Geveslere ir solījusi, ka Austrija apstrīdēs ES lēmumu atzīt kodolenerģiju par zaļu, norādot, kā šī enerģija ir novecojusi un pārāk dārga, turklāt pastāv nopietnas bažas par drošību un neskaidrība, ko iesākt ar kodolatkritumiem.', claim='Nuclear energy is outdated and too expensive, and there are serious safety concerns and uncertainties about what to do with nuclear waste.', context="The Austrian energy minister is expressing strong opposition to the EU's classification of nuclear energy as sustainable, citing concerns about its age, cost, safety, and waste management.", analysis="While concerns about the safety and waste management of nuclear energy are valid and widely discussed, the claim that nuclear energy is 'outdated and too expensive' is misleading. Many studies indicate that nuclear energy can be a cost-effective and low-carbon energy source, especially when considering its r

In [190]:
to_check = [(t[0].claims, i) for i, t in enumerate(claim_results) if t[0].article_needs_fact_checking]

In [191]:
for claims, i in to_check:
    print('Article index:', i)
    for claim in claims:
        print('Claim')
        print(claim.claim)
        print('Context')
        print(claim.context)
        print('Analysis')
        print(claim.analysis)
        print()
    print('-'*20)

Article index: 0
Claim
Nuclear energy is outdated and too expensive, and there are serious safety concerns and uncertainties about what to do with nuclear waste.
Context
The Austrian energy minister is expressing strong opposition to the EU's classification of nuclear energy as sustainable, citing concerns about its age, cost, safety, and waste management.
Analysis
While concerns about the safety and waste management of nuclear energy are valid and widely discussed, the claim that nuclear energy is 'outdated and too expensive' is misleading. Many studies indicate that nuclear energy can be a cost-effective and low-carbon energy source, especially when considering its role in reducing greenhouse gas emissions. Additionally, advancements in nuclear technology, such as small modular reactors and improved waste management solutions, challenge the notion that nuclear energy is outdated. Therefore, this claim may misrepresent the current state and potential of nuclear energy in the context o
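
For review outside the notebook, the nested results can be flattened into one row per claim. This is a sketch under assumptions: `Claim` and `Claims` below are simplified dataclass stand-ins that mirror the fields visible in the printed result (the real models live in `climateguard` and may differ), `flatten_claims` is a hypothetical helper, and each element of `claim_results` is assumed to be a tuple whose first item is a `Claims` object, as the `t[0]` indexing suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:  # simplified stand-in, fields taken from the printed output
    quote: str
    claim: str
    context: str
    analysis: str


@dataclass
class Claims:  # simplified stand-in
    article_needs_fact_checking: bool
    claims: list[Claim] = field(default_factory=list)


def flatten_claims(claim_results, urls):
    """Return one dict per claim for articles flagged as needing fact-checking."""
    rows = []
    for index, (result, url) in enumerate(zip(claim_results, urls)):
        claims = result[0]
        if not claims.article_needs_fact_checking:
            continue
        for c in claims.claims:
            rows.append(
                {"article_index": index, "url": url, "claim": c.claim, "analysis": c.analysis}
            )
    return rows


demo_results = [
    (Claims(True, [Claim("q1", "claim one", "ctx", "analysis one")]),),
    (Claims(False, []),),
]
demo_urls = ["https://example.org/1", "https://example.org/2"]
rows = flatten_claims(demo_results, demo_urls)
print(rows)
```

With the real data, `urls` would be `df_to_analyze.url`, and `pd.DataFrame(rows)` would give a table that can be saved alongside the other CSV files.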

In [203]:
df_to_analyze.iloc[23].url

'https://nra.lv/pasaule/462925-danija-ka-pirma-valsts-pasaule-ar-nodokli-apliks-majlopu-gazes.htm'

In [None]:
df_to_analyze.iloc[33].url

'https://www.la.lv/klimata-parmainas-ir-reala-un-globala-problema'