<a href="https://colab.research.google.com/github/ahmetcankaratas/sentiment-analysis-turkish-news/blob/main/sentiment_analysis_turkish_news.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Requirements

In [9]:
!pip install feedparser pandas sacremoses  > /dev/null 2>&1

import feedparser
import pandas as pd

from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#Step 1: Online News Data Collection

In [10]:
# Define keywords related to Turkey
turkey_keywords = [
    "Turkey", "Türkiye", "Ankara", "Istanbul", "Erdogan", "TUR", "Ottoman", "Cappadocia", "Bosphorus", "Galata",
    "Turkish", "Istanbul", "TURK", "Ataturk", "Marmara","Turkish lira","Galatasaray", "Turkish doner", "Turkey National Team"
    "Erdogan administration", "Turkish politics", "Turkish economy", "Turkish military", "Baykar"
]

def fetch_rss_articles(feed_url):
    """
    Fetches articles from a given RSS feed and filters those related to Turkey.
    Parameters:
        feed_url (str): The URL of the RSS feed.
        max_articles (int): Maximum number of articles to fetch.
    Returns:
        pd.DataFrame: A DataFrame containing title and summary of filtered articles related to Turkey (no links).
    """
    # Parse the RSS feed
    feed = feedparser.parse(feed_url)
    articles = []

    # Extract article details and filter for Turkey-related articles
    for entry in feed.entries:
        article = {
            'title': entry.title,
            'summary': entry.summary
        }

        # Check if the title or summary contains any Turkey-related keywords
        if any(keyword.lower() in article['title'].lower() or keyword.lower() in article['summary'].lower() for keyword in turkey_keywords):
            articles.append(article)

    # Return filtered articles as a DataFrame
    return pd.DataFrame(articles)

# Step 2: Define a list of RSS feed URLs
rss_feeds = [
"http://feeds.bbci.co.uk/news/world/rss.xml", #BBC News
"http://rss.cnn.com/rss/edition_world.rss", #CNN
"http://feeds.reuters.com/reuters/worldnews", #Reuters
"https://www.aljazeera.com/xml/rss/all.xml", #Al Jazeera English
"https://rss.nytimes.com/services/xml/rss/nyt/World.xml", #The New York Times
"https://www.theguardian.com/world/rss", #The Guardian
"http://feeds.washingtonpost.com/rss/world", #Washington Post
"https://www.npr.org/rss/rss.php?id=1004", #NPR
"https://www.bloomberg.com/feed/podcast/bloomberg", #Bloomberg
"http://www.ft.com/rss/home/us", #Financial Times
"https://abcnews.go.com/abcnews/internationalheadlines", #ABC News
"https://www.cbsnews.com/latest/rss/world", #CBS News
"http://feeds.nbcnews.com/nbcnews/public/world", #NBC News
"https://www.yahoo.com/news/rss/world", #Yahoo News
"https://apnews.com/apf-intlnews", #Associated Press
"http://feeds.feedburner.com/euronews/en/home/", #Euronews
"https://rss.dw.com/rdf/rss-en-all", #Deutsche Welle
"https://www.france24.com/en/rss", #France 24
"https://www.independent.co.uk/news/world/rss", #The Independent
"https://feeds2.feedburner.com/time/world", #TIME
"https://www.newsweek.com/rss", #Newsweek
"https://www.economist.com/sections/international/rss.xml", #The Economist
"https://www.voanews.com/rss", #VOA News
"https://feeds.skynews.com/feeds/rss/world.xml", #Sky News
"https://www.theglobeandmail.com/world/?service=rss", #The Globe and Mail
"https://www.thestar.com/content/thestar/feed.RSSManagerServlet.articles.world.xml", #Toronto Star
"https://www.smh.com.au/rss/world.xml", #The Sydney Morning Herald
"https://www.abc.net.au/news/feed/51120/rss.xml", #ABC Australia
#"https://www.cbc.ca/cmlink/rss-world", #CBC News
"https://www.hindustantimes.com/rss/world/rssfeed.xml", #Hindustan Times
"https://timesofindia.indiatimes.com/rssfeeds/296589292.cms", #Times of India
"https://www.thehindu.com/news/international/feeder/default.rss", #The Hindu
"https://www.scmp.com/rss/5/feed", #South China Morning Post
"https://www.japantimes.co.jp/rss/world.xml", #Japan Times
"http://www.koreaherald.com/rss/international.xml", #Korea Herald
"https://gulfnews.com/rss", #Gulf News
"https://www.khaleejtimes.com/rss", #Khaleej Times
"https://www.arabnews.com/cat/6/rss.xml", #Arab News
"https://feeds.elpais.com/mrss-s/pages/ep/site/elpais.com/portada", #El País
"https://www.lemonde.fr/rss/une.xml", #Le Monde
"http://www.spiegel.de/international/index.rss", #Der Spiegel
"https://xml.corriereobjects.it/rss/mondo.xml", #Corriere della Sera
"https://www.repubblica.it/rss/esteri/rss2.0.xml", #La Repubblica
"https://e00-elmundo.uecdn.es/elmundo/rss/portada.xml", #El Mundo
"https://www.rt.com/rss/news/", #RT (Russia Today)
"https://tass.com/rss/v2.xml", #TASS
"https://www.aa.com.tr/en/rss/default?cat=world", #Anadolu Agency
"http://www.xinhuanet.com/english/rss/worldrss.xml", #Xinhua News
"https://www.globaltimes.cn/rss/world.xml", #Global Times
"https://www.chinadaily.com.cn/rss/world_rss.xml", #China Daily
"https://www.nation.co.ke/news/world", #The Nation
"https://www.smh.com.au/rss/world.xml", #The Sydney Morning Herald
"https://www.dailystar.co.uk/news/weird-news/rss.xml", #The Daily Star
"https://www.washingtonpost.com/rss/world/", #The Washington Post
"https://www.cnbc.com/world/?view=rss", #CNBC World
"https://www.reuters.com/rssFeed/worldNews", #Reuters World News
"https://www.businessinsider.com/rss/world", #Business Insider
"https://www.bbc.com/news/world/rss.xml", #BBC News
"https://www.ft.com/rss/world", #Financial Times
"https://www.abc.net.au/news/feed/51120/rss.xml", #ABC Australia
"https://www.huffpost.com/section/world-news/feed", #HuffPost World
"https://www.bbc.com/news/world/rss.xml", #BBC News
"https://www.theguardian.com/world/rss", #The Guardian
"https://www.cnn.com/world/rss", #CNN World
"https://www.economist.com/sections/international/rss.xml", #The Economist
"https://www.aljazeera.com/xml/rss/all.xml", #Al Jazeera English
"https://www.nytimes.com/section/world/rss", #New York Times
"https://www.skynews.com.au/rss/world", #Sky News
"https://www.foxnews.com/world/rss", #Fox News
"https://www.npr.org/rss/rss.php?id=1004", #NPR World
"https://www.scientificamerican.com/rss/world/", #Scientific American World
"https://www.wired.com/category/world/feed/", #Wired World
"https://www.cbsnews.com/rss/world/", #CBS News
"https://www.straitstimes.com/singapore/world", #The Straits Times
"https://www.bbc.co.uk/news/world/rss.xml", #BBC News
"https://www.deccanherald.com/rss/world.xml", #Deccan Herald World
"https://www.dawn.com/rss/world", #Dawn News
"https://www.spectator.co.uk/rss/world.xml", #The Spectator
"https://www.heraldsun.com.au/rss/world", #Herald Sun
"https://www.theage.com.au/rss/world", #The Age
"https://www.newstimes.com/rss/world.xml", #News Times
"https://www.telegraph.co.uk/rss/world", #The Telegraph
"https://www.dailytimes.com.pk/rss/world", #Daily Times
"https://www.ft.com/world/rss", #Financial Times World
"https://www.thehindubusinessline.com/rss/world", #Business Line World
"https://www.independent.co.uk/news/world/rss", #The Independent World
"https://www.thedailybeast.com/rss/world", #The Daily Beast
"https://www.tribuneindia.com/rss/world", #Tribune India
"https://www.theblaze.com/stories/rss/world", #The Blaze World
"https://www.sundaytimes.lk/rss/world", #Sunday Times
"https://www.timeslive.co.za/rss/world", #Times Live
"https://www.thejakartapost.com/rss/world", #The Jakarta Post
"https://www.trtworld.com/rss", #TRT World
"https://www.thedrum.com/rss/world", #The Drum World
"https://www.irishnews.com/rss/world", #Irish News World
"https://www.mirror.co.uk/rss/world", #The Mirror World
"https://www.news.com.au/world", #News.com.au World
"https://www.bostonglobe.com/metro/region/world/?service=rss", #The Boston Globe World
"https://www.sfgate.com/world/rss", #SFGate World
"https://www.washingtontimes.com/rss/world/", #Washington Times World
"https://www.fresnobee.com/news/world/world-news/article111695292.html", #Fresno Bee World
"https://www.nytimes.com/rss", #New York Times
"https://www.smh.com.au/rss/world.xml", #Sydney Morning Herald
"https://www.chicagotribune.com/world/rss", #Chicago Tribune
"https://www.latimes.com/world/rss", #Los Angeles Times World
"https://www.courier-journal.com/rss/world", #Courier Journal World
"https://www.independent.co.uk/news/world/rss", #Independent World
"https://www.usatoday.com/rss/world/", #USA Today World
"https://www.spectator.co.uk/rss/world.xml", #Spectator World
"https://www.theguardian.com/world/rss" #The Guardian World
]

# Step 3: Initialize an empty DataFrame to store all news articles
all_news_data = pd.DataFrame(columns=['title', 'summary'])

# Step 4: Collect data from each RSS feed and append it to the DataFrame
for feed_url in rss_feeds:
    print(f"Fetching data from: {feed_url}")
    feed_data = fetch_rss_articles(feed_url)  # Fetch up to 50 articles per feed
    all_news_data = pd.concat([all_news_data, feed_data], ignore_index=True)  # Append to the main DataFrame

# Step 5: Save the collected data to a .pkl (pickle) file on Google Drive
file_path = '/content/drive/MyDrive/Projects/Deeplearning/news.pkl'  # Specify the file path
all_news_data.to_pickle(file_path)  # Save the DataFrame to .pkl file

# Step 6: Display the filtered data
print(f"\nTotal articles related to Turkey collected: {len(all_news_data)}")
print("\nPreview of the collected data:")
print(all_news_data.to_string(index=False))  # Display data without the index

Fetching data from: https://rss.nytimes.com/services/xml/rss/nyt/World.xml
Fetching data from: https://www.reutersagency.com/feed/?best-regions=asia&post_type=best
Fetching data from: https://www.aljazeera.com/xml/rss/all.xml
Fetching data from: http://feeds.bbci.co.uk/news/world/rss.xml
Fetching data from: https://rss.cnn.com/rss/cnn_world.rss
Fetching data from: https://www.theguardian.com/world/rss
Fetching data from: https://www.france24.com/en/rss
Fetching data from: https://www.japantimes.co.jp/feed/topstories
Fetching data from: https://www.koreatimes.co.kr/www/rss/nation.xml
Fetching data from: https://www.smh.com.au/rss/world.xml
Fetching data from: https://www.thelocal.fr/rss
Fetching data from: https://www.themoscowtimes.com/rss
Fetching data from: https://www.thenationalnews.com/uae/rss
Fetching data from: https://www.thenewslens.com/rss
Fetching data from: https://www.thepaper.cn/rss_detail.jsp?channelID=25904&byids=
Fetching data from: https://www.thestandard.com.hk/newsf

#Step 2: Neural Machine Translation


In [5]:
import pandas as pd
from transformers import MarianMTModel, MarianTokenizer

from google.colab import drive
drive.mount('/content/drive')


# Step 1: Load the collected data from the .pkl file
file_path = '/content/drive/MyDrive/Projects/Deeplearning/Data/news.pkl'  # Path to the .pkl file

# Read the data from the pickle file
all_news_data = pd.read_pickle(file_path)

# Step 2: Load the translation model and tokenizer for English to Turkish translation
model_name = 'Helsinki-NLP/opus-mt-tc-big-en-tr'  # Translation model for English to Turkish
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Step 3: Define a function to translate the articles' titles and summaries
def translate_to_turkish(text):
    """
    Translates the input text from English to Turkish using MarianMTModel.
    """
    # Tokenize the input text
    translated = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

    # Generate translation using the model
    translation = model.generate(**translated)

    # Decode the translated text
    translated_text = tokenizer.decode(translation[0], skip_special_tokens=True)

    return translated_text

# Step 4: Translate titles and summaries of the collected articles
all_news_data['title_tr'] = all_news_data['title'].apply(translate_to_turkish)
all_news_data['summary_tr'] = all_news_data['summary'].apply(translate_to_turkish)

# Step 5: Display the translated data
print("\nTranslated Data Preview:")
print(all_news_data[['title', 'summary', 'title_tr', 'summary_tr']].head())

# Save the translated data back to a .pkl file if needed
translated_file_path = '/content/drive/MyDrive/Projects/Deeplearning/Data/news_translated.pkl'
all_news_data.to_pickle(translated_file_path)  # Save translated DataFrame

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


config.json:   0%|          | 0.00/1.14k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/470M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/301 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/337 [00:00<?, ?B/s]

source.spm:   0%|          | 0.00/797k [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/833k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.50M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/65.0 [00:00<?, ?B/s]




Translated Data Preview:
                                               title  \
1  Syria’s New Government Steps Up Pursuit of Ass...   
2  Playing to Win: How a First Nation Turned Arou...   
3  Takeaways From a Times Correspondent’s Return ...   
4  In Syria, U.S. Hopes to Avoid Replay of Afghan...   

                                             summary  \
0  With the war now over, the Taliban are welcomi...   
1  Finding the remnants of the old dictatorship a...   
2  Steered by an entrepreneurial chief, Membertou...   
3                              Here’s what he found.   
4  American officials are wary as they try to per...   

                                            title_tr  \
0  Afganistan'a Büyüyen Turistler Grubu Girişimi ...   
1   Suriye'nin Yeni Hükümeti Esad Sadıklarını Arıyor   
2  Kazanmak İçin Oynamak: İlk Bir Ulus Servetleri...   
3              Times muhabirinin Afganistan'a dönüşü   
4  Suriye'de ABD, Afganistan'ın Yeniden Oynanması...   

                   

#Step 3: Sentiment Analysis:



In [7]:
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the sentiment analysis model and tokenizer for Turkish
model_name = 'dbmdz/bert-base-turkish-cased'

try:
    sentiment_model = BertForSequenceClassification.from_pretrained(model_name)
    sentiment_tokenizer = BertTokenizer.from_pretrained(model_name)
    print(f"Sentiment analysis model '{model_name}' loaded successfully.")
except Exception as e:
    print(f"Error loading sentiment analysis model: {e}")

# Function to predict sentiment (positive or negative)
def predict_sentiment(text):
    try:
        # Tokenize the text
        inputs = sentiment_tokenizer(text, return_tensors="pt", padding=True, truncation=True)
        # Get the model's predictions
        with torch.no_grad():
            outputs = sentiment_model(**inputs)
        logits = outputs.logits
        predicted_class_id = torch.argmax(logits, dim=-1).item()

        # Return 'positive' or 'negative' based on the prediction
        return 'positive' if predicted_class_id == 1 else 'negative'
    except Exception as e:
        return f"Error during sentiment prediction: {e}"

# Apply sentiment analysis on the translated titles and summaries
all_news_data['title_sentiment'] = all_news_data['title'].apply(predict_sentiment)
all_news_data['summary_sentiment'] = all_news_data['summary'].apply(predict_sentiment)

# Save the sentiment analysis results to a new .pkl file
sentiment_file_path = '/content/drive/MyDrive/Projects/Deeplearning/news_sentiment.pkl'  # Updated file path
all_news_data.to_pickle(sentiment_file_path)

# Display a preview of the sentiment analysis results
print(all_news_data[['title', 'summary','title_sentiment', 'summary_sentiment']].head())


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dbmdz/bert-base-turkish-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Sentiment analysis model 'dbmdz/bert-base-turkish-cased' loaded successfully.
                                               title  \
1  Syria’s New Government Steps Up Pursuit of Ass...   
2  Playing to Win: How a First Nation Turned Arou...   
3  Takeaways From a Times Correspondent’s Return ...   
4  In Syria, U.S. Hopes to Avoid Replay of Afghan...   

                                             summary  \
0  With the war now over, the Taliban are welcomi...   
1  Finding the remnants of the old dictatorship a...   
2  Steered by an entrepreneurial chief, Membertou...   
3                              Here’s what he found.   
4  American officials are wary as they try to per...   

                                            title_tr  \
0  Afganistan'a Büyüyen Turistler Grubu Girişimi ...   
1   Suriye'nin Yeni Hükümeti Esad Sadıklarını Arıyor   
2  Kazanmak İçin Oynamak: İlk Bir Ulus Servetleri...   
3              Times muhabirinin Afganistan'a dönüşü   
4  Suriye'de ABD, Afgani

In [9]:
all_news_data[['title', 'summary', 'title_tr', 'summary_tr', 'title_sentiment', 'summary_sentiment']]

Unnamed: 0,title,summary,title_tr,summary_tr,title_sentiment,summary_sentiment
0,"Ignoring Warnings, a Growing Band of Tourists ...","With the war now over, the Taliban are welcomi...",Afganistan'a Büyüyen Turistler Grubu Girişimi ...,"Artık savaş sona erdiğinde, Taliban yabancı ge...",negative,negative
1,Syria’s New Government Steps Up Pursuit of Ass...,Finding the remnants of the old dictatorship a...,Suriye'nin Yeni Hükümeti Esad Sadıklarını Arıyor,Eski diktatörlüğün kalıntılarını bulmak ve onl...,negative,negative
2,Playing to Win: How a First Nation Turned Arou...,"Steered by an entrepreneurial chief, Membertou...",Kazanmak İçin Oynamak: İlk Bir Ulus Servetleri...,Girişimci bir şef olan Nova Scotia'daki Member...,positive,negative
3,Takeaways From a Times Correspondent’s Return ...,Here’s what he found.,Times muhabirinin Afganistan'a dönüşü,İşte bulduğu şey.,positive,positive
4,"In Syria, U.S. Hopes to Avoid Replay of Afghan...",American officials are wary as they try to per...,"Suriye'de ABD, Afganistan'ın Yeniden Oynanması...","Amerikalı yetkililer, Suriye'de kontrolü elind...",negative,negative
5,Residents Turn to Home Lifting In Response to ...,"As climate change intensifies, flooding is eme...","Konut sakinleri, sel tehdidine yanıt olarak ev...","İklim değişikliği yoğunlaştıkça, sel daha önce...",negative,negative
6,Some African Leaders Are Optimistic About Trump,"In his first term, Donald Trump denigrated Afr...",Bazı Afrikalı Liderler Trump Hakkında İyimser,"Donald Trump, ilk döneminde Afrika ülkelerini ...",negative,negative
7,"Syria’s Alawite Minority, Favored by the Assad...",Amid an outcry for justice and accountability ...,"Suriye'nin Alevi Azınlığı, Esadlar Tarafından ...",Çevrimiçi adalet ve hesap verebilirlik ve tehd...,negative,negative
8,"A Century of Human Detritus, Visualized",“Technostuff” built in the last 100 years outw...,"Bir Yüzyıl İnsan Detritus, Görselleştirilmiş","Son 100 yılda inşa edilen ""Teknoloji"" dünyadak...",negative,negative
9,South Korean Lawmakers Impeach Acting Presiden...,The vote was the second major impeachment in t...,Güney Koreli Milletvekilleri Kriz Derinleşiyor...,"Oylama, Başkan Yoon'un talihsiz sıkıyönetim te...",negative,negative
