## Sample "About Me" Text

In [None]:
brand_text = {"Klarna": """
Hej, we’re Klarna
We’re here to set the new standard for how people shop and pay–proudly Swedish.

Smarter shopping starts here
At Klarna, we're redefining the shopping experience to help people get more out of their money. Here's how:
Pay your way: Choose from interest-free payment plans and customizable payment options.
Earn while you shop: Receive cashback directly in your Klarna balance.
Secure shopping: Keep your data safe using advanced encryption and 24/7 fraud monitoring.
Our History
Made in Stockholm
Three young Swedish entrepreneurs had a brilliant idea back in 2005—but not the best name. Kreditor became Klarna in 2010.
Unicorn status unlocked
In 2012, Klarna reached a significant milestone with a $1B+ valuation, recognizing our rapid growth and market impacts.
Klarna's family grows
In 2014, Klarna acquired SOFORT, creating Europe’s fastest-growing online payments group.
A bold new era
In 2017, Klarna transformed from blue to bold pink, and secured a full banking license, expanding our capabilities to serve customers.
The Klarna app arrives
Launched in 2018, the Klarna app provides users with tools for smarter shopping, budget management, and seamless payments across online and in-store experiences.
The smart spending leader
As of 2025, Klarna is a leader in digital payments, offering flexible payment options, cashback, and financial tools to millions of shoppers worldwide.

The numbers don't lie
100m
active consumers

720k
M erchants

2.9m
transactions per day

26
C ountries supported

Trusted by the world's most loved brands

Meet the board
The board is Klarna’s top decision-making body, overseeing strategy, operations, and corporate governance to ensure accountability to both the organization and investors.

Get to know our culture
With offices around the world, our global team blends start-up energy with a drive to create bold, impactful change and redefine smarter spending.
"""}

## Questions ##

1. What do we do with typos, such as "C ountries"?
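
One possible answer, sketched under assumptions: typos like "C ountries" and "M erchants" look like a capital letter split off from the rest of its word, so a regex applied before lowercasing could rejoin them. The helper name `rejoin_split_capitals` is hypothetical, and the rule is a heuristic: it skips "A" and "I" because they are valid one-letter words, and it can still misfire on text such as "Plan B works".

```python
import re

# Hypothetical heuristic: a lone capital letter followed by a space and a
# lowercase word fragment is treated as one word that was split in two.
# "A" and "I" are excluded because they are real one-letter English words.
SPLIT_CAPITAL = re.compile(r"\b([B-HJ-Z]) (?=[a-z]{2,}\b)")


def rejoin_split_capitals(text: str) -> str:
    """Rejoin patterns like 'C ountries' into 'Countries'."""
    return SPLIT_CAPITAL.sub(r"\1", text)


for line in ["C ountries supported", "M erchants", "A bold new era"]:
    print(repr(line), "->", repr(rejoin_split_capitals(line)))
```

This has to run on the raw text, before any lowercasing step, because the rule relies on the capital letter to spot the split.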

## Sample "Review" Text

## Keyword Matching


from datetime import datetime, timedelta
from typing import Union

import pandas as pd


def get_daily_sentiment(
    api_key: str,
    query_keyword: str,
    start_date: Union[str, datetime],
    end_date: Union[str, datetime],
    verbose: bool = False
) -> pd.DataFrame:
    """
    Fetches daily sentiment scores for a given keyword based on news headlines.

    Args:
        api_key (str): NewsAPI API key.
        query_keyword (str): Search term for news (e.g., "Apple", "bond market").
        start_date (str or datetime): Start date (inclusive). Format: "YYYY-MM-DD" or datetime.
        end_date (str or datetime): End date (inclusive). Format: "YYYY-MM-DD" or datetime.
        verbose (bool): Print progress for each day.

    Returns:
        pd.DataFrame: Daily sentiment with:
            - 'date' (datetime.date)
            - 'sent_pos' (float): Proportion of positive headlines
            - 'sent_neg' (float): Proportion of negative headlines
            - 'sent_neu' (float): Proportion of neutral headlines
    """

    if isinstance(start_date, str):
        start_date = datetime.strptime(start_date, "%Y-%m-%d")
    if isinstance(end_date, str):
        end_date = datetime.strptime(end_date, "%Y-%m-%d")

    current = start_date
    result = []

    while current <= end_date:
        from_date = current.strftime("%Y-%m-%d")
        to_date = (current + timedelta(days=1)).strftime("%Y-%m-%d")

        headlines = []  # ensure it's defined before the try block

        try:
            headlines = get_news_headlines(api_key, query_keyword, from_date, to_date)
            if verbose:
                print(f"{query_keyword} on {from_date}: {len(headlines)} headlines")
            sent = get_sentiment_breakdown(headlines)
        except Exception as e:
            print(f"Error on {from_date}: {e}")
            sent = {'sent_pos': 0, 'sent_neg': 0, 'sent_neu': 1}  # default to neutral

        result.append({
            "date": current.date(),
            "sent_pos": sent['sent_pos'],
            "sent_neg": sent['sent_neg'],
            "sent_neu": sent['sent_neu'],
        })

        current += timedelta(days=1)

    return pd.DataFrame(result)

dfs = []

for keyword in general_news_keywords:
    df = get_daily_sentiment(
        api_key=API_KEY,
        query_keyword=keyword,
        start_date=start_date,
        end_date=end_date,
        verbose=False
    )
    df['keyword'] = keyword
    dfs.append(df)

df_general_news = pd.concat(dfs, ignore_index=True)
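
The code above calls `get_sentiment_breakdown`, which is not shown in this notebook. As a rough sketch of what such a helper might compute, the block below turns per-headline labels into the three proportions that `get_daily_sentiment` expects. Everything here is an assumption for illustration: the function name `sentiment_breakdown`, the `classify` parameter, the label strings, and the keyword-based `toy_classifier` that stands in for a real model.

```python
from collections import Counter
from typing import Callable, Dict, List

# Illustrative keyword lists only; a real classifier would replace these.
POSITIVE_WORDS = {"surge", "gain", "record"}
NEGATIVE_WORDS = {"slump", "loss", "fraud"}


def toy_classifier(headline: str) -> str:
    """Stand-in classifier: label a headline by simple keyword matching."""
    words = set(headline.lower().split())
    if words & POSITIVE_WORDS:
        return "pos"
    if words & NEGATIVE_WORDS:
        return "neg"
    return "neu"


def sentiment_breakdown(
    headlines: List[str], classify: Callable[[str], str]
) -> Dict[str, float]:
    """Return the share of headlines labelled 'pos', 'neg' and 'neu'."""
    if not headlines:
        # No headlines for the day: fall back to fully neutral,
        # mirroring the default used in the error branch above.
        return {"sent_pos": 0.0, "sent_neg": 0.0, "sent_neu": 1.0}
    counts = Counter(classify(h) for h in headlines)
    total = len(headlines)
    return {
        "sent_pos": counts["pos"] / total,
        "sent_neg": counts["neg"] / total,
        "sent_neu": counts["neu"] / total,
    }


headlines = ["Stocks surge", "Shares slump", "Markets open", "Bonds gain"]
print(sentiment_breakdown(headlines, toy_classifier))
```

Passing the classifier in as an argument keeps the proportion logic separate from the model, so the keyword matcher can later be swapped for a learned one.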

## Clean Data

In [None]:
import re
import unicodedata
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')


def clean_and_tokenize(raw_text: str) -> list[str]:
    """
    Clean a brand's 'About Us' text and tokenize it.

    Returns the cleaned text as a list of word tokens (list[str]).
    """
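    # Normalize Unicode compatibility forms (full-width characters, ligatures, non-breaking spaces).
    # NFKC does not convert curly quotes or en/em dashes, so replace those dashes with a space
    # here; otherwise "pay–proudly" would collapse into "payproudly" in the filter below.
    text = unicodedata.normalize("NFKC", raw_text)
    text = re.sub(r'[\u2013\u2014]', ' ', text)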

    # Lowercase
    text = text.lower()

    # Remove weird line breaks and extra spacing
    text = re.sub(r'\n{2,}', '\n', text)      # Multiple blank lines into single ones (removes visual spacing)
    text = re.sub(r'[ \t]{2,}', ' ', text)    # Multiple spaces/tabs into just one (normalization)
    text = re.sub(r'[^a-z0-9\s]', '', text)   # Removes special characters; keeps only letters, digits, and whitespace

    text = text.strip()

    tokens = word_tokenize(text)

    return tokens
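
It can help to check what NFKC normalization actually changes before relying on it. The short sketch below uses only the standard library; the helper name `nfkc_changes` and the sample characters are chosen here for illustration. It shows that NFKC rewrites compatibility characters such as ligatures, full-width letters, and non-breaking spaces, while curly quotes and en/em dashes pass through unchanged.

```python
import unicodedata


def nfkc_changes(ch: str) -> bool:
    """Return True if NFKC normalization alters the given string."""
    return unicodedata.normalize("NFKC", ch) != ch


samples = {
    "ligature fi (U+FB01)": "\ufb01",
    "full-width A (U+FF21)": "\uff21",
    "non-breaking space (U+00A0)": "\u00a0",
    "right curly quote (U+2019)": "\u2019",
    "en dash (U+2013)": "\u2013",
    "em dash (U+2014)": "\u2014",
}

for name, ch in samples.items():
    status = "changed" if nfkc_changes(ch) else "unchanged"
    print(f"{name}: {status}")
```

Because the curly apostrophe survives normalization and is then dropped by the character filter, "we’re" ends up as the single token "were".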


In [None]:
import pandas as pd
rows = []
for brand, raw_text in brand_text.items():
    tokens = clean_and_tokenize(raw_text)
    rows.append({
        "brand": brand,
        "tokens": tokens
    })

df = pd.DataFrame(rows)
print(df)

## Pipeline

In [None]:
from transformers import pipeline

# Load sentiment pipeline
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
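
The SST-2 model loaded above is a binary classifier: for each input it returns a dict with a `label` of `POSITIVE` or `NEGATIVE` and a confidence `score`. It has no neutral class. A minimal sketch of one way to derive a three-way label is shown below, assuming low-confidence predictions are treated as neutral. The helper `label_texts`, the `neutral_below` threshold of 0.6, and the `fake_model` stub are all hypothetical; the stub only mimics the pipeline's output shape so the example runs without downloading the model.

```python
from typing import Callable, Dict, List


def label_texts(
    model: Callable[[List[str]], List[Dict]],
    texts: List[str],
    neutral_below: float = 0.6,  # assumed threshold, not tuned
) -> List[str]:
    """Map binary pipeline output to 'pos', 'neg' or 'neu' labels."""
    labels = []
    for result in model(texts):
        if result["score"] < neutral_below:
            labels.append("neu")  # low confidence treated as neutral
        elif result["label"] == "POSITIVE":
            labels.append("pos")
        else:
            labels.append("neg")
    return labels


def fake_model(texts: List[str]) -> List[Dict]:
    """Stub with the same output shape as the pipeline; scores are made up."""
    canned = {
        "great app": {"label": "POSITIVE", "score": 0.99},
        "terrible fees": {"label": "NEGATIVE", "score": 0.97},
        "it is an app": {"label": "POSITIVE", "score": 0.55},
    }
    return [canned[t] for t in texts]


print(label_texts(fake_model, ["great app", "terrible fees", "it is an app"]))
```

With the real pipeline, `label_texts(sentiment_model, texts)` should work the same way, since calling the pipeline on a list of strings returns one result dict per string.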
