# Aldo Group's AI Readiness based on the web


1. **Web Scraping**:
   - **Objective**: Gather the latest articles mentioning both "ALDO" and AI-related keywords.
   - **Alternatives**:
     - Utilize libraries like `requests` to fetch web pages and `BeautifulSoup` for parsing HTML content.
     - ✅ Use APIs connectors like Google's Programmable Search Engine.

2. **Data Storage**:
   - Store the extracted data in a CSV file and call it after: Web scraping, Preprocessing, and Sentiment Analysis

3. **Data Preprocessing**:
   - **Objective**: Clean the article content for accurate sentiment analysis.
   - **Steps**:
     - Convert text to lowercase.
     - Remove punctuation, numbers, and special characters.
     - Eliminate stop words and names.
     - Perform stemming or lemmatization to reduce words to their base forms.

4. **Sentiment Analysis**:
   - **Objective**: Determine the sentiment of each article.
   - **Tools**:
     - Use libraries like `TextBlob` or `VADER` from the `nltk` library.
   - **Process**:
     - Analyze the preprocessed content to assign a sentiment score ranging from -1 (very negative) to 1 (very positive).
     - Add a new column "Sentiment Score" to the CSV file to store the average of the two scores. 

In [None]:
# Install
#!pip install requests beautifulsoup4 textblob nltk

# Imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
import re

In [None]:
API_KEY = "replace with your key"
CX = "replace with your cx"   # Google search engine ID
QUERY = "Aldo Group AI" # Replace with your query

headers = {"user-agent": "replace with your user agent"} #Find your agent on: https://httpbin.org/user-agent

url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={CX}&q={QUERY}"

response = requests.get(url)
data = response.json()

# Get dates from search results
for item in data.get("items", []):
    print(item.get("title"), item.get("link"))


data


ALDO Group Invests in Predictive AI for Demand Forecasting - ASUG https://www.asug.com/insights/aldo-group-invests-in-predictive-ai-for-demand-forecasting
ALDO Group Prioritizes Predictive AI Over Generative AI. Here's ... https://www.cxtoday.com/customer-data-platform/aldo-group-prioritizes-predictive-ai-over-generative-ai-heres-why/
Inside ALDO's in-house generative AI and machine learning strategy ... https://digiday.com/marketing/inside-aldos-in-house-generative-ai-and-machine-learning-strategy/
ALDO Group | Contentful https://www.contentful.com/case-studies/aldo-group/
Aldo's AI Investments Include Celebrity Picks and Demand ... https://consumergoods.com/aldos-ai-investments-include-celebrity-picks-and-demand-forecasting
How Aldo puts its best foot forward with a unified e-commerce ... https://emarsys.com/why-emarsys/success-stories/how-aldo-puts-its-best-foot-forward-with-a-unified-e-commerce-system/
ALDO Group's AI Revolution | Fatih Nayebi, Ph.D. posted on the ... https://www.l

{'kind': 'customsearch#search',
 'url': {'type': 'application/json',
  'template': 'https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json'},
 'queries': {'request': [{'title': 'Google Custom Search - Aldo Group AI',
    'totalResults': '13800000',
    'searchTerms': 'Aldo Group AI',
    'count': 10,
    'startIndex': 1,
    'inputEncoding': 'utf8',
    'outputEncoding': 'utf8',
    'safe': 'off

In [None]:
def parse_date(raw_date):
    """Convert various date formats to dd-mm-yyyy"""
    try:
        # Clean up common date string issues
        raw_date = raw_date.strip().replace('\n', ' ')
        
        # Handle ISO format dates with time (e.g., "2023-12-31T15:30:00Z")
        iso_match = re.match(r'(\d{4}-\d{2}-\d{2})', raw_date)
        if iso_match:
            raw_date = iso_match.group(1)
        
        # Define priority order of date formats to try
        date_formats = [
            '%Y-%m-%d',        # ISO format
            '%d %B %Y',        # 25 December 2023
            '%B %d, %Y',       # December 25, 2023
            '%b %d, %Y',       # Dec 25, 2023
            '%d/%m/%Y',        # 25/12/2023
            '%m/%d/%Y',        # 12/25/2023
            '%d-%m-%Y',        # 25-12-2023
            '%m-%d-%Y',        # 12-25-2023
            '%d.%m.%Y',        # 25.12.2023
            '%Y%m%d',          # 20231225
        ]
        
        for fmt in date_formats:
            try:
                parsed_date = datetime.strptime(raw_date, fmt)
                return parsed_date.strftime('%d-%m-%Y')
            except ValueError:
                continue
                
        return "Unrecognized format"
    except Exception as e:
        return f"Date parse error: {str(e)}"

def extract_date_from_url(url):
    try:
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        
        # 1. Check meta tags first (most reliable)
        meta_properties = [
            ('article:published_time', None),
            ('og:published_time', None),
            ('date', None),
            ('DC.date.issued', 'name'),
            ('PublicationDate', 'name')
        ]
        
        for prop, attr in meta_properties:
            meta = soup.find('meta', {attr or 'property': prop})
            if meta and meta.get('content'):
                return parse_date(meta['content'])
        
        # 2. Search visible text patterns
        months = r"|".join([
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December",
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        ])
        
        patterns = [
            r'\d{4}-\d{2}-\d{2}',                      # ISO dates
            r'\d{1,2}/\d{1,2}/\d{4}',                   # US/UK dates
            fr'({months})[\s,]+(\d{{1,2}})[\s,]+(\d{{4}})',  # Month name dates
            r'\d{1,2}\.\d{1,2}\.\d{4}',                 # European date format
            r'\d{{1,2}}\s+(?:st|nd|rd|th)\s+{months}\s+\d{{4}}'  # "25th December 2023"
        ]
        
        for pattern in patterns:
            matches = re.finditer(pattern, soup.get_text(), re.IGNORECASE)
            for match in matches:
                date_str = match.group().replace(',', '').strip()
                parsed = parse_date(date_str)
                if "error" not in parsed.lower():
                    return parsed
        
        return "Date not found"
   
    except Exception as e:
        return f"Error fetching date: {str(e)}"


for item in data.get("items", []):
    url = item.get("link")
    date = extract_date_from_url(url)
    print(f"Title: {item.get('title')}")
    print(f"Date: {date}\n")

Title: ALDO Group Invests in Predictive AI for Demand Forecasting - ASUG
Date: Unrecognized format

Title: ALDO Group Prioritizes Predictive AI Over Generative AI. Here's ...
Date: Date not found

Title: Inside ALDO's in-house generative AI and machine learning strategy ...
Date: 08-07-2024

Title: ALDO Group | Contentful
Date: Date not found

Title: Aldo's AI Investments Include Celebrity Picks and Demand ...
Date: 29-07-2024

Title: How Aldo puts its best foot forward with a unified e-commerce ...
Date: Unrecognized format

Title: ALDO Group's AI Revolution | Fatih Nayebi, Ph.D. posted on the ...
Date: Date not found

Title: DataSphere Lab & ALDO Group Partnership: Shaping the Future of ...
Date: Date not found

Title: SCALE AI is announcing investments of more than $20 million in 5 ...
Date: 28-09-2023

Title: Retail Gen AI Hackathon | Bensadoun School of Retail Management ...
Date: Unrecognized format



In [17]:
# Add dates to the dataframe
df = pd.DataFrame(data['items'])
df['date'] = df['link'].apply(extract_date_from_url)
keep = ["title", "date", "link", "displayLink", "snippet"]
df = df[keep] if not df.empty else pd.DataFrame(columns=keep)

df


Unnamed: 0,title,date,link,displayLink,snippet
0,ALDO Group Invests in Predictive AI for Demand...,Unrecognized format,https://www.asug.com/insights/aldo-group-inves...,www.asug.com,"Aug 16, 2024 ... ALDO Group is helping to deve..."
1,ALDO Group Prioritizes Predictive AI Over Gene...,Date not found,https://www.cxtoday.com/customer-data-platform...,www.cxtoday.com,"Aug 1, 2024 ... Matthieu Houle, CIO of ALDO Gr..."
2,Inside ALDO's in-house generative AI and machi...,08-07-2024,https://digiday.com/marketing/inside-aldos-in-...,digiday.com,"Jul 8, 2024 ... Amidst the artificial intellig..."
3,ALDO Group | Contentful,Date not found,https://www.contentful.com/case-studies/aldo-g...,www.contentful.com,Content plays a key role in the ALDO Group's e...
4,Aldo's AI Investments Include Celebrity Picks ...,29-07-2024,https://consumergoods.com/aldos-ai-investments...,consumergoods.com,"Jul 29, 2024 ... As part of its artificial int..."
5,How Aldo puts its best foot forward with a uni...,Unrecognized format,https://emarsys.com/why-emarsys/success-storie...,emarsys.com,"The Business. With its signature brands ALDO, ..."
6,"ALDO Group's AI Revolution | Fatih Nayebi, Ph....",Date not found,https://www.linkedin.com/posts/thefatih_2023-i...,www.linkedin.com,"Jul 12, 2024 ... A comprehensive document show..."
7,DataSphere Lab & ALDO Group Partnership: Shapi...,Date not found,https://www.mcgill.ca/bensadoun-school/data-sp...,www.mcgill.ca,The DataSphere Lab (DSL) at McGill University'...
8,SCALE AI is announcing investments of more tha...,28-09-2023,https://www.scaleai.ca/scale-ai-is-announcing-...,www.scaleai.ca,"Sep 28, 2023 ... The ALDO Group, a leading Can..."
9,Retail Gen AI Hackathon | Bensadoun School of ...,Unrecognized format,https://www.mcgill.ca/bensadoun-school/feature...,www.mcgill.ca,This first edition of the Retail Gen AI Hackat...


In [18]:
# Function to fetch text after the title
def get_text(url, title):
    try:
        response = requests.get(url, headers=headers, timeout=15)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        
        # Extract text from all relevant tags
        text_elements = soup.find_all(['p', 'div', 'span'])
        text = " ".join([element.get_text() for element in text_elements])
        
        # Split the text into words
        words = text.split()
        
        # Try to find the title in the text
        title_words = title.split()
        title_index = None

        for i in range(len(words) - len(title_words) + 1):
            if words[i:i + len(title_words)] == title_words:
                title_index = i + len(title_words)  # Start after title
                break

        # If title found, extract 300 words after it
        if title_index is not None and title_index + 300 < len(words):
            return " ".join(words[title_index:title_index + 300])
        else:
            return " ".join(words[:300])  # Fallback: First 300 words
    
    except Exception as e:
        print(f"Error fetching text from {url}: {e}")
        return None
    
# call the function
url = [link for link in df["link"]]
# take first five words in the title and make single string
title_5 = [" ".join(title.split()[:5]) for title in df["title"]]


# Get text in url and add to the dataframe
df["body"] = [get_text(url, title) for url, title in zip(url, title_5)]
df


Error fetching text from https://www.cxtoday.com/customer-data-platform/aldo-group-prioritizes-predictive-ai-over-generative-ai-heres-why/: 403 Client Error: Forbidden for url: https://www.cxtoday.com/customer-data-platform/aldo-group-prioritizes-predictive-ai-over-generative-ai-heres-why/


Unnamed: 0,title,date,link,displayLink,snippet,body
0,ALDO Group Invests in Predictive AI for Demand...,Unrecognized format,https://www.asug.com/insights/aldo-group-inves...,www.asug.com,"Aug 16, 2024 ... ALDO Group is helping to deve...",AI for Demand Forecasting Author ASUG Staff Pu...
1,ALDO Group Prioritizes Predictive AI Over Gene...,Date not found,https://www.cxtoday.com/customer-data-platform...,www.cxtoday.com,"Aug 1, 2024 ... Matthieu Houle, CIO of ALDO Gr...",
2,Inside ALDO's in-house generative AI and machi...,08-07-2024,https://digiday.com/marketing/inside-aldos-in-...,digiday.com,"Jul 8, 2024 ... Amidst the artificial intellig...",Subscribe | Login Reader Digiday+ Member Subsc...
3,ALDO Group | Contentful,Date not found,https://www.contentful.com/case-studies/aldo-g...,www.contentful.com,Content plays a key role in the ALDO Group's e...,LivestreamJoin us March 31 as we unveil the fu...
4,Aldo's AI Investments Include Celebrity Picks ...,29-07-2024,https://consumergoods.com/aldos-ai-investments...,consumergoods.com,"Jul 29, 2024 ... As part of its artificial int...",Subscribe Subscribe Subscribe Subscribe News L...
5,How Aldo puts its best foot forward with a uni...,Unrecognized format,https://emarsys.com/why-emarsys/success-storie...,emarsys.com,"The Business. With its signature brands ALDO, ...",foot forward with a unified e-commerce system ...
6,"ALDO Group's AI Revolution | Fatih Nayebi, Ph....",Date not found,https://www.linkedin.com/posts/thefatih_2023-i...,www.linkedin.com,"Jul 12, 2024 ... A comprehensive document show...",LinkedIn and 3rd parties use essential and non...
7,DataSphere Lab & ALDO Group Partnership: Shapi...,Date not found,https://www.mcgill.ca/bensadoun-school/data-sp...,www.mcgill.ca,The DataSphere Lab (DSL) at McGill University'...,Partnership Retail Gen AI Hackathon Our Team C...
8,SCALE AI is announcing investments of more tha...,28-09-2023,https://www.scaleai.ca/scale-ai-is-announcing-...,www.scaleai.ca,"Sep 28, 2023 ... The ALDO Group, a leading Can...",of more than $20 million in 5 AI projects for ...
9,Retail Gen AI Hackathon | Bensadoun School of ...,Unrecognized format,https://www.mcgill.ca/bensadoun-school/feature...,www.mcgill.ca,This first edition of the Retail Gen AI Hackat...,"Fall 2024 September 26 - October 4, 2024 The s..."


In [20]:
# save the dataframe to csv
df.to_csv("output data/aldo_AI_search_results.csv", index=False)

## Data Preprocessing

In [2]:
import nltk
from nltk.corpus import stopwords, names
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords, names
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer

import pandas as pd
import nltk
import re
import string
from nltk.corpus import stopwords, names
from nltk.tokenize import word_tokenize


import string
import re


#nltk.download('stopwords')
#nltk.download('names')
#nltk.download('punkt')
#nltk.download('punkt_tab')


In [3]:
search_results = pd.read_csv("output data/aldo_AI_search_results.csv")

# Drop rows where body is missing
search_results = search_results.dropna(subset=["body"])

search_results.shape

(9, 6)

In [4]:
stop_words = set(stopwords.words('english'))
names_dict = set(names.words())
names_dict = {key.lower() for key in names_dict}

print("Names in the dictionary: ", list(names_dict)[:10], "etc")
print("Stop words in the dictionary: ", list(stop_words)[:10], "etc")

Names in the dictionary:  ['dwaine', 'josephina', 'amory', 'jacky', 'webb', 'wood', 'joselyn', 'leo', 'corabelle', 'elvin'] etc
Stop words in the dictionary:  ['here', 't', 'these', 'were', 'on', 'not', 'y', 'shouldn', 'when', "i've"] etc


In [5]:
stop_words = set(stopwords.words('english'))
names_dict = set(names.words())

def clean_text(text):
    if not isinstance(text, str):
        return ""
    
    # Text cleaning steps
    text = re.sub(r"[^a-zA-Z0-9.']+", " ", text)
    text = text.lower()
    text = re.sub(r"\d+(\.\d+)?%", " ", text)
    text = re.sub(r"\b\d+(\.\d+)?\b", " ", text)
    text = re.sub(r"www\.[^\s]+", "", text)
    text = re.sub(r"http[^\s]+", "", text)
    text = re.sub(r"\b\S*\.com\b", "", text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r"\s+", " ", text).strip()
    
    words = word_tokenize(text)
    words = [word for word in words 
             if word not in stop_words
             and word not in names_dict
             and word.capitalize() not in names_dict]
    
    return " ".join(words)

# Clean the text data
search_results['cleaned_text'] = search_results['body'].fillna('').apply(clean_text)

# Save results
#df.to_csv("cleaned_data.csv", index=False)

search_results

Unnamed: 0,title,date,link,displayLink,snippet,body,cleaned_text
0,ALDO Group Invests in Predictive AI for Demand...,Unrecognized format,https://www.asug.com/insights/aldo-group-inves...,www.asug.com,"Aug 16, 2024 ... ALDO Group is helping to deve...",AI for Demand Forecasting Author ASUG Staff Pu...,ai demand forecasting author asug staff publis...
2,Inside ALDO's in-house generative AI and machi...,08-07-2024,https://digiday.com/marketing/inside-aldos-in-...,digiday.com,"Jul 8, 2024 ... Amidst the artificial intellig...",Subscribe | Login Reader Digiday+ Member Subsc...,subscribe login reader digiday member subscrib...
3,ALDO Group | Contentful,Date not found,https://www.contentful.com/case-studies/aldo-g...,www.contentful.com,Content plays a key role in the ALDO Group's e...,LivestreamJoin us March 31 as we unveil the fu...,livestreamjoin us march unveil future digital ...
4,Aldo's AI Investments Include Celebrity Picks ...,29-07-2024,https://consumergoods.com/aldos-ai-investments...,consumergoods.com,"Jul 29, 2024 ... As part of its artificial int...",Subscribe Subscribe Subscribe Subscribe News L...,subscribe subscribe subscribe subscribe news l...
5,How Aldo puts its best foot forward with a uni...,Unrecognized format,https://emarsys.com/why-emarsys/success-storie...,emarsys.com,"The Business. With its signature brands ALDO, ...",foot forward with a unified e-commerce system ...,foot forward unified e commerce system puts be...
6,"ALDO Group's AI Revolution | Fatih Nayebi, Ph....",Date not found,https://www.linkedin.com/posts/thefatih_2023-i...,www.linkedin.com,"Jul 12, 2024 ... A comprehensive document show...",LinkedIn and 3rd parties use essential and non...,linkedin 3rd parties use essential non essenti...
7,DataSphere Lab & ALDO Group Partnership: Shapi...,Date not found,https://www.mcgill.ca/bensadoun-school/data-sp...,www.mcgill.ca,The DataSphere Lab (DSL) at McGill University'...,Partnership Retail Gen AI Hackathon Our Team C...,partnership retail ai hackathon team contact u...
8,SCALE AI is announcing investments of more tha...,28-09-2023,https://www.scaleai.ca/scale-ai-is-announcing-...,www.scaleai.ca,"Sep 28, 2023 ... The ALDO Group, a leading Can...",of more than $20 million in 5 AI projects for ...,million ai projects intelligent supply chains ...
9,Retail Gen AI Hackathon | Bensadoun School of ...,Unrecognized format,https://www.mcgill.ca/bensadoun-school/feature...,www.mcgill.ca,This first edition of the Retail Gen AI Hackat...,"Fall 2024 September 26 - October 4, 2024 The s...",fall september october second edition bensadou...


In [None]:
def lemma_stem_text(text):
    # Tokenize the text
    words = word_tokenize(text)
    
    # Lemmatize the words
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(word) for word in words]
    
    # Stem the words
    stemmer = PorterStemmer()
    words = [stemmer.stem(word) for word in words]
    
    return " ".join(words)

# Apply the function to the cleaned text
search_results["lemmatized_stemmed_text"] = search_results["cleaned_text"].apply(lemma_stem_text)
search_results

Unnamed: 0,title,date,link,displayLink,snippet,body,cleaned_text,lemmatized_stemmed_text
0,ALDO Group Invests in Predictive AI for Demand...,Unrecognized format,https://www.asug.com/insights/aldo-group-inves...,www.asug.com,"Aug 16, 2024 ... ALDO Group is helping to deve...",AI for Demand Forecasting Author ASUG Staff Pu...,ai demand forecasting author asug staff publis...,ai demand forecast author asug staff publish r...
2,Inside ALDO's in-house generative AI and machi...,08-07-2024,https://digiday.com/marketing/inside-aldos-in-...,digiday.com,"Jul 8, 2024 ... Amidst the artificial intellig...",Subscribe | Login Reader Digiday+ Member Subsc...,subscribe login reader digiday member subscrib...,subscrib login reader digiday member subscrib ...
3,ALDO Group | Contentful,Date not found,https://www.contentful.com/case-studies/aldo-g...,www.contentful.com,Content plays a key role in the ALDO Group's e...,LivestreamJoin us March 31 as we unveil the fu...,livestreamjoin us march unveil future digital ...,livestreamjoin u march unveil futur digit expe...
4,Aldo's AI Investments Include Celebrity Picks ...,29-07-2024,https://consumergoods.com/aldos-ai-investments...,consumergoods.com,"Jul 29, 2024 ... As part of its artificial int...",Subscribe Subscribe Subscribe Subscribe News L...,subscribe subscribe subscribe subscribe news l...,subscrib subscrib subscrib subscrib news lates...
5,How Aldo puts its best foot forward with a uni...,Unrecognized format,https://emarsys.com/why-emarsys/success-storie...,emarsys.com,"The Business. With its signature brands ALDO, ...",foot forward with a unified e-commerce system ...,foot forward unified e commerce system puts be...,foot forward unifi e commerc system put best f...
6,"ALDO Group's AI Revolution | Fatih Nayebi, Ph....",Date not found,https://www.linkedin.com/posts/thefatih_2023-i...,www.linkedin.com,"Jul 12, 2024 ... A comprehensive document show...",LinkedIn and 3rd parties use essential and non...,linkedin 3rd parties use essential non essenti...,linkedin 3rd parti use essenti non essenti coo...
7,DataSphere Lab & ALDO Group Partnership: Shapi...,Date not found,https://www.mcgill.ca/bensadoun-school/data-sp...,www.mcgill.ca,The DataSphere Lab (DSL) at McGill University'...,Partnership Retail Gen AI Hackathon Our Team C...,partnership retail ai hackathon team contact u...,partnership retail ai hackathon team contact u...
8,SCALE AI is announcing investments of more tha...,28-09-2023,https://www.scaleai.ca/scale-ai-is-announcing-...,www.scaleai.ca,"Sep 28, 2023 ... The ALDO Group, a leading Can...",of more than $20 million in 5 AI projects for ...,million ai projects intelligent supply chains ...,million ai project intellig suppli chain back ...
9,Retail Gen AI Hackathon | Bensadoun School of ...,Unrecognized format,https://www.mcgill.ca/bensadoun-school/feature...,www.mcgill.ca,This first edition of the Retail Gen AI Hackat...,"Fall 2024 September 26 - October 4, 2024 The s...",fall september october second edition bensadou...,fall septemb octob second edit bensadoun schoo...


In [17]:
# Save the results
search_results.to_csv("output data/preprocessed_aldo_AI_search_results.csv", index=False)

## Sentiment Analysis

## About VADER and TextBlob
* VADER was initially used for social media text analytics. It is even able to handle emojis!
    - VADER is lexicon based
* TextBlob is strong at analyzing longer text.
    - TextBlob is rule based

Both score ranges are between -1 (negative sentiment) and 1 (positive sentiment)

In [27]:
from textblob import TextBlob
from nltk.sentiment import SentimentIntensityAnalyzer

#nltk.download('vader_lexicon')

In [28]:
# look at vader_lexicon
sia = SentimentIntensityAnalyzer()
sia.lexicon

{'$:': -1.5,
 '%)': -0.4,
 '%-)': -1.5,
 '&-:': -0.4,
 '&:': -0.7,
 "( '}{' )": 1.6,
 '(%': -0.9,
 "('-:": 2.2,
 "(':": 2.3,
 '((-:': 2.1,
 '(*': 1.1,
 '(-%': -0.7,
 '(-*': 1.3,
 '(-:': 1.6,
 '(-:0': 2.8,
 '(-:<': -0.4,
 '(-:o': 1.5,
 '(-:O': 1.5,
 '(-:{': -0.1,
 '(-:|>*': 1.9,
 '(-;': 1.3,
 '(-;|': 2.1,
 '(8': 2.6,
 '(:': 2.2,
 '(:0': 2.4,
 '(:<': -0.2,
 '(:o': 2.5,
 '(:O': 2.5,
 '(;': 1.1,
 '(;<': 0.3,
 '(=': 2.2,
 '(?:': 2.1,
 '(^:': 1.5,
 '(^;': 1.5,
 '(^;0': 2.0,
 '(^;o': 1.9,
 '(o:': 1.6,
 ")':": -2.0,
 ")-':": -2.1,
 ')-:': -2.1,
 ')-:<': -2.2,
 ')-:{': -2.1,
 '):': -1.8,
 '):<': -1.9,
 '):{': -2.3,
 ');<': -2.6,
 '*)': 0.6,
 '*-)': 0.3,
 '*-:': 2.1,
 '*-;': 2.4,
 '*:': 1.9,
 '*<|:-)': 1.6,
 '*\\0/*': 2.3,
 '*^:': 1.6,
 ',-:': 1.2,
 "---'-;-{@": 2.3,
 '--<--<@': 2.2,
 '.-:': -1.2,
 '..###-:': -1.7,
 '..###:': -1.9,
 '/-:': -1.3,
 '/:': -1.3,
 '/:<': -1.4,
 '/=': -0.9,
 '/^:': -1.0,
 '/o:': -1.4,
 '0-8': 0.1,
 '0-|': -1.2,
 '0:)': 1.9,
 '0:-)': 1.4,
 '0:-3': 1.5,
 '0:03': 1.9,
 '

In [29]:
search_results = pd.read_csv("output data/preprocessed_aldo_AI_search_results.csv")

In [None]:
sia = SentimentIntensityAnalyzer()

def get_sentiment(text):
    if not text.strip():
        return 0  # Neutral for empty text
    blob_score = TextBlob(text).sentiment.polarity
    vader_score = sia.polarity_scores(text)['compound']
    
    # Average
    average = (blob_score + vader_score) / 2
    
    # Return individual 
    return blob_score, vader_score, average

# Add each score to a separate column
search_results['blob_score'], search_results['vader_score'], search_results['sentiment_score'] = \
    zip(*search_results['lemmatized_stemmed_text'].apply(get_sentiment))

search_results



Unnamed: 0,title,date,link,displayLink,snippet,body,cleaned_text,lemmatized_stemmed_text,blob_score,vader_score,sentiment_score
0,ALDO Group Invests in Predictive AI for Demand...,Unrecognized format,https://www.asug.com/insights/aldo-group-inves...,www.asug.com,"Aug 16, 2024 ... ALDO Group is helping to deve...",AI for Demand Forecasting Author ASUG Staff Pu...,ai demand forecasting author asug staff publis...,ai demand forecast author asug staff publish r...,0.076984,0.9719,0.524442
1,Inside ALDO's in-house generative AI and machi...,08-07-2024,https://digiday.com/marketing/inside-aldos-in-...,digiday.com,"Jul 8, 2024 ... Amidst the artificial intellig...",Subscribe | Login Reader Digiday+ Member Subsc...,subscribe login reader digiday member subscrib...,subscrib login reader digiday member subscrib ...,0.087302,0.9648,0.526051
2,ALDO Group | Contentful,Date not found,https://www.contentful.com/case-studies/aldo-g...,www.contentful.com,Content plays a key role in the ALDO Group's e...,LivestreamJoin us March 31 as we unveil the fu...,livestreamjoin us march unveil future digital ...,livestreamjoin u march unveil futur digit expe...,0.31,0.9694,0.6397
3,Aldo's AI Investments Include Celebrity Picks ...,29-07-2024,https://consumergoods.com/aldos-ai-investments...,consumergoods.com,"Jul 29, 2024 ... As part of its artificial int...",Subscribe Subscribe Subscribe Subscribe News L...,subscribe subscribe subscribe subscribe news l...,subscrib subscrib subscrib subscrib news lates...,0.106463,0.9856,0.546031
4,How Aldo puts its best foot forward with a uni...,Unrecognized format,https://emarsys.com/why-emarsys/success-storie...,emarsys.com,"The Business. With its signature brands ALDO, ...",foot forward with a unified e-commerce system ...,foot forward unified e commerce system puts be...,foot forward unifi e commerc system put best f...,0.175685,0.9801,0.577893
5,"ALDO Group's AI Revolution | Fatih Nayebi, Ph....",Date not found,https://www.linkedin.com/posts/thefatih_2023-i...,www.linkedin.com,"Jul 12, 2024 ... A comprehensive document show...",LinkedIn and 3rd parties use essential and non...,linkedin 3rd parties use essential non essenti...,linkedin 3rd parti use essenti non essenti coo...,0.0,0.6908,0.3454
6,DataSphere Lab & ALDO Group Partnership: Shapi...,Date not found,https://www.mcgill.ca/bensadoun-school/data-sp...,www.mcgill.ca,The DataSphere Lab (DSL) at McGill University'...,Partnership Retail Gen AI Hackathon Our Team C...,partnership retail ai hackathon team contact u...,partnership retail ai hackathon team contact u...,0.1,0.6249,0.36245
7,SCALE AI is announcing investments of more tha...,28-09-2023,https://www.scaleai.ca/scale-ai-is-announcing-...,www.scaleai.ca,"Sep 28, 2023 ... The ALDO Group, a leading Can...",of more than $20 million in 5 AI projects for ...,million ai projects intelligent supply chains ...,million ai project intellig suppli chain back ...,-0.067803,0.9794,0.455798
8,Retail Gen AI Hackathon | Bensadoun School of ...,Unrecognized format,https://www.mcgill.ca/bensadoun-school/feature...,www.mcgill.ca,This first edition of the Retail Gen AI Hackat...,"Fall 2024 September 26 - October 4, 2024 The s...",fall september october second edition bensadou...,fall septemb octob second edit bensadoun schoo...,0.119318,0.9855,0.552409


In [31]:
# Save the results
search_results.to_csv("output data/sentiment_aldo_AI_search_results.csv", index=False)

## Improvements for the Next Round:
- Get article dates from delimiter "..." in snippet
- Use more sentiment analyzers like BERT and customized BERTs
- Explore topic analytics, text analytics from each article to add understanding