### Recommender System for Chocolate Confectionery Products
This file builds a recommender system from previously gathered and analysed data and provides the user with the following:
- Sustainability scores for each chocolate confectionery product
- A recommendation of a similar chocolate confectionery product that is more sustainable
- The top XXX news articles on environmental, economic and social sustainability

The code is taken from Azaough (2022) and has been adjusted and extended to produce the system below. The file starts with data pre-processing,<br>
after which the scoring is defined and the final system is presented.

### Data pre-processing

----------------------------------------------------------------------------

In [1]:
#Importing the necessary libraries for data pre-processing
import numpy as np  #For numerical computations
import pandas as pd #Data analysis and manipulation
import pickle #For working with pickle files
import csv #For working with csv files
from sklearn.metrics.pairwise import cosine_similarity #Calculating cosine similarity
from sklearn.feature_extraction.text import CountVectorizer #Convert text data into matrix of token counts
import ast #Converting string that looks like a list into list
from ast import literal_eval #Converting string that looks like a list into list
import tkinter as tk #Used to get the display screen
from tkinter import ttk #Used to get the display screen

### Importing the datasets
First, the required datasets are loaded.

In [2]:
#Importing the Albert Heijn dataset
ah_file_path = r"C:\Users\ghuiskens\Thesis\Chocolate scrapers\AH_scraper\ah_chocolate.csv"
ah_df = pd.read_csv(ah_file_path, encoding='latin-1')
ah_df.head(1)

Unnamed: 0,product_id,product_naam,prijs,kilo_prijs,omschrijving,inhoud,ingredienten,kenmerken,allergie_bevat,allergie_kan_bevatten,leverancier,adres_leverancier,product_url
0,768327,AH Chocolade zeevruchten,1.99,7.96,"['Chocoladebonbons', 'met 44% hazelnootpralinÃ...",250 Gram,"['IngrediÃ«nten: suiker, 23% ', 'Â°, cacaobote...",['Papier en/of hout van gecertificeerde herkom...,"Melk, Noten, Lactose, Hazelnoot","Amandel, Cashewnoot, Macadamianoot, Pecannoot,...",Albert Heijn,"Provincialeweg 11, 1506 MA ZAANDAM, Nederland",https://www.ah.nl/producten/product/wi172425/a...
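
The garbled characters in the output above (for example `IngrediÃ«nten`) are what UTF-8 text typically looks like when it is decoded as `latin-1`. A minimal sketch, assuming the scraped CSV files are actually UTF-8 encoded, is to try UTF-8 first and fall back to `latin-1` only when decoding fails. The helper name `read_csv_with_fallback` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return the first DataFrame that decodes without a UnicodeDecodeError."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            # Remember the failure and try the next candidate encoding
            last_error = error
    raise last_error
```

If the files do turn out to be UTF-8, label names that contain non-ASCII characters would change as well, so any hard-coded column names further down would need to be checked against the re-read data.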


In [3]:
#Importing the Plus dataset
plus_file_path = r"C:\Users\ghuiskens\Thesis\Chocolate scrapers\Plus_scraper\plus_chocolate.csv"
plus_df = pd.read_csv(plus_file_path, encoding='latin-1')
plus_df.head(1)

Unnamed: 0,product_id,product_naam,prijs,kilo_prijs,omschrijving,inhoud,ingredienten,kenmerken,allergie_bevat,allergie_kan_bevatten,leverancier,adres_leverancier,product_url
0,325711.0,Chocoladetablet Puur hazelnoot,1.39,13.9,Pure chocolade met hele hazelnoten (25 %). Cho...,100 gram,"['suiker, cacaomassa, HAZELNOTEN, cacaoboter, ...",['Rainforest Alliance'],"['Melk', 'Noten', 'Soja']",,Alfred Ritter,D-71111 Waldenbuch Deutschland/Germany/Allemagne,https://www.plus.nl/product/ritter-sport-choco...


In [4]:
#Combining the Albert Heijn and Plus datasets into one
#Create a new column for each dataframe to be able to recommend products from the same supermarket
ah_df['supermarket'] = 'albert heijn'
plus_df['supermarket'] = 'plus'

#Merge them together
supermarket_product_df = pd.concat([ah_df, plus_df], ignore_index = True)

#Drop the columns that are not needed
supermarket_product_df = supermarket_product_df.drop(columns=['allergie_bevat', 'allergie_kan_bevatten', 'adres_leverancier'])
supermarket_product_df.head(1)

Unnamed: 0,product_id,product_naam,prijs,kilo_prijs,omschrijving,inhoud,ingredienten,kenmerken,leverancier,product_url,supermarket
0,768327.0,AH Chocolade zeevruchten,1.99,7.96,"['Chocoladebonbons', 'met 44% hazelnootpralinÃ...",250 Gram,"['IngrediÃ«nten: suiker, 23% ', 'Â°, cacaobote...",['Papier en/of hout van gecertificeerde herkom...,Albert Heijn,https://www.ah.nl/producten/product/wi172425/a...,albert heijn


In [5]:
#Importing the classification and news article dataset
classification_pickle_file_path = r"C:\Users\ghuiskens\Thesis\News analysis\News classification\gdelt_classification_input_recommender.pkl"

#Load the serialized DataFrame from the pickle file
with open(classification_pickle_file_path, 'rb') as f:
    classification_loaded_serialized_df = f.read()

#Deserialize the pickled DataFrame
gdelt_article_df = pickle.loads(classification_loaded_serialized_df)
gdelt_article_df.head(1)

Unnamed: 0,published,text,title,url,producer_in_article,social_sustainability,environmental_sustainability,economic_sustainability
0,2022-03-15 05:45:00,[These ancient creatures can squeeze through t...,Progressive Charlestown,http://www.progressive-charlestown.com/search?...,[copar],0.379858,0.420189,0.487699
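
Reading the raw bytes and calling `pickle.loads` works, but pandas also offers `pd.read_pickle`, which does both steps in one call. The sketch below is one possible shorter form; the helper name `load_pickled_frame` and the type check are assumptions added for illustration. As with any pickle, only files from a trusted source should be loaded.

```python
import pandas as pd


def load_pickled_frame(path):
    """Load a pickled object and make sure it really is a DataFrame."""
    frame = pd.read_pickle(path)
    if not isinstance(frame, pd.DataFrame):
        raise TypeError(f"Expected a DataFrame in {path!r}, got {type(frame).__name__}")
    return frame
```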


In [6]:
#Importing the sentiment score dataset
sentiment_pickle_file_path =  r'C:\Users\ghuiskens\Thesis\News analysis\News sentiment\sentiment_input_recommender.pkl'

#Load the serialized DataFrame from the pickle file
with open(sentiment_pickle_file_path, 'rb') as f:
    sentiment_loaded_serialized_df = f.read()

#Deserialize the pickled DataFrame
sentiment_score_df = pickle.loads(sentiment_loaded_serialized_df)
sentiment_score_df.head(1)

Unnamed: 0,producer,normalized_sentiment_score
0,haribo,0.4788


In [7]:
#Merge the sentiment scores with the product dataframe
supermarket_product_df = pd.merge(supermarket_product_df, sentiment_score_df, left_on='leverancier', right_on='producer', how='left')

#Create a new column 'sentiment_rating' and fill it with normalized_sentiment_score where applicable
supermarket_product_df['sentiment_rating'] = supermarket_product_df['normalized_sentiment_score']

#Drop the 'producer' and 'normalized_sentiment_score' columns as they are no longer needed
supermarket_product_df.drop(['producer', 'normalized_sentiment_score'], axis=1, inplace=True)

#Replace NaN values with 0.0 so that products without a sentiment rating get a neutral default instead of a missing value
supermarket_product_df['sentiment_rating'] = supermarket_product_df['sentiment_rating'].fillna(0.0)
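
The sample rows shown earlier have `leverancier` values such as "Albert Heijn" and `producer` values such as "haribo", so an exact-match merge may miss suppliers that differ only in capitalisation or surrounding whitespace. The sketch below shows one way to merge on a normalised key instead. The toy data, the helper `normalise_key` and the temporary column `producer_key` are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for the product and sentiment frames
products = pd.DataFrame({"leverancier": ["Albert Heijn", "Haribo ", "Alfred Ritter"]})
sentiment = pd.DataFrame({"producer": ["haribo"], "normalized_sentiment_score": [0.4788]})


def normalise_key(series):
    """Lower-case and strip whitespace so that 'Haribo ' matches 'haribo'."""
    return series.str.strip().str.lower()


products["producer_key"] = normalise_key(products["leverancier"])
sentiment["producer_key"] = normalise_key(sentiment["producer"])

# validate="many_to_one" raises if a producer appears twice in the sentiment table
merged = products.merge(
    sentiment[["producer_key", "normalized_sentiment_score"]],
    on="producer_key",
    how="left",
    validate="many_to_one",
)
merged["sentiment_rating"] = merged["normalized_sentiment_score"].fillna(0.0)
merged = merged.drop(columns=["producer_key", "normalized_sentiment_score"])
```

Whether lower-casing is enough depends on how the producer names were derived in the sentiment analysis, which is not shown in this file.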

### Cleaning data
_______________________

In [8]:
#Convert product_id to an integer; astype(int) raises an error if any product_id is missing
supermarket_product_df['product_id'] = supermarket_product_df['product_id'].astype(int)

#Replace empty strings with missing values
supermarket_product_df['product_naam'] = supermarket_product_df['product_naam'].str.strip().replace('', np.nan)
supermarket_product_df['omschrijving'] = supermarket_product_df['omschrijving'].str.strip().replace('', np.nan)
supermarket_product_df['inhoud'] = supermarket_product_df['inhoud'].str.strip().replace('', np.nan)
supermarket_product_df['ingredienten'] = supermarket_product_df['ingredienten'].str.strip().replace('', np.nan)

In [9]:
#Drop rows missing a product_id
supermarket_product_df = supermarket_product_df.dropna(subset=['product_id'])
supermarket_product_df = supermarket_product_df.reset_index(drop=True)

#Drop rows with NaN values in the 'producer_in_article' column
gdelt_article_df = gdelt_article_df.dropna(subset=['producer_in_article'])

#Also remove rows with empty lists
gdelt_article_df = gdelt_article_df[gdelt_article_df['producer_in_article'].apply(lambda x: len(x) > 0)]

In [10]:
#Replace NaN values with empty lists
supermarket_product_df['kenmerken'] = supermarket_product_df['kenmerken'].fillna('[]')

#Convert the string representations into actual lists
supermarket_product_df['kenmerken'] = supermarket_product_df['kenmerken'].apply(ast.literal_eval)

#Make dummies out of relevant features in the kenmerken column
kenmerken = pd.get_dummies(supermarket_product_df['kenmerken'].apply(pd.Series).stack(dropna=False)).groupby(level=0).sum()
kenmerken['Rainforest Alliance'] = kenmerken['Rainforest Alliance'] + kenmerken['Rainforest Alliance people & nature']
kenmerken["Tony's Open Chain"] = kenmerken["Tony's Open Chain"] + kenmerken['Samen maken we chocolade 100% slaafvrij']
kenmerken['UTZ'] = kenmerken['UTZ'] + kenmerken['UTZ Cocoa']
kenmerken['EU-biologisch'] = kenmerken['Europa Bio'] + kenmerken['Biologisch']
kenmerken['Fairtrade'] = kenmerken['Fairtrade'] + kenmerken['FairTrade COCOA']

#Drop irrelevant kenmerken or ones that have been unified under one name
kenmerken.drop('ConformitÃ© EuropÃ©enne (CE) â EU Conformance', axis=1, inplace=True)
kenmerken.drop('Europa Bio', axis=1, inplace=True)
kenmerken.drop('Biologisch', axis=1, inplace=True)
kenmerken.drop('UTZ Cocoa', axis=1, inplace=True)
kenmerken.drop('FairTrade COCOA', axis=1, inplace=True)
kenmerken.drop('Samen maken we chocolade 100% slaafvrij', axis=1, inplace=True)
kenmerken.drop('Papier en/of hout van gecertificeerde herkomst', axis=1, inplace=True)
kenmerken.drop('Rainforest Alliance people & nature', axis=1, inplace=True)
kenmerken.drop('Lactosevrij', axis=1, inplace=True)
kenmerken.drop('Vegetarisch', axis=1, inplace=True)
kenmerken.drop('Glutenvrij', axis=1, inplace=True)
kenmerken.drop('Gluten vrij', axis=1, inplace=True)
kenmerken.drop('Groene Punt', axis=1, inplace=True)
kenmerken.drop('Triman', axis=1, inplace=True)
kenmerken.drop('Society of the Plastics Industry (SPI)', axis=1, inplace=True)
kenmerken.drop('Sustainable Palm Oil RSPO Certified', axis=1, inplace=True)
kenmerken.drop('Recyclebaar', axis=1, inplace=True)
kenmerken.drop('Veganistisch', axis=1, inplace=True)
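
Adding two indicator columns, as done above for the synonymous labels, gives 2 for a product that carries both labels, and a later `== 1` comparison would then treat that product as uncertified. A minimal sketch of an alternative, assuming the synonyms are mapped to one name before encoding and the maximum is taken per product so the result stays binary. The toy labels and the `synonyms` mapping are illustrative and cover only one label pair.

```python
import pandas as pd

# Toy label lists: one product with two synonymous labels, one without labels
labels = pd.Series([["Fairtrade", "UTZ Cocoa", "UTZ"], [], ["UTZ"]], name="kenmerken")

# Hypothetical synonym table; the notebook unifies several more label pairs
synonyms = {"UTZ Cocoa": "UTZ"}

# One row per (product, label); empty lists become a single NaN row
exploded = labels.explode().map(lambda label: synonyms.get(label, label))

# One 0/1 column per label, collapsed back to one row per product
indicators = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
```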



In [11]:
#Join the kenmerken back with the supermarket df
supermarket_df = pd.merge(supermarket_product_df, kenmerken, left_index=True, right_index=True)

In [12]:
#Insert the Milieu Centraal Ratings
mc = {'name': ['UTZ', 'Rainforest Alliance', 'Fairtrade', 'EU-biologisch', 'EKO', 'Demeter', 'Cocoa Horizons', 'Cocoa Life', "Tony's Open Chain", 'Nestle Cocoa Plan'],
        'environment': [5, 5, 3, 3, 3, 3, 2, 0, 0, 0],
        'social': [5, 5, 5, 0, 0, 2, 1, 0, 0, 0],
        'control': [5, 5, 5, 4, 4, 4, 1, 0, 0, 0],
        'transparency': [5, 5, 4, 4, 5, 4, 2, 0, 5, 0]
        }
mc = pd.DataFrame.from_dict(mc)

In [13]:
#Set the columns to use in the calculations of scores
#Set the social score
mc['social_score'] = mc['social']

#Set the environmental score
mc['environment_score'] = mc['environment']

#Calculate the transparency and control score
mc['control_transparency_score'] = (mc['control'] + mc['transparency'])/ 2

In [14]:
#Split the social, environmental and control_transparency binary values into separate columns per certificate
#Social
supermarket_df["UTZ_social"] = supermarket_df["UTZ"]
supermarket_df["Rainforest_alliance_social"] = supermarket_df["Rainforest Alliance"]
supermarket_df["Fairtrade_social"] = supermarket_df["Fairtrade"]
supermarket_df["EU-biologisch_social"] = supermarket_df["EU-biologisch"]
supermarket_df["Cocoa_horizons_social"] = supermarket_df["Cocoa Horizons"]
supermarket_df["Cocoa_life_social"] = supermarket_df["Cocoa Life"]
supermarket_df["Tonys_open_chain_social"] = supermarket_df["Tony's Open Chain"]
supermarket_df["Nestle_cocoa_plan_social"] = supermarket_df["Nestle Cocoa Plan"]

#Environment
supermarket_df["UTZ_environment"] = supermarket_df["UTZ"]
supermarket_df["Rainforest_alliance_environment"] = supermarket_df["Rainforest Alliance"]
supermarket_df["Fairtrade_environment"] = supermarket_df["Fairtrade"]
supermarket_df["EU-biologisch_environment"] = supermarket_df["EU-biologisch"]
supermarket_df["Cocoa_horizons_environment"] = supermarket_df["Cocoa Horizons"]
supermarket_df["Cocoa_life_environment"] = supermarket_df["Cocoa Life"]
supermarket_df["Tonys_open_chain_environment"] = supermarket_df["Tony's Open Chain"]
supermarket_df["Nestle_cocoa_plan_environment"] = supermarket_df["Nestle Cocoa Plan"]

#Control_transparency
supermarket_df["UTZ_control_transparency"] = supermarket_df["UTZ"]
supermarket_df["Rainforest_alliance_control_transparency"] = supermarket_df["Rainforest Alliance"]
supermarket_df["Fairtrade_control_transparency"] = supermarket_df["Fairtrade"]
supermarket_df["EU-biologisch_control_transparency"] = supermarket_df["EU-biologisch"]
supermarket_df["Cocoa_horizons_control_transparency"] = supermarket_df["Cocoa Horizons"]
supermarket_df["Cocoa_life_control_transparency"] = supermarket_df["Cocoa Life"]
supermarket_df["Tonys_open_chain_control_transparency"] = supermarket_df["Tony's Open Chain"]
supermarket_df["Nestle_cocoa_plan_control_transparency"] = supermarket_df["Nestle Cocoa Plan"]

In [15]:
#Insert overall ratings value into the certificate columns in the supermarket_df frame
#Social
supermarket_df['UTZ_social'] = np.where(supermarket_df["UTZ"]==1, mc.loc[mc['name'] == "UTZ", 'social_score'].item(), 0)
supermarket_df['Rainforest_alliance_social'] = np.where(supermarket_df["Rainforest Alliance"]==1, mc.loc[mc['name'] == "Rainforest Alliance", 'social_score'].item(), 0)
supermarket_df['Fairtrade_social'] = np.where(supermarket_df["Fairtrade"]==1, mc.loc[mc['name'] == "Fairtrade", 'social_score'].item(), 0)
supermarket_df['EU-biologisch_social'] = np.where(supermarket_df["EU-biologisch"]==1, mc.loc[mc['name'] == "EU-biologisch", 'social_score'].item(), 0)
supermarket_df['Cocoa_horizons_social'] = np.where(supermarket_df["Cocoa Horizons"]==1, mc.loc[mc['name'] == "Cocoa Horizons", 'social_score'].item(), 0)
supermarket_df['Cocoa_life_social'] = np.where(supermarket_df["Cocoa Life"]==1, mc.loc[mc['name'] == "Cocoa Life", 'social_score'].item(), 0)
supermarket_df['Tonys_open_chain_social'] = np.where(supermarket_df["Tony's Open Chain"]==1, mc.loc[mc['name'] == "Tony's Open Chain", 'social_score'].item(), 0)
supermarket_df['Nestle_cocoa_plan_social'] = np.where(supermarket_df["Nestle Cocoa Plan"]==1, mc.loc[mc['name'] == "Nestle Cocoa Plan", 'social_score'].item(), 0)

#Environment
supermarket_df['UTZ_environment'] = np.where(supermarket_df["UTZ"]==1, mc.loc[mc['name'] == "UTZ", 'environment_score'].item(), 0)
supermarket_df['Rainforest_alliance_environment'] = np.where(supermarket_df["Rainforest Alliance"]==1, mc.loc[mc['name'] == "Rainforest Alliance", 'environment_score'].item(), 0)
supermarket_df['Fairtrade_environment'] = np.where(supermarket_df["Fairtrade"]==1, mc.loc[mc['name'] == "Fairtrade", 'environment_score'].item(), 0)
supermarket_df['EU-biologisch_environment'] = np.where(supermarket_df["EU-biologisch"]==1, mc.loc[mc['name'] == "EU-biologisch", 'environment_score'].item(), 0)
supermarket_df['Cocoa_horizons_environment'] = np.where(supermarket_df["Cocoa Horizons"]==1, mc.loc[mc['name'] == "Cocoa Horizons", 'environment_score'].item(), 0)
supermarket_df['Cocoa_life_environment'] = np.where(supermarket_df["Cocoa Life"]==1, mc.loc[mc['name'] == "Cocoa Life", 'environment_score'].item(), 0)
supermarket_df['Tonys_open_chain_environment'] = np.where(supermarket_df["Tony's Open Chain"]==1, mc.loc[mc['name'] == "Tony's Open Chain", 'environment_score'].item(), 0)
supermarket_df['Nestle_cocoa_plan_environment'] = np.where(supermarket_df["Nestle Cocoa Plan"]==1, mc.loc[mc['name'] == "Nestle Cocoa Plan", 'environment_score'].item(), 0)

#Control_transparency
supermarket_df['UTZ_control_transparency'] = np.where(supermarket_df["UTZ"]==1, mc.loc[mc['name'] == "UTZ", 'control_transparency_score'].item(), 0)
supermarket_df['Rainforest_alliance_control_transparency'] = np.where(supermarket_df["Rainforest Alliance"]==1, mc.loc[mc['name'] == "Rainforest Alliance", 'control_transparency_score'].item(), 0)
supermarket_df['Fairtrade_control_transparency'] = np.where(supermarket_df["Fairtrade"]==1, mc.loc[mc['name'] == "Fairtrade", 'control_transparency_score'].item(), 0)
supermarket_df['EU-biologisch_control_transparency'] = np.where(supermarket_df["EU-biologisch"]==1, mc.loc[mc['name'] == "EU-biologisch", 'control_transparency_score'].item(), 0)
supermarket_df['Cocoa_horizons_control_transparency'] = np.where(supermarket_df["Cocoa Horizons"]==1, mc.loc[mc['name'] == "Cocoa Horizons", 'control_transparency_score'].item(), 0)
supermarket_df['Cocoa_life_control_transparency'] = np.where(supermarket_df["Cocoa Life"]==1, mc.loc[mc['name'] == "Cocoa Life", 'control_transparency_score'].item(), 0)
supermarket_df['Tonys_open_chain_control_transparency'] = np.where(supermarket_df["Tony's Open Chain"]==1, mc.loc[mc['name'] == "Tony's Open Chain", 'control_transparency_score'].item(), 0)
supermarket_df['Nestle_cocoa_plan_control_transparency'] = np.where(supermarket_df["Nestle Cocoa Plan"]==1, mc.loc[mc['name'] == "Nestle Cocoa Plan", 'control_transparency_score'].item(), 0)

In [16]:
#Round the certificate scores to two decimals
#Social
supermarket_df["UTZ_social"] = supermarket_df["UTZ_social"].round(decimals=2)
supermarket_df["Rainforest_alliance_social"] = supermarket_df["Rainforest_alliance_social"].round(decimals=2)
supermarket_df["Fairtrade_social"] = supermarket_df["Fairtrade_social"].round(decimals=2)
supermarket_df["EU-biologisch_social"] = supermarket_df["EU-biologisch_social"].round(decimals=2)
supermarket_df["Cocoa_horizons_social"] = supermarket_df["Cocoa_horizons_social"].round(decimals=2)
supermarket_df["Cocoa_life_social"] = supermarket_df["Cocoa_life_social"].round(decimals=2)
supermarket_df["Tonys_open_chain_social"] = supermarket_df["Tonys_open_chain_social"].round(decimals=2)
supermarket_df['Nestle_cocoa_plan_social'] = supermarket_df['Nestle_cocoa_plan_social'].round(decimals=2)

#Environment
supermarket_df["UTZ_environment"] = supermarket_df["UTZ_environment"].round(decimals=2)
supermarket_df["Rainforest_alliance_environment"] = supermarket_df["Rainforest_alliance_environment"].round(decimals=2)
supermarket_df["Fairtrade_environment"] = supermarket_df["Fairtrade_environment"].round(decimals=2)
supermarket_df["EU-biologisch_environment"] = supermarket_df["EU-biologisch_environment"].round(decimals=2)
supermarket_df["Cocoa_horizons_environment"] = supermarket_df["Cocoa_horizons_environment"].round(decimals=2)
supermarket_df["Cocoa_life_environment"] = supermarket_df["Cocoa_life_environment"].round(decimals=2)
supermarket_df["Tonys_open_chain_environment"] = supermarket_df["Tonys_open_chain_environment"].round(decimals=2)
supermarket_df['Nestle_cocoa_plan_environment'] = supermarket_df['Nestle_cocoa_plan_environment'].round(decimals=2)

#Control_transparency
supermarket_df["UTZ_control_transparency"] = supermarket_df["UTZ_control_transparency"].round(decimals=2)
supermarket_df["Rainforest_alliance_control_transparency"] = supermarket_df["Rainforest_alliance_control_transparency"].round(decimals=2)
supermarket_df["Fairtrade_control_transparency"] = supermarket_df["Fairtrade_control_transparency"].round(decimals=2)
supermarket_df["EU-biologisch_control_transparency"] = supermarket_df["EU-biologisch_control_transparency"].round(decimals=2)
supermarket_df["Cocoa_horizons_control_transparency"] = supermarket_df["Cocoa_horizons_control_transparency"].round(decimals=2)
supermarket_df["Cocoa_life_control_transparency"] = supermarket_df["Cocoa_life_control_transparency"].round(decimals=2)
supermarket_df["Tonys_open_chain_control_transparency"] = supermarket_df["Tonys_open_chain_control_transparency"].round(decimals=2)
supermarket_df['Nestle_cocoa_plan_control_transparency'] = supermarket_df['Nestle_cocoa_plan_control_transparency'].round(decimals=2)


In [17]:
#Calculate the overall social, environmental, and control_transparency ratings
#Social
supermarket_df["social_rating"] = (supermarket_df["UTZ_social"] + supermarket_df["Rainforest_alliance_social"] + supermarket_df["Fairtrade_social"] + supermarket_df["EU-biologisch_social"] + supermarket_df["Cocoa_horizons_social"] + supermarket_df["Cocoa_life_social"] + supermarket_df["Tonys_open_chain_social"] + supermarket_df["Nestle_cocoa_plan_social"])

#Environment
supermarket_df["environment_rating"] = (supermarket_df["UTZ_environment"] + supermarket_df["Rainforest_alliance_environment"] + supermarket_df["Fairtrade_environment"] + supermarket_df["EU-biologisch_environment"] + supermarket_df["Cocoa_horizons_environment"] + supermarket_df["Cocoa_life_environment"] + supermarket_df["Tonys_open_chain_environment"] + supermarket_df["Nestle_cocoa_plan_environment"])

#Control_transparency
supermarket_df["control_transparency_rating"] = (supermarket_df["UTZ_control_transparency"] + supermarket_df["Rainforest_alliance_control_transparency"] + supermarket_df["Fairtrade_control_transparency"] + supermarket_df["EU-biologisch_control_transparency"] + supermarket_df["Cocoa_horizons_control_transparency"] + supermarket_df["Cocoa_life_control_transparency"] + supermarket_df["Tonys_open_chain_control_transparency"] + supermarket_df["Nestle_cocoa_plan_control_transparency"])
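
The three ratings above are each a sum over certificates of (indicator × certificate score), which is a matrix product of the 0/1 indicator columns with the score table. The sketch below shows that formulation on three certificates whose scores are copied from the `mc` table. The toy `indicators` frame is an assumption, and the product assumes every indicator is strictly 0 or 1.

```python
import pandas as pd

# Three certificates with the scores from the Milieu Centraal table above
mc = pd.DataFrame({
    "name": ["UTZ", "Fairtrade", "Tony's Open Chain"],
    "environment": [5, 3, 0],
    "social": [5, 5, 0],
    "control": [5, 5, 0],
    "transparency": [5, 4, 5],
}).set_index("name")

# One score column per rating, indexed by certificate name
scores = pd.DataFrame({
    "social_rating": mc["social"],
    "environment_rating": mc["environment"],
    "control_transparency_rating": (mc["control"] + mc["transparency"]) / 2,
})

# Toy 0/1 indicators: one row per product, one column per certificate
indicators = pd.DataFrame({
    "UTZ": [1, 0, 0],
    "Fairtrade": [1, 1, 0],
    "Tony's Open Chain": [0, 0, 1],
})

# Rows: products, columns: ratings; each cell sums the scores of the held certificates
ratings = indicators[scores.index].dot(scores)
```

Selecting `indicators[scores.index]` keeps the certificate order identical on both sides of the product, which `DataFrame.dot` requires.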

### Cleaning the text present
______________________
The following columns will be cleaned:
- Ingredienten
- Omschrijving

In [18]:
#Clean ingredienten string
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.lower()
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace('<span class="product-info-ingredients_containsallergen__1slys">', "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace('<span>', "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace('</span>', "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("ingrediënten:", "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("ingrediënten", "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("ingredi nten", "")
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r"\d+", '', regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("%", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("rainforest alliance gecertificeerd. zie voor meer informatie ra.org", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("rainforest alliance gecertificeerd. lees meer op ra.org", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("rainforest alliance gecertificeerd. www.ra.org.", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("visit info.fairtrade.net/sourcing", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("¹", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("ngrediënten:", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("www.ra.org", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("ra.org", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("'", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(":", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(";", '')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(".", ' ')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(" ", ',')
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("*", "", regex=False)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("°", "", regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace("<br>", "", regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r"\(.*\)","", regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r'<[^>]+>', '', regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r'[^a-zA-Z\s]', ' ', regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r'\s+', ',', regex=True)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r'^,', '', regex=True)
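#Strip a trailing comma (assumed intent: the original pattern r'$' on its own removes nothing)
supermarket_df['ingredienten'] = supermarket_df['ingredienten'].str.replace(r',$', '', regex=True)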
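
The chain of replacements above can also be expressed as one function built on the standard `re` module, which makes the order of the steps explicit and easy to test on a single string. This is a simplified sketch, not a drop-in replacement: the function name `clean_ingredients` is hypothetical, the certificate phrases are not handled, and parentheses are removed pair by pair rather than greedily from the first `(` to the last `)`.

```python
import re


def clean_ingredients(text):
    """Reduce an ingredient string to lower-case, comma-separated ASCII words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)    # drop HTML tags such as <span>
    text = re.sub(r"\([^)]*\)", " ", text)  # drop each parenthesised remark
    text = re.sub(r"[^a-z\s]", " ", text)   # keep ASCII letters and whitespace only
    return ",".join(text.split())           # collapse whitespace into single commas


example = clean_ingredients("Suiker, 23% cacaoboter (Fairtrade), <span>MELK</span>.")
# example == "suiker,cacaoboter,melk"
```

Like the original pattern `[^a-zA-Z\s]`, this also removes accented letters, so a word such as "ingrediënten" would be split in two. The function can be applied to a column with `Series.map` once missing values have been handled.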


In [19]:
#Clean omschrijvingen string
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.lower()
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace('<span class="product-info-ingredients_containsallergen__1slys">', "")
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace('<span>', "")
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace('</span>', "")
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("ingrediënten:", "")
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("ingrediënten", "")
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(r"\d+", '', regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("%", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("rainforest alliance gecertificeerd. zie voor meer informatie ra.org", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("rainforest alliance gecertificeerd. lees meer op ra.org", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("rainforest alliance gecertificeerd. www.ra.org.", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("visit info.fairtrade.net/sourcing", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("¹", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("ngrediënten:", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("www.ra.org", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("ra.org", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("'", '')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(".", ' ')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(",", ' ')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(":", ' ')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(";", ' ')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(" ", ',')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("gecertificeerte", ' ')
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("*", "", regex=False)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("°", "", regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace("<br>", "", regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(r"\(.*\)","", regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(r'[^a-zA-Z\s]', ' ', regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.replace(r'\s+', ' ', regex=True)
supermarket_df['omschrijving'] = supermarket_df['omschrijving'].str.split().apply(','.join)


____________________________
Creating a new dataframe as input for the recommender system with the following columns:
- product_naam
- omschrijving
- ingredienten
- kenmerken
- social_rating
- environment_rating
- control_transparency_rating
- sentiment_rating
- supermarket

In [20]:
#Make a new data frame out of the relevant columns for the recommender system
#.copy() makes it independent of supermarket_df, so later assignments do not trigger a SettingWithCopyWarning
supermarket_recommender_df = supermarket_df[['product_naam', 'omschrijving', 'ingredienten', 'kenmerken', 'social_rating','environment_rating', 'control_transparency_rating', 'sentiment_rating', 'supermarket']].copy()

### Recommender system
--------------------------------------

In [21]:
#Reset the index; the old index is kept as a regular column named 'index'
supermarket_recommender_df.reset_index(inplace=True)

In [22]:
#Scaling the rating columns social, environment, and control_transparency from 0 to 1 through normalizing
a, b = 0, 1
x, y = supermarket_recommender_df[['social_rating', 'environment_rating', 'control_transparency_rating']].min(), supermarket_recommender_df[['social_rating', 'environment_rating', 'control_transparency_rating']].max()
supermarket_recommender_df[['social_rating', 'environment_rating', 'control_transparency_rating']] = (supermarket_recommender_df[['social_rating', 'environment_rating', 'control_transparency_rating']] - x) / (y - x) * (b - a) + a
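
The cell above applies min-max scaling, x' = a + (x - min) * (b - a) / (max - min), to each rating column. If every product has the same value in a column, max - min is zero and the division yields NaN. The sketch below wraps the same formula in a function that works on a copy and maps such constant columns to the lower bound. The function name `min_max_scale` and the choice of the lower bound for constant columns are assumptions.

```python
import pandas as pd


def min_max_scale(frame, columns, lower=0.0, upper=1.0):
    """Return a copy of frame with the given columns rescaled to [lower, upper]."""
    scaled = frame.copy()
    col_min = scaled[columns].min()
    col_max = scaled[columns].max()
    # A zero range would divide by zero; use 1 so constant columns map to `lower`
    span = (col_max - col_min).replace(0, 1)
    scaled[columns] = (scaled[columns] - col_min) / span * (upper - lower) + lower
    return scaled
```

Because the function returns a new frame, it can be used as `supermarket_recommender_df = min_max_scale(supermarket_recommender_df, [...])` without assigning into a slice of another frame.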



In [23]:
#Manually define a list containing the Dutch stop words
stop_word = ["aan","aangaande","aangezien","achte","achter","achterna","af","afgelopen","al","aldaar","aldus","alhoewel","alias","alle","allebei","alleen","alles","als","alsnog","altijd",
"altoos","ander","andere","anders","anderszins","beetje","behalve","behoudens","beide","beiden","ben","beneden","bent","bepaald","betreffende","bij","bijna","bijv","binnen","binnenin",
"blijkbaar","blijken","boven","bovenal","bovendien","bovengenoemd","bovenstaand","bovenvermeld","buiten","bv","daar","daardoor","daarheen","daarin","daarna","daarnet","daarom","daarop",
"daaruit","daarvanlangs","dan","dat","de","deden","deed","der","derde","derhalve","dertig","deze","dhr","die","dikwijls","dit","doch","doe","doen","doet","door","doorgaand","drie","duizend",
"dus","echter","een","eens","eer","eerdat","eerder","eerlang","eerst","eerste","eigen","eigenlijk","elk","elke","en","enig","enige","enigszins","enkel","er","erdoor","erg","ergens","etc",
"etcetera","even","eveneens","evenwel","gauw","ge","gedurende","geen","gehad","gekund","geleden","gelijk","gemoeten","gemogen","genoeg","geweest","gewoon","gewoonweg","haar","haarzelf","had",
"hadden","hare","heb","hebben","hebt","hedden","heeft","heel","hem","hemzelf","hen","het","hetzelfde","hier","hierbeneden","hierboven","hierin","hierna","hierom","hij","hijzelf","hoe","hoewel",
"honderd","hun","hunne","ieder","iedere","iedereen","iemand","iets","ik","ikzelf","in","inderdaad","inmiddels","intussen","inzake","is","ja","je","jezelf","jij","jijzelf","jou","jouw","jouwe",
"juist","jullie","kan","klaar","kon","konden","krachtens","kun","kunnen","kunt","laatst","later","liever","lijken","lijkt","maak","maakt","maakte","maakten","maar","mag","maken","me","meer",
"meest","meestal","men","met","mevr","mezelf","mij","mijn","mijnent","mijner","mijzelf","minder","miss","misschien","missen","mits","mocht","mochten","moest","moesten","moet","moeten","mogen",
"mr","mrs","mw","na","naar","nadat","nam","namelijk","nee","neem","negen","nemen","nergens","net","niemand","niet","niets","niks","noch","nochtans","nog","nogal","nooit","nu","nv","of","ofschoon",
"om","omdat","omhoog","omlaag","omstreeks","omtrent","omver","ondanks","onder","ondertussen","ongeveer","ons","onszelf","onze","onzeker","ooit","ook","op","opnieuw","opzij","over","overal","overeind",
"overige","overigens","paar","pas","per","precies","recent","redelijk","reeds","rond","rondom","samen","sedert","sinds","sindsdien","slechts","sommige","spoedig","steeds","tamelijk","te","tegen","tegenover",
"tenzij","terwijl","thans","tien","tiende","tijdens","tja","toch","toe","toen","toenmaals","toenmalig","tot","totdat","tussen","twee","tweede","u","uit","uitgezonderd","uw","vaak","vaakwat","van","vanaf","vandaan",
"vanuit","vanwege","veel","veeleer","veertig","verder","verscheidene","verschillende","vervolgens","via","vier","vierde","vijf","vijfde","vijftig","vol","volgend","volgens","voor","vooraf","vooral","vooralsnog",
"voorbij","voordat","voordezen","voordien","voorheen","voorop","voorts","vooruit","vrij","vroeg","waar","waarom","waarschijnlijk","wanneer","want","waren","was","wat","we","wederom","weer","weg","wegens","weinig",
"wel","weldra","welk","welke","werd","werden","werder","wezen","whatever","wie","wiens","wier","wij","wijzelf","wil","wilden","willen","word","worden","wordt","zal","ze","zei","zeker","zelf","zelfde","zelfs","zes",
"zeven","zich","zichzelf","zij","zijn","zijne","zijzelf","zo","zoals","zodat","zodra","zonder","zou","zouden","zowat","zulk","zulke","zullen","zult",'g','kg','mg','cm','p','per','l','cl','ml','ten','minste','gram',
'kilo','kilogram','millie','milliegram','centi','centigram', 'liter', 'andere', 'bevatten', 'kan', 'landbouw', '  ',
'sporen', 'uit', 'sporen', 'rainforest', 'alliance', 'fairtrade', 'gecertificeerd', 'certificeren', 'certificaat', ':', ';', 'mogelijk', 'bevat', 'voor', 'meer', 'informatie', 'ra.org', '', 'waarvan', "tony's chocolonely",
"ritter sport","côte d'or","delicata","extra","fijne","fijn","lekker","lekkere","open","chain","opgezet","www.tonysopenchain.com","samenwerkingsprincipes","werken","werkt","lees","meer","minder","op", "onder", "heerlijk",
"door", "cocoa life", "creeert", "krachtig", "onmiskenbare", "Cote dor", "bouwt", "zorgvuldige", "selectie", "ongerept", "dankt", "cocoa horizons", "toegevoegde", "toegevoegd", "reep", 'milka', 'nestlé', 'nestle',
]

In [24]:
#Fill missing features with '' so that string concatenation and vectorisation do not fail on NaN values
features = ['ingredienten', 'omschrijving']
for feature in features:
    supermarket_recommender_df[feature] = supermarket_recommender_df[feature].fillna('')


In [25]:
#Combine the features into one text and store it in a separate column
def combined_features(row):
    return row['omschrijving']+" "+row['ingredienten'] 
supermarket_recommender_df["combined_features"] = supermarket_recommender_df.apply(combined_features, axis =1)


In [26]:
#Split the combined features on commas, strip spaces, remove stop words and empty strings, and join the result back into one string
combined_features1 = supermarket_recommender_df["combined_features"].str.split(',')
combined_features1 = combined_features1.values.tolist()

def check_about(lists:list):
    for i,j in enumerate(lists):
        if isinstance(j,list):
            check_about(j)
        else:
            lists[i]=lists[i].strip(' ')
    return lists
combined_features1 = check_about(combined_features1)

combined_features1 = [[x for x in y if x not in stop_word] for y in combined_features1]

supermarket_recommender_df["combined_features"] = combined_features1

supermarket_recommender_df["combined_features"] = supermarket_recommender_df["combined_features"].apply(lambda x: ' '.join(x))
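
The cleaning step above can be tried on a single string. This is a minimal sketch, assuming the same split-on-comma logic; `filter_comma_tokens` and `toy_stop_words` are hypothetical names introduced here, not part of the notebook.

```python
def filter_comma_tokens(text, stop_words):
    """Split on commas, strip spaces, drop stop words and empty fragments, re-join with spaces."""
    tokens = [token.strip(" ") for token in text.split(",")]
    kept = [token for token in tokens if token and token not in stop_words]
    return " ".join(kept)

# Hypothetical stop words for illustration only
toy_stop_words = {"suiker", "bevat"}

cleaned = filter_comma_tokens("cacaoboter, suiker, , volle melkpoeder,bevat", toy_stop_words)
print(cleaned)

# A stop word inside a longer fragment is not removed
partial = filter_comma_tokens("met suiker, suiker", toy_stop_words)
print(partial)
```

Because the text is split on commas only, a stop word is removed only when it makes up a whole comma-separated fragment, and the comparison is case-sensitive. A stop word inside a longer fragment such as `met suiker` is kept.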


In [27]:
#Use CountVectorizer to count the occurrences of each word per product
cv = CountVectorizer()
count_matrix = cv.fit_transform(supermarket_recommender_df["combined_features"])

In [28]:
#Compute the pairwise cosine similarity between all products
cosine_sim = cosine_similarity(count_matrix)

In [29]:
#Assign the social, environment, control transparency, and sentiment ratings to separate variables
social_rating = supermarket_recommender_df.social_rating 
environment_rating = supermarket_recommender_df.environment_rating
control_transparency_rating = supermarket_recommender_df.control_transparency_rating
sentiment_rating = supermarket_recommender_df.sentiment_rating

### Recommender System with Working Selection Screen
____________________________
The display functionality of the recommender system is built with the Python library tkinter (Python Software Foundation, n.d.) and adapted to work with this data.
To find recommendations for a product, follow these steps:
- Select the supermarket whose chocolate confectionary assortment you want to browse
- Choose the product you want alternative recommendations for
- If you want more information on the producer of a recommended product, go to the next code block
- For example, select Kitkat 5-pack to be sure that news articles are available in the next block
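
The ranking behind the selection screen can be tried without the GUI. The sketch below uses the same weights as the cell that follows (0.3 for text similarity, 0.2 for each of the social, environment, and control transparency ratings, 0.1 for sentiment). The toy data, the function `rank_alternatives`, and the choice to leave the selected product out of the results are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Toy products and a toy similarity matrix (hypothetical values)
toy_products = pd.DataFrame({
    "product_naam": ["Reep A", "Reep B", "Reep C", "Reep D"],
    "social_rating": [0.2, 0.9, 0.5, 0.1],
    "environment_rating": [0.3, 0.8, 0.5, 0.2],
    "control_transparency_rating": [0.4, 0.7, 0.5, 0.1],
    "sentiment_rating": [0.5, 0.6, 0.5, 0.3],
})
toy_cosine_sim = np.array([
    [1.0, 0.6, 0.8, 0.9],
    [0.6, 1.0, 0.5, 0.4],
    [0.8, 0.5, 1.0, 0.7],
    [0.9, 0.4, 0.7, 1.0],
])

def rank_alternatives(selected_position, products, sim_matrix, top_n=3):
    """Blend text similarity with the ratings and return the best-scoring other products."""
    blended = (
        sim_matrix[selected_position] * 0.3
        + products["social_rating"].to_numpy() * 0.2
        + products["environment_rating"].to_numpy() * 0.2
        + products["control_transparency_rating"].to_numpy() * 0.2
        + products["sentiment_rating"].to_numpy() * 0.1
    )
    order = np.argsort(-blended, kind="stable")  # highest blended score first
    # Sketch-specific choice: skip the selected product itself
    order = [int(pos) for pos in order if pos != selected_position][:top_n]
    return products.iloc[order].assign(blended_score=blended[order])

ranked = rank_alternatives(0, toy_products, toy_cosine_sim)
print(ranked[["product_naam", "blended_score"]])
```

In this toy case Reep D is textually closest to Reep A, yet Reep B ranks first because its ratings are much higher. This is the intended effect of giving the ratings a combined weight of 0.7.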

In [30]:
#Fill the product dropdown with the products of the selected supermarket
def populate_product_dropdown(selected_supermarket):
    products_for_supermarket = sorted(supermarket_recommender_df[supermarket_recommender_df['supermarket'] == selected_supermarket]['product_naam'].tolist())
    combo_var.set("")
    chocolate_combo['values'] = products_for_supermarket

#Define the function to get title from index
def get_title_from_index(index):
    return supermarket_recommender_df.loc[index, ["product_naam", 'social_rating', 'environment_rating', 'control_transparency_rating', 'sentiment_rating', 'similarity_score', 'kenmerken', 'supermarket']]

def get_index_from_title(title):
    return supermarket_recommender_df[supermarket_recommender_df.product_naam == title]['index'].values[0]

def show_recommendations():
    global recommendations_list
    recommendations_list = []
    selected_supermarket = supermarket_combo_var.get()
    selected_chocolate = combo_var.get()
    chocolate_index = get_index_from_title(selected_chocolate)
    similar_chocolate = list(enumerate(((cosine_sim[chocolate_index]*0.3) + (social_rating*0.2) + (environment_rating*0.2) + (control_transparency_rating*0.2) + (sentiment_rating*0.1))))
    supermarket_recommender_df['similarity_score'] = cosine_sim[chocolate_index]
    sorted_similar_chocolate = sorted(similar_chocolate, key=lambda x:x[1], reverse=True)
    
    #Clear previous recommendations
    result_text.delete(1.0, tk.END)
    
    #Define output format
    output_format = (
        "Recommendation {}: {}\n"
        "social_rating: {:.2f}\n"
        "environment_rating: {:.2f}\n"
        "control_transparency_rating: {:.2f}\n"
        "sentiment_rating: {:.2f}\n"
        "similarity_score: {:.2f}\n"
        "kenmerken: {}\n"
    )

    displayed_recommendations = 0
    for chocolate in sorted_similar_chocolate:
        recommendation = get_title_from_index(chocolate[0])
        if recommendation['supermarket'] == selected_supermarket:
            #Use label-based access; positional lookups on a labelled Series are deprecated in recent pandas versions
            formatted_output = output_format.format(
                displayed_recommendations + 1,
                recommendation['product_naam'],
                recommendation['social_rating'],
                recommendation['environment_rating'],
                recommendation['control_transparency_rating'],
                recommendation['sentiment_rating'],
                recommendation['similarity_score'],
                recommendation['kenmerken']
            )
            result_text.insert(tk.END, formatted_output + "\n")
            displayed_recommendations += 1
            recommendations_list.append(recommendation['product_naam'])
        
        #Stop once 5 recommendations have been displayed
        if displayed_recommendations >= 5:
            break

#Create the main window
root = tk.Tk()
root.title("Chocolate Recommender")
window_width = 900
window_height = 900
root.geometry(f"{window_width}x{window_height}")

#Add a label for instructions
instructions_label = tk.Label(root, text="Select the supermarket and chocolate confectionary product you want to have a different option for", font=("Helvetica", 12, "bold"), bg="white")
instructions_label.grid(row=0, column=0, padx=10, pady=10, columnspan=2)

#Create a frame for supermarket selection
supermarket_frame = tk.Frame(root, bg="white")
supermarket_frame.grid(row=1, column=0, padx=10, pady=10, sticky="w")

#Create a label for supermarket selection
supermarket_label = tk.Label(supermarket_frame, text="Supermarket:", bg="white")
supermarket_label.grid(row=0, column=0, padx=5, pady=5)

#Create a dropdown window for supermarket selection
supermarket_options = sorted(supermarket_recommender_df['supermarket'].unique())
supermarket_combo_var = tk.StringVar()
supermarket_combo = ttk.Combobox(supermarket_frame, textvariable=supermarket_combo_var, values=supermarket_options, width=30)
supermarket_combo.grid(row=0, column=1, padx=5, pady=5)
supermarket_combo.bind("<<ComboboxSelected>>", lambda event: populate_product_dropdown(supermarket_combo_var.get()))

#Create a frame for chocolate selection
chocolate_frame = tk.Frame(root, bg="white")
chocolate_frame.grid(row=2, column=0, padx=10, pady=10, sticky="w")

#Create a label for chocolate selection
chocolate_label = tk.Label(chocolate_frame, text="Chocolate:", bg="white")
chocolate_label.grid(row=0, column=0, padx=5, pady=5)

#Create a dropdown window for chocolate selection
chocolate_options = sorted(supermarket_recommender_df['product_naam'].tolist())
combo_var = tk.StringVar()
chocolate_combo = ttk.Combobox(chocolate_frame, textvariable=combo_var, values=chocolate_options, width=30)
chocolate_combo.grid(row=0, column=1, padx=5, pady=5)

#Create a frame for buttons
button_frame = tk.Frame(root, bg="white")
button_frame.grid(row=3, column=0, padx=10, pady=10, columnspan=2)

#Create the "Check more sustainable options" button
check_button = tk.Button(button_frame, text="Check more sustainable options", command=show_recommendations)
check_button.grid(row=0, column=0, padx=5, pady=5)

#Create a text widget to display recommendations
result_text = tk.Text(root, height=42, width=110)
result_text.grid(row=4, column=0, padx=10, pady=10, columnspan=2)
root.mainloop()


### Get information on the recommendations for the product selected
----------------------
To find more information on the producer of the recommended chocolate confectionary products, do the following:
- Select one of the recommendations to find out more about its producer
- Select the topic you want news articles on
- Read the title and an initial preview of the text
- If the article is interesting and you want to read it in full, copy the printed link into a browser to open the complete article
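
The article selection described above can be sketched without the GUI: keep articles whose topic score exceeds the threshold, keep those that mention the producer, and take the first three. The toy articles, the `example.com` URLs, the helper `top_articles`, and the assumption that `producer_in_article` holds a list of lowercase producer names are all introduced here for illustration.

```python
import pandas as pd

# Toy articles (hypothetical titles, URLs, and scores)
toy_articles = pd.DataFrame({
    "title": ["Article 1", "Article 2", "Article 3", "Article 4"],
    "url": [
        "https://example.com/1",
        "https://example.com/2",
        "https://example.com/3",
        "https://example.com/4",
    ],
    "social_sustainability": [0.9, 0.7, 0.4, 0.8],
    "producer_in_article": [["nestle", "mars"], ["nestle"], ["nestle"], ["mars"]],
})

def top_articles(articles, score_column, producer, threshold=0.65, limit=3):
    """Return the first `limit` articles above the threshold that mention the producer."""
    above = articles[articles[score_column] > threshold]
    mentions = above["producer_in_article"].apply(
        lambda names: producer.lower() in names
    ).astype(bool)
    return above[mentions].head(limit)

nestle_articles = top_articles(toy_articles, "social_sustainability", "Nestle")
print(nestle_articles[["title", "url"]])
```

As in the notebook cell, `head` keeps the first matching rows in their existing order rather than the highest-scoring ones. If "top" is meant as "most relevant", sorting by the score column before calling `head` would be one option.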

In [31]:
#Create special dataframes for the different topics above the minimal threshold
social_news_articles_df = gdelt_article_df[gdelt_article_df['social_sustainability'] > 0.65]
environmental_news_articles_df = gdelt_article_df[gdelt_article_df['environmental_sustainability'] > 0.65]
economic_news_articles_df = gdelt_article_df[gdelt_article_df['economic_sustainability'] > 0.65]

# Create the main window
root = tk.Tk()
root.title("Recommendation Articles")

# Create a variable to store the selected product name
selected_product_name = tk.StringVar()
selected_product_name.set(recommendations_list[0])  # Set default value

# Create a dropdown for product name selection
product_dropdown = ttk.Combobox(root, textvariable=selected_product_name, values=recommendations_list, width=100)
product_dropdown.grid(row=0, column=0, padx=10, pady=10, columnspan=3)

# Define a function to display relevant articles
def show_articles(topic):
    selected_producer = supermarket_product_df[
        supermarket_product_df['product_naam'] == selected_product_name.get()
    ]['leverancier'].values[0]
    
    selected_producer = selected_producer.lower()
    print(f"Leverancier for selected product: {selected_producer}")  # Print leverancier

    if topic == 'Social':
        relevant_articles = social_news_articles_df[
            social_news_articles_df['producer_in_article'].apply(lambda x: selected_producer in x)
        ]
    elif topic == 'Economic':
        relevant_articles = economic_news_articles_df[
            economic_news_articles_df['producer_in_article'].apply(lambda x: selected_producer in x)
        ]
    elif topic == 'Environmental':
        relevant_articles = environmental_news_articles_df[
            environmental_news_articles_df['producer_in_article'].apply(lambda x: selected_producer in x)
        ]
    
    #Limit to the first 3 matching articles, or fewer if fewer articles match
    relevant_articles = relevant_articles.head(3)
    
    #Create a text widget to display the relevant articles
    articles_text.delete(1.0, tk.END)  # Clear previous text
    if relevant_articles.empty:
        articles_text.insert(tk.END, f"Producer not mentioned in {topic.lower()} articles\n")
    else:
        for _, article in relevant_articles.iterrows():
            
            print(article['url'])
            #Create the hyperlink text
            complete_article_text = "Click here to view the complete article"
            
            #Get the first 100 characters of the article text as a preview
            short_article_text = article['text'][:100]
            
            #Link
            url= article['url']

            #Combine all information
            article_info = (
                f"Title: {article['title']}\n"
                f"Text: {short_article_text}...\n"
                f"Paste in browser for article: {url}\n"
            )
            
            #Insert the article info, including the URL as plain text
            articles_text.insert(tk.END, article_info)
            
            #Insert the clickable link
            #articles_text.tag_configure("link", foreground="blue", underline=True)
            #articles_text.insert(tk.END, complete_article_text, ("link", article['url']))
            #articles_text.tag_bind("link", "<Button-1>", lambda event, url=url: webbrowser.open_new(url))
            
            # Add a newline after the link
            articles_text.insert(tk.END, "\n\n")

#Create buttons for sorting and displaying articles
environmental_button = tk.Button(root, text="Environmental", command=lambda: show_articles('Environmental'))
environmental_button.grid(row=1, column=0, padx=10, pady=10)

social_button = tk.Button(root, text="Social", command=lambda: show_articles('Social'))
social_button.grid(row=1, column=1, padx=10, pady=10)

economic_button = tk.Button(root, text="Economic", command=lambda: show_articles('Economic'))
economic_button.grid(row=1, column=2, padx=10, pady=10)

#Create a text widget to display articles
articles_text = tk.Text(root, height=42, width=110)
articles_text.grid(row=2, column=0, padx=10, pady=10, columnspan=3)
root.mainloop()

Leverancier for selected product: nestle
https://www.sfchronicle.com/food/article/Popular-chocolate-makers-are-being-sued-for-child-16093197.php
https://www.thesun.co.uk/news/13812316/child-labour-slavery-weekly-shop-violence-torture/
https://www.latimes.com/business/story/2019-12-17/africa-big-chocolate


### Bibliography
1) Azouagh, T. (2022). Leveraging technology to evaluate the sustainability of retail products and make it transparent to consumers. Unpublished master's thesis, Amsterdam University of Applied Sciences.
2) Python Software Foundation. (n.d.). Tkinter - Python interface to Tcl/Tk. Retrieved August 23, 2023, from https://docs.python.org/3/library/tkinter.html