# Mapping Rating IDs to Categories in Kununu Reviews
-------------------

> <i>Description: In this notebook, we analyze rating IDs from Kununu reviews and map them to the categories we use in our classification and analysis.</i>

The output Excel file will be used to add more specific keywords to our categories, to calculate sentiment for the rating text in Kununu reviews, and to map score IDs to categories as well.

Input Files: 
1) reviews_merged.csv

Output:
1) Kununu_rating_ids_mapped.xlsx

In [None]:
import pandas as pd
import numpy as np
import re
from collections import Counter

* reviews_merged.csv is the result of merging the translated Glassdoor and Kununu files.

In [3]:
df = pd.read_csv('reviews_merged.csv')
print(df.head())
# Check the data type of the values stored in ratings_translated
print(type(df['ratings_translated'].iloc[0]))

Unnamed: 0,uuid,date,year,rating,position,position_code,department,pros,cons,suggestion,ratings_translated,country,file,concatenated_ratings,summary_translated
0,ca6e64a6-c45e-4b04-9d85-8ff633cbe289,2024-09-21 00:00:00+00:00,2024,3.7,employee,1.0,Corporate,,Far too much is spent on the campus instead of...,"Too many departments, responsibilities, so tha...","[{'id': 'salary', 'score': 2, 'roundedScore': ...",Germany,kununu,salary: Worse than expected. No salary increas...,
1,b11b7978-d151-4249-a747-3ba7501e1bad,2024-09-05 00:00:00+00:00,2024,3.3,employee,1.0,Logistics,A great employer with lots of opportunities. W...,Already mentioned.,Look more closely over the shoulders of the ma...,"[{'id': 'atmosphere', 'score': 4, 'roundedScor...",Germany,kununu,"atmosphere: Great colleagues, leadership okay,...",
2,fe76c408-b3a7-4e8d-be08-4bb67d0868da,2024-08-30 00:00:00+00:00,2024,4.8,apprentice,2.0,Logistics,The many employee benefits in which you partic...,,,"[{'id': 'apprenticeshiptasks', 'score': 4, 'ro...",Germany,kununu,"tasks: Partly very redundant, but still intere...",
3,fdd82a74-7524-4567-9844-1e3eed861f8c,2024-08-17 00:00:00+00:00,2024,3.8,contractor,3.0,Retail,The pay is OK. It's easy to talk to superiors ...,,Offer a little more real-life balance for empl...,"[{'id': 'atmosphere', 'score': 5, 'roundedScor...",Germany,kununu,,
4,977c452c-f91c-4f2f-ae7d-b26e045eaddb,2024-08-01 00:00:00+00:00,2024,4.5,employee,1.0,Corporate,"Good image, great products and many benefits f...",career opportunities,improve career opportunities,"[{'id': 'atmosphere', 'score': 5, 'roundedScor...",Germany,kununu,atmosphere: Working in an open-plan office tak...,


In [5]:
# Function to extract 'id' values from strings
def extract_ids_from_string(x):
    """
    Extracts 'id' values from a JSON-like string using regular expressions.

    Parameters:
    - x (str): A string containing nested JSON-like structures with 'id' key-value pairs.

    Returns:
    - list: A list of extracted 'id' values as strings. If the input is not a string, returns an empty list.
    """
    if isinstance(x, str):
        # Use a regex to find patterns like 'id': 'something'
        ids = re.findall(r"'id': '(\w+)'", x)
        return ids
    else:
        return []  # If it's not a string, return an empty list
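
The regex above only matches IDs that are wrapped in single quotes and consist of `\w` characters. As an alternative sketch, assuming each `ratings_translated` string is a valid Python literal (a list of dicts, as the truncated `df.head()` output suggests), `ast.literal_eval` can parse the whole structure instead. The helper name `extract_ids_with_literal_eval` and the sample string are hypothetical and not part of the original notebook.

```python
import ast

def extract_ids_with_literal_eval(x):
    """Hypothetical alternative: parse the string as a Python literal and collect 'id' values."""
    if not isinstance(x, str):
        return []  # e.g. NaN for reviews without ratings
    try:
        parsed = ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return []  # string is not a valid Python literal
    if not isinstance(parsed, list):
        return []
    return [item["id"] for item in parsed if isinstance(item, dict) and "id" in item]

# Made-up sample in the same shape as the data preview (illustration only)
sample = "[{'id': 'salary', 'score': 2}, {'id': 'atmosphere', 'score': 4}]"
print(extract_ids_with_literal_eval(sample))  # ['salary', 'atmosphere']
```

Unlike the regex, this approach fails closed: a malformed string yields an empty list rather than partial matches, which may or may not be the desired behavior for this dataset.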

In [6]:
# Applying the function to the 'ratings_translated' column
id_list = df['ratings_translated'].apply(extract_ids_from_string)

# Flattening the list of lists and counting occurrences of each id
id_flat = [item for sublist in id_list for item in sublist]
id_counts = Counter(id_flat)

# Converting the counts to a DataFrame for better readability
id_counts_df = pd.DataFrame(id_counts.items(), columns=['id', 'count'])

print(id_counts_df)

Unnamed: 0,id,count
0,salary,587
1,communication,583
2,atmosphere,589
3,image,584
4,workLife,582
5,career,579
6,environment,579
7,teamwork,587
8,oldColleagues,560
9,leadership,588


In [7]:
# Mapping each id name into a category:
id_to_category = {
    "salary": "Financial Compensation & Benefits",
    "communication": "Leadership & Communication",
    "atmosphere": "Collaboration & Teamwork & Social Culture",
    "image": "Authenticity",
    "workLife": "Work-Life Balance & Flexibility",
    "career": "Professional Development and Continuous Learning",
    "environment": "Collaboration & Teamwork & Social Culture",
    "teamwork": "Collaboration & Teamwork & Social Culture",
    "oldColleagues": "Collaboration & Teamwork & Social Culture",
    "leadership": "Leadership & Communication",
    "workConditions": "Work-Life Balance & Flexibility",
    "equality": "Diversity & Equity & Inclusion",
    "tasks": "Professional Development and Continuous Learning",
    "apprenticeshiptasks": "Professional Development and Continuous Learning",
    "apprenticeshipatmosphere": "Collaboration & Teamwork & Social Culture",
    "apprenticeshipcareer": "Professional Development and Continuous Learning",
    "apprenticeshipworkHours": "Work-Life Balance & Flexibility",
    "apprenticeshipsalary": "Financial Compensation & Benefits",
    "apprenticeshipsupervisor": "Leadership & Communication",
    "apprenticeshipfun": "Collaboration & Teamwork & Social Culture",
    "apprenticeshipvariation": "Professional Development and Continuous Learning",
    "apprenticeshiprespect": "Collaboration & Teamwork & Social Culture"
}

id_counts_df['category'] = id_counts_df['id'].map(id_to_category)
df = id_counts_df
print(df)
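
`Series.map` with a dict leaves `NaN` for any ID that has no key in `id_to_category`, so it can be worth checking coverage before saving. A minimal sketch in plain Python, using a shortened stand-in for the mapping and made-up observed IDs (`bonusScheme` is invented purely for illustration); the helper `find_unmapped_ids` is hypothetical:

```python
def find_unmapped_ids(observed_ids, mapping):
    """Return the observed IDs that have no category in the mapping, sorted for stable output."""
    return sorted(set(observed_ids) - set(mapping))

# Shortened stand-in for id_to_category (illustration only)
example_mapping = {
    "salary": "Financial Compensation & Benefits",
    "leadership": "Leadership & Communication",
}
# Made-up observed IDs; 'bonusScheme' is an invented ID used to trigger the check
observed = ["salary", "leadership", "bonusScheme", "salary"]

print(find_unmapped_ids(observed, example_mapping))  # ['bonusScheme']
```

In the notebook itself, the same helper could be called as `find_unmapped_ids(id_counts, id_to_category)`, since iterating over a `Counter` yields its keys.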

In [12]:
# Saving the mapped DataFrame as an Excel file
df.to_excel('Kununu_rating_ids_mapped.xlsx')
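
The description mentions mapping scores into categories as a later use of this mapping. That step is not shown in this notebook, so the following is only a sketch under assumptions: each rating entry is taken to be a dict with an `id` and a numeric `score` (as the data preview suggests), and scores are averaged per category within one review. The function name `mean_score_per_category`, the shortened mapping, and the sample string are hypothetical.

```python
import ast
from collections import defaultdict

def mean_score_per_category(ratings_str, mapping):
    """Average the 'score' values of one review per category; unmapped IDs are skipped."""
    if not isinstance(ratings_str, str):
        return {}
    try:
        ratings = ast.literal_eval(ratings_str)
    except (ValueError, SyntaxError):
        return {}
    if not isinstance(ratings, list):
        return {}
    scores = defaultdict(list)
    for item in ratings:
        if not isinstance(item, dict):
            continue
        category = mapping.get(item.get("id"))
        score = item.get("score")
        if category is not None and isinstance(score, (int, float)):
            scores[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}

# Shortened stand-in for id_to_category and a made-up review (illustration only)
example_mapping = {
    "salary": "Financial Compensation & Benefits",
    "atmosphere": "Collaboration & Teamwork & Social Culture",
    "teamwork": "Collaboration & Teamwork & Social Culture",
}
sample = "[{'id': 'salary', 'score': 2}, {'id': 'atmosphere', 'score': 4}, {'id': 'teamwork', 'score': 5}]"
print(mean_score_per_category(sample, example_mapping))
```

Averaging is one possible aggregation; a median or a count-weighted scheme may suit the downstream analysis better.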

### End of the notebook