Happy employees -> Happy customers

What kind of metrics indicate that an employee is happy?

Use the base conda environment to run the scraping code.

### Scraping code for Glassdoor's DBS reviews (sorted to show the most recent reviews first)

Countries of interest: Singapore, Taiwan, Hong Kong, India, China, Indonesia

Positions of interest: Contract, Part-time, Full-time

Companies are to be selected based on P&L.

Group 1: Foreign banks in Singapore (JP Morgan Singapore, Goldman Sachs). Source: https://sgbanks.com
 
Group 2: Local Banks (UOB, OCBC)
- UOB ratings - 3.5
- OCBC ratings - 3.6

Group 3: Tech companies (Google, Apple, Microsoft). Source: https://www.glassdoor.com/Explore/top-information-technology-companies-singapore_IS.4,26_ISEC10013_IL.37,46_IM1123.htm

### Steps to be automated:
1. On page 1, clicking "Continue Reading" on each review causes a pop-up to appear.
2. Click the 'x' to close the pop-up.
3. After scraping all the information on page 1, move on to the next page.
4. A 'sign up' screen then appears.
5. Click the blue 'sign in' button.
6. Enter the email and password in the 'sign in' fields and click the blue 'sign in' button.
7. After step 6, the browser lands on page 2.
8. Click "Continue Reading" for all reviews. This time, no pop-up appears.
9. Scrape the wanted information and move on to the next page.
10. Repeat step 9 for the remaining pages.
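
The steps above can be sketched as a per-page plan before any browser automation is written. This is only an illustration: the function name `page_actions` and the action labels are hypothetical and do not appear in the scraping code.

```python
from typing import Iterator, List


def page_actions(num_pages: int) -> Iterator[List[str]]:
    """Yield the ordered actions for each page, following the steps listed above."""
    for page in range(1, num_pages + 1):
        if page == 1:
            # Steps 1-2: on page 1 each expanded review triggers a pop-up to close
            actions = ["expand_reviews_and_close_popups", "scrape"]
        else:
            # Step 8: from page 2 onwards no pop-up appears
            actions = ["expand_reviews", "scrape"]
        if page < num_pages:
            actions.append("next_page")
            if page == 1:
                # Steps 4-6: the sign-up screen appears once, after leaving page 1
                actions.append("sign_in")
        yield actions


for number, actions in enumerate(page_actions(3), start=1):
    print(number, actions)
```

Separating the plan from the Selenium calls makes it easier to see that the sign-in only has to be handled once.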

*Save the DataFrame as xlsx, since saving it as csv may cause problems.*

`scraping_glassdoor_v2` is slower than `scraping_glassdoor_v1`, but it can get around certain exceptions raised during scraping.
- `scraping_glassdoor_v2` can scrape a greater variety of pages than `scraping_glassdoor_v1`

In [None]:
# https://www.glassdoor.sg/Reviews/Google-Reviews-E9079_P2.htm?sort.sortType=RD&sort.ascending=false&filter.iso3Language=eng&filter.employmentStatus=PART_TIME&filter.employmentStatus=CONTRACT&filter.employmentStatus=REGULAR
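
The review URLs used in this notebook follow a visible pattern: page 1 has no suffix, later pages add `_P<n>` before `.htm`, and each employment status is a repeated query parameter. A minimal sketch of a URL builder under that assumption follows. The helper name `build_reviews_url` and its parameters are hypothetical, and the pattern is inferred only from the two URLs shown in this notebook.

```python
from urllib.parse import urlencode


def build_reviews_url(slug, page=1, statuses=("CONTRACT", "PART_TIME", "REGULAR")):
    """Build a Glassdoor reviews URL for one page, using the notebook's sort settings."""
    # Page 1 has no suffix; later pages append _P<n> before ".htm"
    suffix = "" if page == 1 else f"_P{page}"
    params = [
        ("sort.sortType", "RD"),
        ("sort.ascending", "false"),
        ("filter.iso3Language", "eng"),
    ]
    # One repeated query parameter per employment status
    params += [("filter.employmentStatus", status) for status in statuses]
    return f"https://www.glassdoor.sg/Reviews/{slug}{suffix}.htm?{urlencode(params)}"


print(build_reviews_url("Google-Reviews-E9079", page=2))
```

Building the URL from parts keeps the company slug, page number, and filters in one place when switching between companies.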

In [1]:
# Code to scrape Glassdoor reviews for pages like DBS that do not raise any exceptions
import time
import math
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

def scraping_glassdoor_v1(num_reviews:int, url:str, email:str, password:str):
    
    '''
    num_reviews: Number of reviews to be retrieved
    url: Input url that reviews should be scraped from (vary according to country of interest)
    email: Input email that is registered on glassdoor
    password: Input password 
    '''

    date_occupation=[]
    rating=[]
    pros_list=[]
    cons_list=[]
    country_list=[]
    page_list=[]
    current_or_former_employee=[]
    recommended=[]
    ceoApproval=[]
    business_Outlook=[]
    

    driver = webdriver.Chrome(r'C:\Webdriver\chromedriver')
    driver.get(url)
    
    for i in range(math.ceil(num_reviews/10)):
        
        intermediate_page_list=[]

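        # Note: an expected-condition object is always truthy when used directly in `if`
        # (it is only evaluated when passed to WebDriverWait), so this branch always runs.
        if EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='v2__EIReviewDetailsV2__continueReading']")):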
            time.sleep(5)
            if i == 0: # Do the following steps if it is the first page

                # If there are "continue reading" buttons, click on those to show full reviews
                for review in driver.find_elements(By.CLASS_NAME, "v2__EIReviewDetailsV2__continueReading"):
                    review.click()

                    # If the prompt appears, click the x to close it
                    try:
                        driver.find_element(By.CLASS_NAME, "modal_closeIcon").click()  # click the X
                        driver.find_element(By.CLASS_NAME, "modal_closeIcon-svg").click()  # click the X's svg icon
                    except NoSuchElementException:
                        pass
            else:
                # On later pages, click the first matching button via JavaScript, once per button found
                for _ in driver.find_elements(By.CLASS_NAME, "v2__EIReviewDetailsV2__continueReading"):
                    driver.execute_script("var x= document.getElementsByClassName('v2__EIReviewDetailsV2__continueReading')[0];"+"x.click();")

            source = driver.page_source
            soup = BeautifulSoup(source,'lxml')
            
            # To obtain recommended/ ceo approval/ business outlook
            for div in soup.find_all('div', attrs={'class':'mt-xxsm'}):
                intermediate=[]
                j=0
                for div2 in div.div.div.div.div:
                    j+=1
                    for div3 in div2:
                        try:
                            if j % 3 == 1:
                                if str(div3.svg.get('class')[1]) == 'css-10xv9lv-svg':
                                    recommended.append("Neutral")
                                elif str(div3.svg.get('class')[1]) == "css-hcqxoa-svg":
                                    recommended.append("Recommended")
                                else:
                                    recommended.append("Not Recommended")
                            elif j % 3 == 2:
                                if str(div3.svg.get('class')[1]) == 'css-10xv9lv-svg' or str(div3.svg.get('class')[1]) == 'css-1h93d4v-svg':
                                    ceoApproval.append("Neutral")
                                elif str(div3.svg.get('class')[1]) == "css-hcqxoa-svg":
                                    ceoApproval.append("Recommended")
                                else:
                                    ceoApproval.append("Not Recommended")
                            else:
                                if str(div3.svg.get('class')[1]) == 'css-10xv9lv-svg' or str(div3.svg.get('class')[1]) == 'css-1h93d4v-svg':
                                    business_Outlook.append("Neutral")
                                elif str(div3.svg.get('class')[1]) == "css-hcqxoa-svg":
                                    business_Outlook.append("Recommended")
                                else:
                                    business_Outlook.append("Not Recommended")
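                        except (AttributeError, TypeError, IndexError):
                            # Skip children that have no svg icon or no second class name
                            pass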
            
            for date_occ in soup.find_all('span', attrs={'class':'authorJobTitle middle common__EiReviewDetailsStyle__newGrey'}):
                date_occupation.append(date_occ.text)

            # Add all ratings into ratings list
            for ratings in soup.find_all('span', attrs={'class':'ratingNumber mr-xsm'}):
                rating.append(ratings.text)
            
            # Get employee type (current/ former employee)
            for employee_type in soup.find_all('span', attrs={'class':'pt-xsm pt-md-0 css-1qxtz39 eg4psks0'}):
                current_or_former_employee.append(employee_type.text)

            # Add all pros into pros_list
            for pros in soup.find_all('span', attrs={'data-test':'pros'}):
                pros_list.append(pros.text)

            # Add all cons into cons list
            for cons in soup.find_all('span', attrs={'data-test':'cons'}):
                cons_list.append(cons.text)
            
            # To go to next page
            WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='nextButton css-1hq9k8 e13qs2071']")))
            driver.execute_script("var x= document.getElementsByClassName('nextButton css-1hq9k8 e13qs2071')[0];"+"x.click();")

            
            if i == 0: # Sign in pop up will only appear for page 1
                if EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='link ml-xxsm']")):

                    # Click on Sign in button (after page 1) to get from sign up to sign in page
                    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='link ml-xxsm']")))
                    driver.execute_script("var x= document.getElementsByClassName('link ml-xxsm')[0];"+"x.click();")

                    email_field = driver.find_element(By.ID,'hardsellUserEmail')
                    password_field = driver.find_element(By.ID,'hardsellUserPassword')

                    email_field.send_keys(email)
                    password_field.send_keys(password)

                    # Click sign in button after inputting login credentials
                    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='gd-ui-button mt-std minWidthBtn css-14xfqow evpplnh0']")))
                    driver.execute_script("var x= document.getElementsByClassName('gd-ui-button mt-std minWidthBtn css-14xfqow evpplnh0')[0];"+"x.click();")
                    
    country_list.append(soup.find_all('h1', attrs={'class':'eiReviews__EIReviewsPageStyles__newPageHeader'})[0].text)
    country_list = country_list * num_reviews
    
    return date_occupation, rating, pros_list, cons_list, country_list, page_list, current_or_former_employee, recommended, ceoApproval, business_Outlook

import os

# Credentials are read from environment variables instead of being hardcoded.
# GLASSDOOR_EMAIL and GLASSDOOR_PASSWORD are placeholder names; set them before running.
(date_occupation, rating, pros_list, cons_list, country_list, page_list,
 current_or_former_employee, recommended, ceoApproval, business_Outlook) = scraping_glassdoor_v1(
    4608,
    'https://www.glassdoor.sg/Reviews/DBS-Bank-Reviews-E611812.htm?sort.sortType=RD&sort.ascending=false&filter.iso3Language=eng&filter.employmentStatus=CONTRACT&filter.employmentStatus=PART_TIME&filter.employmentStatus=REGULAR',
    os.environ["GLASSDOOR_EMAIL"],
    os.environ["GLASSDOOR_PASSWORD"],
)

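
The three near-identical branches in `scraping_glassdoor_v1` that turn an icon's SVG class into a label could be folded into one lookup. The sketch below reuses the class names from that function; the helper `label_for` and the metric keys are hypothetical names introduced only for illustration.

```python
# SVG class names copied from the scraping function above
POSITIVE_CLASS = "css-hcqxoa-svg"
NEUTRAL_CLASSES = {
    "recommended": {"css-10xv9lv-svg"},
    "ceo_approval": {"css-10xv9lv-svg", "css-1h93d4v-svg"},
    "business_outlook": {"css-10xv9lv-svg", "css-1h93d4v-svg"},
}


def label_for(metric: str, svg_class: str) -> str:
    """Map an icon's SVG class to the label stored in the result lists."""
    if svg_class in NEUTRAL_CLASSES[metric]:
        return "Neutral"
    if svg_class == POSITIVE_CLASS:
        return "Recommended"
    # Any other class falls through to the negative label, as in the original branches
    return "Not Recommended"


print(label_for("ceo_approval", "css-1h93d4v-svg"))
```

Keeping the class names in one table means only one place needs updating if the site changes its generated CSS class names.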


In [3]:
date_occupation_df = pd.DataFrame(date_occupation, columns = ['Date_occupation'])
date_occupation_df[['Date', 'Occupation']]=date_occupation_df['Date_occupation'].str.split("-", n=1, expand=True)
date_occupation_df=date_occupation_df.drop(['Date_occupation'],axis=1)

pros_df = pd.DataFrame(pros_list, columns = ['Pros'])
cons_df = pd.DataFrame(cons_list, columns = ['Cons'])
rating_df = pd.DataFrame(rating, columns = ['Rating'])
country_df = pd.DataFrame(country_list, columns = ['Review_type'])
current_or_former_df = pd.DataFrame(current_or_former_employee, columns = ['Employee Type'])
recommended_df = pd.DataFrame(recommended, columns = ['Recommended Or Not'])
ceoApproval_df = pd.DataFrame(ceoApproval, columns = ['CEO Approval'])
business_Outlook_df = pd.DataFrame(business_Outlook, columns = ['Business Outlook'])

full_df = pd.concat([country_df, date_occupation_df, current_or_former_df, pros_df, cons_df, recommended_df, ceoApproval_df, business_Outlook_df, rating_df], axis=1)

full_df.to_excel(r"..\Employee Reviews\DBS Reviews\All DBS Reviews.xlsx", index=False)  # raw string avoids invalid escape sequences
# full_df
full_df.drop_duplicates()

Unnamed: 0,Review_type,Date,Occupation,Employee Type,Pros,Cons,Recommended Or Not,CEO Approval,Business Outlook,Rating
0,DBS Bank Reviews,27 Jun 2022,Software Engineer,"Current Employee, more than 5 years",Good work life balance and salary,"Hierarchical, does not listen to feedback from...",Recommended,Recommended,Recommended,4.0
1,DBS Bank Reviews,27 Jun 2022,Senior Associate,Current Employee,Annual leave granted per year is good,Number of carry forward leave allow is too little,Neutral,Neutral,Neutral,4.0
2,DBS Bank Reviews,27 Jun 2022,Senior Associate,Current Employee,Infrastructure is good and the ambience,No work Life balance.. managers force to work ...,Not Recommended,Neutral,Neutral,1.0
3,DBS Bank Reviews,27 Jun 2022,Senior Data Analyst,"Former Contractor, more than 1 year",opportunities to work on interesting and chall...,"some politics, heavy workload on some projects",Neutral,Neutral,Neutral,3.0
4,DBS Bank Reviews,27 Jun 2022,Graduate Associate,"Current Employee, more than 1 year",you get exposed to multiple skillsets,you need to get exposed to multiple skillsets,Neutral,Neutral,Neutral,4.0
...,...,...,...,...,...,...,...,...,...,...
4594,DBS Bank Reviews,20 Oct 2009,Analyst,Former Employee,the brand name of the company,my boss was a slave driver who did not promote...,Not Recommended,Neutral,Neutral,2.0
4595,DBS Bank Reviews,27 Aug 2009,Assistant Vice President,Former Employee,"The brand name in the Asia arena; stability, l...","Very bureaucratic. Seniority matters the most,...",Neutral,Neutral,Neutral,3.0
4596,DBS Bank Reviews,3 Aug 2009,Assistant Vice President,Current Employee,people are not the smartest around; hence very...,far too many deadwood esp in middle management...,Neutral,Not Recommended,Neutral,2.0
4597,DBS Bank Reviews,22 Aug 2008,Vice President - DII,Former Employee,"DBS has good job security for personel, the pa...","DBS is a big dinosaur, as in slow to respond t...",Not Recommended,Neutral,Neutral,3.0
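
The `Date_occupation` split in the cell above cuts only at the first hyphen, so occupations that contain a hyphen themselves (such as `Vice President - DII`) stay intact. A plain-Python sketch of the same idea follows, assuming the scraped text looks like `27 Jun 2022 - Software Engineer`. The helper name `split_date_occupation` is hypothetical, and unlike the pandas call it also strips surrounding whitespace.

```python
from typing import Tuple


def split_date_occupation(text: str) -> Tuple[str, str]:
    """Split 'date - occupation' at the first hyphen only and trim whitespace."""
    date, _, occupation = text.partition("-")
    return date.strip(), occupation.strip()


print(split_date_occupation("22 Aug 2008 - Vice President - DII"))
```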


In [5]:
import pandas as pd

# Folder holding the six partial JPM review exports
jpm_dir = r'C:\Users\jingh\OneDrive\Desktop\DBS Internship stuff\Employee Reviews Documents\Employee Reviews\JPM Reviews'
intermediates = [pd.read_csv(rf'{jpm_dir}\All_JPM_Reviews_{n}.csv') for n in range(1, 7)]

full_df = pd.concat(intermediates, axis=0)
full_df = full_df[~pd.isna(full_df['Date'])]  # drop rows without a date
full_df.drop_duplicates().reset_index(drop=True)#.to_csv(rf'{jpm_dir}\All_JPM.csv', index=False)
# intermediate.drop_duplicates()

### AmbitionBox scraping code

In [103]:
# url = 'https://www.ambitionbox.com/reviews/dbs-bank-reviews?sort_by=latest&location=hyderabad,mumbai,chennai,pune,bengaluru,kolkata,new-delhi,salem,karur,coimbatore,delhi-ncr,vijayawada,navi-mumbai,surat,gurgaon,bhupalpalle-telangana,erode,durg,noida,bangalore-rural,oddanchatram,ludhiana,puducherry,chittoor,visakhapatnam,warangal,nashik,madurai,rajahmundry,namakkal,kadapa,ponnur,mahbubnagar,vijayapura,karaikal,nellore,tindivanam,ahmedabad,tiruppur,kolhapur,rajapalayam,virudhunagar,anand,devakottai-tamil-nadu,srikakulam,pudukkottai,jaipur,navsari,ballari,mandya,tiruchirappalli,ranchi,chenani-jammu-and-kashmir,dharwad,dindigul,palayamkottai-tamil-nadu,amalapuram,shivamogga,gandhinagar,dhamtari,ongole,anantapur,ravulapalem,khammam,ashta,markapur,guntur,hosur,sirsi,ranibennur,cuddalore,tirupati,gandhidham,bhilwara,viluppuram,chitradurga,thanjavur,mannargudi-tamil-nadu,thane,dubai,cumbum-andhra-pradesh,peravurani,faridabad,proddatur,mangaluru,bhubaneswar,hosapete,chityal-telangana,rajkot,mangalagiri,jabalpur,mumbai-suburban-maharashtra,palakkad'
date_list = []
occupation_location_list = []
department_list = []
ratings_list = []
likes = []
dislikes = []
content_type_list = []
content_list = []

for i in range(1,10):
    url_front = "https://www.ambitionbox.com/reviews/dbs-bank-reviews?page="
    url_end = "&location=hyderabad,mumbai,chennai,pune,bengaluru,kolkata,new-delhi,salem,karur,coimbatore,delhi-ncr,vijayawada,navi-mumbai,surat,gurgaon,bhupalpalle-telangana,erode,durg,noida,bangalore-rural,oddanchatram,ludhiana,puducherry,chittoor,visakhapatnam,warangal,nashik,madurai,rajahmundry,namakkal,kadapa,ponnur,mahbubnagar,vijayapura,karaikal,nellore,tindivanam,ahmedabad,tiruppur,kolhapur,rajapalayam,virudhunagar,anand,devakottai-tamil-nadu,srikakulam,pudukkottai,jaipur,navsari,ballari,mandya,tiruchirappalli,ranchi,chenani-jammu-and-kashmir,dharwad,dindigul,palayamkottai-tamil-nadu,amalapuram,shivamogga,gandhinagar,dhamtari,ongole,anantapur,ravulapalem,khammam,ashta,markapur,guntur,hosur,sirsi,ranibennur,cuddalore,tirupati,gandhidham,bhilwara,viluppuram,chitradurga,thanjavur,mannargudi-tamil-nadu,thane,dubai,cumbum-andhra-pradesh,peravurani,faridabad,proddatur,mangaluru,bhubaneswar,hosapete,chityal-telangana,rajkot,mangalagiri,jabalpur,mumbai-suburban-maharashtra,palakkad&sort_by=latest"
    url = url_front + str(i) + url_end
    driver = webdriver.Chrome(r'C:\Webdriver\chromedriver')
    driver.get(url)

    # Log in with a LinkedIn account in order to view all reviews
    driver.execute_script("var x= document.getElementsByClassName('purple-btn')[1];"+"x.click();")
    driver.find_elements(By.CLASS_NAME, "social-login-button")[2].click()

    email_field = driver.find_element(By.ID, 'username')
    password_field = driver.find_element(By.ID, 'password')

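    # Credentials are read from environment variables instead of being hardcoded.
    # LINKEDIN_EMAIL and LINKEDIN_PASSWORD are placeholder names; set them before running.
    import os
    email_field.send_keys(os.environ["LINKEDIN_EMAIL"])
    password_field.send_keys(os.environ["LINKEDIN_PASSWORD"])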
    driver.execute_script("var x= document.getElementsByClassName('btn__primary--large')[0];"+"x.click();")
    time.sleep(2)
    driver.execute_script("var x= document.getElementsByClassName('modal-default-button right')[0];"+"x.click();")

    # A class-name locator cannot take two space-separated classes, so use a CSS selector instead
    for _ in driver.find_elements(By.CSS_SELECTOR, ".read-more.sbold-list-header"):
        driver.execute_script("var x= document.getElementsByClassName('read-more sbold-list-header')[0];"+"x.click();")
    
    source = driver.page_source
    soup = BeautifulSoup(source,'lxml')
    
    for date in soup.find_all('span', attrs={'class':'status caption-subdued'}):
        date_list.append(date.text)
    
    # Content refers to "likes"/ "dislikes"/ "work details" within each review
    for content_type in soup.find_all('h3', attrs={'class':'input-fields sub-heading'}):
        if content_type.text == "Likes" or content_type.text == "Dislikes":
            content_type_list.append(content_type.text)
    
    for content in soup.find_all('p', attrs={'class':'body-medium overflow-wrap'}):
        content_list.append(content.text)
        
content_type_list



['Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
 'Dislikes',
 'Likes',
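
The output above shows that some reviews have a "Likes" entry with no "Dislikes" entry after it, so the two lists cannot simply be split into alternating columns. The sketch below groups them into one record per review. It assumes that `content_type_list` and `content_list` line up one-to-one and that every "Likes" entry starts a new review; neither assumption is verified in this notebook, and the helper name `group_reviews` is hypothetical.

```python
from typing import Dict, List, Optional


def group_reviews(types: List[str], contents: List[str]) -> List[Dict[str, Optional[str]]]:
    """Pair each 'Likes' entry with the 'Dislikes' entry that follows it, if any."""
    reviews: List[Dict[str, Optional[str]]] = []
    for kind, text in zip(types, contents):
        if kind == "Likes":
            # Assumption: a "Likes" entry opens a new review
            reviews.append({"likes": text, "dislikes": None})
        elif kind == "Dislikes" and reviews and reviews[-1]["dislikes"] is None:
            # Attach the dislike to the most recent review that lacks one
            reviews[-1]["dislikes"] = text
    return reviews


sample_types = ["Likes", "Dislikes", "Likes", "Likes", "Dislikes"]
sample_contents = ["Company culture", "Nothing", "Best in market", "Good team working", "No"]
for review in group_reviews(sample_types, sample_contents):
    print(review)
```

Reviews without a "Dislikes" entry keep `None` in that field, which becomes a missing value if the records are later loaded into a DataFrame.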


In [104]:
content_list

['Management is listening emloyees and solving employee pain points.',
 'High volume of work.',
 'Equality and the Best treatment',
 'Nothing',
 'Great office space. Decent learning opportunities. Free cab to office.',
 'Should provide lunch at office. Tough to get time for extra learning apart from job related tech stack.',
 'Company culture',
 'Health care benifits',
 'Best in market',
 'Fabulous performance',
 'No mircromanagment,\r\nLota if perks and gifts\r\nCool looking office',
 'Tech decision taken by non tech managers\r\nAlways in hurry for new release, nobidy takes care of code quality.',
 'You will get to learn a lot',
 "Given opportunity's to learn new skills and experiences ",
 'Good working culture',
 'Ctc is not market benchmarked',
 'Good team working',
 'I have no drawbacks',
 'Work Culture ',
 'No ',
 'You will be stagnant no growth opportunities in long run company is good in case you are planning pregnancy or just want to settle down with out pay increase ',
 'Work 