# Web scraping career sites with Selenium

The challenge here was to scrape job listing results from several oil company career sites, then amalgamate the results into a single, searchable DataFrame. Scraping data from dynamic web pages is, it seems, no easy task and the code is difficult to maintain. But it's been a good learning experience.

New to web scraping, I quickly realised these sites are difficult to scrape as the search results are paginated (spread across multiple pages), e.g. 25 results at a time. The script must navigate to and iterate over each set of results, appending each to a DataFrame. Also, as the content is dynamic and prone to change, the script is not robust and can easily break. Of course, each site is entirely unique and the format of each job listing can be inconsistent.

The script uses Selenium module to perform webscraping of two sites, Hallibuton and BP career sites, as I found these to be the easier to scrape. It retreives only the job's headline information (role, location, etc.), and not the detailed content behind each listing. 

conda activate webscrape
environment webscrape.txt

In [1]:
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
import math

pd.set_option('display.max_colwidth', None)

# Useful links
# https://www.edureka.co/blog/web-scraping-with-python/
# Crawling Pages with Selenium: https://www.youtube.com/watch?v=Z3vFdtZ7d-g
# https://stackoverflow.com/questions/48734820/parsing-a-site-where-url-doesnt-change-with-selenium-python

Career sites that are scrapable:

<b>Halliburton</b>
    # https://jobs.halliburton.com/search/?q=&sortColumn=referencedate&sortDirection=desc&startrow=
    
<b>BP (two sites return the same results)</b>
    # https://www.bp.com/bpcareersnavapp/home/KeywordJobSearch
    # https://careers.bpglobal.com/TGnewUI/Search/home/HomeWithPreLoad?PageType=JobDetails&partnerid=25078&siteid=5012&jobid=127642#keyWordSearch=&locationSearch=

## Halliburton

Upon running this cell a new Chrome browser window will open and navigate to:<br>
https://jobs.halliburton.com/search/?q=&sortColumn=referencedate&sortDirection=desc&startrow=

The script will iterate over each page of results, appending results to lists which will be converted to a DataFrame.

In [16]:
# ChromeDriver is a separate executable that Selenium WebDriver uses to control Chrome. 
driver = webdriver.Chrome("C:/webdrivers/chromedriver.exe")

# Calculate number of pages to iterate over (max_page_num)
url = "https://jobs.halliburton.com/search/?q=&sortColumn=referencedate&sortDirection=desc&startrow="
driver.get(url)

# Find the number of results returned by search
num_results = driver.find_element_by_xpath("""//*[@id="content"]/div[2]/div/div[4]/div/div[1]/div/div/div/span[1]/b[2]""")    

num = int(num_results.text)/25
max_page_num = math.ceil(num)

# start_row is 0, then add 25 each iteration
start_row = 0

data = pd.DataFrame([])

titles_list = []
locations_list = []
dates_list = []
list_of_hrefs = []

# Append each resulting record to the DataFrame

for i in range(1, max_page_num + 1):

    url = "https://jobs.halliburton.com/search/?q=&sortColumn=referencedate&sortDirection=desc&startrow=" + str(start_row)
    
    start_row += 25

    driver.get(url)
    
    titles = driver.find_elements_by_class_name("colTitle")
    locations = driver.find_elements_by_css_selector(".colLocation.hidden-phone")
    dates = driver.find_elements_by_css_selector(".colDate.hidden-phone")
    
    content_blocks = driver.find_elements_by_class_name("jobTitle")

    for block in content_blocks:
        elements = block.find_elements_by_tag_name("a")
        for el in elements:
            list_of_hrefs.append(el.get_attribute("href"))
            
    list_of_hrefs.pop(0)

    #print(list_of_hrefs) 

    num_page_items = len(titles)
            
    for i in range(num_page_items):        
        titles_list.append(titles[i].text)
        locations_list.append(locations[i].text)
        dates_list.append(dates[i].text)

driver.close()

In [17]:
# Check the number of results

print(len(titles_list))
print(len(locations_list))
print(len(dates_list))
print(len(list_of_hrefs))

21
21
21
21


In [4]:
# Provide column headers

df_halliburton = pd.DataFrame({'Title': titles_list, 'Location': locations_list, 'Date': dates_list, 'Link': list_of_hrefs})

In [5]:
# Add a new column with the company name (required for when we start appending data from other companies)

df_halliburton['Company'] = 'halliburton'

In [6]:
# Convert all strings to lower case to improve searchability

df_halliburton = df_halliburton.apply(lambda x: x.astype(str).str.lower())

In [7]:
# Check we still see the same number of results

len(df_halliburton)

21

In [8]:
# Return the results

df_halliburton.head()

Unnamed: 0,Title,Location,Date,Link,Company
0,associate hse coordinator,"houston, tx, us, 77032","may 28, 2020",https://jobs.halliburton.com/job/houston-associate-hse-coordinator-tx-77032/651300800/,halliburton
1,"consultant, sr","siberia, cun, co","may 28, 2020",https://jobs.halliburton.com/job/siberia-consultant%2c-sr-cun/651356900/,halliburton
2,sql database analyst,"singapore, 05, sg, 637131","may 26, 2020",https://jobs.halliburton.com/job/singapore-sql-database-analyst-05-637131/644501400/,halliburton
3,field servcie rep. ii - drilling fluids,"georgetown, de, gy","may 25, 2020",https://jobs.halliburton.com/job/georgetown-field-servcie-rep_-ii-drilling-fluids-de/635756800/,halliburton
4,field professional i-drilling fluids,"georgetown, de, gy","may 25, 2020",https://jobs.halliburton.com/job/georgetown-field-professional-i-drilling-fluids-de/635755300/,halliburton


In [9]:
# Make the URLs clickable

def make_clickable(val):
    # target _blank to open new window
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

df_halliburton.style.format({'Link': make_clickable})

Unnamed: 0,Title,Location,Date,Link,Company
0,associate hse coordinator,"houston, tx, us, 77032","may 28, 2020",https://jobs.halliburton.com/job/houston-associate-hse-coordinator-tx-77032/651300800/,halliburton
1,"consultant, sr","siberia, cun, co","may 28, 2020",https://jobs.halliburton.com/job/siberia-consultant%2c-sr-cun/651356900/,halliburton
2,sql database analyst,"singapore, 05, sg, 637131","may 26, 2020",https://jobs.halliburton.com/job/singapore-sql-database-analyst-05-637131/644501400/,halliburton
3,field servcie rep. ii - drilling fluids,"georgetown, de, gy","may 25, 2020",https://jobs.halliburton.com/job/georgetown-field-servcie-rep_-ii-drilling-fluids-de/635756800/,halliburton
4,field professional i-drilling fluids,"georgetown, de, gy","may 25, 2020",https://jobs.halliburton.com/job/georgetown-field-professional-i-drilling-fluids-de/635755300/,halliburton
5,multi-chem process engineer / project manager,"pasadena, tx, us, 77507","may 21, 2020",https://jobs.halliburton.com/job/pasadena-multi-chem-process-engineer-project-manager-tx-77507/646368500/,halliburton
6,cloud engineer,"bogota, dc, co","may 20, 2020",https://jobs.halliburton.com/job/bogota-cloud-engineer-dc/646104000/,halliburton
7,cnc machinist (associate to senior),"singapore, 05, sg, 637131","may 16, 2020",https://jobs.halliburton.com/job/singapore-cnc-machinist-%28associate-to-senior%29-05-637131/636450800/,halliburton
8,senior field engineer - mwd/lwd,"georgetown, de, gy","may 14, 2020",https://jobs.halliburton.com/job/georgetown-senior-field-engineer-mwdlwd-de/630897400/,halliburton
9,driller helper - driller,"belle fourche, sd, us, 57717","may 12, 2020",https://jobs.halliburton.com/job/belle-fourche-driller-helper-driller-sd-57717/649030200/,halliburton


## BP (METHOD A)

In [19]:
with open('results.csv', 'w') as f:
    f.write("Title" + "\t" + "Location" +  "\t" + "Date" + "\t" + "Link\n")
    
driver = webdriver.Chrome("C:/webdrivers/chromedriver.exe")

#Calculate number of pages to iterate over (max_page_num)
### TO DO: REMOVE ALL MAX PAGE NUMS FROM THIS CELL AS I DON'T KNOW IT
### NEED TO INSERT TRY BLOCKS FOR CLICKING TO NEXT PAGE?
### WHY EMPTY LIST FROM SECOND PAGE?

url = "https://www.bp.com/bpcareersnavapp/home/KeywordJobSearch"
#url = "https://www.bp.com/en/global/corporate/careers/search-and-apply.html"

#Also available via:
#https://careers.bpglobal.com/TGnewUI/Search/home/HomeWithPreLoad?PageType=JobDetails&partnerid=25078&siteid=5012&jobid=128428#home
#Where all results are listed under:
#https://careers.bpglobal.com/TGnewUI/Search/home/HomeWithPreLoad?PageType=JobDetails&partnerid=25078&siteid=5012&jobid=128428#keyWordSearch=&locationSearch=

driver.get(url)

from selenium.webdriver.support.ui import Select

select = Select(driver.find_element_by_xpath('//*[@id="app_inside_body"]/div[1]/div[3]/select'))
select.select_by_visible_text("United Kingdom")

element = driver.find_element_by_xpath('//*[@id="searchButton"]')
element.click()

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="app_body"]/div[2]/div/button'))
)
element.click()

max_page_num = 1

roles_list = []

roles = driver.find_elements_by_class_name("panel-title")
    
num_page_items = len(roles)
            
for i in range(num_page_items):        
    roles_list.append(roles[i].text)

#print(roles_list)


###Switch to page 2

element = driver.find_element_by_xpath('//*[@id="app_inside_body"]/div[3]/div/div/ul/li[3]/a')
element.click()

#roles_list_2 = []

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

force_delay = driver.implicitly_wait(5)
force_delay

roles = driver.find_elements_by_class_name("panel-title")

num_page_items = len(roles)
            
for i in range(num_page_items):        
    roles_list.append(roles[i].text)

print(roles_list)


# ###Switch to Page 3

element = driver.find_element_by_xpath('//*[@id="app_inside_body"]/div[3]/div/div/ul/li[4]/a')
element.click()

#roles_list_3 = []

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

force_delay = driver.implicitly_wait(5)
force_delay

roles = driver.find_elements_by_class_name("panel-title")

num_page_items = len(roles)
            
for i in range(num_page_items):        
    roles_list.append(roles[i].text)

print(roles_list)

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app_inside_body"]/div[3]/div/div/ul/li[3]/a"}
  (Session info: chrome=77.0.3865.120)


In [82]:
# BP results are not in a tabulated format like Halliburtson.
# Each listing is a string that needs parsed and manipulated.

roles_list = [item.replace("Expert,", "Expert -") for item in roles_list]
roles_list = [item.replace("Expert Analyst,", "Expert Analyst -") for item in roles_list]
roles_list = [item.replace("Expert Team Leader,", "Expert Team Leader -") for item in roles_list]
roles_list = [item.replace("Analyst,", "Analyst -") for item in roles_list]
roles_list = [item.replace("Senior Analyst,", "Senior Analyst -") for item in roles_list]

# Some records have 2 locations, separated by comma. Messing things up.
# Here fix United Kingdom - United States, but not a universal fix! Might need to count things.

roles_list = [w.replace(', U', ' - U') for w in roles_list]

roles_list = [w.replace('\n\n', ', ') for w in roles_list]
roles_list

['Senior Counsel- Upstream Technology Data and Brands, United Kingdom, Legal Group',
 'Power Originator, United Kingdom, Supply & Trading Group',
 'Concept Development Engineer, United Kingdom, Project Management Group',
 'Senior Direct Tax Advisor, United Kingdom, Tax Group',
 'Global Brand Manager EDGE, United Kingdom, Marketing Group',
 'Commercial Manager, United Kingdom, Retail Group',
 'Service Owner (Enterprise Data), United Kingdom, IT&S Group',
 'Coreflood Technician, United Kingdom, Research & Technology Group',
 'Reservoir Development Performance Engineer, United Kingdom, Subsurface Group',
 'Global Digital Manager, United Kingdom, Procurement & Supply Chain Management Group',
 'Digital Fleet Marketing Advisor Europe, United Kingdom, Marketing Group',
 'Architecture Lead, United Kingdom, IT&S Group',
 'Corporate Reporting coordinator, United Kingdom, Communications & External Affairs Group',
 'Expert - Data Science, United Kingdom, IT&S Group',
 'Expert Team Leader - Blendin

In [83]:
# Python3 program to convert list into a list of lists 
  
def extractDigits(lst): 
    res = [] 
    for el in lst: 
        sub = el.split(', ') 
        res.append(sub) 
      
    return(res) 
                  
# Driver code 
roles_list = extractDigits(roles_list)

In [91]:
df_bp = pd.DataFrame(roles_list)
df_bp

#TO DO - NOTE LINE 85!  Job has multiple locations: "Spain - United Kingdom - United States of America"

Unnamed: 0,0,1,2,3,4,5,6
0,Senior Counsel- Upstream Technology Data and Brands,United Kingdom,Legal Group,,,,
1,Power Originator,United Kingdom,Supply & Trading Group,,,,
2,Concept Development Engineer,United Kingdom,Project Management Group,,,,
3,Senior Direct Tax Advisor,United Kingdom,Tax Group,,,,
4,Global Brand Manager EDGE,United Kingdom,Marketing Group,,,,
5,Commercial Manager,United Kingdom,Retail Group,,,,
6,Service Owner (Enterprise Data),United Kingdom,IT&S Group,,,,
7,Coreflood Technician,United Kingdom,Research & Technology Group,,,,
8,Reservoir Development Performance Engineer,United Kingdom,Subsurface Group,,,,
9,Global Digital Manager,United Kingdom,Procurement & Supply Chain Management Group,,,,


In [84]:
df_bp = pd.DataFrame(roles_list, columns = ['A', 'B', 'C'])
df_bp["Title"] = df_bp["A"] + ' - ' + df_bp["C"]
df_bp["Company"] = 'BP'
df_bp.rename(columns={'B': 'Location'}, inplace=True)
df_bp.drop(['A', 'C'], axis=1, inplace=True)
df_bp['Date'] = None
df_bp['Link'] = None
df_bp = df_bp.apply(lambda x: x.astype(str).str.lower())
df_bp

AssertionError: 3 columns passed, passed data had 7 columns

## Combine DataFrames

In [85]:
df_combined = df_halliburton.append(df_bp, ignore_index=True, sort=True)
df_combined = df_combined.reindex(columns=['Title', 'Location', 'Date', 'Link', 'Company'])
df_combined

Unnamed: 0,Title,Location,Date,Link,Company
0,service specialist i - nitrogen international,"cuidad del carmen, tab, mx, 24121","feb 5, 2020",https://jobs.halliburton.com/job/villahermosa-engineer-well-design-tab-86035/628146500/,halliburton
1,shop supervisor- baroid,"evansville, wy, us, 82636","feb 5, 2020",https://jobs.halliburton.com/job/aberdeen-mgr%2c-hr-business-partner-abe-ab21-0gn/628114700/,halliburton
2,"well site supv, ii","paraiso, tab, mx, 86604","feb 5, 2020",https://jobs.halliburton.com/job/luanda-customer-financial-services-leader-lua/628135300/,halliburton
3,compliance investigator,"houston, tx, us, 77032","feb 5, 2020",https://jobs.halliburton.com/search/?q=&sortcolumn=sort_title&sortdirection=desc#hdrtitle,halliburton
4,svc spec-cable spooling (assoc - i),"ross, nd, us, 58776","feb 5, 2020",https://jobs.halliburton.com/job/villahermosa-principal-ii-des-tab-86037/628146100/,halliburton
5,maint engineer - logging and perforating,"tripoli, tb, ly","feb 5, 2020",https://jobs.halliburton.com/job/cunduacan-advising-engineer-dd-tab-86693/628157400/,halliburton
6,service leader - coiled tubing,"williston, nd, us, 58801","feb 5, 2020",https://jobs.halliburton.com/job/cunduacan-electricalmechanical-lwd-tech-ii-tab-86693/628150600/,halliburton
7,mechanical engineer - pumping equipment,"duncan, ok, us, 73533","feb 5, 2020",https://jobs.halliburton.com/job/cunduacan-electricalmechanical-lwd-tech-ii-tab-86693/628151100/,halliburton
8,material handler,"tripoli, tb, ly","feb 5, 2020",https://jobs.halliburton.com/job/cunduacan-downhole-mechanical-technician-iv-tab-86693/628152500/,halliburton
9,entry level field engineer - logging and perf,"al-khobar, 04, sa, 31952","feb 5, 2020",https://jobs.halliburton.com/job/cunduacan-electricalmechanical-geopilot-technician-ii-tab-86693/628153000/,halliburton


In [86]:
#Either run entire thing and filter based on keywords and locations.
#Or actually search for multiple pages (probably trickier and less efficient way to do it)

lst_keywords = ['geologist', 'geoscientist', 'geoscience', 'decisionspace', #'software dev', 
                'software qa',  'e&p', 'exploration', #'data', 
                'data analyst', 'data engineer', 'data scientist', 'data science', 
                'data specialist', 'data manager', 'data management', 'data portfolio', 
                'data lead', 'data technician', 'geoscience data', 'geo-science data', 
                'geo science data', 'geoscience tech', 'geoscience application', 
                'geoscience product', 'geological data', 'geology data', 
                'technical data', 'technical application', 'subsurface data', 
                'e&p data', 'exploration data', 'geoscience tech', 'subsurface specialist', 
                'applications specialist', 'software portfolio', 'well data', 'python', 'sql']

print('|'.join(lst_keywords))

lst_locations = [ 'gb,', 'hu,', 'no,',
                'uk', 'united kingdom', 'norway', 'budapest']

print('|'.join(lst_locations))

#Need to create conversions between locations, for example.

geologist|geoscientist|geoscience|decisionspace|software qa|e&p|exploration|data analyst|data engineer|data scientist|data science|data specialist|data manager|data management|data portfolio|data lead|data technician|geoscience data|geo-science data|geo science data|geoscience tech|geoscience application|geoscience product|geological data|geology data|technical data|technical application|subsurface data|e&p data|exploration data|geoscience tech|subsurface specialist|applications specialist|software portfolio|well data|python|sql
gb,|hu,|no,|uk|united kingdom|norway|budapest


In [87]:
filt = df_combined['Title'].str.contains('|'.join(lst_keywords)) & df_combined['Location'].str.contains('|'.join(lst_locations))
#filt.head()

In [88]:
df_combined = df_combined.loc[filt]
#df_combined.sort_values(by='Date')
df_combined

Unnamed: 0,Title,Location,Date,Link,Company
49,"united kingdom aberdeen: surface data logging, logging geologist - 83743","aberdeen, abe, gb, ab21 0gn","feb 5, 2020",https://jobs.halliburton.com/job/ploiesti-service-operator-ii-fracacid-ph-107025/610867200/,halliburton
50,"united kingdom aberdeen: surface data logging, senior logging geologist - 83745","aberdeen, abe, gb, ab21 0gn","feb 5, 2020",https://jobs.halliburton.com/job/belle-fourche-senior-heavy-truck-driver-sd-57717/619211800/,halliburton
201,"software qa, sr","abingdon, oxf, gb, ox14 4rw","jan 31, 2020",https://jobs.halliburton.com/job/houston-geoscientist-i-iii-tx-77032/626847600/,halliburton
577,expert - data science - it&s group,united kingdom,none,none,bp
582,analyst - data science - it&s group,united kingdom,none,none,bp
583,senior analyst - data science - it&s group,united kingdom,none,none,bp
627,expert - data science - it&s group,united kingdom,none,none,bp
632,analyst - data science - it&s group,united kingdom,none,none,bp
633,senior analyst - data science - it&s group,united kingdom,none,none,bp


In [20]:
# #BP (METHOD B)

    
# driver = webdriver.Chrome("C:/webdrivers/chromedriver.exe")

# url = "https://careers.bpglobal.com/TGnewUI/Search/home/HomeWithPreLoad?PageType=JobDetails&partnerid=25078&siteid=5012&jobid=127642#keyWordSearch=&locationSearch="

# driver.get(url)

# max_page_num = 2

# roles_list_1 = []

# for i in range(1, max_page_num + 1):
    
#     roles = driver.find_elements_by_class_name("jobList ng-scope")
    
#     num_page_items = len(roles)
            
#     for i in range(num_page_items):        
#         roles_list_1.append(roles[i].text)

# print(roles_list_1)

In [12]:
#https://careers.slb.com/experienced-roles
#https://careers.ihsmarkit.com/search.php?searchkeyword=&searchlocation=
#https://jobs.lr.org/search/?createNewAlert=false&q=&locationsearch=&optionsFacetsDD_department=&optionsFacetsDD_country=&optionsFacetsDD_shifttype=

#https://www.bp.com/en/global/corporate/careers/search-and-apply.html (forces you to pick one category)
#https://jobs.shell.com/search-jobs?k=
#https://careers.peopleclick.eu.com/careerscp/client_statoil/external/en_US/search.do
#http://careers.conocophillips.com/job-search-results/

#https://spiritenergy.wd3.myworkdayjobs.com/SpiritInternet/
#https://www.neptuneenergy.com/careers/available-jobs/
#https://dno.easycruit.com/
#https://lundin-norway.no/career/?lang=en (descriptions all in English? Each Title opens a new page)

# https://molgroup.taleo.net/careersection/external/jobsearch.ftl#
#https://molgroup.taleo.net/careersection/external/jobsearch.ftl?f=LOCATION(2205100397)|JOB_FIELD(11005100397)|JOB_SCHEDULE(1)&a=null&multiline=false&ignoreSavedQuery&sasNo=828405011511
#https://molgroup.taleo.net/careersection/external/jobdetail.ftl?job=20000396&tz=GMT%2B01%3A00&tzname=Europe%2FBudapest

In [10]:
# WAYS TO PARSE URL STRINGS

#import urllib.parse
#urllib.parse.urljoin('/media/path/', 'js/foo.js')

#urllib.parse.urljoin('//*[@id="searchresults"]/tbody/tr[', num, ']/td[', num, ']/span/a')

# num=2

# def urljoin(*args):
#     """
#     Joins given arguments into an url. Trailing but not leading slashes are stripped for each argument.
#     """
#     return ''.join(map(lambda x: str(x), args))

# urljoin('//*[@id="searchresults"]/tbody/tr[', num, ']/td[', num, ']/span/a')

In [18]:
# num_results = driver.find_element_by_xpath("""//*[@id="content"]/div[2]/div/div[4]/div/div[1]/div/div/div/span[1]/b[2]""")    
# #temp = 2 #for testing purposes only
# temp = int(num_results.text)/25
# max_page_num = math.ceil(temp)

# #start_row is 0, then add 25 each iteration
# start_row = 0

# data = pd.DataFrame([])

# titles_list = []
# locations_list = []
# dates_list = []
# list_of_hrefs = []

# for i in range(1, max_page_num + 1):

#     url = "https://jobs.halliburton.com/search/?q=&sortColumn=referencedate&sortDirection=desc&startrow=" + str(start_row)
    
#     start_row += 25

#     driver.get(url)
    
#     titles = driver.find_elements_by_class_name("colTitle")
#     locations = driver.find_elements_by_css_selector(".colLocation.hidden-phone")
#     dates = driver.find_elements_by_css_selector(".colDate.hidden-phone")
    
#     #TODO: REWITE THIS SECTION TO MATCH MY NAMING CONVENTION
#     #list_of_hrefs = []
#     content_blocks = driver.find_elements_by_class_name("jobTitle")

#     for block in content_blocks:
#         elements = block.find_elements_by_tag_name("a")
#         for el in elements:
#             list_of_hrefs.append(el.get_attribute("href"))
            
#     list_of_hrefs.pop(0)

#     print(list_of_hrefs) 

#     num_page_items = len(titles)

# #     with open('results.csv', 'a') as f:
# #         for i in range(num_page_items):
# #             f.write(titles[i].text + "\t" + locations[i].text + "\t" + dates[i].text + "\t" + link[i].text + "\n")
            
#     for i in range(num_page_items):        
#         titles_list.append(titles[i].text)
#         locations_list.append(locations[i].text)
#         dates_list.append(dates[i].text)
    
#         for item in items:
#             href = item.get_attribute('href')
#             print(href)   

# driver.close()