# Selenium Library

<font size=4>Selenium is a browser automation framework that can be used for:</font><br>
    1. <font size=3>Web scraping.</font><br>
    2. <font size=3>Browser automation (browsing, clicking, downloading, filling in forms, ...).</font>

In [1]:
import pandas as pd

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import time
from termcolor import colored

# Selenium web driver: 
Selenium uses a web driver to automate the browser: our script sends commands to the driver, and the driver controls the browser.
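In Selenium 4 the driver path is normally passed through a `Service` object. The cells in this notebook pass the path positionally, which early Selenium 4 releases accept with a deprecation warning and later releases reject. A minimal sketch, assuming Selenium 4 is installed; `make_driver` is a hypothetical helper name, not part of Selenium:

```python
def make_driver(driver_path=None):
    """Return a Chrome WebDriver (hypothetical helper, assumes Selenium 4)."""
    # Imported inside the function so the helper is self-contained.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path is None:
        # Assumption: recent Selenium 4 releases can locate a matching
        # chromedriver on their own when no path is given.
        return webdriver.Chrome()

    # Wrap the chromedriver path in a Service object.
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

With this helper, `driver = make_driver(DRIVER_PATH)` could stand in for `webdriver.Chrome(DRIVER_PATH)`.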

In [2]:
DATA_PATH = "../data/"
DRIVER_PATH = "/home/mahmoud/Downloads/chromedriver"
BASE_URL = "https://wuzzuf.net/search/jobs/?q=data+science&a=hpb"

SAVE = False
SEPARATOR = f"\n{colored(70*'#', 'red')}\n"


# First scraping script:
driver = webdriver.Chrome(DRIVER_PATH)
driver.get("https://www.google.com/")

print("Title: ", colored(driver.title, 'red'))

time.sleep(3)


driver.quit()  # end the session and stop the chromedriver process



Title:  Google


# Locating web elements:

# HTML main elements: 

### 1. DIV: 

The `<div>` tag is used as a container for other HTML elements.
![image.png](attachment:image.png)

---------------------------

### 2. Headings (h1 -> h6):
The `<h1>` to `<h6>` HTML elements represent six levels of section headings, from most to least important.
# H1
## H2
### H3
#### H4
##### H5
###### H6

---------------------------

### 3. P: 
The `<p>` HTML element represents a paragraph.

<p> this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.this is paragraph.</p>

---------------------------

### 4. a: 
The `<a>` (anchor) HTML element creates a hyperlink to a web page, a file, or another URL.

<a href='https://www.google.com/'>anchor to Google</a>

---------------------------

### 5. input: 
The `<input>` tag specifies an input field where the user can enter data.

### Input types: <br>
button: <input type="button"><br>
checkbox: <input type="checkbox"><br>
color: <input type="color"><br>
date: <input type="date"><br>
datetime-local: <input type="datetime-local"><br>
email: <input type="email"><br>
file: <input type="file"><br>
image: <input type="image"><br>
month: <input type="month"><br>
number: <input type="number"><br>
password: <input type="password"><br>
radio: <input type="radio"><br>
range: <input type="range"><br>
reset: <input type="reset"><br>
search: <input type="search"><br>
submit: <input type="submit"><br>
tel: <input type="tel"><br>
text: <input type="text"><br>
time: <input type="time"><br>
url: <input type="url"><br>
week: <input type="week"><br>

#### We can access each element using one of the following locators:
1. ID: should be <font color='red'>unique</font> within the page.
2. Name.
3. Class name.

![image-2.png](attachment:image-2.png)

#### Note:

If multiple elements share the same class, `find_element` returns only the first match; `find_elements` returns all of them as a list.

For more details, see [Additional resources](#add) below.

![image.png](attachment:image.png)

![image.png](attachment:image.png)
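The difference between "first match" and "all matches" can be sketched with two small helpers. This is only an illustration: `first_text` and `all_texts` are hypothetical names, and the string constants simply mirror the values of `By.ID`, `By.NAME` and `By.CLASS_NAME`. The helpers rely on `find_elements`, which returns an empty list when nothing matches, whereas `find_element` raises `NoSuchElementException`.

```python
# Locator strategy strings; these equal By.ID, By.NAME and By.CLASS_NAME.
BY_ID = "id"
BY_NAME = "name"
BY_CLASS_NAME = "class name"


def first_text(parent, by, value, default=None):
    """Return the text of the first matching element, or `default` if none match."""
    # find_elements returns a (possibly empty) list instead of raising.
    matches = parent.find_elements(by, value)
    return matches[0].text if matches else default


def all_texts(parent, by, value):
    """Return the text of every matching element, in document order."""
    return [element.text for element in parent.find_elements(by, value)]
```

Here `parent` can be the driver itself or any element already located, for example `first_text(driver, BY_CLASS_NAME, 'css-m604qf')`.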

In [3]:
# Create webdriver instance 
driver = webdriver.Chrome(DRIVER_PATH)

driver.get(BASE_URL)
print(colored("Title of the web page: ", "green"))
print(driver.title, end=SEPARATOR)

Title of the web page: 
Job Search | WUZZUF
######################################################################


In [4]:
out = driver.find_element(By.CLASS_NAME, 'css-9i2afk') # main div that contains all the job sub-divs
print(colored("Contents of main div: ", "green"))
print(out.text, end=SEPARATOR)

Contents of main div: 
Filters
Back to all jobs
448 Jobs found
Data Science Software Engineer Intern
Seuqel Solutions - Cairo, Egypt
2 days ago
Internship
Student · 0 - 1 Yrs of Exp · IT/Software Development · Engineering - Telecom/Technology ·
Computer Science
· Algorithms
· Information Technology (IT)
· Python
· Software
· Software Development
· Software Engineering
· Programming
Data Science/Machine learning/AI Instructor
EpsilonAI - Nasr City, Cairo, Egypt
20 days ago
Full Time
Part Time
Entry Level · 1+ Yrs of Exp · IT/Software Development · Engineering - Telecom/Technology · Training/Instructor ·
Data Science
·
Computer Science
· Computer Engineering
· Python
· Machine Learning
· Deep Learning
· Artificial Intelligence (AI)
Data Management Engineer
Ejada - Cairo, Egypt
11 days ago
Full Time
Entry Level · 3 - 5 Yrs of Exp · IT/Software Development · Engineering - Telecom/Technology ·
Computer Science
·
Data Engineering
Data Analyst
Al Ahly capital holding - Al Ahly Tamkee

In [5]:
sub_div = out.find_element(By.CLASS_NAME, 'css-pkv5jc')   # first sub-div (one job posting)
print(colored("Contents of first sub div: ", "green"))
print(sub_div.text, end=SEPARATOR)

Contents of first sub div: 
Data Science Software Engineer Intern
Seuqel Solutions - Cairo, Egypt
2 days ago
Internship
Student · 0 - 1 Yrs of Exp · IT/Software Development · Engineering - Telecom/Technology ·
Computer Science
· Algorithms
· Information Technology (IT)
· Python
· Software
· Software Development
· Software Engineering
· Programming
######################################################################


In [6]:
job_element = sub_div.find_element(By.CLASS_NAME, "css-m604qf")
job = job_element.text
print(f"Job of first div: {colored(job, 'green')}")

Job of first div: Data Science Software Engineer Intern


In [7]:
company_element = sub_div.find_element(By.CLASS_NAME, "css-17s97q8")
company = company_element.text
print(f"Company of first div: {colored(company, 'green')}")

Company of first div: Seuqel Solutions -


In [8]:
location_element = sub_div.find_element(By.CLASS_NAME, "css-5wys0k")
location = location_element.text
print(f"Location of first div: {colored(location, 'green')}")

Location of first div: Cairo, Egypt


In [9]:
publishment_time_div = sub_div.find_element(By.CLASS_NAME, "css-4c4ojb")
publishment_time = publishment_time_div.text
print(f"Publication time of first div: {colored(publishment_time, 'green')}")

Publication time of first div: 2 days ago


In [10]:
j_type_element = sub_div.find_element(By.CLASS_NAME, "css-1lh32fc")
j_type = j_type_element.text
print(f"Job type of first div: {colored(j_type, 'green')}")

Job type of first div: Internship


In [11]:
driver.quit()

In [12]:
job, company, location, publishment_time, j_type

('Data Science Software Engineer Intern',
 'Seuqel Solutions -',
 'Cairo, Egypt',
 '2 days ago',
 'Internship')

In [None]:
# Scrape job postings from the search results, one page at a time
from selenium.common.exceptions import NoSuchElementException

URL = "https://wuzzuf.net/search/jobs/?a=hpb&q=data%20science&start="

'''
i=0  -> https://wuzzuf.net/search/jobs/?a=hpb&q=data%20science&start=0
i=1  -> https://wuzzuf.net/search/jobs/?a=hpb&q=data%20science&start=1
'''

LAST_PAGE = 2  # stop after pages 0, 1 and 2 to keep the demo short

j_titles = []
companies = []
locations = []
time_of_publishments = []
job_type = []
required_exp = []
years_of_exp = []
keys = []
i = -1

# Open one browser window and reuse it for every page
driver = webdriver.Chrome(DRIVER_PATH)
driver.get(URL)

n_jobs_div = driver.find_element(By.CLASS_NAME, 'css-tbpo9i')
n_jobs = n_jobs_div.find_element(By.CLASS_NAME, 'css-12razwi').text
n_jobs = int(n_jobs.split()[0])  # e.g. "448 Jobs found" -> 448

print(f"Number of jobs found: {colored(n_jobs, 'red')}")

while True:
    i = i + 1
    try:
        driver.get(URL + str(i))
    except Exception:
        break  # stop if the page cannot be loaded

    main = driver.find_element(By.CLASS_NAME, 'css-9i2afk') # main div
    divs = main.find_elements(By.CLASS_NAME, 'css-pkv5jc')  # one div per job

    for div in divs:
        j_title = div.find_element(By.CLASS_NAME, 'css-m604qf').text
        company = div.find_element(By.CLASS_NAME, 'css-17s97q8').text
        location = div.find_element(By.CLASS_NAME, 'css-5wys0k').text

        try:
            time_of_publishment = div.find_element(By.CLASS_NAME, 'css-4c4ojb').text
        except NoSuchElementException:
            # fall back to the second class used for the posting time
            time_of_publishment = div.find_element(By.CLASS_NAME, 'css-do6t5g').text

        j_type = div.find_element(By.CLASS_NAME, 'css-1lh32fc').text

        j_titles.append(j_title)
        locations.append(location)
        companies.append(company)
        time_of_publishments.append(time_of_publishment)
        job_type.append(j_type)

    sub_divs = main.find_elements(By.CLASS_NAME, 'css-y4udm8')
    for sub_div in sub_divs:
        texts = [e.text for e in sub_div.find_elements(By.CLASS_NAME, 'css-o171kl')]
        required_exp.append(texts[0])  # first entry: required experience
        keys.append(texts[1:])         # remaining entries: skills / keywords

    if len(j_titles) >= n_jobs or i == LAST_PAGE:
        break

driver.quit()
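The job count above is read from text such as "448 Jobs found" with `int(n_jobs.split()[0])`. A slightly more tolerant sketch is shown below; `parse_job_count` is a hypothetical helper, and the handling of thousands separators is an assumption about how larger counts might be displayed, not something observed on the site.

```python
import re


def parse_job_count(text):
    """Extract the first integer from text such as '448 Jobs found'."""
    # Match a digit followed by any run of digits or commas.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))


print(parse_job_count("448 Jobs found"))  # prints 448
```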



In [None]:
j_titles

In [None]:
df = pd.DataFrame({'Job Title': j_titles, 
                   'Location' : locations, 
                   'Company' : companies, 
                   "Time of Publishment" : time_of_publishments, 
                   "Job Type" : job_type, 
                   'Years of Experience' : required_exp, 
                   'Skills' : keys})

WUZZUF_PATH = DATA_PATH + 'wuzzuf_DS.csv'  # DATA_PATH already ends with '/'

if SAVE: 
    df.to_csv(WUZZUF_PATH, index=False)
display(df.head())
print(f"Number of rows: {len(df)}")
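Building a `DataFrame` from a dict of lists requires every list to have the same length, and the lists above are filled in two separate loops, so their lengths may drift apart. One alternative, sketched here under assumptions, is to build one dict per job card and pass the list of dicts to `pd.DataFrame`. The helper names `clean_company` and `card_to_record` are hypothetical; the class names are the ones used earlier in this notebook.

```python
def clean_company(text):
    """Drop the trailing ' -' seen in values such as 'Seuqel Solutions -'."""
    return text.rstrip(" -")


def card_to_record(card):
    """Build one row (a dict) from a job-card element."""

    def text_of(*class_names):
        # Try each class in turn; return None when none of them match.
        for name in class_names:
            found = card.find_elements("class name", name)
            if found:
                return found[0].text
        return None

    company = text_of("css-17s97q8")
    return {
        "Job Title": text_of("css-m604qf"),
        "Company": clean_company(company) if company else None,
        "Location": text_of("css-5wys0k"),
        # The posting time appears under one of two classes.
        "Time of Publishment": text_of("css-4c4ojb", "css-do6t5g"),
        "Job Type": text_of("css-1lh32fc"),
    }
```

With this approach, `records = [card_to_record(div) for div in divs]` followed by `pd.DataFrame(records)` would give one row per card, with missing fields left empty instead of shifting the columns.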

### Insert input into forms:

In [None]:
URL = "https://wuzzuf.net/jobs/egypt" # Main wuzzuf page

# Create the driver before `try` so that `driver` always exists in `finally`
driver = webdriver.Chrome(DRIVER_PATH)
try:
    driver.get(URL)

    search = driver.find_element(By.CLASS_NAME, 'form')
    search_bar = search.find_element(By.NAME, 'q')

    search_bar.send_keys('Data Science')  # type the query
    search_bar.send_keys(Keys.RETURN)     # press Enter to submit the form

    time.sleep(3)        # give the results page time to load
    print(driver.title)

finally:
    driver.quit()

## Navigate web pages

In [None]:
URL = "https://wuzzuf.net/search/jobs/?a=hpb&q=data%20science&start="

driver = webdriver.Chrome(DRIVER_PATH)
try:
    driver.get(URL + '0')

    # Follow a link located by its visible text
    city_link = driver.find_element(By.LINK_TEXT, 'Jobs in Cairo')
    city_link.click()

    # Explicit wait: block for up to 30 s until the search form is in the DOM.
    # (No implicit wait here; Selenium advises against mixing the two kinds.)
    search = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "css-z60wl1"))
    )
    search_bar = search.find_element(By.NAME, "q")
    search_bar.send_keys('Data Science')

    time.sleep(5)  # pause so the typed query is visible before submitting
    search_bar.send_keys(Keys.RETURN)

    driver.back()     # return to the previous page
    driver.forward()  # move forward again
finally:
    driver.quit()
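The `WebDriverWait(driver, 30).until(...)` call above is an explicit wait: it re-checks a condition at a fixed interval until the condition holds or the timeout expires. The pure-Python sketch below only illustrates that polling idea and is not Selenium's implementation; `wait_until` is a hypothetical name. Unlike this sketch, `WebDriverWait` also ignores `NoSuchElementException` while polling and raises Selenium's own `TimeoutException`.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # wait a little before checking again
```

With a live driver, a call such as `wait_until(lambda: driver.find_elements(By.NAME, "q"))` would return the non-empty list of matches as soon as the search box appears.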

## ActionChains

In [None]:
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome(DRIVER_PATH)
driver.implicitly_wait(2)  # set before the first lookup so that it applies to it
driver.get('https://orteil.dashnet.org/cookieclicker/')

# Choose English in the language prompt
lang_box = driver.find_element(By.ID, 'promptAnchor')
lang = lang_box.find_element(By.ID, 'langSelect-EN')
lang.click()

# Wait until the big cookie is present in the page
cookie = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, "bigCookie")))
cookie.click()  # direct click, executed immediately

# Create an action chain: actions are only queued here, not executed
actions = ActionChains(driver)
actions.click(cookie) # Action 1: click on the cookie

for i in range(10):
    actions.click(cookie)  # Actions 2-11: ten more clicks

actions.perform()  # run the 11 queued clicks in order
driver.quit()

In [None]:
# Pop cat

driver = webdriver.Chrome(DRIVER_PATH)
driver.get("https://popcat.click/")

app = driver.find_element(By.ID, "app")

actions = ActionChains(driver)

actions.click(app)  # queue the first click

for i in range(10):
    actions.click(app)  # queue ten more clicks

actions.perform()  # execute all queued clicks in order
driver.quit()

<a id="add"></a>
## Additional Resources:
[Selenium Documentation](https://selenium-python.readthedocs.io/)