# Working with **Selenium webdriver** and **undetected-chromedriver** in Kaggle

<div class = 'alert alert-info' role = 'alert'>
  We can't use <strong>Selenium</strong> in Kaggle out of the box. Importing and using it directly, the way we would with <em>pandas</em>, throws an error like the one below.
</div>
<br>
<img src = 'https://user-images.githubusercontent.com/83589431/236695253-01af47e7-fb58-435c-bfbd-fb8f9473b992.png' width = 80%>

<strong>But</strong> there are ways to make it work. We'll see one version here. 🪄

# Installing libraries

- [Selenium](https://selenium-python.readthedocs.io/installation.html)
- [undetected-chromedriver](https://github.com/ultrafunkamsterdam/undetected-chromedriver)

In [1]:
!pip install selenium -q
!pip install undetected-chromedriver -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.11.0 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
tensorflow-serving-api 2.11.0 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
pymc3 3.11.5 requires numpy<1.22.2,>=1.15.0, but you have numpy 1.23.5 which is incompatible.
pymc3 3.11.5 requires scipy<1.8.0,>=1.7.3, but you have scipy 1.9.3 which is incompatible.
librosa 0.10.0.post2 requires soundfile>=0.12.1, but you have soundfile 0.11.0 which is incompatible.
kfp 1.8.20 requires google-api-python-client<2,>=1.7.8, but you have google-api-python-client 2.86.0 which is incompatible.
kfp 1.8.20 requires PyYAML<6,>=5.3, but you have pyyaml 6.0 which is incompatible.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.6 which is incompatible.

# Preparing Chrome to work in Kaggle

- We need to download ChromeDriver to Kaggle.
- ChromeDriver can be downloaded from [chromedriver.chromium.org](https://chromedriver.chromium.org/).


<img src = 'https://user-images.githubusercontent.com/83589431/236694660-16ba1546-8f2d-4282-8593-68dac1cdd478.gif' width = 70%>

How do we download *ChromeDriver* to **Kaggle**?
- Visit [chromedriver.chromium.org](https://chromedriver.chromium.org/).
- Check for the **Latest stable release** of ChromeDriver (at present v113.0.5672.63 is the latest)
- Copy the link of **`chromedriver_linux64.zip`** from **Index** page by right-clicking on it.

**Still in doubt?** Please check the **GIF** above for visual download instructions.

- Once we have the driver link, download it with **`!wget <paste-the-copied-link-here>`**

In [2]:
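# Downloading the chromedriver to Kaggle
!wget https://chromedriver.storage.googleapis.com/113.0.5672.63/chromedriver_linux64.zip

# Unpacking the archive into /kaggle/working/chromedriver (the path used by Service below)
!unzip -o -q /kaggle/working/chromedriver_linux64.zip -d /kaggle/working/chromedriver

# Making the extracted driver binary executable
!chmod 755 /kaggle/working/chromedriver/chromedriver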

--2024-03-13 13:20:26--  https://chromedriver.storage.googleapis.com/113.0.5672.63/chromedriver_linux64.zip
Resolving chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)... 142.250.148.207, 142.251.172.207, 209.85.200.207, ...
Connecting to chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)|142.250.148.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7315061 (7.0M) [application/zip]
Saving to: ‘chromedriver_linux64.zip’


2024-03-13 13:20:27 (91.4 MB/s) - ‘chromedriver_linux64.zip’ saved [7315061/7315061]
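
The `Service` path used later in this notebook points at an extracted binary, so the archive has to be unpacked and the binary marked executable at some point. Below is a minimal Python sketch of that step. The helper name `extract_driver` is hypothetical, and the default member name `chromedriver` is an assumption about what the zip contains.

```python
import os
import stat
import zipfile


def extract_driver(zip_path, dest_dir, member="chromedriver"):
    """Extract one file from the archive and mark it executable.

    `member` is assumed to be the name of the driver binary inside the zip.
    """
    with zipfile.ZipFile(zip_path) as archive:
        extracted = archive.extract(member, path=dest_dir)
    # zipfile does not restore Unix permission bits, so add execute
    # permission for user, group and others on top of the current mode.
    mode = os.stat(extracted).st_mode
    os.chmod(extracted, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return extracted
```

Under those assumptions, `extract_driver("/kaggle/working/chromedriver_linux64.zip", "/kaggle/working/chromedriver")` should return the path of the extracted binary inside `/kaggle/working/chromedriver`.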



In [3]:
# Downloading Google Chrome's latest stable version to Kaggle
!wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    
# Installing Google Chrome
!sudo apt-get update
!sudo apt install ./google-chrome-stable_current_amd64.deb -y 

--2024-03-13 13:20:29--  https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
Resolving dl.google.com (dl.google.com)... 172.253.119.190, 172.253.119.136, 172.253.119.93, ...
Connecting to dl.google.com (dl.google.com)|172.253.119.190|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106053828 (101M) [application/x-debian-package]
Saving to: ‘google-chrome-stable_current_amd64.deb’


2024-03-13 13:20:29 (226 MB/s) - ‘google-chrome-stable_current_amd64.deb’ saved [106053828/106053828]

Get:1 http://packages.cloud.google.com/apt gcsfuse-focal InRelease [1225 B]
Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease
Get:3 https://packages.cloud.google.com/apt cloud-sdk InRelease [6361 B]
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Err:1 http://packages.clo

**Let's verify the *Chrome* installation**

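When this notebook was first written, the output was ***Google Chrome 113.0.5672.63***. The download link always fetches the current stable release, so a later run reports a newer version, as in the output below.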

In [4]:
!google-chrome --version

Google Chrome 122.0.6261.128 
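
ChromeDriver generally expects a Chrome build with the same major version, so it can be worth comparing the two before creating a driver. The sketch below parses the version strings that appear in this notebook; `major_version` is a hypothetical helper, not part of Selenium.

```python
import re


def major_version(version_output):
    """Return the major version number found in a `--version` style string."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


# Strings taken from this notebook: the Chrome output above and the
# ChromeDriver release downloaded earlier.
chrome_major = major_version("Google Chrome 122.0.6261.128")
driver_major = major_version("113.0.5672.63")

if chrome_major != driver_major:
    print(f"Mismatch: Chrome {chrome_major} vs ChromeDriver {driver_major}")
```

If the two numbers differ, picking a ChromeDriver release that matches the installed Chrome is the usual fix.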


<img src = 'https://user-images.githubusercontent.com/83589431/237309858-e521dfca-667c-44d9-9914-7a5674c552d3.gif' width = 20%><br>

<div class = 'alert alert-success' role = 'alert'>
  Now it's time to test the Chrome drivers.
</div>

# Using Selenium-chromedriver

In [5]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

from selenium.webdriver.common.by import By

In [6]:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-setuid-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--remote-debugging-port=9222')
service = Service('/kaggle/working/chromedriver/chromedriver')

**Let's find some text on a page to confirm *Selenium* is working**

<img src = 'https://user-images.githubusercontent.com/83589431/237303838-89137892-0833-4370-900c-7fb65576fe2c.jpg' width =70%>

<img src = 'https://user-images.githubusercontent.com/83589431/237305592-05857628-624f-419c-883b-21ba4e681d98.png' width = 70%>


- To find the text marked above, we can use an **XPath** expression.
- The text is present inside **`h1`** tag under **`class`** name **`badge-link__title`**.
- So the XPATH will be **`//h1[@class='badge-link__title']`**
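
To see what that expression selects, here is a small sketch that runs it against a hand-written snippet using Python's built-in `xml.etree.ElementTree`. The markup and heading text are hypothetical stand-ins for the real page, and ElementTree only supports a subset of XPath, so this illustrates the selector rather than replacing Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is much larger.
snippet = """
<html>
  <body>
    <h1 class="badge-link__title">Example heading</h1>
    <h1 class="other">Ignored</h1>
  </body>
</html>
"""

root = ET.fromstring(snippet)

# ElementTree requires a relative path on an element, hence the leading dot.
heading = root.find(".//h1[@class='badge-link__title']")
print(heading.text)
```

With a live Selenium driver, the equivalent lookup would be `driver.find_element(By.XPATH, "//h1[@class='badge-link__title']").text`, assuming `driver` is a running WebDriver instance that has already loaded the page.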

<img src = 'https://user-images.githubusercontent.com/83589431/237309844-ffd07c38-ff26-4dcf-af95-3beda956ee84.gif' width = 20%><br>

<div class = 'alert alert-success' role = 'alert'>
  <p>We have successfully run <strong>Selenium</strong> inside Kaggle.</p>
</div>

# Using undetected-chromedriver

In [7]:
import undetected_chromedriver as uc

In [8]:
driver = uc.Chrome(service = service, options = options)

#driver.get('https://duckduckgo.com/')

In [9]:
!pip install gspread 

Collecting gspread
  Downloading gspread-6.0.2-py3-none-any.whl (53 kB)
Collecting StrEnum==0.4.15
  Downloading StrEnum-0.4.15-py3-none-any.whl (8.9 kB)
Installing collected packages: StrEnum, gspread
Successfully installed StrEnum-0.4.15 gspread-6.0.2

<img src = 'https://user-images.githubusercontent.com/83589431/237309844-ffd07c38-ff26-4dcf-af95-3beda956ee84.gif' width = 20%><br>

<div class = 'alert alert-success' role = 'alert'>
  <p>We have successfully run <strong>undetected-chromedriver</strong> inside Kaggle.</p>
</div>

In [10]:
import pandas as pd
import gspread


In [11]:
gc = gspread.service_account(filename = "/kaggle/input/key-json/integral-hold-416907-82b173ff1425.json")


In [12]:
sheet = gc.open_by_url("https://docs.google.com/spreadsheets/d/108dnSSwlUtCGHmoAOyTI4FsuSnWzOZFF0dYuf8vodq4/edit?usp=sharing")
#sheet2 = gc.open_by_url("https://docs.google.com/spreadsheets/d/12M6NtFxw0-MtlwuHPMUckFR7DCdxubaNdoaGKj4Vld4/edit?usp=sharing")

In [13]:
sheet
worksheet = sheet.add_worksheet(title="West Virginia3", rows=1, cols=1)
#worksheet2 = sheet.add_worksheet(title="West Virginia2",rows=1,cols=1)

In [14]:
!pip install gspread pandas gspread-dataframe


Collecting gspread-dataframe
  Downloading gspread_dataframe-3.3.1-py2.py3-none-any.whl (8.0 kB)
Installing collected packages: gspread-dataframe
Successfully installed gspread-dataframe-3.3.1

In [15]:
from gspread_dataframe import set_with_dataframe

In [16]:
import sys
import time
import json
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
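
The scraping loop in the next cell reads the anchors of each result box and decides which one is the phone number and which one is the website. A minimal sketch of that rule as a standalone function follows. The name `split_links` and the `(text, href)` pair representation are hypothetical; the loop itself works on BeautifulSoup tags.

```python
def split_links(anchors):
    """Pick (phone, website) from a result box's anchors.

    `anchors` is a list of (text, href) pairs in page order.
    """
    if not anchors:
        # No anchors at all: both fields are missing
        return "NIL", "NIL"
    # With exactly two anchors the second holds the phone number,
    # otherwise the first one is used.
    phone = anchors[1][0] if len(anchors) == 2 else anchors[0][0]
    # A single anchor is the phone number, so there is no website.
    website = "NIL" if len(anchors) == 1 else anchors[0][1]
    return phone.strip(), website


# Example pairs; the anchor text "Website" and the tel: href are made up.
both = [("Website", "http://www.wccoa.com"), (" (304)294-8800 ", "tel:3042948800")]
phone_only = [("(304)294-8800", "tel:3042948800")]
print(split_links(both))
print(split_links(phone_only))
```

Keeping the rule in one place makes it easier to check the edge cases (no anchors, one anchor) without loading a page.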

In [17]:
name = []
address = []
contact = []
link = []
zipcode = []


def save_progress():
    # Write everything scraped so far to the worksheet
    df = pd.DataFrame({"name": name, "address": address, "contact": contact, "link": link, "zipcode": zipcode})
    set_with_dataframe(worksheet, df)


counter = 0
for i in range(24701, 26886):
    driver.get(f"https://www.mealsonwheelsamerica.org/signup/aboutmealsonwheels/find-programs?filter={i}")
    pagesource = driver.page_source
    soup = BeautifulSoup(pagesource, 'html.parser')
    boxes = soup.find_all('div', {'class': 'findmeal-result'})
    counter += 1
    # Checkpoint every 100 ZIP codes so an interrupted run keeps its results
    if counter % 100 == 0:
        save_progress()
    for box in boxes:
        zipcode.append(i)
        try:
            text = box.find('h2').text.strip()
            print("Name: ", text)
            name.append(text)
        except Exception:
            name.append("NIL")
        try:
            text = box.find('p').text.strip()
            print("Address: ", text)
            address.append(text)
        except Exception:
            address.append("NIL")
        links = box.find_all('a')
        try:
            # With two anchors the second one holds the phone number
            phone = links[1] if len(links) == 2 else links[0]
            print("Phone: ", phone.text.strip())
            contact.append(phone.text.strip())
        except Exception:
            contact.append("NIL")
        try:
            if len(links) == 1:
                # A single anchor is the phone number, so there is no website
                link.append("NIL")
                continue
            print("Link: ", links[0]['href'])
            link.append(links[0]['href'])
        except Exception:
            print("LINK NIL")
            link.append("NIL")

# Save the rows collected after the last checkpoint
save_progress()
    

Name:  Mercer Community Action of South Eastern WV
Address:  307 Federal St Ste 323, Bluefield, WV  24701
Phone:  (304)324-8397
Link:  https://www.casewv.org/casewvcommissiononaging
Name:  Appalachian Area Agency on Aging
Address:  1460 E Main St, Box 2, Princeton, WV  24740
Phone:  (304)425-1147
Link:  http://www.aaaa@citlink.net
Name:  Council on Aging, Inc.
Address:  695 Mountaineer Highway, Mullens, WV  25882
Phone:  (304)294-8800
Link:  http://www.wccoa.com
Name:  McDowell County Commission on Aging
Address:  725 Stewart St, Welch, WV  24801
Phone:  (304)436-6588
Link:  http://mcdowellcoa.org/
Name:  West Virginia Bureau of Senior Services
Address:  1900 Kanawha Blvd E, Charleston, WV  25305
Phone:  (304)558-3317
Link:  http://www.wvseniorservices.gov
Name:  Appalachian Area Agency on Aging
Address:  1460 E Main St, Box 2, Princeton, WV  24740
Phone:  (304)425-1147
Link:  http://www.aaaa@citlink.net
Name:  Mercer Community Action of South Eastern WV
Address:  307 Federal St Ste 32