#### Let’s now look at how we can deal with this JavaScript use case using requests and Beautiful Soup:

In [4]:
import requests
from bs4 import BeautifulSoup
url = 'http://www.webscrapingfordatascience.com/simplejavascript/'
r = requests.get(url)
html_soup = BeautifulSoup(r.text, 'html.parser')
# No tag will be found here
ul_tag = html_soup.find('ul')
print(ul_tag)
# Show the JavaScript code
script_tag = html_soup.find('script', attrs={'src': None})
print(script_tag)

None
<script>
	$(function() {
	document.cookie = "jsenabled=1";
	$.getJSON("quotes.php", function(data) {
		var items = [];
		$.each(data, function(key, val) {
			items.push("<li id='" + key + "'>" + val + "</li>");
		});
		$("<ul/>", {
			html: items.join("")
			}).appendTo("body");
		});
	});
	</script>
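Beautiful Soup will not run this script. Its source is still an ordinary string, though, so simple text matching can at least recover what it does. The sketch below is a minimal example and assumes the script keeps exactly the shape printed above. It uses regular expressions, which is plain text matching rather than real JavaScript parsing, together with the standard library's `urljoin`.

```python
import re
from urllib.parse import urljoin

page_url = 'http://www.webscrapingfordatascience.com/simplejavascript/'

# Excerpt of the inline script printed above; in the notebook the full
# text is available as script_tag.string
script_text = '''
$(function() {
document.cookie = "jsenabled=1";
$.getJSON("quotes.php", function(data) {
'''

# Text matching, not JavaScript parsing: grab the URL passed to $.getJSON(...)
endpoint = re.search(r'\$\.getJSON\(\s*"([^"]+)"', script_text).group(1)
# Resolve the relative URL against the page the script came from
print(urljoin(page_url, endpoint))

# Grab the name and value of the cookie the script sets
name, value = re.search(r'document\.cookie\s*=\s*"([^=]+)=([^";]*)"', script_text).groups()
print(name, value)
```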


We have no way to parse and query the actual JavaScript code with Beautiful Soup.
In simple situations such as this one, this is not necessarily a problem. We know
that the browser is making requests to a page at “quotes.php”, and that we need to set a
cookie. We can still scrape the data directly:



In [2]:
# First attempt: request quotes.php directly, without the "jsenabled" cookie
import requests
url = 'http://www.webscrapingfordatascience.com/simplejavascript/quotes.php'
r = requests.get(url)
print(r.json())

['Are you using a web scraper?']
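Without the cookie, the server replies with “Are you using a web scraper?” instead of the quotes. The sketch below passes the `jsenabled` cookie that the page's script would otherwise set. It assumes, based on the script above, that this cookie is all the server checks. To keep the example runnable offline, the request is only prepared so its Cookie header can be inspected. The actual call is left in a comment.

```python
import requests

url = 'http://www.webscrapingfordatascience.com/simplejavascript/quotes.php'
# Mirror what the page's script does with document.cookie = "jsenabled=1";
# cookie values are given as strings
cookies = {'jsenabled': '1'}

# Prepare the request without sending it, to inspect the Cookie header
prepared = requests.Request('GET', url, cookies=cookies).prepare()
print(prepared.headers['Cookie'])  # jsenabled=1

# With network access, the real call would be:
# r = requests.get(url, cookies=cookies)
# print(r.json())
```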


In [3]:
import requests
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
my_session = requests.Session()
# Get the main page first to obtain the PHPSESSID cookie
r = my_session.get(url)
# Manually set the nonce cookie
my_session.cookies.update({
    'nonce': '2315'
    })
r = my_session.get(url + 'quotes.php', params={'p': '0'})
print(r.text)
# Shows: No quotes for you!

No quotes for you!


Sadly, this doesn’t work. Figuring out why requires some creative thinking, though
we can take a guess at what might be going wrong here. By visiting the main page as if
we were starting a new browsing session, we obtain a fresh session identifier in the
“PHPSESSID” cookie. However, we’re reusing the “nonce” cookie value that our browser
was using. The web page might see that this “nonce” value does not match the
“PHPSESSID” information. As such, we have no choice but to also reuse the
“PHPSESSID” value. Again, yours might be different (inspect your browser’s network
requests to see which values it is sending for your session):


nonce=1497; _ga=GA1.2.1481335662.1625916386; PHPSESSID=li86h0i1o5igp31sge3ej1338h
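You don't have to copy each value by hand. The cookie string shown above can be split into the dictionary that requests expects. The sketch below uses a hypothetical helper, `cookie_header_to_dict`. It assumes simple `name=value` pairs separated by semicolons, as the browser displays them, and it does not handle quoted values. Only `nonce` and `PHPSESSID` are kept, since those are the two cookies the reasoning above suggests the server checks. `_ga` is a Google Analytics cookie.

```python
# Cookie string copied from the browser's network inspector (values from above)
raw = 'nonce=1497; _ga=GA1.2.1481335662.1625916386; PHPSESSID=li86h0i1o5igp31sge3ej1338h'


def cookie_header_to_dict(header):
    """Hypothetical helper: turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in header.split(';'):
        name, sep, value = part.strip().partition('=')
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


all_cookies = cookie_header_to_dict(raw)
# Keep only the cookies the server appears to care about
my_cookies = {key: all_cookies[key] for key in ('nonce', 'PHPSESSID')}
print(my_cookies)
```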

In [5]:
import requests
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
my_cookies = {
    'nonce':'1497',
    'PHPSESSID': 'li86h0i1o5igp31sge3ej1338h'
    }
r = requests.get(url + 'quotes.php', params={'p': '0'}, cookies=my_cookies)
print(r.text)

<div class="quote decode">TGlmZSBpcyBhYm91dCBtYWtpbmcgYW4gaW1wYWN0LCBub3QgbWFraW5nIGFuIGluY29tZS4gLUtldmluIEtydXNlDQo=</div><div class="quote decode">CVdoYXRldmVyIHRoZSBtaW5kIG9mIG1hbiBjYW4gY29uY2VpdmUgYW5kIGJlbGlldmUsIGl0IGNhbiBhY2hpZXZlLiDigJNOYXBvbGVvbiBIaWxsDQo=</div><div class="quote decode">CVN0cml2ZSBub3QgdG8gYmUgYSBzdWNjZXNzLCBidXQgcmF0aGVyIHRvIGJlIG9mIHZhbHVlLiDigJNBbGJlcnQgRWluc3RlaW4NCg==</div><br><br><br><br><a class="jscroll-next" href="quotes.php?p=3">Load more quotes</a>
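The quotes come back Base64-encoded inside `<div class="quote decode">` elements, which the page's JavaScript presumably decodes before display. The sketch below does the same decoding with the standard library's `base64` module. It assumes the markup looks exactly like the response above. A regular expression keeps the example dependency-free, but Beautiful Soup's `find_all` is the sturdier choice on real pages.

```python
import base64
import re

# Copy of the quotes.php response shown above (the "Load more" link is left out)
html = (
    '<div class="quote decode">TGlmZSBpcyBhYm91dCBtYWtpbmcgYW4gaW1wYWN0LCBub3QgbWFraW5nIGFuIGluY29tZS4gLUtldmluIEtydXNlDQo=</div>'
    '<div class="quote decode">CVdoYXRldmVyIHRoZSBtaW5kIG9mIG1hbiBjYW4gY29uY2VpdmUgYW5kIGJlbGlldmUsIGl0IGNhbiBhY2hpZXZlLiDigJNOYXBvbGVvbiBIaWxsDQo=</div>'
    '<div class="quote decode">CVN0cml2ZSBub3QgdG8gYmUgYSBzdWNjZXNzLCBidXQgcmF0aGVyIHRvIGJlIG9mIHZhbHVlLiDigJNBbGJlcnQgRWluc3RlaW4NCg==</div>'
)

# Assumes the exact markup above; Beautiful Soup's
# find_all('div', class_='decode') is sturdier on real pages
encoded = re.findall(r'<div class="quote decode">(.*?)</div>', html)

# Each payload is UTF-8 text wrapped in Base64, with stray tabs and
# line breaks around it, hence the strip()
quotes = [base64.b64decode(text).decode('utf-8').strip() for text in encoded]
for quote in quotes:
    print(quote)
```

The decoded text matches what the automated browser displays for the same quotes later in this section.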


### Installing and using Selenium and WebDriver

Tip: put chromedriver.exe in the same directory as your working directory, or store the path to the file in a variable and pass it to Selenium when creating the driver.

Just as was the case with requests and Beautiful Soup, installing Selenium itself is
simple with pip (refer back to section 1.2.1 if you still need to set up Python 3 and pip):
pip install -U selenium
Next, we need to download a WebDriver. Head to
https://sites.google.com/a/chromium.org/chromedriver/downloads
and download the release file that matches both your installed Chrome version and
your platform (Windows, Mac, or Linux). The ZIP file you downloaded will contain an
executable called “chromedriver.exe” on Windows or just “chromedriver” otherwise.
The easiest way to make sure Selenium can see this executable is simply by making sure
it is located in the same directory as your Python scripts, in which case the following
small example should work right away:
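You can also script the lookup described above instead of hard-coding a path. The sketch below uses a hypothetical helper, `find_chromedriver`. It checks the system PATH first and then any directories you list, such as the folder that holds your scripts. Selenium 4.6 and later also ship Selenium Manager, which can locate or download a matching driver by itself, so on recent versions this step may be unnecessary.

```python
import os
import shutil
import sys

# The executable name differs per platform, as described above
DRIVER_NAME = 'chromedriver.exe' if sys.platform.startswith('win') else 'chromedriver'


def find_chromedriver(extra_dirs=()):
    """Hypothetical helper: return a path to the ChromeDriver executable, or None.

    Looks on the system PATH first, then in each directory of extra_dirs.
    """
    on_path = shutil.which(DRIVER_NAME)
    if on_path:
        return on_path
    for directory in extra_dirs:
        candidate = os.path.join(directory, DRIVER_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None


# Example: also check the current working directory
print(find_chromedriver([os.getcwd()]))
```

If the helper returns a path, you can pass it to Selenium when creating the driver. If it returns `None`, the executable still needs to be downloaded or moved into place.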

In [2]:
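from selenium import webdriver
from selenium.webdriver.chrome.service import Service
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
# Path to chromedriver.exe on this machine; adjust it to your own location
web_driv = "E:\\17- web scrapping\\Practical_Web_Scrapping_Python_data_science-\\5 - Working with JavaScript and Selenium\\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path=web_driv))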
driver.get(url)
input('Press ENTER to close the automated browser')
driver.quit()

Press ENTER to close the automated browser


Scraping the content using Selenium

In [1]:
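from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
# Set an implicit wait
driver.implicitly_wait(10)
driver.get(url)
# find_elements_by_class_name was removed in newer Selenium 4 releases;
# find_elements(By.CLASS_NAME, ...) is the current equivalent
for quote in driver.find_elements(By.CLASS_NAME, 'quote'):
    print(quote.text)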
input('Press ENTER to close the automated browser')
driver.quit()

Life is about making an impact, not making an income. -Kevin Kruse
Whatever the mind of man can conceive and believe, it can achieve. –Napoleon Hill
Strive not to be a success, but rather to be of value. –Albert Einstein
Press ENTER to close the automated browser


In [3]:
## This cell fails on purpose: find_element_* / find_elements_* is only shorthand for a family of methods, not a wildcard Python can call
from selenium import webdriver
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
# Set an implicit wait
driver.implicitly_wait(10)
driver.get(url)
for quote in driver.find_element_*:
    print(quote.text)
input('Press ENTER to close the automated browser')
driver.quit()

SyntaxError: invalid syntax (<ipython-input-3-65f6334f66a1>, line 7)

Implicit waiting in Selenium

In [5]:
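from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
# Set an implicit wait: every find call retries for up to 10 seconds
driver.implicitly_wait(10)
driver.get(url)
# find_elements_by_class_name was removed in newer Selenium 4 releases;
# find_elements(By.CLASS_NAME, ...) is the current equivalent
for quote in driver.find_elements(By.CLASS_NAME, 'quote'):
    print(quote.text)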
input('Press ENTER to close the automated browser')
driver.quit()

Life is about making an impact, not making an income. -Kevin Kruse
Whatever the mind of man can conceive and believe, it can achieve. –Napoleon Hill
Strive not to be a success, but rather to be of value. –Albert Einstein
Press ENTER to close the automated browser


Explicit waiting in Selenium

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
driver.get(url)
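# Wait up to 10 seconds until at least one element matches ".quote:not(.decode)",
# i.e. a quote element that no longer carries the "decode" class
quote_elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, ".quote:not(.decode)")
    )
)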
print(quote_elements)
for quote in quote_elements:
    print(quote.text)
input('Press ENTER to close the automated browser')
driver.quit()

[<selenium.webdriver.remote.webelement.WebElement (session="c466a3fe59fcfb3a9bcbd53a3645a68a", element="d9bf7563-d5cf-426f-b32f-a307b66e422e")>, <selenium.webdriver.remote.webelement.WebElement (session="c466a3fe59fcfb3a9bcbd53a3645a68a", element="e583a57c-d4b7-4025-a937-1428bc6ea257")>, <selenium.webdriver.remote.webelement.WebElement (session="c466a3fe59fcfb3a9bcbd53a3645a68a", element="4c5b8589-d688-4cbe-b280-200575f5228d")>]
Life is about making an impact, not making an income. -Kevin Kruse
Whatever the mind of man can conceive and believe, it can achieve. –Napoleon Hill
Strive not to be a success, but rather to be of value. –Albert Einstein
Press ENTER to close the automated browser


Defining a custom wait condition in Selenium

In [12]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
class at_least_n_elements_found(object):
    def __init__(self, locator, n):
        self.locator = locator
        self.n = n
    def __call__(self, driver):
        # Called repeatedly by WebDriverWait: return False while the condition
        # does not hold yet, or the matching elements once it does
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.n:
            return elements
        else:
            return False
url = 'http://www.webscrapingfordatascience.com/complexjavascript/'
driver = webdriver.Chrome()
driver.get(url)
# Wait up to 10 seconds for at least 3 elements with the class "quote"
element = WebDriverWait(driver, 10).until(
    at_least_n_elements_found((By.CLASS_NAME, 'quote'), 3)
)


# for quote in element:
#     print(quote.text)
# input('Press ENTER to close the automated browser')
# driver.quit()
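`WebDriverWait` only ever calls the condition object with the driver and checks whether the result is truthy. That means a custom condition like `at_least_n_elements_found` can be tested without opening a browser. The sketch below uses a hypothetical `StubDriver` that returns a fixed list for any locator. The locator uses the plain string `'class name'`, which is the value behind `By.CLASS_NAME`.

```python
class at_least_n_elements_found(object):
    """Wait condition: satisfied once at least n elements match the locator."""

    def __init__(self, locator, n):
        self.locator = locator
        self.n = n

    def __call__(self, driver):
        # Return the elements once there are enough, otherwise False
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.n:
            return elements
        return False


class StubDriver:
    """Hypothetical stand-in for a WebDriver: returns the same list for any locator."""

    def __init__(self, elements):
        self.elements = elements

    def find_elements(self, by, value):
        return list(self.elements)


# 'class name' is the string behind By.CLASS_NAME
condition = at_least_n_elements_found(('class name', 'quote'), 3)
print(condition(StubDriver(['q1', 'q2'])))        # False: only two elements
print(condition(StubDriver(['q1', 'q2', 'q3'])))  # ['q1', 'q2', 'q3']
```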


In [4]:
from selenium.webdriver.common.by import By
class at_least_n_elements_found(object):
    def __init__(self, locator, n):
        self.locator = locator
        self.n = n
    def __call__(self, driver):
        # Return False while the condition does not hold yet,
        # or the matching elements once it does
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.n:
            return elements
        else:
            return False
# This only creates the condition object; nothing is searched until it is called with a driver
condition = at_least_n_elements_found((By.CLASS_NAME, 'my_class'), 3)

In [8]:
print(condition.locator)

('class name', 'my_class')
