# Scraping JavaScript-Rendered Pages

A lot of credit is owed to <a href="https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ">sentdex</a> for his awesome tutorials.

In this tutorial we will learn how to scrape dynamically updated data from web pages.

In [None]:
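# Assumes `soup` is a BeautifulSoup object built earlier from the raw page
# source, i.e. HTML in which no JavaScript has been executed
js_test = soup.find('p', class_='jtest')
print(js_test)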

The problem is that we are not a client; we are not a browser. What we need to do is mimic the behavior of a browser and run the JavaScript embedded in the source code.
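
To make the problem concrete, here is a minimal sketch of what a plain HTML parser sees. The sample page, its placeholder text and the `ParagraphText` helper are all made up for illustration, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The point is only that the script in the page is never executed, so the paragraph keeps its original text.

```python
from html.parser import HTMLParser

# Hypothetical page: a real browser would run the script and replace the
# paragraph text. A plain parser never executes it.
SAMPLE_HTML = """
<html><body>
<p class="jtest" id="target">JavaScript has not run</p>
<script>
  document.getElementById('target').innerHTML = 'JavaScript did run';
</script>
</body></html>
"""


class ParagraphText(HTMLParser):
    """Collect the text inside <p class="jtest"> without running scripts."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "jtest":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Script content also arrives here, but only while _inside is False
        if self._inside:
            self.text += data


parser = ParagraphText()
parser.feed(SAMPLE_HTML)
static_text = parser.text
print(static_text)
```

The parser returns the placeholder text, because the assignment inside the `<script>` tag is just text to it and is never run.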

In [None]:
import sys

import bs4 as bs

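# In PyQt5, QApplication lives in QtWidgets, not QtGui
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
# Requires a PyQt5 build that still includes QtWebKit
from PyQt5.QtWebKitWidgets import QWebPage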

The <code>sys</code> module is required because <code>PyQt</code> wants access to system variables: <code>QApplication</code> is passed the command-line arguments in <code>sys.argv</code>.

What we will do now is create our own browser.

In [None]:
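class Client(QWebPage):

    def __init__(self, url):
        # A QApplication must exist before any other Qt object is created
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        # Call on_page_load once the page has finished loading
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        # Start the event loop; this blocks until quit() is called
        self.app.exec()

    def on_page_load(self):
        # Stop the event loop so that __init__ can return
        self.app.quit()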

<code>Client</code> <a href="http://www.python-course.eu/python3_inheritance.php">inherits</a> from <code>QWebPage</code>. In the first step we define our application <code>app</code> by creating an instance of the <code>QApplication</code> class. We then initialize the <code>QWebPage</code> base class. After that, we connect the <code>loadFinished</code> signal to the method <code>on_page_load</code>, which we write ourselves, so that it is executed right after the loading is finished.

Finally, we give the <code>url</code> to Qt, which loads the page, and start the event loop with <code>self.app.exec()</code>.
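
The control flow of `Client` can be hard to picture: the constructor blocks in the event loop and only returns once the load-finished callback quits that loop. The following is a rough pure-Python sketch of that pattern. `MiniSignal`, `MiniLoop` and `MiniClient` are hypothetical stand-ins written for this illustration. They are not part of Qt and do not load real pages.

```python
class MiniSignal:
    """Hypothetical stand-in for a Qt signal: stores callbacks and calls them."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self):
        for slot in self._slots:
            slot()


class MiniLoop:
    """Hypothetical stand-in for QApplication: runs queued tasks until quit()."""

    def __init__(self):
        self._tasks = []
        self._running = False

    def post(self, task):
        self._tasks.append(task)

    def exec(self):
        self._running = True
        while self._running and self._tasks:
            self._tasks.pop(0)()

    def quit(self):
        self._running = False


class MiniClient:
    """Mirrors the control flow of Client, without Qt or a network."""

    def __init__(self, url):
        self.app = MiniLoop()
        self.load_finished = MiniSignal()
        self.html = None
        self.load_finished.connect(self.on_page_load)
        # The "load" is only queued here; it happens once the loop runs
        self.app.post(lambda: self._load(url))
        # Blocks until on_page_load calls quit()
        self.app.exec()

    def _load(self, url):
        # Pretend the page was rendered, then announce that loading finished
        self.html = f"<html><!-- rendered {url} --></html>"
        self.load_finished.emit()

    def on_page_load(self):
        self.app.quit()


client = MiniClient("https://example.com")
print(client.html)
```

By the time `MiniClient(...)` returns, the page content is already available on the object. That is the same reason `toHtml()` can be called right after constructing the real `Client`.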

In [None]:
url = 'https://dacatay.com'
client_response = Client(url)
source = client_response.mainFrame().toHtml()

<code>client_response</code> is now a <code>QWebPage</code> object and we can use its methods. We will then create our <code>soup</code> from the <code>source</code> like so:

In [None]:
soup = bs.BeautifulSoup(source, 'lxml')


And again we run our test from the beginning:

In [None]:
js_test = soup.find('p', class_='jtest')
print(js_test)

## Some remarks on efficiency

The way we have implemented our <code>Client</code> above will ultimately result in two efficiency bottlenecks: server request and response latencies, and our CPU processing.

One problem with the technique above is that we are not able to utilize multi-threading. However, we will tackle the topic of multithreading in the tutorial on how to build an automated web crawler.

Another problem is that server request and response times may not be instant. For instance, if you are trying to crawl 100 URLs one after another at an average latency of 300 milliseconds, the crawling will take a whopping 30 seconds (100 × 0.3 s), not taking into account the time to process the code.
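
The latency arithmetic above can be checked with a small simulation. This is only a sketch under stated assumptions: `fake_fetch` is a hypothetical stand-in that sleeps instead of making a request, the URLs are made up, and the threaded variant uses plain Python threads, not the Qt-based `Client`. It illustrates why latency-bound work adds up when run sequentially, and why overlapping the waiting time helps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sequential_estimate_s(n_urls, latency_ms):
    """Total waiting time in seconds when requests run one after another."""
    return n_urls * latency_ms / 1000


def fake_fetch(url, latency_s=0.05):
    """Hypothetical stand-in for a request: it only sleeps for the latency."""
    time.sleep(latency_s)
    return url


# Made-up URLs; nothing is actually downloaded
urls = [f"https://example.com/page/{i}" for i in range(10)]

# One request after another: the latencies add up
start = time.perf_counter()
sequential_results = [fake_fetch(u) for u in urls]
sequential_s = time.perf_counter() - start

# All requests wait at the same time in separate threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    threaded_results = list(pool.map(fake_fetch, urls))
threaded_s = time.perf_counter() - start

print(sequential_estimate_s(100, 300))  # 100 URLs at 300 ms each
print(f"sequential: {sequential_s:.2f}s, threaded: {threaded_s:.2f}s")
```

The first `print` reproduces the 30-second figure from the text. The measured timings depend on the machine, but the threaded run should finish well before the sequential one, because the waits overlap.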