# Template

This is a template for a Jupyter notebook that helps with prototyping scrapers.

It contains the basic steps to set up a useful working notebook.

## Autoreload

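This extension automatically reloads imported Python modules before each cell is executed, so edits to the source files take effect without restarting the kernel.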

In [4]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Working directory

Let's make sure we are in the project root directory rather than the `notebooks` subdirectory.

In [10]:
%pwd

'C:\\Users\\ROSA_L\\PycharmProjects\\scraper\\notebooks'

In [11]:
%cd ..

C:\Users\ROSA_L\PycharmProjects\scraper
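
Re-running the `%cd ..` cell moves up one more level each time, which can silently leave the notebook outside the project. The following is a minimal sketch of an idempotent alternative. It assumes the notebooks live in a directory named `notebooks` directly under the project root, and `ensure_project_root` is a hypothetical helper name, not part of the project.

```python
import os
from pathlib import Path


def ensure_project_root(notebooks_dir_name: str = "notebooks") -> Path:
    """Move to the parent directory only if we are still inside the notebooks folder."""
    cwd = Path.cwd()
    if cwd.name == notebooks_dir_name:
        # Assumption: the project root is the direct parent of the notebooks folder.
        os.chdir(cwd.parent)
    return Path.cwd()


# Safe to run several times: only the first call changes the directory.
root_dir = ensure_project_root()
print(root_dir)
```

Because the check is based on the directory name, running the cell a second time leaves the working directory unchanged.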


## Logging

This sets up basic logging: the root logger sends all messages of level DEBUG and above to standard output.

In [6]:
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
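
Two things tend to get in the way with this setup in a notebook: re-running the cell adds another handler (so every message is printed twice), and third-party libraries such as `selenium` and `urllib3` are very verbose at DEBUG level. The sketch below is one possible variant that addresses both. The helper name `setup_notebook_logging`, the `quiet` parameter, and the `_notebook_handler` marker attribute are assumptions made for illustration.

```python
import logging
import sys

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def setup_notebook_logging(level=logging.DEBUG, quiet=("selenium", "urllib3")):
    """Configure the root logger once, even if the cell is executed repeatedly."""
    root = logging.getLogger()
    root.setLevel(level)

    # Drop handlers installed by a previous run of this helper to avoid duplicates.
    for old_handler in list(root.handlers):
        if getattr(old_handler, "_notebook_handler", False):
            root.removeHandler(old_handler)

    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler._notebook_handler = True  # hypothetical marker used only by this helper
    root.addHandler(handler)

    # Raise the threshold for noisy third-party loggers (and their children).
    for name in quiet:
        logging.getLogger(name).setLevel(logging.WARNING)
    return handler


setup_notebook_logging()
setup_notebook_logging()  # re-running does not stack a second handler
```

Pass `quiet=()` to keep the full DEBUG output of the libraries, for example when debugging the browser session itself.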

## Factory

The scraper factory provides a simple, programmatic way to load scrapers dynamically, without needing to declare an import for each one.

In [12]:
from iea_scraper.core import factory

### Loading a scraper

The command to load a scraper using the factory is:

```
    job = factory.get_scraper_job(<module name>, <scraper package name>, [<optional parameters>, ...])
```
For example:

In [13]:
job = factory.get_scraper_job('br_gov_anp', 'br_oil_prod', full_load=True)

2022-05-13 11:07:17,234 - iea_scraper.core.factory - DEBUG - Loading module iea_scraper.jobs.br_gov_anp.br_oil_prod
2022-05-13 11:07:17,422 - iea_scraper.core.factory - DEBUG - Getting class BrOilProdJob
2022-05-13 11:07:17,955 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:54311/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "pageLoadStrategy": "normal", "goog:chromeOptions": {"prefs": {"download.default_directory": "C:\\Users\\ROSA_L\\PycharmProjects\\scraper\\filestore"}, "extensions": [], "args": ["--headless", "--disable-dev-shm-usage", "window-size=1920x1480"]}}}, "desiredCapabilities": {"browserName": "chrome", "pageLoadStrategy": "normal", "goog:chromeOptions": {"prefs": {"download.default_directory": "C:\\Users\\ROSA_L\\PycharmProjects\\scraper\\filestore"}, "extensions": [], "args": ["--headless", "--disable-dev-shm-usage", "window-size=1920x1480"]}}}
2022-05-13 11:07:17,960 - urllib3.connectionpool - DEBU
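
The first two debug lines above suggest that the factory imports the module `iea_scraper.jobs.<module name>.<scraper package name>` and then looks up a class whose name is the package name in CamelCase with a `Job` suffix. The sketch below illustrates that general pattern with `importlib`. It is not the actual `iea_scraper` implementation: `class_name_for` and `load_class` are hypothetical helpers, and the demonstration loads a standard-library class so that it runs without the project installed.

```python
import importlib


def class_name_for(package_name: str, suffix: str = "Job") -> str:
    """Convert a snake_case package name into a CamelCase class name plus suffix."""
    return "".join(part.capitalize() for part in package_name.split("_")) + suffix


def load_class(module_path: str, class_name: str) -> type:
    """Import a module by its dotted path and return the named class from it."""
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(class_name_for("br_oil_prod"))  # BrOilProdJob

# Demonstration with a standard-library module instead of a scraper module.
decoder_cls = load_class("json.decoder", "JSONDecoder")
decoder = decoder_cls()
print(decoder.decode('{"full_load": true}'))  # {'full_load': True}
```

Resolving the class by naming convention is what lets a caller pass plain strings instead of writing one import statement per scraper.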

In [14]:
del job

2022-05-13 11:07:25,401 - selenium.webdriver.remote.remote_connection - DEBUG - DELETE http://localhost:54311/session/bb2eb75268bfa2cfef6341c1e6ed9701/window {}
2022-05-13 11:07:25,424 - urllib3.connectionpool - DEBUG - http://localhost:54311 "DELETE /session/bb2eb75268bfa2cfef6341c1e6ed9701/window HTTP/1.1" 200 12
2022-05-13 11:07:25,425 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
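
The log shows the browser session being closed right after `del job`, which suggests the job releases its resources when the object is finalized. Keep in mind that `del` only removes one name: in CPython the object is finalized once the last reference to it disappears, and a notebook can hold extra references (for example in its output cache). The sketch below illustrates this with a stand-in class; `FakeJob` is hypothetical and not part of `iea_scraper`.

```python
closed = []


class FakeJob:
    """Stand-in for a job that owns a resource such as a browser session."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs when the object is finalized, not necessarily when `del` is typed.
        closed.append(self.name)


job = FakeJob("demo")
alias = job  # a second reference, similar to what a notebook may keep around

del job
print(closed)  # [] in CPython: `alias` still keeps the object alive

del alias
print(closed)  # ['demo'] in CPython: the last reference is gone
```

If the session does not close after `del job`, check whether another variable still refers to the same object.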
