Documentation
Here is an example of a single-threaded automation. First, we import the pieces we need.

from time import sleep
# Database models used to interact with databases if needed.
from Project import Models
# The environment variable loader. These variables can be set in the .env file.
# This is important if we want to create configurable web automations / scrapers.
from Config.Environment import env, env_driver
# Controllers utilize a `WebDriver` instance in order to control the various web
# pages that need to be scraped.
from SiteAutomations.Examples import GoogleExample, BingExample

Controllers are kept in the SiteAutomations folder.

# The quitting context helps to `close()` and `quit()` the WebDriver instance if
# something goes wrong.
from Helpers.Contexts import quitting
from selenium.webdriver.support.wait import WebDriverWait

To get hold of our WebDriver instance, we use the env and env_driver functions. env('BROWSER') returns the name of the browser set in the .env file. env_driver takes that name and returns the matching WebDriver, which is then called to create the instance. The quitting function opens the WebDriver instance the same way you would open a file using with. Put together, you get:

# This could be written as:
#
#   browser = env("BROWSER")
#   web_driver = env_driver(browser)
#   with quitting(web_driver()) as driver:
#       pass
with quitting(env_driver(env("BROWSER"))()) as driver:
    # Do stuff.
    pass

Now that we have a valid WebDriver instance, we can instantiate our Controllers and do some work.

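As a rough mental model of the quitting helper, the sketch below shows how a context manager can guarantee that a driver is closed and quit even when the block raises. This is not the framework's implementation: `quitting_sketch` and `FakeDriver` are hypothetical names invented here so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting_sketch(driver):
    """Hypothetical stand-in for `quitting`: yield the driver, then clean up."""
    try:
        yield driver
    finally:
        # Runs on normal exit and when the block raises.
        driver.close()
        driver.quit()


class FakeDriver(object):
    """Stand-in for a WebDriver that only records the calls made on it."""

    def __init__(self):
        self.calls = []

    def close(self):
        self.calls.append('close')

    def quit(self):
        self.calls.append('quit')


fake = FakeDriver()
try:
    with quitting_sketch(fake) as driver:
        raise RuntimeError('page failed to load')
except RuntimeError:
    pass

# Both cleanup calls were recorded despite the exception.
print(fake.calls)
```

Back in the main script, the controller setup that follows presumably runs inside that `with` block, since it uses `driver`.
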
# Get an instance of `WebDriverWait`.
wait = WebDriverWait(driver, 30)
# Pass the web driver to the site automation along with anything
# else it might need to do its job. This could include an
# instance of `WebDriverWait`, and even the collection of
# Models.
google_search = GoogleExample.GoogleSearch(driver, wait, Models)
bing_search = BingExample.BingSearch(driver, wait, Models)
# Do stuff with your controllers.
google_search.do_search('google wiki')
sleep(5)
bing_search.do_search('bing wiki')
sleep(5)

There are two types of controllers. The first is simply a class that controls a WebDriver instance. The second is a class that inherits from IndependentController and also controls a WebDriver instance. The only difference is that instances of IndependentController attach their WebDriver after they are instantiated, using the attach_driver method. This facilitates the use of the ThreadedCommandFactory and CommandFactory objects. A basic controller might look something like:

class Google(object):
    def __init__(self, driver, wait):
        self.driver = driver
        self.wait = wait

    def do_search(self, search_term):
        self.driver.get('https://google.com')
        # Type search
        search_input = self.driver.find_element_by_name('q')
        search_input.send_keys(search_term)
        # Click search button.
        search_button = self.driver.find_element_by_name('btnG')
        search_button.click()
        self.wait.until(lambda the_driver: the_driver.find_element_by_id('resultStats').is_displayed())
        return self

Or, if you wanted to use one of the CommandFactory objects, it might look like this:

from Helpers.Controllers import has_kwargs

# Inherit from IndependentController to automatically get access to the `attach_driver` method.
class ThreadedGoogleSearch(IndependentController):
    def __init__(self, models):
        self.models = models

    # Using the @has_kwargs decorator allows keyword arguments to be
    # passed to the method. When you assemble a command pack for the
    # CommandManager, just include an instance of the Kwargs object.
    @has_kwargs
    def do_search(self, search_term, some_kwarg='some value'):
        print(some_kwarg)
        self.driver.get('https://google.com')
        # Type search
        search_input = self.driver.find_element_by_name('q')
        search_input.send_keys(search_term)
        # Click search button.
        search_button = self.driver.find_element_by_name('btnG')
        search_button.click()
        self.wait.until(lambda the_driver: the_driver.find_element_by_id('resultStats').is_displayed())
        return self

ThreadedCommandFactory and CommandFactory are used to create Command objects, which are used to execute Controller methods. This facilitates the use of separate WebDrivers for each Controller (each controller gets its own browser). Both CommandFactory objects inherit from BaseCommandFactory, which sets up the dict-like functionality and the base methods that make up the factories. To use one of these factories, you must pass it a dict of Controllers.

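To make the relationship between late driver attachment and the factories more concrete, here is a minimal sketch of the idea, not the framework's code. `IndependentControllerSketch`, `SearchController`, `CommandFactorySketch`, and the `make_driver` parameter are all hypothetical names, and plain `object()` instances stand in for real WebDrivers.

```python
class IndependentControllerSketch(object):
    """Hypothetical base class: the driver arrives after construction."""

    driver = None

    def attach_driver(self, driver):
        self.driver = driver
        return self


class SearchController(IndependentControllerSketch):
    """Hypothetical controller that is built without a driver."""

    def __init__(self, models):
        self.models = models


class CommandFactorySketch(object):
    """Hypothetical dict-like factory that gives each controller its own driver."""

    def __init__(self, controllers, make_driver):
        self.controllers = controllers
        for controller in controllers.values():
            # One fresh driver per controller.
            controller.attach_driver(make_driver())

    def __getitem__(self, key):
        return self.controllers[key]

    def __len__(self):
        return len(self.controllers)


factory = CommandFactorySketch(
    {'google': SearchController(None), 'bing': SearchController(None)},
    make_driver=object,  # stand-in for a real WebDriver constructor
)

# Each controller ends up holding a different driver object.
print(factory['google'].driver is not factory['bing'].driver)
```

Because the controllers are constructed without a driver, the factory is free to decide when each browser is created and which controller it goes to. The framework's own factory is set up as follows.
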
# Grab the Models that the Controllers need. They aren't used here; they are
# passed just as an example.
from Project import Models
# Grab the Controllers.
from SiteAutomations.Examples import GoogleExample, BingExample
# And lastly the CommandFactory.
from Helpers.Commands import ThreadedCommandFactory

# Here we set up the dict of controllers.
controllers = {
    'google': GoogleExample.ThreadedGoogleSearch(Models),
    'bing': BingExample.ThreadedBingSearch(Models)
}
# Get the CommandFactory instance by passing it the Controllers.
cmd_factory = ThreadedCommandFactory(controllers, logging=False)

Once we have a CommandFactory, we can create the Command instance. The Command instance is used to execute the various commands (methods) your controllers have. This is done by creating a dict of tuples. Use the same keys you used in the dict of Controllers. Pass a function that takes a Controller as its first argument, along with this new dict, to cmd_factory.create_command.

# Setting up the Command pack.
search_command = {
    'google': ('google wiki',),  # note how single arguments still need to be passed as a tuple
    'bing': ('bing wiki',)
}
# Here we pass an anonymous function as the first argument,
# and search_command as the second.
cmd = cmd_factory.create_command(
    lambda controller, *args: controller.do_search(*args),
    search_command
)
# Start the command!
cmd.start()

This will execute the do_search method on each controller, each in its own thread, so the whole command takes only as long as the slowest method call.
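
The timing claim can be illustrated with the standard library alone. The sketch below is an assumption about the general technique (one thread per controller, then join), not the framework's Command class. `SlowController` and `start_command` are hypothetical names, and `time.sleep` stands in for browser work.

```python
import threading
import time


class SlowController(object):
    """Hypothetical controller whose method only sleeps, then records its input."""

    def __init__(self, delay):
        self.delay = delay
        self.searched = None

    def do_search(self, search_term):
        time.sleep(self.delay)
        self.searched = search_term
        return self


def start_command(controllers, func, command_pack):
    """Run func(controller, *args) for every key in command_pack, one thread each."""
    threads = [
        threading.Thread(target=func, args=(controllers[key],) + tuple(args))
        for key, args in command_pack.items()
    ]
    for thread in threads:
        thread.start()
    # Wait for every thread, so the total time tracks the slowest one.
    for thread in threads:
        thread.join()


controllers = {'fast': SlowController(0.1), 'slow': SlowController(0.3)}
command_pack = {'fast': ('google wiki',), 'slow': ('bing wiki',)}

started = time.time()
start_command(
    controllers,
    lambda controller, *args: controller.do_search(*args),
    command_pack,
)
elapsed = time.time() - started

# Elapsed time should sit near the slowest call (0.3s), not the sum (0.4s).
print('elapsed: %.2fs' % elapsed)
```

The command pack has the same shape as in the framework example: keys shared with the controller dict, and a tuple of positional arguments per key.
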
Jobs are essentially commands that can be executed from the command line interface or by calling the Project.Jobs.run_job function. Jobs are kept in the Projects/Jobs directory.
When running a Job with run_job, pass the Job's filename without the extension. Projects/Jobs/ExampleJob.py, for example, would be executed using run_job('ExampleJob').
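
How run_job locates and executes a Job is not shown here, so the following is only a sketch of the naming rule under one assumption: that the name maps to a `.py` file in a jobs directory. `run_job_sketch` is a hypothetical function, and a temporary directory stands in for the real Jobs folder.

```python
import os
import runpy
import tempfile


def run_job_sketch(jobs_dir, job_name):
    """Execute <jobs_dir>/<job_name>.py and return the globals it defined."""
    path = os.path.join(jobs_dir, job_name + '.py')
    if not os.path.isfile(path):
        raise ValueError('No job named %r in %s' % (job_name, jobs_dir))
    return runpy.run_path(path, run_name='__main__')


# Create a throwaway job file so the sketch runs on its own.
jobs_dir = tempfile.mkdtemp()
with open(os.path.join(jobs_dir, 'ExampleJob.py'), 'w') as job_file:
    job_file.write("result = 'ExampleJob ran'\n")

# The job is addressed by filename minus the extension.
job_globals = run_job_sketch(jobs_dir, 'ExampleJob')
print(job_globals['result'])
```
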