Documentation
This is where you can learn how to use the Slack micro-framework to scrape the web good, and do other things good too.
After installing the library, you'll need to generate a new project...
Open a Python console (or IDLE), import `make_project`, and call it:
from Slack import make_project
make_project('some/folder/Some Awesome Project')

This will create the project folder. Inside the project folder there will be a `SiteAutomations` folder for your Controllers, and some project files.
The .env file is used to hold static variables. It's used for things like setting up your database connection, or holding API keys:
# DB_TYPE values: sql, mysql, postgresql, berkeley
DB_TYPE=sql
DB=default.db
DB_HOST=localhost
DB_PORT=3306
DB_USERNAME=None
DB_PASSWORD=None
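To make the file format concrete, here is a minimal sketch of how `KEY=VALUE` lines like the ones above could be read with the standard library alone. It is not Slack's own loader (that is what `env` is for); `parse_env` and `SAMPLE` are hypothetical names used only for illustration.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Hypothetical sample mirroring the .env file shown above.
SAMPLE = """\
# DB_TYPE values: sql, mysql, postgresql, berkeley
DB_TYPE=sql
DB=default.db
DB_HOST=localhost
DB_PORT=3306
DB_USERNAME=None
DB_PASSWORD=None
"""

settings = parse_env(SAMPLE)
# Every value comes back as a string, so numeric settings need a cast.
port = int(settings["DB_PORT"])
```

Note that in this sketch `None` is read as the literal string `"None"`; whether Slack's `env` converts such values is not stated here.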
models.py is where you put your peewee models.
Finally, running the `migrations.py` file will drop all the tables in your database (based on what is defined in `models.py`), then recreate them. It's used on the database design side of things while you're developing.
Here is an example of a single-threaded automation. First, we will import all the appropriate pieces.
from time import sleep
# Database models used to interact with databases if needed.
import models
# The environment variable loader. These variables can be set in the .env file.
# This is important if we want to create configurable web automations / scrapers.
from Slack.Environment import env, env_driver
# Controllers utilize a `WebDriver` instance in order to control the various web
# pages that need to be scraped.
from Slack.SiteAutomations.Examples import GoogleExample, BingExample

Controllers are kept in the `SiteAutomations` folder. In all of these examples, we are importing Slack's example controllers, which are located at `Slack/SiteAutomations/Examples`. You can play with those, or write your own Controllers in the `SiteAutomations` folder in your project's folder.

# The quitting context helps to `close()` and `quit()` the WebDriver instance if
# something goes wrong.
from Slack.Helpers.Contexts import quitting
from selenium.webdriver.support.wait import WebDriverWait

To get hold of our WebDriver instance, we need to use the `env_driver` and `env` functions. `env('BROWSER')` returns the name of the browser set in the .env file, and `env_driver` takes that name and returns the appropriate WebDriver, which is then called to create the instance. The `quitting` function is used to open the WebDriver instance the same way you would open a file using `with`. When you add all of this together, you get:

# This could be written as:
#
#     browser = env("BROWSER")
#     web_driver = env_driver(browser)
#     with quitting(web_driver()) as driver:
#         pass
with quitting(env_driver(env("BROWSER"))()) as driver:
    pass  # Do stuff.

Now that we have a valid WebDriver instance, we can instantiate our Controllers and do some work.
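For intuition about what a helper like `quitting` does, here is a minimal sketch built on `contextlib`. It is an assumption about the general pattern, not Slack's actual implementation; `quitting_sketch` and `FakeDriver` are hypothetical names, and the fake driver only records calls so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting_sketch(driver):
    """Yield the driver, then always shut it down, even after an error."""
    try:
        yield driver
    finally:
        driver.close()
        driver.quit()


class FakeDriver(object):
    """Stand-in for a WebDriver that just records the cleanup calls."""

    def __init__(self):
        self.calls = []

    def close(self):
        self.calls.append("close")

    def quit(self):
        self.calls.append("quit")


fake = FakeDriver()
try:
    with quitting_sketch(fake) as driver:
        raise RuntimeError("something went wrong mid-scrape")
except RuntimeError:
    pass

# Both cleanup calls ran even though the block raised.
print(fake.calls)
```

With the driver open, the controllers themselves are created inside the `with` block: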
    # Get an instance of `WebDriverWait`.
    wait = WebDriverWait(driver, 30)
    # Pass the web driver to the site automation along with anything
    # else it might need to do its job. This could include an
    # instance of `WebDriverWait`, and even the collection of
    # models.
    google_search = GoogleExample.GoogleSearch(driver, wait, models)
    bing_search = BingExample.BingSearch(driver, wait, models)
    # Do stuff with your controllers.
    google_search.do_search('google wiki')
    sleep(5)
    bing_search.do_search('bing wiki')
    sleep(5)

There are two types of controllers. The first is just a class that controls a WebDriver instance. The second is a class that inherits from `IndependentController` and also controls a WebDriver instance. The only difference is that instances of `IndependentController` attach their WebDriver after they're instantiated, using the `attach_driver` method. This facilitates the use of the `ThreadedCommandFactory` and `CommandFactory` objects. A basic controller might look something like:

class Google(object):
    def __init__(self, driver, wait):
        self.driver = driver
        self.wait = wait

    def do_search(self, search_term):
        self.driver.get('https://google.com')
        # Type search
        search_input = self.driver.find_element_by_name('q')
        search_input.send_keys(search_term)
        # Click search button.
        search_button = self.driver.find_element_by_name('btnG')
        search_button.click()
        self.wait.until(lambda the_driver: the_driver.find_element_by_id('resultStats').is_displayed())
        return self

Or, if you wanted to create Command objects with the `ThreadedCommandFactory` objects, it might look like this:

from Slack.Helpers.Controllers import has_kwargs

# Inherit from IndependentController to automatically get access to the `attach_driver` method.
class ThreadedGoogleSearch(IndependentController):
    def __init__(self, models):
        self.models = models

    # Using the @has_kwargs decorator allows keyword arguments to be
    # passed to the method. When you assemble a command pack for the
    # CommandManager, just include an instance of the Kwargs object.
    @has_kwargs
    def do_search(self, search_term, some_kwarg='some value'):
        print(some_kwarg)
        self.driver.get('https://google.com')
        # Type search
        search_input = self.driver.find_element_by_name('q')
        search_input.send_keys(search_term)
        # Click search button.
        search_button = self.driver.find_element_by_name('btnG')
        search_button.click()
        self.wait.until(lambda the_driver: the_driver.find_element_by_id('resultStats').is_displayed())
        return self

`ThreadedCommandFactory` and `CommandFactory` are used to create Command objects, which are used to execute Controller methods. This facilitates the use of a separate WebDriver for each Controller (each controller gets its own browser). Both factories inherit from `BaseCommandFactory`, which sets up the dict-like functionality and the base methods that make up the factories. To use one of these factories, you must pass it a dict of Controllers.
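The two ideas described here, attaching a driver after construction and a dict-like factory that gives each controller its own driver, can be sketched without Slack or Selenium. Everything below is hypothetical: `IndependentSketch`, `FactorySketch`, and the `make_driver` parameter are illustrative names, and the real `BaseCommandFactory` may be organised quite differently.

```python
class IndependentSketch(object):
    """Hypothetical controller that receives its driver after construction."""

    def __init__(self, name):
        self.name = name
        self.driver = None

    def attach_driver(self, driver):
        self.driver = driver
        return self


class FactorySketch(object):
    """Hypothetical dict-like holder that gives each controller its own driver."""

    def __init__(self, controllers, make_driver):
        self._controllers = {}
        for key, controller in controllers.items():
            # One fresh driver per controller, attached after instantiation.
            self._controllers[key] = controller.attach_driver(make_driver())

    def __getitem__(self, key):
        return self._controllers[key]

    def __len__(self):
        return len(self._controllers)

    def keys(self):
        return list(self._controllers.keys())


factory = FactorySketch(
    {"google": IndependentSketch("google"), "bing": IndependentSketch("bing")},
    make_driver=object,  # stand-in for a real WebDriver constructor
)
```

Because the controllers are built before any driver exists, the factory is free to decide when and how many browsers to open. With the real factory, the setup looks like this: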
# Grab the models that the Controllers need. They aren't actually used here;
# they are passed in just as an example.
import models
# Grab the Example Controllers.
from Slack.SiteAutomations.Examples import GoogleExample, BingExample
# And lastly the CommandFactory
from Slack.Helpers.Commands import ThreadedCommandFactory

# Here we set up the dict of controllers.
controllers = {
    'google': GoogleExample.ThreadedGoogleSearch(models),
    'bing': BingExample.ThreadedBingSearch(models)
}
# Get the CommandFactory instance by passing it the Controllers.
cmd_factory = ThreadedCommandFactory(controllers, logging=False)

Once we have a CommandFactory, we can create the Command instance. The Command instance is used to execute the various commands (methods) your controllers have. This is done by creating a dict of tuples, using the same keys you used in the dict of Controllers. Pass a function that takes a Controller as its first argument, along with this new dict, to `cmd_factory.create_command`.

# Setting up the Command pack.
search_command = {
    'google': ('google wiki',),  # note how single arguments still need to be passed as a tuple
    'bing': ('bing wiki',)
}
# Here we pass an anonymous function as the first argument,
# and search_command as the second.
cmd = cmd_factory.create_command(
    lambda controller, *args: controller.do_search(*args),
    search_command
)
# Start the command!
cmd.start()

This will execute the `do_search` method on each controller, each in its own thread, so the whole command takes only as long as the slowest method.
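The timing claim can be checked with a small standard-library sketch that mimics the command pack format: one thread per controller, each calling the function with that controller's tuple of arguments. This is an illustration of the pattern under stated assumptions, not Slack's `Command` implementation; `SlowController` and `run_command` are hypothetical names, and the controllers only sleep instead of driving a browser.

```python
import threading
import time


class SlowController(object):
    """Hypothetical controller whose method just sleeps for a while."""

    def __init__(self, delay):
        self.delay = delay
        self.searched = None

    def do_search(self, search_term):
        time.sleep(self.delay)
        self.searched = search_term


def run_command(controllers, func, command_pack):
    """Call func(controller, *args) for each key, one thread per controller."""
    threads = [
        threading.Thread(target=func, args=(controllers[key],) + tuple(args))
        for key, args in command_pack.items()
    ]
    for thread in threads:
        thread.start()
    # Wait for every thread, so the total time tracks the slowest one.
    for thread in threads:
        thread.join()


controllers = {"google": SlowController(0.3), "bing": SlowController(0.5)}
pack = {"google": ("google wiki",), "bing": ("bing wiki",)}

started = time.time()
run_command(controllers, lambda c, *args: c.do_search(*args), pack)
elapsed = time.time() - started
```

Run sequentially, the two sleeps would add up to about 0.8 seconds; run in threads, `elapsed` should land close to the longer delay alone.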