# Selenium On Docker
This notebook does the following:
1. Spins up an external Selenium Docker container on the host.
2. Configures the remote Selenium WebDriver.
3. Sends commands to the Selenium WebDriver:
    we will download Tour de France GPX routes from Strava.
4. Removes the container.

## Imports

In [14]:
import docker
import os
import time
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options

## Start Selenium container

In [15]:
cwd = os.getcwd()
local_downloads = '{}/downloads'.format(cwd)
sel_downloads = '/home/seluser/downloads'
client = docker.from_env()
container = client.containers.run(
    'selenium/standalone-chrome',
    volumes=['{}:{}'.format(local_downloads, sel_downloads),
             '/dev/shm:/dev/shm'],
    ports={'4444/tcp': 4444},
    network='container_bridge',
    detach=True)
cli = docker.APIClient()
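
The `run` call above attaches the container to a network named `container_bridge`, which has to exist beforehand or Docker rejects the request. A minimal sketch of a guard follows, assuming a plain bridge driver is appropriate for this setup. The helper name `ensure_network` is hypothetical, and `client` is the object returned by `docker.from_env()`.

```python
def ensure_network(client, name, driver='bridge'):
    """Return the Docker network called `name`, creating it if it is missing.

    `client` is expected to be a docker.DockerClient, e.g. docker.from_env().
    """
    # The `names` filter can match partial names, so compare exactly.
    for network in client.networks.list(names=[name]):
        if network.name == name:
            return network
    return client.networks.create(name, driver=driver)


# Usage in this notebook (before client.containers.run):
# network = ensure_network(client, 'container_bridge')
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object when no Docker daemon is available.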

## Configure webdriver

In [16]:
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")  # Chrome expects width,height
chrome_driver = '{}:4444/wd/hub'.format('http://127.0.0.1') # This is only required for local development

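# Wait for the remote WebDriver to accept sessions, retrying every ten seconds.
# Note: this loop has no timeout; it retries until a session is created.
while True:
    try:
        driver = webdriver.Remote(
            command_executor=chrome_driver,
            desired_capabilities=DesiredCapabilities.CHROME, options=options)
        print('remote ready')
        break
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print('remote not ready, sleeping for ten seconds.')
        time.sleep(10)
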
# Enable downloads in headless chrome.
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': sel_downloads}}
command_result = driver.execute("send_command", params)

remote not ready, sleeping for ten seconds.
remote ready
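
The retry loop in this cell keeps trying indefinitely if the container never becomes healthy. A minimal sketch of a bounded variant follows. The helper name `wait_until` and its default `timeout` of 120 seconds are assumptions for illustration, not part of the original notebook.

```python
import time


def wait_until(attempt, timeout=120, interval=10,
               sleep=time.sleep, clock=time.monotonic):
    """Call `attempt` until it returns without raising, or `timeout` seconds pass.

    Returns whatever `attempt` returns; raises TimeoutError (with the last
    error chained) once the deadline has passed.
    """
    deadline = clock() + timeout
    while True:
        try:
            return attempt()
        except Exception as exc:
            if clock() >= deadline:
                raise TimeoutError(
                    'remote not ready after {} seconds'.format(timeout)) from exc
            print('remote not ready, sleeping for {} seconds.'.format(interval))
            sleep(interval)


# Usage with the names defined in the cell above:
# driver = wait_until(lambda: webdriver.Remote(
#     command_executor=chrome_driver,
#     desired_capabilities=DesiredCapabilities.CHROME, options=options))
```

The `sleep` and `clock` parameters exist only so the helper can be checked without real waiting.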


## Download Tour de France GPX files
The configured webdriver will be used to download the GPX files of the 2019 Tour de France from Strava.

## Import Strava creds
Since we're logging into Strava via Facebook, `creds` is simply a Python file of the form:
* email = Facebook_Email
* password = Facebook_Password
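
For illustration, a `creds.py` of that shape could look like the sketch below. Both values are placeholders, and the file is best kept out of version control.

```python
# creds.py: placeholder values only; substitute your own Facebook login.
email = 'you@example.com'
password = 'your-facebook-password'
```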

In [17]:
import creds

In [18]:
from selenium_scripts.strava_commands import race_gpx

In [19]:
# Get the activity feed of our Athlete.
# Since we're after the Tour de France, we are interested in the month of July.


url = ('https://www.strava.com/pros/1855274'
       '#interval_type?chart_type=miles&interval_type=month'
       '&interval=201907&year_offset=0')

race_gpx(driver, creds.email, creds.password, url, '2019-07-09', local_downloads)

Successfully logged in to strava
Site not rendered correctly, trying again in 5 seconds
Site not rendered correctly, trying again in 5 seconds
Site not rendered correctly, trying again in 5 seconds
file successfully downloded to /Users/harry.daniels/Documents/medium/airflow_selenium/downloads/TDF_stage_4.gpx
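
The `interval` value in the URL above appears to encode the year and month as `YYYYMM` (`201907` for July 2019). Assuming that reading is right, a small hypothetical helper, `activity_feed_url`, can build the same URL for other months or athletes:

```python
def activity_feed_url(athlete_id, year, month):
    """Build the monthly activity-feed URL used in this notebook.

    The YYYYMM layout of `interval` is inferred from the example URL,
    not taken from any Strava documentation.
    """
    if not 1 <= month <= 12:
        raise ValueError('month must be between 1 and 12')
    return ('https://www.strava.com/pros/{}'
            '#interval_type?chart_type=miles&interval_type=month'
            '&interval={}{:02d}&year_offset=0').format(athlete_id, year, month)


# Reproduces the URL from the cell above.
url = activity_feed_url(1855274, 2019, 7)
```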


## Remove docker container

In [20]:
container.remove(force=True)
print('Removed container: {}'.format(container.id))

Removed container: 263c0b7ad9d68333838937726e6ff69b746e65f71782f02aa2ed19e0e322803b
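
If a cell fails before this point, the container keeps running and still holds port 4444. One way to guard against that is a context manager that always removes the container. This is a sketch under the assumption that the whole workflow can run inside a single `with` block, and the helper name `selenium_container` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def selenium_container(client, image='selenium/standalone-chrome', **run_kwargs):
    """Start a detached container and remove it on exit, even if the body raises."""
    container = client.containers.run(image, detach=True, **run_kwargs)
    try:
        yield container
    finally:
        container.remove(force=True)
        print('Removed container: {}'.format(container.id))


# Usage with the names from this notebook:
# with selenium_container(client, ports={'4444/tcp': 4444},
#                         network='container_bridge') as container:
#     ...  # configure the driver and call race_gpx here
```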
