# Apple Mobility Trends

This notebook extracts the mobility trends data published on Apple's website:

URL: https://www.apple.com/covid19/mobility



Selenium is used to automate the browser interaction. Requirements:

- have the Chrome browser installed
- install the matching browser driver (ChromeDriver):
https://sites.google.com/a/chromium.org/chromedriver/downloads

If Selenium fails to start, Chrome has most likely been upgraded since the driver was installed.
Download the matching driver version from the link above, put it in ..\drivers and try again.
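
Whether the driver file is in place can be checked before launching the browser. The sketch below assumes the layout visible in the Step 1 output (a `drivers` folder two levels above the notebook directory, holding `chromedriver.exe`); `find_chromedriver` is a hypothetical helper, not part of `utils`.

```python
from pathlib import Path


def find_chromedriver(notebook_dir: Path, name: str = "chromedriver.exe") -> Path:
    """Return the expected driver path, or raise if the file is missing.

    Assumes the driver sits in <notebook_dir>/../../drivers (hypothetical layout).
    """
    driver_path = notebook_dir.parent.parent / "drivers" / name
    if not driver_path.is_file():
        raise FileNotFoundError(
            f"No driver at {driver_path}; download the version matching your Chrome."
        )
    return driver_path
```

Calling `find_chromedriver(Path.cwd())` from the notebook directory would then fail early with a readable message instead of a Selenium start-up error.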


## Workflow

This notebook performs the following steps:

1. Open the website
1. Click the 'All Data CSV' button
1. Download the CSV file
1. Upload the file to vipenta, into '/home/omrworker/projects/iea_scraper/filestore'

Each step is described below. To execute a step, click on its code cell and press 'Ctrl+Enter'.

## Step by Step

### 1. Open the website

In [1]:
from pathlib import Path
import utils

DOWNLOAD_TIMEOUT = 120
WAIT = 10

# locate the notebook directory; the browser driver lives in <root_dir>\drivers
current_dir = Path.cwd()
root_dir = current_dir.parent.parent

# Form URL
APPLE_URL = "https://www.apple.com/covid19/mobility"

driver = utils.get_driver(current_dir)
driver.get(APPLE_URL)


download dir: G:\OMRiqmacros\codebase\scraper\AppleMobilityTrends 
driver path: G:\OMRiqmacros\codebase\drivers\chromedriver.exe
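
The implementation of `utils.get_driver` is not shown in this notebook. As a rough idea of what such a helper might do, the sketch below starts Chrome with its download directory pointed at a chosen folder, which is what the two lines of output above suggest. It assumes Selenium 4; `chrome_prefs` and `make_driver` are hypothetical names.

```python
from pathlib import Path


def chrome_prefs(download_dir: Path) -> dict:
    """Chrome preferences that send downloads to download_dir without a prompt."""
    return {
        "download.default_directory": str(download_dir),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir: Path, driver_path: Path):
    """Start Chrome through Selenium 4 (hypothetical stand-in for utils.get_driver)."""
    # Imported here so the preference helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_prefs(download_dir))
    return webdriver.Chrome(service=Service(str(driver_path)), options=options)
```

Setting the download directory up front is what lets the later cells look for the CSV file at a known path.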


### 2-4. Click the button, download the file and upload it

The following code:

1. clicks the download button and waits for the file
1. uploads the file to the server
1. closes the browser

Wait until the results appear on the screen, then press Shift+Enter on the next cell.

In [2]:
import time
from datetime import datetime, timedelta

from selenium.webdriver.common.by import By

# default file name, assuming the latest file is dated two days ago
today_minus_two = (datetime.now() - timedelta(days=2)).strftime('%Y-%m-%d')
default_filename = f"applemobilitytrends-{today_minus_two}.csv"
current_path = current_dir / default_filename

# remove existing file
try:
    current_path.unlink()
except FileNotFoundError:
    pass

time.sleep(3)
# download file
download_button = driver.find_element(By.CLASS_NAME, 'download-button-container')
download_button.click()

# name of the file: last path segment of the download link,
# falling back to the default name if no link is found
filename = default_filename
elems = driver.find_elements(By.CSS_SELECTOR, ".download-button-container [href]")
for elem in elems:
    link = elem.get_attribute('href')
    filename = link.rsplit('/', 1)[-1]

new_path = current_dir / filename

# wait for the file
utils.wait_file(new_path, WAIT, DOWNLOAD_TIMEOUT)

# let's upload the file
remote_path = f'/home/omrworker/projects/iea_scraper/filestore/{filename}'
utils.upload_file(new_path, remote_path)

print('Successfully finished.')
driver.close()

File G:\OMRiqmacros\codebase\scraper\AppleMobilityTrends\applemobilitytrends-2020-04-25.csv available. Waited 10 seconds.
Uploading file from G:\OMRiqmacros\codebase\scraper\AppleMobilityTrends\applemobilitytrends-2020-04-25.csv to /home/omrworker/projects/iea_scraper/filestore/applemobilitytrends-2020-04-25.csv
Successfully finished.
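
The body of `utils.wait_file` is not part of this notebook. A minimal sketch of such a polling helper is given below, assuming the same argument order as the call above (path, polling interval, timeout); `wait_for_file` is a hypothetical name and the real helper may behave differently.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, wait: float, timeout: float) -> float:
    """Poll every `wait` seconds until `path` exists; return the seconds waited.

    Raises TimeoutError if the file has not appeared after `timeout` seconds.
    """
    waited = 0.0
    while not path.exists():
        if waited >= timeout:
            raise TimeoutError(f"{path} not available after {waited:g} seconds")
        time.sleep(wait)
        waited += wait
    return waited
```

Checking for the final file name is usually enough with Chrome, because an unfinished download is kept under a temporary `.crdownload` name until it completes.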


# HAPPY END!

To exit, go to menu "File" and select "Quit".
Ask Luis or Pierre to run the process to load the file into IEA-External-DB.

In [4]:
# Transform the dataframe from wide format (one column per date) to long format

import pandas as pd

# read the CSV file
data = pd.read_csv("filestore/applemobilitytrends-2020-04-26.csv")

display(data)

# one row per (region, transportation type, date)
new_dataframe = pd.melt(
    data,
    id_vars=['geo_type', 'region', 'transportation_type', 'alternative_name'],
    var_name='date',
    value_name='value',
)

display(new_dataframe)


Unnamed: 0,geo_type,region,transportation_type,alternative_name,2020-01-13,2020-01-14,2020-01-15,2020-01-16,2020-01-17,2020-01-18,...,2020-04-17,2020-04-18,2020-04-19,2020-04-20,2020-04-21,2020-04-22,2020-04-23,2020-04-24,2020-04-25,2020-04-26
0,country/region,Albania,driving,,100.0,95.30,101.43,97.20,103.55,112.67,...,29.26,22.94,24.55,31.51,33.59,31.69,33.94,30.22,25.22,30.39
1,country/region,Albania,walking,,100.0,100.68,98.93,98.46,100.85,100.13,...,34.58,27.76,27.93,36.72,34.46,35.39,34.80,34.63,29.00,35.22
2,country/region,Argentina,driving,,100.0,97.07,102.45,111.21,118.45,124.01,...,27.17,23.19,14.54,26.67,27.25,27.61,28.73,30.99,25.92,16.57
3,country/region,Argentina,walking,,100.0,95.11,101.37,112.67,116.72,114.14,...,18.80,17.03,10.59,18.44,19.01,18.47,20.39,22.32,23.31,16.36
4,country/region,Australia,driving,,100.0,102.98,104.21,108.63,109.08,89.00,...,47.51,36.90,53.34,56.93,58.06,59.69,62.87,47.84,41.89,55.39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1153,sub-region,Île-de-France Region,driving,,100.0,102.06,103.61,110.01,107.47,113.34,...,18.66,16.11,15.53,19.42,19.96,20.49,20.81,19.81,17.59,17.29
1154,sub-region,Örebro County,driving,Örebro län,100.0,101.70,105.33,108.03,115.57,123.96,...,114.92,109.73,109.08,103.07,107.73,112.93,107.07,120.65,116.40,114.68
1155,sub-region,Östergötland County,driving,,100.0,96.87,98.81,103.48,109.64,113.50,...,103.26,102.60,101.58,102.81,102.61,109.11,106.38,114.71,117.23,108.38
1156,sub-region,Ústí nad Labem Region,driving,Ústecký kraj,100.0,101.57,107.63,112.87,120.32,126.19,...,112.33,113.68,103.49,101.84,108.48,107.01,108.22,107.30,107.54,119.49


Unnamed: 0,geo_type,region,transportation_type,alternative_name,date,value
0,country/region,Albania,driving,,2020-01-13,100.00
1,country/region,Albania,walking,,2020-01-13,100.00
2,country/region,Argentina,driving,,2020-01-13,100.00
3,country/region,Argentina,walking,,2020-01-13,100.00
4,country/region,Australia,driving,,2020-01-13,100.00
...,...,...,...,...,...,...
121585,sub-region,Île-de-France Region,driving,,2020-04-26,17.29
121586,sub-region,Örebro County,driving,Örebro län,2020-04-26,114.68
121587,sub-region,Östergötland County,driving,,2020-04-26,108.38
121588,sub-region,Ústí nad Labem Region,driving,Ústecký kraj,2020-04-26,119.49
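
To make the reshaping concrete, here is a minimal standard-library sketch of the same wide-to-long transformation on two rows and two date columns copied from the first table above. It only illustrates the idea and is not how pandas implements `melt`; `melt_rows` is a hypothetical helper.

```python
import csv
import io

ID_VARS = ["geo_type", "region", "transportation_type", "alternative_name"]

WIDE_CSV = """geo_type,region,transportation_type,alternative_name,2020-01-13,2020-01-14
country/region,Albania,driving,,100.0,95.30
country/region,Albania,walking,,100.0,100.68
"""


def melt_rows(rows, id_vars):
    """Turn each non-identifier column into its own (date, value) row."""
    rows = list(rows)
    if not rows:
        return []
    value_columns = [name for name in rows[0] if name not in id_vars]
    long_rows = []
    # Looping over columns first gives the column-by-column order seen in the melted table.
    for column in value_columns:
        for row in rows:
            record = {key: row[key] for key in id_vars}
            record["date"] = column
            record["value"] = float(row[column])
            long_rows.append(record)
    return long_rows


long_rows = melt_rows(csv.DictReader(io.StringIO(WIDE_CSV)), ID_VARS)
for record in long_rows:
    print(record["region"], record["transportation_type"], record["date"], record["value"])
```

Each input row yields one output row per date column, which is why the 1157 rows of the wide table become 121589 rows in the long one.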


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)

In [3]:
%cd ..

C:\Users\ROSA_L\PycharmProjects\scraper


In [5]:
from scraper.jobs.com_apple.light_job import Job

apple = Job()
apple.run()

2020-05-25 19:11:14,051 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://127.0.0.1:56109/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"prefs": {"download.default_directory": "C:\\Users\\ROSA_L\\PycharmProjects\\scraper\\filestore"}, "extensions": [], "args": ["--headless", "--disable-dev-shm-usage"]}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"prefs": {"download.default_directory": "C:\\Users\\ROSA_L\\PycharmProjects\\scraper\\filestore"}, "extensions": [], "args": ["--headless", "--disable-dev-shm-usage"]}}}
2020-05-25 19:11:14,056 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:56109
2020-05-25 19:11:15,946 - urllib3.connectionpool - DEBUG - http://127.0.0.1:56109 "POST /session HTTP/1.1" 200 681
2020-05-25 19:11:15,951 - selenium.webdriver.remote.remote_connection - DEBUG - Finished

## Test scraper

The cell below obtains the Apple mobility trends job from the scraper factory and runs it.


In [3]:
%cd ..
from scraper.core import factory

job = factory.get_scraper_job('com_apple', 'mobility_trends')
job.run()

C:\Users\ROSA_L\PycharmProjects


