# Process: Scrape Server REST Endpoints and Measure Performance

Organization: Esri

Author: Alberto Nieto (anieto@esri.com)

Date: 8/31/2018

## Process Overview:

1. Parse NTAD HTML to get the list of individual service URLs

2. Iterate: For each service URL:

    - Send an export request with specified URL params
    - Measure total seconds for response

3. Build report
    - For each service, compile metrics

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import arcgis

In [2]:
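def build_services_list_from_rest_endpoint(rest_endpoint_url):
    # Fetch the services directory page and parse its HTML
    page = requests.get(rest_endpoint_url)
    soup = BeautifulSoup(page.content, 'html.parser')

    # Prefix every href found on the page with the endpoint URL.
    # Plain concatenation is only valid for hrefs relative to rest_endpoint_url.
    services_list = []
    for a in soup.find_all('a', href=True):
        services_list.append(rest_endpoint_url+a['href'])

    return services_list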

In [3]:
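def measure_response(url):
    # Response.elapsed spans sending the request through parsing the response
    # headers; it does not include the time spent downloading the body.
    return requests.get(url).elapsed.total_seconds()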
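
Because `elapsed` stops once the response headers are parsed, a wall-clock measurement may be closer to the "total seconds for response" named in the process overview. The following is a minimal sketch, not the method used in this notebook: the function name `measure_total_seconds`, the 600-second timeout, and the injectable `get` parameter (handy for testing without a network) are all assumptions.

```python
import time

import requests


def measure_total_seconds(url, timeout=600, get=requests.get):
    """Wall-clock seconds to request `url` and receive the full response body."""
    start = time.perf_counter()
    # With the default stream=False, the body is downloaded before get() returns
    response = get(url, timeout=timeout)
    # Raise on HTTP error statuses so a fast error page is not reported as a timing
    response.raise_for_status()
    return time.perf_counter() - start
```

Passing a `timeout` also keeps a single unresponsive service from stalling the whole run indefinitely.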

In [6]:
t1_a_url = r"https://maps.bts.dot.gov/services/rest/services/NTAD/North_American_Rail_Lines/MapServer"
t1_b_url = r"https://geo.dot.gov/server/rest/services/NTAD/North_American_Rail_Lines/MapServer"
t1_c_url = r"https://geo.dot.gov/server/rest/services/Railroad_Lines_DataStore_MapImage/MapServer"

In [7]:
base_test_url_query = r"/export?dpi=96&transparent=true&format=png8&bbox=-17696592.60548181%2C1556663.6270353035%2C-8229104.803719484%2C9486111.894215303&bboxSR=102100&imageSR=102100&size=1108%2C928&f=image"
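
The same query string can be assembled from a dictionary, which makes the individual export parameters easier to read and change. This is a sketch using the standard library's `urlencode`; the variable names `export_params` and `export_query` are hypothetical.

```python
from urllib.parse import urlencode

# Export parameters copied from the hand-written query string above
export_params = {
    "dpi": 96,
    "transparent": "true",
    "format": "png8",
    "bbox": "-17696592.60548181,1556663.6270353035,-8229104.803719484,9486111.894215303",
    "bboxSR": 102100,
    "imageSR": 102100,
    "size": "1108,928",
    "f": "image",
}

# urlencode percent-encodes the commas in bbox and size as %2C
export_query = "/export?" + urlencode(export_params)
print(export_query)
```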

In [8]:
measure_response(t1_a_url+base_test_url_query)

18.155502

In [9]:
measure_response(t1_b_url+base_test_url_query)

327.317757

In [12]:
measure_response(t1_c_url+base_test_url_query)

0.347701

#### Iteration metrics gathering

In [25]:
rest_endpoint_a = r"https://maps.bts.dot.gov/services/rest/services/NTAD"
rest_endpoint_b = r"https://geo.dot.gov/server/rest/services/NTAD"

In [39]:
page = requests.get(rest_endpoint_a)
soup = BeautifulSoup(page.content, 'html.parser')
soup

<html lang="en">
<head>
<title>Folder: NTAD</title>
<link href="/services/rest/static/main.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<table class="userTable" width="100%">
<tr>
<td class="titlecell">
ArcGIS REST Services Directory
</td>
<td align="right">
<a href="https://maps.bts.dot.gov/services/login?returnUrl=https://maps.bts.dot.gov/services/rest/services">Login</a>
</td>
</tr>
</table>
<table class="navTable" width="100%">
<tr valign="top">
<td class="breadcrumbs">
<a href="/services/rest/services">Home</a>
&gt; <a href="/services/rest/services">services</a>
&gt; <a href="/services/rest/services/NTAD">NTAD</a>
</td>
<td align="right">
<a href="https://maps.bts.dot.gov/services/sdk/rest/02ss/02ss00000057000000.htm" target="_blank">Help</a> | <a href="https://maps.bts.dot.gov/services/rest/services/NTAD?f=help" target="_blank">API Reference</a>
</td>
</tr>
</table>
<table>
<tr>
<td class="apiref">
<a href="?f=pjson" target="_blank">JSON</a>
| <a href="https://maps.bts.

In [38]:
list(soup.find_all('a', href=True))

[<a href="https://maps.bts.dot.gov/services/login?returnUrl=https://maps.bts.dot.gov/services/rest/services">Login</a>,
 <a href="/services/rest/services">Home</a>,
 <a href="/services/rest/services">services</a>,
 <a href="/services/rest/services/NTAD">NTAD</a>,
 <a href="https://maps.bts.dot.gov/services/sdk/rest/02ss/02ss00000057000000.htm" target="_blank">Help</a>,
 <a href="https://maps.bts.dot.gov/services/rest/services/NTAD?f=help" target="_blank">API Reference</a>,
 <a href="?f=pjson" target="_blank">JSON</a>,
 <a href="https://maps.bts.dot.gov/services/services/NTAD?wsdl">SOAP</a>,
 <a href="http://www.arcgis.com/home/webmap/viewer.html?featurecollection=https%3A%2F%2Fmaps.bts.dot.gov%2Fservices%2Frest%2Fservices%2FNTAD%3Ff%3Djson%26option%3Dfootprints&amp;supportsProjection=true&amp;supportsJSONP=true" target="_blank">ArcGIS Online map viewer</a>,
 <a href="/services/rest/services/NTAD/Airports/MapServer">NTAD/Airports</a>,
 <a href="/services/rest/services/NTAD/AlternativeFu

In [37]:
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

Found the URL: https://maps.bts.dot.gov/services/login?returnUrl=https://maps.bts.dot.gov/services/rest/services
Found the URL: /services/rest/services
Found the URL: /services/rest/services
Found the URL: /services/rest/services/NTAD
Found the URL: https://maps.bts.dot.gov/services/sdk/rest/02ss/02ss00000057000000.htm
Found the URL: https://maps.bts.dot.gov/services/rest/services/NTAD?f=help
Found the URL: ?f=pjson
Found the URL: https://maps.bts.dot.gov/services/services/NTAD?wsdl
Found the URL: http://www.arcgis.com/home/webmap/viewer.html?featurecollection=https%3A%2F%2Fmaps.bts.dot.gov%2Fservices%2Frest%2Fservices%2FNTAD%3Ff%3Djson%26option%3Dfootprints&supportsProjection=true&supportsJSONP=true
Found the URL: /services/rest/services/NTAD/Airports/MapServer
Found the URL: /services/rest/services/NTAD/AlternativeFuelingStations/MapServer
Found the URL: /services/rest/services/NTAD/Amtrak_Routes/MapServer
Found the URL: /services/rest/services/NTAD/Amtrak_Stations/MapServer
Found th

In [42]:
soup.find_all('li')

[<li><a href="/services/rest/services/NTAD/Airports/MapServer">NTAD/Airports</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/AlternativeFuelingStations/MapServer">NTAD/AlternativeFuelingStations</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Amtrak_Routes/MapServer">NTAD/Amtrak_Routes</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Amtrak_Stations/MapServer">NTAD/Amtrak_Stations</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Bikeshare/MapServer">NTAD/Bikeshare</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Congressional_Districts/MapServer">NTAD/Congressional_Districts</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/CoreBasedStatisticalAreas/MapServer">NTAD/CoreBasedStatisticalAreas</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Counties/MapServer">NTAD/Counties</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Dams/MapServer">NTAD/Dams</a> (MapServe

In [43]:
for li in soup.find_all('li'):
    print(li[0].attrs['href'])

KeyError: 0
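
The `KeyError` arises because subscripting a BeautifulSoup `Tag` looks up an HTML attribute, so `li[0]` asks for an attribute named `0`. One way to reach the link is to find the `<a>` child first and then read its `href`. The sketch below runs against a small hand-written sample that mirrors the `<li>` entries shown above; the enclosing `<ul>` and the names `sample_html`, `sample_soup`, and `hrefs` are assumptions.

```python
from bs4 import BeautifulSoup

# Two entries copied from the directory listing; the <ul> wrapper is assumed
sample_html = """
<ul>
<li><a href="/services/rest/services/NTAD/Airports/MapServer">NTAD/Airports</a> (MapServer)</li>
<li><a href="/services/rest/services/NTAD/Amtrak_Routes/MapServer">NTAD/Amtrak_Routes</a> (MapServer)</li>
</ul>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

hrefs = []
for li in sample_soup.find_all("li"):
    link = li.find("a", href=True)  # first <a> inside the <li> that carries an href
    if link is not None:
        hrefs.append(link["href"])  # subscripting a Tag reads an attribute by name

print(hrefs)
```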

In [52]:
tags = soup.find_all('li')
tags

[<li><a href="/services/rest/services/NTAD/Airports/MapServer">NTAD/Airports</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/AlternativeFuelingStations/MapServer">NTAD/AlternativeFuelingStations</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Amtrak_Routes/MapServer">NTAD/Amtrak_Routes</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Amtrak_Stations/MapServer">NTAD/Amtrak_Stations</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Bikeshare/MapServer">NTAD/Bikeshare</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Congressional_Districts/MapServer">NTAD/Congressional_Districts</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/CoreBasedStatisticalAreas/MapServer">NTAD/CoreBasedStatisticalAreas</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Counties/MapServer">NTAD/Counties</a> (MapServer)</li>,
 <li><a href="/services/rest/services/NTAD/Dams/MapServer">NTAD/Dams</a> (MapServe

In [53]:
type(tags)

bs4.element.ResultSet

In [67]:
for t in tags:
    print(t.text)

NTAD/Airports (MapServer)
NTAD/AlternativeFuelingStations (MapServer)
NTAD/Amtrak_Routes (MapServer)
NTAD/Amtrak_Stations (MapServer)
NTAD/Bikeshare (MapServer)
NTAD/Congressional_Districts (MapServer)
NTAD/CoreBasedStatisticalAreas (MapServer)
NTAD/Counties (MapServer)
NTAD/Dams (MapServer)
NTAD/Fatality_Analysis_Reporting_System (MapServer)
NTAD/Freight_Analysis_Framework_Network (MapServer)
NTAD/Freight_Analysis_Framework_Regions (MapServer)
NTAD/Highway_Performance_Monitoring_System_Arterial (MapServer)
NTAD/Highway_Performance_Monitoring_System_Freeways (MapServer)
NTAD/Highway_Performance_Monitoring_System_Interstate (MapServer)
NTAD/Highway_Performance_Monitoring_System_Major_Collector (MapServer)
NTAD/Highway_Performance_Monitoring_System_Minor_Arterial (MapServer)
NTAD/Highway_Performance_Monitoring_System_Minor_Collector (MapServer)
NTAD/Highway_Performance_Monitoring_System (MapServer)
NTAD/Intermodal_Passenger_Connectivity_Database_IPCD (MapServer)
NTAD/Intermodal_Transit_F

In [41]:
build_services_list_from_rest_endpoint(rest_endpoint_a)

['https://maps.bts.dot.gov/services/rest/services/NTADhttps://maps.bts.dot.gov/services/login?returnUrl=https://maps.bts.dot.gov/services/rest/services',
 'https://maps.bts.dot.gov/services/rest/services/NTAD/services/rest/services',
 'https://maps.bts.dot.gov/services/rest/services/NTAD/services/rest/services',
 'https://maps.bts.dot.gov/services/rest/services/NTAD/services/rest/services/NTAD',
 'https://maps.bts.dot.gov/services/rest/services/NTADhttps://maps.bts.dot.gov/services/sdk/rest/02ss/02ss00000057000000.htm',
 'https://maps.bts.dot.gov/services/rest/services/NTADhttps://maps.bts.dot.gov/services/rest/services/NTAD?f=help',
 'https://maps.bts.dot.gov/services/rest/services/NTAD?f=pjson',
 'https://maps.bts.dot.gov/services/rest/services/NTADhttps://maps.bts.dot.gov/services/services/NTAD?wsdl',
 'https://maps.bts.dot.gov/services/rest/services/NTADhttp://www.arcgis.com/home/webmap/viewer.html?featurecollection=https%3A%2F%2Fmaps.bts.dot.gov%2Fservices%2Frest%2Fservices%2FNTAD
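
The malformed entries above come from two things: every link on the page is collected, including navigation links, and each `href` is appended to the endpoint URL even when it is an absolute path or a full URL. A possible alternative, sketched here under assumptions, restricts the search to links inside `<li>` elements and resolves each `href` with `urljoin`. The function name `build_service_urls` and the sample HTML are hypothetical, and the function takes the page HTML as an argument so it can be tried without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def build_service_urls(html, rest_endpoint_url):
    """Return absolute URLs for the service links listed in <li> elements."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for li in soup.find_all("li"):
        link = li.find("a", href=True)
        if link is not None:
            # urljoin resolves absolute paths against the endpoint's scheme and host
            urls.append(urljoin(rest_endpoint_url, link["href"]))
    return urls


# Hand-written sample: one navigation link plus two service entries
sample_html = """
<a href="/services/rest/services">Home</a>
<ul>
<li><a href="/services/rest/services/NTAD/Airports/MapServer">NTAD/Airports</a> (MapServer)</li>
<li><a href="/services/rest/services/NTAD/Dams/MapServer">NTAD/Dams</a> (MapServer)</li>
</ul>
"""

service_urls = build_service_urls(
    sample_html, "https://maps.bts.dot.gov/services/rest/services/NTAD"
)
print(service_urls)
```

In the notebook itself, the HTML would come from `requests.get(rest_endpoint_a).content`.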