# Cliff Diving Web Scraping

This code scrapes cliff diving spots from the [Ultimate Wild Trip](https://www.ultimatewildtrip.com/en/cliff-diving-in-australia/) website.

The Google Maps Geocoding API is then used to look up the latitude and longitude of each spot so that it can be plotted on a map. Geopy is a convenient Python wrapper for the API.

In [6]:
import requests, bs4
import pandas as pd
from urllib.parse import urljoin
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter

To use the Google Maps Geocoding service, you need an API key. You can sign up for free in the [Google Cloud console](https://console.cloud.google.com/home), then enter the key below.

In [1]:
api_key = input()

In [12]:
geolocator = GoogleV3(api_key=api_key,
                      domain='maps.googleapis.com',
                      user_agent="get-cliff-diving-coord")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=0.3)
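
To make the effect of `min_delay_seconds=0.3` concrete, here is a minimal standard-library sketch of the same idea: successive calls are spaced at least a fixed interval apart. It is not geopy's implementation (geopy's `RateLimiter` also retries failed calls, which this sketch omits), and `min_delay` and `fake_geocode` are hypothetical names used only for illustration.

```python
import time
from functools import wraps


def min_delay(seconds):
    """Decorator factory: space successive calls at least `seconds` apart."""
    def decorator(func):
        last_call = None  # monotonic timestamp of the previous call

        @wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                wait = seconds - (time.monotonic() - last_call)
                if wait > 0:
                    time.sleep(wait)  # pause until the minimum gap has passed
            last_call = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@min_delay(0.3)
def fake_geocode(query):
    # Hypothetical stand-in for geolocator.geocode; makes no network request.
    return query.upper()


start = time.monotonic()
results = [fake_geocode(q) for q in ["Darwin", "Tahmoor", "Mount Martha"]]
elapsed = time.monotonic() - start  # two enforced gaps between three calls
```

With ten spots and a 0.3 second gap, the throttling adds only a few seconds to the run while keeping the request rate modest.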

In [8]:
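def get_soup(url):
    """Download a page and return it parsed as a BeautifulSoup tree."""
    res = requests.get(url, timeout=10)  # do not hang forever on a stalled connection
    res.raise_for_status()  # fail loudly on HTTP 4xx/5xx responses
    return bs4.BeautifulSoup(res.text, "lxml")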

The function **get_soup** downloads a page and parses its HTML. Each diving spot sits in a *div* tag with the class *et_pb_blurb_description*, as highlighted below:


![img](CliffDiving.png)

In [9]:
soup = get_soup('https://www.ultimatewildtrip.com/en/cliff-diving-in-australia/')
spots = soup.find_all("div", {"class": "et_pb_blurb_description"})


The place, location and height can then be extracted by finding the relevant *p* elements, splitting their text on the colon, and joining the pieces with an f-string. The height is stored in the *routes* column.

In [14]:
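# The 'location' column is added inside the loop; 'routes' holds the jump height.
df = pd.DataFrame(columns=['name', 'Sport', 'routes', 'url', 'lat', 'long'])

for i, spot in enumerate(spots):
    paragraphs = spot.find_all('p')  # look the paragraphs up once per spot
    df.loc[i, 'url'] = spot.find('iframe')['src']
    # Paragraphs 1 and 2 each hold text of the form "label: value"; keep the values.
    place = paragraphs[1].text.split(':')[1]
    area = paragraphs[2].text.split(':')[1]
    df.loc[i, 'location'] = f"{place} {area}"
    try:
        df.loc[i, 'routes'] = paragraphs[3].text.split(':')[1].strip()
    except IndexError:
        # No fourth paragraph (or no colon in it): leave 'routes' empty.
        continue
df['Sport'] = "Cliff Diving"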
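
The `split(':')[1]` calls in the loop above assume every paragraph has exactly the form "label: value". As a hedged alternative, the sketch below uses `str.partition`, which keeps everything after the first colon and returns an empty string instead of raising `IndexError` when there is no colon. `field_value` is a hypothetical helper, and the sample strings are made up for illustration; the real paragraph text on the page may differ.

```python
def field_value(text):
    """Return the part after the first colon, stripped; '' if there is no colon."""
    _label, sep, value = text.partition(":")
    return value.strip() if sep else ""


# Illustrative strings only, not copied from the scraped page.
samples = [
    "Place: Jervis Bay",
    "Location: New South Wales",
    "Height: >6-20m",
    "no label here",
]
values = [field_value(s) for s in samples]

# Join the first two values the same way the loop builds 'location'.
location = " ".join(v for v in values[:2] if v)
```

Stripping each value before joining also avoids the stray leading spaces that `split(':')[1]` leaves behind.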

Finally, the geocoding is applied to get the latitude and longitude.

In [13]:
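df['geo'] = df.location.apply(geocode)  # one rate-limited request per row
# geocode returns None when nothing is found, so guard before reading coordinates.
df['lat'] = df.geo.apply(lambda x: x.latitude if x else pd.NA)
df['long'] = df.geo.apply(lambda x: x.longitude if x else pd.NA)
# Use the text before the first comma as the display name.
df['name'] = df.location.apply(lambda x: x.split(',')[0])
df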

| | name | Sport | routes | url | lat | long | location | geo |
|---|---|---|---|---|---|---|---|---|
| 0 | Cliffs in the Ord River The Kimberley | Cliff Diving | > 12-28m | https://www.youtube-nocookie.com/embed/IMCyRdR... | -17.437711 | 127.999223 | Cliffs in the Ord River The Kimberley, WA | (Bungle Bungle Caravan Park & Tour Company, Gr... |
| 1 | Jervis Bay New South Wales | Cliff Diving | >6-20m | https://www.youtube-nocookie.com/embed/tTC18X3... | -35.04808 | 150.744677 | Jervis Bay New South Wales | (Jervis Bay, New South Wales, Australia, (-35.... |
| 2 | Jerusalem Bay Sydney | Cliff Diving | > 10m | https://www.youtube-nocookie.com/embed/vJh_N8M... | -33.589133 | 151.186589 | Jerusalem Bay Sydney, NSW | (Jerusalem Bay Trail, New South Wales, Austral... |
| 3 | Blairgowrie jumping rock Bridgwater Bay | Cliff Diving | > 7m | https://www.youtube-nocookie.com/embed/Qw0D5Ds... | -38.371971 | 144.766982 | Blairgowrie jumping rock Bridgwater Bay, Vic... | (Bridgewater Bay, Stairs, Blairgowrie VIC 3942... |
| 4 | Tasmania >5-15m | Cliff Diving | | https://www.youtube-nocookie.com/embed/M9yX335... | -42.040906 | 146.808732 | Tasmania >5-15m | (Tasmania, Australia, (-42.0409059, 146.8087323)) |
| 5 | Darwin | Cliff Diving | | https://www.youtube-nocookie.com/embed/3BTFtyW... | -12.463733 | 130.844445 | Darwin, NT >5-18m | (Darwin NT, Australia, (-12.4637333, 130.84444... |
| 6 | Watson Bay-Camp Cove Sydney | Cliff Diving | >6m | https://www.youtube-nocookie.com/embed/TFAf3rk... | -33.839307 | 151.278377 | Watson Bay-Camp Cove Sydney, NSW | (Camp Cove, New South Wales, Australia, (-33.8... |
| 7 | New South Wales >6-15m | Cliff Diving | | https://www.youtube-nocookie.com/embed/yNefNOT... | -31.253218 | 146.921099 | New South Wales >6-15m | (New South Wales, Australia, (-31.2532183, 146... |
| 8 | Tahmoor | Cliff Diving | | https://www.youtube-nocookie.com/embed/f3agMEx... | -34.228844 | 150.590756 | Tahmoor, NSW >6-14m | (Tahmoor NSW 2573, Australia, (-34.2288439, 15... |
| 9 | Mount Martha | Cliff Diving | | https://www.youtube-nocookie.com/embed/QQOZ9F8... | -38.267 | 145.018 | Mount Martha, Melbourne >5.50m | (Mount Martha VIC 3934, Australia, (-38.267, 1... |
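
In the output above, rows 4, 5, 7, 8 and 9 carry the height inside *location* (for example `Darwin, NT >5-18m`) and leave *routes* empty. Where there is no comma, the height also ends up in *name* (`Tasmania >5-15m`). One possible cleanup, sketched below, is to split a trailing height off the location string with a regular expression. The pattern is an assumption based only on the rows visible in this output, and `split_height` is a hypothetical helper, not part of the original notebook.

```python
import re

# Trailing height such as ">5-18m", "> 10m" or ">5.50m" at the end of the string.
HEIGHT_RE = re.compile(r"\s*(>\s*\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?\s*m)\s*$")


def split_height(location):
    """Return (location_without_height, height or None)."""
    match = HEIGHT_RE.search(location)
    if match is None:
        return location.strip(), None
    return location[:match.start()].strip(), match.group(1)


rows = [
    "Darwin, NT >5-18m",
    "Mount Martha, Melbourne >5.50m",
    "Jervis Bay New South Wales",
]
cleaned = [split_height(r) for r in rows]
```

Applied before geocoding, the first element could replace *location* and the second could fill *routes* when it is empty. Whether the geocoder returns better matches for the cleaned strings would need to be checked against the live service.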


In [62]:
# Drop the helper columns and write the result without the DataFrame index.
df.drop(columns=['geo', 'location']).to_excel('_data/cliffdiving.xlsx', index=False)