## Scraping Events on visitseattle.org

https://visitseattle.org/

- Where is the data we need?
  - List page
  - Detail page
- How do we page through the list pages?
  - URL parameters
  - Pagination
- How do we get data from a detail page?
  - HTML structure
  - CSS selectors
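
To make the pagination step concrete, here is a minimal sketch that builds list-page URLs. It assumes every list page follows the same `/events/page/N` pattern as the `/events/page/1` URL requested in this notebook; the helper name `list_page_urls` is hypothetical, and the number of pages that actually exist is not known from this notebook.

```python
# Pattern assumed from the list-page URL requested in this notebook.
LIST_PAGE_TEMPLATE = "https://visitseattle.org/events/page/{page}"


def list_page_urls(first_page: int, last_page: int) -> list[str]:
    """Return list-page URLs for pages first_page..last_page (inclusive)."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [LIST_PAGE_TEMPLATE.format(page=n) for n in range(first_page, last_page + 1)]


# Print the first three list-page URLs; each could then be passed to requests.get().
for url in list_page_urls(1, 3):
    print(url)
```

Each URL can be fetched and parsed the same way as page 1, collecting the event links from every page into one list.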

In [3]:
import requests

In [4]:
res = requests.get("https://visitseattle.org/events/page/1")

In [5]:
# Save a local copy so the HTML structure can be inspected offline
with open("visitseattle.html", "w", encoding="utf-8") as f:
    f.write(res.text)

In [6]:
from bs4 import BeautifulSoup

In [7]:
soup = BeautifulSoup(res.text, "html.parser")

In [9]:
links = soup.select("#searchform div.search-result-preview > div > h3 > a")
links

[<a href="https://visitseattle.org/events/glen-teriyaki/" title="Glen Teriyaki">Glen Teriyaki </a>,
 <a href="https://visitseattle.org/events/greta-matassa-sextet/" title="Greta Matassa Sextet">Greta Matassa Sextet </a>,
 <a href="https://visitseattle.org/events/holding-absence/" title="Holding Absence">Holding Absence </a>,
 <a href="https://visitseattle.org/events/nellie-mckay/" title="Nellie McKay">Nellie McKay </a>,
 <a href="https://visitseattle.org/events/amber-liu/" title="Amber Liu">Amber Liu </a>,
 <a href="https://visitseattle.org/events/disability-justice/" title="Disability Justice">Disability Justice </a>,
 <a href="https://visitseattle.org/events/hughes-bros-presents/" title="Hughes Bros Presents">Hughes Bros Presents </a>,
 <a href="https://visitseattle.org/events/sarya-wu/" title="sarya wu">sarya wu </a>,
 <a href="https://visitseattle.org/events/the-sweet-lillies/" title="The Sweet Lillies">The Sweet Lillies </a>]

In [10]:
urls = [link["href"] for link in links]
urls

['https://visitseattle.org/events/glen-teriyaki/',
 'https://visitseattle.org/events/greta-matassa-sextet/',
 'https://visitseattle.org/events/holding-absence/',
 'https://visitseattle.org/events/nellie-mckay/',
 'https://visitseattle.org/events/amber-liu/',
 'https://visitseattle.org/events/disability-justice/',
 'https://visitseattle.org/events/hughes-bros-presents/',
 'https://visitseattle.org/events/sarya-wu/',
 'https://visitseattle.org/events/the-sweet-lillies/']

In [12]:
# Fetch the detail page for the first event and save a local copy for inspection
url = urls[0]
res = requests.get(url)

with open("./visitseattle_detail.html", "w", encoding="utf-8") as f:
    f.write(res.text)

In [13]:
soup = BeautifulSoup(res.text, "html.parser")

In [18]:
title_ele = soup.select_one("div.medium-6.columns.event-top > h1")
title_ele

<h1 class="page-title" itemprop="headline">Glen Teriyaki</h1>

In [19]:
title_ele.text

'Glen Teriyaki'

In [15]:
soup.select("div.medium-6.columns.event-top > h4")

[<h4><span>1/16/2024</span> | <span> Sea Monster Lounge</span></h4>]
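
The `<h4>` holds the date and the venue in two `<span>` elements. A small sketch of how those two texts might be cleaned up follows. It assumes the date is always a single `M/D/YYYY` value, as in the output above; the helper name `parse_date_venue` is hypothetical, and other events may use a different date format.

```python
from datetime import date, datetime


def parse_date_venue(span_texts: list[str]) -> tuple[date, str]:
    """Turn the two <span> texts from the event <h4> into (date, venue)."""
    if len(span_texts) != 2:
        raise ValueError(f"expected 2 spans, got {len(span_texts)}")
    raw_date, raw_venue = (text.strip() for text in span_texts)
    # Assumption: a single M/D/YYYY date; a date range would raise ValueError here.
    event_date = datetime.strptime(raw_date, "%m/%d/%Y").date()
    return event_date, raw_venue


# Span texts copied from the <h4> output above. With BeautifulSoup they could come from:
# [s.text for s in soup.select("div.medium-6.columns.event-top > h4 > span")]
event_date, venue = parse_date_venue(["1/16/2024", " Sea Monster Lounge"])
print(event_date.isoformat(), venue)  # 2024-01-16 Sea Monster Lounge
```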

## Web API

### Weather.gov

- Documentation: https://www.weather.gov/documentation/services-web-api
- Points endpoint: https://api.weather.gov/points/{latitude},{longitude}

### Geo location

https://nominatim.openstreetmap.org/search.php?q=seattle&format=jsonv2
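
The search URL takes the place name in `q` and the response format in `format`. As a sketch, the query string can be built with `urllib.parse.urlencode` so that names containing spaces or commas are encoded safely. The helper name `nominatim_search_url` is hypothetical, and the venue query in the second call is only an illustration.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search.php"


def nominatim_search_url(place: str) -> str:
    """Build a Nominatim search URL that asks for jsonv2 output."""
    query = urlencode({"q": place, "format": "jsonv2"})
    return f"{NOMINATIM_SEARCH}?{query}"


print(nominatim_search_url("seattle"))
# A name with spaces and a comma is percent-encoded instead of breaking the URL.
print(nominatim_search_url("Sea Monster Lounge, Seattle"))
```

With `requests`, passing `params={"q": place, "format": "jsonv2"}` to `requests.get` should give the same encoding without building the string by hand.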


In [16]:
soup.select("div.medium-6.columns.event-top > a:nth-child(3)")

[<a class="button big medium black category" href="/?s=&amp;frm=events&amp;event_type=music">Music</a>]
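
The category is available twice: as the link text (`Music`) and as the `event_type` parameter in the link's `href`. A sketch of reading the parameter follows. It assumes the `href` has already been taken from the tag (BeautifulSoup should return it with `&amp;` decoded to `&`); the helper name `event_type_from_href` is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def event_type_from_href(href: str) -> Optional[str]:
    """Return the event_type query parameter of a category link, if present."""
    query = parse_qs(urlparse(href).query)
    values = query.get("event_type")
    return values[0] if values else None


# href copied from the category link above, with &amp; written as a plain &.
print(event_type_from_href("/?s=&frm=events&event_type=music"))  # music
```

The link text is easier to read, while the parameter value is a stable lowercase key, which may be handier for grouping events.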

## Practice

Please finish the scraper for this page
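
One possible shape for the finished scraper's output is a list of dictionaries written out as CSV. The sketch below covers only that last step. The field names (`name`, `date`, `venue`, `category`) and the helper `events_to_csv` are assumptions for illustration; the sample row reuses the values scraped above.

```python
import csv
import io

# Hypothetical column names for one scraped event.
FIELDS = ["name", "date", "venue", "category"]


def events_to_csv(events: list[dict]) -> str:
    """Serialize scraped events to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


sample = [
    {
        "name": "Glen Teriyaki",
        "date": "2024-01-16",
        "venue": "Sea Monster Lounge",
        "category": "Music",
    }
]
print(events_to_csv(sample))
```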

In [21]:
import requests

In [24]:
location_name = "seattle"
# Use location_name in the query; `location` is only defined once the response is parsed
res = requests.get(f"https://nominatim.openstreetmap.org/search.php?q={location_name}&format=jsonv2")
location = res.json()
location

[{'place_id': 312908827,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 237385,
  'lat': '47.6038321',
  'lon': '-122.330062',
  'category': 'boundary',
  'type': 'administrative',
  'place_rank': 16,
  'importance': 0.6729791735643788,
  'addresstype': 'city',
  'name': 'Seattle',
  'display_name': 'Seattle, King County, Washington, United States',
  'boundingbox': ['47.4810022', '47.7341354', '-122.4596960', '-122.2244330']}]

In [25]:
lat, lon = location[0]['lat'], location[0]['lon']
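
Nominatim returns `lat` and `lon` as strings with seven decimal places, while the `id` in the weather.gov response uses four. A sketch of converting and rounding before building the points URL follows. The four-decimal rounding is an assumption inferred from that `id`, and the helper name `points_url` is hypothetical.

```python
def points_url(lat: str, lon: str) -> str:
    """Build the api.weather.gov points URL from Nominatim's string coordinates."""
    lat_f, lon_f = float(lat), float(lon)
    if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    # Assumption: four decimal places are enough for the points endpoint.
    return f"https://api.weather.gov/points/{lat_f:.4f},{lon_f:.4f}"


# Coordinates copied from the Nominatim result for Seattle.
print(points_url("47.6038321", "-122.330062"))
# https://api.weather.gov/points/47.6038,-122.3301
```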

In [27]:
res = requests.get(f"https://api.weather.gov/points/{lat},{lon}")
weather_point = res.json()
weather_point

{'@context': ['https://geojson.org/geojson-ld/geojson-context.jsonld',
  {'@version': '1.1',
   'wx': 'https://api.weather.gov/ontology#',
   's': 'https://schema.org/',
   'geo': 'http://www.opengis.net/ont/geosparql#',
   'unit': 'http://codes.wmo.int/common/unit/',
   '@vocab': 'https://api.weather.gov/ontology#',
   'geometry': {'@id': 's:GeoCoordinates', '@type': 'geo:wktLiteral'},
   'city': 's:addressLocality',
   'state': 's:addressRegion',
   'distance': {'@id': 's:Distance', '@type': 's:QuantitativeValue'},
   'bearing': {'@type': 's:QuantitativeValue'},
   'value': {'@id': 's:value'},
   'unitCode': {'@id': 's:unitCode', '@type': '@id'},
   'forecastOffice': {'@type': '@id'},
   'forecastGridData': {'@type': '@id'},
   'publicZone': {'@type': '@id'},
   'county': {'@type': '@id'}}],
 'id': 'https://api.weather.gov/points/47.6038,-122.3301',
 'type': 'Feature',
 'geometry': {'type': 'Point', 'coordinates': [-122.3301, 47.6038]},
 'properties': {'@id': 'https://api.weather.gov

In [29]:
forecast_url = weather_point['properties']['forecast']
forecast_url

'https://api.weather.gov/gridpoints/SEW/125,68/forecast'

In [30]:
res = requests.get(forecast_url)
res.json()

{'@context': ['https://geojson.org/geojson-ld/geojson-context.jsonld',
  {'@version': '1.1',
   'wx': 'https://api.weather.gov/ontology#',
   'geo': 'http://www.opengis.net/ont/geosparql#',
   'unit': 'http://codes.wmo.int/common/unit/',
   '@vocab': 'https://api.weather.gov/ontology#'}],
 'type': 'Feature',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-122.338331, 47.6159569],
    [-122.33210799999999, 47.5954304],
    [-122.30157759999999, 47.5996357],
    [-122.30779399999999, 47.6201625],
    [-122.338331, 47.6159569]]]},
 'properties': {'updated': '2024-01-16T16:40:22+00:00',
  'units': 'us',
  'forecastGenerator': 'BaselineForecastGenerator',
  'generatedAt': '2024-01-16T17:50:05+00:00',
  'updateTime': '2024-01-16T16:40:22+00:00',
  'validTimes': '2024-01-16T10:00:00+00:00/P7DT15H',
  'elevation': {'unitCode': 'wmoUnit:m', 'value': 73.152},
  'periods': [{'number': 1,
    'name': 'Today',
    'startTime': '2024-01-16T09:00:00-08:00',
    'endTime': '2024-01-16T18:00:00-0
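
To attach a forecast to an event, the `periods` list can be filtered by the event's date. The sketch below compares the date part of each period's `startTime`. The payload is a trimmed, partly made-up stand-in for the real response: only the first period comes from the output above, and the helper name `periods_on` is hypothetical.

```python
from datetime import date, datetime


def periods_on(forecast: dict, day: date) -> list[dict]:
    """Return the forecast periods whose startTime falls on the given local date."""
    periods = forecast.get("properties", {}).get("periods", [])
    return [p for p in periods if datetime.fromisoformat(p["startTime"]).date() == day]


# Trimmed stand-in payload; periods 2 and 3 are illustrative, not real API output.
forecast = {
    "properties": {
        "periods": [
            {"number": 1, "name": "Today", "startTime": "2024-01-16T09:00:00-08:00"},
            {"number": 2, "name": "Tonight", "startTime": "2024-01-16T18:00:00-08:00"},
            {"number": 3, "name": "Wednesday", "startTime": "2024-01-17T06:00:00-08:00"},
        ]
    }
}

for period in periods_on(forecast, date(2024, 1, 16)):
    print(period["number"], period["name"])
```

The comparison uses the date in the period's own UTC offset, which seems appropriate when the event date is also local to Seattle.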