Let's extract information about the local weather in Calgary from The Weather Network website.

In [None]:
weather_url = "https://www.theweathernetwork.com/ca/weather/alberta/calgary"

import requests
result = requests.get(weather_url)

result.text[:100]

The page has information about the extended forecast for the next week, including time of day, temperature, and a brief description of the conditions.


The raw HTML is quite messy! Let's parse it with BeautifulSoup to make it more readable:

In [None]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(result.text, 'html.parser')

print(soup.prettify()[:1000])
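To see what `prettify()` does without hitting the network, here is a minimal sketch on a small, made-up HTML string. It re-indents the parse tree so that each tag and each text node sits on its own line, indented by its depth.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML fragment standing in for the real page
sample_html = "<div><h1><span>Calgary</span></h1></div>"
sample_soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns a string with one tag or text node per line, indented by depth
pretty = sample_soup.prettify()
print(pretty)

# Stripping the indentation recovers the nesting order of tags and text
tokens = [line.strip() for line in pretty.splitlines() if line.strip()]
```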

Let’s define a function to request and parse an HTML web page:

In [None]:
def getAndParseURL(url):
    # Download the page and parse its HTML into a navigable tree
    result = requests.get(url)
    soup = BeautifulSoup(result.text, 'html.parser')
    return soup

print(getAndParseURL(weather_url).prettify()[:1000])
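`getAndParseURL` parses whatever comes back, even an error page, and waits indefinitely if the server does not answer. A slightly more defensive variant might look like the sketch below. The names `parse_html` and `get_and_parse_url` are hypothetical, and the 10-second timeout is an arbitrary choice, not something the site requires.

```python
import requests
from bs4 import BeautifulSoup


def parse_html(html):
    """Parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def get_and_parse_url(url, timeout=10):
    """Hypothetical variant of getAndParseURL that fails loudly on HTTP errors."""
    # Give up if the server does not respond within `timeout` seconds
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return parse_html(response.text)
```

Keeping the parsing step separate also lets you test it on a saved copy of the page without making a request each time.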

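Let's inspect the page using Chrome DevTools (right-click the page and choose Inspect). The page title sits inside a `div` with the class `wx-info_card`, which we can look up with `find()`: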

In [5]:
getAndParseURL(weather_url).find("div", class_ = "wx-info_card")

<div class="wx-info_card"> <!-- TITLE --> <h1> <span>	Calgary, <abbr title="Alberta">AB</abbr> Weather	</span> </h1> <span class="updatedon" id="updatedon"></span> </div>
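Two details of `find()` are worth knowing before chaining more lookups onto it. First, `class_` matches if *any* of the element's classes equals the given string. Second, `find()` returns `None` when nothing matches. The sketch below uses made-up markup in which the second class name, `highlighted`, is hypothetical. It also shows `select_one()`, the CSS-selector equivalent of the same lookup.

```python
from bs4 import BeautifulSoup

# Made-up markup modelled on the output above; the "highlighted" class is hypothetical
html = '<div class="wx-info_card highlighted"><h1><span>Calgary</span></h1></div>'
demo_soup = BeautifulSoup(html, "html.parser")

# class_ matches even though the div carries a second class
card = demo_soup.find("div", class_="wx-info_card")

# The same lookup written as a CSS selector
card_css = demo_soup.select_one("div.wx-info_card")

# find() returns None when nothing matches, so chaining .h1 onto it would raise AttributeError
missing = demo_soup.find("div", class_="no-such-class")
title = missing.h1 if missing is not None else None
```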

Let’s dive deeper into the tree by chaining the child tags ‘h1’ and ‘span’:

In [6]:
getAndParseURL(weather_url).find("div", class_ = "wx-info_card").h1.span

<span>	Calgary, <abbr title="Alberta">AB</abbr> Weather	</span>
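Attribute-style access such as `.h1.span` is shorthand for calling `find()` with that tag name. Each step returns the first matching descendant, not necessarily a direct child. A small sketch on markup copied from the output above (whitespace simplified) confirms that both spellings return the same element. It also shows how to read a tag attribute such as the `title` of the `abbr` tag.

```python
from bs4 import BeautifulSoup

# Same structure as the printed output above, with the whitespace simplified
html = ('<div class="wx-info_card"><h1><span>Calgary, '
        '<abbr title="Alberta">AB</abbr> Weather</span></h1></div>')
card = BeautifulSoup(html, "html.parser").find("div", class_="wx-info_card")

# .h1.span is shorthand for find("h1").find("span")
via_attributes = card.h1.span
via_find = card.find("h1").find("span")

# Tag attributes are read like dictionary entries
abbr_title = via_attributes.abbr["title"]
```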

But we only need the text inside the ‘h1’ tag.
We can get it by calling .get_text() on the ‘h1’ element and then trimming the surrounding whitespace:

In [7]:
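# Fetch and parse the page once, then reuse the info card
info_card = getAndParseURL(weather_url).find("div", class_ = "wx-info_card")

result_1 = info_card.h1.get_text()
print(result_1)

# strip() removes the surrounding spaces and tabs; slicing with [2:-2] only works
# if there are exactly two whitespace characters on each side
result_2 = info_card.h1.get_text().strip()
print(result_2)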

 	Calgary, AB Weather	 
Calgary, AB Weather
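There are several ways to trim the whitespace, and they differ in how robust they are. The sketch below reproduces the `h1` markup from the output above, including a space and a tab on each side. Note that `get_text(strip=True)` strips each text fragment separately. Without a separator argument, the fragments run together.

```python
from bs4 import BeautifulSoup

# Reproduces the whitespace from the output above: a space and a tab on each side
html = '<h1> <span>\tCalgary, <abbr title="Alberta">AB</abbr> Weather\t</span> </h1>'
h1 = BeautifulSoup(html, "html.parser").h1

raw = h1.get_text()
sliced = raw[2:-2]                    # relies on exactly two whitespace characters per side
stripped = raw.strip()                # removes any amount of leading/trailing whitespace
joined = h1.get_text(" ", strip=True) # strips each fragment, then joins with a space
squashed = h1.get_text(strip=True)    # no separator: fragments are glued together
```

Here `stripped` and `joined` both give `'Calgary, AB Weather'`, while `squashed` gives `'Calgary,ABWeather'`.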


Next, let's look at the seven-day forecast section, which lists the temperatures, probability of precipitation (POP), wind, and hours of sun for each day.

In [8]:
getAndParseURL(weather_url).find("div", id = "sevenday_nitro").get_text()

'   Next 7 Days    ☀    Show/Hide   ☾        Feels like   Night   Day   POP   Wind ()   Wind gust ()   Hrs of Sun       \n\n'
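The text returned above contains only headings and icons; none of the forecast values appear. A likely explanation, which is an assumption not confirmed here, is that the page fills in the values with JavaScript after it loads, so `requests` only receives the empty template. If that is the case, the numbers would have to come from the data request the page makes (visible in the DevTools Network tab) or from a browser-automation tool such as Selenium. To confirm what was actually scraped, you can split the text into its labels. Here is a minimal sketch on the string above:

```python
import re

# The string returned by the previous cell, copied verbatim
section_text = '   Next 7 Days    ☀    Show/Hide   ☾        Feels like   Night   Day   POP   Wind ()   Wind gust ()   Hrs of Sun       \n\n'

# Labels are separated by runs of two or more whitespace characters
tokens = [t for t in re.split(r"\s{2,}", section_text.strip()) if t]
print(tokens)
```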

In [60]:
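seven_day = getAndParseURL(weather_url).find("div", class_ = "sevenDay")

# Collect the legend labels of the forecast table (find_all is the current name for findAll)
column_titles = [x.get_text() for x in seven_day.find_all("div", class_ = "legendColumn")]

# Prepend a title for the first column, which holds the date
column_titles.insert(0, 'Date')
column_titles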



['Date',
 'Feels like',
 'Night',
 'Day',
 'POP',
 'Wind ()',
 'Wind gust ()',
 'Hrs of Sun']
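The empty parentheses in ‘Wind ()’ and ‘Wind gust ()’ look like placeholders for units that were never filled in, plausibly for the same reason the values are missing. Tidier names make later DataFrame column access easier. A minimal sketch that drops a trailing "()":

```python
import re

# Column titles as printed above
titles = ['Date', 'Feels like', 'Night', 'Day', 'POP', 'Wind ()', 'Wind gust ()', 'Hrs of Sun']

# Drop a trailing "()" (and the space before it) where a unit was apparently not filled in
clean_titles = [re.sub(r"\s*\(\)$", "", title) for title in titles]
print(clean_titles)
```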

In [83]:
# Collect the text of every forecast row on the page
date_column = [a.get_text() for a in getAndParseURL(weather_url).find_all("div", class_ = "wxRow")]
date_column
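`get_text()` on a whole row flattens all of its cells into one string. If each `wxRow` holds one child element per cell, you can keep the cells apart instead. The markup below is hypothetical, and the real page's structure should be checked in DevTools first. `recursive=False` restricts `find_all()` to direct children, so nested elements are not counted twice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: we assume each "wxRow" div holds one child div per cell
html = """
<div class="wxRow"><div>Row A</div><div>cell 1</div><div>cell 2</div></div>
<div class="wxRow"><div>Row B</div><div>cell 3</div><div>cell 4</div></div>
"""
rows_soup = BeautifulSoup(html, "html.parser")

row_cells = []
for row in rows_soup.find_all("div", class_="wxRow"):
    # Only the row's direct child divs become cells
    cells = [cell.get_text(" ", strip=True) for cell in row.find_all("div", recursive=False)]
    row_cells.append(cells)
print(row_cells)
```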

We can now create a pandas DataFrame with these column titles to hold the forecast data.

In [49]:
import pandas as pd
weather_7day = pd.DataFrame(columns=column_titles)
weather_7day
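Once the rows have been scraped, the simplest option is to pass them all to the DataFrame constructor at once. The sketch below uses placeholder rows generated from the column titles, not real forecast data. It also shows `pd.concat`, which replaces the `DataFrame.append` method removed in pandas 2.0, for cases where rows arrive one at a time.

```python
import pandas as pd

# Column titles as printed earlier
titles = ['Date', 'Feels like', 'Night', 'Day', 'POP', 'Wind ()', 'Wind gust ()', 'Hrs of Sun']

# Placeholder rows for illustration only; real values would come from the scraped rows
demo_rows = [[f"{title} (row {i})" for title in titles] for i in range(2)]

# Build the frame from all rows at once
demo_7day = pd.DataFrame(demo_rows, columns=titles)

# Add one more row with pd.concat, renumbering the index
extra_row = pd.DataFrame([[f"{title} (row 2)" for title in titles]], columns=titles)
demo_7day = pd.concat([demo_7day, extra_row], ignore_index=True)
print(demo_7day.shape)  # (3, 8)
```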

Empty DataFrame
Columns: [Date, Feels like, Night, Day, POP, Wind (), Wind gust (), Hrs of Sun]
Index: []
