## Case study: Weather!

### Downloading weather data

We now know enough to extract information about the local weather from the National Weather Service website.

The forecast page for Boulder, CO is: https://forecast.weather.gov/MapClick.php?lat=40.0466&lon=-105.2523#.YwpRBy2B1f0

Time to Start Scraping!

We can now download the page and start parsing it. In the code below, we will:

*  Download the web page containing the forecast.
*  Create a BeautifulSoup object to parse the page.
*  Find the div with id seven-day-forecast, and assign it to seven_day.
*  Inside seven_day, find each individual forecast item.
*  Extract and print the first forecast item.


In [1]:
import requests
from bs4 import BeautifulSoup

page = requests.get("https://forecast.weather.gov/MapClick.php?lat=40.0466&lon=-105.2523#.YwpRBy2B1f0")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
print(forecast_items)

[<div class="tombstone-container"><p class="period-name">Today</p><p><img alt="Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. " class="forecast-icon" src="newimages/medium/wind_sct.png" title="Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. "/></p><p class="temp temp-high">High: 57 °F</p><p class="short-desc">Mostly Sunny<br/>and Breezy</p></div>, <div class="tombstone-container"><p class="period-name">Tonight</p><p><img alt="Tonight: A chance of rain showers after 11pm, mixing with snow after 4am.  Partly cloudy, with a low around 34. Windy, with a west wind 14 to 19 mph increasing to 27 to 32 mph after midnight.  Chance of precipitation is 30%. Little or no snow accumulation expected. " class="forecast-icon" src="DualImage.php?i=hi_nshwrs&amp;j=nrasn&amp;ip=30&amp;jp=30" title="Tonight: A chance of rain showers after 11pm, mixing with snow after 4am.  Partly cloudy, with a low around 34. Windy, with a west wind 14 to 
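One caveat with the lookup above: `soup.find` returns `None` when nothing matches, so a page without a `seven-day-forecast` element would make the following `find_all` call fail with an `AttributeError`. Below is a minimal sketch of a guard, run against a small inline HTML string rather than the live site. The helper name `get_forecast_items` and the trimmed sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, trimmed from the structure printed above.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Today</p></div>
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
</div>
"""


def get_forecast_items(html):
    """Return the forecast item tags, or an empty list if the container is absent."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:  # find() returns None when there is no match
        return []
    return seven_day.find_all(class_="tombstone-container")


items = get_forecast_items(SAMPLE_HTML)
print(len(items))

# A page without the container yields an empty list instead of an exception.
print(get_forecast_items("<html><body></body></html>"))
```

Returning an empty list keeps later loops and list comprehensions working; raising an error instead would be an equally reasonable choice if a missing forecast should stop the script.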

In [2]:
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Today
 </p>
 <p>
  <img alt="Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. " class="forecast-icon" src="newimages/medium/wind_sct.png" title="Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. "/>
 </p>
 <p class="temp temp-high">
  High: 57 °F
 </p>
 <p class="short-desc">
  Mostly Sunny
  <br/>
  and Breezy
 </p>
</div>



### Extracting information from the first forecast item

As we can see, the forecast item stored in tonight contains all the information we want. (Despite the variable name, the first item on this run is Today.) There are four pieces of information we can extract:

*  The name of the forecast item: in this case, Today.
*  The description of the conditions: this is stored in the title attribute of the img tag.
*  A short description of the conditions: in this case, Mostly Sunny and Breezy.
*  The temperature high: in this case, 57 degrees.


We’ll extract the name of the forecast item, the short description, and the temperature first, since they’re all similar:

In [3]:
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)

Today
Mostly Sunnyand Breezy
High: 57 °F
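The short description prints as `Mostly Sunnyand Breezy` because `get_text()` joins the text on either side of the `<br/>` tag with nothing in between. Here is a small sketch of one way around this, using the `separator` and `strip` arguments of `get_text`. The inline markup is copied from the forecast item shown earlier.

```python
from bs4 import BeautifulSoup

html = '<p class="short-desc">Mostly Sunny<br/>and Breezy</p>'
tag = BeautifulSoup(html, "html.parser").find(class_="short-desc")

# Default behaviour: text fragments are concatenated directly.
joined = tag.get_text()

# With a separator, each text fragment is stripped and joined with a space.
spaced = tag.get_text(separator=" ", strip=True)

print(joined)
print(spaced)
```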


Now, we can extract the title attribute from the img tag. To do this, we treat the tag object that find returns like a dictionary, and pass in the attribute we want as a key:

In [4]:
img = tonight.find("img")
desc = img['title']
print(desc)

Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. 
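Indexing a tag with `img['title']` raises a `KeyError` if the attribute is absent. The sketch below shows the more forgiving `get` method, which returns `None` (or a default you supply) instead. The second `<img>`, which has no `title`, and the `src` values are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<img class="forecast-icon" title="Today: Mostly sunny, with a high near 57." src="a.png"/>
<img class="forecast-icon" src="b.png"/>
"""
imgs = BeautifulSoup(html, "html.parser").find_all("img")

# Dictionary-style access works when the attribute exists.
with_title = imgs[0]["title"]

# get() avoids the KeyError when it does not.
missing = imgs[1].get("title")
fallback = imgs[1].get("title", "")

print(with_title)
print(missing)
print(repr(fallback))
```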


### Extract all periods!

Now that we know how to extract each individual piece of information, we can combine our knowledge with CSS selectors and list comprehensions to extract everything at once.

In the code below, we will:

*  Select all items with the class period-name inside an item with the class tombstone-container in seven_day.
*  Use a list comprehension to call the get_text method on each tag that is returned.

In [5]:
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['Today',
 'Tonight',
 'Monday',
 'Monday Night',
 'Tuesday',
 'Tuesday Night',
 "New Year's Day",
 'Wednesday Night',
 'Thursday']

As we can see above, our technique gets us each of the period names, in order.
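To make the selector concrete: `.tombstone-container .period-name` is a descendant selector, so it matches any element with class `period-name` nested anywhere inside an element with class `tombstone-container`. The sketch below, run on a trimmed sample of the markup (an assumption standing in for the live page), checks that it gives the same result as nesting `find_all` and `find` by hand.

```python
from bs4 import BeautifulSoup

html = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Today</p></div>
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match for a CSS selector; "#" selects by id.
seven_day = soup.select_one("#seven-day-forecast")

# One CSS selector ...
via_select = [t.get_text() for t in seven_day.select(".tombstone-container .period-name")]

# ... versus the equivalent nested lookups.
via_find = [
    item.find(class_="period-name").get_text()
    for item in seven_day.find_all(class_="tombstone-container")
]

print(via_select)
print(via_find)
```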

We can apply the same technique to get the other three fields:

In [6]:
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

print(short_descs)
print(temps)
print(descs)

['Mostly Sunnyand Breezy', 'Windy. ChanceShowers thenChanceRain/Snow', 'PatchyBlowing Dustand VeryWindy', 'Partly Cloudy', 'Mostly Sunny', 'Mostly Clear', 'Mostly Sunny', 'Mostly Cloudy', 'Slight ChanceSnow Showersthen MostlySunny']
['High: 57 °F', 'Low: 34 °F', 'High: 43 °F', 'Low: 18 °F', 'High: 36 °F', 'Low: 16 °F', 'High: 40 °F', 'Low: 25 °F', 'High: 45 °F']
['Today: Mostly sunny, with a high near 57. Breezy, with a west wind 15 to 20 mph. ', 'Tonight: A chance of rain showers after 11pm, mixing with snow after 4am.  Partly cloudy, with a low around 34. Windy, with a west wind 14 to 19 mph increasing to 27 to 32 mph after midnight.  Chance of precipitation is 30%. Little or no snow accumulation expected. ', 'Monday: Patchy blowing dust before 3pm. Sunny, with a high near 43. Very windy, with a west wind 36 to 46 mph decreasing to 20 to 30 mph. Winds could gust as high as 70 mph. ', 'Monday Night: Partly cloudy, with a low around 18. West wind 6 to 11 mph becoming light  after midni
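The three list comprehensions above each query the whole container, so the lists line up only if every forecast item has every field. If one item lacked, say, a `temp` paragraph, the lists would differ in length and no longer correspond item by item. Here is a sketch of a per-item alternative that records `None` for a missing field. The sample markup (including a hypothetical `Advisory` item with no temperature) and the helper `text_or_none` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today</p>
    <p class="temp temp-high">High: 57 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Advisory</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p class="temp temp-low">Low: 34 °F</p>
  </div>
</div>
"""


def text_or_none(parent, css_class):
    """Text of the first descendant with the given class, or None if absent."""
    tag = parent.find(class_=css_class)
    return tag.get_text() if tag is not None else None


seven_day = BeautifulSoup(html, "html.parser").find(id="seven-day-forecast")

# One dictionary per forecast item keeps the fields of each item together.
rows = [
    {"period": text_or_none(item, "period-name"), "temp": text_or_none(item, "temp")}
    for item in seven_day.find_all(class_="tombstone-container")
]

# The container-wide query silently skips the item without a temperature.
flat_temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]

print(rows)
print(flat_temps)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame`, with the missing values showing up as empty cells.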

### Deal with data

We can now combine the data into a pandas DataFrame and analyze it. A DataFrame is an object that can store tabular data, making data analysis easy.

In order to do this, we’ll call the DataFrame class, and pass in each list of items that we have. We pass them in as part of a dictionary.

Each dictionary key will become a column in the DataFrame, and each list will become the values in the column:

In [7]:
import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc":descs
})
weather

|   | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Today | Mostly Sunnyand Breezy | High: 57 °F | Today: Mostly sunny, with a high near 57. Bree... |
| 1 | Tonight | Windy. ChanceShowers thenChanceRain/Snow | Low: 34 °F | Tonight: A chance of rain showers after 11pm, ... |
| 2 | Monday | PatchyBlowing Dustand VeryWindy | High: 43 °F | Monday: Patchy blowing dust before 3pm. Sunny,... |
| 3 | Monday Night | Partly Cloudy | Low: 18 °F | Monday Night: Partly cloudy, with a low around... |
| 4 | Tuesday | Mostly Sunny | High: 36 °F | Tuesday: Mostly sunny, with a high near 36. Ca... |
| 5 | Tuesday Night | Mostly Clear | Low: 16 °F | Tuesday Night: Mostly clear, with a low around... |
| 6 | New Year's Day | Mostly Sunny | High: 40 °F | New Year's Day: Mostly sunny, with a high near... |
| 7 | Wednesday Night | Mostly Cloudy | Low: 25 °F | Wednesday Night: Mostly cloudy, with a low aro... |
| 8 | Thursday | Slight ChanceSnow Showersthen MostlySunny | High: 45 °F | Thursday: A slight chance of snow showers befo... |
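As one example of the analysis a DataFrame makes possible, the sketch below pulls the number out of each `temp` string and flags the night-time periods. The column names `temp_num` and `is_night` are assumptions introduced here, and the three sample rows are copied from the table above.

```python
import pandas as pd

weather = pd.DataFrame({
    "period": ["Today", "Tonight", "Monday"],
    "temp": ["High: 57 °F", "Low: 34 °F", "High: 43 °F"],
})

# Capture the run of digits in each string; expand=False returns a Series.
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)

# Night-time periods report a low rather than a high.
weather["is_night"] = weather["temp"].str.contains("Low")

# Mean of the daytime highs only.
day_mean = weather.loc[~weather["is_night"], "temp_num"].mean()

print(weather)
print(day_mean)
```

Note that the pattern `\d+` would drop the sign of a sub-zero temperature; a pattern such as `(-?\d+)` would be needed for colder forecasts.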


Now let's save it to CSV.

In [9]:
weather.to_csv('Boulder_Weather_7_Days.csv')
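By default `to_csv` writes the row index as an unnamed first column, which is why a file saved this way gains an extra `Unnamed: 0` column when it is read back. Here is a small sketch of the difference, writing to in-memory buffers instead of files so that nothing is left on disk. The two-row frame is sample data.

```python
import io

import pandas as pd

weather = pd.DataFrame({
    "period": ["Today", "Tonight"],
    "temp": ["High: 57 °F", "Low: 34 °F"],
})

# Default: the index is written as a first column with an empty header.
with_index = io.StringIO()
weather.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index=False leaves the index out of the file.
without_index = io.StringIO()
weather.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = list(pd.read_csv(without_index).columns)

print(cols_default)
print(cols_no_index)
```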

## Your Task
Now use your location, and repeat the process!

In [10]:
url = 'https://forecast.weather.gov/MapClick.php?lat=40.6925&lon=-73.991'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]

In [11]:
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()

In [12]:
img = tonight.find("img")
desc = img['title']
print(desc,'\n')

period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods


This Afternoon: A 20 percent chance of showers before 1pm.  Mostly cloudy, with a high near 55. South wind 8 to 11 mph.  



['This Afternoon',
 'Tonight',
 'Monday',
 'Monday Night',
 'Tuesday',
 'Tuesday Night',
 "New Year's Day",
 'Wednesday Night',
 'Thursday']

In [13]:
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

print(short_descs)
print(temps)
print(descs)

['Slight ChanceShowers', 'ShowersLikely thenHeavy Rainand Breezy', 'Showers thenSunny', 'Mostly Clear', 'Mostly Sunnythen SlightChanceShowers', 'Rain', 'ChanceShowers', 'Mostly Cloudy', 'Mostly Sunny']
['High: 55 °F', 'Low: 55 °F', 'High: 56 °F⇓', 'Low: 41 °F', 'High: 52 °F', 'Low: 50 °F', 'High: 51 °F', 'Low: 35 °F', 'High: 41 °F']
['This Afternoon: A 20 percent chance of showers before 1pm.  Mostly cloudy, with a high near 55. South wind 8 to 11 mph. ', 'Tonight: Showers, mainly after 10pm. The rain could be heavy at times.  Steady temperature around 55. Breezy, with a south wind 14 to 20 mph.  Chance of precipitation is 100%. New precipitation amounts between a half and three quarters of an inch possible. ', 'Monday: Showers, mainly before 7am.  Temperature falling to around 52 by 5pm. West wind around 14 mph, with gusts as high as 26 mph.  Chance of precipitation is 80%. New precipitation amounts between a tenth and quarter of an inch possible. ', 'Monday Night: Mostly clear, with 

In [14]:
import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc":descs
})
weather

|   | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | This Afternoon | Slight ChanceShowers | High: 55 °F | This Afternoon: A 20 percent chance of showers... |
| 1 | Tonight | ShowersLikely thenHeavy Rainand Breezy | Low: 55 °F | Tonight: Showers, mainly after 10pm. The rain ... |
| 2 | Monday | Showers thenSunny | High: 56 °F⇓ | Monday: Showers, mainly before 7am. Temperatu... |
| 3 | Monday Night | Mostly Clear | Low: 41 °F | Monday Night: Mostly clear, with a low around ... |
| 4 | Tuesday | Mostly Sunnythen SlightChanceShowers | High: 52 °F | Tuesday: A 20 percent chance of showers after ... |
| 5 | Tuesday Night | Rain | Low: 50 °F | Tuesday Night: Rain. Steady temperature aroun... |
| 6 | New Year's Day | ChanceShowers | High: 51 °F | New Year's Day: A 50 percent chance of showers... |
| 7 | Wednesday Night | Mostly Cloudy | Low: 35 °F | Wednesday Night: Mostly cloudy, with a low aro... |
| 8 | Thursday | Mostly Sunny | High: 41 °F | Thursday: Mostly sunny, with a high near 41. |


In [16]:
weather.to_csv('Brooklyn_Weather_7_Days.csv', index=False)
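
Since the cells for the second location repeat the Boulder ones step for step, the process could be folded into one function. The following is a sketch under stated assumptions: `forecast_frame` is a hypothetical helper name, it takes the page HTML (for example `requests.get(url).content`), and it assumes every forecast item carries all four fields. The sample markup is the first Boulder forecast item shown earlier.

```python
import pandas as pd
from bs4 import BeautifulSoup


def forecast_frame(html):
    """Build a DataFrame of forecast periods from the HTML of a forecast page."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:
        raise ValueError("no element with id 'seven-day-forecast' found")

    def texts(selector):
        # Text of every element under seven_day that matches the selector.
        return [tag.get_text() for tag in seven_day.select(selector)]

    return pd.DataFrame({
        "period": texts(".tombstone-container .period-name"),
        "short_desc": texts(".tombstone-container .short-desc"),
        "temp": texts(".tombstone-container .temp"),
        "desc": [img["title"] for img in seven_day.select(".tombstone-container img")],
    })


# Sample input standing in for a downloaded page.
sample = (
    '<div id="seven-day-forecast"><div class="tombstone-container">'
    '<p class="period-name">Today</p>'
    '<p><img class="forecast-icon" title="Today: Mostly sunny, with a high near 57."/></p>'
    '<p class="temp temp-high">High: 57 °F</p>'
    '<p class="short-desc">Mostly Sunny<br/>and Breezy</p>'
    '</div></div>'
)

df = forecast_frame(sample)
print(df)
```

Taking the HTML as an argument, rather than a URL, keeps the download separate from the parsing, so the function can be tried on saved pages without contacting the website.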