# Basic BeautifulSoup Demo
Modified from [eholowko](https://github.com/eholowko/bootcamp-jupyter-web_scraping/blob/master/Weather_in_San_Francisco.ipynb)

**Goal**: Scrape weather.gov to collect the "Extended Forecast" information:

- Day: ```period```
- Short Description: ```short_desc```
- Full Description: ```desc```
- Temperature in Fahrenheit: ```temp_fahrenheit```

## Tasks to Achieve the Goal

1. Get the "Extended Forecast" from weather.gov for Blacksburg, VA.
2. Isolate the "Extended Forecast" information.
3. Clean the data into the above variables.
4. Output the data as a CSV file.


## Import code libs

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timedelta

## "Get" a requested web page

In [2]:
bb_page = requests.get("https://forecast.weather.gov/MapClick.php?lat=37.2265&lon=-80.4109")
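A variation on the cell above, offered as a sketch rather than the notebook's own method: let ```requests``` build the query string from a ```params``` dict, set a ```timeout```, and call ```raise_for_status()``` so an HTTP error fails loudly instead of producing an empty soup later. ```FORECAST_URL```, ```build_forecast_request()``` and ```fetch_forecast()``` are hypothetical names. The demo only prepares the request, so it runs without network access.

```python
import requests

# Base URL from the cell above; lat/lon become query parameters instead of being hard-coded
FORECAST_URL = "https://forecast.weather.gov/MapClick.php"


def build_forecast_request(lat, lon):
    """Build (but do not send) a GET request for a point forecast page."""
    request = requests.Request("GET", FORECAST_URL, params={"lat": lat, "lon": lon})
    return request.prepare()


def fetch_forecast(lat, lon, timeout=10):
    """Hypothetical helper: send the request, then raise on 4xx/5xx responses."""
    with requests.Session() as session:
        response = session.send(build_forecast_request(lat, lon), timeout=timeout)
    response.raise_for_status()
    return response


prepared = build_forecast_request(37.2265, -80.4109)
print(prepared.url)
# https://forecast.weather.gov/MapClick.php?lat=37.2265&lon=-80.4109
```

Calling ```fetch_forecast(37.2265, -80.4109)``` would return the same kind of ```Response``` object as ```bb_page```.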

## Parse the request's ```Response``` object ```bb_page```

The ```requests.get()``` call returned a [```Response```](https://requests.readthedocs.io/en/latest/api/#requests.Response) object, which we assigned to the variable ```bb_page```. Like most Python objects, it has several properties you can access.

Two of those properties give you the HTML in different forms:

1. ```bb_page.text``` returns the HTML as a Unicode string (```str```).
2. ```bb_page.content``` returns the HTML as ```bytes```.

Below, we pass ```bb_page.content``` to BeautifulSoup (BS) so it can help us parse the "soup" of HTML.
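To make the difference between the two properties concrete, here is a small offline sketch. The HTML strings are made-up stand-ins for ```bb_page.text``` and ```bb_page.content```, not the real page. As we understand it, ```requests``` builds ```.text``` by decoding the body with the encoding it takes from the HTTP headers (or guesses), while handing ```.content``` to BeautifulSoup lets BS choose the encoding itself, for example from a ```<meta charset>``` tag.

```python
from bs4 import BeautifulSoup

# Stand-ins for bb_page.text (str) and bb_page.content (bytes); the <meta> tag declares UTF-8
html_str = '<html><head><meta charset="utf-8"></head><body><p>High: 68 °F</p></body></html>'
html_bytes = html_str.encode("utf-8")

soup_from_bytes = BeautifulSoup(html_bytes, "html.parser")
soup_from_str = BeautifulSoup(html_str, "html.parser")

print(type(html_bytes).__name__, type(html_str).__name__)  # bytes str

# With bytes, BeautifulSoup detects the encoding itself (here, from the <meta> tag)
print(soup_from_bytes.original_encoding)  # utf-8

# With a str, decoding already happened upstream, so there is no original encoding
print(soup_from_str.original_encoding)  # None

# Either way, the parsed text is the same
print(soup_from_bytes.p.get_text())  # High: 68 °F
```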

(Hey, I didn't name these things. lol)

In [3]:
soup = BeautifulSoup(bb_page.content, 'html.parser')

Use the ```.find()``` method to isolate a single element. In this case, we look for the element whose ```id``` attribute is ```seven-day-forecast```.

In [4]:
extended_forecast = soup.find(id="seven-day-forecast")
print(extended_forecast.prettify())

<div class="panel panel-default" id="seven-day-forecast">
 <div class="panel-heading">
  <b>
   Extended Forecast for
  </b>
  <h2 class="panel-title">
   Blacksburg VA
  </h2>
 </div>
 <div class="panel-body" id="seven-day-forecast-body">
  <div id="seven-day-forecast-container">
   <ul class="list-unstyled" id="seven-day-forecast-list">
    <li class="forecast-tombstone">
     <div class="tombstone-container">
      <p class="period-name">
       Veterans
       <br/>
       Day
      </p>
      <p>
       <img alt="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. " class="forecast-icon" src="newimages/medium/shra100.png" title="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. "/>
      </

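Use the ```.find_all()``` method to get ALL of the elements whose ```class``` attribute is ```tombstone-container```. Note the trailing underscore in ```class_```: ```class``` is a reserved word in Python, so BeautifulSoup uses ```class_``` as the keyword argument.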
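A small offline sketch of how ```class_``` matching behaves, on hypothetical markup modeled on the tombstones in this notebook. Because ```class``` can hold several space-separated values, ```class_="temp"``` also matches ```class="temp temp-high"```, which is why ```.find(class_="temp")``` works for both highs and lows.

```python
from bs4 import BeautifulSoup

html = """
<div class="tombstone-container">
  <p class="temp temp-high">High: 68 °F</p>
</div>
<div class="tombstone-container">
  <p class="temp temp-low">Low: 53 °F</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ (with the underscore) filters on the class attribute
containers = soup.find_all(class_="tombstone-container")

# class is multi-valued: class_="temp" matches both "temp temp-high" and "temp temp-low"
temps = [p.get_text() for p in soup.find_all(class_="temp")]

# A more specific class narrows the match to the high temperature only
highs = soup.find_all(class_="temp-high")

print(len(containers))  # 2
print(temps)  # ['High: 68 °F', 'Low: 53 °F']
print(len(highs))  # 1
```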

In [5]:
forecast_items = extended_forecast.find_all(class_="tombstone-container")

print(forecast_items)

[<div class="tombstone-container">
<p class="period-name">Veterans<br/>Day</p>
<p><img alt="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. " class="forecast-icon" src="newimages/medium/shra100.png" title="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. "/></p><p class="short-desc">Showers</p><p class="temp temp-high">High: 68 °F</p></div>, <div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Showers, mainly before 11pm.  Low around 53. South wind 7 to 13 mph becoming west after midnight. Winds could gust as high as 26 mph.  Chance of precipitation is 90%. New precipitation amounts between a quarter and half of an inch possible. " class="fore

Check the first item to make sure it contains the data you want.

In [6]:
today = forecast_items[0]
print(today.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Veterans
  <br/>
  Day
 </p>
 <p>
  <img alt="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. " class="forecast-icon" src="newimages/medium/shra100.png" title="Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. "/>
 </p>
 <p class="short-desc">
  Showers
 </p>
 <p class="temp temp-high">
  High: 68 °F
 </p>
</div>


In [7]:
period = today.find(class_="period-name").get_text()
short_desc = today.find(class_="short-desc").get_text()
temp = today.find(class_="temp").get_text()

print(period)
print(short_desc)
print(temp)

VeteransDay
Showers
High: 68 °F
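The ```period``` output above reads ```VeteransDay``` because ```get_text()``` joins the text on either side of the ```<br/>``` with nothing in between. A sketch of one possible fix, using the ```separator``` and ```strip``` arguments of ```get_text()``` on the same markup shown in the cells above:

```python
from bs4 import BeautifulSoup

# Same structure as the period-name elements printed above
soup = BeautifulSoup(
    '<p class="period-name">Veterans<br/>Day</p>'
    '<p class="period-name">Tonight<br/><br/></p>',
    "html.parser",
)
veterans_day, tonight = soup.find_all(class_="period-name")

print(veterans_day.get_text())  # VeteransDay

# separator goes between text fragments; strip=True trims each fragment and drops empty ones
print(veterans_day.get_text(" ", strip=True))  # Veterans Day
print(tonight.get_text(" ", strip=True))  # Tonight
```

The same arguments could be used in the list comprehensions below if you want ```Saturday Night``` instead of ```SaturdayNight```.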


In [8]:
img = today.find("img")

desc = img["title"]
print(desc)

Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. 
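Square-bracket access like ```img["title"]``` raises ```KeyError``` if the attribute is missing, while ```.get()``` returns a default. A minimal sketch with made-up ```<img>``` tags, one of which deliberately has no ```title```:

```python
from bs4 import BeautifulSoup

# Two stand-in icons: the first mirrors the forecast <img> above, the second has no title
soup = BeautifulSoup(
    '<img class="forecast-icon" title="Veterans Day: Showers and possibly a thunderstorm."/>'
    '<img class="forecast-icon"/>',
    "html.parser",
)
with_title, without_title = soup.find_all("img")

print(with_title["title"])  # Veterans Day: Showers and possibly a thunderstorm.

# Square brackets raise KeyError when the attribute is missing...
try:
    without_title["title"]
except KeyError:
    print("no title attribute")

# ...while .get() returns a default (None unless you pass one)
print(without_title.get("title"))  # None
print(repr(without_title.get("title", "")))  # ''
```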


In [9]:
period_tags = extended_forecast.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['VeteransDay',
 'Tonight',
 'Saturday',
 'SaturdayNight',
 'Sunday',
 'SundayNight',
 'Monday',
 'MondayNight',
 'Tuesday']

In [10]:
short_desc = [sd.get_text() for sd in extended_forecast.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in extended_forecast.select(".tombstone-container .temp")]
desc = [d["title"] for d in extended_forecast.select(".tombstone-container img")]
print(short_desc)
print(temps)
print(desc)

['Showers', 'Showers thenChanceShowers', 'ChanceShowers', 'Slight ChanceShowers', 'Mostly Sunny', 'Mostly Clear', 'Mostly Sunny', 'Partly Cloudy', 'ChanceRain/Snowthen ChanceRain']
['High: 68 °F', 'Low: 53 °F', 'High: 57 °F', 'Low: 32 °F', 'High: 41 °F', 'Low: 24 °F', 'High: 45 °F', 'Low: 26 °F', 'High: 41 °F']
['Veterans Day: Showers and possibly a thunderstorm.  High near 68. Southeast wind around 10 mph.  Chance of precipitation is 100%. New rainfall amounts between a quarter and half of an inch possible. ', 'Tonight: Showers, mainly before 11pm.  Low around 53. South wind 7 to 13 mph becoming west after midnight. Winds could gust as high as 26 mph.  Chance of precipitation is 90%. New precipitation amounts between a quarter and half of an inch possible. ', 'Saturday: A chance of showers, mainly after 11am.  Partly sunny, with a high near 57. West wind around 6 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. ', 'Saturday Nigh
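The cell above builds four separate lists. If a tombstone ever lacked one of these elements, the lists would have different lengths and ```pd.DataFrame()``` would raise a ```ValueError```. One alternative, sketched here on hypothetical markup, is to walk each ```tombstone-container``` and build one record per period, filling in ```None``` for anything missing. ```text_or_none()``` and ```weather_sketch``` are illustrative names, not part of the original notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Two stand-in tombstones; the second deliberately lacks a temperature
html = """
<ul>
  <li><div class="tombstone-container">
    <p class="period-name">Veterans<br/>Day</p>
    <p><img title="Veterans Day: Showers." /></p>
    <p class="short-desc">Showers</p>
    <p class="temp temp-high">High: 68 °F</p>
  </div></li>
  <li><div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img title="Tonight: Showers." /></p>
    <p class="short-desc">Showers</p>
  </div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")


def text_or_none(container, css_class):
    """Return the element's text (fragments joined by spaces), or None if it is missing."""
    tag = container.find(class_=css_class)
    return tag.get_text(" ", strip=True) if tag is not None else None


rows = []
for container in soup.select(".tombstone-container"):
    img = container.find("img")
    # One dict per forecast period keeps every field aligned with its own period
    rows.append({
        "period": text_or_none(container, "period-name"),
        "short_desc": text_or_none(container, "short-desc"),
        "temp": text_or_none(container, "temp"),
        "desc": img.get("title") if img is not None else None,
    })

weather_sketch = pd.DataFrame(rows)
print(weather_sketch)
```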

In [11]:
weather = pd.DataFrame({"period":periods,
                       "short_desc":short_desc,
                       "temp":temps,
                       "desc":desc})
weather

| | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | VeteransDay | Showers | High: 68 °F | Veterans Day: Showers and possibly a thunderst... |
| 1 | Tonight | Showers thenChanceShowers | Low: 53 °F | Tonight: Showers, mainly before 11pm. Low aro... |
| 2 | Saturday | ChanceShowers | High: 57 °F | Saturday: A chance of showers, mainly after 11... |
| 3 | SaturdayNight | Slight ChanceShowers | Low: 32 °F | Saturday Night: A slight chance of showers bef... |
| 4 | Sunday | Mostly Sunny | High: 41 °F | Sunday: Mostly sunny, with a high near 41. Nor... |
| 5 | SundayNight | Mostly Clear | Low: 24 °F | Sunday Night: Mostly clear, with a low around 24. |
| 6 | Monday | Mostly Sunny | High: 45 °F | Monday: Mostly sunny, with a high near 45. |
| 7 | MondayNight | Partly Cloudy | Low: 26 °F | Monday Night: Partly cloudy, with a low around... |
| 8 | Tuesday | ChanceRain/Snowthen ChanceRain | High: 41 °F | Tuesday: A chance of snow before 10am, then a ... |


In [12]:
temp_nums = weather["temp"].str.extract(r"(\d+)", expand=False)
weather["temp_fahrenheit"] = temp_nums.astype("int")
temp_nums

0    68
1    53
2    57
3    32
4    41
5    24
6    45
7    26
8    41
Name: temp, dtype: object
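The pattern ```(\d+)``` only captures digits, so a below-zero reading would lose its minus sign. Whether weather.gov writes negative temperatures as ```-5 °F``` is an assumption here, and that third sample value is made up. A sketch that allows an optional leading ```-``` and converts with ```pd.to_numeric()```:

```python
import pandas as pd

# Same format as the "temp" column; the negative value is a hypothetical example
temps = pd.Series(["High: 68 °F", "Low: 24 °F", "Low: -5 °F"])

# The original pattern drops the minus sign
print(temps.str.extract(r"(\d+)", expand=False).tolist())  # ['68', '24', '5']

# An optional "-" keeps it; to_numeric turns the extracted strings into integers
temp_fahrenheit = pd.to_numeric(temps.str.extract(r"(-?\d+)", expand=False))
print(temp_fahrenheit.tolist())  # [68, 24, -5]
```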

In [13]:
weather = weather.drop(["temp"],axis=1)

In [14]:
weather

| | period | short_desc | desc | temp_fahrenheit |
|---|---|---|---|---|
| 0 | VeteransDay | Showers | Veterans Day: Showers and possibly a thunderst... | 68 |
| 1 | Tonight | Showers thenChanceShowers | Tonight: Showers, mainly before 11pm. Low aro... | 53 |
| 2 | Saturday | ChanceShowers | Saturday: A chance of showers, mainly after 11... | 57 |
| 3 | SaturdayNight | Slight ChanceShowers | Saturday Night: A slight chance of showers bef... | 32 |
| 4 | Sunday | Mostly Sunny | Sunday: Mostly sunny, with a high near 41. Nor... | 41 |
| 5 | SundayNight | Mostly Clear | Sunday Night: Mostly clear, with a low around 24. | 24 |
| 6 | Monday | Mostly Sunny | Monday: Mostly sunny, with a high near 45. | 45 |
| 7 | MondayNight | Partly Cloudy | Monday Night: Partly cloudy, with a low around... | 26 |
| 8 | Tuesday | ChanceRain/Snowthen ChanceRain | Tuesday: A chance of snow before 10am, then a ... | 41 |


In [15]:
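# Strip the "Day Name:" prefix; regex=True is required because pandas 2.x defaults to literal matching
weather["desc"] = weather["desc"].str.replace(r"([\w]*[\s]*[\w]*:)([\w]*)", "", regex=True)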

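A note on the cell above: since pandas 2.0, ```Series.str.replace()``` treats the pattern as a literal string unless you pass ```regex=True```, so without that flag the day-name prefix would be left in place. A small sketch that applies the same pattern with ```regex=True```, plus a regex-free alternative that splits on the first colon; the sample strings are trimmed from the descriptions above.

```python
import pandas as pd

desc = pd.Series([
    "Veterans Day: Showers and possibly a thunderstorm.  High near 68.",
    "Saturday Night: A slight chance of showers before 1am.",
])

# The notebook's pattern, with regex=True and a raw string; strip() removes the leftover leading space
via_regex = desc.str.replace(r"([\w]*[\s]*[\w]*:)([\w]*)", "", regex=True).str.strip()

# Alternative without a regex: split once on the first colon and keep what follows it
via_split = desc.str.split(":", n=1).str[1].str.strip()

print(via_regex.tolist())
print(via_regex.equals(via_split))  # True
```

The split version is easier to read, but it assumes every description contains at least one colon; a row without one would come back as missing.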


Add a date to each forecast period. The periods come in day/night pairs, so both halves of a pair share the same date.

In [16]:
def add_datetime(df_weather, current_day):
  '''
  Add a date to each forecast period.

  Periods come in day/night pairs, so rows 0 and 1 get current_day,
  rows 2 and 3 get current_day + 1 day, and so on. Assumes the first
  period is a daytime period, as it is in this forecast.
  '''
  dict_weather = df_weather.to_dict('records')
  dict_weather_date = []
  for index, row in enumerate(dict_weather):
    # Integer division maps each day/night pair to the same day offset
    day_offset = index // 2

    dict_weather_date.append({
      'date': current_day + timedelta(days=day_offset),
      'period': row['period'],
      'short_desc': row['short_desc'],
      'desc': row['desc'],
      'temp_fahrenheit': row['temp_fahrenheit']
    })

  df_weather_date = pd.DataFrame(dict_weather_date)

  return df_weather_date

In [17]:
current_day = datetime.now()

df_weather_datetime = add_datetime(weather, current_day)

df_weather_datetime

| | date | period | short_desc | desc | temp_fahrenheit |
|---|---|---|---|---|---|
| 0 | 2022-11-11 14:58:26.984030 | VeteransDay | Showers | Showers and possibly a thunderstorm. High ne... | 68 |
| 1 | 2022-11-11 14:58:26.984036 | Tonight | Showers thenChanceShowers | Showers, mainly before 11pm. Low around 53. ... | 53 |
| 2 | 2022-11-12 14:58:26.984037 | Saturday | ChanceShowers | A chance of showers, mainly after 11am. Part... | 57 |
| 3 | 2022-11-12 14:58:26.984058 | SaturdayNight | Slight ChanceShowers | A slight chance of showers before 1am. Mostl... | 32 |
| 4 | 2022-11-13 14:58:26.984063 | Sunday | Mostly Sunny | Mostly sunny, with a high near 41. Northwest ... | 41 |
| 5 | 2022-11-13 14:58:26.984064 | SundayNight | Mostly Clear | Mostly clear, with a low around 24. | 24 |
| 6 | 2022-11-14 14:58:26.984066 | Monday | Mostly Sunny | Mostly sunny, with a high near 45. | 45 |
| 7 | 2022-11-14 14:58:26.984066 | MondayNight | Partly Cloudy | Partly cloudy, with a low around 26. | 26 |
| 8 | 2022-11-15 14:58:26.984067 | Tuesday | ChanceRain/Snowthen ChanceRain | A chance of snow before 10am, then a chance o... | 41 |


In [18]:
# Output as CSV
df_weather_datetime.to_csv('./weather-gov-seven-day-forecast.csv', index=False)
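One way to sanity-check the export is to read it back. A sketch that writes a tiny, made-up stand-in for ```df_weather_datetime``` to an in-memory buffer instead of a file. Without ```parse_dates```, ```read_csv()``` would return the ```date``` column as plain strings.

```python
import io
from datetime import datetime

import pandas as pd

# A small, made-up stand-in modeled on df_weather_datetime
df = pd.DataFrame({
    "date": [datetime(2022, 11, 11, 14, 58), datetime(2022, 11, 11, 14, 58)],
    "period": ["VeteransDay", "Tonight"],
    "temp_fahrenheit": [68, 53],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # same call as above, but into memory instead of a file
buffer.seek(0)

# Ask read_csv to turn the "date" strings back into timestamps
round_trip = pd.read_csv(buffer, parse_dates=["date"])
print(round_trip.dtypes)
print(round_trip["date"].tolist() == df["date"].tolist())  # True
```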