During the course, you will work on a term project to either pull data from an API or scrape a web page. You will need to select either an API (other than Twitter) or a web page and create a process in Python that extracts data into a formatted dataset.

* Your formatted dataset with at least 15-20 variables (if the API or Webpage you selected doesn’t have that many fields available on it, you will want to search again, or do multiple!)  

* Your code or screenshots of your code outlining the steps and process you had to take to pull data from the API or web page and the steps you took to format the data.  

* 2 Data Transformation/Clean-up Steps (can be any that we learned in class)  

* A 250-word paper summarizing your steps and any challenges you ran into during the project.  Discuss the importance and relevance of this type of process if you were a data scientist.  How often do you think you would have to do this to get the data you need? 

<font color="blue"> The following analysis contains 6 variables. Therefore, I will submit a second web scraping program to satisfy the 15-20 variables criterion. </font>

In this program, I collect weather data for my current location from *weather.gov*. I then load it into a *pandas* data frame, do some basic clean-up, and run a short analysis on the data.

In [1]:
import requests
import pandas as pd

from bs4 import BeautifulSoup

# Parameterize the location
latitude = 41.25861
longitude = -95.93779

In [5]:
# Request weather data for my current location (Omaha, NE)
url = "http://forecast.weather.gov/MapClick.php?lat="+str(latitude)+"&lon="+str(longitude)
page = requests.get(url)
# Parse the page content and store in a beautiful soup object
soup = BeautifulSoup(page.content, 'html.parser')

# List every <div> on the page to help locate the container for the seven-day forecast
find_day = soup.find_all('div')
type(find_day)


bs4.element.ResultSet
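
The request URL above is assembled by string concatenation. As an alternative, here is a minimal sketch that builds the same query string with the standard library, so the parameters stay in one dictionary. The names `BASE_URL` and `build_forecast_url` are hypothetical helpers introduced only for this illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the request cell above.
BASE_URL = "http://forecast.weather.gov/MapClick.php"


def build_forecast_url(lat, lon):
    """Return the forecast URL for one latitude/longitude pair."""
    query = urlencode({"lat": lat, "lon": lon})
    return f"{BASE_URL}?{query}"


omaha_url = build_forecast_url(41.25861, -95.93779)
print(omaha_url)
```

The resulting string can be passed to `requests.get` exactly as `url` is in the cell above.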

In [3]:
seven_day = soup.find(id='seven-day-forecast')
seven_day

<div class="panel panel-default" id="seven-day-forecast">
<div class="panel-heading">
<b>Extended Forecast for</b>
<h2 class="panel-title">
	    	    2 Miles SSW Carter Lake IA	</h2>
</div>
<div class="panel-body" id="seven-day-forecast-body">
<div class="headline-title">Winter Weather Advisory</div>
</div></div><ul class="list-unstyled" id="seven-day-forecast-list" style="padding-top: 80px"><li class="forecast-tombstone current-hazard current-hazard-advisory" onclick="$('#headline-detail-now').toggle(); $('#headline-detail').hide()">
<div class="top-bar"> <div id="headline-detail-now"><div>Winter Weather Advisory until February 23, 12:00pm</div></div><span class="tab"></span><span class="fa fa-info-circle"></span></div><div class="tombstone-container">
<p class="period-name">NOW until<br/>12:00pm Sat</p>
<p><img alt="" class="forecast-icon" src="DualImage.php?i=nsn_ip&amp;j=nfzra&amp;ip=90&amp;jp=80" title=""/></p><p class="short-desc">Winter Weather Advisory</p></div></li><li class="

In [30]:
# After searching for the tag and container
forecast_items = seven_day.find_all(class_="tombstone-container")

# To check the data, print tonight's forecast
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Tonight
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Tonight: Cloudy, with a low around 21. East wind around 7 mph. " class="forecast-icon" src="newimages/medium/novc.png" title="Tonight: Cloudy, with a low around 21. East wind around 7 mph. "/>
 </p>
 <p class="short-desc">
  Cloudy
 </p>
 <p class="temp temp-low">
  Low: 21 °F
 </p>
</div>


In [31]:
# Store the periods
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]

# Store the corresponding summary forecasts
summary_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]

# Store the corresponding temperature
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]

# Store the corresponding detail forecasts
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

print(periods)
print(summary_descs)
print(temps)
print(descs)

['Tonight', 'Friday', 'FridayNight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'Monday', 'MondayNight']
['Cloudy', 'Partly Sunnythen ChanceWintry Mix', 'Wintry MixLikely thenChanceFreezingDrizzle', 'Chance WintryMix thenWintry Mixand PatchyBlowing Snow', 'Heavy Snowand AreasBlowing Snow', 'PatchyBlowing Snowand Blustery', 'Partly Cloudy', 'Mostly Cloudy', 'Mostly Cloudythen ChanceSnow']
['Low: 21 °F', 'High: 32 °F', 'Low: 31 °F', 'High: 32 °F', 'Low: 19 °F', 'High: 24 °F', 'Low: 5 °F', 'High: 19 °F', 'Low: 9 °F']
['Tonight: Cloudy, with a low around 21. East wind around 7 mph. ', 'Friday: A slight chance of freezing drizzle between 1pm and 3pm, then a chance of snow and freezing drizzle.  Mostly cloudy, with a high near 32. Southeast wind 7 to 10 mph.  Chance of precipitation is 20%.', 'Friday Night: Freezing drizzle likely, possibly mixed with snow and sleet before 7pm, then freezing drizzle likely, possibly mixed with sleet between 7pm and 10pm, then freezing drizzle like
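
Several values in the output above run words together ('FridayNight', 'Partly Sunnythen ChanceWintry Mix'). This most likely happens because the page separates those words with `<br/>` tags, which `get_text()` drops without adding a space. The sketch below shows one possible fix using the `separator` and `strip` arguments of `get_text`. The HTML snippet is an assumption, reconstructed from the printed output rather than copied from the live page.

```python
from bs4 import BeautifulSoup

# Assumed markup: words separated only by <br/> tags, as the fused output suggests.
sample_html = (
    '<div class="tombstone-container">'
    '<p class="period-name">Friday<br/>Night</p>'
    '<p class="short-desc">Partly Sunny<br/>then Chance<br/>Wintry Mix</p>'
    '</div>'
)

sample_soup = BeautifulSoup(sample_html, "html.parser")
period_tag = sample_soup.select_one(".period-name")
summary_tag = sample_soup.select_one(".short-desc")

# Default behaviour: text fragments are joined with nothing in between.
fused_period = period_tag.get_text()

# Joining the fragments with a space keeps the words apart.
spaced_period = period_tag.get_text(separator=" ", strip=True)
spaced_summary = summary_tag.get_text(separator=" ", strip=True)

print(fused_period)
print(spaced_period)
print(spaced_summary)
```

Using the same arguments in the list comprehensions above would give readable period names and summaries without a separate clean-up step.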

In [32]:
# Create a pandas dataframe to store the next seven days of weather forecast
weather = pd.DataFrame({
        "Period": periods, 
        "Summary Forecast": summary_descs, 
        "Temperature": temps, 
        "Detail Forecast":descs
    })
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast
0,Tonight,Cloudy,Low: 21 °F,"Tonight: Cloudy, with a low around 21. East wi..."
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F,Friday: A slight chance of freezing drizzle be...
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F,"Friday Night: Freezing drizzle likely, possibl..."
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F,"Saturday: Freezing drizzle and sleet, possibly..."
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F,Saturday Night: Snow before midnight. The snow...
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F,"Sunday: Patchy blowing snow before 4pm. Sunny,..."
6,SundayNight,Partly Cloudy,Low: 5 °F,"Sunday Night: Partly cloudy, with a low around 5."
7,Monday,Mostly Cloudy,High: 19 °F,"Monday: Mostly cloudy, with a high near 19."
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F,Monday Night: A chance of snow after midnight....
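
Building the data frame from four separate lists only works while every list has the same length. The hazard entry visible in the raw HTML earlier ("NOW until 12:00pm Sat") has no temperature paragraph, so on a day with an active advisory the `temps` list can come out shorter than the others and `pd.DataFrame` raises a `ValueError`. The following is a sketch of one way around that, extracting one record per container. The markup is an assumed, simplified sample and `text_or_none` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Assumed markup: one hazard entry without a temperature, one regular entry.
sample_html = """
<ul id="seven-day-forecast-list">
  <li><div class="tombstone-container">
    <p class="period-name">NOW until<br/>12:00pm Sat</p>
    <p><img alt="" title=""/></p>
    <p class="short-desc">Winter Weather Advisory</p>
  </div></li>
  <li><div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img alt="" title="Tonight: Cloudy, with a low around 21."/></p>
    <p class="short-desc">Cloudy</p>
    <p class="temp temp-low">Low: 21 °F</p>
  </div></li>
</ul>
"""


def text_or_none(container, selector):
    """Return the text of the first match inside container, or None if absent."""
    tag = container.select_one(selector)
    return tag.get_text(" ", strip=True) if tag is not None else None


sample_soup = BeautifulSoup(sample_html, "html.parser")
rows = []
for container in sample_soup.select(".tombstone-container"):
    img = container.select_one("img")
    rows.append({
        "Period": text_or_none(container, ".period-name"),
        "Summary Forecast": text_or_none(container, ".short-desc"),
        "Temperature": text_or_none(container, ".temp"),  # None for hazard entries
        "Detail Forecast": img.get("title") if img is not None else None,
    })

for row in rows:
    print(row)
```

The list of dictionaries can be passed straight to `pd.DataFrame(rows)`. Missing temperatures then show up as empty cells instead of shifting the columns out of alignment.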


In [33]:
# To display the temperature in both Fahrenheit and centigrade, first extract the numeric value.
# The pattern is a raw string so that \d is not treated as a string escape.
temp_nums = weather["Temperature"].str.extract(r"(?P<temp_num>\d+)", expand=False)
weather["temp_num"] = temp_nums.astype('int')
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast,temp_num
0,Tonight,Cloudy,Low: 21 °F,"Tonight: Cloudy, with a low around 21. East wi...",21
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F,Friday: A slight chance of freezing drizzle be...,32
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F,"Friday Night: Freezing drizzle likely, possibl...",31
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F,"Saturday: Freezing drizzle and sleet, possibly...",32
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F,Saturday Night: Snow before midnight. The snow...,19
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F,"Sunday: Patchy blowing snow before 4pm. Sunny,...",24
6,SundayNight,Partly Cloudy,Low: 5 °F,"Sunday Night: Partly cloudy, with a low around 5.",5
7,Monday,Mostly Cloudy,High: 19 °F,"Monday: Mostly cloudy, with a high near 19.",19
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F,Monday Night: A chance of snow after midnight....,9
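
The pattern above captures digits only, which is fine for this forecast because every value is above zero. The sketch below shows how an optional minus sign could be captured as well. It assumes that a below-zero reading is printed with a leading `-`, and the `Low: -5 °F` value is invented purely to exercise that case.

```python
import pandas as pd

# Hypothetical sample; the last entry is made up to test sign handling.
sample_temps = pd.Series(["Low: 21 °F", "High: 32 °F", "Low: -5 °F"])

# Digits only: the minus sign is silently dropped.
unsigned = sample_temps.str.extract(r"(?P<temp_num>\d+)", expand=False).astype(int)

# Optional leading minus sign: the negative value survives.
signed = sample_temps.str.extract(r"(?P<temp_num>-?\d+)", expand=False).astype(int)

print(unsigned.tolist())
print(signed.tolist())
```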


In [34]:
# Convert the temperature to centigrade and store it in a variable
temp_cent = pd.to_numeric(temp_nums)
temp_cent = (temp_cent-32)*(5/9)
temp_cent

0    -6.111111
1     0.000000
2    -0.555556
3     0.000000
4    -7.222222
5    -4.444444
6   -15.000000
7    -7.222222
8   -12.777778
Name: temp_num, dtype: float64

In [35]:
# Check the average forecast temperature (°F) across all listed periods (highs and lows combined)
weather["temp_num"].mean()

21.333333333333332
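
The single mean above mixes daytime highs with overnight lows. As a possible refinement, the sketch below splits the label from the number and averages each group separately. The temperature strings are copied from the table earlier in this notebook; the column names `kind` and `deg_f` are assumptions chosen for the example.

```python
import pandas as pd

# Temperature strings copied from the scraped table.
sample_temps = pd.Series([
    "Low: 21 °F", "High: 32 °F", "Low: 31 °F", "High: 32 °F", "Low: 19 °F",
    "High: 24 °F", "Low: 5 °F", "High: 19 °F", "Low: 9 °F",
])

# Two named groups give a two-column frame: the label and the number.
parts = sample_temps.str.extract(r"(?P<kind>High|Low): (?P<deg_f>-?\d+)")
parts["deg_f"] = parts["deg_f"].astype(int)

# Average the highs and the lows separately.
mean_by_kind = parts.groupby("kind")["deg_f"].mean()
print(mean_by_kind)
```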

In [36]:
# Store the centigrade value as a dataframe column.
# Note: astype('int') truncates toward zero, so -12.78 is stored as -12, not -13.
weather["temp_C"] = temp_cent.astype('int').astype('str')
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast,temp_num,temp_C
0,Tonight,Cloudy,Low: 21 °F,"Tonight: Cloudy, with a low around 21. East wi...",21,-6
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F,Friday: A slight chance of freezing drizzle be...,32,0
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F,"Friday Night: Freezing drizzle likely, possibl...",31,0
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F,"Saturday: Freezing drizzle and sleet, possibly...",32,0
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F,Saturday Night: Snow before midnight. The snow...,19,-7
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F,"Sunday: Patchy blowing snow before 4pm. Sunny,...",24,-4
6,SundayNight,Partly Cloudy,Low: 5 °F,"Sunday Night: Partly cloudy, with a low around 5.",5,-15
7,Monday,Mostly Cloudy,High: 19 °F,"Monday: Mostly cloudy, with a high near 19.",19,-7
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F,Monday Night: A chance of snow after midnight....,9,-12
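
Casting a float column to `int` drops the fractional part, which is why 31 °F appears as 0 °C and 9 °F as -12 °C in the table above. If the nearest whole degree is preferred, one option is to round before casting. This is a minimal sketch on three of the values from this notebook, not a change to the cells above.

```python
import pandas as pd

deg_f = pd.Series([21, 31, 9])
deg_c = (deg_f - 32) * 5 / 9          # about -6.11, -0.56, -12.78

truncated = deg_c.astype(int)         # fractional part dropped (toward zero)
rounded = deg_c.round().astype(int)   # nearest whole degree

print(truncated.tolist())
print(rounded.tolist())
```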


In [37]:
# Update the temperature string in the Temperature column with both F and C 
weather["Temperature"] = weather["Temperature"] + " / " + weather["temp_C"] + "°C"
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast,temp_num,temp_C
0,Tonight,Cloudy,Low: 21 °F / -6°C,"Tonight: Cloudy, with a low around 21. East wi...",21,-6
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F / 0°C,Friday: A slight chance of freezing drizzle be...,32,0
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F / 0°C,"Friday Night: Freezing drizzle likely, possibl...",31,0
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F / 0°C,"Saturday: Freezing drizzle and sleet, possibly...",32,0
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F / -7°C,Saturday Night: Snow before midnight. The snow...,19,-7
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F / -4°C,"Sunday: Patchy blowing snow before 4pm. Sunny,...",24,-4
6,SundayNight,Partly Cloudy,Low: 5 °F / -15°C,"Sunday Night: Partly cloudy, with a low around 5.",5,-15
7,Monday,Mostly Cloudy,High: 19 °F / -7°C,"Monday: Mostly cloudy, with a high near 19.",19,-7
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F / -12°C,Monday Night: A chance of snow after midnight....,9,-12


In [38]:
# Add a flag for night weather
is_night = weather["Period"].str.contains("night|Night")
weather["Is_Night"] = is_night
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast,temp_num,temp_C,Is_Night
0,Tonight,Cloudy,Low: 21 °F / -6°C,"Tonight: Cloudy, with a low around 21. East wi...",21,-6,True
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F / 0°C,Friday: A slight chance of freezing drizzle be...,32,0,False
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F / 0°C,"Friday Night: Freezing drizzle likely, possibl...",31,0,True
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F / 0°C,"Saturday: Freezing drizzle and sleet, possibly...",32,0,False
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F / -7°C,Saturday Night: Snow before midnight. The snow...,19,-7,True
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F / -4°C,"Sunday: Patchy blowing snow before 4pm. Sunny,...",24,-4,False
6,SundayNight,Partly Cloudy,Low: 5 °F / -15°C,"Sunday Night: Partly cloudy, with a low around 5.",5,-15,True
7,Monday,Mostly Cloudy,High: 19 °F / -7°C,"Monday: Mostly cloudy, with a high near 19.",19,-7,False
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F / -12°C,Monday Night: A chance of snow after midnight....,9,-12,True
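
The flag above matches both spellings by listing them in the pattern ("night|Night"). An equivalent and slightly more general approach is a case-insensitive match, sketched here on a few period names from the table.

```python
import pandas as pd

sample_periods = pd.Series(["Tonight", "Friday", "FridayNight", "Saturday"])

# case=False matches "night", "Night", "NIGHT", and so on.
sample_is_night = sample_periods.str.contains("night", case=False)
print(sample_is_night.tolist())
```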


In [39]:
# Prepare final dataset and get rid of redundant columns
weather = weather.drop(columns=["temp_num","temp_C"])
weather

Unnamed: 0,Period,Summary Forecast,Temperature,Detail Forecast,Is_Night
0,Tonight,Cloudy,Low: 21 °F / -6°C,"Tonight: Cloudy, with a low around 21. East wi...",True
1,Friday,Partly Sunnythen ChanceWintry Mix,High: 32 °F / 0°C,Friday: A slight chance of freezing drizzle be...,False
2,FridayNight,Wintry MixLikely thenChanceFreezingDrizzle,Low: 31 °F / 0°C,"Friday Night: Freezing drizzle likely, possibl...",True
3,Saturday,Chance WintryMix thenWintry Mixand PatchyBlowi...,High: 32 °F / 0°C,"Saturday: Freezing drizzle and sleet, possibly...",False
4,SaturdayNight,Heavy Snowand AreasBlowing Snow,Low: 19 °F / -7°C,Saturday Night: Snow before midnight. The snow...,True
5,Sunday,PatchyBlowing Snowand Blustery,High: 24 °F / -4°C,"Sunday: Patchy blowing snow before 4pm. Sunny,...",False
6,SundayNight,Partly Cloudy,Low: 5 °F / -15°C,"Sunday Night: Partly cloudy, with a low around 5.",True
7,Monday,Mostly Cloudy,High: 19 °F / -7°C,"Monday: Mostly cloudy, with a high near 19.",False
8,MondayNight,Mostly Cloudythen ChanceSnow,Low: 9 °F / -12°C,Monday Night: A chance of snow after midnight....,True
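
Since the project asks for a formatted dataset, the final frame would normally be written to a file. The sketch below shows one way to do that with `to_csv`. It uses a two-row stand-in for `weather` and an in-memory buffer so that it runs on its own; in the notebook the buffer would be replaced by a file name such as `weather.csv`, which is a hypothetical name.

```python
import io

import pandas as pd

# Two rows copied from the final table, standing in for the full data frame.
sample_weather = pd.DataFrame({
    "Period": ["Tonight", "Friday"],
    "Temperature": ["Low: 21 °F / -6°C", "High: 32 °F / 0°C"],
    "Is_Night": [True, False],
})

# index=False leaves the row numbers out of the file.
buffer = io.StringIO()
sample_weather.to_csv(buffer, index=False)

# Read the data back to confirm that it round-trips.
restored = pd.read_csv(io.StringIO(buffer.getvalue()))
print(restored)
```

Leaving the index out keeps the exported file free of an extra unnamed first column when it is read back later.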


__<div style="text-align:center">End of Code</div>__