<img src="https://raw.githubusercontent.com/afo/data-x-plaksha/master/imgsource/dx_logo.png" align="left"></img><br><br><br><br>


## SOLUTIONS Breakout: Web scraping & web crawling

**Author List**: Alexander Fred Ojala

**Original Sources**: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ & https://www.dataquest.io/blog/web-scraping-tutorial-python/

**License**: Feel free to do whatever you want with this code

**Compatibility:** Python 2.x and 3.x

---
<a id='sec4'></a>
# Breakout problem


In this week's breakout you should extract live weather data for Berkeley from:

[http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971](http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971)

* Task: scrape
    * the period / day (e.g. Tonight, Friday, FridayNight, etc.)
    * the temperature for the period (e.g. Low, High)
    * the long weather description (e.g. Partly cloudy, with a low around 49...)

Store the scraped data strings in a Pandas DataFrame.



**Hint:** The weather information is found in a div tag with `id='seven-day-forecast'`
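To see what the hint means in practice, here is a minimal offline sketch. The HTML string is a simplified, hypothetical stand-in for the real page (the live markup is larger), and it uses Python's built-in `html.parser` so no extra parser install is needed. It shows that `find(id=...)` and the CSS selector `select_one('#...')` return the same element, and that `find` returns `None` when the id is missing.

```python
import bs4 as bs

# Simplified, hypothetical stand-in for the forecast page markup
html = """
<div class="panel panel-default" id="seven-day-forecast">
  <ul id="seven-day-forecast-list">
    <li class="forecast-tombstone"><p class="period-name">Tonight</p></li>
  </ul>
</div>
"""

# 'html.parser' ships with Python, so no extra parser install is needed
soup = bs.BeautifulSoup(html, features='html.parser')

# Two equivalent ways to grab the element whose id is 'seven-day-forecast'
forecast = soup.find(id='seven-day-forecast')
forecast_css = soup.select_one('#seven-day-forecast')

# find() returns None when nothing matches, e.g. if the page layout changes
if forecast is None:
    raise ValueError("No element with id='seven-day-forecast' found")

print(forecast is forecast_css)                  # True: same Tag object
print(forecast.find(class_='period-name').text)  # Tonight
```

Checking for `None` early gives a clear error instead of a later `AttributeError` when the site changes its layout.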



# Breakout solution

In [1]:
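import requests
import bs4 as bs
import pandas as pd

# Download the raw HTML of the Berkeley forecast page
source = requests.get('http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971').content
# Parse it with the lxml parser (requires the separate lxml package)
soup = bs.BeautifulSoup(source, features='lxml')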
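The cell above calls `requests.get` with no timeout and does not check the HTTP status, so a slow or failing server can hang the notebook or hand an error page to BeautifulSoup. A slightly more defensive sketch follows. The helper names `make_soup` and `get_forecast_soup`, and the `User-Agent` string, are hypothetical; `html.parser` is swapped in for `lxml` only so the sketch needs no extra install.

```python
import requests
import bs4 as bs

URL = ('http://forecast.weather.gov/MapClick.php'
       '?lat=37.87158815800046&lon=-122.27274583799971')

# Hypothetical identifying header; some sites reject requests without one
HEADERS = {'User-Agent': 'data-x-breakout (example contact)'}


def make_soup(content):
    """Parse raw HTML (bytes or str) with Python's built-in parser."""
    return bs.BeautifulSoup(content, features='html.parser')


def get_forecast_soup(url=URL, timeout=10):
    """Download a page and return its parsed soup, failing loudly on errors."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return make_soup(resp.content)
```

Usage: `soup = get_forecast_soup()` then continue with `soup.find(id='seven-day-forecast')` as in the next cell.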

In [2]:
# The whole 7-day forecast lives in the div with id='seven-day-forecast'
forecast = soup.find(id='seven-day-forecast')

In [3]:
print(forecast.prettify())

<div class="panel panel-default" id="seven-day-forecast">
 <div class="panel-heading">
  <b>
   Extended Forecast for
  </b>
  <h2 class="panel-title">
   Berkeley CA
  </h2>
 </div>
 <div class="panel-body" id="seven-day-forecast-body">
  <div id="seven-day-forecast-container">
   <ul class="list-unstyled" id="seven-day-forecast-list">
    <li class="forecast-tombstone">
     <div class="tombstone-container">
      <p class="period-name">
       Tonight
       <br/>
       <br/>
      </p>
      <p>
       <img alt="Tonight: Mostly clear, with a low around 46. North wind 6 to 8 mph. " class="forecast-icon" src="newimages/medium/nfew.png" title="Tonight: Mostly clear, with a low around 46. North wind 6 to 8 mph. "/>
      </p>
      <p class="short-desc">
       Mostly Clear
      </p>
      <p class="temp temp-low">
       Low: 46 °F
      </p>
     </div>
    </li>
    <li class="forecast-tombstone">
     <div class="tombstone-container">
      <p class="period-name">
       Thanksgi

In [4]:
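# Period names; the <br/> tags add no text, so 'Thanksgiving Day' becomes 'ThanksgivingDay'
day = [d.text for d in forecast.find_all(class_='period-name')]
# Temperatures, e.g. 'Low: 46 °F'
temp = [t.text for t in forecast.find_all(class_='temp')]
# Icon <img> tags; their alt attribute holds the long description
desc = forecast.find_all('img')
# Short summaries, e.g. 'Mostly Clear'
short_desc = [s.text for s in forecast.find_all(class_='short-desc')]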
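`.text` concatenates the text nodes inside a tag, which is why a period name split by `<br/>` comes out as `'ThanksgivingDay'`. Passing a separator to `get_text` keeps the words apart. This sketch assumes the period name is split by `<br/>` tags, as the `prettify()` output above suggests; the HTML fragment is a simplified, hypothetical version of the real markup.

```python
import bs4 as bs

# Hypothetical fragment mirroring two forecast period names
html = """
<div id="seven-day-forecast">
  <p class="period-name">Tonight<br/><br/></p>
  <p class="period-name">Thanksgiving<br/>Day</p>
</div>
"""
forecast = bs.BeautifulSoup(html, features='html.parser').find(id='seven-day-forecast')
tags = forecast.find_all(class_='period-name')

# .text glues the text nodes together with no separator
raw = [p.text for p in tags]
# get_text joins text nodes with a space and strips surrounding whitespace
clean = [p.get_text(separator=' ', strip=True) for p in tags]

print(raw)    # ['Tonight', 'ThanksgivingDay']
print(clean)  # ['Tonight', 'Thanksgiving Day']
```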

In [5]:
print(day)
print()
print(temp)
print(short_desc)

['Tonight', 'ThanksgivingDay', 'ThursdayNight', 'Friday', 'FridayNight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight']

['Low: 46 °F', 'High: 63 °F', 'Low: 44 °F', 'High: 61 °F', 'Low: 42 °F', 'High: 61 °F', 'Low: 42 °F', 'High: 60 °F', 'Low: 44 °F']
['Mostly Clear', 'Sunny', 'Mostly Clear', 'Sunny', 'Mostly Clear', 'Sunny', 'Mostly Clear', 'Mostly Sunny', 'Partly Cloudy']


In [6]:
# Extract the long weather description from each icon's alt attribute
desc_list = []
for img in desc:
    print(img.get('alt'))
    desc_list.append(img.get('alt'))

Tonight: Mostly clear, with a low around 46. North wind 6 to 8 mph. 
Thanksgiving Day: Sunny, with a high near 63. North northeast wind 10 to 14 mph. 
Thursday Night: Mostly clear, with a low around 44. Northeast wind 3 to 7 mph. 
Friday: Sunny, with a high near 61. Northeast wind around 7 mph. 
Friday Night: Mostly clear, with a low around 42. Calm wind. 
Saturday: Sunny, with a high near 61.
Saturday Night: Mostly clear, with a low around 42.
Sunday: Mostly sunny, with a high near 60.
Sunday Night: Partly cloudy, with a low around 44.
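Each `alt` string starts with the period name followed by `': '`, and some carry a trailing space. If you want the period and the description as separate fields, a small sketch using `str.partition` is shown below. The helper name `split_alt` is hypothetical, and the sample strings are copied from the output above.

```python
# Sample alt strings copied from the output above (note the trailing space)
alts = [
    'Tonight: Mostly clear, with a low around 46. North wind 6 to 8 mph. ',
    'Saturday: Sunny, with a high near 61.',
]


def split_alt(alt):
    """Split 'Period: description' into a (period, description) tuple."""
    # partition splits at the first ': ' only, so later colons stay in the text
    period, _, text = alt.strip().partition(': ')
    return period, text


pairs = [split_alt(a) for a in alts]
for period, text in pairs:
    print(period, '|', text)
```

`partition` never raises: if `': '` is missing, the whole string ends up in `period` and `text` is empty.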


In [9]:
pd.set_option('display.max_colwidth', None)  # print full cell contents (older pandas used -1, now deprecated)
df = pd.DataFrame({'day': day, 'temp': temp, 'short_desc': short_desc, 'desc': desc_list})
print('Berkeley 7 day weather forecast')
df

Berkeley 7 day weather forecast


| | day | temp | short_desc | desc |
|---|---|---|---|---|
| 0 | Tonight | Low: 46 °F | Mostly Clear | Tonight: Mostly clear, with a low around 46. North wind 6 to 8 mph. |
| 1 | ThanksgivingDay | High: 63 °F | Sunny | Thanksgiving Day: Sunny, with a high near 63. North northeast wind 10 to 14 mph. |
| 2 | ThursdayNight | Low: 44 °F | Mostly Clear | Thursday Night: Mostly clear, with a low around 44. Northeast wind 3 to 7 mph. |
| 3 | Friday | High: 61 °F | Sunny | Friday: Sunny, with a high near 61. Northeast wind around 7 mph. |
| 4 | FridayNight | Low: 42 °F | Mostly Clear | Friday Night: Mostly clear, with a low around 42. Calm wind. |
| 5 | Saturday | High: 61 °F | Sunny | Saturday: Sunny, with a high near 61. |
| 6 | SaturdayNight | Low: 42 °F | Mostly Clear | Saturday Night: Mostly clear, with a low around 42. |
| 7 | Sunday | High: 60 °F | Mostly Sunny | Sunday: Mostly sunny, with a high near 60. |
| 8 | SundayNight | Low: 44 °F | Partly Cloudy | Sunday Night: Partly cloudy, with a low around 44. |
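The `temp` column is still text such as `'Low: 46 °F'`. To compute with it, you could split it into a label and an integer. This is a sketch on a few hand-copied rows; the new column names `kind` and `temp_f` are hypothetical choices, and the regex assumes every value starts with `Low` or `High`.

```python
import pandas as pd

# A few rows in the same shape as the scraped DataFrame
df = pd.DataFrame({'day': ['Tonight', 'ThanksgivingDay', 'ThursdayNight'],
                   'temp': ['Low: 46 °F', 'High: 63 °F', 'Low: 44 °F']})

# Named groups become the column names of the extracted DataFrame
parts = df['temp'].str.extract(r'(?P<kind>Low|High):\s*(?P<temp_f>-?\d+)')
df['kind'] = parts['kind']
df['temp_f'] = parts['temp_f'].astype(int)  # fails if any row did not match
print(df)
```

If the site ever emits a value the regex does not match, `extract` yields `NaN` for that row and the `astype(int)` call raises, which surfaces the problem instead of silently storing bad data.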


In [8]:
pd.options.display.max_colwidth = 50  # change back to the default max column width
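Instead of setting the option globally and resetting it by hand, `pd.option_context` changes it only inside a `with` block and restores the previous value on exit. A minimal sketch, using one description string from above as sample data:

```python
import pandas as pd

df = pd.DataFrame({'desc': ['Tonight: Mostly clear, with a low around 46. '
                            'North wind 6 to 8 mph.']})

# Show full cell contents only inside this block
with pd.option_context('display.max_colwidth', None):
    full = repr(df)

# Outside the block the default (50 characters) applies again
truncated = repr(df)

print(full)
print(truncated)
```

Note that `repr(df)` (what the notebook shows) respects the display option, while `df.to_string()` has its own `max_colwidth` argument.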