# Scraping the NATIONAL WEATHER SERVICE

<img src="https://i.imgur.com/ISXSvo0.png">

The first step in this exercise is to explore the structure of the web page. This can be done directly in the browser with the developer tools (in Chrome).

There we can navigate the structure and find the specific id and class we want to search for.

In this case, we want to get the forecast for the next 7 days for the city of San Francisco.

<img src="https://i.imgur.com/ZVFj0Fm.jpg">

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
# Request the web page
page = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")

# Create the soup object
soup = BeautifulSoup(page.content, 'html.parser')

# Find the id that corresponds to the seven-day forecast
seven_day = soup.find(id="seven-day-forecast")

# Inside that id, find the class that corresponds to each forecast item
forecast_items = seven_day.find_all(class_="tombstone-container")

forecast_items

[<div class="tombstone-container">
 <p class="period-name">Today<br/><br/></p>
 <p><img alt="Today: Partly sunny, with a high near 60. West wind 11 to 20 mph, with gusts as high as 25 mph. " class="forecast-icon" src="newimages/medium/bkn.png" title="Today: Partly sunny, with a high near 60. West wind 11 to 20 mph, with gusts as high as 25 mph. "/></p><p class="short-desc">Partly Sunny</p><p class="temp temp-high">High: 60 °F</p></div>,
 <div class="tombstone-container">
 <p class="period-name">Tonight<br/><br/></p>
 <p><img alt="Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. " class="forecast-icon" src="newimages/medium/nbkn.png" title="Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. "/></p><p class="short-desc">Increasing<br/>Clouds</p><p class="temp temp-low">Low: 51 °F</p></div>,
 <div class="tombstone-container">
 <p class="period-name">Sunday<br/><br/></p>
 <p><img alt
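
The lookup above assumes the page still contains an element with id `seven-day-forecast`. If the layout changes, `soup.find` returns `None` and the next call fails with an unhelpful `AttributeError`. A minimal sketch of a guarded version follows. The function name `find_forecast_items` and the inline `sample_html` are hypothetical, used only so the example runs without network access.

```python
from bs4 import BeautifulSoup


def find_forecast_items(html):
    """Return the forecast containers, or raise if the expected id is missing."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:
        # find() returns None when nothing matches
        raise ValueError("no element with id='seven-day-forecast' found")
    return seven_day.find_all(class_="tombstone-container")


# Hypothetical sample that mimics the structure of the real page
sample_html = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Today</p></div>
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
</div>
"""

items = find_forecast_items(sample_html)
print(len(items))
```

With the live page, the same function could be given `page.content`, ideally after calling `page.raise_for_status()` so that an HTTP error is reported before any parsing starts.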

In [4]:
# Get tonight's forecast
tonight = forecast_items[1]

print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Tonight
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. " class="forecast-icon" src="newimages/medium/nbkn.png" title="Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. "/>
 </p>
 <p class="short-desc">
  Increasing
  <br/>
  Clouds
 </p>
 <p class="temp temp-low">
  Low: 51 °F
 </p>
</div>


Tonight's forecast contains 4 elements: class=period-name; the img tag, whose alt attribute holds the description of the conditions; class=short-desc, which contains Increasing Clouds; and class=temp temp-low, which contains the temperature.

Next we extract these values as clean text.

In [5]:
# Look up each class inside tonight
period = tonight.find(class_="period-name").get_text()

short_desc = tonight.find(class_="short-desc").get_text()

temp = tonight.find(class_="temp").get_text()

print(period)
print(short_desc)
print(temp)

Tonight
IncreasingClouds
Low: 51 °F
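
Note that the short description prints as `IncreasingClouds`: the two words are separated only by a `<br/>` tag, which `get_text()` drops without leaving a space. One possible fix, sketched here on an inline copy of the tag rather than the live page, is to pass a separator and strip whitespace:

```python
from bs4 import BeautifulSoup

# Inline copy of the short-desc tag shown in the prettified output
html = '<p class="short-desc">Increasing<br/>Clouds</p>'
tag = BeautifulSoup(html, "html.parser").find(class_="short-desc")

plain = tag.get_text()                             # text pieces joined with nothing
spaced = tag.get_text(separator=" ", strip=True)   # text pieces joined with a space

print(plain)   # IncreasingClouds
print(spaced)  # Increasing Clouds
```

The same arguments would apply to the period names, where `Sunday<br/>Night` otherwise comes out as `SundayNight`.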


In [6]:
img = tonight.find("img")
desc = img['title']

print(desc)

Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. 


# Extract all the information into Pandas

<img src="https://i.imgur.com/oeRlYx8.jpg">

In [18]:
# Inside seven_day, select the tombstone-container class and, within it, .period-name
period_tags = seven_day.select(".tombstone-container .period-name")

periods = [pt.get_text() for pt in period_tags]

print(type(period_tags))
periods

<class 'list'>


['Today',
 'Tonight',
 'Sunday',
 'SundayNight',
 'Monday',
 'MondayNight',
 'Tuesday',
 'TuesdayNight',
 'Wednesday']

In [10]:
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

print(short_descs)
print()
print(temps)
print()
print(descs)

['Partly Sunny', 'IncreasingClouds', 'Patchy Fogthen PartlySunny andBreezy', 'Patchy Fog', 'Patchy Fogthen Sunny', 'Mostly Clear', 'Sunny', 'Partly Cloudy', 'Mostly Sunny']

['High: 60 °F', 'Low: 51 °F', 'High: 62 °F', 'Low: 51 °F', 'High: 65 °F', 'Low: 51 °F', 'High: 65 °F', 'Low: 53 °F', 'High: 63 °F']

['Today: Partly sunny, with a high near 60. West wind 11 to 20 mph, with gusts as high as 25 mph. ', 'Tonight: Increasing clouds, with a low around 51. West wind 15 to 20 mph, with gusts as high as 25 mph. ', 'Sunday: Patchy fog before 11am.  Otherwise, cloudy, then gradually becoming mostly sunny, with a high near 62. Breezy, with a west wind 13 to 22 mph, with gusts as high as 28 mph. ', 'Sunday Night: Patchy fog after 11pm.  Otherwise, mostly cloudy, with a low around 51. West wind 15 to 21 mph, with gusts as high as 26 mph. ', 'Monday: Patchy fog before 11am.  Otherwise, mostly sunny, with a high near 65. West wind 13 to 20 mph, with gusts as high as 25 mph. ', 'Monday Night: Most

<img src="https://i.imgur.com/ibqq4eW.png">

In [13]:
import pandas as pd
weather = pd.DataFrame({
        "period": periods, 
        "short_desc": short_descs, 
        "temp": temps, 
        "desc":descs
    })
weather

| | desc | period | short_desc | temp |
|---|---|---|---|---|
| 0 | Today: Partly sunny, with a high near 60. West... | Today | Partly Sunny | High: 60 °F |
| 1 | Tonight: Increasing clouds, with a low around ... | Tonight | IncreasingClouds | Low: 51 °F |
| 2 | Sunday: Patchy fog before 11am. Otherwise, cl... | Sunday | Patchy Fogthen PartlySunny andBreezy | High: 62 °F |
| 3 | Sunday Night: Patchy fog after 11pm. Otherwis... | SundayNight | Patchy Fog | Low: 51 °F |
| 4 | Monday: Patchy fog before 11am. Otherwise, mo... | Monday | Patchy Fogthen Sunny | High: 65 °F |
| 5 | Monday Night: Mostly clear, with a low around 51. | MondayNight | Mostly Clear | Low: 51 °F |
| 6 | Tuesday: Sunny, with a high near 65. | Tuesday | Sunny | High: 65 °F |
| 7 | Tuesday Night: Partly cloudy, with a low aroun... | TuesdayNight | Partly Cloudy | Low: 53 °F |
| 8 | Wednesday: Mostly sunny, with a high near 63. | Wednesday | Mostly Sunny | High: 63 °F |
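
Building the DataFrame from four parallel lists works only if every forecast box contains every field. If one box lacked a temperature, the lists would have different lengths and the rows would no longer line up. A minimal sketch of an alternative, which builds one dictionary per box, is shown below. The helper names `text_or_none` and `forecast_rows` and the sample HTML are hypothetical and not part of the original notebook.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Return the cleaned text of the first match, or None if nothing matches."""
    tag = parent.select_one(selector)
    if tag is None:
        return None
    return tag.get_text(separator=" ", strip=True)


def forecast_rows(html):
    """Build one dict per forecast box so that fields stay aligned."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("#seven-day-forecast .tombstone-container"):
        img = item.find("img")
        rows.append({
            "period": text_or_none(item, ".period-name"),
            "short_desc": text_or_none(item, ".short-desc"),
            "temp": text_or_none(item, ".temp"),
            "desc": img.get("title") if img is not None else None,
        })
    return rows


# Hypothetical sample: the second box has no temperature and no image
sample_html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today</p>
    <p><img title="Today: Partly sunny, with a high near 60."/></p>
    <p class="short-desc">Partly Sunny</p>
    <p class="temp temp-high">High: 60 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Sunday<br/>Night</p>
    <p class="short-desc">Patchy Fog</p>
  </div>
</div>
"""

rows = forecast_rows(sample_html)
for row in rows:
    print(row)
```

The resulting list can be passed straight to `pd.DataFrame(rows)`, where the missing fields appear as empty values instead of shifting the data.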


http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extract.html

In [15]:
# Extract the numeric characters with a regular expression
# (raw string, so that \d is not treated as a string escape)
temp_nums = weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False)

# Convert them to int
weather["temp_num"] = temp_nums.astype('int')


print(temp_nums)
print(weather["temp_num"].mean())

0    60
1    51
2    62
3    51
4    65
5    51
6    65
7    53
8    63
Name: temp_num, dtype: object
57.888888888888886
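
The named group in the pattern decides the name of the extracted column. As a hedged sketch of how this could be extended, a pattern with two named groups returns a DataFrame with one column per group, and rows that do not match become missing values. The sample Series, the `kind` group, and the `"N/A"` entry are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample, including one entry that does not match the pattern
temps = pd.Series(["High: 60 °F", "Low: 51 °F", "High: 62 °F", "N/A"], name="temp")

# Two named groups give two columns: "kind" and "temp_num"
parts = temps.str.extract(r"(?P<kind>High|Low): (?P<temp_num>\d+)")

# errors="coerce" leaves the unmatched row as NaN instead of raising
parts["temp_num"] = pd.to_numeric(parts["temp_num"], errors="coerce")

print(parts)
```

Because of the missing value, `temp_num` ends up as a float column here. With `astype('int')`, as in the cell above, the same unmatched row would raise an error instead.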


In [16]:
is_night = weather["temp"].str.contains("Low")

weather["is_night"] = is_night

is_night

0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
Name: temp, dtype: bool
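
A boolean column such as `is_night` can serve as a filter or as a grouping key. The sketch below rebuilds only the `temp` column from the values printed earlier, so it runs on its own, and then compares daytime highs with night-time lows. The variable names `night_lows` and `means` are illustrative.

```python
import pandas as pd

# The temp values from the scraped forecast shown above
weather = pd.DataFrame({
    "temp": ["High: 60 °F", "Low: 51 °F", "High: 62 °F", "Low: 51 °F",
             "High: 65 °F", "Low: 51 °F", "High: 65 °F", "Low: 53 °F",
             "High: 63 °F"],
})
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)
weather["is_night"] = weather["temp"].str.contains("Low")

# Boolean mask: keep only the night rows
night_lows = weather.loc[weather["is_night"], "temp_num"]

# Grouping key: mean temperature for day (False) and night (True)
means = weather.groupby("is_night")["temp_num"].mean()

print(night_lows.tolist())
print(means)
```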