# Getting 48h weather forecast and past 7d weather from NWS via webscraping

This script is intended to collect data from the National Weather Service (NWS) into a **_pandas_** dataframe and to create a simple overview figure using **_matplotlib_**. I am later planning to run this script periodically to create an updated figure. 

Data sources: 
1. [NWS Metar Reports](https://www.wrh.noaa.gov/zoa/getobext.php?sid=KCHO)
2. [NWS Hourly Forecast](https://forecast.weather.gov/MapClick.php?lat=38.1386&lon=-78.4528&lg=english&&FcstType=digital)

In [67]:
import requests
import pandas as pd
from bs4 import BeautifulSoup


In [20]:
SiteID = "KCHO"
SiteCoord = {'lat': 38.1386, 'lon': -78.4528} # Lat and Lon Coordinates 

ReportSite = 'https://www.wrh.noaa.gov/zoa/getobext.php?sid=' + SiteID 
ForecastSite = 'https://forecast.weather.gov/MapClick.php?lat={:6.4f}&lon={:6.4f}&lg=english&&FcstType=digital'.format(SiteCoord['lat'],SiteCoord['lon'])


Now parse site to objects readable by Beautiful Sope

In [22]:
ReportPage = requests.get(ReportSite)
ForecastPage = requests.get(ForecastSite)


In [24]:
Status = [ReportPage.status_code, ForecastPage.status_code]
Status

[200, 200]

In [26]:
soup = BeautifulSoup(ReportPage.content, 'html.parser')
print(soup.prettify())

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="300" http-equiv="refresh"/>
  <style type="text/css">
   <!--
A:link { color: #0000FF; text-decoration: none; font-family: Arial, Helvetica, San Serif}
A:Visited { color: #0000FF; text-decoration: none; font-family: Arial, Helvetica, San Serif}
A:hover { color : #FF0000;text-decoration: underline; font-family: Arial, Helvetica, San Serif}
table { font-size: 9pt; font-family: Arial, Helvetica, San Serif}
A { font-size: 9pt; font-family: Arial, Helvetica, San Serif}
.formbox { margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px}
-->
  </style>
  <style type="text/css">
   .tabberContainer{
						position:relative; left:20px; top:0px;
						<!--[if IE 7]>
						width:600px;
						<![endif]-->
            <!--[else]>
            width:700px;
            <![endif]-->
					}

					/* $Id: example.css,v 1.5 

In [28]:
len(list(soup.children))

9

In [29]:
[type(item) for item in list(soup.children)]

[bs4.element.Doctype,
 bs4.element.Tag,
 bs4.element.NavigableString,
 bs4.element.Tag,
 bs4.element.Tag,
 bs4.element.NavigableString,
 bs4.element.Tag,
 bs4.element.NavigableString,
 bs4.element.NavigableString]

In [39]:
html = list(soup.children)[1]

In [111]:
table_body = soup.find('table',class_="inner-timeseries")
rows = table_body.find_all('tr')
tabs=[]
HeaderLines = 3
ColumnNames = ['Time', 'Temp', 'Dewpoint', 'RelHum', 'WindDir', 'WindSp', 'Visibility', 'WX', 'Clouds', 'SLP', 'Altimeter',
               'StationP', '6h TMAX', '6h TMIN', '24h TMAX', '24h TMIN','QC']

for row in rows[HeaderLines:]:
    cols=row.find_all('td')
    cols=[x.text.strip() for x in cols]
    tabs.append(cols)

df = pd.DataFrame(tabs, columns=ColumnNames) 

CurrentYear = pd.to_datetime('today').year
YDiff = CurrentYear-1900
df['Time'] = pd.to_datetime(df['Time'], format='%d %b %H:%M %p')
df['Time'] = df['Time'].apply(lambda x: x + pd.DateOffset(years=YDiff))

2018

In [110]:
df.Time[1]

Timestamp('2018-06-07 04:10:00')

In [112]:
table_body = soup.find_all('table')

rows = table_body[1].find_all('tr')
cols=rows[2].find_all('td')
cols=rows[3].find_all('td')

In [108]:


cols

[<td>Most Recent Observation:</td>, <td>Thu, 07 Jun 4:15 pm</td>]