<h4><p style = "font-family:georgia,garamond,serif;font-size:16px;font-style:italic;">Grabbing Frontal and Surface High/Low Data from WPC current archive on THREDDS Server Using BeautifulSoup and Urllib</p></h4>

In [1]:
# Imports

import pandas as pd
from bs4 import BeautifulSoup
import urllib.request
import requests
from datetime import datetime

#### Set a time and date for the desired file download

In [16]:
date = datetime(2021,3,17,0)
#date = datetime.now()
HH = f"{date:%H}"
MM = f"{date:%m}"
DD = f"{date:%d}"
YYYY = date.year

URL = "https://thredds-test.unidata.ucar.edu/thredds/catalog/noaaport/text/fronts/catalog.html"
response = urllib.request.urlopen(URL)
html = response.read()

soup = BeautifulSoup(html)


#### Get all available date-time text (.txt) files currently on the THREDDS Server

* ~1 month of past data from current date back

In [17]:
urls = []
names = []
for i, link in enumerate(soup.findAll('a')):
    FULLURL = URL + link.get('href')
    if FULLURL.endswith('.txt'):
        urls.append(soup.select('a')[i].attrs['href'])


#### Raw Fronts are <em>not</em> temporal spaced evenly;

#### Hi-Res Fronts <em>are</em> spaced evenly every 3 hours

In [26]:
urls_hires = [s for s in urls if "highres" in s]
urls_hires # Every three hours 
#urls_hires[::8] # Every day at 00Z

['https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_0000.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_0300.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_0600.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_0900.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_1200.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_1500.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_1800.txt',
 'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210712_2100.txt',
 'https://thredds-test.unidata.ucar.edu/

In [19]:
# Take a rough look at the date range available for the high-res fronts

base_len = len("'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/")
print(f"Furthest back date-time:\n     {urls_hires[0][base_len-1:]}\n")
print(f"Most recent date-time:\n     {urls_hires[-1][base_len-1:]}")

Furthest back date-time:
     Fronts_highres_KWBC_20210712_0000.txt

Most recent date-time:
     Fronts_highres_KWBC_20210811_1500.txt


#### Pythonic way of downloading file:

##### ```urllib.request.urlopen``` to download the text and write to textfile(s)
* See below for the often simpler ```wget``` for command line routine also

In [11]:
save_path = "/Users/chowdahead/wx-data/"
save_path = "/Users/chowdahead/Documents/GitHub/folium-watch-warn/data/"

In [12]:
high_res = True

In [21]:
date

datetime.datetime(2021, 8, 11, 19, 0, 25, 895735)

In [30]:
#urls_hires

one_url = urls_hires[-1]
print(one_url)

print(urls_hires[-1][-37:])
file_name = urls_hires[-1][-37:]

res = urllib.request.urlopen(one_url)
fronts_txt = open(save_path+file_name, 'wb')
fronts_txt.write(res.read())
fronts_txt.close()

https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210811_1500.txt
Fronts_highres_KWBC_20210811_1500.txt


In [54]:
one_url

'https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/Fronts_highres_KWBC_20210317_1800.txt'

In [59]:
fronts_data = pd.read_fwf(one_url,
                          header=None)

In [68]:
front_str = "LOWS"
fronts_index = [i for i in range(fronts_data.shape[0]) if front_str in fronts_data.iloc[i][0]]


KeyError: 12

In [64]:
def front_parse_latlon(latlon_code="0560149",print_on=True):
    '''Grab lat and lon from coded values from WPC frontal analysis data
    
    Arguments
    ---------
    code : string
        ***** required - 7 digit string *****
        * format in XXXYYYY, with:
        XXX : 3-digit latitude; with a period in front of last digit
            ex. 384 -> 38.4 -> 38.4 deg north
            ex. 045 -> 04.5 -> 4.5 deg north
            ex. 009 -> 00.9 -> 0.9 deg north
        YYYY : 4-digit longitude; with a period in front of last digit
            ex. 1147 -> 114.7 -> 114.7 deg west
            ex. 0979 -> 097.9 -> 97.9 deg west
            ex. 0035 -> 003.5 -> 3.5 deg west
            
    Returns
    -------
    lat : str
        converted latitude
    lon : str
        converted longitude
    '''
    print(latlon_code)
    if len(latlon_code) != 7:
        raise Exception(f"Wrong number of digits in coded lat/lon: {latlon_code}\n\n"+\
                        f"Coded lat/lon number of digits: {len(latlon_code)}\n"+\
                        
                       "\nPlease check data and ensure you are given coded lat/lon pairs with 7 digits.\n"+\
                    "\nsee docs\n")
    lat_raw = latlon_code[0:3]
    lon_raw = latlon_code[3:]
    if print_on == True:
        print("-----------------------------------------------------")
        print(f"raw latitude: {lat_raw}\nraw longitude: {lon_raw}\n")
    
    lat = f"{latlon_code[0:2]}.{latlon_code[2:3]}".strip("00")
    lon = f"{latlon_code[3:-1]}.{latlon_code[-1:]}".strip("00")
    if print_on == True:
        print(f"converted latitude (N): {float(lat)}\nconverted longitude (W): {float(lon)}\n")
    
    return lat,lon

In [65]:
def get_front_lat_lon(fronts_data,index):
    
    fronts_data = fronts_data.iloc[index][0].split()
    
    fronts = [front_parse_latlon(i,print_on=False) for i in fronts_data[1:]]
    
    lats_fronts = [float(i[0]) for i in fronts]
    # make the longitudes negative since they are in degrees west
    lons_fronts = [-float(i[1]) for i in fronts]
    return lats_fronts,lons_fronts

In [66]:
for i in fronts_index:    
    lats_low,lons_low = get_front_lat_lon(fronts_data,i)

1007


Exception: Wrong number of digits in coded lat/lon: 1007

Coded lat/lon number of digits: 4

Please check data and ensure you are given coded lat/lon pairs with 7 digits.

see docs


##### Note - 

##### A common problem is that the coded lat/lon values aren't always 7-digits. What happens then?

* Run an exception; however the data will not be able to be parsed correctly...

---

#### Quicker way to get files from ```wget``` cmd tool

In [140]:
import os
print(f'Downloading : {txt_url} \nFrom : {base_url}')
os.system(f'wget : {one_url}')

Downloading : Fronts_highres_KWBC_20201026_0000.txt 
From : https://thredds-test.unidata.ucar.edu/thredds/fileServer/noaaport/text/fronts/


256