<a name="top"></a>
<div style="width:1000px">

<div style="float:right; width:140px; height:140px;">
<img src="https://cloudrun.co/img/logo_noname.png" alt="Cloudrun Logo" style="height: 140px;">
</div>

<h1>Step 1: Download NCEI GFS+NAM Data</h1>
<h2>By Kayla Besong</h2>

A generalized set of functions that creates directories and downloads Global Forecast System (GFS) high-resolution (0.5 x 0.5 degree) and North American Mesoscale Model (NAM) data from NCEI servers. To add another model, add a branch alongside the existing model conditionals in the 'ncei_grab' function, following the same filename and URL formats.


<div style="clear:both"></div>
</div>

<hr style="height:2px;">
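
The filename and URL formats mentioned above can be gathered into a single lookup table, so that adding a model means adding one entry. The following is a minimal sketch, not part of the original notebook: `build_ncei_url` and `MODEL_LAYOUTS` are hypothetical names, and the paths and filename prefixes are copied from the `ncei_grab` function defined further down.

```python
# Hypothetical lookup table: model name -> (NCEI path, filename prefix).
# Both entries are copied from the ncei_grab function in this notebook.
MODEL_LAYOUTS = {
    'GFS': ('global-forecast-system/access/historical/forecast/grid-004-0.5-degree', 'gfs_4'),
    'NAM': ('north-american-mesoscale-model/access/historical/forecast', 'nam_218'),
}


def build_ncei_url(model, year_mon_day, run_hour, lead_hour):
    """Return the NCEI URL for one forecast file (illustrative helper)."""
    if model not in MODEL_LAYOUTS:
        raise ValueError(f"Unknown model {model!r}; expected one of {sorted(MODEL_LAYOUTS)}")

    model_path, prefix = MODEL_LAYOUTS[model]

    # Forecast lead hour is written with three digits, e.g. 003 or 072
    filename = f'{prefix}_{year_mon_day}_{run_hour}00_{int(lead_hour):03d}.grb2'

    # Files are grouped on the server by YYYYMM, then by YYYYMMDD
    year_mon = year_mon_day[:6]
    return f'https://www.ncei.noaa.gov/data/{model_path}/{year_mon}/{year_mon_day}/{filename}'


print(build_ncei_url('GFS', '20191111', '00', 3))
```

Keeping the layout in a dictionary is only one possible design; the conditional branches used in `ncei_grab` work just as well for two models.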

In [1]:
import requests
import numpy as np
import os 

In [2]:
def zero_pad(array):
    
    """
    
    Takes an array of integers or numeric strings and returns them as zero-padded strings. Any value less than 10 receives a '0' in front of it. 
    This function is especially useful for dates.
    
    ex: input array = ['1', '11', '7', '9']
    out array = ['01', '11', '07', '09']

    """
    
    out_array = []                                 ## define empty list to refill 
    for i in array:                                ## iterate over each item 
        if int(i) < 10:                            ## int() allows numeric strings such as '7' to be compared as well as integers
            out_array.append('0'+str(int(i)))      ## append new value with padded zero to list if it is less than 10

        else:
            out_array.append(str(i)) 
    
    return out_array                               ## return new list 
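
The same padding can be done with Python's built-in `str.zfill`, which also handles widths other than two digits. This is a small sketch for comparison only; `zero_pad_zfill` is a hypothetical name and not part of the original notebook.

```python
def zero_pad_zfill(values, width=2):
    """Return each value as a string left-padded with zeros to `width` characters."""
    # str() accepts integers, NumPy integers, and numeric strings alike
    return [str(v).zfill(width) for v in values]


print(zero_pad_zfill(['1', '11', '7', '9']))   # two-digit padding, as in zero_pad
print(zero_pad_zfill([0, 3, 102], width=3))    # three-digit padding for forecast lead hours
```

A width of 3 is convenient for forecast lead hours, which appear as three digits in the NCEI filenames.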



In [3]:
def ncei_grab(startdate_MMDDYYYY, enddate_MMDDYYYY, init_hour, forecast_lead, forecast_step, model):
    
    '''
    
    This function downloads historical forecast data for the NAM and high resolution GFS 
    from the National Centers for Environmental Information (NCEI). More models can be added at a later date if desired. 
    It generates directories and subdirectories based on your inputs. Date and hour inputs are passed as strings in the formats shown below.  
    
    
    Inputs include:
    
    startdate_MMDDYYYY -- a string for the first date you would like to obtain data. Ex: '10261994'
    enddate_MMDDYYYY -- a string for the last date you would like to obtain data. Ex: '10311994'
    init_hour -- list of forecast initialization hours as strings. Ex: ['00', '06', '12', '18'] or ['06', '18']
    forecast_lead -- the last forecast hour you would like, as an integer. Range is 0 to 384 for GFS high res and 0 to 84 for the NAM. 
    forecast_step -- increment between forecast hours as an integer, be it 1 hour, 3 hours or more.   
    model -- string of either 'NAM' or 'GFS'
    
    This function takes some time because it makes one request per forecast file. 
    
    '''
        
    
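    ## note: the day, month and year ranges below are built independently of each other, so the start and end dates should fall within the same month
    study_day = zero_pad(np.arange(int(startdate_MMDDYYYY[2:4]),int(enddate_MMDDYYYY[2:4])+1,1))             ## generate year month day arrays to iterate over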
    study_mon = zero_pad(np.arange(int(startdate_MMDDYYYY[0:2]),int(enddate_MMDDYYYY[0:2])+1,1))             ## arrays require zero padding for string subsetting urls/dates on model outputs
    study_year = np.arange(int(startdate_MMDDYYYY[-4:]),int(enddate_MMDDYYYY[-4:])+1,1)
    f_time = zero_pad(np.arange(0,forecast_lead+forecast_step, forecast_step))
    

    for year in study_year:
        for mon in study_mon:
            for day in study_day:
                for run_hour in init_hour:
                    
                    year_mon_day = str(year)+str(mon)+str(day)                ## this value is used constantly throughout the subsetting process 

                    print(f'Currently Downloading {year_mon_day} {run_hour}z from {model}')                     ## let it be known where the process is at

                    for time in f_time:                                                                   ## loop through number of forecast hours desired for dates

                        if model == 'GFS':                 

                            model_path = 'global-forecast-system/access/historical/forecast/grid-004-0.5-degree'       ## specified pathway for NCEI url for GFS high res

                            filename = 'gfs_4_%s_%s00_%s.grb2' % (year_mon_day, run_hour, time.zfill(3))                         ## specified exact filename for NCEI url for GFS high res; lead hour is padded to 3 digits so hours of 100 or more stay valid

                        elif model == 'NAM':

                            model_path = 'north-american-mesoscale-model/access/historical/forecast'                   ## specified for NAM

                            filename = 'nam_218_%s_%s00_%s.grb2' % (year_mon_day, run_hour, time.zfill(3))

                        else:                                               ## stop here: without a valid model there is no model_path or filename to build the url from

                            raise ValueError(str(model) + " is an invalid model entry. Options are 'NAM' or 'GFS'.")


                        url = 'https://www.ncei.noaa.gov/data/%s/%s/%s/%s' % (model_path, str(year)+str(mon), year_mon_day, filename)  ## url to grab file from NCEI using string subsetting with values established above

                        r = requests.get(url)          # read the url page


                        if r.status_code == 200:       # a status code of 200 means the file is available, so f.write will generate a file with content

                            main_dir = str(model)+'_data'                             ## generate parent directory for model and subsequent subdirectories for dates required
                            sub_dir = year_mon_day+run_hour                                     
                            os.makedirs(os.path.join(main_dir,sub_dir), exist_ok=True)          ## create the directory for model + year_mon_day + run hour; exist_ok=True skips it quietly if it already exists

                            with open('%s/%s/%s' % (main_dir,sub_dir,filename),'wb') as f:
                                f.write(r.content)

                        else:

                            print(year_mon_day + time + ' is not available for download from the ' + model)  ## if r.status_code is something like 404, this allows for f.write to be passed, not generating blank files
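
Because `ncei_grab` builds its day, month, and year ranges independently, a request such as '11292019' to '12022019' produces an empty day range (29 up to 2) and downloads nothing. One way to cover ranges that cross a month or year boundary is to step through real calendar dates with the standard `datetime` module. This is a sketch under that assumption; `iter_dates` is a hypothetical helper and not part of the original function.

```python
from datetime import datetime, timedelta


def iter_dates(startdate_MMDDYYYY, enddate_MMDDYYYY):
    """Yield every date from start to end (inclusive) as a 'YYYYMMDD' string."""
    start = datetime.strptime(startdate_MMDDYYYY, '%m%d%Y')
    end = datetime.strptime(enddate_MMDDYYYY, '%m%d%Y')

    current = start
    while current <= end:
        yield current.strftime('%Y%m%d')     # same format as year_mon_day in ncei_grab
        current += timedelta(days=1)         # move to the next calendar day


print(list(iter_dates('11292019', '12022019')))
```

Each yielded string could stand in for `year_mon_day`, with its first six characters giving the YYYYMM folder used in the URL.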


In [4]:
%%time

ncei_grab('11112019', '11262019', ['00','12'], 72, 3, 'GFS')

Currently Downloading 20191111 00z from GFS
Currently Downloading 20191111 12z from GFS
Currently Downloading 20191112 00z from GFS
Currently Downloading 20191112 12z from GFS
Currently Downloading 20191113 00z from GFS
Currently Downloading 20191113 12z from GFS
Currently Downloading 20191114 00z from GFS
Currently Downloading 20191114 12z from GFS
Currently Downloading 20191115 00z from GFS
Currently Downloading 20191115 12z from GFS
Currently Downloading 20191116 00z from GFS
Currently Downloading 20191116 12z from GFS
2019111600is not available for download from the GFS
2019111603is not available for download from the GFS
2019111606is not available for download from the GFS
2019111609is not available for download from the GFS
2019111612is not available for download from the GFS
2019111615is not available for download from the GFS
2019111618is not available for download from the GFS
2019111621is not available for download from the GFS
2019111624is not available for download from the
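
The call above makes one HTTP request per day, initialization hour, and forecast lead time, which is why it takes a while. A rough count can be worked out before starting a download. This is a sketch only; `count_requests` is a hypothetical helper.

```python
def count_requests(n_days, init_hour, forecast_lead, forecast_step):
    """Return how many files ncei_grab will request for the given inputs."""
    # Lead times run from 0 to forecast_lead inclusive, matching f_time in ncei_grab
    n_leads = len(range(0, forecast_lead + forecast_step, forecast_step))
    return n_days * len(init_hour) * n_leads


# 11 to 26 November 2019 is 16 days, with two runs per day out to 72 hours in 3-hour steps
print(count_requests(16, ['00', '12'], 72, 3))
```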

In [5]:
%%time

ncei_grab('11112019', '11262019', ['00','12'], 72, 3, 'NAM')

Currently Downloading 20191111 00z from NAM
Currently Downloading 20191111 12z from NAM
Currently Downloading 20191112 00z from NAM
Currently Downloading 20191112 12z from NAM
Currently Downloading 20191113 00z from NAM
Currently Downloading 20191113 12z from NAM
Currently Downloading 20191114 00z from NAM
Currently Downloading 20191114 12z from NAM
Currently Downloading 20191115 00z from NAM
Currently Downloading 20191115 12z from NAM
Currently Downloading 20191116 00z from NAM
Currently Downloading 20191116 12z from NAM
2019111600is not available for download from the NAM
2019111603is not available for download from the NAM
2019111606is not available for download from the NAM
2019111609is not available for download from the NAM
2019111612is not available for download from the NAM
2019111615is not available for download from the NAM
2019111618is not available for download from the NAM
2019111621is not available for download from the NAM
2019111624is not available for download from the