# Web Scraping Data for Irrigation System Design Purposes

### The POWER Project from NASA (*) provides data sets from research done to support renewable energy, building energy efficiency and agricultural needs. 


Web scraping means obtaining data from online websites. In some cases these databases have a GUI (Graphical User Interface) that lets users interact with the program, but sometimes it is easier to write code that does this automatically. Doing so requires an API (Application Programming Interface). An API defines how a program can request the service's functions through code, allowing the user to request information freely and in fewer steps than with the GUI. The NASA POWER website offers both a simple GUI and an API, each with advantages over the other. The purpose of this document is to automate the process as much as possible, in order to reduce the time invested in creating the yearly water balance.

To do so, the program requires the following data:
 - Location: Latitude and Longitude of the study area

1. Import the required library:
    - urllib.request = this library contains classes and functions that allow Python to open URLs (Uniform Resource Locators).

In [1]:
import urllib.request as urlib

2. Set the dates:
    - start and end date for the data to be acquired = use the format YYYYMMDD (minimum start date 2001/01/01):

In [2]:
start=20010101
end=20221231
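Before sending the request, it can help to check that both dates really follow the YYYYMMDD format. The following is a minimal sketch that uses only the standard library; the function names `parse_power_date` and `check_date_range` are hypothetical helpers, not part of the POWER API.

```python
from datetime import datetime

def parse_power_date(value):
    """Convert a YYYYMMDD value (int or str) into a date; raise ValueError if it is invalid."""
    text = str(value)
    # strptime alone would accept some shorter strings, so the length is checked first
    if len(text) != 8 or not text.isdigit():
        raise ValueError("date must have the format YYYYMMDD: " + text)
    return datetime.strptime(text, "%Y%m%d").date()

def check_date_range(start, end):
    """Return both dates, making sure the start is not after the end."""
    start_date = parse_power_date(start)
    end_date = parse_power_date(end)
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    return start_date, end_date

start_date, end_date = check_date_range(20010101, 20221231)
print(start_date, end_date)
```

A value such as 20230231 (February 31st) raises a `ValueError` here instead of producing a failed request later.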

3. Enter your location using latitude and longitude format:

In [3]:
latitude=42.030781
longitude=-93.631912
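A quick range check on the coordinates catches swapped or mistyped values before the request is made. This is a minimal sketch; `check_coordinates` is a hypothetical helper name, and the limits are the usual geographic ranges (latitude from -90 to 90, longitude from -180 to 180).

```python
def check_coordinates(latitude, longitude):
    """Return the coordinates unchanged, or raise ValueError if they are out of range."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return latitude, longitude

# the location used in this document
print(check_coordinates(42.030781, -93.631912))
```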

4. Select your community (research purpose):
    - ag: agroclimatology = RECOMMENDED for the purpose of irrigation
    - sb: sustainable buildings
    - re: renewable energy

In [4]:
community='ag'

5. Desired climate parameters:
    - TOA_SW_DWN : top-of-atmosphere shortwave downward irradiance (MJ/m^2/day)
    - T2M_MAX: maximum temperature at 2 meters (°C)
    - T2M_MIN: minimum temperature at 2 meters (°C)
    - PRECTOTCORR: precipitation (mm/day)
    - WS2M: wind speed at 2 meters (m/s)

In [5]:
desired_parameters=('TOA_SW_DWN','T2M_MAX', 'T2M_MIN','PRECTOTCORR','WS2M')

Some other details are already set: the temporal resolution (daily), the output format (csv) and the header (disabled).

The program will create the 'request URL', extract the data and export it to a csv file.
Provide the name for the desired file:

In [6]:
name="data"

In [7]:
csv_name=name+".csv"

#join the parameter names with an encoded comma (%2C), as required in the URL
result="%2C".join(desired_parameters)

url='https://power.larc.nasa.gov/api/temporal/daily/point?start='+str(start)+'&end='+str(end)+'&latitude='+str(latitude)+'&longitude='+str(longitude)+'&community='+community+'&parameters='+result+'&format=csv&header=false'
urlib.urlretrieve(url, csv_name)

('data.csv', <http.client.HTTPMessage at 0x27fc6d130b8>)
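As an alternative to concatenating strings by hand, the query part of the request URL can be assembled with `urllib.parse.urlencode`, which encodes the commas as `%2C` automatically. This is a minimal sketch; `build_power_url` and `BASE_URL` are hypothetical names, and the query keys are the ones already used in the URL above.

```python
from urllib.parse import urlencode

# endpoint taken from the request URL used in this document
BASE_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"

def build_power_url(start, end, latitude, longitude, community, parameters):
    """Build the request URL for daily point data in csv format without header."""
    query = {
        "start": start,
        "end": end,
        "latitude": latitude,
        "longitude": longitude,
        "community": community,
        # urlencode turns the commas into %2C
        "parameters": ",".join(parameters),
        "format": "csv",
        "header": "false",
    }
    return BASE_URL + "?" + urlencode(query)

url = build_power_url(20010101, 20221231, 42.030781, -93.631912, "ag",
                      ("TOA_SW_DWN", "T2M_MAX", "T2M_MIN", "PRECTOTCORR", "WS2M"))
print(url)
```

The resulting string can be passed to `urlretrieve` in the same way as the manually built URL.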

Once the previous code has been run, the following lines group the data by month, compute the monthly averages of the parameters selected previously and export them to a new Excel file.

In [8]:
#these libraries permit python to work with dataframes and arrays
import pandas as pd
import numpy as np

#this function maps the day of the year (DOY) to its month
#note: the limits assume a 365-day year, so in leap years every day after February 28 is shifted by one day
def mes(x):
    if 1 <= x <= 31:
        return 'january'
    elif 32 <= x <= 59:
        return 'february'
    elif 60 <= x <= 90:
        return 'march'
    elif 91 <= x <= 120:
        return 'april'
    elif 121 <= x <= 151:
        return 'may'
    elif 152 <= x <= 181:
        return 'june'
    elif 182 <= x <= 212:
        return 'july'
    elif 213 <= x <= 243:
        return 'august'
    elif 244 <= x <= 273:
        return 'september'
    elif 274 <= x <= 304:
        return 'october'
    elif 305 <= x <= 334:
        return 'november'
    elif 335 <= x <= 365:
        return 'december'
#calendar order used to sort the monthly averages
index=['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']

data=pd.read_csv(csv_name)

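#drop the rows flagged as missing (-999.0) and day 366 of leap years
data=data.query('PRECTOTCORR != -999.0 and TOA_SW_DWN != -999.0 and DOY != 366').copy()
data['month']=data['DOY'].apply(mes)
del data['YEAR']
del data['DOY']
#average every parameter by month and sort the months in calendar order
data=data.groupby('month').mean()
data=data.reindex(index)
#export the monthly averages to an Excel file
data.to_excel('water_balance_data.xlsx', sheet_name='Sheet1')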
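The `mes` function uses fixed day-of-year limits, which match a 365-day year; in a leap year, day 60 is February 29 and every later day falls one position further. If that shift matters, the month can be derived from the year and the day of the year together. The following is a minimal sketch with the standard library only; `month_from_year_doy` and `MONTHS` are hypothetical names.

```python
from datetime import date, timedelta

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december']

def month_from_year_doy(year, doy):
    """Return the lowercase month name for a given year and day of the year."""
    # day 1 is January 1st, so doy - 1 days are added to it
    day = date(int(year), 1, 1) + timedelta(days=int(doy) - 1)
    return MONTHS[day.month - 1]

print(month_from_year_doy(2019, 60))   # march (2019 is not a leap year)
print(month_from_year_doy(2020, 60))   # february (February 29th)
print(month_from_year_doy(2020, 366))  # december
```

Assuming the csv keeps its YEAR and DOY columns, this could be applied row by row, for example with `data.apply(lambda row: month_from_year_doy(row['YEAR'], row['DOY']), axis=1)`, before those columns are deleted.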

# _______________________________________________________________________________________________________ #

1. POWER | NASA Data Access viewer: https://power.larc.nasa.gov/data-access-viewer/
2. URL Handling module: https://docs.python.org/3/library/urllib.html