# Data Inspection #

After downloading the data and setting up a Python environment with the required packages, the next step is inspecting the data contents in order to determine which weather variables are available to access.

The steps used to inspect (this section) and preprocess (next section) `.grb` files are the same as for `.grb2` files, although the `lookup key` of a variable may differ between the two formats. To avoid confusion, we will use only the `.grb2` file as an example.

Assume that we want five weather variables:
- `latitude`
- `longitude`
- `temperature`
- `wind_speed`
- `wind_direction` 

# Import Libraries

In [1]:
```python
import Nio  # PyNIO; not used directly, but the "pynio" engine below needs it installed
import xarray as xr
```

# Open Dataset
Open the GRIB2 file (`.grb2`) with `xarray`, using `PyNIO` as its engine:

In [2]:
```python
ds = xr.open_dataset("data/gfsanl_4_20171107_0000_000.grb2", engine="pynio")
```

# Overview of each variable
For each variable, print the `lookup key`, the `human-readable name`, and the `units of measurement`:

In [3]:
```python
for v in ds:
    print("{}, {}, {}".format(v, ds[v].attrs["long_name"], ds[v].attrs["units"]))
```

```
TMP_P0_L1_GLL0, Temperature, K
TMP_P0_L6_GLL0, Temperature, K
TMP_P0_L7_GLL0, Temperature, K
TMP_P0_L100_GLL0, Temperature, K
TMP_P0_L102_GLL0, Temperature, K
TMP_P0_L103_GLL0, Temperature, K
TMP_P0_L104_GLL0, Temperature, K
TMP_P0_2L108_GLL0, Temperature, K
TMP_P0_L109_GLL0, Temperature, K
POT_P0_L104_GLL0, Potential temperature, K
DPT_P0_L103_GLL0, Dew point temperature, K
APTMP_P0_L103_GLL0, Apparent temperature, K
SPFH_P0_L103_GLL0, Specific humidity, kg kg-1
SPFH_P0_2L108_GLL0, Specific humidity, kg kg-1
RH_P0_L4_GLL0, Relative humidity, %
RH_P0_L100_GLL0, Relative humidity, %
RH_P0_L103_GLL0, Relative humidity, %
RH_P0_2L104_GLL0, Relative humidity, %
RH_P0_L104_GLL0, Relative humidity, %
RH_P0_2L108_GLL0, Relative humidity, %
RH_P0_L200_GLL0, Relative humidity, %
RH_P0_L204_GLL0, Relative humidity, %
PWAT_P0_L200_GLL0, Precipitable water, kg m-2
SNOD_P0_L1_GLL0, Snow depth, m
WEASD_P0_L1_GLL0, Water equivalent of accumulated snow depth, kg m-2
CLWMR_P0_L100_GLL0, Cloud mixing ra
```
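Since the list is long, it can help to group the lookup keys by their human-readable name so that the duplicates stand out. The helper below, `group_by_long_name`, is hypothetical (it is not part of xarray or PyNIO); it only assumes that each variable's `attrs` include a `long_name`, as the output above shows. The sample entries are copied from that output so the sketch runs without the GRIB2 file.

```python
from collections import defaultdict


def group_by_long_name(attrs_by_key):
    """Group lookup keys that share the same `long_name` attribute (hypothetical helper).

    `attrs_by_key` maps each lookup key to its attribute dict.
    Only names used by more than one key are returned.
    """
    groups = defaultdict(list)
    for key, attrs in attrs_by_key.items():
        # Fall back to a placeholder if a variable has no long_name
        groups[attrs.get("long_name", "<missing long_name>")].append(key)
    return {name: keys for name, keys in groups.items() if len(keys) > 1}


# A few entries copied from the overview output above
sample = {
    "TMP_P0_L1_GLL0": {"long_name": "Temperature", "units": "K"},
    "TMP_P0_L100_GLL0": {"long_name": "Temperature", "units": "K"},
    "POT_P0_L104_GLL0": {"long_name": "Potential temperature", "units": "K"},
    "RH_P0_L4_GLL0": {"long_name": "Relative humidity", "units": "%"},
    "RH_P0_L100_GLL0": {"long_name": "Relative humidity", "units": "%"},
}

for name, keys in group_by_long_name(sample).items():
    print(name, keys)
```

With the real dataset, the same function could be called as `group_by_long_name({v: ds[v].attrs for v in ds})`.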

# Details of the desired variables
As you can see, many lookup keys share the same `human-readable name`, e.g. `Temperature`, `U-component of wind`, and `V-component of wind`.

To tell them apart, print the `full metadata` of each variable of interest.

In this tutorial, we use `Temperature` as an example.
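Rather than copying every `Temperature` key by hand, xarray's `Dataset.filter_by_attrs` can select all variables whose `long_name` attribute equals `"Temperature"`. The sketch below builds a tiny synthetic `Dataset` (made-up values on a 3 × 4 grid, reusing key names from the overview) only so that it runs without the GRIB2 file; with the real data the same call would be made on `ds`.

```python
import numpy as np
import xarray as xr

# Tiny stand-in for the GFS dataset: same coordinate names, fake values
lat = np.array([90.0, 89.5, 89.0], dtype="float32")
lon = np.array([0.0, 0.5, 1.0, 1.5], dtype="float32")


def fake_var(long_name, units):
    """Build a 2-D (lat_0, lon_0) DataArray with random values and the given attrs."""
    return xr.DataArray(
        np.random.rand(lat.size, lon.size).astype("float32"),
        dims=("lat_0", "lon_0"),
        coords={"lat_0": lat, "lon_0": lon},
        attrs={"long_name": long_name, "units": units},
    )


demo = xr.Dataset({
    "TMP_P0_L1_GLL0": fake_var("Temperature", "K"),
    "TMP_P0_L7_GLL0": fake_var("Temperature", "K"),
    "POT_P0_L104_GLL0": fake_var("Potential temperature", "K"),
})

# Keep only variables whose long_name is exactly "Temperature"
temperature = list(demo.filter_by_attrs(long_name="Temperature").data_vars)
print(temperature)  # ['TMP_P0_L1_GLL0', 'TMP_P0_L7_GLL0']
```

Because the match is exact, `Potential temperature` is excluded; a callable such as `long_name=lambda n: n is not None and "temperature" in n.lower()` would be needed to include it.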

In [4]:
```python
temperature = [
    "TMP_P0_L1_GLL0", "TMP_P0_L6_GLL0", "TMP_P0_L7_GLL0",
    "TMP_P0_L100_GLL0", "TMP_P0_L102_GLL0", "TMP_P0_L103_GLL0",
    "TMP_P0_L104_GLL0", "TMP_P0_2L108_GLL0", "TMP_P0_L109_GLL0",
]
for key in temperature:
    print(ds[key])
    print("-------------------------------------------------------------------------------------------")
```

```
<xarray.DataArray 'TMP_P0_L1_GLL0' (lat_0: 361, lon_0: 720)>
[259920 values with dtype=float32]
Coordinates:
  * lat_0    (lat_0) float32 90.0 89.5 89.0 88.5 ... -88.5 -89.0 -89.5 -90.0
  * lon_0    (lon_0) float32 0.0 0.5 1.0 1.5 2.0 ... 358.0 358.5 359.0 359.5
Attributes:
    center:                                         US National Weather Servi...
    production_status:                              Operational products
    long_name:                                      Temperature
    units:                                          K
    grid_type:                                      Latitude/longitude
    parameter_discipline_and_category:              Meteorological products, ...
    parameter_template_discipline_category_number:  [0 0 0 0]
    level_type:                                     Ground or water surface
    level:                                          [0.]
    forecast_time:                                  [0]
    forecast_time_units:
```

The `Coordinates` (`lat_0` and `lon_0`) index the data: each value is located by its latitude and longitude on a 0.5° grid.
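Because the coordinates act as an index, a single grid point can be picked by label with `DataArray.sel`. The sketch below recreates the 0.5° grid from the output above with placeholder values; the target location is hypothetical, and `method="nearest"` snaps it to the closest grid point. Since longitudes in this file run from 0 to 359.5, a negative (western-hemisphere) longitude is wrapped with `% 360` first.

```python
import numpy as np
import xarray as xr

# Same grid as TMP_P0_L1_GLL0: latitude 90 -> -90, longitude 0 -> 359.5, step 0.5 degrees
lat = np.linspace(90.0, -90.0, 361)
lon = np.arange(0.0, 360.0, 0.5)
tmp = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),  # placeholder values
    dims=("lat_0", "lon_0"),
    coords={"lat_0": lat, "lon_0": lon},
    name="TMP_P0_L1_GLL0",
)

# Hypothetical location; wrap a negative longitude into the 0..360 range
target_lat, target_lon = 13.7, -100.4
point = tmp.sel(lat_0=target_lat, lon_0=target_lon % 360, method="nearest")
print(float(point.lat_0), float(point.lon_0))
```

With the real data, `ds["TMP_P0_L1_GLL0"].sel(...)` would be used the same way.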

To make the variables easier to compare, you can print only the attribute that differs between them, `level_type`:

In [5]:
```python
for key in temperature:
    print(key)
    print(ds[key].attrs["level_type"])
    print("--------------------------------------")
```

```
TMP_P0_L1_GLL0
Ground or water surface
--------------------------------------
TMP_P0_L6_GLL0
Maximum wind level
--------------------------------------
TMP_P0_L7_GLL0
Tropopause
--------------------------------------
TMP_P0_L100_GLL0
Isobaric surface (Pa)
--------------------------------------
TMP_P0_L102_GLL0
Specific altitude above mean sea level (m)
--------------------------------------
TMP_P0_L103_GLL0
Specified height level above ground (m)
--------------------------------------
TMP_P0_L104_GLL0
Sigma level (sigma value)
--------------------------------------
TMP_P0_2L108_GLL0
Level at specified pressure difference from ground to level (Pa)
--------------------------------------
TMP_P0_L109_GLL0
Potential vorticity (K m2 kg-1 s-1)
--------------------------------------
```
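Comparing each lookup key with its `level_type` suggests that the keys follow a regular pattern: the variable abbreviation (`TMP`), what appears to be the product definition template (`P0`), a level type code (`L1`, `L100`, ...), and the grid (`GLL0`). The level codes agree with GRIB2 Code Table 4.5 (for example 1 = ground or water surface, 100 = isobaric surface), and a `2L` prefix appears to mark a layer between two levels of that type. The parser below is a sketch built on that inferred pattern; it is not a PyNIO function, its `LEVEL_TYPES` table only covers the temperature levels printed above, and keys with extra suffixes not shown in this section would not match.

```python
import re

# Pattern inferred from the keys printed above, e.g. TMP_P0_L100_GLL0 or TMP_P0_2L108_GLL0
KEY_PATTERN = re.compile(
    r"^(?P<abbrev>[A-Z0-9]+)"         # variable abbreviation, e.g. TMP
    r"_P(?P<template>\d+)"            # product definition template number, e.g. 0
    r"_(?P<layer>2?)L(?P<level>\d+)"  # optional "2" (layer) + level type code
    r"_G(?P<grid>[A-Z]+\d+)$"         # grid, e.g. LL0
)

# level_type strings copied from the output above, keyed by the code in the lookup key
LEVEL_TYPES = {
    1: "Ground or water surface",
    6: "Maximum wind level",
    7: "Tropopause",
    100: "Isobaric surface (Pa)",
    102: "Specific altitude above mean sea level (m)",
    103: "Specified height level above ground (m)",
    104: "Sigma level (sigma value)",
    108: "Level at specified pressure difference from ground to level (Pa)",
    109: "Potential vorticity (K m2 kg-1 s-1)",
}


def parse_key(key):
    """Split a PyNIO-style GRIB2 lookup key into its parts (sketch)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unrecognised lookup key: {key}")
    level = int(match["level"])
    return {
        "abbrev": match["abbrev"],
        "template": int(match["template"]),
        "is_layer": match["layer"] == "2",
        "level_code": level,
        "level_type": LEVEL_TYPES.get(level, "unknown"),
        "grid": match["grid"],
    }


print(parse_key("TMP_P0_2L108_GLL0"))
```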


At this point, choosing among them requires some technical knowledge of what each `Temperature` variable represents.

In my case, I chose `Isobaric surface (Pa)`: a surface in the atmosphere on which the pressure is the same everywhere.

In [6]:
```python
ds.get("TMP_P0_L100_GLL0")
```

However, I need the temperature at a specific pressure, `85000.0 Pa` (850 hPa).

So we list all pressure levels available in `TMP_P0_L100_GLL0` to check whether this lookup key contains the level we want.

To do that, we convert this variable into a `pandas.DataFrame`, which is easier to query:

In [7]:
```python
df = ds.get("TMP_P0_L100_GLL0").to_dataframe()
df.index.get_level_values("lv_ISBL0").drop_duplicates()
```

```
Float64Index([   100.0,    200.0,    300.0,    500.0,    700.0,   1000.0,
                2000.0,   3000.0,   5000.0,   7000.0,  10000.0,  15000.0,
               20000.0,  25000.0,  30000.0,  35000.0,  40000.0,  45000.0,
               50000.0,  55000.0,  60000.0,  65000.0,  70000.0,  75000.0,
               80000.0,  85000.0,  90000.0,  92500.0,  95000.0,  97500.0,
              100000.0],
             dtype='float64', name='lv_ISBL0')
```
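Converting the whole variable to a DataFrame materialises every value (31 levels × 361 × 720 grid points, about 8 million rows) just to list the levels. A lighter sketch reads the `lv_ISBL0` coordinate directly and then selects the 850 hPa slice with `.sel`. A small synthetic `DataArray` stands in for `ds["TMP_P0_L100_GLL0"]` here; the level values are copied from the output above, while the 3 × 4 grid and random values are made up.

```python
import numpy as np
import xarray as xr

# Isobaric levels (Pa) as listed for TMP_P0_L100_GLL0 above
levels = np.array([
    100, 200, 300, 500, 700, 1000, 2000, 3000, 5000, 7000, 10000, 15000,
    20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000,
    70000, 75000, 80000, 85000, 90000, 92500, 95000, 97500, 100000,
], dtype="float64")

# Synthetic stand-in on a coarse 3 x 4 grid; the real data is 361 x 720
tmp_isobaric = xr.DataArray(
    np.random.rand(levels.size, 3, 4).astype("float32"),
    dims=("lv_ISBL0", "lat_0", "lon_0"),
    coords={"lv_ISBL0": levels},
    name="TMP_P0_L100_GLL0",
)

available = tmp_isobaric["lv_ISBL0"].values  # read the coordinate, no DataFrame needed
print(85000.0 in available)  # True

# Temperature on the 850 hPa surface: a 2-D (lat_0, lon_0) array
tmp_850 = tmp_isobaric.sel(lv_ISBL0=85000.0)
print(tmp_850.dims)  # ('lat_0', 'lon_0')
```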

At this point, you should understand the lookup keys in the dataset well enough to choose the ones that suit your task.

In my case, I selected these lookup keys:
- `TMP_P0_L1_GLL0`: Temperature (K) at the ground or water surface
- `UGRD_P0_L100_GLL0`: U-component of wind (m/s) on isobaric surfaces (Pa)
- `VGRD_P0_L100_GLL0`: V-component of wind (m/s) on isobaric surfaces (Pa)

In [8]:
```python
ds.get(["TMP_P0_L1_GLL0", "UGRD_P0_L100_GLL0", "VGRD_P0_L100_GLL0"]).to_dataframe()
```

| lat_0 | lon_0 | lv_ISBL0 | TMP_P0_L1_GLL0 | UGRD_P0_L100_GLL0 | VGRD_P0_L100_GLL0 |
|---:|---:|---:|---:|---:|---:|
| 90.0 | 0.0 | 100.0 | 247.178085 | -38.843361 | 4.000019 |
| 90.0 | 0.0 | 200.0 | 247.178085 | -35.828548 | 1.100024 |
| 90.0 | 0.0 | 300.0 | 247.178085 | -36.096478 | 0.199982 |
| 90.0 | 0.0 | 500.0 | 247.178085 | -34.036407 | -0.800006 |
| 90.0 | 0.0 | 700.0 | 247.178085 | -31.010010 | -0.700012 |
| ... | ... | ... | ... | ... | ... |
| -90.0 | 359.5 | 90000.0 | 243.478073 | 1.780039 | -6.711587 |
| -90.0 | 359.5 | 92500.0 | 243.478073 | 1.779717 | -6.711902 |
| -90.0 | 359.5 | 95000.0 | 243.478073 | 1.776223 | -6.710415 |
| -90.0 | 359.5 | 97500.0 | 243.478073 | 1.772436 | -6.710183 |
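In the table above, `TMP_P0_L1_GLL0` repeats the same value (247.178085) for every `lv_ISBL0` at a given grid point. This is because the surface temperature only has `lat_0` and `lon_0` dimensions, so `to_dataframe` broadcasts it across the pressure levels of the wind variables. If only one pressure level of the wind is needed, selecting it first leaves one row per grid point. The sketch below demonstrates this on a small synthetic `Dataset` (random values, two levels); 85000 Pa is only an example level.

```python
import numpy as np
import xarray as xr

lat = np.array([90.0, 89.5], dtype="float32")
lon = np.array([0.0, 0.5, 1.0], dtype="float32")
levels = np.array([85000.0, 100000.0])  # two isobaric levels, for brevity

rng = np.random.default_rng(0)
demo = xr.Dataset(
    {
        # surface temperature: 2-D, no pressure dimension
        "TMP_P0_L1_GLL0": (("lat_0", "lon_0"), rng.random((2, 3))),
        # wind components: 3-D, one value per isobaric level
        "UGRD_P0_L100_GLL0": (("lv_ISBL0", "lat_0", "lon_0"), rng.random((2, 2, 3))),
        "VGRD_P0_L100_GLL0": (("lv_ISBL0", "lat_0", "lon_0"), rng.random((2, 2, 3))),
    },
    coords={"lat_0": lat, "lon_0": lon, "lv_ISBL0": levels},
)

keys = ["TMP_P0_L1_GLL0", "UGRD_P0_L100_GLL0", "VGRD_P0_L100_GLL0"]

# Mixing 2-D and 3-D variables: the 2-D one is repeated for every level
wide = demo[keys].to_dataframe()
print(len(wide))  # 2 lat x 3 lon x 2 levels = 12 rows

# Selecting one level first gives one row per grid point
flat = demo[keys].sel(lv_ISBL0=85000.0).to_dataframe()
print(len(flat))  # 2 lat x 3 lon = 6 rows
```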


Fortunately, `latitude`, `longitude`, and `temperature` are available directly in this dataset.

However, `wind_speed` and `wind_direction` must be derived from the `U-component` and `V-component` of the wind.

The details of this process are described in the next section, **Data Processing**.
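As a preview of the U/V conversion mentioned above, here is a minimal sketch of the standard formulas, assuming `u` is the eastward and `v` the northward wind component in m/s, and that direction is reported the meteorological way (the direction the wind blows *from*, in degrees clockwise from north). The function name `wind_speed_direction` is hypothetical, and the next section's method may differ.

```python
import math


def wind_speed_direction(u, v):
    """Return (speed, direction) from U/V wind components (hypothetical helper).

    speed     = sqrt(u**2 + v**2)
    direction = direction the wind blows FROM, degrees clockwise from north, in [0, 360)
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


print(wind_speed_direction(5.0, 0.0))   # wind blowing toward the east, i.e. a westerly (about 270 degrees)
print(wind_speed_direction(0.0, -5.0))  # wind blowing toward the south, i.e. a northerly (about 0 degrees)
```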