### Using the netCDF4 Python library

---

Sentinel dataset files on the Copernicus Hub are distributed as .nc files (NetCDF, Network Common Data Form). To access these files programmatically, we use the netCDF4 library.

If Python was installed through Anaconda, the library may already be available. Otherwise, install it with pip:


|**pip install netCDF4**|
------------------------

---



Resources used for this research:
- [netCDF4 on PyPI (install with pip)](https://pypi.org/project/netCDF4/)
- [How to read netCDF4 files in Python](https://www.earthinversion.com/utilities/reading-NetCDF4-data-in-python/)
- [Visualize Sentinel 5P data with Python](https://github.com/acgeospatial/Sentinel-5P/blob/master/Sentinel_5P.ipynb)

---

In [2]:
import netCDF4 as nc  # Import the netCDF4 library under the alias nc

# Set the path to the downloaded .nc file (raw string so the backslash is not treated as an escape)
nc_file = r"Download Results\S5P_OFFL_L2__CH4____20230514T034638_20230514T052808_28925_03_020500_20230515T195331.nc"

# Open the file as a Dataset object
dataset = nc.Dataset(nc_file, 'r')

print(dataset)


<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.7
    institution: KNMI/SRON
    source: Sentinel 5 precursor, TROPOMI, space-borne remote sensing, L2
    history: 2023-05-16 04:30:10 f_s5pops tropnll2dp /mnt/data1/storage_offl_l2/cache_offl_l2/WORKING-612033466/JobOrder.612033437.xml
    summary: TROPOMI/S5P Methane 1-Orbit L2 Swath 5.5x7.0km
    tracking_id: d54caa00-25f3-431c-b8b9-13f4f34ec317
    id: S5P_OFFL_L2__CH4____20230514T034638_20230514T052808_28925_03_020500_20230515T195331
    time_reference: 2023-05-14T00:00:00Z
    time_reference_days_since_1950: 26796
    time_reference_julian_day: 2460078.5
    time_reference_seconds_since_1970: 1684022400
    time_coverage_start: 2023-05-14T04:08:12Z
    time_coverage_end: 2023-05-14T05:06:36Z
    time_coverage_duration: PT3504.398S
    time_coverage_resolution: PT0.840S
    orbit: 28925
    references: https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-5

---
The output lists the file's global attributes, but not the measurement data we need.

The measurements are stored in groups, which we can reach through the groups attribute of the dataset.

---

In [25]:
# Check which groups are in the dataset
print(dataset.groups)
print('\n')
# Inspect the PRODUCT group and its dimensions
print(dataset.groups['PRODUCT'])
print('\n')
print(dataset.groups['PRODUCT'].dimensions)


{'PRODUCT': <class 'netCDF4._netCDF4.Group'>
group /PRODUCT:
    dimensions(sizes): scanline(4173), ground_pixel(215), corner(4), time(1), layer(12), level(13)
    variables(dimensions): int32 scanline(scanline), int32 ground_pixel(ground_pixel), int32 time(time), int32 corner(corner), int32 layer(layer), int32 level(level), int32 delta_time(time, scanline), <class 'str'> time_utc(time, scanline), uint8 qa_value(time, scanline, ground_pixel), float32 latitude(time, scanline, ground_pixel), float32 longitude(time, scanline, ground_pixel), float32 methane_mixing_ratio(time, scanline, ground_pixel), float32 methane_mixing_ratio_precision(time, scanline, ground_pixel), float32 methane_mixing_ratio_bias_corrected(time, scanline, ground_pixel)
    groups: SUPPORT_DATA, 'METADATA': <class 'netCDF4._netCDF4.Group'>
group /METADATA:
    dimensions(sizes): 
    variables(dimensions): 
    groups: QA_STATISTICS, ALGORITHM_SETTINGS, GRANULE_DESCRIPTION, ISO_METADATA, EOP_METADATA, ESA_METADATA}




---
The groups attribute contains two groups, PRODUCT and METADATA.

We are interested in the PRODUCT group, so we need to inspect its variables.

Each variable stores one quantity of the product (for example latitude or qa_value). The variables attribute of the group behaves like a dictionary, so each variable can be accessed by its key.

---

In [19]:
# Display the variables within the product group
print(dataset.groups['PRODUCT'].variables)

# Retrieve the keys for each variable within the product group
dataset.groups['PRODUCT'].variables.keys()

{'scanline': <class 'netCDF4._netCDF4.Variable'>
int32 scanline(scanline)
    units: 1
    axis: Y
    long_name: along-track dimension index
    comment: This coordinate variable defines the indices along track; index starts at 0
    _FillValue: -2147483647
path = /PRODUCT
unlimited dimensions: 
current shape = (4173,)
filling on, 'ground_pixel': <class 'netCDF4._netCDF4.Variable'>
int32 ground_pixel(ground_pixel)
    units: 1
    axis: X
    long_name: across-track dimension index
    comment: This coordinate variable defines the indices across track, from west to east; index starts at 0
    _FillValue: -2147483647
path = /PRODUCT
unlimited dimensions: 
current shape = (215,)
filling on, 'time': <class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: seconds since 2010-01-01 00:00:00
    standard_name: time
    axis: T
    long_name: reference time for the measurements
    comment: The time in this variable corresponds to the time in the time_reference global attribute
    _F

dict_keys(['scanline', 'ground_pixel', 'time', 'corner', 'layer', 'level', 'delta_time', 'time_utc', 'qa_value', 'latitude', 'longitude', 'methane_mixing_ratio', 'methane_mixing_ratio_precision', 'methane_mixing_ratio_bias_corrected'])

---
As shown above, there are multiple variables, but the ones we are interested in are:
- latitude
- longitude
- methane_mixing_ratio_bias_corrected
---

In [20]:
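import geopandas as gpd

# Shape of the latitude variable: (time, scanline, ground_pixel)
lat_shape = dataset.groups['PRODUCT'].variables['latitude'].shape
# Read all latitude values (the time dimension has length 1)
lat = dataset.groups['PRODUCT'].variables['latitude'][:]

print(lat)
print(lat_shape)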

[[[-81.32951  -81.41167  -81.47737  ... -68.160934 -67.89871  -67.6216  ]
  [-81.28681  -81.36812  -81.433075 ... -68.14187  -67.87999  -67.60322 ]
  [-81.24404  -81.32451  -81.388725 ... -68.122734 -67.86119  -67.58477 ]
  ...
  [ 48.83609   48.95623   49.068466 ...  55.574165  55.631607  55.68979 ]
  [ 48.793053  48.913006  49.025063 ...  55.525955  55.58347   55.64173 ]
  [ 48.75002   48.86978   48.981663 ...  55.47783   55.53541   55.593754]]]
(1, 4173, 215)
