### Accessing climate and weather data files in different formats from AWS S3 buckets
  - GRIB
  - NetCDF
  - CSV

Reference: https://climatedataguide.ucar.edu/climate-tools/common-climate-data-formats-overview

### Datasets on AWS used in this notebook:
  - Global Historical Climatology Network (GHCN) https://registry.opendata.aws/noaa-ghcn/
  - GOES-16, GOES-17 & GOES-18 https://registry.opendata.aws/noaa-goes/
  - Climate Forecast System (CFS) https://registry.opendata.aws/noaa-cfs/
  - Global Forecast System (GFS) https://registry.opendata.aws/noaa-gfs-bdp-pds/
  - NREL National Solar Radiation Database (NSRDB) https://registry.opendata.aws/nrel-pds-nsrdb/
  
When a dataset is described as being hosted on AWS, find its bucket name and append it to the following URL:<br>
https://s3.console.aws.amazon.com/s3/buckets/<br>
For example, for the "noaa-gfs-bdp-pds" bucket used in this notebook,<br> the URL will be: https://s3.console.aws.amazon.com/s3/buckets/noaa-gfs-bdp-pds<br> There you can browse the objects in the bucket and copy each object's S3 URI.
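
The bucket name is the first component of an S3 URI, so the console URL can also be derived from any URI used in this notebook. A minimal sketch using only the standard library; the helper name `console_url` is made up for illustration:

```python
from urllib.parse import urlparse

CONSOLE_PREFIX = "https://s3.console.aws.amazon.com/s3/buckets/"

def console_url(s3_uri_or_bucket):
    """Return the S3 console URL for a bucket name or an s3:// URI."""
    # Drop an fsspec cache prefix such as "simplecache::" if present.
    target = s3_uri_or_bucket.split("::")[-1]
    parsed = urlparse(target)
    # For "s3://bucket/key" the bucket is the netloc; a bare name has no scheme.
    bucket = parsed.netloc if parsed.scheme == "s3" else target.strip("/")
    return CONSOLE_PREFIX + bucket

print(console_url("noaa-gfs-bdp-pds"))
print(console_url("s3://noaa-gfs-bdp-pds/sst.20220627/rtgssthr_grb_0.083.grib2"))
```

Both calls print the same console URL for the "noaa-gfs-bdp-pds" bucket.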

## .grib2, .grb2, .grib files

The only way I have found to read a GRIB file from an S3 bucket is to cache it locally with `fsspec` and then open it with `xarray` using the `cfgrib` engine.

Reference Documentation: https://stackoverflow.com/questions/66229140/xarray-read-remote-grib-file-on-s3-using-cfgrib

In [None]:
import xarray as xr
import fsspec
import json
import s3fs 

# Prefix the S3 URI with "simplecache::" so that fsspec downloads the GRIB file to a local cache
GFS_Remote_Grib_S3_URI = "simplecache::s3://noaa-gfs-bdp-pds/sst.20220627/rtgssthr_grb_0.083.grib2"
GFS_My_Grib_S3_URI = "simplecache::s3://justindemo123/rtgssthr_grb_0.083.grib2"
CFS_Grib_S3_URI = "simplecache::s3://noaa-cfs-pds/cdas.20240308/cdas1.t00z.sfluxgrbl02.grib2"

In [None]:
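# cfgrib needs a local file, so open_local downloads the object into the cache and returns its path.
# Options are keyed by protocol name, so the cache options go under "simplecache" to match the URI prefix.
Remote_file = fsspec.open_local(GFS_Remote_Grib_S3_URI,
                                s3={'anon': True},  # public bucket, no credentials needed
                                simplecache={'cache_storage': '/tmp/files'})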

GFS_Remote_Grib_ds = xr.open_dataset(Remote_file, engine="cfgrib")

GFS_Remote_Grib_ds

In [None]:
# An access key and secret access key are needed to read data stored in your own S3 bucket.
# Replace the placeholder strings below with your own credentials.
My_file = fsspec.open_local(GFS_My_Grib_S3_URI,
                            s3={"key": "<your access key>",
                                "secret": "<your secret access key>"},
                            simplecache={'cache_storage': '/tmp/files'})

GFS_My_Grib_ds = xr.open_dataset(My_file, engine="cfgrib")

GFS_My_Grib_ds

In [None]:
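CFS_file = fsspec.open_local(CFS_Grib_S3_URI,
                             s3={'anon': True},
                             simplecache={'cache_storage': '/tmp/files'})

# filter_by_keys restricts cfgrib to the GRIB messages matching these keys
# (here: averaged fields at the surface level)
CFS_Grib_ds = xr.open_dataset(CFS_file,
                              filter_by_keys={'stepType': 'avg', 'typeOfLevel': 'surface'},
                              engine="cfgrib")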

CFS_Grib_ds
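
The three cells above repeat the same pattern: a "simplecache::" prefix on the URI, plus one options dictionary per layer. As far as I understand fsspec's URL chaining, each dictionary is looked up by its protocol name (`s3`, `simplecache`). The sketch below only builds those arguments and does not touch the network; the helper name `cached_open_args` and its defaults are assumptions for illustration, not part of fsspec.

```python
def cached_open_args(s3_uri, anon=True, cache_dir="/tmp/files"):
    """Build the chained URI and keyword arguments for fsspec.open_local."""
    prefix = "simplecache::"
    # Add the cache prefix only if it is not already there.
    chained = s3_uri if s3_uri.startswith(prefix) else prefix + s3_uri
    kwargs = {
        "s3": {"anon": anon},                         # options for the S3 layer
        "simplecache": {"cache_storage": cache_dir},  # options for the cache layer
    }
    return chained, kwargs

uri, kwargs = cached_open_args(
    "s3://noaa-cfs-pds/cdas.20240308/cdas1.t00z.sfluxgrbl02.grib2"
)
print(uri)
print(kwargs)
# With fsspec and s3fs installed, the result can be passed on like this:
#   local_path = fsspec.open_local(uri, **kwargs)
```

For a private bucket, replace the `s3` entry with a dictionary holding your `key` and `secret`, as in the second cell above.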

## .nc, .h5 files

In [None]:
GFS_NC_S3_URI = "s3://noaa-gfs-bdp-pds/enkfgdas.20210325/12/atmos/mem073/gdas.t12z.sfcf003.nc"
NSRDB_H5_S3_URI = "s3://nrel-pds-nsrdb/philippines/philippines_2017.h5"
GOES_NC_S3_URI = 's3://noaa-goes18/SEIS-L1b-EHIS/2024/003/02/OR_SEIS-L1b-EHIS_G18_s20240030204360_e20240030209350_c20240030210374.nc'

In [None]:
fs = s3fs.S3FileSystem(anon=True)

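# xarray reads variables lazily and the file closes when this block ends,
# so load any values you need inside the block.
with fs.open(GFS_NC_S3_URI) as fileObj:
    GFS_nc_ds = xr.open_dataset(fileObj, engine='h5netcdf')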

GFS_nc_ds

In [None]:
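# phony_dims lets h5netcdf assign generated dimension names to a plain HDF5 file
# whose datasets have no netCDF dimension scales
with fs.open(NSRDB_H5_S3_URI) as fileObj:
    NSRDB_h5_ds = xr.open_dataset(fileObj, backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')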

NSRDB_h5_ds

In [None]:
with fs.open(GOES_NC_S3_URI) as fileObj:
    goes_nc_ds = xr.open_dataset(fileObj, engine='h5netcdf')

goes_nc_ds

## .csv file

In [None]:
GHCN_csv_S3_URI = "s3://noaa-ghcn-pds/csv/by_year/2024.csv"

In [None]:
import pandas as pd

# The bucket is public, so request anonymous access explicitly
GHCN_csv_S3_data = pd.read_csv(GHCN_csv_S3_URI, storage_options={"anon": True})
GHCN_csv_S3_data
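
The three sections of this notebook amount to a choice of reader by file extension. A minimal sketch of that mapping, using only the standard library; the names `READERS` and `pick_reader` are made up for illustration, and the table only reflects the files tried in this notebook (for example, `h5netcdf` is assumed here because these `.nc` files are HDF5-based):

```python
import posixpath

# Extension -> (function used in this notebook, xarray engine or None)
READERS = {
    ".grib2": ("xarray.open_dataset", "cfgrib"),
    ".grb2": ("xarray.open_dataset", "cfgrib"),
    ".grib": ("xarray.open_dataset", "cfgrib"),
    ".nc": ("xarray.open_dataset", "h5netcdf"),
    ".h5": ("xarray.open_dataset", "h5netcdf"),
    ".csv": ("pandas.read_csv", None),
}

def pick_reader(s3_uri):
    """Return the (function name, engine) pair for the extension of an S3 URI."""
    ext = posixpath.splitext(s3_uri)[1].lower()
    try:
        return READERS[ext]
    except KeyError:
        raise ValueError(f"no reader configured for extension {ext!r}") from None

print(pick_reader("s3://noaa-gfs-bdp-pds/sst.20220627/rtgssthr_grb_0.083.grib2"))
print(pick_reader("s3://nrel-pds-nsrdb/philippines/philippines_2017.h5"))
print(pick_reader("s3://noaa-ghcn-pds/csv/by_year/2024.csv"))
```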