# Download and convert to Zarr
This notebook downloads SWOT Pixel Cloud products from hydroweb.next (an API key is required) for a region and a period of interest.
It then extracts the data located within your area of interest and stores them in a Zarr database (based on the zcollection package) for future use.
Zarr, combined with the way the data are partitioned with zcollection, is very efficient for computation. However, unlike GeoPackage, it is not (yet) compatible with QGIS.


## Setting the region and period of interest
A GeoPackage layer, created beforehand with, e.g., QGIS, defines the area of interest and limits both the data download and the extent of the database.

In [1]:
import pixcdust
from pixcdust.downloaders.hydroweb_next import PixCDownloader
import geopandas as gpd
from datetime import datetime

In [2]:
# reading the area of interest polygon
gdf_geom = gpd.read_file("../data/aoi.gpkg")

dates = (
    datetime(2023,4,6),
    datetime(2023,4,8),
)

## Download
Unfortunately, this step downloads many large files (they can be removed once the database is created). This is currently the only way, but the hydroweb.next team is working on improving it.
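Because this step can write many large files, it may be worth checking the free disk space first. The sketch below uses only the standard library; the `has_free_space` helper and the 20 GiB threshold are assumptions made for illustration and are not part of pixcdust.

```python
import shutil
from pathlib import Path


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    # Walk up to the nearest existing parent, since the download folder may not exist yet.
    target = Path(path).resolve()
    while not target.exists():
        target = target.parent
    free_gib = shutil.disk_usage(target).free / 1024**3
    return free_gib >= required_gib


# Hypothetical threshold: adjust it to the size of your region and period.
if not has_free_space("/tmp/pixc", required_gib=20):
    print("Warning: less than 20 GiB free, the download may fill the disk.")
```

The check is only indicative, since the total download size is not known in advance.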

In [3]:
pixcdownloader = PixCDownloader(
    gdf_geom,
    dates,
    verbose=1,
    path_download='/tmp/pixc',
    )
pixcdownloader.search_download()


## Extraction
Now that we have all the necessary files, let us extract key variables within the area of interest into a Zarr (zcollection) database.
This partitioned Zarr format is very efficient for time analysis, but it is not currently readable by GIS software such as QGIS.
We use the same GeoDataFrame to limit the data to the area of interest.
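To give an intuition of why a time-partitioned store helps temporal queries, here is a toy sketch in plain Python. It does not use zcollection and does not reproduce its on-disk layout: the one-partition-per-day scheme, the `query` helper and the sample values are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date, datetime

# Toy records: (acquisition time, height).
records = [
    (datetime(2023, 4, 6, 9, 46, 18), 205.69),
    (datetime(2023, 4, 6, 9, 46, 18), 201.01),
    (datetime(2023, 4, 7, 9, 36, 56), 168.65),
]

# Assumed partitioning scheme: one partition per calendar day.
partitions = defaultdict(list)
for time, height in records:
    partitions[time.date()].append((time, height))


def query(start: date, end: date):
    """Return the records of every partition whose day lies in [start, end]."""
    selected = []
    for day in sorted(partitions):
        if start <= day <= end:  # partitions outside the period are skipped entirely
            selected.extend(partitions[day])
    return selected


# Only the partition for 7 April is read here.
print(len(query(date(2023, 4, 7), date(2023, 4, 7))))
```

The point of the design is that a query over a short period only touches the partitions covering that period, whatever the total size of the database.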

In [4]:
from pixcdust.converters.zarr import Nc2ZarrConverter
from glob import glob

In [5]:
pixc = Nc2ZarrConverter(
    path_in=glob(pixcdownloader.path_download + '/*/*nc'),
    variables=['height', 'sig0', 'classification'],
    area_of_interest=gdf_geom,
)
pixc.database_from_nc(path_out='/tmp/pixc_zarr')

Perhaps you already have a cluster running?
Hosting the HTTP server on port 34147 instead
Exception ignored in: <function CachingFileManager.__del__ at 0x7a90c1840b80>
Traceback (most recent call last):
  File "/home/vschaffn/Documents/swot_pixc_study/pixc-env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "/home/vschaffn/Documents/swot_pixc_study/pixc-env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.

The database has been successfully created, so we can remove the raw files.

In [6]:
# import shutil
# shutil.rmtree('/tmp/pixc')
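Before deleting the raw files, a quick check that the Zarr database is actually there can avoid losing the downloads by mistake. The sketch below is a minimal, standard-library illustration; `remove_raw_files` is a hypothetical helper, not a pixcdust function, and a non-empty directory is only a cheap sanity check, not proof that the conversion completed.

```python
import shutil
from pathlib import Path


def remove_raw_files(raw_dir: str, zarr_dir: str) -> bool:
    """Delete `raw_dir` only if `zarr_dir` exists and is not empty."""
    zarr_path = Path(zarr_dir)
    if not zarr_path.is_dir() or not any(zarr_path.iterdir()):
        # The converted database is missing or empty: keep the raw files.
        return False
    shutil.rmtree(raw_dir, ignore_errors=True)
    return True


# Paths used earlier in this notebook; uncomment to actually delete the raw files.
# remove_raw_files('/tmp/pixc', '/tmp/pixc_zarr')
```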

# Read the database
The previous steps are not necessary if the database already exists.

Now we can open this database as an xarray dataset, a DataFrame, or a GeoDataFrame.

In [7]:
from pixcdust.readers.zarr import ZarrReader
# import the class rather than the module, so that `datetime` keeps the meaning it has in cell 2
from datetime import datetime

pixc_read = ZarrReader(
    "/tmp/pixc_zarr"
)
pixc_read.read((datetime(2023,4,6), datetime(2023,4,8)))
pixc_read.data

Perhaps you already have a cluster running?
Hosting the HTTP server on port 33495 instead


Output: an xarray dataset of nine dask-backed variables, each of shape (1519115,) split into 2 chunks of at most (763918,) elements.

Data type,Number of variables,Array size,Chunk size
uint16,3,2.90 MiB,1.46 MiB
float32,5,5.79 MiB,2.91 MiB
datetime64[ns],1,11.59 MiB,5.83 MiB


In [8]:
gdf_pixc = pixc_read.to_geodataframe()
gdf_pixc



points,tile_number,sig0,pass_number,longitude,height,time,cycle_number,classification,latitude
0,78,4.979378,16,1.504697,205.690842,2023-04-06 09:46:18,482,1.0,43.517670
1,78,1.716195,16,1.503559,201.013336,2023-04-06 09:46:18,482,1.0,43.517670
2,78,1.432429,16,1.503695,208.560715,2023-04-06 09:46:18,482,1.0,43.517693
3,78,7.839147,16,1.503859,205.756393,2023-04-06 09:46:18,482,3.0,43.517723
4,78,4.303982,16,1.504143,206.650543,2023-04-06 09:46:18,482,3.0,43.517773
...,...,...,...,...,...,...,...,...,...
1519110,78,0.414571,16,1.380191,168.646057,2023-04-07 09:36:56,483,1.0,43.686825
1519111,78,0.441882,16,1.380251,169.712540,2023-04-07 09:36:56,483,1.0,43.686836
1519112,78,0.232018,16,1.380750,171.547318,2023-04-07 09:36:56,483,1.0,43.686520
1519113,78,0.277492,16,1.380788,176.256485,2023-04-07 09:36:56,483,1.0,43.686527
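As a small example of what can be done with such a table, the sketch below computes the mean height per cycle. To stay runnable without the database, it uses only the standard library and the nine rows displayed above (reduced to three columns); on the real GeoDataFrame, `gdf_pixc.groupby("cycle_number")["height"].mean()` should give the equivalent result over all pixels.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# The rows displayed above, reduced to the columns used here.
sample = """points,height,cycle_number,classification
0,205.690842,482,1.0
1,201.013336,482,1.0
2,208.560715,482,1.0
3,205.756393,482,3.0
4,206.650543,482,3.0
1519110,168.646057,483,1.0
1519111,169.712540,483,1.0
1519112,171.547318,483,1.0
1519113,176.256485,483,1.0
"""

# Group the heights by cycle number.
heights_by_cycle = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    heights_by_cycle[int(row["cycle_number"])].append(float(row["height"]))

# Average each group and report it together with the number of pixels used.
mean_height = {cycle: mean(values) for cycle, values in heights_by_cycle.items()}
for cycle, value in sorted(mean_height.items()):
    n_pixels = len(heights_by_cycle[cycle])
    print(f"cycle {cycle}: mean height {value:.2f} over {n_pixels} pixels")
```

These nine rows are only the head and tail of the table, so the values are not representative of the whole area of interest.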


Enjoy!