The objective of this notebook is to analyze the data to gain insights and better understand how the different files could be used.

***Import necessary libraries***

In [1]:
from rasterio.plot import show
import numpy as np
import matplotlib.pyplot as plt
import rasterio
import rasterio.features

Let's define the file path variables.

In [6]:
class_ID = 'D:/General/ExaplAInability_Data/transfer_6060512_files_e989f8bb/class_segment_polygon/class_ID_2020.tif'
segment_ID = 'D:/General/ExaplAInability_Data/transfer_6060512_files_e989f8bb/class_segment_polygon/segment_ID_2020.tif'
polygon_ID = 'D:/General/ExaplAInability_Data/transfer_6060512_files_e989f8bb/class_segment_polygon/polygon_ID.tif'

# Open the files

In [7]:
# Let's try with segment_ID: read the first band as a 2D array
with rasterio.open(segment_ID) as dataset:
    image = dataset.read(1)
image[0]  # first row of the raster

array([   1,    1,    1, ..., 1251, 1251, 1251], dtype=uint32)

In [8]:
# Try with polygon_ID: read() without a band index returns a 3D array (bands, rows, columns)
with rasterio.open(polygon_ID) as dataset2:
    image2 = dataset2.read()
image2

array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]])

# Analyze the files

Now let's get a deeper understanding of each file.

In [9]:
def open_raster(raster_path):
    # Open the raster; the dataset is returned open, so the caller should close it
    class_raster = rasterio.open(raster_path)

    # Print the raster metadata
    print(class_raster.descriptions)  # Band descriptions
    print(class_raster.bounds)  # Extent in CRS units
    print(class_raster.crs)  # Coordinate reference system
    print(class_raster.count)  # Number of bands
    print(class_raster.width, class_raster.height)  # Dimensions
    print(class_raster.transform)  # Affine transformation matrix

    return class_raster

In [18]:
class_raster = open_raster(polygon_ID)

('polygon_ID',)
BoundingBox(left=402570.0, bottom=1212060.0, right=454400.0, top=1257190.0)
EPSG:32630
1
5183 4513
| 10.00, 0.00, 402570.00|
| 0.00,-10.00, 1257190.00|
| 0.00, 0.00, 1.00|
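
The affine transform printed above maps pixel indices to map coordinates: each pixel is 10 m wide in EPSG:32630 (UTM zone 30N), and the origin is the upper-left corner. As a minimal sketch, the mapping can be written by hand with the values copied from the output; the helper name `pixel_to_map` is hypothetical and only used for illustration.

```python
# Values copied from the metadata printed above.
left, top = 402570.0, 1257190.0   # upper-left corner of the raster
pixel_size = 10.0                 # metres per pixel
width, height = 5183, 4513        # columns, rows


def pixel_to_map(row, col):
    """Return the map coordinates (x, y) of the upper-left corner of a pixel."""
    x = left + col * pixel_size
    y = top - row * pixel_size  # y decreases as the row index grows
    return x, y


# The far corner of the grid should coincide with (right, bottom) of the bounds.
right, bottom = pixel_to_map(height, width)
print(right, bottom)  # 454400.0 1212060.0
```

The result matches the `BoundingBox` above, which confirms that the width, height and transform are consistent. rasterio datasets also expose an `xy(row, col)` method, which by default returns the pixel centre rather than the corner.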


Let's create a function that gives more insight into the data.

In [None]:
def plot_Raster_info(class_raster):
    # Read the first band once instead of re-reading it for every statistic
    band = class_raster.read(1)
    nan_mask = np.isnan(band)
    valid = band[~nan_mask]
    nan_count = np.count_nonzero(nan_mask)

    # Statistics over valid pixels only; plain min/max/mean return nan when nan is present
    print("Minimum value:", valid.min())
    print("Maximum value:", valid.max())
    print("Mean value:", valid.mean())
    # count all nan values
    print("Number of nan values:", nan_count)
    # print percentage of nan values
    print("Percentage of nan values:", nan_count / band.size * 100)
    # print the unique values and how often each one occurs
    unique_values, counts = np.unique(band, return_counts=True)
    print("Number of unique values:", unique_values)
    print("Frequency of unique values:", (unique_values, counts))
    # count number of not nan values
    print("Number of not nan values:", valid.size)
    # Histogram of valid pixels only; nan values break the automatic bin range
    plt.hist(valid, bins=100, color='blue', alpha=0.7)
    plt.xlabel('Pixel Value')
    plt.ylabel('Frequency')
    plt.title('Histogram of Pixel Values')
    plt.show()

The function above reports, among other things, the following:  

* **For the `Class` element**  
Percentage of nan values: **99.6581488023601**  
Number of unique values: [ 1.  2.  3.  4.  5.  6.  7.  8. nan]

* **For the `Segment` element**  
Percentage of nan values: **0.0**  
Number of unique values: [      1       2       3 ... 1447916 1447917 1447918]

* **For the `Polygon` element**  
Percentage of nan values: **99.6581488023601**  
Number of unique values: [  0. ... 997. nan]
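
These figures can be reproduced on a small made-up array, which also shows why NaN-aware reductions matter for rasters that are mostly empty: a plain `min()` propagates NaN, whereas `np.nanmin` skips it. The array below is invented for illustration and does not come from the data files.

```python
import numpy as np

# Made-up 3x4 raster: NaN marks pixels outside any polygon.
toy = np.array([
    [np.nan, np.nan, 1.0, 1.0],
    [np.nan, np.nan, np.nan, 2.0],
    [np.nan, np.nan, np.nan, np.nan],
], dtype=np.float32)

nan_mask = np.isnan(toy)
nan_percentage = 100.0 * nan_mask.sum() / toy.size

# Plain reductions propagate NaN; the nan* variants skip it.
plain_min = toy.min()
safe_min = np.nanmin(toy)
safe_max = np.nanmax(toy)

# Unique labels and their pixel counts, computed on valid pixels only.
labels, counts = np.unique(toy[~nan_mask], return_counts=True)

print(nan_percentage, plain_min, safe_min, safe_max)
print(dict(zip(labels.tolist(), counts.tolist())))
```

Here 9 of the 12 pixels are NaN, so the reported percentage is 75.0, and the two labels occur on 2 pixels and 1 pixel respectively.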

The value distributions are as follows:  
* **For the `Class` element**  
<img src="assets\class_md.png" width=500>  

* **For the `Segment` element**  
<img src="assets\segment_md.png" width=500>

* **For the `Polygon` element**  
<img src="assets\polygon_md.png" width=500>

# Conclusion on data

As we understand it, each of these files can be interpreted as a **filter/layer** to be applied to the **raster images** that we will deal with later on.  
For each pixel, they give its `class`, its `segment` and its `polygon`.  

The segments **cover the whole map**: every pixel belongs to a segment (0% nan values).  
That is not the case for polygons, which are **bigger elements** than segments and each belong to a class.  
The polygons are scattered across the map and **do not cover it entirely**: about 99.66% of the pixels belong to no polygon.  
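
The class and polygon layers report the same NaN percentage, which suggests (but does not prove) that they are defined on the same pixels. A minimal sketch of how this could be checked, using made-up arrays in place of the real rasters; the names `class_layer` and `polygon_layer` are hypothetical.

```python
import numpy as np

# Made-up 2x3 layers standing in for the class and polygon rasters.
class_layer = np.array([[np.nan, 3.0, 3.0],
                        [np.nan, np.nan, 5.0]])
polygon_layer = np.array([[np.nan, 12.0, 12.0],
                          [np.nan, np.nan, 40.0]])

# True when both layers are defined on exactly the same pixels.
same_footprint = np.array_equal(np.isnan(class_layer), np.isnan(polygon_layer))

# Collect the set of classes seen inside each polygon;
# a polygon with a single class yields a one-element set.
valid = ~np.isnan(polygon_layer)
classes_per_polygon = {}
for poly_id, cls in zip(polygon_layer[valid].tolist(), class_layer[valid].tolist()):
    classes_per_polygon.setdefault(poly_id, set()).add(cls)

print(same_footprint)
print(classes_per_polygon)
```

On the real data, the same comparison would be applied to the first band of `class_ID` and `polygon_ID`.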

The `classification` is defined on **polygons**. Since we want to work **segment by segment**, we need to extract the segments that lie inside polygons (all polygons are classified).  
Because a segment may be only partly inside a polygon, we will have to choose a strategy for those cases in the next notebook.
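
To make the partial-overlap problem concrete, here is a minimal sketch that measures, for each segment, the fraction of its pixels that fall inside a polygon. The arrays are made up, and the 0.5 threshold is a hypothetical cut-off chosen for illustration only; it is not necessarily the strategy adopted in the next notebook.

```python
import numpy as np

# Made-up 4x4 layers. Segments cover every pixel; polygons are NaN outside.
segments = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
], dtype=np.uint32)
polygons = np.array([
    [7.0, 7.0, 7.0, np.nan],
    [7.0, 7.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
])

inside = ~np.isnan(polygons)  # True where a pixel belongs to some polygon

# Pixel count per segment, overall and restricted to polygon pixels.
seg_ids, total = np.unique(segments, return_counts=True)
ids_inside, n_inside = np.unique(segments[inside], return_counts=True)

# Fraction of each segment covered by a polygon (0.0 when never inside).
totals = dict(zip(seg_ids.tolist(), total.tolist()))
coverage = {seg_id: 0.0 for seg_id in totals}
for seg_id, n in zip(ids_inside.tolist(), n_inside.tolist()):
    coverage[seg_id] = n / totals[seg_id]

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the notebook
kept = [seg_id for seg_id, frac in coverage.items() if frac > THRESHOLD]

print(coverage)  # {1: 1.0, 2: 0.25, 3: 0.0, 4: 0.0}
print(kept)      # [1]
```

Segment 1 is fully inside the polygon and is kept, while segment 2 only overlaps it on one pixel out of four and is dropped under this threshold.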