# Raster Histogram Plot

## <font color=green> This is my first program in Python!
## My main goal here is to explore libraries that allow me to plot the histogram of satellite imagery.

This is the first step toward my goal: implementing tools to handle satellite imagery and spatial data.

### The Algorithm

This program is divided into 3 code blocks:
- The first one imports the raster data and reads it. Some information about the data will be printed, such as the geotransform, data type, driver, number of bands, width, and height;
- The second one converts the data into a 2-D array and calculates the histogram. In this process, some statistics will be collected as well, such as the mean, variance, standard deviation, and min and max values;
- The last one plots the histogram.

The code below is separated into steps to make the algorithm easier to follow.

### <font color=blue> Step 01: <font color=orange> Import the libraries



In this program, each library has a purpose. I will use [GDAL](https://gdal.org/) to handle geospatial raster data (satellite imagery). [NumPy](https://numpy.org/doc/1.18/user/index.html) will be used to manipulate the data as an array, [Matplotlib](https://matplotlib.org/) will be used to plot the histogram, and [pandas](https://pandas.pydata.org/) will be used to display the array as a table.

In [10]:
import matplotlib.pyplot as plt
%matplotlib notebook
import numpy as np
import pandas as pd
from osgeo import gdal  # GDAL's Python bindings are provided by the osgeo package.

### <font color=blue> Step 02: <font color=orange> Import the raster data

In [28]:
# I will use the variable B5 to represent my dataset.
# In this case, the dataset is Band 5 of a Landsat 8 image of Brasilia, Brazil.
# This is my first attempt, so I will use a single band for now.

B5 = r"D:\BSB\LC08_L1TP_221071_20200411_20200411_01_RT_B5.tif"  # Raw string, so the backslashes are kept literally.

In [29]:
img = gdal.Open(B5) # The variable img represents the opened dataset.
#print(img.GetProjection(), img.RasterXSize, img.RasterYSize) # .profile belongs to rasterio; with GDAL, these return the CRS and the scene size.

# The metadata includes useful information such as the Coordinate Reference System (CRS) and the size (columns and rows) of the scene.
# This information gives us insight into the size of our analysis domain and some of its characteristics.
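
The metadata mentioned in the comments above can be gathered into a single dictionary. The following is a minimal sketch, not part of the original notebook: `describe_raster` is a hypothetical helper name, and it assumes its argument is a dataset returned by `gdal.Open`, reading only standard GDAL dataset and band attributes.

```python
def describe_raster(ds):
    """Collect basic metadata from an opened GDAL dataset.

    Hypothetical helper for illustration; `ds` is assumed to be the
    object returned by gdal.Open().
    """
    band = ds.GetRasterBand(1)  # GDAL band indices start at 1, not 0.
    return {
        "driver": ds.GetDriver().ShortName,     # e.g. "GTiff" for GeoTIFF files
        "width": ds.RasterXSize,                # number of columns
        "height": ds.RasterYSize,               # number of rows
        "bands": ds.RasterCount,                # number of bands in the file
        "geotransform": ds.GetGeoTransform(),   # origin, pixel size, rotation
        "projection": ds.GetProjection(),       # CRS as a WKT string
        "dtype_code": band.DataType,            # integer GDAL data type code
        "nodata": band.GetNoDataValue(),        # None if no nodata value is set
    }
```

With the dataset opened in this step, `describe_raster(img)` would return the values for the Band 5 scene. The integer in `dtype_code` can be turned into a readable name with `gdal.GetDataTypeName`.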

### <font color=blue> Step 03: <font color=orange> Convert the raster into a 2-D array

In [30]:
# The conversion uses the .ReadAsArray() method from the GDAL library,
# so the whole scene is read as an array of numbers.

img_array = img.ReadAsArray()

Let's check the data type to make sure the dataset has been converted properly.

In [31]:
type(img) # Here, the dataset is a raster

osgeo.gdal.Dataset

In [32]:
type(img_array) # Here, the dataset is a NumPy array

numpy.ndarray

Now we have an array of numbers that represent different values on the gray scale, and we can use it to visualize the count of pixels for each value.
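
The statistics listed in the overview (mean, variance, standard deviation, min and max) can be collected directly from the array with NumPy. This is a minimal sketch on a small made-up array that stands in for `img_array`. The name `summarize` and the sample values are assumptions for illustration only.

```python
import numpy as np


def summarize(arr):
    """Return basic statistics of a 2-D array as plain Python numbers."""
    return {
        "min": int(arr.min()),
        "max": int(arr.max()),
        "mean": float(arr.mean()),
        "variance": float(arr.var()),  # population variance (ddof=0)
        "std": float(arr.std()),       # square root of the variance above
    }


# Made-up 16-bit values; the real img_array would be passed in the same way.
sample = np.array([[0, 0, 10],
                   [20, 30, 40]], dtype=np.uint16)

stats = summarize(sample)
print(stats)
```

Calling `summarize(img_array)` in the notebook would compute the same values for the whole scene. Note that any zero-valued border pixels would be included in these numbers.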

Let's plot the histogram and see the magic!

In [57]:
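(n, bins) = np.histogram(img_array, bins=256)
plt.plot(bins[:-1], n) # Plot the counts against the left bin edges so the x-axis shows pixel values instead of bin indices.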
plt.title("Image Histogram")
plt.xlabel("Pixel Value")
plt.ylabel("Count")
plt.show()
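
`np.histogram` returns two arrays: the counts per bin and the bin edges, with one more edge than there are counts. The sketch below shows this on a small made-up array. It also drops zero-valued pixels before counting, on the assumption that zeros are fill (no-data) pixels around the scene rather than real measurements. That assumption should be checked against the dataset before applying it to `img_array`.

```python
import numpy as np

# Made-up values standing in for img_array; zeros play the role of fill pixels.
sample = np.array([[0, 0, 0, 5],
                   [5, 6, 9, 10]], dtype=np.uint16)

# Assumption: a value of 0 means "no data", so keep only positive pixels.
valid = sample[sample > 0]

# Five equal-width bins over the range 0..10.
counts, edges = np.histogram(valid, bins=5, range=(0, 10))

# Bin centers are a convenient x-axis for plotting the counts.
centers = (edges[:-1] + edges[1:]) / 2

print(counts)   # number of pixels in each bin
print(edges)    # bin boundaries, one more than the number of bins
print(centers)
```

Plotting `counts` against `centers` (or against `edges[:-1]`) puts actual pixel values on the x-axis. Without the mask, a large zero-filled border would dominate the first bin and flatten the rest of the curve.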


In [36]:
pd.DataFrame(img_array)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,7561,7562,7563,7564,7565,7566,7567,7568,7569,7570
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7676,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7677,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7678,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7679,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
