## Import Libraries 
Before you start any analysis, import all the libraries that the analysis will need.

In [1]:
%matplotlib inline
import datacube
import odc.algo
import matplotlib.pyplot as plt
from datacube.utils.cog import write_cog
from deafrica_tools.plotting import display_map, rgb

In [2]:
dc = datacube.Datacube(app="06_Basic_Analysis", config="/etc/datacube.conf")

In [3]:
# Set the central latitude and longitude
central_lat = 0.5393
central_lon = 36.2682

# Set the buffer to load around the central coordinates
buffer = 0.5

# Compute the bounding box for the study area
study_area_lat = (central_lat - buffer, central_lat + buffer)
study_area_lon = (central_lon - buffer, central_lon + buffer)

In [4]:
display_map(x=study_area_lon, y=study_area_lat)


## Step 2: Loading data

When asking analysis questions about vegetation, it's useful to work with optical imagery, such as Landsat.
The Landsat satellites have 30 metre resolution, and the archive used here goes back to 1982, covering Landsat 4 through Landsat 9.

The code below sets up the required information to load the data.

In [5]:
# Set the data source - the Landsat product to load
set_product = "landsat_sr_kenya"



In [6]:
# Set the date range to load data over
set_time = ("2018-01-01", "2022-02-01")



In [7]:
# Set the measurements/bands to load
# For this analysis, we'll load the red, green, blue and near-infrared bands
set_measurements = [
    "red",
    "blue",
    "green",
    "nir"
]


In [8]:
# Set the coordinate reference system and output resolution
set_crs = 'EPSG:4326'
set_resolution = (-0.0002, 0.0002)
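A resolution given in degrees is hard to picture, so the sketch below translates the settings above into an approximate pixel size in metres and a rough size for each loaded time step. It assumes that one degree spans about 111.32 km near the equator and that each value is stored as a 2-byte `uint16`; the variable names are illustrative only, and the grid that `dc.load` actually returns may differ by a pixel or so.

```python
# Values copied from the cells above
buffer = 0.5             # degrees added on each side of the central point
pixel_size_deg = 0.0002  # absolute value of the resolution setting

# Assumption: one degree spans roughly 111.32 km near the equator
METRES_PER_DEGREE = 111_320

# Approximate ground size of one pixel
pixel_size_m = pixel_size_deg * METRES_PER_DEGREE

# The study area spans 2 * buffer degrees in each direction
pixels_per_side = round(2 * buffer / pixel_size_deg)

# Four bands (red, green, blue, nir), assumed to be 2 bytes per value
n_bands = 4
bytes_per_value = 2
bytes_per_time_step = pixels_per_side ** 2 * n_bands * bytes_per_value

print(f"Pixel size: about {pixel_size_m:.1f} m")
print(f"Grid: {pixels_per_side} x {pixels_per_side} pixels")
print(f"Per time step: about {bytes_per_time_step / 1e6:.0f} MB")
```

If the estimate is larger than the memory you have available, a smaller `buffer` or a shorter date range reduces the amount of data loaded.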

In [None]:
dataset = dc.load(
    product=set_product,
    x=study_area_lon,
    y=study_area_lat,
    time=set_time,
    measurements=set_measurements,
    output_crs=set_crs,
    resolution=set_resolution
)


In [5]:
print(dataset)

## Step 3: Plotting data

After loading the data, it is useful to view it to understand the resolution, which observations are impacted by cloud cover, and whether there are any obvious differences between time steps.

We use the `rgb()` function to plot the data loaded in the previous step.
The `rgb()` function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.
There are several parameters you can experiment with:

* `time_step=n`\
This sets the time step you want to view. 
`n` can be any number from `0` to one fewer than the number of time steps you loaded. 
The number of time steps loaded is given in the print-out of the data, under the `Dimensions` heading. 
As an example, if under `Dimensions:` you see `time: 10`, then there are 10 time steps, and `time_step` can be any number between `0` and `9`.

* `bands=[red_channel, green_channel, blue_channel]`\
This sets the measurements that you want to use to make the image.
Any measurements can be mapped to the three channels, and different combinations highlight different features.
Two common combinations are:
    * true colour: `bands = ["red", "green", "blue"]`
    * false colour: `bands = ["nir", "red", "green"]`

 There are several false colour composites; which one to use depends on the application and the spectral resolution of the sensor.


In [None]:
# Set the time step to view
time_step = 0

In [None]:
# Set the band combination to plot (RGB)
bands = ["red", "green", "blue"]

# Generate the image by running the rgb function
rgb(dataset, bands=bands, index=time_step, size=8)

# Format the time stamp for use as the plot title
time_string = str(dataset.time.isel(time=time_step).values).split('.')[0]  

# Set the title and axis labels
ax = plt.gca()
ax.set_title(f"Timestep {time_string}", fontweight='bold', fontsize=16)
# The data was loaded in EPSG:4326, so the axes are in degrees
ax.set_xlabel('Longitude (degrees)', fontweight='bold')
ax.set_ylabel('Latitude (degrees)', fontweight='bold')

# Display the plot
plt.show()
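The title formatting in the cell above relies on how NumPy prints time stamps. The sketch below repeats that step on a made-up time stamp (not one taken from the loaded data) to show what the `split('.')` call removes.

```python
import numpy as np

# Hypothetical acquisition time, used only for illustration
timestamp = np.datetime64("2018-01-05T07:45:12.123456000")

# The full string includes fractional seconds
full_string = str(timestamp)

# Splitting on the decimal point keeps everything up to whole seconds
time_string = full_string.split(".")[0]

# Splitting on "T" instead keeps only the date
date_string = full_string.split("T")[0]

print(time_string)
print(date_string)
```

Using `date_string` in the title gives a shorter label when the time of day is not of interest.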

## Step 4: Calculate vegetation health

While it's possible to identify vegetation in the RGB image, it can be helpful to have a quantitative index to describe the health of vegetation directly. 

In this case, the [Normalised Difference Vegetation Index](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) (NDVI) can help identify areas of healthy vegetation.
For remote sensing data such as satellite imagery, it is defined as

$$
\begin{aligned}
\text{NDVI} & = \frac{(\text{NIR} - \text{Red})}{(\text{NIR} + \text{Red})}, \\
\end{aligned}
$$

where $\text{NIR}$ is the near-infrared band of the data, and $\text{Red}$ is the red band.
NDVI can take on values from -1 to 1; high values indicate healthy vegetation and negative values indicate non-vegetation (such as water). 

The following code calculates the top and bottom of the fraction separately, then computes the NDVI value directly from these components.
The calculated NDVI values are stored as their own data array.

> Note: Before we calculate NDVI, we need to convert the data type to `float32`. This converts the nodata values in the original `uint16` dataset to `NaN`, so those values are ignored in the NDVI calculation.

In [None]:
# convert dataset to float32 datatype so no-data values are set to NaN
dataset = odc.algo.to_f32(dataset)
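To see why the conversion matters, the sketch below uses plain NumPy on three made-up pixels. It is not how `odc.algo.to_f32` is implemented; the helper `to_float_with_nan` and the nodata value of `0` are assumptions made for illustration. Besides marking nodata as `NaN`, working in floating point also avoids the wrap-around that happens when a larger unsigned integer is subtracted from a smaller one.

```python
import numpy as np

# Assumption for illustration: 0 marks a pixel with no data
NODATA = 0

# Hypothetical raw values for three pixels; the last one has no data
nir = np.array([3000, 400, NODATA], dtype=np.uint16)
red = np.array([800, 900, NODATA], dtype=np.uint16)

# uint16 cannot hold negative numbers, so 400 - 900 wraps around
wrapped_diff = nir - red

def to_float_with_nan(band, nodata):
    """Convert a band to float32 and replace nodata values with NaN."""
    converted = band.astype(np.float32)
    converted[band == nodata] = np.nan
    return converted

nir_f = to_float_with_nan(nir, NODATA)
red_f = to_float_with_nan(red, NODATA)

# In float32 the difference is negative where expected, and NaN where there is no data
float_diff = nir_f - red_f

print(wrapped_diff)
print(float_diff)
```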

In [None]:
# Calculate the components that make up the NDVI calculation
band_diff = dataset.nir - dataset.red
band_sum = dataset.nir + dataset.red

# Calculate NDVI and store it as its own data array
ndvi = band_diff / band_sum
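To build intuition for the values this calculation produces, the sketch below applies the same formula to three single pixels. The reflectance values are invented for illustration and the function name `ndvi_value` is hypothetical; real surfaces vary widely.

```python
def ndvi_value(nir, red):
    """Return (NIR - Red) / (NIR + Red) for a single pixel."""
    return (nir - red) / (nir + red)

# Hypothetical reflectance values for three surface types
vegetation = ndvi_value(nir=0.50, red=0.08)  # strong NIR, weak red
bare_soil = ndvi_value(nir=0.30, red=0.25)   # NIR only slightly above red
water = ndvi_value(nir=0.02, red=0.05)       # NIR below red

print(f"vegetation: {vegetation:.2f}")
print(f"bare soil:  {bare_soil:.2f}")
print(f"water:      {water:.2f}")
```

The vegetation pixel scores closest to 1, the bare soil pixel sits just above 0, and the water pixel is negative, which matches the interpretation given above.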

After calculating the NDVI values, it is possible to plot them by adding the `.plot()` method to `ndvi` (the variable that the values are stored in).
The code below will plot a single image, based on the time selected with the `ndvi_time_step` variable.
Try changing this value to plot the NDVI map at different time steps.
Do you notice any differences?

> **Extension 1**: Sometimes, it is valuable to change the colour scale to something that helps with intuitively understanding the image.
For example, the "viridis" colour map shows high values in greens/yellows (mapping to vegetation), and low values in blue (mapping to water).
Try modifying the `.plot(cmap="RdYlGn")` command below to use `cmap="viridis"` instead.

In [None]:
# Set the NDVI time step to view
ndvi_time_step = 0

# This is the simple way to plot
# Note that high values are likely to be vegetation.
plt.figure(figsize=(8, 8))
ndvi.isel(time=ndvi_time_step).plot(cmap="RdYlGn", vmin=0, vmax=1)
plt.show()

> **Extension 2**: For the cell above, a single time step was selected using the `.isel()` method.
It is possible to plot all time steps by removing the `.isel()` method and changing the `.plot()` call to `.plot(col='time', col_wrap=3)`, which draws one panel per time step in rows of three.
Plotting all of the time steps at once may make it easier to notice differences in vegetation over time.

In [None]:
# Faceted plots create their own figure, so no plt.figure() call is needed
ndvi.plot(col='time', cmap="RdYlGn", vmin=0, vmax=1, col_wrap=3)
plt.show()

## Step 5: Exporting data

Sometimes, you will want to analyse satellite imagery in a GIS program, such as QGIS.
The `write_cog()` command from the Open Data Cube library allows loaded data to be exported to GeoTIFF, a commonly used file format for geospatial data. This example exports an image based on the `time_step` provided.

> **Note**: the saved file will appear in the same directory as this notebook, and it can be downloaded from there for later use.