## Chapter 6: Quantiles

0. Introduction

    - Example program that uses Python's NumPy to generate quantiles that make numerical summaries easier to visualize.

    Prerequisites: Statistical and programming skills as outlined in Chapters 0 and 1. Raster data management from Chapters 2-5.

1. Installing Python GDAL

  **Week 1 Homework: Installing Python GDAL**

    Grading Scale:
    0. Did not attempt.
    1. Installed Python GDAL.

  **Notes**

  - To review, rasters visualize Earth information by summarizing features using matrices.

  - Methods for capturing data include:
      
    - Aerial Photography (nadir images, 3D point clouds and digital elevation model (DEM), infrared, etc.).

      - United States Geological Survey's (USGS) Earth Explorer provides access to several satellite sources.

      - Google, Apple, Bing Maps.

    - Potential Evapotranspiration Networks (rasters derived from temperature, humidity, and wind ground station measurements).

      - University Corporation for Atmospheric Research's NCAR.
      
      - Oregon State's PRISM.

    - Light Detection And Ranging (LiDAR) derived DEM.

    - Combination of all of the above.

  - Vehicles for capturing data include:

    - Robotic vehicles
    
    - Balloons

    - Planes or helicopters

    - Satellites

  - Each pixel stores a number that describes a square area of the real world with a fixed length and width.

    - An example is an RGB image, where each color channel is commonly stored as an integer from 0 to 255.

  - GDAL can read those numbers into a NumPy array, and the resulting matrix can be transformed with various statistical techniques.

  - This chapter explains ways to separate continuous information into categories that can be summarized without examining each pixel individually.

  **Homework Notes**

  - Use a VirtualBox or other Linux installation for this chapter.

  - The easiest way to install GDAL with Python bindings is to use the Python package manager "Conda".

    ```
      conda install -c conda-forge gdal
    ```

  - The "Conda" package manager also installs over a gigabyte worth of other packages and software that may or may not be necessary.

    - While there's also a smaller version called "miniconda", it also installs additional libraries not required for this chapter.

  - "Conda" is the easier route because "pip" on Python >= 3.9 often clashes over NumPy versions when it builds GDAL's raster support, and can end up installing two versions of NumPy.

  - To install with "pip", the README at /swig/python/README.md in GDAL's GitHub repository says to install libgdal headers, numpy, setuptools, and wheel.

    ```
      pip install "numpy>1.0.0" wheel "setuptools>=67"
      pip install gdal[numpy]=="$(gdal-config --version).*"
    ```

  - This chapter requires raster support with GDAL for statistical calculations. Verify that GDAL was built with NumPy support by importing "gdal_array".

    ```
      python3 -c 'from osgeo import gdal_array'
    ```
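
  - The same check can be run from inside a script. This is a minimal sketch, not part of the homework: the helper name "has_gdal_array" is hypothetical, and it only reports whether the import succeeds.

```python
import importlib


def has_gdal_array():
    """Return True when osgeo.gdal_array imports cleanly (GDAL with NumPy support)."""
    try:
        importlib.import_module("osgeo.gdal_array")
    except ImportError:
        # Raised when GDAL is missing or was built without NumPy support
        return False
    return True


print("GDAL raster support available:", has_gdal_array())
```

  - If this prints False, revisit the install steps above before continuing.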

  - The other libraries used are subprocess, os, and glob, all from Python's standard library. An example of the imports is shown here.

    ```
      import numpy as np
      from osgeo import gdal, gdal_array
      import subprocess
      import os
      from glob import glob
    ```

  **Homework Week 1 Reminder**

2. Directory Management

  **Homework Week 1 Due**

  **Week 2 Homework: Directory Management**

    Grading Scale
    0. Did not attempt.
    1. Created folder list with Glob.

  **Homework Notes**

  - The goal of this week is to create a list of the GeoTiff file paths as strings.
  
  - The example uses the package Glob but there are several ways to accomplish this task.

  - The first step is to make a new Python file.

    - For improved file management, place the Python file in a project subfolder called "/scripts/".

  - In the terminal, use the command "cd" to return to the root of the project's directory before running the script, then get the current working directory by adding the following code to your file.

    ```
      cwd = os.getcwd()
    ```

  - Append the subfolder's file path to the current working directory.
  
    ```
      input_raster_folder = cwd + '/data/tiffs'
    ```

  - The wild card character * is used to define strings for all the GeoTiff paths in the directory.
  
    ```
      input_raster_folder_list = glob(os.path.join(input_raster_folder, '*.tif'))
    ```

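  - The final command sorts the list in place with Python's built-in "sort" method. Strings are compared character by character, so digits sort before letters and "10" sorts before "2"; zero-pad the numbers in file names if numeric order matters.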

    ```
      input_raster_folder_list.sort()
    ```
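
  - The difference between the default string sort and a numeric-aware sort can be seen in a small sketch. The file names and the "natural_key" helper are hypothetical and exist only for this demonstration; the files are created in a temporary folder rather than in "/data/tiffs".

```python
import os
import re
import tempfile
from glob import glob


def natural_key(path):
    """Split a file name into text and integer chunks so 2 sorts before 10."""
    name = os.path.basename(path)
    return [int(chunk) if chunk.isdigit() else chunk for chunk in re.split(r"(\d+)", name)]


with tempfile.TemporaryDirectory() as folder:
    # Create empty placeholder files with hypothetical names
    for name in ("month_10.tif", "month_2.tif", "month_1.tif"):
        open(os.path.join(folder, name), "w").close()

    paths = glob(os.path.join(folder, "*.tif"))
    default_order = [os.path.basename(p) for p in sorted(paths)]
    natural_order = [os.path.basename(p) for p in sorted(paths, key=natural_key)]

print(default_order)  # ['month_1.tif', 'month_10.tif', 'month_2.tif']
print(natural_order)  # ['month_1.tif', 'month_2.tif', 'month_10.tif']
```

  - The default order is fine when file names are zero-padded or contain dates in year-month-day form.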

  **Homework Week 2 Reminder**

3. GeoTiffs with GDAL Python: Part 1 of 3

  **Homework Week 2 Due**

  **Week 3 Homework: GeoTiffs with GDAL Python**

    Grading Scale
    0. Did not attempt.
    1. Reads the GeoTiffs with GDAL.

  **Homework Notes**

  - The point of this week is to use a for loop to automate reading GeoTiffs into NumPy matrices for statistical operations.

  - The first step is to write the iterative loop by accessing the list of GeoTiff strings created in Week 2 and reading each one with the GDAL function "Open".

    ```
      for input_raster in input_raster_folder_list:
        dataset = gdal.Open(input_raster)
    ```

  - The next step creates a variable named "band" that holds one of the file's bands, retrieved with the "GetRasterBand" method. The examples all use band index 1, but this might vary depending on your rasters.

    ```
      for input_raster in input_raster_folder_list:
        dataset = gdal.Open(input_raster)

        band = dataset.GetRasterBand(1)
    ```

  - The remaining steps abbreviate the previous code within the "for" loop as "...".
  
  - The "ReadAsArray" method reads the GDAL band into a NumPy array, a matrix of the pixel values.

    ```
      for ... in ...

        ...

        array = band.ReadAsArray()
    ```

  - No data values are sometimes represented by 0, NaN, or 9999 depending on the GeoTiff.
  
  - Masking those values in the NumPy array lets the quantiles be calculated from valid pixels only.
  
  - The "GetNoDataValue" method returns the band's no data value as a single number, or None if the band does not define one.

    ```
      for ... in ...

        ...

        nodata_val = band.GetNoDataValue()
    ```

  - An "if" statement checks that a no data value exists, then NumPy's "masked_equal" function builds a masked array from the "array" variable containing the GeoTiff's band, hiding every pixel equal to "nodata_val".

    ```
      for ... in ...

        ...

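        if nodata_val is not None:
          array_mask = np.ma.masked_equal(array, nodata_val)
        else:
          # No declared no data value: wrap the array so array_mask is always defined
          array_mask = np.ma.masked_array(array)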
    ```
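
  - How the mask behaves depends on how no data is stored. The sketch below uses small stand-in arrays instead of a real band, and the -9999 sentinel is an assumed value. It shows that "masked_equal" handles a sentinel number, while NaN needs "masked_invalid" because NaN never compares equal to itself.

```python
import numpy as np

# Stand-ins for band.ReadAsArray(); the values and the -9999 sentinel are assumed
sentinel_array = np.array([[-9999.0, 4.0, 8.0], [15.0, -9999.0, 23.0]])
nan_array = np.array([[np.nan, 4.0, 8.0], [15.0, np.nan, 23.0]])

# A sentinel no data value can be matched by equality
sentinel_mask = np.ma.masked_equal(sentinel_array, -9999.0)

# masked_invalid masks NaN and infinite values
nan_mask = np.ma.masked_invalid(nan_array)

print(sentinel_mask.mean())  # 12.5
print(nan_mask.mean())       # 12.5

# masked_equal with NaN masks nothing, so all 6 pixels still count as valid
print(np.ma.masked_equal(nan_array, np.nan).count())  # 6
```

  - Checking a raster's no data convention with "GetNoDataValue" before choosing the masking function avoids silently including no data pixels in the statistics.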
      
  - Boolean indexing with the greater than or equal operator and the "min" method then selects the values at or above the smallest unmasked value, producing a one-dimensional array intended to hold only the valid pixels.

    ```
      for ... in ...

        ...
          ...

        array_ignored_nan = array_mask[array_mask >= array_mask.min()]

    ```
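
  - An alternative worth knowing, offered here as a sketch rather than as the chapter's method, is the masked array's "compressed" method, which returns the unmasked values as a plain one-dimensional array. Since "np.percentile" is not documented as mask-aware, handing it a plain array of valid values avoids relying on how masked entries are treated. The array and the 9999 no data value are assumed for illustration.

```python
import numpy as np

# Stand-in for one band; 9999 is an assumed no data value larger than the data
array = np.array([[9999, 4, 8], [15, 9999, 23]])
array_mask = np.ma.masked_equal(array, 9999)

# compressed() drops the masked pixels and returns a plain 1-D ndarray
valid_values = array_mask.compressed()

print(valid_values.tolist())            # [4, 8, 15, 23]
print(np.percentile(valid_values, 50))  # 11.5
```
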
      
  - The final step is to create a NumPy array of zeros with the same dimensions as the input array using the "zeros_like" function, then cast it to unsigned 8-bit integers.

    ```
      for ... in ...

        ...
          ...

        output = np.zeros_like(array_mask).astype(np.uint8)

    ```

  **Homework Week 3 Reminder**

4. GeoTiffs with GDAL Python: Part 2 of 3

  **Homework Week 3 Reminder**

5. GeoTiffs with GDAL Python: Part 3 of 3

  **Homework Week 3 Reminder**

6. Review for Midterm

  **Homework Week 3 Due**

7. Midterm

8. Quantile Text: Part 1 of 4

  **Week 8 Homework: Quantile Text**

    Grading Scale
    0. Did not attempt.
    1. Text output for quantiles.

  **Notes**

  - Quantiles are break points that divide a sorted dataset into groups holding equal shares of the values, and those groups can be used to classify information.

    - Another way to say this is that quantiles cut the probability distribution into bins of equal probability.

  - Quartiles and percentiles are the most commonly used kinds of quantiles.

    - Quartiles have four groups.

    - Percentiles have 100 groups. Standardized testing sometimes uses this terminology.

  - The NumPy library has built-in functions to calculate them. See the example adapted from the documentation below, where 50 requests the 50th percentile (the median).

    ```
      import numpy as np

      a = np.array([[10, 7, 4], [3, 2, 1]])

      print(a)
      # terminal output: [[10  7  4]
      #                   [ 3  2  1]]

      print(np.percentile(a, 50))
      # terminal output: 3.5
    ```
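
  - To see why quantiles produce evenly filled groups, the sketch below computes quartile break points for the integers 1 through 100 (an illustrative dataset, not course data) and counts how many values land in each group.

```python
import numpy as np

values = np.arange(1, 101)  # the integers 1 through 100

# Three break points split the data into four quartile groups
q1, q2, q3 = np.percentile(values, [25, 50, 75])
print(q1, q2, q3)  # 25.75 50.5 75.25

counts = [
    int(np.sum(values <= q1)),
    int(np.sum((values > q1) & (values <= q2))),
    int(np.sum((values > q2) & (values <= q3))),
    int(np.sum(values > q3)),
]
print(counts)  # [25, 25, 25, 25]
```

  - Equal-width bins over the same range would only match these counts when the data is evenly spread, which is rarely true for raster values.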

  **Homework Notes**

  - The idea for this week's homework assignment is to generate a text file containing the raster breaks using NumPy's "percentile" function.

    - The text file can be used with the command line tool "gdaldem color-relief", as described in Chapter 4's BASH material, to assign RGB colors with more effective visual breaks for the arrays.

    - It is important to format the text file consistently because it will be used to generate the legend in Chapter 7's web application.

    - Week 12's homework will generate the rasters that have these breaks for use with the GDAL Python XYZ tile builder from Chapter 4.

  - The first step is to set five variables as the percentiles 80, 60, 40, 20, and 0.
  
  - These are also rounded to five decimal places.

    ```
        # Use NumPy's percentile function to calculate the thresholds; round to avoid scientific notation in the output

        percentile_80 = round(np.percentile(array_ignored_nan, 80), 5)
        percentile_60 = round(np.percentile(array_ignored_nan, 60), 5)
        percentile_40 = round(np.percentile(array_ignored_nan, 40), 5)
        percentile_20 = round(np.percentile(array_ignored_nan, 20), 5)
        percentile_0 = round(np.percentile(array_ignored_nan, 0), 5)
        print(percentile_0, percentile_20, percentile_40, percentile_60, percentile_80)
    ```
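
  - "np.percentile" also accepts a list of percentiles, so the five break points can be computed in one call. This is a sketch of that variation; "valid_values" is a hypothetical stand-in for "array_ignored_nan".

```python
import numpy as np

# Stand-in for array_ignored_nan: the integers 0 through 100
valid_values = np.arange(0, 101)

# One call returns all five break points, ordered like the requested percentiles
breaks = np.round(np.percentile(valid_values, [0, 20, 40, 60, 80]), 5)

print(breaks.tolist())  # [0.0, 20.0, 40.0, 60.0, 80.0]
```

  - Keeping the breaks in one array makes it harder to mix up the order when they are written to the text file.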

  - The output name is assigned to a variable by using "splitext" to strip the .tif file ending from the raster path and appending "_reclassed.txt".

    ```
      txt_outname = os.path.splitext(input_raster)[0] + "_reclassed.txt"
      print(txt_outname)
    ```

  - The output name variable and calculated percentiles are used to write the file.

    ```
      with open(txt_outname, "w") as text_file:
        text_file.write(str(percentile_0) + " " + str(percentile_20) + " " + str(percentile_40) + " " + str(percentile_60) + " " + str(percentile_80))
    ```
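
  - Because later chapters parse this file, it helps to confirm that it can be read back. The sketch below writes space-separated breaks to a temporary folder and restores them; the raster name "example.tif" and the break values are hypothetical.

```python
import os
import tempfile

# Hypothetical break values, used only for illustration
breaks = [0.0, 20.0, 40.0, 60.0, 80.0]

with tempfile.TemporaryDirectory() as folder:
    input_raster = os.path.join(folder, "example.tif")
    txt_outname = os.path.splitext(input_raster)[0] + "_reclassed.txt"

    # join() places exactly one space between values, with none trailing
    with open(txt_outname, "w") as text_file:
        text_file.write(" ".join(str(value) for value in breaks))

    # Reading the file back confirms a later step can parse it
    with open(txt_outname) as text_file:
        restored = [float(token) for token in text_file.read().split()]

print(os.path.basename(txt_outname))  # example_reclassed.txt
print(restored)                       # [0.0, 20.0, 40.0, 60.0, 80.0]
```
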

  **Homework Week 8 Reminder**

9. Quantile Text: Part 2 of 4

  **Homework Week 8 Reminder**

10. Quantile Text: Part 3 of 4

  **Homework Week 8 Reminder**

11. Quantile Text: Part 4 of 4

  **Homework Week 8 Reminder**

12. Quantile GeoTiffs: Part 1 of 4

  **Homework Week 8 Due**

  **Week 12 Homework: Quantile GeoTiffs**

    Grading Scale
    0. Did not attempt.
    1. GeoTiff output for quantiles.

  **Homework Notes**
  
  - The raster is rebuilt using NumPy's "where" function, replacing the values from the array that fall within each percentile range with the integers 1 through 5.
  
    - Numbers above the 0th percentile and up to the 20th percentile are replaced with 1 at the appropriate location.

    - Numbers above the 20th and up to the 40th are replaced with 2 and so on until 5, updating the "output" variable. Because the comparison is strict, pixels equal to the 0th percentile (the minimum) keep the initial value of 0.

    ```
      output = np.where((array_mask > percentile_0), 1, output)
      output = np.where((array_mask > percentile_20), 2, output)
      output = np.where((array_mask > percentile_40), 3, output)
      output = np.where((array_mask > percentile_60), 4, output)
      output = np.where((array_mask > percentile_80), 5, output)
    ```
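
  - The cascade can be traced on a small array. This sketch is an illustration under assumptions, not the homework solution: the 3 by 4 array and the 9999 no data value are made up, and an explicit "valid" mask is added so that no data pixels stay at 0 even though 9999 is larger than every break.

```python
import numpy as np

nodata_val = 9999.0  # assumed no data value for this example
array = np.array([[nodata_val, 0, 10, 20],
                  [30, 40, 50, 60],
                  [70, 80, 90, 100]], dtype=float)

array_mask = np.ma.masked_equal(array, nodata_val)
valid = ~np.ma.getmaskarray(array_mask)  # True where the pixel holds real data

# Break points come from the valid pixels only
breaks = np.percentile(array_mask.compressed(), [0, 20, 40, 60, 80])

output = np.zeros(array.shape, dtype=np.uint8)
for class_id, lower in enumerate(breaks, start=1):
    # Later classes overwrite earlier ones, like the chained np.where calls
    output[valid & (array > lower)] = class_id

print(output.tolist())  # [[0, 0, 1, 1], [2, 2, 3, 3], [4, 4, 5, 5]]
```

  - The pixel holding the minimum value 0 stays in class 0 alongside the no data pixel, which is the effect of the strict greater than comparison.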

  - The "outname" variable is used to build a string for the output file.

    ```
      outname = os.path.splitext(input_raster)[0] + "_reclassed.tif"
    ```

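  - The "SaveArray" function from GDAL's "gdal_array" module is used to write the reclassified GeoTiffs. Passing the input dataset as "prototype" copies its georeferencing to the output.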

    ```
      gdal_array.SaveArray(output, outname, "gtiff", prototype=dataset)
      print(outname)
    ```

  - The rasters will be used in the next chapter to build XYZ tiles for use in a time slider web application.

  **Homework Week 12 Reminder**

13. Quantile GeoTiffs: Part 2 of 4

  **Homework Week 12 Reminder**

14. Quantile GeoTiffs: Part 3 of 4

  **Homework Week 12 Reminder**

15. Quantile GeoTiffs: Part 4 of 4

  **Homework Week 12 Reminder**

16. Review for Final

17. Final





