<a href="https://colab.research.google.com/github/Glifoyle/test-repo/blob/master/treeHealthClassificationHomework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# Example: Tree Health Classification
---

In this notebook we revisit some of the techniques that we have seen in other notebooks to train a simple image classifier. The images that we are going to use come from our Zao site, home of the ["Snow Monsters"](https://www.japan.travel/en/spot/661/). If you ever happen to be near Yamagata, I highly recommend that you visit.

To build our tree health classifier(s) we are going to use code from different sources:

1) We are going to use the same data that we used for our [exploratory analysis example](https://drive.google.com/file/d/12EFW8FYcLrIa3yp0SMKrse46Es1NmO8e/view?usp=sharing)  
2) In order to pass our data to PyTorch, we are going to define a Dataset just like we did for the [classification of subalpine bushes](https://colab.research.google.com/drive/1zkdVQMv7YCBExHMJ1qCdrCba7Z1jsEi8?usp=sharing)  
3) Once we have done all that, we are going to train deep learning networks, starting with what we did [here](https://colab.research.google.com/drive/17ARJbWw2h1X5rQNEkh7n7oXXA9Bvb2P9?usp=sharing)

# 1: Data Download

Let's download the image mosaics and the CSV files that tell us where every tree is in its image mosaic:

In [None]:
# Download data


import os
# Read mosaics
if not os.path.exists('Zao1_190824.tif'):
    !wget https://www.dropbox.com/s/zqz7o3pidhe7cmw/Zao1_190824.tif

if not os.path.exists('Zao1_201005.tif'):
    !wget https://www.dropbox.com/s/y197ghnsojapwsj/Zao1_201005.tif

if not os.path.exists('Zao1_211005.tif'):
  #https://www.dropbox.com/s/cvfvs7ryco9sfuv/Zao_S1_211005Nv2.tif
    !wget https://www.dropbox.com/s/cvfvs7ryco9sfuv/Zao1_211005.tif

if not os.path.exists('Zao1_221013.tif'):
    !wget https://www.dropbox.com/s/65chsirodm8nxko/Zao1_221013.tif


# Read coordinate files
if not os.path.exists('Site12019.csv'):
  !wget https://www.dropbox.com/s/2s9dwf7mg8j2h46/Site12019.csv

if not os.path.exists('Site12020.csv'):
  !wget https://www.dropbox.com/s/0keqze6uwrcafhr/Site12020.csv

if not os.path.exists('Site12021.csv'):
  !wget https://www.dropbox.com/s/0f7gabxqjbl0l68/Site12021.csv

if not os.path.exists('Site12022.csv'):
  !wget https://www.dropbox.com/s/4rvkdqxxhreoogo/Site12022.csv


# 2: Create Patches

We are going to read some image mosaics and create a patch around each tree:

In [None]:
import os

# important imports
import cv2
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd


# This will allow the images that we create to be displayed nicely within the notebook
%matplotlib inline

## 2.1 Read Image Mosaics

The following function uses what we saw in the [image basics](https://colab.research.google.com/drive/1V9CjpBHjabFQCg793fiKkF0SDFGUtRtK?usp=sharing) lesson to read our mosaics.

In [None]:
def readMosaic(fileName):
    """
        Function to read a colour image with OpenCV.
        The image is returned in OpenCV's BGR channel order,
        not converted to RGB (matplotlib).
    """
    retVal = cv2.imread(fileName,cv2.IMREAD_COLOR)
    #return retVal[:,:,::-1]
    return retVal # in this case, we will want to store the images so we save them as they are

In [None]:
mosaic19 = readMosaic("Zao1_190824.tif")

In [None]:
mosaic20 = readMosaic("Zao1_201005.tif")

In [None]:
mosaic21 = readMosaic("Zao1_211005.tif")

In [None]:
mosaic22 = readMosaic("Zao1_221013.tif")

These mosaics are pretty big, and if we try to visualize them on our free Google Colab machine we will run out of memory. If you want to play with the code, you can download it to your own machine or work with downsampled versions of these images.
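
A minimal sketch of one way to downsample, assuming plain NumPy striding is good enough for a quick look. `downsampleMosaic` and `factor` are hypothetical names that do not appear in the original notebook, and the toy array only stands in for a real mosaic.

```python
import numpy as np

def downsampleMosaic(img, factor=8):
    """
    Keep every `factor`-th pixel along both image axes.
    Basic slicing returns a view, so the mosaic itself is not copied.
    """
    return img[::factor, ::factor]

# Toy stand-in for a mosaic (the real ones are far larger)
toy_mosaic = np.zeros((800, 1600, 3), dtype=np.uint8)
small = downsampleMosaic(toy_mosaic, factor=8)
print(small.shape)  # (100, 200, 3)
```

Striding simply drops pixels. `cv2.resize` with `interpolation=cv2.INTER_AREA` averages neighbouring pixels instead and usually gives a smoother result, at the cost of allocating a new array.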

## 2.2 Reading Tree Location Data and Health Status

We also have four CSV files; let's read them with pandas.

In [None]:
# Read Health information files
data19 = pd.read_csv("Site12019.csv", sep=",")
data20 = pd.read_csv("Site12020.csv", sep=",")
data21 = pd.read_csv("Site12021.csv", sep=",")
data22 = pd.read_csv("Site12022.csv", sep=",")

We have read each file into a data frame. Let's look at one of them:

In [None]:
data19

| Unnamed: 0 | Field | geox | geoy | px | py | category |
|---|---|---|---|---|---|---|
| 0 | 1 | 448409.6740 | 4.222805e+06 | 2317 | 7310 | Healthy |
| 1 | 2 | 448420.7196 | 4.222800e+06 | 2818 | 7513 | Healthy |
| 2 | 3 | 448425.3998 | 4.222803e+06 | 3030 | 7379 | Healthy |
| 3 | 4 | 448421.1876 | 4.222807e+06 | 2839 | 7207 | Healthy |
| 4 | 5 | 448423.2524 | 4.222809e+06 | 2933 | 7126 | Healthy |
| ... | ... | ... | ... | ... | ... | ... |
| 467 | 468 | 448737.4991 | 4.222811e+06 | 17177 | 7020 | Dead |
| 468 | 469 | 448742.5879 | 4.222848e+06 | 17408 | 5345 | Healthy |
| 469 | 470 | 448743.1797 | 4.222850e+06 | 17435 | 5269 | Healthy |
| 470 | 471 | 448782.3421 | 4.222833e+06 | 19210 | 6055 | Healthy |


The "Field" column contains an id for every tree. The remaining columns hold the GPS and pixel coordinates of every treetop, along with its health category for that year.

Having the data in pandas also allows us to transform it easily. Let's first make a list of tuples and then a dictionary for each year, so that we can conveniently look up the position of a tree by its id:

In [None]:
# Transform our data to tuple form
tuples19 = list(data19[["Field","px","py","category"]].to_records(index=False))

In [None]:
tuples20 = list(data20[["Field","px","py","category"]].to_records(index=False))

In [None]:
tuples21 = list(data21[["Field","px","py","category"]].to_records(index=False))

In [None]:
tuples22 = list(data22[["Field","px","py","category"]].to_records(index=False))

Now let's group this information together with the mosaics that hold the image data. This way we will be able to create patches easily.

In [None]:
def tuplesToDictWithMosaics(tupList,mosaic):
  """
  Transform a list of tuples into a dictionary for easy location
  """
  retDict = {}
  for t in tupList:
    retDict[t[0]] = (t[1],t[2],t[3],mosaic)
  return retDict

In [None]:
# Change everything to dictionary format
dict19 = tuplesToDictWithMosaics(tuples19,mosaic19)
dict20 = tuplesToDictWithMosaics(tuples20,mosaic20)
dict21 = tuplesToDictWithMosaics(tuples21,mosaic21)
dict22 = tuplesToDictWithMosaics(tuples22,mosaic22)
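
To make the resulting structure concrete, here is a small sketch of the same conversion on a toy table. The column names match the site CSV files, but the values are made up and the mosaic is replaced by a placeholder string, so the names `toy`, `toy_tuples` and `toy_dict` are illustrative only.

```python
import pandas as pd

# Toy table with the same column names as the site CSV files (values are made up)
toy = pd.DataFrame({
    "Field": [1, 2],
    "px": [2317, 2818],
    "py": [7310, 7513],
    "category": ["Healthy", "Dead"],
})

# Same conversion as in the notebook: one record per tree
toy_tuples = list(toy[["Field", "px", "py", "category"]].to_records(index=False))

# Index by tree id; a placeholder string stands in for the mosaic array
toy_dict = {t[0]: (t[1], t[2], t[3], "mosaic-placeholder") for t in toy_tuples}

# Look up tree 2 by its id
px, py, category, mosaic = toy_dict[2]
print(px, py, category)
```

Each dictionary value carries everything needed to cut a patch for that tree: the pixel position, the health label, and the image it belongs to.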

## 2.3 Creating Patches And Storing them to file

Let's define a function that extracts a treetop patch, so that we can then store the patches labelled with their health status.

In [None]:
def getSquare(w_size, p, img):
    """
    Function to extract a patch from an image given pixel coordinates and size
    """
    # numpy arrays are indexed as [row (y), column (x)], so we have to swap our (x, y) coords.
    return img[int(p[1])-w_size//2:int(p[1])+w_size//2, int(p[0])-w_size//2:int(p[0])+w_size//2]
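
One caveat with this slicing: if a tree sits closer than half a patch to the image border, the start index becomes negative, which NumPy interprets as counting from the end, so the patch can come out empty or smaller than expected. The sketch below is a hypothetical variant (`getSquareChecked` is not part of the original notebook) that returns `None` in that case so the caller can skip the tree.

```python
import numpy as np

def getSquareChecked(w_size, p, img):
    """
    Hypothetical variant of getSquare: return the patch centred on p = (x, y),
    or None when the window would cross the image border.
    """
    half = w_size // 2
    x, y = int(p[0]), int(p[1])
    height, width = img.shape[:2]
    if x - half < 0 or y - half < 0 or x + half > width or y + half > height:
        return None
    # Rows are indexed by y and columns by x
    return img[y - half:y + half, x - half:x + half]

# Toy image: 200 rows by 300 columns
toy = np.zeros((200, 300, 3), dtype=np.uint8)
inside = getSquareChecked(100, (150, 100), toy)   # fully inside the image
outside = getSquareChecked(100, (10, 100), toy)   # too close to the left border
print(inside.shape, outside)
```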

Now we can create patches according to the health status of every tree:

In [None]:
patch_size = 100
outputPath = "HealthPatches"

#create output Folder if it does not exist
if not os.path.exists(outputPath):
    os.mkdir(outputPath)

# We have four dictionaries, let's process them in turn:
myDicts = [dict19,dict20,dict21,dict22]
# let's also give each dictionary a code to store the patches separately
myCodes = ["year19","year20","year21","year22"]

# Process every dictionary in turn
for aDict,code in zip(myDicts,myCodes):
    # Go over all the trees
    for k,v in aDict.items():
        # The index of the tree is in "k"
        # Find coordinates and create a patch
        center = (v[0],v[1])
        healthCategory = v[2]
        aPatch = getSquare(patch_size,center,v[3])

        # Let's store all images in a single folder (already created above)
        targetFolder = outputPath

        # store the image; the file name encodes category, tree id and year code
        cv2.imwrite(os.path.join(targetFolder,healthCategory+"patch"+str(k)+code+".jpg"),aPatch)

If you click on the "folder" icon on the left of your screen, you will see the drive of the Colab machine that you are using. There you will find a new folder called "HealthPatches" that contains images of the three classes that we have.
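
Because the health category is the part of each file name that comes before "patch", the class balance can be checked directly from the file names. This is a small sketch under that assumption; `classFromFileName` and `countPatchesPerClass` are hypothetical helper names, and the example names are made up to follow the same pattern.

```python
import os
from collections import Counter

def classFromFileName(fileName):
    """Return the text before 'patch', which the storing code uses for the health category."""
    return os.path.basename(fileName).split("patch")[0]

def countPatchesPerClass(fileNames):
    """Count how many .jpg patches belong to each health category."""
    return Counter(classFromFileName(f) for f in fileNames if f.endswith(".jpg"))

# Made-up names following the pattern <category>patch<id><code>.jpg
example = ["Healthypatch1year19.jpg", "Deadpatch468year19.jpg", "Healthypatch1year20.jpg"]
counts = countPatchesPerClass(example)
print(counts)

# On the Colab machine this could be called as:
# countPatchesPerClass(os.listdir("HealthPatches"))
```

A strongly unbalanced count is worth knowing about before training, since it affects how accuracy should be read.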

# 3: Create Dataset


As we did in [UNIT 4, Handling Custom Data in Pytorch](https://colab.research.google.com/drive/1zkdVQMv7YCBExHMJ1qCdrCba7Z1jsEi8?usp=sharing) ([**VIDEO LESSON**](https://www.youtube.com/watch?v=AtyxMuWr5OE&list=PLfZpLxnJ0nUfKAXQhZiIV-pLCudf50-Lg)), we need to create a PyTorch Dataset so that we can feed our patches to a network.


In [None]:
# Create Pytorch Dataset





# 4: Create DL Network

As we did in [UNIT 5, Training Deep Learning Models in Pytorch](https://colab.research.google.com/drive/17ARJbWw2h1X5rQNEkh7n7oXXA9Bvb2P9?usp=sharing), we now need to define and train deep learning networks.

In [None]:
# Define a ResNet50 model and at least one other DL model, and compare their results