# Transfer Learning with Convolutional Neural Networks for Hydrological Streamline Detection

## Abstract
Streamline network delineation is essential for various applications, such as agricultural sustainability, river dynamics, and watershed analysis. Machine learning methods have been applied to streamline delineation and have shown promising performance. However, performance drops substantially when a trained model is applied to a different location. In this paper, we explore whether fine-tuning neural networks that have been pre-trained on large labeled datasets (e.g., ImageNet) can improve transferability from one geographic area to another. Specifically, we test transferability using small-catchment streamlines from the Rowan County, NC and Covington River, VA areas in the eastern United States. First, we fine-tune eleven pre-trained U-Net models with various backbones (including ResNet and DenseNet variants) on the Rowan County area and compare them with an attention U-Net model trained from scratch on the same dataset. We find that the DenseNet169 model achieves an F1-score of 85%, about 4% higher than the attention U-Net model. To compare the transferability of the models to a new geographic area, the three models with the highest F1-scores in the Rowan County area are further fine-tuned with data from the Covington area. Similarly, we fine-tune the attention U-Net model from the Rowan County area with the Covington data. We find that the fine-tuned ResNet50 model achieves an F1-score of 65.58% in predicting stream pixels in the Covington area, which is significantly higher than either training the models from scratch in the Covington area or fine-tuning the attention U-Net model from Rowan to Covington.

## Keywords
Convolutional neural network, Deep learning, Remote sensing, Streamline analysis, Transfer learning


## Table of Contents
1. [Study Areas and Input Data](#1-introduction)
2. [Machine Learning Model Training Process](/Train_Models.ipynb)
3. [Traditional Methods Process](/Traditional_Methods.ipynb)


## 1. Introduction

In this study, we investigate the transferability of models across two distinct locations: the watershed in Rowan County, North Carolina, and the Covington area in Virginia.

### 1.1 Study Areas

#### 1.1.1 Rowan County, North Carolina

The data for Rowan County, North Carolina (Figure 1), is sourced from the study by Xu et al. (2021). This area comprises a network of tributaries flowing into Second Creek, the primary flowline feature of 12-digit NHD watershed 030401020504. The dataset encompasses 1,400 training samples and 30 validation samples extracted from the upper portion of the area. The test data covers the entire lower area.

![Figure 1: Rowan County area](.notebook_data/covington_area_figure.jpg)
*Figure 1: Rowan County area (left: boundary of North Carolina state; middle: a 1-m resolution image of the study area from National Agriculture Imagery Program (NAIP); right: reference stream feature). Source: Xu et al., 2021.*

Eight raster layers are stacked to create the dataset: a 1-m resolution digital elevation model (DEM), geometric curvature, topographic position index (TPI) at two window sizes (3x3 and 21x21), zenith angle positive openness, return intensity, and return point density at two heights above ground (1 ft and 3 ft). The statistics for each raster layer are summarized in Table 1.

**Table 1: Summary statistics of the raster layers for Rowan County, NC**

| Raster Image Name                                          | Minimum    | Maximum    | Mean      | Standard Deviation | Range     |
|------------------------------------------------------------|------------|------------|-----------|--------------------|-----------|
| Digital elevation model (meters)                           | 194.11     | 256.19     | 229.07    | 12.96              | 62.07     |
| Geometric curvature                                        | -97.25     | 97.93      | 0.01      | 3.05               | 195.18    |
| Topographic position index (3x3 window)                    | -8.59      | 5.58       | 6.38      | 0.18               | 14.17     |
| Topographic position index (21x21 window)                  | -13.62     | 13.29      | 0.00      | 0.93               | 26.91     |
| Openness (R10, D32) degrees                                | 21.52      | 118.80     | 83.41     | 7.35               | 97.28     |
| Return intensity                                           | 0.00       | 55185.39   | 29047.18  | 10624.11           | 55185.39  |
| Return point density 1 ft above ground (points per m²)     | 0.00       | 0.94       | 0.02      | 0.04               | 0.94      |
| Return point density 3 ft above ground (points per m²)     | 0.00       | 2.89       | 0.12      | 0.23               | 2.89      |

*Source: Xu et al., 2021.*
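
The stacking and summary statistics described above can be sketched as follows. This is a minimal illustration rather than the code used to produce Table 1: the helper names `stack_layers` and `band_statistics` are hypothetical, the demo rasters are synthetic, and the standard deviation is assumed to be the population value (`ddof=0`).

```python
import numpy as np


def stack_layers(layers):
    """Stack 2-D rasters of identical shape into one (bands, H, W) array."""
    return np.stack([np.asarray(layer, dtype=np.float64) for layer in layers], axis=0)


def band_statistics(stack, nodata=None):
    """Compute min, max, mean, std, and range for every band in the stack."""
    stats = []
    for band in stack:
        values = band[np.isfinite(band)]  # ignore NaN/inf cells
        if nodata is not None:
            values = values[values != nodata]  # ignore an explicit no-data value
        stats.append({
            "min": float(values.min()),
            "max": float(values.max()),
            "mean": float(values.mean()),
            "std": float(values.std()),  # population std (assumption)
            "range": float(values.max() - values.min()),
        })
    return stats


# Synthetic stand-ins for two of the eight layers (not the real data).
rng = np.random.default_rng(0)
dem = rng.uniform(194.0, 256.0, size=(64, 64))
curvature = rng.normal(0.0, 3.0, size=(64, 64))

stack = stack_layers([dem, curvature])
stats = band_statistics(stack)
for name, s in zip(["DEM", "Curvature"], stats):
    print(name, {k: round(v, 2) for k, v in s.items()})
```

With real data, each layer would first be read from its raster file into a 2-D array on a common 1-m grid before stacking.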


#### 1.1.2 Covington River Watershed, Virginia

The second study area is the 12-digit NHD Hydrologic Unit (HU) 020801030302 watershed, encompassing primary tributaries of Covington and Rush Rivers in Rappahannock County, northern Virginia (Figure 2). The area covers 108 square kilometers and exhibits diverse land cover, temperature ranges, and elevation characteristics. The watershed's features were rasterized to 1-m resolution for reference.

![Figure 2: Covington area](/notebook_data/covington_area_figure.jpg)
*Figure 2: Covington area (left: boundary of Virginia, USA; middle: a 1-m resolution image of the study area from National Agriculture Imagery Program (NAIP); right: reference stream feature).*

Eight 1-m resolution Lidar and elevation-derived raster layers were used for training, validation, and testing: a digital elevation model (DEM), geometric curvature, slope, positive openness, topographic position index (TPI) at two moving-window sizes (3 and 21), Lidar return intensity (reflectance), and geomorphons. Summary statistics for these layers are presented in Table 2.

**Table 2: Summary statistics of the raster layers for Covington River watershed, VA**

| Raster image name                        | Minimum      | Maximum    | Mean       | Standard Deviation | Range     |
|------------------------------------------|--------------|------------|------------|--------------------|-----------|
| Digital elevation model (DEM) (m)        | 125.4523     | 1039.1520  | 365.8976   | 190.9829           | 913.6997  |
| Geometric curvature                      | -1.9900      | 1.9974     | 0.0001     | 0.0941             | 3.9957    |
| TPI with moving window size 3            | -5.7020      | 5.7213     | 0.00000235 | 0.0661             | 11.4232   |
| TPI with moving window size 21           | -14.6981     | 12.8009    | 0.0001     | 0.2873             | 27.4990   |
| Positive openness                        | 45.3490      | 162.6082   | 88.8340    | 2.4694             | 117.2592  |
| Lidar reflectance                        | 0.0000       | 255.0000   | 39.3197    | 12.6086            | 255.0000  |
| Slope data (degree)                      | 0.0000       | 14.2646    | 0.2323     | 0.1711             | 14.2646   |

> **Note.** Geomorphons are integer-coded discrete classes, so their statistics are not included in this table.

For training, 200 initial sample patches were extracted and augmented to generate 1,400 samples for the training dataset. Additionally, 30 unaugmented samples were extracted for the validation dataset. The southern region of the study area served as the test dataset for evaluating model performance and generalization.
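
Going from 200 patches to 1,400 samples implies seven samples per patch. The sketch below shows one way to get that factor, assuming the original patch plus six geometric variants (three rotations, two flips, and a transpose). The actual augmentation recipe is not stated here, so the transform set, the `augment_patch` helper, and the patch size are illustrative assumptions.

```python
import numpy as np


def augment_patch(image, mask):
    """Return 7 (image, mask) pairs: the original plus 6 geometric variants.

    image has shape (H, W, C); mask has shape (H, W). Every transform acts on
    the two spatial axes only, so image and mask stay aligned.
    """
    transforms = [
        lambda a: a,                            # original
        lambda a: np.rot90(a, 1, axes=(0, 1)),  # rotate 90 degrees
        lambda a: np.rot90(a, 2, axes=(0, 1)),  # rotate 180 degrees
        lambda a: np.rot90(a, 3, axes=(0, 1)),  # rotate 270 degrees
        lambda a: np.flip(a, axis=0),           # vertical flip
        lambda a: np.flip(a, axis=1),           # horizontal flip
        lambda a: np.swapaxes(a, 0, 1),         # transpose rows and columns
    ]
    return [(t(image), t(mask)) for t in transforms]


# Synthetic demo: 200 small 8-band patches (sizes are assumptions).
rng = np.random.default_rng(42)
patches = [
    (rng.random((16, 16, 8)), rng.integers(0, 2, size=(16, 16)))
    for _ in range(200)
]

augmented = [pair for image, mask in patches for pair in augment_patch(image, mask)]
print(len(augmented))
```

Applying the same transform to the image and its stream mask keeps each labeled pixel aligned with its input features, which is why the transforms are restricted to the spatial axes.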
