Background and connection to disease spillover

 

This notebook describes the ongoing development of a convolutional neural network (CNN) that identifies village features in satellite imagery. Though the notebook is intended as a stand-alone example of machine vision, the work is part of a larger project whose goal is forecasting the risk of disease transmission from wildlife into humans. These zoonotic viruses often occur in areas with limited health infrastructure, so the burden they place on human populations is underestimated or unknown. My hope is that machine learning can help pinpoint the areas most affected by wildlife diseases and, if possible, help direct intervention strategies like vaccination.

My current work focuses on Lassa virus, an arenavirus that circulates within rodent populations in West Africa and transmits to humans who come into contact with rodent waste. A critical step toward better forecasts of a zoonotic virus is obtaining environmental features that predict the abundance of the animal reservoirs available to host the virus. For example, large-scale GIS datasets, like those provided by the Landsat program, are important for generating broad predictions of where the primary Lassa reservoir, the multimammate rat (Mastomys natalensis), occurs. However, past field surveys indicated that the distribution of multimammate rats is also influenced by small-scale features: relative to other rodents that do not transmit Lassa virus, multimammate rat populations were more prevalent in village areas with houses near agricultural cultivations and less prevalent in forested areas of the village. Consequently, CNNs that extract features like houses, cultivations, and forest from imagery could provide a feature set for fine-scale risk prediction.

I'm proud to state that this work is my own. I was responsible for collecting the satellite imagery from a Google API (using QGIS), creating shapefiles describing building perimeters and types, creating annotated versions of the images, and, of course, designing and fitting the CNN. I learned quite a bit from online sources and peer-reviewed articles, which are cited throughout the walkthrough. If you are interested in using this repository's datasets, I kindly ask that you give the appropriate credit.

 

Convolutional neural network design and performance

 

As a first step toward extracting features relevant to spillover forecasts, I have created a CNN that identifies buildings in satellite imagery. The CNN is based on the U-Net image segmentation design, with an EfficientNet encoder and a simple decoder. It classifies each pixel in an image as traditional building (circular thatch structure), modern building (rectangular aluminum roof), or background. The figures below show an example of the CNN's ability to segment the different building types.

 

[Figure: CNN image segmentation]
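
For readers curious how such a model can be assembled, the sketch below builds a U-Net with an EfficientNet encoder using the segmentation_models_pytorch library. The library, the efficientnet-b0 encoder variant, the 256-pixel tile size, and the cross-entropy loss are illustrative assumptions, not necessarily what this repository's notebooks use.

```python
# Minimal sketch of a U-Net with an EfficientNet encoder for three-class
# pixel classification (background, traditional building, modern building).
# Library choice, encoder variant, tile size, and loss are assumptions.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b0",  # EfficientNet backbone as the encoder
    encoder_weights="imagenet",      # start from ImageNet pretraining
    in_channels=3,                   # RGB satellite tiles
    classes=3,                       # background, traditional, modern
)

loss_fn = torch.nn.CrossEntropyLoss()    # per-pixel multi-class loss

x = torch.randn(4, 3, 256, 256)          # a batch of 256x256 image tiles
y = torch.randint(0, 3, (4, 256, 256))   # per-pixel ground-truth labels
logits = model(x)                        # shape: (4, 3, 256, 256)
loss = loss_fn(logits, y)                # scalar training loss
pred = logits.argmax(dim=1)              # predicted class for each pixel
```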

 

By segmenting the images in this way, I can then use OpenCV tools to identify groups of pixels as unique buildings. The image below shows the output of a function I wrote that uses OpenCV's findContours function. After grouping the building pixels into distinct objects, the function overlays the building contours onto the original image.

 

[Figure: unique buildings identified with OpenCV]
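
The sketch below illustrates this kind of post-processing; the function name, mask handling, and drawing parameters are illustrative choices rather than the repository's exact code.

```python
# Sketch: group predicted building pixels into distinct objects with OpenCV
# and overlay their contours on the original image. The class numbering and
# drawing parameters are assumptions.
import cv2
import numpy as np

def overlay_building_contours(image, class_mask, class_id, color=(0, 255, 0)):
    """Draw contours around connected pixel groups of one building class."""
    # Binary mask of the requested class (e.g., 1 = traditional, 2 = modern)
    binary = (class_mask == class_id).astype(np.uint8) * 255
    # RETR_EXTERNAL keeps only outer boundaries; CHAIN_APPROX_SIMPLE
    # compresses straight contour segments to their endpoints
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    annotated = image.copy()
    cv2.drawContours(annotated, contours, -1, color, thickness=2)
    return annotated, contours
```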

 

Associating pixel groups with individual buildings, in turn, allows the CNN pipeline to "count" the number of buildings of each type in an image. The plots below show the pipeline's counts of modern buildings (left) and traditional buildings (right) in test images that were omitted from the training process. Generally, the CNN counts each building type accurately; however, it is also clear that the pipeline underestimates counts in areas with high building density.

 

[Figure: CNN building counts vs. true building counts]
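
A counting step along these lines could look like the sketch below; the class numbering and the minimum-area filter are assumptions added for illustration.

```python
# Sketch: count distinct buildings of one class in a predicted segmentation
# mask by counting its contours. The minimum-area filter (an assumption)
# discards small pixel speckles that are unlikely to be whole buildings.
import cv2
import numpy as np

def count_buildings(class_mask, class_id, min_area=20):
    binary = (class_mask == class_id).astype(np.uint8) * 255
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return sum(1 for c in contours if cv2.contourArea(c) >= min_area)

# Example usage on a predicted mask where 1 = traditional and 2 = modern
# (the class numbering is an assumption):
# traditional_count = count_buildings(pred_mask, class_id=1)
# modern_count = count_buildings(pred_mask, class_id=2)
```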

 

Jupyter notebooks

 

This repository contains the Jupyter notebooks, CNN model architectures, and training results of the model-building process. Unfortunately, the full dataset of satellite imagery is too large to upload to GitHub. As a compromise, I've included two (of 29) large rasters of data that allow a smaller version of the image dataset to be generated and a simpler version of the CNN model to be fit. To be clear, the images shown above all come from the standard, large-dataset version of the model.
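
As an illustration of how image tiles might be cut from one of the included rasters, the sketch below reads fixed-size windows from a GeoTIFF with rasterio; the file name, tile size, and band handling are assumptions rather than the notebook's exact procedure.

```python
# Sketch: cut one of the included rasters into fixed-size image tiles.
# The file name, tile size, and what happens to each tile are assumptions.
import numpy as np
import rasterio
from rasterio.windows import Window

TILE = 256
with rasterio.open("example_raster.tif") as src:   # hypothetical file name
    for row in range(0, src.height - TILE + 1, TILE):
        for col in range(0, src.width - TILE + 1, TILE):
            window = Window(col, row, TILE, TILE)
            tile = src.read(window=window)          # shape: (bands, H, W)
            tile = np.moveaxis(tile, 0, -1)         # shape: (H, W, bands)
            # ...save the tile or hand it to the dataset builder here
```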

The original Jupyter notebook is titled "Building_Segmentation.ipynb", and the version that uses the truncated dataset is titled "Small_Building_Segmentation.ipynb". Because of the reduced amount of training data, the performance of the small version of the model is significantly worse. Even so, it should help users of this repository understand how the notebook functions.


 
 
 
 
