
aditya-sawadh/Satellite_Image_Classification


Satellite Image Classification

Overview

This repository is my submission to the Kaggle competition DSTL Satellite Image Classification. The objective is to locate and classify objects such as roads, waterways, vehicles and buildings in multiband satellite images. The project takes a feature-extraction approach to object classification, training an XGBoost (gradient-boosted decision tree) classifier to find objects in the test images.

Repository Overview

The repository consists of four folders: Code, Input, Output and Scores.

  • Code - It contains two files: CreateGrids.py and GridBased Classification.py.
    • CreateGrids.py sets the grid sizes used to divide an image into cells for feature extraction.
    • GridBased Classification.py contains the code for extracting features from the training and test images, building the classifier model and making the predictions.
  • Input
    • train_wkt.csv - the WKT polygons of the labelled objects in all the training images, with their class labels.
    • grid_sizes.csv - the grid sizes for all the images.
    • train_geojson.zip - the GeoJSON version of all the training labels.
    • feature_grid - the feature vectors extracted from the test images.
    • feature_train_grid - the feature vectors extracted from the training images.
  • Output - the class-label predictions generated by the code for the test images, in the format specified by Kaggle.
  • Scores - all the scores obtained with different classifiers and with different XGBoost parameter settings.

Steps to run the program:

  • Download the three-band and sixteen-band data from the Kaggle website and store it in the Input folder.
  • Run the gridBased.py file. The code is currently set to the configuration that gave the best results.
  • The output is written to the Output folder as a CSV file.
  • Submit this file on Kaggle to get your score.

Methodology

  • Objective: The aim of this competition is to find the location of 10 classes of features in the images: Buildings, Miscellaneous man-made structures, Road, Track, Trees, Crops, Waterway, Standing water, Large Vehicles and Small Vehicles. For each image, the prediction gives the location of every class as a set of polygons (a WKT MultiPolygon).
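The polygon output format can be illustrated with a minimal sketch, assuming that every grid cell predicted as a given class is emitted as one square polygon. The function name `cells_to_multipolygon_wkt` is hypothetical, and coordinates are left in pixel units; scaling them to the competition's coordinate frame is assumed to happen elsewhere.

```python
def cells_to_multipolygon_wkt(cells, cell_size):
    """Turn predicted grid cells into a WKT MULTIPOLYGON string.

    cells: iterable of (row, col) indices of the grid cells predicted as one class.
    cell_size: side length of a square cell, in output coordinate units
    (pixel units in this sketch; scaling to the competition frame is assumed
    to happen elsewhere).
    """
    polygons = []
    for row, col in sorted(cells):
        x0, y0 = col * cell_size, row * cell_size
        x1, y1 = x0 + cell_size, y0 + cell_size
        # A WKT ring must be closed: the first vertex is repeated at the end.
        ring = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
        coords = ", ".join(f"{x} {y}" for x, y in ring)
        polygons.append(f"(({coords}))")
    if not polygons:
        return "MULTIPOLYGON EMPTY"
    return "MULTIPOLYGON (" + ", ".join(polygons) + ")"


# Usage: one predicted cell at row 0, column 1 with 10-pixel cells.
wkt = cells_to_multipolygon_wkt([(0, 1)], 10)
```

Adjacent squares share edges, which makes a MultiPolygon built this way invalid under the OGC Simple Features rules; merging the cells first (for example with Shapely's `shapely.ops.unary_union`) avoids this and gives cleaner polygons.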

  • Feature extraction: The training data gives the locations of buildings, rivers and the other labelled classes in the training images. Every image is divided into a grid of cells. Several grid sizes were tried, and the size that gave the best Jaccard score was chosen. The training images are divided into cells at that grid size, and a feature vector is extracted from each cell. The feature vector for each cell contains four statistics: mean, variance, skewness and kurtosis.
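The grid division and the four per-cell statistics can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's code: the function names are hypothetical, each band is assumed to be a 2-D list of pixel values, partial cells at the right and bottom edges are skipped, the moments are population moments, kurtosis is reported as excess kurtosis (0 for a normal distribution), and a constant cell is given skewness and kurtosis of 0.

```python
def split_into_cells(band, cell_size):
    """Yield (row, col, values) for each full cell_size x cell_size cell of a 2-D band.

    Partial cells at the right/bottom edges are skipped (an assumption of this sketch).
    """
    height, width = len(band), len(band[0])
    for r in range(0, height - cell_size + 1, cell_size):
        for c in range(0, width - cell_size + 1, cell_size):
            values = [band[i][j]
                      for i in range(r, r + cell_size)
                      for j in range(c, c + cell_size)]
            yield r // cell_size, c // cell_size, values


def moments(values):
    """Return (mean, variance, skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        # Constant cell: skewness/kurtosis are undefined, so report 0 (assumption).
        return mean, 0.0, 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def grid_features(bands, cell_size):
    """Map (row, col) -> feature vector: the four moments of every band, concatenated."""
    features = {}
    for band in bands:
        for row, col, values in split_into_cells(band, cell_size):
            features.setdefault((row, col), []).extend(moments(values))
    return features
```

If the statistics are computed per band as in this sketch, a sixteen-band image would give a 64-value feature vector per cell (four statistics times sixteen bands).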

  • Model Development: An XGBoost model is trained on the feature vectors produced by the feature extraction step. Each feature vector is paired with the class label of the grid cell it was extracted from. Several experiments tuning the XGBoost parameters were run to find the configuration with the best score as evaluated by Kaggle.
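The tuning experiments can be organised as an exhaustive sweep over parameter combinations that keeps the best-scoring one. This is a minimal sketch, not the repository's procedure: `sweep`, `score_fn` and `toy_score` are hypothetical names, and `score_fn` is assumed to train a model with the given parameters and return a validation score where higher is better. The parameter names in the usage example (`max_depth`, `learning_rate`, `n_estimators`) are real `xgboost.XGBClassifier` arguments, but the candidate values are illustrative, not the settings used in this project.

```python
from itertools import product


def sweep(param_grid, score_fn):
    """Evaluate every parameter combination and return (best_params, best_score).

    param_grid maps a parameter name to a list of candidate values.
    score_fn(params) is assumed to train a model with `params` and return a
    score where higher is better (for example a validation Jaccard index).
    Ties keep the first combination encountered.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage with illustrative candidate values.
grid = {"max_depth": [3, 6], "learning_rate": [0.1, 0.3], "n_estimators": [100, 200]}


def toy_score(params):
    # Stand-in scorer for demonstration only. A real scorer might fit
    # xgboost.XGBClassifier(**params) on training cells and score its
    # predictions on held-out cells.
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)


best_params, best_score = sweep(grid, toy_score)
```

scikit-learn's `GridSearchCV` performs the same kind of search with cross-validation built in and accepts `XGBClassifier`, since it follows the scikit-learn estimator interface.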

  • Model Evaluation: Submissions are evaluated on the average Jaccard index between the predicted multipolygons and the actual multipolygons. This is a vector-based metric: it compares the polygon geometries directly to measure how well the predictions align with the ground truth. The code for this metric was shared by the official Kaggle evaluators and is available here.
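The Jaccard index of two regions is the area of their intersection divided by the area of their union. The official metric computes it on polygons; the sketch below is a simpler raster approximation on binary masks, which can be handy for quick local checks before submitting. It is not the Kaggle evaluation code: the function names are hypothetical, scoring two empty masks as 1.0 is an assumption, and averaging one score per class is a simplification of the official aggregation.

```python
def jaccard(pred_mask, true_mask):
    """Jaccard index |A ∩ B| / |A ∪ B| of two equally sized binary masks (lists of rows)."""
    inter = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += bool(p) and bool(t)
            union += bool(p) or bool(t)
    # Both masks empty: treated as a perfect match (assumption of this sketch).
    return inter / union if union else 1.0


def average_jaccard(pred_masks, true_masks):
    """Mean of the per-class Jaccard indices; both arguments map class -> mask."""
    scores = [jaccard(pred_masks[k], true_masks[k]) for k in true_masks]
    return sum(scores) / len(scores)
```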

About

Classification of objects in satellite images based on feature extraction, using XGBoost and ensemble classification methods.
