First of all, what is "good"? 😇
We will be looking into a dataset of radar returns and whether those returns show evidence of some type of structure in the ionosphere. A return is deemed “good” if it shows such structure and “bad” if its signals pass through the ionosphere. While the dataset gives us the labels, we want to use cluster analysis to determine which characteristics might lead to a “good” radar return. Clustering is not meant for prediction; it is used to find similarities and relationships.
Here is a link to the Ionosphere dataset information.
To get a local copy up and running, download Kmeans_clustering.R and the input file, ionosphere.csv. Then run the code in an IDE such as RStudio, with the working directory set to the location of the CSV file.
The code guides you through the following:
- Importing the CSV file
- Inspecting the format of the variables (data types, number of rows/columns, measures of central tendency, statistical descriptions, etc.)
- Pre-processing: cleanup (removing irrelevant variables, checking for missing values), importing libraries, and normalization (scaling the variables)
- Setting the seed for reproducibility and splitting the dataset into a training set and a test set
- Performing the K-means clustering algorithm, running one round at a time
- Evaluating the accuracy through cross-tabulation and by plotting the clusters
- Changing parameters to improve accuracy (our goal is to minimize the sum of squared distances within each cluster while maximizing the sum of squared distances between clusters)
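The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of Kmeans_clustering.R: it uses a small synthetic data frame in place of ionosphere.csv, and the column names (`f1`, `f2`, `label`), the 70/30 split, and `centers = 2` are all assumptions made for the example.

```r
# Synthetic stand-in for ionosphere.csv (assumption: numeric feature columns
# plus a "good"/"bad" label column; the real file's layout may differ).
# With the real data this would be something like:
# radar <- read.csv("ionosphere.csv")
set.seed(42)  # fixed seed so the run is reproducible
n <- 200
radar <- data.frame(
  f1    = c(rnorm(n / 2, mean = 1), rnorm(n / 2, mean = -1)),
  f2    = c(rnorm(n / 2, mean = 1), rnorm(n / 2, mean = -1)),
  label = rep(c("good", "bad"), each = n / 2)
)

# Quick look at types and summary statistics
str(radar)
summary(radar)

# Keep only the numeric features and scale them (mean 0, sd 1)
features <- scale(radar[, c("f1", "f2")])

# Split row indices into a training set (70%) and a test set (30%)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- features[train_idx, ]

# Run K-means; nstart = 25 tries 25 random starts and keeps the best one
fit <- kmeans(train, centers = 2, nstart = 25)

# Cross-tabulate cluster assignments against the known labels
xtab <- table(cluster = fit$cluster, label = radar$label[train_idx])
print(xtab)

# Within-cluster vs. between-cluster sums of squares
cat("total within-cluster SS:", fit$tot.withinss, "\n")
cat("between-cluster SS:     ", fit$betweenss, "\n")

# Plot the training points colored by cluster
plot(train, col = fit$cluster, pch = 19, main = "K-means clusters (sketch)")
```

Since the total sum of squares is fixed for a given dataset, a lower `tot.withinss` always comes with a higher `betweenss`, which is why the two goals in the last step go together.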
The main finding in relation to the stated objective is that the radar instances in the ionosphere dataset can be clustered into three groups. Each group shares similar characteristics, which can be used to deem a radar good or bad by evaluating the antennas used in each instance. Some groups overlap in their characteristics, called fuzzy partitioning, but this overlap may also add to what can be learned about the data. Further analysis is needed to achieve the true objective, since the cluster analysis is used purely for exploration and for finding relationships.
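One common way to sanity-check a choice such as three clusters is to compare the total within-cluster sum of squares across several values of k and look for the point where the curve flattens (the "elbow"). The sketch below is only an illustration on synthetic data with three loose groups; the data, the range `1:6`, and `nstart = 25` are assumptions and not taken from the project's script.

```r
set.seed(42)  # fixed seed so the run is reproducible

# Synthetic stand-in with three loose groups (assumption, not the real data)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
)
x <- scale(x)  # normalize each column to mean 0, sd 1

# Fit K-means for k = 1..6 and record the total within-cluster SS for each
ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
names(wss) <- ks
print(round(wss, 1))

# Elbow plot: look for the k after which the curve stops dropping sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a heuristic, so it is best read alongside the cross-tabulation and cluster plots rather than as a definitive answer.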
Karishma Mathur - karishma324@gmail.com
Project Link: https://github.com/Mathurkarishma/radar-signals-ionosphere
- Dr. Firdu Bati at University of Maryland Global Campus - Fall 2019
- Ionosphere Dataset Description