Permalink
Switch branches/tags
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
11 lines (9 sloc) 486 Bytes

Global Field Importance for Clusters

This script determines which input fields are most important for differentiating between clusters. Given a cluster id, it:

  • Using batchcentroid, extends the dataset the cluster was built on with the centroid id each instance is a member of.
  • Builds an ensemble (random forest) on that centroid id field.
  • Averages the importance of each input field across all models in the ensemble.
  • Returns a sorted list of input fields by importance.