HinMine is an algorithm that constructs network-analysis-based feature vectors for data instances, which can be either nodes in a network or standard data instances with a fixed set of numeric features. In this implementation, the input is a set of data instances, and the output is a new data set containing the same instances described by newly constructed features.
The algorithm works in two steps. In the first step, a network is constructed from the input data: the nodes of the network correspond to the data instances, and the strength of the connection between two instances depends exponentially on the squared Euclidean distance between them. In the second step, network propositionalization is performed on the resulting network. Network propositionalization constructs a feature vector for each target node in the network using the personalized PageRank (P-PR) algorithm. The personalized PageRank of node v in a network is defined as the stationary distribution of the position of a random walker who starts the walk in node v and, at each step, either follows one of the outgoing connections of the current node (with probability p) or returns to v.
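The two steps can be illustrated with a minimal sketch in plain Python. It is not the implementation behind this description: the function names are hypothetical, the connection strength exp(-d²) is only one possible reading of "depends exponentially on the squared Euclidean distance", and the P-PR vectors are approximated by simple power iteration.

```python
import math


def build_network(instances):
    """Step 1: connect every pair of instances.

    Assumption: the weight is exp(-squared Euclidean distance); the exact
    kernel used by the actual implementation may differ.
    """
    n = len(instances)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2
                              for a, b in zip(instances[i], instances[j]))
                weights[i][j] = math.exp(-sq_dist)
    return weights


def personalized_pagerank(weights, start, damping=0.85, iterations=100):
    """Step 2: approximate the P-PR vector of node `start` by power iteration."""
    n = len(weights)
    out_strength = [sum(row) for row in weights]
    rank = [0.0] * n
    rank[start] = 1.0  # the walker begins in the start node
    for _ in range(iterations):
        new_rank = [0.0] * n
        for i in range(n):
            if out_strength[i] == 0.0:
                # No usable outgoing connection: send the walker back to start.
                new_rank[start] += damping * rank[i]
                continue
            for j in range(n):
                # Continue the walk along an edge with probability `damping`.
                new_rank[j] += damping * rank[i] * weights[i][j] / out_strength[i]
        # With probability 1 - damping the walker returns to the start node.
        new_rank[start] += 1.0 - damping
        rank = new_rank
    return rank


def hinmine_features(instances, damping=0.85):
    """One P-PR vector per instance; each vector is that instance's new features."""
    weights = build_network(instances)
    return [personalized_pagerank(weights, v, damping)
            for v in range(len(instances))]


# Two nearby points and one distant point (made-up data).
features = hinmine_features([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
```

In this sketch each instance receives one feature per instance in the data set, and every feature vector sums to 1 because it is a probability distribution over the walker's position.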
- damping: The variable p used in the construction of the P-PR vectors during propositionalization. Its value can be any real number between 0 and 1. Smaller values of the damping factor make the feature vectors faster to calculate, whereas larger values of p allow the algorithm to perform longer walks and explore more of the structure of the data.
- normalize (True/False): Determines whether the feature values of the input data instances are normalized. If True, the values of each feature are rescaled to lie between 0 and 1, which allows the algorithm to fairly compare features measured in incomparable units. Set this to False if differences in the magnitude of the features carry inherent meaning.
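The effect of both parameters can be sketched in a few lines of Python. The function names are hypothetical, and two assumptions are made: normalization is taken to be per-feature min-max scaling (with constant features mapped to 0), and the P-PR vectors are assumed to be computed by power iteration, whose error shrinks roughly by a factor of p per iteration.

```python
import math


def min_max_normalize(instances):
    """Rescale every feature (column) to the interval [0, 1]."""
    columns = list(zip(*instances))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumption: a constant feature carries no information, map it to 0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in instances
    ]


def iterations_needed(damping, tolerance=1e-6):
    """Smallest k with damping**k <= tolerance, for 0 < damping < 1."""
    if not 0.0 < damping < 1.0:
        raise ValueError("damping must lie strictly between 0 and 1")
    return math.ceil(math.log(tolerance) / math.log(damping))


# Height in centimetres and weight in grams: incomparable units (made-up data).
raw = [[160, 55000], [180, 90000], [170, 72500]]
print(min_max_normalize(raw))   # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]

# A smaller damping factor needs fewer iterations to converge.
print(iterations_needed(0.5))   # 20
print(iterations_needed(0.85))  # 86
```

Without normalization, the weight column in this example would dominate the squared Euclidean distance simply because its values are numerically larger.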
Reference: Kralj, J., Robnik-Šikonja, M., & Lavrač, N. (2018). HINMINE: heterogeneous information network mining with information retrieval heuristics. Journal of Intelligent Information Systems, 50(1), 29-61.