DavideMassidda/doppelganger

doppelganger

An R package to analyze correlations and select a subset of non-redundant variables.

For details, see the post on Towards Data Science:

Fighting doppelgängers: How to rid data of evil twins reducing the feature space

Quick usage

dg <- doppelganger(data, priority={...}, threshold={...})

Argument    Description
data        Data frame containing numerical variables.
variables   Columns of data to consider in the analysis (default: all).
priority    Ranking method to prioritize variables ("centrality", "peripherality", "raw_order").
threshold   Correlation cut-off (absolute value).
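
To make the roles of priority and threshold concrete, here is a minimal base-R sketch of a greedy correlation filter. It is not the package's implementation: the helper name select_non_redundant, the definition of centrality as the mean absolute correlation, and the greedy loop are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the doppelganger source code: an illustrative
# greedy filter built only from the argument descriptions above.
select_non_redundant <- function(data, threshold = 0.8,
                                 priority = c("centrality", "peripherality", "raw_order")) {
  priority <- match.arg(priority)

  # Absolute pairwise correlations; zero the diagonal so that a variable
  # is never flagged as its own twin.
  r <- abs(stats::cor(data, use = "pairwise.complete.obs"))
  diag(r) <- 0

  # Assumed ranking score: mean absolute correlation with the other variables.
  score <- rowMeans(r)
  candidates <- switch(priority,
    centrality    = names(sort(score, decreasing = TRUE)),
    peripherality = names(sort(score, decreasing = FALSE)),
    raw_order     = colnames(r)
  )

  keep <- character(0)
  drop <- character(0)
  while (length(candidates) > 0) {
    current <- candidates[1]
    keep <- c(keep, current)
    rest <- candidates[-1]
    # Variables correlated with the kept one above the cut-off are dropped.
    twins <- rest[r[current, rest] > threshold]
    drop <- c(drop, twins)
    candidates <- setdiff(rest, twins)
  }
  list(keep = keep, drop = drop)
}

# Toy data: b is almost a copy of a, c is only weakly related to both.
a <- 1:10
c_alt <- rep(c(1, -1), times = 5)
toy <- data.frame(a = a, b = a + 0.1 * c_alt, c = c_alt)
res <- select_non_redundant(toy, threshold = 0.9, priority = "centrality")
```

In this sketch every variable ends up in exactly one of keep or drop, and a lower threshold makes the filter more aggressive. The real package may rank variables and break ties differently.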

Output

Variables to keep / drop:

dg$keep
dg$drop
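
A typical next step is to subset the original data frame to the retained columns. The sketch below assumes that dg$keep and dg$drop are character vectors of column names. The dg object here is a hand-built stand-in, not real package output, so the snippet runs without the package installed.

```r
# Stand-in data and result object; the structure of dg (character vectors
# of column names) is an assumption, not confirmed by the package docs.
data <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 11), c = c(1, -1, 1, -1, 1))
dg <- list(keep = c("a", "c"), drop = "b")

# Keep only the non-redundant columns; drop = FALSE preserves the data
# frame even when a single column is retained.
reduced <- data[, dg$keep, drop = FALSE]
```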
