
VPUFS

This repository contains the implementation of VPUFS, a novel unsupervised feature selection framework designed to effectively reduce the high dimensionality of microarray gene expression data for cancer sample clustering.

📌 Overview

Gene expression datasets are high-dimensional, and only a small fraction of the features (genes) are truly informative for classifying or clustering cancer samples. VPUFS addresses this by selecting the most relevant, non-redundant features using:

- Variance Score: measures relevance based on statistical variability.
- Pearson Similarity: identifies and eliminates redundant features by measuring pairwise correlations.

The selected features can then be used to improve the performance of clustering algorithms such as K-Means, Spectral Clustering, and GMM, giving better insight into cancer subtypes.

⚙️ Features

- Unsupervised feature selection (no class labels needed)
- Efficient reduction of high-dimensional microarray gene expression data
- Improved clustering results using the selected features
- Evaluated on multiple datasets: Leukemia, Colon, Prostate, Breast
- Compared against established techniques (Laplacian Score, MCFS, JELSR, NDFS, LDFS)

📁 Dataset Info

🧮 Methodology

Data Preprocessing

- Remove null/duplicate rows and columns
- Separate target labels (if available)
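The README does not show the preprocessing code. Below is a minimal NumPy sketch of these two steps, assuming the data is already loaded as a numeric samples × genes array with an optional, numerically encoded label column. The function name `preprocess` and the choice to drop incomplete genes rather than incomplete samples are assumptions for illustration, not taken from the VPUFS code.

```python
import numpy as np


def preprocess(data, label_col=None):
    """Hypothetical helper: clean a samples x genes matrix.

    Assumes `data` is numeric (labels, if present, are numerically encoded)
    and `label_col` is the column index of the labels, or None.
    Returns (X, y); y is None when no label column is given.
    """
    data = np.asarray(data, dtype=float)
    y = None
    if label_col is not None:
        # Separate the target labels from the gene columns.
        y = data[:, label_col]
        data = np.delete(data, label_col, axis=1)

    # Drop samples (rows) that are entirely missing.
    keep_rows = ~np.isnan(data).all(axis=1)
    data = data[keep_rows]
    if y is not None:
        y = y[keep_rows]

    # Drop genes (columns) that still contain a missing value
    # (assumption: losing a gene is cheaper than losing a sample).
    data = data[:, ~np.isnan(data).any(axis=0)]

    # Drop duplicate samples, keeping the first occurrence and the original order.
    _, first_rows = np.unique(data, axis=0, return_index=True)
    first_rows = np.sort(first_rows)
    data = data[first_rows]
    if y is not None:
        y = y[first_rows]

    # Drop duplicate genes the same way.
    _, first_cols = np.unique(data, axis=1, return_index=True)
    data = data[:, np.sort(first_cols)]
    return data, y
```

Microarray datasets usually have far more genes than samples, which is why this sketch sacrifices genes before samples; other trade-offs are equally valid.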

Feature Scoring

- Variance Score: high variance = high relevance
- Pearson Similarity: measures redundancy between features
- Non-Redundant Score: 1 - max(Pearson correlation with the other features)
- Final Score: Variance × Non-Redundant Score
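A minimal sketch of this scoring, assuming `X` is a samples × genes NumPy array. The helper name `vpufs_scores` is hypothetical, each gene's self-correlation is excluded from the max (it is always 1), and the `absolute` flag for scoring with |correlation| is an optional variant the README does not specify; the actual implementation may differ in these details.

```python
import numpy as np


def vpufs_scores(X, absolute=False):
    """Score each gene (column of X) as variance x non-redundant score.

    Sketch under assumptions: X is a samples x genes array with at least two
    genes; the non-redundant score of gene j is 1 - max over k != j of
    corr(j, k). `absolute=True` uses |corr| instead (an optional variant).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("X must be a 2-D samples x genes array with >= 2 genes")

    # Relevance: population variance of each gene across samples.
    variance = X.var(axis=0)

    # Redundancy: gene-by-gene Pearson correlation matrix.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X, rowvar=False)
    corr = np.nan_to_num(corr)  # constant genes give NaN; treat as uncorrelated
    if absolute:
        corr = np.abs(corr)
    np.fill_diagonal(corr, -np.inf)  # exclude each gene's self-correlation (always 1)

    non_redundant = 1.0 - corr.max(axis=1)
    return variance * non_redundant
```

With signed correlation, a gene that is a positively scaled copy of another scores 0 regardless of its variance, while a perfectly anti-correlated copy is not treated as redundant; that gap is why the `absolute` variant is offered in this sketch.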

Feature Selection

- Sort features by final score
- Select the top-𝑞 ranked features

Output: a reduced gene expression matrix containing only the selected features
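Selecting the top-𝑞 features amounts to a descending sort of the scores followed by column indexing. A sketch, with the hypothetical helper `select_top_q` accepting any per-gene score vector (for example, variance × non-redundant score):

```python
import numpy as np


def select_top_q(X, scores, q):
    """Keep the q highest-scoring genes (columns) of X.

    `select_top_q` is a hypothetical helper. Returns the reduced matrix
    and the indices of the kept genes, best first.
    """
    X = np.asarray(X)
    scores = np.asarray(scores, dtype=float)
    if not 1 <= q <= X.shape[1]:
        raise ValueError("q must be between 1 and the number of genes")
    # Stable sort on the negated scores: descending order, ties keep original order.
    order = np.argsort(-scores, kind="stable")
    selected = order[:q]
    return X[:, selected], selected
```

Returning the indices alongside the reduced matrix makes it possible to map the selected columns back to gene names.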

📊 Performance Evaluation

- Classifier: SVM with leave-one-out (LOOCV), 5-fold, and 10-fold cross-validation
- Clustering: K-Means, GMM, Agglomerative, SOM, Spectral
- Metrics: Rand Index (RI), Adjusted Rand Index (ARI)
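The evaluation code is not shown here. As an illustration of the two clustering metrics, this is a from-scratch sketch of RI and ARI using pair counting over the contingency of true and predicted labels; the function name is hypothetical, and in practice scikit-learn's `sklearn.metrics.rand_score` and `adjusted_rand_score` compute the same quantities.

```python
from collections import Counter
from math import comb


def rand_and_adjusted_rand(labels_true, labels_pred):
    """Return (RI, ARI) for two flat labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples")
    total_pairs = comb(n, 2)

    # Pairs placed in the same cluster by both labelings, and by each one.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # RI: fraction of pairs on which the labelings agree (together/together or apart/apart).
    agree = total_pairs + 2 * same_both - same_true - same_pred
    ri = agree / total_pairs

    # ARI: same_both corrected for the value expected under random labelings.
    expected = same_true * same_pred / total_pairs
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        ari = 1.0  # degenerate case, e.g. both labelings are a single cluster
    else:
        ari = (same_both - expected) / (max_index - expected)
    return ri, ari
```

Here `labels_true` would be the known cancer subtypes and `labels_pred` the clusters found on the reduced matrix. Cluster IDs only need to be consistent within each labeling, so renaming clusters does not change either score.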

In the reported experiments, VPUFS outperforms most of the compared unsupervised methods in both classification accuracy and clustering performance.

🔍 Key Results

📦 Installation & Usage

Clone the repository

git clone https://github.com/Phoenixcoder-6/VPUFS.git
cd VPUFS

Install dependencies

pip install -r requirements.txt

Run the VPUFS pipeline

python vpufs_main.py

(Replace vpufs_main.py with the name of the actual script.)

🧠 Future Work

- Extend the framework to RNA-seq datasets
- Integrate deep learning-based feature selectors
- Deploy as a web-based cancer clustering tool

📚 Full Paper Access

The complete research paper, "Variance Score and Pearson Similarity based Unsupervised Feature Selection (VPUFS) for Sample Clustering in Microarray Gene Expression Data", was published at the IEEE International Conference 2024.

📄 Access the full paper here: https://ieeexplore.ieee.org/document/10763835

The repository contains the Codes and Dataset folders, a LICENSE file, this README.md, and a copy of the paper (VPUFS_IEEE_Paper.pdf).
