
DataGuard

AI-Assisted Dataset Analysis & Cleaning Inside Visual Studio Code

Inspect • Visualize • Clean • Generate Insights

Stay inside your editor. Explore datasets faster.


📗 Overview

DataGuard is a Visual Studio Code extension designed to make dataset exploration feel native to the editor experience.

Open a supported dataset to instantly access an interactive workspace for analysis, visual inspection, visualization, and safe cleaning operations, all without switching to notebooks or external tools.

Core processing runs locally.

Tip

AI capabilities can be enabled for deeper dataset insights.


🎯 Why DataGuard

Traditional dataset workflows usually involve unnecessary context switching:

Open Dataset
→ Launch Notebook
→ Inspect
→ Clean
→ Export
→ Return to Editor

DataGuard consolidates that workflow into a single environment.

📓 Principles

  • Local-first execution
  • Fast feedback loops
  • Minimal context switching
  • Safe modification workflow
  • Optional AI augmentation

✨ Features

📊 Dataset Analysis

  • Automatic dataset profiling
  • Dataset overview and metadata
  • Row and column inspection
  • Column type detection
  • Missing value analysis
  • Duplicate discovery
  • Statistical summaries
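DataGuard's profiling code is not included in this repository, so the following is only a minimal sketch, assuming pandas (the data engine listed under Tech Stack), of how checks like these can be computed. The `profile` helper and its report keys are hypothetical and are not DataGuard's API.

```python
import io

import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect a few basic profiling statistics for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Column types as inferred by pandas, e.g. "int64" or "float64".
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of missing (NaN/None) cells in each column.
        "missing": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Summary statistics; pandas describes numeric columns when any exist.
        "summary": df.describe().to_dict(),
    }


if __name__ == "__main__":
    # Tiny inline CSV: one missing rating and one duplicated row.
    csv_text = "app,rating\nA,4.1\nB,\nA,4.1\n"
    report = profile(pd.read_csv(io.StringIO(csv_text)))
    print(report["missing"])         # missing values per column
    print(report["duplicate_rows"])  # number of duplicate rows
```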

📈 Interactive Visualizations

  • Dataset composition
  • Numerical distributions
  • Missing value breakdown
  • Column exploration
  • Interactive charts
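The Tech Stack lists Chart.js for visualization. As a hedged illustration of how a numerical distribution could be turned into chart data, this sketch bins values with NumPy's `np.histogram` and returns an object in Chart.js's `{ labels, datasets }` data shape. The `histogram_payload` helper, the bin count, and the label format are assumptions, not DataGuard internals.

```python
import json

import numpy as np


def histogram_payload(values, bins: int = 10, label: str = "count") -> dict:
    """Bin numeric values and return a Chart.js-style {labels, datasets} object."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # drop missing values before binning
    counts, edges = np.histogram(arr, bins=bins)
    # One human-readable label per bin, e.g. "1 to 2".
    labels = [f"{lo:g} to {hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]
    return {
        "labels": labels,
        "datasets": [{"label": label, "data": counts.tolist()}],
    }


if __name__ == "__main__":
    ratings = [4.1, 3.9, 4.5, float("nan"), 4.7, 2.8]
    # JSON string that a webview could hand to a Chart.js bar chart.
    print(json.dumps(histogram_payload(ratings, bins=4, label="Rating")))
```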

🧹 Data Cleaning

Perform cleaning operations directly inside VS Code:

  • Remove duplicates
  • Fill missing values
  • Convert data types
  • Column operations
  • Safe save workflow

Note

Changes are applied locally.
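The cleaning operations above map naturally onto standard pandas calls. The sketch below is an assumption-based illustration, not DataGuard's implementation: the hypothetical `clean` helper drops duplicates, coerces chosen columns to numbers, and fills missing values, while the hypothetical `save_safely` shows one common interpretation of a safe save (write to a temporary file, then swap it into place with `os.replace`).

```python
import os
import tempfile

import pandas as pd


def clean(df: pd.DataFrame, fill_values: dict, numeric_columns: list[str]) -> pd.DataFrame:
    """Apply basic cleaning steps and return a new DataFrame (input is not modified)."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    out = df.drop_duplicates().reset_index(drop=True)
    # 2. Convert selected columns to numbers; unparseable values become NaN.
    for col in numeric_columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # 3. Fill remaining missing values per column, e.g. {"rating": 0.0}.
    return out.fillna(value=fill_values)


def save_safely(df: pd.DataFrame, path: str) -> None:
    """Write to a temp file next to `path`, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            df.to_csv(handle, index=False)
        # On POSIX, os.replace within one filesystem is an atomic rename.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # leave no partial temp file behind
        raise


if __name__ == "__main__":
    raw = pd.DataFrame({"app": ["A", "B", "A"], "rating": ["4.1", None, "4.1"]})
    print(clean(raw, fill_values={"rating": 0.0}, numeric_columns=["rating"]))
```

Because the data is fully written to the temporary file before the swap, a failure mid-write leaves the original file untouched.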


🤖 Optional AI Insights

Configure an AI provider to generate:

  • Dataset summaries
  • Pattern discovery
  • Cleaning suggestions
  • High-level observations

Works with the following configurable AI providers:

  • OpenAI
  • Anthropic
  • Google Gemini
  • Groq
  • Cohere

Note

AI features are optional and remain disabled until explicitly configured.


🎯 Design Goals

  • Fast startup
  • Local-first processing
  • Minimal workflow interruption
  • Safe cleaning operations
  • Optional AI assistance

🧩 Supported Formats

| Format | Supported |
|--------|-----------|
| CSV    | ✅        |
| TSV    | ✅        |
| JSON   | ✅        |
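For illustration only, here is a minimal sketch of extension-based loading for the three supported formats using pandas. The `READERS` table and `load_dataset` function are hypothetical, and the JSON reader assumes an array of record objects; DataGuard may accept other JSON layouts.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader.
READERS = {
    ".csv": lambda path: pd.read_csv(path),
    ".tsv": lambda path: pd.read_csv(path, sep="\t"),
    # Assumes a JSON array of records, e.g. [{"App": "A", "Rating": 4.1}, ...].
    ".json": lambda path: pd.read_json(path, orient="records"),
}


def load_dataset(path: str) -> pd.DataFrame:
    """Pick a reader based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported format: {suffix or '(no extension)'}")
    return READERS[suffix](path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "apps.tsv"
        sample.write_text("App\tRating\nA\t4.1\nB\t3.9\n", encoding="utf-8")
        print(load_dataset(str(sample)))
```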

⚑ Smart Activation

DataGuard activates only for supported dataset files.

To avoid unnecessary interruptions during development workflows, common project and configuration JSON files are intentionally ignored.

Examples:

package.json
package-lock.json
tsconfig.json
jsconfig.json
launch.json
settings.json
extensions.json
tasks.json
manifest.json
devcontainer.json
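The extension itself is written in TypeScript, but for consistency with the other examples the activation rule described above is sketched here in Python. The `should_activate` function is hypothetical, and its ignore list only contains the example file names given above; DataGuard's actual list may be longer.

```python
from pathlib import PurePath

SUPPORTED_SUFFIXES = {".csv", ".tsv", ".json"}

# Only the example names listed in this README; the real ignore list may differ.
IGNORED_JSON_NAMES = {
    "package.json",
    "package-lock.json",
    "tsconfig.json",
    "jsconfig.json",
    "launch.json",
    "settings.json",
    "extensions.json",
    "tasks.json",
    "manifest.json",
    "devcontainer.json",
}


def should_activate(path: str) -> bool:
    """Return True if this hypothetical rule treats the file as a dataset."""
    p = PurePath(path)
    suffix = p.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        return False  # not CSV, TSV, or JSON
    if suffix == ".json" and p.name.lower() in IGNORED_JSON_NAMES:
        return False  # common project/config file, not a dataset
    return True


if __name__ == "__main__":
    for name in ["data/apps.csv", "package.json", ".vscode/settings.json", "records.json"]:
        print(name, should_activate(name))
```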

📸 Screenshots

Profile Dataset

Dataset Overview Screen


Explore Dashboard

Dashboard Screen


Clean Data

Data Cleaning Screen


AI Insights (Optional)

AI Insights Screen


🎥 Demonstration

Watch the demo on YouTube:

Watch Demo


🗃️ Dataset Attribution

Screenshots, demonstrations, and promotional materials shown in this repository may include examples generated using the googleplaystore.csv dataset.

Dataset source:

  • L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: Kaggle

Usage purpose:

  • Product demonstration
  • Dashboard showcase
  • Visualization examples
  • Documentation screenshots

Important

DataGuard is not affiliated with, endorsed by, or associated with the dataset maintainers.

Note

The extension itself is dataset-agnostic and supports analysis of user-provided datasets in supported formats.


🔄 Workflow

Open Dataset
      ↓
Automatic Detection
      ↓
Local Processing
      ↓
Interactive Dashboard
      ↓
Visualize • Inspect • Clean
      ↓
Save Changes

🚀 Installation

🏬 Option 1: Visual Studio Code Marketplace (Recommended)

  1. Open Visual Studio Code
  2. Open the Extensions view (Ctrl+Shift+X, or Cmd+Shift+X on macOS)
  3. Search for DataGuard
  4. Click Install
  5. Open a supported dataset

📦 Option 2: Install via GitHub Release (.vsix)

Repository releases include installable extension packages.

  1. Open the GitHub Releases page
  2. Download the .vsix package attached to the release
  3. Open Visual Studio Code
  4. In the Extensions view, open the ⋯ menu and select Install from VSIX...
  5. Select the downloaded package

⚙️ Requirements

| Requirement | Version       |
|-------------|---------------|
| VS Code     | Latest stable |
| Python      | 3.10+         |

DataGuard automatically checks whether the required Python dependencies are installed.

If manual installation is needed:

pip install pandas numpy

If Python path detection does not work:

Open Command Palette → DataGuard: Set Python Path
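To check an interpreter by hand before selecting it with DataGuard: Set Python Path, a short script like the one below can report problems. It is a hypothetical helper, not part of DataGuard, and it only mirrors the requirements listed in this section (Python 3.10+, pandas, numpy).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)
REQUIRED_MODULES = ("pandas", "numpy")


def missing_requirements(python_version=None) -> list[str]:
    """Return human-readable problems; an empty list means the environment looks OK."""
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {version[0]}.{version[1]}"
        )
    for name in REQUIRED_MODULES:
        # find_spec returns None when the module cannot be found by this interpreter.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name} (try: python -m pip install {name})")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    # Print the interpreter path on success so it can be pasted into DataGuard.
    print("\n".join(issues) if issues else f"OK: {sys.executable}")
```

Run it with the same interpreter you plan to configure, for example `python check_env.py` (the file name is arbitrary).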


🏗️ Tech Stack

| Layer              | Technology            |
|--------------------|-----------------------|
| Extension Platform | VS Code Extension API |
| Runtime            | TypeScript            |
| Processing         | Python                |
| Data Engine        | Pandas                |
| Visualization      | Chart.js              |
| Packaging          | VSCE                  |

🔒 Privacy

DataGuard processes datasets locally.

AI features require explicit configuration.

No data leaves your machine unless an AI provider is enabled.


📈 Performance

Performance depends on:

  • Dataset size
  • Available memory
  • Python environment

Note

Designed to support analysis across datasets of varying sizes.
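This README does not describe how DataGuard handles large files. One general technique for keeping memory bounded with pandas is to stream a CSV in chunks and aggregate as you go; the sketch below does this for missing-value counts. `missing_counts_chunked` and its default chunk size are assumptions for illustration.

```python
import io

import pandas as pd


def missing_counts_chunked(source, chunksize: int = 100_000) -> pd.Series:
    """Count missing values per column, reading the CSV `chunksize` rows at a time."""
    total = None
    with pd.read_csv(source, chunksize=chunksize) as reader:
        for chunk in reader:
            counts = chunk.isna().sum()
            # Add this chunk's counts to the running total (columns align by name).
            total = counts if total is None else total.add(counts, fill_value=0)
    # Fall back to an empty Series if the reader produced no chunks.
    return total if total is not None else pd.Series(dtype="int64")


if __name__ == "__main__":
    csv_text = "a,b\n1,\n,2\n3,4\n,\n"
    print(missing_counts_chunked(io.StringIO(csv_text), chunksize=2).to_dict())
```

Only one chunk is held in memory at a time, so peak usage scales with `chunksize` rather than with file size.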


📦 Distribution

DataGuard is available through:

🏬 Visual Studio Code Marketplace

Primary installation channel.

Install and update directly inside Visual Studio Code.


📦 GitHub Repository

This repository serves as a project showcase and distribution companion for the extension.

🧺 Repository contents

  • README
  • Branding assets
  • Screenshots
  • Demonstration material
  • License
  • Security policy
  • VSIX release packages

Note

This repository does not contain the extension source code. It is intended for distribution assets and release artifacts only.

Important

Release policy: Additional releases are published only when a new version of the extension is available.


🛡️ Security

Security practices and responsible reporting guidance are available in: SECURITY.md


📄 License

Usage rights and restrictions are documented in the LICENSE file.

Note

Commercial use and redistribution are prohibited unless explicitly permitted by the license.


🤝 Team

Built and maintained by the DataGuard Team.

Individuals Behind DataGuard

Krishna Yadav

Lead Developer

See Profile ↗


Sachin Vishwakarma

Developer

See Profile ↗


Nikhil Vishwakarma

Developer

See Profile ↗


Explore the Team

View all team members ↗


Made for faster dataset workflows inside Visual Studio Code.

About

AI-assisted dataset analysis & cleaning inside Visual Studio Code. Profile, visualize, clean, and explore datasets without leaving your editor.
