1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
A Data Analysis Board in Vue.
PySpark-Tutorial provides basic algorithms implemented with PySpark.
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Powerful & Easy way for big data discovery
Use CH-UI to work with data in your self-hosted ClickHouse instance through a user-friendly interface. CH-UI is a modern, feature-rich user interface for ClickHouse databases. It offers an intuitive platform for running queries against ClickHouse databases and visualizing metrics about your instance.
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
This repository covers courses taken on Coursera. All the answers given were written by me.
I have built the computer vision models in three different ways to address different personas, because not all companies will have a resolute data science team. Topics: quality-control, manufacturing, big-data-analytics, jupyter-notebook, cognitive services, industry solutions.
Bucketize an image based on exhaust data and AI-generated data. Topics: industry-solutions, azure, azure machine learning services, computer-vision, big data, big data analytics, machine learning, image recognition, manufacturing, quality control, cognitive services.
The Pandata scalable open-source analysis stack
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Big data projects implemented by Maniram Yadav.
The binary build of LEO CDP Free Edition for training purposes