massive-datasets

Here are 26 public repositories matching this topic...

polardb / polardbx-sql

PolarDB-X is a cloud native distributed SQL Database designed for high concurrency, massive storage, complex querying scenarios.

mysql distributed-transactions cloud-native high-availability relational-database high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Jun 6, 2025
Java

helmholtz-analytics / heat

Star

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

python data-science machine-learning hpc gpu numpy mpi pytorch distributed parallelism data-analytics tensors data-processing multi-gpu mpi4py massive-datasets multi-node-cluster array-api

Updated Jun 6, 2025
Python

polardb / polardbx

Star

PolarDB-X is a cloud native distributed SQL Database designed for high concurrency, massive storage, complex querying scenarios.

mysql distributed-transactions cloud-native high-availability relational-databases high-concurrency massive-datasets htap horizontal-scaling enterprise-class

Updated Nov 29, 2024
Makefile

joshuaboud / gen-dataset

Star

Command line tool to quickly generate a lot of files in a lot of directories

linux benchmarking evaluation multithreading dataset dataset-generation massive-datasets cli-tool dataset-generator

Updated Feb 18, 2022
C++

rajeshidumalla / Bloom-Filter

Star

Building a Bloom Filter on English dictionary words

python data-science machine-learning bloom-filter data-analysis nltk-library massive-datasets

Updated Oct 7, 2021
Jupyter Notebook

FedericoBruzzone / anti-money-laundering

Star

The project is based on the analysis of the "IBM Transactions for Anti Money Laundering" dataset published on Kaggle. The task is to implement a model which predicts whether or not a transaction is illicit, using the attribute "Is Laundering" as a label to be predicted.

machine-learning machine-learning-algorithms pyspark massive-datasets

Updated Aug 12, 2024
Jupyter Notebook

gmalik9 / floating_point_data_compressor

Star

gipa -- compression/decompression tool to package compress and encode massive archive files with floating-point data

compression data-visualization autoencoder compressor data-compression representation representation-learning floating-point massive-datasets

Updated Sep 14, 2017
Python

rajeshidumalla / PageRank

Star

Building PageRank algorithm on Web Graph around Stanford.edu using NetworkX python library

python data-science machine-learning spark numpy pagerank-algorithm pandas data-analysis massive-datasets networkx-library

Updated Oct 7, 2021
Jupyter Notebook

FedericoBruzzone / algorithms-for-massive-datasets

Star

This repository contains a LaTeX file that generates a PDF document comprising comprehensive notes for the course "Algorithms for Massive Datasets"

deep-learning algorithms recommender-system massive-datasets unimi linkanalysis

Updated Aug 12, 2024
TeX

rajeshidumalla / node2vec

Star

Building node2vec algorithm

python data-science machine-learning numpy pandas data-analysis matplotlib massive-datasets node2vec networkx-graph

Updated Oct 7, 2021
Jupyter Notebook

diem-ai / google-bigquery

Star

Series of SQL exercise working with databases, using Google BigQuery to scale to massive datasets taught by educators in Kaggle.com

python bigquery sql analytics kaggle massive-datasets

Updated Jul 9, 2019
Jupyter Notebook

Alex4gtx / Massive-Data-Handler

Star

Permite abrir e manipular arquivos massivos de texto/dados cujo seria impossivel abrir em um computador, por exemplo um arquivo de texto de +20gb, permite manipular o arquivo pegando apenas as linhas necessárias sem travar o computador por falta de memória.

big-data dictionaries python-script massive-datasets manipulacao-arquivos

Updated Feb 12, 2022
Python

datakaveri / k-anonymisation-SKALD

Star

Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.

encoding chunking ola large-dataset massive-datasets k-anonymity l-diversity t-closeness skald discernibility record-suppression predictive-tagging

Updated Jun 3, 2025
Python

manuparra / hadoop-statistics

Star

Calculate statistical measures of one column in big data Datasets with these simply Hadoop Application

java hadoop bigdata max avg min standardeviation massive-datasets

Updated Feb 24, 2017
Java

arhcoder / Netflix-Recommendation

Star

📺 Content Recommendation System for the Netflix Prize Challenge with Collaborative Filtering.

python jupyter-notebook collaborative-filtering netflix recommendation-system recommendation-engine recommender-system massive-datasets netflix-prize massive-data

Updated Feb 17, 2024
Jupyter Notebook

StefanoBalbo / Geocoding

Star

Automated massive geolocator of addresses with parallel processing.

python docker geocoding osm geospatial geolocation ssh-server jupyter-notebook nominatim jupyterlab spatial-analysis massively-parallel geopandas osm-data geopy massive-datasets massive nominatim-docker micromamba

Updated Mar 20, 2025
Jupyter Notebook

simkarwin / mimo_keras

Star

TF-Package: Multiple-Input Multiple-Output Keras Data-Generator for massive and complex datasets

massive-datasets keras-datagenerator mimo-models

Updated Jan 2, 2023
Python

KolwaBrad / massivedataset

Star

Training the MASSIVE dataset by Amazon(english-US, German-DE and Swahili-KE)

python massive-datasets

Updated Oct 2, 2023
Python

rajeshidumalla / Wordcount-in-Spark

Star

word count in Spark

python spark python-library pandas wordcount massive-datasets

Updated Oct 6, 2021
Jupyter Notebook

SJ22032003 / massive-data-streaming-nodejs

Star

Stream, parse, manipulate and transform extremly large data ( can be 1 GB or 1TB ) in NodeJS without any process block, memory overflow or bottle neck with peak performance. And also show it in UI with the help of webStreams

stream buffers transform node-js massive-datasets advance-nodejs

Updated Jul 21, 2024
JavaScript

Improve this page

Add a description, image, and links to the massive-datasets topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the massive-datasets topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

massive-datasets

Here are 26 public repositories matching this topic...

polardb / polardbx-sql

helmholtz-analytics / heat

polardb / polardbx

joshuaboud / gen-dataset

rajeshidumalla / Bloom-Filter

FedericoBruzzone / anti-money-laundering

gmalik9 / floating_point_data_compressor

rajeshidumalla / PageRank

FedericoBruzzone / algorithms-for-massive-datasets

rajeshidumalla / node2vec

diem-ai / google-bigquery

Alex4gtx / Massive-Data-Handler

datakaveri / k-anonymisation-SKALD

manuparra / hadoop-statistics

arhcoder / Netflix-Recommendation

StefanoBalbo / Geocoding

simkarwin / mimo_keras

KolwaBrad / massivedataset

rajeshidumalla / Wordcount-in-Spark

SJ22032003 / massive-data-streaming-nodejs

Improve this page

Add this topic to your repo