preprocessing

Here are 1,381 public repositories matching this topic...

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

nlp machine-learning information-retrieval ocr deep-learning chatbot orchestration preprocessing pdf-to-text data-pipelines document-parser rag document-understanding table-structure-recognition llm llmops retrieval-augmented-generation

Updated Jun 2, 2024
Python

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Jun 1, 2024
HTML

harveybc / preprocessor

Data pre-processing with modular components: normalizer/standardizer, unbiaser, trimmer, and feature selector.

machine-learning reinforcement-learning timeseries deep-learning preprocessor openai-gym regression feature-selection dataset feature-extraction classification standardization preprocessing spectral-analysis trims

Updated Jun 1, 2024
Python

ALebrun-108 / BoxSERS

Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).

python machine-learning deep-learning pca-analysis preprocessing unsupervised-learning cnn-keras data-augmentation chemometrics sers vibrational-spectroscopy raman-spectroscopy baseline-correction

Updated May 31, 2024
Jupyter Notebook

MaemoonFarooq / Amazon-Dataset-Mining

The Frequent Dataset Mining project offers a comprehensive solution for mining frequent itemsets from the extensive Amazon dataset using Apache Kafka. Leveraging the power of distributed computing, this project employs two powerful algorithms, Apriori and PCY, to efficiently process and analyze large volumes of data.

kafka python3 kafka-consumer kafka-producer frequent-itemset-mining preprocessing bash-script mongodb-atlas dataextraction

Updated May 31, 2024
Python

adbar / courlan

Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters

url crawler uri domain rate-limiting tld url-parsing cleaner preprocessing url-validation webcrawling

Updated May 31, 2024
Python

raj-sutariya / indic-num2words

Python library for converting numbers to words for all Indian Languages.

python nlp preprocessing speech-processing indic indian-languages

Updated May 30, 2024
Python

pauhidalgoo / top-streaming-songs-modeling

This repository contains the PMAAD course project from the Artificial Intelligence Degree at Universitat Politècnica de Catalunya. It models and analyzes Spotify's top 40 weekly streamed songs (2017-2021) using R. Techniques include clustering, textual analysis, and geospatial analysis to uncover music trends and characteristics.

r statistics clustering geospatial preprocessing textual-analysis

Updated May 30, 2024
HTML

JAdelhelm / Automated-Anomaly-Detection-Preprocessing-Pipeline

This automated anomaly detection preprocessing pipeline can be used to automatically preprocess tabular data for anomaly detection methods.

python machine-learning sklearn anomaly preprocessing automated anomalydetection data-quality automated-machine-learning anomaly-detection pyod preprocessing-data preprocessing-pipeline

Updated May 30, 2024
Python

pawlyk / dsml-tools

A set of data science and machine learning tools

data-science machine-learning ml plotting preprocessing ds

Updated May 29, 2024
Python

keurfonluu / toughio

Pre- and post-processing Python library for TOUGH

io preprocessing postprocessing tough2 tough3

Updated May 29, 2024
Python

vbyrao / DataMiningAlgooos

Breast Cancer Data Analysis: Analyzes and classifies breast cancer data using a Naive Bayes classifier with preprocessing, label encoding, and k-fold cross-validation. Cars Dataset Analysis: Explores a cars dataset with data loading, statistics, and visualizations, including price distribution and correlation heatmap. Hayes-Roth Classification: C

analysis cnn matplotlib convolutional-neural-networks preprocessing decision-tree-classifier naive-bayes-classification k-fold-cross-validation skit-learn

Updated May 28, 2024
Jupyter Notebook

francoisschwarzentruber / abcd

A simple ASCII format to represent music scores, and a music score editor

music markdown midi ascii music-composition lilypond music-notation abc preprocessing music-score abcjs simple-app music-notation-format

Updated May 28, 2024
JavaScript

mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R

data-science machine-learning r pipelines ensemble-learning dataflow-programming r-package preprocessing stacking bagging mlr3

Updated May 30, 2024
R

karsterr / DevSalary

Software Developer Salaries Analysis and Forecast

python data-science linear-regression preprocessing datasciene

Updated May 28, 2024
Python

marcpinet / handigits

🖐️ Background-independent deep learning model for hand sign digit recognition

recognition deep-learning tensorflow keras preprocessing cv2 hand mediapipe

Updated May 27, 2024
Python

LukaNedimovic / table_editor

A simple table data editor, with easily scalable functions and operations & a nice GUI

java formula parser data-science data spring parsing tokenizer preprocessing

Updated May 27, 2024
Java

SubbulakshmiSN / Retail_Store_Sales_Forecast

Retail_Store_Sales -- the task is to use past sales data to predict future sales, with particular attention to the impact of promotional events and major holidays, which are given extra significance in the evaluation.

python numpy pandas preprocessing regression-models

Updated May 27, 2024
Jupyter Notebook

viniciusds2020 / ml_pycaret_classificacao

A system for preprocessing data and training machine learning models with PyCaret. A low-code methodology for MLOps processes.

python machine-learning scikit-learn preprocessing mlops pycaret

Updated May 27, 2024

aishwaryamensinkai / Data-Mining-and-Analysis

Predicting response time of the Paris Fire Brigade Vehicles

data-mining preprocessing

Updated May 27, 2024
Jupyter Notebook
