pyarrow

Here are 54 public repositories matching this topic...

vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

visualization python data-science machine-learning bigdata tabular-data hdf5 machinelearning dataframe memory-mapped-file pyarrow

Updated Oct 8, 2024
Python

ibis-project / ibis

the portable Python dataframe library

mysql python bigquery sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql trino pyarrow datafusion duckdb polars

Updated Oct 31, 2024
Python

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

machine-learning deep-learning tensorflow pytorch pyspark parquet parquet-files sysml pyarrow

Updated Dec 2, 2023
Python

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

pandas dask ibis vaex pyarrow modin cudf duckdb polars

Updated Oct 31, 2024
Python

wheretrue / biobear

Work with bioinformatic files using Arrow, Polars, and/or DuckDB

python bioinformatics biology arrow biopython samtools pyarrow rust-bio duckdb polars

Updated Oct 22, 2024
Rust

dacort / faker-cli

Command-line interface to quickly generate fake CSV and JSON data

aws json csv parquet faker-provider pyarrow deltalake

Updated Jul 11, 2024
Python

RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.

julia parquet jupyter-notebooks chicago pyarrow crimes duckdb polars large-csv malloy malloydata

Updated Jan 29, 2023
Jupyter Notebook

icaropires / pdf2dataset

Converts a whole subdirectory containing a large (or small) volume of PDF documents into a dataset (pandas DataFrame), with error tracking and a choice of features.

python pdf distributed-systems data-science ocr pandas-dataframe parallel distributed-computing tesseract python3 tesseract-ocr parquet ray pdftotext pytesseract pdf2image pyarrow pytesseract-ocr

Updated Sep 20, 2020
Python

ismailhammounou / db2ixf

db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.

Updated Mar 16, 2024
Python

zen-xu / pyarrow-stubs

Type annotations for pyarrow

Updated Oct 29, 2024
Python

milesgranger / flaco

(PoC) A very memory-efficient way to read data from PostgreSQL

python rust arrow postgresql pyarrow

Updated Oct 28, 2022
Rust

kraina-ai / overturemaestro

An open-source tool for reading OvertureMaps data with multiprocessing and additional Quality-of-Life features

python open-source openstreetmap geo geospatial pyarrow overturemaps overture-maps

Updated Oct 31, 2024
Python

vipinc007 / ParquetViewer

A web application for viewing Apache Parquet files. This is a Python + Flask application.

pandas python3 flask-application parquet-files parquet-viewer pyarrow

Updated Apr 17, 2018
HTML

legout / pydala

Poor man's simple Python API for creating a local or remote data lake based on several (PyArrow) datasets using DuckDB

datalake pyarrow duckdb

Updated Jul 14, 2023
Python

SaelKimberly / rxls

Reading both XLSX and XLSB files, fast and memory-safe, with Python, into PyArrow

Updated Feb 6, 2024
Jupyter Notebook

DanielAvdar / pandas-pyarrow

Seamlessly switch Pandas DataFrame backend to PyArrow.

python backend arrow pandas-dataframe pandas pyarrow pandas-tricks-for-data-manipulation dtypes db-dtypes pandas-pyarrow pandas-arrow

Updated Oct 30, 2024
Python

mercator-labs / oakstore

High-speed time-series pandas DataFrame database

python finance data-science machine-learning database big-data timeseries deep-learning pandas dataset parquet deeplearning dask datawarehouse pyarrow

Updated Oct 28, 2024
Python

asierra01 / pyarrow_to_db2

ibm_db extension to load a PyArrow table into Db2

python3 db2 pyarrow luw

Updated Nov 25, 2019
C

jaysnm / dremio-arrow

Dremio Arrow Flight Client

python r pandas dataframe dremio pyarrow dremio-arrow

Updated Mar 20, 2024
Python

lykmapipo / Python-Spark-Log-Analysis

Python scripts to process and analyze log files using PySpark.

Updated Jul 13, 2024
Python
