# nitro-pandas

A high-performance pandas-like DataFrame library powered by Polars.

Combine the familiar pandas API with Polars' blazing-fast performance.

## Features
- 🐼 **Pandas-like API** - Use familiar pandas syntax without learning a new library
- ⚡ **Polars Backend** - Leverage Polars' optimized engine for maximum performance
- 🚀 **Lazy Evaluation** - Optimize queries with lazy operations before execution
- 📁 **Comprehensive I/O** - Read/write CSV, Parquet, JSON, and Excel files
- 🎯 **Automatic Fallback** - Seamless fallback to pandas for unimplemented methods
- 🔧 **Type Safety** - Support for pandas-like type casting and schema inference
## Why nitro-pandas?

nitro-pandas bridges the gap between pandas' user-friendly API and Polars' exceptional performance. If you're familiar with pandas but need better performance, nitro-pandas is a natural fit.
## Benchmarks

Benchmarked on the Books Rating dataset (~3M rows, 10 columns). All times are wall-clock seconds on a single machine.

### Core operations
| Operation | nitro-pandas | pandas | Polars | vs pandas | vs Polars |
|---|---|---|---|---|---|
| Read CSV | 4.56s | 13.54s | 1.09s | 3.0x faster | 0.24x |
| GroupBy + Count | 0.038s | 0.150s | 0.036s | 3.9x faster | ~same |
| Chained Ops (filter+groupby+sort) | 0.049s | 0.089s | 0.014s | 1.8x faster | 0.29x |
| GroupBy Multi-Column | 0.156s | 0.224s | 0.074s | 1.4x faster | 0.48x |
| Sort Values | 0.082s | 0.178s | 0.082s | 2.2x faster | ~same |
| Double Filter + GroupBy | 0.021s | 0.061s | 0.011s | 2.9x faster | 0.53x |
| Value Counts | 0.010s | 0.007s | 0.010s | ~same | ~same |
| Multi Aggregations (mean/min/max) | 0.114s | 0.170s | 0.039s | 1.5x faster | 0.35x |
| Nunique (count distinct) | 0.098s | 0.503s | 0.080s | 5.1x faster | 0.81x |
| Drop Duplicates | 0.189s | 0.531s | 0.223s | 2.8x faster | 1.2x faster |
| Column Arithmetic | 0.010s | 0.002s | 0.003s | 0.19x | 0.28x |
| Fill Null Values | 0.011s | 0.005s | 0.003s | 0.42x | 0.29x |
| String Contains Filter | 0.635s | 0.574s | 0.022s | ~same | 0.03x |
| Describe (summary stats) | 0.088s | 0.074s | 0.014s | ~same | 0.16x |
| Select + Rename Columns | 0.001s | 0.035s | 0.001s | 47.8x faster | ~same |
| TOTAL | 6.06s | 16.14s | 1.70s | 2.7x faster | 0.28x |
### Additional operations

| Operation | nitro-pandas | pandas | Polars | vs pandas | vs Polars |
|---|---|---|---|---|---|
| nlargest (top-N rows) | 0.059s | 0.141s | 0.179s | 2.4x faster | 3.1x faster |
| sample (random sampling) | 0.035s | 0.048s | 0.033s | 1.4x faster | ~same |
| pivot_table (group aggregation) | 0.009s | 0.028s | 0.007s | 3.0x faster | ~same |
### Fallback operations (delegated to pandas)

| Operation | nitro-pandas | pandas | vs pandas |
|---|---|---|---|
| median | 0.042s | 0.034s | ~same |
| std | 0.034s | 0.028s | ~same |
| corr | 0.020s | 0.015s | ~same |
| apply | 0.024s | 0.019s | ~same |
| cumsum | 0.023s | 0.014s | ~same |
**Summary:** nitro-pandas is faster than pandas in 10 of 15 core tests, with an overall 2.7x speedup on the total benchmark. Operations implemented natively (groupby, sort, filter, nunique, nlargest, pivot_table) see the biggest gains. Fallback operations (median, std, corr, apply, cumsum) carry minimal overhead (~20%) over raw pandas.

*Results may vary with data size and hardware.*
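The benchmark harness itself is not included in this README, so the following is only a plausible sketch of how wall-clock numbers like those above can be collected with `time.perf_counter`. The `bench` helper and the stand-in workload are illustrative, not the project's actual benchmark script.

```python
import time

def bench(fn, repeats=3):
    """Run fn several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Cheap stand-in workload; to reproduce the tables you would instead time
# e.g. lambda: npd.read_csv("books_rating.csv") against pandas and Polars.
elapsed = bench(lambda: sum(range(100_000)))
print(f"{elapsed:.4f}s")
```

Taking the best of several repeats reduces noise from caching and background load, which matters when comparing sub-millisecond operations.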
## Installation

```bash
# Using uv (recommended)
uv add nitro-pandas

# Using pip
pip install nitro-pandas
```

### Requirements

- Python 3.11+
- Dependencies (automatically installed):
  - `polars>=1.30.0` - High-performance DataFrame engine
  - `pandas>=2.2.3` - For fallback methods
  - `fastexcel>=0.7.0` - Fast Excel reading
  - `openpyxl>=3.1.5` - Excel file support
  - `pyarrow>=20.0.0` - Parquet file support
## Quick Start

```python
import nitro_pandas as npd

# Create a DataFrame (pandas-like syntax)
df = npd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['Paris', 'London', 'New York']
})

# Access columns (returns pandas Series for compatibility)
ages = df['age']
print(ages > 30)  # Boolean Series

# Filter data
filtered = df.loc[df['age'] > 30]
print(filtered)
```

### Reading Files

```python
# Read CSV
df = npd.read_csv('data.csv')

# Read with lazy evaluation (optimized for large files)
lf = npd.read_csv_lazy('large_data.csv')
df = lf.query('id > 1000').collect()

# Read other formats
df_parquet = npd.read_parquet('data.parquet')
df_excel = npd.read_excel('data.xlsx')
df_json = npd.read_json('data.json')
```

### Data Operations

```python
# GroupBy operations (pandas-like syntax, Polars backend)
result = df.groupby('city')['age'].mean()
print(result)

# Multi-column groupby
result = df.groupby(['city', 'category'])['value'].sum()

# Aggregations with dictionaries
result = df.groupby('category').agg({
    'value': 'mean',
    'count': 'sum'
})

# Sorting and filtering
df_sorted = df.sort_values('age', ascending=False)
df_filtered = df.query("age > 25 and city == 'Paris'")
```

### Writing Files

```python
# Write to various formats
df.to_csv('output.csv')
df.to_parquet('output.parquet')
df.to_json('output.json')
df.to_excel('output.xlsx')
```

## API Overview

### Creating DataFrames

```python
# From a dictionary
df = npd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# From a Polars DataFrame
import polars as pl
df = npd.DataFrame(pl.DataFrame({'a': [1, 2, 3]}))

# Empty DataFrame
df = npd.DataFrame()
```

### Indexing and Selection

```python
# Column selection
df['column_name']        # Returns pandas Series
df[['col1', 'col2']]     # Returns DataFrame

# Boolean filtering
df[df['age'] > 30]       # Returns DataFrame

# Label-based indexing
df.loc[df['age'] > 30, 'name']  # Returns Series
df.loc[0:5, ['name', 'age']]    # Returns DataFrame

# Position-based indexing
df.iloc[0:5, 0:2]        # Returns DataFrame
```

### Data Manipulation

```python
# Type casting (pandas-like types)
df = df.astype({'id': 'int64', 'name': 'str'})

# Rename columns
df = df.rename(columns={'old_name': 'new_name'})

# Drop rows/columns
df = df.drop(labels=[0, 1], axis=0)    # Drop rows
df = df.drop(labels=['col1'], axis=1)  # Drop columns

# Fill null values
df = df.fillna({'column': 0})

# Sort values
df = df.sort_values('age', ascending=False)
```

## I/O Reference

### CSV

```python
# Eager reading
df = npd.read_csv('file.csv',
                  sep=',',
                  usecols=['col1', 'col2'],
                  dtype={'id': 'int64'})

# Lazy reading
lf = npd.read_csv_lazy('file.csv', n_rows=1000)
df = lf.collect()
```

### Parquet

```python
# Eager reading
df = npd.read_parquet('file.parquet',
                      columns=['col1', 'col2'],
                      n_rows=1000)

# Lazy reading
lf = npd.read_parquet_lazy('file.parquet')
df = lf.collect()
```

### Excel

```python
# Eager reading
df = npd.read_excel('file.xlsx',
                    sheet_name=0,
                    usecols=['col1', 'col2'],
                    nrows=1000)

# Lazy reading
lf = npd.read_excel_lazy('file.xlsx', sheet_name='Sheet1')
df = lf.collect()
```

### JSON

```python
# Eager reading
df = npd.read_json('file.json',
                   dtype={'id': 'int64'},
                   n_rows=1000)

# Lazy reading
lf = npd.read_json_lazy('file.json', lines=True)
df = lf.collect()
```

## Lazy Evaluation

```python
# Create a lazy frame
lf = npd.read_csv_lazy('large_file.csv')

# Chain operations (optimized before execution)
result = (lf
          .query('age > 30')
          .groupby('city')
          .agg({'value': 'mean'}))

# Execute the query
df = result.collect()

# Sort after collection if needed
df = df.sort_values('value', ascending=False)
```

## Migrating from pandas

Migrating from pandas to nitro-pandas is straightforward:

```python
# Before (pandas)
import pandas as pd
df = pd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()

# After (nitro-pandas)
import nitro_pandas as npd
df = npd.read_csv('data.csv')
result = df.groupby('category')['value'].mean()
```

Most pandas operations work the same way. The main differences:

- **Single column selection** (`df['col']`) returns a pandas Series (not a nitro-pandas Series) to maintain compatibility with pandas expressions and boolean indexing.
- **Comparison operations** (`df > 2`) return pandas DataFrames for boolean indexing compatibility.
- **Unimplemented methods**: automatic fallback to pandas is available at both the DataFrame instance level and the package level:

  ```python
  # ✅ Works: fallback on a DataFrame instance
  df = npd.DataFrame({'a': [1, 2, 3]})
  result = df.describe()  # Falls back to the pandas DataFrame method

  # ✅ Works: fallback at the package level
  import pandas as pd
  df_pd = pd.DataFrame({'a': [1, 2, 1], 'b': ['x', 'y', 'x']})
  result = npd.get_dummies(df_pd)  # Falls back to the pandas module function
  result = npd.date_range('2024-01-01', periods=5)  # Falls back to pandas
  ```

  Note: methods that only exist on DataFrame instances (like `describe()`) are available only via DataFrame instances, not at the package level.
- **Mixed types in columns**: unlike pandas, Polars (and thus nitro-pandas) does not allow mixed types within a single column. Each column must have a consistent type. If your pandas DataFrame has mixed types in a column, Polars will coerce them to a common type (usually `object`/string) or raise an error.

  ```python
  # ❌ This works in pandas but NOT in Polars/nitro-pandas
  pd.DataFrame({'col': [1, 'text', 3.5]})   # Mixed int, str, float

  # ⚠️ Polars will coerce to string or raise an error
  npd.DataFrame({'col': [1, 'text', 3.5]})  # All values become strings
  ```

- **No `inplace` parameter**: Polars operations are always immutable (they return new DataFrames), so nitro-pandas does not support the `inplace=True` parameter found in pandas. All operations return new DataFrame objects.

  ```python
  # ❌ This works in pandas but NOT in nitro-pandas
  df.drop(columns=['col'], inplace=True)  # inplace not supported

  # ✅ Always assign the result
  df = df.drop(labels=['col'], axis=1)    # Returns a new DataFrame
  ```
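The automatic-fallback behavior described above comes down to Python attribute lookup. As an illustration only: the real mechanism lives in `nitro_pandas/dataframe.py` and delegates to pandas, which this dependency-free toy replaces with the standard-library `statistics` module, and the `ToyColumn` class and its `total` method are invented for the sketch.

```python
import statistics

class ToyColumn:
    def __init__(self, values):
        self._values = list(values)

    # "Native" fast-path method, analogous to nitro-pandas' Polars-backed ops
    def total(self):
        return sum(self._values)

    def __getattr__(self, name):
        # Reached only when `name` has no native implementation; delegate to
        # the slower but more complete engine (statistics here, pandas in
        # nitro-pandas).
        fallback = getattr(statistics, name, None)
        if fallback is None:
            raise AttributeError(name)
        return lambda: fallback(self._values)

col = ToyColumn([1, 2, 3, 4])
print(col.total())   # native path
print(col.median())  # fallback path
```

Because `__getattr__` fires only for missing attributes, native methods pay no dispatch cost, which matches the ~20% fallback overhead noted in the benchmarks.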
## Project Structure

```
nitro-pandas/
├── nitro_pandas/
│   ├── __init__.py        # Package initialization
│   ├── dataframe.py       # DataFrame implementation
│   ├── lazyframe.py       # LazyFrame implementation
│   └── io/
│       ├── __init__.py    # IO module exports
│       ├── csv.py         # CSV I/O
│       ├── parquet.py     # Parquet I/O
│       ├── json.py        # JSON I/O
│       └── excel.py       # Excel I/O
├── tests/
│   ├── test_dataframe.py  # DataFrame tests
│   ├── test_groupby.py    # GroupBy tests
│   ├── test_io.py         # I/O tests
│   └── helpers.py         # Test utilities
├── pyproject.toml         # Project configuration
└── README.md              # This file
```
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Development Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/nitro-pandas.git
cd nitro-pandas

# Install development dependencies
uv sync --dev

# Run tests
uv run python tests/test_runner.py
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
The MIT License is a permissive open-source license that allows anyone to:

- ✅ Use the software for any purpose (commercial or personal)
- ✅ Modify the software
- ✅ Distribute the software
- ✅ Sublicense the software

In short: everyone can use it freely!
## Acknowledgments

- Polars - For the high-performance DataFrame engine
- pandas - For the API inspiration and fallback support

## Contact

For questions, suggestions, or support, please open an issue on GitHub.

---

Made with ❤️ for the Python data science community

⭐ Star this repo if you find it useful!
