# FireDucks: A Modern Alternative to Pandas

## Introduction
FireDucks is a DataFrame library designed as a drop-in replacement for
pandas, so it can run existing pandas code without modification.
This tutorial walks through installing FireDucks, the two ways to use it,
and how its execution model differs from that of pandas.

## System Requirements
### Prerequisites
- Python (version >3.8, <=3.12)
- Linux environment (x86_64 architecture), either native or on Windows through WSL

> **Note:** FireDucks is currently available only for Linux (manylinux) on the x86_64 architecture, so Windows users need to run it inside WSL.

## FireDucks Advantages

#### ✨ Key Benefits:

1. **Faster Execution**
   - Speeds up data processing through compiler optimization and multithreading
   - Lazy execution model that optimizes a whole chain of operations before running it

2. **Compatibility with Existing Pandas Code**
   - Designed to work with the pandas API
   - No new syntax to learn

3. **Zero Code Change**
   - Direct replacement for pandas
   - No refactoring needed

4. **Easy to Use**
   - Simple pip installation
   - Immediate integration

## Installation
Install FireDucks from PyPI with pip:

    pip install fireducks
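
To confirm that the installation succeeded, you can query the installed package metadata. The sketch below uses only the standard library; the helper name `installed_version` is made up for this example and is not part of FireDucks.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("fireducks")
    if version is None:
        print("fireducks is not installed in this environment")
    else:
        print(f"fireducks {version} is installed")
```

Run it with the same interpreter you used for `pip install`, because each virtual environment has its own set of installed packages.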

## Two Ways to Use FireDucks

### 1. Import Hook (For Existing Projects)
Best suited to existing pandas projects, because no code changes are needed. With the import hook enabled, every `import pandas` in your program, including those inside the modules it imports, resolves to the FireDucks implementation.

#### For Terminal/Python Scripts:

    python3 -m fireducks.pandas your_script.py

#### For Jupyter Notebook:
    %load_ext fireducks.pandas
    import pandas as pd    

> **Note:** Use the import hook for existing programs, since they may include many Python scripts that import pandas internally.

### 2. Explicit Import (For New Projects)
For new projects, you can directly import FireDucks instead of pandas. This is the most straightforward approach when starting fresh.

    import fireducks.pandas as pd
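
If the same code also has to run on machines where FireDucks is unavailable (for example outside Linux x86_64), one option is to fall back to pandas at import time. The following is a minimal sketch of that pattern; `import_first_available` is a hypothetical helper written for this tutorial, not a FireDucks function.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(f"none of these modules could be imported: {list(module_names)}")


if __name__ == "__main__":
    try:
        # Prefer FireDucks, fall back to plain pandas.
        pd = import_first_available(["fireducks.pandas", "pandas"])
        print(f"using {pd.__name__}")
    except ImportError as exc:
        print(exc)
```

In a small script, a plain `try: import fireducks.pandas as pd` / `except ImportError: import pandas as pd` does the same job. The helper form is mainly convenient when the list of candidates is configurable.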

## Execution Model: Lazy vs Eager
### Pandas (Eager Execution)
Pandas executes operations immediately when they are called:
```python
import pandas as pd

df = pd.read_csv("data.csv")      # Reads the file immediately
df = df.sort_values("a")          # Sorts immediately
df.to_csv("sorted.csv")           # Writes immediately
```

Timing each step separately shows where pandas spends its time:

```python
# Eager execution (pandas)
import time

import pandas as pd

# Time read_csv: the file is downloaded and parsed here
t0 = time.perf_counter()
df = pd.read_csv("https://raw.githubusercontent.com/roualdes/data/master/carnivora.csv")
t1 = time.perf_counter()
print(f"Read CSV took: {t1-t0:.3f} seconds")

# Time sort_values: the sort runs here
t0 = time.perf_counter()
df = df.sort_values(by="Order")
t1 = time.perf_counter()
print(f"Sort took: {t1-t0:.3f} seconds")

# Time to_csv: the file is written here
t0 = time.perf_counter()
df.to_csv("sorted_carnivora.csv")
t1 = time.perf_counter()
print(f"Write CSV took: {t1-t0:.3f} seconds")
```

### FireDucks (Lazy Execution)
FireDucks delays operations until results are actually needed:
```python
import fireducks.pandas as pd

df = pd.read_csv("data.csv")      # Only plans the read (builds the intermediate representation)
df = df.sort_values("a")          # Only plans the sort
df.to_csv("sorted.csv")           # Now everything executes at once
```

Timing the same steps with FireDucks:

```python
# Lazy execution (FireDucks)
import time

import fireducks.pandas as pd

# Time read_csv (planning only)
t0 = time.perf_counter()
df = pd.read_csv("https://raw.githubusercontent.com/roualdes/data/master/carnivora.csv")
t1 = time.perf_counter()
print(f"Plan read CSV took: {t1-t0:.3f} seconds")

# Time sort_values (planning only)
t0 = time.perf_counter()
df = df.sort_values(by="Order")
t1 = time.perf_counter()
print(f"Plan sort took: {t1-t0:.3f} seconds")

# Time to_csv (this triggers execution of all planned operations)
t0 = time.perf_counter()
df.to_csv("sorted_carnivora.csv")
t1 = time.perf_counter()
print(f"Execute all operations took: {t1-t0:.3f} seconds")
```

Output from one run:

    Plan read CSV took: 0.152 seconds
    Plan sort took: 0.002 seconds
    Execute all operations took: 0.161 seconds
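
The idea of "plan first, execute later" can be illustrated without any DataFrame library. The toy class below records operations and runs them only when a result is requested. It is a teaching sketch under simplifying assumptions: the class `LazyList` and its methods are invented for this example and do not reflect how FireDucks is implemented.

```python
class LazyList:
    """Toy deferred-execution wrapper around a list (not FireDucks' real design)."""

    def __init__(self, data, plan=()):
        self._data = data
        self._plan = tuple(plan)  # recorded (name, function) steps

    def keep_positive(self):
        # Record the step instead of running it.
        step = ("keep_positive", lambda xs: [x for x in xs if x > 0])
        return LazyList(self._data, self._plan + (step,))

    def sort(self):
        # Record the step instead of running it.
        return LazyList(self._data, self._plan + (("sort", sorted),))

    def planned_steps(self):
        return [name for name, _ in self._plan]

    def collect(self):
        # The only place where work actually happens.
        result = list(self._data)
        for _, func in self._plan:
            result = func(result)
        return result


pipeline = LazyList([3, -1, 2, -5, 0]).keep_positive().sort()
print(pipeline.planned_steps())  # the plan exists, but nothing has been computed yet
print(pipeline.collect())        # runs the whole plan
```

Because the full plan is available before anything runs, a lazy system has the chance to rewrite it (drop unused columns, reorder steps) before execution. An eager system never sees more than one step at a time.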


## How FireDucks Achieves Better Performance

FireDucks uses two main mechanisms for acceleration:

### 1. Compiler Optimization
FireDucks uses a compiler that optimizes your DataFrame operations before running them:
```python
# Original code (what you write)
selected = df[df["a"] > 10]["b"]

# Optimized code (conceptually what FireDucks runs):
# only columns "a" and "b" are needed, so the other columns are dropped first
tmp = df[["a", "b"]]
selected = tmp[tmp["a"] > 10]["b"]
```

The compiler:

#### 1. Converts Python Code to an Intermediate Representation (IR)
FireDucks converts your Python DataFrame operations into a specialized intermediate representation that the compiler can analyze and optimize.

#### 2. Automatic DataFrame Optimization
The compiler intelligently analyzes your DataFrame operations and:
- Optimizes column selections
- Minimizes memory usage
- Reduces redundant operations
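
To see why optimizing column selection saves work, consider a toy column-oriented table stored as a dict of lists. The sketch below compares filtering every column against first keeping only the columns that the final result needs. The function `select_rows` and the cell counter are illustrative assumptions, not FireDucks internals.

```python
def select_rows(table, mask):
    """Filter every column of a column-oriented table.

    Returns the filtered table and the number of cells that had to be scanned.
    """
    filtered = {
        name: [value for value, keep in zip(column, mask) if keep]
        for name, column in table.items()
    }
    cells_scanned = sum(len(column) for column in table.values())
    return filtered, cells_scanned


table = {
    "a": [5, 12, 30, 7, 11],
    "b": [1, 2, 3, 4, 5],
    "c": [0, 0, 0, 0, 0],
    "d": ["v", "w", "x", "y", "z"],
}
mask = [value > 10 for value in table["a"]]

# Naive plan: filter all four columns, then pick "b".
naive_rows, naive_cells = select_rows(table, mask)
naive_result = naive_rows["b"]

# Optimized plan: keep only the needed columns, then filter.
projected = {name: table[name] for name in ("a", "b")}
pruned_rows, pruned_cells = select_rows(projected, mask)
pruned_result = pruned_rows["b"]

print(naive_result == pruned_result)  # same answer either way
print(naive_cells, pruned_cells)      # the optimized plan touches fewer cells
```

The saving grows with the number of unused columns, which is why this rewrite matters most on wide tables.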

#### 3. Expert-Level Optimizations
Automatically applies optimizations that typically require deep DataFrame knowledge:
- Reorders operations for efficiency
- Uses column-oriented processing
- Optimizes data access patterns
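
Reordering operations can be illustrated with a filter and a sort: filtering first means fewer rows reach the more expensive sort, while the result stays the same. This is a generic sketch of the idea in plain Python; the function names are hypothetical, and the source does not list which specific rewrites FireDucks performs.

```python
def sort_then_filter(rows, key, predicate):
    """Sort all rows, then drop the ones that fail the predicate."""
    ordered = sorted(rows, key=key)
    return [row for row in ordered if predicate(row)], len(rows)


def filter_then_sort(rows, key, predicate):
    """Drop rows that fail the predicate first, then sort only the survivors."""
    kept = [row for row in rows if predicate(row)]
    return sorted(kept, key=key), len(kept)


rows = [("d", 4), ("a", 9), ("c", 1), ("b", 7), ("e", 9)]


def by_score(row):
    return row[1]


def high_score(row):
    return row[1] >= 7


slow_result, slow_sorted = sort_then_filter(rows, by_score, high_score)
fast_result, fast_sorted = filter_then_sort(rows, by_score, high_score)

print(slow_result == fast_result)  # identical output
print(slow_sorted, fast_sorted)    # rows passed to the sort in each plan
```

The two plans agree because Python's `sorted` is stable and filtering preserves the relative order of the rows it keeps. An optimizer may only reorder steps when this kind of equivalence holds.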

#### 4. Consistent Results
While applying these optimizations, FireDucks is designed to preserve:
- The same output as regular pandas
- Data accuracy
- Data integrity
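
When migrating a script, it can be reassuring to check this yourself by running it once with pandas and once with FireDucks and comparing the CSV files they write. The helper below is a minimal sketch of such a comparison using only the standard library. `csv_texts_match` is a hypothetical name, and the tolerance-based numeric comparison is an assumption about what counts as "the same".

```python
import csv
import io
import math


def _cells_match(left, right, rel_tol):
    """Compare two CSV cells numerically when possible, else as strings."""
    try:
        return math.isclose(float(left), float(right), rel_tol=rel_tol)
    except ValueError:
        return left == right


def csv_texts_match(text_a, text_b, rel_tol=1e-9):
    """Return True if two CSV documents hold the same rows and cells."""
    rows_a = list(csv.reader(io.StringIO(text_a)))
    rows_b = list(csv.reader(io.StringIO(text_b)))
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        if not all(_cells_match(x, y, rel_tol) for x, y in zip(row_a, row_b)):
            return False
    return True


print(csv_texts_match("a,b\n1,2.0\n", "a,b\n1.0,2\n"))  # same values, different formatting
print(csv_texts_match("a,b\n1,2\n", "a,b\n1,3\n"))      # one cell differs
```

To compare files on disk, read each one with `open(path, newline="").read()` and pass the two strings to the function.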


### 2. Multithreading

FireDucks executes the optimized plan on a multithreaded backend with the following characteristics:

#### High-Performance Data Structure
* Uses Apache Arrow for efficient data handling
* Optimized for modern CPU architectures
* Column-oriented storage for faster operations
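
The difference between row-oriented and column-oriented storage can be sketched with the standard library's `array` module. This is only an illustration of the layout idea: the real Apache Arrow format is more elaborate (it also tracks missing values, for instance), and none of the names below come from Arrow or FireDucks.

```python
from array import array

# Row-oriented layout: one record per row, values of different columns interleaved.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Column-oriented layout: one contiguous, typed array per column.
columns = {
    "id": array("q", (row["id"] for row in rows)),        # 64-bit signed integers
    "price": array("d", (row["price"] for row in rows)),  # 64-bit floats
}

# Aggregating one column in the row layout visits every record.
row_total = sum(row["price"] for row in rows)

# In the column layout the same aggregate reads a single contiguous buffer.
column_total = sum(columns["price"])

print(row_total == column_total)
print(columns["price"].itemsize)  # bytes per value in the typed column
```

Operations such as sums, filters, and sorts usually touch a few columns across many rows, so keeping each column contiguous suits CPU caches and vector instructions better than interleaved records.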

#### Parallel Processing
* Leverages multiple CPU cores simultaneously
* Automatically splits work across threads
* Processes large datasets more efficiently
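
Splitting work across threads follows a simple pattern: cut the data into chunks, process each chunk on a worker, then combine the partial results. The sketch below shows that pattern with `concurrent.futures`. It illustrates the structure only: in standard CPython builds, pure-Python CPU-bound work is serialized by the global interpreter lock and does not speed up this way. How FireDucks' backend schedules its threads is not described in this tutorial, so treat the helper names and the chunking strategy as assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Sum a list by summing chunks on worker threads and combining the results."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(parallel_sum(data))
```

The pattern works for a sum because partial sums can be combined in any order. Operations such as sorting need a more involved merge step.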

#### Hardware Flexibility 
* Configurable for different hardware setups
* Optimizes for available CPU cores
* Future support for GPU acceleration

#### Automatic Optimization
* Parallelizes operations without user intervention
* Finds the most efficient execution path
* Eliminates redundant operations

> **Key Advantage:** Think of FireDucks as a chess grandmaster: pandas makes each move immediately, while FireDucks plans the whole game in advance. Seeing the full sequence of operations lets it choose the most efficient way to execute them, much as a chess player thinks several moves ahead to find the best strategy.