fileslicer is a lightweight Python library for efficiently reading and splitting large files using memory mapping. It allows you to iterate over lines within a file slice and split files into chunks without loading the entire file into memory, making it ideal for processing very large files.
- Memory-efficient line iteration using `mmap`.
- Split large files into chunks while respecting newline boundaries.
- Simple and Pythonic API.
- Works with files of arbitrary size.
Install via pip:

```shell
pip install fileslicer
```
```python
from fileslicer import FileSlice

# Create a FileSlice covering an entire file
file_slice = FileSlice.from_file("large_file.txt")

# Iterate over lines in the slice
for line in file_slice.iter_lines():
    print(line.decode().strip())
```
```python
from fileslicer import FileSlice

# Split a file into 4 chunks aligned to newline boundaries
chunks = FileSlice.split_file("large_file.txt", splits=4)

for chunk in chunks:
    print(f"Processing bytes {chunk.start_offset}-{chunk.end_offset}")
    for line in chunk.iter_lines():
        print(line.decode().strip())
```
```python
from fileslicer import FileSlice

# Only read bytes 1000 to 5000
file_slice = FileSlice("large_file.txt", 1000, 5000)

for line in file_slice.iter_lines():
    print(line.decode().strip())
```
- `FileSlice(file_path: str, start_offset: int, end_offset: int)`: Represents a slice of a file.
- `iter_lines() -> Generator[bytes]`: Iterate over lines in the file slice as `bytes`.
- `@staticmethod from_file(file_path: str) -> FileSlice`: Create a `FileSlice` covering the entire file.
- `@staticmethod split_file(file_path: str, splits: int) -> list[FileSlice]`: Split a file into multiple slices, aligned to newline boundaries (see the sketch after this list).
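To make the newline alignment concrete, here is an illustrative sketch of the idea, not fileslicer's actual implementation: pick approximate byte boundaries, then advance each boundary to the next newline so no line is cut in half. The helper name `newline_aligned_offsets` is hypothetical.

```python
import mmap
import os

def newline_aligned_offsets(path: str, splits: int) -> list[tuple[int, int]]:
    """Illustrative only: compute (start, end) byte ranges that never split a line."""
    size = os.path.getsize(path)  # assumes a non-empty file and splits >= 1
    approx = size // splits
    offsets: list[tuple[int, int]] = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        for i in range(1, splits):
            if start >= size:
                break
            # Jump to the approximate boundary, then advance to the next newline
            boundary = max(i * approx, start)
            nl = mm.find(b"\n", boundary)
            end = size if nl == -1 else nl + 1
            offsets.append((start, end))
            start = end
        if start < size:
            offsets.append((start, size))  # last chunk runs to end of file
    return offsets
```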
Processing extremely large files with standard file reading can be slow and memory-intensive. fileslicer uses memory mapping to efficiently slice and iterate over file data without reading everything into memory. Inspired by the "1 Billion Row Challenge" in Python, it is perfect for data processing pipelines, log analysis, and ETL tasks.
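Because `split_file` returns independent slices, each chunk can be handed to a separate worker process. Below is a minimal sketch using the standard library's `multiprocessing`, assuming `FileSlice` instances are picklable (plausible, since the constructor above takes only a path and two offsets); `count_lines` is an illustrative helper, not part of the library.

```python
from multiprocessing import Pool

from fileslicer import FileSlice

def count_lines(chunk: FileSlice) -> int:
    """Count the lines in one slice; runs in a worker process."""
    return sum(1 for _ in chunk.iter_lines())

if __name__ == "__main__":
    # One chunk per worker; each worker maps only its own byte range
    chunks = FileSlice.split_file("large_file.txt", splits=4)
    with Pool(len(chunks)) as pool:
        totals = pool.map(count_lines, chunks)
    print(f"Total lines: {sum(totals)}")
```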
`fileslicer` is distributed under the terms of the MIT license.