Lift special-purpose data into common tabular formats for analytics 💪

childmindresearch/elbow

💪 Elbow

Build codecov Code style: black MIT License

Elbow is a lightweight and scalable library for getting diverse data out of specialized formats and into common tabular data formats for downstream analytics.

Example

Extract image metadata and pixel values from all JPEG image files under the current directory and save as a Parquet dataset.

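import numpy as np
import pandas as pd
from PIL import Image

from elbow.builders import build_parquet

def extract_image(path: str):
    # Open one image file and read its dimensions.
    img = Image.open(path)
    width, height = img.size
    # Decode the pixel data into a NumPy array.
    pixel_values = np.asarray(img)
    # Each returned dict becomes one row of the output dataset.
    return {
        "path": path,
        "width": width,
        "height": height,
        "pixel_values": pixel_values,
    }

# Apply extract_image to every file matching the glob pattern,
# using 8 workers, and write the results to a Parquet dataset.
build_parquet(
    source="**/*.jpg",
    extract=extract_image,
    output="images.pqds/",
    workers=8,
)

# Load the resulting dataset into a DataFrame for analysis.
df = pd.read_parquet("images.pqds")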

For a complete example, see here.
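
The example above follows a simple pattern: a glob pattern selects the source files, and an extract function turns each file into one flat record. The sketch below illustrates that pattern using only the Python standard library. It is not Elbow's implementation and does not call Elbow; `extract_file_info`, `collect_records`, and `demo` are hypothetical names introduced here for illustration, and the records are collected in a plain list rather than written to Parquet.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


def extract_file_info(path: str) -> Dict[str, Any]:
    """Hypothetical extract function: one file in, one flat record out."""
    p = Path(path)
    return {"path": p.name, "suffix": p.suffix, "size_bytes": p.stat().st_size}


def collect_records(
    root: str,
    pattern: str,
    extract: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Apply `extract` to every file under `root` that matches `pattern`.

    Paths are sorted so the order of the records is deterministic.
    """
    return [extract(str(p)) for p in sorted(Path(root).glob(pattern))]


def demo() -> List[Dict[str, Any]]:
    """Build a small temporary directory tree and extract records from it."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sub").mkdir()
        (Path(tmp) / "a.jpg").write_bytes(b"abc")
        (Path(tmp) / "sub" / "b.jpg").write_bytes(b"hello")
        (Path(tmp) / "notes.txt").write_text("skip me")
        # "**/*.jpg" matches .jpg files in the root and in subdirectories.
        return collect_records(tmp, "**/*.jpg", extract_file_info)


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Keeping the extract function free of side effects, so that it only reads one file and returns one dict, is what makes it straightforward to apply to many files independently.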

Installation

pip install elbow

The current development version can be installed with

pip install git+https://github.com/childmindresearch/elbow.git

Related projects

There are many other high-quality projects for extracting, loading, and transforming data. Some alternative projects focused on somewhat different use cases are:

Contributing

We welcome contributions of any kind! If you'd like to contribute, please feel free to start a conversation in our issues.