Skip to content
forked from towhee-io/towhee

Open source platform for embedding tasks

License

Notifications You must be signed in to change notification settings

derekdqc/towhee

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

https://towhee.io

X2Vec, Towhee is all you need!

Slack License Language Github Actions Coverage

What is Towhee?

Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data. Built on top of PyTorch and Tensorflow (coming soon™), Towhee provides a unified framework for running machine learning pipelines locally, on a multi-GPU/TPU/FPGA machine (coming soon™), or in the cloud (coming soon™). Towhee aims to make democratize machine learning, allowing everyone - from beginner developers to AI/ML research groups to large organizations - to train and deploy machine learning models.

Key features

  • Easy embedding for everyone: Transform your data into vectors with less than five lines of code.

  • Standardized pipeline: Keep your pipeline interface consistent across projects and teams.

  • Rich operators and models: No more reinventing the wheel! Collaborate and share models with the open source community.

  • Support for fine-tuning models: Feed your dataset into our trainer and get a new model in just a few easy steps.

Getting started

Towhee can be installed as follows:

% pip install -U pip
% pip cache purge
% pip install towhee

Towhee provides pre-built computer vision models which can be used to generate embeddings:

>>> from towhee import pipeline
>>> from PIL import Image

# Use our in-built embedding pipeline
>>> img = Image.open('towhee_logo.png')
>>> embedding_pipeline = pipeline('image-embedding')
>>> embedding = embedding_pipeline(img)

Your image embedding is now stored in embedding. It's that simple.

Custom machine learning pipelines can be defined in a YAML file and uploaded to the Towhee hub (coming soon™). Pipelines which already exist in the local Towhee cache (/$HOME/.towhee/pipelines) will be automatically loaded:

# This will load the pipeline defined at $HOME/.towhee/pipelines/fzliu/resnet50_embedding.yaml
>>> embedding_pipeline = pipeline('fzliu/resnet50_embedding')
>>> embedding = embedding_pipeline(img)

Dive deeper

Towhee architecture

  • Pipeline: A Pipeline is a single machine learning task that is composed of several operators. Operators are connected together internally via a directed acyclic graph.

  • Operator: An Operator is a single node within a pipeline. It contains files (e.g. code, configs, models, etc...) and works for reusable operations (e.g., preprocessing an image, inference with a pretrained model).

  • Engine: The Engine sits at Towhee's core, and drives communication between individual operators, acquires and schedules tasks, and maintains CPU/GPU/FPGA/etc executors.

Design concepts

  • Flexible: A Towhee pipeline can be created to implement any machine learning task you can think of.

  • Extensible: Individual operators within each pipeline can be reconfigured and reused in different pipelines. A pipeline can be deployed anywhere you want - on your local machine, on a server with 4 GPUs, or in the cloud (coming soon™)

  • Convenient: Operators can be defined as a single function; new pipelines can be constructed by looking at input and output annotations for those functions. Towhee provides a high-level interface for creating new graphs by stringing together functions in Python code.

About

Open source platform for embedding tasks

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%