TensorFrames user guide

Timothy Hunter edited this page Mar 16, 2016 · 24 revisions

TensorFrames (TensorFlow on Spark Dataframes) lets you manipulate Spark's DataFrames with TensorFlow programs.

This package is highly experimental and is provided as a technical preview only.

Officially supported Spark versions: 1.6+

This user guide walks through some simple examples using the Python and Scala interfaces.

Core API

The core API provides primitives for expressing transformations of DataFrames as TensorFlow programs. These programs can be written in Python using the official TensorFlow API, in Scala using TensorFrames, or passed in directly as a protocol buffer description of the operations graph.

The simplest way to use TensorFrames in a Python program is to import TensorFlow and TensorFrames into PySpark:

import tensorflow as tf
import tensorframes as tfs

Additionally, this guide will make use of the following imports from PySpark:

from pyspark.sql import Row
from pyspark.sql.functions import *

Basic concepts

At its core, TensorFlow expresses operations on tensors: homogeneous data structures that consist of an array and a shape. They can be interpreted as a generalization of vectors and matrices:

  • a scalar (real or integer) is a tensor of dimension zero,
  • a vector is a tensor of one dimension,
  • a matrix is a tensor of two dimensions, and so on.
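
To make the "array plus shape" idea concrete, here is a minimal sketch in plain Python that computes the shape and number of dimensions of a value stored as nested lists. The helpers `tensor_shape` and `tensor_rank` are hypothetical names introduced only for this illustration; they are not part of TensorFlow or TensorFrames, which track shapes themselves.

```python
from typing import Any, Tuple


def tensor_shape(value: Any) -> Tuple[int, ...]:
    """Return the shape of a nested-list value (hypothetical helper).

    A bare number has the empty shape (), a flat list of n numbers has
    shape (n,), a list of m rows of n numbers has shape (m, n), etc.
    """
    if not isinstance(value, (list, tuple)):
        return ()  # scalar: zero dimensions
    if len(value) == 0:
        return (0,)
    inner = tensor_shape(value[0])
    # Every element must have the same shape, otherwise the nesting is
    # ragged and cannot be described by a single shape.
    for item in value[1:]:
        if tensor_shape(item) != inner:
            raise ValueError("ragged nesting: not a valid tensor")
    return (len(value),) + inner


def tensor_rank(value: Any) -> int:
    """Number of dimensions, i.e. the length of the shape."""
    return len(tensor_shape(value))


scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "shape:", tensor_shape(value), "dimensions:", tensor_rank(value))
```

The scalar, vector, and matrix above have zero, one, and two dimensions respectively, matching the list. The ragged-nesting check reflects the requirement that a tensor be a regular array described by a single shape.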

TensorFlow programs can be executed

Mapping

Reducing

Aggregation

Using a different version of TensorFlow
