Odo migrates data between many formats. These include
in-memory structures like lists and Pandas DataFrames, and
also data outside of Python like CSV/JSON/HDF5 files, SQL databases,
data on remote machines, and the Hadoop File System.
odo takes two arguments, a source and a target for a data transfer.
>>> from odo import odo
>>> odo(source, target)  # load source into target
It efficiently migrates data from the source to the target.
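The source and target can take on the following forms, each of which appears in the examples below:
|Form|Example|Description|
|Object|df|An instance of an in-memory structure, such as a DataFrame or list|
|String|'myfile.json', 'postgresql://hostname::tablename'|A file name or URI|
|Type|list, pd.DataFrame|A type from which odo creates a new object (target only)|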
So the following lines would be valid inputs to odo:
>>> odo(df, list)  # create new list from Pandas DataFrame
>>> odo(df, [])  # append onto existing list
>>> odo(df, 'myfile.json')  # Dump dataframe to line-delimited JSON
>>> odo('myfiles.*.csv', Iterator)  # Stream through many CSV files
>>> odo(df, 'postgresql://hostname::tablename')  # Migrate dataframe to Postgres
>>> odo('myfile.*.csv', 'postgresql://hostname::tablename')  # Load CSVs to Postgres
>>> odo('postgresql://hostname::tablename', 'myfile.json')  # Dump Postgres to JSON
>>> odo('mongodb://hostname/db::collection', pd.DataFrame)  # Dump Mongo to DataFrame
If the target in odo(source, target) already exists, it must be of a type that
supports in-place append.
>>> odo('myfile.csv', df) # this will raise TypeError because DataFrame is not appendable
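The difference between creating a new object and appending to an existing one can be sketched in plain Python. The transfer function below is a hypothetical stand-in, not odo's implementation: it treats a type as a request for a new container, treats an instance as an append target, and assumes that "supports in-place append" means the instance has an extend method.

```python
def transfer(rows, target):
    """Toy stand-in for odo(source, target); not odo's real dispatch logic."""
    rows = list(rows)
    if isinstance(target, type):
        # A type was given: build a brand-new object of that type.
        return target(rows)
    if hasattr(target, "extend"):
        # An existing, appendable object: add to it in place.
        target.extend(rows)
        return target
    # An existing object with no in-place append (assumed rule for this sketch).
    raise TypeError(
        "%s does not support in-place append" % type(target).__name__
    )


new_list = transfer([(1, "a"), (2, "b")], list)  # creates a new list
existing = [(0, "z")]
transfer([(1, "a")], existing)                   # appends onto the existing list
```

Passing an existing tuple as the target raises TypeError in this sketch, mirroring the DataFrame case above.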
Odo depends on external libraries for many of its conversions. Since most users will only use a small subset of conversions, Odo does not install most of these libraries by default.
If you try to use a supported conversion whose library is not installed, you may get the following error:
NotImplementedError: Unable to parse uri to data resource...
To install various subsystems of odo you can use extra install targets like:
pip install odo[postgres]
pip install odo[bcolz]
...
There are a lot of these, but two special extras targets are worth noting.
odo[all] will install all of the subsystems.
The other installs the versions of packages we used to run the full test suite for the
release. This can be helpful if you are seeing an issue that you suspect may be
due to an incompatible library version.
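When a conversion fails with the error shown earlier, it can help to check whether the optional library behind it is importable at all. The sketch below uses only the standard library. The module names in the loop are assumptions about which packages back a given conversion, so substitute the ones relevant to your case.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical list of optional dependencies to probe.
for name in ("sqlalchemy", "bcolz"):
    print(name, "installed" if is_installed(name) else "missing")
```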
To convert data between any pair of formats, odo.odo relies on a network of
pairwise conversions. We visualize that network below.
Each node represents a data format. Each directed edge represents a function
to transform data between two formats. A single call to odo may
traverse multiple edges and multiple intermediate formats. Red nodes
support larger-than-memory data.
A single call to odo may traverse several intermediate formats, calling on
several conversion functions. These functions are chosen because they are
fast, often far faster than converting through a central serialization format.
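The idea of chaining pairwise conversions can be illustrated with a small shortest-path search. This is a minimal sketch, not odo's implementation: the format names, the edges, and the relative costs in EDGES are invented for illustration, and the search is plain Dijkstra using the standard library.

```python
import heapq

# Hypothetical conversion graph: EDGES[src][dst] = relative cost.
# Names and costs are made up for this example.
EDGES = {
    "csv": {"dataframe": 2.0, "iterator": 1.0},
    "dataframe": {"ndarray": 1.0, "list": 2.0},
    "iterator": {"list": 1.0},
    "ndarray": {"list": 1.5},
    "list": {},
}


def cheapest_path(edges, source, target):
    """Return (cost, path) for the cheapest chain of conversions (Dijkstra)."""
    queue = [(0.0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in edges.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    raise ValueError("no conversion path from %r to %r" % (source, target))


cost, path = cheapest_path(EDGES, "csv", "list")
```

In this toy graph the cheapest route from "csv" to "list" passes through "iterator" rather than "dataframe", which shows how a single request can resolve to a chain of intermediate formats.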