# Demonstration of intake-omnisci

This is an [intake](https://intake.readthedocs.io/en/latest/) plugin for OmniSci databases.
It allows the user to specify data sources via human-readable YAML catalogs,
and then transparently load them and begin analyzing data.

In [3]:
import intake
catalog = intake.open_catalog('catalog.yml')
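
For reference, a catalog file of this kind might look like the sketch below. It is reconstructed from the `flights` entry displayed in this notebook, assuming intake's usual layout of entries under a top-level `sources` key. The file name `example_catalog.yml` is hypothetical, and the real `catalog.yml` also defines `faults` and `metis`, whose definitions are not shown here.

```python
from pathlib import Path

# Assumed layout: intake catalogs list their entries under a top-level `sources` key.
# The driver name and args mirror the `flights` entry shown in this notebook.
CATALOG_TEXT = """\
sources:
  flights:
    driver: omnisci
    args:
      uri: mapd://mapd:HyperInteractive@metis.mapd.com:443/mapd?protocol=https
      sql_expr: SELECT * from flights_donotmodify LIMIT 10
"""

# Hypothetical file name, chosen so that the real catalog.yml is not overwritten.
path = Path("example_catalog.yml")
path.write_text(CATALOG_TEXT)
```

A file written this way could then be passed to `intake.open_catalog` in place of `'catalog.yml'`.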

### Inspecting the catalog

We can interactively inspect the items in the catalog:

In [4]:
list(catalog)

['flights', 'faults', 'metis']

We can also display the individual catalog items to get more information:

In [5]:
catalog.flights

name: flights
container: dataframe
plugin: ['omnisci']
description: 
direct_access: forbid
user_parameters: []
metadata: 
args: 
  uri: mapd://mapd:HyperInteractive@metis.mapd.com:443/mapd?protocol=https
  sql_expr: SELECT * from flights_donotmodify LIMIT 10
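
The `uri` argument packs the connection details into a single string. As a rough illustration, the standard library can split it into its parts. The field names in the dictionary below (`user`, `database`, and so on) are labels chosen for this sketch, read off the usual URL layout; how the plugin itself interprets the URI is not shown here.

```python
from urllib.parse import parse_qs, urlsplit

uri = "mapd://mapd:HyperInteractive@metis.mapd.com:443/mapd?protocol=https"
parts = urlsplit(uri)

# Map the generic URL components onto connection-style names (an assumption
# made for illustration, following the common user:password@host:port/db form).
connection = {
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
    "protocol": parse_qs(parts.query)["protocol"][0],
}
print(connection)
```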

With the catalog loaded, we can read full datasets into memory.

In [6]:
fault_df = catalog.faults.read()

In [7]:
fault_df.head()

|   | FNODE | TNODE | LPOLY | RPOLY | FAULT_LENGTH | FAULTL | FAULTL_ID | CODE | FAULT_TYPE | omnisci_geo |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 5713.214781 | 442.0 | 525.0 | 12 | Fault | LINESTRING (-118.686621933564 35.2590779891513... |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 20781.796087 | 58.0 | 141.0 | 16 | Low Angle Detachment Fault | LINESTRING (-118.494910960316 49.002895983403,... |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 20683.814036 | 314.0 | 397.0 | 12 | Fault | LINESTRING (-120.627722936043 42.6887369913462... |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 93338.233235 | 474.0 | 557.0 | 12 | Fault | LINESTRING (-121.10708892397 39.9928879784387,... |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 12083.065249 | 443.0 | 526.0 | 12 | Fault | LINESTRING (-118.742213984366 35.2338819882059... |


### Catalog source

This package also includes an intake source that itself provides a catalog.
This is used to generate a data source for each table in a database:

In [8]:
tables = catalog.metis
list(tables)

['flights_donotmodify',
 'contributions_donotmodify',
 'tweets_nov_feb',
 'zipcodes_orig',
 'zipcodes',
 'demo_vote_clean',
 'us_faults',
 'zipcodes_2017',
 'us_county_level_tiger_edges_2018',
 'ca_roads_tiger',
 'input_node',
 'uk_wells']

### Lazy evaluation of expressions

Loading a table into memory is fine for smaller datasets, but it does not scale to larger ones.
We would like to build queries from an intake source and have them execute lazily.

To accomplish this, we provide a way to get an ibis expression from a source:

In [9]:
ibis_expr = tables.ca_roads_tiger.to_ibis()
ibis_expr.head().execute()

|   | STATEFP | COUNTYFP | TLID | TFIDL | TFIDR | MTFCC | FULLNAME | SMID | LFROMADD | LTOADD | ... | TTYP | DECKEDROAD | ARTPATH | PERSIST | GCSEFLG | OFFSETL | OFFSETR | TNIDF | TNIDT | omnisci_geo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 29 | 109445989 | 262692478 | 262692478 | S1400 | Gable Ct | 4301.0 | 2201.0 | 2499.0 | ... |  | N | N |  | N | N | N | 7130864 | 7130860 | LINESTRING (-118.20082895374 34.8529319906854,... |
| 1 | 6 | 29 | 109480778 | 219966282 | 219966280 | H3010 |  | 4302.0 |  |  | ... |  |  | N | I | N | N | N | 7153085 | 7117434 | LINESTRING (-117.870766873411 35.2383489977747... |
| 2 | 6 | 83 | 108691363 | 219314797 | 259286589 | S1200 | State Rte 166 |  |  |  | ... |  | N | N |  | N | N | N | 5045328 | 5081796 | LINESTRING (-120.122753838134 35.096795919862,... |
| 3 | 6 | 31 | 197977284 | 262355768 | 250759875 | S1400 | King Ave | 1217.0 |  |  | ... |  | N | N |  | N | N | N | 907879 | 909947 | LINESTRING (-119.558610934559 36.0871509304676... |
| 4 | 6 | 85 | 614046794 | 261694532 | 261694532 | S1750 |  | 7095.0 |  |  | ... |  |  | N |  | N | N | N | 407470980 | 407470981 | LINESTRING (-121.580519937715 37.0705499253564... |
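
To make the idea of lazy evaluation concrete, here is a minimal sketch that uses the standard library's `sqlite3` as a stand-in for the database. It is not the ibis API or this plugin's implementation: the `LazyQuery` class, its `compile` method, and the `roads` table with its made-up rows are all hypothetical, introduced only to show that building the expression runs no SQL and that the row limit is applied by the database when `execute()` is called.

```python
import sqlite3


class LazyQuery:
    """Hypothetical helper: describes a query and runs SQL only on execute()."""

    def __init__(self, conn, table, limit=None):
        self.conn = conn
        self.table = table
        self.limit = limit

    def head(self, n=5):
        # Return a new query description; nothing is sent to the database yet.
        return LazyQuery(self.conn, self.table, limit=n)

    def compile(self):
        # Turn the recorded steps into a SQL string.
        sql = f'SELECT * FROM "{self.table}"'
        if self.limit is not None:
            sql += f" LIMIT {int(self.limit)}"
        return sql

    def execute(self):
        # Only here does the database do any work.
        return self.conn.execute(self.compile()).fetchall()


# Stand-in data: an in-memory table with 1000 made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (tlid INTEGER, fullname TEXT)")
conn.executemany(
    "INSERT INTO roads VALUES (?, ?)",
    [(i, f"Road {i}") for i in range(1000)],
)

expr = LazyQuery(conn, "roads").head()  # builds the query, runs nothing
print(expr.compile())
rows = expr.execute()  # the LIMIT is applied by the database
print(len(rows))
```

The point of the design is that the client receives only the rows the final expression asks for, rather than the full table.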
