# Getting started with GraphLab Create

## Overview
In this tutorial, you'll work through some of the fundamental tasks that GraphLab Create is built for.

You will learn how to:

* load data into SFrames
* create a Graph data structure from these frames
* write simple graph queries
* apply a machine learning model from the Graph Analytics Toolkit

We also have many other toolkits to explore, including recommender systems, data matching, graph analytics and more. Explore these and the rest of GraphLab Create in our User Guide.

...oh yeah, you'll also learn that some of us at Dato have a thing for Bond...yes...James Bond...

In [2]:
import graphlab as gl
#gl.canvas.set_target('ipynb')  # use IPython Notebook output for GraphLab Canvas; this appears to work only with a local browser

In [6]:
#gl.canvas.set_target('headless',port=9021)

Error: Requested port is unavailable: 9021


In [7]:
vertices = gl.SFrame.read_csv('/san-data/personal/kesj/public_data/dato/bond_vertices.csv')

[INFO] This trial license of GraphLab Create is assigned to aj.rader.kesj@statefarm.com and will expire on November 13, 2015. Please contact trial@dato.com for licensing options or to request a free non-commercial license for personal or academic use.

[INFO] Start server at: ipc:///tmp/graphlab_server-73855 - Server binary: /home/kesj/envs/dato-env/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1446743278.log
[INFO] GraphLab Server Version: 1.6.1


PROGRESS: Finished parsing file /san-data/personal/kesj/public_data/dato/bond_vertices.csv
PROGRESS: Parsing completed. Parsed 10 lines in 0.042332 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file /san-data/personal/kesj/public_data/dato/bond_vertices.csv
PROGRESS: Parsing completed. Parsed 10 lines in 0.040732 secs.
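The log above says that column types were inferred from the first line of the file and that they can be overridden through the `column_type_hints` argument of `read_csv`. The idea can be sketched in plain Python as follows. This is only an illustration, not GraphLab's actual parser: the helper names `infer_type` and `infer_column_types` are hypothetical, and the CSV header is an assumption based on the `vid_field='name'` argument and the vertex columns shown later in this notebook.

```python
import csv

def infer_type(value):
    """Return int, float, or str depending on what the text parses as."""
    for caster in (int, float):
        try:
            caster(value)
            return caster
        except ValueError:
            continue
    return str

def infer_column_types(csv_text):
    """Guess one type per column from the first data row only."""
    reader = csv.reader(csv_text.splitlines())
    header = next(reader)
    first_row = next(reader)
    return header, [infer_type(value) for value in first_row]

# Assumed sample: header and rows modeled on the vertex table in this notebook.
sample = (
    "name,gender,license_to_kill,villian\n"
    "James Bond,M,1,0\n"
    "Elliot Carver,M,0,1\n"
)

header, types = infer_column_types(sample)
print([t.__name__ for t in types])  # ['str', 'str', 'int', 'int']
```

Guessing from a single row is fast but fragile: if a later row held a non-integer value in an integer column, parsing would fail, which is why the log suggests correcting the inferred list and passing it back in.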


In [8]:
edges = gl.SFrame.read_csv('/san-data/personal/kesj/public_data/dato/bond_edges.csv')

PROGRESS: Finished parsing file /san-data/personal/kesj/public_data/dato/bond_edges.csv
PROGRESS: Parsing completed. Parsed 20 lines in 0.040373 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file /san-data/personal/kesj/public_data/dato/bond_edges.csv
PROGRESS: Parsing completed. Parsed 20 lines in 0.040785 secs.


## View the graph
Note: edges.show() does not work on EAAC yet.

In [13]:
edges.head()

src,dst,relation
Wai Lin,James Bond,friend
M,James Bond,worksfor
Inga Bergstorm,James Bond,friend
Elliot Carver,James Bond,killed_by
Gotz Otto,James Bond,killed_by
James Bond,M,managed_by
Q,M,managed_by
Moneypenny,M,managed_by
Q,Moneypenny,colleague
M,Moneypenny,worksfor


## Create a Graph Object

In [9]:
#vertices.show()
g = gl.SGraph() 

## Add vertices and edges to this graph

In [10]:
g = g.add_vertices(vertices=vertices, vid_field='name')

In [12]:
g = g.add_edges(edges=edges, src_field='src', dst_field='dst')
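To make the two calls above more concrete, here is a minimal plain-Python sketch of what building a property graph from a vertex table and an edge table amounts to. `TinyGraph` is a hypothetical stand-in written for this illustration, not the `SGraph` implementation, and the rows are a small subset copied from the tables in this notebook. Unlike the notebook code, which reassigns `g` to the value returned by each call, this sketch mutates the object in place and returns it so the same assignment style still works.

```python
from collections import defaultdict

# Subset of the vertex and edge rows shown in this notebook.
vertex_rows = [
    {"name": "James Bond", "gender": "M", "license_to_kill": 1, "villian": 0},
    {"name": "M", "gender": "M", "license_to_kill": 1, "villian": 0},
    {"name": "Moneypenny", "gender": "F", "license_to_kill": 1, "villian": 0},
]
edge_rows = [
    {"src": "M", "dst": "James Bond", "relation": "worksfor"},
    {"src": "James Bond", "dst": "M", "relation": "managed_by"},
    {"src": "Moneypenny", "dst": "M", "relation": "managed_by"},
    {"src": "M", "dst": "Moneypenny", "relation": "worksfor"},
]

class TinyGraph(object):
    """Hypothetical toy graph: vertex attributes plus outgoing edge lists."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> attribute dict
        self.out_edges = defaultdict(list)   # source id -> [(dst id, attrs)]

    def add_vertices(self, rows, vid_field):
        for row in rows:
            attrs = dict(row)
            vid = attrs.pop(vid_field)       # the id column becomes the key
            self.vertices[vid] = attrs
        return self

    def add_edges(self, rows, src_field, dst_field):
        for row in rows:
            attrs = dict(row)
            src = attrs.pop(src_field)
            dst = attrs.pop(dst_field)
            # Endpoints missing from the vertex table get empty attributes.
            self.vertices.setdefault(src, {})
            self.vertices.setdefault(dst, {})
            self.out_edges[src].append((dst, attrs))
        return self

g = TinyGraph()
g = g.add_vertices(vertex_rows, vid_field="name")
g = g.add_edges(edge_rows, src_field="src", dst_field="dst")
print(sorted(g.vertices))
```

The `vid_field`, `src_field`, and `dst_field` arguments play the same role as in the notebook: they name which column identifies a vertex and which columns hold the two endpoints of an edge, while every other column is kept as an attribute.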

## Do some basic graph querying

In [14]:
# show all the vertices
g.get_vertices()

__id,gender,license_to_kill,villian
Inga Bergstorm,F,0,0
Moneypenny,F,1,0
Henry Gupta,M,0,1
Paris Carver,F,0,1
Wai Lin,F,1,0
Q,M,1,0
M,M,1,0
James Bond,M,1,0
Elliot Carver,M,0,1
Gotz Otto,M,0,1


In [15]:
# show all the edges
g.get_edges()

__src_id,__dst_id,relation
Inga Bergstorm,James Bond,friend
Moneypenny,M,managed_by
Moneypenny,Q,colleague
Henry Gupta,Elliot Carver,killed_by
Q,Moneypenny,colleague
M,Moneypenny,worksfor
James Bond,Inga Bergstorm,friend
Wai Lin,James Bond,friend
M,James Bond,worksfor
James Bond,M,managed_by


In [17]:
# get all 'friend' edges
g.get_edges(fields={'relation':'friend'})

__src_id,__dst_id,relation
Inga Bergstorm,James Bond,friend
James Bond,Inga Bergstorm,friend
Wai Lin,James Bond,friend
James Bond,Wai Lin,friend
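The `fields={'relation':'friend'}` query above behaves like an equality filter over edge attributes. A rough plain-Python equivalent is sketched below. The function `filter_edges` is a hypothetical helper for illustration only, and the edge list is a subset of the edges printed in this notebook, not the full 20-row file.

```python
# Subset of the edges shown in the outputs above.
edge_rows = [
    {"src": "Wai Lin", "dst": "James Bond", "relation": "friend"},
    {"src": "James Bond", "dst": "Wai Lin", "relation": "friend"},
    {"src": "Inga Bergstorm", "dst": "James Bond", "relation": "friend"},
    {"src": "James Bond", "dst": "Inga Bergstorm", "relation": "friend"},
    {"src": "M", "dst": "James Bond", "relation": "worksfor"},
    {"src": "Q", "dst": "Moneypenny", "relation": "colleague"},
    {"src": "Elliot Carver", "dst": "James Bond", "relation": "killed_by"},
]

def filter_edges(rows, fields=None):
    """Keep the rows whose attributes equal every key/value pair in fields."""
    fields = fields or {}
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in fields.items())
    ]

friends = filter_edges(edge_rows, fields={"relation": "friend"})
for row in friends:
    print(row["src"], row["dst"], row["relation"])
```

Passing more than one key narrows the result further, because a row must match every pair in `fields` to be kept.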


## Apply the PageRank algorithm to our graph

In [19]:
pr = gl.pagerank.create(g)

PROGRESS: Counting out degree
PROGRESS: Done counting out degree
PROGRESS: +-----------+-----------------------+
PROGRESS: | Iteration | L1 change in pagerank |
PROGRESS: +-----------+-----------------------+
PROGRESS: | 1         | 6.65833               |
PROGRESS: | 2         | 4.65611               |
PROGRESS: | 3         | 3.46298               |
PROGRESS: | 4         | 2.55686               |
PROGRESS: | 5         | 1.95422               |
PROGRESS: | 6         | 1.42139               |
PROGRESS: | 7         | 1.10464               |
PROGRESS: | 8         | 0.806704              |
PROGRESS: | 9         | 0.631771              |
PROGRESS: | 10        | 0.465388              |
PROGRESS: | 11        | 0.364898              |
PROGRESS: | 12        | 0.271257              |
PROGRESS: | 13        | 0.212255              |
PROGRESS: | 14        | 0.159062              |
PROGRESS: | 15        | 0.124071              |
PROGRESS: | 16        | 0.0935911             |
PROGRESS: | 17        |

In [20]:
pr.get('pagerank').topk(column_name='pagerank')

__id,pagerank,delta
James Bond,2.52743578524,0.0132914517076
M,1.87718696576,0.00666194771763
Moneypenny,1.18363921275,0.00143637385736
Q,1.18363921275,0.00143637385736
Inga Bergstorm,0.869872717136,0.00477951418076
Wai Lin,0.869872717136,0.00477951418076
Elliot Carver,0.634064732205,0.000113553313724
Henry Gupta,0.284762885673,1.89255522873e-05
Paris Carver,0.284762885673,1.89255522873e-05
Gotz Otto,0.284762885673,1.89255522873e-05


### We see, not unexpectedly, that James Bond has the highest PageRank, and that the bad guys rank at the bottom...
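For readers curious about what the iteration table is tracking, here is a textbook power-iteration sketch of PageRank in plain Python. It is not GraphLab's implementation, and the parameter names and values (`reset_probability`, `threshold`, `max_iterations`) are assumptions chosen for illustration. The edge list contains only the 14 edges visible in the outputs of this notebook rather than the full 20-row file, so the scores will not match the table above.

```python
# The 14 (src, dst) pairs visible in this notebook's outputs.
edges = [
    ("Wai Lin", "James Bond"), ("James Bond", "Wai Lin"),
    ("Inga Bergstorm", "James Bond"), ("James Bond", "Inga Bergstorm"),
    ("M", "James Bond"), ("James Bond", "M"),
    ("Q", "M"), ("Moneypenny", "M"), ("M", "Moneypenny"),
    ("Q", "Moneypenny"), ("Moneypenny", "Q"),
    ("Elliot Carver", "James Bond"), ("Gotz Otto", "James Bond"),
    ("Henry Gupta", "Elliot Carver"),
]

def pagerank(edges, reset_probability=0.15, threshold=1e-2, max_iterations=20):
    """Unnormalized PageRank: every vertex starts at 1.0 and scores sum to N
    as long as each vertex has at least one outgoing edge."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    in_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iterations):
        new_rank = {}
        for v in vertices:
            # Each in-neighbor shares its score equally among its out-edges.
            incoming = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
            new_rank[v] = reset_probability + (1 - reset_probability) * incoming
        # Same quantity as the "L1 change in pagerank" column in the log.
        l1_change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if l1_change < threshold:
            break
    return rank

ranks = pagerank(edges)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Even on this partial edge list, James Bond comes out on top, because most of the other characters point at him while he splits his own score across only three outgoing edges.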