# Regression with GraphLab

Creating regression models is easy with GraphLab Create! The regression toolkit implements the following models:

* Linear regression
* Boosted Decision Trees

These algorithms differ in how they make predictions, but conform to the same API. With all models, call create() to create a model, predict() to make predictions on the returned model, and evaluate() to measure performance of the predictions. All models can incorporate:

* Numeric features
* Categorical variables
* Sparse features (i.e., feature sets with a large number of features, of which only a small subset have non-zero values)
* Dense features (i.e., feature sets with a large number of numeric features)
* Text data
* Images
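
The difference between sparse and dense features can be sketched in plain Python. This is only an illustration of the two representations, not GraphLab Create's internal storage; the helper names `to_sparse` and `to_dense` are hypothetical and the values are made up.

```python
# Illustrative only: plain-Python stand-ins for dense and sparse feature rows.

def to_sparse(dense_row):
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense_row) if v != 0}

def to_dense(sparse_row, size):
    """Expand a sparse dict back into a fixed-length list, filling in zeros."""
    return [sparse_row.get(i, 0.0) for i in range(size)]

# A dense row stores every value, including the zeros.
dense_row = [0.0, 0.0, 3.5, 0.0, 0.0, 0.0, 1.0, 0.0]

# The sparse form stores only the two non-zero entries.
sparse_row = to_sparse(dense_row)

# Round-tripping recovers the original dense row.
assert to_dense(sparse_row, len(dense_row)) == dense_row
```

When most values are zero, the sparse form grows with the number of non-zero entries rather than with the total number of features.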

### Model Selector

It isn't always clear which model is best suited to a given task. GraphLab Create's model selector automatically picks a model for you based on statistics collected from the data set.

```python
import graphlab as gl

# Load the data
data = gl.SFrame('http://s3.amazonaws.com/dato-datasets/regression/yelp-data.csv')

# Make a train-test split
train_data, test_data = data.random_split(0.8)

# Automatically picks the right model based on your data.
model = gl.regression.create(train_data, target='stars',
                             features=['user_avg_stars',
                                       'business_avg_stars',
                                       'user_review_count',
                                       'business_review_count'])

# Save predictions to an SArray
predictions = model.predict(test_data)

# Evaluate the model and save the results into a dictionary
results = model.evaluate(test_data)
```
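
Regression performance is commonly summarized with error metrics such as root-mean-squared error (RMSE) and maximum absolute error. The sketch below shows how such metrics can be computed by hand in plain Python. The numbers are made up, and the dictionary keys are an assumption for illustration; the exact contents of the dictionary returned by evaluate() are not shown in this document.

```python
import math

def rmse(targets, predictions):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def max_error(targets, predictions):
    """Largest absolute difference between a target and its prediction."""
    return max(abs(t - p) for t, p in zip(targets, predictions))

# Made-up star ratings and predictions, for illustration only.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

# Collect the metrics in a dictionary (key names are assumed).
metrics = {
    'rmse': rmse(actual, predicted),
    'max_error': max_error(actual, predicted),
}
print(metrics)
```

Lower values of both metrics indicate predictions that are closer to the true targets.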

GraphLab Create implementations are built to work with up to billions of examples and up to millions of features. GraphLab Create also provides a wrapper to Vowpal Wabbit, an open-source out-of-core learning system that is also known to be fast and scalable.

## Linear Regression

GraphLab's linear regression module is used to predict a continuous target as a linear function of features. This is a two-stage process, analogous to many other GraphLab toolkits. First, a model is created (or trained) using training data. Once the model is created, it can be used to make predictions on new examples that were not seen in training (the test data). Model creation, prediction, and evaluation work with data contained in an SFrame. The following figure illustrates how linear regression works. Notice that the functional form learned here is a linear function (unlike the previous figure, where the predicted function was non-linear).
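
The two-stage process (train, then predict) can be sketched for a single feature with ordinary least squares in plain Python. This is a minimal illustration of fitting a linear function, not GraphLab's implementation or solver; the function names `fit_line` and `predict` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Stage 1: learn slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Stage 2: apply the learned linear function to new examples."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

# Training data generated from y = 2x + 1 (made up for illustration).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Train on the training data, then predict on unseen inputs.
model = fit_line(train_x, train_y)
test_predictions = predict(model, [5.0, 6.0])
print(model, test_predictions)
```

With several features, the same idea applies: the prediction is a weighted sum of the feature values plus an intercept.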

