# Import GraphLab Create to be able to use it

GraphLab Create software (free with registration): https://turi.com/learn/coursera/
<br>User guide: https://turi.com/learn/userguide/
<br>Documentation: https://turi.com/products/create/docs/

*Note: This specialization uses GraphLab Create because its goal is for students to learn core ML concepts rather than how to use a specific software package. GraphLab Create contains all the tools needed for the course, whereas with most existing packages (e.g. scikit-learn, Pandas) students would have to install a combination of packages to get the tools they need.*

In [4]:
import graphlab

# Load a tabular dataset

In [5]:
sf = graphlab.SFrame("people-example.csv")

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
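
The type inference reported above can be approximated in plain Python. This is only a minimal sketch, not how GraphLab Create actually implements it: the helpers `infer_type` and `load_csv` are hypothetical names, only `int` and `str` are distinguished, and the rows are inlined from the table this notebook displays so the example runs without the CSV file.

```python
import csv
import io

# Inlined copy of the rows this notebook displays for people-example.csv.
CSV_TEXT = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25
"""


def infer_type(values):
    """Return int if every sampled value parses as an integer, else str (hypothetical helper)."""
    try:
        for value in values:
            int(value)
        return int
    except ValueError:
        return str


def load_csv(text, sample_size=100):
    """Parse CSV text into a dict of typed columns (hypothetical helper).

    Types are guessed from the first `sample_size` rows only, so a later row
    that does not match the guessed type raises ValueError during conversion.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    sample = rows[:sample_size]
    columns = {}
    for name in rows[0].keys():
        col_type = infer_type(row[name] for row in sample)
        columns[name] = [col_type(row[name]) for row in rows]
    return columns


people = load_csv(CSV_TEXT)
column_types = [type(column[0]) for column in people.values()]
print(column_types)
```

Because the guess is based on a sample, a file whose later rows break the pattern would fail to convert, which is the situation the message above addresses by letting you pass `column_type_hints` explicitly.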


# SFrame basics

In [7]:
sf  # view table

First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25


In [11]:
sf.head(4)  # view first n rows

First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23


In [10]:
sf.tail(3)  # view last n rows

First Name,Last Name,Country,age
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25
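
Conceptually, `head` and `tail` are just slices from the two ends of the table. The sketch below illustrates this on a plain list of tuples; the functions `head` and `tail` here are hypothetical stand-ins, and their default of 10 rows is an assumption chosen for illustration.

```python
# Rows copied from the table displayed in this notebook.
rows = [
    ("Bob", "Smith", "United States", 24),
    ("Alice", "Williams", "Canada", 23),
    ("Malcolm", "Jone", "England", 22),
    ("Felix", "Brown", "USA", 23),
    ("Alex", "Cooper", "Poland", 23),
    ("Tod", "Campbell", "United States", 22),
    ("Derek", "Ward", "Switzerland", 25),
]


def head(table, n=10):
    """Return the first n rows (hypothetical stand-in for sf.head)."""
    return table[:n]


def tail(table, n=10):
    """Return the last n rows (hypothetical stand-in for sf.tail)."""
    # Clamp the start index so that n larger than the table returns every row
    # and n == 0 returns an empty list.
    start = max(len(table) - n, 0)
    return table[start:]


first_four = head(rows, 4)
last_three = tail(rows, 3)
print([row[0] for row in first_four])
print([row[0] for row in last_three])
```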


In [14]:
sf.show()  # visualize table with Canvas

Canvas is accessible via web browser at the URL: http://localhost:52777/index.html
Opening Canvas in default web browser.


In [18]:
graphlab.canvas.set_target("ipynb")  # set this notebook as target for visualizations

In [19]:
sf.show()

In [22]:
sf["age"].show(view="Categorical")  # show only the age column using a specific visualization

# Inspect columns of dataset

In [23]:
sf["Country"]  # print column "Country"

dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [25]:
sf["age"]

dtype: int
Rows: 7
[24, 23, 22, 23, 23, 22, 25]

In [27]:
sf["age"].mean()  # compute mean age

23.14285714285714

In [28]:
sf["age"].max()  # compute max age

25
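
As a sanity check, the two aggregates above can be reproduced with plain Python on the ages printed earlier. This sketch uses only built-ins and says nothing about how GraphLab Create computes them internally.

```python
# Ages copied from the sf["age"] output above.
ages = [24, 23, 22, 23, 23, 22, 25]

mean_age = sum(ages) / len(ages)  # arithmetic mean: 162 / 7
max_age = max(ages)

print(mean_age)
print(max_age)
```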

# Create new columns in the SFrame

Feature engineering means transforming existing columns or creating new columns from them.

In [31]:
# Create a new column "Full Name" by concatenating two other columns 
sf["Full Name"] = sf["First Name"] + " " + sf["Last Name"]
sf

First Name,Last Name,Country,age,Full Name
Bob,Smith,United States,24,Bob Smith
Alice,Williams,Canada,23,Alice Williams
Malcolm,Jone,England,22,Malcolm Jone
Felix,Brown,USA,23,Felix Brown
Alex,Cooper,Poland,23,Alex Cooper
Tod,Campbell,United States,22,Tod Campbell
Derek,Ward,Switzerland,25,Derek Ward


In [34]:
sf["age"] * sf["age"]  # square every age

dtype: int
Rows: 7
[576, 529, 484, 529, 529, 484, 625]
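
Both operations in this section are element-wise: the operator is applied to the values at matching positions in each column. A rough plain-Python equivalent, using lists copied from the outputs above rather than SFrame columns, might look like this:

```python
# Column values copied from the table displayed in this notebook.
first_names = ["Bob", "Alice", "Malcolm", "Felix", "Alex", "Tod", "Derek"]
last_names = ["Smith", "Williams", "Jone", "Brown", "Cooper", "Campbell", "Ward"]
ages = [24, 23, 22, 23, 23, 22, 25]

# Pair up the values at the same position and join them with a space.
full_names = [first + " " + last for first, last in zip(first_names, last_names)]

# Multiply each age by itself.
squared_ages = [age * age for age in ages]

print(full_names)
print(squared_ages)
```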

# Use the apply function to do an advanced data transformation

In [38]:
# Return "United States" if the given country is "USA"; otherwise return it unchanged
def transform_country(country):
    if country == "USA":
        return "United States"
    return country

# To change every "USA" in the "Country" column to "United States", you could write a
# for loop and iterate through every row yourself. Or you could use the apply function,
# which takes a function and applies it to every value in the column.
sf["Country"] = sf["Country"].apply(transform_country)
sf

First Name,Last Name,Country,age,Full Name
Bob,Smith,United States,24,Bob Smith
Alice,Williams,Canada,23,Alice Williams
Malcolm,Jone,England,22,Malcolm Jone
Felix,Brown,United States,23,Felix Brown
Alex,Cooper,Poland,23,Alex Cooper
Tod,Campbell,United States,22,Tod Campbell
Derek,Ward,Switzerland,25,Derek Ward
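
To make the comparison with a hand-written loop concrete, here is a small sketch on a plain Python list copied from the earlier `sf["Country"]` output. The dictionary `ALIASES` is a hypothetical addition showing one way the same idea might scale to several spellings; it is not part of the original notebook.

```python
# Values copied from the sf["Country"] output earlier in this notebook.
countries = ["United States", "Canada", "England", "USA",
             "Poland", "United States", "Switzerland"]


def transform_country(country):
    """Map "USA" to "United States" and leave every other value unchanged."""
    if country == "USA":
        return "United States"
    return country


# Explicit loop: what apply saves you from writing.
cleaned_loop = []
for country in countries:
    cleaned_loop.append(transform_country(country))

# Same result by applying the function to each element in one expression.
cleaned = [transform_country(country) for country in countries]

# Hypothetical lookup table of alternative spellings (an assumption for illustration).
ALIASES = {"USA": "United States"}
cleaned_via_dict = [ALIASES.get(country, country) for country in countries]

print(cleaned)
```

With a lookup table like this, adding another alias becomes a one-line change to the dictionary instead of another `if` branch, and a function such as `lambda c: ALIASES.get(c, c)` could be passed to `apply` in the same way as `transform_country`.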
