# Lab 2

In this lab you will see some examples of some commonly used data wrangling tools in Python. In particular, we aim to give you some familiarity with:

* Slicing data frames
* Filtering data
* Grouped counts
* Joining two tables
* NA/Null values

## Setup

In [1]:
import pandas as pd
import numpy as np

# These lines load the tests.
!pip install -U okpy
from client.api.notebook import Notebook
ok = Notebook('lab02.ok')

Requirement already up-to-date: okpy in /Users/juntan/anaconda3/envs/datascience/lib/python3.5/site-packages
Requirement already up-to-date: requests==2.12.4 in /Users/juntan/anaconda3/envs/datascience/lib/python3.5/site-packages (from okpy)
Assignment: Lab 2
OK, version v1.9.6



The code below produces the data frames used in the examples

In [19]:
heroes = pd.DataFrame(
    data={'color': ['red', 'green', 'black', 
                    'blue', 'black', 'red'],
          'first_seen_on': ['a', 'a', 'f', 'a', 'a', 'f'],
          'first_season': [2, 1, 2, 3, 3, 1]},
    index=['flash', 'arrow', 'vibe', 
           'atom', 'canary', 'firestorm']
)

identities = pd.DataFrame(
    data={'ego': ['barry allen', 'oliver queen', 'cisco ramon',
                  'ray palmer', 'sara lance', 
                  'martin stein', 'ronnie raymond'],
          'alter-ego': ['flash', 'arrow', 'vibe', 'atom',
                        'canary', 'firestorm', 'firestorm']}
)

teams = pd.DataFrame(
    data={'team': ['flash', 'arrow', 'flash', 'legends', 
                   'flash', 'legends', 'arrow'],
          'hero': ['flash', 'arrow', 'vibe', 'atom', 
                   'killer frost', 'firestorm', 'speedy']})

## Pandas and Wrangling

For the examples that follow, we will be using a toy data set containing information about superheroes in the Arrowverse.  In the `first_seen_on` column, `a` stands for Archer and `f`, Flash.

In [20]:
heroes

Unnamed: 0,color,first_season,first_seen_on
flash,red,2,a
arrow,green,1,a
vibe,black,2,f
atom,blue,3,a
canary,black,3,a
firestorm,red,1,f


In [21]:
identities

Unnamed: 0,alter-ego,ego
0,flash,barry allen
1,arrow,oliver queen
2,vibe,cisco ramon
3,atom,ray palmer
4,canary,sara lance
5,firestorm,martin stein
6,firestorm,ronnie raymond


In [22]:
teams

Unnamed: 0,hero,team
0,flash,flash
1,arrow,arrow
2,vibe,flash
3,atom,legends
4,killer frost,flash
5,firestorm,legends
6,speedy,arrow


### Slice and Dice

#### Column selection by label
To select a column of a `DataFrame` by column label, the safest and fastest way is to use the `.loc` method. General usage looks like `frame.loc[rowname,colname]`. (Reminder that the colon `:` means "everything").  For example, if we want the `color` column of the `ex` data frame, we would use :

In [23]:
heroes.loc[:, 'color']

flash          red
arrow        green
vibe         black
atom          blue
canary       black
firestorm      red
Name: color, dtype: object

In [24]:
heroes['color']

flash          red
arrow        green
vibe         black
atom          blue
canary       black
firestorm      red
Name: color, dtype: object

Selecting multiple columns is easy.  You just need to supply a list of column names.  Here we select the `color` and `value` columns:

In [25]:
heroes.loc[:, ['color', 'first_season']]

Unnamed: 0,color,first_season
flash,red,2
arrow,green,1
vibe,black,2
atom,blue,3
canary,black,3
firestorm,red,1


While `.loc` is invaluable when writing production code, it may be a little too verbose for interactive use.  One recommended alternative is the `[]` method, which takes on the form `frame['colname']`.

In [26]:
heroes['first_seen_on']

flash        a
arrow        a
vibe         f
atom         a
canary       a
firestorm    f
Name: first_seen_on, dtype: object

#### Row Selection by Label

Similarly, if we want to select a row by its label, we can use the same `.loc` method.

In [27]:
heroes.loc[['flash', 'vibe'], :]

Unnamed: 0,color,first_season,first_seen_on
flash,red,2,a
vibe,black,2,f


If we want all the columns returned, we can, for brevity, drop the colon without issue.

In [28]:
heroes.loc[['flash', 'vibe']]

Unnamed: 0,color,first_season,first_seen_on
flash,red,2,a
vibe,black,2,f


In [29]:
heroes.loc['flash']

color            red
first_season       2
first_seen_on      a
Name: flash, dtype: object

#### General Selection by Label

More generally you can slice across both rows and columns at the same time.  For example:

In [30]:
heroes.loc['flash':'atom', :'first_seen_on']

Unnamed: 0,color,first_season,first_seen_on
flash,red,2,a
arrow,green,1,a
vibe,black,2,f
atom,blue,3,a


#### Selection by Integer Index

If you want to select rows and columns by position, the Data Frame has an analogous `.iloc` method for integer indexing. Remember that Python indexing starts at 0.

In [31]:
heroes.iloc[:4,:2]

Unnamed: 0,color,first_season
flash,red,2
arrow,green,1
vibe,black,2
atom,blue,3


### Filtering with boolean arrays
Filtering is the process of removing unwanted material.  In your quest for cleaner data, you will undoubtedly filter your data at some point: whether it be for clearing up cases with missing values, culling out fishy outliers, or analyzing subgroups of your data set.  For example, we may be interested in characters that debuted in season 3 of Archer.  Note that compound expressions have to be grouped with parentheses.

In [32]:
heroes[(heroes['first_season']==3) & (heroes['first_seen_on']=='a')].loc['atom']

color            blue
first_season        3
first_seen_on       a
Name: atom, dtype: object

#### Problem Solving Strategy
We want to highlight the strategy for filtering to answer the question above:

* **Identify the variables of interest**
    * Interested in the debut: `first_season` and `first_seen_on`
* **Translate the question into statements one with True/False answers**
    * Did the hero debut on Archer? $\rightarrow$ The hero has `first_seen_on` equal to `a`
    * Did the hero debut in season 3? $\rightarrow$ The hero has `first_season` equal to `3`
* **Translate the statements into boolean statements**
    * The hero has `first_seen_on` equal to `a` $\rightarrow$ `hero['first_seen_on']=='a'`
    * The hero has `first_season` equal to `3` $\rightarrow$ `heroes['first_season']==3`
* **Use the boolean array to filter the data**

Note that compound expressions have to be grouped with parentheses.

For your reference, some commonly used comparison operators are given below.

Symbol | Usage      | Meaning 
------ | ---------- | -------------------------------------
==   | a == b   | Does a equal b?
<=   | a <= b   | Is a less than or equal to b?
>=   | a >= b   | Is a greater than or equal to b?
<    | a < b    | Is a less than b?
&#62;    | a &#62; b    | Is a greater than b?
~    | ~p       | Returns negation of p
&#124; | p &#124; q | p OR q
&    | p & q    | p AND q
^  | p ^ q | p XOR q (exclusive or)

An often-used operation missing from the above table is a test-of-membership.  The `Series.isin(values)` method returns a boolean array denoting whether each element of `Series` is in `values`.  We can then use the array to subset our data frame. For example, if we wanted to see which rows of `heroes` had values in $\{1,3\}$, we would use:

In [33]:
heroes[heroes['first_season'].isin([1,3])]

Unnamed: 0,color,first_season,first_seen_on
arrow,green,1,a
atom,blue,3,a
canary,black,3,a
firestorm,red,1,f


Notice that in both examples above, the expression in the brackets evaluates to a boolean series.  The general strategy for filtering data frames, then, is to write an expression of the form `frame[logical statement]`.

### Counting Rows

To count the number of instances of a value in a `Series`, we can use the `value_counts` method.  Below we count the number of instances of each color.

In [34]:
heroes['color'].value_counts()

red      2
black    2
blue     1
green    1
Name: color, dtype: int64

A more sophisticated analysis might involve counting the number of instances a tuple appears.  Here we count $(color,value)$ tuples.

In [35]:
heroes.groupby(['color', 'first_season']).size()

color  first_season
black  2               1
       3               1
blue   3               1
green  1               1
red    1               1
       2               1
dtype: int64

This returns a series that has been multi-indexed.  We'll eschew this topic for now.  To get a data frame back, we'll use the `reset_index` method, which also allows us to simulataneously name the new column.

In [36]:
heroes.groupby(['color', 'first_season']).size().reset_index(name='count')

Unnamed: 0,color,first_season,count
0,black,2,1
1,black,3,1
2,blue,3,1
3,green,1,1
4,red,1,1
5,red,2,1


### Joining Tables on One Column

Suppose we have another table that classifies superheroes into their respective teams.  Note that `canary` is not in this data set and that `killer frost` and `speedy` are additions that aren't in the original `heroes` set.

For simplicity of the example, we'll convert the index of the `heroes` data frame into an explicit column called `hero`.  A careful examination of the [documentation](http://pandas.pydata.org/pandas-docs/version/0.19.1/generated/pandas.DataFrame.merge.html) will reveal that joining on a mixture of the index and columns is possible.

In [37]:
heroes['hero'] = heroes.index
heroes

Unnamed: 0,color,first_season,first_seen_on,hero
flash,red,2,a,flash
arrow,green,1,a,arrow
vibe,black,2,f,vibe
atom,blue,3,a,atom
canary,black,3,a,canary
firestorm,red,1,f,firestorm


#### Inner Join

The inner join below returns rows representing the heroes that appear in both data frames.

In [38]:
pd.merge(heroes, teams, how='inner', on='hero')

Unnamed: 0,color,first_season,first_seen_on,hero,team
0,red,2,a,flash,flash
1,green,1,a,arrow,arrow
2,black,2,f,vibe,flash
3,blue,3,a,atom,legends
4,red,1,f,firestorm,legends


#### Left and right join
The left join returns rows representing heroes in the `ex` ("left") data frame, augmented by information found in the `teams` data frame.  Its counterpart, the right join, would return heroes in the `teams` data frame.  Note that the `team` for hero `canary` is an `NaN` value, representing missing data.

In [39]:
teams

Unnamed: 0,hero,team
0,flash,flash
1,arrow,arrow
2,vibe,flash
3,atom,legends
4,killer frost,flash
5,firestorm,legends
6,speedy,arrow


In [40]:
pd.merge(heroes, teams, how='left', on='hero')

Unnamed: 0,color,first_season,first_seen_on,hero,team
0,red,2,a,flash,flash
1,green,1,a,arrow,arrow
2,black,2,f,vibe,flash
3,blue,3,a,atom,legends
4,black,3,a,canary,
5,red,1,f,firestorm,legends


#### Outer join

An outer join on `hero` will return all heroes found in both the left and right data frames.  Any missing values are filled in with `NaN`.

In [41]:
pd.merge(heroes, teams, how='outer', on='hero')

Unnamed: 0,color,first_season,first_seen_on,hero,team
0,red,2.0,a,flash,flash
1,green,1.0,a,arrow,arrow
2,black,2.0,f,vibe,flash
3,blue,3.0,a,atom,legends
4,black,3.0,a,canary,
5,red,1.0,f,firestorm,legends
6,,,,killer frost,flash
7,,,,speedy,arrow


#### More than one match?

If the values in the columns to be matched don't uniquely identify a row, then a cartesian product is formed in the merge.  For example, notice that `firestorm` has two different egos, so information from `heroes` had to be duplicated in the merge, once for each ego.

In [42]:
pd.merge(heroes, identities, how='inner', 
         left_on='hero', right_on='alter-ego')

Unnamed: 0,color,first_season,first_seen_on,hero,alter-ego,ego
0,red,2,a,flash,flash,barry allen
1,green,1,a,arrow,arrow,oliver queen
2,black,2,f,vibe,vibe,cisco ramon
3,blue,3,a,atom,atom,ray palmer
4,black,3,a,canary,canary,sara lance
5,red,1,f,firestorm,firestorm,martin stein
6,red,1,f,firestorm,firestorm,ronnie raymond


### Practice Set 1

Consider the "complete" data set shown below.  Note that the rows are indexed by the superheroes' names.

In [43]:
heroes_complete = pd.merge(heroes, identities, left_on='hero', right_on='alter-ego')
heroes_complete = pd.merge(heroes_complete, teams, how='outer', on='hero')
heroes_complete.set_index('hero', inplace=True)
heroes_complete

Unnamed: 0_level_0,color,first_season,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
flash,red,2.0,a,flash,barry allen,flash
arrow,green,1.0,a,arrow,oliver queen,arrow
vibe,black,2.0,f,vibe,cisco ramon,flash
atom,blue,3.0,a,atom,ray palmer,legends
canary,black,3.0,a,canary,sara lance,
firestorm,red,1.0,f,firestorm,martin stein,legends
firestorm,red,1.0,f,firestorm,ronnie raymond,legends
killer frost,,,,,,flash
speedy,,,,,,arrow


Without running the following commands, can you guess the output?  State what is wrong with the ones that will produce errors and propose a fix.


In [44]:
heroes_complete.loc["flash"]

color                    red
first_season               2
first_seen_on              a
alter-ego              flash
ego              barry allen
team                   flash
Name: flash, dtype: object

In [45]:
heroes_complete.iloc[0, ]

color                    red
first_season               2
first_seen_on              a
alter-ego              flash
ego              barry allen
team                   flash
Name: flash, dtype: object

In [46]:
heroes_complete.loc[:,'first_seen_on':'team']

Unnamed: 0_level_0,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
flash,a,flash,barry allen,flash
arrow,a,arrow,oliver queen,arrow
vibe,f,vibe,cisco ramon,flash
atom,a,atom,ray palmer,legends
canary,a,canary,sara lance,
firestorm,f,firestorm,martin stein,legends
firestorm,f,firestorm,ronnie raymond,legends
killer frost,,,,flash
speedy,,,,arrow


In [47]:
heroes_complete.iloc[1:3, :]

Unnamed: 0_level_0,color,first_season,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
arrow,green,1.0,a,arrow,oliver queen,arrow
vibe,black,2.0,f,vibe,cisco ramon,flash


In [48]:
heroes_complete.iloc[1, 1]

1.0

In [49]:
heroes_complete[heroes_complete['color'].isin(['red', 'black'])]

Unnamed: 0_level_0,color,first_season,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
flash,red,2.0,a,flash,barry allen,flash
vibe,black,2.0,f,vibe,cisco ramon,flash
canary,black,3.0,a,canary,sara lance,
firestorm,red,1.0,f,firestorm,martin stein,legends
firestorm,red,1.0,f,firestorm,ronnie raymond,legends


In [50]:
heroes_complete.iloc[1, 1]

1.0

In [51]:
heroes_complete[heroes_complete['first_season'] % 2 == 0]

Unnamed: 0_level_0,color,first_season,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
flash,red,2.0,a,flash,barry allen,flash
vibe,black,2.0,f,vibe,cisco ramon,flash


In [52]:
heroes_complete[heroes_complete['color'].isnull()]

Unnamed: 0_level_0,color,first_season,first_seen_on,alter-ego,ego,team
hero,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
killer frost,,,,,,flash
speedy,,,,,,arrow


Can you propose a fix to any of the broken ones above?

### Practice Set 2
The practice problems below use the department of transportation's "On-Time" flight data for all flights originating from SFO or OAK in January 2016.  Information about the variables can be found in the `readme.html` file.  Information about the airports and airlines are contained in the comma-delimited files `airports.dat` and `airlines.dat`, respectively.  Both were sourced from http://openflights.org/data.html

Disclaimer: There is a more direct way of dealing with time data that is not presented in these problems.  This activity is merely an academic exercise.

In [2]:
flights = pd.read_csv("flights.dat", dtype={'sched_dep_time': 'f8', 'sched_arr_time': 'f8'})
flights.head()

Unnamed: 0,year,month,day,date,carrier,tailnum,flight,origin,destination,sched_dep_time,actual_dep_time,sched_arr_time,actual_arr_time
0,2016,1,1,2016-01-01,AA,N3FLAA,208,SFO,MIA,630.0,628.0,1458.0,1431.0
1,2016,1,2,2016-01-02,AA,N3APAA,208,SFO,MIA,600.0,553.0,1428.0,1401.0
2,2016,1,3,2016-01-03,AA,N3DNAA,208,SFO,MIA,630.0,626.0,1458.0,1431.0
3,2016,1,4,2016-01-04,AA,N3FGAA,208,SFO,MIA,630.0,626.0,1458.0,1444.0
4,2016,1,5,2016-01-05,AA,N3KUAA,208,SFO,MIA,640.0,632.0,1458.0,1439.0


In [3]:
airports_cols = [
    'openflights_id',
    'name',
    'city',
    'country',
    'iata',
    'icao',
    'latitude',
    'longitude',
    'altitude',
    'tz',
    'dst',
    'tz_olson',
    'type',
    'airport_dsource'
]

airports = pd.read_csv("airports.dat", names=airports_cols)
airports.head()

Unnamed: 0,openflights_id,name,city,country,iata,icao,latitude,longitude,altitude,tz,dst,tz_olson,type,airport_dsource
0,1,Goroka,Goroka,Papua New Guinea,GKA,AYGA,-6.081689,145.391881,5282,10.0,U,Pacific/Port_Moresby,,
1,2,Madang,Madang,Papua New Guinea,MAG,AYMD,-5.207083,145.7887,20,10.0,U,Pacific/Port_Moresby,,
2,3,Mount Hagen,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.826789,144.295861,5388,10.0,U,Pacific/Port_Moresby,,
3,4,Nadzab,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569828,146.726242,239,10.0,U,Pacific/Port_Moresby,,
4,5,Port Moresby Jacksons Intl,Port Moresby,Papua New Guinea,POM,AYPY,-9.443383,147.22005,146,10.0,U,Pacific/Port_Moresby,,


#### Question 1
It looks like the departure and arrival were read in a floating-point numbers.  Write two functions, `extract_hour` and `extract_mins` that converts military time to hours and minutes, respectively. Hint: You may want to use modular arithmetic and integer division.

In [4]:
def extract_hour(time):
    """
    Extracts hour information from military time
    
    Args: 
        time (float64): array of time given in military format.  
          Takes on values in 0.0-2359.0 due to float64 representation.
    
    Returns:
        array (float64): array of input dimension with hour information.  
          Should only take on integer values in 0-23
    """
    day = []
    day = time //100
    return np.array(day)

In [5]:
def extract_mins(time):
    """
    Extracts minute information from military time
    
    Args: 
        time (float64): array of time given in military format.  
          Takes on values in 0.0-2359.0 due to float64 representation.
    
    Returns:
        array (float64): array of input dimension with hour information.  
          Should only take on integer values in 0-59
    """
    mins = []
    mins = time % 100
    return np.array(mins)

#### Question 2

Using your two functions above, filter the `flights` data for flights that departed 15 or more minutes later than scheduled.  You need not worry about flights that were delayed to the next day for this question.

In [6]:
def convert_to_minofday(time):
    """
    Converts military time to minute of day
    
    Args:
        time (float64): array of time given in military format.  
          Takes on values in 0.0-2359.0 due to float64 representation.
    
    Returns:
        array (float64): array of input dimension with minute of day
    """
    minofday = []
    hour = extract_hour(time)
    mins = extract_mins(time)
    
    minofday = hour *  60 + mins

    return np.array(minofday)

def calc_time_diff(x, y):
    """
    Calculates delay times y - x
    
    Args:
        x (float64): array of scheduled time given in military format.  
          Takes on values in 0.0-2359.0 due to float64 representation.
        y (float64): array of same dimensions giving actual time
    
    Returns:
        array (float64): array of input dimension with delay time
    """
    delay = []
    scheduled = convert_to_minofday(x)
    actual = convert_to_minofday(y)
    
    delay = actual - scheduled
    return np.array(delay)
    

In [7]:
type(flights['sched_arr_time'])

pandas.core.series.Series

In [8]:
delay = calc_time_diff(flights['sched_arr_time'], flights['actual_arr_time'])
delayed15 = flights[delay >= 15]['date']

  from ipykernel import kernelapp as app


#### Question 3

Using your answer from question 2, find the full name of every destination city with a flight from SFO or OAK that was delayed by 15 or more minutes.  The airport codes used in `flights` are IATA codes.  Sort the cities alphabetically.

In [9]:
delayed_airports = flights[delay >= 15]
delayed_destinations = delayed_airports[(delayed_airports['origin'] == 'SFO') | (delayed_airports['origin'] == 'OAK')]['destination']

  if __name__ == '__main__':


In [10]:
delayed_destinations.head()

11    MIA
12    MIA
14    MIA
15    MIA
19    MIA
Name: destination, dtype: object

In [11]:
delayed_destinations = delayed_destinations.reset_index("destionation", drop = True)
#sorted(delayed_destinations)
delayed_destinations.sort()


  app.launch_new_instance()


In [12]:
delayed_destinations.head()

1347    ABQ
2649    ABQ
1344    ABQ
1343    ABQ
1348    ABQ
Name: destination, dtype: object

In [14]:
#delayed_dest_info = pd.merge(delayed_destinations, airports, how='left', left_on='destination', right_on='iata')

#### Question 4

Find the tail number of the top ten planes, measured by number of destinations the plane flew to in January.  You may find `drop_duplicates` and `sort_values` helpful.

In [25]:
Jan_flights = flights[flights['month'] == 1].drop_duplicates(subset=['tailnum', 'destination'])\
              .groupby('tailnum')
test = Jan_flights.size()
test = test.sort_values(ascending=False)
test


tailnum
N912SW    21
N948SW    19
N927SW    19
N472CA    18
N924SW    18
N824AS    17
N979SW    17
N498CA    17
N957SW    17
N134SY    17
N471CA    17
N938SW    17
N967SW    16
N679SA    16
N937SW    16
N920SW    16
N961SW    16
N103SY    16
N903SW    15
N970SW    15
N909SW    15
N479CA    15
N952SW    15
N122SY    15
N932EV    15
N925SW    15
N988CA    15
N705SK    15
N983CA    15
N110SY    15
          ..
N3HNAA     1
N3HLAA     1
N3KSAA     1
N818NW     1
N3MYAA     1
N3LDAA     1
N3MWAA     1
N3MVAA     1
N3MRAA     1
N3MMAA     1
N3MKAA     1
N3MHAA     1
N3MGAA     1
N3MFAA     1
N3MEAA     1
N3MDAA     1
N3MCAA     1
N3MBAA     1
N3LWAA     1
N3LVAA     1
N3LUAA     1
N3LTAA     1
N3LSAA     1
N3LRAA     1
N3LPAA     1
N3LNAA     1
N3LLAA     1
N3LGAA     1
N3LEAA     1
N996AT     1
dtype: int64

In [26]:
top10 = test[:10]

#### Challenge
Add a new column to `airports` called `sfo_arr_delay_avg` that contains information about the average delay time in January from SFO.

In [None]:
airports = 

Let's take a look at our non-null results.  Do any of the delay values catch your eye?

...

## Submission
Run the cell below to submit the lab.  You may resubmit as many times you want.  We will be grading you on effort/completion.

In [27]:
I_totally_did_everything=True


In [28]:
_ = ok.grade('qcompleted')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

Saving notebook... Saved 'lab02.ipynb'.
Backup... 100% complete
Backup successful for user: jtan0325@berkeley.edu
URL: https://okpy.org/cal/data100/sp17/lab02/backups/bkyOoY
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



In [29]:
_ = ok.submit()

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

Saving notebook... 



Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Submit... 100% complete
Submission successful for user: jtan0325@berkeley.edu
URL: https://okpy.org/cal/data100/sp17/lab02/submissions/boD6ON

