**Question:** Could you explain how to read the pandas documentation?

[pandas API reference](http://pandas.pydata.org/pandas-docs/stable/api.html)

**Question:** What is the difference between **`ufo.isnull()`** and **`pd.isnull(ufo)`**?

In [2]:
import pandas as pd

# read a dataset of UFO reports into a DataFrame
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [3]:
# use 'isnull' as a top-level function
pd.isnull(ufo).head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,False,True,False,False,False
1,False,True,False,False,False
2,False,True,False,False,False
3,False,True,False,False,False
4,False,True,False,False,False


In [4]:
# equivalent: use 'isnull' as a DataFrame method
ufo.isnull().head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,False,True,False,False,False
1,False,True,False,False,False
2,False,True,False,False,False
3,False,True,False,False,False
4,False,True,False,False,False


Documentation for [**`isnull`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html)
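
The two forms return the same result on a DataFrame, but they are not interchangeable everywhere. As a minimal sketch on a small made-up DataFrame (the name `df` and its values are illustrative, not taken from the UFO dataset), the top-level function also accepts scalars and lists, which have no `isnull` method of their own. Recent pandas versions also provide `isna` as an alias.

```python
import numpy as np
import pandas as pd

# small illustrative DataFrame (made-up values, not the UFO dataset)
df = pd.DataFrame({'City': ['Ithaca', 'Holyoke'],
                   'Colors Reported': [np.nan, 'RED']})

# the method and the top-level function return the same boolean DataFrame
same = df.isnull().equals(pd.isnull(df))
print(same)

# the top-level function also works on objects that have no 'isnull' method
print(pd.isnull(np.nan))
print(pd.isnull(None))
print(pd.isnull(['a', None]))

# 'isna' is an alias of 'isnull'
print(df.isna().equals(df.isnull()))
```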

**Question:** Why are DataFrame slices inclusive when using **`.loc`**, but exclusive when using **`.iloc`**?

In [5]:
# label-based slicing is inclusive of the start and stop
ufo.loc[0:4, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [6]:
# position-based slicing is inclusive of the start and exclusive of the stop
ufo.iloc[0:4, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00


Documentation for [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) and [**`iloc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)

In [9]:
# 'iloc' is simply following NumPy's slicing convention...
print(ufo.values)
print("\n\n")
print(type(ufo.values))
print("\n\n")
print(ufo.values[0:4, :])

[['Ithaca' nan 'TRIANGLE' 'NY' '6/1/1930 22:00']
 ['Willingboro' nan 'OTHER' 'NJ' '6/30/1930 20:00']
 ['Holyoke' nan 'OVAL' 'CO' '2/15/1931 14:00']
 ..., 
 ['Eagle River' nan nan 'WI' '12/31/2000 23:45']
 ['Eagle River' 'RED' 'LIGHT' 'WI' '12/31/2000 23:45']
 ['Ybor' nan 'OVAL' 'FL' '12/31/2000 23:59']]



<class 'numpy.ndarray'>



[['Ithaca' nan 'TRIANGLE' 'NY' '6/1/1930 22:00']
 ['Willingboro' nan 'OTHER' 'NJ' '6/30/1930 20:00']
 ['Holyoke' nan 'OVAL' 'CO' '2/15/1931 14:00']
 ['Abilene' nan 'DISK' 'KS' '6/1/1931 13:00']]


In [14]:
# ...and NumPy is simply following Python's slicing convention
print('python'[0:4])
print(list(range(0,4)))


pyth
[0, 1, 2, 3]


In [15]:
# 'loc' is inclusive of the stopping label because you don't necessarily know what label will come after it
ufo.loc[0:4, 'City':'State']

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY
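
The reasoning is easier to see when the row labels are not integers. Here is a minimal sketch, assuming a made-up DataFrame `df` with string labels (not part of the UFO dataset): with an exclusive stop, selecting through `'c'` would require naming whatever label happens to follow it.

```python
import pandas as pd

# illustrative DataFrame with a string index (made-up labels 'a' through 'd')
df = pd.DataFrame({'City': ['Ithaca', 'Willingboro', 'Holyoke', 'Abilene'],
                   'State': ['NY', 'NJ', 'CO', 'KS']},
                  index=['a', 'b', 'c', 'd'])

# label-based: both the start label 'a' and the stop label 'c' are included
by_label = df.loc['a':'c', :]

# position-based: positions 0 and 1 are included, position 2 is excluded
by_position = df.iloc[0:2, :]

print(by_label.index.tolist())
print(by_position.index.tolist())
```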


**Question:** How do I randomly sample rows from a DataFrame?

In [56]:
# sample 3 rows from the DataFrame without replacement (new in pandas 0.16.1)
ufo.sample(n=3)

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
9756,Bismarck,ORANGE,,ND,11/24/1996 19:30
17260,Middletown,,OTHER,NY,8/20/2000 22:30
7148,Xenia,,LIGHT,OH,7/21/1993 23:30


Documentation for [**`sample`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html)

In [57]:
# use the 'random_state' parameter for reproducibility
ufo.sample(n=3, random_state=42)

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
217,Norridgewock,,DISK,ME,9/15/1952 14:00
12282,Ipava,,TRIANGLE,IL,10/1/1998 21:15
17933,Ellinwood,,FIREBALL,KS,11/13/2000 22:00
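
To illustrate what reproducibility means here, the sketch below uses a small made-up DataFrame (`df` is illustrative, not the UFO data) and checks that two calls with the same `random_state` return identical rows. It also shows the `replace` parameter, which is needed when sampling more rows than the DataFrame contains.

```python
import pandas as pd

# illustrative DataFrame with 100 rows (made-up values)
df = pd.DataFrame({'value': range(100)})

# the same 'random_state' produces the same sample on every call
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))

# sampling with replacement allows 'n' to exceed the number of rows
with_replacement = df.sample(n=150, replace=True, random_state=42)
print(len(with_replacement))
```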


In [61]:
# sample 75% of the DataFrame's rows without replacement
train = ufo.sample(frac=0.75, random_state=99)
train

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
6250,Sunnyvale,,OTHER,CA,12/16/1989 0:00
8656,Corpus Christi,,,TX,9/13/1995 0:10
2729,Mentor,,DISK,OH,8/8/1974 10:00
7348,Wilson,,LIGHT,WI,6/1/1994 1:00
12637,Lowell,,CIRCLE,MA,11/26/1998 10:00
2094,Victorville,,LIGHT,CA,6/6/1971 21:00
15905,Black Canyon City,BLUE,CIRCLE,AZ,2/16/2000 4:45
6792,Houston,,CHEVRON,TX,6/10/1992 23:00
5063,Ely,,DIAMOND,MN,6/15/1984 19:00
16626,Atlantic Ocean,,,NC,6/17/2000 0:35


In [68]:
# compare the index of the original DataFrame with the index of the sample
print(ufo.index)
print(train.index)

RangeIndex(start=0, stop=18241, step=1)
Int64Index([ 6250,  8656,  2729,  7348, 12637,  2094, 15905,  6792,  5063,
            16626,
            ...
             7254,  3622,  8241, 13133,  7598,  8965,  4991,  2740, 11887,
             9809],
           dtype='int64', length=13681)


In [71]:
# store the remaining 25% of the rows in another DataFrame
test = ufo.loc[~ufo.index.isin(train.index), :]
test

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00
5,Valley City,,DISK,ND,9/15/1934 15:30
8,Eklutna,,CIGAR,AK,10/15/1936 17:00
11,Waterloo,,FIREBALL,AL,6/1/1939 20:00
13,Keokuk,,OVAL,IA,7/7/1939 2:00
17,Hapeville,,,GA,6/1/1942 22:30
32,Ft. Lee,,CIGAR,VA,1/1/1945 12:00
35,Winston-Salem,,DISK,NC,6/7/1945 7:00
36,Portsmouth,RED,FORMATION,VA,7/10/1945 1:30
43,Alice,,DISK,TX,3/15/1946 15:30


Documentation for [**`isin`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.isin.html)
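
As a sanity check on this kind of split, the sketch below repeats the same steps on a small made-up DataFrame (`df` is illustrative, not the UFO data) and confirms that the two pieces are disjoint and together cover every row. It also shows `drop` as one possible alternative to the `isin` filter. Both approaches assume the index labels are unique.

```python
import pandas as pd

# illustrative DataFrame with 20 rows and a unique default index
df = pd.DataFrame({'value': range(20)})

# sample 75% of the rows, then keep the rows whose labels were not sampled
train = df.sample(frac=0.75, random_state=99)
test = df.loc[~df.index.isin(train.index), :]

# alternative: drop the sampled labels from the original DataFrame
test_alt = df.drop(train.index)

print(len(train), len(test))
print(test.equals(test_alt))

# no label appears in both pieces
print(train.index.intersection(test.index).empty)
```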
