# 23. More of your pandas questions answered!

**Question:** Could you explain how to read the pandas documentation?

[pandas API reference](http://pandas.pydata.org/pandas-docs/stable/api.html)
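
The same reference material can also be read from inside Python, because the API reference pages are largely built from each object's docstring. The sketch below uses `DataFrame.sample` purely as an example; any other function or method could be substituted.

```python
import inspect
import pandas as pd

# list the parameters a method accepts ('sample' is just an example here)
signature = inspect.signature(pd.DataFrame.sample)
print(list(signature.parameters))

# the docstring carries the text shown on the method's reference page
docstring = inspect.getdoc(pd.DataFrame.sample)
print(docstring.splitlines()[0])

# in an interactive session, help() prints the full docstring
# help(pd.DataFrame.sample)
```

When reading a docstring or reference page, the parameter list and the "Returns" section are usually the quickest way to see what a method expects and what it gives back.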

**Question:** What is the difference between **`ufo.isnull()`** and **`pd.isnull(ufo)`**?

In [1]:
import pandas as pd

In [2]:
# read a dataset of UFO reports into a DataFrame
ufo = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/ufo.csv')
ufo.head()

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [3]:
# use 'isnull' as a top-level function
pd.isnull(ufo).head()

,City,Colors Reported,Shape Reported,State,Time
0,False,True,False,False,False
1,False,True,False,False,False
2,False,True,False,False,False
3,False,True,False,False,False
4,False,True,False,False,False


In [4]:
# equivalent: use 'isnull' as a DataFrame method
ufo.isnull().head()

,City,Colors Reported,Shape Reported,State,Time
0,False,True,False,False,False
1,False,True,False,False,False
2,False,True,False,False,False
3,False,True,False,False,False
4,False,True,False,False,False


Documentation for [**`isnull`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html)
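
For a DataFrame the two forms return the same result. The practical difference is that the top-level function also accepts objects that have no `isnull` method, such as scalars and lists. A minimal sketch, using a small made-up DataFrame rather than the UFO data:

```python
import numpy as np
import pandas as pd

# small made-up DataFrame with one missing value
df = pd.DataFrame({'City': ['Ithaca', 'Holyoke'],
                   'Colors Reported': [np.nan, 'RED']})

# the top-level function and the DataFrame method give identical results
same = pd.isnull(df).equals(df.isnull())
print(same)

# the top-level function also works on objects without an 'isnull' method
scalar_result = pd.isnull(None)
list_result = pd.isnull(['RED', None])
print(scalar_result, list_result)
```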

**Question:** Why are DataFrame slices inclusive when using **`.loc`**, but exclusive when using **`.iloc`**?

In [5]:
# label-based slicing is inclusive of the start and stop
ufo.loc[0:4, :]

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [6]:
# position-based slicing is inclusive of the start and exclusive of the stop
ufo.iloc[0:4, :]

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00


Documentation for [**`loc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html) and [**`iloc`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html)

In [7]:
# 'iloc' is simply following NumPy's slicing convention...
ufo.values[0:4, :]

array([['Ithaca', nan, 'TRIANGLE', 'NY', '6/1/1930 22:00'],
       ['Willingboro', nan, 'OTHER', 'NJ', '6/30/1930 20:00'],
       ['Holyoke', nan, 'OVAL', 'CO', '2/15/1931 14:00'],
       ['Abilene', nan, 'DISK', 'KS', '6/1/1931 13:00']], dtype=object)

In [8]:
# ...and NumPy is simply following Python's slicing convention
'python'[0:4]

'pyth'

In [9]:
# 'loc' is inclusive of the stopping label because you don't necessarily know what label will come after it
ufo.loc[0:4, 'City':'State']

,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY
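
The reasoning is easier to see when the index labels are not integers. In the sketch below (the DataFrame and its labels are invented for illustration), a label has no obvious "one past the end" value, whereas a position always does:

```python
import pandas as pd

# made-up DataFrame with string labels for the index
df = pd.DataFrame({'State': ['NY', 'NJ', 'CO', 'KS']},
                  index=['a', 'b', 'c', 'd'])

# label-based slicing: the stop label 'c' is included
by_label = df.loc['a':'c', :]

# position-based slicing: the stop position 2 is excluded
by_position = df.iloc[0:2, :]

print(list(by_label.index))
print(list(by_position.index))
```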


**Question:** How do I randomly sample rows from a DataFrame?

In [10]:
# sample 3 rows from the DataFrame without replacement (new in pandas 0.16.1)
ufo.sample(n=3)

,City,Colors Reported,Shape Reported,State,Time
12338,Elk Grove,BLUE,FIREBALL,CA,10/11/1998 22:45
6021,Montgomery,,,IL,1/29/1989 23:00
3158,Houston,,DISK,TX,3/24/1976 21:00


Documentation for [**`sample`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html)

In [11]:
# use the 'random_state' parameter for reproducibility
ufo.sample(n=3, random_state=42)

,City,Colors Reported,Shape Reported,State,Time
217,Norridgewock,,DISK,ME,9/15/1952 14:00
12282,Ipava,,TRIANGLE,IL,10/1/1998 21:15
17933,Ellinwood,,FIREBALL,KS,11/13/2000 22:00
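
To confirm that `random_state` makes the sample repeatable, the same call can be run twice and the results compared. This is a minimal sketch on a made-up DataFrame, so it does not depend on downloading the UFO data:

```python
import pandas as pd

# made-up DataFrame with 100 rows
df = pd.DataFrame({'value': range(100)})

# two samples drawn with the same seed select the same rows in the same order
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))
```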


In [12]:
# sample 75% of the DataFrame's rows without replacement
train = ufo.sample(frac=0.75, random_state=99)

In [13]:
# store the remaining 25% of the rows in another DataFrame
test = ufo.loc[~ufo.index.isin(train.index), :]

Documentation for [**`isin`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.isin.html)
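
A quick sanity check of this kind of split is sketched below on a made-up DataFrame. It assumes the index labels are unique: with duplicate labels, `isin` would exclude every row that shares a label with a sampled row. The `drop` call is shown only as an alternative way to select the remaining rows.

```python
import pandas as pd

# made-up DataFrame with 20 rows and a unique default index
df = pd.DataFrame({'value': range(20)})

# same pattern as above: 75% for training, the rest for testing
train = df.sample(frac=0.75, random_state=99)
test = df.loc[~df.index.isin(train.index), :]

# every row lands in exactly one of the two DataFrames
print(len(train), len(test))
print(train.index.intersection(test.index).empty)

# alternative: drop the sampled labels to get the remaining rows
test_alt = df.drop(train.index)
print(test.equals(test_alt))
```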

