# How do I select multiple rows and columns from a pandas DataFrame?

🐼 Pandas tutorial by Data School - exercise performed by Dorian.H Mekni 🥇 | Wed 09 Dec 2020

In [1]:
import pandas as pd

In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [8]:
ufo.shape

(18241, 5)

In [9]:
ufo.describe()

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| count | 18216 | 2882 | 15597 | 18241 | 18241 |
| unique | 6476 | 27 | 27 | 52 | 16145 |
| top | Seattle | RED | LIGHT | CA | 11/16/1999 19:00 |
| freq | 187 | 780 | 2803 | 2529 | 27 |


In [7]:
ufo.head(3)

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |



⭐️ loc filters rows and selects columns by label. Its format is loc[rows, columns]: the first argument selects rows, and the second selects columns.
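To make the [rows, columns] format concrete, here is a minimal sketch that runs without downloading the CSV. It assumes a small hand-built stand-in, hypothetically named ufo_small, holding the first three rows shown above. The kind of object you get back depends on what you pass for each argument.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three rows of ufo (no download needed)
ufo_small = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'Colors Reported': [np.nan, np.nan, np.nan],
    'Shape Reported': ['TRIANGLE', 'OTHER', 'OVAL'],
    'State': ['NY', 'NJ', 'CO'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00'],
})

# One row label + one column label -> a single value
city = ufo_small.loc[0, 'City']
print(city)  # Ithaca

# One row label + ':' for all columns -> a Series indexed by the column names
first_row = ufo_small.loc[0, :]
print(type(first_row).__name__)  # Series

# Lists of labels for both arguments -> a DataFrame
subset = ufo_small.loc[[0, 1], ['City', 'State']]
print(subset.shape)  # (2, 2)
```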


In [10]:
ufo.loc[0, :]

City                       Ithaca
Colors Reported               NaN
Shape Reported           TRIANGLE
State                          NY
Time               6/1/1930 22:00
Name: 0, dtype: object


☝🏻 The colon means all columns, so we get every column of the row labelled 0, which here is the first row.



⭐️ Now, let's select all columns of several rows:


In [11]:
ufo.loc[[0, 1, 2], :]

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |



➕ Or, alternatively, using a slice:

In [12]:
ufo.loc[0:2, :]

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |



✅ We get exactly the same result as with the list of labels.

🧐 With slice notation, loc is inclusive on both sides: 0:2 returns the rows labelled 0, 1 and 2.
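The inclusive rule is easiest to see next to ordinary Python slicing. A small sketch, assuming a hypothetical four-row DataFrame df whose State values are taken from the rows above:

```python
import pandas as pd

# Hypothetical DataFrame with the default integer index 0..3
df = pd.DataFrame({'State': ['NY', 'NJ', 'CO', 'KS']})

# loc reads 0:2 as labels and includes both ends -> labels 0, 1 and 2
label_slice = df.loc[0:2, :]
print(len(label_slice))  # 3

# An ordinary Python list slice with the same numbers stops before 2
plain_slice = ['NY', 'NJ', 'CO', 'KS'][0:2]
print(len(plain_slice))  # 2

# The inclusive rule also holds for non-integer labels
lettered = pd.DataFrame({'State': ['NY', 'NJ', 'CO', 'KS']},
                        index=['a', 'b', 'c', 'd'])
print(list(lettered.loc['a':'c', 'State']))  # ['NY', 'NJ', 'CO']
```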



⭐️ The same selection process works for multiple columns:

In [14]:
ufo.loc[:, ['City', 'State']]

| | City | State |
|---|---|---|
| 0 | Ithaca | NY |
| 1 | Willingboro | NJ |
| 2 | Holyoke | CO |
| 3 | Abilene | KS |
| 4 | New York Worlds Fair | NY |
| ... | ... | ... |
| 18236 | Grant Park | IL |
| 18237 | Spirit Lake | IA |
| 18238 | Eagle River | WI |
| 18239 | Eagle River | WI |



☝🏻 In the example above, we selected two distinct columns.
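One detail worth knowing: the type of the result depends on whether you pass a single label or a list, and the list order sets the column order. A sketch on a hypothetical two-column df built from the rows above:

```python
import pandas as pd

# Hypothetical two-column DataFrame using values from the ufo rows above
df = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

one_col = df.loc[:, 'City']               # single label -> Series
one_col_df = df.loc[:, ['City']]          # one-item list -> DataFrame
reordered = df.loc[:, ['State', 'City']]  # list order sets column order

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
print(list(reordered.columns))    # ['State', 'City']
```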



⭐️ Let's now select a range of columns using loc:


In [25]:
ufo.loc[:, 'City':'State']

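| | City | Colors Reported | Shape Reported | State |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
| 3 | Abilene | NaN | DISK | KS |
| 4 | New York Worlds Fair | NaN | LIGHT | NY |
| ... | ... | ... | ... | ... |
| 18236 | Grant Park | NaN | TRIANGLE | IL |
| 18237 | Spirit Lake | NaN | DISK | IA |
| 18238 | Eagle River | NaN | NaN | WI |
| 18239 | Eagle River | RED | LIGHT | WI |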



🧐 We can also combine row and column selection in a single call:


In [26]:
ufo.loc[0:2, 'City':'State']

| | City | Colors Reported | Shape Reported | State |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
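Rows and columns do not need the same kind of selector: a list for one and a slice for the other works too. A sketch using a hypothetical ufo_small stand-in for the first three rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three rows of ufo
ufo_small = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'Colors Reported': [np.nan, np.nan, np.nan],
    'Shape Reported': ['TRIANGLE', 'OTHER', 'OVAL'],
    'State': ['NY', 'NJ', 'CO'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00'],
})

# A list for the rows (labels 0 and 2) and a label slice for the columns
picked = ufo_small.loc[[0, 2], 'City':'State']
print(list(picked.index))    # [0, 2]
print(list(picked.columns))  # ['City', 'Colors Reported', 'Shape Reported', 'State']

# A slice for the rows and a list for the columns
picked2 = ufo_small.loc[0:1, ['State', 'Time']]
print(picked2.shape)  # (2, 2)
```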



➕ Using methods from previous notebooks, we can achieve the same result with a different chain of methods:

In [27]:
ufo.head(3).drop('Time', axis=1)

| | City | Colors Reported | Shape Reported | State |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |



✅ Both approaches are valid; they simply offer different workflows. They match here only because Time is the last column and rows 0 to 2 are the first three rows. loc is more flexible, so I personally favour it.
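To check that two selections really match, DataFrame.equals compares shape, values and dtypes, treating NaN in the same positions as equal. A sketch on a hypothetical ufo_small stand-in for the first three rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three rows of ufo
ufo_small = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'Colors Reported': [np.nan, np.nan, np.nan],
    'Shape Reported': ['TRIANGLE', 'OTHER', 'OVAL'],
    'State': ['NY', 'NJ', 'CO'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00'],
})

via_loc = ufo_small.loc[0:2, 'City':'State']
via_drop = ufo_small.head(3).drop('Time', axis=1)
print(via_loc.equals(via_drop))  # True

# A different row range no longer matches the head/drop chain
print(ufo_small.loc[1:2, 'City':'State'].equals(via_drop))  # False
```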



⭐️ Now let's use loc with a boolean condition to filter rows:


In [29]:
ufo.loc[ufo.City=='Oakland', :]

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 1694 | Oakland | NaN | CIGAR | CA | 7/21/1968 14:00 |
| 2144 | Oakland | NaN | DISK | CA | 8/19/1971 0:00 |
| 4686 | Oakland | NaN | LIGHT | MD | 6/1/1982 0:00 |
| 7293 | Oakland | NaN | LIGHT | CA | 3/28/1994 17:00 |
| 8488 | Oakland | NaN | NaN | CA | 8/10/1995 21:45 |
| 8768 | Oakland | NaN | NaN | CA | 10/10/1995 22:40 |
| 10816 | Oakland | NaN | LIGHT | OR | 10/1/1997 21:30 |
| 10948 | Oakland | NaN | DISK | CA | 11/14/1997 19:55 |
| 11045 | Oakland | NaN | TRIANGLE | CA | 12/10/1997 1:30 |
| 12322 | Oakland | NaN | FIREBALL | CA | 10/9/1998 19:40 |



✅ This approach is explicit, and it lets us pick the specific column we want, here the State of each Oakland sighting:


In [30]:
ufo.loc[ufo.City=='Oakland', 'State']

1694     CA
2144     CA
4686     MD
7293     CA
8488     CA
8768     CA
10816    OR
10948    CA
11045    CA
12322    CA
12941    CA
16803    MD
17322    CA
Name: State, dtype: object
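Boolean conditions can also be combined. A sketch assuming a hypothetical ufo_small built from a few of the rows shown above (original index labels kept). Note that pandas uses & and | rather than and/or, and each condition needs its own parentheses:

```python
import pandas as pd

# Hypothetical stand-in built from a few rows shown above
ufo_small = pd.DataFrame(
    {
        'City': ['Ithaca', 'Oakland', 'Oakland', 'Oakland'],
        'Shape Reported': ['TRIANGLE', 'CIGAR', 'LIGHT', 'LIGHT'],
        'State': ['NY', 'CA', 'MD', 'OR'],
    },
    index=[0, 1694, 4686, 10816],
)

# Both conditions must hold: City is Oakland AND State is CA
mask = (ufo_small.City == 'Oakland') & (ufo_small.State == 'CA')
oakland_ca = ufo_small.loc[mask, 'Shape Reported']
print(list(oakland_ca.index))  # [1694]

# isin() keeps rows whose State is any of the listed values
md_or = ufo_small.loc[ufo_small.State.isin(['MD', 'OR']), ['City', 'State']]
print(list(md_or.index))  # [4686, 10816]
```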


🧐 Another, less explicit way to perform this task is chained indexing:

In [32]:
ufo[ufo.City=='Oakland'].State

1694     CA
2144     CA
4686     MD
7293     CA
8488     CA
8768     CA
10816    OR
10948    CA
11045    CA
12322    CA
12941    CA
16803    MD
17322    CA
Name: State, dtype: object
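The difference matters most when assigning values. Writing through chained indexing, for example ufo[ufo.City == 'Oakland']['State'] = ..., may modify a temporary copy instead of ufo and, depending on the pandas version, raise a SettingWithCopyWarning. A sketch of the safer single loc call, on a hypothetical ufo_small; spelling out 'MD' as 'Maryland' is only an illustration:

```python
import pandas as pd

# Hypothetical stand-in built from a few rows shown above
ufo_small = pd.DataFrame(
    {'City': ['Ithaca', 'Oakland', 'Oakland'], 'State': ['NY', 'CA', 'MD']},
    index=[0, 1694, 4686],
)

# Reading: chained indexing and a single loc call return the same values
chained = ufo_small[ufo_small.City == 'Oakland'].State
explicit = ufo_small.loc[ufo_small.City == 'Oakland', 'State']
print(chained.equals(explicit))  # True

# Writing: one loc call with the row condition and the column label
# updates ufo_small itself
ufo_small.loc[ufo_small.State == 'MD', 'State'] = 'Maryland'
print(ufo_small.loc[4686, 'State'])  # Maryland
```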


⭐️ iloc is another way to select data, this time by integer position:

In [34]:
ufo.columns 

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

In [33]:
ufo.iloc[:, 0:4]

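| | City | Colors Reported | Shape Reported | State |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
| 3 | Abilene | NaN | DISK | KS |
| 4 | New York Worlds Fair | NaN | LIGHT | NY |
| ... | ... | ... | ... | ... |
| 18236 | Grant Park | NaN | TRIANGLE | IL |
| 18237 | Spirit Lake | NaN | DISK | IA |
| 18238 | Eagle River | NaN | NaN | WI |
| 18239 | Eagle River | RED | LIGHT | WI |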



☝🏻 With iloc, a slice includes the first number but excludes the second, so 0:4 selects the columns at positions 0 to 3 (City through State). With loc, a label slice is inclusive on both sides.
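A sketch comparing the two kinds of slice, plus two other selectors iloc accepts (a list of positions and negative positions), on a hypothetical ufo_small stand-in for the first three rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three rows of ufo
ufo_small = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'Colors Reported': [np.nan, np.nan, np.nan],
    'Shape Reported': ['TRIANGLE', 'OTHER', 'OVAL'],
    'State': ['NY', 'NJ', 'CO'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00'],
})

# iloc slices by position and excludes the end -> positions 0, 1, 2 and 3
first_four = ufo_small.iloc[:, 0:4]
print(list(first_four.columns))  # ['City', 'Colors Reported', 'Shape Reported', 'State']

# Same columns as loc's inclusive label slice
print(first_four.equals(ufo_small.loc[:, 'City':'State']))  # True

# iloc also accepts a list of positions and negative positions
print(list(ufo_small.iloc[:, [0, 3]].columns))  # ['City', 'State']
print(ufo_small.iloc[-1, 0])  # Holyoke (last row, first column)
```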



🧐 This works like Python's built-in range function:

In [36]:
list(range(0,4))

[0, 1, 2, 3]


⭐️ Another example, this time selecting the first three rows:

In [37]:
ufo.iloc[0:3, :]

| | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |



🧐 Remember: loc for labels, iloc for integer positions.
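The two only give different answers once labels and positions stop lining up, for example after sorting. A minimal sketch on a hypothetical ufo_small built from the first three rows:

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of ufo
ufo_small = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

# Sorting keeps the original index labels but changes the row order
by_city = ufo_small.sort_values('City')
print(list(by_city.index))  # [2, 0, 1]

# loc looks up the row whose LABEL is 0
print(by_city.loc[0, 'City'])  # Ithaca
# iloc looks up the row at POSITION 0 (the first row after sorting)
print(by_city.iloc[0, 0])  # Holyoke
```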


-------------------------------------------------------------


😃 In short:

- .loc for label-based indexing, or
- .iloc for positional indexing



🙏🏻 Thank you!

👋🏻 See you in the next one!
