# Selecting Rows and Columns in pandas

### 1. Reading Data
We first import `pandas` and load a table into a DataFrame.

In [1]:
import pandas as pd

df = pd.read_csv('population.csv', index_col=0)
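
To see what `index_col=0` does, here is a minimal sketch that reads a tiny made-up CSV from a string instead of `population.csv`. The names `csv_text`, `with_index` and `without_index`, and all values, are hypothetical and only serve as an illustration.

```python
import io

import pandas as pd

# Toy CSV text standing in for 'population.csv' (names and numbers are made up).
csv_text = "country,1800,1900\nA,10,20\nB,30,40\n"

# index_col=0 turns the first column into the row labels.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, pandas numbers the rows 0, 1, 2, ...
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.index.tolist())     # ['A', 'B']
print(without_index.index.tolist())  # [0, 1]
print(with_index.columns.tolist())   # ['1800', '1900'] (header names are read as strings)
```

Because the header names are read as strings, the year columns are later selected with quotes, as in `df['2015']`.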

### 2. Attributes and Methods

`.shape` is an *attribute*: you access it on a DataFrame with a dot and no parentheses. It holds the number of rows and columns as a Python *tuple*, `(rows, columns)`:

In [2]:
df.shape

(275, 81)
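
Since `.shape` is an ordinary tuple, it can be unpacked into two variables. The sketch below uses a small hypothetical DataFrame called `toy` with made-up values, not the population table.

```python
import pandas as pd

# A made-up 3x2 table, used only for illustration.
toy = pd.DataFrame(
    {"1900": [1.0, 2.0, 3.0], "2000": [4.0, 5.0, 6.0]},
    index=["A", "B", "C"],
)

# Unpack the tuple into the number of rows and the number of columns.
n_rows, n_cols = toy.shape

print(n_rows, n_cols)  # 3 2
print(len(toy))        # 3 (len() counts rows only)
```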

`.head()` is a *method*: you call it on a DataFrame with a dot, followed by parentheses.
It returns the first N rows of the DataFrame (5 if no argument is given).

In [3]:
df.head(3)

Total population,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
Abkhazia,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,3280000.0,3280000.0,3323519.0,3448982.0,3625022.0,3810047.0,3973968.0,4169690.0,4419695.0,4710171.0,...,25183615.0,25877544.0,26528741.0,27207291.0,27962207.0,28809167.0,29726803.0,30682500.0,31627506.0,32526562.0
Akrotiri and Dhekelia,,,,,,,,,,,...,15700.0,15700.0,15700.0,,,,,,,
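
As a small sketch of how the argument to `.head()` behaves, the example below uses a hypothetical ten-row DataFrame named `toy`. It also shows `.tail()`, the counterpart that returns rows from the end.

```python
import pandas as pd

# Ten made-up rows with the default integer index 0..9.
toy = pd.DataFrame({"value": range(10)})

first_default = toy.head()   # no argument: first 5 rows
first_three = toy.head(3)    # first 3 rows
last_two = toy.tail(2)       # last 2 rows

print(len(first_default))          # 5
print(first_three.index.tolist())  # [0, 1, 2]
print(last_two.index.tolist())     # [8, 9]
```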


### 3. Selecting Rows and Columns
Match the Python commands with the descriptions below.

* remove rows with missing values
* select a single row
* inspect column labels
* select multiple columns
* select rows by position
* select rows that match a condition
* select multiple rows
* select a single column
* select values in a given range
* select rows and columns by position
* inspect row labels
* select rows and columns

Create new Markdown cells in the notebook to have a heading for each command.

In [5]:
df.index #inspect row labels

Index(['Abkhazia', 'Afghanistan', 'Akrotiri and Dhekelia', 'Albania',
       'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla',
       'Antigua and Barbuda',
       ...
       'British Indian Ocean Territory', 'Clipperton',
       'French Southern and Antarctic Lands', 'Gaza Strip',
       'Heard and McDonald Islands', 'Northern Marianas',
       'South Georgia and the South Sandwich Islands',
       'US Minor Outlying Islands', 'Virgin Islands', 'West Bank'],
      dtype='object', name='Total population', length=275)

In [6]:
df.columns #inspect column labels

Index(['1800', '1810', '1820', '1830', '1840', '1850', '1860', '1870', '1880',
       '1890', '1900', '1910', '1920', '1930', '1940', '1950', '1951', '1952',
       '1953', '1954', '1955', '1956', '1957', '1958', '1959', '1960', '1961',
       '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970',
       '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979',
       '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988',
       '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997',
       '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
       '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'],
      dtype='object')

In [7]:
df['2015'] #select a single column

Total population
Abkhazia                                               NaN
Afghanistan                                     32526562.0
Akrotiri and Dhekelia                                  NaN
Albania                                          2896679.0
Algeria                                         39666519.0
                                                   ...    
Northern Marianas                                      NaN
South Georgia and the South Sandwich Islands           NaN
US Minor Outlying Islands                              NaN
Virgin Islands                                         NaN
West Bank                                              NaN
Name: 2015, Length: 275, dtype: float64

In [None]:
df[['1900', '1950', '2000']] #select multiple columns
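
The difference between single and double brackets is easy to miss: a single label returns a Series, while a list of labels returns a DataFrame, even if the list has only one entry. A minimal sketch with a hypothetical `toy` table:

```python
import pandas as pd

# Made-up values; the column labels are strings, as in the population table.
toy = pd.DataFrame(
    {"1900": [1.0, 2.0], "2000": [3.0, 4.0]},
    index=["A", "B"],
)

one_column = toy["2000"]    # single label -> Series
as_frame = toy[["2000"]]    # list of labels -> DataFrame with one column

print(type(one_column).__name__)         # Series
print(type(as_frame).__name__)           # DataFrame
print(one_column.shape, as_frame.shape)  # (2,) (2, 1)
```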

In [8]:
df.loc['Estonia'] #select a single row

1800     334136.0
1810     334136.0
1820     342427.0
1830     366799.0
1840     402035.0
          ...    
2011    1328068.0
2012    1324040.0
2013    1320050.0
2014    1316203.0
2015    1312558.0
Name: Estonia, Length: 81, dtype: float64

In [None]:
df.loc[['Japan', 'China', 'Brazil']] #select multiple rows

In [None]:
df.loc['Croatia', '2000'] #select rows and columns
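
What `.loc` returns depends on what is passed for rows and columns: two single labels give one value, lists give a smaller table, and label slices include both endpoints. The sketch below illustrates this on a hypothetical `toy` DataFrame with made-up numbers.

```python
import pandas as pd

# Made-up 3x3 table with string labels for rows and columns.
toy = pd.DataFrame(
    {"1900": [1.0, 2.0, 3.0], "2000": [4.0, 5.0, 6.0], "2015": [7.0, 8.0, 9.0]},
    index=["A", "B", "C"],
)

single_value = toy.loc["B", "2000"]                # one row label, one column label
sub_table = toy.loc[["A", "C"], ["1900", "2015"]]  # lists of labels
label_slice = toy.loc["A":"B", "1900":"2000"]      # label slices include both ends

print(single_value)                  # 5.0
print(sub_table.shape)               # (2, 2)
print(label_slice.index.tolist())    # ['A', 'B']
print(label_slice.columns.tolist())  # ['1900', '2000']
```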

In [None]:
df.iloc[10:15] #select rows by position

In [None]:
df.iloc[10:15, 75:] #select rows and columns by position
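
Position slices with `.iloc` follow normal Python slicing, so the end position is excluded, whereas label slices with `.loc` include the end label. A short sketch on a hypothetical five-row `toy` table:

```python
import pandas as pd

# Five made-up rows labelled A..E.
toy = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=["A", "B", "C", "D", "E"],
)

by_position = toy.iloc[1:3]  # positions 1 and 2; position 3 is excluded
by_label = toy.loc["B":"D"]  # labels B, C and D; the end label is included
last_row = toy.iloc[-1]      # negative positions count from the end

print(by_position.index.tolist())  # ['B', 'C']
print(by_label.index.tolist())     # ['B', 'C', 'D']
print(last_row["value"])           # 50
```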

In [None]:
df.loc[df['2000'] > 200_000_000] #select rows that match a condition
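
The condition inside the brackets is itself a Series of `True`/`False` values (a boolean mask), and several masks can be combined with `&` (and) and `|` (or), each wrapped in parentheses. The following sketch uses a hypothetical `toy` table with made-up values, including one row of missing data.

```python
import numpy as np
import pandas as pd

# Made-up values; row D has missing data.
toy = pd.DataFrame(
    {"1900": [5.0, 50.0, 500.0, np.nan], "2000": [10.0, 100.0, 1000.0, np.nan]},
    index=["A", "B", "C", "D"],
)

# A comparison yields a boolean mask; comparisons with NaN are False.
mask = toy["2000"] > 50
print(mask.tolist())  # [False, True, True, False]

# Combine masks with & and |; the parentheses are required.
both = toy.loc[(toy["2000"] > 50) & (toy["1900"] < 100)]
either = toy.loc[(toy["2000"] > 500) | (toy["1900"] < 10)]

print(both.index.tolist())    # ['B']
print(either.index.tolist())  # ['A', 'C']
```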

In [None]:
df[df['2000'].between(500_000, 1_000_000)] #select values in a given range
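
By default, `.between()` includes both boundary values, so it is equivalent to combining `>=` and `<=`. A minimal sketch on a hypothetical `toy` table with made-up numbers:

```python
import pandas as pd

# Made-up values chosen so that two of them sit exactly on the boundaries.
toy = pd.DataFrame(
    {"2000": [400.0, 500.0, 750.0, 1000.0, 1200.0]},
    index=["A", "B", "C", "D", "E"],
)

in_range = toy[toy["2000"].between(500, 1000)]

# The same selection written out with two comparisons.
same = toy[(toy["2000"] >= 500) & (toy["2000"] <= 1000)]

print(in_range.index.tolist())  # ['B', 'C', 'D']
print(in_range.equals(same))    # True
```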

In [None]:
df.dropna() #remove rows with missing values
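
Without arguments, `.dropna()` removes every row that contains at least one missing value and returns a new DataFrame; the original is left unchanged. The sketch below shows a few variations on a hypothetical `toy` table with made-up values.

```python
import numpy as np
import pandas as pd

# Made-up table: A is complete, B is partly missing, C is entirely missing.
toy = pd.DataFrame(
    {"1900": [1.0, np.nan, np.nan], "2000": [2.0, 3.0, np.nan]},
    index=["A", "B", "C"],
)

complete_rows = toy.dropna()                # keep rows without any NaN
not_all_missing = toy.dropna(how="all")     # drop rows only if every value is NaN
has_2000 = toy.dropna(subset=["2000"])      # only check the '2000' column
complete_cols = toy.dropna(axis="columns")  # drop columns instead of rows

print(complete_rows.index.tolist())    # ['A']
print(not_all_missing.index.tolist())  # ['A', 'B']
print(has_2000.index.tolist())         # ['A', 'B']
print(complete_cols.shape)             # (3, 0): both columns contain a NaN
print(toy.shape)                       # (3, 2): the original is unchanged
```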

## License
(c) 2017 Kristian Rother.
Distributed under the conditions of the MIT License.