Square brackets [] are used to select columns and subsets of rows from a DataFrame. Each column in a DataFrame is a Series, so when a single column is selected, the returned object is a pandas Series.

In [1]:
import pandas as pd

In [2]:
titanictest = pd.read_csv('http://bit.ly/kaggletest')

In [4]:
titanictest.head()

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


In [5]:
Ages = titanictest['Age']  # a single column is selected

In [6]:
Ages.head()

0    34.5
1    47.0
2    62.0
3    27.0
4    22.0
Name: Age, dtype: float64

In [7]:
type(titanictest['Age'])

pandas.core.series.Series

titanictest['Age'] selects a single column, and the type of the result is a Series, as shown above.

In [8]:
titanictest['Age'].shape

(418,)

shape is an attribute of both pandas Series and DataFrame objects. For a DataFrame it holds the number of rows and columns: (nrows, ncolumns). A pandas Series is 1-dimensional, so only the number of rows is returned.
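As a minimal sketch of the difference in shape, the snippet below uses a small hand-made table instead of the Titanic file. The DataFrame `df` and its values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
df = pd.DataFrame({
    "Age": [34.5, 47.0, 62.0],
    "Sex": ["male", "female", "male"],
})

ages = df["Age"]           # single brackets with one name -> Series

print(ages.shape)          # (3,)   one entry: the number of rows
print(df.shape)            # (3, 2) rows first, then columns
print(ages.ndim, df.ndim)  # 1 2

# A DataFrame shape can be unpacked into its two parts
n_rows, n_cols = df.shape
```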

In [14]:
Age_Gender = titanictest[['Age', 'Sex']]  # multiple columns are selected

To select multiple columns, use two pairs of square brackets: the inner brackets define a Python list of column names, and the outer brackets select the data from the pandas DataFrame.

In [15]:
type(Age_Gender)

pandas.core.frame.DataFrame

The returned data type is a pandas DataFrame.

In [16]:
Age_Gender.shape

(418, 2)

Age_Gender is a DataFrame with 418 rows and 2 columns, so it is 2-dimensional (rows and columns).
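The role of the inner list can be made explicit by building it separately. This is only a sketch on made-up data; `df`, `columns`, and the other variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
df = pd.DataFrame({
    "Age": [34.5, 47.0, 62.0],
    "Sex": ["male", "female", "male"],
    "Pclass": [3, 3, 2],
})

columns = ["Age", "Sex"]      # an ordinary Python list of column names
subset = df[columns]          # same result as df[["Age", "Sex"]]

one_col_frame = df[["Age"]]   # a list with one name still gives a DataFrame
one_col_series = df["Age"]    # a bare name gives a Series

print(type(subset).__name__, subset.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
print(type(one_col_series).__name__, one_col_series.shape)
```

Passing a list, even a list with a single name, is what keeps the result 2-dimensional.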

###### Filter specific rows from a DataFrame

In [17]:
below_30 = titanictest[titanictest['Age'] < 30]

To select rows based on a conditional expression, use a condition inside the selection brackets []. 

In [18]:
titanictest['Age'] < 30

0      False
1      False
2      False
3       True
4       True
       ...  
413    False
414    False
415    False
416    False
417    False
Name: Age, Length: 418, dtype: bool

The condition inside the selection brackets, titanictest['Age'] < 30, checks for which rows the Age column has a value smaller than 30.
The output of the conditional expression (<, but also ==, !=, >, <=, and so on, would work) is a pandas Series of boolean values (True or False) with the same number of rows as the original DataFrame.
This boolean Series is placed between the selection brackets [] of the DataFrame to filter it: only rows where the value is True are selected.

In [20]:
below_30.shape

(185, 11)

Checking the shape attribute of the resulting DataFrame below_30 shows that 185 rows satisfy the condition. Applying the filter titanictest['Age'] < 30 changes only the row count; the columns remain as before.
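The filtering steps above can be sketched on a small made-up table. The names `df`, `mask`, and `young` are assumptions for illustration. The example also includes a missing age to show that a comparison against a missing value evaluates to False, so such rows are not selected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one age is deliberately missing
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [34.5, 27.0, np.nan, 22.0],
})

mask = df["Age"] < 30   # boolean Series with one entry per row of df
young = df[mask]        # keeps only the rows where mask is True

print(mask.tolist())    # [False, True, False, True]
print(mask.sum())       # 2, the number of True values
print(young.shape)      # (2, 2): fewer rows, same columns
```

Summing the mask counts the True values, which gives the number of matching rows without building the filtered DataFrame.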

In [21]:
titanictest["Pclass"].isin([2, 3])

0       True
1       True
2       True
3       True
4       True
       ...  
413     True
414    False
415     True
416     True
417     True
Name: Pclass, Length: 418, dtype: bool

The isin() conditional function returns True for each row whose value is in the provided list. To filter rows based on such a function, use it inside the selection brackets []. Here, the condition titanictest["Pclass"].isin([2, 3]) checks for which rows the Pclass column is either 2 or 3.

In [22]:
class23 = titanictest[titanictest["Pclass"].isin([2, 3])]

In [23]:
class23.head()

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


In [24]:
class_23 = titanictest[(titanictest["Pclass"] == 2) | (titanictest["Pclass"] == 3)] # Equivalent to titanictest[titanictest["Pclass"].isin([2, 3])]

When combining multiple conditional statements, each condition must be surrounded by parentheses (). Use the | (or) and & (and) operators rather than the Python keywords or and and.
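A short sketch of combining conditions on made-up data follows; `df` and the result names are illustrative assumptions. The parentheses are needed because | and & bind more tightly than comparison operators such as == and <.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
df = pd.DataFrame({
    "Pclass": [3, 1, 2, 3, 1],
    "Age": [34.5, 47.0, 62.0, 27.0, 22.0],
})

# OR: class 2 or class 3
either = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]

# The same selection written with isin()
via_isin = df[df["Pclass"].isin([2, 3])]

# AND: class 3 and younger than 30
both = df[(df["Pclass"] == 3) & (df["Age"] < 30)]

# NOT: ~ inverts a boolean Series
not_first = df[~(df["Pclass"] == 1)]

print(either.equals(via_isin))  # True
print(both.index.tolist())      # [3]
```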

###### Select only non-null rows in a DataFrame

In [29]:
cabin_not_na = titanictest[titanictest["Cabin"].notna()]

In [30]:
cabin_not_na

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
12,904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0,21228,82.2667,B45,S
14,906,1,"Chaffee, Mrs. Herbert Fuller (Carrie Constance...",female,47.0,1,0,W.E.P. 5734,61.1750,E31,S
24,916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48.0,1,3,PC 17608,262.3750,B57 B59 B63 B66,C
26,918,1,"Ostby, Miss. Helene Ragnhild",female,22.0,0,1,113509,61.9792,B36,C
28,920,1,"Brady, Mr. John Bertram",male,41.0,0,0,113054,30.5000,A21,S
...,...,...,...,...,...,...,...,...,...,...,...
404,1296,1,"Frauenthal, Mr. Isaac Gerald",male,43.0,1,0,17765,27.7208,D40,C
405,1297,2,"Nourney, Mr. Alfred (Baron von Drachstedt"")""",male,20.0,0,0,SC/PARIS 2166,13.8625,D38,C
407,1299,1,"Widener, Mr. George Dunton",male,50.0,1,1,113503,211.5000,C80,C
411,1303,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37.0,1,0,19928,90.0000,C78,Q


The notna() conditional function returns True for each row whose value is not null. As such, it can be combined with the selection brackets [] to filter the data table.
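As a minimal sketch, notna() and its counterpart isna() can be compared on a small made-up table. The DataFrame `df` and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; two cabins are missing
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Cabin": ["B45", np.nan, None],
})

has_cabin = df["Cabin"].notna()         # True where a value is present
cabin_known = df[has_cabin]             # rows with a cabin
cabin_missing = df[df["Cabin"].isna()]  # rows without a cabin

print(has_cabin.tolist())               # [True, False, False]
print(len(cabin_known), len(cabin_missing))
```

Both NaN and None count as missing here, and the two filters together account for every row of the original table.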

###### Select specific rows and columns from a DataFrame

In [33]:
name_below_30 = titanictest.loc[titanictest['Age'] < 30, 'Name']

In [34]:
name_below_30

3                                  Wirz, Mr. Albert
4      Hirvonen, Mrs. Alexander (Helga E Lindqvist)
5                        Svensson, Mr. Johan Cervin
7                      Caldwell, Mr. Albert Francis
8         Abrahim, Mrs. Joseph (Sophie Halaut Easu)
                           ...                     
403                          Carrau, Mr. Jose Pedro
405    Nourney, Mr. Alfred (Baron von Drachstedt")"
406                       Ware, Mr. William Jeffery
409                       Peacock, Miss. Treasteall
412                  Henriksson, Miss. Jenny Lovisa
Name: Name, Length: 185, dtype: object

To select a subset of both rows and columns, use the loc/iloc operators. With loc/iloc, the part before the comma specifies the rows you want, and the part after the comma specifies the columns. The selection brackets alone are not enough here; loc or iloc must be placed in front of them, as above.
.loc is primarily label based and .iloc is primarily integer position based.
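The rows-before-comma, columns-after-comma pattern with loc can be sketched as follows. The data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic table
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [34.5, 27.0, 62.0, 22.0],
    "Sex": ["male", "male", "female", "female"],
})

# Rows: boolean condition. Columns: a single label -> Series
names = df.loc[df["Age"] < 30, "Name"]

# Columns: a list of labels -> DataFrame
name_sex = df.loc[df["Age"] < 30, ["Name", "Sex"]]

# A bare colon keeps every column
all_cols = df.loc[df["Age"] < 30, :]

print(names.tolist())   # ['B', 'D']
print(name_sex.shape)   # (2, 2)
```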

In [37]:
titanictest.iloc[9:15, 2:7]

Unnamed: 0,Name,Sex,Age,SibSp,Parch
9,"Davies, Mr. John Samuel",male,21.0,2,0
10,"Ilieff, Mr. Ylio",male,,0,0
11,"Jones, Mr. Charles Cresson",male,46.0,0,0
12,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23.0,1,0
13,"Howard, Mr. Benjamin",male,63.0,1,0
14,"Chaffee, Mrs. Herbert Fuller (Carrie Constance...",female,47.0,1,0


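A subset of both rows and columns is made in one go using the iloc operator in front of the selection brackets []. Use iloc when rows and/or columns are selected based on their position in the table. Positions start at 0 and the end of a slice is excluded, so 9:15 selects rows 9 through 14 and 2:7 selects the five columns from Name to Parch.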

Select specific rows and/or columns using loc when using the row and column names.

Select specific rows and/or columns using iloc when using the positions in the table.
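The difference between labels and positions is easier to see when the row labels are not 0, 1, 2, and so on. The sketch below uses a made-up table with a custom index; `df` and its labels are assumptions for illustration. Note that a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

# Hypothetical toy data whose row labels differ from the row positions
df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Age": [34.5, 27.0, 62.0, 22.0]},
    index=[10, 20, 30, 40],
)

# Position based: rows at positions 1 and 2, columns at positions 0 and 1
by_position = df.iloc[1:3, 0:2]

# Label based: row labels 20 through 30, column labels Name through Age
by_label = df.loc[20:30, "Name":"Age"]

print(by_position.equals(by_label))     # True
print(df.iloc[0, 1], df.loc[10, "Age"]) # 34.5 34.5
```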