## Selecting Data from a Data Frame

As with Python containers (lists, dictionaries) and NumPy arrays (1-dimensional or 2-dimensional), you can select the data you want from a DataFrame. Use the values attribute to get all of the data, or use indexes (labels) to select specific data.

Note how np.nan is used in the example below to mark the missing unemployment rates.

In [1]:
import pandas as pd
import numpy as np

# Create a DataFrame of population data to use in the selection examples.
# It is built from Series of city names, population figures (stored in the 'density' column),
# states and unemployment rates.
s_city = pd.Series(['Sydney','Melbourne','Brisbane','Perth','Adelaide',
                    'Gold Coast','Canberra','Newcastle','Wollongong','Logan City'])
s_density = pd.Series([4627345, 4246375, 2189878, 1896548,1225235, 
                       591473, 367752, 308308, 292190, 282673])
s_state = pd.Series(['New South Wales', 'Victoria','Queensland', 'Western Australia','South Australia',
                     'Queensland','Australian Capital Territory','New South Wales','New South Wales', 
                     'Queensland'])             
s_unemployed_rate = pd.Series([4.3, 4.9, np.nan, np.nan, 7.3, 6.4, 3.5, 4.3, 4.3, 6.4])

df_pop = pd.DataFrame({'cities':s_city,'density':s_density, 'state':s_state, 
                       'unemployed_rate':s_unemployed_rate})
df_pop

|   | cities | density | state | unemployed_rate |
|---|--------|---------|-------|-----------------|
| 0 | Sydney | 4627345 | New South Wales | 4.3 |
| 1 | Melbourne | 4246375 | Victoria | 4.9 |
| 2 | Brisbane | 2189878 | Queensland | NaN |
| 3 | Perth | 1896548 | Western Australia | NaN |
| 4 | Adelaide | 1225235 | South Australia | 7.3 |
| 5 | Gold Coast | 591473 | Queensland | 6.4 |
| 6 | Canberra | 367752 | Australian Capital Territory | 3.5 |
| 7 | Newcastle | 308308 | New South Wales | 4.3 |
| 8 | Wollongong | 292190 | New South Wales | 4.3 |
| 9 | Logan City | 282673 | Queensland | 6.4 |
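Because missing entries are stored as np.nan, they can be detected and counted after the fact. The following is a minimal sketch, assuming a hypothetical three-row Series (`rates`) that mirrors part of the unemployed_rate column; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the unemployment column, indexed by city name
rates = pd.Series([4.3, np.nan, 7.3], index=['Sydney', 'Brisbane', 'Adelaide'])

missing_mask = rates.isna()          # True where the value is NaN
n_missing = int(missing_mask.sum())  # number of missing entries
mean_rate = rates.mean()             # NaN values are skipped by default

print(missing_mask)
print(n_missing, mean_rate)
```

Most pandas reductions such as mean() skip NaN by default, which is one reason np.nan is a convenient placeholder for missing numbers.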


## Selecting the Data

If you want to select all the data as a 2-dimensional array, use the values attribute.

If you want the values of a single column, use either of the following forms:

< DataFrame_name >.< name_column > or
< DataFrame_name >['< name_column >']

In [2]:
# values property gives all data as an array
df_pop.values

array([['Sydney', 4627345, 'New South Wales', 4.3],
       ['Melbourne', 4246375, 'Victoria', 4.9],
       ['Brisbane', 2189878, 'Queensland', nan],
       ['Perth', 1896548, 'Western Australia', nan],
       ['Adelaide', 1225235, 'South Australia', 7.3],
       ['Gold Coast', 591473, 'Queensland', 6.4],
       ['Canberra', 367752, 'Australian Capital Territory', 3.5],
       ['Newcastle', 308308, 'New South Wales', 4.3],
       ['Wollongong', 292190, 'New South Wales', 4.3],
       ['Logan City', 282673, 'Queensland', 6.4]], dtype=object)
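The dtype=object in the output above arises because the DataFrame mixes text and numeric columns, so NumPy falls back to a generic object array. A minimal sketch of the difference, assuming a hypothetical two-row frame (`df_small`) and using the `to_numpy()` method, which the pandas documentation recommends over `values`:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the same kinds of columns as df_pop
df_small = pd.DataFrame({
    'cities': ['Sydney', 'Brisbane'],
    'density': [4627345, 2189878],
    'unemployed_rate': [4.3, np.nan],
})

# Text and numeric columns together produce an object array
all_values = df_small.values

# Selecting only the numeric columns yields a float64 array instead
numeric_values = df_small[['density', 'unemployed_rate']].to_numpy()

print(all_values.dtype, numeric_values.dtype)
```

Selecting the numeric columns first keeps the array in a numeric dtype, which matters if the result is passed on to NumPy arithmetic.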

In [3]:
# selecting only cities
df_pop.cities

0        Sydney
1     Melbourne
2      Brisbane
3         Perth
4      Adelaide
5    Gold Coast
6      Canberra
7     Newcastle
8    Wollongong
9    Logan City
Name: cities, dtype: object

In [5]:
#Selecting only unemployment rate
df_pop['unemployed_rate']

0    4.3
1    4.9
2    NaN
3    NaN
4    7.3
5    6.4
6    3.5
7    4.3
8    4.3
9    6.4
Name: unemployed_rate, dtype: float64

In [7]:
# Typing <DataFrame_Name>. and pressing the Tab key lists the DataFrame's attributes and methods,
# as well as the names of its columns.
df_pop.density

0    4627345
1    4246375
2    2189878
3    1896548
4    1225235
5     591473
6     367752
7     308308
8     292190
9     282673
Name: density, dtype: int64
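The two selection forms are interchangeable for simple column names, but attribute access does not work for every column. The sketch below assumes a hypothetical frame (`df_demo`) with made-up values, containing one column whose name clashes with a DataFrame method and one whose name contains a space.

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only
df_demo = pd.DataFrame({
    'cities': ['Sydney', 'Perth'],
    'count': [10, 20],              # same name as the DataFrame.count method
    'unemployed rate': [4.3, 5.1],  # contains a space
})

# For a simple name, both forms return the same Series
same = df_demo.cities.equals(df_demo['cities'])

# Attribute access finds the method, not the column
attr_is_method = callable(df_demo.count)

# Bracket access always returns the column
count_col = df_demo['count']
rate_col = df_demo['unemployed rate']

print(same, attr_is_method)
print(count_col.tolist(), rate_col.tolist())
```

Bracket notation is therefore the safer default in scripts, while attribute access remains handy for quick interactive exploration with tab completion.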