# DataFrame as Dictionary

In [3]:
import pandas as pd

In [4]:
area = pd.Series({
    'California': 423967, 'Texas': 695662,
    'Florida': 170312, 'New York': 141297,
    'Pennsylvania': 119280
    }
)

pop = pd.Series({
    'California': 39538223, 'Texas': 29145505,
    'Florida': 21538187, 'New York': 20201249,
    'Pennsylvania': 13002700
    }
)
data = pd.DataFrame({'area':area, 'pop':pop})
data

                area       pop
California    423967  39538223
Texas         695662  29145505
Florida       170312  21538187
New York      141297  20201249
Pennsylvania  119280  13002700


We can use dictionary-style indexing with a column name to access a single column of the DataFrame:

In [5]:
data['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64

Equivalently, we can use attribute-style access with column names that are strings:

In [6]:
data.area

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64
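
Attribute-style access is convenient, but it does not work for every column name. The following is a minimal sketch, using a trimmed two-row copy of the example data (an assumption made only to keep the block self-contained), of what happens when a column name collides with an existing DataFrame method such as `pop()`:

```python
import pandas as pd

# Trimmed two-row copy of the example frame (illustrative only).
data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [39538223, 29145505]},
    index=['California', 'Texas'],
)

# 'area' does not collide with any DataFrame attribute, so both forms agree.
same_area = data.area.equals(data['area'])

# 'pop' collides with the DataFrame.pop() method, so data.pop is the method,
# not the column.
pop_is_method = callable(data.pop)

# Dictionary-style access still returns the column.
pop_column = data['pop']

print(same_area, pop_is_method)
```

For this reason, dictionary-style access is the safer choice whenever a column name might shadow a method, or when assigning a new column.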

As with the Series objects discussed earlier, this dictionary-style syntax can also be
used to modify the object, in this case by adding a new column:

In [7]:
data['density'] = data['pop'] / data['area']
data

                area       pop     density
California    423967  39538223   93.257784
Texas         695662  29145505   41.896072
Florida       170312  21538187  126.463121
New York      141297  20201249  142.970120
Pennsylvania  119280  13002700  109.009893


A DataFrame can also be viewed as an enhanced two-dimensional array. We can examine
the raw underlying data array using the values attribute:

In [8]:
data.values

array([[4.23967000e+05, 3.95382230e+07, 9.32577842e+01],
       [6.95662000e+05, 2.91455050e+07, 4.18960717e+01],
       [1.70312000e+05, 2.15381870e+07, 1.26463121e+02],
       [1.41297000e+05, 2.02012490e+07, 1.42970120e+02],
       [1.19280000e+05, 1.30027000e+07, 1.09009893e+02]])
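
Note that the integer columns appear as floats in this array. A minimal sketch of why, again using a trimmed two-row copy of the example data as an assumption: a NumPy array has a single dtype, so mixing integer and float columns forces an upcast. The sketch uses `to_numpy()`, which returns the same contents as `values` here.

```python
import numpy as np
import pandas as pd

# Trimmed two-row copy of the example frame (illustrative only).
data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [39538223, 29145505]},
    index=['California', 'Texas'],
)
data['density'] = data['pop'] / data['area']

# Each column keeps its own dtype inside the DataFrame...
print(data.dtypes)

# ...but the combined array needs one common dtype for all columns.
arr = data.to_numpy()
print(arr.dtype)
```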

With this picture in mind, many familiar array-like operations can be done on the
DataFrame itself. For example, we can transpose the full DataFrame to swap rows and
columns:

In [9]:
data.T

           California         Texas       Florida      New York  Pennsylvania
area     4.239670e+05  6.956620e+05  1.703120e+05  1.412970e+05  1.192800e+05
pop      3.953822e+07  2.914550e+07  2.153819e+07  2.020125e+07  1.300270e+07
density  9.325778e+01  4.189607e+01  1.264631e+02  1.429701e+02  1.090099e+02
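
The transposed output shows every value as a float. As a rough illustration (on a trimmed two-row copy of the example data, which is an assumption of this sketch), we can compare the column dtypes before and after transposing:

```python
import pandas as pd

# Trimmed two-row copy of the example frame (illustrative only).
data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [39538223, 29145505]},
    index=['California', 'Texas'],
)
data['density'] = data['pop'] / data['area']

transposed = data.T

# Before: two integer columns and one float column.
print(data.dtypes)

# After: each new column mixes area, pop, and density values,
# so every column is upcast to float.
print(transposed.dtypes)
```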


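When it comes to indexing a DataFrame, however, dictionary-style column indexing means we cannot treat it exactly like a NumPy array. In particular, passing a single index to an array accesses a row: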

In [10]:
data.values[0]

array([4.23967000e+05, 3.95382230e+07, 9.32577842e+01])

and passing a single “index” to a DataFrame accesses a column:

In [11]:
data['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64
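
To make the contrast concrete, here is a small sketch (using a trimmed two-row copy of the example data as an assumption) showing that an integer passed directly to the DataFrame is looked up as a column label, not as a row position:

```python
import pandas as pd

# Trimmed two-row copy of the example frame (illustrative only).
data = pd.DataFrame(
    {'area': [423967, 695662], 'pop': [39538223, 29145505]},
    index=['California', 'Texas'],
)

first_row = data.values[0]   # NumPy semantics: the first row
area_col = data['area']      # DataFrame semantics: the column named 'area'

# There is no column labeled 0, so this lookup fails with a KeyError.
try:
    data[0]
    raised = False
except KeyError:
    raised = True

print(first_row, raised)
```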

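Thus, for array-style indexing, we need another convention. Using the iloc indexer, we can index
the underlying array as if it were a simple NumPy array (using implicit, position-based indices),
while the DataFrame index and column labels are kept in the result: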

In [12]:
data.iloc[:3, :2]

              area       pop
California  423967  39538223
Texas       695662  29145505
Florida     170312  21538187


Similarly, using the loc indexer we can index the underlying data in an array-like
style but using the explicit index and column names:

In [13]:
data.loc[:'Florida', :'pop']

              area       pop
California  423967  39538223
Texas       695662  29145505
Florida     170312  21538187
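
The two selections above return the same block even though the slices look different. A short sketch of the reason, rebuilding the example frame so that the block is self-contained: positional slices in `iloc` exclude the stop position, while label slices in `loc` include the stop label.

```python
import pandas as pd

# Rebuild the example frame used throughout this section.
data = pd.DataFrame(
    {
        'area': [423967, 695662, 170312, 141297, 119280],
        'pop': [39538223, 29145505, 21538187, 20201249, 13002700],
    },
    index=['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
)
data['density'] = data['pop'] / data['area']

# Positions 0, 1, 2 and columns 0, 1: the stop values 3 and 2 are excluded.
by_position = data.iloc[:3, :2]

# Labels up to and including 'Florida' and 'pop'.
by_label = data.loc[:'Florida', :'pop']

print(by_position.equals(by_label))
```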


Any of the familiar NumPy-style data access patterns can be used within these indexers.
For example, in the loc indexer we can combine masking and fancy indexing as
follows:

In [14]:
data.loc[data.density > 120, ['pop', 'density']]

               pop     density
Florida   21538187  126.463121
New York  20201249  142.970120
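
It can help to split this expression into its two parts. In the sketch below, the names `mask` and `cols` are introduced only for illustration: the row selector is a Boolean Series (masking) and the column selector is a list of labels (fancy indexing).

```python
import pandas as pd

# Rebuild the example frame used throughout this section.
data = pd.DataFrame(
    {
        'area': [423967, 695662, 170312, 141297, 119280],
        'pop': [39538223, 29145505, 21538187, 20201249, 13002700],
    },
    index=['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
)
data['density'] = data['pop'] / data['area']

mask = data['density'] > 120   # Boolean Series, one entry per row
cols = ['pop', 'density']      # list of column labels

result = data.loc[mask, cols]
print(mask)
print(result)
```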


Any of these indexing conventions may also be used to set or modify values; this is
done in the standard way that you might be accustomed to from working with
NumPy:

In [15]:
data.iloc[0, 2] = 90
data

                area       pop     density
California    423967  39538223   90.000000
Texas         695662  29145505   41.896072
Florida       170312  21538187  126.463121
New York      141297  20201249  142.970120
Pennsylvania  119280  13002700  109.009893
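
The same assignment can be written with labels instead of positions. The following sketch works on copies (the names `by_label` and `by_position` are illustrative) so that the original frame is left unchanged:

```python
import pandas as pd

# Rebuild the example frame used throughout this section.
data = pd.DataFrame(
    {
        'area': [423967, 695662, 170312, 141297, 119280],
        'pop': [39538223, 29145505, 21538187, 20201249, 13002700],
    },
    index=['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
)
data['density'] = data['pop'] / data['area']

# Label-based assignment with loc.
by_label = data.copy()
by_label.loc['California', 'density'] = 90

# Position-based assignment with iloc (row 0, column 2).
by_position = data.copy()
by_position.iloc[0, 2] = 90

print(by_label.equals(by_position))
```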


# Additional Indexing Conventions

There are a couple of extra indexing conventions that may seem at odds with the preceding discussion but can be useful in practice. First, while indexing refers to columns, slicing refers to rows:

In [18]:
data["Florida":"New York"]

            area       pop     density
Florida   170312  21538187  126.463121
New York  141297  20201249  142.970120


Such slices can also refer to rows by number rather than by index:

In [19]:
data[1:3]

           area       pop     density
Texas    695662  29145505   41.896072
Florida  170312  21538187  126.463121


Similarly, direct masking operations are interpreted row-wise rather than column-wise:

In [20]:
data[data.density > 120]

            area       pop     density
Florida   170312  21538187  126.463121
New York  141297  20201249  142.970120
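
Each of these shorthand forms has an explicit counterpart using loc or iloc. As a sketch (the variable names are illustrative), we can check that the results match on the example data:

```python
import pandas as pd

# Rebuild the example frame used throughout this section.
data = pd.DataFrame(
    {
        'area': [423967, 695662, 170312, 141297, 119280],
        'pop': [39538223, 29145505, 21538187, 20201249, 13002700],
    },
    index=['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
)
data['density'] = data['pop'] / data['area']

# Label slice on rows: same rows as loc with the same slice.
rows_by_label = data['Florida':'New York']

# Integer slice on rows: same rows as iloc with the same slice.
rows_by_position = data[1:3]

# Boolean mask: selects rows, same as loc with the mask.
rows_by_mask = data[data['density'] > 120]

print(rows_by_label.equals(data.loc['Florida':'New York']))
print(rows_by_position.equals(data.iloc[1:3]))
print(rows_by_mask.equals(data.loc[data['density'] > 120]))
```

Writing the loc or iloc form explicitly tends to make the intent clearer, since a reader does not have to remember which convention applies.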
