In [2]:
import pandas as pd

# Renaming Columns

There are many use cases for renaming the columns of a dataframe. You may simply change your mind while working in a Python shell, or you may want to rename columns whose names were inferred when reading in a CSV. In any event, let's assume that you have an existing dataframe and you would like to rename its columns:

In [3]:
cols = ['old_column_1', 'old_column_2', 'old_column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
print(data_df.head())

   old_column_1 old_column_2  old_column_3
0             1            a           1.2
1             8            z           3.5
2             4            s          34.7
3            23            b          12.1


But suppose you also applied some nifty transformations to this dataframe before renaming the columns, for example setting one column as the index and then resetting that index:

In [8]:
data_df = data_df.set_index('old_column_3').reset_index()
print(data_df.head())

   old_column_3  old_column_1 old_column_2
0           1.2             1            a
1           3.5             8            z
2          34.7             4            s
3          12.1            23            b


As you can see, my transformations changed the column order: `reset_index` moved `old_column_3` back in as the first column. There are innumerable other ways to produce such a scenario, but the important thing is this: *if you bang out a bunch of transformations with pandas dataframes, the resulting column order may not be what you expect.* As such, you could fall into the **first pitfall**: *brute-force renaming columns with a list of column names.*

Let me illustrate.  In the above case, I want to rename `old_column_1` to `new_column_1`, etc., and I might assume that the column ordering was preserved in my transformations.  Under this assumption, I might try to rename the columns using:

In [9]:
data_df.columns = ['new_column_1', 'new_column_2', 'new_column_3']
print(data_df.head())

   new_column_1  new_column_2 new_column_3
0           1.2             1            a
1           3.5             8            z
2          34.7             4            s
3          12.1            23            b


After this assignment, `old_column_3`'s data is labelled `new_column_1`, `old_column_1`'s data is labelled `new_column_2`, and `old_column_2`'s data is labelled `new_column_3`. Worse, any later operation on `new_column_1`, which I expect to hold `old_column_1`'s data, would actually operate on `old_column_3`'s data. This could be a huge problem: if `old_column_1` represented, say, company revenue and `old_column_3` represented company costs, you might report negative profits to your boss when the company actually made money.
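If you do want to rename by position, one way to keep it from silently mislabelling data is to check the current column order before assigning. The sketch below uses a hypothetical helper, `set_columns_checked` (not a pandas function), that refuses to rename unless `df.columns` matches the order you expect:

```python
import pandas as pd

def set_columns_checked(df, expected_old, new_names):
    """Rename columns positionally, but only if the current column order
    matches `expected_old` exactly. Hypothetical helper, not a pandas API."""
    actual = list(df.columns)
    if actual != list(expected_old):
        raise ValueError(f"column order is {actual}, expected {list(expected_old)}")
    out = df.copy()  # leave the caller's dataframe untouched
    out.columns = list(new_names)
    return out

cols = ['old_column_1', 'old_column_2', 'old_column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('old_column_3').reset_index()  # reorders the columns

try:
    # Fails loudly because reset_index put old_column_3 first.
    set_columns_checked(data_df, cols, ['new_column_1', 'new_column_2', 'new_column_3'])
except ValueError as err:
    print(err)
```

A check like this only catches the problem; the dictionary-based `rename` approach avoids the ordering question altogether.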

A better approach is to use [pandas.DataFrame.rename](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.rename.html). `rename` lets you pass an explicit dictionary mapping old column names to new ones, so each column is renamed correctly regardless of its position:

In [11]:
cols = ['old_column_1', 'old_column_2', 'old_column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('old_column_3').reset_index()

newcols = {
    'old_column_1': 'new_column_1', 
    'old_column_2': 'new_column_2', 
    'old_column_3': 'new_column_3'
}
data_df.rename(columns=newcols, inplace=True)
print(data_df.head())

   new_column_3  new_column_1 new_column_2
0           1.2             1            a
1           3.5             8            z
2          34.7             4            s
3          12.1            23            b
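A minimal sketch of the same rename on a recent pandas version, assuming you prefer assigning the result over `inplace=True`. It also passes `errors='raise'` (not available in the 0.17 release linked above), so a mapping key that is not an actual column raises a `KeyError` instead of being silently ignored, which is what `rename` does by default:

```python
import pandas as pd

cols = ['old_column_1', 'old_column_2', 'old_column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('old_column_3').reset_index()

newcols = {
    'old_column_1': 'new_column_1',
    'old_column_2': 'new_column_2',
    'old_column_3': 'new_column_3',
}

# rename returns a new DataFrame; each column is matched by name, not position.
renamed = data_df.rename(columns=newcols, errors='raise')
print(renamed.columns.tolist())  # ['new_column_3', 'new_column_1', 'new_column_2']

# A typo in a mapping key would otherwise be ignored without any warning.
try:
    data_df.rename(columns={'old_colum_2': 'new_column_2'}, errors='raise')
except KeyError as err:
    print('rename failed:', err)
```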


# Indexing / Inserting Rows

Pandas makes it easy for you to set a column of a dataframe as an index.  After doing this, you may want to reference a value in the dataframe at a particular index value or you may want to insert a new row with a corresponding new index value.  These operations can trip you up.  For example, let's take a similar dataframe to the one above:

In [17]:
cols = ['column_1', 'column_2', 'column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
print(data_df.head())

   column_1 column_2  column_3
0         1        a       1.2
1         8        z       3.5
2         4        s      34.7
3        23        b      12.1


Now let's set `column_3` as the index.

In [18]:
data_df.set_index('column_3', inplace=True)
print(data_df.head())

          column_1 column_2
column_3                   
1.2              1        a
3.5              8        z
34.7             4        s
12.1            23        b


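If we want to see the values in the row with index `34.7`, we could use `data_df.ix[34.7]`. (`.ix` was deprecated in pandas 0.20.0 and removed in 1.0.0, so the `.ix` cells in this section only run on older pandas; `.loc` and `.iloc` replace it.)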

In [19]:
data_df.ix[34.7]

column_1    4
column_2    s
Name: 34.7, dtype: object
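In current pandas, the two roles `.ix` mixed together are split between `.loc` (labels only) and `.iloc` (positions only). A short sketch, rebuilding the dataframe from above, shows both reaching the `34.7` row, and shows that an integer passed to `.loc` on this float index is looked up as a label rather than a position:

```python
import pandas as pd

cols = ['column_1', 'column_2', 'column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('column_3')

# .loc is purely label-based: 34.7 is looked up as an index value.
by_label = data_df.loc[34.7]

# .iloc is purely position-based: 2 means "the third row", whatever its label.
by_position = data_df.iloc[2]

print(by_label['column_2'])     # s
print(by_position['column_2'])  # s

# On this float index, .loc[3] searches for the label 3.0, which does not exist.
try:
    data_df.loc[3]
except KeyError:
    print('no row labelled 3')
```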

Based on the above, it seems reasonable that we could insert a new row with index 4.5 like this:

In [20]:
data_df.ix[4.5] = [17, 'e']

In [21]:
print(data_df)

          column_1 column_2
column_3                   
1.2              1        a
3.5              8        z
34.7             4        s
12.1            23        b
4.5             17        e
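The same insertion can be written with the label-based `.loc` indexer: assigning to a label that does not exist yet appends a row (pandas calls this setting with enlargement). A sketch, rebuilding the dataframe from above, that also makes explicit what the output hints at: the new row lands at the end rather than in sorted order, and `sort_index()` restores the order if you need it:

```python
import pandas as pd

cols = ['column_1', 'column_2', 'column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('column_3')

# 4.5 is not an existing label, so .loc appends a new row under it.
data_df.loc[4.5] = [17, 'e']

# The new row is appended at the end, not in sorted position.
print(data_df.index.tolist())  # [1.2, 3.5, 34.7, 12.1, 4.5]

# sort_index returns a copy ordered by index value.
print(data_df.sort_index().index.tolist())  # [1.2, 3.5, 4.5, 12.1, 34.7]
```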


However, if we want to insert a new row whose index value is an integer, e.g., 3, this approach runs into problems. Running:

In [22]:
data_df.ix[3] = [9, 'y']

In [23]:
print(data_df)

          column_1 column_2
column_3                   
1.2              1        a
3.5              8        z
34.7             4        s
12.1             9        y
4.5             17        e


Instead of adding a row labelled 3, `.ix[3]` treated 3 as a position and overwrote the fourth row (index 12.1). So, our **second pitfall** is: *referencing a row of a numeric index by its index value and accidentally overwriting a different row of the dataframe.* This can be avoided by using the label-based `.loc` indexer instead of `.ix`:

In [24]:
data_df.loc[3] = [9, 'y']

In [25]:
print(data_df)

          column_1 column_2
column_3                   
1.2              1        a
3.5              8        z
34.7             4        s
12.1             9        y
4.5             17        e
3.0              9        y
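To guard against the second pitfall, one option is to check whether a label already exists before assigning to it with `.loc`, and to reach for `.iloc` only when you genuinely mean a position. The helper `insert_row_strict` below is hypothetical (not a pandas API), a minimal sketch of that check:

```python
import pandas as pd

def insert_row_strict(df, label, values):
    """Add a row under `label` with .loc, refusing to overwrite an existing row.
    Hypothetical helper for illustration; not part of pandas."""
    if label in df.index:
        raise KeyError(f"index label {label!r} already exists")
    df.loc[label] = values  # label-based: appends a new row

cols = ['column_1', 'column_2', 'column_3']
data_df = pd.DataFrame([[1, 'a', 1.2], [8, 'z', 3.5], [4, 's', 34.7], [23, 'b', 12.1]], columns=cols)
data_df = data_df.set_index('column_3')

# 3 is not an existing label, so a new row labelled 3.0 is appended.
insert_row_strict(data_df, 3, [9, 'y'])

# Position-based: .iloc[3] always means "the fourth row" (label 12.1 here),
# so this deliberately overwrites that row.
data_df.iloc[3] = [99, 'w']

# Label 3 (stored as 3.0) now exists, so a second insert fails instead of overwriting.
try:
    insert_row_strict(data_df, 3, [0, 'q'])
except KeyError as err:
    print(err)

print(data_df)
```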


See more about indexing and selecting data in the pandas documentation [here](http://pandas.pydata.org/pandas-docs/stable/indexing.html).