Title: Removing Columns
Slug: removing_columns 
Summary: Removing columns and all their associated values. 
Date: 2016-11-27 12:00 
Category: wrangling_data 
Tags: 
Authors: Nate Hall

<a href='http://nbviewer.jupyter.org/github/nathan-hall/nathan-hall.github.io/blob/pelican/content/wrangling_data/editing_column_names.ipynb'>Link to the full jupyter notebook.</a><br/>
<a href='http://nbviewer.jupyter.org/github/nathan-hall/nathan-hall.github.io/blob/pelican/content/wrangling_data/editing_column_names_code.ipynb'>Link to the code only jupyter notebook.</a>

This is a quick introduction to some basic methods for removing columns from a dataset, using the Billboard Hot 100 dataset as an example.

In [1]:
import pandas as pd

In [4]:
#read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv('../data/billboard_weeks_edited.csv')
df.head(1)

Unnamed: 0,year,artist,track,time,genre,entered,peaked,1,2,3,...,67,68,69,70,71,72,73,74,75,76
0,2000,Destiny's Child,Independent Women Part I,3:38,Rock,2000-09-23,2000-11-18,78,63.0,49.0,...,,,,,,,,,,


## Drop a column
The simplest way to remove a column is with the "df.drop" method from pandas. <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html" target="_blank">See more info at the documentation here.</a>

We call .drop on the dataframe we want to change and pass it the name of the column. The "axis" argument says whether that label refers to a row (axis=0, the default) or a column (axis=1). Finally, the "inplace" argument controls where the result goes: inplace=True modifies the current dataframe, while inplace=False (the default) leaves it untouched and returns a new dataframe.

In [5]:
#Can you guess why I am dropping this column? It's not just for the sake of an example. See if you can figure it out.
df.drop('year', axis=1, inplace=True)
df.head(1)

Unnamed: 0,artist,track,time,genre,entered,peaked,1,2,3,4,...,67,68,69,70,71,72,73,74,75,76
0,Destiny's Child,Independent Women Part I,3:38,Rock,2000-09-23,2000-11-18,78,63.0,49.0,33.0,...,,,,,,,,,,
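
To make the effect of "inplace" concrete, here is a minimal sketch on a small made-up dataframe. The name `toy` and its values are placeholders invented for illustration and are not taken from the Billboard file.

```python
import pandas as pd

# Hypothetical toy data (placeholder values, not from the Billboard dataset).
toy = pd.DataFrame({
    "year": [2000, 2000],
    "artist": ["Artist A", "Artist B"],
    "track": ["Track A", "Track B"],
})

# inplace=False (the default): drop returns a new dataframe and toy is untouched.
without_year = toy.drop("year", axis=1)
cols_after_copy = list(toy.columns)      # 'year' is still present in toy

# inplace=True: toy itself is modified and the call returns None.
result = toy.drop("year", axis=1, inplace=True)
cols_after_inplace = list(toy.columns)   # 'year' is gone from toy
```

Because the in-place call returns None, writing something like `toy = toy.drop(..., inplace=True)` would replace the dataframe with None, so pick one style or the other.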


## Drop multiple columns
Notice that the documentation says the "labels" argument can be "list-like". This means that if we want to drop multiple columns, we can pass a list of column names. So the call would be...

In [9]:
#This time we are dropping columns for example purposes only, so inplace is explicitly set to False.
#False is already the default, but spelling it out makes it clear the original dataframe is not modified.

df.drop(['entered', 'peaked'], axis=1, inplace=False)[:1]

#Note that we aren't calling df.head(1) here: df itself is unchanged, so it would still show the dropped columns.
#Since drop returns a new dataframe, we slice that result with [:1] to show only the first row,
#which gives the same view as calling .head(1) on the result.

Unnamed: 0,artist,track,time,genre,1,2,3,4,5,6,...,67,68,69,70,71,72,73,74,75,76
0,Destiny's Child,Independent Women Part I,3:38,Rock,78,63.0,49.0,33.0,23.0,15.0,...,,,,,,,,,,
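
As a hedged aside, pandas 0.21 and later also accept a `columns=` keyword, which does the same thing as passing labels with axis=1, and an `errors` argument that controls what happens when a label is missing. The sketch below uses a made-up dataframe; `toy` and `not_a_column` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data (placeholder values, not from the Billboard dataset).
toy = pd.DataFrame({
    "artist": ["Artist A"],
    "entered": ["2000-01-01"],
    "peaked": ["2000-02-01"],
})

# columns=[...] is equivalent to passing the list as labels with axis=1.
by_keyword = toy.drop(columns=["entered", "peaked"])

# errors="ignore" skips labels that do not exist instead of raising a KeyError.
lenient = toy.drop(columns=["entered", "not_a_column"], errors="ignore")
```

Neither call uses inplace, so `toy` keeps all three of its columns.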
