# Adding and dropping columns from DataFrames

This document demonstrates how to add and remove columns from a DataFrame.

In [None]:
# import pandas

# load the gapminder dataset

# take a look at the head of gapminder


### Creating a new column in a DataFrame

In [None]:
# compute the product of the pop and gdpPercap columns


Let's add a new column `gdp` which is the product of the `pop` and `gdpPercap` columns:

In [None]:
# add a new column to gapminder corresponding to the product of the values in the 'pop' and 'gdpPercap' columns


In [None]:
# Has the original gapminder object changed?
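The behaviour explored in the cells above can be sketched on a small stand-in DataFrame. This is a minimal illustration, not the real dataset: `gapminder_toy` and its values are made up here, and only the column names mirror gapminder.

```python
import pandas as pd

# Toy stand-in for gapminder; the values are made up for illustration.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2007, 2007],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 1500.0],
})

# Computing the product on its own returns a new Series and changes nothing.
product = gapminder_toy["pop"] * gapminder_toy["gdpPercap"]

# Assigning the result to a new column name modifies the DataFrame in place.
gapminder_toy["gdp"] = product

print(gapminder_toy.columns.tolist())
# ['country', 'year', 'pop', 'gdpPercap', 'gdp']
```

Because the assignment acts on the existing object, no reassignment of `gapminder_toy` is needed for the new column to persist.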


### Removing a column from a DataFrame using `.drop()`

To remove a column from a DataFrame, you can use the pandas `.drop()` method.
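As a minimal sketch of how `.drop()` behaves, the example below uses a made-up stand-in DataFrame (`gapminder_toy` and its values are assumptions for illustration, not the real data). With its default arguments, `.drop()` returns a new DataFrame and leaves the one it was called on untouched.

```python
import pandas as pd

# Toy stand-in for gapminder with a gdp column; values are made up.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 1500.0],
    "gdp": [500000.0, 3000000.0],
})

# .drop(columns=...) returns a new DataFrame without the named column.
dropped = gapminder_toy.drop(columns="gdp")

print("gdp" in dropped.columns)        # False
print("gdp" in gapminder_toy.columns)  # True: the original is unchanged
```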


In [None]:
# try to remove gdp from gapminder using the df.drop(columns=) method


In [None]:
# did the original gapminder data object change?



To remove the `gdp` column from `gapminder` itself, overwrite the `gapminder` object with the output of `.drop()`, as follows:

In [None]:
# overwrite gapminder with the output of the df.drop(columns=) method
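The overwrite step might look like the sketch below. It runs on a made-up stand-in (`gapminder_toy` and its values are assumptions for illustration) so that it is self-contained.

```python
import pandas as pd

# Toy stand-in for gapminder with a gdp column; values are made up.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 1500.0],
    "gdp": [500000.0, 3000000.0],
})

# Reassign the name to the DataFrame returned by .drop(), so that the
# variable now refers to the version without the gdp column.
gapminder_toy = gapminder_toy.drop(columns="gdp")

print(gapminder_toy.columns.tolist())
# ['country', 'pop', 'gdpPercap']
```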


In [None]:
# now look at the gapminder data object - has it changed?


### Creating a copy of a DataFrame object

Suppose that you want to keep an unmodified copy of the original `gapminder` DataFrame object in your environment, and create a different version, called `gapminder_new`, that you can modify as much as you like. 

You might try to create a new variable `gapminder_new` that contains the original `gapminder` DataFrame as follows:

In [None]:
# define gapminder_new and set it equal to gapminder


Indeed, `gapminder_new` contains the same DataFrame object as `gapminder`:

In [None]:
# take a look at gapminder_new


Let's add a `GDP` column to this new `gapminder_new` DataFrame object:

In [None]:
# Define a new column in gapminder_new called 'GDP' that is equal to the product of the 'pop' and 'gdpPercap' columns

# take a look at gapminder_new


In [None]:
# take a look at the original gapminder object -- has it changed?


What's going on here?

Let's revert the `gapminder` DataFrame object to the original dataset by re-loading the csv file:

In [None]:
# read in gapminder again to revert to the original dataset
gapminder = pd.read_csv('data/gapminder.csv')
gapminder

### The `.copy()` method

The problem is that writing `gapminder_new = gapminder` does not copy any data. It creates a new "pointer" to the same `gapminder` DataFrame, so `gapminder_new` acts as an "alias" for the original: any change made through one name is visible through the other.
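The aliasing can be checked directly with Python's `is` operator, which tests whether two names refer to the same object. The sketch below uses a made-up stand-in DataFrame (`gapminder_toy` and its values are assumptions for illustration).

```python
import pandas as pd

# Toy stand-in for gapminder; values are made up.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 1500.0],
})

# Plain assignment binds a second name to the same object.
gapminder_new = gapminder_toy
print(gapminder_new is gapminder_toy)  # True

# Adding a column through one name therefore shows up under the other.
gapminder_new["GDP"] = gapminder_new["pop"] * gapminder_new["gdpPercap"]
print("GDP" in gapminder_toy.columns)  # True
```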

To create an independent copy of a DataFrame, use the pandas `.copy()` method.
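A minimal sketch of the difference `.copy()` makes, again on a made-up stand-in DataFrame (`gapminder_toy` and its values are assumptions for illustration):

```python
import pandas as pd

# Toy stand-in for gapminder; values are made up.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 1500.0],
})

# .copy() returns a separate DataFrame object with its own data.
gapminder_new = gapminder_toy.copy()
print(gapminder_new is gapminder_toy)  # False

# Columns added to the copy do not appear in the original.
gapminder_new["gdp_new"] = gapminder_new["pop"] * gapminder_new["gdpPercap"]
print("gdp_new" in gapminder_new.columns)  # True
print("gdp_new" in gapminder_toy.columns)  # False
```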

In [None]:
# define gapminder_new this time as a copy of gapminder


Now let's add a new column to `gapminder_new` called `gdp_new`:

In [None]:
# add a column, gdp_new, to gapminder_new that is equal to the product of the 'pop' and 'gdpPercap' columns

# take a look at gapminder_new


In [None]:
# check whether the original gapminder object has changed


### Exercise 

Create a version of `gapminder` called `gapminder_gdp` that contains three columns: `country`, `year`, and `gdp` (the GDP for each country-year, in millions). Make sure that the original `gapminder` DataFrame is not modified.

## Modifying existing columns of a DataFrame

The `df['col'] = ...` syntax can be used not only to add new columns, but also to modify existing columns.
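As a small sketch of overwriting an existing column, the example below rounds a `lifeExp` column with `np.round()`. The DataFrame `gapminder_toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for gapminder; values are made up.
gapminder_toy = pd.DataFrame({
    "country": ["A", "B"],
    "lifeExp": [43.8, 72.3],
})

# np.round() returns a new Series; the DataFrame is not changed yet.
rounded = np.round(gapminder_toy["lifeExp"])

# Assigning to an existing column name replaces that column's values.
gapminder_toy["lifeExp"] = rounded

print(gapminder_toy["lifeExp"].tolist())
# [44.0, 72.0]
```

Note that the rounded values are still stored as floats; converting them to integers would be a separate step.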

In [None]:
# Round the lifeExp column values to the nearest integer:
# -------------
# import numpy

# apply np.round() to the 'lifeExp' column of gapminder


Update the existing `lifeExp` column with this rounded version as follows:

In [None]:
# update the lifeExp column of gapminder with the rounded version

# look at gapminder
