# Manipulating and Creating Columns
> During the course of doing data analysis and modeling, a significant amount of time is spent on data preparation: loading, cleaning, transforming, and rearranging. Such tasks are often reported to take up 80% or more of an analyst's time.
>
> \- Wes McKinney, the creator of pandas, in his book *Python for Data Analysis*

## Applied Review
### Importing Data
- Python can read in data from CSVs, JSON files, and pickle files with just a few lines of code.
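As a quick refresher, here is a minimal sketch of the three readers. To keep it self-contained, it uses a tiny hypothetical two-row table held in memory rather than the course's data files, and it writes the pickle to a temporary directory.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "tailnum,seats\nN10156,55\nN102UW,182\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same rows as JSON records (one object per row)
json_text = '[{"tailnum": "N10156", "seats": 55}, {"tailnum": "N102UW", "seats": 182}]'
from_json = pd.read_json(io.StringIO(json_text))

# Pickle round trip through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "planes.pkl")
    from_csv.to_pickle(path)
    from_pickle = pd.read_pickle(path)

print(from_pickle)
```

With real files you would pass a path string (e.g. `'../data/planes.csv'`) instead of a `StringIO` object.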

### Selecting and Filtering Data
- Python's pandas library supports limiting rows (via *filtering* and *slicing*), as well as *selecting* columns.
- All of these operations use the bracket operator (`[]`); row operations can also go through the `.loc` *accessor*, which takes a row selection and a column selection together.
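A short sketch of the three operations side by side, using a small hypothetical table built from a few columns of the planes data:

```python
import pandas as pd

# Small hypothetical table using a few columns from planes.csv
planes = pd.DataFrame({
    "tailnum": ["N10156", "N102UW", "N103US", "N104UW"],
    "manufacturer": ["EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE"],
    "seats": [55, 182, 182, 182],
})

# Selecting: a list of column names inside the brackets
subset = planes[["tailnum", "seats"]]

# Filtering: a boolean Series inside the brackets keeps only matching rows
big = planes[planes["seats"] > 100]

# Slicing: .loc takes row labels, and the end label is included
first_two = planes.loc[0:1]

# .loc can filter rows and select a column in one step
big_tails = planes.loc[planes["seats"] > 100, "tailnum"]

print(big_tails)
```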

## Columns as Mutable Objects
It's common to want to modify a column of a DataFrame, or sometimes even to create a new column.
Let's take a look at our planes data again.

```python
import pandas as pd
planes = pd.read_csv('../data/planes.csv')
```

```python
planes.head()
```

| Unnamed: 0 | tailnum | year | type | manufacturer | model | engines | seats | speed | engine |
|---|---|---|---|---|---|---|---|---|---|
| 0 | N10156 | 2004.0 | Fixed wing multi engine | EMBRAER | EMB-145XR | 2 | 55 | NaN | Turbo-fan |
| 1 | N102UW | 1998.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 2 | N103US | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 3 | N104UW | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
| 4 | N10575 | 2002.0 | Fixed wing multi engine | EMBRAER | EMB-145LR | 2 | 55 | NaN | Turbo-fan |


Suppose we wanted to know the total capacity of each plane, including the crew.
We have data on how many seats each plane has (in the `seats` column), but that only includes paying passengers.



```python
seats = planes['seats']
seats.head()
```

```
0     55
1    182
2    182
3    182
4     55
Name: seats, dtype: int64
```
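The `Name: seats` line in that output is the Series' `name` attribute: selecting a single column with one string in the brackets returns a Series that carries the column name along with it. A minimal check, using a hypothetical two-row stand-in for the planes data:

```python
import pandas as pd

# Hypothetical stand-in for the first rows of planes.csv
planes = pd.DataFrame({"tailnum": ["N10156", "N102UW"], "seats": [55, 182]})

seats = planes["seats"]
print(type(seats).__name__)  # Series
print(seats.name)            # seats
```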

For simplicity, let's say a full flight crew is always 5 people.
Series objects let us perform addition with the regular `+` operator; in this case, `seats + 5` adds 5 to every value in the Series.

```python
capacity = seats + 5
capacity.head()
```

```
0     60
1    187
2    187
3    187
4     60
Name: seats, dtype: int64
```

You can switch the order of the addends (i.e., `5 + seats`) and get the same result.

```python
capacity = 5 + seats
capacity.head()
```

```
0     60
1    187
2    187
3    187
4     60
Name: seats, dtype: int64
```
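Arithmetic like this is element-wise: the scalar is combined with every value, the result is a brand-new Series, and `seats` itself is left unchanged. A small check with a hypothetical three-value Series in place of `planes['seats']`:

```python
import pandas as pd

# Hypothetical stand-in for planes['seats']
seats = pd.Series([55, 182, 182], name="seats")

crew = 5  # the simplifying assumption from the text
capacity_a = seats + crew
capacity_b = crew + seats

print(capacity_a.equals(capacity_b))  # True
print(seats.tolist())  # [55, 182, 182]: the original Series is not modified
```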

So we've created a new Series, `capacity`, holding the total capacity of each plane.
Right now it's entirely separate from our original `planes` DataFrame, but we can make it a column of `planes` by combining assignment (`=`) with the column reference syntax:
```python
df['new_column_name'] = new_column_series
```
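Here is that template filled in on a hypothetical two-row DataFrame, so the pattern can be tried without the planes file:

```python
import pandas as pd

# Hypothetical miniature version of the planes DataFrame
df = pd.DataFrame({"tailnum": ["N10156", "N102UW"], "seats": [55, 182]})

new_column_series = df["seats"] + 5  # computed separately, as in the text
df["capacity"] = new_column_series   # new column name in quotes on the left

print(df.columns.tolist())  # ['tailnum', 'seats', 'capacity']
```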

```python
planes['capacity'] = capacity
planes.head()
```

| Unnamed: 0 | tailnum | year | type | manufacturer | model | engines | seats | speed | engine | capacity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | N10156 | 2004.0 | Fixed wing multi engine | EMBRAER | EMB-145XR | 2 | 55 | NaN | Turbo-fan | 60 |
| 1 | N102UW | 1998.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan | 187 |
| 2 | N103US | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan | 187 |
| 3 | N104UW | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan | 187 |
| 4 | N10575 | 2002.0 | Fixed wing multi engine | EMBRAER | EMB-145LR | 2 | 55 | NaN | Turbo-fan | 60 |


Note that `planes` now has a "capacity" column at the end.
Also note that in the code above, the *column name* goes in quotes within the brackets, while the *values that will become the column* (the Series we're using) go on the right side of the `=`, without any brackets or quotes.
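Because the right-hand side can be any expression that produces a Series, the computation and the assignment are often written as a single statement. Assigning to a name that already exists replaces that column instead of adding a new one, which is how an existing column gets modified. A sketch on a hypothetical two-row DataFrame (the extra crew member is purely illustrative):

```python
import pandas as pd

# Hypothetical miniature version of the planes DataFrame
planes = pd.DataFrame({"tailnum": ["N10156", "N102UW"], "seats": [55, 182]})

# Compute and assign in one statement: a new name appends a new column
planes["capacity"] = planes["seats"] + 5

# Assigning to an existing name replaces that column instead of adding another
planes["capacity"] = planes["capacity"] + 1  # hypothetical extra crew member

print(planes)
```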