## Updating a dataframe
A dataframe often needs to be updated before it is in a presentable form. Some common tools for doing so are described below. This is not an exhaustive list, and you may need to consult the pandas documentation for tasks not mentioned here.

Note that all the examples below use the domestic_flights.csv dataset. It can be imported with the code:

In [1]:
import pandas as pd
flights = pd.read_csv('domestic_flights.csv')
flights

Unnamed: 0,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Route_Distance,Seats
0,ADELAIDE,ALICE SPRINGS,Jan-84,15743,143,1316,19246
1,ADELAIDE,BRISBANE,Jan-84,3781,32,1622,4210
2,ADELAIDE,CANBERRA,Jan-84,1339,12,972,1414
3,ADELAIDE,DARWIN,Jan-84,3050,33,2619,4566
4,ADELAIDE,GOLD COAST,Jan-84,1596,16,1607,1803
...,...,...,...,...,...,...,...
25962,SYDNEY,WAGGA WAGGA,Apr-22,11318,326,367,17244
25963,SYDNEY,WAGGA WAGGA,May-22,12278,351,367,18599
25964,SYDNEY,WAGGA WAGGA,Jun-22,12119,338,367,18116
25965,SYDNEY,WAGGA WAGGA,Jul-22,12863,415,367,20302


### Adding new columns

New columns can be added to a dataframe through assignment, provided the column name doesn't already exist in the dataframe. The general format is <code>df_name['col_name'] = ....</code> If the column name provided does exist, it will instead update that column.

In [2]:
# example: adding a column with total distance.
flights['Total_Distance'] = flights['Route_Distance'] * flights['Aircraft_Trips']
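
The difference between adding and updating can be seen on a small stand-in dataframe. This is only a sketch: the two rows are copied from the output above, and dividing by 1000 is an arbitrary choice made to show an overwrite.

```python
import pandas as pd

# Small stand-in for the flights data (first two rows shown above)
flights = pd.DataFrame({
    'City1': ['ADELAIDE', 'ADELAIDE'],
    'City2': ['ALICE SPRINGS', 'BRISBANE'],
    'Aircraft_Trips': [143, 32],
    'Route_Distance': [1316, 1622],
})

# New column name: a column is added on the right
flights['Total_Distance'] = flights['Route_Distance'] * flights['Aircraft_Trips']

# Existing column name: the column is overwritten, not duplicated
# (the division by 1000 is an arbitrary illustration)
flights['Total_Distance'] = flights['Total_Distance'] / 1000

print(flights.columns.tolist())
```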

### Removing columns

If you want to remove a small number of columns, use the <code>drop</code> method with the <code>columns</code> argument to specify which columns to remove. Note: this doesn't update the dataframe in place; it returns a new dataframe, so you need to assign the result to a variable to keep it.

In [3]:
# removing route distance column
flights.drop(columns = 'Route_Distance')

Unnamed: 0,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Seats,Total_Distance
0,ADELAIDE,ALICE SPRINGS,Jan-84,15743,143,19246,188188
1,ADELAIDE,BRISBANE,Jan-84,3781,32,4210,51904
2,ADELAIDE,CANBERRA,Jan-84,1339,12,1414,11664
3,ADELAIDE,DARWIN,Jan-84,3050,33,4566,86427
4,ADELAIDE,GOLD COAST,Jan-84,1596,16,1803,25712
...,...,...,...,...,...,...,...
25962,SYDNEY,WAGGA WAGGA,Apr-22,11318,326,17244,119642
25963,SYDNEY,WAGGA WAGGA,May-22,12278,351,18599,128817
25964,SYDNEY,WAGGA WAGGA,Jun-22,12119,338,18116,124046
25965,SYDNEY,WAGGA WAGGA,Jul-22,12863,415,20302,152305


Selecting which columns you want to keep (using <code>loc</code> or <code>iloc</code>) is also viable.
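
As a minimal sketch of both approaches, the example below uses a small stand-in dataframe (rows copied from the output above) and stores each result in a new variable. The variable names <code>trimmed</code> and <code>kept</code> are chosen here for illustration only.

```python
import pandas as pd

# Small stand-in for the flights data
flights = pd.DataFrame({
    'City1': ['ADELAIDE', 'ADELAIDE'],
    'City2': ['ALICE SPRINGS', 'BRISBANE'],
    'Route_Distance': [1316, 1622],
    'Seats': [19246, 4210],
})

# drop returns a new dataframe; pass a list to remove several columns
trimmed = flights.drop(columns=['Route_Distance', 'Seats'])

# Same result by selecting the columns to keep instead
kept = flights.loc[:, ['City1', 'City2']]

# The original dataframe still has all four columns
print(flights.columns.tolist())
print(trimmed.columns.tolist())
```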

### Removing rows

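Row removal is similar to column removal. You can use the <code>drop</code> method to remove a select few rows by passing their index labels. As with columns, this returns a new dataframe rather than updating the original.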

In [4]:
# removing first row of the data
flights.drop(0)

Unnamed: 0,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Route_Distance,Seats,Total_Distance
1,ADELAIDE,BRISBANE,Jan-84,3781,32,1622,4210,51904
2,ADELAIDE,CANBERRA,Jan-84,1339,12,972,1414,11664
3,ADELAIDE,DARWIN,Jan-84,3050,33,2619,4566,86427
4,ADELAIDE,GOLD COAST,Jan-84,1596,16,1607,1803,25712
5,ADELAIDE,MELBOURNE,Jan-84,50817,711,643,76647,457173
...,...,...,...,...,...,...,...,...
25962,SYDNEY,WAGGA WAGGA,Apr-22,11318,326,367,17244,119642
25963,SYDNEY,WAGGA WAGGA,May-22,12278,351,367,18599,128817
25964,SYDNEY,WAGGA WAGGA,Jun-22,12119,338,367,18116,124046
25965,SYDNEY,WAGGA WAGGA,Jul-22,12863,415,367,20302,152305


Selecting which rows you want to keep (using <code>loc</code> or <code>iloc</code>) is also viable.

If removal is conditional, filter the dataframe with a condition that is true for the rows you want to keep.

In [5]:
# removing all rows with 0 aircraft trips
flights[flights['Aircraft_Trips'] != 0]

Unnamed: 0,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Route_Distance,Seats,Total_Distance
0,ADELAIDE,ALICE SPRINGS,Jan-84,15743,143,1316,19246,188188
1,ADELAIDE,BRISBANE,Jan-84,3781,32,1622,4210,51904
2,ADELAIDE,CANBERRA,Jan-84,1339,12,972,1414,11664
3,ADELAIDE,DARWIN,Jan-84,3050,33,2619,4566,86427
4,ADELAIDE,GOLD COAST,Jan-84,1596,16,1607,1803,25712
...,...,...,...,...,...,...,...,...
25962,SYDNEY,WAGGA WAGGA,Apr-22,11318,326,367,17244,119642
25963,SYDNEY,WAGGA WAGGA,May-22,12278,351,367,18599,128817
25964,SYDNEY,WAGGA WAGGA,Jun-22,12119,338,367,18116,124046
25965,SYDNEY,WAGGA WAGGA,Jul-22,12863,415,367,20302,152305
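
The sketch below shows both kinds of row removal on a small dataframe. The values are made up for illustration (in particular the zero trip counts are not taken from the dataset), and the variable names are hypothetical.

```python
import pandas as pd

# Made-up stand-in data; the zeros are invented to give the filter something to remove
flights = pd.DataFrame({
    'City2': ['ALICE SPRINGS', 'BRISBANE', 'CANBERRA', 'DARWIN'],
    'Aircraft_Trips': [143, 0, 12, 0],
})

# Remove several rows at once by passing a list of index labels
without_first_two = flights.drop([0, 1])

# Conditional removal: keep only the rows where the condition is True
nonzero = flights[flights['Aircraft_Trips'] != 0]

# The surviving rows keep their original index labels
print(without_first_two.index.tolist())
print(nonzero.index.tolist())
```

Neither operation changes <code>flights</code> itself, so assign the result back to <code>flights</code> if you want to keep working with the reduced data.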


### Changing the index

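You can change the index of a dataframe with the <code>set_index</code> method, which returns a new dataframe with the chosen column moved into the index. Note that this will discard the old index. If you want to keep the old index, copy it to a column with assignment before calling <code>set_index</code>.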

In [6]:
# example: setting Month column to be the index
flights.set_index('Month')

Month,City1,City2,Passenger_Trips,Aircraft_Trips,Route_Distance,Seats,Total_Distance
Jan-84,ADELAIDE,ALICE SPRINGS,15743,143,1316,19246,188188
Jan-84,ADELAIDE,BRISBANE,3781,32,1622,4210,51904
Jan-84,ADELAIDE,CANBERRA,1339,12,972,1414,11664
Jan-84,ADELAIDE,DARWIN,3050,33,2619,4566,86427
Jan-84,ADELAIDE,GOLD COAST,1596,16,1607,1803,25712
...,...,...,...,...,...,...,...
Apr-22,SYDNEY,WAGGA WAGGA,11318,326,367,17244,119642
May-22,SYDNEY,WAGGA WAGGA,12278,351,367,18599,128817
Jun-22,SYDNEY,WAGGA WAGGA,12119,338,367,18116,124046
Jul-22,SYDNEY,WAGGA WAGGA,12863,415,367,20302,152305
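
A minimal sketch of keeping the old index follows, using a small stand-in dataframe. The column name <code>Row_ID</code> is a hypothetical name chosen for this example.

```python
import pandas as pd

# Small stand-in for the flights data
flights = pd.DataFrame({
    'City2': ['ALICE SPRINGS', 'BRISBANE'],
    'Month': ['Jan-84', 'Jan-84'],
    'Passenger_Trips': [15743, 3781],
})

# Copy the current index into a column before replacing it
# ('Row_ID' is a made-up column name)
flights['Row_ID'] = flights.index

# set_index returns a new dataframe; Month moves out of the columns
by_month = flights.set_index('Month')

print(by_month.index.name)
print(by_month.columns.tolist())
```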


### Changing column names

Column names can be changed using the <code>rename</code> method. For this, you should supply a <code>columns</code> argument which is a dictionary. The dictionary keys are the old names, and the dictionary values are the new names.

In [7]:
# changing seats column name
flights.rename(columns = {'Seats': 'Total_Seats'})

Unnamed: 0,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Route_Distance,Total_Seats,Total_Distance
0,ADELAIDE,ALICE SPRINGS,Jan-84,15743,143,1316,19246,188188
1,ADELAIDE,BRISBANE,Jan-84,3781,32,1622,4210,51904
2,ADELAIDE,CANBERRA,Jan-84,1339,12,972,1414,11664
3,ADELAIDE,DARWIN,Jan-84,3050,33,2619,4566,86427
4,ADELAIDE,GOLD COAST,Jan-84,1596,16,1607,1803,25712
...,...,...,...,...,...,...,...,...
25962,SYDNEY,WAGGA WAGGA,Apr-22,11318,326,367,17244,119642
25963,SYDNEY,WAGGA WAGGA,May-22,12278,351,367,18599,128817
25964,SYDNEY,WAGGA WAGGA,Jun-22,12119,338,367,18116,124046
25965,SYDNEY,WAGGA WAGGA,Jul-22,12863,415,367,20302,152305
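
Several columns can be renamed in one call by adding more entries to the dictionary. The sketch below assumes a small stand-in dataframe; the new names <code>Origin</code> and <code>Destination</code> are invented for illustration.

```python
import pandas as pd

# Small stand-in for the flights data
flights = pd.DataFrame({
    'City1': ['ADELAIDE', 'ADELAIDE'],
    'City2': ['ALICE SPRINGS', 'BRISBANE'],
    'Seats': [19246, 4210],
})

# One dictionary entry per column to rename (new names are made up)
renamed = flights.rename(columns={
    'City1': 'Origin',
    'City2': 'Destination',
    'Seats': 'Total_Seats',
})

# rename returns a new dataframe; the original names are untouched
print(flights.columns.tolist())
print(renamed.columns.tolist())
```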


### Changing column order

Column order is easily changed through column selection. Columns will appear in the order they are selected.

In [8]:
# moving Month to the leftmost column
flights.iloc[:, [2, 0, 1, 3, 4, 5, 6, 7]]

Unnamed: 0,Month,City1,City2,Passenger_Trips,Aircraft_Trips,Route_Distance,Seats,Total_Distance
0,Jan-84,ADELAIDE,ALICE SPRINGS,15743,143,1316,19246,188188
1,Jan-84,ADELAIDE,BRISBANE,3781,32,1622,4210,51904
2,Jan-84,ADELAIDE,CANBERRA,1339,12,972,1414,11664
3,Jan-84,ADELAIDE,DARWIN,3050,33,2619,4566,86427
4,Jan-84,ADELAIDE,GOLD COAST,1596,16,1607,1803,25712
...,...,...,...,...,...,...,...,...
25962,Apr-22,SYDNEY,WAGGA WAGGA,11318,326,367,17244,119642
25963,May-22,SYDNEY,WAGGA WAGGA,12278,351,367,18599,128817
25964,Jun-22,SYDNEY,WAGGA WAGGA,12119,338,367,18116,124046
25965,Jul-22,SYDNEY,WAGGA WAGGA,12863,415,367,20302,152305
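
Selecting by name instead of position can be easier to read and does not depend on how many columns there are. The following is one possible sketch on a small stand-in dataframe; the variable names are chosen for illustration.

```python
import pandas as pd

# Small stand-in for the flights data
flights = pd.DataFrame({
    'City1': ['ADELAIDE', 'ADELAIDE'],
    'City2': ['ALICE SPRINGS', 'BRISBANE'],
    'Month': ['Jan-84', 'Jan-84'],
    'Seats': [19246, 4210],
})

# Put Month first, then every other column in its existing order
new_order = ['Month'] + [col for col in flights.columns if col != 'Month']
reordered = flights[new_order]

print(reordered.columns.tolist())
```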


### Data type conversions

Data type conversions can be done with the <code>astype</code> method. For this, you supply a dictionary where the keys are the column names, and the values are the data types to convert to.

In [11]:
# example: converting seats column to a string
flights = flights.astype({'Seats': str})
flights.dtypes

City1              object
City2              object
Month              object
Passenger_Trips     int64
Aircraft_Trips      int64
Route_Distance      int64
Seats              object
Total_Distance      int64
dtype: object
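
The dictionary can hold more than one column, so several conversions can be done in a single call. As a sketch under the assumption that <code>Seats</code> currently holds digit-only strings (as it would after the conversion above), the example below turns it back into integers and converts <code>Route_Distance</code> to floats.

```python
import pandas as pd

# Small stand-in: Seats stored as strings, Route_Distance as integers
flights = pd.DataFrame({
    'Route_Distance': [1316, 1622],
    'Seats': ['19246', '4210'],
})

# Convert two columns in one call; astype returns a new dataframe
converted = flights.astype({'Seats': 'int64', 'Route_Distance': 'float64'})

print(converted.dtypes)
```

If a string cannot be parsed as a number, <code>astype</code> raises an error rather than silently skipping the value.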