In many of our earlier exercises we used the 'inplace' parameter, so let's dig into it in more detail.

In [1]:
import pandas as pd

In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [3]:
ufo.shape

(18241, 5)

The dataset has around 18,000 rows and 5 columns.

In [4]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


By default, head() shows the first 5 rows of data.

Let's assume we don't want the 'City' column, since we won't be using it. We can drop it from the dataset with the drop() method, as below:

In [5]:
ufo.drop('City',axis=1).head()

Unnamed: 0,Colors Reported,Shape Reported,State,Time
0,,TRIANGLE,NY,6/1/1930 22:00
1,,OTHER,NJ,6/30/1930 20:00
2,,OVAL,CO,2/15/1931 14:00
3,,DISK,KS,6/1/1931 13:00
4,,LIGHT,NY,4/18/1933 19:00


Here we can see that the 'City' column is gone from the output of the command above.

The question is whether the 'City' column is actually gone. Let's check the 'ufo' DataFrame:

In [6]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


Above, we can see that the 'City' column is not gone. So why does the 'City' column still show up in the 'ufo' DataFrame?

The reason is that the drop() method has inplace=False by default. In other words, when a method defaults to inplace=False, the operation returns a new object and does not affect the underlying data.
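The behaviour described above can be reproduced without downloading anything. The following is a minimal sketch that assumes a made-up three-row DataFrame (the name `toy` and its values are hypothetical stand-ins for the ufo data):

```python
import pandas as pd

# Hypothetical three-row stand-in for the ufo data
toy = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "State": ["NY", "NJ", "CO"],
})

# With the default inplace=False, drop() hands back a new DataFrame...
without_city = toy.drop("City", axis=1)

# ...while the original keeps both of its columns.
print(list(without_city.columns))  # ['State']
print(list(toy.columns))           # ['City', 'State']
```

The result only sticks around if we keep a reference to it, which is why the earlier call to ufo.drop('City', axis=1).head() left 'ufo' untouched.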

So if we want to make the change permanent, all we have to do is pass inplace=True, as below:

In [7]:
ufo.drop('City', axis=1, inplace=True)

In [8]:
ufo.head()

Unnamed: 0,Colors Reported,Shape Reported,State,Time
0,,TRIANGLE,NY,6/1/1930 22:00
1,,OTHER,NJ,6/30/1930 20:00
2,,OVAL,CO,2/15/1931 14:00
3,,DISK,KS,6/1/1931 13:00
4,,LIGHT,NY,4/18/1933 19:00


Here, because we passed inplace=True, the operation affects the underlying data and the 'City' column is gone.
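One detail worth knowing when using inplace=True: the call modifies the DataFrame but does not hand it back. The sketch below assumes a hypothetical two-row DataFrame named `toy`, and the note about the return value reflects pandas 1.x and 2.x behaviour for drop():

```python
import pandas as pd

# Hypothetical two-row stand-in for the ufo data
toy = pd.DataFrame({"City": ["Ithaca", "Holyoke"], "State": ["NY", "CO"]})

# The DataFrame is changed directly. In pandas 1.x and 2.x, drop() with
# inplace=True returns None rather than a DataFrame.
result = toy.drop("City", axis=1, inplace=True)

print(result)
print(list(toy.columns))  # ['State']
```

So a line such as `toy = toy.drop('City', axis=1, inplace=True)` is a common slip: in those versions it would leave `toy` pointing at None.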

Let's see why inplace=False is the default:

Pandas lets us experiment. If we are not sure what an operation will do, we can look at its effect without permanently changing the underlying data. That is what the default inplace=False gives us.

Once we are done experimenting and ready to make a change permanent, we can deliberately switch to inplace=True. For instance:

Let's say we are thinking of dropping missing values with the dropna() method:

In [9]:
ufo.dropna(how='any')  # dropna() has inplace=False by default
# how='any': drop a row if any of its values are missing (NaN)

Unnamed: 0,Colors Reported,Shape Reported,State,Time
12,RED,SPHERE,SC,6/30/1939 20:00
19,RED,OTHER,AK,4/30/1943 23:00
36,RED,FORMATION,VA,7/10/1945 1:30
44,GREEN,SPHERE,CA,6/30/1946 19:00
82,BLUE,CHEVRON,CA,7/15/1947 21:00
...,...,...,...,...
18213,GREEN,FIREBALL,CA,12/28/2000 19:10
18216,ORANGE,LIGHT,CA,12/29/2000 16:10
18220,BLUE,DISK,CA,12/29/2000 20:30
18233,RED,VARIOUS,AK,12/31/2000 21:00


Above, the default index is integers, and the gaps in it show that dropna(how='any') removed a lot of rows.

In [10]:
ufo.dropna(how='any').shape

(2490, 4)

Here we lost around 16,000 of the original 18,000 rows with how='any'. We wanted to know what would happen, and since we did not use inplace=True, nothing has actually been lost. The default let us experiment and check the impact of dropna() without permanently affecting the data.
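This kind of "try it and count the rows" experiment can be sketched on a small example. The DataFrame `toy` and its values are made up for illustration, and how='all' (drop a row only when every value is missing) is included purely for comparison; it is not used elsewhere in this walkthrough:

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in with some missing values
toy = pd.DataFrame({
    "Colors Reported": [np.nan, "RED", np.nan, "GREEN"],
    "Shape Reported": ["TRIANGLE", "SPHERE", np.nan, "DISK"],
    "State": ["NY", "SC", "WI", "CA"],
})

rows_before = len(toy)
rows_any = len(toy.dropna(how="any"))  # rows with at least one NaN are dropped
rows_all = len(toy.dropna(how="all"))  # rows are dropped only if every value is NaN

print(rows_before, rows_any, rows_all)  # 4 2 4
print(len(toy))                         # 4, the original is untouched
```

Comparing the counts first makes it easier to decide which variant, if any, is worth making permanent.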

Many other pandas methods also have inplace=False by default. Below we can see that the original data is unchanged:

In [11]:
ufo.shape

(18241, 4)

Some of the methods that take an 'inplace' parameter are rename(), sort_values(), set_index() and so on.
It shows up all over pandas, we just have to look for it, and it is always inplace=False by default.
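One way to check this for ourselves is to read the method signatures with Python's standard inspect module. This is only a sketch: the list of method names is an arbitrary selection, and the code records a marker string instead of failing if an installed pandas version lacks the parameter on one of them:

```python
import inspect
import pandas as pd

# An arbitrary selection of DataFrame methods to check
method_names = ["drop", "dropna", "rename", "sort_values", "set_index"]

defaults = {}
for name in method_names:
    params = inspect.signature(getattr(pd.DataFrame, name)).parameters
    # Record the default of 'inplace' if the method has it, else a marker string
    defaults[name] = (
        params["inplace"].default if "inplace" in params else "no inplace parameter"
    )

for name, default in defaults.items():
    print(f"DataFrame.{name}: inplace default = {default}")
```

In a notebook, typing a method name followed by a question mark (for example `ufo.drop?`) shows the same signature information.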

We don't actually need the 'inplace' parameter. Instead, we can just use an assignment statement. Let's see how:

In [12]:
ufo = ufo.set_index('Time')  # equivalent to what we did previously: ufo.set_index('Time', inplace=True)

In [13]:
ufo.tail()  # check the result of the assignment statement above

Time,Colors Reported,Shape Reported,State
12/31/2000 23:00,,TRIANGLE,IL
12/31/2000 23:00,,DISK,IA
12/31/2000 23:45,,,WI
12/31/2000 23:45,RED,LIGHT,WI
12/31/2000 23:59,,OVAL,FL


Here we can see that the index is now 'Time', so this works as expected.

The difference between using inplace=True and using an assignment statement is as follows:

An assignment statement has the potential to create a second copy, so temporarily there may be two copies of the DataFrame in memory. If the data is gigabytes in size, that could slow the computer down or even cause the operation to fail.

The name inplace=True, on the other hand, makes it sound as if the operation happens in place and no second copy is created.

But there is no guarantee that inplace=True is more efficient than an assignment statement, so we can use whichever approach we prefer.
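Memory aside, there is one visible difference between the two styles: assignment points the name at a new object, while inplace=True changes the existing object. The sketch below assumes a hypothetical two-row DataFrame `toy` and a second name, `alias`, bound to the same object. It says nothing about which style is faster:

```python
import pandas as pd

# Hypothetical two-row stand-in for the ufo data
toy = pd.DataFrame({
    "Time": ["6/1/1930 22:00", "6/30/1930 20:00"],
    "State": ["NY", "NJ"],
})
alias = toy  # a second name for the same object

# Assignment: 'toy' now names a new DataFrame, 'alias' still names the old one
toy = toy.set_index("Time")
print(toy is alias)         # False
print(list(alias.columns))  # ['Time', 'State']

# inplace=True: the object that 'alias' names is itself modified
alias.set_index("Time", inplace=True)
print(list(alias.columns))  # ['State']
```

This matters mostly when the same DataFrame is referenced from more than one place, for example after being passed into a function.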

###### Useful tip:  

Let's take advantage of the default inplace=False to explore pandas methods. For example:

In [14]:
ufo.fillna(method='bfill').tail()
# Since inplace=False (the default), we don't have to worry about messing things up.
# Note: pandas 2.1+ deprecates fillna(method=...); ufo.bfill().tail() is the newer spelling.

Time,Colors Reported,Shape Reported,State
12/31/2000 23:00,RED,TRIANGLE,IL
12/31/2000 23:00,RED,DISK,IA
12/31/2000 23:45,RED,LIGHT,WI
12/31/2000 23:45,RED,LIGHT,WI
12/31/2000 23:59,,OVAL,FL


In [15]:
ufo.fillna(method='ffill').tail()

Time,Colors Reported,Shape Reported,State
12/31/2000 23:00,RED,TRIANGLE,IL
12/31/2000 23:00,RED,DISK,IA
12/31/2000 23:45,RED,DISK,WI
12/31/2000 23:45,RED,LIGHT,WI
12/31/2000 23:59,RED,OVAL,FL


In each of the two cases above, since inplace=False, the underlying data was not affected, and we can figure out which one did what we expected. Backfill (bfill) and forward fill (ffill) are much more useful for time series data than for anything else.

So bfill and ffill are not really appropriate here, but this is a good way to explore without any risk of affecting the underlying DataFrame.
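To see exactly what the two fill directions do, it can help to try them on a tiny Series. This is a minimal sketch with made-up values (the name `colors` is hypothetical), using the ffill() and bfill() methods directly:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps
colors = pd.Series(["RED", np.nan, np.nan, "GREEN", np.nan])

forward = colors.ffill()   # carry the last seen value forward
backward = colors.bfill()  # pull the next seen value backward

print(forward.tolist())   # ['RED', 'RED', 'RED', 'GREEN', 'GREEN']
print(backward.tolist())  # ['RED', 'GREEN', 'GREEN', 'GREEN', nan]
print(colors.isna().sum())  # 3, the original Series still has its gaps
```

The trailing value stays missing under bfill because there is nothing after it to pull from, which matches the last row of the bfill output shown earlier.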