After using pandas for a while, chances are we have encountered a SettingWithCopyWarning.

First of all, we should know that SettingWithCopyWarning is not an error. It is a warning, as the name itself says.

The proper response to the warning is to work out what pandas is asking us to do and handle it, but the warning is a bit tricky to understand. So most people just turn it off instead, which is not good practice unless we are very sure of what we are doing.

Let's look at the scenarios in which this warning arises:

In [3]:
import pandas as pd

In [4]:
movies = pd.read_csv('http://bit.ly/imdbratings')

This is a dataset of movies from the Internet Movie Database (IMDb).

In [5]:
movies.head()

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


These are the first 5 rows. We will work on the 'content_rating' column.

In [6]:
movies.content_rating.isnull().sum()  # count the missing values in content_rating

3

So there are 3 missing values. Let's take a look at the actual rows:

In [7]:
movies[movies.content_rating.isnull()]
# The expression inside [ ] produces a boolean Series, which we pass to the DataFrame to filter its rows.

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
187,8.2,Butch Cassidy and the Sundance Kid,,Biography,110,"[u'Paul Newman', u'Robert Redford', u'Katharin..."
649,7.7,Where Eagles Dare,,Action,158,"[u'Richard Burton', u'Clint Eastwood', u'Mary ..."
936,7.4,True Grit,,Adventure,128,"[u'John Wayne', u'Kim Darby', u'Glen Campbell']"


By passing the boolean Series to the DataFrame we get the matching rows, and we can see the missing values (NaN) in the 'content_rating' column.
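As a minimal sketch of the boolean-mask filtering used above, here is the same pattern on a tiny in-memory DataFrame. The frame `toy` and its values are made up for illustration and are not taken from the movies dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movies data (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", np.nan, "PG", np.nan],
})

mask = toy.content_rating.isnull()  # boolean Series, one entry per row
missing_rows = toy[mask]            # keeps only the rows where mask is True

print(mask.tolist())
print(missing_rows)
```

The mask has one True/False entry per row, and indexing the DataFrame with it returns only the rows marked True.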

Let's see all the unique values in the 'content_rating' column.

In [8]:
movies.content_rating.value_counts()

R            460
PG-13        189
PG           123
NOT RATED     65
APPROVED      47
UNRATED       38
G             32
PASSED         7
NC-17          7
X              4
GP             3
TV-MA          1
Name: content_rating, dtype: int64

Let's say we want the 65 'NOT RATED' movies to be represented as missing values. Sometimes a dataset uses a flag value to mean "missing", and it is best to replace such flags with NaN, which we already saw in the 'content_rating' column, so that we can take advantage of pandas' missing-value functionality.
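To illustrate why NaN is more useful than a flag string, the sketch below compares two hand-built Series: one that uses the flag 'NOT RATED' and one that uses NaN. Both Series are hypothetical examples, not slices of the real data.

```python
import numpy as np
import pandas as pd

# Same information, encoded two ways (illustrative values only)
flagged = pd.Series(["R", "NOT RATED", "PG", "NOT RATED"])
as_nan = pd.Series(["R", np.nan, "PG", np.nan])

flag_missing = int(flagged.isnull().sum())  # 0: a flag string is not "missing" to pandas
nan_missing = int(as_nan.isnull().sum())    # 2: NaN is recognised as missing
rated_only = as_nan.dropna()                # missing-value helpers now work

print(flag_missing, nan_missing)
print(rated_only.tolist())
```

Methods such as `isnull()` and `dropna()` only act on real missing values, which is the reason for converting the flag.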

So the first step is to find the relevant movies:

In [9]:
movies[movies.content_rating=='NOT RATED']

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
5,8.9,12 Angry Men,NOT RATED,Drama,96,"[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals..."
6,8.9,"The Good, the Bad and the Ugly",NOT RATED,Western,161,"[u'Clint Eastwood', u'Eli Wallach', u'Lee Van ..."
41,8.5,Sunset Blvd.,NOT RATED,Drama,110,"[u'William Holden', u'Gloria Swanson', u'Erich..."
63,8.4,M,NOT RATED,Crime,99,"[u'Peter Lorre', u'Ellen Widmann', u'Inge Land..."
66,8.4,Munna Bhai M.B.B.S.,NOT RATED,Comedy,156,"[u'Sunil Dutt', u'Sanjay Dutt', u'Arshad Warsi']"
...,...,...,...,...,...,...
665,7.7,Lolita,NOT RATED,Drama,152,"[u'James Mason', u'Shelley Winters', u'Sue Lyon']"
673,7.7,Blow-Up,NOT RATED,Drama,111,"[u'David Hemmings', u'Vanessa Redgrave', u'Sar..."
763,7.6,Hunger,NOT RATED,Biography,96,"[u'Stuart Graham', u'Laine Megaw', u'Brian Mil..."
827,7.5,The Wind That Shakes the Barley,NOT RATED,Drama,127,"[u'Cillian Murphy', u'Padraic Delaney', u'Liam..."


Here we have found all the 'NOT RATED' movies.

Second step: select the 'content_rating' values of those rows, which we want to overwrite with missing values (NaN). Remember that we already have 3 rows with NaN:

In [10]:
movies[movies.content_rating=='NOT RATED'].content_rating  # select only the content_rating Series of the 'NOT RATED' rows

5      NOT RATED
6      NOT RATED
41     NOT RATED
63     NOT RATED
66     NOT RATED
         ...    
665    NOT RATED
673    NOT RATED
763    NOT RATED
827    NOT RATED
899    NOT RATED
Name: content_rating, Length: 65, dtype: object

Final step: overwrite the Series above with NaN. NaN is not a string; it is a special value from the NumPy library, so we import NumPy:

In [11]:
import numpy as np

Now we should be able to overwrite 'NOT RATED' with NaN as below:

In [12]:
movies[movies.content_rating=='NOT RATED'].content_rating = np.nan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


Here is our SettingWithCopyWarning, saying 'A value is trying to be set on a copy of a slice from a DataFrame.' The message is not exactly clear, even after reading the linked documentation. So let's see how to understand and fix it.

First, let's check the result and see whether the assignment worked, because this is just a warning, not an error.

In [13]:
movies.content_rating.isnull().sum()

3

No, it didn't work. We already had 3 missing values; if the assignment had worked, the count would be 68 (65 'NOT RATED' plus 3 NaN).
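The same failure can be reproduced on a small frame. This is a sketch on made-up data (the frame `toy` is hypothetical); it records any warnings instead of hiding them, and which warning is reported, if any, depends on the installed pandas version.

```python
import warnings

import numpy as np
import pandas as pd

# Toy stand-in for the movies data (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "content_rating": ["NOT RATED", "R", np.nan],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained form: select rows first, then assign on the temporary result
    toy[toy.content_rating == "NOT RATED"].content_rating = np.nan

missing_after = int(toy.content_rating.isnull().sum())

print([w.category.__name__ for w in caught])  # names of the recorded warnings
print(missing_after)                          # still only the original NaN
```

The count of missing values does not change, because the assignment landed on a temporary object rather than on `toy`.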

Let's fix the line of code that gave the warning by using the .loc[ ] method, with a simple modification:

In [14]:
movies.loc[movies.content_rating == 'NOT RATED', 'content_rating'] = np.nan

This did not raise any warning. The format of .loc[ ] is to specify first the rows we want and then the columns: here, the rows where content_rating is 'NOT RATED' and only the 'content_rating' column, which is overwritten with NaN (from NumPy). Let's check the results:

In [15]:
movies.content_rating.isnull().sum()

68

Now we get the result we expected.
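For comparison, here is a minimal sketch of the .loc[ ] form on a small made-up frame (`toy` is hypothetical, not the movies data), where the row selection and the column assignment happen in one step.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movies data (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["NOT RATED", "R", np.nan, "NOT RATED"],
})

# Rows and column are given together, so pandas assigns on toy itself
toy.loc[toy.content_rating == "NOT RATED", "content_rating"] = np.nan

missing_after = int(toy.content_rating.isnull().sum())
print(missing_after)  # 2 flags converted + 1 NaN that was already there = 3
```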

Next, let's try to understand why this line of code raised the warning:

movies[movies.content_rating=='NOT RATED'].content_rating = np.nan 

The line above is actually two operations. The first, movies[movies.content_rating=='NOT RATED'], is a get-item operation that returns a new object. The second, .content_rating = np.nan, is a set-item operation applied to whatever the first operation returned.

The problem with chaining a get item and a set item is that pandas can't promise whether the get item returned a view or a copy of the data. If it returned a view, the set item would modify the original DataFrame. If it returned a copy, the set item would modify only that copy, which is then immediately discarded, so the original DataFrame is left unchanged. That is what happened here, as the unchanged count of 3 showed.

Since pandas does not know which of the two we got, it warns us that the assignment may not have done what we intended.
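The difference between a view and a copy is easiest to see in plain NumPy. The sketch below is only an analogy: pandas has its own, more involved rules for when it returns a view or a copy, but the consequence for assignment is the same.

```python
import numpy as np

base = np.array([10, 20, 30, 40])

view = base[1:3]                # basic slice: a view onto the same memory
masked_copy = base[base > 15]   # boolean mask: a new, independent array

view[0] = -1          # writes through to base
masked_copy[0] = -2   # changes only the copy

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, masked_copy))
```

Writing through the view changes `base`, while writing to the copy leaves `base` untouched, which is exactly the uncertainty the warning is about.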

Finally, the .loc[ ] method solves the problem by turning the two operations into a single set-item operation on the original DataFrame, which is why it did not raise the warning.
In short, whenever we select rows and columns and assign to them in the same line of code, we should use .loc[ ].
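To make the "two operations versus one" point concrete, here is a small pure-Python sketch. The `Recorder` class is a hypothetical stand-in that only logs which indexing methods Python calls; it is not how pandas is implemented.

```python
class Recorder:
    """Hypothetical stand-in for a DataFrame that logs indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, "getitem", key))
        # Return a new temporary object, as indexing a DataFrame would
        return Recorder("temporary", self.log)

    def __setitem__(self, key, value):
        self.log.append((self.name, "setitem", key))


log = []
frame = Recorder("frame", log)

frame["rows"]["column"] = 0   # chained: get item, then set item on the temporary
chained_calls = list(log)

log.clear()
frame["rows", "column"] = 0   # single set item directly on frame
single_calls = list(log)

print(chained_calls)
print(single_calls)
```

In the chained form the set item is called on the temporary object, whereas the combined form calls set item once on `frame` itself, which mirrors what `.loc[rows, column] = value` does.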

That was the first example of SettingWithCopyWarning.

###### 2nd instance

Let's assume for the moment that we only want to focus on movies with a very high star rating.

We create a separate DataFrame called top_movies that contains only those movies.

In [16]:
top_movies = movies.loc[movies.star_rating >= 9, :] 
# we want all movies with star_rating >= 9, and all columns

In [17]:
top_movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."


We see just 4 movies, and this is the entire top_movies DataFrame.

Let's say that after some analysis we notice that the 'duration' for the title 'The Shawshank Redemption' is incorrect.

We want to fix it. Let's see how:

In [18]:
top_movies.loc[0, 'duration'] = 150
# From the previous example we learned our lesson about using .loc[ ]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


Again we got the same SettingWithCopyWarning for this line of code. This is confusing, because we used the .loc[ ] method, which is exactly what the warning suggests.

First, let's check whether it worked, because it's just a warning, not an error.

In [19]:
top_movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,150,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."


It did work: top_movies was modified, and we can see the new value in the 'duration' column for The Shawshank Redemption.

This is one reason why turning these warnings off is not a good idea: sometimes the warning means the assignment did not work, and sometimes the assignment worked but pandas is unsure about it.

We want to see the warning so that we know to check whether our code worked.

Why did this line of code also generate the warning, even with .loc[ ]?

pandas is not sure whether top_movies is a view or a copy of movies, that is, whether it still refers to the original movies data or is an independent copy. So it warns us: are we modifying one thing (top_movies) or two things (top_movies and movies)?

Let's look at the solution to this problem.

Go back to the line of code where the problem actually started:

    top_movies = movies.loc[movies.star_rating >= 9, :]  # this is the problem line of code

Whenever we intend to create a copy of a DataFrame, we should say so explicitly with the .copy() method. Then pandas can be sure that top_movies is a copy, not a view, of those 4 rows from movies.

In [20]:
top_movies = movies.loc[movies.star_rating >= 9, :].copy()

pandas is now sure that top_movies is a copy, and we can use the line of code below to edit it:

In [21]:
top_movies.loc[0, 'duration'] = 150

This line of code did not raise a warning. Let's check the top_movies copy to see the change.

In [22]:
top_movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,150,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."


Yes, the duration in top_movies changed from 142 to 150. Let's check the original movies DataFrame:

In [23]:
movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."
...,...,...,...,...,...,...
974,7.4,Tootsie,PG,Comedy,116,"[u'Dustin Hoffman', u'Jessica Lange', u'Teri G..."
975,7.4,Back to the Future Part III,PG,Adventure,118,"[u'Michael J. Fox', u'Christopher Lloyd', u'Ma..."
976,7.4,Master and Commander: The Far Side of the World,PG-13,Action,138,"[u'Russell Crowe', u'Paul Bettany', u'Billy Bo..."
977,7.4,Poltergeist,PG,Horror,114,"[u'JoBeth Williams', u""Heather O'Rourke"", u'Cr..."


The duration in the movies DataFrame was not changed; it is still 142.
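The whole second example can be condensed into a short, self-contained sketch. The frame `movies_toy` below is made-up data standing in for the real dataset, so the titles and numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for the movies data (illustrative values only)
movies_toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 9.1, 8.0],
    "duration": [142, 175, 120],
})

# Explicit copy: top_toy is independent of movies_toy from here on
top_toy = movies_toy.loc[movies_toy.star_rating >= 9, :].copy()
top_toy.loc[0, "duration"] = 150

print(top_toy.loc[0, "duration"])     # the copy holds the corrected value
print(movies_toy.loc[0, "duration"])  # the original is untouched
```

With the explicit `.copy()`, there is no ambiguity about which object is being modified: only the copy changes.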