In [7]:
import pandas as pd
import numpy as np

This notebook contains notes taken from a video by Kevin Markham. <br>
Video: [How do I avoid a SettingWithCopyWarning in pandas?](https://www.youtube.com/watch?v=4R4WsDJ-KVc&list=RDQMhV94pbwVKoI&index=7)

In [9]:
df = pd.read_csv('datasets/imdb_ratings.csv')
df.head()

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


#### Find the movies without a content rating

In [12]:
df[df.content_rating.isna()]

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
187,8.2,Butch Cassidy and the Sundance Kid,,Biography,110,"[u'Paul Newman', u'Robert Redford', u'Katharin..."
649,7.7,Where Eagles Dare,,Action,158,"[u'Richard Burton', u'Clint Eastwood', u'Mary ..."
936,7.4,True Grit,,Adventure,128,"[u'John Wayne', u'Kim Darby', u'Glen Campbell']"


In [11]:
df.content_rating.value_counts(dropna=False)

R            460
PG-13        189
PG           123
NOT RATED     65
APPROVED      47
UNRATED       38
G             32
PASSED         7
NC-17          7
X              4
NaN            3
GP             3
TV-MA          1
Name: content_rating, dtype: int64

As we see above, there are 65 movies that are labeled 'NOT RATED'. Those movies did not show up when we used the `isna()` method. <br>
Let's change those 'NOT RATED' values to NaN instead. (There are several ways to do this; here we deliberately use one that triggers a `SettingWithCopyWarning`.)

In [8]:
df[df.content_rating == 'NOT RATED'].content_rating = np.nan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


The statement above triggered a warning from pandas and did not change any of the values: the number of missing content ratings is still 3.

In [14]:
df.content_rating.isna().sum()

3
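One way to see why nothing changed is to split the chained statement into its two steps. The sketch below uses a small made-up DataFrame (`toy` is a hypothetical stand-in, not the IMDb file) and silences warnings only to keep the output clean. Boolean indexing returns a new object, so the assignment lands on that object and never reaches the original.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the IMDb ratings file
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "content_rating": ["R", "NOT RATED", "PG"],
})

mask = toy.content_rating == "NOT RATED"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence pandas warnings for this demo only
    subset = toy[mask]               # step 1: boolean indexing builds a new object
    subset.content_rating = np.nan   # step 2: the assignment modifies that new object

# The original DataFrame still has no missing values
missing_in_original = int(toy.content_rating.isna().sum())
print(missing_in_original)
```

Written as one chained line, the intermediate object from step 1 is discarded immediately, which is why the assignment appears to do nothing.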

If we tweak the code a little bit, we can make it work for us. 

In [15]:
df.loc[df.content_rating == 'NOT RATED', 'content_rating'] = np.nan

# check whether the code above actually changed the values
df.content_rating.isna().sum()

68
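As noted earlier, there are several ways to make this replacement. The sketch below shows two alternatives to `.loc`, using a small made-up Series (`ratings` is hypothetical, not the notebook's data). Both return a new Series rather than modifying in place, so the result would need to be assigned back to the column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of content ratings, including one value that is already missing
ratings = pd.Series(["R", "NOT RATED", "PG", None, "NOT RATED"], name="content_rating")

# Option 1: replace every exact match with NaN
via_replace = ratings.replace("NOT RATED", np.nan)

# Option 2: mask (set to NaN) every position where the condition is True
via_mask = ratings.mask(ratings == "NOT RATED")

print(via_replace.isna().sum(), via_mask.isna().sum())
```

With a DataFrame, either result could be written back with something like `df['content_rating'] = df['content_rating'].mask(...)`, which is a single assignment and not a chained one.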

In [16]:
top_movies = df.loc[df.star_rating >= 9, :]

In [17]:
top_movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."


The duration of The Shawshank Redemption should be 150 minutes. Let's change that. 

In [18]:
top_movies.loc[0, 'duration'] = 150

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


Even though we got the same warning, the code above actually worked: `top_movies` now shows a duration of 150.

In [21]:
top_movies.head(1)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,150,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."


The problem is actually not with the statement above, but rather with this line:
```python
top_movies = df.loc[df.star_rating >= 9, :]
```
<br>
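pandas cannot tell whether `top_movies` is a view of `df` or a copy, so it warns when we later modify it. When we create a new DataFrame from a subset of an existing DataFrame and intend to modify it, we should add `.copy()` to make it explicitly independent: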

```python
top_movies = df.loc[df.star_rating >= 9, :].copy()
```

In [22]:
top_movies = df.loc[df.star_rating >= 9, :].copy()
top_movies.loc[0, 'duration'] = 150
top_movies

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,150,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
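
A quick way to confirm that the explicit copy is independent of its source is to check both objects after the edit. This is a minimal sketch on made-up data (`movies` and `top` are hypothetical names, and the rows only mimic the real file):

```python
import pandas as pd

# Hypothetical toy data mimicking a few columns of the IMDb ratings file
movies = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction"],
    "star_rating": [9.3, 9.2, 8.9],
    "duration": [142, 175, 154],
})

# Take the subset and make it an explicit, independent copy
top = movies.loc[movies.star_rating >= 9, :].copy()

# Edit the copy only
top.loc[0, "duration"] = 150

print(top.loc[0, "duration"])     # value in the copy
print(movies.loc[0, "duration"])  # value in the original
```

The copy carries the new duration while the original keeps its old one, so any later edits to `top` are unambiguous about which object they affect.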
