# `SettingWithCopyWarning`

In [22]:
import pandas as pd

## The `SettingWithCopyWarning` should never be ignored!

In [23]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [24]:
# Setup only (the details of these steps are not important here):
df = pd.DataFrame(data)
df.index.name = "Country"
df = df.reset_index()
df["Country"] = df["Country"].astype("string")
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")

In [25]:
df

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
1,Belgium,Brussels,11449656,Philippe,30688
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022
5,Portugal,Lisbon,10295909,,92212
6,Norway,Oslo,5391369,Harald V,385207
7,Greece,Athens,10718565,,131957


Italy decides to change its constitution, and **Queen Vittoria** is the new monarch - time to correct the data!

### Chained indexing

In [26]:
df["Monarch"]

0    Felipe VI
1     Philippe
2         <NA>
3         <NA>
4         <NA>
5         <NA>
6     Harald V
7         <NA>
Name: Monarch, dtype: string

In [27]:
df["Country"] == "Italy"

0    False
1    False
2    False
3     True
4    False
5    False
6    False
7    False
Name: Country, dtype: boolean

In [28]:
df[df["Country"] == "Italy"]

Unnamed: 0,Country,Capital,Population,Monarch,Area
3,Italy,Roma,60390560,,301340


In [29]:
df[df["Country"] == "Italy"]["Monarch"]

3    <NA>
Name: Monarch, dtype: string

In [30]:
df[df["Country"] == "Italy"]["Monarch"] = "Vittoria"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df["Country"] == "Italy"]["Monarch"] = "Vittoria"


<div class="alert alert-warning">

<b>Beware:</b> Every single warning should be read, understood, and <b>the code should be fixed to remove the warning!</b>

</div>
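
A minimal sketch of why the assignment above cannot reach `df`: Python evaluates the chained form as two separate calls, and the write lands on the intermediate object. The two-row DataFrame and the name `tmp` are illustrative assumptions, not part of the notebook's data, and the warning is suppressed here only to keep the demonstration quiet.

```python
import warnings

import pandas as pd

# Hypothetical, reduced version of the countries table.
df = pd.DataFrame({"Country": ["Spain", "Italy"], "Monarch": ["Felipe VI", None]})

# Step 1: indexing with a boolean mask builds a new, separate object.
tmp = df[df["Country"] == "Italy"]

# Step 2: the assignment writes into that separate object, not into df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # for this demonstration only
    tmp["Monarch"] = "Vittoria"

# Read the value back from the original DataFrame.
italy_monarch = df.loc[df["Country"] == "Italy", "Monarch"].iloc[0]

print("tmp:", tmp["Monarch"].iloc[0])
print("df still missing:", pd.isna(italy_monarch))
```

Writing `df[mask]["Monarch"] = ...` on one line performs the same two steps, except that the intermediate object is discarded immediately.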

In [31]:
df

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
1,Belgium,Brussels,11449656,Philippe,30688
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022
5,Portugal,Lisbon,10295909,,92212
6,Norway,Oslo,5391369,Harald V,385207
7,Greece,Athens,10718565,,131957


### `.loc[]` and `.iloc[]` methods

In [32]:
df[df["Country"] == "Italy"]["Monarch"]

3    <NA>
Name: Monarch, dtype: string

In [33]:
df.loc[df["Country"] == "Italy", "Monarch"]

3    <NA>
Name: Monarch, dtype: string

In [34]:
df.loc[df["Country"] == "Italy", "Monarch"] = "Vittoria"

In [35]:
df

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
1,Belgium,Brussels,11449656,Philippe,30688
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,Vittoria,301340
4,Germany,Berlin,83122889,,357022
5,Portugal,Lisbon,10295909,,92212
6,Norway,Oslo,5391369,Harald V,385207
7,Greece,Athens,10718565,,131957


<div class="alert alert-success">

<b>Best Practice:</b> Use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>view values</b> (i.e. avoid <code>][</code>).

</div>

<div class="alert alert-danger">

<b>Warning:</b> Always use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>assign values</b>!

</div>
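
The cells above only exercise `.loc[]`. As a hedged sketch, the same single-call assignment can also be written with `.iloc[]`, which takes integer positions instead of labels or masks. The three-row DataFrame and the names `by_label`, `by_position`, and `col_pos` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical, reduced version of the countries table.
df = pd.DataFrame(
    {
        "Country": ["Spain", "Italy", "Norway"],
        "Monarch": ["Felipe VI", None, "Harald V"],
    }
)

# Label-based: boolean row mask and column label in one indexing call.
by_label = df.copy()
by_label.loc[by_label["Country"] == "Italy", "Monarch"] = "Vittoria"

# Position-based: integer row position and integer column position.
by_position = df.copy()
col_pos = by_position.columns.get_loc("Monarch")
by_position.iloc[1, col_pos] = "Vittoria"  # Italy is at row position 1

print(by_label)
print(by_position)
```

In both cases rows and columns are selected inside a single pair of brackets, so there is no intermediate object for the assignment to get lost in.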

## The `SettingWithCopyWarning` applies to assignments

In [36]:
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

In [37]:
df

Unnamed: 0,a,b
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50


Goal: for the rows where `a` is 3 or less, replace `b` with `b / 10`.

### Chained indexing

In [38]:
df["a"] <= 3

0     True
1     True
2     True
3    False
4    False
Name: a, dtype: bool

In [39]:
df[df["a"] <= 3]

Unnamed: 0,a,b
0,1,10
1,2,20
2,3,30


In [40]:
df[df["a"] <= 3]["b"]

0    10
1    20
2    30
Name: b, dtype: int64

In [41]:
df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10


<div class="alert alert-warning">

<b>Beware:</b> Every single warning should be read, understood, and <b>the code should be fixed to remove the warning!</b>

</div>

In [42]:
df

Unnamed: 0,a,b
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50


### `.loc[]` and `.iloc[]` methods

In [43]:
df[df["a"] <= 3]["b"]

0    10
1    20
2    30
Name: b, dtype: int64

In [44]:
df.loc[df["a"] <= 3, "b"]

0    10
1    20
2    30
Name: b, dtype: int64

`.loc[]` or `.iloc[]` **must be used for assignment**, i.e. on the left of the equals sign:

In [45]:
df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10


In [46]:
df.loc[df["a"] <= 3, "b"] = df[df["a"] <= 3]["b"] / 10

In [47]:
df

Unnamed: 0,a,b
0,1,1.0
1,2,2.0
2,3,3.0
3,4,40.0
4,5,50.0


<div class="alert alert-success">

<b>Best Practice:</b> Use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>view values</b> (i.e. avoid <code>][</code>).

</div>

<div class="alert alert-danger">

<b>Warning:</b> Always use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>assign values</b>!

</div>
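
In the example above, dividing by 10 produces floats that are written into a column created with integers. How pandas handles that dtype change has varied between versions, so one option is to cast the column explicitly before assigning. This is a sketch under the assumption that a float column is the intended result; the name `mask` is introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# Make the target dtype explicit before writing fractional values.
df["b"] = df["b"].astype("float64")

# Build the mask once and use .loc[] on both sides of the assignment.
mask = df["a"] <= 3
df.loc[mask, "b"] = df.loc[mask, "b"] / 10

print(df)
```

Reusing one `mask` variable also keeps the condition in a single place, so the left and right sides cannot drift apart.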

## The `SettingWithCopyWarning` can come from much earlier!

In [48]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [49]:
# Setup only (the details of these steps are not important here):
df = pd.DataFrame(data)
df.index.name = "Country"
df = df.reset_index()
df["Country"] = df["Country"].astype("string")
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")

In [50]:
df

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
1,Belgium,Brussels,11449656,Philippe,30688
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022
5,Portugal,Lisbon,10295909,,92212
6,Norway,Oslo,5391369,Harald V,385207
7,Greece,Athens,10718565,,131957


Focus on large countries:

In [51]:
large = df[df["Population"] > 30_000_000]

In [52]:
large

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022


<div class="alert alert-info">

<b>Note:</b> When a DataFrame is indexed, <code>pandas</code> does not guarantee whether the result is a view or a copy of the underlying data, so a later assignment to that result may or may not affect the original.

</div>

...do lots of analysis on the `large` DataFrame...

Italy decides to change its constitution, and **Queen Vittoria** is the new monarch - time to correct the data!

### Chained indexing

In [53]:
large[large["Country"] == "Italy"]["Monarch"] = "Vittoria"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  large[large["Country"] == "Italy"]["Monarch"] = "Vittoria"


<div class="alert alert-warning">

<b>Beware:</b> Every single warning should be read, understood, and <b>the code should be fixed to remove the warning!</b>

</div>

In [54]:
large

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022


### `.loc[]` and `.iloc[]` methods

In this case, switching to `.loc[]` is not enough, because the `large` DataFrame was itself created by indexing `df`:

In [55]:
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)


Note that in this case the DataFrame was actually modified, but this is not guaranteed: **the same code may fail on a different DataFrame!**

In [56]:
large

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,Vittoria,301340
4,Germany,Berlin,83122889,,357022


The issue is that the `large` DataFrame is itself defined by indexing, so there is still chained indexing even with `.loc[]`:

```python
large = df[df["Population"] > 30_000_000]
...
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"
# ...which is effectively `[...]` chained with `.loc[...]`:
df[df["Population"] > 30_000_000].loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"
```

To fix this issue, define `large` using `.copy()` to ensure that it is not a view:

In [57]:
large = df[df["Population"] > 30_000_000].copy()

In [58]:
large

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,,301340
4,Germany,Berlin,83122889,,357022


In [59]:
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"

In [60]:
large

Unnamed: 0,Country,Capital,Population,Monarch,Area
0,Spain,Madrid,46733038,Felipe VI,505990
2,France,Paris,67076000,,640679
3,Italy,Roma,60390560,Vittoria,301340
4,Germany,Berlin,83122889,,357022
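
As a small, hedged sketch of what `.copy()` buys: the subset becomes an independent DataFrame, so the assignment is unambiguous and the original `df` is left untouched. The three-row DataFrame below is a reduced, illustrative stand-in for the countries table.

```python
import pandas as pd

# Hypothetical, reduced version of the countries table.
df = pd.DataFrame(
    {
        "Country": ["Spain", "Italy", "Norway"],
        "Population": [46733038, 60390560, 5391369],
        "Monarch": ["Felipe VI", None, "Harald V"],
    }
)

# An explicit copy: `large` no longer has any link to `df`.
large = df[df["Population"] > 30_000_000].copy()
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"

large_value = large.loc[large["Country"] == "Italy", "Monarch"].iloc[0]
original_value = df.loc[df["Country"] == "Italy", "Monarch"].iloc[0]

print("large:", large_value)
print("df unchanged:", pd.isna(original_value))
```

If the correction should also appear in `df`, it has to be applied to `df` separately, since the two DataFrames are now independent.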
