# `SettingWithCopyWarning`

In [None]:
import pandas as pd

## The `SettingWithCopyWarning` should never be ignored!

In [None]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [None]:
# Build the DataFrame; the details of these steps do not matter for now:
df = pd.DataFrame(data)
df.index.name = "Country"
df = df.reset_index()
df["Country"] = df["Country"].astype("string")
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")

In [None]:
df

Italy decides to change its constitution, and **Queen Vittoria** is the new monarch - time to correct the data!

### Chained indexing

In [None]:
df["Monarch"]

In [None]:
df["Country"] == "Italy"

In [None]:
df[df["Country"] == "Italy"]

In [None]:
df[df["Country"] == "Italy"]["Monarch"]

In [None]:
df[df["Country"] == "Italy"]["Monarch"] = "Vittoria"

<div class="alert alert-warning">

<b>Beware:</b> Every warning should be read and understood, and <b>the code should be fixed so that the warning no longer appears!</b>

</div>

In [None]:
df
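The behaviour above can be reproduced in isolation. The following is a minimal sketch, assuming a reduced three-country table (the name `small` and the warning-capturing code are illustrative and not part of this notebook). It records whatever warning pandas emits for the chained assignment and checks that the original DataFrame is left untouched. The exact warning class depends on the pandas version, so the sketch only prints its name.

```python
import warnings

import pandas as pd

# Hypothetical reduced table, for illustration only.
small = pd.DataFrame(
    {
        "Country": ["Spain", "Italy", "Norway"],
        "Monarch": ["Felipe VI", None, "Harald V"],
    }
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    # Chained indexing: the boolean selection creates a new object,
    # and the assignment only modifies that temporary object.
    small[small["Country"] == "Italy"]["Monarch"] = "Vittoria"

# Names of the warning categories that were emitted (version dependent).
print([w.category.__name__ for w in caught])

# The original DataFrame still has no monarch for Italy.
print(small.loc[small["Country"] == "Italy", "Monarch"].isna().all())  # → True
```

Capturing the warning this way is only meant for inspection; in real code the fix is to remove the chained indexing, not to silence the warning.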

### `.loc[]` and `.iloc[]` methods

In [None]:
df[df["Country"] == "Italy"]["Monarch"]

In [None]:
df.loc[df["Country"] == "Italy", "Monarch"]

In [None]:
df.loc[df["Country"] == "Italy", "Monarch"] = "Vittoria"

In [None]:
df

<div class="alert alert-success">

<b>Best Practice:</b> Use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>view values</b> (i.e. avoid <code>][</code>).

</div>

<div class="alert alert-danger">

<b>Warning:</b> Always use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>assign values</b>!

</div>
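The cells above only demonstrate `.loc[]`. As a rough sketch of the positional counterpart, the same kind of assignment can be written with `.iloc[]`, assuming the row position is known (the `countries` table, `row_pos` and `col_pos` are illustrative names, not part of this notebook):

```python
import pandas as pd

# Hypothetical reduced table, for illustration only.
countries = pd.DataFrame(
    {
        "Country": ["Spain", "Italy", "Norway"],
        "Monarch": ["Felipe VI", None, "Harald V"],
    }
)

# .iloc[] needs integer positions for both the row and the column.
row_pos = 1  # Italy is the second row of this small table
col_pos = countries.columns.get_loc("Monarch")

# One indexing operation, so the assignment reaches `countries` itself.
countries.iloc[row_pos, col_pos] = "Vittoria"
print(countries)
```

Label-based selection with `.loc[]` is usually easier to read when the row is found through a condition, as in the cells above; `.iloc[]` fits better when positions are what is known.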

## The `SettingWithCopyWarning` applies to assignments

In [None]:
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

In [None]:
df

For rows where `a` is 3 or less, replace the value of `b` with `b / 10`.

### Chained indexing

In [None]:
df["a"] <= 3

In [None]:
df[df["a"] <= 3]

In [None]:
df[df["a"] <= 3]["b"]

In [None]:
df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10

<div class="alert alert-warning">

<b>Beware:</b> Every warning should be read and understood, and <b>the code should be fixed so that the warning no longer appears!</b>

</div>

In [None]:
df

### `.loc[]` and `.iloc[]` methods

In [None]:
df[df["a"] <= 3]["b"]

In [None]:
df.loc[df["a"] <= 3, "b"]

The `.loc[]` and `.iloc[]` methods **must be used for assignment**, i.e. on the left-hand side of the equals sign:

In [None]:
df[df["a"] <= 3]["b"] = df[df["a"] <= 3]["b"] / 10

In [None]:
df.loc[df["a"] <= 3, "b"] = df[df["a"] <= 3]["b"] / 10

In [None]:
df
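One possible way to avoid chained indexing on both sides of the assignment is to compute the boolean mask once and reuse it. This is a sketch under the assumption that `b` is stored as floats, so the divided values fit without any dtype change; `frame` and `mask` are illustrative names.

```python
import pandas as pd

# Float column so that the divided values fit without any dtype change.
frame = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10.0, 20.0, 30.0, 40.0, 50.0]})

mask = frame["a"] <= 3  # compute the row selection once

# A single .loc[] call on each side: no chained indexing anywhere.
frame.loc[mask, "b"] = frame.loc[mask, "b"] / 10

print(frame["b"].tolist())  # → [1.0, 2.0, 3.0, 40.0, 50.0]
```

Reusing `mask` also guarantees that the rows being read and the rows being written are the same selection.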

<div class="alert alert-success">

<b>Best Practice:</b> Use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>view values</b> (i.e. avoid <code>][</code>).

</div>

<div class="alert alert-danger">

<b>Warning:</b> Always use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>assign values</b>!

</div>

## The `SettingWithCopyWarning` can come from much earlier!

In [None]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [None]:
# Build the DataFrame; the details of these steps do not matter for now:
df = pd.DataFrame(data)
df.index.name = "Country"
df = df.reset_index()
df["Country"] = df["Country"].astype("string")
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")

In [None]:
df

Focus on large countries:

In [None]:
large = df[df["Population"] > 30_000_000]

In [None]:
large

<div class="alert alert-info">

<b>Note:</b> In general, <code>pandas</code> cannot guarantee whether it returns a view or a copy of the underlying data.

</div>
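For the curious, one way to inspect whether two pandas objects currently share memory is NumPy's `np.shares_memory`. The sketch below assumes a reduced table (`table` and `subset` are illustrative names). For this boolean-mask selection the check is expected to report no shared memory, but the result should be treated as an implementation detail and not as something to rely on in analysis code.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced table, for illustration only.
table = pd.DataFrame(
    {
        "Country": ["Spain", "Belgium", "France"],
        "Population": [46733038, 11449656, 67076000],
    }
)

# Boolean-mask selection, the same pattern used to build `large`.
subset = table[table["Population"] > 30_000_000]

# True would mean both objects read from the same memory buffer.
shared = np.shares_memory(
    table["Population"].to_numpy(), subset["Population"].to_numpy()
)
print(shared)
```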

...do lots of analysis on the `large` DataFrame...

Italy decides to change its constitution, and **Queen Vittoria** is the new monarch - time to correct the data!

### Chained indexing

In [None]:
large[large["Country"] == "Italy"]["Monarch"] = "Vittoria"

<div class="alert alert-warning">

<b>Beware:</b> Every warning should be read and understood, and <b>the code should be fixed so that the warning no longer appears!</b>

</div>

In [None]:
large

### `.loc[]` and `.iloc[]` methods

In this case, using the `.loc[]` and `.iloc[]` methods is not enough, because the `large` DataFrame itself was selected with indexing:

In [None]:
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"

Note that in this case the DataFrame was actually modified, but this is not guaranteed: **the same code may silently fail to update a different DataFrame!**

In [None]:
large

The issue is that the `large` DataFrame is itself defined by indexing, so there is still chained indexing even with `.loc[]`:

```python
large = df[df["Population"] > 30_000_000]
...
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"
# Substituting the definition of `large`, this is effectively chained indexing:
df[df["Population"] > 30_000_000].loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"
```

To fix this issue, define `large` using `.copy()` to ensure that it is not a view:

In [None]:
large = df[df["Population"] > 30_000_000].copy()

In [None]:
large

In [None]:
large.loc[large["Country"] == "Italy", "Monarch"] = "Vittoria"

In [None]:
large
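The `.copy()` fix can be checked end to end with a small, self-contained sketch. It assumes a reduced three-country table (`countries`, `big` and `copy_warnings` are illustrative names) and matches warning classes by name, because the class pandas uses for this situation differs between versions.

```python
import warnings

import pandas as pd

# Hypothetical reduced table, for illustration only.
countries = pd.DataFrame(
    {
        "Country": ["Spain", "Italy", "Norway"],
        "Population": [46733038, 60390560, 5391369],
        "Monarch": ["Felipe VI", None, "Harald V"],
    }
)

# Explicit copy: `big` owns its data and is independent of `countries`.
big = countries[countries["Population"] > 30_000_000].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    big.loc[big["Country"] == "Italy", "Monarch"] = "Vittoria"

# Keep only copy/chained-assignment related warnings, matched by class name.
copy_warnings = [
    w
    for w in caught
    if "SettingWithCopy" in w.category.__name__
    or "ChainedAssignment" in w.category.__name__
]
print(len(copy_warnings))  # → 0

# The copy was updated; the original table was not.
print(big.loc[big["Country"] == "Italy", "Monarch"].iloc[0])  # → Vittoria
print(countries.loc[countries["Country"] == "Italy", "Monarch"].isna().all())  # → True
```

The last line is worth keeping in mind: after `.copy()`, changes to the subset never propagate back to the original DataFrame, so a correction that should apply to both has to be made on both.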