# <center><div style="width: 370px;"> ![Panel Data](pictures/Panel_Data.jpg)

# <center> Copying

In [1]:
import pandas as pd
import numpy as np

The [`copy()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html#pandas.DataFrame.copy) method on pandas objects copies the underlying data (though not
the axis indexes, since they are immutable) and returns a new object. Note that
**it is seldom necessary to copy objects**. For example, there are only a
handful of ways to alter a DataFrame *in-place*:

- Inserting, deleting, or modifying a column.
- Assigning to the `index` or `columns` attributes.
- For homogeneous data, directly modifying the values via the `values` attribute or advanced indexing (discussed later in the indexing section).

To be clear, no pandas method has the side effect of modifying your data; almost every method returns a new object, leaving the original object untouched. **If the data is modified, it is because you did so explicitly.**
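As a minimal sketch of the points above (the frame and its column names are made up for illustration), a method such as `sort_values` hands back a new object, while inserting a column and assigning to `columns` change the frame in place:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# A method call returns a new object; the original keeps its row order.
sorted_df = df.sort_values("a")
print(sorted_df is df)       # False
print(df["a"].tolist())      # [3, 1, 2]

# Explicit in-place changes: insert a column, then assign to `columns`.
df["b"] = df["a"] * 10
df.columns = ["x", "y"]
print(df.columns.tolist())   # ['x', 'y']
```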

## Copy Data for Modification

The `copy()` method creates a copy of a pandas object. Assigning an object to a new variable does not copy it: the variable is just another reference (pointer) to the same object, so any change made through the new name also changes the original. For example:

In [2]:
s = pd.Series(['a', 'b', 'c', 'd'])
s

0    a
1    b
2    c
3    d
dtype: object

In [3]:
# create a second reference to the same Series (no copy is made)
new_s = s

In [4]:
new_s[1]='Changed value'

In [5]:
new_s

0                a
1    Changed value
2                c
3                d
dtype: object

In [6]:
s

0                a
1    Changed value
2                c
3                d
dtype: object
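One way to confirm that assignment did not create a second Series is Python's `is` operator, which tests object identity. A small sketch (the names `alias` and `copied` are illustrative):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "d"])

alias = s            # plain assignment: a second name for the same object
copied = s.copy()    # a new, independent Series

print(alias is s)         # True
print(copied is s)        # False
print(copied.equals(s))   # True: same contents, different object
```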

To get an independent copy of a `Series` or `DataFrame`, use the `copy()` method. It accepts one parameter, `deep` (default `True`), and returns an object of the same type as the caller. With the default, both the data and the index of the calling object are copied.

In [7]:
s = pd.Series(['a', 'b', 'c', 'd'])
s

0    a
1    b
2    c
3    d
dtype: object

In [8]:
new_s = s.copy()

In [9]:
new_s[1] = 'Changed value'

In [10]:
new_s

0                a
1    Changed value
2                c
3                d
dtype: object

In [11]:
s

0    a
1    b
2    c
3    d
dtype: object

When `deep=False`, the new object is created without copying the calling object's data or index (only references to the data and index are copied). Any changes to the data of the original are reflected in the shallow copy (and vice versa).

# <center><div style="width: 370px;"> ![Shallow copy](pictures/shallowcopy.jpg)

When `deep=True` (the default), the new object is created with its own copy of the calling object's data and index. Changes to the data or index of the copy are not reflected in the original object (and vice versa).

# <center><div style="width: 370px;"> ![Deep copy](pictures/deepcopy.jpg)
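To check which kind of copy shares its underlying buffer with the original, one option is NumPy's `np.shares_memory`. The sketch below assumes a single integer column (named `a` purely for illustration), for which `to_numpy()` is expected to return a view of the stored data rather than a fresh array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

shallow = df.copy(deep=False)
deep = df.copy()  # deep=True is the default

original_values = df["a"].to_numpy()

# Compare the buffer behind each copy with the original's buffer.
shares_with_shallow = np.shares_memory(original_values, shallow["a"].to_numpy())
shares_with_deep = np.shares_memory(original_values, deep["a"].to_numpy())

print("shallow copy shares memory:", shares_with_shallow)
print("deep copy shares memory:", shares_with_deep)
```

The shallow copy should report shared memory and the deep copy should not, which matches the two descriptions above.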

You may be wondering: if a shallow copy does not copy the data, how does it differ from direct assignment? Let's explore this with an example.

In [12]:
df = pd.DataFrame(
    [[1, 'a'], [2, 'b']],
    columns=['int', 'string']
)

df

|   | int | string |
|---|-----|--------|
| 0 | 1   | a      |
| 1 | 2   | b      |


In [13]:
new_df = df

In [14]:
new_df['new_column'] = 0

In [15]:
new_df

|   | int | string | new_column |
|---|-----|--------|------------|
| 0 | 1   | a      | 0          |
| 1 | 2   | b      | 0          |


In [16]:
df

|   | int | string | new_column |
|---|-----|--------|------------|
| 0 | 1   | a      | 0          |
| 1 | 2   | b      | 0          |


In [17]:
df.int is new_df.int, df.string is new_df.string

(True, True)

In [18]:
df.index is new_df.index

True

In [20]:
df.new_column is new_df.new_column

True

With plain assignment, both names refer to the same object, so every change shows up under either name. Now let's look at a shallow copy:

In [21]:
df = pd.DataFrame(
    [[1, 'a'], [2, 'b']],
    columns=['int', 'string']
)

df

|   | int | string |
|---|-----|--------|
| 0 | 1   | a      |
| 1 | 2   | b      |


In [22]:
new_df = df.copy(deep=False)

In [23]:
new_df['new_column'] = 0

In [24]:
new_df

|   | int | string | new_column |
|---|-----|--------|------------|
| 0 | 1   | a      | 0          |
| 1 | 2   | b      | 0          |


In [25]:
df

|   | int | string |
|---|-----|--------|
| 0 | 1   | a      |
| 1 | 2   | b      |


As you can see, the new column is not added to the original DataFrame: the shallow copy is a new DataFrame object that only holds references to the original's data. In general, a shallow copy allows you to:

- Access a frame's data without copying it (saving memory, for example).
- Modify a frame's structure without the change being reflected in the original DataFrame.

Of course, if you use plain assignment instead of a shallow copy, those structural changes are reflected in the original.

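Note that the underlying data is still shared, so writing a value through the shallow copy also changes the original. (This is the behavior when Copy-on-Write is not enabled; with Copy-on-Write, pandas copies the data on the first write, so the original stays unchanged.)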

In [26]:
np.may_share_memory(df.string, new_df.string)

True

In [27]:
np.may_share_memory(df.int, new_df.int)

True

In [28]:
np.may_share_memory(df.index, new_df.index)

True

In [29]:
new_df.at[0, 'string'] = 'new value'

In [30]:
new_df

|   | int | string    | new_column |
|---|-----|-----------|------------|
| 0 | 1   | new value | 0          |
| 1 | 2   | b         | 0          |


In [31]:
df

|   | int | string    |
|---|-----|-----------|
| 0 | 1   | new value |
| 1 | 2   | b         |
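Whether a write through a shallow copy reaches the original, as in the output above, depends on the pandas version and on whether Copy-on-Write is active. The sketch below does not assume either mode: it performs the write and reports what happened, while a deep copy taken beforehand stays unchanged in both cases. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"int": [1, 2], "string": ["a", "b"]})

shallow = df.copy(deep=False)
deep = df.copy()  # deep=True is the default

# Write through the shallow copy.
shallow.loc[0, "int"] = 99

# Did the write reach the original? This depends on the Copy-on-Write mode.
write_propagated = bool(df.loc[0, "int"] == 99)
print("write reached the original:", write_propagated)

# The deep copy owns its data, so it never sees the write.
print("deep copy still holds:", deep.loc[0, "int"])
```

When the original must stay untouched regardless of mode, a deep copy is the safe choice.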
