![alt text](pandas.png "Title")

In [1]:
import pandas as pd

# DataFrame copies

Remember the discussion of copying mutable objects? Plain assignment does not create a new object; it only binds a second name to the same object.

DataFrames are mutable! Therefore `df1 = df` gives you another reference to the same DataFrame, not a copy.

__A change made through either name changes the original!__

In [2]:
# Creating a small df
df = pd.DataFrame( {'col1': [1, 2, 3]})
df

Unnamed: 0,col1
0,1
1,2
2,3


In [3]:
# Taking a so-called 'copy'
df2 = df

# Changing the new df
df2['col1'] = df2['col1'] * 2
df2 

Unnamed: 0,col1
0,2
1,4
2,6


In [4]:
# The original dataframe has been changed too :o
df

Unnamed: 0,col1
0,2
1,4
2,6
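A quick way to check whether two names point to the same DataFrame is the `is` operator. The snippet below is a minimal sketch; the names `alias` and `independent` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]})

alias = df               # plain assignment: a second name for the same object
independent = df.copy()  # a new DataFrame holding its own copy of the data

same_object = alias is df         # True: both names refer to one object
copy_is_same = independent is df  # False: copy() built a separate object
print(same_object, copy_is_same)
```
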


In [5]:
df = pd.DataFrame( {'col1': [1, 2, 3]})

# Maybe I want a copy:
df2 = df.copy()

# here's an alternative way to cut the links between df and df2
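# (to_numpy(copy=True) makes sure the new DataFrame does not share the underlying array)
# df2 = pd.DataFrame(data=df.to_numpy(copy=True), columns=df.columns)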

df2['col1'] = df2['col1'] * 2

# that looks better:
df

Unnamed: 0,col1
0,1
1,2
2,3


## Using mutable objects in a Dataframe is an anti-pattern

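The problem is that even `copy()` does not make a completely independent copy: it duplicates the DataFrame's data and index, but not the Python objects stored in its cells. You can see this when mutable objects (e.g. lists or dictionaries) are stored inside a DataFrame!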

In [6]:
df = pd.DataFrame( {'col1': [['a', 'b', 'c']] } ) # the value is a list, a mutable object
df

Unnamed: 0,col1
0,"[a, b, c]"


In [7]:
# Let's copy df
df2 = df.copy()

# Say that I want to extract that list and append items.

# I could convert the Series 'col1' to a Python list of lists and take the first item:
mylist = list(df2.col1)[0]
print('The list extracted from the df:', mylist, '\n')  
print(type(mylist))

The list extracted from the df: ['a', 'b', 'c'] 

<class 'list'>


In [8]:
# Adding one item to the list
mylist.append('d')
mylist

['a', 'b', 'c', 'd']

In [9]:
# In df2, we see the appended item too, that's ... interesting (?)
df2

Unnamed: 0,col1
0,"[a, b, c, d]"


In [10]:
# In df, we also see the change despite the copy() :-O
df

Unnamed: 0,col1
0,"[a, b, c, d]"
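The reason both DataFrames show the change can be checked directly: after `copy()`, the two cells still hold the very same list object. A small sketch, using `is` to compare identities (the variable `shared` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c']]})
df2 = df.copy()

# copy() duplicated the column, but each cell still references the original list
shared = df.loc[0, 'col1'] is df2.loc[0, 'col1']
print(shared)
```
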


In [11]:
# Here's a way to cut the link between the list and the df
df = pd.DataFrame( {'col1': [['a', 'b', 'c']] } ) 

# Reconstructing the list from scratch using a list comprehension
mylist = [ item for item in df.col1[0] ]

# Adding one item to the list
mylist.append('d')

# df wasn't modified, phew
df

Unnamed: 0,col1
0,"[a, b, c]"
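If the whole copied DataFrame should be independent, one possible approach is to deep-copy every cell of the object column after calling `copy()`. This is a sketch under the assumption that the column is small enough for an element-by-element `copy.deepcopy` to be acceptable; it is not the only way to do it.

```python
import copy
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c']]})

# Copy the DataFrame, then replace each list by an independent deep copy
df2 = df.copy()
df2['col1'] = df2['col1'].apply(copy.deepcopy)

# Mutating the list stored in df2 no longer touches the one stored in df
df2.loc[0, 'col1'].append('d')
print(df.loc[0, 'col1'])
print(df2.loc[0, 'col1'])
```
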


## Conclusion

__Avoid__ using mutable objects inside dataframes, unless you know what you're doing! 

`df2 = df1` doesn't create a copy but a second reference to the same DataFrame, which is far cheaper... if that's what you want :-)

__________________________________________________
Nicolas Dupuis ([Analytics Innovation Team](https://jnj.sharepoint.com/teams/AIT/SitePages/Home.aspx "Sharepoint")), 2020