# Copying Objects using Shallow and Deep Operations

- Objects: *Label* vs *Identity* vs *Value* 
- `id()` function and the `is` operator
- Shallow and Deep copies of the objects

In [1]:
```python
# bind a label (variable name) to an object, which may itself contain other objects
a_list = [1, 'New York', 100]

a_string = '10 days to departure'
print(a_string, id(a_string))  # id() returns the identity of an object
```

```
10 days to departure 3006270118768
```


Why use the `id()` function at all?
- It's most often used for debugging
- To check whether our "copies" actually refer to the object we expect
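A small sketch of that debugging use: checking whether a function handed back the very object it received (and therefore mutated the caller's data). The helper `append_item` is hypothetical, made up for illustration.

```python
def append_item(items, item):
    """Hypothetical helper: appends in place and returns the same list."""
    items.append(item)
    return items

original = [1, 2]
result = append_item(original, 3)

# the same id() means the function mutated and returned the caller's list
print(id(original) == id(result))  # True
print(original is result)          # True, the idiomatic way to compare identities
print(original)                    # [1, 2, 3]
```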



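**Shallow** vs **Deep** Copies
- Plain assignment (`b = a`) doesn't copy anything: both names refer to the same object
- A shallow copy creates a new outer object, but its elements still refer to the same inner objects as the original
- A deep copy creates a new object *and* recursively copies every object nested inside it

We can see this with `is` vs `==`:
- `is` checks whether both operands refer to the same object (i.e., have the same `id()`)
- `==` checks whether the values of both operands are equal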
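Here is a minimal sketch putting the three cases side by side with `is` and `==`. It uses `copy.copy()` from the standard library for the shallow copy, which the cells below achieve with slicing instead.

```python
import copy

original = [1, [2, 3]]

alias = original                # assignment: no copy, just a second label
shallow = copy.copy(original)   # new outer list, same inner list
deep = copy.deepcopy(original)  # new outer list and a new inner list

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[1] is original[1])  # True: the inner list is shared
print(deep[1] is original[1])     # False: the inner list was copied
print(deep == original)           # True: the values are still equal
```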

In [1]:
```python
a_string = ['10', 'days', 'to', 'departure']
b_string = a_string  # plain assignment: no copy is made, both names refer to the same list

print('a_string identity:', id(a_string))
print('b_string identity:', id(b_string))
print('The result of the value comparison:', a_string == b_string)
print('The result of the identity comparison:', a_string is b_string)  # True because the id()s are the same
```

```
a_string identity: 1794571522432
b_string identity: 1794571522432
The result of the value comparison: True
The result of the identity comparison: True
```


In [2]:
```python
a_string = ['10', 'days', 'to', 'departure']
b_string = ['10', 'days', 'to', 'departure']

print('a_string identity:', id(a_string))
print('b_string identity:', id(b_string))  # these two have different id()s
print('The result of the value comparison:', a_string == b_string)  # the lists hold equal values
print('The result of the identity comparison:', a_string is b_string)  # but they are different objects, so `is` returns False
```

```
a_string identity: 1794572872704
b_string identity: 1794572872768
The result of the value comparison: True
The result of the identity comparison: False
```


In [7]:
```python
a_list = [10, "banana", [997, 123]]
b_list = a_list[:]  # slicing with [:] builds a new list holding the same elements (a shallow copy)
print("a_list contents:", a_list)
print("b_list contents:", b_list)
print("Is it the same object?", a_list is b_list)  # False: b_list is a new list object
```

```
a_list contents: [10, 'banana', [997, 123]]
b_list contents: [10, 'banana', [997, 123]]
Is it the same object? False
```

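Slicing isn't the only way to get a shallow copy of a list. This sketch checks four standard options and confirms that each one produces a new outer list that still shares the nested list.

```python
import copy

a_list = [10, "banana", [997, 123]]

# each of these builds a new outer list that shares the nested list
copies = [a_list[:], list(a_list), a_list.copy(), copy.copy(a_list)]

for c in copies:
    print(c is a_list, c[2] is a_list[2])  # False True
```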

In [8]:
```python
# modifying the nested list through b_list also modifies a_list,
# because both outer lists share the same inner list
b_list[2][0] = 112
print("a_list contents:", a_list)
print("b_list contents:", b_list)  # this is the effect of a shallow copy
print("Is it the same object?", a_list is b_list)
```

```
a_list contents: [10, 'banana', [112, 123]]
b_list contents: [10, 'banana', [112, 123]]
Is it the same object? False
```

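With a shallow copy, *rebinding* a top-level slot and *mutating* a shared nested object behave differently. A minimal sketch of both on the same pair of lists:

```python
a_list = [10, "banana", [997, 123]]
b_list = a_list[:]  # shallow copy

b_list[0] = 99         # rebinds a slot in b_list only; a_list is untouched
b_list[2].append(456)  # mutates the inner list that both lists share

print(a_list)  # [10, 'banana', [997, 123, 456]]
print(b_list)  # [99, 'banana', [997, 123, 456]]
```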

In [9]:
```python
# what about fully independent copies?
import copy
a_list = [10, "banana", [997, 123]]
b_list = copy.deepcopy(a_list)  # recursively copies all the objects found in the original list
print("a_list contents:", a_list)
print("b_list contents:", b_list)
print("Is it the same object?", a_list is b_list)  # the ids are different, as expected
```

```
a_list contents: [10, 'banana', [997, 123]]
b_list contents: [10, 'banana', [997, 123]]
Is it the same object? False
```


In [10]:
```python
# let's see if modifying b_list will change a_list
b_list[2][0] = 112
print("a_list contents:", a_list)
print("b_list contents:", b_list)  # thanks to the deep copy, b_list has its own independent inner list
print("Is it the same object?", a_list is b_list)
```

```
a_list contents: [10, 'banana', [997, 123]]
b_list contents: [10, 'banana', [112, 123]]
Is it the same object? False
```

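`copy.deepcopy()` keeps a memo of the objects it has already copied, so if the same inner object appears twice in the original, the copy reproduces that sharing instead of creating two separate copies. A short sketch:

```python
import copy

inner = [1, 2]
outer = [inner, inner]  # both slots point to the same list

clone = copy.deepcopy(outer)

print(clone[0] is clone[1])  # True: the sharing is reproduced in the copy
print(clone[0] is inner)     # False: but it is a brand-new list
```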

### Just a recap of Shallow vs Deep copies

A *shallow* copy creates a new outer object, but its elements still refer to the same objects as the original; therefore, if we mutate a nested object (like the inner list) through one copy, the change shows up in the other.

A *deep* copy, on the other hand, recursively creates new copies of the nested objects as well (this also works with dictionaries and custom classes); therefore, changing a nested value in one object doesn't change the value in the other.

### Recap of `is` vs `==`:

`is` compares identity (the `id()`), while `==` compares values.

Two variables can have the same value (`True` with `==`) but different `id()`s (`False` with `is`), depending on how they were assigned:

```python
# same id(): both names refer to one list
a_list = ['ele1', 'ele2']
b_list = a_list

# different id(): b_list is a new (shallow) copy of a_list
b_list = a_list[:]
```

### It's VERY important to note:

Just because two objects have different ids **DOESN'T** mean one is a **deep copy** of the other. That's something I misunderstood...

Look at how `b_list = a_list[:]` gives `b_list` a different `id()`, yet it's only a *shallow copy*: its elements refer to the same objects as those in `a_list`; therefore, mutating a nested object (such as the inner list) through one list changes it in the other.

In [14]:
```python
import copy

a_dict = {
    'first name': 'James',
    'last name': 'Bond',
    'movies': ['Goldfinger (1964)', 'You Only Live Twice']
}
b_dict = copy.deepcopy(a_dict)

print('Same memory chunk?', a_dict is b_dict)
# Changing a will not affect b
a_dict['movies'].append('Diamonds Are Forever (1971)')
print('a_dict movies:', a_dict['movies'])
print('b_dict movies:', b_dict['movies'])
```

```
Same memory chunk? False
a_dict movies: ['Goldfinger (1964)', 'You Only Live Twice', 'Diamonds Are Forever (1971)']
b_dict movies: ['Goldfinger (1964)', 'You Only Live Twice']
```

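For contrast, a sketch of the same dictionary copied with `dict.copy()`, which is only a shallow copy (equivalent to `copy.copy()`): the `'movies'` list ends up shared between both dictionaries.

```python
a_dict = {
    'first name': 'James',
    'movies': ['Goldfinger (1964)', 'You Only Live Twice'],
}
b_dict = a_dict.copy()  # shallow: a new dict, but the same 'movies' list

a_dict['movies'].append('Diamonds Are Forever (1971)')
print(b_dict['movies'])                      # the new movie shows up here too
print(a_dict['movies'] is b_dict['movies'])  # True
```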

In [13]:
```python
# deep copies of class instances
import copy

class Example:
    def __init__(self):
        self.properties = ["112", "997"]
        print("Hello from __init__()")

a_example = Example()
# note: deepcopy() doesn't call __init__() again, so there's no second "Hello" print
b_example = copy.deepcopy(a_example)
print("Same memory chunk?", a_example is b_example)
b_example.properties.append('911')
print()
# the deep copy got its own properties list, so appending to it leaves a_example unchanged
print('a_example.properties:', a_example.properties)
print('b_example.properties:', b_example.properties)
```

```
Hello from __init__()
Same memory chunk? False

a_example.properties: ['112', '997']
b_example.properties: ['112', '997', '911']
```

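To confirm that `copy.deepcopy()` skips `__init__()`, here's a sketch with a hypothetical `Counted` class that counts its constructor calls instead of printing.

```python
import copy

class Counted:
    """Hypothetical class that counts how often __init__() runs."""
    init_calls = 0

    def __init__(self):
        Counted.init_calls += 1
        self.properties = ["112", "997"]

original = Counted()
clone = copy.deepcopy(original)

print(Counted.init_calls)                       # 1: deepcopy did not call __init__()
print(clone.properties is original.properties)  # False: the list was copied too
```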

## Pickle Module 

In cooking, **pickling** is a process for extending the lifespan of food.

In the same way, we want to extend the lifespan of our Python objects through *serialization*.

**Serialization** converts an object structure into a stream of bytes containing all of the information needed to reconstruct that object.

**Deserialization** is the opposite: it rebuilds the object from those bytes.

The `pickle` module can be used to "pickle" your Python objects for later use.
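A minimal round trip with `pickle.dumps()` and `pickle.loads()` shows both directions, and ties back to the identity discussion above: the restored object is equal to the original but is a different object.

```python
import pickle

original = {'EUR': {'code': 'Euro', 'symbol': '€'}}

data = pickle.dumps(original)  # serialization: object -> bytes
restored = pickle.loads(data)  # deserialization: bytes -> new object

print(type(data))            # <class 'bytes'>
print(restored == original)  # True: same value
print(restored is original)  # False: a new object, much like a deep copy
```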

The following types can be pickled:
- `None`, booleans;
- integers, floating-point numbers, complex numbers;
- strings, bytes, bytearrays;
- tuples, lists, sets, and dictionaries containing picklable objects;
- objects, including objects with references to other objects (shared and self-referencing objects are handled, but very deeply nested structures can hit the recursion limit)
- references to functions and classes, but not their definitions.
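A quick check of how `pickle` treats an object that refers to itself: since pickle keeps track of the objects it has already serialized, a self-referencing list survives a round trip with the self-reference intact.

```python
import pickle

a_list = [1, 2]
a_list.append(a_list)  # the list now contains itself

restored = pickle.loads(pickle.dumps(a_list))
print(restored[2] is restored)  # True: the self-reference was rebuilt
```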


In [16]:
```python
import pickle

# creating our objects
a_dict = dict()
a_dict['EUR'] = {'code':'Euro', 'symbol': '€'}
a_dict['GBP'] = {'code':'Pounds sterling', 'symbol': '£'}
a_dict['USD'] = {'code':'US dollar', 'symbol': '$'}
a_dict['JPY'] = {'code':'Japanese yen', 'symbol': '¥'}

a_list = ['a', 123, [10, 100, 1000]]

# pickling both objects into the same file, one after the other
file_loc = 'practice_files/multidata.pckl'  # mind the pckl extension
# note that we're using write-binary mode ('wb')
with open(file_loc, 'wb') as file_out:
    pickle.dump(a_dict, file_out)
    pickle.dump(a_list, file_out)
```

In [20]:
```python
# objects come back in the same order they were dumped; note the read-binary mode ('rb')
with open(file_loc, 'rb') as file_in:
    data1 = pickle.load(file_in)
    data2 = pickle.load(file_in)

print(f'{data1} is a {type(data1)}\n{data2} is a {type(data2)}')  # the types of the serialized data are preserved
```

```
{'EUR': {'code': 'Euro', 'symbol': '€'}, 'GBP': {'code': 'Pounds sterling', 'symbol': '£'}, 'USD': {'code': 'US dollar', 'symbol': '$'}, 'JPY': {'code': 'Japanese yen', 'symbol': '¥'}} is a <class 'dict'>
['a', 123, [10, 100, 1000]] is a <class 'list'>
```

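When we don't know in advance how many objects were dumped into a file, we can keep calling `pickle.load()` until it raises `EOFError`. A sketch with a hypothetical `load_all()` helper, writing to a temporary directory so it doesn't touch `practice_files/`:

```python
import os
import pickle
import tempfile

def load_all(path):
    """Hypothetical helper: yields every object pickled into one file, in order."""
    with open(path, 'rb') as file_in:
        while True:
            try:
                yield pickle.load(file_in)
            except EOFError:  # raised once there is nothing left to read
                return

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'multidata.pckl')
    with open(path, 'wb') as file_out:
        for obj in ({'EUR': '€'}, ['a', 123, [10, 100, 1000]]):
            pickle.dump(obj, file_out)
    loaded = list(load_all(path))  # consume the generator before the directory is removed

print(loaded)  # [{'EUR': '€'}, ['a', 123, [10, 100, 1000]]]
```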

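With the `pickle` module we mainly use two functions:
- `pickle.dump(data_object, file_destination)` writes the object to a file opened in binary mode
- `pickle.load(file_object)` reads the next object back from that file

It's similar to the `json` module, which offers `json.dump()` and `json.load()`.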
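One practical difference from `json`, sketched below: pickle restores Python-specific types, while JSON only knows a handful of types, so a tuple comes back as a list and a set can't be encoded at all.

```python
import json
import pickle

data = {'point': (1, 2), 'tags': {'python'}}

# pickle round-trips Python-specific types such as tuples and sets
print(pickle.loads(pickle.dumps(data)) == data)  # True

# JSON has no tuple type, so the tuple comes back as a list
print(json.loads(json.dumps({'point': (1, 2)})))  # {'point': [1, 2]}

# ...and it has no set type at all
try:
    json.dumps(data)
except TypeError as err:
    print('json failed:', err)
```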

As we said about the `pickle` module, a serialized object can also be persisted in a database or sent over a network.

The `pickle.dumps(data_object)` function expects only the **object** to serialize, not a **file**, and returns the result as a `bytes` object.

As for deserialization, `pickle.load(file_object)` takes a **file object**, while `pickle.loads(bytes_object)` takes a plain `bytes` object that we **serialized using `dumps()`**.

In [22]:
```python
# remember pickle was already imported in this ipynb
a_list = ['a', 123, [10, 100, 1000]]
a_bytes = pickle.dumps(a_list)
print('Intermediate object type, used to preserve data:', type(a_bytes))
```

```
Intermediate object type, used to preserve data: <class 'bytes'>
```


In [23]:
```python
# so when you receive a bytes object (e.g., from a database driver or over a network), you can deserialize it
b_list = pickle.loads(a_bytes)
print('Contents:', b_list)
```

```
Contents: ['a', 123, [10, 100, 1000]]
```

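Because `pickle.load()` only needs a binary file-like object (something with `read()` and `readline()`), the two deserialization routes can be bridged by wrapping the bytes in an in-memory `io.BytesIO` stream. A short sketch:

```python
import io
import pickle

a_list = ['a', 123, [10, 100, 1000]]
a_bytes = pickle.dumps(a_list)

# load() reads from any binary file-like object, so an in-memory stream works
from_stream = pickle.load(io.BytesIO(a_bytes))
# loads() accepts the bytes directly
from_bytes = pickle.loads(a_bytes)

print(from_stream == from_bytes == a_list)  # True
```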

### Expected Pickling Errors 

A `PicklingError` exception is raised if we try to pickle an unpicklable object

A `RecursionError` is raised for very deeply nested (highly recursive) data structures

An `AttributeError` is raised when unpickling if a pickled function or class can't be found in your code's namespace
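In practice, the exact exception depends on the object and the Python version (for example, some unpicklable objects raise `TypeError` rather than `PicklingError`), so a defensive sketch catches all of the likely candidates. The `try_pickle()` helper is hypothetical.

```python
import pickle

def try_pickle(obj):
    """Hypothetical helper: returns None if obj pickles, else the exception's class name."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as err:
        return type(err).__name__
    return None

print(try_pickle([1, 2, 3]))                # None
print(try_pickle(lambda: 'not picklable'))  # an exception name: lambdas have no importable name
print(try_pickle(x for x in range(3)))      # an exception name: generators can't be pickled
```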

In [33]:
```python
# functions are picklable, but only by reference (their name), not their code
def f1():
    print("Hello from da jar :D")

with open('practice_files/function.pckl', 'wb') as file_out:
    pickle.dump(f1, file_out)

with open('practice_files/function.pckl', 'rb') as file_in:
    data = pickle.load(file_in)

print(type(data))
data()
```

```
<class 'function'>
Hello from da jar :D
```

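To see that a function is pickled by reference, this sketch looks inside the pickled bytes and compares the loaded result with the original. It assumes the code runs as a script or in a notebook, so the function lives in the importable `__main__` module.

```python
import pickle

def greet_from_jar():
    print("Hello from da jar :D")

blob = pickle.dumps(greet_from_jar)

# the pickle stores the function's module and name, not its code
print(b'greet_from_jar' in blob)  # True

# loading looks the name up again, so we get the very same function object
print(pickle.loads(blob) is greet_from_jar)  # True
```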

In [38]:
```python
# pickling an instance of a class
class Cucumber:
    def __init__(self):
        self.__size = 'small'

    # encapsulation :D
    def get_size(self):
        return self.__size

cucu = Cucumber()
with open('practice_files/cucumber.pckl', 'wb') as file_out:
    pickle.dump(cucu, file_out)

with open('practice_files/cucumber.pckl', 'rb') as file_in:
    data = pickle.load(file_in)

print(data.get_size())
```

```
small
```

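By default, pickle saves an instance's `__dict__` and looks the class up by name when loading (so the class must be importable, here from `__main__`). This sketch peeks at the restored `__dict__`; note how the "private" `__size` attribute is stored under its name-mangled key.

```python
import pickle

class Cucumber:
    def __init__(self):
        self.__size = 'small'

    def get_size(self):
        return self.__size

restored = pickle.loads(pickle.dumps(Cucumber()))

# the instance state is its __dict__; note the name-mangled key
print(restored.__dict__)    # {'_Cucumber__size': 'small'}
print(restored.get_size())  # small
```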

### What about the shelve module?

We know that pickling works by converting objects into a single byte stream.

The `shelve` module works as a **serialization dictionary**, where pickled objects are **associated with a key** (keys are strings).

In [42]:
```python
import shelve

shelve_name = 'practice_files/first_shelve.shlv'

"""
Shelve has its own optional flag parameter:
r = reading only
w = reading and writing
c = reading, writing and creating if it doesn't exist (default value)
n = always create a new, empty database for reading and writing
"""

# flag='c' creates the shelf file if it doesn't exist yet
my_shelve = shelve.open(shelve_name, flag='c')

# adding the data to our shelf
my_shelve['EUR'] = {'code':'Euro', 'symbol': '€'}
my_shelve['GBP'] = {'code':'Pounds sterling', 'symbol': '£'}
my_shelve['USD'] = {'code':'US dollar', 'symbol': '$'}
my_shelve['JPY'] = {'code':'Japanese yen', 'symbol': '¥'}

my_shelve.close()
```

In [43]:
```python
# open our shelf and access data like a dictionary
new_shelve = shelve.open(shelve_name)
print(new_shelve['USD'])
new_shelve.close()
```

```
{'code': 'US dollar', 'symbol': '$'}
```

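One gotcha connects back to the copying discussion: with the default `writeback=False`, each `shelf[key]` read returns a freshly unpickled copy, so mutating it in place doesn't change what's stored. A sketch (using a temporary directory and a hypothetical file name) of the lost update and the reassignment fix; passing `writeback=True` to `shelve.open()` is the other option, at the cost of caching accessed entries in memory.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'movies_shelf')  # hypothetical file name

    with shelve.open(path) as db:  # flag='c' by default
        db['movies'] = ['Goldfinger (1964)']

        # db['movies'] returns a fresh copy, so this append is lost
        db['movies'].append('You Only Live Twice')
        lost_update = db['movies']
        print(lost_update)  # ['Goldfinger (1964)']

        # reading, modifying, and reassigning the key stores the update
        movies = db['movies']
        movies.append('You Only Live Twice')
        db['movies'] = movies
        saved_update = db['movies']
        print(saved_update)  # ['Goldfinger (1964)', 'You Only Live Twice']
```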

### So tell me how shelves work again?

We start by importing the `shelve` module:
`import shelve`

To use a shelf we mainly call
`shelve.open(file, flag='mode')`
where mode can be:
- r for reading only
- w for reading and writing
- c (default) for reading, writing, and creating a new file if it doesn't exist
- n for always creating a new, empty database for reading and writing

Knowing this, let's create our shelf and bind it to a variable:
`new_shelve = shelve.open('file.shlv', flag='c')`

To add data, treat the shelf as a dictionary (values can be any picklable object):
`new_shelve['key'] = value`

Then, like other files, we close it:
`new_shelve.close()`

To access the data in our `.shlv` file:
- we open it using `shelve.open('file.shlv')`
- we access data by key, like we could with any dictionary
    - `new_shelve['key']`
- we close that file
    - `new_shelve.close()`
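The same steps can be sketched with `with` blocks, since shelves work as context managers and close themselves at the end of the block. The temporary directory keeps the example self-contained; `file.shlv` is just the placeholder name from the steps above.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, 'file.shlv')

    # create the shelf, add data, and close it automatically at the end of the block
    with shelve.open(file_name, flag='c') as new_shelve:
        new_shelve['USD'] = {'code': 'US dollar', 'symbol': '$'}

    # reopen it read-only and access the data by key
    with shelve.open(file_name, flag='r') as new_shelve:
        usd = new_shelve['USD']
        keys = list(new_shelve.keys())

print(usd)   # {'code': 'US dollar', 'symbol': '$'}
print(keys)  # ['USD']
```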

   

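## New File extensions to know
- Pickle = `.pckl`
- Shelve = `.shlv`

These are conventions used in these notes, not requirements: `pickle` accepts any file name, and depending on the underlying `dbm` backend, `shelve` may create extra files with their own suffixes.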