# Garbage Collection in Python
> How to clear memory after using it in Python

This is a beginner-level notebook. It discusses garbage collection and the `gc` module in Python.

In [1]:
import datatable as dt
import pandas as pd

In [2]:
data = dt.fread('car-sales.csv').to_pandas()
data.head()

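|   | Make | Colour | Odometer (KM) | Doors | Price |
|---|------|--------|---------------|-------|-------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 |
| 1 | Honda | Red | 87899 | 4 | \$5,000.00 |
| 2 | Toyota | Blue | 32549 | 3 | \$7,000.00 |
| 3 | BMW | Black | 11179 | 5 | \$22,000.00 |
| 4 | Nissan | White | 213095 | 4 | \$3,500.00 |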


In [3]:
data_2 = data

In [4]:
print("data id", id(data))
print("data_2 id", id(data_2))

data id 139997003741120
data_2 id 139997003741120


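**It's the same address (in CPython, `id()` returns the object's memory address): both names refer to one object, so a change made through one name shows up through the other.**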

In [5]:
data['test_1'] = 'test_1'

In [23]:
display(data.head(1))
display(data_2.head(1))

|   | Make | Colour | Odometer (KM) | Doors | Price |
|---|------|--------|---------------|-------|-------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 |


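|   | Make | Colour | Odometer (KM) | Doors | Price | test_2 |
|---|------|--------|---------------|-------|-------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_2 |

Note the execution count `In [23]`: this cell was re-run at the end, so the output above reflects the later cells (`test_1` dropped, `test_2` added, `data` reassigned to a new object). Run right after `In [5]`, both frames would show the same `test_1` column.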


**But the `copy()` method creates a new object with a different address.**

In [7]:
data_3 = data.copy()
data_3.head(1)

|   | Make | Colour | Odometer (KM) | Doors | Price | test_1 |
|---|------|--------|---------------|-------|-------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_1 |


In [8]:
print("data id", id(data))
print("data_2 id", id(data_2))
print("data_3 id", id(data_3))

data id 139997003741120
data_2 id 139997003741120
data_3 id 139995637919856
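The same alias-versus-copy behaviour can be checked without pandas. Below is a minimal sketch using a plain Python list (the names `original`, `alias` and `copied` are just for illustration); `is` compares object identity, which is exactly what matching `id()` values show.

```python
def alias_vs_copy():
    original = [1, 2, 3]
    alias = original          # second name for the same list object
    copied = original.copy()  # new list object holding the same items

    original.append(4)        # mutate through one name only

    return {
        "alias_is_same": alias is original,   # same object, same id()
        "copy_is_same": copied is original,   # different object, different id()
        "alias_sees_change": alias == [1, 2, 3, 4],
        "copy_sees_change": copied == [1, 2, 3, 4],
    }


print(alias_vs_copy())
```

One difference worth knowing: `list.copy()` is shallow (the items themselves are shared), while pandas' `DataFrame.copy()` defaults to `deep=True` and copies the data as well.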


In [9]:
data_2["test_2"] = "test_2"

In [10]:
display(data.head(1))
display(data_2.head(1))

|   | Make | Colour | Odometer (KM) | Doors | Price | test_1 | test_2 |
|---|------|--------|---------------|-------|-------|--------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_1 | test_2 |


|   | Make | Colour | Odometer (KM) | Doors | Price | test_1 | test_2 |
|---|------|--------|---------------|-------|-------|--------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_1 | test_2 |


**Changing a table in place (`inplace=True`) keeps the same DataFrame object, so its address doesn't change and `data_2` sees the change too.**

In [11]:
data.drop('test_1', axis=1, inplace=True)

In [12]:
display(data.head(1))
display(data_2.head(1))

|   | Make | Colour | Odometer (KM) | Doors | Price | test_2 |
|---|------|--------|---------------|-------|-------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_2 |


|   | Make | Colour | Odometer (KM) | Doors | Price | test_2 |
|---|------|--------|---------------|-------|-------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_2 |


**But `drop()` without `inplace=True` returns a new DataFrame, so `data = data.drop(...)` binds `data` to a new object at a new address, while `data_2` still points to the old one.**

In [13]:
data = data.drop('test_2', axis=1)

In [14]:
display(data.head(1))
display(data_2.head(1))

|   | Make | Colour | Odometer (KM) | Doors | Price |
|---|------|--------|---------------|-------|-------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 |


|   | Make | Colour | Odometer (KM) | Doors | Price | test_2 |
|---|------|--------|---------------|-------|-------|--------|
| 0 | Toyota | White | 150043 | 4 | \$4,000.00 | test_2 |
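The in-place versus reassign distinction has a direct analogue in plain Python lists, which makes for a dependency-free sketch. This is an analogy, not pandas itself: `list.sort()` mutates the list and returns `None` (like `inplace=True`), while the built-in `sorted()` builds a new list (like `drop()` without `inplace`).

```python
def inplace_vs_new():
    data = [3, 1, 2]
    data_2 = data                       # alias, like data_2 = data above

    result = data.sort()                # in place: mutates the list, returns None
    after_inplace = data is data_2      # still the same object

    data = sorted(data, reverse=True)   # builds a NEW list and rebinds `data`
    after_reassign = data is data_2     # data_2 still holds the old list

    return result, after_inplace, after_reassign, data, data_2


print(inplace_vs_new())
```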


**What if we split the data? Do the parts get new addresses?**

In [15]:
X = data.drop('Price', axis=1)
y = data['Price']

In [16]:
print("X id", id(X))
print("y id", id(y))
print("data id", id(data))

X id 139995637485824
y id 139995637488032
data id 139995637486688


## Trying it with simple variables

In [17]:
a = 1
b = a
c = a + 1

print("a id", id(a))
print("b id", id(b))
print("c id", id(c))

a = a + 1

print()
print("a id", id(a))
print("b id", id(b))
print("c id", id(c))

a id 139997058959664
b id 139997058959664
c id 139997058959696

a id 139997058959696
b id 139997058959664
c id 139997058959696


In [18]:
a = 1
b = a
c = a + 1

print("a id", id(a))
print("b id", id(b))
print("c id", id(c))

a = b + 1

print()
print("a id", id(a))
print("b id", id(b))
print("c id", id(c))

a id 139997058959664
b id 139997058959664
c id 139997058959696

a id 139997058959696
b id 139997058959664
c id 139997058959696


**So the right-hand side is evaluated to an object first, and the name on the left is then bound to that object; other names that pointed to the old object keep pointing to it.**
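A small sketch of the difference between rebinding a name and mutating an object: for an immutable `int`, `a += 1` evaluates `a + 1` to an object and rebinds `a`, leaving `b` alone; for a mutable `list`, `+=` extends the existing object, so every name pointing at it sees the change.

```python
def rebind_vs_mutate():
    # Immutable int: "+=" produces another object and rebinds the name.
    a = 1
    b = a
    a += 1

    # Mutable list: "+=" extends the existing object in place.
    xs = [1]
    ys = xs
    xs += [2]

    return a, b, xs is ys, ys


print(rebind_vs_mutate())
```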

I just noticed that after increasing `a` by 1 (it's now 2, the same as `c`), `a` and `c` have the same address.

Does that mean Python will create one memory slot for a specific value and any variable with that value will get the same address?

In [19]:
d = 2
print("d id", id(d))

d id 139997058959696


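It looks like it. I'm impressed, ngl. Strictly speaking, though, this only holds for some values: CPython preallocates the small integers from -5 to 256 and reuses them, so every `2` is the same object. Other values, such as larger integers computed at runtime, normally get a new object each time.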
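A minimal sketch of that small-int cache, assuming CPython (other implementations may behave differently). The values are computed at runtime so the compiler cannot merge them into one shared constant.

```python
def small_int_cache():
    n = 255
    small_a = n + 1   # 256: inside CPython's small-int cache (-5 to 256)
    small_b = n + 1

    m = 999
    big_a = m + 1     # 1000: outside the cache, a fresh object each time
    big_b = m + 1

    # identity of small ints, identity of big ints, equality of big ints
    return small_a is small_b, big_a is big_b, big_a == big_b


print(small_int_cache())  # on CPython: (True, False, True)
```

This is an implementation detail to poke at, not something to rely on: compare numbers with `==`, never with `is`.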

I read a bit of an [article](https://stackify.com/python-garbage-collection/#wpautbox_about) about Python garbage collection.

What I got from it: each object is allocated once, and CPython keeps a *reference count* for it, i.e. how many references (variable names, list entries, and so on) currently point to that object. When the count drops to 0, the object is deallocated immediately.

In [20]:
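import sys

# print() each count; otherwise a notebook only shows the last expression
print(sys.getrefcount(2))
e = 2  # bind one more name to the cached int 2
print(sys.getrefcount(2))
del e  # remove that name again
print(sys.getrefcount(2))
# On Python 3.12+ small ints are immortal (PEP 683),
# so all three numbers come out the same there.

1928
1929
1928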
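`sys.getrefcount(2)` is hard to read because the interpreter itself holds many references to the integer `2`. A sketch with a fresh object is clearer: the class `Box` is hypothetical, made up for this demo, and `weakref.finalize` is used only to observe the moment the object is freed. Absolute counts vary between Python versions, so only the differences are printed; immediate freeing at count 0 is CPython behaviour.

```python
import sys
import weakref


class Box:
    """Hypothetical tiny class; user-defined instances support weak references."""


def refcount_demo():
    obj = Box()
    before = sys.getrefcount(obj)

    alias = obj                        # a second reference to the same object
    with_alias = sys.getrefcount(obj)

    del alias                          # drop that reference again
    after_del = sys.getrefcount(obj)

    events = []
    # finalize() holds only a weak reference, so it does not keep obj alive
    weakref.finalize(obj, events.append, "freed")
    del obj                            # last strong reference: count reaches 0
    return before, with_alias, after_del, events


before, with_alias, after_del, events = refcount_demo()
print(with_alias - before, after_del - before, events)
```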

From this [article](https://www.pythonpool.com/python-clear-memory/):
> After using this type of statement, the objects are no longer accessible for the given code. But, the objects are still there in the memory.

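So the article suggests calling `gc.collect()` afterwards to free the memory. In CPython that is only part of the story: `del` removes the name, and if that was the last reference, reference counting frees the object right away. `gc.collect()` forces a pass of the cycle detector, which frees objects stuck in reference cycles (it also runs automatically from time to time).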

In [21]:
import gc

In [22]:
a = 19
del a
gc.collect()

0
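The `0` above is the number of unreachable objects `gc.collect()` found: none. `del a` only removed the name, and `19` is one of CPython's cached small ints anyway, so it is never freed. The collector matters for reference cycles, which reference counting alone never brings to 0. A sketch, assuming CPython; the `Node` class is hypothetical, and a weak reference is used to check whether the object still exists.

```python
import gc
import weakref


class Node:
    """Hypothetical minimal class used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from running mid-demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a        # a -> b -> a: a reference cycle
        probe = weakref.ref(a)         # observe a without keeping it alive

        del a, b                       # names gone, but each Node still refers to the other
        alive_after_del = probe() is not None

        gc.collect()                   # cycle detector frees the unreachable pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()                # restore the collector's original state


print(cycle_demo())
```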