# Interning

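`Interning`: reusing existing objects on demand instead of creating new ones.
At startup, CPython (the standard Python implementation) pre-loads (caches) a global list of integers in the range [-5, 256].
Whenever your code needs an integer in that range, Python uses the cached object instead of creating a new one.<br>
`Singletons`: classes that can be instantiated only once. Each cached small integer behaves like a singleton: there is exactly one object per value.<hr>
This is an optimization: small integers are used very often.<br>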
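
A quick way to observe the cache, as a minimal sketch assuming CPython (the cache is an implementation detail, and other interpreters such as PyPy behave differently): build each integer at run time with `int(str(n))`, so the compiler cannot reuse a shared constant, then check whether two independently built values are the same object. The helper name `is_cached` is hypothetical.

```python
def is_cached(n: int) -> bool:
    # int(str(n)) creates the value at run time; two separate calls only
    # return the same object when CPython hands out a pre-allocated integer.
    return int(str(n)) is int(str(n))


# Scan a window around the cached range and keep the values that are shared.
cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])  # -5 256
```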

## Integer interning

In [1]:
a = 10

In [2]:
b = 10

In [3]:
def mem(x):
    print(f"id = {hex(id(x))}\ntype = {type(x)}")

In [4]:
mem(a)

id = 0x2acc1e76a50
type = <class 'int'>


In [5]:
mem(b)

id = 0x2acc1e76a50
type = <class 'int'>


In [6]:
a = -5

In [7]:
b = -5

In [8]:
mem(a)

id = 0x2acc1e76870
type = <class 'int'>


In [9]:
mem(b)

id = 0x2acc1e76870
type = <class 'int'>


In [10]:
a is b

True

In [11]:
a = 256

In [12]:
b = 256

In [13]:
a is b

True

In [14]:
a = 257

In [15]:
b = 257

In [16]:
a is b

False
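
Whether `257` gives `False` depends on how the code is compiled: each notebook cell is compiled separately, so each `257` literal becomes its own object. When both assignments are compiled together (in one cell or in a script), CPython stores equal constants only once, so `a is b` can be `True` even outside [-5, 256]. A minimal sketch that uses `exec()` to mimic both situations (CPython-specific behavior, not a language guarantee):

```python
ns = {}

# Two separate compilations, like two notebook cells.
exec("a = 257", ns)
exec("b = 257", ns)
print(ns["a"] is ns["b"])  # False on a standard CPython build

# One compilation: the compiler merges the two equal 257 constants.
exec("a = 257\nb = 257", ns)
print(ns["a"] is ns["b"])  # True

# Either way, == is the right tool to compare integer values.
print(ns["a"] == ns["b"])  # True
```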

In [17]:
a = 10

In [18]:
b = int(10)

In [19]:
c = int('10')

In [20]:
d = int('1010', 2)

In [21]:
d is a

True

In [22]:
mem(a)

id = 0x2acc1e76a50
type = <class 'int'>


In [23]:
mem(d)

id = 0x2acc1e76a50
type = <class 'int'>


In [24]:
mem(c)

id = 0x2acc1e76a50
type = <class 'int'>


In [25]:
mem(b)

id = 0x2acc1e76a50
type = <class 'int'>
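
The same holds for integers produced by arithmetic at run time: CPython returns the cached object whenever the result falls in [-5, 256], and a fresh object otherwise. A small sketch; `add_one` is a hypothetical helper used so that the additions happen at run time instead of being folded into constants by the compiler:

```python
def add_one(x: int) -> int:
    # Computed at run time, not a compile-time constant.
    return x + 1


print(add_one(9) is add_one(9))      # True: 10 comes from the small-int cache
print(add_one(999) is add_one(999))  # False: each call builds a new 1000
```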


## String interning

Some strings are also interned by Python, but not all.<br>
`Identifiers` are interned:
- variable names
- function names
- class names, ...

`Identifiers` must start with an underscore `_` or a letter and can contain only underscores, letters and digits (for ASCII names: `[_a-zA-Z][_a-zA-Z0-9]*`).<hr>

* String literals that look like identifiers are usually interned.
* A string that starts with a digit may still be interned, but not always.
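
`str.isidentifier()` checks the identifier rule above. The sketch below (CPython behavior, not guaranteed by the language) contrasts an identifier-like literal, which the compiler interns, with an equal string assembled at run time, which is a separate object until it is passed through `sys.intern()`:

```python
import sys

print("hello_world".isidentifier())  # True
print("hello world".isidentifier())  # False: contains a space

a = "hello_world"                 # identifier-like literal: interned
b = "".join(["hello", "_world"])  # same value, built at run time

print(a == b)              # True: same characters
print(a is b)              # False: two different objects
print(sys.intern(b) is a)  # True: intern() returns the already-interned copy
```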

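## Why?
For optimization: two interned strings with the same value are the same object, so they can be compared with `is` (a memory-address comparison) instead of `==`, which may have to compare them character by character.
Not all strings are interned by Python, but we can force a string to be interned with the `sys.intern()` function.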
```python
import sys

a = sys.intern('Salut le Monde')
b = sys.intern('Salut le Monde')

print(a is b)  # True
```
DO NOT DO IT unless:
* you need to tokenize a large number of strings that are likely to repeat a lot
* the same strings occur regularly in a text (e.g. in NLP)
* you perform lots of string comparisons
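
For the tokenizing case, a minimal sketch (the sample sentence is made up): `str.split()` typically creates a separate string object for every token, even repeated ones, so interning the tokens makes equal words share one object, which saves memory and lets them be matched by identity.

```python
import sys

text = "the cat and the dog and the bird"

# Intern each token so that repeated words share a single object.
tokens = [sys.intern(word) for word in text.split()]

distinct_objects = {id(token) for token in tokens}
print(len(tokens), len(distinct_objects))  # 8 5
print(tokens[0] is tokens[3])              # True: both are the interned "the"
```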

In [26]:
a = 'hello'

In [27]:
b = 'hello'

In [28]:
mem(a)

id = 0x2acc6927070
type = <class 'str'>


In [29]:
mem(b)

id = 0x2acc6927070
type = <class 'str'>


In [30]:
a = 'hello world'

In [31]:
b = 'hello world'

In [32]:
mem(a)

id = 0x2acc6d2f630
type = <class 'str'>


In [33]:
mem(b)

id = 0x2acc6d2f230
type = <class 'str'>


In [34]:
a is b

False

In [35]:
a == b

True

In [36]:
a = '_ce_text_est_tres_long'

In [37]:
b = '_ce_text_est_tres_long'

In [38]:
a is b

True

In [39]:
import sys

In [40]:
a = sys.intern('salut le monde d\'amateurs')

In [41]:
b = sys.intern('salut le monde d\'amateurs')

In [42]:
c = 'salut le monde d\'amateurs'

In [43]:
mem(a)

id = 0x2acc6d576c0
type = <class 'str'>


In [44]:
mem(b)

id = 0x2acc6d576c0
type = <class 'str'>


In [45]:
mem(c)

id = 0x2acc6c6f5d0
type = <class 'str'>


In [46]:
a is b

True

In [47]:
a == b

True
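
`c` above holds the same text as `a` but is a different object. Passing it through `sys.intern()` returns the object already stored in the intern table, i.e. the same object as `a`. A sketch of that, assuming CPython; here `c` is built with `str.join()` because, within a single compilation unit, two equal literals may already be merged into one object:

```python
import sys

a = sys.intern("salut le monde d'amateurs")
c = " ".join(["salut", "le", "monde", "d'amateurs"])  # equal text, new object

print(c == a)      # True
print(c is a)      # False
c = sys.intern(c)  # look the value up in the intern table
print(c is a)      # True
```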

In [48]:
def compare_using_equals(n):
    a = 'une longue phrase qui n\'est pas interned' * 201
    b = 'une longue phrase qui n\'est pas interned' * 201
    for i in range(n):
        if a == b:
            pass

In [49]:
def compare_using_interning(n):
    a = sys.intern('une longue phrase qui n\'est pas interned' * 201)
    b = sys.intern('une longue phrase qui n\'est pas interned' * 201)
    for i in range(n):
        if a is b:
            pass

In [50]:
import time

In [54]:
start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()

print(f'performance: {end-start}')

performance: 3.0773337999999058


In [55]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()

print(f'performance: {end-start}')

performance: 0.25403300000107265
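
As an alternative sketch, the standard-library `timeit` module can time the comparison expressions directly, without writing the loop and `perf_counter()` calls by hand. It also shows a CPython detail worth knowing: `==` on strings first checks whether both operands are the same object, so `==` on interned strings is nearly as fast as `is`; the slow case is `==` on two distinct long strings, which are compared character by character. Timings depend on the machine, so no numbers are shown.

```python
import sys
import timeit

phrase = "une longue phrase qui n'est pas interned"

# Two equal but distinct strings, built at run time.
a = phrase * 201
b = phrase * 201

# Two interned strings: both names refer to the same object.
ia = sys.intern(phrase * 201)
ib = sys.intern(phrase * 201)

n = 100_000
t_eq = timeit.timeit("x == y", globals={"x": a, "y": b}, number=n)
t_eq_interned = timeit.timeit("x == y", globals={"x": ia, "y": ib}, number=n)
t_is_interned = timeit.timeit("x is y", globals={"x": ia, "y": ib}, number=n)

print(f"== on distinct strings: {t_eq:.4f}s")
print(f"== on interned strings: {t_eq_interned:.4f}s")
print(f"is on interned strings: {t_is_interned:.4f}s")
```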
