In CPython, string literals that look like identifiers get interned automatically (this is an implementation detail, not a language guarantee)

In [1]:
a = 'hello'
b = 'hello'

In [2]:
print(id(a), id(b))

140490938334256 140490938334256


String literals that don't look like identifiers (e.g. strings containing spaces) are generally not interned automatically

In [3]:
a = 'hello world'
b = 'hello world'

In [4]:
print(id(a), id(b))

140490938131568 140490938131632


In [5]:
a == b

True

In [6]:
a is b

False

In [8]:
a = 'hello'
b = 'hello'
a == b

True

In [9]:
a is b

True

Even really long strings that look like identifiers get interned

In [10]:
a = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
b = '_this_is_a_long_string_that_could_be_used_as_an_identifier'

In [11]:
a is b

True
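
To check how your own interpreter treats a given literal, something like the following sketch can help. The helper name literal_is_shared is hypothetical, and the printed results depend on the Python implementation and version, so no expected output is shown. It relies on eval compiling each copy of the literal separately, so the compiler has no chance to merge the two constants.

```python
def literal_is_shared(text):
    """Return True if two separately compiled copies of the literal are one object."""
    # repr(text) yields source code for the literal; each eval() call compiles
    # it on its own, so the two constants are never merged by the compiler.
    first = eval(repr(text))
    second = eval(repr(text))
    return first is second


for candidate in ['hello', 'hello world', '_a_long_identifier_like_string']:
    # Columns: the literal, whether it is a valid identifier, whether it was shared
    print(repr(candidate), candidate.isidentifier(), literal_is_shared(candidate))
```

If the last column differs between two Python versions, that is the implementation detail showing through, which is why code should not depend on automatic interning.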

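If you want to intern a string manually, you need to call sys.intern on every instance of that string. An equal string that was not passed through sys.intern (c below) is still a separate object.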

In [12]:
import sys

In [13]:
a = sys.intern('hello world')
b = sys.intern('hello world')
c = 'hello world'

In [14]:
print(id(a), id(b), id(c))

140490937689584 140490937689584 140490937688176


In [15]:
a == b    # Compares the string contents: O(n) in the worst case

True

In [16]:
a is b    # Compares two memory addresses (integers): O(1), plus the one-time cost of interning

True
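
The same idea applies to strings built at runtime, which are ordinary objects that are not interned for you. This is a minimal sketch; the variable names are illustrative only.

```python
import sys

words = ['hello', 'world']
# Built at runtime, so neither result is interned automatically.
first = ' '.join(words)
second = ' '.join(words)

print(first == second)    # True: equal contents
print(first is second)    # typically False in CPython: two separate objects

# sys.intern returns the single canonical object for a given string value,
# so interning both results yields the very same object.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)    # True
```

Only the interned references can be compared safely with is; first and second themselves should still be compared with ==.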

Performance comparison between the equality and identity operators

In [17]:
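def compare_using_equals(n):
    # a and b have equal contents but are separate objects,
    # so == has to compare the characters on every iteration
    a = 'a long string that is not interned' * 200
    b = 'a long string that is not interned' * 200

    for _ in range(n):
        if a == b:
            pass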
    

In [18]:
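def compare_using_interning(n):
    # After interning, a and b refer to the same object,
    # so the identity check only compares two addresses
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)

    for _ in range(n):
        if a is b:
            pass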

In [19]:
import time

In [20]:
start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()
print('equality: ', end-start)

equality:  3.804283340999973


In [21]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()
print('identity: ', end-start)

identity:  0.6854307760004303
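
As an alternative to timing by hand with time.perf_counter, the standard-library timeit module can run the same comparison. This is a sketch rather than a rigorous benchmark: the function name time_comparisons and its default arguments are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import sys
import timeit


def time_comparisons(repeat_factor=200, number=100_000):
    """Time == on two distinct equal strings and `is` on their interned copies."""
    base = 'a long string that is not interned'
    # join() builds each string at runtime, giving two separate objects
    a = ''.join([base] * repeat_factor)
    b = ''.join([base] * repeat_factor)
    # Both calls return the same canonical object
    interned_a, interned_b = sys.intern(a), sys.intern(b)

    equality = timeit.timeit(lambda: a == b, number=number)
    identity = timeit.timeit(lambda: interned_a is interned_b, number=number)
    return equality, identity


equality_seconds, identity_seconds = time_comparisons()
print('equality:', equality_seconds)
print('identity:', identity_seconds)
```

Both timings include the overhead of calling the lambda, so the gap between them understates the difference between the two operators themselves.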
