
# **PART 1: FUNCTIONAL PROGRAMMING**

## Section 08 - Tuples as Data Records

### 01 - Tuples as Data Structures

Tuples are an immutable container type.

They contain a collection of objects. The tuple is a sequence type - this means order matters (and is preserved) and elements can be accessed by index (zero based), slicing, or iteration.

Other common sequence types in Python include lists and strings. Strings, like tuples, are immutable, whereas lists are mutable.

Tuples are sometimes presented as immutable lists, but in fact, they could be compared more closely to strings with one major difference: strings are homogeneous sequences, while tuples can be heterogeneous.

A tuple literal is often presented as:

In [None]:
('a', 10, True)

('a', 10, True)

But the parentheses are not what indicate a tuple - it is the commas:

In [None]:
a = ('a', 10, True)
b = 'b', 20, False

In [None]:
type(a)

tuple

In [None]:
type(b)

tuple

Sometimes, however, the parentheses are required to remove any ambiguity.

For example, consider this function that accepts a tuple (or other iterable) as its argument:

In [None]:
def iterate(t):
    for element in t:
        print(element)

If we call the function this way, Python will interpret it as three arguments:

In [None]:
iterate(1, 2, 3)

TypeError: iterate() takes 1 positional argument but 3 were given

Instead, we now have to use the parentheses to indicate we are packing a tuple:
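A minimal sketch of the corrected call is shown here; the `iterate` function from above is repeated so the snippet runs on its own:

```python
def iterate(t):
    for element in t:
        print(element)

# The inner parentheses build the tuple; the outer ones belong to the call.
iterate((1, 2, 3))
# prints 1, 2 and 3 on separate lines
```
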

Since tuples are sequence types, we can access items by index:

In [None]:
a = 'a', 10, True

In [None]:
a[2]

True

Or we can even slice them:

In [None]:
a = 1, 2, 3, 4, 5
a[2:4]

(3, 4)

We can iterate over them:

In [None]:
a = 1, 2, 3, 4, 5
for element in a:
    print(element)

1
2
3
4
5


We can also use unpacking:

In [None]:
point = 10, 20, 30

In [None]:
x, y, z = point

In [None]:
print(x)
print(y)
print(z)

10
20
30


Tuples are immutable, in the sense that we cannot change the reference of an object in the container and we cannot add or remove objects from the container. This is the same as strings.

In [None]:
a = 10, 'python', True

In [None]:
a[0] = 20

TypeError: 'tuple' object does not support item assignment

We can however 'extend' a tuple, but just as with strings, we are actually just creating a new tuple:

In [None]:
a = 1, 2, 3

In [None]:
id(a)

2617902725824

In [None]:
a = a + (4, 5, 6)

In [None]:
a

(1, 2, 3, 4, 5, 6)

In [None]:
id(a)

2617902867456

As you can see we no longer have the same memory address for `a`.

We have to be careful when we think about immutability of tuples. The tuple, as a container, is immutable, but the elements contained in the tuple may very well be mutable.

Let's define a simple point class to store the x and y coordinates of a point in 2D space:

In [None]:
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return f'{self.__class__.__name__}(x={self.x}, y={self.y})'

In [None]:
a = Point2D(0, 9), Point2D(10, 10), Point2D(20, 20)

In [None]:
a

(Point2D(x=0, y=9), Point2D(x=10, y=10), Point2D(x=20, y=20))

Although the tuple `a` is immutable, its contained elements are mutable.

So we cannot do this:

In [None]:
a[0] = Point2D(-10, -10)

TypeError: 'tuple' object does not support item assignment

But we can modify the contents of the first element:

In [None]:
a[0].x = -10

In [None]:
a

(Point2D(x=-10, y=9), Point2D(x=10, y=10), Point2D(x=20, y=20))
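The same behaviour shows up with any mutable element, not just instances of our own class. As a small illustrative sketch (the names `inner` and `t` are made up for this example), a list stored inside a tuple can still be modified in place, while the tuple keeps referencing the very same list object:

```python
inner = [1, 2]
t = (inner, 'python')

id_before = id(t[0])
t[0].append(3)          # mutates the list, not the tuple
id_after = id(t[0])

print(t)                      # ([1, 2, 3], 'python')
print(id_before == id_after)  # True: the tuple still holds the same list

try:
    t[0] = [10, 20]     # rebinding a slot of the tuple is not allowed
except TypeError as ex:
    print(f'TypeError: {ex}')
```
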

**Tuples as Data Records**

We can interpret tuples as lightweight data structures where, by convention, the position of the element in the tuple has meaning.

For example, we may elect to represent a point as a tuple, and not use the class approach we just did:

In [None]:
pt1 = (0, 0)
pt2 = (10, 10)

Here, we simply decide that the first position of the tuple represents the x-coordinate while the second position represents the y-coordinate of a point in 2D space.

We could also decide that we are going to represent a city using a tuple, where the first position will be the city name, the second position will be the country, and the third position will be the population:

In [None]:
london = 'London', 'UK', 8_780_000
new_york = 'New York', 'USA', 8_500_000
beijing = 'Beijing', 'China', 21_000_000

We can even have a list of these tuples:

In [None]:
cities = [london, new_york, beijing]

We can obtain a list of all the city names using a simple list comprehension and the fact that the city name is the first element (index 0) of each tuple:

In [None]:
city_names = [t[0] for t in cities]
print(city_names)

['London', 'New York', 'Beijing']


We could even calculate the total population of all these cities.

We start with a simple loop to do this:

In [None]:
total = 0
for city in cities:
    total += city[2]
print(f'total={total}')

total=38280000


You will note that the reason this worked is because the `cities` list contained only city tuples. The list was homogeneous. The tuples on the other hand are heterogeneous.

This is often a key difference between lists and tuples, especially when we consider tuples as data structures. The tuples are heterogeneous, while the list needs to be homogeneous so we can apply the same calculations to each element of the list.

The above example would break if one of the elements in the `cities` list was an integer, for example.
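To see this concretely, here is a small sketch using a hypothetical list, `mixed`, in which the last element is not a city tuple. The loop works for the first two elements and then fails when it tries to index into the integer:

```python
london = 'London', 'UK', 8_780_000
new_york = 'New York', 'USA', 8_500_000

# Hypothetical non-homogeneous list: the last element is not a city tuple.
mixed = [london, new_york, 100]

total = 0
error = None
try:
    for city in mixed:
        total += city[2]
except TypeError as ex:
    error = ex
    print(f'TypeError: {ex}')

print(f'total before the failure={total}')  # total before the failure=17280000
```
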

Back to our example calculating the total population. There is a more Pythonic way of doing this.

First we use a comprehension to extract just the population from each city:

In [None]:
[city[2] for city in cities]

[8780000, 8500000, 21000000]

Next we simply sum up the population values:

In [None]:
sum([city[2] for city in cities])

38280000

In fact (and we'll cover this in detail later in this course), we don't even need the square brackets in the sum:

In [None]:
sum(city[2] for city in cities)

38280000

Now, since tuples are sequence types, and hence iterable, we can also use unpacking to extract values from the tuple:

In [None]:
city, country, population = new_york

In [None]:
print(city)
print(country)
print(population)

New York
USA
8500000


We can also use extended unpacking:

In [None]:
record = 'DJIA', 2018, 1, 19, 25_987, 26_072, 25_942, 26_072

Where the structure is: symbol, year, month, day, open, high, low, close

We could then unpack the record using straight unpacking:

In [None]:
symbol, year, month, day, open_, high, low, close = record

In [None]:
print(symbol)
print(close)

DJIA
26072


But suppose we are only interested in the symbol, year, month, day, and close. Then we could use extended unpacking as follows:

In [None]:
symbol, year, month, day, *others, close = record

In [None]:
print(symbol, year, month, day, close)

DJIA 2018 1 19 26072


In [None]:
print(others)

[25987, 26072, 25942]


A convention often used in Python when we are not particularly interested in something, is to use an underscore as a variable name:

In [None]:
symbol, year, month, day, *_, close = record

There's nothing special about the underscore here, it's just a legal variable name (in an interactive Python session, the underscore is actually used to store the result of the last calculation):

In [None]:
print(_)

[25987, 26072, 25942]


By the way, do not write code like this to do the unpacking we just did:

In [None]:
symbol, year, close = record[0], record[1], record[7]

Although this works, it is not very readable code, plus you are packing a new tuple (the right-hand side) and then unpacking it into the variables on the left. Much better to do this:

In [None]:
symbol, year, *_, close = record

If you only need to pick a few elements out of the tuple (like in our example where we just wanted the population to sum it up), then by all means access it directly using the index.

But did you know that you can also unpack tuples directly in the loop?

In [None]:
for element in cities:
    print(element)

('London', 'UK', 8780000)
('New York', 'USA', 8500000)
('Beijing', 'China', 21000000)


As you can see, each element is a tuple, and we can actually unpack it at the same time as the loop this way:

In [None]:
for city, country, population in cities:
    print(f'city={city}, population = {population}')

city=London, population = 8780000
city=New York, population = 8500000
city=Beijing, population = 21000000


This, by the way, is how we can use the `enumerate` function in Python. The enumerate function produces an iterable from another iterable but contains the index number. Values are returned as tuples, where the first position is the index value, and the second position is the value (here we also see how a tuple was used as a data structure). So that tuple can be unpacked as follows:

In [None]:
for index, value in enumerate(beijing):
    print(f'{index}: {value}')

0: Beijing
1: China
2: 21000000


Of course, since we are not interested in the country in this case, we might write it this way as well:

In [None]:
for city, _, population in cities:
    print(f'city={city}, population={population}')

city=London, population=8780000
city=New York, population=8500000
city=Beijing, population=21000000


Another frequent application of using tuples as data structures is for returning multiple values from a function.

In [None]:
from random import uniform
from math import sqrt

def random_shot(radius):
    '''Generates a random 2D coordinate within
    the bounds [-radius, radius] * [-radius, radius]
    (a square of area 4 * radius ** 2)
    and also determines if it falls within
    a circle centered at the origin
    with specified radius'''
    
    random_x = uniform(-radius, radius)
    random_y = uniform(-radius, radius)
    
    if sqrt(random_x ** 2 + random_y ** 2) <= radius:
        is_in_circle = True
    else:
        is_in_circle = False
        
    return random_x, random_y, is_in_circle

In [None]:
num_attempts = 1_000_000
count_inside = 0
for i in range(num_attempts):
    *_, is_in_circle = random_shot(1)
    if is_in_circle:
        count_inside += 1

print(f'Pi is approximately: {4 * count_inside / num_attempts}')

Pi is approximately: 3.141224


### 02 - Named Tuples

The `namedtuple` function in `collections` allows us to create a tuple that also has a name attached to each field (aka property). This can be handy to reference data in the tuple structure by name instead of just relying on position.

The `namedtuple` function is basically a class factory that creates a new type of class that uses a tuple as its underlying data storage (in fact, named tuples inherit from `tuple`), but layers in field names to each position and makes a property out of the field name.

The `namedtuple` function creates a **class**, and we then use that class to instantiate our instances of named tuples.

To use the `namedtuple` function we therefore need to select a class **name**, as well as indicate the **property** names, in the order in which they will be stored and accessed in the tuple.


---
**BE CAREFUL!**

*Note that a `namedtuple`, like the regular `tuple` is an **immutable** data structure. (In fact, named tuples inherit from tuples - we'll revisit this in our section on metaclasses)*

If you find yourself writing code such as:

In [None]:
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

Forget it! You seriously need to use named tuples! Not only can you shorten the amount of code you need to write, but you get some additional functionality for "free", such as `__repr__` and `__eq__` that you do not have to implement yourself!

**Creating Named Tuples**

We are going to create a `Point2D` named tuple that will contain an x-coordinate and a y-coordinate.

In [None]:
from collections import namedtuple

In [None]:
Point2D = namedtuple('Point2D', ('x', 'y'))

---
**BE CAREFUL!**

*Note that we have two different uses of `Point2D` here: the label to which we assign the return value of the call to `namedtuple`, and the **name** of the class generated by calling `namedtuple`.*

We could also have done the following:

In [None]:
Pt = namedtuple('Point2D', ('x', 'y'))

The `namedtuple` class name is `Point2D`, but the label `Pt` simply points to that class, so we would then create instances of the `Point2D` class as follows:

In [None]:
pt1 = Pt(10, 20)

And we can see what `pt1` is:

In [None]:
pt1

Point2D(x=10, y=20)

As you can see we have an object of type `Point2D`, and it has two properties, `x` and `y` with respective values `10` and `20`.

The only weird thing here is that we are using `Pt` to generate our instances of the `Point2D` class.

That's why we usually create `namedtuple` generated classes this way:

In [None]:
Point2D = namedtuple('Point2D', ('x', 'y'))

Then the following makes more sense:

In [None]:
pt1 = Point2D(10, 20)

In [None]:
pt1

Point2D(x=10, y=20)

This is no different from doing this:

In [None]:
Pt3 = Point3D # class we defined earlier

In [None]:
pt3 = Pt3(10, 20, 30)

In [None]:
pt3

<__main__.Point3D at 0x26186852250>

As you can see above, we used another label, `Pt3`, that also references the `Point3D` class. It would be weird to do it this way here, and it's weird for named tuples as well. Of course, you may run into circumstances where you need to do this, so treat matching the label to the class name as a general rule rather than a strict requirement.

---
**BE CAREFUL!**

*Note that all named tuples are honest-to-goodness **classes**, just as if you had used a `class` definition such as with `Point3D`.*

The `namedtuple` function generates classes for us - it is a **class factory**.

In [None]:
type(Point3D)

type

In [None]:
type(Point2D)

type

However, `Point2D` is a subclass of `tuple`, while `Point3D` is not:

In [None]:
isinstance(pt1, tuple)

True

In [None]:
isinstance(pt3, tuple)

False

So, when we create an instance of a class, we are in fact calling the `__new__` method with our initial values. It's just a callable that has the **field names** we used to generate our named tuple class as its parameters. This means we can use keyword arguments when instantiating our named tuples!

In [None]:
pt4 = Point2D(y=20, x=10)

In [None]:
pt4

Point2D(x=10, y=20)

**What did we get for free using a named tuple vs our own class?**

First using a named tuple for our 2D point:

In [None]:
pt2d_1 = Point2D(10, 20)
pt2d_2 = Point2D(10, 20)

In [None]:
pt2d_1

Point2D(x=10, y=20)

In [None]:
pt2d_1 == pt2d_2

True

Now using our 3D class:

In [None]:
pt3d_1 = Point3D(10, 20, 30)
pt3d_2 = Point3D(10, 20, 30)

In [None]:
pt3d_1

<__main__.Point3D at 0x26186841c70>

Oh, we probably need to implement the `__repr__` method in our class.

In [None]:
pt3d_1 == pt3d_2

False

And we would also need to implement the `__eq__` method!

Let's do that:

In [None]:
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    
    def __repr__(self):
        return f"Point3D(x={self.x}, y={self.y}, z={self.z})"
    
    def __eq__(self, other):
        if isinstance(other, Point3D):
            return self.x == other.x and self.y == other.y and self.z == other.z
        else:
            return False

In [None]:
pt3d_1 = Point3D(10, 20, 30)
pt3d_2 = Point3D(10, 20, 30)

In [None]:
pt3d_1

Point3D(x=10, y=20, z=30)

In [None]:
pt3d_1 == pt3d_2

True

How about finding the largest coordinate in the point?

That's easy for `Point2D` since it is a tuple, but not the case for `Point3D`

In [None]:
max(pt2d_1)

20

In [None]:
max(pt3d_1)

TypeError: 'Point3D' object is not iterable

How about calculating the dot product of two points (considering them as vectors starting at the origin)?

The formula would be: a.b = a.x * b.x + a.y * b.y + a.z * b.z

For the 3D point we would need to do the following:

In [None]:
def dot_product_3d(a, b):
    return a.x * b.x + a.y * b.y + a.z * b.z

In [None]:
dot_product_3d(pt3d_1, pt3d_2)

But for our 2D point, which, remember is a tuple as well, we can write a generic function that would work equally well with a 3D named tuple too:

In [None]:
def dot_product(a, b):
    return sum(e[0] * e[1] for e in zip(a, b))

Here's a breakdown of how we implemented the dot product:

First we zip up the components of `a` and `b` to get an iterable of tuples: the first tuple contains the x-coordinates and the second tuple contains the y-coordinates. Our zip will contain as many elements as there are dimensions.

In [None]:
a = Point2D(1, 2)
b = Point2D(10, 20)
print(a)
print(b)
print(tuple(a))
print(tuple(b))
print(list(zip(a,b)))

---
**BE CAREFUL!**

*Note that if we had more dimensions this would work equally well.*

*Suppose we had 3 dimensions:*

In [None]:
u = (1, 2, 3)
v = (10, 20, 30)
list(zip(u,v))

Then we create a comprehension that multiplies the components together:

In [None]:
[e[0] * e[1] for e in zip(a, b)]

Then we simply add those up:

In [None]:
sum([e[0] * e[1] for e in zip(a, b)])

In [None]:
dot_product(a, b)

And if we defined a 4D point named tuple:

In [None]:
Point4D = namedtuple('Point4D', ['i', 'j', 'k', 'l'])

In [None]:
pt4d_1 = Point4D(1, 1, 1, 10)
pt4d_2 = Point4D(2, 2, 2, 10)

In [None]:
dot_product(pt4d_1, pt4d_2)

As you can see we got the correct dot product. We could not have done this using our `Point3D` class!
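As a small variation (a sketch, not a replacement for the function above), the pairs produced by `zip` can be unpacked directly in the comprehension, which avoids indexing into `e`. The name `dot_product_unpacked` is just a label chosen for this example:

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
Point4D = namedtuple('Point4D', 'i j k l')

def dot_product_unpacked(a, b):
    # Each pair produced by zip is unpacked into its two components.
    return sum(p * q for p, q in zip(a, b))

print(dot_product_unpacked(Point2D(1, 2), Point2D(10, 20)))              # 50
print(dot_product_unpacked(Point4D(1, 1, 1, 10), Point4D(2, 2, 2, 10)))  # 106
```

Keep in mind that `zip` stops at the shorter of its inputs, so neither version detects two points of different dimensions.
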

**Other Ways to Specify Field Names**

There are a number of ways we can specify the field names for the named tuple:

* we can provide a sequence of strings containing each property name
* we can provide a single string with property names separated by whitespace or a comma

In [None]:
Circle = namedtuple('Circle', ['center_x', 'center_y', 'radius'])

In [None]:
circle_1 = Circle(0, 0, 10)
circle_2 = Circle(center_x=10, center_y=20, radius=100)

In [None]:
circle_1

In [None]:
circle_2

Or we can do it this way:

In [None]:
City = namedtuple('City', 'name country population')

In [None]:
new_york = City('New York', 'USA', 8_500_000)

In [None]:
new_york

This would work equally well:

In [None]:
Stock = namedtuple('Stock', 'symbol, year, month, day, open, high, low, close')

In [None]:
djia = Stock('DJIA', 2018, 1, 25, 26_313, 26_458, 26_260, 26_393)

In [None]:
djia

In fact, since whitespace can be used we can even use a multi-line string!

In [None]:
Stock = namedtuple('Stock', '''symbol
                               year month day
                               open high low close''')

In [None]:
djia = Stock('DJIA', 2018, 1, 25, 26_313, 26_458, 26_260, 26_393)

In [None]:
djia

**Accessing Items in a Named Tuple**

The major advantage of named tuples is that, as the name suggests, we can access the properties (fields) of the tuple by name:

In [None]:
pt1

In [None]:
pt1.x

In [None]:
circle_1

In [None]:
circle_1.radius

10

Now named tuples are tuples, so elements can be accessed by index, unpacked, and iterated.

In [None]:
circle_1[2]

In [None]:
for item in djia:
    print(item)

DJIA
2018
1
25
26313
26458
26260
26393

We can also unpack named tuples just like ordinary tuples:

In [None]:
pt1

Point2D(x=10, y=20)

In [None]:
x, y = pt1

In [None]:
print(x, y)

10 20


In [None]:
symbol, *_, close = djia

In [None]:
print(symbol, close)

DJIA 26393


And remember that the `_` we use in the unpacking is just a regular variable:

In [None]:
print(_)

[2018, 1, 25, 26313, 26458, 26260]


The field names for these named tuples can be any valid variable name **except** that they cannot start with an underscore.

For example the following would not be valid:

In [None]:
Person = namedtuple('Person', ['firstname', 'lastname', '_age', 'ssn'])

ValueError: Field names cannot start with an underscore: '_age'

We can also choose to let the `namedtuple` function replace invalid field names automatically for us, by using the keyword argument `rename`. When we set that argument to `True` (it is `False` by default) it will replace the invalid name using the position (index) of the field, preceded by an underscore:

In [None]:
Person = namedtuple('Person', ['firstname', 'lastname', '_age', 'ssn'], rename=True)

In [None]:
eric = Person('Eric', 'Idle', 42, 'unknown')

In [None]:
eric

As you can see, the invalid field name `_age` was replaced by `_2` since it was the third element (i.e. index `2`).
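A quick way to confirm the renaming is to print the instance and read the renamed field. The sketch below repeats the definition so it can run on its own:

```python
from collections import namedtuple

Person = namedtuple('Person', ['firstname', 'lastname', '_age', 'ssn'], rename=True)
eric = Person('Eric', 'Idle', 42, 'unknown')

print(eric)     # Person(firstname='Eric', lastname='Idle', _2=42, ssn='unknown')
print(eric._2)  # 42
```
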

**Named Tuple Internals**

We can easily find out the fields in a named tuple using the `_fields` property:

In [None]:
Point2D._fields

In [None]:
Stock._fields

Older versions of Python also provided a `_source` property that showed exactly what class was generated by calling `namedtuple` (remember that `namedtuple` is a class **factory**).

---
**BE CAREFUL!**

*Changed in version 3.7: Remove the verbose parameter and the `_source` attribute.*

The generated source would, of course, be slightly different for each named tuple class.

**Converting Named Tuples to Dictionaries**

The `namedtuple` generated class also provides us an instance method, `_asdict()`, that will create a dictionary from all the fields in the named tuple:

In [None]:
eric._asdict()

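{'firstname': 'Eric', 'lastname': 'Idle', '_2': 42, 'ssn': 'unknown'}

In Python 3.7 and earlier, `_asdict()` returned an `OrderedDict`, which we will cover in a later section; since Python 3.8 it returns a regular `dict`. Basically an `OrderedDict` is a dictionary that, unlike the standard built-in `dict` in older versions of Python, is **guaranteed** to preserve the order of the keys.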

---
**BE CAREFUL!**

*Note that as of Python 3.6 regular dictionaries do preserve the order of the keys, but until just recently it was not **guaranteed** and was basically an implementation detail.*

*However, this has now changed! Guido van Rossum has agreed that this is no longer an implementation detail, and starting in Python 3.7 dictionary order is guaranteed. Since it is actually already the case in Python 3.6, you can now safely assume this fact, as long as you are running your code under Python 3.6 or higher. Your code will break if you rely on dictionary order prior to 3.6; in that case, still use an `OrderedDict`.*
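As a short sketch of what this means in practice, the keys of the dictionary returned by `_asdict()` come back in field order. The exact return type depends on the Python version, so the example only relies on behaviour common to `dict` and `OrderedDict`:

```python
from collections import namedtuple

City = namedtuple('City', 'name country population')
new_york = City('New York', 'USA', 8_500_000)

d = new_york._asdict()
print(list(d.keys()))   # ['name', 'country', 'population']
print(d['population'])  # 8500000
```
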

**Overhead of Named Tuples**

At this point you may be wondering whether there's more overhead to using a named tuple vs a regular tuple.

There is, but it is tiny. The field names are stored in the **class**, not in every instance of the named tuple. This means that the overhead incurred by the field names for one instance of the named tuple vs 1000 instances is the same. Otherwise, the instances are tuples, so you can access contained objects using indexing, slicing and iteration just as if it were a plain tuple. No overhead there either. Looking up values by name does have some overhead of course, but no more than if you had created a custom class.
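One way to see where the field names live is sketched below, assuming the standard CPython `namedtuple` implementation:

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
p1 = Point2D(1, 2)
p2 = Point2D(3, 4)

# The field names are stored once, on the class.
print(Point2D._fields)    # ('x', 'y')

# The generated class declares empty __slots__, so instances carry
# no per-instance attribute dictionary.
print(Point2D.__slots__)  # ()

# Instances still behave like plain tuples.
print(p1[0], p2[1:], tuple(p1))  # 1 (4,) (1, 2)
```
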

### 03 - Named Tuples - Modifying, Extending

In [None]:
from collections import namedtuple

In [None]:
Point2D = namedtuple('Point2D', 'x y')

The objects generated by `namedtuple` generated classes are **immutable**.

In other words the following will not work:

In [None]:
origin = Point2D(10, 0)

In [None]:
origin.x = 0

AttributeError: can't set attribute

However, we may want to "change" the value of one of the coordinates of our `origin` variable.

This is just like strings, we have to create a new version of the tuple, and assign it to the same label.

Suppose we want to change the x-coordinate of our `origin` to something else, but retain whatever the y-coordinate was.

In [None]:
origin = Point2D(0, origin.y)

In [None]:
origin

Point2D(x=0, y=0)

Of course this could become quite unwieldy when we have a larger number of properties and we only need to change a single item:

In [None]:
Stock = namedtuple('Stock', 'symbol year month day open high low close')

In [None]:
djia = Stock('DJIA', 2019, 1, 25, 26_313, 26_458, 26_260, 26_393)

To update the `close` property for example, we could write:

In [None]:
djia = Stock(djia.symbol, djia.year, djia.month, djia.day,
             djia.open, djia.high, djia.low, 26_394)

Now that was quite painful!

We can be a bit more clever about this and use tuple unpacking and argument unpacking as follows:

In [None]:
*values, _ = djia

We didn't care about the `close` price since we are replacing it, hence the underscore variable name.

And we now have everything else in a list:

In [None]:
values

['DJIA', 2019, 1, 25, 26313, 26458, 26260]

And now we are going to use the `*` again, but this time to unpack the list into separate arguments when we call the `Stock` initializer:

In [None]:
djia = Stock(*values, 26_393)

In [None]:
djia

Stock(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393)

---
**BE CAREFUL!**

*This is much better than our first attempt!*

*But this approach does not always work. What happens if we want to change a value somewhere in the middle? Or two values?*

*We cannot do: `*first, month, *last = djia`*

*That would make no sense whatsoever! (and Python will tell you so!)*
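To see Python's complaint without stopping the notebook, the statement can be handed to the built-in `compile` function as a string. This is only an illustrative sketch; the exact wording of the error message varies between Python versions:

```python
source = '*first, month, *last = (1, 2, 3, 4)'

try:
    compile(source, '<example>', 'exec')
    compiled_ok = True
except SyntaxError as ex:
    # Two starred targets in one assignment are rejected at compile time.
    compiled_ok = False
    print(f'SyntaxError: {ex.msg}')
```
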

Maybe slicing and unpacking can work here...

In [None]:
djia

Stock(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393)

We could try **slicing**:

In [None]:
djia[:3]

('DJIA', 2019, 1)

In [None]:
djia[:3] + (26,) + djia[4:]

('DJIA', 2019, 1, 26, 26313, 26458, 26260, 26393)

So now we could use this to create a new `Stock` instance:

In [None]:
djia2 = Stock(*(djia[:3] + (26,) + djia[4:]))

In [None]:
djia2

Stock(symbol='DJIA', year=2019, month=1, day=26, open=26313, high=26458, low=26260, close=26393)

This works, but that's quite cumbersome...

And it gets worse - suppose we want to modify the year and day using this approach:

In [None]:
values = djia[0:1] + (2019,) + djia[2:3] + (26,) + djia[4:]

In [None]:
values

('DJIA', 2019, 1, 26, 26313, 26458, 26260, 26393)

In [None]:
djia3 = Stock(*values)

In [None]:
djia3

Stock(symbol='DJIA', year=2019, month=1, day=26, open=26313, high=26458, low=26260, close=26393)

Or, if you want to avoid unpacking the `values` into the multiple positional arguments required by the `Stock` constructor, we can make use of the `_make` class method that can use an iterable:

In [None]:
djia4 = Stock._make(values)

In [None]:
djia4

Stock(symbol='DJIA', year=2019, month=1, day=26, open=26313, high=26458, low=26260, close=26393)

This is really getting too complex.

Fortunately there's a better way!

The namedtuple implementation also provides another instance method called `_replace` which takes keyword-only arguments. That method will make a copy of the current tuple and substitute property values based on the keyword-only arguments passed in.

In [None]:
djia

Stock(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393)

In [None]:
id(djia)

2617908637472

In [None]:
djia5 = djia._replace(year=2019, day=26)

In [None]:
djia5

Stock(symbol='DJIA', year=2019, month=1, day=26, open=26313, high=26458, low=26260, close=26393)

In [None]:
id(djia5)

2617908940976

Much better!!
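Two details are worth checking in a small sketch: the original instance is left untouched, and `_replace` rejects names that are not fields. The `volume` keyword below is deliberately not a field of `Stock`, and the type of exception raised for it may be `ValueError` or `TypeError` depending on the Python version, so the example catches both:

```python
from collections import namedtuple

Stock = namedtuple('Stock', 'symbol year month day open high low close')
djia = Stock('DJIA', 2019, 1, 25, 26_313, 26_458, 26_260, 26_393)

updated = djia._replace(day=26, close=26_400)
print(updated.day, updated.close)  # 26 26400
print(djia.day, djia.close)        # 25 26393 (the original is unchanged)

rejected = False
try:
    djia._replace(volume=1_000)    # 'volume' is not a field of Stock
except (ValueError, TypeError) as ex:
    rejected = True
    print(type(ex).__name__, ex)
```
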

**Extending Named Tuples**

Sometimes we may want to add one or more properties to an existing class without modifying the code for the custom class itself.

Using inheritance is one way to go about it so you may be tempted to do this with named tuples as well, but it's not easy, and there's a cleaner way to do this if all you're after is additional data fields.

Let's say we have a Point class that is for 2D problems:

In [None]:
Point2D = namedtuple('Point2D', 'x y')

We could easily create a 3D point class as follows:

In [None]:
Point3D = namedtuple('Point3D', 'x y z')

But if our named tuple has many fields, such as our `Stock` named tuple, that's a little more difficult:

In [None]:
djia

Stock(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393)

Suppose we want to create a new class, say `StockExt`, it would take some effort:

In [None]:
StockExt = namedtuple('StockExt',
                      '''symbol year month day open high low
                      close previous_close''')

Instead we can leverage that `_fields` property:

In [None]:
Stock._fields

('symbol', 'year', 'month', 'day', 'open', 'high', 'low', 'close')

Remember that the `namedtuple` initializer can handle a list or tuple containing the field names. For example, the one we just retrieved from `_fields`.

Now all we need to do is create a new tuple that contains those fields along with whatever extras we want:

In [None]:
new_fields = Stock._fields + ('previous_close',)

In [None]:
new_fields

('symbol',
 'year',
 'month',
 'day',
 'open',
 'high',
 'low',
 'close',
 'previous_close')

And now we can create our new named tuple this way:

In [None]:
StockExt = namedtuple('StockExt', Stock._fields + ('previous_close',))

In [None]:
StockExt._fields

('symbol',
 'year',
 'month',
 'day',
 'open',
 'high',
 'low',
 'close',
 'previous_close')

If you did not want to use tuple concatenation for some reason, you could also do it using strings:

In [None]:
' '.join(Stock._fields) + ' previous_close'

'symbol year month day open high low close previous_close'

In [None]:
StockExt = namedtuple('StockExt',
                      ' '.join(Stock._fields) + ' previous_close')

In [None]:
StockExt._fields

('symbol',
 'year',
 'month',
 'day',
 'open',
 'high',
 'low',
 'close',
 'previous_close')

Now, with this newly extended class, we may want to take one of the "old" named tuple instances (`djia`) and create the extended version of it using the `StockExt` class.

This is also quite simple to do, since named tuples are tuples, and can therefore be unpacked in the arguments of a function call.

In [None]:
djia

Stock(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393)

In [None]:
djia_ext = StockExt(*djia, 25_000)

In [None]:
djia_ext

StockExt(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393, previous_close=25000)

or, we can use the `_make` method:

In [None]:
djia_ext = StockExt._make(djia + (25_000, ))

In [None]:
djia_ext

StockExt(symbol='DJIA', year=2019, month=1, day=25, open=26313, high=26458, low=26260, close=26393, previous_close=25000)

### 04 - Named Tuples - Docstrings, Default Values

In [None]:
from collections import namedtuple

**Adding Docstrings to Named Tuples**

This is easy to do, both with the generated class, as well as its properties.

In [None]:
Point2D = namedtuple('Point2D', 'x y')

In [None]:
Point2D.__doc__ = 'Represents a 2D Cartesian coordinate'

And we can even add docstrings to the properties:

In [None]:
Point2D.x.__doc__ = 'x-coordinate'
Point2D.y.__doc__ = 'y-coordinate'

In [None]:
help(Point2D)

Help on class Point2D in module __main__:

class Point2D(builtins.tuple)
 |  Point2D(x, y)
 |  
 |  Represents a 2D Cartesian coordinate
 |  
 |  Method resolution order:
 |      Point2D
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |  
 |  _replace(self, /, **kwds)
 |      Return a new Point2D object replacing specified fields with new values
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  _make(iterable) from builtins.type
 |      Make a new Point2D object from a sequence or iterable
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(_cls, x, y

**Adding Default Values to Named Tuples**

**Using a Prototype**

This technique is in the Python docs, and uses the concept of creating a prototype object that has the default values set:

In [None]:
Vector = namedtuple('Vector', 'x1 y1 x2 y2 origin_x origin_y')

In [None]:
vector_zeroorigin = Vector(x1=None, y1=None, x2=None, y2=None, origin_x=0, origin_y=0)

In [None]:
vector_zeroorigin

Vector(x1=None, y1=None, x2=None, y2=None, origin_x=0, origin_y=0)

The namedtuple `vector_zeroorigin` is now a prototype of a vector with zero origin.

To create new vectors using that origin as a default, we no longer use the `Vector` class, but instead use `_replace` as follows:

In [None]:
v1 = vector_zeroorigin._replace(x1=1, y1=1, x2=10, y2=10)

In [None]:
v1

Vector(x1=1, y1=1, x2=10, y2=10, origin_x=0, origin_y=0)

This certainly works, and can be useful in cases where you may want more than one prototype (e.g. `vector_zeroorigin` and `vector_otherorigin`).
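
A minimal sketch of the two-prototype idea, assuming a second prototype `vector_otherorigin` whose origin is at `(100, 100)` (that value is made up for illustration):

```python
from collections import namedtuple

Vector = namedtuple('Vector', 'x1 y1 x2 y2 origin_x origin_y')

# First prototype: origin at (0, 0)
vector_zeroorigin = Vector(None, None, None, None, origin_x=0, origin_y=0)

# Second prototype (hypothetical values): derived from the first one
vector_otherorigin = vector_zeroorigin._replace(origin_x=100, origin_y=100)

# Each new vector starts from whichever prototype carries the defaults we want
v_zero = vector_zeroorigin._replace(x1=1, y1=1, x2=10, y2=10)
v_other = vector_otherorigin._replace(x1=1, y1=1, x2=10, y2=10)

print(v_zero)   # Vector(x1=1, y1=1, x2=10, y2=10, origin_x=0, origin_y=0)
print(v_other)  # Vector(x1=1, y1=1, x2=10, y2=10, origin_x=100, origin_y=100)

# _replace returns a new tuple, so the prototypes themselves are untouched
print(vector_zeroorigin.x1)  # None
```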

**Using `__defaults__`**

There is an alternative way of doing this that is, in my opinion, much cleaner.

In Python the default values for a function's parameters are stored as a tuple in the `__defaults__` attribute.

In [None]:
def func(a, b=20, c=30):
    print(a, b, c)

In [None]:
func.__defaults__

(20, 30)

In [None]:
func(10)

10 20 30


But the `__defaults__` attribute is writable:

In [None]:
func.__defaults__ = (200, 300)

In [None]:
func(10)

10 200 300


In this case, the function whose default values we want to specify is the named tuple class constructor, i.e. `__new__`.

So, we will simply need to set `Vector.__new__.__defaults__` to the desired tuple of default values.

The only thing to note is that if you specify fewer default values (say `m` values) than the total number of arguments (say `n` values, where `m < n`), then the defaults will apply to the **last** `m` arguments. Think of it as writing out your field names and default values on two lines, and right-aligning them. (If you specify more, then the extra values at the beginning are effectively ignored.)

In [None]:
Vector.__new__.__defaults__ = (0, 0)

Here I am basically setting default values for the last two elements only, i.e. `origin_x` and `origin_y`.
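
The right-alignment rule can be made visible with a short sketch that pairs the field names with the defaults starting from the end. The names `aligned` and `required` are only used here for illustration:

```python
from collections import namedtuple

Vector = namedtuple('Vector', 'x1 y1 x2 y2 origin_x origin_y')
Vector.__new__.__defaults__ = (0, 0)

fields = Vector._fields
defaults = Vector.__new__.__defaults__

# Pair fields and defaults from the right, which is how Python matches them up
aligned = dict(zip(reversed(fields), reversed(defaults)))
print(aligned)   # {'origin_y': 0, 'origin_x': 0}

# Whatever is left over on the left has no default and must be supplied
required = fields[:len(fields) - len(defaults)]
print(required)  # ('x1', 'y1', 'x2', 'y2')
```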

In [None]:
v1 = Vector(0, 0, 10, 10, -10, -10)

In [None]:
v1

Vector(x1=0, y1=0, x2=10, y2=10, origin_x=-10, origin_y=-10)

In [None]:
v2 = Vector(5, 5, 20, 20)

In [None]:
v2

Vector(x1=5, y1=5, x2=20, y2=20, origin_x=0, origin_y=0)

In [None]:
v3 = Vector(x1=1, y1=1, x2=10, y2=10, origin_x=0, origin_y=0)

In [None]:
v3

Vector(x1=1, y1=1, x2=10, y2=10, origin_x=0, origin_y=0)
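
As an aside, in Python 3.7 and later `namedtuple` also accepts a `defaults` argument that follows the same right-aligned rule, and the resulting class exposes the mapping in `_field_defaults`. A minimal sketch, using a separate class name `VectorD` (chosen here only to avoid clobbering `Vector`):

```python
from collections import namedtuple

# defaults=(0, 0) applies to the last two fields, just like __defaults__
VectorD = namedtuple('VectorD', 'x1 y1 x2 y2 origin_x origin_y', defaults=(0, 0))

print(VectorD._field_defaults)   # {'origin_x': 0, 'origin_y': 0}

vd = VectorD(5, 5, 20, 20)
print(vd)  # VectorD(x1=5, y1=5, x2=20, y2=20, origin_x=0, origin_y=0)
```

Which style to use is largely a matter of taste; setting `__defaults__` afterwards also works on a class that was created elsewhere.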

An even simpler way to set default values if you want **all** defaults to be the same:

In [None]:
Vector.__new__.__defaults__ = (0,) * len(Vector._fields)

In [None]:
v5 = Vector()

In [None]:
v5

Vector(x1=0, y1=0, x2=0, y2=0, origin_x=0, origin_y=0)

Of course, the usual admonishment of not using mutable default values holds here as well.
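
To see why, here is a small sketch with a hypothetical `Team` named tuple whose default is a list. Every instance created without an explicit value ends up sharing the very same list object:

```python
from collections import namedtuple

# Hypothetical example class, used only to illustrate the pitfall
Team = namedtuple('Team', 'name members')
Team.__new__.__defaults__ = ([],)   # mutable default: a bad idea

a = Team('a')
b = Team('b')
a.members.append('x')

print(b.members)               # ['x']
print(a.members is b.members)  # True

# One common workaround: default to None and supply a fresh list explicitly
Team.__new__.__defaults__ = (None,)
c = Team('c', members=[])
d = Team('d')
print(c.members, d.members)    # [] None
```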

### 05 - Named Tuples - Application - Alternative to Dictionaries

---
**BE CAREFUL!**

*First an important caveat: all this really only works for dictionaries with **string** keys, and those strings must also be valid named tuple field names (i.e. valid Python identifiers). Dictionary keys can be other hashable data types (including tuples, as long as they contain hashable types in turn), and these examples will not work with those types of dictionaries.*
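
A quick sketch of what happens with unsuitable keys. The dictionaries in `bad_dicts` are made up for illustration; in each case `namedtuple` rejects the field names with a `ValueError`:

```python
from collections import namedtuple

bad_dicts = [
    {1: 'a', 2: 'b'},          # integer keys
    {(0, 0): 'origin'},        # tuple key
    {'first name': 'John'},    # a string, but not a valid identifier
]

errors = []
for d in bad_dicts:
    try:
        namedtuple('Data', d.keys())
    except ValueError as ex:
        # namedtuple refuses field names that are not valid identifiers
        errors.append(str(ex))
        print(list(d.keys()), '-> ValueError')
```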

In [None]:
from collections import namedtuple

In [None]:
data_dict = dict(key1=100, key2=200, key3=300)

In [None]:
Data = namedtuple('Data', data_dict.keys())

In [None]:
Data._fields

('key1', 'key2', 'key3')

Now we can create an instance of the `Data` namedtuple using the data in the `data_dict` dictionary.

We could try the following (bad idea):

In [None]:
d1 = Data(*data_dict.values())

In [None]:
d1

Data(key1=100, key2=200, key3=300)

This looks like it worked.

But consider this second dictionary, where we do not create the keys in the same order:

In [None]:
data_dict_2 = dict(key1=100, key3=300, key2=200)

In [None]:
d2 = Data(*data_dict_2.values())

In [None]:
d2

Data(key1=100, key2=300, key3=200)

Obviously this went terribly wrong!

`values()` returns the values in the insertion order of that particular dictionary, and we cannot guarantee that this matches the order of the fields in our named tuple.

Instead, we should unpack the dictionary itself, resulting in keyword arguments that will be passed to the `Data` constructor:

In [None]:
d2 = Data(**data_dict_2)

In [None]:
d2

Data(key1=100, key2=200, key3=300)
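
Keyword unpacking also has a useful side effect: if the dictionary's keys do not match the fields exactly, the constructor fails loudly instead of silently misplacing values. A minimal sketch, with two made-up dictionaries (one missing a key, one with an extra key):

```python
from collections import namedtuple

Data = namedtuple('Data', 'key1 key2 key3')

candidates = (
    {'key1': 100, 'key2': 200},                    # key3 is missing
    {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4},  # key4 is not a field
)

outcomes = []
for candidate in candidates:
    try:
        Data(**candidate)
        outcomes.append('ok')
    except TypeError as ex:
        # Missing and unexpected keyword arguments both raise TypeError
        outcomes.append('TypeError')
        print('TypeError:', ex)
```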

So, the pattern to create a named tuple out of a single dictionary is straightforward:

For any dictionary `d` we can create a named tuple class and insert the data into it as follows:

```
Struct = namedtuple('Struct', d.keys())
data = Struct(**d)
```

Because dictionaries now preserve key insertion order (guaranteed since Python 3.7), the order of the fields in the named tuple structure will be the same. If you want your fields to be sorted in a different way, just sort the keys when you create the named tuple class. For example, to have keys sorted alphabetically we could do:

In [None]:
data_dict = dict(first_name='John', last_name='Cleese', age=42, complaint='dead parrot')

In [None]:
data_dict.keys()

dict_keys(['first_name', 'last_name', 'age', 'complaint'])

In [None]:
sorted(data_dict.keys())

['age', 'complaint', 'first_name', 'last_name']

In [None]:
Struct = namedtuple('Struct', sorted(data_dict.keys()))

In [None]:
Struct._fields

('age', 'complaint', 'first_name', 'last_name')

Of course, we can still place the values from the dictionary into the correct slots of the tuple by unpacking the dictionary itself instead of just its values:

In [None]:
d1 = Struct(**data_dict)

In [None]:
d1

Struct(age=42, complaint='dead parrot', first_name='John', last_name='Cleese')

And of course, since this is now a named tuple we can access the data using the field name:

In [None]:
d1.complaint

'dead parrot'

instead of how we would have done it with the dictionary:

In [None]:
data_dict['complaint']

'dead parrot'

I also want to point out that with dictionaries we often end up with code where the key is stored in some variable and then referenced this way:

In [None]:
key_name = 'age'
data_dict[key_name]

42

We cannot use this approach directly with named tuples, however. For example, this will not work:

In [None]:
key_name = 'age'
d1.key_name

AttributeError: 'Struct' object has no attribute 'key_name'

However, we can use the `getattr` function that we have seen before:

In [None]:
key_name = 'age'
getattr(d1, key_name)

42

We also have the `get` method on dictionaries that can specify a default value to return if the key does not exist:

In [None]:
data_dict.get('age', None), data_dict.get('invalid_key', None)

(42, None)

And we can do the same with the `getattr` function:

In [None]:
getattr(d1, 'age', None), getattr(d1, 'invalid_field', None)

(42, None)
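
One difference from `dict.get` is worth keeping in mind: `getattr` looks at every attribute of the object, not just the fields, so a name such as `count` resolves to the tuple method of that name. The sketch below shows this and a hypothetical helper, `get_field`, that restricts the lookup to `_fields`:

```python
from collections import namedtuple

Struct = namedtuple('Struct', 'age complaint first_name last_name')
d1 = Struct(42, 'dead parrot', 'John', 'Cleese')

# 'count' is not a field, but getattr still finds tuple.count
found = getattr(d1, 'count', None)
print(callable(found))   # True

def get_field(nt, name, default=None):
    """Hypothetical helper: look up `name` only among the declared fields."""
    return getattr(nt, name) if name in nt._fields else default

print(get_field(d1, 'age'))     # 42
print(get_field(d1, 'count'))   # None
```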

Now this is not very useful if you are only working with a single instance of a dictionary that has the same set of keys. Kind of pointless really.

You also do not want to create a new named tuple class for every instance of a dictionary - that would just be way too much overhead.

But in cases where you have a collection of dictionaries that share a common set of keys, this can be really useful, as long as you are willing to live with the fact that you now have immutable structures.

Let's suppose we have this data list:

In [None]:
data_list = [
    {'key1': 1, 'key2': 2},
    {'key1': 3, 'key2': 4},
    {'key1': 5, 'key2': 6, 'key3': 7},
    {'key2': 100}
]

The first thing to note is that we need to figure out all the possible keys that have been used in the dictionaries in this list.

The easiest way to do this is to extract all the keys of all the dictionaries and then make a `set` out of them, to eliminate duplicate key names.

We could do it this way, using a simple loop:

In [None]:
keys = set()
for d in data_list:
    for key in d.keys():
        keys.add(key)

In [None]:
keys

{'key1', 'key2', 'key3'}

But actually a more efficient way would be to use a set comprehension:

In [None]:
keys = {key for dict_ in data_list for key in dict_.keys()}

In [None]:
keys

{'key1', 'key2', 'key3'}

In fact, we can also use the fact that we can union multiple sets (we'll cover this in detail later) by unpacking all the keys and creating a union of them:

In [None]:
keys = set().union(*(dict_.keys() for dict_ in data_list))

In [None]:
keys

{'key1', 'key2', 'key3'}

Whichever approach we use, we end up with a set of all the possible keys used in our list of dictionaries.

Now we can go ahead and create a named tuple with all those keys as fields:

In [None]:
Struct = namedtuple('Struct', keys)

In [None]:
Struct._fields

('key2', 'key1', 'key3')

As you can see, sets do not preserve order, so in this case we'll probably sort the keys to create our named tuple:

In [None]:
Struct = namedtuple('Struct', sorted(keys))

In [None]:
Struct._fields

('key1', 'key2', 'key3')
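
If first-seen order matters more than alphabetical order, one alternative sketch is to deduplicate with `dict.fromkeys`, which keeps the order in which keys are first encountered. For `data_list` this happens to coincide with the sorted order, so a second, made-up list (`other_list`) is included to show the difference:

```python
from collections import namedtuple

data_list = [
    {'key1': 1, 'key2': 2},
    {'key1': 3, 'key2': 4},
    {'key1': 5, 'key2': 6, 'key3': 7},
    {'key2': 100}
]

# dict.fromkeys drops duplicates while preserving first-seen order
ordered_keys = list(dict.fromkeys(key for dict_ in data_list for key in dict_))
print(ordered_keys)   # ['key1', 'key2', 'key3']

# Hypothetical data where first-seen order differs from sorted order
other_list = [{'b': 1}, {'a': 2, 'b': 3}]
other_keys = list(dict.fromkeys(key for dict_ in other_list for key in dict_))
print(other_keys)     # ['b', 'a']

StructOrdered = namedtuple('StructOrdered', other_keys)
print(StructOrdered._fields)   # ('b', 'a')
```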

Now, we're also going to provide default values, since not all dictionaries have all the keys in them. In this case I'm going to set the default to `None` if the key is missing:

In [None]:
Struct.__new__.__defaults__ = (None, ) * len(Struct._fields)

Now we're ready to load up all these dictionaries into a new list of named tuples:

In [None]:
tuple_list = [Struct(**dict_) for dict_ in data_list]

In [None]:
tuple_list

[Struct(key1=1, key2=2, key3=None),
 Struct(key1=3, key2=4, key3=None),
 Struct(key1=5, key2=6, key3=7),
 Struct(key1=None, key2=100, key3=None)]

So lastly, let's just package this all up neatly into a single function that will take a list of dictionaries (or any other iterable of dictionaries that can be traversed more than once) and return a list of named tuples. Note that this version does not sort the keys, so the field order can vary from run to run:

In [None]:
def tuplify_dicts(dicts):
    keys = {key for dict_ in dicts for key in dict_.keys()}
    Struct = namedtuple('Struct', keys)
    Struct.__new__.__defaults__ = (None,) * len(Struct._fields)
    return [Struct(**dict_) for dict_ in dicts]

In [None]:
tuplify_dicts(data_list)

[Struct(key2=2, key1=1, key3=None),
 Struct(key2=4, key1=3, key3=None),
 Struct(key2=6, key1=5, key3=7),
 Struct(key2=100, key1=None, key3=None)]
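
The function above traverses `dicts` twice and builds the fields from an unordered set. A possible variant, sketched here under the hypothetical name `tuplify_dicts_sorted`, materializes the input first (so a generator works too) and sorts the keys so that the field order is predictable:

```python
from collections import namedtuple

def tuplify_dicts_sorted(dicts, typename='Struct'):
    """Hypothetical variant of tuplify_dicts with a deterministic field order."""
    dicts = list(dicts)  # materialize, so one-shot iterables can be read twice
    keys = sorted({key for dict_ in dicts for key in dict_})
    Struct = namedtuple(typename, keys)
    Struct.__new__.__defaults__ = (None,) * len(keys)  # missing keys become None
    return [Struct(**dict_) for dict_ in dicts]

data_list = [
    {'key1': 1, 'key2': 2},
    {'key1': 3, 'key2': 4},
    {'key1': 5, 'key2': 6, 'key3': 7},
    {'key2': 100}
]

# Passing a generator works because the input is copied into a list first
result = tuplify_dicts_sorted(d for d in data_list)
print(result[0]._fields)  # ('key1', 'key2', 'key3')
print(result[3])          # Struct(key1=None, key2=100, key3=None)
```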

Isn't Python wonderful? :-)

### 06 - Named Tuples - Application - Returning Multiple Values

We already know that we can easily return multiple values from a function by using a tuple:

In [None]:
from random import randint, random

def random_color():
    red = randint(0, 255)
    green = randint(0, 255)
    blue = randint(0, 255)
    alpha = round(random(), 2)
    return red, green, blue, alpha

In [None]:
random_color()

(31, 233, 5, 0.67)

So of course, we could call the function this way and unpack the results at the same time:

In [None]:
red, green, blue, alpha = random_color()

In [None]:
print(f'red={red}, green={green}, blue={blue}, alpha={alpha}')

red=184, green=184, blue=220, alpha=0.96


But it might be nicer to use a named tuple:

In [None]:
from collections import namedtuple

In [None]:
Color = namedtuple('Color', 'red green blue alpha')

def random_color():
    red = randint(0, 255)
    green = randint(0, 255)
    blue = randint(0, 255)
    alpha = round(random(), 2)
    return Color(red, green, blue, alpha)

In [None]:
color = random_color()

In [None]:
color.red

190

In [None]:
color

Color(red=190, green=10, blue=148, alpha=0.72)
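
Since a named tuple is still a tuple, callers that unpack the result positionally keep working after this change. A short sketch of that (the values are random, so no specific colors are shown):

```python
from collections import namedtuple
from random import randint, random

Color = namedtuple('Color', 'red green blue alpha')

def random_color():
    return Color(randint(0, 255), randint(0, 255), randint(0, 255),
                 round(random(), 2))

color = random_color()

# Positional unpacking works exactly as it did with the plain tuple
red, green, blue, alpha = color
print(color[0] == color.red == red)   # True

# And the named fields are available when we want them
print(color._asdict())
```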