### Tuples as Data Structures

__Tuples vs. Lists vs. Strings__

| Tuples              | Lists                 | Strings            |
| ------------------- | --------------------- | ------------------ |
| container           | container             | container          |
| order matters       | order matters         | order matters      |
| hetero*/homogeneous | hetero/homogeneous*   | homogeneous        |
| indexable           | indexable             | indexable          |
| iterable            | iterable              | iterable           |
| immutable           | mutable               | immutable          |
| fixed length/order  | variable length/order | fixed length/order |

Because a tuple's length and order are fixed, we can assign meaning to each position, which makes tuples well suited to representing simple data structures.

e.g. Circle: (0, 0, 10) or a City ('London', 'UK', 8_780_000)
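As a minimal sketch of position carrying meaning, the circle tuple can be read as `(center_x, center_y, radius)`. The helper name `circle_area` is hypothetical and only serves the illustration.

```python
import math

# Position carries the meaning: (center_x, center_y, radius)
circle = (0, 0, 10)

def circle_area(c):
    """Hypothetical helper: the radius is whatever sits at index 2."""
    _, _, radius = c
    return math.pi * radius ** 2

area = circle_area(circle)
print(round(area, 2))  # 314.16
```

The function only works because every circle tuple is assumed to follow the same positional layout.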

__Tuples as Data Records__

Because tuples are immutable, we are guaranteed that the data and the data structure will never change.

In [3]:
london = ('London', 'UK', 8_780_000)
new_york = ('New York', 'USA', 8_500_000)

__Extracting Data from Tuples__

In [4]:
city, country, pop = london

In [5]:
# Note how the tuples themselves are heterogeneous, but the list of them is homogeneous
cities = [london, new_york]

total_pop = 0
for city in cities:
    total_pop += city[2]

__Dummy Variables__

An underscore, `_`, is conventionally used to signal that a variable is not intended to be used and can be ignored. It is still a valid variable name, however, and it still holds whatever value was assigned to it.

In [6]:
city, _, pop = ('Beijing', 'China', 21_000_000)

In [7]:
# Dummy variables can be used with extended unpacking as well
stock_record = ('DJIA', 2018, 1, 19, 25987.35, 26071.72, 25942.83, 26071.72)

symbol, year, month, day, *_, close = stock_record
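To confirm that `_` is an ordinary name rather than special syntax, the short sketch below inspects it after extended unpacking. The name `ignored` is introduced here purely for illustration.

```python
stock_record = ('DJIA', 2018, 1, 19, 25987.35, 26071.72, 25942.83, 26071.72)

symbol, year, month, day, *_, close = stock_record

# `_` is an ordinary name: it holds the values we chose to ignore, as a list
ignored = _
print(ignored)  # [25987.35, 26071.72, 25942.83]
print(close)    # 26071.72
```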

### Named Tuples

If we need a simple named structure for a data record, we might think to write a class, such as:

In [8]:
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def __repr__(self):
        return f"Point2D(x={self.x}, y={self.y})"
    
    def __eq__(self, other):
        if isinstance(other, Point2D):
            return self.x == other.x and self.y == other.y
        else:
            return False

However, using a class might not be the best approach for simple data structures.

For one thing, a `Point2D` object is mutable, which may not be what we want.
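The difference can be seen in a small sketch. The class here is a pared-down copy of the one above (only `__init__` is kept), and `coords` is a hypothetical plain tuple used for comparison.

```python
class Point2D:
    """Pared-down copy of the class above; only __init__ is kept."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

pt = Point2D(10, 20)
pt.x = 100                # nothing prevents reassignment
print(pt.x)               # 100

coords = (10, 20)
try:
    coords[0] = 100       # tuples reject item assignment
except TypeError as ex:
    print(f'TypeError: {ex}')
```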

A perceived downside to using tuples is that we lose the labels associated with class properties.

e.g. class: `point.x` vs. a tuple: `point[0]`

But by using __named tuples__, we get the benefits of both classes and tuples!

Named tuples are a subclass of `tuple`, and add a layer to assign property names to positional elements.

In [9]:
from collections import namedtuple

`namedtuple` is a function that generates a new class (a class factory). The new class inherits from `tuple` and provides named properties for accessing the elements of the tuple. An instance of this class is still a tuple.

__Generating Named Tuple Classes__

`namedtuple` needs:
- the class name you want to use
- a sequence of field names to assign, in the order of the elements in the tuple
    - field names cannot start with an underscore!
    
The return value of `namedtuple` will be a class, which we will use to construct instances.
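The underscore rule in the list above can be checked directly. This is a minimal sketch: the `Person` field names are chosen only for illustration, and the exact error message may vary between Python versions.

```python
from collections import namedtuple

rejected = False
try:
    # '_ssn' starts with an underscore, so the class cannot be created
    namedtuple('Person', 'name age _ssn')
except ValueError as ex:
    rejected = True
    print(f'ValueError: {ex}')

# Dropping the leading underscore makes the field name valid
Person = namedtuple('Person', 'name age ssn')
print(Person._fields)  # ('name', 'age', 'ssn')
```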


In [10]:
# The variable name should be (but is not required to be) the same as the class name passed to namedtuple(),
# and it should be capitalized like any other class name.
Point2D = namedtuple('Point2D', ['x', 'y'])

In [11]:
pt = Point2D(10, 20)

In [12]:
# We can pass the sequence of field names to namedtuple in multiple ways:

Point2D = namedtuple('Point2D', ['x', 'y'])  # by list
Point2D = namedtuple('Point2D', ('x', 'y'))  # by tuple
Point2D = namedtuple('Point2D', 'x, y')      # by comma-separated string
Point2D = namedtuple('Point2D', 'x y')       # by space-separated string

__Instantiating Named Tuples__

In [13]:
# Via positional arguments
pt = Point2D(10, 20) # x = 10, y = 20

# Via keyword arguments
pt = Point2D(x=10, y=20)

__Accessing Data in a Named Tuple__


In [14]:
x, y = pt

In [15]:
x = pt[0]

In [16]:
for e in pt:
    print(e)

10
20


In [17]:
pt.x

10

In [18]:
pt.y

20

In [19]:
# Since namedtuple generates classes that inherit from tuple, their instances are
# immutable, just like normal tuples.
isinstance(pt, tuple)

True

In [20]:
pt.x = 'error'

AttributeError: can't set attribute
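Both ways of writing to a named tuple fail, whether by attribute or by index. The sketch below records the exception types instead of the messages, since the message wording differs between Python versions.

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
pt = Point2D(10, 20)

errors = []
try:
    pt.x = 100            # assignment by field name
except AttributeError as ex:
    errors.append(type(ex).__name__)

try:
    pt[0] = 100           # assignment by index
except TypeError as ex:
    errors.append(type(ex).__name__)

print(errors)  # ['AttributeError', 'TypeError']
print(pt)      # Point2D(x=10, y=20)
```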

__Introspection__

In [25]:
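# The _fields attribute lists the field names of a named tuple.
# With rename=True, an invalid name (here _ssn) is replaced by a positional name such as _2.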
Person = namedtuple('Person', 'name age _ssn', rename=True)

Person._fields

('name', 'age', '_2')

__Extracting Named Tuple Values to a Dictionary__

In [28]:
Point2D = namedtuple('Point2D', 'x y')

pt = Point2D(10, 20)

pt._asdict()

{'x': 10, 'y': 20}
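The dictionary returned by `_asdict()` is an independent copy, so one possible use is to edit it and build a new named tuple from the result. This is a sketch; the names `fields` and `moved` are hypothetical.

```python
from collections import namedtuple

Point2D = namedtuple('Point2D', 'x y')
pt = Point2D(10, 20)

fields = pt._asdict()
fields['y'] = 100            # changes the dictionary only
moved = Point2D(**fields)    # build a new named tuple from it

print(pt)     # Point2D(x=10, y=20)
print(moved)  # Point2D(x=10, y=100)
```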

__Modifying a Named Tuple__

In [50]:
# Since tuples are immutable, we must create a new object to modify values.
pt = Point2D(10, 20)

# This, however, is not the cleanest approach if we have a tuple with lots of fields
pt = Point2D(pt.x, 100)

In [51]:
Stock = namedtuple('Stock', 'ticker year month day current high low close')
djia = Stock('DJIA', 2021, 1, 1, 26, 28, 24, 25)

# We can grab only the fields we want to keep by slicing
current = djia[:7] # Returns a tuple

In [52]:
# Or by unpacking
*current, _ = djia # Returns a list

In [53]:
djia = Stock(*current, 25.5)

In [61]:
# We can also use the _make class method, but we need to construct the new values first
*current, _ = djia

current.append(25.5)

djia = Stock._make(current)
djia

Stock(ticker='DJIA', year=2021, month=1, day=1, current=26, high=28, low=24, close=25.5)

__The Best Way: `_replace`__

In [62]:
Stock = namedtuple('Stock', 'ticker year month day current high low close')
djia = Stock('DJIA', 2021, 1, 1, 26, 28, 24, 25)

djia = djia._replace(day=26, high=29, close=25.5)
djia

Stock(ticker='DJIA', year=2021, month=1, day=26, current=26, high=29, low=24, close=25.5)
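Note that `_replace` does not modify the tuple in place. It returns a new instance, which is why the result is assigned back to `djia` above. A brief sketch, using the hypothetical names `original` and `updated`:

```python
from collections import namedtuple

Stock = namedtuple('Stock', 'ticker year month day current high low close')
original = Stock('DJIA', 2021, 1, 1, 26, 28, 24, 25)

updated = original._replace(close=25.5)

print(original.close)       # 25
print(updated.close)        # 25.5
print(updated is original)  # False
```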

__Extending a Named Tuple__

In [64]:
Stock = namedtuple('Stock', 'ticker year month day current high low close')
djia = Stock('DJIA', 2021, 1, 1, 26, 28, 24, 25)

new_fields = Stock._fields + ('prev_close', )

StockExt = namedtuple('StockExt', new_fields)
djia_ext = StockExt(*djia, 26)

__Named Tuple Docstrings__

In [65]:
Point2D = namedtuple('Point2D', 'x y')

help(Point2D)

Help on class Point2D in module __main__:

class Point2D(builtins.tuple)
 |  Point2D(x, y)
 |  
 |  Point2D(x, y)
 |  
 |  Method resolution order:
 |      Point2D
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |  
 |  _replace(self, /, **kwds)
 |      Return a new Point2D object replacing specified fields with new values
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  _make(iterable) from builtins.type
 |      Make a new Point2D object from a sequence or iterable
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(_cls, x, y)
 |      Create new in

In [67]:
# You can also override the docstrings of the class and of its fields
Point2D.__doc__ = 'Represents a 2D Cartesian coordinate.'
Point2D.x.__doc__ = 'x coordinate'

__Default Values for a Named Tuple__

In [69]:
# Using a prototype
Vector2D = namedtuple('Vector2D', 'x1 y1 x2 y2 origin_x origin_y')

vector_zero = Vector2D(0, 0, 0, 0, 0, 0)

v1 = vector_zero._replace(x1=10, y1=10, x2=20, y2=20)

In [70]:
# Using the __defaults__ property (better)
Vector2D = namedtuple('Vector2D', 'x1 y1 x2 y2 origin_x origin_y')

# Defaults are matched to the rightmost parameters, so this sets origin_x and origin_y to 0
Vector2D.__new__.__defaults__ = (0, 0)

v1 = Vector2D(10, 10, 20, 20)
v1

Vector2D(x1=10, y1=10, x2=20, y2=20, origin_x=0, origin_y=0)
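On Python 3.7 and later, `namedtuple` also accepts a `defaults` argument, which gives the same result without touching `__new__.__defaults__`. A minimal sketch, assuming a recent enough interpreter:

```python
from collections import namedtuple

# defaults are applied to the rightmost fields: origin_x and origin_y
Vector2D = namedtuple(
    'Vector2D', 'x1 y1 x2 y2 origin_x origin_y', defaults=(0, 0)
)

v1 = Vector2D(10, 10, 20, 20)
print(v1)
# Vector2D(x1=10, y1=10, x2=20, y2=20, origin_x=0, origin_y=0)

print(Vector2D._field_defaults)  # {'origin_x': 0, 'origin_y': 0}
```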

__Alternative to Dictionaries__

In [72]:
data_dict = {
    'key1': 100, 'key2': 200, 'key3': 300
}

In [73]:
Data = namedtuple('Data', data_dict.keys())

In [76]:
d1 = Data(**data_dict)
d1

Data(key1=100, key2=200, key3=300)

In [78]:
# If the field name is stored in a variable
key_name = 'key2'

# We can't use the variable itself as an attribute name:
# d1.key_name  -> AttributeError

getattr(d1, key_name)

200

In [79]:
# Returning a default value if the key doesn't exist
data_dict.get('key10', None)

getattr(d1, 'key10', 10_000)

10000