<a href="https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/python_warm_up/warmup_data_structures.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a><br/>
[![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/python_warm_up/warmup_data_structures.ipynb)


# Python Object Types

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/52563704012/in/album-72177720296706479/" title="LMS Dashboard"><img src="https://live.staticflickr.com/65535/52563704012_71ef4beb8a_b.jpg" width="1024" height="354" alt="LMS Dashboard"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Python Warm-up Notebooks:

*  [Introduction to Python](warmup_python_intro.ipynb)
*  [3rd Party Libraries](warmup_3rd_party_datascience.ipynb)
*  [Object Types](warmup_data_structures.ipynb)   (you are here)
*  [Object Oriented Paradigm](warmup_object_oriented.ipynb)
*  [Calling Callables and Type Checking](warmup_callables.ipynb)
*  [Class and Static Methods, Properties](warmup_object_oriented2.ipynb)
*  [SQLite3 and Context Managers](warmup_object_sql.ipynb)
*  [Iterators and Generators](warmup_generators.ipynb) 

Exactly what is "a type" in Python?  

Objects of a specific type have corresponding capabilities, just like in ordinary language.  

Car type objects drive, horse type objects gallop, and objects of the oven type heat food.  These types of objects perform in other ways too.

## Number types 
Numbers, for example, add and multiply.  

Integers and floating point numbers are different types of number:

In [1]:
type(3)

int

In [2]:
type(3.0)

float

In [3]:
3 + 3.0

6.0

We continue looking at additional number types in a section below.
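
As a small sketch of how these two number types interact (the variable names are illustrative, not part of the notebook): mixing an int with a float produces a float, `/` always returns a float, and `//` performs floor division.

```python
total = 3 + 3.0      # int + float is promoted to float
quotient = 7 / 2     # true division always returns a float
floored = 7 // 2     # floor division of two ints stays an int

print(type(total), type(quotient), type(floored))
print(total, quotient, floored)
```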

## Collection Types

Also among the types are "collections", such as 

* sequences, with their left-to-right order, and 
* mappings, which store (key, value) pairs looked up by key rather than by position (a dict remembers insertion order in Python 3.7 and later).  

We also call these "data structures".

The Python names `sequence` and `mapping` below are there to help us remember these categories.  They have no special meaning in the language itself, i.e. they are neither keywords nor the names of built-in objects.

We could have used `cat` and `dog`.

In [4]:
sequence = [1, 2, 'a', '🍿']      # a list
mapping  = {"Joe":10, "Jill":20}  # a dict

In [5]:
sequence[0]       # look up element at index position 0

1

If you're remembering the n-dimensional array type from the numpy package, that's apropos (relevant).  Native Python sequences, such as the list and range types, likewise support slicing.

In [6]:
sequence[1:-1]

[2, 'a']

In [7]:
sequence[::2]   # start at element 0, jump by 2

[1, 'a']

In [8]:
sequence[1::2]  # start at element 1, jump by 2

[2, '🍿']
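
The same slice notation works on a `range`, which is also a native sequence. A brief sketch, with names chosen only for illustration:

```python
numbers = range(10)           # 0 through 9

middle = numbers[2:8]         # slicing a range gives another range
every_third = numbers[::3]    # start at element 0, jump by 3

print(middle, list(middle))
print(every_third, list(every_third))
```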

In [9]:
mapping["Jill"]   # look up the value with key "Jill"

20

In [10]:
cat = list(reversed(sequence))  # reversed is native built in
dog = list(mapping.items())

In [11]:
print(cat)    # cat is a list type object

['🍿', 'a', 2, 1]


In [12]:
print(dog)    # dog is a list of tuples, obtained from a dict

[('Joe', 10), ('Jill', 20)]


NOTE:  `reversed` does not return a list, but an iterator that yields the elements in reverse order, which we're free to feed to `list`.
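
To see what `reversed` hands back, here is a minimal sketch (the name `letters` is made up for the example). The object it returns is an iterator, so it can be consumed only once:

```python
letters = ['a', 'b', 'c']

backwards = reversed(letters)    # an iterator, not a list
first_pass = list(backwards)     # consumes the iterator
second_pass = list(backwards)    # nothing is left the second time

print(type(backwards).__name__)
print(first_pass, second_pass)
```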

## Callable Types

Some types of object are what we call "callable", meaning we cause them to "do something" by using parentheses, which act like a "mouth" through which arguments may (or may not) be fed.  

We have been using callable objects extensively already.

The print object is a good example of a callable.

In [13]:
type(print)  # print is the name of a callable object

builtin_function_or_method

In [14]:
callable(callable)  # callable is itself a callable

True

Python comes with many useful built-in types, available the moment Python starts.  These built-in types include the string, list, tuple, dict, and set.  The number types int and float are also built in.

These types, but for the None type, are also callables.  We feed them arguments to create new objects of the type in question.  We have seen this in the case of feeding a range object to a list object.  Both range and list are callables.

In [15]:
print(type(range), type(list))

<class 'type'> <class 'type'>


In [16]:
list(range(10))  

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
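
Calling a type to build a new object works the same way for the other built-in types. A short sketch, with variable names that are only illustrative:

```python
whole = int("42")                            # str -> int
text = str(3.5)                              # float -> str
pair = tuple([1, 2])                         # list -> tuple
lookup = dict([("Joe", 10), ("Jill", 20)])   # list of pairs -> dict
unique = set("hello")                        # str -> set of its characters

print(whole, text, pair, lookup, unique)
print(all(callable(t) for t in (int, str, tuple, dict, set)))
```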

Other types may be imported from the Standard Library, such as fractions.Fraction and decimal.Decimal.

Still other types, such as the numpy array and pandas DataFrame, may be imported from 3rd party packages.

Look for objects being called in the cells below:

In [17]:
from pandas import DataFrame
from numpy import array

In [18]:
df = DataFrame(array(range(1, 11)).reshape(2,5))
df

|     | 0 | 1 | 2 | 3 | 4  |
|-----|---|---|---|---|----|
| 0   | 1 | 2 | 3 | 4 | 5  |
| 1   | 6 | 7 | 8 | 9 | 10 |


Above: `range`, `array`, `DataFrame`, `reshape` all get called with one or more arguments.

Below:  nothing is *called*, but rather *assigning* is happening (using the assignment operator `=`).

In [19]:
df.index = ['Week1', 'Week2']
df.columns = ['Mon', 'Tue', 'Weds', 'Thurs', 'Fri']
df

|       | Mon | Tue | Weds | Thurs | Fri |
|-------|-----|-----|------|-------|-----|
| Week1 | 1   | 2   | 3    | 4     | 5   |
| Week2 | 6   | 7   | 8    | 9     | 10  |


## The Function type

A primary way to create new callables of the function type is to use the keyword `def`, short for "define".

In [20]:
def f(x):
    return x * x  # return x times itself

print(10)

10


The function `f` is now a callable that expects one argument, with which it computes a return or result object.  As we will see later, functions, as just another type of object, may be passed as arguments to other functions!
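
As a quick preview of passing a function as an argument, here is a minimal sketch. The helper `apply_twice` is hypothetical, invented for this example:

```python
def f(x):
    return x * x

def apply_twice(func, value):
    """Hypothetical helper: call func on value, then on that result."""
    return func(func(value))

print(apply_twice(f, 3))               # computes f(f(3))
print(sorted([-3, 1, -2], key=abs))    # the built-in abs is passed as an argument
```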

In [21]:
type(f)

function

In [22]:
issubclass(type(f), object)

True

In [23]:
isinstance(f, type(f))

True

In [24]:
f(10)

100

In [25]:
result = f(-20)
result

400

The one liner below takes no arguments, prints to console as a side effect, and returns the None object, as is standard when nothing is explicitly returned.  

Using the keyword return with no object also returns None.

In [26]:
def hello() -> None:
    print("Hello, world!")

In [27]:
hello()

Hello, world!


In [28]:
def hello() -> None:
    print("Hello, world!")
    return

In [29]:
result = hello()

Hello, world!


In [30]:
type(result)

NoneType

NOTE:  none of the keywords are callable, meaning no keyword takes arguments.  

You will sometimes see coders from other languages bringing habits to Python that make it look as if the keywords `if` and `return` were callables, but they really are not.  


The parentheses around an object do not constitute "passing an argument" when the keyword has no "mouth".

In [31]:
def extra(): 
    """
    misleading use of syntax making it look 
    as if we were passing arguments to 
    keywords if and return
    """
    if(1 == 1):
        print("unnecessary parentheses")
    return("string object")

In [32]:
extra()

unnecessary parentheses


'string object'

Here is a version in purer Python:

In [33]:
def extra(): 
    """
    more pythonic.  keywords don't "eat".
    """
    if (1 == 1):  # now we see an expression
        print("optional parentheses may add readability")
    return "string object" # no parens

In [34]:
extra()

optional parentheses may add readability


'string object'

Another way to create a callable function is by means of the keyword `lambda`.  Lambda expressions are meant to be callable one liners, eating zero or more arguments and returning the value of a single expression.  The keyword `return` is not used.

In [35]:
(lambda a, b:  a ** b)(2, 3)  # argument (2, 3) maps to a=2, b=3, returns 2 ** 3

8

In [36]:
type(lambda x: x)  # identity lambda, returns what it's given, note the type is a function

function

The built-in `map` distributes a callable function across an iterable.

In the example below, a tuple of tuples feeds pairs to the lambda expression through `t`, which then raises the left number to the right power.

In [37]:
tuples = ((2, 3), (3, 4), (0, 1))
m = map( lambda t: t[0] ** t[1], tuples )
list(m)

[8, 81, 0]
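
The same computation is often written as a list comprehension. This is an alternative sketch, not a replacement for `map`; the names `base` and `exponent` are chosen for the example:

```python
tuples = ((2, 3), (3, 4), (0, 1))

# Unpack each pair into base and exponent instead of indexing t[0] and t[1]
powers = [base ** exponent for base, exponent in tuples]

print(powers)
print(powers == list(map(lambda t: t[0] ** t[1], tuples)))
```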

## Fraction and Decimal types

These two number types were added later in Python's history.  They're not built-ins, but they're in the Standard Library.

Fractions operate according to the rules of fractions, keeping numerators and denominators separate.  Decimals provide more control over rounding and are suitable for financial computations, but also for arbitrary precision computations, i.e. numbers with more decimal places than floating point numbers have room for.

In [38]:
from fractions import Fraction

In [39]:
p = Fraction(2, 3)  # numerators and denominators
q = Fraction(5, 8)  # ... must be integers

In [40]:
p + q

Fraction(31, 24)

In [41]:
p * q

Fraction(5, 12)
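
One reason to reach for `Fraction` is that it stays exact where binary floating point does not. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

float_sum = 0.1 + 0.2                            # binary floating point
exact_sum = Fraction(1, 10) + Fraction(2, 10)    # exact rational arithmetic

print(float_sum == 0.3)
print(exact_sum == Fraction(3, 10))
print(exact_sum.numerator, exact_sum.denominator)
```

The float comparison fails because 0.1 and 0.2 have no exact binary representation, while the fractions add up exactly.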

In [42]:
from decimal import Decimal, getcontext

In [43]:
type(Decimal)

type

In [44]:
d = Decimal('2').sqrt()  # more precision than float 
d

Decimal('1.414213562373095048801688724')

In [45]:
getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[InvalidOperation, DivisionByZero, Overflow])

In [46]:
getcontext().prec = 50  

d = Decimal('2').sqrt()  # more precision than float 
d

Decimal('1.4142135623730950488016887242096980785696718753769')
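
The control over rounding mentioned above can be sketched with `quantize`. The price and variable names are made up for this example:

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("2.675")     # built from a string, so the value is exact
cents = Decimal("0.01")      # quantize to two decimal places

rounded_decimal = price.quantize(cents, rounding=ROUND_HALF_UP)
rounded_float = round(2.675, 2)   # the float is stored slightly below 2.675

print(rounded_decimal, rounded_float)
```

Building the `Decimal` from a string rather than from a float keeps the binary approximation out of the calculation.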

### The string type

A string is a data structure, and a sequence, consisting of letters, digits, punctuation symbols, or any other Unicode characters (such as emoji).

We're just getting a brief taste with these various types, and will be returning to them to learn more about their capabilities.  For now, we're just taking a first look.

In [47]:
foods = '🍕🥯🍿'   # a string of food emoji

In [48]:
type(foods)       # use type(obj) to check the type of any object

str

String elements (individual characters) may be accessed by numeric index, left to right, starting from element 0.  This is true of any sequence type object.  Even the Python range type is a sequence.  Sequences may also be sliced, as we shall see.

In [49]:
foods[0]

'🍕'

In [50]:
phrase = "lets test slicing a string using slice notation"

In [51]:
phrase[3:12]  # from position 3 through position 11, numbering from 0

's test sl'

In [52]:
type(10)      # using type()

int

In [53]:
type(10.0)    # checking 10.0

float

In [54]:
type(d)       # used earlier

decimal.Decimal

### The list and tuple types

Lists get signified with square brackets around elements separated by commas.  Lists are sequences.

Tuples look like lists except they use curved parentheses in place of square brackets.  They too are sequences.

In [55]:
foods_list = list(foods)  # create a new list from foods
foods_list

['🍕', '🥯', '🍿']

In [56]:
print(foods_list[:3] + foods_list[5:10])  # + concatenates lists; the slice [5:10] is past the end, so it is empty

['🍕', '🥯', '🍿']


In [57]:
foods   # the original string is unaffected

'🍕🥯🍿'

In [58]:
foods_list[-1]  # last element

'🍿'

In [59]:
import random   # from the Standard Library

foods_list = [ ] # start over, empty

for turn in range(10):
    foods_list.append(random.choice(foods))  # list type has an append method
    
foods_list

['🍿', '🍿', '🍕', '🥯', '🥯', '🍿', '🥯', '🥯', '🍿', '🍿']

In [60]:
foods_list[5:-1]  # slicing -- upper bound not included as usual

['🍿', '🥯', '🥯', '🍿']

In [61]:
foods_list[5:]  # slicing -- how to include the last element

['🍿', '🥯', '🥯', '🍿', '🍿']

In [62]:
type(foods_list)

list

In [63]:
foods_tuple = tuple(foods_list)  # what's the difference

In [64]:
type(foods_tuple)

tuple

In [65]:
foods_tuple

('🍿', '🍿', '🍕', '🥯', '🥯', '🍿', '🥯', '🥯', '🍿', '🍿')

Tuple type objects are immutable, i.e. we cannot simply reassign the leftmost element to be something else, as we may with a mutable list.

Let's introduce some more of those Python keywords: `try` and `except`.  Code in a try block that attempts something against the rules will trigger an exception; if it's legal Python, it runs normally.

Usually, an exception would crash the program (make it abort, with a message), but exception syntax allows us to handle exceptions more gracefully, should we wish.

In [66]:
try:
    foods_tuple[0] = '🌭'  # hot dog
except TypeError:  # raised because tuples do not support item assignment
    print("I'm sorry but that can't be done.")
else:
    print("As you wish.")

I'm sorry but that can't be done.


In [67]:
foods_tuple   # tuple unchanged -- good for data you want frozen solid

('🍿', '🍿', '🍕', '🥯', '🥯', '🍿', '🥯', '🥯', '🍿', '🍿')

In [68]:
try:
    foods_list[0] = '🌭'  # hot dog
except TypeError:  # not raised here, since lists support item assignment
    print("I'm sorry but that can't be done.")
else:
    print("As you wish.")

As you wish.


In [69]:
foods_list   # list changed -- lists allow overwriting elements, deleting, adding new ones

['🌭', '🍿', '🍕', '🥯', '🥯', '🍿', '🥯', '🥯', '🍿', '🍿']
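
Naming the exception type lets us respond differently to different problems. A minimal sketch follows; `try_replace` is a hypothetical helper written for this example:

```python
def try_replace(sequence, index, value):
    """Hypothetical helper: attempt item assignment and report the outcome."""
    try:
        sequence[index] = value
    except TypeError as err:       # immutable sequences such as tuples
        return f"refused: {err}"
    except IndexError as err:      # index outside the sequence
        return f"no such position: {err}"
    else:
        return "replaced"

print(try_replace(['a', 'b'], 0, 'z'))
print(try_replace(('a', 'b'), 0, 'z'))
print(try_replace(['a', 'b'], 5, 'z'))
```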

### The Boolean type

This is one of the simplest types, either True or False.  

Think of True and False as names for the numbers 1 and 0 respectively.  Python returns a bool when it evaluates expressions involving comparison operators such as `==`, `<`, `>`, `!=`, `>=`, `<=`.

The booleans also work with the logical operators `and`, `or` and `not`, and serve as the conditions tested by `if`.

In [70]:
issubclass(bool, int)

True

In [71]:
isinstance(True, bool)

True

In [72]:
True == 1

True

In [73]:
type(True)

bool

In [74]:
bool(0)

False

In [75]:
True + True

2

In [76]:
3 > 4

False

In [77]:
4 >= 3

True

In [78]:
(4 > 3) and ("love" in "I love Python")  # two True expressions connected by and

True

In [79]:
(1 + 1 == 3) or (2 + 2 == 4)  # one of these is True, so the whole expression is True

True
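
Because True and False behave like 1 and 0, booleans can be counted, and other objects can be converted to booleans. A short sketch using made-up scores:

```python
scores = [72, 88, 95, 61]

passing = sum(score >= 70 for score in scores)   # each True counts as 1
empty_is_false = bool([])                        # empty collections are falsy
text_is_true = bool("love")                      # non-empty strings are truthy
flipped = not (3 > 4)                            # not flips a Boolean

print(passing, empty_is_false, text_is_true, flipped)
```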

In this next example, the if block only executes if the if condition is True.

In [80]:
if 4 > 3:
    print("Do this thing")
else:
    print("Do this otherwise")

Do this thing


We may fit if / else logic into a single expression.

In [81]:
result = "this" if 4 > 3 else "that"
result

'this'

In [82]:
condition = 3 in [5, 6, 7]  # True or False

result = "this" if condition else "that"
result

'that'

### The None type

Although not a Boolean, this is the place to remind ourselves that `None` is the sole instance of the `NoneType`.

In [83]:
type(None)

NoneType

In [84]:
True and True   # operator and

True

In [85]:
(True and None) == False

False

In [86]:
(True and None) == None

True

In [87]:
(True or None)  # operator or

True
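
The results above follow from the fact that `and` and `or` return one of their operands rather than always returning a bool. A minimal sketch, with illustrative names, that also shows the usual `is None` test:

```python
first = True and None        # and returns its second operand when the first is truthy
second = None or "backup"    # or returns the first truthy operand

print(first is None)         # identity test, the usual way to check for None
print(second)
print(bool(None))            # None is falsy ...
print(None == False)         # ... yet it is not equal to False
```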

### The dict type

The dictionary or dict is perhaps the most important type in Python.  The dict reminds us of the namespace idea:  names paired with objects.  

A dict consists of key:value pairs, separated by commas, inside a pair of curly braces.

Remember, we'll be revisiting these types.  They know many more "tricks" (like animals do tricks) than we're seeing in this brief overview.

In [88]:
foods_dict = {'pizza'  : '🍕',
              'bagel'  : '🥯',
              'popcorn': '🍿'}

Now that a dict object has been defined, I can look up a value (in this case an emoji) using its key (in this case a word, likewise a string).

In [89]:
foods_dict['popcorn']

'🍿'

In [90]:
foods_dict['pizza']

'🍕'

In [91]:
foods_dict.keys()

dict_keys(['pizza', 'bagel', 'popcorn'])

In [92]:
foods_dict.values()

dict_values(['🍕', '🥯', '🍿'])

In [93]:
foods_dict.items()

dict_items([('pizza', '🍕'), ('bagel', '🥯'), ('popcorn', '🍿')])

In [94]:
for key, value in foods_dict.items():
    print(f"{key:10}: {value}")

pizza     : 🍕
bagel     : 🥯
popcorn   : 🍿
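
A few more everyday dict operations, sketched under the assumption that we want to add an entry and look up a key that may be missing (the extra keys are made up for the example):

```python
foods_dict = {'pizza': '🍕', 'bagel': '🥯', 'popcorn': '🍿'}

foods_dict['hot dog'] = '🌭'                           # assignment adds a new key
missing = foods_dict.get('taco', 'not on the menu')   # a default instead of a KeyError

print('bagel' in foods_dict)    # membership tests look at the keys
print(missing)
print(len(foods_dict))
```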


### The set type

The set type is a lot like a dict type, but with only keys, no values.  The elements must be both unique and hashable, which for the built-in types means immutable.  Data types which can be changed, such as lists, are not allowed to play the role of set elements, or dict keys.

In [95]:
the_set   = set(foods_list)    # duplicates will be dropped
other_set = set(foods_tuple)   # ditto

In [96]:
the_set.intersection(other_set)  # whatever they have in common

{'🍕', '🍿', '🥯'}

In [97]:
the_set.union(other_set)         # whatever they have all together

{'🌭', '🍕', '🍿', '🥯'}
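
Two more set behaviors are sketched here with made-up sets: the difference operator, and what happens when a mutable list is offered as an element.

```python
week_one = {'🍕', '🥯', '🍿'}
week_two = {'🍕', '🌭'}

only_week_one = week_one - week_two    # difference: in the first set but not the second
print(only_week_one == {'🥯', '🍿'})

try:
    bad_set = {['a', 'list']}          # a list is mutable, so it cannot be an element
except TypeError as err:
    outcome = f"TypeError: {err}"
else:
    outcome = "allowed"
print(outcome)
```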

### The namedtuple type

The final data structure we'll look at in this notebook is the namedtuple, which is a cross between a dictionary and a tuple.  We'll be able to access a value by its field name, much like a key, but also by its numeric index.

Consider an Atom, as in the Periodic Table.  It has a number of protons, an abbreviation, a name.

![](https://1.bp.blogspot.com/_5S18BpQ5mvg/RvzIZrxebuI/AAAAAAAABSs/3vVj2CQX3wk/w1200-h630-p-k-no-nu/element+periodic+table.JPG)

First we define a generic Atom, then we can make a few.

In [98]:
import collections

Atom = collections.namedtuple("Element", "protons abbrev name")

In [99]:
hydrogen = Atom(1, "H", "Hydrogen")
helium = Atom(2, "He", "Helium")
lithium = Atom(3, "Li", "Lithium")

In [100]:
hydrogen

Element(protons=1, abbrev='H', name='Hydrogen')

In [101]:
hydrogen[0]  # index attribute by number

1
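
A namedtuple also offers access by field name, conversion to a dict, and a way to build a modified copy. In this standalone sketch the type name is set to "Atom" to match the variable, and the carbon and nitrogen values are added only for illustration:

```python
import collections

Atom = collections.namedtuple("Atom", "protons abbrev name")
carbon = Atom(6, "C", "Carbon")

print(carbon.name, carbon[2])      # the same field, by name and by position
print(carbon._asdict())            # field names paired with values
nitrogen = carbon._replace(protons=7, abbrev="N", name="Nitrogen")  # a new tuple
print(nitrogen)

try:
    carbon.protons = 12            # fields are immutable, as in any tuple
except AttributeError:
    print("fields cannot be reassigned")
```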

In [102]:
for the_atom in (hydrogen, helium, lithium):  # loop over a tuple
    print(f"Name: {the_atom.name:20} Protons: {the_atom.protons:5}   Abbreviation: {the_atom.abbrev:4}") 

Name: Hydrogen             Protons:     1   Abbreviation: H   
Name: Helium               Protons:     2   Abbreviation: He  
Name: Lithium              Protons:     3   Abbreviation: Li  


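The above for-loop showcases yet another Python feature:  format strings.  The letter f prefix, in front of the quoted string, tells Python to evaluate the expressions inside the curly brace placeholders, using names already defined in the namespace, and to fill in the results.  The part after the colon, such as `:20`, sets the field width.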
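
A few more format options, sketched with illustrative values: `<` and `>` control alignment, and `.3f` fixes the number of decimal places.

```python
name = "Helium"
protons = 2
mass = 4.002602

# left-align in 10 columns, right-align in 3, fixed 3 decimals in 8 columns
line = f"{name:<10}|{protons:>3}|{mass:8.3f}|"
print(line)

# any expression may go inside the braces
print(f"{protons + 1} protons would make it lithium")
```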