# Lesson 1 - Data types and data structures in Python
This tutorial covers basic data types in Python, as well as some common data structures used in modeling.

## Types
The most commonly used data types for modelers that we will cover are:
* `int` - integer
* `float` - floating point number (decimal places)
* `str` - string (text)
* `bool` - True or False
* `list` - an ordered list of objects
* `dict` - a dictionary of key-value pairs (insertion-ordered in Python 3.7 and later, unordered in older versions)

We will also cover two less common types:
* `tuple` - similar to a list but cannot be changed after creation
* `set` - an unordered collection of unique objects (no duplicates)

To determine what type an object is, you can use the `type` function.
Another useful function is `dir`, which returns a list of an object's attributes, including its methods.
To convert an object to another type, you can call the type like a function, as shown below. This raises an error (for example, `int('abc')` raises a `ValueError`) if the conversion is not possible.

In [1]:
# Getting the type of 10
print(type(10))
# re-casting a float as an integer
# Note that recasting a float as an integer truncates everything after the decimal point
print(int(45.6))

<class 'int'>
45


In [1]:
# You can also use dir to list all the built-in attributes and methods of a type
dir(int)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']

## Immutable vs mutable types
One of the most important and confusing things about Python is the difference between immutable and mutable types. Simply put, immutable objects cannot be changed once created, while mutable objects can. This may seem like an unimportant detail, but it can cause misunderstandings when combined with how variables are assigned. `int`, `float`, `str`, `bool`, and `tuple` are immutable, while `list`, `dict`, and `set` are mutable.
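A quick way to see this difference is to compare `id` before and after a change. The minimal sketch below (variable names are illustrative) shows that a string method returns a new object, while a list method such as `append` modifies the same object in place.

```python
# Strings are immutable: methods such as upper() return a NEW string
text = 'model'
text_id_before = id(text)
text = text.upper()
print(text)                              # MODEL
print(id(text) == text_id_before)        # False: text now refers to a new object

# Lists are mutable: append() changes the SAME object in place
numbers = [1, 2, 3]
numbers_id_before = id(numbers)
numbers.append(4)
print(numbers)                           # [1, 2, 3, 4]
print(id(numbers) == numbers_id_before)  # True: still the same object
```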

When you assign a variable, you are not creating a new object; you are simply stating that the variable references an object. You can think of it like creating a shortcut to a file or folder. For example, `x = 1` says that x references the object `1`. If you then write `x = x + 1`, x now references the object `2` instead. We can confirm this with the `id` function, which returns the identity (a unique integer) of the object that a variable references.

In [2]:
x = 1
print(id(x))
x = x + 1
print(id(x))

94700992135808
94700992135840


Notice how the id has changed, since x now references a different object. For immutable objects, this sharing causes no problems, because the object itself can never change. However, it can cause confusion for mutable objects. What if we create a list `x = [1,2,3]` and set `y = x`? Using `id`, we can see that x and y reference the same object.

Can you guess what happens if we now modify x or y? Since they reference the same object, modifying it through either name (for example with `append`) shows up in the other! This differs from some other programming languages, where you might expect y to be a completely new and separate copy of x. To create a separate copy, use the `copy` method, which lists, dictionaries, and sets all provide.

In [3]:
# Using normal assignment
x = [1, 2, 3]
y = x
print(f'id of x = {id(x)}')
print(f'id of y = {id(y)}')
#Adding 10 to the list
y.append(10)
print(f'x = {x}')
print(f'y = {y}')

id of x = 140120160423040
id of y = 140120160423040
x = [1, 2, 3, 10]
y = [1, 2, 3, 10]


In [4]:
# Using the copy method, notice how the ids are now different
x = [1, 2, 3]
y = x.copy()
print(f'id of x = {id(x)}')
print(f'id of y = {id(y)}')
#Adding 10 to the list
y.append(10)
print(f'x = {x}')
print(f'y = {y}')

id of x = 140120160422560
id of y = 140120160417904
x = [1, 2, 3]
y = [1, 2, 3, 10]
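One caveat: `copy()` makes a *shallow* copy. The outer list is new, but any mutable objects inside it (such as inner lists) are still shared. For nested data, the standard library's `copy.deepcopy` copies the inner objects as well. A short sketch with illustrative names:

```python
import copy

# A list that contains other lists (nested, mutable data)
original = [[1, 2], [3, 4]]

shallow = original.copy()        # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list AND new inner lists

# Modify an inner list through the shallow copy
shallow[0].append(99)

print(original)  # [[1, 2, 99], [3, 4]]  changed, because the inner list is shared
print(shallow)   # [[1, 2, 99], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]      unaffected
```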


## Integers - `int`
Integers are whole numbers, which are useful for counting and iterating. You can create an integer by writing a whole number with no decimal point. For example, the length of a list is returned as an integer, which makes it easy to compare lengths and to create a range to iterate over. You can, of course, also use integers in ordinary calculations.

In [5]:
# integer
print(type(1))

<class 'int'>
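The sketch below shows a few everyday integer operations. Two details worth knowing: in Python 3, `/` always returns a float (use `//` for whole-number division), and Python integers have no fixed size limit. Variable names here are illustrative.

```python
count = 7

print(count + 3)    # 10
print(count * 2)    # 14
print(count / 2)    # 3.5  (true division always returns a float)
print(count // 2)   # 3    (floor division returns an int)
print(count % 2)    # 1    (remainder)
print(2 ** 100)     # 1267650600228229401496703205376, ints do not overflow

# Integers are also what range() uses to repeat something a fixed number of times
for i in range(3):
    print(i)        # prints 0, then 1, then 2
```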


## Floating point - `float`
Floats (`float`) are used to represent numbers with decimal places. You can create a float by writing a number with a decimal point. Python floats are double-precision (64-bit) floating point numbers, which gives roughly 15 to 17 significant decimal digits of precision. They behave like floats in most other programming languages and share the usual limitations, such as small rounding errors that make exact equality tests unreliable. We will cover how to format floats when printing them in the `str` section.

In [6]:
# a decimal point makes the number a float
x = 1.
y = 671.143671371
print(type(x))
print(type(y))

<class 'float'>
<class 'float'>
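The classic example of float rounding error is `0.1 + 0.2`. A common workaround is to compare floats with a tolerance using `math.isclose` from the standard library, rather than with `==`. A minimal sketch:

```python
import math

a = 0.1 + 0.2
print(a)          # 0.30000000000000004
print(a == 0.3)   # False: 0.1 and 0.2 cannot be stored exactly in binary

# Compare floats with a tolerance instead of ==
print(math.isclose(a, 0.3))   # True

# round() is handy for displaying or reporting results
print(round(a, 2))            # 0.3
```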


## Strings - `str`
Strings are used to represent text. You create a string by enclosing the text in either `"` or `'`; there is no difference between the two. Like lists, strings can be indexed and sliced using brackets, `[start:end]`. **Indexing in Python starts at 0.** A slice includes the start index but excludes the end index. For example, `string[2:5]` includes the 3rd through the 5th characters, but not the 6th. Strings can be concatenated using the `+` operator.

Python also has a very handy feature called f-strings (available since Python 3.6), which let you insert variables into strings (also known as "string interpolation"). To create an f-string, put `f` before the opening quotation mark, then place a variable between `{}`. For float values, you can use the format `{variable:width.precisionf}` to control the output, where width is the minimum number of characters and precision is the number of digits after the decimal point. See [this guide](https://realpython.com/python-formatted-output/) for more formatting options.

In [7]:
#Indexing a string
string = '0123456'
string[1:5]

'1234'
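A few more indexing forms come up often: a single index, negative indices (counted from the end), omitted start or end values, and an optional step. Strings are also immutable, so you cannot assign to one character; you build a new string instead. A short sketch reusing the example string:

```python
string = '0123456'

print(string[0])      # 0        first character (index 0)
print(string[-1])     # 6        negative indices count from the end
print(string[:3])     # 012      omitted start means "from the beginning"
print(string[4:])     # 456      omitted end means "to the end"
print(string[::2])    # 0246     the optional third value is a step
print(len(string))    # 7

# Strings are immutable: string[0] = 'X' would raise a TypeError.
# Build a new string instead:
new_string = 'X' + string[1:]
print(new_string)     # X123456
```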

In [8]:
x = 5
y = 125.3246278
# Creating strings of the numbers
string_x = f'x = {x}'
string_y = f'y = {y:3.3f}'
# concatenating strings together
print(string_x + ', ' + string_y)

x = 5, y = 125.325
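The width part of the format only has a visible effect when it is larger than the formatted number, as in the first line below. The sketch also shows two other options from Python's format specification: `,` for a thousands separator and `%` for percentages. The brackets are only there to make the padding visible; the values are illustrative.

```python
value = 3.14159
big = 1234567.891

print(f'[{value:8.2f}]')   # [    3.14]     width 8, 2 decimal places, right-aligned
print(f'[{value:.4f}]')    # [3.1416]       precision only, no minimum width
print(f'[{big:,.1f}]')     # [1,234,567.9]  comma as thousands separator
print(f'[{0.256:.1%}]')    # [25.6%]        multiplied by 100 and shown as a percentage
```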


## Booleans - `bool`
Booleans represent truth values and are the result of comparisons such as `==` or `>`. There are only two Boolean objects: `True` and `False`. Booleans are actually a kind of integer (`bool` is a subclass of `int`), with `True == 1` and `False == 0`, so it is possible to do math with them.

In [9]:
print(True == 1)
print(False == 0)
print((True + False) / (5 - True))

True
True
0.25
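Because `True` counts as 1, a common trick is to count how many values meet a condition by summing Booleans. The sketch below uses a generator expression (a close relative of the list comprehensions covered later) and illustrative data. It also shows that `bool()` treats zero and empty collections as `False`.

```python
x = 5
print(x > 3)             # True
print(x == 4)            # False
print(x > 3 and x < 10)  # True

# Count how many values are greater than 3 by adding up True (1) and False (0)
values = [1, 4, 7, 2, 9]
count_above_3 = sum(v > 3 for v in values)
print(count_above_3)     # 3

# bool() converts other values: zero and empty collections are False
print(bool(0), bool(''), bool([]), bool([0]))  # False False False True
```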


## List - `list`
A list is an ordered collection of objects. A list is defined using `[]`, with `,` between items. You can join lists together using `+`, just like strings, and index them the same way. A list can hold any combination of types. There are several useful methods for working with lists; a full list of methods can be found [here](https://docs.python.org/3/tutorial/datastructures.html).

* `list.append(x)` - adds an item to the end of the list
* `list.extend(iterable)` - adds all the items of an object, such as another list, to the list
* `list.remove(x)` - removes the first item that equals the given value
* `list.count(x)` - returns the number of times a value appears in the list
* `list.sort()` - sorts the list in place (see [here for more info](https://docs.python.org/3/library/functions.html#sorted))
* `list.copy()` - returns a shallow copy; you can also use `list_name[:]` (for the difference between a shallow and a deep copy, see [here](https://docs.python.org/3/library/copy.html))
* `len(list)` - returns the number of items in the list (works for other types as well)
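The methods above can be tried out on a small list of illustrative values. Note that `remove` only deletes the first match, and that `list.sort()` changes the list itself, while the built-in `sorted()` returns a new list.

```python
readings = [5, 3, 8, 3, 1]

readings.remove(3)          # removes only the FIRST 3
print(readings)             # [5, 8, 3, 1]
print(readings.count(3))    # 1
print(len(readings))        # 4

readings.sort()             # sorts in place and returns None
print(readings)             # [1, 3, 5, 8]

# sorted() returns a new sorted list and leaves the original alone
print(sorted(readings, reverse=True))  # [8, 5, 3, 1]
print(readings)                        # [1, 3, 5, 8]
```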

Another extremely useful feature is the list comprehension, which lets you build a new list from another list (or any other iterable). This will be covered later, as will NumPy arrays, which are similar to lists but designed for numerical work.

In [10]:
example_list = [1,
                2.167,
                'test']
example_list.append(['list', 'within', 'a', 'list'])
example_list.extend(['another', 'list'])
print(example_list)
print(example_list[0:2])

[1, 2.167, 'test', ['list', 'within', 'a', 'list'], 'another', 'list']
[1, 2.167]


List operators act on the list as a whole, not element by element. For example, `+` joins lists together rather than adding their elements, and `*` repeats a list rather than multiplying its elements.

In [11]:
# z = [1,2,3,4,5,6] NOT [5,7,9]
x = [1,2,3]
y = [4,5,6]
z = x + y
print(z)
# m = x repeated 3 times, NOT [3,6,9]
m = x * 3
print(m)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
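If you do want element-by-element results with plain lists, one common approach is to pair up items with the built-in `zip` inside a list comprehension (NumPy arrays, covered later, do this directly). A minimal sketch; keep in mind that `zip` stops at the end of the shorter list.

```python
x = [1, 2, 3]
y = [4, 5, 6]

# zip pairs up items by position: (1, 4), (2, 5), (3, 6)
elementwise_sum = [a + b for a, b in zip(x, y)]
print(elementwise_sum)      # [5, 7, 9]

# Multiplying each element by 3
tripled = [a * 3 for a in x]
print(tripled)              # [3, 6, 9]
```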


## Dictionary - `dict`
A dictionary is an extremely useful data structure made of key-value pairs. Its purpose is to let you look up values by key instead of by position. A dictionary is created with braces: `{key: value}`. A key must be hashable, which in practice means it cannot be, or contain, a mutable object such as a list or another dictionary. Each key must also be unique: if you assign a value to a key that already exists, the old value is overwritten. A value can be any object.
Some useful methods for a dictionary are:

* `dict.get(key)` - returns the value for the given key, or `None` (or a default passed as a second argument) if the key is missing; `dict[key]` also works but raises a `KeyError` if the key is missing
* `dict.pop(key)` - removes the key from the dictionary and returns its value
* `dict.keys()` - returns the keys of the dictionary
* `dict.values()` - returns the values of the dictionary
* `dict.items()` - returns a view of the key-value pairs as `(key, value)` tuples (useful for iterating)
* `dict.update(new_dict)` - merges another dictionary into this one, updating existing keys and adding new ones
* `dict.setdefault(key, value)` - returns the value for the key; if the key does not exist, it first inserts the key with the given value
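The sketch below, with an illustrative `prices` dictionary, contrasts `get` with `[]` for a missing key and shows `pop` and looping over `items()`.

```python
prices = {'apple': 1.5, 'pear': 2.0}

# get() returns None (or a given default) for a missing key
print(prices.get('banana'))        # None
print(prices.get('banana', 0.0))   # 0.0

# dict[key] raises a KeyError for a missing key
try:
    price = prices['banana']
except KeyError:
    print('banana is not a key')

# pop() removes a key and returns its value
removed = prices.pop('pear')
print(removed, prices)             # 2.0 {'apple': 1.5}

# items() is handy for looping over keys and values together
prices['kiwi'] = 0.75
for fruit, price in prices.items():
    print(f'{fruit}: {price}')     # apple: 1.5, then kiwi: 0.75
```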

In [12]:
dictionary = {
    'a': 1,
    'b': 2,
    'c': 3,
}
# Get keys and values, and get value of key a
print(dictionary.keys())
print(dictionary.values())
print(dictionary['a'])

dict_keys(['a', 'b', 'c'])
dict_values([1, 2, 3])
1


There are 3 ways to add values. The easiest is to assign to the key using `[]`; if the key does not exist in the dictionary, it is added. You can also use `.setdefault` or `.update`. `.setdefault` differs from `[]` because it does not change the value if the key already exists. `.update` is useful for changing or adding several key-value pairs at once. To summarize:
* `dict[key] = value`:
    * if the key exists: change its value
    * if the key does not exist: add it
* `dict.setdefault(key, value)`:
    * if the key exists: leave it unchanged and return its current value
    * if the key does not exist: add it and return the new value
* `dict.update(new_dict)`:
    * if a key exists: change its value
    * if a key does not exist: add it

In [13]:
# Adds new value
dictionary['d'] = 2
print(dictionary)
# Changes value of existing key
dictionary['d'] = 4
print(dictionary)

{'a': 1, 'b': 2, 'c': 3, 'd': 2}
{'a': 1, 'b': 2, 'c': 3, 'd': 4}


In [14]:
# Using setdefault on an existing key just returns the value, the value given is ignored
print(dictionary.setdefault('a',2))
print(dictionary)
# Using setdefault for a new key adds the key value pair to the dictionary
print(dictionary.setdefault('e',4))
print(dictionary)

1
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
4
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 4}


In [15]:
dictionary.update({'e':5, 'f':6})
dictionary

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}

## Tuple - `tuple`
A tuple is similar to a list, but it cannot be changed once created (it is immutable). Tuples can be slightly faster and more memory-efficient than lists, but for modelers that difference rarely matters. A tuple is created with `()`. You can convert a tuple to a list, or vice versa, using `list` or `tuple`.

In [16]:
example_tuple = (1, "test", "tuple")
# converting tuple to list
converted_list = list(example_tuple)

print(type(converted_list))
print(converted_list)

<class 'list'>
[1, 'test', 'tuple']


In [17]:
# The commented-out code below will not work because a tuple is immutable.
# You cannot replace an element within it. 

# example_tuple[0] = 2

# If example_tuple had been a list, this code would run.
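A few tuple details tend to come up in practice: a one-item tuple needs a trailing comma, tuples can be "unpacked" into separate variables, and because a tuple of immutable values is hashable, it can be used as a dictionary key. A short sketch with illustrative names:

```python
# A one-item tuple needs a trailing comma; the parentheses alone are just grouping
not_a_tuple = (5)
one_item = (5,)
print(type(not_a_tuple), type(one_item))  # <class 'int'> <class 'tuple'>

# Tuple unpacking assigns each item to its own variable
point = (3, 4)
x, y = point
print(x, y)                # 3 4

# A tuple of immutable values can be a dictionary key (a list cannot)
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[(3, 4)])   # 5.0
```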

## Set - `set`
A set is a somewhat unusual data type: it is unordered, unindexed, and contains no duplicates. This means its items have no guaranteed order and each value can appear only once. A set is created using `{}`, like a dictionary but without `key: value` pairs (note that an empty `{}` creates a dictionary; use `set()` for an empty set). Sets are especially useful for getting the unique values of a list or other collection: converting a list into a set (and back to a list, if you want) removes the duplicates.

In [18]:
# Creating a set (the printed order of the items may vary)
example_set = {1,'test', 'set'}
print(example_set)

{1, 'test', 'set'}


In [19]:
list_with_duplicates = [1,1,2,3,5,5,5,7,7,8]
set_without_duplicates = set(list_with_duplicates)
print(set_without_duplicates)

{1, 2, 3, 5, 7, 8}
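Sets also support the usual mathematical set operations through operators, and fast membership tests with `in`. Since a set has no order, wrapping it in `sorted()` is one way to get the unique values back as an ordered list. A minimal sketch with illustrative values:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)      # union: 1, 2, 3, 4, 5
print(a & b)      # intersection: 3, 4
print(a - b)      # difference (in a but not in b): 1, 2
print(3 in a)     # membership test: True

# Removing duplicates and getting back a sorted list
list_with_duplicates = [1, 1, 2, 3, 5, 5, 5, 7, 7, 8]
unique_sorted = sorted(set(list_with_duplicates))
print(unique_sorted)  # [1, 2, 3, 5, 7, 8]
```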
