# Lecture 2: Variable types and basic data structures
ENVR 890-001: Python for Environmental Research, Fall 2020

August 21, 2020

By Andrew Hamilton. Some material adapted from Greg Characklis, David Gorelick and H.B. Zeff.

## Summary
In this lecture, we will learn about how computers store different kinds of information, and what types of operations we can do with different **types** of data. We will then find out how different **data structures** can be used to store, access, and manipulate collections of data.

## Variable types
Many types of information can be assigned to a Python variable. Python will dynamically try to figure out what kind of information you are dealing with, and classify the variable with the appropriate **type**. Each type has different memory requirements and allowable operations. 

### Floating point numbers (``float``)
The first variable type to understand is the floating point number. This is a computer's representation of a numeric value with a decimal point (e.g., 1.2345). 

In [1]:
x = 1.2345
print(type(x))

<class 'float'>


One important fact to remember is that computers generally do not represent floating point numbers exactly. Because computers manipulate only 1s and 0s at the machine-code level, Python represents floating point numbers as "binary fractions". This representation is always extremely close, but not guaranteed to be perfectly precise. The logic of how this works is somewhat complicated (but cool! check out the explanations [here](https://realpython.com/python-data-types/) and [here](https://docs.python.org/3/tutorial/floatingpoint.html) for more detail), but the important things to remember are:

1. Floating point numbers can only be trusted out to about 15 significant digits. This is not a problem under most circumstances, but can cause problems if you are working with very large or very small numbers. It can also cause problems if a program includes massive numbers of computations, as the rounding errors will accumulate throughout the program.

2. The value of a variable printed to the screen is not the exact representation stored by the computer - Python will generally round the output for visual convenience. So 1.2345 may actually be stored as 1.234500000000001 or 1.234499999999998.

3. The largest number you can represent as a float is approximately $1.79 * 10^{308}$. Anything larger will be designated ``inf``. The closest a non-zero number can be to zero is approximately $5.0 * 10^{-324}$. The next "step" is zero.

In [2]:
1.79 * 10**308

1.79e+308

In [3]:
1.8 * 10**308

inf

In [4]:
5e-324

5e-324

In [5]:
4e-324

5e-324

In [6]:
1e-324

0.0
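
The first two points in the list above can also be checked directly. The following is a minimal sketch rather than part of the original notebook; it relies on the standard-library `math` and `sys` modules, and the variable name `total` is chosen only for illustration.

```python
import math
import sys

# 0.1 and 0.2 have no exact binary representation, so their sum is slightly off
total = 0.1 + 0.2
print(total == 0.3)              # False: exact comparison of floats is fragile
print(f'{total:.17f}')           # show more digits than print() normally does
print(math.isclose(total, 0.3))  # True: compare within a small tolerance instead

# The limits quoted in point 3 can be read from the sys module
print(sys.float_info.max)        # largest representable float
print(sys.float_info.dig)        # number of decimal digits that can be trusted
```

In practice, this suggests comparing floats with a tolerance (for example with `math.isclose`) rather than with `==`.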

### Integers (``int``)
The other major numeric variable type is the integer. If you don't include a decimal point when assigning a numeric value to a variable, Python will default to an integer type.

In [7]:
x = 101
print(type(x))

<class 'int'>


Integers are a natural default because they are stored exactly, with none of the rounding error that affects floats, and in many languages they also take less memory (consider storing 2 vs 2.000000000000000). In Python 3, there is no limit on the size of the integer that can be stored (assuming your computer has enough memory).
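
As a small illustration of the unlimited integer size, the sketch below builds an integer far beyond the largest float. The name `big` is hypothetical, and the `try`/`except` construct has not been introduced yet; it is used here only so the cell runs without stopping on the error.

```python
big = 10 ** 400            # far larger than the biggest float (about 1.79e308)
print(type(big))           # still an int
print(len(str(big)))       # number of digits: 401

# Converting such a large int to a float raises an error rather than giving inf
try:
    float(big)
except OverflowError as err:
    print('OverflowError:', err)
```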

Unlike "lower level" programming languages like C, Python will automatically update the variable type if a calculation demands it. For example:

In [8]:
x = 2
y = 3
z = x + y
print(x, type(x))

2 <class 'int'>


In [9]:
print(y, type(y))
print(z, type(z))

3 <class 'int'>
5 <class 'int'>


In [10]:
z = x / y
print(x, type(x))
print(y, type(y))
print(z, type(z))

2 <class 'int'>
3 <class 'int'>
0.6666666666666666 <class 'float'>


In [11]:
y = 3.3
z = x + y
print(x, type(x))
print(y, type(y))
print(z, type(z))

2 <class 'int'>
3.3 <class 'float'>
5.3 <class 'float'>
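
One consequence worth checking is that ``/`` returns a float even when the result is a whole number. The sketch below also uses the floor-division operator ``//``, which is not covered above and is included only for contrast; the names `a`, `b`, and `c` are illustrative.

```python
a = 4 / 2      # true division always returns a float, even for a whole result
b = 4 // 2     # floor division of two ints returns an int
c = 7 // 2     # floor division discards the remainder
print(a, type(a))   # 2.0 <class 'float'>
print(b, type(b))   # 2 <class 'int'>
print(c, type(c))   # 3 <class 'int'>
```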


### Complex numbers (``complex``)
Python also allows for complex variables, with the imaginary part marked by the suffix ``j``.

In [12]:
x = 4+2.3j
print(x, type(x))

(4+2.3j) <class 'complex'>


In [13]:
y = 3.1
z = x + y 

print(z, type(z))

(7.1+2.3j) <class 'complex'>
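
The parts of a complex number can be retrieved separately. This is a minimal sketch using the built-in ``real`` and ``imag`` attributes and the ``abs`` function; it is an addition for illustration, not part of the original notebook.

```python
x = 4 + 2.3j
print(x.real, type(x.real))   # 4.0 <class 'float'>: the real part is stored as a float
print(x.imag, type(x.imag))   # 2.3 <class 'float'>: so is the imaginary part
print(abs(3 + 4j))            # 5.0: the magnitude, sqrt(3**2 + 4**2)
```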


### Logical/Boolean variables (``bool``)
Logical (Boolean) variables can only take the value of True or False. These will be useful for "if statements" and "logical indexing", which we will discuss in the next two lectures.

In [14]:
x = True
print(x, type(x))

True <class 'bool'>


Note that Boolean variables are actually a "subclass" of int, with True being equal to 1 and False being equal to 0. Some arithmetic operations will revert Boolean variables to int. This should probably be avoided, as it can be confusing.

In [15]:
y = False
z = 1
print(x + x)
print(x * y)
print(x - z)

2
0
0
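
The subclass relationship can be confirmed with the built-in ``isinstance`` function. A short sketch, added here for illustration:

```python
print(isinstance(True, int))   # True: bool is a subclass of int
print(True == 1, False == 0)   # True True
print(type(True + True))       # <class 'int'>: the result of arithmetic is an int, not a bool
```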


### Strings (``str``)
The last major type of variable is the string, a sequence of characters. Strings are surrounded on either side by single or double quotes. They can be any length, or empty.

In [16]:
a = 'This is a string.'
b = "Another string."
c = ''
d = '2.03'
e = '*$%^'
f = ' '
print(type(a), type(b), type(c), type(d), type(e), type(f))

<class 'str'> <class 'str'> <class 'str'> <class 'str'> <class 'str'> <class 'str'>


Whichever type of quotation mark you use (``''`` or ``""``), the other can be included in the string. The same type of quotation mark can also be included without ending the string by preceding it with a backslash (``\``).

In [17]:
g = 'This is a "string".'
print(g)
g = "Also a 'string'."
print(g)
g = 'Another \'string\'.'
print(g)
g = "Fourth \"string\"."
print(g)

This is a "string".
Also a 'string'.
Another 'string'.
Fourth "string".


In [18]:
g = 'This is a "string". It is Andrew\'s string'
print(g)

This is a "string". It is Andrew's string


Strings can be concatenated using the ``+`` operator.

In [19]:
g = a + f + b + c + f + 'Here are some symbols.' + f  + '_' + e
print(g)

This is a string. Another string. Here are some symbols. _*$%^


We cannot add together strings and numeric types.

In [20]:
# # this should cause an error, can't add str to float
# x = -1.1
# z = x + d

In [21]:
# # this should cause an error, can't add str to int
# y = 45
# z = y + d
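
To see the error without stopping the notebook, it can be caught with ``try``/``except`` (a construct not yet covered in this lecture). This is only a sketch: `x_demo` and `error_message` are hypothetical names, chosen so that the notebook's own variables are left untouched.

```python
d = '2.03'
x_demo = -1.1          # hypothetical float, standing in for x in the cell above

error_message = None
try:
    z = x_demo + d     # float + str is not defined
except TypeError as err:
    # Python refuses to guess whether we meant arithmetic or concatenation
    error_message = str(err)

print('TypeError:', error_message)
```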

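But we can convert the string d into a float, then add it to other numeric values. Note that because the assignments in the two cells above are commented out, x and y still hold the Boolean values True and False from earlier, which behave as 1 and 0 in arithmetic.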

In [22]:
print(d, type(d))

2.03 <class 'str'>


In [23]:
d_float = float(d)
print(d_float, type(d_float))

2.03 <class 'float'>


In [24]:
z1 = d_float + x
z2 = d_float + y
print(z1, z2)

3.03 2.03


We can also go the other direction and convert numeric types to strings.

In [25]:
print(x, type(x))
x_str = str(x)
print(x_str, type(x_str))
y_str = str(y)

True <class 'bool'>
True <class 'str'>


In [26]:
z1_str = str(z1)

z3 = 'Start with the number ' + d + ' and add ' + x_str + ' to get ' + z1_str
print(z3)

Start with the number 2.03 and add True to get 3.03
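
Conversions to ``int`` have a couple of pitfalls that are easy to miss. The following sketch is an addition for illustration; it again uses ``try``/``except`` only so the cell runs to completion.

```python
print(int('45'))            # 45: a string of digits converts directly
print(int(2.99))            # 2: int() truncates toward zero, it does not round
print(int(float('2.03')))   # 2: go through float first for a decimal string

try:
    int('2.03')             # a decimal string cannot be converted straight to int
except ValueError as err:
    print('ValueError:', err)
```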


There are a few different ways to print variables into strings, which you should be familiar with because you will see all of them in online help forums. 

In [27]:
name = 'Kim'
age = 24
meters = 1.52438
mms = meters * 1000

# Using + operator
sentence = '1. ' + name + ' is ' + str(age) + ' years old, and is ' + str(meters) + ' m (' + str(mms) +' mm) tall.'
print(sentence)

1. Kim is 24 years old, and is 1.52438 m (1524.38 mm) tall.


In [28]:
# We can round the number of digits printed after the decimal as follows (note this will not change the actual value of "meters")
sentence = '2. ' + name + ' is ' + str(age) + ' years old, and is ' + str(round(meters, 2)) + ' m (' + str(mms) +' mm) tall.'
print(sentence)

2. Kim is 24 years old, and is 1.52 m (1524.38 mm) tall.


In [29]:
# Using %s, %i (or equivalently %d), %f for str, int, float variables
sentence = '3. %s is %i years old, and is %f m (%f mm) tall.' % (name, age, meters, mms)
print(sentence)

3. Kim is 24 years old, and is 1.524380 m (1524.380000 mm) tall.


In [30]:
# We can also round the number of digits printed after the decimal using this method
sentence = '4. %s is %i years old, and is %.2f m (%.2f mm) tall.' % (name, age, meters, mms)
print(sentence)

4. Kim is 24 years old, and is 1.52 m (1524.38 mm) tall.


In [31]:
# Another method, using "format" function of string class
sentence = '5. {} is {} years old, and is {:.2f} m ({:.2f} mm) tall'.format(name, age, meters, mms)
print(sentence)

5. Kim is 24 years old, and is 1.52 m (1524.38 mm) tall


In [32]:
# We can also print with commas for large numbers using this method
sentence = '6. {} is {} years old, and is {:.2f} m ({:,.2f} mm) tall'.format(name, age, meters, mms)
print(sentence)

6. Kim is 24 years old, and is 1.52 m (1,524.38 mm) tall


In [33]:
# lastly, f-strings are a newer way to format, available since Python 3.6
sentence = f'7. {name} is {age} years old, and is {meters:.2f} m ({mms:,.2f} mm) tall'
print(sentence)

7. Kim is 24 years old, and is 1.52 m (1,524.38 mm) tall


In [34]:
# I can also leave the "f" out of ".2f" and give the number of significant digits
sentence = f'8. {name} is {age} years old, and is {meters:.3} m ({mms:,.2} mm) tall'
print(sentence)

8. Kim is 24 years old, and is 1.52 m (1.5e+03 mm) tall


### In-class exercises
Python will dynamically choose the type when you create a new variable. What will the type be for each of the following? (Feel free to create a code cell to try them out and make sure you are right!)
1. 3.14
1. 3
1. "3"
1. 3 + 14
1. 3.14 - 14.14
1. 3 * 14
1. 3 / 14
1. 3+2j
1. False
1. 'False'

## Data structures
Often in computer programming we want to store many values at once, rather than a single value. To do this, there are a variety of different **data structures** we can use in Python. Choosing the best data structure depends on the context of what you are trying to do.

### Lists
The simplest structure is the list, denoted by brackets ``[]``. A list is just what it sounds like: a list of different values. Values are separated by commas. The objects within a list can be of any type.

In [35]:
X = [1, 2, 3, 4, 5]
print(X)
print(type(X))

[1, 2, 3, 4, 5]
<class 'list'>


In [36]:
Y = ['a', 'b', 'c']
Z = [0, 1.3, 'two']

The objects in a list can even be lists!

In [37]:
A = [X, Y, Z, 'hello', 3]
print(A)

[[1, 2, 3, 4, 5], ['a', 'b', 'c'], [0, 1.3, 'two'], 'hello', 3]


We can concatenate lists with ``+``

In [38]:
B = X + Y
print(B)

[1, 2, 3, 4, 5, 'a', 'b', 'c']


And we can repeat lists with ``*``

In [39]:
C = X * 10
print(C)

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]


Sequences of numbers can be created using the ``list`` and ``range`` functions:

In [40]:
R = range(10)
print(R)

range(0, 10)


In [41]:
D1 = list(range(10))
print(D1)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [42]:
D2 = list(range(3, 10))
print(D2)

[3, 4, 5, 6, 7, 8, 9]


In [43]:
D3 = list(range(10, -10, -3))
print(D3)

[10, 7, 4, 1, -2, -5, -8]


We can access particular elements of the list through **indexing** with brackets ``[]``. Note that the **index starts with 0, not 1**. So the first element is index 0, the fourth element is index 3, etc.

In [44]:
X = list(range(5))
print(X)
print()
print(X[0])
print(X[1])

[0, 1, 2, 3, 4]

0
1


We can access a **slice** of values with the colon ``:``. Note that the index on the right-hand side of the colon is an exclusive boundary: the slice stops just before it, so that element is not included.

In [45]:
# Get first and second element. 
print(X[0:2])

[0, 1]


In [46]:
# can also leave out the 0
print(X[:2])

[0, 1]


In [47]:
# Get 3rd element to end
print(X[2:])

[2, 3, 4]


In [48]:
# 3rd to 4th element
print(X[2:4])

[2, 3]


In [49]:
# 3rd to 4th another way. Minus values count backward from end.
print(X[2:-1])

[2, 3]
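
Negative indices also work on their own, and a slice accepts an optional third value, the step. The step is not covered in the text above, so treat this as a supplementary sketch.

```python
X = list(range(5))   # [0, 1, 2, 3, 4]
print(X[-1])         # 4: the last element
print(X[-2:])        # [3, 4]: the last two elements
print(X[::2])        # [0, 2, 4]: every second element (the third slice value is the step)
print(X[::-1])       # [4, 3, 2, 1, 0]: a step of -1 reverses the list
```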


We can also reassign particular elements based on the index:

In [50]:
print(X)
print()

X[0] = 10
print(X)

[0, 1, 2, 3, 4]

[10, 1, 2, 3, 4]


In [51]:
X[:2] = [100, 200]
print(X)

[100, 200, 2, 3, 4]


In [52]:
Y = ['a', 'b', 'c', 'd', 'e']
print(Y)
print()

Y[:3] = X[:3]
print(Y)

['a', 'b', 'c', 'd', 'e']

[100, 200, 2, 'd', 'e']


With lists of lists, we index from the outermost list to the innermost

In [53]:
X = [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]]
print(X)

[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]


In [54]:
print(X[0])

[[1, 2, 3], [4, 5, 6]]


In [55]:
print(X[0][0])

[1, 2, 3]


In [56]:
print(X[0][0][0])

1


In [57]:
X[0][0] = X[0][1]
print(X)

[[[4, 5, 6], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
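
One subtlety of assigning one inner list to another position is that both positions then refer to the same list object, not to two separate copies. The sketch below illustrates this with the hypothetical names `nested` and `independent`; taking a full slice ``[:]`` is one common way to get a separate copy.

```python
nested = [[1, 2, 3], [4, 5, 6]]
nested[0] = nested[1]       # both positions now refer to the same inner list
nested[0][0] = 99           # so a change made through one position shows up in the other
print(nested)               # [[99, 5, 6], [99, 5, 6]]

independent = [[1, 2, 3], [4, 5, 6]]
independent[0] = independent[1][:]   # a full slice builds a new list with the same values
independent[0][0] = 99
print(independent)          # [[99, 5, 6], [4, 5, 6]]
```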


We can get the **length** of a list (or any other data structure) as follows. Note that the command refers to the outermost level of a list of lists.

In [58]:
print(X)
print()
print(len(X))

[[[4, 5, 6], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

2


In [59]:
print(len(X[0]))

2


In [60]:
print(len(X[0][0]))

3
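
The same ``len`` function works on strings and on the tuples and dictionaries introduced in the sections below. A brief sketch with made-up values:

```python
print(len('hello'))             # 5: characters in a string
print(len((1, 2, 3, 'dog')))    # 4: elements in a tuple
print(len({'a': 1, 'b': 2}))    # 2: key-value pairs in a dictionary
print(len([]))                  # 0: an empty list
```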


### Tuples
A tuple is a data structure in Python that is similar to a list, but enclosed within parentheses ``()`` rather than brackets. Indexing and slicing still use brackets, though.

In [61]:
T = (1, 2, 3, 'dog')
print(T)
print(T[0:2])

(1, 2, 3, 'dog')
(1, 2)


The main difference is that tuples are **immutable**, meaning the elements cannot be changed once the tuple has been created. This can be useful for storing values that should never change and that you don't want to accidentally overwrite.

In [62]:
# # this should cause an error, can't overwrite a tuple element
# T[0] = 10
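
The error can be observed without halting the notebook by catching it. This is a sketch only: ``try``/``except`` has not been covered yet, and `error_message` is a hypothetical name.

```python
T = (1, 2, 3, 'dog')

error_message = None
try:
    T[0] = 10                      # tuples do not allow item assignment
except TypeError as err:
    error_message = str(err)

print('TypeError:', error_message)
print(T)                           # the tuple is unchanged: (1, 2, 3, 'dog')
```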

However, note that the entire tuple can be overwritten using a new value. It's just individual elements that cannot be altered.

In [63]:
# can overwrite tuple with another whole tuple
T = (1, 2, 3, 'dog')
T = (5, 8, 'cat')
print(T)

# or with a non-tuple
T = (1, 2, 3, 'dog')
T = 5  
print(T)

(5, 8, 'cat')
5


### Dictionaries
Another very useful data structure in Python is the dictionary, enclosed within curly brackets ``{}``. A dictionary consists of a collection of **key-value pairs**, separated by commas. For each pair, the key and value are separated by a colon. The value can be any variable type or data structure. The key must be an immutable (more precisely, hashable) type such as a string, a number, or a tuple; a list or another dictionary cannot be used as a key. An example will make this more clear.

In [64]:
student1 = {'first_name': 'Jamal', 'last_name': 'Doe', 'age': 23, 'department': 'ESE'}
print(student1)
print(student1.keys())
print(student1.values())

{'first_name': 'Jamal', 'last_name': 'Doe', 'age': 23, 'department': 'ESE'}
dict_keys(['first_name', 'last_name', 'age', 'department'])
dict_values(['Jamal', 'Doe', 23, 'ESE'])
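
One caveat on keys: Python requires dictionary keys to be hashable, which in practice means immutable types such as strings, numbers, and tuples. A minimal sketch, using made-up site data and the hypothetical names `sites` and `bad`:

```python
# hypothetical example data: a (latitude, longitude) tuple, an int, and a str as keys
sites = {(35.9, -79.05): 'Chapel Hill', 42: 'an int key', 'name': 'a str key'}
print(sites[(35.9, -79.05)])   # Chapel Hill

try:
    bad = {[35.9, -79.05]: 'Chapel Hill'}   # a list is mutable, so it cannot be a key
except TypeError as err:
    print('TypeError:', err)
```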


We can access and change the value associated with any key as follows:

In [65]:
print(student1['age'])
student1['age'] += 1
print(student1['age'])

23
24
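
Asking for a key that does not exist raises a ``KeyError``. The sketch below shows two common ways to guard against that, the ``in`` operator and the dictionary's ``get`` method; `student` is a shortened, hypothetical copy of `student1`, so the notebook's own variable is not modified.

```python
student = {'first_name': 'Jamal', 'age': 23}   # shortened stand-in for student1

print('email' in student)            # False: "in" tests whether a key is present
print(student.get('email'))          # None: get() returns None instead of raising
print(student.get('email', 'n/a'))   # n/a: an optional default can be supplied

try:
    student['email']                 # bracket access to a missing key fails
except KeyError as err:
    print('KeyError:', err)
```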


We can also add new key-value pairs after declaration:

In [66]:
student1['email'] = 'jamal.doe@unc.edu'
print(student1)

{'first_name': 'Jamal', 'last_name': 'Doe', 'age': 24, 'department': 'ESE', 'email': 'jamal.doe@unc.edu'}


It can sometimes be helpful to create a list out of either the keys or values:

In [67]:
print( list(student1.keys()) )
print( list(student1.values()) )

['first_name', 'last_name', 'age', 'department', 'email']
['Jamal', 'Doe', 24, 'ESE', 'jamal.doe@unc.edu']


Dictionaries can be stored in lists:

In [68]:
student2 = {'first_name': 'Jane', 
            'last_name': 'Doer', 
            'age': 26, 
            'department': 'EPI', 
            'email':'jane.doer@unc.edu'}
students = [student1, student2]
print(students)

[{'first_name': 'Jamal', 'last_name': 'Doe', 'age': 24, 'department': 'ESE', 'email': 'jamal.doe@unc.edu'}, {'first_name': 'Jane', 'last_name': 'Doer', 'age': 26, 'department': 'EPI', 'email': 'jane.doer@unc.edu'}]


And vice versa

In [69]:
student1['course_load'] = ['ENVR 890-001', 'ENVR 755', 'SPGH 600']
print(student1)

{'first_name': 'Jamal', 'last_name': 'Doe', 'age': 24, 'department': 'ESE', 'email': 'jamal.doe@unc.edu', 'course_load': ['ENVR 890-001', 'ENVR 755', 'SPGH 600']}


And we can even create a dictionary with dictionary values:

In [70]:
PID_dict = {'0000-00001': student1, '0000-00002': student2}
print(PID_dict)

{'0000-00001': {'first_name': 'Jamal', 'last_name': 'Doe', 'age': 24, 'department': 'ESE', 'email': 'jamal.doe@unc.edu', 'course_load': ['ENVR 890-001', 'ENVR 755', 'SPGH 600']}, '0000-00002': {'first_name': 'Jane', 'last_name': 'Doer', 'age': 26, 'department': 'EPI', 'email': 'jane.doer@unc.edu'}}


In [71]:
print( PID_dict['0000-00001'])

{'first_name': 'Jamal', 'last_name': 'Doe', 'age': 24, 'department': 'ESE', 'email': 'jamal.doe@unc.edu', 'course_load': ['ENVR 890-001', 'ENVR 755', 'SPGH 600']}


In [72]:
print(PID_dict['0000-00001']['email'])

jamal.doe@unc.edu
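
The same chaining extends to a list stored inside a nested dictionary: each pair of brackets steps one level further in. The sketch below rebuilds a trimmed-down version of the structures above so that it is self-contained; `first_course` and `last_two` are hypothetical names.

```python
# trimmed-down versions of student1 and PID_dict from the cells above
student1 = {'first_name': 'Jamal',
            'course_load': ['ENVR 890-001', 'ENVR 755', 'SPGH 600']}
PID_dict = {'0000-00001': student1}

# dictionary key -> dictionary key -> list index
first_course = PID_dict['0000-00001']['course_load'][0]
print(first_course)            # ENVR 890-001

# slices work at the last step too
last_two = PID_dict['0000-00001']['course_load'][-2:]
print(last_two)                # ['ENVR 755', 'SPGH 600']
```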


### In-class exercises
Create and print the following objects, each in their own code cell:
1. "l1": a list with elements 1, 4, and -2.3
1. "l2": a list with elements "cat", "dog", "fish", "cat", "dog", "fish", "cat", "dog", "fish". Try to do this without typing everything three times (hint: ``*`` operator).
1. "l3": a list with the integers from -2 to 8. Try to do this without typing everything (hint: ``list/range``).
1. "i1": an integer with the length of l3
1. "l4": a list containing the first two elements of l1, the second, third, and fourth element of l2, and the last three elements of l3. Try to do this without typing everything (hint: retrieve values with indexes/slices, and combine them with the ``+`` operator).
1. "t1": a tuple with a single element, "0"
1. "t2": a tuple with the elements 5, 1, -3, -7, -11. Try to do this without typing everything (hint: ``tuple/range``).
1. "d1": a dictionary with two keys, "Fact" and "Fiction". The values for these keys are True and False, respectively.
1. "d2": a dictionary with five keys, "l1", "l2", "t1", "t2", "d1". The value for each key is the corresponding object.