# Python data types and variables

**Author**: Andrea Ballatore (Birkbeck, University of London)

**Abstract**: Learn about basic data types and how to create and use variables.

## Setup
Run this cell to check that your environment is set up correctly. It should print 'env ok'; any warnings can be ignored.

In [1]:
# Test geospatial libraries
# check environment
import os
# use .get() so that a missing variable gives None instead of a KeyError
conda_env = os.environ.get('CONDA_DEFAULT_ENV')
print("Conda env:", conda_env)
if conda_env != 'geoprogv1':
    raise Exception("Set the environment 'geoprogv1' on Anaconda. Current environment: " + str(conda_env))

# spatial libraries 
import fiona as fi
import geopandas
import pandas as pd
import pysal as sal

print('env ok')

Conda env: geoprogv1
env ok


----
## Data types

Python supports many data types; these are the most common ones. You can print a value with `print()`. Execute each cell to see the results.

### Integers

In [3]:
print(5)
print(22)
print(-384572)

5
22
-384572


### Floating point numbers

In [4]:
print(0.5)
print(.5)
print(12.34)
print(-23.1)

0.5
0.5
12.34
-23.1


Numbers can be manipulated as in a calculator with the operators `+`, `-`, `*` and `/`. Parentheses are used to specify the priority of the operations. `**` means _power_ ($x^y$).

In [5]:
print(3*10)
print(10+2*5)
print((10+2)*5)
print(10/3)
print(2*5)
print(2**5)


30
20
60
3.3333333333333335
10
32
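
The `/` operator always returns a floating point number, even when the division is exact. As a small complementary sketch (the operators `//` and `%` are not covered above, and the variable names are only illustrative), integer division and remainder look like this:

```python
# "/" always produces a float, even for an exact division
exact_division = 10 / 2
print(exact_division)   # 5.0

# "//" divides and rounds down to a whole number
whole_part = 10 // 3
print(whole_part)       # 3

# "%" gives the remainder of the division
remainder = 10 % 3
print(remainder)        # 1

# "**" is evaluated before "*": this is read as 2 * (3 ** 2)
power_first = 2 * 3 ** 2
print(power_first)      # 18
```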


### Strings
A **string** is a sequence of characters, delimited by double or single quotes:

In [2]:
print('This is a string')
print("This is a string")
print("London")
print("Geographic Data Science is fun")
print("A string can contain any character, including %&^}{?><")
print("Other languages can be specified too: caffè français حبيبي_(توضيح)")

This is a string
This is a string
London
Geographic Data Science is fun
A string can contain any character, including %&^}{?><
Other languages can be specified too: caffè français حبيبي_(توضيح)


- Multi-line strings can be specified with three double quotes `"""`.
- Internally, the string will have a `\n` symbol to delimit the end of a line:

In [8]:
"""This is 
a multi-line string.
It can have multiple 
lines."""

'This is \na multi-line string.\nIt can have multiple \nlines.'
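
To see how `\n` works, it may help to compare what `print()` shows with the way the string is stored. The following is a minimal sketch; the variable names `text` and `multi_line` are illustrative:

```python
# the same text written on one line with an explicit \n symbol
text = 'This is\na multi-line string.'

# print() interprets \n as a line break
print(text)

# repr() shows the string as Python stores it, with \n visible
print(repr(text))

# a triple-quoted string with a real line break is the same value
multi_line = """This is
a multi-line string."""
print(text == multi_line)   # True
```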

### Booleans 
A **boolean** can only take two values: `True` or `False`. Booleans can be combined with `and`, `or` and `not` (this is called Boolean algebra and will be useful later):

<img src="img/algebra1.png" width=400/>

In [9]:
print(True)
print(False)
print(True and False)
print(True or False)
print(not True)
print(not False)

True
False
False
True
False
True
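
When several Boolean operators appear in one expression, Python evaluates `not` first, then `and`, then `or`. As with numbers, parentheses change the order. A short sketch, with illustrative variable names:

```python
# read as: True or (False and False)
without_parentheses = True or False and False
print(without_parentheses)   # True

# parentheses force the "or" to be evaluated first
with_parentheses = (True or False) and False
print(with_parentheses)      # False

# read as: (not True) or True
negation_first = not True or True
print(negation_first)        # True

# here the negation applies to the whole parenthesis
negation_last = not (True or True)
print(negation_last)         # False
```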


### Comments 
In Python, you can specify **comments** that are useful to document the code, but will not be executed.

In [12]:
# This is a comment and will not be printed.
print('This is printed')
# Comments are extremely useful and you should use them 
# to explain what you are doing in the code.
print('This is printed too')

# Usually comments are written BEFORE an instruction, not after:
# Divide numbers:
print(4/3)

This is printed
This is printed too
1.3333333333333333


## Variables

Variables are a core concept in all programming languages. A variable is a symbol referring to a value, and it can be thought of as a box with a label (name) and content (value). `x` and `y` are variable names, while `1` and `2` are their content:

<img src="img/variable1.png" alt="Variable" width="400"/>

The value can change over time. A value can be **assigned** to a variable with `=`:

In [17]:
x = 1
y = 2
# nothing is printed. Now x and y have values, 
# which we can inspect with print:
print(x)
print(y)
# we can also print two variables with a single print:
print(x, y)
# we can assign the result of a calculation on variables to a new variable:
z = x + y
# let's print z with some text to make the output more readable:
print('value of z:',z)

1
2
1 2
value of z: 3


Now variables `x`, `y`, and `z` are in memory and can be used again. Let's change the value of variables:


In [18]:
print('value of x:',x)
x = 6  # this assignment overwrites the content of x with a new value
print('new value of x:',x)

# Note that printing a number does not create a variable!
print(6) # nothing changes in memory
x = 6 # a variable is created in memory
print(x)

# note that this instruction does not save the result anywhere!
x + 1

# if we want to save the result, we have to use a variable
my_new_variable = x + 1

# or we can just print the result if we do not need it later
print(x + 1)

# note that you can assign a variable to itself. This is a very common operation:
print(x)
x = x + 1 # this increases the current value by one
print(x)
x += 1 # the same operation in an abbreviated form
print(x)

value of x: 1
new value of x: 6
6
6
7
6
7
8
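
The abbreviated form `+=` has counterparts for the other arithmetic operators. This is a small sketch with an illustrative variable `population`:

```python
population = 100
population += 20    # same as population = population + 20
print(population)   # 120

population -= 50    # same as population = population - 50
print(population)   # 70

population *= 2     # same as population = population * 2
print(population)   # 140

population /= 4     # division always produces a float
print(population)   # 35.0
```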


It is important to understand how variables change in a programme. Printing variables is essential to inspect their value to make sure it is the expected one:


In [20]:
x = 1
print(x)
x = 10
print(x)
x = 20
print(x)

1
10
20


### Variable typing

You can create variables with any value. Python understands automatically what the correct type is (**dynamic typing**):


In [23]:
# some strings
city = 'London'
university = 'Birkbeck, University of London'

# the function type allows you to see the current type of a variable:
print(type(10))
print(type(10.0))
print(type('Monday'))
print(type(True))

# currency
pound_to_euro_exchange_rate = 1.1
my_pounds = 2000
my_euros = pound_to_euro_exchange_rate * my_pounds

# if we write the variable name at the end of a cell, 
# the notebook will show its value without having to say "print"
my_euros

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


2200.0
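
Dynamic typing also means that the type of a variable can change when a new value is assigned to it. A minimal sketch, using a hypothetical variable `measurement`:

```python
measurement = 25            # an integer
print(type(measurement))    # <class 'int'>

measurement = 25.5          # the same name now holds a float
print(type(measurement))    # <class 'float'>

measurement = 'unknown'     # and now a string
print(type(measurement))    # <class 'str'>

# mixing an integer and a float in a calculation produces a float
total = 2 + 0.5
print(total, type(total))   # 2.5 <class 'float'>
```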

### Naming variables

- Giving good names to variables is very important to write clear and maintainable code.
- For detailed discussion of Python naming conventions, see [this article](https://medium.com/@dasagrivamanu/python-naming-conventions-the-10-points-you-should-know-149a9aa9f8c7).
- Python variable names **should**:
  - be lowercase
  - not contain spaces (use underscores `_` between words instead)
  - be expressive and meaningful, but not too long
  - not start with a number
  - avoid overly generic terms, unless the context is very clear (`result`, `input`, `output`, `value`, etc.)
  - not be reserved keywords of the language (`for`, `def`, `class`, etc.), and avoid the names of built-in types and functions (`int`, `float`, `list`, etc.)
- Examples of **good variable names**:
  - city_population     
  - building_height_ft    
  - path_length_m     
  - district_name     
  - highest_point_coords     
  - reference_system_code
  - district_crime_rate_2005      
  - co2_density
  - county_code 
  - unemployment_rate_2017
- These are examples of possible, but bad variable names:
  - total_unemployment_rate_of_east_of_england_in_june_2017
  - Var1   
  - myvar
  - string   
  - myint    
  - CITYPOPULATION 
  - ROAD_width    
  - DSCUR8      
  - 9sdd (not just bad but invalid: a name that starts with a digit causes a `SyntaxError`)
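
Some of these rules can be checked with Python itself. The sketch below uses the standard `keyword` module and the string method `isidentifier()`; it only tests whether a name is allowed, not whether it is a good name:

```python
import keyword

# reserved keywords cannot be used as variable names at all
print(keyword.iskeyword('for'))     # True
print(keyword.iskeyword('def'))     # True

# int is a built-in name, not a keyword: using it as a variable name
# is allowed, but it hides the built-in and should be avoided
print(keyword.iskeyword('int'))     # False

# isidentifier() checks whether a string is a syntactically valid name
valid_name = 'city_population'.isidentifier()
starts_with_digit = '9sdd'.isidentifier()
contains_space = 'city population'.isidentifier()
print(valid_name)          # True
print(starts_with_digit)   # False
print(contains_space)      # False
```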

### Comparing variables

To understand how variables work in detail, it is useful to compare them. While `=` is used to assign values, `==` is used to compare two values. The comparison returns a boolean (`True` or `False`):

In [32]:
city = 'London'
print(city=='London')
print(city=='Paris')

year = 1950
print(year==1950)
print(year==1990)

True
False
True
False


It is important to understand the difference between **variable names** and **strings**:

In [33]:
year = 1950 # a variable called year
"year" # just a string
# False: year is a variable name, 'year' is just a string:
print(year=='year') # False
year='year' # assign string 'year' to variable year
print(year=='year') # Now this is True 

False
True


You can compare strings and numbers with several operators, such as `>=` (greater than or equal to) and `>` (greater than):

| Symbol | Operator |
|----|---|
| == | is equal to |
| !=  | is not equal to |
| < | less than |
| > | greater than |
| <=  | less than or equal to |
| >=  | greater than or equal to |

In [25]:
years = 10
print(years >= 10) # True
print(years > 10) # False
print(years > 20) # False
print(years < 20) # True
print(years != 10) # False, it is equal to 10

True
False
False
True
False


In [27]:
# while it is possible to compare integers and floating points,
# it is DANGEROUS:
print('comparing int and float')
print(10==10) # True, safe
print(10==10.0) # True here, but unsafe in general: a float might have a tiny decimal part that makes the comparison false
print(10==10.00000001) # False! The small decimal part is stored in the float.
print(10==10.000000000000000000000001) # True! The decimal part is too small for a float to store, so the value is exactly 10.0.
print(10==round(10.0)) # True and safe. We transform the float into an integer before comparing it

comparing int and float
True
True
False
True
True
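
The same problem appears when comparing two floats that result from calculations. One common way to handle it is to compare within a small tolerance using `math.isclose` from the standard library. A minimal sketch (the variable `total` is illustrative):

```python
import math

total = 0.1 + 0.2
print(total)                        # 0.30000000000000004

exact_match = (total == 0.3)
print(exact_match)                  # False, because of the rounding error

close_match = math.isclose(total, 0.3)
print(close_match)                  # True: equal within a small tolerance

rounded_match = (round(total, 2) == 0.3)
print(rounded_match)                # True: rounded to 2 decimals first
```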


In [26]:
# we can also compare strings according to the alphabetical order:
print('strings')
city = "London"
print(city < 'Paris') # True (alphabetical order)
print(city < 'Lagos') # False (alphabetical order)

strings
True
False
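
Strictly speaking, strings are compared character by character using the numeric code of each character, so all uppercase letters come before all lowercase ones. The sketch below illustrates this with a hypothetical lowercase value; converting both sides to lowercase is one way to get a case-insensitive comparison:

```python
city = 'london'   # note the lowercase initial

# False: lowercase 'l' comes after uppercase 'P'
case_sensitive = city < 'Paris'
print(case_sensitive)

# ord() shows the numeric codes used for the comparison
print(ord('P'), ord('l'))   # 80 108

# True: both sides are converted to lowercase before comparing
case_insensitive = city.lower() < 'Paris'.lower()
print(case_insensitive)
```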


### Data type errors

Data types can be confusing and can lead to errors. Take a look at strings and numbers:

In [32]:
print(x)
x = 10
print(x==10) # True: two integers
print(x=='10') # False: a string is not a number

20
True
False
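
If a number is stored as a string, it can be converted explicitly before the comparison. This is a small sketch using the built-in functions `int()`, `float()` and `str()`; the variable names are illustrative:

```python
text_value = '10'

# convert the string to an integer before comparing
number = int(text_value)
print(number == 10)            # True

# or convert the integer to a string
print(str(10) == text_value)   # True

# float() converts a string with decimals
decimal_number = float('2.5')
print(decimal_number + 1)      # 3.5
```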


In [35]:
y = 'london'
print(y)
print(y=='london') # True: identical
print(y=="london") # True: identical
print(y==" london") # False: spaces matter!
print(y==" london ") # False: spaces matter!
print(y=="London") # False: case matters!
print(y=="LONDON") # False: case matters!

# variable names are also case-sensitive
Y = 'Lagos'
print(y, Y)

london
True
True
False
False
False
False
london Lagos
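
Because spaces and case matter, text is often cleaned before it is compared. A minimal sketch, assuming a hypothetical value `raw_city` with extra spaces and an uppercase initial:

```python
raw_city = ' London '

# strip() removes spaces at both ends, lower() converts to lowercase
clean_city = raw_city.strip().lower()
print(clean_city)                # london

print(raw_city == 'london')      # False: spaces and case differ
print(clean_city == 'london')    # True
```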


### How to use None

- In Python, `None` is equivalent to `null` in many other languages. It means that the value of the variable is **unknown** or missing.
- It is important to distinguish between `None`, `0` and the empty string (`''`):

In [39]:
population = None # population unknown
print(population is None) # True
print(population == None) # True, but 'is None' is the preferred form
print(population is not None) # False

population = 0 # population zero
print(population==0) # True
print(population is None) # False
print(population is not None) # True
print(None == 0) # False

True
True
False
True
False
True
False


In [41]:
# strings
print('' == None) # False, empty string is different to None

# Temperature is 0 celsius
temperature = 0
# Temperature is unknown
temperature = None
# Note that these statements mean very different things!

False
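
One practical consequence is that `None` cannot be used in calculations, while `0` can. The sketch below uses `if` and `try`/`except`, which are not introduced in this notebook, so treat it as a preview rather than required material:

```python
temperature = None

# checking for None before using the value
if temperature is None:
    message = 'temperature unknown'
else:
    message = 'temperature: ' + str(temperature)
print(message)   # temperature unknown

# arithmetic with None raises a TypeError
try:
    print(temperature + 1)
    calculation_failed = False
except TypeError as error:
    calculation_failed = True
    print('Cannot calculate:', error)

# arithmetic with 0 works as usual
temperature = 0
print(temperature + 1)   # 1
```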


### Manipulating strings

In Python, some symbols mean different things depending on context. 
`+` means addition when it is used between two numbers, but it can also be used to *concatenate* strings:

In [42]:
a = 'Greater'
b = 'London'
c = a + b # concatenate strings. The result is a new string
print(c)
print(c == 'GreaterLondon')

c = a + ' ' + b # concatenate strings adding a space in between
print(c)
print(c == 'Greater London')

# get individual characters.
# Note that the index starts from 0 and not from 1:
print(b[0]) # first character of b. 
print(b[1]) # second character of b
print(b[2]) # third character of b

GreaterLondon
True
Greater London
True
L
o
n
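
A few other common string operations follow the same pattern. This is a brief sketch; negative indices, slices and the methods `upper()` and `lower()` are standard Python but are not covered above:

```python
b = 'London'

print(len(b))       # 6: number of characters
print(b[-1])        # n: negative indices count from the end
print(b[0:3])       # Lon: characters from index 0 up to (not including) 3
print(b.upper())    # LONDON
print(b.lower())    # london
print('Lon' in b)   # True: check whether a string contains another
```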


## Data structures: lists

Lists are important data structures in Python. 
Unlike a simple variable, a list can host a number of values that are ordered. 
Each element has an **index** that starts at 0 (and not 1!) and then increments. 
For example, this list `colors` contains three strings:

<img src="img/list1.png" alt="Variable" width="400"/>

Elements are ordered, and can be accessed with the operator `[]`: `list[number]`.
If the index is out of bounds, Python will raise an error:



In [44]:
colors = ['red','blue','green']
print(colors) # print whole list

print(colors[0]) # print first element
print(colors[1]) # print second element
print(colors[2]) # print third and last element

# this would raise an error because index 5 is out of bounds (the last valid index is 2):
#print(colors[5])

# see the length of the list (len):
print(len(colors))

['red', 'blue', 'green']
red
blue
green
3
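
Indices can also be used to read from the end of a list, to take a slice, or to replace an element. A minimal sketch on the same `colors` list:

```python
colors = ['red', 'blue', 'green']

print(colors[-1])         # green: negative indices count from the end
print(colors[0:2])        # ['red', 'blue']: elements at index 0 and 1

# check whether a value is in the list
print('blue' in colors)   # True
print('pink' in colors)   # False

# replace the second element
colors[1] = 'yellow'
print(colors)             # ['red', 'yellow', 'green']
```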


### List operations
Lists can contain any data type and support useful operations. Let us create lists of floating points and integers:

In [47]:
some_floats = [1.23, -0.444, 5.235, -4.69]
# get the maximum number
print(max(some_floats))
# get the minimum number
print(min(some_floats))

5.235
-4.69


In [48]:
# add an element at the end
some_floats.append(-10.34)
# Note the syntax: object <dot> method: list.append()
# now the list has 5 elements!
print(len(some_floats))

# invert the order of the list with reverse:
some_floats.reverse()
print(some_floats)

# sort list
ordered_floats = sorted(some_floats)
print(ordered_floats)

5
[-10.34, -4.69, 5.235, -0.444, 1.23]
[-10.34, -4.69, -0.444, 1.23, 5.235]
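
Note that `reverse()` changes the list itself, while `sorted()` returns a new list and leaves the original untouched. The sketch below shows the difference with a hypothetical list `heights`, together with the in-place method `sort()`:

```python
heights = [12.5, 3.0, 8.25]

# sorted() returns a new list; the original is unchanged
sorted_heights = sorted(heights)
print(heights)          # [12.5, 3.0, 8.25]
print(sorted_heights)   # [3.0, 8.25, 12.5]

# sort() changes the list itself
heights.sort()
print(heights)          # [3.0, 8.25, 12.5]

# sort from largest to smallest
descending_heights = sorted(heights, reverse=True)
print(descending_heights)   # [12.5, 8.25, 3.0]

# sum of all elements
print(sum(heights))     # 23.75
```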


### The `random` package
In data science, we often want to **sample or randomise a list**. For this purpose, you can use the `random` module, which is part of Python's standard library.
Note the `import` command, which gives access to functionality that is not loaded by default.
The syntax `package.function(...)` is widely used in Python: 


In [56]:
import random 

dice = [1,2,3,4,5,6] # simulate a dice

# Select one value from the list randomly.
# sample returns a list, here with a single element.
# this simulates throwing a dice: 
rand_number = random.sample(dice, 1)
print(rand_number)

# shuffle the content of the list, changing its order randomly
random.shuffle(dice)
print(dice)

# Run this cell again to see randomness in action!

[4]
[6, 2, 5, 1, 4, 3]
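
Two other functions of the same module are often useful: `random.choice` returns a single value rather than a list, and `random.seed` makes the results repeatable. A minimal sketch; the seed value 42 is arbitrary:

```python
import random

dice = [1, 2, 3, 4, 5, 6]

# choice returns one element of the list, not a list
single_throw = random.choice(dice)
print(single_throw)

# setting the same seed before each call gives the same "random" result
random.seed(42)
first_throw = random.choice(dice)
random.seed(42)
second_throw = random.choice(dice)
print(first_throw == second_throw)   # True
```

Fixing the seed is mainly useful when you want other people to reproduce your results exactly.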


----
## Errors and debugging

- Most of the coding time is spent on **debugging**. While it is fairly easy to write code that runs without errors, it is a lot harder to write code that generates the expected results.
- Code can be **syntactically correct** (it runs without errors) but **semantically incorrect** (it does not produce the expected results). Syntax errors are much easier to fix than semantic errors.
- When we write code, it is important to do it **incrementally**. We write one line at a time, making sure that a line works before moving to the next one.
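
As an illustration of a semantic error, the sketch below calculates the mean of two hypothetical distances. Both versions run without errors, but only the second gives the expected result:

```python
distance_a = 10
distance_b = 20

# runs without errors, but the result is wrong:
# the division is applied to distance_b only
wrong_mean = distance_a + distance_b / 2
print(wrong_mean)     # 20.0

# parentheses make the addition happen first
correct_mean = (distance_a + distance_b) / 2
print(correct_mean)   # 15.0
```

Printing the intermediate result is what reveals the problem: Python itself reports nothing unusual.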

<img src="img/coding_cycle1.png" width=200 />

- By default, Python instructions do not give feedback. This approach is called **no news, good news**. To see the internal details of a calculation we have to use `print`. Inspecting the value of variables is often the only way to find out why a piece of code does not work.
- When in doubt, print, print, print!

<img src="img/coding_errors1.png" width=300 />

- Python generates **errors** when something does not work. Always read these errors carefully. 
- While most errors are clear, some errors might not be very useful to solve the problem at hand.

In [60]:
# this will generate an error, unless variable age is defined:
print(age)

25
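
When a variable is not defined, Python raises a `NameError`. The sketch below catches the error with `try`/`except` (not covered in this notebook) so that the message can be read without stopping the cell; the variable name is hypothetical and deliberately undefined:

```python
try:
    # this name has never been assigned a value
    print(undefined_variable_name)
    error_message = ''
except NameError as error:
    # the message tells us which name is missing
    error_message = str(error)
    print('Error caught:', error_message)
```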


End of notebook