# Weeks (1-2): Basic elements of Python

## General information
This is a [Jupyter Notebook](https://jupyter.org/). This particular notebook is designed to introduce you to a few of the basic concepts of programming in Python. The contents of this document are divided into cells, which can contain Markdown-formatted text, Python code, or raw text. You can execute a snippet of code in a cell by pressing **Shift-Enter**. Try this out with the examples below.

You can follow the lesson either by opening Jupyter Notebook on your computer or through Binder.<br/>
<a href="https://mybinder.org/v2/gh/RUG-Elective/Notebooks_official/main?labpath=Week_1_2%2FSection%25201.ipynb"><img alt="Binder badge" src="https://img.shields.io/badge/launch-binder-red.svg" style="vertical-align:text-bottom"></a>

In this lesson we will learn about good programming practices, how data can be stored in Python lists, some useful ways of using and modifying Python lists, and how to make different data types work together in Python.

When learning a new programming language, it is customary to first learn how to print 'Hello World!'. While a bit quirky, it is a useful first step to know how to send input to the program and where to see the output. In Python, you can use the built-in ``print()`` function to print the greeting ``"Hello World!"``

In [1]:
print('Hello World!')

Hello World!


# 1.Variables

## Strings

##### A string is a sequence of letters, numbers, and punctuation marks, commonly known as text.
##### In Python you can create a string by typing characters between single or double quotation marks.

In [40]:
city = 'Almere'
province = 'Flevoland'
print(city, province)

Almere Flevoland


Concatenate strings together using the "+" sign

In [41]:
print(city + province)

AlmereFlevoland


In [42]:
print(city + ',' + province)

Almere,Flevoland
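
The `+` operator joins strings only with other strings. As a minimal sketch (the variable names `year` and `label` are illustrative and not part of the lesson), a number has to be converted with the built-in `str()` function before it can be concatenated:

```python
city = 'Almere'
year = 2024  # illustrative value

# str() converts the integer to text so that "+" can join the pieces
label = city + ' ' + str(year)
print(label)

# Without str(), the line below would raise a TypeError:
# label = city + ' ' + year
```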


## Numbers

Python can handle several types of numbers, but the two most common are:

- **int**, which represents integer values like 100, and
- **float**, which represents numbers that have a fraction part, like 0.5

In [43]:
population = 881549
latitude = 37.7739
longitude = -121.5687

In [44]:
print(type(population))

<class 'int'>


In [45]:
print(type(latitude))

<class 'float'>


In [46]:
elevation_feet = 934
elevation_meters = elevation_feet * 0.3048
print(elevation_meters)

284.6832


In [47]:
area_sqmi = 46.89
density = population / area_sqmi
print(density)

18800.362550650458


In [48]:
#Rounding to two decimals
round(18800.362550650458,2)

18800.36

In [49]:
#Adding thousands separator
f'{18800.36:,}'   

'18,800.36'

In [50]:
print(type(18,800.36))

TypeError: type() takes 1 or 3 arguments
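
The error above arises because Python reads `18,800.36` as two separate arguments, `18` and `800.36`; a comma is not a thousands separator in Python code. The sketch below illustrates the distinction between a number and its formatted text. The names `density_text` and `density_number` are assumptions made for this example.

```python
density = 18800.362550650458

# Formatting produces text (a str), which is meant for display only
density_text = f'{density:,.2f}'
print(density_text, type(density_text))

# To calculate with the value again, remove the separators and convert back
density_number = float(density_text.replace(',', ''))
print(density_number, type(density_number))
```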

## Exercise
We have a variable named `distance_km` below with the value `5940`, indicating the straight-line distance between Groningen and New York in kilometers. Create another variable called `distance_mi` and store the distance value in miles.

- Hint1: 1 mile = 1.60934 kilometers

Add the code in the cell below and run it. The output should be approximately 3690.95.

In [None]:
distance_km = 5940
# Remove this line and add code here

## Selecting "good" variable names

To illustrate the point, consider a few "not-so-good" variable names below.

In [None]:
p = "101533"

or

In [None]:
p_id = "101533"

Any idea what these variables are for? Of course not, the variables ``p`` and ``p_id`` are too short and cannot communicate what they should be used for in the code. You might think you know what they are for now, but imagine not looking at the code for a few months. Would you know then? What about if you share the code with a friend? Would they know? Probably not.

Let's look at another example.

In [None]:
dutchmeteorologicalinstituteobservationstationidentificationnumber = "101533"

OK, so now we have another issue. The variable name potentially provides more information about what the variable represents (the identification number of a KNMI observation station), but it does so in a format that is not easy to read, nor something you're likely to want to type more than once. The previous example was too short, and now we have a variable name that is too long (and hard to read as a result).

### What a "good" variable name is

A good variable name should:

1. Be clear and concise. 

2. Be written in English. A general coding practice is to write code with variable names in English, as that is the most likely common language between programmers.

3. Not contain special characters. Python supports use of special characters by way of various encoding options that can be given in a program. That said, it is better to avoid variable names containing non-ASCII characters, such as ``coördinaten``, because encoding issues can arise in some cases. Better to stick to the [standard printable ASCII character set](https://en.wikipedia.org/wiki/ASCII#Printable_characters) to be safe.

4. Not conflict with any [Python keywords](https://www.pythonforbeginners.com/basics/keywords-in-python), such as ``for``, ``True``, ``False``, ``and``, ``if``, or ``else``. These are reserved for special operations in Python and cannot be used as variable names. See the full list of reserved words here: [Python keywords list](https://www.w3schools.in/python/keywords)
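
The rules in the list above can be checked from Python itself. The following is a small sketch using the standard-library `keyword` module and the string method `isidentifier()`; the candidate names are only examples.

```python
import keyword

# keyword.iskeyword() returns True for reserved words
print(keyword.iskeyword('for'))           # a reserved word
print(keyword.iskeyword('temp_celsius'))  # an ordinary name

# str.isidentifier() checks whether a string is a syntactically valid name
print('temp_celsius'.isidentifier())
print('2nd_station'.isidentifier())       # names cannot start with a digit
```

Note that `isidentifier()` only checks the syntax, so a keyword such as `for` still passes that test; both checks are needed together.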


With this in mind, let's now look at a few better options for variable names.

## Formatting "good" variable names

There are several possibilities for "good" variable name formats. Here we recommend one:

### Recommendation: user_id naming

*user_id* uses lowercase words separated by underscores ``_``. This is our suggested format as the underscores make it easy to read the variable, and don't add too much to the length of the variable name. As an example, consider the variable ``temp_celsius``. 

### Python is case sensitive!

In [None]:
user_name = 'User1'

In [None]:
print(User_name)

The code above causes an error because of the inconsistency in upper- and lowercase letters in the variable name. Let's fix this.

In [None]:
print(user_name)

# 2.Data Structures

## Tuples

A *tuple* is a sequence of objects. It can have any number of objects inside. In Python tuples are written with round brackets **()**. 

In [None]:
latitude = 37.7739
longitude = -121.5687
coordinates = (latitude, longitude)
print(coordinates)

You can access each item by its position, i.e. *index*. In programming, the counting starts from 0. So the first item has an index of 0, the second item an index of 1 and so on. The index has to be put inside square brackets **[]**.

In [None]:
y = coordinates[0]
x = coordinates[1]
print(x, y)

## Lists

A **list** is similar to a tuple - but with a key difference. With tuples, once created, they cannot be changed, i.e. they are immutable. But lists are mutable. You can add, delete or change elements within a list.  In Python, lists are written with square brackets **[]**

In [None]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen']
print(cities)
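
The difference between tuples and lists described above can be seen in a short sketch. The names `point` and `places` are illustrative, and the `try`/`except` construct is not covered in this lesson; it is used here only so that the example runs to the end instead of stopping at the error.

```python
point = (37.7739, -121.5687)              # tuple: immutable
places = ['San Francisco', 'Los Angeles'] # list: mutable

places[0] = 'Groningen'   # allowed, lists can be changed in place
print(places)

try:
    point[0] = 0.0        # not allowed, tuples cannot be changed
except TypeError as error:
    print('TypeError:', error)

print(point)              # the tuple is unchanged
```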

KNMI weather stations: https://www.knmi.nl/nederland-nu/weer/waarnemingen

In [None]:
#Another way of writing lists
KNMI_Stations=[
    "Lauwersoog",
    "Leeuwarden",
    "Twente",
    "De Bilt",
]

In [None]:
print(KNMI_Stations)

To access an individual value in the list we need to use an *index* value. An index value is a number that refers to a given position in the list. Let's check out some values in our lists as an example by printing them out:

In [None]:
print(cities[1])

To get the value for the first item in the list, we must use index `0`

In [None]:
print(cities[0])

To find the value at the end of the list, we can print the value at index `-1`. To go further up the list in reverse, we can simply use larger negative numbers, such as index `-3`. Let's print out the values at these indices below.

In [None]:
print(KNMI_Stations[-1])
print(KNMI_Stations[-3])

Of course, you still need to keep the index values within their ranges. What happens if you check the value at index `-5`?

You can call the `len()` function on any sequence or collection (such as a list, tuple, string, or dictionary) and it will return the number of items in it.

In [None]:
print(len(cities))

### Modifying list values

Another nice feature of lists is that they are *mutable*, meaning that the values in a list that has been defined can be modified. For example, we can replace the value at index `3` of the `cities` list.

In [None]:
cities[3]="Staveden"
print(cities)

We can also check the type of the cities list using the type() function.

In [None]:
print(type(cities))

We can add items to the list using the append() method

In [None]:
cities.append('Boston')
print(cities)

In [None]:
print(len(cities))

### Data types in lists

Lists can also store more than one type of data. Let's consider a list in which we define the land use profile for a neighbourhood in Groningen. Before we create this list, let's see what a single entry would look like:

In [4]:
municipality = "Groningen"

In [5]:
neigh_code = "BU00140001"

In [6]:
neigh_name = "Binnenstad-Zuid"

In [7]:
total_area_ha = 59

In [8]:
built_up_area_ha = 51

In [45]:
recreation_area_ha = 3.1

Now that we have defined the variables for the Binnenstad-Zuid neighbourhood, we can create the list.

In [46]:
land_use = [municipality,neigh_code,neigh_name,total_area_ha,built_up_area_ha,recreation_area_ha]
print(land_use)

['Groningen', 'BU00140001', 'Binnenstad-Zuid', 59, 51, 3.1]


Here we have one list with 3 different types of data in it. We can confirm this using the type() function.

In [11]:
print(land_use(type))  # This mistake can happen! Let's do it again the right way.

TypeError: 'list' object is not callable

In [12]:
print(type(land_use))

<class 'list'>


Let's check the types of values at indices 1 and 5

In [13]:
print(type(land_use[1]))

<class 'str'>


In [14]:
print(type(land_use[5]))

<class 'float'>


### Adding and removing values from lists

How can we delete a value from the list? First, let's print it.

In [15]:
print(land_use)

['Groningen', 'BU00140001', 'Binnenstad-Zuid', 59, 51, 3.1]


In [16]:
del(land_use[2])
print(land_use)

['Groningen', 'BU00140001', 59, 51, 3.1]
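
Besides `del`, lists offer methods for removing items. The sketch below uses the list methods `remove()` and `pop()` on a separate copy of the data (the names `example` and `last_value` are illustrative), so it does not affect the `land_use` list used in the lesson.

```python
example = ['Groningen', 'BU00140001', 'Binnenstad-Zuid', 59, 51, 3.1]

# remove() deletes the first item that is equal to the given value
example.remove('Binnenstad-Zuid')
print(example)

# pop() deletes the last item and returns it
last_value = example.pop()
print(last_value)
print(example)
```

`del` and `pop()` work by position, whereas `remove()` works by value, which is convenient when the index is not known.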


In [17]:
land_use.append('Utrecht')
print(land_use)

['Groningen', 'BU00140001', 59, 51, 3.1, 'Utrecht']


Keep in mind that `append()` is a method of lists only; other data types, such as tuples and strings, do not have it.

Remove every occurrence of a value (e.g. 'Utrecht') from a list

In [18]:
# Build a new list without 'Utrecht'; land_use itself is not modified
[item for item in land_use if item != 'Utrecht']

['Groningen', 'BU00140001', 59, 51, 3.1]

We can also find the index number of specific elements in the list. Let's try this for the value `59`.

In [19]:
land_use.index(59)

2

Print a list with a single entry that corresponds to the number of items in the `land_use` list.

In [37]:
lu_length=[len(land_use)]
print(lu_length)
print(type(lu_length))

[6]
<class 'list'>


In [39]:
lu_length2=[]
print(lu_length2)

#Join the two lists (+)
joint_list=lu_length + lu_length2
print(joint_list)

[]
[6]


## Dictionaries

In Python dictionaries are written with curly brackets **{}**. Dictionaries have *keys* and *values*. Their elements are ordered (in Python 3.7 and above), changeable, and do not allow duplicate keys. With lists, we access each element by its index, but a dictionary makes it easy to access an element by name. Keys and values are separated by a colon **:**.

In [2]:
data = {'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749) }
print(data)

{'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749)}


You can access an item of a dictionary by referring to its key name, inside square brackets.

In [3]:
print(data['city'])

San Francisco
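
Because dictionaries are changeable and keys are unique, assigning to a key either adds a new item or overwrites an existing one. The following sketch uses a separate dictionary named `record` (an illustrative name) and the dictionary method `get()`, which is not used elsewhere in this lesson.

```python
record = {'city': 'San Francisco', 'population': 881549}

record['state'] = 'California'                   # a new key adds a new item
record['population'] = record['population'] + 1  # an existing key is overwritten
print(record)

# get() returns a default value instead of raising KeyError for a missing key
print(record.get('elevation', 'unknown'))
print(len(record))
```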


In [4]:
print(type(data))

<class 'dict'>


## Exercise

From the dictionary below, how do you access the latitude and longitude values? Print the latitude and longitude of New York City by extracting them from the dictionary below.

The expected output should look like below.

(40.661, -73.944)

In [20]:
nyc_data = {'city': 'New York', 'population': 8175133, 'coordinates': (40.661,-73.944) }

In [14]:
#print(nyc_data['coordinates'])

(40.661, -73.944)


More examples here: https://medium.com/@arun.verma8007/python-data-structures-56d4211ca5a7

# 3.String Operations

In [21]:
city = 'San Francisco'
print(len(city))

13


In [22]:
print(city.split())

['San', 'Francisco']


In [23]:
print(city.upper())

SAN FRANCISCO


In [24]:
city[0]

'S'

In [25]:
city[-1]

'o'

In [26]:
city[0:3]

'San'

In [None]:
city[4:]
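
The cells above use indexing and slicing on a string. As a brief sketch of the general rule, a slice `text[start:stop]` returns the characters from index `start` up to, but not including, index `stop`; leaving out either side means "from the beginning" or "to the end". The variable name `text` is only illustrative.

```python
text = 'San Francisco'

print(text[0:3])    # characters at indices 0, 1 and 2
print(text[4:])     # from index 4 to the end
print(text[:3])     # from the beginning up to index 3 (exclusive)
print(text[-4:])    # the last four characters
```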

## Escaping characters

Certain characters are special since they are used by the Python language itself. For example, the quote character **'** is used to define a string. What do you do if your string contains a quote character?

In Python strings, the backslash **\\** is a special character, also called the **escape** character. Prefixing a special character such as a quote with a backslash makes it an ordinary character. (Hint: Prefixing a backslash with a backslash makes it ordinary too!)

It is also used for representing certain whitespace characters, \\n is a newline, \\t is a tab etc.
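
A short sketch of these escape sequences in action (the variable name `table` and the example strings are illustrative):

```python
# \n starts a new line and \t inserts a tab
table = 'city\tprovince\nAlmere\tFlevoland'
print(table)

# A doubled backslash produces a single literal backslash
folder = 'C:\\temp'
print(folder)
print(len(folder))  # the escaped backslash counts as one character
```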

Remove the # from the cell below and run it.

In [1]:
#my_string = 'It's a beautiful day!'

SyntaxError: invalid syntax (4251103408.py, line 1)

We can fix the error by escaping the single quote within the string.

In [9]:
my_string = 'It\'s a beautiful day!'
print(my_string)

It's a beautiful day!


Alternatively, you can also use double-quotes if your string contains a single-quote.

In [4]:
my_string = "It's a beautiful day!"
print(my_string)

It's a beautiful day!


What if our string contains both single and double quotes?

We can use triple-quotes! Enclosing the string in triple quotes ensures both single and double quotes are treated correctly.

In [15]:
latitude = '''37° 46' 26.2992" N'''
longitude = '''122° 25' 52.6692" W'''
print(latitude,longitude)

37° 46' 26.2992" N 122° 25' 52.6692" W


In [16]:
#What if I want to print comma-separated values?
print(latitude,longitude, sep=" , ")

37° 46' 26.2992" N , 122° 25' 52.6692" W


Backslashes pose another problem when dealing with Windows paths.

In [39]:
path ='C:\Users\ujaval'
print(path)

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape (227938026.py, line 1)

Prefixing a string with `r` makes it a raw string, which doesn't interpret the backslash as a special character.

In [19]:
path = r'C:\Users\ujaval'
print(path)

C:\Users\ujaval


## Printing Strings with the format method

A common way of creating strings from variables is to use the `format()` method and curly brackets.

In [21]:
city = 'Emmen'
population = 107856
output = 'Population of {} is: {}.'.format(city, population)
print(output)

Population of Emmen is: 107856.


You can also use the format method to control the precision of the numbers.

In [22]:
latitude = 37.7749
longitude = -122.4194

coordinates = '{:.2f},{:.2f}'.format(latitude, longitude)
print(coordinates)

37.77,-122.42


## Exercise

Use string slicing to extract and print the degrees, minutes and seconds parts of the string below. The output should be as follows:

```
37
46
26.2992
```

In [None]:
latitude = '''37° 46' 26.2992"'''

In [33]:
a="37"
b="46"
c="26.2992"
print("latitude='''{}° {}' {}'''".format(a,b,c))

latitude='''37° 46' 26.2992'''


### F-String formatting

In [37]:
temp = 18.56789876
station_name = "Zwolle"
station_id = '0010'

# 1. F-string approach (recommended way)
info_text = f"The temperature at {station_name} station (ID: {station_id}) is {temp:.2f} Celsius."

info_text

'The temperature at Zwolle station (ID: 0010) is 18.57 Celsius.'
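
F-strings accept the same format codes as the `format()` method, and whole expressions can be placed inside the curly brackets. The sketch below reuses the population and area values from earlier in the lesson; the name `summary` and the conversion factor of 2.58999 square kilometres per square mile are additions for this example.

```python
population = 881549
area_sqmi = 46.89

density = population / area_sqmi
# ":,.1f" adds a thousands separator and keeps one decimal place
summary = f'Density: {density:,.1f} people per square mile'
print(summary)

# An expression can be evaluated directly inside the curly brackets
print(f'Area in square kilometres: {area_sqmi * 2.58999:.2f}')
```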

### Revision

![all_lists.png](attachment:20178178-42d2-4a8e-92dd-192924db32fa.png)

# 4.Loops and Conditionals

## For Loops

A for loop is used for iterating over a sequence. The sequence can be a list, a tuple, a dictionary, a set, or a string.


### for loop format

`for` loops in Python have the general form below.

```python
for variable in collection:
    do things with variable
```

Let's break down the code above to see some essential aspects of `for` loops:

1. The `variable` can be any name you like other than a [reserved keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords)
2. The statement of the `for` loop must end with a `:`
3. The code that should be executed as part of the loop must be indented beneath the `for` loop statement

    - The typical indentation is 4 spaces

4. There is no additional special word needed to end the loop, you simply change the indentation back to normal.

**Hint**: `for` loops are useful to repeat some part of the code a *finite* number of times.

In [40]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

for city in cities:
    print(city)

San Francisco
Los Angeles
New York
Atlanta


### Print in the same line

In [2]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

for city in cities:
    print(city, end=", ")

San Francisco, Los Angeles, New York, Atlanta, 

To iterate over a dictionary, you can call the `items()` method on it which returns a tuple of key and value for each item.

In [41]:
data = {'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749) }

for x, y in data.items():
    print(x, y)

city San Francisco
population 881549
coordinates (-122.4194, 37.7749)


In [12]:
us_cities = [
    "Detroit",
    "Chicago",
    "Denver",
    "Boston",
    "Portland",
    "San Francisco",
    "Houston",
    "Orlando",
]
for city in us_cities:
    print(city)

Detroit
Chicago
Denver
Boston
Portland
San Francisco
Houston
Orlando


In [11]:
print(f'After the loop the name of the city is: {city}')

After the loop the name of the city is: Orlando


### The range function
The built-in `range()` function allows you to create a sequence of numbers that you can iterate over. When given an integer (whole number) as an argument, `range()` will produce a sequence of numbers, starting from 0, with a length equal to the specified number.

In [44]:
for x in range(5):
    print(x)

0
1
2
3
4


In [6]:
for x in range(1, 10, 2):  #range(start, stop, step)
    print(x)

1
3
5
7
9
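
A `range` is its own kind of object rather than a list, which is easy to see by printing it. As a small sketch (the name `numbers` is illustrative), converting it with `list()` shows the values it produces:

```python
numbers = range(5)
print(numbers)         # a range object, not a list
print(list(numbers))   # converting it shows the values

# A negative step counts downwards; the stop value is still excluded
print(list(range(10, 0, -3)))
```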


Using the documentation that is produced when you run `help(range)`, what values would you replace the `...` in the parentheses of the `range()` function with to have the following output printed to the screen?

```python
2
5
8
```

In [7]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |

### Looping over lists using index values

Since we already know how to find the length of a list using the `len()` function, we can now take advantage of this knowledge to make our `for` loops more flexible. Starting with the list of numbers below, let's use the `range()` function to loop over the list.

Why use index value to loop over a list?

- First of all, if you want to update individual values in a list you're likely going to need a loop that includes the index values. There are functions such as `enumerate()` that can help, but their use can be somewhat confusing for new programmers. 
- Second, in cases where you have multiple lists that are related to one another, it can be handy to use a loop with the index values to be able to access corresponding locations in each list. For this, let's consider an example with the two lists below.
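
The lists referred to above are not included in this copy of the notebook, so the following is a minimal sketch with assumed example data: the list names `stations` and `temperatures` and the temperature values are hypothetical.

```python
stations = ['Lauwersoog', 'Leeuwarden', 'Twente', 'De Bilt']
temperatures = [11, 12, 13, 12]  # hypothetical values in degrees Celsius

# range(len(...)) produces the valid index values 0, 1, 2 and 3,
# so the same index picks matching items from both lists
for i in range(len(stations)):
    print(f'{stations[i]}: {temperatures[i]}')

# The index also lets us update values in place
for i in range(len(temperatures)):
    temperatures[i] = temperatures[i] + 1

print(temperatures)
```

This approach assumes that both lists have the same length; if `temperatures` were shorter than `stations`, the first loop would stop with an `IndexError`.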

## Conditionals

Python supports logical conditions such as equals, not equals, greater than etc. These conditions can be used in several ways, most commonly in *if statements* and loops.

An *if statement* is written by using the `if` keyword.

Note: A very common error that programmers make is to use *=* to evaluate an *equals to* condition. The *=* in Python means assignment, not equals to. Always ensure that you use *==* for an equals to condition.

In [2]:
for city in cities:
    if city == 'Atlanta':
        print(city)

Atlanta

You can use the `else` keyword along with `if` to handle elements that do not meet the condition.

In [1]:
for city in cities:
    if city == 'Atlanta':
        print(city)
    else:
        print('This is not Atlanta')

This is not Atlanta
This is not Atlanta
This is not Atlanta
Atlanta

Python relies on indentation (whitespace at the beginning of a line) to define scope in the for loop and if statements. So make sure your code is properly indented.

You can evaluate a series of conditions using the `elif` keyword.

Multiple criteria can be combined using the `and` and `or` keywords.

In [2]:
cities_population = {
    'San Francisco': 881549,
    'Los Angeles': 3792621,
    'New York': 8175133,
    'Atlanta':498044
}

for city, population in cities_population.items():
    if population < 1000000:
        print('{} is a small city'.format(city))
    elif population >= 1000000 and population < 5000000:
        print('{} is a big city'.format(city))
    else:
        print('{} is a mega city'.format(city))

San Francisco is a small city
Los Angeles is a big city
New York is a mega city
Atlanta is a small city


## Control Statements

A for-loop iterates over each item in the sequence. Sometimes it is desirable to stop the execution, or skip certain parts of the for-loop. Python has special statements for this: `break`, `continue` and `pass`.

A `break` statement will stop the loop and exit out of it

In [None]:
for city in cities:
    print(city)
    if city == 'Los Angeles':
        print('I found Los Angeles')
        break

A `continue` statement will skip the remaining part of the loop and go to the next iteration

In [None]:
for city in cities:
    if city == 'Los Angeles':
        continue
    print(city)
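
The third statement, `pass`, does nothing at all; it is a placeholder for places where Python requires a statement but no action is needed. In the sketch below (the list `printed` is an illustrative helper), every city is still printed, because unlike `continue`, `pass` does not skip the rest of the loop body.

```python
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
printed = []

for city in cities:
    if city == 'Los Angeles':
        pass  # placeholder: nothing happens, the loop body carries on
    printed.append(city)
    print(city)
```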

## Exercise

The Fizz Buzz challenge.

Write a program that prints the numbers from 1 to 100 and for multiples of 3 print **Fizz** instead of the number and for the multiples of 5 print **Buzz**. If it is divisible by both, print **FizzBuzz**.

So the output should be something like below

`1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, ...`

Breaking down the problem further, we need to create a for-loop with the following conditions:

- If the number is a multiple of both 3 and 5 (i.e. 15), print FizzBuzz
- If the number is multiple of 3, print Fizz
- If the number is multiple of 5, print Buzz
- Otherwise print the number

Hint: See the code cell below. Use the modulus operator **%** to check if a number is divisible by another. `10 % 5` equals 0, meaning it is divisible by 5.


In [None]:
for x in range(1, 10):
    if x%2 == 0:
        print('{} is divisible by 2'.format(x))
    else:
        print('{} is not divisible by 2'.format(x))

![Anatomy of a function.](img/Function_anatomy-400.png)

# 5.Functions

A function is a block of organized, reusable code that can make your programs more effective, easier to read, and simple to manage. You can think of functions as little self-contained programs that perform a specific task and that you can use repeatedly in your code.

Functions are useful because they allow us to capture the logic of our code and run it with different inputs without having to write the same code again and again.

One of the basic principles in good programming is "do not repeat yourself".
In other words, you should avoid having duplicate lines of code in your scripts.
Functions are a good way to avoid such situations and they can save you a lot of time and effort as you don't need to tell the computer repeatedly what to do every time it does a common task, such as converting temperatures from Fahrenheit to Celsius.
During the course we have already used some functions such as the `print()` function which is a built-in function in Python.

A function is defined using the `def` keyword

```
def my_function():
    ....
    ....
    return something
```



![functions.png](attachment:f8fe9926-73fc-4b3f-a9eb-7cec2be130b3.png)

Above shown is a function definition that consists of the following components:

- Keyword `def`  that marks the start of the function header.
- A function name to uniquely identify the function.
- Parameters (arguments) through which we pass values to a function (optional).
- A colon (:) to mark the end of the function header.
- Documentation string (docstring) to describe what the function does (optional).
- One or more valid python statements that make up the function body. Statements must have the  same indentation level (usually 4 spaces).
- A return statement to return a value from the function (optional).

In [1]:
def greet(name):
    "This function greets to the person passed in as a parameter"
    print("Hello, " + name + ". Good morning!")
greet('Paul')

Hello, Paul. Good morning!


In [2]:
def greet(name):
    return 'Hello ' + name

print(greet('World'))

Hello World


In [5]:
name = 'Alice'
def example():
    print(name)
    name = 'Bob'
example()

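UnboundLocalError: local variable 'name' referenced before assignment

"Local variable referenced before assignment" occurs when we reference a local variable before assigning a value to it in a function. Because `name` is assigned inside `example()`, Python treats it as a local variable throughout the function, so `print(name)` runs before the local `name` has a value. To solve the error, mark the variable as global inside the function, e.g. `global name`.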

In [7]:
name = 'Alice'
def example():
    global name
    print(name)
    name = 'Bob'
example()
example()

Alice
Bob
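
Using `global` works, but it makes the function depend on a variable defined elsewhere. A common alternative, shown here as a sketch with a hypothetical function `rename`, is to pass the value in as a parameter and hand the new value back with `return`:

```python
def rename(current_name, new_name):
    """Print the current name and return the new one."""
    print(current_name)
    return new_name

name = 'Alice'
name = rename(name, 'Bob')    # prints Alice
name = rename(name, 'Carol')  # prints Bob
print(name)
```

With this design the function only uses the values it is given, which makes it easier to reuse and to test.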


## 5.1 Calling functions
Calling our self-defined function is no different from calling any other function such as print(). You need to call it with its name and provide your value(s) as the required parameter(s) inside the parentheses.

When we call the function, the values we pass to it are assigned to the corresponding parameter variables so that we can use them inside the function (e.g., the variable `temp` in the `celsius_to_fahr` function defined below). Inside the function, we use a `return` statement to define the value that should be given back when the function is used, or called.

Defining a function does nothing other than make it available for use in our notebooks. In order to use the function we need to call it.

Now let's try using our function. Here, we can define a variable `freezing_point` that is the temperature in degrees Fahrenheit we get when using our function with the temperature 0°C (the temperature at which water freezes). We can then print that value to confirm. We should get a temperature of 32°F.

In [13]:
def celsius_to_fahr(temp):
    return 9 / 5 * temp + 32

In [14]:
freezing_point = celsius_to_fahr(0.0)

In [15]:
print(f"The freezing point of water in Fahrenheit is {freezing_point}")

The freezing point of water in Fahrenheit is 32.0


We can do the same thing with the boiling point of water in degrees Celsius (100°C). Just like with other functions, we can use our new function directly within something like the print() function to print out the boiling point of water in degrees Fahrenheit.

In [16]:
print(f"The boiling point of water in Fahrenheit is {celsius_to_fahr(100)}")

The boiling point of water in Fahrenheit is 212.0


Let's define a second function, `kelvins_to_celsius`, and use it in the same way as the earlier one by defining a new variable `absolute_zero` that is the Celsius temperature of 0 Kelvins. Note that we can also use the parameter name `temp_kelvins` when calling the function to explicitly state which variable value is being used. Again, let's print the result to confirm everything works.

In [18]:
def kelvins_to_celsius(temp_kelvins):
    return temp_kelvins - 273.15

In [19]:
absolute_zero = kelvins_to_celsius(temp_kelvins=0)
print(f"Absolute zero in Celsius is {absolute_zero}")

Absolute zero in Celsius is -273.15


Functions can take multiple arguments. Let's write a function to convert coordinates from degrees, minutes, seconds to decimal degrees. This conversion is needed quite often when working with data collected from GPS devices.

- 1 degree is equal to 60 minutes (3600 seconds)
- 1 minute is equal to 60 seconds

To calculate decimal degrees, we can use the formula below:

If degrees are positive:

`Decimal Degrees = degrees + (minutes/60) + (seconds/3600)`

If degrees are negative:

`Decimal Degrees = degrees - (minutes/60) - (seconds/3600)`

In [None]:
def dms_to_decimal(degrees, minutes, seconds):
    if degrees < 0:
        result = degrees - minutes/60 - seconds/3600
    else:
        result = degrees + minutes/60 + seconds/3600
    return result

In [None]:
output = dms_to_decimal(10, 10, 10)
print(output)
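
To see both branches of the formula at work, here is a small check with values that are easy to verify by hand. The function is repeated so that the sketch runs on its own; the variable names `north` and `west` are illustrative.

```python
def dms_to_decimal(degrees, minutes, seconds):
    if degrees < 0:
        result = degrees - minutes/60 - seconds/3600
    else:
        result = degrees + minutes/60 + seconds/3600
    return result

# 30 minutes is half a degree, so the expected results are 10.5 and -10.5
north = dms_to_decimal(10, 30, 0)
west = dms_to_decimal(-10, 30, 0)
print(north, west)
```

One limitation of this signature is that the sign is carried only by `degrees`, so a coordinate between 0 and -1 degrees (for example 0° 30' West) cannot be expressed as a negative value this way.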

## Combining for-loops and conditional statements

Finally, we can also combine for-loops and conditional statements. Let's iterate over a list of temperatures, and check whether the temperature is hot or not.

In [5]:
temperatures = [0, 28, 12, 17, 30]

# For each temperature, if the temperature is greater than 25, print "..is hot"
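
One possible way to write the loop described in the comment above is sketched below. The threshold of 25 comes from the comment; the wording of the printed messages and the helper list `hot_temperatures` are assumptions made for this example.

```python
temperatures = [0, 28, 12, 17, 30]
hot_temperatures = []

for temperature in temperatures:
    if temperature > 25:
        print(f'{temperature} is hot')
        hot_temperatures.append(temperature)
    else:
        print(f'{temperature} is not hot')

print(hot_temperatures)
```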

## Exercise

Given a coordinate string with values in degrees, minutes and seconds, convert it to decimal degrees by calling the `dms_to_decimal` function.

In [None]:
def dms_to_decimal(degrees, minutes, seconds):
    if degrees < 0:
        result = degrees - minutes/60 - seconds/3600
    else:
        result = degrees + minutes/60 + seconds/3600
    return result

coordinate = '''37° 46' 26.2992"'''

# Add the code below to extract the parts from the coordinate string,
# call the function to convert to decimal degrees and print the result
# The expected answer is 37.773972

In [None]:
# Hint: Converting strings to numbers
# When you extract the parts from the coordinate string, they are strings
# You will need to use the built-in int() / float() functions to
# convert them to numbers
x = '25'
print(x, type(x))
y = int(x)
print(y, type(y))

# 6.The Python Standard Library

Python comes with many built-in modules that offer ready-to-use solutions to common programming problems. To use these modules, you must use the `import` keyword. Once imported in your Python script, you can use the functions provided by the module in your script.

We will use the built-in `math` module that allows us to use advanced mathematical functions.

In [3]:
import math

In [None]:
#Explain more in details about Anaconda (pip, conda env) - libraries in general (geopandas)

You can also import specific functions or constants from the module like below

In [None]:
from math import pi
print(pi)
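
As a brief sketch of what the module offers, the calls below use a few `math` functions that are helpful when working with coordinates. The input values are arbitrary examples.

```python
import math

print(math.sqrt(16))            # square root
print(math.degrees(math.pi))    # radians converted to degrees
print(math.radians(180))        # degrees converted to radians
print(math.floor(37.7749))      # round down to the nearest integer
```

When the whole module is imported with `import math`, its functions are called with the `math.` prefix, whereas `from math import pi` makes `pi` available without the prefix.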

## Calculating Distance

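Given 2 points with their latitude and longitude coordinates, the Haversine formula calculates the great-circle distance between them (the shortest distance over the Earth's surface), assuming that Earth is a sphere. With the Earth's radius given in meters, as in the function below, the result is in meters.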

The formula is simple enough to be implemented in a spreadsheet too. If you are curious, see [my post](https://spatialthoughts.com/2013/07/06/calculate-distance-spreadsheet/) about using this formula for calculating distances in a spreadsheet.

We can write a function that accepts a pair of origin and destination coordinates and computes the distance.

In [None]:
san_francisco = (37.7749, -122.4194)
new_york = (40.661, -73.944)

In [None]:
def haversine_distance(origin, destination):
  lat1, lon1 = origin
  lat2, lon2 = destination
  radius = 6371000
  dlat = math.radians(lat2-lat1)
  dlon = math.radians(lon2-lon1)
  a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
    * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
  c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
  distance = radius * c
  return distance

In [None]:
distance = haversine_distance(san_francisco, new_york)
print(distance/1000, 'km')

## Discover Python Easter Eggs

Programmers love to hide secret jokes in their programs for fun. These are known as *Easter Eggs*. Python has an easter egg that you can see when you try to import the module named `this`. Try writing the command `import this` below.

In [None]:
import this

Let's try one more. Try importing the `antigravity` module.

Here's a complete list of [easter eggs in Python](https://towardsdatascience.com/7-easter-eggs-in-python-7765dc15a203).

## Exercise

Find the coordinates of 2 cities near you and calculate the distance between them by calling the `haversine_distance` function below.

In [None]:
def haversine_distance(origin, destination):
  lat1, lon1 = origin
  lat2, lon2 = destination
  radius = 6371000
  dlat = math.radians(lat2-lat1)
  dlon = math.radians(lon2-lon1)
  a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
    * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
  c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
  distance = radius * c
  return distance

# city1 = (lat1, lng1)
# city2 = (lat2, lng2)
# call the function and print the result

# 7.Third-party Modules

Python has a thriving ecosystem of third-party modules (i.e. libraries or packages). There are hundreds of thousands of such modules available for you to install and use.

## Installing third-party libraries

Python comes with a package manager called `pip`. It can install all the packages listed at [PyPI (Python Package Index)](https://pypi.org/). To install a package using pip, you need to run a command like the following in a Terminal or CMD prompt.

`pip install <package name>`

For this course, we are using the Anaconda platform, which comes with its own package manager called `conda`. You can use Anaconda Navigator to search for and install packages, or run a command like the following in a Terminal or CMD prompt.

`conda install <package name>`

See this [comparison of pip and conda](https://www.anaconda.com/blog/understanding-conda-and-pip) to understand the differences.
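
Before installing a package, it can be handy to check from Python whether it is already available in the current environment. Below is a minimal sketch using `importlib.util.find_spec()` from the standard library. The helper name `is_installed` is made up for this example, and the name you import is not always the same as the package name used for installing.

```python
import importlib.util

def is_installed(module_name):
    # find_spec() returns None when the module cannot be found
    return importlib.util.find_spec(module_name) is not None

# 'math' is part of the standard library, so it is always available
print(is_installed('math'))
# The result for 'geopy' depends on your environment
print(is_installed('geopy'))
```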

## Calculating Distance

We have already installed the `geopy` package in our environment. `geopy` provides ready-made implementations of several distance calculation formulae.

- `distance.great_circle()`: Calculates the great-circle distance, assuming a spherical Earth
- `distance.geodesic()`: Calculates the geodesic distance on a chosen ellipsoid (WGS-84 by default). Recent versions of geopy use the algorithm by Karney (2013) instead of Vincenty's formula

In [None]:
from geopy import distance

san_francisco = (37.7749, -122.4194)
new_york = (40.661, -73.944)

straight_line_distance = distance.great_circle(san_francisco, new_york)
ellipsoid_distance = distance.geodesic(san_francisco, new_york, ellipsoid='WGS-84')

print(straight_line_distance, ellipsoid_distance)

## Exercise

Repeat the distance calculation exercise from the previous module but perform the calculation using the geopy library.

In [None]:
from geopy import distance

# city1 = (lat1, lng1)
# city2 = (lat2, lng2)
# call the geopy distance function and print the great circle and ellipsoid distance

# 8.Using Web APIs

An API, or Application Programming Interface, allows one program to *talk* to another program. Many websites or services provide an API so you can query for information in an automated way.

For mapping and spatial analysis, being able to use APIs is critical. For the longest time, Google Maps API was the most popular API on the web. APIs allow you to query web servers and get results without downloading data or running computation on your machine. 

Common use cases for using APIs for spatial analysis are

- Getting directions / routing
- Route optimization
- Geocoding
- Downloading data
- Getting real-time weather data
- ...

The providers of such APIs have many ways to implement them. There are standards such as REST, SOAP, GraphQL etc. *REST* is the most popular standard for web APIs, including geospatial APIs. REST APIs are used over HTTP and are thus called web APIs.

## Understanding JSON and GeoJSON

JSON stands for **J**ava**S**cript **O**bject **N**otation. It is a format for storing and transporting data, and is the de-facto standard for data exchanged by APIs. GeoJSON is an extension of the JSON format that is commonly used to represent spatial data.

Python has a built-in `json` module that has methods for reading json data and converting it to Python objects, and vice-versa. In this example, we are using the `requests` module for querying the API which conveniently does the conversion for us. But it is useful to learn the basics of working with JSON in Python.

In [None]:
geojson_string = '''
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
      "properties": {"name": "San Francisco"},
      "geometry": {"type": "Point", "coordinates": [-121.5687, 37.7739]}
    }
  ]
}
'''
print(geojson_string)

To convert a JSON string to a Python object (i.e. parsing JSON), we can use the `json.loads()` function.

In [None]:
import json

data = json.loads(geojson_string)
print(type(data))
print(data)
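
The reverse direction works with `json.dumps()`, which turns a Python object into a JSON string. Below is a minimal sketch. The `feature` dictionary is a made-up example with approximate coordinates.

```python
import json

feature = {
    'type': 'Feature',
    'properties': {'name': 'Almere'},
    'geometry': {'type': 'Point', 'coordinates': [5.2647, 52.3508]}
}

# Convert the Python dictionary to a JSON string
feature_string = json.dumps(feature)
print(type(feature_string))
print(feature_string)

# Parsing the string again gives back an equal dictionary
print(json.loads(feature_string) == feature)
```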

Now that we have parsed the GeoJSON string and have a Python object, we can extract information from it. The data is stored in a FeatureCollection - which is a list of features. In our example, we have just 1 feature inside the feature collection, so we can access it by using index **0**.

In [None]:
city_data = data['features'][0]
print(city_data)
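
Nested values can be reached by chaining keys and indexes. The sketch below repeats a compact version of the same GeoJSON so that it runs on its own. Note that GeoJSON stores coordinates in longitude, latitude order.

```python
import json

geojson_string = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"name": "San Francisco"},
    "geometry": {"type": "Point", "coordinates": [-121.5687, 37.7739]}}
 ]}
'''
data = json.loads(geojson_string)
city_data = data['features'][0]

# Follow the keys down to the value we want
name = city_data['properties']['name']
# GeoJSON coordinates are in (longitude, latitude) order
lng, lat = city_data['geometry']['coordinates']
print(name, lat, lng)
```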

## The `requests` module

To query a server, we send a **GET** request with some parameters and the server sends a response back. The `requests` module allows you to send HTTP requests and parse the responses using Python. 

The response contains the data received from the server. It also contains the HTTP *status_code*, which tells us if the request was successful. HTTP code 200 stands for *OK*, meaning the request succeeded.

In [None]:
import requests

response = requests.get('https://www.spatialthoughts.com')

print(response.status_code)
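
To look up what a status code means without sending a request, the standard library's `http.HTTPStatus` can be used. This is a small sketch, and the three codes are just common examples.

```python
from http import HTTPStatus

# Map a few common status codes to their standard phrases
phrases = {}
for code in (200, 404, 500):
    phrases[code] = HTTPStatus(code).phrase
    print(code, phrases[code])
```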

## Calculating Distance using OpenRouteService API

[OpenRouteService (ORS)](https://openrouteservice.org/) provides a free API for routing, distance matrix, geocoding, route optimization etc. using OpenStreetMap data. We will learn how to use this API through Python and get real-world distance between cities.

Almost all APIs require you to sign up and obtain a *key*. The *key* is used to identify you and enforce usage limits so that you do not overwhelm the servers. We will obtain a key from OpenRouteService so we can use their API.

Visit [OpenRouteService Sign-up page](https://openrouteservice.org/dev/#/signup) and create an account. Once your account is activated, visit your Dashboard and request a token. Select *Standard* as the Token type and enter ``python_foundation`` as the Token name. Click *CREATE TOKEN*. Once created, copy the long string displayed under Key and enter it below.

In [None]:
ORS_API_KEY = '<replace this with your key>'

We will use OpenRouteService's [Directions Service](https://openrouteservice.org/dev/#/api-docs/v2/directions/{profile}/get). This service returns the driving, biking or walking directions between the given origin and destination points.

In [None]:
import requests

san_francisco = (37.7749, -122.4194)
new_york = (40.661, -73.944)

parameters = {
    'api_key': ORS_API_KEY,
    'start' : '{},{}'.format(san_francisco[1], san_francisco[0]),
    'end' : '{},{}'.format(new_york[1], new_york[0])
}

response = requests.get(
    'https://api.openrouteservice.org/v2/directions/driving-car', params=parameters)

if response.status_code == 200:
    print('Request successful.')
    data = response.json()
else:
    print('Request failed.')
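
Behind the scenes, the `params` dictionary is encoded into the query string of the URL. This can be sketched with the standard library, without sending a request. The names `example_parameters` and `query` are made up for this example, and the key is a placeholder, not a real key.

```python
from urllib.parse import urlencode

san_francisco = (37.7749, -122.4194)
new_york = (40.661, -73.944)

example_parameters = {
    'api_key': 'YOUR_KEY',  # placeholder, not a real key
    # As in the request above, longitude comes first, then latitude
    'start': '{},{}'.format(san_francisco[1], san_francisco[0]),
    'end': '{},{}'.format(new_york[1], new_york[0])
}

# Encode the dictionary as key=value pairs joined by '&'
query = urlencode(example_parameters)
base_url = 'https://api.openrouteservice.org/v2/directions/driving-car'
print(base_url + '?' + query)
```

Special characters such as the comma are percent-encoded in the resulting URL.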

We can read the `response` in JSON format by calling the `json()` method on it.

In [None]:
data = response.json()

The response is a GeoJSON object representing the driving directions between the 2 points. The object is a feature collection with just 1 feature. We can access it using the index **0**. The feature's `properties` contain `summary` information, which has the data we need.

In [None]:
summary = data['features'][0]['properties']['summary']
print(summary)

We can extract the `distance` and convert it to kilometers.

In [None]:
distance = summary['distance']
print(distance/1000)

You can compare this distance to the straight-line distance and see the difference.

# 9.Reading Files

Python provides built-in functions for reading and writing files.  

To read a file, we must know the path of the file on the disk. Python has a module called `os` with helper functions for dealing with the operating system. The advantage of using the `os` module is that the code you write will work without change on any supported operating system.

In [None]:
import os

To open a file, we need to know the path to the file. We will now open and read the file `worldcities.csv` located in your data package. In your data package, the data files are in the `data/` directory. We can construct the relative path to the file using the `os.path.join()` function.

In [None]:
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)
print(path)
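
The `os.path` module has more helpers for taking a path apart and checking it. Below is a minimal sketch using the same folder and file name as above.

```python
import os

path = os.path.join('data', 'worldcities.csv')

# Split the path into its folder and file name
print(os.path.dirname(path))
print(os.path.basename(path))

# Split the file name into name and extension
name, extension = os.path.splitext(os.path.basename(path))
print(name, extension)

# Check whether the file exists on disk (depends on your working directory)
print(os.path.exists(path))
```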

To open the file, use the built-in `open()` function. We specify the *mode* as `r` which means read-only. If we wanted to change the file contents or write a new file, we would open it with `w` mode.

Our input file also contains Unicode characters, so we specify `UTF-8` as the encoding.

The `open()` function returns a file object. We can call the `readline()` method to read the content of the file, one line at a time.

It is a good practice to always close the file when you are done with it. To close the file, we must call the `close()` method on the file object.

In [None]:
f = open(path, 'r', encoding='utf-8')
print(f.readline())
print(f.readline())
f.close()
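
To see the `w` and `r` modes side by side, the sketch below writes a small file and reads it back. It uses a temporary folder and a made-up file name `example.txt`, so no files of the course are touched.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, 'example.txt')

# 'w' creates the file (or overwrites it if it already exists)
f = open(example_path, 'w', encoding='utf-8')
f.write('Zürich\n')
f.write('São Paulo\n')
f.close()

# 'r' opens the file for reading only
f = open(example_path, 'r', encoding='utf-8')
first_line = f.readline()
second_line = f.readline()
f.close()

# strip() removes the newline character at the end of each line
print(first_line.strip())
print(second_line.strip())
```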

Calling `readline()` for each line of the file is tedious. Ideally, we want to loop through all the lines in the file. You can iterate through the file object like below.

We can loop through each line of the file and increase the `count` variable by 1 for each iteration of the loop. At the end, the count variable's value will be equal to the number of lines in the file.

In [None]:
f = open(path, 'r', encoding='utf-8')

count = 0
for line in f:
    count += 1
f.close()
print(count)

## Exercise

Print the first 5 lines of the file.

- Hint: Use the `break` statement

In [None]:
import os
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)

# Add code to open the file and read first 5 lines

# 10.Reading CSV Files

Comma-separated Values (CSV) is the most common text-based file format for sharing geospatial data. The structure of the file is 1 data record per line, with individual *columns* separated by a comma.

In general, the separator character is called a delimiter. Other popular delimiters include the tab (\\t), colon (:) and semi-colon (;) characters. 

Reading a CSV file properly requires us to know which delimiter is being used, along with the quote character that surrounds field values containing a space or the delimiter character. Since reading delimited text files is a very common operation, and handling all the corner cases can be tricky, Python comes with its own library called `csv` for easy reading and writing of CSV files. To use it, you just have to import it.

In [None]:
import csv
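
The effect of the delimiter and the quote character can be sketched with `csv.reader` on in-memory text, using `io.StringIO` in place of a file. The sample rows are made up for this example.

```python
import csv
import io

# A quoted field may contain the delimiter itself
comma_text = 'city,country\n"Washington, D.C.",United States\n'
comma_rows = list(
    csv.reader(io.StringIO(comma_text), delimiter=',', quotechar='"'))
print(comma_rows)

# The same data using a semi-colon as the delimiter needs no quotes
semicolon_text = 'city;country\nWashington, D.C.;United States\n'
semicolon_rows = list(
    csv.reader(io.StringIO(semicolon_text), delimiter=';'))
print(semicolon_rows)
```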

The preferred way to read CSV files is using `csv.DictReader`. It reads each row and creates a dictionary from it, with column names as *keys* and column values as *values*. Let's see how to read a file using `csv.DictReader()`.

In [None]:
import os
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)

In [None]:
f = open(path, 'r', encoding='utf-8')
csv_reader = csv.DictReader(f, delimiter=',', quotechar='"')
print(csv_reader)
f.close()
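
Printing the reader only shows the object itself, not the rows. To see what it yields, the sketch below uses a small in-memory CSV with made-up values (via `io.StringIO`) in place of the file.

```python
import csv
import io

sample = io.StringIO('city,country\nAlmere,Netherlands\nBengaluru,India\n')
sample_reader = csv.DictReader(sample)

# Column names are taken from the first line
print(sample_reader.fieldnames)

# Each row is a dictionary keyed by column name
sample_rows = list(sample_reader)
for row in sample_rows:
    print(row['city'], row['country'])
```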

## Using `enumerate()` function

When iterating over an object, many times we need a counter. We saw in the previous example, how to use a variable like `count` and increase it with every iteration. There is an easy way to do this using the built-in `enumerate()` function.

In [None]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
for x in enumerate(cities):
    print(x)
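
Each item is a tuple, so it can be unpacked directly in the `for` statement. The optional `start` argument changes the number the counter begins with. A short sketch:

```python
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

# Unpack each (index, value) tuple; start=1 begins counting at 1
numbered = list(enumerate(cities, start=1))
for number, city in numbered:
    print(number, city)
```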

We can use enumerate() on any iterable object and get a tuple with an index and the iterable value with each iteration. Let's use it to print the first 5 lines from the DictReader object.

In [None]:
f = open(path, 'r', encoding='utf-8')
csv_reader = csv.DictReader(f, delimiter=',', quotechar='"')
for index, row in enumerate(csv_reader):
    print(row)
    if index == 4:
        break
f.close()

## Using `with` statement

The code for file handling requires that we open a file, do something with the file object and then close the file. That is tedious, and it is possible that you may forget to call `close()` on the file. If the processing code encounters an error, the file is not closed properly, which may result in bugs - especially when writing files.

The preferred way to work with file objects is using the `with` statement. It results in simpler and cleaner code - which also ensures file objects are closed properly in case of errors.

As you see below, we open the file and use the file object `f` in a `with` statement. Python takes care of closing the file when the execution of code within the statement is complete.

In [None]:
with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
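
The closing behavior can be checked through the `closed` attribute of the file object. The sketch below uses a temporary folder and a made-up file name `example.txt`.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, 'example.txt')

with open(example_path, 'w', encoding='utf-8') as f:
    f.write('hello\n')
    open_inside = f.closed   # False: still open inside the block

closed_after = f.closed      # True: closed once the block ends
print(open_inside, closed_after)

# The file is closed even when an error occurs inside the block
try:
    with open(example_path, 'r', encoding='utf-8') as f:
        raise ValueError('something went wrong')
except ValueError:
    closed_after_error = f.closed
    print(closed_after_error)
```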

## Filtering rows

We can use a conditional statement while iterating over the rows to select and process rows that meet certain criteria. Let's count how many cities from a particular country are present in the file.

Replace the `home_country` variable with your home country below.

In [None]:
home_country = 'India'
num_cities = 0

with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)

    for row in csv_reader:
        if row['country'] == home_country:
            num_cities += 1
            
print(num_cities)
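
The same pattern extends to counting every country in one pass, for example with `collections.Counter` from the standard library. Below is a sketch on a small in-memory sample with made-up rows. The `country` column name matches the one used above.

```python
import csv
import io
from collections import Counter

sample = io.StringIO(
    'city,country\n'
    'Mumbai,India\n'
    'Delhi,India\n'
    'Almere,Netherlands\n'
)

# A Counter starts every key at 0, so we can add to it directly
country_counts = Counter()
for row in csv.DictReader(sample):
    country_counts[row['country']] += 1

print(country_counts['India'])
print(country_counts.most_common(1))
```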

# Extra material

## Calculating distance

Let's apply the skills we have learnt so far to solve a complete problem. We want to read the `worldcities.csv` file, find all cities within a home country, calculate the distance to each city from a home city and write the results to a new CSV file.

First we find the coordinates of our selected `home_city` from the file. Replace the `home_city` below with your hometown or a large city within your country. Note that we are using the `city_ascii` field for city name comparison, so make sure the `home_city` variable contains the ASCII version of the city name.

In [None]:
home_city = 'Bengaluru'

home_city_coordinates = ()

with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        if row['city_ascii'] == home_city:
            lat = row['lat']
            lng = row['lng']
            home_city_coordinates = (lat, lng)
            break
        
print(home_city_coordinates)
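
Note that `csv.DictReader` returns every value as a string, so the coordinates printed above are text. If you want to do arithmetic with them yourself, convert them with `float()` first. Below is a minimal sketch with a made-up row.

```python
# A made-up row, shaped like the dictionaries returned by csv.DictReader
row = {'city_ascii': 'Bengaluru', 'lat': '12.9700', 'lng': '77.5600'}

print(type(row['lat']))

# Convert the text values to numbers
coordinates = (float(row['lat']), float(row['lng']))
print(coordinates)
```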

Now we can loop through the file, find the cities in the chosen home country and call the `geopy.distance.geodesic()` function to calculate the distance. In the code below, we are computing just the first 5 matches.

In [None]:
from geopy import distance

counter = 0
with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        if (row['country'] == home_country and
            row['city_ascii'] != home_city):
            city_coordinates = (row['lat'], row['lng'])
            city_distance = distance.geodesic(
                city_coordinates, home_city_coordinates).km
            print(row['city_ascii'], city_distance)
            counter += 1
            
        if counter == 5:
            break

## Writing files

Instead of printing the results, let's write the results to a new file. Similar to `csv.DictReader`, there is a companion `csv.DictWriter` class for writing files. We create a `csv_writer` object and then write rows using its `writerow()` method.

First we create an `output` folder to save the results. We can first check if the folder exists and if it doesn't exist, we can create it.

In [None]:
output_dir = 'output'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
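
An alternative is `os.makedirs()` with `exist_ok=True`, which combines the check and the creation in one call. The sketch below uses a temporary base folder so it does not touch your working directory. The names `base_dir` and `example_dir` are made up for this example.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()
example_dir = os.path.join(base_dir, 'output')

# Creates the folder; no error is raised if it already exists
os.makedirs(example_dir, exist_ok=True)
os.makedirs(example_dir, exist_ok=True)
print(os.path.isdir(example_dir))
```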

In [None]:
output_filename = 'cities_distance.csv'
output_path = os.path.join(output_dir, output_filename)

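# newline='' lets the csv module control line endings
# (this avoids blank rows between records on Windows)
with open(output_path, mode='w', encoding='utf-8', newline='') as output_file: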
    fieldnames = ['city', 'distance_from_home']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    
    # Now we read the input file, calculate distance and
    # write a row to the output 
    with open(path, 'r', encoding='utf-8') as f:
        csv_reader = csv.DictReader(f)
        for row in csv_reader:
            if (row['country'] == home_country and
                row['city_ascii'] != home_city):
                city_coordinates = (row['lat'], row['lng'])
                city_distance = distance.geodesic(
                    city_coordinates, home_city_coordinates).km
                csv_writer.writerow(
                    {'city': row['city_ascii'],
                     'distance_from_home': city_distance}
                )
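
To see exactly what `csv.DictWriter` produces without opening the output file, it can write to an in-memory buffer instead. This is a sketch with a made-up city and distance value. The names `buffer`, `example_fieldnames` and `example_writer` are chosen for this example.

```python
import csv
import io

buffer = io.StringIO()
example_fieldnames = ['city', 'distance_from_home']
example_writer = csv.DictWriter(buffer, fieldnames=example_fieldnames)

# The header row comes from the fieldnames list
example_writer.writeheader()
# Each row is a dictionary with one value per field name
example_writer.writerow({'city': 'Mysuru', 'distance_from_home': 128.5})

print(buffer.getvalue())
```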

Below is the complete code for our task of reading a file, filtering it, calculating distance and writing the results to a file.

In [None]:
import csv
import os
from geopy import distance

data_pkg_path = 'data'
input_filename = 'worldcities.csv'
input_path = os.path.join(data_pkg_path, input_filename)
output_filename = 'cities_distance.csv'
output_dir = 'output'
output_path = os.path.join(output_dir, output_filename)

if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    
home_city = 'Bengaluru'
home_country = 'India'

with open(input_path, 'r', encoding='utf-8') as input_file:
    csv_reader = csv.DictReader(input_file)
    for row in csv_reader:
        if row['city_ascii'] == home_city:
            home_city_coordinates = (row['lat'], row['lng'])
            break

with open(output_path, mode='w', encoding='utf-8', newline='') as output_file:
    fieldnames = ['city', 'distance_from_home']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()

    with open(input_path, 'r', encoding='utf-8') as input_file:
        csv_reader = csv.DictReader(input_file)
        for row in csv_reader:
            if (row['country'] == home_country and
                row['city_ascii'] != home_city):
                city_coordinates = (row['lat'], row['lng'])
                city_distance = distance.geodesic(
                    city_coordinates, home_city_coordinates).km
                csv_writer.writerow(
                    {'city': row['city_ascii'],
                     'distance_from_home': city_distance}
                )
print('Successfully written output file at {}'.format(output_path))

## Exercise

Replace the `home_city` and `home_country` variables with your own home city and home country and create a CSV file containing the distance from your home city to every other city in your country.

## -End of document-