# Weeks 1-2: Basic elements of Python

## General information
This is a [Jupyter Notebook](https://jupyter.org/). This particular notebook is designed to introduce you to a few of the basic concepts of programming in Python. The contents of this document are divided into cells, which can contain Markdown-formatted text, Python code, or raw text. You can execute a snippet of code in a cell by pressing **Shift-Enter**. Try this out with the examples below.

You can follow the lesson either by opening Jupyter Notebook on your computer or through Binder.<br/>
<a href="https://mybinder.org/v2/gh/RUG-Elective/Notebooks_official/main?labpath=Week_1_2%2FSection%25201.ipynb"><img alt="Binder badge" src="https://img.shields.io/badge/launch-binder-red.svg" style="vertical-align:text-bottom"></a>


When learning a new programming language, it is customary to first learn how to print 'Hello World!'. While a bit quirky, it is a useful first step to know how to send input to the program and where to see the output. In Python, you can use the built-in ``print()`` function to print the greeting ``"Hello World!"``

In [101]:
print('Hello World!')

Hello World!


In [6]:
name = 'Lakis'
print("\r")  # carriage return (https://www.pythonpool.com/carriage-return-python/)
print("Hello {}, enjoy learning more about Python and computational social science!".format(name)) 


Hello Lakis, enjoy learning more about Python and computational social science!


# 1. Variables

## Strings

##### A string is a sequence of letters, numbers, and punctuation marks, commonly known as text.
##### In Python you can create a string by typing characters between single or double quotation marks.

In [99]:
3.14/3

1.0466666666666666

In [102]:
city = 'Almere'
province = 'Flevoland'
print(city,province)

Almere Flevoland


Concatenate strings using the `+` operator (note that no space is added between them automatically):

In [10]:
print(city + province)

AlmereFlevoland


In [11]:
print(city + ',' + province)

Almere,Flevoland


## Numbers

Python can handle several types of numbers, but the two most common are:

- **int**, which represents integer values like 100, and
- **float**, which represents numbers that have a fraction part, like 0.5

In [15]:
population = 881549
latitude = 37.7739
longitude = -121.5687

In [106]:
print(type(population))

<class 'int'>


In [19]:
print(type(latitude))

<class 'float'>


In [104]:
print(type(longitude))

<class 'float'>


In [22]:
elevation_feet = 934
elevation_meters = elevation_feet * 0.3048
print(elevation_meters)

284.6832


In [111]:
area_sqmi = 46.89
density = population / area_sqmi
print(density)

18800.362550650458


In [114]:
#Rounding to two decimals
round(density,2)

18800.36

In [115]:
#Adding thousands separator
f'{188888800.36:,}'

'188,888,800.36'
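The two steps can also be combined in a single format specifier: `,` adds the thousands separator and `.2f` keeps two decimals. A small sketch that recomputes `density` from the values above:

```python
population = 881549
area_sqmi = 46.89
density = population / area_sqmi

# ',' inserts thousands separators, '.2f' rounds to two decimals
density_text = f'{density:,.2f}'
print(density_text)  # 18,800.36
```

Note that `round()` returns a number (a `float`), whereas the f-string returns text (a `str`), so use the f-string only for display.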

In [119]:
print(type(18800.36))

<class 'float'>


## Exercise
We have a variable named `distance_km` below with the value `5940`, indicating the straight-line distance between Groningen and New York in kilometers. Create another variable called `distance_mi` and store the distance value in miles.

- Hint: 1 mile = 1.60934 kilometers

Add the code in the cell below and run it. The output should be 3690.95.

In [122]:
distance_km = 5940
distance_mi = distance_km / 1.60934
print(distance_mi)
round(distance_mi, 2)

3690.9540556998522


3690.95

## Selecting "good" variable names

To illustrate the point, consider a few not-so-good variable names below.

In [30]:
p = "101533"

or

In [31]:
p_id = "101533"

Any idea what these variables are for? Of course not, the variables ``p`` and ``p_id`` are too short and cannot communicate what they should be used for in the code. You might think you know what they are for now, but imagine not looking at the code for a few months. Would you know then? What about if you share the code with a friend? Would they know? Probably not.

Let's look at another example.

In [32]:
dutchmeteorologicalinstituteobservationstationidentificationnumber = "101533"

OK, so now we have another issue. The variable name potentially provides more information about what the variable represents (the identification number of a KNMI observation station), but it does so in a format that is not easy to read, nor something you're likely to want to type more than once. The previous example was too short, and now we have a variable name that is too long (and hard to read as a result).

### What a "good" variable name is

A good variable name should:

1. Be clear and concise. 

2. Be written in English. A general coding practice is to write code with variable names in English, as that is the most likely common language between programmers.

3. Not contain special characters. Python 3 allows many non-ASCII letters in variable names, but it is better to avoid names such as ``coördinaten`` because encoding issues can arise in some cases (for example, when code is shared between systems). Better to stick to the [standard printable ASCII character set](https://en.wikipedia.org/wiki/ASCII#Printable_characters) to be safe.

4. Not conflict with any [Python keywords](https://www.pythonforbeginners.com/basics/keywords-in-python), such as ``for``, ``True``, ``False``, ``and``, ``if``, or ``else``. These are reserved for special operations in Python and cannot be used as variable names. See the full list of reserved words here: [Python keywords list](https://www.w3schools.in/python/keywords)
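You can also ask Python itself which words are reserved, using the standard-library `keyword` module. A small sketch (the exact list depends on your Python version):

```python
import keyword

# kwlist holds every reserved word of the running Python version
print(keyword.kwlist)

# iskeyword() checks whether a given name is reserved
print(keyword.iskeyword('for'))           # True
print(keyword.iskeyword('station_name'))  # False
```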


With this in mind, let's now look at a few better options for variable names.

## Formatting "good" variable names

There are several possible formats for "good" variable names; here we recommend one:

### Recommendation: user_id naming (snake_case)

*snake_case* names such as ``user_id`` use lowercase words separated by underscores ``_``. This is our suggested format, as the underscores make the name easy to read without adding much to its length. As an example, consider the variable ``temp_celsius``.

### Python is case sensitive!

In [33]:
user_name = 'User1'

In [34]:
print(User_name)

NameError: name 'User_name' is not defined

The code above causes an error because of the inconsistency in upper- and lowercase letters in the variable name. Let's fix this.

In [35]:
print(user_name)

User1


# 2. Data Structures

### Tuples

A *tuple* is a sequence of objects. It can have any number of objects inside. In Python tuples are written with round brackets **()**. 

In [123]:
latitude = 37.7739
longitude = -121.5687
coordinates = (latitude, longitude)
print(coordinates)

(37.7739, -121.5687)


You can access each item by its position, i.e. its *index*. In programming, counting starts from 0, so the first item has an index of 0, the second item an index of 1, and so on. The index has to be put inside square brackets **[]**.

In [125]:
y = coordinates[0]
x = coordinates[1]
print(x, y)

-121.5687 37.7739


In [39]:
print(coordinates[0])

37.7739
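Instead of indexing item by item, you can also *unpack* a tuple into separate variables in one step; the number of variables on the left must match the number of items in the tuple. A minimal sketch:

```python
coordinates = (37.7739, -121.5687)

# Unpacking: each item of the tuple is assigned to its own variable
latitude, longitude = coordinates
print(latitude)   # 37.7739
print(longitude)  # -121.5687
```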


## Lists

A **list** is similar to a tuple, with one key difference: a tuple cannot be changed once it has been created (it is *immutable*), whereas a list is *mutable*: you can add, delete, or change its elements. In Python, lists are written with square brackets **[]**.
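A quick way to see this difference is to try to change an item in a tuple and in a list. A small sketch (the values are only for illustration):

```python
coordinates = (37.7739, -121.5687)          # tuple: immutable
cities = ['San Francisco', 'Los Angeles']   # list: mutable

cities[0] = 'Groningen'   # allowed: list items can be replaced
print(cities)             # ['Groningen', 'Los Angeles']

try:
    coordinates[0] = 53.2194   # not allowed: tuple items cannot be replaced
except TypeError as err:
    print('TypeError:', err)   # 'tuple' object does not support item assignment
```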

In [127]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen']
print(cities)

['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen']


KNMI weather stations: https://www.knmi.nl/nederland-nu/weer/waarnemingen

In [128]:
#Another way of writing lists
KNMI_Stations=[
    "Lauwersoog",
    "Leeuwarden",
    "Twente",
    "De Bilt",
]

In [129]:
print(KNMI_Stations)

['Lauwersoog', 'Leeuwarden', 'Twente', 'De Bilt']


To access an individual value in the list, we use its *index*: a number that refers to a given position in the list. As an example, let's print out some values from our lists:

In [130]:
print(cities[1])

Los Angeles


To get the value for the first item in the list, we must use index `0`

In [131]:
print(cities[0])

San Francisco


To find the value at the end of the list, we can print the value at index `-1`. To go further up the list in reverse, we can simply use larger negative numbers, such as index `-3`. Let's print out the values at these indices below.

In [134]:
print(KNMI_Stations[-1])
print(KNMI_Stations[-3])

De Bilt
Leeuwarden


Of course, you still need to keep the index values within their range. What happens if you check the value at index `-5` of the four-item `KNMI_Stations` list?
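For a list with four items, `-4` is the furthest you can go back; one step more raises an `IndexError`. A small sketch that catches the error so the cell keeps running:

```python
KNMI_Stations = ["Lauwersoog", "Leeuwarden", "Twente", "De Bilt"]

# Valid negative indices for 4 items run from -1 to -4
print(KNMI_Stations[-4])  # Lauwersoog

try:
    print(KNMI_Stations[-5])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```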

The built-in `len()` function returns the number of items in a collection, such as a string, tuple, list, or dictionary.
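A short sketch of `len()` on different kinds of objects; for a dictionary it counts the keys, and a plain number has no length at all:

```python
print(len('Groningen'))                              # 9 characters
print(len((37.7739, -121.5687)))                     # 2 items
print(len(['Lauwersoog', 'Twente']))                 # 2 items
print(len({'city': 'Emmen', 'population': 107856}))  # 2 keys

# Numbers are not collections, so len() raises a TypeError
try:
    len(881549)
except TypeError as err:
    print('TypeError:', err)  # object of type 'int' has no len()
```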

In [135]:
print(len(cities))

5


### Modifying list values

Another nice feature of lists is that they are *mutable*, meaning that the values in a list that has already been defined can be modified. For example, we can replace the fourth item of the `cities` list (index `3`) with a new value:

In [136]:
cities[3]="Staveden"
print(cities)

['San Francisco', 'Los Angeles', 'New York', 'Staveden', 'Groningen']


We can also check the type of the cities list using the type() function.

In [49]:
print(type(cities))

<class 'list'>


We can add items to the list using the append() method

In [50]:
cities.append('Boston')
print(cities)

['San Francisco', 'Los Angeles', 'New York', 'Staveden', 'Groningen', 'Boston']


In [51]:
print(len(cities))

6


### Data types in lists

Lists can also store more than one type of data. Let's consider a list that defines the land use profile of a neighbourhood in Groningen. Before we create this list, let's define the values it will hold:

In [52]:
municipality = "Groningen"

In [53]:
neigh_code = "BU00140001"

In [54]:
neigh_name = "Binnenstad-Zuid"

In [55]:
total_area_ha = 59

In [56]:
built_up_area_ha = 51

In [57]:
recreation_area_ha = 3.1

Now that we have defined some of the variables we can create the list

In [58]:
land_use = [municipality,neigh_code,neigh_name,total_area_ha,built_up_area_ha,recreation_area_ha]
print(land_use)

['Groningen', 'BU00140001', 'Binnenstad-Zuid', 59, 51, 3.1]


Here we have one list with 3 different types of data in it. We can confirm this using the type() function.

In [59]:
print(land_use(type))  # a deliberate mistake: this tries to call the list like a function. Let's do it again the right way!

TypeError: 'list' object is not callable

In [60]:
print(type(land_use))

<class 'list'>


Let's check the types of values at indices 1 and 5

In [61]:
print(type(land_use[1]))

<class 'str'>


In [62]:
print(type(land_use[5]))

<class 'float'>


### Adding and removing values from lists

How can we delete a value from the list? First, let's print the list to see the current positions of its items.

In [63]:
print(land_use)

['Groningen', 'BU00140001', 'Binnenstad-Zuid', 59, 51, 3.1]


In [64]:
del land_use[2]  # removes the item at index 2, i.e. the third item
print(land_use)

['Groningen', 'BU00140001', 59, 51, 3.1]


In [65]:
land_use.append('Utrecht')
print(land_use)

['Groningen', 'BU00140001', 59, 51, 3.1, 'Utrecht']


Keep in mind that `append()` is a *method* of lists; other data types such as tuples, strings, and numbers do not have it.
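Besides `append()` and `del`, lists have a few other methods for adding and removing items. A short sketch with a hypothetical `stations` list, plus what happens when you call `append()` on a tuple:

```python
stations = ['Lauwersoog', 'Leeuwarden', 'Twente']  # hypothetical example list

stations.insert(1, 'De Bilt')  # insert at index 1
stations.remove('Twente')      # remove the first item equal to 'Twente'
last = stations.pop()          # remove and return the last item
print(stations)                # ['Lauwersoog', 'De Bilt']
print(last)                    # Leeuwarden

# Tuples have no append() method
try:
    ('Groningen', 'Utrecht').append('Zwolle')
except AttributeError as err:
    print('AttributeError:', err)  # 'tuple' object has no attribute 'append'
```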

Remove every occurrence of a value (e.g. `'Utrecht'`) from a list. This builds a new list and leaves `land_use` itself unchanged.

In [67]:
[item for item in land_use if item != 'Utrecht']

['Groningen', 'BU00140001', 59, 51, 3.1]

We can also find the index of a specific element in the list. Let's try this for the value `59`.

In [68]:
land_use.index(59)

2

Create a list whose only entry is the number of items in the `land_use` list, and print it:

In [70]:
lu_length=[len(land_use)]
print(lu_length)
print(type(lu_length))

[6]
<class 'list'>


In [71]:
lu_length2=[]
print(lu_length2)

#Join the two lists (+)
joint_list=lu_length + lu_length2
print(joint_list)

[]
[6]


## Dictionaries

In Python, dictionaries are written with curly brackets **{}**. Dictionaries store *keys* and *values*, and each key is separated from its value by a colon **:**. Dictionaries are ordered (since Python 3.7), changeable, and do NOT allow duplicate keys. With lists, we access each element by its index, whereas a dictionary lets us access each value by its key (a name).

In [72]:
data = {'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749) }
print(data)

{'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749)}


You can access an item of a dictionary by referring to its key name, inside square brackets.

In [77]:
print(data['city'])

San Francisco


In [78]:
print(data['San Francisco'])  # 'San Francisco' is a value, not a key name

KeyError: 'San Francisco'
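A `KeyError` stops the program. A minimal sketch of some safer ways to look up, add, and update items, using the same `data` dictionary (the `'country'` key and the updated population value are only for illustration):

```python
data = {'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749)}

# 'in' checks whether a KEY exists; it does not look at the values
print('city' in data)           # True
print('San Francisco' in data)  # False

# get() returns None, or a default of your choice, instead of raising a KeyError
print(data.get('country'))             # None
print(data.get('country', 'unknown'))  # unknown

# Assigning to a new key adds an item; assigning to an existing key overwrites it
data['country'] = 'USA'
data['population'] = 900000  # illustrative value only
print(len(data))             # 4

# A duplicate key in a dictionary literal is not kept twice: the last value wins
print({'city': 'Almere', 'city': 'Emmen'})  # {'city': 'Emmen'}
```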

In [74]:
print(type(data))

<class 'dict'>


## Exercise

From the dictionary below, how do you access the latitude and longitude values? Print the latitude and longitude of New York City by extracting them from the dictionary.

The expected output should look like below.

(40.661, -73.944)

In [79]:
nyc_data = {'city': 'New York', 'population': 8175133, 'coordinates': (40.661,-73.944) }

In [81]:
print(nyc_data['coordinates'])

(40.661, -73.944)


More examples here: https://medium.com/@arun.verma8007/python-data-structures-56d4211ca5a7

# 3. String Operations

In [82]:
city = 'San Francisco'
print(len(city))

13


In [83]:
print(city.split())

['San', 'Francisco']


In [84]:
print(city.upper())

SAN FRANCISCO


In [85]:
city[0]

'S'

In [86]:
city[-1]

'o'

In [87]:
city[0:3]

'San'

In [None]:
city[4:]
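The slice `city[4:]` takes everything from index 4 to the end of the string. A few other commonly used string methods and checks, as a quick sketch:

```python
city = 'San Francisco'

print(city[4:])                # Francisco
print(city.lower())            # san francisco
print(city.replace(' ', '_'))  # San_Francisco
print(city.count('a'))         # 2
print('Fran' in city)          # True
print(city.startswith('San'))  # True
```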

## Escaping characters

Certain characters are special because they are used by the Python language itself. For example, the quote character **'** is used to define a string. What do you do if your string contains a quote character?

In Python strings, the backslash **\\** is a special character, also called the **escape** character. Prefixing a special character, such as a quote, with a backslash makes Python treat it as an ordinary character. (Hint: prefixing a backslash with a backslash makes it ordinary too!)

It is also used to represent certain whitespace characters: \\n is a newline, \\t is a tab, etc.
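A small sketch of these escape sequences in action:

```python
print('Groningen\tLeeuwarden')    # \t inserts a tab between the words
print('first line\nsecond line')  # \n starts a new line
print('C:\\data')                 # \\ prints a single backslash: C:\data
print(len('\n'))                  # 1: an escape sequence is a single character
```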

Run the cell below and look at the error message.

In [88]:
my_string = 'It's a beautiful day!'

SyntaxError: invalid syntax (4251103408.py, line 1)

We can fix the error by escaping the single quote within the string.

In [89]:
my_string = 'It\'s a beautiful day!'
print(my_string)

It's a beautiful day!


Alternatively, you can also use double-quotes if your string contains a single-quote.

In [91]:
my_string = "It's a beautiful day!"
print(my_string)

It's a beautiful day!


What if our string contains both single and double quotes?

We can use triple-quotes! Enclosing the string in triple quotes ensures both single and double quotes are treated correctly.

In [92]:
latitude = '''37° 46' 26.2992" N'''
longitude = '''122° 25' 52.6692" W'''
print(latitude,longitude)

37° 46' 26.2992" N 122° 25' 52.6692" W


## Exercise
Print three string statements of your choice with the following characteristics:
1) one with single quotes
2) one with double quotes
3) one with triple quotes

In [12]:
#What if you want to print comma-separated values?
print(latitude, longitude, sep=" , ")

37° 46' 26.2992" N , 122° 25' 52.6692" W

Backslashes pose another problem when dealing with Windows paths

In [95]:
path ='C:\Users\ujaval'
print(path)

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape (227938026.py, line 1)

Prefixing a string with `r` makes it a *raw string*, which doesn't interpret the backslash as a special character.

In [96]:
path = r'C:\Users\ujaval'
print(path)

C:\Users\ujaval
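Two other ways to write the same path, sketched below: escaping every backslash by hand, or letting the standard-library `pathlib` module handle Windows paths (here `PureWindowsPath`, which also accepts forward slashes):

```python
from pathlib import PureWindowsPath

raw_path = r'C:\Users\ujaval'
escaped_path = 'C:\\Users\\ujaval'  # each backslash escaped by hand
print(raw_path == escaped_path)     # True

# PureWindowsPath accepts forward slashes and prints the path with backslashes
win_path = PureWindowsPath('C:/Users/ujaval')
print(win_path)                     # C:\Users\ujaval
print(str(win_path) == raw_path)    # True
```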


## Combining text and numbers

A common way to combine character strings is the addition operator `+`. Let's combine the `neigh_name` and `neigh_code` variables defined below into a single text string and print the result to the screen.

In [6]:
municipality = "Groningen"
neigh_code = "BU00140001"
neigh_name = "Binnenstad-Zuid"
total_area_ha = 59

We can explore the different types of data stored in variables using the `type()` function.
Let's use the cells below to check the data types of the variables: neigh_code, neigh_name and total_area_ha. 

**Hint**: Remember, the data types are important because some are not compatible with one another.

What happens when you try to add the variables `neigh_name` and `neigh_code` in the cell below?

In [4]:
neigh_name_code=neigh_name + ":" + neigh_code
neigh_name_code

'Binnenstad-Zuid:BU00140001'

Now let's try to add the variables `neigh_name` and `total_area_ha`

In [5]:
neigh_total_ha=neigh_name + ":" + total_area_ha

TypeError: can only concatenate str (not "int") to str

We get a `TypeError` because, in order to combine a character string with a number, we need to perform a *data type conversion* to make them compatible. Let's convert `total_area_ha` to a character string using the `str()` function and name it `neigh_total_ha_str`.
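The exercise cells below work with `total_area_ha`; the same idea is sketched here with `built_up_area_ha` from the land-use example, together with the reverse conversions using `int()` and `float()`:

```python
built_up_area_ha = 51
built_up_area_str = str(built_up_area_ha)  # int -> str
print(type(built_up_area_str))             # <class 'str'>
print('Built-up area (ha): ' + built_up_area_str)

# Text can be converted back to a number, as long as it looks like one
print(int('59') + 1)     # 60
print(float('3.1') * 2)  # 6.2
```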

In [6]:
#str function

We can confirm the type has changed by checking the type of `neigh_total_ha_str`, or by checking the output when you type the name of the variable into a cell and run it.

In [None]:
#type function

Now let's calculate the `neigh_total_ha` again

In [8]:
#neigh_total_ha=

## Working with text (and numbers)
Let's look at a few useful techniques that make working with strings easier and more efficient.
There are three approaches that can be used to format strings (that is, to insert values into text) in Python:

1. f-strings
2. the `.format()` method
3. using the `%` operator

The f-string approach is recommended and is the most modern, introduced in Python 3.6. However, since you are likely to find examples of the older approaches we also show how they work.

### a) f-String formatting

In [27]:
# Driving distance from Groningen to London
cityA='Groningen'
cityB='London'
dist=682.62

In [None]:
# 1. The f-string approach

In [21]:
info_dist= f"The driving distance from {cityA} to {cityB} is {dist:.1f} km."
print(info_dist)

The driving distance from Groningen to London is 682.6 km.


A nice example can be found in the [draft text of the Introduction to Python for Geographic Data Analysis textbook by Tenkanen et al.](https://python-gis-book.readthedocs.io/en/develop/part1/chapter-02/nb/00-python-basics.html#working-with-text-and-numbers).

The key components here are:

- The text that you want to create and/or modify is enclosed within quotes preceded by the letter `f`.
- You can include any existing variable in the text template by placing the name of the variable inside a set of curly braces `{}`.
    - Using string formatting, it is also possible to insert numbers into the body of text without needing first to convert the data type to a string. This is because the f-string functionality does the data type conversion for us.
- It is possible to round numbers on the fly to a specific precision, such as one decimal place in our example, by adding a format specifier (`:.1f`) after the variable that we want to format.
    - The format specifier starts with a colon (`:`) after the variable name.
    - The decimal precision is specified by a dot (`.`) followed by the number of decimal places (one in our case).
    - The final character `f` in the format specifier defines the type of conversion.
        - `f` converts the value to a fixed-point decimal number.
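A few more format specifiers, as a quick sketch: `.0f` rounds to a whole number, `,` adds a thousands separator, and `%` multiplies by 100 and appends a percent sign:

```python
dist = 682.62

print(f"{dist:.0f} km")       # 683 km
print(f"{dist:.2f} km")       # 682.62 km
print(f"{1234567.891:,.2f}")  # 1,234,567.89
print(f"{0.256:.1%}")         # 25.6%
```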

In [30]:
# Using the f-string method print a statement with the following variables
municipality = "Groningen"
neigh_name = "Binnenstad-Zuid"
recreation_area_ha = 3.125

In [None]:
#On_the_way=f"The...

In [34]:
# One more example with the f-string approach
temp = 18.56789876
station_name = "Zwolle"
station_id = '0010'

#info_text =

#info_text

In [23]:
# 2. The .format() approach

In [29]:
info_dist2 = "The driving distance from {0} to {1} is {2:.1f} km.".format(
    cityA, cityB, dist
)
print(info_dist2)

The driving distance from Groningen to London is 682.6 km.


As you can see, here we get the same result as with f-strings by using the `.format()` method, which is placed after the quotes. Placeholders are inserted inside curly braces, where the numbers refer to the order of the variables listed in the `.format()` method. There are other ways to use this same approach, but the example above is typical.
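One of those other ways is to give the placeholders names instead of positions, which can make longer templates easier to read. A small sketch with the same driving-distance variables (the placeholder names `start`, `end`, and `km` are our own choice):

```python
cityA = 'Groningen'
cityB = 'London'
dist = 682.62

# Named placeholders are filled in by keyword arguments to .format()
info_dist3 = "The driving distance from {start} to {end} is {km:.1f} km.".format(
    start=cityA, end=cityB, km=dist
)
print(info_dist3)  # The driving distance from Groningen to London is 682.6 km.
```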

In [32]:
#Using the format method print a statement with the following variables
city = 'Emmen'
population = 107856

You can also use the `.format()` method to control the precision of numbers.

In [33]:
#Use the format method to control the precision of the numbers (2 decimals).
latitude = 37.7749
longitude = -122.4194

#coordinates = 
print(coordinates)

NameError: name 'coordinates' is not defined

In [12]:
#How can we print the following using the format method?
latitude = '''37° 46' 26.2992"'''

In [13]:
a = "37"
b = "46"
c = "26.2992"
print('''{}° {}' {}"'''.format(a, b, c))

37° 46' 26.2992"
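The third, oldest approach listed earlier, the `%` operator, works in a similar way: `%s` inserts a value as text and `%.1f` inserts a float with one decimal, and the values follow the `%` as a tuple, in order. A minimal sketch with the driving-distance variables:

```python
cityA = 'Groningen'
cityB = 'London'
dist = 682.62

info_dist4 = "The driving distance from %s to %s is %.1f km." % (cityA, cityB, dist)
print(info_dist4)  # The driving distance from Groningen to London is 682.6 km.
```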


### Revision

![all_lists.png](attachment:20178178-42d2-4a8e-92dd-192924db32fa.png)

# 4. Loops and Conditionals

## For Loops

A `for` loop is used to iterate over a collection of items (an *iterable*), such as a list, a tuple, a dictionary, a set, or a string.


### for loop format

`for` loops in Python have the general form below.

```python
for variable in collection:
    do things with variable
```

Let's break down the code above to see some essential aspects of `for` loops:

1. The `variable` can be any name you like other than a [reserved keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords)
2. The statement of the `for` loop must end with a `:`
3. The code that should be executed as part of the loop must be indented beneath the `for` loop statement

    - The typical indentation is 4 spaces

4. There is no additional special word needed to end the loop, you simply change the indentation back to normal.

**Hint**: `for` loops are useful to repeat some part of the code a *finite* number of times.
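Lists are not the only thing you can loop over. A small sketch with a string and a tuple (the values are just examples):

```python
# Looping over a string visits one character at a time
for letter in 'Assen':
    print(letter)

# Looping over a tuple works the same way as looping over a list
for value in (37.7739, -121.5687):
    print(value)
```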

In [40]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

for city in cities:
    print(city)

San Francisco
Los Angeles
New York
Atlanta


### Print in the same line

In [8]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

for city in cities:
    print(city,end=", ")

San Francisco, Los Angeles, New York, Atlanta, 
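Printing with `end=", "` leaves a trailing comma after the last city. One way to avoid it is the string method `join()`, which only places the separator *between* items; `print(*cities, sep=", ")` gives the same result. A small sketch:

```python
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

# join() puts ", " only between the items, so nothing trails the last one
print(", ".join(cities))  # San Francisco, Los Angeles, New York, Atlanta

# Alternative: unpack the list into print() and set the separator
print(*cities, sep=", ")
```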

In [9]:
# What will happen if I change the code as follows?
for city in cities:
    print(cities)

['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
['San Francisco', 'Los Angeles', 'New York', 'Atlanta']


### Delete repeated values in a list

In [3]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen', 'New York']

In [4]:
cities2= []

for i in cities:
    if i not in cities2:
        cities2.append(i)

print(cities2)

['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen']
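A shorter alternative, sketched below, relies on dictionary keys being unique and (since Python 3.7) kept in insertion order. Converting to a `set` would also remove duplicates, but a set does not keep the original order.

```python
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen', 'New York']

# Each city becomes a dictionary key exactly once, in order of first appearance
cities_unique = list(dict.fromkeys(cities))
print(cities_unique)  # ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen']
```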


To iterate over a dictionary's keys and values together, call its `items()` method, which yields a `(key, value)` tuple for each item.

In [42]:
data = {'city': 'San Francisco', 'population': 881549, 'coordinates': (-122.4194, 37.7749) }

for key, value in data.items():
    print(key, value)

city San Francisco
population 881549
coordinates (-122.4194, 37.7749)


In [46]:
us_cities = [
    "Detroit",
    "Chicago",
    "Denver",
    "Boston",
    "Portland",
    "San Francisco",
    "Houston",
    "Orlando",
]
for city in us_cities:
    print(city, end= ' ')

Detroit Chicago Denver Boston Portland San Francisco Houston Orlando 

In [43]:
print(f'After the loop the name of the city is: {city}')

After the loop the name of the city is: Orlando

### The range function
The built-in `range()` function allows you to create a sequence of numbers that you can iterate over. When given a single integer `n` as an argument, `range(n)` produces the `n` numbers from `0` up to `n - 1`. (In Python 3, `range()` returns a lazy `range` object rather than a list.)

In [8]:
for x in range(5):
    print(x)

0
1
2
3
4


In [9]:
for x in range(1, 10, 2):  #range(start, stop, step)
    print(x)

1
3
5
7
9
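Because `range()` returns a `range` object rather than a list, wrapping it in `list()` is a handy way to check which numbers a given start, stop, and step will produce. A quick sketch:

```python
print(range(5))                # range(0, 5)
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]: a negative step counts down
print(len(range(1, 10, 2)))    # 5
```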


Using the documentation that is produced when you run `help(range)`, which arguments would you pass to the `range()` function to have the following output printed to the screen?

```python
2
5
8
```

In [10]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |


## Conditionals

Python supports logical conditions such as equals, not equals, greater than etc. These conditions can be used in several ways, most commonly in *if statements* and loops.

An *if statement* is written by using the `if` keyword.

Note: A very common error that programmers make is to use `=` to evaluate an *equals to* condition. In Python, `=` means assignment, not equals to. Always use `==` for an equals-to condition.
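A short sketch contrasting assignment with comparison, plus a few other comparison operators; each comparison evaluates to `True` or `False`:

```python
population = 881549           # '=' assigns a value to a variable

print(population == 881549)   # True: '==' compares two values
print(population != 881549)   # False: '!=' means "not equal to"
print(population > 1000000)   # False
print(population <= 881549)   # True
```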

In [13]:
for i in range(len(cities)):
    print(cities[i])

San Francisco
Los Angeles
New York
Atlanta
Groningen
New York


In [2]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen', 'New York']

In [3]:
for city in cities:
    if city == 'Atlanta':
        print(city)

Atlanta


You can use the `else` keyword along with `if` to handle elements that do not meet the condition.

In [4]:
for city in cities:
    if city == 'Atlanta':
        print(city)
    else:
        print('This is not Atlanta')

This is not Atlanta
This is not Atlanta
This is not Atlanta
Atlanta
This is not Atlanta
This is not Atlanta


Python relies on indentation (whitespace at the beginning of a line) to define scope in the for loop and if statements. So make sure your code is properly indented.

You can evaluate a series of conditions using the `elif` keyword.

Multiple criteria can be combined using the `and` and `or` keywords.

In [7]:
cities_population = {
    'San Francisco': 881549,
    'Los Angeles': 3792621,
    'New York': 8175133,
    'Atlanta':498044
}

for city, population in cities_population.items():
    if population < 1000000:
        print('{} is a small city'.format(city))
    elif population >= 1000000 and population < 5000000:
        print('{} is a big city'.format(city))
    else:
        print('{} is a mega city'.format(city))

San Francisco is a small city
Los Angeles is a big city
New York is a mega city
Atlanta is a small city
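The example above uses `and`. The sketch below, reusing the same `cities_population` dictionary, shows `or` (at least one condition must be true) and a *chained comparison*, which Python accepts as a shorter way to write a range check:

```python
cities_population = {
    'San Francisco': 881549,
    'Los Angeles': 3792621,
    'New York': 8175133,
    'Atlanta': 498044,
}

# 'or': the body runs if at least one of the conditions is True
for city, population in cities_population.items():
    if population < 500000 or population > 5000000:
        print(f'{city}: {population:,}')

# Chained comparison: same as population >= 1000000 and population < 5000000
for city, population in cities_population.items():
    if 1000000 <= population < 5000000:
        print(city, 'has between 1 and 5 million inhabitants')
```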


## Practice - 21/10/22

In [8]:
l0 = [1,4,6]
l1 = [3,1,5]
l2 = [2,9]

In [12]:
l = [l0, l1, l2]
#Calculate the sum of all lists

In [17]:
s=0
for i in range(len(l)):
    s = s+sum(l[i])
print("summation =",s)

summation = 31
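Instead of looping over indices with `range(len(l))`, you can loop over the inner lists directly. In this sketch the list of lists is renamed `lists`, since a lone `l` is easily confused with the digit `1` (the renaming is our choice, not part of the original exercise):

```python
l0 = [1, 4, 6]
l1 = [3, 1, 5]
l2 = [2, 9]
lists = [l0, l1, l2]

# Each pass of the loop gets one inner list
total = 0
for inner in lists:
    total = total + sum(inner)
print("summation =", total)  # summation = 31
```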


In [27]:
import random
a_list = []
for i in range(100):
    a_list.append(random.random())
print(len(a_list))
print(sum(a_list))

100
54.17859305632631


`random.randint(a, b)` returns a random integer between `a` and `b`, with both limits included.
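If you want results that can be reproduced (for example, when checking an exercise), you can fix the starting point of the random number generator with `random.seed()`. A minimal sketch; the seed value `42` is arbitrary:

```python
import random

random.seed(42)  # fixed seed: re-running the cell gives the same "random" numbers

rolls = []
for _ in range(10):
    rolls.append(random.randint(1, 6))  # both 1 and 6 can occur
print(rolls)

# randrange() follows range() rules instead: the stop value is excluded,
# so randrange(1, 7) also produces numbers from 1 to 6
print(random.randrange(1, 7))
```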

In [20]:
# Generate 10 random integers between 3 and 9 (both included)
rand_list=[]
n=10
for i in range(n):
    rand_list.append(random.randint(3,9))
print(rand_list)
print(sum(rand_list))

[3, 5, 8, 8, 8, 9, 7, 5, 6, 7]
66


In [None]:
#In-class exercise.
#Print the integers between 10 and 21

In [33]:
#What does the following loop do?
import random
a = []
for i in range(100):
    a.append(random.randint(1, 10000))
print(a)   
print("The max value is:", max(a))

#Another way to find the max value: start from the first item
max_numb = a[0]
for num in a:
    if num > max_numb:
        max_numb = num
print(max_numb)

[353, 5649, 7729, 6951, 9725, 6994, 1256, 5911, 9959, 7114, 5507, 9168, 4476, 2567, 4761, 4251, 7932, 5385, 3259, 4548, 9411, 2783, 6718, 3813, 7061, 8077, 1395, 3837, 3684, 7075, 1130, 7863, 9406, 9071, 6713, 9300, 2171, 4446, 7458, 9450, 9012, 7926, 9897, 9145, 5621, 6855, 4288, 2096, 2850, 1534, 4197, 2469, 8207, 3307, 3450, 1871, 5759, 2595, 1408, 2333, 8068, 8587, 5600, 3420, 2179, 8587, 1100, 317, 1698, 932, 5612, 5493, 1193, 7125, 3439, 4540, 2726, 7349, 4061, 1602, 3299, 4175, 1183, 1713, 6320, 6897, 5365, 8125, 3521, 4814, 1797, 7491, 6217, 8276, 4040, 5454, 6642, 6962, 5034, 4501]
The max value is: 9959
9959


#### Taking range limit from user input 

In [36]:
start = int(input("Enter the start of range:"))
end = int(input("Enter the end of range:"))

list_even = []  # empty list to collect the even numbers
# iterate over each number in the range (end + 1 so that end itself is included)
for num in range(start, end + 1):

    # keep the number if it is divisible by 2
    if num % 2 == 0:
        list_even.append(num)
        print(num, end=' ')

Enter the start of range: 1
Enter the end of range: 10


2 4 6 8 10 
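For comparison, the same even numbers can be collected with a *list comprehension*, a compact form of a `for` loop with an `if` condition. This sketch uses fixed `start` and `end` values instead of `input()` so that it runs without prompting:

```python
start = 1
end = 10

# [expression for item in iterable if condition]
list_even = [num for num in range(start, end + 1) if num % 2 == 0]
print(list_even)  # [2, 4, 6, 8, 10]
```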

In [9]:
#Using the code above, print all the even numbers between 0 and 100

In [1]:
students = [('Isaac','Newton'),('Mark', 'Twain'), ('Mike','Kupper'), ('Nancy','Frank')]
physics_grades = [100, 40,70, 90]
English_grades = [80, 100, 50, 70]
#Now I want to print for each student the first, last names and grades
print('Student Name', students[0], 'Physics Grade:', physics_grades[0], 'English Grade:', English_grades[0])
#let's print the first and last name in a nicer way


Student Name ('Isaac', 'Newton') Physics Grade: 100 English Grade: 80


In [9]:
print('Student Name:', students[0][0], students[0][1], 'Physics Grade:', physics_grades[0], 
      'English Grade:', English_grades[0])

Student Name: Isaac Newton Physics Grade: 100 English Grade: 80


In [10]:
#Let's create a variable that carries the index of the element we want to access
i = 2
print('Student Name', students[i][0], students[i][1], 'Physics Grade', physics_grades[i], 
      'English Grade', English_grades[i])
print("length of students list is", len(students))

Student Name Mike Kupper Physics Grade 70 English Grade 50
length of students list is 4


In [19]:
# now I want the computer to automatically change this index
#this is done through loops

In [11]:
#Ex.1
for i in range(len(students)):
    print('Student Name', students[i][0], students[i][1], 'Physics Grade', physics_grades[i], 
      'English Grade', English_grades[i])
#please note the indentation

Student Name Isaac Newton Physics Grade 100 English Grade 80
Student Name Mark Twain Physics Grade 40 English Grade 100
Student Name Mike Kupper Physics Grade 70 English Grade 50
Student Name Nancy Frank Physics Grade 90 English Grade 70


In [12]:
#now I want the computer to automatically change this index
#this is done through loops
for i in range(len(physics_grades)):
    print(i)

0
1
2
3


In [41]:
students = [('Isaac','Newton'),('Mark', 'Twain'), ('Mike','Kupper'), ('Nancy','Frank')]
physics_grades = [100, 40,70, 90]
English_grades = [80, 100, 50, 70]

In [16]:
#Ex.2 Provide the code that will give the following outcome
for i in range(len(students)):
    print(i, students[i][0], students[i][1], physics_grades[i], English_grades[i])

0 Isaac Newton 100 80
1 Mark Twain 40 100
2 Mike Kupper 70 50
3 Nancy Frank 90 70


In [17]:
#Ex.3 same as before, but the numbering starts at 1 and ends at 4
for i in range(len(students)):
    print(i+1, students[i][0], students[i][1], physics_grades[i], English_grades[i])

1 Isaac Newton 100 80
2 Mark Twain 40 100
3 Mike Kupper 70 50
4 Nancy Frank 90 70


#### Three different ways to iterate over the list

In [19]:
for i in students:
    print(i)

('Isaac', 'Newton')
('Mark', 'Twain')
('Mike', 'Kupper')
('Nancy', 'Frank')


In [18]:
for i in range(len(students)):
    print(students[i])

('Isaac', 'Newton')
('Mark', 'Twain')
('Mike', 'Kupper')
('Nancy', 'Frank')


In [20]:
for i in range(4):
    print(students[i])

('Isaac', 'Newton')
('Mark', 'Twain')
('Mike', 'Kupper')
('Nancy', 'Frank')


The `zip()` function returns a zip object, which is an iterator of tuples: the first items of all the passed iterables are paired together, then the second items are paired together, and so on.
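To see the pairing that `zip()` produces, we can turn the zip object into a list. This is a small sketch with made-up lists; it also shows that `zip()` stops as soon as the shortest input runs out.

```python
first_names = ['Isaac', 'Mark', 'Mike']
grades = [100, 40, 70]

# zip() pairs the first items, then the second items, and so on
pairs = list(zip(first_names, grades))
print(pairs)  # [('Isaac', 100), ('Mark', 40), ('Mike', 70)]

# With inputs of different lengths, zip() stops at the shortest one
short_pairs = list(zip(first_names, [1, 2]))
print(short_pairs)  # [('Isaac', 1), ('Mark', 2)]
```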

In [21]:
# But what if we want to loop over several lists at once?
for i,j, k in zip(students, physics_grades, English_grades):
    print('Student Name', i[0], i[1], 'Physics Grade', j, 
      'English Grade', k)

Student Name Isaac Newton Physics Grade 100 English Grade 80
Student Name Mark Twain Physics Grade 40 English Grade 100
Student Name Mike Kupper Physics Grade 70 English Grade 50
Student Name Nancy Frank Physics Grade 90 English Grade 70


- Combining `enumerate()` and `zip()`

In [56]:
names = ['Alice', 'Bob', 'Charlie']
ages = [24, 50, 18]

for i, (name, age) in enumerate(zip(names, ages)):
    print(i, name, age)

0 Alice 24
1 Bob 50
2 Charlie 18


In [58]:
countries = ['UAE', 'Qatar', 'Kuwait', 'Bahrain', 'Oman', 'Singapore', 'Saudi Arabia', 'Jordan']
immigr_per = [88, 77, 73, 55, 46, 43, 39, 34]  # immigrant percentage, one value per country
print(countries)

['UAE', 'Qatar', 'Kuwait', 'Bahrain', 'Oman', 'Singapore', 'Saudi Arabia', 'Jordan']


In [59]:
for i, (country, per) in enumerate(zip(countries, immigr_per)):
    print(i, country, per)

0 UAE 88
1 Qatar 77
2 Kuwait 73
3 Bahrain 55
4 Oman 46
5 Singapore 43
6 Saudi Arabia 39
7 Jordan 34


Write a dice-rolling game for 2 players: in every round each player rolls a die, and the game is over as soon as one of the players reaches a total of 20 or more. Hint: use `randint(1, 6)`.

In [30]:
#Ex.4
import random

p1 = 0
p2 = 0

while(p1<20 and p2<20):
    p1 = p1 + random.randint(1,6)
    p2 = p2 + random.randint(1,6)
else:
    print('player 1 scored',p1)
    print('player 2 scored',p2)

player 1 scored 14
player 2 scored 22


Write a while loop that runs until a random number greater than 0.5 is generated.

In [42]:
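#Ex.5
# Note: this import replaces the name `random` (the module imported in Ex.4) with the random() function
from random import random
x = 0
while x <= 0.5:  # keep looping until x is larger than 0.5
    x = random()  # random() returns a float between 0.0 (inclusive) and 1.0 (exclusive)
    print(x)
else:
    print("loop is finished")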

0.17933676112736452
0.43171689135511027
0.37498611598466136
0.8176557954737931
loop is finished


In [32]:
# Sometimes you don't know in advance how many times a loop will run.
# In that case a for loop is not appropriate and you need a while loop.
stock = 10
while(stock>0):
    print('the items in stock', stock)
    stock = stock - 1
else:
    print('no stock available')

the items in stock 10
the items in stock 9
the items in stock 8
the items in stock 7
the items in stock 6
the items in stock 5
the items in stock 4
the items in stock 3
the items in stock 2
the items in stock 1
no stock available


In [62]:
# Now we want to design a game where 2 players roll dice
# The winner is the first player who rolls two 6s
# Because rolling dice is a random process, we need to
# generate a random number between 1 and 6 for each player
# To do that in Python we use the randint() function from the standard-library random module
# The keyword import makes functions from other modules available

In [44]:
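# A while loop keeps running as long as its condition is True.
# A "while True" loop has a condition that is always true, so it only stops
# when a break statement inside the loop executes.
# In this version a player's counter is reset to 0 after any roll that is not a 6,
# so a player needs two 6s in a row to win.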

from random import randint
count_6_player1 = 0
count_6_player2 = 0

counts = 0
while(True):
    counts+=1 #counts=counts+1
    player1 = randint(1, 6)
    player2 = randint(1, 6)
    if player1 == 6:
        count_6_player1 = count_6_player1 + 1

    else:
        count_6_player1 = 0
    
    if player2 == 6:
        count_6_player2 = count_6_player2 + 1

    else:
        count_6_player2 = 0
        
    if count_6_player1==2:
        print("Player 1 won after", counts, 'iterations')
        break
    if count_6_player2==2:
        print("Player 2 won after", counts, 'iterations')
        break

Player 1 won after 53 iterations


In [60]:
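# The same game with print statements
# Note: here the counters are never reset, so a player wins after two 6s in total (not necessarily in a row)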
from random import randint
count_6_player1 = 0
count_6_player2 = 0
trials = 0
while(count_6_player1 !=2 and count_6_player2!=2):
    trials+=1
    player1 = randint(1, 6)
    player2 = randint(1, 6)
    print('player one', player1, 'player two', player2)
    if player1 == 6:
        count_6_player1 = count_6_player1 + 1
        print("Player 1 scored")
    if player2 == 6:
        count_6_player2 = count_6_player2 + 1
        print("Player 2 scored")    
print('Game over after',trials,'trials')

player one 2 player two 2
player one 3 player two 2
player one 2 player two 2
player one 6 player two 5
Player 1 scored
player one 1 player two 5
player one 4 player two 6
Player 2 scored
player one 2 player two 3
player one 3 player two 5
player one 5 player two 5
player one 5 player two 5
player one 3 player two 3
player one 1 player two 4
player one 6 player two 5
Player 1 scored
Game over after 13 trials


## Control Statements

A for-loop iterates over each item in a sequence. Sometimes it is desirable to stop the loop early or to skip certain iterations. Python has special statements for this: `break`, `continue` and `pass`.

A `break` statement stops the loop immediately and exits it.

In [49]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta', 'Groningen', 'New York']
for city in cities:
    print(city)
    if city == 'Los Angeles':
        print('I found Los Angeles')
        break

San Francisco
Los Angeles
I found Los Angeles


A `continue` statement skips the rest of the current iteration and moves on to the next one.

In [50]:
for city in cities:
    if city == 'Los Angeles':
        continue
    print(city)

San Francisco
New York
Atlanta
Groningen
New York
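The `pass` statement does nothing at all: it is a placeholder for places where Python requires a statement but you have nothing to write yet. A minimal sketch with a made-up list (`city_names`, `visited` and `not_implemented_yet` are illustrative names only):

```python
city_names = ['San Francisco', 'Los Angeles', 'New York']
visited = []

for city in city_names:
    if city == 'Los Angeles':
        pass  # does nothing; unlike continue, the rest of the loop body still runs
    visited.append(city)

print(visited)  # ['San Francisco', 'Los Angeles', 'New York']


def not_implemented_yet():
    pass  # placeholder body, to be filled in later


print(not_implemented_yet())  # None
```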


In [35]:
for x in range(1, 10):
    if x%2 == 0:
        print('{} is divisible by 2'.format(x))
    else:
        print('{} is not divisible by 2'.format(x))

1 is not divisible by 2
2 is divisible by 2
3 is not divisible by 2
4 is divisible by 2
5 is not divisible by 2
6 is divisible by 2
7 is not divisible by 2
8 is divisible by 2
9 is not divisible by 2


![Anatomy of a function.](img/Function_anatomy-400.png)

# 5.Functions

A function is a block of organized, reusable code that can make your programs more effective, easier to read, and simpler to manage. You can think of functions as small self-contained programs that perform a specific task and that you can use repeatedly in your code.

Functions are useful because they let us capture the logic of our code once and then run it with different inputs, without having to write the same code again and again.

One of the basic principles in good programming is "do not repeat yourself".
In other words, you should avoid having duplicate lines of code in your scripts.
Functions are a good way to avoid such duplication, and they can save you a lot of time and effort: you don't need to tell the computer what to do every time it performs a common task, such as converting temperatures from Fahrenheit to Celsius.
We have already used some functions during the course, such as `print()`, which is a built-in function in Python.

A function is defined using the `def` keyword:

```
def my_function():
    ....
    ....
    return something
```



![functions.png](attachment:f8fe9926-73fc-4b3f-a9eb-7cec2be130b3.png)

The function definition shown above consists of the following components:

- The keyword `def`, which marks the start of the function header.
- A function name to uniquely identify the function.
- Parameters (arguments) through which we pass values to a function (optional).
- A colon (`:`) to mark the end of the function header.
- A documentation string (docstring) to describe what the function does (optional).
- One or more valid Python statements that make up the function body. Statements must have the same indentation level (usually 4 spaces).
- A `return` statement to return a value from the function (optional).
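To put these components together, here is a small sketch of a hypothetical function, `rectangle_area`, that has parameters, a docstring, a body and a `return` statement. It also shows that a function without a `return` statement, like the hypothetical `say_hello`, gives back the special value `None`.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    area = width * height  # the body is indented by 4 spaces
    return area            # hand the result back to the caller


def say_hello():
    print("Hello!")        # no return statement here


print(rectangle_area(3, 4))    # 12
print(rectangle_area.__doc__)  # the docstring is stored with the function
result = say_hello()           # prints Hello!
print(result)                  # None
```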

In [2]:
def greet(name):
    """Greet the person passed in as a parameter."""
    print("Hello, " + name + ". Good morning!")
greet('Nick')

Hello, Nick. Good morning!


In [5]:
def greet(name):
    return 'Hello ' + name

print(greet('Universe'))

Hello Universe


In [61]:
name = 'Alice'
def example():
    print(name)
    name = 'Bob'
example()

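UnboundLocalError: local variable 'name' referenced before assignment

"Local variable referenced before assignment" occurs when we reference a local variable before assigning a value to it in a function. Because `name` is assigned inside `example()`, Python treats `name` as a local variable throughout the whole function body, so the `print(name)` line fails even though a global variable `name` exists. (Python 3.11 and later word this message differently: "cannot access local variable 'name' where it is not associated with a value".) To solve the error, mark the variable as global in the function definition, e.g. `global my_var`.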

In [63]:
name = 'Alice'
def example():
    global name
    print(name)
    name = 'Bob'
example()
example()

Alice
Bob
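Using `global` works, but functions that change global variables can be hard to follow in larger programs. A common alternative is to pass the value in as a parameter and return the new value. A sketch with a hypothetical `rename` function:

```python
name = 'Alice'


def rename(current_name, new_name):
    """Print the current name and return the new one."""
    print(current_name)
    return new_name


name = rename(name, 'Bob')  # prints Alice
print(name)                 # Bob
```

The function no longer needs to know about the global variable: the caller decides what to do with the returned value.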


## 5.1 Calling functions
Calling our self-defined function is no different from calling any other function such as print(). You need to call it with its name and provide your value(s) as the required parameter(s) inside the parentheses.

When we call the function, the values we pass to it are assigned to the corresponding parameter variables so that we can use them inside the function (e.g., the parameter `temp` in the example below). Inside the function, we use a `return` statement to define the value that should be given back when the function is called.

Defining a function does nothing other than make it available for use in our notebook. In order to use the function, we need to call it.

Now let's try using our function. Here, we define a variable `freezing_point` that holds the temperature in degrees Fahrenheit we get when calling our function with 0°C (the temperature at which water freezes). We can then print that value to confirm. We should get a temperature of 32°F.

In [12]:
def celsius_to_fahr(temp):
    return 9 / 5 * temp + 32
freezing_point = celsius_to_fahr(0.0)
print(f"The freezing point of water in Fahrenheit is {freezing_point}")

The freezing point of water in Fahrenheit is 32.0


We can do the same thing with the boiling point of water in degrees Celsius (100°C). Just like with other functions, we can use our new function directly within something like the print() function to print out the boiling point of water in degrees Fahrenheit.

In [69]:
print(f"The boiling point of water in Fahrenheit is {celsius_to_fahr(100)}")

The boiling point of water in Fahrenheit is 212.0


Let's define a second function, `kelvins_to_celsius()`, and use it in the same way by defining a new variable `absolute_zero` that is the Celsius temperature of 0 kelvins. Note that we can also use the parameter name `temp_kelvins` when calling the function to explicitly state which variable value is being used. Again, let's print the result to confirm everything works.

In [20]:
def kelvins_to_celsius(temp_kelvins):
    return temp_kelvins - 273.15

# Call the function outside its body: code placed after return never runs
absolute_zero = kelvins_to_celsius(temp_kelvins=0)
print(f"Absolute zero in Celsius is {absolute_zero}")

Absolute zero in Celsius is -273.15


#### Functions can also take multiple arguments, and parameters can have default values. Let's see some examples

In [16]:
def shipping_address(cities="Groningen", country='NL'):
    print("shipping to", cities, country)
shipping_address()

shipping to Groningen NL


In [17]:
shipping_address(cities="Amsterdam")

shipping to Amsterdam NL


In [18]:
shipping_address(cities="Hamburg", country="Germany")

shipping to Hamburg Germany


In [19]:
shipping_address("Germany", "Hamburg")

shipping to Germany Hamburg
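In the last call the arguments were matched to the parameters by position, so `cities` received `"Germany"` and `country` received `"Hamburg"`. Passing arguments by keyword avoids this mix-up, because keyword arguments are matched by name. In the sketch below the function is re-defined so that it also returns its message (an addition, to make the result easy to check):

```python
def shipping_address(cities="Groningen", country='NL'):
    message = f"shipping to {cities} {country}"
    print(message)
    return message


# Positional arguments are matched by order: first value -> cities, second -> country
shipping_address("Germany", "Hamburg")                 # shipping to Germany Hamburg (mixed up)
shipping_address("Hamburg", "Germany")                 # shipping to Hamburg Germany

# Keyword arguments are matched by name, so their order does not matter
shipping_address(country="Germany", cities="Hamburg")  # shipping to Hamburg Germany
```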


- Another example: converting millimeters to inches

In [33]:
def mm_to_in(mm):    
    inches = mm / 25.4    
    return inches

In [54]:
precip_jan_mm = 17.8
mm_to_in(mm=precip_jan_mm)  # same result as mm_to_in(mm=17.8)
#print(f"The precipitation for January was: {mm_to_in(precip_jan_mm):.2f} inches")

0.7007874015748032

In [52]:
mm

25.4

In [49]:
inches

NameError: name 'inches' is not defined

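We get an error here because variables created inside a function, such as the parameter `mm` and the variable `inches`, are *local*: they exist only while the function runs and cannot be accessed from outside it. Whatever value we pass to the function is called `mm` inside it, no matter what it is called in the outside world. (If `mm` displays a value, as in the cell above, that is a separate variable named `mm` that was defined outside the function earlier in the session, not the function's parameter.)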

#### Importing scripts

Open a new Python file and paste in the code of the following function:
- mm_to_in

Save the file as `inch_converter.py` in the same folder as this notebook.
Run the `%ls` command and check that `inch_converter.py` appears in the list.
If it does, you are in the right directory!
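For reference, a minimal `inch_converter.py` might look like the sketch below. It contains only the `mm_to_in` function from above; the docstrings are an addition.

```python
# inch_converter.py
"""Helper functions for converting lengths to inches."""


def mm_to_in(mm):
    """Convert a length in millimeters to inches (1 inch = 25.4 mm)."""
    inches = mm / 25.4
    return inches
```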

In [12]:
# %ls

In [9]:
from inch_converter import mm_to_in

In [13]:
print(f"1 meter corresponds to {mm_to_in(1000):.2f}")

1 meter corresponds to 39.37


#### Importing multiple functions

It is also possible to import more functions at the same time by listing and separating them with a comma.

```python
from my_script import func1, func2, func3
```

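Sometimes it is useful to import the whole script and all of its functions at once. Let's use a different `import` statement and test that the function still works. This time we type `import inch_converter as ic`; the `as ic` part gives the module a short alias, so its functions are accessed as `ic.function_name()`.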

In [14]:
import inch_converter as ic

In [15]:
print(f"1 meter corresponds to {ic.mm_to_in(1000):.2f}")

1 meter corresponds to 39.37


# 6.The Python Standard Library (optional)

Python comes with many built-in modules that offer ready-to-use solutions to common programming problems. To use these modules, you must use the `import` keyword. Once imported in your Python script, you can use the functions provided by the module in your script.

We will use the built-in `math` module that allows us to use advanced mathematical functions.

In [3]:
import math

You can also import specific functions or constants from a module, as shown below:

In [None]:
from math import pi
print(pi)
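A few more functions from the `math` module, shown as a short sketch (`math.radians()` is also used in the distance calculation below). With `import math` you write the module name in front of each function; with `from math import ...` you use the name directly.

```python
import math
from math import pi, sqrt

print(math.sqrt(16))      # 4.0
print(sqrt(16))           # 4.0: the same function, imported by name
print(math.radians(180))  # 180 degrees in radians, approximately 3.14159
print(math.degrees(pi))   # pi radians in degrees, approximately 180
print(math.cos(0))        # 1.0
```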

## Calculating Distance

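Given 2 points with their latitude and longitude coordinates, the Haversine formula calculates the great-circle distance between them, i.e. the shortest distance along the surface, assuming that the Earth is a sphere. Because the Earth's radius is given in meters in the code below, the result is in meters.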
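For reference, these are the equations implemented in the `haversine_distance` function, written with $\varphi$ for latitude and $\lambda$ for longitude (both converted to radians), $\Delta\varphi = \varphi_2 - \varphi_1$, $\Delta\lambda = \lambda_2 - \lambda_1$, and $R$ for the Earth's radius:

$$
\begin{aligned}
a &= \sin^2\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2 \, \operatorname{atan2}\left(\sqrt{a}, \sqrt{1-a}\right) \\
d &= R \, c
\end{aligned}
$$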

The formula is simple enough to be implemented in a spreadsheet too. If you are curious, see [my post](https://spatialthoughts.com/2013/07/06/calculate-distance-spreadsheet/) about using this formula for calculating distances in a spreadsheet.

We can write a function that accepts a pair of origin and destination coordinates and computes the distance.

In [None]:
san_francisco = (37.7749, -122.4194)
new_york = (40.661, -73.944)

In [None]:
def haversine_distance(origin, destination):
    """Return the great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371000  # mean Earth radius in meters
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2)
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = radius * c
    return distance

In [None]:
distance = haversine_distance(san_francisco, new_york)
print(distance/1000, 'km')

## Discover Python Easter Eggs

Programmers love to hide secret jokes in their programs for fun. These are known as *Easter Eggs*. Python has an easter egg that you can see when you import the module named `this`. Try writing the command `import this` below.

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Let's try one more. Try importing the `antigravity` module.

Here's a list of more [easter eggs in Python](https://towardsdatascience.com/7-easter-eggs-in-python-7765dc15a203).

## Exercise (optional)

Find the coordinates of 2 cities near you and calculate the distance between them by calling the `haversine_distance` function below.

In [None]:
import math


def haversine_distance(origin, destination):
    """Return the great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371000  # mean Earth radius in meters
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2)
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = radius * c
    return distance

# city1 = (lat1, lng1)
# city2 = (lat2, lng2)
# call the function and print the result

# 7. Third-party Modules

Python has a thriving ecosystem of third-party modules (i.e. libraries or packages): hundreds of thousands of them are available for you to install and use.

## Installing third-party libraries

Python comes with a package manager called `pip`. It can install all the packages listed at [PyPI (Python Package Index)](https://pypi.org/). To install a package using pip, you need to run a command like the following in a Terminal or CMD prompt.

`pip install <package name>`

For this course, we are using the Anaconda platform, which comes with its own package manager called `conda`. You can use Anaconda Navigator to search for and install packages, or run a command like the following in a Terminal or CMD prompt.

`conda install <package name>`

See this [comparison of pip and conda](https://www.anaconda.com/blog/understanding-conda-and-pip) to understand the differences.

## Calculating Distance (New York - Groningen)

We have already installed the `geopy` package in our environment. `geopy` comes with functions that have already implemented many distance calculation formulae.

- `distance.great_circle()`: shortest distance between two points on the surface of a sphere 
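- `distance.geodesic()`: shortest distance on the surface of an ellipsoidal model of the Earth (WGS-84 by default); recent versions of geopy compute it with Karney's algorithm rather than Vincenty's formula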

In [4]:
from geopy import distance

new_york = (40.89145, -74.044281)
martini_tower = (53.21900, 6.56866)

great_circle_distance = distance.great_circle(new_york, martini_tower)
ellipsoid_distance = distance.geodesic(new_york, martini_tower, ellipsoid='WGS-84')

print(f"The great-circle distance is: {great_circle_distance}")
print(f"The ellipsoid (geodesic) distance is: {ellipsoid_distance}")
# The .km attribute gives the distance as a plain number, e.g.:
#print(ellipsoid_distance.km)

The great-circle distance is: 5922.630831657959 km
The ellipsoid (geodesic) distance is: 5938.767333776888 km


## Exercise

Repeat the distance calculation exercise from the previous module but perform the calculation using the geopy library.

In [None]:
from geopy import distance

# city1 = (lat1, lng1)
# city2 = (lat2, lng2)
# call the geopy distance function and print the great circle and ellipsoid distance

# 8. Reading Files

Python provides built-in functions for reading and writing files.  

To read a file, we must know the path of the file on the disk. Python has a module called `os` with helper functions for dealing with the operating system. The advantage of using the `os` module is that the code you write will work without changes on any supported operating system.

In [11]:
import os

To open a file, we need to know its path. We will now open and read the file `worldcities.csv` from your data package, which is located in the `data/` folder. We can construct the relative path to the file using the `os.path.join()` function.

In [None]:
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)
print(path)

To open the file, use the built-in `open()` function. We specify the *mode* as `r`, which means read-only. If we wanted to write a new file (or overwrite an existing one), we would open it with `w` mode.

Our input file also contains Unicode characters, so we specify `UTF-8` as the encoding.

The `open()` function returns a file object. We can call its `readline()` method to read the content of the file one line at a time.

It is good practice to always close the file when you are done with it. To close the file, we call the `close()` method on the file object.

In [None]:
f = open(path, 'r', encoding='utf-8')
print(f.readline())
print(f.readline())
f.close()
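Each line returned by `readline()` still ends with a newline character (`\n`); that is why `print()` shows an extra empty line after each line. At the end of the file, `readline()` returns an empty string. A sketch that uses `io.StringIO`, an in-memory stand-in for a file, with made-up content so it runs without the data package:

```python
import io

# io.StringIO behaves like a text file opened for reading (made-up content)
f = io.StringIO("city,country\nGroningen,Netherlands\n")

first = f.readline()
second = f.readline()
third = f.readline()  # nothing left to read
f.close()

print(repr(first))         # 'city,country\n'
print(first.rstrip('\n'))  # city,country  (newline removed before printing)
print(repr(third))         # '' (an empty string signals the end of the file)
```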

Calling `readline()` for each line of the file is tedious. Ideally, we want to loop through all the lines in the file. You can iterate through the file object as shown below.

We can loop through each line of the file and increase the `count` variable by 1 for each iteration of the loop. At the end, the count variable's value will be equal to the number of lines in the file.

In [None]:
f = open(path, 'r', encoding='utf-8')

count = 0
for line in f:
    count += 1
f.close()
print(count)

## Exercise

Print the first 5 lines of the file.

- Hint: use a `break` statement

In [None]:
import os
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)

# Add code to open the file and read first 5 lines

# 9.Reading CSV Files

Comma-separated values (CSV) is one of the most common text-based file formats for sharing geospatial data. Each line of the file holds one data record, with individual *columns* separated by a comma.

In general, the separator character is called a delimiter. Other popular delimiters include the tab (`\t`), colon (`:`) and semicolon (`;`) characters.

Reading a CSV file properly requires us to know which delimiter is used, as well as the quote character that surrounds field values containing spaces or the delimiter character. Since reading delimited text files is a very common operation, and handling all the corner cases can be tricky, Python comes with a built-in module called `csv` for easy reading and writing of CSV files. To use it, you just have to import it.
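To see why the quote character matters, the sketch below parses a small made-up, semicolon-delimited text in which one field contains the delimiter inside quotes. It uses `csv.reader()`, which returns each row as a list, and `io.StringIO` in place of a real file.

```python
import csv
import io

data = 'city;description\nGroningen;"north; student city"\n'

reader = csv.reader(io.StringIO(data), delimiter=';', quotechar='"')
rows = list(reader)
print(rows)  # [['city', 'description'], ['Groningen', 'north; student city']]

# A naive split on ';' breaks the quoted field apart
naive = data.splitlines()[1].split(';')
print(naive)  # ['Groningen', '"north', ' student city"']
```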

In [12]:
import csv

The preferred way to read CSV files is to use `csv.DictReader()`, which reads each row and creates a dictionary from it, with column names as *keys* and column values as *values*. Let's see how to read a file using `csv.DictReader()`.
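A small sketch of what `DictReader` produces, using one made-up row with column names that appear in this notebook (`city_ascii`, `lat`, `lng`, `country`); the real `worldcities.csv` may contain more columns. Note that every value is read as a string.

```python
import csv
import io

# Made-up data using column names that appear in this notebook
data = "city_ascii,lat,lng,country\nGroningen,53.2192,6.5667,Netherlands\n"

reader = csv.DictReader(io.StringIO(data))
rows = list(reader)  # one dictionary per data line
print(rows[0]['city_ascii'], rows[0]['country'])  # Groningen Netherlands

# All values are strings; convert numbers explicitly before calculating
lat = float(rows[0]['lat'])
print(lat)  # 53.2192
```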

In [None]:
import os
data_pkg_path = 'data'
filename = 'worldcities.csv'
path = os.path.join(data_pkg_path, filename)

In [None]:
f = open(path, 'r', encoding='utf-8')
csv_reader = csv.DictReader(f, delimiter=',', quotechar='"')
print(csv_reader)
f.close()

## Using `enumerate()` function

When iterating over an object, we often need a counter. We saw in the previous example how to use a variable like `count` and increase it with every iteration. There is an easier way to do this: the built-in `enumerate()` function.

In [None]:
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']
for x in enumerate(cities):
    print(x)
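`enumerate()` starts counting at 0 by default; its optional `start` argument sets a different first index. A short sketch:

```python
cities = ['San Francisco', 'Los Angeles', 'New York', 'Atlanta']

print(list(enumerate(cities)))
# [(0, 'San Francisco'), (1, 'Los Angeles'), (2, 'New York'), (3, 'Atlanta')]

# Unpack each (index, item) tuple and start counting at 1
for number, city in enumerate(cities, start=1):
    print(number, city)  # 1 San Francisco, 2 Los Angeles, ...
```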

We can use `enumerate()` on any iterable object; with each iteration it gives a tuple containing the index and the current item. Let's use it to print the first 5 rows from the DictReader object.

In [None]:
f = open(path, 'r', encoding='utf-8')
csv_reader = csv.DictReader(f, delimiter=',', quotechar='"')
for index, row in enumerate(csv_reader):
    print(row)
    if index == 4:
        break
f.close()

## Using `with` statement

The code for file handling requires that we open a file, do something with the file object and then close the file. That is tedious, and it is easy to forget to call `close()` on the file. If the processing code encounters an error, the file may not be closed properly, which can result in bugs, especially when writing files.

The preferred way to work with file objects is the `with` statement. It results in simpler and cleaner code, and it also ensures that file objects are closed properly in case of errors.

As you see below, we open the file and use the file object `f` in a `with` statement. Python takes care of closing the file when the execution of code within the statement is complete.

In [None]:
with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
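To check that `with` really closes the file, we can look at the file object's `closed` attribute. The sketch below writes and reads a small file in a temporary directory created with the standard-library `tempfile` module, so it does not touch your data folder (the file name `example.txt` is only illustrative):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:  # deleted automatically afterwards
    tmp_path = os.path.join(tmp_dir, 'example.txt')

    with open(tmp_path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')
        closed_inside = f.closed  # False: still inside the with block

    closed_after = f.closed       # True: closed when the with block ended

    with open(tmp_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

print(closed_inside, closed_after)  # False True
print(lines)                        # ['first line\n', 'second line\n']
```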

## Filtering rows

We can use conditional statements while iterating over the rows to select and process rows that meet certain criteria. Let's count how many cities from a particular country are present in the file.

Replace the `home_country` variable with your home country below.

In [None]:
home_country = 'India'
num_cities = 0

with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)

    for row in csv_reader:
        if row['country'] == home_country:
            num_cities += 1
            
print(num_cities)

### End of compulsory material ==================================

# Extra material

## Calculating distance

Let's apply the skills we have learnt so far to solve a complete problem. We want to read the `worldcities.csv` file, find all cities within a home country, calculate the distance to each city from a home city and write the results to a new CSV file.

First we find the coordinates of our selected `home_city` in the file. Replace the `home_city` below with your hometown or a large city within your country. Note that we are using the `city_ascii` field for city name comparison, so make sure the `home_city` variable contains the ASCII version of the city name.

In [None]:
home_city = 'Bengaluru'

home_city_coordinates = ()

with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        if row['city_ascii'] == home_city:
            lat = row['lat']
            lng = row['lng']
            home_city_coordinates = (lat, lng)
            break
        
print(home_city_coordinates)

Now we can loop through the file, find the cities in the chosen home country and call the `geopy.distance.geodesic()` function to calculate the distance. In the code below, we only compute the first 5 matches.

In [None]:
from geopy import distance

counter = 0
with open(path, 'r', encoding='utf-8') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        if (row['country'] == home_country and
            row['city_ascii'] != home_city):
            city_coordinates = (row['lat'], row['lng'])
            city_distance = distance.geodesic(
                city_coordinates, home_city_coordinates).km
            print(row['city_ascii'], city_distance)
            counter += 1
            
        if counter == 5:
            break

## Writing files

Instead of printing the results, let's write them to a new file. Similar to `csv.DictReader()`, there is a companion `csv.DictWriter()` class for writing files. We create a `csv_writer` object and then write rows to it using its `writerow()` method.

First we create an `output` folder to save the results. We check whether the folder exists and, if it doesn't, we create it.

In [None]:
output_dir = 'output'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

In [None]:
output_filename = 'cities_distance.csv'
output_path = os.path.join(output_dir, output_filename)

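# newline='' is recommended by the csv documentation to avoid extra blank lines on Windows
with open(output_path, mode='w', encoding='utf-8', newline='') as output_file: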
    fieldnames = ['city', 'distance_from_home']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    
    # Now we read the input file, calculate distance and
    # write a row to the output 
    with open(path, 'r', encoding='utf-8') as f:
        csv_reader = csv.DictReader(f)
        for row in csv_reader:
            if (row['country'] == home_country and
                row['city_ascii'] != home_city):
                city_coordinates = (row['lat'], row['lng'])
                city_distance = distance.geodesic(
                    city_coordinates, home_city_coordinates).km
                csv_writer.writerow(
                    {'city': row['city_ascii'],
                     'distance_from_home': city_distance}
                )
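The writing step on its own, as a self-contained sketch: `io.StringIO` stands in for the output file, and the rows use hypothetical city names with made-up distances. When writing a real CSV file, the `csv` module documentation recommends opening it with `newline=''`.

```python
import csv
import io

rows = [
    {'city': 'Town A', 'distance_from_home': 128.4},  # made-up values
    {'city': 'Town B', 'distance_from_home': 290.1},
]

# Stands in for open(output_path, 'w', encoding='utf-8', newline='')
buffer = io.StringIO()
csv_writer = csv.DictWriter(buffer, fieldnames=['city', 'distance_from_home'])
csv_writer.writeheader()      # writes the column names as the first line
for row in rows:
    csv_writer.writerow(row)  # each dictionary becomes one line

print(buffer.getvalue())
```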

Below is the complete code for our task of reading a file, filtering it, calculating distance and writing the results to a file.

In [None]:
import csv
import os
from geopy import distance

data_pkg_path = 'data'
input_filename = 'worldcities.csv'
input_path = os.path.join(data_pkg_path, input_filename)
output_filename = 'cities_distance.csv'
output_dir = 'output'
output_path = os.path.join(output_dir, output_filename)

if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    
home_city = 'Bengaluru'
home_country = 'India'

with open(input_path, 'r', encoding='utf-8') as input_file:
    csv_reader = csv.DictReader(input_file)
    for row in csv_reader:
        if row['city_ascii'] == home_city:
            home_city_coordinates = (row['lat'], row['lng'])
            break

with open(output_path, mode='w', encoding='utf-8', newline='') as output_file:
    fieldnames = ['city', 'distance_from_home']
    csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    csv_writer.writeheader()

    with open(input_path, 'r', encoding='utf-8') as input_file:
        csv_reader = csv.DictReader(input_file)
        for row in csv_reader:
            if (row['country'] == home_country and
                row['city_ascii'] != home_city):
                city_coordinates = (row['lat'], row['lng'])
                city_distance = distance.geodesic(
                    city_coordinates, home_city_coordinates).km
                csv_writer.writerow(
                    {'city': row['city_ascii'],
                     'distance_from_home': city_distance}
                )
print('Successfully written output file at {}'.format(output_path))

## Exercise

Replace the `home_city` and `home_country` variables with your own home city and home country and create a CSV file containing distance from your home city to every other city in your country.

## -End of document-