# Python cheatsheets and tutorials

1. [Python basics](https://www.pythoncheatsheet.org/)
2. [numpy](http://datacamp-community-prod.s3.amazonaws.com/da466534-51fe-4c6d-b0cb-154f4782eb54)
3. [pandas](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
4. [Basics, numpy and pandas](https://www.kaggle.com/lavanyashukla01/pandas-numpy-python-cheatsheet)
5. [matplotlib for plotting](https://github.com/matplotlib/cheatsheets#cheatsheets) 
6. [seaborn for plotting](https://seaborn.pydata.org/tutorial.html) 

# Python basics part 2

# Conditionals

Generally, conditionals look like

    if <test>:
        <Code run if test is valid>

or

    if <first test>:
        <Code run if the first test is valid.>
    elif <second test>:
        <Code run if the second test is valid>
    else:
        <Code run if neither test is valid>

In both cases the test statements are code segments that return a boolean value, often a test for equality or inequality. The `elif` and `else` statements are always optional; both, either, or none can be included.

In [1]:
x = 20

if x < 10:
    print('x is less than 10')
else:
    print('x is more than 10')

x is more than 10


#### Note: Indentation is very important

* Indentation in Python is typically *4 spaces*. Most programming text editors are smart about indentation and will convert TABs to four spaces. Jupyter notebooks do the same: they autoindent the line below a line with a trailing colon, and convert TABs to spaces.

* Indentation is commonly seen in functions, try-except code blocks, conditionals (here), `for` and `while` loops (covered later), etc. In a Jupyter notebook, just press `Enter` or `Return` after the keyword line (e.g., `def`, `if`, `for`, `while`) and the following line will be indented for you.

---
### *Exercise*

> Run the code in the following cell. Is this what you want?

> Add an `elif` statement to code that will print the correct message when x is 10.

---

In [8]:
x = 10
def check_value(x):
    if x < 10:
        print('x is less than 10')
    elif x == 10: 
        print('x is 10')
    else:
        print('x is more than 10')


In [9]:
check_value(x)


x is 10


### More on conditional: `and`, `or`

* For `and`: when all tests joined by `and` are true, the result will be true. If there is any test that is false, the result will be false.

* For `or`: when at least one of the tests joined by `or` is true, the result will be true. If all tests are false, the result will be false.

In [10]:
a = 3
b = 4

In [11]:
if (a > 2):
    print("a is bigger than 2")

a is bigger than 2


**using `and`:**

In [12]:
# both tests are true
if (a > 2 and b > 3):
    print("the result is true")
else:
    print("the result is false")

the result is true


In [13]:
# one condition is false
if (a > 4 and b > 3):
    print("the result is true")
else:
    print("the result is false")

the result is false


In [14]:
# both conditions are false
if (a > 4 and b > 5):
    print("the result is true")
else:
    print("the result is false")

the result is false


**using `or`:**

In [15]:
# both tests are true
if (a > 2 or b > 3):
    print("the result is true")
else:
    print("the result is false")

the result is true


In [16]:
# one condition is false
if (a > 4 or b > 3):
    print("the result is true")
else:
    print("the result is false")

the result is true


In [17]:
# both conditions are false
if (a > 4 or b > 5):
    print("the result is true")
else:
    print("the result is false")

the result is false


# Loops

Loops are one of the fundamental structures in programming. Loops allow you to iterate over each element in a sequence, one at a time, and do something with those elements.

### For loops

In [18]:
for i in [1, 2, 3]:
    print(i)

1
2
3


*Loop syntax*: Loops have a very particular syntax in Python; this syntax is one of the most notable features to Python newcomers. The format looks like (using for-loop as an example)

    for *element* in *sequence*:                # NOTE the colon at the end
        <some code that uses the *element*>     # the block of code that is looped over for each element
        <more code that uses the *element*>     # is indented four spaces (yes four! yes spaces!)
    
    <the code after the loop continues>         # the end of the loop is marked simply by unindented code
    
**Indentation is important**.

A simple example of the for-loop is to find the sum of the squares of the sequence 0 through 99:

**Note in the following codes:**

1. the `+=` operator is shorthand for adding to a variable in place: `sum_of_squares += n**2` is equivalent to `sum_of_squares = sum_of_squares + n**2`
2. the `**` operator raises a number to a power, so `n**2` is `n` squared
3. the `range` function yields a sequence of numbers from 0 up to but not including the number inside `range`. For example, `range(5)` yields the numbers 0 through 4 (**not 5!**)

In [19]:
sum_of_squares = 0

for n in range(100):
    sum_of_squares += n**2        

print(sum_of_squares)

328350
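
As a cross-check, the same total can be computed with the built-in `sum` function and with the closed-form formula for a sum of squares. This is only a sketch; the names `total_builtin`, `total_formula`, and `m` are illustrative and do not appear elsewhere in this notebook.

```python
# Sum of the squares of 0 through 99, computed three ways.
sum_of_squares = 0
for n in range(100):
    sum_of_squares += n**2      # same as sum_of_squares = sum_of_squares + n**2

# built-in sum applied to a generator expression
total_builtin = sum(n**2 for n in range(100))

# closed form for 0**2 + 1**2 + ... + m**2
m = 99                          # largest number in the sequence
total_formula = m * (m + 1) * (2 * m + 1) // 6

print(sum_of_squares, total_builtin, total_formula)
```

All three values should agree with the output above.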


**More on range (see this [tutorial](https://www.datacamp.com/community/tutorials/python-range-function?utm_source=adwords_ppc&utm_medium=cpc&utm_campaignid=14051819510&utm_adgroupid=&utm_device=c&utm_keyword=&utm_matchtype=&utm_network=x&utm_adpostion=&utm_creative=&utm_targetid=&utm_loc_interest_ms=&utm_loc_physical_ms=1026305&gclid=Cj0KCQiAieWOBhCYARIsANcOw0zo4UpTfmMB8VpGxUMp-G9zZDdV5Htglp9FGF_9sobW9aGkq_FnjxIaAmFoEALw_wcB))**

* The range function is a very popular and widely used function in Python, especially when you are working with predominantly **for loops**.

* The `range()` function returns a **range object**, a lazy sequence that stores only the start, stop, and step values. It therefore uses the same small amount of memory no matter how many numbers it represents, unlike a list or tuple, which stores every element.

* The `range()` function can be called in three different ways, depending on how many arguments you give it:
    1. range(start, stop, step) : This generates the sequence based on the start and stop value (not including the stop value), with a step size of step.
    2. range(start, stop) : This generates the sequence based on the start and stop value (not including the stop value), with a step size of 1.
    3. range(stop): This generates the sequence based on the start (0) and stop value (not including the stop value), with a step size of 1.
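
To see the memory point above in practice, the sketch below compares a `range` with the equivalent list using `sys.getsizeof`. The exact byte counts depend on the Python version and platform, so none are quoted here; the names `big_range` and `big_list` are illustrative.

```python
import sys

big_range = range(1_000_000)        # stores only start, stop, and step
big_list = list(big_range)          # stores a reference to each of the million integers

print(sys.getsizeof(big_range))     # small, regardless of the length of the range
print(sys.getsizeof(big_list))      # grows with the number of elements

# A range still behaves like a sequence: it has a length and supports indexing.
print(len(big_range))               # 1000000
print(big_range[10])                # 10
print(list(range(2, 10, 2)))        # [2, 4, 6, 8]
```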

In [20]:
# some examples
for i in range(2,10,2):
    print(i)

2
4
6
8


In [21]:
# some examples
for i in range(2,10):
    print(i)

2
3
4
5
6
7
8
9


In [22]:
# some examples
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [23]:
# This cell generates the same result, but is not recommended
# compared with range(10), this [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] will consume more memory
for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(i)


0
1
2
3
4
5
6
7
8
9


---
### *Exercise*

> Create a for loop that prints the numbers from 10 down to 1.

---

In [34]:
for i in range(10,0,-1):
    print(i)

10
9
8
7
6
5
4
3
2
1


You can iterate over any sequence (e.g., a range, a list, or **even a string**)! Note that a string is also a sequence, so the loop visits it one character at a time.

In [36]:
for element in "hello, world":
    print(element)

h
e
l
l
o
,
 
w
o
r
l
d


**Tips**: If you only need the element (not the indices) inside the sequence, it is better to **iterate over the sequence itself** than to **loop over the indices of that sequence**. 

The following two examples give the same result, but the first is much more readable and easily understood than the second. **Do the first whenever possible**.

In [37]:
# THIS IS BETTER THAN THE NEXT CODE BLOCK. DO IT THIS WAY.
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''  # this initializes a string which we can then add onto
for word in words:
    sentence += word + ' ' # note '+=' is equivalent to sentence = sentence + word + ' '

print(sentence)

the quick brown fox jumped over the lazy dog 


In [38]:
# DON'T DO IT THIS WAY IF POSSIBLE, DO IT THE WAY IN THE PREVIOUS CODE BLOCK.
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''
for i in range(len(words)):
    sentence += words[i] + ' '

print(sentence)

the quick brown fox jumped over the lazy dog 


Sometimes you want to iterate over both the elements and their indices in a sequence. One way to do that is the `enumerate` function:

    enumerate(<sequence>)

This returns an iterator that yields **tuples** containing two items. The first item in each tuple is the index, the second is the element itself. It is commonly used in `for` loops, like

In [39]:
for tuple_word in enumerate(words):
    print('The iteration is', tuple_word)

The iteration is (0, 'the')
The iteration is (1, 'quick')
The iteration is (2, 'brown')
The iteration is (3, 'fox')
The iteration is (4, 'jumped')
The iteration is (5, 'over')
The iteration is (6, 'the')
The iteration is (7, 'lazy')
The iteration is (8, 'dog')


**If you want to separate the index and the element:**

In [40]:
for idx, word in enumerate(words):
    print('The index is {0} and the word is {1}'.format(idx, word))

The index is 0 and the word is the
The index is 1 and the word is quick
The index is 2 and the word is brown
The index is 3 and the word is fox
The index is 4 and the word is jumped
The index is 5 and the word is over
The index is 6 and the word is the
The index is 7 and the word is lazy
The index is 8 and the word is dog


---
### *Exercise*

> Create a list that contains "Sam", "Tim", and "Lidia", iterate this list and print out the following <br>

Number 1 is Sam <br>
Number 2 is Tim <br>
Number 3 is Lidia <br>

> Try to use two print methods

---

In [46]:
name_list = ['Sam', 'Tim', 'Lidia']

for idx, name in enumerate(name_list):
    print('Number {0} is {1}'.format(idx+1,name))

Number 1 is Sam
Number 2 is Tim
Number 3 is Lidia


In [47]:
for idx, name in enumerate(name_list):
    print('Number ' + str(idx+1) + ' is '+ name)

Number 1 is Sam
Number 2 is Tim
Number 3 is Lidia
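
A third variant, shown here only as a sketch, avoids `idx+1` by using the optional `start` argument of `enumerate` and builds the text with an f-string (available in Python 3.6 and later). The names `number` and `lines` are illustrative.

```python
name_list = ['Sam', 'Tim', 'Lidia']

# start=1 makes the counter begin at 1 instead of 0
for number, name in enumerate(name_list, start=1):
    print(f'Number {number} is {name}')

# the same text collected into a list instead of printed
lines = [f'Number {number} is {name}' for number, name in enumerate(name_list, start=1)]
```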


### List comprehension (super useful!)

There is a short way to make a list from a simple rule by using list comprehensions. The syntax is like

    [<element(item)> for item in sequence]
    
The `element` can be any code snippet that depends on the `item`. 

For example, we can calculate the squares of the first 10 non-negative integers (0 through 9):

In [48]:
[i**2 for i in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [49]:
# the normal way (cumbersome!):
xx = []
for i in range(10):
    xx.append(i**2)
xx

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

This example gives a sequence of boolean values that determine if the element in a list is a string.

In [50]:
random_list = [1, 2, 'three', 4.0, ['five']]
foo = [isinstance(item, str) for item in random_list]
foo

[False, False, True, False, False]

*the normal for-loop: cumbersome!!!*

In [51]:
random_list = [1, 2, 'three', 4.0, ['five']]

foo = []
for item in random_list:
    foo.append(isinstance(item, str))
foo

[False, False, True, False, False]

---
### *Exercise*

> Modify the previous list comprehension to test if the elements are integers.

---

In [53]:
[isinstance(item, int) for item in random_list]

[True, True, False, False, False]

**Combine list comprehension with filtering**

This filtering form is similar to the simple form of list comprehension, but it evaluates boolean-expression for every item. It also only keeps those members for which the boolean expression is True.

In [54]:
# obtain the integers from 0 to 9
foo = [i for i in range(10)]
foo

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [55]:
# obtain the integers from 0 to 9 that are divisible by 3
# we use the modulus operator % to check if the remainder is 0
foo = [i for i in range(10) if i % 3 == 0]
foo

[0, 3, 6, 9]

*the normal for-loop: cumbersome!!!* Not recommended!

In [56]:
foo = []
for i in range(10):
    if i % 3 == 0:
        foo.append(i)
        
foo

[0, 3, 6, 9]
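
The expression at the front and the filter at the end can be combined in a single comprehension. The following sketch reuses `random_list` from earlier; the names `squares_of_multiples` and `upper_strings` are illustrative.

```python
# squares of the integers from 0 to 9 that are divisible by 3
squares_of_multiples = [i**2 for i in range(10) if i % 3 == 0]
print(squares_of_multiples)   # [0, 9, 36, 81]

# keep only the strings from a mixed list, and convert them to upper case
random_list = [1, 2, 'three', 4.0, ['five']]
upper_strings = [item.upper() for item in random_list if isinstance(item, str)]
print(upper_strings)          # ['THREE']
```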

### While loops

The majority of loops that you will write will be `for` loops. These are loops that have a defined number of iterations, over a specified sequence. However, there may be times when it is not clear when the loop should terminate. In this case, you use a `while` loop. This has the syntax

    while <condition>:
        <code>

The while loop is like a repeated if statement. The code is executed over and over again, as long as the `condition` is True.

`condition` must be something that can be evaluated when the loop starts. The variables that determine it should be modified inside the loop, so that the condition eventually becomes false and the loop stops.

**This kind of loop should be used carefully**: it is relatively easy to accidentally create an infinite loop, where the condition never becomes false and the loop runs forever. **This can freeze the notebook or even crash the computer**.

Here is an example of using the while loop to check the error of a dummy numerical model

* Error starts at 50
* Error is divided by 4 on every run
* Continue until error is no longer > 1

In [60]:
error = 50

while error > 1:
    error = error / 4
    print(error)

12.5
3.125
0.78125
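
One possible safeguard against an accidental infinite loop is to add an iteration cap to the condition. This is a sketch, not a required pattern; `max_iterations` and `iterations` are hypothetical names, and the cap of 100 is arbitrary.

```python
error = 50
max_iterations = 100     # hypothetical safety cap, chosen arbitrarily
iterations = 0

# the loop stops when the error is small enough OR the cap is reached
while error > 1 and iterations < max_iterations:
    error = error / 4
    iterations += 1

print(error, iterations)
```

With these starting values the loop ends after three passes, well before the cap matters.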


---
### *Exercise*

* Write a for loop, add up integers from 0 to 100.

* Write a while loop, add up integers from 0 to 100.

---

In [70]:
to100 = range(0,101,1)
sum_num = 0
for i in to100:
    sum_num += i
sum_num

5050

In [76]:
i = 0 
sum_num = 0

while i < 101:
    sum_num = sum_num +i
    i = i + 1
    
print(sum_num)

5050


### Flow control: combine loops and conditionals

There are a few statements that allow you to control the flow of any iterative loop: `continue`, `break`, and `pass`.

- `continue` stops the current iteration and continues to the next element of the loop, if there is one.

- `break` stops the current iteration, and leaves the loop.

- `pass` does nothing. It is a placeholder for a block of code that must not be empty.

In [72]:
# print all the numbers from 0 to 9, except 5
for n in range(10):
    if n == 5:
        continue
    print(n)

print('done')

0
1
2
3
4
6
7
8
9
done


In [73]:
# print all the numbers up to (but not including) 5, then break out of the loop.
for n in range(10):
    if n == 5:
        break
    print(n)

print('done')


0
1
2
3
4
done
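
The third statement, `pass`, can be sketched in the same style. The second half of the example shows `break` used to leave a `while True` loop, which is a common pattern; the name `collected` is illustrative.

```python
# pass does nothing; it fills a block that is not allowed to be empty
for n in range(5):
    if n == 2:
        pass            # placeholder: nothing special happens for 2 (yet)
    print(n)            # still runs for every n, including 2

# break is also a common way to leave a `while True` loop
collected = []
n = 0
while True:
    if n == 3:
        break           # leave the loop once n reaches 3
    collected.append(n)
    n += 1

print(collected)        # [0, 1, 2]
```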


# Functions

Functions are ways to create reusable blocks of code that can be run with different variable values – the input variables to the function. Functions are defined using the syntax

    def <function name>(var1, var2, ...):
        <block of code...>
        return <return variable(s)>

Functions can be defined at any point in the code, and called at any subsequent point.

**Note:** 

1. indentation following the `def` line!
2. the return statement is not required if you don't want to return anything!

In [74]:
def addfive(x):
    return x + 5

**If you want to run the function, you need to call the function. You can use either positional argument or keyword argument**

In [75]:
addfive(3) # positional argument without x

8

In [None]:
addfive(x = 3) # keyword argument with x

**recap on the difference between interactive output and the real `print` function**

In [None]:
def addfive_with_print(x):
    print(x+5)
    return x+5

In [None]:
addfive_with_print(3)

In [None]:
a = addfive_with_print(3)

In [None]:
a

**Note that function stops after running `return`**

In [None]:
def addfive_with_print_wrong(x):
    return x + 5
    print(x + 5) # print will not work here!!!

Why is nothing printed out when you run the cell below?

In [None]:
a = addfive_with_print_wrong(3)

## More on function inputs and outputs

Functions can have multiple input and output values.

In [1]:
def sasos(a, b, c):
    res1 = 2*a + b + c
    res2 = a**3 + b**2 + c**2
    return res1, res2

In [2]:
# note that "return res1, res2" in the function above will return a tuple
s = sasos(1,2,3) # positional argument
s

(7, 14)

In [None]:
s = sasos(a = 1, b = 2, c = 3) # keyword argument
s

In [None]:
s = sasos(c = 3, b = 2, a = 1) # keyword argument with a new sequence
s

In [None]:
s = sasos(3, 2, 1) # positional argument with a new sequence
s 

In [None]:
# unpack the tuple
s1, s2 = sasos(1, 2, 3)
print(s1)
print(s2)

In [None]:
# You can assign the output first to a single variable and then unpack later
# but this needs more lines of code
s = sasos(1, 2, 3)
s1 = s[0]
s2 = s[1]
print(s1)
print(s2)

Functions can have variables with default values.

In [4]:
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c # x to the power of a, y to the power of b, and z to the power of c

In [5]:
powsum(2, 3, 4) # positional argument with all default a, b, c values

75

In [6]:
powsum(2, 3, 4, b=5) # positional argument with new keyword argument b

309

In [7]:
powsum(z=2, c=2, x=3, y=4) # all arguments are keyword arguments

23

---
### *Exercise*

* Verify `powsum(z=2, x=3, y=4, c=2)` is the same as `powsum(3, 4, 2, c=2)`

* What happens when you do `powsum(3, 4, 2, x=2)`?  Why?


---

In [11]:
powsum(z=2,x=3,y=4,c=2) == powsum(3,4,2,c=2)


True

In [9]:
powsum(3,4,2,c=2)

23

In [10]:
powsum(3,4,2,x=2)

TypeError: powsum() got multiple values for argument 'x'
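
The error appears because the positional value `3` is already assigned to `x`, and the keyword `x=2` then tries to assign `x` a second time. The sketch below catches the error so that the rest of the cell keeps running; `error_message` and `result` are illustrative names.

```python
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c

try:
    powsum(3, 4, 2, x=2)      # 3 is already assigned to x by position
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# supplying each argument only once works
result = powsum(3, 4, 2, c=2)
print(result)                 # 23
```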

## Docstrings

You can add 'help' text to a function by adding a `docstring`: a regular string, usually enclosed in paired triple single-quotes or triple double-quotes, placed right below the `def` line of the function. **This should be considered a mandatory step in your code writing.**

> Do Something Today, That You'll Thank Yourself For Tomorrow.

In [12]:
def addfive(x):
    '''Return the argument plus five
    
    Input:  x
            A number
    
    Output: foo
            The number x plus five
    
    '''
    return x + 5

In [None]:
# now, try addfive?
# addfive?

In [13]:
addfive(3)

8

See [PEP-257](https://www.python.org/dev/peps/pep-0257/) for guidelines about writing good docstrings.
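
Outside of Jupyter's `addfive?` syntax, the docstring can also be read from plain Python. The sketch below uses the function's `__doc__` attribute and the standard-library `inspect.getdoc`; the shortened docstring is for illustration only.

```python
import inspect

def addfive(x):
    '''Return the argument plus five

    Input:  x
            A number
    '''
    return x + 5

print(addfive.__doc__)          # the raw docstring, exactly as written
print(inspect.getdoc(addfive))  # the same text with the indentation cleaned up
# help(addfive) displays similar text in an interactive session
```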

## Scope of variables
The scope of a variable refers to the segment of the code in which the variable is accessible.


* Variables that can be accessed throughout the program are called **global variables**. The variables we created up to now are almost all global variables (except inside of a function). <br><br>

* Variables with limited scope are called **local variables**. Variables defined inside a function have their scope limited to the function. They cannot be accessed from the outside and only exist in memory as long as the function executes. That is, all of the variables that are changed within the block of code inside a function are only changed within that block, and do not affect similarly named variables outside the function.

In [14]:
x = 5 # global variable

def changex(x):      # This x is local to the function
    y = 10           # This y is local to the function
    x = x + y        # here the local variable x is changed
    print('Inside changex, x =', x)
    return x

res = changex(x)    # supply the value of x in the 'global' scope

Inside changex, x = 15


In [15]:
print('The returned value is', res)          
print('Outside of changex, x =', x)            # The global x is unchanged 

The returned value is 15
Outside of changex, x = 5


In [None]:
# print(y) # y is inside of the changex function so it is not defined

Variables from the 'global' scope can be read within a function, as long as the function does not assign a new value to them. This technique should generally only be used when it is very clear what value the global variable has, for example, in very short helper functions.

In [16]:
x = 5

def dostuffwithx():
    res = x + 5       # Here, the global value of x is used, since it is not passed to the function explicitly,
                      # or defined inside the function.
    return res

In [19]:
dostuffwithx()

10
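
To illustrate the limit of this technique: once a function assigns to a name, Python treats that name as local throughout the function, so reading it before the assignment fails. The sketch below uses hypothetical function names (`read_only`, `tries_to_change`) that are not part of this notebook.

```python
x = 5
caught = False

def read_only():
    return x + 5          # only reads x, so the global x is used

def tries_to_change():
    x = x + 5             # assigning to x makes x local to this function, so the
    return x              # right-hand side reads a local x that has no value yet

print(read_only())        # 10

try:
    tries_to_change()
except UnboundLocalError as err:
    caught = True
    print('UnboundLocalError:', err)

print(x)                  # 5, the global x is untouched
```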

**Tip**: Even if the function takes no arguments, you still need the parentheses after the function's name when you call it! Without them, you only get the function object itself and the code inside is not run.

In [None]:
dostuffwithx

### Unpacking function arguments

You can provide a sequence of arguments to a function by placing a `*` in front of the sequence, like

    foo(*args)

This unpacks the elements of the **sequence** (like list or tuple) into the arguments of the function, in order.

**The advantage is that you don't need to extract the element from the sequence one by one when you pass those elements to the function arguments**.

In [20]:
# we define the powsum function here again
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c # x to the power of a, y to the power of b, and z to the power of c

In [21]:
# a quick example for unpacking
xyz = [1, 2, 3]
powsum(*xyz)

32

In [22]:
# the above is the same as:
powsum(xyz[0], xyz[1], xyz[2])

32

You can also **unpack dictionaries** as keyword arguments by placing `**` in front of the dictionary, like 

    foo(**kwargs)

In [23]:
powdict = {'a': 1, 'b': 2, 'c': 3}

powsum(1, 2, 3, **powdict)

32

In [24]:
# the above is the same as:
powsum(1, 2, 3, a=powdict['a'], b=powdict['b'], c=powdict['c'])

32

**Unpacking sequence (list or tuple) can be mixed with unpacking dictionary. E.g., `foo(*args, **kwargs)` works.**

In [None]:
xyz = [1, 2, 3]
powdict = {'a': 1, 'b': 2, 'c': 3}

powsum(*xyz, **powdict)

### Unpacking in other common cases

One common usage is using the builtin `zip` function to take a 'transpose' of a set of points.

In [25]:
names = ["Sam", "Tim", "Lidia"]
laptops = ["mac", "windows", "linux"]

In [26]:
zip(names, laptops)

<zip at 0x7ff609941200>

In [27]:
list(zip(names, laptops))

[('Sam', 'mac'), ('Tim', 'windows'), ('Lidia', 'linux')]

In [28]:
# the names list and the laptops list are contained in a single list
# how can we pair the name and the laptop and generate a new list?
names_laptops = [["Sam", "Tim", "Lidia"], ["mac", "windows", "linux"]]

In [29]:
list(zip(*names_laptops)) # this is exactly the same as list(zip(names, laptops))

[('Sam', 'mac'), ('Tim', 'windows'), ('Lidia', 'linux')]

In [30]:
# the above is the same as:
list(zip(names_laptops[0], names_laptops[1]))

[('Sam', 'mac'), ('Tim', 'windows'), ('Lidia', 'linux')]
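
The same trick works in the other direction: unpacking a list of pairs into `zip` separates it back into names and laptops. This is a small sketch; `pairs`, `names_again`, and `laptops_again` are illustrative names.

```python
pairs = [('Sam', 'mac'), ('Tim', 'windows'), ('Lidia', 'linux')]

# *pairs passes the three tuples to zip as three separate arguments
names_again, laptops_again = zip(*pairs)

print(names_again)     # ('Sam', 'Tim', 'Lidia')
print(laptops_again)   # ('mac', 'windows', 'linux')
```

Note that `zip` yields tuples, so the results are tuples rather than lists.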

# Reading and writing text files

There are many different file formats. Some data are stored in a specialized binary format. But there are also many datasets that are simple text files. Basic text file commands are included in the Python core language.

**This is the screenshot of the `02_GPS.dat` file**

![data](img/02_GPS_screenshot.png)

In [31]:
# open the "02_GPS.dat" data file created by a handheld GPS unit in the data folder
f = open('data/02_GPS.dat')

**Tip**: later when we are done with the file, we would close it with the `f.close()` command (note that `f` is what you define yourself).           


### *Exercise*

* Use `TAB` completion to explore the different attributes and methods of the `f` object.

* We will use the `f.readlines()` method to obtain all of the lines in the file. Run `f.readlines()` yourself and see what this command returns. Run it twice and see what happens.


In [32]:
f.readlines()

["Grid\tLat/Lon hddd°mm.mmm'\n",
 'Datum\tWGS 84\n',
 '\n',
 'Header\tName\tStart Time\tElapsed Time\tLength\tAverage Speed\tLink\n',
 '\n',
 'Track\tACTIVE LOG\t5/20/2006 1:34:55 PM \t01:56:53\t4.80 mi\t2.5 mph\t\n',
 '\n',
 'Header\tPosition\tTime\tAltitude\tDepth\tLeg Length\tLeg Time\tLeg Speed\tLeg Course\n',
 '\n',
 'Trackpoint\tN42 49.820 W70 45.415\t5/20/2006 1:35:10 PM \t16 ft\t\t17 ft\t00:00:15\t0.76 mph\t200° true\n',
 'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\t\t30 ft\t00:00:15\t1.4 mph\t75° true\n',
 'Trackpoint\tN42 49.824 W70 45.400\t5/20/2006 1:35:40 PM \t19 ft\t\t38 ft\t00:00:15\t1.7 mph\t66° true\n',
 'Trackpoint\tN42 49.825 W70 45.393\t5/20/2006 1:35:55 PM \t18 ft\t\t35 ft\t00:00:15\t1.6 mph\t77° true\n',
 'Trackpoint\tN42 49.824 W70 45.379\t5/20/2006 1:36:10 PM \t24 ft\t\t64 ft\t00:00:15\t2.9 mph\t97° true\n',
 'Trackpoint\tN42 49.821 W70 45.370\t5/20/2006 1:36:25 PM \t19 ft\t\t43 ft\t00:00:15\t2.0 mph\t111° true\n',
 'Trackpoint\tN42 49.821 W

In [33]:
# This sets the pointer back to the beginning of the file. This allows us to run this
# block of code many times without reopening the file each time.

f.seek(0) # important!

0

In [34]:
# now we run f.readlines()
f.readlines()

["Grid\tLat/Lon hddd°mm.mmm'\n",
 'Datum\tWGS 84\n',
 '\n',
 'Header\tName\tStart Time\tElapsed Time\tLength\tAverage Speed\tLink\n',
 '\n',
 'Track\tACTIVE LOG\t5/20/2006 1:34:55 PM \t01:56:53\t4.80 mi\t2.5 mph\t\n',
 '\n',
 'Header\tPosition\tTime\tAltitude\tDepth\tLeg Length\tLeg Time\tLeg Speed\tLeg Course\n',
 '\n',
 'Trackpoint\tN42 49.820 W70 45.415\t5/20/2006 1:35:10 PM \t16 ft\t\t17 ft\t00:00:15\t0.76 mph\t200° true\n',
 'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\t\t30 ft\t00:00:15\t1.4 mph\t75° true\n',
 'Trackpoint\tN42 49.824 W70 45.400\t5/20/2006 1:35:40 PM \t19 ft\t\t38 ft\t00:00:15\t1.7 mph\t66° true\n',
 'Trackpoint\tN42 49.825 W70 45.393\t5/20/2006 1:35:55 PM \t18 ft\t\t35 ft\t00:00:15\t1.6 mph\t77° true\n',
 'Trackpoint\tN42 49.824 W70 45.379\t5/20/2006 1:36:10 PM \t24 ft\t\t64 ft\t00:00:15\t2.9 mph\t97° true\n',
 'Trackpoint\tN42 49.821 W70 45.370\t5/20/2006 1:36:25 PM \t19 ft\t\t43 ft\t00:00:15\t2.0 mph\t111° true\n',
 'Trackpoint\tN42 49.821 W

**Extract info from the text:**

Our goal is to extract the latitude of each trackpoint in decimal degrees. In the original file, each latitude is given as whole degrees plus arc minutes (for example, `N42 49.821`). Since one degree is 60 arc minutes, we divide the arc minutes by 60 and add the result to the whole degrees.

In [35]:
f.seek(0) # important!

0

In [36]:
# get all the text in the data file
lines_all = f.readlines()

In [84]:
lines_all

["Grid\tLat/Lon hddd°mm.mmm'\n",
 'Datum\tWGS 84\n',
 '\n',
 'Header\tName\tStart Time\tElapsed Time\tLength\tAverage Speed\tLink\n',
 '\n',
 'Track\tACTIVE LOG\t5/20/2006 1:34:55 PM \t01:56:53\t4.80 mi\t2.5 mph\t\n',
 '\n',
 'Header\tPosition\tTime\tAltitude\tDepth\tLeg Length\tLeg Time\tLeg Speed\tLeg Course\n',
 '\n',
 'Trackpoint\tN42 49.820 W70 45.415\t5/20/2006 1:35:10 PM \t16 ft\t\t17 ft\t00:00:15\t0.76 mph\t200° true\n',
 'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\t\t30 ft\t00:00:15\t1.4 mph\t75° true\n',
 'Trackpoint\tN42 49.824 W70 45.400\t5/20/2006 1:35:40 PM \t19 ft\t\t38 ft\t00:00:15\t1.7 mph\t66° true\n',
 'Trackpoint\tN42 49.825 W70 45.393\t5/20/2006 1:35:55 PM \t18 ft\t\t35 ft\t00:00:15\t1.6 mph\t77° true\n',
 'Trackpoint\tN42 49.824 W70 45.379\t5/20/2006 1:36:10 PM \t24 ft\t\t64 ft\t00:00:15\t2.9 mph\t97° true\n',
 'Trackpoint\tN42 49.821 W70 45.370\t5/20/2006 1:36:25 PM \t19 ft\t\t43 ft\t00:00:15\t2.0 mph\t111° true\n',
 'Trackpoint\tN42 49.821 W

In [37]:
lines_all[0]

"Grid\tLat/Lon hddd°mm.mmm'\n"

In [85]:
lines_all[1]

'Datum\tWGS 84\n'

In [86]:
lines_all[2]

'\n'

In [88]:
lines_all[10]

'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\t\t30 ft\t00:00:15\t1.4 mph\t75° true\n'

Now we use `.split()` to separate the words in the line

In [89]:
lines_all[10].split()

['Trackpoint',
 'N42',
 '49.821',
 'W70',
 '45.408',
 '5/20/2006',
 '1:35:25',
 'PM',
 '15',
 'ft',
 '30',
 'ft',
 '00:00:15',
 '1.4',
 'mph',
 '75°',
 'true']

In [90]:
lines_all[10].split()[0]

'Trackpoint'

In [91]:
lines_all[10].split()[1]

'N42'

In [92]:
lines_all[10].split()[1][1:]

'42'

In [93]:
# we get the degree in numeric values
float(lines_all[10].split()[1][1:])

42.0

In [94]:
lines_all[10].split()[2]

'49.821'

In [95]:
# we get the minute in numeric values
float(lines_all[10].split()[2])

49.821

In [96]:
# we convert the minute to degree
float(lines_all[10].split()[2])/60

0.8303499999999999

In [97]:
# join them together
float(lines_all[10].split()[1][1:]) + float(lines_all[10].split()[2])/60

42.83035

In [98]:
# since lines_all[10].split() is duplicated in the code, we can first create a variable to store this value
data = lines_all[10].split()
float(data[1][1:]) + float(data[2])/60

42.83035
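
The conversion steps above can be wrapped in a small helper function. This is a sketch under assumptions: the name `to_decimal_degrees` is hypothetical, and returning negative values for the southern and western hemispheres is a common convention that this notebook does not itself specify.

```python
def to_decimal_degrees(degree_field, minute_field):
    '''Convert fields such as 'N42' and '49.821' to decimal degrees.

    Assumption: southern and western hemispheres are returned as negative values.
    '''
    hemisphere = degree_field[0]             # 'N', 'S', 'E', or 'W'
    degrees = float(degree_field[1:])        # e.g. 42.0
    minutes = float(minute_field)            # e.g. 49.821
    value = degrees + minutes / 60           # 60 arc minutes per degree
    if hemisphere in ('S', 'W'):
        value = -value
    return value

# the same trackpoint line as lines_all[10] above
line = 'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\t\t30 ft\t00:00:15\t1.4 mph\t75° true\n'
data = line.split()
print(to_decimal_degrees(data[1], data[2]))
```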

Now let's try to output the latitude of each trackpoint:
**the key is to find the line that contains the latitude**

In [38]:
# use a for loop to get the latitude for each trackpoint line
for line in lines_all:        # iterate over each line in the file. Each line is a string.
    data = line.split()       # split the line of text into words, each separated by spaces or tabs        
    if data[0] == 'Trackpoint':   # We only want to consider lines that begin with 'Trackpoint', as these hold the data
        print(float(data[1][1:]) + float(data[2])/60.0) # print the latitude

IndexError: list index out of range

**Why do we see the error?**

There might be something wrong inside the for loop!

In [None]:
# lines_all[0].split()

In [None]:
# lines_all[0].split()[0]

In [None]:
# lines_all[2].split()

In [None]:
# lines_all[2].split()[0]

**The right way:**

In [39]:
# use a for loop to get the latitude for each trackpoint line
for line in lines_all:        # iterate over each line in the file. Each line is a string.
    data = line.split()       # split the line of text into words, each separated by spaces or tabs
    if data == []:            # Test for an empty list, the same as "if not data"
        continue         
    if data[0] == 'Trackpoint':   # We only want to consider lines that begin with 'Trackpoint', as these hold the data
        print(float(data[1][1:]) + float(data[2])/60.0) # print the latitude

42.830333333333336
42.83035
42.8304
42.830416666666665
42.8304
42.83035
42.83035
42.83035
42.83026666666667
42.83011666666667
42.8299
42.82973333333333
42.8296
42.829683333333335
42.829766666666664
42.82966666666667
42.82945
42.829166666666666
42.828916666666665
42.82865
42.82835
42.82805
42.8278
42.827533333333335
42.827283333333334
42.82705
42.82685
42.82666666666667
42.82641666666667
42.826233333333334
42.826033333333335
42.82581666666667
42.8256
42.82541666666667
42.82521666666667
42.824983333333336
42.82476666666667
42.82455
42.82438333333333
42.8242
42.823933333333336
42.8237
42.82353333333333
42.82335
42.82313333333333
42.82295
42.82275
42.82256666666667
42.82236666666667
42.82215
42.82196666666667
42.821783333333336
42.821616666666664
42.82145
42.821266666666666
42.82106666666667
42.82086666666667
42.82073333333334
42.820616666666666
42.820433333333334
42.820283333333336
42.82013333333333
42.81998333333333
42.81985
42.8197
42.8195
42.819316666666666
42.81923333333334
42.8192333

In [40]:
# remember to close the file
f.close()

**Now let's put everything together!**

In [41]:
# open the file
f = open('data/02_GPS.dat')

# go to the first line
f.seek(0)

# get all the text in the data file
lines_all = f.readlines()

# use a for loop to get the latitude for each trackpoint line
for line in lines_all:        # iterate over each line in the file. Each line is a string.
    data = line.split()       # split the line of text into words, each separated by spaces or tabs
    if data == []:            # Test for an empty list, the same as "if not data"
        continue         
    if data[0] == 'Trackpoint':   # We only want to consider lines that begin with 'Trackpoint', as these hold the data
        print(float(data[1][1:]) + float(data[2])/60.0) # print the latitude
        
# close the file
f.close()

42.830333333333336
42.83035
42.8304
42.830416666666665
42.8304
42.83035
42.83035
42.83035
42.83026666666667
42.83011666666667
42.8299
42.82973333333333
42.8296
42.829683333333335
42.829766666666664
42.82966666666667
42.82945
42.829166666666666
42.828916666666665
42.82865
42.82835
42.82805
42.8278
42.827533333333335
42.827283333333334
42.82705
42.82685
42.82666666666667
42.82641666666667
42.826233333333334
42.826033333333335
42.82581666666667
42.8256
42.82541666666667
42.82521666666667
42.824983333333336
42.82476666666667
42.82455
42.82438333333333
42.8242
42.823933333333336
42.8237
42.82353333333333
42.82335
42.82313333333333
42.82295
42.82275
42.82256666666667
42.82236666666667
42.82215
42.82196666666667
42.821783333333336
42.821616666666664
42.82145
42.821266666666666
42.82106666666667
42.82086666666667
42.82073333333334
42.820616666666666
42.820433333333334
42.820283333333336
42.82013333333333
42.81998333333333
42.81985
42.8197
42.8195
42.819316666666666
42.81923333333334
42.8192333
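
An alternative that avoids forgetting `f.close()` is the `with` statement, which closes the file automatically when the block ends. The sketch below is self-contained: instead of `data/02_GPS.dat` it writes a tiny stand-in file to a temporary folder, using shortened versions of two trackpoint lines. The names `sample_text`, `sample_path`, and `latitudes` are illustrative.

```python
import os
import tempfile

# Build a tiny stand-in for data/02_GPS.dat so that the example runs on its own.
sample_text = (
    'Datum\tWGS 84\n'
    '\n'
    'Trackpoint\tN42 49.820 W70 45.415\t5/20/2006 1:35:10 PM \t16 ft\n'
    'Trackpoint\tN42 49.821 W70 45.408\t5/20/2006 1:35:25 PM \t15 ft\n'
)
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_GPS.dat')
with open(sample_path, 'w') as fout:
    fout.write(sample_text)

latitudes = []
with open(sample_path) as f:          # the file is closed automatically after this block
    for line in f:                    # a file object can be iterated line by line
        data = line.split()
        if not data:                  # skip blank lines
            continue
        if data[0] == 'Trackpoint':
            latitudes.append(float(data[1][1:]) + float(data[2]) / 60.0)

print(latitudes)
print(f.closed)                       # True
```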

**Now, let's write the converted latitudes to a file**

To write a file, open it with the `'w'` mode, which opens the file for writing (creating it if needed, or overwriting it if it already exists). This example shows how to write the latitude with a specific format (**2 decimal places**) to a file called "gps_out_lat.txt" in the "data_out" folder, which must already exist. [Learn more about string formatting from the python documentation](https://docs.python.org/3/library/string.html#format-string-syntax)

In [43]:
# open the output file
fout = open('data_out/gps_out_lat.txt', 'w')

# open the file containing the input
f = open('data/02_GPS.dat')

# go to the first line
f.seek(0)

for line in f.readlines():        # iterate over each line in the file. Each line is a string.
    data = line.split()           # split the line of text into words, each separated by spaces
    if data == []:                # Test for an empty list
        continue         
    if data[0] == 'Trackpoint':   # We only want to consider lines that begin with 'Trackpoint', as these hold the data
        lat = float(data[1][1:]) + float(data[2])/60.0
        
        # write to the fout file with 2 decimal places (don't forget the newline character \n)
        fout.write("{0:.2f}\n".format(lat)) # write the latitude with 2 decimal places

f.close()
fout.close()
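
To get a feel for the format specification used in `fout.write`, the sketch below formats one latitude value in a few ways. The variable names are illustrative, and the f-string form requires Python 3.6 or later.

```python
lat = 42.830333333333336

two_decimals = "{0:.2f}".format(lat)     # '42.83'
four_decimals = "{0:.4f}".format(lat)    # '42.8303'
padded = "{0:8.2f}".format(lat)          # '   42.83', right-aligned in 8 characters
with_fstring = f"{lat:.2f}"              # '42.83', the f-string equivalent

print(two_decimals, four_decimals, repr(padded), with_fstring)
```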


### *Exercise*

> Following the above tutorial, write the longitude with 2 decimal places and save the output to the 'data_out' folder with a filename of "gps_out_lon.txt" 

> Test "{0:.2f}\n" vs. "{0:.2f}" when you write to the file.