<img src="https://www.geog.uni-heidelberg.de/md/chemgeo/geog/3dgeo/3dgeo_logo.jpg" width=150 height=100 /><img src="https://www.python.org/static/img/python-logo.png" width=250 height=100 />
## Geoscripting
Bernhard Höfle, Lukas Winiwarter, Katharina Anders: Institute of Geography, Heidelberg University

# Assignment 1

## General

<b style=color:red>Deadline</b>: **30.05.2022, 22:00 h**\
Upload your notebook in **one ZIP archive** (e.g. test1_anders.zip) to Moodle.

**Deliverables:**\
Provide your **source code of your scripts with every step commented** in the code! Your scripts should be **executable on other machines** (e.g., avoid "hardcoding" file names and paths) in order to be tested if they work.\
While the assignment will not be graded, it is necessary to **practice coding** throughout the semester. You will be given **feedback** on your code to help you with the next assignments and the final test.\
**# Comment each step** in the source code so that a third person can easily follow what is going on in the code.\
**How to proceed:**
1. How does the algorithm work? Think of the solution (e.g. equation). What is the "recipe" of making the Sacher cake? What are the ingredients, cooking steps and sequence of steps?
2. Note all necessary steps that have to be done to solve the problem (algorithm/workflow): <b>Split the (large) problem into smaller, easy-to-solve problems</b> (e.g. open file, read lines, ...)!
3. Write it down in Python source code. Step-by-step!\
**Start with parts you are familiar with** and add others later (e.g. downloading from a URL).

# 1) Harvest GPS Tracks from Web and export to KML

Write a script that **opens** a comma separated value (csv) ASCII file of a GPS track from a Web URL given by the user as **input**, reformats the input file and **writes a KML ASCII file**, which can be displayed in Google Earth or GIS software. In general, such tracks could be trajectories from geomorphological field campaigns or also GNSS reference data (e.g. of river courses). Thus, a quick view of the results after field campaigns can be made with this new script.\
-> From where to where does the GPS track go?\
You can answer this by viewing the result of your script in Google Earth or QGIS/ArcGIS.\
**The script will work like this:**
- The input URL and output KML file name should be <b>requested from the user</b>.
- Get GPS data in your script for testing from here: [https://heibox.uni-heidelberg.de/f/824751be6df74441860f/?dl=1](https://heibox.uni-heidelberg.de/f/824751be6df74441860f/?dl=1).
- The script will <b>take only every 5th coordinate pair</b> of the input file.
- The script will be able to <b>catch an error (try/except)</b> when reading the data from the Web, inform the user about the error and stop the script with <b>exit()</b>.
- <b>Reformat the input ASCII file</b> lines and add the KML header and footer to the output file. Some lines of the input have to be skipped and other lines have to be added.
- Furthermore, <b>latitude and longitude have to be switched (!):</b> KML needs: long,lat,elevation.
- A simple KML file (with header and footer) looks like this. Note that there are <b>no spaces between coordinates</b>:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<name>GPS track</name>
<MultiGeometry>
<LineString>
<coordinates>
8.683964290,49.40520080,116.00000
8.683169590,49.40450620,118.00000
…
</coordinates>
</LineString>
</MultiGeometry>
</Placemark>
</kml>
```
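
The reformatting rules (swap latitude and longitude, no spaces inside a coordinate triple, header before and footer after the coordinates) can be summarised in a small function. This is only a sketch, not the required solution: the function name `build_kml` and its `name` parameter are hypothetical, and the coordinates are assumed to arrive as `(lat, lon, elevation)` string tuples. `xml.sax.saxutils.escape` protects the track name in case it contains characters such as `&` or `<`.

```python
from xml.sax.saxutils import escape

def build_kml(coords, name="GPS track"):
    """Return a KML document with one LineString built from (lat, lon, elevation) tuples."""
    header = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
              "<Placemark>\n"
              "<name>" + escape(name) + "</name>\n"
              "<MultiGeometry>\n<LineString>\n<coordinates>\n")
    footer = "</coordinates>\n</LineString>\n</MultiGeometry>\n</Placemark>\n</kml>\n"
    # KML expects long,lat,elevation; the values of one triple are separated by commas only
    body = "".join(lon + "," + lat + "," + ele + "\n" for lat, lon, ele in coords)
    return header + body + footer

print(build_kml([("48.18284160", "16.37945650", "200.0")]))
```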

**Hint:** Text spanning multiple lines can be stored in a variable using triple quotes:
```python
footer="""
</coordinates>
</LineString>
</MultiGeometry>
</Placemark>
</kml>
"""
```


**Import modules:** Extend this code cell while writing your code, so that in the end all required modules are imported here:

```python
# Import the modules required for the script
import os
import urllib.request
import sys
```

Get the **required arguments (filenames)**. If the files are not in the same directory as the notebook, the working directory should be changed:

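```python
# Print the current working directory (cwd)
print(os.getcwd())

# Change the cwd if the data should be read from / written to another folder;
# "." stands for the current directory, so as written this call changes nothing
os.chdir(".")
```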

C:\Users\GrHalbgott\Downloads\geoscripting\Feedback1


**Read** the data from the **URL**:

```python
# Test data source: https://heibox.uni-heidelberg.de/f/824751be6df74441860f/?dl=1

# Ask the user for the URL of the data source
webURL = input("Enter the data source (URL):")

# Encoding of the downloaded data (also reused for the KML header)
encode = 'UTF-8'

# Try to open the URL and read its contents; stop the script if this fails
try:
    with urllib.request.urlopen(webURL) as webpage:
        # urlopen returns bytes, so decode them to a string
        webtext = webpage.read().decode(encoding=encode)
except Exception as err:
    print("Could not read data from the URL:", err)
    sys.exit()

# Print the first 247 characters (header and first data lines) to check that the request worked
print(webtext[0:247])
```

```
Enter the data source (URL):https://heibox.uni-heidelberg.de/f/824751be6df74441860f/?dl=1
Latitude,Longitude,Elevation
48.18284160,16.37945650,200.0
48.18284000,16.37945000,200.0
48.18235000,16.37928000,203.0
48.18225000,16.37916000,204.0
48.18193000,16.37856000,209.0
48.18193000,16.37855000,209.0
48.18189000,16.37850000,209.0
```

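
The error handling for the download can also be packed into a reusable function. This is a sketch with a hypothetical helper `read_url` that returns `None` on failure, so that the caller decides whether to stop the script with `sys.exit()`. `urllib.error.URLError` (e.g. host not reachable) is a subclass of `OSError`, and a malformed URL raises `ValueError`; `urlopen()` also accepts `file:` URLs, which allows testing without network access.

```python
import urllib.request

def read_url(url, encoding="utf-8"):
    """Return the decoded content of `url`, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode(encoding)
    # ValueError: malformed URL or undecodable bytes;
    # OSError: includes urllib.error.URLError and HTTPError
    except (ValueError, OSError) as err:
        print("Could not read data from", url, "-", err)
        return None
```

In the notebook it could be used as `webtext = read_url(webURL)`, followed by `if webtext is None: sys.exit()`.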



**Reformat** the input data that looks like this right now:

Latitude,Longitude,Elevation\
48.18284160,16.37945650,200.0\
48.18284000,16.37945000,200.0\
48.18235000,16.37928000,203.0\
48.18225000,16.37916000,204.0\
48.18193000,16.37856000,209.0\
...

First, add a counter for the line numbers and an empty string to which the text for the output file will be written:

```python
# Create a variable for the line counter
line_num = 0

# Ask the user for a name for the output file (without the extension .kml)
kml_name = input("Enter a name for the output file (KML):")

# No empty output string is needed: the next cell writes the reformatted lines directly to a temporary file
```

In the next step, a loop should iterate over each line and
1. decode the line, as urllib returns a byte stream
2. increment the counter for the current line while reading the input file
3. skip the header
4. select coordinates (every 5th pair!)
5. remove the line break with `strip()` and split the line into columns at the commas ","
6. swap latitude and longitude to fit KML and add the line to the output text variable, which stores the entire content
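
The six steps can be sketched as a function working directly on byte lines. To keep it runnable without network access, `io.BytesIO` stands in for the HTTP response (iterating over the object returned by `urllib.request.urlopen()` likewise yields byte lines). The function name `reformat_lines` is hypothetical, and the sample holds the header and the first seven data lines of the test track shown above, with assumed `\r\n` line endings.

```python
import io

def reformat_lines(byte_lines, step=5):
    """Apply steps 1-6 to an iterable of byte lines; return (kml_text, line_num)."""
    kml_text = ""  # collects the reformatted coordinate lines
    line_num = 0   # counts all lines read so far (header = line 1)
    for raw_line in byte_lines:
        line = raw_line.decode("utf-8")                  # 1. bytes -> str
        line_num += 1                                    # 2. count the current line
        if line_num == 1:                                # 3. skip the header
            continue
        if (line_num - 1) % step != 0:                   # 4. keep every 5th coordinate pair
            continue
        lat, lon, ele = line.strip().split(",")          # 5. remove line break, split columns
        kml_text += lon + "," + lat + "," + ele + "\n"   # 6. swap lat/lon and append
    return kml_text, line_num

# Stand-in for the HTTP response
demo_response = io.BytesIO(
    b"Latitude,Longitude,Elevation\r\n"
    b"48.18284160,16.37945650,200.0\r\n"
    b"48.18284000,16.37945000,200.0\r\n"
    b"48.18235000,16.37928000,203.0\r\n"
    b"48.18225000,16.37916000,204.0\r\n"
    b"48.18193000,16.37856000,209.0\r\n"
    b"48.18193000,16.37855000,209.0\r\n"
    b"48.18189000,16.37850000,209.0\r\n"
)
demo_text, demo_count = reformat_lines(demo_response)
print(demo_text)
```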

```python
# Try to write the reformatted coordinates to a temporary file; stop the script if an error occurs
try:
    # Open a temporary file in the current working directory for writing
    with open("outfile.txt", "w") as outfile:
        # Split the downloaded text into lines and skip the first line (column header)
        for line in webtext.splitlines()[1:]:
            # Skip empty lines, e.g. at the end of the file
            if not line.strip():
                continue
            # Increase the line counter by 1
            line_num += 1
            # Only continue with every 5th coordinate pair
            if line_num % 5 != 0:
                continue
            # Remove surrounding whitespace and line breaks, then split the line into columns at ","
            linevals = line.strip().split(",")
            # Swap latitude and longitude: KML expects long,lat,elevation
            linevals[0], linevals[1] = linevals[1], linevals[0]
            # Join the values with commas (no spaces) and write them as one line
            outfile.write(",".join(linevals) + "\n")
    print("File successfully written.")
# Stop the script if an error occurs while writing the file
except Exception as err:
    print("Could not write to file:", err)
    sys.exit()
```

**Write** the output text to the **output file** and catch possibly occurring errors:

```python
# Already done in the previous cell: the lines are written inside the try/except block
```
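
If the output text is collected in a string variable (as the steps above suggest) instead of a temporary file, writing it with error handling could look like the sketch below. The helper name `write_output` is hypothetical; catching `OSError` covers typical failures such as a missing folder or missing write permission, and the `with` statement closes the file even if writing fails.

```python
import os
import sys
import tempfile

def write_output(path, text):
    """Write `text` to the file `path`; report the error and stop the script if this fails."""
    try:
        with open(path, "w", encoding="utf-8") as outfile:
            outfile.write(text)
    except OSError as err:
        print("Could not write to file:", err)
        sys.exit(1)

# Usage: write a small test file into a new temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "test.kml")
write_output(demo_path, "16.37856000,48.18193000,209.0\n")
```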

**Define** the header and footer of the KML file:

```python
# Define the XML/KML header and footer as multi-line strings;
# the encoding (in quotes, as XML requires) and the user-given name are inserted with an f-string
header = f"""<?xml version="1.0" encoding="{encode}"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
    <Placemark>
        <name>{kml_name}</name>
        <MultiGeometry>
            <LineString>
                <coordinates>
"""
footer = """                </coordinates>
            </LineString>
        </MultiGeometry>
    </Placemark>
</kml>"""
```

Combine the **variables** to **write the output KML file**. Mind the order in which they are written to the file (header, coordinates, footer) and do not forget to close the file:

```python
# Create the KML file (mode "w" overwrites an existing file instead of appending a second document)
# and open the temporary coordinate file in reading mode
with open(kml_name + ".kml", "w", encoding=encode) as kml_outfile, open("outfile.txt", "r") as outfile:
    # Write the header first
    kml_outfile.write(header)
    # Copy the coordinate lines from the temporary file
    for line in outfile:
        kml_outfile.write(line)
    # Write the footer last
    kml_outfile.write(footer)
# Both files are closed automatically at the end of the with-block

# Delete the temporary file so that only the KML file remains
os.remove("outfile.txt")
```
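
To check that the written KML file is well-formed, it can be parsed with Python's built-in `xml.etree.ElementTree`; for example, a second header appended to an existing file makes the parser raise `ET.ParseError`. A sketch with a hypothetical helper `count_kml_coordinates`, applied to a small hard-coded KML document; for the real file, the bytes read with `open(kml_name + ".kml", "rb")` could be passed instead.

```python
import xml.etree.ElementTree as ET

# Namespace of KML 2.2, needed to find elements by name
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def count_kml_coordinates(kml_bytes):
    """Parse a KML document (bytes) and return the number of coordinate triples.
    Raises ET.ParseError if the document is not well-formed XML."""
    root = ET.fromstring(kml_bytes)
    coords = root.find(".//kml:coordinates", KML_NS)
    # Coordinate triples are separated by whitespace (here: line breaks)
    return len(coords.text.split())

example_kml = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark><name>GPS track</name><MultiGeometry><LineString><coordinates>
16.37856000,48.18193000,209.0
16.37850000,48.18189000,209.0
</coordinates></LineString></MultiGeometry></Placemark>
</kml>"""

print(count_kml_coordinates(example_kml))  # → 2
```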

In the final step, the user should be **informed** that the process was successful:

```python
# Show that the process has finished and give further information
print("Your KML file '" + kml_name + ".kml' is now ready.")
print("The input contained " + str(line_num) + " coordinate lines; every 5th of them ("
      + str(line_num // 5) + " lines) was written.")
```

# 2) Automatic ASCII File Merging - Laser Scanning Point Clouds

In practice, you often get multiple ASCII files (e.g. tiles) which should be merged into one file. A single file is much easier to work with, e.g. in GIS programs or for visualization. Given are multiple files representing different objects (e.g. ground surface, tree) as **X Y Z R G B** laser points (RGB = red green blue). You can visualize them and the final merged result using the open source software CloudCompare (installed in the PC lab or download [here](https://www.cloudcompare.org)).\
**Test data can be found in Moodle for manual download.**\
**The script will:**
- <b>Automatically</b> read all files with extension ".asc" in the user-given directory on your local computer and merge the files into one output file with file extension ".asc". If there is already an output file existing in the folder, it should not be merged and just be overwritten.
- Provide system arguments <b>(sys.argv)</b> at program call to read the (i) <b>input directory</b> and (ii) <b>output file</b> (i.e. merged ASCII file).
- The part of determining all files with the given extension should be put into a newly defined <b>function</b> `getAllFilesByExt(directory, fileext)` returning a list with all file names.
- Adapt your script such that the laser points of the file <b>"BigTree.asc"</b> representing the tree in the scene <b>are not written into the merged point cloud file</b>.
- The script will <b>measure the time</b> the merging took and print this info together with the <b>number of written points</b> at the end of the script.
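
Outside the notebook, the two required arguments can be read from `sys.argv`, where `sys.argv[0]` is the script name. A minimal sketch, assuming a hypothetical script name `merge_asc.py` and helper `parse_arguments`; note that inside a Jupyter notebook `sys.argv` holds the arguments of the kernel process, so this only makes sense for a script started from the command line.

```python
import sys

def parse_arguments(argv):
    """Return (input_directory, output_file) from a call like
    `python merge_asc.py <input_directory> <output_file>`."""
    # argv[0] is the script name, the two required arguments follow
    if len(argv) != 3:
        print("Usage: python %s <input_directory> <output_file>" % argv[0])
        sys.exit(1)
    return argv[1], argv[2]
```

In the script it would be called as `in_dir, out_file = parse_arguments(sys.argv)`.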

**Hints:**
- Do not forget to change into the directory of the files <b>(os.chdir())</b> before listing all elements in the directory.
- Either provide an absolute path for the output file or set the current working directory to the path where the output file should be written.
- With `filein.read()` the whole content of an ASCII file can be read into a string variable and thus also written directly to the output without iterating over each line.
- If you want to get the number of lines of string content, you can also look deeper into the method `splitlines()` combined with the function `len()`.
- View the resulting merged file in CloudCompare to verify the result.
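
The counting hint in a nutshell, with three made-up points in X Y Z R G B format: `splitlines()` returns one entry per line, while `split("\n")` produces an extra empty string after the final line break and would overcount by one.

```python
# Made-up file content as returned by filein.read(): three points, one per line
example_content = "1.0 2.0 3.0 255 0 0\n1.5 2.5 3.5 0 255 0\n2.0 3.0 4.0 0 0 255\n"

print(len(example_content.splitlines()))  # → 3
print(len(example_content.split("\n")))   # → 4 (empty string after the last line break)
```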

**Import** all necessary modules:

```python
# Import the required modules
import os
import sys
import glob
import time
```

**Module sys** is required for reading **program options and exiting** in case something is wrong. **Module os** is required for **handling files and directories**. **Module glob** is used to **find files matching a pattern** (e.g. `*.asc`). **Module time** is used for **time measurement**.

For this exercise, a **function** should be written to get all files in the input directory. As input for the function `getAllFilesByExt`, the **directory** has to be given and the **file extension** has to be defined. The **directory** should be changed to the one given to the function, and the selected files should be saved in a **list**. The function can **iterate** over each element in the directory and check the extension of the file: if the file has the **right extension**, it should be added to the **list**. Finally, the function should **return the list** of the selected files.

```python
# Print the current working directory (cwd)
print(os.getcwd())

# Define a function that returns the names of all files in `directory`
# that have the extension `fileext` (given without a dot, e.g. "asc")
def getAllFilesByExt(directory, fileext):
    # Search the directory for files matching "*.<fileext>"
    file_list = glob.glob(os.path.join(directory, "*." + fileext))
    # Create a new list on every call and store only the file names (without the directory)
    allfiles = []
    for file in file_list:
        allfiles.append(os.path.basename(file))
    # Return the list of selected file names
    return allfiles
```
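
The approach described above (iterate over the directory, check each extension) can also be implemented without `glob`. This variant is a sketch that uses `os.listdir()` and `os.path.splitext()`; in addition, it compares extensions case-insensitively, ignores sub-directories and accepts the extension with or without the leading dot. It returns plain file names, so they have to be joined with the directory (`os.path.join()`) before opening.

```python
import os

def getAllFilesByExt(directory, fileext):
    """Return the sorted names of all files in `directory` with extension `fileext`."""
    # Accept the extension with or without the leading dot ("asc" or ".asc")
    if not fileext.startswith("."):
        fileext = "." + fileext
    allfiles = []  # a new list on every call
    # Iterate over all elements in the directory
    for name in os.listdir(directory):
        # Keep regular files whose extension matches (case-insensitive)
        is_file = os.path.isfile(os.path.join(directory, name))
        if is_file and os.path.splitext(name)[1].lower() == fileext.lower():
            allfiles.append(name)
    return sorted(allfiles)
```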

As some information from the user is needed concerning the **input directory and output file**, it should be stored in variables. Again, if the files are not in the same directory as the notebook, the corresponding directory has to be given:

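```python
# Ask the user for the input directory (relative to the notebook or an absolute path)
in_direc = input("Enter the directory of input data: ")
# Ask the user for the name of the output file (without the extension .asc)
result_name = input("Enter the name of the output file: ")
```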

Now that all information is collected, the time measurement should be started:

```python
# Store the start time of the merging process
start = time.time()
```
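
`time.time()` returns the wall-clock time and works for this purpose; `time.perf_counter()` is intended for measuring short durations and uses the clock with the highest available resolution. A sketch with a hypothetical helper `measure_runtime`, timing a placeholder computation instead of the merging:

```python
import time

def measure_runtime(func, *args):
    """Call func(*args) and return its result together with the elapsed time in seconds."""
    t0 = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - t0

# Placeholder workload standing in for the merging of the files
demo_total, seconds = measure_runtime(sum, range(1_000_000))
print("The computation took %.3f sec." % seconds)
```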

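The function defined above has not been applied yet, so it should now be **applied to the input directory** with the **file extension** provided:

```python
# Apply the function to the input directory and print the list of found files (for checking)
print(getAllFilesByExt(in_direc, "asc"))
```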

Results should be written to the output file, so the file should be **opened in the directory of the input files**:

```python
# Open (create or overwrite) the output file in the input directory
result = open(os.path.join(in_direc, result_name + ".asc"), "w")
```

To inform the user at the end, a **counter for the total number of points** should be created:

```python
# Counter for the total number of written points
total_pts = 0
```

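A **loop over all files** will write the output immediately to the output file. To do so, the iteration must **check the filename** ("BigTree.asc" and the possibly already existing output file should not be used) and then **open and read the file**. The **counter** should be increased by the number of points in the file and the **content of the file should be written to the output file**. Do not forget to **close each file**:

```python
# Loop over all .asc files; skip the tree and the (possibly existing) output file
for name in getAllFilesByExt(in_direc, "asc"):
    if name in ("BigTree.asc", result_name + ".asc"):
        continue
    # Open and read the whole file; the with-block closes it automatically
    with open(os.path.join(in_direc, name), "r") as filein:
        content = filein.read()
    # Count the points (one point per line) and write the content to the output file,
    # making sure the next file starts on a new line
    total_pts += len(content.splitlines())
    if content and not content.endswith("\n"):
        content += "\n"
    result.write(content)
```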
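
For a standalone script (e.g. together with `sys.argv`), the whole merging step can be wrapped into one function. A sketch with a hypothetical function name `merge_ascii_files`; it assumes one point per line and writes the merged file into the input directory, skipping `BigTree.asc` and the output file itself.

```python
import os

def merge_ascii_files(in_dir, out_name, exclude=("BigTree.asc",), ext=".asc"):
    """Merge all files with extension `ext` in `in_dir` into `in_dir/out_name`.
    Files listed in `exclude` and the output file itself are skipped.
    Returns the number of written points (one point per line)."""
    names = sorted(name for name in os.listdir(in_dir)
                   if name.lower().endswith(ext) and name not in exclude and name != out_name)
    total_pts = 0
    # Mode "w" creates the output file or overwrites an existing one
    with open(os.path.join(in_dir, out_name), "w") as result:
        for name in names:
            with open(os.path.join(in_dir, name)) as filein:
                content = filein.read()
            total_pts += len(content.splitlines())
            # Make sure the points of the next file start on a new line
            if content and not content.endswith("\n"):
                content += "\n"
            result.write(content)
    return total_pts
```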

The last file to **close** should be the **output file**:

```python
# Close the output file
result.close()
```

Now that all necessary steps are done, the **time measurement** can be stopped:

```python
# Store the end time of the merging process
end = time.time()
```

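The user should be provided with **information** about how many points were written to which file and how long it took:

```python
# Calculate the runtime of the merging
runtime = end - start

# Inform the user about the written points, the output file and the runtime
print("")
print("%d points were written to '%s'." % (total_pts, result_name + ".asc"))
print("The whole process took %.3f sec." % (runtime))
```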