# Premise - If shown a dirt path that provides a quicker route to perform a task, dedicated engineers will turn it into a highway.

In [1]:
import civilpy

# Why Python

Because python provides a simple interface for relaying information. It's easy to use, it's easy to set up pipelines and integrations, it's not sandboxed, and it's repeatable and explicit. Many of the tools we rely on were developed alongside it, or alongside modern tools similar to it, so a large portion of the work is already done.

### Can We Use VBA/Excel?

Not for most of these tasks as you'll see, it's too sandboxed. It's constrained to the functions that Microsoft allows it to do, which often don't allow for direct manipulation of certain external file or data types and/or certain remote connections and functions. I assume it's extensible, but I hate writing stuff in it, so if anyone could get this stuff working in it I'd happily swap for it$^1$, but I legitimately don't believe it's entirely possible, an efficient use of resources, or that the individual willing to attempt it exists within the organization. Sandboxing standard users who may damage a system is a good thing, sandboxing integration tooling, APIs and power users is a bad thing.

1. (That's a lie)

### What about C# and the .NET ecosystem?

I don't personally know it, I have Java experience because I ran scripts on video games with it as a kid and wrote some android apps. I don't know if the GIS, geospatial and engineering libraries are available for it, like I know they are in python, which is heavily used in academic research. From what I've read and understand, it's not as cross-platform or web friendly, and it's difficult to train engineering employees on C style coding, memory management and pointers, none of which are needed to effectively 'use' python. I would be willing to learn it, but I don't think the organization would be willing to expend the resources to enable that or that it would significantly change any outcomes.

I know the general consensus I've seen online, [as well as my personal experience](https://daneparks.com), is that it's much simpler to design and publicly host a webapp using other open source solutions (ruby-on-rails, Nginx, Go, and JS) than it is to compile a functional application into Microsoft's complex ecosystem. Most importantly, throughout my learning experience, python-leaning open-source communities have been much more focused on making high quality, free training available to people trying to learn new tools, while most .NET courses required payment. That isn't an option for people looking to explore "potential" avenues that might not work out, and I think it's a bad indicator for the community. I firmly believe the most valuable aspect of python is the number of people who have tried to learn it, and its resulting ability to be learned quickly with a combination of google and [especially youtube](https://www.youtube.com/user/schafer5).

Ultimately, if I wanted to modify an existing Windows application, C#/.NET would probably be the way to go, but 3rd party companies (Bentley) are often unsupportive of those types of solutions, and in my experience it's generally easier to circumvent 3rd party providers entirely than it is to get them to tailor their software to you or to support an API.

### What's The Difference?

Mostly design philosophy. For years, the prevalent design philosophy for programming user interfaces [has been MVC](https://developer.mozilla.org/en-US/docs/Glossary/MVC), or model-view-controller, which is the underlying philosophy most applications still use today.

![MVC Layout](https://upload.wikimedia.org/wikipedia/commons/a/a0/MVC-Process.svg)

### Web/Cloud Development

Web developers have preferred to use the terms 'frontend' or 'backend' to describe the systems. Basically the takeaway is the code that interacts with the server-side processor is written differently and has a different purpose to the code that works with the client(user)-side processor. User interface languages would be HTML/CSS/JS, things that are rendered or data that is transformed on the "client" side of the connection. Server side languages would be ruby on rails, python, Java, PHP.

## Client Vs. Server

#### Should Civil Applications be written from a client side perspective or a server? 

The main issue you run into is compatibility. Launching a dedicated app with a fully fleshed out, accessible, cross-platform compatible interface requires a decent understanding of your users' devices and use cases, so that you can tailor your code to only the features available on those devices. If instead you write your applications and libraries with a specific server environment in mind, or basically "device agnostic", you don't have to worry as much about it. You may realize this means that you could lose direct access to, say, the iPhone 8 camera's API for taking photos, but a workaround for this would be making the application work with existing photos already on the device, and having the application open the photo application for the user to point it to the correct folder on the device to interact with the photos. Although a bespoke iPhone app written in Swift by a professional team of Apple developers would obviously be better than anything I could write, I haven't seen those getting created at any of the organizations I've been a part of, and I can't see the result being cost effective or well-integrated. I can personally provide a (relatively) cross-platform solution to a number of issues with python today.

#### So Why Server Side Tools

Modern server and database software is insanely powerful, and in the server world the open-source packages are legitimately superior for many applications. With the advent of IoT, linux based servers like Ubuntu or RHEL have exploded in popularity due to users being able to throw them on any device without worrying about license management. Tiny core linux is 11 MB total; the entire operating system is 11 MB, so you could put it on a microusb and integrate it into a streetlamp. Obviously simplicity generally comes with reduced features, but one of the features most if not all do come with is python. Further, the "import only what you need" mentality helps keep these systems running efficiently, so they are rarely bogged down. Due to this, certain sectors have organically grown to basically depend on certain python libraries and other open source packages. This isn't without its headaches, but it's better than not defining a standard at all. GIS, Data Science and Robotics all have very developed python libraries written by skilled professionals from that space who learned how to program. Nothing is stopping roadway, structural or hydraulic engineering from developing the same kinds of tools today, except existing policies geared towards closed and incompatible workflows.

Software written in python may be able to be improved or rewritten in other languages to be made to be more efficient. For now, I think it's important to get working software tools and standards created, which we could optimize later. The fact is currently, the tools don't exist at all, some not even in a disjointed way. It's easier to teach an engineer python than to teach them .NET, and it's more realistic to teach an engineer how to __USE__ python tools than to teach a .NET developer the full workflow and duties of a roadway engineer.

#### So what's the difference between client and server applications?

Primarily the resources available to the program, the permissions it operates under, and the APIs it has access to with regard to other software. When you run a jupyter notebook, it boots up a server on your local computer that you're able to connect to with your web browser as the client. This process is device agnostic and works on windows, mac and linux machines. Basically notebooks replace the view and controller in the MVC diagram, giving the user direct access to the server side model. The model can be anything: a .txt file, a .csv file, a .pdf, a database connection, a .jpg, a .dwg, a datastream or a webhook. This is what separates python from Excel/VBA: its ability to seamlessly function and integrate with essentially any 'model' or data type on any kind of system with little to no work on the part of the developer.
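
As a minimal sketch of the "model can be anything" idea, here are two different kinds of model loaded with nothing but the standard library. The in-memory strings stand in for real files, and the station/elevation values and setting names are made up for illustration.

```python
import csv
import io
import json

# In-memory text stands in for files on disk so the sketch runs anywhere;
# the values below are hypothetical.
csv_text = "station,elevation_ft\n10+00,712.4\n10+50,713.1\n"
json_text = '{"project": "example", "units": "ft"}'

# A .csv "model": each row becomes a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# A .json "model": the whole document becomes a dict.
settings = json.loads(json_text)

print(rows[0]["station"], rows[0]["elevation_ft"])
print(settings["units"])
```

Note that `csv` hands every value back as text, so `rows[0]["elevation_ft"]` is the string `'712.4'` until you convert it with `float()`.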

Since the notebook server is being run on your local machine, there's no external access, an alternative to this would be to have a centralized server where individuals had their own user profiles, as well as shared folders between all users, network folders also accomplish the same purpose. Generally this can be done in a way that builds off the existing userprofiles and permissions that already exist. If each user is given an account on a centralized server, they could open the terminal and run the following command to access the server.

`ssh -NL [PORT]:localhost:[PORT] [USER]@[SERVERNAME]`

where `[PORT]` is replaced by the port the jupyter server is listening on, `[USER]` is replaced by the user name (so for me `dparks1`), and `[SERVERNAME]` is replaced by the name of the local server listening for connections. This server could be set up to accept internal connections only, meaning organization tools/data would stay internal to the organization, which isn't true of modern cloud systems.

If they were to then open their web browser and navigate to `http://localhost:[PORT]` in the browser's address bar, they would find themselves within a jupyter notebook environment on the server.

#### Two design philosophies to help with good python programming

The four original rules, from the Bell System Technical Journal:

1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
2. Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.
4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.

and, __The zen of python__

1. Beautiful is better than ugly.
1. Explicit is better than implicit.
1. Simple is better than complex.
1. Complex is better than complicated.
1. Flat is better than nested.
1. Sparse is better than dense.
1. Readability counts.
1. Special cases aren't special enough to break the rules.
1. Although practicality beats purity.
1. Errors should never pass silently.
1. Unless explicitly silenced.
1. In the face of ambiguity, refuse the temptation to guess.
1. There should be one-- and preferably only one --obvious way to do it.
1. Although that way may not be obvious at first unless you're Dutch.
1. Now is better than never.
1. Although never is often better than *right* now.
1. If the implementation is hard to explain, it's a bad idea.
1. If the implementation is easy to explain, it may be a good idea.
1. Namespaces are one honking great idea -- let's do more of those

Neither of these is meant to be a set of hard and fast rules; they're simply intended to direct your decision making at a general level.
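
Here's a rough sketch of the first two Bell rules applied to python functions: each function does one job, and the output of one is plain data that feeds the next. The function names and elevation values are hypothetical, chosen only for illustration.

```python
def parse_elevations(lines):
    """Turn text lines into floats, skipping blank lines. One job only."""
    return [float(line) for line in lines if line.strip()]


def to_metres(feet_values):
    """Convert feet to metres. Knows nothing about where the numbers came from."""
    return [round(value * 0.3048, 3) for value in feet_values]


def summarize(values):
    """Return plain data (a dict) so another program can consume it."""
    return {"count": len(values), "min": min(values), "max": max(values)}


raw_lines = ["712.4", "713.1", "", "709.8"]

# The output of each step is the input to the next.
summary = summarize(to_metres(parse_elevations(raw_lines)))
print(summary)
```

Because none of the three functions prints or prompts for anything on its own, any one of them can be swapped out or reused in a pipeline nobody has thought of yet.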

## What's an API?

Application programming interface, [essentially the "list of commands"](https://www.youtube.com/watch?v=s7wmiS2mSXY) available to be run within the development environment. APIs are commonly thought of in terms of client-server relationships, the most famous being the HTTP methods (GET, POST) that the internet and webservers function on. Engineering schools were still teaching Putty/HTML coding when I was there. It seems like two missing elements have contributed to the decline of the modern civil engineer's technical skillset: a firm understanding of Object Oriented Programming, and of the various protocols, like HTTP, that have effectively built the digital world but have largely functioned in the background and been ignored by our field entirely. They truly are as critical to our modern roles as any individual formula taught in statics; they're the more modern building blocks of the engineering systems we function in.
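
To make the GET/POST idea concrete, here's a small sketch that starts a throwaway HTTP server on the local machine and talks to it using only the standard library. The `BridgeHandler` class, the `/bridges` path and the JSON payloads are all hypothetical; a real API would define its own.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class BridgeHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers GET and POST with small JSON bodies."""

    def _send_json(self, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET asks the server for something
        self._send_json({"method": "GET", "path": self.path})

    def do_POST(self):
        # POST sends the server something; read it and echo it back
        length = int(self.headers.get("Content-Length", 0))
        received = json.loads(self.rfile.read(length))
        self._send_json({"method": "POST", "received": received})

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet


# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), BridgeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# Ignore any system proxy settings for this purely local call
opener = build_opener(ProxyHandler({}))

with opener.open(f"{base_url}/bridges") as response:
    get_reply = json.loads(response.read())

post_request = Request(
    f"{base_url}/bridges",
    data=json.dumps({"name": "example"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with opener.open(post_request) as response:
    post_reply = json.loads(response.read())

server.shutdown()
server.server_close()

print(get_reply)
print(post_reply)
```

The client never sees how the server is written. It only needs to know which paths and methods exist and what shape the data takes, which is the whole point of an API.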

## Object Oriented Programming

I don't know why, but it felt like every college level math class I took started out with a review of trigonometry, so here are the trig derivatives,

![image.png](attachment:image.png)

Now that you're set, let's get into object oriented programming! The reason I'm demoing in python is because it's accessible to a beginner; as an engineer I assume you've used matlab, or definitely excel. Functions in python work similarly to excel, but the important thing to realize is that everything is an 'Object'.

In [2]:
type(3)

int

In python there is a function called `type()`. This function returns the type of the data that you passed into it. `int` is short for integer, because the number 3 is a whole number without a decimal representation. Excel also has a `TYPE()` function which does the same job, but returns one of five numeric codes,

```Excel
=TYPE(100) // returns 1
=TYPE("apple") // returns 2
=TYPE(TRUE) // returns 4
=TYPE({1;2;3}) // returns 64
```

Right away we can see one of the common criticisms of working in excel: rather than returning "Human Readable" results to me, it's returning a code which requires decoding. If I want to know what those results are while looking at the sheet, a common solution would be to provide the following formula in another cell.

```Excel
=IF(TYPE(A1)=1,"Number",IF(TYPE(A1)=2,"Text",IF(TYPE(A1)=4,"Logical Value",IF(TYPE(A1)=16,"Error Value","Array"))))
```

and if you think that's fun and cool... idk what to tell you.

Even though that entire function is completely unnecessary in python because it returned something readable to you in the first place, here's what it would look like;

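```python
# excel_type_code stands in for the number Excel's TYPE(A1) would return
excel_type_code = 4

if excel_type_code == 1:
    print('Number')
elif excel_type_code == 2:
    print('Text')
elif excel_type_code == 4:
    print('Logical Value')
elif excel_type_code == 16:
    print('Error Value')
else:
    print('Array')
```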

which still isn't great to be honest, but at least it's a lot easier to read and understand from a review standpoint.
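
For comparison, here's a quick sketch of what python's own `type()` gives back for roughly the same four example values used in the Excel block. The list of values is my own stand-in; a python `list` is only the closest analogue to an Excel array constant.

```python
# Rough python equivalents of the four Excel examples above
values = [100, "apple", True, [1, 2, 3]]

# type(value).__name__ is the short, readable name of each value's type
type_names = [type(value).__name__ for value in values]

print(type_names)  # ['int', 'str', 'bool', 'list']
```

No lookup table needed, the names are readable as they come.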

### Python vs. C#

I occasionally have to diverge from my overarching theme to address what I think people will bring forward as valid criticisms of python. Some would say that not having objects fixed to a specific type causes issues, and they are correct; if that is happening, the code can be modified to be statically typed if needed. Another argument against python is that it is slow, which can also be true, but it's much, much faster with big data than excel, for example, because there's less "computational overhead". If you were to run into an issue where the speed of python was causing problems, those particular functions could always be rewritten in a faster language, or pushed to a faster server with more computational resources available. There are always multiple solutions to a problem, which is why we would be remiss to limit ourselves strictly to one.
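
As a sketch of what "more typed" python can look like: type hints document what a function expects, and an explicit check can enforce it when the code runs. The hints by themselves are not enforced by python; external checkers such as mypy read them. The `span_length_ft` function is hypothetical.

```python
def span_length_ft(start_station_ft: float, end_station_ft: float) -> float:
    """Return the distance between two stations, in feet."""
    for value in (start_station_ft, end_station_ft):
        # bool is a subclass of int in python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}")
    return float(end_station_ft - start_station_ft)


print(span_length_ft(1000.0, 1250.5))

try:
    # A string that merely looks like a number is refused
    span_length_ft("1000", 1250.5)
except TypeError as error:
    print(error)
```

Whether the runtime check is worth it depends on the function; for anything feeding a design calculation, I'd lean towards failing loudly.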

Why not use a language that's more "professional" or 'secure'? Here is how you tell a terminal to display the text `Hello World!` in the terminal in C#;

``` C#
// Hello World! program
namespace HelloWorld
{
    class Hello {         
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello World!");
        }
    }
}
```

and here is the same program in python,

``` python
print('Hello World!')
```

This is purely coming from a "What are we trying to get from this system?" perspective; I have no ill will towards C# developers, or stand-alone applications in general. For civil engineering needs, I'm theorizing we can assume a slightly more talented user base, but not people able to commit the resources C# requires. That lets us mostly focus on in-memory manipulation of data, and on displaying critical information to the engineer in as few steps as possible, while still giving them access to all the 'software functions' they need in their current workflow, via a command line interface.

### Types in python

Variables can store data of different types, and different types can do different things.

Python has the following data types built-in by default, in these categories:

Text Type:	`str`  
Numeric Types:	`int`, `float`, `complex`  
Sequence Types:	`list`, `tuple`, `range`  
Mapping Type:	`dict`  
Set Types:	`set`, `frozenset`  
Boolean Type:	`bool`  
Binary Types:	`bytes`, `bytearray`, `memoryview`  
None Type:	`NoneType`  

to assign a variable in python, use the following syntax;

In [3]:
x = 3

Since we're working in a notebook, that value is available within any cell, similar to how matlab works.

In [4]:
print(x)

3


Remember how everything is an object? Objects can have attributes and functions, and which ones they have depends on the type of object they are. A type can also pick up attributes and functions from a parent type, a concept known as __inheritance__. This is one of the ways object oriented languages help us to quickly build models of the real world. One of the primary object types is a string, or text object;

In [5]:
x_as_string = '3'
print(x_as_string)

3


Because we used the `'` character around the number three, python read the value as text instead of as an integer. Loading objects as strings instead of integers can have unintended consequences when you apply a function meant for one data type to another, for instance;

In [6]:
print(x * 3)

9


In [7]:
print(x_as_string * 3)

333


So as you can see, when you apply the `*` operator to two numbers in python, it performs simple multiplication; however, when applied to a string, it repeats the string 3 times. [Here are the other operators in python and their effects](https://www.w3schools.com/python/python_operators.asp). It's for sure going to do wacky stuff that frustrates you at first, but it's far, far more likely for those things to result in fatal errors that crash your cell, rather than minor errors that pass unnoticed.
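
A short sketch of that "fails loudly" behavior, re-using the same two variables: mixing text and a number with `+` stops with an error instead of guessing, and an explicit conversion gets you back to arithmetic.

```python
x = 3
x_as_string = "3"

try:
    # python refuses to guess whether you meant 6 or '33'
    x_as_string + x
except TypeError as error:
    print(f"TypeError: {error}")

# Convert explicitly once you know the text really is a number
converted = int(x_as_string)
print(converted * x)  # 9
```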

In [8]:
x.replace('3', '15')

AttributeError: 'int' object has no attribute 'replace'

In [9]:
x_as_string.replace('3', '15')

'15'
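
The same rule, that the type decides which attributes and functions an object has, applies to types you define yourself. Here's a minimal sketch of inheritance; the `Structure` and `Bridge` classes and their attributes are hypothetical, invented only for illustration.

```python
class Structure:
    """Hypothetical base type shared by anything we might inventory."""

    def __init__(self, name, length_ft):
        self.name = name
        self.length_ft = length_ft

    def describe(self):
        return f"{self.name}: {self.length_ft} ft"


class Bridge(Structure):
    """A Bridge inherits name, length_ft and describe() from Structure."""

    def __init__(self, name, length_ft, span_count):
        super().__init__(name, length_ft)
        self.span_count = span_count

    def average_span_ft(self):
        return self.length_ft / self.span_count


bridge = Bridge("Example Creek Bridge", 240, 3)

print(bridge.describe())              # inherited from Structure
print(bridge.average_span_ft())       # defined only on Bridge
print(isinstance(bridge, Structure))  # a Bridge is also a Structure
```

A plain `Structure` has no `average_span_ft()`, for the same reason the integer `3` has no `replace()`.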

Beyond the basic number and text values, python has advanced data types that provide powerful functions. One of my favorites to implement is a `dict`, or dictionary object, which has the ability to rapidly look up values based on previously defined definitions. Remember the excel if statement from before?

In [10]:
excel_type_conversion_dict = {
    1: 'Number',
    2: 'Text',
    4: 'Logical Value',
    16: 'Error Value',
    64: 'Array'
}


excel_type_conversion_dict[4]

'Logical Value'

If you pass the value of `4` into the `excel_type_conversion_dict` it returns the corresponding value, effectively replacing a very long `if/elif` statement and extending it indefinitely if more options are needed; python has no problem quickly parsing through dicts with thousands of keys and values. You'll notice that I'm using long, explicit variable names that describe what the object actually contains. That's because python programmers _want_ people to read and check their code. Now that this dict has been defined once, it's very easy to copy and paste it, or import the definitions into other python software. Better yet, people familiar with the JSON standard may realize that a python dict looks pretty similar to a JSON file. Python contains libraries to load JSON files directly into memory in fewer lines of code than our C# hello_world from earlier.
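
Here's a small sketch of that dict/JSON relationship using the standard `json` module, with one catch worth knowing: JSON object keys are always text, so the integer keys come back as strings and need converting.

```python
import json

excel_type_conversion_dict = {
    1: "Number",
    2: "Text",
    4: "Logical Value",
    16: "Error Value",
    64: "Array",
}

# dict -> JSON text, then JSON text -> dict again
json_text = json.dumps(excel_type_conversion_dict)
reloaded = json.loads(json_text)

# JSON keys are strings, so 4 came back as "4"; convert them back to int
restored = {int(code): label for code, label in reloaded.items()}

print(reloaded["4"])
print(restored[4])

# .get() supplies a fallback instead of raising an error for unknown codes
print(excel_type_conversion_dict.get(8, "Unknown"))
```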

Lots of data is available directly online in JSON format, for instance [FIPS conversions](https://gist.github.com/wavded/1250983/bf7c1c08f7b1596ca10822baeb8049d7350b0a4b#file-statecodetofips-json). Using just a link to the file, python can load that file into memory. With the correct organizational controls (dedicated urls) we could have standards at dedicated locations for applications to pull the most up to date standard from. Sharepoint doesn't do this very well; every Linux and open source package repository does, usually by using `url/latest` or by keeping old revisions under different urls.

In [11]:
from urllib.request import urlopen
import json

url = "https://gist.githubusercontent.com/wavded/1250983/raw/bf7c1c08f7b1596ca10822baeb8049d7350b0a4b/stateToFips.json"

response = urlopen(url)
fips_data_json = json.loads(response.read())

fips_data_json['Ohio']

'39'

## How can python simplify this process?

By encouraging development of standardized file systems and checks, as well as defining data structures at a fundamental level in a way the organization controls. One of the core types of software is a relational database. Most engineers picture a spreadsheet in their heads, which is slightly different.
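
To show what "slightly different" means, here's a sketch using python's built-in `sqlite3` module and an in-memory database. Unlike one big spreadsheet, the data is split into two tables that reference each other, and a query joins them on demand. The table names, columns and rows are all hypothetical.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")  # throwaway database, nothing on disk
demo_cursor = demo_conn.cursor()

# Two related tables instead of one wide sheet: each bridge points at a county
demo_cursor.executescript("""
CREATE TABLE counties (
    fips TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE bridges (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    county_fips TEXT NOT NULL REFERENCES counties(fips)
);
INSERT INTO counties VALUES ('001', 'County A'), ('003', 'County B');
INSERT INTO bridges (name, county_fips) VALUES
    ('Bridge 1', '001'), ('Bridge 2', '001'), ('Bridge 3', '003');
""")

# The relationship is resolved when the question is asked
demo_cursor.execute("""
    SELECT counties.name, COUNT(bridges.id)
    FROM counties
    JOIN bridges ON bridges.county_fips = counties.fips
    GROUP BY counties.name
    ORDER BY counties.name
""")
bridges_per_county = demo_cursor.fetchall()
demo_conn.close()

print(bridges_per_county)
```

The county name is stored exactly once, so fixing a typo in it fixes every bridge that points at it.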

In [12]:
with open("../secrets/secrets.json", "r") as secrets_file:
    creds = json.load(secrets_file)
creds

{'PG_UN': 'dane',
 'PG_DB_NAME': 'civilpy',
 'SSH_PORT': 2271,
 'PG_DB_PW': 'oDbVx%TuuW8r^SXACy#EH7dtovIs$S',
 'SSH_PKEY': 'C:\\Users\\drpar\\.ssh\\id_rsa',
 'SSH_USER': 'dane',
 'SSH_HOST': 'daneparks.com',
 'DB_HOST': 'localhost',
 'LOCALHOST': 'localhost',
 'PORT': 5432}

In [13]:
import psycopg2 as pg
from sshtunnel import SSHTunnelForwarder

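def ssh_into_postgres(creds):
    """
    Open an ssh tunnel to the database server and connect to postgres
    through it, to gather data from the database

    :param creds: dictionary of necessary parameters to connect to the database
    :return: an open psycopg2 connection, or None if the connection failed
    """
    try:
        # Forward a local port to the postgres port on the remote host
        ssh_tunnel = SSHTunnelForwarder(
            (creds["SSH_HOST"], creds["SSH_PORT"]),
            ssh_username=creds["SSH_USER"],
            ssh_private_key=creds["SSH_PKEY"],
            # NOTE: the key path is also passed as the passphrase; a
            # passphrase-protected key would need its own entry in secrets.json
            ssh_private_key_password=creds["SSH_PKEY"],
            remote_bind_address=(creds["DB_HOST"], creds["PORT"])
        )

        ssh_tunnel.start()

        # Connect to the local end of the tunnel, not to the remote host
        conn = pg.connect(
            host=creds["LOCALHOST"],
            port=ssh_tunnel.local_bind_port,
            user=creds["PG_UN"],
            password=creds["PG_DB_PW"],
            database=creds["PG_DB_NAME"]
        )

        return conn

    except Exception as error:
        # Report the underlying error instead of hiding it
        print(f"Connection failed ({error}); ensure you have the correct values in the secrets/secrets.json file")
        return None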

In [14]:
conn = ssh_into_postgres(creds)

In [None]:
try:
    ssh_tunnel = SSHTunnelForwarder(
        (creds["SSH_HOST"], creds["SSH_PORT"]),
        ssh_username=creds["PG_UN"],
        ssh_private_key=creds['SSH_PKEY'],
        ssh_private_key_password=creds["SSH_PKEY"],
        remote_bind_address=(creds["DB_HOST"], creds['PORT'])
    )

    ssh_tunnel.start()

    conn = pg.connect(
        host=creds["LOCALHOST"],
        port=ssh_tunnel.local_bind_port,
        user=creds["PG_UN"],
        password=creds["PG_DB_PW"],
        database=creds["PG_DB_NAME"]
    )
except Exception as error:
    # Errors should never pass silently
    print(f"Connection failed: {error}")

In [None]:
conn.execute()

In [15]:
ssh_tunnel = SSHTunnelForwarder(
        (creds["SSH_HOST"], creds["SSH_PORT"]),
        ssh_username=creds["SSH_USER"],
        ssh_private_key=creds['SSH_PKEY'],
        ssh_private_key_password=creds["SSH_PKEY"],
        remote_bind_address=(creds["DB_HOST"], creds['PORT'])
    )


In [16]:
ssh_tunnel.start()

In [17]:
conn = pg.connect(
            host=creds["LOCALHOST"],
            port=ssh_tunnel.local_bind_port,
            user=creds["PG_UN"],
            password=creds["PG_DB_PW"],
            database=creds["PG_DB_NAME"]
)

In [18]:
conn.execute("SELECT * FROM bridges")

AttributeError: 'psycopg2.extensions.connection' object has no attribute 'execute'
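
That error is expected: psycopg2 follows python's DB-API, where queries run through a cursor created from the connection (`conn.cursor()`), not on the connection itself. Here's a minimal sketch of the pattern. The built-in `sqlite3` module stands in for the PostgreSQL connection so it runs without a server, and the `bridges` columns and rows are hypothetical. With psycopg2 the placeholder style is `%s` rather than `?`.

```python
import sqlite3

# sqlite3 stands in for the PostgreSQL connection so this runs anywhere
stand_in_conn = sqlite3.connect(":memory:")

# Queries go through a cursor, not through the connection object
cursor = stand_in_conn.cursor()
cursor.execute("CREATE TABLE bridges (id INTEGER PRIMARY KEY, name TEXT)")
cursor.executemany(
    "INSERT INTO bridges (name) VALUES (?)",
    [("Bridge 1",), ("Bridge 2",)],
)
stand_in_conn.commit()  # writes are not saved until committed

cursor.execute("SELECT id, name FROM bridges ORDER BY id")
bridge_rows = cursor.fetchall()  # list of tuples, one per row

cursor.close()
stand_in_conn.close()

print(bridge_rows)
```

Against the real database, the equivalent would be `cursor = conn.cursor()`, then `cursor.execute("SELECT * FROM bridges")` and `cursor.fetchall()`.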

In [None]:
conn.close()
ssh_tunnel.stop()

## Where Are The Issues/Hangups?

__3 Primary Pain Points to solve__

1. "Approved Software Lists"; they're difficult to maintain at a 'package' level, like impossibly difficult in open source without just blanket approving a parent package/product like "Ubuntu", which would implicitly approve any dependency it's built on in perpetuity? Most organizational software policies contain confusing language like this, at least to me. It doesn't feel like these standards were written with the more modern modular tooling and software packages available today in mind; they're written with regard to full, packaged applications, which is an entirely different realm.

    To alleviate some of the risk with this, open-source systems are generally designed to be more "ephemeral" and segmented. Some very popular examples of this are "container" systems like Docker or Kubernetes. These systems rely on extensive APIs which they try to keep as consistent as possible; as the products are changed and updated they're kept on release schedules and generally come with fantastic documentation.


2. Lack of organizational standards pertaining to file systems; no matter what system environment, programming language, or software package we're talking about, certain general conventions apply. Microsoft has traditionally been more willing to accommodate a wide variety of file system hierarchies and, in general, the ones I see are heavily catered to human use, and by extension very poorly designed for scripting or bots. Working in windows environments has historically inspired certain conventions, some of which I think are mostly still around as vestigial structures.

    Things like encoding documents with 15 digit numbers are ridiculous, and nested file systems generated by a bot where 60% of the folders are never used are pervasive in our industry. Bad metadata, nothing checking it, inconsistent file names and folder locations due to human error. If you were to hand a professional programmer 90% of the file systems I've encountered on a drive they'd triple their hourly rate on the spot. We can't feign ignorance and act like it's not our mess as an industry to clean up. These are more often than not records and documents of critical structures that, with little effort, could be preserved indefinitely. I have zero faith that our industry across the board isn't destroying hundreds of gigabytes of unrecoverable, important data every day via poorly implemented content management systems, entirely accidentally.


3. Not understanding the foundational principles of the tools we use every day. At some point our industry seemed content to push our work onto 3rd party software providers that have, frankly, done a mostly terrible job and have increasingly displayed concerning behaviors at an institutional level. At a most basic level, engineers understand economics and competition in the free marketplace. There are two providers for CAD software in our industry. One of them is more expensive, has a wide variety of APIs and tools available that can be integrated into their products easily, and is generally more pleasant to work with. The other doesn't let you import an object from one of their software packages to another (OpenRoads to OpenBridges) and is overly complicated to the point that organizations struggle to maintain effective drafting environments.

    Further on this point, the existence of all of our drafting/design data as .dgn, .dwg or other proprietary file formats presents an issue with archival records. CADD files are proprietary and can only be read by the system that created them. Many CADD files can't be read by earlier versions of __THE SAME SOFTWARE THAT CREATED THEM__. These companies are clearly making an attempt at "locking" their clients into their systems and subscription based services, which I've seen little to no pushback to from the general industry. I fully believe their attempts at locking in their clients have gotten to the point that it's clear we need to start making a real effort towards dumping files into plain text formats, or at the very least a filesystem backup we have control over. As it stands with ProjectWise, we're using a CADD system to maintain non-CADD files, and we've lost the ability to write scripts to access and manipulate our own organizational files like we previously did with shared filesystems and archival friendly formats, because they are "secured" in Bentley's environment. We should be flat out demanding these services, at a minimum, as a function of their software. None of these issues were issues until companies made a business decision to make them issues under the guise of 'improving' their products. The products are significantly worse from a consumer perspective than even 5 years ago, and the added functions don't overcome the negative business practices.

People may push back and say "Well what legitimate alternatives are available?" and there aren't any. I'm not telling you to stop using Autodesk or Bentley products at all, and I'm especially not telling you to stop using filesystems or Windows, or to learn git and try to replace ProjectWise with it. What I'm saying is that if you look into open source software, specifically [The unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy#:~:text=The%20Unix%20philosophy%20emphasizes%20building,as%20opposed%20to%20monolithic%20design.), and by extension the design philosophies of the various open-source packages it inspired, you'll see that it's really much more about keeping everything simple, and setting easily implementable standards that can be checked for at a glance. Better yet, they already set out and defined most of these general principles for complicated systems, and told us we can take them, so really all any modern engineer has to do is read them once and keep them in mind. This is a critical aspect of being an informed software consumer and really understanding what we're paying for, and what sacrifices we're making to use certain products.
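
As one sketch of a standard "that can be checked for at a glance": once a file naming convention is written down, a few lines of python can check every file against it. The convention here (`<6 digit project>_<2 letter discipline>_<3 digit sheet>.<ext>`) and the example paths are entirely hypothetical; the point is only that a simple, explicit rule is trivial to automate.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention, e.g. "105432_BR_001.pdf"
FILE_NAME_PATTERN = re.compile(
    r"^(?P<project>\d{6})_(?P<discipline>[A-Z]{2})_(?P<sheet>\d{3})\.(?P<ext>pdf|dgn|dwg)$"
)


def check_file_names(paths):
    """Split paths into (valid, invalid) lists according to the pattern."""
    valid, invalid = [], []
    for path in paths:
        # Only the file name is checked, not the folders above it
        name = PurePosixPath(path).name
        if FILE_NAME_PATTERN.match(name):
            valid.append(path)
        else:
            invalid.append(path)
    return valid, invalid


example_paths = [
    "projects/105432/105432_BR_001.pdf",
    "projects/105432/105432_RD_014.dgn",
    "projects/105432/final FINAL v2.pdf",
    "projects/105432/105432_br_002.pdf",
]

valid_paths, invalid_paths = check_file_names(example_paths)
print("valid:", valid_paths)
print("needs attention:", invalid_paths)
```

Run nightly against a shared drive, a check like this would catch inconsistent names the day they appear instead of years later.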