<a href="https://colab.research.google.com/github/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/GSI2DXF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Convert GSI data to CAD drawing

[GSI](https://jaisonjustus.wordpress.com/2011/02/13/leica-geosystem-file-format-gsi-file/) is a file format used by Leica instruments. It is a text file with fixed-width fields, but the number of fields can vary from row to row. There are two variants of GSI, GSI8 and GSI16. The only difference between them is the field size. For this class, we'll use only the GSI16 variant, which is the more common one nowadays.

In [14]:
!wget -q -O sample_data/labor.gsi https://raw.githubusercontent.com/OSGeoLabBp/tutorials/master/english/data_processing/lessons/code/labor.gsi

##Field structure

The first few lines of the file:

```
*110001+0000000000000101 81..10+0000000000119197 82..10+0000000000118827 83..10+0000000000120014
*110002+0000000000000102 81..10+0000000000119192 82..10+0000000000123834 83..10+0000000000120019
*110003+0000000000000103 81..10+0000000000119191 82..10+0000000000130036 83..10+0000000000120000
*110004+0000000000000104 81..10+0000000000119196 82..10+0000000000136218 83..10+0000000000119988
```

The asterisk ('*') at the beginning of each line marks the GSI16 variant. Fields are 23 characters long and are separated by a space. The first six characters of each field encode information about the value, which follows the +/- sign. Field values are zero-padded on the left.
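
The two rules above (leading asterisk, 23-character fields) can be turned into a quick sanity check before parsing. This is only a sketch: `is_gsi16_line` is a hypothetical helper name, not part of the lesson's code, and it assumes that field values contain no spaces, as in the sample file.

```python
def is_gsi16_line(line):
    """Return True if the line looks like a GSI16 record (illustrative check)."""
    line = line.rstrip("\r\n")
    if not line.startswith("*"):          # GSI16 lines start with an asterisk
        return False
    fields = line[1:].split()             # assumption: no spaces inside values
    # every field must be 23 characters long with a sign at position 7
    return bool(fields) and all(len(f) == 23 and f[6] in "+-" for f in fields)

sample = "*110001+0000000000000101 81..10+0000000000119197"
print(is_gsi16_line(sample))
```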

We will use only the 6th character of the coded part, which defines the unit of the value:

```
0: Meter (last digit: 1mm)
1: Feet (last digit: 1/1000ft)
2: 400 gon
3: 360° decimal
4: 360° sexagesimal
5: 6400 mil
6: Meter (last digit: 1/10mm)
7: Feet (last digit: 1/10000ft)
8: Meter (last digit: 1/100mm)
```

Example: 81..10+0000000000119197 (there is no decimal point in the value!)

|Position|Description|
|--------|-----------|
|1-2     |Word index (type of data, e.g. 11-point id, 81-easting, 82-northing, 83-elevation)|
|3-6     |Information related to the data, e.g. the 6th character defines the unit (0: meters, last digit mm)|
|7       |sign for value|
|8-23    |zero padded value|

The field above holds an easting coordinate in meters (not feet) with millimeter resolution: 119.197 m
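
The position table can be checked by slicing the example field by hand. The snippet below is a minimal sketch; `split_field` and the dictionary keys are names invented for this illustration, and the division by 1000 is valid only for unit code 0 (millimeters).

```python
def split_field(field):
    """Split one 23-character GSI16 field into its parts (illustrative helper)."""
    return {
        "word_index": field[0:2],   # positions 1-2
        "info": field[2:6],         # positions 3-6
        "sign": field[6],           # position 7
        "value": field[7:23],       # positions 8-23, zero padded
    }

parts = split_field("81..10+0000000000119197")
unit_code = parts["info"][-1]                        # 6th character of the field
meters = int(parts["sign"] + parts["value"]) / 1000  # unit code 0: last digit is 1 mm
print(parts["word_index"], unit_code, meters)        # 81 0 119.197
```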

##Reading and parsing GSI file

First we will write three functions: one to split a line into fields, and two more to extract the field values in meters.

In [2]:
def line2fields(line):
    """ split GSI line into fields """
    fields = []                       # result list
    i = 1                             # start from one to skip the asterisk at the beginning of the line
    while i < len(line):
        fields.append(line[i:i+23])   # get the next 23 character long field
        i += 24                       # move position after space
    return fields


Let's test our function:

In [3]:
line2fields("*110001+0000000000000101 81..10+0000000000119197 82..10+0000000000118827 83..10+0000000000120014")

['110001+0000000000000101',
 '81..10+0000000000119197',
 '82..10+0000000000118827',
 '83..10+0000000000120014']

The second function gets the coordinate value from a field, as follows:

In [4]:
# divisors to convert GSI values to meters, indexed by unit code (1 ft = 0.3048 m)
#     mm   1/1000 ft      gon    DEG    DMS    mil    1/10 mm  1/10000 ft   1/100 mm
u = [1000, 1000 / 0.3048, 'N/A', 'N/A', 'N/A', 'N/A', 10000, 10000 / 0.3048, 100000]

def field2num(field):
    """ get field value in meters """
    s = 1 if field[6] == "+" else -1  # sign of coord
    d = u[int(field[5])]              # factor to meters
    w = int(field[7:])                # value in field
    return s * w / d                  # value in meters with sign

# test
field2num("81..10+0000000000119197")

119.197
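
Note that the `'N/A'` entries in `u` make `field2num` fail with a `TypeError` if it is ever given an angular field. A possible variant with an explicit error message is sketched below. The names `DIVISOR` and `field2meters` are hypothetical, and the sketch assumes the international foot (0.3048 m).

```python
METERS_PER_FOOT = 0.3048   # assumption: international foot

# divisor that turns the integer field value into meters, by GSI unit code
DIVISOR = {
    0: 1000,                       # last digit 1 mm
    1: 1000 / METERS_PER_FOOT,     # last digit 1/1000 ft
    6: 10000,                      # last digit 1/10 mm
    7: 10000 / METERS_PER_FOOT,    # last digit 1/10000 ft
    8: 100000,                     # last digit 1/100 mm
}

def field2meters(field):
    """Return the signed field value in meters, or raise for angular units."""
    code = int(field[5])
    if code not in DIVISOR:
        raise ValueError(f"unit code {code} is not a length unit")
    sign = -1 if field[6] == "-" else 1
    return sign * int(field[7:]) / DIVISOR[code]

print(field2meters("81..10+0000000000119197"))   # 119.197
```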

The third function collects the coordinates from the fields of a line, as follows:

In [5]:
import re

def fields2coo(fields):
    """ get coords from fields of a GSI line """
    coords = {}                             # initialize coordinates dictionary
    coords[0] = re.sub('^0+', '', fields[0][7:])  # point id always first, remove leading zeros
    for field in fields[1:]:
        if re.match('8[123]', field):       # or re.search('^8[123]', field)
            i = int(field[1])               # 1/2/3 easting/northing/elevation
            coords[i] = field2num(field)    # the coordinate
    return coords

# test
fields2coo(line2fields("*110001+0000000000000101 81..10+0000000000119197 82..10+0000000000118827 83..10+0000000000120014"))

{0: '101', 1: 119.197, 2: 118.827, 3: 120.014}

Finally, let's write the code to process the GSI input file.

In [6]:
coord_list = []
with open('sample_data/labor.gsi') as fp:
  for line in fp:
    coords = fields2coo(line2fields(line.strip('\n')))  # remove EOL before processing
    if len(coords) == 4:                                # 3D data found?
      coord_list.append(coords)

coord_list[0:5]                                          # first five points

[{0: '101', 1: 119.197, 2: 118.827, 3: 120.014},
 {0: '102', 1: 119.192, 2: 123.834, 3: 120.019},
 {0: '103', 1: 119.191, 2: 130.036, 3: 120.0},
 {0: '104', 1: 119.196, 2: 136.218, 3: 119.988},
 {0: '105', 1: 119.199, 2: 141.225, 3: 119.989}]

##Creating CSV output

To construct a map from the points in the GSI file, we need to import them into CAD/GIS software. So let's export the coordinates to a CSV file.

In [7]:
with open('sample_data/labor.csv', 'w') as fo:
  for point in coord_list:
    print(f'{point[0]},{point[1]:.3f},{point[2]:.3f},{point[3]:.3f}', file=fo)

Try to load the newly created CSV file into QGIS.
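
A header row can make the columns easier to identify when the file is imported. The sketch below uses the standard `csv` module and writes to an in-memory buffer so it runs without the sample file. The column names and the `write_points` helper are assumptions made for this illustration, not something the lesson prescribes.

```python
import csv
import io

# two points in the same dictionary layout as coord_list
points = [{0: '101', 1: 119.197, 2: 118.827, 3: 120.014},
          {0: '102', 1: 119.192, 2: 123.834, 3: 120.019}]

def write_points(points, fo):
    """Write points as CSV with a header row (hypothetical helper)."""
    writer = csv.writer(fo, lineterminator="\n")
    writer.writerow(["id", "easting", "northing", "elevation"])
    for p in points:
        writer.writerow([p[0], f"{p[1]:.3f}", f"{p[2]:.3f}", f"{p[3]:.3f}"])

buf = io.StringIO()            # stands in for open('sample_data/labor.csv', 'w')
write_points(points, buf)
print(buf.getvalue(), end="")
```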

##Creating DXF output from scratch

DXF is a popular data exchange format. Our first solution is minimal: we write the file directly, following a simple recipe.

In a DXF file, two rows are used for each value: the first row holds an integer code that describes the value in the next row. The codes listed below are used in the following program:

|Code|Value|
|----|-----|
|0   |Start of a section/entity|
|1   |Text to display|
|2   |Name of section|
|8   |Layer name|
|10  |X coordinate|
|20  |Y coordinate|
|30  |Z coordinate|
|40  |Text height|
|50  |Text rotation angle|
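
The code/value pairing can be illustrated with a tiny helper that emits one pair at a time. This is a sketch only: `dxf_pair` and `dxf_point` are hypothetical names, and the right-aligned, three-character code simply mimics the layout used in this lesson's program.

```python
def dxf_pair(code, value):
    """Return one DXF group: the integer code on one row, the value on the next."""
    return f"{code:>3}\n{value}\n"

def dxf_point(x, y, z, layer="POINT"):
    """Return the group pairs of a POINT entity as text (illustrative)."""
    pairs = [(0, "POINT"), (8, layer),
             (10, f"{x:.3f}"), (20, f"{y:.3f}"), (30, f"{z:.3f}")]
    return "".join(dxf_pair(code, value) for code, value in pairs)

print(dxf_point(119.197, 118.827, 120.014), end="")
```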

In [8]:
with open('sample_data/labor.dxf', 'w') as fo:
  print("  0\nSECTION\n  2\nENTITIES", file=fo)   # minimal dxf header
  for point in coord_list:
    print(f"  0\nTEXT\n  8\nPTEXT\n 10\n{point[1]+0.1:.3f}\n 20\n{point[2]-0.25:.3f}\n 30\n0.0\n 40\n0.5", file=fo)
    print(f"  1\n{point[0]}\n 50\n0.0", file=fo)
    print(f"  0\nPOINT\n  8\nPOINT\n 10\n{point[1]:.3f}\n 20\n{point[2]:.3f}\n 30\n{point[3]:.3f}", file=fo)
  print("  0\nENDSEC\n  0\nEOF", file=fo)    # dxf footer


##Creating DXF output using ezdxf

There are several Python packages for handling DXF files. [*ezdxf*](https://ezdxf.readthedocs.io/en/stable/) is one of them. We'll write another program to save the points into a DXF file. As *ezdxf* is not among the Python packages preinstalled in Colab, we first have to install it using *pip*.

In [9]:
!pip install ezdxf

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ezdxf
  Downloading ezdxf-1.0.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.1 MB)
Installing collected packages: ezdxf
Successfully installed ezdxf-1.0.1


In [13]:
import ezdxf
from ezdxf.enums import TextEntityAlignment

dxf = ezdxf.new(dxfversion='R2010')   # create a new empty dxf
dxf.layers.add('POINT', color=2)      # create new layers
dxf.layers.add('PTEXT', color=3)
msp = dxf.modelspace()
for point in coord_list:
  msp.add_point([point[1], point[2], point[3]], dxfattribs={'layer': 'POINT'})
  msp.add_text(point[0],dxfattribs={"layer": "PTEXT", 'height': 0.5}).set_placement((point[1]+0.1,point[2]-0.25), align=TextEntityAlignment.LEFT)

dxf.saveas('sample_data/test.dxf')

We have now created a converter program from GSI to DXF. Is it practical to start Colab, upload your GSI file to the cloud, and download the DXF file? What happens if you have many files to convert?

Let's try to use the code on your own computer. Is it convenient to use fixed input and output file names in your code? Instead of hard-coded file names, we should use command line parameters (see *argv* in the *sys* module).
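
As a starting point, the argument handling alone might look like the sketch below. The function name `parse_args`, the script name `gsi2dxf.py` and the rule of deriving the output name from the input name are all assumptions made for this illustration. Wiring it to the conversion code is left as an exercise.

```python
import sys
from pathlib import Path

def parse_args(argv):
    """Return (input_path, output_path) from an argv-style list (hypothetical helper)."""
    if len(argv) < 2:
        raise SystemExit(f"usage: {argv[0]} input.gsi [output.dxf]")
    src = Path(argv[1])
    # assumption: default output name is the input name with a .dxf extension
    dst = Path(argv[2]) if len(argv) > 2 else src.with_suffix(".dxf")
    return src, dst

# in a real script this would be: src, dst = parse_args(sys.argv)
src, dst = parse_args(["gsi2dxf.py", "labor.gsi"])
print(src, dst)   # labor.gsi labor.dxf
```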

Let's start to work on...

##Tasks

- Read the Zen of Python: use the *import this* command at the Python prompt (>>>). Also try *import antigravity*
- Rewrite the *line2fields* function using a list comprehension!
- Create the command line version of the GSI to DXF converter using *sys.argv*
- What are the advantages and disadvantages of using the *ezdxf* package?
- Search on pypi.org for packages to handle DXF files
- Can you find other solutions on the Internet to create DXF/SHP/GML/... files?
- Write a Python program to save points into a SHP file using Python package(s)
- Can you find a way to convert between different geospatial file formats from the command line?