<a href="https://colab.research.google.com/github/OSGeoLabBp/tutorials/blob/master/english/data_processing/lessons/dxfinfo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generate report from a DXF file

DXF (Drawing eXchange Format) is a file format defined by Autodesk to exchange CAD data. Several programs can read and write DXF files; among others, there is a Python package for this called ezdxf.

We will use the ezdxf package to read and scan a DXF file, then print information about the file and its content (the number of entities of each type on each layer), as in the following sample.

```
sample_data/test.dxf version: AC1024 AutoCAD R2010/R2011/R2012
EXTMIN: 3.063 0.342 -1.343
EXTMAX: 13.392 7.929 1.343

Layer                            3DSOLI    ARC CIRCLE   LINE LWPOLY  POINT   TEXT 
0                                     0      0      0      2      0      0      0 
another_layer                         1      0      0      0      0      2      0

```

First we install ezdxf and import the necessary packages.

In [1]:
!pip install ezdxf
import sys
import ezdxf
import numpy as np

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ezdxf
  Downloading ezdxf-1.0.2-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.1 MB)
Installing collected packages: ezdxf
Successfully installed ezdxf-1.0.2


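Next we define some constants: the default column widths of the output table and a dictionary that maps DXF version codes to AutoCAD releases.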

In [2]:
LAYER_FIELD = 32        # default length of layer name field in output
NUMBER_FIELD = 6        # default length of entity counts in output

dxf2cad_version = {'AC1002': 'AutoCAD R2.5',
                   'AC1004': 'AutoCAD R9',
                   'AC1006': 'AutoCAD R10',
                   'AC1009': 'AutoCAD R11/R12',
                   'AC1012': 'AutoCAD R13',
                   'AC1014': 'AutoCAD R14',
                   'AC1015': 'AutoCAD R2000/R2002',
                   'AC1018': 'AutoCAD R2004/R2005/R2006',
                   'AC1021': 'AutoCAD R2007/R2008/R2009',
                   'AC1024': 'AutoCAD R2010/R2011/R2012',
                   'AC1027': 'AutoCAD R2013/R2014/R2015/R2016/R2017',
                   'AC1032': 'AutoCAD R2018/R2019/R2020/R2021/R2022/R2023'}

The following function returns the user-friendly AutoCAD version name for an *ACnnnn* code; unknown codes are returned unchanged.

In [3]:
def cad_version(dxf_version):
    """ return AutoCAD version from DXF version
        :param dxf_version: DXF version from dxf file
    """
    if dxf_version in dxf2cad_version:
        return dxf2cad_version[dxf_version]
    return dxf_version   # unknown version return code name
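
The same lookup with a fallback can also be written with `dict.get`. The snippet below is a minimal sketch, not the notebook's function: `versions` is a hypothetical two-entry excerpt of the table, `lookup_version` is an illustrative name, and `AC9999` is a made-up code.

```python
# Hypothetical two-entry excerpt of the version table, for illustration only
versions = {'AC1024': 'AutoCAD R2010/R2011/R2012',
            'AC1032': 'AutoCAD R2018/R2019/R2020/R2021/R2022/R2023'}

def lookup_version(code):
    """Return the release name for a DXF version code, or the code itself."""
    # dict.get returns its second argument when the key is missing
    return versions.get(code, code)

known = lookup_version('AC1024')
unknown = lookup_version('AC9999')   # made-up code, not in the table
print(known)
print(unknown)
```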

Let's create a class to solve the task.

A dictionary is used to collect the number of entities where the key is a tuple of the layer name and the entity type. Yes, the key in a dictionary can be a tuple!

```
{
    ('0', 'LINE'): 2, 
    ('Layer1', 'CIRCLE'): 5, 
    ('Layer1', 'ARC'): 3, 
    ('Layer1', 'TEXT'): 2, 
    ('Layer1', 'LWPOLYLINE'): 4, 
    ('Layer2', 'POINT'): 2, 
    ('Layer2', '3DSOLID'): 1
}
```
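
To illustrate how such a dictionary is filled, here is a minimal sketch that counts (layer, entity type) pairs. The `pairs` list is hypothetical data standing in for the entities that would be read from a DXF file.

```python
# Hypothetical (layer, entity type) pairs standing in for entities of a DXF file
pairs = [('0', 'LINE'), ('0', 'LINE'), ('Layer1', 'CIRCLE'),
         ('Layer2', 'POINT'), ('Layer2', 'POINT')]

counts = {}
for layer, e_type in pairs:
    key = (layer, e_type)                  # a tuple is hashable, so it can be a key
    counts[key] = counts.get(key, 0) + 1   # start from 0 when the key is new

# sorting tuple keys orders them by layer name first, then by entity type
for key in sorted(counts):
    print(key, counts[key])
```

Because the sorted keys are grouped by layer, all counts that belong to one row of the report arrive one after the other.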

Our class has three methods besides the initializer. The ```layer_entity``` method scans the DXF file and fills the dictionary with entity counts by layer and entity type. The ```print_row``` method outputs a formatted line of the table from a layer name and a vector of counts. Finally, ```dxf_info``` puts the output together using the other two methods and adds a TOTAL line.

In [8]:
class DxfInfo():
    """ class to collect DXF information
        :param dxf_file: the dxf file to process
        :param layer_name: layer name length in output
        :param num_length: length of numbers in output
    """
    def __init__(self, dxf_file, layer_name=LAYER_FIELD, num_length=NUMBER_FIELD):
        """ initialize object """
        self.dxf_file = dxf_file
        # load dxf
        try:
            self.doc = ezdxf.readfile(dxf_file)
        except IOError:
            print(f"*** ERROR Not a DXF file or a generic I/O error: {dxf_file}")
            sys.exit()
        except ezdxf.DXFStructureError:
            print(f"*** ERROR Invalid or corrupted DXF file: {dxf_file}")
            sys.exit()
        self.entities = None
        self.layers = None
        self.blocks = None
        self.layer_name = layer_name
        self.num_length = num_length

    def print_row(self, lay, lay_row):
        """ print a row of table
            :param lay: layer name
            :param lay_row: numpy vector of entity counts
        """
        print(f'{lay[:self.layer_name]:{self.layer_name}s}', end=' ')
        for i in range(lay_row.shape[0]):
            print(f'{lay_row[i]:{self.num_length}d}', end=' ')
        print()

    def layer_entity(self):
        """ collect entities by layer into a dictionary, the dictionary
            has tuple indices composed of layer and entity type
        """
        msp = self.doc.modelspace()
        entities = {}
        for entity in msp:
            e_typ = entity.dxftype()
            try:
                layer = entity.dxf.layer
            except Exception:     # TODO no layer for mpolygon from ezdxf
                print(f'missing layer for entity {e_typ} skipped')
                continue
            key = (layer, e_typ)
            entities[key] = entities.get(key, 0) + 1
        # store the counts for later use
        self.entities = entities

    def dxf_info(self):
        """ collect and print layer/entity info of a DXF file
        """
        print(80 * '-')
        print(f"{self.dxf_file} version: {self.doc.dxfversion} {cad_version(self.doc.dxfversion)}")
        e_min = self.doc.header['$EXTMIN']
        e_max = self.doc.header['$EXTMAX']
        print(f"EXTMIN: {e_min[0]:.3f} {e_min[1]:.3f} {e_min[2]:.3f}")
        print(f"EXTMAX: {e_max[0]:.3f} {e_max[1]:.3f} {e_max[2]:.3f}")
        if self.entities is None:
            self.layer_entity()
        keys = sorted(self.entities.keys())
        entities_found = sorted({key[1] for key in keys})
        num_ent_types = len(entities_found)
        # column index of each entity type in the table
        entity_dict = {e: i for i, e in enumerate(entities_found)}
        # print header of table
        print(f'\n{"Layer":{self.layer_name}s}', end=' ')
        layer_row = np.zeros(num_ent_types, dtype=np.int32)
        total_row = np.zeros(num_ent_types, dtype=np.int32)
        for e in entities_found:
            print(f'{e[:self.num_length]:>{self.num_length}s}', end=' ')
        print()
        last_layer = ""
        for key in keys:
            layer = key[0]
            if layer != last_layer and np.sum(layer_row) > 0:
                self.print_row(last_layer, layer_row)
                total_row += layer_row
                layer_row.fill(0)           # initialize row
            layer_row[entity_dict[key[1]]] = self.entities[key]
            last_layer = layer
        if np.sum(layer_row) > 0:
            self.print_row(last_layer, layer_row)
            total_row += layer_row
        print()
        self.print_row("TOTAL", total_row)
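
The fixed-width columns of the table come from nested fields in the f-string format specification: names are truncated by slicing and padded to the column width, numbers are right-aligned. The following is a minimal sketch of that technique only; `format_row`, `layer_width` and `num_width` are illustrative names, and unlike `print_row` it returns the row as a string instead of printing it.

```python
layer_width = 12     # hypothetical widths, smaller than the notebook defaults
num_width = 4

def format_row(name, counts):
    """Build one table row: left-aligned name, right-aligned counts."""
    # the slice truncates long names, the nested {layer_width} pads short ones
    cells = [f'{name[:layer_width]:{layer_width}s}']
    cells += [f'{c:{num_width}d}' for c in counts]
    return ' '.join(cells)

row = format_row('a_very_long_layer_name', [1, 0, 25])
print(repr(row))
print(repr(format_row('0', [2])))
```

Note that an entity count with more digits than the column width is not truncated; it simply widens its column in that row.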

Let's download a sample DXF file to test the code.

In [9]:
!wget -q -O sample_data/test.dxf https://raw.githubusercontent.com/OSGeoLabBp/tutorials/master/english/data_processing/lessons/code/test.dxf

Finally, we run the report on the sample file.

In [12]:
DxfInfo('sample_data/test.dxf').dxf_info()

--------------------------------------------------------------------------------
sample_data/test.dxf version: AC1024 AutoCAD R2010/R2011/R2012
EXTMIN: 3.063 0.342 -1.343
EXTMAX: 13.392 7.929 1.343

Layer                            3DSOLI    ARC CIRCLE   LINE LWPOLY  POINT   TEXT 
0                                     0      0      0      2      0      0      0 
another_layer                         1      0      0      0      0      2      0 
something                             0      1      1      0      1      0      1 

TOTAL                                 1      1      1      2      1      2      1 


## Tasks

*   Test the code with different DXF files
*   Include empty layers of the DXF file also in the output
*   Write a program that gets several DXF files from the command line and prints the layer information table for each input file
*   Write a program that gets the output field widths for the layer name and the entity counts from the command line (use the argparse module)
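
As a starting point for the last two tasks, the command line could be parsed roughly as follows. This is only a sketch: the option names (`--layer_width`, `--num_width`) and the `parse_args` helper are assumptions chosen for illustration, and the file names are made up.

```python
import argparse

def parse_args(argv=None):
    """Parse DXF file names and optional field widths (option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description='Print the layer/entity table of DXF files')
    parser.add_argument('names', metavar='file', nargs='+',
                        help='DXF file(s) to process')
    parser.add_argument('-l', '--layer_width', type=int, default=32,
                        help='width of the layer name column')
    parser.add_argument('-n', '--num_width', type=int, default=6,
                        help='width of the entity count columns')
    return parser.parse_args(argv)

# simulate a command line; a real script would call parse_args() with no argument
args = parse_args(['a.dxf', 'b.dxf', '-n', '8'])
print(args.names, args.layer_width, args.num_width)
```

Each name in `args.names` could then be passed to `DxfInfo(name, args.layer_width, args.num_width).dxf_info()`.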

