# Preview

This notebook was inspired by the work done within the framework of the First International Summer School on Data Science for Mobility 2022, Santorini, Greece.

For more details and examples about human mobility analysis and simulation in Python, please have a look at these [tutorials](https://github.com/scikit-mobility/tutorials/tree/master/DSM_summer_school).

The formats covered below, along with their extensions and other associated formats, are described at [this link.](https://docs.fileformat.com/)

**NB**: the way geographic data is stored differs from one format or dataset to another: coordinates may come as lists, dictionaries, strings, point objects, etc.
The safest approach is to display the data once it has been extracted, so that we know how to handle it afterwards.
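
To illustrate the note above, here is a minimal sketch of a helper that normalizes a few common coordinate encodings into a `(lon, lat)` tuple. The function name `to_lon_lat`, the dictionary keys `lat`/`lon` and the sample values are assumptions made for this example; real datasets may use other keys or another axis order, which is why displaying the data first remains necessary.

```python
import re

def to_lon_lat(value):
    """Return a (lon, lat) tuple from a few common coordinate encodings."""
    # Case 1: dictionary with explicit keys, e.g. {"lat": 36.42, "lon": 25.43}
    if isinstance(value, dict):
        return float(value["lon"]), float(value["lat"])
    # Case 2: list or tuple, assumed to be in [lon, lat] order (as in GeoJSON)
    if isinstance(value, (list, tuple)):
        lon, lat = value[:2]
        return float(lon), float(lat)
    # Case 3: WKT-like string, e.g. "POINT (25.43 36.42)"
    if isinstance(value, str):
        match = re.match(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)", value, re.IGNORECASE)
        if match:
            return float(match.group(1)), float(match.group(2))
    raise ValueError(f"Unrecognised coordinate format: {value!r}")

# Hypothetical samples describing the same location in three different ways
samples = [{"lat": 36.42, "lon": 25.43}, [25.43, 36.42], "POINT (25.43 36.42)"]
for sample in samples:
    print(type(sample).__name__, "->", to_lon_lat(sample))
```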

## Table of contents

* [Installing packages](#chapter1)
* [Reading datasets with Pandas](#chapter2)
    * [CSV/TXT formats](#section_2_1)
    * [XML/PHP formats](#section_2_2)
       * [XSD format](#sub_section_2_2_1)
       * [ODS format](#sub_section_2_2_2)
    * [XLS/XLSX formats](#section_2_3)
    * [JSON/GBFS formats](#section_2_4)
    * [HTML format](#section_2_5)
    * [API format](#section_2_6)
* [GIS API](#chapter3)
* [Reading datasets with GeoPandas](#chapter4)
  * [GeoJSON/SHP/GPKG formats](#section_4_1)
  * [OGC standards](#section_4_2)
    * [KML format](#sub_section_4_2_1)
    * [GML/XPLANGML formats](#sub_section_4_2_2)
    * [WMS format](#sub_section_4_2_3)
    * [CSW format](#sub_section_4_2_4)
    * [WFS format](#sub_section_4_2_5)
    * [WCS format](#sub_section_4_2_6)
* [ZIP files](#chapter5)  
* [SPARQL format](#chapter6)  
* [PDF format](#chapter7)  
* [OSM (DBF) format](#chapter8)  
* [TIFF format](#chapter9) 
* [DXF format](#chapter10) 

## Installing packages <a class="anchor" id="chapter1"></a>

Installing Python packages in Anaconda is a simple process that can be done through various methods, such as using the conda command, pip, or the Anaconda Navigator. For more details about these methods, please refer to [this tutorial.](https://www.tutorialspoint.com/how-do-i-install-python-packages-in-anaconda)

## Reading Datasets with Pandas <a class="anchor" id="chapter2"></a>

The following formats can be read with **Pandas**, a Python library for data analysis. Pandas provides two types of objects for storing data that make analytical tasks easier and eliminate the need to switch tools: Series, which have a list-like structure, and DataFrames, which have a tabular structure.

### CSV/TXT formats  <a class="anchor" id="section_2_1"></a>

Files with **Comma Separated Values (CSV)** extension represent plain text files that contain records of data with comma separated values. Each line in a CSV file is a new record from the set of records contained in the file.

A file with the **.TXT extension** is a text document that contains plain text organized in lines. Historically, the default character set of text files is ASCII, which cannot represent characters such as the pound sign (£) or the euro sign (€). Text files can therefore also be saved in Unicode, with UTF-8 being the most widely used encoding. The format itself places no limit on the size of the contents, so text files can store large amounts of data. A standard text document can be opened in any text editor or word processing application on any operating system.
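
The separator is not always a comma (the cell below uses `sep=';'`). As a small sketch, the standard-library `csv.Sniffer` can guess the delimiter from a sample of the text before the file is loaded. The content of `raw_text` is a made-up sample standing in for a downloaded file.

```python
import csv
import io

# Hypothetical semicolon-separated content standing in for a downloaded file
raw_text = "city;lat;lon\nAthens;37.98;23.73\nFira;36.42;25.43\n"

# Let csv.Sniffer guess the delimiter among a few likely candidates
dialect = csv.Sniffer().sniff(raw_text, delimiters=";,\t|")
print("Detected delimiter:", repr(dialect.delimiter))

# Each data row becomes a dict keyed by the header line
rows = list(csv.DictReader(io.StringIO(raw_text), dialect=dialect))
print(rows[0])
```

The detected delimiter can then be passed to `pd.read_csv` through its `sep` parameter.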



In [None]:
import pandas as pd
import io

#same function is applicable for '.txt' files
try:
    df = pd.read_csv(data_url, sep=';')
except Exception as e:
    print('Sorry, could not handle the csv/txt file. Error : ' + str(e)) 

### XML/PHP formats  <a class="anchor" id="section_2_2"></a>

The **Extensible Markup Language (XML)** is similar to HTML in its use of tags, but serves a different purpose: HTML describes how data is presented on the web, whereas XML is meant for exchanging data. The tag pairs used inside an XML document define the key elements of the structure that reading applications rely on.
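
As an illustration of how tag pairs define a structure, the sketch below parses a small hand-written XML document with the standard library and turns each record into a dictionary, which is roughly the shape that ends up as one row in a table. The element names (`stations`, `station`, `name`, `lat`, `lon`) and the values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: one <station> element per record
xml_text = """
<stations>
  <station id="s1"><name>Fira</name><lat>36.42</lat><lon>25.43</lon></station>
  <station id="s2"><name>Oia</name><lat>36.46</lat><lon>25.38</lon></station>
</stations>
"""

root = ET.fromstring(xml_text)
records = []
for station in root.findall("station"):
    record = dict(station.attrib)        # attributes, e.g. id="s1"
    for child in station:                # child elements, e.g. <name>Fira</name>
        record[child.tag] = child.text
    records.append(record)

print(records)
```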


 

On the other hand, a file with the **.php** extension contains code written in PHP, an open source programming language used to write server-side scripts that are executed on a web server. The result is returned to the browser as plain HTML. PHP files can be opened with any text editor and edited in place, though applications like Adobe Dreamweaver or Eclipse PHP Development Tools offer a more convenient way to write and modify PHP code.

In [None]:
import pandas as pd
import io

#same function is applicable for files that end with '.php'
try:
    df=pd.read_xml(data_url)
except Exception as e:
        print('Sorry, could not handle the xml/php file. Error : ' + str(e))

#### XSD format <a class="anchor" id="sub_section_2_2_1"></a>

An **XML schema definition (XSD)** is a framework document that defines the rules and constraints of an XML document. This ensures that data is properly interpreted, and errors are caught, resulting in appropriate XML validation. XSD files ensure that the data entered follows the same structure as defined in the file. XSD files are stored in XML file format and can be opened or edited in any text editor.

In [None]:
import xmlschema

try:
    my_schema = xmlschema.XMLSchema(data_url)
    print(my_schema)
except Exception as e:
        print('Sorry, could not handle the xsd file. Error : ' + str(e))   

#### ODS format <a class="anchor" id="sub_section_2_2_2"></a>

**OpenDocument Spreadsheet (ODS)** is an XML-based format and one of the subtypes in the Open Document Format (ODF) family. The format supports representing a document either as a single XML document or as a collection of subdocuments packaged in a ZIP archive. Each file in the ZIP archive stores one particular aspect of the complete document. For example, one subdocument contains the style information and another contains the content of the document.

In [None]:
from pandas_ods_reader import read_ods
try:
    df = read_ods(data_url)
except Exception as e:
        print('Sorry, could not handle the ODS file. Error : ' + str(e))        

### XLS/XLSX formats <a class="anchor" id="section_2_3"></a>

Files with the XLS extension use the **Excel Binary File Format**. A file saved by Excel is known as a workbook, and each workbook can have one or more worksheets. As in a CSV file, data is stored and displayed in tabular form; a worksheet can hold numeric values, text data, formulas, external data connections, images, and charts.
 

In [None]:
import pandas as pd
import io

#same function is applicable for the "xlsx" format
try:
    df=pd.read_excel(data_url)
except Exception as e:
    print('Sorry, could not handle the xls file. Error : ' + str(e))

### JSON/GBFS  formats <a class="anchor" id="section_2_4"></a>

The **JavaScript Object Notation (JSON)** is an open standard file format for sharing data that uses human-readable text to store and transmit data. It is derived from JavaScript but is a language-independent data format. 

JSON data is written as key/value pairs. The key and the value are separated by a colon (:), with the key on the left and the value on the right. Different key/value pairs are separated by a comma (,). The key is a string surrounded by double quotation marks, for example "name".

The **General Bikeshare Feed Specification (GBFS)** is the open data standard for shared mobility. GBFS makes real-time data feeds publicly available online in a uniform format, with an emphasis on findability.
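
In GBFS feeds the records of interest are nested inside the JSON document rather than sitting at the top level. The sketch below uses a hand-written sample shaped like a `station_information` feed; the station values are invented for this example, and a real feed should be inspected first since other feeds use different keys under `data`.

```python
import json

# Hypothetical sample shaped like a GBFS station_information feed
feed_text = """
{
  "last_updated": 1700000000,
  "ttl": 60,
  "data": {
    "stations": [
      {"station_id": "a1", "name": "Port", "lat": 36.39, "lon": 25.43},
      {"station_id": "b2", "name": "Center", "lat": 36.42, "lon": 25.43}
    ]
  }
}
"""

feed = json.loads(feed_text)

# The list of records sits under data -> stations, not at the top level
stations = feed["data"]["stations"]
for station in stations:
    print(station["station_id"], station["name"], station["lat"], station["lon"])
```

A list of flat dictionaries like `stations` can be passed directly to `pd.DataFrame(...)` to obtain one row per station.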

In [None]:
import pandas as pd
import io

#same function is applicable for the "DORA-Service"/"GBFS" formats
try:
    df = pd.read_json(data_url)
except Exception as e:
    print('Sorry, could not handle the json/GBFS file. Error : ' + str(e))

### HTML format <a class="anchor" id="section_2_5"></a>

The **Hyper Text Markup Language (HTML)** is the format of web pages created for display in browsers (file extension '.html'). HTML pages are either received from the server that hosts them or loaded from the local system. Each HTML page is made up of HTML elements such as forms, text, images, animations, links, etc. These elements are represented by tags, most of which have a start tag and an end tag. A page can also embed scripts written in languages such as JavaScript, as well as Cascading Style Sheets (CSS) that control the overall layout.
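
To show how start and end tags delimit the content of a table, here is a minimal sketch that extracts table rows with the standard-library `html.parser`. The class name `TableRows` and the sample page are assumptions made for this example; it is not how `pd.read_html` is implemented, only an illustration of the tag structure that such a reader relies on.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page content
html_text = """
<table>
  <tr><th>Line</th><th>Stops</th></tr>
  <tr><td>Bus 1</td><td>12</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html_text)
print(parser.rows)
```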

In [None]:
import pandas as pd
import io

try:
    table = pd.read_html(data_url) #returns a list of DataFrames, one per HTML table found in the page
except Exception as e:
    print('Sorry, could not handle the html file. Error : ' + str(e))

### API format  <a class="anchor" id="section_2_6"></a>

An **application programming interface (API)** is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build or use such a connection or interface is called an API specification. A computer system that meets this standard is said to implement or expose an API. The term API may refer either to the specification or to the implementation.

In [None]:
import requests
import pandas as pd

try:
    response_API = requests.get(data_url)
    response_API.raise_for_status()  # stop here on HTTP errors (4xx/5xx)
    parse_json = response_API.json()  # decode the JSON body of the response
    df = pd.json_normalize(parse_json)  # flatten nested fields into columns
except Exception as e:
    print('Sorry, could not access this API. Error : ' + str(e))

## GIS API <a class="anchor" id="chapter3"></a>

A **Geographic Information System (GIS)** is a type of database containing geographic data, combined with software tools for managing, analyzing, and visualizing those data. 

The core of any GIS is a database that contains representations of geographic phenomena (such as roads, land use, elevation, trees, waterways, and states), modeling their geometry (location and shape) and their properties or attributes. A GIS database may be stored in a variety of forms, such as a collection of separate data files or a single spatially-enabled relational database. 

For more information about the standard, see also [this link](https://en.wikipedia.org/wiki/Geographic_information_system) and [this one.](https://www.gistandards.eu/gis-standards/#:~:text=What%20are%20GIS%20standards%3F,use%20of%20any%20geographic%20information.) 

To perform GIS visualization, analysis, data management, and GIS system administration tasks, we use the ArcGIS API library. To explore this library in more detail, have a look at [Documentation and samples for ArcGIS API for Python](https://github.com/Esri/arcgis-python-api). For more information about the GIS module (parameters, examples, etc.), see also [this link.](https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#gis)

In [None]:
from arcgis.gis import GIS
import io
from IPython.display import display

def read_gis(data_url, city, query):

    try:
        # data_url is an optional string: if it is None, ArcGIS Online is used.
        # Otherwise it should be a web address, for example that of an ArcGIS Enterprise portal.
        gis = GIS(url=data_url)

        # The GIS object includes a map widget that can be used in the Jupyter Notebook
        # environment to visualize GIS content as well as the results of an analysis.
        map1 = gis.map(city)
        print("Displaying the map of " + city)
        display(map1)

        # gis.content.search() looks for content in the GIS. Each web map or web layer
        # returned can be displayed with its rich representation within the notebook.
        print("Searching for " + query)
        items = gis.content.search(query, item_type="Feature Layer", outside_org=True)
        for item in items[:5]:  # display the first 5 results of the search
            display(item)

    except Exception as e:
        print('Sorry, could not handle the GIS API. Error : ' + str(e))

## Reading Datasets with GeoPandas <a class="anchor" id="chapter4"></a>

### GeoJSON/SHP/GPKG  formats <a class="anchor" id="section_4_1"></a>

**GeoJSON** is a JSON-based format designed to represent geographical features together with their non-spatial attributes. It describes the features, their spatial extents, and their properties. An object in a GeoJSON file may be a geometry (Point, LineString, Polygon), a feature, or a collection of features. Typically, addresses and places are represented as points; streets, main roads and borders as line strings; and countries, provinces, and land regions as polygons.

The **shapefile** format is a geospatial vector data format for geographic information system (GIS) software. Files with the extension '.shp' hold the feature geometry itself {content-type: x-gis/x-shapefile}.

**GeoPackage (GPKG)** is an open, non-proprietary, platform-independent and standards-based data format for geographic information systems built as a set of conventions over a SQLite database.
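
Since GeoJSON is plain JSON, its structure can be inspected with the standard library before handing the file to GeoPandas. The sketch below walks through a hand-written FeatureCollection; the feature names and coordinates are invented for this example.

```python
import json

# Hypothetical FeatureCollection with one point and one line string
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [25.43, 36.42]},
     "properties": {"name": "Fira"}},
    {"type": "Feature",
     "geometry": {"type": "LineString", "coordinates": [[25.43, 36.42], [25.38, 36.46]]},
     "properties": {"name": "Coastal road"}}
  ]
}
"""

collection = json.loads(geojson_text)

# Each feature couples a geometry with a dictionary of non-spatial properties
geometry_types = []
for feature in collection["features"]:
    geometry = feature["geometry"]
    geometry_types.append(geometry["type"])
    print(geometry["type"], "|", feature["properties"]["name"], "|", geometry["coordinates"])
```

Note that GeoJSON coordinates are written in longitude, latitude order.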

In [None]:
import geopandas as gpd
import io

#same function is applicable for the 'shp' and 'gpkg' formats
try:
    df = gpd.read_file(data_url)
except Exception as e:
    print('Sorry, could not handle the geojson/shp/gpkg file. Error : ' + str(e))

### OGC standards <a class="anchor" id="section_4_2"></a>

The **Open Geospatial Consortium (OGC)** is an international consortium of more than 500 businesses, government agencies, research organizations, and universities driven to make geospatial (location) information and services *FAIR - Findable, Accessible, Interoperable, and Reusable*.

OGC standards are used by software developers to build open interfaces and encodings into their products and services. Standards are the main "products" of OGC and have been developed by the membership to address specific interoperability challenges, such as publishing map content on the Web, exchanging critical location data during disaster response & recovery, and enabling the fusion of information from diverse Internet of Things (IoT) devices. 

For more information, see also [this link.](https://www.ogc.org/standards)

In [6]:
from owslib.ogcapi.features import Features
from owslib.ogcapi.coverages import Coverages
from owslib.ogcapi.records import Records
from owslib.ogcapi.processes import Processes

#### KML format  <a class="anchor" id="sub_section_4_2_1"></a>

**Keyhole Markup Language (KML)** is an XML-based markup language that stores, in a text file, the geometric descriptions of points, lines and polygons together with attribute data such as longitude and latitude. Other data can make a view more specific, such as tilt, heading, or altitude, which together define a "camera view", along with a timestamp or timespan. The geographic annotation and visualization can be done within two-dimensional maps and three-dimensional Earth browsers.

For more information, see also [this link.](https://en.wikipedia.org/wiki/Keyhole_Markup_Language)
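
Because KML is XML, a simple file can also be inspected with the standard library. The sketch below reads the name and coordinates of each Placemark from a hand-written document; the placemark is invented for this example and only Point geometries with a single coordinate tuple are handled.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical KML document with a single point placemark
kml_text = """
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Fira</name>
      <Point><coordinates>25.43,36.42,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""

root = ET.fromstring(kml_text)
placemarks = []
for pm in root.iterfind(".//kml:Placemark", KML_NS):
    name = pm.findtext("kml:name", default="", namespaces=KML_NS)
    coords = pm.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
    # KML writes coordinates as lon,lat[,altitude]
    lon, lat, *_ = (float(v) for v in coords.strip().split(","))
    placemarks.append((name, lon, lat))

print(placemarks)
```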

In [None]:
import geopandas as gpd
import fiona
import io

def read_kml(data_url):
    try:
        # The KML driver may not be enabled by default in fiona, so enable it explicitly.
        fiona.drvsupport.supported_drivers['KML'] = 'rw'
        df = gpd.read_file(data_url, driver='KML')
        return df
    except Exception as e:
        print('Sorry, could not handle the kml file. Error : ' + str(e))

#### GML/XPLANGML formats <a class="anchor" id="sub_section_4_2_2"></a>


Similar to the KML format, the **Geography Markup Language (GML)** consists of a set of XML schemas that define an open format for the exchange of geographic data and allow the construction of specific data models for specialized domains, such as urban planning, hydrology or geology.

For more information, have a look at [this link.](https://www.ogc.org/standards/gml)

In [None]:
import geopandas as gpd
import io

#same function is applicable for the "xplangml" format
try:
    df = gpd.read_file(data_url, driver='GML')
except Exception as e:
        print('Sorry, could not handle the gml file. Error : ' + str(e))

#### WMS format  <a class="anchor" id="sub_section_4_2_3"></a>

**The Web Map Service (WMS)** is a standard protocol developed by the Open Geospatial Consortium (OGC) for serving georeferenced map images over the Internet. These images are typically produced by a map server from data provided by a GIS database. 

A WMS server usually serves the map in a bitmap format, e.g. PNG, GIF, JPEG, etc. In addition, vector graphics can be included, such as points, lines, curves and text, expressed in SVG or WebCGM format.

WMS specifies a number of different request types, two of which are required by any WMS server:

- GetCapabilities: returns parameters about the WMS (such as map image format and WMS version compatibility) and the available layers (map bounding box, coordinate reference systems, URI of the data and whether the layer is mostly opaque or not)


- GetMap: returns a map image. Parameters include: width and height of the map, coordinate reference system, rendering style, image format

For more details, have a look at [this link.](https://en.wikipedia.org/wiki/Web_Map_Service#:~:text=A%20Web%20Map%20Service%20(WMS,provided%20by%20a%20GIS%20database)
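
Behind the scenes, a GetMap request is an HTTP request whose query string carries the parameters listed above. As a sketch, the function below assembles such a URL for WMS version 1.1.1 with the standard library. The function name `build_getmap_url`, the endpoint `https://example.com/wms` and the layer name `roads` are hypothetical, and a real server may require a different version or additional parameters.

```python
from urllib.parse import urlencode

def build_getmap_url(base_url, layer, bbox, size, crs="EPSG:4326", fmt="image/png"):
    """Assemble a WMS 1.1.1 GetMap URL (sketch; parameter values are not validated)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                               # empty value = default style
        "SRS": crs,                                 # coordinate reference system
        "BBOX": ",".join(str(v) for v in bbox),     # minx,miny,maxx,maxy
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and layer name
url = build_getmap_url("https://example.com/wms", "roads",
                       (3.8356, 50.757, 7.21, 53.446), (300, 250))
print(url)
```

The OWSLib `getmap()` call used in the next cell builds a request of this kind from its keyword arguments.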

In [None]:
from owslib.wms import WebMapService
from owslib.fes import PropertyIsEqualTo, PropertyIsLike, BBox
import io

#functions related to the WMS standard 

def wms_details(wms, layer):      #details of a layer

    print("title of the layer : " + str(wms[layer].title))
    print("queryable : " + str(wms[layer].queryable))
    print("opacity : " + str(wms[layer].opaque))
    print("bounding box : " + str(wms[layer].boundingBox))
    print("bounding boxWGS84 : " + str(wms[layer].boundingBoxWGS84))
    print("crsOptions : " + str(wms[layer].crsOptions))
    print("styles : " + str(wms[layer].styles)) #pseudo_bright, visual, pseudo_low, etc.

def wms_methods(wms, methods):

    for method in methods:
        print("method : " + method)
        operation = wms.getOperationByName(method)
        print("URLs : " + str(operation.methods))
        print("Formats : " + str(operation.formatOptions)) # image/jpeg, image/png, image/geotiff, image/tiff, etc.
        print('---------------------------')


def wms_visualisation(wms, layer, style, crsOptions, boundingBox, size, fmt, name):

    # Request a rendered map image (GetMap) and save it to the file `name`.
    # getmap() expects lists of layer and style names, hence the brackets.
    img = wms.getmap(layers=[layer],
                     styles=[style],
                     srs=crsOptions,
                     bbox=boundingBox,
                     size=size, #for example : (300, 250)
                     format=fmt,
                     transparent=True)

    with open(name, 'wb') as out:
        out.write(img.read())

def wms_main(data_url):

    #Connect to a WMS, and inspect its properties
    wms = WebMapService(data_url)
    print("title : " + str(wms.identification.title)) #name of the dataset
    layers = list(wms.contents) #available layers
    for layer in layers:
        wms_details(wms, layer)
        print('---------------------------')
    operations = [op.name for op in wms.operations]
    wms_methods(wms, operations)
    #wms_visualisation(wms, layers[0], 'default', 'EPSG:4258', (3.8356, 50.757, 7.21, 53.446), (300, 250), 'image/png', 'test.png')
    

#### CSW format  <a class="anchor" id="sub_section_4_2_4"></a>

**Catalogue Service for the Web (CSW)** is a standard for exposing a catalogue of geospatial records in XML on the Internet (over HTTP). The catalogue is made up of records that describe geospatial data (e.g. KML), geospatial services (e.g. WMS), and related resources.

Each record must contain certain core fields including: Title, Format, Type (e.g. Dataset, DatasetCollection or Service), BoundingBox (a rectangle of interest, expressed in latitude and longitude), Coordinate Reference System, and Association (a link to another metadata record).

Operations defined by the CSW standard include:

- *GetCapabilities*: allows CSW clients to retrieve service metadata from a server

- *DescribeRecord*: allows a client to discover elements of the information model supported by the target catalogue service. The operation allows some or all of the information model to be described.

- *GetRecords*: search for records, returning record IDs

- *GetRecordById*: retrieves the default representation of catalogue records using their identifier

- *GetDomain (optional)*: "used to obtain runtime information about the range of values of a metadata record element or request parameter"

For more details, have a look at [this link](https://www.ogc.org/standards/cat) and [this one.](https://en.wikipedia.org/wiki/Catalogue_Service_for_the_Web)

In [None]:
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsEqualTo, PropertyIsLike, BBox
import io

#functions related to the csw standard 

def csw_search_data(csw, data, nb_results):

    #Search for specific data, for example: search for bird data
    query = PropertyIsEqualTo('csw:AnyText', data)
    csw.getrecords2(constraints=[query], maxrecords=nb_results) # return at most nb_results records, for ex: nb_results=20
    print(csw.results)
    for rec in csw.records:
        print(csw.records[rec].title)

def csw_search_data_place(csw, data, bbox):

    #Search for specific data in a specific place, for example: search for bird data in Canada
    bbox_query = BBox(bbox) # for ex: bbox = [-141,42,-52,84]
    query = PropertyIsEqualTo('csw:AnyText', data)
    csw.getrecords2(constraints=[query, bbox_query])
    print(csw.results)

def csw_search_like_keywords(csw, keyword):

    #Search for keywords
    query_like = PropertyIsLike('dc:subject', '%'+keyword+'%') #for ex: PropertyIsLike('dc:subject', '%birds%')
    csw.getrecords2(constraints=[query_like])
    print(csw.results)

def csw_main(data_url):

    #Connect to a CSW, and inspect its properties
    csw = CatalogueServiceWeb(data_url)
    csw_operations = [op.name for op in csw.operations]
    print(csw_operations)
    try:
        for operation in csw_operations: #get the supported resultType values of each operation
            csw.getdomain(operation + '.resultType')
            print(csw.results)
    except Exception as e:
        print('Sorry, could not get the resultType. Please check the csw_operation. Error : ' + str(e))
    return csw #the connection is passed to the search functions above


#### WFS format <a class="anchor" id="sub_section_4_2_5"></a>

The **Web Feature Service (WFS)** makes it possible to query map servers through a formatted URL in order to retrieve and manipulate geographic objects (lines, points, polygons...). This contrasts with the Web Map Service (WMS), which returns georeferenced map images rendered by the server.

For more details, have a look at [this link](https://en.wikipedia.org/wiki/Web_Feature_Service) and [this link.](https://geopython.github.io/OWSLib/usage.html#wfs)
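
To make the idea of a "formatted URL" concrete, the sketch below assembles a WFS 2.0.0 GetFeature request with the standard library. The function name `build_getfeature_url`, the endpoint `https://example.com/wfs` and the feature type `ns:roads` are hypothetical; older servers (WFS 1.x) use `typeName` and `maxFeatures` instead of `typeNames` and `count`.

```python
from urllib.parse import urlencode

def build_getfeature_url(base_url, type_name, max_features=10):
    """Assemble a WFS 2.0.0 GetFeature URL in key-value-pair form (sketch)."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,    # feature type to query
        "count": max_features,     # upper bound on the number of returned features
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and feature type
url = build_getfeature_url("https://example.com/wfs", "ns:roads", max_features=5)
print(url)
```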

In [None]:
from owslib.wfs import WebFeatureService
from owslib.fes import PropertyIsEqualTo, PropertyIsLike, BBox
import io

try:
    wfs = WebFeatureService(url=data_url) #Connect to a WFS and inspect its capabilities
    print(wfs.identification.title)
    operations = [operation.name for operation in wfs.operations]
    print(operations)
    contents = list(wfs.contents) #List FeatureTypes
    print(contents)
except Exception as e:
        print('Sorry, could not handle the wfs file. Error : ' + str(e))

#### WCS format  <a class="anchor" id="sub_section_4_2_6"></a>

The **Web Coverage Service (WCS)** is a standard that provides an interface for downloading coverage data (digital terrain models, orthoimages, numerical weather prediction).

For more details, have a look at [this link.](https://www.ogc.org/standards/wcs)

In [None]:
from owslib.wcs import WebCoverageService
from owslib.fes import PropertyIsEqualTo, PropertyIsLike, BBox
import io

#functions related to the wcs standard 

def wcs_main(data_url):

    # Create coverage object
    my_wcs = WebCoverageService(data_url)

    # Get list of coverages
    coverages = list(my_wcs.contents.keys())
    print(coverages)

    for coverage in coverages:
        wcs_coverages(my_wcs, coverage)

def wcs_coverages(my_wcs, coverage): # Print the main properties of one coverage

    content = my_wcs.contents[coverage]

    # Get geo-bounding boxes and native CRS of the coverage
    print("bounding boxes : " + str(content.boundingboxes))

    # Get axis labels
    print("axis labels : " + str(content.grid.axislabels))

    # Get dimension
    print("dimension : " + str(content.grid.dimension))

    # Get grid lower and upper bounds
    print("low limits : " + str(content.grid.lowlimits))
    print("high limits : " + str(content.grid.highlimits))

    # Get offset vectors for geo axes
    print("offset vectors : " + str(content.grid.offsetvectors))

    # For coverage with time axis get the date time values
    print("time positions : " + str(content.timepositions))


## ZIP files <a class="anchor" id="chapter5"></a>

A file with **.zip extension** is an archive that can hold one or more files or directories. The archive can have compression applied to the included files in order to reduce the ZIP file size.
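
Before extracting anything to disk, it can be useful to look at what an archive contains. The sketch below builds a small archive in memory and then lists its members and reads one of them without extracting; the member names and contents are invented for this example.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Build a toy archive in memory (stands in for a downloaded .zip file)
buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_DEFLATED) as archive:
    archive.writestr("data/stops.csv", "name;lat;lon\nFira;36.42;25.43\n")
    archive.writestr("readme.txt", "Toy archive built in memory.")

# Re-open it for reading: list the members and read one without extracting
buffer.seek(0)
with ZipFile(buffer) as archive:
    names = sorted(archive.namelist())
    for info in archive.infolist():
        print(info.filename, "| size:", info.file_size, "| compressed:", info.compress_size)
    text = archive.read("data/stops.csv").decode("utf-8")

print(names)
print(text)
```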

In [4]:
from zipfile import ZipFile
import io
import requests

def unzip_files_url(data_url): #unzip a file coming from an internet source
    try:
        r = requests.get(data_url)
        z = ZipFile(io.BytesIO(r.content))
        z.extractall() #extract into the current working directory
        filenames = sorted(z.namelist())
        print(filenames)
        return filenames
    except Exception as e:
        print('Sorry, could not unzip the file. Error : ' + str(e))

In [66]:
def unzip_files(filename):
    with ZipFile(filename, 'r') as zObject:    
        # Extracting all the members of the zip into the current working directory.
        zObject.extractall()
        filenames = [y for y in sorted(zObject.namelist())]
        print(filenames) 
        return filenames

## SPARQL format <a class="anchor" id="chapter6"></a>

**SPARQL** is the standard query language and protocol for Linked Open Data and RDF databases. SPARQL can be used to add, remove and retrieve data from RDF-style graph databases. 

SPARQL queries can not only match patterns of subject-predicate-object triples, but can also combine them with query operations (*joins, sorting, aggregation, etc.*) and a wide range of utility functions to create filters and new variable bindings. 
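
When an endpoint answers in JSON, the variable bindings arrive in the standard SPARQL JSON results layout: a `head` listing the variables and a `results.bindings` list with one entry per solution. The sketch below flattens such a response into plain dictionaries; the response content is a hand-written sample with invented values.

```python
import json

# Hypothetical response in the SPARQL JSON results layout
response_text = """
{
  "head": {"vars": ["place", "label"]},
  "results": {"bindings": [
    {"place": {"type": "uri", "value": "http://example.org/place/1"},
     "label": {"type": "literal", "value": "Example place"}}
  ]}
}
"""

response = json.loads(response_text)
variables = response["head"]["vars"]

# Keep only the plain value of each bound variable
rows = [
    {var: binding[var]["value"] for var in variables if var in binding}
    for binding in response["results"]["bindings"]
]
print(rows)
```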

In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON
import io

def read_sparql(data_url, query):
    try:
        sparql = SPARQLWrapper(data_url)
        sparql.setReturnFormat(JSON) #ask the endpoint for JSON so the result can be indexed like a dictionary
        sparql.setQuery(query)
        #The response from the given endpoint is retrieved in JSON and converted to a Python object,
        #ret, which is then iterated over and printed
        ret = sparql.queryAndConvert()
        for r in ret["results"]["bindings"]:
            print(r)
    except Exception as e:
        print('Sorry, could not handle the SPARQL query. Error : ' + str(e))

## PDF format <a class="anchor" id="chapter7"></a>

**tabula-py** is a simple Python wrapper of tabula-java, which enables you to extract tables from a PDF into a DataFrame, or a JSON. It can also extract tables from a PDF and save the file as a CSV, a TSV, or a JSON.    

In [None]:
import tabula
import io

def read_pdf(data_url):
    try:
        df = tabula.read_pdf(data_url) #by default, a list of DataFrames (one per table found)
        return df
    except Exception as e:
        print('Sorry, could not handle the PDF file. Please check parameters (URL, etc.). Error : ' + str(e))

## OSM (DBF) format <a class="anchor" id="chapter8"></a>

OSM file parsing by osmium is built around the concept of handlers. A handler is a class with a set of callback functions. Each function processes exactly one type of object as it is read from the file.

A handler has to inherit from one of the handler classes (in our case osmium.SimpleHandler). Then it needs to implement functions for each object type it wants to process.

After that, the handler is applied to an OSM file by calling the apply_file() convenience function, which in its simplest form only requires the file name as a parameter.

For more details, have a look at [the official documentation of OSMIUM.](https://docs.osmcode.org/pyosmium/latest/index.html)

In [None]:
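import osmium as osm
from dbfread import DBF

class OSMHandler(osm.SimpleHandler):
    def __init__(self):
        osm.SimpleHandler.__init__(self)
        self.counts = {'node': 0, 'way': 0, 'relation': 0}

    # One callback per OSM object type; here each one only counts the objects.
    def node(self, n):
        self.counts['node'] += 1
    def way(self, w):
        self.counts['way'] += 1
    def relation(self, r):
        self.counts['relation'] += 1

In [None]:
osmhandler = OSMHandler()
osmhandler.apply_file(data_url)
print(osmhandler.counts)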

## TIFF format <a class="anchor" id="chapter9"></a>

The **Tag Image File Format (TIFF)** is an image file format for storing images that are rasterized.

The following step-by-step implementation was taken from [this tutorial.](https://www.javatpoint.com/visualize-tiff-file-using-matplotlib-and-gdal-in-python)

In [72]:
from osgeo import gdal as GD
import matplotlib.pyplot as plt
import numpy as npy

data_set = GD.Open(data_url)
print("Number of bands: " + str(data_set.RasterCount))  #Counting the total number of bands

In [74]:
# Fetching the bands. This example assumes the file has 3 bands, stored in 3 separate variables.
# GDAL's GetRasterBand(int) returns one band.
# It is important to note that band indexing starts at 1, not 0.

band_1 = data_set.GetRasterBand(1) # red channel
band_2 = data_set.GetRasterBand(2) # green channel
band_3 = data_set.GetRasterBand(3) # blue channel

In [75]:
# Reading the bands as NumPy arrays.

b1 = band_1.ReadAsArray()
b2 = band_2.ReadAsArray()
b3 = band_3.ReadAsArray()

# Plotting the arrays using the imshow() function of matplotlib: the three arrays are stacked in R, G, B order.

img_1 = npy.dstack((b1, b2, b3))
f = plt.figure()
plt.imshow(img_1)
plt.savefig('Tiff.png')
plt.show()

## DXF format <a class="anchor" id="chapter10"></a>

The **Drawing Interchange File (DXF)** format stores and describes the content of 2D and 3D design data and metadata. All graphical DXF entities are stored in layouts; these layouts can be iterated over and support the index operator. A layout can contain entities like LINE, CIRCLE, TEXT and so on. Each DXF entity can only reside in exactly one layout.

There are three different layout types:

- **Modelspace:** the common construction space
- **Paperspace:** used to create print layouts
- **BlockLayout:** reusable elements, every block has its own entity space

A DXF document consists of exactly one modelspace and at least one paperspace. The modelspace contains the “real” world representation of the drawing subjects in real world units.

For more information, have a look at [the official documentation.](https://ezdxf.readthedocs.io/en/stable/index.html)


In [52]:
import ezdxf
import sys

# helper function
def print_entity(e):
    print("LINE on layer: %s\n" % e.dxf.layer)
    print("start point: %s\n" % e.dxf.start)
    print("end point: %s\n" % e.dxf.end)
    print('-------------------------')

def read_ezdxf(filename, zipname):

    try:
      #ezdxf supports loading ASCII and binary DXF documents from a file or a ZIP-file

      #doc = ezdxf.readfile(filename)
      doc = ezdxf.readzip(zipname, filename )
      msp = doc.modelspace()  # Getting the modelspace of the DXF document and iterate over all entities 
      for e in msp:
        if e.dxftype() == "LINE":
          print_entity(e)

      # entity query for all LINE entities in modelspace
      for e in msp.query("LINE"):
        print_entity(e)

    except IOError:
        print("Not a DXF file or a generic I/O error.")
        sys.exit(1)
    except ezdxf.DXFStructureError:
        print("Invalid or corrupted DXF file.")
        sys.exit(2)