GDBee
is a PyQt5 desktop application which you can use to write and execute SQL queries against tables and feature classes stored inside an Esri file geodatabase. It provides a tabbed interface which lets you connect to multiple geodatabases within a single session. It has a rich code editor featuring auto-completion (with suggestions), syntax highlight, and result set export.
If you are a QGIS Desktop user, you are already able to execute SQL against file geodatabases using QGIS DBManager plugin, but GDBee
has some extra features that the DBManager is missing (for instance, you do not need to add your datasets as layers first and you can choose to copy individual cells instead of the whole row) from the result table.
This project has started as an experimental playground to see what functionality PyQt provides. As I spent more time working with PyQt, I have started wondering what it would take to build a simple desktop application. Because I often find myself in need of querying a file geodatabase's datasets, I have decided to build a GUI-based SQL editor that would let me execute SQL queries against a table or a feature class and draw the result set in a table form for further visual inspection. I have thought that other GIS users and developers may find this application useful and I therefore have decided to start a GitHub repository to let others take advantage of my work.
- Working with multiple geodatabases using multiple tabs (single geodatabase connection per tab)
- Having a schema panel showing tables and their columns for each connected geodatabase
- Exporting result sets into various formats (
WKT
strings to paste into QGIS using QuickWKT plugin,arcpy
code to paste into ArcMap Python window,pandas
data frame via.csv
file (which can be taken intogeopandas
), and Markdown table via.md
file or plain text) - Executing SQL query with respect to the user selection (only selected text is executed)
- Loading/saving SQL queries from and to text files on disk
- Convenient keyboard shortcuts for query execution (
F5
andCtrl-Enter
), tab interaction (Ctrl-N
andCtrl-W
for opening and closing tabs), and browsing to a geodatabase (Ctrl-B
) - Copying data from the result set table (either individual cell values or row(s) with the headers preserved) - ready to paste properly into an Excel sheet
- Choosing whether you want to have geometry column in the result set as WKT
- Choosing what SQL dialect to use for querying (
OGR SQL
orSQLite
) - Auto-completion and highlights for geodatabase tables and columns as well as SQL keywords and functions
- Pagination of the result table to load rows on request as user scrolls down
- Reporting query execution time and number of records returned
- All queries should target datasets stored within a single geodatabase; you won't be able to run this query:
SELECT * FROM DB1.Parcels WHERE Parcels.ID in (SELECT ID FROM DB2.Parcels)
-
The application can be used only for selecting existing data and creating new data within the application session; you won't be able to execute any
UPDATE
,INSERT
, orDELETE
queries in the current version. If you want to save the generated shapes (for instance, as a result of buffering or getting vertices of polygons as points), you can use the export functionality to load the WKT representation of the shapes into an ArcMap or QGIS layer. -
Depending on what SQL dialect you are using and what version of
GDAL
you work with, there will be some features of the SQL language you won't be able to take advantage of. For instance, usingLIMIT
withOGR SQL
dialect is possible only starting withGDAL
2.2. NativeOGR SQL
does not provide any spatial SQL functions to calculate the distances between features, get their boundaries, and so on. You would need to useSQLite
dialect for this. You can choose what SQL dialect to use before executing the query. To learn more about these two SQL dialects, refer to these two pages: OGR SQL and SQLite. -
It is hard to achieve high performance of SQL querying without loading the data into a proper DBMS first. This application would not be an appropriate choice for tasks when you need to run cross joins calculating distances between million points. The
GDAL
's SQL execution time depends a lot on the SQL spatial functions being used. Also, getting the length of the result set returned from the database can be slow depending on whether you are querying a single table or creating a virtual view joining multiple tables together. The performance will also be different depending on what SQL dialect (OGRSQL
andSQLite
) is being used. You would most likely want to use theOGR SQL
unless you need to use SQL spatial functions. -
TODO compare performance of
OGRSQL
andSQLite
- Python 3.5
GDAL
2.1+PyQt5
pandas
tabulate
(optional, used for exporting result set into a markdown table)
Tested against:
- GDAL 2.1.3 and GDAL 2.2.3
- PyQt 5.6.0
The GDAL compilation details (can be run in the SQL query window):
select geos_version() --3.6.2-CAPI-1.10.2 999999
select proj4_version() --Rel. 4.9.3, 15 August 2016
select spatialite_version() --4.3.0-RC1
The GDAL compilation details (can be run in Python interpreter window):
osgeo.gdal.VersionInfo() # '2010400' for GDAL 2.1.3 and '2020300' for GDAL 2.2.3
You don't need to install anything apart from PyQt
and GDAL
(and pandas
which is used for exporting) in order to be able to run the application. All of these can be installed using Anaconda Navigator which is arguably the easiest way to proceed.
I've tried various combinations to get PyQt5 and GDAL working happily together with Python 2.7 but haven't managed this. However, if you do have a Python 2.7 environment with the requirements satisfied, by all means go ahead and use it.
To run the application, just run the main.py
file using the right Python interpreter.
You may find this PyQt desktop application useful if:
- You would like to be able to interrogate your file geodatabase datasets using SQL (instead of Python-based interface such as Esri
arcpy
or open-sourceogr
) - You are an ArcGIS user that does not want to have QGIS Desktop installed just to be able to execute SQL against a file geodatabase
- You use SQL on a daily basis working with spatial databases (such as PostgreSQL or Microsoft SQL Server) and want to be able to execute ad hoc SQL queries against file geodatabase datasets without loading them into a proper DBMS database
- You already have a lot SQL code targeting tables stored in a DBMS spatial database and you would like to be able to reuse this code when targeting a file geodatabase
If you are an ArcGIS user, you have already used SQL for querying your file geodatabase datasets when using the Select By Attributes
and Select By Location
dialog windows or Select Layer By Attributes
, Select Layer By Location
, Make Feature Layer
or Select
geoprocessing tools. You have supplied a where
clause specifying what features or rows you would like to select. This can be done simply to see what data match your query or to perform some analysis on the subset of your data. The querying functionality available through these tools is somewhat limited, though, because you will not be able to take advantage of rich SQL operations such as grouping your data, joining multiple tables, or using spatial SQL functions.
Open source GIS users have better tools for querying file geodatabases using SQL; QGIS DBManager plugin does provide an excellent interface for executing SQL queries against map layers added from a file geodatabase. However, this requires installing QGIS Desktop software and adding all the datasets you would like to participate in the SQL query into a map. Also, there are no tools to quickly export the result set in a suitable format.
Esri does provide File Geodatabase C++ API for Windows, MacOS and Linux, however it also has a number of limitations; for instance, SQL joins are not supported and you would need to have Microsoft Visual Studio for development (gcc/clang on Linux). You would also need to bind the API to your own language of choice since only .NET bindings are included.
Because Python is widely used in the GIS community, I thought it would make sense to take advantage of Python bindings of GDAL
(via GEOS
) to be able to connect to a file geodatabase and execute SQL queries. Working with a file geodatabase via GEOS
makes it possible to take advantage of SQL spatial functions that are otherwise inaccessible to an ArcGIS user.
In this section, you find just a few sample SQL queries that have been executed in the GDBee
against an Esri file geodatabase containing datasets accessible via PostGIS workshop materials by Boundless which can be downloaded from this link.
The queries and the result sets are here to show you the power of SQL when working with file geodatabases. You can learn more about using SQL for querying file geodatabases in the Esri documentation available at Use SQL for reporting and analysis on file geodatabases. The full list of the spatial and non-spatial functions available for a GDAL
user is listed in the SpatialLite documentation. Nearly all SQL functions are available for a GDAL
user (look for the GEOS
column).
The SQL queries are executed with the help of GDAL
Python bindings using the ExecuteSQL()
method (mind that SQLITE
dialect is being used). The result set returned is post-processed to include the abridged WKT
representation of the geometries. The result sets have been converted subsequently into a markdown
table for displaying in this guide.
More examples can be found in the samples document.
SELECT
BLKID
, BORONAME
, CAST(POPN_TOTAL AS INTEGER) AS TotalPop
, CAST(POPN_WHITE AS INTEGER) AS WhitePop
, CAST(POPN_TOTAL - POPN_WHITE AS INTEGER) AS NonWhitePopulation
FROM
census_blocks
ORDER BY
POPN_TOTAL DESC
LIMIT
5
BLKID | BORONAME | TotalPop | WhitePop | NonWhitePopulation | |
---|---|---|---|---|---|
0 | 360050001001000 | The Bronx | 8634 | 1242 | 7392 |
1 | 360610238011009 | Manhattan | 4970 | 2704 | 2266 |
2 | 360810797012000 | Queens | 4563 | 436 | 4127 |
3 | 360610155006000 | Manhattan | 4067 | 3507 | 560 |
4 | 360471034001004 | Brooklyn | 3918 | 605 | 3313 |
SELECT
BORONAME
, SUM(POPN_TOTAL) AS TotalPop
FROM
census_blocks
GROUP BY
BORONAME
HAVING
SUM(POPN_TOTAL) > 1500000
ORDER BY
TotalPop DESC
BORONAME | TotalPop | |
---|---|---|
0 | Brooklyn | 2504700.0000 |
1 | Queens | 2230621.0000 |
2 | Manhattan | 1585873.0000 |
SELECT *
FROM
homicides
WHERE
ST_Distance(Shape, (SELECT shape FROM subway_stations WHERE name = 'Essex St')) < 300
INCIDENT_D | BORONAME | NUM_VICTIM | PRIMARY_MO | ID | WEAPON | LIGHT_DARK | YEAR | Shape | |
---|---|---|---|---|---|---|---|---|---|
0 | 2008/05/13 00:00:00 | Manhattan | 1 | 144.0000 | knife | D | 2008.0000 | POINT (585477.5157 4508066.903) | |
1 | 2003/01/12 00:00:00 | Manhattan | 1 | ROBBERY/BURGLARY | 721.0000 | gun | 2003.0000 | POINT (585252.013 4507899.3078) | |
2 | 2003/06/10 00:00:00 | Manhattan | 1 | DRUGS | 728.0000 | gun | 2003.0000 | POINT (585747.6144 4507957.7557) | |
3 | 2004/10/25 00:00:00 | Manhattan | 1 | DOMESTIC | 1404.0000 | other | 2004.0000 | POINT (585688.4776 4508016.8018) | |
4 | 2005/01/27 00:00:00 | Manhattan | 1 | ROBBERY/BURGLARY | 1553.0000 | gun | 2005.0000 | POINT (585710.8339 4508019.2805) | |
5 | 2005/08/17 00:00:00 | Manhattan | 1 | REVENGE | 1825.0000 | knife | 2005.0000 | POINT (585404.4815 4508200.4864) | |
6 | 2007/07/06 00:00:00 | Manhattan | 1 | DISPUTE | 2834.0000 | knife | D | 2007.0000 | POINT (585305.7312 4508053.368) |
7 | 2011/04/10 00:00:00 | Manhattan | 1 | 4171.0000 | gun | 2011.0000 | POINT (585724.1624 4508111.4603) |
Find neighborhoods with the largest number of crimes committed (count number of homicides in each neighborhood) (ST_Contains
)
SELECT
Polys.Name AS NeighborhoodName
, Count(*) AS CrimeCount
FROM
Homicides AS Points
JOIN
Neighborhoods AS Polys
ON
ST_Contains(Polys.Shape, Points.Shape) = 1
AND
Points.YEAR in (2008, 2009, 2010)
GROUP BY
Polys.Name
ORDER BY
CrimeCount DESC
LIMIT 10
NeighborhoodName | CrimeCount | |
---|---|---|
0 | Bedford-Stuyvesant | 113 |
1 | South Bronx | 66 |
2 | Brownsville | 50 |
3 | East Brooklyn | 43 |
4 | Harlem | 41 |
5 | Jamaica | 37 |
6 | Tremont | 30 |
7 | Wakefield-Williamsbridge | 29 |
8 | Mott Haven | 27 |
9 | Fort Green | 25 |
- FileGDB API to add editing mode of geodatabase (supported by ogr)
Did you find a bug? Do you think there is some other functionality that should be added? Please let me know by submitting an issue.
MIT