Skip to content

PortellJ/awkql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

AwkQL

Simple Awk-based SQL-like query language for CSVs.

Description

This small project was created as a convenience tool to easily and quickly examine large CSV or ECSV files from ESA's Gaia space astrometry mission.

It aims at providing basic SQL-like capabilities directly from the command line, yet with a slightly peculiar syntax.

Disclaimer

It was created before we knew about DuckDB and its amazing functionalities and performance on CSV files, so this tool is probably a bit useless now! Nevertheless, it's quite fun to see the kind of things that you can do just with shell scripting and Awk.

This tool doesn't do much error checks (if any at all), so you should double-check your commands and perhaps start with small tests before launching it on large files.

Usage

Basic use: single-file, single-thread

The main program is awkql.sh, and actually there's not much more than this (just the ancillary cols2nums.awk program). It processes just one CSV file in single-thread. Its usage is:

./awkql.sh select <'a , b , c...'|'count'|"\$0"> from <file.csv> where <condition>

Please note that the column names, either in the 'select' or in the 'where', must be surrounded by blank spaces. Otherwise the tool won't recognize them.

The output is written to the console, so you can redirect it to a file.

Some examples:

  • Just "describe" the file by listing its column names and also showing just the very first record:

    ./awkql.sh desc GaiaSource_005264-006601.csv

  • Count the records from a CSV file that fulfil certain conditions:

    ./awkql.sh select 'count' from GaiaSource_005264-006601.csv where '(( astrometric_params_solved >= 31) && ( parallax_over_error >= 10))'

  • Get some fields, apply operations on them, and just for records fulfilling certain condition - and store the output in a file:

    ./awkql.sh select 'phot_g_mean_mag + 5* (log( parallax )/log(10.0) + 1), bp_rp' from GaiaSource_005264-006601.csv where '(( astrometric_params_solved >= 31) && ( parallax > 0 ) && ( parallax_over_error >= 3))' > absmag.csv

Note how we carefully leave blank spaces around the column names, and also how we do NOT leave any space in at least one side of the multiplication (so that bash doesn't expand it as a wildcard).

  • Get all of the fields from records which have a specific column contained in a file:

    ./awkql.sh select "\$0" from GaiaSource_005264-006601.csv where '( source_id in sourcelist.txt )'

About

Simple Awk-based query language

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors