bytebutcher/pydfql


Python Display Filter Query Language


Python Display Filter Query Language (PyDFQL) offers an intuitive and powerful query language, similar to Wireshark's display filter, for working with various data structures and formats, including Python dictionaries, lists, objects, and SQL databases.

Table of Contents

  1. Quick Start

    1.1 Installation

    1.2 Initialization

    1.3 Filtering Data

  2. Features

  3. Examples

  4. Acknowledgements

Quick Start

To get started quickly, follow the steps below:

Installation

First, install the package using pip:

pip3 install pydfql

Initialization

Next, import the display filter class that matches your data source and initialize it with your data. The example below initializes an ObjectDisplayFilter with a list of objects:

from dataclasses import dataclass
from pydfql import ObjectDisplayFilter

@dataclass
class Actor:
    name: list
    age: dict
    gender: str

actors = [
    Actor(["Laurence", "Fishburne"], {"born": "1961"}, "male"),
    Actor(["Keanu", "Reeves"], {"born": "1964"}, "male"),
    Actor(["Joe", "Pantoliano"], {"born": "1951"}, "male"),
    Actor(["Carrie-Anne", "Moss"], {"born": "1967"}, "female")
]

df = ObjectDisplayFilter(actors)

Note that PyDFQL also supports other data sources, such as Python dictionaries, lists, and SQL databases.

Filtering Data

Once the display filter is initialized, you can start filtering the data using the display filter query language. For example, let's filter the actors whose birth year is after 1960:

filter_query = "age.born > 1960"
filtered_data = df.filter(filter_query)
print(list(filtered_data))
[
    Actor(name=['Laurence', 'Fishburne'], age={'born': '1961'}, gender='male'),
    Actor(name=['Keanu', 'Reeves'], age={'born': '1964'}, gender='male'),
    Actor(name=['Carrie-Anne', 'Moss'], age={'born': '1967'}, gender='female')
]
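
For comparison, the same selection can be sketched in plain Python without PyDFQL. This only illustrates what the query expresses, not how the library is implemented. The helper born_after is hypothetical, and the explicit int() conversion reflects the assumption that the string values stored under "born" are compared numerically, as the output above suggests.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    name: list
    age: dict
    gender: str


actors = [
    Actor(["Laurence", "Fishburne"], {"born": "1961"}, "male"),
    Actor(["Keanu", "Reeves"], {"born": "1964"}, "male"),
    Actor(["Joe", "Pantoliano"], {"born": "1951"}, "male"),
    Actor(["Carrie-Anne", "Moss"], {"born": "1967"}, "female"),
]


def born_after(items, year):
    """Hypothetical helper: keep actors whose birth year is greater than `year`.

    Assumption: the string in age["born"] is converted to a number before
    the comparison.
    """
    return [actor for actor in items if int(actor.age["born"]) > year]


plain_result = born_after(actors, 1960)
print([" ".join(actor.name) for actor in plain_result])
# prints ['Laurence Fishburne', 'Keanu Reeves', 'Carrie-Anne Moss']
```

The query string "age.born > 1960" expresses the same condition in one line and without a hand-written helper.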

You can also combine several conditions in one query. For example, let's filter male actors born after 1960 and before 1965 (that is, 1961 through 1964) who have a name ending in "e":

filter_query = "gender == male and (age.born > 1960 and age.born < 1965) and name matches .*e$"
filtered_data = df.filter(filter_query)
print(list(filtered_data))
[
   Actor(name=['Laurence', 'Fishburne'], age={'born': '1961'}, gender='male')
]
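
Spelled out in plain Python, the combined query corresponds roughly to the sketch below. It does not use PyDFQL, and the function matches_query is hypothetical. It assumes that matches, when applied to the list-valued name field, succeeds if any element of the list matches the regular expression, which is consistent with the result shown above.

```python
import re
from dataclasses import dataclass


@dataclass
class Actor:
    name: list
    age: dict
    gender: str


actors = [
    Actor(["Laurence", "Fishburne"], {"born": "1961"}, "male"),
    Actor(["Keanu", "Reeves"], {"born": "1964"}, "male"),
    Actor(["Joe", "Pantoliano"], {"born": "1951"}, "male"),
    Actor(["Carrie-Anne", "Moss"], {"born": "1967"}, "female"),
]


def matches_query(actor):
    """Hypothetical plain-Python counterpart of the combined filter query."""
    born = int(actor.age["born"])  # assumption: birth year is compared as a number
    return (
        actor.gender == "male"
        and 1960 < born < 1965
        # assumption: one matching list element is enough for `matches`
        and any(re.search(r".*e$", part) for part in actor.name)
    )


combined_result = [actor for actor in actors if matches_query(actor)]
print(combined_result)
# prints [Actor(name=['Laurence', 'Fishburne'], age={'born': '1961'}, gender='male')]
```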

Features

PyDFQL supports the following features:

  • Data Sources: Dictionaries, Lists, Objects, SQL Databases
  • Comparison Operators: ==, !=, <=, <, >=, >, ~=, ~, &
  • Combining Operators: and, or, xor, not
  • Membership Operators: in
  • Types: Text, Number, Date & Time, Ethernet-, IPv4-, IPv6-Address
  • Slicing: Text, Ethernet-, IPv4-, IPv6-Address
  • Functions: upper, lower, len

For a detailed description of the individual features, see the User Guide.

Examples

PyDFQL can be applied in many contexts due to its flexible design. It is well-suited for working with various data formats and can be easily integrated into your data analysis workflow. Here are some examples where PyDFQL can be particularly useful:

Acknowledgements

This project wouldn't be possible without these awesome projects: