# Data queries on single fields and download of LAZ point clouds

## Package imports

In [None]:
import sys
from pathlib import Path
import urllib.request

sys.path.append("..")
from pytreedb import db

## Import data

Specify the file in which the database is stored locally, in addition to MongoDB.

In [None]:
mydbfile = "my_first_pytree.db"

Specify the location of the data to be imported. Here, we use a URL to a zipped folder with GeoJSON files for each tree to be added. You can also provide the path to a local directory containing the GeoJSON files.

In [None]:
data_url = "https://github.com/3dgeo-heidelberg/pytreedb/raw/main/data/test/geojsons.zip"

Define the (local) MongoDB connection and import the data from the URL into pytreedb.

In [None]:
mydb = db.PyTreeDB(dbfile=mydbfile)
mydb.import_data(data_url, overwrite=True)

## Query data

We can query data by providing a filter dictionary.

As a first example, let's extract all trees of the species _Abies alba_, print how many there are, and list their tree IDs:

In [None]:
res = mydb.query({"properties.species": "Abies alba"})
print(len(res))
[tree["properties"]["id"] for tree in res]
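The filter key `"properties.species"` uses dot notation to address a field nested inside each tree's GeoJSON dictionary. The following is a minimal sketch of how such a dotted key can be resolved in plain Python. It is an illustration only, not pytreedb's implementation: the helper `get_by_path` and the record `example_tree` are hypothetical, and the list handling assumes MongoDB-style behaviour, where a dotted key descends into every element of an array.

```python
def get_by_path(record, dotted_key):
    """Return all values reachable in `record` via a dotted key."""
    current = [record]
    for part in dotted_key.split("."):
        next_level = []
        for item in current:
            # Lists are traversed element by element (assumed MongoDB-style).
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_level.append(candidate[part])
        current = next_level
    return current


# Hypothetical, heavily simplified tree record (not taken from the dataset).
example_tree = {
    "properties": {
        "id": "ExampleTree_1",
        "species": "Abies alba",
        "data": [{"quality": 1}, {"quality": 3}],
    }
}

print(get_by_path(example_tree, "properties.species"))       # ['Abies alba']
print(get_by_path(example_tree, "properties.data.quality"))  # [1, 3]
```

A key that does not exist in the record simply yields an empty list, so such a tree would not match any filter on that key.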

For how many trees do we have point cloud data of quality 1 (= highest quality)?

In [None]:
len(mydb.query({"properties.data.quality": 1}))

Now, what if we want all trees whose point clouds have a quality "grade" of 2 or better? Because 1 is the highest quality, "2 or better" means a value less than or equal to 2. We can express this with comparison operators in our queries:

- `$gt` = matches values that are **greater than** a specified value
- `$gte` = matches values that are **greater than or equal to** a specified value
- `$lt` = matches values that are **less than** a specified value
- `$lte` = matches values that are **less than or equal to** a specified value
- `$eq` = matches values that are **equal to** a specified value
- `$ne` = matches all values that are **not equal to** a specified value
- `$in` = matches **any** of the values specified **in an array**
- `$nin` = matches **none** of the values specified **in an array**

In [None]:
len(mydb.query({"properties.data.quality": {"$lte": 2}}))
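To make the meaning of these operators concrete, here is a small sketch that evaluates a MongoDB-style condition against a single scalar value in plain Python. The function `matches` and the table `COMPARISONS` are hypothetical names introduced for illustration. This is not how pytreedb or MongoDB evaluates queries internally, and it deliberately ignores array fields.

```python
import operator

# Scalar comparison operators mapped to Python's built-in comparisons.
COMPARISONS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$eq": operator.eq,
    "$ne": operator.ne,
}


def matches(value, condition):
    """Check a single scalar value against a MongoDB-style condition."""
    if not isinstance(condition, dict):
        # A bare value is shorthand for an equality test.
        return value == condition
    for op, operand in condition.items():
        if op == "$in":
            ok = value in operand
        elif op == "$nin":
            ok = value not in operand
        elif op in COMPARISONS:
            ok = COMPARISONS[op](value, operand)
        else:
            raise ValueError(f"Unsupported operator: {op}")
        if not ok:
            # All operators in one condition must hold.
            return False
    return True


qualities = [1, 2, 3, 4]
print([q for q in qualities if matches(q, {"$lte": 2})])      # [1, 2]
print([q for q in qualities if matches(q, {"$in": [1, 4]})])  # [1, 4]
```

Several operators in one condition are combined with a logical AND in this sketch, so `{"$gt": 1, "$lte": 3}` keeps the values 2 and 3.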

Our trees have measurements (possibly recorded or estimated from different data sources). We can also query by these tree measurements. Let's find out how many trees are taller than 40 meters.

In [None]:
len(mydb.query({"properties.measurements.height_m": {"$gt": 40}}))

When we filter the trees by certain parameters, we are probably interested in downloading the tree point clouds to use them for our own analyses or applications.

Let's define some functions for downloading files from a URL.

In [None]:
def reporthook(count, block_size, total_size):
    # total_size is not positive if the server does not report the file size
    if total_size <= 0:
        return
    percent = min(int(count * block_size * 100 / total_size), 100)
    print("\r...{}%".format(percent), end="")


def download_data(filename, url):
    filename = Path(filename)
    if not filename.exists():
        print(f"Downloading data from '{url}' to '{filename}'. Please wait ...")
        # Create the output folder (including missing parent folders) if needed
        filename.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, filename, reporthook=reporthook)
        print("\nDownload finished")
    else:
        print("File already exists. Great!")

Next, we define an output folder. Change the value in the next cell to download the data to a different location.

In [None]:
output_folder = "../temp"

We now get all trees which are taller than 50 m using a query. We then create a list of download URLs, which are retrieved from the tree dictionaries. Finally, we download the data to our output folder.

In [None]:
query_res = mydb.query({"properties.measurements.height_m": {"$gt": 50}})
download_links = mydb.get_pointcloud_urls(query_res)

for url in download_links:
    download_data(Path(output_folder) / url.split("/")[-1], url)
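Taking the file name with `url.split("/")[-1]` works as long as the URL ends in a plain file name. If a URL carries a query string or percent-encoded characters, a slightly more defensive variant based on `urllib.parse` may be preferable. The sketch below assumes nothing about the URLs pytreedb returns: `filename_from_url` is a hypothetical helper and `example_url` is a made-up address used only to show the difference.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path component of a URL, ignoring query and fragment."""
    return PurePosixPath(unquote(urlparse(url).path)).name


# Hypothetical URL, used only to illustrate the behaviour.
example_url = "https://example.org/data/tree%20001.laz?download=1"

print(filename_from_url(example_url))  # tree 001.laz
print(example_url.split("/")[-1])      # tree%20001.laz?download=1
```

For URLs that already end in a simple file name, both approaches give the same result.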