**Imports**

pandas for dataframes

os for file searching and path manipulation

rasterio for accessing .tif files

re for regex pattern matching in strings

IPython.display.clear_output for refreshing the progress output printed with print()

In [1]:
import pandas as pd
import os
import rasterio as rio
import re
from IPython.display import clear_output

**File Searcher**

Loops through all files under a given path and applies a given function to every file of the desired type.

In [2]:
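# get the absolute path of the project root: the parent of the folder this notebook is in
# (resolved against the current working directory, so run the notebook from its own folder)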
project_path = os.path.abspath(os.path.join("readfiles.ipynb", os.pardir, os.pardir))
print("Found project in: ", project_path)
data_path = os.path.join(project_path, "Data")

external_data_path = "/media/seppe/DiskSpaceInvader/3DHouseData"


def search_files(fpath: str, file_type: str, func):
    """Call func(path) for every file below fpath whose name ends with file_type."""
    total_files = 0
    # Walk through the folder containing the data, and check if the files match the file type
    # If so, pass the full path of that file to func, and count it
    for root, dirs, files in os.walk(fpath):
        for name in files:
            if name.endswith(file_type):
                # os.path.join picks the right separator for the current platform
                func(os.path.join(root, name))
                total_files += 1
    print(f"Found {total_files} {file_type} files.")

# go through all files in the data path, 
# and store every .tif file in a list as a string containing the path to that file
all_tifs = []
search_files(external_data_path, ".tif", all_tifs.append)

Found project in:  /home/seppe/Projects/BeCode/3D-House-Project
Found 88150 .tif files.
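
For comparison, the same recursive search can be sketched with `pathlib`. This is only a minimal alternative, assuming that nothing but the list of matching paths is needed; `find_files` is a hypothetical helper name and not part of this notebook.

```python
from pathlib import Path
from typing import List


def find_files(root: str, suffix: str) -> List[str]:
    """Return the paths of all files below `root` whose name ends with `suffix`."""
    # rglob("*") walks the whole directory tree, much like os.walk does
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )
```

Calling `find_files(external_data_path, ".tif")` would then take the place of the `all_tifs` list built above, without the callback argument.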


**Filter**

Since we only want the files that have been divided by the splitter, we ignore all the .tifs that don't follow the same naming convention.

The files we want to keep end in "_\d_\d_\d_\d.tif", where every \d is 0, 1, 2 or 3.

In [3]:
def filter_list(lst: list[str], txt_pattern: str) -> list[str]:
    filtered = []
    for txt in lst:
        if re.search(txt_pattern, txt):
            filtered.append(txt)
    return filtered

# filter all tif files, and only select those that follow my naming convention
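# raw string so the backslashes reach the regex engine as-is; "\." matches a literal dot
# and "$" makes sure the pattern sits at the very end of the path
small_tifs = filter_list(all_tifs, r"_\d_\d_\d_\d\.tif$")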

# when comparing the length of this list with the one containing all tifs,
# there should be 86 fewer files, since that's the number of original large tifs
print(len(small_tifs))
print(len(all_tifs))
print(len(all_tifs)-len(small_tifs) == 86)

88064
88150
True
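
To make the naming rule concrete, here is a small sketch of a stricter version of the pattern. It assumes the rule exactly as described above (four indices, each 0 to 3, directly before the extension). The file names are made up for illustration and are not taken from the real data set.

```python
import re

# Raw string so the backslash reaches the regex engine unchanged.
# [0-3] encodes the "0, 1, 2 or 3" rule, "\." is a literal dot, "$" anchors the end.
SPLIT_TILE = re.compile(r"_[0-3]_[0-3]_[0-3]_[0-3]\.tif$")

# Hypothetical file names, for illustration only
examples = [
    "DSM_k19_0_1_2_3.tif",   # four valid split indices: kept
    "DSM_k19.tif",           # original large tile: dropped
    "DSM_k19_0_1_2_4.tif",   # last index out of range: dropped
    "DSM_k19_0_1_2_3_tif",   # no real ".tif" extension: dropped
]

kept = [name for name in examples if SPLIT_TILE.search(name)]
print(kept)
```

The last example is the reason for escaping the dot: with an unescaped `.`, the pattern would also accept `_tif`, because `.` matches any character.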


**Dictionary**

Constructs a pandas DataFrame containing the necessary info about every .tif file in a list of tifs, so that these files no longer have to be opened just to find a single location. It's kind of like an address book: every row contains the path to a file and a few key properties of that file.

This dictionary is not really a Python dict. I know, it's a terribly confusing name, but I really couldn't find anything more intuitive. SorryNotSorry

After constructing it, store it locally as a .csv so it can be accessed from anywhere at any time.

In [4]:
def make_dict_panda(lst: list[str]) -> pd.DataFrame:
    # just a way of presenting progress in creating the panda to the user
    # divides the length of the input list into percentages
    lst_len = len(lst)
    # at least 1, so lists shorter than 100 files don't cause a modulo by zero
    onepercent = max(lst_len // 100, 1)

    d = []
    for i, fpath in enumerate(lst, start=1):
        # the context manager closes the file even if reading its metadata fails
        with rio.open(fpath) as tif:
            d.append(
                {
                    'file_path': fpath,
                    'id': fpath[-17:-4],    # store the substring of the filename that is relevant.
                                            # ex. DSM(k19_0_1_2_3_0).tif only what's inside the brackets is stored
                    'left_bound': tif.bounds.left,
                    'bottom_bound': tif.bounds.bottom,
                    'right_bound': tif.bounds.right,
                    'top_bound': tif.bounds.top,
                    'width_bound': tif.width,    # raster width in pixels
                    'height_bound': tif.height   # raster height in pixels
                }
            )

        # a more visually pleasing way of showing that this is working
        if i % onepercent == 0:
            clear_output(wait=True)
            print(f"{i * 100 // lst_len}% | file {i} of {lst_len}")
    df = pd.DataFrame(d)
    return df

dict_panda = make_dict_panda(small_tifs)

# Save the DataFrame to a csv file so it doesn't have to be constructed every time.
dict_panda.to_csv(os.path.join(project_path, "overviewdataframe.csv"), encoding='utf-8', index=False)
dict_panda.info()

KeyboardInterrupt: 
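
As an illustration of the address-book idea, the sketch below looks up which tile contains a given coordinate by using only the stored bounds. It is a dependency-free sketch under assumptions: `find_tile` is a hypothetical helper, the rows are plain dicts with the same keys as the columns above, and the ids and bounds are made-up numbers.

```python
from typing import List, Optional


def find_tile(rows: List[dict], x: float, y: float) -> Optional[dict]:
    """Return the first row whose bounding box contains the point (x, y), or None."""
    for row in rows:
        if (row["left_bound"] <= x <= row["right_bound"]
                and row["bottom_bound"] <= y <= row["top_bound"]):
            return row
    return None


# Made-up bounds for two neighbouring tiles, for illustration only
rows = [
    {"id": "tile_a", "left_bound": 0.0, "bottom_bound": 0.0,
     "right_bound": 100.0, "top_bound": 100.0},
    {"id": "tile_b", "left_bound": 100.0, "bottom_bound": 0.0,
     "right_bound": 200.0, "top_bound": 100.0},
]

match = find_tile(rows, 150.0, 50.0)
print(match["id"] if match else "no tile found")
```

With the real overview, the rows could come from `dict_panda.to_dict("records")`, or the same comparison could be written as a boolean mask on the DataFrame columns. A point that lies exactly on a shared edge matches the first tile in the list.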