This notebook can be used to search through a list of directories (and their subfolders) for a list of keywords. It uses Geopandas to search for:

- Folder Names
- File Names
- Text in txt files (readmes)
- Attributes of shapefiles
- Names of feature classes in geodatabases
- Attributes of feature classes in geodatabases

Make sure to use the Python Central Clone, or a custom environment that has Fiona installed.


Currently, this seems to work, but it takes a really long time. There is likely a more efficient way to get to the attributes of shp and geodatabases without reading the entire thing?

Could replace the txt output with a csv for legibility

Another improvement could add support for regular expressions

In [9]:
# set parameters

# keywords are non case-sensitive
keywordList = [
    "HV1",
    "BRFN"
]

dirList = [
    r"\\spatialfiles.bcgov\work\srm\nr\NEGSS\NEDD\First_Nations_Agreements"
]

output_file = r'C:\Users\NROSS\OneDrive - Government of BC\Documents\GitHub\gis-pantry\scripts\keyword-search\search_results.txt'

In [6]:
import os
import fiona

In [None]:
def get_shapefile_attributes(shapefile_path, layer=None):
    """
    Retrieves the field names and text attributes from a shapefile without reading the geometry.
    
    Parameters:
    shapefile_path (str): The path to the shapefile or file geodatabase.
    layer (str): If a file geodatabase is specified, enter the feature class name. Else leave as null.
    
    Returns:
    A string containing the field names and text attributes.
    """
    with fiona.open(shapefile_path, 'r', layer=layer) as shapefile:
        # Get the field names
        field_names = shapefile.schema['properties'].keys()
        
        # Get the text attributes
        text_attributes = []
        for feature in shapefile:
            text_attributes.append(', '.join([str(feature['properties'][field]) for field in field_names if isinstance(feature['properties'][field], str)]))
        
        # Combine the field names and text attributes into a single string
        output = '\n'.join([', '.join(field_names), '\n'.join(text_attributes)])
        
        return output
    
def search_and_save(folder_path, search_string_list, output_file):
    """
    Searches through a folder and all its subfolders for files or folders that contain a specific string,
    and saves the names in a text file.
    
    Parameters:
    folder_path (str): The path to the folder to search.
    search_string (str): The string to search for.
    output_file (str): The path to the output text file.
    """
    # open output txt file
    with open(output_file, 'a') as f:
        for root, dirs, files in os.walk(folder_path):
            print(f"{os.path.basename(root)}: searching {len(dirs)} folders and {len(files)} files")
            for dir_or_file in dirs + files:
                path = os.path.join(root, dir_or_file)
                
                # don't waste time searching files within geodatabases:
                if dir_or_file[-4:] != '.gdb':
                    # loop through search strings
                    for search_string in search_string_list:
                        if search_string.lower() in dir_or_file.lower():
                            f.write(f"{search_string}: {path}\n") # write to output on success

                # use try for this next part so that it continues on a failure
                try:
                    # for txt files:
                    if dir_or_file[-4:] =='.txt':
                        with open(path, 'r') as file:
                            txt_str = file.read()
                            for search_string in search_string_list:
                                if search_string.lower() in txt_str.lower():
                                    f.write(f"\t{search_string}: Text in {path}\n")

                    
                    # for shapefiles and file geodatabases:
                    if dir_or_file[-4:] == '.shp':
                        shp_str = get_shapefile_attributes(path)
                        for search_string in search_string_list:
                            if search_string.lower() in shp_str.lower():
                                f.write(f"\t{search_string}: Attribute in {path}\n")
                        del shp_str
                    elif dir_or_file[-4:] == '.gdb':
                        fcList = fiona.listlayers(path)
                        print(f"\tSearching {len(fcList)} feature classes in {dir_or_file}")
                        for fc in fcList:
                            fc_str =  get_shapefile_attributes(path, fc)
                            for search_string in search_string_list:
                                if search_string.lower() in fc.lower():
                                    f.write(f"\t{search_string}: In feature class name: {path}\\{fc}\n")
                                if search_string.lower() in fc_str.lower():
                                    f.write(f"\t{search_string}: Attribute in {path}\\{fc}\n")
                            del fc_str
                    
                except Exception as e:
                    f.write(f"\nERROR: {e} on {path}\n")

In [11]:
for dir in dirList:
    search_and_save(dir, keywordList, output_file)

First_Nations_Agreements: searching 4 folders and 3 files
BRFN_Implementation_Agreement: searching 4 folders and 2 files
FileGeodatabase: searching 4 folders and 2 files
BRFNAgreement.gdb: searching 0 folders and 123 files
FOR_ScheduleK.gdb: searching 0 folders and 52 files
Pending_Permits_Schedule_O.gdb: searching 0 folders and 94 files
Schedule_O.gdb: searching 0 folders and 111 files
KML: searching 0 folders and 1 files
Shapefiles: searching 1 folders and 88 files
gundy_complex_plan_shp_11jun2024: searching 0 folders and 7 files
Shapefiles_NAD83CSRS: searching 0 folders and 10 files
Moose_Not_For_Distribution: searching 3 folders and 6 files
KML: searching 0 folders and 2 files
Moose_AOI_HEM_20201117.gdb: searching 0 folders and 82 files
Shapefile: searching 0 folders and 16 files
PeaceMoberlyTract: searching 0 folders and 7 files
T8: searching 6 folders and 11 files
Consensus.gdb: searching 0 folders and 73 files
Kihtsaadze_Tribal_Park: searching 2 folders and 0 files
Park_Boundary