This notebook searches a list of directories (and their subfolders) for a list of keywords. It uses Fiona to read shapefiles and geodatabases, and searches:

- Folder Names
- File Names
- Text in txt files (readmes)
- Attributes of shapefiles
- Names of feature classes in geodatabases
- Attributes of feature classes in geodatabases

Make sure to use the Python Central Clone, or a custom environment that has Fiona installed.


Currently this seems to work, but it is very slow. There is likely a more efficient way to get at the attributes of shapefiles and geodatabases without reading every feature in full.
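One small, avoidable cost shows up in the log output further down: after a `.gdb` folder has been scanned through Fiona, `os.walk` still descends into it and checks every internal file name. The sketch below shows one way to prune those folders. `walk_skipping_gdb` is a hypothetical helper, not part of this notebook, and it has not been timed against real data.

```python
import os

def walk_skipping_gdb(folder_path):
    """Yield (root, dirs, files) like os.walk, but do not descend into .gdb folders."""
    for root, dirs, files in os.walk(folder_path):
        # Yield a copy of dirs so the caller still sees the .gdb folders at this level
        # and can scan them with Fiona.
        yield root, list(dirs), files
        # Prune in place: a top-down os.walk skips anything removed from dirs.
        dirs[:] = [d for d in dirs if not d.lower().endswith('.gdb')]
```

Swapping this in for `os.walk(folder_path)` would keep the geodatabase scan but skip the listing of internal geodatabase files.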

The txt output could be replaced with a CSV for legibility.
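A minimal sketch of what CSV output might look like, using only the standard library. The function name `append_matches_csv` and the column names are assumptions made for illustration; the notebook currently writes free-form lines to a txt file.

```python
import csv
import os

# Hypothetical column layout for one match per row
FIELDNAMES = ["keyword", "match_type", "path"]

def append_matches_csv(csv_path, matches):
    """Append (keyword, match_type, path) rows, writing a header only if the file is new or empty."""
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(FIELDNAMES)
        writer.writerows(matches)
```

Each `f.write(...)` call in `search_and_save` would then become a row such as `("HV1", "Attribute", path)`, which can be sorted and filtered in a spreadsheet.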

Another improvement would be to add support for regular expressions.
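Regular expression support could be sketched as below, assuming the keyword list is treated as a list of patterns. `compile_patterns` and `matching_patterns` are hypothetical names, and `re.IGNORECASE` is used to mirror the current case-insensitive matching.

```python
import re

def compile_patterns(pattern_list):
    """Compile each pattern once, case-insensitively."""
    return [re.compile(p, re.IGNORECASE) for p in pattern_list]

def matching_patterns(text, compiled_patterns):
    """Return the pattern strings that match anywhere in text."""
    return [p.pattern for p in compiled_patterns if p.search(text)]

# Usage: r"HV\d+" matches HV1, HV22, and so on; "BRFN" still works as a plain keyword
patterns = compile_patterns([r"HV\d+", "BRFN"])
hits = matching_patterns("hv1_roads.shp", patterns)
```

Keywords that contain regex metacharacters (such as `.` or `(`) but are meant literally would need to go through `re.escape` first.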

In [None]:
# set parameters

# keywords are case-insensitive
keywordList = [
    "HV1",
    "BRFN"
]

dirList = [
    #<place list of paths to scan here>
    r"\\xxx.zz\yy\z"
]

output_file = r'search_results.txt'

In [6]:
import os
import fiona

In [None]:
def get_shapefile_attributes(shapefile_path, layer=None):
    """
    Retrieves the field names and text attributes from a shapefile/geodatabase feature class without reading the geometry.
    
    Parameters:
    shapefile_path (str): The path to the shapefile or file geodatabase.
    layer (str): If a file geodatabase is specified, enter the feature class name. Else leave as None.
    
    Returns:
    A string containing the field names and text attributes.
    """
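    # ignore_geometry=True (Fiona 1.8+) skips the geometry, as the docstring promises; only properties are read
    with fiona.open(shapefile_path, 'r', layer=layer, ignore_geometry=True) as shapefile: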
        # Get the field names
        field_names = shapefile.schema['properties'].keys()
        
        # Get the text attributes
        text_attributes = []
        for feature in shapefile:
            text_attributes.append(', '.join([str(feature['properties'][field]) for field in field_names if isinstance(feature['properties'][field], str)]))
        
        # Combine the field names and text attributes into a single string
        output = '\n'.join([', '.join(field_names), '\n'.join(text_attributes)])
        
        return output
    
def search_and_save(folder_path, search_string_list, output_file):
    """
    Searches a folder and all its subfolders for file names, folder names, txt file contents,
    and shapefile/geodatabase attributes that contain any of the search strings,
    and appends the matches to a text file.
    
    Parameters:
    folder_path (str): The path to the folder to search.
    search_string_list (list of str): The strings to search for (case-insensitive).
    output_file (str): The path to the output text file (opened in append mode).
    """
    # open output txt file
    with open(output_file, 'a') as f:
        for root, dirs, files in os.walk(folder_path):
            print(f"{os.path.basename(root)}: searching {len(dirs)} folders and {len(files)} files")
            for dir_or_file in dirs + files:
                path = os.path.join(root, dir_or_file)
                
                # loop through search strings
                for search_string in search_string_list:
                    if search_string.lower() in dir_or_file.lower():
                        f.write(f"{search_string}: {path}\n") # write to output on success

                # use try for this next part so that it continues on a failure
                try:
                    # for txt files:
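                    if dir_or_file.lower().endswith('.txt'):  # also matches .TXT
                        # errors='replace' keeps odd encodings from aborting the read
                        with open(path, 'r', errors='replace') as file: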
                            txt_str = file.read()
                            for search_string in search_string_list:
                                if search_string.lower() in txt_str.lower():
                                    f.write(f"\t{search_string}: Text in {path}\n")

                    
                    # for shapefiles and file geodatabases:
                    if dir_or_file.lower().endswith('.shp'):  # also matches .SHP
                        shp_str = get_shapefile_attributes(path)
                        for search_string in search_string_list:
                            if search_string.lower() in shp_str.lower():
                                f.write(f"\t{search_string}: Attribute in {path}\n")
                        del shp_str
                    elif dir_or_file.lower().endswith('.gdb'):  # also matches .GDB
                        fcList = fiona.listlayers(path)
                        print(f"\tSearching {len(fcList)} feature classes in {dir_or_file}")
                        for fc in fcList:
                            fc_str =  get_shapefile_attributes(path, fc)
                            for search_string in search_string_list:
                                if search_string.lower() in fc.lower():
                                    f.write(f"\t{search_string}: In feature class name: {path}\\{fc}\n")
                                if search_string.lower() in fc_str.lower():
                                    f.write(f"\t{search_string}: Attribute in {path}\\{fc}\n")
                            del fc_str
                    
                except Exception as e:
                    f.write(f"\nERROR: {e} on {path}\n")

In [14]:
# scan_dir avoids shadowing the built-in dir()
for scan_dir in dirList:
    search_and_save(scan_dir, keywordList, output_file)

RSEA_LEVELCD_AUG_2018.gdb: searching 0 folders and 72 files
RSEA_Disturbance2024Saulteau: searching 1 folders and 4 files
FinalResultsgdb: searching 0 folders and 0 files
South Peace Strategy Planning: searching 0 folders and 9 files
Strategic_Land_and_Resource_Plans_SLRP: searching 1 folders and 1 files
Sulphur_8Mile: searching 1 folders and 2 files
sulphur_8mile: searching 6 folders and 0 files
info: searching 0 folders and 71 files
lbnd_s8me: searching 0 folders and 18 files
lelk_s8me: searching 0 folders and 18 files
lgoat_s8me: searching 0 folders and 18 files
lmoose_s8me: searching 0 folders and 18 files
lsheep_s8me: searching 0 folders and 18 files
TreatiesPre1975: searching 2 folders and 7 files
	Searching 1 feature classes in Traite_Post_1975_Treaty_FGDB.gdb
	Searching 1 feature classes in Traite_Pre_1975_Treaty_FGDB.gdb
Traite_Post_1975_Treaty_FGDB.gdb: searching 0 folders and 47 files
Traite_Pre_1975_Treaty_FGDB.gdb: searching 0 folders and 46 files
TSR: searching 1 folders 