# 1. Gathering information about the files in the source folder:

The following information is displayed after scanning `source_dir`:
* Total number of files
* List of all file formats (extensions)
* Whether each file name matches one of the `MASTER_REGEX_*` patterns

The same information is also logged to a CSV file named `info-YYYYMMDD-HHMMSS.csv` inside `info_dir`.

## [1.1] Get `configs.ini` : 

In [1]:
# https://stackoverflow.com/questions/8884188/how-to-read-and-write-ini-file-with-python3
import configparser
config = configparser.ConfigParser()

In [2]:
config.read('configs.ini')
source_dir      = config['INFO']['source_dir']
destination_dir = config['INFO']['destination_dir'] 
info_dir        = config['INFO']['info_dir'] 

MASTER_REGEX_PHOTOS_1 = config['NOEDIT']['MASTER_REGEX_PHOTOS_1'] 
MASTER_REGEX_PHOTOS_2 = config['NOEDIT']['MASTER_REGEX_PHOTOS_2'] 
MASTER_REGEX_VIDEOS_1 = config['NOEDIT']['MASTER_REGEX_VIDEOS_1'] 
MASTER_REGEX_VIDEOS_2 = config['NOEDIT']['MASTER_REGEX_VIDEOS_2'] 


In [3]:
print("Source Directory: ",source_dir)
print("Destination Directory: ",destination_dir)
print("Log Info Directory: ",info_dir)

Source Directory:  E:/R_PhotosVideos/R_Photos_Unsorted/Plain-11
Destination Directory:  E:/R_PhotosVideos/R_Photos_Sorted
Log Info Directory:  E:/R_PhotosVideos/PythonPhotoSort/logs
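For orientation, the sketch below shows one way a `configs.ini` with the sections and keys read above could be laid out and sanity-checked. The section and key names come from the cells above and the regex value from the printed configuration; the paths, `SAMPLE_INI`, `required` and `missing` are placeholders introduced only for this illustration. Passing `interpolation=None` is an assumption of the sketch, not something the notebook does: it keeps `%` literal, which can be safer for regex values.

```python
import configparser

# Hypothetical file contents: key names mirror the notebook, paths are placeholders.
SAMPLE_INI = r"""
[INFO]
source_dir      = /path/to/unsorted
destination_dir = /path/to/sorted
info_dir        = /path/to/logs

[NOEDIT]
MASTER_REGEX_PHOTOS_1 = ^[iImMgG]{3}[-_]([0-9]{8})[-_].*\.(?:jpg|jpeg)$
"""

# interpolation=None keeps '%' literal (an assumption of this sketch).
sample = configparser.ConfigParser(interpolation=None)
sample.read_string(SAMPLE_INI)

# Check that every expected key is present before using it.
required = {
    "INFO": ["source_dir", "destination_dir", "info_dir"],
    "NOEDIT": ["MASTER_REGEX_PHOTOS_1"],
}
missing = [(section, key)
           for section, keys in required.items()
           for key in keys
           if not sample.has_option(section, key)]

print("Missing keys:", missing)
print("Photo regex :", sample["NOEDIT"]["MASTER_REGEX_PHOTOS_1"])
```

Option names are case-insensitive by default in `configparser`, which is why the upper-case `MASTER_REGEX_*` keys can be looked up as written.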


In [4]:
print("MASTER_REGEX_PHOTOS_1 : ",MASTER_REGEX_PHOTOS_1)
print("MASTER_REGEX_PHOTOS_2 : ",MASTER_REGEX_PHOTOS_2)
print("MASTER_REGEX_VIDEOS_1 : ",MASTER_REGEX_VIDEOS_1)
print("MASTER_REGEX_VIDEOS_2 : ",MASTER_REGEX_VIDEOS_2)

MASTER_REGEX_PHOTOS_1 :  ^[iImMgG]{3}[-_]([0-9]{8})[-_].*\.(?:jpg|jpeg)$
MASTER_REGEX_PHOTOS_2 :  ^[iImMgG]{3}([0-9]{8}).*\.(?:jpg|jpeg)$
MASTER_REGEX_VIDEOS_1 :  ^[VvIiDdvideo]{3,}[-_]([0-9]{8})[-_].*\.(?:mp4)$
MASTER_REGEX_VIDEOS_2 :  ^[VvIiDd]{3}([0-9]{8}).*\.(?:mp4)$
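To see what these four patterns accept, the sketch below runs them against a few file names. The patterns are copied from the output above; the helper names `match_names` and `extract_date` and most of the sample file names are made up for illustration. Reading the captured eight digits as `YYYYMMDD` is an assumption based on names such as `IMG_20150829_141244.jpg`.

```python
import re
from datetime import datetime

# Patterns copied from the printed configuration above.
PATTERNS = {
    "MASTER_REGEX_PHOTOS_1": r"^[iImMgG]{3}[-_]([0-9]{8})[-_].*\.(?:jpg|jpeg)$",
    "MASTER_REGEX_PHOTOS_2": r"^[iImMgG]{3}([0-9]{8}).*\.(?:jpg|jpeg)$",
    "MASTER_REGEX_VIDEOS_1": r"^[VvIiDdvideo]{3,}[-_]([0-9]{8})[-_].*\.(?:mp4)$",
    "MASTER_REGEX_VIDEOS_2": r"^[VvIiDd]{3}([0-9]{8}).*\.(?:mp4)$",
}

def match_names(filename):
    """Return the names of every pattern that matches filename."""
    return [name for name, rx in PATTERNS.items() if re.search(rx, filename)]

def extract_date(filename):
    """Return the captured digits as a date, or None if no pattern matches
    or the digits are not a valid YYYYMMDD date (assumed format)."""
    for rx in PATTERNS.values():
        m = re.search(rx, filename)
        if m:
            try:
                return datetime.strptime(m.group(1), "%Y%m%d").date()
            except ValueError:
                return None
    return None

for name in ["IMG_20150829_141244.jpg", "IMG20150829141244.jpg",
             "VID_20150829_141244.mp4", "IMG_20150829_141244.JPG"]:
    print(name, "->", match_names(name), extract_date(name))
```

Note that the patterns are case-sensitive on the extension, so a name ending in `.JPG` matches none of them.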


## [1.2] Loop through folder and fetch info: 


In [5]:
import re
import os
from datetime import datetime
import shutil

In [6]:
input_folder_path  = source_dir
output_folder_path = destination_dir


In [7]:
# Log file name 
temp      = "info-" + datetime.now().strftime("%Y%m%d-%H%M%S") + ".csv"
csv_file  = os.path.join(info_dir, temp)
print("Info CSV File: ", csv_file) 

Info CSV File:  E:/R_PhotosVideos/PythonPhotoSort/logs\info-20240519-154135.csv
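The mixed `/` and `\` in the path above appear because `os.path.join` inserts the platform separator between a forward-slash directory and the file name; Windows accepts both. A minimal sketch of the naming step with a fixed timestamp is shown below. The helper `info_csv_name` and the `logs` directory are hypothetical, and `os.path.normpath` is an optional extra that makes the separators uniform.

```python
import os
from datetime import datetime

def info_csv_name(now):
    """Build the log file name with the same strftime pattern as the cell above."""
    return "info-" + now.strftime("%Y%m%d-%H%M%S") + ".csv"

# A fixed timestamp makes the result reproducible.
name = info_csv_name(datetime(2024, 5, 19, 15, 41, 35))
print(name)  # info-20240519-154135.csv

# "logs" is a placeholder directory; normpath unifies the separators
# for whichever platform the code runs on.
print(os.path.normpath(os.path.join("logs", name)))
```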


In [8]:
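FILE_COUNT = 0          # total number of files seen
FILE_EXT_LIST = set()   # distinct file extensions
FILE_EXT_COUNTER = []   # one entry per file, counted per extension later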

In [9]:
csvfile_handle = open(csv_file, "w", encoding="utf-8")
csvfile_handle.write("Filename;Extension;Matched_Regex_Name;Extracted_Date;Path;Null\n")
print()




In [10]:
src = input_folder_path
dst = output_folder_path

# os.walk() generates the file names in a directory tree (including nested subfolders), walking it top-down by default.
for root, subdirs, files in os.walk(src):
    for file in files:
        path = os.path.join(root, file)
        
        _filenameonly = file   # e.g. IMG_20150829_141244.jpg
        _extension = os.path.splitext(file)[1] # e.g. .jpg (includes the leading dot)

        matched_regex_name = "None"
        _extracted_ts      = "Null" # extracted timestamp 

        ## Try matching against each regex listed in configs.ini

        # If several patterns match, the last matching one wins.
        m1 = re.search(MASTER_REGEX_PHOTOS_1, file)
        if m1:
            _extracted_ts = m1.group(1)
            matched_regex_name = "MASTER_REGEX_PHOTOS_1"

        m2 = re.search(MASTER_REGEX_PHOTOS_2, file)
        if m2:
            _extracted_ts = m2.group(1)
            matched_regex_name = "MASTER_REGEX_PHOTOS_2"

        m3 = re.search(MASTER_REGEX_VIDEOS_1, file)
        if m3:
            _extracted_ts = m3.group(1)
            matched_regex_name = "MASTER_REGEX_VIDEOS_1"

        m4 = re.search(MASTER_REGEX_VIDEOS_2, file)
        if m4:
            _extracted_ts = m4.group(1)
            matched_regex_name = "MASTER_REGEX_VIDEOS_2"

        
            
        _fullfilepath = path   # e.g. H:/myfolder/IMG_20150829_141244.jpg
 
        # Columns: Filename;Extension;Matched_Regex_Name;Extracted_Date;Path;Null
        # (no trailing ';', so each row has the same six fields as the header)
        csv_line = _filenameonly + ";" + _extension + ";" + matched_regex_name + ";" + \
                  _extracted_ts + ";" + _fullfilepath + ";NOTHING\n"
        
        csvfile_handle.write(csv_line)
        
        # Increment File Count
        FILE_COUNT += 1
        # Add file extension to set
        FILE_EXT_LIST.add(_extension)
        # Group counter
        FILE_EXT_COUNTER.append(_extension)
 

csvfile_handle.close()
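The loop above can also be written as a single function. The version below is only a sketch under stated assumptions, not the notebook's method: `gather_info` and its parameters are hypothetical names, the patterns are passed in as a dict, the first matching pattern wins (the cell above lets the last match win), and the `csv` module with a `with` block replaces manual string concatenation and the explicit `close()`.

```python
import csv
import os
import re
from collections import Counter

def gather_info(src, csv_path, patterns):
    """Walk src, write one ';'-separated row per file to csv_path and
    return (file_count, Counter of extensions).

    patterns maps a label to a regex with one capture group.
    Assumption of this sketch: the first matching pattern wins.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    ext_counter = Counter()
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=";")
        writer.writerow(["Filename", "Extension", "Matched_Regex_Name",
                         "Extracted_Date", "Path", "Null"])
        for root, _subdirs, files in os.walk(src):
            for file in files:
                matched, extracted = "None", "Null"
                for name, rx in compiled.items():
                    m = rx.search(file)
                    if m:
                        matched, extracted = name, m.group(1)
                        break
                ext = os.path.splitext(file)[1]  # includes the leading dot
                writer.writerow([file, ext, matched, extracted,
                                 os.path.join(root, file), "NOTHING"])
                ext_counter[ext] += 1
    return sum(ext_counter.values()), ext_counter
```

With the notebook's variables, a call could look like `gather_info(source_dir, csv_file, {"MASTER_REGEX_PHOTOS_1": MASTER_REGEX_PHOTOS_1})`. The log file should live outside `src`, otherwise the walk would pick it up as well.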

## [1.3] Print Info: 

In [11]:
print("Number of files: ",FILE_COUNT)
print("List of file extensions: ",FILE_EXT_LIST)
print("Info written to CSV File : ", csv_file) 

Number of files:  189
List of file extensions:  {'.jpg', '.mp4'}
Info written to CSV File :  E:/R_PhotosVideos/PythonPhotoSort/logs\info-20240519-154135.csv


In [12]:
from collections import Counter
import pandas as pd
# Count files per extension; most_common() returns (extension, count) pairs sorted by count, descending
filecount_by_ext = Counter(FILE_EXT_COUNTER)
df = pd.DataFrame(filecount_by_ext.most_common(), columns=['extensions','count'])
df.head()


|   | extensions | count |
|---|------------|-------|
| 0 | .jpg       | 184   |
| 1 | .mp4       | 5     |
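Beyond the per-extension counts, the log can be read back to see how many files each pattern matched and which files matched none. The sketch below uses only the standard library; `summarize_log` is a hypothetical helper and the sample rows are made up, following the header written earlier. Extra trailing fields, if any, are collected by `csv.DictReader` under the key `None` and ignored here.

```python
import csv
import io
from collections import Counter

def summarize_log(handle):
    """Count rows per Matched_Regex_Name and collect unmatched file names."""
    reader = csv.DictReader(handle, delimiter=";")
    per_regex = Counter()
    unmatched = []
    for row in reader:
        per_regex[row["Matched_Regex_Name"]] += 1
        if row["Matched_Regex_Name"] == "None":
            unmatched.append(row["Filename"])
    return per_regex, unmatched

# Made-up sample rows in the same layout as the header written earlier.
sample_log = io.StringIO(
    "Filename;Extension;Matched_Regex_Name;Extracted_Date;Path;Null\n"
    "IMG_20150829_141244.jpg;.jpg;MASTER_REGEX_PHOTOS_1;20150829;/tmp/a.jpg;NOTHING\n"
    "holiday.jpg;.jpg;None;Null;/tmp/holiday.jpg;NOTHING\n"
)
per_regex, unmatched = summarize_log(sample_log)
print(per_regex.most_common())
print(unmatched)
```

For the real log, an open file handle can be passed instead of the `StringIO` object, for example `open(csv_file, encoding="utf-8", newline="")`.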
