# How to get a Pandas dataframe with the contents of the filesystem

I recently needed to find a notebook that I had just modified but couldn't remember where I had saved it.
I'm sure there's a built-in way to do this on most operating systems, but nothing seemed to be
working for me.

But when you have a hammer ...

In [1]:
import datetime
from pathlib import Path
import collections

import pandas as pd

### Gather a list of the files we want to look through.

We'll be using pathlib for this, but it could also be done with other tools in Python (os.walk, glob, etc.).

* You'll need to update the directory and query string below

In [2]:
p = Path(r"C:\projects")  # Change this to the directory to start searching from.
files = p.glob("**/*.ipynb")  # Recursive search; change the extension / pattern as needed.
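
For comparison, here is a minimal sketch of the same recursive search done with `os.walk` and `fnmatch` instead of pathlib. The helper name `find_files` is made up for this example and is not part of the original notebook.

```python
import fnmatch
import os


def find_files(root, pattern="*.ipynb"):
    """Yield the full path of every file under root whose name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the file names that match the shell-style pattern.
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)
```

The loop in the next cell calls pathlib methods such as `.stat()`, so each string yielded here would need to be wrapped in `Path(...)` before being passed to it.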

### Now we'll loop through the files we've gathered.

In [3]:
rows = []
for n in files:
    row = collections.OrderedDict()
    row['name'] = n.name
    row['path'] = str(n)
    
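    stat = n.stat()
    row['size'] = stat.st_size
    lstat = n.lstat()  # Like stat(), but describes a symlink itself rather than its target.
    # st_ctime is the creation time on Windows but the metadata-change time on most Unix systems.
    row['date_created'] = datetime.datetime.fromtimestamp(lstat.st_ctime)
    row['date_accessed'] = datetime.datetime.fromtimestamp(lstat.st_atime)
    row['date_modified'] = datetime.datetime.fromtimestamp(lstat.st_mtime)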
    rows.append(row)
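
On a large directory tree, a file can disappear or turn out to be unreadable between the moment it is listed and the moment it is inspected. The sketch below wraps the same loop in a function that skips such files instead of stopping. The name `collect_file_info` is hypothetical, and unlike the cell above it reads every field, including the size, from a single `lstat()` call.

```python
import datetime
from pathlib import Path


def collect_file_info(paths):
    """Return one dict per readable path; paths that raise OSError are skipped."""
    rows = []
    for path in paths:
        try:
            info = path.lstat()  # describes a symlink itself, not its target
        except OSError:
            continue  # e.g. file removed mid-scan or permission denied
        rows.append({
            "name": path.name,
            "path": str(path),
            "size": info.st_size,
            # Creation time on Windows, metadata-change time on most Unix systems.
            "date_created": datetime.datetime.fromtimestamp(info.st_ctime),
            "date_accessed": datetime.datetime.fromtimestamp(info.st_atime),
            "date_modified": datetime.datetime.fromtimestamp(info.st_mtime),
        })
    return rows
```

A plain `dict` keeps insertion order in Python 3.7 and later, so the columns come out in the order written here without needing `OrderedDict`.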

### Convert this to a Pandas dataframe

In [4]:
df = pd.DataFrame(rows)
df.head(3)

,name,path,size,date_created,date_accessed,date_modified
0,BatVideoPreProcessing.ipynb,C:\projects\BatTurbines\notebooks\BatVideoPreP...,768660,2018-11-05 07:37:16.356656,2018-11-05 07:37:16.356656,2018-11-05 08:25:17.242419
1,InitialDataExploration-backup.ipynb,C:\projects\BatTurbines\notebooks\InitialDataE...,2234543,2018-09-20 08:21:11.492632,2018-09-20 08:21:11.492632,2018-09-20 10:46:07.075862
2,InitialDataExploration.ipynb,C:\projects\BatTurbines\notebooks\InitialDataE...,285760,2018-09-17 16:38:34.138953,2018-09-17 16:38:34.138953,2018-10-03 12:01:40.687842


### We can then use this dataframe for easy and powerful searching.

In [5]:
df.sort_values('date_modified', ascending=False)

,name,path,size,date_created,date_accessed,date_modified
247,QueryDirectoryContents.ipynb,C:\projects\notebooks\misc_utilities\QueryDire...,48760,2019-03-07 08:34:05.222089,2019-03-07 08:34:05.222089,2019-03-07 09:15:20.894413
170,CleanUpRGFO.ipynb,C:\projects\misc\CleanUpRGFO.ipynb,27703,2019-02-27 11:12:25.837446,2019-03-05 07:42:13.119110,2019-03-07 09:12:14.194383
213,BatchUpdateGuanoMD.ipynb,C:\projects\notebooks\bats\BatchUpdateGuanoMD....,4673,2019-03-06 08:08:29.467091,2019-03-06 08:08:29.467091,2019-03-07 08:59:25.432336
221,BatchUpdateGuanoMD-checkpoint.ipynb,C:\projects\notebooks\bats\.ipynb_checkpoints\...,4673,2019-03-06 08:08:29.481091,2019-03-06 08:08:29.467091,2019-03-07 08:59:25.432336
271,Untitled.ipynb,C:\projects\notebooks\scratch\Untitled.ipynb,1509,2019-03-06 12:53:02.255170,2019-03-06 12:53:02.255170,2019-03-06 12:57:03.493692
273,Untitled-checkpoint.ipynb,C:\projects\notebooks\scratch\.ipynb_checkpoin...,72,2019-03-06 12:53:02.276670,2019-03-06 12:53:02.255170,2019-03-06 12:53:02.259668
176,CleanUpRGFO-checkpoint.ipynb,C:\projects\misc\.ipynb_checkpoints\CleanUpRGF...,25646,2019-02-27 11:12:25.857448,2019-03-05 07:42:13.119110,2019-03-06 12:52:08.317927
248,QueryDirectoryContents-checkpoint.ipynb,C:\projects\notebooks\misc_utilities\.ipynb_ch...,31534,2019-03-07 08:45:18.790528,2019-03-07 08:34:05.222089,2019-03-06 10:15:12.574596
270,QueryDirectoryContents.ipynb,C:\projects\notebooks\scratch\QueryDirectoryCo...,31534,2018-11-15 17:35:56.803119,2018-11-15 17:35:56.803119,2019-03-06 10:15:12.574596
173,ConvertPost2SQLite.ipynb,C:\projects\misc\ConvertPost2SQLite.ipynb,7560,2019-02-27 11:12:25.837446,2019-02-27 11:12:25.837446,2019-02-28 10:53:03.051429
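
The sorted listing above includes Jupyter's autosave copies from `.ipynb_checkpoints` folders, which tend to crowd out the notebooks you actually care about. Here is a small sketch of two further filters: dropping the checkpoint copies, and keeping only files modified within a recent time window. The rows in this stand-in dataframe are invented for illustration, and the seven-day window and fixed reference time are arbitrary choices.

```python
import pandas as pd

# Tiny stand-in for the dataframe built above; these rows are made up.
df = pd.DataFrame({
    "name": ["a.ipynb", "a-checkpoint.ipynb", "b.ipynb"],
    "path": [
        r"C:\demo\a.ipynb",
        r"C:\demo\.ipynb_checkpoints\a-checkpoint.ipynb",
        r"C:\demo\b.ipynb",
    ],
    "size": [100, 90, 2000],
    "date_modified": pd.to_datetime(
        ["2019-03-07 09:00", "2019-03-06 12:00", "2019-01-15 08:00"]
    ),
})

# Drop autosave copies, matching the folder name as a literal string.
is_checkpoint = df["path"].str.contains(".ipynb_checkpoints", regex=False)
notebooks = df[~is_checkpoint]

# Keep notebooks modified in the 7 days before a reference time.
now = pd.Timestamp("2019-03-08")  # with real data, pd.Timestamp.now() would be the usual choice
recent = notebooks[notebooks["date_modified"] >= now - pd.Timedelta(days=7)]
recent = recent.sort_values("date_modified", ascending=False)
```

The same boolean-mask pattern works for any other column, for example `df[df["size"] > 1_000_000]` to list only large files.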
