# Folder Usage Analysis

This Jupyter Notebook performs a comprehensive analysis of the folders and subfolders located in `C:\study` (for Windows) or `/mnt/c/study` (for WSL/Linux). The analysis includes:

- Ranking folders and subfolders from largest to smallest, using total size as a proxy for usage.
- Displaying details for each folder: total size, number of files, and last modified date.
- Visualizing the largest folders in a horizontal bar chart.

## Instructions for Use

1. **Run Each Cell in Order**: Execute each cell sequentially to perform the analysis.
2. **Ensure Correct Path**: Verify that the path `C:\study` (Windows) or `/mnt/c/study` (WSL/Linux) exists on your system.

## Importing Necessary Libraries

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from pathlib import Path
import sys

## Define the Path to Analyze

In [None]:
def get_study_path():
    if os.name == 'nt':  # For Windows
        return Path('C:/study')
    else:  # For Unix/Linux (e.g., WSL)
        return Path('/mnt/c/study')

study_path = get_study_path()

if not study_path.exists():
    # Raise a descriptive error; sys.exit() is meant for scripts, not notebook cells
    raise FileNotFoundError(f"The path {study_path} does not exist. Please check the path and try again.")

print(f"Analyzing folders in: {study_path}")

## Function to Get Folder Statistics

In [None]:
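def get_folder_stats(path):
    """Return total size, file count, and latest modification time for all files under `path` (recursive)."""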
    total_size = 0
    total_files = 0
    last_modified = None
    try:
        for root, dirs, files in os.walk(path):
            for f in files:
                fp = os.path.join(root, f)
                try:
                    total_size += os.path.getsize(fp)
                    total_files += 1
                    mtime = os.path.getmtime(fp)
                    if last_modified is None or mtime > last_modified:
                        last_modified = mtime
                except Exception as e:
                    print(f"Error accessing file {fp}: {e}")
    except Exception as e:
        print(f"Error walking through {path}: {e}")

    return {
        'path': str(path),
        'size_bytes': total_size,
        'num_files': total_files,
        'last_modified': datetime.fromtimestamp(last_modified) if last_modified else None
    }

## Traverse Directories and Collect Data

In [None]:
# List to store folder statistics
folder_stats = []

# Iterate through all folders and subfolders
for root, dirs, files in os.walk(study_path):
    for d in dirs:
        folder_path = Path(root) / d
        stats = get_folder_stats(folder_path)
        folder_stats.append(stats)

# Also include the root study_path itself
root_stats = get_folder_stats(study_path)
folder_stats.append(root_stats)
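
The loop above calls `get_folder_stats` once per folder, and each call walks that folder's whole subtree again, so deeply nested files are read many times. On large trees, a single bottom-up pass may be faster. The following is a minimal sketch of that alternative, not part of the original notebook; the function name `collect_folder_totals` is hypothetical, and it tracks only size and file count (not the last modified date).

```python
import os


def collect_folder_totals(top):
    """Map each folder under `top` (including `top`) to (size_bytes, num_files).

    Hypothetical helper: one pass over the tree instead of one walk per folder.
    """
    totals = {}
    # topdown=False yields subfolders before their parents, so every child's
    # totals are already known when the parent folder is processed.
    for root, dirs, files in os.walk(top, topdown=False):
        size = 0
        count = 0
        for name in files:
            try:
                size += os.path.getsize(os.path.join(root, name))
                count += 1
            except OSError:
                continue  # skip files that cannot be read
        for d in dirs:
            # Folders that were not walked (e.g. symlinks, permission errors)
            # have no entry and contribute nothing.
            child_size, child_count = totals.get(os.path.join(root, d), (0, 0))
            size += child_size
            count += child_count
        totals[root] = (size, count)
    return totals
```

Calling `collect_folder_totals(study_path)` returns a dictionary keyed by folder path, which can be turned into a DataFrame in the same way as `folder_stats`.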

## Create a Pandas DataFrame from the Collected Data

In [None]:
df = pd.DataFrame(folder_stats)

# Convert size from bytes to a more readable format (e.g., MB)
df['size_MB'] = df['size_bytes'] / (1024 * 1024)

# Sort the DataFrame by size in descending order
df_sorted = df.sort_values(by='size_MB', ascending=False).reset_index(drop=True)

df_sorted
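
A fixed MB column is hard to read when folder sizes range from a few bytes to many gigabytes. As an optional extra, a small formatter can pick the unit per value. This is a sketch only; `format_size` is a hypothetical helper and uses binary steps of 1024, matching the conversion above.

```python
def format_size(num_bytes):
    """Format a byte count with the largest unit that keeps the value below 1024."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
```

It could be applied with `df_sorted['size_bytes'].map(format_size)` to build a display-only column, while sorting still uses the numeric `size_MB` values.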

## Display Top 20 Largest Folders

In [None]:
top_n = 20
df_top = df_sorted.head(top_n)

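# Select and rename in one step so the result is a new DataFrame, not a modified slice
df_top_display = df_top[['path', 'size_MB', 'num_files', 'last_modified']].rename(columns={
    'path': 'Folder Path',
    'size_MB': 'Size (MB)',
    'num_files': 'Number of Files',
    'last_modified': 'Last Modified'
})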

df_top_display

## Visualize the Top 20 Largest Folders

In [None]:
plt.figure(figsize=(12, 8))
plt.barh(df_top_display['Folder Path'][::-1], df_top_display['Size (MB)'][::-1], color='skyblue')
plt.xlabel('Size (MB)')
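# Use the actual row count so the title stays correct if top_n changes or fewer folders exist
plt.title(f'Top {len(df_top_display)} Largest Folders in Study Directory')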
plt.tight_layout()
plt.show()

## Additional Analysis (Optional)

You can extend this notebook to include additional analyses such as:

- **File Type Distribution**: Analyze the types of files present in each folder.
- **Access Patterns**: Estimate which folders are in active use from their `last_modified` dates. Note that a modification time shows only the most recent write, not how often a folder is accessed.
- **Growth Over Time**: Track how the size of each folder has changed over time (requires historical data).
- **Duplicate Files**: Identify and handle duplicate files within folders.
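
Two of the extensions listed above can be sketched with the standard library alone. The code below is an illustrative starting point, not part of the original notebook: `count_file_types` and `find_duplicate_files` are hypothetical helper names, file types are inferred from the extension only, and duplicates are detected by comparing SHA-256 hashes of file contents.

```python
import hashlib
import os
from collections import Counter, defaultdict
from pathlib import Path


def count_file_types(folder):
    """Count files under `folder` (recursive) by lowercased extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            # Path.suffix returns only the last extension, e.g. ".gz" for "a.tar.gz"
            ext = Path(name).suffix.lower() or "(no extension)"
            counts[ext] += 1
    return counts


def find_duplicate_files(folder, chunk_size=1 << 20):
    """Return lists of paths under `folder` whose contents hash identically."""
    by_hash = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as fh:
                    # Read in chunks so large files are not loaded into memory at once
                    for chunk in iter(lambda: fh.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError:
                continue  # skip files that cannot be read
            by_hash[digest.hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

For example, `count_file_types(study_path).most_common(10)` lists the ten most frequent extensions. Hashing every file can be slow on large directories; grouping files by size first and hashing only the groups with matching sizes is a common refinement.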


## Conclusion

This notebook provides a starting point for analyzing folder usage within a specified directory. By customizing and extending the code, you can learn more about how your storage is used and reorganize your files accordingly.