# File Paths and the File System: Finding Your Data

Before Python can work with your lab data files, it needs to know **where to find them**. Just like you need to know which folder contains your experimental data, Python needs to understand file paths and the file system.

This is essential for working with:
- CSV files from lab instruments
- Image files from microscopes  
- Data exports from analysis software
- Results files you want to save

## What is a File Path?

A **file path** is like a postal address for your files. It tells the computer exactly where to find a file on your hard drive.

### Examples:
- **Windows**: `C:\Users\YourName\Documents\Lab_Data\experiment1.csv`
- **Mac/Linux**: `/Users/YourName/Documents/Lab_Data/experiment1.csv`
- **Google Colab**: `/content/experiment1.csv`
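
To see how an address like these breaks down into a folder part and a filename, here is a small sketch that splits the Windows and Colab examples above. It assumes nothing about your own computer: it uses the standard library's `ntpath` and `posixpath` modules (the Windows and Mac/Linux versions of `os.path`), so it runs the same way on any system. The variable names are only for illustration.

```python
import ntpath      # Windows-style path rules, importable on any OS
import posixpath   # Mac/Linux-style path rules, importable on any OS

# The r"..." prefix makes a raw string, so backslashes are kept literally
windows_path = r"C:\Users\YourName\Documents\Lab_Data\experiment1.csv"
colab_path = "/content/experiment1.csv"

# split() separates the folder part from the final filename
windows_folder, windows_file = ntpath.split(windows_path)
colab_folder, colab_file = posixpath.split(colab_path)

print(f"Windows folder: {windows_folder}")
print(f"Windows file:   {windows_file}")
print(f"Colab folder:   {colab_folder}")
print(f"Colab file:     {colab_file}")
```

In everyday code you would normally just use `os.path`, which picks the right set of rules for the computer you are running on.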

## Understanding Your Current Location

Python always has a "current working directory" - like the folder you're currently "standing in".

In [1]:
import os

# Where am I right now?
current_location = os.getcwd()
print(f"Current working directory: {current_location}")

# What files are here?
files_here = os.listdir('.')
print(f"\nFiles in current directory: {files_here}")

Current working directory: /Users/hh65/code/y3-bio-python/notebooks/lecture_1

Files in current directory: ['datatypes.ipynb', 'for_loops.ipynb', 'looping_and_appending.ipynb', 'lists.ipynb', 'reading_and_writing_files.ipynb', 'functions.ipynb', 'collab_notebooks.ipynb', 'paths_and_filesystem.ipynb', 'calculate_volume.ipynb', 'lab_calculator_toolkit.ipynb', 'variables_and_comments.ipynb', 'f_strings.ipynb']


## Absolute vs Relative Paths

### Absolute Paths
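- Start from the root of your computer (`/` on Mac/Linux, a drive such as `C:\` on Windows)
- Point to the same place no matter what your current working directory is
- Long but unambiguous, and they usually stop working when the files move to another computer

### Relative Paths
- Start from your current working directory
- Shorter and more portable: the same path works on any computer that keeps the same folder layout
- Point somewhere different if your current working directory changes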

In [None]:
# Examples of different path types

# Relative paths (from current directory)
relative_examples = [
    "data.csv",                    # File in current directory
    "data/experiment1.csv",        # File in 'data' subdirectory
    "../results/analysis.csv",     # File in parent directory's 'results' folder
    "../../backup/old_data.csv"    # Two levels up, then into 'backup'
]

print("Relative path examples:")
for path in relative_examples:
    print(f"  {path}")

# Absolute path (this will be different on your computer!)
print("\nAbsolute path example:")
print(f"  {os.path.abspath('data.csv')}")

Relative path examples:
  data.csv
  data/experiment1.csv
  ../results/analysis.csv
  ../../backup/old_data.csv

Absolute path example:
  /Users/hh65/code/y3-bio-python/notebooks/lecture_1/data.csv
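
If you are ever unsure which kind of path you are holding, Python can tell you. The sketch below is a minimal illustration (the `data/experiment1.csv` path is hypothetical and does not need to exist): it checks a path with `os.path.isabs()`, converts it to an absolute path, and converts it back with `os.path.relpath()`.

```python
import os

# A hypothetical relative path; the file does not need to exist for this demo
relative_path = os.path.join("data", "experiment1.csv")
absolute_path = os.path.abspath(relative_path)

print(f"Relative: {relative_path} -> is absolute? {os.path.isabs(relative_path)}")
print(f"Absolute: {absolute_path} -> is absolute? {os.path.isabs(absolute_path)}")

# Convert back: express the absolute path relative to the current directory
round_trip = os.path.relpath(absolute_path, start=os.getcwd())
print(f"Back to relative: {round_trip}")
```

The absolute version will look different on every computer, while the relative version stays the same. That is why relative paths are usually the better choice for notebooks you share.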


## Special Path Symbols

- `.` = Current directory
- `..` = Parent directory (one level up)
- `/` = Directory separator on Mac/Linux
- `\` = Directory separator on Windows (Python also accepts `/` on Windows, and `os.path.join()` picks the right separator for you)

In [None]:
# Demonstrating special symbols
print("Current directory (.):", os.listdir('.'))

# Try to list parent directory (might not work in all environments)
try:
    print("\nParent directory (..):", os.listdir('..')[:5])  # Show first 5 items
except OSError:  # Raised if the directory is missing or access is denied
    print("\nCannot access parent directory in this environment")
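
Paths that mix `.` and `..` can get hard to read. As a rough sketch, `os.path.normpath()` tidies them up by working out what each symbol means. The example paths here are made up for illustration, and nothing is read from disk: `normpath()` only rewrites the text of the path.

```python
import os

# Hypothetical paths that use the special symbols
messy_paths = [
    "./data/experiment1.csv",           # '.' means "stay here"
    "data/../results/analysis.csv",     # into 'data', then back up one level
    "data/./raw/../experiment1.csv",    # a mix of both symbols
]

# normpath() removes '.' and resolves '..' purely as text
tidy_paths = [os.path.normpath(path) for path in messy_paths]

for messy, tidy in zip(messy_paths, tidy_paths):
    print(f"{messy:32} -> {tidy}")
```

On Windows the tidy paths are printed with `\` separators, because `normpath()` also converts the separators to the local style.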

## Working with Paths in Python

Python's `os.path` module helps you work with file paths safely:

In [None]:
# Building paths safely (works on Windows, Mac, and Linux)
lab_folder = "Lab_Data"
experiment_folder = "Experiment_1"
filename = "results.csv"

# Join path components
full_path = os.path.join(lab_folder, experiment_folder, filename)
print(f"Constructed path: {full_path}")

# Check if a file exists
print(f"\nDoes this file exist? {os.path.exists(full_path)}")

# Get information about a path
sample_path = "Lab_Data/Experiment_1/results.csv"
print(f"\nPath analysis for: {sample_path}")
print(f"Directory: {os.path.dirname(sample_path)}")
print(f"Filename: {os.path.basename(sample_path)}")
print(f"File extension: {os.path.splitext(sample_path)[1]}")
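
These pieces combine nicely. A common lab task is saving results next to the input file under a related name. The sketch below shows one possible way to do that; `results_path_for` and the `_analysed` suffix are hypothetical names made up for this illustration, not part of `os.path`.

```python
import os

def results_path_for(input_path, suffix="_analysed"):
    """Build an output path in the same folder, with a suffix added to the name."""
    folder, filename = os.path.split(input_path)      # folder part and file part
    stem, extension = os.path.splitext(filename)      # e.g. 'results' and '.csv'
    return os.path.join(folder, f"{stem}{suffix}{extension}")

input_path = os.path.join("Lab_Data", "Experiment_1", "results.csv")
output_path = results_path_for(input_path)

print(f"Input:  {input_path}")
print(f"Output: {output_path}")
```

Because the function only manipulates the path text, it works even when the input file does not exist yet.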

## Exercise 1: Exploring Your File System

Let's practice navigating and understanding file paths:

In [None]:
# YOUR TASK: Complete these exercises

# 1. Print your current working directory
# YOUR CODE HERE

# 2. List all files in your current directory
# YOUR CODE HERE

# 3. Create a path to a hypothetical file called "protein_data.csv" 
#    in a folder called "experiments"
# YOUR CODE HERE

# 4. Check if the file "sample_data.csv" exists in the current directory
# YOUR CODE HERE

## Creating Sample Files for Practice

Let's create some sample files to work with:

In [None]:
# Create a simple CSV file for practice
sample_data = """Sample_ID,Concentration,Activity
Control,0,100
Drug_A_1uM,1,85
Drug_A_5uM,5,62
Drug_A_10uM,10,43
Drug_B_1uM,1,92
Drug_B_5uM,5,78
Drug_B_10uM,10,56"""

# Write to a file
filename = "sample_experiment.csv"
with open(filename, 'w') as file:
    file.write(sample_data)

print(f"Created file: {filename}")
print(f"File exists: {os.path.exists(filename)}")
print(f"File size: {os.path.getsize(filename)} bytes")

## Working with Directories

Sometimes you need to create folders or organize your files:

In [None]:
# Create a directory structure
data_dir = "Lab_Results"
experiment_dir = os.path.join(data_dir, "Week_1")

# Create directories if they don't exist
if not os.path.exists(data_dir):
    os.mkdir(data_dir)
    print(f"Created directory: {data_dir}")

if not os.path.exists(experiment_dir):
    os.makedirs(experiment_dir)  # Creates parent directories too
    print(f"Created directory: {experiment_dir}")

# List what we've created
print(f"\nContents of {data_dir}:")
if os.path.exists(data_dir):
    print(os.listdir(data_dir))
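
The "check, then create" pattern above can also be written in one step. As a minimal sketch, `os.makedirs()` accepts `exist_ok=True`, which creates any missing parent folders and does nothing if the folder is already there. To avoid touching your real files, this example works inside a temporary scratch folder (an assumption made just for the demo) that is deleted automatically at the end.

```python
import os
import tempfile

# Work inside a throwaway folder so nothing is left behind
with tempfile.TemporaryDirectory() as scratch_dir:
    week_dir = os.path.join(scratch_dir, "Lab_Results", "Week_1")

    # Creates 'Lab_Results' and 'Week_1' in one call
    os.makedirs(week_dir, exist_ok=True)
    # Calling it again is harmless because of exist_ok=True
    os.makedirs(week_dir, exist_ok=True)

    created = os.path.isdir(week_dir)
    contents = os.listdir(os.path.join(scratch_dir, "Lab_Results"))
    print(f"Week_1 folder created: {created}")
    print(f"Contents of Lab_Results: {contents}")
```

Without `exist_ok=True`, the second call would raise a `FileExistsError`.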

## Exercise 2: File Organization

Practice organizing files like you would in a real lab:

In [None]:
# YOUR TASK: Create a lab file organization system

# 1. Create a main directory called "My_Lab_Data"
# YOUR CODE HERE

# 2. Inside that, create subdirectories for different experiments:
#    - "Cell_Culture"
#    - "PCR_Results"
#    - "Microscopy"
# YOUR CODE HERE

# 3. Create a file path for a hypothetical file called "pcr_gel_1.jpg" 
#    that would go in the PCR_Results folder
# YOUR CODE HERE

# 4. Check if your directory structure was created successfully
# YOUR CODE HERE

## Real-World Example: Lab Data Organization

Here's how you might organize files from a real experiment:

In [None]:
import datetime

# Function to create organized file paths
def create_lab_file_path(experiment_type, date, filename):
    """Create an organized file path for lab data."""
    year = date.year
    month = f"{date.month:02d}_{date.strftime('%B')}"
    
    path = os.path.join(
        "Lab_Data",
        str(year),
        month,
        experiment_type,
        filename
    )
    return path

# Examples of organized file paths
today = datetime.date.today()

file_examples = [
    ("PCR", "gel_electrophoresis_1.jpg"),
    ("Cell_Culture", "growth_curves.csv"),
    ("Microscopy", "fluorescence_images.tiff"),
    ("Protein_Assay", "bradford_results.xlsx")
]

print("Organized lab file paths:")
print("=" * 50)

for exp_type, filename in file_examples:
    path = create_lab_file_path(exp_type, today, filename)
    print(f"{exp_type:15}: {path}")

## Finding Files with Patterns

Sometimes you need to find all files matching a pattern:

In [None]:
import glob

# Create some sample files
sample_files = [
    "experiment_1.csv",
    "experiment_2.csv", 
    "experiment_3.csv",
    "notes.txt",
    "image_1.jpg",
    "image_2.jpg",
    "protocol.pdf"
]

# Create the files (each with one line of placeholder text, for demonstration)
for filename in sample_files:
    with open(filename, 'w') as f:
        f.write(f"Sample content for {filename}")

print("Created sample files:")
print(sample_files)

# Find files with patterns
print("\nFinding files with patterns:")
print(f"All CSV files: {glob.glob('*.csv')}")
print(f"All image files: {glob.glob('*.jpg')}")
print(f"All experiment files: {glob.glob('experiment_*.csv')}")
print(f"All files: {glob.glob('*')}")
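
Two details are worth knowing once your data lives in subfolders. First, `glob.glob()` returns matches in no guaranteed order, so wrap it in `sorted()` when order matters. Second, a plain `*.csv` pattern only looks in one folder; the `**` pattern with `recursive=True` searches subfolders as well. The sketch below illustrates both inside a temporary scratch folder; the file and folder names are made up for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch_dir:
    # Hypothetical layout: one CSV at the top level, one inside 'Week_1'
    relative_files = [
        "experiment_1.csv",
        os.path.join("Week_1", "experiment_2.csv"),
        os.path.join("Week_1", "notes.txt"),
    ]
    for relative_file in relative_files:
        full_path = os.path.join(scratch_dir, relative_file)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as f:
            f.write("sample")

    # '*.csv' only matches files directly inside scratch_dir
    top_level = sorted(glob.glob(os.path.join(scratch_dir, "*.csv")))
    # '**' with recursive=True also searches every subfolder
    all_levels = sorted(
        glob.glob(os.path.join(scratch_dir, "**", "*.csv"), recursive=True)
    )

    top_names = [os.path.basename(path) for path in top_level]
    all_names = [os.path.relpath(path, scratch_dir) for path in all_levels]
    print(f"Top level only:    {top_names}")
    print(f"Including folders: {all_names}")
```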

## Exercise 3: File Pattern Matching

Practice finding files with specific patterns:

In [None]:
# Create more sample files for practice
practice_files = [
    "data_2024_01_15.csv",
    "data_2024_01_16.csv",
    "data_2024_02_01.csv",
    "results_final.xlsx",
    "results_draft.xlsx",
    "image_control.png",
    "image_treatment.png",
    "protocol_v1.pdf",
    "protocol_v2.pdf"
]

for filename in practice_files:
    with open(filename, 'w') as f:
        f.write("sample")

# YOUR TASKS:
# 1. Find all files that start with "data_"
# YOUR CODE HERE

# 2. Find all Excel files (.xlsx)
# YOUR CODE HERE

# 3. Find all files from January 2024 (hint: they contain "2024_01")
# YOUR CODE HERE

# 4. Find all image files (.png)
# YOUR CODE HERE

## Best Practices for Lab File Management

### Good File Organization:
1. **Use descriptive names**: `pcr_optimization_2024_01_15.csv` not `data1.csv`
2. **Include dates**: `YYYY_MM_DD` format sorts chronologically
3. **Use consistent naming**: `experiment_01.csv`, `experiment_02.csv`
4. **Organize by project/date**: `Project_A/2024/January/data.csv`
5. **Avoid spaces and special characters**: Use `_` or `-` instead
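
The naming rules above are easy to automate. Here is a minimal sketch of a helper that builds descriptive, dated filenames; `dated_filename` is a hypothetical function written for this illustration. It also shows why the `YYYY_MM_DD` format is useful: sorting the names alphabetically puts them in date order.

```python
import datetime

def dated_filename(description, date, extension):
    """Build a name like 'pcr_optimization_2024_01_15.csv'."""
    return f"{description}_{date.strftime('%Y_%m_%d')}.{extension}"

# Hypothetical run dates, deliberately out of order
run_dates = [
    datetime.date(2024, 2, 1),
    datetime.date(2024, 1, 16),
    datetime.date(2024, 1, 15),
]

filenames = [dated_filename("pcr_optimization", d, "csv") for d in run_dates]
sorted_names = sorted(filenames)

# Alphabetical order matches chronological order with this date format
for name in sorted_names:
    print(name)
```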

### Path Safety:
- Always use `os.path.join()` to build paths
- Check if files exist before trying to open them
- Use relative paths when possible for portability

In [None]:
# Example of safe file handling
def safe_file_access(filename):
    """Safely check and access a file."""
    if os.path.exists(filename):
        file_size = os.path.getsize(filename)
        print(f"✓ File '{filename}' exists ({file_size} bytes)")
        return True
    else:
        print(f"✗ File '{filename}' not found")
        return False

# Test with our sample files
test_files = ["sample_experiment.csv", "nonexistent_file.csv"]

for filename in test_files:
    safe_file_access(filename)
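
Checking first is easy to read, but a file can still disappear between the check and the moment you open it (for example on a shared lab drive). A complementary approach, sketched below, is to simply try opening the file and handle the `FileNotFoundError` if it is missing. `read_text_or_none` is a hypothetical helper written for this illustration, and the demo runs in a temporary scratch folder so it does not depend on your own files.

```python
import os
import tempfile

def read_text_or_none(filename):
    """Return the file's text, or None if the file does not exist."""
    try:
        with open(filename, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"File '{filename}' not found")
        return None

with tempfile.TemporaryDirectory() as scratch_dir:
    existing_file = os.path.join(scratch_dir, "sample_experiment.csv")
    with open(existing_file, "w") as file:
        file.write("Sample_ID,Concentration,Activity\nControl,0,100\n")

    found = read_text_or_none(existing_file)
    missing = read_text_or_none(os.path.join(scratch_dir, "nonexistent_file.csv"))

print(f"Existing file returned text: {found is not None}")
print(f"Missing file returned None:  {missing is None}")
```

Both styles are common in Python code; which one reads better depends on whether a missing file is an expected situation or a genuine error in your workflow.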

## Exercise 4: Complete File Management System

Create a complete file management system for a lab:

In [None]:
def create_lab_structure(project_name):
    """Create a complete lab directory structure."""
    # YOUR TASK: Complete this function
    # 1. Create main project directory
    # 2. Create subdirectories: Raw_Data, Analysis, Results, Protocols
    # 3. Return a dictionary with all the paths
    
    base_dir = project_name
    subdirs = ["Raw_Data", "Analysis", "Results", "Protocols"]
    
    paths = {}
    
    # YOUR CODE HERE
    
    return paths

# Test your function
project_paths = create_lab_structure("Drug_Screening_Project")
print("Created lab structure:")
for folder, path in project_paths.items():
    print(f"{folder}: {path}")

## Cleanup: Removing Practice Files

Let's clean up the files we created for practice:

In [None]:
import shutil

# List of files and directories to clean up
cleanup_items = [
    # Files
    "sample_experiment.csv",
    "experiment_1.csv", "experiment_2.csv", "experiment_3.csv",
    "notes.txt", "image_1.jpg", "image_2.jpg", "protocol.pdf"
]

# Add practice files
cleanup_items.extend(practice_files)

# Clean up files
cleaned = 0
for item in cleanup_items:
    if os.path.exists(item):
        os.remove(item)
        cleaned += 1

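# Clean up directories (be careful: rmtree deletes a folder and everything in it!)
# "Lab_Data" only appeared in example paths and was never created by this notebook,
# so it is left out to protect any real folder with that name.
cleanup_dirs = ["Lab_Results", "My_Lab_Data", "Drug_Screening_Project"]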
for directory in cleanup_dirs:
    if os.path.exists(directory):
        shutil.rmtree(directory)
        cleaned += 1

print(f"Cleaned up {cleaned} items")
print("Workspace is clean!")

## Summary: Key Concepts

### File Paths
- **Absolute paths**: Full address from root directory
- **Relative paths**: Address from current location
- **Special symbols**: `.` (current), `..` (parent), `/` (separator; `\` on Windows)

### Essential Functions
- **`os.getcwd()`**: Get current directory
- **`os.listdir(path)`**: List files in directory
- **`os.path.join()`**: Build paths safely
- **`os.path.exists()`**: Check if file/directory exists
- **`glob.glob(pattern)`**: Find files matching pattern

### Best Practices
- Use descriptive filenames with dates
- Organize files in logical directory structures
- Always check if files exist before accessing
- Use `os.path.join()` for cross-platform compatibility

## Next Up: Looping and Appending Data

Now that you understand how to find and organize files, you'll learn how to:
- Process multiple files automatically
- Build up data collections piece by piece
- Prepare data for reading and writing

Understanding file paths is the foundation - everything else builds on this!