## Setup

Before running this demo, make sure that:
1. Docker and docker-compose are installed
2. The HDFS environment is running: `docker-compose up -d`
3. A configuration file exists at `~/.webhdfsmagic/config.json`
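
As a rough illustration of what that configuration file might contain, here is a minimal sketch. The four keys are the ones this notebook reads in the configuration check; the values (the split of the gateway URL into `knox_url` and `webhdfs_api`, and the `demo_user` name) are assumptions for a local docker-compose setup, and your installation may need additional keys such as credentials. The helper `write_config` is hypothetical and not part of webhdfsmagic.

```python
import json
import tempfile
from pathlib import Path

# Keys mirror the ones read later in this notebook; the values are
# assumptions for a local docker-compose environment.
example_config = {
    "knox_url": "http://localhost:8080/gateway/default",
    "webhdfs_api": "/webhdfs/v1",
    "username": "demo_user",  # hypothetical user name
    "verify_ssl": False,
}


def write_config(path, config):
    """Write the configuration as JSON, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


# Written under a temporary directory so an existing
# ~/.webhdfsmagic/config.json is never overwritten by this sketch.
demo_path = write_config(
    Path(tempfile.mkdtemp()) / ".webhdfsmagic" / "config.json",
    example_config,
)
print(demo_path.read_text())
```

To use it for real, point `write_config` at `Path.home() / ".webhdfsmagic" / "config.json"` after adjusting the values to your cluster.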

In [None]:
# Load the extension
%load_ext webhdfsmagic

In [None]:
# View help and available commands
%hdfs help

In [None]:
# Check current configuration
import json
import os

config_path = os.path.expanduser('~/.webhdfsmagic/config.json')
with open(config_path) as f:
    config = json.load(f)
    
print("Current configuration:")
print(f"  URL: {config['knox_url']}{config['webhdfs_api']}")
print(f"  User: {config['username']}")
print(f"  SSL: {config['verify_ssl']}")

## 1️⃣ Directory Listing

In [None]:
# List root directory
%hdfs ls /
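
For orientation, the WebHDFS REST API exposes directory listing as a `GET` request with `op=LISTSTATUS`, and directory creation as a `PUT` request with `op=MKDIRS`. The sketch below only builds such URLs. It assumes the gateway base URL listed under "Useful URLs", and `webhdfs_url` is a hypothetical helper, not a webhdfsmagic function; whether the magic builds its requests exactly this way is an assumption.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form <base><path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return f"{base_url.rstrip('/')}{quote(hdfs_path)}?{query}"


# Assumed base URL for the local demo environment.
base = "http://localhost:8080/gateway/default/webhdfs/v1"

print(webhdfs_url(base, "/", "LISTSTATUS"))  # sent with GET
print(webhdfs_url(base, "/demo", "MKDIRS"))  # sent with PUT
```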

## 2️⃣ Creating Directories

In [None]:
# Create a test directory
%hdfs mkdir /demo

In [None]:
# Create nested directories
%hdfs mkdir /demo/data

In [None]:
# Verify directory creation
%hdfs ls /

In [None]:
# List contents of demo directory
%hdfs ls /demo

## 3️⃣ Uploading Files

In [None]:
# Create a local test file
import pandas as pd

# Create sample data
df = pd.DataFrame({
    'id': range(1, 11),
    'customer': [f'Customer{i}' for i in range(1, 11)],
    'amount': [100.5 * i for i in range(1, 11)]
})

# Save locally
df.to_csv('test_data.csv', index=False)
print("File test_data.csv created:")
print(df.head())

In [None]:
# Upload to HDFS
%hdfs put test_data.csv /demo/data/customers.csv

In [None]:
# Verify file exists
%hdfs ls /demo/data

## 4️⃣ Reading Files

In [None]:
# Read file content
%hdfs cat /demo/data/customers.csv

In [None]:
# Read only first 5 lines
%hdfs cat -n 5 /demo/data/customers.csv
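
The idea behind a line limit can be sketched locally: take the first `n` lines from a text stream and stop, so the rest is never read. This is a plain-Python illustration using an in-memory stream, not the magic's actual implementation; `head_lines` is a hypothetical helper.

```python
import io
from itertools import islice


def head_lines(stream, n):
    """Return at most the first n lines of a text stream."""
    # islice stops pulling from the stream once n lines were produced.
    return list(islice(stream, n))


sample = io.StringIO(
    "id,customer,amount\n"
    "1,Customer1,100.5\n"
    "2,Customer2,201.0\n"
)
for line in head_lines(sample, 2):
    print(line, end="")
```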

## 5️⃣ Downloading Files

In [None]:
# Download from HDFS
%hdfs get /demo/data/customers.csv ./downloaded_customers.csv

In [None]:
# Verify downloaded file
df_downloaded = pd.read_csv('downloaded_customers.csv')
print("File downloaded from HDFS:")
print(df_downloaded)
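
Printing the DataFrame confirms that the download is readable. A stricter check is a byte-for-byte comparison with the file that was uploaded. The sketch below uses the standard library's `filecmp` on stand-in files so that it runs on its own; in this notebook the two paths would be `test_data.csv` and `downloaded_customers.csv`. The helper name `same_content` is hypothetical.

```python
import filecmp
import tempfile
from pathlib import Path


def same_content(path_a, path_b):
    """Return True when both files contain identical bytes."""
    # shallow=False forces a content comparison instead of
    # trusting file size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Stand-in files so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
original = workdir / "original.csv"
copy = workdir / "copy.csv"
original.write_text("id,amount\n1,100.5\n")
copy.write_text("id,amount\n1,100.5\n")

print(same_content(original, copy))
```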

## 6️⃣ Complete Workflow Example

In [None]:
# Generate multiple sales data files
from datetime import datetime, timedelta

print("📊 Generating sales data...")

for i in range(3):
    date = datetime.now() - timedelta(days=i)
    date_str = date.strftime('%Y%m%d')
    
    # Generate data
    df_sales = pd.DataFrame({
        'date': [date.strftime('%Y-%m-%d')] * 10,
        'product_id': range(1, 11),
        'quantity': [10 + i*5 + j for j in range(10)],
        'price': [50.0 + j*10 for j in range(10)]
    })
    
    filename = f'sales_{date_str}.csv'
    df_sales.to_csv(filename, index=False)
    
    print(f"  Created: {filename} ({len(df_sales)} rows)")

print("\n✓ Data generated")

In [None]:
# Create destination directory
%hdfs mkdir /demo/sales

In [None]:
# Upload all files using wildcards
%hdfs put sales_*.csv /demo/sales/
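
Before a wildcard upload it can help to preview which local files a pattern matches. The sketch below uses Python's `glob` module on stand-in files in a temporary directory. It assumes the magic expands `sales_*.csv` with the usual shell-style rules, which is not confirmed here; `matching_files` is a hypothetical helper.

```python
import glob
import os
import tempfile


def matching_files(pattern, directory="."):
    """Return the sorted local file names matching a shell-style pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


# Stand-in files so the sketch runs without the generated sales data.
workdir = tempfile.mkdtemp()
for name in ("sales_20240101.csv", "sales_20240102.csv", "test_data.csv"):
    open(os.path.join(workdir, name), "w").close()

print(matching_files("sales_*.csv", workdir))
```

In the notebook itself, `matching_files("sales_*.csv")` would list the files created by the generation cell.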

In [None]:
# Verify uploaded files
print("📁 Files in HDFS:\n")
%hdfs ls /demo/sales

## 7️⃣ Cleanup

In [None]:
# Delete a file
%hdfs rm /demo/data/customers.csv

In [None]:
# Delete a directory recursively (be careful!)
%hdfs rm -r /demo/sales

In [None]:
# Verify deletion
%hdfs ls /demo
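
The cells above clean up HDFS only; the CSV files written to the local working directory (`test_data.csv`, `downloaded_customers.csv`, and the `sales_*.csv` files) remain. A minimal sketch for removing them follows. It runs against stand-in files in a temporary directory, and `remove_local_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def remove_local_files(directory, patterns):
    """Delete files in `directory` matching any pattern; return their names."""
    removed = []
    for pattern in patterns:
        for path in sorted(Path(directory).glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.name)
    return removed


# Stand-in files; notes.txt must survive the cleanup.
workdir = Path(tempfile.mkdtemp())
for name in ("test_data.csv", "downloaded_customers.csv",
             "sales_20240101.csv", "notes.txt"):
    (workdir / name).touch()

removed = remove_local_files(
    workdir, ["test_data.csv", "downloaded_customers.csv", "sales_*.csv"]
)
print(removed)
```

In the notebook, pass `"."` as the directory to clean the current working directory.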

## ✅ Summary

If all cells above executed successfully, webhdfsmagic is working correctly with your HDFS cluster!

### Features demonstrated:

- ✅ Configuration and connection through Knox Gateway
- ✅ Directory listing (`ls`)
- ✅ Directory creation (`mkdir`)
- ✅ File upload (`put`) with streaming support
- ✅ File reading (`cat`) with line limit option
- ✅ File download (`get`) with streaming support
- ✅ Wildcard support for batch operations
- ✅ File deletion (`rm`) with recursive option
- ✅ Complete data workflow

### Useful URLs:

- **HDFS NameNode UI**: http://localhost:9870
- **WebHDFS Gateway**: http://localhost:8080/gateway/default/webhdfs/v1/

### To stop the environment:

```bash
docker-compose down
# or to also remove data:
docker-compose down -v
```

### Advantages of webhdfsmagic:

1. **Simpler syntax**: Magic commands vs Python API calls
2. **Less boilerplate**: No client initialization code needed
3. **Better integration**: Works naturally in Jupyter notebooks
4. **Streaming support**: Efficient for large files
5. **Wildcard support**: Batch operations made easy
6. **Knox Gateway ready**: Built-in support for enterprise security