# WebHDFS Magic - Examples

## Prerequisites

Before running this notebook, you need to:

1. **Install the package** (from the repository root):
   ```bash
   pip install -e .
   ```

2. **Configure auto-loading:**
   ```bash
   jupyter-webhdfsmagic
   ```

3. **Restart the kernel:**
   - Kernel → Restart Kernel (or press `0` twice in command mode)

After these steps, the `%hdfs` magic will be available automatically!

## Quick Test: Verify Extension is Loaded

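If auto-loading works, `%hdfs help` should display the help without any `%load_ext` statement. The cell below loads the extension explicitly anyway; when auto-loading is active, IPython simply reports that the extension is already loaded.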

```python
# Load the extension
%load_ext webhdfsmagic
```

```text
The webhdfsmagic extension is already loaded. To reload it, use:
  %reload_ext webhdfsmagic
```


---

**⚠️ If you get "Line magic function `%hdfs` not found":**

Run the cell above to manually load the extension.

```python
%hdfs help
```

| Command | Description |
|---|---|
| `%hdfs help` | Display this help |
| `%hdfs setconfig {"knox_url": "...", "webhdfs_api": "...", "username": "...", "password": "...", "verify_ssl": false}` | Set configuration and credentials directly in the notebook |
| `%hdfs ls [path]` | List files on HDFS |
| `%hdfs mkdir <path>` | Create a directory on HDFS |
| `%hdfs rm <path or pattern> [-r]` | Delete a file/directory. Supports wildcards. Example: `%hdfs rm /user/files* [-r]` |
| `%hdfs put <local_file_or_pattern> <hdfs_destination>` | Upload one or more local files (wildcards allowed) to HDFS. If the HDFS path ends with `/` or `.`, the original file name is preserved. |
| `%hdfs get <hdfs_file_or_pattern> <local_destination>` | Download one or more files from HDFS. If the local destination is a directory (or `.`/`~`), the original file name is appended. |
| `%hdfs cat <file> [-n <number_of_lines>]` | Display file content. Default is 100 lines. Use `-n -1` to display the full file. |
| `%hdfs chmod [-R] <permission> <path>` | Set permissions (SETPERMISSION). The `-R` option applies recursively. |
| `%hdfs chown [-R] <user:group> <path>` | Set owner and group (SETOWNER). The `-R` option applies recursively. |
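The `rm`, `put`, and `get` commands accept wildcards. The WebHDFS REST API has no glob operation, so a client has to list the parent directory (`LISTSTATUS`) and filter the entry names itself. The sketch below shows only that filtering step with Python's `fnmatch`; `expand_hdfs_pattern` is a hypothetical helper, and whether webhdfsmagic matches patterns exactly this way is an assumption.

```python
import fnmatch
import posixpath


def expand_hdfs_pattern(pattern: str, list_names) -> list[str]:
    """Expand a wildcard in the last component of an HDFS path (hypothetical helper).

    `list_names(directory)` must return the entry names of `directory`,
    i.e. the `pathSuffix` values of a LISTSTATUS response.
    """
    directory, name_pattern = posixpath.split(pattern)
    # Without wildcard characters, the path is used as-is.
    if not any(ch in name_pattern for ch in "*?["):
        return [pattern]
    # fnmatchcase keeps matching case-sensitive, like HDFS paths.
    return [
        posixpath.join(directory, name)
        for name in sorted(list_names(directory))
        if fnmatch.fnmatchcase(name, name_pattern)
    ]


# Usage with a stubbed directory listing instead of a real LISTSTATUS call
fake_listing = {"/user": ["files1.csv", "files2.csv", "other.txt"]}
print(expand_hdfs_pattern("/user/files*", fake_listing.get))
```

Passing the listing function as a parameter keeps the matching logic testable without any HTTP calls.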


---

## Mock Testing Setup

Let's point the extension at a fake HDFS endpoint and mock the HTTP calls, so no real cluster is needed.

```python
import json
from unittest.mock import MagicMock, patch

# Configure extension with fake credentials
%hdfs setconfig {"knox_url": "http://fake-hdfs:8443/gateway/default", "webhdfs_api": "/webhdfs/v1", "username": "testuser", "password": "testpass", "verify_ssl": false}

print("✓ Configuration set")
```

```text
✓ Configuration set
```
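In the WebHDFS REST API, every call has the form `<base>/<path>?op=<OPERATION>&...`. With the `setconfig` fields above, the base would presumably be `knox_url` followed by `webhdfs_api`. The sketch below assumes exactly that concatenation; `webhdfs_url` is a hypothetical helper, not part of webhdfsmagic, and uses only the standard library.

```python
import json
from urllib.parse import quote, urlencode

# Same JSON as passed to %hdfs setconfig above
config = json.loads(
    '{"knox_url": "http://fake-hdfs:8443/gateway/default", "webhdfs_api": "/webhdfs/v1", '
    '"username": "testuser", "password": "testpass", "verify_ssl": false}'
)


def webhdfs_url(cfg: dict, hdfs_path: str, op: str, **params) -> str:
    """Build a WebHDFS REST URL (hypothetical helper, assumed URL layout)."""
    base = cfg["knox_url"].rstrip("/") + cfg["webhdfs_api"]
    # Percent-encode the path but keep the '/' separators intact.
    path = quote("/" + hdfs_path.lstrip("/"), safe="/")
    return f"{base}{path}?{urlencode({'op': op, **params})}"


print(webhdfs_url(config, "/user/test", "LISTSTATUS"))
```

With the fake configuration this yields `http://fake-hdfs:8443/gateway/default/webhdfs/v1/user/test?op=LISTSTATUS`.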


```python
# Create mock functions for basic operations (ls, cat)
def mock_request(method, url, **kwargs):
    response = MagicMock()
    response.status_code = 200

    params = kwargs.get("params", {})
    operation = params.get("op", "")

    if operation == "LISTSTATUS":
        json_data = {
            "FileStatuses": {
                "FileStatus": [
                    {
                        "pathSuffix": "test_file.txt",
                        "type": "FILE",
                        "length": 1024,
                        "owner": "testuser",
                        "group": "hadoop",
                        "permission": "644",
                        "modificationTime": 1638360000000,
                        "blockSize": 134217728,
                        "replication": 3,
                    },
                    {
                        "pathSuffix": "test_dir",
                        "type": "DIRECTORY",
                        "length": 0,
                        "owner": "testuser",
                        "group": "hadoop",
                        "permission": "755",
                        "modificationTime": 1638360000000,
                        "blockSize": 0,
                        "replication": 0,
                    },
                ]
            }
        }
        response.json = MagicMock(return_value=json_data)
        response.content = json.dumps(json_data).encode()
    else:
        response.json = MagicMock(return_value={})
        response.content = b"{}"

    response.raise_for_status = MagicMock()
    return response


# Mock for requests.get (used by cat, get)
def mock_get(url, **kwargs):
    response = MagicMock()
    response.status_code = 200

    if "op=OPEN" in url:
        response.content = (
            b"Mock file content line 1\nMock file content line 2\nMock file content line 3"
        )
        response.text = response.content.decode()
        response.iter_content = MagicMock(return_value=[response.content])
    else:
        response.content = b""
        response.text = ""

    response.raise_for_status = MagicMock()
    return response


print("✓ Basic mocks configured")
```

```text
✓ Basic mocks configured
```


### Test 1: List Directory (ls)

```python
# Test ls with the mock
with patch("webhdfsmagic.magics.requests.request", side_effect=mock_request):
    result = %hdfs ls /user/test
    display(result)
```

| | name | type | size | owner | group | permissions | block_size | modified | replication |
|---|---|---|---|---|---|---|---|---|---|
| 0 | test_file.txt | FILE | 1024 | testuser | hadoop | rw-r--r-- | 134217728 | 2021-12-01 13:00:00 | 3 |
| 1 | test_dir | DIR | 0 | testuser | hadoop | rwxr-xr-x | 0 | 2021-12-01 13:00:00 | 0 |
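The table above is derived from the raw `LISTSTATUS` fields returned by `mock_request`: the octal `permission` string becomes a symbolic `rwx` string, `modificationTime` (milliseconds since the Unix epoch) becomes a date, and `DIRECTORY` is shortened to `DIR`. The sketch below reproduces those conversions; `octal_to_symbolic` and `to_row` are hypothetical helpers, not webhdfsmagic's implementation. It formats times in UTC, which gives `2021-12-01 12:00:00`; the `13:00:00` shown above is consistent with the notebook having run in a UTC+1 timezone.

```python
from datetime import datetime, timezone


def octal_to_symbolic(permission: str) -> str:
    """Convert an HDFS octal permission such as '644' to 'rw-r--r--'."""
    bits = int(permission, 8)
    # Walk the nine permission bits from owner-read (bit 8) down to other-execute (bit 0).
    return "".join(
        flag if bits & (1 << (8 - i)) else "-" for i, flag in enumerate("rwxrwxrwx")
    )


def to_row(status: dict) -> dict:
    """Map one LISTSTATUS FileStatus entry to a display row (hypothetical helper)."""
    modified = datetime.fromtimestamp(status["modificationTime"] / 1000, tz=timezone.utc)
    return {
        "name": status["pathSuffix"],
        "type": "DIR" if status["type"] == "DIRECTORY" else status["type"],
        "size": status["length"],
        "owner": status["owner"],
        "group": status["group"],
        "permissions": octal_to_symbolic(status["permission"]),
        "block_size": status["blockSize"],
        "modified": modified.strftime("%Y-%m-%d %H:%M:%S"),
        "replication": status["replication"],
    }


entry = {
    "pathSuffix": "test_dir", "type": "DIRECTORY", "length": 0,
    "owner": "testuser", "group": "hadoop", "permission": "755",
    "modificationTime": 1638360000000, "blockSize": 0, "replication": 0,
}
print(to_row(entry))
```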


### Test 2: Read File (cat)

```python
# Test cat with the mock
with patch("webhdfsmagic.magics.requests.get", side_effect=mock_get):
    result = %hdfs cat /user/test/file.txt -n 10
    print(result)
```

```text
Mock file content line 1
Mock file content line 2
Mock file content line 3
```
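According to the help table, `cat` shows 100 lines by default and `-n -1` shows the whole file. A minimal sketch of that line-limiting rule applied to already-downloaded text; `head_lines` is a hypothetical helper, and how webhdfsmagic actually truncates the output is not shown in this notebook.

```python
def head_lines(text: str, n: int = 100) -> str:
    """Return the first n lines of text; n == -1 returns everything."""
    if n == -1:
        return text
    if n < 0:
        raise ValueError("n must be -1 or a non-negative integer")
    return "\n".join(text.splitlines()[:n])


content = "Mock file content line 1\nMock file content line 2\nMock file content line 3"
print(head_lines(content, 2))
```

Since the mocked file has only three lines, `-n 10` in the cell above prints all of them.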


---

## Extended Tests

Now let's test the remaining commands: `mkdir`, `rm`, `chmod`, `chown`, `put`, and `get`.

```python
# Extended mock for all operations
def mock_request_extended(method, url, **kwargs):
    response = MagicMock()
    response.status_code = 200

    params = kwargs.get("params", {})
    operation = params.get("op", "")

    if operation == "LISTSTATUS":
        json_data = {
            "FileStatuses": {
                "FileStatus": [
                    {
                        "pathSuffix": "test_file.txt",
                        "type": "FILE",
                        "length": 1024,
                        "owner": "testuser",
                        "group": "hadoop",
                        "permission": "644",
                        "modificationTime": 1638360000000,
                        "blockSize": 134217728,
                        "replication": 3,
                    },
                    {
                        "pathSuffix": "test_dir",
                        "type": "DIRECTORY",
                        "length": 0,
                        "owner": "testuser",
                        "group": "hadoop",
                        "permission": "755",
                        "modificationTime": 1638360000000,
                        "blockSize": 0,
                        "replication": 0,
                    },
                ]
            }
        }
        response.json = MagicMock(return_value=json_data)
        response.content = json.dumps(json_data).encode()
    elif operation == "MKDIRS":
        json_data = {"boolean": True}
        response.json = MagicMock(return_value=json_data)
        response.content = json.dumps(json_data).encode()
    elif operation == "DELETE":
        json_data = {"boolean": True}
        response.json = MagicMock(return_value=json_data)
        response.content = json.dumps(json_data).encode()
    elif operation == "SETPERMISSION":
        response.json = MagicMock(return_value={})
        response.content = b"{}"
    elif operation == "SETOWNER":
        response.json = MagicMock(return_value={})
        response.content = b"{}"
    else:
        response.json = MagicMock(return_value={})
        response.content = b"{}"

    response.raise_for_status = MagicMock()
    return response


print("✓ Extended mock configured")
```

```text
✓ Extended mock configured
```
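The `if/elif` chain in `mock_request_extended` can also be written as a lookup table keyed by the `op` parameter, which makes adding operations a one-line change. The sketch below keeps the same assumption as the original mock (the operation arrives in the `params` keyword argument) and uses only `unittest.mock`; `OP_RESPONSES` and `make_mock_request` are illustrative names.

```python
import json
from unittest.mock import MagicMock

# JSON body returned for each WebHDFS operation; any other op gets {}.
OP_RESPONSES = {
    "MKDIRS": {"boolean": True},
    "DELETE": {"boolean": True},
    "SETPERMISSION": {},
    "SETOWNER": {},
}


def make_mock_request(responses: dict):
    """Build a stand-in for requests.request driven by the `responses` table."""

    def fake_request(method, url, **kwargs):
        operation = (kwargs.get("params") or {}).get("op", "")
        body = responses.get(operation, {})
        response = MagicMock()
        response.status_code = 200
        response.json = MagicMock(return_value=body)
        response.content = json.dumps(body).encode()
        response.raise_for_status = MagicMock()
        return response

    return fake_request


fake = make_mock_request(OP_RESPONSES)
print(fake("PUT", "http://fake-hdfs/x", params={"op": "MKDIRS"}).json())
```

To cover `ls` as well, add the `LISTSTATUS` `json_data` dictionary to the table; the resulting function is then passed to `patch(..., side_effect=...)` in the same way as `mock_request_extended`.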


### Test 3: Create Directory (mkdir)

```python
# Test mkdir with the mock
with patch("webhdfsmagic.magics.requests.request", side_effect=mock_request_extended):
    result = %hdfs mkdir /user/test/new_directory
    print(result if result else "✓ Directory created successfully")
```

```text
{'boolean': True}
```


### Test 4: Delete File/Directory (rm)

```python
# Test rm with the mock
with patch("webhdfsmagic.magics.requests.request", side_effect=mock_request_extended):
    result = %hdfs rm /user/test/file_to_delete.txt
    print(result if result else "✓ File deleted successfully")
```

```text
{'boolean': True}
```
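`MKDIRS` and `DELETE` both answer with a JSON body of the form `{"boolean": true}`, which is what the two cells above print. In the WebHDFS REST API, `DELETE` also takes a `recursive` parameter (`true`/`false`), which is presumably what `rm -r` maps to, and a `false` result means nothing was done (for example, the path did not exist). A hedged sketch of building the `DELETE` query and checking the result; both helpers are hypothetical:

```python
def delete_params(recursive: bool) -> dict:
    """Query parameters for a WebHDFS DELETE call (assumed mapping of `rm -r`)."""
    return {"op": "DELETE", "recursive": "true" if recursive else "false"}


def check_boolean(body: dict, action: str) -> None:
    """Raise if a MKDIRS/DELETE response body reports that nothing happened."""
    if not body.get("boolean", False):
        raise RuntimeError(f"{action} reported failure: {body}")


print(delete_params(recursive=True))
# Passes silently for the mocked response shown above
check_boolean({"boolean": True}, "DELETE /user/test/file_to_delete.txt")
```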


### Test 5: Change Permissions (chmod)

```python
# Test chmod with the mock
with patch("webhdfsmagic.magics.requests.request", side_effect=mock_request_extended):
    result = %hdfs chmod 755 /user/test/test_file.txt
    print(result if result else "✓ Permissions changed successfully")
```

```text
Permission 755 set for /user/test/test_file.txt
```


### Test 6: Change Owner (chown)

```python
# Test chown with the mock
with patch("webhdfsmagic.magics.requests.request", side_effect=mock_request_extended):
    result = %hdfs chown newuser:newgroup /user/test/test_file.txt
    print(result if result else "✓ Owner changed successfully")
```

```text
Owner newuser:newgroup set for /user/test/test_file.txt
```
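`chmod` and `chown` correspond to the WebHDFS `SETPERMISSION` and `SETOWNER` operations, which take a `permission` (octal string) and `owner`/`group` query parameters respectively. The sketch below shows how the command arguments could map onto those parameters; `chmod_params` and `chown_params` are hypothetical helpers. The `-R` handling is left out: these REST operations have no recursive flag, so a client would have to walk the directory tree itself.

```python
def chmod_params(permission: str) -> dict:
    """Query parameters for SETPERMISSION, e.g. for `chmod 755`."""
    int(permission, 8)  # raises ValueError if this is not a valid octal string
    return {"op": "SETPERMISSION", "permission": permission}


def chown_params(owner_group: str) -> dict:
    """Query parameters for SETOWNER from a 'user:group' or 'user' argument."""
    owner, _, group = owner_group.partition(":")
    params = {"op": "SETOWNER"}
    if owner:
        params["owner"] = owner
    if group:
        params["group"] = group
    return params


print(chmod_params("755"))
print(chown_params("newuser:newgroup"))
```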


### Test 7: Upload File (put)

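Note: WebHDFS uploads are a two-step process. The first `PUT ...?op=CREATE` request carries no file data and is answered with a `307 Temporary Redirect` whose `Location` header points to a DataNode; the file content is then sent to that location, which answers `201 Created`. The mock below imitates both steps.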

```python
# Test put with the mock
import os
import tempfile

# Create a temporary file
with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as f:
    f.write("Test content for HDFS upload\nLine 2\nLine 3")
    temp_file = f.name

try:
    # Mock for PUT (two-step upload)
    def mock_put(url, **kwargs):
        response = MagicMock()

        # Step 1: Return 307 redirect
        if not kwargs.get("allow_redirects"):
            response.status_code = 307
            response.headers = {
                "Location": "http://fake-datanode:50075/webhdfs/v1/user/test/uploaded.txt?op=CREATE"
            }
        # Step 2: Return 201 created
        else:
            response.status_code = 201

        response.raise_for_status = MagicMock()
        return response

    with patch("webhdfsmagic.magics.requests.put", side_effect=mock_put):
        result = %hdfs put {temp_file} /user/test/uploaded.txt
        print(result if result else "✓ File uploaded successfully")
finally:
    if os.path.exists(temp_file):
        os.unlink(temp_file)
```

```text
/var/folders/gj/8m77y6g96j5d2py4czvh1brr0000gn/T/tmph52n5f0z.txt uploaded successfully to /user/test/uploaded.txt
```
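The two-step upload that `mock_put` imitates can be sketched as follows. `upload_bytes` is a hypothetical helper; it takes the HTTP `put` function as a parameter so it can be exercised with a stub here, while with the real `requests` library one would pass `requests.put`. That webhdfsmagic performs exactly these calls is an assumption, although `mock_put` branching on `allow_redirects` suggests a similar flow.

```python
from types import SimpleNamespace


def upload_bytes(put, create_url: str, data: bytes, **request_kwargs) -> int:
    """Upload data with the two-step WebHDFS CREATE protocol (hypothetical helper)."""
    # Step 1: no data, do not follow the redirect; expect 307 with a Location header.
    first = put(create_url, allow_redirects=False, **request_kwargs)
    if first.status_code != 307:
        raise RuntimeError(f"expected 307 redirect, got {first.status_code}")
    # Step 2: send the content to the DataNode URL; expect 201 Created.
    second = put(first.headers["Location"], data=data, allow_redirects=True, **request_kwargs)
    if second.status_code != 201:
        raise RuntimeError(f"expected 201 Created, got {second.status_code}")
    return second.status_code


calls = []


def stub_put(url, **kwargs):
    """Stub with the same branching as mock_put, recording each call."""
    calls.append((url, kwargs.get("data")))
    if not kwargs.get("allow_redirects"):
        location = "http://fake-datanode:50075/webhdfs/v1/user/test/uploaded.txt?op=CREATE"
        return SimpleNamespace(status_code=307, headers={"Location": location})
    return SimpleNamespace(status_code=201, headers={})


status = upload_bytes(
    stub_put,
    "http://fake-hdfs:8443/gateway/default/webhdfs/v1/user/test/uploaded.txt?op=CREATE",
    b"Line 1\n",
)
print(status, [url for url, _ in calls])
```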


### Test 8: Download File (get)

```python
# Test get with the mock
import os
import tempfile

# Create a temporary directory for download
temp_dir = tempfile.mkdtemp()
download_path = os.path.join(temp_dir, "downloaded.txt")

try:
    # Mock for GET (file download)
    def mock_get_download(url, **kwargs):
        response = MagicMock()
        response.status_code = 200
        response.content = b"Downloaded content from HDFS\nLine 2\nLine 3"
        response.iter_content = MagicMock(return_value=[response.content])
        response.raise_for_status = MagicMock()
        return response

    with patch("webhdfsmagic.magics.requests.get", side_effect=mock_get_download):
        result = %hdfs get /user/test/file.txt {download_path}

        if os.path.exists(download_path):
            with open(download_path) as f:
                content = f.read()
            print("✓ File downloaded successfully")
            print(f"Content preview: {content[:50]}...")
        else:
            print(result)
finally:
    if os.path.exists(download_path):
        os.unlink(download_path)
    if os.path.exists(temp_dir):
        os.rmdir(temp_dir)
```

```text
✓ File downloaded successfully
Content preview: Downloaded content from HDFS
Line 2
Line 3...
```
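The help table states that when the local destination of `get` is a directory (or `.`/`~`), the original file name is appended. A sketch of that rule with `os.path`; `resolve_local_destination` is a hypothetical helper, not the library's own function.

```python
import os
import posixpath
import tempfile


def resolve_local_destination(hdfs_path: str, local_dest: str) -> str:
    """Return the local file path a download of hdfs_path should be written to."""
    target = os.path.expanduser(local_dest)  # turns "~" into the home directory
    if os.path.isdir(target):
        # Directory (including "." and "~"): keep the HDFS file name.
        return os.path.join(target, posixpath.basename(hdfs_path))
    # Otherwise the destination is taken as the full target file path.
    return target


with tempfile.TemporaryDirectory() as folder:
    print(resolve_local_destination("/user/test/file.txt", folder))
    print(resolve_local_destination("/user/test/file.txt", os.path.join(folder, "downloaded.txt")))
```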


---

## Summary

If all the cells above run without errors, the following features work:

✅ **Auto-loading** works correctly  
✅ **ls** - List files and directories  
✅ **cat** - Read file content  
✅ **mkdir** - Create directories  
✅ **rm** - Delete files/directories  
✅ **chmod** - Change permissions  
✅ **chown** - Change owner  
✅ **put** - Upload local files to HDFS  
✅ **get** - Download files from HDFS  

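All main webhdfsmagic commands work against mocked HTTP responses. These tests exercise the magic's request handling and output formatting, not connectivity to a real HDFS cluster.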