# Exploring AiiDA archives

For more information see [the AiiDA documentation](https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/data.html).

To explore a custom archive, first make it available in your RenkuLab session, e.g., by downloading it with `wget` from a terminal.

Then, you can run either:

```shell
❯ verdi profile setup core.sqlite_zip --filepath <your-archive>.aiida
```

to create a read-only AiiDA profile from the archive, or:

```shell
❯ verdi presto
❯ verdi archive import <your-archive>.aiida
```

to create a new profile with default settings and import the contents of the archive into it.
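
If the setup needs to be scripted, for example from a notebook cell, the first command can be assembled as an argument list and passed to `subprocess.run`. The following is a minimal sketch: the helper names and the example file and profile names are hypothetical, and the flags are the ones used for `verdi profile setup core.sqlite_zip` in the notebook further down. Depending on the AiiDA version, non-interactive mode may require additional options such as the user's name and email.

```python
import subprocess
from pathlib import Path


def build_profile_setup_command(archive_path, profile_name):
    """Return the argument list that creates a read-only profile from an archive."""
    return [
        "verdi", "profile", "setup", "core.sqlite_zip",
        "--profile-name", profile_name,
        "--filepath", str(Path(archive_path).absolute()),
        "--non-interactive",
    ]


def create_readonly_profile(archive_path, profile_name):
    """Run the command; raises CalledProcessError if verdi reports a failure."""
    command = build_profile_setup_command(archive_path, profile_name)
    return subprocess.run(command, capture_output=True, text=True, check=True)


# Hypothetical names, for illustration only; nothing is executed here
command = build_profile_setup_command("aiida_data/example.aiida", "example-profile")
print(" ".join(command))
```

Keeping the command construction separate from its execution makes it easy to print the exact command for manual use when the automated call fails.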

# Live inspection of provenance from an AiiDA archive
## Dataset: {{ title }}
* DOI of the data: [{{ doi_url }}]({{ doi_url }})
* Materials Cloud Archive entry: `{{ mca_entry }}`
* Archive file: `{{ archive_filename }}`
* AiiDA profile name: `{{ aiida_profile }}`

## Instructions
This session is configured to work with the archive file mentioned above. To keep startup fast, the archive has not been downloaded yet.

**Follow these steps:**
1. **Run the cells below** to download the archive and set up AiiDA
2. **Start exploring** using the AiiDA commands and examples

**NOTE**: *If you were expecting a different archive or file, you probably already have an open Renku session: each Renku user can have only one session at a time. To see the new file, close the current session by clicking the trash button in the top-left corner of this browser window, then click the file in Materials Cloud Archive again to open a new session pointing to it.*

In [None]:
# Check for session warnings
import os
warning_file = '/tmp/session_warning.txt'
if os.path.exists(warning_file):
    with open(warning_file, 'r') as f:
        print(f.read())
    print("\n" + "="*80 + "\n")

## Step 1: Download the Archive
This cell will download the archive file from Materials Cloud Archive.

In [None]:

import os
import json
import subprocess
from pathlib import Path

# Load metadata from JSON file
metadata_file = '/tmp/mca_metadata.json'

if not os.path.exists(metadata_file):
    print("❌ No metadata file found. Please run the session setup first.")
else:
    with open(metadata_file, 'r') as f:
        metadata = json.load(f)
    
    # Extract information from metadata
    archive_url = os.environ.get('archive_url')  # Still get this from env as it's the trigger
    archive_filename = metadata.get('archive_filename')
    archive_title = metadata.get('title', 'Unknown Dataset')
    doi = metadata.get('doi')
    mca_entry = metadata.get('mca_entry')
    
    if not archive_url:
        print("❌ No archive URL found. This cell is only for pre-configured archives.")
    elif not archive_filename:
        print("❌ No archive filename found in metadata.")
    else:
        print(f"📦 Dataset: {archive_title}")
        print(f"📁 Archive file: {archive_filename}")
        print(f"🔗 Source URL: {archive_url}")
        if doi:
            print(f"📄 DOI: {doi}")
        if mca_entry:
            print(f"🏷️ MCA Entry: {mca_entry}")
        print("")
        
        # Create data directory
        data_dir = Path('aiida_data')
        data_dir.mkdir(exist_ok=True)
        
        archive_path = data_dir / archive_filename
        
        if archive_path.exists():
            size_mb = archive_path.stat().st_size / (1024*1024)
            print(f"✅ Archive already exists ({size_mb:.1f} MB)")
        else:
            print("⬇️ Downloading archive... (this may take a moment)")
            
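            try:
                # Output is captured so that wget's error message can be reported.
                # No progress bar is shown; -nv keeps error messages (unlike -q).
                subprocess.run(
                    ['wget', '-nv', '-O', str(archive_path), archive_url],
                    capture_output=True, text=True, check=True
                )
                
                size_mb = archive_path.stat().st_size / (1024*1024)
                print(f"\n✅ Archive downloaded successfully ({size_mb:.1f} MB)")
                
            except subprocess.CalledProcessError as e:
                # wget -O creates the output file even when the download fails;
                # remove it so a rerun does not mistake it for a finished download
                archive_path.unlink(missing_ok=True)
                print(f"❌ Download failed: {e}")
                if e.stderr:
                    print(f"Error output: {e.stderr}")
                print(f"You can try downloading manually with: wget '{archive_url}' -O {archive_path}")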
            except FileNotFoundError:
                print("❌ wget not found. Trying with Python...")
                
                # Fallback to Python download
                import urllib.request
                from urllib.error import URLError
                
                try:
                    print("⬇️ Downloading with Python... (no progress bar)")
                    urllib.request.urlretrieve(archive_url, archive_path)
                    size_mb = archive_path.stat().st_size / (1024*1024)
                    print(f"✅ Archive downloaded successfully ({size_mb:.1f} MB)")
                except URLError as e:
                    print(f"❌ Python download also failed: {e}")
        
        print(f"\n📍 Archive location: {archive_path.absolute()}")
        
        # Display additional metadata if available
        files_info = metadata.get('files', [])
        if files_info:
            print(f"\n📊 Dataset contains {len(files_info)} files:")
            for file_info in files_info:
                filename = file_info.get('filename', 'Unknown')
                file_type = file_info.get('type', 'unknown')
                size = file_info.get('size')
                if size:
                    size_str = f" ({size / (1024*1024):.1f} MB)" if size > 1024*1024 else f" ({size / 1024:.1f} KB)"
                else:
                    size_str = ""
                print(f"   • {filename} ({file_type}){size_str}")
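
A download that writes directly to its final path can leave a truncated file behind if the transfer is interrupted, and the "already exists" check above would then treat that file as complete. One possible way around this, sketched here with only the standard library, is to stream into a temporary file and rename it once the transfer has finished. The function name `download_atomic` and the `.part` suffix are illustrative choices, not part of the session setup.

```python
import shutil
import urllib.request
from pathlib import Path


def download_atomic(url, destination, chunk_size=1024 * 1024):
    """Stream `url` to `destination` through a temporary `.part` file."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    partial = destination.with_name(destination.name + ".part")
    try:
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            # Copy in chunks so large archives are never held in memory at once
            shutil.copyfileobj(response, out, chunk_size)
        # Rename only after the whole body has been written
        partial.replace(destination)
    except BaseException:
        # Remove the incomplete file so a rerun starts from scratch
        partial.unlink(missing_ok=True)
        raise
    return destination
```

With this approach the final file name only ever refers to a fully written file, so checking `archive_path.exists()` remains a reasonable test for a completed download.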

## Step 2: Set up AiiDA Profile
This cell creates a read-only AiiDA profile from the downloaded archive.

In [None]:
import subprocess
import os
from pathlib import Path

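# Get profile information
profile_name = os.environ.get('MCA_AIIDA_PROFILE', 'aiida-renku')
archive_filename = os.environ.get('MCA_ARCHIVE_FILENAME')
# Guard against an unset variable: Path / None would raise a TypeError
archive_path = Path('aiida_data') / archive_filename if archive_filename else None

if archive_path is None:
    print("❌ MCA_ARCHIVE_FILENAME is not set. This cell is only for pre-configured archives.")
elif not archive_path.exists():
    print("❌ Archive file not found. Please run the download cell above first.")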
else:
    print(f"🔧 Setting up AiiDA profile: {profile_name}")
    print(f"📁 Using archive: {archive_path}")
    print("")
    
    # Check if profile already exists ('verdi profile show' fails if it does not)
    profile_ready = False
    try:
        subprocess.run(
            ['verdi', 'profile', 'show', profile_name],
            capture_output=True, text=True, check=True
        )
        print(f"✅ Profile '{profile_name}' already exists")
        profile_ready = True
        
    except subprocess.CalledProcessError:
        # Profile doesn't exist, create it
        print("⚙️ Creating AiiDA profile... (this may take a few minutes)")
        
        try:
            subprocess.run([
                'verdi', 'profile', 'setup', 'core.sqlite_zip',
                '--profile-name', profile_name,
                '--first-name', 'AiiDA',
                '--last-name', 'User', 
                '--email', 'aiida@renku.local',
                '--institution', 'RenkuLab',
                '--set-as-default',
                '--non-interactive',
                '--no-use-rabbitmq',
                '--filepath', str(archive_path.absolute())
            ], capture_output=True, text=True, check=True)
            
            print(f"✅ Profile '{profile_name}' created successfully!")
            profile_ready = True
            
        except subprocess.CalledProcessError as e:
            print(f"❌ Failed to create profile: {e}")
            print(f"Error output: {e.stderr}")
            print("\nYou can try creating the profile manually with:")
            print(f"verdi profile setup core.sqlite_zip --filepath {archive_path}")
    
    # Only continue if the profile exists or was created successfully
    if profile_ready:
        # Set as default profile
        try:
            subprocess.run(['verdi', 'profile', 'setdefault', profile_name], 
                          capture_output=True, check=True)
            print(f"🎯 Profile '{profile_name}' set as default")
        except subprocess.CalledProcessError:
            print("⚠️ Could not set as default profile, but it should still work")
        
        print("\n🎉 AiiDA setup complete! You can now explore the archive data below.")

## Step 3: Load AiiDA and Start Exploring
Now you can start exploring the AiiDA database!

In [None]:
from aiida import orm, load_profile

# Load the AiiDA profile
profile = load_profile()
print(f"✅ Loaded AiiDA profile: {profile.name}")
print(f"📊 Profile storage backend: {profile.storage_backend}")

In [None]:
# Get basic statistics about the database
qb = orm.QueryBuilder()
qb.append(orm.Node)
total_nodes = qb.count()
print(f"📈 Total number of nodes in the database: {total_nodes:,}")

# Show different types of nodes
from collections import defaultdict

node_types = defaultdict(int)
qb = orm.QueryBuilder()
qb.append(orm.Node, project=['node_type'])

for node_type, in qb.iterall():
    # Simplify node type names for readability. Node type strings end with a
    # dot (e.g. 'data.core.dict.Dict.'), so strip it before taking the last part.
    short_type = node_type.rstrip('.').split('.')[-1]
    node_types[short_type] += 1

print("\n📋 Node types in the database:")
for node_type, count in sorted(node_types.items(), key=lambda x: x[1], reverse=True):
    print(f"   {node_type:<25} {count:>8,}")
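
The shortening step used for the table above can be tried on its own, without a loaded profile. The sketch below assumes node type strings in the dotted format that AiiDA stores, with a trailing dot; the helper name `short_node_type` and the example strings are illustrative and do not come from this archive.

```python
from collections import Counter


def short_node_type(node_type):
    """Return the class name at the end of an AiiDA node type string."""
    # Node type strings end with a dot, so strip it before splitting
    return node_type.rstrip(".").split(".")[-1]


# Illustrative node type strings, not taken from a real archive
example_types = [
    "data.core.structure.StructureData.",
    "data.core.dict.Dict.",
    "data.core.dict.Dict.",
    "process.calculation.calcjob.CalcJobNode.",
]

counts = Counter(short_node_type(t) for t in example_types)
for name, count in counts.most_common():
    print(f"   {name:<25} {count:>8,}")
```

`Counter.most_common()` returns the entries sorted by descending count, which replaces the explicit `sorted(..., reverse=True)` call.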

In [None]:
# Show available groups
qb = orm.QueryBuilder()
qb.append(orm.Group, project=['*'])  # project the Group objects themselves

groups = qb.all(flat=True)
print(f"👥 Available groups ({len(groups)} total):")
print()

for group in groups:
    print(f"📁 {group.label}")
    if group.description:
        print(f"   Description: {group.description}")
    
    # Count nodes in this group
    print(f"   Nodes: {group.count():,}")
    print()

## Advanced Exploration

You can now use all AiiDA functionality to explore the archive. Here are some useful commands to get you started:

In [None]:
# Example: Find and examine calculation nodes
qb = orm.QueryBuilder()
# process_state is stored in the node attributes, hence the 'attributes.' prefix
qb.append(orm.CalcJobNode, tag='calc', project=['id', 'ctime', 'attributes.process_state'])
qb.order_by({'calc': [{'ctime': 'desc'}]})  # most recent first
qb.limit(10)  # the limit is set on the query, not in append()

calculations = qb.all()
if calculations:
    print(f"🔬 Most recent calculations (showing {len(calculations)}):")
    print()
    for calc_id, ctime, state in calculations:
        print(f"   Calculation {calc_id}: {state} (created: {ctime.strftime('%Y-%m-%d %H:%M')})")
else:
    print("No calculation nodes found in this archive.")

In [None]:
# Example: Explore structures
qb = orm.QueryBuilder()
qb.append(orm.StructureData, project=['*'])  # project the StructureData objects themselves
qb.limit(5)  # the limit is set on the query, not in append()

structures = qb.all(flat=True)
if structures:
    print(f"🏗️ Structure data (showing the first {len(structures)}):")
    print()
    for structure in structures:
        formula = structure.get_formula()
        num_atoms = len(structure.sites)
        print(f"   Structure {structure.pk}: {formula} ({num_atoms} atoms)")
else:
    print("No structure data found in this archive.")

## Export Data for External Analysis

You can also export data to disk for analysis with other tools:

In [None]:
# Uncomment and modify these commands to export specific data:

# Export all data from the profile
# !verdi profile dump --all

# Export specific group data 
# !verdi group dump <group-name>

# Export specific nodes to a new archive
# !verdi archive create export.aiida -N <node-id>

print("💡 Uncomment the commands above to export data for external analysis")
print("📚 Check the AiiDA documentation for more export options: https://aiida.readthedocs.io/")
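
For tabular analysis in other tools, projected query results can also be written to a CSV file with the standard library. The sketch below is one possible approach: the helper name `write_rows_to_csv`, the column names, and the file name are hypothetical, and the `rows` list is a made-up stand-in for the list of lists that `QueryBuilder.all()` returns.

```python
import csv
from pathlib import Path


def write_rows_to_csv(rows, header, path):
    """Write query rows (one list per result) to a CSV file and return the row count."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Stand-in for the result of QueryBuilder.all(); the values are made up
rows = [[1, "H2O", 3], [2, "Si2", 2]]
written = write_rows_to_csv(rows, ["pk", "formula", "num_atoms"], "structures.csv")
print(f"Wrote {written} rows to structures.csv")
```

In a real session, `rows` would come from a query that projects only plain values (such as `id` or attribute fields), since node objects themselves do not serialize meaningfully to CSV.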