**IMPORTANT** 

- For requirements and initial setup go to https://github.com/OliveiraEdu/OpenScience/Readme.md;
- To execute the notebook run all cells.

 # Cross Linking Account and Project accounts, IPFS Search Engine implementation. 

# Part - 1 Cross Linking Account and Project accounts

## Activities

1 - Deploys a smart contract into the Iroha 1 blockchain for details (attributes) setting;

2 - User and Project id extraction from CSVs;

3 - Queries Iroha 1 for User and Project accounts and checks the present values;

4 - Sets details for both User and Project accounts in Iroha 1 providing a logical link between them for later references;

5 - Queries the User and Project accounts again and checks the proper setting of details.

## Sequence Diagram

```mermaid

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#ffffff',
      'primaryTextColor': '#000000',
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#f4f4f4',
      'secondaryTextColor': '#000000',
      'tertiaryColor': '#d3d3d3',
      'tertiaryTextColor': '#000000',
      'background': '#ffffff',
      'actorBkg': '#B4B4B4',
      'actorTextColor': '#000000',
      'actorBorder': '#000000',
      'actorLineColor': '#000000',
      'signalColor': '#000000',
      'signalTextColor': '#000000',
      'activationBorderColor': '#000000',
      'activationBkgColor': '#d3d3d3',
      'sequenceNumberColor': '#000000',
      'noteBkgColor': '#F0F0F0',
      'noteTextColor': '#000000',
      'noteBorderColor': '#000000'
    }
  }
}%%

sequenceDiagram
    participant Platform as "Platform"
    participant self as "self"
    participant Blockchain as "Iroha 1 Blockchain"
    

    Note over Platform, Blockchain: Deploy smart contract for details setting
    Platform->>Blockchain: Deploy Smart Contract
    Blockchain-->>Platform: Smart Contract Deployed Successfully

    Note over Platform, Blockchain: Extract user and project IDs from CSVs
    Platform->>self: User ID Extraction
    Platform->>self: Project ID Extraction

    Note over Platform, Blockchain: Queries the blockchain for <br/>User and Project accounts details
    Platform->>Blockchain: Get User Account Details
    Blockchain-->>Platform: Query Response
    Platform->>Blockchain: Get Project Account Details
    Blockchain-->>Platform: Query Response
    
    Note over Platform, Blockchain: Set details for User and Project accounts
    Platform->>Blockchain: Set User Details in Blockchain
    Blockchain-->>Platform: User Details Set Successfully
    Platform->>Blockchain: Set Project Details in Blockchain
    Blockchain-->>Platform: Project Details Set Successfully
    
    Note over Platform, Blockchain: Queries the blockchain to <br/>confirm proper setting of details
    Platform->>Blockchain: Get User Account Details
    Blockchain-->>Platform: Query Response
    Platform->>Blockchain: Get Project Account Details
    Blockchain-->>Platform: Query Response
    
```

1 - Deploys a smart contract into the Iroha 1 blockchain for details (attributes) setting;

In [8]:
from Crypto.Hash import keccak
import os
import binascii
from iroha import IrohaCrypto
from iroha import Iroha, IrohaGrpc
from iroha.ed25519 import H
import integration_helpers
from iroha.primitive_pb2 import can_set_my_account_detail
import sys
import csv
import json
import icecream as ic

if sys.version_info[0] < 3:
    raise Exception("Python 3 or a more recent version is required.")

# Load configuration from config.json file
config_path = "config.json"  # Update this path as needed
with open(config_path, "r") as f:
    config = json.load(f)

IROHA_HOST_ADDR = config["IROHA_HOST_ADDR"]
IROHA_PORT = config["IROHA_PORT"]
ADMIN_ACCOUNT_ID = config["ADMIN_ACCOUNT_ID"]
ADMIN_PRIVATE_KEY = config["ADMIN_PRIVATE_KEY"]

iroha = Iroha(ADMIN_ACCOUNT_ID)
net = IrohaGrpc("{}:{}".format(IROHA_HOST_ADDR, IROHA_PORT))


@integration_helpers.trace
def create_contract():
    bytecode = "608060405234801561001057600080fd5b5073a6abc17819738299b3b2c1ce46d55c74f04e290c6000806101000a81548173ffffffffffffffffffffffffffffffffffffffff021916908373ffffffffffffffffffffffffffffffffffffffff160217905550610b4c806100746000396000f3fe608060405234801561001057600080fd5b506004361061004c5760003560e01c80635bdb3a41146100515780637949a1b31461006f578063b7d66df71461009f578063d4e804ab146100cf575b600080fd5b6100596100ed565b6040516100669190610879565b60405180910390f35b61008960048036038101906100849190610627565b61024c565b6040516100969190610879565b60405180910390f35b6100b960048036038101906100b49190610693565b6103bb565b6040516100c69190610879565b60405180910390f35b6100d761059b565b6040516100e4919061085e565b60405180910390f35b606060006040516024016040516020818303038152906040527f5bdb3a41000000000000000000000000000000000000000000000000000000007bffffffffffffffffffffffffffffffffffffffffffffffffffffffff19166020820180517bffffffffffffffffffffffffffffffffffffffffffffffffffffffff8381831617835250505050905060008060008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff16836040516101be9190610830565b600060405180830381855af49150503d80600081146101f9576040519150601f19603f3d011682016040523d82523d6000602084013e6101fe565b606091505b509150915081610243576040517f08c379a000000000000000000000000000000000000000000000000000000000815260040161023a9061091e565b60405180910390fd5b80935050505090565b60606000838360405160240161026392919061089b565b6040516020818303038152906040527f7949a1b3000000000000000000000000000000000000000000000000000000007bffffffffffffffffffffffffffffffffffffffffffffffffffffffff19166020820180517bffffffffffffffffffffffffffffffffffffffffffffffffffffffff8381831617835250505050905060008060008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff168360405161032a9190610830565b600060405180830381855af49150503d8060008114610365576040519150601f19603f3d011682016040523d82523d6000602084013e61036a565b606091505b5091509150816103af576040517f08c379a00000000000000000000000000000000000000000000000000000000081526004016103a69061091e565b60405180910390fd5b80935050505092915050565b606060008484846040516024016103d4939291906108d2565b6040516020818303038152906040527fb7d66df7000000000000000000000000000000000000000000000000000000007bffffffffffffffffffffffffffffffffffffffffffffffffffffffff19166020820180517bffffffffffffffffffffffffffffffffffffffffffffffffffffffff8381831617835250505050905060008060008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff168360405161049b9190610830565b600060405180830381855af49150503d80600081146104d6576040519150601f19603f3d011682016040523d82523d6000602084013e6104db565b606091505b509150915081610520576040517f08c379a00000000000000000000000000000000000000000000000000000000081526004016105179061091e565b60405180910390fd5b8460405161052e9190610847565b6040518091039020866040516105449190610847565b60405180910390208860405161055a9190610847565b60405180910390207f5e1b38cd47cf21b75d5051af29fa321eedd94877db5ac62067a076770eddc9d060405160405180910390a48093505050509392505050565b60008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1681565b60006105d26105cd84610963565b61093e565b9050828152602081018484840111156105ea57600080fd5b6105f5848285610a14565b509392505050565b600082601f83011261060e57600080fd5b813561061e8482602086016105bf565b91505092915050565b6000806040838503121561063a57600080fd5b600083013567ffffffffffffffff81111561065457600080fd5b610660858286016105fd565b925050602083013567ffffffffffffffff81111561067d57600080fd5b610689858286016105fd565b9150509250929050565b6000806000606084860312156106a857600080fd5b600084013567ffffffffffffffff8111156106c257600080fd5b6106ce868287016105fd565b935050602084013567ffffffffffffffff8111156106eb57600080fd5b6106f7868287016105fd565b925050604084013567ffffffffffffffff81111561071457600080fd5b610720868287016105fd565b9150509250925092565b610733816109e2565b82525050565b600061074482610994565b61074e81856109aa565b935061075e818560208601610a23565b61076781610ab6565b840191505092915050565b600061077d82610994565b61078781856109bb565b9350610797818560208601610a23565b80840191505092915050565b60006107ae8261099f565b6107b881856109c6565b93506107c8818560208601610a23565b6107d181610ab6565b840191505092915050565b60006107e78261099f565b6107f181856109d7565b9350610801818560208601610a23565b80840191505092915050565b600061081a6027836109c6565b915061082582610ac7565b604082019050919050565b600061083c8284610772565b915081905092915050565b600061085382846107dc565b915081905092915050565b6000602082019050610873600083018461072a565b92915050565b600060208201905081810360008301526108938184610739565b905092915050565b600060408201905081810360008301526108b581856107a3565b905081810360208301526108c981846107a3565b90509392505050565b600060608201905081810360008301526108ec81866107a3565b9050818103602083015261090081856107a3565b9050818103604083015261091481846107a3565b9050949350505050565b600060208201905081810360008301526109378161080d565b9050919050565b6000610948610959565b90506109548282610a56565b919050565b6000604051905090565b600067ffffffffffffffff82111561097e5761097d610a87565b5b61098782610ab6565b9050602081019050919050565b600081519050919050565b600081519050919050565b600082825260208201905092915050565b600081905092915050565b600082825260208201905092915050565b600081905092915050565b60006109ed826109f4565b9050919050565b600073ffffffffffffffffffffffffffffffffffffffff82169050919050565b82818337600083830152505050565b60005b83811015610a41578082015181840152602081019050610a26565b83811115610a50576000848401525b50505050565b610a5f82610ab6565b810181811067ffffffffffffffff82111715610a7e57610a7d610a87565b5b80604052505050565b7f4e487b7100000000000000000000000000000000000000000000000000000000600052604160045260246000fd5b6000601f19601f8301169050919050565b7f4572726f722063616c6c696e67207365727669636520636f6e7472616374206660008201527f756e6374696f6e0000000000000000000000000000000000000000000000000060208201525056fea26469706673582212206ad40afbd4cc9c87ae154542d003c9538e4b89473a13cadd3cbf618ea181206864736f6c63430008040033"
    """Bytecode was generated using remix editor  https://remix.ethereum.org/ from file detail.sol. """
    tx = iroha.transaction(
        [iroha.command("CallEngine", caller=ADMIN_ACCOUNT_ID, input=bytecode)]
    )
    IrohaCrypto.sign_transaction(tx, ADMIN_PRIVATE_KEY)
    net.send_tx(tx)
    hex_hash = binascii.hexlify(IrohaCrypto.hash(tx))
    for status in net.tx_status_stream(tx):
        print(status)
    return hex_hash

hash = create_contract()
integration_helpers.get_engine_receipts_result(hash)
print("done")

	Entering "create_contract"
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
	Leaving "create_contract"
	Entering "get_engine_receipts_result"

	Leaving "get_engine_receipts_result"
done


2 - Data extraction from JSON-LD.

Extracts account ids from `datasets/accounts.json` and `datasets/projects.json`.

Must update `json_ld_index` with a line number related to an existing object in `datasets/accounts.json` and `datasets/projects.json`

In [9]:
# Index for objects in both user account and project account JSON-LDs.
json_ld_index = 0

4 - Sets details for both User and Project accounts providing a logical link between them for later references.

In [10]:
address = integration_helpers.get_engine_receipts_address(hash)


# Function to link details using blockchain
def set_account_detail(address, account, key, value):
    params = integration_helpers.get_first_four_bytes_of_keccak(
        b"setAccountDetail(string,string,string)"
    )
    no_of_param = 3
    for x in range(no_of_param):
        params = params + integration_helpers.left_padded_address_of_param(
            x, no_of_param
        )
    params = params + integration_helpers.argument_encoding(account)  # account id
    params = params + integration_helpers.argument_encoding(key)  # key
    params = params + integration_helpers.argument_encoding(value)  # value
    tx = iroha.transaction(
        [
            iroha.command(
                "CallEngine", caller=ADMIN_ACCOUNT_ID, callee=address, input=params
            )
        ]
    )
    IrohaCrypto.sign_transaction(tx, ADMIN_PRIVATE_KEY)
    response = net.send_tx(tx)
    for status in net.tx_status_stream(tx):
        print(status)
    hex_hash = binascii.hexlify(IrohaCrypto.hash(tx))
    return hex_hash

# Function to read user accounts from JSON-LD
def read_user_accounts_from_jsonld(file_path):
    with open(file_path, mode='r', encoding='utf-8') as file:
        data = json.load(file)
        user_accounts = []
        for entry in data["@graph"]:
            if entry["@type"] == "foaf:Person":
                account_id = entry.get("foaf:holdsAccount", {}).get("schema:identifier")
                if account_id:
                    user_accounts.append({
                        'account_id': account_id
                    })
        return user_accounts

# Function to read project accounts from JSON-LD
def read_project_accounts_from_jsonld(file_path):
    with open(file_path, mode='r', encoding='utf-8') as file:
        data = json.load(file)
        project_accounts = []
        for entry in data["@graph"]:
            if entry["@type"] == "schema:ResearchProject":
                project_id = entry.get("schema:identifier")
                if project_id:
                    project_accounts.append({
                        'account_id': project_id
                    })
        return project_accounts

# Paths to the JSON-LD files
user_accounts_jsonld_file_path = 'datasets/accounts.json'
project_accounts_jsonld_file_path = 'datasets/projects.json'

# Read accounts from JSON-LD
user_accounts = read_user_accounts_from_jsonld(user_accounts_jsonld_file_path)
project_accounts = read_project_accounts_from_jsonld(project_accounts_jsonld_file_path)

# Example to use the [n] row from the JSON-LD for the operation
# csv_index = 0  # Assuming an index
user_account = user_accounts[json_ld_index]
project_account = project_accounts[json_ld_index]

print(f"User Account ID: {user_account['account_id']}")
print(f"Project Account ID: {project_account['account_id']}")

# Set project_id as a detail for the user account
hash_user_to_project = set_account_detail(
    address, 
    user_account['account_id'], 
    "linked_project", 
    project_account['account_id']
)

# Set user_account_id as a detail for the project account
hash_project_to_user = set_account_detail(
    address, 
    project_account['account_id'], 
    "linked_user", 
    user_account['account_id']
)

# Confirming the operation
print(f"User account {user_account['account_id']} linked to project {project_account['account_id']}")
print(f"Project account {project_account['account_id']} linked to user {user_account['account_id']}")


	Entering "get_engine_receipts_address"
	Leaving "get_engine_receipts_address"
User Account ID: hardcore_northcutt@test
Project Account ID: 33829@test
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
User account hardcore_northcutt@test linked to project 33829@test
Project account 33829@test linked to user hardcore_northcutt@test


3 - Queries Iroha 1 for User account and checks its values

In [11]:
#Query - GetAccountDetail
query = iroha.query('GetAccountDetail',account_id=user_account['account_id'])
# print(query)
IrohaCrypto.sign_query(query, ADMIN_PRIVATE_KEY)
response = net.send_query(query)
# print(response)

user_data = response.account_detail_response
user_details = user_data.detail

print(f'User Account id = {user_account}, {user_details}')



User Account id = {'account_id': 'hardcore_northcutt@test'}, { "admin@test" : { "linked_project" : "33829@test", "user_json_ld_cid" : "QmSMLJZBvLTLP1YcKtgp8tE4oyT35DuthMWE1AeVJy2j6g" } }


3 - Queries Iroha 1 for Project account and checks its values

In [12]:
#Query - GetAccountDetail
query = iroha.query('GetAccountDetail',account_id=project_account['account_id'])
IrohaCrypto.sign_query(query, ADMIN_PRIVATE_KEY)
response = net.send_query(query)
# print(response)
project_data = response.account_detail_response
project_details = project_data.detail
print(f'Project Account id = {project_account}, {project_details}')

Project Account id = {'account_id': '33829@test'}, { "admin@test" : { "linked_user" : "hardcore_northcutt@test", "project_metadata_cid" : "QmW8fa17i54v8wKLJwYZ7REqZPpMvphHPYnYKqj2jSEPrh" } }


# Part2 - Querying Project Metadata 

## Activities

6 - Queries the user account, locates the project id, queries the project account, gets the metadata and files from IPFS.


## Sequence Diagram

```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#ffffff',
      'primaryTextColor': '#000000',
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#f4f4f4',
      'secondaryTextColor': '#000000',
      'tertiaryColor': '#d3d3d3',
      'tertiaryTextColor': '#000000',
      'background': '#ffffff',
      'actorBkg': '#B4B4B4',
      'actorTextColor': '#000000',
      'actorBorder': '#000000',
      'actorLineColor': '#000000',
      'signalColor': '#000000',
      'signalTextColor': '#000000',
      'activationBorderColor': '#000000',
      'activationBkgColor': '#d3d3d3',
      'sequenceNumberColor': '#000000',
      'noteBkgColor': '#F0F0F0',
      'noteTextColor': '#000000',
      'noteBorderColor': '#000000'
    }
  }
}%%

sequenceDiagram
    participant Platform as "Platform"
    participant Blockchain as "Iroha 1 Blockchain"
    participant IPFS as "Interplanetary File System"
    participant FrontEnd as "Front End"

    Note over Platform, Blockchain: Queries the user account <br/> and get the project id 
    Platform->>Blockchain: Query User Account Details
    Blockchain-->>Platform: Query Response
        
    Note over Platform, Blockchain: Queries the Project Account details <br/> and get the project metadata CID 
    Platform->>Blockchain: Query Project Account Details
    Blockchain-->>Platform: Query Response

    Note over Platform, IPFS: Process and displays the metadata CID 
    Platform->>IPFS: Sends the project metadata CID
    IPFS-->>Platform: Sends back the project metadata JSON
    Platform->>FrontEnd: Displays the project metadata JSON   
```

6 - Queries the user account, locates the project id, queries the project account, gets the metadata and files from IPFS.

In [20]:
from ipfs_functions import *

# Process the account details response
user_details_dict = json.loads(user_details)  # Convert the string to a JSON object
print(user_details_dict)

# Now you can access the specific key like this
linked_project = user_details_dict["admin@test"]["linked_project"]
print(linked_project)

{'admin@test': {'linked_project': '33829@test', 'user_json_ld_cid': 'QmSMLJZBvLTLP1YcKtgp8tE4oyT35DuthMWE1AeVJy2j6g'}}
33829@test


In [21]:
#Query - GetAccountDetail
query = iroha.query('GetAccountDetail', account_id=linked_project)
IrohaCrypto.sign_query(query, ADMIN_PRIVATE_KEY)
response = net.send_query(query)
# print(response)
project_data = response.account_detail_response
project_details = project_data.detail
print(f'Project Account id = {project_account}, {project_details}')

Project Account id = {'account_id': '33829@test'}, { "admin@test" : { "linked_user" : "hardcore_northcutt@test", "project_metadata_cid" : "QmW8fa17i54v8wKLJwYZ7REqZPpMvphHPYnYKqj2jSEPrh" } }


In [23]:
# Convert the JSON string to a Python dictionary
project_details_dict = json.loads(project_details)

# Now you can access the specific key like this
project_metadata_cid = project_details_dict["admin@test"]["project_metadata_cid"]
print(project_metadata_cid)


project_metadata = download_json_from_ipfs(project_metadata_cid)

# print(20*"-")

print(project_metadata)

QmW8fa17i54v8wKLJwYZ7REqZPpMvphHPYnYKqj2jSEPrh
{'@context': {'dc': 'http://purl.org/dc/terms/', 'schema': 'http://schema.org/'}, '@type': 'schema:ResearchProject', 'dc:abstract': 'This study explores the impact of blockchain in urban development, focusing on public health.', 'schema:endDate': '2028-05-02', 'schema:funding': {'@type': 'schema:Organization', 'schema:name': 'National Science Foundation'}, 'schema:keywords': ['blockchain', 'urban development', 'public health'], 'schema:location': {'@type': 'schema:Place', 'schema:name': 'Phuket, Thailand'}, 'schema:name': 'Investigating the Role of blockchain in urban development', 'schema:startDate': '2018-11-27'}


# Part 3 - File Operations 

7 -  Sends every file in the `upload` directory to IPFS, extracts theirs respective metadata with Apache Tika and sends it to IPFS, get the CIDs back and store in Iroha as details of the project account.

```mermaid

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#ffffff',
      'primaryTextColor': '#000000',
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#f4f4f4',
      'secondaryTextColor': '#000000',
      'tertiaryColor': '#d3d3d3',
      'tertiaryTextColor': '#000000',
      'background': '#ffffff',
      'actorBkg': '#B4B4B4',
      'actorTextColor': '#000000',
      'actorBorder': '#000000',
      'actorLineColor': '#000000',
      'signalColor': '#000000',
      'signalTextColor': '#000000',
      'activationBorderColor': '#000000',
      'activationBkgColor': '#d3d3d3',
      'sequenceNumberColor': '#000000',
      'noteBkgColor': '#F0F0F0',
      'noteTextColor': '#000000',
      'noteBorderColor': '#000000'
    }
  }
}%%

sequenceDiagram
    participant Platform as "Platform"
    participant Metadata Extractor
    participant Indexer
    participant Blockchain as "Iroha 1 Blockchain"
    participant IPFS as "Interplanetary File System"

    Note over Platform, IPFS: File Operations 
    Platform->>IPFS: Upload local file to IPFS
    IPFS-->>Platform: Send back file CIDs
    Platform->>Blockchain: Set CID as Project Account Details
    Blockchain-->>Platform:Details set successfully

    Note over Platform, IPFS: File Metadata Operations
    Platform->>Metadata Extractor: Parse file and extract metadata 
    Metadata Extractor->>Indexer: Send file metadata for indexing
    Indexer->>IPFS: Store file metadata JSON
    IPFS-->>Platform: Send back file metadata JSON CIDs
    Platform->>Blockchain: Set file metadata JSON CID as Project Account Details
    Blockchain-->>Platform:Details set successfully
         
```

In [24]:
import os
import time
import json
import shutil
from tika import parser  # Apache Tika for metadata extraction
from whoosh.index import create_in, LockError
from whoosh.fields import Schema, TEXT, ID, NUMERIC  # <-- Add this line
import mimetypes

ic(linked_project)

def reset_index_writer():
    """Manually remove the lock file if it exists."""
    lock_file = "indexdir/WRITELOCK"
    if os.path.exists(lock_file):
        os.remove(lock_file)
        print("Lock file removed. Writer reset.")

def recreate_index(schema):
    """Recreate the index directory and start fresh."""
    if os.path.exists("indexdir"):
        shutil.rmtree("indexdir")  # Remove the entire index directory
    os.mkdir("indexdir")
    ix = create_in("indexdir", schema)  # Recreate index
    print("Index recreated.")
    return ix

def get_writer_with_retry(ix, retries=5, delay=1):
    """Retry acquiring the writer with a delay, and reset lock if retries exhausted."""
    for attempt in range(retries):
        try:
            return ix.writer()
        except LockError:
            print(f"Writer is locked, retrying in {delay} seconds... (Attempt {attempt + 1})")
            time.sleep(delay)

    # If retries are exhausted, remove the lock file
    reset_index_writer()
    raise Exception("Failed to acquire writer lock after several attempts.")

def parse_documents_in_directory(directory_path, schema, recreate=False):
    """Parses documents in a directory and indexes them, with reset logic."""
    if recreate:
        ix = recreate_index(schema)  # Reset the index if needed
    else:
        ix = create_in("indexdir", schema)  # Use the existing index

    index = 1
    writer = get_writer_with_retry(ix)  # Retry logic for writer

    for filename in os.listdir(directory_path):
        print(filename)

        if not os.path.basename(filename).startswith('.'):
            file_path = os.path.join(directory_path, filename)
            file_cid = upload_file_to_ipfs(file_path)

            variable_1 = f"file_{index}_CID"
            variable_2 = file_cid

            hash = set_account_detail(address, linked_project, variable_1, variable_2)

            try:
                parsed_document = parser.from_file(file_path)

                if parsed_document:  # Check if parsing was successful
                    metadata = parsed_document.get('metadata', {})
                    full_text = parsed_document.get("content", "").strip()

                    title = metadata.get("title", "Unknown")
                    author = metadata.get("Author", "Unknown")
                    keywords = metadata.get("Keywords", "")

                    metadata_cid = upload_json_to_ipfs(metadata)

                    # Fetch file stats from IPFS
                    stats = client.object.stat(metadata_cid)
                    file_size = stats['CumulativeSize']

                    # Use mimetypes to get the file type
                    filetype = mimetypes.guess_type(filename)[0] or "unknown"

                    # Index the document
                    writer.add_document(
                        cid=metadata_cid,
                        name=filename,
                        size=file_size,
                        filetype=filetype,
                        title=title,
                        author=author,
                        keywords=keywords,
                        full_text=full_text,  # Index full text for searching
                    )
                    print(f"Indexed {filename} with CID: {metadata_cid}")

                    variable_3 = f"file_{index}_metadata_CID"
                    variable_4 = metadata_cid

                    hash = set_account_detail(address, linked_project, variable_3, variable_4)

                else:
                    print(f"Parsing failed for '{filename}'.")

            except Exception as e:
                print(f"Error with file '{filename}': {e}")
                continue

        print("-" * 40)
        index += 1

    writer.commit()  # Commit changes once after all files are processed

# Example usage
schema = Schema(
    cid=ID(stored=True),
    name=TEXT(stored=True),
    size=NUMERIC(stored=True),
    filetype=TEXT(stored=True),
    title=TEXT(stored=True),
    author=TEXT(stored=True),
    keywords=TEXT(stored=True),
    full_text=TEXT(stored=False)
)

# Parse documents in the directory "upload", and optionally reset the index
parse_documents_in_directory("upload", schema, recreate=False)


ic| linked_project: '33829@test'


Editorial-Board_2023_Expert-Systems-with-Applications.pdf
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
Indexed Editorial-Board_2023_Expert-Systems-with-Applications.pdf with CID: Qmc4d9fpDBnqfY7SNSSgV2YqEUZ41uvGUiTHMgLSJTvUwf
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
----------------------------------------
Diabetic-retinopathy-identification-using-parallel-convo_2023_Expert-Systems.pdf
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
Indexed Diabetic-retinopathy-identification-using-parallel-convo_2023_Expert-Systems.pdf with CID: QmTLp6JYFx91pfHx7azKuUm6MHRjc3647xfKZzCT7pxFts
('STATELESS_VALIDATION_SUCCESS', 1, 0)
('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
('STATEFUL_VALIDATION_SUCCESS', 3, 0)
('COMMITTED', 5, 0)
-----------

In [25]:
from whoosh.index import open_dir, create_in
from whoosh.fields import Schema, TEXT, ID, NUMERIC
from whoosh.qparser import QueryParser
import os

# Define the schema
schema = Schema(
    cid=ID(stored=True),
    name=TEXT(stored=True),
    size=NUMERIC(stored=True),
    filetype=TEXT(stored=True),
    title=TEXT(stored=True),
    author=TEXT(stored=True),
    keywords=TEXT(stored=True),
    full_text=TEXT(stored=False)
)

# Ensure the index directory exists
if not os.path.exists("indexdir"):
    os.mkdir("indexdir")
    ix = create_in("indexdir", schema)  # Create index if it doesn't exist
else:
    ix = open_dir("indexdir")  # Open existing index

def search_ipfs(keyword, ix):
    """Search for a keyword in the indexed documents."""
    try:
        with ix.searcher() as searcher:
            query = QueryParser("full_text", ix.schema).parse(keyword)
            results = searcher.search(query)

            if results:
                for result in results:
                    print(f"CID: {result['cid']}, Name: {result['name']}, Title: {result['title']}, "
                          f"Author: {result['author']}, Size: {result['size']} bytes")
            else:
                print(f"No results found for '{keyword}'")
    except Exception as e:
        print(f"Error occurred during search: {e}")

# Example search usage
search_ipfs("longer", ix)


CID: QmTfQnQkUahbEYBnbh5cEZr1WH9w71TBuxEDtCdsUi2A7R, Name: bitcoin.pdf, Title: Unknown, Author: Unknown, Size: 1569 bytes
CID: QmPYqRCYvSUZLLHnjoiMNcetqNU5arEqwErJiao3FLETPb, Name: COVID19-MLSF--A-multi-task-learning-based-stock-market_2023_Expert-Systems-w.pdf, Title: Unknown, Author: Unknown, Size: 3852 bytes


8 - Query the project account to verify the details update

In [26]:
#Query - GetAccountDetail
query = iroha.query('GetAccountDetail', account_id=linked_project)
IrohaCrypto.sign_query(query, ADMIN_PRIVATE_KEY)
response = net.send_query(query)
# print(response)
project_data = response.account_detail_response
project_details = project_data.detail
print(f'Project Account id = {project_account}, {project_details}')

Project Account id = {'account_id': '33829@test'}, { "admin@test" : { "file_10_CID" : "QmXh37WXcrLXkJX2cAPjbPdKZ2cxJXD58XX6eU1Zc41ka4", "file_10_metadata_CID" : "QmVbveTui6awvqkHe3L6nfxX9AYXEajptWyS763xHS74gE", "file_11_CID" : "QmSSY49SnmbCZ3oSaTki7CYZe1ZaWZfE1CsWHpt8Ge7acJ", "file_11_metadata_CID" : "QmZ98TU3kN48oc7Bn4rYdS2YFnvSte497PQMtxm6Ni44vi", "file_12_CID" : "QmVTTcqbGvYRn7n7uPYwk4vi8NdbeYvnZkX9MVT3byrAx2", "file_1_CID" : "QmZnh5Uuo7moZKMESFHpRjQFbbND5e3nArCzZg8uLARAgg", "file_1_metadata_CID" : "Qmc4d9fpDBnqfY7SNSSgV2YqEUZ41uvGUiTHMgLSJTvUwf", "file_2_CID" : "QmT7ximApXtExAedhzUHGHDwRismRzuKykpH4dgTvGrNZs", "file_2_metadata_CID" : "QmTLp6JYFx91pfHx7azKuUm6MHRjc3647xfKZzCT7pxFts", "file_3_CID" : "QmdiRawzVNUiB28ENKQ7WefeFLEJ1xMjsJjwtHL2jnJ9xW", "file_3_metadata_CID" : "QmdkLSqoRmNpXtkZFVgFvurSBwE8MBzADH5tCzFMz4yfB9", "file_4_CID" : "QmeBTz4ZwyqPkPigpASZ3wJodBXzJRpeXQMwNcsWRqU5Jv", "file_4_metadata_CID" : "QmNvfzYmDYePPKuZnJhtGKgZYQhzdF3J3vap8qFz8kKkTi", "file_5_CID" : "QmRA3NWM82

9 - Read CIDs from Iroha and download file metadata and files from IPFS to the project home directory

```mermaid
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#ffffff',
      'primaryTextColor': '#000000',
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#f4f4f4',
      'secondaryTextColor': '#000000',
      'tertiaryColor': '#d3d3d3',
      'tertiaryTextColor': '#000000',
      'background': '#ffffff',
      'actorBkg': '#B4B4B4',
      'actorTextColor': '#000000',
      'actorBorder': '#000000',
      'actorLineColor': '#000000',
      'signalColor': '#000000',
      'signalTextColor': '#000000',
      'activationBorderColor': '#000000',
      'activationBkgColor': '#d3d3d3',
      'sequenceNumberColor': '#000000',
      'noteBkgColor': '#F0F0F0',
      'noteTextColor': '#000000',
      'noteBorderColor': '#000000'
    }
  }
}%%

sequenceDiagram
    participant Platform as "Platform"
    participant self as "self"
    participant Blockchain as "Iroha 1 Blockchain"
    participant IPFS as "Interplanetary File System"
    participant FrontEnd as "Front End"
    
       
    Note over Platform, Blockchain: Queries the Project Account details and get details
    Platform->>Blockchain: Query Project Account Details
    Blockchain->>Platform: Query Response

    Note over Platform, IPFS: Process project account metadata
    Platform->>self: Parse Project Details JSON and retrieve file CIDs

    Note over Platform, IPFS: Download file from IPFS 
    Platform->>IPFS: Sends the file CID
    IPFS->>Platform: Sends back the file
    Platform->>FrontEnd:    Saves the file locally and display info and status

10 - Read details from the project account retrieve the CID of every file, download the it file from IPFS and store locally.

In [27]:
# from ipfs_functions import *
from clean_file_name import *

# Convert the JSON string to a Python dictionary
project_details_dict = json.loads(project_details)

# Process the account details response
# account_details_dict = json.loads(data.detail)  # Convert the string to a JSON object
# ic(project_details_dict)



# Get the value of the dictionary (the actual file metadata)
files_metadata = project_details_dict['admin@test']
ic(files_metadata)

for key, value in files_metadata.items():
    if 'metadata_CID' not in key:  # Check if this is a file CID
        key = '_'.join(key.split('_')[:-1])+"_CID"    
        # ic(key)
        file_CID = value
        # ic(value)
        
    else:
        file_metadata_key = '_'.join(key.split('_')[:-2])  # Extract the actual filename from the key
        ic(file_metadata_key)
        file_metadata_CID = value  # Get the corresponding metadata CID
        ic(file_metadata_CID)
        # print(f"Downloading {file_metadata_CID} metadata...")
        file_metadata_json = download_json_from_ipfs(file_metadata_CID)
        # ic(file_metadata_json)
        if 'resourceName' in file_metadata_json:  # check if key exists in the dictionary
            raw_original_file_name = file_metadata_json['resourceName']
            ic(raw_original_file_name)
            clean_original_file_name = clean_file_name(raw_original_file_name)  # Remove the 'b' prefix and quotes
            ic(clean_original_file_name)

            # Create a home directory for the user with the account ID as the username under /download/
            user_id = project_account['account_id']
            download_directory = os.path.join("download", user_id)
            if not os.path.exists(download_directory):
                os.makedirs(download_directory)  # Create the directory if it doesn't exist

            file_path = os.path.join(download_directory, clean_original_file_name)
            download_file_from_ipfs(file_CID, file_path)

ic| files_metadata: {'file_10_CID': 'QmXh37WXcrLXkJX2cAPjbPdKZ2cxJXD58XX6eU1Zc41ka4',
                     'file_10_metadata_CID': 'QmVbveTui6awvqkHe3L6nfxX9AYXEajptWyS763xHS74gE',
                     'file_11_CID': 'QmSSY49SnmbCZ3oSaTki7CYZe1ZaWZfE1CsWHpt8Ge7acJ',
                     'file_11_metadata_CID': 'QmZ98TU3kN48oc7Bn4rYdS2YFnvSte497PQMtxm6Ni44vi',
                     'file_12_CID': 'QmVTTcqbGvYRn7n7uPYwk4vi8NdbeYvnZkX9MVT3byrAx2',
                     'file_1_CID': 'QmZnh5Uuo7moZKMESFHpRjQFbbND5e3nArCzZg8uLARAgg',
                     'file_1_metadata_CID': 'Qmc4d9fpDBnqfY7SNSSgV2YqEUZ41uvGUiTHMgLSJTvUwf',
                     'file_2_CID': 'QmT7ximApXtExAedhzUHGHDwRismRzuKykpH4dgTvGrNZs',
                     'file_2_metadata_CID': 'QmTLp6JYFx91pfHx7azKuUm6MHRjc3647xfKZzCT7pxFts',
                     'file_3_CID': 'QmdiRawzVNUiB28ENKQ7WefeFLEJ1xMjsJjwtHL2jnJ9xW',
                     'file_3_metadata_CID': 'QmdkLSqoRmNpXtkZFVgFvurSBwE8MBzADH5tCzFMz4yfB9',
               