**IMPORTANT** 

- For requirements and initial setup, see https://github.com/OliveiraEdu/OpenScience/Readme.md.
- To execute the notebook, run all cells in order.

# Open Science Platform Notebook 4 - Indexing, searching, validation and download

1 - Load modules

In [1]:
from iroha_helper import *
from new_helper import *
from super_helper import *
from loguru import logger
import pickle

2 - Set variables and values

In [2]:
# Index for objects in both user account and project account JSON-LDs.
json_ld_index = 2

# Local path for file upload
directory_path = "upload"

# Directory for file downloads
download_path = "download"

# Directory that holds the search index
index_path = "indexdir"
index = open_dir(index_path)

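The later cells read from `upload`, write to `download`, and open the index stored in `indexdir`. As a minimal sketch, assuming these folders may be missing on a fresh checkout, the check below creates the upload and download folders and reports whether the index folder is present. `prepare_directories` is a hypothetical helper for illustration and is not part of the platform's helper modules.

```python
from pathlib import Path


def prepare_directories(upload_dir, download_dir, index_dir):
    """Create the upload/download folders if missing; report on the index folder.

    Hypothetical helper for illustration; not part of the platform's modules.
    """
    for folder in (upload_dir, download_dir):
        Path(folder).mkdir(parents=True, exist_ok=True)
    # The index folder is only checked, not created: an empty folder
    # would not contain a usable index.
    return Path(index_dir).is_dir()
```

Calling `prepare_directories("upload", "download", "indexdir")` before opening the index gives an early, readable signal when the index has not been built yet.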

3 - Load the variable `project_id` from the local file system.

In [3]:
# Load project_id from the file
file_path = "temp/project_id.pkl"

try:
    with open(file_path, "rb") as f:
        project_id = pickle.load(f)
    logger.success(f"Successfully loaded project_id from {file_path}: {project_id}")
except FileNotFoundError:
    logger.error(f"File not found: {file_path}")
except pickle.UnpicklingError:
    logger.error(f"Failed to unpickle the file: {file_path}. The file may be corrupted.")
except Exception as e:
    logger.error(f"An unexpected error occurred while loading project_id: {e}")

2025-01-19 09:53:22.899 | SUCCESS  | __main__:<module>:7 - Successfully loaded project_id from temp/project_id.pkl: example_project_id


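The cell above reads a pickle file that some earlier step must have written. As a hedged sketch, assuming that step simply pickles the ID string to the same path, the round trip could look like this. `save_project_id` and `load_project_id` are hypothetical names, not functions from the helper modules.

```python
import pickle
from pathlib import Path


def save_project_id(project_id, file_path="temp/project_id.pkl"):
    """Pickle project_id to file_path, creating the parent folder if needed."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(project_id, f)


def load_project_id(file_path="temp/project_id.pkl"):
    """Return the stored project_id, or None if the file is missing or unreadable."""
    try:
        with open(file_path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, pickle.UnpicklingError, EOFError):
        # An empty file raises EOFError; a damaged one may raise UnpicklingError.
        return None
```

Returning `None` on failure lets a later cell test for a missing ID explicitly, rather than hitting an undefined `project_id` further down the notebook.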
4 - Index files in the `upload` directory

In [4]:
schema = get_schema()  # defined in super_helper.py

logger.info(schema)

processed_data = process_files(directory_path, project_id, schema)  # defined in new_helper.py

2025-01-19 09:53:22.903 | INFO     | __main__:<module>:3 - <Schema: ['abstract', 'created', 'creator', 'date', 'description', 'file_cid', 'format', 'full_text', 'language', 'metadata_cid', 'modified', 'project_id', 'publisher', 'subject', 'title']>
2025-01-19 09:53:22.904 | INFO     | new_helper:process_files:118 - Indexing file:  upload/Munafò et al. - 2022 - The reproducibility debate is an opportunity, not .pdf
2025-01-19 09:53:22.948 | INFO     | iroha_helper:tracer:244 - 	Entering "create_contract"
2025-01-19 09:53:22.970 | INFO     | iroha_helper:create_contract:263 - ('STATELESS_VALIDATION_SUCCESS', 1, 0)
2025-01-19 09:53:22.973 | INFO     | iroha_helper:create_contract:263 - ('ENOUGH_SIGNATURES_COLLECTED', 9, 0)
2025-

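The log above shows one file being picked up from `upload` and tagged with fields from the schema. The sketch below illustrates only that first part: walking a folder and building one record per file. It does not reproduce `process_files`, whose internals (text extraction, IPFS upload, the `create_contract` call seen in the log) are not shown here. `build_records` is a hypothetical name, and the field names are borrowed from the schema printed in the log.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def build_records(directory_path, project_id):
    """Return one metadata dict per regular file in directory_path.

    Hypothetical illustration; the real indexing helper may differ.
    """
    records = []
    for path in sorted(Path(directory_path).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        mime_type, _ = mimetypes.guess_type(path.name)
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        records.append({
            "project_id": project_id,
            "title": path.stem,
            "format": mime_type or "application/octet-stream",
            "modified": modified.isoformat(),
        })
    return records
```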
5 - Search the index: set the variable `keyword` to the word or term you are searching for.

In [5]:
# Perform a keyword search
keyword = "k-nearest"
search_results, project_ids = search_index(index, keyword)

2025-01-19 09:53:28.590 | INFO     | super_helper:search_index:341 - Starting keyword search...
2025-01-19 09:53:28.592 | INFO     | super_helper:search_index:342 - Keyword: 'k-nearest'
2025-01-19 09:53:28.604 | INFO     | super_helper:with_logging_block:33 -
2025-01-19 09:53:28.604 | INFO     | super_helper:with_logging_block:34 - STARTING BLOCK: Keyword Search
2025-01-19 09:53:28.607 | INFO     | super_helper:search_index:360 - Total search results found for keyword 'k-nearest': 1
2025-01-19 09:53:28.608 | INFO     | super_helper:search_index:363 - Listing all search results:
2025-01-19 09:53:28.609 | INFO     | super_helper:search_index:367

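`search_index` returns two values: the matching results and the project IDs they belong to. To make that return shape concrete, here is a toy stand-in that does a case-insensitive substring match over in-memory records. It is an illustration only, not the index-backed search used by the notebook, and `search_documents` is a hypothetical name.

```python
def search_documents(documents, keyword):
    """Return (matching documents, sorted unique project IDs) for a keyword.

    Toy case-insensitive substring match over the 'full_text' field;
    a hypothetical stand-in, not the platform's search.
    """
    needle = keyword.lower()
    hits = [doc for doc in documents if needle in doc.get("full_text", "").lower()]
    project_ids = sorted({doc["project_id"] for doc in hits})
    return hits, project_ids
```

For example, `search_documents(docs, "k-nearest")` on a list of dicts with `full_text` and `project_id` keys returns the dicts whose text mentions the term, plus the de-duplicated IDs of their projects.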
6 - Process the search results and validate the result files. Validation criterion: the file CID must exist in both IPFS `AND` the blockchain.

In [6]:
processed_results = processing_search_results_block(search_results)

2025-01-19 09:53:28.615 | INFO     | super_helper:with_logging_block:33 -
2025-01-19 09:53:28.617 | INFO     | super_helper:with_logging_block:34 - STARTING BLOCK: Processing Search Results
2025-01-19 09:53:28.618 | INFO     | super_helper:with_logging_block:33 -
2025-01-19 09:53:28.618 | INFO     | super_helper:with_logging_block:34 - STARTING BLOCK: Processing Result for Project ID: 55943@test
2025-01-19 09:53:28.619 | INFO     | super_helper:processing_search_results_block:72 - File CID: QmUq29KRwpTdvScB5oYzEDobDHyb4N1f9eaZXm4VaMCgiW
2025-01-19 09:53:28.619 | INFO     | super_helper:processing_search_results_block:73 - Metadata CID: QmcG2x9U1YcRTB4zMob5QgwZDJWJLohYF25zNEeNJubzQf

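The validation rule in this step is a logical AND: a result counts as valid only when its file CID is found in both places. A minimal sketch of that rule follows, assuming each result is a dict with a `file_cid` key (the field name appears in the schema) and that the two lookups are supplied by the caller. `validate_results`, `in_ipfs`, and `on_chain` are hypothetical names; the actual checks performed by `processing_search_results_block` are not shown here.

```python
def validate_results(results, in_ipfs, on_chain):
    """Split results into (valid, invalid) using the AND rule on 'file_cid'.

    in_ipfs and on_chain are caller-supplied predicates (hypothetical here)
    that take a CID string and return True or False.
    """
    valid, invalid = [], []
    for result in results:
        cid = result.get("file_cid")
        # Both lookups must succeed; a missing CID is always invalid.
        if cid and in_ipfs(cid) and on_chain(cid):
            valid.append(result)
        else:
            invalid.append(result)
    return valid, invalid
```

For a quick dry run, set membership works as a predicate, for example `in_ipfs = known_ipfs_cids.__contains__`.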
7 - Display additional metadata from IPFS about the project and the user, and download each valid file into the project directory beneath `download`.

In [7]:
metadata_results = metadata_block(processed_results, download_path)

2025-01-19 09:53:28.639 | INFO     | super_helper:with_logging_block:33 -
2025-01-19 09:53:28.641 | INFO     | super_helper:with_logging_block:34 - STARTING BLOCK: Processing Metadata Block
2025-01-19 09:53:28.642 | INFO     | super_helper:with_logging_block:33 -
2025-01-19 09:53:28.642 | INFO     | super_helper:with_logging_block:34 - STARTING BLOCK: Processing Project Metadata
2025-01-19 09:53:28.647 | INFO     | super_helper:metadata_block:117 - Downloaded project metadata: {'@context': {'dc': 'http://purl.org/dc/terms/', 'schema': 'http://schema.org/'}, '@type': 'schema:ResearchProject', 'dc:abstract': 'This study aims to investigate the effects of deforestation on green transportation and propose strategies for improvement.', 'schema:e
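
This step saves each valid file in a project directory beneath `download`. The exact layout produced by `metadata_block` is not shown here; the sketch below assumes one sub-folder per project ID, such as `download/55943@test/`. `save_to_project_dir` is a hypothetical helper for illustration.

```python
from pathlib import Path


def save_to_project_dir(download_path, project_id, file_name, content):
    """Write content (bytes) to <download_path>/<project_id>/<file_name>.

    Hypothetical helper; assumes one sub-folder per project ID.
    """
    project_dir = Path(download_path) / project_id
    project_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a name like "../x" stays inside the folder.
    target = project_dir / Path(file_name).name
    target.write_bytes(content)
    return target
```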