
Refactoring

General Guide

Since it is deployed directly from the repository, the master branch will stay.
We will have a separate main branch and merge refactored code into main.
Once we finish the integration tests and confirm that the refactored code does at least the same job as the current implementation,
we can make main the new master.

TL;DR: Open a pull request into main, not into master.

Higher Level TODOs

  • Create a package scicat_ingestor and gradually move all the existing code into the package.
    It is not for deployment, so it just needs to be accessible to the entry-point script (the command or script used for
    deployment).
  • Set up smaller-scope test environments
  • Set up automated continuous-integration test environments
  • Update documentation
  • Make the main branch the new master branch and archive master

Brainstorming what should be done for refactoring.
Feel free to shoot down the ideas : D

Encapsulating partial code into modules/functions

For example, these can be split into another module...?

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '-c', '--cf', '--config', '--config-file',
    default='config.20230125.json',
    dest='config_file',
    help='Configuration file name. Default: config.20230125.json',
    type=str,
)
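
As a sketch, this block could move behind a small function in its own module, importable from the entry-point script (the module and function names below are hypothetical):

# scicat_ingestor/options.py (hypothetical module)
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    """Collect all command-line options in one importable, testable place."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-c', '--cf', '--config', '--config-file',
        default='config.20230125.json',
        dest='config_file',
        help='Configuration file name. Default: config.20230125.json',
        type=str,
    )
    return parser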

Kafka Consumer/Message Handling

As it carries the core logic, we might be better off wrapping the confluent-kafka interfaces so that we can easily test them as well...?
For example,

if message is None:
    logger.info("Received empty message")
    continue
if message.error():
    logger.info("Consumer error: {}".format(message.error()))
    continue

This kind of error-handling can be separated into a smaller function.
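
For example, a minimal sketch (the name validate_message is hypothetical):

def validate_message(message, logger) -> bool:
    """Return True only if the Kafka message is usable; log the reason otherwise."""
    if message is None:
        logger.info("Received empty message")
        return False
    if message.error():
        logger.info("Consumer error: {}".format(message.error()))
        return False
    return True

The consumer loop would then reduce to: if not validate_message(message, logger): continue.
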
Or:
Wrap this into a function or automate it somehow...?
kafka_config_run_time = {}
for k1, v in kafka_config.items():
    if k1 in ["topics", "individual_message_commit"]:
        continue
    k2 = k1.replace("_", ".")
    kafka_config_run_time[k2] = v
    logger.info(f" - {k2:<30}: {v}")
consumer = Consumer(kafka_config_run_time)
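
One possible wrapping, as a sketch (the name build_consumer is hypothetical; Consumer comes from confluent-kafka):

from confluent_kafka import Consumer

def build_consumer(kafka_config: dict, logger) -> Consumer:
    """Translate snake_case config keys into the dotted keys confluent-kafka expects."""
    excluded = ("topics", "individual_message_commit")
    run_time_config = {
        key.replace("_", "."): value
        for key, value in kafka_config.items()
        if key not in excluded
    }
    for key, value in run_time_config.items():
        logger.info(f" - {key:<30}: {value}")
    return Consumer(run_time_config)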

Logging

Logging seems to require some configuration steps, but regardless of the configuration the logging interface (logger.info/debug/error) should stay the same.
But since the configuration of the logger requires file or link handling, it had better be tested, I guess...?
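
As a starting point, the configuration could live behind one entry point so that tests only have to cover that function (a sketch; the name, arguments, and format string are assumptions):

import logging

def build_logger(name: str, log_file: str | None = None,
                 level: int = logging.INFO) -> logging.Logger:
    """Hide the configuration steps; callers keep using plain info/debug/error calls."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger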

Ingestor

This guy definitely seems to need some chopping.

def ingest_message(

For example,
I think this part should be wrapped into a separate file-handling interface so that we can test it.

if config["run_options"]["message_to_file"]:
    message_file_path = ingestor_files_path
    logger.info("message file will be saved in {}".format(message_file_path))
    if os.path.exists(message_file_path):
        ...
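
A sketch of what such an interface could look like (the function name, the JSON serialization, and the file name are assumptions):

import json
from pathlib import Path

def save_message_to_file(message: dict, directory: Path, logger) -> Path:
    """Persist the message under ``directory`` so the behaviour can be unit-tested."""
    logger.info("message file will be saved in {}".format(directory))
    directory.mkdir(parents=True, exist_ok=True)
    message_file_path = directory / "message.json"
    with open(message_file_path, "w") as file:
        json.dump(message, file)
    return message_file_path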

Redefining Interfaces

Nexus structure path

METADATA_PROPOSAL_PATH = [
    "children",
    ("children", "name", "entry"),
    ("config", "module", "dataset"),
    (None, "name", "experiment_identifier", "values"),
]

This mixture of filtering and indexing could maybe be refactored in a more explicit way...?
But I also think it is working conveniently already... hmm...
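
Purely as a speculative sketch, assuming each tuple means (key to descend into, attribute to match, expected value, optional final key), named filters could make the intent explicit:

from dataclasses import dataclass

@dataclass(frozen=True)
class ChildFilter:
    """Pick the child under ``key`` whose ``attribute`` equals ``value``."""
    attribute: str
    value: str
    key: str | None = "children"

METADATA_PROPOSAL_PATH = [
    "children",
    ChildFilter(key="children", attribute="name", value="entry"),
    ChildFilter(key="config", attribute="module", value="dataset"),
    ChildFilter(key=None, attribute="name", value="experiment_identifier"),
    "values",
]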

Ingestor

  • Maybe not redefining, but just better type-hinting or wrapping some of the arguments into a tighter data structure.

    def ingest_message(

  • Some of the if statements can be merged into one if we let get_instrument accept defaultInstrument, like get_nested_value_with_default does (see the sketch after the snippet).

    if instrument_id or instrument_name:
        instrument = get_instrument(
            scClient,
            instrument_id,
            instrument_name.lower(),
            logger,
        )
    if instrument is None:
        instrument = defaultInstrument
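
    A sketch of the merged version, assuming get_instrument grows a hypothetical default argument:

    instrument = get_instrument(
        scClient,
        instrument_id,
        instrument_name.lower(),
        logger,
        default=defaultInstrument,  # hypothetical new argument
    )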

Documentation / Docstrings

Maybe we can collect the comments into docstrings. Copilot is good at it : D
And the documentation in the readme might be sufficient but we probably need to clean that up too.
Here is a list of some sections that might be needed in the documentation.

  • Deployment Environment/Method
  • Whole structure (briefly, maybe mermaid graph : D)
  • Logging options
  • API dependencies
  • Kafka Consumer/Message Handling
    It seems quite straightforward, but since it's the core part, maybe we can document it better.
    Shall we write some mermaid graph : D....??