Node Scraper is a tool that performs automated data collection and analysis for system debugging.
Node Scraper requires Python 3.10+. After cloning this repository, run the dev-setup.sh script with 'source'. The script creates an editable install of Node Scraper in a Python virtual environment and configures the project's pre-commit hooks.
source dev-setup.sh
The Node Scraper CLI can be used to run Node Scraper plugins on a target system. The following CLI options are available:
usage: node-scraper [-h] [--sys-name STRING] [--sys-location {LOCAL,REMOTE}] [--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}] [--sys-sku STRING]
                    [--sys-platform STRING] [--plugin-configs [STRING ...]] [--system-config STRING] [--connection-config STRING] [--log-path STRING]
                    [--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}] [--gen-reference-config] [--skip-sudo]
                    {summary,run-plugins,describe,gen-plugin-config} ...

node scraper CLI

positional arguments:
  {summary,run-plugins,describe,gen-plugin-config}
                        Subcommands
    summary             Generates summary csv file
    run-plugins         Run a series of plugins
    describe            Display details on a built-in config or plugin
    gen-plugin-config   Generate a config for a plugin or list of plugins

options:
  -h, --help            show this help message and exit
  --sys-name STRING     System name (default: <my_system_name>)
  --sys-location {LOCAL,REMOTE}
                        Location of target system (default: LOCAL)
  --sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}
                        Specify system interaction level, used to determine the type of actions that plugins can perform (default: INTERACTIVE)
  --sys-sku STRING      Manually specify SKU of system (default: None)
  --sys-platform STRING
                        Specify system platform (default: None)
  --plugin-configs [STRING ...]
                        built-in config names or paths to plugin config JSONs. Available built-in configs: NodeStatus (default: None)
  --system-config STRING
                        Path to system config json (default: None)
  --connection-config STRING
                        Path to connection config json (default: None)
  --log-path STRING     Specifies local path for node scraper logs, use 'None' to disable logging (default: .)
  --log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
                        Change python log level (default: INFO)
  --gen-reference-config
                        Generate reference config from system. Writes to ./reference_config.json. (default: False)
  --skip-sudo           Skip plugins that require sudo permissions (default: False)
Node Scraper can operate in two modes, LOCAL and REMOTE, determined by the --sys-location argument.
- LOCAL (default): Node Scraper is installed and run directly on the target system. All data collection and plugin execution occur locally.
- REMOTE: Node Scraper runs on your local machine but targets a remote system over SSH. In this mode, Node Scraper does not need to be installed on the remote system; all commands are executed remotely via SSH.
To use remote execution, specify --sys-location REMOTE and provide a connection configuration file with --connection-config.
node-scraper --sys-name <remote_host> --sys-location REMOTE --connection-config ./connection_config.json run-plugins DmesgPlugin
Example connection_config.json:
{
    "InBandConnectionManager": {
        "hostname": "remote_host.example.com",
        "port": 22,
        "username": "myuser",
        "password": "mypassword",
        "key_filename": "/path/to/private/key"
    }
}
Notes:
- If using SSH keys, specify key_filename instead of password.
- The remote user must have permissions to run the requested plugins and access required files. If needed, use the --skip-sudo argument to skip plugins requiring sudo.
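The connection config can also be generated from a script. The sketch below builds a dictionary with the same keys as the example above and writes it to a JSON file. The helper name build_connection_config is hypothetical, and the rule that exactly one of key_filename or password must be given is an assumption drawn from the note above, not a documented requirement.

```python
import json
import tempfile
from pathlib import Path


def build_connection_config(hostname, username, key_filename=None, password=None, port=22):
    """Return a dict shaped like the connection config example (hypothetical helper)."""
    # Assumption: supply either an SSH key or a password, not both.
    if (key_filename is None) == (password is None):
        raise ValueError("provide exactly one of key_filename or password")
    params = {"hostname": hostname, "port": port, "username": username}
    if key_filename is not None:
        params["key_filename"] = key_filename
    else:
        params["password"] = password
    return {"InBandConnectionManager": params}


config = build_connection_config(
    "remote_host.example.com", "myuser", key_filename="/path/to/private/key"
)

# Written to a temporary directory here; in practice, write it wherever
# --connection-config will point.
out_path = Path(tempfile.mkdtemp()) / "connection_config.json"
out_path.write_text(json.dumps(config, indent=4))
print(out_path.read_text())
```

The resulting file can then be passed to node-scraper with --connection-config.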
Plugins to run can be specified in two ways: with a plugin JSON config file or with the 'run-plugins' subcommand. The two options are not mutually exclusive and can be used together.
You can use the describe subcommand to display details about built-in configs or plugins.
List all built-in configs:
node-scraper describe config
Show details for a specific built-in config:
node-scraper describe config <config-name>
List all available plugins:
node-scraper describe plugin
Show details for a specific plugin:
node-scraper describe plugin <plugin-name>
The plugins to run and their associated arguments can also be specified directly on the CLI using the 'run-plugins' subcommand. With this subcommand, you specify a plugin name followed by the arguments for that plugin. Multiple plugins can be specified at once.
You can view the available arguments for a particular plugin by running node-scraper run-plugins <plugin-name> -h:
usage: node-scraper run-plugins BiosPlugin [-h] [--collection {True,False}] [--analysis {True,False}] [--system-interaction-level STRING]
                                           [--data STRING] [--exp-bios-version [STRING ...]] [--regex-match {True,False}]

options:
  -h, --help            show this help message and exit
  --collection {True,False}
  --analysis {True,False}
  --system-interaction-level STRING
  --data STRING
  --exp-bios-version [STRING ...]
  --regex-match {True,False}
Examples
Run a single plugin
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123
Run multiple plugins
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123 RocmPlugin --exp-rocm TestRocm123
Run plugins without specifying args (plugin defaults will be used)
node-scraper run-plugins BiosPlugin RocmPlugin
Use plugin configs and 'run-plugins'
node-scraper run-plugins BiosPlugin
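When node-scraper is driven from a script, the same command lines can be assembled programmatically. The sketch below is a hypothetical helper (build_run_plugins_argv is not part of Node Scraper) that turns a mapping of plugin names to options into an argument list. It only uses flags that appear in the examples above and does not invoke the tool.

```python
import shlex


def build_run_plugins_argv(plugins, global_options=None):
    """Build an argv list for `node-scraper run-plugins` (hypothetical helper).

    plugins: dict mapping plugin name -> dict of option name -> value or list of values
    global_options: top-level options placed before the subcommand
    """
    argv = ["node-scraper", *(global_options or []), "run-plugins"]
    for plugin_name, options in plugins.items():
        argv.append(plugin_name)
        for option, value in options.items():
            argv.append(f"--{option}")
            # Options such as --exp-bios-version accept several values.
            values = value if isinstance(value, (list, tuple)) else [value]
            argv.extend(str(v) for v in values)
    return argv


argv = build_run_plugins_argv(
    {
        "BiosPlugin": {"exp-bios-version": "TestBios123"},
        "RocmPlugin": {"exp-rocm": "TestRocm123"},
    },
    global_options=["--log-level", "DEBUG"],
)
print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv, check=True) on a machine where node-scraper is installed; building a list rather than a string avoids shell quoting problems.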
The 'gen-plugin-config' subcommand can be used to generate a plugin config JSON file for a plugin or list of plugins, which can then be customized. Plugin arguments that have default values will be prepopulated in the JSON file; arguments without default values will have a value of 'null'.
Examples
Generate a config for the DmesgPlugin:
node-scraper gen-plugin-config --plugins DmesgPlugin
This would produce the following config:
{
    "global_args": {},
    "plugins": {
        "DmesgPlugin": {
            "collection": true,
            "analysis": true,
            "system_interaction_level": "INTERACTIVE",
            "data": null,
            "analysis_args": {
                "analysis_range_start": null,
                "analysis_range_end": null,
                "check_unknown_dmesg_errors": true,
                "exclude_category": null
            }
        }
    },
    "result_collators": {}
}
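A generated config is meant to be customized before use. As a minimal sketch, the snippet below takes the DmesgPlugin config shown above as a Python dictionary and overrides one analysis argument without modifying the original. The customize helper is hypothetical; in practice the dictionary would be loaded from the generated JSON file with json.load.

```python
import copy
import json

# The config produced by gen-plugin-config for DmesgPlugin, as shown above.
generated = {
    "global_args": {},
    "plugins": {
        "DmesgPlugin": {
            "collection": True,
            "analysis": True,
            "system_interaction_level": "INTERACTIVE",
            "data": None,
            "analysis_args": {
                "analysis_range_start": None,
                "analysis_range_end": None,
                "check_unknown_dmesg_errors": True,
                "exclude_category": None,
            },
        }
    },
    "result_collators": {},
}


def customize(config, plugin, analysis_overrides):
    """Return a copy of config with analysis_args overridden for one plugin."""
    updated = copy.deepcopy(config)
    updated["plugins"][plugin]["analysis_args"].update(analysis_overrides)
    return updated


custom = customize(generated, "DmesgPlugin", {"check_unknown_dmesg_errors": False})
print(json.dumps(custom, indent=4))
```

Writing the result back out with json.dump gives a file that can be passed to --plugin-configs.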
The 'summary' subcommand can be used to combine results from multiple runs of node-scraper into a single summary.csv file. Sample run:
node-scraper summary --summary_path /<path_to_node-scraper_logs>
This will generate a new file, '/<path_to_node-scraper_logs>/summary.csv', containing the results from all 'nodescraper.csv' files under '/<path_to_node-scraper_logs>'.
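To picture what combining per-run CSV files involves, the sketch below walks a log directory, reads every nodescraper.csv it finds, and writes the rows to a single summary.csv. This is an illustration only, not the implementation behind the summary subcommand, and the column names used in the demo data (plugin, status) are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def combine_csvs(log_root, file_name="nodescraper.csv", out_name="summary.csv"):
    """Concatenate every file_name found under log_root into log_root/out_name."""
    log_root = Path(log_root)
    rows, fieldnames = [], []
    for csv_path in sorted(log_root.rglob(file_name)):
        with csv_path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Keep the union of all headers, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    out_path = log_root / out_name
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Demo with two fake run directories (hypothetical columns and values).
root = Path(tempfile.mkdtemp())
for run, status in (("run_1", "OK"), ("run_2", "ERROR")):
    run_dir = root / run
    run_dir.mkdir()
    (run_dir / "nodescraper.csv").write_text(f"plugin,status\nBiosPlugin,{status}\n")

summary_path = combine_csvs(root)
print(summary_path.read_text())
```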
A plugin JSON config should follow the structure of the plugin config model (PluginConfig in nodescraper.models.pluginconfig). The global_args field is a dictionary of global key-value pairs; values in global_args are passed to any plugin that supports the corresponding key. The plugins field is a dictionary mapping plugin names to sub-dictionaries of plugin arguments. Lastly, the result_collators field defines result collator classes that are run on the plugin results. By default, the CLI adds the TableSummary result collator, which prints a summary of each plugin’s results in tabular format to the console.
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": "TestBios123"
            }
        },
        "RocmPlugin": {
            "analysis_args": {
                "exp_rocm": "TestRocm123"
            }
        }
    }
}
Global args can be used to skip plugins that require sudo, or to enable or disable collection or analysis. The example below skips plugins that require sudo and disables analysis.
"global_args": {
    "collection_args": {
        "skip_sudo" : 1
    },
    "collection" : 1,
    "analysis" : 0
},
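The way global args reach individual plugins can be pictured with a small sketch. The function below is hypothetical and is not Node Scraper's implementation: it passes a global value to a plugin only when the plugin supports that key, and it assumes plugin-specific arguments take precedence over global ones, which the text above does not confirm. The merge is shallow, so nested dictionaries are replaced rather than combined.

```python
def resolve_plugin_args(global_args, plugin_args, supported_keys):
    """Combine global and plugin-specific args for one plugin (illustrative only)."""
    # Pass through only the global keys this plugin supports.
    resolved = {key: value for key, value in global_args.items() if key in supported_keys}
    # Assumption: values set for the plugin itself win over global values.
    resolved.update(plugin_args)
    return resolved


global_args = {
    "collection_args": {"skip_sudo": 1},
    "collection": 1,
    "analysis": 0,
}
plugin_args = {"analysis_args": {"exp_bios_version": "TestBios123"}}

# Hypothetical plugin that does not accept collection_args.
supported = {"collection", "analysis", "analysis_args"}

resolved = resolve_plugin_args(global_args, plugin_args, supported)
print(resolved)
```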
A plugin config can be used to compare the system data against the config specifications:
node-scraper --plugin-configs plugin_config.json
Here is an example of a comprehensive plugin config that specifies analyzer args for each plugin:
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": "3.5"
            }
        },
        "CmdlinePlugin": {
            "analysis_args": {
                "cmdline": "imgurl=test NODE=nodename selinux=0 serial console=ttyS1,115200 console=tty0",
                "required_cmdline": "selinux=0"
            }
        },
        "DkmsPlugin": {
            "analysis_args": {
                "dkms_status": "amdgpu/6.11",
                "dkms_version": "dkms-3.1",
                "regex_match": true
            }
        },
        "KernelPlugin": {
            "analysis_args": {
                "exp_kernel": "5.11-generic"
            }
        },
        "OsPlugin": {
            "analysis_args": {
                "exp_os": "Ubuntu 22.04.2 LTS"
            }
        },
        "PackagePlugin": {
            "analysis_args": {
                "exp_package_ver": {
                    "gcc": "11.4.0"
                },
                "regex_match": false
            }
        },
        "RocmPlugin": {
            "analysis_args": {
                "exp_rocm": "6.5"
            }
        }
    },
    "result_collators": {},
    "name": "plugin_config",
    "desc": "My golden config"
}
The --gen-reference-config option can be used to generate a reference config populated with the current system configuration. For plugins that use analyzer args, those args will be populated with system data where applicable. Sample command:
node-scraper --gen-reference-config run-plugins BiosPlugin OsPlugin
This will generate the following config:
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": [
                    "M17"
                ],
                "regex_match": false
            }
        },
        "OsPlugin": {
            "analysis_args": {
                "exp_os": [
                    "8.10"
                ],
                "exact_match": true
            }
        }
    },
    "result_collators": {}
}
This config can later be used on a different platform for comparison by passing it with --plugin-configs, as described for plugin configs above:
node-scraper --plugin-configs reference_config.json
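Reference configs generated on two systems can also be compared directly, before running any plugins. The sketch below is a hypothetical helper that reports which analysis_args differ between two configs; the second config and its "M18" BIOS value are made up for the demonstration.

```python
def diff_analysis_args(reference, other):
    """Return {plugin: {arg: (reference_value, other_value)}} for differing args."""
    differences = {}
    ref_plugins = reference.get("plugins", {})
    other_plugins = other.get("plugins", {})
    for plugin in sorted(set(ref_plugins) | set(other_plugins)):
        ref_args = ref_plugins.get(plugin, {}).get("analysis_args", {})
        other_args = other_plugins.get(plugin, {}).get("analysis_args", {})
        changed = {
            key: (ref_args.get(key), other_args.get(key))
            for key in sorted(set(ref_args) | set(other_args))
            if ref_args.get(key) != other_args.get(key)
        }
        if changed:
            differences[plugin] = changed
    return differences


# Reference config from the example above.
reference = {
    "global_args": {},
    "plugins": {
        "BiosPlugin": {"analysis_args": {"exp_bios_version": ["M17"], "regex_match": False}},
        "OsPlugin": {"analysis_args": {"exp_os": ["8.10"], "exact_match": True}},
    },
    "result_collators": {},
}

# Hypothetical config from a second system with a different BIOS version.
other = {
    "global_args": {},
    "plugins": {
        "BiosPlugin": {"analysis_args": {"exp_bios_version": ["M18"], "regex_match": False}},
        "OsPlugin": {"analysis_args": {"exp_os": ["8.10"], "exact_match": True}},
    },
    "result_collators": {},
}

print(diff_analysis_args(reference, other))
```

A missing plugin or argument on either side shows up with None in the corresponding position of the tuple.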
An alternative way to generate a reference config is to use log files from a previous run. The example below uses log files from 'scraper_logs_<path>/':
node-scraper gen-plugin-config --gen-reference-config-from-logs scraper_logs_<path>/ --output-path custom_output_dir
This will generate a reference config that includes the plugins with logged results in 'scraper_logs_<path>/' and save the new config to 'custom_output_dir/reference_config.json'.
Node Scraper can be integrated into another Python tool by using its classes and functionality directly. The example below shows how to create plugins and run the associated data collection and analysis. Sample run command:
python3 sample.py
sample.py file:
import logging
import sys

from nodescraper.plugins.inband.bios.bios_plugin import BiosPlugin
from nodescraper.plugins.inband.bios.analyzer_args import BiosAnalyzerArgs
from nodescraper.plugins.inband.kernel.kernel_plugin import KernelPlugin
from nodescraper.plugins.inband.kernel.analyzer_args import KernelAnalyzerArgs
from nodescraper.plugins.inband.os.os_plugin import OsPlugin
from nodescraper.plugins.inband.os.analyzer_args import OsAnalyzerArgs
from nodescraper.models.systeminfo import SystemInfo, OSFamily
from nodescraper.enums import EventPriority, SystemLocation
from nodescraper.resultcollators.tablesummary import TableSummary
from nodescraper.connection.inband.inbandmanager import InBandConnectionManager
from nodescraper.connection.inband.sshparams import SSHConnectionParams
from nodescraper.pluginregistry import PluginRegistry
from nodescraper.models.pluginconfig import PluginConfig
from nodescraper.pluginexecutor import PluginExecutor


def main():
    # set up a custom logger
    log_level = "INFO"
    handlers = [logging.StreamHandler(stream=sys.stdout)]
    logging.basicConfig(
        force=True,
        level=log_level,
        format="%(asctime)25s %(levelname)10s %(name)25s | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S %Z",
        handlers=handlers,
        encoding="utf-8",
    )
    logging.root.setLevel(logging.INFO)
    logging.getLogger("paramiko").setLevel(logging.ERROR)
    logger = logging.getLogger("nodescraper")

    # set up system info
    system_info = SystemInfo(
        name="test_host",
        platform="X",
        os_family=OSFamily.LINUX,
        sku="some_sku",
    )

    # instantiate plugins
    bios_plugin = BiosPlugin(system_info=system_info, logger=logger)
    kernel_plugin = KernelPlugin(system_info=system_info, logger=logger)

    # launch data collection
    _ = bios_plugin.collect()
    _ = kernel_plugin.collect()

    # launch data analysis
    bios_plugin.analyze(analysis_args=BiosAnalyzerArgs(exp_bios_version="XYZ"))
    kernel_plugin.analyze(analysis_args=KernelAnalyzerArgs(exp_kernel="ABC"))

    # log plugin data models
    logger.info(kernel_plugin.data.model_dump())
    logger.info(bios_plugin.data.model_dump())

    # alternate method: run collection and analysis in one call
    all_res = []
    bios_result = bios_plugin.run(analysis_args={"exp_bios_version": "ABC"})
    all_res.append(bios_result)
    table_summary = TableSummary()
    table_summary.collate_results(all_res, None)

    # remote connection
    system_info.location = SystemLocation.REMOTE
    ssh_params = SSHConnectionParams(
        hostname="my_system",
        port=22,
        username="my_username",
        key_filename="/home/user/.ssh/ssh_key",
    )
    conn_manager = InBandConnectionManager(system_info=system_info, connection_args=ssh_params)
    os_plugin = OsPlugin(system_info=system_info, logger=logger, connection_manager=conn_manager)
    os_plugin.run(analysis_args=OsAnalyzerArgs(exp_os="DEF"))

    # run multiple plugins through a queue
    system_info.location = SystemLocation.LOCAL
    config_dict = {
        "global_args": {
            "collection": 1,
            "analysis": 1,
        },
        "plugins": {
            "BiosPlugin": {
                "analysis_args": {
                    "exp_bios_version": "123",
                }
            },
            "KernelPlugin": {
                "analysis_args": {
                    "exp_kernel": "ABC",
                }
            },
        },
        "result_collators": {},
        "name": "plugin_config",
        "desc": "Auto generated config",
    }
    config1 = PluginConfig(**config_dict)
    plugin_executor = PluginExecutor(
        logger=logger,
        plugin_configs=[config1],
        system_info=system_info,
    )
    results = plugin_executor.run_queue()


if __name__ == "__main__":
    main()