configs run -cdir fails when 40 or more files are present (with Oracle source connection) #1127

Closed
sundar-mudupalli-work opened this issue May 1, 2024 · 2 comments · Fixed by #1130
Labels
priority: p1 High priority. Fix may be included in the next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@sundar-mudupalli-work
Contributor

Hi,

When I run data-validation -v configs run -cdir test_yamls and the directory has 40 or more files (with an Oracle source connection), I get this error message:

05/01/2024 03:55:17 AM-ERROR: Error Connection Type "Oracle" could not connect: (cx_Oracle.DatabaseError) ORA-04031: unable to allocate 226048 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","Fixed Uga")
(Background on this error at: https://sqlalche.me/e/14/4xp6) occurred while running config file row_access_control_code.yaml. Skipping it for now.

Upon review, the issue seems to be in the code. When presented with multiple files, we build a list of config_manager objects, one for each yaml file. After we have built the config managers for all the yaml files, we then execute the validation for each config manager. This is in lines 361-371 of the file data_validation/__main__.py as follows:

            for file in config_file_names:
                config_managers.extend(build_config_managers_from_yaml(args, file))
    else:
        if args.kube_completions:
            logging.warning(
                "--kube-completions or -kc specified, which requires a config directory, however a specific config file is provided."
            )
        config_file_path = _get_arg_config_file(args)
        config_managers = build_config_managers_from_yaml(args, config_file_path)

    run_validations(args, config_managers)

This all seems fine until we have 40 yaml files. We build 40 config manager objects before we execute any validation, which surely should not be a problem? However, when we build a config manager object we open a connection to the database, so we have 40 connections open before we have run any validation. The Oracle library seems to have a limit of 39 or so connections, and after that it complains of running out of shared memory. It appears the right approach might be to build a config_manager object, run its validation, and then repeat for the next file. The code that opens the connections is in data_validation/__main__.py, lines 374-386, as follows:

def build_config_managers_from_yaml(args, config_file_path):
    """Returns List[ConfigManager] instances ready to be executed."""
    if "config_dir" in args and args.config_dir:
        yaml_configs = cli_tools.get_validation(config_file_path, args.config_dir)
    else:
        yaml_configs = cli_tools.get_validation(config_file_path)

    mgr = state_manager.StateManager()
    source_conn = mgr.get_connection_config(yaml_configs[consts.YAML_SOURCE])
    target_conn = mgr.get_connection_config(yaml_configs[consts.YAML_TARGET])

    source_client = clients.get_data_client(source_conn)
    target_client = clients.get_data_client(target_conn)
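
To make the difference between the two orderings concrete, here is a minimal, self-contained sketch. It does not use the real data_validation code: ConnectionCounter, build_managers and run_and_release are hypothetical stand-ins, and the sketch assumes that a file's connection is released once its validations have run. It only compares the peak number of open connections under "build everything, then run" versus "build one, run it, repeat".

```python
from typing import List


class ConnectionCounter:
    """Hypothetical stand-in for the Oracle source; it only counts open connections."""

    def __init__(self) -> None:
        self.open_now = 0
        self.peak = 0

    def open(self) -> None:
        self.open_now += 1
        self.peak = max(self.peak, self.open_now)

    def close(self) -> None:
        self.open_now -= 1


def build_managers(db: ConnectionCounter, file_name: str) -> List[str]:
    """Stand-in for build_config_managers_from_yaml: building opens a connection."""
    db.open()
    return [file_name]


def run_and_release(db: ConnectionCounter, managers: List[str]) -> None:
    """Stand-in for run_validations; ASSUMES each connection is released afterwards."""
    for _ in managers:
        db.close()


def eager(db: ConnectionCounter, files: List[str]) -> None:
    """Current shape: build every manager first, then run them all."""
    managers: List[str] = []
    for file_name in files:
        managers.extend(build_managers(db, file_name))
    run_and_release(db, managers)


def one_at_a_time(db: ConnectionCounter, files: List[str]) -> None:
    """Proposed shape: build, run and release for each file before the next."""
    for file_name in files:
        run_and_release(db, build_managers(db, file_name))


files = [f"validation_{i}.yaml" for i in range(40)]

eager_db = ConnectionCounter()
eager(eager_db, files)

lazy_db = ConnectionCounter()
one_at_a_time(lazy_db, files)

print(eager_db.peak, lazy_db.peak)  # prints "40 1"
```

In this toy model the eager ordering peaks at one open connection per file, while the per-file ordering never holds more than one. Whether the real clients release their connections after a run is something the sketch assumes rather than shows.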

Suggestions on how to fix this?

@nj1973
Contributor

nj1973 commented May 1, 2024

The error ORA-04031: unable to allocate 226048 bytes of shared memory indicates that the Oracle database instance is undersized for the volume of connections. I don't think this can be fixed from the code side. Assuming this is a small test instance under your control, we should increase the size of the shared pool.

@sundar-mudupalli-work
Contributor Author

We could change the Oracle settings to allow for more connections, but I have 353 yaml files in that directory. I think the approach of opening connections for every yaml file before running any validation is the problem.

@helensilva14 added the type: bug and priority: p1 labels on May 2, 2024