This script generates a SQL file suitable for running against a SQL Server instance. It expects SQL Server type information from the database.  
The data analysis query performs a basic "what's actually in the fields" analysis, such as null counts and distinct counts. The intent is to help identify unused fields for satellite splitting.
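
As a rough illustration of the per-field profiling described above, the sketch below computes a row count, a null count and a distinct count for a single column. It runs against an in-memory SQLite table rather than SQL Server, and the `profile_column` helper, the `customer` table and its columns are all hypothetical; the real query is produced by the template and may differ.

```python
import sqlite3


def profile_column(connection, table, column):
    """Return (row_count, null_count, distinct_count) for one column.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so this assumes trusted table and column names.
    """
    sql = f'''
        SELECT
            COUNT(*) AS row_count
            , SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) AS null_count
            , COUNT(DISTINCT "{column}") AS distinct_count
        FROM "{table}"
    '''
    return connection.execute(sql).fetchone()


# Toy data: 'fax' is never populated, 'region' has two distinct values
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE customer (id INTEGER, fax TEXT, region TEXT)")
demo_conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, None, "N"), (2, None, "S"), (3, None, "N")],
)

fax_profile = profile_column(demo_conn, "customer", "fax")
region_profile = profile_column(demo_conn, "customer", "region")
```

A field such as `fax`, where every row is null and the distinct count is zero, is the kind of candidate this analysis is meant to surface.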

The output is designed to be run through SSMS with "results to file", and the file it produces will need post-processing.  
The generated SQL is split into blocks of up to 50 fields per query (the chunk size used in the code below), so as not to hit SQL Server's internal limit on the number of SELECTs in a query.

1. Open SQL Server Management Studio
2. Go to Tools > Options, then Query Results > SQL Server > Results To Text
3. In the right panel, there is a drop-down box called Output Format
4. Choose Comma Delimited and click OK
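
The post-processing of the SSMS output file could look something like the sketch below. It assumes the comma-delimited file contains, for each result set, a header row, an underline row of dashes, the data rows, and a trailing "(n rows affected)" line; the `parse_ssms_text` helper and the column names in the sample are hypothetical, so check them against a real output file before relying on this.

```python
import csv
import re

# Matches "(1 row affected)", "(3 rows affected)" and "(3 row(s) affected)"
ROWS_AFFECTED = re.compile(r"^\(\d+ row(\(s\)|s)? affected\)$")


def parse_ssms_text(text):
    """Collect the data rows of every result set into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or ROWS_AFFECTED.match(stripped):
            header = None  # a blank or row-count line ends the current result set
            continue
        if set(stripped) <= set("-,"):
            continue  # underline row beneath the header
        fields = [field.strip() for field in next(csv.reader([line]))]
        if header is None:
            header = fields  # first content line of a result set is its header
            continue
        rows.append(dict(zip(header, fields)))
    return rows


# Sample text in the assumed layout (column names are illustrative)
sample_output = """
TABLE_NAME,COLUMN_NAME,NULL_COUNT
----------,-----------,----------
customer,fax,3

(1 row affected)

TABLE_NAME,COLUMN_NAME,NULL_COUNT
----------,-----------,----------
customer,region,0

(1 row affected)
"""

parsed_rows = parse_ssms_text(sample_output)
```

All values come back as strings, so numeric columns would still need converting before analysis.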

- Get the list of attributes in the target structures from the attributes table
- Spin the DataFrames into JSON dicts, and split them down to appropriate sizes
- Throw the result at Jinja

In [6]:
import json
import os
import sqlite3

from pathlib import Path
from jinja2 import Template, Environment
from datetime import datetime

import pandas as pd
import numpy as np

conn = sqlite3.connect('../full_metadata.db')
cur = conn.cursor()

template_path = "../export_templates/SQLServer/"
template_filename = "SQLServer_data_analysis_query.tem"
template = Path(os.path.join(template_path, template_filename)).read_text()

time_string = datetime.now().strftime('%Y%m%d%H%M%S')
output_path = "../export_output_files/SQLServer/"
output_filename = "SQLServer_data_analysis_query_" + time_string + ".txt"

In [2]:
# set variables for the extract

# Set the date string that you want to appear as the 'SCAN_DATE' in the output
# either an auto-string for ease of use, or manual if you want a specific date
date_string = datetime.now().strftime('%Y%m%d')
#date_string = '20241231' # A date string in the format YYYYMMDD (as above)

target_server_name = r'VHLOPRHP2S01\P2SLIVE' # raw string, because '\P' is not a valid escape sequence
target_db_name = 'Pro2_ih'
target_schema_name = '' # a value or '' (blank)

In [3]:
# get list of attributes in target structures from attributes table

if (target_schema_name != ''):
  target_schema_phrase = """AND "SCHEMA_NAME" = ?"""
else:
  target_schema_phrase = ''

sql_query_core = """
SELECT
  '"' || SCHEMA_NAME || '"."' || TABLE_NAME || '"' AS "TableKeyPhrase"
	, SCHEMA_NAME
	, TABLE_NAME
	, COLUMN_NAME
	, DATA_TYPE
FROM
	bv_SQLServerPhysicalAttributeOutputForAnalysis
WHERE
	1=1
	AND "SERVER_NAME" = ?
	AND "DATABASE_NAME" = ?
"""

sql_query = sql_query_core + target_schema_phrase

if (target_schema_name != ''):
  sql_parameters = (target_server_name, target_db_name, target_schema_name)
else:
  sql_parameters = (target_server_name, target_db_name)

df = pd.read_sql_query(sql_query, conn, params = sql_parameters)
df = df.replace({np.nan: None})

#df

In [4]:
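# spin dfs to json dicts

# Build {TableKeyPhrase: [column records]} by iterating over the groups directly.
# This avoids groupby.apply, whose handling of the grouping column has changed in recent pandas versions.
table_group_dict = {
  key: group.drop(columns=['TableKeyPhrase']).to_dict(orient='records')
  for key, group in df.groupby('TableKeyPhrase')
}

# Split each table's columns into chunks of at most 50, so that no single query covers more than 50 fields
row_group_list = [values[i:i + 50] for values in table_group_dict.values() for i in range(0, len(values), 50)]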

template_data = {
  "date_string": date_string
  , "target_server_name": target_server_name
  , "target_db_name": target_db_name
  , "target_row_groups": row_group_list
}

#template_data
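
To make the chunking step above concrete, here is a minimal sketch of the same idea in plain Python, without pandas. The `chunk_by_table` helper and the toy records are hypothetical; the point is only that chunks are cut per table, so a block never mixes columns from two tables.

```python
def chunk_by_table(records, chunk_size=50):
    """Group column records by table, then cut each group into fixed-size chunks."""
    grouped = {}
    for record in records:
        key = (record["SCHEMA_NAME"], record["TABLE_NAME"])
        grouped.setdefault(key, []).append(record)
    return [
        columns[i:i + chunk_size]
        for columns in grouped.values()
        for i in range(0, len(columns), chunk_size)
    ]


# Toy metadata: one wide table with 120 columns and one narrow table with 3
wide = [
    {"SCHEMA_NAME": "dbo", "TABLE_NAME": "wide_table", "COLUMN_NAME": f"col_{n}"}
    for n in range(120)
]
narrow = [
    {"SCHEMA_NAME": "dbo", "TABLE_NAME": "narrow_table", "COLUMN_NAME": f"col_{n}"}
    for n in range(3)
]

chunks = chunk_by_table(wide + narrow)
chunk_sizes = [len(chunk) for chunk in chunks]
```

With these inputs the wide table is cut into chunks of 50, 50 and 20 columns, and the narrow table gets its own chunk of 3.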



In [7]:
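# throw at jinja

# autoescape is meant for HTML output; for SQL it would turn characters such as
# quotes or & in rendered values into HTML entities, so it is switched off here
j2_template = Template(template, autoescape=False, trim_blocks=True, lstrip_blocks=True)

rendered_template_string = j2_template.render(template_data=template_data)
os.makedirs(output_path, exist_ok=True) # make sure the output folder exists
with open(os.path.join(output_path, output_filename), "w") as text_file:
  text_file.write(rendered_template_string)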
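
The template file itself is not shown in this notebook. Purely as an assumption about what such a template might look like, the sketch below renders a hypothetical template over the same `template_data` shape: one block per row group, with the per-column SELECTs in a block joined by `UNION ALL`. The template text, the output column aliases and the sample values are all invented for illustration, and autoescaping is off in this sketch because the output is SQL rather than HTML.

```python
from jinja2 import Template

# Hypothetical template text; the real .tem file may be structured differently
sketch_template_text = """\
{% for group in template_data.target_row_groups %}
-- block {{ loop.index }}
{% for col in group %}
SELECT
  '{{ template_data.date_string }}' AS SCAN_DATE
  , '{{ col.TABLE_NAME }}' AS TABLE_NAME
  , '{{ col.COLUMN_NAME }}' AS COLUMN_NAME
  , COUNT(*) AS TOTAL_ROWS
  , SUM(CASE WHEN [{{ col.COLUMN_NAME }}] IS NULL THEN 1 ELSE 0 END) AS NULL_COUNT
  , COUNT(DISTINCT [{{ col.COLUMN_NAME }}]) AS DISTINCT_COUNT
FROM [{{ template_data.target_db_name }}].[{{ col.SCHEMA_NAME }}].[{{ col.TABLE_NAME }}]
{% if not loop.last %}
UNION ALL
{% endif %}
{% endfor %}
;
{% endfor %}
"""

# Sample data in the same shape as template_data above (values are made up)
sketch_data = {
    "date_string": "20241231",
    "target_server_name": "EXAMPLE\\INSTANCE",
    "target_db_name": "ExampleDb",
    "target_row_groups": [
        [
            {"SCHEMA_NAME": "dbo", "TABLE_NAME": "customer",
             "COLUMN_NAME": "fax", "DATA_TYPE": "varchar"},
            {"SCHEMA_NAME": "dbo", "TABLE_NAME": "customer",
             "COLUMN_NAME": "region", "DATA_TYPE": "varchar"},
        ]
    ],
}

sketch_sql = Template(
    sketch_template_text, autoescape=False, trim_blocks=True, lstrip_blocks=True
).render(template_data=sketch_data)
```

`DATA_TYPE` is carried in each record but unused here; a real template would presumably use it to decide which checks are valid for each column type.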