No metrics for collector: query works in one case, not in other #154

Closed
jakob-reesalu opened this issue Oct 20, 2022 · 18 comments

jakob-reesalu commented Oct 20, 2022

Hi! I got one collector working, but I'm having issues with the second one. It's something with the query, which will be a long-running one in the end, around 15-30 minutes (it will run once a day). Its purpose is to scan all indexes on some tables across all databases.

I started chopping the query up to find when it works and when it doesn't. On some occasions I got "context deadline exceeded" messages. I read some old threads saying that one might need to increase the Prometheus and SQL Exporter scrape_timeout, but this didn't make a difference. Eventually I cut the query down bit by bit, and even when it takes less than 1 second in Microsoft SQL Server Management Studio, no metrics would be provided - so timing shouldn't be an issue, right?

I finally got it working by selecting indexes for only one specific DB; this version provides metrics in the UI:

      SELECT
          DB_NAME(DDIPS.database_id) AS 'database',
          DDIPS.index_id AS 'index',
          DDIPS.avg_fragmentation_in_percent AS 'avgFragmentationInPercent'
      FROM 
          sys.dm_db_index_physical_stats (DB_ID('<DatabaseName>'), NULL, NULL, NULL, NULL) AS DDIPS
      WHERE 
          DDIPS.database_id = DB_ID('<DatabaseName>')
      -- AND DDIPS.object_id = OBJECT_ID('AvlEvent')

This query provides results instantly on startup. BUT, if I uncomment the last line, I suddenly get no results. Both versions run instantly in SSMS.
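One possible explanation (an assumption on the editor's part, not verified in this thread): `OBJECT_ID('AvlEvent')` resolves the name in the connection's *current* database, so if SQL Exporter connects with a different default database than the SSMS session (e.g. `master`), it returns NULL and the predicate matches nothing. A fully qualified three-part name sidesteps this:

```sql
-- Hypothetical sketch: qualify the object name so it resolves regardless of the
-- connection's default database ('<DatabaseName>' and 'AvlEvent' are the
-- placeholders from the query above).
SELECT
    DB_NAME(DDIPS.database_id) AS 'database',
    DDIPS.index_id AS 'index',
    DDIPS.avg_fragmentation_in_percent AS 'avgFragmentationInPercent'
FROM
    sys.dm_db_index_physical_stats (DB_ID('<DatabaseName>'), NULL, NULL, NULL, NULL) AS DDIPS
WHERE
    DDIPS.database_id = DB_ID('<DatabaseName>')
    AND DDIPS.object_id = OBJECT_ID('<DatabaseName>.dbo.AvlEvent')
```

If that is the cause, the query would then behave identically no matter which database the exporter's connection lands in.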

SQL Exporter config:

# Global defaults.
global:
  scrape_timeout: 120s
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 0s
  # Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
  # as will concurrent scrapes.
  max_connections: 3
  # Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
  # always be the same as max_connections.
  max_idle_connections: 3
  # Maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.
  # If 0, connections are not closed due to a connection's age.
  max_connection_lifetime: 5m
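The interplay of these two timeouts can be sketched as follows. This is a simplified model based only on the config comments above (the exporter subtracts `scrape_timeout_offset` from the timeout Prometheus sends, and its own `scrape_timeout` acts as a cap); sql_exporter's actual logic may differ:

```python
def effective_scrape_deadline(prometheus_timeout_s: float,
                              exporter_scrape_timeout_s: float,
                              offset_s: float = 0.5) -> float:
    """Rough model of the deadline a collector query must beat.

    Assumption (from the config comments, not the source code): the exporter
    subtracts scrape_timeout_offset from the per-scrape timeout Prometheus
    sends, and the exporter's own global scrape_timeout is a hard cap.
    """
    from_prometheus = prometheus_timeout_s - offset_s
    return min(exporter_scrape_timeout_s, from_prometheus)

# With the configs in this issue (Prometheus 110s, exporter 120s, offset 0.5s),
# a query has roughly 109.5s before "context deadline exceeded" fires.
print(effective_scrape_deadline(110, 120, 0.5))
```

Under this model, whichever side is configured shorter always wins, which is why raising only one of the two timeouts may not help.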

Prometheus config:

global:
  scrape_interval:     115s # Default is 1m.
  evaluation_interval: 1m # Evaluate rules every 1 minute. Default is 1m.
  scrape_timeout: 110s     # For how long we let a scrape run before timing it out and it fails. Default is 10s.

Very grateful for any ideas on this!

P.S. I first ran version 0.5 but have changed to 0.9 now (I misunderstood which GitHub page was the active one). In the earlier version I could change the log level with "-v 4", but using the new flag like "-log.level debug" doesn't work (I wrote a comment here). So I can't get logging on this currently. The issue persists in the new version anyhow.


burningalchemist commented Oct 20, 2022

Hey @jakob-reesalu, I commented on the logging stuff in the PR, please take a look.

As for the query issue, I'm happy to pair up to investigate and solve the issue. 👍 Could you share the message you get if you run the full query? If you like we can also hop on a call or something, as I don't really have MSSQL around to try things out on my side. What do you think?

@jakob-reesalu (Author)

@burningalchemist Thanks a lot! I didn't try the full query iterating through all DBs, as it takes 15+ minutes. But when manipulating it to just do the first DB in the list (which takes 1 second when querying the DB server directly), I get no error, just an info message listing the returned columns. Log level is debug.

I'm up for a call, just gotta have lunch first. When are you available, and how can I contact you? Maybe on Google Meet?

@burningalchemist (Owner)

@jakob-reesalu Feel free to pick up the slot here: https://calendly.com/szyubin
I'm available roughly after 4pm today. Google Meet or Jitsi both should work. 👍

@jakob-reesalu (Author)

@burningalchemist Dang it, I gotta leave at about 4pm specifically today. xD I'll continue troubleshooting during the day, maybe a call tomorrow instead?

@burningalchemist (Owner)

@jakob-reesalu Yeah, tomorrow works too, also after 4pm unfortunately. In the meantime, if you could drop any intermediate results here, that would be useful. 👍

@burningalchemist (Owner)

Could you drop me a screenshot from SQL Management Studio (with sensitive parts removed if there are any) for the full query you expect to run? I might have an idea. 👍


jakob-reesalu commented Oct 20, 2022

@burningalchemist Alright! Here you go:

NOTE: There's a typo in AvgFragmentationInPREcent in the image, but that's been fixed.
[screenshot of the query in SSMS]

Here's the collector config:


# Similar to global.min_interval, but applies to the queries defined by this collector only.
min_interval: 3m
      
metrics:
  - metric_name: index_fragmentation
    type: gauge
    help: 'Checks all databases with x, x and x tables, returns index fragmentation as percentage when its >5%. Query executed once a day.'
    key_labels: 
      - Database
      - Table
      - Index
    values: [AvgFragmentationInPercent]
    query: |
      -- Delete temp table if it exists
      IF OBJECT_ID('tempdb.dbo.#tempDbList') IS NOT NULL
      DROP TABLE #tempDbList

      -- Put relevant database names in temp table
      SELECT IDENTITY(INT,1,1) AS Id, Name INTO #tempDbList
      FROM master.sys.databases WHERE
          (Name LIKE '%data%' AND Name LIKE '%<productname>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%'
          OR Name LIKE '%data%' AND Name LIKE '%<productname>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%')

      -- Iterate through each database and all indexes in <tablename>, <tablename> and <tablename> tables.
      DECLARE @currentId INT, @currentDb sysname, @dbCount INT, @sql NVARCHAR(max);

      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM #tempDbList

      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM #tempDbList
          WHERE Id = @currentId

          -- Get results, indexes with >5% fragmentation, for each db
          SELECT
              DB_NAME(DDIPS.database_id) AS 'Database',
              T.name AS 'Table',
              I.name AS 'Index',
              DDIPS.avg_fragmentation_in_percent AS 'AvgFragmentationInPercent'
          FROM
              sys.dm_db_index_physical_stats (DB_ID(@currentDb), NULL, NULL, NULL, NULL) AS DDIPS
              INNER JOIN sys.tables T ON T.object_id = DDIPS.object_id
              INNER JOIN sys.indexes I ON I.object_id = DDIPS.object_id
                  AND DDIPS.index_id = I.index_id
          WHERE
              DDIPS.database_id = DB_ID(@currentDb)
              AND DDIPS.avg_fragmentation_in_percent > 5
              AND (T.name = '<tablename>' OR
                   T.name = '<tablename>' OR
                   T.name = '<tablename>')
              AND I.name IS NOT NULL

          SET @currentId = @currentId + 1;
      END

      -- Cleanup
      DROP TABLE #tempDbList

@jakob-reesalu
Copy link
Author

jakob-reesalu commented Oct 20, 2022

@burningalchemist Having been away at lunch, letting the exporter run, I found a bunch of "context deadline exceeded" errors (this is the query posted above, but set to check only 1 DB):

"index_fragmentation_checks" is the collector at issue

ts=2022-10-20T10:55:21.945Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded"
ts=2022-10-20T10:57:16.926Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded"
ts=2022-10-20T10:59:11.914Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded"
[...the same pair of "context deadline exceeded" errors repeats roughly every two minutes through 11:39:26...]

Then afterwards getting these messages, without errors:

ts=2022-10-20T11:40:01.892Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=2929.098s"
ts=2022-10-20T11:40:01.895Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Collecting fresh metrics: min_interval=180.000s cache_age=2929.098s"
ts=2022-10-20T11:40:02.280Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[Database Table Index AvgFragmentationInPercent]\"collector=\"index_fragmentation_checks\", query=\"index_fragmentation\""
ts=2022-10-20T11:40:36.044Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=63.980s"
ts=2022-10-20T11:40:36.044Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=2993.078s"
ts=2022-10-20T11:41:27.066Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=115.009s"
ts=2022-10-20T11:41:27.066Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3044.109s"
ts=2022-10-20T11:43:22.060Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Collecting fresh metrics: min_interval=180.000s cache_age=230.024s"
ts=2022-10-20T11:43:22.060Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3159.122s"
ts=2022-10-20T11:43:22.092Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[Database Table Index AvgFragmentationInPercent]\"collector=\"index_fragmentation_checks\", query=\"index_fragmentation\""
ts=2022-10-20T11:44:14.923Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=52.871s"
ts=2022-10-20T11:44:14.923Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3211.993s"
ts=2022-10-20T11:45:17.027Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=114.985s"
ts=2022-10-20T11:45:17.027Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3274.106s"

@burningalchemist (Owner)

@jakob-reesalu thanks, let's try to catch up tomorrow, I have some thoughts. 👍

@jakob-reesalu (Author)

@burningalchemist Yo! So I tried usql as we talked about. It gave this error:
"Must declare the scalar variable "@currentDb"."

Problem is, the variable is declared in the query above. Googling it, some people suggested this can happen when mixing dynamic and static SQL, so eventually I put the whole query inside the @Sql variable and ran EXECUTE(@Sql) on it. That didn't give any error message, but no results either. Tried it in SQL Exporter and no change from before.

While testing various things along the way I would get a result "IF 21", but I have no idea what that was about, and the usql GitHub didn't have any threads on it.

An interesting note: the other queries I have that do work in SQL Exporter would also yield the error about "@currentDb" not being declared.

Anywho, due to time constraints I think I'll have to give up on the variables and the WHILE loop and maybe just make a bunch of static queries.

@jakob-reesalu (Author)

@burningalchemist FYI: the issue might be with doing INNER JOIN against other tables. I use a WHILE loop and declare variables in other queries that work, but I don't do joins there. So maybe that's the issue, in case you run into something similar in the future!

@burningalchemist (Owner)

@jakob-reesalu

Hey Jakob! Thanks for the follow up and the detailed investigation. 👍

Let me think about what I can do to get access to an MSSQL server to check your hypothesis.

If I get any results or explanation I'll mention you in this issue.

What's your MSSQL version by the way?


jakob-reesalu commented Oct 25, 2022

No worries :) The version is SQL Server 14.0.2037.2

Additionally, on INNER JOIN: I noticed that INNER JOIN is used in the example config, so it can't be that SQL Exporter can't deal with it. But I see that it only appears under the "queries" property (like the one below) and is then referenced via the "query_ref" property of a metric:

queries:
  # Populates `mssql_io_stall` and `mssql_io_stall_total`
  - query_name: mssql_io_stall
    query: |
      SELECT
        cast(DB_Name(a.database_id) as varchar) AS [db],
        sum(io_stall_read_ms) / 1000.0 AS [read],
        sum(io_stall_write_ms) / 1000.0 AS [write],
        sum(io_stall) / 1000.0 AS io_stall
      FROM
        sys.dm_io_virtual_file_stats(null, null) a
      INNER JOIN sys.master_files b ON a.database_id = b.database_id AND a.file_id = b.file_id
      GROUP BY a.database_id

So I gave it a try and placed my complex query under that property, then referenced it with "query_ref", and now I can at least see that the query hits the database - that didn't happen before. So this might work after all; I just need to fiddle a bit with scrape timeouts, since I'm getting "context deadline exceeded" errors doing this.

EDIT: @burningalchemist So, this only worked once; then I couldn't see the query anymore. But when I changed the query name it worked again, though not after that. I tried a query without INNER JOIN and it worked, giving results instantly. If I add an INNER JOIN I get nothing. Running the query in usql gives the result "IF 12"; not sure what that means.
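For reference, the shared-query pattern described above looks roughly like this in a collector file (a sketch with hypothetical names; the example config shipped with sql_exporter is the authoritative layout):

```yaml
# Collector file sketch (names are hypothetical).
collector_name: index_fragmentation_checks

metrics:
  - metric_name: index_fragmentation
    type: gauge
    help: 'Index fragmentation as a percentage.'
    key_labels: [Database, Table, Index]
    values: [AvgFragmentationInPercent]
    query_ref: index_fragmentation   # references query_name below

queries:
  - query_name: index_fragmentation
    query: |
      SELECT ...
```

The same `query_ref` can be reused by several metrics, which is why the example config keeps the SQL under `queries` instead of inlining it.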

@burningalchemist (Owner)

@jakob-reesalu Thanks for sharing the query, I'll use it on my end to investigate. 👍

So if you run it several times it returns nothing? Any logs on the SQL Exporter side or MSSQL side?


jakob-reesalu commented Oct 25, 2022

@burningalchemist You're welcome ;)

So, a colleague and I looked at the issue and changed some things, like declaring each variable in a separate DECLARE statement and changing temporary tables to table variables instead.

The query now looks like this:

-- Put relevant database names in a table variable
DECLARE @tempDbList AS TABLE (Id INT IDENTITY(1,1), [Name] SYSNAME)
INSERT INTO @tempDbList ([Name])
SELECT [Name]
FROM master.sys.databases WHERE
    (Name NOT IN ('master','model','msdb','tempdb') AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%');

DECLARE @tempFragmentations AS TABLE ([Database] SYSNAME, [Table] NVARCHAR(200), [Index] NVARCHAR(500), AvgFragmentationInPercent FLOAT)

DECLARE @currentId INT
DECLARE @currentDb SYSNAME
DECLARE @dbCount INT

-- Iterate through each database and all indexes in <table>, <table> and <table> tables.
SELECT @currentId = 1, @dbCount = 9
FROM @tempDbList;

WHILE @currentId <= @dbCount
BEGIN
    SELECT @currentDb = Name
    FROM @tempDbList
    WHERE Id = @currentId;

    -- Get results, indexes with >5% fragmentation, for each db
    INSERT INTO @tempFragmentations
    SELECT
        DB_NAME(DDIPS.database_id),
        T.name,
        I.name,
        DDIPS.avg_fragmentation_in_percent
    FROM
        sys.dm_db_index_physical_stats (DB_ID(@currentDb), NULL, NULL, NULL, NULL) AS DDIPS
        JOIN sys.tables T ON T.object_id = DDIPS.object_id
        JOIN sys.indexes I ON I.object_id = DDIPS.object_id
            AND DDIPS.index_id = I.index_id
    WHERE
        DDIPS.database_id = DB_ID(@currentDb)
        AND DDIPS.avg_fragmentation_in_percent > 5
        AND (T.name = '<table>' OR
             T.name = '<table>' OR
             T.name = '<table>')
        AND I.name IS NOT NULL;

    SET @currentId = @currentId + 1;
END

SELECT * FROM @tempFragmentations;

This gives the following result in usql now:

DECLARE 1
error: sqlserver: 1087: Must declare the table variable "@tempDbList".
error: sqlserver: 137: Must declare the scalar variable "@currentDb".
error: sqlserver: 137: Must declare the scalar variable "@currentDb".
error: sqlserver: 137: Must declare the scalar variable "@currentId".
error: sqlserver: 1087: Must declare the table variable "@tempFragmentations".
error: sqlserver: 1087: Must declare the table variable "@tempFragmentations".
error: test1.txt: sqlserver: 1087: Must declare the table variable "@tempFragmentations".

So... something with variable declarations, it seems, but I don't really get what. I had a look at the MSSQL driver https://github.com/microsoft/go-mssqldb but no clues there, really.


burningalchemist commented Oct 27, 2022

@jakob-reesalu I think the problem is that you're using a multi-statement query, which isn't supported. DECLARE is a statement itself. The other question is whether it works for some other queries (it shouldn't).

I'm still trying to get access to MSSQL 2014 somewhere to try things out myself. :)

@jakob-reesalu (Author)

@burningalchemist Hm... here's another query that actually works. It checks databases for whether a cleanup has been performed, and it has multiple statements:

      -- Delete temp table if it exists
      IF OBJECT_ID('tempdb.dbo.#tempCleanupPerformed') IS NOT NULL
      DROP TABLE #tempCleanupPerformed
              
      -- Get all database names and create temp table
      SELECT IDENTITY(INT,1,1) AS Id, Name INTO #tempCleanupPerformed
      FROM master.sys.Databases WHERE 
              (Name LIKE '%<something>%' AND Name LIKE '%<something>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%')
      
      ALTER TABLE #tempCleanupPerformed
      ADD SomeDateTimeColumn DATETIME2
              
      -- Iterate through all databases for <table> with a time greater than 3 months
      DECLARE @currentId INT, @currentDb sysname, @dbCount INT, @sql NVARCHAR(max);
              
      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM #tempCleanupPerformed
              
      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM #tempCleanupPerformed
          WHERE Id = @currentId
                  
          SET @sql = 
          'DECLARE @currentTime DATETIME2
          
          SELECT TOP (1) @currentTime = Time 
          FROM '+@currentDb+'.[dbo].[<table>]
          
          UPDATE #tempCleanupPerformed 
          SET SomeDateTimeColumn = 
              CASE WHEN DATEDIFF(MONTH, @currentTime, GETUTCDATE()) > 3
                THEN @currentTime
                ELSE NULL
                END
          WHERE Id = '+CAST(@currentId as varchar(3))    
          
          EXEC (@sql);
          SET @currentId = @currentId + 1;
      END
              
      -- Return 1 or 0 for each database to indicate whether cleanup has run or not
      SELECT Name AS 'Database', (CASE WHEN SomeDateTimeColumn IS NULL THEN 1 ELSE 0 END) AS CleanupPerformed FROM #tempCleanupPerformed
      
      -- Cleanup
      DROP TABLE #tempCleanupPerformed

Even though it works, usql gives these errors:

IF 1
error: sqlserver: 137: Must declare the scalar variable "@sql".
error: sqlserver: 137: Must declare the scalar variable "@currentId".
error: sqlserver: 156: Incorrect syntax near the keyword 'SELECT'.
error: test1.txt: sqlserver: 156: Incorrect syntax near the keyword 'SELECT'.

@jakob-reesalu (Author)

@burningalchemist We finally nailed it, in quite an ugly way, but at least it's working!

Some lessons: usql is a liar; queries that work in sql-exporter may produce errors in usql. And to get the final query working in usql we had to put it in a stored procedure and execute that, but that's not necessary in sql-exporter, where we can put the query directly in the collector's config.

This is how it turned out in the end. I think the big thing is that we added USE ['+@currentDb+'] to get the database context right; this was probably the issue with the joins from before:

      -- Declare variables
      DECLARE @tempDbList AS TABLE (Id INT IDENTITY(1,1), [Name] SYSNAME)
      INSERT INTO @tempDbList ([Name])
      SELECT [Name]
        FROM master.sys.databases WHERE
        (Name NOT IN ('master','model','msdb','tempdb') AND
         Name NOT LIKE '%log%' AND
         Name NOT LIKE '%archive%' AND
         Name NOT LIKE '%<something>%' AND
         Name NOT LIKE '%<something>%' AND
         Name NOT LIKE '%<something>%')
      DECLARE @tempFragmentations AS TABLE ([database] SYSNAME, [table] NVARCHAR(200), [index] NVARCHAR(500), avgFragmentationInPercent FLOAT)	  
      DECLARE @currentId INT;
      DECLARE @currentDb SYSNAME
      DECLARE @dbCount INT
      
      -- Iterate through each database and all indexes in <table>, <table> and <table> tables.                          
      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM @tempDbList;
      
      DECLARE @sql NVARCHAR(max);
                          
      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM @tempDbList
          WHERE Id = @currentId;
                  
          -- Get results, indexes with >5% fragmentation, for each db
          SET @sql = '
              USE ['+@currentDb+']
              SELECT
                  DB_NAME(DDIPS.database_id),
                  T.name,
                  I.name,
                  ROUND(DDIPS.avg_fragmentation_in_percent, 2)
              FROM 
                  sys.dm_db_index_physical_stats (DB_ID('''+@currentDb+'''), NULL, NULL, NULL, NULL) AS DDIPS
                  JOIN sys.tables T ON T.object_id = DDIPS.object_id
                  JOIN sys.indexes I ON I.object_id = DDIPS.object_id
                        AND DDIPS.index_id = I.index_id
              WHERE 
                  DDIPS.database_id = DB_ID('''+@currentDb+''')
                    AND DDIPS.avg_fragmentation_in_percent > 5
                    AND (T.name = ''<table>'' OR 
                         T.name = ''<table>'' OR
                         T.name = ''<table>'')
                  AND I.name IS NOT NULL;'
      
          INSERT INTO @tempFragmentations EXEC sp_executesql @sql
          
          SET @currentId = @currentId + 1;
      END
            	
      SELECT * FROM @tempFragmentations;

But there are still some spooky "context deadline exceeded" errors. Sometimes the above query runs to completion and provides the metrics, and I can see it running on the DB instance. Other times I see the query start and then immediately stop due to context deadline exceeded.
SQL Exporter config:

global:
  scrape_timeout: 1h
  scrape_timeout_offset: 500ms
  min_interval: 0s
  max_connections: 3
  max_idle_connections: 3
  max_connection_lifetime: 5m

Prometheus config:

global:
  scrape_interval:     60s # Default is 1m.
  evaluation_interval: 1m # Evaluate rules every 1 minute. Default is 1m.
  scrape_timeout: 15s  
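With these settings, the Prometheus-side 15s scrape_timeout is likely the binding limit, regardless of the exporter's 1h setting (see the timeout discussion earlier in the thread). Prometheus supports a per-job scrape_timeout override, which could look roughly like this (a sketch; the job name, interval values, and target port are assumptions):

```yaml
scrape_configs:
  - job_name: sql_exporter       # hypothetical job name
    scrape_interval: 5m          # must be >= scrape_timeout
    scrape_timeout: 4m           # give long-running collectors room
    static_configs:
      - targets: ['localhost:9399']
```

A per-job override keeps the 15s default for every other target while giving only this exporter a longer deadline.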
