No metrics for collector: query works in one case, not in other #154

Closed
jakob-reesalu opened this issue Oct 20, 2022 · 18 comments

jakob-reesalu commented Oct 20, 2022

Hi! I got one collector working, but I'm having issues with the second one. It's something with the query, which will be a long-running one in the end, around 15-30 minutes (it will run once a day). Its purpose is to scan all indexes on some tables across all databases.

I started chopping the query up to find when it works and when it doesn't. On some occasions I got "context deadline exceeded" messages. I read some old threads saying that one might need to increase the Prometheus and SQL Exporter scrape_timeout, but this didn't make a difference. Eventually I cut the query down bit by bit, and even when it takes less than 1 second in Microsoft SQL Server Management Studio, no metrics would be provided - so timing shouldn't be an issue, right?

I finally got it working by selecting indexes for only one specific DB; this version provides metrics in the UI:

      SELECT
          DB_NAME(DDIPS.database_id) AS 'database',
          DDIPS.index_id AS 'index',
          DDIPS.avg_fragmentation_in_percent AS 'avgFragmentationInPercent'
      FROM 
          sys.dm_db_index_physical_stats (DB_ID('<DatabaseName>'), NULL, NULL, NULL, NULL) AS DDIPS
      WHERE 
          DDIPS.database_id = DB_ID('<DatabaseName>')
      -- AND DDIPS.object_id = OBJECT_ID('AvlEvent')

This query provides results instantly on startup. BUT, if I uncomment the last line, I suddenly get no results. Both versions run instantly in SSMS.
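One possible explanation (an assumption on the editor's part, not verified in this thread): `OBJECT_ID('AvlEvent')` resolves the name in the connection's *current* database, so if SQL Exporter connects with a different default database than the SSMS session (e.g. `master`), it returns NULL and the predicate matches nothing. A fully qualified three-part name sidesteps this:

```sql
-- Hypothetical sketch: qualify the object name so it resolves regardless of the
-- connection's default database ('<DatabaseName>' and 'AvlEvent' are the
-- placeholders from the query above).
SELECT
    DB_NAME(DDIPS.database_id) AS 'database',
    DDIPS.index_id AS 'index',
    DDIPS.avg_fragmentation_in_percent AS 'avgFragmentationInPercent'
FROM
    sys.dm_db_index_physical_stats (DB_ID('<DatabaseName>'), NULL, NULL, NULL, NULL) AS DDIPS
WHERE
    DDIPS.database_id = DB_ID('<DatabaseName>')
    AND DDIPS.object_id = OBJECT_ID('<DatabaseName>.dbo.AvlEvent')
```

If that is the cause, the query would then behave identically no matter which database the exporter's connection lands in.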

SQL Exporter config:

# Global defaults.
global:
  scrape_timeout: 120s
  # Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
  scrape_timeout_offset: 500ms
  # Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
  min_interval: 0s
  # Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
  # as will concurrent scrapes.
  max_connections: 3
  # Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
  # always be the same as max_connections.
  max_idle_connections: 3
  # Maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.
  # If 0, connections are not closed due to a connection's age.
  max_connection_lifetime: 5m
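The interplay of these two timeouts can be sketched as follows. This is a simplified model based only on the config comments above (the exporter subtracts `scrape_timeout_offset` from the timeout Prometheus sends, and its own `scrape_timeout` acts as a cap); sql_exporter's actual logic may differ:

```python
def effective_scrape_deadline(prometheus_timeout_s: float,
                              exporter_scrape_timeout_s: float,
                              offset_s: float = 0.5) -> float:
    """Rough model of the deadline a collector query must beat.

    Assumption (from the config comments, not the source code): the exporter
    subtracts scrape_timeout_offset from the per-scrape timeout Prometheus
    sends, and the exporter's own global scrape_timeout is a hard cap.
    """
    from_prometheus = prometheus_timeout_s - offset_s
    return min(exporter_scrape_timeout_s, from_prometheus)

# With the configs in this issue (Prometheus 110s, exporter 120s, offset 0.5s),
# a query has roughly 109.5s before "context deadline exceeded" fires.
print(effective_scrape_deadline(110, 120, 0.5))
```

Under this model, whichever side is configured shorter always wins, which is why raising only one of the two timeouts may not help.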

Prometheus config:

global:
  scrape_interval:     115s # Default is 1m.
  evaluation_interval: 1m # Evaluate rules every 1 minute. Default is 1m.
  scrape_timeout: 110s     # For how long we let a scrape run before timing it out and it fails. Default is 10s.

Very grateful for any ideas on this!

P.S. I first ran version 0.5 but have changed to 0.9 now (I misunderstood which GitHub page was the active one). In the earlier version I could change the log level with "-v 4", but using the new flag like "-log.level debug" doesn't work (I wrote a comment here). So I can't get logging on this currently. The issue persists in the new version anyhow.


burningalchemist commented Oct 20, 2022

Hey @jakob-reesalu, I commented on the logging stuff in the PR, please take a look.

As for the query issue, I'm happy to pair up to investigate and solve the issue. 👍 Could you share the message you get if you run the full query? If you like we can also hop on a call or something, as I don't really have MSSQL around to try things out on my side. What do you think?

@jakob-reesalu (Author)

@burningalchemist Thanks a lot! I didn't try the full query iterating through all DBs, as it takes 15+ minutes. But when manipulating it to just do the first DB in the list (which takes 1 second when querying the DB server directly), I get no error, just an info message listing the returned columns. Log level is debug.

I'm up for a call, just gotta have lunch first. When are you available, and how can I contact you? Maybe on Google Meet?

@burningalchemist (Owner)

@jakob-reesalu Feel free to pick up the slot here: https://calendly.com/szyubin
I'm available roughly after 4pm today. Google Meet or Jitsi both should work. 👍

@jakob-reesalu (Author)

@burningalchemist Dang it, I gotta leave at about 4pm specifically today. xD I'll continue troubleshooting during the day, maybe a call tomorrow instead?

@burningalchemist (Owner)

@jakob-reesalu Yeah, tomorrow works too, also after 4pm unfortunately. In the meantime, if you could drop any intermediate results here, that would be useful. 👍

@burningalchemist (Owner)

Could you drop me a screenshot from SQL Management Studio (with sensitive parts removed if there are any) for the full query you expect to run? I might have an idea. 👍


jakob-reesalu commented Oct 20, 2022

@burningalchemist Alright! Here you go:

NOTE: There's a typo in AvgFragmentationInPREcent in the image, but that's been fixed.
[screenshot of the query in SSMS]

Here's the collector config:


# Similar to global.min_interval, but applies to the queries defined by this collector only.
min_interval: 3m
      
metrics:
  - metric_name: index_fragmentation
    type: gauge
    help: 'Checks all databases with x, x and x tables, returns index fragmentation as percentage when its >5%. Query executed once a day.'
    key_labels: 
      - Database
      - Table
      - Index
    values: [AvgFragmentationInPercent]
    query: |
      -- Delete temp table if it exists
      IF OBJECT_ID('tempdb.dbo.#tempDbList') IS NOT NULL
      DROP TABLE #tempDbList

      -- Put relevant database names in temp table
      SELECT IDENTITY(INT,1,1) AS Id, Name INTO #tempDbList
      FROM master.sys.databases WHERE
          (Name LIKE '%data%' AND Name LIKE '%<productname>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%'
          OR Name LIKE '%data%' AND Name LIKE '%<productname>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%')

      -- Iterate through each database and all indexes in <tablename>, <tablename> and <tablename> tables.
      DECLARE @currentId INT, @currentDb sysname, @dbCount INT, @sql NVARCHAR(max);

      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM #tempDbList

      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM #tempDbList
          WHERE Id = @currentId

          -- Get results, indexes with >5% fragmentation, for each db
          SELECT
              DB_NAME(DDIPS.database_id) AS 'Database',
              T.name AS 'Table',
              I.name AS 'Index',
              DDIPS.avg_fragmentation_in_percent AS 'AvgFragmentationInPercent'
          FROM
              sys.dm_db_index_physical_stats (DB_ID(@currentDb), NULL, NULL, NULL, NULL) AS DDIPS
              INNER JOIN sys.tables T ON T.object_id = DDIPS.object_id
              INNER JOIN sys.indexes I ON I.object_id = DDIPS.object_id
                  AND DDIPS.index_id = I.index_id
          WHERE
              DDIPS.database_id = DB_ID(@currentDb)
              AND DDIPS.avg_fragmentation_in_percent > 5
              AND (T.name = '<tablename>' OR
                   T.name = '<tablename>' OR
                   T.name = '<tablename>')
              AND I.name IS NOT NULL

          SET @currentId = @currentId + 1;
      END

      -- Cleanup
      DROP TABLE #tempDbList

@jakob-reesalu
Copy link
Author

jakob-reesalu commented Oct 20, 2022

@burningalchemist Having been away at lunch, letting the exporter run, I found a bunch of "context deadline exceeded" errors (this is the query posted above, but set to check only 1 DB):

"index_fragmentation_checks" is the collector at issue

ts=2022-10-20T10:55:21.945Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded"
ts=2022-10-20T10:57:16.926Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded"
ts=2022-10-20T10:59:11.914Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded"
[...the same pair of "context deadline exceeded" errors repeats roughly every two minutes through 11:39:26...]

Then afterwards getting these messages, without errors:

ts=2022-10-20T11:40:01.892Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=2929.098s"
ts=2022-10-20T11:40:01.895Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Collecting fresh metrics: min_interval=180.000s cache_age=2929.098s"
ts=2022-10-20T11:40:02.280Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[Database Table Index AvgFragmentationInPercent]\"collector=\"index_fragmentation_checks\", query=\"index_fragmentation\""
ts=2022-10-20T11:40:36.044Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=63.980s"
ts=2022-10-20T11:40:36.044Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=2993.078s"
ts=2022-10-20T11:41:27.066Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=115.009s"
ts=2022-10-20T11:41:27.066Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3044.109s"
ts=2022-10-20T11:43:22.060Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Collecting fresh metrics: min_interval=180.000s cache_age=230.024s"
ts=2022-10-20T11:43:22.060Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3159.122s"
ts=2022-10-20T11:43:22.092Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[Database Table Index AvgFragmentationInPercent]\"collector=\"index_fragmentation_checks\", query=\"index_fragmentation\""
ts=2022-10-20T11:44:14.923Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=52.871s"
ts=2022-10-20T11:44:14.923Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3211.993s"
ts=2022-10-20T11:45:17.027Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=180.000s cache_age=114.985s"
ts=2022-10-20T11:45:17.027Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=86400.000s cache_age=3274.106s"

@burningalchemist (Owner)

@jakob-reesalu thanks, let's try to catch up tomorrow, I have some thoughts. 👍

@jakob-reesalu (Author)

@burningalchemist Yo! So I tried usql as we talked about. It gave this error:
"Must declare the scalar variable "@currentDb"."

Problem is, the variable is declared in the query above. Googling it, some people suggested this can happen when mixing dynamic and static SQL, so eventually I put the whole query inside the @Sql variable and ran EXECUTE(@Sql) on it. That didn't give any error message, but no results either. Tried it in SQL Exporter and no change from before.

While testing various things along the way I would get a result "IF 21", but I have no idea what that was about, and the usql GitHub didn't have any threads on it.

An interesting note: the other queries I have that do work in SQL Exporter would also yield the error about "@currentDb" not being declared.

Anywho, due to time constraints I think I'll have to give up on the variables and the WHILE loop and maybe just make a bunch of static queries.

@jakob-reesalu (Author)

@burningalchemist FYI: the issue might be with doing INNER JOIN against other tables. I use a WHILE loop and declare variables in other queries that work, but I don't do joins there. So maybe that's the issue, in case you run into something similar in the future!

@burningalchemist (Owner)

@jakob-reesalu

Hey Jakob! Thanks for the follow up and the detailed investigation. 👍

Let me think about what I can do to get access to an MSSQL server to check your hypothesis.

If I get any results or explanation I'll mention you in this issue.

What's your MSSQL version by the way?


jakob-reesalu commented Oct 25, 2022

No worries :) The version is SQL Server 14.0.2037.2

Additionally, on INNER JOIN: I noticed that INNER JOIN is used in the example config, so it can't be that SQL Exporter can't deal with it. But I see that it only appears under the "queries" property (like the one below) and is then referenced via the "query_ref" property of a metric:

queries:
  # Populates `mssql_io_stall` and `mssql_io_stall_total`
  - query_name: mssql_io_stall
    query: |
      SELECT
        cast(DB_Name(a.database_id) as varchar) AS [db],
        sum(io_stall_read_ms) / 1000.0 AS [read],
        sum(io_stall_write_ms) / 1000.0 AS [write],
        sum(io_stall) / 1000.0 AS io_stall
      FROM
        sys.dm_io_virtual_file_stats(null, null) a
      INNER JOIN sys.master_files b ON a.database_id = b.database_id AND a.file_id = b.file_id
      GROUP BY a.database_id

So I gave it a try and placed my complex query under that property, then referenced it with "query_ref", and now I can at least see that the query hits the database - that didn't happen before. So this might work after all; I just need to fiddle a bit with scrape timeouts, since I'm getting "context deadline exceeded" errors doing this.

EDIT: @burningalchemist So, this only worked once; then I couldn't see the query anymore. But when I changed the query name it worked again, though not after that. I tried a query without INNER JOIN and it worked, giving results instantly. If I add an INNER JOIN I get nothing. Running the query in usql gives the result "IF 12"; not sure what that means.
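For reference, the shared-query pattern described above looks roughly like this in a collector file (a sketch with hypothetical names; the example config shipped with sql_exporter is the authoritative layout):

```yaml
# Collector file sketch (names are hypothetical).
collector_name: index_fragmentation_checks

metrics:
  - metric_name: index_fragmentation
    type: gauge
    help: 'Index fragmentation as a percentage.'
    key_labels: [Database, Table, Index]
    values: [AvgFragmentationInPercent]
    query_ref: index_fragmentation   # references query_name below

queries:
  - query_name: index_fragmentation
    query: |
      SELECT ...
```

The same `query_ref` can be reused by several metrics, which is why the example config keeps the SQL under `queries` instead of inlining it.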

@burningalchemist (Owner)

@jakob-reesalu Thanks for sharing the query, I'll use it on my end to investigate. 👍

So if you run it several times it returns nothing? Any logs on the SQL Exporter side or MSSQL side?


jakob-reesalu commented Oct 25, 2022

@burningalchemist You're welcome ;)

So, a colleague and I looked at the issue and changed some things, like declaring each variable in a separate DECLARE statement and changing temporary tables to table variables instead.

The query now looks like this:

-- Put relevant database names in a table variable
DECLARE @tempDbList AS TABLE (Id INT IDENTITY(1,1), [Name] SYSNAME)
INSERT INTO @tempDbList ([Name])
SELECT [Name]
FROM master.sys.databases WHERE
    (Name NOT IN ('master','model','msdb','tempdb') AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%' AND
     Name NOT LIKE '%<name>%');

DECLARE @tempFragmentations AS TABLE ([Database] SYSNAME, [Table] NVARCHAR(200), [Index] NVARCHAR(500), AvgFragmentationInPercent FLOAT)

DECLARE @currentId INT
DECLARE @currentDb SYSNAME
DECLARE @dbCount INT

-- Iterate through each database and all indexes in <table>, <table> and <table> tables.
SELECT @currentId = 1, @dbCount = 9
FROM @tempDbList;

WHILE @currentId <= @dbCount
BEGIN
    SELECT @currentDb = Name
    FROM @tempDbList
    WHERE Id = @currentId;

    -- Get results, indexes with >5% fragmentation, for each db
    INSERT INTO @tempFragmentations
    SELECT
        DB_NAME(DDIPS.database_id),
        T.name,
        I.name,
        DDIPS.avg_fragmentation_in_percent
    FROM
        sys.dm_db_index_physical_stats (DB_ID(@currentDb), NULL, NULL, NULL, NULL) AS DDIPS
        JOIN sys.tables T ON T.object_id = DDIPS.object_id
        JOIN sys.indexes I ON I.object_id = DDIPS.object_id
            AND DDIPS.index_id = I.index_id
    WHERE
        DDIPS.database_id = DB_ID(@currentDb)
        AND DDIPS.avg_fragmentation_in_percent > 5
        AND (T.name = '<table>' OR
             T.name = '<table>' OR
             T.name = '<table>')
        AND I.name IS NOT NULL;

    SET @currentId = @currentId + 1;
END

SELECT * FROM @tempFragmentations;

This gives the following result in usql now:

DECLARE 1
error: sqlserver: 1087: Must declare the table variable "@tempDbList".
error: sqlserver: 137: Must declare the scalar variable "@currentDb".
error: sqlserver: 137: Must declare the scalar variable "@currentDb".
error: sqlserver: 137: Must declare the scalar variable "@currentId".
error: sqlserver: 1087: Must declare the table variable "@tempFragmentations".
error: sqlserver: 1087: Must declare the table variable "@tempFragmentations".
error: test1.txt: sqlserver: 1087: Must declare the table variable "@tempFragmentations".

So... something with variable declarations, it seems, but I don't really get what. I had a look at the MSSQL driver https://github.com/microsoft/go-mssqldb but no clues there, really.


burningalchemist commented Oct 27, 2022

@jakob-reesalu I think the problem is that you're using a multi-statement query, which isn't supported. DECLARE is a statement itself. The other question is whether it works for some other queries (it shouldn't).

I'm still trying to get access to MSSQL 2014 somewhere to try things out myself. :)

@jakob-reesalu (Author)

@burningalchemist Hm... here's another query that actually works. It checks databases for whether a cleanup has been performed, and it has multiple statements:

      -- Delete temp table if it exists
      IF OBJECT_ID('tempdb.dbo.#tempCleanupPerformed') IS NOT NULL
      DROP TABLE #tempCleanupPerformed
              
      -- Get all database names and create temp table
      SELECT IDENTITY(INT,1,1) AS Id, Name INTO #tempCleanupPerformed
      FROM master.sys.Databases WHERE 
              (Name LIKE '%<something>%' AND Name LIKE '%<something>%' AND Name NOT LIKE '%log%' AND Name NOT LIKE '%archive%')
      
      ALTER TABLE #tempCleanupPerformed
      ADD SomeDateTimeColumn DATETIME2
              
      -- Iterate through all databases for <table> with a time greater than 3 months
      DECLARE @currentId INT, @currentDb sysname, @dbCount INT, @sql NVARCHAR(max);
              
      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM #tempCleanupPerformed
              
      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM #tempCleanupPerformed
          WHERE Id = @currentId
                  
          SET @sql = 
          'DECLARE @currentTime DATETIME2
          
          SELECT TOP (1) @currentTime = Time 
          FROM '+@currentDb+'.[dbo].[<table>]
          
          UPDATE #tempCleanupPerformed 
          SET SomeDateTimeColumn = 
              CASE WHEN DATEDIFF(MONTH, @currentTime, GETUTCDATE()) > 3
                THEN @currentTime
                ELSE NULL
                END
          WHERE Id = '+CAST(@currentId as varchar(3))    
          
          EXEC (@sql);
          SET @currentId = @currentId + 1;
      END
              
      -- Return 1 or 0 for each database to indicate whether cleanup has run or not
      SELECT Name AS 'Database', (CASE WHEN SomeDateTimeColumn IS NULL THEN 1 ELSE 0 END) AS CleanupPerformed FROM #tempCleanupPerformed
      
      -- Cleanup
      DROP TABLE #tempCleanupPerformed

Even though it works, usql gives these errors:

IF 1
error: sqlserver: 137: Must declare the scalar variable "@sql".
error: sqlserver: 137: Must declare the scalar variable "@currentId".
error: sqlserver: 156: Incorrect syntax near the keyword 'SELECT'.
error: test1.txt: sqlserver: 156: Incorrect syntax near the keyword 'SELECT'.

@jakob-reesalu (Author)

@burningalchemist We finally nailed it, in quite an ugly way, but at least it's working!

Some lessons: usql is a liar; queries that work in sql-exporter may produce errors in usql. And to get the final query working in usql we had to put it in a stored procedure and execute that, but that's not necessary in sql-exporter, where we can put the query directly in the collector's config.

This is how it turned out in the end. I think the big thing is that we added USE ['+@currentDb+'] to get the database context right; this was probably the issue with the joins from before:

      -- Declare variables
      DECLARE @tempDbList AS TABLE (Id INT IDENTITY(1,1), [Name] SYSNAME)
      INSERT INTO @tempDbList ([Name])
      SELECT [Name]
        FROM master.sys.databases WHERE
        (Name NOT IN ('master','model','msdb','tempdb') AND
         Name NOT LIKE '%log%' AND
         Name NOT LIKE '%archive%' AND
         Name NOT LIKE '%<something>%' AND
         Name NOT LIKE '%<something>%' AND
         Name NOT LIKE '%<something>%')
      DECLARE @tempFragmentations AS TABLE ([database] SYSNAME, [table] NVARCHAR(200), [index] NVARCHAR(500), avgFragmentationInPercent FLOAT)	  
      DECLARE @currentId INT;
      DECLARE @currentDb SYSNAME
      DECLARE @dbCount INT
      
      -- Iterate through each database and all indexes in <table>, <table> and <table> tables.                          
      SELECT @currentId = 1, @dbCount = MAX(Id)
      FROM @tempDbList;
      
      DECLARE @sql NVARCHAR(max);
                          
      WHILE @currentId <= @dbCount
      BEGIN
          SELECT @currentDb = Name
          FROM @tempDbList
          WHERE Id = @currentId;
                  
          -- Get results, indexes with >5% fragmentation, for each db
          SET @sql = '
              USE ['+@currentDb+']
              SELECT
                  DB_NAME(DDIPS.database_id),
                  T.name,
                  I.name,
                  ROUND(DDIPS.avg_fragmentation_in_percent, 2)
              FROM 
                  sys.dm_db_index_physical_stats (DB_ID('''+@currentDb+'''), NULL, NULL, NULL, NULL) AS DDIPS
                  JOIN sys.tables T ON T.object_id = DDIPS.object_id
                  JOIN sys.indexes I ON I.object_id = DDIPS.object_id
                        AND DDIPS.index_id = I.index_id
              WHERE 
                  DDIPS.database_id = DB_ID('''+@currentDb+''')
                    AND DDIPS.avg_fragmentation_in_percent > 5
                    AND (T.name = ''<table>'' OR 
                         T.name = ''<table>'' OR
                         T.name = ''<table>'')
                  AND I.name IS NOT NULL;'
      
          INSERT INTO @tempFragmentations EXEC sp_executesql @sql
          
          SET @currentId = @currentId + 1;
      END
            	
      SELECT * FROM @tempFragmentations;

But there are still some spooky "context deadline exceeded" errors. Sometimes the above query runs to completion and provides the metrics, and I can see it running on the DB instance. Other times I see the query start and then immediately stop due to context deadline exceeded.
SQL Exporter config:

global:
  scrape_timeout: 1h
  scrape_timeout_offset: 500ms
  min_interval: 0s
  max_connections: 3
  max_idle_connections: 3
  max_connection_lifetime: 5m

Prometheus config:

global:
  scrape_interval:     60s # Default is 1m.
  evaluation_interval: 1m # Evaluate rules every 1 minute. Default is 1m.
  scrape_timeout: 15s  
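With these settings, the Prometheus-side 15s scrape_timeout is likely the binding limit, regardless of the exporter's 1h setting (see the timeout discussion earlier in the thread). Prometheus supports a per-job scrape_timeout override, which could look roughly like this (a sketch; the job name, interval values, and target port are assumptions):

```yaml
scrape_configs:
  - job_name: sql_exporter       # hypothetical job name
    scrape_interval: 5m          # must be >= scrape_timeout
    scrape_timeout: 4m           # give long-running collectors room
    static_configs:
      - targets: ['localhost:9399']
```

A per-job override keeps the 15s default for every other target while giving only this exporter a longer deadline.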
