Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low priority long term plan: add new script to analyze index usage for SaaS clients #3183

Open
erikdarlingdata opened this issue Dec 7, 2022 · 7 comments

Comments

@erikdarlingdata
Copy link
Contributor

erikdarlingdata commented Dec 7, 2022

Version of the script
8.11

What is the current behavior?
When you specify @GetAllDatabases = 1 and @TableName = 'something', a few weird things happen:

  • Script stops running if it hits a database where the table doesn't exist
  • Only one database is listed in the "header" of the results (picture below)
  • Not all databases are processed (picture below)

If the current behavior is a bug, please provide the steps to reproduce.
On my server, I have three copies of the StackOverflow database, each with the Posts table in them:

image

When I run this:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @GetAllDatabases = 1,
    @Debug = 1;

The script bails at AdventureWorks because Posts doesn't exist there:

image

When I run this:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @GetAllDatabases = 1,
    @IgnoreDatabases = N'AdventureWorks, Crap',
    @Debug = 1;

The output ends up looking like this (edited to show issues):

image

Under the heading for StackOverflow2013_Clean are indexes for StackOverflow2013. This also highlights the fact that the StackOverflow database doesn't produce any results the final output even though data is logged in intermediate temp tables (shown via debug):

image

As well as:

image

And they do show up when I specify the StackOverflow database:

EXEC dbo.sp_BlitzIndex 
    @TableName = N'Posts',
    @DatabaseName = N'StackOverflow',
    @Debug = 1;

Like so:

image

What is the expected behavior?

  • Script should gracefully skip databases where table doesn't exist
  • Header data reset per database processed
  • All databases should show results

Which versions of SQL Server and which OS are affected by this issue? Did this work in previous versions of our procedures?
I've only tested this on supported versions of SQL Server: (2016+, Managed Instance). Windows is whatever.

@erikdarlingdata
Copy link
Contributor Author

Just to set expectations on this, I might not get to it quickly. If anyone feels like picking it up, leave a comment so I can assign it to you. Otherwise I'll work on this as time allows.

@BrentOzar
Copy link
Member

Yeah, I'm really intrigued by this. I think I'm going to do a couple of streaming sessions working on the FRK this week.

@erikdarlingdata
Copy link
Contributor Author

@BrentOzar I wish nothing but the best for you both

@BrentOzar BrentOzar self-assigned this Dec 13, 2022
@BrentOzar
Copy link
Member

This is interesting. If you specify @TableName, then the @mode parameter doesn't matter. My first thought on the use case was, "He wants to see Mode 2 and list all of the indexes for that object across all the databases." However, that doesn't work:

sp_BlitzIndex @GetAllDatabases = 1, @TableName = 'Posts', @Mode = 2

Produces:

ChicagoParkingTickets database failed to process. Setting the @Mode doesn't change behavior if you supply @TableName. Use default @Mode=0 to see table detail.

But in Mode 0, @GetAllDatabases doesn't really make sense because most of the result sets don't have a database name:
Screenshot 2022-12-13 at 4 24 03 AM

So lemme jump back and ask - what was the use case for this? I can totally see it being used in Mode 2 or 3, and in that case, the fix is different. If you want it to work in Mode 0, that means adding database name columns to a bunch of the result sets, but also adding a sort order to each result set when @GetAllDatabases = 1. For example, if you're looking at indexes that exist, I don't think you want them sorted by database name first - probably last, because you wanna look at each database to see if the indexes are different. (Or heaven forbid, summing the numbers, which is way beyond my pay grade.)

@BrentOzar BrentOzar changed the title sp_BlitzIndex: Bugs when GetAllDatabases and TableName are both used sp_BlitzIndex: how to handle running @TableName with @GetAllDatabases = 1 Dec 13, 2022
@erikdarlingdata
Copy link
Contributor Author

@BrentOzar good question! So the use case for me was a someone with a database-per-client, and one particular main table that drives a bunch of critical processes. I wanted to focus changes on that one table. Not to make all the indexes "the same" just to clean up unused and overlapping ones in each database.

The table mode output certainly isn't ideal, both because the database name is missing, and you're stuck figuring out unused/overlapping/other stuff manually. I think that would also be the case in Mode 2, though perhaps easier to manage with the output.

So yeah, whatever mode between 0/4 or 2 would be fine. Whatever is easiest. I do not need missing index requests for my purposes, here. This is pure clean up work.

@BrentOzar
Copy link
Member

BrentOzar commented Dec 14, 2022

We've needed to do this so many times! Time to roll up my sleeves and build something serious, something that'll also work across servers because that's a common issue too. This will be a new stored proc, pared down for this particular use case, and won't be part of the "core" set since it's pretty specialized.

Thinking out loud about the use cases:

Screenshot 2022-12-14 at 5 05 29 AM

(And in case anybody else is reading, SaaS could also refer to replication, and AGs could also refer to log shipping.)

Out of scope:

  • Schema drift (different columns on the table, FKs, constraints, etc) - if you want that, buy Redgate Source Control. Similarly, we're just gonna check all the databases on the server - if a table doesn't exist in one of the databases, we're going to assume it's not supposed to be, and just not include that database in the output list.
  • Automatically detecting which servers to connect to (like in an AG)
  • Setting up linked servers - I'm going to assume you pass in a list of server names, and we just try to connect to them with four-part names, and check all the databases on that server.
  • Error handling on connecting to servers - if a server is down, you have bigger problems
  • Operational stats - I'm just gonna standardize on usage stats for this one unless you feel strongly otherwise
  • Statistics histograms & update dates
  • Partitioning (I'm just going to sum up the index stats and not distinguish between which databases are partitioned and which aren't)
  • Index IDs - I'm going to go exclusively by index name (although I'm not going to assume they're on the same columns - I think if you have the same name, but different index contents, that's a problem you wanna know about)

Initial version:

  • Existing indexes only (not missing indexes)
  • One table only (not going to try to fetch all of the objects and give overall results)
  • Will handle those later with additional @mode switches

Overall version (not called for one table) output would look like:

  • Schema name
  • Table name
  • Number of databases where this object was found
  • Number of indexes: min, avg, max - to tell you if you have variances. However, we wouldn't tell you which databases have more/less indexes. For that, you would wanna call the table-level-detail one.
  • Number of unused indexes, total - just to give you a rough idea of whether you need to do pruning.
  • Number of rows: min, avg, max
  • Size, GB: min, avg, max

With that in mind, when you call it for a single table, then you already know you have variances, and you probably wanna know:
Result set 1: overall index inventory for this table

  • Index name - but keep in mind that we can have several indexes with the same name, but different keys/includes.
  • Index version ID - in case we have the same index name with different keys/includes
  • Number of databases where this object was found
  • Key columns
  • Includes
  • Drop script
  • Create script
    Sort that first result set by index name & version.

Result set 2: details

  • Index name
  • Index version ID
  • Server name
  • Database name (and sort output by these first 4, because you're probably looking one index at a time)
  • Index present, yes/no
  • Reads on this copy
  • Writes on this copy
  • Rows in this copy

For "Index present, yes/no" - there are two use cases:

  • If you look at result set 1 and almost everybody has the index, then you wanna glance down at result set 2 and ask, "Who's missing the index?" In that case, there should be a row in result set 2 with "NO" here so you can quickly identify who's missing the index. (But there should also be rows for the Yes's.)
  • On the other hand, if few databases have the index (say, less than 25% of the copies), then you don't wanna see a whole bunch of "NO" rows.

Stopping here in the thought process, but will come back to it. If you see any holes or wanna add anything, feel free. I'll probably think through this for a few days. I thought about posting a blog post asking folks for a design review, but everybody's just gonna ask for free candy.

@BrentOzar BrentOzar changed the title sp_BlitzIndex: how to handle running @TableName with @GetAllDatabases = 1 New script to analyze index usage for SaaS clients Dec 14, 2022
@BrentOzar BrentOzar removed their assignment Dec 14, 2022
@erikdarlingdata
Copy link
Contributor Author

Oh boy! It's a baby sp_BlitzaaS! I have a call starting shortly, but I'll give this a full read and big think.

@BrentOzar BrentOzar changed the title New script to analyze index usage for SaaS clients Low priority long term plan: add new script to analyze index usage for SaaS clients May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants