#### Topics:
    Indexes

Indexes are special data structures associated with tables or views that help speed up queries.

SQL Server provides two types of indexes: 

    clustered index and 
    non-clustered index.

Ref: https://quadexcel.com/wp/how-do-sql-indexes-work/

![Indexes](TSQL-Indexes.jpg)

In [1]:
import pyodbc
import os
import pandas as pd

# Check that a SQL Server ODBC driver is installed
#[x for x in pyodbc.drivers() if "SQL Server" in x]

# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=BikeStores;'
    r'Trusted_Connection=yes;'
)

# Establish the connection
conn = pyodbc.connect(conn_str, autocommit=True)

# Create a cursor
cursor = conn.cursor()

#### Clustered Indexes:

In [2]:
cursor.execute('''
CREATE TABLE production.parts(
    part_id   INT NOT NULL, 
    part_name VARCHAR(100)
);

''')

cursor.execute('''
INSERT INTO 
    production.parts(part_id, part_name)
VALUES
    (1,'Frame'),
    (2,'Head Tube'),
    (3,'Handlebar Grip'),
    (4,'Shock Absorber'),
    (5,'Fork');

''')



<pyodbc.Cursor at 0x1755fe020b0>

The production.parts table does not have a primary key, and no clustered index has been created on it. SQL Server therefore stores its rows in an unordered structure called a heap.

When you query data from the production.parts table, SQL Server has to scan the whole table to find the matching rows.

In [3]:
cursor.execute('''
SELECT 
    part_id, 
    part_name
FROM 
    production.parts
WHERE 
    part_id = 5;

''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

|   | part_id | part_name |
|---|---------|-----------|
| 0 | 5       | Fork      |


Because the production.parts table has only five rows, the query executes very fast. However, if the table contains a large number of rows, it’ll take a lot of time and resources to search for data.

To resolve this issue, SQL Server provides a dedicated structure, called an index, that speeds up the retrieval of rows from a table.

SQL Server has two types of indexes: clustered and non-clustered. This section covers the clustered index first.

**A clustered index stores data rows in a sorted structure based on its key values. Each table can have only one clustered index because data rows can be sorted in only one order. A table that has a clustered index is called a clustered table.**
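To make the difference between a heap and a sorted structure concrete, here is a minimal pure-Python sketch. It is not how SQL Server is implemented: the lists `heap_rows` and `clustered_rows` and the functions `scan_heap` and `seek_sorted` are hypothetical stand-ins that only illustrate why keeping rows sorted by key allows a search to skip most of them.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for production.parts, in arbitrary (heap) order.
heap_rows = [
    (3, "Handlebar Grip"),
    (1, "Frame"),
    (5, "Fork"),
    (2, "Head Tube"),
    (4, "Shock Absorber"),
]


def scan_heap(rows, part_id):
    """Check rows one by one, as a full table scan would.

    Returns the matching row (or None) and the number of rows examined.
    """
    checked = 0
    for row in rows:
        checked += 1
        if row[0] == part_id:
            return row, checked
    return None, checked


def seek_sorted(rows_sorted, part_id):
    """Binary-search rows that are kept sorted by part_id."""
    # (part_id,) sorts just before any (part_id, name) tuple with the same id.
    pos = bisect_left(rows_sorted, (part_id,))
    if pos < len(rows_sorted) and rows_sorted[pos][0] == part_id:
        return rows_sorted[pos]
    return None


# Sorting by key plays the role of the clustered index in this sketch.
clustered_rows = sorted(heap_rows)
```

With five rows the difference is negligible, but the scan examines a number of rows proportional to the table size, while the binary search needs only about log2(n) comparisons.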

![Clustered-Index](SQL-Server-Clustered-B-Tree.png)

A clustered index organizes data using a special structure called a B-tree (or balanced tree), which enables searches, inserts, updates, and deletes in logarithmic time.

In this structure, the top node of the B-tree is called the root node. The nodes at the bottom level are called the leaf nodes. Any index levels between the root and the leaf nodes are known as intermediate levels.

In the B-Tree, the root node and intermediate-level nodes contain index pages that hold index rows. The leaf nodes contain the data pages of the underlying table. The pages in each level of the index are linked using another structure called a doubly-linked list.
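The layout described above can be sketched in a few lines of Python. This is a simplified, read-only toy with a single root page directly over the leaf pages (no intermediate levels, no inserts or page splits), and it assumes a non-empty table. The class and function names (`LeafPage`, `IndexPage`, `build_two_level_tree`, `seek`, `range_scan`) are hypothetical and do not correspond to anything in SQL Server.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LeafPage:
    """Leaf level: holds the actual data rows, sorted by key."""
    rows: list
    prev: Optional["LeafPage"] = field(default=None, repr=False, compare=False)
    next: Optional["LeafPage"] = field(default=None, repr=False, compare=False)


@dataclass
class IndexPage:
    """Root/intermediate level: separator keys that route to child pages."""
    keys: list       # children[i + 1] holds keys >= keys[i]
    children: list


def build_two_level_tree(rows, page_size=2):
    """Split sorted rows into leaf pages and put one root page above them."""
    rows = sorted(rows)
    leaves = [LeafPage(rows[i:i + page_size])
              for i in range(0, len(rows), page_size)]
    # Link neighbouring leaf pages in both directions (doubly-linked list).
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    separators = [leaf.rows[0][0] for leaf in leaves[1:]]
    return IndexPage(separators, leaves)


def _find_leaf(root, key):
    """Walk from the root down to the leaf page that could contain key."""
    node = root
    while isinstance(node, IndexPage):
        node = node.children[bisect_right(node.keys, key)]
    return node


def seek(root, key):
    """Return the value stored under key, or None if it is absent."""
    for k, value in _find_leaf(root, key).rows:
        if k == key:
            return value
    return None


def range_scan(root, low, high):
    """Collect values with low <= key <= high by following leaf links."""
    node = _find_leaf(root, low)
    found = []
    while node is not None:
        for k, value in node.rows:
            if k > high:
                return found
            if k >= low:
                found.append(value)
        node = node.next
    return found


parts = [(1, "Frame"), (2, "Head Tube"), (3, "Handlebar Grip"),
         (4, "Shock Absorber"), (5, "Fork")]
tree = build_two_level_tree(parts)
```

Here `seek(tree, 5)` touches the root page and one leaf page, and `range_scan(tree, 2, 4)` moves sideways through the linked leaf pages instead of returning to the root for each key.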

##### SQL Server Clustered Index and Primary Key Constraint

When you create a table with a primary key, SQL Server by default creates a corresponding clustered index on the primary key columns (unless the constraint is declared NONCLUSTERED).

This statement creates a new table named production.part_prices with a primary key that includes two columns: part_id and valid_from.

In [4]:
cursor.execute('''
CREATE TABLE production.part_prices(
    part_id int,
    valid_from date,
    price decimal(18,4) not null,
    PRIMARY KEY(part_id, valid_from) 
);
''')

<pyodbc.Cursor at 0x1755fe020b0>
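One way to check which indexes a table actually has is to query the `sys.indexes` catalog view. The helper below is a small sketch, not part of the original notebook: the name `list_indexes` is hypothetical, and it assumes a pyodbc-style cursor like the one created earlier.

```python
def list_indexes(cursor, table_name):
    """Return (name, type_desc, is_primary_key) for each index on a table.

    cursor is assumed to be a pyodbc-style cursor; table_name is a
    schema-qualified name such as 'production.part_prices'.
    """
    cursor.execute(
        """
        SELECT name, type_desc, is_primary_key
        FROM sys.indexes
        WHERE object_id = OBJECT_ID(?)
        ORDER BY index_id;
        """,
        table_name,
    )
    return [tuple(row) for row in cursor.fetchall()]


# Example call (needs the live connection from the first cell):
# list_indexes(cursor, 'production.part_prices')
```

For a table created with a primary key, the result should contain a row whose `type_desc` is `CLUSTERED`. A heap is reported as a single row with `type_desc` equal to `HEAP` and no index name.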

When a table does not have a primary key, which is very rare, you can use the CREATE CLUSTERED INDEX statement to add a clustered index to it.

In [5]:
cursor.execute('''
CREATE CLUSTERED INDEX ix_parts_id
ON production.parts (part_id);  
''')

<pyodbc.Cursor at 0x1755fe020b0>

With the clustered index in place, SQL Server answers the earlier query on part_id = 5 by traversing the index (Clustered Index Seek) to locate the row, which is faster than scanning the whole table.

#### Points to remember:

    A clustered index physically organizes the data in a table according to the index key.
    When creating a table with a primary key, SQL Server by default creates a clustered index based on the primary key columns.
    A table can have only one clustered index.
    Use the CREATE CLUSTERED INDEX statement to create a new clustered index for a table.

#### Non-Clustered Indexes:

A nonclustered index is a data structure that improves the speed of data retrieval from tables. Unlike a clustered index, a nonclustered index sorts and stores data separately from the data rows in the table. It is a copy of selected columns of data from a table with the links to the associated table.

Like a clustered index, a nonclustered index uses the B-tree structure to organize its data.

A table may have one or more nonclustered indexes, and each nonclustered index may include one or more columns of the table.

Besides storing the index key values, the leaf nodes also store row pointers to the data rows that contain the key values. These row pointers are also known as row locators.

If the underlying table is a clustered table, the row pointer is the clustered index key. In case the underlying table is a heap, the row pointer points to the row of the table.
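The role of the row locator can be illustrated with a small Python sketch. It models only the clustered-table case, where the locator is the clustered index key. The dictionary `clustered`, the functions `build_nonclustered` and `seek_by_value`, and the tiny data set are hypothetical stand-ins, not SQL Server internals.

```python
from bisect import bisect_left

# Hypothetical clustered table: rows addressed by their clustered key (customer_id).
clustered = {
    1: {"customer_id": 1, "city": "Orchard Park"},
    2: {"customer_id": 2, "city": "Campbell"},
    3: {"customer_id": 3, "city": "Redondo Beach"},
    4: {"customer_id": 4, "city": "Uniondale"},
    5: {"customer_id": 5, "city": "Sacramento"},
}


def build_nonclustered(table, column):
    """Build sorted (index key, row locator) entries, stored apart from the rows."""
    return sorted((row[column], key) for key, row in table.items())


def seek_by_value(index, table, value):
    """Find entries whose index key equals value, then follow each locator."""
    pos = bisect_left(index, (value,))
    matches = []
    while pos < len(index) and index[pos][0] == value:
        locator = index[pos][1]          # here: the clustered index key
        matches.append(table[locator])   # lookup of the full row in the table
        pos += 1
    return matches


ix_city = build_nonclustered(clustered, "city")
```

`seek_by_value(ix_city, clustered, "Campbell")` first finds the entry in the sorted index and then uses the stored customer_id to fetch the full row, which mirrors the two-step lookup described above.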

##### 1) Using the CREATE INDEX statement to create a nonclustered index on one column

In [6]:
cursor.execute('''
SELECT 
    customer_id, 
    city
FROM 
    sales.customers
WHERE 
    city = 'Atwater';
''')

<pyodbc.Cursor at 0x1755fe020b0>

In [7]:
cursor.execute('''
CREATE INDEX ix_customers_city
ON sales.customers(city);
''')

<pyodbc.Cursor at 0x1755fe020b0>

##### 2) Using the CREATE INDEX statement to create a nonclustered index for multiple columns

In [8]:
cursor.execute('''
SELECT 
    customer_id, 
    first_name, 
    last_name
FROM 
    sales.customers
WHERE 
    last_name = 'Berg' AND 
    first_name = 'Monika';
''')

<pyodbc.Cursor at 0x1755fe020b0>

In [9]:
cursor.execute('''
CREATE INDEX ix_customers_name 
ON sales.customers(last_name, first_name);
''')

<pyodbc.Cursor at 0x1755fe020b0>
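Because a multi-column index is ordered by its columns from left to right, the column order in `(last_name, first_name)` matters. The following pure-Python sketch is only an illustration under that assumption: the list `ix_name` and the three lookup functions are hypothetical, and the rows are taken from the sample output of sales.customers.

```python
from bisect import bisect_left

# (customer_id, first_name, last_name) rows from the sample data.
customers = [
    (1, "Debra", "Burks"),
    (2, "Kasha", "Todd"),
    (3, "Tameka", "Fisher"),
    (4, "Daryl", "Spence"),
    (5, "Charolette", "Rice"),
]

# Index entries ordered by last_name, then first_name; customer_id is the locator.
ix_name = sorted((last, first, cid) for cid, first, last in customers)


def seek_by_last_name(index, last_name):
    """The leading column is sorted, so a binary search finds the start."""
    pos = bisect_left(index, (last_name,))
    ids = []
    while pos < len(index) and index[pos][0] == last_name:
        ids.append(index[pos][2])
        pos += 1
    return ids


def seek_by_full_name(index, last_name, first_name):
    """Both columns form a sorted prefix, so a binary search still works."""
    pos = bisect_left(index, (last_name, first_name))
    ids = []
    while pos < len(index) and index[pos][:2] == (last_name, first_name):
        ids.append(index[pos][2])
        pos += 1
    return ids


def find_by_first_name(index, first_name):
    """Entries are not ordered by first_name alone, so every entry is checked."""
    return [cid for _last, first, cid in index if first == first_name]
```

Lookups on last_name, or on last_name together with first_name, can jump straight to the right position. A lookup on first_name alone has to walk every entry, which is why the most frequently filtered column is usually placed first.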

In [10]:
cursor.execute('''
SELECT * FROM sales.customers
''')


# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)
df.head(10)

|   | customer_id | first_name | last_name | phone | email | street | city | state | zip_code |
|---|-------------|------------|-----------|-------|-------|--------|------|-------|----------|
| 0 | 1 | Debra | Burks | | debra.burks@yahoo.com | 9273 Thorne Ave. | Orchard Park | NY | 14127 |
| 1 | 2 | Kasha | Todd | | kasha.todd@yahoo.com | 910 Vine Street | Campbell | CA | 95008 |
| 2 | 3 | Tameka | Fisher | | tameka.fisher@aol.com | 769C Honey Creek St. | Redondo Beach | CA | 90278 |
| 3 | 4 | Daryl | Spence | | daryl.spence@aol.com | 988 Pearl Lane | Uniondale | NY | 11553 |
| 4 | 5 | Charolette | Rice | (916) 381-6003 | charolette.rice@msn.com | 107 River Dr. | Sacramento | CA | 95820 |
| 5 | 6 | Lyndsey | Bean | | lyndsey.bean@hotmail.com | 769 West Road | Fairport | NY | 14450 |
| 6 | 7 | Latasha | Hays | (716) 986-3359 | latasha.hays@hotmail.com | 7014 Manor Station Rd. | Buffalo | NY | 14215 |
| 7 | 8 | Jacquline | Duncan | | jacquline.duncan@yahoo.com | 15 Brown St. | Jackson Heights | NY | 11372 |
| 8 | 9 | Genoveva | Baldwin | | genoveva.baldwin@msn.com | 8550 Spruce Drive | Port Washington | NY | 11050 |
| 9 | 10 | Pamelia | Newman | | pamelia.newman@gmail.com | 476 Chestnut Ave. | Monroe | NY | 10950 |
