# SQL in Python: AdventureWorks Database Business Analysis

Welcome to my project: SQL in Python - AdventureWorks Database Business Analysis. In this project, I use Python to run SQL queries against the AdventureWorks database (the `AdventureWorksDW2019` data warehouse on SQL Server) and analyze the results to gain insights into business operations.

Join me on this exciting journey as I explore the data, analyze trends, and uncover key patterns to drive informed decision-making. Let's dive into the world of AdventureWorks and unleash the potential of SQL in Python! 🚀


## Python Code for SQL Data AnalysisThe  code snippet that performs SQL data analysis in Python using libraries such as `pyodbc` and `pandas`.

In [1]:
```python
import pyodbc
import pandas as pd
```

In [2]:
```python
# List installed ODBC drivers whose names start with "Microsoft Access Driver"
[x for x in pyodbc.drivers() if x.startswith("Microsoft Access Driver")]
```

```
['Microsoft Access Driver (*.mdb, *.accdb)']
```

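The check above looks for Microsoft Access drivers, but the connection string in the next cell uses `ODBC Driver 17 for SQL Server`. Below is a minimal sketch of a check that matches the driver actually used. It assumes SQL Server drivers follow the `ODBC Driver <N> for SQL Server` naming pattern. `pick_sql_server_driver` is a hypothetical helper, not part of `pyodbc`; in the notebook you would pass it the result of `pyodbc.drivers()`. If it picks ODBC Driver 18, note that this driver encrypts connections by default, so a local server with a self-signed certificate may also need `TrustServerCertificate=yes` in the connection string.

```python
import re


def pick_sql_server_driver(drivers):
    """Return the newest 'ODBC Driver <N> for SQL Server' name in drivers, or None.

    `drivers` is a list of driver names, e.g. the output of pyodbc.drivers().
    """
    pattern = re.compile(r"^ODBC Driver (\d+) for SQL Server$")
    candidates = []
    for name in drivers:
        match = pattern.match(name)
        if match:
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    # Tuples compare by version number first, so the highest N wins
    return max(candidates)[1]


# Hard-coded list for illustration; in the notebook use pyodbc.drivers()
installed = [
    "Microsoft Access Driver (*.mdb, *.accdb)",
    "ODBC Driver 17 for SQL Server",
    "ODBC Driver 18 for SQL Server",
]
print(pick_sql_server_driver(installed))  # ODBC Driver 18 for SQL Server
```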
In [3]:
```python
# Define the connection string
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=localhost;'
    r'DATABASE=AdventureWorksDW2019;'
    r'Trusted_Connection=yes;'
)
```

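An ODBC connection string is a series of `KEY=value;` pairs. Values that contain spaces or ODBC delimiter characters (such as the driver name) are wrapped in braces. Below is a minimal sketch that builds the same string from a dictionary, so the server or database can be changed without editing the literal. `build_conn_str` is a hypothetical helper that applies a conservative bracing rule; the result would be passed to `pyodbc.connect(...)` as before.

```python
def build_conn_str(params):
    """Join ODBC key/value pairs into a connection string.

    Values containing spaces or ODBC delimiter characters are wrapped in
    braces, and any closing brace inside such a value is doubled.
    """
    special = set(" ;{}[](),?*=!@")
    parts = []
    for key, value in params.items():
        value = str(value)
        if any(ch in special for ch in value):
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value};")
    return "".join(parts)


conn_str = build_conn_str({
    "DRIVER": "ODBC Driver 17 for SQL Server",
    "SERVER": "localhost",
    "DATABASE": "AdventureWorksDW2019",
    "Trusted_Connection": "yes",
})
# Same string as the literal in the cell above
print(conn_str)
```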
In [4]:
```python
# Establish the connection
conn = pyodbc.connect(conn_str)

# Create a cursor
cursor = conn.cursor()
```

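In this notebook the connection stays open until the final cell calls `conn.close()`. If a query in between raises an error, the connection can be left open. Below is a minimal sketch of scoping the connection with `contextlib.closing` instead. It uses Python's built-in `sqlite3` as a stand-in so it runs without SQL Server; with pyodbc the same pattern is `closing(pyodbc.connect(conn_str))`. `closing()` is needed because, in both sqlite3 and pyodbc, using the connection object itself in a `with` block ends the transaction but does not close the connection.

```python
import sqlite3
from contextlib import closing

# sqlite3 stands in for pyodbc; both follow the Python DB-API (PEP 249)
with closing(sqlite3.connect(":memory:")) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT 1 + 1")
    result = cursor.fetchone()[0]
    print(result)  # 2
# conn.close() has been called here, even if a query above had raised

try:
    conn.execute("SELECT 1")
    is_closed = False
except sqlite3.ProgrammingError:
    is_closed = True
print("connection closed:", is_closed)  # connection closed: True
```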


The relevant tables are:

- `FactInternetSales`
- `DimProduct`
- `DimProductCategory`
- `DimProductSubcategory`
- `DimCustomer`
- `DimGeography`
- `DimReseller`
- `DimSalesTerritory`
- `DimEmployee`
- `DimDate`

### Total Internet sales amount

In [5]:
```python
# execute a query
cursor.execute('''SELECT SUM(SalesAmount) AS [Total Sales Amount]
FROM FactInternetSales;''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
total_internet_sales_amount = pd.DataFrame(data)
total_internet_sales_amount.head()
```

| | Total Sales Amount |
|---|---|
| 0 | 29358677.2207 |

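Every query cell in this notebook repeats the same steps: execute, fetch all rows, read the column names from `cursor.description`, then build a DataFrame. Below is a minimal sketch that factors those steps into one function. `query_to_records` is a hypothetical helper name, and a toy `sqlite3` table with invented values stands in for the SQL Server connection so the example runs on its own. In the notebook, the returned list can be passed straight to `pd.DataFrame(...)`. pandas also offers `pd.read_sql(query, conn)`, although recent versions warn when given a raw pyodbc connection instead of a SQLAlchemy engine.

```python
import sqlite3


def query_to_records(cursor, sql):
    """Run `sql` and return the rows as a list of {column name: value} dicts."""
    cursor.execute(sql)
    # cursor.description holds one tuple per result column; item 0 is the name
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Toy stand-in for FactInternetSales (values invented for illustration)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE FactInternetSales (SalesAmount REAL)")
cursor.executemany("INSERT INTO FactInternetSales VALUES (?)", [(10.0,), (2.5,)])

records = query_to_records(
    cursor, "SELECT SUM(SalesAmount) AS [Total Sales Amount] FROM FactInternetSales"
)
print(records)  # [{'Total Sales Amount': 12.5}]
conn.close()
```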

### Total Order Quantity

In [6]:
```python
# execute a query
cursor.execute('''SELECT SUM(OrderQuantity) AS [Total Order Quantity]
FROM FactInternetSales;''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
total_order_quantity = pd.DataFrame(data)
total_order_quantity.head()
```

| | Total Order Quantity |
|---|---|
| 0 | 60398 |

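Each headline total above is fetched with its own query, so each one is a separate round trip to the server. Below is a sketch of computing sales amount, order quantity and profit in a single `SELECT`. The column names mirror `FactInternetSales`, but the `sqlite3` table and its rows are invented stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactInternetSales "
    "(SalesAmount REAL, OrderQuantity INTEGER, TotalProductCost REAL)"
)
conn.executemany(
    "INSERT INTO FactInternetSales VALUES (?, ?, ?)",
    [(100.0, 1, 60.0), (50.0, 2, 20.0)],
)

# One scan of the fact table returns all three headline figures
row = conn.execute(
    """SELECT SUM(SalesAmount)                         AS total_sales,
              SUM(OrderQuantity)                       AS total_quantity,
              SUM(SalesAmount) - SUM(TotalProductCost) AS total_profit
       FROM FactInternetSales"""
).fetchone()
total_sales, total_quantity, total_profit = row
print(total_sales, total_quantity, total_profit)  # 150.0 3 70.0
conn.close()
```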

### Total profit

In [7]:
```python
# execute a query
cursor.execute('''SELECT SUM(SalesAmount) - SUM(TotalProductCost) AS [Total Profit]
FROM FactInternetSales''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
total_profit = pd.DataFrame(data)
total_profit.head()
```

| | Total Profit |
|---|---|
| 0 | 12080883.645 |

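Profit is easier to compare across periods when expressed as a margin, that is, profit divided by sales amount. Below is a small worked calculation using the two totals reported above: 29,358,677.2207 in sales and 12,080,883.645 in profit.

```python
total_sales = 29358677.2207   # Total Sales Amount from the query above
total_profit = 12080883.645   # Total Profit from the query above

# Margin = profit / revenue; total product cost is what remains of revenue
profit_margin = total_profit / total_sales
total_cost = total_sales - total_profit
print(f"Profit margin: {profit_margin:.1%}")  # Profit margin: 41.1%
print(f"Total product cost: {total_cost:,.2f}")
```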

### Total number of distinct products

In [8]:
```python
# execute a query: count distinct product names across all of DimProduct
cursor.execute('''SELECT COUNT(DISTINCT EnglishProductName) As [Total Number of Products]
FROM DimProduct;''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
total_number_of_products = pd.DataFrame(data)
total_number_of_products.head()
```

| | Total Number of Products |
|---|---|
| 0 | 504 |

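The query above counts distinct product names across the whole `DimProduct` table. Breaking the count down by category requires joining the product, subcategory and category dimensions. Below is a sketch using toy `sqlite3` tables that mirror the AdventureWorks key columns; the rows are invented. The `LEFT JOIN`s keep products that have no subcategory, and these land in a `NULL` category group. The SQL itself is assumed to carry over to SQL Server unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProductCategory (ProductCategoryKey INTEGER, EnglishProductCategoryName TEXT);
CREATE TABLE DimProductSubcategory (ProductSubcategoryKey INTEGER, ProductCategoryKey INTEGER);
CREATE TABLE DimProduct (ProductKey INTEGER, EnglishProductName TEXT, ProductSubcategoryKey INTEGER);
INSERT INTO DimProductCategory VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO DimProductSubcategory VALUES (10, 1), (20, 2);
INSERT INTO DimProduct VALUES
  (1, 'Road-150', 10), (2, 'Road-150', 10),  -- same name twice, counted once
  (3, 'Mountain-100', 10), (4, 'Bike Wash', 20), (5, 'Blade', NULL);
""")

rows = conn.execute("""
SELECT pc.EnglishProductCategoryName AS Category,
       COUNT(DISTINCT p.EnglishProductName) AS Products
FROM DimProduct AS p
LEFT JOIN DimProductSubcategory AS ps ON ps.ProductSubcategoryKey = p.ProductSubcategoryKey
LEFT JOIN DimProductCategory AS pc ON pc.ProductCategoryKey = ps.ProductCategoryKey
GROUP BY pc.EnglishProductCategoryName
""").fetchall()

# Map category name (None for products without a subcategory) to product count
counts = dict(rows)
print(counts)
conn.close()
```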

### Top 10 customers with highest order amount

In [9]:
```python
# execute a query
cursor.execute('''-- Top 10 customers
SELECT TOP 10 CONCAT(LastName, ' ', FirstName) AS [Customer], 
SUM(SalesAmount) AS [Total Sales]
FROM FactInternetSales F
INNER JOIN DimCustomer D
ON F.CustomerKey = D.CustomerKey
GROUP BY CONCAT(LastName, ' ', FirstName)
ORDER BY [Total Sales] DESC; ''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
top_highest_order = pd.DataFrame(data)
# head() displays the first 5 of the 10 rows returned
top_highest_order.head()
```

| | Customer | Total Sales |
|---|---|---|
| 0 | Turner Jordan | 15999.0996 |
| 1 | Xu Willie | 13490.0596 |
| 2 | Nara Nichole | 13295.38 |
| 3 | Henderson Kaitlyn | 13294.27 |
| 4 | He Margaret | 13269.27 |

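Grouping by the concatenated name merges any two customers who share a first and last name into a single row, which inflates their total. Below is a sketch with toy `sqlite3` tables (invented rows) contrasting the two approaches. Grouping by `CustomerKey`, and attaching the name only for display, keeps such customers apart. In this SQLite stand-in, `||` replaces T-SQL `CONCAT` and `LIMIT` replaces `TOP`. On SQL Server the name columns must also appear in the `GROUP BY`, as they do here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE FactInternetSales (CustomerKey INTEGER, SalesAmount REAL);
INSERT INTO DimCustomer VALUES (1, 'Jordan', 'Turner'), (2, 'Jordan', 'Turner'), (3, 'Willie', 'Xu');
INSERT INTO FactInternetSales VALUES (1, 100.0), (2, 80.0), (3, 150.0);
""")

# Grouping by name: the two different Jordan Turners are merged
by_name = conn.execute("""
SELECT D.LastName || ' ' || D.FirstName AS Customer, SUM(F.SalesAmount) AS TotalSales
FROM FactInternetSales AS F
INNER JOIN DimCustomer AS D ON F.CustomerKey = D.CustomerKey
GROUP BY D.LastName || ' ' || D.FirstName
ORDER BY TotalSales DESC
LIMIT 10
""").fetchall()

# Grouping by key: each customer keeps their own total
by_key = conn.execute("""
SELECT D.CustomerKey, D.LastName || ' ' || D.FirstName AS Customer,
       SUM(F.SalesAmount) AS TotalSales
FROM FactInternetSales AS F
INNER JOIN DimCustomer AS D ON F.CustomerKey = D.CustomerKey
GROUP BY D.CustomerKey, D.LastName, D.FirstName
ORDER BY TotalSales DESC
LIMIT 10
""").fetchall()

print(by_name)  # [('Turner Jordan', 180.0), ('Xu Willie', 150.0)]
print(by_key)   # [(3, 'Xu Willie', 150.0), (1, 'Turner Jordan', 100.0), (2, 'Turner Jordan', 80.0)]
conn.close()
```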

### Total sales by year

In [10]:
```python
# execute a query
cursor.execute('''-- Total sales by year
SELECT YEAR(OrderDate) AS Year, SUM(SalesAmount) AS [Total Sales]
FROM FactInternetSales
GROUP BY YEAR(OrderDate)
ORDER BY YEAR(OrderDate) DESC;''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
total_sales_per_year = pd.DataFrame(data)
total_sales_per_year.head()
```

| | Year | Total Sales |
|---|---|---|
| 0 | 2014 | 45694.72 |
| 1 | 2013 | 16351550.34 |
| 2 | 2012 | 5842485.1952 |
| 3 | 2011 | 7075525.9291 |
| 4 | 2010 | 43421.0364 |

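The yearly totals are easier to interpret as year-over-year change. Below is a small sketch that computes it from the figures returned above. The very small 2010 and 2014 totals most likely reflect years that the data covers only partially, so growth rates involving those years are not like-for-like comparisons. On the SQL Server side, the same calculation could be done with the `LAG()` window function.

```python
# Yearly totals from the query output above
sales_by_year = {
    2010: 43421.0364,
    2011: 7075525.9291,
    2012: 5842485.1952,
    2013: 16351550.34,
    2014: 45694.72,
}

# Year-over-year change: (this year - previous year) / previous year
yoy_change = {}
years = sorted(sales_by_year)
for previous, current in zip(years, years[1:]):
    yoy_change[current] = (
        sales_by_year[current] - sales_by_year[previous]
    ) / sales_by_year[previous]

for year, change in yoy_change.items():
    print(f"{year}: {change:+.1%}")
```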

### Sales by regions

In [11]:
```python
# execute a query
cursor.execute('''-- Sales by regions
SELECT D.SalesTerritoryRegion AS [Region], SUM(F.SalesAmount) AS [Total Sales]
FROM FactInternetSales F
INNER JOIN DimSalesTerritory D
ON F.SalesTerritoryKey = D.SalesTerritoryKey
GROUP BY D.SalesTerritoryRegion
ORDER BY D.SalesTerritoryRegion;''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
sales_by_region = pd.DataFrame(data)
sales_by_region.head()
```

| | Region | Total Sales |
|---|---|---|
| 0 | Australia | 9061000.5844 |
| 1 | Canada | 1977844.8621 |
| 2 | Central | 3000.8296 |
| 3 | France | 2644017.7143 |
| 4 | Germany | 2894312.3382 |

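Raw regional totals are easier to compare as a share of overall sales. Below is a sketch with a toy `sqlite3` table (invented rows) that divides each region's total by the grand total, computed in a scalar subquery. Multiplying by `100.0` forces decimal arithmetic. This matters in both SQLite and SQL Server when the amounts are integers, since integer division would otherwise truncate the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactInternetSales (SalesTerritoryKey INTEGER, SalesAmount INTEGER);
CREATE TABLE DimSalesTerritory (SalesTerritoryKey INTEGER, SalesTerritoryRegion TEXT);
INSERT INTO DimSalesTerritory VALUES (1, 'Australia'), (2, 'Canada');
INSERT INTO FactInternetSales VALUES (1, 300), (1, 100), (2, 100);
""")

rows = conn.execute("""
SELECT D.SalesTerritoryRegion AS Region,
       SUM(F.SalesAmount) AS TotalSales,
       ROUND(100.0 * SUM(F.SalesAmount)
             / (SELECT SUM(SalesAmount) FROM FactInternetSales), 1) AS PctOfTotal
FROM FactInternetSales AS F
INNER JOIN DimSalesTerritory AS D ON F.SalesTerritoryKey = D.SalesTerritoryKey
GROUP BY D.SalesTerritoryRegion
ORDER BY TotalSales DESC
""").fetchall()
print(rows)  # [('Australia', 400, 80.0), ('Canada', 100, 20.0)]
conn.close()
```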

### Extract data with SQL queries

The source tables include many columns that this analysis does not need. The following SQL queries extract only the required columns and prepare new tables, which will be loaded into Power BI for further transformation and analysis.

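Power BI can import CSV files, so one way to hand these extracts over is to write each result set to a file. Below is a minimal sketch using the standard `csv` module on the same list-of-dicts shape the cells below build before creating each DataFrame. `write_records_csv` and the example file name are hypothetical; with pandas, the equivalent is `DataFrame.to_csv(path, index=False)`. Power BI can also connect to SQL Server directly, in which case no export step is needed.

```python
import csv
import io


def write_records_csv(records, file_obj):
    """Write a list of dicts (one per row) to file_obj as CSV with a header row."""
    if not records:
        return
    writer = csv.DictWriter(file_obj, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


# Example rows shaped like the Dim_Calendar extract
records = [
    {"DateKey": 20050101, "Date": "2005-01-01", "Day": "Saturday"},
    {"DateKey": 20050102, "Date": "2005-01-02", "Day": "Sunday"},
]

# io.StringIO keeps the demo in memory; for a real export use
# open("dim_calendar.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_records_csv(records, buffer)
print(buffer.getvalue())
```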

#### Dim_Calendar

In [12]:
```python
# execute a query
cursor.execute('''-- Dim_Date Table --
SELECT 
  [DateKey], 
  [FullDateAlternateKey] AS Date, 
  [EnglishDayNameOfWeek] AS Day, 
  [EnglishMonthName] AS Month, 
  Left([EnglishMonthName], 3) AS MonthShort,   
  [MonthNumberOfYear] AS MonthNo, 
  [CalendarQuarter] AS Quarter, 
  [CalendarYear] AS Year 
FROM 
 [AdventureWorksDW2019].[dbo].[DimDate]
''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df_dim_calendar = pd.DataFrame(data)
df_dim_calendar.head()
```

| | DateKey | Date | Day | Month | MonthShort | MonthNo | Quarter | Year |
|---|---|---|---|---|---|---|---|---|
| 0 | 20050101 | 2005-01-01 | Saturday | January | Jan | 1 | 1 | 2005 |
| 1 | 20050102 | 2005-01-02 | Sunday | January | Jan | 1 | 1 | 2005 |
| 2 | 20050103 | 2005-01-03 | Monday | January | Jan | 1 | 1 | 2005 |
| 3 | 20050104 | 2005-01-04 | Tuesday | January | Jan | 1 | 1 | 2005 |
| 4 | 20050105 | 2005-01-05 | Wednesday | January | Jan | 1 | 1 | 2005 |

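`DateKey` is an integer in `YYYYMMDD` form, and every other column in this extract can be derived from it. Below is a sketch that rebuilds those columns in plain Python, which can be handy for spot-checking the `DimDate` output. `calendar_row` is a hypothetical helper. Day and month names come from fixed English lists rather than `strftime`, so the result does not depend on the machine's locale.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def calendar_row(date_key):
    """Derive the Dim_Calendar columns from an integer YYYYMMDD date key."""
    year, rest = divmod(date_key, 10000)
    month, day = divmod(rest, 100)
    d = date(year, month, day)  # raises ValueError for impossible dates
    month_name = MONTH_NAMES[month - 1]
    return {
        "DateKey": date_key,
        "Date": d,
        "Day": DAY_NAMES[d.weekday()],   # weekday(): Monday == 0
        "Month": month_name,
        "MonthShort": month_name[:3],    # same as LEFT([EnglishMonthName], 3)
        "MonthNo": month,
        "Quarter": (month - 1) // 3 + 1,
        "Year": year,
    }


print(calendar_row(20050101))
```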

#### Dim_Products

In [13]:
```python
# execute a query
cursor.execute('''-- Dim_Products Table --
SELECT 
  p.[ProductKey], 
  p.[ProductAlternateKey] AS ProductItemCode, 
  p.[EnglishProductName] AS [Product Name], 
  ps.EnglishProductSubcategoryName AS [Sub Category], -- Joined in from Sub Category Table
  pc.EnglishProductCategoryName AS [Product Category], -- Joined in from Category Table
  p.[Color] AS [Product Color], 
  p.[Size] AS [Product Size], 
  p.[ProductLine] AS [Product Line],
  p.[ModelName] AS [Product Model Name],
  p.[EnglishDescription] AS [Product Description],
  ISNULL (p.Status, 'Outdated') AS [Product Status] 
FROM 
  [AdventureWorksDW2019].[dbo].[DimProduct] as p
  LEFT JOIN dbo.DimProductSubcategory AS ps ON ps.ProductSubcategoryKey = p.ProductSubcategoryKey 
  LEFT JOIN dbo.DimProductCategory AS pc ON ps.ProductCategoryKey = pc.ProductCategoryKey 
order by 
  p.ProductKey asc''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df_dim_products = pd.DataFrame(data)
df_dim_products.head()
```

| | ProductKey | ProductItemCode | Product Name | Sub Category | Product Category | Product Color | Product Size | Product Line | Product Model Name | Product Description | Product Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | AR-5381 | Adjustable Race | | | | | | | | Current |
| 1 | 2 | BA-8327 | Bearing Ball | | | | | | | | Current |
| 2 | 3 | BE-2349 | BB Ball Bearing | | | | | | | | Current |
| 3 | 4 | BE-2908 | Headset Ball Bearings | | | | | | | | Current |
| 4 | 5 | BL-2036 | Blade | | | | | | | | Current |

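The `LEFT JOIN`s keep products that have no subcategory, such as the components in the output above whose category columns are empty. `ISNULL(p.Status, 'Outdated')` labels every product whose `Status` is `NULL`. Below is a sketch with toy `sqlite3` tables (invented rows) that demonstrates both behaviours. SQLite has no two-argument `ISNULL()` function, so the sketch uses the standard `COALESCE`, which SQL Server also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProductCategory (ProductCategoryKey INTEGER, EnglishProductCategoryName TEXT);
CREATE TABLE DimProductSubcategory (ProductSubcategoryKey INTEGER, ProductCategoryKey INTEGER,
                                    EnglishProductSubcategoryName TEXT);
CREATE TABLE DimProduct (ProductKey INTEGER, EnglishProductName TEXT,
                         ProductSubcategoryKey INTEGER, Status TEXT);
INSERT INTO DimProductCategory VALUES (1, 'Bikes');
INSERT INTO DimProductSubcategory VALUES (2, 1, 'Road Bikes');
INSERT INTO DimProduct VALUES
  (1, 'Adjustable Race', NULL, 'Current'),  -- no subcategory
  (2, 'Road-150', 2, NULL);                 -- status missing
""")

rows = conn.execute("""
SELECT p.ProductKey,
       ps.EnglishProductSubcategoryName AS SubCategory,
       pc.EnglishProductCategoryName AS Category,
       COALESCE(p.Status, 'Outdated') AS Status
FROM DimProduct AS p
LEFT JOIN DimProductSubcategory AS ps ON ps.ProductSubcategoryKey = p.ProductSubcategoryKey
LEFT JOIN DimProductCategory AS pc ON ps.ProductCategoryKey = pc.ProductCategoryKey
ORDER BY p.ProductKey
""").fetchall()
print(rows)  # [(1, None, None, 'Current'), (2, 'Road Bikes', 'Bikes', 'Outdated')]
conn.close()
```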

#### Fact_InternetSales

In [14]:
```python
# execute a query
cursor.execute('''-- Fact_InternetSales Table --
SELECT 
  [ProductKey], 
  [OrderDateKey], 
  [DueDateKey], 
  [ShipDateKey], 
  [CustomerKey], 
  [SalesOrderNumber], 
  [SalesAmount] 
FROM 
  [AdventureWorksDW2019].[dbo].[FactInternetSales]
WHERE 
  LEFT (OrderDateKey, 4) >= YEAR(GETDATE()) -10 
-- Keeps only the last ten years of data, counted back from the current year.
ORDER BY
  OrderDateKey ASC''')

# Fetch all rows from the executed query
rows = cursor.fetchall()

# Get the column names
columns = [column[0] for column in cursor.description]

# Convert the rows into a list of dictionaries
data = [dict(zip(columns, row)) for row in rows]

# Create a DataFrame from the list of dictionaries
df_fact_internetsales = pd.DataFrame(data)
df_fact_internetsales.head()
```

| | ProductKey | OrderDateKey | DueDateKey | ShipDateKey | CustomerKey | SalesOrderNumber | SalesAmount |
|---|---|---|---|---|---|---|---|
| 0 | 535 | 20140101 | 20140113 | 20140108 | 11051 | SO74253 | 24.99 |
| 1 | 528 | 20140101 | 20140113 | 20140108 | 11051 | SO74253 | 4.99 |
| 2 | 222 | 20140101 | 20140113 | 20140108 | 11051 | SO74253 | 34.99 |
| 3 | 535 | 20140101 | 20140113 | 20140108 | 11079 | SO74254 | 24.99 |
| 4 | 539 | 20140101 | 20140113 | 20140108 | 15154 | SO74255 | 24.99 |

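`LEFT(OrderDateKey, 4)` converts the integer key to text on every row, which prevents SQL Server from seeking on an index over `OrderDateKey`. Because the key is an integer in `YYYYMMDD` form, the condition `OrderDateKey >= (YEAR(GETDATE()) - 10) * 10000` should select the same rows without that conversion. Below is a sketch that checks the equivalence in Python; `cutoff_key`, `keep_by_prefix` and `keep_by_range` are hypothetical helpers. The filter also moves with the current date. Since the sales data shown in this notebook ends in 2014, running the query in 2025 or later would exclude all of it.

```python
def cutoff_key(current_year, years_back=10):
    """Lower bound of the window: (current_year - years_back) * 10000, e.g. 20140000."""
    return (current_year - years_back) * 10000


def keep_by_prefix(date_key, current_year, years_back=10):
    # Mirrors: LEFT(OrderDateKey, 4) >= YEAR(GETDATE()) - 10
    return int(str(date_key)[:4]) >= current_year - years_back


def keep_by_range(date_key, current_year, years_back=10):
    # Index-friendly form: OrderDateKey >= (YEAR(GETDATE()) - 10) * 10000
    return date_key >= cutoff_key(current_year, years_back)


keys = [20131231, 20140101, 20140115, 20150630]
for year in (2023, 2024, 2025):
    # Both filters keep exactly the same keys
    assert all(keep_by_prefix(k, year) == keep_by_range(k, year) for k in keys)

print(cutoff_key(2024))  # 20140000
```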

In [15]:
```python
# Close the connection
conn.close()
```