# **[Tutorial: From Excel workbook to a Power BI report in Microsoft Teams](https://learn.microsoft.com/en-us/power-bi/create-reports/service-from-excel-to-stunning-report)**

<img src='https://learn.microsoft.com/en-us/power-bi/create-reports/media/service-from-excel-to-stunning-report/power-bi-financial-report-service.png'>

### **[Sample data? Download here](https://github.com/microsoft/powerbi-desktop-samples/blob/main/AdventureWorks%20Sales%20Sample/AdventureWorks%20Sales.xlsx)**

In [1]:
# To read multiple sheets from a single Excel file, we will use the pandas library.
# Below is an example of how you can read all sheets from an Excel file into a dictionary of DataFrames.

import pandas as pd

# Function to read all sheets from an Excel file
def read_excel_sheets(excel_file):
    # Using sheet_name=None reads all sheets, each sheet as a DataFrame in a dictionary
    sheets_dict = pd.read_excel(excel_file, sheet_name=None)
    return sheets_dict

# Example call (the path is a placeholder):
# all_sheets = read_excel_sheets('/path/to/AdventureWorks Sales.xlsx')
# all_sheets is a dictionary with sheet names as keys and DataFrames as values.

In [32]:
excel_file = '/content/AdventureWorks Sales.xlsx'

sheets_dict = read_excel_sheets(excel_file)

sheets_dict

{'Sales Order_data':          Channel  SalesOrderLineKey Sales Order Sales Order Line
 0       Reseller           43659001     SO43659      SO43659 - 1
 1       Reseller           43659002     SO43659      SO43659 - 2
 2       Reseller           43659003     SO43659      SO43659 - 3
 3       Reseller           43659004     SO43659      SO43659 - 4
 4       Reseller           43659005     SO43659      SO43659 - 5
 ...          ...                ...         ...              ...
 121248  Internet           75122001     SO75122      SO75122 - 1
 121249  Internet           75122002     SO75122      SO75122 - 2
 121250  Internet           75123001     SO75123      SO75123 - 1
 121251  Internet           75123002     SO75123      SO75123 - 2
 121252  Internet           75123003     SO75123      SO75123 - 3
 
 [121253 rows x 4 columns],
 'Sales Territory_data':     SalesTerritoryKey          Region         Country          Group
 0                   1       Northwest   United States  North Am

In [33]:
sheets_dict.keys()

dict_keys(['Sales Order_data', 'Sales Territory_data', 'Sales_data', 'Reseller_data', 'Date_data', 'Product_data', 'Customer_data'])

<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/0*9UNgxNIu8-HgaxLa.png'>

In [34]:
for key, value in sheets_dict.items():
    print(key, type(value))

Sales Order_data <class 'pandas.core.frame.DataFrame'>
Sales Territory_data <class 'pandas.core.frame.DataFrame'>
Sales_data <class 'pandas.core.frame.DataFrame'>
Reseller_data <class 'pandas.core.frame.DataFrame'>
Date_data <class 'pandas.core.frame.DataFrame'>
Product_data <class 'pandas.core.frame.DataFrame'>
Customer_data <class 'pandas.core.frame.DataFrame'>


In [28]:
def eda(df):
    # Per-column summary: unique values, dtype, unique count, missing share, share of literal 'nan' strings
    return pd.DataFrame({'contents':{col:df[col].unique() for col in df},
                'dtypes':{col:df[col].dtype for col in df},
                'nunique':{col:len(df[col].unique()) for col in df},
                'missing (%)':{col:str(round(df[col].isna().mean()*100,2))+'%' for col in df},
                "'nan' string (%)":{col:int(df[df[col] == 'nan'].shape[0]/len(df)*100) for col in df}
                })
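
To see what this kind of per-column summary looks like, here is a minimal sketch on a small hypothetical frame (the values below are made up and are not taken from the workbook). It uses built-in pandas aggregations rather than the `eda` helper itself, so the column labels differ.

```python
import pandas as pd

# Hypothetical toy data, not taken from the AdventureWorks workbook
toy = pd.DataFrame({
    "Channel": ["Reseller", "Internet", "Internet", None],
    "Quantity": [1, 3, 3, 2],
})

# One row per column of `toy`: dtype, distinct values (NaN counted), share of missing cells
summary = pd.DataFrame({
    "dtype": toy.dtypes,
    "nunique": toy.nunique(dropna=False),
    "missing_pct": (toy.isna().mean() * 100).round(2),
})
print(summary)
```

Note that `nunique(dropna=False)` counts a missing value as one distinct value, which matches what `len(df[col].unique())` does.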

In [35]:
sheets_dict['Sales Order_data']

Unnamed: 0,Channel,SalesOrderLineKey,Sales Order,Sales Order Line
0,Reseller,43659001,SO43659,SO43659 - 1
1,Reseller,43659002,SO43659,SO43659 - 2
2,Reseller,43659003,SO43659,SO43659 - 3
3,Reseller,43659004,SO43659,SO43659 - 4
4,Reseller,43659005,SO43659,SO43659 - 5
...,...,...,...,...
121248,Internet,75122001,SO75122,SO75122 - 1
121249,Internet,75122002,SO75122,SO75122 - 2
121250,Internet,75123001,SO75123,SO75123 - 1
121251,Internet,75123002,SO75123,SO75123 - 2


In [None]:
for key, value in sheets_dict.items():
    print(key, type(key))
    display(sheets_dict[key].head())
    display(eda(sheets_dict[key]))

In [37]:
sheets_dict['Sales Order_data'].drop_duplicates()

Unnamed: 0,Channel,SalesOrderLineKey,Sales Order,Sales Order Line
0,Reseller,43659001,SO43659,SO43659 - 1
1,Reseller,43659002,SO43659,SO43659 - 2
2,Reseller,43659003,SO43659,SO43659 - 3
3,Reseller,43659004,SO43659,SO43659 - 4
4,Reseller,43659005,SO43659,SO43659 - 5
...,...,...,...,...
121248,Internet,75122001,SO75122,SO75122 - 1
121249,Internet,75122002,SO75122,SO75122 - 2
121250,Internet,75123001,SO75123,SO75123 - 1
121251,Internet,75123002,SO75123,SO75123 - 2


In [38]:
def clean_data(sheets_dict):
    cleaned_sheets = {}
    for sheet_name, df in sheets_dict.items():
        # Remove duplicates
        df = df.drop_duplicates()

        # Handle missing values; this could be different based on the sheet's context
        df = df.dropna()
        # Convert data types if necessary; this is highly dependent on the column context
        # for column in ['DateColumn']:
        #     df[column] = pd.to_datetime(df[column])
        # Rename columns if necessary
        # df.rename(columns={'oldName': 'newName'}, inplace=True)
        # Drop unnecessary columns
        # df.drop(columns=['UnnecessaryColumn'], inplace=True)
        # Store the cleaned data
        cleaned_sheets[sheet_name] = df
    return cleaned_sheets
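
Because `dropna()` removes every row that has a missing value in any column, it can help to check how many rows each sheet loses. The following is a sketch only: `rows_removed` is a hypothetical helper, and the two small frames stand in for the real sheets.

```python
import pandas as pd

def rows_removed(before, after):
    """Return a per-sheet table of row counts before and after cleaning."""
    report = pd.DataFrame({
        "rows_before": {name: len(df) for name, df in before.items()},
        "rows_after": {name: len(df) for name, df in after.items()},
    })
    report["rows_dropped"] = report["rows_before"] - report["rows_after"]
    return report

# Hypothetical stand-in for sheets_dict
raw = {
    "A_data": pd.DataFrame({"k": [1, 1, 2], "v": [10.0, 10.0, None]}),
    "B_data": pd.DataFrame({"k": [1, 2], "v": ["x", "y"]}),
}
# Same two steps as clean_data: drop duplicates, then drop rows with missing values
cleaned = {name: df.drop_duplicates().dropna() for name, df in raw.items()}

report = rows_removed(raw, cleaned)
print(report)
```

With the real data, the call would be `rows_removed(sheets_dict, clean_data(sheets_dict))`.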

In [39]:
cleaned_sheets_dict = clean_data(sheets_dict)
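
The commented-out `pd.to_datetime` step in `clean_data` could be filled in for the date-key columns. The sketch below assumes the keys are integers in `yyyymmdd` form (the `Date_data` output, where `DateKey` 20170701 sits next to `Date` 2017-07-01, suggests this). `key_to_datetime` is a hypothetical helper and the rows are made up to mirror the shape of `Sales_data`.

```python
import pandas as pd

# Hypothetical rows shaped like the Sales_data date-key columns
sales = pd.DataFrame({
    "OrderDateKey": [20170702, 20170702],
    "ShipDateKey": [20170709.0, None],  # float column because of the missing key
})

def key_to_datetime(keys):
    """Parse yyyymmdd keys into datetimes; missing keys become NaT."""
    valid = keys.dropna().astype("int64").astype(str)
    parsed = pd.to_datetime(valid, format="%Y%m%d")
    # Reindex back to the original rows so dropped positions reappear as NaT
    return parsed.reindex(keys.index)

sales["OrderDate"] = key_to_datetime(sales["OrderDateKey"])
sales["ShipDate"] = key_to_datetime(sales["ShipDateKey"])
print(sales.dtypes)
```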

In [None]:
for key, value in cleaned_sheets_dict.items():
    print(key)
    display(eda(cleaned_sheets_dict[key]))
    print('*'*100)

In [44]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [60]:
cleaned_sheets_dict['Customer_data']['Customer ID'].value_counts()

Customer ID
[Not Applicable]    1
AW00023419          1
AW00023327          1
AW00023326          1
AW00023325          1
                   ..
AW00017160          1
AW00017159          1
AW00017158          1
AW00017157          1
AW00029483          1
Name: count, Length: 18485, dtype: int64
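
The `[Not Applicable]` entry at the top of this output is a placeholder rather than a real customer ID. Instead of inspecting one column at a time, a scan over every sheet can list where the placeholder occurs. This is a sketch: `placeholder_counts` is a hypothetical helper, and the two frames are cut-down stand-ins built from values shown in the outputs.

```python
import pandas as pd

PLACEHOLDER = "[Not Applicable]"

def placeholder_counts(sheets, placeholder=PLACEHOLDER):
    """Count placeholder cells per (sheet, column), keeping only non-zero counts."""
    counts = {}
    for sheet_name, df in sheets.items():
        per_column = df.eq(placeholder).sum()
        for column, n in per_column[per_column > 0].items():
            counts[(sheet_name, column)] = int(n)
    return pd.Series(counts, dtype="int64")

# Cut-down stand-ins for two of the sheets
sheets = {
    "Customer_data": pd.DataFrame({
        "CustomerKey": [-1, 11000],
        "Customer ID": ["[Not Applicable]", "AW00011000"],
        "City": ["[Not Applicable]", "Rockhampton"],
    }),
    "Product_data": pd.DataFrame({"ProductKey": [210], "Color": ["Black"]}),
}

counts = placeholder_counts(sheets)
print(counts)
```

With the real data, the call would be `placeholder_counts(cleaned_sheets_dict)`.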

In [66]:
# Check whether the placeholder value occurs in the 'Customer ID' column
'[Not Applicable]' in cleaned_sheets_dict['Customer_data']['Customer ID'].unique()

# Filter the DataFrame to show only rows where 'Customer ID' is '[Not Applicable]'
customers = cleaned_sheets_dict['Customer_data']
not_applicable_rows = customers[customers['Customer ID'] == '[Not Applicable]']

# Display these rows
not_applicable_rows

True

Unnamed: 0,CustomerKey,Customer ID,Customer,City,State-Province,Country-Region,Postal Code
0,-1,[Not Applicable],[Not Applicable],[Not Applicable],[Not Applicable],[Not Applicable],[Not Applicable]


In [71]:
# Replace '[Not Applicable]' with NaN
# Drop any rows that now contain NaN values
cleaned_sheets_dict['Customer_data'].replace('[Not Applicable]', pd.NA).dropna()

Unnamed: 0,CustomerKey,Customer ID,Customer,City,State-Province,Country-Region,Postal Code
1,11000,AW00011000,Jon Yang,Rockhampton,Queensland,Australia,4700
2,11001,AW00011001,Eugene Huang,Seaford,Victoria,Australia,3198
3,11002,AW00011002,Ruben Torres,Hobart,Tasmania,Australia,7001
4,11003,AW00011003,Christy Zhu,North Ryde,New South Wales,Australia,2113
5,11004,AW00011004,Elizabeth Johnson,Wollongong,New South Wales,Australia,2500
...,...,...,...,...,...,...,...
18480,29479,AW00029479,Tommy Tang,Versailles,Yveline,France,78000
18481,29480,AW00029480,Nina Raji,London,England,United Kingdom,SW19 3RU
18482,29481,AW00029481,Ivan Suri,Hof,Bayern,Germany,95010
18483,29482,AW00029482,Clayton Zhang,Saint Ouen,Charente-Maritime,France,17490


In [82]:
for sheet_name, df in cleaned_sheets_dict.items():
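    # display() makes the output explicit instead of relying on ast_node_interactivity = "all"
    display(df.replace('[Not Applicable]', pd.NA).dropna().head(3))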

Unnamed: 0,Channel,SalesOrderLineKey,Sales Order,Sales Order Line
0,Reseller,43659001,SO43659,SO43659 - 1
1,Reseller,43659002,SO43659,SO43659 - 2
2,Reseller,43659003,SO43659,SO43659 - 3


Unnamed: 0,SalesTerritoryKey,Region,Country,Group
0,1,Northwest,United States,North America
1,2,Northeast,United States,North America
2,3,Central,United States,North America


Unnamed: 0,SalesOrderLineKey,ResellerKey,CustomerKey,ProductKey,OrderDateKey,DueDateKey,ShipDateKey,SalesTerritoryKey,Order Quantity,Unit Price,Extended Amount,Unit Price Discount Pct,Product Standard Cost,Total Product Cost,Sales Amount
0,43659001,676,-1,349,20170702,20170712,20170709.0,5,1,2024.994,2024.994,0,1898.0944,1898.0944,2024.994
1,43659002,676,-1,350,20170702,20170712,20170709.0,5,3,2024.994,6074.982,0,1898.0944,5694.2832,6074.982
2,43659003,676,-1,351,20170702,20170712,20170709.0,5,1,2024.994,2024.994,0,1898.0944,1898.0944,2024.994


Unnamed: 0,ResellerKey,Reseller ID,Business Type,Reseller,City,State-Province,Country-Region,Postal Code
1,1,AW00000001,Value Added Reseller,A Bike Store,Seattle,Washington,United States,98104
2,2,AW00000002,Specialty Bike Shop,Progressive Sports,Renton,Washington,United States,98055
3,3,AW00000003,Warehouse,Advanced Bike Components,Irving,Texas,United States,75061


Unnamed: 0,DateKey,Date,Fiscal Year,Fiscal Quarter,Month,Full Date,MonthKey
0,20170701,2017-07-01,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 01",201707
1,20170702,2017-07-02,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 02",201707
2,20170703,2017-07-03,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 03",201707


Unnamed: 0,ProductKey,SKU,Product,Standard Cost,Color,List Price,Model,Subcategory,Category
0,210,FR-R92B-58,"HL Road Frame - Black, 58",868.6342,Black,1431.5,HL Road Frame,Road Frames,Components
1,211,FR-R92R-58,"HL Road Frame - Red, 58",868.6342,Red,1431.5,HL Road Frame,Road Frames,Components
2,212,HL-U509-R,"Sport-100 Helmet, Red",12.0278,Red,33.6442,Sport-100,Helmets,Accessories


Unnamed: 0,CustomerKey,Customer ID,Customer,City,State-Province,Country-Region,Postal Code
1,11000,AW00011000,Jon Yang,Rockhampton,Queensland,Australia,4700
2,11001,AW00011001,Eugene Huang,Seaford,Victoria,Australia,3198
3,11002,AW00011002,Ruben Torres,Hobart,Tasmania,Australia,7001


In [83]:
def clean_data_notapplicable(sheets_dict):
    cleaned_sheets_notapp = {}
    for sheet_name, df in sheets_dict.items():

        # Replace '[Not Applicable]' with NaN,
        # then drop any rows that now contain NaN values
        df = df.replace('[Not Applicable]', pd.NA).dropna()
        # Store the cleaned data
        cleaned_sheets_notapp[sheet_name] = df
    return cleaned_sheets_notapp

In [84]:
cleaned_sheets_dict_noapp = clean_data_notapplicable(cleaned_sheets_dict)
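
One side effect to watch for: the placeholder row in `Customer_data` has `CustomerKey` -1, and the `Sales_data` output shows reseller order lines that use `CustomerKey` -1. Once the placeholder row is dropped, those sales rows point at a customer key that no longer exists. A left merge with `indicator=True` is one way to find such rows. The frames below are hypothetical miniatures, not the real sheets.

```python
import pandas as pd

# Hypothetical miniature versions of Sales_data and Customer_data after cleaning
sales = pd.DataFrame({
    "SalesOrderLineKey": [43659001, 75122001],
    "CustomerKey": [-1, 11000],
})
customers = pd.DataFrame({
    "CustomerKey": [11000],
    "Customer": ["Jon Yang"],
})

# indicator=True adds a `_merge` column: 'both' for matched keys, 'left_only' for unmatched
joined = sales.merge(customers, on="CustomerKey", how="left", indicator=True)
orphans = joined[joined["_merge"] == "left_only"]
print(orphans[["SalesOrderLineKey", "CustomerKey"]])
```

Whether unmatched rows are a problem depends on how the report joins the tables. Power BI relationships treat unmatched keys as a blank member, so keeping the placeholder row in the dimension table may be the better choice.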

In [85]:
for sheet_name, df in cleaned_sheets_dict_noapp.items():
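    # display() makes the output explicit instead of relying on ast_node_interactivity = "all"
    display(df.head())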

Unnamed: 0,Channel,SalesOrderLineKey,Sales Order,Sales Order Line
0,Reseller,43659001,SO43659,SO43659 - 1
1,Reseller,43659002,SO43659,SO43659 - 2
2,Reseller,43659003,SO43659,SO43659 - 3
3,Reseller,43659004,SO43659,SO43659 - 4
4,Reseller,43659005,SO43659,SO43659 - 5


Unnamed: 0,SalesTerritoryKey,Region,Country,Group
0,1,Northwest,United States,North America
1,2,Northeast,United States,North America
2,3,Central,United States,North America
3,4,Southwest,United States,North America
4,5,Southeast,United States,North America


Unnamed: 0,SalesOrderLineKey,ResellerKey,CustomerKey,ProductKey,OrderDateKey,DueDateKey,ShipDateKey,SalesTerritoryKey,Order Quantity,Unit Price,Extended Amount,Unit Price Discount Pct,Product Standard Cost,Total Product Cost,Sales Amount
0,43659001,676,-1,349,20170702,20170712,20170709.0,5,1,2024.994,2024.994,0,1898.0944,1898.0944,2024.994
1,43659002,676,-1,350,20170702,20170712,20170709.0,5,3,2024.994,6074.982,0,1898.0944,5694.2832,6074.982
2,43659003,676,-1,351,20170702,20170712,20170709.0,5,1,2024.994,2024.994,0,1898.0944,1898.0944,2024.994
3,43659004,676,-1,344,20170702,20170712,20170709.0,5,1,2039.994,2039.994,0,1912.1544,1912.1544,2039.994
4,43659005,676,-1,345,20170702,20170712,20170709.0,5,1,2039.994,2039.994,0,1912.1544,1912.1544,2039.994


Unnamed: 0,ResellerKey,Reseller ID,Business Type,Reseller,City,State-Province,Country-Region,Postal Code
1,1,AW00000001,Value Added Reseller,A Bike Store,Seattle,Washington,United States,98104
2,2,AW00000002,Specialty Bike Shop,Progressive Sports,Renton,Washington,United States,98055
3,3,AW00000003,Warehouse,Advanced Bike Components,Irving,Texas,United States,75061
4,4,AW00000004,Value Added Reseller,Modular Cycle Systems,Austin,Texas,United States,78701
5,5,AW00000005,Specialty Bike Shop,Metropolitan Sports Supply,Fremont,California,United States,94536


Unnamed: 0,DateKey,Date,Fiscal Year,Fiscal Quarter,Month,Full Date,MonthKey
0,20170701,2017-07-01,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 01",201707
1,20170702,2017-07-02,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 02",201707
2,20170703,2017-07-03,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 03",201707
3,20170704,2017-07-04,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 04",201707
4,20170705,2017-07-05,FY2018,FY2018 Q1,2017 Jul,"2017 Jul, 05",201707


Unnamed: 0,ProductKey,SKU,Product,Standard Cost,Color,List Price,Model,Subcategory,Category
0,210,FR-R92B-58,"HL Road Frame - Black, 58",868.6342,Black,1431.5,HL Road Frame,Road Frames,Components
1,211,FR-R92R-58,"HL Road Frame - Red, 58",868.6342,Red,1431.5,HL Road Frame,Road Frames,Components
2,212,HL-U509-R,"Sport-100 Helmet, Red",12.0278,Red,33.6442,Sport-100,Helmets,Accessories
3,213,HL-U509-R,"Sport-100 Helmet, Red",13.8782,Red,33.6442,Sport-100,Helmets,Accessories
4,214,HL-U509-R,"Sport-100 Helmet, Red",13.0863,Red,34.99,Sport-100,Helmets,Accessories


Unnamed: 0,CustomerKey,Customer ID,Customer,City,State-Province,Country-Region,Postal Code
1,11000,AW00011000,Jon Yang,Rockhampton,Queensland,Australia,4700
2,11001,AW00011001,Eugene Huang,Seaford,Victoria,Australia,3198
3,11002,AW00011002,Ruben Torres,Hobart,Tasmania,Australia,7001
4,11003,AW00011003,Christy Zhu,North Ryde,New South Wales,Australia,2113
5,11004,AW00011004,Elizabeth Johnson,Wollongong,New South Wales,Australia,2500


### **[Adventure Works 2022 CSVs](https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables?select=Product.csv)**

> #### ANALYSIS OF ADVENTURE WORKS SALES PERFORMANCE: POWER BI

## <font color='blue'> **1. Sales Overview** </font>
<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/0*jJu_KSKCezRX9Of6.png'>

## <font color='blue'> **2. Customer Details** </font>
<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/0*78zgpVNLKw3EGobX.png'>

## <font color='blue'> **3. Product Details** </font>
<img src='https://miro.medium.com/v2/resize:fit:1400/format:webp/0*Ljg3YxEaKGAgfFV6.png'>

## <font color='blue'> **4. Sales Map** </font>
<img src='https://miro.medium.com/v2/resize:fit:1100/format:webp/0*eXO485qjTIBD9qRx.png'>