# Practical Python Example 1

This is an example of using Python to automate a typical data analysis operation.

The source dataset is an Excel workbook containing [a fictitious dataset described here](https://zomalex.co.uk/datasets/montana_dataset.html).  This has a fact table, Store, and three related lookup tables: Countries, Products and Segments. 

The Store table is in the 'Store Tab' worksheet.  
The Countries, Products and Segments tables are in the 'Lookups Tab' worksheet.

This Python script:

* loads these Excel tables into pandas dataframes
* adds some extra calculated columns to the Store dataframe
* merges the lookup tables into the Store table
* creates a summary of the dataset
* exports the summary as a CSV file
* plots this summary data as a bar chart
* exports the bar chart as a PNG file


In [None]:
import pandas as pd
import openpyxl

In [None]:
data_xlsx_filepath = "./resources/montana-data-tables.xlsx"

Inspect the structure of the Excel workbook: sheets and tables

In [None]:
wb = openpyxl.load_workbook(filename=data_xlsx_filepath)
for sheet in wb.worksheets:
    print(f"Sheet: {sheet.title}")
    for table in sheet.tables:
        print(f"Table: {table} Range: {sheet.tables[table].ref}")


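The three lookup tables share a single worksheet named 'Lookups Tab', so reading the whole sheet would not separate them.
`pd.read_excel` reads worksheets, not named Excel tables, so we cannot load a table directly by name.
Instead, we use openpyxl to look up each table's cell range by table name, then use that range to import the data.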

Create a function that returns the data in an Excel table as a dataframe.  We will use this for several tables.

In [None]:
def get_dataframe_from_table(sheet_name, table_name):
    # Look up the worksheet, then the named table on it (uses the workbook `wb` loaded above)
    sheet = wb[sheet_name]
    table = sheet.tables[table_name]

    # Get the cell range for the table
    table_range = table.ref

    # Extract the cells in the range: this gives a tuple of tuples (rows of cells)
    data = sheet[table_range]
    # Convert to a list of lists (2D array) for easier DataFrame creation
    rows = [[cell.value for cell in row] for row in data]

    # The first row holds the column headers; the remaining rows are the data
    df = pd.DataFrame(rows[1:], columns=rows[0])
    return df
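
The function above reads from the global `wb`. As a minimal sketch of an alternative, the variant below takes the workbook as a parameter, which makes it possible to try the same technique on a tiny in-memory workbook. The function name `table_to_dataframe`, the sheet name 'Demo Tab', the table name `DemoCountries` and the sample rows are all made up for illustration; they are not part of the Montana dataset.

```python
import openpyxl
import pandas as pd
from openpyxl.worksheet.table import Table


def table_to_dataframe(workbook, sheet_name, table_name):
    """Return the named Excel table as a DataFrame (first row = headers)."""
    sheet = workbook[sheet_name]
    cell_range = sheet.tables[table_name].ref
    rows = [[cell.value for cell in row] for row in sheet[cell_range]]
    return pd.DataFrame(rows[1:], columns=rows[0])


# Build a tiny in-memory workbook (hypothetical data) to exercise the function
demo_wb = openpyxl.Workbook()
demo_ws = demo_wb.active
demo_ws.title = "Demo Tab"
for row in [["Country", "Region"], ["France", "Europe"], ["Japan", "Asia"]]:
    demo_ws.append(row)

# Register cells A1:B3 as an Excel table so it can be found by name
demo_ws.add_table(Table(displayName="DemoCountries", ref="A1:B3"))

df_demo = table_to_dataframe(demo_wb, "Demo Tab", "DemoCountries")
print(df_demo)
```

Passing the workbook in explicitly also means the function does not depend on a notebook cell having been run earlier.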


In [None]:
df_store = get_dataframe_from_table("Store Tab", "Store")
df_countries = get_dataframe_from_table("Lookups Tab", "Countries")
df_products = get_dataframe_from_table("Lookups Tab", "Products")
df_segments = get_dataframe_from_table("Lookups Tab", "Segments")

Add three calculated columns to the Store DataFrame.

* Gross Sales = SalePrice * Quantity
* Sales = Gross Sales - Discount
* Profit = Sales - COGS

In [None]:
df_store["Gross Sales"] = df_store["Quantity"] * df_store["SalePrice"]
df_store["Sales"] = df_store["Gross Sales"] - df_store["Discount"]
df_store["Profit"] = df_store["Sales"] - df_store["COGS"]
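
To check the three formulas by hand, here is a small sketch on two made-up rows. It assumes the same column names as the Store table and uses `DataFrame.assign`, an alternative to the column-by-column assignment above that returns a new DataFrame instead of modifying the original. The numbers are illustrative only.

```python
import pandas as pd

# Made-up rows; the column names follow the Store table
df_demo = pd.DataFrame(
    {
        "Quantity": [10, 4],
        "SalePrice": [5.0, 20.0],
        "Discount": [5.0, 0.0],
        "COGS": [30.0, 50.0],
    }
)

# Each lambda receives the DataFrame built so far, so later columns
# can refer to columns created earlier in the same assign call
df_calc = df_demo.assign(
    **{
        "Gross Sales": lambda d: d["Quantity"] * d["SalePrice"],
        "Sales": lambda d: d["Gross Sales"] - d["Discount"],
        "Profit": lambda d: d["Sales"] - d["COGS"],
    }
)

print(df_calc[["Gross Sales", "Sales", "Profit"]])
# Row 0: Gross Sales 50.0, Sales 45.0, Profit 15.0
# Row 1: Gross Sales 80.0, Sales 80.0, Profit 30.0
```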

Merge the three lookup tables into the Store table.

In [None]:
df_store_merged = df_store.merge(df_countries, on='Country', how='left')
df_store_merged = df_store_merged.merge(df_products, on='Product', how='left')
df_store_merged = df_store_merged.merge(df_segments, on='Segment', how='left')
df_store_merged.columns
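
A left merge keeps every Store row, but it will quietly fill lookup columns with missing values when a key has no match, and it will duplicate rows if a lookup key is repeated. The sketch below, on made-up data, shows two optional `merge` parameters that can surface both problems: `validate` and `indicator`. The frames `df_facts` and `df_lookup` are hypothetical stand-ins for the Store and Countries tables.

```python
import pandas as pd

# Hypothetical fact and lookup tables; 'Peru' has no row in the lookup
df_facts = pd.DataFrame(
    {"Country": ["France", "Japan", "Peru"], "Quantity": [1, 2, 3]}
)
df_lookup = pd.DataFrame(
    {"Country": ["France", "Japan"], "Region": ["Europe", "Asia"]}
)

merged = df_facts.merge(
    df_lookup,
    on="Country",
    how="left",
    validate="many_to_one",  # raises MergeError if a lookup key is duplicated
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Keys that found no match in the lookup table
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(unmatched)  # ['Peru']
```

An empty `unmatched` list after each merge would suggest that every Store row found its lookup row.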

In [None]:
df_sales_summary = df_store_merged.groupby(['Region', 'Tier'])['Sales'].sum().reset_index()
df_sales_summary
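
One thing worth checking after a summary like this: by default `groupby` drops rows whose group key is missing, so the summary total can be lower than the source total. A minimal sketch on made-up data, comparing the two totals:

```python
import pandas as pd

# Made-up rows; the last one has no Region (for example, an unmatched lookup)
df_demo = pd.DataFrame(
    {
        "Region": ["Europe", "Europe", "Asia", None],
        "Tier": ["Gold", "Gold", "Silver", "Gold"],
        "Sales": [100.0, 50.0, 70.0, 30.0],
    }
)

summary = df_demo.groupby(["Region", "Tier"])["Sales"].sum().reset_index()

# Sales that did not make it into the summary because a key was missing
dropped_sales = df_demo["Sales"].sum() - summary["Sales"].sum()
print(dropped_sales)  # 30.0
```

If the difference is not zero, passing `dropna=False` to `groupby` (available in recent pandas versions) keeps such rows as their own group.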

In [None]:
df_sales_summary.to_csv("./outputs/sales-summary.csv", index=False)
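
`to_csv` does not create missing folders, so the export fails if `./outputs` does not exist yet. The sketch below wraps the export in a hypothetical helper, `save_csv`, that creates the parent folder first. It is demonstrated in a temporary directory with made-up data so that nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_csv(df, filepath):
    """Write df to filepath, creating any missing parent folders first."""
    filepath = Path(filepath)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(filepath, index=False)
    return filepath


df_demo = pd.DataFrame({"Region": ["Europe"], "Tier": ["Gold"], "Sales": [150.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # The 'outputs' subfolder does not exist yet; save_csv creates it
    target = Path(tmp_dir) / "outputs" / "sales-summary.csv"
    saved = save_csv(df_demo, target)
    round_trip = pd.read_csv(saved)

print(round_trip)
```

In the notebook, the equivalent call would be `save_csv(df_sales_summary, "./outputs/sales-summary.csv")`.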

Create a chart from this DataFrame: a horizontal stacked bar chart of Sales, with Tier on the vertical axis and Region in the legend.
Order the tiers as 'Gold', 'Silver', 'Bronze'.


In [None]:
import matplotlib.pyplot as plt

df_pivot = df_sales_summary.pivot(index='Tier', columns='Region', values='Sales')

# Set the order for the index
tier_order = ['Gold', 'Silver', 'Bronze']
df_pivot.index = pd.CategoricalIndex(df_pivot.index, categories=tier_order, ordered=True)
df_pivot = df_pivot.sort_index()

df_pivot.plot(kind='barh', stacked=True)

plt.xlabel('Sales')
plt.title('Sales by Tier and Region')
plt.legend(title='Region')
plt.tight_layout()
plt.show()
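
As an alternative sketch for fixing the row order, `reindex` can place the pivot rows in an explicit order without converting the index to a categorical. The data below is made up; only the tier names come from the notebook.

```python
import pandas as pd

# Made-up summary rows in arbitrary order
df_demo = pd.DataFrame(
    {
        "Tier": ["Bronze", "Gold", "Silver", "Gold"],
        "Region": ["Asia", "Asia", "Europe", "Europe"],
        "Sales": [10.0, 30.0, 20.0, 40.0],
    }
)

tier_order = ["Gold", "Silver", "Bronze"]
pivot_demo = df_demo.pivot(index="Tier", columns="Region", values="Sales")

# reindex returns the rows in exactly the order given
pivot_demo = pivot_demo.reindex(tier_order)
print(list(pivot_demo.index))  # ['Gold', 'Silver', 'Bronze']
```

Note that a horizontal bar chart draws the first row at the bottom, so 'Gold' would appear lowest on the chart; calling `invert_yaxis()` on the axes flips this if 'Gold' should be at the top.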


Save the chart as an image file in the outputs folder.

In [None]:
fig, ax = plt.subplots()
df_pivot.plot(kind='barh', stacked=True, ax=ax)

ax.set_xlabel('Sales')
ax.set_title('Sales by Tier and Region')
ax.legend(title='Region')
plt.tight_layout()

plt.savefig('./outputs/sales_by_tier_region.png')
plt.close(fig)
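
When the same steps run as a plain script rather than in a notebook, a non-interactive backend and a couple of optional `savefig` parameters can be useful. The sketch below uses made-up data and a temporary folder. The `Agg` backend, the `dpi` value and `bbox_inches="tight"` are choices made for this illustration, not settings taken from the notebook.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: running without a display, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up pivot: tiers as rows, regions as columns
pivot_demo = pd.DataFrame(
    {"Asia": [30.0, 20.0], "Europe": [40.0, 25.0]},
    index=pd.Index(["Gold", "Silver"], name="Tier"),
)

fig, ax = plt.subplots()
pivot_demo.plot(kind="barh", stacked=True, ax=ax)
ax.set_xlabel("Sales")
ax.legend(title="Region")

with tempfile.TemporaryDirectory() as tmp_dir:
    png_path = Path(tmp_dir) / "demo_chart.png"
    # dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace
    fig.savefig(png_path, dpi=150, bbox_inches="tight")
    png_size = png_path.stat().st_size

plt.close(fig)
print(png_size > 0)  # True
```

Inside a notebook the `matplotlib.use("Agg")` line is best left out, since it would stop charts from being displayed inline.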