## Using Python to Integrate MongoDB Data into an ETL Process
This notebook demonstrates the setup of an ETL (Extract, Transform, Load) pipeline.

In this project, I build upon the **MyShop** dimensional database by integrating new data sourced from an instance of MongoDB. The new data concerns two additional business processes: inventory and purchasing. I continue to interact with both the source systems (MongoDB and MySQL) and the destination system (the MyShop data warehouse) from a remote client running Python (Jupyter Notebooks).

I fetch data into Pandas DataFrames, perform all the necessary transformations in memory on the client, and then push the transformed DataFrame to the RDBMS data warehouse using a Pandas function (`DataFrame.to_sql`) that creates the table and fills it with data in a single operation.

### Prerequisites:
This notebook uses the PyMongo database connectivity library to connect to MongoDB databases; therefore, you must first install that library into your Python environment by executing the following command in a terminal window.

- `python -m pip install "pymongo[srv]"`

#### Import the Necessary Libraries

In [2]:
import os
import logging
from typing import Dict
import pandas as pd
import numpy as np
import pymysql


import pymongo
import sqlalchemy
from sqlalchemy import create_engine, text

In [3]:
print(f"Running SQL Alchemy Version: {sqlalchemy.__version__}")
print(f"Running PyMongo Version: {pymongo.__version__}")

Running SQL Alchemy Version: 2.0.34
Running PyMongo Version: 4.8.0


#### Declare & Assign Connection Variables for the MongoDB Server, the MySQL Server & Databases with which You'll be Working 

In [4]:
from pymongo import MongoClient
import json

# Example setup of logging for the notebook
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Function to get MongoDB client
def get_mongo_client(host: str, port: int, username: str = None, password: str = None, db_name: str = None) -> MongoClient:
    """Initialize MongoDB client for a remote database."""
    # Build the URI with an explicit port so the `port` argument takes effect
    connection_string = f"mongodb://{host}:{port}"
    if username and password:
        connection_string = f"mongodb://{username}:{password}@{host}:{port}"

    try:
        client = MongoClient(connection_string)
        if db_name:
            # Note: indexing the client returns a Database object, not a MongoClient
            client = client[db_name]
        logger.info("MongoDB client initialized successfully.")
        return client
    except Exception as e:
        logger.error(f"Failed to initialize MongoDB client: {e}")
        raise


# SQL connection
def get_sql_connection(host: str, user: str, password: str, db: str):
    """Initialize SQL connection."""
    conn = pymysql.connect(host=host, user=user, password=password, db=db)
    return conn
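
If the username or password contains reserved characters such as `@`, `:` or `/`, interpolating them directly into the connection string produces an invalid URI. Here is a minimal sketch of one way to build the string safely, using `urllib.parse.quote_plus` from the standard library. The helper name `build_mongo_uri` and the sample credentials are hypothetical and not part of the notebook.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongo_uri(host: str, port: int = 27017,
                    username: Optional[str] = None,
                    password: Optional[str] = None) -> str:
    """Return a mongodb:// URI with percent-escaped credentials (hypothetical helper)."""
    if username and password:
        # Escape reserved characters so they are not read as URI delimiters
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}"


uri = build_mongo_uri("localhost", 27017, "etl_user", "p@ss:word")
print(uri)
```

The resulting string can be passed to `MongoClient` in place of a hand-built one.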

In [5]:
# Set the path of the current working directory and append 'data' directory
data_dir = os.path.join(os.getcwd(), 'data')
logger.info(f"Data directory set to: {data_dir}")


INFO:__main__:Data directory set to: /Users/mac/Downloads/data-warehouse-project/data


In [6]:
# Define JSON files for MongoDB collections
json_files = {
    "sales_orders": 'StoreSales.json',
}


In [7]:
def set_mongo_collections(client: MongoClient, db_name: str, data_dir: str, json_files: dict):
    """Load JSON data into MongoDB collections."""
    db = client[db_name]
    for collection_name, file_name in json_files.items():
        file_path = os.path.abspath(os.path.join(data_dir, file_name))
        
        # Load JSON data and insert into MongoDB
        try:
            with open(file_path, 'r') as f:
                data = json.load(f)
                if isinstance(data, list): 
                    db[collection_name].insert_many(data)
                    logger.info(f"Inserted {len(data)} documents into '{collection_name}' collection.")
                else:
                    db[collection_name].insert_one(data)
                    logger.info(f"Inserted a single document into '{collection_name}' collection.")
        except Exception as e:
            logger.error(f"Error loading data for collection '{collection_name}': {str(e)}")


In [8]:
# MongoDB connection arguments (example)
mongodb_args = {
    "host": "localhost",
    "port": 27017,
    "username": "mikelangelo1",
    "password": "password123",
    "db_name": "data_ware_house"
}

# Initialize the MongoDB client
client = get_mongo_client(
    host=mongodb_args["host"],
    port=mongodb_args["port"],
    username=mongodb_args.get("username"),
    password=mongodb_args.get("password")
)

INFO:__main__:MongoDB client initialized successfully.


#### Populate MongoDB with Source Data
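You only need to run this cell once. Note that the operation is *not* idempotent: `insert_many` appends the documents on every run, so executing the cell again adds another copy of the file to the collection. The extraction output further down shows the effect: 256,455 rows, five times the 51,291 documents in the file.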

In [9]:
# Load data into MongoDB collections
set_mongo_collections(client, mongodb_args["db_name"], data_dir, json_files)

INFO:__main__:Inserted 51291 documents into 'sales_orders' collection.
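
Since `insert_many` appends on every call, a repeatable load needs documents with stable identities. One option is to give every document a deterministic `_id` taken from a natural key, so that loading the same record again targets the same document. The sketch below assumes the `Row ID` field is unique per record and that records without one are blank placeholders. The helper name `prepare_documents` and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def prepare_documents(records: List[Dict[str, Any]],
                      key_field: str = "Row ID") -> List[Dict[str, Any]]:
    """Assign a deterministic _id from key_field and skip records that lack it.

    Assumption: key_field uniquely identifies a record in the source file.
    """
    prepared = []
    seen = set()
    for record in records:
        key = record.get(key_field)
        if key is None or str(key).strip() == "":
            continue  # blank placeholder record, nothing to load
        key = str(key).strip()
        if key in seen:
            continue  # duplicate within the same batch
        seen.add(key)
        prepared.append({**record, "_id": key})  # copy, do not mutate the input
    return prepared


sample = [
    {"Row ID": "32298", "Order ID": "CA-2012-124891"},
    {"Row ID": "26341", "Order ID": "IN-2013-77878"},
    {"Row ID": ""},                                      # blank record
    {"Row ID": "32298", "Order ID": "CA-2012-124891"},   # repeated record
]
docs = prepare_documents(sample)
print([d["_id"] for d in docs])
```

With stable `_id` values, each document can be written with PyMongo's `replace_one({"_id": doc["_id"]}, doc, upsert=True)`, which inserts the document on the first run and overwrites it on later runs.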


#### Data Extractor
The `DataExtractor` class provides a method to extract data from a MongoDB collection into a Pandas DataFrame; `DatabaseConnection` manages the MongoDB client it relies on.

In [10]:
class DatabaseConnection:
    def __init__(self, mongodb_args: Dict):
        """Initialize the DatabaseConnection with MongoDB client parameters."""
        self.client = self.get_mongo_client(mongodb_args)
        self.db = None
        self.db_name = mongodb_args.get('db_name', 'default_db')

    def get_mongo_connection(self):
        """Return a MongoDB connection using the provided config."""
        try:
            # Connect to MongoDB and access the specified database
            self.db = self.client[self.db_name]
            logger.info(f"MongoDB connection established to database: {self.db_name}")
            return self.db  
        except Exception as e:
            logger.error(f"Error connecting to MongoDB: {str(e)}")
            raise
    
    def close_connections(self):
        """Close MongoDB connection."""
        try:
            if self.client:
                self.client.close()
                logger.info("MongoDB connection closed.")
        except Exception as e:
            logger.error(f"Error closing MongoDB connection: {str(e)}")
            raise

    def get_mongo_client(self, mongodb_args: Dict) -> MongoClient:
        """Initialize and return MongoDB client."""
        try:
            host = mongodb_args.get("host", "localhost")
            port = mongodb_args.get("port", 27017)
            username = mongodb_args.get("username")
            password = mongodb_args.get("password")

            if username and password:
                client = MongoClient(host, port, username=username, password=password)
            else:
                client = MongoClient(host, port)

            logger.info("MongoDB client initialized.")
            return client
        except Exception as e:
            logger.error(f"Error initializing MongoDB client: {str(e)}")
            raise

class DataExtractor:
    def __init__(self, db_connection: DatabaseConnection):
        """Initialize DataExtractor with the given database connection."""
        self.db_conn = db_connection
        
    def extract_from_mongodb(self, collection: str, query: Dict = None) -> pd.DataFrame:
        """Extract data from MongoDB collection."""
        try:
            mongo_db = self.db_conn.get_mongo_connection()
            data = mongo_db[collection].find(query or {})  # Find all documents or filtered by query
            return pd.DataFrame(list(data))  # Convert to DataFrame
        except Exception as e:
            logger.error(f"Error extracting from MongoDB collection '{collection}': {str(e)}")
            raise
            
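    def extract_from_api(self, endpoint: str, params: Dict = None) -> pd.DataFrame:
        """Extract data from a REST API.

        Placeholder: DatabaseConnection as defined above provides neither
        get_api_session() nor a config attribute, so calling this method
        raises AttributeError until both are implemented.
        """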
        try:
            session = self.db_conn.get_api_session()  
            api_config = self.db_conn.config['api']
            response = session.get(f"{api_config['base_url']}/{endpoint}", params=params)
            response.raise_for_status()
            return pd.DataFrame(response.json())
        except Exception as e:
            logger.error(f"Error extracting from API endpoint '{endpoint}': {str(e)}")
            raise



db_conn = DatabaseConnection(mongodb_args)
extractor = DataExtractor(db_conn)

try:
    df_mongo = extractor.extract_from_mongodb("sales_orders")
    print("MongoDB Data:")
    print(df_mongo)
except Exception as e:
    logger.error(f"Error in MongoDB extraction test: {str(e)}")

# Note: For `extract_from_api`, replace with an actual endpoint or mock response as required.


INFO:__main__:MongoDB client initialized.
INFO:__main__:MongoDB connection established to database: data_ware_house


MongoDB Data:
                             _id Row ID         Order ID  Order Date  \
0       673167cbbdc428b0f7839e83  32298   CA-2012-124891  31-07-2012   
1       673167cbbdc428b0f7839e84  26341    IN-2013-77878  05-02-2013   
2       673167cbbdc428b0f7839e85  25330    IN-2013-71249  17-10-2013   
3       673167cbbdc428b0f7839e86  13524  ES-2013-1579342  28-01-2013   
4       673167cbbdc428b0f7839e87  47221     SG-2013-4320  05-11-2013   
...                          ...    ...              ...         ...   
256450  6747d510c608a76b1ff52dfd  35398   US-2014-102288  20-06-2014   
256451  6747d510c608a76b1ff52dfe  40470   US-2013-155768  02-12-2013   
256452  6747d510c608a76b1ff52dff   9596   MX-2012-140767  18-02-2012   
256453  6747d510c608a76b1ff52e00   6147   MX-2012-134460  22-05-2012   
256454  6747d510c608a76b1ff52e01                     NaN         NaN   

         Ship Date       Ship Mode Customer ID     Customer Name      Segment  \
0       31-07-2012        Same Day    RH

## Data Loader
The `DataLoader` class implements `load_to_warehouse`, which loads a DataFrame into the data warehouse. It connects to the MySQL database through a SQLAlchemy engine and writes the data with `DataFrame.to_sql`.


In [11]:
import logging
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


class MsqlDatabaseConnection:
    def __init__(self, username: str, password: str, host: str = "localhost", port: int = 3306, db_name: str = "myshop"):
        self.username = username
        self.password = password
        self.host = host
        self.port = port
        self.db_name = db_name

    def get_sqlalchemy_engine(self) -> Engine:
        """Return a SQLAlchemy engine connected to MySQL database."""
        try:
            connection_uri = f"mysql+pymysql://{self.username}:{self.password}@{self.host}:{self.port}/{self.db_name}"
            engine = create_engine(connection_uri)
            logger.info("Successfully connected to MySQL database.")
            return engine
        except Exception as e:
            logger.error(f"Error connecting to MySQL: {str(e)}")
            raise
    
    def get_query_result(self, query: str) -> pd.DataFrame:
        """Execute a query and return the result as a DataFrame."""
        try:
            engine = self.get_sqlalchemy_engine()
            result = pd.read_sql(query, engine)
            logger.info(f"Query executed successfully: {query}")
            return result
        except Exception as e:
            logger.error(f"Error executing query: {str(e)}")
            raise

# DataLoader class
class DataLoader:
    def __init__(self, db_connection: MsqlDatabaseConnection):
        self.db_conn = db_connection
    
    def load_to_warehouse(self, df: pd.DataFrame, table_name: str, if_exists: str = 'append') -> None:
        """Load DataFrame to data warehouse."""
        try:
            engine = self.db_conn.get_sqlalchemy_engine()
            df.to_sql(
                name=table_name,
                con=engine,
                if_exists=if_exists,
                index=False,
                chunksize=1000
            )
            logger.info(f"Successfully loaded {len(df)} rows to {table_name}")
        except Exception as e:
            logger.error(f"Error loading data to warehouse: {str(e)}")
            raise


db_connection = MsqlDatabaseConnection(
    username="root",  # Replace with your MySQL username
    password=os.getenv("MYSQL_PASSWORD", ""),  # Read from an environment variable (the name is an assumption) instead of hard-coding
    db_name="myshop"           # The name of your database
)

# Create an instance of DataLoader
data_loader = DataLoader(db_connection)

### 1.0. Create and Populate the New Dimension Tables
#### 1.1. Extract Data from the Source MongoDB Collections Into DataFrames

In [19]:
class DataExtractor:
    def __init__(self, db_connection: DatabaseConnection):
        self.db_conn = db_connection
        
    def extract_from_mongodb(self, collection: str, query: Dict = None) -> pd.DataFrame:
        """Extract data from MongoDB collection."""
        try:
            # get_mongo_connection() takes no arguments; the database name comes from mongodb_args
            mongo_db = self.db_conn.get_mongo_connection()
            data = mongo_db[collection].find(query if query else {})
            return pd.DataFrame(list(data))
        except Exception as e:
            logger.error(f"Error extracting from MongoDB: {str(e)}")
            raise

# DataTransformer class
class DataTransformer:
    @staticmethod
    def clean_customer_data(df: pd.DataFrame) -> pd.DataFrame:
        """Clean and transform customer data."""
        try:
            # Remove duplicates
            df = df.drop_duplicates()
            
            # Handle missing values (example with customer data fields)
            df['Customer Name'] = df['Customer Name'].fillna('Unknown')
            df['City'] = df['City'].fillna('Unknown')
            df['State'] = df['State'].fillna('Unknown')
            df['Country'] = df['Country'].fillna('Unknown')
            df['Postal Code'] = df['Postal Code'].fillna('')
            
            # Standardize customer-related columns (e.g., postal codes as strings)
            df['Postal Code'] = df['Postal Code'].apply(lambda x: str(x).zfill(5))  # Fill leading zeros if necessary
            
            logger.info("Customer data cleaned successfully.")
            return df
        except Exception as e:
            logger.error(f"Error cleaning customer data: {str(e)}")
            raise
    
    @staticmethod
    def transform_sales_data(df: pd.DataFrame) -> pd.DataFrame:
        """Transform sales data."""
        try:
            # Convert numeric columns to the correct type
            df['Quantity'] = pd.to_numeric(df['Quantity'], errors='coerce')
            df['Sales'] = pd.to_numeric(df['Sales'], errors='coerce')
            df['Discount'] = pd.to_numeric(df['Discount'], errors='coerce')

            invalid_rows = df[df[['Quantity', 'Sales', 'Discount']].isna().any(axis=1)]
            if not invalid_rows.empty:
                logger.error(f"Invalid data in numeric columns:\n{invalid_rows}")
                df = df.dropna(subset=['Quantity', 'Sales', 'Discount'])
                logger.info(f"Dropped {len(invalid_rows)} rows with invalid numeric data.")

            # Calculate derived columns
            df['Total Amount'] = df['Quantity'] * df['Sales']  # Calculate total sales amount
            df['Discount Amount'] = df['Total Amount'] * df['Discount']  # Calculate discount amount
            df['Final Amount'] = df['Total Amount'] - df['Discount Amount']  # Calculate final sales amount

            # Convert dates
            df['Order Date'] = pd.to_datetime(df['Order Date'], format='%d-%m-%Y')
            df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%d-%m-%Y')

            # Add time dimensions (Year, Month, Quarter)
            df['Order Year'] = df['Order Date'].dt.year
            df['Order Month'] = df['Order Date'].dt.month
            df['Order Quarter'] = df['Order Date'].dt.quarter
            df['Ship Year'] = df['Ship Date'].dt.year
            df['Ship Month'] = df['Ship Date'].dt.month
            df['Ship Quarter'] = df['Ship Date'].dt.quarter

            logger.info("Sales data transformed successfully.")
            return df
        except Exception as e:
            logger.error(f"Error transforming sales data: {str(e)}")
            raise


try:
    df_sales = extractor.extract_from_mongodb("sales_orders")
    logger.info("Sales order data extracted successfully.")
except Exception as e:
    logger.error(f"Error extracting sales data: {str(e)}")

# Apply transformations to the extracted sales data
try:
    df_transformed_sales = DataTransformer.transform_sales_data(df_sales)
    logger.info("Sales data transformed successfully.")
except Exception as e:
    logger.error(f"Error transforming sales data: {str(e)}")

# Display transformed data
print("Transformed Sales Data:")
# print(df_transformed_sales)


INFO:__main__:MongoDB connection established to database: data_ware_house
INFO:__main__:Sales order data extracted successfully.
ERROR:__main__:Invalid data in numeric columns:
                             _id Row ID Order ID Order Date Ship Date  \
51290   673167cbbdc428b0f78466dd             NaN        NaN       NaN   
102581  67316b18bdc428b0f7852f39             NaN        NaN       NaN   
153872  67316b4fbdc428b0f785f795             NaN        NaN       NaN   
205163  67316ec2bdc428b0f786bff5             NaN        NaN       NaN   
256454  6747d510c608a76b1ff52e01             NaN        NaN       NaN   

       Ship Mode Customer ID Customer Name Segment City  ... Product ID  \
51290        NaN         NaN           NaN     NaN  NaN  ...        NaN   
102581       NaN         NaN           NaN     NaN  NaN  ...        NaN   
153872       NaN         NaN           NaN     NaN  NaN  ...        NaN   
205163       NaN         NaN           NaN     NaN  NaN  ...        NaN   
256454   

Transformed Sales Data:
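
The derived columns in `transform_sales_data` follow simple arithmetic: `Total Amount = Quantity * Sales`, `Discount Amount = Total Amount * Discount`, and `Final Amount = Total Amount - Discount Amount`. The quarter is `(month - 1) // 3 + 1`. As a cross-check, here is a plain-Python sketch that applies approximately the same rules to a single record without pandas. The function name `transform_record` and the sample values are illustrative and not taken from the dataset.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def transform_record(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Apply the sales transformation to one record; return None if a numeric field is invalid."""
    try:
        quantity = float(record["Quantity"])
        sales = float(record["Sales"])
        discount = float(record["Discount"])
    except (KeyError, TypeError, ValueError):
        return None  # counterpart of dropping rows that fail numeric coercion

    # Dates in the source use day-month-year order, e.g. 31-07-2012
    order_date = datetime.strptime(record["Order Date"], "%d-%m-%Y")
    total = quantity * sales
    discount_amount = total * discount
    return {
        **record,
        "Total Amount": total,
        "Discount Amount": discount_amount,
        "Final Amount": total - discount_amount,
        "Order Year": order_date.year,
        "Order Month": order_date.month,
        "Order Quarter": (order_date.month - 1) // 3 + 1,
    }


row = transform_record(
    {"Quantity": "2", "Sales": "10.5", "Discount": "0.1", "Order Date": "31-07-2012"}
)
print(row["Total Amount"], row["Order Year"], row["Order Quarter"])
print(transform_record({"Quantity": "", "Sales": None, "Discount": "n/a"}))
```

A per-record function like this can serve as a reference when unit-testing the DataFrame version on a handful of rows.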


## Sales Analysis by Customer Segment and Quarter

In [30]:

sales_by_segment_query = """
SELECT 
    c.job_title as Segment,
    CONCAT(YEAR(o.order_date), ' Q', QUARTER(o.order_date)) as Year_Quarter,
    COUNT(DISTINCT o.id) as Total_Orders,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Sales,
    AVG(od.discount) as Avg_Discount,
    SUM(od.quantity) as Units_Sold
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_details od ON o.id = od.order_id
GROUP BY 
    c.job_title, 
    CONCAT(YEAR(o.order_date), ' Q', QUARTER(o.order_date))
"""

try:
    df_segment_analysis = db_connection.get_query_result(sales_by_segment_query)
    print("Sales Analysis by Customer Segment and Quarter:")
    print(df_segment_analysis)
except Exception as e:
    logger.error(f"Error in segment analysis query: {str(e)}")


INFO:__main__:Successfully connected to MySQL database.
INFO:__main__:Query executed successfully: 
SELECT 
    c.job_title as Segment,
    CONCAT(YEAR(o.order_date), ' Q', QUARTER(o.order_date)) as Year_Quarter,
    COUNT(DISTINCT o.id) as Total_Orders,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Sales,
    AVG(od.discount) as Avg_Discount,
    SUM(od.quantity) as Units_Sold
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_details od ON o.id = od.order_id
GROUP BY 
    c.job_title, 
    CONCAT(YEAR(o.order_date), ' Q', QUARTER(o.order_date))



Sales Analysis by Customer Segment and Quarter:
                     Segment Year_Quarter  Total_Orders  Total_Sales  \
0       Accounting Assistant      2006 Q2             2      3625.25   
1                      Owner      2006 Q1             2     15474.75   
2                      Owner      2006 Q2             1       736.00   
3         Purchasing Manager      2006 Q1            10     19731.00   
4         Purchasing Manager      2006 Q2            17     21337.00   
5  Purchasing Representative      2006 Q1             3      3481.00   
6  Purchasing Representative      2006 Q2             5      3752.00   

   Avg_Discount  Units_Sold  
0           0.0       175.0  
1           0.0       375.0  
2           0.0        40.0  
3           0.0       842.0  
4           0.0       997.0  
5           0.0       330.0  
6           0.0       183.0  


## Product Category Performance Analysis

In [25]:

product_performance_query = """
SELECT 
    p.category as Category,
    COUNT(DISTINCT od.order_id) as Number_of_Orders,
    SUM(od.quantity) as Total_Units_Sold,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Revenue,
    AVG(od.discount) as Avg_Discount_Rate,
    SUM(od.quantity * od.unit_price)/SUM(od.quantity) as Avg_Unit_Price
FROM products p
JOIN order_details od ON p.id = od.product_id
GROUP BY p.category
ORDER BY Total_Revenue DESC;
"""

try:
    df_product_performance = db_connection.get_query_result(product_performance_query)
    print("\nProduct Category Performance Analysis:")
    print(df_product_performance)
except Exception as e:
    logger.error(f"Error in product performance query: {str(e)}")



INFO:__main__:Successfully connected to MySQL database.
INFO:__main__:Query executed successfully: 
SELECT 
    p.category as Category,
    COUNT(DISTINCT od.order_id) as Number_of_Orders,
    SUM(od.quantity) as Total_Units_Sold,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Revenue,
    AVG(od.discount) as Avg_Discount_Rate,
    SUM(od.quantity * od.unit_price)/SUM(od.quantity) as Avg_Unit_Price
FROM products p
JOIN order_details od ON p.id = od.product_id
GROUP BY p.category
ORDER BY Total_Revenue DESC;




Product Category Performance Analysis:
                     Category  Number_of_Orders  Total_Units_Sold  \
0                   Beverages                11            1452.0   
1             Jams, Preserves                 3             140.0   
2          Dried Fruit & Nuts                 6             175.0   
3              Dairy products                 2              90.0   
4                       Soups                 4             290.0   
5                      Sauces                 4              65.0   
6                       Candy                 5             200.0   
7                       Pasta                 3             110.0   
8                 Canned Meat                 3             120.0   
9   Canned Fruit & Vegetables                 1              40.0   
10                 Condiments                 3              90.0   
11        Baked Goods & Mixes                 5             105.0   
12                        Oil                 1              25

## Geographic Sales Distribution

In [26]:

geographic_sales_query = """
SELECT 
    c.country_region as Country,
    c.city as City,
    COUNT(DISTINCT o.customer_id) as Unique_Customers,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Sales,
    AVG(od.quantity * od.unit_price * (1 - od.discount)) as Avg_Order_Value,
    SUM(od.quantity) as Total_Units_Sold
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_details od ON o.id = od.order_id
GROUP BY c.country_region, c.city
HAVING Total_Sales > 1000
ORDER BY Total_Sales DESC;
"""

try:
    df_geographic_sales = db_connection.get_query_result(geographic_sales_query)
    print("\nGeographic Sales Distribution:")
    print(df_geographic_sales)
except Exception as e:
    logger.error(f"Error in geographic sales query: {str(e)}")


INFO:__main__:Successfully connected to MySQL database.
INFO:__main__:Query executed successfully: 
SELECT 
    c.country_region as Country,
    c.city as City,
    COUNT(DISTINCT o.customer_id) as Unique_Customers,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Sales,
    AVG(od.quantity * od.unit_price * (1 - od.discount)) as Avg_Order_Value,
    SUM(od.quantity) as Total_Units_Sold
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_details od ON o.id = od.order_id
GROUP BY c.country_region, c.city
HAVING Total_Sales > 1000
ORDER BY Total_Sales DESC;




Geographic Sales Distribution:
   Country            City  Unique_Customers  Total_Sales  Avg_Order_Value  \
0      USA         Memphis                 1     15432.50      3858.125000   
1      USA           Boise                 1     13800.00     13800.000000   
2      USA       Milwaukee                 1      8007.50      1334.583333   
3      USA        New York                 1      4949.00       707.000000   
4      USA        Portland                 1      4683.00       780.500000   
5      USA           Miami                 2      4644.75       663.535714   
6      USA  Salt Lake City                 1      3786.50      1262.166667   
7      USA          Denver                 1      2905.50       968.500000   
8      USA       Las Vegas                 2      2695.00       673.750000   
9      USA     Los Angelas                 1      2550.00       510.000000   
10     USA         Seattle                 1      2410.75       602.687500   
11     USA         Chicago      
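
In the query above, `AVG(od.quantity * od.unit_price * (1 - od.discount))` averages over rows of `order_details`, so `Avg_Order_Value` is the mean value of a line item rather than of an order. If a per-order average is wanted, the line items can be summed per order first. The sketch below illustrates the difference with an in-memory SQLite database standing in for MySQL; the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_details (order_id INTEGER, quantity REAL, unit_price REAL, discount REAL);
INSERT INTO orders VALUES (1, 10), (2, 10);
-- Order 1 has three line items, order 2 has one
INSERT INTO order_details VALUES
    (1, 1, 100.0, 0.0),
    (1, 2,  50.0, 0.0),
    (1, 4,  25.0, 0.0),
    (2, 1, 100.0, 0.0);
""")

# Average over line items, as in the geographic query
per_line = conn.execute("""
    SELECT AVG(od.quantity * od.unit_price * (1 - od.discount))
    FROM orders o JOIN order_details od ON o.id = od.order_id
""").fetchone()[0]

# Average over orders: sum the line items of each order first
per_order = conn.execute("""
    SELECT AVG(t.order_total)
    FROM (
        SELECT SUM(od.quantity * od.unit_price * (1 - od.discount)) AS order_total
        FROM orders o JOIN order_details od ON o.id = od.order_id
        GROUP BY o.id
    ) AS t
""").fetchone()[0]

print(per_line, per_order)
conn.close()
```

Here every line item is worth 100, so the per-line average is 100, while the two orders total 300 and 100, giving a per-order average of 200.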

## Shipping Performance

In [27]:
shipping_performance_query = """
SELECT 
    s.company as Ship_Mode,
    COUNT(DISTINCT o.id) as Total_Shipments,
    AVG(DATEDIFF(o.shipped_date, o.order_date)) as Avg_Days_to_Ship,
    SUM(o.shipping_fee) as Total_Shipping_Cost,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Order_Value,
    (SUM(o.shipping_fee) / SUM(od.quantity * od.unit_price * (1 - od.discount))) * 100 as Shipping_Cost_Percentage
FROM orders o
JOIN shippers s ON o.shipper_id = s.id
JOIN order_details od ON o.id = od.order_id
WHERE o.shipped_date IS NOT NULL
GROUP BY s.company
ORDER BY Total_Shipments DESC;
"""

try:
    df_shipping_performance = db_connection.get_query_result(shipping_performance_query)
    print("\nShipping Performance Analysis:")
    print(df_shipping_performance)
except Exception as e:
    logger.error(f"Error in shipping performance query: {str(e)}")



INFO:__main__:Successfully connected to MySQL database.
INFO:__main__:Query executed successfully: 
SELECT 
    s.company as Ship_Mode,
    COUNT(DISTINCT o.id) as Total_Shipments,
    AVG(DATEDIFF(o.shipped_date, o.order_date)) as Avg_Days_to_Ship,
    SUM(o.shipping_fee) as Total_Shipping_Cost,
    SUM(od.quantity * od.unit_price * (1 - od.discount)) as Total_Order_Value,
    (SUM(o.shipping_fee) / SUM(od.quantity * od.unit_price * (1 - od.discount))) * 100 as Shipping_Cost_Percentage
FROM orders o
JOIN shippers s ON o.shipper_id = s.id
JOIN order_details od ON o.id = od.order_id
WHERE o.shipped_date IS NOT NULL
GROUP BY s.company
ORDER BY Total_Shipments DESC;




Shipping Performance Analysis:
            Ship_Mode  Total_Shipments  Avg_Days_to_Ship  Total_Shipping_Cost  \
0  Shipping Company B               14            1.1667               1618.0   
1  Shipping Company C               11            0.3333                479.0   
2  Shipping Company A                7            3.6923                335.0   

   Total_Order_Value  Shipping_Cost_Percentage  
0           16078.50                 10.063128  
1           24802.25                  1.931276  
2            9593.50                  3.491948  
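
Because `orders` is joined to `order_details`, every order contributes one row per line item. As a result, `SUM(o.shipping_fee)` counts an order's fee once per line item, and `AVG(DATEDIFF(...))` weights each order by its number of line items. The output above hints at this: Shipping Company A has 7 shipments, yet its average of 3.6923 days is consistent with 48/13, an average over 13 rows. One possible remedy is to aggregate `order_details` per order before joining. The sketch below uses an in-memory SQLite database standing in for MySQL, with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, shipper_id INTEGER, shipping_fee REAL);
CREATE TABLE order_details (order_id INTEGER, quantity REAL, unit_price REAL, discount REAL);
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0);
-- Order 1 has three line items, order 2 has one
INSERT INTO order_details VALUES
    (1, 1, 100.0, 0.0), (1, 1, 50.0, 0.0), (1, 1, 50.0, 0.0),
    (2, 1, 100.0, 0.0);
""")

# Joining first repeats each order's fee once per line item
inflated = conn.execute("""
    SELECT SUM(o.shipping_fee)
    FROM orders o JOIN order_details od ON o.id = od.order_id
""").fetchone()[0]

# Aggregating line items per order first keeps one row per order
fee, value = conn.execute("""
    SELECT SUM(o.shipping_fee), SUM(t.order_value)
    FROM orders o
    JOIN (
        SELECT order_id, SUM(quantity * unit_price * (1 - discount)) AS order_value
        FROM order_details
        GROUP BY order_id
    ) AS t ON o.id = t.order_id
""").fetchone()

print(inflated, fee, value)
conn.close()
```

In this toy data the true shipping cost is 15, but the join-first query reports 35 because order 1's fee of 10 is counted three times.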


In [None]:
# Example of executing a SQL query
my_shop_employees = "SELECT * FROM myshop.employees;"
try:
    df_query_result = db_connection.get_query_result(my_shop_employees)
    print("Query Result:")
    print(df_query_result)
except Exception as e:
    logger.error(f"Error in query execution: {str(e)}")

INFO:__main__:Successfully connected to MySQL database.
INFO:__main__:Query executed successfully: SELECT * FROM myshop.employees;


Query Result:
   id         company       last_name first_name              email_address  \
0   1  myshop Traders       Freehafer      Nancy    nancy@myshoptraders.com   
1   2  myshop Traders         Cencini     Andrew   andrew@myshoptraders.com   
2   3  myshop Traders           Kotas        Jan      jan@myshoptraders.com   
3   4  myshop Traders       Sergienko     Mariya   mariya@myshoptraders.com   
4   5  myshop Traders          Thorpe     Steven   steven@myshoptraders.com   
5   6  myshop Traders         Neipper    Michael  michael@myshoptraders.com   
6   7  myshop Traders            Zare     Robert   robert@myshoptraders.com   
7   8  myshop Traders        Giussani      Laura    laura@myshoptraders.com   
8   9  myshop Traders  Hellung-Larsen       Anne     anne@myshoptraders.com   

               job_title business_phone     home_phone mobile_phone  \
0   Sales Representative  (123)555-0100  (123)555-0102         None   
1  Vice President, Sales  (123)555-0100  (123)555-010