# Task
Develop an interactive sales dashboard for a telco product company in Nigeria, starting with loading and preprocessing sales data, calculating key sales metrics, creating various visualizations, and finally integrating them into an interactive dashboard. The task concludes with providing the complete Python code and instructions to run the dashboard, along with a summary of insights regarding telco product sales in Nigeria.

## Load and Inspect Sales Data

### Subtask:
Load the sales data from the provided source (e.g., CSV file) into a pandas DataFrame. Then, display the first few rows, column names, data types, and a summary of the dataset to understand its structure and content.


**Reasoning**:
First, I'll import the pandas library, then load the 'nigerian_telco_sales.csv' file into a DataFrame named `df`. Subsequently, I'll display the first 5 rows, show the column names and their data types, and generate descriptive statistics to understand the dataset's structure and content.



In [1]:
import pandas as pd

df = pd.read_csv('nigerian_telco_sales.csv')

print("First 5 rows of the DataFrame:")
print(df.head())

print("\nColumn names and their data types:")
df.info()

print("\nDescriptive statistics of the DataFrame:")
print(df.describe())

FileNotFoundError: [Errno 2] No such file or directory: 'nigerian_telco_sales.csv'

**Reasoning**:
The previous code failed due to a `FileNotFoundError` because 'nigerian_telco_sales.csv' was not found. To fix this, I will first create a dummy 'nigerian_telco_sales.csv' file with some representative data, and then proceed with loading the data into a DataFrame, displaying its head, information, and descriptive statistics, as originally intended.



In [2]:
import os
import pandas as pd

# Create dummy data for nigerian_telco_sales.csv only when the file is missing,
# so a real dataset that was already provided is never overwritten.
if not os.path.exists('nigerian_telco_sales.csv'):
    data = {
        'Date': pd.to_datetime(['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-04']),
        'Product_Category': ['Data Plan', 'Voice Plan', 'Data Plan', 'SMS Bundle', 'Voice Plan', 'Data Plan', 'VAS'],
        'Product_Name': ['5GB Monthly', 'Unlimited Voice', '10GB Monthly', '200 SMS Pack', 'Basic Voice', '2GB Daily', 'Call Me Back Tune'],
        'Sales_Amount': [5000, 3000, 8000, 1000, 2500, 1500, 500],
        'Quantity': [100, 50, 70, 200, 80, 120, 300],
        'Region': ['Lagos', 'Abuja', 'Kano', 'Lagos', 'Abuja', 'Rivers', 'Enugu'],
        'Customer_Segment': ['Prepaid', 'Postpaid', 'Prepaid', 'Prepaid', 'Postpaid', 'Prepaid', 'Prepaid']
    }
    dummy_df = pd.DataFrame(data)
    dummy_df.to_csv('nigerian_telco_sales.csv', index=False)

# Now, load the sales data from the CSV file
df = pd.read_csv('nigerian_telco_sales.csv')

print("First 5 rows of the DataFrame:")
print(df.head())

print("\nColumn names and their data types:")
df.info()

print("\nDescriptive statistics of the DataFrame:")
print(df.describe())

First 5 rows of the DataFrame:
         Date Product_Category     Product_Name  Sales_Amount  Quantity  \
0  2023-01-01        Data Plan      5GB Monthly          5000       100   
1  2023-01-01       Voice Plan  Unlimited Voice          3000        50   
2  2023-01-02        Data Plan     10GB Monthly          8000        70   
3  2023-01-02       SMS Bundle     200 SMS Pack          1000       200   
4  2023-01-03       Voice Plan      Basic Voice          2500        80   

  Region Customer_Segment  
0  Lagos          Prepaid  
1  Abuja         Postpaid  
2   Kano          Prepaid  
3  Lagos          Prepaid  
4  Abuja         Postpaid  

Column names and their data types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Date              7 non-null      object
 1   Product_Category  7 non-null      object
 2   Product_Name      7 non-n
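
The `info()` output above reports `Date` as `object`, because dates come back from a CSV as plain text. As a minimal sketch, `read_csv` can parse the column while loading. The in-memory CSV below is an assumption for illustration: a two-row stand-in for the real file.

```python
import io
import pandas as pd

# Stand-in for nigerian_telco_sales.csv (two illustrative rows only).
csv_text = """Date,Product_Category,Sales_Amount,Quantity
2023-01-01,Data Plan,5000,100
2023-01-02,SMS Bundle,1000,200
"""

# Without parse_dates, the Date column is read as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to datetime64 while loading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(raw['Date'].dtype)
print(parsed['Date'].dtype)
```

Parsing at load time is optional; converting afterwards with `pd.to_datetime` gives the same result.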

## Preprocess Sales Data

### Subtask:
Clean the loaded sales data. This includes handling missing values, converting data types if necessary (e.g., sales figures to numeric, dates to datetime objects), and performing any initial aggregations or transformations required for analysis, such as calculating total sales per transaction.


**Reasoning**:
First, I'll check for missing values in the DataFrame to understand data completeness. Then, I'll convert the 'Date' column to datetime for time-series analysis and calculate 'Total_Revenue' as 'Sales_Amount' multiplied by 'Quantity', which treats 'Sales_Amount' as a per-unit price. Finally, I'll display the updated DataFrame's head and info to verify the transformations.



In [3]:
print("Missing values before handling:\n", df.isnull().sum())

# Convert 'Date' to datetime objects
df['Date'] = pd.to_datetime(df['Date'])

# Calculate 'Total_Revenue'
df['Total_Revenue'] = df['Sales_Amount'] * df['Quantity']

print("\nFirst 5 rows of the DataFrame after preprocessing:")
print(df.head())

print("\nColumn names and their data types after preprocessing:")
df.info()

Missing values before handling:
 Date                0
Product_Category    0
Product_Name        0
Sales_Amount        0
Quantity            0
Region              0
Customer_Segment    0
dtype: int64

First 5 rows of the DataFrame after preprocessing:
        Date Product_Category     Product_Name  Sales_Amount  Quantity Region  \
0 2023-01-01        Data Plan      5GB Monthly          5000       100  Lagos   
1 2023-01-01       Voice Plan  Unlimited Voice          3000        50  Abuja   
2 2023-01-02        Data Plan     10GB Monthly          8000        70   Kano   
3 2023-01-02       SMS Bundle     200 SMS Pack          1000       200  Lagos   
4 2023-01-03       Voice Plan      Basic Voice          2500        80  Abuja   

  Customer_Segment  Total_Revenue  
0          Prepaid         500000  
1         Postpaid         150000  
2          Prepaid         560000  
3          Prepaid         200000  
4         Postpaid         200000  

Column names and their data types after prep
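
The dummy data has no missing values, so the cell above only checks for them. For a real export, a hedged sketch of the cleaning step might look like the following. The `messy` frame and its bad entries are hypothetical, and dropping unusable rows is just one possible policy.

```python
import pandas as pd

# Hypothetical messy rows: a non-numeric amount, a missing quantity, a bad date.
messy = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', 'not a date'],
    'Sales_Amount': ['5000', 'N/A', '1500'],
    'Quantity': [100, None, 120],
})

# Coerce invalid entries to NaT/NaN instead of raising an error.
messy['Date'] = pd.to_datetime(messy['Date'], errors='coerce')
messy['Sales_Amount'] = pd.to_numeric(messy['Sales_Amount'], errors='coerce')

# Drop rows that cannot be used for revenue or time-series analysis.
clean = messy.dropna(subset=['Date', 'Sales_Amount', 'Quantity']).copy()
clean['Total_Revenue'] = clean['Sales_Amount'] * clean['Quantity']
print(clean)
```

Depending on the analysis, imputing values (for example with `fillna`) may be preferable to dropping rows.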

## Calculate Key Sales Metrics

### Subtask:
Compute essential sales metrics relevant to a telco product company. This could include total sales, sales by product category, sales performance over time, and sales distribution across different regions or customer segments in Nigeria (if data is available).


**Reasoning**:
I will calculate and print the total sales, sales by product category, daily sales, sales by region, and sales by customer segment as specified in the instructions. This will provide all the essential sales metrics.



In [4]:
import pandas as pd

# 1. Calculate the total sales (total revenue) for the entire dataset
total_sales = df['Total_Revenue'].sum()
print(f"Total Sales (Total Revenue): {total_sales}\n")

# 2. Calculate total sales by 'Product_Category'
sales_by_category = df.groupby('Product_Category')['Total_Revenue'].sum().sort_values(ascending=False)
print("Sales by Product Category:\n", sales_by_category, "\n")

# 3. Calculate sales performance over time (daily sales)
sales_over_time = df.groupby('Date')['Total_Revenue'].sum().sort_index()
print("Sales Performance Over Time (Daily Sales):\n", sales_over_time, "\n")

# 4. Calculate sales distribution across different 'Region'
sales_by_region = df.groupby('Region')['Total_Revenue'].sum().sort_values(ascending=False)
print("Sales Distribution by Region:\n", sales_by_region, "\n")

# 5. Calculate sales distribution across different 'Customer_Segment'
sales_by_customer_segment = df.groupby('Customer_Segment')['Total_Revenue'].sum().sort_values(ascending=False)
print("Sales Distribution by Customer Segment:\n", sales_by_customer_segment, "\n")

Total Sales (Total Revenue): 1940000

Sales by Product Category:
 Product_Category
Data Plan     1240000
Voice Plan     350000
SMS Bundle     200000
VAS            150000
Name: Total_Revenue, dtype: int64 

Sales Performance Over Time (Daily Sales):
 Date
2023-01-01    650000
2023-01-02    760000
2023-01-03    380000
2023-01-04    150000
Name: Total_Revenue, dtype: int64 

Sales Distribution by Region:
 Region
Lagos     700000
Kano      560000
Abuja     350000
Rivers    180000
Enugu     150000
Name: Total_Revenue, dtype: int64 

Sales Distribution by Customer Segment:
 Customer_Segment
Prepaid     1590000
Postpaid     350000
Name: Total_Revenue, dtype: int64 
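
Absolute totals are easier to compare as shares of overall revenue. A small sketch, with the category totals copied from the output above so that it runs on its own:

```python
import pandas as pd

# Category totals copied from the output above.
sales_by_category = pd.Series(
    {'Data Plan': 1240000, 'Voice Plan': 350000, 'SMS Bundle': 200000, 'VAS': 150000},
    name='Total_Revenue',
)

# Share of total revenue per category, as a percentage rounded to one decimal.
category_share = (sales_by_category / sales_by_category.sum() * 100).round(1)
print(category_share)
```

On the dummy data, Data Plan accounts for roughly 64% of revenue. The same pattern applies to the region and customer-segment series.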



## Develop Dashboard Visualizations

### Subtask:
Create various interactive visualizations to represent the calculated sales metrics. This will include charts such as line graphs for sales trends over time, bar charts for comparing product category performance, and pie charts or treemaps for market share or sales distribution. Ensure all plots have appropriate legends and titles.


**Reasoning**:
First, I'll import the necessary Plotly modules, `plotly.express` as `px` and `plotly.graph_objects` as `go`, to prepare for creating the interactive visualizations as requested in the instructions.



In [5]:
import plotly.express as px
import plotly.graph_objects as go

print("Plotly modules imported successfully.")

Plotly modules imported successfully.


**Reasoning**:
I will create a line chart for 'Sales Performance Over Time (Daily Sales)', a bar chart for 'Sales by Product Category', another bar chart for 'Sales Distribution by Region', and a pie chart for 'Sales Distribution by Customer Segment' using Plotly, as specified in the instructions.



In [6]:
import plotly.express as px

# 1. Line chart for 'Sales Performance Over Time (Daily Sales)'
fig_time = px.line(sales_over_time.reset_index(), x='Date', y='Total_Revenue',
                   title='Daily Sales Performance Over Time')
fig_time.update_layout(xaxis_title='Date', yaxis_title='Total Revenue')
fig_time.show()

# 2. Bar chart for 'Sales by Product Category'
fig_category = px.bar(sales_by_category.reset_index(), x='Product_Category', y='Total_Revenue',
                      title='Total Revenue by Product Category',
                      labels={'Product_Category': 'Product Category', 'Total_Revenue': 'Total Revenue'})
fig_category.update_layout(xaxis={'categoryorder':'total descending'})
fig_category.show()

# 3. Bar chart for 'Sales Distribution by Region'
fig_region = px.bar(sales_by_region.reset_index(), x='Region', y='Total_Revenue',
                    title='Total Revenue by Region',
                    labels={'Region': 'Region', 'Total_Revenue': 'Total Revenue'})
fig_region.update_layout(xaxis={'categoryorder':'total descending'})
fig_region.show()

# 4. Pie chart for 'Sales Distribution by Customer Segment'
fig_segment = px.pie(sales_by_customer_segment.reset_index(), names='Customer_Segment', values='Total_Revenue',
                     title='Total Revenue by Customer Segment')
fig_segment.show()

## Build Interactive Sales Dashboard

### Subtask:
Construct an interactive dashboard using a Python library (e.g., Plotly Dash or Streamlit) to present the visualizations and key sales metrics in an organized and user-friendly interface. The dashboard will allow for dynamic exploration of the sales data.


**Reasoning**:
I will import the necessary Dash libraries (`Dash`, `html`, `dcc`), initialize the Dash application, define the dashboard layout with an `html.H1` title and `dcc.Graph` components for each of the previously created Plotly figures, and add the `app.run_server(debug=True)` block to run the application, as per the instructions.



In [None]:
from dash import Dash, html, dcc

# Initialize the Dash application
app = Dash(__name__)

# Define the layout of the dashboard
app.layout = html.Div(children=[
    html.H1(children='Nigerian Telco Sales Dashboard', style={'textAlign': 'center'}),

    html.Div(children=[
        dcc.Graph(
            id='daily-sales-performance',
            figure=fig_time
        ),
        dcc.Graph(
            id='sales-by-category',
            figure=fig_category
        )
    ], style={'display': 'flex', 'flexDirection': 'row'}),

    html.Div(children=[
        dcc.Graph(
            id='sales-by-region',
            figure=fig_region
        ),
        dcc.Graph(
            id='sales-by-customer-segment',
            figure=fig_segment
        )
    ], style={'display': 'flex', 'flexDirection': 'row'})
])

# Run the Dash application
if __name__ == '__main__':
    app.run_server(debug=True, mode='inline')


**Reasoning**:
The previous cell failed because the 'dash' library was not found, indicated by `ModuleNotFoundError: No module named 'dash'`. To resolve this, I need to install the 'dash' library using pip before importing it.



In [8]:
%pip install dash

Collecting dash
  Downloading dash-4.0.0-py3-none-any.whl.metadata (11 kB)
Collecting retrying (from dash)
  Downloading retrying-1.4.2-py3-none-any.whl.metadata (5.5 kB)
Downloading dash-4.0.0-py3-none-any.whl (7.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 60.3 MB/s eta 0:00:00
Downloading retrying-1.4.2-py3-none-any.whl (10 kB)
Installing collected packages: retrying, dash
Successfully installed dash-4.0.0 retrying-1.4.2


**Reasoning**:
Since the `dash` library has now been successfully installed, I will re-run the code that initializes the Dash application and defines its layout with the previously generated Plotly figures. This should now execute without the `ModuleNotFoundError`.



In [15]:
from dash import Dash, html, dcc

# Initialize the Dash application
app = Dash(__name__)

# Define the layout of the dashboard
app.layout = html.Div(children=[
    html.H1(children='Nigerian Telco Sales Dashboard', style={'textAlign': 'center'}),

    html.Div(children=[
        dcc.Graph(
            id='daily-sales-performance',
            figure=fig_time
        ),
        dcc.Graph(
            id='sales-by-category',
            figure=fig_category
        )
    ], style={'display': 'flex', 'flexDirection': 'row'}),

    html.Div(children=[
        dcc.Graph(
            id='sales-by-region',
            figure=fig_region
        ),
        dcc.Graph(
            id='sales-by-customer-segment',
            figure=fig_segment
        )
    ], style={'display': 'flex', 'flexDirection': 'row'})
])

# Run the Dash application
if __name__ == '__main__':
    app.run_server(debug=True, mode='inline')

ObsoleteAttributeException: app.run_server has been replaced by app.run

**Reasoning**:
The previous code failed with an `ObsoleteAttributeException` because `app.run_server` has been replaced by `app.run`. I will update the code to use `app.run()` instead of `app.run_server()`.



In [16]:
from dash import Dash, html, dcc

# Initialize the Dash application
app = Dash(__name__)

# Define the layout of the dashboard
app.layout = html.Div(children=[
    html.H1(children='Nigerian Telco Sales Dashboard', style={'textAlign': 'center'}),

    html.Div(children=[
        dcc.Graph(
            id='daily-sales-performance',
            figure=fig_time
        ),
        dcc.Graph(
            id='sales-by-category',
            figure=fig_category
        )
    ], style={'display': 'flex', 'flexDirection': 'row'}),

    html.Div(children=[
        dcc.Graph(
            id='sales-by-region',
            figure=fig_region
        ),
        dcc.Graph(
            id='sales-by-customer-segment',
            figure=fig_segment
        )
    ], style={'display': 'flex', 'flexDirection': 'row'})
])

# Run the Dash application
if __name__ == '__main__':
    app.run(debug=True, mode='inline')


Dash is running on http://127.0.0.1:8050/



INFO:dash.dash:Dash is running on http://127.0.0.1:8050/



 * Serving Flask app '__main__'
 * Debug mode: on


TypeError: run_simple() got an unexpected keyword argument 'mode'

**Reasoning**:
The previous code failed because `mode='inline'` is not a valid argument for `app.run()` in the current Dash version, leading to a `TypeError`. I will replace `mode='inline'` with `jupyter_mode='inline'` to correctly embed the Dash application in a Jupyter environment.



In [None]:
from dash import Dash, html, dcc

# Initialize the Dash application
app = Dash(__name__)

# Define the layout of the dashboard
app.layout = html.Div(children=[
    html.H1(children='Nigerian Telco Sales Dashboard', style={'textAlign': 'center'}),

    html.Div(children=[
        dcc.Graph(
            id='daily-sales-performance',
            figure=fig_time
        ),
        dcc.Graph(
            id='sales-by-category',
            figure=fig_category
        )
    ], style={'display': 'flex', 'flexDirection': 'row'}),

    html.Div(children=[
        dcc.Graph(
            id='sales-by-region',
            figure=fig_region
        ),
        dcc.Graph(
            id='sales-by-customer-segment',
            figure=fig_segment
        )
    ], style={'display': 'flex', 'flexDirection': 'row'})
])

# Run the Dash application
if __name__ == '__main__':
    app.run(debug=True, jupyter_mode='inline')

Dash is running on http://127.0.0.1:8050/



INFO:dash.dash:Dash is running on http://127.0.0.1:8050/



 * Serving Flask app '__main__'
 * Debug mode: on
