# Data Visualisation and Communication - CA2

## Online Retail Data Analysis

**Student Name:** Tiago De Oliveira Freitas  
**Student ID:** 2021406  
**Date:** November 2025

---

### Links

**GitHub Repository:** https://github.com/TiagoStudent/Y4-Data-Vis-CA2-60-.git  
**Video Presentation:** 

---

### Assignment Overview

This notebook analyses an Online Retail dataset from a UK-based gift wholesaler. It covers data quality assessment, cleaning, exploratory data analysis (EDA), static visualisations, and an interactive dashboard, all aimed at helping business stakeholders understand sales patterns, product performance, and regional trends.

The dataset contains transactional data including invoice numbers, product codes, descriptions, quantities, prices, timestamps, customer IDs, and countries. Our goal is to transform this raw data into actionable insights through effective visualisation and communication techniques.

## 1. Data Quality Assessment and Cleaning

### 1.1 Import Libraries and Load Data

We begin by importing the necessary libraries for data manipulation, analysis, and visualisation. The main libraries used are:

- **pandas:** For data manipulation and analysis
- **numpy:** For numerical operations
- **matplotlib and seaborn:** For static visualisations
- **plotly:** For interactive visualisations and dashboard
- **ipywidgets:** For creating interactive dashboard controls

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display
import warnings
from datetime import datetime

# Configure display settings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
# Load the dataset (reading .xlsx files with pandas requires the openpyxl package)
df_raw = pd.read_excel('OnlineRetail.xlsx')

# Display basic information
print("Dataset loaded successfully!")
print(f"\nDataset shape: {df_raw.shape}")
print(f"Number of rows: {df_raw.shape[0]:,}")
print(f"Number of columns: {df_raw.shape[1]}")
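
A quick guard after loading is to confirm that the expected columns are actually present. The sketch below is illustrative only: the column names in `EXPECTED_COLUMNS` are an assumption based on the fields listed in the overview (they are not confirmed by this notebook), and `find_missing_columns` is a hypothetical helper. A small stand-in frame is used so the cell runs on its own; in the notebook the check would be applied to `df_raw`.

```python
import pandas as pd

# Assumed column names, inferred from the fields described in the overview.
# Adjust this list if the actual file uses different headers.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Description", "Quantity",
    "InvoiceDate", "UnitPrice", "CustomerID", "Country",
]


def find_missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Stand-in frame for demonstration; in the notebook this would be df_raw.
demo = pd.DataFrame(columns=["InvoiceNo", "StockCode", "Quantity", "Country"])
missing_cols = find_missing_columns(demo)
print("Missing columns:", missing_cols)
```

An empty list means every expected column was found; anything else is worth resolving before the cleaning steps.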

### 1.2 Initial Data Inspection

Before cleaning the data, we need to understand its structure and data types and identify potential quality issues. This initial inspection helps us make informed decisions about the cleaning process.

In [None]:
# Display first few rows
print("First 10 rows of the dataset:")
df_raw.head(10)

In [None]:
# Display data types and non-null counts
print("Data types and non-null counts:")
df_raw.info()

In [None]:
# Display descriptive statistics
print("Descriptive statistics for numerical columns:")
df_raw.describe()
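
The inspection calls above can be complemented by a compact per-column summary of missing values, plus counts of duplicate rows and non-positive quantities. The following is a minimal sketch rather than the notebook's own method: `summarise_quality` is a hypothetical helper, and the sample frame uses made-up values with assumed column names so that the cell runs without the Excel file. In the notebook it would be called on `df_raw`.

```python
import pandas as pd


def summarise_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, unique values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("missing", ascending=False)


# Synthetic sample for illustration only (column names are assumptions).
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380"],
    "Quantity": [6, 6, -1, 12],
    "UnitPrice": [2.55, 2.55, 27.50, 0.0],
    "CustomerID": [17850.0, 17850.0, None, 14527.0],
})

quality = summarise_quality(sample)
n_duplicates = int(sample.duplicated().sum())              # fully identical rows
n_non_positive_qty = int((sample["Quantity"] <= 0).sum())  # possible returns or errors

print(quality)
print(f"Duplicate rows: {n_duplicates}")
print(f"Rows with non-positive quantity: {n_non_positive_qty}")
```

Keeping these counts in one place makes it easier to compare the dataset before and after cleaning.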