# IT Support Dashboard - Supporting Notebook

This notebook contains supporting documentation for the full report. Here, we run quality checks on the full dataset, confirm whether the English-only sample is representative of the full population, and conduct text analytics.

In [None]:
# Importing essential packages
import sys
from pathlib import Path
import pandas as pd

# Add the project root to sys.path so that config can be imported
# (hard-coded absolute path; adjust for your machine)
project_root = Path(r"C:\Users\David\Desktop\Python_Files\IT-Support-Ticket-Analysis")
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

from config import Config

Config.ensure_directories()

# Reading the initial dataset as a DataFrame: df
# Note: on_bad_lines="skip" drops unparseable rows without raising or warning
df = pd.read_csv(
    Config.RAW_DATA_PATH,
    encoding="utf-8",
    engine="python",
    on_bad_lines="skip",
)

df  # Display the DataFrame to verify successful import

[Config] Verified project directory structure under C:\Users\David\Desktop\Python_Files\IT-Support-Ticket-Analysis


Unnamed: 0,subject,body,answer,type,queue,priority,language,version,tag_1,tag_2,tag_3,tag_4,tag_5,tag_6,tag_7,tag_8
0,Wesentlicher Sicherheitsvorfall,"Sehr geehrtes Support-Team,\n\nich möchte eine...",Vielen Dank für die Meldung des kritischen Sic...,Incident,Technical Support,high,de,51,Security,Outage,Disruption,Data Breach,,,,
1,Account Disruption,"Dear Customer Support Team,\n\nI am writing to...","Thank you for reaching out, <name>. We are awa...",Incident,Technical Support,high,en,51,Account,Disruption,Outage,IT,Tech Support,,,
2,Query About Smart Home System Integration Feat...,"Dear Customer Support Team,\n\nI hope this mes...",Thank you for your inquiry. Our products suppo...,Request,Returns and Exchanges,medium,en,51,Product,Feature,Tech Support,,,,,
3,Inquiry Regarding Invoice Details,"Dear Customer Support Team,\n\nI hope this mes...",We appreciate you reaching out with your billi...,Request,Billing and Payments,low,en,51,Billing,Payment,Account,Documentation,Feedback,,,
4,Question About Marketing Agency Software Compa...,"Dear Support Team,\n\nI hope this message reac...",Thank you for your inquiry. Our product suppor...,Problem,Sales and Pre-Sales,medium,en,51,Product,Feature,Feedback,Tech Support,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28582,Performance Problem with Data Analytics Tool,The data analytics tool experiences sluggish p...,We are addressing the performance issue with t...,Incident,Technical Support,high,en,400,Performance,IT,Tech Support,,,,,
28583,Datensperrung in der Kundschaftsbetreuung,"Es gab einen Datensperrungsunfall, bei dem ung...",Ich kann Ihnen bei dem Datensperrungsunfall he...,Incident,Product Support,high,de,400,Security,IT,Tech Support,Bug,,,,
28584,Problem mit der Videokonferenz-Software heute,Wichtigere Sitzungen wurden unterbrochen durch...,"Sehr geehrte/r [Name], leider wurde das Proble...",Incident,Human Resources,low,de,400,Bug,Performance,Network,IT,Tech Support,,,
28585,Update Request for SaaS Platform Integration F...,Requesting an update on the integration featur...,Received your request for updates on the integ...,Change,IT Support,high,en,400,Feature,IT,Tech Support,,,,,


### Quality checking the dataset

We wanted to see how clean and complete this dataset was. We checked the shape of the table, the number of missing values per column, whether each column had an appropriate data type, how much memory each column used, and whether descriptive statistics revealed anything further.

In [10]:
# Information about the dataset
print("What is the shape of my table?")
print(df.shape)
print("\nAre there any missing values in each dimension?")
print(df.isna().sum().sort_values())
print("\nWhat is the datatype of each column?")
print(df.dtypes)
print("\nHow many bytes does each column use?")
print(df.memory_usage())
print("\nWhat are the initial stats of the dataframe?")
print(df.describe())

What is the shape of my table?
(28587, 16)

Are there any missing values in each dimension?
body            0
type            0
queue           0
priority        0
language        0
version         0
tag_1           0
answer          7
tag_2          13
tag_3         136
tag_4        3058
subject      3838
tag_5       14042
tag_6       22713
tag_7       26547
tag_8       28022
dtype: int64

What is the datatype of each column?
subject     object
body        object
answer      object
type        object
queue       object
priority    object
language    object
version      int64
tag_1       object
tag_2       object
tag_3       object
tag_4       object
tag_5       object
tag_6       object
tag_7       object
tag_8       object
dtype: object

How many bytes does each column use?
Index          132
subject     228696
body        228696
answer      228696
type        228696
queue       228696
priority    228696
language    228696
version     228696
tag_1       228696
tag_2       228696
tag_
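
Every column in the memory listing above reports 228,696 bytes, which is 28,587 rows × 8 bytes. By default `memory_usage()` counts only the references held in an `object` column, not the strings they point to, and `describe()` without arguments summarises only numeric columns (here, `version`). A minimal sketch of the deeper variants, assuming a small hypothetical frame (`sample` and its values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for the ticket table; values are illustrative only
sample = pd.DataFrame(
    {
        "subject": pd.Series(
            ["Account Disruption", None, "Inquiry Regarding Invoice Details"],
            dtype=object,
        ),
        "priority": pd.Series(["high", "low", "low"], dtype=object),
        "version": [51, 51, 400],
    }
)

# Shallow usage: object columns are counted as one reference per row
shallow = sample.memory_usage()

# Deep usage: also measures the Python objects behind those references
deep = sample.memory_usage(deep=True)

print(pd.DataFrame({"shallow_bytes": shallow, "deep_bytes": deep}))

# include="all" adds count/unique/top/freq rows for the text columns
summary = sample.describe(include="all")
print(summary)
```

On the real data, the same two calls would be `df.memory_usage(deep=True)` and `df.describe(include="all")`. The deep figures are expected to be considerably larger for the free-text columns such as `body` and `answer`.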

Continuing with the quality checks, we wanted to confirm how many unique values each column contained. Doing this programmatically saves time later by avoiding manual inspection of the dataset.
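
Before printing every distinct value, a count-only summary can give a quicker overview, since high-cardinality columns such as `subject` and `body` produce very long listings. A minimal sketch, assuming a small hypothetical frame (`sample`, its values, and the cardinality threshold of 2 are illustrative choices, not part of the original analysis):

```python
import pandas as pd

# Hypothetical miniature of the ticket table; values are illustrative only
sample = pd.DataFrame(
    {
        "queue": ["Technical Support", "Billing and Payments", "Technical Support"],
        "priority": ["high", "low", "high"],
        "tag_5": ["Tech Support", None, None],
    }
)

# dropna=False counts NaN as its own value, matching len(Series.unique())
summary = pd.DataFrame(
    {
        "n_unique": sample.nunique(dropna=False),
        "n_missing": sample.isna().sum(),
    }
).sort_values("n_unique")

print(summary)

# Full value listings are only readable for low-cardinality columns
for column in summary.index[summary["n_unique"] <= 2]:
    print(column, sample[column].value_counts(dropna=False).to_dict())
```

Passing `dropna=False` keeps the counts consistent with a loop based on `Series.unique()`, which also reports `nan` as one of the distinct values.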

In [11]:
# Loop to extract all unique values from each column in df
for column in df.columns:
    unique_values = df[column].sort_values(ascending=True).unique()
    length = len(unique_values)
    print(f"There were {length} unique values in {column}: {unique_values}")
    print("")

There were 24750 unique values in subject: [' Assistance Request'
 ' Bitte um Ausführliche Informationen zur Datenaufbereitungsdienstleistung'
 ' Datenschutzverletzung in Krankenhaus-Systemen ' ...
 'Übersicht der digitalen Kampagnen'
 'Überwachung medizinischer Daten in Krankenhaus-Systemen' nan]

There were 28587 unique values in body: [' Assistance Requested' ' Assistance Required' ' Assistance needed' ...
 'wishes to enhance data analysis tools for better optimization of decision-making processes for financial strategies to achieve better results.'
 'Änderungen in den Datenanalyseberichten wurden bemerkt. Obwohl die Berichte neu ausgeführt wurden, bestehen weiterhin Probleme mit den Dateneingaben. Neueste Software-Updates könnten der Grund sein.'
 'Überarbeitung der Datenanalyse, um Investitionen zu verbessern']

There were 28581 unique values in answer: [' Nehmen wir diesen Fall ernst. Bitte bereiten Sie zusätzliche Details des Vorfalls vor und rufen Sie uns unter <tel_num> an. Wi