
# **Artificial Intelligence Project**


## Continuous Time Markov Chain

In this project, we aim to address a common business problem. In most customer support systems, it is hard to allocate the right amount of time, resources, and staff across many different problems. Classifying customer support tickets as open, pending, or resolved (the `Ticket Status` values Open, Pending Customer Response, and Closed in the dataset) based on the tickets' attributes would lead to better time management, better resource allocation, and higher customer satisfaction.
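As a hedged illustration of the continuous-time Markov chain idea named in the title, the sketch below treats the three ticket statuses as CTMC states. In a CTMC, a generator matrix Q holds transition rates (each row sums to zero), and the probability of being in state j after time t, starting from state i, is entry [i, j] of P(t) = exp(Qt). The rates, the hourly time unit, and the choice to make Closed an absorbing state are all assumptions made up for illustration, not values estimated from the dataset. NumPy has no built-in matrix exponential, so a small scaling-and-squaring helper (`expm`, hypothetical) is included; `scipy.linalg.expm` would be the usual choice when SciPy is available.

```python
import numpy as np

# CTMC states, using the status labels found in the dataset.
STATES = ["Open", "Pending Customer Response", "Closed"]

# Hypothetical generator matrix Q (rates per hour), NOT estimated from the data.
# Off-diagonal entries are transition rates; each row sums to zero.
Q = np.array([
    [-0.50, 0.30, 0.20],   # Open -> Pending (0.30/h), Open -> Closed (0.20/h)
    [0.10, -0.25, 0.15],   # Pending -> Open (0.10/h), Pending -> Closed (0.15/h)
    [0.00, 0.00, 0.00],    # Closed is assumed to be absorbing
])


def expm(a: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a, ord=np.inf)
    # Scale a by 2^-s so its norm is at most 0.5, where the series converges fast.
    s = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    scaled = a / (2 ** s)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k      # term = scaled^k / k!
        result = result + term
    for _ in range(s):                # undo the scaling: exp(a) = exp(a / 2^s)^(2^s)
        result = result @ result
    return result


def transition_matrix(q: np.ndarray, t: float) -> np.ndarray:
    """P(t) = exp(Q t); row i is the state distribution at time t given state i at 0."""
    return expm(q * t)


P8 = transition_matrix(Q, 8.0)
for state, prob in zip(STATES, P8[0]):
    print(f"P({state} after 8h | Open now) = {prob:.3f}")
```

With real data, the rates in Q would have to be estimated, for example from how long tickets stay in each status; the dataset as shown does not record those per-status durations directly.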

**First Step**


We start by reading the CSV file and exploring the dataset.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
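try:
    from google.colab import files  # only available inside Google Colab
except ImportError:
    files = None
import statistics as s

%matplotlib inline
sns.set()  # apply seaborn's default plot style

# Read the local copy of the dataset if it exists; otherwise download it from GitHub.
try:
    df = pd.read_csv('customer_support_tickets.csv')
except FileNotFoundError:
    df = pd.read_csv('https://raw.githubusercontent.com/Jowayria-27/Artifical-Intelligence-Project/refs/heads/main/customer_support_tickets.csv')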

df.head()

| | Ticket ID | Customer Name | Customer Email | Customer Age | Customer Gender | Product Purchased | Date of Purchase | Ticket Type | Ticket Subject | Ticket Description | Ticket Status | Resolution | Ticket Priority | Ticket Channel | First Response Time | Time to Resolution | Customer Satisfaction Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Marisa Obrien | carrollallison@example.com | 32 | Other | GoPro Hero | 2021-03-22 | Technical issue | Product setup | I'm having an issue with the {product_purchase... | Pending Customer Response | NaN | Critical | Social media | 2023-06-01 12:15:36 | NaN | NaN |
| 1 | 2 | Jessica Rios | clarkeashley@example.com | 42 | Female | LG Smart TV | 2021-05-22 | Technical issue | Peripheral compatibility | I'm having an issue with the {product_purchase... | Pending Customer Response | NaN | Critical | Chat | 2023-06-01 16:45:38 | NaN | NaN |
| 2 | 3 | Christopher Robbins | gonzalestracy@example.com | 48 | Other | Dell XPS | 2020-07-14 | Technical issue | Network problem | I'm facing a problem with my {product_purchase... | Closed | Case maybe show recently my computer follow. | Low | Social media | 2023-06-01 11:14:38 | 2023-06-01 18:05:38 | 3.0 |
| 3 | 4 | Christina Dillon | bradleyolson@example.org | 27 | Female | Microsoft Office | 2020-11-13 | Billing inquiry | Account access | I'm having an issue with the {product_purchase... | Closed | Try capital clearly never color toward story. | Low | Social media | 2023-06-01 07:29:40 | 2023-06-01 01:57:40 | 3.0 |
| 4 | 5 | Alexander Carroll | bradleymark@example.com | 67 | Female | Autodesk AutoCAD | 2020-02-04 | Billing inquiry | Data loss | I'm having an issue with the {product_purchase... | Closed | West decision evidence bit. | Low | Email | 2023-06-01 00:12:42 | 2023-06-01 19:53:42 | 1.0 |


Next, we check the data for missing values. Based on the output below, there are 8,469 rows and 17 columns. Resolution, First Response Time, Time to Resolution, and Customer Satisfaction Rating have missing values.

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8469 entries, 0 to 8468
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Ticket ID                     8469 non-null   int64  
 1   Customer Name                 8469 non-null   object 
 2   Customer Email                8469 non-null   object 
 3   Customer Age                  8469 non-null   int64  
 4   Customer Gender               8469 non-null   object 
 5   Product Purchased             8469 non-null   object 
 6   Date of Purchase              8469 non-null   object 
 7   Ticket Type                   8469 non-null   object 
 8   Ticket Subject                8469 non-null   object 
 9   Ticket Description            8469 non-null   object 
 10  Ticket Status                 8469 non-null   object 
 11  Resolution                    2769 non-null   object 
 12  Ticket Priority               8469 non-null   object 
 13  Tic
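The non-null counts can be turned into a per-column summary of missing values with `isna()`. To keep the sketch self-contained, it runs on a small sample rebuilt from the `df.head()` rows shown earlier, keeping only the four columns with gaps; on the full dataset the same two expressions would be applied to `df` directly.

```python
import numpy as np
import pandas as pd

# Five-row sample copied from df.head(); empty CSV cells are read as NaN.
sample = pd.DataFrame({
    "Resolution": [np.nan, np.nan,
                   "Case maybe show recently my computer follow.",
                   "Try capital clearly never color toward story.",
                   "West decision evidence bit."],
    "First Response Time": ["2023-06-01 12:15:36", "2023-06-01 16:45:38",
                            "2023-06-01 11:14:38", "2023-06-01 07:29:40",
                            "2023-06-01 00:12:42"],
    "Time to Resolution": [np.nan, np.nan, "2023-06-01 18:05:38",
                           "2023-06-01 01:57:40", "2023-06-01 19:53:42"],
    "Customer Satisfaction Rating": [np.nan, np.nan, 3.0, 3.0, 1.0],
})

# Number and percentage of missing values in each column.
missing = pd.DataFrame({
    "missing": sample.isna().sum(),
    "percent": (sample.isna().mean() * 100).round(1),
})
print(missing)
```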

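Next, we compute summary statistics with `describe(include="all")`: the count, the number of unique values, the most frequent value (the mode) and its frequency for text columns, and the mean, standard deviation, and quartiles for numeric columns.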

In [3]:
df.describe(include="all")

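| | Ticket ID | Customer Name | Customer Email | Customer Age | Customer Gender | Product Purchased | Date of Purchase | Ticket Type | Ticket Subject | Ticket Description | Ticket Status | Resolution | Ticket Priority | Ticket Channel | First Response Time | Time to Resolution | Customer Satisfaction Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8469.0 | 8469 | 8469 | 8469.0 | 8469 | 8469 | 8469 | 8469 | 8469 | 8469 | 8469 | 2769 | 8469 | 8469 | 5650 | 2769 | 2769.0 |
| unique | NaN | 8028 | 8320 | NaN | 3 | 42 | 730 | 5 | 16 | 8077 | 3 | 2769 | 4 | 4 | 5470 | 2728 | NaN |
| top | NaN | Michael Garcia | bsmith@example.com | NaN | Male | Canon EOS | 2020-10-21 | Refund request | Refund request | I'm having an issue with the {product_purchase... | Pending Customer Response | Case maybe show recently my computer follow. | Medium | Email | 2023-06-01 15:21:42 | 2023-06-01 17:14:42 | NaN |
| freq | NaN | 5 | 4 | NaN | 2896 | 240 | 24 | 1752 | 576 | 25 | 2881 | 1 | 2192 | 2143 | 3 | 3 | NaN |
| mean | 4235.0 | NaN | NaN | 44.026804 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.991333 |
| std | 2444.934048 | NaN | NaN | 15.296112 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.407016 |
| min | 1.0 | NaN | NaN | 18.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| 25% | 2118.0 | NaN | NaN | 31.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 |
| 50% | 4235.0 | NaN | NaN | 44.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 |
| 75% | 6352.0 | NaN | NaN | 57.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 |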
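A mean close to the median indicates a roughly symmetric distribution, not necessarily a normal one (a uniform distribution, for instance, also has mean equal to median). A hedged way to quantify symmetry is the sample skewness from `Series.skew()`, where values near 0 suggest symmetry. The sketch uses the five ages from `df.head()` as a stand-in for `df['Customer Age']`.

```python
import pandas as pd

# Stand-in for df["Customer Age"]: the five ages shown in df.head().
ages = pd.Series([32, 42, 48, 27, 67], name="Customer Age")

mean = ages.mean()
median = ages.median()
skewness = ages.skew()  # adjusted sample skewness; > 0 means a longer right tail

print(f"mean={mean:.2f}, median={median:.2f}, skew={skewness:.3f}")
```

On the full column, a histogram such as `sns.histplot(df['Customer Age'])` would show the actual shape, which mean, median, and skewness alone cannot.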


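Based on the summary above, we can conclude the following:
1. Customer Age and Customer Satisfaction Rating look roughly symmetric, since their means and medians are almost equal (44.03 vs. 44 and 2.99 vs. 3). This alone does not show that they are normally distributed.
2. The average customer age is about 44 years.
3. The most common ticket status is Pending Customer Response (2,881 of 8,469 tickets, about 34%).
4. The most common ticket type is Refund request (1,752 tickets).
5. The most common priority is Medium (2,192 tickets).
6. The typical satisfaction rating is moderate (mean 2.99, median 3), but only 2,769 tickets have a rating.
7. The most used channel is Email (2,143 tickets).
8. No column directly records the time from the first response until the ticket was resolved; it would have to be derived from First Response Time and Time to Resolution.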
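The missing duration can be derived from the two timestamp columns. This sketch assumes both columns hold timestamps in the `YYYY-MM-DD HH:MM:SS` format seen in `df.head()`, and that Time to Resolution records the moment the ticket was resolved; the column name `Hours to Resolve` is hypothetical. It uses the three closed tickets from `df.head()`, one of which (ticket 4) has a resolution time earlier than its first response, so negative durations should be checked before the feature is used.

```python
import pandas as pd

# Timestamps of the three closed tickets shown in df.head().
tickets = pd.DataFrame({
    "Ticket ID": [3, 4, 5],
    "First Response Time": ["2023-06-01 11:14:38", "2023-06-01 07:29:40",
                            "2023-06-01 00:12:42"],
    "Time to Resolution": ["2023-06-01 18:05:38", "2023-06-01 01:57:40",
                           "2023-06-01 19:53:42"],
})

# Parse the text columns into datetimes; missing or unparseable values become NaT.
first = pd.to_datetime(tickets["First Response Time"], errors="coerce")
resolved = pd.to_datetime(tickets["Time to Resolution"], errors="coerce")

# Hypothetical derived feature: hours from first response to resolution.
tickets["Hours to Resolve"] = (resolved - first).dt.total_seconds() / 3600

# A negative duration means the timestamps are inconsistent for that ticket.
negative = tickets[tickets["Hours to Resolve"] < 0]
print(tickets[["Ticket ID", "Hours to Resolve"]])
print("Tickets with negative duration:", negative["Ticket ID"].tolist())
```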

Based on these results, we next count the tickets per channel, priority, status, and type.

In [4]:
df['Ticket Channel'].value_counts()

| Ticket Channel | count |
|---|---|
| Email | 2143 |
| Phone | 2132 |
| Social media | 2121 |
| Chat | 2073 |


The four ticket channels are almost evenly used, with 2,073 to 2,143 tickets each.
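To make "almost evenly" more precise, we can run a chi-square goodness-of-fit test of the channel counts against an even split. This check is an addition, not part of the original analysis; it uses only the four counts above and the standard critical value 7.815 (the 95th percentile of the chi-square distribution with 3 degrees of freedom).

```python
import numpy as np

# Ticket counts per channel, from the value_counts() output above.
counts = {"Email": 2143, "Phone": 2132, "Social media": 2121, "Chat": 2073}

observed = np.array(list(counts.values()), dtype=float)
expected = np.full_like(observed, observed.sum() / len(observed))  # even split

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = ((observed - expected) ** 2 / expected).sum()

CRITICAL_95_DF3 = 7.815  # k - 1 = 3 degrees of freedom, 5% significance level
print(f"chi2 = {chi2:.3f}; reject uniformity at 5%: {chi2 > CRITICAL_95_DF3}")
# chi2 = 1.347; reject uniformity at 5%: False
```

If SciPy is installed, `scipy.stats.chisquare(observed)` computes the same statistic together with a p-value.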

In [5]:
df['Ticket Priority'].value_counts()

| Ticket Priority | count |
|---|---|
| Medium | 2192 |
| Critical | 2129 |
| High | 2085 |
| Low | 2063 |


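The four priority levels are also almost evenly distributed, with 2,063 to 2,192 tickets each; Medium is the most common.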
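Since the project aims to classify tickets, priority is a natural input feature, and its levels have an inherent order. A minimal sketch, assuming the order Low < Medium < High < Critical (the dataset stores priority as plain text and does not state an order), encodes it as an ordered pandas categorical. The sample uses the priorities of the five tickets in `df.head()`.

```python
import pandas as pd

# Assumed severity order; the dataset itself stores priority as plain text.
PRIORITY_ORDER = ["Low", "Medium", "High", "Critical"]
priority_type = pd.CategoricalDtype(categories=PRIORITY_ORDER, ordered=True)

# Priorities of the five tickets shown in df.head().
priority = pd.Series(["Critical", "Critical", "Low", "Low", "Low"]).astype(priority_type)

# Ordered categoricals support comparisons and expose integer codes
# (Low=0 ... Critical=3) that a classifier can use as a numeric feature.
print(priority.cat.codes.tolist())   # [3, 3, 0, 0, 0]
print((priority >= "High").sum())    # 2
```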

In [6]:
df['Ticket Status'].value_counts()

| Ticket Status | count |
|---|---|
| Pending Customer Response | 2881 |
| Open | 2819 |
| Closed | 2769 |
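The three statuses each cover about a third of the tickets. The Closed count (2,769) also matches the non-null counts of Resolution, Time to Resolution, and Customer Satisfaction Rating in the `describe()` output, which suggests, but does not prove, that those fields are filled in only for closed tickets. The sketch checks this hypothesis on the `df.head()` rows; run on the full `df`, the same comparison would confirm or refute it for every ticket.

```python
import numpy as np
import pandas as pd

# Status and rating of the five tickets shown in df.head().
sample = pd.DataFrame({
    "Ticket Status": ["Pending Customer Response", "Pending Customer Response",
                      "Closed", "Closed", "Closed"],
    "Customer Satisfaction Rating": [np.nan, np.nan, 3.0, 3.0, 1.0],
})

is_closed = sample["Ticket Status"].eq("Closed")
has_rating = sample["Customer Satisfaction Rating"].notna()

# Rows where "is closed" and "has a rating" disagree contradict the hypothesis.
mismatches = sample[is_closed != has_rating]
print("Rows contradicting the hypothesis:", len(mismatches))
```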


In [7]:
df['Ticket Type'].value_counts()

| Ticket Type | count |
|---|---|
| Refund request | 1752 |
| Technical issue | 1747 |
| Cancellation request | 1695 |
| Product inquiry | 1641 |
| Billing inquiry | 1634 |
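Ticket types are also nearly balanced (1,634 to 1,752 tickets each). For the classification goal, a more informative view is how each type is spread across statuses. A sketch with `pd.crosstab` on the `df.head()` sample, where `normalize="index"` turns counts into per-type proportions; with only five rows the proportions carry no real signal, so this shows the mechanics only.

```python
import pandas as pd

# Ticket Type and Ticket Status of the five tickets shown in df.head().
sample = pd.DataFrame({
    "Ticket Type": ["Technical issue", "Technical issue", "Technical issue",
                    "Billing inquiry", "Billing inquiry"],
    "Ticket Status": ["Pending Customer Response", "Pending Customer Response",
                      "Closed", "Closed", "Closed"],
})

# Rows: ticket type; columns: status; values: share of that type's tickets.
shares = pd.crosstab(sample["Ticket Type"], sample["Ticket Status"], normalize="index")
print(shares.round(2))
```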


