# Ticket classification

## Starting

Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np


Load your data

In [2]:
#data = pd.read_csv('generated_tickets_csv.csv')
data = pd.read_csv('tickets.csv')

Check your data

What do you think we really need to use in our model?

In [3]:
data

Unnamed: 0,Timestamp,Title of your ticket,Write your ticket,Your ticket is about:
0,10/10/2023 11:36,Employee Data Access Request,An employee has requested access to their pers...,Privacy issue
1,10/10/2023 11:37,Request for New Monitors,We need to purchase 10 new high-resolution mon...,Purchase requisition
2,10/10/2023 12:51,Email Client Crashing,Our email client is frequently crashing on sta...,Software issue
3,10/10/2023 12:51,Slow File-Sharing Server,The company's file-sharing server is experienc...,Software issue
4,10/10/2023 12:52,Software License Procurement,We need to purchase a new software license for...,Purchase requisition
...,...,...,...,...
76,,Customer Data Deletion Request,,Privacy issue
77,,Privacy Policy Update,Our privacy policy needs to be updated to refl...,Privacy issue
78,,Vendor Data Handling Assessment,Assess how our vendors handle our data and ens...,Privacy issue
79,,Privacy Impact Assessment,Conduct a Privacy Impact Assessment (PIA) for ...,Privacy issue


In [4]:
# Rename the columns to: Timestamp, Title, Description and Class
data

Unnamed: 0,Timestamp,Title,Description,Class
0,10/10/2023 11:36,Employee Data Access Request,An employee has requested access to their pers...,Privacy issue
1,10/10/2023 11:37,Request for New Monitors,We need to purchase 10 new high-resolution mon...,Purchase requisition
2,10/10/2023 12:51,Email Client Crashing,Our email client is frequently crashing on sta...,Software issue
3,10/10/2023 12:51,Slow File-Sharing Server,The company's file-sharing server is experienc...,Software issue
4,10/10/2023 12:52,Software License Procurement,We need to purchase a new software license for...,Purchase requisition
...,...,...,...,...
76,,Customer Data Deletion Request,,Privacy issue
77,,Privacy Policy Update,Our privacy policy needs to be updated to refl...,Privacy issue
78,,Vendor Data Handling Assessment,Assess how our vendors handle our data and ens...,Privacy issue
79,,Privacy Impact Assessment,Conduct a Privacy Impact Assessment (PIA) for ...,Privacy issue
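
The renaming step above can be sketched as follows. This is only one possible way to do it, assuming the original column names are exactly those shown in the first output; the one-row `sample` frame is made up for illustration and stands in for `data`.

```python
import pandas as pd

# Toy one-row frame that mimics the original form-style column names.
sample = pd.DataFrame({
    "Timestamp": ["10/10/2023 11:36"],
    "Title of your ticket": ["Employee Data Access Request"],
    "Write your ticket": ["An employee has requested access to their data."],
    "Your ticket is about:": ["Privacy issue"],
})

# Map each old name to its new, shorter name; unlisted columns stay unchanged.
renamed = sample.rename(columns={
    "Title of your ticket": "Title",
    "Write your ticket": "Description",
    "Your ticket is about:": "Class",
})
print(list(renamed.columns))
```

`rename` returns a new DataFrame, so on the real data the result has to be assigned back to `data`.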


**What do we need to do?**

- Notice that one ticket is missing its title and another is missing its content. What should we do about them?
- What can we do about the missing timestamps?
- Which information do we actually need?
- Always keep our goal in mind.
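
Before answering these questions it can help to count the gaps. The sketch below is a hedged illustration on a made-up three-row `sample` frame (not the real tickets); `missing_per_column` and `rows_with_gaps` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy data with the same kinds of gaps as the real tickets.
sample = pd.DataFrame({
    "Timestamp": ["10/10/2023 11:36", np.nan, np.nan],
    "Title": ["Email Client Crashing", np.nan, "Customer Data Deletion Request"],
    "Description": ["Our email client crashes.", "An employee has raised concerns.", np.nan],
    "Class": ["Software issue", "Privacy issue", "Privacy issue"],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows where the text we want to model on (title or description) is missing.
rows_with_gaps = sample[sample[["Title", "Description"]].isna().any(axis=1)]
print(rows_with_gaps)
```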

## Data Processing

In [5]:
data.loc[[72]]

Unnamed: 0,Timestamp,Title,Description,Class
72,,,An employee has raised concerns about their pe...,Privacy issue


In [6]:
# Give the ticket in row 72 a title

In [7]:
data.loc[[72]]

Unnamed: 0,Timestamp,Title,Description,Class
72,,Concern about privacy,An employee has raised concerns about their pe...,Privacy issue
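
One way to fill in a single missing cell like this is label-based assignment with `.loc`. The snippet is a minimal sketch on a made-up two-row frame; the title text is the one shown in the output above, everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame; the index labels imitate the row numbers of the real data.
sample = pd.DataFrame(
    {
        "Title": ["Some other ticket", np.nan],
        "Description": ["Some other text.", "An employee has raised concerns."],
    },
    index=[71, 72],
)

# .loc[row_label, column_label] writes into exactly one cell.
sample.loc[72, "Title"] = "Concern about privacy"
print(sample.loc[[72]])
```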


In [8]:
# What should we do with the ticket whose description is missing?
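
One option, consistent with the later outputs in which row 76 no longer appears, is to drop the ticket that has no description. This is an assumption about the intended answer, sketched on made-up rows:

```python
import numpy as np
import pandas as pd

# Toy rows; row 76 has a title but no description.
sample = pd.DataFrame(
    {
        "Title": ["GDPR Compliance Audit", "Customer Data Deletion Request", "Privacy Policy Update"],
        "Description": ["We require an audit.", np.nan, "Our policy needs an update."],
    },
    index=[75, 76, 77],
)

# Drop only the rows whose Description is missing; other gaps are left alone.
cleaned = sample.dropna(subset=["Description"])
print(cleaned.index.tolist())
```

Using `subset` matters here: a plain `dropna()` would also remove every row with a missing timestamp.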

In [9]:
#join the title with the ticket description

In [10]:
# data = pd.concat(...)  # concat the joined text onto the data
data

Unnamed: 0,Timestamp,Title,Description,Class,0
0,10/10/2023 11:36,Employee Data Access Request,An employee has requested access to their pers...,Privacy issue,Employee Data Access Request. An employee has ...
1,10/10/2023 11:37,Request for New Monitors,We need to purchase 10 new high-resolution mon...,Purchase requisition,Request for New Monitors. We need to purchase ...
2,10/10/2023 12:51,Email Client Crashing,Our email client is frequently crashing on sta...,Software issue,Email Client Crashing. Our email client is fre...
3,10/10/2023 12:51,Slow File-Sharing Server,The company's file-sharing server is experienc...,Software issue,Slow File-Sharing Server. The company's file-s...
4,10/10/2023 12:52,Software License Procurement,We need to purchase a new software license for...,Purchase requisition,Software License Procurement. We need to purch...
...,...,...,...,...,...
75,,GDPR Compliance Audit,We require a comprehensive audit to ensure com...,Privacy issue,GDPR Compliance Audit. We require a comprehens...
77,,Privacy Policy Update,Our privacy policy needs to be updated to refl...,Privacy issue,Privacy Policy Update. Our privacy policy need...
78,,Vendor Data Handling Assessment,Assess how our vendors handle our data and ens...,Privacy issue,Vendor Data Handling Assessment. Assess how ou...
79,,Privacy Impact Assessment,Conduct a Privacy Impact Assessment (PIA) for ...,Privacy issue,Privacy Impact Assessment. Conduct a Privacy I...
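
The new column named `0` in the output above is what pandas produces when an unnamed Series is concatenated column-wise. A minimal sketch of the join and concat steps, assuming the title and description are joined with `". "` as the output suggests (the `sample` frame is made up):

```python
import pandas as pd

sample = pd.DataFrame({
    "Title": ["Email Client Crashing", "Request for New Monitors"],
    "Description": ["Our email client crashes.", "We need 10 new monitors."],
})

# Element-wise string concatenation; the result has no name because
# the two source columns have different names.
joined = sample["Title"] + ". " + sample["Description"]

# axis=1 appends the Series as a new column; unnamed Series get the label 0.
combined = pd.concat([sample, joined], axis=1)
print(combined.columns.tolist())
```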


In [12]:
data.loc[[72]]

Unnamed: 0,Timestamp,Title,Description,Class,0
72,,Concern about privacy,An employee has raised concerns about their pe...,Privacy issue,Concern about privacy. An employee has raised ...


In [13]:
# Drop the columns we no longer need
data

Unnamed: 0,Timestamp,Class,0
0,10/10/2023 11:36,Privacy issue,Employee Data Access Request. An employee has ...
1,10/10/2023 11:37,Purchase requisition,Request for New Monitors. We need to purchase ...
2,10/10/2023 12:51,Software issue,Email Client Crashing. Our email client is fre...
3,10/10/2023 12:51,Software issue,Slow File-Sharing Server. The company's file-s...
4,10/10/2023 12:52,Purchase requisition,Software License Procurement. We need to purch...
...,...,...,...
75,,Privacy issue,GDPR Compliance Audit. We require a comprehens...
77,,Privacy issue,Privacy Policy Update. Our privacy policy need...
78,,Privacy issue,Vendor Data Handling Assessment. Assess how ou...
79,,Privacy issue,Privacy Impact Assessment. Conduct a Privacy I...
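
Judging by the output above, `Title` and `Description` are dropped because their content now lives in the joined column. A hedged sketch of that step on a made-up frame:

```python
import pandas as pd

# Toy frame with the same columns as the data at this point (note the integer label 0).
sample = pd.DataFrame({
    "Timestamp": ["10/10/2023 12:51"],
    "Title": ["Email Client Crashing"],
    "Description": ["Our email client crashes."],
    "Class": ["Software issue"],
    0: ["Email Client Crashing. Our email client crashes."],
})

# drop(columns=...) returns a new frame without the listed columns.
trimmed = sample.drop(columns=["Title", "Description"])
print(trimmed.columns.tolist())
```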


In [15]:
# Rename the joined column (0) to: Description
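
A small sketch of this rename, on a made-up frame. The detail worth noticing is that the column label is the integer `0`, not the string `"0"`, so the mapping key has to be an integer as well.

```python
import pandas as pd

sample = pd.DataFrame({
    "Class": ["Software issue"],
    0: ["Email Client Crashing. Our email client crashes."],
})

# The key must match the label exactly: integer 0, not "0".
renamed = sample.rename(columns={0: "Description"})
print(renamed.columns.tolist())

# With a string key, rename finds nothing to change and silently does nothing.
unchanged = sample.rename(columns={"0": "Description"})
print(unchanged.columns.tolist())
```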

In [16]:
X = data['Description'] 
y = data['Class']

Split your data now, before any further processing, so that no data leakage can happen.

In [17]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)
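
To illustrate why splitting first guards against leakage, the sketch below fits a text vectorizer on the training texts only and then reuses it on the test texts. `CountVectorizer` is an assumption chosen for illustration (this notebook does not say which tools come next), and the eight short texts are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Made-up miniature dataset.
texts = [
    "email client crashing", "server is slow", "buy new monitors",
    "purchase software license", "privacy policy update", "data deletion request",
    "laptop will not boot", "order new keyboards",
]
labels = [
    "Software issue", "Software issue", "Purchase requisition",
    "Purchase requisition", "Privacy issue", "Privacy issue",
    "Software issue", "Purchase requisition",
]

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25, random_state=4)

vectorizer = CountVectorizer()
# The vocabulary is learned from the training texts only...
train_matrix = vectorizer.fit_transform(X_tr)
# ...and the test texts are only transformed, never used for fitting.
test_matrix = vectorizer.transform(X_te)

print(train_matrix.shape, test_matrix.shape)
```

Words that occur only in the test texts are ignored by `transform`, which mirrors what would happen with genuinely unseen tickets.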

In [18]:
train = pd.concat([X_train, y_train], axis=1)
train.to_csv('train_tickets.csv')

In [19]:
test = pd.concat([X_test, y_test], axis=1)
test.to_csv('test_tickets.csv')

In [20]:
train

Unnamed: 0,Description,Class
56,Laptop Won't Boot. Data is missing from the da...,Software issue
24,New software for front-end developers. The fro...,Software issue
43,Office Software Licenses. Request to purchase ...,Purchase requisition
11,Update issues. In our most recent update many ...,Software issue
41,Access Permission Audit. Conduct a review of u...,Privacy issue
22,Policies for new department . Make policies fo...,Privacy issue
64,Request for Monitor Upgrade. The design team n...,Purchase requisition
18,Java update required. A Java update is require...,Software issue
48,Slow database query. Some standard database qu...,Software issue
12,Deprecated hardware. The entire logistical dep...,Purchase requisition


In [21]:
X_train[0]

'Employee Data Access Request. An employee has requested access to their personal data stored in our systems. Please provide them with the requested information in compliance with data '
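
Note that after the split the index is shuffled, so an integer in square brackets refers to the original row label rather than to a position. The sketch below shows the difference on a made-up Series; `.loc` and `.iloc` make the intent explicit.

```python
import pandas as pd

# Toy Series whose index imitates a shuffled training set.
shuffled = pd.Series(
    ["text for ticket 56", "text for ticket 0", "text for ticket 24"],
    index=[56, 0, 24],
)

by_label = shuffled.loc[0]      # the row whose index label is 0
by_position = shuffled.iloc[0]  # the first row, whatever its label

print(by_label)
print(by_position)
```

A label lookup only works when that label ended up in the training split; otherwise pandas raises a `KeyError`.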

Well... 

Now we can start processing the text with NLP techniques!

We need to get some tools to help us.

We will continue in the tickets_classification2 notebook.