<a href="https://colab.research.google.com/github/Jowayria-27/Artifical-Intelligence-Project/blob/main/Microsoft%20Stocks%20Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Artificial Intelligence Project**


## Continuous Time Markov Chain

In this project, we aim to resolve a known business problem. In most customer support systems, it is hard to allocate the proper time, resources, and employees across several different problems. Hence, classifying customer support tickets as 'open', 'pending', or 'resolved' based on the tickets' attributes would lead to better time management, better resource allocation, and higher customer satisfaction.

**First Step**


We read the CSV file and explore the dataset.

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files
import statistics as s

%matplotlib inline
sns.set()
# Prefer a local copy of the file; fall back to the copy hosted on GitHub
try:
    df = pd.read_csv('Microsoft_Stock.csv')
except FileNotFoundError:
    df = pd.read_csv('https://raw.githubusercontent.com/Jowayria-27/Artifical-Intelligence-Project/refs/heads/main/Microsoft_Stock.csv')

df.head(10)

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,4/1/2015 16:00:00,40.6,40.76,40.31,40.72,36865322
1,4/2/2015 16:00:00,40.66,40.74,40.12,40.29,37487476
2,4/6/2015 16:00:00,40.34,41.78,40.18,41.55,39223692
3,4/7/2015 16:00:00,41.61,41.91,41.31,41.53,28809375
4,4/8/2015 16:00:00,41.48,41.69,41.04,41.42,24753438
5,4/9/2015 16:00:00,41.25,41.62,41.25,41.48,25723861
6,4/10/2015 16:00:00,41.63,41.95,41.41,41.72,28022002
7,4/13/2015 16:00:00,41.4,42.06,41.39,41.76,30276692
8,4/14/2015 16:00:00,41.8,42.03,41.39,41.65,24244382
9,4/15/2015 16:00:00,41.76,42.46,41.68,42.26,27343581


Next, we check the data for missing values. Based on the output below, there are 1511 rows and 6 columns, and none of the columns has missing values. Note that `Date` is stored as `object` (string) rather than as a datetime type.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1511 entries, 0 to 1510
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1511 non-null   object 
 1   Open    1511 non-null   float64
 2   High    1511 non-null   float64
 3   Low     1511 non-null   float64
 4   Close   1511 non-null   float64
 5   Volume  1511 non-null   int64  
dtypes: float64(4), int64(1), object(1)
memory usage: 71.0+ KB
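
The same check can be made explicit rather than read off `df.info()`. The sketch below is an illustration only: it rebuilds the first three rows shown earlier as a hypothetical `sample` frame (not the full dataset), counts missing values per column, and parses `Date` into a datetime column. The month/day/year format string is an assumption based on the displayed dates.

```python
import pandas as pd

# Hypothetical three-row sample copied from the first rows of the dataset
sample = pd.DataFrame({
    "Date": ["4/1/2015 16:00:00", "4/2/2015 16:00:00", "4/6/2015 16:00:00"],
    "Open": [40.60, 40.66, 40.34],
    "High": [40.76, 40.74, 41.78],
    "Low": [40.31, 40.12, 40.18],
    "Close": [40.72, 40.29, 41.55],
    "Volume": [36865322, 37487476, 39223692],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Parse the string dates (assumed month/day/year) into datetime64 values
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %H:%M:%S")
print(sample.dtypes)
```

With a real datetime column, the rows can later be sorted or resampled by date if the analysis needs it.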


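Next, we use `describe` to compute summary statistics of the data, such as the count, mean, standard deviation, and quartiles. For non-numeric columns it also reports the number of unique values, the most frequent value, and its frequency.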

In [4]:
df.describe(include="all")

Unnamed: 0,Date,Open,High,Low,Close,Volume
count,1511,1511.0,1511.0,1511.0,1511.0,1511.0
unique,1511,,,,,
top,4/1/2015 16:00:00,,,,,
freq,1,,,,,
mean,,107.385976,108.437472,106.294533,107.422091,30198630.0
std,,56.691333,57.382276,55.977155,56.702299,14252660.0
min,,40.34,40.74,39.72,40.29,101612.0
25%,,57.86,58.06,57.42,57.855,21362130.0
50%,,93.99,95.1,92.92,93.86,26629620.0
75%,,139.44,140.325,137.825,138.965,34319620.0


Based on the data above, we can conclude the following:
1. The data is normally distributed as the mean and median are the same
2. The average age is 44 years old
3. Most requests are pending customer response
4. Most requests are refund requests
5. Most priorities are medium
6. Most customer satisfaction ratings are medium
7. The most used channel is email
8. There is no variable to calculate the time from the first response until the ticket was resolved.
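
Comparing the mean with the median, as in item 1, is only a rough symmetry check and does not by itself establish normality. A minimal sketch of that comparison, assuming a hypothetical `close` series built from the ten `Close` values in the first output above; the sample skewness is added here as an extra indicator and is not part of the original analysis.

```python
import pandas as pd

# Hypothetical sample: the ten Close values from the first displayed rows
close = pd.Series(
    [40.72, 40.29, 41.55, 41.53, 41.42, 41.48, 41.72, 41.76, 41.65, 42.26],
    name="Close",
)

mean_close = close.mean()
median_close = close.median()
skewness = close.skew()  # near 0 for symmetric data, non-zero when one tail is longer

print(f"mean={mean_close:.3f}, median={median_close:.3f}, skew={skewness:.3f}")
```

A large gap between mean and median, or a skewness far from zero, suggests the distribution is not symmetric.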

Based on those results, we need the counts of the priorities, channels, and ticket statuses.

In [None]:
df['Ticket Channel'].value_counts()

Unnamed: 0_level_0,count
Ticket Channel,Unnamed: 1_level_1
Email,2143
Phone,2132
Social media,2121
Chat,2073


Based on the counts, the ticket channels are used almost equally.

In [None]:
df['Ticket Priority'].value_counts()

Unnamed: 0_level_0,count
Ticket Priority,Unnamed: 1_level_1
Medium,2192
Critical,2129
High,2085
Low,2063



In [None]:
df['Ticket Status'].value_counts()

Unnamed: 0_level_0,count
Ticket Status,Unnamed: 1_level_1
Pending Customer Response,2881
Open,2819
Closed,2769


In [None]:
df['Ticket Type'].value_counts()

Unnamed: 0_level_0,count
Ticket Type,Unnamed: 1_level_1
Refund request,1752
Technical issue,1747
Cancellation request,1695
Product inquiry,1641
Billing inquiry,1634


Now, we are going to apply the Markov model based on the following variable:
- Ticket Status

The dependent variables are:
- Ticket Priority
- Ticket Channel
- Ticket Type

In [None]:
df_encode=df[['Ticket ID','Ticket Status','Ticket Priority','Ticket Channel','Ticket Type']]
df_encode.head(5)


Unnamed: 0,Ticket ID,Ticket Status,Ticket Priority,Ticket Channel,Ticket Type
0,1,Pending Customer Response,Critical,Social media,Technical issue
1,2,Pending Customer Response,Critical,Chat,Technical issue
2,3,Closed,Low,Social media,Technical issue
3,4,Closed,Low,Social media,Billing inquiry
4,5,Closed,Low,Email,Billing inquiry


In [13]:
# Label each day by comparing its close with the previous day's close.
# The first row has no previous close (NaN), so both comparisons are False
# and it falls through to 'Stable'.
df['State'] = np.where(df['Close'] > df['Close'].shift(1), 'Up',
                 np.where(df['Close'] < df['Close'].shift(1), 'Down', 'Stable'))

# Count the transitions between consecutive states
states = df['State'].unique()
transition_matrix = pd.DataFrame(0, index=states, columns=states)

for prev_state, curr_state in zip(df['State'].iloc[:-1], df['State'].iloc[1:]):
    transition_matrix.loc[prev_state, curr_state] += 1

# Convert counts to probabilities (each row sums to 1)
transition_matrix = transition_matrix.div(transition_matrix.sum(axis=1), axis=0)

print("Transition Matrix:")
print(transition_matrix)



Transition Matrix:
          Stable      Down        Up
Stable  0.000000  0.437500  0.562500
Down    0.010401  0.396731  0.592868
Up      0.009744  0.485993  0.504263
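
One way such a matrix could be used is for multi-step forecasts: under the Markov assumption, the n-step transition probabilities are the n-th power of the one-step matrix. The sketch below is an illustration and not part of the original notebook; it copies the printed values into a hypothetical `P`, and `n_step` is a helper name introduced here.

```python
import numpy as np
import pandas as pd

states = ["Stable", "Down", "Up"]

# One-step transition matrix, copied from the printed output above
P = pd.DataFrame(
    [[0.000000, 0.437500, 0.562500],
     [0.010401, 0.396731, 0.592868],
     [0.009744, 0.485993, 0.504263]],
    index=states, columns=states,
)

def n_step(matrix: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the n-step transition probabilities as the n-th matrix power."""
    powered = np.linalg.matrix_power(matrix.to_numpy(), n)
    return pd.DataFrame(powered, index=matrix.index, columns=matrix.columns)

# Probability of each state five trading days ahead, given today's state (row)
P5 = n_step(P, 5)
print(P5.round(4))
```

Each row of `P5` still sums to 1 (up to rounding), and row `"Up"` gives the distribution of the state five steps after an up day.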


In [14]:
# Label each day by comparing its open with the previous day's close (the overnight gap).
# The first row has no previous close (NaN), so it falls through to 'Stable'.
df['StateOC'] = np.where(df['Close'].shift(1) < df['Open'], 'Up',
                 np.where(df['Close'].shift(1) > df['Open'], 'Down', 'Stable'))

# Count the transitions between consecutive states
states = df['StateOC'].unique()
transition_matrix = pd.DataFrame(0, index=states, columns=states)

for prev_state, curr_state in zip(df['StateOC'].iloc[:-1], df['StateOC'].iloc[1:]):
    transition_matrix.loc[prev_state, curr_state] += 1

# Convert counts to probabilities (each row sums to 1)
transition_matrix = transition_matrix.div(transition_matrix.sum(axis=1), axis=0)

print("Transition Matrix:")
print(transition_matrix)



Transition Matrix:
          Stable      Down       Up
Stable  0.000000  0.500000  0.50000
Down    0.013514  0.456456  0.53003
Up      0.012136  0.427184  0.56068


In [15]:
df.head(10)

Unnamed: 0,Date,Open,High,Low,Close,Volume,StateOC,State
0,4/1/2015 16:00:00,40.6,40.76,40.31,40.72,36865322,Stable,Stable
1,4/2/2015 16:00:00,40.66,40.74,40.12,40.29,37487476,Down,Down
2,4/6/2015 16:00:00,40.34,41.78,40.18,41.55,39223692,Up,Up
3,4/7/2015 16:00:00,41.61,41.91,41.31,41.53,28809375,Up,Down
4,4/8/2015 16:00:00,41.48,41.69,41.04,41.42,24753438,Down,Down
5,4/9/2015 16:00:00,41.25,41.62,41.25,41.48,25723861,Down,Up
6,4/10/2015 16:00:00,41.63,41.95,41.41,41.72,28022002,Up,Up
7,4/13/2015 16:00:00,41.4,42.06,41.39,41.76,30276692,Down,Up
8,4/14/2015 16:00:00,41.8,42.03,41.39,41.65,24244382,Up,Down
9,4/15/2015 16:00:00,41.76,42.46,41.68,42.26,27343581,Up,Up
