# AI-Based Incident Categorization and Resolution Recommendation System

## Problem Definition
Retail IT support teams receive thousands of incident tickets daily.
Manual categorization and resolution selection increase response time and the number of errors.

## Objective
- Predict incident category using NLP and Machine Learning
- Recommend a possible resolution based on similar past incidents

## Real-World Relevance
Faster ticket routing improves store uptime and reduces business losses in retail operations.


In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

from sklearn.metrics.pairwise import cosine_similarity


In [2]:
df = pd.read_excel("tickets.xlsx")
df.head()


Unnamed: 0,INCIDENT_ID,STATUS,AGE,LOB,OPEN_TIME_IM,OPEN_TIME_SD,TITLE,DESCRIPTION,ASSIGNMENT,UPDATED_BY_UID,...,INTERACTION_ID,SLA_STATUS,VENDOR,REFERENCE_NO,SLA_TARGET,VENDOR_SLA_STATUS,EXPIRATION_TIME,APPROVAL_STATUS,APPROVED_TIME,RESOLUTION_TYPE
0,IM15893129,CLOSED,500,IT INFRA,2024-03-02 11:59:45,2024-03-02 10:50:11,S571 MSL Material Movement,Dear Sir MSL Material Movement process for A...,RETAILIT-L2_MSL_BHO,RRL1E0112,...,SD44343356,Not Breached,.,,NaT,,2025-07-16 09:53:13,Field Resolution,,NO
1,IM15930355,RESOLVED,500,IT INFRA,2024-03-16 11:25:23,2024-03-13 17:51:06,QAE7 MATERIAL MOVEMENT,Moment Date 13 03 2024 Call ID IM15922811 ...,RETAILIT-ST_GJ,50055370,...,SD44774289,Not Breached,,,NaT,,2025-07-29 16:39:27,na,,NO
2,IM16189541,CLOSED,376,IT INFRA,2024-06-28 09:22:18,2024-06-27 19:01:06,Non Working Old asset return to RCBWH,Desktop HP 280G6 i3 8GB 500GB W O MONITER 5...,RETAILIT-RBL_DOC_CREATION,50140797,...,SD48793195,Not Breached,,,NaT,,2025-07-09 13:07:42,,,NO
3,IM16197238,CLOSED,512,IT INFRA,2024-07-01 16:18:12,2024-07-01 16:07:24,11 Retail Physical Inventory Activity 2549,Dear Team Please align FE to do FOM fo...,RETAILIT-ST_KL,37101608,...,SD48931848,Not Breached,,,NaT,,2025-11-26 12:34:36,Remote Resolution,KM1902,YES
4,IM16197309,CLOSED,512,IT INFRA,2024-07-01 16:35:47,2024-07-01 16:09:31,11 Retail Physical Inventory Activity 2551,Dear Team Please align FE to do FOM for s...,RETAILIT-ST_KL,37101608,...,SD48931937,Not Breached,,,NaT,,2025-11-26 12:35:49,Remote Resolution,KM1902,YES


In [3]:
print(df.shape)
print(df.columns)
df.isnull().sum()


(270910, 44)
Index(['INCIDENT_ID', 'STATUS', 'AGE', 'LOB', 'OPEN_TIME_IM', 'OPEN_TIME_SD',
       'TITLE', 'DESCRIPTION', 'ASSIGNMENT', 'UPDATED_BY_UID', 'ASSIGNEE',
       'CONTACT_NAME', 'PHONE_NUMBER', 'LOCATION', 'STORENAME', 'STOREFORMAT',
       'MAIN_FORMAT', 'STORECITY', 'STORESTATE', 'AREACODE', 'FORMAT',
       'CATEGORY', 'SUBCATEGORY', 'ISSUE_TYPE', 'SEVERITY', 'UPDATE_TIME',
       'RESOLVE_TIME', 'CLOSE_TIME', 'RESOLUTION_CODE', 'RESOLUTION',
       'TICKET_REOPEN_COUNT', 'OPENED_BY', 'SOURCE', 'WDMANAGERNAME',
       'INTERACTION_ID', 'SLA_STATUS', 'VENDOR', 'REFERENCE_NO', 'SLA_TARGET',
       'VENDOR_SLA_STATUS', 'EXPIRATION_TIME', 'APPROVAL_STATUS',
       'APPROVED_TIME', 'RESOLUTION_TYPE'],
      dtype='object')


INCIDENT_ID                 0
STATUS                      0
AGE                         0
LOB                         0
OPEN_TIME_IM            84529
OPEN_TIME_SD                0
TITLE                       2
DESCRIPTION                 3
ASSIGNMENT                  0
UPDATED_BY_UID              0
ASSIGNEE                  737
CONTACT_NAME                0
PHONE_NUMBER              132
LOCATION                    0
STORENAME               45434
STOREFORMAT                 0
MAIN_FORMAT                76
STORECITY                   0
STORESTATE                  0
AREACODE                   73
FORMAT                      0
CATEGORY                    0
SUBCATEGORY                 0
ISSUE_TYPE               1409
SEVERITY                    0
UPDATE_TIME                 0
RESOLVE_TIME                0
CLOSE_TIME              23890
RESOLUTION_CODE          1202
RESOLUTION               1202
TICKET_REOPEN_COUNT      1195
OPENED_BY                1195
SOURCE                   1200
WDMANAGERN

In [4]:
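# Fill missing values before casting: astype(str) would otherwise turn NaN into
# the literal string "nan", and a regex replace of "nan" would also strip those
# letters out of ordinary words such as "maintenance" or "finance".
df["TITLE"] = df["TITLE"].fillna("").astype(str)
df["DESCRIPTION"] = df["DESCRIPTION"].fillna("").astype(str)
df["TEXT"] = (df["TITLE"] + " " + df["DESCRIPTION"]).str.strip()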



In [5]:
df = df.dropna(subset=["CATEGORY"])
print(df.shape)



(270910, 45)


In [6]:
X = df["TEXT"]
y = df["CATEGORY"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [7]:
vectorizer = TfidfVectorizer(
    max_features=15000,
    stop_words="english",
    ngram_range=(1,2)
)

X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)


In [8]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)
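Cells In [7] and In [8] keep the vectorizer and the classifier as separate objects, and a later cell (In [18]) fits a second vectorizer on all titles before splitting. A scikit-learn `Pipeline` is one way to keep the two steps together: the vocabulary and IDF weights come only from the data passed to `fit`, and the fitted object accepts raw text at prediction time. The sketch below uses a handful of invented tickets; in the notebook the inputs would be `X_train` and `y_train`, and the name `ticket_clf` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented example tickets; in the notebook these would be X_train / y_train.
texts = [
    "pos billing not working",
    "pos printer paper jam",
    "pos not printing bill",
    "wifi connectivity issue at store",
    "network down store offline",
    "urovo wifi not connecting",
]
labels = ["POS", "POS", "POS", "CONNECTIVITY", "CONNECTIVITY", "CONNECTIVITY"]

ticket_clf = Pipeline([
    # Vocabulary and IDF weights are learned only from the data passed to fit().
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
ticket_clf.fit(texts, labels)

# Raw strings go straight in; the fitted vectorizer is applied automatically.
predictions = ticket_clf.predict(["pos billing not working", "store wifi down"])
print(predictions)
```

With the notebook's data this would be `ticket_clf.fit(X_train, y_train)` followed by `ticket_clf.predict(X_test)`, and a single fitted object could then replace the separate vectorizer and model in the chatbot cells.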


In [9]:
pred = model.predict(X_test_vec)

print("Accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred))


Accuracy: 0.7369052452844118




                                   precision    recall  f1-score   support

                     ACCESS POINT       0.00      0.00      0.00        84
                 ANYTRAC-HARDWARE       0.38      0.22      0.28       232
                   ASSET REVERSAL       0.87      0.71      0.79        77
                  BARCODE SCANNER       0.63      0.62      0.63       223
                BLUETOOTH PRINTER       0.00      0.00      0.00         2
                     CONNECTIVITY       0.67      0.74      0.70      3001
                   DC/CPC SUPPORT       0.86      0.49      0.62        51
             ELO TV / JIO SIGNAGE       0.00      0.00      0.00         1
              ESL HARDWARE ISSUES       0.00      0.00      0.00         5
                  EXTERNAL PORTAL       0.61      0.51      0.55      1004
                 EXTERNAL PORTALS       0.00      0.00      0.00        36
         FLOOR SALES AREA JIO BOX       0.00      0.00      0.00         1
     FLOOR SALES AREA OM

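The report shows two label problems: near-duplicate class names (`EXTERNAL PORTAL` and `EXTERNAL PORTALS`) and classes with very small support (1 to 5 test tickets) that score zero precision and recall. A minimal cleaning step, sketched below, normalises label spelling through an explicit alias map and folds rare classes into `OTHER`. The helper `normalize_labels`, the alias map, and the `min_count` cut-off are hypothetical, and whether two labels really are the same category has to be confirmed against the ITSM category catalogue.

```python
import pandas as pd


def normalize_labels(labels: pd.Series, aliases: dict, min_count: int = 50) -> pd.Series:
    """Clean category labels and fold rare ones into 'OTHER'.

    aliases maps a variant spelling to its canonical label; which labels are
    true duplicates is an assumption to be checked with the ITSM team.
    """
    # Uppercase and collapse repeated whitespace so trivial variants match.
    cleaned = labels.astype(str).str.upper().str.split().str.join(" ")
    cleaned = cleaned.replace(aliases)
    # Classes with too few examples cannot be learned reliably; merge them.
    counts = cleaned.value_counts()
    rare = counts[counts < min_count].index
    return cleaned.where(~cleaned.isin(rare), "OTHER")


cats = pd.Series(["External Portal", "EXTERNAL PORTALS", "CONNECTIVITY",
                  "CONNECTIVITY", "ELO TV / JIO SIGNAGE"])
out = normalize_labels(cats, {"EXTERNAL PORTALS": "EXTERNAL PORTAL"}, min_count=2)
print(out.tolist())
```

Applied to `df["CATEGORY"]` before the train/test split, this trades a few tiny classes for one catch-all class that the model can at least learn.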


In [10]:
# Vectorize full dataset for similarity search
all_vec = vectorizer.transform(df["TEXT"])

def recommend_resolution(issue_text):
    vec = vectorizer.transform([issue_text])
    sim = cosine_similarity(vec, all_vec)
    idx = sim.argmax()
    return df.iloc[idx]["RESOLUTION"]

test_issue = "POS billing not working in store"
print("Test Issue:", test_issue)
print("Suggested Resolution:")
print(recommend_resolution(test_issue))


Test Issue: POS billing not working in store
Suggested Resolution:
KM2843 Issue POS BILLING NO WORKING Error pos slowness issue Resolution   As check C disk is full so cleared temp  prefetch and did disk cleanup and restarted pos and issue got resolved hence closing this ticket with user confirmation   User Confirmation  User Name   Raja A   Phone Number   8667792258 Transfer Call from Voice  NA Feedback        YES
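`recommend_resolution` returns a single nearest neighbour, even when that ticket has no resolution text (the `RESOLUTION` column has 1,202 missing values) or shares no terms with the query. Returning the top `k` matches with their similarity scores gives an agent several candidates to compare. The sketch below is a hypothetical helper, `recommend_top_k`. It relies on `TfidfVectorizer` L2-normalising rows by default, so `linear_kernel` gives the same values as `cosine_similarity`. In the notebook it would be called as `recommend_top_k(text, vectorizer, all_vec, df["RESOLUTION"])`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def recommend_top_k(issue_text, vectorizer, doc_matrix, resolutions, k=3):
    """Return up to k (similarity, resolution) pairs for the most similar past tickets."""
    query = vectorizer.transform([issue_text])
    # TF-IDF rows are L2-normalised by default, so the dot product is the cosine similarity.
    scores = linear_kernel(query, doc_matrix).ravel()
    results = []
    for i in np.argsort(scores)[::-1]:  # most similar first
        if scores[i] <= 0:
            break  # remaining tickets share no terms with the query
        resolution = resolutions.iloc[i]
        if pd.isna(resolution):
            continue  # skip tickets that were closed without resolution text
        results.append((float(scores[i]), str(resolution)))
        if len(results) == k:
            break
    return results


# Toy data for illustration only.
past_texts = pd.Series(["pos billing not working", "printer paper jam", "pos slow disk full"])
past_resolutions = pd.Series(["Restarted POS service", None, "Cleared temp files"])
vec = TfidfVectorizer(stop_words="english").fit(past_texts)
matrix = vec.transform(past_texts)
matches = recommend_top_k("pos billing issue", vec, matrix, past_resolutions, k=2)
print(matches)
```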


In [13]:
test_issue = "Billing not working in POS system"
print("Suggested Resolution:")
print(recommend_resolution(test_issue))


Suggested Resolution:
RC  CDIT BILLING NOT WORKING AT  UROVO DEVICE RE ENROLLED AND CONNECTED TO STORE NETWORK  NOW ITS WORKING FINE UC  VIDHU VINOD  7306432875


## Ethical Considerations and Responsible AI

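- Model quality depends on historical data quality; inconsistent labels (for example, both EXTERNAL PORTAL and EXTERNAL PORTALS appear as categories) are learned as if they were real distinctions.
- Imbalanced training data already biases routing: the classification report shows zero recall for under-represented categories such as ACCESS POINT, so those tickets would be consistently misrouted.
- Recommendations should assist human agents, not fully automate decisions.
- Sensitive personal data must be protected: retrieved resolution notes contain user names and phone numbers and are currently shown verbatim.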
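The resolution texts surfaced by the recommender include user names and 10-digit phone numbers. A minimal masking step could run on `RESOLUTION` before anything is displayed. The pattern below only covers standalone 10-digit numbers, which is an assumption based on the examples in this notebook; names would need a separate approach, such as a named-entity recogniser. The helper name `mask_phone_numbers` is hypothetical.

```python
import re

# Assumption: phone numbers appear as standalone 10-digit runs, as in the
# notebook's examples. Other formats (spaces, +91 prefix) are not covered.
PHONE_PATTERN = re.compile(r"(?<!\d)\d{10}(?!\d)")


def mask_phone_numbers(text: str) -> str:
    """Replace standalone 10-digit numbers with a placeholder."""
    return PHONE_PATTERN.sub("[PHONE]", text)


masked = mask_phone_numbers("RC duplicate ticket IM17256885 UC  Test User  9000000000")
print(masked)
```

On the DataFrame this could be applied as `df["RESOLUTION"].fillna("").astype(str).map(mask_phone_numbers)`; ticket IDs such as `IM17256885` are left alone because they contain only 8 digits.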


## Conclusion and Future Scope

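The project shows that TF-IDF features with logistic regression can predict ticket categories (about 74% accuracy using title and description, about 68% using titles alone) and retrieve resolutions from similar past tickets, a practical use of NLP in IT Service Management.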
Future improvements can include:
- Deep learning embeddings
- LLM-based explanation of resolutions
- Auto-routing and automation integration


In [14]:
df[["TITLE","CATEGORY","SUBCATEGORY","ISSUE_TYPE"]].isnull().sum()


TITLE             0
CATEGORY          0
SUBCATEGORY       0
ISSUE_TYPE     1409
dtype: int64

In [15]:
df["TITLE_CLEAN"] = df["TITLE"].astype(str).str.lower()


In [16]:
df_cls = df.dropna(subset=["CATEGORY","SUBCATEGORY","ISSUE_TYPE"])
print(df_cls.shape)


(269501, 46)


In [18]:
X = df_cls["TITLE_CLEAN"]

vectorizer_title = TfidfVectorizer(
    max_features=8000,
    stop_words="english",
    ngram_range=(1,2)
)

X_vec = vectorizer_title.fit_transform(X)


In [19]:
from sklearn.model_selection import train_test_split

# One call splits the features and all three targets with the same row
# permutation (identical to three separate calls with random_state=42).
# Note: vectorizer_title was fit on all titles in In [18], so test rows
# contributed to the IDF weights; fitting it on training titles only would
# give a stricter evaluation.
(X_train, X_test,
 y_cat_train, y_cat_test,
 y_sub_train, y_sub_test,
 y_issue_train, y_issue_test) = train_test_split(
    X_vec, df_cls["CATEGORY"], df_cls["SUBCATEGORY"], df_cls["ISSUE_TYPE"],
    test_size=0.2, random_state=42
)


In [20]:
from sklearn.linear_model import LogisticRegression

model_cat = LogisticRegression(max_iter=1000)
model_cat.fit(X_train, y_cat_train)

model_sub = LogisticRegression(max_iter=1000)
model_sub.fit(X_train, y_sub_train)

model_issue = LogisticRegression(max_iter=1000)
model_issue.fit(X_train, y_issue_train)


In [21]:
pred_cat = model_cat.predict(X_test)
print("CATEGORY Accuracy:", accuracy_score(y_cat_test, pred_cat))

CATEGORY Accuracy: 0.6795606760542476


In [22]:
pred_sub = model_sub.predict(X_test)
print("SUBCATEGORY Accuracy:", accuracy_score(y_sub_test, pred_sub))


SUBCATEGORY Accuracy: 0.5814734420511679


In [23]:
pred_issue = model_issue.predict(X_test)
print("ISSUE TYPE Accuracy:", accuracy_score(y_issue_test, pred_issue))


ISSUE TYPE Accuracy: 0.43208845847015825


In [26]:
def predict_ticket_fields(title_text):
    t = vectorizer_title.transform([title_text.lower()])
    
    return {
        "Predicted Category": model_cat.predict(t)[0],
        "Predicted Subcategory": model_sub.predict(t)[0],
        "Predicted Issue Type": model_issue.predict(t)[0]
    }

test_title = "PC not booking"
print(predict_ticket_fields(test_title))


{'Predicted Category': 'PC / DESKTOP', 'Predicted Subcategory': 'PC CPU', 'Predicted Issue Type': 'CPU FAULTY / CPU ISSUE'}


In [25]:
demo = df_cls.sample(5)
vec_demo = vectorizer_title.transform(demo["TITLE_CLEAN"])

demo["PRED_CATEGORY"] = model_cat.predict(vec_demo)
demo["PRED_SUBCATEGORY"] = model_sub.predict(vec_demo)
demo["PRED_ISSUE"] = model_issue.predict(vec_demo)

demo[["TITLE","CATEGORY","PRED_CATEGORY","SUBCATEGORY","PRED_SUBCATEGORY","ISSUE_TYPE","PRED_ISSUE"]]


Unnamed: 0,TITLE,CATEGORY,PRED_CATEGORY,SUBCATEGORY,PRED_SUBCATEGORY,ISSUE_TYPE,PRED_ISSUE
150646,SAP not working P 19 installation,PC / DESKTOP,PC / DESKTOP,SAP INSTALLATION,SAP INSTALLATION,P19,P19
263861,SYSTEM NOT WORKING,PC / DESKTOP,PC / DESKTOP,MONITOR / DISPLAY,PC CPU,BLUR DISPLAY / BLANK DISPLAY,POWER CABLE ISSUE / LOOSE
45043,PC Showing recovery error,PC / DESKTOP,PC / DESKTOP,CPU,PC CPU,OPERATING SYSTEM/APPLICATION,OPERATING SYSTEM/APPLICATION
132492,Barcode not scaninig properly,WEIGHING SCALE-BIZEBRA,WEIGHING SCALE-ESSAE,HARDWARE ISSUE,LABEL NOT SCANNING ON POS,HEAD ISSUE,HEAD CLEAN/RE-INSERT MEDIA
174052,SF MIGRATION,STATE IT TASKS,STATE IT TASKS,STATE IT ASSIGNMENTS,STATE IT ASSIGNMENTS,STATE IT ASSIGNMENTS,STATE IT ASSIGNMENTS
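Category, subcategory and issue type come from three independent models, so nothing guarantees that a predicted subcategory belongs to the predicted category. One option is to score only (category, subcategory) pairs that occur together in the training data, using each model's `predict_proba`. The helper below is a hypothetical sketch, and multiplying the two probabilities treats the models as independent, which is a simplification. In the notebook, `p_cat` would be `model_cat.predict_proba(t)[0]` with classes `model_cat.classes_`, and `valid_pairs` could be `set(zip(df_cls["CATEGORY"], df_cls["SUBCATEGORY"]))`. The probabilities and valid pairs in the example are invented.

```python
import numpy as np


def consistent_prediction(p_cat, cat_classes, p_sub, sub_classes, valid_pairs):
    """Return the observed (category, subcategory) pair with the highest joint score.

    p_cat / p_sub: predict_proba rows for one ticket; cat_classes / sub_classes:
    the matching model.classes_. valid_pairs: pairs seen together in training.
    """
    cat_index = {c: i for i, c in enumerate(cat_classes)}
    sub_index = {s: i for i, s in enumerate(sub_classes)}
    best_pair, best_score = None, -1.0
    for cat, sub in valid_pairs:
        if cat not in cat_index or sub not in sub_index:
            continue  # pair contains a label one of the models never learned
        # Product of the two probabilities: assumes the models are independent.
        score = float(p_cat[cat_index[cat]] * p_sub[sub_index[sub]])
        if score > best_score:
            best_pair, best_score = (cat, sub), score
    return best_pair, best_score


# Invented probabilities: separate argmaxes would pick
# ("WEIGHING SCALE-ESSAE", "LABEL NOT SCANNING ON POS"), which is not in valid_pairs here.
cat_classes = np.array(["PC / DESKTOP", "WEIGHING SCALE-ESSAE"])
sub_classes = np.array(["HARDWARE ISSUE", "LABEL NOT SCANNING ON POS", "PC CPU"])
valid_pairs = {("PC / DESKTOP", "PC CPU"), ("WEIGHING SCALE-ESSAE", "HARDWARE ISSUE")}
pair, score = consistent_prediction(
    np.array([0.3, 0.7]), cat_classes, np.array([0.35, 0.45, 0.2]), sub_classes, valid_pairs
)
print(pair, round(score, 3))
```

The same idea extends to issue type by scoring (category, subcategory, issue type) triples.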


In [27]:
df["OPEN_TIME_IM"] = pd.to_datetime(df["OPEN_TIME_IM"], errors="coerce")
df["RESOLVE_TIME"] = pd.to_datetime(df["RESOLVE_TIME"], errors="coerce")

df["RESOLUTION_HOURS"] = (df["RESOLVE_TIME"] - df["OPEN_TIME_IM"]).dt.total_seconds()/3600


In [28]:
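# NOTE: RESOLUTION_HOURS is NaN wherever OPEN_TIME_IM is missing (84,529 rows);
# NaN > 24 evaluates to False, so those tickets are counted as "not breached".
# The 24-hour threshold is this notebook's own proxy; the data also has an SLA_STATUS column.
df["SLA_BREACH"] = (df["RESOLUTION_HOURS"] > 24).astype(int)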
df["SLA_BREACH"].value_counts()


SLA_BREACH
0    230923
1     39987
Name: count, dtype: int64
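Because `NaN > 24` is `False`, every ticket without an open time ends up in class 0, and the later `fillna(0)` never has anything to fill. A minimal alternative, sketched below as a hypothetical `sla_breach_label` helper, keeps the label missing when the duration is unknown so those rows can be excluded from training instead of being mislabelled. The 24-hour default simply mirrors the threshold used above.

```python
import pandas as pd


def sla_breach_label(open_time: pd.Series, resolve_time: pd.Series,
                     threshold_hours: float = 24) -> pd.Series:
    """1 = resolved after threshold_hours, 0 = within it, <NA> = duration unknown."""
    hours = (resolve_time - open_time).dt.total_seconds() / 3600
    label = (hours > threshold_hours).astype("Int64")  # nullable integer dtype
    # NaN > threshold is False, so without this step unknown durations become 0.
    return label.mask(hours.isna())


opened = pd.to_datetime(pd.Series(["2024-07-01 10:00", None, "2024-07-01 10:00"]))
resolved = pd.to_datetime(pd.Series(["2024-07-01 12:00", "2024-07-03 09:00", "2024-07-03 09:00"]))
labels = sla_breach_label(opened, resolved)
print(labels.tolist())
```

Training would then use only rows where `labels.notna()` is true.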

In [29]:
X = vectorizer_title.transform(df["TITLE_CLEAN"])
y = df["SLA_BREACH"].fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

sla_model = LogisticRegression(max_iter=1000)
sla_model.fit(X_train, y_train)

pred = sla_model.predict(X_test)
print("SLA Prediction Accuracy:", accuracy_score(y_test, pred))


SLA Prediction Accuracy: 0.8699568122254623


In [30]:
df["REOPEN_FLAG"] = (df["TICKET_REOPEN_COUNT"] > 0).astype(int)
df["REOPEN_FLAG"].value_counts()


REOPEN_FLAG
0    263778
1      7132
Name: count, dtype: int64

In [31]:
vec_res = TfidfVectorizer(max_features=8000, stop_words="english")
X = vec_res.fit_transform(df["RESOLUTION"].astype(str))
y = df["REOPEN_FLAG"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

reopen_model = LogisticRegression(max_iter=1000)
reopen_model.fit(X_train, y_train)

pred = reopen_model.predict(X_test)
print("Reopen Prediction Accuracy:", accuracy_score(y_test, pred))


Reopen Prediction Accuracy: 0.9736074711158688
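Accuracy alone is misleading on these imbalanced targets. Always predicting 0 would already be right for 263,778 of 270,910 tickets (about 97.4%) on `REOPEN_FLAG`, so the reported 0.9736 is essentially the majority-class rate; for `SLA_BREACH` the same baseline is 230,923 / 270,910 (about 85.2%) against the reported 0.870. The hypothetical helper below puts a model's accuracy next to scikit-learn's `DummyClassifier` baseline and adds balanced accuracy and recall on the positive class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score


def compare_to_baseline(y_train, y_test, y_pred, positive_label=1):
    """Score a model's test predictions next to an always-majority baseline."""
    baseline = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores the features, so a placeholder column is enough.
    baseline.fit(np.zeros((len(y_train), 1)), y_train)
    y_base = baseline.predict(np.zeros((len(y_test), 1)))
    return {
        "model_accuracy": accuracy_score(y_test, y_pred),
        "baseline_accuracy": accuracy_score(y_test, y_base),
        "model_balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "model_recall_positive": recall_score(
            y_test, y_pred, pos_label=positive_label, zero_division=0
        ),
    }


# Toy labels: a model that never predicts the minority class.
report = compare_to_baseline([0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0])
print(report)
```

In the notebook it would be called as `compare_to_baseline(y_train, y_test, pred)` right after training. If recall on the positive class is low, `LogisticRegression(class_weight="balanced")` is one standard option to try.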


In [32]:
store_stats = df.groupby("STORENAME").agg({
    "INCIDENT_ID":"count",
    "SLA_BREACH":"mean",
    "REOPEN_FLAG":"mean"
})

store_stats["RISK_SCORE"] = (
    store_stats["INCIDENT_ID"]*0.5 +
    store_stats["SLA_BREACH"]*100 +
    store_stats["REOPEN_FLAG"]*50
)

store_stats.sort_values("RISK_SCORE", ascending=False).head(10)


STORENAME,INCIDENT_ID,SLA_BREACH,REOPEN_FLAG,RISK_SCORE
HO,4300,0.405116,0.027442,2191.883721
FC F L,1028,0.256809,0.0,539.680934
SUMEET LOGISTICS,974,0.141684,0.011294,501.73306
SULTANPUR 200K,723,0.138313,0.0,375.331259
PUDUR,653,0.029096,0.004594,329.639357
RRL TRENDS HOSAKOTE 2 DC,567,0.098765,0.010582,293.905644
DC TRENDS,551,0.116152,0.005445,287.387477
KANDLAKOYA CS,555,0.014414,0.005405,279.211712
RRL HYD DEVARAYAMJAL TRENDS DC,465,0.088172,0.021505,242.392473
RRL TRENDS CHENNAI DC,445,0.074157,0.0,229.91573
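The weights mix scales: ticket counts run into the thousands while the breach and reopen rates are fractions, so volume dominates. For HO, 4,300 × 0.5 = 2,150 of the 2,191.88 total, and the top-10 list is ordered almost exactly by `INCIDENT_ID`. One alternative, sketched below as a hypothetical `normalized_risk` helper, is to min-max scale each component to [0, 1] before weighting. The equal weights are placeholders that would have to be agreed with operations.

```python
import pandas as pd


def normalized_risk(stats: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted sum of min-max scaled columns, so each component lies in [0, 1]."""
    score = pd.Series(0.0, index=stats.index)
    for column, weight in weights.items():
        low, high = stats[column].min(), stats[column].max()
        span = high - low
        # A constant column carries no ranking information, so it contributes 0.
        scaled = (stats[column] - low) / span if span else stats[column] * 0.0
        score += weight * scaled
    return score


# Three rows taken from the table above (weights are illustrative, not tuned).
stats = pd.DataFrame(
    {"INCIDENT_ID": [4300, 1028, 445],
     "SLA_BREACH": [0.405116, 0.256809, 0.074157],
     "REOPEN_FLAG": [0.027442, 0.0, 0.0]},
    index=["HO", "FC F L", "RRL TRENDS CHENNAI DC"],
)
risk = normalized_risk(stats, {"INCIDENT_ID": 1 / 3, "SLA_BREACH": 1 / 3, "REOPEN_FLAG": 1 / 3})
print(risk.sort_values(ascending=False))
```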


In [33]:
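# NOTE: the groupby drops the 84,529 rows with no OPEN_TIME_IM, days without
# tickets are missing rather than zero, and the date-object index has no
# frequency, so statsmodels cannot attach dates to the forecast (it is indexed 351-357).
daily = df.groupby(df["OPEN_TIME_IM"].dt.date).size()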

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(daily, order=(5,1,0))
model_fit = model.fit()

forecast = model_fit.forecast(7)
print("Next 7 days ticket forecast:")
print(forecast)




Next 7 days ticket forecast:
351    926.332673
352    902.297010
353    951.543301
354    915.295488
355    925.041982
356    918.517185
357    918.668101
Name: predicted_mean, dtype: float64




In [34]:
from sklearn.cluster import KMeans

X = vectorizer.transform(df["TEXT"])

kmeans = KMeans(n_clusters=10, random_state=42)
clusters = kmeans.fit_predict(X)

df["CLUSTER"] = clusters
df.groupby("CLUSTER").size()


CLUSTER
0      8716
1    133181
2      7809
3     14059
4     26622
5     12855
6     10560
7     11626
8     33596
9     11886
dtype: int64

In [35]:
terms = vectorizer.get_feature_names_out()

for i in range(3):
    center = kmeans.cluster_centers_[i]
    # argsort is ascending, so each list ends with the highest-weighted term
    top = center.argsort()[-10:]
    print("Cluster",i,[terms[j] for j in top])


Cluster 0 ['issue urovo', 'issue fe', 'urovo wifi', 'urovo', 'issue wifi', 'issue', 'wifi', 'wifi connectivity', 'connectivity issue', 'connectivity']
Cluster 1 ['unable', 'working', 'urovo', 'ip', 'device', 'error', 'print', 'printer', 'pos', 'issue']
Cluster 2 ['issue', 'error printer', 'illegal issue', 'pos printer', 'illegal pos', 'illegal printer', 'pos', 'printer', 'printer illegal', 'illegal']
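The top terms overlap across clusters (`issue`, `pos` and `printer` recur) and one cluster holds 133,181 tickets. A common alternative for text clustering is latent semantic analysis: reduce the TF-IDF matrix with `TruncatedSVD`, re-normalise the rows, then run k-means, comparing several values of k with the silhouette score instead of fixing k=10. The sketch below is illustrative; `lsa_kmeans_scores`, `n_components`, the k range and the toy texts are assumptions, and on the full 270,910-row matrix `silhouette_score` should be given `sample_size` to keep it tractable.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def lsa_kmeans_scores(texts, n_components, k_values, random_state=42):
    """Project TF-IDF vectors with LSA, then report a silhouette score per k."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # TruncatedSVD works directly on sparse TF-IDF; re-normalising the rows
    # keeps Euclidean k-means close to cosine-based grouping.
    lsa = make_pipeline(
        TruncatedSVD(n_components=n_components, random_state=random_state),
        Normalizer(copy=False),
    )
    reduced = lsa.fit_transform(tfidf)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(reduced)
        # On the full dataset, pass sample_size=... to keep this affordable.
        scores[k] = silhouette_score(reduced, labels)
    return reduced, scores


toy_texts = [
    "wifi connectivity issue", "urovo wifi not connecting",
    "store wifi offline", "wifi connectivity lost",
    "pos printer illegal", "printer paper jam",
    "pos printer error", "printer not printing",
]
reduced, scores = lsa_kmeans_scores(toy_texts, n_components=3, k_values=[2, 3])
print(scores)
```

A higher silhouette score indicates better-separated clusters, which gives a data-driven reason to prefer one k over another.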


In [36]:
def explain_ticket(title):
    pred = predict_ticket_fields(title)
    return f"""
Issue detected: {pred['Predicted Category']}
Likely cause: {pred['Predicted Issue Type']}
Recommended action: Follow standard resolution steps used earlier.
"""

print(explain_ticket("POS printer paper jam at store"))



Issue detected: POS PRINTER
Likely cause: PHYSICAL DAMAGE
Recommended action: Follow standard resolution steps used earlier.



In [40]:
X = vectorizer_title.transform(df["TITLE_CLEAN"])
y = df["SLA_BREACH"].fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

sla_model = LogisticRegression(max_iter=1000)
sla_model.fit(X_train, y_train)

pred = sla_model.predict(X_test)
print("SLA Breach Prediction Accuracy:", accuracy_score(y_test, pred))


SLA Breach Prediction Accuracy: 0.8705105016426119


In [41]:
store_stats = df.groupby("STORENAME").agg({
    "INCIDENT_ID": "count",
    "SLA_BREACH": "mean",
    "TICKET_REOPEN_COUNT": "mean"
}).reset_index()


In [42]:
store_stats["RISK_SCORE"] = (
    store_stats["INCIDENT_ID"] * 0.5 +
    store_stats["SLA_BREACH"] * 100 +
    store_stats["TICKET_REOPEN_COUNT"] * 10
)


In [43]:
store_stats.sort_values("RISK_SCORE", ascending=False).head(10)


Unnamed: 0,STORENAME,INCIDENT_ID,SLA_BREACH,TICKET_REOPEN_COUNT,RISK_SCORE
2662,HO,4300,0.405116,0.037011,2190.88174
2115,FC F L,1028,0.256809,0.0,539.680934
11587,SUMEET LOGISTICS,974,0.141684,0.021561,501.383984
11581,SULTANPUR 200K,723,0.138313,0.0,375.331259
9085,PUDUR,653,0.029096,0.006154,329.471186
10129,RRL TRENDS HOSAKOTE 2 DC,567,0.098765,0.021164,293.588183
1741,DC TRENDS,551,0.116152,0.009091,287.206154
6820,KANDLAKOYA CS,555,0.014414,0.007246,279.013905
10102,RRL HYD DEVARAYAMJAL TRENDS DC,465,0.088172,0.041304,241.730248
10127,RRL TRENDS CHENNAI DC,445,0.074157,0.0,229.91573


In [44]:
X_cluster = vectorizer.transform(df["TEXT"])


In [45]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=8, random_state=42)
df["CLUSTER"] = kmeans.fit_predict(X_cluster)

df["CLUSTER"].value_counts()


CLUSTER
1    163865
4     26213
7     24585
5     14329
3     14190
6     10981
0      8907
2      7840
Name: count, dtype: int64

In [46]:
terms = vectorizer.get_feature_names_out()

for i in range(5):
    center = kmeans.cluster_centers_[i]
    # argsort is ascending, so each list ends with the highest-weighted term
    top = center.argsort()[-8:]
    print("Cluster", i, "Top terms:", [terms[j] for j in top])


Cluster 0 Top terms: ['urovo wifi', 'urovo', 'issue wifi', 'issue', 'wifi', 'wifi connectivity', 'connectivity issue', 'connectivity']
Cluster 1 Top terms: ['error', 'device', 'print', 'fe', 'printer', 'pos', 'issue', 'working']
Cluster 2 Top terms: ['illegal issue', 'pos printer', 'illegal pos', 'illegal printer', 'pos', 'printer', 'printer illegal', 'illegal']
Cluster 3 Top terms: ['issue', 'store', 'offline key', 'billing', 'pos', 'offline', 'yes', 'network']
Cluster 4 Top terms: ['pm', 'pm activity', 'activity fe', 'scale', 'fe', 'scale fusion', 'fusion', 'activity']


In [47]:
daily = df.groupby(df["OPEN_TIME_IM"].dt.date).size()
daily.head()


OPEN_TIME_IM
2024-03-02    1
2024-03-16    1
2024-06-28    1
2024-07-01    3
2024-07-02    5
dtype: int64
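`daily.head()` shows the problem with the series passed to ARIMA: dates jump from 2024-03-02 to 2024-03-16 to 2024-06-28, and days without tickets are missing rather than zero, so ARIMA treats the observations as consecutive. Grouping by `.dt.date` also gives an index with no frequency, which is why statsmodels warns about the dates and the forecast is indexed 351-357 instead of by date. The hypothetical `daily_ticket_counts` helper below builds a continuous daily series with zeros for empty days and an optional start date to trim the sparse early period; the result could be passed to `ARIMA(daily_counts, order=(5, 1, 0))` as before. Counting on `OPEN_TIME_SD`, which has no missing values, is another option, assuming it is an acceptable proxy for when an incident was opened.

```python
import pandas as pd


def daily_ticket_counts(open_times: pd.Series, start=None) -> pd.Series:
    """Tickets per calendar day on a continuous daily index (missing days = 0)."""
    stamps = pd.to_datetime(open_times, errors="coerce").dropna()
    counts = stamps.dt.floor("D").value_counts().sort_index()
    counts = counts.asfreq("D", fill_value=0)  # insert days with no tickets
    if start is not None:
        counts = counts.loc[pd.Timestamp(start):]  # optionally drop a sparse early period
    return counts


sample = pd.Series(["2024-07-01 10:00", "2024-07-01 15:30", "2024-07-03 09:00", None])
daily_counts = daily_ticket_counts(sample)
print(daily_counts)
```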

In [48]:
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(daily, order=(5,1,0))
model_fit = model.fit()

forecast = model_fit.forecast(7)
print("Next 7 Days Ticket Forecast:")
print(forecast)


Next 7 Days Ticket Forecast:
351    926.332673
352    902.297010
353    951.543301
354    915.295488
355    925.041982
356    918.517185
357    918.668101
Name: predicted_mean, dtype: float64




In [49]:
def it_support_chatbot():
    print("🤖 IT Support Bot: Hello! Describe your issue. Type 'exit' to stop.\n")

    while True:
        user_input = input("You: ")

        if user_input.strip().lower() == "exit":
            print("🤖 IT Support Bot: Thank you. Have a good day!")
            break

        vec = vectorizer_title.transform([user_input.lower()])

        cat = model_cat.predict(vec)[0]
        sub = model_sub.predict(vec)[0]
        issue = model_issue.predict(vec)[0]

        resolution = recommend_resolution(user_input)

        print("\n🤖 IT Support Bot:")
        print("Predicted Category:", cat)
        print("Predicted Subcategory:", sub)
        print("Issue Type:", issue)
        print("Suggested Resolution:", resolution)
        print("-"*60)


In [51]:
it_support_chatbot()


🤖 IT Support Bot: Hello! Describe your issue. Type 'exit' to stop.



You:  help



🤖 IT Support Bot:
Predicted Category: POS HARDWARE
Predicted Subcategory: LASERJET PRINTER
Issue Type: PHYSICAL DAMAGE
Suggested Resolution: RC  SYSTEM Laptop AT  this is a duplicate ticket another call logged already IM17256885 UC  Anuj   9354229908
------------------------------------------------------------


You:  slowness



🤖 IT Support Bot:
Predicted Category: PC / DESKTOP
Predicted Subcategory: PC SLOWNESS
Issue Type: DISK CLEANUP
Suggested Resolution: RC  SYSTEM CONFIGURATION  AT  STATE IT ASSIGNMENTS COMPLETED  UC  Muthumaran K   8072552049
------------------------------------------------------------


KeyboardInterrupt: Interrupted by user
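In the session above, the one-word input `help` still produced a category, subcategory, issue type and an unrelated duplicate-ticket resolution, because `predict` always returns some class. A hypothetical `predict_or_ask` helper could use `predict_proba` instead and ask the user for more detail when the input contains no known terms or the top probability falls below a threshold. The 0.5 default is arbitrary and would need tuning on held-out tickets.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def predict_or_ask(model, vectorizer, text, min_confidence=0.5):
    """Return (label, confidence), with label None when the model is unsure."""
    features = vectorizer.transform([text.strip().lower()])
    if features.nnz == 0:
        return None, 0.0  # no term from the training vocabulary appears in the text
    proba = model.predict_proba(features)[0]
    best = int(np.argmax(proba))
    confidence = float(proba[best])
    label = model.classes_[best] if confidence >= min_confidence else None
    return label, confidence


# Toy model for illustration; in the notebook: predict_or_ask(model_cat, vectorizer_title, user_input)
texts = ["pos billing not working", "pos printer paper jam",
         "wifi connectivity issue", "urovo wifi not connecting"]
labels = ["POS", "POS", "CONNECTIVITY", "CONNECTIVITY"]
vec = TfidfVectorizer(stop_words="english").fit(texts)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), labels)

print(predict_or_ask(clf, vec, "pos billing not working"))
print(predict_or_ask(clf, vec, "asdf"))
```

When the label comes back as `None`, the bot can reply with a prompt such as "Please describe the device and the error" instead of a guessed category.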

In [52]:
def predict_sla_risk(title):
    vec = vectorizer_title.transform([title.lower()])
    risk = sla_model.predict(vec)[0]
    return "High" if risk == 1 else "Low"


In [53]:
def get_store_risk(store):
    row = store_stats[store_stats["STORENAME"] == store]
    if len(row) == 0:
        return "Store not found"
    score = row["RISK_SCORE"].values[0]
    return round(score,2)
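`get_store_risk` needs an exact, case-sensitive match, while the `STORENAME` values shown above are upper case (HO, PUDUR, ...), so a lowercase entry such as `tnc7` cannot match even if the store exists. Whether a store called TNC7 is in the data is not known from the output. A hypothetical, more forgiving lookup normalises case and whitespace on both sides:

```python
import pandas as pd


def get_store_risk_normalized(store_stats: pd.DataFrame, store: str):
    """Case- and whitespace-insensitive version of get_store_risk."""
    key = " ".join(store.split()).upper()
    names = store_stats["STORENAME"].astype(str).str.split().str.join(" ").str.upper()
    match = store_stats[names == key]
    if match.empty:
        return "Store not found"
    return round(float(match["RISK_SCORE"].iloc[0]), 2)


# Two rows copied from the store_stats output above.
demo_stats = pd.DataFrame({"STORENAME": ["HO", "FC F L"], "RISK_SCORE": [2190.88174, 539.680934]})
print(get_store_risk_normalized(demo_stats, "  fc f l "))
print(get_store_risk_normalized(demo_stats, "tnc7"))
```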


In [54]:
def it_ops_chatbot():
    print("🤖 IT Ops Engineer Bot Ready.")
    print("Describe issue. Optionally mention store name.\n")

    while True:
        issue = input("Issue: ")

        if issue.strip().lower() == "exit":
            print("Session closed.")
            break

        store = input("Store (optional, press Enter to skip): ")

        vec = vectorizer_title.transform([issue.lower()])

        cat = model_cat.predict(vec)[0]
        sub = model_sub.predict(vec)[0]
        issue_type = model_issue.predict(vec)[0]

        resolution = recommend_resolution(issue)
        sla_risk = predict_sla_risk(issue)

        print("\n🤖 IT Ops Analysis")
        print("Category:", cat)
        print("Subcategory:", sub)
        print("Issue Type:", issue_type)
        print("SLA Breach Risk:", sla_risk)
        print("Suggested Fix:", resolution)

        if store != "":
            print("Store IT Risk Score:", get_store_risk(store))

        print("-"*60)


In [56]:
it_ops_chatbot()


🤖 IT Ops Engineer Bot Ready.
Describe issue. Optionally mention store name.



Issue:  slowness
Store (optional, press Enter to skip):  tnc7



🤖 IT Ops Analysis
Category: PC / DESKTOP
Subcategory: PC SLOWNESS
Issue Type: DISK CLEANUP
SLA Breach Risk: Low
Suggested Fix: RC  SYSTEM CONFIGURATION  AT  STATE IT ASSIGNMENTS COMPLETED  UC  Muthumaran K   8072552049
Store IT Risk Score: Store not found
------------------------------------------------------------


Issue:  pc
Store (optional, press Enter to skip):  tnc7



🤖 IT Ops Analysis
Category: PC / DESKTOP
Subcategory: PC CPU
Issue Type: CPU FAULTY / CPU ISSUE
SLA Breach Risk: Low
Suggested Fix: RC   PC CPU OPERATING SYSTEM APPLICATION ISSUE AT   checked with user complain logged for PC CPU OPERATING SYSTEM APPLICATION ISSUE so Bios setting done issue resolved hence closing this case with user confirmation  UC   Ashutosh Dwivedi  7080867332 
Store IT Risk Score: Store not found
------------------------------------------------------------


Issue:  enrollment
Store (optional, press Enter to skip):  tnc7



🤖 IT Ops Analysis
Category: STATE IT TASKS
Subcategory: CONFIGURATION
Issue Type: STATE IT ASSIGNMENTS
SLA Breach Risk: Low
Suggested Fix: RC  UROVO device configuration AT  UROVO device configuration and enrollment is done  Device working fine  Hence call is closed  UC  Diyanshu  9466456476
Store IT Risk Score: Store not found
------------------------------------------------------------


KeyboardInterrupt: Interrupted by user

In [57]:
%pip install streamlit


Note: you may need to restart the kernel to use updated packages.


In [58]:
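# This code only works as a standalone script: save it as e.g. app.py and run
# `streamlit run app.py`. Inside a Jupyter kernel Streamlit only logs warnings
# and serves no UI. A fresh `streamlit run` process does not share the notebook's
# variables, so vectorizer_title, model_cat, model_sub, model_issue and
# recommend_resolution (with the data it searches) must be rebuilt or loaded
# from disk first; `pickle` is imported here but not yet used for that.
import streamlit as st
import pickle

st.title("🤖 Virtual IT Support Engineer")

issue = st.text_input("Describe your issue:")
store = st.text_input("Store Name (optional):")  # collected but not used yet

if st.button("Analyze Issue"):
    if not issue.strip():
        st.warning("Please describe the issue first.")
    else:
        vec = vectorizer_title.transform([issue.lower()])

        cat = model_cat.predict(vec)[0]
        sub = model_sub.predict(vec)[0]
        issue_type = model_issue.predict(vec)[0]
        resolution = recommend_resolution(issue)

        st.success("Analysis Complete")

        st.write("Category:", cat)
        st.write("Subcategory:", sub)
        st.write("Issue Type:", issue_type)
        st.write("Suggested Resolution:", resolution)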


2026-01-09 15:40:32.045 
  command:

    streamlit run C:\Users\manu.chopra\AppData\Local\anaconda\Lib\site-packages\ipykernel_launcher.py [ARGUMENTS]
2026-01-09 15:40:32.064 Session state does not function when running a script without `streamlit run`
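The warning above comes from running Streamlit code inside the notebook kernel. Under `streamlit run app.py` the fitted objects must be available to a separate process. A minimal sketch, assuming the notebook's objects are simply pickled into one file: the notebook would call `save_artifacts("ticket_artifacts.pkl", vectorizer_title=vectorizer_title, model_cat=model_cat, model_sub=model_sub, model_issue=model_issue, vectorizer=vectorizer, all_vec=all_vec, resolutions=df["RESOLUTION"])`, and `app.py` would call `load_artifacts`. The file name and helper names are hypothetical, and because unpickling can execute arbitrary code, the file must come from a trusted source.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def save_artifacts(path, **artifacts):
    """Pickle fitted objects (vectorizers, models, lookup data) into one file."""
    with open(path, "wb") as handle:
        pickle.dump(artifacts, handle)


def load_artifacts(path):
    """Load the dict written by save_artifacts. Only unpickle trusted files."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip check with a toy vectorizer.
demo_path = os.path.join(tempfile.mkdtemp(), "ticket_artifacts.pkl")
save_artifacts(demo_path, vectorizer_title=TfidfVectorizer().fit(["pos printer jam", "wifi down"]))
restored = load_artifacts(demo_path)
print(restored["vectorizer_title"].transform(["pos printer"]).shape)
```

In `app.py`, wrapping the loader in a function decorated with `st.cache_resource` (available in recent Streamlit releases) avoids reloading the models on every widget interaction.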
