# Data Validation Report for `validation_test_case`


The `validation_test_case` project is a test assignment that verifies the accuracy of an AI-based ticket classification system.
The dataset consists of tickets: customer reviews of, or opinions about, a company (in this case, Booking).
The system automatically classifies tickets into five predefined topics and determines the sentiment (positive, negative, or neutral) for each topic.

The goal of this task is to validate the correctness of the system’s outputs in two key dimensions:

Topic Volume Validation: verify whether the number of tickets assigned to each topic matches the expected (ground-truth) count.

Sentiment Validation: assess the accuracy of sentiment classification (positive, negative, neutral) within each topic.
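The topic-volume check can be sketched as a per-topic count comparison. This minimal sketch makes two assumptions. First, the model output has already been flattened to one (subtopic, sentiment, example_text) record per example, as the notebook does later. Second, reference counts per topic are available as a plain dictionary. The topic "Customer Support", the example texts, and all counts are hypothetical. Only the standard library is used so the snippet runs on its own.

```python
from collections import Counter

# Hypothetical flattened model output: one record per example text.
results_rows = [
    {"subtopic": "Unexpected Charges & Pricing", "sentiment": "Negative", "example_text": "fees appeared at checkout"},
    {"subtopic": "Unexpected Charges & Pricing", "sentiment": "Negative", "example_text": "hidden city tax"},
    {"subtopic": "Customer Support", "sentiment": "Positive", "example_text": "support answered within minutes"},
]

# Hypothetical reference counts per topic (e.g. from manual labelling).
expected_counts = {
    "Unexpected Charges & Pricing": 3,
    "Customer Support": 1,
}

def topic_volume_report(rows, expected):
    """Return {topic: (predicted, expected, predicted - expected)} for every topic in either source."""
    predicted = Counter(row["subtopic"] for row in rows)
    topics = sorted(set(predicted) | set(expected))
    return {t: (predicted[t], expected.get(t, 0), predicted[t] - expected.get(t, 0)) for t in topics}

report = topic_volume_report(results_rows, expected_counts)
for topic, (pred, exp, diff) in report.items():
    print(f"{topic}: predicted={pred}, expected={exp}, diff={diff:+d}")
```

Counting examples per topic treats each example as one ticket. If a ticket can be quoted under several topics, these numbers measure topic mentions rather than distinct tickets.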

In the first part of the report, I load data from two JSON files, check their contents, and drop empty rows. In the second part, I process and compare the loaded data. In the third, I plot the results and draw conclusions about the work done.

In [22]:
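```python
import pandas as pd

# Reference tickets: the file holds a top-level "tickets" list, flattened to one row per ticket.
clean_tickets = pd.json_normalize(pd.read_json('json_file_for_validation/booking_reviews_clean.json')['tickets'])

# Model output: the code assumes each cell of the "gemini-2.5-flash" column is a list of
# blocks, and each block has "subtopic", "sentiment" and a list of "examples".
results_json = pd.read_json('json_file_for_validation/results_booking_reviews_clean.json')
results_raw = results_json["gemini-2.5-flash"]
all_blocks = [item for sublist in results_raw for item in sublist]
# Number of (subtopic, sentiment) blocks returned by the model, not the number of example texts.
total_results_tickets_gemini = sum(len(sublist) for sublist in results_raw)

# One row per example text, normalised (stripped, lower-cased) for matching.
results_rows = []
for block in all_blocks:
    for ex in block["examples"]:
        results_rows.append({
            "subtopic": block["subtopic"],
            "sentiment": block["sentiment"],
            "example_text": ex.strip().lower(),
        })

results_tickets_flat = pd.DataFrame(results_rows)

# Note: DataFrame.size is rows * columns (a cell count), not the number of rows;
# use len(df) for the number of tickets.
print(f'Size clean_tickets is {clean_tickets.size}')
print(clean_tickets.head(), end='\n\n')

print(f'Size results_tickets is {results_tickets_flat.size}')
print(results_tickets_flat.head(), end='\n\n')
```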


```text
Size clean_tickets is 2034
                                    original_message  \
0  Not to gush, but support has genuinely helped ...   
1  Over the weekend, the rewards points actually ...   
2  Not to gush, but support has genuinely helped ...   
3  While traveling abroad, the app froze twice on...   
4  For a family visit, the property was overbooke...   

                                        message_text sentiment__filter  
0  not to gush, but support has genuinely helped ...          Positive  
1  over the weekend, the rewards points actually ...          Positive  
2  not to gush, but support has genuinely helped ...          Positive  
3  while traveling abroad, the app froze twice on...          Negative  
4  for a family visit, the property was overbooke...          Negative  

Size results_tickets is 2013
                       subtopic sentiment  \
0  Unexpected Charges & Pricing  Negative   
1  Unexpected Charges & Pricing  Negative   
2  Unexpected Charges & Pricing  
```
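The flattening in the cell above relies on a nesting inferred from the code rather than from a documented schema. Each cell of the `gemini-2.5-flash` column is a list of blocks, and each block is a dict with `subtopic`, `sentiment` and an `examples` list. The toy data below is hypothetical and only mirrors that shape. It also illustrates that counting blocks, as `total_results_tickets_gemini` does, gives a different number from counting example texts.

```python
# Hypothetical miniature of the results file: a list of cells, each cell a list
# of blocks, each block grouping example texts under one subtopic and sentiment.
results_raw = [
    [
        {"subtopic": "Unexpected Charges & Pricing", "sentiment": "Negative",
         "examples": ["  Fees appeared at checkout. ", "Hidden city tax."]},
    ],
    [
        {"subtopic": "Customer Support", "sentiment": "Positive",
         "examples": ["Support answered within minutes."]},
    ],
]

# Flatten cells -> blocks, the same way all_blocks is built in the notebook.
all_blocks = [block for cell in results_raw for block in cell]

# Flatten blocks -> one row per example, normalising the text for matching.
results_rows = [
    {"subtopic": b["subtopic"], "sentiment": b["sentiment"], "example_text": ex.strip().lower()}
    for b in all_blocks
    for ex in b["examples"]
]

print(len(all_blocks))    # number of (subtopic, sentiment) blocks
print(len(results_rows))  # number of example texts
```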

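The next step deals with duplicates: the data contains a large number of tickets whose texts are exactly identical to one another.

The cell below drops tickets without `message_text`, groups the remaining tickets by their text, and maps each unique text to its most frequent `sentiment__filter` label (a majority vote). The resulting dictionary, `sentiment_map`, holds exactly one reference sentiment per distinct text.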

In [15]:
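```python
from collections import Counter

# Ignore tickets without text; the remaining texts become the lookup keys.
df = clean_tickets.dropna(subset=['message_text'])

# Map each unique text to its most frequent reference label (majority vote),
# so duplicated tickets contribute a single reference sentiment.
sentiment_map = {}

for text, group in df.groupby('message_text'):
    sentiments = group['sentiment__filter'].tolist()
    # On a tie, Counter.most_common returns the label encountered first.
    most_common = Counter(sentiments).most_common(1)[0][0]
    sentiment_map[text] = most_common

print(f"Total number of unique texts: {len(sentiment_map)}")

print("Example 10 keys:")
for k in list(sentiment_map.keys())[:10]:
    print(k, "=>", sentiment_map[k])
```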

```text
Total number of unique texts: 358
Example 10 keys:
ads in notifications, annoying => Negative
after comparing a few options, communication with the host was straightforward, and i appreciated the clarity. => Positive
after comparing a few options, communication with the host was straightforward, so i'd use it again. => Positive
after comparing a few options, customer support answered within minutes and fixed a date mistake, but there's still room for improvement. => Positive
after comparing a few options, customer support answered within minutes and fixed a date mistake, so i'd use it again. => Positive
after comparing a few options, customer support answered within minutes and fixed a date mistake, though i wish it were faster. => Positive
after comparing a few options, fees appeared at checkout that weren't shown before, and i appreciated the clarity. => Negative
after comparing a few options, fees appeared at checkout that weren't shown before, which made the whole process stress-fr
```
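The majority vote only matters when identical texts carry different `sentiment__filter` labels. It may therefore be worth counting how often that happens before relying on `sentiment_map`. Below is a minimal standard-library sketch. Hypothetical (text, label) pairs stand in for the `message_text` and `sentiment__filter` columns. The names `labels_by_text`, `duplicate_rows` and `conflicting` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical (message_text, sentiment__filter) pairs, including duplicates.
tickets = [
    ("ads in notifications, annoying", "Negative"),
    ("ads in notifications, annoying", "Negative"),
    ("support answered within minutes", "Positive"),
    ("support answered within minutes", "Neutral"),
    ("support answered within minutes", "Positive"),
]

# Collect every label seen for each distinct text.
labels_by_text = defaultdict(list)
for text, label in tickets:
    labels_by_text[text].append(label)

duplicate_rows = len(tickets) - len(labels_by_text)  # rows collapsed by deduplication
# Texts whose duplicates disagree; only these are affected by the majority vote.
conflicting = {t: Counter(ls) for t, ls in labels_by_text.items() if len(set(ls)) > 1}

print(f"unique texts: {len(labels_by_text)}, duplicate rows: {duplicate_rows}")
for text, counts in conflicting.items():
    print(text, dict(counts), "->", counts.most_common(1)[0][0])
```

If `conflicting` is empty on the real data, the majority vote reduces to plain deduplication. Texts with tied counts, by contrast, get whichever label appears first in row order.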

Next, using this hash map, I look for mismatches. I iterate over every example text in the model's results and compare its sentiment with the sentiment of the same text in the original data. I report where they disagree, with which sentiment, and on which phrases; this is what I treat as sentiment validation. I also plot the model's errors: the overall error rate and where the errors occurred.

In [None]:
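```python
# Normalise the reference keys the same way example_text was normalised
# (strip + lower) so lookups are not defeated by stray whitespace or case.
norm_sentiment_map = {k.strip().lower(): v for k, v in sentiment_map.items()}

def check_sentiment(row):
    """Compare the model's sentiment for one example with the reference label."""
    raw_sent = norm_sentiment_map.get(row['example_text'])
    if raw_sent is None:
        return "not_found"  # example text does not occur in the clean tickets
    return "match" if raw_sent == row['sentiment'] else "mismatch"

results_tickets_flat['sentiment_check'] = results_tickets_flat.apply(check_sentiment, axis=1)
print(results_tickets_flat['sentiment_check'].value_counts())
```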
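The plan above also calls for an overall error rate and a breakdown of where mismatches occur. This minimal sketch assumes each flattened result row already carries a `sentiment_check` value of `match`, `mismatch` or `not_found`, as produced by `check_sentiment`. The rows are hypothetical. Excluding `not_found` rows from the rate (they have no reference label) is a design choice, not something the notebook specifies.

```python
from collections import Counter

# Hypothetical checked rows: (subtopic, model sentiment, sentiment_check).
checked = [
    ("Unexpected Charges & Pricing", "Negative", "match"),
    ("Unexpected Charges & Pricing", "Negative", "mismatch"),
    ("Unexpected Charges & Pricing", "Positive", "mismatch"),
    ("Customer Support", "Positive", "match"),
    ("Customer Support", "Neutral", "not_found"),
]

status_counts = Counter(status for _, _, status in checked)
compared = status_counts["match"] + status_counts["mismatch"]
# Error rate over rows that could be compared; not_found rows are reported separately.
error_rate = status_counts["mismatch"] / compared if compared else 0.0

# Where the mismatches occur: by subtopic and by the sentiment the model assigned.
by_subtopic = Counter(sub for sub, _, status in checked if status == "mismatch")
by_sentiment = Counter(sent for _, sent, status in checked if status == "mismatch")

print(f"error rate: {error_rate:.1%} ({status_counts['mismatch']}/{compared}), "
      f"not found: {status_counts['not_found']}")
print("mismatches by subtopic:", dict(by_subtopic))
print("mismatches by model sentiment:", dict(by_sentiment))
```

With the notebook's DataFrame, the same breakdowns could be taken from the rows where `sentiment_check == 'mismatch'` with `value_counts()` on `subtopic` and `sentiment`. Those counts can be plotted directly as bar charts.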
