# Pet Clinic Health Record Data Analysis System

This system implements an end-to-end Python pipeline for pet clinic health records: data reading and cleaning, statistical analysis, risk identification, and report generation. It handles dirty data such as invalid ages, invalid visit types, and missing fields, and adds health risk prompts and service recommendations as an extra feature.

## 1. Import Dependent Libraries
First, import the required Python libraries for date processing, data statistics, and file operations:

In [36]:
import re
from datetime import datetime
from collections import defaultdict
import os

## 2. Implementation of Core Functions

### 2.1 Data Reading, Cleaning and Storage (read_and_clean_pet_data)
Read and clean pet health record data.

Parameters:

file_path: Path of the input file (e.g., "pet_health_records.txt")

Returns:

valid_data: List of valid records (one dictionary per record)

error_records: List of error record strings

In [37]:
def read_and_clean_pet_data(file_path):
    valid_data = []  # Store cleaned valid data
    error_records = []  # Store error records as strings: line number | error type | detail
    valid_visit_types = {"Routine Check-up", "Vaccination", "Disease Treatment"}  # Preset valid visit types
    
    # Open the file and read line by line
    with open(file_path, 'r') as file:
        # enumerate supplies 1-based line numbers for the error messages
        for line_num, line in enumerate(file, start=1):
            if line_num == 1:
                continue  # Skip the header line; data starts on the second line

            line = line.strip()  # Remove surrounding whitespace, including the newline
            fields = line.split('|')  # Split fields by "|"
            
            # 1. Check the number of fields (normally 8 fields; mark as error if missing)
            if len(fields) != 8:
                error_records.append(f"Line {line_num} | Mismatched Field Count | Expected 8 fields, actually {len(fields)} fields")
                continue
            
            # Validate pet age: must be a positive integer (rejects negative numbers, non-numeric text, etc.)
            try:
                if int(fields[3]) <= 0:
                    raise ValueError
            except ValueError:
                error_records.append(f"Line {line_num} | Invalid Age | Age must be a positive integer, original value: {fields[3]}")
                continue

            # Validate visit type: Must be in the preset list
            if fields[4] not in valid_visit_types:
                error_records.append(f"Line {line_num} | Invalid Visit Type | Visit type must be among {valid_visit_types}, original value: {fields[4]}")
                continue
            
            # All validations passed, construct a valid record dictionary
            valid_record = {
                "pet_id": fields[0],
                "pet_name": fields[1],
                "pet_type": fields[2],
                "pet_age": fields[3],
                "visit_type": fields[4],
                "check_items": fields[5],
                "visit_fee": fields[6],
                "visit_date": fields[7]
            }
            
            valid_data.append(valid_record)
            
    return valid_data, error_records
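
The function above checks the field count, the age, and the visit type, but it does not inspect the visit date. One possible way to screen incorrect dates is sketched below. It assumes the `visit_date` field uses the `YYYY-MM-DD` format, which the source does not state, and `is_valid_visit_date` is a hypothetical helper rather than part of the original notebook.

```python
from datetime import datetime

def is_valid_visit_date(date_text, date_format="%Y-%m-%d"):
    """Return True if date_text parses as a real calendar date in date_format."""
    try:
        datetime.strptime(date_text.strip(), date_format)
        return True
    except ValueError:
        # Raised for impossible dates and for text that does not match the format
        return False

# Hypothetical inputs, for illustration only
print(is_valid_visit_date("2024-02-29"))  # leap day that exists
print(is_valid_visit_date("2024-02-30"))  # day that does not exist
print(is_valid_visit_date("2024/03/01"))  # separator differs from the assumed format
```

Inside the cleaning loop, such a check could be applied to `fields[7]` after the visit type check, appending an "Invalid Date" entry to `error_records` in the same style as the other checks.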

### 2.2 Data Statistical Calculation (calculate_pet_stats)

Calculate core indicators from the cleaned valid data: total number of visits, total revenue, counts per pet type and per visit type, and the proportion of high-cost records.

Parameters:

valid_data: List of cleaned valid data (from the read_and_clean_pet_data function)

Returns:

stats_dict: Dictionary of statistical results, including total number of visits, total revenue, per-type counts, high-cost records, and their proportion.

In [38]:
def calculate_pet_stats(valid_data):
    # Variable initialization
    pet_type_count = defaultdict(int)
    visit_type_count = defaultdict(int)
    daily_revenue = defaultdict(float)
    high_fee_records = []
    total_revenue = 0

    # Statistics: Total number of visits
    total_visits = len(valid_data)
    
    for record in valid_data:
        # Statistics: Total revenue
        total_revenue += int(record["visit_fee"])

        # Statistics: Count of each pet type (quantity + frequency)
        pet_type_count[record["pet_type"]] += 1
        
        # Statistics: Count of each visit type (quantity + frequency)
        visit_type_count[record["visit_type"]] += 1

        # Statistics: Daily revenue
        daily_revenue[record["visit_date"]] += int(record["visit_fee"])

        # Statistics: High-cost records (fee ≥ 280 yuan)
        if int(record["visit_fee"]) >= 280:
            high_fee_records.append(record)
    
    # Sort visit types by count in descending order
    visit_type_count = sorted(visit_type_count.items(), key=lambda x: x[1], reverse=True)

    # Proportion of high-cost records (0 when there is no valid data, to avoid division by zero)
    high_fee_percent = len(high_fee_records) / total_visits if total_visits else 0

    # Integrate all statistical results
    stats_dict = {
        "total_visits": total_visits,
        "total_revenue": total_revenue,
        "pet_type_count": dict(pet_type_count),
        "visit_type_count": dict(visit_type_count),
        "daily_revenue": dict(daily_revenue),  # Revenue per visit date, accumulated in the loop above
        "high_fee_records": high_fee_records,
        "high_fee_percent": high_fee_percent
    }
    return stats_dict
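
The statistics above store raw counts per pet type and visit type. If shares of the total are wanted, they can be derived from those counts. The sketch below shows one way to do it; `to_proportions` is a hypothetical helper that is not in the original notebook, and the sample counts are made up for illustration.

```python
def to_proportions(counts):
    """Convert a {category: count} mapping into {category: share of the total}."""
    total = sum(counts.values())
    if total == 0:
        return {}  # Nothing to divide by; return an empty result
    return {key: value / total for key, value in counts.items()}

# Hypothetical counts, for illustration only
example_counts = {"Dog": 3, "Cat": 1}
shares = to_proportions(example_counts)
for pet_type, share in shares.items():
    print(f"{pet_type}: {share:.1%}")
```

The same helper would work on `stats_dict["pet_type_count"]` or `stats_dict["visit_type_count"]`, since both are plain dictionaries of counts.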

### 2.3 Innovative Function: Health Risk Identification (identify_health_risks)

Identify pet health risks based on "visit type + check items" and recommend customized services according to age

Parameters:

valid_data: List of cleaned valid data

Returns:

risk_result: Dictionary of risk and recommendation results, including health risk list and service recommendation statistics

In [39]:
def identify_health_risks(valid_data):
    health_risks = [] 
    service_rec = {
        "young_pet_need_package": 0,  # Young pets (≤ 1 year old): Need young pet health package
        "elderly_pet_need_screen": 0, # Elderly pets (cats ≥ 8 years old/dogs ≥ 7 years old): Need chronic disease screening
    }
    
    for record in valid_data:
        # 1. Health risk identification (based on check items and visit type)
        risk_level = "None"  # Risk level: None/Low/Medium/Medium-High
        risk_tip = ""      # Risk prompt and suggestion
        vt = record["visit_type"]
        ci = record["check_items"]
        
        # Rule 1: Disease Treatment + Digestive System Check → Medium risk (gastrointestinal sensitivity)
        if vt == "Disease Treatment" and "Digestive System" in ci:
            risk_level = "Medium"
            risk_tip = "Need to control regular diet, avoid feeding high-fiber/irritating food, and it is recommended to recheck once a month"
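        # Rule 2: Routine Check-up + Skin Check + Skin Medicine Fee → Low risk (skin problems)
        # Note: as written, the "Skin" test is implied by the "Skin Check" test, so the medicine-fee condition is not actually checked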
        elif vt == "Routine Check-up" and "Skin Check" in ci and "Skin" in ci:
            risk_level = "Low"
            risk_tip = "Comb hair regularly (2-3 times a week), keep skin dry, and avoid humid environment"
        # Rule 3: Disease Treatment + Urinary System Check → Medium-high risk (recurrent urinary problems)
        elif vt == "Disease Treatment" and "Urinary System" in ci:
            risk_level = "Medium-High"
            risk_tip = "Ensure adequate daily water intake (automatic water dispensers are recommended), and test urine pH value once every 2 weeks"
        
        # If there is a risk, add to the risk list
        if risk_level != "None":
            health_risks.append({
                "pet_id": record["pet_id"],
                "pet_name": record["pet_name"],
                "pet_type": record["pet_type"],
                "risk_level": risk_level,
                "risk_tip": risk_tip
            })
        
        # 2. Service recommendation statistics (based on age and visit records)
        # Young pets (≤ 1 year old): Recommend young pet health package (including physical examination + deworming)
        if int(record["pet_age"]) <= 1:
            service_rec["young_pet_need_package"] += 1
        # Elderly pets (cats ≥ 8 years old/dogs ≥ 7 years old): Recommend chronic disease screening package (including liver and kidney function tests)
        if (record["pet_type"] == "Cat" and int(record["pet_age"]) >= 8) or \
           (record["pet_type"] == "Dog" and int(record["pet_age"]) >= 7):
            service_rec["elderly_pet_need_screen"] += 1
    
    risk_result = {
        "health_risk_list": health_risks,
        "service_recommendation": service_rec
    }
    
    return risk_result
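
The three risk rules above share one shape: a visit type, a keyword that must appear in the check items, and a resulting risk level. As an alternative sketch, not the notebook's own implementation, the rules could be kept in a table and evaluated in order, which mirrors the first-match behavior of the `if`/`elif` chain. `RISK_RULES` and `classify_risk` are hypothetical names, and the sample inputs are made up.

```python
# Each rule: (visit type, keyword that must appear in the check items, risk level)
RISK_RULES = [
    ("Disease Treatment", "Digestive System", "Medium"),
    ("Routine Check-up", "Skin Check", "Low"),
    ("Disease Treatment", "Urinary System", "Medium-High"),
]

def classify_risk(visit_type, check_items):
    """Return the risk level of the first matching rule, or "None" if no rule matches."""
    for rule_visit_type, keyword, level in RISK_RULES:
        if visit_type == rule_visit_type and keyword in check_items:
            return level
    return "None"

# Hypothetical inputs, for illustration only
print(classify_risk("Disease Treatment", "Urinary System Check"))
print(classify_risk("Vaccination", "Rabies Vaccine"))
```

Keeping the rules as data makes it possible to add or reorder rules without touching the matching logic. The risk tips could be stored as a fourth element of each tuple.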

### 2.4 Report Generation (write_pet_report)

Write the statistical results, the risk analysis, and an error record summary into a single analysis report.

Generates 1 output file:
1. Pet Diagnosis and Treatment Analysis Report (pet_clinic_report.txt), whose last section summarizes the data errors

Parameters:

stats_dict: Dictionary of statistical results (from the calculate_pet_stats function)

error_records: List of error record strings (from the read_and_clean_pet_data function)

risk_result: Risk and recommendation results (from the identify_health_risks function)

output_dir: Output directory (current directory by default)

In [43]:
def write_pet_report(stats_dict, error_records, risk_result, output_dir="."):    
    # -------------------------- Generate Diagnosis and Treatment Analysis Report --------------------------
    report_path = os.path.join(output_dir, "pet_clinic_report.txt")
    report_content = []
    
    # Report title and basic information
    report_content.append("# Pet Clinic Health Record Analysis Report")
    report_content.append("="*50 + "\n")
    
    # I. Core Statistical Overview
    report_content.append("## I. Core Statistical Overview")
    if stats_dict:
        report_content.append(f"- Total Number of Visits: {stats_dict['total_visits']}")
        report_content.append(f"- Total Revenue: {stats_dict['total_revenue']} Yuan")
        
        # Pet Type Distribution
        report_content.append("\n### 1.1 Pet Type Distribution")
        for pet_type in stats_dict["pet_type_count"]:
            count = stats_dict["pet_type_count"][pet_type]
            report_content.append(f"  - {pet_type}: {count} Times")
        
        # Visit Type Analysis
        report_content.append("\n### 1.2 Visit Type Analysis")
        for vt in stats_dict["visit_type_count"]:
            count = stats_dict["visit_type_count"][vt]
            report_content.append(f"  - {vt}: {count} Times")
    else:
        report_content.append("No valid data for statistics")
    
    # II. High-Cost Records (Fee ≥ 280 Yuan)
    report_content.append("\n" + "## II. High-Cost Records (Fee ≥ 280 Yuan)")
    if stats_dict and stats_dict["high_fee_records"]:
        for pet in stats_dict["high_fee_records"]:
            report_content.append(f"Pet ID: {pet['pet_id']} | Name: {pet['pet_name']} | Visit Type: {pet['visit_type']} | Fee: {pet['visit_fee']} Yuan")
    else:
        report_content.append("  No high-cost records (fee ≥ 280 Yuan)")
    
    # III. Health Risk Prompt (Innovative Function Results)
    report_content.append("\n" + "## III. Health Risk Prompt (Innovative Function)")
    risk_list = risk_result["health_risk_list"]
    if risk_list:
        report_content.append(f"A total of {len(risk_list)} risk records identified:")
        for risk in risk_list[:10]:  # Display the first 10 records (to avoid overly long report)
            report_content.append(f"  - {risk['pet_id']} ({risk['pet_name']}/{risk['pet_type']}): Risk Level {risk['risk_level']}, Suggestion: {risk['risk_tip']}")
        if len(risk_list) > 10:
            report_content.append(f"  ... (Another {len(risk_list)-10} risk records not displayed)")
    else:
        report_content.append("  No risk records identified")
    
    # IV. Customized Service Recommendations (Innovative Function Results)
    report_content.append("\n" + "## IV. Customized Service Recommendations (Innovative Function)")
    rec = risk_result["service_recommendation"]
    report_content.append(f"1. Young Pet Package (≤ 1 Year Old): {rec['young_pet_need_package']} pets need recommendation, including basic physical examination + deworming service (discounted price: 180 Yuan/package)")
    report_content.append(f"2. Elderly Pet Chronic Disease Screening (Cats ≥ 8 Years Old/Dogs ≥ 7 Years Old): {rec['elderly_pet_need_screen']} pets need reminder, phone appointment is recommended")
    
    # V. Error Record Summary
    report_content.append("\n" + "## V. Error Record Summary")
    if error_records:
        total_errors = len(error_records)
        report_content.append(f"A total of {total_errors} error records found. Error examples (first 3):")
        
        # Display the first 3 error examples (for easy data correction)
        for err_item in error_records[:3]:
            report_content.append(err_item)
    else:
        report_content.append("  No error records, original data format is standard")
    
    # Write the report content to the file
    try:
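        # Write as UTF-8 so that characters such as "≥" do not depend on the platform default encoding
        with open(report_path, "w", encoding="utf-8") as f: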
            f.write("\n".join(report_content))
        print(f"✅ Diagnosis and Treatment Analysis Report Generated: {report_path}")
    except Exception as e:
        print(f"❌ Failed to Generate Report: {str(e)}")
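
The function above folds the error records into the last section of the report and does not write a separate error log. If a standalone log file is wanted, a minimal sketch could look like the following. `write_error_log` and the file name `pet_clinic_error_log.txt` are assumptions made for illustration and do not come from the original notebook.

```python
import os

def write_error_log(error_records, output_dir=".", file_name="pet_clinic_error_log.txt"):
    """Write one error record per line and return the path of the log file."""
    log_path = os.path.join(output_dir, file_name)
    with open(log_path, "w", encoding="utf-8") as f:
        # First line: total count, followed by every error record in input order
        f.write(f"Total error records: {len(error_records)}\n")
        for err_item in error_records:
            f.write(err_item + "\n")
    return log_path
```

It could be called next to `write_pet_report` with the same `error_records` list and output directory. Unlike the report, which shows only the first 3 examples, this log keeps every error record.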

## 3. Main Program Execution (Full-Process Operation)

Call the above functions to complete the full process of "data reading → cleaning → analysis → report generation", and print key results for verification:

In [45]:
# -------------------------- Main Program Entry --------------------------
# Define file paths (ensure pet_health_records.txt is in the current directory)
input_file_path = "pet_health_records.txt"  # Input data file
output_directory = "."                     # Output directory (current directory)

# Step 1: Read and clean data
print("="*60)
print("1. Starting to read and clean data...")
valid_data, error_records = read_and_clean_pet_data(input_file_path)

# Print data cleaning results (for verification)
total_valid = len(valid_data)
total_error = len(error_records)
print(f"   Data Cleaning Completed: {total_valid} Valid Records | {total_error} Error Records")
print("="*60 + "\n")

# Step 2: Data analysis (execute only if there is valid data)
if valid_data:
    print("2. Starting data analysis...")
    # Calculate statistical indicators
    stats_dict = calculate_pet_stats(valid_data)

    # Identify health risks
    risk_result = identify_health_risks(valid_data)
    print("   Data Analysis Completed")
    print("="*60 + "\n")
    
    # Step 3: Generate report
    print("3. Starting to generate report...")
    write_pet_report(stats_dict, error_records, risk_result, output_directory)
    print("="*60 + "\n")
else:
    print("❌ No valid data, unable to perform analysis and report generation")
    print("="*60)

1. Starting to read and clean data...
   Data Cleaning Completed: 95 Valid Records | 5 Error Records

2. Starting data analysis...
   Data Analysis Completed

3. Starting to generate report...
✅ Diagnosis and Treatment Analysis Report Generated: ./pet_clinic_report.txt

