This project demonstrates skills applied in a technical assessment for a data analytics role.
It focuses on analyzing and tagging a dataset of customer complaints and repair information to derive actionable insights.
Key objectives:
- Clean and preprocess raw data.
- Tag fields such as Root Cause, Symptom Condition, Symptom Component, Fix Condition, and Fix Component.
- Identify critical columns and patterns for stakeholders.
- Generate visualizations to communicate insights effectively.
The dataset contains detailed information on customer complaints and repairs, including:
- Vehicle & Transaction Info: VIN, TRANSACTION_ID, REPAIR_DATE, VEH_TEST_GRP, COUNTRY_SALE_ISO, ORD_SELLING_SRC_CD
- Customer Complaints: CUSTOMER_VERBATIM, CORRECTION_VERBATIM, COMPLAINT_CD_CSI, COMPLAINT_CD
- Parts & Labor: CAUSAL_PART_NM, GLOBAL_LABOR_CODE_DESCRIPTION, TRANSACTION_CATEGORY, REPORTING_COST, TOTALCOST, LBRCOST
- Vehicle Specs: PLATFORM, BODY_STYLE, ENGINE, TRANSMISSION, ENGINE_DESC, TRANSMISSION_DESC, ENGINE_SOURCE_PLANT, TRANSMISSION_SOURCE_PLANT
- Dealership Info: LAST_KNOWN_DLR_NAME, LAST_KNOWN_DLR_CITY, REPAIRING_DEALER_CODE, DEALER_NAME, REPAIR_DLR_CITY, STATE, DEALER_REGION, REPAIR_DLR_POSTAL_CD
- Repair Metadata: REPAIR_AGE, KM, MEDIA_FLAG, VIN_MODL_DESGTR, LINE_SERIES, LAST_KNOWN_DELVRY_TYPE_CD, NON_CAUSAL_PART_QTY, SALES_REGION_CODE
Note: Sensitive information has been anonymized for privacy.
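As a rough sketch of how a file with these columns might be loaded, the snippet below reads a tiny inline CSV with pandas, checks that the expected columns are present, and parses REPAIR_DATE as a date. The sample rows, the `load_repairs` helper, and the chosen subset of columns are illustrative assumptions, not the actual dataset or the notebook's code.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real dataset is not reproduced here.
SAMPLE_CSV = """VIN,TRANSACTION_ID,REPAIR_DATE,CAUSAL_PART_NM,TOTALCOST
VIN0001,1001,2024-01-05,STEERING WHEEL,120.50
VIN0002,1002,2024-01-07,RADIO MODULE,89.99
"""

# Columns this sketch expects; a small subset of the fields listed above.
REQUIRED_COLUMNS = ["VIN", "TRANSACTION_ID", "REPAIR_DATE", "CAUSAL_PART_NM", "TOTALCOST"]


def load_repairs(source):
    """Read repair records and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Convert the date column from text to a proper datetime type.
    df["REPAIR_DATE"] = pd.to_datetime(df["REPAIR_DATE"])
    return df


repairs = load_repairs(io.StringIO(SAMPLE_CSV))
print(repairs.dtypes)
```

To try this on a real export, pass a file path to `load_repairs` instead of the in-memory buffer and adjust `REQUIRED_COLUMNS` to the fields you actually need.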
Data Cleaning:
- Removed duplicates and null values.
- Standardized inconsistent entries for parts, labor codes, and repair notes.
Data Tagging:
- Applied structured logic to tag root causes, symptoms, and fixes.
- Ensured consistency across multiple components and conditions.
Analysis & Visualization:
- Selected top 5 critical columns for stakeholder insights.
- Created visualizations (bar plots, distribution charts) to highlight key patterns.
- Derived actionable insights to support data-driven decision making.
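The cleaning and tagging steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual logic: the sample rows, the `clean` and `tag_symptom_condition` helpers, and the keyword rules are all hypothetical, and nulls are dropped only in the CAUSAL_PART_NM column for brevity.

```python
import pandas as pd

# Hypothetical records; values are made up for illustration only.
raw = pd.DataFrame(
    {
        "TRANSACTION_ID": [1, 1, 2, 3, 4],
        "CAUSAL_PART_NM": [
            " steering wheel",
            " steering wheel",
            "Radio Module ",
            None,
            "STEERING WHEEL",
        ],
        "CUSTOMER_VERBATIM": [
            "Steering wheel leather is peeling",
            "Steering wheel leather is peeling",
            "Radio screen goes blank",
            "Noise from dash",
            "Heated steering wheel not working",
        ],
    }
)


def clean(df):
    """Drop exact duplicate rows and rows without a causal part, then normalize part names."""
    out = df.drop_duplicates().dropna(subset=["CAUSAL_PART_NM"]).copy()
    # Trim whitespace and upper-case so spelling variants collapse to one value.
    out["CAUSAL_PART_NM"] = out["CAUSAL_PART_NM"].str.strip().str.upper()
    return out.reset_index(drop=True)


# Assumed keyword -> tag rules; a real rule set would be much larger.
SYMPTOM_CONDITION_RULES = {
    "peeling": "Peeling",
    "blank": "Blank Display",
    "not working": "Inoperative",
}


def tag_symptom_condition(text):
    """Return the tag of the first matching keyword, or 'Unknown' if none match."""
    lowered = text.lower()
    for keyword, tag in SYMPTOM_CONDITION_RULES.items():
        if keyword in lowered:
            return tag
    return "Unknown"


cleaned = clean(raw)
cleaned["Symptom Condition"] = cleaned["CUSTOMER_VERBATIM"].apply(tag_symptom_condition)
print(cleaned[["CAUSAL_PART_NM", "Symptom Condition"]])
```

Keeping the rules in a plain dictionary makes them easy to review and extend, and the same pattern could be repeated for the other tag fields (Root Cause, Symptom Component, Fix Condition, Fix Component).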
Key insights:
- Recurring complaints help prioritize preventive actions.
- Symptom and fix patterns indicate areas for process improvement.
- Data-driven analysis can guide repair teams to reduce repeat issues.
- Certain parts and modules frequently cause customer-reported problems.
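One simple way to surface patterns like the last point is a frequency count of causal parts drawn as a bar chart. The sketch below uses invented part names and counts purely for illustration; it assumes Matplotlib with the non-interactive Agg backend so it also runs outside a notebook.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical part names; the data here is invented for illustration.
parts = pd.Series(
    [
        "STEERING WHEEL",
        "RADIO MODULE",
        "STEERING WHEEL",
        "SEAT HEATER",
        "STEERING WHEEL",
        "RADIO MODULE",
    ],
    name="CAUSAL_PART_NM",
)

# value_counts sorts by frequency, most common first.
part_counts = parts.value_counts()

# Reverse the order so the most frequent part is drawn at the top of the chart.
labels = part_counts.index.tolist()[::-1]
values = part_counts.tolist()[::-1]

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(labels, values)
ax.set_xlabel("Number of repairs")
ax.set_title("Most frequent causal parts (sample data)")
fig.tight_layout()

# Render to an in-memory buffer; swap in a file path to save the chart to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a Jupyter notebook you would normally drop the `matplotlib.use("Agg")` line and let the figure render inline.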
Tools and libraries:
- Python 3.x
- Pandas, NumPy
- Matplotlib, Seaborn
- Jupyter Notebook
This repository serves as a hands-on example for aspiring Python data analysts to understand how to:
- Load and clean real-world datasets.
- Perform exploratory data analysis (EDA) to extract meaningful insights.
- Identify key features that matter to stakeholders.
- Create effective visualizations to communicate findings.
By studying this repo, learners can gain practical experience and ideas for structuring their own data analysis projects in Python.
Happy learning ❤️