# RoleColorAI – Resume RoleColor NLP Prototype

This notebook scores resume text against four behavioral RoleColor traits using simple keyword matching, then replaces the resume summary with a template written for the dominant trait.


## Problem Understanding

RoleColorAI focuses on identifying how a person contributes to a team rather than just their job title.

The four RoleColors are:

- Builder → Strategy, architecture, innovation
- Enabler → Collaboration, coordination, teamwork
- Thriver → Speed, ownership, performance under pressure
- Supportee → Reliability, stability, documentation

This notebook builds a lightweight NLP prototype that:
1. Reads resume text
2. Scores it across RoleColors by checking which keywords appear
3. Normalizes the scores into each RoleColor's share of all keyword hits
4. Selects a rewritten summary template for the dominant RoleColor


In [1]:
import re
from collections import Counter

In [2]:
ROLECOLOR_KEYWORDS = {
    "Builder": [
        "architecture", "design", "strategy", "scalable", "innovation",
        "vision", "roadmap", "system", "optimization", "automation",
        "pipeline", "rag", "agent", "llm", "engineering"
    ],
    "Enabler": [
        "collaboration", "cross-functional", "stakeholders", "coordination",
        "communication", "teamwork", "alignment", "support", "dashboard",
        "compliance"
    ],
    "Thriver": [
        "fast", "deadline", "ownership", "deliver", "pressure",
        "agile", "quickly", "adapt", "dynamic", "real-time",
        "latency", "performance"
    ],
    "Supportee": [
        "maintenance", "documentation", "testing", "reliability",
        "monitoring", "debugging", "stability", "consistency",
        "governance"
    ]
}


## Why These Keywords?

The keywords were selected based on common language used in engineering, teamwork, and leadership contexts.

- Builder words reflect system thinking, architecture, and innovation.
- Enabler words reflect collaboration and coordination across teams.
- Thriver words reflect execution in fast-paced and performance-driven environments.
- Supportee words reflect stability, monitoring, and reliability practices.

These keywords act as behavioral signals in resume language.


In [3]:
resume_text = """
Srujan Reddy Pundru
AI/ML Engineer
 Professional Summary
AI/Machine Learning Engineer with 3+ years of experience specializing in Generative AI and MLOps. Experience architecting production grade Multi-Agent RAG systems and optimizing LLM inference (vLLM, Quantization) to minimize latency and GPU costs. Expert in building end-to-end Machine Learning pipelines on Kubernetes (EKS), ensuring 99.9% availability via robust CI/CD (ArgoCD, GitHub Actions) and observability. Skilled in standardizing agentic workflows using LangGraph and LangChain for enterprise-grade AI systems and reducing infrastructure overhead for high-availability enterprise solutions.
Technical Skills
Programming Languages: Python, SQL, PySpark, Java, R
ML/DL: PyTorch, TensorFlow, XGBoost, LightGBM, CNN, RNN, LSTM, Transformers, ETL / ELT
LLMs & GenAI: LangChain, LangGraph, LangSmith, vLLM, LoRA, QLoRA, Quantization, Hugging Face, OpenAI API, Agentic Workflows, RAG, Vector Databases (FAISS, Pinecone), Ollama, Distributed Training (Data, Tensor, & Pipeline Parallelism), RLHF
MLOps / LLMOps: Kubeflow, MLflow, SageMaker Pipelines, Docker, Kubernetes (EKS), KServe, TorchServe, ArgoCD, FastAPI, Feast Feature Store, Vertex AI, CI/CD, REST APIs, GitHub Actions, Terraform, Helm, Jenkins
Monitoring & Governance: SHAP, LIME, Evidently AI, Prometheus, Grafana, Great Expectations, Drift Detection, Model SLOs
Cloud & Data Platforms: AWS (SageMaker, S3, Lambda, EKS, Redshift, EMR, EC2), GCP, Azure ML, Apache Spark, Kafka, Flink, Snowflake, BigQuery, Hive, distributed systems, dbt Cloud
Finance & Analytics Tools: QuantLib, Bloomberg Terminal, SAS, SAP HANA, Oracle Financials, Alteryx, Credit Risk Modeling, SWIFT
Databases & Visualization Tools: PostgreSQL, Redis, MongoDB, Elasticsearch, Tableau, Power BI, Streamlit
Compliance: Data Governance, GDPR, SOX, Responsible AI Practices
Work Experience
AI/ML Engineer, Liberty Mutual Insurance, USA	Feb 2025 – Present
•	Designed and deployed production ML inference pipelines on AWS SageMaker and EKS, improving fraud and risk detection accuracy by 15% and reducing false positives by 20%.
•	Built real-time transaction processing pipelines using Kafka and Spark Structured Streaming, supporting 500K+ daily transactions with sub-second latency and automated alerting via SNS/SQS.
•	Implemented end-to-end fraud detection workflows using Docker, Kubeflow, MLflow, and CI/CD, reducing manual deployment effort by 90%.
•	Developed automated model retraining and rollout pipelines with Kubeflow and GitHub Actions, ensuring reproducible and compliant production releases.
•	Constructed a scalable feature store using AWS Glue, EMR (PySpark), and Feast, enabling consistent features across batch and streaming models.
•	Integrated model explainability (SHAP) and drift detection (KS test, Evidently AI) into Prometheus/Grafana for proactive monitoring and regulatory compliance.
•	Engineered secure ETL pipelines using Airflow and AWS Glue to process 1TB+ monthly financial data into Redshift, enforcing encryption and audit logging.
•	Built LLM-powered RAG and agent workflows using LangChain and LangGraph to automate financial research and document analysis, reducing analyst effort by 40%.
•	Collaborated with risk, compliance, and business stakeholders to calibrate fraud thresholds, reducing false alerts while meeting regulatory requirements.
Machine Learning Engineer, Persistent Systems, India	Jun 2020 – Nov 2022
•	Developed fraud detection and risk-scoring models (XGBoost, Isolation Forest, LSTM), improving recall by 15% and reducing false positives for BFSI clients.
•	Engineered low-latency Spark Streaming pipelines for fraud inference with strict data versioning (DVC) and reproducible experiments.
•	Built stacked ensemble models (XGBoost + Logistic Regression) for loan default prediction, achieving 0.92 AUC and 12% F1-score improvement.
•	Designed ARIMA and LSTM forecasting models to predict demand and operational load, reducing forecasting error by 18%.
•	Automated ETL and feature engineering workflows using Python, Pandas, NumPy, cutting data prep time by 40%.
•	Developed executive dashboards in Power BI and Tableau to visualize KPIs, model performance, and risk metrics.
•	Partnered with data engineering and DevOps teams to deploy models reliably in Agile, production environments.

EDUCATION
Master of Science in Computer and Information Systems Security / Information Assurance
Wilmington University, New Castle, DE, USA	Dec 2024
Bachelor of Technology in Computer Science and Engineering
Vignana Bharathi Institute of Technology, India	Jul 2022
CERTIFICATIONS
AWS Academy Cloud Foundations | AWS Academy Machine Learning
PROJECTS
Autonomous Agentic RAG Research Assistant | LlamaIndex, OpenAI, Arize Phoenix
•	Architected an autonomous Agentic RAG system using LlamaIndex with router-based query orchestration across vector and summary indexes, improving retrieval precision and response grounding.
•	Engineered a multi-tool reasoning agent capable of synthesizing information from 3 disparate data sources (Internal Technical Docs, Live Web Search, ArXiv Research Papers), enabling real-time cross-referencing of proprietary data with external academic validation.
•	Implemented full-stack observability with Arize AI Phoenix, establishing real-time tracing for agent decision trails to debug routing logic, validate tool selection, and ensure grounded, hallucination-free responses.

"""


In [4]:
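def preprocess(text):
    """Lowercase the text and keep only letters and whitespace."""
    text = text.lower()
    # Digits and punctuation are deleted, hyphens included, so "real-time"
    # becomes "realtime" and the hyphenated keywords above cannot match it.
    text = re.sub(r'[^a-z\s]', '', text)
    return text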
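
Because the preprocessing step deletes hyphens, keywords such as "real-time" and "cross-functional" can never be found in the cleaned text. The sketch below shows one possible variant that keeps hyphens and replaces other punctuation with spaces so neighbouring words do not fuse. The name `preprocess_keep_hyphens` is hypothetical and not part of the notebook; swapping it in would change the scores printed further down.

```python
import re

def preprocess_keep_hyphens(text):
    """Lowercase text and keep letters, whitespace, and hyphens (hypothetical variant)."""
    text = text.lower()
    # Replace every other character with a space so adjacent words do not fuse.
    text = re.sub(r"[^a-z\s-]", " ", text)
    return text

sample = "Built real-time pipelines (CI/CD) with 99.9% uptime"
cleaned = preprocess_keep_hyphens(sample)

# Behaviour of the notebook's own regex on the same phrase, for comparison.
original_style = re.sub(r"[^a-z\s]", "", "real-time")

print(cleaned)
print(original_style)
```

The trade-off is that stray hyphens and dashes used as separators also survive, so a stricter tokenizer may be preferable for messier input.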


In [5]:
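def score_rolecolors(text):
    """Count, per role, how many distinct keywords appear in the text."""
    scores = Counter()
    for role, words in ROLECOLOR_KEYWORDS.items():
        for word in words:
            # Substring test: "design" also matches "designed", but "fast"
            # also matches "fastapi". Each keyword counts at most once.
            if word in text:
                scores[role] += 1
    return scores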
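
The substring test in `score_rolecolors` is permissive: "rag" is found inside "storage" and "fast" inside "fastapi". As a rough alternative, the sketch below tokenizes the text and counts whole-word occurrences, so repeated use of a keyword raises the score. `DEMO_KEYWORDS` and `score_whole_words` are hypothetical names introduced only for this illustration.

```python
import re
from collections import Counter

# Small hypothetical keyword map, used only for this illustration.
DEMO_KEYWORDS = {
    "Builder": ["rag", "pipeline"],
    "Thriver": ["fast", "latency"],
}

def score_whole_words(text, keywords):
    """Count whole-token occurrences of each keyword per role."""
    # Tokens are runs of letters, optionally joined by hyphens (e.g. "real-time").
    tokens = Counter(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return {
        role: sum(tokens[word] for word in words)
        for role, words in keywords.items()
    }

text = "Used FastAPI and cloud storage to serve a RAG pipeline with low latency."

# Keywords that a plain substring test would report as present.
substring_hits = [w for ws in DEMO_KEYWORDS.values() for w in ws if w in text.lower()]
whole_word_scores = score_whole_words(text, DEMO_KEYWORDS)

print(substring_hits)
print(whole_word_scores)
```

Whole-word matching removes the "fastapi" false positive but also stops "pipeline" from matching "pipelines", so it would likely need stemming or explicit plural and verb forms in the keyword lists.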


In [6]:
def normalize_scores(scores):
    """Convert raw counts to each role's share of all hits, rounded to 2 places."""
    total = sum(scores.values())
    return {role: round(score/total, 2) for role, score in scores.items()}
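
`normalize_scores` only reports roles that received at least one hit, because the `Counter` has no entry for the others. The sketch below is one way to report every role, including zeros, and to handle a text with no hits at all. `normalize_all_roles` and the raw counts are hypothetical; the counts were chosen only to make the arithmetic easy to follow.

```python
def normalize_all_roles(raw_counts, roles):
    """Return each role's share of total hits, including roles with zero hits."""
    total = sum(raw_counts.get(role, 0) for role in roles)
    if total == 0:
        # No keyword matched: report zeros rather than dividing by zero.
        return {role: 0.0 for role in roles}
    return {role: round(raw_counts.get(role, 0) / total, 2) for role in roles}

roles = ["Builder", "Enabler", "Thriver", "Supportee"]

# Hypothetical raw counts: 8 + 4 + 4 + 2 = 18 hits in total.
example_counts = {"Builder": 8, "Enabler": 4, "Thriver": 4, "Supportee": 2}

shares = normalize_all_roles(example_counts, roles)
empty_shares = normalize_all_roles({}, roles)

print(shares)        # 8/18, 4/18, 4/18, 2/18, each rounded to two places
print(empty_shares)
```

Because each share is rounded separately, the reported values need not sum to exactly 1.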


In [7]:
def rewrite_summary(dominant_role):
    """Return the fixed summary template for the given RoleColor."""
    templates = {
        "Builder": "Strategic AI/ML engineer focused on architecting scalable systems, designing robust pipelines, and building innovative solutions that drive long-term technical vision.",
        "Enabler": "Collaborative AI/ML engineer skilled at aligning stakeholders, enabling cross-functional teams, and ensuring smooth execution of complex projects.",
        "Thriver": "Results-driven AI/ML engineer who thrives in fast-paced environments, takes ownership, and delivers high-performance solutions under tight timelines.",
        "Supportee": "Reliable AI/ML engineer dedicated to system stability, monitoring, documentation, and maintaining consistent and dependable performance."
    }
    return templates[dominant_role]


In [8]:
clean_text = preprocess(resume_text)
scores = score_rolecolors(clean_text)
normalized = normalize_scores(scores)

# max() returns the first role with the top score, so ties go to the role scored first.
dominant_role = max(normalized, key=normalized.get)

print("RoleColor Scores:\n")
for role, score in normalized.items():
    print(f"{role}: {score}")

print("\nDominant RoleColor:", dominant_role)
print("\nRewritten Summary:\n")
print(rewrite_summary(dominant_role))


RoleColor Scores:

Builder: 0.44
Enabler: 0.22
Thriver: 0.22
Supportee: 0.11

Dominant RoleColor: Builder

Rewritten Summary:

Strategic AI/ML engineer focused on architecting scalable systems, designing robust pipelines, and building innovative solutions that drive long-term technical vision.
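
Python's `max` returns the first key with the highest value, so when two RoleColors tie for the top score the notebook silently picks whichever was scored first. A minimal sketch for surfacing such ties is shown below; `dominant_roles` and its `tolerance` parameter are hypothetical additions, not part of the notebook.

```python
def dominant_roles(normalized, tolerance=0.0):
    """Return every role whose score is within `tolerance` of the top score."""
    if not normalized:
        # No role was scored at all, so there is no dominant role.
        return []
    top = max(normalized.values())
    return [role for role, score in normalized.items() if top - score <= tolerance]

# Scores as printed above: a single clear winner.
printed_scores = {"Builder": 0.44, "Enabler": 0.22, "Thriver": 0.22, "Supportee": 0.11}
# Hypothetical scores with a tie at the top.
tied_scores = {"Enabler": 0.5, "Thriver": 0.5}

print(dominant_roles(printed_scores))
print(dominant_roles(tied_scores))
```

Returning a list leaves the caller to decide how to handle a tie, for example by blending two templates or asking for more text.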


## Observations

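- The resume scores highest on Builder (0.44), driven by matches such as design, scalable, system, pipeline, RAG, agent, and LLM. Each keyword counts once, so the score reflects how many distinct Builder keywords appear, not how often they are used.
- Enabler and Thriver tie at 0.22. Enabler hits come from stakeholder, support, dashboard, and compliance language; Thriver hits come from latency, performance, and Agile, plus "fast", which matches only inside "FastAPI". The "real-time" keyword does not contribute because preprocessing strips hyphens.
- The keyword scoring gives a quick, interpretable signal of how resume language maps to RoleColors, but substring matching can over-count or under-count, so the scores are indicative rather than precise.
- The rewritten summary is a fixed template chosen by the dominant RoleColor, so it reflects the detected contribution style but not the specifics of the resume.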
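
To check observations like these against what the scorer actually matched, it helps to list the matched keywords per role rather than only the counts. The sketch below uses the same substring test as the notebook; `matched_keywords` and the miniature inputs are hypothetical and exist only to show the shape of the report.

```python
def matched_keywords(text, keywords):
    """Map each role to the keywords found in `text` by substring search."""
    return {
        role: [word for word in words if word in text]
        for role, words in keywords.items()
    }

# Hypothetical miniature inputs, used only to show the shape of the report.
demo_keywords = {
    "Builder": ["architecture", "pipeline", "design"],
    "Supportee": ["monitoring", "documentation"],
}
demo_text = "designed streaming pipelines with proactive monitoring"

report = matched_keywords(demo_text, demo_keywords)
for role, hits in report.items():
    print(f"{role}: {hits}")
```

Running the same function on the cleaned resume text with `ROLECOLOR_KEYWORDS` would show which terms drive each score and make false positives easier to spot.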
