I am a seasoned Data Scientist/AI Engineer with over 5 years of experience delivering data science and machine learning solutions across diverse sectors, including climate, fintech, healthcare, agri-tech, and capital markets. I am skilled in designing and automating scalable ML systems, specializing in model versioning, CI/CD, and cloud-native tools like Docker, Kubernetes, and MLflow. I focus on turning complex models into production-ready solutions that deliver insights and drive business impact. I am currently based in Nairobi, Kenya, and open to opportunities worldwide.
- MPhil in Environmental Science: The Cyprus Institute, Nicosia, Cyprus; through the Cyprus Institute Merit Scholarship.
- MSc in Mathematical Sciences (Data Science): University of the Western Cape/AIMS, Cape Town, South Africa; through the Mastercard Foundation Scholarship.
- BSc in Mathematics: University of Nairobi, Nairobi, Kenya; through the Finlays Undergraduate Scholarship.
- 5+ years of experience delivering Data Science and ML solutions across sectors (Fintech, Climate, Healthcare).
- Skilled in data analytics, predictive analytics, credit risk analysis, anomaly and fraud detection, credit scoring, time series forecasting, and scalable ML model deployment.
- Proficient in Python, R, SQL, Scikit-learn, CNN, OpenCV, TensorFlow, RF, LSTM, CatBoost, XGBoost, LightGBM, PyOD models, and AWS.
- Experienced with MLOps workflows: Streamlit, Docker, Kubernetes, MLflow, FastAPI, and CI/CD pipelines.
- LLMs & GenAI Applications: Experienced in deploying transformer-based LLMs (GPT, Mistral, LLaMA) via Hugging Face, OpenAI, and OpenRouter APIs. Built RAG pipelines using LangChain, Chroma, and FAISS. Proficient in prompt engineering, embeddings, and semantic search.
- Committed to leveraging advanced analytics for risk minimization, business growth, and social impact.
LinkedIn: Vincent-langat-19307a94/
🌐 Website: vinylango25.github.io
GitHub: github.com/Vinylango25
📫 Email: langatvincent.ds@gmail.com
| Project | Description | Tools Used |
|---|---|---|
| 🌍 Air Quality Monitoring in Nicosia, Cyprus | To improve urban air quality monitoring, this project calibrates low-cost electrochemical sensors using ML algorithms like XGBoost, Random Forest, and ANN. Raw sensor data, collected over six months, is aligned with reference-grade measurements. The study analyzes calibration frequency, data sampling strategies, and environmental factors like humidity and cross-gas interference. Results show that with proper calibration, low-cost sensors (LCSs) can meet EU and EPA accuracy standards. This opens doors for cost-effective, citywide monitoring networks. 👉 Read the full project on GitHub or Medium | Python, Scikit-learn, LR, SVR, ANN, FLAML, XGBoost, Random Forest, Jupyter Notebook |
| 🤖 Lending Automation - ML for Credit Scoring | This project builds an end-to-end loan approval system using machine learning algorithms like Random Forest, XGBoost, and LightGBM. It replaces manual decision-making with faster, scalable, and data-driven processes for improved credit scoring. Key tasks include data cleaning, feature engineering, and model optimization using real-world loan data. Evaluation metrics ensure accuracy and fairness, reducing false approvals and rejections. The system supports personalized lending and dynamic pricing for better customer experience. 👉 Read the full project on GitHub or Medium | Python, Scikit-learn, RF, XGBoost, LightGBM, CatBoost |
| 🛡️ Anomaly and Fraud Detection in Finance | This project applies PyOD and Microsoft AutoML (FLAML) to detect anomalies in credit card transactions using a highly imbalanced dataset. A variety of algorithms, including Isolation Forest and Autoencoders, were tested for their ability to flag suspicious activity. To address imbalance, techniques like undersampling, oversampling, and SMOTE were applied. Evaluation focused on metrics like Precision, Recall, and ROC-AUC for a robust assessment. The outcome is a high-precision fraud detection pipeline that enhances financial risk management. 👉 Read the full project on GitHub or Medium | Python, Scikit-learn, XGBoost, RF, Optuna, LightGBM, FLAML, PyOD, LIME, SHAP |
| 🏥 Enhancing Healthcare Accessibility in Nairobi | This project presents a comprehensive, data-driven narrative that delves into the current state of Nairobi’s healthcare infrastructure. By leveraging multiple datasets, including population demographics and health facility distributions, it evaluates how well the city is positioned to meet the healthcare needs of its diverse population. The analysis is anchored within the framework of the United Nations Sustainable Development Goal 3, which aims to ensure healthy lives and promote well-being for all at all ages. Through detailed examination of healthcare accessibility, service availability, and facility operational hours, this study identifies critical gaps and opportunities for improvement. Ultimately, the findings serve as a foundation for informed policy-making and strategic interventions designed to advance Nairobi’s journey toward achieving SDG 3. 👉 Read the full project on GitHub or Medium | Python, Matplotlib, Seaborn, Plotly, GeoPandas, OSRM, Folium, Pandas, QGIS, Spatial Analysis |
| 🌐 My Personal Portfolio Website | In this project, I created my personal website that houses a portfolio of my work and highlights key projects in data science and AI. The website demonstrates my proficiency in front-end development using HTML, CSS, and JavaScript, with a focus on building responsive and user-friendly layouts. Through this project, I developed skills in web design, layout structuring, and interactive user interface development, all aimed at presenting information clearly and effectively. 👉 Visit the Website or View on GitHub | HTML5, CSS3, JavaScript, Flexbox, DOM Manipulation, UI/UX Design, Web Deployment |
| 🔮 Vincent Chatbot – Personalized AI Assistant Powered by LLMs | This project builds an AI chatbot powered by large language models to answer questions about Vincent’s professional profile. Using vector embeddings and FastAPI, it enables contextual responses based on uploaded documents. Deployed on Render.com, it demonstrates skills in NLP, API development, and cloud deployment for personalized AI applications. 👉 Read the full project on GitHub | LLMs, GPT, Mistral, LLaMA, Hugging Face, OpenAI and OpenRouter APIs, RAG pipelines, LangChain, Chroma, FAISS |
| 🩺 COVID-19 Detection Using CT Scans | This project applies deep learning to detect COVID-19 from chest CT scans using convolutional neural networks. Models like ResNet50, DenseNet169, and MobileNetV2 are trained and fine-tuned for accurate image classification, achieving high detection accuracy through transfer learning and ensemble methods. The pipeline includes image preprocessing, augmentation, and evaluation with real-world datasets. The system supports rapid and reliable diagnosis, aiding medical decision-making. 👉 Read the full project on GitHub or Medium | Python, TensorFlow, Keras, CNN, ResNet50, DenseNet169, MobileNetV2 |
| 📉 Customer Churn Analysis and Prediction | This project predicts customer churn using survival analysis and machine learning to identify clients likely to leave. It focuses on telecom-style use cases where retaining customers is more cost-effective than acquiring new ones. By analyzing historical behavior and risk factors, it enables targeted retention campaigns. An interactive tool is also developed to assess individual churn risk and lifetime value. This helps businesses make data-driven decisions to reduce attrition and improve customer loyalty. 👉 Read the full project on GitHub | Python, Flask, Scikit-learn, SHAP, Cox Proportional Hazards Model, Survival Analysis |
| 📊 Sales Time Series Analysis | This project analyzes historical sales data using time series forecasting techniques to uncover trends, seasonality, and patterns that drive business performance. By leveraging models like ARIMA, SARIMA, and Prophet, it enables accurate sales forecasting to support inventory planning, resource allocation, and strategic decision-making. The pipeline includes data preprocessing, stationarity testing, model tuning, and performance evaluation. The insights generated help optimize operations and reduce forecasting errors. 👉 Read the full project on GitHub | Python, Pandas, Statsmodels, FBProphet, ARIMA, SARIMA |
| 💻 Machine Learning Loan Application Web App | This project delivers a full-stack web application for automated loan approval using machine learning models. Built with Streamlit and powered by Random Forest and Logistic Regression, the system predicts loan eligibility based on user input. The backend includes data preprocessing, model training, and evaluation, while the frontend ensures a seamless user experience. It streamlines loan applications by replacing manual reviews with instant, data-driven decisions. 👉 Read the full project on GitHub | Python, Scikit-learn, Streamlit, Random Forest, Logistic Regression, Flask |
| 🚀 Active Learning API (Django + CatBoost) | This project implements an active learning backend system to optimize data labeling for machine learning tasks. It intelligently selects the most informative samples for annotation, reducing labeling effort while improving model performance. Built with Django and integrated with machine learning models (e.g., CatBoost), it supports iterative learning cycles, model versioning, and anomaly detection. Key features include dataset management, active query strategies, and seamless MLflow tracking. 👉 Read the full project on GitHub | Python, Django, CatBoost, MLflow, Active Learning, REST API, SQLite |
| 🧠 Brain Tumor Detection Using Deep Learning | This project leverages deep learning to detect brain tumors from MRI scans using advanced convolutional neural networks. It employs architectures like MobileNetV2, DenseNet169, and ResNet50, enhanced through transfer learning and ensemble techniques to achieve up to 99.8% accuracy. The workflow includes image preprocessing, model training, and performance evaluation, enabling fast, accurate, and scalable tumor classification to support early diagnosis and clinical workflows. 👉 Read the full project on GitHub | Python, TensorFlow, Keras, CNN, MobileNetV2, ResNet50, DenseNet169, FLAML |

I am open to working on AI, ML, data science, and fintech projects.
💬 Reach out to me for exciting collaborations!

