I am a Data Scientist and Machine Learning Engineer with an M.S. in Computer Science, specializing in Data Science. I design and implement robust data solutions and end-to-end machine learning pipelines, covering data ingestion, exploratory data analysis (EDA), data preprocessing, model training, and deployment. I have hands-on experience streamlining data onboarding with automated ETL pipelines, multi-source data integration, and cloud-based solutions built on AWS services such as SageMaker, Glue, Lambda, and Redshift.
My work also includes developing real-time data pipelines for applications such as fraud detection and contract data processing, leveraging Python, Apache Spark, and cloud solutions to optimize performance and cut costs. I am skilled at creating centralized data lakes, constructing scalable ETL pipelines, and integrating CI/CD practices with Docker to ensure reliable, efficient data flow.
In addition, I am passionate about solving complex problems with advanced machine learning techniques, including transfer learning with pretrained language models (e.g., GPT-3, BERT), building interactive data visualizations with tools like Tableau and Plotly, and optimizing SQL queries for faster data retrieval. Whether the goal is improving operational efficiency, reducing manual intervention, or cutting processing time, I am committed to delivering data-driven solutions that create business value.
- Goal: Predict income levels using the Adult Income Census dataset.
- Highlights: End-to-end ML pipeline covering data ingestion, preprocessing, model training (Random Forest, Decision Tree, Logistic Regression), and deployment as a Flask app.
- Technologies: Anaconda, Python, scikit-learn, Jupyter, pandas, NumPy, Flask.
- Results: Achieved 85% accuracy using hyperparameter-tuned models. Served the model through Flask with real-time predictions returned in under 200 ms.
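
A minimal sketch of how a pipeline like this could be wired together; it is not the project's actual code. It assumes scikit-learn, pandas, NumPy, and Flask are installed, and it trains on a small synthetic table rather than the real Adult Income data so that it runs offline. The column subset, the `make_toy_data` helper, the labeling rule, the parameter grid, and the `/predict` route are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature subset; the real dataset has more columns.
NUMERIC = ["age", "hours_per_week"]
CATEGORICAL = ["education", "occupation"]


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for the Adult Income data (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 70, n),
        "hours_per_week": rng.integers(10, 60, n),
        "education": rng.choice(["HS-grad", "Bachelors", "Masters"], n),
        "occupation": rng.choice(["Sales", "Tech-support", "Exec-managerial"], n),
    })
    # Arbitrary labeling rule so the toy problem is learnable.
    y = ((df["age"] > 40) & (df["hours_per_week"] > 35)).astype(int)
    return df, y


def build_model():
    """Preprocessing + classifier in one pipeline, tuned by grid search."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    pipeline = Pipeline([
        ("preprocess", preprocess),
        ("clf", RandomForestClassifier(random_state=0)),
    ])
    # Small illustrative grid; a real search would likely be wider.
    grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [3, None]}
    return GridSearchCV(pipeline, grid, cv=3, scoring="accuracy")


X, y = make_toy_data()
model = build_model().fit(X, y)  # refits the best pipeline on all rows

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON object with one value per feature column.
    row = pd.DataFrame([request.get_json()])[NUMERIC + CATEGORICAL]
    return jsonify({"high_income": int(model.predict(row)[0])})
```

Keeping preprocessing inside the `Pipeline` means the Flask handler applies the same scaling and encoding at prediction time as during training. The endpoint can be exercised without starting a server via `app.test_client().post("/predict", json={...})`.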