Welcome to Data Science with Python — a curated portfolio of practical data science and machine learning projects developed by Tega Jarikre.
Each project demonstrates real-world applications of Python in solving problems across insurance, logistics, content moderation, and agricultural analytics.
This repository represents a continuous learning journey in data-driven problem-solving, from raw data wrangling to model deployment.
- Build and evaluate machine learning models across diverse domains
- Apply data preprocessing, feature engineering, and model optimization techniques
- Experiment with classification, regression, and clustering algorithms
- Explore bias detection, economic efficiency, and yield analysis using real datasets
- Strengthen portfolio readiness for data science roles
Data-Science-with-Python/
│
├── insurance_premium_prediction/            # Regression models for premium forecasting
├── fake_content_detection/                  # NLP + metadata-based bias and fake news detection
├── delivery_delay_classification/           # Predicting early, on-time, or late deliveries
├── earthquake damage classification/        # Predicting low, medium, or high grade earthquake damage
├── china real estate demand prediction/     # Regression models for real estate demand forecasting
├── borehole functionality classification/   # Predicting functional, non-functional, or functional-needs-repair boreholes
├── notebooks/                               # Shared EDA, feature engineering, and model experiments
├── scripts/                                 # Reusable Python utilities
├── data/                                    # Sample datasets (clean or synthetic)
├── results/                                 # Visualizations and performance reports
├── requirements.txt                         # Dependencies for reproducibility
└── README.md                                # You’re here!
Category | Tools, Libraries & Techniques |
---|---|
Core Language | Python 3.10+ |
Data Manipulation | Pandas, NumPy |
Visualization | Matplotlib, Seaborn, Plotly |
Modeling & ML | Scikit-learn, XGBoost, Random Forest, Logistic Regression |
NLP & Text Analytics | NLTK, spaCy, TF-IDF, Word2Vec |
Evaluation & Metrics | Precision, Recall, F1-score, RMSE, ROC-AUC |
Version Control | Git & GitHub |
Notebooks & IDEs | Jupyter Notebook, VS Code |
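
For readers new to the metrics listed in the table, the sketch below shows how precision, recall, F1-score, and RMSE are defined. It is a dependency-free illustration with made-up labels, not code taken from the projects; in practice the notebooks would more likely rely on Scikit-learn's metric functions.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for regression targets."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and predictions, for illustration only.
precision, recall, f1 = precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
error = rmse([3.0, 5.0], [1.0, 5.0])
```

Precision and recall pull in different directions, which is why F1 (their harmonic mean) is often reported alongside them for classification, while RMSE applies to the regression projects.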
- Goal: Predict customer insurance premiums using demographic and risk variables.
- Approach: Regression models (Linear Regression, XGBoost, Random Forest).
- Focus: Feature selection, multicollinearity detection, and interpretability.
- Goal: Classify online content as fake, biased, or neutral.
- Approach: Natural Language Processing (NLP) with metadata features.
- Focus: Text cleaning, vectorization (TF-IDF), and ensemble learning.
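
To make the TF-IDF step concrete, here is a minimal sketch of the textbook weighting (term frequency times log inverse document frequency) in plain Python. The example headlines are invented, tokenization is a bare whitespace split, and the formula deliberately omits the smoothing and L2 normalization that Scikit-learn's `TfidfVectorizer` applies by default, so the numbers will not match that library.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical headlines, for illustration only.
docs = [
    "breaking news shocking claim",
    "official news report",
    "shocking claim goes viral",
]
vectors = tfidf(docs)
```

Terms that appear in fewer documents receive higher weights, so in the second headline "official" outweighs "news", and a term present in every document would score zero.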
- Goal: Predict delivery status — early, on time, or late.
- Approach: Multi-class classification with Logistic Regression, XGBoost, and Random Forest.
- Focus: Handling class imbalance, feature importance, and business impact analysis.
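
One common way to handle class imbalance is to weight each class inversely to its frequency. The sketch below computes such weights with the formula `n_samples / (n_classes * class_count)`, which is the same heuristic Scikit-learn uses for `class_weight="balanced"`. The label counts are hypothetical, and the project itself may use a different strategy such as resampling.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical delivery outcomes, for illustration only.
labels = ["on_time"] * 6 + ["late"] * 3 + ["early"] * 1
weights = balanced_class_weights(labels)
```

The rarest class ("early" in this made-up sample) gets the largest weight, so misclassifying it costs more during training.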
- Goal: Analyze farm typology, post-harvest loss, and yield determinants.
- Approach: Clustering, feature selection, and supervised learning for productivity prediction.
- Focus: Data science-driven agricultural analytics.
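
As an illustration of the clustering step, the sketch below runs a small k-means (Lloyd's algorithm) in plain Python to group farms into types. The two features (farm area and yield), the data points, and the choice of two clusters with fixed starting centroids are all assumptions made for this example; the project may use a different algorithm or Scikit-learn's implementation.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` supplies the starting positions."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep empty clusters put
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical farms as (area in hectares, yield in tonnes per hectare).
farms = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
         (8.0, 6.0), (9.0, 6.5), (8.5, 5.8)]
labels, centroids = kmeans(farms, centroids=[farms[0], farms[3]])
```

With real data the features would usually be scaled first, since k-means relies on Euclidean distance and a feature with a larger numeric range would otherwise dominate the grouping.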
- Data wrangling and cleaning workflows
- Feature engineering and transformation
- Model training, validation, and hyperparameter tuning
- Model interpretability (SHAP, feature importance)
- End-to-end data science pipeline documentation
- Clone the repository
git clone https://github.com/Tegazini/Data-Science-with-Python.git
- Navigate into the folder
cd Data-Science-with-Python
- Install dependencies
pip install -r requirements.txt
- Run notebooks
jupyter notebook
Explore each project folder for its datasets, notebooks, and scripts.
- Add deep learning experiments with TensorFlow/PyTorch
- Incorporate MLOps tools (e.g., MLflow, DVC) for versioned model tracking
- Deploy selected models using Streamlit or FastAPI
📧 Email: jarikretega@gmail.com
💻 GitHub: https://github.com/Tegazini
"Data science isn’t just about algorithms — it’s about understanding data deeply enough to tell meaningful stories."