🧠 Data Science with Python

Welcome to Data Science with Python — a curated portfolio of practical data science and machine learning projects developed by Tega Jarikre.
Each project demonstrates real-world applications of Python in solving problems across insurance, logistics, content moderation, and agricultural analytics.

This repository represents a continuous learning journey in data-driven problem-solving, from raw data wrangling to model deployment.


🚀 Project Objectives

  • Build and evaluate machine learning models across diverse domains
  • Apply data preprocessing, feature engineering, and model optimization techniques
  • Experiment with classification, regression, and clustering algorithms
  • Explore bias detection, economic efficiency, and yield analysis using real datasets
  • Strengthen portfolio readiness for data science career roles

🧩 Repository Structure

    Data-Science-with-Python/
    │
    ├── insurance_premium_prediction/            # Regression models for premium forecasting
    ├── fake_content_detection/                  # NLP + metadata-based bias and fake news detection
    ├── delivery_delay_classification/           # Predicting early, on-time, or late deliveries
    ├── earthquake damage classification/        # Predicting low, medium, or high grade earthquake damage
    ├── china real estate demand prediction/     # Regression models for real estate demand forecasting
    ├── borehole functionality classification/   # Predicting functional, non-functional, or functional-needs-repair boreholes
    ├── notebooks/                               # Shared EDA, feature engineering, and model experiments
    ├── scripts/                                 # Reusable Python utilities
    ├── data/                                    # Sample datasets (clean or synthetic)
    ├── results/                                 # Visualizations and performance reports
    ├── requirements.txt                         # Dependencies for reproducibility
    └── README.md                                # You’re here!


🧰 Tech Stack

| Category | Tools & Libraries |
|---|---|
| Core Language | Python 3.10+ |
| Data Manipulation | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn, Plotly |
| Modeling & ML | Scikit-learn, XGBoost, Random Forest, Logistic Regression |
| NLP & Text Analytics | NLTK, spaCy, TF-IDF, Word2Vec |
| Evaluation & Metrics | Precision, Recall, F1-score, RMSE, ROC-AUC |
| Version Control | Git & GitHub |
| Notebooks & IDEs | Jupyter Notebook, VS Code |

📊 Highlighted Projects

1️⃣ Insurance Premium Prediction

  • Goal: Predict customer insurance premiums using demographic and risk variables.
  • Approach: Regression models (Linear Regression, XGBoost, Random Forest).
  • Focus: Feature selection, multicollinearity detection, and interpretability.
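
As a rough illustration of the multicollinearity check mentioned above, the sketch below flags feature pairs whose Pearson correlation exceeds a threshold. It uses only the standard library. The feature names (`age`, `bmi`, `age_months`), the synthetic values, and the 0.9 threshold are assumptions made for this example and are not taken from the project's dataset or notebooks.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Synthetic, hypothetical features: age_months is almost a rescaled copy of age.
rng = random.Random(0)
age = [rng.gauss(40, 10) for _ in range(500)]
bmi = [rng.gauss(27, 4) for _ in range(500)]
age_months = [12 * a + rng.gauss(0, 1) for a in age]

flagged = collinear_pairs({"age": age, "bmi": bmi, "age_months": age_months})
print(flagged)
```

A pairwise check like this only catches redundancy between two features at a time; a variance inflation factor analysis would also catch a feature that is a combination of several others.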

2️⃣ Fake or Biased Content Detection

  • Goal: Classify online content as fake, biased, or neutral.
  • Approach: Natural Language Processing (NLP) with metadata features.
  • Focus: Text cleaning, vectorization (TF-IDF), and ensemble learning.
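
To make the TF-IDF step concrete, here is a minimal standard-library sketch of the weighting itself. It uses the plain textbook variant (term frequency times `log(N / df)`), which differs from the smoothed formula used by some libraries, and the three sample sentences are invented for illustration. It is not the project's actual vectorization code.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented example sentences, not taken from any dataset in this repository.
docs = [
    "the senator said the bill passed",
    "shocking secret the media hides",
    "the weather is mild today",
]
weights = tfidf(docs)
print(weights[1])
```

A term that appears in every document (here `the`) gets a weight of zero, while terms unique to one document are weighted up, which is why TF-IDF features tend to highlight distinctive vocabulary.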

3️⃣ Delivery Delay Classification

  • Goal: Predict delivery status — early, on time, or late.
  • Approach: Multi-class classification with Logistic Regression, XGBoost, and Random Forest.
  • Focus: Handling class imbalance, feature importance, and business impact analysis.
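
One common way to handle class imbalance is to weight each class inversely to its frequency. The sketch below computes such weights with the `n_samples / (n_classes * class_count)` heuristic. The label names and the 70/20/10 split are assumptions chosen for illustration and do not describe the project's data or the method its notebooks actually use.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: most deliveries are on time.
labels = ["on_time"] * 70 + ["late"] * 20 + ["early"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes receive larger weights, so misclassifying an `early` delivery costs the model more than misclassifying an `on_time` one. Many classifiers accept a mapping of this form through a class-weight parameter.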

4️⃣ Agricultural Efficiency & Productivity Studies

  • Goal: Analyze farm typology, post-harvest loss, and yield determinants.
  • Approach: Clustering, feature selection, and supervised learning for productivity prediction.
  • Focus: Data science-driven agricultural analytics.
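
The clustering idea behind a farm typology can be sketched with a small k-means loop. The example below is a standard-library illustration only: the two features (farm size in hectares, yield in tonnes per hectare) and all values are made up, and the initialization simply takes the first `k` points. The project itself may use a different algorithm or feature set.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Made-up farms as (size_ha, yield_t_per_ha); not real survey data.
farms = [(1.2, 1.1), (22.0, 3.6), (1.8, 1.4), (25.0, 3.9), (0.9, 0.8), (20.5, 3.2)]
labels, centroids = kmeans(farms, k=2)
print(labels)
print(centroids)
```

Because k-means relies on Euclidean distance, features on very different scales are usually standardized first; this toy example skips that step to stay short.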

📚 Learning Focus Areas

  • Data wrangling and cleaning workflows
  • Feature engineering and transformation
  • Model training, validation, and hyperparameter tuning
  • Model interpretability (SHAP, feature importance)
  • End-to-end data science pipeline documentation
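
For the validation step listed above, a k-fold split is one widely used scheme. The following sketch generates train and validation index lists without shuffling. The function name `kfold_indices` is hypothetical, and a real workflow would more likely rely on a library splitter that also supports shuffling and stratification.

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(kfold_indices(10, 3))
for train, val in folds:
    print("train:", train, "val:", val)
```

Each sample lands in exactly one validation fold, so averaging a metric over the folds uses every observation for validation once.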

🧑‍💻 How to Use

  1. Clone the repository
    git clone https://github.com/Tegazini/Data-Science-with-Python.git
  2. Navigate into the folder
    cd Data-Science-with-Python
  3. Install dependencies
    pip install -r requirements.txt
  4. Run notebooks
    jupyter notebook
    

Each project folder contains its own datasets and notebooks.

🌟 Future Work

  • Add deep learning experiments with TensorFlow/PyTorch
  • Incorporate MLOps tools (e.g., MLflow, DVC) for versioned model tracking
  • Deploy selected models using Streamlit or FastAPI

🧾 Author

👤 Tega Jarikre

"Data science isn’t just about algorithms — it’s about understanding data deeply enough to tell meaningful stories."
