Implemented and used image processing, feature extraction, clustering, decision trees, random forest, SVR, SVM, model evaluation (RMSE, precision, recall), Scikit-learn, Pandas, NumPy, OpenCV, PIL.

SeivenBell/Data_Science_and_Statistical_Learning_Projects

Data Science Tools and Statistical Learning

This repository contains projects showcasing my skills in machine learning, data science, and statistical learning techniques. The projects include feature extraction from pathology images and regression tree optimization, among others. Below is a summary of each project and the tools used.

Project 1: Feature Extraction and Clustering

Description: This project uses a deep network to extract features from pathology images. It is part of an assignment whose objective is to generate features from a set of training images, with 10% of the data set aside for validation.
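
The feature-extraction step described above can be sketched roughly as follows, assuming PyTorch is installed. The tiny convolutional network, the 64x64 input size, and the random tensors standing in for pathology images are all hypothetical placeholders; the notebook itself may use a different (for example pretrained) network and a different split procedure.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a batch of pathology image patches: (N, C, H, W).
images = torch.rand(20, 3, 64, 64)

# Small CNN used as a feature extractor (an assumption, not the notebook's network).
extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, 16, 1, 1)
    nn.Flatten(),             # -> (N, 16)
)
extractor.eval()

# No gradients are needed when the network is only used to produce features.
with torch.no_grad():
    features = extractor(images)

# Hold out 10% of the samples for validation.
n_val = max(1, int(0.1 * len(features)))
perm = torch.randperm(len(features))
val_features = features[perm[:n_val]]
train_features = features[perm[n_val:]]

print(tuple(train_features.shape), tuple(val_features.shape))
```

Each image is reduced to one fixed-length vector, which is the form the clustering techniques listed below expect.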

Tools and Technologies:

  • Deep Learning Framework (PyTorch)
  • Image Processing Libraries (OpenCV, PIL)
  • NumPy, Pandas for data handling
  • Clustering Techniques (K-Means, Hierarchical Clustering)

Key Files:

  • Assignment1.ipynb: Jupyter Notebook containing the code for feature extraction from pathology images.
  • code.ipynb: Supplementary code for handling specific tasks related to feature generation.
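
As a rough illustration of the clustering side of this project, the sketch below runs K-Means and hierarchical (agglomerative) clustering with scikit-learn. The synthetic blobs stand in for extracted feature vectors, and the choice of three clusters and the silhouette score as a comparison measure are assumptions made for this example, not details taken from the notebooks.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for extracted features: 150 samples, 16 dimensions.
X, _ = make_blobs(n_samples=150, n_features=16, centers=3, random_state=0)

# K-Means with a fixed seed so the assignment is reproducible.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering with Ward linkage (deterministic, no seed needed).
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters.
kmeans_score = silhouette_score(X, kmeans_labels)
hier_score = silhouette_score(X, hier_labels)

print(f"K-Means silhouette:      {kmeans_score:.3f}")
print(f"Hierarchical silhouette: {hier_score:.3f}")
```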

Project 2: Regression Tree Optimization and Classification

Description: This project focuses on a regression task, with an emphasis on decision tree optimization. The tasks include data preprocessing, tuning regression trees to prevent overfitting, comparing regression trees with Random Forest and Support Vector Regression (SVR), and performing a classification task by applying a threshold to the label column to turn it into class labels.
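
A minimal sketch of the regression workflow described above, assuming scikit-learn. The synthetic dataset, the parameter grid, and the model settings are illustrative assumptions; the notebook's actual data and hyperparameters may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data standing in for the assignment's dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Limit tree depth and leaf size via cross-validation to curb overfitting.
tree_search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
tree_search.fit(X_train, y_train)

models = {
    "tuned tree": tree_search.best_estimator_,
    "random forest": RandomForestRegressor(
        n_estimators=100, random_state=0
    ).fit(X_train, y_train),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler.
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)).fit(X_train, y_train),
}

# Compare all models on the same held-out test set using RMSE.
rmse = {
    name: float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    for name, model in models.items()
}

print("best tree params:", tree_search.best_params_)
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f}")
```

Searching over `max_depth` and `min_samples_leaf` is one common way to constrain a regression tree; cost-complexity pruning (`ccp_alpha`) is another option.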

Tools and Technologies:

  • Scikit-learn for machine learning models (Decision Trees, Random Forest, SVR)
  • Data Preprocessing Libraries (Pandas, NumPy)
  • Metrics for Model Evaluation (RMSE, Accuracy, Precision, Recall)

Key Files:

  • A2.ipynb: Jupyter Notebook containing code for data preprocessing, regression tree optimization, comparison with Random Forest and SVR, and classification tasks.
  • Problem_Statement.txt: Detailed description of the assignment requirements and tasks.
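
The classification step, where a threshold on the label column produces class labels, might look like the sketch below. The column names `x1`, `x2`, `label`, and `label_class`, the synthetic data, and the use of the median as the threshold are all hypothetical; the assignment may specify a different cut-off.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: two features and a continuous label column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["label"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=200)

# Binarize the continuous label: 1 if above the threshold, else 0.
threshold = df["label"].median()  # assumed threshold, for illustration only
df["label_class"] = (df["label"] > threshold).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]],
    df["label_class"],
    test_size=0.25,
    random_state=0,
    stratify=df["label_class"],
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```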

How to Use

  1. Clone the repository to your local machine:

    git clone https://github.com/SeivenBell/Data_Science_and_Statistical_Learning_Projects.git
  2. Navigate to the project directory:

    cd Data_Science_and_Statistical_Learning_Projects
  3. Open the Jupyter Notebooks in your preferred environment (e.g., JupyterLab, Google Colab).

  4. Follow the instructions within each notebook to reproduce the results.

Additional Notes

  • Make sure to have the required Python libraries installed. You can install the necessary dependencies by running:
    pip install -r requirements.txt
