Data Mining Projects

This repository contains seven projects in data analysis, machine learning, and data mining. Each project explores a different part of these fields and applies the techniques, algorithms, and methodologies relevant to it.

Data Mining Course - Fall 2023
Amirkabir University of Technology

1) Introduction to Python Libraries:

This project surveys the Python libraries commonly used in data mining. The report covers the installation, general features, and main functions of each library, with a focus on how they make project steps such as pre-processing faster and more accurate.

2) EDA and Visualization:

The goal of this project is to analyze a dataset of individuals' biological characteristics in order to classify the occurrence of heart attacks. The emphasis is on statistical analysis, visualization, and in-depth exploration of the dataset.
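
As a rough illustration of the statistical side of this kind of exploration, the sketch below computes descriptive statistics, per-class means, and feature-to-label correlations with pandas. The data is synthetic and the column names (`age`, `cholesterol`, `max_heart_rate`, `heart_attack`) are assumptions made for the example, not the actual columns of the project's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the heart-attack dataset; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=n),
    "cholesterol": rng.normal(220, 30, size=n).round(1),
    "max_heart_rate": rng.normal(150, 20, size=n).round(1),
})
# Make the label loosely depend on age so the summaries have something to show.
df["heart_attack"] = (df["age"] + rng.normal(0, 10, size=n) > 55).astype(int)

# Overall descriptive statistics for every numeric column.
summary = df.describe()

# Per-class means: a quick way to see which features differ between the classes.
class_means = df.groupby("heart_attack").mean()

# Pearson correlation of each feature with the label.
label_corr = df.corr()["heart_attack"].drop("heart_attack")

print(summary.loc[["mean", "std"]])
print(class_means)
print(label_corr.sort_values(ascending=False))
```

On real data, the same three tables are a common starting point before moving on to plots such as histograms or a correlation heatmap.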

3) Data Cleaning and Feature Engineering:

This project applies feature engineering methods (reduction, selection, and extraction) after data cleaning. It then examines how these methods affect linear regression, decision tree, and random forest models, which shows how effective each feature engineering step is.
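
One way to measure the effect of feature selection and extraction on a model is to wrap each method in a pipeline and compare cross-validated scores. The sketch below does this for linear regression only, on synthetic data from scikit-learn's `make_regression`. The choice of `SelectKBest` and `PCA`, and of five features or components, is an assumption for illustration and not necessarily what the project used.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 30 features, only 5 of which carry signal.
X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

# Each pipeline ends in the same model, so score differences come from the features.
pipelines = {
    "all features": make_pipeline(LinearRegression()),
    "selection (SelectKBest, k=5)": make_pipeline(
        SelectKBest(f_regression, k=5), LinearRegression()),
    "extraction (PCA, 5 components)": make_pipeline(
        PCA(n_components=5, random_state=0), LinearRegression()),
}

# Mean R^2 over 5 cross-validation folds for each variant.
scores = {
    name: cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    for name, pipe in pipelines.items()
}
for name, r2 in scores.items():
    print(f"{name}: mean R^2 = {r2:.3f}")
```

Putting the selector or PCA inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation fold into the feature engineering step.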

4) Frequency Pattern Detection:

This project compares the Apriori and FP-Growth algorithms on different datasets and performs a sensitivity analysis. FP-Growth, which is known for handling large datasets efficiently, outperforms Apriori in speed and efficiency. The project also reveals useful patterns and relationships in the data.
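
To make the comparison more concrete, here is a minimal pure-Python sketch of Apriori's level-wise search. It is an illustration written for this README, not the implementation used in the project, and the toy transactions are made up. Each level needs another pass over all transactions plus explicit candidate generation, which is the cost that FP-Growth avoids by mining a compact prefix tree instead.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one full pass over the transactions per level.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent

# Toy market-basket data (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(transactions, min_support=3)
for itemset, support in sorted(result.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support count of 3, this finds four frequent single items and four frequent pairs, and no frequent triples.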

5) Advanced Methods in Classification:

This project prepares a dataset of gas sensor readings, classifies it using algorithms like Random Forest, SVM, and Naive Bayes, and then evaluates each classifier in detail. A multi-model classifier built with stacking is used to improve on the individual models, and a sensitivity analysis of the hyperparameters is performed.
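
A stacking setup of this kind can be sketched with scikit-learn's `StackingClassifier`. The example below uses synthetic data from `make_classification` in place of the gas sensor dataset. The logistic regression meta-learner, the hyperparameters, and the train/test split are assumptions for illustration, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the gas sensor data.
X, y = make_classification(n_samples=600, n_features=16, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base classifiers; the SVM is scaled because it is sensitive to feature ranges.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("nb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

results = {}
for name, model in base_models + [("stacking", stack)]:
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {results[name]:.3f}")
```

Whether the stacked model beats the best single classifier depends on the data, so it is worth reporting the individual scores alongside it, as the loop above does.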

6) Advanced Methods in Clustering:

Cluster analysis is performed on an insurance dataset using methods like KMeans, Agglomerative Clustering, and DBSCAN. The clustering results are compared using metrics like Silhouette Score, and dimensionality reduction techniques such as PCA and t-SNE are applied for visualization.
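
The comparison described above can be sketched as follows, with synthetic blobs standing in for the insurance dataset. The number of clusters and the DBSCAN parameters (`eps`, `min_samples`) are assumed values chosen for the example. DBSCAN labels noise points as -1, so they are excluded before computing the Silhouette Score.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with 3 groups in 5 dimensions, standardized before clustering.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5,
                  cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1  # drop DBSCAN noise points
    n_clusters = len(set(labels[mask]))
    # The silhouette score is only defined for at least two clusters.
    if n_clusters >= 2:
        scores[name] = silhouette_score(X[mask], labels[mask])
    else:
        scores[name] = None
    print(name, "clusters:", n_clusters, "silhouette:", scores[name])

# Project to 2D for plotting; t-SNE could be swapped in at this step.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)
print(X_2d.shape)
```

The 2D projection in `X_2d` can then be passed to a scatter plot colored by each method's labels.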

7) Identifying Data Outliers and Anomalies, Comparing Data Balancing Methods, and Providing Evaluation:

This project covers several topics in data analysis and machine learning. It includes anomaly detection using techniques like One-Class SVM and Local Outlier Factor, along with data augmentation and reduction to balance the classes. The balancing methods applied are oversampling (Random OverSampler, SMOTE) and undersampling (Random UnderSampler). The project concludes with an LSTM network that identifies temporal patterns in the data.
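
The first two parts of this project can be sketched on synthetic data. The block below flags outliers with scikit-learn's `LocalOutlierFactor` and `OneClassSVM`, then balances a two-class label by random oversampling. The `random_oversample` function is a hand-written stand-in for a library oversampler, added here so the example depends only on NumPy and scikit-learn. The `contamination` and `nu` values are assumptions, and SMOTE, undersampling, and the LSTM step are not shown.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# --- Anomaly detection on synthetic 2D points ---
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outliers = rng.uniform(6.0, 8.0, size=(10, 2))
X = np.vstack([inliers, outliers])

# Both detectors return +1 for inliers and -1 for outliers.
lof_pred = LocalOutlierFactor(n_neighbors=20, contamination=0.05).fit_predict(X)
svm_pred = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X).predict(X)
print("LOF flagged:", int((lof_pred == -1).sum()))
print("One-Class SVM flagged:", int((svm_pred == -1).sum()))

# --- Random oversampling (hypothetical helper, not a library function) ---
def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until every class matches the majority size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = rng.choice(np.flatnonzero(y == cls),
                             size=target - count, replace=True)
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.vstack(X_parts), np.concatenate(y_parts)

# Imbalanced labels: 200 samples of class 0, 10 of class 1.
y = np.array([0] * 200 + [1] * 10)
X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y), "->", np.bincount(y_bal))
```

Random oversampling only repeats existing rows, whereas SMOTE synthesizes new minority samples by interpolating between neighbors, which is one reason the two are worth comparing.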

Together, these projects cover a diverse range of techniques in data analysis and address challenges such as class imbalance, anomaly detection, and temporal pattern recognition. They offer insight into improving performance and reliability in complex data analysis, especially on climate-related datasets.

