This repo holds exercise notebooks from the Data Analysis with Python specialization on Coursera.
- Introduction to Classification: provides an overview of classification as a supervised learning method. You will also learn the K-Nearest Neighbors (KNN) algorithm, understanding its principles and applications in classification tasks.
- Decision Tree Classification: explores the Decision Tree algorithm, covering its structure, construction, and applications in classification problems.
- Support Vector Machine Classification: focuses on the Support Vector Machine (SVM) algorithm, where you will grasp its principles and how it is used for classification.
- Naive Bayes and Logistic Regression: delves into two essential classifiers, Naive Bayes and Logistic Regression, covering their assumptions, strengths, and applications.
- Classification Evaluation: covers techniques for evaluating classifier performance using various metrics and visualizations.
- Introduction to Regression and Linear Regression: introduces regression and delves into linear regression, covering its principles, assumptions, and practical applications.
- Polynomial Regression: explores polynomial regression, an advanced technique used to capture nonlinear relationships between variables.
- Regularization: focuses on regularization techniques, including Ridge, Lasso, and Elastic Net, which help prevent overfitting and improve the generalization of regression models.
- Evaluation and Cross Validation: explores evaluation metrics and cross-validation techniques to assess and optimize regression model performance.
- Ensemble Methods: explores ensemble methods in regression analysis, including bagging and boosting, to combine multiple models for improved prediction accuracy.
- Introduction to Partitioning and Clustering: provides an introduction to unsupervised learning and clustering analysis. You will delve into partitioning clustering methods, such as K-Means and K-Medoids, understanding their principles and applications.
- Hierarchical Clustering: explores hierarchical clustering, a method that creates a tree-like structure to represent data similarities.
- Density Based Clustering: focuses on density-based clustering, which groups data points based on their density within the dataset.
- Grid Based Clustering: explores grid-based clustering, an approach that partitions the data space into grids for efficient clustering.
- Dimension Reduction Methods: introduces dimension reduction techniques as a critical preprocessing step for handling high-dimensional data.
- Frequent Itemset: provides an introduction to unsupervised learning and association rules analysis. You will explore frequent itemsets and their significance in discovering patterns in transactional data. You will also explore association rules and the support, confidence, and lift metrics that serve as key indicators of rule quality.
- Association Rules Mining: discusses association rule mining, including closed and maximal patterns.
- Apriori and FP Growth algorithm: focuses on the Apriori and FP-Growth algorithms, two key methods for efficient frequent itemset mining.
- Outliers: explores the significance of outlier detection and its role in identifying unusual data points.
- Data Analysis Overview: gives an overview of data analysis, including its supervised and unsupervised learning directions. You will learn how to define the scope and direction of your data analysis project effectively.
- Classification Analysis: focuses on classification techniques, where you will explore Nearest Neighbors, Decision Trees, SVM, Naive Bayes, Logistic Regression, cross-validation, ensemble methods, and evaluation metrics.
- Regression Analysis: delves into regression techniques, including simple linear, polynomial, regularized linear, and multivariate regression, along with cross-validation, ensemble methods, and evaluation metrics.
- Clustering Analysis: introduces clustering techniques, including partitioning, hierarchical, density-based, and grid-based methods, for unsupervised pattern discovery.
- Dimension Reduction: focuses on dimension reduction techniques, with a particular emphasis on Principal Component Analysis (PCA).
- Association Rules: focuses on a comprehensive case study where you will apply association rule mining and outlier detection techniques to solve a real-world problem.
- Outlier Detection: focuses on outlier detection methods, including Z-score, IQR, OneClassSVM, Isolation Forest, DBSCAN, LOF, and contextual outliers.
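
As a rough illustration of the two simplest outlier detection methods named in the list (Z-score and IQR), here is a minimal sketch using only the Python standard library. It is not taken from the notebooks: the function names, the sample data, and the thresholds (3 standard deviations, 1.5 times the IQR) are assumptions chosen for the example, and the quartiles use the `inclusive` interpolation method, which may differ slightly from what other libraries compute by default.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds `threshold`.

    Uses the population standard deviation (an assumption for this sketch).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs((v - mu) / sigma) > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: ten readings near 11-12 and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

print(zscore_outliers(data))
print(iqr_outliers(data))
```

Both functions flag only the extreme reading in this sample. The Z-score rule is sensitive to the outlier itself, because the extreme value inflates the mean and standard deviation, whereas the IQR rule relies on quartiles and is less affected on small samples.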