In [None]:
# What is ensemble learning?

# Ensemble learning is a machine learning technique where multiple models are 
# combined to improve overall predictive performance and generalization

# Bagging (Bootstrap Aggregating): It involves training multiple instances of the same model on different bootstrap samples
# (random samples drawn with replacement) of the training data and combining their predictions, typically by voting or averaging.
# Random Forest is a common example of bagging.

# Boosting: Boosting focuses on sequentially training models, where each subsequent model
# corrects the errors of the previous ones. Examples include AdaBoost, Gradient Boosting, and XGBoost.
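
# A minimal sketch of the two ideas above, assuming scikit-learn is available.
# The synthetic dataset and the estimator settings (50 trees, random_state=0) are
# illustrative choices, not values from these notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent decision trees, each fit on its own bootstrap sample
bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Boosting: trees are fit one after another, each correcting the current ensemble's errors
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

bagging_acc = bagging.score(X_test, y_test)
boosting_acc = boosting.score(X_test, y_test)
print("Bagging accuracy:", bagging_acc)
print("Boosting accuracy:", boosting_acc)
```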

In [None]:
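# What is OOB score?

# In bagging, each model is trained on a bootstrap sample, so some training rows are left out of that sample.
# These left-out rows are that model's out-of-bag (OOB) samples.
# The OOB score is the performance metric (e.g., accuracy, MSE) computed on these out-of-bag samples. 
# It serves as a validation measure for the model without requiring a separate validation set. 
# OOB scores help assess the model's performance and can guide hyperparameter choices such as the number of trees.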
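
# A small sketch of reading the OOB score from a Random Forest, assuming scikit-learn.
# The synthetic data and n_estimators=100 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True asks the forest to score each row using only the trees
# whose bootstrap sample did not contain that row
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

print("OOB accuracy:", rf.oob_score_)
```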

In [1]:
# What is R-squared (R2)?
# R-squared, also known as the coefficient of determination, measures the proportion of the variance in the 
# dependent variable that is explained by the independent variables in a regression model. It is at most 1, 
# with higher values indicating a better fit of the model to the data. It can be negative when the model 
# fits worse than always predicting the mean of the target (for example, on held-out data).


# R2 = 1 - (Sum of Squared Residuals / Total Sum of Squares)

 
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

r2 = r2_score(y_true, y_pred)
print("R-squared:", r2)


R-squared: 0.9486081370449679


In [None]:
# What is Adjusted R-squared?
# Adjusted R-squared is a modified version of R-squared that takes into account the number of 
# independent variables in the model. 
# It penalizes the addition of unnecessary variables that may not contribute significantly 
# to the model's predictive power.
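
# The standard formula is Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1),
# where n is the number of samples and p the number of independent variables.
# A minimal sketch; the helper name adjusted_r2 and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same R2, but more features gives a lower adjusted value
few_features = adjusted_r2(0.90, n_samples=100, n_features=5)
many_features = adjusted_r2(0.90, n_samples=100, n_features=20)
print("Adjusted R2 with 5 features:", few_features)
print("Adjusted R2 with 20 features:", many_features)
```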

In [None]:
# Supervised vs. unsupervised learning?

# Supervised learning is a type of machine learning where the algorithm is trained on labeled data, 
# meaning the input data is paired with the correct output or target. 
# The algorithm learns to map inputs to outputs by finding patterns in the training data. 
# The goal is to make accurate predictions or classifications on new, unseen data.


# Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data, 
# meaning there are no predefined outputs or targets. The algorithm's objective is to discover inherent patterns, 
# structures, or relationships within the data. It's often used for clustering or dimensionality reduction tasks.


In [None]:
# Unsupervised techniques (there are many; here are a few for revision)

# Clustering:
# Clustering methods group similar data points together into clusters based on certain similarity measures. 
# The goal is to discover natural groupings within the data.

# K-Means Clustering
# Hierarchical Clustering
# DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
# Gaussian Mixture Models (GMM)

# Dimensionality Reduction:
# Dimensionality reduction techniques aim to reduce the number of features while retaining as much 
# relevant information as possible. 
# They are useful for visualizing high-dimensional data or improving model efficiency.

# Principal Component Analysis (PCA)


In [None]:
# K-Means Clustering:
# K-Means clusters data into a predetermined number of clusters by minimizing the sum of squared distances 
# between data points and their cluster centers.
# Real Case Use: Customer segmentation in marketing for identifying distinct customer groups based on purchase behaviors.

# Hierarchical Clustering:
# Hierarchical Clustering builds a tree-like structure of nested clusters, allowing for various levels of granularity in cluster assignments.
# Real Case Use: Biological taxonomy to classify species into hierarchical categories like kingdom, phylum, etc.

# DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
# DBSCAN groups dense regions of data points and identifies noise points as outliers, without requiring the number of clusters as input.
# Real Case Use: Identifying hotspots of criminal activities in crime analysis.

# Gaussian Mixture Models (GMM):
# GMM models data as a mixture of multiple Gaussian distributions, enabling probabilistic assignment of data points to clusters.
# Real Case Use: Image segmentation for separating foreground and background objects.

# Principal Component Analysis (PCA):
# PCA reduces the dimensionality of data by projecting it onto a new orthogonal basis that captures the most important variability.
# Real Case Use: Reducing the number of features in facial recognition to improve computational efficiency.
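
# A short sketch of K-Means and PCA on the same data, assuming scikit-learn.
# The blob dataset (3 centers, 5 features) is made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Unlabeled data: the true blob labels are discarded on purpose
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# K-Means needs the number of clusters up front
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# PCA projects the 5 features onto the 2 directions of highest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Cluster labels found:", sorted(set(labels)))
print("Reduced shape:", X_2d.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```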


In [None]:
# Time series in brief: a single overview to understand it better

# So what is a time series?
# 1. Time Series Data: Time series data is a sequence of data points recorded at successive points in time, often at regular intervals. 
# It's characterized by its order, which is usually chronological. 
# Components of time series include trend (long-term movement), seasonality (repeating patterns), and noise (random fluctuations).


# 2. Data Preprocessing:
# Handling Missing Data: You might interpolate missing values or fill them using methods like forward fill or backward fill.
# Dealing with Outliers: Outliers can distort analysis and predictions. You may remove or transform them.
# Resampling: You can change the frequency of data by upsampling (increasing frequency) or downsampling (decreasing frequency).
# Handling Time Zones and Irregular Intervals: Ensure data is consistently timestamped and aligned.
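
# The preprocessing steps above can be sketched with pandas (assumed available).
# The six-day series with two gaps is invented for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0, 6.0], index=idx)

# Forward fill: carry the last known value forward
filled_ffill = s.ffill()

# Linear interpolation between the neighbouring known values
filled_interp = s.interpolate()

# Downsampling: daily values averaged into 2-day bins
downsampled = filled_interp.resample("2D").mean()

print(filled_ffill.tolist())
print(filled_interp.tolist())
print(downsampled.tolist())
```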


# 3. Time Series Visualization:
# Line Plots and Area Plots: Plotting the raw time series data helps visualize trends and patterns.
# Seasonal Decomposition: Using decomposition, you can separate a time series into trend, seasonality, and residual components.
# Autocorrelation and Partial Autocorrelation Plots (ACF and PACF): These help identify patterns in the data that can guide model selection.


# 4. Time Series Decomposition:
# Trend Extraction: Removing the trend helps in studying seasonality and residual patterns.
# Seasonal Component Extraction: Isolating seasonality reveals recurring patterns within a given period.
# Residual Analysis: Analyzing residuals helps identify any leftover patterns not captured by trend and seasonality.


# 5. Time Series Forecasting:
# Stationarity and Differencing: Making a time series stationary by removing trend and seasonality can improve forecast accuracy.
# ARIMA Models: AutoRegressive Integrated Moving Average models capture the relationship between current and past observations, 
# and their differences.
# SARIMA Models: Seasonal ARIMA models handle seasonality along with ARIMA components.
# Exponential Smoothing Methods (Holt-Winters): These models assign exponentially decreasing weights to past observations.
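
# A plain-Python sketch of two ideas from this step: first-order differencing and
# simple exponential smoothing. This is the simplest smoothing variant, not full
# Holt-Winters, and the helper names and numbers are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; removes a linear trend when lag=1."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def simple_exponential_smoothing(series, alpha):
    """Each smoothed value mixes the new observation with the previous smoothed value."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

trend_series = [10, 12, 14, 16, 18]
diffed = difference(trend_series)
smoothed = simple_exponential_smoothing([10, 12, 14], alpha=0.5)

print(diffed)    # [2, 2, 2, 2] -> constant, the trend is gone
print(smoothed)  # [10, 11.0, 12.5]
```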


# 6. Evaluation Metrics:
# Mean Absolute Error (MAE): Average of absolute errors.
# Mean Squared Error (MSE): Average of squared errors.
# Root Mean Squared Error (RMSE): Square root of MSE.
# Mean Absolute Percentage Error (MAPE): Average of absolute errors expressed as a percentage of the actual values.
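
# A plain-Python sketch of the four metrics. The function names and the sample
# values are illustrative; MAPE as written assumes no actual value is zero.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    # Undefined when any actual value is 0
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [100, 200, 400]
forecast = [110, 180, 400]

print("MAE:", mae(actual, forecast))
print("MSE:", mse(actual, forecast))
print("RMSE:", rmse(actual, forecast))
print("MAPE (%):", mape(actual, forecast))
```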

# 7. Feature Engineering:
# Lag Features: Using past observations as features for prediction.
# Rolling Window Statistics: Using rolling averages or other statistics over a window of time.
# Seasonal Features: Incorporating cyclical patterns as features.
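
# Lag and rolling-window features can be sketched with pandas (assumed available).
# The column names lag_1 and roll_mean_3 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Lag feature: the previous observation becomes an input for the current row
df["lag_1"] = df["y"].shift(1)

# Rolling statistic: mean over the current and two preceding observations
df["roll_mean_3"] = df["y"].rolling(window=3).mean()

# The first rows are NaN because there is not enough history yet
print(df)
```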

# 8. Model Selection and Tuning:
# Cross-Validation: Splitting data into training and validation sets to evaluate models; for time series, keep the temporal order so validation data always comes after the training data.
# Hyperparameter Tuning: Adjusting model parameters to optimize performance.
# Grid Search and Random Search: Methods to systematically find optimal hyperparameters.

# 9. Best Practices:
# Handling Overfitting: Regularization techniques to prevent models from fitting noise.
# Understanding Model Limitations: Time series models have assumptions and limitations; consider these in interpretation.
# Monitoring and Updating Forecasts: Continuous monitoring and model updates based on new data



In [4]:
# Now our main algorithms:

# Linear Regression:
# Linear regression is a simple machine learning algorithm used for predicting a 
# continuous outcome based on one or more input features. It assumes a linear relationship between 
# the input variables and the target variable and calculates the best-fitting 
# line to minimize the difference between predicted and actual values.
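
# A tiny sketch, assuming scikit-learn and NumPy: fit a line to points generated
# from y = 2x + 1 (made-up data) and recover the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0  # noise-free line, so the fit should be exact

model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_

print("slope:", slope)
print("intercept:", intercept)
```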

# MAE (Mean Absolute Error):
# MAE measures the average absolute difference between predicted and actual values. 
# It's less sensitive to outliers compared to other metrics, making it suitable for cases where large errors shouldn't be heavily penalized.

# MSE (Mean Squared Error):
# MSE calculates the average of squared differences between predicted and actual values.
# It amplifies larger errors and is commonly used in regression tasks.

# RMSE (Root Mean Squared Error):
# RMSE is the square root of MSE. It has the same unit as the target variable, making it more interpretable.
# RMSE is a widely used evaluation metric, especially in regression problems.

# Ridge, Lasso, and Elastic Net:
# These are regularization techniques used to prevent overfitting in regression models.
# Ridge adds an L2 penalty (sum of squared coefficients), which shrinks coefficients toward zero, 
# Lasso adds an L1 penalty (sum of absolute coefficients), which encourages sparsity by forcing some coefficients to become exactly zero, and 
# Elastic Net combines both Lasso and Ridge penalties. 
# These techniques help improve model generalization and stability.

# ----
# Regularization techniques like Ridge, Lasso, and Elastic Net help 
# prevent overfitting by adding constraints to model parameters. They penalize 
# large parameter values, making the model less sensitive to noise in the data. 
# This prevents the model from fitting noise and improves its generalization to new data.

# For feature selection, Lasso and Elastic Net are particularly useful. 
# Lasso sets some coefficients to zero, effectively removing corresponding features from the model.
# This helps in identifying the most important features. 
# Ridge and Elastic Net also shrink the coefficients of less relevant features, reducing their impact on the model; Ridge alone does not set them exactly to zero.
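
# A sketch comparing Ridge and Lasso coefficients, assuming scikit-learn.
# The synthetic data (3 informative features out of 10) and the alpha values are
# illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 3 of the 10 features actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)

# Count coefficients that are exactly zero in each model
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Exact zeros (Ridge, Lasso):", n_zero_ridge, n_zero_lasso)
```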

In [8]:
arr1 = [1,2,32,13]
arr2 = [2,3,22,32,35]

union = sorted(set(arr1 + arr2))
print(union)

intersection = [x for x in arr1 if x in arr2]
print(intersection)

[1, 2, 3, 13, 22, 32, 35]
[2, 32]


In [None]:
# Logistic Regression:
# Logistic Regression is a classification algorithm used to predict the probability of an
# instance belonging to a particular class. It models the relationship between input 
# features and the likelihood of a binary outcome using a logistic function.
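
# The logistic function is sigmoid(z) = 1 / (1 + e^(-z)), applied to a weighted sum of the features.
# A plain-Python sketch; the weights, bias, and the 0.5 threshold are made-up example values.

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one instance."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# z = 0.5 * 2.0 + (-1.0) * 1.0 + 0.0 = 0.0, so the probability is exactly 0.5
p = predict_proba([2.0, 1.0], weights=[0.5, -1.0], bias=0.0)
label = int(p >= 0.5)

print("probability:", p)
print("predicted class:", label)
```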

In [None]:
# Naive Bayes:
# Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. 
# It assumes that features are conditionally independent given the class label, 
# which simplifies calculations. It's particularly effective for text classification 
# and works well with high-dimensional data.
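
# A toy text-classification sketch, assuming scikit-learn. The four training
# messages and their spam/ham labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "free money now",
    "win free prize",
    "meeting at noon",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Word counts feed a multinomial Naive Bayes model (Laplace smoothing by default)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["free prize money"])[0]
print(prediction)
```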

In [None]:
# KNN (K-Nearest Neighbors):
# KNN is a simple classification and regression algorithm. For classification, it assigns a new 
# instance the majority class of its k nearest neighbors in the feature space; for regression, it averages their target values. 
# It's non-parametric and easy to understand, and because every prediction searches the stored training data, it is best suited to small datasets.
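
# A minimal sketch with k = 3, assuming scikit-learn. The one-dimensional points
# and labels are made up so the neighbors are easy to check by hand.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# 1.5 is closest to 1, 2, 0 (all class 0); 10.5 is closest to 10, 11, 12 (all class 1)
predictions = knn.predict([[1.5], [10.5]])
print(predictions)  # [0 1]
```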

In [None]:
# Overfitting/Underfitting/Bias/Variance:
# Overfitting occurs when a model learns the training data too well, including noise, leading to poor generalization.
# Underfitting happens when a model is too simple to capture the underlying patterns. 
# Bias refers to the error due to overly simple assumptions in the model; high bias leads to underfitting.
# Variance is the model's sensitivity to variations in the training data; high variance leads to overfitting.



In [None]:
# SVM (Support Vector Machine):
# SVM is a powerful classification algorithm that finds a hyperplane in a high-dimensional space 
# to best separate different classes.
# It aims to maximize the margin between classes. 
# In regression tasks, the same idea is used as SVR (Support Vector Regression), which fits a 
# function that keeps as many points as possible within a certain margin (epsilon) of it.

In [None]:
# SVM Kernel:
# SVM kernels implicitly map data into a higher-dimensional space, 
# allowing complex nonlinear boundaries to be drawn without computing the mapping explicitly (the kernel trick). For example, 
# the Gaussian (RBF) kernel computes similarity from the distance between two points: nearby points score close to 1, distant points close to 0.

# Example: In text classification, an RBF kernel can separate documents 
# whose feature vectors (e.g., TF-IDF) are not linearly separable.
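
# A sketch of why kernels matter, assuming scikit-learn: two concentric circles
# cannot be split by a straight line, but an RBF kernel separates them.
# The dataset parameters are illustrative, and accuracy is measured on the training data.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner circle = one class, outer circle = the other
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print("Linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```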

In [None]:
# Decision Tree:
# Decision Trees recursively split data based on features to create a tree-like structure. 
# It's used for both classification and regression tasks. 
# It makes decisions at each node by maximizing information gain or minimizing impurity.
# Formula: Information Gain = Entropy(parent) - Weighted Avg. Entropy(children)
# Example: In a credit risk assessment, a decision tree may split customers based on credit score, 
# income, and other features to determine loan approval.

# Post-Pruning and Pre-Pruning:
# Post-pruning involves growing a full decision tree and then removing nodes that
# provide little predictive power. 
# Pre-pruning, on the other hand, involves setting constraints 
# while growing the tree to limit depth, splits, or minimum samples per leaf.
# Example of Post-Pruning: After building a deep tree for email classification, 
# prune nodes with insignificant feature splits.
# Example of Pre-Pruning: Set a limit on tree depth to prevent overfitting.
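
# Both pruning styles can be sketched with scikit-learn's DecisionTreeClassifier (assumed available).
# Here max_depth and min_samples_leaf act as pre-pruning limits, and ccp_alpha applies
# cost-complexity post-pruning. The values 3, 5, and 0.01 are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# No limits: the tree grows until its leaves are pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruning: constraints applied while the tree is growing
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                    random_state=0).fit(X, y)

# Post-pruning: grow fully, then remove subtrees that add little predictive power
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print("Full tree nodes:", full_tree.tree_.node_count)
print("Pre-pruned depth:", pre_pruned.get_depth())
print("Post-pruned nodes:", post_pruned.tree_.node_count)
```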


# Variance Reduction in Decision Trees:
# Decision Trees can lead to overfitting due to high variance. 
# Techniques like pruning, limiting depth, or setting minimum samples 
# per leaf reduce variance by simplifying the tree's complexity.
# Example: In predicting housing prices, reducing the maximum depth of the 
# tree prevents overfitting by avoiding complex splits for small subsets of data.


# Decision trees use various methods to find purity in splits and determine the best features for splitting:
# Gini Impurity: It measures how often a randomly selected element would be incorrectly classified
# if it were labeled at random according to the class distribution in the node.
# A lower Gini score indicates a purer split.

# Entropy: Entropy measures the impurity or randomness in a set. It is used to quantify the uncertainty of a split. 
# A lower entropy indicates a more certain split.

# Information Gain: It measures the reduction in entropy or impurity achieved by a particular split. 
# Features that lead to the most information gain are chosen as the best splitting features.
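
# A plain-Python sketch of the three measures, using Gini = 1 - sum(p_i^2) and
# Entropy = -sum(p_i * log2(p_i)). The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [1, 1, 0, 0]            # perfectly mixed node
perfect_split = [[1, 1], [0, 0]]  # both children are pure

print("Parent entropy:", entropy(parent))
print("Parent Gini:", gini(parent))
print("Information gain:", information_gain(parent, perfect_split))
```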

In [None]:
# Training Data, Test Data, Validation Data:

# Training Data: The dataset used to train a machine learning model.
# It's used to learn the patterns and relationships between input features and target labels.

# Test Data: A separate dataset used to evaluate the model's performance after training. 
# It helps to estimate how well the model will generalize to new, unseen data.

# Validation Data: An optional dataset held out during training to tune hyperparameters and detect overfitting.
# Unlike test data, it is not used for the final evaluation of the model.
#-----------------------------------------------------------------
# Cross Validation:
# A technique to assess the model's performance by splitting the dataset into multiple subsets, 
# training and evaluating the model on different combinations of these subsets.

# Types of Cross Validation:

# K-Fold Cross Validation: Data is divided into k subsets. The model is trained k times,
# each time using k-1 subsets for training and 1 subset for validation.

# Stratified K-Fold Cross Validation: Similar to k-fold, but ensures each 
# fold has a similar distribution of target labels.

# Leave-One-Out Cross Validation (LOOCV): Each observation is used as a validation set,
# and the rest are used for training.

# Time Series Cross Validation: Maintains temporal order, where earlier data is used for 
# training and later data for validation.

# Repeated Cross Validation: K-Fold process is repeated multiple times and results are averaged.
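
# A sketch of stratified k-fold and time series splits, assuming scikit-learn and NumPy.
# The synthetic data, the logistic regression model, and the split counts are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit, cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Stratified 5-fold: every fold keeps roughly the same class balance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())

# Time series split: training indices always precede validation indices
tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(np.arange(12).reshape(-1, 1)))
for train_idx, test_idx in splits:
    print("train:", train_idx, "validate:", test_idx)
```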
#-------------------------------------------------------
# Grid Search and Random Search:
# Both are hyperparameter tuning techniques.

# Grid Search: Exhaustively searches through a predefined hyperparameter grid to find 
# the best combination of hyperparameters.
# Random Search: Randomly samples from the hyperparameter space, allowing more efficient exploration.

# Other Hyperparameter Tuning Techniques:
# Bayesian Optimization: Models the function relating hyperparameters to performance and chooses next values to evaluate.
# Genetic Algorithms: Evolves a population of hyperparameter combinations over several generations.
# Optuna, Hyperopt: Libraries that automate hyperparameter optimization using various strategies.
# These techniques help ensure models are well-tuned and properly evaluated.
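
# A sketch of grid search versus random search, assuming scikit-learn.
# The parameter grid, the random forest model, and n_iter=4 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

param_grid = {"n_estimators": [10, 30, 50], "max_depth": [2, 4, None]}

# Grid search: tries all 3 x 3 = 9 combinations
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: samples only 4 of the 9 combinations
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)

print("Grid search best:", grid.best_params_, grid.best_score_)
print("Random search best:", rand.best_params_, rand.best_score_)
```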