## Dimensionality Reduction Techniques 

- Dimensionality reduction involves decreasing the number of features (or dimensions) in a dataset while preserving as much information as possible. This technique is used for various purposes, such as simplifying a model, enhancing the performance of a learning algorithm, or making the data easier to visualize.

## Importance of Dimensionality Reduction

1. **Improves computational efficiency:** Reduces the computational cost for data processing and model training.
2. **Mitigates the curse of dimensionality:** Simplifies data to prevent overfitting and sparsity issues.
3. **Reduces noise in data:** Eliminates irrelevant or noisy features to enhance model performance.
4. **Enhances data visualization:** Makes high-dimensional data easier to visualize in 2D or 3D.
5. **Boosts model performance:** Focuses on the most relevant features for better accuracy.
6. **Saves storage and memory:** Decreases the amount of storage and memory needed for large datasets.
7. **Increases model interpretability:** Simplifies models, making them easier to understand and explain.
8. **Avoids multicollinearity:** Addresses high correlation between features to improve regression models.

## Approaches to Dimensionality Reduction

- There are two approaches to dimensionality reduction: feature selection and feature extraction.

- `Feature Selection`:

    - Feature selection is the process of choosing a subset of relevant features and discarding irrelevant ones from a dataset to build a more accurate model. The goal is to keep only the most useful features of the input data.

- Three methods are used for feature selection:

- **Filter Methods:** These methods score each feature with a statistical measure, independently of any model, and retain only the relevant features.

  Common techniques include: Correlation, Chi-Square Test, and ANOVA (these techniques are already covered in the ADSP course).

- **Wrapper Methods:** These methods evaluate subsets of features using a machine learning model. Features are added or removed based on their impact on model performance. They are usually more accurate but also more computationally expensive than filter methods.

  Common techniques include: Forward Selection and Backward Selection.

- **Embedded Methods:** These methods evaluate the importance of features during the training process of the machine learning model.

  Common techniques include: LASSO, Elastic Net, and Ridge Regression (these techniques are covered in detail in the regression lesson). Note that LASSO and Elastic Net can shrink coefficients exactly to zero and so drop features, whereas Ridge Regression only shrinks them.
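
As an illustration of a filter method, the following minimal sketch ranks features by the absolute value of their Pearson correlation with the target and keeps the top `k`. The toy data and the helper name `correlation_filter` are assumptions made for this example only.

```python
import numpy as np

# Hypothetical toy data: three features, only the first one is related to the target
rng = np.random.default_rng(0)
n = 200
x_signal = rng.normal(size=n)
x_noise_1 = rng.normal(size=n)
x_noise_2 = rng.normal(size=n)
X = np.column_stack([x_signal, x_noise_1, x_noise_2])
y = 3.0 * x_signal + rng.normal(scale=0.5, size=n)

def correlation_filter(X, y, k):
    """Return the indices of the k features with the largest |Pearson r| against y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    top_k = np.argsort(scores)[::-1][:k]
    return top_k, scores

selected, scores = correlation_filter(X, y, k=1)
print("Correlation scores:", np.round(scores, 3))
print("Selected feature indices:", selected)
```

No model is trained here, which is what distinguishes a filter method from wrapper and embedded methods.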


- `Feature Extraction`:

    - Feature extraction is the process of transforming high-dimensional data into a lower-dimensional space. This approach is useful for retaining essential information while using fewer resources for processing.

Some common feature extraction techniques are:

- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Independent Component Analysis (ICA)


## Principal Component Analysis (PCA)

- Principal Component Analysis (PCA) is a statistical technique for dimensionality reduction in data analysis. It simplifies the complexity of high-dimensional data while preserving trends and patterns.

- Real-world problems often involve datasets with a very large number of features.

- Example: Classifying high-resolution images, or allocating power across multiple communication channels, involves high-dimensional data. Dealing with such datasets demands more computational power and more complex algorithms.

- PCA is an unsupervised learning technique used to preprocess datasets and reduce their dimensionality while preserving as much of the original dataset's variance as possible.

## Common Terms in PCA 

- Dimensionality: It is the number of features present in the data.

- Correlation: It indicates the strength and direction of the linear relationship between two features. The correlation value ranges between -1 and +1: it is -1 for a perfect inverse linear relationship, +1 for a perfect direct linear relationship, and 0 when there is no linear relationship.

- Orthogonality: Two vectors are orthogonal when they are perpendicular (their dot product is zero). PCA produces orthogonal principal component axes, which keeps the new features uncorrelated when reducing the number of dimensions in a dataset.

- Covariance Matrix: It is a square matrix containing the covariance between every pair of variables, with each variable's variance on the diagonal.

- Variance: It is a measure of the variability or spread of a single variable, indicating how much the values differ from the mean.

- Eigenvector: Given a square matrix  𝐴 , a nonzero vector  𝑣  is an eigenvector of  𝐴  if  𝐴𝑣  (the result of applying matrix  𝐴  to  𝑣 ) is a scalar multiple of  𝑣 , i.e.  𝐴𝑣  =  𝜆𝑣 , where  𝜆  is the corresponding eigenvalue.

- Eigenvalue: The scalar  𝜆  associated with the eigenvector  𝑣  in the transformation  𝐴𝑣  =  𝜆𝑣 , indicating how much the eigenvector is scaled during the transformation. In PCA, where  𝐴  is the covariance matrix, each eigenvalue equals the variance captured by the corresponding principal component, so it indicates that component's importance in explaining the data's variability.

- Principal component: Principal components are new variables created as linear combinations of the original variables, arranged to be uncorrelated and to compress most of the information into the initial components. For example, 10-dimensional data yields 10 principal components, and PCA puts as much information (variance) as possible into the first component, as much of the remainder as possible into the second, and so on.

## Steps involved in PCA 

- Standardization
- Covariance matrix computation
- Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
- Create a Feature vector
- Recasting data along the principal component axes

## Standardization 
- Standardization adjusts the range of variables so that each one contributes equally, ensuring uniformity in their influence. 
    - Standardization transforms data by rescaling it to have mean of 0 and a standard deviation of 1, ensuring consistent ranges and making it more suitable for comparison and analysis across different variables. 
    - This normalization process mitigates the dominance of variables with larger ranges over those with smaller values.
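
A minimal sketch of standardization with NumPy, assuming a small made-up matrix whose two features live on very different scales. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0],
              [4.0, 4000.0]])

mean = X.mean(axis=0)   # per-feature mean
std = X.std(axis=0)     # per-feature population standard deviation (ddof=0)
X_std = (X - mean) / std

print(X_std)
print("Means after scaling:", X_std.mean(axis=0))
print("Standard deviations after scaling:", X_std.std(axis=0))
```

After scaling, both columns have the same values, so neither feature dominates because of its units.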

## Covariance Matrix Computation
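- The covariance matrix shows how each pair of features in a dataset varies together, which helps to check the correlation between them.

- Types of covariance:
    - Positive covariance indicates a direct correlation: the two features tend to increase or decrease together.
    - Negative covariance indicates an inverse correlation: one feature tends to increase as the other decreases.

- The covariance matrix provides a summary of these relationships in a tabular representation: entry (i, j) holds the covariance between features i and j, and the diagonal holds each feature's variance.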
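
The covariance matrix can be computed by hand and checked against `np.cov`. This is a sketch on made-up data in which the second feature rises with the first and the third falls as the first rises.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features
X = np.array([[1.0, 2.0, 9.0],
              [2.0, 4.1, 7.0],
              [3.0, 5.9, 6.0],
              [4.0, 8.2, 3.0],
              [5.0, 9.8, 1.0]])

# Center each feature, then average the products of deviations (sample covariance, n - 1)
X_centered = X - X.mean(axis=0)
n_samples = X.shape[0]
cov_manual = X_centered.T @ X_centered / (n_samples - 1)

# NumPy's built-in version; rowvar=False means columns are the variables
cov_numpy = np.cov(X, rowvar=False)

print(np.round(cov_manual, 3))
```

The sign of each off-diagonal entry shows the direction of the relationship: positive between features 0 and 1, negative between features 0 and 2.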

## Identifying Principal Components

- Eigenvectors and eigenvalues, computed from the covariance matrix, determine the principal components of data. Each eigenvector, paired with a corresponding eigenvalue, represents an axis direction along which the data variance is maximized (subject to being orthogonal to the previous directions); these directions are the principal components. The eigenvalues indicate the amount of variance each component carries. By ordering the eigenvectors from highest to lowest eigenvalues, you rank the principal components by their significance.
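
The ranking described above can be sketched with NumPy. The synthetic data is an assumption for illustration. `np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted in descending order.

```python
import numpy as np

# Hypothetical data: two strongly correlated features plus one independent feature
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(100, 1)),
               2 * base + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # columns of eigenvectors are the eigenvectors

# Rank components from highest to lowest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance carried by each component
explained_variance_ratio = eigenvalues / eigenvalues.sum()
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Explained variance ratio:", np.round(explained_variance_ratio, 3))
```

Each column `v` of `eigenvectors` satisfies `cov @ v = eigenvalue * v`, and the columns are mutually orthogonal.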

## Create a feature vector

- Decide whether to retain all components or to discard the less significant ones (those with lower eigenvalues). The remaining, more significant eigenvectors then form a matrix known as the feature vector.

- The feature vector is simply a matrix whose columns are the eigenvectors of the components that we decide to keep. This makes it the first step towards dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final dataset will have only p dimensions.

## Recasting Data Along Principal Component Axes

- In this step, the feature vector formed from the eigenvectors of the covariance matrix is used to reorient the data from the original axes to the ones represented by the principal components. This is done by multiplying the standardized data by the feature vector.
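
A minimal sketch of the last two steps, assuming made-up data with n = 4 features and keeping p = 2 components. The variable names `feature_vector` and `X_pca` are chosen for this example.

```python
import numpy as np

# Hypothetical data: 50 samples, 4 features, with feature 1 nearly redundant with feature 0
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X[:, 1] = 2 * X[:, 0] + 0.05 * rng.normal(size=50)

# Standardize, then eigendecompose the covariance matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]   # indices from largest to smallest eigenvalue

# Feature vector: the p most significant eigenvectors as columns, shape (n, p)
p = 2
feature_vector = eigenvectors[:, order[:p]]

# Recast the data along the principal component axes, shape (samples, p)
X_pca = X_std @ feature_vector
print(X_pca.shape)
```

The covariance matrix of `X_pca` is diagonal, with the retained eigenvalues on the diagonal, which confirms that the new features are uncorrelated.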

## Application of PCA

- PCA compresses information into a smaller set of new dimensions.

    - In neuroscience, it helps identify the action potentials of neurons by their shape.
    - In quantitative finance, it reduces the complexity of stock analysis.
     



In [None]:
# Implements PCA

#This dataset consists of 2000 samples with 8 features: preg, plas, pres, skin, insu, mass, pedi, and age. 
# Each sample includes a target variable class, which indicates whether the sample tested positive or negative for a condition.

import pandas as pd

# Load the dataset
data = pd.read_csv('diabetes.csv')
data.head()

# Split the dataset into features and target
X = data.drop(columns=['class'])
y = data['class'].apply(lambda x: 1 if x == 'tested_positive' else 0)

# the dataset is split into training (60%) and testing (40%) sets. 
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# apply the PCA

# Import PCA from sklearn.decomposition.
# Find the optimal number of principal components
# Instantiate a PCA object with the optimal number of components.
# Fit PCA to the scaled data.
# Transform the scaled data using PCA.

# Apply PCA without reducing dimensionality to find the optimal number of components
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np
pca = PCA()
pca.fit(X_train_scaled)

# Explained variance ratios
explained_variance = pca.explained_variance_ratio_


plt.figure(figsize=(8, 4))
plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.5, align='center',
        label='Individual explained variance')
plt.step(range(1, len(explained_variance) + 1), np.cumsum(explained_variance), where='mid',
         label='Cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.title('Explained Variance by Different Principal Components')
plt.legend(loc='best')
plt.tight_layout()
plt.show()


#Observations
# The plot shows that each of the 8 principal components explains a similar amount of variance individually, 
# while the cumulative explained variance steadily increases, approaching 100% by the 8th component.

# Eigenvalues (which are proportional to the explained variance)
eigenvalues = pca.explained_variance_

# Scree plot
plt.figure(figsize=(8, 4))
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker='o', linestyle='-', label='Eigenvalues')
plt.xlabel('Principal components')
plt.ylabel('Eigenvalues')
plt.title('Scree Plot')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

# Observation 
# The scree plot shows the eigenvalues, representing the variance explained by each of the eight principal components.
# The eigenvalues decrease sharply from the 1st to the 2nd principal component, continue to decline at a slower rate up to the 3rd component, and then begin to level off from the 4th component onwards. 
# This pattern indicates that the first few components capture most of the variance in the data, 
# with diminishing returns for each additional component beyond the 3rd or 4th. Thus, the plot suggests that the optimal number of principal components is around 3 or 4.

import pandas as pd
pca = PCA(n_components=3)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Create DataFrames of the principal components
# Use .values for the targets: y_train and y_test keep their original (shuffled) row index,
# so assigning the Series directly would misalign with the new 0..n-1 index and produce NaNs
df_train_pca = pd.DataFrame(data=X_train_pca, columns=['Principal Component 1', 'Principal Component 2', 'Principal Component 3'])
df_train_pca['Target'] = y_train.values

df_test_pca = pd.DataFrame(data=X_test_pca, columns=['Principal Component 1', 'Principal Component 2', 'Principal Component 3'])
df_test_pca['Target'] = y_test.values

#visualize the results for training and testing set. 
# Plotting the 3D scatter plot
fig = plt.figure(figsize=(12, 8))

# Training set
ax1 = fig.add_subplot(121, projection='3d')
colors = ['r', 'b']
for target, color in zip([0, 1], colors):
    indices = df_train_pca['Target'] == target
    ax1.scatter(df_train_pca.loc[indices, 'Principal Component 1'],
                df_train_pca.loc[indices, 'Principal Component 2'],
                df_train_pca.loc[indices, 'Principal Component 3'],
                c=color, s=50)
ax1.set_xlabel('Principal Component 1')
ax1.set_ylabel('Principal Component 2')
ax1.set_zlabel('Principal Component 3')
ax1.legend(['No Diabetes', 'Diabetes'])
ax1.set_title('PCA of Diabetes Dataset (Training set)')
ax1.grid()

# Testing set
ax2 = fig.add_subplot(122, projection='3d')
for target, color in zip([0, 1], colors):
    indices = df_test_pca['Target'] == target
    ax2.scatter(df_test_pca.loc[indices, 'Principal Component 1'],
                df_test_pca.loc[indices, 'Principal Component 2'],
                df_test_pca.loc[indices, 'Principal Component 3'],
                c=color, s=50)
ax2.set_xlabel('Principal Component 1')
ax2.set_ylabel('Principal Component 2')
ax2.set_zlabel('Principal Component 3')
ax2.legend(['No Diabetes', 'Diabetes'])
ax2.set_title('PCA of Diabetes Dataset (Testing set)')
ax2.grid()

plt.show()

## Linear Discriminant Analysis (LDA)

- Linear discriminant analysis (LDA) is a technique used for dimensionality reduction and classification. It aims to project the data onto a lower-dimensional space in such a way that the separation between different classes is maximized. LDA focuses on finding a linear combination of features that best separate two or more classes of objects or events.

- LDA assumes the features within each class follow a Gaussian distribution.
- It assumes that the covariance matrices of different classes are equal.
- It works best when the classes are approximately linearly separable, because it classifies with a linear decision boundary.
- It can reduce the dimensionality of the data to a maximum of  𝑘−1  components (and no more than the number of features), where  𝑘  is the number of classes in the target variable.
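
For two classes, the single LDA direction can be sketched directly with NumPy as Fisher's linear discriminant. This is an illustration on synthetic data that satisfies the equal-covariance assumption; it is not the scikit-learn implementation.

```python
import numpy as np

# Hypothetical two-class data with the same covariance in both classes
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))

mean0, mean1 = class0.mean(axis=0), class1.mean(axis=0)

# Within-class scatter: spread of each class around its own mean, summed over classes
S_w = (class0 - mean0).T @ (class0 - mean0) + (class1 - mean1).T @ (class1 - mean1)

# Fisher's direction: separates the class means relative to the within-class scatter
w = np.linalg.solve(S_w, mean1 - mean0)
w /= np.linalg.norm(w)

# Project both classes onto the single discriminant axis (k - 1 = 1 component)
proj0 = class0 @ w
proj1 = class1 @ w
print("Projected class means:", proj0.mean(), proj1.mean())
```

Unlike PCA, the direction is chosen using the class labels, so it maximizes class separation rather than overall variance.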

In [None]:
# Implement LDA

# Instantiate an LDA object with one component.
# Fit LDA to the scaled data.
# Transform the scaled data using LDA.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=1)
X_train_lda = lda.fit_transform(X_train_scaled, y_train)
X_test_lda = lda.transform(X_test_scaled)

# Create DataFrames
# Use .values so the targets are assigned by position rather than by their original (shuffled) row index
df_train_lda = pd.DataFrame(data=X_train_lda, columns=['LDA Component 1'])
df_train_lda['Target'] = y_train.values

df_test_lda = pd.DataFrame(data=X_test_lda, columns=['LDA Component 1'])
df_test_lda['Target'] = y_test.values

# Visualize the results for training and testing set

import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 5))

# Training set
plt.subplot(1, 2, 1)
colors = ['r', 'b']
for target, color in zip([0, 1], colors):
    indices = df_train_lda['Target'] == target
    plt.scatter(df_train_lda.loc[indices, 'LDA Component 1'], np.zeros_like(df_train_lda.loc[indices, 'LDA Component 1']),
                c=color, s=50)
plt.xlabel('LDA Component 1')
plt.ylabel('Constant zero line')
plt.legend(['No Diabetes', 'Diabetes'])
plt.title('LDA of Diabetes Dataset (Training set)')
plt.grid()

# Testing set
plt.subplot(1, 2, 2)
for target, color in zip([0, 1], colors):
    indices = df_test_lda['Target'] == target
    plt.scatter(df_test_lda.loc[indices, 'LDA Component 1'], np.zeros_like(df_test_lda.loc[indices, 'LDA Component 1']),
                c=color, s=50)
plt.xlabel('LDA Component 1')
plt.ylabel('Constant zero line')
plt.legend(['No Diabetes', 'Diabetes'])
plt.title('LDA of Diabetes Dataset (Testing set)')
plt.grid()

plt.show()

# Observation
# In both plots, the LDA component is plotted along the x-axis.
# The points are colored according to their target labels: red for no diabetes and blue for diabetes.
# Because the projection is one-dimensional, all points are drawn at y = 0; the y-axis carries no information.

## t-Distributed Stochastic Neighbor Embedding (t-SNE)
- t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction algorithm that uses a randomized approach to non-linearly reduce the dimensionality of a dataset. It focuses on retaining the local structure of the data in the lower-dimensional space.

- This algorithm helps explore high-dimensional data by mapping it into lower dimensions while preserving local relationships. As a result, we can visualize and understand the structure of the dataset by plotting it in 2D or 3D.

- The example below uses the MNIST dataset, which contains 60,000 training images and 10,000 test images of handwritten digits (0-9).
- The labels `y` are used only to color the points in the visualization; they are not used to fit t-SNE.

In [None]:
df = pd.read_csv('mnist.csv')

# Pixel values
X = df.iloc[:, 1:].values  

#Labels
y = df.iloc[:, 0].values  

df.head()

# Applying t-SNE
# TSNE from sklearn.manifold is used to reduce the data to 2 dimensions for visualization.
# Only a subset of 1000 samples is used for performance reasons (t-SNE can be computationally intensive).

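import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

# The iteration count is left at its default of 1000; the parameter is named n_iter in older
# scikit-learn releases and max_iter in newer ones, so omitting it keeps the code portable
model = TSNE(n_components=2, random_state=42)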

tsne_data = model.fit_transform(X[:1000])

# Creating a new DataFrame to help us in plotting the result data
tsne_data = np.vstack((tsne_data.T, y[:1000])).T
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "label"))

# Plotting the result of t-SNE
sns.FacetGrid(tsne_df, hue="label", height=6).map(plt.scatter, "Dim_1", "Dim_2").add_legend()
plt.show()


## Association Rule Learning

- Association Rule Learning is a popular unsupervised learning technique used to uncover relationships, patterns, or associations among a set of items in large datasets. This technique is commonly used in market basket analysis, where the goal is to identify sets of products that frequently co-occur in transactions.

- The two key concepts in association rule learning are frequent itemsets and association rules.

- Frequent Itemsets:

    - These are groups of items that appear frequently together in transactions.
    - The frequency is measured by the support count, which is the number of transactions containing the itemset.

- Association Rules:

    - These are implications of the form  𝐴,𝐵→𝐶 , meaning that if items A and B are bought, then item C is likely to be bought.

    - Support: The proportion of transactions that contain the itemset or how frequently an item appears in the dataset.

    - Confidence: The probability that a transaction containing the antecedent also contains the consequent or how often the rule has been found to be true.

    - Lift: The ratio of the observed support to that expected if the items were independent. A lift greater than 1 indicates a positive association.
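
The three metrics can be computed by hand on a handful of transactions. The following is a minimal sketch in plain Python; the transactions and the helper names `support`, `confidence`, and `lift` are made up for this example.

```python
# Hypothetical transactions, each represented as a set of items
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the support of the consequent (1 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

antecedent, consequent = {"bread", "milk"}, {"butter"}
print("support:", support(antecedent | consequent, transactions))
print("confidence:", confidence(antecedent, consequent, transactions))
print("lift:", lift(antecedent, consequent, transactions))
```

Here the rule bread, milk → butter has a lift slightly above 1, so buying bread and milk makes butter somewhat more likely than its baseline frequency.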

## Practical Application: 

1. **Market Basket Analysis:** Identifying products that are frequently bought together to optimize product placement and promotions.
2. **Web Usage Mining:** Analyzing user navigation patterns to improve website design and content recommendation.
3. **Bioinformatics:** Discovering relationships between genes and proteins.
4. **Fraud Detection:** Identifying patterns in fraudulent transactions.

## Popular Algorithms:

**Apriori Algorithm:**
- It uses breadth-first search and a hash tree structure to count candidate itemsets efficiently.
- Generates frequent itemsets by iteratively expanding smaller itemsets.
- Uses the _Apriori Property_ which states that all non-empty subsets of a frequent itemset must also be frequent.

**Eclat Algorithm:**

- Uses a depth-first search strategy to find frequent itemsets.
- It is more efficient for dense datasets.

## Apriori Algorithm 

- The Apriori algorithm is a classic algorithm used for mining frequent itemsets and learning association rules over transactional databases. It is an unsupervised learning technique, typically used in market basket analysis to find interesting relationships between items in large datasets.

- The algorithm operates by identifying the frequent individual items in the database and extending them to larger itemsets as long as those itemsets appear sufficiently often in the database.
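
The level-wise idea can be sketched in plain Python. This is a simplified illustration, not the `mlxtend` implementation: it recounts support by scanning all transactions and omits the hash tree. The function name `apriori_frequent_itemsets` and the toy transactions are assumptions for this example.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

result = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, the pair milk and butter is not frequent, so the three-item candidate is pruned without ever being counted.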




In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Load the dataset
data = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

# Convert the DataFrame to a list of lists
transactions = []
for i in range(data.shape[0]):
    transactions.append([str(data.values[i, j]) for j in range(data.shape[1]) if str(data.values[i, j]) != 'nan'])

# Transaction Encoding: 
# use the TransactionEncoder from the mlxtend.preprocessing module to convert the list of lists into a one-hot encoded DataFrame. 
# In this format, each column represents an item, and each row represents a transaction, with binary values indicating whether an item was purchased in that transaction.

# Initialize the TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Display the first few rows of the one-hot encoded DataFrame
print(df.head())

# applying the apriori algorithm: 

# The apriori function from the mlxtend.frequent_patterns module was used to find frequent itemsets. We specified a minimum support threshold of 0.01 (1%), meaning that an itemset must appear in at least 1% of transactions to be considered frequent.
# The result is a DataFrame where each row represents a frequent itemset, and the columns provide the support (proportion of transactions containing the itemset) and the itemsets themselves.

from mlxtend.frequent_patterns import apriori, association_rules

# Apply the Apriori algorithm with a minimum support of 0.01 (1%)
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

frequent_itemsets.head(5)

#Rules
# The association_rules function was used to generate association rules from the frequent itemsets. We specified a minimum confidence threshold of 0.2 (20%), 
# meaning that the rules must have a confidence of at least 20% to be considered.
# The result is a DataFrame where each row represents an association rule, and the columns provide various metrics related to the rule.

# Generate the association rules with a minimum confidence of 0.2 (20%)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)

# Display the results
print("Frequent Itemsets:")
print(frequent_itemsets.head())
print("\nAssociation Rules:")
print(rules.head())



## Eclat Algorithm

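- The ECLAT algorithm, which stands for Equivalence Class Clustering and bottom-up Lattice Traversal, is a widely-used method for association rule mining. It is often more efficient and scalable than the Apriori algorithm.
- While Apriori works on a horizontal layout (each transaction lists its items) and explores itemsets in a manner similar to Breadth-First Search in a graph, ECLAT works on a vertical layout (each item lists the IDs of the transactions that contain it) and explores itemsets in a manner akin to Depth-First Search. The support of an itemset is then obtained by intersecting transaction-ID sets rather than rescanning the database, which typically makes ECLAT faster than Apriori.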
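
The vertical, depth-first idea can be sketched in plain Python. This is a simplified illustration, not the `pyECLAT` implementation; the function name `eclat` and the toy transactions are assumptions for this example. Support is expressed as a count of transactions.

```python
def eclat(transactions, min_support_count):
    """Return {itemset tuple: support count} using transaction-ID set intersections."""
    # Vertical layout: item -> set of IDs of the transactions that contain it
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset) pairs that can extend the current prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Depth-first: intersect with the remaining items to grow this itemset
            next_candidates = []
            for other_item, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support_count:
                    next_candidates.append((other_item, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

frequent = eclat(transactions, min_support_count=2)
for itemset, count in frequent.items():
    print(itemset, count)
```

The database is scanned once to build the transaction-ID sets; every later support count is just the size of a set intersection.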

In [None]:
import pandas as pd
from pyECLAT import ECLAT
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

# Convert the DataFrame to a list of lists
transactions = []
for i in range(data.shape[0]):
    transactions.append([str(data.values[i, j]) for j in range(data.shape[1]) if str(data.values[i, j]) != 'nan'])
    
    
# Split the transactions into training and testing sets
train_transactions, test_transactions = train_test_split(transactions, test_size=0.2, random_state=42)

# Create DataFrames from the list of lists for training and testing sets
train_df = pd.DataFrame(train_transactions)
test_df = pd.DataFrame(test_transactions)



# Perform ECLAT algorithm using pyECLAT on the training set
eclat_instance = ECLAT(data=train_df, verbose=True)

# Get the frequent itemsets with a minimum support of 0.01 (1%) on the training set
# fit() returns two dicts keyed by itemset; the first maps each itemset to the row indexes
# of the transactions that contain it
itemset_indexes, frequent_itemsets = eclat_instance.fit(min_support=0.01, min_combination=1, max_combination=2)

# Convert frequent itemsets to a DataFrame for better readability
# ECLAT was fit on the training split, so support is measured against the training set size
total_transactions = len(train_transactions)

frequent_itemsets_df = pd.DataFrame({
    'Itemset': list(frequent_itemsets.keys()),
    'Support': [len(itemset_indexes[item]) / total_transactions for item in frequent_itemsets.keys()]
})

# Sort the DataFrame by 'Support' in descending order
frequent_itemsets_df_sorted = frequent_itemsets_df.sort_values(by='Support', ascending=False)

frequent_itemsets_df_sorted.head(100)


**Observation:**

**Mineral Water (0.193574):** The highest support value in this snippet, indicating that mineral water appears in about 19.36% of all transactions. This suggests it's a very popular item among customers.

**Eggs (0.142781):** Also showing high popularity, eggs are included in approximately 14.28% of transactions.

**Spaghetti (0.139315), French Fries (0.135715), and Chocolate (0.130516):** These items are also commonly purchased, each appearing in about 13-14% of transactions, reflecting their strong customer demand.

**Cooking Oil & Mineral Water (0.016798), Meatballs (0.016798), Almonds (0.016798):** These itemsets show a much lower support, appearing in about 1.68% of transactions. The combination of cooking oil and mineral water might indicate a specific usage pattern or a niche but relevant market segment.



## Anomaly Detection Techniques 

- Anomaly detection is a technique used to identify rare items, events, or outliers that differ significantly from the majority of the data. In unsupervised learning, anomaly detection is particularly challenging because there are no labeled examples of anomalies to guide the learning process.

## Isolation Forest

- Isolation Forest is an unsupervised learning algorithm for anomaly detection that works by isolating observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. The key idea is that anomalies are few and different, so they are more susceptible to isolation.

- **Unique Approach:** Isolation Forest does not rely on proximity measures like traditional methods.
- **Random Feature Selection:** It randomly selects features and splits them at random values.
- **Isolation Process:** This process creates partitions or "trees" to isolate individual data points.
- **Anomaly Detection:** Anomalies, being fewer and further from the norm, typically require fewer splits to isolate.
- **Efficiency:** This makes anomalies easier and faster to detect compared to normal observations.
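
The isolation idea can be illustrated with a small sketch in plain Python. It is a simplification, not scikit-learn's `IsolationForest`: it does not subsample the data or convert path lengths into a normalized anomaly score. The function names and the synthetic data are assumptions for this example.

```python
import random

def isolation_path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` from the rest of `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Randomly select a feature, then a split value between its minimum and maximum
    feature = rng.randrange(len(point))
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth  # simplification: stop if the chosen feature cannot be split
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as the point
    same_side = [row for row in data if (row[feature] < split) == (point[feature] < split)]
    return isolation_path_length(point, same_side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=200, seed=0):
    """Average the path length over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical data: a 2-D Gaussian cluster plus one far-away point
data_rng = random.Random(42)
normal_points = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(100)]
outlier = (10.0, 10.0)
data = normal_points + [outlier]

# A typical point: the cluster member closest to the origin
typical = min(normal_points, key=lambda p: p[0] ** 2 + p[1] ** 2)

outlier_depth = average_path_length(outlier, data)
typical_depth = average_path_length(typical, data)
print("Average splits to isolate the outlier:", outlier_depth)
print("Average splits to isolate a typical point:", typical_depth)
```

The outlier is isolated after far fewer splits on average than the typical point, which is the signal Isolation Forest turns into an anomaly score.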


In [None]:
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the dataset
file_path = 'credit_card_fraud.csv'  # Update the path to your file
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
print(df.head())

# Features for training
features = ['V1', 'V2', 'V3', 'V4', 'V5', 'Amount']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[features], df['Class'], test_size=0.2, random_state=42)

print("Training features:", X_train.columns)
print("Testing features:", X_test.columns)

# Fit the Isolation Forest model
iso_forest = IsolationForest(contamination=0.01, random_state=42)
iso_forest.fit(X_train)


# Predict anomalies (-1 for anomalies, 1 for normal points) on the training set
train_anomaly_predictions = iso_forest.predict(X_train)
train_anomaly_scores = iso_forest.decision_function(X_train)

# Predict anomalies on the test set
test_anomaly_predictions = iso_forest.predict(X_test)
test_anomaly_scores = iso_forest.decision_function(X_test)

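# Add predictions and scores to a copy of the test set (avoids pandas' SettingWithCopyWarning)
# Lower decision_function scores indicate more anomalous points
X_test = X_test.copy()
X_test['Anomaly'] = test_anomaly_predictions
X_test['Anomaly Score'] = test_anomaly_scores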

# Evaluate the results on the test set
print(X_test['Anomaly'].value_counts())

# Plot the results for the test set
plt.figure(figsize=(6, 4))
plt.scatter(X_test['Amount'], X_test['Anomaly Score'], c=y_test, cmap='coolwarm')
plt.xlabel('Amount')
plt.ylabel('Anomaly Score')
plt.title('Isolation Forest Anomaly Detection (Credit Card Fraud) on Test Data')
plt.show()

## Conclusion


In conclusion, unsupervised learning techniques are powerful tools for exploring and extracting meaningful insights from unlabeled data. By allowing algorithms to autonomously identify patterns and structures within datasets, you can uncover hidden relationships, detect anomalies, and gain a deeper understanding of the underlying data distribution. From clustering and dimensionality reduction to association rule learning, the applications of unsupervised learning are vast and diverse, spanning fields such as data analysis, pattern recognition, and anomaly detection.