In [None]:
# Assignment: PCA Implementation

"""Objective:
The objective of this assignment is to implement PCA on a given dataset and analyse the results.

Instructions:
Download the wine dataset from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Wine).
Load the dataset into a Pandas dataframe.
Split the dataset into features and target variables.
Perform data preprocessing (e.g., scaling, normalisation, missing value imputation) as necessary.
Implement PCA on the preprocessed dataset using the scikit-learn library.
Determine the optimal number of principal components to retain based on the explained variance ratio.
Visualise the results of PCA using a scatter plot.
Perform clustering on the PCA-transformed data using the K-Means clustering algorithm.
Interpret the results of PCA and clustering analysis.


Deliverables:
Jupyter notebook containing the code for the PCA implementation.
A report summarising the results of PCA and clustering analysis.
Scatter plot showing the results of PCA.
A table showing the performance metrics for the clustering algorithm.

Additional Information:
You can use the Python programming language.
You can use any other machine learning libraries or tools as necessary.
You can use any visualisation libraries or tools as necessary."""

# Ans:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Load the wine dataset
wine_df = pd.read_csv("wine.csv")

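# Drop rows with missing values before splitting so features and target stay aligned
wine_df = wine_df.dropna().reset_index(drop=True)

# Split the dataset into features and target variables
# Assumes wine.csv has a header row and the class label in the last column;
# the raw UCI wine.data file has no header and stores the class in the first column.
features = wine_df.iloc[:, :-1]
target = wine_df.iloc[:, -1]

# Perform data preprocessing: standardise each feature to zero mean and unit variance
features = (features - features.mean()) / features.std()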

# Implement PCA
pca = PCA()
pca.fit(features)

# Determine the optimal number of principal components to retain
explained_variance_ratio = pca.explained_variance_ratio_
cumsum_explained_variance_ratio = np.cumsum(explained_variance_ratio)
optimal_number_of_components = np.argmax(cumsum_explained_variance_ratio >= 0.95) + 1

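# Visualize the results of PCA
import matplotlib.pyplot as plt

# Keep only the retained components and label them PC1, PC2, ...
pca_components = pca.transform(features)[:, :optimal_number_of_components]
pca_df = pd.DataFrame(pca_components, columns=[f"PC{i + 1}" for i in range(optimal_number_of_components)])
# Align the target with the rows that survived preprocessing
pca_df["target"] = target.loc[features.index].to_numpy()

# Scatter plot of the first two principal components, coloured by class
plt.scatter(pca_df["PC1"], pca_df["PC2"], c=pd.factorize(pca_df["target"])[0])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Wine dataset projected onto the first two principal components")
plt.show()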

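# Perform clustering on the PCA-transformed data
# Fit on the principal components only; including the target column would leak the class labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(pca_df.drop(columns="target"))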

# Interpret the results of PCA and clustering analysis
print(kmeans.labels_)
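
The deliverables ask for a table of clustering performance metrics, which printing the raw labels does not provide. The following is a minimal, self-contained sketch of one way to build such a table. It makes several assumptions that are not in the solution above: scikit-learn's bundled `load_wine` copy of the UCI data stands in for `wine.csv`, `StandardScaler` replaces the pandas standardisation (so values may differ slightly), and the silhouette score and adjusted Rand index are chosen as metrics because the assignment does not name any.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled wine data stands in for wine.csv
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components explaining at least 95% of the variance
X_pca = PCA(n_components=0.95).fit_transform(X_scaled)

# Cluster on the retained components only (the class labels are never passed in)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_pca)

# Silhouette needs no ground truth; the adjusted Rand index compares clusters with the known classes
metrics_df = pd.DataFrame({
    "metric": ["silhouette score", "adjusted Rand index"],
    "value": [silhouette_score(X_pca, labels), adjusted_rand_score(y, labels)],
})
print(metrics_df.to_string(index=False))

# Cross-tabulation of true classes against cluster assignments, useful for interpretation
print(pd.crosstab(pd.Series(y, name="true class"), pd.Series(labels, name="cluster")))
```

Cluster numbers are arbitrary, so the cross-tabulation should be read by looking for one dominant cluster per true class rather than for matching indices.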
