Before building any models or making predictions, we must ensure our dataset is ready. Here's the problem: our data comes with over 50 variables (columns), and having too many can impact our model's efficiency. we can reduce too-many varibales with Principal Component Analysis (PCA). However, before we reduce variables, we need to complete the Exploratory Data Analysis (EDA) process.
This readme serves as your guide, explaining how we navigate the data, solve the variable problem using PCA, all while preserving the essence of the original dataset.
PCA on Female-Headed Households' Census Data (2011)
The Indian Census, one of the most reputed globally, has been consistently conducted every ten years since 1872. The 2011 Census marks the fifteenth in this series and provides detailed insights into India's demographics.
Our focus is on the "Primary Census Abstract for Female-Headed Households Excluding Institutional Household". This data captures significant metrics such as area, households, total population, literacy rates, work classifications, and more, across 35 States/Union Territories, encompassing 640 districts.
Given the extensive nature of this dataset with a multitude of variables, the challenge is to conduct a comprehensive Exploratory Data Analysis (EDA). The goal is to identify the optimal Principal Components that capture the most variance in the data, making use of the Sklearn library, and try to reduce the variables without reducing the essence of insights from the data
Data Source: PCA India Data Census.xlsx
Data Preprocessing & Dimensionality Reduction: Census Data Analysis Welcome to my project repository. This project entails a comprehensive journey from raw data cleaning to advanced feature transformation using PCA.
Contents:
Project explanation Delve into the step-by-step breakdown of the preprocessing techniques and dimensionality reduction methods employed in this project. The presentation provides a visual guide to the entire data analysis journey.
Download the dataset Gain access to the primary dataset used in this project. It comprises 640 rows and 61 columns, capturing various demographic metrics from the 2011 Indian Census.