Skip to content

This project entails a comprehensive journey from raw data cleaning to advanced feature transformation using PCA.

Notifications You must be signed in to change notification settings

Rishi-Solanki07/Python_Feature-engineering_PCA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 

Repository files navigation

problem statement

Before building any models or making predictions, we must ensure our dataset is ready. Here's the problem: our data comes with over 50 variables (columns), and having too many can impact our model's efficiency. we can reduce too-many varibales with Principal Component Analysis (PCA). However, before we reduce variables, we need to complete the Exploratory Data Analysis (EDA) process.

This readme serves as your guide, explaining how we navigate the data, solve the variable problem using PCA, all while preserving the essence of the original dataset.

Important Info

PCA on Female-Headed Households' Census Data (2011)

The Indian Census, one of the most reputed globally, has been consistently conducted every ten years since 1872. The 2011 Census marks the fifteenth in this series and provides detailed insights into India's demographics.

Our focus is on the "Primary Census Abstract for Female-Headed Households Excluding Institutional Household". This data captures significant metrics such as area, households, total population, literacy rates, work classifications, and more, across 35 States/Union Territories, encompassing 640 districts.

Our first priority is to reduce too-many variables(columns)

Given the extensive nature of this dataset with a multitude of variables, the challenge is to conduct a comprehensive Exploratory Data Analysis (EDA). The goal is to identify the optimal Principal Components that capture the most variance in the data, making use of the Sklearn library, and try to reduce the variables without reducing the essence of insights from the data

Data Source: PCA India Data Census.xlsx

Data Preprocessing & Dimensionality Reduction: Census Data Analysis Welcome to my project repository. This project entails a comprehensive journey from raw data cleaning to advanced feature transformation using PCA.

Contents:

Presentation on Data Preprocessing and PCA:

Project explanation Delve into the step-by-step breakdown of the preprocessing techniques and dimensionality reduction methods employed in this project. The presentation provides a visual guide to the entire data analysis journey.

Dataset (Excel File):

Download the dataset Gain access to the primary dataset used in this project. It comprises 640 rows and 61 columns, capturing various demographic metrics from the 2011 Indian Census.

About

This project entails a comprehensive journey from raw data cleaning to advanced feature transformation using PCA.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published