Problem Statement

Before building any models or making predictions, we must ensure our dataset is ready. Here's the problem: our data comes with over 50 variables (columns), and having too many can hurt a model's efficiency. We can reduce the number of variables with Principal Component Analysis (PCA). However, before we reduce variables, we need to complete the Exploratory Data Analysis (EDA) process.
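One quick EDA check that shows why a wide table is a good candidate for PCA is to count column pairs that are strongly correlated, since such columns carry largely the same information. The sketch below is illustrative only: the helper name `highly_correlated_pairs`, the 0.9 threshold, and the synthetic columns are assumptions, not taken from the project notebook.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic stand-in data (not the census file):
# "b" is nearly a rescaled copy of "a", while "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a * 2.0 + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
pairs = highly_correlated_pairs(demo)
print(pairs)
```

On the real data, a long list of such pairs would suggest that many columns are redundant and that a smaller set of components could summarize them.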

This README serves as your guide: it explains how we explore the data and address the too-many-variables problem with PCA while preserving the essence of the original dataset.

Important Info

PCA on Female-Headed Households' Census Data (2011)

The Indian Census, one of the most reputed globally, has been consistently conducted every ten years since 1872. The 2011 Census marks the fifteenth in this series and provides detailed insights into India's demographics.

Our focus is on the "Primary Census Abstract for Female-Headed Households Excluding Institutional Household". This data captures significant metrics such as area, households, total population, literacy rates, work classifications, and more, across 35 States/Union Territories, encompassing 640 districts.

Our first priority is to reduce the number of variables (columns)

Because this dataset has so many variables, the first challenge is to conduct a comprehensive Exploratory Data Analysis (EDA). The goal is then to identify the Principal Components that capture the most variance in the data, using the scikit-learn (sklearn) library, and to reduce the number of variables without losing the essence of the insights in the data.
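A minimal sketch of that selection step with scikit-learn is shown below. It standardizes the features, fits PCA, and picks the smallest number of components whose cumulative explained variance reaches a target. The 90% target, the helper name `components_for_variance`, and the synthetic 640 x 50 matrix are assumptions made for illustration; the notebook may use a different threshold or workflow.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_for_variance(X, target=0.90):
    """Standardize X, fit PCA, and return (k, cumulative), where k is the
    smallest number of components whose cumulative explained variance
    ratio reaches `target`."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # Index of the first cumulative value >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target) + 1)
    return min(k, len(cumulative)), cumulative


# Synthetic stand-in for the census table: 640 rows, 50 features
# generated from 3 hidden factors plus a little noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(640, 3))
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(640, 50))

k, cumulative = components_for_variance(X)

# Project the standardized data onto the first k components.
X_reduced = PCA(n_components=k).fit_transform(StandardScaler().fit_transform(X))
print(k, X_reduced.shape)
```

Standardizing first matters here because PCA is driven by variance: without scaling, large-magnitude columns such as population counts would dominate the components.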

Data Source: PCA India Data Census.xlsx

Data Preprocessing & Dimensionality Reduction: Census Data Analysis

Welcome to my project repository. This project entails a comprehensive journey from raw data cleaning to advanced feature transformation using PCA.

Contents:

Presentation on Data Preprocessing and PCA:

Project explanation: Delve into the step-by-step breakdown of the preprocessing techniques and dimensionality reduction methods employed in this project. The presentation provides a visual guide to the entire data analysis journey.

Dataset (Excel File):

Download the dataset: Gain access to the primary dataset used in this project. It comprises 640 rows and 61 columns, capturing various demographic metrics from the 2011 Indian Census.
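After downloading the file, a first look at the data might resemble the sketch below. The `summarize` helper and the tiny stand-in frame, including its column names, are hypothetical; the commented `pd.read_excel` call assumes the file name given under Data Source and that an Excel engine such as openpyxl is installed.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Collect basic EDA facts: shape, column types, missing values, duplicates."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    return {
        "shape": df.shape,
        "numeric_columns": numeric_cols,
        "non_numeric_columns": [c for c in df.columns if c not in numeric_cols],
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# With the real file, the frame would come from something like:
#   df = pd.read_excel("PCA India Data Census.xlsx")
# A tiny made-up frame stands in here so the snippet runs without the file.
df = pd.DataFrame({
    "State": ["A", "A", "B"],           # hypothetical column names
    "Households": [100, 250, np.nan],
    "Population": [420, 990, 610],
})
summary = summarize(df)
print(summary)
```

On the actual dataset, the reported shape should match the 640 rows and 61 columns described above, and the non-numeric columns are the ones to set aside before running PCA.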

Repository files: Codes_PCA_Feature-Engineering.ipynb, README.md

Rishi-Solanki07/Python_Feature-engineering_PCA
