Explore the H1B Visa data from 2014 through a comprehensive multivariate analysis.

ashita03/H1BVisa2014_Analysis

H1BVisa2014 Analysis

This project conducted a comprehensive multivariate analysis of H1B visa application data to uncover significant patterns and predictive insights.

About the Data

The dataset, taken from Kaggle, includes the application date, whether the H1B was approved, the approval date, the employer's location (state, zip code), and job-role details such as the title of the role and the category it falls into.

Methodology

The approach employed for this analysis involves the following steps:

  • Exploratory Data Analysis - Created box plots, scatter plots, and histograms to visualize the distribution and relationships between key variables.
  • Principal Component Analysis - Reduced the dimensionality of the dataset while retaining significant variance by determining the optimal number of principal components (PCs).
  • Cluster Analysis - Grouped job titles into distinct clusters based on their characteristics and roles.
  • Factor Analysis - Grouped variables into underlying factors to simplify complex relationships.
  • Predictive Modeling:
    • Multiple Regression - Analyzed the influence of continuous variables on application status.
    • Logistic Regression - Modeled the probability of application approval based on categorical and continuous predictors.
    • Linear Discriminant Analysis (LDA) - Provided the best predictive performance, with an accuracy of 83.33%.
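
To make the logistic regression step above more concrete, here is a minimal sketch of the idea: model the probability of approval from one continuous and one categorical (0/1) predictor, then score the predictions by accuracy. It is not the repository's R code. It is written in plain Python for illustration, and the toy rows, the feature names (`standardized wage`, `is_full_time`), the learning rate, and the number of epochs are all assumptions rather than anything taken from the actual dataset or analysis.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, x):
    """Estimated probability of approval; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data, NOT from the Kaggle dataset:
# each row is [standardized wage, is_full_time]; label 1 means "approved".
rows = [[-1.2, 0], [-0.8, 0], [-0.5, 1], [-0.1, 0],
        [0.2, 1], [0.6, 1], [0.9, 0], [1.3, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(rows, labels)

# Classify with a 0.5 probability cutoff and measure training accuracy.
predictions = [1 if predict_proba(weights, x) >= 0.5 else 0 for x in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2%}")
```

The accuracy here is computed on the same rows used for fitting, which overstates real performance; a figure meant for comparison across models would normally come from held-out data.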

Future Scope

Assess the status of applications for each visa class, incorporating additional factors and larger datasets.

Repository

The cleaned data, R Markdown (Rmd), and HTML files are available in the repository to play with!