### S14 Group 6 Members <br>
Chan, Kendrick Martin <br>
Dy, Fatima Kriselle <br>
Vitan, Layne Ashley <br>
Wu, Waynes Weyner <br>

# Introduction

This notebook explores the April 2016 Labor Force Survey (LFS) of the Philippine Statistics Authority (PSA), a nationwide dataset on the Philippine labor market. The project covers data preprocessing, exploratory data analysis (EDA), and model development to predict key labor outcomes. Three distinct machine learning models are trained and evaluated, followed by error analysis and hyperparameter tuning to improve performance. The notebook concludes with a summary of the findings and insights drawn from the analysis.

# Dataset Description

The Labor Force Survey (LFS) is a nationwide quarterly survey conducted by the Philippine Statistics Authority (PSA) to gather labor market statistics. It provides insights into the levels and trends of employment, unemployment, and underemployment in the Philippines. The data serves as a quantitative framework for developing labor-related policies and programs.

For the April 2016 LFS, data collection was carried out between April 8, 2016, and April 30, 2016. The survey followed a structured methodology to ensure data accuracy and reliability. The sampling design was based on the 2013 Master Sample Design, ensuring comprehensive representation at both the national and regional levels. Data collection was conducted through face-to-face interviews using standardized questionnaires. The process was closely supervised by Regional Directors (RDs) and Provincial Statistics Officers (PSOs) to ensure consistency and adherence to survey protocols. The survey targeted household members aged 15 and older, focusing on individuals within the labor force.
This rigorous data collection process enhances the reliability of the survey's findings, making it a valuable resource for policymakers, economists, and researchers examining the Philippine labor market.

### Dataset Structure
The dataset consists of **180,862** instances and **50** features. Each row holds one individual's responses to the survey questions, and each column holds one attribute across all respondents. The features and their descriptions are as follows.

1.   `PUFREG` - Region the participant belongs to
2.   `PUFPRV` - Code of the province where the participant resides
3.   `PUFPRRCD` - Recoded province code
4.   `PUFHHNUM` - Unique sequential number assigned to each household
5.   `PUFURB2K10` - Urban or rural classification based on the 2010 FIES survey
6.   `PUFPWGTFIN` - Final weight based on provincial projections
7.   `PUFSVYMO` - Month when the survey was conducted
8.   `PUFSVYYR` - Year when the survey was conducted
9.   `PUFPSU` - Primary Sampling Unit (PSU) number
10.   `PUFRPL` - Replication number used for variance estimation
11.   `PUFHHSIZE` - Number of household members
12.   `PUFC01_LNO` - Line number used to identify each member of the household in the survey
13.   `PUFC03_REL` - Relationship of the participant to the head of the household
14.   `PUFC04_SEX` - Sex of the participant
15.   `PUFC05_AGE` - Age of the participant
16.   `PUFC06_MSTAT` - Marital status of the participant
17.   `PUFC07_GRADE` - Educational attainment of the participant
18.   `PUFC08_CURSCH` - Indicates if the participant is currently attending school
19.   `PUFC09_GRADTECH` - Indicates whether the participant is a graduate of a technical/vocational course
20.   `PUFC10_CONWR` - Overseas Filipino Worker (OFW) Classification
21. `PUFC11_WORK` - Indicates if the participant worked at least one hour in the past week
22. `PUFC12_JOB` - Indicates if the participant had a job or business during the past week despite not working
23. `PUFC14_PROCC` - Primary occupation of the participant during the past week
24. `PUFC16_PKB` - Type of business or industry the participant is involved in
25. `PUFC17_NATEM` - Nature of employment of the participant (permanent, short-term)
26. `PUFC18_PNWHRS` - Normal working hours per day in the participant’s primary job
27. `PUFC19_PHOURS` - Total hours worked by the participant during the past week
28. `PUFC20_PWMORE` - Indicates if the participant wanted to work more hours in the past week
29. `PUFC21_PLADDW` - Indicates whether the participant looked for additional work during the past week
30. `PUFC22_PFWRK` - Indicates whether this was the participant’s first time doing any work
31. `PUFC23_PCLASS` - Class of worker for the participant’s primary occupation
32. `PUFC24_PBASIS` - Payment basis for the participant’s primary occupation
33. `PUFC25_PBASIC` - Basic pay per day for the participant’s primary occupation
34. `PUFC26_OJOB` - Indicates if the participant had another job or business during the past week
35. `PUFC27_NJOBS` - Number of jobs the participant had during the past week
36. `PUFC28_THOURS` - Total hours worked by the participant across all jobs during the past week
37. `PUFC29_WWM48H` - Participant's main reason for working more than 48 hours in the past week
38. `PUFC30_LOOKW` - Indicates if the participant looked for work or tried to start a business in the past week
39. `PUFC31_FLWRK` - Indicates if it was the participant’s first time looking for work or starting a business
40. `PUFC32_JOBSM` - Method the participant used to search for work
41. `PUFC33_WEEKS` - Number of weeks the participant spent looking for work
42. `PUFC34_WYNOT` - Reason the participant did not look for work
43. `PUFC35_LTLOOKW` - Period of the participant’s last job search attempt
44. `PUFC36_AVAIL` - Indicates if the participant would have been available for work had an opportunity existed
45. `PUFC37_WILLING` - Indicates if the participant was willing to take up work in the past week or within two weeks
46. `PUFC38_PREVJOB` - Indicates if the participant had worked at any time before
47. `PUFC40_POCC` - Last occupation held by the participant
48. `PUFC41_WQTR` - Indicates if the participant worked at all or had a job/business during the past quarter
49. `PUFC43_QKB` - Type of business the participant engaged in during the past quarter
50. `PUFNEWEMPSTAT` - Employment status based on new employment criteria (employed, unemployed, not in labor force)
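
The structure described above can be verified right after loading. The following is a minimal sketch, not the notebook's actual loading code: the helper `check_structure` and the file name `lfs_april_2016.csv` are hypothetical, and the small `toy` frame is synthetic data that only stands in for the real survey file.

```python
import pandas as pd

# Expected dimensions as stated in the dataset description.
EXPECTED_ROWS = 180_862
EXPECTED_COLS = 50


def check_structure(df, expected_rows=EXPECTED_ROWS, expected_cols=EXPECTED_COLS):
    """Compare a DataFrame's shape with the expected number of rows and columns."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "cols": n_cols,
        "rows_match": n_rows == expected_rows,
        "cols_match": n_cols == expected_cols,
    }


# Synthetic stand-in with three of the documented columns (values are made up).
toy = pd.DataFrame({
    "PUFREG": [1, 1, 13],
    "PUFC04_SEX": [1, 2, 2],
    "PUFC05_AGE": [34, 17, 52],
})
toy_report = check_structure(toy, expected_rows=3, expected_cols=3)
print(toy_report)

# With the real file (the path is an assumption):
# lfs = pd.read_csv("lfs_april_2016.csv")
# print(check_structure(lfs))
```

A mismatch in either count is a cue to inspect the file for extra header rows, a wrong delimiter, or a different survey round before going further.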

# List of Requirements

List all the Python libraries and modules that you used.

# Data Preprocessing and Cleaning

Perform necessary steps before using the data. In this section of the notebook, please take note of the following:

*   If needed, perform preprocessing techniques to transform the data to the appropriate representation. This may include binning, log transformations, conversion to one-hot encoding, normalization, standardization, interpolation, truncation, and feature engineering, among others. There should be a correct and proper justification for the use of each preprocessing technique used in the project.
*   Make sure that the data is clean, especially features that are used in the project. This may include checking for misrepresentations, checking the data type, dealing with missing data, dealing with duplicate data, and dealing with outliers, among others. There should be a correct and proper justification for the application (or non-application) of each data cleaning method used in the project. Clean only the variables utilized in the study.
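
The cleaning checks listed above (data types, missing data, duplicates, outliers) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the helpers `quality_report` and `iqr_outliers` are hypothetical, the `toy` frame is synthetic, and treating whitespace-only strings as missing is an assumption about how blanks appear in the raw file. The IQR rule is only one of several possible outlier criteria.

```python
import numpy as np
import pandas as pd


def quality_report(df, columns):
    """Report the dtype and missing-value count for the selected columns."""
    # Assumption: blank answers may be stored as empty or whitespace-only strings.
    subset = df[columns].replace(r"^\s*$", np.nan, regex=True)
    return pd.DataFrame({
        "dtype": df[columns].dtypes.astype(str),
        "n_missing": subset.isna().sum(),
    })


def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Synthetic rows; the fourth row is an exact copy of the third.
toy = pd.DataFrame({
    "PUFHHNUM": [1, 1, 2, 2, 2],
    "PUFC01_LNO": [1, 2, 1, 1, 2],
    "PUFC05_AGE": ["34", "17", " ", " ", "52"],
    "PUFC19_PHOURS": [40, 8, 48, 48, 400],
})

report = quality_report(toy, ["PUFC05_AGE", "PUFC19_PHOURS"])

# Count duplicates on full rows (identifiers included), so that two different
# respondents with identical answers are not mistaken for duplicates.
n_duplicates = int(toy.duplicated().sum())
deduped = toy.drop_duplicates().reset_index(drop=True)

outlier_mask = iqr_outliers(deduped["PUFC19_PHOURS"])

print(report)
print("duplicate rows:", n_duplicates)
print(deduped[outlier_mask])
```

Flagged rows are candidates for review rather than automatic removal; whether to drop, cap, or keep them depends on the justification given for each variable.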