<a href="https://colab.research.google.com/github/SSInimgba/Notebooks/blob/master/Samantha_Sam_Inimgba_PREPR_Challenge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Task 1:
Visualize labour / employment data for Canada by province and industry for the last 5 years.
The visualization should be pivotable by demographic:
- Age
- Sex
- Ethnicity
- Industry / Sector

## Task 2:
Use open skills data and job titles and visualize a mashup correlating skills and jobs to industry and employment and provide a sample for evaluation.

## Task 3:
Suggest in detail, how you would train these two data sets over time and what machine learning algorithm you would use to make the data "smarter" over time.

## **Task 1**

In [0]:
# Exploratory analysis of the dataset
# Import the libraries needed for it

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



In [38]:
#read in labour dataset
df = pd.read_excel('/labour data 2014 - 2019.xlsx')
df.head()

Unnamed: 0,REF_DATE,GEO,DGUID,Labour force characteristics,North American Industry Classification System (NAICS),Sex,Age group,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,2014,Newfoundland and Labrador,2016A000210,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,Persons,249,thousands,3,v19669884,2.2.3.2.2,,x,,,1
1,2015,Newfoundland and Labrador,2016A000210,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,Persons,249,thousands,3,v19669884,2.2.3.2.2,,x,,,1
2,2016,Newfoundland and Labrador,2016A000210,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,Persons,249,thousands,3,v19669884,2.2.3.2.2,,x,,,1
3,2017,Newfoundland and Labrador,2016A000210,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,Persons,249,thousands,3,v19669884,2.2.3.2.2,,x,,,1
4,2018,Newfoundland and Labrador,2016A000210,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,Persons,249,thousands,3,v19669884,2.2.3.2.2,,x,,,1


A brief look at the dataset shows that some cleaning is needed; the first rows, for example, have no VALUE. We also need to decide which columns to drop.

We can also inspect what each column contains and whether the data is balanced (i.e. whether all 10 provinces are represented).
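
The balance check described above could be sketched roughly as follows. This is a minimal illustration on a toy frame, not the notebook's own code: `toy`, `category_counts`, and `is_balanced` are hypothetical names, and the real check would run on `df` with the original `GEO` column.

```python
import pandas as pd

# Toy stand-in for the labour dataframe (assumption: the real data is in `df`).
toy = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Quebec", "Quebec"],
    "Sex": ["Males", "Females", "Males", "Females"],
    "REF_DATE": [2014, 2014, 2014, 2014],
})

def category_counts(frame, columns):
    """Return a dict mapping each column name to its value counts."""
    return {col: frame[col].value_counts() for col in columns}

def is_balanced(frame, column):
    """True when every category in `column` has the same number of rows."""
    return frame[column].value_counts().nunique() == 1

counts = category_counts(toy, ["GEO", "Sex"])
n_provinces = toy["GEO"].nunique()   # number of distinct provinces
balanced = is_balanced(toy, "GEO")   # do all provinces have equal row counts?
print(n_provinces, balanced)
```

On the real data, one would expect `df['GEO'].nunique()` to equal 10 if every province is present.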

In [41]:
# Next we count the null values in each column, then drop the empty columns along with the columns we won't be using
print(f'There are {len(df)} records in the dataset')
df.isnull().sum()

There are 16200 records in the dataset


REF_DATE                                                     0
GEO                                                          0
DGUID                                                        0
Labour force characteristics                                 0
North American Industry Classification System (NAICS)        0
Sex                                                          0
Age group                                                    0
UOM                                                          0
UOM_ID                                                       0
SCALAR_FACTOR                                                0
SCALAR_ID                                                    0
VECTOR                                                       0
COORDINATE                                                   0
VALUE                                                     8023
STATUS                                                    8177
SYMBOL                                                 

In [42]:
# Drop TERMINATED and SYMBOL because they are empty, plus the metadata columns
# (including STATUS) that we won't be using. The nulls in VALUE are left for now.
df.drop(columns=['TERMINATED', 'SYMBOL', 'UOM', 'UOM_ID', 'SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR',
                 'COORDINATE', 'DGUID', 'DECIMALS', 'STATUS'], inplace=True)
df.columns

Index(['REF_DATE', 'GEO', 'Labour force characteristics',
       'North American Industry Classification System (NAICS)', 'Sex',
       'Age group', 'VALUE'],
      dtype='object')
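
The nulls in `VALUE` are left in place at this stage. One possible way to handle them later is sketched below on toy data. The frame `toy` and the variable names are assumptions for illustration; the notebook itself does not state how the missing values are treated.

```python
import numpy as np
import pandas as pd

# Toy stand-in with one missing VALUE (assumption: the real data is in `df`).
toy = pd.DataFrame({
    "GEO": ["Ontario", "Ontario", "Quebec", "Quebec"],
    "VALUE": [10.5, np.nan, 3.0, 4.5],
})

# Share of missing VALUE entries per province, to see where gaps concentrate.
missing_share = toy["VALUE"].isna().groupby(toy["GEO"]).mean()

# Drop rows without a value so later row counts reflect only published figures.
clean = toy.dropna(subset=["VALUE"])
totals = clean.groupby("GEO")["VALUE"].sum()

print(missing_share)
print(totals)
```

Dropping is only one option; keeping the rows and letting pandas skip the nulls during aggregation would give the same sums.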

In [43]:
#Let's rename the columns for ease
col_list = list(df.columns.values)
col_list

['REF_DATE',
 'GEO',
 'Labour force characteristics',
 'North American Industry Classification System (NAICS)',
 'Sex',
 'Age group',
 'VALUE']

In [0]:
new_names = ['Date','Province', 'LFC','Industry',
             'Sex','AgeGroup','Value']

In [45]:
col_dict = dict(zip(col_list, new_names)) 
col_dict

{'Age group': 'AgeGroup',
 'GEO': 'Province',
 'Labour force characteristics': 'LFC',
 'North American Industry Classification System (NAICS)': 'Industry',
 'REF_DATE': 'Date',
 'Sex': 'Sex',
 'VALUE': 'Value'}

In [46]:
df.rename(columns=col_dict, inplace= True)
df

Unnamed: 0,Date,Province,LFC,Industry,Sex,AgeGroup,Value
0,2014,Newfoundland and Labrador,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,
1,2015,Newfoundland and Labrador,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,
2,2016,Newfoundland and Labrador,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,
3,2017,Newfoundland and Labrador,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,
4,2018,Newfoundland and Labrador,Employment,"Agriculture [111-112, 1100, 1151-1152]",Males,15 to 24 years,
...,...,...,...,...,...,...,...
16195,2015,British Columbia,Unemployment,Unclassified industries,Females,55 years and over,4.0
16196,2016,British Columbia,Unemployment,Unclassified industries,Females,55 years and over,5.6
16197,2017,British Columbia,Unemployment,Unclassified industries,Females,55 years and over,5.5
16198,2018,British Columbia,Unemployment,Unclassified industries,Females,55 years and over,3.5


In [40]:
!pip install pivottablejs



In [47]:
from pivottablejs import pivot_ui
from IPython.display import HTML

# Select the renamed columns and write an interactive pivot table to an HTML file
df_vis = df[new_names]
pivot_ui(df_vis, outfile_path='pivottablejs.html')
HTML('pivottablejs.html')
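
As a static complement to the interactive pivot, the same kind of view can be approximated with `pd.pivot_table`. The sketch below uses a toy frame with the renamed columns; the data values and the choice of `Date` and `Sex` as pivot columns are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in using the renamed column names (assumption: real data is in `df`).
toy = pd.DataFrame({
    "Date": [2017, 2017, 2018, 2018],
    "Province": ["Ontario", "Ontario", "Ontario", "Ontario"],
    "LFC": ["Employment"] * 4,
    "Sex": ["Males", "Females", "Males", "Females"],
    "Value": [10.0, 12.0, 11.0, 13.0],
})

# Employment by province, broken out by year and sex.
table = pd.pivot_table(
    toy[toy["LFC"] == "Employment"],
    index="Province",
    columns=["Date", "Sex"],
    values="Value",
    aggfunc="sum",
)
print(table)
```

Swapping `Sex` for `AgeGroup` or `Industry` in `columns` would pivot by a different demographic.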



## **Task 2**

In [56]:
pd.set_option('display.max_colwidth', None)
df2 = pd.read_csv('/job_skills.csv')
df2.head()

Unnamed: 0,Company,Title,Category,Location,Responsibilities,Minimum Qualifications,Preferred Qualifications
0,Google,Google Cloud Program Manager,Program Management,Singapore,"Shape, shepherd, ship, and show technical programs designed to support the work of Cloud Customer Engineers and Solutions Architects.\nMeasure and report on key metrics tied to those programs to identify any need to change course, cancel, or scale the programs from a regional to global platform.\nCommunicate status and identify any obstacles and paths for resolution to stakeholders, including those in senior roles, in a transparent, regular, professional and timely manner.\nEstablish expectations and rationale on deliverables for stakeholders and program contributors.\nProvide program performance feedback to teams in Product, Engineering, Sales, and Marketing (among others) to enable efficient cross-team operations.","BA/BS degree or equivalent practical experience.\n3 years of experience in program and/or project management in cloud computing, enterprise software and/or marketing technologies.","Experience in the business technology market as a program manager in SaaS, cloud computing, and/or emerging technologies.\nSignificant cross-functional experience across engineering, sales, and marketing teams in cloud computing or related technical fields.\nProven successful program outcomes from idea to launch in multiple contexts throughout your career.\nAbility to manage the expectations, demands and priorities of multiple internal stakeholders based on overarching vision and success for global team health.\nAbility to work under pressure and possess flexibility with changing needs and direction in a rapidly-growing organization.\nStrong organization and communication skills."
1,Google,"Supplier Development Engineer (SDE), Cable/Connector",Manufacturing & Supply Chain,"Shanghai, China","Drive cross-functional activities in the supply chain for overall Technical Operational readiness in all NPI phases leading into mass production.\nCollaborate with suppliers and Engineering teams in assessing process technologies based on project requirements, and propose and develop the manufacturing blueprint including process flow, equipment/fixture designs, implementation schedules and validation plans for engineering builds and mass production.\nDrive project technical and operational issues with material, process, fixtures, equipment, etc. during the NPI phase to enable delivery of a mature product and manufacturing process into mass production. Support/drive continuous improvement efforts in the supply chain.\nLead suppliers by providing technical direction to establish and validate (utilizing statistical tools) process capability during the NPI phase for consistently delivering a quality product in mass production.\nUtilize DOE’s, FMEA and other Industry standard tools to proactively identify and address risks and optimize process parameters.","BS degree in an Engineering discipline or equivalent practical experience.\n7 years of experience in Cable/Connector Design or Manufacturing in an NPI role.\nExperience working with Interconnect Engineering and Product Design (PD)/Mechanical Engineer in developing, manufacturing and testing.\nAbility to speak and write in English and Mandarin fluently and idiomatically.","BSEE, BSME or BSIE degree.\nExperience of using Statistics tools for Data analysis, e.g. distribution histogram/pareto chart, process control chart, Design of Experiment (DOE), Correlation Analysis, etc.\nDemonstrated knowledge in PCBA manufacturing process and quality control.\nFamilar with cable/connector related components' manufacturing: moldling, stamping, die-casting, LIM, MIM process and materials.\nSelf starter with innovation, integrity and attention to detail.\nAbility to travel up to 50% of the time"
2,Google,"Data Analyst, Product and Tools Operations, Google Technical Services",Technical Solutions,"New York, NY, United States","Collect and analyze data to draw insight and identify strategic solutions.\nBuild consensus by facilitating broad collaboration with clear communication and documentation.\nWork with cross-functional stakeholders to gather requirements, manage implementation, and drive delivery of projects.","Bachelor’s degree in Business, Economics, Statistics, Operations Research or related analytical field, or equivalent practical experience.\n2 years of work experience in business analysis.\n1 year of experience with statistical modeling, forecasting or machine learning. Experience with R, SAS or Python.\n1 year of experience developing and manipulating large datasets.",Experience partnering or consulting cross-functionally with senior stakeholders.\nProficiency in a database query language (e.g. SQL).\nAbility to manage multiple projects in an ambiguous environment.\nStrong presentation and communication skills with the ability to communicate statistical concepts and explain recommendations to non-experts.
3,Google,"Developer Advocate, Partner Engineering",Developer Relations,"Mountain View, CA, United States","Work one-on-one with the top Android, iOS, and web engineers to build exciting new product/API integrations that drive adoption of Google’s developer platforms.\nConceive new features and ideas that can change how users interact with apps and Google, and help developers build them.\nConduct regular, engineering-focused meetings with developers to help them design new systems, fix bugs, improve UX, and solve complex code issues.\nWork on the source code of Google's products with other engineers to identify, reproduce, and/or fix bugs that are affecting developers.","BA/BS degree in Computer Science or equivalent practical experience.\nExperience working directly with partners.\nProgramming experience in one or more of the following languages/platforms: Android, Java, Kotlin, iOS, Javascript.","Experience as a software developer, architect, technology advocate, CTO, or consultant working with web or mobile technologies.\nExperience working with third parties.\nExperience interacting with clients or internal stakeholders.\nKnowledge of web application or mobile application development landscapes."
4,Google,"Program Manager, Audio Visual (AV) Deployments",Program Management,"Sunnyvale, CA, United States","Plan requirements with internal customers.\nProvide portfolio reports and forecasts to Regional Service Delivery Manager. Manage vendor PM services to usher projects through the entire lifecycle.\nPlan finances and tracking across a portfolio of projects.\nEnsure that any changes in scope, schedule or cost are managed in accordance with the agreed change order procedures. Interface with cross functional stakeholders to understand and communicate program strategy and priorities.\nEnsure project closure processes are completed - including handover to support. Help to validate new and improve existing products by partnering with AV Eng teams. Set operational goals and lead ongoing process and program improvement initiatives.","BA/BS degree or equivalent practical experience.\n4 years of experience managing large scale Global AV Deployments.\nExperience managing vendors to deliver defined projects.\nExperience in Audio/Visual and Video Conferencing design, system integration and resolution.",CTS Certification.\nExperience in the construction sector.\nExperience with project management software and reporting tools.\nAbility to understand technical subjects and emerging technologies and their relevance to the marketplace.\nBusiness management and consulting skills.\nExcellent interpersonal and relationship building skills across multiple stakeholders at varying levels.
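
The qualification columns hold free text with newline-separated requirements, so skills have to be extracted before they can be tied to jobs. A rough sketch of keyword counting per category follows. The keyword list `skills`, the helper `skill_flags`, and the toy rows are all hypothetical; a real skills taxonomy would replace the list.

```python
import re
import pandas as pd

# Toy stand-in for the job postings (assumption: the real data is in `df2`).
toy = pd.DataFrame({
    "Category": ["Technical Solutions", "Developer Relations"],
    "Minimum Qualifications": [
        "Experience with R, SAS or Python.\n1 year of experience with SQL.",
        "Programming experience in Java, Kotlin or Python.",
    ],
})

# Hypothetical skill keywords, chosen only for illustration.
skills = ["Python", "SQL", "Java"]

def skill_flags(text, keywords):
    """Return {keyword: bool} using whole-word, case-insensitive matching."""
    return {
        kw: bool(re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE))
        for kw in keywords
    }

# Missing qualifications are treated as empty text.
texts = toy["Minimum Qualifications"].fillna("")
flags = pd.DataFrame(
    [skill_flags(t, skills) for t in texts], index=toy.index
).astype(int)

# Number of postings per category that mention each skill.
skill_counts = flags.groupby(toy["Category"]).sum()
print(skill_counts)
```

Whole-word matching keeps `Java` from also matching `Javascript`, which plain substring checks would not.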


In [57]:
df2['Location'].value_counts()

Mountain View, CA, United States    190
Sunnyvale, CA, United States        155
Dublin, Ireland                      87
New York, NY, United States          70
London, United Kingdom               62
                                   ... 
Budapest, Hungary                     1
Zagreb, Croatia                       1
Nairobi, Kenya                        1
Moscow, ID, United States             1
Columbus, OH, United States           1
Name: Location, Length: 92, dtype: int64

In [54]:
print(df2['Category'].value_counts())
print('              ')
print(df['Industry'].value_counts())

Sales & Account Management          168
Marketing & Communications          165
Finance                             115
Technical Solutions                 101
Business Strategy                    98
People Operations                    86
User Experience & Design             84
Program Management                   74
Partnerships                         60
Product & Customer Support           50
Legal & Government Relations         46
Administrative                       40
Software Engineering                 31
Sales Operations                     31
Hardware Engineering                 26
Real Estate & Workplace Services     25
Manufacturing & Supply Chain         16
Technical Infrastructure             11
Network Engineering                   6
Developer Relations                   5
IT & Data Management                  5
Technical Writing                     5
Data Center & Network                 2
Name: Category, dtype: int64
              
Fishing, hunting and trapping [114] 
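
The job categories and the labour industries use different vocabularies, so joining the two datasets needs some crosswalk between them. The sketch below assumes a hand-written mapping; `category_to_industry` and its industry labels are placeholders that have not been checked against the labour data, and the toy frames stand in for `df2` and `df`.

```python
import pandas as pd

# Hypothetical crosswalk: the industry labels are placeholders, not verified
# against the Industry values in the labour dataset.
category_to_industry = {
    "Finance": "Finance and insurance",
    "Manufacturing & Supply Chain": "Manufacturing",
}

# Toy stand-ins for the job postings and the labour data.
jobs = pd.DataFrame({
    "Title": ["Analyst", "Supplier Engineer", "UX Designer"],
    "Category": ["Finance", "Manufacturing & Supply Chain",
                 "User Experience & Design"],
})
labour = pd.DataFrame({
    "Industry": ["Finance and insurance", "Manufacturing"],
    "Value": [100.0, 250.0],
})

# Map each posting to an industry; unmapped categories become NaN.
jobs["Industry"] = jobs["Category"].map(category_to_industry)

# Count postings per industry and attach them to the labour figures.
postings = jobs.groupby("Industry").size().rename("Postings").reset_index()
mashup = labour.merge(postings, on="Industry", how="left")

# Categories still lacking a mapping, for manual review.
unmapped = jobs[jobs["Industry"].isna()]
print(mashup)
print(unmapped["Category"].tolist())
```

Listing the unmapped categories makes the gaps in the crosswalk visible instead of silently dropping those postings.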

In [50]:
# Print the value counts of every column in the job-skills data
for col in list(df2.columns.values):
  print(f'\n{df2[col].value_counts()}')


Google     1227
YouTube      23
Name: Company, dtype: int64

Business Intern 2018                                               35
MBA Intern, Summer 2018                                            34
MBA Intern 2018                                                    28
BOLD Intern, Summer 2018                                           21
Field Sales Representative, Google Cloud                           17
                                                                   ..
Store Campaigns Manager, Merchandising, Google Play                 1
Retail Strategy Lead, Retail Marketing                              1
UX Content Strategist Lead                                          1
Strategic Agency Consultant, Google Marketing Solutions (Dutch)     1
Partner Marketing Manager, Google Cloud                             1
Name: Title, Length: 794, dtype: int64

Sales & Account Management          168
Marketing & Communications          165
Finance                             115
Technica