# Analysing H1B Acceptance Trends 

H1B visa is a nonimmigrant visa issued to gradute level workers which allows them to work in the United States. The employer sponsors the H1B visa for workers with theoretical or technical expertise in specialized fields such as in IT, finance, accounting etc. An interesting fact about immigrant workers is that about 52 percent of new Silicon valley companies were founded by such workers during 1995 and 2005. Some famous CEOs like Indira Nooyi (Pepsico), Elon Musk (Tesla), Sundar Pichai (Google),Satya Nadella (Microsoft) once arrived to the US on a H1B visa.

**Motivation**: Our team consists of five international gradute students, in the future we will be applying for H1B visa. The visa application process seems very long, complicated and uncertain. So we decided to understand this process and use Machine learning algorithms to predict the acceptance rate and trends of H1B visa. 

## Data 
The data used in the project has been collected from <a href="https://www.foreignlaborcert.doleta.gov/performancedata.cfm">the Office of Foreign Labor Certification (OFLC).</a> 

In [120]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [9]:
!pip install autocorrect
import pandas as pd
import numpy as np
import warnings
import nltk
from textblob import TextBlob
from autocorrect import Speller 
nltk.download('wordnet')



[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Charic\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

## Exploratory Data Analysis

Before we begin working on our data we need to understand the traits of our data which we accomplish using EDA. We see that we have about 260 columns , not all 260 coulms have essential information that contributes to our analysis. Hence we pick out the columns such as case status( Accepted/ Denied) ,Employer, Job title etc. 

In [8]:
#Read the csv file and stored in file
#file=pd.read_csv('/content/gdrive/My Drive/H-1B_Disclosure_Data_FY2019.csv')
file = pd.read_excel('../data/H-1B_Disclosure_Data_FY2019.xlsx', skip_blank_lines=False)

In [10]:
file.shape

(664616, 260)

In [31]:
cleaned=file[['CASE_NUMBER','CASE_STATUS','CASE_SUBMITTED','DECISION_DATE','VISA_CLASS','JOB_TITLE','SOC_CODE','SOC_TITLE','EMPLOYER_NAME','WAGE_RATE_OF_PAY_FROM_1','WAGE_UNIT_OF_PAY_1']]
cleaned.head()

Unnamed: 0,CASE_NUMBER,CASE_STATUS,CASE_SUBMITTED,DECISION_DATE,VISA_CLASS,JOB_TITLE,SOC_CODE,SOC_TITLE,EMPLOYER_NAME,WAGE_RATE_OF_PAY_FROM_1,WAGE_UNIT_OF_PAY_1
0,I-200-16092-327771,WITHDRAWN,2016-04-08,2019-04-30,H-1B,ASSOCIATE CREATIVE DIRECTOR,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,"R/GA MEDIA GROUP, INC.",179000.0,Year
1,I-203-17188-450729,WITHDRAWN,2017-07-14,2019-05-13,E-3 Australian,ACCOUNT SUPERVISOR (MOTHER),11-2011,ADVERTISING AND PROMOTIONS MANAGERS,MOTHER INDUSTRIES LLC,110000.0,Year
2,I-203-17229-572307,WITHDRAWN,2017-08-23,2019-04-30,E-3 Australian,EXECUTIVE CREATIVE DIRECTOR,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,"WE ARE UNLIMITED, INC.",275000.0,Year
3,I-203-17356-299648,WITHDRAWN,2017-12-22,2019-08-20,E-3 Australian,PROJECT MANAGEMENT LEAD,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,"HELLO ELEPHANT, LLC",140000.0,Year
4,I-203-18008-577576,WITHDRAWN,2018-01-10,2019-04-15,E-3 Australian,"CREATIVE DIRECTOR, UX",11-2011,ADVERTISING AND PROMOTIONS MANAGERS,"HELLO ELEPHANT, LLC",180000.0,Year


In [32]:
cleaned['VISA_CLASS'].value_counts()

H-1B               649083
E-3 Australian      13087
H-1B1 Singapore      1291
H-1B1 Chile          1155
Name: VISA_CLASS, dtype: int64

In [33]:
# Visa class has many categories which are not of use , we require only H1B visa type , hence we drop all records with other visa types
cleaned.drop(labels=cleaned[cleaned['VISA_CLASS']!='H-1B'].index , inplace=True)

In [34]:
cleaned['CASE_STATUS'].value_counts()

CERTIFIED              578640
CERTIFIED-WITHDRAWN     46050
WITHDRAWN               19227
DENIED                   5166
Name: CASE_STATUS, dtype: int64

In [35]:
#As we want to only need accepted and denied cases we are dropping withdrawn from the data frame. 
#Case status of class certified-withdraw were certified earlier and later withdraw which can be considered a
cleaned.replace({"CASE_STATUS":"CERTIFIED-WITHDRAWN"},"CERTIFIED",inplace=True)
cleaned.drop(labels=cleaned[cleaned['CASE_STATUS']=='WITHDRAWN'].index , inplace=True)
cleaned.head()

Unnamed: 0,CASE_NUMBER,CASE_STATUS,CASE_SUBMITTED,DECISION_DATE,VISA_CLASS,JOB_TITLE,SOC_CODE,SOC_TITLE,EMPLOYER_NAME,WAGE_RATE_OF_PAY_FROM_1,WAGE_UNIT_OF_PAY_1
18,I-200-17250-072640,CERTIFIED,2017-09-07,2019-01-07,H-1B,"EXECUTIVE DIRECTOR, STRATEGY",11-2011,ADVERTISING AND PROMOTIONS MANAGERS,FIGLIULO & PARTNERS LLC,230000.0,Year
19,I-200-18026-717110,CERTIFIED,2018-01-26,2019-07-05,H-1B,PROJECT OPERATIONS MANAGER,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,INVISIONAPP INC.,107000.0,Year
21,I-200-18039-081565,CERTIFIED,2018-03-05,2019-01-08,H-1B,MANAGER OF LEAGUE AND TOURNAMENT SERVICES,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,OREGON YOUTH SOCCER ASSOCIATION,49087.0,Year
22,I-200-18082-340860,CERTIFIED,2018-03-23,2019-04-22,H-1B,"DIRECTOR, DEMAND",11-2011,ADVERTISING AND PROMOTIONS MANAGERS,"FACTUAL, INC.",172930.0,Year
24,I-200-18162-689783,CERTIFIED,2018-09-26,2018-10-02,H-1B,ADVERSTING AND PROMOTIONS MANAGER,11-2011,ADVERTISING AND PROMOTIONS MANAGERS,FANTUAN GROUP INC,33.0,Hour


In [36]:
cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 629856 entries, 18 to 664615
Data columns (total 11 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   CASE_NUMBER              629856 non-null  object        
 1   CASE_STATUS              629856 non-null  object        
 2   CASE_SUBMITTED           629856 non-null  datetime64[ns]
 3   DECISION_DATE            629856 non-null  datetime64[ns]
 4   VISA_CLASS               629856 non-null  object        
 5   JOB_TITLE                629856 non-null  object        
 6   SOC_CODE                 629852 non-null  object        
 7   SOC_TITLE                629852 non-null  object        
 8   EMPLOYER_NAME            629848 non-null  object        
 9   WAGE_RATE_OF_PAY_FROM_1  629852 non-null  float64       
 10  WAGE_UNIT_OF_PAY_1       629852 non-null  object        
dtypes: datetime64[ns](2), float64(1), object(8)
memory usage: 57.7+ MB


In [37]:
#the column wages has a mix of both string and float value types and some record have the symbol '$' which we want to remove
cleaned['WAGE_RATE_OF_PAY_FROM_1'].apply(type).value_counts()

<class 'float'>    629856
Name: WAGE_RATE_OF_PAY_FROM_1, dtype: int64

In [38]:
def clean_wages(w):
    """ Function to remove '$' symbol and other delimiters from wages column which consistes of str and float type values
    if the column entry is string type then remove the symbols else return the column value as it is 
    """
    if isinstance(w, str):
        return(w.replace('$', '').replace(',', ''))
    return(w)

In [39]:
cleaned['WAGES']=cleaned['WAGE_RATE_OF_PAY_FROM_1'].apply(clean_wages).astype('float')
cleaned.info()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


<class 'pandas.core.frame.DataFrame'>
Int64Index: 629856 entries, 18 to 664615
Data columns (total 12 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   CASE_NUMBER              629856 non-null  object        
 1   CASE_STATUS              629856 non-null  object        
 2   CASE_SUBMITTED           629856 non-null  datetime64[ns]
 3   DECISION_DATE            629856 non-null  datetime64[ns]
 4   VISA_CLASS               629856 non-null  object        
 5   JOB_TITLE                629856 non-null  object        
 6   SOC_CODE                 629852 non-null  object        
 7   SOC_TITLE                629852 non-null  object        
 8   EMPLOYER_NAME            629848 non-null  object        
 9   WAGE_RATE_OF_PAY_FROM_1  629852 non-null  float64       
 10  WAGE_UNIT_OF_PAY_1       629852 non-null  object        
 11  WAGES                    629852 non-null  float64       
dtypes: datetime64[n

In [41]:
# the wage information that we have available has different unit of pay
cleaned['WAGE_UNIT_OF_PAY_1'].value_counts()

Year         587386
Hour          41927
Month           342
Bi-Weekly       105
Week             92
Name: WAGE_UNIT_OF_PAY_1, dtype: int64

In [42]:
x=cleaned.loc[cleaned['WAGE_UNIT_OF_PAY_1']=="Month"]
x.head(2)

Unnamed: 0,CASE_NUMBER,CASE_STATUS,CASE_SUBMITTED,DECISION_DATE,VISA_CLASS,JOB_TITLE,SOC_CODE,SOC_TITLE,EMPLOYER_NAME,WAGE_RATE_OF_PAY_FROM_1,WAGE_UNIT_OF_PAY_1,WAGES
818,I-200-18306-399497,DENIED,2018-11-02 11:37:37,2018-11-05 12:07:42,H-1B,ACCOUNTING & MARKETING MANAGER FOR AFRICA,11-2021,MARKETING MANAGERS,SHOP2SHIP LLC,2000.0,Month,2000.0
826,I-200-18309-843479,CERTIFIED,2018-11-05 12:34:19,2018-11-09 22:00:34,H-1B,ACCOUNTING & MARKETING MANAGER FOR AFRICA,11-2021,MARKETING MANAGERS,SHOP2SHIP LLC,2000.0,Month,2000.0


In [43]:
# we convert the different units of pay to the type 'Year'
cleaned['WAGES'] = np.where(cleaned['WAGE_UNIT_OF_PAY_1'] == 'Month',cleaned['WAGES'] * 12,cleaned['WAGES'])
cleaned['WAGES'] = np.where(cleaned['WAGE_UNIT_OF_PAY_1'] == 'Hour',cleaned['WAGES'] * 2080,cleaned['WAGES']) # 2080=8 hours*5 days* 52 weeks
cleaned['WAGES'] = np.where(cleaned['WAGE_UNIT_OF_PAY_1'] == 'Bi-Weekly',cleaned['WAGES'] *26,cleaned['WAGES'])
cleaned['WAGES'] = np.where(cleaned['WAGE_UNIT_OF_PAY_1'] == 'Week',cleaned['WAGES'] * 52,cleaned['WAGES'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_

In [44]:
#As we have got the information of Wages and made transformation we can drop the initial two records
cleaned.drop(columns=['WAGE_RATE_OF_PAY_FROM_1','WAGE_UNIT_OF_PAY_1'],axis=1,inplace=True)

In [45]:
cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 629856 entries, 18 to 664615
Data columns (total 10 columns):
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   CASE_NUMBER     629856 non-null  object        
 1   CASE_STATUS     629856 non-null  object        
 2   CASE_SUBMITTED  629856 non-null  datetime64[ns]
 3   DECISION_DATE   629856 non-null  datetime64[ns]
 4   VISA_CLASS      629856 non-null  object        
 5   JOB_TITLE       629856 non-null  object        
 6   SOC_CODE        629852 non-null  object        
 7   SOC_TITLE       629852 non-null  object        
 8   EMPLOYER_NAME   629848 non-null  object        
 9   WAGES           629852 non-null  float64       
dtypes: datetime64[ns](2), float64(1), object(7)
memory usage: 52.9+ MB


In [46]:
"""
We should remove record that have null objects, from the above cell we see
that all columns don't have same number of non-null records
which means we have to remove the records that have the null values.
we see that there are about 17 records that have null values
""" 
null_rows = cleaned.isnull().any(axis=1)
print(cleaned[null_rows].shape)
print(cleaned.shape)

(16, 10)
(629856, 10)


In [47]:
cleaned.dropna(inplace=True)
print(cleaned.shape)

(629840, 10)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [50]:
cleaned['JOB_TITLE'].value_counts()

SOFTWARE ENGINEER                                  33199
SOFTWARE DEVELOPER                                 33057
SENIOR SYSTEMS ANALYST JC60                        12759
SENIOR SOFTWARE ENGINEER                            8135
MANAGER JC50                                        8118
                                                   ...  
DEG PE PLANNER                                         1
DIRECTOR PLATFORM ENGINEERING                          1
DATA SOLUTION MANAGER - CLOUD                          1
ASSISTANT PROFESSOR OF FINANCING AND ACCOUNTING        1
PHP SUPPORT ENGINEER                                   1
Name: JOB_TITLE, Length: 108140, dtype: int64

In [51]:
#we see that the job title has integers(words with integers also) 
#removing comma also
def remove_num(text):
  if not any(c.isdigit() for c in text):
    return text
  return ''
cleaned['JOB_TITLE']=cleaned.JOB_TITLE.apply(lambda txt: " ".join([remove_num(i) for i in txt.lower().split()]))
cleaned['JOB_TITLE']=cleaned['JOB_TITLE'].str.replace(',', '')
cleaned['SOC_TITLE']=cleaned.SOC_TITLE.apply(lambda txt: " ".join([remove_num(i) for i in txt.lower().split()]))
cleaned['JOB_TITLE']=cleaned['SOC_TITLE'].str.replace(',', '')

cleaned.head()
cleaned['JOB_TITLE'].value_counts()

AttributeError: 'int' object has no attribute 'lower'

In [0]:
#code to clean and group the JOB_TITLE COLUMN
# lemmatization and spell check function
nltk.download('words')
lemmatizer = nltk.stem.WordNetLemmatizer()
words = set(nltk.corpus.words.words())
spell = Speller()


def lemmatize_text(text):
  return lemmatizer.lemmatize(text)

def spelling_checker(text):
  return spell(text)
 
print(spelling_checker("computr sciece progam check"))

[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
computer science program check


In [118]:
cleaned['JOB_TITLE']=cleaned.JOB_TITLE.apply(lambda txt: " ".join([lemmatize_text(i) for i in txt.lower().split()]))
cleaned['SOC_TITLE']=cleaned.SOC_TITLE.apply(lambda txt: " ".join([lemmatize_text(i) for i in txt.lower().split()]))

cleaned.JOB_TITLE = cleaned.JOB_TITLE.apply(lambda txt: " ".join([spelling_checker(i) for i in txt.lower().split()]))
cleaned.SOC_TITLE = cleaned.SOC_TITLE.apply(lambda txt: " ".join([spelling_checker(i) for i in txt.lower().split()]))

#cleaned['JOB_TITLE']=cleaned.JOB_TITLE.apply(lambda txt: " ".join([remove_text(i) for i in txt.lower().split()]))
cleaned['JOB_TITLE'].value_counts() 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


software developer application                   208691
computer system analyst                           71188
computer occupation all other                     54486
software developer system software                30600
computer programmer                               16623
                                                  ...  
plumber pipefitter and steamfitter                    1
security commodity and financial service sale         1
broadcast technician                                  1
musical instrument repairer and tuner                 1
software develops application                         1
Name: JOB_TITLE, Length: 716, dtype: int64

In [0]:
grouped_wages=cleaned.groupby('JOB_TITLE', as_index=False).agg({'WAGES':'mean'})
op=grouped_wages.sort_values(by=['WAGES'],ascending=False)
X=op.loc[op['JOB_TITLE']=='SOFTWARE ENGINEER']
display(op)
display(X)

Unnamed: 0,JOB_TITLE,WAGES
64790,senior applications engineer - power management,98847500.00
47728,nurse practitioners (licensed),97780945.60
80095,specialist web developer,74060709.00
14470,chemists,41649665.05
38055,jr. data scientist (-.),40303356.00
...,...,...
46956,nail technical,18720.00
18524,customer service all tasks & duties of a nail ...,18200.00
91615,track and field coach,17040.00
40913,live streaming service,17000.00


Unnamed: 0,JOB_TITLE,WAGES


### Baseline classifier

The baseline classifier is done with a basic model. In this case we are taking the mode of the labels ('certified' and 'denied' for H1B visa approvals). It will give us the base accuracy to which we will compare our classifier's accuracy. Our classifier should have a better accuracy than the baseline classifier accuracy.


In [163]:
# This step assigns a binary class label (0 or 1) to each label for H1B visa approval. 
#'CERTIFIED' is mapped to 1 and 'DENIED' to 0

def create_class_labels(processed_data):
    
    y = np.where((processed_data['CASE_STATUS']=='CERTIFIED'),1, 0)
    
    return y

X = cleaned['CASE_STATUS'].to_numpy()

# Groundtruth labels for the dataset
y = create_class_labels(cleaned)
counts = cleaned['CASE_STATUS'].value_counts()
print(counts)
print('proportion: ', counts[0]/counts[1], ': 1')
#print(np.count_nonzero(y == True), len(y))

CERTIFIED    624682
DENIED         5158
Name: CASE_STATUS, dtype: int64
proportion:  121.10934470725087 : 1


In [164]:
import sklearn
from statistics import mean

# Baseline classifier that predicts the class base on the mode of the labels.

class BaselineClasifier():
    
    def __init__(self):
        self.central_tendency = None
        
    def fit(self, data, y, central_t='mode'): 
        
        # Count labels and find the most frequent one
        label, counts = np.unique(y, return_counts=True) 
        
        if central_t == 'mode':
            self.central_tendency = counts.argmax()
        elif central_t == 'mean':
            self.central_tendency = round(np.sum(y)/len(y))
        
        return self
    
    # Return an array with size equal to the data size  and each element setted to the mode.
    def predict(self, data):
        
        result = np.full(data.shape[0], self.central_tendency)
        
        return result

In [165]:
def compute_accuracy(validation, predicted):
    
    comp = prediction == validation 
    match_counts = np.count_nonzero(comp == True) 
    clasifier_accuracy = match_counts/len(validation)
    
    return clasifier_accuracy
    

In [189]:
from sklearn import metrics
from sklearn.metrics import roc_auc_score

def compute_AUC(y, prediction):
    
    #fpr, tpr, thresholds = metrics.roc_curve(y, prediction, pos_label=2)
    auc = roc_auc_score(y, prediction)
    #print(auc)
    #return metrics.auc(fpr, tpr)
    return auc

In [190]:
y1 = np.array([1, 1, 1, 0])
pred = np.array([1,1,1,1])
print(compute_AUC(y1, pred))

0.5


In [167]:
# Prediction with the cleaned data
baseline_clasifier = BaselineClasifier() 
baseline_clasifier.fit(X, y, 'mode') 
prediction = baseline_clasifier.predict(cleaned)

# Computation of accuracy by comparing the groundtruth labels and the predicted labels for the cleaned dataset
baseline_clasifier_accuracy = compute_accuracy(y, prediction)

print('Baseline accuracy: ', baseline_clasifier_accuracy)

Baseline accuracy:  0.9918106185697955


In [95]:
cleaned.columns

Index(['CASE_NUMBER', 'CASE_STATUS', 'CASE_SUBMITTED', 'DECISION_DATE',
       'VISA_CLASS', 'JOB_TITLE', 'SOC_CODE', 'SOC_TITLE', 'EMPLOYER_NAME',
       'WAGES'],
      dtype='object')

In [191]:
from sklearn import model_selection

# Testing with K-folds

accuracies = []

kf = sklearn.model_selection.KFold(n_splits=4, random_state=1, shuffle=True) 

for train_idx, test_idx in kf.split(X):
    
    X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx] 
    baseline_clasifier = BaselineClasifier()
    classifier = baseline_clasifier.fit(X_train, y_train, 'mean')
    prediction = baseline_clasifier.predict(X_test)
    
    #fold_accuracy = compute_accuracy(y_test, prediction)
    fold_accuracy = compute_AUC(y_test, prediction)
    accuracies.append(fold_accuracy)
    
print(accuracies)
baseline_clasifier_accuracy = mean(accuracies)

print('Baseline accuracy: ', baseline_clasifier_accuracy)   
    

[0.5, 0.5, 0.5, 0.5]
Baseline accuracy:  0.5


In [192]:
from sklearn.model_selection import train_test_split

# Testing with regular split

# create training and testing vars
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
baseline_clasifier = BaselineClasifier()
classifier = baseline_clasifier.fit(X_train, y_train, 'mean')
prediction = baseline_clasifier.predict(X_test)

#split_accuracy = compute_accuracy(y_test, prediction)
split_accuracy = compute_AUC(y_test, prediction)

print('Baseline accuracy: ', split_accuracy)  

Baseline accuracy:  0.5
