
# 📜 Project: Job Description Analyzer – Extracting Required Skills from Job Postings


## 📌 Objective
Use spaCy’s Named Entity Recognition (NER) and NLTK preprocessing to extract and categorize required skills from job descriptions. The goal is to identify trends in job requirements and analyze the most in-demand skills across industries.

## 🛠️ Project Steps & Instructions


In [None]:
#📥 Download the Dataset
!wget https://raw.githubusercontent.com/binoydutt/Resume-Job-Description-Matching/refs/heads/master/data.csv

--2025-03-28 02:35:13--  https://raw.githubusercontent.com/binoydutt/Resume-Job-Description-Matching/refs/heads/master/data.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 646072 (631K) [text/plain]
Saving to: ‘data.csv.2’


2025-03-28 02:35:13 (16.2 MB/s) - ‘data.csv.2’ saved [646072/646072]



### Step 1: Load the Dataset
#### 📌 Dataset: A provided CSV file containing job descriptions from different industries (IT, Healthcare, Finance, Marketing, etc.).

1. Download the dataset (see the `wget` cell above).
2. Load it into Python using Pandas.
3. View the first few rows to understand its structure.

In [None]:
# Load the CSV into a DataFrame and preview its structure
import pandas as pd
df = pd.read_csv('data.csv')
print("Shape of dataset:", df.shape)
df.head()



Shape of dataset: (157, 10)


| Unnamed: 0.1 | Unnamed: 0 | company | position | url | location | headquaters | employees | founded | industry | Job Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Visual BI Solutions Inc | Graduate Intern (Summer 2017) - SAP BI / Big D... | https://www.glassdoor.com/partner/jobListing.h... | Plano, TX | Plano, TX | 51 to 200 employees | 2010 | Information Technology | Location: Plano, TX or Oklahoma City, OK Dura... |
| 1 | 2 | Jobvertise | Digital Marketing Manager | https://www.glassdoor.com/partner/jobListing.h... | Dallas, TX | Berlin, Germany | 1 to 50 employees | 2011 | Unknown | The Digital Marketing Manager is the front li... |
| 2 | 3 | Santander Consumer USA | Manager, Pricing Management Information Systems | https://www.glassdoor.com/partner/jobListing.h... | Dallas, TX | Dallas, TX | 5001 to 10000 employees | 1995 | Finance | Summary of Responsibilities:The Manager Prici... |
| 3 | 4 | Federal Reserve Bank of Dallas | Treasury Services Analyst Internship | https://www.glassdoor.com/partner/jobListing.h... | Dallas, TX | Dallas, TX | 1001 to 5000 employees | 1914 | Finance | ORGANIZATIONAL SUMMARY: As part of the nati... |
| 4 | 5 | Aviall | Intern, Sales Analyst | https://www.glassdoor.com/partner/jobListing.h... | Dallas, TX | Dallas, TX | 1001 to 5000 employees | Boeing | Subsidiary or Business Segment | Aviall is the world's largest provider of n... |
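The preview above already hints at a data-quality issue: in the Aviall row, `founded` holds `Boeing` and `industry` holds `Subsidiary or Business Segment`, which suggests that some rows have values shifted by one column. A minimal sketch for flagging such rows, assuming a non-numeric `founded` value is a good enough signal (rows with a missing `founded` value would be flagged as well). The `sample` frame and the `flag_shifted_rows` helper are hypothetical; `sample` just copies two rows from the preview:

```python
import pandas as pd

# Hypothetical mini-frame mirroring two rows from df.head() above
sample = pd.DataFrame({
    "company": ["Visual BI Solutions Inc", "Aviall"],
    "founded": ["2010", "Boeing"],
    "industry": ["Information Technology", "Subsidiary or Business Segment"],
})

def flag_shifted_rows(frame):
    """Return rows whose 'founded' value cannot be parsed as a number."""
    # errors="coerce" turns unparseable values (and missing ones) into NaN
    founded_num = pd.to_numeric(frame["founded"], errors="coerce")
    return frame[founded_num.isna()]

print(flag_shifted_rows(sample)["company"].tolist())
```

Calling `flag_shifted_rows(df)` on the full dataset would list candidate rows to inspect before grouping by `industry` in Step 5; whether each flagged row is really shifted still needs a manual check.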


### Step 2: Preprocessing the Job Descriptions
#### 📌 Goal: Clean the text by removing stopwords, punctuation, and unnecessary characters.

1. Use NLTK to tokenize the descriptions.
2. Remove stopwords and special characters.
3. Convert text to lowercase for consistency.

In [None]:
# Tokenize, lowercase, and drop stopwords and non-alphanumeric tokens
import nltk

nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Build the stopword set once; calling stopwords.words() for every token is very slow
stop_words = set(stopwords.words('english'))

job_des = {}
for i, text in enumerate(df['Job Description']):
    tokens = word_tokenize(text)
    # Keep lowercase, alphanumeric tokens that are not English stopwords
    job_des[i] = [w.lower() for w in tokens if w.isalnum() and w.lower() not in stop_words]

for key, value in job_des.items():
    print(key, ":", value)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


0 : ['location', 'plano', 'tx', 'oklahoma', 'city', 'ok', 'duration', 'internship', 'summer', '2017', 'term', 'job', 'summary', 'visual', 'bi', 'solutions', 'inc', 'seeking', 'graduate', 'interns', 'strong', 'bi', 'big', 'data', 'analytics', 'solutions', 'sap', 'bw', 'sap', 'hana', 'oracle', 'ms', 'sql', 'edw', 'bods', 'sas', 'big', 'data', 'visualization', 'tools', 'join', 'college', 'recruiting', 'hiring', 'program', 'role', 'would', 'building', 'bi', 'analytics', 'big', 'data', 'solutions', 'would', 'consumed', 'leaders', 'executives', 'fortune', '500', 'organizations', 'strong', 'sense', 'business', 'analysis', 'etl', 'data', 'modeling', 'data', 'warehousing', 'visualization', 'reporting', 'advanced', 'analytics', 'data', 'interpretation', 'key', 'attributes', 'look', 'market', 'leader', 'sap', 'bi', 'analytics', 'visual', 'bi', 'selective', 'student', 'hiring', 'candidates', 'portfolio', 'project', 'blogs', 'preferred', 'work', 'experience', 'academic', 'gpa', 'scores', 'asked', '
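One side effect of the `isalnum()` filter is that it also discards skill tokens containing punctuation. The sketch below uses a hypothetical token list (assuming the tokenizer keeps `PL/SQL` and `C++` as single tokens, which real `word_tokenize` output may not) and a tiny stand-in stopword set, and compares the strict filter with a relaxed one that keeps any token containing at least one letter or digit:

```python
# Hypothetical tokens; real word_tokenize output may split some of these differently
tokens = ["Strong", "SQL", "and", "PL/SQL", "skills", ",", "C++", "a", "plus"]
stop_words = {"and", "a"}  # tiny stand-in for set(stopwords.words('english'))

# Strict filter, as in the cell above: alphanumeric-only tokens
strict = [t.lower() for t in tokens if t.isalnum() and t.lower() not in stop_words]

# Relaxed filter: keep tokens with at least one letter or digit
relaxed = [t.lower() for t in tokens
           if any(ch.isalnum() for ch in t) and t.lower() not in stop_words]

print(strict)   # ['strong', 'sql', 'skills', 'plus']
print(relaxed)  # ['strong', 'sql', 'pl/sql', 'skills', 'c++', 'plus']
```

The relaxed variant still drops bare punctuation such as `,` while keeping tool names like `pl/sql` and `c++` available for later skill matching.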

### Step 3: Extract Skills Using Named Entity Recognition (NER)
#### 📌 Goal: Use spaCy’s built-in NER to detect and extract skills from job descriptions.

1. Load spaCy’s English model.
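2. Use NER to identify important keywords (the pretrained model has no dedicated skill label, so skills typically surface as `ORG` or `PRODUCT` entities).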
3. Extract words related to technical skills, tools, and expertise.

In [None]:
import spacy

# Load spaCy's small English pipeline, which includes a pretrained NER component
nlp = spacy.load("en_core_web_sm")

job_desc_entities = {}  # Store extracted entities per job description
size = df.shape[0]  # Number of rows in the dataframe

# Loop through job descriptions and extract named entities
for i in range(size):
    doc = nlp(df['Job Description'][i])  # Process the text with spaCy
    job_desc_entities[i] = [
        [ent.text, ent.label_]  # Store entity text and label
        for ent in doc.ents
        # en_core_web_sm has no SKILL label, so keep the labels skills tend to land in
        if ent.label_ in ['ORG', 'PERSON', 'PRODUCT']
    ]

# Print the extracted entities
for key, value in job_desc_entities.items():
    print(key, ":", value)


0 : [['TX', 'ORG'], ['Graduate Interns', 'PERSON'], ['Big Data & Analytics Solutions', 'ORG'], ['SAP', 'ORG'], ['Oracle / MS SQL EDW / PL/SQL / BODS / SAS / Big Data / Visualization Tools', 'ORG'], ['College Recruiting Hiring Program', 'ORG'], ['Analytics & Big Data Solutions', 'ORG'], ['ETL', 'ORG'], ['Data Modeling', 'ORG'], ['Data Warehousing, Visualization, Reporting, Advanced Analytics', 'ORG'], ['SAP BI & Analytics - Visual BI', 'ORG'], ['GPA Scores', 'ORG'], ['BI & Analytics', 'ORG'], ['BI', 'ORG'], ['EDW', 'ORG'], ['Big Data', 'ORG'], ['Relevant Work BI', 'ORG'], ['Preferred -Visual BI', 'ORG'], ['SAP BW', 'ORG'], ['SAP BODS', 'ORG'], ['Big Data Frameworks (', 'ORG'], ['CSS/JavaScript/UI5/Fiori', 'ORG'], ['Visual BI Labs', 'ORG'], ['Internship', 'ORG'], ['Paid Internship  Opportunities', 'ORG'], ['BI & Analytics', 'ORG']]
1 : [['Digital Marketing', 'ORG'], ['EyeCare Services Partners', 'ORG'], ['Digital Marketing', 'ORG'], ['Functions Own', 'PERSON'], ['Leverage', 'ORG'], ['Ove
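Because `en_core_web_sm` has no skill label, several skills come back glued into one long `ORG` span (for example `Oracle / MS SQL EDW / PL/SQL / BODS / SAS / Big Data / Visualization Tools` in the output above). A minimal post-processing sketch, using a hypothetical `split_entity` helper, splits such spans on slashes surrounded by spaces, on commas, and on ` & `. `PL/SQL` survives because its slash has no surrounding spaces, but by the same rule an unspaced list such as `CSS/JavaScript/UI5/Fiori` stays in one piece:

```python
import re

def split_entity(text):
    """Split an over-long NER span into candidate skill strings (hypothetical helper)."""
    # Separators: ' / ' (slash with whitespace), commas, and ' & '
    parts = re.split(r"\s+/\s+|\s*,\s*|\s+&\s+", text)
    return [p.strip() for p in parts if p.strip()]

print(split_entity("Oracle / MS SQL EDW / PL/SQL / BODS / SAS / Big Data / Visualization Tools"))
print(split_entity("Data Warehousing, Visualization, Reporting, Advanced Analytics"))
```

The split pieces could then be compared against a skills vocabulary, since splitting alone does not tell a skill from an ordinary phrase.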

In [None]:
# Hand-built dictionary of skills, tools, and areas of expertise used for rule-based matching
ner_dict = {
    "Skills": [
        "Data Modeling", "Data Warehousing", "Data Management", "Data Visualization",
        "Data Analytics",  "ETL", "Extraction","Transformation", "Loading", "Reporting",
        "Data Science", "Business Intelligence", "BI", "Advanced Analytics",
        "Information Technology", "IT", "Cloud", "CyberSecurity", "Business Objects",
        "Quality Assurance","Testing", "Digital Marketing", "Marketing Analytics",
        "Social Media Analytics", "Marketing Communications", "Data Integration",
        "Cloud Computing", "Machine Learning", "Business Analytics", "Data Mining",
        "Predictive Analytics", "Problem Solving", "Project Management", "Leadership",
        "Task Master", "Analytical Thinking", "Strategic Thinking",
        "Quality & Reliability Engineering", "Data Mining Methods",
        "Predictive Maintenance of Equipment", "Data Warehouse",
        "Leadership and Management Skills", "Financial Planning & Analysis",
        "Risk Management", "Business Risk", "Interpersonal Skills", "Communication Skills",
        "Business Analysis", "Operations Management",
        "Analytics", "Visualization", "Statistical Analysis" ,
        "Risk Management" , "Workflow Ability", "Epidemiology", "Process Improvement",
        "Relationship Management", "Extreme Programming", "Feature Driven Development",
        "Dynamic Systems Development Method",
        "Data Conversion", "Project Management Skills", "Analytical Skills",
    ],
    "Tools": [
        "PL", "SQL", "BODS (BusinessObjects Data Services)", "SAS", "Hadoop", "Spark", "CSS",
        "JavaScript", "UI5", "Fiori", "SAP", "Tableau", "MS", "Oracle", "EDW",
        "Microsoft Office Suite", "Excel", "Word", "PowerPoint", "Access",
        "Microsoft SharePoint", "Microsoft Access", "IBM Cognos", "SQL Server", "Oracle",
        "CMS","Content Management Systems", "CRM", "Customer Relationship Management",
        "Google AdWords", "Adobe Suite", "Microsoft Power BI", "QlikView", "Jaspersoft",
        "Text Mining Tools", "ElasticSearch", "Logstash", "Visual Analytics Tools", "InfoPath",
        "Microsoft Skype for Business", "Microsoft Visio", "Microsoft Outlook", "Yardi",
        "ERP systems", "Design Thinking", "Google Analytics", "SurveyMonkey", "ExactTarget",
        "Twitter", "Facebook", "Balsamiq", "Notepad++", "Macromedia Flash", "Java/J2EE",
        "Drools", "eViews", "MS Project", "ServiceNow","Windows","Linux", "Windows/Linux Environments", "SAN",
        "Avaya", "Polycom VC", "Acme Packet SBC’s & ECB", "QuickBooks", "TurboTax", "Mint.com",
        "SQL", "Map Reduce", "Autocad", "Facebook Updates", "Jira", "MVC", "Excel", "PowerPoint", "Word",
        "JavaScript", "PHP", "Microsoft Visio", "Java", "CSS", "HTML", "Visual BI", "MATLAB", "SPSS", "Scrum",
        "Software-as-a-Service", "SaaS"
    ],
    "Expertise": [
        "Big Data & Analytics Solutions", "Business Intelligence Solutions", "Visualization Tools",
        "Cloud & Big Data (specifically in relation to consulting and managed services)",
        "Cybersecurity Expertise", "Data Warehousing Expertise", "Software Engineering",
        "Cloud Computing Expertise", "Digital Product Marketing Expertise",
        "Marketing Strategy Expertise", "IT/Software Solutions & Consulting Expertise",
        "Project Tracking & Management Expertise", "Cloud, Big Data & Cyber", "Information Technology",
        "Business Intelligence", "Finance", "Health & Manufacturing", "Data Warehouse", "ERP",
        "Business Risk", "Business Strategy", "Financial Analysis", "Consulting & Systems Integration",
        "IT Support", "Tax Intern", "Financial Planning & Analysis", "Retail & Telecommunications",
        "Marketing & Consumer Branding", "Managed Services & BPO", "Marketing and Media Relations",
        "Operations and Strategy", "Health, Manufacturing, Retail, Telecommunications, Transportation",
        "Business Analytics", "Project Management and Finance", "Information Security and Cyber Forensics",
        "Environmental Engineering", "Audit and Tax", "Insurance and Spatial Product Development",
        "Treasury Services", "Data Integration Technologies", "Business & Marketing Strategy",
        "Marketing Research", "Financial Services", "Human Resources & Recruiting",
        "Risk Management & Treasury", "Consumer Brand Analytics", "Corporate Communications",
        "Operations & Logistics Management", "Project Management & Business Analysis",
        "Business Analysis", "Data Engineering", "Data Science Research", "Web Development",
        "Market Research", "Business Development", "Financial Auditing", "Risk Management",
        "Process Safety Management", "Voice Engineering", "E-commerce and IT Solutions",
        "Strategic Communications and PR", "Managed Services Providers", "Data Analytics", "Health Care",
        "General Administration", "Digital Marketing", "Marketing", "General Accounting", "Management",
        "Engineering Technology", "Computer Engineering", "Economics", "Mathematics", "Computer Science",
        "General Agriculture", "Marketing Communications", "Sales Operations"
    ]
}


# Alternative skills list, wrapped in a string literal so it is never executed
'''
skills = {
    "skills": [
    "AB Testing", "ABAP", "Advanced Analytics", "Agile", "Adobe Suite",
    "Analytics", "Big Data", "Big Data Frameworks", "BI", "BODS",
    "Bloomberg", "Business Objects", "Capital IQ", "CIA", "CFE",
    "Cloud", "CMS", "Cognos", "Copywriting", "CRM",
    "CyberSecurity", "Data Analytics", "Data Management", "Data Modeling", "Data Science",
    "Data Visualization", "Data Warehousing", "DBA", "Decision Sciences", "Digital Marketing",
    "EDW", "ETL", "ElasticSearch", "EMR", "Excel",
    "Fiori", "Facebook", "Google Adwords", "Graphic Design", "Hadoop",
    "HCM", "HireVue", "HRIS", "HTML", "H-Lookup",
    "IBM", "iTracker", "Informatica", "Information Systems", "Information Technology",
    "Indexing", "Java", "JavaScript", "Jaspersoft", "Logstash",
    "Mainframe", "Mainframe Disk", "MedeAnalytics", "Metadata Management", "Microsoft Access",
    "Microsoft Office", "Microsoft Outlook", "MS SQL", "NewsletterManage", "Patient Communication Software",
    "Pivot Tables", "PL/SQL", "PMP", "PPC", "PowerPoint",
    "Proof Reading", "Product Marketing", "Project Tracking Software", "QA/Testing", "QlikView",
    "Query", "Reporting", "SAS", "SAP", "SAP BI & Analytics",
    "SAP BW", "SEO", "SharePoint", "Social Media", "Spark",
    "SQL", "SQL Server", "SSRS", "Tableau", "Twitter",
    "UI5", "V-Lookup", "Visual Analytics Tools", "Visual BI", "Visualization",
    "Visualization Tools", "Video Production", "Web Development", "Word", "Android",
    "Big Data Management", "Capital Management", "Cloud Computing", "Cognitive Solutions", "Cyber",
    "Cyber Forensics", "Data Exploration", "Data Mining Methods", "Database Technology", "Design Thinking",
    "Digital Technologies", "ERP", "Environmental Engineering", "ExactTarget", "Google Analytics",
    "Identity Matching", "InfoPath", "IoT", "Jaspersoft", "Logistics & Supply Chain",
    "Machine Learning", "Marketing Analytics", "MDM", "Microsoft Power BI", "Microsoft SharePoint Online",
    "Microsoft Skype for Business", "Microsoft Visio", "MS SharePoint", "OneNote", "Outlook",
    "Pattern Recognition", "Predictive Analytics", "Predictive Maintenance of Equipment", "Project Management", "Quality & Reliability Engineering",
    "QC", "Risk Management", "SAP", "Scala", "SOX",
    "Statistics", "SurveyMonkey", "Text Mining", "Visual Analytics Tools", "Yardi",
    "Aptitude", "Business Analytics", "Business Intelligence", "Digital Platform", "Digital Technology",
    "ETL", "Extraction, Transformation and Loading", "EIT Certification", "Google Analytics", "Languages",
    "Linear", "Microsoft Excel", "Microsoft Office Suite", "Microsoft Projects", "MS Project",
    "Notepad++", "Photoshop", "Process Hazard Analysis", "Process Safety Management", "Risk Management & Workflow",
    "ServiceNow", "StaffTrak", "StartWire", "Task Master", "Access",
    "Balsamiq", "CSS", "CSS3", "Drools", "eBusiness",
    "eCommerce", "eViews", "Hive", "IBM Security", "Installation",
    "Java/J2EE", "Linux", "Macromedia Flash", "Map Reduce", "Matlab",
    "MedeAnalytics", "MSCI IPD", "PowerShell", "Product Development", "QA",
    "QBO", "QuickBooks", "Random Forests", "Scrum", "SPSS",
    "Supply Chain Management", "Systems Hardware", "TurboTax", "User Experience Design", "User Research and Interaction Design",
    "VB.NET", "VB6", "Voice", "Voice Engineering", "VoIP",
    "Windows", "Windows & Network", "XML", "XPath", "XQuery",
    "XSLT"]
}
'''



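`ner_dict` is maintained by hand, and a few entries repeat: "Risk Management" appears twice under `Skills`, tools such as "SQL", "Excel", and "Oracle" appear twice under `Tools`, and terms such as "Business Analysis" sit under both `Skills` and `Expertise`, so a matcher that handles multi-word phrases would report such a span under both labels. A small audit helper (hypothetical, standard library only) can surface these before the patterns are built:

```python
from collections import Counter

def audit_gazetteer(gazetteer):
    """Report case-insensitive duplicates within a category and terms shared across categories."""
    report = {"duplicates": {}, "shared": {}}
    owners = {}  # lowercase term -> list of categories containing it
    for label, terms in gazetteer.items():
        counts = Counter(t.lower() for t in terms)
        dups = sorted(t for t, c in counts.items() if c > 1)
        if dups:
            report["duplicates"][label] = dups
        for t in counts:
            owners.setdefault(t, []).append(label)
    report["shared"] = {t: labels for t, labels in owners.items() if len(labels) > 1}
    return report

# Hypothetical excerpt shaped like ner_dict
demo_gazetteer = {
    "Skills": ["Risk Management", "ETL", "Risk Management", "Business Analysis"],
    "Expertise": ["Business Analysis", "Finance"],
}
print(audit_gazetteer(demo_gazetteer))
```

Running `audit_gazetteer(ner_dict)` would give the full list; deciding which category should own a shared term is left to the analyst.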
In [None]:
import spacy
from spacy.matcher import Matcher

# Load the spaCy model (reloaded here so this cell can run on its own)
nlp = spacy.load("en_core_web_sm")

# Initialize the Matcher with the pipeline's vocab
matcher = Matcher(nlp.vocab)

# Turn every entry in ner_dict into a token pattern under its category label.
# Each pattern is a SINGLE token, so multi-word entries such as "Data Modeling"
# never match; only entries spaCy keeps as one token (e.g. "ETL", "SAP) can match.
for label, words in ner_dict.items():
    for word in words:
        pattern = [{"LOWER": word.lower()}]  # Case-insensitive single-token match
        matcher.add(label, [pattern])

# Dictionary to store extracted entities
job_desc_entities = {}

# Apply the Matcher to each job description
for i in range(size):
    doc = nlp(df["Job Description"][i].lower())
    matches = matcher(doc)

    # Keep each matched text together with its category label
    extracted_entities = [[doc[start:end].text, nlp.vocab.strings[match_id]] for match_id, start, end in matches]

    job_desc_entities[i] = extracted_entities

# Print the results
for key, value in job_desc_entities.items():
    print(key, ":", value)



0 : [['bi', 'Skills'], ['bi', 'Skills'], ['analytics', 'Skills'], ['sap', 'Tools'], ['sap', 'Tools'], ['oracle', 'Tools'], ['ms', 'Tools'], ['sql', 'Tools'], ['pl', 'Tools'], ['sql', 'Tools'], ['sas', 'Tools'], ['visualization', 'Skills'], ['bi', 'Skills'], ['analytics', 'Skills'], ['etl', 'Skills'], ['visualization', 'Skills'], ['reporting', 'Skills'], ['analytics', 'Skills'], ['sap', 'Tools'], ['bi', 'Skills'], ['analytics', 'Skills'], ['bi', 'Skills'], ['bi', 'Skills'], ['analytics', 'Skills'], ['bi', 'Skills'], ['it', 'Skills'], ['bi', 'Skills'], ['etl', 'Skills'], ['bi', 'Skills'], ['bi', 'Skills'], ['sap', 'Tools'], ['sap', 'Tools'], ['sap', 'Tools'], ['hadoop', 'Tools'], ['spark', 'Tools'], ['reporting', 'Skills'], ['css', 'Tools'], ['javascript', 'Tools'], ['ui5', 'Tools'], ['fiori', 'Tools'], ['access', 'Tools'], ['bi', 'Skills'], ['bi', 'Skills'], ['bi', 'Skills'], ['analytics', 'Skills'], ['bi', 'Skills']]
1 : [['marketing', 'Expertise'], ['marketing', 'Expertise'], ['market
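The single-token `Matcher` patterns above only fire for one-word entries: a pattern such as `{"LOWER": "data modeling"}` is compared against one token at a time, and a spaCy token never spans two words, so every multi-word entry in `ner_dict` is silently ignored. One way around this, sketched here assuming spaCy v3 is installed, is spaCy's `PhraseMatcher` with `attr="LOWER"`, which tokenizes each phrase the same way as the text. The blank pipeline `nlp_demo`, the `ner_subset` dictionary, and the `extract` helper are illustrative stand-ins, not the notebook's `nlp` and `ner_dict`:

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

# A blank English pipeline is enough: both matchers only need the tokenizer
nlp_demo = spacy.blank("en")

# Hypothetical excerpt standing in for ner_dict
ner_subset = {
    "Skills": ["Data Modeling", "ETL", "Data Warehousing"],
    "Tools": ["SAP BW", "SQL"],
}

# 1) Single-token pattern whose value contains a space: it can never match
single = Matcher(nlp_demo.vocab)
single.add("Skills", [[{"LOWER": "data modeling"}]])

# 2) PhraseMatcher: each phrase becomes a Doc, compared on the lowercase form
phrase_matcher = PhraseMatcher(nlp_demo.vocab, attr="LOWER")
for label, phrases in ner_subset.items():
    phrase_matcher.add(label, [nlp_demo.make_doc(p) for p in phrases])

def extract(text):
    """Return [matched text, category label] pairs found by the PhraseMatcher."""
    doc = nlp_demo(text)
    return [[doc[start:end].text, nlp_demo.vocab.strings[match_id]]
            for match_id, start, end in phrase_matcher(doc)]

text = "Strong ETL, data modeling and Data Warehousing skills with SAP BW and SQL databases."
print(len(single(nlp_demo(text))))  # 0
print(extract(text))
```

Because matching is done on `LOWER`, there is no need to call `.lower()` on the description first, and the matched text keeps its original casing. Overlapping phrases (for example "Big Data" inside "Big Data Frameworks") would both be reported.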

### Step 4: Identify the Most In-Demand Skills
#### 📌 Goal: Count the most frequently mentioned skills in job descriptions.

1. Create a word frequency distribution of extracted skills.
2. Identify the top 10 most required skills.

In [None]:
from collections import Counter

skills_list = []

# Iterate through job_desc_entities and keep only entries labeled "Skills"
for key, value in  job_desc_entities.items():
    for item in value:
        if item[1] == "Skills":
            skills_list.append(item[0])

# Count the frequency of each skill
skill_freq = Counter(skills_list)

# Get the top 10 most frequent skills
top_10_skills = skill_freq.most_common(10)

# Print the results
print("\nTop 10 Most Required Skills:")
for skill, count in top_10_skills:
    print(f"{skill}: {count}")


Top 10 Most Required Skills:
analytics: 152
it: 130
leadership: 101
reporting: 74
testing: 36
cloud: 34
bi: 28
visualization: 18
transformation: 10
etl: 7
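Some of these counts are probably inflated by case folding. The matching cell lowercases every description and matches on `LOWER`, so the `IT` entry also counts the ordinary pronoun "it", which plausibly explains why `it` ranks second. A minimal sketch of one alternative, assuming that all-caps entries are acronyms that should match case-sensitively while other entries stay case-insensitive, uses a hypothetical `count_terms` helper built on Python's `re` module:

```python
import re
from collections import Counter

def count_terms(texts, terms):
    """Count whole-word occurrences; all-caps terms (acronyms) are matched case-sensitively."""
    counts = Counter()
    for text in texts:
        for term in terms:
            flags = 0 if term.isupper() else re.IGNORECASE
            # Lookarounds stand in for word boundaries so terms like "C++" would also work
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            counts[term] += len(re.findall(pattern, text, flags))
    return counts

texts = ["It is an IT role.", "Leadership matters; it requires leadership and IT skills."]
counts = count_terms(texts, ["IT", "Leadership"])
print(counts["IT"], counts["Leadership"])  # 2 2
```

The same rule would also keep acronyms apart from everyday words elsewhere in `ner_dict`, though an all-caps heading such as "IT IS REQUIRED" would still be miscounted.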


### Step 5: Categorize Skills by Industry
#### 📌 Goal: Compare the most in-demand skills across different industries.

1. Group job descriptions by industry.
2. Extract and analyze skills for each industry.
3. Compare IT vs. Marketing vs. Health Care, etc.

In [None]:
# Map each industry to the list of row indices that belong to it.
# GroupBy.groups gives this mapping directly, without groupby().apply()
industry_indices = {industry: idx.tolist() for industry, idx in df.groupby('industry').groups.items()}

# Print the dictionary to check
print(industry_indices)


{'$500 million to $1 billion (USD) per year': [41, 72], 'Accounting & Legal': [10, 15, 16, 19, 33, 35, 36, 37, 56, 57, 61, 64, 66, 67, 68, 87, 88, 95, 154, 156], 'Aerospace & Defense': [102, 123], 'Arts, Entertainment & Recreation': [39, 59, 70, 90], 'Business Services': [8, 22, 26, 27, 29, 46, 48, 53, 58, 77, 79, 84, 89, 98, 99, 114, 115, 119, 120, 122, 128, 143, 144], 'Company - Public': [148], 'Construction, Repair & Maintenance': [150], 'Finance': [2, 3, 5, 9, 32, 34, 43, 49, 54, 63, 65, 74, 80, 85, 108, 109, 136], 'Health Care': [11, 45, 76, 97, 113, 118, 127, 132, 147, 152], 'Information Technology': [0, 7, 17, 20, 21, 25, 30, 38, 40, 47, 51, 69, 71, 78, 82, 93, 100, 117, 125, 126, 142], 'Insurance': [101, 131, 134, 135, 138], 'Manufacturing': [60, 91, 96, 103, 110, 111, 141, 146, 151, 153, 155], 'Media': [12, 42, 44, 50, 73, 75, 81, 106, 112, 116, 121, 130], 'Mining & Metals': [149], 'Non-Profit': [18], 'Real Estate': [129], 'Retail': [124, 140, 145], 'Subsidiary or Business Seg



In [None]:

# Dictionary to store skills by industry
industry_skills = {}

# Iterate through each industry and its corresponding indices
for industry, indices in industry_indices.items():
    industry_skills[industry] = []

    # Get skills from job_desc_entities using the row indices
    for idx in indices:
        if idx in job_desc_entities:  # Ensure index exists in extracted skills
            skills = [skill[0] for skill in job_desc_entities[idx] if skill[1] == "Skills"]
            industry_skills[industry].extend(skills)


# Remove duplicate skills per industry
for industry in industry_skills:
    industry_skills[industry] = list(set(industry_skills[industry]))

# Print industry skills
for industry, skills in industry_skills.items():
    print(f"{industry}: {skills}", "\n")


$500 million to $1 billion (USD) per year: ['visualization', 'loading', 'extraction', 'reporting', 'leadership', 'transformation'] 

Accounting & Legal: ['cybersecurity', 'etl', 'it', 'visualization', 'analytics', 'loading', 'reporting', 'testing', 'leadership', 'transformation'] 

Aerospace & Defense: ['analytics'] 

Arts, Entertainment & Recreation: ['testing', 'leadership', 'reporting', 'analytics'] 

Business Services: ['it', 'cloud', 'analytics', 'reporting', 'testing', 'leadership', 'transformation'] 

Company - Public: ['reporting', 'analytics'] 

Construction, Repair & Maintenance: ['leadership'] 

Finance: ['it', 'cloud', 'visualization', 'analytics', 'loading', 'extraction', 'reporting', 'bi', 'testing', 'leadership', 'transformation'] 

Health Care: ['etl', 'it', 'visualization', 'analytics', 'reporting', 'testing', 'leadership'] 

Information Technology: ['etl', 'it', 'cloud', 'visualization', 'analytics', 'bi', 'reporting', 'testing', 'leadership'] 

Insurance: ['it', 'epi
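Collapsing each industry's skills into a `set` discards the counts needed to compare industries, and industries with more postings (Business Services has 23 rows, Mining & Metals has 1) naturally accumulate more distinct skills. A hedged sketch of a normalized comparison: for each industry, count how many postings mention each skill at least once and divide by the number of postings. The helper name and the small demo dictionaries are hypothetical; they only mirror the shapes of `industry_indices` and `job_desc_entities`:

```python
from collections import Counter

def skill_share_by_industry(industry_indices, job_desc_entities, label="Skills", k=3):
    """Return, per industry, the top-k skills with the share of postings mentioning them."""
    result = {}
    for industry, indices in industry_indices.items():
        counts = Counter()
        for idx in indices:
            # A set counts each skill at most once per posting
            counts.update({text for text, lab in job_desc_entities.get(idx, []) if lab == label})
        n = max(len(indices), 1)  # guard against an empty industry
        result[industry] = [(skill, c / n) for skill, c in counts.most_common(k)]
    return result

# Hypothetical inputs shaped like the notebook's dictionaries
demo_indices = {"Finance": [0, 1], "Health Care": [2]}
demo_entities = {
    0: [["reporting", "Skills"], ["sql", "Tools"], ["reporting", "Skills"]],
    1: [["analytics", "Skills"], ["reporting", "Skills"]],
    2: [["analytics", "Skills"]],
}
print(skill_share_by_industry(demo_indices, demo_entities, k=2))
```

With the real data this would be called as `skill_share_by_industry(industry_indices, job_desc_entities)`; shares computed from a single posting (as for Mining & Metals) are still too thin to compare meaningfully.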