# NLP in PySpark's MLlib Project

## Fake Job Posting Predictions

Indeed.com has just hired you to create a system that automatically flags suspicious job postings on its website. It has recently seen an influx of fake job postings that is negatively impacting its customer experience. Because of the high volume of job postings it receives every day, its employees do not have the capacity to check every posting, so they would like to prioritize which postings to review before deleting them.

#### Your task
Use the attached dataset with NLP to create an algorithm that automatically flags suspicious posts for review.

#### The data
This dataset contains 18K job descriptions out of which about 800 are fake. The data consists of both textual information and meta-information about the jobs.

**Data Source:** https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction

#### Have fun!

In [1]:
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
# May take a while locally
spark = SparkSession.builder.appName("NLP").getOrCreate()

cores = spark._jsc.sc().getExecutorMemoryStatus().keySet().size()
print("You are working with", cores, "core(s)")
spark

You are working with 1 core(s)


In [2]:
from pyspark.ml.feature import * #CountVectorizer,StringIndexer, RegexTokenizer,StopWordsRemover
from pyspark.sql.functions import * #col, udf,regexp_replace,isnull
from pyspark.sql.types import * #StringType,IntegerType
from pyspark.ml.classification import *
from pyspark.ml.evaluation import *
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# For pipeline development
from pyspark.ml import Pipeline 

In [3]:
df = spark.read.csv("Datasets/fake_job_postings.csv",inferSchema=True,header=True)

In [4]:
df.limit(5).toPandas()

Unnamed: 0,job_id,title,location,department,salary_range,company_profile,description,requirements,benefits,telecommuting,has_company_logo,has_questions,employment_type,required_experience,required_education,industry,function,fraudulent
0,1,Marketing Intern,"US, NY, New York",Marketing,,"We're Food52, and we've created a groundbreaki...","Food52, a fast-growing, James Beard Award-winn...",Experience with content management systems a m...,,0,1,0,Other,Internship,,,Marketing,0
1,2,Customer Service - Cloud Video Production,"NZ, , Auckland",Success,,"90 Seconds, the worlds Cloud Video Production ...",Organised - Focused - Vibrant - Awesome!Do you...,What we expect from you:Your key responsibilit...,What you will get from usThrough being part of...,0,1,0,Full-time,Not Applicable,,Marketing and Advertising,Customer Service,0
2,3,Commissioning Machinery Assistant (CMA),"US, IA, Wever",,,Valor Services provides Workforce Solutions th...,"Our client, located in Houston, is actively se...",Implement pre-commissioning and commissioning ...,,0,1,0,,,,,,0
3,4,Account Executive - Washington DC,"US, DC, Washington",Sales,,Our passion for improving quality of life thro...,THE COMPANY: ESRI – Environmental Systems Rese...,"EDUCATION: Bachelor’s or Master’s in GIS, busi...",Our culture is anything but corporate—we have ...,0,1,0,Full-time,Mid-Senior level,Bachelor's Degree,Computer Software,Sales,0
4,5,Bill Review Manager,"US, FL, Fort Worth",,,SpotSource Solutions LLC is a Global Human Cap...,JOB TITLE: Itemization Review ManagerLOCATION:...,QUALIFICATIONS:RN license in the State of Texa...,Full Benefits Offered,0,1,1,Full-time,Mid-Senior level,Bachelor's Degree,Hospital & Health Care,Health Care Provider,0


In [6]:
df.columns

['job_id',
 'title',
 'location',
 'department',
 'salary_range',
 'company_profile',
 'description',
 'requirements',
 'benefits',
 'telecommuting',
 'has_company_logo',
 'has_questions',
 'employment_type',
 'required_experience',
 'required_education',
 'industry',
 'function',
 'fraudulent']

In [5]:
df.count()

17880

Drop rows where the `fraudulent` label is null.

In [6]:
df= df.na.drop(subset=['fraudulent'])

In [7]:
from pyspark.sql.functions import col

def null_value_calc(df):
    """Return (column name, null count, null percentage) for every column."""
    null_columns_counts = []
    numRows = df.count()
    for k in df.columns:
        nullRows = df.where(col(k).isNull()).count()
        # Record every column, including those without any nulls
        null_columns_counts.append((k, nullRows, (nullRows / numRows) * 100))
    return null_columns_counts

null_columns_calc_list = null_value_calc(df)
spark.createDataFrame(null_columns_calc_list, ['Column_Name', 'Null_Values_Count','Null_Value_Percent']).show()


+-------------------+-----------------+-------------------+
|        Column_Name|Null_Values_Count| Null_Value_Percent|
+-------------------+-----------------+-------------------+
|             job_id|                0|                0.0|
|              title|                0|                0.0|
|           location|              342| 1.9317668323542703|
|         department|            11451|   64.6802982376864|
|       salary_range|            14841|  83.82851333032083|
|    company_profile|             3281| 18.532535020334386|
|        description|                0|                0.0|
|       requirements|             2572| 14.527790329868957|
|           benefits|             6962| 39.324446452779036|
|      telecommuting|               64|0.36150022593764125|
|   has_company_logo|               24|0.13556258472661548|
|      has_questions|               13|0.07342973339358337|
|    employment_type|             3286| 18.560777225485765|
|required_experience|             6696| 

In [8]:
df=df.drop('job_id')

In [9]:
df.columns

['title',
 'location',
 'department',
 'salary_range',
 'company_profile',
 'description',
 'requirements',
 'benefits',
 'telecommuting',
 'has_company_logo',
 'has_questions',
 'employment_type',
 'required_experience',
 'required_education',
 'industry',
 'function',
 'fraudulent']

In [None]:
#class

In [10]:
df.groupBy("fraudulent").count().orderBy(col("count").desc()).toPandas()

Unnamed: 0,fraudulent,count
0,0,16080
1,1,886
2,Full-time,73
3,Hospital & Health Care,55
4,Bachelor's Degree,53
...,...,...
253,with a keen interest in technology and highly...,1
254,"No franchise fee, we do not charge you a franc...",1
255,You are required to :Hold an SVQ in Health and...,1
256,Apple,1


In [11]:
df = df.filter("fraudulent IN('0','1')")
# Make sure it worked
df.groupBy("fraudulent").count().orderBy(col("count").desc()).show(truncate=False)

+----------+-----+
|fraudulent|count|
+----------+-----+
|0         |16080|
|1         |886  |
+----------+-----+



Because the classes are imbalanced (16,080 real postings versus 886 fraudulent ones), accuracy is not a good metric for evaluating this dataset.
We could use precision instead.
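
To make the imbalance concrete, here is a minimal plain-Python sketch (no Spark needed) that scores a hypothetical baseline which never flags anything. The `metrics` helper and the baseline are illustrative assumptions, not part of the notebook's pipeline; only the class counts come from the output above.

```python
# Class counts taken from the groupBy output above.
n_real, n_fake = 16080, 886

def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical baseline: predict "real" (0) for every posting.
baseline = metrics(tp=0, fp=0, tn=n_real, fn=n_fake)
print("accuracy=%.3f precision=%.3f recall=%.3f" % baseline)
```

The baseline reaches roughly 95% accuracy while catching no fraudulent posting at all, which is why a metric focused on the positive class (precision, and arguably recall as well) is more informative here.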

In [12]:
df.select("description").show(10,False)



In [13]:
df= df[['description','fraudulent']]

In [14]:
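# Raw string: an optional leading character, then '#', then a word (e.g. '#URL_...').
# The adjacent quotes in the original literal concatenated to nothing, so the value is unchanged.
pattern = r'(.|)(#)(\w+)'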
split_pattern = r'.*?({pattern})'.format(pattern=pattern)
end_pattern = r'(.*{pattern}).*?$'.format(pattern=pattern)


In [15]:
df= df.withColumn("description_clean",regexp_replace('description', '[^A-Za-z ]+', ''))
        #   .withColumn("description_clean",regexp_replace('description', ' +', ' '))\
        # .withColumn("description_clean",regexp_replace('description', r'''(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))''', ''))\
        # .withhColumn("description_clean",regexp_replace('description',

In [83]:
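# Spark uses Java regex syntax, so JavaScript-style /.../gm delimiters would be matched literally.
# Read from description_clean so the previous cleaning step is not overwritten.
df = df.withColumn("description_clean", regexp_replace('description_clean', r'#\w+', ''))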
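
The effect of the `[^A-Za-z ]+` pattern can be checked locally with Python's `re` module before running it on the full DataFrame. This is only a sketch: the sample string is made up in the style of the dataset, and removing the `#URL_...#` placeholder first is an assumed alternative, not what the notebook does. For simple character classes like these, Python's `re` and Spark's Java regex engine should behave the same.

```python
import re

# Hypothetical sample in the style of the dataset's descriptions.
sample = "Be the face of #URL_874846adb69d# today! Call 555-1234 & apply."

# Same character class as the regexp_replace call on 'description':
# drop everything that is not an ASCII letter or a space.
letters_only = re.sub(r"[^A-Za-z ]+", "", sample)

# Assumed alternative: remove the whole '#URL_...#' placeholder first so its
# hex fragments do not survive as a pseudo-word, then collapse extra spaces.
no_placeholder = re.sub(r"#URL_\w+#", "", sample)
cleaned = re.sub(r"[^A-Za-z ]+", "", no_placeholder)
cleaned = re.sub(r" +", " ", cleaned).strip()

print(repr(letters_only))
print(repr(cleaned))
```

With the letters-only pattern alone, the placeholder leaves behind a token such as `URLadbd`, which matches the `URL...` fragments visible in the `description_clean` output. Whether those fragments help or hurt the classifier is something to verify empirically.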

In [16]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df.limit(10).toPandas()

Unnamed: 0,description,fraudulent,description_clean
0,"Food52, a fast-growing, James Beard Award-winning online food community and crowd-sourced and curated recipe hub, is currently interviewing full- and part-time unpaid interns to work in a small team of editors, executives, and developers in its New York City headquarters.Reproducing and/or repackaging existing Food52 content for a number of partner sites, such as Huffington Post, Yahoo, Buzzfeed, and more in their various content management systemsResearching blogs and websites for the Provisions by Food52 Affiliate ProgramAssisting in day-to-day affiliate program support, such as screening affiliates and assisting in any affiliate inquiriesSupporting with PR &amp; Events when neededHelping with office administrative work, such as filing, mailing, and preparing for meetingsWorking with developers to document bugs and suggest improvements to the siteSupporting the marketing and executive staff",0,Food a fastgrowing James Beard Awardwinning online food community and crowdsourced and curated recipe hub is currently interviewing full and parttime unpaid interns to work in a small team of editors executives and developers in its New York City headquartersReproducing andor repackaging existing Food content for a number of partner sites such as Huffington Post Yahoo Buzzfeed and more in their various content management systemsResearching blogs and websites for the Provisions by Food Affiliate ProgramAssisting in daytoday affiliate program support such as screening affiliates and assisting in any affiliate inquiriesSupporting with PR amp Events when neededHelping with office administrative work such as filing mailing and preparing for meetingsWorking with developers to document bugs and suggest improvements to the siteSupporting the marketing and executive staff
1,"Organised - Focused - Vibrant - Awesome!Do you have a passion for customer service? Slick typing skills? Maybe Account Management? ...And think administration is cooler than a polar bear on a jetski? Then we need to hear you! We are the Cloud Video Production Service and opperating on a glodal level. Yeah, it's pretty cool. Serious about delivering a world class product and excellent customer service.Our rapidly expanding business is looking for a talented Project Manager to manage the successful delivery of video projects, manage client communications and drive the production process. Work with some of the coolest brands on the planet and learn from a global team that are representing NZ is a huge way!We are entering the next growth stage of our business and growing quickly internationally. Therefore, the position is bursting with opportunity for the right person entering the business at the right time. 90 Seconds, the worlds Cloud Video Production Service - http://90#URL_fbe6559afac620a3cd2c22281f7b8d0eef56a73e3d9a311e2f1ca13d081dd630#90 Seconds is the worlds Cloud Video Production Service enabling brands and agencies to get high quality online video content shot and produced anywhere in the world. Fast, affordable, and all managed seamlessly in the cloud from purchase to publish. 90 Seconds removes the hassle, cost, risk and speed issues of working with regular video production companies by managing every aspect of video projects in a beautiful online experience. With a growing network of over 2,000 rated video professionals in over 50 countries and dedicated production success teams in 5 countries guaranteeing video project success 100%. 
It's as easy as commissioning a quick google adwords campaign.90 Seconds has produced almost 4,000 videos in over 30 Countries for over 500 Global brands including some of the worlds largest including Paypal, L'oreal, Sony and Barclays and has offices in Auckland, London, Sydney, Tokyo &amp; Singapore.Our Auckland office is based right in the heart of the Wynyard Quarter Innovation Precinct - GridAKL!",0,Organised Focused Vibrant AwesomeDo you have a passion for customer service Slick typing skills Maybe Account Management And think administration is cooler than a polar bear on a jetski Then we need to hear youWe are the Cloud Video Production Service and opperating on a glodal level Yeah its pretty cool Serious aboutdelivering a world class product and excellent customer serviceOur rapidly expanding business is looking for a talented Project Manager to manage the successful delivery of video projects manage client communications and drive the production process Work with some of the coolest brands on the planet and learn from a global team that are representing NZ is a huge wayWe are entering the next growth stage of our business and growing quickly internationally Therefore the position is bursting with opportunity for the right person entering the business at the right time Seconds the worlds Cloud Video Production Service httpURLfbeafacacdcfbdeefaedaefcaddd Seconds is the worlds Cloud Video Production Service enabling brands and agencies to get high quality online video content shot and produced anywhere in the world Fast affordable and all managed seamlessly in the cloud from purchase to publish Seconds removes the hassle cost risk and speed issues of working with regular video production companies by managing every aspect of video projects in a beautiful online experience With a growing network of over rated video professionals in over countries and dedicated production success teams in countries guaranteeing video project success Its as easy as commissioning 
a quick google adwords campaign Seconds has produced almost videos in over Countries for over Global brands including some of the worlds largest including Paypal Loreal Sony and Barclays and has offices in Auckland London Sydney Tokyo amp SingaporeOur Auckland office is basedright in the heart of the Wynyard Quarter Innovation Precinct GridAKL
2,"Our client, located in Houston, is actively seeking an experienced Commissioning Machinery Assistant that possesses strong supervisory skills and has an attention to detail. A strong dedication to safety is a must. The ideal candidate will execute all activities while complying with quality requirements and health, environmental, and safety regulations.",0,Our client located in Houston is actively seeking an experienced Commissioning Machinery Assistant that possesses strong supervisory skills and has an attention to detail A strong dedication to safety is a must The ideal candidate will execute all activities while complying with quality requirements and health environmental and safety regulations
3,"THE COMPANY: ESRI – Environmental Systems Research InstituteOur passion for improving quality of life through geography is at the heart of everything we do. Esri’s geographic information system (GIS) technology inspires and enables governments, universities and businesses worldwide to save money, lives and our environment through a deeper understanding of the changing world around them.Carefully managed growth and zero debt give Esri stability that is uncommon in today's volatile business world. Privately held, we offer exceptional benefits, competitive salaries, 401(k) and profit-sharing programs, opportunities for personal and professional growth, and much more.THE OPPORTUNITY: Account ExecutiveAs a member of the Sales Division, you will work collaboratively with an account team in order to sell and promote adoption of Esri’s ArcGIS platform within an organization. As part of an account team, you will be responsible for facilitating the development and execution of a set of strategies for a defined portfolio of accounts. When executing these strategies you will utilize your experience in enterprise sales to help customers leverage geospatial information and technology to achieve their business goals. 
Specifically…Prospect and develop opportunities to partner with key stakeholders to envision, develop, and implement a location strategy for their organizationClearly articulate the strength and value proposition of the ArcGIS platformDevelop and maintain a healthy pipeline of opportunities for business growthDemonstrate a thoughtful understanding of insightful industry knowledge and how GIS applies to initiatives, trends, and triggersUnderstand the key business drivers within an organization and identify key business stakeholdersUnderstand your customers’ budgeting and acquisition processesSuccessfully execute the account management process including account prioritization, account resourcing, and account planningSuccessfully execute the sales process for all opportunitiesLeverage and lead an account team consisting of sales and other cross-divisional resources to define and execute an account strategyEffectively utilize and leverage the CRM to manage opportunities and drive the buying processPursue professional and personal development to ensure competitive knowledge of the real estate industryLeverage social media to successfully prospect and build a professional networkParticipate in trade shows, workshops, and seminars (as required)Support visual story telling through effective whiteboard sessionsBe resourceful and takes initiative to resolve issues",0,THE COMPANY ESRI Environmental Systems Research InstituteOur passion for improving quality of life through geography is at the heart of everything we do Esris geographic information system GIS technology inspires and enables governments universities and businesses worldwide to save money lives and our environment through a deeper understanding of the changing world around themCarefully managed growth and zero debt give Esri stability that is uncommon in todays volatile business world Privately held we offer exceptional benefits competitive salaries k and profitsharing programs opportunities for personal and 
professional growth and much moreTHE OPPORTUNITY Account ExecutiveAs a member of the Sales Division you will work collaboratively with an account team in order to sell and promote adoption of Esris ArcGIS platform within an organization As part of an account team you will be responsible for facilitating the development and execution of a set of strategies for a defined portfolio of accounts When executing these strategies you will utilize your experience in enterprise sales to help customers leverage geospatial information and technology to achieve their business goalsSpecificallyProspect and develop opportunities to partner with key stakeholders to envision develop and implement a location strategy for their organizationClearly articulate the strength and value proposition of the ArcGIS platformDevelop and maintain a healthy pipeline of opportunities for business growthDemonstrate a thoughtful understanding of insightful industry knowledge and how GIS applies to initiatives trends and triggersUnderstand the key business drivers within an organization and identify key business stakeholdersUnderstand your customers budgeting and acquisition processesSuccessfully execute the account management process including account prioritization account resourcing and account planningSuccessfully execute the sales process for all opportunitiesLeverage and lead an account team consisting of sales and other crossdivisional resources to define and execute an account strategyEffectively utilize and leverage the CRM to manage opportunities and drive the buying processPursue professional and personal development to ensure competitive knowledge of the real estate industryLeverage social media to successfully prospect and build a professional networkParticipate in trade shows workshops and seminars as requiredSupport visual story telling through effective whiteboard sessionsBe resourceful and takes initiative to resolve issues
4,"JOB TITLE: Itemization Review ManagerLOCATION: Fort Worth, TX DEPARTMENT: Itemization ReviewREPORTS TO: VP Operations GENERAL DESCRIPTION:Responsible for the overall aspects of Itemization Review operations: Personnel Hiring, Quality Control of Process, Workflow, monitoring the tracking of and accountability of staff regarding production standards and department expectations.DUTIES AND RESPONSIBILITIES:Oversee company’s Itemization Review department in its operationsResponsible for encouraging and reinforcing company cultureDevelops processes to better department and implements new procedures/protocols Works with Customer Service on elevated issues and provider callsImplements and Audits policy in conjunction with Policy and Payment Integrity department Monitoring quality/and quality control of results for department Responsible for ensuring overall metrics are in compliance with management and client expectationsResponsible for human resources matters directly related to department supervised (i.e. 
Interviewing, Hiring, Training, annual evaluations, electronic time cards, and addressing personnel issues)May create/review daily, weekly, monthly reports, invoices, logs and expensesAdditional duties/responsibilities as assigned Comply with all safety rules/regulations, in conjunction with the Injury and Illness Prevention Program (“IIPP”), as well as, maintain HIPAA complianceOccasional interaction with customers",0,JOB TITLE Itemization Review ManagerLOCATION Fort Worth TX DEPARTMENT Itemization ReviewREPORTS TO VP Operations GENERAL DESCRIPTIONResponsible for the overall aspects of Itemization Review operations Personnel Hiring Quality Control of Process Workflow monitoring the tracking of and accountability of staff regarding production standards and department expectationsDUTIES AND RESPONSIBILITIESOversee companys Itemization Review department in its operationsResponsible for encouraging and reinforcing company cultureDevelops processes to better department and implements new proceduresprotocols Works with Customer Service on elevated issues and provider callsImplements and Audits policy in conjunction with Policy and Payment Integrity department Monitoring qualityand quality control of results for department Responsible for ensuring overall metrics are in compliance with management and client expectationsResponsible for human resources matters directly related to department supervised ie Interviewing Hiring Training annual evaluations electronic time cards and addressing personnel issuesMay createreview daily weekly monthly reports invoices logs and expensesAdditional dutiesresponsibilities as assigned Comply with all safety rulesregulations in conjunction with the Injury and Illness Prevention Program IIPP as well as maintain HIPAA complianceOccasional interaction with customers
5,"Job OverviewApex is an environmental consulting firm that offers stable leadership and growth and views employees as valuable resources. We are seeking a self-motivated, multi-faceted Accounts Payable Clerk to join our team in Rockville, MD and become an integral part of our continued success story. This position entails processing high volume of invoices and working in a fast pace environment; keying and verifying various types of invoices to General Ledger accounts and job numbers submitted by vendors and company personnel; and calculating balance due to vendor by reviewing history of prior payments made to an account. Candidate must be able to answer vendor and personnel inquiries via phone or email. QualificationsThis position requires a high school diploma and 2-5 years of relevant work experience; keen attention to detail; knowledge of commonly-used concepts, practices, and procedures within the accounting field; experience with accounting software; proficiency in MS Office Suite including advanced Excel experience; and a high degree of professionalism.Want to join a team of talented accounting professionals, engineers, and managers? Submit your resume for consideration today!#URL_f030e16ff4531e87a62857357985e3e8f1fdedb40dbfebfeb0e7e3a5ead65097#About ApexApex is a customer-focused company that delivers environmental, health, safety and engineering services to over 700 clients across the United States and abroad. Driven by an entrepreneurial spirit and a dedication to providing responsive, cost-effective solutions, Apex has grown rapidly since our founding in 1988.Working in partnership with our public and private sector clients, our team of experts provides services tailored to support each customer’s unique goals and objectives. 
By blending strong technical skills, business acumen, and superior customer service, we are able to deliver creative solutions that deliver high quality results at low cost.From commercial and industrial firms to construction, petroleum, and utility companies to financial institutions and government clients, Apex has extensive experience in a wide variety of industries. Our corporate professional resume includes proven capabilities in the areas of water resources, remediation and restoration, assessment and compliance, and industrial hygiene, among others.Ranked in the Top 200 Environmental Firms by ENR Magazine, ranked among the Top 500 Design Firms by ENR Magazine, awarded the 2011 National Environmental Excellence Award for Environmental Stewardship by the National Association of Environmental Professionals, and selected as a 2010 Hot Firm by the Zweig Letter, come join our award winning team.Apex is an entrepreneurial firm, and ensuring that our senior managers are able to move unencumbered is our priority. We are a successful and growing mid-sized firm. We’re small enough that our employees still have access to our leadership, and it’s easy for high-performers to be recognized for their contributions and advance without bureaucracy. With over 30 office locations, we’re big enough to provide comprehensive environmental consulting and engineering services to our diverse client base and to provide resources to our employees to help in their professional development. 
We offer incentive bonus plans and ownership opportunities for our successful managers.Apex Companies, LLC is an Affirmative Action/Equal Opportunity Employer",0,Job OverviewApex is an environmental consulting firm that offers stable leadership and growth and views employees as valuable resources We are seeking a selfmotivated multifaceted Accounts Payable Clerk to join our team in Rockville MD and become an integral part of our continued success story This position entails processing high volume of invoices and working in a fast pace environment keying and verifying various types of invoices to General Ledger accounts and job numbers submitted by vendors and company personnel and calculating balance due to vendor by reviewing history of prior payments made to an account Candidate must be able to answer vendor and personnel inquiries via phone or email QualificationsThis position requires a high school diploma and years of relevant work experience keen attention to detail knowledge of commonlyused concepts practices and procedures within the accounting field experience with accounting software proficiency in MS Office Suite including advanced Excel experience and a high degree of professionalismWant to join a team of talented accounting professionals engineers and managers Submit your resume for consideration todayURLfeffeaeeffdedbdbfebfebeeaeadAbout ApexApex is a customerfocused company that delivers environmental health safety and engineering services to over clients across the United States and abroad Driven by an entrepreneurial spirit and a dedication to providing responsive costeffective solutions Apex has grown rapidly since our founding in Working in partnership with our public and private sector clients our team of experts provides services tailored to support each customers unique goals and objectives By blending strong technical skills business acumen and superior customer service we are able to deliver creative solutions that deliver high quality 
results at low costFrom commercial and industrial firms to construction petroleum and utility companies to financial institutions and government clients Apex has extensive experience in a wide variety of industries Our corporate professional resume includes proven capabilities in the areas of water resources remediation and restoration assessment and compliance and industrial hygiene among othersRanked in the Top Environmental Firms by ENR Magazine ranked among the Top Design Firms by ENR Magazine awarded the National Environmental Excellence Award for Environmental Stewardship by the National Association of Environmental Professionals and selected as a Hot Firm by the Zweig Letter come join our award winning teamApex is an entrepreneurial firm and ensuring that our senior managers are able to move unencumbered is our priority We are a successful and growing midsized firm Were small enough that our employees still have access to our leadership and its easy for highperformers to be recognized for their contributions and advance without bureaucracy With over office locations were big enough to provide comprehensive environmental consulting and engineering services to our diverse client base and to provide resources to our employees to help in their professional development We offer incentive bonus plans and ownership opportunities for our successful managersApex Companies LLC is an Affirmative ActionEqual Opportunity Employer
6,Your Responsibilities: Manage the English-speaking editorial team and build a team of best-in-class editorsSet up content creation schedules and ensure deadlines are adhered toResearch and write about the latest tech topics and news in relation to the Android ecosystemEnsure that the content on the site is of a consistently high qualityBe the face and voice of #URL_874846adb69d98865d05ec57ce2425d9e363ef71e0c8436e59e86a136a508716#,0,Your ResponsibilitiesManage the Englishspeaking editorial team and build a team of bestinclass editorsSet up content creation schedules and ensure deadlines are adhered toResearch and write about the latest tech topics and news in relation to the Android ecosystemEnsure that the content on the site is of a consistently high qualityBe the face and voice of URLadbddeccedeefeceeaa
7,"Who is Airenvy?Hey there! We are seasoned entrepreneurs in the heart of San Francisco’s SOMA neighborhood. We are looking for someone who embodies an entrepreneurial spirit, pays strong attention to detail and wants to be a part of the next big thing. This business can feel like a circus at times, but we have an all-star team with a one of a kind culture. Get a little taste of it here.Airenvy is the #1 technology driven property management company in a multi-billion dollar industry and is revolutionizing the vacation rental space! We are growing at record speed and expanding to new markets! Our platform allows owners to put their vacation rental on autopilot. We are a proven team of startup veterans and would love for you to join the family! In 2014 we were named the #1 Airbnb property management company in San Francisco according to the SF Chronicle. We have 18 supportive and resourceful investors, many of whom are leaders in the technology and real estate industries.The PositionWANTED: Ultimate Peace Keeper &amp; Problem SolverAirenvy is growing faster than we can handle, which is why we’re looking for someone to help us scale! We are seeking best-in-class Lead Guest Service Specialist who are passionate about delighting Guests and Owners. You’ll play a direct role in improving the customer experience, scaling the business, and creating powerful brand advocates.ResponsibilitiesService First - Interact with Guests and Owners daily; listen and address inquiries via phone, email, and chat.Leadership - Set the precedent for writing beautiful, helpful emails and getting to inbox-zero. Be the first to answer the phone and the last to give-up on an interesting escalation.Cross Collaboration - Act as the eyes and ears of the Airenvy business. Speak-to bug requests, new features, and influence the product positively.Ultimate Multitasker - You’re able to manage multiple day-to-day gifts at once. 
You’re able to ensure that each person in contact with Airenvy has a positive experience, even when facing hundreds of emails a day.You?Proven ability to take customers from irate to delightedAble to make decisions quickly; high sense of urgency that spills out to other team membersPassion for delighting people!Thrive under pressure; you’re proactive in recognizing and solving issues before they ariseExcellent written and verbal communication skills -- you spot an error without spell checkFocused on defining and scaling the business thru playbook definition",0,Who is AirenvyHey there We are seasoned entrepreneurs in the heart of San Franciscos SOMA neighborhood We are looking for someone who embodies an entrepreneurial spirit pays strong attention to detail and wants to be a part of the next big thing This business can feel like a circus at times but we have an allstar team with a one of a kind culture Get a little taste of it hereAirenvy is the technology driven property management company in amultibillion dollar industryand is revolutionizing the vacation rental space We are growing at record speed and expanding to new markets Our platform allows owners to put their vacation rental on autopilot We are a proven team of startup veterans and would love for you to join thefamily In we were named the Airbnb property management company in San Francisco according to theSF Chronicle We have supportive and resourceful investors many of whom are leaders in the technology and real estate industriesThe PositionWANTED Ultimate Peace Keeper amp Problem SolverAirenvy is growing faster than we can handle which is why were looking for someone to help us scale We are seeking bestinclass Lead Guest Service Specialist who are passionate about delighting Guests and Owners Youll play a direct role in improving the customer experience scaling the business and creating powerful brand advocatesResponsibilitiesService First Interact with Guests and Owners daily listen and address 
inquiries via phone email and chatLeadership Set the precedent for writing beautiful helpful emails and getting to inboxzero Be the first to answer the phone and the last to giveup on an interesting escalationCross Collaboration Act as the eyes and ears of the Airenvy business Speakto bug requests new features and influence the product positivelyUltimate Multitasker Youre able to manage multiple daytoday gifts at once Youre able to ensure that each person in contact with Airenvy has a positive experience even when facing hundreds of emails a dayYouProven ability to take customers from irate to delightedAble to make decisions quickly high sense of urgency that spills out to other team membersPassion for delighting peopleThrive under pressure youre proactive in recognizing and solving issues before they ariseExcellent written and verbal communication skills you spot an error without spell checkFocused on defining and scaling the business thru playbook definition
8,Implementation/Configuration/Testing/Training on:HP Service Health Reporter,0,ImplementationConfigurationTestingTraining onHP Service Health Reporter
9,"The Customer Service Associate will be based in Phoenix, AZ. The right candidate will be an integral part of our talented team, supporting our continued growth.Responsibilities:Perform various Mail Center activities (sorting, metering, folding, inserting, delivery, pickup, etc.)Lift heavy boxes, files or paper when neededMaintain the highest levels of customer care while demonstrating a friendly and cooperative attitudeDemonstrate flexibility in satisfying customer demands in a high volume, production environmentConsistently adhere to business procedure guidelinesAdhere to all safety proceduresTake direction from supervisor or site managerMaintain all logs and reporting documentation; attention to detailParticipate in cross-training and perform other duties as assigned (Filing, outgoing shipments, etc)Operating mailing, copy or scanning equipmentShipping &amp; ReceivingHandle time-sensitive material like confidential, urgent packagesPerform other tasks as assignedScanning incoming mail to recipientsPerform file purges and pullsCreate files and ship filesProvide backfill when neededEnter information daily into spreadsheetsIdentify charges and match them to billingSort and deliver mail, small packages",0,The Customer Service Associate will be based in Phoenix AZ The right candidate will be an integral part of our talented team supporting our continued growthResponsibilitiesPerform various Mail Center activities sorting metering folding inserting delivery pickup etcLift heavy boxes files or paper when neededMaintain the highest levels of customer care while demonstrating a friendly and cooperative attitudeDemonstrate flexibility in satisfying customer demands in a high volume production environmentConsistently adhere to business procedure guidelinesAdhere to all safety proceduresTake direction from supervisor or site managerMaintain all logs and reporting documentation attention to detailParticipate in crosstraining and perform other duties as assigned Filing 
outgoing shipments etcOperating mailing copy or scanning equipmentShipping amp ReceivingHandle timesensitive material like confidential urgent packagesPerform other tasks as assignedScanning incoming mail to recipientsPerform file purges and pullsCreate files and ship filesProvide backfill when neededEnter information daily into spreadsheetsIdentify charges and match them to billingSort and deliver mail small packages
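
The cleaned column above has lost its punctuation and digits but still keeps the original casing, so `Manage` and `manage` would count as different tokens. Lowercasing is the usual next step. Below is a minimal pure-Python sketch of the two steps together, useful for checking the expected result on a single string without a Spark session. The pattern `[^a-zA-Z\s]` and the helper name `clean_description` are assumptions inferred from the output shown here, not taken from the notebook's own cleaning cell.

```python
import re

# Assumed pattern: keep ASCII letters and whitespace, drop everything else
# (digits, punctuation, symbols). The notebook's actual pattern may differ.
NON_LETTERS = re.compile(r"[^a-zA-Z\s]")


def clean_description(text):
    """Strip non-letter characters, then lowercase the result."""
    return NON_LETTERS.sub("", text).lower()


sample = "Food52, a fast-growing, James Beard Award-winning online food community"
print(clean_description(sample))
```

Because the characters are deleted rather than replaced with a space, hyphenated words fuse (`fast-growing` becomes `fastgrowing`), and sentences that had no space after the full stop run together. Replacing the matches with a single space instead would be one way to keep those word boundaries, at the cost of splitting hyphenated terms.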


In [86]:
# Lowercase the cleaned text so tokens that differ only by case are counted together
df = df.withColumn("description_clean", lower(col("description_clean")))
df.limit(10).toPandas()

Unnamed: 0,description,fraudulent,description_clean
0,"Food52, a fast-growing, James Beard Award-winning online food community and crowd-sourced and curated recipe hub, is currently interviewing full- and part-time unpaid interns to work in a small team of editors, executives, and developers in its New York City headquarters.Reproducing and/or repackaging existing Food52 content for a number of partner sites, such as Huffington Post, Yahoo, Buzzfeed, and more in their various content management systemsResearching blogs and websites for the Provisions by Food52 Affiliate ProgramAssisting in day-to-day affiliate program support, such as screening affiliates and assisting in any affiliate inquiriesSupporting with PR &amp; Events when neededHelping with office administrative work, such as filing, mailing, and preparing for meetingsWorking with developers to document bugs and suggest improvements to the siteSupporting the marketing and executive staff",0,food a fastgrowing james beard awardwinning online food community and crowdsourced and curated recipe hub is currently interviewing full and parttime unpaid interns to work in a small team of editors executives and developers in its new york city headquartersreproducing andor repackaging existing food content for a number of partner sites such as huffington post yahoo buzzfeed and more in their various content management systemsresearching blogs and websites for the provisions by food affiliate programassisting in daytoday affiliate program support such as screening affiliates and assisting in any affiliate inquiriessupporting with pr amp events when neededhelping with office administrative work such as filing mailing and preparing for meetingsworking with developers to document bugs and suggest improvements to the sitesupporting the marketing and executive staff
1,"Organised - Focused - Vibrant - Awesome!Do you have a passion for customer service? Slick typing skills? Maybe Account Management? ...And think administration is cooler than a polar bear on a jetski? Then we need to hear you! We are the Cloud Video Production Service and opperating on a glodal level. Yeah, it's pretty cool. Serious about delivering a world class product and excellent customer service.Our rapidly expanding business is looking for a talented Project Manager to manage the successful delivery of video projects, manage client communications and drive the production process. Work with some of the coolest brands on the planet and learn from a global team that are representing NZ is a huge way!We are entering the next growth stage of our business and growing quickly internationally. Therefore, the position is bursting with opportunity for the right person entering the business at the right time. 90 Seconds, the worlds Cloud Video Production Service - http://90#URL_fbe6559afac620a3cd2c22281f7b8d0eef56a73e3d9a311e2f1ca13d081dd630#90 Seconds is the worlds Cloud Video Production Service enabling brands and agencies to get high quality online video content shot and produced anywhere in the world. Fast, affordable, and all managed seamlessly in the cloud from purchase to publish. 90 Seconds removes the hassle, cost, risk and speed issues of working with regular video production companies by managing every aspect of video projects in a beautiful online experience. With a growing network of over 2,000 rated video professionals in over 50 countries and dedicated production success teams in 5 countries guaranteeing video project success 100%. 
It's as easy as commissioning a quick google adwords campaign.90 Seconds has produced almost 4,000 videos in over 30 Countries for over 500 Global brands including some of the worlds largest including Paypal, L'oreal, Sony and Barclays and has offices in Auckland, London, Sydney, Tokyo &amp; Singapore.Our Auckland office is based right in the heart of the Wynyard Quarter Innovation Precinct - GridAKL!",0,organised focused vibrant awesomedo you have a passion for customer service slick typing skills maybe account management and think administration is cooler than a polar bear on a jetski then we need to hear youwe are the cloud video production service and opperating on a glodal level yeah its pretty cool serious aboutdelivering a world class product and excellent customer serviceour rapidly expanding business is looking for a talented project manager to manage the successful delivery of video projects manage client communications and drive the production process work with some of the coolest brands on the planet and learn from a global team that are representing nz is a huge waywe are entering the next growth stage of our business and growing quickly internationally therefore the position is bursting with opportunity for the right person entering the business at the right time seconds the worlds cloud video production service httpurlfbeafacacdcfbdeefaedaefcaddd seconds is the worlds cloud video production service enabling brands and agencies to get high quality online video content shot and produced anywhere in the world fast affordable and all managed seamlessly in the cloud from purchase to publish seconds removes the hassle cost risk and speed issues of working with regular video production companies by managing every aspect of video projects in a beautiful online experience with a growing network of over rated video professionals in over countries and dedicated production success teams in countries guaranteeing video project success its as easy as commissioning 
a quick google adwords campaign seconds has produced almost videos in over countries for over global brands including some of the worlds largest including paypal loreal sony and barclays and has offices in auckland london sydney tokyo amp singaporeour auckland office is basedright in the heart of the wynyard quarter innovation precinct gridakl
2,"Our client, located in Houston, is actively seeking an experienced Commissioning Machinery Assistant that possesses strong supervisory skills and has an attention to detail. A strong dedication to safety is a must. The ideal candidate will execute all activities while complying with quality requirements and health, environmental, and safety regulations.",0,our client located in houston is actively seeking an experienced commissioning machinery assistant that possesses strong supervisory skills and has an attention to detail a strong dedication to safety is a must the ideal candidate will execute all activities while complying with quality requirements and health environmental and safety regulations
3,"THE COMPANY: ESRI – Environmental Systems Research InstituteOur passion for improving quality of life through geography is at the heart of everything we do. Esri’s geographic information system (GIS) technology inspires and enables governments, universities and businesses worldwide to save money, lives and our environment through a deeper understanding of the changing world around them.Carefully managed growth and zero debt give Esri stability that is uncommon in today's volatile business world. Privately held, we offer exceptional benefits, competitive salaries, 401(k) and profit-sharing programs, opportunities for personal and professional growth, and much more.THE OPPORTUNITY: Account ExecutiveAs a member of the Sales Division, you will work collaboratively with an account team in order to sell and promote adoption of Esri’s ArcGIS platform within an organization. As part of an account team, you will be responsible for facilitating the development and execution of a set of strategies for a defined portfolio of accounts. When executing these strategies you will utilize your experience in enterprise sales to help customers leverage geospatial information and technology to achieve their business goals. 
Specifically…Prospect and develop opportunities to partner with key stakeholders to envision, develop, and implement a location strategy for their organizationClearly articulate the strength and value proposition of the ArcGIS platformDevelop and maintain a healthy pipeline of opportunities for business growthDemonstrate a thoughtful understanding of insightful industry knowledge and how GIS applies to initiatives, trends, and triggersUnderstand the key business drivers within an organization and identify key business stakeholdersUnderstand your customers’ budgeting and acquisition processesSuccessfully execute the account management process including account prioritization, account resourcing, and account planningSuccessfully execute the sales process for all opportunitiesLeverage and lead an account team consisting of sales and other cross-divisional resources to define and execute an account strategyEffectively utilize and leverage the CRM to manage opportunities and drive the buying processPursue professional and personal development to ensure competitive knowledge of the real estate industryLeverage social media to successfully prospect and build a professional networkParticipate in trade shows, workshops, and seminars (as required)Support visual story telling through effective whiteboard sessionsBe resourceful and takes initiative to resolve issues",0,the company esri environmental systems research instituteour passion for improving quality of life through geography is at the heart of everything we do esris geographic information system gis technology inspires and enables governments universities and businesses worldwide to save money lives and our environment through a deeper understanding of the changing world around themcarefully managed growth and zero debt give esri stability that is uncommon in todays volatile business world privately held we offer exceptional benefits competitive salaries k and profitsharing programs opportunities for personal and 
professional growth and much morethe opportunity account executiveas a member of the sales division you will work collaboratively with an account team in order to sell and promote adoption of esris arcgis platform within an organization as part of an account team you will be responsible for facilitating the development and execution of a set of strategies for a defined portfolio of accounts when executing these strategies you will utilize your experience in enterprise sales to help customers leverage geospatial information and technology to achieve their business goalsspecificallyprospect and develop opportunities to partner with key stakeholders to envision develop and implement a location strategy for their organizationclearly articulate the strength and value proposition of the arcgis platformdevelop and maintain a healthy pipeline of opportunities for business growthdemonstrate a thoughtful understanding of insightful industry knowledge and how gis applies to initiatives trends and triggersunderstand the key business drivers within an organization and identify key business stakeholdersunderstand your customers budgeting and acquisition processessuccessfully execute the account management process including account prioritization account resourcing and account planningsuccessfully execute the sales process for all opportunitiesleverage and lead an account team consisting of sales and other crossdivisional resources to define and execute an account strategyeffectively utilize and leverage the crm to manage opportunities and drive the buying processpursue professional and personal development to ensure competitive knowledge of the real estate industryleverage social media to successfully prospect and build a professional networkparticipate in trade shows workshops and seminars as requiredsupport visual story telling through effective whiteboard sessionsbe resourceful and takes initiative to resolve issues
4,"JOB TITLE: Itemization Review ManagerLOCATION: Fort Worth, TX DEPARTMENT: Itemization ReviewREPORTS TO: VP Operations GENERAL DESCRIPTION:Responsible for the overall aspects of Itemization Review operations: Personnel Hiring, Quality Control of Process, Workflow, monitoring the tracking of and accountability of staff regarding production standards and department expectations.DUTIES AND RESPONSIBILITIES:Oversee company’s Itemization Review department in its operationsResponsible for encouraging and reinforcing company cultureDevelops processes to better department and implements new procedures/protocols Works with Customer Service on elevated issues and provider callsImplements and Audits policy in conjunction with Policy and Payment Integrity department Monitoring quality/and quality control of results for department Responsible for ensuring overall metrics are in compliance with management and client expectationsResponsible for human resources matters directly related to department supervised (i.e. 
Interviewing, Hiring, Training, annual evaluations, electronic time cards, and addressing personnel issues)May create/review daily, weekly, monthly reports, invoices, logs and expensesAdditional duties/responsibilities as assigned Comply with all safety rules/regulations, in conjunction with the Injury and Illness Prevention Program (“IIPP”), as well as, maintain HIPAA complianceOccasional interaction with customers",0,job title itemization review managerlocation fort worth tx department itemization reviewreports to vp operations general descriptionresponsible for the overall aspects of itemization review operations personnel hiring quality control of process workflow monitoring the tracking of and accountability of staff regarding production standards and department expectationsduties and responsibilitiesoversee companys itemization review department in its operationsresponsible for encouraging and reinforcing company culturedevelops processes to better department and implements new proceduresprotocols works with customer service on elevated issues and provider callsimplements and audits policy in conjunction with policy and payment integrity department monitoring qualityand quality control of results for department responsible for ensuring overall metrics are in compliance with management and client expectationsresponsible for human resources matters directly related to department supervised ie interviewing hiring training annual evaluations electronic time cards and addressing personnel issuesmay createreview daily weekly monthly reports invoices logs and expensesadditional dutiesresponsibilities as assigned comply with all safety rulesregulations in conjunction with the injury and illness prevention program iipp as well as maintain hipaa complianceoccasional interaction with customers
5,"Job OverviewApex is an environmental consulting firm that offers stable leadership and growth and views employees as valuable resources. We are seeking a self-motivated, multi-faceted Accounts Payable Clerk to join our team in Rockville, MD and become an integral part of our continued success story. This position entails processing high volume of invoices and working in a fast pace environment; keying and verifying various types of invoices to General Ledger accounts and job numbers submitted by vendors and company personnel; and calculating balance due to vendor by reviewing history of prior payments made to an account. Candidate must be able to answer vendor and personnel inquiries via phone or email. QualificationsThis position requires a high school diploma and 2-5 years of relevant work experience; keen attention to detail; knowledge of commonly-used concepts, practices, and procedures within the accounting field; experience with accounting software; proficiency in MS Office Suite including advanced Excel experience; and a high degree of professionalism.Want to join a team of talented accounting professionals, engineers, and managers? Submit your resume for consideration today!#URL_f030e16ff4531e87a62857357985e3e8f1fdedb40dbfebfeb0e7e3a5ead65097#About ApexApex is a customer-focused company that delivers environmental, health, safety and engineering services to over 700 clients across the United States and abroad. Driven by an entrepreneurial spirit and a dedication to providing responsive, cost-effective solutions, Apex has grown rapidly since our founding in 1988.Working in partnership with our public and private sector clients, our team of experts provides services tailored to support each customer’s unique goals and objectives. 
By blending strong technical skills, business acumen, and superior customer service, we are able to deliver creative solutions that deliver high quality results at low cost.From commercial and industrial firms to construction, petroleum, and utility companies to financial institutions and government clients, Apex has extensive experience in a wide variety of industries. Our corporate professional resume includes proven capabilities in the areas of water resources, remediation and restoration, assessment and compliance, and industrial hygiene, among others.Ranked in the Top 200 Environmental Firms by ENR Magazine, ranked among the Top 500 Design Firms by ENR Magazine, awarded the 2011 National Environmental Excellence Award for Environmental Stewardship by the National Association of Environmental Professionals, and selected as a 2010 Hot Firm by the Zweig Letter, come join our award winning team.Apex is an entrepreneurial firm, and ensuring that our senior managers are able to move unencumbered is our priority. We are a successful and growing mid-sized firm. We’re small enough that our employees still have access to our leadership, and it’s easy for high-performers to be recognized for their contributions and advance without bureaucracy. With over 30 office locations, we’re big enough to provide comprehensive environmental consulting and engineering services to our diverse client base and to provide resources to our employees to help in their professional development. 
We offer incentive bonus plans and ownership opportunities for our successful managers.Apex Companies, LLC is an Affirmative Action/Equal Opportunity Employer",0,job overviewapex is an environmental consulting firm that offers stable leadership and growth and views employees as valuable resources we are seeking a selfmotivated multifaceted accounts payable clerk to join our team in rockville md and become an integral part of our continued success story this position entails processing high volume of invoices and working in a fast pace environment keying and verifying various types of invoices to general ledger accounts and job numbers submitted by vendors and company personnel and calculating balance due to vendor by reviewing history of prior payments made to an account candidate must be able to answer vendor and personnel inquiries via phone or email qualificationsthis position requires a high school diploma and years of relevant work experience keen attention to detail knowledge of commonlyused concepts practices and procedures within the accounting field experience with accounting software proficiency in ms office suite including advanced excel experience and a high degree of professionalismwant to join a team of talented accounting professionals engineers and managers submit your resume for consideration todayurlfeffeaeeffdedbdbfebfebeeaeadabout apexapex is a customerfocused company that delivers environmental health safety and engineering services to over clients across the united states and abroad driven by an entrepreneurial spirit and a dedication to providing responsive costeffective solutions apex has grown rapidly since our founding in working in partnership with our public and private sector clients our team of experts provides services tailored to support each customers unique goals and objectives by blending strong technical skills business acumen and superior customer service we are able to deliver creative solutions that deliver high quality 
results at low costfrom commercial and industrial firms to construction petroleum and utility companies to financial institutions and government clients apex has extensive experience in a wide variety of industries our corporate professional resume includes proven capabilities in the areas of water resources remediation and restoration assessment and compliance and industrial hygiene among othersranked in the top environmental firms by enr magazine ranked among the top design firms by enr magazine awarded the national environmental excellence award for environmental stewardship by the national association of environmental professionals and selected as a hot firm by the zweig letter come join our award winning teamapex is an entrepreneurial firm and ensuring that our senior managers are able to move unencumbered is our priority we are a successful and growing midsized firm were small enough that our employees still have access to our leadership and its easy for highperformers to be recognized for their contributions and advance without bureaucracy with over office locations were big enough to provide comprehensive environmental consulting and engineering services to our diverse client base and to provide resources to our employees to help in their professional development we offer incentive bonus plans and ownership opportunities for our successful managersapex companies llc is an affirmative actionequal opportunity employer
6,Your Responsibilities: Manage the English-speaking editorial team and build a team of best-in-class editorsSet up content creation schedules and ensure deadlines are adhered toResearch and write about the latest tech topics and news in relation to the Android ecosystemEnsure that the content on the site is of a consistently high qualityBe the face and voice of #URL_874846adb69d98865d05ec57ce2425d9e363ef71e0c8436e59e86a136a508716#,0,your responsibilitiesmanage the englishspeaking editorial team and build a team of bestinclass editorsset up content creation schedules and ensure deadlines are adhered toresearch and write about the latest tech topics and news in relation to the android ecosystemensure that the content on the site is of a consistently high qualitybe the face and voice of urladbddeccedeefeceeaa
7,"Who is Airenvy?Hey there! We are seasoned entrepreneurs in the heart of San Francisco’s SOMA neighborhood. We are looking for someone who embodies an entrepreneurial spirit, pays strong attention to detail and wants to be a part of the next big thing. This business can feel like a circus at times, but we have an all-star team with a one of a kind culture. Get a little taste of it here.Airenvy is the #1 technology driven property management company in a multi-billion dollar industry and is revolutionizing the vacation rental space! We are growing at record speed and expanding to new markets! Our platform allows owners to put their vacation rental on autopilot. We are a proven team of startup veterans and would love for you to join the family! In 2014 we were named the #1 Airbnb property management company in San Francisco according to the SF Chronicle. We have 18 supportive and resourceful investors, many of whom are leaders in the technology and real estate industries.The PositionWANTED: Ultimate Peace Keeper &amp; Problem SolverAirenvy is growing faster than we can handle, which is why we’re looking for someone to help us scale! We are seeking best-in-class Lead Guest Service Specialist who are passionate about delighting Guests and Owners. You’ll play a direct role in improving the customer experience, scaling the business, and creating powerful brand advocates.ResponsibilitiesService First - Interact with Guests and Owners daily; listen and address inquiries via phone, email, and chat.Leadership - Set the precedent for writing beautiful, helpful emails and getting to inbox-zero. Be the first to answer the phone and the last to give-up on an interesting escalation.Cross Collaboration - Act as the eyes and ears of the Airenvy business. Speak-to bug requests, new features, and influence the product positively.Ultimate Multitasker - You’re able to manage multiple day-to-day gifts at once. 
You’re able to ensure that each person in contact with Airenvy has a positive experience, even when facing hundreds of emails a day.You?Proven ability to take customers from irate to delightedAble to make decisions quickly; high sense of urgency that spills out to other team membersPassion for delighting people!Thrive under pressure; you’re proactive in recognizing and solving issues before they ariseExcellent written and verbal communication skills -- you spot an error without spell checkFocused on defining and scaling the business thru playbook definition",0,who is airenvyhey there we are seasoned entrepreneurs in the heart of san franciscos soma neighborhood we are looking for someone who embodies an entrepreneurial spirit pays strong attention to detail and wants to be a part of the next big thing this business can feel like a circus at times but we have an allstar team with a one of a kind culture get a little taste of it hereairenvy is the technology driven property management company in amultibillion dollar industryand is revolutionizing the vacation rental space we are growing at record speed and expanding to new markets our platform allows owners to put their vacation rental on autopilot we are a proven team of startup veterans and would love for you to join thefamily in we were named the airbnb property management company in san francisco according to thesf chronicle we have supportive and resourceful investors many of whom are leaders in the technology and real estate industriesthe positionwanted ultimate peace keeper amp problem solverairenvy is growing faster than we can handle which is why were looking for someone to help us scale we are seeking bestinclass lead guest service specialist who are passionate about delighting guests and owners youll play a direct role in improving the customer experience scaling the business and creating powerful brand advocatesresponsibilitiesservice first interact with guests and owners daily listen and address 
inquiries via phone email and chatleadership set the precedent for writing beautiful helpful emails and getting to inboxzero be the first to answer the phone and the last to giveup on an interesting escalationcross collaboration act as the eyes and ears of the airenvy business speakto bug requests new features and influence the product positivelyultimate multitasker youre able to manage multiple daytoday gifts at once youre able to ensure that each person in contact with airenvy has a positive experience even when facing hundreds of emails a dayyouproven ability to take customers from irate to delightedable to make decisions quickly high sense of urgency that spills out to other team memberspassion for delighting peoplethrive under pressure youre proactive in recognizing and solving issues before they ariseexcellent written and verbal communication skills you spot an error without spell checkfocused on defining and scaling the business thru playbook definition
8,Implementation/Configuration/Testing/Training on:HP Service Health Reporter,0,implementationconfigurationtestingtraining onhp service health reporter
9,"The Customer Service Associate will be based in Phoenix, AZ. The right candidate will be an integral part of our talented team, supporting our continued growth.Responsibilities:Perform various Mail Center activities (sorting, metering, folding, inserting, delivery, pickup, etc.)Lift heavy boxes, files or paper when neededMaintain the highest levels of customer care while demonstrating a friendly and cooperative attitudeDemonstrate flexibility in satisfying customer demands in a high volume, production environmentConsistently adhere to business procedure guidelinesAdhere to all safety proceduresTake direction from supervisor or site managerMaintain all logs and reporting documentation; attention to detailParticipate in cross-training and perform other duties as assigned (Filing, outgoing shipments, etc)Operating mailing, copy or scanning equipmentShipping &amp; ReceivingHandle time-sensitive material like confidential, urgent packagesPerform other tasks as assignedScanning incoming mail to recipientsPerform file purges and pullsCreate files and ship filesProvide backfill when neededEnter information daily into spreadsheetsIdentify charges and match them to billingSort and deliver mail, small packages",0,the customer service associate will be based in phoenix az the right candidate will be an integral part of our talented team supporting our continued growthresponsibilitiesperform various mail center activities sorting metering folding inserting delivery pickup etclift heavy boxes files or paper when neededmaintain the highest levels of customer care while demonstrating a friendly and cooperative attitudedemonstrate flexibility in satisfying customer demands in a high volume production environmentconsistently adhere to business procedure guidelinesadhere to all safety procedurestake direction from supervisor or site managermaintain all logs and reporting documentation attention to detailparticipate in crosstraining and perform other duties as assigned filing 
outgoing shipments etcoperating mailing copy or scanning equipmentshipping amp receivinghandle timesensitive material like confidential urgent packagesperform other tasks as assignedscanning incoming mail to recipientsperform file purges and pullscreate files and ship filesprovide backfill when neededenter information daily into spreadsheetsidentify charges and match them to billingsort and deliver mail small packages


In [17]:
regex_tokenizer = RegexTokenizer(inputCol="description_clean", outputCol="words", pattern="\\W")
raw_words = regex_tokenizer.transform(df)
raw_words.show(2,False)
raw_words.printSchema() 

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In [18]:
remover = StopWordsRemover(inputCol="words", outputCol="filtered")
stopwords = remover.getStopWords() 


In [19]:
words_df = remover.transform(raw_words)
words_df.limit(1).toPandas()

Unnamed: 0,description,fraudulent,description_clean,words,filtered
0,"Food52, a fast-growing, James Beard Award-winning online food community and crowd-sourced and curated recipe hub, is currently interviewing full- and part-time unpaid interns to work in a small team of editors, executives, and developers in its New York City headquarters.Reproducing and/or repackaging existing Food52 content for a number of partner sites, such as Huffington Post, Yahoo, Buzzfeed, and more in their various content management systemsResearching blogs and websites for the Provisions by Food52 Affiliate ProgramAssisting in day-to-day affiliate program support, such as screening affiliates and assisting in any affiliate inquiriesSupporting with PR &amp; Events when neededHelping with office administrative work, such as filing, mailing, and preparing for meetingsWorking with developers to document bugs and suggest improvements to the siteSupporting the marketing and executive staff",0,Food a fastgrowing James Beard Awardwinning online food community and crowdsourced and curated recipe hub is currently interviewing full and parttime unpaid interns to work in a small team of editors executives and developers in its New York City headquartersReproducing andor repackaging existing Food content for a number of partner sites such as Huffington Post Yahoo Buzzfeed and more in their various content management systemsResearching blogs and websites for the Provisions by Food Affiliate ProgramAssisting in daytoday affiliate program support such as screening affiliates and assisting in any affiliate inquiriesSupporting with PR amp Events when neededHelping with office administrative work such as filing mailing and preparing for meetingsWorking with developers to document bugs and suggest improvements to the siteSupporting the marketing and executive staff,"[food, a, fastgrowing, james, beard, awardwinning, online, food, community, and, crowdsourced, and, curated, recipe, hub, is, currently, interviewing, full, and, parttime, unpaid, interns, to, work, in, a, 
small, team, of, editors, executives, and, developers, in, its, new, york, city, headquartersreproducing, andor, repackaging, existing, food, content, for, a, number, of, partner, sites, such, as, huffington, post, yahoo, buzzfeed, and, more, in, their, various, content, management, systemsresearching, blogs, and, websites, for, the, provisions, by, food, affiliate, programassisting, in, daytoday, affiliate, program, support, such, as, screening, affiliates, and, assisting, in, any, affiliate, inquiriessupporting, with, pr, amp, events, when, neededhelping, with, office, administrative, work, ...]","[food, fastgrowing, james, beard, awardwinning, online, food, community, crowdsourced, curated, recipe, hub, currently, interviewing, full, parttime, unpaid, interns, work, small, team, editors, executives, developers, new, york, city, headquartersreproducing, andor, repackaging, existing, food, content, number, partner, sites, huffington, post, yahoo, buzzfeed, various, content, management, systemsresearching, blogs, websites, provisions, food, affiliate, programassisting, daytoday, affiliate, program, support, screening, affiliates, assisting, affiliate, inquiriessupporting, pr, amp, events, neededhelping, office, administrative, work, filing, mailing, preparing, meetingsworking, developers, document, bugs, suggest, improvements, sitesupporting, marketing, executive, staff]"
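
As a rough illustration of what the tokenizer and stop word remover above do to a description, the sketch below reproduces both steps in plain Python. It is not Spark's implementation: `simple_tokenize`, `remove_stop_words`, and the tiny `STOP_WORDS` set are hypothetical stand-ins (Spark's default English list is much longer), and `re.ASCII` is used on the assumption that it approximates how `\W` behaves in Spark's Java regex engine.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "and", "is", "to", "in", "of", "the"}

def simple_tokenize(text, pattern=r"\W", min_token_length=1):
    """Lowercase, split wherever the pattern matches, drop too-short tokens."""
    tokens = re.split(pattern, text.lower(), flags=re.ASCII)
    return [t for t in tokens if len(t) >= min_token_length]

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t not in stop_words]

sample = "Food a fastgrowing James Beard Awardwinning online food community"
words = simple_tokenize(sample)
filtered = remove_stop_words(words)
print(words)
print(filtered)
```

Splitting on every non-word character produces empty strings between adjacent separators, which is why the minimum-length filter is part of the tokenizer in this sketch.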


In [20]:
indexer = StringIndexer(inputCol="fraudulent", outputCol="label")
feature_data = indexer.fit(words_df).transform(words_df)
feature_data.show(5)
feature_data.printSchema()

+--------------------+----------+--------------------+--------------------+--------------------+-----+
|         description|fraudulent|   description_clean|               words|            filtered|label|
+--------------------+----------+--------------------+--------------------+--------------------+-----+
|Food52, a fast-gr...|         0|Food a fastgrowin...|[food, a, fastgro...|[food, fastgrowin...|  0.0|
|Organised - Focus...|         0|Organised  Focuse...|[organised, focus...|[organised, focus...|  0.0|
|Our client, locat...|         0|Our client locate...|[our, client, loc...|[client, located,...|  0.0|
|THE COMPANY: ESRI...|         0|THE COMPANY ESRI ...|[the, company, es...|[company, esri, e...|  0.0|
|JOB TITLE: Itemiz...|         0|JOB TITLE Itemiza...|[job, title, item...|[job, title, item...|  0.0|
+--------------------+----------+--------------------+--------------------+--------------------+-----+
only showing top 5 rows

root
 |-- description: string (nullable = true)
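
The `label` column above is 0.0 for the non-fraudulent rows because `StringIndexer` orders values by descending frequency by default, so the majority class receives index 0. A minimal plain-Python sketch of that ordering follows; `fit_string_indexer` is a hypothetical helper, and the alphabetical tie-break is an assumption of this sketch.

```python
from collections import Counter

def fit_string_indexer(values):
    """Map each distinct value to an index, most frequent value first."""
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically (assumption).
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}

fraudulent = ["0", "0", "1", "0", "0", "1", "0"]
mapping = fit_string_indexer(fraudulent)
labels = [mapping[v] for v in fraudulent]
print(mapping)
print(labels)
```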


In [21]:
regex_tokenizer = RegexTokenizer(inputCol="description_clean", outputCol="words", pattern="\\W")
# raw_words = regex_tokenizer.transform(df)

# Remove stop words
# Use getOutputCol() so the remover always reads the tokenizer's output column, whatever it is named
remover = StopWordsRemover(inputCol=regex_tokenizer.getOutputCol(), outputCol="filtered")
# words_df = remover.transform(raw_words)

# Zero Index Label Column
indexer = StringIndexer(inputCol="fraudulent", outputCol="label")
# feature_data = indexer.fit(words_df).transform(words_df)

# Create the Pipeline
pipeline = Pipeline(stages=[regex_tokenizer,remover,indexer])
# fit() walks the stages in order: estimators (here the StringIndexer) are fit, transformers are used as-is
data_prep_pl = pipeline.fit(df)
# print(type(data_prep_pl))
# print(" ")
# Now call on the Pipeline to get our final df
feature_data = data_prep_pl.transform(df)
feature_data.show(1,False)

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+----------------------------------------------------------------------------------
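
To make the fit/transform flow of the pipeline above more concrete, here is a toy sketch in plain Python. The classes and the `fit_pipeline` and `transform_pipeline` helpers are hypothetical and only mimic the idea: a stage with a `fit` method is treated as an estimator and replaced by its fitted model, while a stage with only `transform` is used as-is. In the notebook's pipeline, `RegexTokenizer` and `StopWordsRemover` play the transformer role and `StringIndexer` the estimator role.

```python
class LowercaseTransformer:
    """Toy transformer: only has transform()."""
    def transform(self, rows):
        return [r.lower() for r in rows]

class VocabularyModel:
    """Toy fitted model: maps known words to their vocabulary index."""
    def __init__(self, vocab):
        self.index = {w: i for i, w in enumerate(vocab)}

    def transform(self, rows):
        return [[self.index[w] for w in r.split() if w in self.index] for r in rows]

class VocabularyEstimator:
    """Toy estimator: fit() learns a vocabulary and returns a fitted model."""
    def fit(self, rows):
        vocab = sorted({w for r in rows for w in r.split()})
        return VocabularyModel(vocab)

def fit_pipeline(stages, rows):
    """Fit estimators in order, feeding each stage the previous stage's output."""
    fitted = []
    for stage in stages:
        if hasattr(stage, "fit"):   # estimator: learn from the data first
            stage = stage.fit(rows)
        fitted.append(stage)        # transformer or fitted model
        rows = stage.transform(rows)
    return fitted

def transform_pipeline(fitted, rows):
    """Apply every fitted stage in order."""
    for stage in fitted:
        rows = stage.transform(rows)
    return rows

train_rows = ["Remote DATA entry", "Data analyst"]
fitted_stages = fit_pipeline([LowercaseTransformer(), VocabularyEstimator()], train_rows)
encoded = transform_pipeline(fitted_stages, ["DATA entry clerk"])
print(encoded)
```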

In [22]:
# Hashing TF
hashingTF = HashingTF(inputCol="filtered", outputCol="rawfeatures", numFeatures=20)
HTFfeaturizedData = hashingTF.transform(feature_data)
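
The hashing step above maps every token to one of `numFeatures` buckets and counts occurrences per bucket. The sketch below shows the idea in plain Python; `hashing_tf` is a hypothetical helper and `zlib.crc32` is a stand-in hash chosen for this example (Spark uses MurmurHash3), so the bucket indices will not match Spark's.

```python
import zlib

def hashing_tf(tokens, num_features=20):
    """Count tokens into a fixed-length vector using the hashing trick."""
    vector = [0.0] * num_features
    for token in tokens:
        # Stand-in hash for illustration; Spark's HashingTF uses MurmurHash3.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1.0
    return vector

tokens = ["food", "fastgrowing", "james", "beard", "food"]
vec = hashing_tf(tokens)
print(vec)
```

With only 20 buckets, many distinct words necessarily share a bucket once the vocabulary exceeds 20 words. That keeps this notebook fast, but it is far below the `HashingTF` default of 2^18 features, so raising `numFeatures` is worth trying.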


In [23]:
# TF-IDF
idf = IDF(inputCol="rawfeatures", outputCol="features")
idfModel = idf.fit(HTFfeaturizedData)
TFIDFfeaturizedData = idfModel.transform(HTFfeaturizedData)
TFIDFfeaturizedData.name = 'TFIDFfeaturizedData'
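
For reference, the IDF weight that Spark documents for a term is log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term. A small plain-Python sketch under that formula, with hypothetical helper names and toy term-frequency vectors:

```python
import math

def fit_idf(tf_vectors):
    """Return one IDF weight per feature index: log((m + 1) / (df + 1))."""
    m = len(tf_vectors)
    num_features = len(tf_vectors[0])
    doc_freq = [sum(1 for v in tf_vectors if v[j] > 0) for j in range(num_features)]
    return [math.log((m + 1) / (df + 1)) for df in doc_freq]

def apply_idf(tf_vector, idf):
    """Scale each term frequency by the IDF weight of its feature index."""
    return [tf * w for tf, w in zip(tf_vector, idf)]

# Toy term-frequency vectors: feature 0 occurs in every document.
tf_vectors = [
    [2.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 3.0],
]
idf = fit_idf(tf_vectors)
tfidf = [apply_idf(v, idf) for v in tf_vectors]
print(idf)
print(tfidf)
```

A feature that appears in every document gets a weight of zero, so it no longer contributes to the `features` column.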

In [24]:
#rename the HTF features to features to be consistent
HTFfeaturizedData = HTFfeaturizedData.withColumnRenamed("rawfeatures","features")
HTFfeaturizedData.name = 'HTFfeaturizedData' #We will use later for printing

In [25]:
# Word2Vec
word2Vec = Word2Vec(vectorSize=3, minCount=0, inputCol="filtered", outputCol="features")
model = word2Vec.fit(feature_data)

W2VfeaturizedData = model.transform(feature_data)
# W2VfeaturizedData.show(1,False)
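
The fitted Word2Vec model turns each document into a single vector by averaging the vectors of its words. The sketch below illustrates that averaging in plain Python. The embedding values are invented for the example, `document_vector` is a hypothetical helper, and dividing by the total token count is an assumption of this sketch.

```python
def document_vector(tokens, word_vectors, vector_size=3):
    """Average the word vectors of a document (zero vector if it is empty)."""
    total = [0.0] * vector_size
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is not None:
            total = [t + v for t, v in zip(total, vector)]
    if not tokens:
        return total
    return [t / len(tokens) for t in total]

# Made-up 3-dimensional embeddings, matching vectorSize=3 in the cell above.
word_vectors = {
    "remote": [0.5, -1.0, 0.0],
    "data": [-0.5, 0.0, 1.0],
}
doc = document_vector(["remote", "data"], word_vectors)
print(doc)
```

Because individual word vectors can have negative components, the averaged document vectors can too, which is what the rescaling in the next cell addresses.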

In [26]:
# Word2Vec features typically contain negative values, so rescale them to [0, 1] here so that the Naive Bayes classifier can be used
scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")

# Compute summary statistics and generate MinMaxScalerModel
scalerModel = scaler.fit(W2VfeaturizedData)

# rescale each feature to range [min, max].
scaled_data = scalerModel.transform(W2VfeaturizedData)
W2VfeaturizedData = scaled_data.select('fraudulent','description_clean','label','scaledFeatures')
W2VfeaturizedData = W2VfeaturizedData.withColumnRenamed('scaledFeatures','features')

W2VfeaturizedData.name = 'W2VfeaturizedData' # We will need this to print later
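
Min-max scaling rescales each feature column independently with (x - min) / (max - min), then stretches the result to the target range. A minimal plain-Python sketch with hypothetical helper names and toy rows, using the same default [0, 1] range as the cell above:

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def min_max_scale(row, mins, maxs, lo=0.0, hi=1.0):
    """Rescale one row column by column into [lo, hi]."""
    scaled = []
    for x, e_min, e_max in zip(row, mins, maxs):
        if e_max == e_min:
            # Constant column: fall back to the middle of the target range.
            scaled.append(0.5 * (hi + lo))
        else:
            scaled.append((x - e_min) / (e_max - e_min) * (hi - lo) + lo)
    return scaled

rows = [[-1.0, 0.2, 5.0], [0.0, 0.4, 5.0], [1.0, -0.6, 5.0]]
mins, maxs = fit_min_max(rows)
scaled_rows = [min_max_scale(r, mins, maxs) for r in rows]
print(scaled_rows)
```

After scaling, every value is non-negative, which is the property the Naive Bayes classifier needs.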

In [36]:
def ClassTrainEval(classifier,features,classes,train,test):

    def FindMtype(classifier):
        # The classifier is passed in already instantiated
        M = classifier
        # Learn what it is
        Mtype = type(M).__name__
        
        return Mtype
    
    Mtype = FindMtype(classifier)
    

    def IntanceFitModel(Mtype,classifier,classes,features,train):
        
        if Mtype == "OneVsRest":
            # instantiate the base classifier.
            lr = LogisticRegression()
            # instantiate the One Vs Rest Classifier.
            OVRclassifier = OneVsRest(classifier=lr)
#             fitModel = OVRclassifier.fit(train)
            # Add parameters of your choice here:
            paramGrid = ParamGridBuilder() \
                .addGrid(lr.regParam, [0.1, 0.01]) \
                .build()
            #Cross Validator requires the following parameters:
            crossval = CrossValidator(estimator=OVRclassifier,
                                      estimatorParamMaps=paramGrid,
                                      evaluator=MulticlassClassificationEvaluator(metricName="weightedPrecision"),
                                      numFolds=2) # 3 is best practice
            # Run cross-validation, and choose the best set of parameters.
            fitModel = crossval.fit(train)
            return fitModel
        if Mtype == "MultilayerPerceptronClassifier":
            # specify layers for the neural network:
            # input layer of size features, two intermediate of features+1 and same size as features
            # and output of size number of classes
            # Note: crossvalidator cannot be used here
            features_count = len(features[0][0])
            layers = [features_count, features_count+1, features_count, classes]
            MPC_classifier = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
            fitModel = MPC_classifier.fit(train)
            return fitModel
        if Mtype in("LinearSVC","GBTClassifier") and classes != 2: # These classifiers currently only accept binary classification
            print(Mtype," could not be used because PySpark currently only accepts binary classification data for this algorithm")
            return
        if Mtype in("LogisticRegression","NaiveBayes","RandomForestClassifier","GBTClassifier","LinearSVC","DecisionTreeClassifier"):
  
            # Add parameters of your choice here:
            if Mtype == "LogisticRegression":
                paramGrid = (ParamGridBuilder()
#                              .addGrid(classifier.regParam, [0.1, 0.01])
                             .addGrid(classifier.maxIter, [10, 15, 20])
                             .build())

            # Add parameters of your choice here:
            if Mtype == "NaiveBayes":
                paramGrid = (ParamGridBuilder()
                             .addGrid(classifier.smoothing, [0.0, 0.2, 0.4, 0.6])
                             .build())

            # Add parameters of your choice here:
            if Mtype == "RandomForestClassifier":
                paramGrid = (ParamGridBuilder()
                             .addGrid(classifier.maxDepth, [2, 5, 10])
#                                .addGrid(classifier.maxBins, [5, 10, 20])
#                                .addGrid(classifier.numTrees, [5, 20, 50])
                             .build())

            # Add parameters of your choice here:
            if Mtype == "GBTClassifier":
                paramGrid = (ParamGridBuilder()
#                              .addGrid(classifier.maxDepth, [2, 5, 10, 20, 30])
#                              .addGrid(classifier.maxBins, [10, 20, 40, 80, 100])
                             .addGrid(classifier.maxIter, [10, 15, 50, 100])
                             .build())

            # Add parameters of your choice here:
            if Mtype == "LinearSVC":
                paramGrid = (ParamGridBuilder()
                             .addGrid(classifier.maxIter, [10, 15])
                             .addGrid(classifier.regParam, [0.1, 0.01])
                             .build())

            # Add parameters of your choice here:
            if Mtype == "DecisionTreeClassifier":
                paramGrid = (ParamGridBuilder()
#                              .addGrid(classifier.maxDepth, [2, 5, 10, 20, 30])
                             .addGrid(classifier.maxBins, [10, 20, 40, 80, 100])
                             .build())
            
            #Cross Validator requires all of the following parameters:
            crossval = CrossValidator(estimator=classifier,
                                      estimatorParamMaps=paramGrid,
                                      evaluator=MulticlassClassificationEvaluator(metricName="weightedPrecision"),
                                      numFolds=2) # 3 + is best practice
            # Fit Model: Run cross-validation, and choose the best set of parameters.
            fitModel = crossval.fit(train)
            return fitModel
    
    fitModel = IntanceFitModel(Mtype,classifier,classes,features,train)
    
    # Print feature selection metrics
    if fitModel is not None:
        
        if Mtype == "OneVsRest":
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype + '\033[0m')
            # Extract list of binary models
            models = BestModel.models
            for model in models:
                print('\033[1m' + 'Intercept: '+ '\033[0m',model.intercept,'\033[1m' + '\nCoefficients:'+ '\033[0m',model.coefficients)

        if Mtype == "MultilayerPerceptronClassifier":
            print("")
            print('\033[1m' + Mtype," Weights"+ '\033[0m')
            print('\033[1m' + "Model Weights: "+ '\033[0m',fitModel.weights.size)
            print("")

        if Mtype in("DecisionTreeClassifier", "GBTClassifier","RandomForestClassifier"):
            # FEATURE IMPORTANCES
            # Estimate of the importance of each feature.
            # Each feature’s importance is the average of its importance across all trees
            # in the ensemble. The importance vector is normalized to sum to 1.
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Feature Importances"+ '\033[0m')
            print("(Scores add up to 1)")
            print("Lowest score is the least important")
            print(" ")
            print(BestModel.featureImportances)
            
            if Mtype == "DecisionTreeClassifier":
                global DT_featureimportances
                DT_featureimportances = BestModel.featureImportances.toArray()
                global DT_BestModel
                DT_BestModel = BestModel
            if Mtype == "GBTClassifier":
                global GBT_featureimportances
                GBT_featureimportances = BestModel.featureImportances.toArray()
                global GBT_BestModel
                GBT_BestModel = BestModel
            if Mtype == "RandomForestClassifier":
                global RF_featureimportances
                RF_featureimportances = BestModel.featureImportances.toArray()
                global RF_BestModel
                RF_BestModel = BestModel

        if Mtype == "LogisticRegression":
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Coefficient Matrix"+ '\033[0m')
            print("You should compare these relative to each other")
            print("Coefficients: \n" + str(BestModel.coefficientMatrix))
            print("Intercept: " + str(BestModel.interceptVector))
            global LR_coefficients
            LR_coefficients = BestModel.coefficientMatrix.toArray()
            global LR_BestModel
            LR_BestModel = BestModel

        if Mtype == "LinearSVC":
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Coefficients"+ '\033[0m')
            print("You should compare these relative to each other")
            print("Coefficients: \n" + str(BestModel.coefficients))
            global LSVC_coefficients
            LSVC_coefficients = BestModel.coefficients.toArray()
            global LSVC_BestModel
            LSVC_BestModel = BestModel
        
   
    # Set the column names to match the external results dataframe that we will join with later:
    columns = ['Classifier', 'Result']
    
    if Mtype in("LinearSVC","GBTClassifier") and classes != 2:
        Mtype = [Mtype] # make this a list
        score = ["N/A"]
        result = spark.createDataFrame(zip(Mtype,score), schema=columns)
    else:
        predictions = fitModel.transform(test)
        MC_evaluator = MulticlassClassificationEvaluator(metricName="weightedPrecision") # predictionCol defaults to "prediction"
        weightedPrecision = (MC_evaluator.evaluate(predictions))*100
        Mtype = [Mtype] # make this a list
        score = [str(weightedPrecision)] # convert to a string and wrap in a list
        result = spark.createDataFrame(zip(Mtype,score), schema=columns)
        result = result.withColumn('Result',result.Result.substr(0, 5))
        
    return result
    # Note: coefficients and feature importances are not returned; they are printed above and stored in the global variables
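
Since every model above is scored with `weightedPrecision`, it helps to see how that number behaves on imbalanced data such as this one. The sketch below computes per-class precision weighted by each class's share of the true labels, in plain Python. `weighted_precision` is a hypothetical helper, the 19-to-1 class split is a made-up example rather than this dataset, and treating a class with no predictions as precision 0 is an assumption of this sketch.

```python
from collections import Counter

def weighted_precision(labels, predictions):
    """Per-class precision, weighted by each class's share of the true labels."""
    total = len(labels)
    true_counts = Counter(labels)
    predicted_counts = Counter(predictions)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    score = 0.0
    for cls, n_true in true_counts.items():
        n_pred = predicted_counts[cls]
        # A class that is never predicted contributes a precision of 0.
        precision = correct[cls] / n_pred if n_pred else 0.0
        score += precision * n_true / total
    return score

labels = [0.0] * 19 + [1.0]      # toy data: 1 fake posting out of 20
always_real = [0.0] * 20         # a model that never flags anything
baseline = weighted_precision(labels, always_real)
print(baseline)
```

In this toy case a model that never flags a posting still scores roughly 0.90, so results in the 90s on the real data should be compared against that kind of baseline before being read as good fraud detection.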

In [37]:
classifiers = [
                LogisticRegression()
                ,OneVsRest()
               ,LinearSVC()
               ,NaiveBayes()
               ,RandomForestClassifier()
               ,GBTClassifier()
               ,DecisionTreeClassifier()
               ,MultilayerPerceptronClassifier()
              ] 

featureDF_list = [HTFfeaturizedData,TFIDFfeaturizedData,W2VfeaturizedData]
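
Before running every classifier on every feature set, it can help to estimate how many model fits the cross-validation inside `ClassTrainEval` triggers. A parameter grid is the Cartesian product of the candidate values, and cross-validation fits each combination once per fold and then refits the best one on the full training set. The helpers below are hypothetical plain-Python stand-ins for that bookkeeping, not Spark APIs.

```python
from itertools import product

def build_param_grid(**param_values):
    """Expand lists of candidate values into one dict per combination."""
    names = list(param_values)
    return [dict(zip(names, combo)) for combo in product(*param_values.values())]

def count_fits(grid, num_folds):
    """One fit per (combination, fold), plus a final refit on all training data."""
    return len(grid) * num_folds + 1

# Same candidate values as the LinearSVC grid in ClassTrainEval.
svc_grid = build_param_grid(maxIter=[10, 15], regParam=[0.1, 0.01])
svc_fits = count_fits(svc_grid, num_folds=2)
print(len(svc_grid), svc_fits)
```

Adding a second parameter to a grid multiplies the number of fits rather than adding to it, which is worth keeping in mind when uncommenting the extra `addGrid` lines.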

In [38]:
for featureDF in featureDF_list:
    print('\033[1m' + featureDF.name," Results:"+ '\033[0m')
    train, test = featureDF.randomSplit([0.7, 0.3],seed = 11)
    # Only the first row is needed to learn the feature vector length, so avoid collecting the whole column
    features = featureDF.select(['features']).take(1)
    # Count the distinct labels so that binary-only classifiers can be skipped for multiclass data
    class_count = featureDF.select(countDistinct("label")).collect()
    classes = class_count[0][0]

    #set up your results table
    columns = ['Classifier', 'Result']
    vals = [("Place Holder","N/A")]
    results = spark.createDataFrame(vals, columns)

    for classifier in classifiers:
        new_result = ClassTrainEval(classifier,features,classes,train,test)
        results = results.union(new_result)
    results = results.where("Classifier!='Place Holder'")
    results.show(truncate=False)  # show() prints the table itself and returns None

HTFfeaturizedData  Results:
 
LogisticRegression  Coefficient Matrix
You should compare these relative to each other
Coefficients: 
DenseMatrix([[-0.01170364, -0.04768261,  0.01013503,  0.02740925, -0.01913824,
               0.01536457,  0.07952389, -0.02083674, -0.02914787,  0.02260531,
               0.05001001, -0.0969196 , -0.04019701, -0.01803998, -0.05144809,
               0.06990738,  0.01323246, -0.00941786,  0.04943548, -0.01689926]])
Intercept: [-2.8152765200594425]
 
OneVsRest
Intercept: 2.8342656883072688
Coefficients: [0.00135610212504612,0.008287707669846605,-0.000858273800606015,-0.003138700352281226,0.0025046901079538025,-0.00021981836610409795,-0.01368057676268631,0.004071365074123304,0.004319930603511382,-0.0021904126830626603,-0.004849356170794193,0.014028191152090513,0.005605574125358937,0.0021195677296970907,0.00873956151731301,-0.010155724499234155,-0.00010828822542453547,0.0013789157285042082,-0.00532259574094582,0.0024