
_Lambda School Data Science_

# Make features

Objectives
- Understand the purpose of feature engineering
- Work with strings in pandas
- Work with dates and times in pandas

Links
- [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)
- Python Data Science Handbook
  - [Chapter 3.10](https://jakevdp.github.io/PythonDataScienceHandbook/03.10-working-with-strings.html), Vectorized String Operations
  - [Chapter 3.11](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html), Working with Time Series

## Get LendingClub data

[Source](https://www.lendingclub.com/info/download-data.action)

In [2]:
!wget https://resources.lendingclub.com/LoanStats_2018Q3.csv.zip

--2019-01-17 23:03:48--  https://resources.lendingclub.com/LoanStats_2018Q3.csv.zip
Resolving resources.lendingclub.com (resources.lendingclub.com)... 64.48.1.20
Connecting to resources.lendingclub.com (resources.lendingclub.com)|64.48.1.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘LoanStats_2018Q3.csv.zip’

LoanStats_2018Q3.cs     [         <=>        ]  21.42M  1.66MB/s    in 13s     

2019-01-17 23:04:02 (1.65 MB/s) - ‘LoanStats_2018Q3.csv.zip’ saved [22461905]



In [3]:
!unzip LoanStats_2018Q3.csv.zip

Archive:  LoanStats_2018Q3.csv.zip
  inflating: LoanStats_2018Q3.csv    


In [4]:
!head LoanStats_2018Q3.csv

Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
"id","member_id","loan_amnt","funded_amnt","funded_amnt_inv","term","int_rate","installment","grade","sub_grade","emp_title","emp_length","home_ownership","annual_inc","verification_status","issue_d","loan_status","pymnt_plan","url","desc","purpose","title","zip_code","addr_state","dti","delinq_2yrs","earliest_cr_line","inq_last_6mths","mths_since_last_delinq","mths_since_last_record","open_acc","pub_rec","revol_bal","revol_util","total_acc","initial_list_status","out_prncp","out_prncp_inv","total_pymnt","total_pymnt_inv","total_rec_prncp","total_rec_int","total_rec_late_fee","recoveries","collection_recovery_fee","last_pymnt_d","last_pymnt_amnt","next_pymnt_d","last_credit_pull_d","collections_12_mths_ex_med","mths_since_last_major_derog","policy_code","application_type","annual_inc_joint","dti_joint","verification_status_joint","acc_now_delinq","tot_coll_amt","tot_cur_bal","open_acc_6m","open_act_il","op

In [0]:
!tail LoanStats_2018Q3.csv

## Load LendingClub data

pandas documentation
- [`read_csv`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
- [`options.display`](https://pandas.pydata.org/pandas-docs/stable/options.html#available-options)

In [5]:
import pandas as pd

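# Skip the first line of the file (a prospectus notice, not the header) and the last two lines.
# skipfooter is only supported by the Python parsing engine, so request it explicitly.
df = pd.read_csv('LoanStats_2018Q3.csv', skipfooter=2, skiprows=1, engine='python')

df.shape

df.head()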


Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,...,hardship_payoff_balance_amount,hardship_last_payment_amount,disbursement_method,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,,,20000,20000,20000,60 months,17.97%,507.55,D,D1,...,,,DirectPay,N,,,,,,
1,,,25000,25000,25000,60 months,13.56%,576.02,C,C1,...,,,Cash,N,,,,,,
2,,,30000,30000,30000,36 months,18.94%,1098.78,D,D2,...,,,Cash,N,,,,,,
3,,,6000,6000,6000,36 months,7.84%,187.58,A,A4,...,,,DirectPay,N,,,,,,
4,,,10650,10650,10650,36 months,7.84%,332.95,A,A4,...,,,Cash,N,,,,,,
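To see what `skiprows` and `skipfooter` do in isolation, here is a minimal sketch on an in-memory file. The miniature file contents (one preamble line, a header, two data rows, two trailing summary lines) are hypothetical and only imitate the layout suggested by the `head` output above.

```python
import io
import pandas as pd

# Hypothetical miniature file, made up for illustration.
raw = io.StringIO(
    "Notes offered by Prospectus\n"
    "loan_amnt,term,int_rate\n"
    "20000, 60 months, 17.97%\n"
    "6000, 36 months, 7.84%\n"
    "Total amount funded: 26000\n"
    "Total rows: 2\n"
)

# skiprows=1 drops the preamble so the second line becomes the header.
# skipfooter=2 drops the two trailing lines; it needs the Python engine.
toy = pd.read_csv(raw, skiprows=1, skipfooter=2, engine="python")

print(toy.shape)
print(toy.columns.tolist())
```

Passing `engine="python"` explicitly avoids the parser warning pandas emits when it has to fall back from the default C engine.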


In [0]:
pd.options.display.max_columns = 500

In [0]:
df.head().T

## Work with strings

In [0]:
pd.options.display.max_rows = 500

In [0]:
df.head().T

Most machine learning libraries, scikit-learn included, expect numeric input, so we usually want to replace strings with numbers.

In [0]:
import numpy as np

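def all_numeric(df):
    # select_dtypes matches every numeric subtype (int64, float64, ...) plus bool;
    # comparing dtypes to np.number with == does not reliably match integer columns.
    return df.select_dtypes(include=['number', 'bool']).shape[1] == df.shape[1]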

def no_nulls(df):
    return not any(df.isnull().sum())

def ready_for_sklearn(df):
    return all_numeric(df) and no_nulls(df)
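As a rough illustration of what these checks look for, the sketch below reports which columns are non-numeric and how many values are missing. The two toy frames and the helper name `summarize` are hypothetical and not part of the lesson data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames, not taken from the LendingClub file.
raw = pd.DataFrame({"int_rate": ["17.97%", "7.84%"], "dti": [21.3, np.nan]})
clean = pd.DataFrame({"int_rate": [17.97, 7.84], "dti": [21.3, 0.0]})

def summarize(frame):
    """Return (names of non-numeric columns, total count of missing values)."""
    non_numeric = frame.select_dtypes(exclude=["number", "bool"]).columns.tolist()
    n_missing = int(frame.isnull().sum().sum())
    return non_numeric, n_missing

print(summarize(raw))    # (['int_rate'], 1)
print(summarize(clean))  # ([], 0)
```

A frame is ready in the sense used here only when the first element is empty and the second is zero.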

We can get info about which columns have a datatype of "object" (strings)

In [10]:
df.select_dtypes('object').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128194 entries, 0 to 128193
Data columns (total 37 columns):
term                         128194 non-null object
int_rate                     128194 non-null object
grade                        128194 non-null object
sub_grade                    128194 non-null object
emp_title                    114757 non-null object
emp_length                   117807 non-null object
home_ownership               128194 non-null object
verification_status          128194 non-null object
issue_d                      128194 non-null object
loan_status                  128194 non-null object
pymnt_plan                   128194 non-null object
purpose                      128194 non-null object
title                        128194 non-null object
zip_code                     128194 non-null object
addr_state                   128194 non-null object
earliest_cr_line             128194 non-null object
revol_util                   128065 non-null object
initi

### Convert `int_rate`

Define a function to remove percent signs from strings and convert to floats

In [11]:
string = '17.97%'

float(string.replace('%',''))

def remove_percent(string):
  return float(string.strip('%'))

remove_percent(string)

17.97

Apply the function to the `int_rate` column

In [0]:
df['int_rate'] = df['int_rate'].apply(remove_percent)

In [13]:
df.int_rate.head()

0    17.97
1    13.56
2    18.94
3     7.84
4     7.84
Name: int_rate, dtype: float64
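The same conversion can also be written with the vectorized `.str` accessor instead of `apply`. This is a minimal sketch on hypothetical sample values in the same format as `int_rate`, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical sample values formatted like the int_rate column.
rates = pd.Series(["17.97%", "13.56%", " 7.84%"])

# Strip surrounding whitespace, drop the trailing percent sign, then cast to float.
rates_float = rates.str.strip().str.rstrip("%").astype(float)

print(rates_float.tolist())  # [17.97, 13.56, 7.84]
```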

### Clean `emp_title`

Look at top 20 titles

In [14]:
df.select_dtypes('object').info()

df['emp_title'].value_counts().head(20)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128194 entries, 0 to 128193
Data columns (total 36 columns):
term                         128194 non-null object
grade                        128194 non-null object
sub_grade                    128194 non-null object
emp_title                    114757 non-null object
emp_length                   117807 non-null object
home_ownership               128194 non-null object
verification_status          128194 non-null object
issue_d                      128194 non-null object
loan_status                  128194 non-null object
pymnt_plan                   128194 non-null object
purpose                      128194 non-null object
title                        128194 non-null object
zip_code                     128194 non-null object
addr_state                   128194 non-null object
earliest_cr_line             128194 non-null object
revol_util                   128065 non-null object
initial_list_status          128194 non-null object
last_

Teacher               2294
Manager               2075
Owner                 1231
Driver                1089
Registered Nurse       944
Supervisor             810
RN                     757
Sales                  726
Project Manager        637
General Manager        548
Office Manager         542
Director               482
owner                  398
Engineer               383
Truck Driver           367
Operations Manager     366
President              350
Sales Manager          323
Supervisor             321
Server                 319
Name: emp_title, dtype: int64

How often is `emp_title` null?

In [15]:
df['emp_title'].isnull().sum()

13437

Clean the title and handle missing values

In [0]:
examples = ['owner','Supervisor ', ' Project Manager',42,np.nan]

def clean_title(x):
  if isinstance(x, str):
    return x.strip().title()
  else:
    return 'Unknown'

for example in examples:
  print(clean_title(example))

In [0]:
df['emp_title'] = df['emp_title'].apply(clean_title)

In [18]:
df['emp_title'].value_counts().head(20)

Unknown                     13437
Teacher                      2843
Manager                      2749
Owner                        1856
Driver                       1498
Registered Nurse             1386
Supervisor                   1345
Sales                         980
Truck Driver                  921
Rn                            905
Office Manager                846
Project Manager               835
General Manager               809
Director                      585
Operations Manager            516
Sales Manager                 510
Engineer                      474
Store Manager                 466
Administrative Assistant      466
President                     464
Name: emp_title, dtype: int64

### Create `emp_title_manager`

pandas documentation: [`str.contains`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html)

In [0]:
df['emp_title_manager'] = df['emp_title'].str.contains('Manager')


In [20]:
df['emp_title_manager'].value_counts()

False    109498
True      18696
Name: emp_title_manager, dtype: int64
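Two `str.contains` parameters are worth knowing when building flags like this one: `case` controls case sensitivity and `na` sets the result for missing values. The sketch below uses hypothetical titles to show the difference; the lesson column has already been title-cased and has no missing values, so the default call works there.

```python
import numpy as np
import pandas as pd

titles = pd.Series(["Project Manager", "manager", "Teacher", np.nan], dtype=object)

# Default: case-sensitive match, and a missing title yields a missing flag.
default_flags = titles.str.contains("Manager")

# case=False also matches 'manager'; na=False turns the missing entry into False,
# which gives a plain boolean column.
strict_flags = titles.str.contains("manager", case=False, na=False)

print(default_flags.tolist())
print(strict_flags.tolist())  # [True, True, False, False]
```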

## Work with dates

pandas documentation
- [to_datetime](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html)
- [Time/Date Components](https://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components) "You can access these properties via the `.dt` accessor"

In [21]:
df['issue_d'].head().values

array(['Sep-2018', 'Sep-2018', 'Sep-2018', 'Sep-2018', 'Sep-2018'],
      dtype=object)

In [0]:
df['issue_d']= pd.to_datetime(df['issue_d'],infer_datetime_format=True)

In [0]:
df['issue_year'] = df['issue_d'].dt.year
df['issue_month'] = df['issue_d'].dt.month

In [24]:
df['issue_month'].sample(10)

14365     9
38015     9
62214     8
108416    7
89450     7
102158    7
107328    7
42303     8
106517    7
37129     9
Name: issue_month, dtype: int64
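When the date layout is known, an explicit `format` string is an alternative to letting pandas infer it. The sketch below assumes the `Mon-YYYY` layout seen in `issue_d` and uses a few hypothetical values.

```python
import pandas as pd

# Hypothetical values in the same layout as issue_d.
issue = pd.Series(["Sep-2018", "Aug-2018", "Jul-2018"])

# %b is the abbreviated month name and %Y the four-digit year; the day defaults to 1.
issue_dt = pd.to_datetime(issue, format="%b-%Y")

print(issue_dt.dt.year.tolist())   # [2018, 2018, 2018]
print(issue_dt.dt.month.tolist())  # [9, 8, 7]
```

An explicit format also fails loudly on values that do not match, which can surface dirty data early.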

In [0]:
df.head(1)

df['earliest_cr_line'] = pd.to_datetime(df['earliest_cr_line'], infer_datetime_format=True)

In [0]:
date_col = [col for col in df if col.endswith('_d')]



In [0]:
for col in date_col:
  df[col] = pd.to_datetime(df[col],infer_datetime_format=True)

In [29]:
df.sample(5)

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_act_il,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,revol_bal_joint,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,sec_app_mths_since_last_major_derog,hardship_flag,hardship_type,hardship_reason,hardship_status,deferral_term,hardship_amount,hardship_start_date,hardship_end_date,payment_plan_start_date,hardship_length,hardship_dpd,hardship_loan_status,orig_projected_additional_accrued_interest,hardship_payoff_balance_amount,hardship_last_payment_amount,disbursement_method,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term,emp_title_manager,issue_year,issue_month
114448,,,6000,6000,6000,36 months,17.97,216.83,D,D1,Clerk,10+ years,OWN,45000.0,Source Verified,2018-07-01,Current,n,,,other,Other,760xx,TX,21.31,0,2001-05-01,1,26.0,,14,0,12951,50%,24,w,5345.8,5345.8,1072.17,1072.17,654.2,417.97,0.0,0.0,0.0,2018-12-01,216.83,2019-01-01,2018-12-01,0,,1,Individual,,,,0,175,24744,3,1,0,0,28.0,11793,66.0,3,3,5468,57.0,25900,0,1,1,3,1767.0,3249.0,73.6,0,0,206.0,108,4,4,0,5.0,26.0,5.0,26.0,0,4,8,5,7,7,13,17,8,14,0.0,0,0,3,91.7,60.0,0,0,43778,24744,12300,17878,,,,,,,,,,,,N,,,,,,,,,,,,,,,Cash,N,,,,,,,False,2018,7
84910,,,10000,10000,10000,36 months,13.56,339.65,C,C1,Rn,10+ years,RENT,105000.0,Source Verified,2018-07-01,Current,n,,,debt_consolidation,Debt consolidation,021xx,MA,15.54,0,1983-06-01,0,72.0,,16,0,12072,39.6%,28,f,9077.91,9077.91,1388.74,1388.74,922.09,466.65,0.0,0.0,0.0,2018-12-01,339.65,2019-01-01,2018-12-01,0,72.0,1,Individual,,,,0,237,37838,2,4,3,3,1.0,25766,81.0,1,2,6363,61.0,30500,0,0,0,5,2365.0,6443.0,64.4,0,0,153.0,421,8,1,0,43.0,,,,1,5,6,6,7,12,12,16,6,16,0.0,0,0,4,82.1,16.7,0,0,62262,37838,18100,31762,,,,,,,,,,,,N,,,,,,,,,,,,,,,Cash,N,,,,,,,False,2018,7
33201,,,5100,5100,5100,36 months,23.4,198.49,E,E1,Driver,4 years,RENT,87500.0,Not Verified,2018-09-01,Current,n,,,home_improvement,Home improvement,112xx,NY,18.3,0,2012-07-01,1,37.0,,14,0,6669,36.6%,17,w,4797.05,4797.05,582.21,582.21,302.95,279.26,0.0,0.0,0.0,2018-12-01,198.493706,2019-01-01,2018-12-01,0,37.0,1,Individual,,,,0,0,19900,2,1,0,0,39.0,13231,52.0,2,10,2069,46.0,18200,0,0,1,10,1421.0,2886.0,57.6,0,0,46.0,73,5,5,0,17.0,64.0,6.0,64.0,1,4,7,4,5,2,13,15,7,14,0.0,0,0,2,88.2,50.0,0,0,43751,19900,6800,25551,,,,,,,,,,,,N,,,,,,,,,,,,,,,Cash,N,,,,,,,False,2018,9
28631,,,8000,8000,8000,36 months,11.06,262.14,B,B3,Public Relations Account Coordinator,3 years,RENT,58000.0,Not Verified,2018-09-01,Current,n,,,debt_consolidation,Debt consolidation,336xx,FL,16.99,0,1996-10-01,0,67.0,55.0,9,1,7516,32.4%,28,w,7429.55,7429.55,781.5,781.5,570.45,211.05,0.0,0.0,0.0,2018-12-01,262.14,2019-01-01,2018-12-01,0,67.0,1,Individual,,,,0,0,43269,2,1,1,1,4.0,35753,,2,5,3367,32.0,23200,0,8,1,6,4808.0,2840.0,71.0,0,0,153.0,263,3,3,2,15.0,,12.0,,1,3,5,3,6,11,8,15,5,9,0.0,0,0,3,95.5,33.3,1,0,60315,43269,9800,37115,,,,,,,,,,,,N,,,,,,,,,,,,,,,Cash,N,,,,,,,False,2018,9
115226,,,5000,5000,5000,36 months,7.84,156.32,A,A4,Business Process Analyst,5 years,RENT,87000.0,Not Verified,2018-07-01,Current,n,,,other,Other,951xx,CA,2.48,0,2002-06-01,0,,,11,0,7234,11.4%,29,f,4373.6,4373.6,779.42,779.42,626.4,153.02,0.0,0.0,0.0,2018-12-01,156.32,2019-01-01,2018-12-01,0,,1,Individual,,,,0,0,13234,1,1,2,3,1.0,6000,100.0,1,3,4528,19.0,63400,3,3,1,6,1654.0,52666.0,12.1,0,0,177.0,170,9,1,2,9.0,,7.0,,0,2,2,7,8,14,10,12,2,11,0.0,0,0,3,100.0,0.0,0,0,69400,13234,59900,6000,,,,,,,,,,,,N,,,,,,,,,,,,,,,Cash,N,,,,,,,False,2018,7


# ASSIGNMENT

- Replicate the lesson code.

- Convert the `term` column from string to integer.

- Make a column named `loan_status_is_great`. It should contain the integer 1 if `loan_status` is "Current" or "Fully Paid." Else it should contain the integer 0.

- Make `last_pymnt_d_month` and `last_pymnt_d_year` columns.



In [30]:
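def conv_term(term):
  # str.strip('months') strips any of those characters, not the word itself;
  # taking the first whitespace-separated token states the intent directly.
  return int(term.split()[0])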

example = df['term'][0]

example
conv_term(example)

df['term'] = df['term'].apply(conv_term)

df['term'].sample(5)


29008    36
16393    36
87855    36
25422    36
54454    60
Name: term, dtype: int64
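A vectorized alternative is to pull the digits out with a regular expression. This is a sketch on hypothetical values shaped like the `term` column, assuming every entry contains a run of digits.

```python
import pandas as pd

# Hypothetical values shaped like the term column.
terms = pd.Series([" 36 months", " 60 months"])

# Capture the first run of digits; expand=False returns a Series instead of a DataFrame.
term_months = terms.str.extract(r"(\d+)", expand=False).astype(int)

print(term_months.tolist())  # [36, 60]
```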

In [31]:
df['loan_status'].value_counts()

Current               121082
Fully Paid              4786
In Grace Period          948
Late (31-120 days)       920
Late (16-30 days)        348
Charged Off              110
Name: loan_status, dtype: int64

In [0]:
def loan_status_is_great(status):
  
  great_status=['Current','Fully Paid']
  if status in (great_status):
    return 1
  else:
    return 0

df['loan_status_is_great'] = df['loan_status'].apply(loan_status_is_great)

  

In [33]:
df.loc[df['loan_status_is_great']==0,['loan_status_is_great','loan_status']].sample(20)

Unnamed: 0,loan_status_is_great,loan_status
53824,0,In Grace Period
53763,0,In Grace Period
62230,0,Charged Off
124701,0,Late (31-120 days)
23407,0,Late (16-30 days)
58736,0,Late (31-120 days)
67815,0,In Grace Period
50558,0,In Grace Period
92914,0,Late (31-120 days)
112555,0,Late (16-30 days)
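The same flag can be sketched without a helper function by combining `isin` with an integer cast. The statuses below are a hypothetical sample of the values listed by `value_counts()` above.

```python
import pandas as pd

status = pd.Series(["Current", "Fully Paid", "In Grace Period", "Charged Off"])

# isin gives a boolean Series; astype(int) maps True/False to 1/0.
is_great = status.isin(["Current", "Fully Paid"]).astype(int)

print(is_great.tolist())  # [1, 1, 0, 0]
```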


In [34]:
df['last_pymnt_d'].sample(20)

21082    2018-12-01
39706    2018-12-01
49185    2018-12-01
96305    2018-12-01
71191    2018-12-01
89685    2018-11-01
61861    2018-12-01
26293    2018-12-01
116655   2018-12-01
48783    2018-12-01
75320    2018-12-01
111925   2018-12-01
90270    2018-11-01
29489    2018-12-01
70226    2018-12-01
45222    2018-12-01
13260    2018-12-01
118997   2018-12-01
35465    2018-12-01
54318    2018-12-01
Name: last_pymnt_d, dtype: datetime64[ns]

In [0]:
df['last_pymnt_d']= pd.to_datetime(df['last_pymnt_d'],infer_datetime_format=True)

In [0]:
df['last_pymnt_d_year'] = df['last_pymnt_d'].dt.year
df['last_pymnt_d_month'] = df['last_pymnt_d'].dt.month


In [37]:
df['last_pymnt_d_year'].value_counts()

2018.0    128048
Name: last_pymnt_d_year, dtype: int64

In [38]:
df['last_pymnt_d_month'].value_counts()

12.0    116465
11.0      7793
10.0      1594
9.0       1136
8.0        838
7.0        222
Name: last_pymnt_d_month, dtype: int64
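The year and month above print as `2018.0` and `12.0` because some payment dates are missing, and a missing date forces the component to a float. If integer values are preferred, one option is the nullable `Int64` dtype, sketched here on hypothetical values.

```python
import pandas as pd

# Hypothetical values with one missing payment date.
last_pymnt = pd.to_datetime(pd.Series(["Dec-2018", None, "Nov-2018"]), format="%b-%Y")

month_float = last_pymnt.dt.month        # float, because the missing date becomes NaN
month_int = month_float.astype("Int64")  # nullable integer; the missing entry stays missing

print(month_float.tolist())
print(month_int.tolist())
```

Note that a nullable column still contains missing values, so it would not yet pass a no-nulls check.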

# STRETCH OPTIONS

You can do more with the LendingClub or Instacart datasets.

LendingClub options:
- There's one other column in the dataframe with percent signs. Remove them and convert to floats. You'll need to handle missing values.
- Modify the `emp_title` column to replace titles with 'Other' if the title is not in the top 20. 
- Process the dataframe so that `ready_for_sklearn(df)` returns `True`. You can drop columns, or select the subset of numeric columns with no missing values. (Or you can try automating the process to handle missing values and convert objects to numbers!)
- Take initiative and work on your own ideas!

Instacart options:
- Read [Instacart Market Basket Analysis, Winner's Interview: 2nd place, Kazuki Onodera](http://blog.kaggle.com/2017/09/21/instacart-market-basket-analysis-winners-interview-2nd-place-kazuki-onodera/), especially the **Feature Engineering** section. (Can you choose one feature from his bulleted lists, and try to engineer it with pandas code?)
- Read and replicate parts of [Simple Exploration Notebook - Instacart](https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-instacart). (It's the Python Notebook with the most upvotes for this Kaggle competition.)
- Take initiative and work on your own ideas!

You can uncomment and run the cells below to re-download and extract the Instacart data

In [0]:
# !wget https://s3.amazonaws.com/instacart-datasets/instacart_online_grocery_shopping_2017_05_01.tar.gz

In [0]:
# !tar --gunzip --extract --verbose --file=instacart_online_grocery_shopping_2017_05_01.tar.gz

In [0]:
# %cd instacart_2017_05_01

In [0]:
df.T

In [0]:
df.revol_util.fillna(0)

df.loc[df.revol_util.isnull()]

In [48]:
def conv_revol_util(x):
  if isinstance(x, str):
    return float(x.strip('%'))
  elif pd.isnull(x):
    return 0.0  # treat a missing utilization as 0
  else:
    return x  # already numeric, so rerunning this cell leaves values unchanged

df.revol_util=df.revol_util.apply(conv_revol_util)
df['revol_util']

df.revol_util.describe()

In [0]:
top20 = df['emp_title'].value_counts().head(21).index.tolist()

In [60]:
top20=top20[1:21]
top20

['Teacher',
 'Manager',
 'Owner',
 'Driver',
 'Registered Nurse',
 'Supervisor',
 'Sales',
 'Truck Driver',
 'Rn',
 'Office Manager',
 'Project Manager',
 'General Manager',
 'Director',
 'Operations Manager',
 'Sales Manager',
 'Engineer',
 'Store Manager',
 'Administrative Assistant',
 'President',
 'Technician']

In [66]:
len(top20)

20

In [0]:
other = set(df.loc[~df['emp_title'].isin(top20)]['emp_title'].tolist())

In [0]:
other

def conv_other(x):
  if x in other:
    return 'Other'
  else:
    return x

df['emp_title']=df['emp_title'].apply(conv_other)

In [79]:
# Modify the emp_title column to replace titles with 'Other' if the title is not in the top 20.

df['emp_title'].value_counts()

Other                       107290
Teacher                       2843
Manager                       2749
Owner                         1856
Driver                        1498
Registered Nurse              1386
Supervisor                    1345
Sales                          980
Truck Driver                   921
Rn                             905
Office Manager                 846
Project Manager                835
General Manager                809
Director                       585
Operations Manager             516
Sales Manager                  510
Engineer                       474
Administrative Assistant       466
Store Manager                  466
President                      464
Technician                     450
Name: emp_title, dtype: int64

In [0]:
#df[df['loan_status_is_great']==0,['loan_status_is_great','emp_title']].groupby('emp_title').agg('count')

bad_status_df= df.loc[df['loan_status_is_great']==0,['loan_status_is_great','emp_title']]

In [0]:
final = bad_status_df.groupby('emp_title').agg('count').sort_values('loan_status_is_great', ascending=False).drop(['Other'])

In [0]:
final.reset_index()

In [0]:
import matplotlib.pyplot as plt

#plt.plot(final.index,final.loan_status_is_great)

In [104]:
final

emp_title,loan_status_is_great
Manager,63
Owner,54
Teacher,51
Supervisor,39
Driver,37
Sales,27
Truck Driver,19
Rn,17
Registered Nurse,16
President,16
