In [1]:
import numpy as np
import pandas as pd

### Creating New Columns

The Section 3.0 Notebook showed how to create new columns. In this notebook, we work through a couple of further examples using the SBA dataset.

In [2]:
SBA = pd.read_csv('SBA_Example.csv')

In [3]:
SBA.head()

Unnamed: 0,State,NAICS,ApprovalDate,Term,NoEmp,NewExist,UrbanRural,DisbursementDate,DisbursementGross,GrAppv,SBA_Appv
0,IN,451120,28-Feb-97,84,4,2.0,0,28-Feb-99,"$60,000.00","$60,000.00","$48,000.00"
1,IN,722410,28-Feb-97,60,2,2.0,0,31-May-97,"$40,000.00","$40,000.00","$32,000.00"
2,IN,621210,28-Feb-97,180,7,1.0,0,31-Dec-97,"$287,000.00","$287,000.00","$215,250.00"
3,OK,0,28-Feb-97,60,2,1.0,0,30-Jun-97,"$35,000.00","$35,000.00","$28,000.00"
4,FL,0,28-Feb-97,240,14,1.0,0,14-May-97,"$229,000.00","$229,000.00","$229,000.00"


##### Example 1

Loans backed by real estate seem less likely to go into default, so we would like to create a binary column coded as follows:

0 = not backed by real estate

1 = backed by real estate

To this end, we do some research and find that loans backed by real estate have terms of 20 or more years. Since the Term column gives the loan term in months, and 20 years is 20 × 12 = 240 months, we treat a loan as backed by real estate when its Term is 240 or more.

In [4]:
SBA['real_estate'] = SBA.Term.apply(lambda x: 1 if x >= 240 else 0)

In [5]:
SBA.head(10)

Unnamed: 0,State,NAICS,ApprovalDate,Term,NoEmp,NewExist,UrbanRural,DisbursementDate,DisbursementGross,GrAppv,SBA_Appv,real_estate
0,IN,451120,28-Feb-97,84,4,2.0,0,28-Feb-99,"$60,000.00","$60,000.00","$48,000.00",0
1,IN,722410,28-Feb-97,60,2,2.0,0,31-May-97,"$40,000.00","$40,000.00","$32,000.00",0
2,IN,621210,28-Feb-97,180,7,1.0,0,31-Dec-97,"$287,000.00","$287,000.00","$215,250.00",0
3,OK,0,28-Feb-97,60,2,1.0,0,30-Jun-97,"$35,000.00","$35,000.00","$28,000.00",0
4,FL,0,28-Feb-97,240,14,1.0,0,14-May-97,"$229,000.00","$229,000.00","$229,000.00",1
5,CT,332721,28-Feb-97,120,19,1.0,0,30-Jun-97,"$517,000.00","$517,000.00","$387,750.00",0
6,NJ,0,2-Jun-80,45,45,2.0,0,22-Jul-80,"$600,000.00","$600,000.00","$499,998.00",0
7,FL,811118,28-Feb-97,84,1,2.0,0,30-Jun-98,"$45,000.00","$45,000.00","$36,000.00",0
8,FL,721310,28-Feb-97,297,2,2.0,0,31-Jul-97,"$305,000.00","$305,000.00","$228,750.00",1
9,CT,0,28-Feb-97,84,3,2.0,0,30-Apr-97,"$70,000.00","$70,000.00","$56,000.00",0
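
The same rule can also be written without apply, by comparing the whole Term column at once and casting the resulting True/False values to 1/0. The sketch below is only an illustration: the small demo DataFrame and the name REAL_ESTATE_MIN_TERM are hypothetical stand-ins, not part of the SBA notebook, and the 240-month threshold is the one derived above.

```python
import pandas as pd

# Hypothetical stand-in for the SBA data; only the Term column (in months) is needed.
demo = pd.DataFrame({"Term": [84, 60, 180, 240, 297]})

# 20 years expressed in months (illustrative constant name, not from the notebook).
REAL_ESTATE_MIN_TERM = 20 * 12

# Element-wise version, mirroring the apply/lambda approach used in the notebook.
via_apply = demo["Term"].apply(lambda x: 1 if x >= REAL_ESTATE_MIN_TERM else 0)

# Vectorized version: compare the entire column, then cast booleans to integers.
via_compare = (demo["Term"] >= REAL_ESTATE_MIN_TERM).astype(int)

demo["real_estate"] = via_compare
print(demo)

# Check that both approaches flag exactly the same rows.
print((via_apply == via_compare).all())
```

On a column of this size the two versions behave the same; the vectorized comparison mainly tends to pay off on larger DataFrames, where it avoids calling a Python function once per row.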


$\Box$