# Tabular data (Deep learning for classical ML problems)

In this notebook, we will investigate the effectiveness of deep learning methods on **tabular data**. The standard algorithms for this type of data are classical **ML** solutions such as:

* Gradient Boosting
* Random Forest method
* Logistic Regression 

In [1]:
from fastai import *
from fastai.tabular import *

We will load a simple dataset, **Adult**, which describes the social situation of adults, and classify them according to **salary**.

In [2]:
path = untar_data(URLs.ADULT_SAMPLE)
path.ls()

[PosixPath('/home/anass/.fastai/data/adult_sample/models'),
 PosixPath('/home/anass/.fastai/data/adult_sample/export.pkl'),
 PosixPath('/home/anass/.fastai/data/adult_sample/adult.csv')]

It's a simple CSV file, `adult.csv`. We can load it and inspect its contents with pandas.

In [3]:
df = pd.read_csv(path/'adult.csv')
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,49,Private,101320,Assoc-acdm,12.0,Married-civ-spouse,,Wife,White,Female,0,1902,40,United-States,>=50k
1,44,Private,236746,Masters,14.0,Divorced,Exec-managerial,Not-in-family,White,Male,10520,0,45,United-States,>=50k
2,38,Private,96185,HS-grad,,Divorced,,Unmarried,Black,Female,0,0,32,United-States,<50k
3,38,Self-emp-inc,112847,Prof-school,15.0,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,United-States,>=50k
4,42,Self-emp-not-inc,82297,7th-8th,,Married-civ-spouse,Other-service,Wife,Black,Female,0,0,50,United-States,<50k


In order to define a classifier, we should create a `TabularList`, but before doing that we need to specify the:
* Categorical variables
* Numerical (continuous) variables
* Dependent variable

In [11]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
dep_var = 'salary'         # dependent variable
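
Before building the list, it can help to check these names against the CSV header. The sketch below is plain Python and only illustrative: the `columns` list is copied by hand from the `df.head()` output above (leaving out the index column), and the helper names `missing`, `overlap` and `unused` are hypothetical, not part of fastai.

```python
# Column names copied from the df.head() output (index column left out)
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week',
           'native-country', 'salary']

cat_names = ['workclass', 'education', 'marital-status', 'occupation',
             'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
dep_var = 'salary'

selected = cat_names + cont_names + [dep_var]

# Names that do not exist in the header (typos would show up here)
missing = [name for name in selected if name not in columns]

# A column should not be declared both categorical and continuous
overlap = sorted(set(cat_names) & set(cont_names))

# Columns present in the file but not used as features or target
unused = [name for name in columns if name not in selected]

print("missing:", missing)
print("overlap:", overlap)
print("unused:", unused)
```

The `unused` list shows that this notebook feeds only a subset of the available columns to the model.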

We also need to apply a set of transformations to handle missing data, encode the categorical variables, and normalize the continuous ones.

In [12]:
procs = [FillMissing, Categorify, Normalize]
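
To give an intuition for these three processors, here is a rough plain-Python sketch applied to the `education-num` and `occupation` values of the five rows shown by `df.head()`. It is an approximation for illustration, not fastai's implementation: filling with the median, adding an `_na` flag, reserving code 0 for missing categories and using the sample standard deviation are assumptions about the default behaviour.

```python
from statistics import mean, median, stdev

# Values taken from the df.head() output; None stands for a missing cell
education_num = [12.0, 14.0, None, 15.0, None]
occupation = [None, "Exec-managerial", None, "Prof-specialty", "Other-service"]

# FillMissing-like step: flag missing values, then fill them with the median
observed = [v for v in education_num if v is not None]
fill_value = median(observed)
education_num_na = [v is None for v in education_num]
filled = [fill_value if v is None else v for v in education_num]

# Categorify-like step: map each category to an integer code,
# keeping 0 for missing values (assumed convention)
vocab = sorted({v for v in occupation if v is not None})
code_of = {name: i + 1 for i, name in enumerate(vocab)}
occupation_codes = [code_of.get(v, 0) for v in occupation]

# Normalize-like step: subtract the mean and divide by the standard deviation
mu, sigma = mean(filled), stdev(filled)
normalized = [(v - mu) / sigma for v in filled]

print(education_num_na)
print(occupation_codes)
print([round(v, 4) for v in normalized])
```

In a real pipeline the fill value, the category vocabulary and the normalization statistics would be computed on the training rows and then reused for the validation and test rows.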

Now we will create the `TabularList` with the data block API. We will hold out the rows with indices $[800, 1000)$ as the validation set and reuse the same rows as the test set.

In [16]:
# Rows 800 to 999 are reused as an unlabeled test set
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)

data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800, 1000)))  # validation set: rows 800 to 999
        .label_from_df(cols=dep_var)           # target column: salary
        .add_test(test)
        .databunch())
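
To make the split concrete, the following plain-Python sketch shows how a list of validation indices partitions the rows. It only mimics the idea behind `split_by_idx`; `n_rows` is a hypothetical size chosen for illustration, not the real length of `adult.csv`.

```python
# Hypothetical number of rows, chosen only for illustration
n_rows = 1200

# Same indices as passed to split_by_idx: rows 800 to 999 inclusive
valid_idx = list(range(800, 1000))
valid_set = set(valid_idx)

# Every row that is not in the validation set goes to the training set
train_idx = [i for i in range(n_rows) if i not in valid_set]

print(len(valid_idx), len(train_idx))
```

Because the indices are fixed rather than sampled at random, the same rows end up in the validation set on every run.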

In [17]:
data.show_batch(15)

workclass,education,marital-status,occupation,relationship,race,education-num_na,age,fnlwgt,education-num,target
Self-emp-inc,HS-grad,Widowed,Sales,Other-relative,Asian-Pac-Islander,False,2.669,-1.0257,-0.4224,>=50k
Private,Some-college,Never-married,Adm-clerical,Not-in-family,White,False,0.3235,0.4116,-0.0312,<50k
Private,Some-college,Divorced,Exec-managerial,Not-in-family,White,False,-0.1896,0.4157,-0.0312,<50k
Private,9th,Never-married,Sales,Own-child,Other,False,-1.509,-0.5158,-1.9869,<50k
Private,HS-grad,Never-married,Craft-repair,Not-in-family,Black,False,-0.7027,-0.6664,-0.4224,<50k
Private,Bachelors,Married-civ-spouse,Exec-managerial,Husband,White,False,0.4701,-0.9382,1.1422,>=50k
Private,HS-grad,Married-spouse-absent,Adm-clerical,Not-in-family,White,False,0.9831,-1.109,-0.4224,<50k
Private,HS-grad,Divorced,Adm-clerical,Unmarried,White,False,-0.1163,1.5166,-0.4224,<50k
Private,Some-college,Divorced,Sales,Not-in-family,White,False,-0.043,-0.3828,-0.0312,<50k
Private,Bachelors,Never-married,Adm-clerical,Not-in-family,White,False,0.1769,0.591,1.1422,<50k


## Tabular learner
Now we just need to create a learner with `tabular_learner`, passing it the databunch we created, the sizes of the hidden layers, and the metric to report.

In [18]:
learn = tabular_learner(data, layers=[200,100],metrics=accuracy)

In [20]:
learn.fit(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.356491,0.369507,0.825,00:03


We achieved $82.5\%$ accuracy on the validation set after just one epoch of training.
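
The `accuracy` column in the table above is the fraction of validation rows whose predicted class matches the label. As a small worked example: the validation set holds 200 rows (indices 800 to 999), so a value of 0.825 corresponds to 165 correct predictions. The labels in the sketch below are made up to reproduce that ratio; they are not the model's actual predictions, and the `accuracy` function here is a plain-Python stand-in for the fastai metric.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

n_valid = len(range(800, 1000))      # number of validation rows
n_correct = round(0.825 * n_valid)   # correct predictions implied by 0.825

# Made-up labels: all rows are "<50k", and the last rows are misclassified
actual = ["<50k"] * n_valid
predicted = ["<50k"] * n_correct + [">=50k"] * (n_valid - n_correct)

print(n_valid, n_correct, accuracy(predicted, actual))
```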

In [None]:
row = 