## Table of contents
1. Defining the problem statement
2. Importing data and understanding data
3. Feature engineering
4. Building a model
5. Model evaluation
6. Model prediction
7. Conclusion


### 1. Problem statement

The goal is to predict the total approved transaction amount for each month of 2017. I use linear regression because the target, the transaction amount, is a continuous variable.


In [48]:
# Importing libraries
import numpy as np                        # For Mathematical calculations      
import pandas as pd                       # Helps to analyse data  
import seaborn as sns                     # For data visualisation
from matplotlib import pyplot as plt      # For plotting graphs
%matplotlib inline

# Machine learning
from sklearn.linear_model import LinearRegression

import warnings
warnings.filterwarnings('ignore')         #Ignoring warnings

### 2. Importing data and understanding data

In [49]:
# Load dataset
data = pd.read_csv('combined.csv')

In [50]:
#Viewing first 5 rows of data
data.head()

Unnamed: 0,Business ID,Date,Approval,Card Used,Mobile Device,Transaction Amount
0,Jessica Smith,2016-01-28,approved,credit_card,m010,294
1,Teresa Aguilar,2016-10-27,approved,credit_card,m010,13348
2,Angela Miranda,2016-03-03,aborted,credit_card,Other,7
3,Andre Mccormick,2016-03-03,aborted,credit_card,m010,60
4,Marcus Gilbert,2016-04-07,error,credit_card,e105,150


In [51]:
# Information about data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 386686 entries, 0 to 386685
Data columns (total 6 columns):
Business ID           386686 non-null object
Date                  386686 non-null object
Approval              386686 non-null object
Card Used             386686 non-null object
Mobile Device         386686 non-null object
Transaction Amount    386686 non-null object
dtypes: object(6)
memory usage: 17.7+ MB
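
The `info()` output above lists `Transaction Amount` with dtype `object`, which suggests the values are stored as strings rather than numbers. Below is a minimal sketch of one way to convert such a column before modelling, assuming the values are mostly numeric strings. The `sample` frame and its malformed `'n/a'` entry are made up for illustration and do not come from `combined.csv`.

```python
import pandas as pd

# Hypothetical sample standing in for the 'Transaction Amount' column,
# with one malformed value to show how coercion behaves.
sample = pd.DataFrame({"Transaction Amount": ["294", "13348", "7", "n/a"]})

# errors="coerce" turns unparseable entries into NaN instead of raising
sample["Transaction Amount"] = pd.to_numeric(
    sample["Transaction Amount"], errors="coerce"
)

print(sample.dtypes)
# Number of values that could not be parsed
print(sample["Transaction Amount"].isna().sum())
```

Counting the NaN values after coercion is a quick way to see whether any rows need cleaning before they are passed to the model.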


In [52]:
data.shape

(386686, 6)

### 3. Feature engineering

In [53]:
# Select the columns that will be used for modelling
df = data[['Date', 'Approval', 'Transaction Amount']]
df.head()

Unnamed: 0,Date,Approval,Transaction Amount
0,2016-01-28,approved,294
1,2016-10-27,approved,13348
2,2016-03-03,aborted,7
3,2016-03-03,aborted,60
4,2016-04-07,error,150


In [54]:
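# Select only approved transactions (.copy() so that later column assignments
# modify an independent DataFrame rather than a slice of df)

train_df = df[df['Approval'] == 'approved'].copy()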
train_df.head()

Unnamed: 0,Date,Approval,Transaction Amount
0,2016-01-28,approved,294
1,2016-10-27,approved,13348
6,2016-03-07,approved,5700
7,2016-11-10,approved,2761
8,2016-11-29,approved,8490


In [55]:
# Convert Date to a pandas datetime so that we can extract the year and month
train_df['Date']= pd.to_datetime(train_df['Date'])

In [56]:
# Create functions to extract the month and year from the Date column
def get_year(date):
    return date.year


def get_month(date):
    return date.month

In [57]:
# Create 2 columns to store month and year 
train_df['month'] = train_df['Date'].apply(get_month)
train_df['year'] = train_df['Date'].apply(get_year)

In [58]:
train_df.head()

Unnamed: 0,Date,Approval,Transaction Amount,month,year
0,2016-01-28,approved,294,1,2016
1,2016-10-27,approved,13348,10,2016
6,2016-03-07,approved,5700,3,2016
7,2016-11-10,approved,2761,11,2016
8,2016-11-29,approved,8490,11,2016
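
The same two columns can also be derived without helper functions by using the `.dt` accessor on a datetime column. A minimal sketch, using a small made-up `demo` frame rather than the notebook's data:

```python
import pandas as pd

# Hypothetical frame with a 'Date' column like the one above
demo = pd.DataFrame({"Date": ["2016-01-28", "2016-10-27", "2016-03-07"]})
demo["Date"] = pd.to_datetime(demo["Date"])

# The .dt accessor extracts date components for the whole column at once
demo["month"] = demo["Date"].dt.month
demo["year"] = demo["Date"].dt.year

print(demo)
```

This is vectorised, so it tends to be faster than `apply` on a column with several hundred thousand rows.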


In [59]:
train_df.columns

Index(['Date', 'Approval', 'Transaction Amount', 'month', 'year'], dtype='object')

### 4. Building a Model

In [61]:
# Define the features (X) and the target (y)

X = train_df[['year', 'month']]
y = train_df['Transaction Amount']

In [62]:
# I used linear regression because I am predicting a continuous variable
model = LinearRegression()

In [63]:
# Train the model
model.fit(X, y)

LinearRegression()

### 5. Model Evaluation

In [64]:
r_sq = model.score(X, y)


In [65]:
print('coefficient of determination:', r_sq)

coefficient of determination: 5.590368895336262e-05
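
An R² this close to zero means the model explains almost none of the variance in the transaction amount, so it does barely better than always predicting the mean. The score is also computed on the same data the model was fitted on. The sketch below uses synthetic data (not the notebook's) to show how R² is defined, as 1 − SS_res / SS_tot, and that it agrees with `score`. The names `months`, `amounts`, and `demo_model` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: months 1..12 repeated, with a target that is pure noise
months = np.tile(np.arange(1, 13), 50).reshape(-1, 1)
amounts = rng.normal(loc=2000.0, scale=500.0, size=months.shape[0])

demo_model = LinearRegression().fit(months, amounts)
pred = demo_model.predict(months)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((amounts - pred) ** 2)
ss_tot = np.sum((amounts - amounts.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, demo_model.score(months, amounts))
```

When the target is unrelated to the feature, as in this synthetic case, the two printed values match and are both near zero.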


In [66]:
print('intercept:', model.intercept_)

print('coefficients:', model.coef_)


intercept: 2432.7219575641143
coefficients: [  0.        -18.1370215]
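
The first coefficient (for `year`) is exactly 0, which is consistent with every training row having the same year: a feature that never varies carries no information for the fit. The sketch below reproduces this behaviour on made-up data, where `X_demo` and `y_demo` are illustrative and not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Column 0 is a constant "year", column 1 is the month 1..12
X_demo = np.column_stack([np.full(12, 2016), np.arange(1, 13)])
# Target depends only on the month
y_demo = 100.0 + 5.0 * np.arange(1, 13)

m = LinearRegression().fit(X_demo, y_demo)

# The constant column receives a zero coefficient; month receives ~5
print(m.coef_)
```

One consequence is that a model fitted this way gives the same prediction for a given month regardless of the year passed in, so it cannot capture any change from 2016 to 2017.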


### 6. Predict Approvals for 2017

In [67]:
# Create a dictionary for the 2017 test data (one row per month)
test_data = {'year': [2017] * 12,
             'month': list(range(1, 13))}


# Convert it to a dataframe

test_df = pd.DataFrame(test_data)
test_df.head()

Unnamed: 0,year,month
0,2017,1
1,2017,2
2,2017,3
3,2017,4
4,2017,5


In [68]:
y_pred = model.predict(test_df)

In [69]:
# Predicted transaction amount for each month of 2017
test_df['Transaction_Amount'] = y_pred
test_df.head(12)

Unnamed: 0,year,month,Transaction_Amount
0,2017,1,2414.584936
1,2017,2,2396.447915
2,2017,3,2378.310893
3,2017,4,2360.173872
4,2017,5,2342.03685
5,2017,6,2323.899829
6,2017,7,2305.762807
7,2017,8,2287.625786
8,2017,9,2269.488764
9,2017,10,2251.351743
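
Because the model is fitted on individual transactions, each predicted value estimates the amount of a single transaction in that month, not a monthly total. If monthly totals are wanted, one possible approach (an assumption on my part, not what the notebook does) is to sum the approved amounts per month first and fit the regression on those sums. The sketch below uses randomly generated data; `toy`, `monthly`, `total_model`, and `forecast` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical approved transactions: a month (1..12) and an amount per row
toy = pd.DataFrame({
    "month": rng.integers(1, 13, size=1000),
    "Transaction Amount": rng.uniform(1, 5000, size=1000),
})

# Aggregate to one row per month holding the total approved amount
monthly = toy.groupby("month", as_index=False)["Transaction Amount"].sum()

# Fit on the monthly totals instead of on individual transactions
total_model = LinearRegression().fit(
    monthly[["month"]], monthly["Transaction Amount"]
)

# Predict a total for each month of the following year
forecast = pd.DataFrame({"month": range(1, 13)})
forecast["predicted_total"] = total_model.predict(forecast[["month"]])

print(forecast)
```

With only twelve training points from a single year, such a model is a rough trend line at best, so its output should be read with caution.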


### 7. Conclusion

The aim of this study was to predict approved transactions for each month of the following year (2017). The work done is as follows:

* Read and explored the dataset using pandas

* Selected the columns needed for prediction (date, approval status, transaction amount) and derived month and year features from the date

* Built a linear regression model because the target is a continuous variable

* Evaluated the model using the coefficient of determination (R²), which was about 5.6e-05 on the training data

* Created a test set for the twelve months of 2017 from a dictionary and made predictions for it