# Background:
Since we have information about both our users(age, sex, height, location) and the training academies(Year established, location, facilites etc) we should build a model that takes this total information into account and finds the matching pairs. 

Here I will assume that a csv file will be given as input to the neural network and the csv file will have 11 columns such that the first five columns will contain user data - Age, Sex, Level, Height, Location and the next five columns will contain training camp data - Year_established, Location, Facilities, trainer_student_ratio, Cost. The 11th column - Match, will contain value 1 if the user and the camp is a match else zero. We can get this information from our previous data. If users attended a camp previously and liked it we can call it a match or if a user doesn't prefer to go to particular training camp we can assign it a value zero. Once we have enough data we can train our neural network.

## Preprocessing data:
The data should contain all numerical values i.e., the attribute "Facilities", "Level" should have a numerical rank on a pre defined scale and the "Location" attribute should also give a numerical value. The attibutes like "Sex" should be converted to categorical values i.e., it should be converted to two columns corresponding to male and female and should only have values 0 and 1. Once we have numerical values we can scale all the values to the range 0 and 1 for efficient training. This can be easily done in pandas.

In [6]:
import numpy as np
import pandas as pd
import tensorflow as tf
sess = tf.InteractiveSession()

In [10]:
data_path = 'data.csv'

users = pd.read_csv(data_path)

In [12]:
dummy_fields = ['Sex']
for each in dummy_fields:
    dummies = pd.get_dummies(users[each], prefix=each, drop_first=False)
    users = pd.concat([users, dummies], axis=1)

fields_to_drop = ['Sex']
data = users.drop(fields_to_drop, axis=1)
data.head()

Unnamed: 0,Age,Level,Height,Location,Year_established,Location.1,Facilities,trainer_student_ratio,Cost,Match,Sex_0
0,0,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,0,1


In [13]:
quant_features = ['Age', 'Height', 'Location', 'Cost', 'Year_established', ]
# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean)/std

## Split data:
After preprocessing we will now have 12 columns. We should now split the data into training, and testing set.

In [14]:
# Save data for the last 20 rows 
test_data = data[-20:]

# Now remove the test data from the data set 
data = data[:-20]

# Separate the data into features and targets
target_fields = ['Match']
train_features, train_targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]

## Building Model:

In [35]:
x = tf.placeholder(tf.float32, shape=[None, 10])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.zeros([10,10]))
W1 = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([10]))

sess.run(tf.global_variables_initializer())

output = tf.matmul(x,W) + b

y = np.sum(tf.matmul(output, W1))

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))

## Training:

In [36]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step.run(feed_dict={x: train_features, y_: train_targets})

## Evaluation:

In [37]:
correct_prediction = tf.equal(tf.round(y), y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [41]:
accuracy.eval(feed_dict={x: train_features, y_: train_targets})

0.0

## Testing:

In [42]:
accuracy.eval(feed_dict={x: test_features, y_: test_targets})

0.0

## To get the output:
Once we have trained the neural network, we can save the parameters W, W1 and b. We can now easily get a measure of whether a user and a training camp are good match by solving the following equation:

    Y = ((X*W) + b) * W1

Here the first 5 columns of x is user data and next 5 columns are training camp data. If we want to get results for a particular user against a set of training camps we can create a matrix with first 5 columns having the same user data and the next 5 columns will have data from this set of training camps. After doing the above operation we get an output matrix Y with values indicating matching percentage. We can select the top n rows and recommend the corresponding training camp to the user.