# Social Computing - Summer 2017

# Exercise 6 - Fraud detection in the social capital market

Last week you learned that each transaction increases the receiver's social capital weight (SCW).<p>
This week you will try to identify fraudulent behavior within the system. For this you will leverage some of the techniques that you learned in the previous exercise sheets. <p>
The fraud detection will work along the following steps:

### Steps
1) Identify the highest SCWs in the market for each topic<p>
2) Identify the most frequent transaction pairs (user ids) in the market<p>
3) Look at the friends/follower relationships on the social networking platform of the people with the highest SCW<p>
4) Compare the activity of the identified people on the social networking platform<p>

By following these steps you can identify users who have a high social capital weight, but no support/contributions on the social networking platform. Therefore it is likely that they tried to cheat the system.



In [1]:
#Compile this field first!
import pandas as pd
import numpy as np
from igraph import *
df = pd.read_csv('paymenttable_6.csv', sep=',')
elggexcell = pd.read_excel(open('FeatureTableDisguised.xlsx','rb'), sheetname='Sheet1')
g = Graph.Read_GraphML('FriendshipNetwork.graphml')

### Voluntary: Introduction
Import your "density" function here and compile the code. Are there any changes compared to last week?<p>
We expect people who push their SCW to have only a few transaction partners, so they would have a ratio of close to 100%. As there are many users with such high ratios the density is not sufficient to determine fraudulent behavior.

In [6]:
from __future__ import division
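
The definition of last week's "density" function is not repeated on this sheet, so the following is only a stand-in: a minimal sketch of one possible partner-concentration ratio, namely the share of a receiver's incoming transactions that comes from the single most frequent sender. The toy table and the name `concentration` are assumptions made for illustration. As noted above, many honest users also reach ratios close to 100%, so a value of 1.0 alone does not prove fraud.

```python
import pandas as pd

# Hypothetical transactions; the real ones live in paymenttable_6.csv.
toy = pd.DataFrame({
    'Sender ID':   [96, 96, 96, 5, 7, 8],
    'Receiver ID': [97, 97, 97, 2, 2, 2],
})

# Number of transactions per (receiver, sender) pair.
per_pair = toy.groupby(['Receiver ID', 'Sender ID']).size()

# Share of each receiver's incoming transactions that comes from the top sender.
by_receiver = per_pair.groupby(level='Receiver ID')
concentration = by_receiver.max() / by_receiver.sum()
```

In this toy data, receiver 97 gets every transaction from the same sender, while receiver 2 is paid by three different senders.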


# Problem 1: Step 1 - Identification of users with high SCW
Identify the highest SCWs in the market for each topic!<p>
The result should be a table with the topic ID, the SCW new of the receiver, and the receiver ID.<p>
Hint: Make sure that the transfer value is larger than 0 and that receiver ID and sender ID are not the same.

In [7]:
# For each topic (1 to 6), identify the user (Receiver ID) with the highest SCW in the market
where_clause = ((df['Transfer Value'] > 0) & (df['Sender ID'] != df['Receiver ID']))
test = df[where_clause].groupby(['Topic ID'])['SCW new of receiver'].max().reset_index()
result = pd.merge(df, test, on=['Topic ID','SCW new of receiver'])
columns = ['Topic ID','SCW new of receiver','Receiver ID']
output_res = result.sort_values(['Topic ID'],ascending = [True])[columns]
output_res

Unnamed: 0,Topic ID,SCW new of receiver,Receiver ID
0,1,1.645,217
5,2,21.550537,97
3,3,1.83,47
2,4,4.89245,242
1,5,3.968922,242
4,6,5.927137,242


Who is the person with the highest social capital weight (regardless of category)?

The person with receiver ID 97 has the highest social capital weight (21.550537, in topic 2).
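
As a possible alternative to the merge used in step 1, the per-topic maximum can be read directly from the filtered rows with `idxmax`. This avoids joining on a floating-point column and cannot match rows that fail the filter, which may happen when the maxima are merged back onto the unfiltered table. The sketch below runs on a small made-up table (`toy` and its values are assumptions); the column names are the ones used in the exercise.

```python
import pandas as pd

# Toy stand-in for paymenttable_6.csv; the values are illustrative only.
toy = pd.DataFrame({
    'Topic ID':            [1, 1, 2, 2, 2],
    'Sender ID':           [10, 11, 20, 21, 22],
    'Receiver ID':         [11, 10, 21, 20, 22],
    'Transfer Value':      [1.0, 2.0, 0.0, 3.0, 5.0],
    'SCW new of receiver': [1.2, 1.5, 9.9, 4.0, 8.0],
})

# Same filter as in the exercise: positive transfers, no self-transfers.
valid = toy[(toy['Transfer Value'] > 0) & (toy['Sender ID'] != toy['Receiver ID'])]

# idxmax returns, per topic, the row label of the largest SCW among the valid rows.
top_rows = valid.loc[valid.groupby('Topic ID')['SCW new of receiver'].idxmax()]
columns = ['Topic ID', 'SCW new of receiver', 'Receiver ID']
top_per_topic = top_rows[columns].reset_index(drop=True)

# Highest SCW regardless of topic.
overall = valid.loc[valid['SCW new of receiver'].idxmax()]
```

In the toy data, the rows with SCW 9.9 and 8.0 are ignored because they have a zero transfer value or are self-transfers, so topic 2 is led by receiver 20.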

# Problem 2: Step 2 - identification of the main transaction partners
Identify the most frequent transaction pairs in the market system. To do so, create a table that shows the IDs of the two users and the number of transactions between them. Display the top three transaction pairs in each topic.

Is there an overlap with the results from step 1? E.g., are the users with high social capital also in the new list?

In [8]:
#your code here

pairs = [str(sorted([a,b])) for a,b in zip(df['Receiver ID'], df['Sender ID'])]
# note that you have to convert the sorted list to a string, otherwise drop_duplicates won't work...

df['Partner IDs'] = pairs
noduplicates = df.drop(['Receiver ID', 'Sender ID'], axis=1).drop_duplicates()


noduplicates['Sender ID'] = noduplicates['Partner IDs'].str[1:-1].str.split(',', expand=True).astype(int)[0]
noduplicates['Receiver ID'] = noduplicates['Partner IDs'].str[1:-1].str.split(',', expand=True).astype(int)[1]
finalres = noduplicates[where_clause].groupby(['Topic ID','Sender ID','Receiver ID'])['Transfer ID'].count().reset_index()
out = finalres.sort_values(['Topic ID','Transfer ID'], ascending = False)
out.columns = ['Topic ID','Sender ID','Receiver ID','Count']
for topic in range(1,7):
    print(out.loc[out['Topic ID']==topic].head(3))

    Topic ID  Sender ID  Receiver ID  Count
0          1          1           92      3
37         1        217          218      3
18         1         45           46      2
     Topic ID  Sender ID  Receiver ID  Count
105         2         96           97     12
123         2        118          123      4
73          2         34          136      3
     Topic ID  Sender ID  Receiver ID  Count
189         3        102          103      2
195         3        116          118      2
199         3        136          140      2
     Topic ID  Sender ID  Receiver ID  Count
253         4         69           70      4
211         4          2           64      3
231         4         34          140      3
     Topic ID  Sender ID  Receiver ID  Count
358         5        217          218      4
300         5          7           46      3
299         5          7           45      2
     Topic ID  Sender ID  Receiver ID  Count
366         6          2          194      6
398         6 

Is the user with the highest social capital weight (as identified in step 1) also present in this table? 
If so, who is her/his main transaction partner?

1) Yes, the user with the highest social capital weight (97) is also present in this table.
2) Main transaction partner: 96 (12 transactions in topic 2)
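
The pair counting in step 2 can also be sketched without the round trip through strings. The idea is to store the smaller ID of each transaction in one column and the larger ID in another, so that (96, 97) and (97, 96) count as the same pair. The toy table and the column names `User A`, `User B` and `Count` are assumptions chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions; the real ones live in paymenttable_6.csv.
toy = pd.DataFrame({
    'Topic ID':       [2, 2, 2, 2, 1],
    'Sender ID':      [96, 97, 96, 118, 1],
    'Receiver ID':    [97, 96, 97, 123, 92],
    'Transfer Value': [1.0, 1.0, 1.0, 1.0, 1.0],
})
valid = toy[(toy['Transfer Value'] > 0) & (toy['Sender ID'] != toy['Receiver ID'])].copy()

# Order each pair so that the direction of the transfer no longer matters.
valid['User A'] = np.minimum(valid['Sender ID'], valid['Receiver ID'])
valid['User B'] = np.maximum(valid['Sender ID'], valid['Receiver ID'])

# Count the transactions per topic and unordered pair.
counts = (valid.groupby(['Topic ID', 'User A', 'User B'])
               .size()
               .reset_index(name='Count'))

# Keep the three most frequent pairs of every topic.
top3 = (counts.sort_values(['Topic ID', 'Count'], ascending=[True, False])
              .groupby('Topic ID')
              .head(3))
```

Sorting before `groupby(...).head(3)` keeps the most frequent pairs of each topic at the top, so no explicit loop over the topics is needed.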

# Problem 3: Step 3 - Compare the results to the relationship graph

In step 1 and 2 you identified the users with the highest social capital weight and the main transaction partners in the market system. Now we want to verify if their high social capital weight is deserved (because of many good interactions on the social networking platform) or if it should be flagged as fraudulent. <p>

First, we want to look at the friends/follower network surrounding the identified pairs. <b> Plot a subgraph of the FriendshipNetwork.graphml that displays the relationship between the user with the highest social capital weight and the main transaction partner, as well as all their neighboring nodes (connected via a friend/follower relationship). </b>

<b>Compare this network to some of the other transaction pairs identified in step 2. What do you notice? (1-2 sentences) </b> 

If you get the error "ValueError: no such vertex: '218'", don't worry and just try another pair. This happens because the matching between the social networking platform and the market system is still incomplete for some users. Spoiler: this person is not the cheater that we planted.

In [9]:
#your code here
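# Look up the vertex indices of the suspicious pair (vertex names are stored as strings)
vertex1 = g.vs.find(name = str(96)).index
vertex2 = g.vs.find(name = str(97)).index
# Induced subgraph of both users and all of their direct neighbors (duplicates removed)
members = sorted(set([vertex1, vertex2] + g.neighbors(vertex1) + g.neighbors(vertex2)))
sub = g.subgraph(members, "create_from_scratch")
visual_style = {}
visual_style["vertex_size"] = 30
# Label the vertices with the subgraph's own names, not those of the full graph g
visual_style["vertex_label"] = sub.vs["name"]
visual_style["bbox"] = (800,800)
visual_style["margin"] = 50
plot(sub, **visual_style)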

Answer: This is definitely suspicious, as 96 and 97 have no friends or neighboring nodes other than each other. This looks like a clear case of fraudulent activity in which the two users inflated their social capital. When I plotted the subgraph for the pair 2 and 194, who had the second highest number of transactions, both had many neighboring nodes.
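
The visual comparison can be backed by a simple check: collect all neighbors of both users and remove the two users themselves. If nothing is left, the pair is only connected to each other. The sketch below uses plain Python and a made-up edge list instead of the igraph object, and it treats friend/follower links as undirected; the edges and the helper `outside_contacts` are assumptions for illustration, not data from FriendshipNetwork.graphml.

```python
# Hypothetical friendship edges; the real ones live in FriendshipNetwork.graphml.
edges = [(96, 97), (2, 194), (2, 5), (2, 7), (194, 7), (194, 8)]

# Build an undirected adjacency map: user -> set of direct neighbors.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

def outside_contacts(u, v):
    """Return the neighbors of u or v that are neither u nor v."""
    return (adjacency.get(u, set()) | adjacency.get(v, set())) - {u, v}

isolated_pair = outside_contacts(96, 97)   # no contacts besides each other
embedded_pair = outside_contacts(2, 194)   # several shared and separate contacts
```

With the igraph object from the exercise, the same sets could be built from `g.neighbors(...)` for the two vertex indices.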

# Problem 4: Step 4 - Compare the results to the interaction patterns

Now we want to have a closer look at the interactions of the user with the highest social capital weight on the social networking platform. <b>Using the table "FeatureTableDisguised.xlsx" display this user's number of posts (Posts), number of comments (Comments), the likes received on posts (Likes_OnPosts) and comments (Likes_OnComments), as well as the number of friends (Friends) and followers (Followers).</b> <p>
Compare this to some of the other users with high social capital. (1-3 sentences)

In [12]:
#your code here
elggexcell[(elggexcell["User ID"]==97) | (elggexcell["User ID"]==242) | (elggexcell["User ID"]==217) | (elggexcell["User ID"]==47)]


Unnamed: 0,User ID,Posts,Comments,Likes_OnPosts,Likes_OnComments,Friends,Followers,Comments recOnPosts,ToCHARposts,ToCHARcomments
46,47,1,8,3,9,8,16,1,730,1427
92,97,0,0,0,0,0,0,0,0,0
208,217,12,14,9,5,1,3,39,1165,597
229,242,8,68,34,33,347,98,23,1039,6755


In [None]:
#your answer here
As observed in the table above, the user with the highest social capital weight (97) has 0 posts, 0 comments, 0 likes,
0 friends, etc. This user has literally zero social interaction and has inflated their social capital by fraudulent means.
The user with the second highest SCW (242) appears to be quite active on the platform, with 347 friends, 8 posts,
68 comments, etc. This user seems to have been rewarded with social capital for that activity.
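
The comparison can also be expressed as a small rule: join each user's highest SCW with the activity columns and flag everyone whose activity adds up to zero. The numbers below are copied from the step 1 output and the feature table output above; the summed `Activity` column and the zero threshold are assumptions made for this sketch, not part of the exercise.

```python
import pandas as pd

# Highest SCW per user, taken from the step 1 output.
scw = pd.DataFrame({'User ID': [217, 97, 47, 242],
                    'SCW': [1.645, 21.550537, 1.83, 5.927137]})

# Activity columns copied from the feature table output.
features = pd.DataFrame({
    'User ID':          [47, 97, 217, 242],
    'Posts':            [1, 0, 12, 8],
    'Comments':         [8, 0, 14, 68],
    'Likes_OnPosts':    [3, 0, 9, 34],
    'Likes_OnComments': [9, 0, 5, 33],
    'Friends':          [8, 0, 1, 347],
    'Followers':        [16, 0, 3, 98],
})

activity_cols = ['Posts', 'Comments', 'Likes_OnPosts', 'Likes_OnComments',
                 'Friends', 'Followers']
merged = scw.merge(features, on='User ID', how='left')
merged['Activity'] = merged[activity_cols].sum(axis=1)

# Assumed rule of thumb: an SCW holder without any recorded activity is suspicious.
suspects = merged.loc[merged['Activity'] == 0, 'User ID'].tolist()
```

Only user 97 is flagged here. A stricter or softer threshold is a design choice that would need to be validated against more users.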


# Voluntary: Problem 5 - Summary
You identified a pair of users with an abnormally high social capital weight and had a look at their interactions in the social network.

How did they manage to artificially inflate their social capital weight? (1-3 sentences)

What can be done to prevent it? (3-5 sentences)

#Your answer here

