# Chapter 08  Common Friends

- Use MapReduce to find the friends that a person has in common with each friend on their friend list.

- Three implementations are covered:
    - MapReduce/Hadoop using primitive data types
    - MapReduce/Hadoop using custom data types
    - Spark using RDDs


- Showing common friends is a core feature of Facebook, hi5, and LinkedIn.
- A social network service can implement this feature in two ways:
    - Use a caching strategy and save the common friends in a cache (such as Redis or memcached).
    - Use MapReduce to calculate everyone’s common friends once a day and store those results.

## Input

```
<person><,><friend1 ><friend2 >...<friendN> 
    => the friend list of <person> is <friend1 ><friend2 >...<friendN>

100, 200 300 400 500 600
200, 100 300 400
300, 100 200 400 500
400, 100 200 300
500, 100 300
600, 100
```


Person 500 is friends with 100 and 300, and person 600 is friends with 100, so the common friend of 500 and 600 is 100.

## POJO Common Friends Solution

![](chap08_01.jpg)
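
As a rough illustration of a plain, non-distributed solution, the sketch below intersects two friend sets in memory using plain Python 3. The names `parse_line` and `common_friends` are introduced here for the example, and the record format is assumed to be the one shown in the Input section.

```python
def parse_line(line):
    """Parse '<person>,<friend1> <friend2> ...' into (person, set_of_friends)."""
    person, _, friends = line.partition(",")
    return int(person), {int(f) for f in friends.split()}


def common_friends(friends_a, friends_b):
    """Common friends are the intersection of the two friend sets."""
    return sorted(friends_a & friends_b)


lines = [
    "100, 200 300 400 500 600",
    "200, 100 300 400",
    "300, 100 200 400 500",
    "400, 100 200 300",
    "500, 100 300",
    "600, 100",
]
friends_of = dict(parse_line(line) for line in lines)

print(common_friends(friends_of[500], friends_of[600]))  # [100]
print(common_friends(friends_of[100], friends_of[200]))  # [300, 400]
```

This only works while every friend list fits in memory on one machine, which is the limitation the MapReduce versions address.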

## MapReduce Algorithm

- Mapper input
    - key1 : person
    - value1 : list of associated friends of that person

- Mapper output
    - key2 : Tuple2( key1, friend_i ), with the two IDs sorted so that both people in a pair emit the same key
    - value2 : same as value1

- Reducer input
    - key2 : Tuple2( key1, friend_i )
    - values : List( value2 )

- Reducer output
    - key3 : same as the input key2
    - value3 : list of common friends
    
    
![](chap08_02.jpg)
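
The key/value flow above can be simulated without Hadoop. The following sketch runs the map, shuffle, and reduce phases in memory using plain Python 3. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative and not part of any framework, and the sketch assumes friendships are mutual, as they are in the sample input.

```python
from collections import defaultdict


def map_phase(person, friends):
    """Emit (sorted (person, friend) pair, full friend list) for every friend."""
    for friend in friends:
        key = (min(person, friend), max(person, friend))
        yield key, friends


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, friend_lists):
    """Intersect all friend lists received for one key."""
    common = set(friend_lists[0])
    for friends in friend_lists[1:]:
        common &= set(friends)
    return key, sorted(common)


records = {
    100: [200, 300, 400, 500, 600],
    200: [100, 300, 400],
    300: [100, 200, 400, 500],
    400: [100, 200, 300],
    500: [100, 300],
    600: [100],
}

pairs = [kv for person, friends in records.items() for kv in map_phase(person, friends)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(result[(100, 200)])  # [300, 400]
print(result[(100, 600)])  # []
```

Because both 100 and 200 emit the key `(100, 200)`, the reducer for that key receives exactly two friend lists, and their intersection is the answer for the pair.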

## The MapReduce Algorithm in Action

![](chap08_03.jpg)
![](chap08_04.jpg)

## Solution 1: Hadoop Implementation Using Text

```
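    // Builds the reducer value: the person's friends joined with commas.
    // tokens[0] is the person; tokens[1..] are that person's friends.
    static String getFriends(String[] tokens) {
        // With a single friend, the (person, friend) pair has no common friends.
        if (tokens.length == 2) {
            return "";
        }
        StringBuilder builder = new StringBuilder();
        for (int i = 1; i < tokens.length; i++) {
            builder.append(tokens[i]);
            if (i < (tokens.length - 1)) {
                builder.append(",");
            }
        }
        return builder.toString();
    }

    // Counts, for each friend, how many of the received friend lists contain it.
    static void addFriends(Map<String, Integer> map, String friendsList) {
        String[] friends = StringUtils.split(friendsList, ",");
        for (String friend : friends) {
            Integer count = map.get(friend);
            if (count == null) {
                map.put(friend, 1);
            }
            else {
                map.put(friend, ++count);
            }
        }
    }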
```

## Solution 2: Hadoop Implementation Using ArrayListOfLongsWritable

```
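    // Builds the reducer value as a list of longs instead of a comma-separated string.
    // tokens[0] is the person; tokens[1..] are that person's friends.
    static ArrayListOfLongsWritable getFriends(String[] tokens) {
        // With a single friend, the (person, friend) pair has no common friends.
        if (tokens.length == 2) {
            return new ArrayListOfLongsWritable();
        }

        ArrayListOfLongsWritable list = new ArrayListOfLongsWritable();
        for (int i = 1; i < tokens.length; i++) {
            list.add(Long.parseLong(tokens[i]));
        }
        return list;
    }

    // Counts, for each friend ID, how many of the received friend lists contain it.
    static void addFriends(Map<Long, Integer> map, ArrayListOfLongsWritable friendsList) {
        Iterator<Long> iterator = friendsList.iterator();
        while (iterator.hasNext()) {
            long id = iterator.next();
            Integer count = map.get(id);
            if (count == null) {
                map.put(id, 1);
            }
            else {
                map.put(id, ++count);
            }
        }
    }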
```


## Spark Program

![](chap08_05.jpg)

#### Step 3: Create a SparkContext object

```
from pyspark import SparkContext
sc = SparkContext()
sc
```

Output:

```
<pyspark.context.SparkContext at 0x7f976c5c1110>
```

users_and_friends.txt

```
100,200 300 400 500
200,100 300 400
300,100 200 400 500
400,100 200 300
500,100 300
600,100
```

#### Step 4: Read input file and create RDD

```
records = sc.textFile("users_and_friends.txt")

for t in records.collect():
    print "debug0 record:", t
```

Output:

```
debug0 record: 100,200 300 400 500
debug0 record: 200,100 300 400
debug0 record: 300,100 200 400 500
debug0 record: 400,100 200 300
debug0 record: 500,100 300
debug0 record: 600,100
```


#### Step 5: Apply a mapper

```
def buildSortedTuple(a, b):
    # Order the pair so that (a, b) and (b, a) produce the same key.
    if a < b:
        return (a, b)
    else:
        return (b, a)

def mapper_friends(line):
    print line  # debug: show the record being processed
    tokens = line.split(',')
    person = long(tokens[0])
    friendsTokenized = tokens[1].split(' ')
    # With a single friend, the (person, friend) pair has no common friends.
    if len(friendsTokenized) == 1:
        key = buildSortedTuple(person, long(friendsTokenized[0]))
        return [(key, [])]

    friends = [long(f) for f in friendsTokenized]

    # Emit the full friend list once per (person, friend) pair.
    result = []
    for f in friends:
        key = buildSortedTuple(person, f)
        result.append((key, friends))

    return result

s = "100,200 300 400 500"
for a in mapper_friends(s):
    print a[0], a[1]
```

Output:

```
100,200 300 400 500
(100L, 200L) [200L, 300L, 400L, 500L]
(100L, 300L) [200L, 300L, 400L, 500L]
(100L, 400L) [200L, 300L, 400L, 500L]
(100L, 500L) [200L, 300L, 400L, 500L]
```


```
pairs = records.flatMap(mapper_friends)

debug1 = pairs.collect()
for t1 in debug1:
    print "debug1 key={}\t value={}".format(t1[0], t1[1])
```

Output:

```
debug1 key=(100L, 200L)	 value=[200L, 300L, 400L, 500L]
debug1 key=(100L, 300L)	 value=[200L, 300L, 400L, 500L]
debug1 key=(100L, 400L)	 value=[200L, 300L, 400L, 500L]
debug1 key=(100L, 500L)	 value=[200L, 300L, 400L, 500L]
debug1 key=(100L, 200L)	 value=[100L, 300L, 400L]
debug1 key=(200L, 300L)	 value=[100L, 300L, 400L]
debug1 key=(200L, 400L)	 value=[100L, 300L, 400L]
debug1 key=(100L, 300L)	 value=[100L, 200L, 400L, 500L]
debug1 key=(200L, 300L)	 value=[100L, 200L, 400L, 500L]
debug1 key=(300L, 400L)	 value=[100L, 200L, 400L, 500L]
debug1 key=(300L, 500L)	 value=[100L, 200L, 400L, 500L]
debug1 key=(100L, 400L)	 value=[100L, 200L, 300L]
debug1 key=(200L, 400L)	 value=[100L, 200L, 300L]
debug1 key=(300L, 400L)	 value=[100L, 200L, 300L]
debug1 key=(100L, 500L)	 value=[100L, 300L]
debug1 key=(300L, 500L)	 value=[100L, 300L]
debug1 key=(100L, 600L)	 value=[]
```


#### Step 6: Apply a reducer

```
grouped = pairs.groupByKey()

debug2 = grouped.collect()
for t2 in debug2:
    print "debug2 key={}\t value={}".format(t2[0], "".join([str(x) for x in t2[1]]))
```

Output:

```
debug2 key=(300L, 500L)	 value=[100L, 200L, 400L, 500L][100L, 300L]
debug2 key=(100L, 200L)	 value=[200L, 300L, 400L, 500L][100L, 300L, 400L]
debug2 key=(100L, 500L)	 value=[200L, 300L, 400L, 500L][100L, 300L]
debug2 key=(300L, 400L)	 value=[100L, 200L, 400L, 500L][100L, 200L, 300L]
debug2 key=(200L, 300L)	 value=[100L, 300L, 400L][100L, 200L, 400L, 500L]
debug2 key=(100L, 400L)	 value=[200L, 300L, 400L, 500L][100L, 200L, 300L]
debug2 key=(100L, 300L)	 value=[200L, 300L, 400L, 500L][100L, 200L, 400L, 500L]
debug2 key=(100L, 600L)	 value=[]
debug2 key=(200L, 400L)	 value=[100L, 300L, 400L][100L, 200L, 300L]
```


#### Step 7: Find common friends

```
def find_intersection(s):
    countCommon = {}  # friend -> number of lists that contain that friend
    size = 0          # number of friend lists received for this key
    for friends in s:
        size = size + 1
        if len(friends) == 0:
            continue
        for f in friends:
            if f in countCommon:
                countCommon[f] = countCommon[f] + 1
            else:
                countCommon[f] = 1

    # A friend is common only if it appears in every list.
    finalCommonFriends = []
    for key in countCommon.keys():
        if countCommon[key] == size:
            finalCommonFriends.append(key)

    return finalCommonFriends

commonFriends = grouped.mapValues(find_intersection)

debug3 = commonFriends.collect()
for t3 in debug3:
    print "debug3 key={}\t value={}".format(t3[0], t3[1])
```

Output:

```
debug3 key=(300L, 500L)	 value=[100L]
debug3 key=(100L, 200L)	 value=[400L, 300L]
debug3 key=(100L, 500L)	 value=[300L]
debug3 key=(300L, 400L)	 value=[200L, 100L]
debug3 key=(200L, 300L)	 value=[400L, 100L]
debug3 key=(100L, 400L)	 value=[200L, 300L]
debug3 key=(100L, 300L)	 value=[200L, 400L, 500L]
debug3 key=(100L, 600L)	 value=[]
debug3 key=(200L, 400L)	 value=[300L, 100L]
```
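
The counting approach above amounts to intersecting the grouped lists. A shorter variant using Python's built-in `set.intersection` is sketched below in plain Python 3. `find_intersection_sets` is a name introduced for this example, and it should give the same friends as the counting version whenever no friend is repeated within a list (the result is sorted here, so the order can differ from the Spark output).

```python
def find_intersection_sets(friend_lists):
    """Return the friends that appear in every list, in ascending order."""
    sets = [set(friends) for friends in friend_lists]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


print(find_intersection_sets([[200, 300, 400, 500], [100, 300, 400]]))  # [300, 400]
print(find_intersection_sets([[]]))                                     # []
```

In the Spark pipeline it could presumably be passed to `grouped.mapValues(...)` in place of `find_intersection`.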


#### Combining steps 6 and 7

- Steps 6 and 7 can be combined into a single step/operation.
- Spark provides combineByKey() and reduceByKey() for this.
- reduceByKey() : reduces values of type V into V (the same data type)
- combineByKey() : combines/transforms values of type V into another type, C
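
To make the V-versus-C distinction concrete, the sketch below emulates the three functions that `combineByKey()` takes (create a combiner from the first value, merge a value into a combiner, merge two combiners) in plain Python 3, without Spark. Here V is a list of friend IDs and C is a set. `combine_by_key` is a simplified stand-in written for this example: it treats all data as a single partition, so `merge_combiners` is defined but not called by it.

```python
def create_combiner(friends):
    """V -> C: turn the first friend list seen for a key into a set."""
    return set(friends)


def merge_value(common, friends):
    """(C, V) -> C: intersect the running set with another friend list."""
    return common & set(friends)


def merge_combiners(common_a, common_b):
    """(C, C) -> C: intersect partial results from two partitions."""
    return common_a & common_b


def combine_by_key(pairs):
    """Simplified single-partition stand-in for Spark's combineByKey()."""
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = merge_value(combined[key], value)
        else:
            combined[key] = create_combiner(value)
    return combined


pairs = [
    ((100, 200), [200, 300, 400, 500]),
    ((100, 200), [100, 300, 400]),
    ((100, 600), []),
]
result = combine_by_key(pairs)

print(sorted(result[(100, 200)]))  # [300, 400]
print(result[(100, 600)])          # set()
```

With a real RDD, the same three functions could be passed as `pairs.combineByKey(create_combiner, merge_value, merge_combiners)`.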

```
def reduce_friends(a, b):
    # Keep only the items of b that also appear in a.
    x = set(a)
    intersection = set()
    for item in b:
        if item in x:
            intersection.add(item)
    return intersection

commonFriends2 = pairs.reduceByKey(reduce_friends)

commonFriendsMap = commonFriends2.collectAsMap()
for key in commonFriendsMap.keys():
    print "{}, {}".format(key, commonFriendsMap[key])
```

Output:

```
(300L, 500L), set([100L])
(200L, 300L), set([400L, 100L])
(100L, 200L), set([400L, 300L])
(200L, 400L), set([300L, 100L])
(100L, 400L), set([200L, 300L])
(100L, 600L), []
(100L, 300L), set([200L, 400L, 500L])
(300L, 400L), set([200L, 100L])
(100L, 500L), set([300L])
```
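
In the output above, the key `(100L, 600L)` maps to a list (`[]`) while every other key maps to a set. This is consistent with `reduceByKey()` calling the reduce function only when a key has at least two values, so a lone value passes through unchanged. The sketch below reproduces that behavior in plain Python 3 with `functools.reduce`, which treats a one-element sequence the same way, and shows one possible fix: converting every value to a set before reducing.

```python
from functools import reduce


def reduce_friends(a, b):
    """Same logic as above: keep the items of b that also appear in a."""
    x = set(a)
    return {item for item in b if item in x}


# One value for the key: the function is never called, so the list comes back as-is.
single = reduce(reduce_friends, [[]])
print(type(single).__name__)  # list

# Two values: the function runs and returns a set.
double = reduce(reduce_friends, [[200, 300, 400, 500], [100, 300, 400]])
print(type(double).__name__, sorted(double))  # set [300, 400]

# Possible fix: normalize every value to a set before reducing.
normalized = reduce(set.intersection, [set(v) for v in [[]]])
print(type(normalized).__name__)  # set
```

In the Spark pipeline, the equivalent change would be something like `pairs.mapValues(set).reduceByKey(lambda a, b: a & b)`, which should yield a set for every key.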
