# People You Might Know - Recommendation System

This notebook implements a friendship recommendation system using PySpark based on the **LiveJournal** social network data.  
It counts mutual friends for every pair of users and recommends up to 10 non-friends per user, ranked by number of mutual friends.

## Initialization

In [30]:
```python
from pyspark import SparkContext
import pandas as pd

sc = SparkContext(appName="PeopleYouMightKnow")
```

## Step 1: Load and Parse Input

The dataset is a text file where each line contains a user ID and a comma-separated list of their friends:

`UserID<TAB>Friend1,Friend2,...`

Parse the data into `(user, set(friends))`.
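
A minimal local sketch of the same parsing rule, runnable without Spark, on made-up sample lines. `parse_line` is a hypothetical stand-alone copy of `safe_parse` below; it shows that a user with an empty friend list (`"7\t"`) and a line with a non-integer ID are both dropped.

```python
# Local sketch of the Step 1 parsing rule (no Spark needed).
# The sample lines are hypothetical, chosen to hit each branch.

def parse_line(line):
    """Return (user, set_of_friends), or None for malformed or friendless lines."""
    parts = line.strip().split("\t")
    if len(parts) != 2 or not parts[1].strip():
        return None  # empty friend list or malformed line
    try:
        return int(parts[0]), set(map(int, parts[1].split(",")))
    except ValueError:
        return None  # non-integer user or friend ID

sample_lines = ["0\t1,2,3", "7\t", "x\t1,2"]
parsed = [p for p in (parse_line(l) for l in sample_lines) if p is not None]
print(parsed)  # [(0, {1, 2, 3})]
```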

In [31]:
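```python
lines = sc.textFile("soc-LiveJournal1Adj.txt")

def safe_parse(line):
    """Parse 'user<TAB>f1,f2,...' into (user, set_of_friends).

    Returns None for malformed lines and for users with an empty friend list.
    """
    parts = line.strip().split("\t")
    if len(parts) != 2 or not parts[1].strip():
        return None
    try:
        user = int(parts[0])
        friends = set(map(int, parts[1].split(",")))
        return (user, friends)
    except ValueError:  # non-integer ID; a bare except would also hide unrelated errors
        return None

# RDD of (user, set(friends)); cached because Steps 1-3 reuse it
user_friends = lines.map(safe_parse).filter(lambda x: x is not None).cache()
```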

In [32]:
```python
# Peek at the first five parsed records (friend sets are unordered)
sample_data = user_friends.take(5)
df = pd.DataFrame([(user, list(friends)) for user, friends in sample_data], columns=["User", "Friends"])
df
```


|   | User | Friends |
|---|------|---------|
| 0 | 0 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... |
| 1 | 1 | [0, 31232, 29826, 35589, 5, 135, 4999, 34439, ... |
| 2 | 2 | [0, 2755, 1220, 12453, 13795, 135, 49927, 2471... |
| 3 | 3 | [0, 13185, 27552, 41, 12, 1532, 38737, 55, 12636] |
| 4 | 4 | [0, 19079, 8, 38792, 14, 15, 18, 24596, 27, 38... |


## Step 2: Compute Mutual Friends

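Every user is a mutual friend of each pair of their own friends. For each user, emit `((friend1, friend2), 1)` for every unordered pair of their friends, then sum with `reduceByKey` to get the number of mutual friends of each pair. Pairs that are already direct friends are counted too; they are removed in Step 3.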
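
The counting step can be sketched in plain Python on a tiny hypothetical graph, with a `Counter` standing in for `reduceByKey`. This sketch swaps the nested loop for `itertools.combinations` over the sorted friend list, which yields the same `a < b` pairs while visiting each pair only once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy graph stored as adjacency sets, like user_friends
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}

# "Map": each pair of u's friends has u as a mutual friend.
# "Reduce": summing the 1s gives the mutual-friend count per pair.
mutual = Counter()
for user, friends in graph.items():
    for a, b in combinations(sorted(friends), 2):  # a < b, each pair once
        mutual[(a, b)] += 1

print(mutual[(1, 4)])  # 2 (via users 2 and 3)
```

Users 1 and 4 are not friends but share friends 2 and 3, so their count is 2. The pair `(1, 3)` also gets a count even though 1 and 3 are already friends, which is why direct friends must be filtered afterwards.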

In [33]:
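```python
# Collect the adjacency map on the driver (assumes it fits in memory) and
# broadcast it to the executors; Step 3 uses it to drop existing friends
user_friend_map = user_friends.collectAsMap()
user_friend_bcast = sc.broadcast(user_friend_map)

def generate_candidate_pairs(user, friends):
    # `user` is a mutual friend of every pair of its friends;
    # friend1 < friend2 emits each unordered pair exactly once
    for friend1 in friends:
        for friend2 in friends:
            if friend1 < friend2:
                yield ((friend1, friend2), 1)

# ((a, b), number of mutual friends) with a < b
mutual_counts = user_friends.flatMap(lambda x: generate_candidate_pairs(x[0], x[1])) \
                            .reduceByKey(lambda a, b: a + b)
```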

## Step 3: Filter Out Direct Friends

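Expand each pair in both directions, drop candidates who are already direct friends of the user (and the user themselves), then keep the 10 candidates with the most mutual friends, breaking ties by ascending user ID.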
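
On the same kind of toy graph, a plain-Python sketch of the expand-and-filter step (hypothetical data; the dict `mutual` plays the role of `mutual_counts`). After dropping existing friends, only users 1 and 4 remain as candidates for each other.

```python
# Hypothetical toy data: adjacency sets plus mutual-friend counts per unordered pair
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
mutual = {(1, 2): 1, (1, 3): 1, (1, 4): 2, (2, 3): 2, (2, 4): 1, (3, 4): 1}

# Expand each unordered pair into both directions, then drop existing friends
candidates = {}
for (a, b), count in mutual.items():
    for user, other in ((a, b), (b, a)):
        if other != user and other not in graph.get(user, set()):
            candidates.setdefault(user, []).append((other, count))

print(candidates)  # {1: [(4, 2)], 4: [(1, 2)]}
```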

In [34]:
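```python
# Emit each pair in both directions: (user, (candidate, mutual_count))
recommendations = mutual_counts.flatMap(lambda x: [
    (x[0][0], (x[0][1], x[1])),
    (x[0][1], (x[0][0], x[1]))
])

def filter_direct(user, recs):
    # Drop candidates who are already friends with `user`, and `user` itself
    direct_friends = user_friend_bcast.value.get(user, set())
    return [(other, count) for (other, count) in recs
            if other not in direct_friends and other != user]
```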

In [35]:
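```python
# Per user: filter, rank by mutual-friend count (descending) then ID (ascending), keep the top 10
top_recommendations = recommendations.groupByKey() \
    .map(lambda x: (
        x[0],
        sorted(
            filter_direct(user=x[0], recs=list(x[1])),
            key=lambda r: (-r[1], r[0])
        )[:10]
    ))

# Format as "user<TAB>rec1,rec2,..." and bring all lines back to the driver
results = top_recommendations.map(lambda x: f"{x[0]}\t{','.join(str(r[0]) for r in x[1])}").collect()
```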
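
`sorted(...)[:10]` sorts every candidate of a user. For users with many candidates, `heapq.nsmallest` with the same key returns the same top 10 (the Python docs describe it as equivalent to `sorted(iterable, key=key)[:n]`) while keeping only `k` items in its heap. A sketch of that alternative; `top_k` is a hypothetical helper, not part of the notebook, that could replace the `sorted(...)[:10]` call inside the `map`.

```python
import heapq

def top_k(recs, k=10):
    """Return the k best (other, count) pairs: most mutual friends first, ties by smaller ID."""
    return heapq.nsmallest(k, recs, key=lambda r: (-r[1], r[0]))

# Hypothetical (candidate, mutual_count) pairs for one user
recs = [(7, 1), (3, 2), (5, 2), (9, 3), (2, 1)]
print(top_k(recs, k=3))  # [(9, 3), (3, 2), (5, 2)]
```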


## Step 4: Show and Save Results
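Print a few sample recommendations and save the full output to `recommendations.txt`, one `user<TAB>rec1,rec2,...` line per user. Users who never appear in any candidate pair (for example, users with an empty friend list, which `safe_parse` drops) are absent from `results`; a user whose candidates are all existing friends gets an empty list.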

In [36]:
```python
for line in results[:5]:
    print(line)

with open("recommendations.txt", "w") as f:
    for line in results:
        f.write(line + "\n")

sc.stop()
```

```text
2192	2138,2139,2158,2195,2143,2135,2140,2148,2154,2211
6030	1664,439,1667,13847,18916,22265,27609,34299,43593,19
13886	13911,13966,13867,14027,13891,13960,14130,13965,13981,13917
44192	37580,37597,37734,37735,37822,37675,10144,37378,37537,41367
23034	8671,9891,23014,2557,2608,4389,4717,5086,13795,16532
```
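
Each saved line has the form `user<TAB>rec1,rec2,...`, and the recommendation list can be empty when all of a user's candidates were already friends. A minimal sketch of loading such a file back into a dict; `parse_recommendation_line` is hypothetical, and the sketch writes its own small temporary file with made-up contents rather than reading the notebook's output.

```python
import os
import tempfile

def parse_recommendation_line(line):
    """Split 'user<TAB>rec1,rec2,...' into (user, [rec1, rec2, ...])."""
    user, _, recs = line.rstrip("\n").partition("\t")
    return int(user), [int(r) for r in recs.split(",") if r]  # "" -> []

# Round trip through a temporary file in the same format as recommendations.txt
path = os.path.join(tempfile.mkdtemp(), "recommendations.txt")
with open(path, "w") as f:
    f.write("1\t4\n")
    f.write("5\t\n")  # user whose candidates were all existing friends

with open(path) as f:
    loaded = dict(parse_recommendation_line(line) for line in f)

print(loaded)  # {1: [4], 5: []}
```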
