# Spam Farm Analysis: Complete Pipeline

This notebook walks through an end-to-end analysis of spam farms in link analysis, covering:
- Spam farm creation and impact
- Detection methods (TrustRank, Spam Mass, Structural)
- Method comparison and evaluation
- Parameter sensitivity analysis

**Reference:** Mining of Massive Datasets, Chapter 5 (Link Analysis), Section 5.4 (Link Spam)
- Sections 5.4.1-5.4.2: Spam Farms
- Section 5.4.4: TrustRank
- Section 5.4.5: Spam Mass


## Section 1: Introduction

### The Link Spam Problem

Link spam is the practice of inflating the PageRank of target pages by building link structures that exist only to boost them. This notebook demonstrates:

1. **Spam Farm Creation**: How spammers create artificial link structures
2. **Impact Analysis**: How spam affects PageRank and rankings
3. **Detection Methods**: TrustRank, Spam Mass, and Structural Detection
4. **Evaluation**: Comparing detection methods and their effectiveness
5. **Parameter Sensitivity**: Understanding what makes spam effective
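
As a rough illustration of items 1 and 2, the sketch below builds a toy web with one spam farm and ranks it with plain power iteration. It is not the pipeline used later in the notebook: the names `build_spam_farm` and `pagerank`, the ring of "good" pages, and the default sizes are all hypothetical choices made for this example, and it assumes `m >= 1` so that no page is a dead end.

```python
def build_spam_farm(n_good=20, m=10):
    """Return adjacency lists for a toy web: a ring of good pages plus one spam farm.

    Hypothetical helper for illustration; assumes n_good >= 2 and m >= 1.
    """
    # Good pages form a ring, so every page has at least one out-link.
    graph = {f"good_{i}": [f"good_{(i + 1) % n_good}"] for i in range(n_good)}
    # One "accessible" good page also links to the spammer's target page.
    graph["good_0"].append("target")
    # The target links to every supporting page; each supporting page links back.
    graph["target"] = [f"support_{j}" for j in range(m)]
    for j in range(m):
        graph[f"support_{j}"] = ["target"]
    return graph


def pagerank(graph, beta=0.85, iterations=100):
    """Power iteration with uniform teleportation (assumes no dead ends)."""
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        # Every page starts each round with its teleport share.
        new_rank = {page: (1.0 - beta) / n for page in graph}
        for page, out_links in graph.items():
            share = beta * rank[page] / len(out_links)
            for dest in out_links:
                new_rank[dest] += share
        rank = new_rank
    return rank


farm = build_spam_farm()
ranks = pagerank(farm)
print(f"target page:   {ranks['target']:.4f}")
print(f"good page:     {ranks['good_10']:.4f}")
print(f"uniform share: {1 / len(farm):.4f}")
```

In this toy graph the target ends up with far more than a uniform share of PageRank, because the supporting pages return almost all of the rank the target sends them.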

### Overview of Experiments

We will:
- Create a simple spam farm with configurable parameters
- Measure PageRank amplification (actual vs theoretical)
- Test multiple detection methods
- Compare effectiveness and identify best practices
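
For the "theoretical" side of the amplification comparison, a minimal sketch of the closed form from Section 5.4.2 of the reference may help. There the target's PageRank is y = x / (1 - β²) + c · m / n with c = β / (1 + β), where x is the PageRank arriving from accessible pages, m the number of supporting pages, and n the total number of pages; the derivation ignores the target's own small teleport share. The function names below are hypothetical and not part of this notebook's `src` package.

```python
def amplification_factor(beta=0.85):
    """Multiplier applied to externally contributed PageRank: 1 / (1 - beta^2)."""
    return 1.0 / (1.0 - beta ** 2)


def theoretical_target_rank(x, m, n, beta=0.85):
    """Approximate target PageRank y = x / (1 - beta^2) + c * m / n.

    x: PageRank flowing into the target from accessible (non-farm) pages
    m: number of supporting pages in the farm
    n: total number of pages in the graph
    """
    c = beta / (1.0 + beta)  # fraction of the farm's m/n share kept by the target
    return amplification_factor(beta) * x + c * m / n


print(f"amplification at beta=0.85: {amplification_factor():.2f}")
print(f"farm-only rank (x=0, m=1000, n=1e6): {theoretical_target_rank(0.0, 1000, 1_000_000):.6f}")
```

With β = 0.85 the external contribution is amplified by roughly 3.6, and the farm itself adds about 46% of m/n, which is the baseline the measured values can be compared against.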


In [None]:
# Setup: Import libraries and configure environment
import sys
import os

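# Run from the notebooks/ directory so relative paths resolve consistently
if os.path.basename(os.getcwd()) != 'notebooks' and os.path.isdir('notebooks'):
    os.chdir('notebooks')

# Add ../src to the import path (computed after the chdir so it points at the project's src/)
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))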

import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Try to import plotly for interactive plots
try:
    import plotly.graph_objects as go
    import plotly.express as px
    from plotly.subplots import make_subplots
    HAS_PLOTLY = True
except ImportError:
    HAS_PLOTLY = False
    print("Plotly not available, using matplotlib instead")

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'ggplot')
sns.set_palette("husl")

print("✅ Libraries imported successfully")
