# Magazine Review Analysis and Popularity Scoring

This notebook analyzes magazine reviews and computes a composite popularity score for each magazine by combining its average star rating with the sentiment of its reviews. The analysis follows these main steps:

1. Load and prepare the review data
2. Classify the sentiment of each review with NLTK's VADER analyzer
3. Combine ratings and sentiment into a composite (ConTrip) popularity score, then export the results

## Step 1: Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
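import seaborn as sns
from IPython.display import display  # implicit in Jupyter; explicit import also lets this run as a script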

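# Download the VADER lexicon used by SentimentIntensityAnalyzer if it is not already installed
# (the resource lives under 'sentiment/', so looking up plain 'vader_lexicon' always fails)
try:
    nltk.data.find('sentiment/vader_lexicon.zip')
except LookupError:
    print("Downloading VADER lexicon...")
    nltk.download('vader_lexicon')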

## Step 2: Load and Prepare Review Data

In [None]:
# Load the magazine subscription data
input_csv = "Magazine_Subscriptions.csv"
df = pd.read_csv(input_csv)

# Display basic information about the dataset
print("Dataset Info:")
df.info()  # info() prints its summary directly and returns None, so it is not wrapped in print()
print("\nSample of the data:")
display(df.head())

# Check for missing values
print("\nMissing values:")
display(df.isnull().sum())
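
The cell above only inspects the data; no cleaning is applied before sentiment analysis. A minimal preparation step might look like the sketch below. It assumes the columns used later in this notebook (`asin`, `overall`, `reviewText`); the helper name `prepare_reviews`, and the choices to drop rows without a product ID or rating and to replace missing review text with an empty string, are illustrative assumptions rather than part of the original analysis.

```python
import pandas as pd

# Columns used later in this notebook
REQUIRED_COLUMNS = ["asin", "overall", "reviewText"]


def prepare_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate columns and clean obvious gaps (a sketch, not the original method)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")

    out = df.copy()
    # Coerce ratings to numbers; unparseable values become NaN
    out["overall"] = pd.to_numeric(out["overall"], errors="coerce")
    # Assumption: a review without a product ID or star rating cannot be scored
    out = out.dropna(subset=["asin", "overall"])
    # Assumption: missing review text becomes an empty string
    out["reviewText"] = out["reviewText"].fillna("").astype(str)
    return out.reset_index(drop=True)


# Usage on a tiny, made-up frame
sample = pd.DataFrame({
    "asin": ["A1", "A1", "B2", None],
    "overall": [5, "4", None, 3],
    "reviewText": ["Great read", None, "Meh", "Fine"],
})
clean = prepare_reviews(sample)
print(clean)
```

Here the `B2` row is dropped for its missing rating and the last row for its missing `asin`, leaving two `A1` reviews.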

## Step 3: Perform Sentiment Analysis

In [None]:
# Initialize the sentiment analyzer
sia = SentimentIntensityAnalyzer()

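# Classify a review as Positive, Negative, or Neutral from VADER's compound score.
# The ±0.05 cutoffs are the thresholds commonly recommended for VADER.
def classify_sentiment(text):
    # Missing (NaN) or non-string review text is treated as Neutral
    if not isinstance(text, str):
        return 'Neutral'

    compound_score = sia.polarity_scores(text)['compound']

    if compound_score >= 0.05:
        return 'Positive'
    elif compound_score <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'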

# Apply sentiment analysis to reviews
df['sentiment'] = df['reviewText'].apply(classify_sentiment)

# Display sentiment distribution
sentiment_counts = df['sentiment'].value_counts()
print("Sentiment Distribution:")
display(sentiment_counts)

# Visualize sentiment distribution
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='sentiment')
plt.title('Distribution of Review Sentiments')
plt.show()
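
The labeling rule in `classify_sentiment` depends only on VADER's `compound` score, which ranges from -1 to 1. The sketch below isolates that rule as a vectorized function over precomputed compound scores, so it can be checked without downloading the lexicon. The function name `label_compound` is hypothetical; the ±0.05 default mirrors the cutoff used above, and treating a score of exactly ±0.05 as Positive/Negative follows the commonly cited VADER convention.

```python
import numpy as np


def label_compound(compound, threshold=0.05):
    """Hypothetical helper: map VADER compound scores (in [-1, 1]) to sentiment labels.

    Scores >= threshold are 'Positive', scores <= -threshold are 'Negative',
    and everything in between is 'Neutral'.
    """
    scores = np.asarray(compound, dtype=float)
    return np.select(
        [scores >= threshold, scores <= -threshold],
        ["Positive", "Negative"],
        default="Neutral",
    )


labels = label_compound([0.8, 0.05, 0.0, -0.04, -0.6])
print(labels.tolist())  # ['Positive', 'Positive', 'Neutral', 'Neutral', 'Negative']
```

Keeping the raw compound scores in their own column and labeling them separately makes it cheap to try other thresholds without re-running the analyzer.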

## Step 4: Calculate Composite Popularity Score
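
The composite score below is based on the methodology presented in *ConTrip: Consensus Sentiment Review Analysis and Platform Ratings in a Single Score* (arXiv:2201.02113). It combines each magazine's average star rating with the percentage of its reviews classified as positive; only magazines with at least 100 reviews are scored.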
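
Written out, the expression computed in the next cell is shown below, where $\bar{r}$ is a magazine's average rating on the 1 to 5 star scale, $p = \text{positive\_percentage}/100$ is its share of positive reviews, and $\alpha$, $\beta$, $\delta$ are the notebook's constants `alpha`, `beta`, `delta`. This transcribes the notebook's code and may differ in detail from the paper's own formulation.

$$
\text{ConTrip} = \min\!\left(5,\; \bar{r} + \alpha\,(p - 0.5)\right) \;-\; \frac{(1 - p)\,\bar{r}}{\beta} \;-\; \frac{5 - \bar{r}}{\delta}
$$

With the notebook's values ($\alpha = 0.5$, $\beta = 10$, $\delta = 100$), the sentiment adjustment moves the rating by at most $\pm 0.25$ stars, the negativity penalty is at most $\bar{r}/10 \le 0.5$, and the gap penalty is at most $0.04$ for ratings of 1 star or more.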

In [None]:
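# Group reviews by magazine (asin) and compute per-magazine metrics
magazine_stats = df.groupby('asin').agg(
    average_rating=('overall', 'mean'),
    positive_percentage=('sentiment', lambda s: (s == 'Positive').mean() * 100),
    # Count every review (sentiment is never missing), so this matches the
    # denominator of positive_percentage; counting reviewText would skip reviews with missing text
    total_reviews=('sentiment', 'count'),
)

# Keep only magazines with enough reviews for stable estimates;
# .copy() avoids a SettingWithCopyWarning when the score column is added below
MIN_REVIEWS = 100
magazine_stats = magazine_stats[magazine_stats['total_reviews'] >= MIN_REVIEWS].copy()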

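# Weights for the ConTrip score, as chosen in this notebook
alpha = 0.5   # weight of the sentiment adjustment to the average rating
beta = 10     # divisor for the penalty from non-positive reviews
delta = 100   # divisor for the penalty from the gap to a perfect 5-star rating

# Calculate the ConTrip score, vectorized over all magazines
p = magazine_stats['positive_percentage'] / 100   # share of positive reviews
r = magazine_stats['average_rating']
magazine_stats['contrip'] = (
    np.minimum(5, r + (p - 0.5) * alpha)   # sentiment-adjusted rating, capped at 5
    - (1 - p) * r / beta                   # penalty for non-positive reviews
    - (5 - r) / delta                      # small penalty for distance from 5 stars
)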

# Display results
print("Top 10 Magazines by ConTrip Score:")
display(magazine_stats.sort_values('contrip', ascending=False).head(10))

# Visualize relationship between rating and sentiment
plt.figure(figsize=(10, 6))
sns.scatterplot(data=magazine_stats, x='average_rating', y='positive_percentage', size='total_reviews')
plt.title('Relationship between Rating and Positive Sentiment Percentage')
plt.xlabel('Average Rating')
plt.ylabel('Positive Sentiment Percentage')
plt.show()
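
To see how the terms combine, the sketch below evaluates the same expression for single hypothetical magazines. `contrip_score` is an illustrative helper (not from the paper or any library), its defaults copy the notebook's `alpha`, `beta`, and `delta`, and the inputs are made up rather than taken from the dataset.

```python
def contrip_score(average_rating, positive_percentage, alpha=0.5, beta=10, delta=100):
    """Hypothetical helper: score one magazine with the same expression as the cell above."""
    p = positive_percentage / 100  # share of reviews labeled Positive
    boosted = min(5, average_rating + (p - 0.5) * alpha)  # sentiment-adjusted rating, capped at 5
    negativity_penalty = (1 - p) * average_rating / beta  # penalty for non-positive reviews
    gap_penalty = (5 - average_rating) / delta  # small penalty for distance from a perfect 5
    return boosted - negativity_penalty - gap_penalty


# Hypothetical magazines (made-up inputs)
print(round(contrip_score(4.2, 80), 3))   # 4.35 - 0.084 - 0.008 = 4.258
print(round(contrip_score(5.0, 100), 3))  # 5.0
print(round(contrip_score(1.0, 0), 3))    # 0.75 - 0.1 - 0.04 = 0.61
```

A perfect 5-star average with 100% positive reviews reaches exactly 5: the sentiment bonus is clipped by `min(5, ...)` and both penalties vanish.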

## Step 5: Export Results

In [None]:
# Export results to CSV
magazine_stats.to_csv('magazine_analysis_results.csv')
print("Results exported to 'magazine_analysis_results.csv'")