# Goodreads Book Scraper

This notebook implements a web scraper that collects book data from Goodreads for Brandon Sanderson's books.

Books to be scraped:
- Cosmere — Stand-alones
  - Elantris
  - Warbreaker
  - White Sand
  - White Sand, Volume 2
  - White Sand, Volume 3
  - White Sand Omnibus
  - Arcanum Unbounded
  - The Emperor’s Soul
- Mistborn (including Wax and Wayne)
  - The Final Empire
  - The Well of Ascension
  - The Hero of Ages
  - Mistborn: Secret History
  - The Alloy of Law
  - Shadows of Self
  - The Bands of Mourning
  - The Lost Metal
- The Stormlight Archive
  - The Way of Kings
  - Words of Radiance
  - Oathbringer
  - Rhythm of War
  - Wind and Truth
  - Dawnshard
  - Edgedancer
- The Reckoners
  - Steelheart
  - Mitosis
  - Firefight
  - Calamity
  - Lux
- Skyward Series
  - Skyward
  - Starsight
  - Cytonic
  - Sunreach
  - Redawn
  - Evershore
  - Defiant
  - Defending Elysium
- The Rithmatist
- Secret Projects
  - Tress of the Emerald Sea
  - The Frugal Wizard’s Handbook for Surviving Medieval England
  - Yumi and the Nightmare Painter
  - The Sunlit Man
  - Isles of the Emberdark
- Collaborations with Other Authors
  - Dark One
  - Dark One: Forgotten
  - The Original
- Alcatraz vs. the Evil Librarians
  - Alcatraz vs. the Evil Librarians
  - The Scrivener’s Bones
  - The Knights of Crystallia
  - The Shattered Lens
  - The Dark Talent
  - Bastille vs. the Evil Librarians
- Other Novellas and Short Stories
  - Legion
  - Legion: Skin Deep
  - Legion: Lies of the Beholder
  - Legion: The Many Lives of Stephen Leeds
  - Firstborn
  - Perfect State
  - Snapshot

The scraper will collect the following information:

- Author
- Title
- Publication Date
- Page count
- Genres
- Overall rating
- Overall review count
- Individual reviews
- Individual ratings
- Individual review dates
- Individual review likes
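
As a rough sketch of how these fields could be organised, the block below describes one book record and one review record. The type names `BookDetails` and `Review` are hypothetical and only for illustration; the review values are placeholders, not scraped data.

```python
from typing import List, Optional, TypedDict


class BookDetails(TypedDict, total=False):
    """Book-level fields (hypothetical type, for illustration only)."""
    title: str
    author: str
    publication_date: Optional[str]
    page_count: Optional[int]
    genres: List[str]
    overall_rating: Optional[float]
    overall_reviews: int


class Review(TypedDict, total=False):
    """Per-review fields (hypothetical type, for illustration only)."""
    rating: Optional[int]
    review_text: str
    review_date: str
    likes: int


# Book values taken from the Elantris output shown later in this notebook
example_book: BookDetails = {
    "title": "Elantris",
    "author": "Brandon Sanderson",
    "genres": ["Fantasy", "Fiction"],
    "overall_rating": 4.17,
    "overall_reviews": 25198,
}

# Placeholder review values, not real data
example_review: Review = {
    "rating": 5,
    "review_text": "Placeholder review text.",
    "review_date": "January 1, 2020",
    "likes": 0,
}
```

Using `total=False` keeps every key optional, which matches a scraper that may fail to find some fields on a given page.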

In [1]:
# Import required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
from datetime import datetime
import re
import random

# Set more detailed headers to better mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0',
    'DNT': '1'  # Do Not Track request header
}

In [2]:
def get_book_details(url):
    """
    Scrape basic book details from Goodreads page
    """
    try:
        # Add a small random delay
        time.sleep(random.uniform(1, 3))
        
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        
        # Print response status and URL for debugging
        print(f"Response status: {response.status_code}")
        print(f"Final URL: {response.url}")
        
        # Keep the HTML so we can check whether Goodreads served a block page
        html_content = response.text
        if "This page isn't available right now" in html_content:
            print("Goodreads is blocking our request")
            return None
            
        soup = BeautifulSoup(html_content, 'lxml')
        
        # Initialize book details dictionary
        book_details = {}
        
        # Get title (try multiple possible selectors)
        title_element = (soup.find('h1', class_='Text__title1') or 
                        soup.find('h1', class_='BookPageTitleSection__title') or
                        soup.find('h1'))
        if title_element:
            book_details['title'] = title_element.text.strip()
        else:
            print("Could not find title element")
            
        # Get author (try multiple possible selectors)
        author_element = (soup.find('span', class_='ContributorLink__name') or
                         soup.find('a', class_='ContributorLink') or
                         soup.find('span', {'data-testid': 'name'}))
        if author_element:
            book_details['author'] = author_element.text.strip()
        else:
            print("Could not find author element")
        
        # Get publication details
        details_div = (soup.find('div', {'data-testid': 'bookDetails'}) or
                      soup.find('div', {'data-testid': 'publicationInfo'}))
        if details_div:
            details_text = details_div.get_text()
            
            # Extract publication date (try multiple patterns)
            pub_date_match = (re.search(r'First published (\w+ \d+,? \d{4})', details_text) or
                            re.search(r'Published\s+(\w+\s+\d+(?:st|nd|rd|th)?,?\s+\d{4})', details_text))
            book_details['publication_date'] = pub_date_match.group(1) if pub_date_match else None
            
            # Extract page count
            pages_match = re.search(r'(\d+)\s*pages?', details_text)
            book_details['page_count'] = int(pages_match.group(1)) if pages_match else None
        else:
            print("Could not find publication details")
        
        # Get genres (try multiple possible selectors)
        genre_elements = (soup.find_all('span', class_='BookPageMetadataSection__genreButton') or
                         soup.find_all('span', {'data-testid': 'genreLink'}))
        book_details['genres'] = [genre.text.strip() for genre in genre_elements] if genre_elements else []
        
        # Get overall rating (try multiple possible selectors)
        rating_div = (soup.find('div', {'class': 'RatingStatistics__rating'}) or
                     soup.find('div', {'data-testid': 'average'}))
        if rating_div:
            try:
                book_details['overall_rating'] = float(rating_div.text.strip())
            except ValueError:
                print("Could not convert rating to float")
                book_details['overall_rating'] = None
        else:
            print("Could not find rating element")
        
        # Get review count (try multiple possible selectors)
        reviews_element = (soup.find('div', {'data-testid': 'reviewsCount'}) or
                          soup.find('span', {'data-testid': 'reviewsCount'}))
        if reviews_element:
            reviews_text = reviews_element.text.strip()
            reviews_count = ''.join(filter(str.isdigit, reviews_text))
            book_details['overall_reviews'] = int(reviews_count) if reviews_count else 0
        else:
            print("Could not find reviews count")
            book_details['overall_reviews'] = 0
        
        # Check if we got any data
        if not any(book_details.values()):
            print("No data was successfully scraped")
            return None
            
        return book_details
    
    except requests.RequestException as e:
        print(f"Request error: {e}")
        return None
    except Exception as e:
        print(f"Error scraping book details: {e}")
        return None
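
The publication-date and page-count patterns used above can be checked offline, without hitting Goodreads. The sketch below pulls the same regular expressions into a standalone helper; the name `parse_publication_details` and the sample string are made up for illustration and are not real page content.

```python
import re
from typing import Optional, Tuple

# Same patterns as in get_book_details, compiled once
PUB_DATE_PATTERNS = [
    re.compile(r'First published (\w+ \d+,? \d{4})'),
    re.compile(r'Published\s+(\w+\s+\d+(?:st|nd|rd|th)?,?\s+\d{4})'),
]
PAGES_PATTERN = re.compile(r'(\d+)\s*pages?')


def parse_publication_details(details_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (publication_date, page_count); either may be None."""
    pub_date = None
    for pattern in PUB_DATE_PATTERNS:
        match = pattern.search(details_text)
        if match:
            pub_date = match.group(1)
            break

    pages_match = PAGES_PATTERN.search(details_text)
    page_count = int(pages_match.group(1)) if pages_match else None
    return pub_date, page_count


# Made-up sample text, only to exercise the patterns
sample = "123 pages, Paperback First published January 5, 2001"
print(parse_publication_details(sample))  # ('January 5, 2001', 123)
```

Testing the patterns this way separates two failure modes: the regex not matching, and the container element not being found at all (the "Could not find publication details" message).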

In [3]:
def get_reviews(url, num_reviews=25190):
    """
    Scrape individual reviews from Goodreads page
    """
    reviews_list = []
    page = 1
    
    try:
        while len(reviews_list) < num_reviews:
            # Construct the reviews page URL - reviews are now in a separate section
            reviews_url = f"{url}/reviews"
            if page > 1:
                reviews_url += f"?page={page}"
                
            # Add a small random delay
            time.sleep(random.uniform(1, 3))
            
            response = requests.get(reviews_url, headers=headers, timeout=30)
            response.raise_for_status()
            
            # Print debug information
            print(f"Fetching reviews page {page}")
            print(f"Response status: {response.status_code}")
            print(f"Final URL: {response.url}")
            
            # Check if we're being blocked
            if "This page isn't available right now" in response.text:
                print("Goodreads is blocking our request")
                break
                
            soup = BeautifulSoup(response.text, 'lxml')
            
            # Try different possible review container classes
            review_containers = (
                soup.find_all('div', class_='ReviewCard') or
                soup.find_all('article', class_='ReviewCard') or
                soup.find_all('div', class_='Review') or
                soup.find_all('div', class_='ReviewsList__review')
            )
            
            print(f"Found {len(review_containers)} reviews on page {page}")
            
            if not review_containers:
                break
                
            for container in review_containers:
                if len(reviews_list) >= num_reviews:
                    break
                    
                review = {}
                
                # Get rating (try multiple selectors and fallbacks)
                rating_element = None
                try:
                    # Broad class-based search (case-insensitive, looks for star/rating keywords)
                    rating_element = container.find(class_=re.compile(r'(?i)(star|rating|static)'))
                except Exception:
                    rating_element = None

                # Also check for explicit aria-label/title like '5 of 5' (Goodreads sometimes uses this)
                if not rating_element:
                    rating_element = container.find(attrs={'aria-label': re.compile(r'\d+\s+of\s+5')}) or \
                                     container.find(attrs={'title': re.compile(r'\d+\s+of\s+5')})

                review['rating'] = None
                if rating_element:
                    # Prefer aria-label or title, fall back to text
                    rating_text = rating_element.get('aria-label') or rating_element.get('title') or rating_element.text
                    rating_match = re.search(r"(\d+)", rating_text)
                    review['rating'] = int(rating_match.group(1)) if rating_match else None
                
                # Get review text (try multiple selectors)
                review_text = (
                    container.find('div', class_='Formatted') or
                    container.find('div', class_='ReviewText') or
                    container.find('span', class_='Formatted')
                )
                review['review_text'] = review_text.text.strip() if review_text else ''
                
                # Get review date and reading status (try multiple selectors)
                date_element = (
                    container.find('span', class_='Text__micro') or
                    container.find('span', class_='ReviewDate') or
                    container.find('div', class_='ReviewMetadata')
                )
                if date_element:
                    date_text = date_element.text.strip()
                    # Extract status if present (e.g., "currently reading" or "finished reading")
                    status_match = re.search(r'(currently reading|finished reading|started reading)', date_text, re.I)
                    review['reading_status'] = status_match.group(1).lower() if status_match else 'unknown'
                    # Clean date text of status
                    clean_date = re.sub(r'(currently reading|finished reading|started reading)', '', date_text, flags=re.I)
                    review['review_date'] = clean_date.strip()
                
                # Enhanced likes extraction with multiple strategies
                review['likes'] = 0
                
                # Strategy 1: Look for elements with like-related text
                like_patterns = [
                    r'(\d+)\s*likes?',
                    r'(\d+)\s*people liked this',
                    r'like this review\s*\((\d+)\)',
                    r'rated it helpful\s*\((\d+)\)'
                ]
                
                # Look through all text nodes for these patterns
                for text in container.stripped_strings:
                    for pattern in like_patterns:
                        match = re.search(pattern, text, re.I)
                        if match:
                            review['likes'] = int(match.group(1))
                            break
                    if review['likes'] > 0:
                        break
                
                # Strategy 2: Look for specific elements if no likes found yet
                if review['likes'] == 0:
                    # Try finding buttons or spans with like-related attributes
                    like_elements = container.find_all(['button', 'span', 'div'], 
                        attrs={'aria-label': re.compile(r'(\d+).*like', re.I)})
                    
                    for elem in like_elements:
                        aria_label = elem.get('aria-label', '')
                        match = re.search(r'(\d+)', aria_label)
                        if match:
                            review['likes'] = int(match.group(1))
                            break
                
                # Strategy 3: Look for elements with specific classes
                if review['likes'] == 0:
                    like_classes = [
                        'likeCount',
                        'like-count',
                        'likesCount',
                        'socialStatistics',
                        'social-statistics'
                    ]
                    for class_name in like_classes:
                        elem = container.find(class_=class_name)
                        if elem:
                            match = re.search(r'(\d+)', elem.text)
                            if match:
                                review['likes'] = int(match.group(1))
                                break
                
                # Only append if we got some content
                if review['review_text'] or review['rating']:
                    reviews_list.append(review)
            
            page += 1
            
        print(f"Successfully scraped {len(reviews_list)} reviews")
        return reviews_list
    
    except requests.RequestException as e:
        print(f"Request error while scraping reviews: {e}")
        return reviews_list
    except Exception as e:
        print(f"Error scraping reviews: {e}")
        return reviews_list
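
If the server ignores the `page` parameter, the loop above would keep collecting the same reviews until `num_reviews` is reached. One possible guard, sketched here with hypothetical helper names (`review_fingerprint`, `new_reviews_only`), is to fingerprint each review and stop paginating once a page contributes nothing new. The pages below are simulated, not scraped.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def review_fingerprint(review: Dict) -> str:
    """Hash the fields that identify a review (text, rating, date)."""
    key = "|".join(
        str(review.get(field, ""))
        for field in ("review_text", "rating", "review_date")
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_reviews_only(page_reviews: Iterable[Dict], seen: Set[str]) -> List[Dict]:
    """Return reviews not seen before and record their fingerprints in `seen`."""
    fresh = []
    for review in page_reviews:
        fingerprint = review_fingerprint(review)
        if fingerprint not in seen:
            seen.add(fingerprint)
            fresh.append(review)
    return fresh


# Simulated pages: page 2 is an exact repeat of page 1
page_1 = [
    {"review_text": "Loved it", "rating": 5},
    {"review_text": "It was fine", "rating": 3},
]
page_2 = list(page_1)

seen: Set[str] = set()
collected: List[Dict] = []
for page in (page_1, page_2):
    fresh = new_reviews_only(page, seen)
    if not fresh:
        # Nothing new on this page: stop instead of looping on repeats
        break
    collected.extend(fresh)

print(len(collected))  # 2
```

Inside `get_reviews`, the same check could replace the direct `reviews_list.append(review)` step, at the cost of keeping a set of fingerprints in memory.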

In [5]:
# URLs for books by Brandon Sanderson
urls = ["https://www.goodreads.com/book/show/68427.Elantris",
"https://www.goodreads.com/book/show/1268479.Warbreaker",
"https://www.goodreads.com/book/show/28862254-white-sand-volume-1",
"https://www.goodreads.com/book/show/33551363-white-sand-volume-2",
"https://www.goodreads.com/book/show/39298848-white-sand-volume-3",
"https://www.goodreads.com/book/show/60696519-white-sand-omnibus",
"https://www.goodreads.com/book/show/28595941-arcanum-unbounded",
"https://www.goodreads.com/book/show/13578175-the-emperor-s-soul",
"https://www.goodreads.com/book/show/68428.Mistborn",
"https://www.goodreads.com/book/show/68429.The_Well_of_Ascension",
"https://www.goodreads.com/book/show/2767793-the-hero-of-ages",
"https://www.goodreads.com/book/show/28698036-secret-history",
"https://www.goodreads.com/book/show/10803121-the-alloy-of-law",
"https://www.goodreads.com/book/show/16065004-shadows-of-self",
"https://www.goodreads.com/book/show/18739426-the-bands-of-mourning",
"https://www.goodreads.com/book/show/23947089-the-lost-metal",
"https://www.goodreads.com/book/show/7235533-the-way-of-kings",
"https://www.goodreads.com/book/show/17332218-words-of-radiance",
"https://www.goodreads.com/book/show/34703445-edgedancer",
"https://www.goodreads.com/book/show/34002132-oathbringer",
"https://www.goodreads.com/book/show/54511226-dawnshard",
"https://www.goodreads.com/book/show/49021976-rhythm-of-war",
"https://www.goodreads.com/book/show/203578847-wind-and-truth",
"https://www.goodreads.com/book/show/17182126-steelheart",
"https://www.goodreads.com/book/show/18966322-mitosis",
"https://www.goodreads.com/book/show/15704459-firefight",
"https://www.goodreads.com/book/show/15704486-calamity",
"https://www.goodreads.com/book/show/58419574-lux",
"https://www.goodreads.com/book/show/13552643-defending-elysium",
"https://www.goodreads.com/book/show/36642458-skyward",
"https://www.goodreads.com/book/show/42769202-starsight",
"https://www.goodreads.com/book/show/57903876-sunreach",
"https://www.goodreads.com/book/show/57903879-redawn",
"https://www.goodreads.com/book/show/58465495-evershore",
"https://www.goodreads.com/book/show/57571215-cytonic",
"https://www.goodreads.com/book/show/43606308-defiant",
"https://www.goodreads.com/book/show/60531406-tress-of-the-emerald-sea",
"https://www.goodreads.com/book/show/60531410-the-frugal-wizard-s-handbook-for-surviving-medieval-england",
"https://www.goodreads.com/book/show/60531416-yumi-and-the-nightmare-painter",
"https://www.goodreads.com/book/show/60531420-the-sunlit-man",
"https://www.goodreads.com/book/show/210300489-isles-of-the-emberdark",
"https://www.goodreads.com/book/show/49798827-dark-one",
"https://www.goodreads.com/book/show/60373696-dark-one",
"https://www.goodreads.com/book/show/54615879-the-original",
"https://www.goodreads.com/series/45320-alcatraz-vs-the-evil-librarians",  # NOTE: series page, not a /book/show/ page like the others
"https://www.goodreads.com/book/show/3485562-alcatraz-versus-the-scrivener-s-bones",
"https://www.goodreads.com/book/show/6366110-alcatraz-versus-the-knights-of-crystallia",
"https://www.goodreads.com/book/show/7740659-alcatraz-versus-the-shattered-lens",
"https://www.goodreads.com/book/show/26114421-the-dark-talent",
"https://www.goodreads.com/book/show/59808314-bastille-vs-the-evil-librarians",
"https://www.goodreads.com/book/show/13452375-legion",
"https://www.goodreads.com/book/show/20886354-skin-deep",
"https://www.goodreads.com/book/show/37640636-lies-of-the-beholder",
"https://www.goodreads.com/book/show/39332065-legion",
"https://www.goodreads.com/book/show/8562526-firstborn",
"https://www.goodreads.com/book/show/25188109-perfect-state",
"https://www.goodreads.com/book/show/31176804-snapshot"

]
def enrich_reviews(df):
    """Cleans and adds useful columns to the reviews DataFrame."""
    if 'review_text' in df.columns:
        df['review_length'] = df['review_text'].apply(lambda x: len(str(x)))
    if 'likes' in df.columns:
        df['likes'] = pd.to_numeric(df['likes'], errors='coerce').fillna(0).astype(int)
    return df
    
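# Scrape details and reviews for each URL in the list
for url in urls:
    try:
        print(f"\n========== Getting details for: {url} ==========")
        book_details = get_book_details(url)
        if book_details is None:
            # get_book_details returns None on request errors or blocked pages
            print(f"No details scraped for {url}. Skipping...")
            continue
        print(pd.Series(book_details))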

        # Fetch reviews for each book
        print(f"\nFetching reviews from: {url}")
        reviews = get_reviews(url, num_reviews=1000)
        reviews_df = pd.DataFrame(reviews)

        # Enrich and save per-book data
        reviews_df = enrich_reviews(reviews_df)
        book_df = pd.DataFrame([book_details])

        # Create filenames dynamically
        safe_title = re.sub(r'[^a-zA-Z0-9_-]', '_', book_details.get('title', 'unknown'))
        book_df.to_csv(f'{safe_title}_details.csv', index=False)
        reviews_df.to_csv(f'{safe_title}_reviews.csv', index=False)

        print(f"\nSaved {safe_title} data successfully.")

    except Exception as e:
        print(f"Error processing {url}: {e}")


'''
# Use this if you want to test a single book
url = "https://www.goodreads.com/book/show/68427.Elantris"

# Get book details
book_details = get_book_details(url)
print("Book Details:")
print(pd.Series(book_details))

# Get reviews (up to 1000) to test likes extraction
print("Fetching reviews from:", url)
reviews = get_reviews(url, num_reviews=1000)

# Convert reviews to DataFrame and show stats
reviews_df = pd.DataFrame(reviews)
print("\nReviews Summary:")
print(f"Total reviews fetched: {len(reviews_df)}")
print("\nColumns present:", list(reviews_df.columns))
print("\nLikes information:")
print("\nLikes distribution:")
print(reviews_df['likes'].value_counts().sort_index())
print("\nLikes stats:")
print(f"Mean likes: {reviews_df['likes'].mean():.1f}")
print(f"Max likes: {reviews_df['likes'].max()}")
print("\nReviews with their likes:")
print(reviews_df[['rating', 'likes', 'review_text']].to_string())
'''


Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris
Could not find publication details
title                                                       Elantris
author                                             Brandon Sanderson
genres             [Fantasy, Fiction, Audiobook, High Fantasy, Ep...
overall_rating                                                  4.17
overall_reviews                                                25198
dtype: object

Fetching reviews from: https://www.goodreads.com/book/show/68427.Elantris
Fetching reviews page 1
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews
Found 30 reviews on page 1
Fetching reviews page 2
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=2
Found 30 reviews on page 2
Fetching reviews page 3
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=3
Found 30 reviews on page 3
Fetchin

In [7]:
# ===== Prerequisites: these functions must already be defined =====
# - get_book_details(url)
# - get_reviews(url, num_reviews=1000)
# - enrich_reviews(df)

# ===== Loop through all URLs =====
for url in urls:
    try:
        print(f"\n{'='*80}")
        print(f"Processing book: {url}")

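        # --- Get book details ---
        book_details = get_book_details(url)
        if book_details is None:
            # get_book_details returns None on request errors or blocked pages
            print(f"No details scraped for {url}. Skipping...")
            continue
        book_title = book_details.get("title", "unknown_title").replace(" ", "_").replace("/", "_")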

        # --- Get reviews ---
        reviews = get_reviews(url, num_reviews=1000)
        reviews_df = pd.DataFrame(reviews)

        # Skip if no reviews
        if reviews_df.empty:
            print(f"No reviews found for {book_title}. Skipping...")
            continue

        # --- Enrich and analyze ---
        reviews_df = enrich_reviews(reviews_df)

        # --- Save enhanced dataset ---
        file_name = f"{book_title.lower()}_reviews_analysis.csv"
        reviews_df.to_csv(file_name, index=False, encoding="utf-8")
        print(f"\n✅ Saved enhanced analysis to '{file_name}'")

        # --- Pause between requests (to be polite to Goodreads) ---
        time.sleep(2)

    except Exception as e:
        print(f"❌ Error processing {url}: {e}")
        continue



Processing book: https://www.goodreads.com/book/show/68427.Elantris
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris
Could not find publication details
Fetching reviews page 1
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews
Found 30 reviews on page 1
Fetching reviews page 2
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=2
Found 30 reviews on page 2
Fetching reviews page 3
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=3
Found 30 reviews on page 3
Fetching reviews page 4
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=4
Found 30 reviews on page 4
Fetching reviews page 5
Response status: 200
Final URL: https://www.goodreads.com/book/show/68427.Elantris/reviews?page=5
Found 30 reviews on page 5
Fetching reviews page 6
Response status: 200
Final URL: https://www.goodrea

KeyboardInterrupt: 
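
The two loops above build file names in different ways (`re.sub` in one, chained `replace` calls in the other), and both fail if `get_book_details` returns `None`. A shared helper is one way to keep them consistent. This is a minimal sketch; `safe_filename_stem` is a hypothetical name, not part of the notebook.

```python
import re
from typing import Optional


def safe_filename_stem(book_details: Optional[dict],
                       fallback: str = "unknown_title") -> str:
    """Build a lowercase, filesystem-friendly stem from a book's title.

    Falls back to `fallback` when details are missing or the title is empty.
    """
    title = (book_details or {}).get("title") or fallback
    # Collapse every run of characters outside [A-Za-z0-9_-] into one underscore
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_").lower()
    return stem or fallback


print(safe_filename_stem({"title": "The Emperor’s Soul"}))        # the_emperor_s_soul
print(safe_filename_stem({"title": "Mistborn: Secret History"}))  # mistborn_secret_history
print(safe_filename_stem(None))                                   # unknown_title
```

A file name could then be built as `f"{safe_filename_stem(book_details)}_reviews.csv"`. Note that two editions with the same title would map to the same stem, so including the Goodreads ID from the URL may be worth considering.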

In [6]:
# Add analytics features to the reviews DataFrame
import re
from datetime import datetime

# Function to standardize dates and add analytics
def enrich_reviews(df):
    # Standardize dates (handle various formats)
    def parse_date(date_str):
        if not isinstance(date_str, str):
            return None
        # Remove reading status if present
        date_str = re.sub(r'(currently reading|finished reading|started reading)', '', date_str, flags=re.I)
        date_str = date_str.strip()

        # Handle common Goodreads date formats
        for fmt in ['%B %d, %Y', '%b %d, %Y', '%Y-%m-%d', '%b %Y']:
            try:
                return datetime.strptime(date_str, fmt).date()
            except ValueError:
                continue
        return None
    
    # Parse dates
    if 'review_date' in df.columns:
        df['parsed_date'] = df['review_date'].apply(parse_date)
    
    # Add review length metrics
    df['review_length'] = df['review_text'].str.len()
    df['word_count'] = df['review_text'].str.split().str.len()
    
    # Print analytics summary
    print("\nAnalytics Summary:")
    print("=================")
    
    print("\nReview Length Stats:")
    print(f"Average word count: {df['word_count'].mean():.1f} words")
    print(f"Median word count: {df['word_count'].median():.1f} words")
    print(f"Longest review: {df['word_count'].max()} words")
    print(f"Shortest review: {df['word_count'].min()} words")
    
    print("\nRatings Distribution:")
    rating_dist = df['rating'].value_counts().sort_index()
    print(rating_dist)
    
    if 'reading_status' in df.columns:
        print("\nReading Status Distribution:")
        print(df['reading_status'].value_counts())
    
    if 'parsed_date' in df.columns and df['parsed_date'].notna().any():
        print("\nReview Timeline:")
        print(f"Earliest review: {df['parsed_date'].min()}")
        print(f"Latest review: {df['parsed_date'].max()}")
    
    print("\nTop 3 Longest Reviews:")
    longest = df.nlargest(3, 'word_count')[['rating', 'word_count', 'review_text']]
    longest['review_text'] = longest['review_text'].str[:100] + '...'
    print(longest.to_string())
    
    return df

# Enrich the reviews with analytics
reviews_df = enrich_reviews(reviews_df)

# Save enhanced dataset
reviews_df.to_csv('elantris_reviews_analysis.csv', index=False)
print("\nSaved enhanced analysis to 'elantris_reviews_analysis.csv'")


Analytics Summary:

Review Length Stats:
Average word count: 214.6 words
Median word count: 128.0 words
Longest review: 859 words
Shortest review: 3 words

Ratings Distribution:
rating
3.0     33
4.0    634
5.0    300
Name: count, dtype: int64

Top 3 Longest Reviews:
    rating  word_count                                                                                              review_text
27     4.0         859  4.5 of 5 stars at The BiblioSanctum https://bibliosanctum.com/2017/03/10/...I’ll be the first to adm...
57     4.0         859  4.5 of 5 stars at The BiblioSanctum https://bibliosanctum.com/2017/03/10/...I’ll be the first to adm...
87     4.0         859  4.5 of 5 stars at The BiblioSanctum https://bibliosanctum.com/2017/03/10/...I’ll be the first to adm...

Saved enhanced analysis to 'elantris_reviews_analysis.csv'
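
In the output above, the three longest reviews are the same text at rows 27, 57, and 87, exactly 30 rows apart, and every page reported 30 reviews. That pattern suggests (but does not prove) that each page request returned the same reviews. A quick duplicate count can confirm or rule this out before any analysis. The helper name `duplicate_report` is hypothetical and the sample data is synthetic.

```python
from collections import Counter
from typing import Dict, List


def duplicate_report(reviews: List[Dict]) -> Dict[str, int]:
    """Count total reviews, distinct review texts, and texts seen more than once."""
    counts = Counter(review.get("review_text", "") for review in reviews)
    repeated = sum(1 for n in counts.values() if n > 1)
    return {
        "total": len(reviews),
        "unique_texts": len(counts),
        "repeated_texts": repeated,
    }


# Synthetic data: the same two reviews repeated three times
sample_reviews = [{"review_text": "A"}, {"review_text": "B"}] * 3
print(duplicate_report(sample_reviews))
# {'total': 6, 'unique_texts': 2, 'repeated_texts': 2}
```

With the DataFrame from this notebook, `reviews_df.to_dict("records")` yields the list of dicts this helper expects, and `reviews_df.drop_duplicates(subset=["review_text"])` would drop the repeats directly in pandas.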


In [None]:
# Save the results to CSV files
book_df = pd.DataFrame([book_details])
book_df.to_csv('elantris_details.csv', index=False)
reviews_df.to_csv('elantris_reviews.csv', index=False)

print("\nData has been saved to 'elantris_details.csv' and 'elantris_reviews.csv'")