# Deep Dive Analysis: Discovering Your Film Habits

This notebook demonstrates how to use the `analytics` layer of `scrapxd` to extract deep insights into a Letterboxd user's habits.

We will analyze a user's diary to find their most-watched directors and genres, the genres they rate highest, and the periods when they watch films most often.

### Step 1: Installation and Imports

For this notebook, we'll need `scrapxd` itself plus the plotting libraries `matplotlib` and `seaborn`.

In [None]:
%pip install scrapxd matplotlib seaborn

import matplotlib.pyplot as plt
import seaborn as sns
from scrapxd import Scrapxd

# Settings for our plots
sns.set_theme(style="whitegrid")

### Step 2: Fetch the Data

Let's get a user's diary. For a richer analysis, choose a user with a large number of logged films. We'll use `dave` (one of Letterboxd's founders) as an example, but you can replace it with your own username.

In [None]:
client = Scrapxd()
user = client.get_user("dave")

# Accessing the diary triggers scraping of every diary page for this user.
# This can take a few minutes for users with many logged films.
diary = user.diary

print(f"Analyzing {diary.number_of_entries} diary entries from {user.display_name}.")

### Step 3: Top Analysis - Who and what do you watch the most?

Let's use the `get_top_*` methods to find the most frequent directors, actors, and genres in the diary. The cells below plot directors and genres; actors follow the same pattern.

In [None]:
# Get the top 10 most-watched directors
top_directors = diary.get_top_directors(top_n=10)

# Prepare data for plotting: split the (director, count) pairs
# into two parallel tuples, one of names and one of counts
directors, counts = zip(*top_directors)

# Create the bar chart
plt.figure(figsize=(12, 6))
sns.barplot(x=list(counts), y=list(directors), hue=list(directors), palette="viridis", legend=False)
plt.title(f'Top 10 Most Watched Directors for {user.username}', fontsize=16)
plt.xlabel('Number of Films Watched')
plt.ylabel('Director')
plt.show()
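
One caveat with the `zip(*pairs)` unpacking used above: it raises a `ValueError` when the list is empty, for example for a user with no diary entries. A minimal sketch of a guard follows. The helper name `split_pairs` and the sample data are hypothetical and not part of `scrapxd`.

```python
def split_pairs(pairs):
    """Split [(label, value), ...] into two lists.

    Returns two empty lists when there is no data, instead of
    raising ValueError the way `a, b = zip(*[])` would.
    """
    if not pairs:
        return [], []
    labels, values = zip(*pairs)
    return list(labels), list(values)


# Hypothetical sample shaped like the (name, count) pairs unpacked above
sample = [("Director A", 12), ("Director B", 9), ("Director C", 7)]
labels, values = split_pairs(sample)
print(labels)
print(values)

# An empty result no longer raises
print(split_pairs([]))
```

With a helper like this, the plotting code can check `if labels:` before drawing an empty chart.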

In [None]:
# Do the same for the top 10 most-watched genres
top_genres = diary.get_top_genres(top_n=10)
genres, counts = zip(*top_genres)

plt.figure(figsize=(12, 6))
sns.barplot(x=list(counts), y=list(genres), hue=list(genres), palette="plasma", legend=False)
plt.title(f'Top 10 Most Watched Genres for {user.username}', fontsize=16)
plt.xlabel('Number of Films Watched')
plt.ylabel('Genre')
plt.show()
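
To make the idea of a "top N" ranking concrete, here is a minimal sketch of how such a frequency count can be computed with `collections.Counter`. It is not `scrapxd`'s implementation: the entry structure (dictionaries with a `genres` list) and the function name `top_genres` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical diary entries; real scrapxd entries may be structured differently
entries = [
    {"title": "Film A", "genres": ["Drama", "Thriller"]},
    {"title": "Film B", "genres": ["Drama"]},
    {"title": "Film C", "genres": ["Comedy", "Drama"]},
    {"title": "Film D", "genres": ["Thriller"]},
]


def top_genres(entries, top_n=10):
    """Return the top_n (genre, count) pairs, most frequent first."""
    # A film with several genres contributes one count to each of them
    counts = Counter(genre for entry in entries for genre in entry["genres"])
    return counts.most_common(top_n)


print(top_genres(entries, top_n=2))
```

Note that a film tagged with several genres counts once toward each, so the genre totals can add up to more than the number of films.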

### Step 4: Rating Analysis - What do you like the most?

We can go beyond frequency and analyze your ratings. Which genres do you give the highest average ratings to?

In [None]:
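# Get the average rating per genre, limited to 10 genres
ratings_by_genre = diary.get_rating_by_genre(top_n=10)

# Split the (genre, average rating) pairs for plotting
genres, avg_ratings = zip(*ratings_by_genre)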

plt.figure(figsize=(12, 6))
ax = sns.barplot(x=[round(r, 2) for r in avg_ratings], y=list(genres), hue=list(genres), palette="magma", legend=False)
ax.set_xlim(0, 5)
plt.title(f'Top 10 Highest-Rated Genres for {user.username}', fontsize=16)
plt.xlabel('Average Rating (0-5)')
plt.ylabel('Genre')
plt.show()
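
For intuition, an average rating per genre can be sketched in plain Python as below. This is an illustration under stated assumptions, not `scrapxd`'s actual code: the entry structure, the function name `average_rating_by_genre`, and the `min_films` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical diary entries; rating is None when a film was logged unrated
entries = [
    {"genres": ["Drama", "Thriller"], "rating": 4.5},
    {"genres": ["Drama"], "rating": 3.5},
    {"genres": ["Comedy"], "rating": 3.0},
    {"genres": ["Thriller"], "rating": None},
]


def average_rating_by_genre(entries, top_n=10, min_films=1):
    """Return up to top_n (genre, mean rating) pairs, highest mean first."""
    ratings = defaultdict(list)
    for entry in entries:
        if entry["rating"] is None:
            continue  # unrated films do not affect the averages
        for genre in entry["genres"]:
            ratings[genre].append(entry["rating"])

    averages = [
        (genre, sum(values) / len(values))
        for genre, values in ratings.items()
        if len(values) >= min_films  # skip genres with too few rated films
    ]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:top_n]


print(average_rating_by_genre(entries))
```

A genre with a single highly rated film can top a ranking like this, which is why the sketch includes a `min_films` threshold. Raising it trades coverage for more stable averages.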

### Step 5: Temporal Analysis - When do you watch films?

Let's find out in which year, month, and day of the week the user was most active.

In [None]:
# Each property unpacks into a (value, count) pair
year, count_y = diary.most_watched_year
month, count_m = diary.most_watched_month
weekday, count_w = diary.most_frequent_watch_day

print(f"Most active year: {year} (with {count_y} films)")
print(f"Most active month: {month} (with {count_m} films, across all years)")
print(f"Most frequent weekday: {weekday} (with {count_w} films)")
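
The same kind of temporal summary can be sketched with the standard library alone. The watch dates and the helper name `most_common_by` below are hypothetical, and this is not how `scrapxd` necessarily computes its properties. It only illustrates the idea of counting diary dates by year, by month across all years, and by weekday.

```python
import calendar
from collections import Counter
from datetime import date

# Hypothetical watch dates taken from a diary
watch_dates = [
    date(2023, 1, 7),
    date(2023, 1, 14),
    date(2023, 6, 3),
    date(2024, 1, 6),
    date(2024, 3, 12),
]


def most_common_by(dates, key):
    """Return the (value, count) pair for the most frequent key(date)."""
    return Counter(key(d) for d in dates).most_common(1)[0]


most_year = most_common_by(watch_dates, lambda d: d.year)
# Months are pooled across years, so January 2023 and January 2024 count together
most_month = most_common_by(watch_dates, lambda d: calendar.month_name[d.month])
# date.weekday() returns 0 for Monday, matching calendar.day_name indexing
most_weekday = most_common_by(watch_dates, lambda d: calendar.day_name[d.weekday()])

print(f"Most active year: {most_year[0]} (with {most_year[1]} films)")
print(f"Most active month: {most_month[0]} (with {most_month[1]} films, across all years)")
print(f"Most frequent weekday: {most_weekday[0]} (with {most_weekday[1]} films)")
```

Pooling months across years answers "which month of the calendar is busiest"; to find the single busiest month in the diary's history, the key would need to be the `(year, month)` pair instead.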

### Conclusion

With the `analytics` layer of `scrapxd`, a few method calls turn a scraped diary into a profile of a user's tastes and habits: their most-watched directors and genres, their highest-rated genres, and their most active viewing periods. These results are a starting point for creative data visualizations and portfolio-worthy analysis projects.