# Data science over a small movie dataset -- Part 3

<p style="font-size: 20px; font-weight: bold;">Relationships graphs</p>

Anton Antonov   
November 2025  

---

## Introduction

This notebook shows how to transform a movie dataset into a form more suitable for making a movie recommender system, and how to use that recommender to build relationship graphs between movies.

The movie data was downloaded from here: ["IMDB Movie Ratings Dataset"](https://www.kaggle.com/datasets/thedevastator/imdb-movie-ratings-dataset). That dataset was chosen because:

- It has the right size for demonstrating data wrangling techniques
    - ≈5000 rows and 15 columns (each row corresponding to a movie)
- It is "real life" data with the expected skewness of variable distributions
- It is diverse enough over movie years and genres
- There are no missing values

---

## Setup

Load the packages used in this notebook:

In [None]:
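use Math::SparseMatrix;
use ML::SparseMatrixRecommender;
use ML::SparseMatrixRecommender::Utilities;
use Statistics::OutlierIdentifiers;
use Graph;             # Graph.new and connected-components, used in the "Graphs" section
use JavaScript::D3;    # js-d3-list-line-plot and js-d3-graph-plot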

In [None]:
#% javascript
require.config({
    paths: {
        d3: 'https://d3js.org/d3.v7.min'
    }
});

require(['d3'], function(d3) {
    console.log(d3);
});

In [None]:
#% js
js-d3-list-line-plot(10.rand xx 40, background => 'none', stroke-width => 2)

In [None]:
my $title-color = 'Silver';
my $stroke-color = 'SlateGray';
my $tooltip-color = 'LightBlue';
my $tooltip-background-color = 'none';
my $tick-labels-font-size = 10;
my $tick-labels-color = 'Silver';
my $tick-labels-font-family = 'Helvetica';
my $background = '#1F1F1F';
my $color-scheme = 'schemeTableau10';
my $color-palette = 'Inferno';
my $edge-thickness = 3;
my $vertex-size = 6;
my $mmd-theme = q:to/END/;
%%{
  init: {
    'theme': 'forest',
    'themeVariables': {
      'lineColor': 'Ivory'
    }
  }
}%%
END
my %force = collision => {iterations => 0, radius => 10}, link => {distance => 180};
my %force2 = charge => {strength => -30, iterations => 4}, collision => {radius => 50, iterations => 4}, link => {distance => 30};

my %opts = :$background, :$title-color, :$edge-thickness, :$vertex-size;

---

## Ingest data

---

## Recommender system

One way to investigate (browse) the data is to make a recommender system and use it to explore different aspects of the movie dataset, such as movie profiles and the distribution of nearest-neighbor similarities.

### Make the recommender

Here we make a Sparse Matrix Recommender (SMR):

In [None]:
my $smrObj = 
    ML::SparseMatrixRecommender.new
    .create-from-long-form(
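        # Combine the two long-form datasets without modifying either array in place,
        # so that re-running this cell does not append the genre rows a second time
        [|@dsMovieDataLongForm2, |@dsMovieGenreLongForm], 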
        item-column-name => 'Item', 
        tag-type-column-name => 'TagType',
        tag-column-name => 'Tag',
        :add-tag-types-to-column-names)        
    .apply-term-weight-functions('IDF', 'None', 'Cosine')

Here are the dimensions (rows and columns) of the recommender's sub-matrices:

In [None]:
.say for $smrObj.take-matrices.deepmap(*.dimensions).sort(*.key)

Note that the sub-matrices of "reviews_count", "score", and "votes_count" have a small number of columns, corresponding to the number of probabilities specified when categorizing the numeric values into intervals.
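
The relation between the number of probabilities and the number of columns can be illustrated with a minimal sketch in plain Raku. The data, the `quantile` helper, and the interval convention below are assumptions made for illustration only; they are not the code used to prepare the dataset for the recommender.

```raku
# Hypothetical helper: nearest-rank empirical quantile of already sorted values
sub quantile(@sorted, $p) {
    my $i = max(0, ceiling($p * @sorted.elems) - 1);
    @sorted[$i]
}

# Hypothetical numeric column and probabilities
my @values = 1, 2, 2, 3, 5, 8, 13, 21, 34, 55;
my @probs  = 0.25, 0.5, 0.75;

# Quantile values that serve as interval breakpoints
my @sorted = @values.sort;
my @breaks = @probs.map({ quantile(@sorted, $_) });

# Interval index of a value = number of breakpoints strictly below it;
# each distinct index would become one column (tag) of the sub-matrix
my %bins = @values.classify({ my $v = $_; @breaks.grep(* < $v).elems });

say @breaks;
say %bins.keys.sort;
```

With three probabilities the values fall into at most four intervals, so the corresponding sub-matrix would have at most four columns regardless of how many distinct values the variable has.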

### Recommendations

Recommendations by history (here, the movies with indexes 2125 and 2308):

In [None]:
sink $smrObj
.recommend(<2125 2308>, 12, :!normalize, :!remove-history)
.join-across(select-columns(@dsMovieData, @field-names), 'index')
.echo-value(as => {to-pretty-table($_, align => 'l', field-names => ['score', |@field-names])})

---

## Graphs

Using the recommender, we make a nearest-neighbors graph for the movies released from 2014 to 2016.

In [None]:
# Collect the IDs of all movies whose profile contains one of the year tags 2014, 2015, 2016
my @focusMovieIDs = (2014...2016).map({ $smrObj.recommend-by-profile('year:' ~ $_, Inf).take-value».key }).flat;
@focusMovieIDs.elems

Change the tag type weights to reflect the view that:
- Common actors or directors mean the movies are similar
    - Or that they are seen by the "same" viewers
- Common genres are important
    - But not as much as directors or actors
- Release years are not important


In [None]:
$smrObj.apply-tag-type-weights({ director => 1, actor => 1, genre => 0.5}, default => 0.2)
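
To make the effect of the weights concrete, here is a minimal sketch in plain Raku that scores two movies by a weighted count of shared tags. The movie profiles and the `weighted-overlap` function are hypothetical, and the sketch only mimics the idea of scaling each tag type's contribution; it is not how `ML::SparseMatrixRecommender` computes its scores.

```raku
# Same weights as in the cell above; any other tag type gets the default weight
my %weights = director => 1, actor => 1, genre => 0.5;
my $default-weight = 0.2;

# Hypothetical movie profiles: tag type => set of tags
my %movie-a = director => set('D1'), actor => set('A1', 'A2'), genre => set('Drama', 'Crime'), year => set('2015');
my %movie-b = director => set('D1'), actor => set('A2', 'A3'), genre => set('Drama'),          year => set('2016');
my %movie-c = director => set('D2'), actor => set('A4'),       genre => set('Drama', 'Crime'), year => set('2015');

# Weighted number of shared tags, summed over the tag types of the first profile
sub weighted-overlap(%x, %y) {
    [+] %x.keys.map(-> $type {
        my $w = %weights{$type} // $default-weight;
        $w * (%x{$type} ∩ (%y{$type} // set())).elems
    })
}

say weighted-overlap(%movie-a, %movie-b);   # 2.5
say weighted-overlap(%movie-a, %movie-c);   # 1.2
```

Under these weights the movie that shares a director and an actor scores higher than the one that shares both genres and the release year, which is the ordering the weights above are meant to produce.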

For each movie, find its nearest neighbors (the top 6 recommendations in the code below) and make the corresponding graph edges:

In [None]:
my @edges = @focusMovieIDs.map({ $_ X=> $smrObj.recommend($_, 6).take-value».key }).flat;
@edges.elems

In [None]:
# Keep only the edges whose target is also one of the focus movies
my @edges2 = @edges.grep({ $_.value ∈ @focusMovieIDs });
@edges2.elems

In [None]:
# Map each movie index to a readable vertex label: "index title (year)"
sink my %indexToID = @dsMovieData.map({ $_<index> => "{$_<index>} {$_<movie_title>.trim} ($_<title_year>)" });

In [None]:
# Build the graph using the readable labels as vertex names
my $g = Graph.new(@edges2.map({ %indexToID{$_.key} => %indexToID{$_.value} }));

In [None]:
# Connected components, largest first
my @comps = $g.connected-components.sort(-*.elems);
deduce-type(@comps)
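
For readers unfamiliar with connected components, the following plain-Raku sketch shows what that computation amounts to on a toy edge list, using a breadth-first search. The edge list and variable names are hypothetical, and this is not the implementation used by the `Graph` package.

```raku
# Hypothetical edge list with the same Pair shape (source => target) as @edges2
my @toy-edges = 'a' => 'b', 'b' => 'c', 'd' => 'e', 'f' => 'f';

# Undirected adjacency map: vertex => list of neighbors
my %adj;
for @toy-edges -> $e {
    %adj{$e.key}.push: $e.value;
    %adj{$e.value}.push: $e.key;
}

# Breadth-first search from every vertex that has not been visited yet
my %seen;
my @components;
for %adj.keys.sort -> $start {
    next if %seen{$start};
    %seen{$start} = True;
    my @queue = $start;
    my @component;
    while @queue {
        my $v = @queue.shift;
        @component.push: $v;
        for %adj{$v}.list -> $w {
            next if %seen{$w};
            %seen{$w} = True;
            @queue.push: $w;
        }
    }
    @components.push: [@component.sort];
}

# Largest component first, as in the cell above
my @sorted-components = @components.sort(-*.elems);
.say for @sorted-components;
```

Each component is a maximal set of vertices that can reach one another when edge directions are ignored, so movies in the same component are linked by a chain of nearest-neighbor relations.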

In [None]:
# Pick the largest component that has a vertex label containing "star" (case-insensitive)
my @focus-component = |@comps.grep({ $_.join(' ') ~~ /:i star / }).head;
@focus-component.elems

In [None]:
#%js
$g.edges(:dataset) 
==> js-d3-graph-plot(
    vertex-label-color => 'none',
    :$background, 
    title-color => 'gray',
    width => 1200, 
    edge-thickness => 1,
    vertex-size => 2,
    force => {charge => {strength => -20, iterations => 2}, collision => {radius => 1, iterations => 1}, link => {distance => 0}}
)

---

## References

### Articles, blog posts

### Packages