# Interview assignment - Data Engineering & Architecture
## Scenario
A fictional analytics company wants to examine the relationships between films, actors, 
directors and ratings to gain insights into the film industry. The goal is to identify patterns 
and correlations that could influence a movie's success. Your task as a Data Engineer & 
Architect is to collect the data, combine different data sources and transform the data into a 
structure suitable for analysis. Finally, you will design a query that returns data in a format 
that the company's data scientists can work with.

## Task
Your job is to combine and transform the data in a way that makes it easy to perform analysis 
on the relationships between movie ratings, actor and director success, and other 
measurable success factors. Data scientists want to be able to run queries that return a 
structure for a given movie or person with a side-by-side comparison of movie ratings, 
participation in successful projects, and other relevant metrics.
To accomplish this, you should follow these steps:
1. Programmatically download all the data.
2. Ingest the data into a database or query engine of your choice (conditions and guidelines below).
3. Combine and transform the data into a structure suitable for efficient queries.
4. Design a query that returns data in the desired format for a given movie or person and specific metrics.

**An important condition is that you can use any programming language and any open-source
or cloud tool.**

## Trade-offs
#### Simplicity: 
_Focus on clear and efficient solutions. You should be able to present your approach in a simple and comprehensible way._

I chose a Jupyter notebook and a dataframe package (polars/pandas) for simplicity and presentability. This setup does not scale to big data and only visualizes the initial data. It suffices for prototyping, but not in the long run.

#### Performance: 
_Minimize processing time through efficient data manipulation and analysis._

Polars is implemented in Rust and is highly efficient when handling small to medium-sized dataframes. Its high-level functions make analysis and manipulation easy. The main bottleneck is that the dataframes must be held on the local system, which will no longer work once the data outgrows a single machine.

#### Scalability: 
_Take into account that the solution must be scalable to deal with the continuous increase in the amount of data. Describe your approaches in the presentation._

To scale to more data, I would suggest implementing the data processing in Apache Beam. The pipeline can be run on any supported runner (such as Google Cloud Dataflow or an Apache Spark cluster), ingest the data directly from the Azure storage, transform it and write the result to Google BigQuery. BigQuery lets data scientists access the data from anywhere, analyse and visualize it, apply machine learning via AutoML, and scale with increasing data. Another option would be Azure Data Factory, if staying within the Azure ecosystem is desired.

#### Comprehensibility:
_Make sure your code is well structured and commented. Other colleagues should also be able to easily and directly adopt the code later and develop it further._

This Jupyter notebook contains all the documentation necessary to understand and reproduce the results. To save time, no unit tests were written. In a professional environment, I would strongly recommend writing tests, but as this is an interview, I will focus on the presentation.
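
As an illustration of what such a test could look like, here is a minimal sketch using the standard-library `unittest` module. `parse_year` is a hypothetical helper (it does not exist in this notebook), and the sketch assumes that missing values are encoded as `\N`, as in the loading step further down.

```python
import unittest


def parse_year(value: str):
    """Convert a raw TSV year field to an int; the null marker becomes None."""
    if value == r"\N":  # assumed null marker of the TSV files
        return None
    return int(value)  # raises ValueError for anything that is not a number


class ParseYearTest(unittest.TestCase):
    def test_valid_year(self):
        self.assertEqual(parse_year("1894"), 1894)

    def test_null_marker(self):
        self.assertIsNone(parse_year(r"\N"))

    def test_garbage_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_year("short")


# Run the tests explicitly so the snippet also works inside a notebook cell
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseYearTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests of this kind are cheap to write for each transformation step and would catch type or null-handling regressions when the pipeline is later moved to another engine.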

## Data
_The data in focus can be accessed on the Internet via the following link to our Azure Blob Storage:_
```
STORAGEACCOUNTURL= "https://saqdiveassignments.blob.core.windows.net"
CONTAINERNAME= "dataengineerfiles"
```
As the first step, I access the data provided on Azure storage. The data is openly accessible (no credentials required) and will be stored in this repository.

In [1]:
import os
import polars as pl
import pandas as pd
from tqdm import tqdm
from azure.storage.blob import BlobServiceClient

# Set up download directory
ACCOUNT_URL = "https://saqdiveassignments.blob.core.windows.net/"
CONTAINER_NAME = "dataengineerfiles"
DATA_DIR_PATH = os.path.join(os.getcwd(), "data") # Change this as desired
if not os.path.exists(DATA_DIR_PATH):
    os.makedirs(DATA_DIR_PATH)

#### 1. Programmatically download the data
To be able to download the data, a connection to the individual blobs needs to be established. This follows the Azure Object Model.

![title](https://learn.microsoft.com/en-us/azure/storage/blobs/media/storage-blobs-introduction/blob1.png)

This requires a connection to the storage account, then the container, and finally each blob that needs to be downloaded.
For easier adaptation to future changes, I kept the container name as a parameter.
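
The three levels map directly onto the URL of a blob. A minimal sketch of that mapping, assuming the container allows anonymous read access as stated above; `blob_url` is a hypothetical helper and not part of the Azure SDK:

```python
from urllib.parse import quote

ACCOUNT_URL = "https://saqdiveassignments.blob.core.windows.net/"
CONTAINER_NAME = "dataengineerfiles"


def blob_url(account_url: str, container_name: str, blob_name: str) -> str:
    """Build a blob URL from the three levels: account, container, blob."""
    # Strip a trailing slash so the parts are joined by exactly one slash,
    # and percent-encode the names so spaces or special characters are safe.
    base = account_url.rstrip("/")
    return f"{base}/{quote(container_name)}/{quote(blob_name)}"


print(blob_url(ACCOUNT_URL, CONTAINER_NAME, "ratings.tsv"))
```

The class in the next cell goes through the SDK clients instead, which also covers authenticated access if credentials are ever required.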

In [2]:
class AzureBlobOperations:
    """Helper class to group Azure Blob Storage operations.
    
    Connects to Azure Blob Storage and provides methods to download blobs to local storage.
    """
    def __init__(self, storage_url: str, credentials=None) -> None:
        self.blob_service_client = BlobServiceClient(storage_url, credential=credentials)

    def download_blob_to_file(self, container_name: str, blob_name: str) -> None:
        """Downloads a single blob from Azure Storage to a local file.
        
        Args:
            container_name: Name of the storage container containing the blob.
            blob_name: Name of the blob to download.
        """
        blob_client = self.blob_service_client.get_blob_client(container=container_name, blob=blob_name)
        with open(file=os.path.join(DATA_DIR_PATH, blob_name), mode="wb") as sample_blob:
            download_stream = blob_client.download_blob()
            for chunk in download_stream.chunks():
                sample_blob.write(chunk)

    def download_all_blobs_to_dir(self, container_name: str) -> None:
        """Downloads all blobs contained in a container on Azure Storage.
        
        Based on the download blob function from Azure Documentation.

        Args:
            container_name: Name of the storage container containing the blob.
        """
        container_client = self.blob_service_client.get_container_client(container=container_name)
        for blob_name in tqdm(list(container_client.list_blob_names()), desc="Downloading blobs from storage: "):
            # Reuse the single-blob download to avoid duplicating the streaming logic
            self.download_blob_to_file(container_name=container_name, blob_name=blob_name)
    
    def list_blob_names_and_sizes(self, container_name: str) -> None:
        """Lists all blobs in a container and their size in bytes.
        
        Args:
            container_name: Name of the storage container containing the blob.
        """
        container_client = self.blob_service_client.get_container_client(container=container_name)
        for blob in container_client.list_blobs():
            print(f"Blob name: {blob.name}, Size: {blob.size} bytes")  

In [3]:
azure_op = AzureBlobOperations(storage_url=ACCOUNT_URL, credentials=None)
azure_op.list_blob_names_and_sizes(container_name=CONTAINER_NAME)

Blob name: cast.tsv, Size: 2668643291 bytes
Blob name: crew.tsv, Size: 349084795 bytes
Blob name: names.tsv, Size: 805608596 bytes
Blob name: ratings.tsv, Size: 24365246 bytes
Blob name: titles.tsv, Size: 907670788 bytes


In [4]:
# skip the download if the data directory already contains files
if not os.listdir(DATA_DIR_PATH):
    azure_op.download_all_blobs_to_dir(CONTAINER_NAME)
# azure_op.download_blob_to_file(CONTAINER_NAME, "ratings.tsv")

Downloading blobs from storage: 100%|██████████| 5/5 [14:11<00:00, 170.25s/it]
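
Checking only whether the directory is empty treats an interrupted download as complete. A slightly stricter check, sketched here under the assumption that the expected sizes are known (for example from `blob.size` as printed above), compares each local file against its expected size. `missing_blobs` is a hypothetical helper.

```python
import os
import tempfile


def missing_blobs(data_dir: str, expected_sizes: dict[str, int]) -> list[str]:
    """Return the blob names that are absent locally or whose size differs."""
    missing = []
    for name, size in expected_sizes.items():
        path = os.path.join(data_dir, name)
        # A partially written file exists but has the wrong size
        if not os.path.isfile(path) or os.path.getsize(path) != size:
            missing.append(name)
    return missing


# Small demonstration with a temporary directory and made-up sizes
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "ratings.tsv"), "wb") as handle:
        handle.write(b"x" * 10)
    demo_missing = missing_blobs(tmp_dir, {"ratings.tsv": 10, "crew.tsv": 20})

print(demo_missing)
```

Only the names returned by such a check would then need to be passed to `download_blob_to_file`.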


#### 2. Ingest the data into a database or query engine of your choice
I chose polars for this presentation, but other options could be e.g. MySQL, PostgreSQL, Azure Data Factory or Google BigQuery.

In [5]:
# Load each TSV file as a polars dataframe; "\N" marks missing values
null_values = ["\\N"]

df_cast = pl.read_csv(os.path.join(DATA_DIR_PATH, "cast.tsv"), separator="\t", null_values=null_values)
df_ratings = pl.read_csv(os.path.join(DATA_DIR_PATH, "ratings.tsv"), separator="\t", null_values=null_values)
df_titles = pl.read_csv(os.path.join(DATA_DIR_PATH, "titles.tsv"), separator="\t", null_values=null_values, ignore_errors=True)
df_names = pl.read_csv(os.path.join(DATA_DIR_PATH, "names.tsv"), separator="\t", null_values=null_values)
df_crew = pl.read_csv(os.path.join(DATA_DIR_PATH, "crew.tsv"), separator="\t", null_values=null_values)
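
For comparison, the same ingestion step against a relational engine could look roughly as follows. This is a sketch using SQLite from the Python standard library rather than one of the engines named above; `load_tsv` is a hypothetical helper, and the two sample rows are taken from the head of `crew.tsv` shown later in this notebook.

```python
import csv
import io
import sqlite3

NULL_TOKEN = r"\N"  # marker used for missing values in the TSV files


def load_tsv(conn: sqlite3.Connection, table: str, tsv_file) -> int:
    """Create `table` from the TSV header and insert all rows; returns the row count."""
    # QUOTE_NONE keeps quote characters inside fields as literal text
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Table and column names come from trusted input here; values are bound as parameters.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # Translate the null marker to a real NULL while streaming the rows
    rows = ([None if value == NULL_TOKEN else value for value in row] for row in reader)
    cursor = conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return cursor.rowcount


sample = "tconst\tdirectors\twriters\ntt0000001\tnm0005690\t\\N\ntt0000002\tnm0721526\t\\N\n"
conn = sqlite3.connect(":memory:")
inserted = load_tsv(conn, "crew", io.StringIO(sample))
crew_rows = conn.execute("SELECT tconst, directors, writers FROM crew ORDER BY tconst").fetchall()
print(inserted, crew_rows)
```

The columns are left untyped in this sketch, so every value is stored as text; a real schema would declare column types up front.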

#### 3. Combine and transform the data into a structure suitable for efficient queries.
To scale the data engineering steps, it is first necessary to identify which steps are required. To do this, the following needs to be investigated:

##### 3.1. Visualize raw data

In [6]:
%config InteractiveShell.ast_node_interactivity = 'all'
pd.set_option("display.max_colwidth", 1000)

In [7]:
# show summary statistics for each dataframe
for df in [df_cast, df_ratings, df_titles, df_names, df_crew]:
    df.describe()

statistic,tconst,ordering,nconst,category,job,characters
str,str,f64,str,str,str,str
"""count""","""59627213""",59627213.0,"""59627213""","""59627213""","""9800948""","""28692745"""
"""null_count""","""0""",0.0,"""0""","""0""","""49826265""","""30934468"""
"""mean""",,4.616037,,,,
"""std""",,2.788623,,,,
"""min""","""tt0000001""",1.0,"""nm0000001""","""actor""","""'An Ode To Com…","""[""!CF"",""CF"",""S…"
"""25%""",,2.0,,,,
"""50%""",,4.0,,,,
"""75%""",,7.0,,,,
"""max""","""tt9916880""",10.0,"""nm9993718""","""writer""","""écrivain""","""[""üzletember""]…"


statistic,tconst,averageRating,numVotes
str,str,f64,f64
"""count""","""1404167""",1404167.0,1404167.0
"""null_count""","""0""",0.0,0.0
"""mean""",,6.955961,1036.665241
"""std""",,1.38564,17652.512372
"""min""","""tt0000001""",1.0,5.0
"""25%""",,6.2,11.0
"""50%""",,7.1,26.0
"""75%""",,7.9,101.0
"""max""","""tt9916880""",10.0,2858671.0


statistic,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
str,str,str,str,str,f64,f64,str,f64,str
"""count""","""10565899""","""10565899""","""10565899""","""10565899""",10565898.0,9169791.0,"""119028""",3249656.0,"""10095773"""
"""null_count""","""0""","""0""","""0""","""0""",1.0,1396108.0,"""10446871""",7316243.0,"""470126"""
"""mean""",,,,,0.03613,2005.620404,,43.530298,
"""std""",,,,,2.979874,20.089394,,73.323478,
"""min""","""tt0000001""","""movie""","""!Next?""","""!Next?""",0.0,1874.0,"""1906""",0.0,"""Action"""
"""25%""",,,,,0.0,2001.0,,19.0,
"""50%""",,,,,0.0,2013.0,,30.0,
"""75%""",,,,,0.0,2018.0,,60.0,
"""max""","""tt9916880""","""videoGame""","""起来! ARISE! - (…","""“Prime Video -…",2023.0,2031.0,"""24""",54321.0,"""Western"""


statistic,nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
str,str,str,f64,f64,str,str
"""count""","""13271078""","""13271078""",606412.0,225482.0,"""10601422""","""11709625"""
"""null_count""","""0""","""0""",12664666.0,13045596.0,"""2669656""","""1561453"""
"""mean""",,,1953.205632,1992.944891,,
"""std""",,,34.959063,34.454425,,
"""min""","""nm0000001""","""!'aru Ikhuisi …",1.0,17.0,"""actor""","""tt0000003"""
"""25%""",,,1932.0,1979.0,,
"""50%""",,,1960.0,2001.0,,
"""75%""",,,1979.0,2014.0,,
"""max""","""nm9993719""","""﻿Thesia Koulou…",2023.0,2024.0,"""writer,visual_…","""tt9916856,tt11…"


statistic,tconst,directors,writers
str,str,str,str
"""count""","""10565899""","""6063726""","""5462206"""
"""null_count""","""0""","""4502173""","""5103693"""
"""mean""",,,
"""std""",,,
"""min""","""tt0000001""","""nm0000005""","""nm0000005"""
"""25%""",,,
"""50%""",,,
"""75%""",,,
"""max""","""tt9916880""","""nm9993709""","""nm9993713,nm31…"


In [8]:
for df in [df_cast, df_ratings, df_titles, df_names, df_crew]:
    df.head(20)

tconst,ordering,nconst,category,job,characters
str,i64,str,str,str,str
"""tt0000001""",1,"""nm1588970""","""self""",,"""[""Self""]"""
"""tt0000001""",2,"""nm0005690""","""director""",,
"""tt0000001""",3,"""nm0374658""","""cinematographe…","""director of ph…",
"""tt0000002""",1,"""nm0721526""","""director""",,
"""tt0000002""",2,"""nm1335271""","""composer""",,
…,…,…,…,…,…
"""tt0000006""",1,"""nm0005690""","""director""",,
"""tt0000007""",1,"""nm0179163""","""actor""",,
"""tt0000007""",2,"""nm0183947""","""actor""",,
"""tt0000007""",3,"""nm0005690""","""director""",,


tconst,averageRating,numVotes
str,f64,i64
"""tt0000001""",5.7,2024
"""tt0000002""",5.7,272
"""tt0000003""",6.5,1962
"""tt0000004""",5.4,178
"""tt0000005""",6.2,2727
…,…,…
"""tt0000016""",5.9,1553
"""tt0000017""",4.6,339
"""tt0000018""",5.2,618
"""tt0000019""",5.1,32


tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
str,str,str,str,i64,i64,str,i64,str
"""tt0000001""","""short""","""Carmencita""","""Carmencita""",0,1894,,1,"""Documentary,Sh…"
"""tt0000002""","""short""","""Le clown et se…","""Le clown et se…",0,1892,,5,"""Animation,Shor…"
"""tt0000003""","""short""","""Pauvre Pierrot…","""Pauvre Pierrot…",0,1892,,4,"""Animation,Come…"
"""tt0000004""","""short""","""Un bon bock""","""Un bon bock""",0,1892,,12,"""Animation,Shor…"
"""tt0000005""","""short""","""Blacksmith Sce…","""Blacksmith Sce…",0,1893,,1,"""Comedy,Short"""
…,…,…,…,…,…,…,…,…
"""tt0000016""","""short""","""Boat Leaving t…","""Barque sortant…",0,1895,,1,"""Documentary,Sh…"
"""tt0000017""","""short""","""Italienischer …","""Italienischer …",0,1895,,1,"""Documentary,Sh…"
"""tt0000018""","""short""","""Das boxende Kä…","""Das boxende Kä…",0,1895,,1,"""Short"""
"""tt0000019""","""short""","""The Clown Barb…","""The Clown Barb…",0,1898,,,"""Comedy,Short"""


nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
str,str,i64,i64,str,str
"""nm0000001""","""Fred Astaire""",1899,1987,"""soundtrack,act…","""tt0027125,tt00…"
"""nm0000002""","""Lauren Bacall""",1924,2014,"""actress,soundt…","""tt0117057,tt00…"
"""nm0000003""","""Brigitte Bardo…",1934,,"""actress,soundt…","""tt0049189,tt00…"
"""nm0000004""","""John Belushi""",1949,1982,"""actor,soundtra…","""tt0077975,tt00…"
"""nm0000005""","""Ingmar Bergman…",1918,2007,"""writer,directo…","""tt0050976,tt00…"
…,…,…,…,…,…
"""nm0000016""","""Georges Deleru…",1925,1992,"""composer,sound…","""tt0069946,tt88…"
"""nm0000017""","""Marlene Dietri…",1901,1992,"""soundtrack,act…","""tt0021156,tt00…"
"""nm0000018""","""Kirk Douglas""",1916,2020,"""actor,producer…","""tt0050825,tt00…"
"""nm0000019""","""Federico Felli…",1920,1993,"""writer,directo…","""tt0071129,tt00…"


tconst,directors,writers
str,str,str
"""tt0000001""","""nm0005690""",
"""tt0000002""","""nm0721526""",
"""tt0000003""","""nm0721526""",
"""tt0000004""","""nm0721526""",
"""tt0000005""","""nm0005690""",
…,…,…
"""tt0000016""","""nm0525910""",
"""tt0000017""","""nm1587194,nm08…",
"""tt0000018""","""nm0804434""",
"""tt0000019""","""nm0932055""",


##### 3.2. Create a schema
This image shows the original schema:

![title](./resources/raw_data.png)

Several improvements suggest themselves when visualizing the relations between these tables:
1. Use appropriate data types, such as INT for `endYear` in `titles`
2. Normalize the database to optimize the table schema
3. Replace a faulty value in `isAdult` in `titles`

This creates something along the lines of the following schema:

![title](./resources/cleaned_data.png)
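
Normalizing here mostly means turning comma-separated columns into separate link tables with one value per row. A plain-Python sketch of that idea, mirroring what `str.split(",")` followed by `explode` does in polars; `explode_column` is a hypothetical helper and the sample rows are made up for illustration:

```python
def explode_column(rows, key, column, sep=","):
    """Turn each row with a delimited column into one row per contained value."""
    out = []
    for row in rows:
        value = row[column]
        if value is None:
            # Rows without a value produce no link rows (like drop_nulls)
            continue
        for item in value.split(sep):
            out.append({key: row[key], column: item})
    return out


# Made-up sample rows; the ids and professions are placeholders
people = [
    {"nconst": "nmA", "primaryProfession": "actor,producer"},
    {"nconst": "nmB", "primaryProfession": None},
]
links = explode_column(people, "nconst", "primaryProfession")
for link in links:
    print(link)
```

The resulting link table can be filtered and joined on its key without any string parsing at query time.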

##### 3.3 Apply optimizations
Bring the data into the optimized format so that data scientists can manipulate it more easily down the line.

In [9]:
# move characters to its own table
df_characters = (
    df_cast.select(
        pl.col("tconst"),
        pl.col("nconst"),
        pl.col("characters").str.json_decode().alias("character"),
    )
    .drop_nulls()
    .explode("character")
)

df_cast = df_cast.select(
    pl.col("tconst"),
    pl.col("nconst"),
    pl.col("ordering"),
    pl.col("category"),
    pl.col("job"),
)

In [10]:
# extract known for titles from names
df_known_for = (
    df_names.select(
        pl.col("knownForTitles").alias("tconst").str.split(","),
        pl.col("nconst"),
    )
    .drop_nulls()
    .explode("tconst")
)

In [11]:
# explode the directors and writers
df_directors = (
    df_crew.select("tconst", pl.col("directors").str.split(",").alias("nconst"))
    .drop_nulls()
    .explode("nconst")
)

In [12]:
df_writers = (
    df_crew.select("tconst", pl.col("writers").str.split(",").alias("nconst"))
    .drop_nulls()
    .explode("nconst")
)

In [13]:
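# table for actors; only rows whose category is exactly "actor" are kept,
# so any other category value in the data is not included here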
df_actors = df_cast.filter(pl.col("category") == "actor").select("tconst", "nconst")

In [14]:
# extract professions from names
df_professions = (
    df_names.select(
        "nconst",
        pl.col("primaryProfession").str.split(","),
    )
    .drop_nulls()
    .explode("primaryProfession")
)
# df_professions.to_dummies("primaryProfession")

In [15]:
# drop the columns that were moved into their own tables
df_names = df_names.drop("primaryProfession", "knownForTitles")


In [16]:
df_genres = (
    df_titles.select(
        "tconst",
        pl.col("genres").str.split(",").alias("genre"),
    )
    .drop_nulls()
    .explode("genre")
)

In [17]:
# fill the missing isAdult value and convert endYear to INT
df_titles = df_titles.with_columns(
    pl.col("isAdult").fill_null(1),
    pl.col("endYear").cast(pl.Int32).alias("endYear")
).drop("genres")
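
The summary statistics above show that `isAdult` has one missing value and a maximum of 2023, although a flag should only be 0 or 1. A minimal sketch of a range check that makes such values visible; how to treat them afterwards (null, default, or dropping the row) is a modelling decision this sketch does not make. `clean_flag` is a hypothetical helper and the sample values are illustrative.

```python
def clean_flag(value):
    """Return the value if it is a valid 0/1 flag, otherwise None."""
    return value if value in (0, 1) else None


raw_flags = [0, 1, None, 2023]  # illustrative values, including an out-of-range one
cleaned_flags = [clean_flag(value) for value in raw_flags]

# Count values that were present but rejected, as opposed to already missing
rejected = sum(
    1 for raw, cleaned in zip(raw_flags, cleaned_flags)
    if raw is not None and cleaned is None
)
print(cleaned_flags, rejected)
```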

#### 4. Design a query that returns data in the desired format for a given movie or person and specific metrics.
Calculating the success of a movie or person from the data takes several steps.
The first step is to calculate success metrics.
Afterwards, all data required by the data scientists is joined into a single dataframe.
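
One design choice in the metrics step is how to average ratings. `knownAverageRating` in this notebook is a plain mean, so a title with a handful of votes counts as much as one with thousands. A possible alternative, sketched here with made-up numbers and not used in the cells that follow, is a vote-weighted mean; `weighted_mean_rating` is a hypothetical helper.

```python
def weighted_mean_rating(ratings_and_votes):
    """Mean rating weighted by the number of votes; None if there are no votes."""
    total_votes = sum(votes for _, votes in ratings_and_votes)
    if total_votes == 0:
        return None
    return sum(rating * votes for rating, votes in ratings_and_votes) / total_votes


# Made-up example: one niche title with few votes, one widely rated title
titles = [(9.0, 10), (6.0, 990)]
plain_mean = sum(rating for rating, _ in titles) / len(titles)
weighted_mean = weighted_mean_rating(titles)
print(plain_mean, weighted_mean)
```

In this example the plain mean is pulled up by the title with only 10 votes, while the weighted mean stays close to the rating that most voters gave.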

In [18]:
# calculate the performance of the titles the person is known for
df_rating_known_for = df_known_for.join(df_ratings, on="tconst").group_by("nconst").agg(
    pl.col("averageRating").mean().alias("knownAverageRating"),
    pl.col("numVotes").sum().alias("knownNumVotes"),
)
df_rating_known_for.head()

nconst,knownAverageRating,knownNumVotes
str,f64,i64
"""nm1267075""",6.5,3106
"""nm2466868""",7.25,148
"""nm5083741""",7.3,1878
"""nm6590285""",8.5,49
"""nm0285293""",4.7,148


In [19]:
# collect the rating of every title the person has acted in
df_actor_performance = (
    df_actors.join(df_ratings, on="tconst", how="inner")
    .select(
        "nconst",
        "tconst",
        "averageRating",
        "numVotes",
    )
)
df_actor_performance.head(25)

nconst,tconst,averageRating,numVotes
str,str,f64,i64
"""nm0443482""","""tt0000005""",6.2,2727
"""nm0653042""","""tt0000005""",6.2,2727
"""nm0179163""","""tt0000007""",5.4,847
"""nm0183947""","""tt0000007""",5.4,847
"""nm0653028""","""tt0000008""",5.4,2172
…,…,…,…
"""nm0617588""","""tt0000066""",3.2,30
"""nm0525908""","""tt0000070""",6.4,2735
"""nm0617588""","""tt0000075""",6.3,2017
"""nm0420198""","""tt0000076""",4.5,550


In [20]:
# calculate the performance of every movie the person has directed
df_director_performance = (
    df_directors.join(df_ratings, on="tconst", how="inner")
    .select(
        "nconst",
        "tconst",
        "averageRating",
        "numVotes",
    )
)
df_director_performance.head(25)

nconst,tconst,averageRating,numVotes
str,str,f64,i64
"""nm0005690""","""tt0000001""",5.7,2024
"""nm0721526""","""tt0000002""",5.7,272
"""nm0721526""","""tt0000003""",6.5,1962
"""nm0721526""","""tt0000004""",5.4,178
"""nm0005690""","""tt0000005""",6.2,2727
…,…,…,…
"""nm0804434""","""tt0000018""",5.2,618
"""nm0932055""","""tt0000019""",5.1,32
"""nm0010291""","""tt0000020""",4.8,372
"""nm0525910""","""tt0000022""",5.1,1127


In [21]:
# calculate the performance of every movie the person has written
df_writer_performance = (
    df_writers.join(df_ratings, on="tconst", how="inner")
    .select(
        "nconst",
        "tconst",
        "averageRating",
        "numVotes",
    )
)
df_writer_performance.head(25)

nconst,tconst,averageRating,numVotes
str,str,f64,i64
"""nm0085156""","""tt0000009""",5.3,209
"""nm0410331""","""tt0000036""",4.4,620
"""nm0410331""","""tt0000076""",4.5,550
"""nm0617588""","""tt0000091""",6.7,3910
"""nm0410331""","""tt0000108""",4.4,558
…,…,…,…
"""nm0841389""","""tt0000215""",4.1,131
"""nm0617588""","""tt0000218""",5.9,695
"""nm0207305""","""tt0000225""",4.9,37
"""nm0857203""","""tt0000229""",4.5,115

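The next cell marks, for each (title, person) pair, whether the person directed, wrote or acted in the title. The idea behind its outer joins can be sketched in plain Python as a union of pairs with one 0/1 flag per role. `role_flags` is a hypothetical helper; the sample pairs are taken from the outputs in this notebook.

```python
def role_flags(directors, writers, actors):
    """Combine (tconst, nconst) pairs into one row per pair with 0/1 role flags."""
    directors, writers, actors = set(directors), set(writers), set(actors)
    rows = []
    # The union plays the role of the outer joins: a pair appears once,
    # no matter in how many of the three inputs it occurs.
    for tconst, nconst in sorted(directors | writers | actors):
        pair = (tconst, nconst)
        rows.append({
            "tconst": tconst,
            "nconst": nconst,
            "director": int(pair in directors),
            "writer": int(pair in writers),
            "actor": int(pair in actors),
        })
    return rows


flag_rows = role_flags(
    directors=[("tt0000009", "nm0085156"), ("tt0000001", "nm0005690")],
    writers=[("tt0000009", "nm0085156")],
    actors=[("tt0000007", "nm0179163")],
)
for row in flag_rows:
    print(row)
```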

In [24]:
person_name = "nm0085156"  # nconst (unique id) of the person to look up

# create a dataframe for if they have directed, written or acted in the movie
# by making the attribute 1 if they have and 0 if they have not

# join the three dataframes
df_director_writer_actor = (
    df_directors.with_columns(pl.lit(1).alias("director"))
    .join(
        df_writers.with_columns(pl.lit(1).alias("writer")),
        on=("tconst", "nconst"),
        how="outer_coalesce",
    )
    .join(
        df_actors.with_columns(pl.lit(1).alias("actor")),
        on=("tconst", "nconst"),
        how="outer_coalesce",
    )
    .fill_null(0)
)

# join the performance and some metadata of the movies and the person
df_director_writer_actor = (
    df_director_writer_actor.join(df_titles, on="tconst", how="inner")
    .join(df_names, on="nconst", how="inner")
    .join(df_ratings, on="tconst", how="inner")
    .join(df_rating_known_for, on="nconst", how="inner")
    .select(
        "tconst",
        "nconst",
        "primaryName",
        "director",
        "writer",
        "actor",
        "primaryTitle",
        "averageRating",
        "numVotes",
        "knownAverageRating",
        "knownNumVotes",
    )
)
df_director_writer_actor.head(25)

tconst,nconst,primaryName,director,writer,actor,primaryTitle,averageRating,numVotes,knownAverageRating,knownNumVotes
str,str,str,i32,i32,i32,str,f64,i64,f64,i64
"""tt0854172""","""nm2466868""","""Sanjeev Giriwe…",0,1,1,"""With Luv... Tu…",6.4,30,7.25,148
"""tt11585604""","""nm2466868""","""Sanjeev Giriwe…",0,1,0,"""Vajvuya Band B…",5.6,8,7.25,148
"""tt1584917""","""nm2466868""","""Sanjeev Giriwe…",0,1,0,"""Bolo Raam""",5.2,105,7.25,148
"""tt3239126""","""nm2466868""","""Sanjeev Giriwe…",0,1,0,"""To B or Not to…",9.2,14,7.25,148
"""tt3353822""","""nm2466868""","""Sanjeev Giriwe…",0,1,1,"""Chausar""",6.5,91,7.25,148
…,…,…,…,…,…,…,…,…,…,…
"""tt12632132""","""nm10302438""","""Hasan Erimez""",0,1,0,"""Uyanis: Büyük …",7.9,4620,7.733333,47494
"""tt13028246""","""nm10302438""","""Hasan Erimez""",0,1,0,"""Episode #1.1""",9.1,245,7.733333,47494
"""tt13177594""","""nm10302438""","""Hasan Erimez""",0,1,0,"""Episode #1.2""",8.8,146,7.733333,47494
"""tt13177606""","""nm10302438""","""Hasan Erimez""",0,1,0,"""Episode #1.3""",9.0,128,7.733333,47494

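The same lookup could also be phrased as a parameterized SQL query once the tables live in a relational database. A sketch using SQLite from the standard library; the table layout (`roles` with 0/1 flags) and `titles_for_person` are assumptions for illustration, the ids, titles and ratings are taken from the outputs in this notebook, and the person names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles  (tconst TEXT PRIMARY KEY, primaryTitle TEXT);
CREATE TABLE names   (nconst TEXT PRIMARY KEY, primaryName TEXT);
CREATE TABLE ratings (tconst TEXT PRIMARY KEY, averageRating REAL, numVotes INTEGER);
CREATE TABLE roles   (tconst TEXT, nconst TEXT, director INTEGER, writer INTEGER, actor INTEGER);
-- Index the lookup column so a per-person query does not scan the whole table
CREATE INDEX idx_roles_nconst ON roles (nconst);
""")
conn.executemany("INSERT INTO titles VALUES (?, ?)",
                 [("tt0000009", "Miss Jerry"), ("tt0000001", "Carmencita")])
conn.executemany("INSERT INTO names VALUES (?, ?)",
                 [("nm0085156", "Person A"), ("nm0005690", "Person B")])  # placeholder names
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                 [("tt0000009", 5.3, 209), ("tt0000001", 5.7, 2024)])
conn.executemany("INSERT INTO roles VALUES (?, ?, ?, ?, ?)",
                 [("tt0000009", "nm0085156", 1, 1, 0), ("tt0000001", "nm0005690", 1, 0, 0)])

QUERY = """
SELECT r.tconst, r.nconst, n.primaryName, r.director, r.writer, r.actor,
       t.primaryTitle, ra.averageRating, ra.numVotes
FROM roles r
JOIN titles  t  ON t.tconst  = r.tconst
JOIN names   n  ON n.nconst  = r.nconst
JOIN ratings ra ON ra.tconst = r.tconst
WHERE r.nconst = ?
ORDER BY r.tconst
"""


def titles_for_person(connection: sqlite3.Connection, nconst: str):
    """Return one row per rated title the given person took part in."""
    return connection.execute(QUERY, (nconst,)).fetchall()


person_rows = titles_for_person(conn, "nm0085156")
print(person_rows)
```

Binding the person id as a parameter keeps the query reusable for any person and avoids building SQL strings by hand.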

In [25]:
# look up a single person by id (the output below belongs to this id)
df_director_writer_actor.filter(pl.col("nconst") == person_name)

tconst,nconst,primaryName,director,writer,actor,primaryTitle,averageRating,numVotes,knownAverageRating,knownNumVotes
str,str,str,i32,i32,i32,str,f64,i64,f64,i64
"""tt0000009""","""nm0085156""","""Alexander Blac…",1,1,0,"""Miss Jerry""",5.3,209,5.3,209
