# Netflix Database

Data from [Kaggle](https://www.kaggle.com/shivamb/netflix-shows?select=netflix_titles.csv).

I am planning to use a Neo4j graph database for the Netflix movie and show data.
Modeling titles, actors, and genres as nodes connected by relationships should make relationship traversal much easier.
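To make the traversal point concrete, here is a tiny in-memory sketch of the node and relationship types the import below creates (Title, Actor, Genre; ACTED_IN, LISTED_IN). It uses plain Python dictionaries, not Neo4j, and every title and actor name is a placeholder rather than a row from the dataset. It answers a typical graph question: which titles share an actor with a given title.

```python
from collections import defaultdict

# Edges as (source, relationship, target) triples, mirroring the patterns
# (Actor)-[:ACTED_IN]->(Title) and (Title)-[:LISTED_IN]->(Genre).
# All names are placeholders, not data from the Kaggle file.
EDGES = [
    ("Actor X", "ACTED_IN", "Title A"),
    ("Actor X", "ACTED_IN", "Title B"),
    ("Actor Y", "ACTED_IN", "Title B"),
    ("Actor Y", "ACTED_IN", "Title C"),
    ("Title A", "LISTED_IN", "Dramas"),
    ("Title B", "LISTED_IN", "Dramas"),
]


def build_index(edges):
    """Index ACTED_IN edges in both directions for quick lookups."""
    titles_by_actor = defaultdict(set)
    actors_by_title = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "ACTED_IN":
            titles_by_actor[src].add(dst)
            actors_by_title[dst].add(src)
    return titles_by_actor, actors_by_title


def titles_sharing_an_actor(title, edges):
    """Two-hop traversal: title -> its actors -> their other titles."""
    titles_by_actor, actors_by_title = build_index(edges)
    related = set()
    for actor in actors_by_title[title]:
        related |= titles_by_actor[actor]
    related.discard(title)  # the starting title is not "related" to itself
    return sorted(related)


print(titles_sharing_an_actor("Title B", EDGES))  # ['Title A', 'Title C']
```

In Neo4j the same question is a single pattern, for example `MATCH (:Title {title: $t})<-[:ACTED_IN]-(:Actor)-[:ACTED_IN]->(other:Title) RETURN DISTINCT other.title`, with no index-building step on the client.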

## What this notebook does

Currently all of the movies and shows are in one CSV file.
Before importing, we need to parse that data into pieces that are easier to load:
separate the movies from the shows, and break out the actors (and genres) of each title.
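The parsing steps above can be sketched with only the standard library's `csv` module: route each row by its `type`, then turn the comma-separated `cast` field into one `(show_id, actor)` pair per name. The column names (`show_id`, `type`, `title`, `cast`) match the Kaggle file; the two-row sample and the function name `split_titles` are made up for illustration.

```python
import csv
import io


def split_titles(csv_text):
    """Return (shows, movies, cast_pairs) parsed from netflix_titles-style CSV text."""
    shows, movies, cast_pairs = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] == "TV Show":
            shows.append(row)
        elif row["type"] == "Movie":
            movies.append(row)
        # cast is one quoted field holding a comma-separated list of names;
        # an empty field produces no pairs
        for name in row["cast"].split(","):
            name = name.strip()
            if name:
                cast_pairs.append((row["show_id"], name))
    return shows, movies, cast_pairs


SAMPLE = """show_id,type,title,cast
s1,Movie,Sample Movie,
s2,TV Show,Sample Show,"Actor One, Actor Two"
"""

shows, movies, pairs = split_titles(SAMPLE)
print(len(shows), len(movies), pairs)
# 1 1 [('s2', 'Actor One'), ('s2', 'Actor Two')]
```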


In [13]:

import pandas as pd
from pandas import Series, DataFrame

netflix_df = pd.read_csv('data/netflix_titles.csv')

# some columns (e.g. cast) have missing values; fill every NaN with an
# empty string so string methods such as split() work on every row
netflix_df.fillna("", inplace=True)
# create a dataframe of shows
shows_df = netflix_df[netflix_df['type'] == 'TV Show']

# create a dataframe of movies
movies_df = netflix_df[netflix_df['type'] == 'Movie']
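The two boolean-mask filters above select rows by the `type` column. As an alternative sketch, `groupby` can build one DataFrame per type in a single pass and make it obvious if the column holds anything other than the two expected values. The function name `split_by_type` and the three-row sample frame are hypothetical.

```python
import pandas as pd


def split_by_type(df, expected=("Movie", "TV Show")):
    """Map each value of the 'type' column to the rows that have it."""
    parts = {name: group for name, group in df.groupby("type")}
    # fail loudly if a title has a type we did not plan for
    unexpected = set(parts) - set(expected)
    if unexpected:
        raise ValueError(f"unexpected type values: {sorted(unexpected)}")
    return parts


sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "type": ["Movie", "TV Show", "Movie"],
})
parts = split_by_type(sample)
shows_df, movies_df = parts["TV Show"], parts["Movie"]
```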

In [14]:
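# one row per (show_id, actor) pair; the cast column is a comma-separated string
casts = []
for _, show in netflix_df.iterrows():
    for actor in show['cast'].split(','):
        actor = actor.strip()  # names after the first carry a leading space
        if actor:  # titles with no cast are "" after fillna
            casts.append({'show_id': show['show_id'], 'actor': actor})

casts_df = pd.DataFrame(casts)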
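The loop above can also be written without `iterrows` using pandas' vectorized string methods: `Series.str.split` turns each cast string into a list, `DataFrame.explode` gives every list element its own row, and a strip plus filter drops the blank entries left by titles with no cast. A minimal sketch on a made-up two-title frame, assuming `fillna("")` has already been applied (a NaN cast would survive the filter); `explode_cast` is a hypothetical name.

```python
import pandas as pd


def explode_cast(df):
    """One row per (show_id, actor), built with str.split + explode."""
    pairs = (
        df[["show_id", "cast"]]
        .assign(actor=df["cast"].str.split(","))
        .explode("actor")
    )
    pairs["actor"] = pairs["actor"].str.strip()
    # an empty cast string splits to [""], which explodes to one blank row
    pairs = pairs[pairs["actor"] != ""]
    return pairs[["show_id", "actor"]].reset_index(drop=True)


sample = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "cast": ["", "Actor One, Actor Two"],
})
print(explode_cast(sample))
```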

In [15]:
from neo4j import GraphDatabase

# Cypher for one title. Every value arrives as a query parameter ($name), so
# quotes inside names or descriptions cannot break the statement and no
# random variable names are needed. FOREACH (unlike UNWIND) still runs the
# rest of the query when a title has no actors or no genres.
INSERT_TITLE_QUERY = """
CREATE (title:Title {
    netflix_id: $netflix_id,
    type: $type,
    title: $title,
    date_added: $date_added,
    release_year: $release_year,
    rating: $rating,
    description: $description})
FOREACH (name IN $actors |
    MERGE (actor:Actor {name: name})
    ON CREATE SET actor.created = timestamp()
    ON MATCH SET actor.found = timestamp()
    CREATE (actor)-[:ACTED_IN]->(title))
FOREACH (name IN $genres |
    MERGE (genre:Genre {name: name})
    CREATE (title)-[:LISTED_IN]->(genre))
"""


def insert_title(tx, netflix_title: Series, cast: DataFrame, listed_in):
    """Create one Title node and link it to its Actor and Genre nodes."""
    netflix_id = netflix_title['show_id']
    # this title's actors, trimmed so " Khosi Ngema" and "Khosi Ngema" match;
    # blanks come from titles with no cast
    actors = [str(a).strip() for a in cast.loc[cast['show_id'] == netflix_id, 'actor']]
    actors = [a for a in actors if a]
    genres = [g.strip() for g in listed_in if g.strip()]

    print("Title")
    print(netflix_title['title'])
    print("actors")
    for actor in actors:
        print(actor)
    print("genres")
    for genre in genres:
        print(genre)

    tx.run(INSERT_TITLE_QUERY,
           netflix_id=netflix_id,
           type=netflix_title['type'],
           title=netflix_title['title'],
           date_added=netflix_title['date_added'],
           release_year=netflix_title['release_year'],
           rating=netflix_title['rating'],
           description=netflix_title['description'],
           actors=actors,
           genres=genres)


def insert_titles():
    # Open one driver for the whole import and close it at the end; a driver
    # holds a connection pool, so creating one per row leaves connections open.
    # For a hosted Neo4j Aura instance, pass its neo4j+s:// URI and credentials instead.
    db = GraphDatabase.driver(
        'bolt://localhost:7687',
        auth=('neo4j', 'cd007-01'),
    )
    try:
        for _, show in netflix_df.iterrows():
            # (show_id, actor) rows for this title only
            cast_df = pd.DataFrame({
                'show_id': show['show_id'],
                'actor': [a.strip() for a in show['cast'].split(',')],
            })
            listed_in = show['listed_in'].split(',')

            # write_transaction is the 4.x driver API (deprecated in 5.x in favor of execute_write)
            with db.session() as session:
                session.write_transaction(insert_title, show, cast_df, listed_in)
    finally:
        db.close()


insert_titles()

Title
Dick Johnson Is Dead
actors

genres
Documentaries
Title
Blood & Water
actors
Ama Qamata
 Khosi Ngema
 Gail Mabalane
 Thabang Molaba
 Dillon Windvogel
 Natasha Thahane
 Arno Greeff
 Xolile Tshabalala
 Getmore Sithole
 Cindy Mahlangu
 Ryle De Morny
 Greteli Fincham
 Sello Maake Ka-Ncube
 Odwa Gwanya
 Mekaila Mathys
 Sandi Schultz
 Duane Williams
 Shamilla Miller
 Patrick Mofokeng
genres
International TV Shows
TV Dramas
TV Mysteries
Title
Ganglands
actors
Sami Bouajila
 Tracy Gotoas
 Samuel Jouy
 Nabiha Akkari
 Sofia Lesaffre
 Salim Kechiouche
 Noureddine Farihi
 Geert Van Rampelberg
 Bakary Diombera
genres
Crime TV Shows
International TV Shows
TV Action & Adventure
Title
Jailbirds New Orleans
actors

genres
Docuseries
Reality TV
Title
Kota Factory
actors
Mayur More
 Jitendra Kumar
 Ranjan Raj
 Alam Khan
 Ahsaas Channa
 Revathi Pillai
 Urvi Singh
 Arun Kumar
genres
International TV Shows
Romantic TV Shows
TV Comedies
Title
Midnight Mass
actors
Kate Siegel
 Zach Gilford
 Hamish Linkl

Exception ignored in: <function Session.__del__ at 0x124b8b550>
Traceback (most recent call last):
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/work/simple.py", line 97, in __del__
    self.close()
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/work/simple.py", line 178, in close
    self._disconnect()
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/work/simple.py", line 127, in _disconnect
    self._pool.release(self._connection)
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/io/__init__.py", line 708, in release
    connection.reset()
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/io/_bolt4.py", line 230, in reset
    self.fetch_all()
  File "/Users/brandon/projects/torqata_example/venv/lib/python3.9/site-packages/neo4j/io/__init__.py", line 522, in fetch_all
    detail_delta, summary_d

Title
Get Organized with The Home Edit
actors

genres
Reality TV
Title
La Línea: Shadow of Narco
actors

genres
Crime TV Shows
Docuseries
International TV Shows
Title
So Much Love to Give
actors
Adrián Suar
 Soledad Villamil
 Gabriela Toscano
 Alan Sabbagh
 Darío Barassi
 Magela Zanotta
 Betiana Blum
genres
Comedies
International Movies
Music & Musicals
Title
The Social Dilemma
actors
Skyler Gisondo
 Kara Hayward
 Vincent Kartheiser
genres
Documentaries
Title
#Alive
actors
Yoo Ah-in
 Park Shin-hye
genres
Horror Movies
International Movies
Thrillers
Title
Record of Youth
actors
Park Bo-gum
 Park So-dam
 Byeon Woo-seok
 Ha Hee-ra
 Shin Ae-ra
 Han Jin-hee
 Shin Dong-mi
 Lee Chang-hoon
 Kwon Soo-hyun
 Cho Yu-jung
 Park Su-young
 Seo Sang-won
 Lee Jae-won
genres
International TV Shows
Romantic TV Shows
TV Dramas
Title
My Octopus Teacher
actors

genres
Children & Family Movies
Documentaries
International Movies
Title
Toll Booth
actors
Büşra Pekin
 Nur Aysan
 Ruhi Sarı
 Nergis Öztürk
 Nadir S
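`insert_titles` runs one write transaction per title, which costs one round trip to the database for every row of the file. A common Neo4j pattern is to send rows in batches and expand them on the server with `UNWIND`. The sketch below covers only the Title nodes (actors and genres would follow the same `FOREACH`/`MERGE` pattern per row); the helper names `batched`, `to_row`, `write_batch` and any batch size you pick are hypothetical, and `tx` is assumed to be the transaction object the driver passes to a transaction function, which provides `run(query, **params)`.

```python
from itertools import islice

TITLE_BATCH_QUERY = """
UNWIND $rows AS row
CREATE (title:Title {
    netflix_id: row.netflix_id, type: row.type, title: row.title,
    date_added: row.date_added, release_year: row.release_year,
    rating: row.rating, description: row.description})
"""


def batched(iterable, size):
    """Yield lists of up to `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_row(show):
    """Map one CSV record (dict or pandas row) to the Title properties."""
    return {
        "netflix_id": show["show_id"],
        "type": show["type"],
        "title": show["title"],
        "date_added": show["date_added"],
        "release_year": show["release_year"],
        "rating": show["rating"],
        "description": show["description"],
    }


def write_batch(tx, rows):
    """Transaction function: create every Title in `rows` with one query."""
    tx.run(TITLE_BATCH_QUERY, rows=rows)
```

With an open session, the loop would look roughly like `for chunk in batched((to_row(s) for _, s in netflix_df.iterrows()), 500): session.write_transaction(write_batch, chunk)`, where 500 is an arbitrary example size.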