In [141]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from collections import Counter
import itertools

In [142]:
original_data = pd.read_csv('data.csv', sep=',')
data = original_data.copy(deep=True)

# Calculations

The following calculations are shown for the Netflix data. The same formulas were used for the other platforms and can be found in *calculations.py* inside the project.

## Netflix

In [143]:
Netflix_data = data[data["Netflix"] == 1]

### IMDb

In [144]:
Netflix_data["IMDb"].apply(lambda x: float(str(x)[:3])).mean().round(3)

6.266

In [145]:
Netflix_data["IMDb"].apply(lambda x: float(str(x)[:3])).median()

6.4

In [146]:
Netflix_data["IMDb"].apply(lambda x: float(str(x)[:3])).std().round(3)

1.117

### Rotten Tomatoes

In [147]:
Netflix_data["Rotten Tomatoes"].dropna().apply(lambda x: float(str(x)[:2])).std().round(3)

13.848

In [148]:
Netflix_data["Rotten Tomatoes"].dropna().apply(lambda x: float(str(x)[:2])).mean().round(3)

54.448

In [149]:
Netflix_data["Rotten Tomatoes"].dropna().apply(lambda x: float(str(x)[:2])).median()

53.0
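The cells above take the score from a fixed-width slice (`[:3]` for IMDb, `[:2]` for Rotten Tomatoes), which would misread a value such as `100/100` or `9/100` if the column contains one. A minimal sketch of a more tolerant variant, assuming every score is stored as `value/scale`, is shown below. The names `parse_score` and `rating_summary` and the sample frame are hypothetical and are not taken from *calculations.py*.

```python
import pandas as pd


def parse_score(value):
    """Return the number before '/' in strings like '6.4/10'; NaN if missing."""
    if pd.isna(value):
        return float("nan")
    return float(str(value).split("/")[0])


def rating_summary(frame, platform, column):
    """Mean, median and standard deviation of one score column for one platform."""
    scores = frame.loc[frame[platform] == 1, column].map(parse_score)
    return {
        "mean": round(scores.mean(), 3),
        "median": scores.median(),
        "std": round(scores.std(), 3),
    }


# Tiny made-up sample, only to show the call; these are not values from data.csv.
sample = pd.DataFrame({
    "Netflix": [1, 1, 1, 0],
    "IMDb": ["6.0/10", "8.0/10", None, "9.9/10"],
    "Rotten Tomatoes": ["100/100", "50/100", "9/100", "70/100"],
})
imdb_stats = rating_summary(sample, "Netflix", "IMDb")
rt_stats = rating_summary(sample, "Netflix", "Rotten Tomatoes")
print(imdb_stats)
print(rt_stats)
```

With the real data the same call could be repeated for `"Hulu"`, `"Prime Video"` and `"Disney+"`, which are the platform columns visible in the table outputs further down.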

### Release date

In [150]:
# Netflix_release is reused by the genre, age, language and director cells below.
Netflix_release = data[data["Netflix"] == 1]
years = dict(Netflix_release["Year"].value_counts().sort_values())
fig = go.Figure([go.Bar(x=list(years.keys()), y=list(years.values()))])
fig.show()

Netflix is famous for its TV series, and especially for how many of them it offers. The largest number of titles was released in 2018-2020, although many releases were postponed because of the pandemic. The graph also shows that Netflix offers mostly modern TV shows and lacks older classics.

### Genres

In [163]:
genres = list(itertools.chain(*list(Netflix_release["Genres"].dropna().apply(lambda x: x.split(",")))))
Genres_info = dict(sorted(Counter(genres).items(), key=lambda x: x[1], reverse=True))
gen = list(Genres_info.keys())
gen_data = list(Genres_info.values())
pie_fig = go.Figure(data=[go.Pie(labels=gen, values=gen_data)])
# pie_fig.write_html("genres_plot.html")
pie_fig.show()


The most popular genres on Netflix are Comedy and Drama. Other genres may be either too expensive to produce or less appealing to the large audience that Netflix has. All in all, there is nothing unusual about this distribution.
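The same split-and-count pattern is used for genres, languages, directors and countries in this notebook. As an alternative to `itertools.chain` plus `Counter`, it could be written with pandas alone. This is a sketch rather than the project's code: the helper name `count_values` and the sample Series are made up, and the extra `str.strip()` (which the original cells do not apply) is an assumption that entries may carry stray spaces.

```python
import pandas as pd


def count_values(column):
    """Count individual entries in a Series of comma-separated strings."""
    return (
        column.dropna()      # skip rows with no value
        .str.split(",")      # "Comedy,Drama" -> ["Comedy", "Drama"]
        .explode()           # one row per entry
        .str.strip()         # drop stray spaces around entries
        .value_counts()      # sorted from most to least frequent
    )


# Made-up sample; with the real data this would be Netflix_release["Genres"].
sample_genres = pd.Series(["Comedy,Drama", "Drama", None, "Comedy, Drama,Action"])
genre_counts = count_values(sample_genres)
print(genre_counts)
```

The result is a Series, so `genre_counts.index` and `genre_counts.values` can be passed to `go.Pie` or `go.Bar` in place of the dictionary keys and values.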

### Age and language

In [170]:
ages = dict(Netflix_release["Age"].dropna().value_counts())
ages_fig = go.Figure(data=[go.Pie(labels=list(ages.keys())[:20:], values=list(ages.values())[:20:])])
# ages_fig.write_html("ages_plot.html")
ages_fig.show()

Netflix covers a wide range of age categories. Many of its movies have an 18+ rating, while about 38% of movies are aimed at younger audiences.
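Percentages like the one quoted above can be read off the pie chart, but they can also be computed directly. A minimal sketch using `value_counts(normalize=True)` follows; the sample Series is made up, so the numbers it prints are illustrative and not the Netflix shares.

```python
import pandas as pd

# Made-up sample; with the real data this would be Netflix_release["Age"].
sample_ages = pd.Series(["18+", "18+", "13+", "7+", None, "all"])

# Share of each age category among titles that have a rating, in percent.
age_share = (sample_ages.dropna().value_counts(normalize=True) * 100).round(1)

# Combined share of every category other than 18+.
younger_share = age_share.drop("18+").sum()

print(age_share)
print(younger_share)
```

Note that titles without an age rating are excluded before the shares are computed, which matches the `dropna()` in the cell above.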

In [162]:
Netflix_languages = list(itertools.chain(*list(Netflix_release["Language"].dropna().apply(lambda x: x.split(",")))))
languages = dict(sorted(Counter(Netflix_languages).items(), key=lambda x: x[1], reverse=True))
lang_fig = go.Figure(data=[go.Pie(labels=list(languages.keys())[:20:], values=list(languages.values())[:20:])])
# lang_fig.write_html("language_plot.html")
lang_fig.show()

Most of this content is available in English. To sum up, Netflix aims to provide entertainment for the whole family.

### Directors

In [161]:
directors = list(itertools.chain(*list(Netflix_release["Directors"].dropna().apply(lambda x: x.split(",")))))
director = dict(sorted(Counter(directors).items(), key=lambda x: x[1], reverse=True))
dir_fig = go.Figure([go.Bar(x=list(director.keys())[:15:], y=list(director.values())[:15:])])
# dir_fig.write_html("directors_plot.html")
dir_fig.show()

In [155]:
dict(sorted(Counter(directors).items(), key=lambda x: x[1], reverse=True))

{'Raúl Campos': 23,
 'Jan Suter': 23,
 'Jay Karas': 15,
 'Marcus Raboy': 15,
 'Cathy Garcia-Molina': 11,
 'Jay Chapman': 11,
 'Justin G. Dyck': 9,
 'Rajiv Chilaka': 9,
 'Lance Bangs': 8,
 'Youssef Chahine': 8,
 'Shannon Hartman': 8,
 'Don Michael Paul': 8,
 'Martin Scorsese': 7,
 'Ryan Polito': 7,
 'Yilmaz Erdogan': 7,
 'Troy Miller': 7,
 'Ashutosh Gowariker': 6,
 'Anurag Kashyap': 6,
 'David Fincher': 6,
 'Steven Brill': 6,
 'Oliver Stone': 6,
 'Leslie Small': 6,
 'Manny Rodriguez': 6,
 'Hanung Bramantyo': 6,
 'Wael Ihsan': 6,
 'Umesh Mehra': 6,
 'Karan Johar': 5,
 'Ron Howard': 5,
 'Bo Burnham': 5,
 'Wilson Yip': 5,
 'McG': 5,
 'Zoya Akhtar': 5,
 'Imtiaz Ali': 5,
 'Johnnie To': 5,
 'Mae Czarina Cruz': 5,
 'Mike Clattenburg': 5,
 'Michael Simon': 5,
 'Antoinette Jadaone': 5,
 'Rajiv Mehra': 5,
 'Kunle Afolayan': 5,
 'Shigeaki Kubo': 5,
 'Detlev Buck': 5,
 'Fernando Ayllón': 5,
 'Omoni Oboli': 5,
 'Ethan Coen': 4,
 'Joel Coen': 4,
 'Mike Flanagan': 4,
 'James Wan': 4,
 'Farhan Akhtar':

In [156]:
Netflix_release[Netflix_release["Directors"].str.contains("Raúl Campos", na=False)].sort_values(by="IMDb")

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
3604,3604,3605,Arango y Sanint: Ríase el show,2018,,4.0/10,32/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Colombia,Spanish,
3290,3290,3291,Jani Dueñas: Grandes fracasos de ayer y hoy,2018,,4.8/10,39/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Chile,Spanish,
3562,3562,3563,Alexis de Anda: Mea Culpa,2017,,4.9/10,34/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Mexico,Spanish,60.0
3415,3415,3416,Alejandro Riaño: Especial de stand up,2018,,4.9/10,37/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Colombia,Spanish,
3295,3295,3296,Natalia Valdebenito: El especial,2018,,4.9/10,39/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Chile,Spanish,
3495,3495,3496,Daniel Sosa: Sosafado,2017,18+,5.3/10,36/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Mexico,Spanish,77.0
3579,3579,3580,Simplemente Manu NNa,2017,,5.3/10,34/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Mexico,Spanish,
2908,2908,2909,Malena Pichot: Estupidez compleja,2018,,5.4/10,43/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Argentina,Spanish,50.0
3550,3550,3551,Ricardo O'Farrill: Abrazo Navideño,2016,18+,5.5/10,35/100,1,0,0,0,0,"Raúl Campos,Jan Suter",Comedy,Mexico,Spanish,
3223,3223,3224,Ricardo O'Farrill: Abrazo Genial,2016,18+,5.5/10,40/100,1,0,0,0,0,"Jan Suter,Raúl Campos",Comedy,Mexico,Spanish,


In [172]:
data[data["Directors"].str.contains("Jay Chapman", na=False)].sort_values(by="IMDb")

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
7900,7900,7901,Janeane Garofalo: If I May,2016,,4.4/10,40/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,64.0
2616,2616,2617,Lucas Brothers: On Drugs,2017,,5.6/10,46/100,1,0,0,0,0,Jay Chapman,Comedy,United States,English,50.0
2956,2956,2957,Brad Paisley's Comedy Rodeo,2017,,5.7/10,43/100,1,0,0,0,0,Jay Chapman,Comedy,United States,English,63.0
8397,8397,8398,Brent Weinbach: Appealing to the Mainstream,2017,18+,5.8/10,36/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,69.0
8338,8338,8339,Jasper Redd: Jazz Talk,2014,,6.0/10,37/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,62.0
7353,7353,7354,Lisa Lampanelli: Back to the Drawing Board,2015,18+,6.1/10,43/100,0,0,1,0,0,Jay Chapman,"Documentary,Comedy",United States,English,59.0
7378,7378,7379,Bob Saget: Zero to Sixty,2017,18+,6.2/10,43/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,64.0
7690,7690,7691,"Kevin Nealon: Whelmed, But Not Overly",2012,16+,6.3/10,41/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,57.0
3136,3136,3137,Todd Glass Stand-Up Special,2012,,6.3/10,41/100,1,0,0,0,0,Jay Chapman,Comedy,United States,English,46.0
7879,7879,7880,Tammy Pescatelli: Finding the Funny,2013,,6.4/10,40/100,0,0,1,0,0,Jay Chapman,Comedy,United States,English,58.0


In [173]:
data[data["Directors"].str.contains("Ron Howard", na=False)].sort_values(by="IMDb")

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
2269,2269,2270,Solo,2018,,5.0/10,49/100,1,0,0,0,0,Ron Howard,"Action,Adventure,Sci-Fi",United States,English,135.0
8822,8822,8823,Splash,1984,7+,6.3/10,69/100,0,0,0,1,0,Ron Howard,"Comedy,Fantasy,Romance",United States,"English,Swedish",111.0
199,199,200,The Da Vinci Code,2006,13+,6.6/10,78/100,1,1,0,0,0,Ron Howard,"Mystery,Thriller","United States,Malta,France,United Kingdom","English,French,Latin,Spanish",149.0
248,248,249,Angels & Demons,2009,13+,6.7/10,76/100,1,1,0,0,0,Ron Howard,"Action,Mystery,Thriller","United States,Italy","English,Italian,Latin,French,Swiss German,Germ...",138.0
448,448,449,Hillbilly Elegy,2020,18+,6.7/10,72/100,1,0,0,0,0,Ron Howard,Drama,United States,English,116.0
8691,8691,8692,Solo: A Star Wars Story,2018,13+,6.9/10,79/100,0,0,0,1,0,Ron Howard,"Action,Adventure,Sci-Fi",United States,English,135.0
4302,4302,4303,Rebuilding Paradise,2020,13+,7.0/10,57/100,0,1,0,0,0,Ron Howard,Documentary,United States,English,90.0
8735,8735,8736,Willow,1988,7+,7.3/10,75/100,0,0,0,1,0,Ron Howard,"Action,Adventure,Drama,Fantasy,Romance","United Kingdom,United States",English,126.0
3926,3926,3927,The Beatles: Eight Days a Week - The Touring Y...,2016,,7.8/10,69/100,0,1,0,0,0,Ron Howard,"Documentary,History,Music","United States,United Kingdom",English,106.0
72,72,73,Rush,2013,18+,8.1/10,84/100,1,0,0,0,0,Ron Howard,"Action,Biography,Drama,Sport","United Kingdom,Germany,United States","English,German,Italian,French,Spanish",123.0


In [174]:
data[data["Directors"].str.contains("Wilfred Jackson", na=False)].sort_values(by="IMDb")

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
9035,9035,9036,Saludos Amigos,1942,,6.1/10,59/100,0,0,0,1,0,"Wilfred Jackson,Jack Kinney,Hamilton Luske,Bil...","Animation,Short,Adventure,Comedy,Family,Fantas...","United States,Brazil,Bolivia","English,Portuguese",42.0
8819,8819,8820,Melody Time,1948,,6.3/10,69/100,0,0,0,1,0,"Clyde Geronimi,Wilfred Jackson,Jack Kinney,Ham...","Animation,Comedy,Family,Musical",United States,English,75.0
9304,9304,9305,Mickey's Rival,1936,,6.7/10,48/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Comedy,Family,Romance",United States,English,8.0
9250,9250,9251,The Goddess of Spring,1934,,6.7/10,50/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Family,Fantasy,Musical",United States,English,10.0
9254,9254,9255,Toby Tortoise Returns,1936,,6.8/10,50/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Comedy,Family",United States,English,7.0
9286,9286,9287,The Little Whirlwind,1941,,6.8/10,48/100,0,0,0,1,0,"James Algar,Wilfred Jackson,Riley Thomson","Animation,Short,Comedy,Family,Romance",United States,English,8.0
9225,9225,9226,Elmer Elephant,1936,all,6.9/10,51/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Comedy,Family,Romance",United States,English,9.0
9311,9311,9312,Don's Fountain of Youth,1953,,7.1/10,48/100,0,0,0,1,0,"Jack Hannah,Norman Ferguson,Clyde Geronimi,Wil...","Animation,Short,Comedy,Family",United States,English,6.0
9130,9130,9131,The Tortoise and the Hare,1935,,7.2/10,55/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Comedy,Family",United States,English,9.0
9134,9134,9135,Santa's Workshop,1932,,7.3/10,54/100,0,0,0,1,0,Wilfred Jackson,"Animation,Short,Family",United States,English,7.0


# Hulu
## Movie release
Hulu was launched in 2007 and now belongs to Disney. It offers fewer movies than the other streaming platforms and has the smallest library of projects. In recent years, however, it has started to address this problem.
## Hulu genres
The genres are quite similar to those on the other streaming platforms. The largest categories are Drama and Comedy. There is nothing unusual in this data.
## Age and language
The data is very similar to Netflix. The main age category is 18+, but the categories for children and young adults are significantly smaller than on Netflix.
The small number of projects makes the library more local to the countries where Hulu is available. Therefore, the main language is English.

# Prime Video
## Movie release
Amazon has a significant number of old movies in its library. In fact, it is the only streaming service that has a rich library of old-school movies and also provides many modern titles.
## Prime Video genres
As everywhere else, Drama and Comedy are the top genres. However, Prime Video has very balanced categories and provides something for everyone.
## Age and language
Prime Video has the largest 18+ category of all the streaming platforms. Content for children is available and balanced across categories, but its share is still small.
There are not many languages available because Prime Video is unavailable in many countries around the world.

# Disney+
## Movie release
Disney has the richest library of cartoons produced in-house. Many of its projects were released 90+ years ago, making Disney+ one of the biggest family-friendly streaming platforms of all time.
## Disney+ genres
Due to the company's specialization, the main genre is Family. Interestingly, Drama, which is the most popular category on the other platforms, ranks only sixth on Disney+. Genres like Adventure and Fantasy are also more popular here than on the other streaming platforms.
## Age and language
The "all" category is the largest of the age categories. The graph shows that Disney+ provides content for the whole family and avoids adult categories.
Disney+ is available only in a small number of countries. Therefore, it is not a surprise that the primary language is English.

# Overall
## Movie release
Overall, there are not many old movies available on platforms like Netflix, Hulu, etc. There are many reasons for that: the development of the movie industry, copyright and other factors make old movies outsiders on modern streaming platforms.
## Genres
The most popular genre on streaming platforms is Drama, and Comedy takes second place. It is hard to say why these two categories lead, but we can assume that genres that evoke emotions make more money and keep people more involved in the movie.
## Age and Language
As with Netflix, the overall statistics show that a significant part of the movies were created for an adult audience. The other categories account for about 40% of the age ratings.
English remains the main language on all streaming platforms.

In [166]:
dur = data.sort_values(by="IMDb")
fig = px.scatter(dur, x="Runtime", y="IMDb")
# fig.write_html("runtime_rating.html")
fig.show()

There is no clear relation between a movie's runtime and its rating. As the graph shows, movies of every duration receive both bad and good marks. Therefore, there is no visible correlation between these parameters.
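The visual impression from the scatter plot can be checked with a number. The sketch below converts the `IMDb` strings to floats and computes the Pearson correlation coefficient with `Series.corr`, which ignores rows where either value is missing. The sample frame is made up and deliberately tiny, so its coefficient says nothing about the real data; on the full `data` frame, a value close to 0 would support the conclusion above.

```python
import pandas as pd

# Made-up sample with the same column layout as data.csv.
sample = pd.DataFrame({
    "Runtime": [90.0, 120.0, 60.0, None, 150.0],
    "IMDb": ["6.0/10", "7.5/10", "5.0/10", "8.0/10", None],
})

# "6.0/10" -> 6.0; anything that cannot be parsed becomes NaN.
rating = pd.to_numeric(sample["IMDb"].str.split("/").str[0], errors="coerce")

# Pearson correlation over the rows where both runtime and rating are present.
pearson_r = sample["Runtime"].corr(rating)
print(pearson_r)
```

Passing the numeric `rating` column to `px.scatter` instead of the raw strings would also give the plot a continuous y axis.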

In [158]:
age = data.sort_values(by="IMDb")
fig = px.scatter(age, y="Age", x="IMDb")
fig.show()
fig.write_html("fig.html")

The rating does not appear to depend on the age category. :(
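One way to back this up is to compare a summary statistic per age category instead of reading it off the scatter plot. A minimal sketch, assuming the same `value/10` string format for `IMDb`, is shown below; the sample frame and the `rating` column name are made up for illustration.

```python
import pandas as pd

# Made-up sample; with the real data this would be the full data frame.
sample = pd.DataFrame({
    "Age": ["18+", "18+", "7+", "7+", None],
    "IMDb": ["6.0/10", "7.0/10", "5.0/10", "8.0/10", "9.0/10"],
})
sample["rating"] = pd.to_numeric(sample["IMDb"].str.split("/").str[0], errors="coerce")

# Number of titles and median rating in each age category
# (rows without an age category are dropped by groupby).
by_age = sample.groupby("Age")["rating"].agg(["count", "median"])
print(by_age)
```

If the medians are close to each other across categories, that is consistent with the observation that the rating does not depend on the age category.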

In [168]:
countries = list(itertools.chain(*list(data["Country"].dropna().apply(lambda x: x.split(",")))))
country = dict(sorted(Counter(countries).items(), key=lambda x: x[1], reverse=True))
country_data = pd.DataFrame(list(country.items()), index=list(range(0, len(country.keys()))),
                            columns=["Country", "Movies"])

In [167]:
fig = px.scatter_geo(country_data, locations="Country", color="Country", hover_name="Country", size="Movies",
                     projection="natural earth", locationmode="country names")
# fig.write_html("map.html")
fig.show()

The main filming location is the USA: the majority of movies and TV series were filmed there. Being an American company, Netflix prefers to hire directors from English-speaking countries such as the USA and Britain. However, almost 900 projects were created in India. For some reason, directors from India make about twice as many movies as directors from the USA or Britain.