# ***What is EDA?***

EDA (exploratory data analysis) is the first pass over the data we receive, for example a CSV file someone hands us. We try to understand what the data is about, spot possible patterns, and recognize statistical distributions that may be useful later.
 Ideally we have a goal in mind. Here the goal is to answer these queries:
 - #1. Maximum duration by title type (movie/series), per platform and per year.
 - #2. Number of movies and series (counted separately) per platform.
 - #3. Number of times a genre appears, and the platform where it appears most often.
 - #4. Most frequent actor per platform and year.
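
Query #1 depends on the duration column, which in these datasets is stored as text such as "80 min" or "1 Season" rather than as a number. The following is a minimal sketch of one way to split it, run on a small made-up sample. The `duration_int` and `duration_type` names are assumptions for illustration, not columns in the original files.

```python
import pandas as pd

# Made-up rows that mimic the "type" and "duration" columns of the datasets
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
    "duration": ["80 min", "1 Season", "113 min", "4 Seasons"],
})

# Split "113 min" into the number (113) and the unit ("min")
parts = sample["duration"].str.extract(
    r"(?P<duration_int>\d+)\s*(?P<duration_type>\w+)"
)
sample["duration_int"] = pd.to_numeric(parts["duration_int"])
# Normalize "Season"/"Seasons" to a single unit name
sample["duration_type"] = parts["duration_type"].str.lower().str.rstrip("s")

# Maximum duration per title type (minutes for movies, seasons for series)
max_by_type = sample.groupby("type")["duration_int"].max()
print(max_by_type)
```

Rows with a missing duration would come out as NaN in both new columns, so they drop out of the maximum automatically.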


## *EDA deconstructed*
 The first thing to do is try to answer:
 - How many records are there?
 - Are there too few?

In [1]:
# Import the libraries we will use
import pandas as pd
import numpy as np

We look at what kind of data we have.

In [2]:
# Load and inspect the Amazon CSV

Amazon = pd.read_csv("https://raw.githubusercontent.com/MelissaContreras/PI01_DATA05/main/Datasets/amazon_prime_titles.csv")

Amazon.iloc[698:705]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
698,s699,TV Show,The Dangerous Book for Boys,,"Kyan Zielinski, Drew Logan Powell, Erinn Hayes...",United States,,2018,7+,1 Season,Young Adult Audience,The McKenna family must cope with the passing ...
699,s700,TV Show,The Dancing Detective Dekadance,,,,,2016,13+,1 Season,TV Shows,Dekadance is an undercover idol group. Their a...
700,s701,Movie,The Curse of Robert,Andrew Jones,"Nigel Barber, Suzie Garton, Lee Bane, Tiffany ...",,,2016,16+,80 min,"Horror, Young Adult Audience",Emily Barker is a cash strapped student trying...
701,s702,Movie,The Curse Of Hobbes House,Juliane Block,"Mhairi Calvey, Waleed Elgadi, Makenna Guyler, ...",,,2020,,83 min,Horror,When down on her luck Jane Dormant learns abou...
702,s703,Movie,The Curse of Buckout Road,Matthew Currie Holmes,"Evan Ross, Henry Czerny, Dominique Provost-Cha...",,,2019,18+,96 min,Horror,A college class project on creation and destru...
703,s704,Movie,The Creepy Line,M.A. Taylor,"Jordan B. Peterson, Robert Epstein, Peter Schw...",,,2018,7+,81 min,"Documentary, Special Interest",The Creepy Line reveals the stunning degree to...
704,s705,Movie,The Creeping Flesh,Freddie Francis,"Christopher Lee, Peter Cushing, David Bailie, ...",,,1973,PG,92 min,"Horror, Science Fiction",A Victorian-age scientist returns to London wi...


We check the number of rows and columns.

In [3]:
Amazon.shape

(9668, 12)

- Are all the rows complete, or are there fields with null values?
- If there are too many nulls, does that make the rest of the information useless?

In [4]:
Amazon.isnull().sum()

show_id            0
type               0
title              0
director        2082
cast            1233
country         8996
date_added      9513
release_year       0
rating           337
duration           0
listed_in          0
description        0
dtype: int64
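
Raw null counts are easier to judge as a share of the total rows. The sketch below shows one way to get both figures side by side on a toy dataframe. `null_report` is a hypothetical helper name, not something defined in the original notebook.

```python
import numpy as np
import pandas as pd

def null_report(df):
    """Return the null count and null percentage per column, worst first."""
    report = pd.DataFrame({
        "nulls": df.isnull().sum(),
        # mean() of a boolean mask is the fraction of True values
        "pct_null": (df.isnull().mean() * 100).round(1),
    })
    return report.sort_values("pct_null", ascending=False)

# Toy data standing in for one of the platform dataframes
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "country": [np.nan, np.nan, np.nan, "US"],
    "rating": ["PG", np.nan, "R", "G"],
})
report = null_report(toy)
print(report)
```

Applied to the Amazon counts above, date_added is null in 9513 of 9668 rows (roughly 98%) and country in 8996 (roughly 93%), while the columns the queries rely on (type, duration, listed_in, release_year) have no nulls.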

In [5]:
Amazon.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9668 entries, 0 to 9667
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       9668 non-null   object
 1   type          9668 non-null   object
 2   title         9668 non-null   object
 3   director      7586 non-null   object
 4   cast          8435 non-null   object
 5   country       672 non-null    object
 6   date_added    155 non-null    object
 7   release_year  9668 non-null   int64 
 8   rating        9331 non-null   object
 9   duration      9668 non-null   object
 10  listed_in     9668 non-null   object
 11  description   9668 non-null   object
dtypes: int64(1), object(11)
memory usage: 906.5+ KB


In [6]:
# Load and inspect the Disney CSV

Disney = pd.read_csv("https://raw.githubusercontent.com/MelissaContreras/PI01_DATA05/main/Datasets/disney_plus_titles.csv")

Disney.head(15)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!"
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...
5,s6,Movie,Becoming Cousteau,Liz Garbus,"Jacques Yves Cousteau, Vincent Cassel",United States,"November 24, 2021",2021,PG-13,94 min,"Biographical, Documentary",An inside look at the legendary life of advent...
6,s7,TV Show,Hawkeye,,"Jeremy Renner, Hailee Steinfeld, Vera Farmiga,...",,"November 24, 2021",2021,TV-14,1 Season,"Action-Adventure, Superhero",Clint Barton/Hawkeye must team up with skilled...
7,s8,TV Show,Port Protection Alaska,,"Gary Muehlberger, Mary Miller, Curly Leach, Sa...",United States,"November 24, 2021",2015,TV-14,2 Seasons,"Docuseries, Reality, Survival",Residents of Port Protection must combat volat...
8,s9,TV Show,Secrets of the Zoo: Tampa,,"Dr. Ray Ball, Dr. Lauren Smith, Chris Massaro,...",United States,"November 24, 2021",2019,TV-PG,2 Seasons,"Animals & Nature, Docuseries, Family",A day in the life at ZooTampa is anything but ...
9,s10,Movie,A Muppets Christmas: Letters To Santa,Kirk R. Thatcher,"Steve Whitmire, Dave Goelz, Bill Barretta, Eri...",United States,"November 19, 2021",2008,G,45 min,"Comedy, Family, Musical",Celebrate the holiday season with all your fav...


In [7]:
Disney.shape

(1450, 12)

In [8]:
# The share of nulls does not affect the information we want to extract.
Disney.isnull().sum()

show_id           0
type              0
title             0
director        473
cast            190
country         219
date_added        3
release_year      0
rating            3
duration          0
listed_in         0
description       0
dtype: int64

In [9]:
Disney.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1450 entries, 0 to 1449
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       1450 non-null   object
 1   type          1450 non-null   object
 2   title         1450 non-null   object
 3   director      977 non-null    object
 4   cast          1260 non-null   object
 5   country       1231 non-null   object
 6   date_added    1447 non-null   object
 7   release_year  1450 non-null   int64 
 8   rating        1447 non-null   object
 9   duration      1450 non-null   object
 10  listed_in     1450 non-null   object
 11  description   1450 non-null   object
dtypes: int64(1), object(11)
memory usage: 136.1+ KB


In [10]:
# Load and inspect the Hulu CSV

Hulu = pd.read_csv("https://raw.githubusercontent.com/MelissaContreras/PI01_DATA05/main/Datasets/hulu_titles.csv")
Hulu.tail(20)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
3053,s3054,TV Show,Frasier,,,United States,,1993,TV-PG,11 Seasons,"Comedy, Sitcom","Comedy series set in Seattle, WA, which chroni..."
3054,s3055,TV Show,Hey Arnold!,,,United States,,1996,TV-Y,5 Seasons,"Cartoons, Comedy, Family",Football-headed Arnold lives with his offbeat ...
3055,s3056,TV Show,Horrible Histories (UK),,,United Kingdom,,2009,TV-G,6 Seasons,"Comedy, Family, History",Based on the best-selling books for kids (and ...
3056,s3057,TV Show,Kimi Ni Todoke: From Me to You,,,Japan,,2009,TV-PG,2 Seasons,"Anime, International",Sawako Kuronuma has had a difficult time fitti...
3057,s3058,TV Show,Medium,,,United States,,2005,TV-14,7 Seasons,"Crime, Drama, Mystery","MEDIUM is a captivating one-hour drama series,..."
3058,s3059,TV Show,Numb3rs,,,United States,,2005,TV-14,6 Seasons,"Crime, Drama, Mystery","Rob Morrow (""Northern Exposure""), David Krumho..."
3059,s3060,TV Show,Primeval,,,"United Kingdom, France, Germany, Ireland",,2007,TV-PG,5 Seasons,"Adventure, Drama, International",Evolutionary biologist professor Nick Cutter a...
3060,s3061,TV Show,Reba,,,United States,,2001,TV-PG,6 Seasons,"Comedy, Family, Sitcom",Country superstar Reba McEntire made her first...
3061,s3062,TV Show,Roswell,,,United States,,1999,TV-PG,3 Seasons,"Action, Adventure, Drama","Human/Alien hybrids, must hide their alien sid..."
3062,s3063,TV Show,Sabrina: The Teenage Witch,,,United States,,1996,TV-G,7 Seasons,"Comedy, Family, Sitcom","Sabrina is a normal teenager, except for one t..."


In [11]:
Hulu.shape

(3073, 12)

In [12]:
Hulu.isnull().sum()

show_id            0
type               0
title              0
director        3070
cast            3073
country         1453
date_added        28
release_year       0
rating           520
duration         479
listed_in          0
description        4
dtype: int64

In [13]:
Hulu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       3073 non-null   object 
 1   type          3073 non-null   object 
 2   title         3073 non-null   object 
 3   director      3 non-null      object 
 4   cast          0 non-null      float64
 5   country       1620 non-null   object 
 6   date_added    3045 non-null   object 
 7   release_year  3073 non-null   int64  
 8   rating        2553 non-null   object 
 9   duration      2594 non-null   object 
 10  listed_in     3073 non-null   object 
 11  description   3069 non-null   object 
dtypes: float64(1), int64(1), object(10)
memory usage: 288.2+ KB


In [14]:
# Load and inspect the Netflix JSON file

Netflix = pd.read_json("https://raw.githubusercontent.com/MelissaContreras/PI01_DATA05/main/Datasets/netflix_titles.json")
Netflix.head(15)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...


In [15]:
Netflix.shape

(8807, 12)

In [16]:
Netflix.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

In [17]:
Netflix.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 894.5+ KB


Since all the dataframes have the same number of columns with the same names, concatenating them is a reasonable option. However, the numbering in the "show_id" column restarts in every dataframe, so the same id appears on several platforms. Before concatenating the 4 dataframes, I will add a "platform" column to each one, set to the name of the platform the data belongs to, so that the origin of each row can still be told apart once the data is combined.

In [18]:
Amazon["platform"]= "amazon"
Amazon

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,amazon
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,amazon
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,amazon
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ...",amazon
4,s5,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,amazon
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9663,s9664,Movie,Pride Of The Bowery,Joseph H. Lewis,"Leo Gorcey, Bobby Jordan",,,1940,7+,60 min,Comedy,New York City street principles get an East Si...,amazon
9664,s9665,TV Show,Planet Patrol,,"DICK VOSBURGH, RONNIE STEVENS, LIBBY MORRIS, M...",,,2018,13+,4 Seasons,TV Shows,"This is Earth, 2100AD - and these are the adve...",amazon
9665,s9666,Movie,Outpost,Steve Barker,"Ray Stevenson, Julian Wadham, Richard Brake, M...",,,2008,R,90 min,Action,"In war-torn Eastern Europe, a world-weary grou...",amazon
9666,s9667,TV Show,Maradona: Blessed Dream,,"Esteban Recagno, Ezequiel Stremiz, Luciano Vit...",,,2021,TV-MA,1 Season,"Drama, Sports","The series tells the story of Diego Maradona, ...",amazon


In [19]:
Disney["platform"]= "disney"
Disney

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!,disney
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...,disney
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,disney
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!",disney
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,disney
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1445,s1446,Movie,X-Men Origins: Wolverine,Gavin Hood,"Hugh Jackman, Liev Schreiber, Danny Huston, wi...","United States, United Kingdom","June 4, 2021",2009,PG-13,108 min,"Action-Adventure, Family, Science Fiction",Wolverine unites with legendary X-Men to fight...,disney
1446,s1447,Movie,Night at the Museum: Battle of the Smithsonian,Shawn Levy,"Ben Stiller, Amy Adams, Owen Wilson, Hank Azar...","United States, Canada","April 2, 2021",2009,PG,106 min,"Action-Adventure, Comedy, Family",Larry Daley returns to rescue some old friends...,disney
1447,s1448,Movie,Eddie the Eagle,Dexter Fletcher,"Tom Costello, Jo Hartley, Keith Allen, Dickon ...","United Kingdom, Germany, United States","December 18, 2020",2016,PG-13,107 min,"Biographical, Comedy, Drama","True story of Eddie Edwards, a British ski-jum...",disney
1448,s1449,Movie,Bend It Like Beckham,Gurinder Chadha,"Parminder Nagra, Keira Knightley, Jonathan Rhy...","United Kingdom, Germany, United States","September 18, 2020",2003,PG-13,112 min,"Buddy, Comedy, Coming of Age",Despite the wishes of their traditional famili...,disney


In [20]:
Hulu["platform"]= "hulu"
Hulu

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,Ricky Velez: Here's Everything,,,,"October 24, 2021",2021,TV-MA,,"Comedy, Stand Up",​Comedian Ricky Velez bares it all with his ho...,hulu
1,s2,Movie,Silent Night,,,,"October 23, 2021",2020,,94 min,"Crime, Drama, Thriller","Mark, a low end South London hitman recently r...",hulu
2,s3,Movie,The Marksman,,,,"October 23, 2021",2021,PG-13,108 min,"Action, Thriller",A hardened Arizona rancher tries to protect an...,hulu
3,s4,Movie,Gaia,,,,"October 22, 2021",2021,R,97 min,Horror,A forest ranger and two survivalists with a cu...,hulu
4,s5,Movie,Settlers,,,,"October 22, 2021",2021,,104 min,"Science Fiction, Thriller",Mankind's earliest settlers on the Martian fro...,hulu
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3068,s3069,TV Show,Star Trek: The Original Series,,,United States,,1966,TV-PG,3 Seasons,"Action, Adventure, Classics",The 23rd century adventures of Captain James T...,hulu
3069,s3070,TV Show,Star Trek: Voyager,,,United States,,1997,TV-PG,7 Seasons,"Action, Adventure, Science Fiction",Catapulted into the distant sector of the gala...,hulu
3070,s3071,TV Show,The Fades,,,United Kingdom,,2011,TV-14,1 Season,"Horror, International, Science Fiction",Seventeen-year-old Paul is haunted by apocalyp...,hulu
3071,s3072,TV Show,The Twilight Zone,,,United States,,1959,TV-PG,5 Seasons,"Classics, Science Fiction, Thriller",Rod Serling's seminal anthology series focused...,hulu


In [21]:
Netflix["platform"]= "netflix"
Netflix

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",netflix
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",netflix
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,netflix
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",netflix
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,netflix
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a...",netflix
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",netflix
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,netflix
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",netflix
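
The reason for adding the platform column can be checked on toy data: "show_id" on its own repeats across platforms, while the pair (platform, show_id) does not. This is a minimal sketch. The lowercase frame names and the dictionary loop are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd

# Two toy frames whose show_id numbering both start at s1
amazon = pd.DataFrame({"show_id": ["s1", "s2"], "title": ["A", "B"]})
disney = pd.DataFrame({"show_id": ["s1", "s2"], "title": ["C", "D"]})

# Tag every frame with its platform name in one pass
frames = {"amazon": amazon, "disney": disney}
tagged = [df.assign(platform=name) for name, df in frames.items()]
combined = pd.concat(tagged)

# show_id alone collides; (platform, show_id) identifies each row
show_id_unique = combined["show_id"].is_unique
pair_unique = not combined.duplicated(["platform", "show_id"]).any()
print(show_id_unique, pair_unique)
```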


We compare the Hulu["title"] column with the Amazon["title"] column; where the titles match, Hulu["cast"] is filled with the value from Amazon["cast"]. This recovers some data for Hulu["cast"], which otherwise has no values at all.

In [22]:
# The cast column is all-NaN (float64); make it able to hold strings
Hulu['cast'] = Hulu['cast'].astype(object)

for i in Hulu.index:
  for j in Amazon.index:
    if Hulu['title'][i] == Amazon['title'][j]:
      # Write through .loc so the assignment reaches Hulu itself;
      # the chained form Hulu['cast'][i] = ... raises SettingWithCopyWarning
      Hulu.loc[i, 'cast'] = Amazon['cast'][j]
      break

Hulu.tail()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
3068,s3069,TV Show,Star Trek: The Original Series,,,United States,,1966,TV-PG,3 Seasons,"Action, Adventure, Classics",The 23rd century adventures of Captain James T...,hulu
3069,s3070,TV Show,Star Trek: Voyager,,"Kate Mulgrew, Robert Beltran, Roxann Biggs-Daw...",United States,,1997,TV-PG,7 Seasons,"Action, Adventure, Science Fiction",Catapulted into the distant sector of the gala...,hulu
3070,s3071,TV Show,The Fades,,"Iain De Caestecker, Daniel Kaluuya, Tom Ellis,...",United Kingdom,,2011,TV-14,1 Season,"Horror, International, Science Fiction",Seventeen-year-old Paul is haunted by apocalyp...,hulu
3071,s3072,TV Show,The Twilight Zone,,,United States,,1959,TV-PG,5 Seasons,"Classics, Science Fiction, Thriller",Rod Serling's seminal anthology series focused...,hulu
3072,s3073,TV Show,Tokyo Magnitude 8.0,,,Japan,,2009,TV-14,1 Season,"Anime, Drama, International",The devastation is unleashed in the span of se...,hulu
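
The nested loop compares every Hulu title against every Amazon title, which is slow on thousands of rows. The sketch below shows a vectorized alternative that is intended to produce the same result: the first matching title wins, and matched rows are overwritten. `fill_cast_from` is a hypothetical helper name, and the frames are toy data.

```python
import numpy as np
import pandas as pd

def fill_cast_from(target, source):
    """Return a copy of target with cast overwritten where titles match source."""
    # Keep the first row per title, mirroring the `break` in the loop
    lookup = source.drop_duplicates("title").set_index("title")["cast"]
    matched = target["title"].isin(lookup.index)
    out = target.copy()
    # An all-NaN column is float64; object dtype can hold the cast strings
    out["cast"] = out["cast"].astype(object)
    out.loc[matched, "cast"] = out.loc[matched, "title"].map(lookup)
    return out

# Toy frames: only title "B" exists in both, and it appears twice in the source
hulu = pd.DataFrame({"title": ["A", "B", "C"], "cast": [np.nan, np.nan, np.nan]})
amazon = pd.DataFrame({"title": ["B", "B", "D"], "cast": ["x", "y", "z"]})

filled = fill_cast_from(hulu, amazon)
print(filled)
```

Because the function returns a new dataframe, the original is left untouched and the calls can be chained over several source platforms.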


In [23]:
Hulu.isnull().sum()

show_id            0
type               0
title              0
director        3070
cast            2870
country         1453
date_added        28
release_year       0
rating           520
duration         479
listed_in          0
description        4
platform           0
dtype: int64

In [24]:
# Comparing Hulu["title"] with Amazon["title"] recovered 203 cast values.
Hulu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       3073 non-null   object
 1   type          3073 non-null   object
 2   title         3073 non-null   object
 3   director      3 non-null      object
 4   cast          203 non-null    object
 5   country       1620 non-null   object
 6   date_added    3045 non-null   object
 7   release_year  3073 non-null   int64 
 8   rating        2553 non-null   object
 9   duration      2594 non-null   object
 10  listed_in     3073 non-null   object
 11  description   3069 non-null   object
 12  platform      3073 non-null   object
dtypes: int64(1), object(12)
memory usage: 312.2+ KB


In [25]:
# Compare Hulu["title"] with Disney["title"]; where the titles match, Hulu["cast"] is filled with the value from Disney["cast"].

for i in Hulu.index:
  for k in Disney.index:
    if Hulu['title'][i] == Disney['title'][k]:
      # .loc avoids the chained assignment Hulu['cast'][i] = ...
      Hulu.loc[i, 'cast'] = Disney['cast'][k]
      break

Hulu

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,Ricky Velez: Here's Everything,,,,"October 24, 2021",2021,TV-MA,,"Comedy, Stand Up",​Comedian Ricky Velez bares it all with his ho...,hulu
1,s2,Movie,Silent Night,,,,"October 23, 2021",2020,,94 min,"Crime, Drama, Thriller","Mark, a low end South London hitman recently r...",hulu
2,s3,Movie,The Marksman,,"Wesley Snipes, William Hope, Emma Samms, Antho...",,"October 23, 2021",2021,PG-13,108 min,"Action, Thriller",A hardened Arizona rancher tries to protect an...,hulu
3,s4,Movie,Gaia,,,,"October 22, 2021",2021,R,97 min,Horror,A forest ranger and two survivalists with a cu...,hulu
4,s5,Movie,Settlers,,,,"October 22, 2021",2021,,104 min,"Science Fiction, Thriller",Mankind's earliest settlers on the Martian fro...,hulu
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3068,s3069,TV Show,Star Trek: The Original Series,,,United States,,1966,TV-PG,3 Seasons,"Action, Adventure, Classics",The 23rd century adventures of Captain James T...,hulu
3069,s3070,TV Show,Star Trek: Voyager,,"Kate Mulgrew, Robert Beltran, Roxann Biggs-Daw...",United States,,1997,TV-PG,7 Seasons,"Action, Adventure, Science Fiction",Catapulted into the distant sector of the gala...,hulu
3070,s3071,TV Show,The Fades,,"Iain De Caestecker, Daniel Kaluuya, Tom Ellis,...",United Kingdom,,2011,TV-14,1 Season,"Horror, International, Science Fiction",Seventeen-year-old Paul is haunted by apocalyp...,hulu
3071,s3072,TV Show,The Twilight Zone,,,United States,,1959,TV-PG,5 Seasons,"Classics, Science Fiction, Thriller",Rod Serling's seminal anthology series focused...,hulu


In [26]:
# 28 more matches were found
Hulu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       3073 non-null   object
 1   type          3073 non-null   object
 2   title         3073 non-null   object
 3   director      3 non-null      object
 4   cast          231 non-null    object
 5   country       1620 non-null   object
 6   date_added    3045 non-null   object
 7   release_year  3073 non-null   int64 
 8   rating        2553 non-null   object
 9   duration      2594 non-null   object
 10  listed_in     3073 non-null   object
 11  description   3069 non-null   object
 12  platform      3073 non-null   object
dtypes: int64(1), object(12)
memory usage: 312.2+ KB


In [27]:
# Compare Hulu["title"] with Netflix["title"]; where the titles match, Hulu["cast"] is filled with the value from Netflix["cast"].

for i in Hulu.index:
  for r in Netflix.index:
    if Hulu['title'][i] == Netflix['title'][r]:
      # .loc avoids the chained assignment Hulu['cast'][i] = ...
      Hulu.loc[i, 'cast'] = Netflix['cast'][r]
      break

Hulu

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,s1,Movie,Ricky Velez: Here's Everything,,,,"October 24, 2021",2021,TV-MA,,"Comedy, Stand Up",​Comedian Ricky Velez bares it all with his ho...,hulu
1,s2,Movie,Silent Night,,,,"October 23, 2021",2020,,94 min,"Crime, Drama, Thriller","Mark, a low end South London hitman recently r...",hulu
2,s3,Movie,The Marksman,,"Wesley Snipes, William Hope, Emma Samms, Antho...",,"October 23, 2021",2021,PG-13,108 min,"Action, Thriller",A hardened Arizona rancher tries to protect an...,hulu
3,s4,Movie,Gaia,,,,"October 22, 2021",2021,R,97 min,Horror,A forest ranger and two survivalists with a cu...,hulu
4,s5,Movie,Settlers,,,,"October 22, 2021",2021,,104 min,"Science Fiction, Thriller",Mankind's earliest settlers on the Martian fro...,hulu
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3068,s3069,TV Show,Star Trek: The Original Series,,,United States,,1966,TV-PG,3 Seasons,"Action, Adventure, Classics",The 23rd century adventures of Captain James T...,hulu
3069,s3070,TV Show,Star Trek: Voyager,,"Kate Mulgrew, Robert Beltran, Roxann Dawson, J...",United States,,1997,TV-PG,7 Seasons,"Action, Adventure, Science Fiction",Catapulted into the distant sector of the gala...,hulu
3070,s3071,TV Show,The Fades,,"Iain De Caestecker, Daniel Kaluuya, Tom Ellis,...",United Kingdom,,2011,TV-14,1 Season,"Horror, International, Science Fiction",Seventeen-year-old Paul is haunted by apocalyp...,hulu
3071,s3072,TV Show,The Twilight Zone,,,United States,,1959,TV-PG,5 Seasons,"Classics, Science Fiction, Thriller",Rod Serling's seminal anthology series focused...,hulu


In [28]:
# 173 more matches were found

Hulu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       3073 non-null   object
 1   type          3073 non-null   object
 2   title         3073 non-null   object
 3   director      3 non-null      object
 4   cast          404 non-null    object
 5   country       1620 non-null   object
 6   date_added    3045 non-null   object
 7   release_year  3073 non-null   int64 
 8   rating        2553 non-null   object
 9   duration      2594 non-null   object
 10  listed_in     3073 non-null   object
 11  description   3069 non-null   object
 12  platform      3073 non-null   object
dtypes: int64(1), object(12)
memory usage: 312.2+ KB


We now concatenate the dataframes.

In [30]:
w = pd.concat([Amazon, Disney, Hulu, Netflix], join="outer")

w.iloc[9665:9675]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
9665,s9666,Movie,Outpost,Steve Barker,"Ray Stevenson, Julian Wadham, Richard Brake, M...",,,2008,R,90 min,Action,"In war-torn Eastern Europe, a world-weary grou...",amazon
9666,s9667,TV Show,Maradona: Blessed Dream,,"Esteban Recagno, Ezequiel Stremiz, Luciano Vit...",,,2021,TV-MA,1 Season,"Drama, Sports","The series tells the story of Diego Maradona, ...",amazon
9667,s9668,Movie,Harry Brown,Daniel Barber,"Michael Caine, Emily Mortimer, Joseph Gilgun, ...",,,2010,R,103 min,"Action, Drama, Suspense","Harry Brown, starring two-time Academy Award w...",amazon
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!,disney
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...,disney
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,disney
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!",disney
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,1 Season,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,disney
5,s6,Movie,Becoming Cousteau,Liz Garbus,"Jacques Yves Cousteau, Vincent Cassel",United States,"November 24, 2021",2021,PG-13,94 min,"Biographical, Documentary",An inside look at the legendary life of advent...,disney
6,s7,TV Show,Hawkeye,,"Jeremy Renner, Hailee Steinfeld, Vera Farmiga,...",,"November 24, 2021",2021,TV-14,1 Season,"Action-Adventure, Superhero",Clint Barton/Hawkeye must team up with skilled...,disney
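
In the output above the row labels jump from 9667 back to 0, because `pd.concat` keeps each frame's original index by default. The sketch below shows the effect on two small made-up frames (`first` and `second` are hypothetical), together with the `ignore_index=True` option that renumbers the rows.

```python
import pandas as pd

# Two tiny stand-ins for platform catalogues (hypothetical data)
first = pd.DataFrame({"title": ["a", "b"], "platform": ["amazon", "amazon"]})
second = pd.DataFrame({"title": ["c", "d"], "platform": ["disney", "disney"]})

# Default behaviour: each frame keeps its own row labels, so labels repeat
stacked = pd.concat([first, second], join="outer")

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([first, second], join="outer", ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```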


We drop the "show_id" column, whose values restart at s1 for each platform. We then add an incremental column named "index" so that every row has a unique identifier.

In [31]:
# Drop the per-platform identifier
e = w.drop("show_id", axis=1)

# Add a sequential identifier as the first column
new = np.arange(len(e))
e.insert(loc=0, column="index", value=new)

e

Unnamed: 0,index,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,0,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,amazon
1,1,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,amazon
2,2,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,amazon
3,3,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ...",amazon
4,4,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,amazon
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8802,22993,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a...",netflix
8803,22994,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",netflix
8804,22995,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,netflix
8805,22996,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",netflix
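
As a sanity check on this step, the sketch below repeats the drop and insert on a small made-up frame (`toy` is hypothetical) and confirms that the new column is unique. It also shows `reset_index(drop=True)` as one possible alternative that renumbers the row labels themselves instead of adding a column.

```python
import numpy as np
import pandas as pd

# Toy frame with repeated row labels, as produced by a plain concat (hypothetical data)
toy = pd.DataFrame(
    {"show_id": ["s1", "s2", "s1"], "title": ["a", "b", "c"]},
    index=[0, 1, 0],
)

# Same two steps as in the notebook: drop the old id, insert a running counter
out = toy.drop("show_id", axis=1)
out.insert(loc=0, column="index", value=np.arange(len(out)))
print(out["index"].is_unique)

# Alternative: keep no extra column and simply renumber the row labels
alt = toy.drop(columns="show_id").reset_index(drop=True)
print(alt.index.tolist())
```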


In [32]:
# Keep only the cast column
z = e["cast"]
z

0          Brendan Gleeson, Taylor Kitsch, Gordon Pinsent
1        Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar
2       Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...
3       Interviews with: Pink, Adele, Beyoncé, Britney...
4       Harry Dean Stanton, Kieran O'Brien, George Cos...
                              ...                        
8802    Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...
8803                                                 None
8804    Jesse Eisenberg, Woody Harrelson, Emma Stone, ...
8805    Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...
8806    Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...
Name: cast, Length: 22998, dtype: object

In [82]:
# Cast entries for Netflix titles released in 2018
t = e[(e.release_year == 2018) & (e.platform == 'netflix')].cast

In [83]:
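# Note: the name `list` shadows the Python built-in of the same name
list=[]

# Split each cast string on commas and collect every name
for i in t:
    x = str(i).split(',')  # str() also turns a missing cast into a literal string
    for j in x:
        list.append(j)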

In [84]:
list

['Engin Altan Düzyatan',
 ' Serdar Gökhan',
 ' Hülya Darcan',
 ' Kaan Taşaner',
 ' Esra Bilgiç',
 ' Osman Soykut',
 ' Serdar Deniz',
 ' Cengiz Coşkun',
 ' Reshad Strik',
 ' Hande Subaşı',
 'Antti Pääkkönen',
 ' Heljä Heikkinen',
 ' Lynne Guaglione',
 ' Pasi Ruohonen',
 ' Rauno Ahonen',
 'Sola Sobowale',
 ' Adesua Etomi',
 ' Remilekun "Reminisce" Safaru',
 ' Tobechukwu "iLLbliss" Ejiofor',
 ' Toni Tones',
 ' Paul Sambo',
 ' Jide Kosoko',
 ' Sharon Ooja',
 'Will Arnett',
 ' Ludacris',
 ' Natasha Lyonne',
 ' Stanley Tucci',
 ' Jordin Sparks',
 ' Gabriel Iglesias',
 " Shaquille O'Neal",
 ' Omar Chaparro',
 ' Alan Cumming',
 ' Andy Beckwith',
 ' Delia Sheppard',
 ' Kerry Shale',
 'Ronnie Van Zandt',
 ' Gary Rossington',
 ' Allen Collins',
 ' Leon Wilkeson',
 ' Bob Burns',
 ' Billy Powell',
 ' Ed King',
 ' Artimus Pyle',
 ' Steve Gaines',
 ' Johnny Van Zant',
 'Mandy Grace',
 ' David de Vos',
 ' Donna Rusch',
 ' Devan Key',
 ' Isabella Mancuso',
 ' Ariana Guido',
 'Katie Douglas',
 ' Celina 
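
The names after the first one in each cast keep a leading space, as the output above shows. A possible pandas-only version of the same split, which also trims whitespace and skips missing casts, is sketched here on a made-up Series (`cast` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical cast column with one missing entry
cast = pd.Series(["Ann Lee, Bob Ray", None, "Bob Ray, Cy Dole"], name="cast")

actors = (
    cast.dropna()         # skip titles without a cast
        .str.split(",")   # one list of names per title
        .explode()        # one row per name
        .str.strip()      # remove the leading space left by ", "
)
print(actors.tolist())
```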

In [57]:
list=[]

# Same split as before, applied to the cast of every title
for i in z:
    x = str(i).split(',')
    for j in x:
        list.append(j)

#print (list)

#[[Brendan Gleeson, Taylor Kitsch, Gordon Pinsent], [Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar]]
    

In [85]:
#data = ','.join(list.read().strip().split(','))
# Count how many times each name appears in the list
data = {i:list.count(i) for i in list}

data

{'Engin Altan Düzyatan': 1,
 ' Serdar Gökhan': 1,
 ' Hülya Darcan': 1,
 ' Kaan Taşaner': 1,
 ' Esra Bilgiç': 1,
 ' Osman Soykut': 1,
 ' Serdar Deniz': 1,
 ' Cengiz Coşkun': 1,
 ' Reshad Strik': 1,
 ' Hande Subaşı': 1,
 'Antti Pääkkönen': 1,
 ' Heljä Heikkinen': 1,
 ' Lynne Guaglione': 1,
 ' Pasi Ruohonen': 1,
 ' Rauno Ahonen': 1,
 'Sola Sobowale': 1,
 ' Adesua Etomi': 3,
 ' Remilekun "Reminisce" Safaru': 1,
 ' Tobechukwu "iLLbliss" Ejiofor': 1,
 ' Toni Tones': 1,
 ' Paul Sambo': 1,
 ' Jide Kosoko': 3,
 ' Sharon Ooja': 3,
 'Will Arnett': 1,
 ' Ludacris': 1,
 ' Natasha Lyonne': 2,
 ' Stanley Tucci': 2,
 ' Jordin Sparks': 2,
 ' Gabriel Iglesias': 1,
 " Shaquille O'Neal": 1,
 ' Omar Chaparro': 1,
 ' Alan Cumming': 2,
 ' Andy Beckwith': 1,
 ' Delia Sheppard': 1,
 ' Kerry Shale': 1,
 'Ronnie Van Zandt': 1,
 ' Gary Rossington': 1,
 ' Allen Collins': 1,
 ' Leon Wilkeson': 1,
 ' Bob Burns': 1,
 ' Billy Powell': 1,
 ' Ed King': 1,
 ' Artimus Pyle': 1,
 ' Steve Gaines': 1,
 ' Johnny Van Zant': 1,
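
The dictionary comprehension above rescans the whole list for every element, and a name with a leading space counts as a different key from the same name without it. A minimal sketch of a single-pass count with `collections.Counter` follows. The `names` list is made up, and filtering out `"nan"` and `"None"` assumes that missing casts were converted to those strings by `str()`.

```python
from collections import Counter

# Hypothetical list in the same shape as the notebook's: stray spaces and a missing cast
names = ["Ann Lee", " Bob Ray", "Bob Ray", " Cy Dole", "nan"]

# Trim whitespace and skip the strings that str() produces for missing values
cleaned = [n.strip() for n in names if n.strip() not in ("nan", "None")]

# Counter tallies every name in one pass over the list
counts = Counter(cleaned)
print(counts.most_common(1))
```

`most_common(1)` returns the name together with its count, so no separate lookup is needed.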

In [89]:
max(data.values())

121
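
`max(data.values())` returns only the highest count, not the name it belongs to. Because `str(i)` turns a missing cast into a string, that maximum may also correspond to missing values rather than to an actor. The sketch below is one way to get the most frequent actor for every platform and year at once. The `toy` frame and its names are hypothetical, and ties are resolved by taking the first row that `idxmax` finds.

```python
import pandas as pd

# Hypothetical catalogue with the three columns the query needs
toy = pd.DataFrame({
    "platform": ["netflix", "netflix", "hulu"],
    "release_year": [2018, 2018, 2018],
    "cast": ["Ann Lee, Bob Ray", "Bob Ray", "Cy Dole"],
})

# One row per (title, actor), with missing casts removed and names trimmed
exploded = (
    toy.dropna(subset=["cast"])
       .assign(actor=lambda d: d["cast"].str.split(","))
       .explode("actor")
       .reset_index(drop=True)
)
exploded["actor"] = exploded["actor"].str.strip()

# Appearances of each actor within each platform and year
counts = (
    exploded.groupby(["platform", "release_year", "actor"])
            .size()
            .reset_index(name="appearances")
)

# Row with the highest count in each (platform, year) group
top = counts.loc[counts.groupby(["platform", "release_year"])["appearances"].idxmax()]
print(top)
```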

In [None]:
import json

# `file` is assumed to be an already-open text file with one JSON object per line
data = ';'.join(file.read().strip().split('\n'))

# Split on ";" so the separator is not confused with the commas
x = data.split(";")  # x is a list

lista = []  # full list of all rows

# Iterate over the list
for i in range(len(x)):
    y = json.loads(x[i])  # x[i] is a string; parse it as JSON
    lista.append(y)  # append to the list

In [136]:
# Export the combined DataFrame to the MySQL uploads folder
ruta = "C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/plataformas.csv"
e.to_csv(ruta, index=False, encoding="utf8")

In [137]:
# Read the file back to check the export
a = pd.read_csv("C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/plataformas.csv")
a

Unnamed: 0,index,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,platform
0,0,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,amazon
1,1,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,amazon
2,2,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,amazon
3,3,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ...",amazon
4,4,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,amazon
...,...,...,...,...,...,...,...,...,...,...,...,...,...
22993,22993,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a...",netflix
22994,22994,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",netflix
22995,22995,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,netflix
22996,22996,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",netflix
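
After the round trip the row labels match the "index" column, since `index=False` left the old repeated labels out of the file. A small sketch of the same write-and-read check is shown below. It uses an in-memory buffer instead of the MySQL uploads path, and the `toy` frame is hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame with non-ASCII text and a missing value
toy = pd.DataFrame({
    "index": [0, 1],
    "title": ["Zoë", "Beyoncé"],
    "cast": ["A, B", None],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Rewind and read the CSV back
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.shape == toy.shape)
print(restored["cast"].isna().tolist())
```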
