# Jupyter Notebook Table of Contents:

This notebook covers the pre-processing and analysis of data obtained by running SQL queries on the IMDb database that was provided to us in class. The data was analyzed and visualized in Python (Pandas and Plotly). Further visualization was done in Adobe Illustrator.

# Analysis 1A:
Visualizing Julie Andrews', Bruce Lee's, Kirk Douglas', and Audrey Hepburn's active acting careers.

In [1]:
import pandas as pd 
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from IPython.display import Image

In [2]:
julie_andrews_df = pd.read_excel("julieandrews_activecareer.xlsx")
julie_andrews_df.head()

Unnamed: 0,Actor,Year,Number of Movies
0,Julie Andrews,1960,1
1,Julie Andrews,1964,2
2,Julie Andrews,1965,1
3,Julie Andrews,1966,2
4,Julie Andrews,1967,1


In [3]:
bruce_lee_df = pd.read_excel("brucelee_activecareer.xlsx")
bruce_lee_df.head()

Unnamed: 0,Actor,Year,Number of Movies
0,Bruce Lee,1946,1
1,Bruce Lee,1950,1
2,Bruce Lee,1951,1
3,Bruce Lee,1953,3
4,Bruce Lee,1955,2


In [4]:
audrey_hepburn_df = pd.read_excel("audreyhepburn_activecareer.xlsx")
audrey_hepburn_df.head()

Unnamed: 0,Actor,Year,Number of Movies
0,Audrey Hepburn,1952,1
1,Audrey Hepburn,1953,2
2,Audrey Hepburn,1954,1
3,Audrey Hepburn,1956,1
4,Audrey Hepburn,1957,2


In [5]:
kirk_douglas_df = pd.read_excel("kirkdouglas_activecareer.xlsx")
kirk_douglas_df.head()

Unnamed: 0,Actor,Year,Number of Movies
0,Kirk Douglas,1946,1
1,Kirk Douglas,1947,2
2,Kirk Douglas,1948,2
3,Kirk Douglas,1949,2
4,Kirk Douglas,1950,2


In [6]:
JA_fig = px.bar(julie_andrews_df, x = "Year", y = "Number of Movies", 
                title = "Julie Andrews' Number of Movies During Active Acting Career")
JA_fig.update_xaxes(tickmode='linear')

JA_fig.update_traces(marker_color='#1f77b4')


JA_fig.show()

In [7]:
BL_fig = px.bar(bruce_lee_df, x = "Year", y = "Number of Movies", title = "Bruce Lee's Number of Movies During Active Acting Career")
BL_fig.update_xaxes(tickmode='linear')

BL_fig.update_traces(marker_color='#d62728') 

BL_fig.show()

In [8]:
AH_fig = px.bar(audrey_hepburn_df, x = "Year", y = "Number of Movies", title = "Audrey Hepburn's Number of Movies During Active Acting Career")
AH_fig.update_xaxes(tickmode='linear')

AH_fig.update_traces(marker_color='#17becf') 

AH_fig.show()

In [9]:
KD_fig = px.bar(kirk_douglas_df, x = "Year", y = "Number of Movies", title = "Kirk Douglas' Number of Movies During Active Acting Career")
KD_fig.update_xaxes(tickmode='linear')

KD_fig.show()

In [10]:
df = pd.concat([julie_andrews_df,bruce_lee_df,audrey_hepburn_df,kirk_douglas_df], axis = 0)
df.head()

Unnamed: 0,Actor,Year,Number of Movies
0,Julie Andrews,1960,1
1,Julie Andrews,1964,2
2,Julie Andrews,1965,1
3,Julie Andrews,1966,2
4,Julie Andrews,1967,1


In [11]:
fig = px.bar(df, x='Year', y='Number of Movies',
             color='Actor',height=400, 
             title = "Distribution of Movies Actors/Actresses Starred in Across Active Acting Careers")

fig.update_xaxes(tickmode='linear')


fig.show()
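
The stacked chart only draws bars for years that appear in the data, so it can help to view the combined frame as a year-by-actor table with explicit zeros. The following is a minimal sketch and not part of the original analysis. It uses only the first five rows per actor shown in the `.head()` outputs above, and the names `sample_df` and `wide` are hypothetical.

```python
import pandas as pd

# Sample rows copied from the .head() outputs above (first five rows per actor only).
rows = [
    ("Julie Andrews", 1960, 1), ("Julie Andrews", 1964, 2), ("Julie Andrews", 1965, 1),
    ("Julie Andrews", 1966, 2), ("Julie Andrews", 1967, 1),
    ("Bruce Lee", 1946, 1), ("Bruce Lee", 1950, 1), ("Bruce Lee", 1951, 1),
    ("Bruce Lee", 1953, 3), ("Bruce Lee", 1955, 2),
    ("Audrey Hepburn", 1952, 1), ("Audrey Hepburn", 1953, 2), ("Audrey Hepburn", 1954, 1),
    ("Audrey Hepburn", 1956, 1), ("Audrey Hepburn", 1957, 2),
    ("Kirk Douglas", 1946, 1), ("Kirk Douglas", 1947, 2), ("Kirk Douglas", 1948, 2),
    ("Kirk Douglas", 1949, 2), ("Kirk Douglas", 1950, 2),
]
sample_df = pd.DataFrame(rows, columns=["Actor", "Year", "Number of Movies"])

# One row per year, one column per actor; actor/year pairs with no film become 0.
wide = sample_df.pivot_table(index="Year", columns="Actor",
                             values="Number of Movies", aggfunc="sum", fill_value=0)

# Reindex to every year in the observed range so gap years are explicit rows of zeros.
all_years = list(range(int(sample_df["Year"].min()), int(sample_df["Year"].max()) + 1))
wide = wide.reindex(all_years, fill_value=0)
wide.index.name = "Year"

print(wide.loc[[1946, 1953, 1958]])
```

A wide table like this makes gap years visible at a glance, which the long-format frame passed to `px.bar` does not.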

# Analysis 1B and 1C 

Analyzing active careers in any genre against all other actors and actresses, and exploring overlapping careers among the chosen actors/actresses.
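
One way to make "overlapping careers" concrete is to intersect the sets of years in which each pair of actors released a film. This is a minimal sketch under stated assumptions: it uses only the first five rows per actor from the `.head()` outputs in Analysis 1A, and `sample_df`, `active_years`, and `overlaps` are hypothetical names that do not appear in the original notebook.

```python
from itertools import combinations

import pandas as pd

# Sample rows copied from the .head() outputs in Analysis 1A (first five rows per actor).
rows = [
    ("Julie Andrews", 1960, 1), ("Julie Andrews", 1964, 2), ("Julie Andrews", 1965, 1),
    ("Julie Andrews", 1966, 2), ("Julie Andrews", 1967, 1),
    ("Bruce Lee", 1946, 1), ("Bruce Lee", 1950, 1), ("Bruce Lee", 1951, 1),
    ("Bruce Lee", 1953, 3), ("Bruce Lee", 1955, 2),
    ("Audrey Hepburn", 1952, 1), ("Audrey Hepburn", 1953, 2), ("Audrey Hepburn", 1954, 1),
    ("Audrey Hepburn", 1956, 1), ("Audrey Hepburn", 1957, 2),
    ("Kirk Douglas", 1946, 1), ("Kirk Douglas", 1947, 2), ("Kirk Douglas", 1948, 2),
    ("Kirk Douglas", 1949, 2), ("Kirk Douglas", 1950, 2),
]
sample_df = pd.DataFrame(rows, columns=["Actor", "Year", "Number of Movies"])

# Set of years with at least one film, per actor.
active_years = sample_df.groupby("Actor")["Year"].apply(set)

# Shared years for every pair of actors (empty list means no overlap in the sample).
overlaps = {}
for a, b in combinations(sorted(active_years.index), 2):
    overlaps[(a, b)] = sorted(int(y) for y in active_years[a] & active_years[b])

for pair, years in overlaps.items():
    print(pair, years)
```

On the full data the same intersection would run over every year in each career, so the overlaps would be larger than in this five-row sample.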

In [12]:
activecareer_df = pd.read_excel("activecareers_all.xlsx")

In [13]:
fig = px.bar(activecareer_df, x="Year", y="Number of Movies", color='Actor', height = 500,
            title = "Distribution of Movies Actors/Actresses Starred in Across Active Acting Careers")
fig.update_xaxes(tickmode='linear')

fig.show()

In [14]:
df_4661 = pd.read_excel("activecareer_46_61.xlsx")

df_6281 = pd.read_excel("activecareer_62_81.xlsx")

df_8201 = pd.read_excel("activecareer_82_01.xlsx")

df_0310 = pd.read_excel("activecareer_03_10.xlsx")

In [17]:
activeyears_df = pd.concat([df_4661,df_6281,df_8201,df_0310])
activeyears_df.sort_values(by=['Period'])

Unnamed: 0,Actor,Period,Number of Movies
0,Bruce Lee,1946-1961,11
1,Kirk Douglas,1946-1961,32
2,Audrey Hepburn,1946-1961,10
3,Julie Andrews,1946-1961,1
0,Bruce Lee,1962-1981,4
1,Kirk Douglas,1962-1981,28
2,Audrey Hepburn,1962-1981,9
3,Julie Andrews,1962-1981,12
0,Kirk Douglas,1982-2001,6
1,Julie Andrews,1982-2001,8
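
The period totals above were loaded from separate spreadsheets. As an alternative, a similar table could be derived from per-year data with `pd.cut`. This is a minimal sketch, not the method used in the notebook. It again uses only the five-row samples from Analysis 1A, so its totals are smaller than the ones in the table. The bin edges are an assumption chosen to match the period labels, and with these edges the year 2002 would fall into the bin labelled "2003-2010".

```python
import pandas as pd

# Sample rows copied from the .head() outputs in Analysis 1A (first five rows per actor).
rows = [
    ("Julie Andrews", 1960, 1), ("Julie Andrews", 1964, 2), ("Julie Andrews", 1965, 1),
    ("Julie Andrews", 1966, 2), ("Julie Andrews", 1967, 1),
    ("Bruce Lee", 1946, 1), ("Bruce Lee", 1950, 1), ("Bruce Lee", 1951, 1),
    ("Bruce Lee", 1953, 3), ("Bruce Lee", 1955, 2),
    ("Audrey Hepburn", 1952, 1), ("Audrey Hepburn", 1953, 2), ("Audrey Hepburn", 1954, 1),
    ("Audrey Hepburn", 1956, 1), ("Audrey Hepburn", 1957, 2),
    ("Kirk Douglas", 1946, 1), ("Kirk Douglas", 1947, 2), ("Kirk Douglas", 1948, 2),
    ("Kirk Douglas", 1949, 2), ("Kirk Douglas", 1950, 2),
]
yearly = pd.DataFrame(rows, columns=["Actor", "Year", "Number of Movies"])

# Right-inclusive bins: (1945, 1961], (1961, 1981], (1981, 2001], (2001, 2010].
bins = [1945, 1961, 1981, 2001, 2010]
labels = ["1946-1961", "1962-1981", "1982-2001", "2003-2010"]
yearly["Period"] = pd.cut(yearly["Year"], bins=bins, labels=labels).astype(str)

# Sum the yearly counts within each actor/period pair.
by_period = (
    yearly.groupby(["Actor", "Period"])["Number of Movies"]
    .sum()
    .reset_index()
)
print(by_period)
```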


In [18]:
fig = px.bar(activeyears_df, x="Period", y="Number of Movies", color='Actor', height = 500,
            title = "Distribution of Movies Actors/Actresses Starred in Across Active Acting Careers")
fig.update_xaxes(tickmode='linear')

fig.show()

In [19]:
labels = df_4661["Actor"]
values = df_4661["Number of Movies"]

colors = ['rgb(115,41,68)', 'rgb(79, 129, 102)','rgb(242,153,102)', 'rgb(36, 73, 147)']
# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3, marker_colors = colors)])

fig.update_layout(
    title_text="Proportion of Movies Actors/Actresses of Interest Starred in (1946-1961)")
fig.show()

In [20]:
labels = df_6281["Actor"]
values = df_6281["Number of Movies"]

colors = ['rgb(115,41,68)', 'rgb(79, 129, 102)','rgb(242,153,102)', 'rgb(36, 73, 147)']

# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3, marker_colors = colors)])

fig.update_layout(
    title_text="Proportion of Movies Actors/Actresses of Interest Starred in (1962-1981)")
fig.show()

In [21]:
labels = df_8201["Actor"]
values = df_8201["Number of Movies"]
colors = ['rgb(79, 129, 102)', 'rgb(36, 73, 147)']

# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3, marker_colors = colors)])

fig.update_layout(
    title_text="Proportion of Movies Actors/Actresses of Interest Starred in (1982-2001)")
fig.show()
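
The `colors` lists in these cells are positional, so each actor keeps the same color across charts only if the rows stay in the same order. A small lookup keyed by actor name avoids that dependency. This is a sketch under an assumption: the actor-to-color pairing is inferred from the row order shown in the period table above (Bruce Lee, Kirk Douglas, Audrey Hepburn, Julie Andrews), and `ACTOR_COLORS` and `colors_for` are hypothetical names.

```python
import pandas as pd

# Colors copied from the donut-chart cells; the pairing with actors is inferred
# from the 1946-1961 row order and is an assumption, not stated in the notebook.
ACTOR_COLORS = {
    "Bruce Lee": "rgb(115,41,68)",
    "Kirk Douglas": "rgb(79, 129, 102)",
    "Audrey Hepburn": "rgb(242,153,102)",
    "Julie Andrews": "rgb(36, 73, 147)",
}

def colors_for(labels):
    """Return one color per label, in the same order as the labels."""
    return [ACTOR_COLORS[name] for name in labels]

# Row order of the 1982-2001 frame as shown in the period table.
labels_8201 = pd.Series(["Kirk Douglas", "Julie Andrews"], name="Actor")
print(colors_for(labels_8201))
```

The resulting list could be passed as `marker_colors` in place of the hand-written `colors` list.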

In [22]:
labels = df_0310["Actor"]
values = df_0310["Number of Movies"]
colors = ['rgb(79, 129, 102)', 'rgb(36, 73, 147)']

# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3, marker_colors =  colors)])

fig.update_layout(
    title_text="Proportion of Movies Actors/Actresses of Interest Starred in (2003-2010)")
fig.show()
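
The percentages that the donut charts display can also be computed directly, which is handy for quoting exact figures. The sketch below uses the period totals from the table printed earlier in this section. Rows for 2003-2010 are omitted because they do not appear in that output, and `period_df` is a hypothetical name.

```python
import pandas as pd

# Period totals copied from the activeyears_df output above.
period_df = pd.DataFrame(
    [
        ("Bruce Lee", "1946-1961", 11), ("Kirk Douglas", "1946-1961", 32),
        ("Audrey Hepburn", "1946-1961", 10), ("Julie Andrews", "1946-1961", 1),
        ("Bruce Lee", "1962-1981", 4), ("Kirk Douglas", "1962-1981", 28),
        ("Audrey Hepburn", "1962-1981", 9), ("Julie Andrews", "1962-1981", 12),
        ("Kirk Douglas", "1982-2001", 6), ("Julie Andrews", "1982-2001", 8),
    ],
    columns=["Actor", "Period", "Number of Movies"],
)

# Each row's share of its period total, as a percentage rounded to one decimal.
period_total = period_df.groupby("Period")["Number of Movies"].transform("sum")
period_df["Share (%)"] = (100 * period_df["Number of Movies"] / period_total).round(1)

print(period_df)
```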

# Analysis Visualizations:
1. Analysis 1A/1B Visualization (exploring the active careers of 4 actors/actresses)
2. Analysis 1B/1C Visualization (exploring the distribution and proportional differences among the actors and actresses where their active careers overlap)

![ActiveCareers.png](attachment:ActiveCareers.png)

![Proportion%20of%20Movies%20ActorsActresses%20Starred%20Between%201946-2010%281%29-3.png](attachment:Proportion%20of%20Movies%20ActorsActresses%20Starred%20Between%201946-2010%281%29-3.png)

![Proportion%20of%20Movies%20ActorsActresses%20Starred%20Between%201946-2010%281%29%20copy.png](attachment:Proportion%20of%20Movies%20ActorsActresses%20Starred%20Between%201946-2010%281%29%20copy.png)