## The Microsoft Scifi Project - A&J Analytica

Microsoft sees all the big companies creating original video content, and they want to get in on the fun. They have decided to create a new movie studio, but they don't know anything about making movies. They also want to capture a niche and produce only sci-fi movies, so they have hired A&J Analytica to help them better understand the sci-fi genre. Your team is charged with analyzing the data and creating a presentation that explores which types of sci-fi films are currently doing best at the box office. You must then translate those findings into actionable insights that the CEO can use when deciding what type of films to create.

## DATASETS:

## Box Office Mojo Sci-Fi subgenre URLs:

Affliction	https://www.boxofficemojo.com/genre/sg1826615553/?ref_=bo_gs_table_14 

Space Opera	https://www.boxofficemojo.com/genre/sg2027942145/?ref_=bo_gs_table_31

Person vs. Machine	https://www.boxofficemojo.com/genre/sg3235901697/?ref_=bo_gs_table_93

Post-Apocalypse	https://www.boxofficemojo.com/genre/sg434041089/?ref_=bo_gs_table_36

Supernatural	https://www.boxofficemojo.com/genre/sg3185570049/?ref_=bo_gs_table_4

Alien Invasion	https://www.boxofficemojo.com/genre/sg4292866305/?ref_=bo_gs_table_9

Time Travel	https://www.boxofficemojo.com/genre/sg501149953/?ref_=bo_gs_table_11

Superhero	https://www.boxofficemojo.com/genre/sg2900226305/?ref_=bo_gs_table_12

Robot	https://www.boxofficemojo.com/genre/sg3722375425/?ref_=bo_gs_table_32

Future	https://www.boxofficemojo.com/genre/sg2799628545/?ref_=bo_gs_table_33
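To scrape every subgenre page with the same code, the URLs listed above can be kept in a dictionary keyed by subgenre name. This is only a sketch: the name `SUBGENRE_URLS` is hypothetical, and the URLs are copied verbatim from the list.

```python
# Hypothetical name; every URL is copied verbatim from the list above.
SUBGENRE_URLS = {
    "Affliction": "https://www.boxofficemojo.com/genre/sg1826615553/?ref_=bo_gs_table_14",
    "Space Opera": "https://www.boxofficemojo.com/genre/sg2027942145/?ref_=bo_gs_table_31",
    "Person vs. Machine": "https://www.boxofficemojo.com/genre/sg3235901697/?ref_=bo_gs_table_93",
    "Post-Apocalypse": "https://www.boxofficemojo.com/genre/sg434041089/?ref_=bo_gs_table_36",
    "Supernatural": "https://www.boxofficemojo.com/genre/sg3185570049/?ref_=bo_gs_table_4",
    "Alien Invasion": "https://www.boxofficemojo.com/genre/sg4292866305/?ref_=bo_gs_table_9",
    "Time Travel": "https://www.boxofficemojo.com/genre/sg501149953/?ref_=bo_gs_table_11",
    "Superhero": "https://www.boxofficemojo.com/genre/sg2900226305/?ref_=bo_gs_table_12",
    "Robot": "https://www.boxofficemojo.com/genre/sg3722375425/?ref_=bo_gs_table_32",
    "Future": "https://www.boxofficemojo.com/genre/sg2799628545/?ref_=bo_gs_table_33",
}

# Each (subgenre, url) pair can be passed to the same scraping routine,
# and the key doubles as the subgenre label for the scraped rows.
for subgenre, url in SUBGENRE_URLS.items():
    print(f"{subgenre}: {url}")
```

Keeping the subgenre name next to its URL makes it straightforward to tag each scraped film with its subgenre before the datasets are merged.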

## API: 

https://developers.themoviedb.org/3/getting-started/introduction
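One possible way to pull sci-fi titles from this API is the v3 `discover/movie` endpoint filtered by genre. The sketch below assumes a v3 API key passed as the `api_key` query parameter and uses TMDB's Science Fiction genre id, 878. The helper names `build_discover_url` and `extract_movies` are hypothetical, and the sample payload is a made-up placeholder that only mimics the response shape.

```python
from urllib.parse import urlencode

# TMDB v3 discover endpoint; 878 is TMDB's genre id for Science Fiction.
DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"
SCIFI_GENRE_ID = 878


def build_discover_url(api_key, page=1):
    """Return a discover URL for one page of sci-fi movies (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "with_genres": SCIFI_GENRE_ID,
        "sort_by": "revenue.desc",
        "page": page,
    }
    return f"{DISCOVER_URL}?{urlencode(params)}"


def extract_movies(payload):
    """Keep a few fields from a decoded discover response (hypothetical helper)."""
    return [
        {
            "id": movie.get("id"),
            "title": movie.get("title"),
            "release_date": movie.get("release_date"),
            "vote_average": movie.get("vote_average"),
        }
        for movie in payload.get("results", [])
    ]


# Placeholder payload in the shape of a discover response; the values are made up.
sample_payload = {
    "page": 1,
    "results": [
        {"id": 0, "title": "Placeholder Title",
         "release_date": "2000-01-01", "vote_average": 0.0},
    ],
}

print(build_discover_url("YOUR_API_KEY"))
print(extract_movies(sample_payload))
```

In the notebook, the payload would come from something like `requests.get(build_discover_url(key)).json()`. Discover results do not include revenue, so revenue figures would likely need a follow-up request per film to the movie details endpoint.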

## Checklist:

- Organization/Code Cleanliness

     Your notebook should contain 1 - 2 paragraphs briefly explaining your approach to this project.
     
     
- Visualizations & EDA

    Your project contains at least 4 meaningful data visualizations, with corresponding interpretations. All visualizations are well labeled with axis labels, a title, and a legend (when appropriate).
    
    You pose at least 3 meaningful questions and answer them through EDA. These questions should be well labeled and easy to identify inside the notebook.
    
     Level Up: Each question is clearly answered with a visualization that makes the answer easy to understand.
   


- Questions:

    -What sci-fi film has generated the highest revenue? 
    
    -What sci-fi film has generated the lowest revenue?
    
    -What percentage of sci-fi films have a rating of 4/90% or higher?
    
    -What is the subgenre breakdown of the top 10 most popular sci-fi films? (Popularity is determined by rating or by revenue generated.)
    
    -What is the average total revenue of all the films? Grouped by subgenres? Grouped by decade?
    
    -What is the correlation of revenue and rating?
    
    -Overall suggestion and final thoughts to Microsoft? 
    
    -What is the distribution like?
    
    -Are there any outliers?
    
    -How do the different subgenres differ from one another in terms of success? Is there a correlation between them?
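Several of these questions reduce to a max, a min, or a grouped average over revenue. The sketch below illustrates the mechanics on five rows copied from the Robot subgenre output printed later in this notebook. It is a small sample, so the printed values are not answers to the questions themselves, and the variable names are hypothetical.

```python
from collections import defaultdict

# (title, lifetime gross in USD, release year), copied from the Robot
# subgenre output printed later in this notebook.
films = [
    ("Avengers: Age of Ultron", 459_005_868, 2015),
    ("Transformers: Revenge of the Fallen", 402_111_870, 2009),
    ("Transformers", 319_246_193, 2007),
    ("Terminator 2: Judgment Day", 204_843_345, 1991),
    ("I, Robot", 144_801_023, 2004),
]

# Highest and lowest grossing films in the sample.
highest = max(films, key=lambda film: film[1])
lowest = min(films, key=lambda film: film[1])

# Group lifetime gross by release decade (2015 -> 2010, 1991 -> 1990).
gross_by_decade = defaultdict(list)
for title, gross, year in films:
    gross_by_decade[year // 10 * 10].append(gross)

# Average gross per decade, in chronological order.
average_by_decade = {
    decade: sum(values) / len(values)
    for decade, values in sorted(gross_by_decade.items())
}

print("Highest in sample:", highest[0])
print("Lowest in sample:", lowest[0])
for decade, average in average_by_decade.items():
    print(f"{decade}s: ${average:,.0f}")
```

Once the full dataset is in a pandas DataFrame, the same grouping could be done with `groupby` on a decade or subgenre column.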
   

In [12]:
#import packages needed

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('fivethirtyeight')
import numpy as np
import json
import requests
from bs4 import BeautifulSoup
%matplotlib inline

## API DATA SET:

In [4]:
#code

In [None]:
#code

In [None]:
#code

In [None]:
#code

In [None]:
#visualization

## WEBSCRAPING DATA SET:

In [103]:
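# Request the Box Office Mojo page for the Robot subgenre; the timeout keeps the cell from hanging
page = requests.get("https://www.boxofficemojo.com/genre/sg3722375425/?ref_=bo_gs_table_32", timeout=10)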
page


<Response [200]>

In [104]:
#page.content

In [106]:
soup = BeautifulSoup(page.content, 'html.parser')
#soup

In [16]:
#print(soup.prettify())

In [18]:
#list(soup.children)


In [21]:
len(list(soup.children))

2

In [22]:
[type(item) for item in list(soup.children)]


[bs4.element.Doctype, bs4.element.Tag]

In [107]:
html = list(soup.children)[1]


In [25]:
#html

In [30]:
#list(html.children)

In [108]:
body = list(html.children)[1]


In [48]:
#print(body.prettify())

In [109]:
titles = soup.find_all('td', class_='a-text-left mojo-field-type-title')

In [110]:
lifetime_gross = soup.find_all('td', class_='a-text-right mojo-field-type-money mojo-sort-column mojo-estimatable')

In [111]:
date_release = soup.find_all('td', class_='a-text-left mojo-field-type-date a-nowrap')

In [112]:
lifetime_gross[0].get_text()

'$459,005,868'

In [93]:
# for date in date_release:
#     print(date.get_text())

In [94]:
# for title in titles:
#     print(title.get_text())

In [95]:
# for gross in lifetime_gross:
#     print(gross.get_text())

In [114]:
print(len(titles))
print(len(lifetime_gross))
print(len(date_release))

63
63
63


In [113]:
# Walk the three parallel result lists together rather than hard-coding the row count
for title, gross, date in zip(titles, lifetime_gross, date_release):
    print('Title:', title.get_text())
    print('Lifetime Gross: ', gross.get_text())
    print('Date Released: ', date.get_text())
    print()

Title: Avengers: Age of Ultron
Lifetime Gross:  $459,005,868
Date Released:  May 1, 2015

Title: Transformers: Revenge of the Fallen
Lifetime Gross:  $402,111,870
Date Released:  Jun 24, 2009

Title: Transformers: Dark of the Moon
Lifetime Gross:  $352,390,543
Date Released:  Jun 29, 2011

Title: Transformers
Lifetime Gross:  $319,246,193
Date Released:  Jul 3, 2007

Title: Transformers: Age of Extinction
Lifetime Gross:  $245,439,076
Date Released:  Jun 27, 2014

Title: X-Men: Days of Future Past
Lifetime Gross:  $233,921,534
Date Released:  May 23, 2014

Title: WALL·E
Lifetime Gross:  $223,808,164
Date Released:  Jun 27, 2008

Title: Big Hero 6
Lifetime Gross:  $222,527,828
Date Released:  Nov 7, 2014

Title: Terminator 2: Judgment Day
Lifetime Gross:  $204,843,345
Date Released:  Jul 3, 1991

Title: Terminator 3: Rise of the Machines
Lifetime Gross:  $150,371,112
Date Released:  Jul 2, 2003

Title: I, Robot
Lifetime Gross:  $144,801,023
Date Released:  Jul 16, 2004

Title: Transform
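The scraped values are still strings, so revenue and dates need converting before any averaging or grouping. The sketch below assumes every gross cell looks like `$459,005,868` and every date cell looks like `May 1, 2015`, as in the output above. Rows with missing or differently formatted values would need extra handling. The helper names are hypothetical.

```python
from datetime import datetime


def parse_gross(text):
    """Convert a string such as '$459,005,868' to an integer number of dollars."""
    return int(text.replace("$", "").replace(",", "").strip())


def parse_release_date(text):
    """Convert a string such as 'May 1, 2015' to a datetime.date."""
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


# Values copied from the first printed row above.
print(parse_gross("$459,005,868"))
print(parse_release_date("May 1, 2015"))
```

Applied to the lists above, these helpers would take `gross.get_text()` and `date.get_text()` as input, giving numeric and date columns that can be sorted, averaged, and grouped by decade.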

## Merged Datasets:

In [None]:
#more viz

In [None]:
#more code

In [None]:
#blahblahblah

## Final Thoughts & Summary: