## Final Project Submission

Please fill out:
* Student name: Annah Mukethe
* Student pace: Part Time
* Instructor name: Samuel Mwangi


## PROJECT DESCRIPTION ##

**OVERVIEW**

In this project, we aim to assist Microsoft in entering the movie industry by leveraging exploratory data analysis (EDA) to uncover trends and patterns in box office performance. By examining a comprehensive dataset of films, we will identify the characteristics of successful movies, providing actionable insights to guide Microsoft's new movie studio in producing content that resonates with audiences and performs well financially.

## BUSINESS UNDERSTANDING ##

**BUSINESS PROBLEM**

Microsoft, inspired by the success of major companies producing original video content, has decided to launch its own movie studio. However, with no prior experience in film production, Microsoft faces a significant challenge in understanding the dynamics of the movie industry. The goal is to explore the current film landscape, specifically focusing on what types of films are achieving the best box office results. By identifying key factors that contribute to a movie's success, Microsoft can make informed decisions about the types of films to produce, ensuring a competitive entry into the market.

**OBJECTIVES**

1. **Analyze Market Trends:** Identify the genres, themes, and characteristics of movies that are currently performing well.
2. **Determine Key Success Factors:** Understand the elements that contribute to high performance, such as budget, runtime, release period, and critical reception.
3. **Provide Actionable Insights:** Translate the findings into strategic recommendations for the types of films Microsoft should create to maximize its chances of success in the competitive movie industry.

## DATA UNDERSTANDING ##

**DATA SOURCE**

**Data Source:** IMDb (Internet Movie Database) dataset

**Justification:** IMDb is a comprehensive and reliable source for movie data, including ratings, genres, and other relevant information.

**DATASET COLUMN DESCRIPTION**

The descriptions below cover the columns that result from merging two tables in the im.db database: movie_basics and movie_ratings.


**movie_id:** Unique identifier for each movie.

**primary_title:** The main title of the movie.

**original_title:** The original title of the movie in its native language.

**start_year:** The year the movie was released.

**runtime_minutes:** The duration of the movie in minutes.

**genres:** The categories or genres the movie belongs to (e.g., Drama, Comedy).

**averagerating:** The average rating given to the movie by viewers.

**numvotes:** The number of votes the movie has received from viewers.
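As a rough illustration of how these eight columns could come together, the sketch below joins the two tables once a database connection is available. The join key (`movie_id`) and the choice of an inner join are assumptions, as is the helper name `fetch_merged_rows`; an inner join keeps only movies that have a rating, which may or may not match the merge used later in the analysis.

```python
import sqlite3

# Assumption: movie_basics and movie_ratings are linked by movie_id,
# and only movies that appear in both tables are wanted (inner join).
MERGE_QUERY = """
SELECT b.movie_id,
       b.primary_title,
       b.original_title,
       b.start_year,
       b.runtime_minutes,
       b.genres,
       r.averagerating,
       r.numvotes
FROM movie_basics AS b
JOIN movie_ratings AS r
    ON b.movie_id = r.movie_id;
"""


def fetch_merged_rows(conn):
    """Hypothetical helper: return (column_names, rows) for the joined tables."""
    cursor = conn.execute(MERGE_QUERY)
    # cursor.description holds one tuple per result column; item 0 is the name.
    column_names = [description[0] for description in cursor.description]
    return column_names, cursor.fetchall()
```

In a notebook, the same `MERGE_QUERY` string could instead be passed to `pd.read_sql(MERGE_QUERY, conn)` to obtain a DataFrame directly.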

**IMPORTING LIBRARIES**

In [1]:
#importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlite3

**DATABASE CONNECTION**

In [2]:
conn = sqlite3.connect("im.db")
cursor = conn.cursor()

**SCHEMA INSPECTION**

In [3]:
#checking the tables that are available in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()

for table in tables:
    print(f"\nTable: {table[0]}")
    cursor.execute(f"PRAGMA table_info({table[0]});")
    columns = cursor.fetchall()
    print("Columns:")
    for column in columns:
        print(f" - {column[1]} (Type: {column[2]})")


Table: movie_basics
Columns:
 - movie_id (Type: TEXT)
 - primary_title (Type: TEXT)
 - original_title (Type: TEXT)
 - start_year (Type: INTEGER)
 - runtime_minutes (Type: REAL)
 - genres (Type: TEXT)

Table: directors
Columns:
 - movie_id (Type: TEXT)
 - person_id (Type: TEXT)

Table: known_for
Columns:
 - person_id (Type: TEXT)
 - movie_id (Type: TEXT)

Table: movie_akas
Columns:
 - movie_id (Type: TEXT)
 - ordering (Type: INTEGER)
 - title (Type: TEXT)
 - region (Type: TEXT)
 - language (Type: TEXT)
 - types (Type: TEXT)
 - attributes (Type: TEXT)
 - is_original_title (Type: REAL)

Table: movie_ratings
Columns:
 - movie_id (Type: TEXT)
 - averagerating (Type: REAL)
 - numvotes (Type: INTEGER)

Table: persons
Columns:
 - person_id (Type: TEXT)
 - primary_name (Type: TEXT)
 - birth_year (Type: REAL)
 - death_year (Type: REAL)
 - primary_profession (Type: TEXT)

Table: principals
Columns:
 - movie_id (Type: TEXT)
 - ordering (Type: INTEGER)
 - person_id (Type: TEXT)
 - category (Type: 

The output above shows that the database contains 8 tables: movie_basics, directors, known_for, movie_akas, movie_ratings, persons, principals, and writers. We will inspect each table to understand the data it holds and to determine which tables our analysis needs.
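One possible way to carry out that inspection is sketched below: for every table it collects the row count and a few sample rows. The helper name `preview_tables` and the default of five sample rows are illustrative assumptions, not part of the original notebook.

```python
import sqlite3


def preview_tables(conn, limit=5):
    """Hypothetical helper: return {table_name: (row_count, sample_rows)}."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    # Collect the names first so the cursor can be reused inside the loop.
    table_names = [row[0] for row in cursor.fetchall()]

    previews = {}
    for name in table_names:
        # Table names cannot be bound as parameters, so they are quoted
        # and interpolated; they come from sqlite_master, not user input.
        row_count = cursor.execute(f'SELECT COUNT(*) FROM "{name}";').fetchone()[0]
        sample_rows = cursor.execute(
            f'SELECT * FROM "{name}" LIMIT ?;', (limit,)
        ).fetchall()
        previews[name] = (row_count, sample_rows)
    return previews
```

Calling `preview_tables(conn)` with the connection opened earlier would give a quick sense of each table's size and contents before deciding which ones to keep.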