# Recommendations for Microsoft's New Movie Studio

<img src="images/microsoft_logo.jpg" style="width:400px;height:170px"/>

**Author**: Martin Reyes

## Project Description

Analyzing data to come up with recommendations for Microsoft's new movie business venture.


<hr style="border:1px solid gray"> </hr>

## Overview

This project analyzes which types of movies Microsoft should produce at its new movie studio. The analysis shows that Microsoft should focus on the Documentary, Biography, Drama, Romance, and Adventure genres. It also shows that the best time to release these movies is at the end of the week, in March and April or in November and December. Microsoft can use these recommendations to earn better ratings and higher profit.

<hr style="border:1px solid gray"> </hr>

## Business Problem


<img src="images/top_rated_movies.jpg" style="width:400px;height:170px"/>


Microsoft has decided to create a new movie studio and needs to determine what types of films to produce. Ratings generally reflect viewers' (consumers') attitudes toward a movie, so they are the main subject of this analysis. Each movie genre is analyzed to see which ones produce the highest ratings, and ratings are compared by release timing. Lastly, the top genres are analyzed to see how their ratings have trended over the past five years.

<hr style="border:1px solid gray"> </hr>

## Data

Data from IMDb, TMDb, and The Numbers provide ratings, profit, release dates, and other measures for thousands of movies. This data can be found in the `data` folder.
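The sources are not guaranteed to share a movie identifier, so one plausible way to combine them is an inner join on a normalized title plus release year. The sketch below assumes that approach. The records, the field names (`title`, `movie`, `release_year`, `production_budget`, `worldwide_gross`), and the profit definition (worldwide gross minus production budget) are illustrative assumptions, not necessarily what the notebook does.

```python
# Hypothetical, simplified records; the real files in `data` have more
# columns and may use different column names.
imdb_ratings = [
    {"title": "Example Movie", "year": 2015, "genres": "Drama,Romance", "rating": 7.1},
    {"title": "Another Film", "year": 2017, "genres": "Horror", "rating": 5.2},
]
numbers_budgets = [
    {"movie": "Example Movie ", "release_year": 2015,
     "production_budget": 20_000_000, "worldwide_gross": 55_000_000},
    {"movie": "Unmatched Title", "release_year": 2018,
     "production_budget": 5_000_000, "worldwide_gross": 1_000_000},
]


def normalize(title: str) -> str:
    """Lower-case a title and collapse whitespace so the sources can match."""
    return " ".join(title.lower().split())


def merge_sources(ratings, budgets):
    """Inner-join ratings and budgets on (normalized title, release year).

    Profit is assumed to be worldwide gross minus production budget.
    """
    by_key = {(normalize(b["movie"]), b["release_year"]): b for b in budgets}
    merged = []
    for r in ratings:
        b = by_key.get((normalize(r["title"]), r["year"]))
        if b is None:
            continue  # no budget record for this movie, so drop it
        merged.append({
            **r,
            "production_budget": b["production_budget"],
            "profit": b["worldwide_gross"] - b["production_budget"],
        })
    return merged


movies = merge_sources(imdb_ratings, numbers_budgets)
print(movies)
```

Title matching is fragile (remakes, re-releases, alternate titles), which is why the year is included in the join key.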

<hr style="border:1px solid gray"> </hr>

## Methods

This project uses descriptive analysis, including categorical comparisons and analysis of trends over time. This supports recommendations and insights across categories; in this case, the categories are movie genres.

<hr style="border:1px solid gray"> </hr>

## Key Findings

### Which movie genres produce the best ratings?

The "Ratings by Genre" bar graph shows that the top-rated genres are Documentary, Biography, Drama, Romance, and Adventure. The lowest-rated genres are Horror, Sci-Fi, Thriller, Mystery, and Action.

![ratings_by_genre](images/ratings_genre.png)
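A genre comparison like this amounts to a group-by-and-average. The sketch below assumes each movie carries a comma-separated genre string (as IMDb genre fields do) and counts a movie once toward each of its genres. The sample rows are made up, and the notebook may weight or filter movies differently.

```python
from collections import defaultdict

# Hypothetical sample rows: (comma-separated genres, average rating).
movies = [
    ("Documentary", 7.8),
    ("Drama,Romance", 6.9),
    ("Horror,Thriller", 5.1),
    ("Drama", 7.2),
]


def mean_rating_by_genre(rows):
    """Return {genre: mean rating}, counting a movie once per listed genre."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genres, rating in rows:
        for genre in genres.split(","):
            totals[genre] += rating
            counts[genre] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Rank genres from highest to lowest mean rating.
ranked = sorted(mean_rating_by_genre(movies).items(),
                key=lambda kv: kv[1], reverse=True)
for genre, avg in ranked:
    print(f"{genre:12s} {avg:.2f}")
```

In practice, genres with very few movies can produce unstable averages, so a minimum count per genre may be worth applying.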



### When is the best time to release movies by weekday and month?

The "Rating by Day of Week" and "Rating by Month" line charts show that, according to ratings, the optimal time to release a movie is at the end of the week, in March and April or in November and December.

![ratings_by_D&M](images/ratings_by_day_and_mon.png)
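Grouping by release timing works the same way, with the key derived from the release date. This is a minimal sketch using made-up dates and ratings. It keys days by `date.weekday()` (Monday is 0) and months by number to avoid locale-dependent names.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample: (release date, rating).
releases = [
    (date(2016, 3, 4), 7.0),    # Friday
    (date(2017, 4, 7), 6.8),    # Friday
    (date(2016, 7, 12), 6.1),   # Tuesday
    (date(2018, 12, 21), 7.3),  # Friday
]


def mean_by(rows, key):
    """Group ratings by key(release_date) and average each group."""
    groups = defaultdict(list)
    for released, rating in rows:
        groups[key(released)].append(rating)
    return {k: mean(v) for k, v in sorted(groups.items())}


# weekday(): Monday == 0 ... Sunday == 6; month: 1 ... 12
by_weekday = mean_by(releases, lambda d: d.weekday())
by_month = mean_by(releases, lambda d: d.month)
print(by_weekday)
print(by_month)
```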


### How are the top genres trending in recent years?

None of the top genres showed an upward trend in ratings, and Crime and Biography were the only two to show a slight decline in recent years. This suggests that ratings by genre are unlikely to change drastically over time.

![ratings_trends](images/rating_trends.png)
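One simple way to quantify "uptrend" or "decline" is the least-squares slope of a genre's mean rating against release year. The sketch below assumes that measure; the notebook may judge trends visually instead. The yearly values are made up.

```python
# Hypothetical mean ratings per genre and year (made-up numbers).
yearly = {
    "Drama": {2014: 6.4, 2015: 6.5, 2016: 6.4, 2017: 6.5, 2018: 6.4},
    "Crime": {2014: 6.3, 2015: 6.2, 2016: 6.2, 2017: 6.1, 2018: 6.0},
}


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (rating change per year)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


trends = {}
for genre, series in yearly.items():
    years = sorted(series)
    trends[genre] = ols_slope(years, [series[y] for y in years])
    print(f"{genre}: {trends[genre]:+.3f} rating points per year")
```

A slope near zero corresponds to a flat trend. With only five yearly points, small slopes like these are easily within noise.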

### Are ratings and popularity correlated with profit?

Profit has a moderate, positive correlation with production budget, but only a weak, positive correlation with ratings and runtime.

![profit_heatmap](images/heatmap1.png)
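A correlation heatmap like this one shows pairwise Pearson coefficients. The sketch below computes the profit column of such a matrix by hand. The rows are made-up numbers chosen only to make the code runnable, so its output does not reproduce the finding above.

```python
from math import sqrt

# Hypothetical merged rows (budget and profit in millions of dollars).
rows = [
    {"budget": 10, "rating": 6.0, "runtime": 95, "profit": 15},
    {"budget": 50, "rating": 6.8, "runtime": 120, "profit": 60},
    {"budget": 100, "rating": 6.5, "runtime": 130, "profit": 150},
    {"budget": 30, "rating": 7.2, "runtime": 105, "profit": 20},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


profit = [row["profit"] for row in rows]
correlations = {col: pearson([row[col] for row in rows], profit)
                for col in ("budget", "rating", "runtime")}
for col, coef in correlations.items():
    print(f"profit vs {col}: r = {coef:+.2f}")
```

Pearson's r only measures linear association, and a high correlation between budget and profit does not by itself show that larger budgets cause higher profit.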



<hr style="border:1px solid gray"> </hr>

## Conclusions

Recommendations based on the analyses:
* **For better ratings, focus on producing the top genres: Documentary, Biography, Drama, Romance, and Adventure.** Conversely, be cautious with the lower-rated genres: Horror, Sci-Fi, Thriller, Mystery, and Action.
* **Release movies during the optimal months: March, April, November, and December.** Consider the importance and popularity of these months.
* **Expect the genres to maintain similar ratings over the next few years.**

### Next Steps

Further analyses can help better understand ratings, genre, and profitability:
* **Explore what other correlations exist among ratings, profit, and other metrics.** Investigate what might drive these correlations.
* **See if certain dates (like holidays) affect ratings and profit.** Grouping by genre can also help determine what types of movies to release during these dates.
* **Try to forecast and predict certain metrics like profit and revenue.** This can be done with predictive modeling or time-series analysis.

<hr style="border:1px solid gray"> </hr>

## For More Information

Please review the full analysis in [the Jupyter Notebook](./Movie_Recommendation_Full_EDA.ipynb) or the [presentation](./Movie_Recommendations_Presentation.pdf).

For additional questions, please contact **Martin Reyes** at **martinreyes.eng@gmail.com**.

<hr style="border:1px solid gray"> </hr>

## Repository Structure

```
├── data
├── images
├── Movie_Recommendation_Full_EDA.ipynb
├── README.md
├── Movie_Recommendations_Presentation.pdf
```

# Phase 2 Project

Another module down; you're almost halfway there!


All that remains in Phase 2 is to put our newfound data science skills to use with a large project! This project should take 20 to 30 hours to complete.

## Project Overview

For this project, you will use regression modeling to analyze house sales in a northwestern county.

### The Data

This project uses the King County House Sales dataset, which can be found in `kc_house_data.csv` in the data folder in this repo. The description of the column names can be found in `column_names.md` in the same folder. As with most real-world datasets, the column names are not perfectly described, so you'll have to do some research or use your best judgment if you have questions about what the data means.

It is up to you to decide what data from this dataset to use and how to use it. If you are feeling overwhelmed or behind, we recommend you ignore some or all of the following features:

* date
* view
* sqft_above
* sqft_basement
* yr_renovated
* zipcode
* lat
* long
* sqft_living15
* sqft_lot15
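If you decide to skip those features, you can drop them at load time. This is a minimal standard-library sketch. The inline excerpt is hypothetical and stands in for `data/kc_house_data.csv`. The loader assumes every remaining column is numeric and non-empty, which may not hold for every column in the real file.

```python
import csv
import io

# Features the project instructions suggest ignoring if you are short on time.
OPTIONAL_FEATURES = {
    "date", "view", "sqft_above", "sqft_basement", "yr_renovated",
    "zipcode", "lat", "long", "sqft_living15", "sqft_lot15",
}

# A tiny hypothetical excerpt standing in for data/kc_house_data.csv.
SAMPLE = """price,bedrooms,sqft_living,date,zipcode,lat,long
450000,3,1800,10/13/2014,98178,47.51,-122.25
620000,4,2400,12/9/2014,98125,47.72,-122.32
"""


def load_core_features(handle):
    """Read house-sale rows as floats, dropping the optional columns above."""
    reader = csv.DictReader(handle)
    keep = [c for c in reader.fieldnames if c not in OPTIONAL_FEATURES]
    return [{c: float(row[c]) for c in keep} for row in reader]


houses = load_core_features(io.StringIO(SAMPLE))
print(houses[0])

# With the real file (path assumed from the repo layout):
# with open("data/kc_house_data.csv", newline="") as f:
#     houses = load_core_features(f)
```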

### Business Problem

It is up to you to define a stakeholder and business problem appropriate to this dataset.

If you are struggling to define a stakeholder, we recommend you complete a project for a real estate agency that helps homeowners buy and/or sell homes. A business problem you could focus on for this stakeholder is the need to provide advice to homeowners about how home renovations might increase the estimated value of their homes, and by what amount.

## Deliverables

There are three deliverables for this project:

* A **GitHub repository**
* A **Jupyter Notebook**
* A **non-technical presentation**

Review the "Project Submission & Review" page in the "Milestones Instructions" topic for instructions on creating and submitting your deliverables. Refer to the rubric associated with this assignment for specifications describing high-quality deliverables.

### Key Points

* **Your deliverables should explicitly address each step of the data science process.** Refer to [the Data Science Process lesson](https://github.com/learn-co-curriculum/dsc-data-science-processes) from Topic 19 for more information about process models you can use.

* **Your Jupyter Notebook should demonstrate an iterative approach to modeling.** This means that you begin with a basic model, evaluate it, and then provide justification for and proceed to a new model. After you finish refining your models, you should provide 1-3 paragraphs discussing your final model, including an interpretation of at least 3 important parameter estimates or statistics.

* **Based on the results of your models, your notebook and presentation should discuss at least two features that have strong relationships with housing prices.**
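As one illustration of the iterative approach, you might start from a baseline that always predicts the mean price and then fit a simple linear regression on a single feature. Comparing R² across the two models helps justify each step. The (`sqft_living`, `price`) pairs below are made up, and this stdlib-only sketch is not a substitute for a full regression library.

```python
# Hypothetical (sqft_living, price) pairs; not real King County data.
sqft = [1200, 1500, 1800, 2100, 2400, 3000]
price = [300_000, 360_000, 410_000, 480_000, 520_000, 650_000]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Step 1: baseline model that always predicts the mean price.
baseline_pred = [sum(price) / len(price)] * len(price)

# Step 2: simple linear regression, price = b0 + b1 * sqft (least squares).
x_bar = sum(sqft) / len(sqft)
y_bar = sum(price) / len(price)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sum(
    (x - x_bar) ** 2 for x in sqft)
b0 = y_bar - b1 * x_bar
linear_pred = [b0 + b1 * x for x in sqft]

print(f"baseline R^2: {r_squared(price, baseline_pred):.3f}")
print(f"linear   R^2: {r_squared(price, linear_pred):.3f}")
print(f"each extra sq ft adds about ${b1:,.0f} to the predicted price")
```

The slope `b1` is the kind of parameter estimate the final write-up would interpret. A next iteration might add a second feature and check whether R² improves enough to justify the extra complexity.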

## Getting Started

Start on this project by forking and cloning [this project repository](https://github.com/learn-co-curriculum/dsc-phase-2-project) to get a local copy of the dataset.

We recommend structuring your project repository similar to the structure in [the Phase 1 Project Template](https://github.com/learn-co-curriculum/dsc-project-template). You can do this either by creating a new fork of that repository to work in or by building a new repository from scratch that mimics that structure.

## Project Submission and Review

Review the "Project Submission & Review" page in the "Milestones Instructions" topic to learn how to submit your project and how it will be reviewed. Your project must pass review for you to progress to the next Phase.

## Summary

This project will give you a valuable opportunity to develop your data science skills using real-world data. The end-of-phase projects are a critical part of the program because they give you a chance to bring together all the skills you've learned, apply them to realistic projects for a business stakeholder, practice communication skills, and get feedback to help you improve. You've got this!
