In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import clean_data
import helpers
import analysis
import figures


# Preamble

In anticipation of Monster Hunter Rise, I wanted to take a look at the previous games in the Monster Hunter series. Monster Hunter is a series of video games made by Capcom in which you play as a hunter who hunts a variety of unique monsters. Large monsters act as "bosses", so the more monsters a title has, the more potential playtime it offers. Each new installment brings new monsters and new ways to hunt them, and also brings back older monsters from previous games. When a base version of a title is released, it is followed by a sequel (known as an ultimate expansion of the base game). Ultimate expansions bring more monsters.

## Table of Contents
1. ### [Methods](#Methods)
    * #### [Cleaning](#Cleaning)
    * #### [Amount of Monsters: Processing](#Amount-of-Monsters:-Processing-the-Data)
2. ### Analysis
3. ### Conclusion




# Methods

## Cleaning

I prepared 3 CSV files that include data such as the monsters in each game, specific monster data, and title information:

1. [**Monster Classes**](Monster_Classes.csv) (describes monster type and size)
2. [**Title Data**](MonsterHunter_General_Data.csv) (describes title information, including release date and director)
3. [**Monsters in Title**](Monsters_in_Games.csv) (lists all monsters in a title)

In order to join the 3 tables, a shared key is needed. The **Monsters in Title** table is melted into a 2-column table with columns ['Name','Title'] so it can be joined with the **Monster Classes** table on *'Name'*. The combined table is then joined with **Title Data** on *'Title'* to create the **Monster Hunter Data** table.

Most of the data was collected from the website [MonsterHunter.Fandom](https://monsterhunter.fandom.com/wiki/Monster_Hunter_Wiki).
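
The melt-then-join steps can be sketched on toy data. This is only an illustration: the wide layout of the **Monsters in Title** table (one column per title, one monster name per cell) and the variable names are assumptions, and `clean_data.get_data()` may do this differently.

```python
import pandas as pd

# Assumed wide layout: one column per title, one monster name per cell.
monsters_in_title = pd.DataFrame({
    "Monster Hunter": ["Rathalos", "Rathian", "Vespoid"],
    "Monster Hunter Freedom": ["Rathalos", "White Monoblos", None],
})
monster_classes = pd.DataFrame({
    "Name": ["Rathalos", "Rathian", "Vespoid", "White Monoblos"],
    "Type": ["Flying Wyvern", "Flying Wyvern", "Neopteron", "Flying Wyvern"],
    "Size": ["Large", "Large", "Small", "Large"],
})
title_data = pd.DataFrame({
    "Title": ["Monster Hunter", "Monster Hunter Freedom"],
    "Generation": [1, 1],
})

# Melt to long form: one (Name, Title) pair per row, dropping empty cells.
long_form = (
    monsters_in_title
    .melt(var_name="Title", value_name="Name")
    .dropna(subset=["Name"])[["Name", "Title"]]
)

# Join monster attributes on 'Name', then title attributes on 'Title'.
combined = long_form.merge(monster_classes, on="Name", how="left")
combined = combined.merge(title_data, on="Title", how="left")
print(combined)
```

Left joins keep every (Name, Title) pair even if a monster is missing from the classes table, which makes gaps in the source files easy to spot.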

In [2]:
monster_hunter_data = clean_data.get_data()
monster_hunter_data

Unnamed: 0,Name,Type,Size,Title,Country Released,Date Released,Generation,Director,Console
0,Abyssal Lagiacrus,Leviathan,Large,Monster Hunter 3 Ultimate,JPN,2011-12-10,3,Kaname Fujioka,3DS
1,Abyssal Lagiacrus,Leviathan,Large,Monster Hunter 3 Ultimate,North America,2013-03-19,3,Kaname Fujioka,3DS
2,Agnaktor,Leviathan,Large,Monster Hunter 3 Ultimate,JPN,2011-12-10,3,Kaname Fujioka,3DS
3,Agnaktor,Leviathan,Large,Monster Hunter 3 Ultimate,North America,2013-03-19,3,Kaname Fujioka,3DS
4,Alatreon,Elder Dragon,Large,Monster Hunter 3 Ultimate,JPN,2011-12-10,3,Kaname Fujioka,3DS
...,...,...,...,...,...,...,...,...,...
2496,Vespoid,Neopteron,Small,Monster Hunter Freedom,North America,2006-05-23,1,Yasunori Ichinose,Playstation Portable
2497,White Monoblos,Flying Wyvern,Large,Monster Hunter Freedom,JPN,2005-12-01,1,Yasunori Ichinose,Playstation Portable
2498,White Monoblos,Flying Wyvern,Large,Monster Hunter Freedom,North America,2006-05-23,1,Yasunori Ichinose,Playstation Portable
2499,Yian Kut-Ku,Bird Wyvern,Large,Monster Hunter Freedom,JPN,2005-12-01,1,Yasunori Ichinose,Playstation Portable



## Amount of Monsters: Processing the Data

I wanted to see how many monsters each title has, and what proportion of them are new monsters versus monsters that were already introduced.

The analysis will be done on the following titles:

In [3]:
analysis_titles = analysis.get_amt_titles_df()
analysis_titles

Unnamed: 0,Base,Ultimate
0,Monster Hunter,Monster Hunter Freedom
1,Monster Hunter Freedom 2,Monster Hunter Freedom Unite
2,Monster Hunter 3,Monster Hunter 3 Ultimate
3,Monster Hunter Portable 3rd,
4,Monster Hunter 4,Monster Hunter 4 Ultimate
5,Monster Hunter Generations,Monster Hunter Generations Ultimate
6,Monster Hunter: World,Monster Hunter World: Iceborne
7,Monster Hunter Rise,


**Note:** Early in the series, the Japanese titles were released months before the western release, and the western releases sometimes had the title name changed (e.g. (Jpn) Monster Hunter X, (West) Monster Hunter Generations).

I am focusing on the western title name if one exists; otherwise I use the Japanese title name.

For each title I obtain:
1. Amount of total monsters
2. Amount of large monsters
3. Amount of small monsters
4. Amount of new monsters
5. Amount of new large monsters
6. Amount of new small monsters
7. Amount of Variant/Subspecies/Deviants

For each title, the data is first filtered to that title:

``title_data = monster_hunter_data[monster_hunter_data['Title'] == title]``

To find [ 1 ], I filtered for the title, removed any duplicate monster names, and counted the rows (``len(title_monsters)``).

``title_monsters = title_data.drop_duplicates(subset=['Name'])``

Similarly to [ 1 ], to find [ 2 / 3 ], group by the size and count the rows.

``monsters_by_size = title_monsters.groupby('Size')['Name'].count()``

To find [ 4 / 5 / 6 ], the new monsters in the title need to be identified, so the title's release date is used to filter for previous titles. The minimum release date is the Japanese release; it is used instead of the western release so that the title's own Japanese release is not picked up as a previous title in the filtering.

``title_date = title_data['Date Released'].min()``

A list of monsters from the title and a list of monsters from all previous titles are found. The new monsters are the set difference: the title's monsters minus the previous monsters. The counts are then taken from ``new_monsters`` in the same way as for [ 1 / 2 / 3 ].

``previous_monsters = monster_hunter_data[monster_hunter_data['Date Released'] < title_date].drop_duplicates(subset=['Name'])``

``new_monsters = title_monsters[~title_monsters['Name'].isin(previous_monsters['Name'])]``
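
Put together, the counting steps for a single title might look like the sketch below. The rows, titles ("Old Title", "New Title") and dates are made up for illustration; only the column names follow the **Monster Hunter Data** table, and `analysis.get_amt_table` may be implemented differently.

```python
import pandas as pd

# Toy rows with made-up titles and dates, shaped like monster_hunter_data.
monster_hunter_data = pd.DataFrame({
    "Name": ["Rathalos", "Rathalos", "Vespoid",
             "Rathalos", "Rathalos", "White Monoblos", "White Monoblos"],
    "Size": ["Large", "Large", "Small", "Large", "Large", "Large", "Large"],
    "Title": ["Old Title"] * 3 + ["New Title"] * 4,
    "Date Released": pd.to_datetime([
        "2004-01-01", "2004-06-01", "2004-01-01",
        "2005-01-01", "2005-06-01", "2005-01-01", "2005-06-01",
    ]),
})

title = "New Title"
title_data = monster_hunter_data[monster_hunter_data["Title"] == title]

# [1] one row per monster, then count the rows
title_monsters = title_data.drop_duplicates(subset=["Name"])
total = len(title_monsters)

# [2 / 3] counts per size
by_size = title_monsters.groupby("Size")["Name"].count()

# [4] monsters that appear in no title released before this one
title_date = title_data["Date Released"].min()  # earliest (Japanese) release
previous = monster_hunter_data[monster_hunter_data["Date Released"] < title_date]
new_monsters = title_monsters[~title_monsters["Name"].isin(previous["Name"])]

print(total, by_size.to_dict(), new_monsters["Name"].tolist())
```

If the later western date were used as `title_date` here, the title's own Japanese rows would count as "previous" and no monster would be reported as new, which is the reason for taking the minimum date.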

[ 7 ] The variants (a generalization of subspecies/deviants/apex (Rise)) are retrieved by sorting a list of all the monster names by string length and then iterating through the list, removing any name that contains one of the shorter names. This produces a list of non-variant monsters. To get the variants, we take the difference between the title's monsters and the non-variant title monsters.

See [filter_out_variants(df)](helpers.py) if interested.
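
As a rough illustration of the idea, here is a plain-Python sketch of the length-sorted substring filter. The function name `split_variants` is hypothetical, and this is not necessarily how `filter_out_variants` in helpers.py works.

```python
def split_variants(names):
    """Split monster names into (base, variants) using substring matching."""
    base = []
    variants = []
    # Shortest names first, so a base name is always seen before its variants.
    for name in sorted(set(names), key=lambda n: (len(n), n)):
        if any(shorter in name for shorter in base):
            variants.append(name)  # contains an already-seen base name
        else:
            base.append(name)
    return base, variants


names = ["Azure Rathalos", "Rathalos", "Pink Rathian",
         "Rathian", "Diablos", "Silver Rathalos"]
base, variants = split_variants(names)
print(base)      # ['Diablos', 'Rathian', 'Rathalos']
print(variants)  # ['Pink Rathian', 'Azure Rathalos', 'Silver Rathalos']
```

One caveat of this approach is that it relies purely on names: an unrelated monster whose name happens to contain another monster's name would be counted as a variant.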

In [19]:
amt_monsters_df = analysis.get_amt_table(analysis_titles,monster_hunter_data)
amt_monsters_df[['Title','Total Monsters', 'Large Monsters','Small Monsters',
                      'New Monsters', 'New Large Monsters', 'New Small Monsters','Variant Monsters']]

Unnamed: 0,Title,Total Monsters,Large Monsters,Small Monsters,New Monsters,New Large Monsters,New Small Monsters,Variant Monsters
0,Monster Hunter,30,17,13,30,17,13,0
1,Monster Hunter Freedom,44,31,13,14,14,0,13
2,Monster Hunter Freedom 2,70,47,23,27,17,10,15
3,Monster Hunter Freedom Unite,81,58,23,11,11,0,20
4,Monster Hunter 3,35,19,16,26,16,10,0
5,Monster Hunter Portable 3rd,60,41,19,20,17,3,11
6,Monster Hunter 3 Ultimate,73,52,21,11,10,1,21
7,Monster Hunter 4,72,53,19,16,14,2,18
8,Monster Hunter 4 Ultimate,98,76,22,12,12,0,33
9,Monster Hunter Generations,105,72,33,22,19,3,16



# Amount of Monsters: Analysis


The ratio of new monsters to total monsters seems consistent over the series, with the exception of Monster Hunter, Monster Hunter 3, and Monster Hunter Generations Ultimate. Monster Hunter is the first game, so all of its monsters are new, and Monster Hunter Generations Ultimate was a title meant to bring back monsters from the previous games, which explains its low ratio. Monster Hunter 3 has a low total number of monsters but a high ratio of new monsters.
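
For reference, here is how the ratios could be computed, assuming each ratio is simply the count divided by the title's total monsters (the actual definition lives in `analysis.get_amt_table`). The three rows are copied from the table above.

```python
import pandas as pd

# Three rows copied from the amount-of-monsters table.
amt = pd.DataFrame({
    "Title": ["Monster Hunter", "Monster Hunter Freedom", "Monster Hunter 3"],
    "Total Monsters": [30, 44, 35],
    "New Monsters": [30, 14, 26],
    "Variant Monsters": [0, 13, 0],
})

# Assumed definition: share of the title's roster that is new / a variant.
amt["New Monster Ratio"] = amt["New Monsters"] / amt["Total Monsters"]
amt["Variant Monster Ratio"] = amt["Variant Monsters"] / amt["Total Monsters"]

print(amt[["Title", "New Monster Ratio", "Variant Monster Ratio"]].round(3))
```

Under this definition Monster Hunter has a new monster ratio of exactly 1, and Monster Hunter 3 sits at 26/35, roughly 0.74.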


In [None]:
figures.make_amt_monsters_figure(amt_monsters_df)

In [None]:
print('Mean of New Monster Ratio: {:2.5f}'.format(amt_monsters_df['New Monster Ratio'].mean()))
print('Median of New Monster Ratio: {:2.5f}'.format(amt_monsters_df['New Monster Ratio'].median()))
print('\nMean of Variant Monster Ratio: {:2.5f}'.format(amt_monsters_df['Variant Monster Ratio'].mean()))
print('Median of Variant Monster Ratio: {:2.5f}'.format(amt_monsters_df['Variant Monster Ratio'].median()))


# Monsters Throughout the Series

We will be analyzing the same titles as in the amount-of-monsters analysis.

The monsters with the most occurrences throughout the series are **Rathalos** and **Rathian**, which have shown up in all analysis titles.

Rathalos  | Rathian
:-------------------------:|:----------------------:
![Rathalos](Resources/MHRise-Rathalos_Render_003.png) | ![Rathian](Resources/Rathian_Transparent01.png)

*Images from MonsterHunter.fandom*


In [None]:
analysis.get_monster_occurance_df(analysis_titles,monster_hunter_data)


# Base Vs Ultimate?

Another question is how many months pass between a base game and its ultimate release. The ultimate release brings more monsters and more things to do!

To find the number of months between the versions, I will use the Japanese release dates, because titles were released in Japan months before coming to the west.
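
The month calculation itself is simple: take the difference in days and divide by an average month length of 30.5 days. A minimal sketch, using made-up release dates and a hypothetical helper name `months_between`:

```python
from datetime import date

DAYS_PER_MONTH = 30.5  # average month length used for the conversion


def months_between(base_release, ultimate_release):
    """Approximate number of months between two release dates."""
    return (ultimate_release - base_release).days / DAYS_PER_MONTH


# Hypothetical release dates, for illustration only.
base_release = date(2020, 1, 1)
ultimate_release = date(2021, 3, 1)

months = months_between(base_release, ultimate_release)
print(f"{months:.2f} months")
```

Dividing by a fixed 30.5 is an approximation, but over gaps of a year or more the error stays well under a month.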

In [None]:
game_titles = analysis.get_base_ultimate_titles()
base_ultimate = analysis.get_base_ultimate_df(game_titles, monster_hunter_data)
print('Mean of time between games {:2.5f} months'.format(base_ultimate['date_difference'].mean().days/30.5))
print('Median of time between games {:2.5f} months'.format(base_ultimate['date_difference'].median().days/30.5))
base_ultimate


# When will Rise Ultimate Release?!?

We can hope that it will be released in 14-15 months on 

In [None]:
figures.make_base_ultimate_figure(base_ultimate)

### Sorting monsters to Directors
I want to find out whether a director is more likely to use monsters that they have introduced.
To do so, I first need the data for the relevant titles.
total_new_monsters

In [None]:
director_title_data,director_data = analysis.get_director_df(monster_hunter_data)
director_title_data.groupby(by='Director')['Director Monster Ratio'].mean().to_frame('Mean').reset_index()

In [None]:
figures.make_director_figure(director_title_data)

In [None]:
monster_type_data,monster_type_intro_date = analysis.get_types_df(monster_hunter_data)
monster_type_intro_date

In [None]:
figures.make_type_figure(monster_type_data)