# Heroes of Pymoli

After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. Your assigned task is to analyze the data for their most recent fantasy game, Heroes of Pymoli.

Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights.

As a first step, the needed packages are imported. Since the data files are JSON, we need to import the json package.

In [1]:
import pandas as pd
import os
import json

The data is provided in two JSON files: purchase_data.json and purchase_data2.json. Each file is loaded with json.load and then converted to a pandas DataFrame. Let's define a function to do this. The function reading_File takes filePath, a string, as its argument and returns a DataFrame object containing the data from the .json file.

In [2]:
#define the function to read json file, path of which is passed as argument
def reading_File(filePath):
    with open(filePath) as datafile:
        #loads the file into data
        data = json.load(datafile)
    #create the dataframe object 
    purchas_Data_Reader = pd.DataFrame(data)
    #return the dataframe object
    return purchas_Data_Reader

In [3]:
purchase_Data_Reader=reading_File(os.path.join("raw_data","purchase_data.json"))
purchase_Data_Reader.head()

| | Age | Gender | Item ID | Item Name | Price | SN |
|---|---|---|---|---|---|---|
| 0 | 38 | Male | 165 | Bone Crushing Silver Skewer | 3.37 | Aelalis34 |
| 1 | 21 | Male | 119 | Stormbringer, Dark Blade of Ending Misery | 2.32 | Eolo46 |
| 2 | 34 | Male | 174 | Primitive Blade | 2.46 | Assastnya25 |
| 3 | 21 | Male | 92 | Final Critic | 1.36 | Pheusrical25 |
| 4 | 23 | Male | 63 | Stormfury Mace | 1.27 | Aela59 |
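For a JSON file shaped as a list of records, pandas can also parse the file directly with `pd.read_json`. The sketch below writes a tiny two-record file to a temporary directory and reads it back both ways. The file name `sample_purchases.json` and the reduced set of columns are hypothetical, and whether the real purchase file has this list-of-records shape is an assumption.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample records mirroring some of the columns shown above
sample_records = [
    {"Age": 38, "Gender": "Male", "Item ID": 165, "Price": 3.37, "SN": "Aelalis34"},
    {"Age": 21, "Gender": "Male", "Item ID": 119, "Price": 2.32, "SN": "Eolo46"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_purchases.json")
    with open(sample_path, "w") as out_file:
        json.dump(sample_records, out_file)

    # Route 1: json.load followed by the DataFrame constructor
    with open(sample_path) as in_file:
        via_json = pd.DataFrame(json.load(in_file))

    # Route 2: let pandas parse the file directly
    via_pandas = pd.read_json(sample_path)

print(via_json.shape, via_pandas.shape)
```

Both routes should give a frame with the same rows and columns for this shape of file; the explicit json.load route is handy when the JSON needs inspecting or reshaping before it becomes a DataFrame.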


Now let's check whether any null values are present in the data. DataFrame.isnull().any() returns one boolean per column: False if the column has no null values, and True if it has at least one.
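A minimal sketch of this check, run on a hypothetical toy frame (`toy_df` and its values are made up for illustration) rather than the real purchase data:

```python
import pandas as pd

# Hypothetical toy frame with one missing price
toy_df = pd.DataFrame({
    "SN": ["Aelalis34", "Eolo46", "Assastnya25"],
    "Price": [3.37, None, 2.46],
})

# One boolean per column: True if that column contains at least one null
has_null = toy_df.isnull().any()
print(has_null)

# Number of nulls per column, useful when any() reports True
null_counts = toy_df.isnull().sum()
print(null_counts)
```

On the real data the same two calls would be applied to purchase_Data_Reader.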


## <li><i><u>Total:</u></i>
The first task is to compute overall totals from the data.
    <li>To find the number of unique items, take the unique values of Item ID. unique() returns the distinct item IDs in the data, and the length of the result gives how many items there are.

In [5]:
count_unique_ITEMS=len(purchase_Data_Reader["Item ID"].unique())
count_unique_ITEMS

183

<li> To find the total number of purchases, count the rows of the Price column. The average purchase price is then the sum of the Price column divided by the number of purchases.
<li> Finally, find the count of unique players and the total revenue.

In [6]:
total_Purchases=purchase_Data_Reader["Price"].count()
avg_Purchase_Price=purchase_Data_Reader["Price"].sum()/total_Purchases
avg_Purchase_Price

2.931192307692303

In [7]:
count_unique_Players=len(purchase_Data_Reader["SN"].unique())
count_unique_Players

573

In [8]:
total_revenue=purchase_Data_Reader["Price"].sum()
total_revenue

2286.3299999999963
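The totals computed in the cells above can also be gathered in one place with built-in pandas helpers: `nunique()` for distinct counts and `mean()` for the average. This is a sketch on a hypothetical toy frame, not the real purchase data; the key names in `summary` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy purchases: player A bought twice
toy_df = pd.DataFrame({
    "SN": ["A", "A", "B", "C"],
    "Item ID": [1, 2, 1, 3],
    "Price": [2.0, 3.0, 2.0, 5.0],
})

summary = {
    "Unique Items": toy_df["Item ID"].nunique(),    # distinct item IDs
    "Unique Players": toy_df["SN"].nunique(),       # distinct screen names
    "Number of Purchases": len(toy_df),             # one row per purchase
    "Average Purchase Price": toy_df["Price"].mean(),
    "Total Revenue": toy_df["Price"].sum(),
}
print(summary)
```

`mean()` gives the same value as dividing the sum by the count, as long as the Price column has no nulls.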

## <li><u><i>Gender Demographics</i></u>

#### Calculate:
<ol>
    <li>Percentage and Count of Male Players
    <li>Percentage and Count of Female Players
    <li>Percentage and Count of Other / Non-Disclosed
</ol>

In [9]:
total_Gender=purchase_Data_Reader["Gender"].count()
total_Gender

780

In [10]:
maleCount=len(purchase_Data_Reader.loc[purchase_Data_Reader["Gender"]=="Male"])
malePercentage=maleCount/total_Gender*100

In [11]:
femaleCount=len(purchase_Data_Reader.loc[purchase_Data_Reader["Gender"]=="Female"])
femalePercentage=femaleCount/total_Gender*100

In [12]:
otherCount=total_Gender-maleCount-femaleCount
otherPercentage=otherCount/total_Gender*100

In [13]:
print(otherPercentage,femalePercentage,malePercentage)

1.41025641026 17.4358974359 81.1538461538
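Note that the percentages above are taken over 780 purchase rows, while the data holds 573 unique players, so repeat buyers are counted more than once. If player-level demographics are wanted, one option is to keep a single row per SN before counting. The sketch below shows this on a hypothetical toy frame; it assumes each player reports a single gender.

```python
import pandas as pd

# Hypothetical toy purchases: player A appears twice
toy_df = pd.DataFrame({
    "SN": ["A", "A", "B", "C", "D"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once
players = toy_df.drop_duplicates(subset="SN")

gender_counts = players["Gender"].value_counts()
gender_percent = players["Gender"].value_counts(normalize=True) * 100

gender_summary = pd.DataFrame({
    "Count": gender_counts,
    "Percentage": gender_percent.round(2),
})
print(gender_summary)
```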


 ##  <li><u><i>Purchasing Analysis (Gender) </i></u>

#### Calculate the following, broken down by gender:
<ol>
    <li>Purchase Count: total number of purchases by each gender</li>
    <li>Average Purchase Price: total purchase value / purchase count</li>
    <li>Total Purchase Value: sum of prices by gender</li>
    <li>Normalized Totals</li>
</ol>

### Male:
       1. Purchase Count
       2. Average Purchase Price
       3. Total Purchase Value
       4. Normalized Totals

In [39]:
maleRows=purchase_Data_Reader.loc[purchase_Data_Reader["Gender"]=="Male"]
malePurchase=maleRows["Item ID"].count()
avg_Male_Purchases=maleRows["Price"].sum()/malePurchase
male_total_purchase_Values=maleRows["Price"].sum()
normal_male=male_total_purchase_Values/malePurchase


### Female:
       1. Purchase Count
       2. Average Purchase Price
       3. Total Purchase Value
       4. Normalized Totals

In [15]:
femaleRows=purchase_Data_Reader.loc[purchase_Data_Reader["Gender"]=="Female"]
femalePurchase=femaleRows["Item ID"].count()
avg_FEMale_Purchases=femaleRows["Price"].sum()/femalePurchase
female_total_purchase_Values=femaleRows["Price"].sum()
normal_female=female_total_purchase_Values/femalePurchase
avg_FEMale_Purchases

2.815514705882352

### Others/Not Disclosed:
       1. Purchase Count
       2. Average Purchase Price
       3. Total Purchase Value
       4. Normalized Totals

In [16]:
othersRows=purchase_Data_Reader.loc[(purchase_Data_Reader["Gender"]!="Male") & (purchase_Data_Reader["Gender"]!="Female")]
otherPurchase=othersRows["Item ID"].count()
avg_others_Purchases=othersRows["Price"].sum()/otherPurchase
others_total_purchase_Values=othersRows["Price"].sum()
normal_others=others_total_purchase_Values/otherPurchase
avg_others_Purchases

3.2490909090909086
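The three per-gender cells above repeat the same steps on different filters. A groupby with several aggregations is one way to condense them; the sketch below uses a hypothetical toy frame, and the output column names are chosen here for illustration.

```python
import pandas as pd

# Hypothetical toy purchases
toy_df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Other / Non-Disclosed"],
    "Item ID": [1, 2, 1, 3],
    "Price": [2.0, 4.0, 3.0, 5.0],
})

# count, mean and sum of Price for every gender in one pass
gender_summary = (
    toy_df.groupby("Gender")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
)
print(gender_summary)
```

Unlike the explicit filters, groupby creates one row per distinct Gender value, so any unexpected spelling in the column would show up as its own row.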

##  <li><u><i>Age Demographics </i></u>

#### Calculate the following, broken into 5-year age bins (e.g. <10, 10-14, 15-19, etc.)
- Purchase Count
- Average Purchase Price
- Total Purchase Value
- Normalized Totals

In [17]:

minAge=purchase_Data_Reader["Age"].min()
maxAge=purchase_Data_Reader["Age"].max()
print("Minimum Age:",minAge)
print("Maximum Age:",maxAge)


Minimum Age: 7
Maximum Age: 45


In [18]:
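# Bin edges. pd.cut treats each bin as right-inclusive by default: (0, 10], (10, 14], ...
# so age 10 lands in '<10', and ages above 44 (the maximum here is 45) get NaN
bins = [0,10, 14, 19, 24, 29,34,39,44]
# Create the names for the eight bins
year_labels = ['<10', '10-14', '15-19', '20-24','25-29','30-34','35-39','40-44']
purchase_Data_Reader["Age Group"] = pd.cut(purchase_Data_Reader["Age"],bins,labels=year_labels)
purchase_Data_Reader.head(10)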

| | Age | Gender | Item ID | Item Name | Price | SN | Age Group |
|---|---|---|---|---|---|---|---|
| 0 | 38 | Male | 165 | Bone Crushing Silver Skewer | 3.37 | Aelalis34 | 35-39 |
| 1 | 21 | Male | 119 | Stormbringer, Dark Blade of Ending Misery | 2.32 | Eolo46 | 20-24 |
| 2 | 34 | Male | 174 | Primitive Blade | 2.46 | Assastnya25 | 30-34 |
| 3 | 21 | Male | 92 | Final Critic | 1.36 | Pheusrical25 | 20-24 |
| 4 | 23 | Male | 63 | Stormfury Mace | 1.27 | Aela59 | 20-24 |
| 5 | 20 | Male | 10 | Sleepwalker | 1.73 | Tanimnya91 | 20-24 |
| 6 | 20 | Male | 153 | Mercenary Sabre | 4.57 | Undjaskla97 | 20-24 |
| 7 | 29 | Female | 169 | Interrogator, Blood Blade of the Queen | 3.32 | Iathenudil29 | 25-29 |
| 8 | 25 | Male | 118 | Ghost Reaver, Longsword of Magic | 2.77 | Sondenasta63 | 25-29 |
| 9 | 31 | Male | 99 | Expiration, Warscythe Of Lost Worlds | 4.53 | Hilaerin92 | 30-34 |
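By default `pd.cut` closes each bin on the right, so with edges 0, 10, 14, ... an age of exactly 10 is labelled '<10'. If the labels are meant literally, one alternative is left-closed bins via `right=False`. This is a sketch only: the edges, the upper edge of 50, and the '40+' label are assumptions for illustration, not the notebook's own choices.

```python
import pandas as pd

# Hypothetical ages, including boundary values
ages = pd.Series([7, 10, 14, 15, 39, 45])

# Left-closed edges: [0, 10), [10, 15), ..., [40, 50)
bin_edges = [0, 10, 15, 20, 25, 30, 35, 40, 50]
bin_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# right=False makes each bin include its left edge and exclude its right edge
age_groups = pd.cut(ages, bins=bin_edges, labels=bin_labels, right=False)
print(age_groups.tolist())
```

With these edges, 10 and 14 both land in '10-14', 15 lands in '15-19', and 45 is still assigned a group instead of becoming NaN.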


In [19]:
def age_group(spend_df,criteria):
    # criteria is the column based on which the grouping has to be done
    group_values=spend_df[criteria].dropna().unique()
    group_list=[]
    for item in group_values:
        #for each value of the criteria column, retrieve required fields from df
        age_df = spend_df[spend_df[criteria] == item]
        purchase_count=age_df["Item ID"].count()
        purchase_price=age_df["Price"].sum()/purchase_count
        total_price=age_df["Price"].sum()
        normal_total=total_price/purchase_count

        group_list.append([item,purchase_count,purchase_price,total_price,normal_total])
    #create df based on list and return df
    age_group_df=pd.DataFrame(group_list,columns=[criteria+' Range','Purchase Count','Average Purchase Price','Total Purchase Value','Normalized Totals'])
    return age_group_df

# group the purchases by age bin; these two names are used in the next cell
age_Group_df=purchase_Data_Reader.groupby("Age Group")
age_Purchase_Count=age_Group_df["Item ID"].count()
age_Purchase_Count

Age Group
<10       32
10-14     31
15-19    133
20-24    336
25-29    125
30-34     64
35-39     42
40-44     16
Name: Item ID, dtype: int64

In [20]:
avg_Age_Purchase_price=age_Group_df["Price"].sum()/age_Purchase_Count
avg_Age_Purchase_price
age_total_Purchase_Value=round(age_Group_df["Price"].sum(),2)
age_total_Purchase_Value

Age Group
<10       96.62
10-14     83.79
15-19    386.42
20-24    978.77
25-29    370.33
30-34    197.25
35-39    119.40
40-44     51.03
Name: Price, dtype: float64
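The per-bin purchase counts shown earlier add up to 779, while the data has 780 purchases, which suggests one row fell outside every bin (the maximum age is 45 and the last bin edge is 44). A small check like the following can surface such rows; it is a sketch on a hypothetical toy frame that reuses the bin edges and labels from the cell above.

```python
import pandas as pd

# Hypothetical toy purchases, including one age above the last bin edge
toy_df = pd.DataFrame({"Age": [7, 10, 23, 45], "Price": [1.0, 2.0, 3.0, 4.0]})

# Same edges and labels as the binning cell above
bins = [0, 10, 14, 19, 24, 29, 34, 39, 44]
year_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44']
toy_df["Age Group"] = pd.cut(toy_df["Age"], bins, labels=year_labels)

# Rows whose age fell outside every bin get NaN as their group
unbinned = toy_df[toy_df["Age Group"].isna()]
print(unbinned)

# groupby drops those rows, so the binned total can fall short of the overall total
binned_total = toy_df.groupby("Age Group", observed=True)["Price"].sum().sum()
print(binned_total, toy_df["Price"].sum())
```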

## Top Spenders


In [21]:
def analysis_final(purchase_Data_Reader,criteria,aggregate_Val):
    # total aggregate_Val per value of criteria, keeping the five largest totals
    top_spend=purchase_Data_Reader.groupby([criteria],as_index=False)
    top_Purchase=top_spend[aggregate_Val].sum().nlargest(5,aggregate_Val)
    top_purchase_list=list(top_Purchase[criteria])
    spend_each_list=[]
    for each in top_purchase_list:
        # rows belonging to this value of criteria
        each_top_df = purchase_Data_Reader[purchase_Data_Reader[criteria] == each]
        purchase_count = each_top_df["Item ID"].count()
        purchase_sum=each_top_df[aggregate_Val].sum()
        avg_purchase_price=round((purchase_sum/purchase_count),2)
        spend_each_list.append([each,purchase_count,avg_purchase_price,purchase_sum])

    df=pd.DataFrame(spend_each_list,columns=[criteria,'Purchase Count','Avg Purchase Price','Total Purchase'])
    return df
    

In [22]:
new_data=analysis_final(purchase_Data_Reader,"SN","Price")
new_data

| | SN | Purchase Count | Avg Purchase Price | Total Purchase |
|---|---|---|---|---|
| 0 | Undirrala66 | 5 | 3.41 | 17.06 |
| 1 | Saedue76 | 4 | 3.39 | 13.56 |
| 2 | Mindimnya67 | 4 | 3.18 | 12.74 |
| 3 | Haellysu29 | 3 | 4.24 | 12.73 |
| 4 | Eoda93 | 3 | 3.86 | 11.58 |
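The same top-spender table can be sketched without an explicit loop by aggregating per SN and then keeping the largest totals with `nlargest`. The frame below is hypothetical toy data, and it keeps the top 2 rather than the top 5 only because the toy frame is small.

```python
import pandas as pd

# Hypothetical toy purchases for three players
toy_df = pd.DataFrame({
    "SN": ["A", "A", "B", "C", "C", "C"],
    "Item ID": [1, 2, 1, 3, 4, 5],
    "Price": [4.0, 4.0, 3.0, 1.0, 1.0, 2.0],
})

top_spenders = (
    toy_df.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])          # purchases, average price, total spend
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Avg Purchase Price",
        "sum": "Total Purchase",
    })
    .nlargest(2, "Total Purchase")          # keep the two biggest spenders
)
print(top_spenders)
```

Here SN becomes the index of the result; calling `reset_index()` would turn it back into a regular column.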


In [35]:
def purchase_count():
    purchase=purchase_Data_Reader["Item ID"].unique()
    purchase_list=[]
    #for each item retrieve values
    for item in purchase:
        p_count = purchase_Data_Reader[purchase_Data_Reader["Item ID"] == item]
        purchase_count=p_count["Item ID"].count()
        purchase_name=p_count["Item Name"].unique()[0]
        itemPrice=p_count["Price"].unique()[0]
        totalValue=p_count["Price"].sum()
        purchase_list.append([item,purchase_count,purchase_name,itemPrice,totalValue])
    #build the DataFrame once, after all items have been collected
    pc_df=pd.DataFrame(purchase_list,columns=["Item ID","Purchase Count",'Item Name',"Price",'Total Purchase Value'])
    return pc_df

In [36]:
purchase_df=purchase_count()
purchase_df=purchase_df.sort_values("Purchase Count",ascending=False)
purchase_df.head()

| | Item ID | Purchase Count | Item Name | Price | Total Purchase Value |
|---|---|---|---|---|---|
| 53 | 39 | 11 | Betrayal, Whisper of Grieving Widows | 2.35 | 25.85 |
| 88 | 84 | 11 | Arcane Gem | 2.23 | 24.53 |
| 68 | 175 | 9 | Woeful Adamantite Claymore | 1.24 | 11.16 |
| 33 | 13 | 9 | Serenity | 1.49 | 13.41 |
| 49 | 31 | 9 | Trickster | 2.07 | 18.63 |


In [37]:
total_purchase=purchase_count()
total_purchase=total_purchase.sort_values("Total Purchase Value",ascending=False)
total_purchase.head()

| | Item ID | Purchase Count | Item Name | Price | Total Purchase Value |
|---|---|---|---|---|---|
| 50 | 34 | 9 | Retribution Axe | 4.14 | 37.26 |
| 84 | 115 | 7 | Spectral Diamond Doomblade | 4.25 | 29.75 |
| 45 | 32 | 6 | Orenmir | 4.95 | 29.7 |
| 79 | 103 | 6 | Singed Scalpel | 4.87 | 29.22 |
| 112 | 107 | 8 | Splitter, Foe Of Subtlety | 3.61 | 28.88 |
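Both item rankings above come from the same per-item summary, sorted by a different column. A groupby on Item ID and Item Name is one way to build that summary in a single pass; the sketch below uses hypothetical toy items and prices, and assumes each item ID maps to one name and one price.

```python
import pandas as pd

# Hypothetical toy purchases of three items
toy_df = pd.DataFrame({
    "Item ID": [39, 39, 39, 34, 34, 84],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Gem"],
    "Price": [2.0, 2.0, 2.0, 4.0, 4.0, 1.0],
})

item_summary = (
    toy_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "first", "sum"])         # purchases, unit price, total value
    .rename(columns={
        "count": "Purchase Count",
        "first": "Price",
        "sum": "Total Purchase Value",
    })
)

# Same summary, ranked two ways
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)
print(most_popular.head())
print(most_profitable.head())
```

In this toy data the most purchased item (Sword) is not the one with the highest total value (Axe), which mirrors the difference between the two tables above.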
