### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), followed by the 15-19 (18.6%) and 25-29 (13.4%) groups.
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [2]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data_df = pd.read_csv(file_to_load)

# Check the first 10 rows of the purchase_data DataFrame
purchase_data_df.head(10)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| 5 | 5 | Yalae81 | 22 | Male | 81 | Dreamkiss | 3.61 |
| 6 | 6 | Itheria73 | 36 | Male | 169 | Interrogator, Blood Blade of the Queen | 2.18 |
| 7 | 7 | Iskjaskst81 | 20 | Male | 162 | Abyssal Shard | 2.67 |
| 8 | 8 | Undjask33 | 22 | Male | 21 | Souleater | 1.1 |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 |


## Player Count

* Display the total number of players


In [3]:
# Check that the row count is accurate before analysis
purchase_data_df.count()

# Rename the header "SN" to something more descriptive
renamed_purchase_data_df = purchase_data_df.rename(columns={"SN": "Player Username"})

# Preview the renamed DataFrame
renamed_purchase_data_df.head()

# Count the unique players in the DataFrame
username_count_df = renamed_purchase_data_df["Player Username"].nunique()

print(username_count_df)

576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
# Group the purchases by player
renamed_purchase_group_df = renamed_purchase_data_df.groupby(["Player Username"])

# A GroupBy object has no tabular display; printing it shows only its type and memory address
print(renamed_purchase_group_df)

# Apply an aggregation such as count() or mean() to get viewable results
renamed_purchase_group_df.count().head(10)

# Average purchase price for the first 10 players
price_avg_per_username = renamed_purchase_group_df["Price"].mean().head(10)
print(price_avg_per_username)



<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001DC213FA508>
Player Username
Adairialis76    2.280000
Adastirin33     4.480000
Aeda94          4.910000
Aela59          4.320000
Aelaria33       1.790000
Aelastirin39    3.645000
Aelidru27       1.090000
Aelin32         2.993333
Aelly27         3.395000
Aellynun67      3.740000
Name: Price, dtype: float64
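
The steps listed for this section (unique items, average price, a one-row summary table, cleaner formatting) could be sketched as follows. This is a minimal illustration, not the notebook's own solution: `sample_df` holds invented rows standing in for `purchase_data.csv`, and the summary column names are assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv (not real data)
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Item ID": [1, 2, 2, 3],
    "Price": [1.00, 2.50, 2.50, 4.00],
})

# One-row summary of the whole purchase log
purchasing_summary_df = pd.DataFrame({
    "Number of Unique Items": [sample_df["Item ID"].nunique()],
    "Average Price": [sample_df["Price"].mean()],
    "Number of Purchases": [len(sample_df)],
    "Total Revenue": [sample_df["Price"].sum()],
})

# Optional: currency formatting on a copy, so the numeric values stay usable
formatted_summary_df = purchasing_summary_df.copy()
for col in ["Average Price", "Total Revenue"]:
    formatted_summary_df[col] = formatted_summary_df[col].map("${:,.2f}".format)

print(formatted_summary_df)
```

Formatting a copy keeps `purchasing_summary_df` numeric, which matters if the values are reused in later calculations.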


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
# Count purchase rows per gender (every purchase is counted, so repeat buyers appear more than once)
Gender_count_df = renamed_purchase_data_df["Gender"].value_counts()
Total_gender_count = Gender_count_df.sum()

# Convert the counts to percentages of all purchases
percent_gender_count = Gender_count_df * 100 / Total_gender_count
print(percent_gender_count,"%")





Male                     83.589744
Female                   14.487179
Other / Non-Disclosed     1.923077
Name: Gender, dtype: float64 %
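
The percentages above are taken over purchase rows. To report the count and percentage of players instead, one option is to keep a single row per username before counting. The sketch below assumes invented sample rows (`sample_df`) and illustrative column labels; it is not the notebook's own code.

```python
import pandas as pd

# Hypothetical sample rows (not real data); Bob2 bought twice
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Bob2", "Cy3", "Dee4"],
    "Gender": ["Female", "Male", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once
players_df = sample_df.drop_duplicates(subset="SN")

gender_counts = players_df["Gender"].value_counts()
gender_percent = (gender_counts / gender_counts.sum() * 100).round(2)

# Both Series share the gender labels as index, so they align by label
gender_summary_df = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": gender_percent,
})
print(gender_summary_df)
```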



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
# Group by gender and price. Note: head(10) on a GroupBy keeps at most 10 rows
# per (Gender, Price) group, so some rows are dropped (752 of 780 remain below)
renamed_Gender_group_df = renamed_purchase_data_df.groupby(['Gender', 'Price'])
renamed_Gender_group_df.count().head()
Gender_Price_df = renamed_Gender_group_df[["Gender", "Price"]].head(10)
# Data frame of Gender and price
print(Gender_Price_df)

# Data frame of Males only
males_only_df = Gender_Price_df.loc[Gender_Price_df["Gender"] == "Male", :]
print(males_only_df)
# Average price per Male (numeric_only skips the text "Gender" column)
Avg_price_per_Males_df = males_only_df.mean(numeric_only=True)
print("Average price per male:",Avg_price_per_Males_df)

# Data frame of Females only
females_only_df = Gender_Price_df.loc[Gender_Price_df["Gender"] == "Female", :]
print(females_only_df)
# Average price per Female
Avg_price_per_Females_df = females_only_df.mean(numeric_only=True)
print("Average price per Female:",Avg_price_per_Females_df)

# Summary table of average price per gender.
# The values are entered by hand; the male figure does not match the 3.008542 printed by this cell.
summary_table_df = pd.DataFrame(
    {"Gender": ["Male", "Female"],
     "Average Price":  ["3.017853", "3.203009"]
     }
)
summary_table_df

Gender  Price
0      Male   3.53
1      Male   1.56
2      Male   4.88
3      Male   3.27
4      Male   1.44
..      ...    ...
775  Female   3.54
776    Male   1.63
777    Male   3.46
778    Male   4.19
779    Male   4.60

[752 rows x 2 columns]
    Gender  Price
0     Male   3.53
1     Male   1.56
2     Male   4.88
3     Male   3.27
4     Male   1.44
..     ...    ...
774   Male   4.19
776   Male   1.63
777   Male   3.46
778   Male   4.19
779   Male   4.60

[624 rows x 2 columns]
Average price per male: Price    3.008542
dtype: float64
     Gender  Price
15   Female   2.89
18   Female   4.90
38   Female   4.18
41   Female   1.33
55   Female   3.79
..      ...    ...
731  Female   1.02
740  Female   3.92
754  Female   4.05
767  Female   4.88
775  Female   3.54

[113 rows x 2 columns]
Average price per Female: Price    3.203009
dtype: float64


| | Gender | Average Price |
|---|---|---|
| 0 | Male | 3.017853 |
| 1 | Female | 3.203009 |
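
A single `groupby("Gender")` can produce the purchase count, average price, total value, and average total per person without filtering each gender by hand or typing results into the table. The following is a sketch under assumptions: `sample_df` contains invented rows, and the output column names are illustrative.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Bob2", "Cy3"],
    "Gender": ["Female", "Male", "Male", "Male"],
    "Price": [3.00, 1.00, 2.00, 3.00],
})

by_gender = sample_df.groupby("Gender")

gender_purchase_df = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean(),
    "Total Purchase Value": by_gender["Price"].sum(),
})

# Average spend per unique player of each gender
gender_purchase_df["Avg Total per Person"] = (
    gender_purchase_df["Total Purchase Value"] / by_gender["SN"].nunique()
)
print(gender_purchase_df)
```

Dividing by `nunique()` of the usernames rather than by the purchase count is what separates "average total per person" from "average purchase price".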


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [12]:
# Copy the two columns needed so the in-place edits below do not touch the original DataFrame
age_player_username_df = renamed_purchase_data_df[["Player Username","Age"]].copy()
print(age_player_username_df)
# Find the youngest and oldest ages
youngest_age = age_player_username_df["Age"].min()
print("youngest age:",youngest_age)
oldest_age = age_player_username_df["Age"].max()
print("oldest age:",oldest_age)
age_player_username_group_df = age_player_username_df.groupby(["Player Username"])
age_player_username_group_df.count()

# Sort by username
age_player_username_df.sort_values("Player Username", inplace = True) 
  
# Drop repeat rows so each username appears once
# (576 rows remain, matching the unique player count found earlier)
age_player_username_df.drop_duplicates(subset ="Player Username", 
                     inplace = True) 

age_player_username_df

# Create the bins in which the data will be held
# Bin edges are 0, 9.9, 19.9, 29.9, 39.9, 50.0
bins = [0,9.9,19.9,29.9,39.9,50.0]

# Create the names for the five bins
group_names = ["kid", "Teenager", "Young Adult", "Adult", "Senior Adult"]

age_player_username_df["Age Classification"] = pd.cut(age_player_username_df["Age"], bins, labels=group_names, include_lowest=True)
print(age_player_username_df)

# Calculate percentages

Age_Classification_count_df = age_player_username_df["Age Classification"].value_counts()
print(Age_Classification_count_df)
Total_Age_Classification_count = Age_Classification_count_df.sum()
print(Total_Age_Classification_count)

percent_age_classification = Age_Classification_count_df * 100 / Total_Age_Classification_count
print(percent_age_classification,"%")


# Place all of the data found into a summary DataFrame,
# reading the percentages from the Series above instead of typing them in

summary_table_df = pd.DataFrame(
    {"youngest player": [youngest_age],
     "oldest player":   [oldest_age],
     "kid:percent":  [f"{percent_age_classification['kid']:.0f} %"],
     "Teenager:percent":  [f"{percent_age_classification['Teenager']:.0f} %"],
     "Young Adult:percent":  [f"{percent_age_classification['Young Adult']:.0f} %"],
     "Adult:percent":  [f"{percent_age_classification['Adult']:.0f} %"],
     "Senior Adult":  [f"{percent_age_classification['Senior Adult']:.0f} %"]
     }
)
summary_table_df

Player Username  Age
0           Lisim78   20
1       Lisovynya38   40
2        Ithergue48   24
3     Chamassasya86   24
4         Iskosia90   23
..              ...  ...
775      Aethedru70   21
776          Iral74   21
777      Yathecal72   20
778         Sisur91    7
779       Ennrian78   24

[780 rows x 2 columns]
youngest age: 7
oldest age: 45
    Player Username  Age Age Classification
467    Adairialis76   16           Teenager
142     Adastirin33   35              Adult
388          Aeda94   17           Teenager
28           Aela59   21        Young Adult
630       Aelaria33   23        Young Adult
..              ...  ...                ...
125      Yathecal82   20        Young Adult
595      Yathedeu43   22        Young Adult
572   Yoishirrala98   17           Teenager
54       Zhisrisu83   10           Teenager
560       Zontibe81   21        Young Adult

[576 rows x 3 columns]
Young Adult     335
Teenager        129
Adult            83
kid              17
Senior Adult     

| | youngest player | oldest player | kid:percent | Teenager:percent | Young Adult:percent | Adult:percent | Senior Adult |
|---|---|---|---|---|---|---|---|
| 0 | 7 | 45 | 3 % | 22 % | 58 % | 14 % | 2 % |
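
The headline figures at the top of this report use five-year bands (15-19, 20-24, 25-29), while the cell above uses ten-year bands. A sketch of the five-year version with `pd.cut()` follows. The bin edges, labels, and the invented rows in `sample_df` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows (not real data); Bob2 appears twice
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Bob2", "Cy3", "Dee4"],
    "Age": [7, 22, 22, 24, 40],
})
players_df = sample_df.drop_duplicates(subset="SN").copy()

# Assumed five-year bands; 200 is just an upper bound no real age will reach
bin_edges = [0, 10, 15, 20, 25, 30, 35, 40, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=False makes each bin include its left edge and exclude its right edge
players_df["Age Group"] = pd.cut(
    players_df["Age"], bins=bin_edges, labels=bin_labels, right=False
)

# sort_index() orders the rows by age band rather than by count
age_counts = players_df["Age Group"].value_counts().sort_index()
age_summary_df = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / age_counts.sum() * 100).round(2),
})
print(age_summary_df)
```

Integer edges with `right=False` avoid the `9.9`-style edges and still place an age of exactly 20 in the 20-24 band.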


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [10]:
purchasing_Age_price_df = renamed_purchase_data_df[["Player Username","Age","Price"]]
purchasing_Age_price_df

purchasing_Age_price_group_df = purchasing_Age_price_df.groupby(['Age'])
print(purchasing_Age_price_group_df.count().head(10))

purchasing_Age_price_group_mean_df = purchasing_Age_price_group_df["Price"].mean()
purchasing_Age_price_group_mean_df

# Wrap the per-age average prices in a DataFrame for display
state_summary_df = pd.DataFrame({"Average Price per Age": purchasing_Age_price_group_mean_df})
state_summary_df.head()







Player Username  Price
Age                        
7                  9      9
8                  8      8
9                  6      6
10                 9      9
11                 7      7
12                 6      6
13                 4      4
14                 2      2
15                35     35
16                30     30


| Age | Average Price per Age |
|---|---|
| 7 | 3.654444 |
| 8 | 3.24625 |
| 9 | 3.045 |
| 10 | 3.536667 |
| 11 | 2.684286 |
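
The cell above groups by each individual age. To bin the purchases first, as the instructions for this section suggest, the age column can be cut into bands before grouping. This is a sketch only: the ten-year bands, the labels, and the rows in `sample_df` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Bob2", "Cy3"],
    "Age": [16, 22, 22, 24],
    "Price": [2.00, 1.00, 3.00, 4.00],
})

binned_df = sample_df.copy()
binned_df["Age Group"] = pd.cut(
    binned_df["Age"],
    bins=[0, 10, 20, 30, 40, 200],
    labels=["<10", "10-19", "20-29", "30-39", "40+"],
    right=False,  # include the left edge, exclude the right edge
)

# observed=True keeps only the age groups that actually occur in the data
by_age = binned_df.groupby("Age Group", observed=True)

age_purchase_df = by_age["Price"].agg(["count", "mean", "sum"]).rename(
    columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    }
)
print(age_purchase_df)
```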


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [11]:
# Average purchase price per player
price_avg_per_username = renamed_purchase_group_df["Price"].mean()
print(price_avg_per_username)

# To sort from highest to lowest, ascending=False must be passed in.
# A Series holds a single column of values, so sort_values() takes no column name
Descending_price_avg_per_username_df = price_avg_per_username.sort_values(ascending=False)

Player Username
Adairialis76     2.280000
Adastirin33      4.480000
Aeda94           4.910000
Aela59           4.320000
Aelaria33        1.790000
                   ...   
Yathecal82       2.073333
Yathedeu43       3.010000
Yoishirrala98    4.580000
Zhisrisu83       3.945000
Zontibe81        2.676667
Name: Price, Length: 576, dtype: float64
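
For the top-spenders table itself, the sort key is the total purchase value per player rather than the average price. One way to build it is sketched below, assuming invented rows in `sample_df` and illustrative column names.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
sample_df = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Bob2", "Cy3"],
    "Price": [3.00, 1.00, 2.50, 1.50],
})

# Count, mean and sum of prices per player in one pass
spender_df = sample_df.groupby("SN")["Price"].agg(["count", "mean", "sum"]).rename(
    columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    }
)

# On a DataFrame, sort_values() needs the column name to sort by
top_spenders_df = spender_df.sort_values("Total Purchase Value", ascending=False)
print(top_spenders_df.head())
```

Here `sort_values` receives a column name because `spender_df` is a DataFrame; on a Series the same call takes only `ascending=False`.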

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
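
No cell was written for this section. The steps above could be sketched as follows, assuming each item has a single fixed price (so `first` is enough to recover it). The item names and rows in `sample_df` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
sample_df = pd.DataFrame({
    "Item ID": [1, 2, 2, 3, 2, 1],
    "Item Name": ["Sword A", "Shield B", "Shield B", "Bow C", "Shield B", "Sword A"],
    "Price": [4.00, 1.50, 1.50, 2.00, 1.50, 4.00],
})

# Retrieve only the columns needed for the item analysis
items_df = sample_df[["Item ID", "Item Name", "Price"]]

# Group by ID and name, then count purchases, read the price, and sum the value
item_summary_df = (
    items_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "first", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "first": "Item Price",
        "sum": "Total Purchase Value",
    })
)

most_popular_df = item_summary_df.sort_values("Purchase Count", ascending=False)
print(most_popular_df.head())
```

If an item's price can vary between purchases, `mean` would be a safer choice than `first` for the "Item Price" column.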



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
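
A sketch of this last step, again with invented rows and assumed column names: build the per-item table, sort by total purchase value, and only then apply currency formatting.

```python
import pandas as pd

# Hypothetical sample rows (not real data)
sample_df = pd.DataFrame({
    "Item ID": [1, 2, 2, 2, 1],
    "Item Name": ["Sword A", "Shield B", "Shield B", "Shield B", "Sword A"],
    "Price": [4.00, 1.50, 1.50, 1.50, 4.00],
})

item_totals_df = (
    sample_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "sum"])
    .rename(columns={"count": "Purchase Count", "sum": "Total Purchase Value"})
)

# Sort while the values are still numeric
most_profitable_df = item_totals_df.sort_values("Total Purchase Value", ascending=False)

# Format afterwards: formatted strings would sort alphabetically, not by amount
formatted_profitable_df = most_profitable_df.copy()
formatted_profitable_df["Total Purchase Value"] = (
    formatted_profitable_df["Total Purchase Value"].map("${:,.2f}".format)
)
print(formatted_profitable_df.head())
```

In this sample the most purchased item (Shield B, three purchases) is not the most profitable one (Sword A, two purchases at a higher price), which is why the two tables are sorted separately.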

