# Pandas Homework - Pandas, Pandas, Pandas

## Background

The data dive continues!

Now it's time to take what you've learned about Python Pandas and apply it to new situations. For this assignment, you'll need to complete **one of two** Data Challenges (not both). Once again, which challenge you take on is your choice. Just be sure to give it your all, as the skills you hone will become powerful tools in your data analytics tool belt.

### Before You Begin

1. Create a new repository for this project called `pandas-challenge`. **Do not add this homework to an existing repository**.

2. Clone the new repository to your computer.

3. Inside your local git repository, create a directory for the Pandas Challenge you choose. Use folder names corresponding to the challenges: **HeroesOfPymoli** or **PyCitySchools**.

4. Add your Jupyter notebook to this folder. This will be the main script to run for analysis.

5. Push the above changes to GitHub or GitLab.

## Option 1: Heroes of Pymoli

![Fantasy](Images/Fantasy.png)

Congratulations! After a lot of hard work in the data munging mines, you've landed a job as Lead Analyst for an independent gaming company. You've been assigned the task of analyzing the data for their most recent fantasy game Heroes of Pymoli.

Like many others in its genre, the game is free-to-play, but players are encouraged to purchase optional items that enhance their playing experience. As a first task, the company would like you to generate a report that breaks down the game's purchasing data into meaningful insights.

Your final report should include each of the following:

### Player Count

* Total Number of Players

### Purchasing Analysis (Total)

* Number of Unique Items
* Average Purchase Price
* Total Number of Purchases
* Total Revenue

### Gender Demographics

* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed
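
Because a player can appear in several purchase rows, counting rows per gender would overstate repeat buyers. The sketch below shows one possible way to count each player once. The DataFrame contents and the output column names are made up for illustration; only the `SN` and `Gender` column names come from the data set.

```python
import pandas as pd

# Made-up purchase rows; a player (SN) may appear more than once.
purchases = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = purchases.drop_duplicates(subset="SN")

# Count of unique players per gender, and the same as a share of all players.
counts = players["Gender"].value_counts()
percents = players["Gender"].value_counts(normalize=True) * 100

# Hypothetical column names for the report table.
gender_demographics = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents.round(2),
})
print(gender_demographics)
```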

### Purchasing Analysis (Gender)

* Each of the following, broken down by gender:
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Average Purchase Total per Person by Gender
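
The four metrics above can be computed in one pass with a grouped aggregation. This is a minimal sketch on made-up data, not a required approach. The result column names are hypothetical, and "per person" is assumed to mean total purchase value divided by the number of unique players in the group.

```python
import pandas as pd

# Made-up purchases: Ann1 and Cat3 each bought twice.
purchases = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Cat3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 3.00, 2.00, 4.00, 2.00, 5.00],
})

# Named aggregation: each keyword becomes an output column.
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_price=("Price", "mean"),
    total_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Assumption: "per person" divides by unique players, not by purchases.
by_gender["average_total_per_person"] = (
    by_gender["total_value"] / by_gender["unique_players"]
)
print(by_gender)
```

The same pattern should carry over to the age analysis by grouping on an age-group column instead of `Gender`.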

### Age Demographics

* Each of the following, broken into bins of 4 years (e.g. &lt;10, 10-14, 15-19):
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Average Purchase Total per Person by Age Group
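
Binning ages is typically done with `pd.cut`. The sketch below is one way to do it on made-up ages. The bin edges are an assumption taken from the example labels (which each span five ages), and the open-ended `40+` label is hypothetical; adjust both to whatever bin width you settle on.

```python
import pandas as pd

# Made-up ages, chosen to sit on and around the bin edges.
ages = pd.Series([7, 10, 14, 15, 19, 23, 41], name="Age")

# Assumed edges, read off the example labels; the last bin is open-ended.
edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=False makes each bin include its left edge and exclude its right edge,
# so age 10 falls in "10-14" rather than "<10".
age_group = pd.cut(ages, bins=edges, labels=labels, right=False)
print(age_group.tolist())
```

The resulting column can then be passed to `groupby` in the same way as any other categorical column.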

### Top Spenders

* Identify the top 5 spenders in the game by total purchase value, then list (in a table):
  * SN
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value

### Most Popular Items

* Identify the 5 most popular items by purchase count, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

### Most Profitable Items

* Identify the 5 most profitable items by total purchase value, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

As final considerations:

* You must use the Pandas Library and the Jupyter Notebook.
* You must submit a link to your GitHub/GitLab repo that contains your Jupyter Notebook.
* You must include a written description of three observable trends based on the data.
* See [Example Solution](HeroesOfPymoli/HeroesOfPymoli_starter.ipynb) for a reference on expected format.

## Hints and Considerations

* These are challenging activities for a number of reasons. For one, these activities will require you to analyze thousands of records. Hacking through the data to look for obvious trends in Excel is just not a feasible option. The size of the data may seem daunting, but pandas will allow you to efficiently parse through it.

* Second, these activities will also challenge you by requiring you to learn on your feet. Don't fool yourself into thinking: "I need to study pandas more closely before diving in." Get the basic gist of the library and then _immediately_ get to work. When facing a daunting task, it's easy to think: "I'm just not ready to tackle it yet." But that's the surest way to never succeed. Learning to program requires one to constantly tinker, experiment, and learn on the fly. You are doing exactly the _right_ thing if you find yourself constantly practicing Google-Fu and diving into documentation. There is just no way (or reason) to try to memorize it all. Online references are available for you to use when you need them. So use them!

* Take each of these tasks one at a time. Begin your work, answering the basic questions: "How do I import the data?" "How do I convert the data into a DataFrame?" "How do I build the first table?" Don't get intimidated by the number of asks. Many of them are repetitive in nature with just a few tweaks. Be persistent and creative!

* Expect these exercises to take time! Don't get discouraged if you find yourself spending hours initially with little progress. Force yourself to deal with the discomfort of not knowing and forge ahead. Consider these hours an investment in your future!

* As always, feel encouraged to work in groups and get help from your TAs and Instructor. Just remember, true success comes from mastery and _not_ a completed homework assignment. So challenge yourself to truly succeed!

* Ensure your repository has regular commits (20 or more) and a thorough README.md file.
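
For the first two questions in the hints ("How do I import the data?" and "How do I convert the data into a DataFrame?"), `pd.read_csv` handles both in a single step. The sketch below uses an in-memory string as a stand-in for a CSV file so that it runs on its own; the rows and the reduced column set are made up.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the columns and rows here are made up.
csv_text = """SN,Age,Gender,Price
Ann1,21,Female,1.50
Bob2,34,Male,3.25
"""

# pd.read_csv returns a DataFrame directly; it accepts a path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows
print(df.dtypes)   # column types inferred by pandas
print(df.shape)    # (rows, columns)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV.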

### Copyright

© 2021 Trilogy Education Services, LLC, a 2U, Inc. brand. Confidential and Proprietary. All Rights Reserved.


# Heroes of Pymoli 
### Andrew Anastasiades | @andrew-ana

In [2]:
## DEPENDENCIES
import pandas as pd #File IO and Data Manipulation
import os #OS agnostic file structure

In [3]:
## FILE PATHS
purchase_filename = os.path.join("Resources", "purchase_data.csv")

In [8]:
## INITIALIZE DATAFRAME FROM FILE
raw_df = pd.read_csv(purchase_filename)
df = raw_df.copy() #I will work with a copy so I can compare changes to original df

In [187]:
## INSPECT DATA
#df.describe()
df.head(10)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price | Age Group |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 | [18, 22) |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 | [38, 42) |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 | [22, 26) |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 | [22, 26) |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 | [22, 26) |
| 5 | 5 | Yalae81 | 22 | Male | 81 | Dreamkiss | 3.61 | [22, 26) |
| 6 | 6 | Itheria73 | 36 | Male | 169 | Interrogator, Blood Blade of the Queen | 2.18 | [34, 38) |
| 7 | 7 | Iskjaskst81 | 20 | Male | 162 | Abyssal Shard | 2.67 | [18, 22) |
| 8 | 8 | Undjask33 | 22 | Male | 21 | Souleater | 1.1 | [22, 26) |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 | [34, 38) |


In [43]:
## PLAYER ANALYSIS
num_players = df['SN'].nunique() #Unique SN

In [50]:
## PURCHASING ANALYSIS (TOTAL)
num_items = df['Item ID'].nunique() #Unique Item IDs
avg_price = df['Price'].mean()
num_purchases = len(df) #Each row is a purchase
rev_total = df['Price'].sum()

In [95]:
## GENDER DEMOGRAPHICS
gender_group = df.groupby(by=['Gender']) #Group By Gender
gender_num = gender_group['SN'].nunique() #Shows Male, Female and Other
gender_percent = gender_num/num_players #Divide by Unique Players

In [114]:
## PURCHASING ANALYSIS (GENDER)
gender_purchases = gender_group['SN'].count() #count each row in each group
gender_average_price = gender_group['Price'].mean() #Average
gender_revenue_total = gender_group['Price'].sum() #Subtotal
gender_player_LTV = gender_revenue_total / gender_num #LTV = group rev / group size

In [174]:
## AGE DEMOGRAPHICS
#First Prepare the Bins and Group
age_bin_max = int(df['Age'].max() - 10) // 4 + 2 #MATH = How many 4-year bins are needed so the last edge is above the max age?
age_bins = [0,10] + [10+i*4 for i in range(1,age_bin_max)] #Make my bins
df['Age Group'] = pd.cut(df['Age'], age_bins, right=False) #Add 'Age Group' Column
age_groups = df.groupby(['Age Group']) #Groupby 'Age Group'
#Analyze
age_num = age_groups['SN'].nunique() #How many people in each group?
age_purchases = age_groups['SN'].count() # How many purchases
age_average_price = age_groups['Price'].mean() #Average purchase price
age_revenue_total = age_groups['Price'].sum() #Sum of Prices
age_player_LTV = age_revenue_total / age_num #LTV = group rev / group size

In [199]:
## TOP SPENDERS
player_group = df.groupby('SN')#Want unique players
top_spenders = pd.DataFrame()#I'm going to make a DataFrame to house my statistics
top_spenders['Total Purchases'] = player_group['Price'].sum()
top_spenders['Purchase Count'] = player_group['Price'].count()
top_spenders['Average Purchase Price'] = top_spenders['Total Purchases'] / top_spenders['Purchase Count']
top_spenders = top_spenders.sort_values('Total Purchases', ascending=False)
top_5_spenders = top_spenders.iloc[0:5,:] # Just get the top 5

In [209]:
## MOST POPULAR ITEMS
item_group = df.groupby(['Item ID', 'Item Name','Price']) #Want Unique Items
pop_items = pd.DataFrame()#I'm going to make a DataFrame to house my statistics
pop_items['Purchase Count'] = item_group['SN'].count()
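
The cell above only produces a purchase count, and because `Price` is part of the group key, an item sold at two different prices is split across two rows. A possible way to finish both item tables is sketched below on made-up data. Grouping on `Item ID` and `Item Name` only, and reporting the mean price, is an assumption about how such items should be treated; the result column names are hypothetical.

```python
import pandas as pd

# Made-up purchases of three items.
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Axe", "Axe", "Axe", "Bow", "Bow", "Cape"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.50],
})

# One row per item: how often it sold, its (mean) price, and its revenue.
items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Same table, two orderings: by popularity and by revenue.
most_popular = items.sort_values("purchase_count", ascending=False).head(5)
most_profitable = items.sort_values("total_value", ascending=False).head(5)

print(most_popular)
print(most_profitable)
```

In this toy data the most popular item (Axe, three sales) is not the most profitable one (Bow, higher total value), which is why the brief asks for the two tables separately.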

In [201]:
## HIGHLIGHTS
#Total
num_players
num_items
avg_price
num_purchases
rev_total
#Gender
gender_num
gender_percent
gender_purchases
gender_average_price
gender_revenue_total
gender_player_LTV
#Age
age_num
age_purchases
age_average_price
age_revenue_total
age_player_LTV
#Players
top_5_spenders
#Items


| SN | Total Purchases | Purchase Count | Average Purchase Price |
|---|---|---|---|
| Lisosia93 | 18.96 | 5 | 3.792 |
| Idastidru52 | 15.45 | 4 | 3.8625 |
| Chamjask73 | 13.83 | 3 | 4.61 |
| Iral74 | 13.62 | 4 | 3.405 |
| Iskadarya95 | 13.1 | 3 | 4.366667 |


In [195]:
top_spenders

| SN | Total Purchases | Purchase Count | Average Purchase Price |
|---|---|---|---|
| Lisosia93 | 18.96 | 5 | 3.792000 |
| Idastidru52 | 15.45 | 4 | 3.862500 |
| Chamjask73 | 13.83 | 3 | 4.610000 |
| Iral74 | 13.62 | 4 | 3.405000 |
| Iskadarya95 | 13.10 | 3 | 4.366667 |
| ... | ... | ... | ... |
| Ililsasya43 | 1.02 | 1 | 1.020000 |
| Irilis75 | 1.02 | 1 | 1.020000 |
| Aidai61 | 1.01 | 1 | 1.010000 |
| Chanirra79 | 1.01 | 1 | 1.010000 |


In [210]:
pop_items

| Item ID | Item Name | Price | Purchase Count |
|---|---|---|---|
| 0 | Splinter | 1.28 | 4 |
| 1 | Crucifer | 1.99 | 1 |
| 1 | Crucifer | 3.26 | 3 |
| 2 | Verdict | 2.48 | 6 |
| 3 | Phantomlight | 2.49 | 6 |
| ... | ... | ... | ... |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 4.23 | 12 |
| 179 | Wolf, Promise of the Moonwalker | 4.48 | 6 |
| 181 | Reaper's Toll | 1.66 | 5 |
| 182 | Toothpick | 4.03 | 3 |
