# The Game Goes On
## Profiling PBA players using clustering algorithms

## Executive Summary

Recent PBA history shows the teams of a single conglomerate dominating the league. With only a couple of teams having a realistic shot at the championship, the league needs stronger competition to sustain continuous and repeat viewership.

The group aims to address this imbalance in the PBA by answering the following questions:

1. How can the fairness of a trade be ensured?
2. How can team composition be balanced during trades or acquisitions?

The group scraped the official PBA website, restricting the data to the conferences from the 2017 Governors' Cup through the 2019 Governors' Cup, and arrived at the following findings:

1. The clusters generated after dimensionality reduction showed dominant features for the player and team data, reflecting a specific dynamic present in the league.
2. Three team types were identified during clustering for the Commissioner's Cup: import-reliant teams, team-play-reliant teams, and teams that need improvement.
3. Two team types were identified during clustering for the Governors' Cup: offensive teams and defensive teams.
4. Three team types were identified during clustering for the Philippine Cup: run-and-gun teams, team-play-reliant teams, and improving teams.
5. Four player types were identified from clustering on all the Cups combined: star frontcourt players, star backcourt players, role players, and benchwarmers.

The resulting player clusters were then used to analyze a recent PBA trade in which Stanley Pringle was exchanged for two role players and a benchwarmer. Under this profiling, the trade is justifiable.

The trade analysis also led to recommendations for the PBA, particularly on balancing the player types on each team (e.g., a maximum of five star players per team for the Philippine Cup and six for the other two conferences).

For future studies, the PBA should improve the data management of its website. Historical data for suspended players should be included, along with player salaries and whether a player is an import. Such details can further improve team decision-making, since salaries play a big role in the movement of players from one team to another.

## Highlights

## Introduction

Basketball is one of the most popular sports in the Philippines; some may even call it a religion. This may seem surprising given that the average male height in the Philippines is 5'4".

Founded in 1975, the PBA was the first professional basketball league in Asia and is one of the oldest continuously existing basketball organizations in the world, second only to the NBA.

There are three conferences each year:

1. Philippine Cup – only Filipino players are allowed
2. Commissioner's Cup – each team may field up to one import
3. Governors' Cup – each team may field up to one import

Given this setup, the league needs to promote competition in order to promote viewership. One way to start is by properly managing player trades.

In [8]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from PBA_Fetcher import PBA_Fetcher
from PBA_Consolidator import PBA_Consolidator
from wordcloud import WordCloud, ImageColorGenerator
from sqlalchemy import create_engine
from scipy.cluster.hierarchy import fcluster, linkage, dendrogram
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score
from math import pi
from PIL import Image
from IPython.display import HTML

In [4]:
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit"
value="Click here to toggle on/off the raw code."></form>''')

In [None]:
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

## Methodology

For this report, a total of 1,422 player statistics and 86 team statistics were retrieved from the official PBA website. The general workflow for the report is shown in the figure below.

1. Web Scraping
2. Data Creation
3. Exploratory Data Analysis
4. Dimensionality Reduction
5. Clustering
6. Analysis of Clustering Results

<img src="methodology.png" width="800px" height="100px"/>
<br/>
<center><strong>Figure ##. Methodology</strong></center>

## Data Description

<img src="PBA_ER.png" width="800px" height="100px"/>
<br/>
<center><strong>Figure ##. Database Design</strong></center>

## Data Extraction

<i>Insert Webscraping extraction</i>



The data scraped from the PBA website was stored in a single database, giving the developers a reliable and standard source of data for their analysis.

<img src="pba_website.png" width="1000px" height="142" />
<br/>
<center><strong>Figure ##. PBA Website</strong></center>

The group developed a Python module named PBA_Consolidator, which stores the data scraped from the PBA website in a single database. With this module, developers can add or remove data without changing multiple Jupyter notebooks. Extraction of data from the various files is condensed into a single function in the module, which also keeps the main Jupyter notebook easier to maintain.

In [None]:
# Uncomment and run to create a database
# pba = PBA_Consolidator()
# pba.create_db()
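
The internals of `create_db()` are not shown in this notebook, so the following is only a minimal sketch of the consolidation idea: several scraped tables written into one database by a single function. The table names, column names, and the `consolidate` helper are hypothetical, and the sketch uses the standard-library `sqlite3` driver with an in-memory database rather than the SQLAlchemy engine the notebook imports.

```python
import sqlite3

import pandas as pd


def consolidate(frames, conn):
    """Write each DataFrame to its own table in a single database.

    `frames` maps a (hypothetical) table name to a scraped DataFrame.
    """
    for table_name, frame in frames.items():
        # Replace existing tables so that re-running the step is repeatable.
        frame.to_sql(table_name, conn, if_exists="replace", index=False)


# Toy stand-ins for scraped data; the real tables and columns may differ.
players = pd.DataFrame(
    {"player_id": [1, 2], "name": ["Player A", "Player B"], "team_id": [10, 20]}
)
teams = pd.DataFrame({"team_id": [10, 20], "team_name": ["Team X", "Team Y"]})

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
consolidate({"players": players, "teams": teams}, conn)

# Every notebook can now query the same source instead of separate files.
roster = pd.read_sql_query(
    "SELECT p.name, t.team_name "
    "FROM players AS p JOIN teams AS t ON p.team_id = t.team_id "
    "ORDER BY p.player_id",
    conn,
)
print(roster)
```

Keeping the write logic in one function means a new scraped file only requires one more entry in the `frames` mapping, not changes in each notebook.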

## Data Processing

Before using the data for analysis and clustering, we performed several data processing steps. After consolidating the data scraped from the PBA website into a single database, the tables were joined on their primary and foreign keys. We wrote a separate Python module, PBA_Fetcher.py, which contains all the functions used to connect to the database and fetch data from it. This approach let the group analyze the data available in each table in a scalable manner.

Missing values of height and weight were imputed with the mean of the respective feature. For teams that changed their name between 2017 and 2019, the old team name was replaced with the new one. The dataset was also separated by conference and year for further investigation. Finally, principal component analysis was performed to reduce the number of dimensions in the dataset, so that each cluster can be plotted on a 2D plane for easier visualization and analysis.
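
The joining, imputation, and renaming steps described above could look roughly like the following. This is a sketch on toy data, not the code in PBA_Fetcher.py: the column names, units, and the `TEAM_RENAMES` mapping are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy tables standing in for the database tables; columns are hypothetical.
players = pd.DataFrame(
    {
        "player_id": [1, 2, 3],
        "team_id": [10, 10, 20],
        "height_cm": [190.0, np.nan, 200.0],
        "weight_kg": [np.nan, 80.0, 90.0],
    }
)
teams = pd.DataFrame({"team_id": [10, 20], "team_name": ["Old Name", "Team Y"]})

# Join the tables on the foreign key shared by both.
merged = players.merge(teams, on="team_id", how="left")

# Impute missing height and weight with the mean of each feature.
for col in ["height_cm", "weight_kg"]:
    merged[col] = merged[col].fillna(merged[col].mean())

# Map old team names to their current names (hypothetical mapping).
TEAM_RENAMES = {"Old Name": "New Name"}
merged["team_name"] = merged["team_name"].replace(TEAM_RENAMES)

print(merged)
```

Mean imputation keeps every player in the dataset, at the cost of slightly shrinking the variance of the imputed features.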

In [None]:
fetcher = PBA_Fetcher()
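
The dimensionality reduction step mentioned in this section can be sketched as follows, using the scikit-learn and SciPy functions imported at the top of the notebook. The data here is synthetic, and the choices of two components, Ward linkage, and three clusters are assumptions for illustration, not the settings used in the report.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a player-statistics matrix: 30 players, 6 features.
rng = np.random.default_rng(0)
stats = rng.normal(size=(30, 6))

# Standardize first so that no single statistic dominates the components.
scaled = StandardScaler().fit_transform(stats)

# Project onto the first two principal components for 2D plotting.
pca = PCA(n_components=2)
coords = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_.sum()

# Cluster the standardized data (Ward linkage and 3 clusters are assumed)
# so that each point in the 2D projection can be colored by its cluster.
Z = linkage(scaled, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")

print(coords.shape, round(explained, 3), np.unique(labels))
```

The `coords` array and `labels` vector can be passed to a scatter plot to view the clusters on a 2D plane; `explained` indicates how much of the total variance that plane retains.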

## Exploratory Data Analysis

## Representative Clustering

## Hierarchical Clustering

## Conclusion

## Recommendation

## References