<a href="https://colab.research.google.com/github/Aloncohen41/LOLProject/blob/main/League_Of_Legends_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Here are the typical steps involved in a data science project:

1.  **Problem Definition and Understanding:** Clearly define the problem you're trying to solve and understand the goals and objectives of the project. This involves asking the right questions and determining what constitutes success.
2.  **Data Collection:** Gather the necessary data from various sources. This could involve accessing databases, APIs, files, or scraping the web.
3.  **Data Cleaning and Preprocessing:** This is often the most time-consuming step. It involves handling missing values, dealing with outliers, transforming data, encoding categorical variables, and ensuring the data is in a suitable format for analysis.
4.  **Exploratory Data Analysis (EDA):** Explore the data to understand its characteristics, identify patterns, trends, and relationships. This often involves visualizations, summary statistics, and initial hypothesis testing.
5.  **Feature Engineering:** Create new features from existing ones that can improve the performance of a model. This requires domain knowledge and creativity.
6.  **Model Selection and Training:** Choose appropriate machine learning models based on the problem type (e.g., classification, regression, clustering) and train them using the prepared data.
7.  **Model Evaluation:** Assess the performance of the trained models using relevant metrics and techniques (e.g., accuracy, precision, recall, cross-validation).
8.  **Model Deployment:** Integrate the trained model into an application or system so it can be used to make predictions or inform decisions.
9.  **Monitoring and Maintenance:** Continuously monitor the model's performance in production and retrain it as needed to ensure it remains accurate and effective over time.
10. **Communication of Results:** Clearly communicate your findings, insights, and the results of your analysis to stakeholders, often through reports, presentations, or interactive dashboards.

# League of Legends Game Analysis and Churn Proxy Model

## Introduction
This project aims to analyze the provided League of Legends game data to understand factors that might contribute to player churn. We will focus on game balance as a potential indicator, exploring metrics like game duration, kills, and structure destruction.

## Assumptions
* We assume that game balance, as measured by the difference in objectives achieved and game duration, has an impact on player experience and potentially churn.
* We assume the provided data is representative of typical game outcomes and player behavior.
* We assume that a less balanced game (one team significantly outperforming the other quickly) is more likely to lead to a negative player experience and potentially contribute to churn.

## Questions to Explore
1. How does game duration relate to the difference in kills between teams?
2. How does game duration relate to the difference in tower and inhibitor kills?
3. Is there a correlation between game balance metrics and the winning team?
4. Can we create a composite score or feature that represents game balance?
5. How might these game balance features and game duration be used to build a proxy model for player churn?
6. Do specific champion categories affect the player experience?

    a. Do some champion categories lead to more balanced/fun games?
    
    b. Do some champion categories lead to less balanced/less fun games?
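One way to approach questions 2 and 4 is sketched below. It assumes the `t1_`/`t2_` tower and inhibitor columns from `games.csv` and that `gameDuration` is measured in seconds. The helper name `add_balance_features`, the derived column names, and the equal weighting of towers and inhibitors are illustrative choices, not a validated balance metric.

```python
import pandas as pd

def add_balance_features(games: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `games` with hypothetical balance features added."""
    out = games.copy()
    # Signed differences: positive values mean team 1 took more structures.
    out["tower_diff"] = out["t1_towerKills"] - out["t2_towerKills"]
    out["inhibitor_diff"] = out["t1_inhibitorKills"] - out["t2_inhibitorKills"]
    # Assumed composite: absolute structure gap per minute of play.
    # Higher values suggest a faster, more one-sided game.
    minutes = out["gameDuration"] / 60.0  # assumes seconds
    out["balance_score"] = (
        out["tower_diff"].abs() + out["inhibitor_diff"].abs()
    ) / minutes
    return out

# Toy rows with made-up values (not taken from the dataset).
toy = pd.DataFrame(
    {
        "gameDuration": [1200, 2400],
        "t1_towerKills": [11, 6],
        "t2_towerKills": [1, 5],
        "t1_inhibitorKills": [2, 1],
        "t2_inhibitorKills": [0, 1],
    }
)
features = add_balance_features(toy)
print(features[["tower_diff", "inhibitor_diff", "balance_score"]])
```

Dividing by duration is only one possible normalization; whether it tracks player experience would need to be checked against the questions above.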

In [None]:
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Data Cleaning

Before we begin our analysis, we need to clean and prepare the data. This involves loading both datasets, dropping the unneeded `creationTime` and `seasonId` columns from the games data, setting `gameId` (games) and `id` (champions) as indices, and flattening the champion data so it can be used alongside the game data. The following cell performs these steps.

In [None]:
# Load the champion info JSON file and games CSV file
try:
    champion_info_path = '/content/drive/MyDrive/Data Science Studies/League Of Legends/champion_info_2.json'
    champion_df = pd.read_json(champion_info_path)
except FileNotFoundError:
    print(f"Error: Champion info JSON file not found at {champion_info_path}")
except Exception as e:
    print(f"Error loading Champion info JSON file: {e}")

try:
    games_csv_path = '/content/drive/MyDrive/Data Science Studies/League Of Legends/games.csv'
    games_df = pd.read_csv(games_csv_path)
    # Display columns after loading for debugging
    print("\nColumns in games_df after loading:")
    print(games_df.columns.tolist())
    # Drop specified columns from games_df
    columns_to_drop = ['creationTime', 'seasonId']
    games_df = games_df.drop(columns=columns_to_drop, errors='ignore')
    # Set 'gameId' as the index for games_df
    if 'gameId' in games_df.columns:
        games_df = games_df.set_index('gameId')
    else:
        print("Warning: 'gameId' column not found in games_df. Cannot set as index.")
except FileNotFoundError:
    print(f"Error: Games CSV file not found at {games_csv_path}")
except Exception as e:
    print(f"Error loading Games CSV file: {e}")

# Flatten the 'data' column in champion_df
if 'data' in champion_df.columns:
    champion_df_flattened = pd.json_normalize(champion_df['data'])

    # Sort and set 'id' as index for champion_df_flattened
    if 'id' in champion_df_flattened.columns:
        champion_df_cleaned = champion_df_flattened.sort_values(by='id').set_index('id')

        # Remove the first row of champion_df_cleaned (assuming it's the empty one)
        champion_df_cleaned = champion_df_cleaned.iloc[1:].copy()

        # Reverse the order of columns in champion_df_cleaned
        champion_df_cleaned = champion_df_cleaned[champion_df_cleaned.columns[::-1]]

    else:
        print("Warning: 'id' column not found in flattened champion_df. Skipping sorting and setting index.")
else:
    print("Warning: 'data' column not found in champion_df. Cannot flatten.")

# Display the cleaned champion_df
print("\nCleaned champion_df (first row removed and columns reversed):")
display(champion_df_cleaned.head())

# Display the games_df
print("\nDisplaying games_df:")
display(games_df.head())

# Display descriptive statistics for games_df
print("\nDescriptive statistics for games_df:")
display(games_df.describe())


Columns in games_df after loading:
['gameId', 'creationTime', 'gameDuration', 'seasonId', 'winner', 'firstBlood', 'firstTower', 'firstInhibitor', 'firstBaron', 'firstDragon', 'firstRiftHerald', 't1_champ1id', 't1_champ1_sum1', 't1_champ1_sum2', 't1_champ2id', 't1_champ2_sum1', 't1_champ2_sum2', 't1_champ3id', 't1_champ3_sum1', 't1_champ3_sum2', 't1_champ4id', 't1_champ4_sum1', 't1_champ4_sum2', 't1_champ5id', 't1_champ5_sum1', 't1_champ5_sum2', 't1_towerKills', 't1_inhibitorKills', 't1_baronKills', 't1_dragonKills', 't1_riftHeraldKills', 't1_ban1', 't1_ban2', 't1_ban3', 't1_ban4', 't1_ban5', 't2_champ1id', 't2_champ1_sum1', 't2_champ1_sum2', 't2_champ2id', 't2_champ2_sum1', 't2_champ2_sum2', 't2_champ3id', 't2_champ3_sum1', 't2_champ3_sum2', 't2_champ4id', 't2_champ4_sum1', 't2_champ4_sum2', 't2_champ5id', 't2_champ5_sum1', 't2_champ5_sum2', 't2_towerKills', 't2_inhibitorKills', 't2_baronKills', 't2_dragonKills', 't2_riftHeraldKills', 't2_ban1', 't2_ban2', 't2_ban3', 't2_ban4', 't2_ba

| id | name | key | title | tags |
|---|---|---|---|---|
| 1 | Annie | Annie | the Dark Child | [Mage] |
| 2 | Olaf | Olaf | the Berserker | [Fighter, Tank] |
| 3 | Galio | Galio | the Colossus | [Tank, Mage] |
| 4 | Twisted Fate | TwistedFate | the Card Master | [Mage] |
| 5 | Xin Zhao | XinZhao | the Seneschal of Demacia | [Fighter, Assassin] |



Displaying games_df:


| gameId | gameDuration | winner | firstBlood | firstTower | firstInhibitor | firstBaron | firstDragon | firstRiftHerald | t1_champ1id | t1_champ1_sum1 | ... | t2_towerKills | t2_inhibitorKills | t2_baronKills | t2_dragonKills | t2_riftHeraldKills | t2_ban1 | t2_ban2 | t2_ban3 | t2_ban4 | t2_ban5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3326086514 | 1949 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 8 | 12 | ... | 5 | 0 | 0 | 1 | 1 | 114 | 67 | 43 | 16 | 51 |
| 3229566029 | 1851 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 119 | 7 | ... | 2 | 0 | 0 | 0 | 0 | 11 | 67 | 238 | 51 | 420 |
| 3327363504 | 1493 | 1 | 2 | 1 | 1 | 1 | 2 | 0 | 18 | 4 | ... | 2 | 0 | 0 | 1 | 0 | 157 | 238 | 121 | 57 | 28 |
| 3326856598 | 1758 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 57 | 4 | ... | 0 | 0 | 0 | 0 | 0 | 164 | 18 | 141 | 40 | 51 |
| 3330080762 | 2094 | 1 | 2 | 1 | 1 | 1 | 1 | 0 | 19 | 4 | ... | 3 | 0 | 0 | 1 | 0 | 86 | 11 | 201 | 122 | 18 |



Descriptive statistics for games_df:


|  | gameDuration | winner | firstBlood | firstTower | firstInhibitor | firstBaron | firstDragon | firstRiftHerald | t1_champ1id | t1_champ1_sum1 | ... | t2_towerKills | t2_inhibitorKills | t2_baronKills | t2_dragonKills | t2_riftHeraldKills | t2_ban1 | t2_ban2 | t2_ban3 | t2_ban4 | t2_ban5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | ... | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 | 51490.0 |
| mean | 1832.362808 | 1.493552 | 1.471295 | 1.450631 | 1.308487 | 0.92651 | 1.442804 | 0.731676 | 114.293397 | 6.601787 | ... | 5.549466 | 0.985084 | 0.414547 | 1.40437 | 0.240105 | 108.216294 | 107.910216 | 108.690581 | 108.626044 | 108.066576 |
| std | 512.017696 | 0.499963 | 0.520326 | 0.542848 | 0.676097 | 0.841424 | 0.569579 | 0.822526 | 119.000867 | 4.025601 | ... | 3.860989 | 1.256284 | 0.613768 | 1.224492 | 0.427151 | 102.551787 | 102.87071 | 102.592145 | 103.346952 | 102.756149 |
| min | 190.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 25% | 1531.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 35.0 | 4.0 | ... | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | 37.0 | 38.0 | 38.0 | 38.0 |
| 50% | 1833.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 79.0 | 4.0 | ... | 6.0 | 0.0 | 0.0 | 1.0 | 0.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 |
| 75% | 2148.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 136.0 | 11.0 | ... | 9.0 | 2.0 | 1.0 | 2.0 | 0.0 | 141.0 | 141.0 | 141.0 | 141.0 | 141.0 |
| max | 4728.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 516.0 | 21.0 | ... | 11.0 | 10.0 | 4.0 | 6.0 | 1.0 | 516.0 | 516.0 | 516.0 | 516.0 | 516.0 |
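
With champions indexed by `id` and games indexed by `gameId`, champion IDs in the games table can be mapped to champion tags, which is the kind of lookup question 6 would need. The following is a minimal sketch on toy frames: the helper `team_tag_counts` and the game IDs 101 and 102 are hypothetical, and the champion rows mirror the sample output above.

```python
import pandas as pd

# Toy stand-ins for the cleaned frames (champion rows mirror the output above).
champions = pd.DataFrame(
    {
        "name": ["Annie", "Olaf", "Twisted Fate"],
        "tags": [["Mage"], ["Fighter", "Tank"], ["Mage"]],
    },
    index=pd.Index([1, 2, 4], name="id"),
)
games = pd.DataFrame(
    {"t1_champ1id": [1, 4], "t1_champ2id": [2, 1]},
    index=pd.Index([101, 102], name="gameId"),  # made-up game IDs
)

def team_tag_counts(games: pd.DataFrame, champions: pd.DataFrame, team: str) -> pd.DataFrame:
    """Count how often each champion tag appears on one team, per game."""
    # Select t1_champ1id ... t1_champ5id, skipping the *_sum1 / *_sum2 columns.
    champ_cols = [
        c for c in games.columns
        if c.startswith(f"{team}_champ") and c.endswith("id")
    ]
    # Long format: one row per (game, champion slot).
    long = (
        games[champ_cols]
        .rename_axis("gameId")
        .reset_index()
        .melt(id_vars="gameId", value_name="champ_id")
    )
    # Look up each champion's tag list by ID, then give every tag its own row.
    long["tag"] = long["champ_id"].map(champions["tags"])
    long = long.explode("tag")
    return long.groupby(["gameId", "tag"]).size().unstack(fill_value=0)

counts = team_tag_counts(games, champions, "t1")
print(counts)
```

Champion IDs missing from the champion table map to `NaN` and are dropped by the `groupby`, so unmatched picks would silently disappear from the counts.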


## Analysis 1: Game Duration and Kill Differences

**Question 1:** How does game duration relate to the difference in kills between teams?

In this section, we will investigate the relationship between the length of a game and the difference in the total number of kills achieved by the two teams. We hypothesize that shorter games might correlate with a larger kill difference, indicating a less balanced match that ended quickly due to one team's dominance in securing kills. Conversely, longer games might show a smaller average kill difference, suggesting a more back-and-forth match. We will analyze the `games_df` to explore this relationship and visualize our findings.
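
A minimal sketch of this comparison follows. The column list printed for `games_df` contains no per-team champion kill counts, so the sketch uses the sum of each team's objective kills (towers, inhibitors, barons, dragons, rift heralds) as a stand-in. That substitution, the helper name `kill_gap_by_duration`, the 10-minute bin width, and the reading of `gameDuration` as seconds are all assumptions.

```python
import pandas as pd

OBJECTIVES = ["towerKills", "inhibitorKills", "baronKills",
              "dragonKills", "riftHeraldKills"]

def kill_gap_by_duration(games: pd.DataFrame, bin_minutes: int = 10) -> pd.DataFrame:
    """Summarize the absolute objective-kill gap per game-duration bin."""
    t1_total = games[[f"t1_{o}" for o in OBJECTIVES]].sum(axis=1)
    t2_total = games[[f"t2_{o}" for o in OBJECTIVES]].sum(axis=1)
    gap = (t1_total - t2_total).abs()
    # Label each bin by its lower edge in minutes (assumes seconds as input).
    bin_start = (games["gameDuration"] // (bin_minutes * 60)) * bin_minutes
    summary = gap.groupby(bin_start).agg(["mean", "count"])
    return summary.rename_axis("duration_bin_start_min")

# Toy data with made-up values: two short one-sided games, two longer close ones.
toy = pd.DataFrame({"gameDuration": [900, 1100, 1900, 2300]})
for team in ("t1", "t2"):
    for objective in OBJECTIVES:
        toy[f"{team}_{objective}"] = 0
toy["t1_towerKills"] = [9, 7, 6, 5]
toy["t2_towerKills"] = [1, 1, 4, 5]

summary = kill_gap_by_duration(toy)
print(summary)
```

If the hypothesis holds, the `mean` column should shrink as the duration bins grow; the `count` column helps flag bins with too few games to interpret.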

In [None]:
# Get the list of columns in games_df
games_df_columns = games_df.columns.tolist()

# Print the list of columns
print("List of columns in games_df:")
print(games_df_columns)

List of columns in games_df:
['gameDuration', 'winner', 'firstBlood', 'firstTower', 'firstInhibitor', 'firstBaron', 'firstDragon', 'firstRiftHerald', 't1_champ1id', 't1_champ1_sum1', 't1_champ1_sum2', 't1_champ2id', 't1_champ2_sum1', 't1_champ2_sum2', 't1_champ3id', 't1_champ3_sum1', 't1_champ3_sum2', 't1_champ4id', 't1_champ4_sum1', 't1_champ4_sum2', 't1_champ5id', 't1_champ5_sum1', 't1_champ5_sum2', 't1_towerKills', 't1_inhibitorKills', 't1_baronKills', 't1_dragonKills', 't1_riftHeraldKills', 't1_ban1', 't1_ban2', 't1_ban3', 't1_ban4', 't1_ban5', 't2_champ1id', 't2_champ1_sum1', 't2_champ1_sum2', 't2_champ2id', 't2_champ2_sum1', 't2_champ2_sum2', 't2_champ3id', 't2_champ3_sum1', 't2_champ3_sum2', 't2_champ4id', 't2_champ4_sum1', 't2_champ4_sum2', 't2_champ5id', 't2_champ5_sum1', 't2_champ5_sum2', 't2_towerKills', 't2_inhibitorKills', 't2_baronKills', 't2_dragonKills', 't2_riftHeraldKills', 't2_ban1', 't2_ban2', 't2_ban3', 't2_ban4', 't2_ban5']
