You are provided with a dataset containing information about users playing video games. The goal is to predict whether a user will churn (i.e., stop playing the game).
Exploratory data analysis:- The following relationships were visualized (a plotting sketch follows the list):
- Avg session time by churn
- Churn by gender
- Game genre vs games played
- Total play time vs churn
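A minimal sketch of two of these plots, assuming the data lives in a CSV and has columns like `avg_session_time`, `churn`, and `gender` (the filename and column names are assumptions, not taken from the write-up):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("game_churn.csv")  # hypothetical filename

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.barplot(data=df, x='churn', y='avg_session_time', ax=axes[0])
axes[0].set_title('Avg session time by churn')
sns.countplot(data=df, x='gender', hue='churn', ax=axes[1])
axes[1].set_title('Churn by gender')
plt.tight_layout()
plt.show()
```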
Converting categorical values into numerical values:-
- The column 'subscription_status', with values 'yes' or 'no', is converted to 1 and 0 respectively.
- One-hot encoding is applied to the columns 'game_genre', 'device_type', 'gender', and 'favorite_game_mode'; the first category of each column is dropped so the data doesn't fall into the dummy variable trap (see the sketch after this list).
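A sketch of these two steps with pandas, continuing from the DataFrame `df` loaded above:

```python
import pandas as pd

# Map the binary subscription flag to 1/0
df['subscription_status'] = df['subscription_status'].map({'yes': 1, 'no': 0})

# One-hot encode the categorical columns; drop_first=True removes the first
# category of each column to avoid the dummy variable trap
df = pd.get_dummies(
    df,
    columns=['game_genre', 'device_type', 'gender', 'favorite_game_mode'],
    drop_first=True,
)
```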
Feature engineering:- Two new features, 'play_intensity' and 'interaction_per_session', are created from the existing data.
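The write-up doesn't give the exact formulas, so the definitions below are hypothetical ratios built from assumed column names:

```python
# Hypothetical formulas -- the columns `total_play_time`, `sessions`, and
# `interactions` are assumptions, not taken from the write-up
df['play_intensity'] = df['total_play_time'] / df['sessions']
df['interaction_per_session'] = df['interactions'] / df['sessions']
```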
Removing redundant columns:- Redundant columns are dropped.
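For instance (the specific columns aren't named in the write-up; `user_id` is a placeholder):

```python
df = df.drop(columns=['user_id'])  # hypothetical redundant column
```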
Column transformation:- I read a paper suggesting that a dataset should consist of only continuous columns or only discrete columns, and that converting the dataset to one of the two forms gives better results. So I converted all the continuous columns into discrete columns using the formula suggested in the paper:

$$N = \frac{x_{\min} + x_{\max}}{2}$$

$$\text{discrete\_value} = \begin{cases} \left\lfloor \dfrac{(x - x_{\min}) \cdot N}{x_{\max} - x_{\min}} \right\rfloor & \text{if } x_{\max} \neq x_{\min} \\ 0 & \text{otherwise} \end{cases}$$
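A minimal sketch of this discretization, assuming each continuous column is numeric:

```python
import numpy as np

def discretize(col):
    """Discretize a continuous column using the formula above."""
    x_min, x_max = col.min(), col.max()
    if x_max == x_min:
        return np.zeros(len(col), dtype=int)
    n = (x_min + x_max) / 2  # N from the formula
    return np.floor((col - x_min) * n / (x_max - x_min)).astype(int)

# Hypothetical usage -- `continuous_cols` is a placeholder for the actual
# list of continuous columns in the dataset
# for c in continuous_cols:
#     df[c] = discretize(df[c].to_numpy())
```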
Model training:- A decision tree with pruning is used for training because it is one of the stronger algorithms for non-linear classification, and it was faster than the other models with similar accuracy. Accuracy: 68.79%. (A training sketch follows.)
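A sketch of the pruned tree, continuing from the preprocessing above and assuming a binary `churn` target column; the `ccp_alpha` value here is illustrative (cost-complexity pruning), not necessarily the one used:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumes `df` holds the fully processed features plus a `churn` target
X = df.drop(columns=['churn'])
y = df['churn']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# ccp_alpha > 0 enables cost-complexity pruning; in practice tune it
# (e.g. via cost_complexity_pruning_path) rather than hard-coding a value
clf = DecisionTreeClassifier(ccp_alpha=0.001, random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2%}")
```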
- The provided dataset is not of high quality: none of the columns is meaningfully correlated with the target variable. This is one reason I used a decision tree, and it is also why no algorithm could cross the 68.79% benchmark.
- The dataset is also imbalanced (roughly a 69/31 split). I tried oversampling and undersampling, but neither improved accuracy; see the resampling sketch below.
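For reference, a sketch of the resampling experiments using imbalanced-learn; the exact samplers used aren't stated, so `RandomOverSampler` and `RandomUnderSampler` are assumptions:

```python
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Resample only the training split to avoid leaking into the test set
X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)

# Retrain on each resampled set and compare test accuracy as before
```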

Accuracy of different models:- (a comparison sketch follows the table)

| Model | Accuracy |
|---|---|
| Decision Tree (with pruning) | 68.79% |
| Decision Tree (without pruning) | 57.65% |
| Random Forest Classifier | 68.33% |
| AdaBoost Classifier | 68.13% |
| Bagging Classifier | 64% |
| XGBoost Classifier | 64.26% |
| Logistic Regression | 68.59% |
| Naive Bayes (Gaussian) | 67.46% |
| Deep Learning (architecture: 128-256-64-32) | 68.79% |
| Deep Learning (architecture: 64-128-32) | 68.59% |
| Perceptron | 56.85% |
| H2OAutoML | 51% |
| TPOT Classifier | 63.26% |
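A minimal sketch of how such a comparison can be run for the scikit-learn models in the table, reusing the split from above; hyperparameters here are defaults, not the tuned settings behind the reported numbers:

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

models = {
    'Random Forest': RandomForestClassifier(random_state=42),
    'AdaBoost': AdaBoostClassifier(random_state=42),
    'Bagging': BaggingClassifier(random_state=42),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Naive Bayes (Gaussian)': GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.2%}")
```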