We use the Last.fm 1K dataset to predict user session length with machine learning, deep learning and panel data techniques, and compare the results against the state-of-the-art models published in the following papers:
- Audio Ad Quality Prediction
- Predicting Session Length in Media Streaming
- Hierarchical Modeling and Shrinkage for User Session Length Prediction in Media Streaming
The repository is divided into the following sections:
- Panel Data and Time Series Models: analysis in R of panel data models (completed), continuous-time models, dynamic linear models and non-linear mixed-effects models, plus the ETL in Postgres
- classifier: performance analysis of a novel two-pass decile-classification approach to session length prediction [WORK IN PROGRESS]
- sequence_to_sequence: analysis using various sequence-to-sequence deep learning techniques
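The two-pass decile-classification procedure is not described in this overview. As a hedged illustration only, a decile-based approach typically starts by turning continuous session lengths into ten equal-frequency class labels. The sketch below shows just that labelling step under this assumption. The helper names `decile_edges` and `to_decile_label` are hypothetical, and the second pass of the repository's method is not shown.

```python
from bisect import bisect_right
from statistics import quantiles


def decile_edges(lengths):
    """Return the 9 cut points that split `lengths` into 10 roughly equal-count bins.

    Hypothetical helper: uses statistics.quantiles (Python 3.8+) with its default
    'exclusive' interpolation method.
    """
    return quantiles(lengths, n=10)


def to_decile_label(length, edges):
    """Map one session length to a class label in 0..9 using precomputed cut points."""
    # bisect_right counts how many cut points are <= length, giving 0..9
    return bisect_right(edges, length)


# Usage example on toy session lengths 1..100 (not real Last.fm data)
lengths = list(range(1, 101))
edges = decile_edges(lengths)
labels = [to_decile_label(x, edges) for x in lengths]
```

Computing the edges on training data only and reusing them for test sessions keeps the class boundaries from leaking information across the split.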
Feature engineering was done with:
- PySpark, for the deep learning models
- Postgres, for the R panel data models
- Postgres + PySpark, for the ML models
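Session length prediction first requires splitting each user's listening history into sessions. The README does not state how sessions are defined, so the following is a minimal plain-Python sketch that assumes the common inactivity-gap rule: a new session starts when two consecutive plays by the same user are more than a fixed gap apart. The 20-minute threshold (`GAP_SECONDS`), the `sessionize` function and the `(user_id, unix_ts)` input shape are all hypothetical, not taken from this repository.

```python
from itertools import groupby
from operator import itemgetter

GAP_SECONDS = 20 * 60  # hypothetical inactivity threshold; not taken from this repository


def sessionize(events, gap=GAP_SECONDS):
    """Split (user_id, unix_ts) play events into sessions.

    A new session starts whenever consecutive plays of the same user are more
    than `gap` seconds apart. Returns (user_id, start_ts, end_ts, n_plays) tuples.
    """
    sessions = []
    # Sort by user, then time, so each user's plays are contiguous and ordered
    for user, plays in groupby(sorted(events), key=itemgetter(0)):
        times = [ts for _, ts in plays]
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev > gap:
                # Gap exceeded: close the current session and open a new one
                sessions.append((user, start, prev, count))
                start, count = ts, 0
            count += 1
            prev = ts
        sessions.append((user, start, prev, count))
    return sessions


# Usage example with toy events (not real Last.fm data)
result = sessionize([("u1", 0), ("u1", 300), ("u1", 5000), ("u2", 100)])
```

Here the session length in seconds is `end_ts - start_ts`, which ignores the duration of the final track. In PySpark the same logic is commonly expressed with a `Window` partitioned by user and ordered by timestamp, together with the `lag` function to compute the gap to the previous play.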