Kaggle machine learning competition code for the BADS M.Sc. course at Humboldt-Universität zu Berlin.

HU Kaggle Competition – Clothing Returns Prediction

For the Business Analytics and Data Science course at the Chair of Information Systems, Humboldt-Universität zu Berlin, all students participated in an in-class data science competition on Kaggle. The task was to predict which customers would return their order (binary classification), with 150k rows of labelled data provided for training. Machine learning algorithms such as Random Forest, XGBoost, and early-stopping neural networks were utilised, along with selective heterogeneous ensembling for final score improvements. In addition to achieving a high AUC score (the Kaggle scoring criterion), there was the further task of using a profit-sensitive model to take asymmetric error costs into account and thus move from predictive to prescriptive modelling.
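
Below is a minimal sketch of that pipeline, not the tuned code used for the submission: it assumes the label column in BADS_WS1819_known.csv is named "return", uses a rough dummy-encoding in place of the real cleaning script, and picks illustrative hyperparameters.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Labelled training data (150k orders); the label column name is an assumption.
known = pd.read_csv("BADS_WS1819_known.csv")
y = known["return"]
X = pd.get_dummies(known.drop(columns=["return"]), drop_first=True).fillna(0)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Heterogeneous base learners: bagging, boosting, and an early-stopping neural net.
models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=42),
    "xgboost": XGBClassifier(n_estimators=500, learning_rate=0.05, eval_metric="auc"),
    "early_stopping_nn": MLPClassifier(hidden_layer_sizes=(64, 32),
                                       early_stopping=True, random_state=42),
}

val_probs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_probs[name] = model.predict_proba(X_val)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_val, val_probs[name]):.4f}")

# Simple heterogeneous ensemble: average the predicted return probabilities.
ensemble_probs = sum(val_probs.values()) / len(val_probs)
print(f"ensemble: AUC = {roc_auc_score(y_val, ensemble_probs):.4f}")
```

A plain probability average is only the simplest possible blend; the "selective" part of the ensembling refers to choosing which base learners enter the blend based on validation performance.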

The real-world trade-off handled by this modelling, illustrated by the sketch after the list below, is between:

  • Falsely classifying a customer as a non-returner and selling to someone who does in fact return their item or order.

  • Being overly conservative and not making sales to incorrectly classified customers who would have kept their items (and might have become repeat customers).
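
The cost figures below are placeholders rather than the competition's actual cost matrix; the sketch only shows how asymmetric error costs turn a predicted return probability into a per-order decision instead of a fixed 0.5 cut-off.

```python
def flag_as_likely_return(p_return, item_value,
                          missed_return_cost=0.5, lost_sale_cost=0.5):
    """Decide per order whether to intervene, by comparing expected costs.

    p_return            predicted probability that the order is returned
    item_value          order value in EUR
    missed_return_cost  fraction of item value lost when a return slips through
    lost_sale_cost      fraction of item value lost when a keeper is blocked
    """
    cost_if_sold = p_return * missed_return_cost * item_value        # sell anyway
    cost_if_blocked = (1 - p_return) * lost_sale_cost * item_value   # intervene
    return cost_if_blocked < cost_if_sold

# With symmetric costs this reduces to the usual 0.5 threshold; skewed costs
# shift it, e.g. flagging an expensive item even at a 30% return probability:
print(flag_as_likely_return(p_return=0.3, item_value=200.0,
                            missed_return_cost=0.6, lost_sale_cost=0.2))
```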

The final code is in the 'final_code' directory and is split into a data cleaning script and a modelling script. All code was written in Python within the Atom / Hydrogen Jupyter notebook environment.

My efforts earned me position 4 of 118 in the class.

Kaggle Competition
