The Singapore (SG) Kaggle Machine Learning (ML) Challenge Meetup group organized the first Kaggle meetup in Singapore on Jan 9 2018.
At this meeting, attendees formed teams with like-minded data scientists to work on Kaggle challenges of interest. Over the following six weeks, each team will discuss its challenge, form a strategy, and implement it. The outcome will then be presented to the audience at the Data Science Evening (our second meeting).
The first meetup is off to a great start.
Here are some photos from the first event:
Updates:
- 2018-06-26:
  - SG Kaggle ML Group disbanded.
- 2018-11-12:
  - Fix broken link to the group's Meetup.com page.
  - Add photos from the first event.
We are Team 12 (BestFitting). Our team consists of:
- Puay Ni Yi (leader)
- Teh Guo Pei
- Cedric Chee
We are tackling Facial Keypoints Detection as our first Kaggle challenge.
This is a six-week project.
We will use this repo as the central location to host all the tutorials and solutions for the challenge.
Facial Keypoints Detection is a challenge in the field of Computer Vision. The techniques used to solve it usually come from Deep Learning, in particular Convolutional Neural Networks (CNNs).
The objective of this task is to detect and predict keypoint positions (locations) on face images. To learn more, take a look here.
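To make the task concrete, here is a minimal sketch of how one training sample might be prepared. It assumes details that are not stated in this README: that each face is a 96x96 grayscale image stored as a space-separated string of pixel values (0 to 255), and that keypoint coordinates are given in pixels. The names `parse_image` and `scale_keypoints` are hypothetical helpers for illustration, not code from our notebooks.

```python
import numpy as np

IMAGE_SIZE = 96  # assumption: images are 96x96 grayscale


def parse_image(pixel_string, size=IMAGE_SIZE):
    """Turn a space-separated pixel string into a (size, size) float array in [0, 1]."""
    pixels = np.array(pixel_string.split(), dtype=np.float32)
    if pixels.size != size * size:
        raise ValueError(f"expected {size * size} pixels, got {pixels.size}")
    return (pixels / 255.0).reshape(size, size)


def scale_keypoints(coords, size=IMAGE_SIZE):
    """Map pixel coordinates in [0, size] to the range [-1, 1]."""
    coords = np.asarray(coords, dtype=np.float32)
    half = size / 2.0
    return (coords - half) / half


if __name__ == "__main__":
    # Synthetic stand-in for one row of image data (a uniform gray image).
    fake_row = " ".join(["128"] * (IMAGE_SIZE * IMAGE_SIZE))
    image = parse_image(fake_row)
    print(image.shape)
    print(scale_keypoints([48.0, 96.0, 0.0]))
```

Scaling both the inputs and the targets to small ranges like this is a common choice when regressing coordinates with a neural network; the exact ranges are a design choice rather than a requirement.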
We are basing our tutorial on Daniel Nouri's blog post.
As we plan to implement our solution in TensorFlow, we will follow this tutorial by Alex Staravoitau. Alex's tutorial is based on the amazing tutorial by Daniel Nouri.
Dependencies/Libraries used:
- nolearn, a scikit-learn wrapper for Lasagne.
- Theano
- scikit-learn
- TensorFlow
- matplotlib
- pandas
- jupyter
- numpy
- Step 1 - install all dependencies:

      $ git clone https://github.com/cedrickchee/kaggle-facial-detection.git
      $ cd kaggle-facial-detection
      $ pip install -r requirements.txt
- Theano
  - Error `ValueError: You are tring to use the old GPU back-end. It was removed from Theano. Use device=cuda* now ...`. Solution: convert to the new GPU back end (gpuarray).
    - Either set the environment variable `THEANO_FLAGS='device=cuda'`,
    - or edit the Theano config file `~/.theanorc`:

          [global]
          device = cuda
  - Error `(theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.7 or higher required)`. To resolve this problem, install the `libgpuarray` Python library.
    - In the middle of this process, at the step where you install `pygpu` by running the command below, you will encounter a new error, `ModuleNotFoundError: No module named 'Cython'`. Work around this by installing Cython first with `pip install Cython`.

          $ python setup.py build
  - Error `ImportError: libgpuarray.so.3: cannot open shared object file: No such file or directory` when you try to `import pygpu`. See the GitHub thread discussing this problem and how to fix the shared object file error. Append the `/usr/local/lib` path to `LD_LIBRARY_PATH` in `.bashrc`:

        export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
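- nolearn, a scikit-learn wrapper for Lasagne
  - Error `ImportError: cannot import name 'downsample'` when trying to `import lasagne`. The cause of the problem is that the released Lasagne still imports Theano's `downsample` module, which newer Theano versions no longer provide. This can be solved by upgrading Lasagne to the development version:

        $ pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip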
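The checks above can be partly automated. The following is a small diagnostic sketch using only the Python standard library. The helper names are hypothetical, and it only inspects `THEANO_FLAGS` and `LD_LIBRARY_PATH` in the environment; it does not read `~/.theanorc`.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def theano_device(env=None):
    """Return the device named in THEANO_FLAGS, or None if it is not set there."""
    env = os.environ if env is None else env
    for flag in env.get("THEANO_FLAGS", "").split(","):
        key, _, value = flag.partition("=")
        if key.strip() == "device":
            return value.strip()
    return None


def lib_dir_on_path(lib_dir="/usr/local/lib", env=None):
    """Check whether lib_dir is listed in LD_LIBRARY_PATH (ignoring trailing slashes)."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return lib_dir.rstrip("/") in {entry.rstrip("/") for entry in entries if entry}


if __name__ == "__main__":
    print("Missing modules:", missing_modules(["Cython", "pygpu", "theano", "lasagne", "nolearn"]))
    print("THEANO_FLAGS device:", theano_device())
    print("/usr/local/lib on LD_LIBRARY_PATH:", lib_dir_on_path())
```

Because `find_spec` only locates modules without importing them, this reports what is installed but will not reproduce import-time failures such as the `libgpuarray.so.3` error.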
The Jupyter Notebook with Cedric's attempts to tackle the competition is in the `notebooks` folder.
- First model: a single hidden layer
  - A very simple neural network (NN).
- Second model: convolutions
  - Convolutional neural network (CNN) with data augmentation, learning rate decay and dropout.
- Third model: training specialists
  - A pipeline of specialist CNNs with early stopping and supervised pre-training.
- Fourth model:
  - ResNet-50 architecture and large scale training with methods from cutting-edge research such as 1cycle policy, super convergence, weight decay, batch normalization, dropout and data transformation.
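As an illustration of the 1cycle policy mentioned for the fourth model, here is a simplified sketch of the learning rate part of such a schedule: a linear warm-up to a peak followed by a linear decay. The function name, the default `div_factor` and `warmup_fraction`, and the omission of momentum cycling and any final annealing phase are assumptions made for brevity; this is not the code from our notebooks.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0, warmup_fraction=0.3):
    """Learning rate at `step` for a linear warm-up then linear decay schedule.

    The rate starts at max_lr / div_factor, rises linearly to max_lr, and then
    falls linearly back to max_lr / div_factor at the last step.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")

    min_lr = max_lr / div_factor
    last_step = total_steps - 1
    # Step index at which the peak learning rate is reached.
    peak_step = max(1, int(warmup_fraction * last_step))

    if step <= peak_step:
        progress = step / peak_step
        return min_lr + progress * (max_lr - min_lr)

    progress = (step - peak_step) / (last_step - peak_step)
    return max_lr - progress * (max_lr - min_lr)


if __name__ == "__main__":
    schedule = [one_cycle_lr(s, total_steps=9, max_lr=1.0, warmup_fraction=0.25) for s in range(9)]
    print([round(lr, 3) for lr in schedule])
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each step.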
Ranking on Leaderboard among 175 teams.
| Team Member | Private Score | Public Score | Best Model |
|---|---|---|---|
| Cedric | 1.96686 (26th place) | 2.15043 (16th place) | #3 |
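For context on the scores: to our understanding, this competition's leaderboard is ranked by root mean squared error (RMSE) between predicted and true keypoint coordinates, so lower is better. The metric is not stated in this README, so treat it as an assumption. The sketch below shows how such a score would be computed, using a hypothetical `rmse` helper.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up coordinates purely for illustration.
    print(rmse([66.0, 39.0, 30.5], [65.0, 37.0, 30.5]))
```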
We think there is still a lot of room to improve our leaderboard score, as we are still trying out new ideas and developing new techniques from them for our fourth and final model.