aster - a bot to write Kaggle baseline kernels

Aster is a Python-based bot (or module) capable of writing baseline starter kernels for competitions or datasets hosted on Kaggle. As of now, it works with two types of datasets: numerical datasets (with continuous and/or categorical columns) and text datasets with a single text/document field.

Key features

Can create kernels for both competitions and datasets
Can create kernels on datasets with binary or multi-class classification targets
Can create kernels on text datasets and numerical datasets
Performs Quick Exploration, Preprocessing, Feature Engineering, and Modelling
Changes the visuals according to the data: for example, it generates word clouds for text data and pair plots for numerical datasets
Uses a config to create new kernels

How Aster Works

Aster first reads the inputs given in the user's config and detects the types of columns present in the dataset. Based on this information, Aster dynamically chooses the most relevant code and text templates and appends them to the baseline kernel. For example, if the dataset belongs to the text classification category, Aster generates word clouds and skips correlation charts, pair plots, and categorical variable distributions. If the dataset is not a text classification dataset, Aster instead chooses templates such as distributions of categorical variables and missing value treatment.
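The selection step described above can be sketched roughly as follows. This is only an illustration of the idea, not Aster's actual implementation: the helper functions and the template names are hypothetical, and only the `_TAG` key and its `num` / `doc` values come from the config documented in this readme.

```python
# Illustrative sketch only: the helpers and template names below are
# hypothetical and are not taken from Aster's source code.

def infer_column_types(rows, target_col, id_col=None):
    """Label each feature column as 'numeric' or 'categorical'."""
    types = {}
    for col in rows[0]:
        if col in (target_col, id_col):
            continue  # the target and id columns are not features
        values = [row[col] for row in rows if row[col] is not None]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        types[col] = "numeric" if is_numeric else "categorical"
    return types


def choose_templates(config, column_types):
    """Pick the kernel sections to emit from the config tag and column types."""
    templates = ["load_dataset", "snapshot", "target_distribution"]
    if config.get("_TAG", "num") == "doc":
        # Text dataset: word clouds and TF-IDF, no correlation or pair plots.
        templates += ["wordcloud", "tfidf"]
    else:
        templates += ["missing_values", "correlations", "pairplot"]
        if "categorical" in column_types.values():
            templates += ["categorical_distributions", "label_encoding"]
    templates += ["train_test_split", "models"]
    return templates
```

With a Titanic-like sample, `infer_column_types` would mark a string column such as `Sex` as categorical, so `choose_templates` adds the categorical distribution and label encoding sections. A config with `"_TAG": "doc"` skips those sections and adds the word cloud section instead.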

Detailed table of contents

Aster creates the following contents, depending on the type of data.

1. Environment Preparation
2. Quick Exploration
     2.1 Load Dataset
     2.2 Dataset Snapshot and Summary
     2.3 Target Variable Distribution
     2.4 Missing Values
     2.5 Variable Types
     2.6 Variable Correlations
3. Preprocessing
     3.1 Label Encoding
     3.2 Missing Values Treatment
     3.3 Feature Engineering (text fields)
         3.3.1 TF-IDF Vectorizer
         3.3.2 Top Keywords - Wordcloud
     3.4 Train Test Split
4. Modelling
     4.1 Logistic Regression
     4.2 Decision Tree
     4.3 Random Forest
     4.4 ExtraTrees Classifier
     4.5 Extreme Gradient Boosting
5. Feature Importance
6. Model Ensembling
     6.1 A simple Blender
7. Creating Submission
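The outline lists "A simple Blender" without saying how it combines the models. One common reading is a weighted average of the class probabilities predicted by each model, which is sketched below. This is an assumption about what such a blender might do, not a description of the code Aster generates; the function names are hypothetical.

```python
# Illustrative sketch only: averaging class probabilities is an assumed
# blending strategy, and these function names are hypothetical.

def blend_probabilities(model_probs, weights=None):
    """Average per-class probabilities across models.

    model_probs: one entry per model; each entry is a list of rows,
    and each row is a list of class probabilities.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    n_rows = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    blended = []
    for i in range(n_rows):
        row = [
            sum(w * probs[i][k] for w, probs in zip(weights, model_probs))
            for k in range(n_classes)
        ]
        blended.append(row)
    return blended


def predict_labels(blended):
    """Return the index of the most probable class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]
```

In practice the per-model probabilities would come from the fitted classifiers in the Modelling section, for example their `predict_proba` outputs converted to lists.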

Usage: example 1

from aster.aster import aster

config = {
    "COMPETITION": "titanic",
    "_TARGET_COL": "Survived",
    "_ID_COL": "PassengerId",
}

ast = aster(config) # aster object with config 
ast._prepare() # prepare the kernel
ast._push() # push the kernel on kaggle

Usage: example 2

from aster.aster import aster

config = {
    "COMPETITION": "spooky-author-identification",
    "_TARGET_COL": "author",
    "_ID_COL": "id",
    "_TAG": "doc",
    "_TEXT_COL": "text",
}

ast = aster(config) # aster object with config 
ast._prepare() # prepare the kernel
ast._push() # push the kernel on kaggle

config examples

Aster uses the config and its key-value pairs to write kernels on different datasets. Not all keys are mandatory; most of them are optional. Check the following table.

Key	Example Value	Default	Optional/Mandatory	Definition
DATASET	iris	""	optional	Name of the dataset to be used
COMPETITION	titanic	""	optional	Name of the competition
_TARGET_COL	Survived	""	mandatory	target column name
_ID_COL	PassengerId	""	optional	id column name
_TRAIN_FILE	train	train	optional	name of the train file
_TEST_FILE	test	test	optional	name of the test file
_TAG	doc	num	optional (only for text)	doc : text dataset, num : numerical dataset
_TEXT_COL	text	""	optional (only for text)	name of the column containing text data
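To make the defaults in the table concrete, here is a minimal sketch of how a user config could be merged with them before a kernel is written. The key names and default values are copied from the table; the `resolve_config` helper and its checks are hypothetical and are not part of Aster's API.

```python
# Defaults copied from the table above. resolve_config is a hypothetical
# helper for illustration and is not part of Aster's API.
DEFAULTS = {
    "DATASET": "",
    "COMPETITION": "",
    "_TARGET_COL": "",
    "_ID_COL": "",
    "_TRAIN_FILE": "train",
    "_TEST_FILE": "test",
    "_TAG": "num",
    "_TEXT_COL": "",
}


def resolve_config(user_config):
    """Fill in default values and check the one mandatory key."""
    unknown = set(user_config) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config = {**DEFAULTS, **user_config}
    if not config["_TARGET_COL"]:
        raise ValueError("_TARGET_COL is mandatory")
    if config["_TAG"] not in ("num", "doc"):
        raise ValueError("_TAG must be 'num' or 'doc'")
    return config


# Example with the Titanic config from the first usage example.
resolved = resolve_config({
    "COMPETITION": "titanic",
    "_TARGET_COL": "Survived",
    "_ID_COL": "PassengerId",
})
print(resolved["_TRAIN_FILE"], resolved["_TAG"])
```

The sketch does not enforce a choice between `DATASET` and `COMPETITION`, since the table lists both as optional.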

Example Kernels generated by Aster

1. Binary Classification on Numerical Data - Competition Data

Titanic Baseline Kernel

2. Multi Classification on Text Data - Competition Data

Spooky Author Baseline Kernel

3. Classification - Non Competition Data

Iris Dataset
Diabetes Dataset
Mushrooms Dataset

Installation

Aster can be installed directly from GitHub using the following commands:

git clone https://github.com/shivam5992/aster.git
cd aster
python setup.py install

Future Work

Dynamic Code Selection Improvements
Add More Content
- Automated Feature Engineering
- Hyperparameter Tuning
Extend Datatypes
- Regression Problems - Numerical Data
- Image Classification
