
Supervised Machine Learning

Day 4 (AstroData Hack Week)


Lecturer: Josh Bloom (UC Berkeley; Wise.io, Inc.)

Lecture Slides (PDF) here

View the IPython notebook here

Outline

  1. What is machine learning?

    • Flavors and facets of machine learning
      • supervised, semi-supervised, clustering, ...
      • classification / regression
    • When to use it, when not to
    • scikit-learn
    • testing/validation sets, cross-validation
    • metrics: ROC, AUC, confusion matrix
  2. Regression

    • Linear regression
    • kNN
    • random forest

    [breakout: predict quasar redshifts from photometric data]

  3. Classification

    • SVM
    • random forest
    • deep learning

    [breakout: predict Star/Galaxy/QSO from photometric data]

  4. Improving your models

    • hyperparameter optimization
      • GridSearchCV
    • dealing with missing data
    • Feature selection / feature importance
    • feature engineering

    [breakout: redo Star/Galaxy/QSO from photometric data]

  5. Considerations in getting into production

    • multicore / multimachine
    • scikit-learn pipelines
    • Big-data machine learning: GraphLab, MLlib (Spark)
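
To make the cross-validation, kNN regression, and hyperparameter-optimization items in the outline concrete, here is a minimal, dependency-free sketch of a cross-validated grid search over the number of neighbors. It is not the lecture's code: the synthetic data, the helper names (`knn_predict`, `k_fold_indices`, `cross_val_mse`, `grid_search_k`), and the candidate values of `k` are all assumptions made for illustration. In practice scikit-learn's `GridSearchCV` automates this kind of loop.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cross_val_mse(xs, ys, k, n_folds=5, seed=0):
    """Mean squared error of kNN regression, averaged over the held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        # Train only on the samples that are NOT in the held-out fold.
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(ys)) if i not in held]
        squared = [(knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
                   for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


def grid_search_k(xs, ys, candidates, n_folds=5):
    """Score every candidate k on the same folds and return the best one."""
    scores = {k: cross_val_mse(xs, ys, k, n_folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical synthetic data: a noisy linear relation, not the SDSS samples.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

best_k, scores = grid_search_k(xs, ys, candidates=[1, 3, 5, 10, 25])
print("best k:", best_k)
for k, mse in sorted(scores.items()):
    print(f"k={k:2d}  cross-validated MSE={mse:.3f}")
```

Every candidate is scored on the same shuffled folds (the seed is fixed), so differences in the scores come from `k` rather than from the split.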

Notes/Setup

  1. Make sure you have the latest version (0.15) of scikit-learn

    • conda update scikit-learn
  2. Download some datasets locally

    • TBD
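
For setup step 1, the installed scikit-learn version can be checked from Python before the session. The following is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `parse_version`, `installed_version`, and `meets_minimum` are hypothetical, and the `0.15` threshold is simply the version named above.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '0.15.2' into (0, 15, 2); stop at the first non-numeric piece."""
    numbers = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
        if digits != piece:  # e.g. '15rc1': keep the 15, ignore the rest
            break
    return tuple(numbers)


def installed_version(package="scikit-learn"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(found, minimum="0.15"):
    """True if the found version is at least the minimum (numeric parts only)."""
    if found is None:
        return False
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.15' and '0.15.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


found = installed_version()
if found is None:
    print("scikit-learn is not installed")
elif meets_minimum(found):
    print("scikit-learn", found, "is recent enough")
else:
    print("scikit-learn", found, "is older than 0.15; run: conda update scikit-learn")
```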

Schedule

Time          What                  Materials
9:00-9:30     Arrival/Caffeinate    Coffee. Other performance-enhancing drugs.
9:30 - ...    TBD

Links

  • Scikit-learn: Machine Learning in Python

  • Josh's lectures on scikit-learn from his graduate seminar class:
