INSTRUCTOR
Hunter Jackson
COURSE OUTLINE
Each class will be an engaging, interactive session where we build tools together to make predictions about our data. Class time will focus on actually building the predictive tools; each class also comes with supplementary lecture notes that describe the methodologies in further detail, plus optional programming tasks for anyone who wants more practice.
I will be available at the Proscia office in Spark (Suite 300) 1 hour before each class (5 pm), and intermittently throughout the week in the Data Science channel of Betamore Academy on Slack.
We held the info session on 3/16 at Spark Baltimore. If you missed it, you can find details about the course here.
- Class intro: slides
- Class 1 Notes: slides
- A lil Python refresher: code
- A lil NumPy refresher: code
- First shot at working with data: code
- Build our own kNN: code
- In-depth machine learning (especially section 2.1): here
- Excellent article on the bias-variance tradeoff and Andrew Ng's machine learning course taught at Stanford
- The UCI ML repo containing data sets for practice
- Conditional probability explained visually. Check it out and play around -- it will help a ton for the next class.
- Class 2 Notes: slides
- scikit-learn docs: user guide, module reference, class documentation
- Linear Regression notes
- Logistic Regression notes
- Decision Trees notes
- JavaScript tree layout
- Ensemble Learning notes
- Caltech's ML course explaining bias-variance tradeoff
- Sensitivity v. Specificity video
- 30 second explanation of overfitting video
- Take home project and solutions for predicting survival on the Titanic.
- Class 3 Notes: slides
- Cluster analysis code
- Cluster analysis reading
- Limitations of k-means clustering code
- Sklearn guidance on clustering
- Using Pandora data to find user clusters, with this data and this code
- Twitter sentiment visualization
- Stock Predictions project using 7 months of data, including Twitter sentiment, volume, and stock price, to build a model that predicts forward returns.
- Eigenvalues and eigenvectors explained visually. Check it out and play around -- it will help a ton for the next class.
- Class 4 Notes: slides
- Principal component analysis for Iris code
- Setting up PCA code
- Dimensionality reduction notebook
- PCA visually explained
- PCA step-by-step here
- Linear discriminant analysis explained
- Map-reduce code
- What every data scientist needs to know about SQL
- Stanford mini-series on databases
- Practice SQL from browser
- Queries explained
- Work with databases in the context of machine learning code, solutions
- SQL notebook
- NoSQL/MapReduce notebook
- Here's a reading by Paul Graham, cofounder of Y Combinator.
- Class 5 notes: slides
- Text-mining notebook
- NLP code
- Small NLP lab code
- Recommendation code
- Netflix Prize
- Why Netflix never implemented the winning solution here
- Columbia's MOOC on NLP here
- Link to Dr. Eisner's NLP course at Hopkins
- spaCy as a framework for production-ready NLP tools
- Class 6 notes: slides
- Using PyBrain to create a neural network code
- Building our own neural network from scratch code
- Check out Google's Deep Dream Generator here
- Why the hell does Google's DDG hallucinate in dog faces?! here
- Step-by-step backpropagation
- Understanding activation functions
- Get yourself set up with EC2 here
- TensorFlow because everything Google becomes the default eventually
- Installing TensorFlow on AWS EC2 with GPU support here
- Course Recap notes: slides
- Tons of additional resources here
- Always stay up to date on Kaggle
- Try implementing some of our models from scratch like Joel does here
- Stay tuned at Betamore for the next data science class!
- Thank you all so much for this wonderful experience. I had a blast and very much hope you all did as well!