In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

# Datacourse Index

## Daily / Weekly Format:


The main parts of the course are the lecture and miniprojects.

- **Lecture:** We'll work through the course material each day at 12 PM EST.  Friday is interview practice day.

- **Miniprojects:** Each week's action items are due *by Saturday night*.  The consensus among prior Fellows was that working on miniprojects exposed them to concepts that a single large project would not have.  Fellows who did the miniprojects also had a better chance of getting placed.

We encourage the posting of advice/issues/resolved problems on the wiki in the [subpages](https://sites.google.com/a/thedataincubator.com/the-data-incubator-wiki/course-information-and-logistics/course) of the Course section.


### Interview Practice

We'll do a little bit of practice with interviewing on Fridays.  Here are some notes:
- [Programming_Questions](FO_Programming_Questions.ipynb)
- [Statistics_Questions](FO_Statistics_Questions.ipynb)
  
**WARNING:** *It's tempting to just "discuss the problems in a group."  The main point of practicing for interviews is practicing and getting familiar with thinking on your feet with no guidance.  "Discussing problems in a group" is not at all useful for this.  We suggest that everyone should solve the problem on their own, timed (give yourself 5 minutes), preferably while standing.  You want to feel as comfortable with this as possible because you will be much more nervous when you interview.  You can discuss the problems afterwards.*
    

### Capstone Projects


**Ghosting** your project and loading your data:
1. Decide on the basic format of your presentation (website, `ipynb`, etc.).  Build a bare-bones version of your website / `ipynb` with some fake data that stand in for the results you think you'll get.  Your work should tell a story, and all the supporting points (there should be at most 3) need to reinforce this story.  Think about how you would present your results.  What kinds of graphs or visualizations would you need?  Plan your analysis accordingly.
1. Once you know the destination, it's a lot easier to lay the groundwork.  That means setting up SQL tables, getting pandas dataframe code loaded, and joining appropriate tables so that you can start working.
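
As a rough illustration of that groundwork, here is a minimal sketch that loads two small tables into SQL, joins them, and pulls the result back into a pandas dataframe.  Everything in it is an assumption made for the example: the table names (`restaurants`, `reviews`), the column names, the fake values, and the choice of an in-memory SQLite database.  Your own project will have its own schema and data store.

```python
import sqlite3

import pandas as pd

# Hypothetical fake data standing in for the results you expect to get.
restaurants = pd.DataFrame({
    "restaurant_id": [1, 2, 3],
    "name": ["Cafe A", "Diner B", "Bistro C"],
})
reviews = pd.DataFrame({
    "restaurant_id": [1, 1, 2, 3],
    "stars": [4, 5, 3, 2],
})

# An in-memory SQLite database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
restaurants.to_sql("restaurants", conn, index=False)
reviews.to_sql("reviews", conn, index=False)

# Join the two tables and aggregate, then load the result into pandas.
query = """
SELECT r.name, AVG(v.stars) AS avg_stars, COUNT(*) AS n_reviews
FROM restaurants AS r
JOIN reviews AS v ON v.restaurant_id = r.restaurant_id
GROUP BY r.name
ORDER BY r.name
"""
summary = pd.read_sql_query(query, conn)
conn.close()

print(summary)
```

Once a joined dataframe like `summary` exists, you can swap the fake rows for real data without changing the rest of your presentation.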

Capstone projects are an important part of the program.  Doing a capstone project has several advantages:
1. You can be invited to pitch your project to employers at our employer events.
1. Some employers judge Fellows based on their projects, which can result in invitations to pitch to those employers onsite.
1. In the past, completing a project has consistently resulted in multiple (and higher) offers for Fellows.

We're going to be doing 1-2 min screencasts each week.  The format of the video should be roughly:

1. Introduce yourself and say 1-2 sentences about your academic background.  ("Hi, I'm Michael.  I got my PhD at Princeton in Applied Math.")
2. Introduce your project for The Data Incubator.  Why is it important to businesses?  ("My Data Incubator project uses machine learning to identify the cutest cat videos on the internet").
3. Talk through two to three "analytical" or "engineering" points that were *interesting*.  Great lines feel like this:
    - We looked at 5 million user reviews collected over 4 years on Yelp.
    - I'm using MongoDB, a NoSQL data store, to make my webpage load faster.
    - I used cross-validation to train a Random Forest for my `Citibike` demand predictions.
    - I realized that restaurant category tagging was inconsistent and I used a Naive Bayes model on tip text to fill in missing categories.

Be sure to be very visual in your explanations (basically, if it can be graphed, do it).  The format should be you showing off your website (if you have one) and code.

**Sample Screencasts:**
- Here's one that's thematically similar, discussing building a `reddit` recommender: https://www.youtube.com/watch?v=lGXQ8mQMR0s
- Some more from the same Harvard class (CS 109) about [the flu](https://www.youtube.com/watch?v=_eUDtxGzOMo), [predicting salaries](https://www.youtube.com/watch?v=9odQde25oSQ), [predicting NBA games](https://www.youtube.com/watch?v=HlF6eXJ4UgQ), [Twitter response to the Boston Marathon bombing](https://www.youtube.com/watch?v=IbXRxmNn-Jk)

**Resources:**
1. You can use your camera to record the introduction of yourself and your project.
1. You can do a screencast (recording your audio and your desktop as you scroll through demoing things) for the demo portion.  Here are instructions for doing it on [OSX](http://thenextweb.com/apple/2011/01/15/how-to-record-quick-easy-screencast-videos-with-mac-osx/).
1. [Recommended Project Structure](Project_Structure.ipynb)

## Overview for Week 1:


**Goal:** Developing familiarity with basic Python tools and getting practice with them for data analysis

- [NumPy and SciPy](../data-wrangling/DW_Numpy_Scipy.ipynb)
- [Matplotlib](../data-wrangling/DW_Matplotlib.ipynb)
- [Pandas](../data-wrangling/DW_Pandas.ipynb)
- [Scraping](../data-wrangling/DW_Scraping.ipynb)
- [APIs and JSON](../data-wrangling/DW_APIs_and_JSON.ipynb)
- [Advanced SQL](../data-wrangling/DW_Advanced_SQL.ipynb)
- [Good Engineering Practice](../data-wrangling/DW_Good_Engineering_Practice.ipynb)

**Optional Topics:**

- [Handling Strings in Python](../data-wrangling/DW_Dealing_with_Strings.ipynb)
- [Iterators and Generators](../data-wrangling/DW_Iterators_Generators_and_Coroutines.ipynb)
- [What Technology to Use](../data-wrangling/DW_What_Technology_to_Use.ipynb)


## Overview for Week 2:


**Goal:** Developing familiarity with machine learning and the Python tooling around it

- [Intro Machine Learning](../machine-learning/ML_Introduction.ipynb)
- [Overfitting](../machine-learning/ML_Overfitting.ipynb)
- [Regression](../machine-learning/ML_Regression.ipynb)
- [Unsupervised Learning](../machine-learning/ML_Unsupervised_Learning.ipynb)
- [Scikit-learn Workflow](../machine-learning/ML_Scikit_Learn_Workflow.ipynb)

**Optional Topics:**
- [K Nearest Neighbors](../machine-learning/ML_K_Nearest_Neighbors.ipynb)

## Overview for Week 3:


**Goal:** Data Science in Production
- [Bash](../production/PR_Bash.ipynb)
- [git](../production/PR_git.ipynb)
- [SSH](../production/PR_SSH.ipynb)


## Overview for Week 4:


**Goal:** Thinking about and practicing data visualization
- [Visualization Theory](../viz/VZ_Perception_and_Data_Visualization.ipynb)
- [Layout & Design](../viz/VZ_Layout_and_Design.ipynb)
- [Exploratory Visualization](../viz/VZ_Exploratory_Visualization.ipynb)
- [JavaScript in Python](../viz/VZ_JavaScript_In_Python.ipynb)
- [JavaScript Primer](../viz/VZ_JavaScript_Primer.ipynb)
- [JavaScript and the DOM](../viz/VZ_JavaScript_The_DOM.ipynb)
- [Explanatory Visualization with D3](../viz/VZ_D3js_Explanatory_Visualization.ipynb)


### Optional topics

- [Visualizing Large / High Dimensional Data sets](../viz/VZ_Exploratory_Visualization_of_Large_Datasets.ipynb)
- [Advanced D3](../viz/VZ_D3js_Advanced_Topics.ipynb)

**Action Item (ungraded):** Update your 12-day project to include real-time interactivity such as sliders or drop-down menus. Feel free to use different data sets (e.g. Yelp reviews) to make something interesting and attractive.

## Overview for Week 5:


**Goal:** More advanced topics in machine learning and statistics

- [Support Vector Machines](../advanced-machine-learning/AM_Support_Vector_Machines.ipynb)
- [Decision Trees and Random Forests](../advanced-machine-learning/AM_Decision_Trees_and_Random_Forests.ipynb)
- [Choosing ML Algorithms](../advanced-machine-learning/AM_Choosing_ML_Algorithms.ipynb)
- [Natural Language Processing](../advanced-machine-learning/AM_Natural_Language_Processing.ipynb)
- [Time Series](../advanced-machine-learning/AM_Time_Series.ipynb)

**Optional Topics:**

- [Outlier Detection](../advanced-machine-learning/AM_Outlier_Detection.ipynb)
- [Recommendation Engines](../advanced-machine-learning/AM_Recommendation_Engines.ipynb)
- [Unbalanced Classes](../advanced-machine-learning/AM_Unbalanced_Classes.ipynb)
- [Digital Signals](../advanced-machine-learning/AM_Digital_Signals.ipynb)

## Overview for Week 6:


**Goal:** Introduce Spark and the Python/Scala APIs; get a feel for a production data workload
- [Intro to PySpark](../spark/SP_Python_API.ipynb)
- [PySpark DataFrames](../spark/SP_Python_DataFrames.ipynb)
- [PySpark MLlib](../spark/SP_Python_MLlib.ipynb)
- [Creating Spark Applications](../spark/SP_Creating_Applications.ipynb)
- [Spark Advanced Topics](../spark/SP_Python_Advanced_Topics.ipynb)


### Scala versions

- [Scala Primer](../spark/SP_Scala_Primer.ipynb)
- [Intro to Scala Spark](../spark/SP_Scala_API.ipynb)
- [Scala DataFrames](../spark/SP_Scala_DataFrames.ipynb)
- [Scala MLlib](../spark/SP_Scala_MLlib.ipynb)


### Optional topics

- [Intro to SparkR](../spark/SP_R_API.ipynb)
- [Spark Streaming](../spark/SP_Scala_Streaming.ipynb)
- [PySpark Analysis Demo](../spark/SP_Python_Tweet_Analysis.ipynb)
- [Scalding](../spark/SP_Scalding.ipynb)

## Overview for Week 7:


**Goal:** Thinking more broadly about data and interview prep
- [Hypothesis Testing](../thinking-outside-data/DB_Hypothesis_Testing.ipynb)
- [Thinking Outside the Data](../thinking-outside-data/DB_Thinking_Outside_the_Data.ipynb)
- [Case Studies](../thinking-outside-data/DB_Case_Studies.ipynb)

**Optional Topics:**
- [Algorithms and Data Structures](../thinking-outside-data/DB_Algorithms_and_Data_Structures.ipynb)
- [Statistics](../thinking-outside-data/DB_Statistics.ipynb)
- [Personal Interview Questions](../thinking-outside-data/DB_Personal_Interview_Questions.ipynb)

## Overview for Week 8:


**Goal:** Deep learning and TensorFlow
- [Intro to TensorFlow](../tensorflow/TF_Intro_to_Tensorflow.ipynb)
- [Iterative Algorithms](../tensorflow/TF_Iterative_Algorithms.ipynb)
- [Basic Neural Networks](../tensorflow/TF_Basic_Neural_Networks.ipynb)
- [Deep Neural Networks](../tensorflow/TF_Deep_NN.ipynb)
- [Convolutional Neural Networks](../tensorflow/TF_Convolutional_NN.ipynb)
- [Recurrent Neural Networks](../tensorflow/TF_Recurrent_Neural_Networks.ipynb)


### Optional Topics:

- [Variational Autoencoders](../tensorflow/TF_Variational_Auto_Encoders.ipynb)
- [Adversarial Noise](../tensorflow/TF_Adversarial_Noise.ipynb)
- [DeepDream](../tensorflow/TF_DeepDream.ipynb)
- [Optimization](../tensorflow/TF_Optimization.ipynb)


*Copyright &copy; 2019 The Data Incubator.  All rights reserved.*