
# Terms of reference for capstone project


## Capstone Process
- Define your problem statement.
- After articulating your problem statement, outline your goals and success criteria.
- Describe 1-2 potential datasets that address your problem statement. Identify the source and the format of your dataset(s).
- Identify a potential audience of stakeholders who may be interested in your findings.
- Solve your problem!
- Create a 12-15 minute presentation slide deck. This slide deck should be accessible to a wide audience - especially since you'll likely be the only subject-matter expert in the room. However, you'll also want to include details so that we understand your thought process and how well you were able to solve your problem.
- Be prepared to discuss and defend your work... from your choice of dataset to your model-building decisions to your conclusions. They're all fair game!
- Include your slides in your portfolio.
- Create at least one blog post about your findings.

## How to choose a project

Choose something that:
- Is personally interesting
- Is challenging
- Involves data sources and thorough EDA
- Uses machine learning
- Has a measurable impact
- Demonstrates knowledge
- Will appeal to employers

## Topic considerations

Whatever you ultimately choose for your capstone is up to you. The capstone is an opportunity to demonstrate, in both your work and your presentation, the knowledge and growth you have gained over twelve weeks of a very intense class.

Consider the capstone also as an opportunity to present to potential employers in job interviews when they ask you to _"tell me about a project that you've done recently"_. This is the perfect opportunity to talk about your capstone in great detail. Hitting as many topics as possible in your project opens up job-interview discussions in many ways:

- It allows an employer to discover your areas of strength.
- It allows you to (hopefully excitedly!) discuss an area of interest.
- It hopefully opens the door for broader conversations about your struggles, problem-solving abilities, and your grit.

Building a neural network model for something obscure and cool may be great, but we strongly recommend choosing something that exercises a broader range of skills:

- Data collection
- Data munging
- EDA
- Feature engineering
- Modeling / machine learning
- Model evaluation
- Interpretation
- Visualizing and communicating results

Be prepared to discuss why the models you chose make sense, and how well your data suit them given your goal.

## The Data

Finding data for your project is the single most difficult problem you will face. Your ability to tell a story, build predictive models, or even brainstorm depends on the actual dataset you use.

You may not know what you can do until you get a good set of data.  It's a good idea to look around for datasets as early as possible.  Completing your capstone will be entirely contingent upon this data - your data sets the maximum and minimum for what you can achieve.

### Avoid these things:
- "I can't do anything until I scrape more from ______"
- Not having a plan
- Choosing a plan of action that has too many gating factors
- Waiting until the last minute for help

### Do these things:
- Be realistic
- Pare down your scope if scope creep sets in
- Ask for help / advice
- Have backup ideas
- Set explicit timelines with flexibility included

## Presentation Guidelines

When presenting analytical topics to a general audience (including technical), keep the following guidelines in mind:

- State your problem as clearly, as concisely, and as simply as possible.
- Make sure your problem is measurable and framed in factual terms.
- Present simple to complex, always in that order.
- Err on the side of simplicity vs. complexity with design.
- Use slides to milestone your talking points.
- For timing: complex slides need ~2-3 minutes each. Simple slides need ~1 minute.
- Title your plots with succinct conclusions.
- Highlight your inference when presenting multiple outputs.
  - That is, point out what’s interesting in your graphs.
- Be sure to visualize, but make sure your visualizations make sense!
- PRACTICE! Time yourself. It takes a half hour and will help immensely.
- Do not say "I meant to do this, but I didn't have time" or "Sorry this score is so bad."
- **SUMMARIZE, SUMMARIZE, SUMMARIZE!** Always wrap up your presentation with your problem, your method, and your solution.

## Capstone Milestones  

> #### Materials must be submitted in a clearly labeled Jupyter notebook, including:
> 1. Markdown writeups, code, and visualizations.
> 2. Materials must be submitted via form with link.
> 3. Materials must be submitted by the deadlines specified.

> #### At minimum, one instructor will review each submission (provided things are submitted on time). Depending on timing, we may have time for a "roundtable discussion" among the class for people who. We also encourage you to schedule 1:1 meetings with us to discuss these.


## 1. Capstone Ideas
***
_Week 8: 7:00 p.m., Wednesday, August 9_

This milestone requires you to have a few ideas documented, including:

> - Description of goals
> - Reference datasets

## 2. Proposals
***
_Week 10: 7:00 p.m., Wednesday, August 23_

#### Deliverables:

> - Description of project
> - Project goals
> - Hypothesis / Problem statement
> - Literal dataset(s)

## 3. EDA Report
***
_Week 11: 7:00 p.m., Wednesday, August 30_

This will be an overview of your approach with a well-articulated summary that includes your problem statement, outlines your proposed methods and models, defines any risks & assumptions, and includes any revisions from your initial goals & criteria, as needed.

> - Articulate “Specific aim”
> - Outline proposed methods and models
> - Define risks & assumptions
> - Revise initial goals & success criteria, as needed
> - Create local database
> - Describe data cleaning/munging techniques
> - Create a data dictionary
> - Perform & summarize EDA

**Goal**: A summary notebook outlining your project's goals, methods, models, and EDA.
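
As a rough illustration of the "create local database" and "create a data dictionary" items above, the sketch below loads a few records into SQLite and derives a minimal data dictionary from the table schema. The `customers` table, its columns, the sample rows, and the `build_data_dictionary` helper are all hypothetical; your own dictionary should also carry a human-written description of each field.

```python
import sqlite3

# Hypothetical sample records standing in for your real dataset.
rows = [
    ("A-001", 34, 52000.0),
    ("A-002", None, 61000.0),
    ("A-003", 45, None),
]

# ":memory:" keeps this sketch self-contained; pass a file path instead
# (for example "capstone.db", a hypothetical name) to persist the data locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, age INTEGER, income REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
conn.commit()


def build_data_dictionary(conn, table):
    """Return one entry per column: name, declared type, and missing-value count."""
    dictionary = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for cid, name, col_type, *rest in columns:
        missing = conn.execute(
            f'SELECT COUNT(*) FROM {table} WHERE "{name}" IS NULL'
        ).fetchone()[0]
        dictionary.append({"column": name, "type": col_type, "missing": missing})
    return dictionary


data_dictionary = build_data_dictionary(conn, "customers")
for entry in data_dictionary:
    print(entry)
```

Counting missing values per column is only one possible starting point for EDA; ranges, units, and allowed categories are natural additions once you know your data.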

#### Bonus
- Explain how you intend to evaluate your results. What tuning metric(s) and evaluation approaches do you intend to use?
- Identify 1-2 additional datasets that may help to bolster or challenge your findings. How might these relate to your data?
- Create a blog post of at least 500 words (and 1-2 graphics!) that describes your assumptions and processes for EDA. Link to it in your Jupyter notebook.
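
One way to approach the evaluation question above is to decide on your metrics early and compare them against a naive baseline. The sketch below assumes a binary classification problem; the labels, predictions, and the `classification_metrics` helper are hypothetical. In practice, libraries such as scikit-learn provide equivalent metric functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical held-out labels and model predictions.
y_test = [1, 0, 0, 1, 0, 0, 1, 0]
model_pred = [1, 0, 0, 1, 0, 1, 0, 0]

# Naive baseline: always predict the most common class in the test labels.
majority = max(set(y_test), key=y_test.count)
baseline_pred = [majority] * len(y_test)

model_scores = classification_metrics(y_test, model_pred)
baseline_scores = classification_metrics(y_test, baseline_pred)
print("model:   ", model_scores)
print("baseline:", baseline_scores)
```

Reporting the baseline next to your model makes it easier to show how much the model actually adds, which matters most when the classes are imbalanced.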

## 4. Technical First Draft
***
_Week 12: 7:00 p.m., Tuesday, September 5 (day after Labor Day)_

**Your work should document findings for peers and technical stakeholders, including:**

> - Executive Summary
> - Identification of outliers
> - Description of how you defined your variables
> - Discussion of model selection and implementation
> - Description of any data pipeline(s)
> - Visualizations & statistical analysis
> - Interpretation of findings & relation to goals/success metrics
> - Description of any source code used to conduct analysis
> - Stakeholder recommendations & next steps for model/peers
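
For the "identification of outliers" item above, one common starting point is Tukey's interquartile-range (IQR) rule. This is a minimal sketch, not a required method: the `daily_sales` values and the `iqr_outliers` helper are hypothetical, and the conventional multiplier of 1.5 is an assumption you may want to revisit for your data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (outliers, lower_fence, upper_fence) using Tukey's IQR rule."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, lower, upper


# Hypothetical daily sales figures with one suspicious value.
daily_sales = [102, 98, 110, 95, 105, 99, 101, 430, 97, 104]
outliers, lower, upper = iqr_outliers(daily_sales)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
```

Flagging a value is not the same as removing it. In your writeup, explain whether each outlier is a data error or a genuine observation, and how you handled it.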

#### Bonus

- Describe how you could continue to validate your model's performance over time
- Explain how you would deploy your model in a production environment
- **Create a blog post of at least 500 words explaining your overall approach, model implementation, specific analysis, findings, and lessons learned. Link to it in your Technical notebook.**

## 5. Final Capstone Submission
***
_Week 12: 9:00 a.m., Thursday, September 7_

We'll cover the explicit deliverables in more detail later, but the final submission will be a polished version of the technical first draft due earlier in the week.