# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Data Science: Capstone Project

The Capstone Project is divided into five deliverables, each building on previously learned skills to scaffold students' learning over the entire course. Project deliverables include objectives, requirements, rubrics, and suggested resources, all of which tie into the overall competencies for this course.



### **[Capstone, Part 1: Pitch + Problem Statement](./part-01/readme.md)**

Pitch us on potential ideas for a data-driven project. Think of topics you’re passionate about, knowledge you’re familiar with, or problems relevant to industries you’d like to work with. What questions do you want to answer?
- **Requirements:** Lightning talk with 2-3 topics, including a problem statement, potential audience, goals, and success metrics, as well as possible data sources for each. Remember, if you can’t find data, you can’t do your project.
- **Format:** Slide deck
- **Due:** Week 7, Friday, 16th Dec


### **[Capstone, Part 2: Dataset + Data Collection](./part-02/readme.md)**

Use your newfound skills to source and collect the relevant data for your project. Data acquisition, transformation, and cleaning are typically the most time-consuming parts of data science projects, so don't procrastinate!

- **Requirements:** Source and format the data for your project. Perform preliminary data munging and cleaning of the data relevant to your project goals. Describe your data, keeping the intended audience of your final report in mind.
- **Format:** Table, file, or database with relevant description in a Jupyter Notebook
- **Due:** V1 by the end of week 9, Friday, 13th Jan. All data collection completed by Monday, 16th Jan.
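
As a rough illustration of the preliminary munging and cleaning mentioned above, the sketch below tidies a small CSV using only the Python standard library. The function name `clean_rows`, the `MISSING_MARKERS` set, and the sample data are all hypothetical; your own project will likely need different rules (and you may prefer a library such as pandas).

```python
import csv
import io

# Assumption: these strings mean "no value" in the raw data.
MISSING_MARKERS = {"", "na", "n/a", "null", "-"}


def clean_rows(csv_text):
    """Parse CSV text, tidy headers and cells, and map missing markers to None.

    Assumes every row has the same number of fields as the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        tidy = {}
        for key, value in row.items():
            # Normalize header names: trim, lowercase, snake_case.
            name = key.strip().lower().replace(" ", "_")
            # Trim whitespace around each cell.
            cell = (value or "").strip()
            tidy[name] = None if cell.lower() in MISSING_MARKERS else cell
        cleaned.append(tidy)
    return cleaned


# Made-up sample with messy headers, stray spaces, and missing values.
raw = "Name, Age ,Home City\nAda, 36 ,London\nGrace,N/A, \n"
rows = clean_rows(raw)
for row in rows:
    print(row)
```

Keeping the cleaning rules in one small function makes it easier to describe, in your notebook, exactly what was changed between the raw and cleaned data.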


### **[Capstone, Part 3: EDA + Preliminary Analysis](./part-03/readme.md)**

Begin quantitatively describing and visualizing your data. With rich datasets, EDA can go down an endless number of roads. Maintain perspective on your goals and scope your EDA accordingly. Managing your own time is a critical skill in analysis projects.  Keep notes on your approach, results, setbacks, and findings.

- **Requirements**: Perform initial descriptive and visual analysis of your data. Identify outliers, summarize risks and limitations, and describe how your EDA will inform your modeling decisions.
- **Format:** Jupyter Notebook
- **Due:** End of week 10, Friday, 20th Jan
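
One common way to identify outliers during EDA is the interquartile range (IQR) rule. The sketch below is a minimal standard-library version, not a prescribed method for this project; the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample values are illustrative assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample: one value sits far from the rest.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 95]
print(iqr_outliers(sample))
```

Flagging a point is only the first step: whether to keep, cap, or remove it depends on your goals, and that decision belongs in your notes on risks and limitations.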


### **[Capstone, Part 4: Findings + Technical Report](./part-04/readme.md)**

Share your technical findings with your fellow data scientists. Explain your goals, describe modeling choices, evaluate model performance, and discuss results. Data science reporting is technical, but don’t forget that you should tell a compelling story about your data.

- **Requirements**: Summarize your goals and metrics for success, variables of interest, and removal of any outliers or data imputation. Your process description should be concise and relevant to your goals. Summarize statistical analysis, including model selection,  implementation, evaluation, and inference. Be convincing – justify all important decisions! Clearly label plots and visualizations. Include an Executive Summary.
- **Format:** Jupyter Notebook
- **Due:** End of week 11, Friday, 27th Jan


### **[Capstone, Part 5: Presentation + Non-Technical Summary](./part-05/readme.md)**

Take your findings and share a presentation that delivers the most important insights from your project to a non-technical audience. Tell us the most interesting story about your data. Break down your process for a novice audience. Make sure to include compelling visuals. Time is short, so be sure to practice and include only the most relevant components of your project.

- **Requirements**: Convey your goals, limits/assumptions, methods and their justification, findings, and conclusions. Define technical terms. Include graphics and visualizations.
- **Format:** Interactive graphic presentation, website, or slide deck.
- **Due:** End of week 12, 1 - 3 Feb 



# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Part 1: Pitch + Problem Statement

## Overview

In the field of data science, good projects are **practical**. Your capstone project should be manageable and affect a real-world audience. This might be a domain you are familiar with, a particular interest you have, something that affects a community you are involved in, or an area that relates to a field you wish to work in.

One of the best ways to test ideas quickly is to share them with others. A good data scientist has to be comfortable discussing ideas and presenting to audiences. That's why for Part 1 of your Capstone project, you'll be preparing a lightning talk on some potential interest areas and datasets.

This deliverable will provide you with guidance to help you select an awesome topic and begin to build a polished Capstone project. 

**Goal**: Host a lightning talk presentation describing *at least two* project proposals, including associated data, goals, audiences, and metrics.

---

## Requirements
1. Prepare a slide deck and host a 3-5 minute lightning talk on **at least two** potential topics for your DSI capstone project. For each topic, define **all** required areas:

2. Topic 1
   - Problem Statement
   - Potential Audience 
   - Goals
   - Success Metrics
   - Data Source(s)
   
3. Topic 2
   - Problem Statement
   - Potential Audience 
   - Goals
   - Success Metrics
   - Data Source(s)

#### BONUS
4. Beyond the two required topics, what other potential topics might you explore (for three or more in total)?
5. For all datasets, identify their source, format, and necessary action items to obtain or access them.
6. Create a blog post of at least 500 words (and 1-2 graphics!) that describes your project idea, data, and audience. Link to it in your presentation appendix.
 
 ***Remember, if you can't find data to support your topic, then you can't move forward.***

---

## Deliverable Format & Submission

- Slide Deck & Presentation

---

## Suggested Ways to Get Started

**Begin by Asking:**
- What is the scope of the need or problem I wish to investigate?
- Who is this for? Who is impacted or affected by this data? Who would benefit from this model?
- What are my goals for this investigation?
- What does success look like? How will I know if my model performs well?
- Where will I find data for this project? Is the data available?
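
To make the "what does success look like" question concrete, one option for a classification problem is to compare your model against a naive baseline that always predicts the most common class. The sketch below assumes accuracy is the chosen metric; the function names and the labels are made up for illustration, and your own success metric may well differ.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true labels and model predictions.
y_true = ["stay", "stay", "churn", "stay", "churn", "stay", "stay", "stay"]
y_pred = ["stay", "stay", "churn", "stay", "stay", "stay", "stay", "stay"]

print("baseline:", baseline_accuracy(y_true))
print("model:", accuracy(y_true, y_pred))
```

A success metric stated relative to a baseline ("beat the majority-class accuracy by a meaningful margin") is usually easier to defend in a pitch than a bare target number.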

**For the Bonus, Ask:**
- What format is the data in? What specific steps do I need to take to access it?
- How will I explain this project to outside audiences?

**Other Tips:**
- For your first potential topic, start with an idea, then look for data that could support it.
- For your second potential topic, reverse the process: look for interesting data, then work out which problems it could solve and which audiences it could affect.

---

## Useful Resources

- [How to find the data you need.](http://flowingdata.com/2009/10/01/30-resources-to-find-the-data-you-need/)
- [How to give a good lightning talk.](https://www.semrush.com/blog/16-ways-to-prepare-for-a-lightning-talk/)

---

## Project Feedback + Evaluation

[Attached here is a complete rubric for this project.](./capstone-part-01-rubric.md)

Your instructors will score each of your technical requirements using the scale below:

Score  | Expectations
--- | ---
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_



[DSI Projects](https://gallery.generalassemb.ly/DS?metro=)

[Yves' project](https://medium.com/@yves.jacquot/predicting-tornado-magnitude-with-machine-learning-c76df84d7872)

[Ricky's project](https://towardsdatascience.com/another-twitter-sentiment-analysis-bb5b01ebad90)

[Tim's project](https://github.com/timajwilliams/GA_Capstone)

[Toby's project](http://tobyjdore.pythonanywhere.com)

[Sam's project](https://kitsamho.github.io)

[Richard's project](https://towardsdatascience.com/online-poker-whens-the-money-a22fe3fa7a6)

[Sangeetha’s project](https://towardsdatascience.com/should-the-order-of-book-reviews-be-personalised-9d62ebf9ba33)

[Steven’s project](https://towardsdatascience.com/which-translator-870bae18f3bf)

[Veronica’s project](https://towardsdatascience.com/formula-1-race-predictor-5d4bfae887da)

[Abhishek’s project](https://medium.com/@abhiagar/using-insurance-claims-data-to-predict-poor-health-outcomes-dc36fcad7f62)

[Adam's project](adsglass/NGram-ErrorDetection)

[Blae's project](https://github.com/blaequayle/General-Assembly-Capstone)

[Boris' project](https://github.com/boris-gulevich/project-stackoverflow)

[Catriona's project](https://github.com/catriona-reader/Data-Science-Capstone)

[Joseph's project](https://github.com/Joseph-A-Pearson/AirBnb_Analysis)

[Leo's project](https://github.com/LEO-E-100/custom_news)

[Mai's project](https://github.com/mai-u/capstone_project)

[Jewel's project](https://github.com/jewelbritton/Pedestrian-Footfall-Melbourne)
