## Capstone Prompts

1. Eyeballs, Clicks, & Posts: An Exploration in Marketing: collect and organize data on TKH (& organizations LIKE TKH), and analyze which social media posts result in the HIGHEST outside engagement.

2. Deja Vu: Job Statistics, the Economy, & You: collect and organize data on post-COVID tech job postings (not just data analysts!) to figure out the scope of hiring, which salaries are offered, and which skills are in demand.

3. Guardians of the API Key Vol. 1: Data & Cybersecurity: collect & analyze datasets on web intrusions, phishing attacks, AWS hacks & more to figure out where & how attacks will appear for organizations LIKE TKH. Potentially collab with the cyber track?

4. Data Science in the Multiverse of Tech-Breakthrough: collect & analyze datasets that explore bootcamp-style learning environments. What salary & education outcomes do we find? Potentially collab with the “Job Statistics” group?

Regardless of prompt, the following requirements must be met for this project:

* A comprehensive & succinct presentation
* A comprehensive README
* A GitHub repository containing your project
* Exploratory Data Analysis
* A local or cloud database
* Extract-Transform-Load Pipeline(s)
* Machine-Learning model(s)
* A Tableau dashboard or website that details the results of your EDA/predictions 
* Evidence of code-review & version-control

More info is described in the [rubric](https://docs.google.com/document/d/1IyWFEgXeTM_hOMutMClWAL8xdISeH-KQR8AUqkcmn1U/edit).

## Types of Projects

There are a number of different structures we can follow to complete a software project.  

These include:
* Waterfall
* Agile
* Kanban
* Scrum

No one way is inherently better than the others. Instead, each has its own pros & cons.

## Waterfall

The project is completed in planned, consecutive steps. There is no backtracking, and the schedule is prioritized.

Pros:
* Goals are predictable & clear
* Developers are protected from prying clients 
 * “actually, we want feature xyz instead of abc”
* Documentation is completed beforehand

Cons:
* No concurrency
 * “Welp, until the engineers are done, I’m just chilling.”
* Failures are detected late → changes are expensive
* Stakeholders are barred from project development
 * “You made xyz!? I wanted abc!” 

Use-case: predictable projects (e.g. academic research)

## Agile

The project is completed in iterative, flexible sprints. Backtracking is fine, and the client is directly involved.

Pros:
* Failures are detected early through testing
* Developers receive constant feedback from stakeholders
* Goals are important, but open to modification

Cons:
* Documentation is generated in-the-moment
* Developers are open to prying clients
 * “Do xyz…actually no, do abc…”
* Sometimes no room for large tests
* Goals are unpredictable and constantly shifting

Use-case: unpredictable projects (e.g. developing an app for a startup)

## Kanban

A version of Agile. We use a Kanban board to break down tasks & workflows.

Pros:
* All benefits from Agile Methodology
* Opens communication
* Organizes tasks 
* Creates a visual medium for tracking efficiency

Cons:
* All cons from Agile Methodology
* Micromanagement  
* Potentially overwhelming & disorganized
* Updating the Kanban board is another task

Use-case: any project

## Scrum

A mutated version of Agile. Clients and the “product owner” review progress after each sprint. The “Scrum master” keeps the team focused & protected.

Pros:
* Daily meetings
* Product is delivered as fast as possible
* Developers are protected from stakeholders
* All benefits of Agile

Cons:
* Daily meetings
* Hierarchical structure → micro-management
* Completely reliant on communication
* All cons of Agile

Use-case: unpredictable or predictable projects under a time crunch

## Our Methodology

The fact of the matter is, teams often do not rely on one methodology. 

Instead, we pick and choose aspects of different methodologies to best fit our schedule and goals.

Therefore, for this capstone, we will use a mix of Agile + Waterfall + Kanban: “Agwaban”.

* Our end-product is somewhat predictable
* We will use GitHub Project boards as our kanban board
* We will have occasional feedback from stakeholders (TKH staff)
* We will be working in weekly sprints

However, for resume purposes, this will be classified as just Agile. In fact, this project’s lifecycle is more Agile than Waterfall.

## Data Lifecycle

In each Agile sprint, we will chip away at the following steps:
1. Problem definition
 * Defining scope & plan
2. Data investigation & cleaning
 * Find relevant datasets. Extract & transform. EDA.
3. MVP generation
 * Minimal machine learning model & dataset
4. Deployment & enhancement
 * Improved model & documentation. Load & deploy to Tableau.
5. Post-mortem
 * Final presentation & documentation
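
The extract, transform, and load steps named in this lifecycle can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the job-posting rows, the column names, and the `postings` table are all hypothetical, and the sketch uses Python's built-in `csv` and `sqlite3` modules as stand-ins for Pandas and SQLAlchemy so that it runs without extra installs.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in a real project this would come from a file or an API.
RAW_CSV = """title,salary,skills
Data Analyst,"85,000",SQL;Python
Backend Engineer,,Go;AWS
data analyst,"90,000",Tableau;SQL
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows without a salary, normalize titles, cast salary to int."""
    cleaned = []
    for row in rows:
        if not row["salary"]:
            continue  # skip postings with no listed salary
        title = row["title"].strip().title()
        salary = int(row["salary"].replace(",", ""))
        cleaned.append((title, salary))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the (hypothetical) postings table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS postings (title TEXT, salary INTEGER)"
    )
    connection.executemany("INSERT INTO postings VALUES (?, ?)", records)
    connection.commit()


def run_pipeline(raw_text, connection):
    """Run extract -> transform -> load and return the number of rows loaded."""
    records = transform(extract(raw_text))
    load(records, connection)
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # local, throwaway database
    print(run_pipeline(RAW_CSV, conn), "rows loaded")
```

Keeping each stage in its own function means a team can swap one stage (for example, loading to a cloud database instead of SQLite) without touching the others.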


## Sprints

A sprint is a dedicated period of time set to complete a subset of a project. 

Instead of trying to tackle an entire project all at once, we split its components into separate time frames.

Sprints typically last 1 week, 2 weeks, or a month. In our case, we will use 1-week sprints.

## Sprint 0 Deliverables

Your first deliverables for sprint 0 will be due 3/27.
You will generate a write-up of the following points:
* Background research
 * What work has been done in similar projects already?
* Problem & Goal definition
 * What EXACTLY do you want to find out?
* Define value of project
 * Why bother doing this?
* Plan limitations & risks
 * Techies are natural optimists; what could realistically go wrong?
* Plan tech stack & materials
 * Which languages/software tools do you plan to use?


## Background Research

For every tech project that we tackle, there is at least one tangentially related open project or research paper.
Make a list of research papers, blog posts, & learning material that are similar to your project or that give background on the domain you are attempting to make predictions in. Skim through this material & write down the insights you’ve gained.

## Goal Definition

As technologists, we often shoot for the moon when it comes to making a product. However, we need to temper expectations in order to find success.

Outline the following: 
* Which questions do you want to ask & answer?
* What features will your datasets contain at minimum?
* What predictions will your models make at minimum?
* How will you present your final findings?

## Value

What is the value of answering your questions to…
* Your team
* TKH
* Organizations that may or may not be related to TKH (education, gov’t, finance, etc)

## Limitations + Risks

With the goals that you’ve outlined, do you expect any challenges? Can you overcome these challenges in 12 weeks? 
Is this a challenge of resources (time + money + data) or a challenge of complexity?

## Tech Stack + Material

All the tools you’ve reviewed within this program should help you to accomplish your tech project.
* Python
* Pandas
* SQLAlchemy
* Tableau

Do you anticipate using additional software tools/languages? Do you want to learn something new?


## Standups

At the start of every week, each team will take 5-10 minutes to give a status report of:
* Progress
* Roadblocks
* Next Actions + Goals

## Roles

While everyone will be responsible for coding & documentation, each team will have the following roles:
* Product Owner
 * Ensures that tasks satisfy specifications, keeps the repository organized, and presents during stand-ups & presentations.
* Tester
 * Ensures that ONLY functional code is accepted into the production branch. Runs automatic tests when available, and tests manually when they are not.
* Documenter
 * Ensures that teammates comment their code; edits & organizes the README.
* Team Lead
 * Organizes & schedules meetings, creates & delegates tasks, and resolves merge conflicts.
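
As one illustration of the “automatic tests” mentioned under the Tester role, here is a minimal sketch. The `clean_salary` helper is hypothetical and stands in for whatever transformation code your team writes. The tests are plain `assert`-based functions that a runner such as pytest can discover; using pytest is an assumption here, not a requirement of this program.

```python
def clean_salary(raw):
    """Hypothetical helper: turn a string such as '$85,000' into an int, or None if blank."""
    text = raw.strip().replace("$", "").replace(",", "")
    return int(text) if text else None


def test_clean_salary_strips_symbols():
    # Currency symbols and thousands separators should be removed before casting.
    assert clean_salary("$85,000") == 85000


def test_clean_salary_blank_is_none():
    # A blank or whitespace-only value should be treated as missing.
    assert clean_salary("   ") is None
```

A Tester could run these functions before approving a pull request, so that only code passing them reaches the production branch.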
