## Concept: Jupyter Gantt Project Planning
In the 2020-04-15 standup meeting, one of the groups mentioned using a Jupyter notebook to build their Gantt charts with Plotly! That's pretty cool, so we thought we'd try it with the sample they had!

There are definitely some drawbacks: it plots the tasks, but it doesn't actually plan or schedule them for you based on dependencies! I could write a function to do all of that, which would be cool, but it's not what we're here to do today, so I'll try to avoid doing that!
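
For reference only, here is a rough sketch of what such a dependency-aware function might look like. It is not part of this notebook's plan. The `days`, `after`, and `owner` keys and the `schedule_tasks` name are conventions I made up for the sketch; the output rows use the same `Task`/`Start`/`Finish`/`Owner` keys as the task lists further down. It needs Python 3.9+ for `graphlib`.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_tasks(tasks, project_start):
    """Assign Start/Finish dates so each task begins after its prerequisites end.

    `tasks` maps a task name to a dict with `days` (duration) and optional
    `after` (list of prerequisite task names) and `owner` entries.
    These key names are hypothetical, chosen only for this sketch.
    """
    graph = {name: set(spec.get("after", [])) for name, spec in tasks.items()}
    finish = {}
    rows = []
    # static_order() yields every task after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        spec = tasks[name]
        # Start when the latest prerequisite finishes, else at project start.
        start = max((finish[dep] for dep in graph[name]), default=project_start)
        finish[name] = start + timedelta(days=spec["days"])
        rows.append(dict(Task=name,
                         Start=start.isoformat(),
                         Finish=finish[name].isoformat(),
                         Owner=spec.get("owner", "NA")))
    return rows


example_tasks = {
    "Setup F Spec Document": {"days": 2, "owner": "David"},
    ".py Files + Docstrings": {"days": 2, "owner": "TEAM",
                               "after": ["Setup F Spec Document"]},
}
scheduled = schedule_tasks(example_tasks, date(2020, 4, 17))
```

The resulting list could be handed to `ff.create_gantt` like any of the hand-written task lists, but writing the dates by hand is still the plan for now.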

For a more complete plan, check the [Component Spec](./component_specs_full.md).

#### THEREFORE, I propose we use these notebooks to lay out and visualize the plan, but we will need a different tool to create and track tasks!
__*(GitHub, Asana, or Trello? I like Scrum boards with checklists)*__

Helpful Links:
 * https://plotly.com/python/gantt/
 * https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html

In [None]:
import plotly.figure_factory as ff

In [69]:
# Once you've run the WHOLE Jupyter Notebook, run this again here, to plot the Sprint plans at the top!
try:
    time_fig.show()
    sprint1_fig.show()
    sprint2_fig.show()
    sprint3_fig.show()
except NameError:
    pass

In [58]:
# Editing for the Master! Everyone choose their favorite color scheme!
team_colors = dict(TEAM = 'rgb(30,30,30)',
              David = 'rgb(10, 120, 10)',
              Maria = 'rgb(114, 44, 121)',
              Moeez = 'rgb(198, 47, 105)',
              NA = 'rgb(200, 200, 200)')

## Project Breakdown: Design from Final Vision Specification
For each of the Three-Sprints and the Final Product, describe the <span style="color: red;">REQUIREMENTS</span>,
<span style="color: blue;">GOALS</span>, and
<span style="color: green;">STRETCH-GOALS</span>!

*Question as I'm writing this: Should I be listing the Sprint focus (aka the "New Items"), or describing the Total Status? That's just like asking whether we are planning from "Start$\Rightarrow$Finish" or "FinalProject$\Rightarrow$InitialWork"... I think the right answer is to have the FULL spec up front and then break down the sprint deliverables from there.*

Make sure to hit upon each of the following topics in each update:
 * Documentation and Usability (includes CI, Docs, Tutorials, Testing, and Package Downloadability)
 * Data Types + Formats Supported
 * Method(s) of Transforming data into images (mathematical AND plot manipulation)
 * Method(s) of setting up and executing CNN Training model
 * Format of Results
 * Expected performance quality (Accuracy? RunTime?) 
 * What types of "Data Logic", "Smart Programming", or "Learning" do we execute?

### Final Product Specification:
The final project will follow our team's Vision statement, with the possibility of any Stretch-goals or any changes we make along the way to the scope based on what we learn as we go. 

##### Documentation: 
 * Fully functional package, with tutorials guiding user to set up the program with whatever data they have
 * Hopefully get test coverage into the high 80s (%); some UI functions may not be testable on Travis?
##### Data Supported:
 * At the end of the project, work on supporting as many data types as reasonable.
 * Possibly add User-Interaction or Configuration-File ability to most efficiently use new setups.
##### Transformations on Data:
 * Expansive list of Transformations including 1D, 2D, and Domain-Changes (aka Fourier or Laplace)
 * Also a list of known combined-transformations (such as Integral of Log-transformed data) for even more possible options
 * Option to Continue trying new (at random, ordered by complexity, or otherwise) transformations until performance gets to a certain metric
 * <span style="color: green;">[Stretch]</span> Incorporate "Smart" type feedback and configuration to perform the most-likely-to-be-valuable transformations. This may include choices based on knowledge of prior runs, or based on some immediate pre-analysis of the data.
##### Transformation into Images:
 * Ability to modify image size (pixel count) as desired, for either better Time-Performance or Data-Completeness (lossless?) which serve two very different use-cases
 * Ability to setup and configure the "complexity" of images you will use, from 1D data (single transformations each for X and Y?) to 8-dimension RGB/CMYK convolutions. (Partially <span style="color: green;">[Stretch]</span>)
##### Setup and Execution of CNN
 * CNN setup using learnings (made by team during scope of term) of 'probably best' parameters.
 * Set of CNN routines to optimize hyperparameters
 * Option of whether to optimize HParams for each transformation, or to choose one optimization for all. (aka control over loop nesting). May enforce defaults based on team learnings
 * <span style="color: green;">[Stretch]</span> Choose Initial HyperParams AND Strategy automatically based on factors about the data such as Size, Transform, etc
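
To make the "control over loop nesting" option above a bit more concrete, here is a minimal sketch of the two orderings. The `plan_runs` name, the `per_transform` flag, and the example hyperparameter names are all placeholders I picked for illustration; nothing here trains a model, it only lists the (transformation, hyperparameters) pairs that would be run.

```python
from itertools import product


def plan_runs(transforms, hparam_grid, per_transform=True):
    """Yield (transform, hparams) pairs in the order they would be trained.

    per_transform=True  -> full hyperparameter search inside every transform.
    per_transform=False -> one shared hyperparameter setting for all transforms
                           (here simply the first grid point, standing in for
                           "the single optimization chosen up front").
    """
    keys = sorted(hparam_grid)
    combos = [dict(zip(keys, values))
              for values in product(*(hparam_grid[k] for k in keys))]
    if not per_transform:
        combos = combos[:1]
    for transform in transforms:
        for hparams in combos:
            yield transform, hparams


grid = {"learning_rate": [0.01, 0.001], "batch_size": [16, 32]}
nested = list(plan_runs(["raw", "log"], grid, per_transform=True))
shared = list(plan_runs(["raw", "log"], grid, per_transform=False))
```

With two transformations and a 2x2 grid, the nested option gives 8 training runs and the shared option gives 2, which is the time-versus-thoroughness tradeoff this setting is meant to expose.
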
##### Result Format / Interface
 * Save Best-Performing model(s? n#?) to be loaded and used later __Note: need a function for loading + using models.__
 * Log each run with runtime, performance, and user parameters (for repeatability)
 * Produce Data-Report for run with the top (n#) performing model runs / images, each of their performances, and their parameters + Data-Transformations
 * Produce Transformation-Report, with the top (n#) of top-associated transformations and their positions/convolutions, which may be useful for user discoveries based on their data
 * <span style="color: green;">[Stretch]</span> Produce an image file of the "Black Box's Insides" (I forget what this is called): a feature-map style view of the filters the network found most important.
##### Expected Performance Goals
 * With options set to minimize time, want to be able to fully train on a dataset of 10,000 files in <1hr, with 90% accuracy __(NOTE: I CHOSE THESE NUMBERS AT RANDOM. PLEASE RE-EVALUATE)__
 * With options set to maximize quality and thorough transformations, can run for a roughly USER-SET amount of time, and will continue to try new transformations And/Or optimizations until that time.
 * If we get access to high-performance computing resources, we should be able to optimize settings to take advantage of that, and routinely achieve >99% accuracy in less than a day of runtime?
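
The "roughly USER-SET amount of time" goal above could be handled with a simple deadline loop. This is only a sketch under my own assumptions: `search_until`, `budget_seconds`, and `target_score` are made-up names, and `evaluate` stands in for whatever eventually trains and scores a model. The clock is only checked between candidates, so a long training run can overshoot the budget, which is why the limit is "rough".

```python
import time


def search_until(candidates, evaluate, budget_seconds, target_score=None):
    """Try candidates in order until time runs out or the target is reached.

    `evaluate` is any callable returning a score (higher is better).
    Returns the best (score, candidate) pair and the number of candidates tried.
    """
    deadline = time.monotonic() + budget_seconds
    best = None  # (score, candidate)
    tried = 0
    for candidate in candidates:
        if time.monotonic() >= deadline:
            break  # out of time: keep whatever we have so far
        score = evaluate(candidate)
        tried += 1
        if best is None or score > best[0]:
            best = (score, candidate)
        if target_score is not None and score >= target_score:
            break  # good enough: stop early
    return best, tried


# Toy stand-in: the "score" is just a lookup, not a trained model.
fake_scores = {"raw": 0.70, "log": 0.85, "fft": 0.93, "derivative": 0.80}
best, tried = search_until(fake_scores, fake_scores.get,
                           budget_seconds=5, target_score=0.90)
```
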
##### "Smart" Logic and Meta-Learning
 * None in main scope at this time

In [49]:
sprint_timeline = [
      dict(Task='Sprint 0',     Start='2020-04-02', Finish='2020-04-24', Resource='Planning'),
      dict(Task='Sprint 1',     Start='2020-04-17', Finish='2020-05-12', Resource="Core_Programming"),
      dict(Task='Rev+Plan 1-2', Start='2020-05-11', Finish='2020-05-15', Resource="Planning"),
      dict(Task='Sprint 2',     Start='2020-05-15', Finish='2020-05-26', Resource="Data_pretreatment"),
      dict(Task='Rev+Plan 2-3', Start='2020-05-25', Finish='2020-05-29', Resource="Planning"),
      dict(Task="Sprint 3",     Start='2020-05-29', Finish='2020-06-09', Resource='Core_Programming'),
      dict(Task='BUFFER', Start='2020-06-07', Finish='2020-06-14', Resource="Planning"),
      dict(Task="V1.0 Release + BUFFER", Start='2020-06-14', Finish='2020-06-20', Resource='Final_outcome')
      ]

colors = dict(Planning = 'rgb(46, 137, 205)',
              Data_pretreatment = 'rgb(114, 44, 121)',
              Core_Programming = 'rgb(198, 47, 105)',
              Final_outcome = 'rgb(58, 149, 136)')

sprint_timeline.reverse()  # I prefer to index based on a Waterfall-style chart
time_fig = ff.create_gantt(sprint_timeline, colors=colors, index_col='Resource', bar_width=0.4,
                      show_colorbar=True, title="Overall Project Sprint Timeline")
time_fig.show()

### SPRINT 1: April 17th $\Rightarrow$ April 28th
The final project will follow our team's Vision statement, with the possibility of any Stretch-goals or any changes we make along the way to the scope based on what we learn as we go. 

##### Documentation: 
 * Set up Travis CI in the GitHub repo and have it passing (at some point)
 * Set up ReadTheDocs in the repo
 * Finish [Functional](https://en.wikipedia.org/wiki/Functional_specification) and Component Specifications 
<span style="color: green;">(w/ Optional Diagrams)</span>
 * Set up all expected .py files with all functions in the Functional Spec (they can be empty, but add Docstrings whenever possible to explain them!)
 * Document DATA-FLOW strategy to show which packages are intended to wrap/use the others, and set them up to import each other as needed
##### Data Supported:
 * Can interpret folder structure as classifier names (ignore parsing filenames for now)
 * Starting with 2-column CSV files with only "Row 0" column headings.
    * Use functions to generate that data for simple arithmetic f(n)s, i.e. a model distinguishing between a line and sin()
 * Can import data from whatever source we are generating "Small Angle Scatter" data (TBD)
 * Test both of these formats to ensure that they work
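
As a starting point for the generated math data, here is a minimal sketch that writes 2-column CSV files (one header row) into one folder per class, then reads the labels back from the folder names. All function names, the `x`/`y` headers, the slope of the line, and the noise level are assumptions for illustration, not decisions the team has made. It uses only the standard library.

```python
import csv
import math
import random
from pathlib import Path


def write_curve(path, func, n_points=50, noise=0.05, seed=0):
    """Write a 2-column CSV with a single header row ("x", "y")."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        for i in range(n_points):
            x = 2 * math.pi * i / (n_points - 1)
            writer.writerow([x, func(x) + rng.gauss(0, noise)])


def make_dataset(root, files_per_class=3):
    """Create root/<class_name>/<n>.csv for a line and a sine curve."""
    classes = {"line": lambda x: 0.3 * x, "sine": math.sin}
    for name, func in classes.items():
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        for n in range(files_per_class):
            write_curve(folder / f"{n}.csv", func, seed=n)


def list_labeled_files(root):
    """Return (csv_path, label) pairs, using the parent folder as the label."""
    return sorted((path, path.parent.name)
                  for path in Path(root).glob("*/*.csv"))
```

Calling `make_dataset("some_folder")` followed by `list_labeled_files("some_folder")` should give six paths, three labeled `line` and three labeled `sine`.
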
##### Transformations on Data:
 * Program the listed 1-D functions
 * Write f(n)s to perform each of these and hand off to image processing
##### Transformation into Images:
 * Can prepare images in the format that the CNN expects them *(I'm not sure what fn to put in charge)*
##### Setup and Execution of CNN
 * CNN setup as built on infrastructure from last term
 * List of Hyperparameters and some ability to cycle through and optimize
 * Ability for user to "control" the duration of the optimization (of course, with performance tradeoffs)
##### Result Format / Interface
 * Function for saving the model (optional, or "If Better Than ##%")
 * Report for run, list of model IDs, transformations, and hyperparameters used (maybe save as csv)
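
One possible shape for the run report, sketched with the standard library only. The column names, the `|` separator for transformations, and the 0.90 default threshold are placeholders I chose, not agreed values; `append_run` and `should_save` are hypothetical helper names.

```python
import csv
import json

REPORT_FIELDS = ["model_id", "transformations", "hyperparameters",
                 "accuracy", "runtime_s"]


def append_run(report_path, model_id, transformations, hyperparameters,
               accuracy, runtime_s):
    """Append one run to the CSV report, writing the header on first use."""
    try:
        with open(report_path, newline="") as handle:
            needs_header = handle.read(1) == ""
    except FileNotFoundError:
        needs_header = True
    with open(report_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REPORT_FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow({
            "model_id": model_id,
            "transformations": "|".join(transformations),
            # JSON keeps the hyperparameters recoverable from a single cell.
            "hyperparameters": json.dumps(hyperparameters, sort_keys=True),
            "accuracy": accuracy,
            "runtime_s": round(runtime_s, 3),
        })


def should_save(accuracy, threshold=0.90):
    """The "If Better Than ##%" rule; the 0.90 default is a placeholder."""
    return accuracy > threshold
```

Keeping one row per run means the same file can later feed the "top (n#) runs" reports by sorting on the accuracy column.
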
##### Expected Performance Goals
 * Main goal is to have something that runs without taking too long (focus on speed and code here, complexity is only going to get harder!)
 * If we understand "Why" we have the performance we do, that's all the better! I would hope that a simple Linear vs Sin() classifier should be able to achieve high accuracy though! 
##### "Smart" Logic and Meta-Learning
 * Come up with ideas that we might bring in-scope if we have time!

In [68]:
"""
SPRINT #1 PLAN for H.A.R.D.y Project

"""
# Each of these should be one "topic" 
admin_tasks = [
      dict(Task='*DOCUMENTATION*               ' , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task='Get TRAVIS working', Start='2020-04-17', Finish='2020-04-21', Owner="Moeez"),
      dict(Task='Setup ReadTheDocs', Start='2020-04-17', Finish='2020-04-21', Owner="Maria"),
      dict(Task='Setup F Spec Document', Start='2020-04-17', Finish='2020-04-19', Owner="David"),
      dict(Task='.py Files + Docstrings', Start='2020-04-19', Finish='2020-04-21', Owner="TEAM"),
      dict(Task='Mid-Sprint Advisor Meeting', Start='2020-04-27', Finish='2020-04-29', Owner="TEAM"),
      dict(Task='STATUS UPDATE: Docs', Start='2020-05-10', Finish='2020-05-12', Owner="David"),
      dict(Task='Tutorial for Sprint-Report', Start='2020-05-07', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' ', Start='2020-05-12', Finish='2020-05-12', Owner="NA")
      ]
data_tasks = [
      dict(Task='*DATA MANAGEMENT*               ', Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task='Write Functional Specs' , Start='2020-04-19', Finish='2020-04-21', Owner="NA"),
      dict(Task='F(n)s to generate Math-Data' , Start='2020-04-20', Finish='2020-04-22', Owner="NA"),
      dict(Task='F(n)s to generate SAS Data?' , Start='2020-04-28', Finish='2020-05-04', Owner="Maria"),
      dict(Task='F(n)s to read files + format' , Start='2020-04-21', Finish='2020-05-06', Owner="NA"),
      dict(Task=' ', Start='2020-05-12', Finish='2020-05-12', Owner="NA")
      ]
transform_tasks = [
      dict(Task='*TRANSFORMATION*               ' , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task='Write Functional Specs' , Start='2020-04-19', Finish='2020-04-24', Owner="NA"),
      dict(Task='Write 1D transforms', Start='2020-04-19', Finish='2020-04-24', Owner="NA"),
      dict(Task='Write Data transformer', Start='2020-04-22', Finish='2020-04-26', Owner="NA"),
      dict(Task='Learn about CNN Image types', Start='2020-04-22', Finish='2020-04-24', Owner="David"),
      dict(Task='Write 1D-ish Image-Creation F(n)' , Start='2020-04-22', Finish='2020-04-24', Owner="NA"),
      dict(Task='Check Tests against MATH data' , Start='2020-04-24', Finish='2020-04-28', Owner="NA"),
      dict(Task=' ', Start='2020-05-12', Finish='2020-05-12', Owner="NA")
      ]
CNN_tasks = [
      dict(Task='*CNN OPTIMIZATION*               ', Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task='Write Functional Specs' , Start='2020-04-19', Finish='2020-04-24', Owner="NA"),
      dict(Task='Learn from old CNN', Start='2020-04-17', Finish='2020-04-21', Owner="David"),
      dict(Task='Debug w/ Round 1 Images', Start='2020-04-22', Finish='2020-04-23', Owner="NA"),
      dict(Task='Get f(n) working to show at 4/28 Advisor Mtg' , Start='2020-04-21', Finish='2020-04-28', Owner="NA"),
      dict(Task='Set up Hyperparameter loop' , Start='2020-04-28', Finish='2020-05-04', Owner="NA"),
      dict(Task='Automate Optimization within f(n)' , Start='2020-05-01', Finish='2020-05-04', Owner="NA"),
      dict(Task='Integrate with Other Packages' , Start='2020-05-01', Finish='2020-05-08', Owner="NA"),
      dict(Task='Test CNN With Multi-Transform' , Start='2020-05-04', Finish='2020-05-12', Owner="NA"),
      dict(Task=' ', Start='2020-05-12', Finish='2020-05-12', Owner="NA")
      ]
misc_tasks = [
      dict(Task='*OTHER TASKS*               '   , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' ', Start='2020-05-12', Finish='2020-05-12', Owner="NA")
      ]

sprint1 =   admin_tasks + data_tasks + transform_tasks + CNN_tasks + misc_tasks 
sprint1.reverse()  # I prefer to index based on a Waterfall-style chart

sprint1_fig = ff.create_gantt(sprint1, colors=team_colors, index_col='Owner', bar_width=0.4, 
                              show_colorbar=True, title="Sprint 1 Timeline", showgrid_y=True)
sprint1_fig.show()

In [None]:
"""
TEMPLATE FOR SPRINT PLANNING for H.A.R.D.y Project

"""
# Each of these should be one "topic" 
admin_tasks = [
      dict(Task='DOCUMENTATION' , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task='  '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      ]
data_tasks = [
      dict(Task='DATA MANAGEMENT', Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      ]
transform_tasks = [
      dict(Task='TRANSFORMATION' , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      ]
CNN_tasks = [
      dict(Task='CNN OPTIMIZATION', Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      ]
misc_tasks = [
      dict(Task='OTHER TASKS'   , Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      dict(Task=' '              , Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
      ]

sprint1 =   admin_tasks + data_tasks + transform_tasks + CNN_tasks + misc_tasks 
sprint1.reverse()  # I prefer to index based on a Waterfall-style chart

# DEBUGGING LIST
# for i in range(len(sprint1)):
#    print(sprint1[i])
sprint1_fig = ff.create_gantt(sprint1, colors=team_colors, index_col='Owner', bar_width=0.3, 
                              show_colorbar=True, title="Sprint 1 Timeline")

sprint1_fig.show()
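
The template rows above still carry placeholder dates like `'2020-0 -  '`. I have not checked how `create_gantt` treats those, so one cautious option is to drop any row whose dates don't parse before plotting. A minimal sketch, assuming every real date uses the `YYYY-MM-DD` form seen in this notebook (`drop_unfilled` is a made-up helper name):

```python
from datetime import datetime


def drop_unfilled(tasks, date_format="%Y-%m-%d"):
    """Keep only tasks whose Start and Finish parse as real dates."""
    kept = []
    for task in tasks:
        try:
            datetime.strptime(task["Start"], date_format)
            datetime.strptime(task["Finish"], date_format)
        except ValueError:
            continue  # template row that was never filled in
        kept.append(task)
    return kept


rows = [
    dict(Task='OTHER TASKS', Start='2020-04-17', Finish='2020-05-12', Owner="TEAM"),
    dict(Task=' ', Start='2020-0 -  ', Finish='2020-0 -  ', Owner="NA"),
]
clean_rows = drop_unfilled(rows)
```

In the template cell this would mean passing `drop_unfilled(sprint1)` to `ff.create_gantt` instead of `sprint1`.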