Merge pull request #16 from Mahdisadjadi/master

Fixing some typos
brohrer committed Feb 26, 2019
2 parents c4be32f + fdd0f22 commit 19363e3446d8058f3c17972dde35b949ebf72840
Showing with 49 additions and 51 deletions.
  1. +1 −1 LICENSE.md
  2. +4 −4 authors.md
  3. +8 −8 curriculum_roadmap.md
  4. +6 −6 partnering.md
  5. +8 −8 strong_DS_skills.md
  6. +1 −1 terminology.md
  7. +13 −13 use_cases.md
  8. +8 −10 what_DS_do.md
@@ -2,4 +2,4 @@

![CC0 icon](CC0_88x31.png)

To the extent possible under law, the authors of the documents in the Academic Advisory repository have waived all copyright and related or neighboring rights to the documents in the Acadmic Advisory repository. This work is published from: United States.
To the extent possible under law, the authors of the documents in the Academic Advisory repository have waived all copyright and related or neighboring rights to the documents in the Academic Advisory repository. This work is published from: United States.
@@ -22,7 +22,7 @@ Shaheen Gauher, CVS Health (shaheen2007@gmail.com)

Scott Genzer, RapidMiner (sgenzer@rapidminer.com)

Ginger Holt, Facebook (gingermholt@fb.com)
Ginger Holt, Facebook (gingermholt@fb.com)

Robert Horton, Microsoft (rhorton@microsoft.com)

@@ -34,7 +34,7 @@ Ming Li, Amazon (mli@alumni.iastate.edu)

Hui Lin, Netlify (longqiman@gmail.com)

Thomas Nield, Southwest Airlines (thomasnield@live.com)
Thomas Nield, Southwest Airlines (thomasnield@live.com)

Julie Novak, Netflix (julien@netflix.com)

@@ -53,7 +53,7 @@ Lydia Chan, San Jose State University (lydia.chan@sjsu.edu)

Keith Drake, Dartmouth University (keith.m.drake@dartmouth.edu)

Michael Fenick, Broward College and Southern New Hampshire Univeristy (mfenick@broward.edu)
Michael Fenick, Broward College and Southern New Hampshire University (mfenick@broward.edu)

Jiang Gui, Dartmouth University (Jiang.Gui@dartmouth.edu)

@@ -65,7 +65,7 @@ Shaniqua Jones, Dartmouth University (shaniqua.a.jones@dartmouth.edu)

Allison Jones-Farmer, Miami University (farmerl2@miamioh.edu)

Todd MacKenzie, Dartmouth Univeristy (Todd.A.MacKenzie@dartmouth.edu)
Todd MacKenzie, Dartmouth University (Todd.A.MacKenzie@dartmouth.edu)

Brandeis Marshall, Spelman College (Brandeis.Marshall@spelman.edu)

@@ -1,24 +1,24 @@
# Data science curriculum roadmap

We venture to suggest a curriculum roadmap after receiving multiple requests for one from academic partners. As a group, we have spent the vast majority of our time in industry, although many of us have had spent time in one academic capacity or another. What follows is a set of broad recommendations, and it will inevitably require a lot of adjustments in each implementation. Given that caveat, here are our curriculum recommendations.
We venture to suggest a curriculum roadmap after receiving multiple requests for one from academic partners. As a group, we have spent the vast majority of our time in industry, although many of us have had spent time in one academic capacity or another. What follows is a set of broad recommendations, and it will inevitably require a lot of adjustments in each implementation. Given that caveat, here are our curriculum recommendations.

### More application than theory

We want to lead by emphasizing that the single most important factor in preparing students to apply their knowledge in an industry setting is application-centric learning. Working with realistic data to answer realistic questions is their best preparation. It grounds abstract concepts in hands-on experience, and it teaches data mechanics and data intuition at the same time, something that is impossible to do in isolation.
We want to lead by emphasizing that the single most important factor in preparing students to apply their knowledge in an industry setting is application-centric learning. Working with realistic data to answer realistic questions is their best preparation. It grounds abstract concepts in hands-on experience, and it teaches data mechanics and data intuition at the same time, something that is impossible to do in isolation.

With that as a foundation, we present a list of topics that prepare one well to practice data science.
With that as a foundation, we present a list of topics that prepare one well to practice data science.

## Curriculum archetypes

The types of data science and data-centric academic programs closely mirror [the major skill areas](what_DS_do.md) we have identified in our work. There are programs that emphasize **engineering**, programs that emphasize **analytics**, and programs that emphasize **modeling**. The distinction between these is that analytics focuses on the question of what can we learn from our data, modeling focuses on the problem of estimating data we wish we had, and engineering focuses on how to make it all run faster, more efficiently, and more robustly.
The types of data science and data-centric academic programs closely mirror [the major skill areas](what_DS_do.md) we have identified in our work. There are programs that emphasize **engineering**, programs that emphasize **analytics**, and programs that emphasize **modeling**. The distinction between these is that analytics focuses on the question of what can we learn from our data, modeling focuses on the problem of estimating data we wish we had, and engineering focuses on how to make it all run faster, more efficiently, and more robustly.

There are also **general data science programs** that cover all these areas to some degree. In addition there are quite a few **domain specific programs**, where a subset of engineering, analytics, and modeling skills specific to a given field are taught.

![Data program archetypes](program_archetypes.png)

The curriculum recommendations for each of these program archetypes will be different. However, all of them will share some core topics. Then analytics, engineering, and modeling-centric programs will have additional topic areas of their own. A general curriculum will include some aspects of the analytics, engineering, and modeling curricula, although perhaps not to the same depth. It is common for students to self-select courses from any combination of the three areas.

Curricula for domain specific programs look similar to a general program, except that topics, and even entire courses, will be focused on specific skills common to the area. For instance, an actuarial-focused data analytics program would likely include software tools most commonly used in insurance companies, time series and rare-event prediction algorithms, and visualization methods that are accepted throughout the insurance industry. The student can best practice their skills through a project based on real domain-specific data. Hands-on projects or internships are highly recommended. When designing the programs, institutions may also consider offering interdisciplinary degrees and programs. Domain specific programs often combine courses from multiple departments or colleges.
Curricula for domain specific programs look similar to a general program, except that topics, and even entire courses, will be focused on specific skills common to the area. For instance, an actuarial-focused data analytics program would likely include software tools most commonly used in insurance companies, time series and rare-event prediction algorithms, and visualization methods that are accepted throughout the insurance industry. The student can best practice their skills through a project based on real domain-specific data. Hands-on projects or internships are highly recommended. When designing the programs, institutions may also consider offering interdisciplinary degrees and programs. Domain specific programs often combine courses from multiple departments or colleges.

Here are the major topics we suggest including in each area, with some of the particularly important subtopics enumerated.

@@ -38,7 +38,7 @@ Here are the major topics we suggest including in each area, with some of the pa
* Data interpretation and communication
* Presentation
* Technical writing
* Data concepts for non-technical audiences
* Data concepts for non-technical audiences

## Analytics topics
* Advanced statistics
@@ -96,6 +96,6 @@ Here are the major topics we suggest including in each area, with some of the pa
* Optimization

#
Note that for each topic and subtopic, there are many effective ways to split it into courses. The best way for your institution will depend on many factors, including length of term, hours per class, existing departmental boundaries, instuctor availability, and the rate at which your students are expected to absorb information. These recommendations assume a two-year masters program with the primary goal of preparing students for employment and continued career growth, although they can certainly be scaled up or down to fit the scope of other programs.
Note that for each topic and subtopic, there are many effective ways to split it into courses. The best way for your institution will depend on many factors, including length of term, hours per class, existing departmental boundaries, instructor availability, and the rate at which your students are expected to absorb information. These recommendations assume a two-year masters program with the primary goal of preparing students for employment and continued career growth, although they can certainly be scaled up or down to fit the scope of other programs.

It bears repeating that application-focused instruction will best prepare the students for professional positions. The more theory is grounded in concrete examples, and the more specific skills are exercised in the context of solving a larger problem, the deeper the student's understanding of how it works, and where to apply it.
It bears repeating that application-focused instruction will best prepare the students for professional positions. The more theory is grounded in concrete examples, and the more specific skills are exercised in the context of solving a larger problem, the deeper the student's understanding of how it works, and where to apply it.
@@ -6,32 +6,32 @@ When this happens it reflects well on the institution and its instructors
and it prepares a new wave of data scientists to do great work in whatever company they land.

There is no single template for a successful partnership. The strengths, needs and resources of each program and company are unique.
Each partnership that evolves is a singular relationship.
Each partnership that evolves is a singular relationship.
One thing we have observed is that successful connections between institutions tend to be brokered and sustained by individuals.
Fostering individual connections between faculty and industry data scientists is an effective way to promote partnering.

One of the most benficial parntership modes for both academic programs and industry partners has proven to be capstone mentoring.
One of the most beneficial partnership modes for both academic programs and industry partners has proven to be capstone mentoring.

## Mentoring capstone projects

Capstone projects are typically undertaken by students, individually or in groups, in their final semester of a program.
They simulate industry projects students will experience in scope, form, and complexity. They also provide a great way for employers to get to know a handful of students before the question of employment arises, and vice versa.
They simulate industry projects students will experience in scope, form, and complexity. They also provide a great way for employers to get to know a handful of students before the question of employment arises, and vice versa.

In a prototypical capstone engagement, a mentor might provide the following:

* A data science question / project idea
* Some realistic data
* An in-person kickoff meeting to explain what the problem is and why the answer matters
* A monthly video-conference checkin
* Attending a wrap-up presentation at the end of the prjoject
* Attending a wrap-up presentation at the end of the project

This model can be modified to either be more or less instensive. A more involved mentor might meet with their team weekly. In a lighter engagement, a mentor might provide a question, but no data, and attend only the final presentations. The details of each mentoring experience are entrely up to the program and the mentor to negotiate, but there are many vatiations that have proven succesful.
This model can be modified to either be more or less intensive. A more involved mentor might meet with their team weekly. In a lighter engagement, a mentor might provide a question, but no data, and attend only the final presentations. The details of each mentoring experience are entirely up to the program and the mentor to negotiate, but there are many variations that have proven successful.

To get you started, here are [a set of use cases](use_cases.md) that crop up in industry.

## Other partnership modes

Here are some other examples of how industry and academic programs have worked together in the past.
Here are some other examples of how industry and academic programs have worked together in the past.

* Tech talks
* Workshops
@@ -14,21 +14,21 @@ Data scientists that are lacking in this area fail to convey their work or persu

## 2. Breadth

Strong data scientists are not afraid to move between [roles](what_DS_do.md), say, migrating between data analysis, data engineering, modeling, and back, over the course of a project. This breadth provides a huge benfit. For example, doing data analysis while keeping the limitations of modeling in mind produces results that are more accurate, more useful and more timely.
Strong data scientists are not afraid to move between [roles](what_DS_do.md), say, migrating between data analysis, data engineering, modeling, and back, over the course of a project. This breadth provides a huge benefit. For example, doing data analysis while keeping the limitations of modeling in mind produces results that are more accurate, more useful and more timely.

Data scientists that are lacking in this area might say "I'm a modeler. Data cleaning is a job for someone else." Overspecialization leads to blind spots, such as neglecting code health or neglecting statistical rigor.
Data scientists that are lacking in this area might say "I'm a modeler. Data cleaning is a job for someone else." Overspecialization leads to blind spots, such as neglecting code health or neglecting statistical rigor.


## 3. Readiness to learn new tools, skills and domains

Data scientists have to learn new tools (e.g. new languages, new applications, new techniques) with each new position, and sometimes with each new project. There's no practical way to learn all the tools you will need before you need them.
The only way to be prepared for this is get comfortable with the process of learning.
The only way to be prepared for this is get comfortable with the process of learning.
The set of tools a data scientist comes with doesn't matter as much as their ability to embrace new ones.

Data scientists that are lacking in this area will be limited in what they can contribute. Most project work will be frustrating. (Their teammates will be frustrated too.)
Data scientists that are lacking in this area will be limited in what they can contribute. Most project work will be frustrating. (Their teammates will be frustrated too.)
The solution is to adopt a willingness to feel dumb, also known as "a beginner's mindset".
This helps navigate the uncomfortable start-up period when every step of working with a tool is unfamiliar.
The beginner's mindset manifests itself as a curiosity about the field, the company, the products, and the customers.
The beginner's mindset manifests itself as a curiosity about the field, the company, the products, and the customers.


# Success patterns
@@ -37,13 +37,13 @@ The beginner's mindset manifests itself as a curiosity about the field, the comp

The strongest data scientists are those who have a broad understanding of all the roles a data scientist can play and have deep skills in at least one.
In our experience, these data scientists are the ones who have worked on realistic data science problems in several domains.
The skills required to work with data are tough to learn in the abstract. Concrete examples with rich context and ambiguity are powerful teachers. Applying the same skill in several different domains bestows a facility on the learner that is hard to get any other way.
The skills required to work with data are tough to learn in the abstract. Concrete examples with rich context and ambiguity are powerful teachers. Applying the same skill in several different domains bestows a facility on the learner that is hard to get any other way.

Data scientist that are lacking in this area will be confused by the quirks of real data and overwhelmed with the challenges of using their skills on unfamiliar problems.
Data scientist that are lacking in this area will be confused by the quirks of real data and overwhelmed with the challenges of using their skills on unfamiliar problems.

## 2. Mentoring / cross-mentoring / community contribution

There is no better way to develop a deep mastery and rich understanding of the field than to share your work with others. This can take the form of teaching activities with those less experienced, such providing advice, tutorials, or explanations. It can also manifest between peers in such varied ways as publishing project summaries, asking advice, cooperative coding, and creating cheat sheet references for a new tool. These can take place in person or on-line. Every major social network has its own data science community, each with its own flavor.
There is no better way to develop a deep mastery and rich understanding of the field than to share your work with others. This can take the form of teaching activities with those less experienced, such providing advice, tutorials, or explanations. It can also manifest between peers in such varied ways as publishing project summaries, asking advice, cooperative coding, and creating cheat sheet references for a new tool. These can take place in person or on-line. Every major social network has its own data science community, each with its own flavor.


##
@@ -37,4 +37,4 @@ but there are others, each suggesting a different emphasis

**statistician**.

A note of caution: when interpreting job postings, applicants would no well to look past the title to the tasks and responsibilities of the role. One unfortunate consequence of the lack of consensus around titles is that you can never be certain what a position entails based on the title.
A note of caution: when interpreting job postings, applicants would do well to look past the title to the tasks and responsibilities of the role. One unfortunate consequence of the lack of consensus around titles is that you can never be certain what a position entails based on the title.