In [1]:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt

import os
import numpy as np
import pandas as pd
import seaborn as sns
import env

import wrangle as w
import explore as e

# Questions we answered:

- Which lesson appears to attract the most traffic consistently across cohorts (per program)?


- Is there a cohort that referred to a lesson significantly more than other cohorts seemed to gloss over?


- At some point in 2019, the ability for students and alumni to access both curriculums (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before?


- What topics are grads continuing to reference after graduation and into their jobs (for each program)?


- Which lessons are least accessed?


# Executive Summary 

<div class="alert alert-info">    
    
* Most visited lesson by program 
    * Data Science - classification 
    * Full Stack Front End - HTML-CSS
    * Full Stack Java - javascript-i
    * Full Stack PHP - index.html

* Curriculum access highlights
    * Fortuna was the only one of the 5 cohorts investigated whose most accessed lesson was java-iii (786 hits); javascript-i led the other 4.
    * Ceres and Jupiter accessed html-css more often than Zion, Fortuna, and Voyageurs.
    * Darden viewed the classification/overview lesson the most out of all 5 Data Science cohorts by a significant margin. 
    
* No cross-curriculum access appears in the logs before 2019 in either direction. From 2019 onward, web development students do access data science curriculum, even after graduation, so access was not fully shut off for web development. For the Data Science classes, no access to web development content appears before or after 2019.</div>


# Acquire

- Data was acquired from the CodeUp database using MySQL.

In [2]:
# obtain data
df = w.wrangle_log_data()

In [3]:
# make the date the index for data frame
df.date = pd.to_datetime(df.date)
df = df.set_index(df.date)

## Which lesson appears to attract the most traffic consistently across cohorts (per program)?


In [4]:
# all data science cohorts into dataframe 
ds_cohorts = df.loc[df["program_name"] == "data_science"]
# aggregate data science page counts 
ds_cohorts.path.value_counts().nlargest(10)

/                                           8358
search/search_index.json                    2203
classification/overview                     1785
1-fundamentals/modern-data-scientist.jpg    1655
1-fundamentals/AI-ML-DL-timeline.jpg        1651
1-fundamentals/1.1-intro-to-data-science    1633
classification/scale_features_or_not.svg    1590
fundamentals/AI-ML-DL-timeline.jpg          1443
fundamentals/modern-data-scientist.jpg      1438
sql/mysql-overview                          1424
Name: path, dtype: int64

In [5]:
# all Front End Cohorts into dataframe 
front_end_cohorts = df.loc[df["program_name"] == "front_end"]
# aggregate front end page counts
front_end_cohorts.path.value_counts().nlargest(10)

content/html-css                               2
/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
Name: path, dtype: int64

In [6]:
# all Full Stack Java Cohorts into dataframe 
full_stack_java_cohorts = df.loc[df["program_name"] == "full_stack_java"]
# aggregate full stack java page counts
full_stack_java_cohorts.path.value_counts().nlargest(10)

/                           35814
javascript-i                17457
toc                         17428
search/search_index.json    15212
java-iii                    12683
html-css                    12569
java-ii                     11719
spring                      11376
jquery                      10693
mysql                       10318
Name: path, dtype: int64

In [7]:
# all Full Stack PHP Cohorts into dataframe 
full_stack_php_cohorts = df.loc[df["program_name"] == "full_stack_php"]
# aggregate full stack php page counts
full_stack_php_cohorts.path.value_counts().nlargest(10)

/                1681
index.html       1011
javascript-i      736
html-css          542
spring            501
java-iii          479
java-ii           454
java-i            444
javascript-ii     429
appendix          409
Name: path, dtype: int64

### Takeaways 

**Most accessed lessons**

- Data Science program: classification/overview
- Front End program: content/html-css (only five requests were logged for this program)
- Full Stack Java program: javascript-i
- Full Stack PHP program: index.html
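
The raw counts above are led by navigation and asset requests (`/`, `toc`, `search/search_index.json`, image files), so the lesson rankings depend on setting those aside. A minimal sketch of that filtering, assuming a frame with `path` and `program_name` columns. The helper name `top_lessons` and the exclusion rules are illustrative assumptions, not part of `explore.py`:

```python
import pandas as pd

# Assumed exclusion rules: navigation pages and static assets are not lessons.
NON_LESSON_PATHS = {"/", "toc", "search/search_index.json"}
ASSET_PATTERN = r"\.(?:jpe?g|png|svg|ico|json)$"


def top_lessons(df: pd.DataFrame, program: str, n: int = 10) -> pd.Series:
    """Return the n most requested lesson paths for one program."""
    paths = df.loc[df["program_name"] == program, "path"].dropna()
    is_navigation = paths.isin(NON_LESSON_PATHS)
    is_asset = paths.str.contains(ASSET_PATTERN, case=False)
    return paths[~is_navigation & ~is_asset].value_counts().nlargest(n)


# Tiny made-up frame, only to show the call shape.
demo = pd.DataFrame({
    "program_name": ["data_science"] * 5,
    "path": [
        "/",
        "/",
        "classification/overview",
        "classification/scale_features_or_not.svg",
        "search/search_index.json",
    ],
})
print(top_lessons(demo, "data_science"))
```

Whether `index.html` should also count as navigation is a judgment call; it is kept here because the takeaway above treats it as a lesson.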

## Is there a cohort that referred to a lesson significantly more than other cohorts seemed to gloss over?

In [8]:
# initializing df for exploring the cohorts' paths accessed
name_path = df[['name','path', 'program_name']]

### Web Dev most accessed:

In [9]:
# initializing df for full_stack_java program cohorts and paths
full_stack_java_df = name_path[name_path.program_name == 'full_stack_java']

In [10]:
# get 10 most accessed paths
e.get_cohort_top10_paths('Ceres', full_stack_java_df)

Ceres top 10 lessons accessed:

javascript-i                                   1003
html-css                                        766
java-iii                                        682
java-ii                                         681
jquery                                          637
mysql                                           622
spring                                          562
javascript-ii                                   534
java-i                                          526
html-css/css-i/flexbox/flexbox-fundamentals     471
Name: path, dtype: int64


In [11]:
e.get_cohort_top10_paths('Jupiter', full_stack_java_df)

Jupiter top 10 lessons accessed:

javascript-i     926
java-iii         795
html-css         784
java-ii          755
spring           621
mysql            564
java-i           503
jquery           446
mysql/tables     439
javascript-ii    431
Name: path, dtype: int64


In [12]:
e.get_cohort_top10_paths('Zion', full_stack_java_df)

Zion top 10 lessons accessed:

javascript-i                        897
java-iii                            753
html-css                            675
spring                              672
javascript-ii                       647
java-ii                             624
java-i                              605
mysql                               605
jquery                              560
spring/fundamentals/repositories    427
Name: path, dtype: int64


In [13]:
e.get_cohort_top10_paths('Fortuna', full_stack_java_df)

Fortuna top 10 lessons accessed:

java-iii             786
javascript-i         785
java-ii              657
spring               636
mysql                591
html-css             585
java-i               555
jquery               514
javascript-ii        497
java-iii/servlets    429
Name: path, dtype: int64


In [14]:
e.get_cohort_top10_paths('Voyageurs', full_stack_java_df)

Voyageurs top 10 lessons accessed:

javascript-i                   884
java-iii                       770
java-ii                        756
mysql                          663
spring                         650
java-i                         641
javascript-ii                  584
jquery                         583
html-css                       528
java-i/introduction-to-java    447
Name: path, dtype: int64


### For the 5 Web Dev cohorts with the most log observations (Ceres, Zion, Jupiter, Fortuna, Voyageurs):


- Fortuna was the only one of the 5 cohorts whose most accessed lesson was java-iii (786 hits); javascript-i led the other 4.


- Ceres and Jupiter accessed html-css more often than Zion, Fortuna, and Voyageurs.
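
Comparing cohorts by reading five separate top-10 lists is error-prone; a single cohort-by-lesson table makes the differences easier to see. A minimal sketch, assuming a frame shaped like `name_path` with `name` and `path` columns. The helper name `cohort_lesson_table` and the demo rows are hypothetical:

```python
import pandas as pd


def cohort_lesson_table(name_path: pd.DataFrame, lessons: list) -> pd.DataFrame:
    """Count hits per cohort (rows) for the selected lesson paths (columns)."""
    subset = name_path[name_path["path"].isin(lessons)]
    return pd.crosstab(subset["name"], subset["path"])


# Made-up rows, only to show the shape of the result.
demo = pd.DataFrame({
    "name": ["Ceres", "Ceres", "Fortuna", "Fortuna", "Fortuna"],
    "path": ["html-css", "html-css", "java-iii", "java-iii", "html-css"],
})
table = cohort_lesson_table(demo, ["html-css", "java-iii"])
print(table)
```

Passing `normalize="index"` to `pd.crosstab` would turn the counts into each cohort's share of hits, which may be a fairer comparison when cohorts differ in total activity.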


### Data Science most accessed:

In [15]:
#initializing df for data_science program cohorts and paths
data_science = name_path[name_path.program_name == 'data_science']

In [16]:
# get 10 most accessed paths
e.get_cohort_top10_paths('Darden', data_science)

Darden top 10 lessons accessed:

classification/overview                           1109
classification/scale_features_or_not.svg           943
sql/mysql-overview                                 774
anomaly-detection/AnomalyDetectionCartoon.jpeg     612
anomaly-detection/overview                         592
1-fundamentals/AI-ML-DL-timeline.jpg               470
1-fundamentals/modern-data-scientist.jpg           470
1-fundamentals/1.1-intro-to-data-science           460
classification/logistic-regression                 423
stats/compare-means                                423
Name: path, dtype: int64


In [17]:
e.get_cohort_top10_paths('Bayes', data_science)

Bayes top 10 lessons accessed:

1-fundamentals/modern-data-scientist.jpg             650
1-fundamentals/AI-ML-DL-timeline.jpg                 648
1-fundamentals/1.1-intro-to-data-science             640
6-regression/1-overview                              521
10-anomaly-detection/AnomalyDetectionCartoon.jpeg    387
10-anomaly-detection/1-overview                      384
6-regression/5.0-evaluate                            333
5-stats/3-probability-distributions                  320
5-stats/4.2-compare-means                            316
appendix/cli-git-overview                            311
Name: path, dtype: int64


In [18]:
e.get_cohort_top10_paths('Curie', data_science)

Curie top 10 lessons accessed:

6-regression/1-overview                              595
1-fundamentals/modern-data-scientist.jpg             467
1-fundamentals/AI-ML-DL-timeline.jpg                 465
1-fundamentals/1.1-intro-to-data-science             461
3-sql/1-mysql-overview                               441
10-anomaly-detection/AnomalyDetectionCartoon.jpeg    345
10-anomaly-detection/1-overview                      345
4-python/8.4.3-dataframes                            260
4-python/8.4.4-advanced-dataframes                   246
4-python/3-data-types-and-variables                  234
Name: path, dtype: int64


In [19]:
e.get_cohort_top10_paths('Easley', data_science)

Easley top 10 lessons accessed:

classification/scale_features_or_not.svg                         463
classification/overview                                          445
classification/classical_programming_vs_machine_learning.jpeg    432
fundamentals/AI-ML-DL-timeline.jpg                               381
fundamentals/modern-data-scientist.jpg                           379
fundamentals/intro-to-data-science                               372
sql/mysql-overview                                               295
regression/model                                                 204
stats/compare-means                                              202
classification/explore                                           186
Name: path, dtype: int64


In [20]:
e.get_cohort_top10_paths('Florence', data_science)

Florence top 10 lessons accessed:

fundamentals/modern-data-scientist.jpg    627
fundamentals/AI-ML-DL-timeline.jpg        624
fundamentals/intro-to-data-science        615
python/data-types-and-variables           258
fundamentals/DataToAction_v2.jpg          208
sql/mysql-overview                        203
fundamentals/data-science-pipeline        189
sql/joins                                 186
sql/functions                             166
sql/mysql-introduction                    165
Name: path, dtype: int64


### For the 5 Data Science cohorts with log observations ( Darden, Bayes, Curie, Easley, Florence):

- Darden viewed the classification/overview lesson the most out of all 5 cohorts by a significant margin. Darden had 1109 instances of this lesson, which is 664 more than the next highest cohort, Easley (445).
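
The outputs above show two path schemes: Bayes and Curie are logged with numbered paths such as `6-regression/1-overview`, while Darden, Easley, and Florence use unnumbered paths such as `classification/overview`. Raw counts may therefore split the same lesson across two names. A minimal sketch of one way to normalize them, assuming the numeric prefixes are the only difference; the helper `normalize_path` is hypothetical:

```python
import re

# Matches a leading section number such as "6-", "4.2-" or "8.4.3-".
_NUMERIC_PREFIX = re.compile(r"^\d+(?:\.\d+)*-")


def normalize_path(path: str) -> str:
    """Strip leading section numbers from every segment of a curriculum path."""
    return "/".join(_NUMERIC_PREFIX.sub("", segment) for segment in path.split("/"))


for raw in ["6-regression/1-overview", "5-stats/4.2-compare-means", "classification/overview"]:
    print(raw, "->", normalize_path(raw))
```

Applied with `df["path"].map(normalize_path)` before counting, this would let numbered and unnumbered versions of a lesson be tallied together. Lessons that were renamed, not just renumbered, would still need a manual mapping.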


## At some point in 2019, the ability for students and alumni to access both curriculums (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before?

#### Student Access in 2019 and After

In [21]:
# create data frame for 2019 and later
students_2019n_after = df['2019':]

In [22]:
# Separate web development students from data science students
web_after2019, data_after2019 = e.seperate_students_webdev_vs_datasci(students_2019n_after)

In [23]:
# drop rows that contain a null in any column
data_after2019.dropna(inplace =True)

In [24]:
# obtain instances of cross-curriculum access using web development keywords
e.get_double_acess(data_after2019,['java','html','css','spring'])

Unnamed: 0,date,time,path,user_id,cohort_id,ip,id,name,slack,start_date,end_date,created_at,updated_at,deleted_at,program_id,program_name,web_yes


In [25]:
# obtain instances of cross-curriculum access using data science keywords
e.get_double_acess(web_after2019,['classification','stats','excel'])

Unnamed: 0,date,time,path,user_id,cohort_id,ip,id,name,slack,start_date,end_date,created_at,updated_at,deleted_at,program_id,program_name,web_yes
0,2019-07-03,09:53:45,6-classification/exercises,404,28.0,97.105.19.58,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
1,2019-07-03,09:53:51,6-classification/1-overview,404,28.0,97.105.19.58,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
2,2019-07-03,13:05:50,6-classification/project,146,28.0,71.235.91.18,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
3,2019-07-08,09:50:09,6-classification/1-overview,1,28.0,97.105.19.58,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
4,2019-07-08,09:57:52,6-classification/2-intro-to-classification,1,28.0,97.105.19.58,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5160,2021-03-25,13:43:18,regression/multivariate-regression-in-excel,581,28.0,70.112.179.142,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
5161,2021-03-25,13:43:27,regression/univariate_regression_in_excel,581,28.0,70.112.179.142,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
5162,2021-03-25,13:43:33,regression/multivariate-regression-in-excel,581,28.0,70.112.179.142,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0
5163,2021-03-25,13:43:44,regression/univariate_regression_in_excel,581,28.0,70.112.179.142,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java,1.0


#### Student Access Before 2019

In [26]:
# create data frame for everything before 2019
students_2019n_before = df[:'2018']

In [27]:
# Separate web development students from data science students
web_before2019, data_before2019 = e.seperate_students_webdev_vs_datasci(students_2019n_before)

In [28]:
# drop rows that contain a null in any column
data_before2019.dropna(inplace =True)

In [29]:
# obtain instances of cross-curriculum access using web development keywords
e.get_double_acess(data_before2019,['java','html','css','spring'])

Unnamed: 0,date,time,path,user_id,cohort_id,ip,id,name,slack,start_date,end_date,created_at,updated_at,deleted_at,program_id,program_name,web_yes


In [30]:
# obtain instances of cross-curriculum access using data science keywords
e.get_double_acess(web_before2019,['classification','stats','excel'])

Unnamed: 0,date,time,path,user_id,cohort_id,ip,id,name,slack,start_date,end_date,created_at,updated_at,deleted_at,program_id,program_name,web_yes


### Takeaways
No cross-curriculum access appears in the logs before 2019, but access was not completely shut off from 2019 onward.  
There are instances of web development students accessing data science curriculum, even after graduation.  
For the Data Science classes, no access to web development content appears before or after 2019.
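
The cross-curriculum check relies on matching keywords against `path`. The body of `e.get_double_acess` is not shown here, so the following is only a sketch of how such a filter could work, assuming case-insensitive substring matching; `find_keyword_paths` is a hypothetical stand-in:

```python
import re

import pandas as pd


def find_keyword_paths(df: pd.DataFrame, keywords: list) -> pd.DataFrame:
    """Return rows whose path contains any of the keywords (case-insensitive)."""
    pattern = "|".join(re.escape(keyword) for keyword in keywords)
    mask = df["path"].str.contains(pattern, case=False, na=False)
    return df[mask]


# Made-up paths, only to show the matching behaviour.
demo = pd.DataFrame(
    {"path": ["6-classification/1-overview", "javascript-i", "java-ii", None, "spring"]}
)
hits = find_keyword_paths(demo, ["classification", "stats", "excel"])
print(hits)
```

Because this is substring matching, the keyword `java` also matches `javascript-i`. That is harmless here since both belong to web development, but short keywords can produce false positives on unrelated paths.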

## What topics are grads continuing to reference after graduation and into their jobs (for each program)?

In [31]:
# create data frame of graduates looking at curriculum
peeking_after = df[df.end_date<df.index]
peeking_after

date (index),date,time,path,user_id,cohort_id,ip,id,name,slack,start_date,end_date,created_at,updated_at,deleted_at,program_id,program_name
2018-01-26,2018-01-26,09:55:03,/,1,8.0,97.105.19.61,8.0,Hampton,#hampton,2015-09-22,2016-02-06,2016-06-14 19:52:26,2016-06-14 19:52:26,,1.0,full_stack_php
2018-01-26,2018-01-26,09:56:02,java-ii,1,8.0,97.105.19.61,8.0,Hampton,#hampton,2015-09-22,2016-02-06,2016-06-14 19:52:26,2016-06-14 19:52:26,,1.0,full_stack_php
2018-01-26,2018-01-26,09:56:05,java-ii/object-oriented-programming,1,8.0,97.105.19.61,8.0,Hampton,#hampton,2015-09-22,2016-02-06,2016-06-14 19:52:26,2016-06-14 19:52:26,,1.0,full_stack_php
2018-01-26,2018-01-26,09:56:06,slides/object_oriented_programming,1,8.0,97.105.19.61,8.0,Hampton,#hampton,2015-09-22,2016-02-06,2016-06-14 19:52:26,2016-06-14 19:52:26,,1.0,full_stack_php
2018-01-26,2018-01-26,10:14:47,/,11,1.0,97.105.19.61,1.0,Arches,#arches,2014-02-04,2014-04-22,2016-06-14 19:52:26,2016-06-14 19:52:26,,1.0,full_stack_php
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-04-21,2021-04-21,16:41:51,jquery/personal-site,64,28.0,71.150.217.33,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java
2021-04-21,2021-04-21,16:42:02,jquery/mapbox-api,64,28.0,71.150.217.33,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java
2021-04-21,2021-04-21,16:42:09,jquery/ajax/weather-map,64,28.0,71.150.217.33,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java
2021-04-21,2021-04-21,16:44:37,anomaly-detection/discrete-probabilistic-methods,744,28.0,24.160.137.86,28.0,Staff,#,2014-02-04,2014-02-04,2018-12-06 17:04:19,2018-12-06 17:04:19,,2.0,full_stack_java


In [32]:
peeking_after_php = peeking_after[peeking_after.program_name == 'full_stack_php']
peeking_after_java = peeking_after[peeking_after.program_name == 'full_stack_java']
peeking_after_datascience = peeking_after[peeking_after.program_name == 'data_science']
peeking_after_front_end = peeking_after[peeking_after.program_name == 'front_end']

In [33]:
peeking_after_php.path.value_counts().nlargest(10)

/                1681
index.html       1011
javascript-i      736
html-css          542
spring            501
java-iii          479
java-ii           454
java-i            444
javascript-ii     429
appendix          409
Name: path, dtype: int64

In [34]:
peeking_after_java.path.value_counts().nlargest(10)

/                           12406
javascript-i                 4229
spring                       3760
search/search_index.json     3562
html-css                     3136
java-iii                     3058
java-ii                      2985
java-i                       2679
appendix                     2662
javascript-ii                2549
Name: path, dtype: int64

In [35]:
peeking_after_datascience.path.value_counts().nlargest(10)

/                                                 1436
search/search_index.json                           493
sql/mysql-overview                                 275
classification/overview                            266
classification/scale_features_or_not.svg           219
anomaly-detection/AnomalyDetectionCartoon.jpeg     193
anomaly-detection/overview                         191
fundamentals/AI-ML-DL-timeline.jpg                 189
fundamentals/modern-data-scientist.jpg             187
fundamentals/intro-to-data-science                 184
Name: path, dtype: int64

In [36]:
peeking_after_front_end.path.value_counts().nlargest(10)

content/html-css                               2
/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
Name: path, dtype: int64

### Takeaways
- Full Stack PHP graduates mostly revisit Java lessons and html-css, with javascript-i the most visited lesson.
- Full Stack Java graduates follow a similar pattern: the most revisited topics are Java lessons and html-css, with javascript-i referenced most after graduation.
- Data Science graduates mostly revisit SQL, classification, and anomaly detection; sql/mysql-overview and classification/overview are the top two lessons.
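
The graduate filter compares each log date with the cohort's `end_date`. A minimal sketch of that comparison with an explicit datetime conversion, assuming `end_date` may be stored as text and the frame is indexed by log date. The helper name `after_graduation` and the demo rows are hypothetical:

```python
import pandas as pd


def after_graduation(df: pd.DataFrame) -> pd.DataFrame:
    """Keep log rows whose index date is later than the cohort's end_date."""
    end = pd.to_datetime(df["end_date"]).to_numpy()
    # Compare raw arrays so duplicate index dates cannot affect alignment.
    return df[df.index.to_numpy() > end]


# Made-up rows: one request during the program, one after it ended.
demo = pd.DataFrame(
    {
        "path": ["java-i", "java-ii"],
        "end_date": ["2016-02-06", "2016-02-06"],
        "program_name": ["full_stack_php", "full_stack_php"],
    },
    index=pd.to_datetime(["2015-10-01", "2018-01-26"]),
)
grads = after_graduation(demo)
print(grads.groupby("program_name")["path"].value_counts())
```

Rows with a missing `end_date` compare as false and are dropped, which seems the safer default when graduation cannot be established.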
 

## Which lessons are least accessed?

In [37]:
# all data science cohorts into dataframe 
ds_cohorts = df.loc[df["program_name"] == "data_science"]
# aggregate data science page counts 
ds_cohorts.path.value_counts().nsmallest(10)

data-science-modules.jpg                        1
regression/feature_engineering_into_modeling    1
diagram-of-ds-pipeline-fraud-example.jpeg       1
ml-methodologies-drawing.jpg                    1
classification/explore-old                      1
where                                           1
sql                                             1
case-statements                                 1
nlp                                             1
json-responses                                  1
Name: path, dtype: int64

In [38]:
# all Front End Cohorts into dataframe 
front_end_cohorts = df.loc[df["program_name"] == "front_end"]
# aggregate front end page counts
front_end_cohorts.path.value_counts().nsmallest(10)

/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
content/html-css                               2
Name: path, dtype: int64

In [39]:
# all Full Stack Java Cohorts into dataframe 
full_stack_java_cohorts = df.loc[df["program_name"] == "full_stack_java"]
# aggregate full stack java page counts
full_stack_java_cohorts.path.value_counts().nsmallest(10)

content/jquery/simple-simon/gitbook/images/favicon.ico                   1
4-stats/2.4-power-analysis                                               1
bom-and-dom/dom-events                                                   1
timeseries                                                               1
4-stats/2.2-probability_distributions/Selecting_a_hypothesis_test.svg    1
javascri                                                                 1
students/743/notes                                                       1
4-stats/2.3-sampling                                                     1
appendix/further-reading/mysql                                           1
2-stats/2.4-more-excel-features                                          1
Name: path, dtype: int64

In [40]:
# all Full Stack PHP Cohorts into dataframe 
full_stack_php_cohorts = df.loc[df["program_name"] == "full_stack_php"]
# aggregate full stack php page counts
full_stack_php_cohorts.path.value_counts().nsmallest(10)

slides/threads                              1
2-storytelling/project                      1
appendix/data-viz-references                1
4-python/6-imports                          1
4-python/project                            1
content/mysql/relationships/indexes.html    1
css                                         1
4-python/3-data-types-and-variables         1
appendix/postwork/trains                    1
4-python/5-functions                        1
Name: path, dtype: int64

### Takeaways 
**Least accessed lessons** (each was requested once; other paths tie at one hit)
- Full Stack Java Program - javascri
- Full Stack PHP Program - css
- Data Science Program - case-statements
- Front End Program - content/html-css/gitbook/images/favicon.ico
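
Almost every path in the four listings above has a count of 1, so `nsmallest(10)` shows only ten members of a larger tie. A minimal sketch that returns every path sharing the minimum count, so the size of the tie is visible; the helper name `least_accessed` is hypothetical:

```python
import pandas as pd


def least_accessed(paths: pd.Series) -> pd.Series:
    """Return all paths tied for the lowest request count, sorted by path."""
    counts = paths.value_counts()
    return counts[counts == counts.min()].sort_index()


# Made-up paths, only to show the tie handling.
demo_paths = pd.Series(["html-css", "html-css", "css", "javascri"])
tied = least_accessed(demo_paths)
print(tied)
```

Several single-hit paths, such as `javascri`, look like mistyped or truncated URLs rather than real lessons, so filtering to known lesson paths first may give a more meaningful answer.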