# Exercises

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import env
import seaborn as sns
from wrangle import wrangle_curr

## Wrangle

In [2]:
df = wrangle_curr()
df

### Now that our data is clean we can start answering our questions:

## 1. Which lesson appears to attract the most traffic consistently across cohorts (per program)?

In [10]:
# let's get an initial look at top results before breaking them down by program
df.path.value_counts()

path
/                             39514
toc                           16680
javascript-i                  16386
search/search_index.json      16185
html-css                      11843
                              ...  
About_NLP                         1
8.0_Intro_Module                  1
introduction-to-matplotlib        1
2.0_Intro_Stats                   1
13.5_Tableau                      1
Name: count, Length: 1844, dtype: int64

In [11]:
# looking at the highest occurrence for our Web Dev PHP Program
df.path[df.program == 'Web Dev - PHP'].value_counts()

path
/                        1681
index.html               1011
javascript-i              736
html-css                  542
spring                    501
                         ... 
4.0_overview                1
4.1_introduction            1
4.4_functions               1
4.5_imports                 1
ajax-api-request.html       1
Name: count, Length: 710, dtype: int64

In [12]:
# looking at the highest occurrence for our Web Dev Java Program
df.path[df.program == 'Web Dev - Java'].value_counts()

path
/                           29474
toc                         16517
javascript-i                15640
search/search_index.json    13863
java-iii                    11290
                            ...  
131                             1
132                             1
130                             1
html-css/elecments              1
spring/services                 1
Name: count, Length: 1113, dtype: int64

In [13]:
# looking at the highest occurrence for our Data Science Program
df.path[df.program == 'Data Science'].value_counts()

path
/                                           8358
search/search_index.json                    2203
classification/overview                     1785
1-fundamentals/modern-data-scientist.jpg    1655
1-fundamentals/AI-ML-DL-timeline.jpg        1651
                                            ... 
python/custom-sorting-functions                1
imports                                        1
java-i/console-io                              1
appendix/univariate_regression_in_excel        1
6-regression/8-Project                         1
Name: count, Length: 682, dtype: int64

In [14]:
# looking at the highest occurrence for our Front End Web Dev Program
df.path[df.program == 'Front End Web Dev'].value_counts()

path
content/html-css                               2
/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
Name: count, dtype: int64

### Results vary by program (ignoring navigation paths such as `/`, `toc`, `index.html`, and `search/search_index.json`):
- For both our Web Dev Java and Web Dev PHP programs, the javascript-i lesson has the most traffic.
- For our Data Science program, classification/overview has the most hits.
- Finally, for Front End Web Dev (our newest program), content/html-css ranks first, though the program has only 5 hits in total.
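
The per-program ranking above can be sketched in a single pass. This is only a rough sketch on made-up data: `toy_log` and the `NAV_PATHS` set are hypothetical, and treating those paths as navigation rather than lessons is my assumption.

```python
import pandas as pd

# Hypothetical stand-in for the wrangled log; the real df comes from wrangle_curr().
toy_log = pd.DataFrame({
    "program": ["Web Dev - Java"] * 5 + ["Data Science"] * 4,
    "path": ["/", "javascript-i", "javascript-i", "toc", "java-iii",
             "/", "classification/overview", "classification/overview",
             "search/search_index.json"],
})

# Assumption: these paths are navigation or search requests, not lessons.
NAV_PATHS = {"/", "toc", "index.html", "search/search_index.json"}

lessons = toy_log[~toy_log["path"].isin(NAV_PATHS)]

# Count hits per (program, path) pair.
hits = lessons.groupby(["program", "path"]).size().reset_index(name="hits")

# Sort by hits and keep the first (most-hit) path for each program.
top_lesson = (
    hits.sort_values("hits", ascending=False)
    .drop_duplicates("program")
    .set_index("program")["path"]
)
print(top_lesson)
```

Dropping the navigation paths first keeps `/` and `toc` from crowding out actual lessons in every program.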

## 2. Is there a cohort that referred to a lesson significantly more than other cohorts seemed to gloss over?

For this I'll need to break down hits by individual cohort. Hit counts are heavily disproportionate across cohorts, however, so I'd like to look at these cohorts by program as well.

In [15]:
# let's see all of the cohorts included in the data
df.cohort.value_counts()

cohort
Ceres         40730
Zion          38096
Jupiter       37109
Fortuna       36902
Voyageurs     35636
Ganymede      33844
Apex          33568
Deimos        32888
Darden        32015
Teddy         30926
Hyperion      29855
Betelgeuse    29356
Ulysses       28534
Europa        28033
Xanadu        27749
Bayes         26538
Wrangell      25586
Andromeda     25359
Kalypso       23691
Curie         21581
Yosemite      20743
Bash          17713
Luna          16623
Marco         16397
Easley        14715
Lassen         9587
Arches         8890
Florence       8562
Sequoia        7444
Neptune        7276
Olympic        4954
Kings          2845
Pinnacles      2158
Hampton        1712
Oberon         1672
Quincy         1237
Niagara         755
Mammoth         691
Glacier         598
Joshua          302
Ike             253
Badlands         93
Franklin         72
Apollo            5
Denali            4
Everglades        1
Name: count, dtype: int64

In [16]:
# first let's separate the hits by program like we did before
wdp = df[df.program == 'Web Dev - PHP']
wdp_count = wdp.cohort.value_counts()
wdj = df[df.program == 'Web Dev - Java']
wdj_count = wdj.cohort.value_counts()
ds = df[df.program == 'Data Science']
ds_count = ds.cohort.value_counts()
wdfe = df[df.program == 'Front End Web Dev']
wdfe_count = wdfe.cohort.value_counts()
# print the five most-visited paths for each cohort in the index of the given value_counts Series
def top_five(cohort_list):
    for c in cohort_list.index:
        print(f'Top 5 Hits for {c} Cohort:\n\
        {pd.DataFrame(df.path[df.cohort == c].value_counts().head(5))}\n')

In [17]:
# calling the above function
top_five(wdp_count)

Top 5 Hits for Lassen Cohort:
                      count
path               
index.html      877
javascript-i    233
java-iii        224
spring          222
html-css        174

Top 5 Hits for Arches Cohort:
                       count
path                
/                626
javascript-i     294
html-css         215
javascript-ii    204
spring           192

Top 5 Hits for Olympic Cohort:
                        count
path                 
/                 249
javascript-i      128
java-i             76
jquery             71
java-i/methods     69

Top 5 Hits for Kings Cohort:
                                                          count
path                                                   
/                                                   219
index.html                                           84
content/laravel/intro                                83
content/laravel/intro/application-structure.html     63
content/laravel/intro/gitbook/images/favicon.ico     56

Top 5 Hits 

In [18]:
top_five(wdj_count)

Top 5 Hits for Ceres Cohort:
                                  count
path                           
/                          1653
search/search_index.json   1380
javascript-i               1003
toc                         911
html-css                    766

Top 5 Hits for Zion Cohort:
                                  count
path                           
/                          1798
toc                        1465
javascript-i                897
java-iii                    753
search/search_index.json    700

Top 5 Hits for Jupiter Cohort:
                                  count
path                           
toc                        1866
/                          1696
search/search_index.json    998
javascript-i                926
java-iii                    795

Top 5 Hits for Fortuna Cohort:
                                  count
path                           
/                          2038
toc                        1293
search/search_index.json   1020
java-iii      

In [19]:
top_five(ds_count)

Top 5 Hits for Darden Cohort:
                                                  count
path                                           
/                                          2980
classification/overview                    1109
classification/scale_features_or_not.svg    943
sql/mysql-overview                          774
search/search_index.json                    664

Top 5 Hits for Bayes Cohort:
                                                  count
path                                           
/                                          1967
1-fundamentals/modern-data-scientist.jpg    650
1-fundamentals/AI-ML-DL-timeline.jpg        648
1-fundamentals/1.1-intro-to-data-science    640
search/search_index.json                    588

Top 5 Hits for Curie Cohort:
                                                  count
path                                           
/                                          1712
6-regression/1-overview                     595
search/search_index.js

In [20]:
top_five(wdfe_count)

Top 5 Hits for Apollo Cohort:
                                                     count
path                                              
content/html-css                                 2
/                                                1
content/html-css/gitbook/images/favicon.ico      1
content/html-css/introduction.html               1



Web Dev PHP

| Cohort | Top Lesson |
| :- | :- |
| Lassen | javascript-i |
| Arches | javascript-i |
| Olympic | javascript-i | 
| Kings | content/laravel/intro |
| Hampton | java-iii |
| Quincy | content/laravel/intro |
| Glacier | content/html-css |
| Joshua | content/html-css |
| Ike | content/html-css |
| Badlands | content/php_ii/command-line |
| Franklin | javascript-ii/es6 |
| Denali | prework/databases |

Web Dev Java

| Cohort | Top Lesson |
| :- | :- |
| Ceres | javascript-i |
| Zion | javascript-i |
| Jupiter | javascript-i |
| Fortuna | java-iii |
| Voyageurs | javascript-i |
| Ganymede | javascript-i |
| Apex | javascript-i |
| Deimos | javascript-i |
| Teddy | java-iii |
| Hyperion | javascript-i |
| Betelgeuse | javascript-i |
| Ulysses | javascript-i |
| Europa | javascript-i |
| Xanadu | javascript-i |
| Wrangell | javascript-i |
| Andromeda | javascript-i |
| Kalypso | javascript-i |
| Yosemite | javascript-i |
| Bash | javascript-i |
| Luna | javascript-i |
| Marco | javascript-i |
| Sequoia | spring |
| Neptune | javascript-i/introduction/working-with-data-types-operators-and-variables |
| Pinnacles | javascript-i |
| Oberon | javascript-i/introduction/operators |
| Niagara | spring |
| Mammoth | java-i |

Data Science

| Cohort | Top Lesson |
| :- | :- |
| Darden | classification/overview |
| Bayes | 1-fundamentals/1.1-intro-to-data-science |
| Curie | 6-regression/1-overview |
| Easley | classification/overview |
| Florence | fundamentals/intro-to-data-science |

Front End Web Dev

| Cohort | Top Lesson |
| :- | :- |
| Apollo | content/html-css |

### Web Dev Cohorts
- The Neptune and Oberon web dev cohorts had the most anomalous top lessons.
- Neptune had javascript-i/introduction/working-with-data-types-operators-and-variables as its most-hit lesson.
- Oberon had javascript-i/introduction/operators as its most-hit lesson.

#### From my understanding, both lessons sit in the introductory part of javascript-i, so it's odd that a sublesson has more hits than the overarching lesson.

### Data Science Cohorts

#### The Curie cohort had the most anomalous top lesson, 6-regression/1-overview; no other data science cohort had it in their top results.
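
One way to put a number on "anomalous" is to compare each cohort's share of hits on a path against the other cohorts. The sketch below uses made-up data (`toy_log` is hypothetical), and the "max share minus median share" score is my own choice, not something taken from the analysis above.

```python
import pandas as pd

# Hypothetical toy log: two cohorts from the same program.
toy_log = pd.DataFrame({
    "cohort": ["Curie"] * 4 + ["Darden"] * 4,
    "path": ["6-regression/1-overview"] * 3 + ["classification/overview"]
            + ["classification/overview"] * 3 + ["sql/mysql-overview"],
})

# Share of each cohort's hits that went to each path (every row sums to 1),
# so large and small cohorts can be compared on the same scale.
share = pd.crosstab(toy_log["cohort"], toy_log["path"], normalize="index")

# For each path: the top cohort's share minus the median share across cohorts.
# A large gap flags a lesson one cohort leaned on while the others skipped it.
standout = pd.DataFrame({
    "cohort": share.idxmax(),
    "gap": share.max() - share.median(),
}).sort_values("gap", ascending=False)
print(standout)
```

Working with shares instead of raw counts sidesteps the cohort-size imbalance noted earlier.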

## 3. Are there students who, when active, hardly access the curriculum? If so, what information do you have about these students?

- To answer this question, we'll look at individual users and count their hits between their cohort's start and end dates.
- We'll also look at this separated by program.
- We also need to define "hardly". Since it's a multi-month course, I'll treat fewer than 30 hits as "hardly".
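
As a minimal sketch of the plan above on made-up data: `toy_log` and `THRESHOLD` are hypothetical, and the cutoff is shrunk to 2 because the toy data is tiny.

```python
import pandas as pd

# Hypothetical toy log indexed by request time, mirroring the layout of df.
toy_log = pd.DataFrame(
    {
        "user": [1, 1, 1, 2, 2],
        "start_date": pd.to_datetime(["2020-01-06"] * 5),
        "end_date": pd.to_datetime(["2020-06-01"] * 5),
    },
    index=pd.to_datetime(
        ["2020-01-07", "2020-02-01", "2020-07-01", "2020-01-08", "2019-12-31"]
    ),
)

THRESHOLD = 2  # stand-in for the 30-hit cutoff

# Keep only requests made strictly between the cohort's start and end dates.
is_active = (toy_log["start_date"] < toy_log.index) & (toy_log["end_date"] > toy_log.index)
active = toy_log[is_active]

# Hits per user while active, then the users who fall under the cutoff.
hits_per_user = active.groupby("user").size()
low_activity = hits_per_user[hits_per_user < THRESHOLD]
print(low_activity)
```

Note that a user with zero hits inside the active window never shows up in `hits_per_user`, so this approach can only find users with at least one active hit.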

In [21]:
# let's remind ourselves what the data looks like
df

created_at,path,user,ip,cohort,start_date,end_date,program
2016-06-14 19:52:26,/,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,java-ii,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,java-ii/object-oriented-programming,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,slides/object_oriented_programming,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2018-01-08 13:59:10,javascript-i/conditionals,2,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
...,...,...,...,...,...,...,...
2021-01-20 21:31:11,jquery/personal-site,869,136.50.98.51,Marco,2021-01-25,2021-07-19,Web Dev - Java
2021-03-15 19:57:09,html-css/css-ii/bootstrap-grid-system,948,104.48.214.211,Neptune,2021-03-15,2021-09-03,Web Dev - Java
2020-12-07 16:58:43,java-iii,834,67.11.50.23,Luna,2020-12-07,2021-06-08,Web Dev - Java
2020-12-07 16:58:43,java-iii/servlets,834,67.11.50.23,Luna,2020-12-07,2021-06-08,Web Dev - Java


In [22]:
# let's make sure our data types are right for the next steps
df.dtypes

path                  object
user                   int64
ip                    object
cohort                object
start_date    datetime64[ns]
end_date      datetime64[ns]
program               object
dtype: object

In [23]:
df.user[(df.index < df.end_date) & (df.index > df.start_date) & (df.program == 'Front End Web Dev')].value_counts()

Series([], Name: count, dtype: int64)

There appear to be no cases where an active student in the Front End Web Dev program accessed the course material.

In [24]:
# let's look at the amount of pings per user while they're in the web dev - PHP program
t_loc = (df.user[(df.index < df.end_date) & (df.index > df.start_date) & (df.program == 'Web Dev - PHP')].value_counts()< 30)
brains_wdp = t_loc.index[t_loc]

# let's look at the amount of pings per user while they're in the web dev - Java program
t_loc = df.user[(df.index < df.end_date) & (df.index > df.start_date) & (df.program == 'Web Dev - Java')].value_counts()<30
brains_wdj = t_loc.index[t_loc]

# let's look at the amount of pings per user while they're in the data science program
t_loc = df.user[(df.index < df.end_date) & (df.index > df.start_date) & (df.program == 'Data Science')].value_counts()<30
brains_ds = t_loc.index[t_loc]

In [25]:
df.cohort[df.user == 214].value_counts()

cohort
Joshua    25
Name: count, dtype: int64

In [26]:
def whos_kid(brainiacs):
    '''
    Prints the cohort(s) on record for each user id in brainiacs,
    i.e. the users who accessed the course content fewer than 30 times while active.
    '''
    for i in brainiacs:
        print(f'User {i} belongs to the {df.cohort[df.user == i].value_counts().index}:\n')

### Web Dev Cohorts

In [27]:
# listing possible web dev php cohorts
df.cohort[df.program == 'Web Dev - PHP'].value_counts().index

Index(['Lassen', 'Arches', 'Olympic', 'Kings', 'Hampton', 'Quincy', 'Glacier',
       'Joshua', 'Ike', 'Badlands', 'Franklin', 'Denali', 'Everglades'],
      dtype='object', name='cohort')

In [28]:
whos_kid(brains_wdp)

User 214 belongs to the Index(['Joshua'], dtype='object', name='cohort'):

User 610 belongs to the Index(['Quincy'], dtype='object', name='cohort'):

User 140 belongs to the Index(['Olympic'], dtype='object', name='cohort'):

User 151 belongs to the Index(['Olympic'], dtype='object', name='cohort'):

User 318 belongs to the Index(['Kings'], dtype='object', name='cohort'):

User 311 belongs to the Index(['Quincy'], dtype='object', name='cohort'):

User 92 belongs to the Index(['Quincy'], dtype='object', name='cohort'):

User 161 belongs to the Index(['Joshua'], dtype='object', name='cohort'):

User 88 belongs to the Index(['Glacier', 'Joshua', 'Ike'], dtype='object', name='cohort'):

User 399 belongs to the Index(['Quincy'], dtype='object', name='cohort'):

User 82 belongs to the Index(['Lassen'], dtype='object', name='cohort'):

User 71 belongs to the Index(['Quincy'], dtype='object', name='cohort'):

User 246 belongs to the Index(['Lassen'], dtype='object', name='cohort'):

User 852 b

Web Dev PHP Tally Scores:

| Cohort | Users under 30 hits |
| :- | :- |
| Joshua | 2 |
| Quincy | 6 |
| Olympic | 3 |
| Kings | 2 |
| Glacier | 1 |
| Lassen | 4 |

In [29]:
# listing possible web dev java cohorts
df.cohort[df.program == 'Web Dev - Java'].value_counts().index

Index(['Ceres', 'Zion', 'Jupiter', 'Fortuna', 'Voyageurs', 'Ganymede', 'Apex',
       'Deimos', 'Teddy', 'Hyperion', 'Betelgeuse', 'Ulysses', 'Europa',
       'Xanadu', 'Wrangell', 'Andromeda', 'Kalypso', 'Yosemite', 'Bash',
       'Luna', 'Marco', 'Sequoia', 'Neptune', 'Pinnacles', 'Oberon', 'Niagara',
       'Mammoth'],
      dtype='object', name='cohort')

In [30]:
whos_kid(brains_wdj)

User 976 belongs to the Index(['Oberon'], dtype='object', name='cohort'):

User 286 belongs to the Index(['Sequoia'], dtype='object', name='cohort'):

User 772 belongs to the Index(['Jupiter'], dtype='object', name='cohort'):

User 143 belongs to the Index(['Easley', 'Niagara'], dtype='object', name='cohort'):

User 24 belongs to the Index(['Sequoia'], dtype='object', name='cohort'):

User 64 belongs to the Index(['Arches', 'Europa'], dtype='object', name='cohort'):

User 49 belongs to the Index(['Sequoia'], dtype='object', name='cohort'):

User 817 belongs to the Index(['Apex'], dtype='object', name='cohort'):

User 178 belongs to the Index(['Pinnacles'], dtype='object', name='cohort'):

User 961 belongs to the Index(['Oberon'], dtype='object', name='cohort'):

User 963 belongs to the Index(['Oberon'], dtype='object', name='cohort'):

User 968 belongs to the Index(['Oberon'], dtype='object', name='cohort'):

User 812 belongs to the Index(['Hyperion'], dtype='object', name='cohort'):



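Web Dev Java Tally Scores:

| Cohort | Users under 30 hits |
| :- | :- |
| Oberon | 5 |
| Sequoia | 4 |
| Jupiter | 2 |
| Easley | 1 |
| Arches | 1 |
| Apex | 1 |
| Pinnacles | 6 |
| Hyperion | 1 |
| Ulysses | 1 |
| Mammoth | 2 |
| Neptune | 3 |
| Fortuna | 2 |
| Andromeda | 1 |
| Yosemite | 1 |
| Florence | 1 |
| Europa | 1 |
| Niagara | 3 |
| Betelgeuse | 1 |
| Ganymede | 1 |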

#### Across our Web Dev programs, two cohorts each had six users who accessed the site fewer than 30 times:
##### Quincy and Pinnacles
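
The tallies above were counted by hand from the printed output. As a rough sketch, the same count could come from a groupby; `toy_log` and `THRESHOLD` are hypothetical here, and the cutoff is shrunk to 3 to fit the toy data.

```python
import pandas as pd

# Hypothetical toy log: one row per request.
toy_log = pd.DataFrame({
    "user": [10, 10, 11, 12, 12, 12, 13],
    "cohort": ["Quincy", "Quincy", "Quincy", "Lassen", "Lassen", "Lassen", "Lassen"],
})

THRESHOLD = 3  # stand-in for the 30-hit cutoff

# Hits per user, then the ids of the users under the cutoff.
hits_per_user = toy_log["user"].value_counts()
low_users = hits_per_user[hits_per_user < THRESHOLD].index

# Number of distinct low-activity users in each cohort.
tally = (
    toy_log[toy_log["user"].isin(low_users)]
    .groupby("cohort")["user"]
    .nunique()
    .sort_values(ascending=False)
)
print(tally)
```

A user who is on record under several cohorts is counted once in each of them, which matches how the hand tally treats those users.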

### Data Science Cohorts

In [31]:
# listing possible data science cohorts
df.cohort[df.program == 'Data Science'].value_counts().index

Index(['Darden', 'Bayes', 'Curie', 'Easley', 'Florence'], dtype='object', name='cohort')

In [32]:
whos_kid(brains_ds)

User 787 belongs to the Index(['Curie'], dtype='object', name='cohort'):

User 746 belongs to the Index(['Curie'], dtype='object', name='cohort'):

User 650 belongs to the Index(['Bayes'], dtype='object', name='cohort'):

User 487 belongs to the Index(['Bayes'], dtype='object', name='cohort'):

User 697 belongs to the Index(['Darden'], dtype='object', name='cohort'):

User 679 belongs to the Index(['Darden'], dtype='object', name='cohort'):



Data Science Tally Scores:

| Cohort | Users under 30 hits |
| :- | :- |
| Curie | 2 |
| Bayes | 2 |
| Darden | 2 |

#### In our Data Science program, three cohorts had two such users each:
##### Curie, Bayes, and Darden

## 4. At some point in 2019, the ability for students and alumni to access both curriculums (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before?

In [176]:
# show me the top 50 results for paths
df.path.value_counts().head(50)

path
/                                                                            39514
toc                                                                          16680
javascript-i                                                                 16386
search/search_index.json                                                     16185
html-css                                                                     11843
java-iii                                                                     11773
java-ii                                                                      10917
spring                                                                       10480
jquery                                                                       10124
mysql                                                                         9716
java-i                                                                        9612
javascript-ii                                                                 9301

In [175]:
# show me all occurrences of data science cohorts hitting heavily visited web dev lessons
df[(df.program == 'Data Science') & df.path.isin(['java-i', 'java-ii', 'java-iii'])]

created_at,path,user,ip,cohort,start_date,end_date,program
2019-08-20 14:38:55,java-iii,476,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-i,476,136.50.49.145,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-i,476,136.50.49.145,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-ii,476,136.50.49.145,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-i,476,136.50.49.145,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-i,476,136.50.49.145,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-i,476,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-ii,476,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-iii,476,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,java-iii,476,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science


### Though fairly minimal, there appears to be a surge in cross-program hits in August 2019, which leads me to believe access between the programs was shut off within the few days that followed. Before this surge, students did not appear to access the other curriculum much at all.
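
To look for a shutoff more directly, the cross-program hits could be counted per month. This is only a sketch on made-up data: `toy_log` is hypothetical, and treating the `WEB_DEV_PATHS` list as the web dev side of the curriculum is my assumption.

```python
import pandas as pd

# Hypothetical toy log indexed by request time.
toy_log = pd.DataFrame(
    {
        "program": ["Data Science"] * 4,
        "path": ["java-i", "java-ii", "classification/overview", "java-iii"],
    },
    index=pd.to_datetime(["2019-07-15", "2019-08-20", "2019-08-21", "2019-08-22"]),
)

# Assumption: these top-level paths belong to the web dev curriculum.
WEB_DEV_PATHS = ["java-i", "java-ii", "java-iii"]

crossover = toy_log[
    (toy_log["program"] == "Data Science") & toy_log["path"].isin(WEB_DEV_PATHS)
]

# Count cross-program hits per calendar month.
monthly = crossover.groupby(crossover.index.to_period("M")).size()
print(monthly)
```

If access really was shut off, the monthly count should fall to zero and stay there after the cutoff month.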

## 5. What topics are grads continuing to reference after graduation and into their jobs (for each program)?

In [80]:
# storing values for hits to the server after graduation for each program
wdp = df.path[(df.index > df.end_date) & (df.program == 'Web Dev - PHP')].value_counts().head(5)
wdj = df.path[(df.index > df.end_date) & (df.program == 'Web Dev - Java')].value_counts().head(5)
ds = df.path[(df.index > df.end_date) & (df.program == 'Data Science')].value_counts().head(5)
wdfe = df.path[(df.index > df.end_date) & (df.program == 'Front End Web Dev')].value_counts().head(5)

### Web Dev Cohorts

In [83]:
wdp

path
/                929
javascript-i     351
html-css         272
javascript-ii    247
spring           244
Name: count, dtype: int64

In [75]:
df[(df.program == 'Web Dev - PHP')]

created_at,path,user,ip,cohort,start_date,end_date,program
2016-06-14 19:52:26,/,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,java-ii,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,java-ii/object-oriented-programming,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,slides/object_oriented_programming,1,97.105.19.61,Hampton,2015-09-22,2016-02-06,Web Dev - PHP
2016-06-14 19:52:26,/,11,97.105.19.61,Arches,2014-02-04,2014-04-22,Web Dev - PHP
...,...,...,...,...,...,...,...
2016-06-14 19:52:26,content/javascript/conditionals.html,51,72.179.168.148,Kings,2016-05-23,2016-09-15,Web Dev - PHP
2016-06-14 19:52:26,content/javascript/loops.html,51,72.179.168.148,Kings,2016-05-23,2016-09-15,Web Dev - PHP
2016-07-18 19:06:27,content/javascript/javascript-with-html.html,80,136.50.29.193,Lassen,2016-07-18,2016-11-10,Web Dev - PHP
2016-07-18 19:06:27,content/javascript/conditionals.html,80,136.50.29.193,Lassen,2016-07-18,2016-11-10,Web Dev - PHP


For our web dev PHP program cohorts, graduates seem to return to the first lessons they learned, like javascript-i and javascript-ii. That is plausible, since these are the lessons they have gone the longest without seeing.

In [76]:
# this seems to imply there was no attempt at accessing the curriculum after graduating, let's look a little closer at this
wdj

Series([], Name: count, dtype: int64)

In [77]:
df[(df.program == 'Web Dev - Java')]

created_at,path,user,ip,cohort,start_date,end_date,program
2018-01-08 13:59:10,javascript-i/conditionals,2,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
2018-01-08 13:59:10,javascript-i/loops,2,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
2018-01-08 13:59:10,javascript-i/conditionals,3,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
2018-01-08 13:59:10,javascript-i/functions,3,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
2018-01-08 13:59:10,javascript-i/loops,2,97.105.19.61,Teddy,2018-01-08,2018-05-17,Web Dev - Java
...,...,...,...,...,...,...,...
2021-01-20 21:31:11,jquery/personal-site,869,136.50.98.51,Marco,2021-01-25,2021-07-19,Web Dev - Java
2021-03-15 19:57:09,html-css/css-ii/bootstrap-grid-system,948,104.48.214.211,Neptune,2021-03-15,2021-09-03,Web Dev - Java
2020-12-07 16:58:43,java-iii,834,67.11.50.23,Luna,2020-12-07,2021-06-08,Web Dev - Java
2020-12-07 16:58:43,java-iii/servlets,834,67.11.50.23,Luna,2020-12-07,2021-06-08,Web Dev - Java


Looking across different rows and cohorts, this seems to be the case throughout.

In [78]:
wdfe

path
content/html-css                               2
/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
Name: count, dtype: int64

#### For our web dev PHP and Front End program cohorts, the most hits after graduation are on introductory content. For our Web Dev Java program, however, there appears to be no post-graduation use of the curriculum.

### Data Science Cohorts

In [79]:
# this appears to be the same as our web dev java program: no hits after graduation
ds

Series([], Name: count, dtype: int64)

In [71]:
df[(df.program == 'Data Science')]

created_at,path,user,ip,cohort,start_date,end_date,program
2019-08-20 14:38:55,/,466,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,/,467,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,/,468,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,/,469,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
2019-08-20 14:38:55,/,470,97.105.19.58,Bayes,2019-08-19,2020-01-30,Data Science
...,...,...,...,...,...,...,...
2020-12-07 15:20:18,regression/project,841,99.162.244.233,Easley,2020-12-07,2021-06-08,Data Science
2020-12-07 15:20:18,regression/project,841,99.162.244.233,Easley,2020-12-07,2021-06-08,Data Science
2020-12-07 15:20:18,/,143,173.174.194.60,Easley,2020-12-07,2021-06-08,Data Science
2020-12-07 15:20:18,clustering/project,841,99.162.244.233,Easley,2020-12-07,2021-06-08,Data Science


#### For our Data Science program, students appear to use the curriculum only during the course, with no use after graduation.
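
Since two programs came back completely empty, a quick cross-check is to count post-graduation hits for every program at once. A minimal sketch, assuming the same index and `end_date` layout as df (`toy_log` is made-up data):

```python
import pandas as pd

# Hypothetical toy log indexed by request time.
toy_log = pd.DataFrame(
    {
        "program": ["Web Dev - PHP", "Web Dev - PHP", "Data Science", "Data Science"],
        "path": ["javascript-i", "spring", "classification/overview", "sql/mysql-overview"],
        "end_date": pd.to_datetime(["2016-02-06", "2016-02-06", "2020-01-30", "2020-01-30"]),
    },
    index=pd.to_datetime(["2016-06-14", "2016-01-10", "2019-08-20", "2019-09-01"]),
)

# Flag requests made after the cohort's end date.
after_grad = (toy_log["end_date"] < toy_log.index).to_numpy()

# Post-graduation hits per program. Programs with none still appear with 0,
# which separates "no alumni traffic" from "program missing from the data".
post_grad_hits = toy_log.assign(after_grad=after_grad).groupby("program")["after_grad"].sum()
print(post_grad_hits)
```

Seeing an explicit 0 next to a program name is a little easier to trust than an empty Series, which could also come from a typo in the program filter.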

## 6. Which lessons are least accessed?

### Web Dev Cohorts

# print the 20 least-visited paths for each cohort in the index of the given value_counts Series
def bot_twen(cohort_list):
    for c in cohort_list.index:
        print(f'Bottom 20 Hits for {c} Cohort:\n\
        {(df.path[df.cohort == c].value_counts().tail(20))}\n')

In [123]:
df.path[(df.program == 'Web Dev - PHP')].value_counts().tail(30)

path
javascript-with-html                                1
Pipeline_Demo                                       1
3.10-more-exercises                                 1
4.6.1_introduction_to_matplotlib                    1
10.2_Regex                                          1
13.5_Tableau                                        1
Intro_to_Regression                                 1
ordinary_least_squares.jpeg                         1
content/examples/constructors-destructors.html      1
content/jquery/essential-methods/traversing.html    1
content/jquery/events/gitbook/images/favicon.ico    1
11._DistributedML.md                                1
content/jquery/events/mouse-events.html             1
content/jquery/events/keyboard-events.html          1
html-css/bootstrap-grid-system                      1
students                                            1
appendix/extra-challenges/locations                 1
appendix/extra-challenges/trains                    1
content/examples/javasc

In [124]:
df.path[(df.program == 'Web Dev - Java')].value_counts().tail(30)

path
11.04_Modeling.md                                            1
11.03_Explore.md                                             1
capstone/54                                                  1
htmle-css/elements                                           1
Exercises                                                    1
html                                                         1
student/850                                                  1
student/202                                                  1
student/120                                                  1
arash-arghavan                                               1
jquery/ajax/requests-and-responses/.json                     1
data-science                                                 1
data-1                                                       1
bonus-exercises                                              1
11.01.02_DataAcquisition                                     1
javascript-i/loops/google.com                     

In [125]:
df.path[(df.program == 'Front End Web Dev')].value_counts().tail(30)

path
content/html-css                               2
/                                              1
content/html-css/gitbook/images/favicon.ico    1
content/html-css/introduction.html             1
Name: count, dtype: int64

#### Many paths in the curriculum received only one hit. In both our Java and PHP web dev programs, some of these are entries under 'gitbook/images'. These paths are logged like lesson content but appear to be supporting assets rather than lessons, judging by how rarely they are accessed.

### Data Science Cohorts

In [127]:
df.path[(df.program == 'Data Science')].value_counts().tail(50)

path
4.2-compare-means                                        1
curie-statistics-assessment                              1
stats-assessment                                         1
statistics-assessment                                    1
%20https://github.com/RaulCPena                          1
,%20https://github.com/RaulCPena                         1
A-clustering/project                                     1
b-clustering/project                                     1
appendix/www.opensecrets.org                             1
grades/getUserDetails/916/3                              1
grades                                                   1
java-ii/object-oriented-programming                      1
appendix/open_data/www.census.gov                        1
database-design                                          1
2-storytelling/chart-types                               1
itc-ml                                                   1
misleading1_fox.jpg                                

#### Our Data Science program also shows many paths with a single hit, many of which appear to be methodology sublessons. The content that stood out as least likely to be accessed was the JavaScript lessons.
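
A lot of the single-hit paths are images or other assets rather than lessons. As a rough sketch, they could be filtered out before taking the tail; `toy_paths` is made-up data and the extension list in `ASSET_PATTERN` is my assumption about what counts as an asset.

```python
import pandas as pd

# Hypothetical sample of requested paths.
toy_paths = pd.Series(
    ["classification/overview", "classification/overview",
     "misleading1_fox.jpg", "appendix/www.opensecrets.org",
     "database-design", "search/search_index.json"],
    name="path",
)

# Assumption: paths ending in these extensions are assets, not lessons.
ASSET_PATTERN = r"\.(?:jpe?g|png|svg|ico|json|js|css)$"

is_asset = toy_paths.str.contains(ASSET_PATTERN, case=False, regex=True)
lesson_counts = toy_paths[~is_asset].value_counts()

# Least-accessed lesson paths, smallest counts first.
least = lesson_counts.sort_values().head(3)
print(least)
```

This heuristic only looks at file extensions, so mistyped URLs and external links (like the `www.opensecrets.org` entry) still get through and would need their own rule.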