In [1]:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd
import seaborn as sns
import env

# Questions we answered:

- Which lesson appears to attract the most traffic consistently across cohorts (per program)?


- Is there a cohort that referred to a lesson significantly more than other cohorts seemed to gloss over?


- At some point in 2019, the ability for students and alumni to access both curriculums (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before?


- What topics are grads continuing to reference after graduation and into their jobs (for each program)?


- Which lessons are least accessed?


# Acquire

- Data was acquired from the Codeup database using MySQL.

## Which lesson appears to attract the most traffic consistently across cohorts (per program)?


In [None]:
# all data science cohorts into dataframe 
ds_cohorts = df.loc[df["program_name"] == "data_science"]
# aggregate data science page counts 
ds_cohorts.path.value_counts().nlargest(10)

In [None]:
# all Front End cohorts into dataframe
front_end_cohorts = df.loc[df["program_name"] == "front_end"]
# aggregate Front End page counts
front_end_cohorts.path.value_counts().nlargest(10)

In [None]:
# all Full Stack Java cohorts into dataframe
full_stack_java_cohorts = df.loc[df["program_name"] == "full_stack_java"]
# aggregate Full Stack Java page counts
full_stack_java_cohorts.path.value_counts().nlargest(10)

In [None]:
# all Full Stack PHP cohorts into dataframe
full_stack_php_cohorts = df.loc[df["program_name"] == "full_stack_php"]
# aggregate Full Stack PHP page counts
full_stack_php_cohorts.path.value_counts().nlargest(10)

### Takeaways 

**Most accessed lessons**

- Data Science: classification/overview
- Front End: content/html-css
- Full Stack Java: javascript-i
- Full Stack PHP: index.html
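
The four per-program cells above repeat one pattern, so the same ranking could be produced in a single pass. The following is a minimal sketch on made-up rows, assuming only the `program_name` and `path` columns used in this notebook; the helper name `top_paths_by_program` and the `hits` column are hypothetical.

```python
import pandas as pd

# Toy access log: column names mirror the notebook, the rows are invented.
log = pd.DataFrame({
    "program_name": ["data_science", "data_science", "data_science",
                     "front_end", "front_end", "full_stack_java"],
    "path": ["classification/overview", "classification/overview", "sql/joins",
             "content/html-css", "content/html-css", "javascript-i"],
})

def top_paths_by_program(df, n=10):
    """Return the n most-requested paths within each program (hypothetical helper)."""
    counts = (
        df.groupby(["program_name", "path"])
          .size()
          .rename("hits")
          .reset_index()
    )
    # Sort so the busiest paths come first inside each program.
    counts = counts.sort_values(["program_name", "hits"], ascending=[True, False])
    # head(n) per group keeps the first n rows of each program after sorting.
    return counts.groupby("program_name").head(n).reset_index(drop=True)

top = top_paths_by_program(log, n=1)
print(top)
```

Calling it with `n=10` on the real `df` should give the same rankings as the separate cells, provided `df` has those two columns.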

## Is there a cohort that referred to a lesson significantly more than other cohorts seemed to gloss over?

In [None]:
# initializing df for exploring the cohorts' paths accessed
name_path = df[['name','path', 'program_name']]

### Web Dev most accessed:

In [None]:
# initializing df for full_stack_java program cohorts and paths
full_stack_java_df = name_path[name_path.program_name == 'full_stack_java']

In [None]:
# get 10 most accessed paths
e.get_cohort_top10_paths('Ceres', full_stack_java_df)

In [None]:
e.get_cohort_top10_paths('Jupiter', full_stack_java_df)

In [None]:
e.get_cohort_top10_paths('Zion', full_stack_java_df)

In [None]:
e.get_cohort_top10_paths('Fortuna', full_stack_java_df)

In [None]:
e.get_cohort_top10_paths('Voyageurs', full_stack_java_df)

### For the five Web Dev cohorts with the most log observations (Ceres, Zion, Jupiter, Fortuna, Voyageurs):


- Fortuna accessed the java-iii lesson more frequently than the other 4 cohorts investigated.


- Ceres and Jupiter accessed html-css more often than Jupiter, Fortuna, and Voyageurs.
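
Comparing five separate top-10 lists by eye is error-prone, so a side-by-side table may help. This is a sketch only: it does not reproduce `e.get_cohort_top10_paths`, whose implementation is not shown here. The helper name `compare_cohorts` and the toy rows are assumptions; the `name` and `path` columns come from `name_path` above.

```python
import pandas as pd

def compare_cohorts(df, cohorts, n=10):
    """Hypothetical helper: path-by-cohort hit counts for the n busiest paths overall."""
    subset = df[df["name"].isin(cohorts)]
    # Rows are paths, columns are cohorts, cells are request counts.
    table = pd.crosstab(subset["path"], subset["name"])
    busiest = table.sum(axis=1).nlargest(n).index
    return table.loc[busiest]

# Invented rows for illustration only.
toy = pd.DataFrame({
    "name": ["Ceres", "Ceres", "Ceres", "Zion", "Zion"],
    "path": ["html-css", "html-css", "java-iii", "java-iii", "spring"],
    "program_name": ["full_stack_java"] * 5,
})
table = compare_cohorts(toy, ["Ceres", "Zion"])
print(table)
```

Each row then shows how often every cohort hit the same lesson, which makes outliers easier to spot than separate lists.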


### Data Science most accessed:

In [None]:
#initializing df for data_science program cohorts and paths
data_science = name_path[name_path.program_name == 'data_science']

In [None]:
# get 10 most accessed paths
e.get_cohort_top10_paths('Darden', data_science)

In [None]:
e.get_cohort_top10_paths('Bayes', data_science)

In [None]:
e.get_cohort_top10_paths('Curie', data_science)

In [None]:
e.get_cohort_top10_paths('Easley', data_science)

In [None]:
e.get_cohort_top10_paths('Florence', data_science)

### For the 5 Data Science cohorts with log observations (Darden, Bayes, Curie, Easley, Florence):

- Darden viewed the classification/overview lesson the most out of all 5 cohorts by a significant margin. Darden had 1109 instances of this lesson, which is 664 more than the next highest cohort, Easley.
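
The margin quoted above is the leader's count for one lesson minus the runner-up's count. A minimal sketch of that arithmetic, assuming the `name` and `path` columns from `name_path`; the helper name `lead_over_runner_up` and the toy rows are hypothetical and do not reproduce the real figures.

```python
import pandas as pd

def lead_over_runner_up(df, path):
    """Count hits on `path` per cohort; return (leader, leader_hits, margin over second)."""
    hits = df.loc[df["path"] == path, "name"].value_counts()
    if len(hits) < 2:
        raise ValueError("need at least two cohorts to compare")
    # value_counts sorts descending, so position 0 is the leader and 1 the runner-up.
    return hits.index[0], int(hits.iloc[0]), int(hits.iloc[0] - hits.iloc[1])

# Invented rows for illustration only.
toy = pd.DataFrame({
    "name": ["Darden", "Darden", "Darden", "Easley", "Bayes"],
    "path": ["classification/overview"] * 4 + ["sql/joins"],
})
result = lead_over_runner_up(toy, "classification/overview")
print(result)
```

Raw counts favor larger or longer-running cohorts, so dividing by each cohort's total hits may give a fairer comparison.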


## At some point in 2019, the ability for students and alumni to access both curriculums (web dev to ds, ds to web dev) should have been shut off. Do you see any evidence of that happening? Did it happen before?

#### Student Access After 2019

In [None]:
# create data frame for 2019 and after (assumes df has a DatetimeIndex)
students_2019n_after = df.loc['2019':]

In [None]:
# Separate web development students from Data Science students
web_after2019, data_after2019 = seperate_students_webdev_vs_datasci(students_2019n_after)

In [None]:
get_double_acess(data_after2019,['java','html','css','spring'])

In [None]:
get_double_acess(web_after2019,['classification','stats','excel'])

#### Student Access Before 2019

In [None]:
# create data frame for everything before 2019 (assumes df has a DatetimeIndex)
students_2019n_before = df.loc[:'2018']

In [None]:
# Separate web development students from Data Science students
web_before2019, data_before2019 = seperate_students_webdev_vs_datasci(students_2019n_before)

In [None]:
get_double_acess(data_before2019,['java','html','css','spring'])

In [None]:
get_double_acess(web_before2019,['classification','stats','excel'])

### Takeaway:
Access does not appear to have been completely shut off for Web Dev students, either before or after 2019.
There are instances of web development students accessing the data science curriculum, even after graduation.
For Data Science students, access to web development content was shut off after 2019.
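
The helper `get_double_acess` is not defined in this notebook, so its exact behavior is unknown. As a rough sketch of one way such a check could work, the hypothetical function `cross_access` below keeps rows whose `path` mentions any keyword and then counts matches per year. The toy rows and the DatetimeIndex are assumptions.

```python
import re
import pandas as pd

def cross_access(df, keywords):
    """Hypothetical stand-in: rows whose path contains any of the keywords."""
    pattern = "|".join(re.escape(k) for k in keywords)
    # na=False treats missing paths as non-matches instead of propagating NaN.
    mask = df["path"].str.contains(pattern, case=False, na=False)
    return df[mask]

# Invented rows for illustration only.
toy = pd.DataFrame(
    {"path": ["java-i/arrays", "classification/overview", "html-css/forms", None]},
    index=pd.to_datetime(["2018-06-01", "2019-03-01", "2019-09-15", "2019-10-01"]),
)
hits = cross_access(toy, ["java", "html", "css", "spring"])
per_year = hits.groupby(hits.index.year).size()
print(per_year)
```

Substring matching is coarse: the keyword `java` also matches `javascript-i` paths, so a stricter pattern may be needed if those should be kept apart. A count that drops to zero after some date would be the clearest evidence of a shutoff.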

## What topics are grads continuing to reference after graduation and into their jobs (for each program)?

In [None]:
# keep only requests made after the cohort's end date (df is indexed by request timestamp)
peeking_after = df[df.end_date < df.index]
peeking_after

In [None]:
peeking_after_php = peeking_after[peeking_after.program_name == 'full_stack_php']
peeking_after_java = peeking_after[peeking_after.program_name == 'full_stack_java']
peeking_after_datascience = peeking_after[peeking_after.program_name == 'data_science']
peeking_after_front_end = peeking_after[peeking_after.program_name == 'front_end']

In [None]:
peeking_after_php.path.value_counts().nlargest(10)

In [None]:
peeking_after_java.path.value_counts().nlargest(10)

In [None]:
peeking_after_datascience.path.value_counts().nlargest(10)

In [None]:
peeking_after_front_end.path.value_counts().nlargest(10)

### Takeaways
- Full Stack PHP graduates visit java and html-css topics more than any others, with javascript at the top of the list.
- Full Stack Java graduates show a similar pattern: the most-visited topics are java and html-css, with javascript-i the most referenced after graduation.
- Data Science graduates most often visit sql, classification, and anomaly detection, with classification among the top topics.
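
The post-graduation filter can be sketched on toy data as below. It assumes the log is indexed by request timestamp and has `end_date`, `program_name`, and `path` columns, as the cells above imply; all rows are invented.

```python
import pandas as pd

# Invented rows; the index is the request timestamp.
toy = pd.DataFrame(
    {
        "path": ["javascript-i", "spring", "javascript-i", "sql/joins"],
        "program_name": ["full_stack_java"] * 3 + ["data_science"],
        "end_date": pd.to_datetime(["2019-01-31"] * 3 + ["2019-06-30"]),
    },
    index=pd.to_datetime(["2019-03-01", "2019-01-15", "2019-04-02", "2019-07-04"]),
)

# Compare positionally: to_numpy() sidesteps any index alignment.
after_grad = toy[toy.index > toy["end_date"].to_numpy()]

# Most-requested path per program among post-graduation requests.
top_after = after_grad.groupby("program_name")["path"].agg(
    lambda s: s.value_counts().idxmax()
)
print(top_after)
```

If `end_date` is stored as strings, it would need `pd.to_datetime` first for the comparison to be meaningful.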
 

## Which lessons are least accessed?

In [None]:
# all data science cohorts into dataframe 
ds_cohorts = df.loc[df["program_name"] == "data_science"]
# aggregate data science page counts 
ds_cohorts.path.value_counts().nsmallest(10)

In [None]:
# all Front End cohorts into dataframe
front_end_cohorts = df.loc[df["program_name"] == "front_end"]
# aggregate Front End page counts
front_end_cohorts.path.value_counts().nsmallest(10)

In [None]:
# all Full Stack Java cohorts into dataframe
full_stack_java_cohorts = df.loc[df["program_name"] == "full_stack_java"]
# aggregate Full Stack Java page counts
full_stack_java_cohorts.path.value_counts().nsmallest(10)

In [None]:
# all Full Stack PHP cohorts into dataframe
full_stack_php_cohorts = df.loc[df["program_name"] == "full_stack_php"]
# aggregate Full Stack PHP page counts
full_stack_php_cohorts.path.value_counts().nsmallest(10)

### Takeaways 
**Least accessed lessons**
- For Full Stack program - java scri
- Full Stack PHP Program - css
- Data Science Program - case statements
- Front End Program - content/html-css/gitbook/images/favicon.ico
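
The Front End result above is a `favicon.ico` request, which is a static asset rather than a lesson. One possible refinement, sketched here under the assumption that such assets can be recognized by file extension, is to drop them before counting. The helper name `least_accessed_lessons`, the extension list, and the toy rows are all hypothetical.

```python
import pandas as pd

# Assumed set of asset extensions; adjust to whatever appears in the real log.
ASSET_PATTERN = r"\.(?:ico|png|jpe?g|gif|svg)$"

def least_accessed_lessons(df, n=10):
    """Hypothetical helper: least-requested paths after removing static assets."""
    paths = df["path"].dropna()
    is_asset = paths.str.contains(ASSET_PATTERN, case=False)
    return paths[~is_asset].value_counts().nsmallest(n)

# Invented rows for illustration only.
toy = pd.DataFrame({
    "path": [
        "content/html-css/gitbook/images/favicon.ico",
        "sql/case-statements",
        "sql/joins",
        "sql/joins",
    ]
})
result = least_accessed_lessons(toy)
print(result)
```

Many paths are likely to tie at a count of one, so the ten rows returned by `nsmallest` may be an arbitrary slice of a longer tail.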