# Hi, this is a short tutorial for RISE

## Housekeeping

Dependencies for how I set up this notebook:
  1. Install RISE for this presentation mode: `conda install -c conda-forge rise`
  2. Install Altair for data visualization: `conda install -c conda-forge altair vega_datasets notebook vega`
  3. Install the Jupyter nbextensions configurator to add RISE themes (among other things): `conda install -c conda-forge jupyter_nbextensions_configurator`
  4. Install the HTML parsing library: `pip install requests-html`

In [2]:
# Fetch the HTML of the PyBay schedule page once and keep the response in memory.
# Note: this cell always goes to the network; it does not check for a cached copy first.
from requests_html import HTMLSession

session = HTMLSession()
pybay_schedule = session.get('https://pybay.com/schedule/')
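
The "fetch once, then reuse a stored copy" idea can be sketched with a small helper. This is only a minimal sketch, not what the notebook runs: `load_or_fetch` and the cache file name are hypothetical, and the helper caches whatever picklable object the `fetch` callable returns (for example the page text rather than the response object).

```python
import os
import pickle


def load_or_fetch(cache_path, fetch):
    """Return the pickled object at cache_path, or call fetch() and pickle its result."""
    if os.path.exists(cache_path):
        # A cached copy exists: load it instead of hitting the network again.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # No cached copy yet: fetch once and store the result for next time.
    result = fetch()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

In this notebook it might be called as `load_or_fetch("pybay_schedule.pkl", lambda: session.get("https://pybay.com/schedule/").text)`. The stored text would then need to be parsed again, for instance with `requests_html.HTML(html=...)`, before the `find` calls in the later cells would work.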


In [3]:
# Define talk categories
talk_cats = dict()
for cat in pybay_schedule.html.find(".sch-filter-list", first=True).find('li'):
    if not cat.attrs["data-filter"].strip():
        continue
    talk_cats[cat.text.strip()] = cat.attrs["data-filter"]

talk_cats

{'Python & Libraries': 'python',
 'DevOps, Testing, & Automation': 'devops',
 'People & Project Management': 'community',
 'Scale & Performance': 'speed',
 'ML, AI, & Data': 'ai',
 'Web, IoT, & Hardware': 'web',
 'Beginner-friendly': 'level-1'}

In [4]:
# Parse schedule for talk information

import collections
talk_details = collections.namedtuple("TalkDetails", ["time", "title", "speaker", "categories", "description"])
talks = list()

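def get_categories(cat_text: str) -> list:
    """
    Map the category text shown under a talk to a list of filter keys.

    The text is wrapped in parentheses, with one category per line and a
    trailing comma on every line except the last.

    Ex1: (ML, AI, & Data)
    Ex2: (DevOps, Testing, & Automation,
         Web, IoT, & Hardware)
    """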
    cats = cat_text.split("\n")
    cats = [cat.replace("(", "").replace(")", "") for cat in cats]
    cats = [cat[:-1] if cat.endswith(",") else cat for cat in cats]
    return [talk_cats[cat] for cat in cats]

for day in pybay_schedule.html.find(".sch-day"):
    if day.find(".sch-day-title", first=True).text not in {"Aug. 17, 2019", "Aug. 18, 2019"}:
        continue
    for timeslot in day.find(".sch-timeslots", first=True).find(".sch-timeslot"):
        if "sch-timeslot-special" in timeslot.attrs["class"]:
            continue  # skip things like check-in, breakfast, lunch, etc.
        time = timeslot.find(".sch-timeslot-time", first=True).text
        for talk in timeslot.find(".sch-timeslot-slots", first=True).find(".sch-timeslot-slot"):
            details = talk.find(".sch-slot-inner", first=True)
            title = details.find("h4 > a", first=True).text
            speaker = details.find("p.sch-speaker", first=True).text
            try:
                categories = get_categories(details.find("p.sch-category > span.small", first=True).text)
            except AttributeError:
                categories = []
            try:
                description = talk.find("div.sch-description > p", first = True).text
            except AttributeError:
                description = ""
            talks.append(talk_details(time, title, speaker, categories, description))

talks[:3]

[TalkDetails(time='9:40 a.m.', title='As We May Program', speaker='Peter Norvig', categories=['ai'], description='Innovations in machine learning are changing our perception of what is possible to do with a computer. But how will machine learning change the way we program, the tools we use, and the mix of tasks done by expert programmers, novice programmers, and non-programmers? This talk examines some possible futures.'),
 TalkDetails(time='10:15 a.m.', title="PEP 581 and PEP 588: Migrating CPython's Issue Tracker", speaker='Mariatta .', categories=['python'], description="In 2017, CPython codebase was moved to GitHub from Mercurial, an effort that took more than three years of planning and lots of volunteer coordination. The move proved to be successful and well-appreciated. New contributors face less barriers when contributing to Python. Core developers are benefiting from personal assistants in the form of GitHub bots and automations. Can the workflow be even better? In this talk, 
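
To make the category parsing concrete, here is a standalone run of `get_categories` on the two docstring examples. The `talk_cats` dictionary below is a hand-copied subset of the mapping printed earlier, so the snippet works without fetching the schedule page.

```python
# Subset of the talk_cats mapping shown earlier in this notebook.
talk_cats = {
    "ML, AI, & Data": "ai",
    "DevOps, Testing, & Automation": "devops",
    "Web, IoT, & Hardware": "web",
}


def get_categories(cat_text: str) -> list:
    # One category per line, wrapped in parentheses, comma at the end of each line but the last.
    cats = cat_text.split("\n")
    cats = [cat.replace("(", "").replace(")", "") for cat in cats]
    cats = [cat[:-1] if cat.endswith(",") else cat for cat in cats]
    return [talk_cats[cat] for cat in cats]


single = get_categories("(ML, AI, & Data)")
multiple = get_categories("(DevOps, Testing, & Automation,\nWeb, IoT, & Hardware)")
print(single)    # ['ai']
print(multiple)  # ['devops', 'web']
```

Note that a category label missing from `talk_cats` raises a `KeyError`, which the `except AttributeError` in the parsing loop would not catch.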

In [23]:
import pandas as pd

df = pd.DataFrame(data=talks)
df

| | time | title | speaker | categories | description |
|---|---|---|---|---|---|
| 0 | 9:40 a.m. | As We May Program | Peter Norvig | [ai] | Innovations in machine learning are changing o... |
| 1 | 10:15 a.m. | PEP 581 and PEP 588: Migrating CPython's Issue... | Mariatta . | [python] | In 2017, CPython codebase was moved to GitHub ... |
| 2 | 11:05 a.m. | What's Coming in 3.8? Assignment Expressions &... | Adam Forsyth | [python] | Curious what's coming next for Python? Well, P... |
| 3 | 11:05 a.m. | Unclogging a VFX Production Pipeline with Anal... | Bridgette Powell | [devops, web] | Without a unified analytics solution, it has b... |
| 4 | 11:05 a.m. | Koalas: Easy Transition from pandas to Apache ... | Xiao Li | [ai] | In this talk, we present Koalas, a new open so... |
| 5 | 11:05 a.m. | Migrating from REST to GraphQL under Django | Manish Sinha | [python, web] | GraphQL has become the de facto successor to R... |
| 6 | 11:45 a.m. | Python Steering Council Panel | Paul Everitt\nŁukasz Langa\nBarry Warsaw\nBenj... | [python] | Elected as prescribed in PEP 8016, the Python ... |
| 7 | 11:45 a.m. | Full Stack Web with Nothing but Python: How An... | Meredydd Luff | [python, web] | Programming for the Web requires 5 languages a... |
| 8 | 11:45 a.m. | Pushing the limits of Python: ML infra at Netflix | Ville Tuulos\nRavi Kiran Chirravuri\nSavin Goyal | [ai, python, speed, devops] | We will share our experiences on building Meta... |
| 9 | 1:30 p.m. | Real-Time Bidding Models in Computational Adve... | Allie . | [ai] | The talk provides an overview of the ad tech e... |


In [25]:
talks_attended = {
    "As We May Program",
    "PEP 581 and PEP 588: Migrating CPython's Issue Tracker",
    "Koalas: Easy Transition from pandas to Apache Spark",
    "Pushing the limits of Python: ML infra at Netflix",
    "An Intro to Load Testing with Locust and Python",
    "Dependency Injection, Quickly",
    "Python and R for Advanced Analytics",
    "Understanding Python’s Debugging Internals",
    "How to Write Pytest Plugins",
    "(Deep) Learn You a Neural Net for Great Good!",
    "Effective Visual Representations using Python",
    "Deploy Deep Learning models as Microservices in minutes",
    "Patterns for Clean API Design",
    "Understanding Concurrency in Python!",
    "Writing good python APIs with autosig",
    "Why you should be using structured logs",
    "Ask the Ecosystem: Lessons from 200+ FOSS Applications",
}
df["attended"] = df["title"].isin(talks_attended)
# The number of flagged rows should equal the number of attended titles,
# i.e. a match was found for each title.
assert df["attended"].sum() == len(talks_attended)
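
When that assertion fails, it does not say which title is at fault. A set difference can point at the culprit. This is a rough sketch: `unmatched_titles` is a hypothetical helper, and the sample titles below are made up for illustration (the second one contains a deliberate typo).

```python
def unmatched_titles(attended, scheduled):
    """Return the attended titles that do not appear among the scheduled titles."""
    return sorted(set(attended) - set(scheduled))


scheduled = ["As We May Program", "Python Steering Council Panel"]
attended = {"As We May Program", "As We May Progam"}  # second title has a deliberate typo
missing = unmatched_titles(attended, scheduled)
print(missing)  # ['As We May Progam']
```

In this notebook the call would presumably be `unmatched_titles(talks_attended, df["title"])`. Typographic differences such as a curly versus a straight apostrophe are a likely cause of mismatches.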

In [36]:
# Save for when website eventually changes
import pickle
import os

if os.path.exists("pb_data.df"):
    print("Pickled dataframe already exists, skipping.")
else:
    print("Pickling data frame.")
    with open("pb_data.df", "wb") as f:
        pickle.dump(df, f)

Pickled dataframe already exists, skipping.


In [None]:
attendance_categories = collections.namedtuple("AttendanceCategories", ["count", "category", "disposition"])
rows = list()

# Count categories separately for attended and unattended talks.
# A talk with several categories contributes one count to each of them.
# (A "total" disposition could be added the same way by counting over all of df.)
for disposition, mask in (("attended", df["attended"]), ("unattended", ~df["attended"])):
    flattened = [cat for cats in df[mask]["categories"] for cat in cats]
    for category, count in collections.Counter(flattened).items():
        rows.append(attendance_categories(count, category, disposition))

source = pd.DataFrame(data=rows)
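
The flatten-then-count step can be tried on its own without pandas. The sample records below are hypothetical (categories, attended) pairs, not the real schedule data, and `count_categories` is an illustrative helper name.

```python
import collections

# Hypothetical sample: each record is (categories, attended).
sample = [
    (["ai"], True),
    (["python"], True),
    (["devops", "web"], False),
    (["ai", "python"], False),
]


def count_categories(records, attended):
    """Count how often each category appears among records with the given attended flag."""
    flattened = [
        cat
        for cats, was_attended in records
        if was_attended == attended
        for cat in cats
    ]
    return collections.Counter(flattened)


attended_count = count_categories(sample, True)
unattended_count = count_categories(sample, False)
print(attended_count)
print(unattended_count)
```

Because a talk can carry several categories, the counts sum to more than the number of talks. That is worth keeping in mind when reading a stacked bar chart built from them.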

In [None]:
# https://github.com/ft-interactive/chart-doctor/tree/master/visual-vocabulary

import altair as alt

alt.Chart(source).mark_bar().encode(
    x='category',
    y='count',
    color='disposition',
    order=alt.Order('disposition', sort='descending'),
)