# Important Dates


###### Recall: The intent classification categories
* Important Dates
* Course
* Professor
* Location (stretch goal)

## Subclasses of Important Dates
Important Dates has many subclasses. The subclasses we have identified so far are:
 
* faculty report
* curriculum study
* open residence halls
* holiday
* convocation
* instruction begin
* add date: 1) without permission, 2) with permission
* withdraw date
* drop date
* semester start
* semester end
* break
* finals
* registration
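The label constants used later in this notebook (`FACULTY_REPORT`, `CURRICULUM_STUDY`, `OPEN_RESIDENCE_HALLS`) appear to be the subclass names upper-cased with underscores. The dictionary keys (`faculty_report`, `semester_start`) use lower-case snake case. Below is a minimal sketch that derives both from a subclass name, assuming that convention holds for every subclass. `to_key` and `to_label` are hypothetical helpers, not part of the notebook.

```python
import re


def to_key(subclass):
    """Hypothetical helper: 'open residence halls' -> 'open_residence_halls'."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", subclass).strip("_").lower()


def to_label(subclass):
    """Hypothetical helper: 'open residence halls' -> 'OPEN_RESIDENCE_HALLS'."""
    return to_key(subclass).upper()


for name in ["faculty report", "curriculum study", "open residence halls", "semester start"]:
    print(to_key(name), to_label(name))
```

Subclasses with sub-cases, such as the add date with and without permission, would still need their own naming decision.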



| Event | Date |
|---|---|
| Faculty Report | Thursday, January 16 |
| Curriculum Study & Improvement of Instruction | Thursday – Friday, January 16 – 17 |
| New Student Orientation/Registration | Friday, January 17 |
| Residence Halls Open | Sunday, January 19 |
| Martin Luther King Holiday | Monday, January 20 |
| Spring Convocation | Tuesday, January 21 |
| Instruction Begins | Wednesday, January 22 |
| Last Day to Add a Course without Instructor’s Permission | Wednesday, January 23 |
| Late Registration (late fee applies) | Friday, January 24 |
| Deadline for Filing Degree Application (Students meeting requirements at end of spring) | Friday, January 31 |
| Last Day to Add a Course (Instructor’s Permission Required) | Friday, January 31 |
| Last Day to Drop Course without “W” (refund) | Friday, February 7 |
| Window for early performance grades | Friday – Tuesday, February 28 – March 3 |
| Spring Break (no classes scheduled) | Monday – Friday, March 16 – 27 |
| Summer and Fall registration begins | Register for Classes |
| Spring Holiday (no classes scheduled) | Friday, April 10 |
| Last Day to Drop Course with “W” (no refund) | Friday, April 17 |
| Last Day to Withdraw from the University (4:59 p.m.) | Friday, May 8 |
| Exam Week | Monday – Friday, May 11 – 15 |
| Last Day of Classes | Friday, May 15 |
| Commencement | Friday, May 15 and Saturday, May 16 |
| Campus Housing Closes | Saturday, May 16 |
| Faculty Deadline to Submit Final Grades (by 5:00 p.m.) | |
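Once a question is classified into a subclass, answering it comes down to looking up the matching calendar row. Below is a minimal sketch, assuming the rows are stored as tab-separated "event, date" lines. `SUBCLASS_TO_EVENT`, `parse_calendar` and `answer` are hypothetical names, and only a few rows are copied in.

```python
# A few rows copied from the calendar, one "event<TAB>date" per line.
CALENDAR_TEXT = """\
* Faculty Report\tThursday, January 16
* Spring Convocation\tTuesday, January 21
* Instruction Begins\tWednesday, January 22
* Summer and Fall registration begins\tRegister for Classes
* Faculty Deadline to Submit Final Grades (by 5:00 p.m.)
"""

# Hypothetical mapping from intent subclass keys to calendar event names.
SUBCLASS_TO_EVENT = {
    "faculty_report": "Faculty Report",
    "convocation": "Spring Convocation",
    "instruction_begin": "Instruction Begins",
}


def parse_calendar(text):
    """Return {event name: date string}; rows without a date column are skipped."""
    events = {}
    for line in text.splitlines():
        row = line.strip().lstrip("*").strip()  # drop the list bullet
        if "\t" not in row:
            continue  # e.g. the final-grades deadline has no date in the source
        event, date = row.split("\t", 1)
        events[event.strip()] = date.strip()
    return events


def answer(subclass, events):
    """Return the date string for a subclass, or None if it is unmapped."""
    event = SUBCLASS_TO_EVENT.get(subclass)
    return events.get(event) if event else None


events = parse_calendar(CALENDAR_TEXT)
print(answer("convocation", events))  # Tuesday, January 21
```

The date column is kept as free text because some entries, like "Register for Classes", are not dates at all.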

In [18]:
#!/usr/bin/env python
# coding: utf8
"""Example of training an additional entity type
This script shows how to add a new entity type to an existing pretrained NER
model. To keep the example short and simple, only a handful of sentences are
provided per entity type. In practice, you'll need many more; a few hundred would be a
good start. You will also likely need to mix in examples of other entity
types, which might be obtained by running the entity recognizer over unlabelled
sentences, and adding their annotations to the training set.
The actual training is performed by looping over the examples and calling
`nlp.update()`. The `update()` method steps through the words of the
input. At each word, it makes a prediction. It then consults the annotations
provided on the GoldParse instance, to see whether it was right. If it was
wrong, it adjusts its weights so that the correct action will score higher
next time.
After training your model, you can save it to a directory. We recommend
wrapping models as Python packages, for ease of deployment.
For more details, see the documentation:
* Training: https://spacy.io/usage/training
* NER: https://spacy.io/usage/linguistic-features#named-entities
Compatible with: spaCy v2.1.0+
Last tested with: v2.1.0
"""
from __future__ import unicode_literals, print_function

import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding
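The docstring above describes the update loop in words: predict, compare against the annotations, and adjust the weights so the correct action scores higher next time. spaCy's NER is a neural transition-based model, so the following is only a toy illustration of that idea. It is a perceptron that tags each token as part of an entity (`REPORT`) or not (`O`). The feature set and tag names are hypothetical.

```python
from collections import defaultdict

TAGS = ("O", "REPORT")  # hypothetical tag set: outside vs. inside an entity


def features(tokens, i):
    """Hypothetical features: the lower-cased word and its left neighbour."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["w=" + tokens[i].lower(), "prev=" + prev]


class ToyTagger:
    def __init__(self, tags=TAGS):
        self.tags = tags
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def best_tag(self, feats):
        # max() keeps the first tag on ties, so an untrained tagger predicts "O".
        return max(self.tags, key=lambda t: sum(self.weights[(f, t)] for f in feats))

    def predict(self, tokens):
        return [self.best_tag(features(tokens, i)) for i in range(len(tokens))]

    def update(self, tokens, gold_tags):
        """One perceptron pass: only wrong predictions change the weights."""
        for i, gold in enumerate(gold_tags):
            feats = features(tokens, i)
            guess = self.best_tag(feats)
            if guess != gold:
                for f in feats:
                    self.weights[(f, gold)] += 1.0   # correct tag scores higher next time
                    self.weights[(f, guess)] -= 1.0  # wrong guess scores lower


TRAIN = [
    ("When is the faculty report due".split(), ["O", "O", "O", "REPORT", "REPORT", "O"]),
    ("When is the spring report due".split(), ["O", "O", "O", "REPORT", "REPORT", "O"]),
]

tagger = ToyTagger()
for _ in range(3):
    for tokens, gold in TRAIN:
        tagger.update(tokens, gold)

print(tagger.predict("When is the annual report due".split()))
# ['O', 'O', 'O', 'REPORT', 'REPORT', 'O']
```

The unseen word "annual" gets tagged because the weight for "follows *the*" was pushed toward `REPORT` while the tagger corrected its mistake on "faculty".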

## Preliminary

We use a dictionary, `important_dates`, where each key is an intent-classification subcategory and each value is a list of questions.

The dictionary is accessed as `important_dates['faculty_report']` or `important_dates['semester_start']`.
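Below is a minimal sketch of filling `important_dates` from the training sentences. It assumes each training set is a list of `(text, annotations)` pairs, as in this notebook. The sample data is trimmed, and the `semester_start` question is a hypothetical placeholder.

```python
important_dates = {}

# Trimmed stand-in for FACULTY_REPORT_TRAIN_DATA: (text, annotations) pairs.
faculty_report_sample = [
    ("When is faculty report?", {"entities": [(8, 22, "FACULTY_REPORT")]}),
    ("When is the fall report due?", {"entities": [(12, 23, "FACULTY_REPORT")]}),
]

# Keep only the question text for each subcategory.
important_dates["faculty_report"] = [text for text, _ in faculty_report_sample]

# setdefault creates the list the first time a subcategory is seen.
important_dates.setdefault("semester_start", []).append("When does the semester start?")

print(important_dates["faculty_report"][1])  # When is the fall report due?
```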

In [19]:
important_dates = {}

In [20]:
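def get_substring_label_truple(intent, label_indicator, label):
    """Return a (start, end, label) entity triple for the first occurrence
    of `label_indicator` in `intent`. The match is case-sensitive."""
    start = intent.find(label_indicator)
    if start == -1:
        # str.find returns -1 on a miss, which would silently produce a bad span
        raise ValueError("%r not found in %r" % (label_indicator, intent))
    end = start + len(label_indicator)
    return (start, end, label)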
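`str.find` is case-sensitive and returns -1 when the phrase is missing, so a typo in a label indicator can produce a bad span. Below is a minimal sketch of a check to run over any `*_TRAIN_DATA` list before training. `check_entities` is a hypothetical helper. It only checks offsets, not whether spaCy's tokenizer agrees with them.

```python
def check_entities(train_data):
    """Return a list of human-readable problems found in (text, annotations) pairs."""
    problems = []
    for text, annotations in train_data:
        spans = sorted(annotations.get("entities", []))
        for start, end, label in spans:
            if not (0 <= start < end <= len(text)):
                problems.append("%r: span (%d, %d) is out of range" % (text, start, end))
        # After sorting by start offset, flag neighbouring spans that overlap.
        for (s1, e1, _), (s2, e2, _) in zip(spans, spans[1:]):
            if s2 < e1:
                problems.append("%r: spans (%d, %d) and (%d, %d) overlap" % (text, s1, e1, s2, e2))
    return problems


good = [("When is the report due?", {"entities": [(12, 18, "FACULTY_REPORT")]})]
# What a plain str.find-based offset gives when "Report" (capital R) is missing: (-1, 5).
bad = [("When is the report due?", {"entities": [(-1, 5, "FACULTY_REPORT")]})]

print(check_entities(good))  # []
print(check_entities(bad))
```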

## Faculty Report
All faculty members who hold full-time appointments should be prepared to report on their teaching, research/professional development and scholarly activity, and service activities for the academic year (fall, spring and summer).

In [21]:
FACULTY_REPORT_LABEL = "FACULTY_REPORT"
FACULTY_REPORT_TRAIN_DATA = [
    # faculty_report
    (
        "When is faculty report?",
        {"entities": [
            get_substring_label_truple(
                "When is faculty report",
                "faculty report",
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    
    (
        "When do I need to turn in the faculty report?",
        {"entities": [
            get_substring_label_truple(
                "When do I need to turn in the faculty report?", 
                "faculty report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the faculty report due?", # START COUNTING HERE
        {"entities": [
            get_substring_label_truple(
                "When is the faculty report due?", 
                "faculty report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "Spring report is due when?",
        {"entities": [
            get_substring_label_truple(
                "Spring report is due when?", 
                "Spring report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When do I need to turn in the spring report?",
        {"entities": [
            get_substring_label_truple(
                "When do I need to turn in the spring report?", 
                "spring report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the spring report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the spring report due?", 
                "spring report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the spring report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the spring report due?", 
                "spring report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When do I need to have the fall report done by?",
        {"entities": [
            get_substring_label_truple(
                "When do I need to have the fall report done by?", 
                "fall report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the fall report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the fall report due?", 
                "fall report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the annual report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the annual report due?", 
                "annual report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the research professor report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the research professor report due?", 
                "research professor report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "When is the report due?",
        {"entities": [
            get_substring_label_truple(
                "When is the report due?", 
                "report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
    (
        "Am I supposed to turn a report in before the semester starts?",
        {"entities": [
            get_substring_label_truple(
                "Am I supposed to turn a report in before the semester starts?", 
                "report", 
                FACULTY_REPORT_LABEL
            )]
        },
    ),
] # End faculty report training data
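Each entry above repeats the sentence twice: once as the training text and once as the argument to `get_substring_label_truple`. The first entry passes a slightly different string (without the question mark). That only works because the phrase comes before the point where the two strings differ. Below is a minimal sketch of a hypothetical `make_example` helper that takes the sentence once.

```python
FACULTY_REPORT_LABEL = "FACULTY_REPORT"


def make_example(text, phrase, label):
    """Hypothetical helper: build one (text, {"entities": [...]}) training pair."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError("%r not found in %r" % (phrase, text))
    return (text, {"entities": [(start, start + len(phrase), label)]})


# (sentence, phrase to tag) pairs taken from the list above.
PAIRS = [
    ("When is faculty report?", "faculty report"),
    ("When is the spring report due?", "spring report"),
    ("When is the annual report due?", "annual report"),
]
COMPACT_TRAIN_DATA = [make_example(text, phrase, FACULTY_REPORT_LABEL) for text, phrase in PAIRS]
print(COMPACT_TRAIN_DATA[0])
```

Because the offsets are always computed on the exact training text, this version cannot drift out of sync with the sentence.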

In [26]:
@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    new_model_name=("New model name for model meta.", "option", "nm", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def chunk_faculty_report(model=None, new_model_name="class", output_dir=None, n_iter=30):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(FACULTY_REPORT_LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    move_names = list(ner.move_names)
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train NER
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
            random.shuffle(FACULTY_REPORT_TRAIN_DATA)
            batches = minibatch(FACULTY_REPORT_TRAIN_DATA, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "I am not sure when to turn in the report."
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)
        

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta["name"] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        # Check the classes have loaded back consistently
        assert nlp2.get_pipe("ner").move_names == move_names
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)
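`compounding(1.0, 4.0, 1.001)` yields batch sizes that start at 1.0, grow by a factor of 1.001 on every draw, and are capped at 4.0. `minibatch` draws one size per batch and truncates it to an int. The `sizes` generator is created once, outside the epoch loop, so its growth carries over between epochs. The code below is a pure-Python re-creation written to mimic `spacy.util.compounding` and `spacy.util.minibatch` from spaCy v2. It is an approximation, not the library code, and is used here to check how large the batches actually get.

```python
import itertools


def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""
    def clip(value):
        return max(value, stop) if start > stop else min(value, stop)
    current = float(start)
    while True:
        yield clip(current)
        current *= compound


def minibatch(items, size):
    """Split items into lists whose lengths are drawn from the `size` generator."""
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(size))))
        if not batch:
            break  # the draw that produced this empty batch is still consumed
        yield batch


def steps_until_batch_size(target, start=1.0, stop=4.0, compound=1.001):
    """Number of draws before int(size) first reaches target (target must be <= stop)."""
    for step, value in enumerate(compounding(start, stop, compound)):
        if int(value) >= target:
            return step


sizes = compounding(1.0, 4.0, 1.001)
first_epoch = list(minibatch(range(13), size=sizes))  # 13 = len(FACULTY_REPORT_TRAIN_DATA)
print([len(batch) for batch in first_epoch])
print(steps_until_batch_size(2))  # 694
```

With 13 faculty-report sentences and 30 epochs, the loop draws about 30 × 14 = 420 sizes (one extra per epoch for the final empty batch). That is short of 694, so every update in these runs uses a single example. A larger `compound` value would be needed for batching to take effect.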

In [27]:
chunk_faculty_report()

Created blank 'en' model
Losses {'ner': 57.44737569242716}
Losses {'ner': 21.380779725465345}
Losses {'ner': 24.686548477839096}
Losses {'ner': 15.359547726478603}
Losses {'ner': 11.90528046012078}
Losses {'ner': 12.093196404775583}
Losses {'ner': 5.027768765226274}
Losses {'ner': 6.63662834127582}
Losses {'ner': 6.158685983240764}
Losses {'ner': 3.4318715913470554}
Losses {'ner': 5.4013453625188275}
Losses {'ner': 4.464769103477102}
Losses {'ner': 1.5280589959734756}
Losses {'ner': 4.633437349654885}
Losses {'ner': 2.7440314201125267}
Losses {'ner': 2.572962813050749}
Losses {'ner': 5.6437718621161865}
Losses {'ner': 2.409827271940434}
Losses {'ner': 6.074454421428744}
Losses {'ner': 2.0795373516189337}
Losses {'ner': 3.721942286467627}
Losses {'ner': 1.5265645845147702}
Losses {'ner': 1.4306296637909641}
Losses {'ner': 0.49207150970245356}
Losses {'ner': 1.345908527984337}
Losses {'ner': 0.0061549220243075665}
Losses {'ner': 0.05117445243372269}
Losses {'ner': 8.494539895178913e-05}
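These losses are measured on the same 13 sentences the model is trained on. A value near zero mostly shows that the model has fit those sentences, not that it will recognize new phrasings. Below is a minimal sketch of holding back a few examples for evaluation. `split_train_dev` and its 20% default are hypothetical choices.

```python
import random


def split_train_dev(examples, dev_fraction=0.2, seed=0):
    """Return (train, dev) lists without mutating the input list."""
    shuffled = list(examples)              # copy, unlike random.shuffle on the global list
    random.Random(seed).shuffle(shuffled)  # private RNG so the split is reproducible
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


examples = ["example %d" % i for i in range(13)]  # stand-in for FACULTY_REPORT_TRAIN_DATA
train, dev = split_train_dev(examples)
print(len(train), len(dev))  # 11 2
```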


## Curriculum Study
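Curriculum study ("Curriculum Study & Improvement of Instruction" on the calendar) is a training period held Thursday – Friday, January 16 – 17.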

In [28]:
CURRICULUM_STUDY_LABEL = "CURRICULUM_STUDY"

CURRICULUM_STUDY_TRAIN_DATA = [
    # curriculum_study
    (
        "When is curriculum study?",
        {"entities": [
            get_substring_label_truple(
                "When is curriculum study",
                "curriculum study",
                CURRICULUM_STUDY_LABEL
            )]
        },
    ),
    
    (
        "When do I need to go to the curriculum study",
        {"entities": [
            get_substring_label_truple(
                "When do I need to go to the curriculum study", 
                "curriculum study",
                CURRICULUM_STUDY_LABEL
            )]
        },
    ),
    (
        "Is there a curriculum study?", # START COUNTING HERE
        {"entities": [
            get_substring_label_truple(
                "Is there a curriculum study?", 
                "curriculum study",
                CURRICULUM_STUDY_LABEL
            )]
        },
    ),
    # TODO: ask other people how they would phrase this question
] # End curriculum study training data
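The curriculum study list has only three sentences. One cheap but limited way to get more examples is to combine question templates with the different ways people name the event. Below is a minimal sketch. The templates and the second phrase are hypothetical, and templated sentences are no substitute for real questions collected from users.

```python
from itertools import product

CURRICULUM_STUDY_LABEL = "CURRICULUM_STUDY"

# Hypothetical templates; "{}" marks where the entity phrase goes.
TEMPLATES = ["When is {}?", "Is there a {}?", "What day does {} start?"]
PHRASES = ["curriculum study", "curriculum study day"]


def expand(templates, phrases, label):
    """Build (text, {"entities": [...]}) pairs for every template/phrase combination."""
    data = []
    for template, phrase in product(templates, phrases):
        start = template.index("{}")  # the text before the slot is unchanged by format()
        text = template.format(phrase)
        data.append((text, {"entities": [(start, start + len(phrase), label)]}))
    return data


GENERATED = expand(TEMPLATES, PHRASES, CURRICULUM_STUDY_LABEL)
print(len(GENERATED))  # 6
print(GENERATED[0])
```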

In [32]:
@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    new_model_name=("New model name for model meta.", "option", "nm", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def chunk_curriculum_study(model=None, new_model_name="class", output_dir=None, n_iter=30):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(CURRICULUM_STUDY_LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    move_names = list(ner.move_names)
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train NER
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
            random.shuffle(CURRICULUM_STUDY_TRAIN_DATA)
            batches = minibatch(CURRICULUM_STUDY_TRAIN_DATA, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "I am not sure when curriculum study begins"
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)
        

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta["name"] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        # Check the classes have loaded back consistently
        assert nlp2.get_pipe("ner").move_names == move_names
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)

In [34]:
chunk_curriculum_study()

Created blank 'en' model
Losses {'ner': 16.726959705352783}
Losses {'ner': 14.039365828037262}
Losses {'ner': 10.436773419380188}
Losses {'ner': 6.903837885707617}
Losses {'ner': 5.14280459046131}
Losses {'ner': 5.1632564859464765}
Losses {'ner': 5.085993531538406}
Losses {'ner': 3.3885275346110575}
Losses {'ner': 6.305062769795768}
Losses {'ner': 5.294241605268326}
Losses {'ner': 4.031339164328529}
Losses {'ner': 3.581975239823805}
Losses {'ner': 2.646145839372366}
Losses {'ner': 0.5817864856813912}
Losses {'ner': 0.1343801356530605}
Losses {'ner': 0.03056787937510297}
Losses {'ner': 0.0022400086663973973}
Losses {'ner': 4.006940684432147e-05}
Losses {'ner': 3.828989362928206e-05}
Losses {'ner': 0.008731762514626222}
Losses {'ner': 2.340231883719866e-06}
Losses {'ner': 1.4564898146539806e-07}
Losses {'ner': 3.355264515701147e-08}
Losses {'ner': 6.471753613171433e-07}
Losses {'ner': 5.3499300898586024e-08}
Losses {'ner': 4.20642567951443e-09}
Losses {'ner': 1.4655555972390347e-07}
Loss

Note that the model tags "not sure" in the test sentence as an entity. More training data is probably needed.
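One way to track this kind of error is to score predicted entity spans against hand-labelled ones. Below is a minimal sketch of exact-match span precision, recall and F1. The predicted spans are hypothetical and illustrate a model that tags both "not sure" and "curriculum study".

```python
def span_scores(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) triples."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / float(len(predicted)) if predicted else 0.0
    recall = true_pos / float(len(gold)) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


text = "I am not sure when curriculum study begins"


def span(phrase, label="CURRICULUM_STUDY"):
    """(start, end, label) for the first occurrence of phrase in text."""
    start = text.find(phrase)
    return (start, start + len(phrase), label)


gold = [span("curriculum study")]
predicted = [span("not sure"), span("curriculum study")]  # hypothetical model output
print(span_scores(gold, predicted))
```

A spurious "not sure" entity halves precision while recall stays at 1.0, which is the pattern to watch for as more data is added.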

## Open Residence Halls
Students may want to know when they can begin moving into the residence halls.

In [35]:
OPEN_RESIDENCE_HALLS_LABEL = "OPEN_RESIDENCE_HALLS"
OPEN_RESIDENCE_HALLS_TRAIN_DATA = [
    (
        "When do residence halls open up?",
        {"entities": [
            get_substring_label_truple(
                "When do residence halls open up?",
                "residence halls",
                OPEN_RESIDENCE_HALLS_LABEL
            )]
        },
    ),
    
    (
        "When can I move in?",
        {"entities": [
            get_substring_label_truple(
                "When can I move in?", 
                "move in", 
                OPEN_RESIDENCE_HALLS_LABEL
            )]
        },
    ),
    (
        "When do the dorms open up?",
        {"entities": [
            get_substring_label_truple(
                "When do the dorms open up?", 
                "dorms open", 
                OPEN_RESIDENCE_HALLS_LABEL
            )]
        },
    ),
    (
        "Are the dorms open yet?",
        {"entities": [
            get_substring_label_truple(
                "Are the dorms open yet?", 
                "dorms open", 
                OPEN_RESIDENCE_HALLS_LABEL
            )]
        },
    ),
    (
        "Am I allowed to move in yet?",
        {"entities": [
            get_substring_label_truple(
                "Am I allowed to move in yet?", 
                "move in", 
                OPEN_RESIDENCE_HALLS_LABEL
            )]
        },
    ),

] # End open residence halls training data
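The test sentence used below, "When am I allowed to move into the dorms?", contains "move into" rather than "move in". If the same `find`-based approach were used to label "move in" in that sentence, the match would end in the middle of "into". spaCy expects entity offsets to fall on token boundaries. Below is a minimal sketch of a boundary check. The regex tokenizer is a rough stand-in assumed for illustration and does not reproduce spaCy's tokenizer.

```python
import re

# Rough tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def span_aligns(text, start, end):
    """True if [start, end) begins and ends on token boundaries."""
    starts, ends = set(), set()
    for match in TOKEN_RE.finditer(text):
        starts.add(match.start())
        ends.add(match.end())
    return start in starts and end in ends


def phrase_span(text, phrase):
    """(start, end) of the first occurrence of phrase, like the notebook's helper."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError("%r not found in %r" % (phrase, text))
    return start, start + len(phrase)


for sentence in ["When can I move in?", "When am I allowed to move into the dorms?"]:
    start, end = phrase_span(sentence, "move in")
    print(sentence, span_aligns(sentence, start, end))
```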

In [40]:
@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    new_model_name=("New model name for model meta.", "option", "nm", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def chunk_open_residence_halls(model=None, new_model_name="class", output_dir=None, n_iter=30):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(OPEN_RESIDENCE_HALLS_LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    move_names = list(ner.move_names)
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train NER
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
            random.shuffle(OPEN_RESIDENCE_HALLS_TRAIN_DATA)
            batches = minibatch(OPEN_RESIDENCE_HALLS_TRAIN_DATA, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "When am I allowed to move into the dorms?"
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)
        

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta["name"] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        # Check the classes have loaded back consistently
        assert nlp2.get_pipe("ner").move_names == move_names
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)

In [41]:
chunk_open_residence_halls()

Created blank 'en' model
Losses {'ner': 25.955070853233337}
Losses {'ner': 17.62054878473282}
Losses {'ner': 10.26456172176404}
Losses {'ner': 8.396656978642568}
Losses {'ner': 6.6999446337576956}
Losses {'ner': 9.906828098697588}
Losses {'ner': 10.731139539042488}
Losses {'ner': 7.806862586454372}
Losses {'ner': 9.512484039448509}
Losses {'ner': 3.2846351883658755}
Losses {'ner': 2.996822562398015}
Losses {'ner': 0.40871241975686545}
Losses {'ner': 0.11414844972005958}
Losses {'ner': 1.1444177293088935}
Losses {'ner': 1.1592299970664135}
Losses {'ner': 4.835520629839695e-05}
Losses {'ner': 8.016119200564283e-06}
Losses {'ner': 0.1818617957299032}
Losses {'ner': 2.940733400383502e-06}
Losses {'ner': 2.3850949801743857e-05}
Losses {'ner': 0.004844547203963066}
Losses {'ner': 1.3441988835751719e-06}
Losses {'ner': 3.0873610641288537e-06}
Losses {'ner': 0.00011392974513725935}
Losses {'ner': 1.2418331830745265e-06}
Losses {'ner': 0.00010954278878823902}
Losses {'ner': 6.998639479399436e-0