# Create Individualized Schedules From Aggregate PDF
The Customer Appointment Manager (CAM) software does not allow users to export an individual schedule for each employee. Instead, you must export a single PDF that contains the schedules for all employees. The workaround is to split that PDF ourselves: by checking each page's content against certain criteria, we can determine which pages belong to which employee and then split the original document accordingly.

The `CreateSchedules` class within the `send_schedule.py` script handles the separation process, creating individualized PDFs, each named for its tutor, in the `/data/processed/` directory. The individual methods are discussed in more detail below.

### Package Import

In [22]:
import sys
sys.path.append('../')

import send_schedule

import warnings
warnings.filterwarnings('ignore')

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


---

# Table of Contents
1. [Creating a `CreateSchedules` object](#creating)
2. [Methods](#methods)
    1. `get_pages_per_tutor`
    2. `split_pdf`

---

<a id="creating"></a>

# `CreateSchedules` Object
A `CreateSchedules` object is instantiated with one input parameter:
* `aggregate_schedule`: string that represents the name of the PDF file that contains all the tutor schedules to be split. _Do not include the file extension_, which is assumed to be `.pdf`.

In [3]:
creator = send_schedule.CreateSchedules("Dummy Schedule")

The filename will almost _always_ be "ALC Schedule" and is the default name used when running from the CLI or as an executable. 

Once created, the `CreateSchedules` object exposes two attributes:
* `data_dir`: the absolute path to the `/data/` directory
* `pdf`: the PDF object, an instance of the `PyPDF2.PdfFileReader` class

We can examine both below:

### `data_dir`

In [4]:
creator.data_dir

'/Users/hagenfritz/Documents/mass-email-sender/data'

Note that this script relies on the project's directory structure, so do not change how the directories and files are organized. The path shown above will differ on your system, since it is determined by where you clone the project.
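One way a path like this can follow the project around is to derive it from the location of the script itself. The sketch below is only an assumption about how `data_dir` might be built, not the actual code in `send_schedule.py`. The helper `resolve_data_dir` is hypothetical, and it assumes the `data/` directory sits next to the script.

```python
from pathlib import Path


def resolve_data_dir(script_path):
    """Return the absolute path of the data/ directory beside the script.

    Hypothetical helper: assumes data/ lives in the same directory
    as send_schedule.py.
    """
    return Path(script_path).resolve().parent / "data"


# Example with a made-up location for the script
print(resolve_data_dir("/tmp/mass-email-sender/send_schedule.py"))
```

Because the result is computed relative to the script, moving or renaming `data/` would break the lookup, which is consistent with the warning above.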

### `pdf`

In [5]:
creator.pdf

<PyPDF2._reader.PdfFileReader at 0x7fb2c6ca7c10>

There is not much to see here, but the `PdfFileReader` class provides built-in methods for getting the number of pages and the content of each page.

---

<a id="methods"></a>

# Methods
The `CreateSchedules` class contains only two methods, which are described below:

## `get_pages_per_tutor`
This method returns a `dict` where the keys are the individual tutors while the values correspond to the pages that their schedule details are on. There are _no_ input parameters required.

In [19]:
pages_per_tutor = creator.get_pages_per_tutor()
print(pages_per_tutor)

{'HFritz': [0], 'WFritz': [1, 2], 'JSmith': [3, 4], 'RDaniell': [5]}


The output is shown above. Note that pages are zero-indexed, so page `0` in the output is the first page of the PDF.
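The grouping idea can be sketched without any PDF library by working on a list of page texts. This is a minimal illustration, not the logic inside `get_pages_per_tutor`: the `Schedule for:` header, the function `group_pages_by_tutor`, and the sample pages are all made up, and the real criteria used by `CreateSchedules` may differ.

```python
import re

# Hypothetical page header; the real criteria may be different.
TUTOR_PATTERN = re.compile(r"Schedule for:\s*(\w+)")


def group_pages_by_tutor(page_texts):
    """Map each tutor to the zero-indexed pages that belong to them.

    A page without a header is treated as a continuation of the
    previous tutor's schedule.
    """
    pages_per_tutor = {}
    current = None
    for index, text in enumerate(page_texts):
        match = TUTOR_PATTERN.search(text)
        if match:
            current = match.group(1)
        if current is None:
            continue  # skip pages that come before any tutor header
        pages_per_tutor.setdefault(current, []).append(index)
    return pages_per_tutor


# Made-up page contents for illustration only
sample_pages = [
    "Schedule for: HFritz\nMon 9-11",
    "Schedule for: WFritz\nTue 1-3",
    "Wed 2-4",  # no header, so it continues WFritz's schedule
]
print(group_pages_by_tutor(sample_pages))
# {'HFritz': [0], 'WFritz': [1, 2]}
```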

## `split_pdf`
This method uses the output from `get_pages_per_tutor` to allocate pages to each tutor's PDF, ultimately creating the individualized PDFs. Only one input is required:
* `pages_per_tutor`: a `dict` with tutors as keys and, as values, the pages of the original document that belong to each tutor

In [20]:
creator.split_pdf(pages_per_tutor=pages_per_tutor)

Nothing is returned. Rather, the individual PDFs can be seen in the `/data/processed/` directory.

In [21]:
import os
for file in os.listdir("../data/processed/"):
    if file.endswith(".pdf"):
        print(file)

RDaniell.pdf
JSmith.pdf
WFritz.pdf
HFritz.pdf
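For readers curious how a split like this can work, here is a rough sketch under stated assumptions. It is not the body of `split_pdf`. The function `split_pages` and its parameters are hypothetical, the reader is assumed to expose a `pages` sequence, and the writer is assumed to provide `add_page` and `write`, as PyPDF2 writers do in releases that include `add_page`.

```python
import os


def split_pages(reader, pages_per_tutor, out_dir, make_writer):
    """Write one PDF per tutor and return the paths that were created.

    reader      -- object exposing a `pages` sequence (e.g. a PyPDF2 reader)
    make_writer -- callable returning an object with `add_page` and `write`
                   (e.g. a PyPDF2 writer class); assumed, not confirmed
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for tutor, page_numbers in pages_per_tutor.items():
        writer = make_writer()  # fresh, empty document for each tutor
        for number in page_numbers:
            writer.add_page(reader.pages[number])  # zero-indexed lookup
        path = os.path.join(out_dir, f"{tutor}.pdf")
        with open(path, "wb") as handle:
            writer.write(handle)
        written.append(path)
    return written
```

With PyPDF2 installed, a call might look like `split_pages(creator.pdf, pages_per_tutor, "../data/processed", PyPDF2.PdfFileWriter)`. Passing the writer class in as an argument keeps the sketch independent of any particular PyPDF2 version.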


---