## Autofill PDF

This notebook autofills a fillable PDF form with data from a CSV file. We use the PyPDF2 and pandas packages, with 2020 NBA draft prospects as the example project.

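We start by loading the packages that we'll need. The imports use the legacy PyPDF2 class names (`PdfFileReader`, `PdfFileWriter`), which were removed in PyPDF2 3.0, so this code needs an earlier release.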

In [44]:
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
import pandas as pd

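The code used to edit the PDFs is broken up into two functions, `set_need_appearances_writer` and `create_pdf`. The first sets the `/NeedAppearances` flag so that PDF viewers render the filled-in values; the second fills in the form fields and writes the output file.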

In [45]:
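def set_need_appearances_writer(writer):
    """Set /NeedAppearances on the writer's AcroForm so viewers render filled values."""
    try:
        catalog = writer._root_object
        # Create the /AcroForm entry if the writer does not have one yet.
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        # Report the problem but still return the writer so the caller can continue.
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer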

In [46]:
def create_pdf(infile, outfile, field_dictionary):
    """Fill the first page of `infile` with `field_dictionary` and save it as `outfile`."""
    # Context managers close both files even if an error occurs part-way through.
    with open(infile, "rb") as input_stream:
        pdf_reader = PdfFileReader(input_stream, strict=False)
        if "/AcroForm" in pdf_reader.trailer["/Root"]:
            pdf_reader.trailer["/Root"]["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})

        pdf_writer = PdfFileWriter()
        set_need_appearances_writer(pdf_writer)
        if "/AcroForm" in pdf_writer._root_object:
            pdf_writer._root_object["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})

        # Copy the first page and fill in its form fields.
        pdf_writer.addPage(pdf_reader.getPage(0))
        pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)

        # Write while the input file is still open: the reader loads objects lazily.
        with open(outfile, "wb") as output_stream:
            pdf_writer.write(output_stream)

If you do not already have the field names of your PDF, you can get them with the `get_headers` function.

In [47]:
def get_headers(infile):
    # Open the PDF in a context manager so the file handle is always closed.
    with open(infile, "rb") as input_stream:
        pdf_reader = PdfFileReader(input_stream, strict=False)
        fields = pdf_reader.getFields().keys()
        print('Headers needed for data file: {}'.format(list(fields)))

In [48]:
# Example
get_headers('fillable_draft_profile.pdf')

Headers needed for data file: ['Rank', 'Name', 'Position', 'Age', 'Height', 'Weight', 'Team', 'Year', 'Points', 'Rebounds', 'Assists', 'Blocks', 'Steals']


## Implementation
And now for the implementation. We will need two additional files along with the previous functions:
 1. A fillable PDF.
 2. A CSV file containing the data to enter into the PDF.
 
If you have a PDF that is not fillable, there are programs that can add input boxes to a regular PDF. Wondershare PDFelement is one such program; it is free to use, although it leaves a watermark on the final PDF.

The CSV file should start with a header row whose column names match the field names of the fillable PDF. Each remaining row holds the data to enter into those fields, and each row produces a separate PDF file.
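
Before filling anything in, it can help to confirm that the CSV header row really matches the PDF field names. The sketch below is one possible way to do that with only the standard library; `check_headers` is a hypothetical helper, not part of PyPDF2 or pandas, and the field list and CSV text are made-up inputs for illustration.

```python
import csv
import io


def check_headers(pdf_fields, csv_text):
    """Compare PDF field names with the header row of a CSV.

    Returns (missing, extra): PDF fields that have no CSV column,
    and CSV columns that match no PDF field.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [f for f in pdf_fields if f not in header]
    extra = [h for h in header if h not in pdf_fields]
    return missing, extra


# Hypothetical inputs: the CSV misnames the "Position" column as "Pos".
pdf_fields = ["Rank", "Name", "Position", "Team"]
csv_text = "Rank,Name,Pos,Team\n1,LaMelo Ball,PG,Illawarra\n"

missing, extra = check_headers(pdf_fields, csv_text)
print("Missing from CSV:", missing)
print("Not in PDF:", extra)
```

Extra CSV columns are usually harmless, but a missing column means the corresponding PDF field stays empty.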

For our example, we have a simple draft profile PDF that we will fill out for 65 NBA prospects ahead of the 2020 NBA Draft. The data is stored in a CSV file, previewed below. We also convert the `Rank` column from float to integer.

In [49]:
df = pd.read_csv('nba_info.csv')
df = df.astype({'Rank': 'int64'})  # astype returns a new dataframe, so assign the result
df.head()

| Unnamed: 0 | Rank | Name | Position | Team | Check Box1 | Check Box2 | Height | Weight | Year | Age | Points | Rebounds | Assists | Blocks | Steals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | LaMelo Ball | PG | Illawarra | No | Yes | 6'7" | 190 lbs | International | 19.1 yrs | 19.6 | 8.7 | 7.9 | 0.1 | 1.8 |
| 1 | 2 | Onyeka Okongwu | PF/C | USC | Yes | No | 6'9" | 245 lbs | Freshman | 19.8 yrs | 19.0 | 10.2 | 1.3 | 3.2 | 1.4 |
| 2 | 3 | Killian Hayes | PG/SG | Ratiopharm Ulm | No | Yes | 6'5" | 187 lbs | International | 19.2 yrs | 16.8 | 4.1 | 7.8 | 0.4 | 2.1 |
| 3 | 4 | James Wiseman | C | Memphis | Yes | No | 7'1" | 237 lbs | Freshman | 19.5 yrs | 30.8 | 16.7 | 0.5 | 4.7 | 0.5 |
| 4 | 5 | Tyrese Haliburton | PG | Iowa State | Yes | No | 6'5" | 175 lbs | Sophomore | 20.6 yrs | 14.9 | 5.8 | 6.3 | 0.7 | 2.4 |


Once your dataframe is formatted appropriately, you can fill in your PDFs. The `create_pdf` function takes three inputs: the fillable PDF file, the name of the output file, and a dictionary of the data to input. This dictionary's keys correspond to the names of the PDF fields (also the dataframe headers) and the values correspond to the data to be entered. The code cell below organizes the dataframe into the appropriate dictionary and creates the PDF output files. 
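
To make the shape of that dictionary concrete, here is a minimal sketch that builds one from two of the preview rows above. It uses the standard-library `csv` module instead of pandas so that it runs on its own, and `to_field_dict` is a hypothetical helper introduced only for illustration.

```python
import csv
import io

# Two rows from the preview above, reduced to four columns for brevity.
csv_text = (
    "Rank,Name,Position,Points\n"
    "1,LaMelo Ball,PG,19.6\n"
    "2,Onyeka Okongwu,PF/C,19.0\n"
)


def to_field_dict(row, pdf_fields):
    """Keep only the columns that match a PDF field, with values as strings."""
    return {name: str(row[name]) for name in pdf_fields if name in row}


# csv.DictReader yields one {header: value} mapping per data row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical field list: pretend the PDF only has these three fields.
first = to_field_dict(rows[0], ["Rank", "Name", "Position"])
print(first)
```

Each such dictionary is what `create_pdf` expects as its third argument: keys are PDF field names and values are the text to place in each field.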

In [50]:
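import os

# One dictionary per row: keys are column names, values are the row's data.
data = [row.to_dict() for _, row in df.iterrows()]
infile = "fillable_draft_profile.pdf"

# Make sure the output folder exists before writing into it.
os.makedirs('output', exist_ok=True)

for person in data:
    outfile = 'output/' + person['Name'] + " out.pdf"
    create_pdf(infile, outfile, person)
    print("File created for:", person['Name'])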

File created for: LaMelo Ball
File created for: Onyeka Okongwu
File created for: Killian Hayes
File created for: James Wiseman
File created for: Tyrese Haliburton
File created for: Obi Toppin
File created for: Anthony Edwards
File created for: Deni Avdija
File created for: Devin Vassell
File created for: Aleksej Pokusevski
File created for: RJ Hampton
File created for: Josh Green
File created for: Isaac Okoro
File created for: Tyler Bey
File created for: Jalen Smith
File created for: Saddiq Bey
File created for: Paul Reed
File created for: Kira Lewis Jr.
File created for: Cole Anthony
File created for: Jahmi'us Ramsey
File created for: Tyrese Maxey
File created for: Aaron Nesmith
File created for: Nico Mannion
File created for: Theo Maledon
File created for: Tre Jones
File created for: Tyrell Terry
File created for: Reggie Perry
File created for: Vernon Carey Jr.
File created for: Isaiah Stewart
File created for: Cassius Stanley
File created for: Leandro Bolmaro
File created for: Desmo

Done.