# Certificate Generator

Goal of the project:
1. Do simple Data Analysis
  - Deal with missing values
  - Format the date or text
2. Strengthen Python concepts
3. Explore more third-party libraries - reportlab

## 1. Download external libraries for this project

In [1]:
# reportlab is a library for creating PDF files in Python

# 'pip' is Python's package manager --> like an app store for downloading libraries
!pip install reportlab

Collecting reportlab
  Downloading reportlab-4.3.1-py3-none-any.whl.metadata (1.7 kB)
Downloading reportlab-4.3.1-py3-none-any.whl (1.9 MB)
Installing collected packages: reportlab
Successfully installed reportlab-4.3.1


In [2]:
import reportlab

## 2. Read and explore the dataset

In [3]:
import pandas as pd
import numpy as np

from reportlab.lib.pagesizes import landscape, A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

In [4]:
# mount google drive

from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [5]:
df = pd.read_excel('/content/drive/MyDrive/certgen/dataset.xlsx')

In [6]:
df

# NaN stands for 'Not a Number' (an empty/null value)
# NaT stands for 'Not a Time' (an empty/null value inside a timestamp column)

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
4,-,,Advanced,NaT
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
8,Jonathan Bucker,,,NaT
9,Robert Buck,PYTHON,Master,2023-09-17


In [7]:
df.info() # summary information about the dataset

# non-null (not empty) --> 'Name' has 13 non-null values --> none of its 13 rows are empty
# from this info, we know that there are 4 columns and 13 rows (some columns have empty values)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Name         13 non-null     object        
 1   Course       11 non-null     object        
 2   CourseLevel  12 non-null     object        
 3   Date         11 non-null     datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 548.0+ bytes
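
The non-null counts in `df.info()` can also be read the other way round, as a count of missing values per column. A minimal sketch, assuming a made-up three-row table with the same column names as the dataset (the rows are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical miniature of the dataset: same column names, made-up rows
sample = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara"],
    "Course": ["Python", None, "Java"],
    "CourseLevel": ["Beginner", "Advanced", None],
    "Date": pd.to_datetime(["2023-09-10", None, "2023-09-12"]),
})

# isna() marks each missing cell as True; sum() counts the True values per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_gaps = sample[sample.isna().any(axis=1)]
print(rows_with_gaps)
```

Looking at the rows with gaps first makes it easier to decide whether dropping them is acceptable.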


In [8]:
# before jumping into the next section, what should be the next step?

# a. create the cert logic
# b. something else...


## 3. Data cleaning (Data Analysis)
- Formatting the data before writing the certificate generator logic
- Deal with missing values
- Format the date or text

In [9]:
# problems with the original dataset

# 1. Partially empty rows
# 2. Inconsistent formatting in the 'Course' and 'CourseLevel' columns - some values are capitalized, others are uppercase
# 3. Date format (yyyy-mm-dd) --> change into (dd/mm/yyyy)

# we are going to fix these problems with data analysis using pandas

In [10]:
# Problem 1
df = df.dropna() # remove/drop all the rows that have at least 1 empty value

# in this case it will drop rows 4 and 8

In [11]:
df

# from 13 rows --> 11 rows left

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
9,Robert Buck,PYTHON,Master,2023-09-17
10,Joseph Mcdonald,Java,Intermediate,2023-09-18
11,Jerome Abbott,Web Development,Advanced,2023-09-19
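
Note that the index labels in the output above skip 4 and 8: `dropna()` removes rows but keeps the original labels of the rows that remain. A small sketch on a made-up table (hypothetical data) shows this, plus `reset_index(drop=True)` as one way to renumber the rows if continuous labels are wanted:

```python
import pandas as pd

# Hypothetical sample with one incomplete row
sample = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara"],
    "Course": ["Python", None, "Java"],
})

cleaned = sample.dropna()   # drops the row labelled 1
print(list(cleaned.index))  # the labels of the remaining rows are unchanged

# Relabel the remaining rows from 0 upward, discarding the old labels
renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))
```

The notebook does not renumber, which is fine here because the later steps never rely on the labels being continuous.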


In [12]:
# Problem 2: Inconsistent formatting in the 'Course' & 'CourseLevel' columns
# to format strings, you can use the 'str' accessor in pandas
# df[column].str.stringmethod()

df['Course'] = df['Course'].str.capitalize()
df['CourseLevel'] = df['CourseLevel'].str.title()

# broadcasting function/method --> applies 'capitalize' to every row of 'Course' and 'title' to every row of 'CourseLevel'
# we will learn more advanced string formatting in a later pandas chapter

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Course'] = df['Course'].str.capitalize()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['CourseLevel'] = df['CourseLevel'].str.title()
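
The message above is pandas' `SettingWithCopyWarning`. The assignments still took effect here (the next cell shows the cleaned values), but pandas could not tell whether `df` was an independent table or a view of the original. One common way to avoid the warning, sketched on a made-up table, is to take an explicit copy right after `dropna()`. The names `raw` and `clean` are hypothetical, and recent pandas versions may not emit the warning at all:

```python
import pandas as pd

# Hypothetical raw data with inconsistent casing and one incomplete row
raw = pd.DataFrame({
    "Course": ["PYTHON", None, "web development"],
    "CourseLevel": ["MASTER", "Advanced", "beginner"],
})

# .copy() makes the cleaned table independent of the original
clean = raw.dropna().copy()

# Assigning to columns of an explicit copy is unambiguous
clean["Course"] = clean["Course"].str.capitalize()
clean["CourseLevel"] = clean["CourseLevel"].str.title()
print(clean)
```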


In [13]:
df

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,Python,Master,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web development,Advanced,2023-09-13
5,Tonya Carter,Ai & machine learning,Beginner,2023-09-14
6,Erik Smith,Mobile development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
9,Robert Buck,Python,Master,2023-09-17
10,Joseph Mcdonald,Java,Intermediate,2023-09-18
11,Jerome Abbott,Web development,Advanced,2023-09-19
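
The output above shows a side effect of `capitalize()`: only the first letter of the whole string is uppercased, so "Web Development" became "Web development". The short comparison below, using three course names from the dataset, shows how `capitalize()` and `title()` differ. Neither keeps an acronym such as "AI" intact:

```python
import pandas as pd

courses = pd.Series(["PYTHON", "Web Development", "AI & Machine Learning"])

# capitalize(): first character uppercase, everything else lowercase
capitalized = courses.str.capitalize().tolist()
print(capitalized)

# title(): first character of every word uppercase, the rest lowercase
titled = courses.str.title().tolist()
print(titled)
```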


In [14]:
# Problem 3: Date format (yyyy-mm-dd) --> change into (dd/mm/yyyy)

df['Date']
# this gives us the 'Date' column as a pandas Series

Unnamed: 0,Date
0,2023-09-10
1,2023-09-11
2,2023-09-12
3,2023-09-13
5,2023-09-14
6,2023-09-15
7,2023-09-16
9,2023-09-17
10,2023-09-18
11,2023-09-19


In [15]:
# access individually
# iloc --> index location
df['Date'].iloc[0]

# Timestamp is the data type pandas uses to represent a date and time (e.g. dates read from Excel into a df)
# Ex: we need to change the timestamp '2023-09-10 00:00:00' --> '10/09/2023'

Timestamp('2023-09-10 00:00:00')

In [16]:
# create a new column called 'Formatted Date' that holds the new date format, based on the 'Date' column
# then remove the 'Date' column

df['Formatted Date'] = df['Date'].dt.strftime('%d/%m/%Y')

# dt --> datetime (gives you access to many commands for dealing with dates inside a column)
# one of the commands in 'dt' is --> strftime
# strftime (string format time) --> converts a timestamp object into a string representation (following a specified format)

# Y --> 2023
# y --> 23

# we will learn more advanced date and time formatting in a later pandas chapter


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Formatted Date'] = df['Date'].dt.strftime('%d/%m/%Y')


In [17]:
df['Formatted Date']

# Object --> string

Unnamed: 0,Formatted Date
0,10/09/2023
1,11/09/2023
2,12/09/2023
3,13/09/2023
5,14/09/2023
6,15/09/2023
7,16/09/2023
9,17/09/2023
10,18/09/2023
11,19/09/2023
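
To make the format codes concrete, here is a small sketch that formats the first date from the dataset in a few ways, then applies the same format to a whole column through `.dt`. The variable names are chosen for illustration only:

```python
import pandas as pd

day = pd.Timestamp("2023-09-10")

four_digit_year = day.strftime("%d/%m/%Y")  # day/month/4-digit year
two_digit_year = day.strftime("%d/%m/%y")   # day/month/2-digit year
iso_style = day.strftime("%Y-%m-%d")        # year-month-day
print(four_digit_year, two_digit_year, iso_style)

# The same format string applied to every value in a datetime column
dates = pd.Series(pd.to_datetime(["2023-09-10", "2023-09-11"]))
formatted = dates.dt.strftime("%d/%m/%Y")
print(formatted.tolist())
```

The result of `strftime` is always text, which is why the 'Formatted Date' column has dtype `object`.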


In [18]:
# remove the 'Date' column (because we no longer need it)

df.drop("Date", axis=1)

# axis = 0 --> rows
# axis = 1 --> columns
# we're dropping the 'Date' column; note that drop() returns a new DataFrame and leaves df itself unchanged

Unnamed: 0,Name,Course,CourseLevel,Formatted Date
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web development,Advanced,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,14/09/2023
6,Erik Smith,Mobile development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web development,Advanced,19/09/2023


In [19]:
df.head()

Unnamed: 0,Name,Course,CourseLevel,Date,Formatted Date
0,Christy Cunningham,Python,Beginner,2023-09-10,10/09/2023
1,Douglas Tucker,Python,Master,2023-09-11,11/09/2023
2,Travis Walters,Java,Intermediate,2023-09-12,12/09/2023
3,Nathaniel Harris,Web development,Advanced,2023-09-13,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,2023-09-14,14/09/2023


In [20]:
# we are done with the basic analysis (formatting the date)

## 4. Registering fonts into our project

In [21]:
# register 2 fonts into this project

In [22]:
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

In [23]:
font_path = '/content/drive/MyDrive/certgen/fonts'

# defining the path to the fonts folder --> storing a string inside the 'font_path' variable

In [24]:
# pdfmetrics --> used to register fonts

# we need to register 2 fonts (Lora Bold, Lora Regular)

# TTFont() needs 2 inputs
# 1st input --> what name are you registering the font under?
# 2nd input --> where is the font file?


pdfmetrics.registerFont(TTFont('Lora-Bold', '/content/drive/MyDrive/certgen/fonts/Lora-Bold.ttf'))

pdfmetrics.registerFont(TTFont('Lora-Regular', '/content/drive/MyDrive/certgen/fonts/Lora-Regular.ttf'))



## 5. Certificate Generator Logic


In [25]:
# function --> a predefined block of code, so you can run it multiple times without having to rewrite it
def addition(x, y):
  print(x + y)

addition(2, 3)
addition(5, 18)

5
23
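
The `addition` function above prints its result. A common variation, sketched here as a side note, is to `return` the result so the caller can store it or pass it on to another call:

```python
def addition(x, y):
    # return hands the result back to the caller instead of only printing it
    return x + y

total = addition(2, 3)
print(total)

# A returned value can be used as the input of another call
print(addition(total, 18))
```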


In [26]:
def certificate_generator(name, course, courseLevel, date):
  # file path for our generated certificates
  pdf_file_name = '/content/drive/MyDrive/certgen/certificates/' + name + "-" + course + "-" + courseLevel + ".pdf"

  # 1. empty canvas
  # .Canvas() needs 2 inputs: where are you storing it, what size
  c = canvas.Canvas(pdf_file_name, pagesize=landscape(A4))

  # 2. draw image on canvas
  # .drawImage() needs 5 inputs: image, x-coordinate, y-coordinate, width, height
  c.drawImage("/content/drive/MyDrive/certgen/certificate_template.jpg", 0, 0, width=A4[1], height=A4[0])
  # A4 = (width, height) in portrait, so landscape uses A4[1] as width and A4[0] as height

  # up to this point, we have an A4 landscape canvas with the certificate template image drawn on it

  # 3. coordinates of the center of the canvas
  center_x = c._pagesize[0]/2 # width/2
  center_y = c._pagesize[1]/2 # height/2

  # both A4 and _pagesize are measured in points (1 point = 1/72 inch)
  # _pagesize holds the landscape size passed to Canvas(): (width, height)

  # 4. populate data onto the canvas
  # .setFont() needs 2 inputs: name of font, size of font
  # .drawCentredString() needs 3 inputs: x-coord, y-coord, data we're writing

  # Name
  c.setFont('Lora-Bold', 30)
  c.drawCentredString(center_x, center_y-46, name)

  # Course and course level
  c.setFont('Lora-Bold', 20)
  c.drawCentredString(center_x, center_y-100, course + "-" + courseLevel)

  # Date
  c.setFont('Lora-Bold', 17)
  c.drawCentredString(center_x + 190, center_y-160, date)

  # Cert ID
  cert_id = str(pd.Timestamp.now().timestamp()).replace(".", "")
  c.setFont("Lora-Regular", 12)
  c.drawCentredString(center_x + 250, center_y - 230, "CERT ID: " + cert_id)

  # 5. write the finished PDF to disk
  c.save()
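
The coordinates used in the function are in PDF points, where 1 point is 1/72 of an inch. The sketch below reproduces the landscape A4 size and its center with plain arithmetic, assuming the standard A4 dimensions of 210 mm x 297 mm. The helper `mm_to_points` is a hypothetical name and not part of reportlab:

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

def mm_to_points(mm):
    # Hypothetical helper: convert millimetres to PDF points
    return mm / MM_PER_INCH * POINTS_PER_INCH

# Portrait A4 is 210 mm wide and 297 mm tall
a4_width, a4_height = mm_to_points(210), mm_to_points(297)

# Landscape swaps the two values
landscape_width, landscape_height = a4_height, a4_width

center_x = landscape_width / 2
center_y = landscape_height / 2
print(round(landscape_width, 2), round(landscape_height, 2))
print(round(center_x, 2), round(center_y, 2))
```

Offsets such as `center_y - 46` in the function are therefore distances in points below the middle of the page.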

In [27]:
certificate_generator("John", "Python", "Beginner", "12/04/2024")

In [28]:
# the Cert ID has to be unique, so we generate it from the current time
print(pd.Timestamp.now()) # current date and time on the machine running the notebook
print(pd.Timestamp.now().timestamp()) # Unix timestamp (the number of seconds since January 1, 1970)
# e.g. 1740637233 --> about 1.74 billion seconds since January 1, 1970
print(str(pd.Timestamp.now().timestamp()).replace(".", "")) # remove the decimal point to get a digits-only ID

2025-02-27 06:20:33.010101
1740637233.010562
1740637233010731
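
A timestamp-based ID is unique only as long as no two certificates are created within the same microsecond, and its length can vary because trailing zeros after the decimal point are not printed. If that ever becomes a concern, one alternative (not what this notebook uses) is Python's built-in `uuid` module. The function name `make_cert_id` is hypothetical:

```python
import uuid

def make_cert_id():
    # uuid4() draws a random 128-bit value; .hex renders it as 32 hex characters
    return uuid.uuid4().hex.upper()

first, second = make_cert_id(), make_cert_id()
print(first)
print(second)
```

The trade-off is that a random ID no longer encodes when the certificate was issued.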


In [29]:
# Run function for every row of dataset

In [30]:
df

Unnamed: 0,Name,Course,CourseLevel,Date,Formatted Date
0,Christy Cunningham,Python,Beginner,2023-09-10,10/09/2023
1,Douglas Tucker,Python,Master,2023-09-11,11/09/2023
2,Travis Walters,Java,Intermediate,2023-09-12,12/09/2023
3,Nathaniel Harris,Web development,Advanced,2023-09-13,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,2023-09-14,14/09/2023
6,Erik Smith,Mobile development,Beginner,2023-09-15,15/09/2023
7,Kristopher Johnson,Python,Beginner,2023-09-16,16/09/2023
9,Robert Buck,Python,Master,2023-09-17,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,2023-09-18,18/09/2023
11,Jerome Abbott,Web development,Advanced,2023-09-19,19/09/2023


In [31]:
# there are 11 students, so the idea is to call the function 11 times (each time with 4 different inputs)

# use a for loop --> loop 11 times --> call the function once per student

for x in ("John", "The", "Duck"):
  print(x)

John
The
Duck


In [32]:
# enumerate() keeps track of the index of each item

for index, x in enumerate(["John", "The", "Duck"]):
  print(index, x)

# index --> index of each item
# x --> item

0 John
1 The
2 Duck
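
By default `enumerate()` starts counting at 0. When a human-friendly count is wanted, the optional `start` argument changes the first number, as in this small sketch:

```python
names = ["John", "The", "Duck"]

# start=1 makes the counter begin at 1 instead of 0
numbered = [f"{position}. {name}" for position, name in enumerate(names, start=1)]

for line in numbered:
    print(line)
```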


In [34]:
# .iterrows() lets you loop over the DataFrame one row at a time
# each step gives you 2 items: (index, row as a pandas Series)

for index, row in df.iterrows():
  print(index)
  print(row)
  print("------------")

0
Name               Christy Cunningham
Course                         Python
CourseLevel                  Beginner
Date              2023-09-10 00:00:00
Formatted Date             10/09/2023
Name: 0, dtype: object
------------
1
Name                   Douglas Tucker
Course                         Python
CourseLevel                   Master 
Date              2023-09-11 00:00:00
Formatted Date             11/09/2023
Name: 1, dtype: object
------------
2
Name                   Travis Walters
Course                           Java
CourseLevel              Intermediate
Date              2023-09-12 00:00:00
Formatted Date             12/09/2023
Name: 2, dtype: object
------------
3
Name                 Nathaniel Harris
Course                Web development
CourseLevel                  Advanced
Date              2023-09-13 00:00:00
Formatted Date             13/09/2023
Name: 3, dtype: object
------------
5
Name                       Tonya Carter
Course            Ai & machine learning
Course

In [40]:
for index, row in df.iterrows():
  certificate_generator(row['Name'], row['Course'], row['CourseLevel'], row['Formatted Date'])

print(str(len(df)) + " certificates generated successfully.")

11 certificates generated successfully.
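
Before writing real PDFs, it can help to preview which files the loop would create. The sketch below applies the same naming rule as `certificate_generator` (name, course and level joined by "-") to a made-up two-row table. The helper `certificate_filename` and the sample rows are hypothetical, and no PDF is written:

```python
import pandas as pd

def certificate_filename(name, course, course_level):
    # Same naming rule as certificate_generator, without the folder path
    return name + "-" + course + "-" + course_level + ".pdf"

# Hypothetical rows with the same column names as the cleaned dataset
sample = pd.DataFrame({
    "Name": ["Ann Lee", "Ben Ray"],
    "Course": ["Python", "Java"],
    "CourseLevel": ["Beginner", "Advanced"],
})

# iterrows() yields (index, row); only the row is needed here
filenames = [
    certificate_filename(row["Name"], row["Course"], row["CourseLevel"])
    for _, row in sample.iterrows()
]
print(filenames)
```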


In [42]:
# columns --> looping over a DataFrame directly gives the column names
for x in df:
  print(x)

# rows --> .iterrows() gives each row as (index, row)
for index, x in df.iterrows():
  print(x)

Name
Course
CourseLevel
Date
Formatted Date
Name               Christy Cunningham
Course                         Python
CourseLevel                  Beginner
Date              2023-09-10 00:00:00
Formatted Date             10/09/2023
Name: 0, dtype: object
Name                   Douglas Tucker
Course                         Python
CourseLevel                   Master 
Date              2023-09-11 00:00:00
Formatted Date             11/09/2023
Name: 1, dtype: object
Name                   Travis Walters
Course                           Java
CourseLevel              Intermediate
Date              2023-09-12 00:00:00
Formatted Date             12/09/2023
Name: 2, dtype: object
Name                 Nathaniel Harris
Course                Web development
CourseLevel                  Advanced
Date              2023-09-13 00:00:00
Formatted Date             13/09/2023
Name: 3, dtype: object
Name                       Tonya Carter
Course            Ai & machine learning
CourseLevel             