
# Certificates Generator Project

### Goals of this project

1. `Do simple Data Analysis`
   * Deal with missing values
   * Format dates and text
2. `Enhance Python Concepts`
3. `Explore more 3rd party libraries` - reportlab







---
## 1. Download & Import packages for this project


---




In [1]:
# install the external package 'reportlab' because it is not part of the Python standard library
# 'reportlab' is a library for generating PDF files from a Python program

# 'pip' is Python's package manager (package downloader)
# pip is like the App Store / Play Store, but for Python packages
!pip install reportlab

Collecting reportlab
  Downloading reportlab-4.2.2-py3-none-any.whl (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 8.3 MB/s eta 0:00:00
Installing collected packages: reportlab
Successfully installed reportlab-4.2.2


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import numpy as np
import pandas as pd

from reportlab.lib.pagesizes import landscape , A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont



---
## 2. Read & Explore the Excel file

---





In [4]:
df = pd.read_excel("/content/drive/MyDrive/1.1 Python Certificate Generator Project/dataset.xlsx")
df

# we found that the dataset is messy
# two of the rows are half-empty (they contain NaN, "-" or NaT)

# NaT - Not a Time (a missing date)
# NaN - Not a Number (a missing value)

# both basically mean an empty (null) value

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
4,-,,Advanced,NaT
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
8,Jonathan Bucker,,,NaT
9,Robert Buck,PYTHON,Master,2023-09-17


In [5]:
df.info() # information about the dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Name         13 non-null     object        
 1   Course       11 non-null     object        
 2   CourseLevel  12 non-null     object        
 3   Date         11 non-null     datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 544.0+ bytes
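The non-null counts above can also be turned into a direct count of missing cells. A minimal sketch, assuming a small hand-made DataFrame (`sample` is hypothetical and only mimics the shape of the real dataset):

```python
import pandas as pd

# Hypothetical sample that mimics the shape of the real dataset
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "-", "Jonathan Bucker"],
    "Course": ["Python", None, None],
    "CourseLevel": ["Beginner", "Advanced", None],
    "Date": pd.to_datetime(["2023-09-10", None, None]),
})

# isna() marks NaN / None / NaT cells as True; sum() counts them per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_gaps = sample[sample.isna().any(axis=1)]
print(rows_with_gaps.index.tolist())
```

Running the same two calls on the real `df` should point at the half-empty rows without reading the whole table by eye.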




---

## 3. Data Cleaning (Data Analysis)

---

`Data Cleaning` - Formatting the data before doing the certificate generator logic


*   Deal with missing values
*   Format the date & text




In [6]:
# the 3 problems with the original dataset (raw Excel file)

# 1. Inconsistent formatting in "Course" & "CourseLevel" columns - (some of them are capitalized & some are all in uppercase)
# 2. Date format (yyyy-mm-dd) --> we want to change it into (dd/mm/yyyy)
# 3. Half-empty rows (row 4 & row 8)

# We are going to solve these problems using Data Analysis with Pandas!

In [7]:
# Problem 3
# drop every row that has AT LEAST 1 missing value
df = df.dropna()

# in this case it will drop row 4 & row 8
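`dropna()` only removes rows with real missing values, so a placeholder such as `-` in the `Name` column is not treated as empty by itself (row 4 is dropped here only because its `Course` and `Date` cells are missing). A minimal sketch, assuming a hypothetical `sample` frame, of converting such placeholders first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one placeholder name, one genuinely missing course
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "-", "Tonya Carter"],
    "Course": ["Python", "Java", None],
})

# "-" is an ordinary string, so dropna() keeps that row
print(len(sample.dropna()))

# Turn the placeholder into a real missing value, then drop incomplete rows
cleaned = sample.replace("-", np.nan).dropna()
print(cleaned)
```

Whether `-` should count as missing depends on the dataset; here it is only an assumption made for illustration.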

In [8]:
df

Unnamed: 0,Name,Course,CourseLevel,Date
0,Christy Cunningham,Python,Beginner,2023-09-10
1,Douglas Tucker,PYTHON,MASTER,2023-09-11
2,Travis Walters,Java,Intermediate,2023-09-12
3,Nathaniel Harris,Web Development,Advanced,2023-09-13
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14
6,Erik Smith,Mobile Development,Beginner,2023-09-15
7,Kristopher Johnson,Python,Beginner,2023-09-16
9,Robert Buck,PYTHON,Master,2023-09-17
10,Joseph Mcdonald,Java,Intermediate,2023-09-18
11,Jerome Abbott,Web Development,Advanced,2023-09-19


In [9]:
# Problem 2 : Date Format (yyyy-mm-dd) --> we want to change it into (dd/mm/yyyy)

df['Date']
# this gives us the 'Date' column as a pandas Series

0    2023-09-10
1    2023-09-11
2    2023-09-12
3    2023-09-13
5    2023-09-14
6    2023-09-15
7    2023-09-16
9    2023-09-17
10   2023-09-18
11   2023-09-19
12   2023-09-20
Name: Date, dtype: datetime64[ns]

In [10]:
df["Date"][0]

Timestamp('2023-09-10 00:00:00')

In [11]:
# accessing the first row of the 'Date' column by position
# .iloc --> integer (position-based) location
df['Date'].iloc[0]

# Timestamp is the pandas data type that represents a single point in time
# For example, we need to change Timestamp('2023-09-10 00:00:00') --> "10/09/2023"

Timestamp('2023-09-10 00:00:00')
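The conversion described above relies on format codes. A minimal sketch using Python's built-in `datetime` (an assumption for illustration; a pandas `Timestamp` accepts the same `strftime` codes):

```python
from datetime import datetime

moment = datetime(2023, 9, 10)

# %d = day, %m = month, %Y = 4-digit year, %y = 2-digit year
long_year = moment.strftime("%d/%m/%Y")
short_year = moment.strftime("%d/%m/%y")

print(long_year)   # 10/09/2023
print(short_year)  # 10/09/23
```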

In [12]:
# create a new column called 'FormattedDate' that holds the 'Date' column in a new format
# (the 'Date' column itself is removed in a later cell)

df['FormattedDate'] = df['Date'].dt.strftime("%d/%m/%Y")

# to format dates, use the 'dt' accessor in pandas
# dt --> datetime (it gives access to many date-related commands for the whole column)
# one of the commands in dt is --> strftime
# strftime (string format time) --> converts each Timestamp into a string that follows the given format

# %Y --> 2023
# %y --> 23

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['FormattedDate'] = df['Date'].dt.strftime("%d/%m/%Y")
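The warning above means pandas cannot tell whether `df` is an independent DataFrame or a slice of another one. A common way to avoid it, sketched here on a hypothetical `raw` frame, is to take an explicit copy after filtering:

```python
import pandas as pd

# Hypothetical sample with one incomplete row
raw = pd.DataFrame({
    "Name": ["Christy Cunningham", "Jonathan Bucker"],
    "Date": pd.to_datetime(["2023-09-10", None]),
})

# .copy() makes the filtered result an independent DataFrame
clean = raw.dropna().copy()

# Adding a column to the copy is now unambiguous
clean["FormattedDate"] = clean["Date"].dt.strftime("%d/%m/%Y")
print(clean)
```

In the notebook, the equivalent change would be adding `.copy()` to the earlier `dropna()` call; that is a suggestion, not something the original cells do.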


In [13]:
df

Unnamed: 0,Name,Course,CourseLevel,Date,FormattedDate
0,Christy Cunningham,Python,Beginner,2023-09-10,10/09/2023
1,Douglas Tucker,PYTHON,MASTER,2023-09-11,11/09/2023
2,Travis Walters,Java,Intermediate,2023-09-12,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,2023-09-13,13/09/2023
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14,14/09/2023
6,Erik Smith,Mobile Development,Beginner,2023-09-15,15/09/2023
7,Kristopher Johnson,Python,Beginner,2023-09-16,16/09/2023
9,Robert Buck,PYTHON,Master,2023-09-17,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,2023-09-18,18/09/2023
11,Jerome Abbott,Web Development,Advanced,2023-09-19,19/09/2023


In [14]:
df['FormattedDate'].dtype

# O --> Object --> String

dtype('O')

In [15]:
df.head()

Unnamed: 0,Name,Course,CourseLevel,Date,FormattedDate
0,Christy Cunningham,Python,Beginner,2023-09-10,10/09/2023
1,Douglas Tucker,PYTHON,MASTER,2023-09-11,11/09/2023
2,Travis Walters,Java,Intermediate,2023-09-12,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,2023-09-13,13/09/2023
5,Tonya Carter,AI & Machine Learning,Beginner,2023-09-14,14/09/2023


In [16]:
# remove the 'Date' column (because it is useless to us)
df = df.drop("Date",axis = 1)

# axis 0 = row
# axis 1 = column
# we are dropping the 'Date' column

In [17]:
df

Unnamed: 0,Name,Course,CourseLevel,FormattedDate
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,PYTHON,MASTER,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web Development,Advanced,13/09/2023
5,Tonya Carter,AI & Machine Learning,Beginner,14/09/2023
6,Erik Smith,Mobile Development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,PYTHON,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web Development,Advanced,19/09/2023


In [18]:
# Problem 1: Inconsistent formatting in "Course" & "CourseLevel" columns - (some of them are capitalized & some are all in uppercase)

# to format strings, you can use the 'str' accessor in pandas

# .str.capitalize() is applied to every row of the column at once (vectorized)
# note: it uppercases the first character and lowercases the rest, so "Web Development" becomes "Web development"
df['Course'] = df['Course'].str.capitalize()
df['CourseLevel'] = df['CourseLevel'].str.capitalize()

In [19]:
df

Unnamed: 0,Name,Course,CourseLevel,FormattedDate
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web development,Advanced,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,14/09/2023
6,Erik Smith,Mobile development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web development,Advanced,19/09/2023
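Note that `str.capitalize()` lowercases everything after the first character, which is why `Web Development` became `Web development` above. If title case is preferred, `str.title()` is one option; a minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical course names with mixed casing
courses = pd.Series(["PYTHON", "Web Development", "AI & Machine Learning"])

# capitalize(): first character upper, everything else lower
capitalized = courses.str.capitalize().tolist()
print(capitalized)

# title(): first character of every word upper, the rest lower
titled = courses.str.title().tolist()
print(titled)
```

Neither method preserves acronyms such as `AI`; keeping those would need a manual mapping.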


In [20]:
# we are done with basic data analysis (formatting the data!)

## 4. Registering Fonts



---

In [21]:
fonts_path = "/content/drive/MyDrive/1.1 Python Certificate Generator Project/fonts"

# Defining the path to the fonts folder (it is like storing a string inside 'fonts_path' variable)

In [22]:
# registering 2 fonts for this project

# pdfmetrics is one of the modules we imported (it is used to register fonts so that we can use them in PDF files)

# one of the commands in pdfmetrics is --> registerFont()

# TTFont() needs 2 inputs
# 1st input --> the name we will use to refer to the font
# 2nd input --> the path to the font's .ttf file

pdfmetrics.registerFont(TTFont('Lora-Bold' , fonts_path + "/Lora-Bold.ttf"))
pdfmetrics.registerFont(TTFont('Lora-Regular' , fonts_path + "/Lora-Regular.ttf"))

In [23]:
# after registering , we can use these fonts in the project later

## 5. Creating Certificate Generator Logic Function



---



### Version 1 of the Function

In [24]:
df

Unnamed: 0,Name,Course,CourseLevel,FormattedDate
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web development,Advanced,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,14/09/2023
6,Erik Smith,Mobile development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web development,Advanced,19/09/2023


In [25]:
# this function will contain the logic for generating one certificate.

# this function needs 4 input parameters (name, courseName, courseLevel, date)
# we are going to REUSE the FUNCTION 11 times (since there are 11 students) with a for loop

# for each student, this function will receive their name, courseName, courseLevel, date

def certificate_generator(name,courseName,courseLevel,date):

  pdf_file_name = "/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificates/" + name + "-" + courseName + "-" + courseLevel + ".pdf"

  # canvas = blank page (A4 landscape size)
  # the canvas is saved as a .pdf file (at the file path --> "pdf_file_name")

  # creating a Canvas object from reportlab and giving it the full path of the PDF inside our certificates folder
  # we are storing it inside a variable called 'c'

  # basically here, we have a blank landscape A4 virtual page in our .pdf file
  # Canvas() needs 2 inputs
  # 1st input - where are you storing it? (folder path + file name)
  # 2nd input - what size?
  c = canvas.Canvas(pdf_file_name, pagesize = landscape(A4))

  # the canvas' drawImage() needs 5 inputs
  # 1st input --> image --> what are you drawing on the empty canvas? --> certificate_template.jpg
  # 2nd input --> x coordinate of the image's bottom-left corner (0)
  # 3rd input --> y coordinate of the image's bottom-left corner (0)
  # 4th input --> width --> A4[1] (using the A4 variable we imported)
  # 5th input --> height --> A4[0] (using the A4 variable we imported)

  # A4 is a portrait (width, height) tuple, so for a landscape page:
  # A4[1] --> the long side --> landscape width
  # A4[0] --> the short side --> landscape height

  # "I want the image to fill the whole A4 landscape page"
  c.drawImage("/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificate_template.jpg" , 0 ,0, width = A4[1] , height = A4[0])

  # up to this point, we have an A4 landscape canvas with the certificate template drawn on it
  c.save() # save the canvas
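ReportLab measures pages in points (1 inch = 72 points), and `A4` is a portrait `(width, height)` tuple, so index 1 is the long side. The arithmetic can be sketched without ReportLab; the names `MM`, `A4_PORTRAIT` and `landscape_size` below are hypothetical stand-ins, not the library's own code:

```python
POINTS_PER_INCH = 72.0
MM = POINTS_PER_INCH / 25.4  # one millimetre expressed in points

# A4 paper is 210 mm x 297 mm (portrait: short side first)
A4_PORTRAIT = (210 * MM, 297 * MM)

def landscape_size(pagesize):
    """Return the page size with the long side as the width."""
    short_side, long_side = sorted(pagesize)
    return (long_side, short_side)

width, height = landscape_size(A4_PORTRAIT)
print(round(width, 2), round(height, 2))  # 841.89 595.28
```

This is why the template image is drawn with `width = A4[1]` and `height = A4[0]` on a landscape canvas.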


In [26]:
certificate_generator("Thivya" , "Python" , "Beginner" , "30/06/2024")
certificate_generator("ALi" , "Python" , "Masters" , "30/06/2024")



---

### Version 2 of the Function

In [27]:
def certificate_generator(name,courseName,courseLevel,date):

  pdf_file_name = "/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificates/" + name + "-" + courseName + "-" + courseLevel + ".pdf"

  c = canvas.Canvas(pdf_file_name, pagesize = landscape(A4))

  c.drawImage("/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificate_template.jpg" , 0 ,0, width = A4[1] , height = A4[0])
  # up to this point, we have an A4 landscape canvas with the certificate template drawn on it

  # Before placing the data (4 inputs --> name, courseName, courseLevel, date) on the certificate...

  # Let's calculate the center of the A4 landscape page

  print('---------------------------')
  # middle of the A4 page (x-axis) --> full width / 2
  # note: _pagesize is a private attribute of the canvas; it holds the same (width, height) tuple as landscape(A4)
  center_x  = c._pagesize[0] / 2

  # crosscheck by printing the value
  print("Full width of the A4 is : " , c._pagesize[0]) # Full width of the A4 canvas size
  print("Center of the A4 x-axis is : " , center_x) # center A4 canvas size (x-axis)

  print('---------------------------')


  center_y = c._pagesize[1] / 2

  # crosscheck by printing the value
  print("Full height of the A4 is : " , c._pagesize[1]) # Full height of the A4 canvas size
  print("Center of the A4 y-axis is : " , center_y) # center A4 canvas size (y-axis)

  print('---------------------------')

  c.save()

In [28]:
certificate_generator("Lee" , "Python" , "Beginner" , "30/06/2024")
certificate_generator("Jason" , "Python" , "Masters" , "30/06/2024")

---------------------------
Full width of the A4 is :  841.8897637795277
Center of the A4 x-axis is :  420.94488188976385
---------------------------
Full height of the A4 is :  595.2755905511812
Center of the A4 y-axis is :  297.6377952755906
---------------------------
---------------------------
Full width of the A4 is :  841.8897637795277
Center of the A4 x-axis is :  420.94488188976385
---------------------------
Full height of the A4 is :  595.2755905511812
Center of the A4 y-axis is :  297.6377952755906
---------------------------
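The center can also be computed from the page-size tuple itself instead of the private `c._pagesize` attribute. A minimal sketch, assuming the landscape A4 size printed above is hard-coded (in the notebook it would come from `landscape(A4)`); `page_center` is a hypothetical helper:

```python
# Landscape A4 in points, taken from the printed values above
PAGE_SIZE = (841.8897637795277, 595.2755905511812)

def page_center(pagesize):
    """Return the (x, y) midpoint of a (width, height) page size."""
    width, height = pagesize
    return width / 2, height / 2

center_x, center_y = page_center(PAGE_SIZE)
print(center_x, center_y)
```

Because ReportLab's origin is the bottom-left corner of the page, subtracting from `center_y` moves text down the page and adding to `center_x` moves it to the right.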




---
### Version 3 of the Function


In [30]:
def certificate_generator(name,courseName,courseLevel,date):

  pdf_file_name = "/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificates/" + name + "-" + courseName + "-" + courseLevel + ".pdf"

  c = canvas.Canvas(pdf_file_name, pagesize = landscape(A4))

  c.drawImage("/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificate_template.jpg" , 0 ,0, width = A4[1] , height = A4[0])

  center_x  = c._pagesize[0] / 2
  center_y = c._pagesize[1] / 2

  # now that we got the centre of x , y axis..
  # let's now set the font and draw the text

  #---------------------------------------
  # .setFont() --> needs 2 inputs (font name, font size)

  # 1. Name
  c.setFont('Lora-Bold' , 30)
  c.drawCentredString(center_x , center_y - 46, name)

  # 2. CourseName & CourseLevel
  c.setFont('Lora-Bold' , 28)
  c.drawCentredString(center_x, center_y - 105 , courseName + " - " + courseLevel)

  # 3. Date
  c.setFont('Lora-Bold' , 17)
  c.drawCentredString(center_x + 190 , center_y - 160, date)

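  # 4. Cert ID
  cert_id = "Cert ID: " + str(int(pd.Timestamp.now().timestamp())) # Generating an id for the certificate based on the current time (Unix timestamp in whole seconds, so it is unique only if no two certificates are created in the same second)
  c.setFont('Lora-Regular' , 12)
  # the id is only prepared here; it is not drawn on the certificate in this version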

  c.save()

In [31]:
str(int(pd.Timestamp.now().timestamp()))

'1720956469'

In [32]:
print(pd.Timestamp.now())
print(pd.Timestamp.now().timestamp()) # Unix timestamp (number of seconds since January 1 1970 )
print(int(pd.Timestamp.now().timestamp())) # removing the floating point

2024-07-14 11:27:49.526356
1720956469.526599
1720956469
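A timestamp truncated to whole seconds repeats whenever two certificates are generated within the same second, which is likely inside a fast loop. A minimal sketch of one alternative, assuming a random suffix from the standard `uuid` module is acceptable (`make_cert_id` is a hypothetical helper, not part of the notebook):

```python
import time
import uuid

def make_cert_id():
    """Build an ID from the current Unix time plus a short random suffix."""
    seconds = int(time.time())
    suffix = uuid.uuid4().hex[:8]  # 8 random hex characters
    return f"Cert ID: {seconds}-{suffix}"

# Generate several IDs back to back, as a loop over students would
ids = [make_cert_id() for _ in range(5)]
print(ids)
```

The timestamp part keeps the IDs roughly sortable by creation time, while the suffix separates certificates created in the same second.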


In [33]:
certificate_generator("Abu" , "Python" , "Masters" , "30/06/2024")

### Version 4 of the Function

In [34]:
def certificate_generator(name,courseName,courseLevel,date):

  pdf_file_name = "/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificates/" + name + "-" + courseName + "-" + courseLevel + ".pdf"

  c = canvas.Canvas(pdf_file_name, pagesize = landscape(A4))

  c.drawImage("/content/drive/MyDrive/1.1 Python Certificate Generator Project/certificate_template.jpg" , 0 ,0, width = A4[1] , height = A4[0])

  center_x  = c._pagesize[0] / 2
  center_y = c._pagesize[1] / 2

  # now that we got the centre of x , y axis..
  # let's now set the font and draw the text

  #---------------------------------------
  # .setFont() --> needs 2 inputs (font name, font size)

  # 1. Name
  c.setFont('Lora-Bold' , 30)
  c.drawCentredString(center_x , center_y - 46, name)

  # 2. CourseName & CourseLevel
  c.setFont('Lora-Bold' , 28)
  c.drawCentredString(center_x, center_y - 105 , courseName + " - " + courseLevel)

  # 3. Date
  c.setFont('Lora-Bold' , 17)
  c.drawCentredString(center_x + 190 , center_y - 160, date)

  # 4. Cert ID
  cert_id = "Cert ID: " + str(int(pd.Timestamp.now().timestamp())) # Generating an id for the certificate based on the current time (Unix timestamp in whole seconds, so it is unique only if no two certificates are created in the same second)
  c.setFont('Lora-Regular' , 12)
  c.drawCentredString(center_x + 266 , center_y - 230 , cert_id.upper())

  c.save()
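The output path is built by joining raw cell values, so a name containing `/` would be read as a folder separator. A minimal sketch of a guard, assuming a hypothetical `certificate_path` helper and a placeholder output folder:

```python
import re
from pathlib import Path

def certificate_path(folder, name, course, level):
    """Join the fields into one file name, replacing unsafe characters."""
    raw = f"{name}-{course}-{level}"
    # Keep letters, digits, spaces, '&', '.', '_' and '-'; replace the rest
    safe = re.sub(r"[^\w &.\-]", "_", raw)
    return Path(folder) / f"{safe}.pdf"

path = certificate_path("certificates", "A/B Tester", "Ai & machine learning", "Beginner")
print(path.name)  # A_B Tester-Ai & machine learning-Beginner.pdf
```

The set of allowed characters is a judgment call; the point is only that the file name is cleaned before it reaches `canvas.Canvas()`.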

In [35]:
# instead of using the function manually like we did before, we are going to loop through all the rows in the DATAFRAME
df

Unnamed: 0,Name,Course,CourseLevel,FormattedDate
0,Christy Cunningham,Python,Beginner,10/09/2023
1,Douglas Tucker,Python,Master,11/09/2023
2,Travis Walters,Java,Intermediate,12/09/2023
3,Nathaniel Harris,Web development,Advanced,13/09/2023
5,Tonya Carter,Ai & machine learning,Beginner,14/09/2023
6,Erik Smith,Mobile development,Beginner,15/09/2023
7,Kristopher Johnson,Python,Beginner,16/09/2023
9,Robert Buck,Python,Master,17/09/2023
10,Joseph Mcdonald,Java,Intermediate,18/09/2023
11,Jerome Abbott,Web development,Advanced,19/09/2023


In [36]:
# there are 11 students, so the idea is to USE THE FUNCTION 11 TIMES (with 4 different inputs --> name, courseName, courseLevel, date)

# use a loop --> loop 11 times --> call the function 11 times, each time with that student's 4 inputs

for x in ["Ali","Abu","Kumar","Chong"]:
  print(x)

Ali
Abu
Kumar
Chong


In [37]:
# df.iterrows() gives you something you can loop over (an iterator)
# each step gives you 2 items (index, row), where row is a pandas Series

for index , row in df.iterrows():
  print(index)

0
1
2
3
5
6
7
9
10
11
12
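The indices skip 4 and 8 because `dropna()` keeps the original row labels. If consecutive numbering is wanted, `reset_index(drop=True)` is one option; a minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame whose middle row is dropped
sample = pd.DataFrame({"Name": ["Ann", None, "Cy"]}).dropna()

# Labels still reflect the original positions
original_labels = [index for index, row in sample.iterrows()]
print(original_labels)

# drop=True discards the old labels instead of keeping them as a column
renumbered = sample.reset_index(drop=True)
new_labels = [index for index, row in renumbered.iterrows()]
print(new_labels)
```

The gaps are harmless for the certificate loop, since only the row contents are used.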


In [39]:
for index , row in df.iterrows():
  print(row)
  print('---------------------------------------')
  print()

Name             Christy Cunningham
Course                       Python
CourseLevel                Beginner
FormattedDate            10/09/2023
Name: 0, dtype: object
---------------------------------------

Name             Douglas Tucker
Course                   Python
CourseLevel             Master 
FormattedDate        11/09/2023
Name: 1, dtype: object
---------------------------------------

Name             Travis Walters
Course                     Java
CourseLevel        Intermediate
FormattedDate        12/09/2023
Name: 2, dtype: object
---------------------------------------

Name             Nathaniel Harris
Course            Web development
CourseLevel              Advanced
FormattedDate          13/09/2023
Name: 3, dtype: object
---------------------------------------

Name                      Tonya Carter
Course           Ai & machine learning
CourseLevel                   Beginner
FormattedDate               14/09/2023
Name: 5, dtype: object
-----------------------------

In [40]:
for index , row in df.iterrows():
  certificate_generator(row['Name'] , row['Course'] , row['CourseLevel'], row['FormattedDate'])

print(str(len(df)) + " certificates generated successfully! ")

11 certificates generated successfully! 
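`iterrows()` builds a Series for each row; `itertuples()` is a lighter alternative that yields named tuples. A minimal sketch, assuming a stand-in `fake_generator` in place of `certificate_generator` so that it runs without ReportLab:

```python
import pandas as pd

# Hypothetical two-row sample with the same columns as the cleaned df
sample = pd.DataFrame({
    "Name": ["Christy Cunningham", "Douglas Tucker"],
    "Course": ["Python", "Python"],
    "CourseLevel": ["Beginner", "Master"],
    "FormattedDate": ["10/09/2023", "11/09/2023"],
})

generated = []

def fake_generator(name, courseName, courseLevel, date):
    """Stand-in that records the call instead of writing a PDF."""
    generated.append(f"{name}-{courseName}-{courseLevel}.pdf")

# index=False leaves out the row label; columns become attributes
for row in sample.itertuples(index=False):
    fake_generator(row.Name, row.Course, row.CourseLevel, row.FormattedDate)

print(f"{len(generated)} certificates generated successfully!")
```

For 11 rows the speed difference is negligible, so this is mainly a readability choice.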
