# Project Title: Resume Analysis and Credibility Evaluation Using NLP

## Project Description

This project aims to develop a Python-based tool that processes resumes and uses Natural Language Processing (NLP) to extract and evaluate key information. The tool uses `PyMuPDF` for PDF processing, `python-docx` for Word documents, and `pandas` (with `openpyxl`) for the Excel reference lists. Its core functionality is as follows:

### Installation of Necessary Libraries:

- Install the required libraries: `PyMuPDF` for PDF text extraction, `python-docx` for Word document text extraction, and `pandas` with `openpyxl` for reading the Excel file.

### File Reading and Text Extraction:

- Read the Excel workbook containing the lists of universities and companies (one sheet each).
- Extract text from resumes provided in PDF format using `PyMuPDF`.
- Extract text from resumes provided in Word (.docx) format using `python-docx`.

### Named Entity Extraction:

- Identify and extract named entities such as universities and companies by matching the resume text against the names listed in the Excel workbook.

### Resume Credibility Evaluation:

- Implement a scoring system to assess the credibility of the resume based on the identified universities and companies.
- Utilize the university rankings from the Excel file to assign points:
  - Top-ranked university (1-10): +3 points
  - Medium-ranked university (11-30): +2 points
  - Lower-ranked university (31+): +1 point
- Utilize the company status (Fortune 500) from the Excel file to assign points:
  - Fortune 500 company: +2 points
  - Recognized company (not Fortune 500): +1 point
- Limit the total credibility score to a maximum of 10 points.
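
As a worked illustration of the rules above, the sketch below encodes the point bands as plain functions. The names `university_points`, `company_points`, and `total_score` are illustrative only and not part of the project code. It assumes that rankings are positive integers and that Fortune 500 status is available as a boolean.

```python
MAX_SCORE = 10  # cap stated in the scoring rules


def university_points(ranking: int) -> int:
    """Points for one university, by ranking band (1-10, 11-30, 31+)."""
    if ranking <= 10:
        return 3
    if ranking <= 30:
        return 2
    return 1


def company_points(is_fortune_500: bool) -> int:
    """Points for one company: 2 if Fortune 500, otherwise 1."""
    return 2 if is_fortune_500 else 1


def total_score(rankings, fortune_flags) -> int:
    """Sum the points for all universities and companies, capped at MAX_SCORE."""
    raw = sum(university_points(r) for r in rankings)
    raw += sum(company_points(f) for f in fortune_flags)
    return min(raw, MAX_SCORE)


# Universities ranked 4th and 25th, one Fortune 500 employer and one other:
print(total_score([4, 25], [True, False]))  # 3 + 2 + 2 + 1 = 8
```

Under these rules, three top-10 universities and one Fortune 500 company would add up to 11 points, and the cap brings the result down to 10.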

### Output the Results:

- Display the extracted universities and companies from the resume.
- Print the credibility score along with a line-by-line explanation of the points awarded.


In [24]:
!pip install PyMuPDF python-docx pandas openpyxl



In [25]:
import fitz  # PyMuPDF

# Extract text from every page of the PDF resume
file_path_pdf = '/content/AliKhanCV.pdf'
with fitz.open(file_path_pdf) as pdf_document:
    pdf_text = "".join(page.get_text() for page in pdf_document)

# Display the extracted text from the PDF
print(pdf_text)

Ali Khan 
Email: alikhan@example.com 
Phone: +92-333-1122334 
LinkedIn: linkedin.com/in/alikhan 
Location: Rawalpindi, Pakistan 
 
Professional Summary: 
Dedicated and knowledgeable chemical engineer with over 7 years of experience in the 
chemicals industry. Expertise in process optimization, safety protocols, and project 
management. Committed to implementing innovative solutions to improve efficiency and 
ensure compliance with industry standards. 
 
Education: 
Master of Science in Chemical Engineering 
University of Engineering and Technology, Lahore 
Graduation Year: 2015 
Bachelor of Science in Chemical Engineering 
National University of Sciences and Technology (NUST)                                                     
Graduation Year: 2013 
 
Professional Experience: 
Senior Chemical Engineer 
Fauji Fertilizer Company 
Location: Rawalpindi, Pakistan 
Duration: August 2017 - Present 
• 
Led process optimization projects resulting in a 15% increase in production 
efficiency. 
•
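
The description also lists Word (.docx) resumes, which the cell above does not exercise. A minimal sketch of how both formats might be handled behind one function follows. The function name `extract_resume_text` is hypothetical. The sketch assumes `PyMuPDF` and `python-docx` are installed, and it reads only the body paragraphs of a Word document, so text inside tables, headers, and footers is skipped.

```python
from pathlib import Path


def extract_resume_text(path: str) -> str:
    """Return the plain text of a .pdf or .docx resume (hypothetical helper)."""
    suffix = Path(path).suffix.lower()

    if suffix == ".pdf":
        # Imported lazily so that only the library actually needed is required.
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "".join(page.get_text() for page in doc)

    if suffix == ".docx":
        from docx import Document  # python-docx
        document = Document(path)
        # Only body paragraphs are read; tables, headers, and footers are ignored.
        return "\n".join(p.text for p in document.paragraphs)

    raise ValueError(f"Unsupported resume format: {suffix or 'no extension'}")
```

With such a helper, the PDF cell could be reduced to `pdf_text = extract_resume_text('/content/AliKhanCV.pdf')`, and a `.docx` resume would go through the same call.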

In [26]:
import pandas as pd

# Load both sheets of the reference workbook: known companies and ranked universities
file_path_excel = '/content/pakistan_companies_universities (1).xlsx'
df_companies = pd.read_excel(file_path_excel, sheet_name='Companies')
df_universities = pd.read_excel(file_path_excel, sheet_name='Universities')

In [27]:
import re

# List of universities and companies from Excel sheets
universities = df_universities['University'].dropna().tolist()
companies = df_companies['Company'].dropna().tolist()

# Extract universities and companies from resume text
extracted_universities = [uni for uni in universities if re.search(r'\b' + re.escape(uni) + r'\b', pdf_text)]
extracted_companies = [comp for comp in companies if re.search(r'\b' + re.escape(comp) + r'\b', pdf_text)]

# Display the extracted entities
print("Extracted Universities:", extracted_universities)
print("Extracted Companies:", extracted_companies)

Extracted Universities: ['University of Engineering and Technology, Lahore', 'University of Engineering and Technology, Lahore']
Extracted Companies: ['Engro Corporation', 'Fauji Fertilizer Company']
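
Two details of the matching above are worth noting. The same university appears twice in the output, which means the `University` column lists that name more than once, since the comprehension emits one entry per list element. Also, `\b` only matches between a word character and a non-word character, so a name that ends in punctuation, such as a closing parenthesis, cannot satisfy the trailing `\b` when whitespace follows it. The sketch below is one possible way to handle both, assuming that case-insensitive matching and collapsed whitespace are acceptable. `find_known_names` is an illustrative name, not part of the notebook, and the sample names are made up.

```python
import re


def find_known_names(text, names):
    """Return each known name found in text, once, in list order."""
    # Collapse runs of whitespace so names wrapped across PDF lines still match.
    normalized = re.sub(r"\s+", " ", text)
    found = []
    for name in dict.fromkeys(names):  # drops duplicate names, keeps order
        needle = re.sub(r"\s+", " ", name.strip())
        # Lookarounds instead of \b: they also work when the name starts or
        # ends with punctuation such as "(" or ")".
        pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
        if re.search(pattern, normalized, flags=re.IGNORECASE):
            found.append(name)
    return found


# Made-up sample data for illustration only
sample = "Bachelor of Science\nExample Institute of Technology (EIT)\nGraduation Year: 2013"
known = [
    "Example Institute of Technology (EIT)",
    "Example Institute of Technology (EIT)",  # duplicate row, as in the sheet
    "Other College",
]
print(find_known_names(sample, known))  # ['Example Institute of Technology (EIT)']
```

Whether the trailing `\b` explains any name missing from the output above depends on how that name is spelled in the workbook, which is not shown here.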


In [30]:
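def evaluate_credibility(extracted_universities, extracted_companies):
    """Score a resume from its matched universities and companies.

    Looks up rankings and Fortune 500 status in the global df_universities
    and df_companies frames. Returns (score capped at 10, list of explanations).
    """
    score = 0
    explanation = []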

    # Assign points for universities based on ranking
    for uni in extracted_universities:
        if uni in df_universities['University'].values:
            ranking = df_universities.loc[df_universities['University'] == uni, 'Ranking'].values[0]
            if ranking <= 10:
                score += 3  # Top-ranked university
                explanation.append(f"Top-ranked University (1-10): {uni} (+3 points)")
            elif ranking <= 30:
                score += 2  # Medium-ranked university
                explanation.append(f"Medium-ranked University (11-30): {uni} (+2 points)")
            else:
                score += 1  # Lower-ranked university
                explanation.append(f"Lower-ranked University (31+): {uni} (+1 point)")

    # Assign points for companies based on Fortune 500 status
    for comp in extracted_companies:
        if comp in df_companies['Company'].values:
            is_fortune_500 = df_companies.loc[df_companies['Company'] == comp, 'Fortune 500'].values[0]
            if is_fortune_500 == 'Yes':
                score += 2  # Fortune 500 company
                explanation.append(f"Fortune 500 Company: {comp} (+2 points)")
            else:
                score += 1  # Recognized company but not Fortune 500
                explanation.append(f"Recognized Company (not Fortune 500): {comp} (+1 point)")

    # Limit the score to a maximum of 10
    score = min(score, 10)
    return score, explanation

credibility_score, credibility_explanation = evaluate_credibility(extracted_universities, extracted_companies)

In [31]:
# Print the extracted entities
print("Extracted Universities:", extracted_universities)
print("Extracted Companies:", extracted_companies)

# Print the credibility score and explanation
print(f"Credibility Score: {credibility_score} / 10")
print("Explanation of Rating:")
for item in credibility_explanation:
    print(item)

Extracted Universities: ['University of Engineering and Technology, Lahore', 'University of Engineering and Technology, Lahore']
Extracted Companies: ['Engro Corporation', 'Fauji Fertilizer Company']
Credibility Score: 8 / 10
Explanation of Rating:
Top-ranked University (1-10): University of Engineering and Technology, Lahore (+3 points)
Top-ranked University (1-10): University of Engineering and Technology, Lahore (+3 points)
Recognized Company (not Fortune 500): Engro Corporation (+1 point)
Recognized Company (not Fortune 500): Fauji Fertilizer Company (+1 point)
