Project ID - #CC3606

Project Title - Personality Prediction System via CV Analysis

Internship Domain - Artificial Intelligence Intern

Project Level - Golden Level

Assigned By - CodeClause Internship

Assigned To - Lakshmipriya K

Start Date - 01 Oct 2024

End Date - 31 Oct 2024

Project Details-

Aim -
Develop an AI-driven system that predicts an individual's personality traits by
analyzing their Curriculum Vitae (CV) or resume.

Description-
This project aims to create a sophisticated personality prediction system that utilizes
natural language processing (NLP) and machine learning techniques to analyze the
textual content of CVs. The system will extract relevant information from resumes,
such as educational background, work experience, skills, and achievements. By
employing sentiment analysis, linguistic pattern recognition, and personality trait
models, the system will predict personality characteristics like extroversion,
conscientiousness, openness, agreeableness, and neuroticism.
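
The extraction step described above could begin by splitting the resume text into its headed sections. The following is a minimal sketch rather than the notebook's method: the heading set `SECTION_HEADINGS` and the helper `split_sections` are hypothetical, and the sketch assumes each heading sits on its own line.

```python
# Hypothetical heading names; a real CV may use different ones.
SECTION_HEADINGS = {"professional summary", "work experience", "education", "skills"}

def split_sections(text):
    """Group non-empty resume lines under the most recent recognized heading."""
    sections = {"header": []}  # lines that appear before the first heading
    current = "header"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        key = stripped.lower()
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        else:
            sections[current].append(stripped)
    return sections

sample = "John Doe\nProfessional Summary\nExperienced engineer.\nWork Experience\nSoftware Engineer"
sections = split_sections(sample)
print(sections["work experience"])  # ['Software Engineer']
```

Scoring each section separately would let a later step weight, for example, work experience differently from the contact header.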

Technologies-
Python, OpenCV, Deep reinforcement learning frameworks
You can use other technologies that you know.

What You Learn-
Computer vision, reinforcement learning.


In [28]:
!pip install nltk pandas scikit-learn python-docx




In [43]:
from google.colab import files
uploaded = files.upload()


Saving resume.docx to resume.docx


In [45]:
from docx import Document

def read_docx(file_path):
    doc = Document(file_path)
    text = "\n".join([para.text for para in doc.paragraphs])
    return text

resume_text = read_docx('resume.docx')


In [46]:
import nltk
from nltk.tokenize import word_tokenize
import string

nltk.download('punkt')

def preprocess_text(text):
    # Remove punctuation and convert to lowercase
    text = text.translate(str.maketrans('', '', string.punctuation)).lower()
    # Tokenize the text
    tokens = word_tokenize(text)
    return tokens

tokens = preprocess_text(resume_text)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
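
Stripping all punctuation before tokenizing fuses hyphenated words ("problem-solving" becomes "problemsolving") and e-mail addresses into single tokens. A possible alternative, sketched here with only the standard library, keeps internal hyphens. The name `tokenize_keep_hyphens` and the regular expression are assumptions for illustration and are not part of the notebook.

```python
import re

# Runs of lowercase letters/digits, optionally joined by single internal hyphens.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def tokenize_keep_hyphens(text):
    """Lowercase the text and return word tokens, keeping hyphenated words whole."""
    return TOKEN_PATTERN.findall(text.lower())

print(tokenize_keep_hyphens("Strong problem-solving skills; cross-functional teams."))
# ['strong', 'problem-solving', 'skills', 'cross-functional', 'teams']
```

With this tokenizer, hyphenated keywords such as "detail-oriented" could match as written, while an address like john.doe@email.com is split into its parts instead of being fused.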


In [47]:
print(resume_text)


John Doe
Phone: +123456789
Email: john.doe@email.com
LinkedIn: linkedin.com/in/johndoe

Professional Summary
Experienced software engineer with over 5 years of experience in Python development and machine learning. Proven ability to lead projects, collaborate with cross-functional teams, and deliver robust technical solutions. Strong problem-solving skills and a passion for continuous learning.
Work Experience
Software Engineer
ABC Tech Solutions | Jan 2020 - Present
- Led the development of scalable applications using Python and Flask.
- Implemented machine learning models for predictive analytics.
- Collaborated with data scientists to enhance algorithm performance by 15%.
- Improved API response time by 25% by optimizing backend code.

Junior Developer
XYZ Innovations | Jan 2018 - Dec 2019
- Developed data processing scripts using Python and Pandas.
- Assisted in the design and development of web applications using Django.
- Contributed to team-wide code reviews and bug-fixing effor

In [48]:
tokens = preprocess_text(resume_text)
print(tokens)

['john', 'doe', 'phone', '123456789', 'email', 'johndoeemailcom', 'linkedin', 'linkedincominjohndoe', 'professional', 'summary', 'experienced', 'software', 'engineer', 'with', 'over', '5', 'years', 'of', 'experience', 'in', 'python', 'development', 'and', 'machine', 'learning', 'proven', 'ability', 'to', 'lead', 'projects', 'collaborate', 'with', 'crossfunctional', 'teams', 'and', 'deliver', 'robust', 'technical', 'solutions', 'strong', 'problemsolving', 'skills', 'and', 'a', 'passion', 'for', 'continuous', 'learning', 'work', 'experience', 'software', 'engineer', 'abc', 'tech', 'solutions', 'jan', '2020', 'present', 'led', 'the', 'development', 'of', 'scalable', 'applications', 'using', 'python', 'and', 'flask', 'implemented', 'machine', 'learning', 'models', 'for', 'predictive', 'analytics', 'collaborated', 'with', 'data', 'scientists', 'to', 'enhance', 'algorithm', 'performance', 'by', '15', 'improved', 'api', 'response', 'time', 'by', '25', 'by', 'optimizing', 'backend', 'code', 'j

In [52]:
# Updated keywords for personality traits based on resume context.
# Note: preprocess_text() strips all punctuation, so the hyphenated entries
# ("detail-oriented", "problem-solving") can never match a token as written.
keywords = {
    "extroversion": ["outgoing", "sociable", "talkative", "friendly", "collaborate", "lead"],
    "conscientiousness": ["organized", "dependable", "responsible", "detail-oriented", "lead", "strong", "skills"],
    "openness": ["creative", "imaginative", "curious", "adventurous", "learning"],
    "agreeableness": ["helpful", "cooperative", "kind", "trusting", "collaborated"],
    "neuroticism": ["anxious", "nervous", "sensitive", "problem-solving"]
}

def extract_features(tokens):
    personality_scores = {trait: 0 for trait in keywords.keys()}
    for token in tokens:
        for trait, words in keywords.items():
            if token in words:
                personality_scores[trait] += 1
    print("Personality Scores:", personality_scores)  # Debug: Print scores during extraction
    return personality_scores

# Assuming you have already extracted tokens from the resume
tokens = preprocess_text(resume_text)
print("Tokens:", tokens)  # Print tokens for verification

# Extract personality traits based on tokens
personality_scores = extract_features(tokens)

# Print the final scores
print("Final Personality Trait Scores:")
for trait, score in personality_scores.items():
    print(f"{trait.capitalize()}: {score}")


Tokens: ['john', 'doe', 'phone', '123456789', 'email', 'johndoeemailcom', 'linkedin', 'linkedincominjohndoe', 'professional', 'summary', 'experienced', 'software', 'engineer', 'with', 'over', '5', 'years', 'of', 'experience', 'in', 'python', 'development', 'and', 'machine', 'learning', 'proven', 'ability', 'to', 'lead', 'projects', 'collaborate', 'with', 'crossfunctional', 'teams', 'and', 'deliver', 'robust', 'technical', 'solutions', 'strong', 'problemsolving', 'skills', 'and', 'a', 'passion', 'for', 'continuous', 'learning', 'work', 'experience', 'software', 'engineer', 'abc', 'tech', 'solutions', 'jan', '2020', 'present', 'led', 'the', 'development', 'of', 'scalable', 'applications', 'using', 'python', 'and', 'flask', 'implemented', 'machine', 'learning', 'models', 'for', 'predictive', 'analytics', 'collaborated', 'with', 'data', 'scientists', 'to', 'enhance', 'algorithm', 'performance', 'by', '15', 'improved', 'api', 'response', 'time', 'by', '25', 'by', 'optimizing', 'backend', 'c
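
The keyword-matching loop can also be written with `collections.Counter`, which makes it easy to see which words produced each score. This is a sketch under assumptions: `score_traits`, `demo_keywords`, and `demo_tokens` are illustrative names, and the keyword lists are trimmed for brevity.

```python
from collections import Counter

def score_traits(tokens, keywords):
    """Return (scores, matches): per-trait counts and the matched words behind them."""
    counts = Counter(tokens)
    scores, matches = {}, {}
    for trait, words in keywords.items():
        # set(words) ignores duplicate keywords, matching the one-increment-per-token loop.
        hits = {w: counts[w] for w in set(words) if counts[w] > 0}
        matches[trait] = hits
        scores[trait] = sum(hits.values())
    return scores, matches

demo_keywords = {"extroversion": ["collaborate", "lead"], "openness": ["learning"]}
demo_tokens = ["lead", "projects", "collaborate", "machine", "learning", "continuous", "learning"]
scores, matches = score_traits(demo_tokens, demo_keywords)
print(scores)               # {'extroversion': 2, 'openness': 2}
print(matches["openness"])  # {'learning': 2}
```

Inspecting `matches` shows, for instance, that "learning" is counted even when it appears in "machine learning", which is a skill rather than a sign of openness.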

In [53]:
import numpy as np
from sklearn.linear_model import LogisticRegression

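# Sample training data: two made-up rows for illustration only, far too few
# to train a meaningful model. Columns are assumed to follow the order of the
# keywords dict: extroversion, conscientiousness, openness, agreeableness, neuroticism.
X_train = np.array([[1, 0, 2, 1, 0], [0, 1, 1, 0, 3]])  # Feature vectors
y_train = np.array([0, 1])  # Labels (0: Low, 1: High for a specific trait)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Use the model to predict; the input is a single row of trait scores
predicted_traits = model.predict(np.array([list(personality_scores.values())]))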
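
The prediction call relies on the dictionary's insertion order matching the column order of the training data. One way to make that order explicit is a fixed trait list. `TRAIT_ORDER` and `to_feature_vector` are hypothetical names introduced for this sketch.

```python
# Assumed column order for the feature vector (not stated in the notebook).
TRAIT_ORDER = ["extroversion", "conscientiousness", "openness", "agreeableness", "neuroticism"]

def to_feature_vector(scores):
    """Return the scores as a list in TRAIT_ORDER, using 0 for any missing trait."""
    return [scores.get(trait, 0) for trait in TRAIT_ORDER]

# The dict is deliberately out of order and lacks "neuroticism".
example_scores = {"openness": 4, "extroversion": 2, "conscientiousness": 4, "agreeableness": 1}
print(to_feature_vector(example_scores))  # [2, 4, 4, 1, 0]
```

The resulting list could then be passed to the model as `model.predict([to_feature_vector(personality_scores)])`.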


In [54]:
def print_personality_prediction(scores):
    """Print the raw keyword count for each trait (not the model's prediction)."""
    for trait, score in scores.items():
        print(f"{trait.capitalize()}: {score}")

print_personality_prediction(personality_scores)


Extroversion: 2
Conscientiousness: 4
Openness: 4
Agreeableness: 1
Neuroticism: 0
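
Raw counts like these grow with resume length, so they are hard to compare across candidates. A possible follow-up, sketched here as an assumption rather than as part of the notebook, converts them to each trait's share of all keyword matches. `to_shares` is a hypothetical helper, and the shares are only relative indicators, not a validated personality measure.

```python
def to_shares(scores):
    """Convert raw counts into fractions of the total; all zeros if nothing matched."""
    total = sum(scores.values())
    if total == 0:
        return {trait: 0.0 for trait in scores}
    return {trait: count / total for trait, count in scores.items()}

# Scores copied from the output above.
final_scores = {"extroversion": 2, "conscientiousness": 4, "openness": 4,
                "agreeableness": 1, "neuroticism": 0}
for trait, share in to_shares(final_scores).items():
    print(f"{trait.capitalize()}: {share:.0%}")
```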
