```pip install PyPDF2 nltk transformers torch```

ML pipeline that extracts the text from a PDF and runs sentiment analysis on it

In [12]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", device=0)

result = classifier("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}") 


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


label: POSITIVE, with score: 0.9999
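
The warning above recommends naming the model and revision explicitly. A minimal sketch of how that could look, using the model id and revision printed in the warning; `pipeline_kwargs` and `build_classifier` are hypothetical helper names introduced here, and the short revision hash is assumed to resolve the same way it does for the default:

```python
# Model id and revision copied from the warning printed by the cell above.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def pipeline_kwargs(device=-1):
    """Keyword arguments for transformers.pipeline.

    device=-1 selects the CPU, device=0 the first GPU.
    """
    return {
        "task": "sentiment-analysis",
        "model": MODEL_NAME,
        "revision": MODEL_REVISION,
        "device": device,
    }


def build_classifier(device=-1):
    """Hypothetical helper: create the pinned pipeline (downloads the model on first use)."""
    # Imported lazily so the configuration can be inspected without loading transformers.
    from transformers import pipeline
    return pipeline(**pipeline_kwargs(device))
```

Calling `build_classifier(device=0)` should give a classifier that behaves like the one in the cell above, without the "No model was supplied" warning.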


In [13]:
classifier('I hate you!')

[{'label': 'NEGATIVE', 'score': 0.9987472295761108}]

In [14]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/dman/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [16]:
import PyPDF2
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from transformers import pipeline

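def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, concatenated into one string."""
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ''
        for page in reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            text += page.extract_text() or ''
    return text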

def preprocess_text(text):
    # Collapse runs of whitespace (including newlines) into single spaces
    text = ' '.join(text.split())
    return text

def analyze_sentiment(text):
    # Using NLTK's VADER for sentiment analysis
    sia = SentimentIntensityAnalyzer()
    sentiment_scores = sia.polarity_scores(text)
    
    # Second opinion from a Hugging Face Transformers classifier.
    # Only the first 512 characters are scored; the rest of the text is ignored.
    classifier = pipeline("sentiment-analysis")
    result = classifier(text[:512])
    
    return sentiment_scores, result[0]

def interpret_results(vader_scores, hf_result):
    print(f"VADER Sentiment Scores: {vader_scores}")
    print(f"Hugging Face Sentiment: {hf_result}")
    
    # Additional interpretation
    compound_score = vader_scores['compound']
    if compound_score >= 0.05:
        print("VADER interpretation: Positive sentiment")
    elif compound_score <= -0.05:
        print("VADER interpretation: Negative sentiment")
    else:
        print("VADER interpretation: Neutral sentiment")
    
    print(f"Hugging Face interpretation: {hf_result['label']} with {hf_result['score']:.2%} confidence")

def main(pdf_path):
    print(f"Analyzing resume: {pdf_path}")
    text = extract_text_from_pdf(pdf_path)
    preprocessed_text = preprocess_text(text)
    vader_scores, hf_result = analyze_sentiment(preprocessed_text)
    interpret_results(vader_scores, hf_result)

if __name__ == "__main__":
    pdf_path = "./resume.pdf"  # Replace with the actual path to your resume
    main(pdf_path)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Analyzing resume: ./resume.pdf
VADER Sentiment Scores: {'neg': 0.029, 'neu': 0.906, 'pos': 0.065, 'compound': 0.7506}
Hugging Face Sentiment: {'label': 'POSITIVE', 'score': 0.8009517788887024}
VADER interpretation: Positive sentiment
Hugging Face interpretation: POSITIVE with 80.10% confidence
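
The Hugging Face result above covers only the first 512 characters of the resume, because `analyze_sentiment` slices the text before classifying it. One possible way to cover the whole document is to split it into chunks, classify each chunk, and combine the results. The sketch below is an assumption about how that could be done, not part of the original notebook: `split_into_chunks`, `signed_score`, and `aggregate` are hypothetical helpers, and the length-weighted average is just one reasonable way to combine chunk scores.

```python
def split_into_chunks(text, max_chars=512):
    """Split text into pieces of at most max_chars characters, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def signed_score(result):
    """Map a {'label', 'score'} dict to a value in [-1, 1] (negative for NEGATIVE)."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]


def aggregate(results, chunks):
    """Length-weighted mean of the signed chunk scores."""
    total = sum(len(chunk) for chunk in chunks)
    weighted = sum(signed_score(r) * len(c) for r, c in zip(results, chunks))
    return weighted / total
```

With the classifier from this notebook, usage would be along the lines of `chunks = split_into_chunks(preprocessed_text)`, then `results = classifier(chunks)`, then `aggregate(results, chunks)`. A value above zero leans positive and a value below zero leans negative.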
