**Description:**
"Building a Resume Parser with ChatGPT" explores how artificial intelligence can streamline resume analysis and extraction. Using ChatGPT’s natural language processing (NLP) capabilities, a resume parser can identify key details such as name, contact information, skills, education, work experience, and certifications. Automating this step lets recruiters and HR professionals sort and filter candidates with less manual effort, and converting resumes into a structured format makes them easier to compare and to load into applicant tracking systems (ATS). The notebook below builds a baseline parser from spaCy named-entity recognition, regular expressions, and keyword lists, then exports the parsed fields as a CSV file.

## Installing required libraries

In [None]:
!pip install spacy python-docx PyPDF2
!python -m spacy download en_core_web_sm

Collecting python-docx
  Downloading python_docx-1.1.2-py3-none-any.whl.metadata (2.0 kB)
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading python_docx-1.1.2-py3-none-any.whl (244 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: python-docx, PyPDF2
Successfully installed PyPDF2-3.0.1 python-docx-1.1.2
Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can no

In [None]:
!pip install docx2txt

Collecting docx2txt
  Downloading docx2txt-0.9-py3-none-any.whl.metadata (529 bytes)
Downloading docx2txt-0.9-py3-none-any.whl (4.0 kB)
Installing collected packages: docx2txt
Successfully installed docx2txt-0.9


## Uploading ZIP file with resumes

In [None]:
from google.colab import files
uploaded = files.upload()

Saving resume_dataset.zip to resume_dataset (1).zip


## Unzip and List All Files

In [None]:
import zipfile
import os
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall("resumes")

# List all files in the extracted directory
resume_files = [os.path.join("resumes", f) for f in os.listdir("resumes") if f.endswith(('.pdf', '.docx'))]
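`os.listdir` only sees the top level of `resumes`, and the `endswith` check is case-sensitive, so an archive that wraps its files in a subfolder or names them `.PDF` would produce an empty list. A minimal sketch of a recursive, case-insensitive alternative follows; `find_resume_files` is a hypothetical helper name and is not part of the original notebook.

```python
import os


def find_resume_files(root, extensions=(".pdf", ".docx")):
    """Return sorted paths under root whose extension matches, ignoring case."""
    matches = []
    # os.walk descends into every subdirectory of root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

It could stand in for the list comprehension above as `resume_files = find_resume_files("resumes")`.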

## Define Text Extraction and Parsing Logic

In [None]:
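import PyPDF2
import docx2txt
import spacy
import re
import pandas as pd

# Small English pipeline; its NER component is used for name extraction.
nlp = spacy.load("en_core_web_sm")

def extract_text_from_file(filename):
    """Return the plain text of a .pdf or .docx file, or "" for other types."""
    if filename.endswith('.pdf'):
        text = ""
        with open(filename, 'rb') as f:
            reader = PyPDF2.PdfReader(f)
            for page in reader.pages:
                # Guard against pages that yield no text (e.g. scanned images).
                text += (page.extract_text() or "") + "\n"
        return text
    elif filename.endswith('.docx'):
        return docx2txt.process(filename)
    else:
        return ""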

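def extract_email(text):
    # Require a dotted domain so trailing punctuation and stray "@" tokens are not captured.
    email = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)
    return email[0] if email else None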

def extract_phone(text):
    phone = re.findall(r'\+?\d[\d -]{8,}\d', text)
    return phone[0] if phone else None

def extract_name(text):
    doc = nlp(text)
    for ent in doc.ents:
        if ent.label_ == "PERSON":
            return ent.text
    return None

def extract_skills(text):
    skills_list = ['python', 'java', 'sql', 'machine learning', 'excel', 'communication',
                   'project management', 'data analysis', 'c++', 'cloud', 'aws', 'linux']
    text = text.lower()
    skills_found = [skill for skill in skills_list if skill in text]
    return list(set(skills_found))

def extract_education(text):
    education_keywords = ['b.tech', 'bachelor', 'master', 'mba', 'phd', 'msc', 'bsc', 'm.tech']
    found = [line for line in text.lower().split('\n') if any(keyword in line for keyword in education_keywords)]
    return found

def extract_experience(text):
    exp_keywords = ['experience', 'internship', 'worked at', 'company']
    found = [line for line in text.lower().split('\n') if any(keyword in line for keyword in exp_keywords)]
    return found
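The substring test in `extract_skills` also fires inside longer words: `'java'` matches "javascript" and `'excel'` matches "excellent". One possible refinement, sketched here on the assumption that a skill should only count when it appears as a whole token, uses regex lookarounds. The name `match_skills` is hypothetical, and the skill list is passed in rather than hard-coded.

```python
import re


def match_skills(text, skills):
    """Return the skills that occur in text as whole tokens, in list order."""
    lowered = text.lower()
    found = []
    for skill in skills:
        # Lookarounds are used instead of \b so that 'c++' (which ends in
        # non-word characters) still matches, while 'java' inside
        # 'javascript' does not.
        pattern = r"(?<![a-z0-9+#])" + re.escape(skill) + r"(?![a-z0-9+#])"
        if re.search(pattern, lowered):
            found.append(skill)
    return found
```

Because the result follows the order of `skills`, repeated runs give the same list, which `list(set(...))` does not guarantee.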

## Parse All Files in Dataset and Save as CSV

In [None]:
import pandas as pd
parsed_resumes = []
for filepath in resume_files:
    try:
        text = extract_text_from_file(filepath)
        parsed_data = {
            "Filename": os.path.basename(filepath),
            "Name": extract_name(text),
            "Email": extract_email(text),
            "Phone": extract_phone(text),
            "Skills": extract_skills(text),
            "Education": extract_education(text),
            "Experience": extract_experience(text)
        }
        parsed_resumes.append(parsed_data)
    except Exception as e:
        print(f"Error processing {filepath}: {e}")

# Create a DataFrame
df = pd.DataFrame(parsed_resumes)
df.head()

| | Filename | Name | Email | Phone | Skills | Education | Experience |
|---|---|---|---|---|---|---|---|
| 0 | resume_6.pdf | Farhan Ali\nMarketing | | | [] | [] | [experience:] |
| 1 | resume_2.pdf | Bob Smith | | | [python, excel, sql] | [] | [summary: data analyst with experience in inte... |
| 2 | resume_3.pdf | Catherine Lee | | | [] | [] | [experience:] |
| 3 | resume_7.pdf | Grace Thomas | | | [python, machine learning] | [b.tech in ai & ml - tech university (2020-2024)] | [experience:] |
| 4 | resume_5.pdf | Emily Zhang | | | [cloud, aws] | [b.tech in it - institute of tech (2015-2019)] | [experience:] |
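The Name value in the first row, `Farhan Ali\nMarketing`, shows a spaCy entity running past the end of its line, most likely because the extracted PDF text has no punctuation between the name and the line that follows. A small post-processing step can trim this. The helper below, `clean_name`, is a hypothetical addition that keeps only the first non-empty line of whatever `extract_name` returns.

```python
def clean_name(raw_name):
    """Keep the first non-empty line of an extracted name, with whitespace collapsed."""
    if not raw_name:
        return None
    for line in raw_name.splitlines():
        # Collapse runs of spaces or tabs left over from PDF extraction.
        cleaned = " ".join(line.split())
        if cleaned:
            return cleaned
    return None
```

In the parsing loop it would wrap the existing call, as in `"Name": clean_name(extract_name(text))`.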


## Export to CSV

In [None]:
df.to_csv("parsed_resume_dataset.csv", index=False)
files.download("parsed_resume_dataset.csv")
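`to_csv` writes the list-valued columns (Skills, Education, Experience) as Python list literals, which spreadsheet tools read back as plain strings. If a flatter file is preferred, one option is to join each list into a single string before building the DataFrame. The sketch below assumes a `"; "` separator and a hypothetical helper named `flatten_records`; it returns new dictionaries and leaves `parsed_resumes` unchanged.

```python
def flatten_records(records, sep="; "):
    """Return copies of the record dicts with list values joined into strings."""
    flat = []
    for record in records:
        flat.append({
            key: sep.join(map(str, value)) if isinstance(value, list) else value
            for key, value in record.items()
        })
    return flat
```

The export would then start from `pd.DataFrame(flatten_records(parsed_resumes))` instead of `df`.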
