In [1]:
import spacy
import pickle
import random

In [79]:
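# Use a context manager so the file handle is closed after loading.
# Only unpickle files that come from a trusted source.
with open('train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)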
train_data[0]

('Govardhana K Senior Software Engineer  Bengaluru, Karnataka, Karnataka - Email me on Indeed: indeed.com/r/Govardhana-K/ b2de315d95905b68  Total IT experience 5 Years 6 Months Cloud Lending Solutions INC 4 Month • Salesforce Developer Oracle 5 Years 2 Month • Core Java Developer Languages Core Java, Go Lang Oracle PL-SQL programming, Sales Force Developer with APEX.  Designations & Promotions  Willing to relocate: Anywhere  WORK EXPERIENCE  Senior Software Engineer  Cloud Lending Solutions -  Bangalore, Karnataka -  January 2018 to Present  Present  Senior Consultant  Oracle -  Bangalore, Karnataka -  November 2016 to December 2017  Staff Consultant  Oracle -  Bangalore, Karnataka -  January 2014 to October 2016  Associate Consultant  Oracle -  Bangalore, Karnataka -  November 2012 to December 2013  EDUCATION  B.E in Computer Science Engineering  Adithya Institute of Technology -  Tamil Nadu  September 2008 to June 2012  https://www.indeed.com/r/Govardhana-K/b2de315d95905b68?isid=rex-
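
The training loop further down reads `annotation['entities']` and takes the label from `ent[2]`, so each example appears to be a `(text, {'entities': [(start, end, label), ...]})` tuple. Assuming `start` and `end` are character offsets into the text, a quick sanity check like the sketch below can flag spans that are out of bounds, padded with whitespace, or overlapping before training starts. The function name `find_bad_spans`, the toy text, and the labels are hypothetical and not taken from `train_data.pkl`.

```python
def find_bad_spans(text, entities):
    """Return (start, end, label, reason) for every span that looks unusable.

    Assumes each entity is a (start, end, label) triple of character offsets.
    """
    problems = []
    last_end = 0
    for start, end, label in sorted(entities, key=lambda e: (e[0], e[1])):
        if not (0 <= start < end <= len(text)):
            problems.append((start, end, label, 'out of bounds'))
            continue
        if text[start:end] != text[start:end].strip():
            problems.append((start, end, label, 'leading/trailing whitespace'))
        if start < last_end:
            problems.append((start, end, label, 'overlaps previous span'))
        last_end = max(last_end, end)
    return problems


# Hypothetical toy example (not real training data).
sample_text = 'Jane Doe  Data Engineer'
sample_entities = [
    (0, 8, 'Name'),            # clean span: 'Jane Doe'
    (8, 23, 'Designation'),    # starts on the two spaces before 'Data'
    (15, 30, 'Designation'),   # runs past the end of the text
]
for problem in find_bad_spans(sample_text, sample_entities):
    print(problem)
```

Running a check like this over every example in `train_data` would show how many examples have questionable spans, which is useful to know because the training loop skips any example whose update raises an exception.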

In [80]:
nlp = spacy.blank('en')

def train_model(train_data):
    # Uses the spaCy v2 training API (create_pipe / begin_training).
    # Reuse the NER component if it already exists; otherwise create and add it.
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')

    # Register every entity label that appears in the annotations.
    for _, annotation in train_data:
        for ent in annotation['entities']:
            ner.add_label(ent[2])

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(30):
            print("Starting iteration " + str(itn))
            random.shuffle(train_data)  # shuffles the caller's list in place
            losses = {}
            for text, annotations in train_data:
                try:
                    nlp.update(
                        [text],  # batch of texts
                        [annotations],  # batch of annotations
                        drop=0.2,  # dropout - make it harder to memorise data
                        sgd=optimizer,  # callable to update weights
                        losses=losses)
                except Exception:
                    # Skip any example that fails to update instead of aborting the run.
                    pass

            print(losses)

In [81]:
train_model(train_data)

Starting iteration 0
{'ner': 14287.45442762558}
Starting iteration 1
{'ner': 13714.483789541644}
Starting iteration 2
{'ner': 7795.883606326162}
Starting iteration 3
{'ner': 6964.131202462466}
Starting iteration 4
{'ner': 6347.403074999554}
Starting iteration 5
{'ner': 6009.301208544742}
Starting iteration 6
{'ner': 4629.543454781678}
Starting iteration 7
{'ner': 5319.661802388834}
Starting iteration 8
{'ner': 4287.331780013571}
Starting iteration 9
{'ner': 4345.3169026215355}
Starting iteration 10
{'ner': 4463.647123011391}
Starting iteration 11
{'ner': 4869.503084199151}
Starting iteration 12
{'ner': 4273.033632037388}
Starting iteration 13
{'ner': 3967.671172591124}
Starting iteration 14
{'ner': 3760.4788185974458}
Starting iteration 15
{'ner': 2691.437998556512}
Starting iteration 16
{'ner': 3668.390164723117}
Starting iteration 17
{'ner': 2789.3081045019285}
Starting iteration 18
{'ner': 3226.701873230165}
Starting iteration 19
{'ner': 3346.1940607448814}
Starting iteration 20
{'n
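
The recorded losses fall quickly at first and then fluctuate from one iteration to the next. The sketch below is one way to summarize such a curve: it finds the iteration with the lowest loss and shows where a simple patience rule would have stopped. The values are the first 20 losses from the output above, rounded to one decimal; the helper names and the patience rule are illustrative assumptions, not part of the original training code. These are training losses, so they say nothing about performance on unseen resumes.

```python
# First 20 recorded losses from the training output, rounded to one decimal.
recorded_losses = [
    14287.5, 13714.5, 7795.9, 6964.1, 6347.4, 6009.3, 4629.5, 5319.7,
    4287.3, 4345.3, 4463.6, 4869.5, 4273.0, 3967.7, 3760.5, 2691.4,
    3668.4, 2789.3, 3226.7, 3346.2,
]


def best_iteration(losses):
    """Return (index, value) of the lowest loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx, losses[idx]


def stop_iteration(losses, patience=3):
    """Return the iteration at which a patience rule would stop, or None.

    Training stops once `patience` consecutive iterations fail to produce
    a new lowest loss.
    """
    best = float('inf')
    stale = 0
    for i, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


print(best_iteration(recorded_losses))
print(stop_iteration(recorded_losses, patience=3))
print(stop_iteration(recorded_losses, patience=5))
```

With these values, a patience of 3 would stop well before the lowest recorded loss is reached, which suggests that any stopping rule applied to a curve this noisy needs a generous patience or, better, a held-out evaluation set.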

In [82]:
nlp.to_disk('nlp_model')

In [2]:
nlp_model = spacy.load('nlp_model')

In [84]:
train_data[5][0]

'Shaik Tazuddin Senior Process Executive - STAR India  Bengaluru, Karnataka - Email me on Indeed: indeed.com/r/Shaik-Tazuddin/1366179051f145eb  To establish myself as a sincere and honest employee in a challenging organization by using my attitude and learning, thus enhancing my skills along with the growth of the organization.  WORK EXPERIENCE  Senior Process Executive  STAR India -  Bengaluru, Karnataka -  November 2017 to Present  Senior Process Executive - Cisco Client STAR EMEAR &amp; US: ➢ Creating Dart ID from the requested details and configuring products with appropriate pricing &amp; discounts. ➢ Responsible for managing and analyzing backlog for the European countries with especial attention to France, Spain, United Kingdom, Italy, Sweden, Slovakia, Israel, Germany. ➢ Reviewing quality figures, counts and Q flow monthly to ensure the targets are met. ➢ Making report and C-SAT presentation regarding process for business development. ➢ Immediate action on customer queries and 

In [85]:
doc = nlp_model(train_data[5][0])
for ent in doc.ents:
    print(f'{ent.label_.upper():{30}}- {ent.text}')

NAME                          - Shaik Tazuddin
DESIGNATION                   - Senior Process Executive
LOCATION                      - Bengaluru
EMAIL ADDRESS                 - indeed.com/r/Shaik-Tazuddin/1366179051f145eb
DESIGNATION                   - Senior Process Executive
LOCATION                      - Bengaluru
DEGREE                        - B.Com in C.A
COLLEGE NAME                  - S.V University
SKILLS                        - HTML (Less than 1 year), MS OFFICE (Less than 1 year), Tally (Less than 1 year)
SKILLS                        - Packages: MS Office, HTML, TALLY  PERSONAL STRENGTHS ➢ Dedication towards work. ➢ Quick learner and self motivated. ➢ Good Communication Skills and Personality. ➢ Positive attitude. ➢ Willing to spend more time.


In [3]:
import fitz  # PyMuPDF; used to extract text from the PDF resumes

In [None]:
fname = 'Alice Clark CV.pdf'
doc = fitz.open(fname)
text = ""
for page in doc:
    text = text + str(page.get_text())

tx = " ".join(text.split('\n'))
print(tx)

Alice Clark  AI / Machine Learning    Delhi, India Email me on Indeed  •  20+ years of experience in data handling, design, and development  •  Data Warehouse: Data analysis, star/snow flake scema data modelling and design specific to  data warehousing and business intelligence  •  Database: Experience in database designing, scalability, back-up and recovery, writing and  optimizing SQL code and Stored Procedures, creating functions, views, triggers and indexes.  Cloud platform: Worked on Microsoft Azure cloud services like Document DB, SQL Azure,  Stream Analytics, Event hub, Power BI, Web Job, Web App, Power BI, Azure data lake  analytics(U-SQL)  Willing to relocate anywhere    WORK EXPERIENCE  Software Engineer  Microsoft – Bangalore, Karnataka  January 2000 to Present  1. Microsoft Rewards Live dashboards:  Description: - Microsoft rewards is loyalty program that rewards Users for browsing and shopping  online. Microsoft Rewards members can earn points when searching with Bing, bro

In [87]:
doc = nlp_model(tx)
for ent in doc.ents:
    print(f'{ent.label_.upper():{30}}- {ent.text}')

NAME                          - Alice Clark
LOCATION                      - Delhi
SKILLS                        - Database: Experience in database designing, scalability, back-up and recovery, writing and  optimizing SQL code and Stored Procedures, creating functions, views, triggers and indexes.
SKILLS                        - Cloud platform: Worked on Microsoft Azure cloud services like Document DB, SQL Azure,  Stream Analytics, Event hub, Power BI, Web Job, Web App, Power BI, Azure data lake
DESIGNATION                   - Software Engineer
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COLLEGE NAME                  - Indian Institute of Technology
SKILLS                        - Machine Learning, Natural Language Processing, and Big Data Handling


In [88]:
fname = 'Smith Resume.pdf'
doc = fitz.open(fname)
text = ""
for page in doc:
    text = text + str(page.get_text())

tx = " ".join(text.split('\n'))
print(tx)

Michael Smith  BI / Big Data/ Azure  Manchester, UK- Email me on Indeed: indeed.com/r/falicent/140749dace5dc26f    10+ years of Experience in Designing, Development, Administration, Analysis,  Management  inthe  Business  Intelligence  Data  warehousing,  Client  Server  Technologies, Web-based Applications, cloud solutions and Databases.  Data warehouse: Data analysis, star/ snow flake schema data modeling and design  specific todata warehousing and business intelligence environment.  Database: Experience in database designing, scalability, back-up and recovery,  writing andoptimizing SQL code and Stored Procedures, creating functions, views,  triggers and indexes.   Cloud platform: Worked on Microsoft Azure cloud services like Document DB, SQL  Azure, StreamAnalytics, Event hub, Power BI, Web Job, Web App, Power BI, Azure  data lake analytics(U-SQL).  Big Data: Worked Azure data lake store/analytics for big data processing and Azure  data factoryto schedule U-SQL jobs. Designed and d
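
Replacing only `'\n'` with a space leaves runs of spaces in the extracted text, as in "Intelligence  Data  warehousing" above. A minimal sketch of a stricter cleanup is shown below; `normalize_whitespace` is a hypothetical helper, not part of the notebook. Note that the training texts also contain double spaces, so whether collapsing them improves the model's predictions is an open question that would need testing, and this step cannot repair words the PDF extraction has glued together (such as "todata").

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Short sample modeled on the extracted resume text above.
raw = "Business  Intelligence  Data  warehousing,\nClient  Server"
print(normalize_whitespace(raw))
```

To try it on an extracted resume, `normalize_whitespace(text)` could take the place of `" ".join(text.split('\n'))`.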

In [89]:
doc = nlp_model(tx)
for ent in doc.ents:
    print(f'{ent.label_.upper():{30}}- {ent.text}')

NAME                          - Michael Smith
DESIGNATION                   - Intelligence  Data  warehousing,  Client
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COMPANIES WORKED AT           - Microsoft
COLLEGE NAME                  - The University of Manchester
SKILLS                        - problem solving (Less than 1 year), project lifecycle (Less than 1 year), project
COLLEGE NAME                  - manager (Less than 1 year), technical assist
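
The same entity can be predicted many times for one resume, as with the repeated "Microsoft" rows above. The sketch below groups predictions by label and drops exact duplicates while keeping first-seen order. It works on plain `(label, text)` pairs so it can run without a trained model; `group_entities` is a hypothetical helper, and the pairs are a shortened copy of the output above.

```python
def group_entities(pairs):
    """Group (label, text) pairs by label, keeping each distinct text once."""
    grouped = {}
    for label, text in pairs:
        texts = grouped.setdefault(label, [])
        if text not in texts:
            texts.append(text)
    return grouped


# With a spaCy doc, the pairs could be built as:
#   pairs = [(ent.label_.upper(), ent.text) for ent in doc.ents]
pairs = [
    ('NAME', 'Michael Smith'),
    ('COMPANIES WORKED AT', 'Microsoft'),
    ('COMPANIES WORKED AT', 'Microsoft'),
    ('COMPANIES WORKED AT', 'Microsoft'),
    ('COLLEGE NAME', 'The University of Manchester'),
]
for label, texts in group_entities(pairs).items():
    joined = "; ".join(texts)
    print(f'{label:30}- {joined}')
```

Deduplicating only tidies the display. It does not address obviously wrong predictions such as the `COLLEGE NAME` row that contains skill text.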

In [90]:
fname = 'Swagat Swaroop Patel - Resume 10.pdf'
doc = fitz.open(fname)
text = ""
for page in doc:
    text = text + str(page.get_text())

tx = " ".join(text.split('\n'))
print(tx)

Swagat Swaroop Patel  +91 7653044233 # swagatpatel03@gmail.com ï LinkedIn § Github Education KIIT Bhubaneswar 2022 - 2026 B.Tech in Computer Science & Engineering 9.12/10 CGPA Delhi Public School Kalinga 2019 – 2021 Intermediate Education Percentage: 84.4% Delhi Public School Kalinga 2017 – 2019 High School Percentage: 88.6% Projects Crop Disease Identification Web Application • Designed and implemented a web-based crop disease detection platform using Flask for the interface and TensorFlow/Keras for CNN-based image analysis. • Enabled farmers to upload leaf images for automated disease identification, providing detailed health feedback and disease-specific treatment guidance in a user-friendly format. Vehicle Cut-in Detection System • Developed a lane detection and cut-in event tracking solution for Driver Assistance Systems under Intel Unnati Industrial Training (2024). • Leveraged YOLO and Haar Cascades to analyze vehicle trajectories, detect lane cut-ins, and visualize results on 

In [91]:
doc = nlp_model(tx)
for ent in doc.ents:
    print(f'{ent.label_.upper():{30}}- {ent.text}')

NAME                          - Swagat Swaroop
SKILLS                        - Languages: C++, Python, Java, JavaScript Backend: Node.js, Express.js Frontend: React, TailwindCSS, HTML, CSS Clouds & Databases: AWS, MySQL, MongoDB Operating System: Windows, Linux Developer Tools: Postman, VS Code, GitHub, Git Achievements Research Publication (IEEE Xplore, ICCRDA 2025) Enhancing Biometric Authentication through Score-Level Fusion of Gait and Palm Vein Modalities • Proposed a multimodal biometric authentication system combining gait and palm vein recognition, leveraging Fisher’s Linear Discriminant Analysis (FLDA) and Principal Component Analysis (PCA). • Enhanced recognition accuracy, robustness, and spoof resistance, demonstrating superior performance over traditional biometric methods in secure and sensitive environments. Smart India Hackathon 2023 • Ranked among top 30 teams in Internal Hackathon organized by Government of India. Courses • AWS Academy Cloud Foundations and AWS Academy
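
For this resume the model returned no email entity, although the extracted text contains one. Fields with a regular shape can be recovered with a rule-based pass alongside the NER model. The sketch below uses a deliberately simple email pattern; the pattern and the `find_emails` helper are assumptions for illustration and will not cover every valid address.

```python
import re

# Simple pattern: local part, "@", then a dotted domain. Not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def find_emails(text):
    """Return all substrings of `text` that look like email addresses."""
    return EMAIL_RE.findall(text)


# Opening fragment of the extracted resume text above.
snippet = 'Swagat Swaroop Patel  +91 7653044233 # swagatpatel03@gmail.com'
print(find_emails(snippet))
```

Results from a pass like this could be merged with the model's entities, for example by adding an email row only when the model found none.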

In [4]:
fname = 'Sreyash Rout Resume.pdf'
doc = fitz.open(fname)
text = ""
for page in doc:
    text = text + str(page.get_text())

tx = " ".join(text.split('\n'))
print(tx)

Sreyash Rout Third Year (B.Tech) Computer Science & Engineering at KIIT, Bhubaneswar Links Github:// Sreyash‐Rout LinkedIn:// Sreyash‐Rout Skills OS Linux, Windows LANGUAGES C/C++, Java, Python, javascript FRAMEWORK HTML/CSS, React, Express, Node.js, Spring Boot, Full Stack DATABASES MySQL, mongodb CLOUDS Amazon Web Serives(AWS) OTHERS Docker, Git, Power BI Desktop, MS Excel, Computer Network Data Structures and Algorithm, Operating System Coursework Data Structures and Algorithm AWS Academy Cloud Foundations AWS Academy Cloud Architecture GUI using Python Education 2022‐PRESENT B.TECH. IN CSE KALINGA INSTITUE OF INDUSTRIAL TECHNOLOGY, BHUBANESWAR CGPA : 9.33/10(Till 4th Sem) 2021 ‐ 2022 CLASS 12 D. A. V. PUBLIC SCHOOL , CSPUR, BHUBANESWAR Percentage: 94.8% 2019 ‐ 2020 CLASS 10 D. A. V. PUBLIC SCHOOL , CSPUR, BHUBANESWAR Percentage: 93.8% Mob.: +91‐7327849638 Email.:sreyash31@gmail.com Experience MAY 2024 ‐ JULY 2024 KPIT SDE Summer Intenship ‐ Deployed the KSIL platform in a cloud env

In [5]:
doc = nlp_model(tx)
for ent in doc.ents:
    print(f'{ent.label_.upper():{30}}- {ent.text}')

NAME                          - Sreyash Rout
YEARS OF EXPERIENCE           - Third Year
DEGREE                        - B.Tech) Computer Science & Engineering
