# Resume Builder
## Experiments with OpenAI's API

You've probably heard the advice to [tailor your resume to the job you're applying for](https://career.arizona.edu/resources/tailoring-your-resume/). You've probably also found it to be pretty tedious.  Let's see if [OpenAI](https://openai.com/) can help.

In [59]:
import numpy as np
import openai
from sklearn.metrics.pairwise import cosine_similarity
import json
import os.path
from uuid import uuid4
from cleantext import clean
import time
from getpass import getpass
from typing import *

In [60]:
API_KEY = getpass(prompt="OpenAI API key: ")

OpenAI API key: ········


In [61]:
openai.api_key = API_KEY

## Approach

It all boils down to matching jobs I've had with jobs I'd like (job descriptions). If I keep a list of every job I've ever had, and a list of sentences that describe each job in the list, I should be able to have a model choose which jobs and which descriptive sentences best match a job description.  Whenever a new job description that I'm interested in comes along, I should be able to just copy-paste it into this notebook and automatically generate a resume body ready for it.

## Implementation
First up, a function to clean text (fix Unicode and such) and a function that takes the (hopefully cleaner) text and grabs its embedding.

In [62]:
def scrub_text(
    text: AnyStr, 
    replace_email: AnyStr = '<EMAIL>', 
    replace_phone: AnyStr = '<PHONE>') -> AnyStr:
    """
    Cleans up input text - fixes Unicode, normalizes line feeds, etc.
    """
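    # Only substitute emails / phone numbers when a replacement token is supplied.
    # (Always assigning both flags avoids an UnboundLocalError when a token is None or empty.)
    no_emails = bool(replace_email)
    no_phone_numbers = bool(replace_phone)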
    return clean(text,
        fix_unicode=True,                   # fix various unicode errors
        to_ascii=True,                      # transliterate to closest ASCII representation
        lower=False,                        # lowercase text
        no_line_breaks=True,                # fully strip line breaks as opposed to only normalizing them
        no_urls=False,                      # replace all URLs with a special token
        no_emails=no_emails,                # replace all email addresses with a special token
        no_phone_numbers=no_phone_numbers,  # replace all phone numbers with a special token
        no_numbers=False,                   # replace all numbers with a special token
        no_digits=False,                    # replace all digits with a special token
        no_currency_symbols=False,          # replace all currency symbols with a special token
        no_punct=False,                     # remove punctuations
        replace_with_punct="",              # instead of removing punctuations you may replace them
        replace_with_url="<URL>",
        replace_with_email=replace_email,
        replace_with_phone_number=replace_phone,
        lang="en"                       
    )


def get_embedding(text: AnyStr, model: AnyStr = "text-embedding-ada-002") -> Any:
    text = text.replace("\r", "").replace("\n", " ")
    return openai.Embedding.create(input = [text], model=model)['data'][0]['embedding']

Next I'll create a `Position` class that captures details about a position (whether it's a job description or a past/present position) and adds JSON (de)serialization and a UID. That will hopefully let me save a few API calls by persisting details such as my previous positions.

In [63]:
class Position:

    def __init__(self, 
        title: AnyStr, 
        description: Union[List[AnyStr], AnyStr], 
        uid: AnyStr = None,
        generate_embeddings: bool = False):
        """
        Instantiates a new Position.
        
        :title: job title
        :description: a list of strings or a single string with details about the position
        :uid: unique identifier for this instance. If unspecified (default), automatically generated.
        :generate_embeddings: if True, generates the embeddings for this position. If False (default), embeddings are not generated.
        """
        self.params = {
            'uid': uid if uid else str(uuid4()),
            'title': title,
            'description': [description] if isinstance(description, str) else description
        }
        if generate_embeddings:
            # Not done by default b/c we might be restoring from disk and already have them,
            # and so we don't get throttled if we're on the free tier
            print("Generating embeddings for position {0}...".format(self.params['uid']))
            self.generate_embeddings()

    @staticmethod
    def gen_embed(text: Union[AnyStr, List[AnyStr]]) -> Union[List[float], List[List[float]]]:
        """
        Returns OpenAI embedding of a string or list of strings.
        """
        if isinstance(text, str):
            return get_embedding(scrub_text(text))
        else:
            return [get_embedding(scrub_text(line)) for line in text]

    def __repr__(self):
        return str(self.params)
    
    def generate_field_embedding(self, fieldname: AnyStr) -> None:
        """
        Generates the embeddings for a given field.
        :fieldname: name of the field in params for which to generate embeddings. Embeddings are added to params
        as {fieldname}_embedding.
        """
        if fieldname and len(fieldname) > 0 and fieldname in self.params:
            field_val = self.params.get(fieldname, None)
            if field_val and len(field_val) > 0:
                self.params['{0}_embedding'.format(fieldname)] = Position.gen_embed(field_val)

    def generate_embeddings(self) -> None:
        """
        Generates the embeddings for this instance.
        """
        fields = [key for key in self.params.keys() if 'embedding' not in key and key != 'uid']
        for field in fields:
            self.generate_field_embedding(field)

    def to_json(self) -> AnyStr:
        return json.dumps(self.params)

    @staticmethod
    def from_json(js_str: AnyStr) -> 'Position':
        params = json.loads(js_str)
        pos = Position(
            title=params.get('title', None),
            description=params.get('description', None),
            uid=params.get('uid', None)  # keep the saved UID instead of generating a new one
        )
        emb_fields = [key for key in params.keys() if 'embedding' in key]
        for emb_field in emb_fields:
            pos.params[emb_field] = params.get(emb_field, None)
        return pos

Now I'm ready to create my list of jobs I've had. Here it is, in roughly chronological order from most recent to furthest in the dim recesses of antiquity.

In [64]:
my_positions = [
    Position(
        title='AI/ML Scientist',
        description=[
            "As a data scientist in the Artificial Intelligence and Machine Learning (AI/ML) group I research and develop solutions in support of Datasite’s suite of products for the merger and acquisitions sector.   My current focus is on applying Natural Language Processing (NLP) techniques and technologies in the analysis of documents in virtual data rooms (VDRs) and deep learning recommendation engines. I also mentor and tutor new machine learning practitioners.",
            "Technologies and techniques: Python, AI, deep learning, Natural Language Processing (NLP), PyTorch, TensorFlow, Azure, recommender systems"
        ]
    ),
    Position(
        title='Data Scientist (Officer)',
        description = [
            "Provide and present statistical analyses to leadership in support of bank initiatives.  Identify, prototype, and deploy data science and data analysis business opportunities.  Manage / mentor junior members of the data science team in an Agile environment.",
            "Developed 'flight risk' Python application based on survival analysis, statistical analysis, and random forest to alert customer relationship managers to clients and accounts that may be at risk of leaving the Bank.",
            "Statistical analysis of net flow of funds into / out of investment products, including regression forecast of net flow.",
            "Developed infrastructure incident forecast for predicting issues in critical Bank applications before they happen, based on anomaly detection and forecasting.  Wrote dashboard application in Docker, Python, Dash, and jQuery.",
            "Developed real-time time series forecasting dashboard for predicting future account / asset balances at scale.  Written in Python, Dash, and jQuery and uses multiple forecasting techniques including Prophet and physics-based algorithms.",
            "Technologies and techniques: Python, statsmodels, scikit-learn, pandas, NumPy, Natural Language Processing (NLP), jQuery, Dash, REST, Prophet"
        ]
    ),
    Position(
        title='Machine Learning Engineer',
        description = [
            "Develop and bring to production proof of concept / prototypical data science applications, algorithms, and notebooks.  Develop and deploy deep learning image processing and computer vision models.",
            "API for categorizing incoming document images as invoices, bills of lading, and other shipping documents.  Examines 35,000 documents per day and is correct 84% of the time, estimated to have been able to save hundreds of hours of overtime every year.  Uses fine-tuned computer vision model, TensorFlow, and Falcon REST framework.",
            "Signature verification – trained and deployed a deep learning model on 10,000 samples of signatures to determine if a shipping document had been signed.  Correct 96% of the time in production.",
            "Technologies and techniques: Python, scikit-learn, pandas, NumPy, Natural Language Processing (NLP), jQuery, REST, TensorFlow, Deep Learning, Computer Vision"
        ]
    ),
    Position(
        title='Founder & CEO',
        description = [
            "Secured NASA funding to develop a distributed damage detection algorithm and framework.  NASA to pursue patent / license opportunities.",
            "'Myriad' damage detection algorithm: a distributed big data approach to automatically finding indications of structural damage based on machine learning.  Reference implementation including user interface written in Java and Akka.",
            "Technologies and techniques: Python, Java, Akka, Azure, scikit-learn, pandas, NumPy, Computer Vision"
        ]
    ),
    Position(
        title='Data Analytics Engineer',
        description = [
            "Liaise between software engineering and machine learning engineering teams; build cloud-based applications and data science infrastructure.",
            "Data warehousing solutions in MongoDB and Elasticsearch",
            "Developed proof of concept insurance sales lead application in Spark, Scala, Elasticsearch, and Cassandra.  Evangelize Spark & ES to Python team.",
            "Designed an optimized architecture to optimize AWS costs.  Reduced system from 25 separate AWS components to 3 with no loss in throughput.",
            "Technologies and techniques: Java, Python, Elasticsearch, Akka, AWS, Spark, scikit-learn, Natural Language Processing (NLP)"
        ]
    ),
    Position(
        title='Computational Physics Programmer',
        description = [
            "Technical computing:  develop numeric processing applications in support of nuclear research and development.  Pioneered collaborative approach by pairing with nuclear physicists to better understand their requirements and workflow.",
            "Simulation Analysis - automatic analysis of nuclear reactor simulation data and other unstructured data sources.  Based on Natural Language Processing techniques and written in Python, Pandas, and NumPy.",
            "Drone data streams - concurrent pipeline for autonomous vehicle sensor data written in C++.",
            "Technologies and techniques: Java, Python, C++, pandas, Natural Language Processing (NLP)"
        ]
    ),
    Position(
        title='Applied Physicist',
        description = [
            "Provide support for government and industry in the development of nondestructive evaluation (NDE), structural health monitoring (SHM), and magnetics systems. Provide software development services as required.",
            "Conduct laboratory studies to evaluate, design, and troubleshoot systems and processes.",
            "Write funding proposals to government and industry, project management.",
            "Develop software in Python. Provide additional software in MATLAB, LabVIEW, C and C++ as required.",
            "Awarded U.S. Patent 7,080,555, Canadian Patent 2,569,143 and Japanese Patent 4,607,960 for acoustic-based structural health monitoring system (SHM) for composites using piezoelectric sensors. Flight tested in 2006 on an F-15/E, to date the basis of more than one million dollars in follow-on funding for employer. Sole developer of the embedded data acquisition and analysis software (C, C++, Python), created the device’s customized Linux distribution and software development kit (SDK) from source.",
            "Developed statistical technique to analyze acoustic sensor data for recommending a maintenance schedule for composite materials",
            "Recommended a data fusion approach for reliable and inexpensive detection of damage in SCUBA/SCBA bottles for the U.S. Department of Transportation's Research and Special Programs Administration (RSPA)",
            "Designed and built magnetics testing facility using Helmholtz coils, Hall effect sensors, and triaxial anisotropic magnetoresistance (AMR) sensors to gauge response of ferromagnetic structures to simulated geomagnetic field. Basis of several million dollars in follow-on funding for employer.",
            "Designed a wireless mesh network strain sensor system, to date the basis of more than one million dollars in follow-on funding for employer.",
            "Conducted study into using flowmeters and pressure / vacuum sensors to detect flaws in heat exchanger tubing. Using design of experiments to investigate effects of air quality on results, recommended procedure for inclusion into future American Society for Testing and Materials (ASTM) standard.",
            "Consulted with industry to develop Hall sensor-based system and signal processing algorithms to detect flaws in oilfield tubing. Customer to pursue patent.",
            "Technologies and techniques: C, C++, Python, LabVIEW, MATLAB, Linux"
        ]
    ),
    Position(
        title='Applied Physicist',
        description=[
            "As a member of the Magnetics team, responsible for design and execution of experiments to support the development of an in-line pipeline inspection tool used to inspect North Sea pipelines",
            "Invented a novel sensor for measuring magnetic fields inside the pipe wall",
            "Identified an issue with quality assurance (QA) early in prototyping, potentially saving millions of dollars"
        ]
    )
]

In [65]:
# Save the results as JSONL (cuts down on API calls in the future)

with open('previous_positions.jsonl', 'w') as fidout:
    for previous_position in my_positions:
        previous_position.generate_embeddings()
        fidout.write('{0}\n'.format(previous_position.to_json()))
        # Pause between positions so the API calls don't get throttled (e.g. on the free tier).
        # Comment out the line below if rate limits aren't a problem for you.
        time.sleep(65)
        
# Quick check to ensure I can actually reread

with open('previous_positions.jsonl', 'r') as fidin:
    reread_positions = [Position.from_json(line) for line in fidin.readlines()]
reread_positions[0]

{'uid': '39ce2c6c-5685-4cdf-98b8-8412a8cf64ad', 'title': 'AI/ML Scientist', 'description': ['As a data scientist in the Artificial Intelligence and Machine Learning (AI/ML) group I research and develop solutions in support of Datasite’s suite of products for the merger and acquisitions sector.   My current focus is on applying Natural Language Processing (NLP) techniques and technologies in the analysis of documents in virtual data rooms (VDRs) and deep learning recommendation engines. I also mentor and tutor new machine learning practitioners.', 'Technologies and techniques: Python, AI, deep learning, Natural Language Processing (NLP), PyTorch, TensorFlow, Azure, recommender systems'], 'title_embedding': [-0.008569573983550072, -0.008001009933650494, 0.004846502095460892, -0.003392551327124238, -0.004161484073847532, 0.03145602345466614, -0.021290358155965805, 0.02677050046622753, 0.0076996018178761005, -0.025523768737912178, 0.023400213569402695, -0.006035008002072573, 0.003223009407
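
The save-then-reread pattern above could be wrapped in a small "load it if it's already on disk, otherwise build it" helper. What follows is only a sketch: `load_or_build` and its `build` callback are hypothetical names that aren't part of this notebook, and it works on plain dictionaries (such as a `Position`'s `params`) rather than on `Position` instances.

```python
import json
import os.path
from typing import Callable, Dict, List


def load_or_build(path: str, build: Callable[[], List[Dict]]) -> List[Dict]:
    """
    Returns the records stored in a JSONL file. If the file doesn't exist yet,
    calls build() to create the records, saves them, and returns them.
    """
    if os.path.isfile(path):
        # Already cached: no need to call the (possibly expensive) builder
        with open(path, 'r') as fidin:
            return [json.loads(line) for line in fidin if line.strip()]
    records = build()
    with open(path, 'w') as fidout:
        for record in records:
            fidout.write('{0}\n'.format(json.dumps(record)))
    return records
```

In this notebook, `build` might be a function that calls `generate_embeddings()` on each entry in `my_positions` and returns their `params`, so the embedding API is only hit the first time the file is created.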

I'll be using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to match my previous jobs with the new job. I'll stick with just descriptions here, but you could also consider title and company.
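
For intuition, here is a minimal pure-Python sketch of what cosine similarity computes. The notebook itself relies on scikit-learn's `cosine_similarity`; the `cosine_sim` helper and the toy vectors below are made up purely for illustration.

```python
import math
from typing import Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real embeddings are much longer)
jd_line = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # points the same way as jd_line
unrelated = [0.0, 1.0, 0.0]       # orthogonal to jd_line

print(round(cosine_sim(jd_line, same_direction), 4))  # 1.0
print(round(cosine_sim(jd_line, unrelated), 4))       # 0.0
```

Only the direction of the vectors matters, not their length, which is why the score doesn't depend on how long a snippet's embedding vector happens to be in magnitude.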

I want to make each prior job description as good a match as possible for the new job description, and I also want the option of using only the most similar prior jobs:

1. Reorder each previous job's description by its cosine similarity with the new job's description (highest -> lowest).
    a) (Optional) use the top _N_ most similar lines for each description.
    b) Calculate the mean cosine similarity of each previous job description.
2. (Optional) return the top _M_ most similar descriptions.

In [69]:
def recommend_positions(
    job_description: Position, 
    previous_positions: List[Position],
    num_recs: int = None,
    num_lines: int = None) -> List[Tuple[Position, float]]:
    """
    Chooses the most similar prior positions to a new position.
    
    :job_description: Position instance describing the new job (e.g. recruiter's JD)
    :previous_positions: list of Position instances detailing previous jobs
    :num_recs: specifies maximum number of recommended positions to return e.g. num_recs=2 would return the two most highly
    recommended positions. Defaults to None, where all previous positions are returned.
    :num_lines: specifies maximum number of descriptive text entries to include with each recommended position, e.g.
    num_lines=3 would allow at most 3 entries (lines) per recommended position. Defaults to None where the entire description
    is returned.
    """
    scored = {pos.params['uid']: {'position': pos} for pos in previous_positions}
    if 'description_embedding' not in job_description.params:  # Haven't generated embedding yet
        job_description.generate_field_embedding('description')
    jd_desc = job_description.params['description_embedding']
    if isinstance(jd_desc[0], float):
        jd_desc = [jd_desc]
    for pos in previous_positions:
        if 'description_embedding' not in pos.params:  # Haven't generated embedding yet
            pos.generate_field_embedding('description')
        pos_desc = pos.params['description_embedding']
        if isinstance(pos_desc[0], float):
            pos_desc = [pos_desc]
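        # scores[j][i] is the similarity between JD line j and line i of this position
        scores = cosine_similarity(jd_desc, pos_desc).tolist()
        # Map each descriptive line to its list of similarities against every JD line
        best_lines = {pos.params['description'][i]: [score[i] for score in scores] for i in range(len(pos_desc))}
        # Lists compare element by element, so lines are ordered primarily by similarity to the first JD line
        best_lines = {k: v for k, v in sorted(best_lines.items(), key=lambda score: score[1], reverse=True)}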
        if not num_lines:
            n = len(best_lines)
        else:
            n = num_lines
        scored[pos.params['uid']]['position'] = Position(
            title=pos.params['title'],
            description=list(best_lines.keys())[:n]
        )
        scored[pos.params['uid']]['score'] = np.mean(list(best_lines.values())[:n])
    scored_sorted = {k: v for k, v in sorted(scored.items(), key=lambda score: score[1]['score'], reverse=True)}
    return [(pos['position'], pos['score']) for pos in list(scored_sorted.values())[:num_recs]]

## Demo
I'll try it out on a few job descriptions from LinkedIn. Let's start with one that LI claims matches [my profile](https://www.linkedin.com/in/chrisrcoughlin).

In [67]:
# https://www.linkedin.com/jobs/view/3488652563
rivian_ds = Position(
    title='Sr. Data Scientist, Business Operations',
    description=[
        'As a Data Scientist within the Service Business Operations team, you will be reporting to the Manager of Data Science and Engineering. The Service Business Operations team works across the organization to use data to inform and aid in the strategic planning and operational decisions of how we allocate resources to support Rivian in its growth. As an early member of a growing team, this role will have an impact on the direction and evolution of the data science practice and technical approaches. As a Data Scientist, you will partner cross functionally to understand data and key business questions, combining disparate data sets to support how we forecast, estimate, and scenario plan. This role will have an opportunity to lead both how we model and visualize data for strategic and tactical Service business decisions.',
        'As a Data Scientist, you will partner with Data Engineers and Analysts to support the creation, presentation, and interpretation of data using visualization, statistics, and/or machine learning',
        'Collaborate with cross functional organizations to deliver end to end solutions including gathering and understanding the business need, exploring data, building and validating models, and deploying of models',
        'Design and develop algorithmic approaches to support the Service Business Operations team in forecasting and estimating the allocation of resources',
        'Lead the design of the Service Business Operations modeling and simulation framework, while closely collaborating with Data Engineers, to define the architecture to solve a multi-faceted modeling problem',
        'Code that meets standards for style, maintainability, reusability, and automation using tools such as Python, R, Matlab, AWS, and/or SQL',
        'Continuously seeking opportunities for improvement, focusing on using sound quantitative approaches in statistics or mathematics to improve how we ask questions, how we answer those questions, and how we use data to create business impact',
        '3+ years of professional experience in analytics, data science, machine learning, and statistical modeling using Python or other language or tools like R, Matlab, AWS, and SQL',
        'Masters or PhD in a quantitative discipline (including but not limited to Statistics, Physics, Mathematics, Optimization, or Computer Science)',
        'Proven professional experience working through an environment of ambiguity with the ability to self-prioritize responsibilities and requests from multiple cross functional partners',
        'Strong analytical and organizational skills with strong attention to detail and quality',
        'Thorough and detailed analytical and organizational skill to develop end to end data querying, aggregation, analysis, and visualization',
        'Professional experience supporting Business Operations to estimate resource allocations using advanced statistics, forecasting, sampling, regression, time-series, optimization, or Monte Carlo',
    ],
    generate_embeddings=True
)

Generating embeddings for position 49dbdc4d-659a-4023-b632-1322040ec9bf...


For this first test I'll return all my previous positions, just reordered by relevancy to the JD rather than chronologically.  I'll limit each previous position's description to the 4 snippets most relevant to the JD.

In [70]:
for previous_position, score in recommend_positions(
    job_description=rivian_ds, 
    previous_positions=my_positions,
    num_lines=4
):
    title = "Position: {0} Score: {1:.4f}".format(previous_position.params['title'], score)
    print(title)
    print('=' * len(title))
    description = previous_position.params['description']
    for blurb in description:
        print(blurb)
        print()
    print()

Position: Data Scientist (Officer) Score: 0.8021
Provide and present statistical analyses to leadership in support of bank initiatives.  Identify, prototype, and deploy data science and data analysis business opportunities.  Manage / mentor junior members of the data science team in an Agile environment.

Developed infrastructure incident forecast for predicting issues in critical Bank applications before they happen, based on anomaly detection and forecasting.  Wrote dashboard application in Docker, Python, Dash, and jQuery.

Technologies and techniques: Python, statsmodels, scikit-learn, pandas, NumPy, Natural Language Processing (NLP), jQuery, Dash, REST, Prophet

Developed real-time time series forecasting dashboard for predicting future account / asset balances at scale.  Written in Python, Dash, and jQuery and uses multiple forecasting techniques including Prophet and physics-based algorithms.


Position: AI/ML Scientist Score: 0.7955
As a data scientist in the Artificial Intelli

Looks good - my time at U.S. Bank came in at #1, and the blurbs emphasize my experience with forecasting. My time in magnetics and applied physics - not so relevant. :)

Let's try one that didn't match my LI profile, but has enough overlap with some of my previous experience that there should be something to work with. Let's also try to improve the signal-to-noise ratio and only return my most relevant roles.

In [71]:
# https://www.linkedin.com/jobs/view/3490368382
cfs_model_developer = Position(
    title='Diagnostic Model Developer',
    description=[
        'Commonwealth Fusion Systems (CFS) has the fastest, lowest cost path to commercial fusion energy.',
        'CFS collaborates with MIT to leverage decades of research combined with groundbreaking new high-temperature superconducting (HTS) magnet technology. HTS magnets will enable compact fusion power plants that can be constructed faster and at lower cost. Our mission is to deploy these power plants to meet global decarbonization goals as fast as possible. To that end, CFS has assembled a team of leaders in tough tech, fusion science, and manufacturing with a track record of rapid execution. Supported by the world’s leading investors, CFS is uniquely positioned to deliver limitless, clean, fusion power to combat climate change. To implement this plan, we are looking to add dedicated people to the team who treat people well, improve our work by adding multifaceted perspectives and new ways of solving problems, have achieved outstanding results through a range of pursuits, and have skills and experience related to this role.',
        'The SPARC diagnostic team will be developing nearly 40 types of measurements that will be used in real-time control of tokamak plasmas and to inform the design of ARC. Our measurement systems include magnetics, interferometry, neutral pressure, x-ray to visible spectroscopy, camera imaging, bolometry, neutrons, Thomson scattering, electron cyclotron emission, reflectometry, Langmuir probes and structural temperature and strain sensing. While nearly all have completed conceptual design, CFS is looking to grow the SPARC team in order be confident that final designs can accomplish SPARC’s mission goals. For this, we need your help with developing numerical modeling and a software set of tools, that act as predictive digital twins of diagnostics sub-systems and sensors (i.e., synthetic diagnostics sets). These software components combine bench top testing of sensors with plasma simulations from the SPARC physics team to characterize the measurement signals that a real diagnostic would produce. Near-term it will help the diagnostic team iterate on design. During SPARC operations, contributions will be used in software tools for pre-shot planning, real-time plasma control and inter-shot analysis, and in design scoping and optimization for ARC diagnostics.',
        'You will add to a growing, integrated diagnostic team along with MIT-PSFC partners and fusion community collaborators from around the world. CFS is looking to build a diverse team to meet our mission goals and encourages applications from existing fusion community members, recent graduates with skills demonstrated through internships or research projects and those interested in cross-industry career changers.',
        'Work with Diagnostic System Owners to develop ‘digital twins’ or ‘synthetic diagnostics’ for SPARC diagnostic systems, leveraging existing open-source software from the fusion community',
        'Assist in the prototype testing and benchtop calibration of diagnostics to inform and validate models',
        'Integrate diagnostic models with the outputs of plasma simulations provided by the SPARC’s physics team to help ensure diagnostic designs will be sufficient to achieve SPARC’s mission goals',
        'Work with the SPARC tokamak operations and plasma control teams to integrate diagnostic models into workflows for pre-shot planning, real-time plasma control and post-shot interpretation',
        'Participate in the commissioning and operation of diagnostic systems that balances learning how to operate SPARC with obtaining physics results as quickly as possible',
        'B.S./M.Sc. in a science or engineering field, with knowledge in a software field',
        'Used Python, MATLAB, C++, or similar, to model the real-world behavior of a complex system such as a sensor array, robot or actuator in a reaction process',
        'Developed a design from concept to reality, which required successful integration of instrumentation (in either during a course project, internship of day-to-day job)',
        'Ability to succeed at self-directed work as well as in a team',
        'Curiosity and drive to expand your skillset by learning from experts in fusion',
        'Contributed to or used open-source code packages of similar complexity to tools developed for plasma diagnostics (Cherab, Calcam, tofu)',
        'Hands-on experience with sensors similar to those that may be used in plasma diagnostics (cameras, temperature, strain, radiation, spectrometers, lasers, microwaves)',
    ],
    generate_embeddings=True
)

Generating embeddings for position 7faa0373-5e5c-49c4-b8ec-a59095c52e38...


In [72]:
for previous_position, score in recommend_positions(
    job_description=cfs_model_developer, 
    previous_positions=my_positions,
    num_recs=3,
    num_lines=4
):
    title = "Position: {0} Score: {1:.4f}".format(previous_position.params['title'], score)
    print(title)
    print('=' * len(title))
    description = previous_position.params['description']
    for blurb in description:
        print(blurb)
        print()
    print()

Position: Computational Physics Programmer Score: 0.7747
Technical computing:  develop numeric processing applications in support of nuclear research and development.  Pioneered collaborative approach by pairing with nuclear physicists to better understand their requirements and workflow.

Simulation Analysis - automatic analysis of nuclear reactor simulation data and other unstructured data sources.  Based on Natural Language Processing techniques and written in Python, Pandas, and NumPy.

Drone data streams - concurrent pipeline for autonomous vehicle sensor data written in C++.

Technologies and techniques: Java, Python, C++, pandas, Natural Language Processing (NLP)


Position: Applied Physicist Score: 0.7737
Recommended a data fusion approach for reliable and inexpensive detection of damage in SCUBA/SCBA bottles for the U.S. Department of Transportation's Research and Special Programs Administration (RSPA)

Designed a wireless mesh network strain sensor system, to date the basis o

That also looks pretty reasonable: my time with Atomic Energy of Canada came in first, followed by my applied physics jobs. I do think the "data fusion" blurb was a bit of a misfire, but with some rewording I could probably get better results.  For my Magnetics Engineer / Applied Physicist job, the snippets that talk about magnets come out as particularly relevant, as you'd expect (and hope).

## Conclusion
So there you go, AI can tailor your resumes for you. I just happened to use the OpenAI API, but if you're looking for alternatives I use [Sentence Transformers embeddings](https://www.sbert.net/examples/applications/computing-embeddings/README.html) [quite a bit](https://github.com/ccoughlin/resume_recommender/blob/main/Resume%20Recommender.ipynb).

Speaking of resume recommendation, you could do something very similar to find the most relevant candidates for a job description you've just posted.  One way might be to score each applicant by their maximum relevancy across all their previous positions, then filter to the top 5 applicants, everyone who was at least 80% relevant, and so on.
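
As a rough sketch of that idea, assuming each applicant's previous positions have already been scored against the job description (for example with something like `recommend_positions`): the `rank_applicants` function, the applicant names, and the scores below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_applicants(
    position_scores: Dict[str, List[float]],
    top_n: Optional[int] = None,
    min_score: Optional[float] = None) -> List[Tuple[str, float]]:
    """
    Ranks applicants by their single most relevant previous position.

    :position_scores: maps an applicant's name to the relevancy score of each of their previous positions
    :top_n: if set, keep at most this many applicants
    :min_score: if set, drop applicants whose best score is below this value
    """
    # Applicants with no scored positions are skipped
    best = [(name, max(scores)) for name, scores in position_scores.items() if scores]
    if min_score is not None:
        best = [(name, score) for name, score in best if score >= min_score]
    best.sort(key=lambda pair: pair[1], reverse=True)
    return best[:top_n]


# Made-up scores for illustration only
applicants = {
    'applicant_a': [0.71, 0.83, 0.64],
    'applicant_b': [0.79, 0.75],
    'applicant_c': [0.88],
}
print(rank_applicants(applicants, min_score=0.8))
# [('applicant_c', 0.88), ('applicant_a', 0.83)]
```

Taking the maximum rewards an applicant for one strongly matching role; swapping in a mean would instead favor applicants whose whole history is moderately relevant.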