In [26]:
from PyPDF2 import PdfReader
import os
import string
from google.cloud import storage
import json
from IPython.display import display, Markdown, Latex
import pandas as pd


In [5]:
dialogue = """
C
Welcome to Career Catalyst! We are here to support you in discovering the career that has you fulfilled. If you could have anything in your career, what would it be?

Before we begin, please also enter your resume on the left tool bar.

Your message

Hi Career catalyst, I want to work on interesting, practical problems that impact a large number of users. I want to work in a collaborative environment with a team of intelligent people. I want there to be some work-life balance and not have my impact completely tied to hours worked.

C
question: Say more about your aspirations for the next 2 years?

In the short term, I want to transition to a more technical role, more like a machine learning engineer. I want to build cool, impactful products.

C
question: Say more about what you imagine your life style to be and how work supports that.

I will be learning on the side and working on side projects. But I also have time for other hobbies and contributing to the community. While my work as a data scientist/ml engineer is a big part of what I care about, I have a well-rounded life.

C
question: What is the highest level of impact you can envision?

I'd like to work on products that impact millions of people. I want some element of research to be a part of it, but prefer doing applied work with all the technical real-world complexities that arise.

C
question: Say more about what your goals are for your career in the next 5 years.

I'd like to then transition to more of a managerial role and really understand what it takes to lead teams of people in complex organizations.

C
question: Say more about your aspirations for the next 10 years.

In 10 years, I'd like to be building something from the ground up. I've picked a domain that I'm really passionate about and have a strong network of contacts I can draw upon. I have a well-rounded skill set, and also know when to delegate.

C
question: Say more about what role finance and management will play in your career.

Being financially independent is important to me. As is the ability to build a supporting team. I want financial independence and for my investments and businesses to pay for my lifestyle while growing somewhat independently of my attention.

"""

In [6]:
from typing import List

from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel


def embed_text(
    texts,
    task: str = "RETRIEVAL_DOCUMENT",
    model_name: str = "textembedding-gecko@002",
) -> List[List[float]]:
    """Embeds texts with a pre-trained, foundational model."""
    model = TextEmbeddingModel.from_pretrained(model_name)
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    embeddings = model.get_embeddings(inputs)
    return [embedding.values for embedding in embeddings]


In [9]:
PROJECT_ID = ! gcloud config get-value project
PROJECT_ID = PROJECT_ID[0]
LOCATION = "us-central1"
if PROJECT_ID == "(unset)":
    print(f"Please set the project ID manually below")

In [11]:
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)

In [12]:
DEPLOYED_INDEX_ID = "vs_quickstart_deployed_143"
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name='projects/809892116196/locations/us-central1/indexEndpoints/2671712100425924608'
)

In [13]:
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = GenerativeModel(model_name="gemini-1.0-pro")


In [14]:
def read_pdf(file_location):
    reader = PdfReader(file_location)
    accum = []
    for page in reader.pages:
        accum.append(page.extract_text())
    return '\n'.join(accum)

In [16]:
aaron_resume = read_pdf('Data Analyst/Aaron Li Resume 2023.pdf')

In [18]:
prompt_pre = """
You are a seasoned career coach. you have coached dozens of clients. You help clients discover what they want out of their career. 

You are giving advice to a data analyst with the resume below. He wants to switch to a more technical data scientist or machine learning engineer
role and is seeking your guidance to improve.
what does his resume look like 3 years. He has also had a chat with you

resume: {resume}
chat contents: {chat_contents}
resume in three years:

"""

hypothetical_resume = model.generate_content(
        # Add an example query
        prompt_pre.format(
            resume=aaron_resume,
            chat_contents=dialogue
                         )
)

In [22]:
display(Markdown(hypothetical_resume.text))

## Aaron Li - Data Scientist / Machine Learning Engineer

**[Your website or portfolio link here]** | (314) 250-4579 | New York, NY

**Summary of Qualifications:**

Seasoned data scientist with over 5 years of experience specializing in the application of AI and machine learning to solve complex business problems. Proven ability to design and implement ETL pipelines, analyze large datasets, and build models that deliver significant business impact. Expert in various programming languages including Python, SQL, and R, as well as machine learning libraries like TensorFlow and PyTorch. Passionate about driving innovation and collaborating with cross-functional teams to achieve common goals.

**Technical Skills:**

* **Programming Languages:** Python, SQL, R, Java
* **Machine Learning Libraries:** TensorFlow, PyTorch, Scikit-learn
* **Data Analysis Tools:** Pandas, NumPy, Spark, Airflow
* **Cloud Platforms:** AWS, Azure
* **Visualization Tools:** Tableau, Power BI, Grafana

**Experience:**

**Machine Learning Engineer** | Acme Corporation | New York, NY | 2024 - Present

* Developed and deployed machine learning models that improved customer churn prediction by 20%, leading to a 15% increase in customer retention.
* Led the design and implementation of an automated system for anomaly detection in financial transactions, preventing over $1 million in fraudulent activity.
* Developed a natural language processing model for sentiment analysis of social media data, generating insights that informed product development and marketing campaigns.
* Built and deployed a recommendation engine for e-commerce platform, resulting in a 10% increase in average order value.

**Data Scientist** | Apex Inc. | San Francisco, CA | 2022 - 2024

* Designed and implemented ETL pipelines for processing large-scale datasets from diverse sources.
* Performed exploratory data analysis and identified key insights that informed business decisions.
* Built and deployed machine learning models for various applications, including fraud detection, customer segmentation, and predictive maintenance.
* Collaborated with cross-functional teams to translate data insights into actionable recommendations and optimize business processes.

**Education:**

Master of Science in Data Science, Columbia University, New York, NY (2022)
Bachelor of Science in Mathematics, University of California, Berkeley (2020)

**Projects:**

* Developed a deep learning model for image classification that achieved 95% accuracy on the ImageNet dataset.
* Built a real-time sentiment analysis tool for Twitter data that analyzes public opinion on trending topics.
* Implemented a collaborative filtering algorithm for movie recommendations that personalized user experience.

**Leadership & Activities:**

* Co-organized a machine learning workshop for aspiring data scientists.
* Mentored junior data science interns on applying machine learning techniques to real-world problems.
* Actively participated in meetups and conferences to stay informed about the latest advancements in the field.

**Key Strengths:**

* Strong analytical and problem-solving skills.
* Deep understanding of machine learning algorithms and their applications.
* Excellent communication and collaboration skills.
* Ability to translate complex data insights into actionable recommendations.
* Passionate about innovation and applying data science to real-world problems.

This resume reflects your aspirations and goals, combining your passion for impactful work, research, and leadership with a focus on machine learning and data science expertise. It emphasizes your technical skills, achievements, and leadership experience, positioning you as a strong candidate for technical data scientist and machine learning engineer roles. Remember to personalize your resume further by incorporating specific details and experiences relevant to each job application. I'm confident that with your dedication and continued learning, you will achieve your dream career path! 


In [24]:
hypothetical_resume_embedding = embed_text(texts=[hypothetical_resume.text])

In [25]:
response = my_index_endpoint.find_neighbors(
    deployed_index_id = DEPLOYED_INDEX_ID,
    queries = [hypothetical_resume_embedding[0]],
    num_neighbors = 10
)

# show the results
for idx, neighbor in enumerate(response[0]):
    print(neighbor)
    print(f"{neighbor.distance:.2f} {neighbor.id}")

MatchNeighbor(id='5', distance=0.03427734971046448, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.03 5
MatchNeighbor(id='41', distance=0.031075958162546158, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.03 41
MatchNeighbor(id='9', distance=0.02756562829017639, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.03 9
MatchNeighbor(id='3', distance=0.0209933053702116, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.02 3
MatchNeighbor(id='30', distance=0.019494637846946716, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.02 30
MatchNeighbor(id='27', distance=0.01804642751812935, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.02 27
MatchNeighbor(id='46', distance=0.01592842862010002, feature_vector=[], crowding_tag='0', restricts=[], numeric_restricts=[])
0.02 46
MatchNeighbor(id='25', distance=0.015420296229422092, feature_vecto

In [27]:
with open('career-catalyst-standard-resume/standardized-resume-text/2024-04-20/data_analyst_resumes.jsonl') as f:
    resumes = pd.DataFrame([json.loads(l) for l in f])

In [30]:
resumes.iloc[5].tolist()

['# EXPERIENCE\n\n - Quality Technician at Intralox, performing quality inspections and tests, conducting root cause analysis, and collaborating with teams to improve processes.\n- Quality Manager at IWS Gas and Supply, leading a $500k lab expansion project, ensuring ISO 17025 compliance, overseeing quality control programs, and implementing quality training.\n- Manufacturing Data Analyst at IWS Gas and Supply, analyzing data to identify trends and opportunities for improvement, developing data collection systems, and conducting internal audits.\n- Skilled in quality assurance, professional software, measurement tools, root cause analysis, statistical process control, documentation, quality management systems, data analysis, process optimization, statistical tools, data visualization, predictive maintenance, cross-functional collaboration, chemical testing, laboratory safety, instrumentation, data interpretation, quality control, and technical documentation.\n- Proficient in Excel, Sig

In [35]:
real_resumes = resumes.iloc[[5, 41, 9, 3, 30, 27, 46, 25, 17, 12]]['content'].tolist()

In [36]:
prompt_augmented_with_data = """
You are a seasoned career coach. you have coached dozens of clients. You help clients discover what they want out of their career. 

You are giving advice to a data analyst with the resume below. He wants to switch to a more technical data scientist or machine learning engineer
role and is seeking your guidance to improve.
what does his resume look like 3 years. He has also had a chat with you.
Use the real resumes to make the output more realistic.

resume: {resume}
other real resumes: {real_resumes}
chat contents: {chat_contents}
resume in three years:

"""

hypothetical_resume = model.generate_content(
        # Add an example query
        prompt_pre.format(
            resume=aaron_resume,
            real_resumes = real_resumes,
            chat_contents=dialogue
                         )
)

In [37]:
display(Markdown(hypothetical_resume.text))

## AARON LI

**(314) 250-4579** | **aaron.li.workday@gmail.com** | **New York, NY**

**SUMMARY OF QUALIFICATIONS**

Aaron brings over five years of experience in data analysis, business intelligence, and machine learning. He is proficient in SQL, Python, and cloud platforms, with a strong focus on building machine learning models and developing ETL pipelines. Aaron demonstrates expertise in data visualization, using tools like Grafana, Tableau, and Power BI for effective communication and decision-making. 

Aaron is recognized for his problem-solving abilities, collaborative spirit, and passion for learning. 


** EDUCATION** 


* **Master of Science in Business Analytics**, Financial Technology Analysis Track (2023)
  * Olin School of Business, Washington University in St. Louis, St. Louis, MO

* **Bachelor of Business Administration** (2022)
  * Major in Finance
  * Minor in Data Science
  * Northeastern University, Boston, MA


**PROFESSIONAL EXPERIENCE**

**Machine Learning Engineer** (2024 - Present)
* **Acme Corporation**, New York, NY

* Developed and deployed machine learning models for various use cases, including churn prediction, fraud detection, and customer segmentation.
* Implemented end-to-end machine learning pipelines using cloud platforms (e.g., AWS, GCP).
* Conducted A/B testing and monitored model performance for continuous improvement. 
* Contributed to the development of an automated Machine Learning platform for faster experimentation and deployment.
* Collaborated effectively with cross-functional teams to translate business needs into technical solutions.


**Data Analyst** (2022 - 2024)
* **Haddee Education, Jersey City, NJ**

* Built ETL pipelines using Python for loading and processing large datasets.
* Developed interactive dashboards in Grafana using SQL and Python for data visualization and exploration.
* Performed analysis of customer purchase order data to provide insights and recommendations to improve business operations.

**Data Analyst (Capstone Project)** (2022)
* **Ascension Healthcare, St. Louis, MO**

* Developed an ETL pipeline to extract, clean, and transform financial data using Python and cloud platforms.
* Performed time series analysis to identify financial trends and seasonality patterns.
* Built machine learning models for predicting healthcare costs.

**Data Science Consulting Intern** (2022)
* **Big Data & Analytics, HCR, Beijing, China**

* Conducted data mining analysis on customer feedback from online forums and surveys.
* Built NLP models for extracting insights from unstructured text data.
* Performed cluster analysis to identify distinct customer segments based on their needs and preferences.

**LDATopic Extraction for Vaccine Discourse** (2021)

* Developed an NLP pipeline for extracting key topics and sentiment from vaccine-related articles, research papers, and social media data.
* Used topic modeling to analyze the evolution of public discourse on vaccines over time.
* Identified emerging trends and insights relevant to vaccine development, communication, and public understanding.

**Business Analyst** (2019)
* **Northeastern Accounting Department, Boston, MA**

* Developed Power BI dashboards to visualize key financial metrics for budget planning and performance analysis.
* Conducted ad-hoc data analysis to support departmental decision-making.

**SKILLS & TECHNOLOGIES**

* Programming Languages: Python, SQL, R
* Data Engineering: Airflow, Spark, AWS, GCP
* Machine Learning: Scikit-learn, TensorFlow, PyTorch
* NLP Libraries: NLTK, Gensim, spaCy
* Cloud Computing: AWS EC2, S3, Google Cloud Platform, Azure
* Data Visualization: Grafana, Tableau, Power BI
* Communication: Excellent communication and presentation skills.
* Teamwork: Proven ability to collaborate effectively in a team-oriented environment.
