### Summarize job skills & requirements with LangChain and GPT-4

This project is a variant of an existing LangChain tutorial. It connects LangChain to GPT-4 to summarize the skills to master in order to be selected for a job offer.

In this example, I use a prebuilt dataset of job descriptions that I had previously curated in the fields of Machine Learning and Deep Learning. Note that most of the postings I found are for senior positions, and that the dataset is very small (14 job descriptions).

I won't share my dataset, but here's the output.


# The output


## Consolidated Summary

The new career path involves a variety of roles such as Deep Learning Researchers, ML Engineers, AI Software Developers, and Senior Software Engineers. These roles require a strong background in computer science, engineering, mathematics, statistics, or a related field with a focus on deep learning, machine learning, or AI. The responsibilities range from designing, building, and testing machine learning models, developing and integrating machine learning models for interactive intent and motion forecasting, implementing revolutionary AI technologies, to mentoring junior developers. The roles require a minimum of 5 years of experience in deep learning or machine learning, with proven experience in successful implementation and optimization of models. The roles also require strong programming skills in Python, experience with deep learning frameworks like TensorFlow, PyTorch, or Keras, and knowledge of big data processing frameworks like Hadoop, Spark, or Flink. Familiarity with the latest developments in Large Language Models (LLMs) and proficiency in other programming languages is also necessary. Exceptional problem-solving, analytical, and critical thinking skills, excellent communication and collaboration skills, and the ability to work effectively in a diverse team environment are also essential.

## Consolidated Top Skills and Technologies

1. **Deep Learning and Machine Learning**: Knowledge and experience in deep learning and machine learning techniques such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Transformers, Reinforcement Learning, and Large Language Models (LLMs).
2. **Programming**: Strong programming skills in Python and proficiency in other programming languages like C/C++, Java, Go, PHP, and JavaScript.
3. **Deep Learning and Machine Learning Frameworks**: Experience with deep learning and machine learning frameworks such as TensorFlow, PyTorch, Keras, HuggingFace, LangChain, etc.
4. **Big Data**: Knowledge of big data processing frameworks such as Hadoop, Spark, or Flink. Experience working with large-scale datasets is essential.
5. **Data Integration and Analysis Platforms**: Knowledge of Palantir Foundry or similar data integration and analysis platforms is a plus.
7. **Problem-Solving**: Exceptional problem-solving, analytical, and critical thinking skills.
8. **Communication**: Excellent communication and collaboration skills, with the ability to work effectively in a diverse team environment.
9. **Multi-tasking**: Ability to adapt to a dynamic work environment and manage multiple projects simultaneously.
10. **Domain Knowledge**: Knowledge of the geoeconomic domain.
11. **Statistical Analysis**: Deep understanding of statistical analysis, probability theory, and experimental design.
12. **NLP/LLMs/Deep Learning**: Experience in building NLP/LLMs or Deep Learning models.
13. **NLP Techniques**: Expertise in sentiment analysis, word embedding, POS tagging, topic modeling, text classification, machine translation, speech recognition, NER, NLG, etc.
14. **Deep Learning Techniques**: Experience with CNNs and RNNs, and understanding of building and training these models.
15. **LMs and LLMs**: Familiarity with GPT, BERT, and Transformer models.
16. **Research Implementation**: Ability to comprehend and implement AI research papers.
17. **Data Handling**: Knowledge and experience in structured and unstructured data Information Extraction, Knowledge Information Retrieval, and Knowledge Representation.
18. **Problem-Solving**: Strong problem-solving and analytical skills.
19. **Big Data Technologies**: Experience with data engineering technologies such as AWS Glue, EMR, Athena, Redshift, Lake Formation, Apache Spark, Apache Hive, Apache Airflow, S3FS, Apache Hudi, and Trino.
20. **Data Pipelines**: The ability to design, build, and maintain scalable and efficient data pipelines.
21. **AngularJS**: Experience in developing and maintaining web applications using AngularJS.
22. **DevOps**: Knowledge of DevOps best practices, including CI/CD pipelines using tools like Terraform, Jenkins, Github actions, Gitflow.
23. **AWS**: Familiarity with AWS and its services like Cognito for user authentication and authorization.
24. **HTML, CSS, and JavaScript**: A strong understanding of these web development technologies.
25. **RESTful APIs and JSON**: Experience working with these.
26. **Microservices Architecture**: Familiarity with this architecture.
27. **Data Management**: Experience in managing data-related concerns like data catalog, data lineage, data quality, data profiling, data discovery, and metadata management.
28. **Engineering and Computer Science Fundamentals**: A degree in CS, Math, or equivalent experience.
29. **Neural Network Architectures**: Expertise in implementing large neural-network architectures such as Transformers.
30. **Cloud Computing**: The ability to adapt algorithms and architectures to modern cloud computing environments (GPUs/TPUs).
31. **MLOps Tools**: Experience with CloudML and MLOps tools like Kubeflow, AWS Sagemaker, Google AI Platform, Azure Machine Learning.
32. **Communication Skills**: Excellent communication skills.
33. **NLP Libraries**: Experience with NLP libraries such as NLTK, spaCy, or Transformers.
34. **Chatbot Development Tools**: Knowledge of chatbot development tools such as Dialogflow, Botpress, or Rasa.
35. **Analytical and Problem-Solving Skills**: Strong analytical and problem-solving skills.
36. **Communication and Interpersonal Skills**: Excellent communication and interpersonal skills.
37. **Attention to Detail**: Strong attention to detail.
38. **Project Management**: Experience in project management and project delivery methodologies.
39. **React, Redux, and/or Typescript**: Experience with these technologies.
40. **GitHub**: Experience with GitHub development workflow.
41. **Unix-based development environment**: Professional experience in this environment.
42. **Matlab**: Experience with Matlab.
43. **Data visualization software development**: This skill.
44. **Databasing experience**: This includes the design of scalable data handling solutions.

## Optional but Beneficial Skills

1. **Large-Scale Data Processing**: Experience with technologies like Hadoop, Spark, and distributed computing systems.
2. **Familiarity with technology stacks such as AWS, Kubernetes, Docker, Kubeflow, Ray, Tensorflow, PyTorch**.

## Suggestions

To prepare for these roles, consider taking advanced courses in deep learning and machine learning, and gain hands-on experience with Python and deep learning frameworks like TensorFlow, PyTorch, or Keras. Familiarize yourself with big data processing frameworks like Hadoop, Spark, or Flink. Enhance your problem-solving and communication skills. Stay updated with the latest developments in Machine Learning, particularly Large Language Models (LLMs). Practice using Machine Learning software tools and libraries such as TensorFlow, PyTorch, HuggingFace, LangChain, etc. Lastly, develop an understanding of the geoeconomic domain.


In [None]:
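!pip install langchain openai tiktoken
# python-dotenv provides load_dotenv; DirectoryLoader's default loader relies on unstructured
!pip install python-dotenv unstructured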

In [2]:
# LLM, prompts, and summarization chain
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

# Load files
from langchain.document_loaders import DirectoryLoader

# Environment variables
import os
from os.path import join
from dotenv import load_dotenv


In [None]:
# Locate the project directories and the .env file
NOTEBOOK_DIR = os.path.dirname(os.path.abspath('')) + '/projet'
ROOT_DIR = os.path.dirname(os.path.abspath('..'))
# Join the parts so a path separator sits between ROOT_DIR and '..'
ENV_DIR = join(ROOT_DIR, '..', 'env')
dotenv_path = join(ENV_DIR, '.env.local')

load_dotenv(dotenv_path)


In [4]:
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
DATA_DIR = NOTEBOOK_DIR + '/data/'
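
`load_dotenv` does not complain when the `.env.local` file is missing, so a wrong path only shows up later as an authentication error. A minimal sketch of an early check follows; `require_env` and `mask` are hypothetical helpers of mine, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file path")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to print."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

In the notebook this could be used as `print(mask(require_env('OPENAI_API_KEY')))`, which confirms the key was loaded without exposing it in the cell output.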

In [None]:
# Load every .txt file under DATA_DIR as a separate document
loader = DirectoryLoader(DATA_DIR, glob="**/*.txt", show_progress=True)
docs = loader.load()
len(docs)

In [18]:
map_prompt = """You are a helpful AI bot that helps a person find skills to master to build a new career path.
Below is information about the role.
Information will include job descriptions and skills requirements.
Your goal is to generate a list of top skills and technologies the person needs to be qualified in.
Use specifics from the information when possible.

% START OF INFORMATION:
{text}
% END OF INFORMATION:

Please respond with a summary, a list and suggestions based on the topics above

YOUR RESPONSE:"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])
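
To see what the model receives for a single job description, the `{text}` placeholder can be filled in by hand. The sketch below uses plain `str.format` on a shortened, hypothetical template; I am assuming that `PromptTemplate` with its default settings substitutes the variable in much the same way, so treat this as an illustration, not as the library's code path.

```python
# Hypothetical, shortened template; only the {text} placeholder matters here.
DEMO_TEMPLATE = """Below is information about the role.

% START OF INFORMATION:
{text}
% END OF INFORMATION:

YOUR RESPONSE:"""


def render_prompt(template: str, text: str) -> str:
    """Substitute one job description into the {text} placeholder."""
    return template.format(text=text)


# Made-up job description, for illustration only.
job_description = "Senior ML Engineer: 5+ years of Python, PyTorch, Spark."
prompt = render_prompt(DEMO_TEMPLATE, job_description)
print(prompt)
```

Printing one rendered prompt before running the whole chain is a cheap way to catch a misplaced placeholder.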

In [19]:
combine_prompt = """
You are a helpful AI bot that helps an experienced front-end software engineer build a new career path.
You will be given a summary of roles, suggestions, and a list of potential top skills and technologies the person needs to be qualified in.

Please consolidate the information about all the different roles and create one list of skills and one summary from all the content you've received.

Respond in markdown format.

% SUMMARY AND LIST
{text}
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])


In [20]:
llm = ChatOpenAI(temperature=0, model_name='gpt-4')

chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             map_prompt=map_prompt_template,
                             combine_prompt=combine_prompt_template,
                             verbose=True
                            )
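
Conceptually, the `map_reduce` chain type summarizes each document on its own and then consolidates the partial results in one final call. The sketch below shows that idea with a stand-in model; `map_reduce_summarize` and `fake_llm` are hypothetical names, and the real chain may do more (for example, collapsing intermediate summaries that are too long), which this sketch leaves out.

```python
from typing import Callable, List


def map_reduce_summarize(
    documents: List[str],
    map_prompt: str,
    combine_prompt: str,
    llm: Callable[[str], str],
) -> str:
    """Summarize each document on its own, then consolidate the summaries."""
    # Map step: one model call per job description.
    partial_summaries = [llm(map_prompt.format(text=doc)) for doc in documents]
    # Reduce step: a single model call over the concatenated partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(combine_prompt.format(text=combined))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: reports the size of the prompt it received."""
    return f"<summary of {len(prompt)} chars>"


demo = map_reduce_summarize(
    ["job A", "job B"], "Summarize: {text}", "Consolidate: {text}", fake_llm
)
print(demo)
```

With N job descriptions this makes N + 1 model calls, which is worth keeping in mind for cost when using GPT-4.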



In [None]:
output = chain({"input_documents": docs})

In [None]:
print(output['output_text'])

In [23]:
with open("out-result-skills-to-master.md", "w", encoding="utf-8") as text_file:
    text_file.write(output['output_text'])
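
The consolidated list in the output repeats a few headings (Problem-Solving appears more than once, for instance). A small post-processing check can flag such repeats. This is only a sketch: `repeated_headings` is a hypothetical helper, and it assumes the model keeps the `N. **Heading**: ...` list format seen above.

```python
import re
from collections import Counter
from typing import List

# Matches numbered list items whose label is in bold, e.g. "7. **Problem-Solving**: ..."
HEADING = re.compile(r"^\s*\d+\.\s+\*\*(.+?)\*\*")


def repeated_headings(markdown: str) -> List[str]:
    """Return bold list headings that occur more than once, ignoring case."""
    names = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            names.append(match.group(1).strip().lower())
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Calling `repeated_headings(output['output_text'])` after the chain has run lists the headings worth merging by hand or with a stricter combine prompt.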
