**Installing Required Libraries**

In [None]:
!pip install langchain-core langchain_groq langchain_community



**Setting up the AI model**

In [None]:
from langchain_groq import ChatGroq
llm = ChatGroq(
    temperature = 0,
    groq_api_key = " ",  # replace with your Groq API key
    model_name = "llama-3.3-70b-versatile"
)

response = llm.invoke("What is the Meaning of Anubhab")
print(response.content)

"Anubhab" (अनुभव) is a Sanskrit word that has a rich meaning. It is a noun that can be translated to English as "experience" or "perception." However, the depth of its meaning goes beyond these simple translations.

In essence, "Anubhab" refers to the direct, personal experience or realization of something, often related to spiritual or philosophical concepts. It encompasses the idea of gaining insight or understanding through direct perception, intuition, or personal encounter.

In Hinduism, Buddhism, and other Eastern spiritual traditions, "Anubhab" is often used to describe the experience of enlightenment, self-realization, or the attainment of higher states of consciousness. It implies a deep, subjective understanding that transcends intellectual knowledge or mere intellectual comprehension.

In everyday usage, "Anubhab" can also refer to a person's individual experiences, feelings, or emotions, such as joy, sorrow, or wonder. It acknowledges that each person's experiences and perc
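
The cell above passes the API key as a string literal. A minimal sketch of reading it from an environment variable instead is shown below. The variable name `GROQ_API_KEY` and the helper `get_api_key` are assumptions made for illustration; they are not part of the original notebook.

```python
import os

def get_api_key(var_name: str = "GROQ_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError when the variable is missing or blank, so the
    problem surfaces before any request is sent to the model.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The returned string can then be passed as `groq_api_key=get_api_key()` when constructing `ChatGroq`, which keeps the key out of the notebook itself.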

**Extracting Job Descriptions from Websites**

In [None]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.amazon.jobs/en/jobs/2839145/machine-learning-engineer")
page_data = loader.load().pop().page_content
print(page_data)

Amazon.jobs
Skip to main contentHomeTeamsLocationsJob categoriesYour job applicationResourcesDisability accommodationsBenefitsInclusive experiencesInterview tipsLeadership principlesWorking at AmazonFAQ×Sorry, the job you're looking for isn't available.There are other opportunities you might be interested in. Check them out by searching for jobs below.Search for jobsCheck out our openings by  searching with job titles, teams, or locations.SearchGet to know usLearn about working at Amazon, and  read the stories of our pioneers.Learn moreLearn how we hireGet answers from our FAQs on the  hiring and interviewing process.Go to FAQsJOIN US ONFind CareersJob CategoriesTeamsLocationsUS and EU Military recruitingWarehouse and Hourly JobsWorking At AmazonCultureBenefitsAmazon NewsletterDiversity at AmazonOur leadership principlesHelpFAQInterview tipsReview application statusDisability accommodationsEU background checksAmazon is committed to a diverse and inclusive workplace. Amazon is an equal 
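
The scraped text above is a "job not available" placeholder rather than a real posting, which is why the extraction step later returns an empty list. A rough sketch of a check that could run before calling the model is shown below. The marker phrases and the helper names `normalize_text` and `looks_unavailable` are assumptions and would need extending for other career sites.

```python
import re

# Phrases that suggest the page is a placeholder rather than a posting.
# This list is an assumption based on the Amazon page shown above.
UNAVAILABLE_MARKERS = (
    "isn't available",
    "no longer available",
    "job not found",
)

def normalize_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def looks_unavailable(text: str) -> bool:
    """Return True when the text contains one of the known placeholder phrases."""
    # Treat the typographic apostrophe like the plain ASCII one.
    lowered = normalize_text(text).lower().replace("\u2019", "'")
    return any(marker in lowered for marker in UNAVAILABLE_MARKERS)
```

Calling `looks_unavailable(page_data)` before building the chain would let the notebook stop early with a clear message instead of sending a placeholder page to the model.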

**Creating a Prompt to Extract Job Data**

In [None]:
from langchain_core.prompts import PromptTemplate
prompt_extract = PromptTemplate.from_template("""
        ### SCRAPED TEXT FROM WEBSITE:
        {page_data}
        ### INSTRUCTION:
        The scraped text is from the careers page of a website.
        Your job is to extract the job postings and return them in JSON format containing the
        following keys: `role`, `experience`, `skills` and `description`.
        Only return the valid JSON.
        ### VALID JSON (NO PREAMBLE):
        """)

**Running the AI model on the Job Description**

In [None]:
chain_extract = prompt_extract | llm
res = chain_extract.invoke(input = {'page_data': page_data})
print(res.content)

```json
[]
```


**Converting the AI Output into JSON**

In [None]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
print(json_res)

[]
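
The model wrapped its answer in a Markdown code fence, and the parser still returned a Python list. To make that step concrete, here is a standard-library sketch of the same idea. It is not the `JsonOutputParser` implementation; `parse_fenced_json` is a hypothetical helper written for illustration.

```python
import json
import re

def parse_fenced_json(raw: str):
    """Parse JSON that may or may not be wrapped in a Markdown code fence."""
    text = raw.strip()
    # Capture whatever sits between an opening fence (optionally tagged
    # "json") and the closing fence.
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

For the output shown above, `parse_fenced_json(res.content)` would yield an empty list, matching what the parser printed.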


**Loading a Technology List**

In [None]:
import pandas as pd
df = pd.read_csv("/content/technology_list_diverse.csv")
df

Unnamed: 0,ID,Technology
0,0,"Python, Django, MySQL"
1,1,"Machine Learning, Python, TensorFlow"
2,2,"Large Languae Models, Python, PyTorch"
3,3,"Deep Learning, Keras, Python"
4,4,"Flutter, Firebase, GraphQL"
5,5,"React Native, Node.js, MongoDB"
6,6,"IOS, Swift, Core Data"
7,7,"Android, Java, Room Persistence"
8,8,"Kotlin, Android, Firebase"
9,9,"Web Development, React, Node.js"


**Installing ChromaDB (Vector Database)**

In [None]:
!pip install chromadb



**Setting up ChromaDB**

In [None]:
import chromadb

client = chromadb.Client()
# Get the collection if it exists; otherwise create it.
collections = client.get_or_create_collection(name="interview")

**Storing Job Data in ChromaDB**

In [None]:
collections.add(
    documents = [
        "This document is about Kedarnath",
        "This document is about Maharastra"
    ],

    ids = ['id1', 'id2'],
    metadatas = [
        {"url": "https://shrikedarnathcharitabletrust.uk.gov.in/index.html"},
        {"url": "https://www.dagdushethganpati.com/"},
    ]
)

all_docs = collections.get()
print(all_docs)

all_docs

documents = collections.get(ids = ['id1'])
documents

results = collections.query(
    query_texts = ['Query is about Ganpati'],
    n_results = 2
)

results



{'ids': ['id1', 'id2'], 'embeddings': None, 'documents': ['This document is about Kedarnath', 'This document is about Maharastra'], 'uris': None, 'data': None, 'metadatas': [{'url': 'https://shrikedarnathcharitabletrust.uk.gov.in/index.html'}, {'url': 'https://www.dagdushethganpati.com/'}], 'included': [<IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}


{'ids': [['id2', 'id1']],
 'embeddings': None,
 'documents': [['This document is about Maharastra',
   'This document is about Kedarnath']],
 'uris': None,
 'data': None,
 'metadatas': [[{'url': 'https://www.dagdushethganpati.com/'},
   {'url': 'https://shrikedarnathcharitabletrust.uk.gov.in/index.html'}]],
 'distances': [[1.1875228881835938, 1.2304192781448364]],
 'included': [<IncludeEnum.distances: 'distances'>,
  <IncludeEnum.documents: 'documents'>,
  <IncludeEnum.metadatas: 'metadatas'>]}
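
The query result is nested: each key holds one inner list per query text. The sketch below flattens the hits for a single query into one record per document, assuming the result has the `ids`, `documents`, `metadatas` and `distances` keys shown in the output above. `flatten_query_results` is a hypothetical helper.

```python
def flatten_query_results(results, query_index=0):
    """Return one dict per hit for a single query, in the order Chroma returned them."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    metas = results["metadatas"][query_index]
    dists = results["distances"][query_index]
    return [
        {"id": i, "document": d, "url": (m or {}).get("url"), "distance": dist}
        for i, d, m, dist in zip(ids, docs, metas, dists)
    ]
```

Calling `flatten_query_results(results)` on the output above would give two records, with the `id2` document first because it has the smaller distance.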

In [None]:
import uuid
import chromadb

In [None]:
client = chromadb.PersistentClient()
collections = client.get_or_create_collection(name = "technology_list_diverse_data")

**Matching Skills from Job Descriptions**

In [None]:
df

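# Populate the collection only once, with one document per CSV row.
if not collections.count():
  for _, row in df.iterrows():
    collections.add(documents=[row['Technology']], ids=[str(uuid.uuid4())])

# Each query text gets its own list of the two closest technology rows.
tech = collections.query(query_texts=['Experience in Python', 'MERN stack'], n_results=2).get('documents', [])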

tech

json_res

# Pull the `skills` field from the parsed output, which may be a dict,
# a list of dicts, or an empty list.
if isinstance(json_res, dict):
  job = json_res.get("skills", [])
elif isinstance(json_res, list) and len(json_res) > 0:
  job = json_res[0].get("skills", [])
else:
  job = []  # nothing was extracted, for example when json_res is []



prompt_skills_and_question = PromptTemplate.from_template("""
        ### JOB DESCRIPTION:
        {job_description}

        ### INSTRUCTION:
        You are Mishu Dhar Chando, the CEO of Knowledge Doctor, a YouTube channel specializing in educating individuals on machine learning, deep learning, and natural language processing.
        Your expertise lies in bridging the gap between theoretical knowledge and practical applications through engaging content and innovative problem-solving techniques.
        Your job is to:
        1. Analyze the given job description to identify the required technical skills and match them with the provided skill set to calculate a percentage match.
        2. Generate a list of relevant interview questions based on the job description (20 Medium to Advanced Level Interview Questions).
        3. Return the information in JSON format with the following keys:
            - `skills_match`: A dictionary where each key is a skill, and the value is the matching percentage.
            - `interview_questions`: A list of tailored questions related to the job description.

        Only return the valid JSON.
        ### VALID JSON (NO PREAMBLE):

        """)

In [None]:
chain_skills_and_question = prompt_skills_and_question | llm
res = chain_skills_and_question.invoke({"job_description": str(job)})
print(res.content)


```json
{
    "skills_match": {
        "Machine Learning": 90,
        "Deep Learning": 85,
        "Natural Language Processing": 95,
        "Problem-Solving": 80,
        "Content Creation": 70
    },
    "interview_questions": [
        "What is your experience with machine learning frameworks such as TensorFlow or PyTorch?",
        "How do you approach a deep learning project, from data preprocessing to model deployment?",
        "Can you explain the concept of attention mechanisms in natural language processing?",
        "How do you handle overfitting in machine learning models?",
        "What is your experience with data visualization tools such as Matplotlib or Seaborn?",
        "How do you evaluate the performance of a machine learning model?",
        "Can you describe a project where you applied transfer learning to achieve state-of-the-art results?",
        "How do you approach feature engineering for a machine learning project?",
        "What is your experience wit

**Installing Gradio (For Web Interface)**

In [None]:
!pip install gradio




**Importing the Libraries for the Web App**

In [None]:
import gradio as gr
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
import pandas as pd
import uuid
import chromadb
from langchain_groq import ChatGroq

**Creating a Web App with Gradio**

In [None]:
llm = ChatGroq(
    temperature = 0,
    groq_api_key = " ",  # replace with your Groq API key
    model_name = "llama-3.3-70b-versatile"
)

**Handling User Input in the Web App**

In [None]:
def preproces_job_posting(url, technology_list_diverse_csv):
  loader = WebBaseLoader(url)
  page_data = loader.load().pop().page_content
  prompt_extract = PromptTemplate.from_template("""
        ### SCRAPED TEXT FROM WEBSITE:
        {page_data}
        ### INSTRUCTION:
        The scraped text is from the careers page of a website.
        Your job is to extract the job postings and return them in JSON format containing the
        following keys: `role`, `experience`, `skills` and `description`.
        Only return the valid JSON.
        ### VALID JSON (NO PREAMBLE):
        """)

  chain_extract = prompt_extract | llm
  res_1 = chain_extract.invoke(input = {'page_data': page_data})
  json_parser = JsonOutputParser()
  json_res = json_parser.parse(res_1.content)

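  # Use the uploaded CSV when one is provided; otherwise fall back to the default path.
  uploaded = technology_list_diverse_csv
  csv_path = getattr(uploaded, "name", uploaded) or "/content/technology_list_diverse.csv"
  df = pd.read_csv(csv_path)

  client = chromadb.PersistentClient('vectorstore')
  collections = client.get_or_create_collection(name="technology_list_diverse_app")
  if not collections.count():
    for _, row in df.iterrows():
      collections.add(documents=[row['Technology']], ids=[str(uuid.uuid4())])

  # The parsed output may be a dict, a list of dicts, or an empty list.
  if isinstance(json_res, dict):
    job = json_res.get('skills', [])
  elif isinstance(json_res, list) and len(json_res) > 0:
    job = json_res[0].get('skills', [])
  else:
    job = []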

  prompt_skills_and_question = PromptTemplate.from_template("""
        ### JOB DESCRIPTION:
        {job_description}

        ### INSTRUCTION:
        You are Mishu Dhar Chando, the CEO of Knowledge Doctor, a YouTube channel specializing in educating individuals on machine learning, deep learning, and natural language processing.
        Your expertise lies in bridging the gap between theoretical knowledge and practical applications through engaging content and innovative problem-solving techniques.
        Your job is to:
        1. Analyze the given job description to identify the required technical skills and match them with the provided skill set to calculate a percentage match.
        2. Generate a list of relevant interview questions based on the job description (20 Medium to Advanced Level Interview Questions).
        3. Return the information in JSON format with the following keys:
            - `skills_match`: A dictionary where each key is a skill, and the value is the matching percentage.
            - `interview_questions`: A list of tailored questions related to the job description.

        Only return the valid JSON""")

  chain_skills_and_question = prompt_skills_and_question | llm
  res1 = chain_skills_and_question.invoke({"job_description": str(job)})
  final_result = json_parser.parse(res1.content)
  return final_result
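
The function returns whatever the model produced once it parses as JSON. A light structural check can catch responses that parse but do not follow the requested shape. The sketch below assumes the two keys named in the prompt, `skills_match` and `interview_questions`; `validate_analysis` is a hypothetical helper.

```python
def validate_analysis(result):
    """Return a list of problems found in the parsed model output (empty if none)."""
    if not isinstance(result, dict):
        return ["result is not a JSON object"]
    problems = []
    skills = result.get("skills_match")
    if not isinstance(skills, dict):
        problems.append("skills_match is missing or not a dictionary")
    else:
        for name, value in skills.items():
            # bool is a subclass of int, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0 <= value <= 100:
                problems.append(f"skills_match[{name!r}] is not a percentage between 0 and 100")
    questions = result.get("interview_questions")
    if not isinstance(questions, list) or not all(isinstance(q, str) for q in questions):
        problems.append("interview_questions is missing or not a list of strings")
    return problems
```

An empty list means the structure looks as requested; it says nothing about whether the percentages or questions are sensible.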

**Running the Web App**

In [None]:
def gradio_interface(url, technology_list_diverse_csv):
  try:
    result = preproces_job_posting(url, technology_list_diverse_csv)
    return result
  except Exception as e:
    return {"error": str(e)}

with gr.Blocks(theme='Respair/Shiki@1.2.1') as app:
  gr.Markdown("# Job Scraping & Analyzer with Interview Preparation Questions Using Gen-AI")

  with gr.Row():
    url_input = gr.Textbox(label = "Website URL", placeholder="Enter the url of the job posting")
    portfolio_input = gr.File(label = "Upload Portfolio CSV")

  analyze_button = gr.Button("Analyze Job Posting")
  output_box = gr.JSON(label = "Result")

  analyze_button.click(gradio_interface, inputs = [url_input, portfolio_input], outputs = output_box)

app.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1d7e16dcf39bd42820.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
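
The `url_input` textbox accepts any string, and a malformed value only fails once the loader tries to fetch it. A small pre-check with the standard library is sketched below. `is_http_url` is a hypothetical helper; it checks only the URL's form, not whether the page exists.

```python
from urllib.parse import urlparse

def is_http_url(value: str) -> bool:
    """Return True when value parses as an http(s) URL with a host name."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Inside `gradio_interface`, a failed check could return an error dictionary right away instead of waiting for the loader to raise.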


