# Programmatic SEO using CrackedDevs API and LLMs
<a target="_blank" href="https://colab.research.google.com/github/batuhanaky/crackkeddevs-programmatic-seo/crackeddevs_seo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Have you ever searched "Software Engineer Salaries" on Google?

Major job boards like Glassdoor and Indeed use programmatic SEO for generating aggregated reports in bulk, and they rank very well on Google.

I bet you have clicked on one of their programmatic pages at least once in your life, and know what I'm talking about. Those pages are useful for users, and they generate free traffic for the job boards.


In our journey, we will:
- Get data from the CrackedDevs API.
- Process the data to make it suitable for our project.
- Cluster the data together using LLMs.
- Generate human-friendly unique templates using LLMs.
- Dynamically populate the templates and generate Programmatic SEO pages for CrackedDevs.


Since we don't have direct access to the CrackedDevs database, we will use their API. Using the API for such a task is not the optimal solution; however, that's what we have right now.

You may think of this hackathon build as a tutorial, as well.


To keep it simple and accessible, we will only use this notebook for the entire project. I could write a fully equipped Python library and a CLI tool; however, I want this project to teach people some stuff. That seems more important than winning the hackathon.

## Setting up the environment

In [1]:
!pip install openai==1.7.2 retry

Collecting openai==1.7.2
  Downloading openai-1.7.2-py3-none-any.whl (212 kB)
Collecting retry
  Downloading retry-0.9.2-py2.py3-none-any.whl (8.0 kB)
Collecting httpx<1,>=0.23.0 (from openai==1.7.2)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting typing-extensions<5,>=4.7 (from openai==1.7.2)
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting py<2.0.0,>=1.4.26 (from retry)
  Downloading py-1.11.0-py2.py3-none-any.whl (98 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai==1.7.2)
  Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)

In [2]:
import requests
import pandas as pd
import openai
from typing import Optional, Dict, Any, List
from concurrent.futures import ThreadPoolExecutor
from retry import retry
from pprint import pprint
from IPython.display import Markdown
import os

## Your API Keys
**Important**: You need to provide your API keys in order to use this notebook. Since the notebook runs on your own account, I won't be able to see your keys.

In [3]:
CRACKEDDEVS_API_KEY = "your-api-key"
OPENAI_API_KEY="your-api-key"
openai.api_key = OPENAI_API_KEY
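
If you would rather not paste keys into the notebook itself, one option is to read them from environment variables and fall back to the placeholder string. This is only a sketch: the helper `load_key` is hypothetical, and using `CRACKEDDEVS_API_KEY` and `OPENAI_API_KEY` as environment variable names is my assumption, not something either API requires.

```python
import os

def load_key(env_name: str, fallback: str = "your-api-key") -> str:
    """Return the key stored in the environment variable, or the fallback.

    `load_key` is a hypothetical helper written for this sketch.
    """
    value = os.environ.get(env_name, "").strip()
    return value if value else fallback

# Assumed variable names; set them in your shell or Colab session first
CRACKEDDEVS_API_KEY = load_key("CRACKEDDEVS_API_KEY")
OPENAI_API_KEY = load_key("OPENAI_API_KEY")
```

If a variable is not set, the sketch quietly keeps the placeholder, so the API calls will still fail until a real key is supplied.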

## Functions
I bundled all functions together to tidy up the notebook. I tried not to wrap short code pieces within functions, as they help with the narration.

In [4]:
def get_jobs(api_key: str, limit: int = 10, page: int = 1,
             min_salary: Optional[int] = None, max_salary: Optional[int] = None,
             job_types: Optional[str] = None, degree_required: Optional[bool] = None,
             technologies: Optional[str] = None, location_iso: Optional[str] = None) -> Dict[str, Any]:
    """
    Fetches job listings from the api.crackeddevs.com API.

    Parameters:
    api_key (str): API key for authentication.
    limit (int): Number of results per page.
    page (int): Page number of results.
    min_salary (Optional[int]): Minimum salary filter.
    max_salary (Optional[int]): Maximum salary filter.
    job_types (Optional[str]): Comma-separated job types (e.g., 'full_time,part_time').
    degree_required (Optional[bool]): Filter for degree requirement.
    technologies (Optional[str]): Comma-separated list of technologies (COMING SOON).
    location_iso (Optional[str]): Comma-separated list of location ISO codes (COMING SOON).

    Returns:
    Dict[str, Any]: JSON response containing job listings.
    """

    url = 'https://api.crackeddevs.com/api/get-jobs'
    headers = {'api-key': api_key}
    params = {
        'limit': limit,
        'page': page,
        'min_salary': min_salary,
        'max_salary': max_salary,
        'job_types': job_types,
        'degree_required': degree_required,
        'technologies': technologies,
        'location_iso': location_iso
    }

    # Remove None values from params
    params = {k: v for k, v in params.items() if v is not None}

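    response = requests.get(url, headers=headers, params=params)

    # The body may not be valid JSON on some failures, so parse defensively
    try:
        response_json = response.json()
    except ValueError:
        response_json = {}

    if response.status_code == 200:
        return response_json

    # Prefer the API's own error text; fall back to a generic message
    error = ""
    if isinstance(response_json, dict):
        error = response_json.get("error") or response_json.get("message") or ""
    return {'error': error or 'Failed to fetch data', 'status_code': response.status_code}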


In [5]:

def get_all_jobs(api_key: str, min_salary: Optional[int] = None, max_salary: Optional[int] = None,
                 job_types: Optional[str] = None, degree_required: Optional[bool] = None,
                 technologies: Optional[str] = None, location_iso: Optional[str] = None) -> List[Dict[str, Any]]:
    """
    Retrieves all job listings by iterating through the results.

    Parameters:
    api_key (str): API key for authentication.
    min_salary (Optional[int]): Minimum salary filter.
    max_salary (Optional[int]): Maximum salary filter.
    job_types (Optional[str]): Comma-separated job types (e.g., 'full_time,part_time').
    degree_required (Optional[bool]): Filter for degree requirement.
    technologies (Optional[str]): Comma-separated list of technologies (COMING SOON).
    location_iso (Optional[str]): Comma-separated list of location ISO codes (COMING SOON).

    Returns:
    List[Dict[str, Any]]: A list of all job listings retrieved.
    """
    all_jobs = []
    limit = 30

    for page in range(1, 100):
        response = get_jobs(api_key, limit, page, min_salary, max_salary, job_types,
                            degree_required, technologies, location_iso)

        # Errors come back as a dict with an 'error' key; job pages are lists
        if isinstance(response, dict) and 'error' in response:
            print(f"Error fetching page {page}: {response['error']}")
            break

        jobs = response  # API may change in the future, just keeping it clean
        if not jobs:
            break  # No more jobs to fetch

        all_jobs.extend(jobs)

    return all_jobs
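
The stopping rule is easier to see with the network taken out. The sketch below mirrors the loop in `get_all_jobs` but takes the page fetcher as an argument; `collect_pages`, `fake_fetch` and `FAKE_DATA` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Union

Page = Union[List[Dict[str, Any]], Dict[str, Any]]

def collect_pages(fetch_page: Callable[[int], Page], max_pages: int = 99) -> List[Dict[str, Any]]:
    """Call fetch_page(1), fetch_page(2), ... until an empty page or an error dict."""
    items: List[Dict[str, Any]] = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        if isinstance(response, dict) and "error" in response:
            print(f"Error fetching page {page}: {response['error']}")
            break
        if not response:
            break  # An empty page means there is nothing left to fetch
        items.extend(response)
    return items

# Stand-in for the API: three pages of data, then empty lists
FAKE_DATA = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}], 3: [{"id": 4}]}

def fake_fetch(page: int) -> Page:
    return FAKE_DATA.get(page, [])

jobs = collect_pages(fake_fetch)
print(len(jobs))  # 4
```

Note that the loop also stops at the page cap, so a listing count beyond `max_pages` times the page size would be silently truncated.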


In [6]:
## My main function for GPT requests

@retry(tries = 3, delay = 15)
def gptReq(user: str, system: str = "", temperature: float = 0.7, model: str = "gpt-3.5-turbo") -> str:
    """
    Sends a request to the OpenAI GPT model and returns the response.

    Parameters:
    user (str): The user's input message to the model.
    system (str, optional): Additional system message to provide context. Defaults to an empty string.
    temperature (float, optional): The temperature to use for the response generation. Defaults to 0.7.
    model (str, optional): The GPT model to use. Accepts the aliases "gpt-3.5", "gpt-3.5-16k" and "gpt-4"; any other value falls back to "gpt-3.5-turbo". Defaults to "gpt-3.5-turbo".

    Returns:
    str: The content of the model's response.

    Note:
    Token usage is read into `tokens`; uncomment the print statement below to monitor it.
    """

    # Mapping user-specified model to the actual model identifier
    selectedModel = {
        "gpt-3.5": "gpt-3.5-turbo",
        "gpt-3.5-16k": "gpt-3.5-turbo-16k",
        "gpt-4": "gpt-4"
    }.get(model, "gpt-3.5-turbo")

    # Creating the completion request
    completion = openai.chat.completions.create(
        model=selectedModel,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ],
        temperature=temperature
    )

    # Extracting content and token usage
    content = completion.choices[0].message.content
    tokens = completion.usage.total_tokens

    # For printing the token usage. You may activate if you wish to monitor your usage.
    #print(f"{selectedModel} Tokens:", tokens)

    return content
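
For readers unfamiliar with the decorator: as used here, `@retry(tries=3, delay=15)` re-runs the function when it raises, waits between attempts, and gives up after the third failure. The standard-library sketch below imitates that behaviour. `simple_retry` is a hypothetical stand-in written for this explanation, not the `retry` package's implementation.

```python
import time
from functools import wraps

def simple_retry(tries: int = 3, delay: float = 15):
    """Retry the wrapped function up to `tries` times, sleeping `delay` seconds between attempts."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # Out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"count": 0}

@simple_retry(tries=3, delay=0)  # delay=0 so the demo finishes instantly
def flaky() -> str:
    """Fail twice, then succeed, to simulate a temporary API error."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = flaky()
print(result, calls["count"])  # ok 3
```

With a 15-second delay, a request that fails every time blocks its worker thread for about 30 seconds before the error is raised.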

In [7]:
## For concurrent processing
def apply_concurrently(function, series, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map the function to the series items
        futures = [executor.submit(function, item) for item in series]
        results = [future.result() for future in futures]
    return results
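
Because the results are collected from the futures in submission order, the output list lines up with the input even when later items finish first. That matters when the list is assigned straight back to a DataFrame column. A small sketch, with the helper repeated so the block runs on its own; `slow_upper` is a made-up stand-in for an API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def apply_concurrently(function, series, max_workers=10):
    # Same helper as above, repeated so this block is self-contained
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(function, item) for item in series]
        return [future.result() for future in futures]

def slow_upper(text: str) -> str:
    """Pretend API call: shorter strings take longer to finish."""
    time.sleep(0.05 / len(text))
    return text.upper()

titles = ["a", "bb", "cccc"]
results = apply_concurrently(slow_upper, titles)
print(results)  # ['A', 'BB', 'CCCC']
```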

In [8]:
def generalize_job_title(job_title: str) -> str:
    """
    Converts a specific job title to a generalized job name using OpenAI's ChatGPT model.

    Parameters:
    job_title (str): The specific job title to be converted.

    Returns:
    str: The generalized job name.
    """

    prompt = f"""Convert this job title to a generalized job name. Only output the job name.

Example input: Web3 Technical Content Engineer (Remote)
Example output: Technical Content Engineer

Input: {job_title}
Output:"""

    response = gptReq(prompt, system = "The context is software and technology.")

    return response

In [9]:
# Helper function to calculate average salary
def average_salary(row):
    salaries = [s for s in [row['min_salary_usd'], row['max_salary_usd']] if s > 0]
    return int(sum(salaries) / len(salaries)) if salaries else None
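
A quick worked example shows how the helper treats incomplete salary data. The helper is repeated so the block runs on its own; the rows are plain dicts shaped like the salary combinations seen in the listings (both bounds, one bound with a zero, one bound with a NaN, and nothing usable).

```python
import math

def average_salary(row):
    # Same helper as above, repeated so this block runs on its own
    salaries = [s for s in [row['min_salary_usd'], row['max_salary_usd']] if s > 0]
    return int(sum(salaries) / len(salaries)) if salaries else None

rows = [
    {"min_salary_usd": 65000.0, "max_salary_usd": 115000.0},   # both bounds present
    {"min_salary_usd": 10000.0, "max_salary_usd": 0.0},        # 0 is treated as missing
    {"min_salary_usd": math.nan, "max_salary_usd": 110000.0},  # NaN > 0 is False, so it is skipped
    {"min_salary_usd": 0.0, "max_salary_usd": math.nan},       # nothing usable
]

averages = [average_salary(row) for row in rows]
print(averages)  # [90000, 10000, 110000, None]
```

When only one bound is available, that bound is used as the "average", which is worth keeping in mind when reading the aggregated numbers.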


In [10]:
def generate_job_content(generalized_job_title: str) -> str:
    """
    Creates human-friendly content from a generalized job title.

    Parameters:
    generalized_job_title (str): Job title to be used

    Returns:
    str: The generated Markdown content template, with placeholders left in place.
    """

    prompt = f"""Task: Write a long informative statistics report about the given software/tech job title.

Rules:
- Write in a straightforward, simple tone.
- Don't use conclusions or any URLs.
- Use markdown for formatting (headings, bold, list)
- Use the given placeholders.
- Don't change the placeholder names.
- Placeholder format: [[placeholder_name]]
-----
Table of contents:
# {generalized_job_title} Average Salaries and Statistics
## Overview
- Job openings:
- Average Salary:
- Degree Requirement:
## Average Salary
## Responsibilities
## How to become a {generalized_job_title}?

Information:

Job title: {generalized_job_title}

Placeholders:
Count of jobs listed: [[count]]
Average Salary ($): [[average_salary]]
Is a degree required? (Percentage) : [[degree_required_percentage]]"""

    response = gptReq(prompt, system = "You write content for CrackedDevs, a job board for developers. The context is software and technology.", temperature = 1)

    return response

In [11]:
def process_title(title):
    return {
        "generalized_title": title,
        "content_template": generate_job_content(title)
    }

## Getting and Processing the CrackedDevs data
We first have to get the CrackedDevs job listing data, using their API. In an ideal world, Programmatic SEO and Data Analysis tasks would be performed on the data accumulated from the database itself.

But this is a hackathon and the main point is to **use what we have**.

We have the API. Let's use it.

In [12]:
"""
This function will auto-paginate through the API and stop when there are no pages left.
In the end, we will have all the data we need.
"""
all_jobs = get_all_jobs(CRACKEDDEVS_API_KEY)

Let's take a look at the data format.

In [13]:
## We are the cool kids. We print pretty.
pprint(all_jobs[:2])

[{'applications': 2,
  'company': 'CrowdHack',
  'created_at': '2024-01-19T16:00:15.247126+00:00',
  'degree_required': False,
  'description': 'We are in search of a passionate and organized individual to '
                 'take on the role of Hackathon Organizer/Operator. In this '
                 'position, you will play a pivotal role in managing the '
                 'hackathons associated with our world-renowned platform '
                 '<https://crowdhack.io/> \n'
                 '\n'
                 'As a key member of our team, you will be responsible for '
                 'architecting each hackathon, collaborating with sponsors, '
                 'assembling a panel of esteemed judges, and ensuring the '
                 'successful execution of the hackathons projects. \n'
                 '\n'
                 '* Full ownership of planning and executing hackathons '
                 'associated with <https://crowdhack.io/>\n'
                 '* Collaborate with 

The API returns some valuable information that we may process.

In this scenario, I will focus on the data that would benefit our Programmatic SEO journey. I will use:
- Job names
- Min/Max Salaries
- Degree requirements, since CrackedDevs features "no-degree jobs" in its footer. That must be important for their marketing plan.

Looks like they use Markdown for formatting their text. We will stay loyal to their rules.

With enough said, **let's inspect our data in Pandas**.

In [14]:
df = pd.DataFrame(all_jobs)
df

Unnamed: 0,id,title,company,min_salary_usd,max_salary_usd,location_iso,job_type,degree_required,description,url,created_at,applications,views,technologies,image_url
0,4042,Hackathon Organizer/Operator,CrowdHack,10000.0,0.0,,,False,We are in search of a passionate and organized...,https://www.crackeddevs.com/job/4042?ref=api,2024-01-19T16:00:15.247126+00:00,2,7,,https://imgix.cryptojobslist.com/5a671b80-8a8f...
1,4003,Zero Knowledge Cryptography Engineer,Terminal 3,65000.0,115000.0,,,False,\nWe are looking for a Zero-Knowledge Cryptogr...,https://www.crackeddevs.com/job/4003?ref=api,2024-01-19T08:00:12.887483+00:00,1,8,,
2,3995,Founding Full Stack Engineer,Kalder,60000.0,110000.0,,,False,\nAbout Kalder\n\n\nIt has never been more exp...,https://www.crackeddevs.com/job/3995?ref=api,2024-01-19T08:00:12.783277+00:00,1,10,,
3,3992,Software Engineer,Web3Auth,60000.0,110000.0,,,False,Who we areWeb3Auth is a VC-backed company that...,https://www.crackeddevs.com/job/3992?ref=api,2024-01-19T08:00:12.717685+00:00,0,3,,
4,3970,Full Stack Developer (Paid Intern),CodingSprint,2000.0,0.0,,,False,We are seeking a highly motivated individual t...,https://www.crackeddevs.com/job/3970?ref=api,2024-01-19T00:00:16.685524+00:00,2,5,,https://imgix.cryptojobslist.com/2f026d5c-af55...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270,639,Software Engineer Machine Learning,Celonis,,110000.0,,,True,The Team\n\nCelonis AI Group (CeloAI) is a new...,https://www.crackeddevs.com/job/639?ref=api,2023-11-23T01:19:55.992619+00:00,1,8,,https://remoteok.com/assets/img/jobs/60af479b7...
271,638,Sales Engineer,Pendo,,110000.0,,,True,\nTeam Description&nbsp;\n\nPendo’s platform h...,https://www.crackeddevs.com/job/638?ref=api,2023-11-23T01:19:55.903071+00:00,0,2,,https://remoteok.com/assets/img/jobs/1bf43d360...
272,637,Senior Full Stack Engineer,Honor,,105000.0,,,True,"\nAs a senior technical contributor, you will ...",https://www.crackeddevs.com/job/637?ref=api,2023-11-23T01:19:55.61462+00:00,2,13,,https://remoteok.com/assets/img/jobs/8134bd2b6...
273,636,Software Engineer Frontend IN,Findem,,105000.0,,,True,\nWhat is Findem:\n\n\nFindem is HR 2.0. We’re...,https://www.crackeddevs.com/job/636?ref=api,2023-11-23T01:19:55.606833+00:00,9,33,,https://remoteok.com/assets/img/jobs/a77a06bb9...


The data looks neat, but did you notice that some fields are missing? What's worse for our scenario, job titles for the same jobs are **different**.

Of course, they wouldn't keep their data as I wished. We will clean up the data and turn it into something useful for our journey.

Let's begin with eliminating completely useless data. We desperately need the salary information, and the posts without salary data must be **eradicated**.

In [15]:
df = df[~((df['min_salary_usd'].isna() | (df['min_salary_usd'] == 0)) &
          (df['max_salary_usd'].isna() | (df['max_salary_usd'] == 0)))]
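
One caveat: a boolean filter like this can hand back a frame that pandas still treats as derived from the original, and adding columns to it later may trigger a `SettingWithCopyWarning`. A common way around it is to take an explicit copy. A minimal sketch on toy data (the `demo` frame and the `has_salary` column are invented for illustration):

```python
import pandas as pd

demo = pd.DataFrame({
    "title": ["A", "B", "C"],
    "min_salary_usd": [10000.0, None, 0.0],
    "max_salary_usd": [0.0, 110000.0, None],
})

# A row has no salary data when both bounds are missing or zero
no_salary = (
    (demo["min_salary_usd"].isna() | (demo["min_salary_usd"] == 0))
    & (demo["max_salary_usd"].isna() | (demo["max_salary_usd"] == 0))
)

# .copy() makes the result an independent DataFrame, so later column
# assignments neither touch nor warn about the original frame
filtered = demo[~no_salary].copy()
filtered["has_salary"] = True

print(filtered["title"].tolist())  # ['A', 'B']
```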

In [16]:
## Our DataFrame after the records without any salary data are popped out.
df

Unnamed: 0,id,title,company,min_salary_usd,max_salary_usd,location_iso,job_type,degree_required,description,url,created_at,applications,views,technologies,image_url
0,4042,Hackathon Organizer/Operator,CrowdHack,10000.0,0.0,,,False,We are in search of a passionate and organized...,https://www.crackeddevs.com/job/4042?ref=api,2024-01-19T16:00:15.247126+00:00,2,7,,https://imgix.cryptojobslist.com/5a671b80-8a8f...
1,4003,Zero Knowledge Cryptography Engineer,Terminal 3,65000.0,115000.0,,,False,\nWe are looking for a Zero-Knowledge Cryptogr...,https://www.crackeddevs.com/job/4003?ref=api,2024-01-19T08:00:12.887483+00:00,1,8,,
2,3995,Founding Full Stack Engineer,Kalder,60000.0,110000.0,,,False,\nAbout Kalder\n\n\nIt has never been more exp...,https://www.crackeddevs.com/job/3995?ref=api,2024-01-19T08:00:12.783277+00:00,1,10,,
3,3992,Software Engineer,Web3Auth,60000.0,110000.0,,,False,Who we areWeb3Auth is a VC-backed company that...,https://www.crackeddevs.com/job/3992?ref=api,2024-01-19T08:00:12.717685+00:00,0,3,,
4,3970,Full Stack Developer (Paid Intern),CodingSprint,2000.0,0.0,,,False,We are seeking a highly motivated individual t...,https://www.crackeddevs.com/job/3970?ref=api,2024-01-19T00:00:16.685524+00:00,2,5,,https://imgix.cryptojobslist.com/2f026d5c-af55...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270,639,Software Engineer Machine Learning,Celonis,,110000.0,,,True,The Team\n\nCelonis AI Group (CeloAI) is a new...,https://www.crackeddevs.com/job/639?ref=api,2023-11-23T01:19:55.992619+00:00,1,8,,https://remoteok.com/assets/img/jobs/60af479b7...
271,638,Sales Engineer,Pendo,,110000.0,,,True,\nTeam Description&nbsp;\n\nPendo’s platform h...,https://www.crackeddevs.com/job/638?ref=api,2023-11-23T01:19:55.903071+00:00,0,2,,https://remoteok.com/assets/img/jobs/1bf43d360...
272,637,Senior Full Stack Engineer,Honor,,105000.0,,,True,"\nAs a senior technical contributor, you will ...",https://www.crackeddevs.com/job/637?ref=api,2023-11-23T01:19:55.61462+00:00,2,13,,https://remoteok.com/assets/img/jobs/8134bd2b6...
273,636,Software Engineer Frontend IN,Findem,,105000.0,,,True,\nWhat is Findem:\n\n\nFindem is HR 2.0. We’re...,https://www.crackeddevs.com/job/636?ref=api,2023-11-23T01:19:55.606833+00:00,9,33,,https://remoteok.com/assets/img/jobs/a77a06bb9...


Much better now.

**But, we have one more problem**: Job titles are still different.

We will group the data by their job titles, and we want to somehow unite similar job listings under the same umbrella.

For this task, I will use an LLM.

We have two options:
1. Encoding all the titles with an embedding model and clustering them with their semantic similarity.
2. Straight-out asking an LLM to generalize the titles into broader categories.

**I will go with the second one.** Why? Because we have limited data, and the semantic clustering method would not be as accurate as the "asking the LLM" method.

If we had enough data, I would simply fine-tune an embedding model for better precision. But such an operation would be overkill in our scenario.

My LLM of choice would be Mistral, simply because it offers better models for the buck. But people will be running this notebook, and Mistral only grants access by invitation.

Since literally everyone has access to OpenAI, we will use GPT.

**Ideally**, the data would be processed and kept in the database itself. But we don't have that luxury.

Don't worry about the OpenAI bills; this notebook costs less than $0.25 to run end-to-end. And that's despite our use of **bad practices**.
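
One easy saving, should the bill bother you anyway: listings can share the same title, so the model only needs to see each distinct title once. The sketch below shows the idea with a stand-in function. `generalize_unique` and `fake_generalize` are hypothetical names; in the notebook, the stand-in would be replaced by the real LLM call.

```python
from typing import Callable, Dict, List

def generalize_unique(titles: List[str], generalize: Callable[[str], str]) -> List[str]:
    """Call `generalize` once per distinct title, then map results back in order."""
    cache: Dict[str, str] = {}
    for title in titles:
        if title not in cache:
            cache[title] = generalize(title)
    return [cache[title] for title in titles]

call_log: List[str] = []

def fake_generalize(title: str) -> str:
    """Stand-in for the LLM call: drops a trailing parenthesised note and records the call."""
    call_log.append(title)
    return title.split(" (")[0].strip()

titles = ["Software Engineer", "Full Stack Developer (Paid Intern)", "Software Engineer"]
generalized = generalize_unique(titles, fake_generalize)
print(generalized)    # ['Software Engineer', 'Full Stack Developer', 'Software Engineer']
print(len(call_log))  # 2
```

A side benefit is consistency: identical titles always receive the same generalized name, which a sampled LLM response does not guarantee.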

In [17]:
# Apply the GPT function concurrently to the 'title' column
df['generalized_title'] = apply_concurrently(generalize_job_title, df['title'])

# Display the DataFrame
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['generalized_title'] = apply_concurrently(generalize_job_title, df['title'])


Unnamed: 0,id,title,company,min_salary_usd,max_salary_usd,location_iso,job_type,degree_required,description,url,created_at,applications,views,technologies,image_url,generalized_title
0,4042,Hackathon Organizer/Operator,CrowdHack,10000.0,0.0,,,False,We are in search of a passionate and organized...,https://www.crackeddevs.com/job/4042?ref=api,2024-01-19T16:00:15.247126+00:00,2,7,,https://imgix.cryptojobslist.com/5a671b80-8a8f...,Organizer/Operator
1,4003,Zero Knowledge Cryptography Engineer,Terminal 3,65000.0,115000.0,,,False,\nWe are looking for a Zero-Knowledge Cryptogr...,https://www.crackeddevs.com/job/4003?ref=api,2024-01-19T08:00:12.887483+00:00,1,8,,,Cryptography Engineer
2,3995,Founding Full Stack Engineer,Kalder,60000.0,110000.0,,,False,\nAbout Kalder\n\n\nIt has never been more exp...,https://www.crackeddevs.com/job/3995?ref=api,2024-01-19T08:00:12.783277+00:00,1,10,,,Full Stack Engineer
3,3992,Software Engineer,Web3Auth,60000.0,110000.0,,,False,Who we areWeb3Auth is a VC-backed company that...,https://www.crackeddevs.com/job/3992?ref=api,2024-01-19T08:00:12.717685+00:00,0,3,,,Software Engineer
4,3970,Full Stack Developer (Paid Intern),CodingSprint,2000.0,0.0,,,False,We are seeking a highly motivated individual t...,https://www.crackeddevs.com/job/3970?ref=api,2024-01-19T00:00:16.685524+00:00,2,5,,https://imgix.cryptojobslist.com/2f026d5c-af55...,Full Stack Developer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270,639,Software Engineer Machine Learning,Celonis,,110000.0,,,True,The Team\n\nCelonis AI Group (CeloAI) is a new...,https://www.crackeddevs.com/job/639?ref=api,2023-11-23T01:19:55.992619+00:00,1,8,,https://remoteok.com/assets/img/jobs/60af479b7...,Software Engineer
271,638,Sales Engineer,Pendo,,110000.0,,,True,\nTeam Description&nbsp;\n\nPendo’s platform h...,https://www.crackeddevs.com/job/638?ref=api,2023-11-23T01:19:55.903071+00:00,0,2,,https://remoteok.com/assets/img/jobs/1bf43d360...,Sales Engineer
272,637,Senior Full Stack Engineer,Honor,,105000.0,,,True,"\nAs a senior technical contributor, you will ...",https://www.crackeddevs.com/job/637?ref=api,2023-11-23T01:19:55.61462+00:00,2,13,,https://remoteok.com/assets/img/jobs/8134bd2b6...,Full Stack Engineer
273,636,Software Engineer Frontend IN,Findem,,105000.0,,,True,\nWhat is Findem:\n\n\nFindem is HR 2.0. We’re...,https://www.crackeddevs.com/job/636?ref=api,2023-11-23T01:19:55.606833+00:00,9,33,,https://remoteok.com/assets/img/jobs/a77a06bb9...,Frontend Software Engineer


In [18]:
# Add a column for average salary
df['average_salary'] = df.apply(average_salary, axis=1)

# Group by 'generalized_title' and aggregate data
grouped_df = df.groupby('generalized_title').agg(
    count=('generalized_title', 'size'),
    average_salary=('average_salary', 'mean'),
    degree_required_count=('degree_required', lambda x: x.sum()),
    degree_not_required_count=('degree_required', lambda x: (x == False).sum()),
    technologies=('technologies', lambda x: x.dropna().explode().value_counts().to_dict())
).reset_index()

# Calculate the percentage of listings that require a degree
grouped_df['degree_ratio'] = grouped_df['degree_required_count'] / grouped_df["count"] * 100

# Replace infinities with NaN if any division by zero occurred
# (assigning the result back avoids relying on in-place changes to a column)
grouped_df['degree_ratio'] = grouped_df['degree_ratio'].replace([float('inf'), -float('inf')], float('nan'))

# Display the grouped DataFrame
grouped_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['average_salary'] = df.apply(average_salary, axis=1)


Unnamed: 0,generalized_title,count,average_salary,degree_required_count,degree_not_required_count,technologies,degree_ratio
0,Analytics Engineer,1,100000.0,1,0,{},100.000000
1,Android Engineer,1,102500.0,1,0,{},100.000000
2,Back End Software Engineer,1,105000.0,1,0,{},100.000000
3,Backend Developer,3,88800.0,1,2,"{'docker': 1, 'typescript': 1, 'node': 1, 'gra...",33.333333
4,Backend Engineer,4,108750.0,3,1,{},75.000000
...,...,...,...,...,...,...,...
59,Tech Lead,1,110000.0,1,0,{},100.000000
60,Technical Success Engineer,1,100000.0,1,0,{},100.000000
61,Technical Support Engineer,1,100000.0,1,0,{},100.000000
62,UX Designer,1,110000.0,0,1,{},0.000000


**That's way better than what I expected.**

Note that we didn't go with the best practice. But we still have valuable data in our hands.

These are the aggregated statistics from the CrackedDevs platform. Feel free to play with the table; I enjoyed doing so.
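
To make the arithmetic behind the table concrete, here is the same aggregation spelled out in plain Python on three invented rows. The names `summarize` and `sample_rows` are mine, the numbers are made up for illustration, and the sketch assumes every row already has an average salary.

```python
from collections import defaultdict
from typing import Any, Dict, List

def summarize(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Group rows by generalized title and compute count, mean salary and degree percentage."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row["generalized_title"]].append(row)

    summary = {}
    for title, items in groups.items():
        count = len(items)
        degree_required = sum(1 for item in items if item["degree_required"])
        summary[title] = {
            "count": count,
            "average_salary": sum(item["average_salary"] for item in items) / count,
            "degree_ratio": degree_required / count * 100,  # percentage, 0 to 100
        }
    return summary

sample_rows = [
    {"generalized_title": "Backend Engineer", "average_salary": 100000, "degree_required": True},
    {"generalized_title": "Backend Engineer", "average_salary": 120000, "degree_required": False},
    {"generalized_title": "UX Designer", "average_salary": 110000, "degree_required": False},
]

stats = summarize(sample_rows)
print(stats["Backend Engineer"])
# {'count': 2, 'average_salary': 110000.0, 'degree_ratio': 50.0}
```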

## Generating the content

In the minimized section named "Functions", I wrote all the functions for this notebook. If you wish to take a look at the prompts, you may just scroll there.

Now, we will generate the page content.

But let's take a look at our approach:
- We will use AI-generated templates for each job title.
- AI templates will use placeholders for dynamic information. Why? Because regenerating all the content daily is bad practice and not cost-effective. We only want to update the numbers, like the current job listing count and the average salary.
- We could reuse one template for every job title, but near-identical pages risk being treated as duplicate content by Google, and we don't want that. Our aim is unique content for every job on the platform that we can still update dynamically.
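
The placeholder idea can be sketched in a few lines. This is an illustration under my own assumptions rather than the notebook's final code: `fill_template` and `unfilled` are hypothetical helpers, and the second one is there because an LLM may rename or invent a placeholder, which is worth catching before publishing.

```python
import re
from typing import Dict, Set

# Matches the [[placeholder_name]] format requested in the prompt
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")

def fill_template(template: str, values: Dict[str, object]) -> str:
    """Replace every [[name]] with values[name]; unknown placeholders are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)

def unfilled(text: str) -> Set[str]:
    """Return the placeholder names still present in the text."""
    return set(PLACEHOLDER.findall(text))

template = (
    "- Job openings: [[count]]\n"
    "- Average Salary: $[[average_salary]]\n"
    "- Degree Requirement: [[degree_required_percentage]]%"
)
values = {"count": 4, "average_salary": 108750, "degree_required_percentage": 75}

page = fill_template(template, values)
print(page)
```

Because the template itself never changes, refreshing a page only means re-running the substitution with new numbers, with no further LLM calls.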

Let's test our prompt by generating a dummy page content.

In [19]:
display(Markdown(generate_job_content("Front-End Developer")))

# Front-End Developer Average Salaries and Statistics

## Overview
Front-End Developer is a job title that involves creating and implementing the visual elements of a website or application. They are responsible for the user interface and user experience, ensuring that the design is visually appealing and functional. In this report, we will explore the job openings, average salary, and degree requirement for Front-End Developers.

- Job openings: [[count]]
- Average Salary: $[[average_salary]]
- Degree Requirement: [[degree_required_percentage]]% of Front-End Developer positions require a degree.

## Average Salary
The average salary for Front-End Developers is $[[average_salary]]. However, it's important to note that salaries can vary depending on factors such as experience, skill level, location, and company size. Entry-level positions may have lower salaries, while senior or lead positions can offer higher earning potential.

## Responsibilities
Front-End Developers play a crucial role in web and application development. Their responsibilities include:

- Collaborating with designers to translate design wireframes and mockups into code.
- Writing clean, efficient, and maintainable code using front-end technologies like HTML, CSS, and JavaScript.
- Ensuring cross-browser and cross-device compatibility, making websites and applications accessible to a wide range of users.
- Optimizing website performance and speed through quality code and efficient asset delivery.
- Testing and debugging code to identify and fix any issues in the user interface or functionality.
- Staying updated with the latest front-end development trends, techniques, and frameworks to continuously improve skills.

## How to become a Front-End Developer?
Becoming a Front-End Developer typically requires a combination of education, practical experience, and continuous learning. While a degree is not always required, it can provide a solid foundation in computer science or web development. Here are the general steps to pursue a career as a Front-End Developer:

1. Obtain a relevant degree: Consider pursuing a degree in computer science, web development, design, or a related field. While not always mandatory, a degree can help demonstrate your knowledge and commitment to the field.

2. Gain practical experience: Build a portfolio of projects to showcase your skills and abilities. This can include personal projects, freelance work, internships, or contributions to open-source projects. Practical experience is highly valued in the tech industry.

3. Learn relevant technologies: Familiarize yourself with front-end technologies such as HTML, CSS, JavaScript, and popular libraries and frameworks like React or Angular. Online tutorials, courses, and resources can be valuable for gaining proficiency in these languages and tools.

4. Stay up to date: The field of front-end development evolves rapidly. Stay informed about the latest trends, techniques, and technologies through online communities, forums, blogs, and professional networks. Continuous learning is crucial to maintain your skills.

5. Network and seek opportunities: Attend industry events, join relevant online communities, and connect with professionals in the field. Networking can help you find job opportunities, gain insights, and grow your professional network.

In conclusion, Front-End Developers are in demand and offer competitive average salaries. While a degree is not always required, practical experience and continuous learning are essential for success in this field. By following the steps outlined above, you can start your journey towards becoming a Front-End Developer.

**Not perfect**, but not bad either. We used `gpt-3.5` for this task. In a real-world application, I would prefer `gpt-4` or `mistral-medium`.

Now, let's generate all the templates for the job titles.

I will use multithreading for this operation. Why? Because Jupyter already runs its own event loop, so `asyncio.run()` fails inside a notebook cell unless you work around it. Better stay safe.

Note that `ThreadPoolExecutor` from `concurrent.futures` adds a little overhead compared with using the `threading` module directly, but that is negligible next to the API latency. For the sake of keeping the code clean, we will use `ThreadPoolExecutor`.

### Concurrently Generating All Templates

In [20]:
with ThreadPoolExecutor() as executor:
    templates_for_jobs = list(executor.map(process_title, grouped_df["generalized_title"]))

In [21]:
display(Markdown(templates_for_jobs[0]["content_template"]))

# Analytics Engineer Average Salaries and Statistics

## Overview
- Job openings: [[count]]
- Average Salary: $[[average_salary]]
- Degree Requirement: [[degree_required_percentage]]% of employers require a degree

## Average Salary
The average salary for an Analytics Engineer is $[[average_salary]] per year. This figure may vary depending on factors such as location, experience, and company size. Salaries can range from entry-level positions to senior-level roles, with higher salaries typically being offered to individuals with more experience in the field.

## Responsibilities
As an Analytics Engineer, your main responsibilities revolve around data analysis and developing systems to improve data management and insights within an organization. Some common tasks include:

- Collaborating with stakeholders to understand their data requirements
- Designing and implementing data pipelines and ETL processes to extract, transform, and load data
- Building and maintaining data warehouses, databases, and data models
- Writing queries and scripts to manipulate and analyze data
- Developing tools and dashboards for data visualization and reporting
- Working with cross-functional teams to ensure data accuracy and consistency
- Investigating data quality issues and implementing solutions
- Staying up-to-date with industry trends and best practices in analytics and data engineering

## How to become an Analytics Engineer?
To become an Analytics Engineer, a combination of education and technical skills is usually required. While a degree is not always mandatory, [[degree_required_percentage]]% of employers do require a degree in a related field such as computer science, statistics, or engineering.

Here are some steps you can take to start your career as an Analytics Engineer:

1. Obtain a degree: Pursue a bachelor's or master's degree in computer science, statistics, mathematics, or a related field. This will provide a solid foundation in data analysis and engineering principles.

2. Gain technical skills: Develop proficiency in programming languages such as Python, R, SQL, and data manipulation tools like Apache Spark or Hadoop. Familiarize yourself with data visualization tools, database management systems, and statistical analysis techniques.

3. Build practical experience: Seek internships, co-op programs, or entry-level positions that allow you to work with data and analytics. These experiences will help you apply theoretical knowledge to real-world scenarios and strengthen your problem-solving and data manipulation skills.

4. Develop a portfolio: Showcasing your projects and practical work is essential in the tech industry. Create a portfolio that highlights your proficiency in data analysis, engineering, and visualization. Include examples of your coding, data modeling, and dashboarding skills.

5. Continuously learn and stay updated: The field of analytics and data engineering is evolving rapidly. Keep up with new technologies, tools, and best practices through online courses, webinars, conferences, and industry publications. Stay curious and actively seek opportunities to expand your knowledge and expertise.

Remember, success in this role requires a combination of technical skills, analytical thinking, and effective communication. Continuously honing your skills and staying abreast of industry advancements will contribute to a successful career as an Analytics Engineer.

### Transforming the Templates into Programmatic SEO Content
We have done all the heavy lifting so far. Now we just need to replace the placeholders with real values.

With a short code snippet, I will look up each title's row in the DataFrame and replace the placeholder strings with the aggregated data.

In [30]:
programmatic_seo_content = []

for template in templates_for_jobs:
    # Filter the DataFrame for the given title
    title = template["generalized_title"]
    filtered_df = grouped_df[grouped_df["generalized_title"] == title]

    # Skip titles that are missing from the DataFrame
    if filtered_df.empty:
        print(f"Title '{title}' not found in the DataFrame.")
        continue

    # Take the single matching row so the casts below work on scalars
    row = filtered_df.iloc[0]

    # Extract values and replace placeholders
    values = {
        "count": int(row["count"]),
        "average_salary": int(row["average_salary"]),
        "degree_required_percentage": int(row["degree_ratio"]),
    }

    final_text = template["content_template"]
    for key, value in values.items():
        placeholder = f"[[{key}]]"
        final_text = final_text.replace(placeholder, str(value))

    programmatic_seo_content.append({
        "generalized_title": title,
        "seo_content": final_text,
        "data": values,
    })

display(Markdown(programmatic_seo_content[0]["seo_content"]))

# Analytics Engineer Average Salaries and Statistics

## Overview
- Job openings: 1
- Average Salary: $100000
- Degree Requirement: 100% of employers require a degree

## Average Salary
The average salary for an Analytics Engineer is $100000 per year. This figure may vary depending on factors such as location, experience, and company size. Salaries can range from entry-level positions to senior-level roles, with higher salaries typically being offered to individuals with more experience in the field.

## Responsibilities
As an Analytics Engineer, your main responsibilities revolve around data analysis and developing systems to improve data management and insights within an organization. Some common tasks include:

- Collaborating with stakeholders to understand their data requirements
- Designing and implementing data pipelines and ETL processes to extract, transform, and load data
- Building and maintaining data warehouses, databases, and data models
- Writing queries and scripts to manipulate and analyze data
- Developing tools and dashboards for data visualization and reporting
- Working with cross-functional teams to ensure data accuracy and consistency
- Investigating data quality issues and implementing solutions
- Staying up-to-date with industry trends and best practices in analytics and data engineering

## How to become an Analytics Engineer?
To become an Analytics Engineer, a combination of education and technical skills is usually required. While a degree is not always mandatory, 100% of employers do require a degree in a related field such as computer science, statistics, or engineering.

Here are some steps you can take to start your career as an Analytics Engineer:

1. Obtain a degree: Pursue a bachelor's or master's degree in computer science, statistics, mathematics, or a related field. This will provide a solid foundation in data analysis and engineering principles.

2. Gain technical skills: Develop proficiency in programming languages such as Python, R, SQL, and data manipulation tools like Apache Spark or Hadoop. Familiarize yourself with data visualization tools, database management systems, and statistical analysis techniques.

3. Build practical experience: Seek internships, co-op programs, or entry-level positions that allow you to work with data and analytics. These experiences will help you apply theoretical knowledge to real-world scenarios and strengthen your problem-solving and data manipulation skills.

4. Develop a portfolio: Showcasing your projects and practical work is essential in the tech industry. Create a portfolio that highlights your proficiency in data analysis, engineering, and visualization. Include examples of your coding, data modeling, and dashboarding skills.

5. Continuously learn and stay updated: The field of analytics and data engineering is evolving rapidly. Keep up with new technologies, tools, and best practices through online courses, webinars, conferences, and industry publications. Stay curious and actively seek opportunities to expand your knowledge and expertise.

Remember, success in this role requires a combination of technical skills, analytical thinking, and effective communication. Continuously honing your skills and staying abreast of industry advancements will contribute to a successful career as an Analytics Engineer.

**That's it!**

It works like a charm. Even with such limited resources, we were able to replicate the basic programmatic SEO functionality of the major job boards.
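
One caveat is worth checking: the templates are written by an LLM, so a template could contain a placeholder that the replacement loop does not know about, and `str.replace` would silently leave it in the page. A minimal sketch of a check for that, assuming placeholders always look like `[[name]]`; the helper `find_unfilled_placeholders` and the `[[median_salary]]` example are hypothetical and not part of the notebook.

```python
import re

# Matches [[name]] style placeholders, as used in the generated templates.
PLACEHOLDER_PATTERN = re.compile(r"\[\[(\w+)\]\]")


def find_unfilled_placeholders(text):
    """Return the sorted, de-duplicated names of any [[placeholder]] left in text."""
    return sorted(set(PLACEHOLDER_PATTERN.findall(text)))


# A fully populated page and one with a placeholder the loop did not handle.
filled = "- Job openings: 1\n- Average Salary: $100000"
unfilled = "- Job openings: 1\n- Median Salary: $[[median_salary]]"

print(find_unfilled_placeholders(filled))    # []
print(find_unfilled_placeholders(unfilled))  # ['median_salary']
```

In the notebook, the same helper could be run over every `seo_content` entry in `programmatic_seo_content` to flag pages whose template needs to be regenerated.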

Now, you may browse through the generated pages using the form below and enjoy your masterpiece.

## Browsing through the generated pages

In [23]:
display(Markdown(f"## Total Pages: {len(programmatic_seo_content)}"))

## Total Pages: 64

In [24]:
# @title Content Browser { run: "auto", vertical-output: true }
# @markdown Enter the desired page number. Total page count is stated above.
page = 33 # @param {type:"integer"}

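try:
    display(Markdown(programmatic_seo_content[page]["seo_content"]))
except IndexError:
    # Page numbers are zero-based list indices
    last_page = len(programmatic_seo_content) - 1
    display(Markdown(f"**Page {page} does not exist. Valid page numbers are 0 to {last_page}.**"))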


# Lead WordPress Developer Average Salaries and Statistics
## Overview
- Job Openings: 1
- Average Salary: $110000
- Degree Requirement: 100%

## Average Salary
The average salary for a Lead WordPress Developer is $110000. This figure is based on the analysis of various job listings in the software and technology industry.

## Responsibilities
The role of a Lead WordPress Developer involves overseeing the development and maintenance of WordPress-based websites. This includes managing a team of developers, coordinating project timelines, conducting code reviews, and ensuring the quality and efficiency of the codebase.

Lead WordPress Developers are responsible for:

1. Designing and implementing customized WordPress themes and plugins.
2. Collaborating with cross-functional teams, including designers and content creators, to ensure seamless integration of design and functionality.
3. Conducting website maintenance and troubleshooting issues.
4. Optimizing websites for performance and search engine optimization (SEO).
5. Implementing responsive and accessible designs to enhance user experience across various devices.

In addition to technical skills, Lead WordPress Developers must have excellent communication and leadership abilities. They need to effectively communicate with clients and stakeholders, guide and mentor team members, and provide technical guidance throughout the development process.

## How to Become a Lead WordPress Developer?
Becoming a Lead WordPress Developer typically requires a solid foundation in web development and proficiency in WordPress-related technologies. While a degree may not be a strict requirement, it can certainly enhance your prospects and demonstrate your commitment to learning.

Here are the steps you can take to become a Lead WordPress Developer:

1. Learn HTML, CSS, JavaScript, and PHP: These are the fundamental programming languages used in web development. Familiarize yourself with WordPress-specific functions and APIs.

2. Gain Experience with WordPress: Start by building your own WordPress websites or contribute to existing projects. This hands-on experience will help you become well-versed in the platform.

3. Expand Your Skillset: Learn about front-end frameworks (such as Bootstrap or Foundation) and back-end technologies like PHP frameworks (such as Laravel or Symfony). This will make you a more well-rounded developer.

4. Stay Up-to-date: WordPress evolves continuously. Stay updated with the latest versions, best practices, and emerging trends by participating in forums, attending conferences, and reading relevant blogs.

5. Networking and Professional Development: Engage with WordPress developer communities, attend meetups, and join online forums. Networking with industry professionals can provide invaluable insights and opportunities.

6. Showcase Your Work: Build a portfolio website showcasing your WordPress projects and contributions. A strong portfolio can be the key to landing a Lead WordPress Developer role.

Remember, becoming a Lead WordPress Developer is a journey that requires continuous learning and practice. With dedication and passion for web development, you can excel in this role and contribute to the ever-expanding WordPress ecosystem.

## Exporting the pages as .md
Because... Why not?

In [25]:
output_dir = "generated_pages"
os.makedirs(output_dir, exist_ok=True)

# Iterate over the list and create files
for item in programmatic_seo_content:
    filename = f"{item['generalized_title'].replace('/', '_').replace(' ', '-')}.md"
    filepath = os.path.join(output_dir, filename)

    with open(filepath, 'w', encoding='utf-8') as file:
        file.write(item['seo_content'])

display(Markdown("**All pages are saved.**"))

**All pages are saved.**
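
The export cell only replaces `/` and spaces, so a title containing other characters (for example `:` or `+`) would carry them into the filename. If the pages are meant to become URLs, a stricter slug may be preferable. The following is a rough sketch of one possible approach, not what the notebook does: the `slugify` helper is hypothetical, lowercasing is my own choice, and `"UI/UX Designer"` is a made-up title.

```python
import re


def slugify(title):
    """Turn a job title into a lowercase, hyphen-separated filename stem."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at the start or end.
    slug = slug.strip("-")
    # Fall back to a fixed name if nothing usable remains.
    return slug or "untitled"


print(slugify("Lead WordPress Developer"))  # lead-wordpress-developer
print(slugify("UI/UX Designer"))            # ui-ux-designer
```

A trade-off to keep in mind: stripping symbols can make distinct titles collide (a title ending in `++` and one without it would share a slug), so a real export would probably want to detect duplicate filenames before writing.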