**Note:** We use the `requests` library, a Python package for sending HTTP requests to web servers and handling their responses.

## The Muse (themuse.com): API Endpoints

Here we extract data for all 70 pages, filtering the jobs endpoint by the following query parameters:

- **Category:**
  - Data & Analytics
  - Data Science
  - Computer & IT
  - IT

- **Level:**
  - Entry Level
  - Mid Level
  - Senior Level
  - Management

- **Location:** We consider major cities for this purpose
  - Frankfurt
  - Munich
  - Berlin
  - Hanover
  - Hamburg
  - Düsseldorf
  - Essen
  - Stuttgart
  - Karlsruhe
  - Münster
  - Leipzig

- **Request URL:** [url](https://www.themuse.com/api/public/jobs?category=Computer%20and%20IT&category=Data%20and%20Analytics&category=Data%20Science&category=IT&level=Entry%20Level&level=Mid%20Level&level=Senior%20Level&level=management&location=Berlin%2C%20DE&location=Berlin%2C%20Germany&location=Cologne%2C%20Germany&location=Dusseldorf%2C%20Germany&location=Essen%2C%20Germany&location=Frankfurt%2C%20Germany&location=Frankfurt%20(Oder)%2C%20Germany&location=Hamburg%2C%20Germany&location=Hannover%2C%20Germany&location=Heidelberg%2C%20Germany&location=Karlsruhe%2C%20Germany&location=K%C3%B6ln%2C%20Germany&location=Leipzig%2C%20Germany&location=Munich%2C%20Germany&location=Stuttgart%2C%20Germany&page=65&descending=True)

- **Response code:** `200`

- **Response headers:**
    <pre>
    access-control-allow-origin: *
    cf-cache-status: DYNAMIC
    cf-ray: 853ee4a71e7b9b63-FRA
    content-encoding: br
    content-type: application/json; charset=UTF-8
    date: Sun, 11 Feb 2024 19:10:39 GMT
    etag: W/"5603f8db48c28c2413dad8e547396ffda7514657"
    server: cloudflare
    strict-transport-security: max-age=2592000; includeSubDomains
    vary: Accept-Encoding
    x-ratelimit-limit: 500
    x-ratelimit-remaining: 490
    x-ratelimit-reset: 3261
    </pre>

- **Response body:** 
    <pre>
    {
        "page": 65,
        "page_count": 70,
        "items_per_page": 20,
        "took": 121,
        "timed_out": false,
        "total": 1383,
        "results": []
    }
    </pre>

    This is the result object of the JSON response to the `/api/public/jobs` endpoint query:

    ![alt text](jsoncrack.com.png)

    For our purposes, we need the following fields from each result object:

    - contents
    - name
    - id
    - locations
    - categories
    - levels/short_name
    - refs
    - company = id, name
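As a minimal sketch of how these fields map onto a result object, the snippet below pulls them out of a hand-written sample. The `sample_job` dictionary and the `extract_fields` helper are hypothetical and only mirror the field names listed above; real results carry more keys.

```python
from typing import Any


def extract_fields(job: dict[str, Any]) -> dict[str, Any]:
    """Pick the fields listed above out of one entry of `results`."""
    company = job.get("company") or {}
    return {
        "id": job.get("id"),
        "name": job.get("name"),
        "contents": job.get("contents"),
        "locations": [loc.get("name") for loc in job.get("locations", [])],
        "categories": [cat.get("name") for cat in job.get("categories", [])],
        "levels": [lvl.get("short_name") for lvl in job.get("levels", [])],
        "landing_page": (job.get("refs") or {}).get("landing_page"),
        "company_id": company.get("id"),
        "company_name": company.get("name"),
    }


# Hypothetical sample shaped like one entry of `results` (all values invented).
sample_job = {
    "id": 1,
    "name": "Data Engineer",
    "contents": "<p>Example description</p>",
    "locations": [{"name": "Berlin, Germany"}],
    "categories": [{"name": "Data Science"}],
    "levels": [{"name": "Mid Level", "short_name": "mid"}],
    "refs": {"landing_page": "https://example.com/jobs/1"},
    "company": {"id": 42, "name": "Example GmbH"},
}

fields = extract_fields(sample_job)
print(fields["company_name"], fields["levels"])
```

Missing keys fall back to `None` or an empty list, so one incomplete listing does not stop the extraction.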

In [40]:
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

url = 'https://www.themuse.com/api/public/jobs?category=Computer%20and%20IT&category=Data%20and%20Analytics&category=Data%20Science&category=IT&level=Entry%20Level&level=Mid%20Level&level=Senior%20Level&level=management&location=Berlin%2C%20DE&location=Berlin%2C%20Germany&location=Cologne%2C%20Germany&location=Dusseldorf%2C%20Germany&location=Essen%2C%20Germany&location=Frankfurt%2C%20Germany&location=Frankfurt%20(Oder)%2C%20Germany&location=Hamburg%2C%20Germany&location=Hannover%2C%20Germany&location=Heidelberg%2C%20Germany&location=Karlsruhe%2C%20Germany&location=K%C3%B6ln%2C%20Germany&location=Leipzig%2C%20Germany&location=Munich%2C%20Germany&location=Stuttgart%2C%20Germany&page=65&descending=True' 

# Load environment variables from .env file
load_dotenv()
# Access the API key from the environment
api_key = os.getenv("API_KEY1")

# Define the headers with the API key
headers = {'Authorization': api_key}
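The long query string in `url` can also be assembled from plain lists of filter values, which makes it easier to add or remove a city. The following is a sketch using only the standard library; `BASE_URL`, `params`, and `built_url` are names chosen here for illustration, and the location list is shortened.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.themuse.com/api/public/jobs"

# Filter values copied from the request URL above (location list shortened).
params = {
    "category": ["Computer and IT", "Data and Analytics", "Data Science", "IT"],
    "level": ["Entry Level", "Mid Level", "Senior Level", "management"],
    "location": ["Berlin, Germany", "Munich, Germany", "Hamburg, Germany"],
    "page": 65,
    "descending": "True",
}

# doseq=True repeats the key for every list element (category=...&category=...).
# quote_via=quote encodes spaces as %20 instead of '+', matching the URL above.
query = urlencode(params, doseq=True, quote_via=quote)
built_url = f"{BASE_URL}?{query}"
print(built_url)
```

Passing the same dictionary as `requests.get(BASE_URL, params=params)` should give an equivalent request, although `requests` encodes spaces as `+` rather than `%20`.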

**First, we extract data for a single page** (the URL above requests `page=65`).

Later we will extract data for all 70 pages.

In [None]:
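# Send the GET request; `response` must exist before it is checked below
response = requests.get(url, headers=headers)

if response.status_code == 200: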
    # Parse the JSON response
    data = response.json()
    
    # Extract relevant information from the JSON data
    jobs = data.get('results', [])
    
    # Create lists to store extracted information
    job_data = []
    
    # Iterate over each job and extract relevant details
    for job in jobs:
        # Parse the HTML content
        html_content = BeautifulSoup(job.get('contents', ''), 'html.parser')
        
        # Extract the job title, company name, location, and other details
        job_info = {
            'Job Title': job.get('name'),
            # `or {}` guards against a missing or null 'company' / 'refs' object
            'Company Name': (job.get('company') or {}).get('name'),
            'Location': ', '.join([loc.get('name') for loc in job.get('locations', [])]),
            'Publication Date': job.get('publication_date'),
            'Categories': ', '.join([cat.get('name') for cat in job.get('categories', [])]),
            'Experience Level': ', '.join([level.get('name') for level in job.get('levels', [])]),
            'Job Link': (job.get('refs') or {}).get('landing_page')
        }
        
        # Extract the full/part-time information if available
        full_part_time_tag = html_content.find('b', string='Full / Part time:')
        sibling = full_part_time_tag.next_sibling if full_part_time_tag else None
        # next_sibling may be None or a tag rather than text, so only strip strings
        job_info['Full/Part Time'] = sibling.strip() if isinstance(sibling, str) else None
        
        job_data.append(job_info)
    
    # Create a Pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(job_data)
    
    # Save the DataFrame to a CSV file
    df.to_csv('./data/muse/job_listings_muse_first_page1.csv', index=False)
    
    # Display the DataFrame
    print("Data saved successfully as job_listings_muse_first_page1.csv")
    print(df)
else:
    print("Failed to retrieve data from the URL:", url)
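The `to_csv` call above writes into `./data/muse/`, and pandas does not create missing parent directories. Below is a minimal sketch of preparing the folder first, using only the standard library. The `save_rows` helper and the temporary directory are illustrative assumptions, and `csv.DictWriter` stands in for `DataFrame.to_csv`.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows: list[dict], out_path: Path) -> Path:
    """Write rows to CSV, creating parent folders if they are missing."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0]) if rows else []
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Demo in a throwaway directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "muse" / "job_listings_muse_first_page1.csv"
    saved = save_rows([{"Job Title": "Example", "Company Name": "Example GmbH"}], target)
    saved_exists = saved.exists()
    saved_text = saved.read_text(encoding="utf-8")

print(saved_exists)
```

In the notebook itself, the same `mkdir(parents=True, exist_ok=True)` call on the output folder before `df.to_csv(...)` would serve the same purpose.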


In [29]:
df.head(20)

| | Job Title | Company Name | Location | Publication Date | Categories | Experience Level | Job Link | Full/Part Time |
|---|---|---|---|---|---|---|---|---|
| 0 | Mid-Market Senior Account Executive - Remote | Samsara Inc. | Flexible / Remote | 2024-02-01T23:07:18Z | Sales | Senior Level | https://www.themuse.com/jobs/samsarainc/midmar... | |
| 1 | Senior Integrated Marketing Manager | Samsara Inc. | Flexible / Remote | 2024-02-07T23:08:09Z | Advertising and Marketing | Senior Level | https://www.themuse.com/jobs/samsarainc/senior... | |
| 2 | Pflichtpraktikant*in (w/m/d) Konfiguration von... | Siemens | Munich, Germany | 2022-12-16T02:23:27Z | Unknown | Mid Level | https://www.themuse.com/jobs/siemens/pflichtpr... | |
| 3 | Rechtsreferendar (w/m/d) Siemens Mobility Berl... | Siemens | Berlin, Germany | 2023-02-10T23:37:07Z | Computer and IT | Mid Level | https://www.themuse.com/jobs/siemens/rechtsref... | |
| 4 | Pflichtpraktikant*in (w/m/d) ab Juli/August 20... | Siemens | Karlsruhe, Germany | 2023-04-17T23:38:16Z | Arts | Mid Level | https://www.themuse.com/jobs/siemens/pflichtpr... | Full-time |
| 5 | Account Manager (w/m/d) Automatisierungs- und ... | Siemens | Bielefeld, Germany | 2023-04-05T11:52:16Z | Account Management | Mid Level | https://www.themuse.com/jobs/siemens/account-m... | Full-time |
| 6 | IoT Senior Portfolio Manager (w/m/d) Digital B... | Siemens | Essen, Germany | 2023-04-14T23:43:20Z | Software Engineering | Senior Level | https://www.themuse.com/jobs/siemens/iot-senio... | Full-time |
| 7 | Werkstudent*in (w/m/d) Technischer Vertrieb | Siemens | Bielefeld, Germany | 2023-01-25T00:49:58Z | Unknown | Mid Level | https://www.themuse.com/jobs/siemens/werkstude... | |
| 8 | Instandhalter (w/m/d) Fertigungsanlagen | Siemens | Munich, Germany | 2023-04-06T23:49:36Z | Software Engineering | Mid Level | https://www.themuse.com/jobs/siemens/instandha... | |
| 9 | Senior Manager Copywriter | Circle | Flexible / Remote | 2023-12-21T23:02:10Z | Writing and Editing | Senior Level | https://www.themuse.com/jobs/circle/senior-man... | |




## All jobs in Germany

- **Request URL:** [url](https://www.themuse.com/api/public/jobs?level=Entry%20Level&level=Mid%20Level&level=Senior%20Level&level=management&location=Berlin%2C%20DE&location=Berlin%2C%20Germany&location=Cologne%2C%20Germany&location=Dusseldorf%2C%20Germany&location=Essen%2C%20Germany&location=Frankfurt%2C%20Germany&location=Frankfurt%20(Oder)%2C%20Germany&location=Hamburg%2C%20Germany&location=Hannover%2C%20Germany&location=Heidelberg%2C%20Germany&location=Karlsruhe%2C%20Germany&location=K%C3%B6ln%2C%20Germany&location=Leipzig%2C%20Germany&location=Munich%2C%20Germany&location=Stuttgart%2C%20Germany&page=65&descending=True)

- **Level:**
  - Entry Level
  - Mid Level
  - Senior Level
  - Management

- **Location:** We consider major cities for this purpose
  - Frankfurt
  - Munich
  - Berlin
  - Hanover
  - Hamburg
  - Düsseldorf
  - Essen
  - Stuttgart
  - Karlsruhe
  - Münster
  - Leipzig
  
- **Response code:** `200`

- **Response headers:**
    <pre>
    access-control-allow-origin: *
    cf-cache-status: DYNAMIC
    cf-ray: 853f5035db923733-FRA
    content-encoding: br
    content-type: application/json; charset=UTF-8
    date: Sun, 11 Feb 2024 20:24:05 GMT
    etag: W/"45bf8ad7d014bb3bfc200c6f14e165687456547a"
    server: cloudflare
    strict-transport-security: max-age=2592000; includeSubDomains
    vary: Accept-Encoding
    x-ratelimit-limit: 500
    x-ratelimit-remaining: 499
    x-ratelimit-reset: 3600
    </pre>


    ![alt text](jsoncrack_all_jobs.com.png)
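The `x-ratelimit-*` headers shown above can be used to pause before the quota runs out. The sketch below assumes that `x-ratelimit-reset` is the number of seconds until the quota resets, which the values above suggest but do not confirm. `seconds_to_wait` is a hypothetical helper.

```python
import time


def seconds_to_wait(headers: dict[str, str], min_remaining: int = 1) -> float:
    """Return how long to pause before the next request, based on rate-limit headers."""
    remaining = int(headers.get("x-ratelimit-remaining", min_remaining))
    reset = float(headers.get("x-ratelimit-reset", 0))
    # Assumption: x-ratelimit-reset counts the seconds until the quota resets.
    return reset if remaining < min_remaining else 0.0


# Header values copied from the response shown above.
example_headers = {
    "x-ratelimit-limit": "500",
    "x-ratelimit-remaining": "499",
    "x-ratelimit-reset": "3600",
}

wait = seconds_to_wait(example_headers)
if wait:
    time.sleep(wait)
print(wait)
```

With `requests`, the `response.headers` mapping could be passed to the helper directly, since it also supports `.get`.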



In [31]:
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

url = 'https://www.themuse.com/api/public/jobs?level=Entry%20Level&level=Mid%20Level&level=Senior%20Level&level=management&location=Berlin%2C%20DE&location=Berlin%2C%20Germany&location=Cologne%2C%20Germany&location=Dusseldorf%2C%20Germany&location=Essen%2C%20Germany&location=Frankfurt%2C%20Germany&location=Frankfurt%20(Oder)%2C%20Germany&location=Hamburg%2C%20Germany&location=Hannover%2C%20Germany&location=Heidelberg%2C%20Germany&location=Karlsruhe%2C%20Germany&location=K%C3%B6ln%2C%20Germany&location=Leipzig%2C%20Germany&location=Munich%2C%20Germany&location=Stuttgart%2C%20Germany&page=65&descending=True' 

# Load environment variables from .env file
load_dotenv()
# Access the API key from the environment
api_key = os.getenv("API_KEY1")

# Define the headers with the API key
headers = {'Authorization': api_key}

# Send the GET request; `response` must exist before it is checked below
response = requests.get(url, headers=headers)

if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    
    # Extract relevant information from the JSON data
    jobs = data.get('results', [])
    
    # Create lists to store extracted information
    job_data = []
    
    # Iterate over each job and extract relevant details
    for job in jobs:
        # Parse the HTML content
        html_content = BeautifulSoup(job.get('contents', ''), 'html.parser')
        
        # Extract the job title, company name, location, and other details
        job_info = {
            'Job Title': job.get('name'),
            # `or {}` guards against a missing or null 'company' / 'refs' object
            'Company Name': (job.get('company') or {}).get('name'),
            'Location': ', '.join([loc.get('name') for loc in job.get('locations', [])]),
            'Publication Date': job.get('publication_date'),
            'Categories': ', '.join([cat.get('name') for cat in job.get('categories', [])]),
            'Experience Level': ', '.join([level.get('name') for level in job.get('levels', [])]),
            'Job Link': (job.get('refs') or {}).get('landing_page')
        }
        
        # Extract the full/part-time information if available
        full_part_time_tag = html_content.find('b', string='Full / Part time:')
        sibling = full_part_time_tag.next_sibling if full_part_time_tag else None
        # next_sibling may be None or a tag rather than text, so only strip strings
        job_info['Full/Part Time'] = sibling.strip() if isinstance(sibling, str) else None
        
        job_data.append(job_info)
    
    # Create a Pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(job_data)
    
    # Save the DataFrame to a CSV file
    df.to_csv('../../data/scraped_data/muse/muse_scrapped_data.csv', index=False)
    
    # Display the DataFrame
    print("Data saved successfully as muse_scrapped_data.csv")
    print(df)
else:
    print("Failed to retrieve data from the URL:", url)


Data saved successfully as job_listings.csv
                                            Job Title  Company Name  \
0        Mid-Market Senior Account Executive - Remote  Samsara Inc.   
1                 Senior Integrated Marketing Manager  Samsara Inc.   
2   Pflichtpraktikant*in (w/m/d) Konfiguration von...       Siemens   
3   Rechtsreferendar (w/m/d) Siemens Mobility Berl...       Siemens   
4   Pflichtpraktikant*in (w/m/d) ab Juli/August 20...       Siemens   
5   Account Manager (w/m/d) Automatisierungs- und ...       Siemens   
6   IoT Senior Portfolio Manager (w/m/d) Digital B...       Siemens   
7         Werkstudent*in (w/m/d) Technischer Vertrieb       Siemens   
8             Instandhalter (w/m/d) Fertigungsanlagen       Siemens   
9                           Senior Manager Copywriter        Circle   
10                        Industriemechaniker (w/m/d)       Siemens   
11                             MRI Clinical Scientist       Philips   
12  Solution Design / Technische 

In [32]:
df.head()

| | Job Title | Company Name | Location | Publication Date | Categories | Experience Level | Job Link | Full/Part Time |
|---|---|---|---|---|---|---|---|---|
| 0 | Mid-Market Senior Account Executive - Remote | Samsara Inc. | Flexible / Remote | 2024-02-01T23:07:18Z | Sales | Senior Level | https://www.themuse.com/jobs/samsarainc/midmar... | |
| 1 | Senior Integrated Marketing Manager | Samsara Inc. | Flexible / Remote | 2024-02-07T23:08:09Z | Advertising and Marketing | Senior Level | https://www.themuse.com/jobs/samsarainc/senior... | |
| 2 | Pflichtpraktikant*in (w/m/d) Konfiguration von... | Siemens | Munich, Germany | 2022-12-16T02:23:27Z | Unknown | Mid Level | https://www.themuse.com/jobs/siemens/pflichtpr... | |
| 3 | Rechtsreferendar (w/m/d) Siemens Mobility Berl... | Siemens | Berlin, Germany | 2023-02-10T23:37:07Z | Computer and IT | Mid Level | https://www.themuse.com/jobs/siemens/rechtsref... | |
| 4 | Pflichtpraktikant*in (w/m/d) ab Juli/August 20... | Siemens | Karlsruhe, Germany | 2023-04-17T23:38:16Z | Arts | Mid Level | https://www.themuse.com/jobs/siemens/pflichtpr... | Full-time |


## Considering All 70 Pages

Here we extract data for all 70 pages, using the following filters:

**Category:**
- Data & Analytics
- Data Science
- Computer & IT
- IT

**Level:**
- Entry Level
- Mid Level
- Senior Level
- Management

**Location:** We consider major cities for this purpose
- Frankfurt
- Munich
- Berlin
- Hanover
- Hamburg
- Düsseldorf
- Essen
- Stuttgart
- Karlsruhe
- Münster
- Leipzig
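The figure of 70 pages follows from the metadata in the response body shown earlier: 1383 matching jobs at 20 items per page. As a quick check of that arithmetic:

```python
import math

# Values copied from the response body shown earlier.
total = 1383
items_per_page = 20

# The last page is only partly filled, so round up.
page_count = math.ceil(total / items_per_page)
print(page_count)  # 70
```

This matches the `page_count` field the API reports.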

In [37]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Access the API key from the environment
api_key = os.getenv("API_KEY1")

# Define the URL template with placeholders for dynamic values
url_template = 'https://www.themuse.com/api/public/jobs?category=Computer%20and%20IT&category=Data%20and%20Analytics&category=Data%20Science&category=IT&level=Entry%20Level&level=Mid%20Level&level=Senior%20Level&level=management&location=Berlin%2C%20DE&location=Berlin%2C%20Germany&location=Cologne%2C%20Germany&location=Dusseldorf%2C%20Germany&location=Essen%2C%20Germany&location=Frankfurt%2C%20Germany&location=Frankfurt%20(Oder)%2C%20Germany&location=Hamburg%2C%20Germany&location=Hannover%2C%20Germany&location=Heidelberg%2C%20Germany&location=Karlsruhe%2C%20Germany&location=K%C3%B6ln%2C%20Germany&location=Leipzig%2C%20Germany&location=Munich%2C%20Germany&location=Stuttgart%2C%20Germany&page={}&descending=True'

# Define the headers with the API key
headers = {'Authorization': api_key}

# Create an empty list to store all job data
all_job_data = []

# Iterate over each page (from 1 to 70) of job listings
for page_num in range(1, 71):
    # Build the URL for the current page
    url = url_template.format(page_num)
    
    # Send a GET request to the URL with the headers
    response = requests.get(url, headers=headers)
    
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse the JSON response
        data = response.json()
        
        # Extract relevant information from the JSON data
        jobs = data.get('results', [])
        
        # Iterate over each job on the current page and extract details
        for job in jobs:
            # Parse the HTML content
            html_content = BeautifulSoup(job.get('contents', ''), 'html.parser')
            
            # Extract the job title, company name, location, and other details
            job_info = {
                'Job Title': job.get('name'),
                # `or {}` guards against a missing or null 'company' / 'refs' object
                'Company Name': (job.get('company') or {}).get('name'),
                'Location': ', '.join([loc.get('name') for loc in job.get('locations', [])]),
                'Publication Date': job.get('publication_date'),
                'Categories': ', '.join([cat.get('name') for cat in job.get('categories', [])]),
                'Experience Level': ', '.join([level.get('name') for level in job.get('levels', [])]),
                'Job Link': (job.get('refs') or {}).get('landing_page')
            }
            
            # Extract the full/part-time information if available
            full_part_time_tag = html_content.find('b', string='Full / Part time:')
            sibling = full_part_time_tag.next_sibling if full_part_time_tag else None
            # next_sibling may be None or a tag rather than text, so only strip strings
            job_info['Full/Part Time'] = sibling.strip() if isinstance(sibling, str) else None
            
            # Append the job information to the list
            all_job_data.append(job_info)
    
    else:
        print("Failed to retrieve data from the URL:", url)

# Create a DataFrame from the list of all job data
df = pd.DataFrame(all_job_data)

# Save the DataFrame to a CSV file (note: this reuses the first-page filename and overwrites that file)
df.to_csv('./data/muse/job_listings_muse_first_page1.csv', index=False)
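The loop above hard-codes 70 pages. A possible variation reads `page_count` from the first response, so the loop adapts when the number of listings changes. In this sketch, `iter_jobs` and `fake_fetch` are hypothetical names, and `fake_fetch` replaces the real HTTP call so the example runs offline.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_jobs(fetch_page: Callable[[int], Page]) -> Iterator[dict]:
    """Yield every job, taking the number of pages from the first response."""
    first = fetch_page(1)
    yield from first.get("results", [])
    for page_num in range(2, first.get("page_count", 1) + 1):
        yield from fetch_page(page_num).get("results", [])


# Stand-in for the real request; a live version could wrap
# requests.get(url_template.format(page_num), headers=headers).json().
def fake_fetch(page_num: int) -> Page:
    return {"page": page_num, "page_count": 3, "results": [{"name": f"job-{page_num}"}]}


jobs = list(iter_jobs(fake_fetch))
print(len(jobs))
```

Keeping the fetch step as a parameter also makes the pagination logic easy to test without spending the API rate limit.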


In [36]:
df.shape

(1363, 8)