## Install and import packages

In [1]:
%pip install pandas numpy sqlalchemy psycopg2 ipywidgets

Collecting sqlalchemy
  Obtaining dependency information for sqlalchemy from https://files.pythonhosted.org/packages/60/7f/ea1086136bc648cd4713a1e01869f7fc31979d67b3a8f973f5d9ab8de7e1/sqlalchemy-2.0.40-cp310-cp310-win_amd64.whl.metadata
  Downloading sqlalchemy-2.0.40-cp310-cp310-win_amd64.whl.metadata (9.9 kB)
Collecting greenlet>=1 (from sqlalchemy)
  Obtaining dependency information for greenlet>=1 from https://files.pythonhosted.org/packages/96/28/d62835fb33fb5652f2e98d34c44ad1a0feacc8b1d3f1aecab035f51f267d/greenlet-3.1.1-cp310-cp310-win_amd64.whl.metadata
  Downloading greenlet-3.1.1-cp310-cp310-win_amd64.whl.metadata (3.9 kB)
Downloading sqlalchemy-2.0.40-cp310-cp310-win_amd64.whl (2.1 MB)
   ---------------------------------------- 0.0/2.1 MB ? eta -:--:--
   ---- ----------------------------------- 0.2/2.1 MB 7.3 MB/s eta 0:00:01
   ---------------------------------------- 2.1/2.1 MB 26.9 MB/s eta 0:00:00
Downloading greenlet-3.1.1-cp310-cp310-win_amd64.whl (298 kB)
   --------


[notice] A new release of pip is available: 23.2.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [2]:
import time
from ipywidgets import widgets, VBox, Label
import psycopg2
from IPython.display import display
import re
import pandas as pd
from sqlalchemy import create_engine

## Connect to DB

In [60]:
DB_NAME = "Linkedin"
DB_USER = "postgres"
DB_PASS = "Sapienza"
DB_HOST = "localhost"
DB_PORT = "5432"

def get_db_connection():
    return psycopg2.connect(
        dbname=DB_NAME, user=DB_USER, password=DB_PASS,
        host=DB_HOST, port=DB_PORT)

Distinct is not very performant then if we do not have duplicate rows into this column we can also avoid to use the DISTINCT statement

## SQL

### Insert your personal information

We prefered to use the like concat instead that equals or not equals in order to extract meaningful information also if we do not correctly insert the exact value in the where condition

In [11]:
conn = get_db_connection()
cur = conn.cursor()

cur.execute('''SELECT DISTINCT "FORMATTED_EXPERIENCE_LEVEL" FROM public."POSTING" WHERE "FORMATTED_EXPERIENCE_LEVEL" IS NOT NULL ORDER BY "FORMATTED_EXPERIENCE_LEVEL" ASC''')
experience_levels = [row[0] for row in cur.fetchall()]
cur.close()

cur = conn.cursor()
cur.execute('''SELECT DISTINCT "ID", "SKILL_NAME" FROM public."SKILL" WHERE "SKILL_NAME" IS NOT NULL ORDER BY "SKILL_NAME" ASC''')
skill_data = cur.fetchall()
skill_dict = {row[1]: row[0] for row in skill_data}
skill_names = list(skill_dict.keys())
cur.close()

email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

name_input = widgets.Text(placeholder="Insert your name")
surname_input = widgets.Text(placeholder="Insert your surname")
email_input = widgets.Text(placeholder="Insert your email")
experience_input = widgets.Dropdown(options=experience_levels, description="Experience:")
skill_input = widgets.SelectMultiple(
    options=skill_names,
    description="Skills:",
    rows=10
)

button = widgets.Button(description="Insert Data", button_style="success")
output = widgets.Output()

def insert_data(b):
    with output:
        output.clear_output()
        name = name_input.value
        surname = surname_input.value
        email = email_input.value
        experience = experience_input.value
        selected_skills = skill_input.value

        if not re.match(email_pattern, email):
            print("Warning: Invalid email format. Please enter a valid email address.")
            #email_input.value = ""
            return
        
        try:
            cur = conn.cursor()
            query_user = '''INSERT INTO public."USERS" ("NAME", "SURNAME", "MAIL", "EXPERIENCE") 
                            VALUES (%s, %s, %s, %s) RETURNING "ID"'''
            cur.execute(query_user, (name, surname, email, experience))
            user_id = cur.fetchone()[0]
            conn.commit()
            
            if selected_skills:
                query_skill = '''INSERT INTO public."USER_SKILL" ("USER_ID", "SKILL_ID") VALUES (%s, %s)'''
                for skill_name in selected_skills:
                    skill_id = skill_dict[skill_name]
                    cur.execute(query_skill, (user_id, skill_id))
                
                conn.commit()
            
            cur.close()
            print("Data inserted successfully!")
        except Exception as e:
            print(f"Error while connecting to database: {e}")

button.on_click(insert_data)

display(widgets.VBox([
    widgets.Label("Name:"), name_input,
    widgets.Label("Surname:"), surname_input,
    widgets.Label("Email:"), email_input,
    experience_input,
    skill_input,
    button, output
]))

VBox(children=(Label(value='Name:'), Text(value='', placeholder='Insert your name'), Label(value='Surname:'), …

### Some interesting insight/question that can be extracted from data:


Most In-Demand Job Titles

```SQL 
SELECT "job_title", COUNT(*) AS num_postings
FROM public."postings"
GROUP BY "job_title"
ORDER BY num_postings DESC
LIMIT 10;
```

Most Hiring Companies

```SQL 
SELECT "company_name", COUNT(*) AS num_postings
FROM public."postings"
GROUP BY "company_name"
ORDER BY num_postings DESC
LIMIT 10;
```

Job Locations with Highest Demand

```SQL 
SELECT "job_location", COUNT(*) AS num_postings
FROM public."postings"
GROUP BY "job_location"
ORDER BY num_postings DESC
LIMIT 10;
```

Average Salary for Each Job Title (if salary info is available)

```SQL 
SELECT "job_title", AVG("salary") AS avg_salary
FROM public."postings"
WHERE "salary" IS NOT NULL
GROUP BY "job_title"
ORDER BY avg_salary DESC
LIMIT 10;
```

Remote vs On-Site Jobs

```SQL
SELECT "work_type", COUNT(*) AS num_postings
FROM public."postings"
WHERE "work_type" IN ('Remote', 'On-site')
GROUP BY "work_type";
```

Most Common Skills Required

```SQL
SELECT "skills", COUNT(*) AS num_occurrences
FROM public."postings"
WHERE "skills" IS NOT NULL
GROUP BY "skills"
ORDER BY num_occurrences DESC
LIMIT 10;
```

### 1. Select all the posting job for company name given by input and with requested location and for a specific job location

In [11]:
company_input = widgets.Text(placeholder="Insert the company name")
location_input = widgets.Text(placeholder="Insert the preferred location")
job_title_input = widgets.Text(placeholder="Insert the job title")
search_button = widgets.Button(description="Look for job posting", button_style="primary")
output = widgets.Output()

def search_jobs(b):
    with output:
        output.clear_output()
        company = company_input.value.strip() or None
        location = location_input.value.strip() or None
        job_title = job_title_input.value.strip() or None

        conn = None

        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''
            SELECT p."JOB_ID", c."NAME", p."LOCATION", p."TITLE" FROM public."POSTING" p
            JOIN public."COMPANIES" c ON c."ID" = p."COMPANY_ID"
            WHERE (%s IS NULL OR UPPER(c."NAME") ILIKE UPPER(%s)) 
            AND (%s IS NULL OR UPPER(p."LOCATION") ILIKE UPPER(%s)) 
            AND (%s IS NULL OR UPPER(p."TITLE") ILIKE UPPER(%s))
            ORDER BY "JOB_ID" ASC
            LIMIT 10;
            '''

            params = (
                company, f"%{company}%" if company else None,
                location, f"%{location}%" if location else None,
                job_title, f"%{job_title}%" if job_title else None
            )

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            execution_time = end_time - start_time
            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows, columns=["JOB_ID", "COMPANY_NAME", "LOCATION", "TITLE"])
                display(df)
            else:
                print("No job postings found with the selected parameters.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()


search_button.on_click(search_jobs)

display(widgets.VBox([
    widgets.Label("Company Name:"), company_input,
    widgets.Label("Location:"), location_input,
    widgets.Label("Job Title:"), job_title_input,
    search_button, output
]))

VBox(children=(Label(value='Company Name:'), Text(value='', placeholder='Insert the company name'), Label(valu…

### 2. Select all the job postings for the TechGiant (more than 500 employees and in the IT industry) DA IMPLEMENTARE SICURAMENTE

#### Without linking with the foreign key between COMPANY_ID in the POSTING table and ID in the COMPANY table

In [5]:
industry_input = widgets.Text(placeholder="Insert the Industry domain")
company_input = widgets.Text(placeholder="Insert the company name")
search_button = widgets.Button(description="Look for job posting", button_style="primary")
output = widgets.Output()

def search_jobs(b):
    with output:
        output.clear_output()
        company = company_input.value.strip() or None
        industry = industry_input.value.strip() or None

        conn = None

        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''
            SELECT DISTINCT P."COMPANY_NAME", P."VIEWS", P."PAY_PERIOD", P."APPLIES", P."EXPIRY", CI."INDUSTRY"
            FROM public."POSTING" P 
            JOIN public."COMPANY_INDUSTRY" CI ON P."COMPANY_ID" = CI."COMPANY_ID"
            JOIN public."EMPLOYEE_COUNTS" EC ON P."COMPANY_ID" = EC."COMPANY_ID" 
            WHERE EC."EMPLOYEE_COUNT" > 2000 
            AND "VIEWS" IS NOT NULL
            AND (%s IS NULL OR UPPER(CI."INDUSTRY") ILIKE UPPER(%s))
            AND (%s IS NULL OR UPPER(P."COMPANY_NAME") ILIKE UPPER(%s)) 
            ORDER BY "VIEWS" DESC
            LIMIT 20;
            '''
            
            params = (
                industry, f"%{industry}%" if industry else None,
                company, f"%{company}%" if company else None
            )

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            execution_time = end_time - start_time
            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows,
                                  columns=["COMPANY_NAME", "VIEWS", "PAY_PERIOD", "APPLIES", "EXPIRY", "INDUSTRY"])
                display(df)
            else:
                print("No job postings found for TechGiant companies.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(search_jobs)

display(widgets.VBox([
    widgets.Label("Industry:"), industry_input,
    widgets.Label("Company Name:"), company_input,
    search_button, output
]))

VBox(children=(Label(value='Industry:'), Text(value='', placeholder='Insert the Industry domain'), Label(value…

In [24]:
industry_input = widgets.Text(placeholder="Insert the Industry domain")
company_input = widgets.Text(placeholder="Insert the company name")
search_button = widgets.Button(description="Look for job posting", button_style="primary")
output = widgets.Output()

def search_jobs(b):
    with output:
        output.clear_output()
        company = company_input.value.strip() or None
        industry = industry_input.value.strip() or None

        conn = None

        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''
            SELECT DISTINCT P."JOB_ID", P."COMPANY_NAME", P."TITLE", P."VIEWS", P."PAY_PERIOD", P."APPLIES", P."EXPIRY", CI."INDUSTRY"
            FROM public."POSTING" P 
            JOIN public."COMPANY_INDUSTRY" CI ON P."COMPANY_ID" = CI."COMPANY_ID"
            JOIN public."EMPLOYEE_COUNTS" EC ON P."COMPANY_ID" = EC."COMPANY_ID" 
            WHERE EC."EMPLOYEE_COUNT" > 2000 
            AND "VIEWS" IS NOT NULL
            AND (%s IS NULL OR UPPER(CI."INDUSTRY") ILIKE UPPER(%s))
            AND (%s IS NULL OR UPPER(P."COMPANY_NAME") ILIKE UPPER(%s)) 
            ORDER BY "VIEWS" DESC
            LIMIT 20;
            '''
            
            params = (
                industry, f"%{industry}%" if industry else None,
                company, f"%{company}%" if company else None
            )

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            execution_time = end_time - start_time
            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows,
                                  columns=["JOB_ID", "COMPANY_NAME", "TITLE", "VIEWS", "PAY_PERIOD", "APPLIES", "EXPIRY", "INDUSTRY"])
                display(df)
            else:
                print("No job postings found for TechGiant companies.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(search_jobs)

display(widgets.VBox([
    widgets.Label("Industry:"), industry_input,
    widgets.Label("Company Name:"), company_input,
    search_button, output
]))

VBox(children=(Label(value='Industry:'), Text(value='', placeholder='Insert the Industry domain'), Label(value…

### 3. Select all the companies and job position requested which salary is between two values given by input. Add info such as, all the part-time with USD payment and Monthly pay period, I want also to select the industry and the job should match at least 1 of my skill, order by number of skill matched

Considering today is today but one year early select all the active job postings that allows for remote working, that pay in USD, from companies with name given by input and that respect the characteristics of the logged user DA INTEGRARE CON LA 4

AGGIUSTARE SLIDER IL MASSIMO NON FUNZIONA CORRETTAMENTE PER IL FILTRAGGIO

In [20]:
def fetch_users():
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute('SELECT "ID", "NAME", "SURNAME", "MAIL" FROM public."USERS" ORDER BY "NAME";')
    users = cur.fetchall()
    cur.close()
    conn.close()
    user_dict = {f"{name} {surname} ({email})": user_id for user_id, name, surname, email in users}
    return user_dict

user_dict = fetch_users()
user_dropdown = widgets.Dropdown(
    options=[("Select a user", None)] + list(user_dict.items()),
    description="User:",
    style={"description_width": "auto"}
)

currency_input = widgets.Text(placeholder="Currency (e.g., USD)", description="Currency:")
job_type_input = widgets.Text(placeholder="Job Type (e.g., Part-time)", description="Job Type:")
pay_period_input = widgets.Text(placeholder="Pay Period (e.g., Monthly)", description="Pay Period:")
industry_input = widgets.Text(placeholder="Industry (e.g., Tech)", description="Industry:")
company_input = widgets.Text(placeholder="Company Name", description="Company:")

salary_slider = widgets.IntRangeSlider(
    value=[0, 100000],
    min=0,
    max=100000,
    step=100,
    description="Salary Range:",
    continuous_update=True
)

search_button = widgets.Button(description="Look for job postings", button_style="primary")
output = widgets.Output()

def search_jobs(b):
    with output:
        output.clear_output()
        currency = currency_input.value.strip() or None
        job_type = job_type_input.value.strip() or None
        pay_period = pay_period_input.value.strip() or None
        industry = industry_input.value.strip() or None
        company = company_input.value.strip() or None
        min_salary, max_salary = salary_slider.value
        min_salary = int(min_salary)
        max_salary = int(max_salary) if max_salary < 100000 else None

        selected_user = user_dropdown.value
        if not selected_user:
            print("Please select a user before searching for jobs.")
            return

        conn = None

        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''            
            SELECT p."COMPANY_NAME", p."TITLE", p."PAY_PERIOD", p."CURRENCY", p."WORK_TYPE", 
            s."MIN_SALARY", s."MAX_SALARY", i."NAME" AS industry, skill_match_count
            FROM public."POSTING" p
            JOIN public."SALARIES" s ON p."JOB_ID" = s."JOB_ID"
            JOIN public."JOB_INDUSTRIES" ji ON p."JOB_ID" = ji."JOB_ID"
            JOIN public."INDUSTRIES" i ON ji."INDUSTRY_ID" = i."ID"
            LEFT JOIN (
                SELECT js."JOB_ID", COUNT(*) AS skill_match_count
                FROM public."JOB_SKILLS" js
                JOIN public."SKILL" sk ON js."SKILL_ID" = sk."ID"
                JOIN public."USER_SKILL" us ON sk."ID" = us."SKILL_ID"
                WHERE us."USER_ID" = %s
                GROUP BY js."JOB_ID"
            ) jsm ON p."JOB_ID" = jsm."JOB_ID"
            WHERE 
                p."EXPIRY" >= (EXTRACT(EPOCH FROM NOW()) - (365 * 24 * 60 * 60)) * 1000
                AND skill_match_count IS NOT NULL
                AND (%s IS NULL OR UPPER(p."CURRENCY") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."WORK_TYPE") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."PAY_PERIOD") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."COMPANY_NAME") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(i."NAME") ILIKE UPPER(%s))
                AND (s."MIN_SALARY" >= %s)
                AND (%s IS NULL OR s."MAX_SALARY" <= %s)
            ORDER BY skill_match_count DESC
            LIMIT 20;
            '''

            params = (
                selected_user,
                currency, f"%{currency}%" if currency else None,
                job_type, f"%{job_type}%" if job_type else None,
                pay_period, f"%{pay_period}%" if pay_period else None,
                company, f"%{company}%" if company else None,
                industry, f"%{industry}%" if industry else None,
                min_salary,
                max_salary, max_salary
            )

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            execution_time = end_time - start_time
            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")
            
            if rows:
                df = pd.DataFrame(rows, columns=["COMPANY_NAME", "JOB_TITLE", "PAY_PERIOD", "CURRENCY", "WORK_TYPE",
                                                 "MIN_SALARY", "MAX_SALARY", "INDUSTRY", "SKILL_MATCHED"])
                display(df)
            else:
                print("No job postings found with the given filters.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(search_jobs)

display(VBox([
    Label("Select User:"),
    user_dropdown,
    Label("Enter Job Filters:"),
    currency_input,
    job_type_input,
    pay_period_input,
    industry_input,
    company_input,
    salary_slider,
    search_button,
    output
]))

VBox(children=(Label(value='Select User:'), Dropdown(description='User:', options=(('Select a user', None), ('…

In [22]:
def fetch_users():
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute('SELECT "ID", "NAME", "SURNAME", "MAIL" FROM public."USERS" ORDER BY "NAME";')
    users = cur.fetchall()
    cur.close()
    conn.close()
    return {f"{name} {surname} ({email})": user_id for user_id, name, surname, email in users}

user_dict = fetch_users()
user_dropdown = widgets.Dropdown(
    options=[("Select a user", None)] + list(user_dict.items()),
    description="User:",
    style={"description_width": "auto"}
)

currency_input = widgets.Text(placeholder="Currency (e.g., USD)", description="Currency:")
job_type_input = widgets.Text(placeholder="Job Type (e.g., Part-time)", description="Job Type:")
pay_period_input = widgets.Text(placeholder="Pay Period (e.g., Monthly)", description="Pay Period:")
industry_input = widgets.Text(placeholder="Industry (e.g., Tech)", description="Industry:")
company_input = widgets.Text(placeholder="Company Name", description="Company:")

min_salary_input = widgets.IntText(value=0, description="Min Salary:")
max_salary_input = widgets.IntText(value=100000, description="Max Salary:")

salary_slider = widgets.IntSlider(
    value=50000, min=0, max=100000, step=100, description="Quick Select:", continuous_update=True
)

def update_salary_inputs(change):
    min_salary_input.value, max_salary_input.value = salary_slider.value - 10000, salary_slider.value + 10000

salary_slider.observe(update_salary_inputs, names="value")

search_button = widgets.Button(description="Look for job postings", button_style="primary")
output = widgets.Output()

def search_jobs(b):
    with output:
        output.clear_output()
        currency = currency_input.value.strip() or None
        job_type = job_type_input.value.strip() or None
        pay_period = pay_period_input.value.strip() or None
        industry = industry_input.value.strip() or None
        company = company_input.value.strip() or None
        min_salary = int(min_salary_input.value)
        max_salary = int(max_salary_input.value)

        selected_user = user_dropdown.value
        if not selected_user:
            print("Please select a user before searching for jobs.")
            return

        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''            
            SELECT p."COMPANY_NAME", p."TITLE", p."PAY_PERIOD", p."CURRENCY", p."WORK_TYPE", 
            s."MIN_SALARY", s."MAX_SALARY", i."NAME" AS industry, skill_match_count
            FROM public."POSTING" p
            JOIN public."SALARIES" s ON p."JOB_ID" = s."JOB_ID"
            JOIN public."JOB_INDUSTRIES" ji ON p."JOB_ID" = ji."JOB_ID"
            JOIN public."INDUSTRIES" i ON ji."INDUSTRY_ID" = i."ID"
            LEFT JOIN (
                SELECT js."JOB_ID", COUNT(*) AS skill_match_count
                FROM public."JOB_SKILLS" js
                JOIN public."SKILL" sk ON js."SKILL_ID" = sk."ID"
                JOIN public."USER_SKILL" us ON sk."ID" = us."SKILL_ID"
                WHERE us."USER_ID" = %s
                GROUP BY js."JOB_ID"
            ) jsm ON p."JOB_ID" = jsm."JOB_ID"
            WHERE 
                p."EXPIRY" >= (EXTRACT(EPOCH FROM NOW()) - (365 * 24 * 60 * 60)) * 1000
                AND skill_match_count IS NOT NULL
                AND (%s IS NULL OR UPPER(p."CURRENCY") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."WORK_TYPE") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."PAY_PERIOD") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(p."COMPANY_NAME") ILIKE UPPER(%s))
                AND (%s IS NULL OR UPPER(i."NAME") ILIKE UPPER(%s))
                AND (s."MIN_SALARY" >= %s)
                AND (s."MAX_SALARY" <= %s)
            ORDER BY skill_match_count DESC
            LIMIT 20;
            '''

            params = (
                selected_user,
                currency, f"%{currency}%" if currency else None,
                job_type, f"%{job_type}%" if job_type else None,
                pay_period, f"%{pay_period}%" if pay_period else None,
                company, f"%{company}%" if company else None,
                industry, f"%{industry}%" if industry else None,
                min_salary,
                max_salary
            )

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            execution_time = end_time - start_time
            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")
            
            if rows:
                df = pd.DataFrame(rows, columns=["COMPANY_NAME", "JOB_TITLE", "PAY_PERIOD", "CURRENCY", "WORK_TYPE",
                                                 "MIN_SALARY", "MAX_SALARY", "INDUSTRY", "SKILL_MATCHED"])
                display(df)
            else:
                print("No job postings found with the given filters.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(search_jobs)

display(widgets.VBox([
    widgets.Label("Select User:"), user_dropdown,
    widgets.Label("Enter Job Filters:"), currency_input,
    job_type_input, pay_period_input, industry_input, company_input,
    min_salary_input, max_salary_input, salary_slider,
    search_button, output
]))

VBox(children=(Label(value='Select User:'), Dropdown(description='User:', options=(('Select a user', None), ('…

### 4. NEGATED SUB QUERY NELLA WHERE CONDITION

POTREMMO INSERIRE QUI PER ESEMPIO DOVE NON E RICHIESTO ESSERE AL DI SOPRA DELL ENTRY LEVEL, IMPORTANTE ANCHE PER L OTTIMIZZAZIONE FARE PRIMA NELLA SUBQUERY UNA SELECT ALL E POI FARE SOLO UNA SELECT 1 PER ANNULLARE I TEMPI DI PROCESSIN DELL ULTIMA.

In [None]:
SELECT p."TITLE" AS job_title,
       c."NAME" AS company_name,
       p."LOCATION" AS job_location,
       p."MAX_SALARY" AS max_salary,
       p."MIN_SALARY" AS min_salary,
       p."DESCRIPTION" AS job_description
FROM public."POSTING" p
JOIN public."COMPANIES" c ON p."COMPANY_ID" = c."ID"
WHERE NOT EXISTS (
    SELECT 1
    FROM public."JOB_SKILLS" js
    JOIN public."SKILL" s ON js."SKILL_ID" = s."ID"
    WHERE js."JOB_ID" = p."JOB_ID" AND s."SKILL_NAME" = 'Python'
);


### 5. Select all the how many job postings each company has and its maximum recorded employee count. add also the ratio of growing thanks to the time column computed as the proportion of employee since the first time recorded employees count and the last, replicate also for the follower 

VERRA OTTIMIZATA CON VIEW SULLE DUE JOIN TRAMITE QUERY ANNIDATE

In [6]:
company_name_input = widgets.Text(placeholder="Enter Company Name", description="Company:")
search_button = widgets.Button(description="Analyze Company Growth", button_style="primary")
output = widgets.Output()

def analyze_company_growth(b):
    with output:
        output.clear_output()
        company_name = company_name_input.value.strip()
        
        conn = None
        try:
            conn = get_db_connection()
            cur = conn.cursor()

            query = '''
                    SELECT 
                    c."NAME" AS company_name,
                    jc.job_postings,
                    eg.last_employee_count,
                    eg.first_employee_count,
                    eg.last_follower_count,
                    eg.first_follower_count,
                    CASE 
                        WHEN eg.first_employee_count = 0 THEN NULL 
                        ELSE (eg.last_employee_count - eg.first_employee_count) * 1.0 / eg.first_employee_count 
                    END AS employee_growth_ratio,
                    CASE 
                        WHEN eg.first_follower_count = 0 THEN NULL 
                        ELSE (eg.last_follower_count - eg.first_follower_count) * 1.0 / eg.first_follower_count 
                    END AS follower_growth_ratio
                FROM public."COMPANIES" c
                JOIN (
                    SELECT 
                        "COMPANY_ID",
                        MIN("EMPLOYEE_COUNT") AS first_employee_count,
                        MAX("EMPLOYEE_COUNT") AS last_employee_count,
                        MIN("FOLLOWER_COUNT") AS first_follower_count,
                        MAX("FOLLOWER_COUNT") AS last_follower_count
                    FROM public."EMPLOYEE_COUNTS"
                    GROUP BY "COMPANY_ID"
                ) AS eg ON c."ID" = eg."COMPANY_ID"
                JOIN (
                    SELECT "COMPANY_ID", COUNT(*) AS job_postings
                    FROM public."POSTING"
                    GROUP BY "COMPANY_ID"
                ) AS jc ON c."ID" = jc."COMPANY_ID"
                WHERE (%s IS NULL OR UPPER(c."NAME") ILIKE UPPER(%s))
                ORDER BY employee_growth_ratio DESC;
            '''

            params = (company_name if company_name else None, f"%{company_name}%" if company_name else None)

            start_time = time.time()
            cur.execute(query, params)
            rows = cur.fetchall()
            end_time = time.time()
            
            execution_time = end_time - start_time

            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")
            
            if rows:
                df = pd.DataFrame(rows, columns=["Company Name", "Job Postings", "Max Employees", 
                                                 "Min Employees", "Max Followers", "Min Followers", 
                                                 "Employee Growth Ratio", "Follower Growth Ratio"])
                display(df)
            else:
                print("No company data found.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(analyze_company_growth)

display(widgets.VBox([
    widgets.Label("Enter a company name to filter results (leave blank for all):"),
    company_name_input,
    search_button,
    output
]))

VBox(children=(Label(value='Enter a company name to filter results (leave blank for all):'), Text(value='', de…

### 6. Select all the VOGLIO SAPERE SE NEL MIO STATO CI SONO TANTE AZIENDE DELLA MIA STESSA TIPOLOGIA(INDUSTRY) al mio stesso livello: numero di job posting, numero di employee e numero di follower. E CHE HANNO LE MIE STESSE SPECIALITA, nelle select condition mettere la media del salario massimo e minimo per capire io come comportarmi con i salari

In [28]:
import psycopg2
import pandas as pd
import matplotlib.pyplot as plt
from psycopg2 import sql
from psycopg2.extras import RealDictCursor

def analyze_similar_companies(conn, user_company_id, user_state):
    cursor = conn.cursor(cursor_factory=RealDictCursor)

    cursor.execute("""
        SELECT array_agg(DISTINCT CI."INDUSTRY") AS industries, 
               (SELECT array_agg(DISTINCT CS."SPECIALITY") 
                FROM public."COMPANY_SPECIALITIES" CS 
                WHERE CS."COMPANY_ID" = %s) AS specialties
        FROM public."COMPANY_INDUSTRY" CI
        WHERE CI."COMPANY_ID" = %s
    """, (user_company_id, user_company_id))
    
    company_info = cursor.fetchone()
    user_industries = company_info['industries'] if company_info['industries'] else []
    user_specialties = company_info['specialties'] if company_info['specialties'] else []

    query = """
    WITH similar_companies AS (
        SELECT 
            C."ID" AS company_id,
            C."NAME" AS company_name,
            C."STATE",
            (
                SELECT COUNT(DISTINCT P."JOB_ID") 
                FROM public."POSTING" P 
                WHERE P."COMPANY_ID" = C."ID"
            ) AS job_posting_count,
            (
                SELECT EC."EMPLOYEE_COUNT" 
                FROM public."EMPLOYEE_COUNTS" EC 
                WHERE EC."COMPANY_ID" = C."ID" 
                ORDER BY EC."TIME_RECORDED" DESC 
                LIMIT 1
            ) AS employee_count,
            (
                SELECT EC."FOLLOWER_COUNT" 
                FROM public."EMPLOYEE_COUNTS" EC 
                WHERE EC."COMPANY_ID" = C."ID" 
                ORDER BY EC."TIME_RECORDED" DESC 
                LIMIT 1
            ) AS follower_count,
            (
                SELECT array_agg(DISTINCT CI."INDUSTRY") 
                FROM public."COMPANY_INDUSTRY" CI 
                WHERE CI."COMPANY_ID" = C."ID"
            ) AS industries,
            (
                SELECT array_agg(DISTINCT CS."SPECIALITY") 
                FROM public."COMPANY_SPECIALITIES" CS 
                WHERE CS."COMPANY_ID" = C."ID"
            ) AS specialties,
            (
                SELECT COUNT(*) 
                FROM public."COMPANY_INDUSTRY" CI 
                WHERE CI."COMPANY_ID" = C."ID" AND CI."INDUSTRY" = ANY(%s)
            ) AS matching_industries,
            (
                SELECT COUNT(*) 
                FROM public."COMPANY_SPECIALITIES" CS 
                WHERE CS."COMPANY_ID" = C."ID" AND CS."SPECIALITY" = ANY(%s)
            ) AS matching_specialties
        FROM 
            public."COMPANIES" C
        WHERE 
            C."STATE" = %s
            AND C."ID" != %s
            AND EXISTS (
                SELECT 1 
                FROM public."COMPANY_INDUSTRY" CI 
                WHERE CI."COMPANY_ID" = C."ID" AND CI."INDUSTRY" = ANY(%s)
            )
    ),
    salary_stats AS (
        SELECT 
            P."COMPANY_ID",
            AVG(S."MIN_SALARY") AS avg_min_salary,
            AVG(S."MAX_SALARY") AS avg_max_salary
        FROM 
            public."POSTING" P
        JOIN 
            public."SALARIES" S ON P."JOB_ID" = S."JOB_ID"
        WHERE 
            P."COMPANY_ID" IN (SELECT company_id FROM similar_companies)
        GROUP BY 
            P."COMPANY_ID"
    )
    
    SELECT 
        SC.company_id,
        SC.company_name,
        SC.job_posting_count,
        SC.employee_count,
        SC.follower_count,
        SC.industries,
        SC.specialties,
        SC.matching_industries,
        SC.matching_specialties,
        COALESCE(SS.avg_min_salary, 0) AS avg_min_salary,
        COALESCE(SS.avg_max_salary, 0) AS avg_max_salary,
        (SC.matching_industries * 2 + SC.matching_specialties) AS similarity_score
    FROM 
        similar_companies SC
    LEFT JOIN 
        salary_stats SS ON SC.company_id = SS.company_id
    ORDER BY 
        similarity_score DESC,
        job_posting_count DESC
    LIMIT 20;
    """
    
    cursor.execute(query, (user_industries, user_specialties, user_state, user_company_id, user_industries))
    results = cursor.fetchall()

    cursor.execute("""
        SELECT 
            C."ID" AS company_id,
            C."NAME" AS company_name,
            C."STATE",
            (SELECT COUNT(DISTINCT P."JOB_ID") FROM public."POSTING" P WHERE P."COMPANY_ID" = C."ID") AS job_posting_count,
            (SELECT EC."EMPLOYEE_COUNT" FROM public."EMPLOYEE_COUNTS" EC WHERE EC."COMPANY_ID" = C."ID" ORDER BY EC."TIME_RECORDED" DESC LIMIT 1) AS employee_count,
            (SELECT EC."FOLLOWER_COUNT" FROM public."EMPLOYEE_COUNTS" EC WHERE EC."COMPANY_ID" = C."ID" ORDER BY EC."TIME_RECORDED" DESC LIMIT 1) AS follower_count,
            (SELECT AVG(S."MIN_SALARY") FROM public."POSTING" P JOIN public."SALARIES" S ON P."JOB_ID" = S."JOB_ID" WHERE P."COMPANY_ID" = C."ID") AS avg_min_salary,
            (SELECT AVG(S."MAX_SALARY") FROM public."POSTING" P JOIN public."SALARIES" S ON P."JOB_ID" = S."JOB_ID" WHERE P."COMPANY_ID" = C."ID") AS avg_max_salary
        FROM 
            public."COMPANIES" C
        WHERE 
            C."ID" = %s
    """, (user_company_id,))
    
    your_company = cursor.fetchone()
    cursor.close()
    
    # Converti in DataFrame per analisi
    df_similar = pd.DataFrame(results)
    df_your = pd.DataFrame([your_company]) if your_company else pd.DataFrame()
    
    return df_similar, df_your

def visualize_results(df_similar, df_your):
    """
    Visualizza i risultati dell'analisi con grafici
    """
    if df_similar.empty:
        print("Nessuna azienda simile trovata nel tuo stato")
        return
    
    # Analisi statistica dei salari
    salary_stats = {
        'min_salary': {
            'media': df_similar['avg_min_salary'].mean(),
            'mediana': df_similar['avg_min_salary'].median(),
            'max': df_similar['avg_min_salary'].max(),
            'min': df_similar['avg_min_salary'].min(),
            'tua_azienda': df_your['avg_min_salary'].iloc[0] if not df_your.empty else None
        },
        'max_salary': {
            'media': df_similar['avg_max_salary'].mean(),
            'mediana': df_similar['avg_max_salary'].median(),
            'max': df_similar['avg_max_salary'].max(),
            'min': df_similar['avg_max_salary'].min(),
            'tua_azienda': df_your['avg_max_salary'].iloc[0] if not df_your.empty else None
        }
    }
    
    print("\n===== ANALISI AZIENDE SIMILI NEL TUO STATO =====")
    print(f"Numero di aziende simili trovate: {len(df_similar)}")
    
    print("\n----- STATISTICHE SALARIALI -----")
    print(f"Salario Minimo - Media: {salary_stats['min_salary']['media']:.2f}, "
          f"Mediana: {salary_stats['min_salary']['mediana']:.2f}, "
          f"Range: {salary_stats['min_salary']['min']:.2f} - {salary_stats['min_salary']['max']:.2f}")
    if salary_stats['min_salary']['tua_azienda']:
        print(f"Il tuo salario minimo medio: {salary_stats['min_salary']['tua_azienda']:.2f}")
        
    print(f"Salario Massimo - Media: {salary_stats['max_salary']['media']:.2f}, "
          f"Mediana: {salary_stats['max_salary']['mediana']:.2f}, "
          f"Range: {salary_stats['max_salary']['min']:.2f} - {salary_stats['max_salary']['max']:.2f}")
    if salary_stats['max_salary']['tua_azienda']:
        print(f"Il tuo salario massimo medio: {salary_stats['max_salary']['tua_azienda']:.2f}")
    
    # Crea grafici per un'analisi visiva
    plt.figure(figsize=(16, 10))
    
    # Grafico 1: Distribuzione delle dimensioni aziendali
    plt.subplot(2, 2, 1)
    plt.hist(df_similar['employee_count'].dropna(), bins=10, alpha=0.7)
    if not df_your.empty:
        plt.axvline(df_your['employee_count'].iloc[0], color='r', linestyle='dashed', linewidth=2)
    plt.title('Distribuzione del numero di dipendenti')
    plt.xlabel('Numero dipendenti')
    plt.ylabel('Frequenza')
    
    # Grafico 2: Job posting vs employee count
    plt.subplot(2, 2, 2)
    plt.scatter(df_similar['employee_count'], df_similar['job_posting_count'], alpha=0.7)
    if not df_your.empty:
        plt.scatter(df_your['employee_count'], df_your['job_posting_count'], color='r', s=100, marker='*')
    plt.title('Rapporto tra dipendenti e annunci di lavoro')
    plt.xlabel('Numero dipendenti')
    plt.ylabel('Numero job posting')
    
    # Grafico 3: Salari medi
    plt.subplot(2, 2, 3)
    top10 = df_similar.sort_values('similarity_score', ascending=False).head(10)
    companies = top10['company_name'].values
    min_salaries = top10['avg_min_salary'].values
    max_salaries = top10['avg_max_salary'].values
    
    x = range(len(companies))
    width = 0.35
    plt.bar(x, min_salaries, width, label='Min Salary')
    plt.bar([i + width for i in x], max_salaries, width, label='Max Salary')
    plt.xticks([i + width/2 for i in x], companies, rotation=45, ha='right')
    plt.title('Confronto salari per le 10 aziende più simili')
    plt.legend()
    
    # Grafico 4: Rapporto follower/dipendenti
    plt.subplot(2, 2, 4)
    df_valid = df_similar[df_similar['employee_count'] > 0].copy()
    df_valid['follower_ratio'] = df_valid['follower_count'] / df_valid['employee_count']
    plt.hist(df_valid['follower_ratio'].dropna(), bins=10, alpha=0.7)
    if not df_your.empty and df_your['employee_count'].iloc[0] > 0:
        ratio = df_your['follower_count'].iloc[0] / df_your['employee_count'].iloc[0]
        plt.axvline(ratio, color='r', linestyle='dashed', linewidth=2)
    plt.title('Rapporto follower/dipendenti')
    plt.xlabel('Follower per dipendente')
    plt.ylabel('Frequenza')
    
    plt.tight_layout()
    plt.show()
    
    return salary_stats

def main():
    
    try:
        user_company_id = int(input("Inserisci l'ID della tua azienda: "))

        conn = get_db_connection()
        
        # Ottieni lo stato dell'azienda
        with conn.cursor() as cursor:
            cursor.execute('''SELECT "STATE" FROM public."COMPANIES" WHERE "ID" = %s
            ''', (user_company_id,))
            result = cursor.fetchone()
            if not result:
                print("Azienda non trovata!")
                return
            user_state = result[0]
        
        # Esegui l'analisi
        df_similar, df_your = analyze_similar_companies(conn, user_company_id, user_state)
        
        # Visualizza i risultati
        salary_stats = visualize_results(df_similar, df_your)
        
        # Raccomandazioni sui salari
        if not df_your.empty and salary_stats:
            your_min = salary_stats['min_salary']['tua_azienda']
            your_max = salary_stats['max_salary']['tua_azienda']
            avg_min = salary_stats['min_salary']['media']
            avg_max = salary_stats['max_salary']['media']
            
            print("\n----- RACCOMANDAZIONI SUI SALARI -----")
            if your_min < avg_min * 0.9:
                print(f"Il tuo salario minimo medio è {(1 - your_min/avg_min)*100:.1f}% più basso della media. "
                      f"Considera di aumentarlo per essere più competitivo.")
            elif your_min > avg_min * 1.1:
                print(f"Il tuo salario minimo medio è {(your_min/avg_min - 1)*100:.1f}% più alto della media. "
                      f"Questo potrebbe essere un punto di forza per attirare talenti.")
                
            if your_max < avg_max * 0.9:
                print(f"Il tuo salario massimo medio è {(1 - your_max/avg_max)*100:.1f}% più basso della media. "
                      f"Considera di aumentarlo per attirare candidati di alto livello.")
            elif your_max > avg_max * 1.1:
                print(f"Il tuo salario massimo medio è {(your_max/avg_max - 1)*100:.1f}% più alto della media. "
                      f"Questo potrebbe essere un vantaggio competitivo, ma valuta se è sostenibile.")
        
        print("\nTop 10 aziende simili:")
        top10 = df_similar.sort_values('similarity_score', ascending=False).head(10)
        for idx, row in top10.iterrows():
            print(f"{row['company_name']} - Job posting: {row['job_posting_count']}, "
                  f"Dipendenti: {row['employee_count']}, Follower: {row['follower_count']}")
    
    except Exception as e:
        print(f"Errore: {e}")
    finally:
        if 'conn' in locals() and conn:
            conn.close()

In [29]:
main()

Errore: ERRORE:  la colonna ss.company_id non esiste
LINE 88:         salary_stats SS ON SC.company_id = SS.company_id
                                                    ^
HINT:  Forse intendevi referenziare la colonna "sc.company_id".



In [31]:
def find_similar_companies(company_id):
    query = """
    WITH company_info AS (
        SELECT 
            c."ID" AS company_id,
            c."NAME" AS company_name,
            c."INDUSTRY_ID",
            c."LOCATION",
            c."NUM_EMPLOYEES",
            c."NUM_FOLLOWERS",
            COUNT(p."JOB_ID") AS num_jobs
        FROM public."COMPANIES" c
        LEFT JOIN public."POSTING" p ON c."ID" = p."COMPANY_ID"
        WHERE c."ID" = %s
        GROUP BY c."ID"
    ),
    similar_companies AS (
        SELECT 
            c."ID",
            c."NAME",
            c."LOCATION",
            c."NUM_EMPLOYEES",
            c."NUM_FOLLOWERS",
            COUNT(p."JOB_ID") AS num_jobs
        FROM public."COMPANIES" c
        LEFT JOIN public."POSTING" p ON c."ID" = p."COMPANY_ID"
        JOIN company_info ci ON 
            c."INDUSTRY_ID" = ci."INDUSTRY_ID"
            AND c."LOCATION" = ci."LOCATION"
            AND ABS(c."NUM_EMPLOYEES" - ci."NUM_EMPLOYEES") <= 50
            AND ABS(c."NUM_FOLLOWERS" - ci."NUM_FOLLOWERS") <= 500
            AND ABS(num_jobs - ci.num_jobs) <= 5
        WHERE c."ID" != ci.company_id
        GROUP BY c."ID"
    ),
    salary_stats AS (
        SELECT 
            sc."COMPANY_ID",
            AVG(s."MIN_SALARY") AS avg_min_salary,
            AVG(s."MAX_SALARY") AS avg_max_salary
        FROM public."SALARIES" s
        JOIN public."POSTING" p ON s."JOB_ID" = p."JOB_ID"
        JOIN similar_companies sc ON p."COMPANY_ID" = sc."ID"
        GROUP BY sc."COMPANY_ID"
    ),
    company_skills AS (
        SELECT 
            us."COMPANY_ID", 
            COUNT(DISTINCT us."SKILL_ID") AS num_skills
        FROM public."COMPANY_SKILLS" us
        WHERE us."COMPANY_ID" = %s
        GROUP BY us."COMPANY_ID"
    ),
    similar_skilled_companies AS (
        SELECT 
            c."ID",
            c."NAME",
            c."LOCATION",
            c."NUM_EMPLOYEES",
            c."NUM_FOLLOWERS",
            COUNT(p."JOB_ID") AS num_jobs,
            COALESCE(s.avg_min_salary, 0) AS avg_min_salary,
            COALESCE(s.avg_max_salary, 0) AS avg_max_salary
        FROM similar_companies c
        JOIN company_skills cs ON cs."COMPANY_ID" = %s
        LEFT JOIN salary_stats s ON c."ID" = s."COMPANY_ID"
        WHERE (SELECT COUNT(*) FROM public."COMPANY_SKILLS" cs2 WHERE cs2."COMPANY_ID" = c."ID") = cs.num_skills
        GROUP BY c."ID", s.avg_min_salary, s.avg_max_salary
    )
    SELECT * FROM similar_skilled_companies
    ORDER BY num_jobs DESC, num_employees DESC, num_followers DESC
    LIMIT 20;
    """

    conn = None
    try:
        conn = get_db_connection()
        cur = conn.cursor()

        cur.execute(query, (company_id, company_id, company_id))
        rows = cur.fetchall()
        
        cur.close()
        conn.close()

        # Convertire in DataFrame per visualizzazione
        df = pd.DataFrame(rows, columns=[
            "Company ID", "Company Name", "Location", "Num Employees",
            "Num Followers", "Num Jobs", "Avg Min Salary", "Avg Max Salary"
        ])
        
        if df.empty:
            print("No similar companies found.")
        else:
            display(df)

    except psycopg2.Error as e:
        if conn:
            conn.rollback()
        print(f"Database error: {e}")

    except Exception as e:
        print(f"Unexpected error: {e}")

    finally:
        if conn:
            conn.close()

# Esempio di utilizzo
company_id = 123  # ID dell'azienda da analizzare
find_similar_companies(company_id)

Database error: ERRORE:  la colonna c.INDUSTRY_ID non esiste
LINE 6:             c."INDUSTRY_ID",
                    ^



### 7. Select all the Finds the salary ranges for different industries based on job postings.

 I also want to know this info for industry and job title
 
Finds job listings by country and their average salary, but only for countries with more than 10 job postings.

INTERESSANTE: QUALI SONO LE INDUSTRIE PIU PAGATE?

In [None]:
SELECT i."NAME" AS industry, 
       COUNT(j."JOB_ID") AS job_count, 
       AVG(s."MIN_SALARY") AS avg_min_salary,
       AVG(s."MED_SALARY") AS avg_med_salary,
       AVG(s."MAX_SALARY") AS avg_max_salary
FROM public."JOB_INDUSTRIES" ji
JOIN public."INDUSTRIES" i ON ji."INDUSTRY_ID" = i."ID"
JOIN public."SALARIES" s ON ji."JOB_ID" = s."JOB_ID"
JOIN public."POSTING" j ON ji."JOB_ID" = j."JOB_ID"
GROUP BY i."NAME"
ORDER BY job_count DESC;

In [82]:
search_button = widgets.Button(description="Analyze Remote Jobs", button_style="primary")
output = widgets.Output()

def seventh_query(b):
    with output:
        output.clear_output()

        conn = None
        try:
            conn = get_db_connection()
            cur = conn.cursor()

            start_time = time.time()

            query = """
                SELECT c."COUNTRY", COUNT(p."JOB_ID") AS total_jobs, 
                ROUND(AVG(s."MIN_SALARY"),3) AS avg_min_salary,
                ROUND(AVG(s."MED_SALARY"),3) AS avg_med_salary,
                ROUND(AVG(s."MAX_SALARY"),3) AS avg_max_salary
                FROM public."POSTING" p
                JOIN public."COMPANIES" c ON p."COMPANY_ID" = c."ID"
                LEFT JOIN public."SALARIES" s ON p."JOB_ID" = s."JOB_ID"
                GROUP BY c."COUNTRY"
                HAVING COUNT(p."JOB_ID") > 10
                ORDER BY total_jobs DESC;
            """
            cur.execute(query)
            rows = cur.fetchall()

            end_time = time.time()
            execution_time = end_time - start_time

            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows, columns=["Country", "Number of Job Offers", "Avg Min Salary", "Avg Med Salary", "Avg Max Salary"])
                display(df)
            else:
                print("No data found for the query.")  # Handle the case when no data is found

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()
search_button.on_click(seventh_query)

display(widgets.VBox([
    widgets.Label("Click below to analyze the remote job postings:"),
    search_button,
    output
]))

VBox(children=(Label(value='Click below to analyze the remote job postings:'), Button(button_style='primary', …

### 8. Select the top 5 job title which in percentage are more requested remotely
 calcolare la top 5 dei lavori con in percentuale più proposte di lavoro da remoto

In [73]:
search_button = widgets.Button(description="Analyze Remote Jobs", button_style="primary")
output = widgets.Output()

def eighth_query(b):
    with output:
        output.clear_output()

        conn = None
        try:
            conn = get_db_connection()
            cur = conn.cursor()

            start_time = time.time()

            query = """
                SELECT 
                p."TITLE", 
                COUNT(*) AS total_offers, 
                SUM(CASE WHEN p."REMOTE_ALLOWED" = '1.0' THEN 1 ELSE 0 END) AS remotes_allowed, 
                ROUND(SUM(CASE WHEN p."REMOTE_ALLOWED" = '1.0' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 3) AS percentage_remote
                FROM public."POSTING" p
                GROUP BY "TITLE"
                HAVING COUNT(*) > 0
                ORDER BY remotes_allowed DESC
                LIMIT 5;
            """
            cur.execute(query)
            rows = cur.fetchall()

            end_time = time.time()
            execution_time = end_time - start_time

            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows, columns=["Job Title", "Total Job Offers", "Remote Jobs Allowed", "Percentage Remote"])
                display(df)
            else:
                print("No data found for the query.")  # Handle the case when no data is found

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()
search_button.on_click(eighth_query)

display(widgets.VBox([
    widgets.Label("Click below to analyze the remote job postings:"),
    search_button,
    output
]))

VBox(children=(Label(value='Click below to analyze the remote job postings:'), Button(button_style='primary', …

### 9. Select the top titles with less time requested to close the posting 
- visualizzare i top 3 lavori con minor tempo medio per chiudere la job posting(possiamo usare anche le application per quel lavoro per calcolare quanto sia proficua quella proposta di lavoro) la piu richiesta tipologia di lavoro
aggiungere anche quali sono le piu diffici da prendere anche considerando il rapportto applies views

In [75]:
search_button = widgets.Button(description="Analyze Job Postings", button_style="primary")
output = widgets.Output()

def ninth_query(b):
    with output:
        output.clear_output()

        conn = None
        try:
            conn = get_db_connection()
            cur = conn.cursor()

            start_time = time.time()

            query = """
                SELECT p."TITLE", 
                       ROUND(AVG(p."CLOSED_TIME" - p."LISTED_TIME"),3) AS avg_closing_time,
                       SUM(CASE 
                               WHEN p."APPLIES" IS NOT NULL AND p."APPLIES" > 0 
                               THEN p."APPLIES" 
                               ELSE 0 
                           END) AS total_applies,
                       SUM(CASE 
                               WHEN p."VIEWS" IS NOT NULL AND p."VIEWS" > 0 
                               THEN p."VIEWS" 
                               ELSE 0 
                           END) AS total_views,
                       COUNT(p."TITLE") AS num_postings
                FROM public."POSTING" p
                GROUP BY p."TITLE"
                HAVING COUNT(p."TITLE") >= 5 AND AVG(p."CLOSED_TIME" - p."LISTED_TIME") > 86400
                ORDER BY avg_closing_time ASC
                LIMIT 3;
            """
            cur.execute(query)
            rows = cur.fetchall()

            end_time = time.time()
            execution_time = end_time - start_time

            cur.close()
            conn.close()

            print(f"Query executed in {execution_time:.4f} seconds")

            if rows:
                df = pd.DataFrame(rows, columns=["Job Title", "Avg Closing Time (ms)", "Total Applies", "Total Views", "Num Postings"])
                display(df)
            else:
                print("No data found for the query.")

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

search_button.on_click(ninth_query)

display(widgets.VBox([
    widgets.Label("Click below to analyze the job postings with the lowest closing time:"),
    search_button,
    output
]))

VBox(children=(Label(value='Click below to analyze the job postings with the lowest closing time:'), Button(bu…

### 10. Finds the top 10 most in-demand skills by industry, including salary insights.
 
aggiungere poi: non so se a livello di sql si possa fare, ma pensavo di selezionare le skill richieste per un lavoro nella top 3 dei salari retribuiti e con un esperience level = entry level

In [None]:
SELECT sk."SKILL_NAME", ind."NAME" AS industry, 
       COUNT(js."JOB_ID") AS job_count, 
       ROUND(AVG(s."MED_SALARY"), 2) AS avg_salary
FROM public."JOB_SKILLS" js
JOIN public."SKILL" sk ON js."SKILL_ABR" = sk."SKILL_ABR"
JOIN public."JOB_INDUSTRIES" ji ON js."JOB_ID" = ji."JOB_ID"
JOIN public."INDUSTRIES" ind ON ji."INDUSTRY_ID" = ind."ID"
LEFT JOIN public."SALARIES" s ON js."JOB_ID" = s."JOB_ID"
GROUP BY sk."SKILL_NAME", ind."NAME"
HAVING COUNT(js."JOB_ID") > 5
ORDER BY job_count DESC, avg_salary DESC
LIMIT 10;

In [84]:
search_button = widgets.Button(description="Analyze Job Skills and Industries", button_style="primary")
output = widgets.Output()

def tenth_query(b):
    with output:
        output.clear_output()  # Clear previous output if any

        conn = None
        try:
            # Connecting to the database
            conn = get_db_connection()
            cur = conn.cursor()

            # Start execution time tracking
            start_time = time.time()

            query = """
            SELECT sk."SKILL_NAME", ind."NAME" AS industry, 
                   COUNT(js."JOB_ID") AS job_count, 
                   ROUND(AVG(s."MAX_SALARY"), 2) AS avg_salary
            FROM public."JOB_SKILLS" js
            JOIN public."SKILL" sk ON js."SKILL_ID" = sk."ID"
            JOIN public."JOB_INDUSTRIES" ji ON js."JOB_ID" = ji."JOB_ID"
            JOIN public."INDUSTRIES" ind ON ji."INDUSTRY_ID" = ind."ID"
            LEFT JOIN public."SALARIES" s ON js."JOB_ID" = s."JOB_ID"
            GROUP BY sk."SKILL_NAME", ind."NAME"
            ORDER BY job_count DESC, avg_salary DESC
            LIMIT 10;
        """
            cur.execute(query)
            rows = cur.fetchall()

            # End execution time tracking
            end_time = time.time()
            execution_time = end_time - start_time

            # Closing connection
            cur.close()
            conn.close()

            # Print query execution time
            print(f"Query executed in {execution_time:.4f} seconds")

            # Check if any rows were returned
            if rows:
                # Create DataFrame from the results
                df = pd.DataFrame(rows, columns=["Skill Name", "Industry", "Job Count", "Avg Salary"])
                display(df)  # Display the DataFrame
            else:
                print("No data found for the query.")  # Handle the case when no data is found

        except psycopg2.Error as e:
            if conn:
                conn.rollback()
            print(f"Database error: {e}")

        except Exception as e:
            print(f"Unexpected error: {e}")

        finally:
            if conn:
                conn.close()

# Bind the function to the button click event
search_button.on_click(tenth_query)

# Display the widgets
display(widgets.VBox([
    widgets.Label("Click below to analyze the job skills and industries data:"),
    search_button,
    output
]))

VBox(children=(Label(value='Click below to analyze the job skills and industries data:'), Button(button_style=…

NOTE.
aggiungere experience_level alla tabella USER al posto di experience
Tramite una select prendere tutti i valori della colonna FORMATTED_EXPERIENCE_LEVEL dalla tabella posting e fare in modo che durante l inserimento da PYTHON puo essere scelto un unico valore tra questi della lista

aggiungere skill_id su tabella USER

prendere tutte le skill possibili (nome) dalla tabella SKILL e dare la possibilita all utente di inserire le proprie skill ognuna di queste sara una riga nella tabella di appoggio che verra creata USER_SKILL
Quando si cerca un job posting verranno visualizzate quelle che matchano almeno un numero da scegliere di skill richieste dal job (JOB_SKILLS) e quelle dell utente loggato o registrato






Views can make the query faster allowing for temporarily queries that store huge info instead to compute it into the query itself, but these are not stored in the memory like usual table and this can cause slower access since the views are not optimized to being accessed like normal table.

TRADE OFF: Faster computation of JOIN for example that can be directly stored into a virtual table, Slower access since it is not optimized for it.

INSERIRE DA QUALCHE PARTE ANCHE IL LIVELLO DI ESPERIENZA CHE NON VIENE MAI UTILIZZATO

NOTE PER OTTIMIZZAZIONE:

Depending on your query patterns, you might want to add additional indexes. For example, if you frequently query JOB_SKILLS by SKILL_ID, consider adding an index on SKILL_ID in JOB_SKILLS.

Likewise, if certain columns are frequently used in filters (e.g., STATE in COMPANIES, EXPERIENCE in USERS), indexes might improve performance.

## SQL evaluation and optimization

### sql1