# 💼 Freelancer Job Recommendation System

This notebook builds a recommendation system for freelancers that suggests the most relevant job postings for the skills a user selects.

## ✨ Key Objectives:
- Use Word2Vec embeddings to represent job descriptions in vector space.
- Enable users to input their skills through a Streamlit interface.
- Recommend jobs that are most similar to the user's skills using cosine similarity.
- Visualize recommended job matches and similarity scores interactively.

## 💡 Technologies Used:
- **Gensim** for Word2Vec model
- **Streamlit** for interactive web app interface
- **Plotly** for dynamic visualizations
- **Scikit-learn** for similarity computations

## 🧠 How It Works:
1. Each job description is converted to a vector by averaging the Word2Vec embeddings of its tokens; the model is trained on the job descriptions themselves.
2. The user selects skills from a sidebar.
3. A mean vector of the selected skills is calculated.
4. Cosine similarity is computed between the user vector and all job vectors.
5. The top N similar jobs are displayed along with their similarity scores and descriptions.
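
The steps above can be sketched without any of the libraries used later. The example below is a minimal, dependency-free illustration that assumes made-up three-dimensional word vectors (the `toy_vectors` table and the job titles are hypothetical, not taken from the dataset); the notebook itself uses Gensim vectors and scikit-learn's `cosine_similarity`.

```python
import math

# Hypothetical 3-dimensional "embeddings"; the notebook's Word2Vec vectors have 100 dimensions.
toy_vectors = {
    "html": [1.0, 0.0, 0.0],
    "css": [0.8, 0.2, 0.0],
    "seo": [0.0, 1.0, 0.0],
    "ads": [0.0, 0.8, 0.2],
}

def mean_vector(words, vectors):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * len(next(iter(vectors.values())))
    return [sum(column) / len(known) for column in zip(*known)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Vectorize the jobs, vectorize the user's skills, score every job, and rank.
jobs = {"Frontend page": ["html", "css"], "Ad campaign": ["seo", "ads"]}
job_vectors = {title: mean_vector(tokens, toy_vectors) for title, tokens in jobs.items()}
user_vector = mean_vector(["html"], toy_vectors)
ranked = sorted(job_vectors, key=lambda t: cosine(user_vector, job_vectors[t]), reverse=True)
print(ranked)
```

In this toy setup the job whose tokens point in the same direction as the user's skill vector is ranked first, which is the behavior the real pipeline relies on.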

## 📊 Benefits:
- Helps freelancers quickly find jobs tailored to their expertise.
- Saves time by narrowing job options based on actual skill matching.
- Offers a professional, easy-to-use web interface.

> ✅ The system runs as a Streamlit app that is launched locally from this notebook.


# Import necessary libraries

In [31]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity

## Read the dataset

In [32]:
df = pd.read_csv('final_data_to_recommendation_system.csv')
df

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,title,link,description,is_hourly,hourly_low,hourly_high,budget,country,year,month,day,day_name,month_name,hour,minute,jobs,skills
0,0,0,Experienced Media Buyer For Solar Pannel and R...,https://www.upwork.com/jobs/Experienced-Media-...,We’re looking for a talented and hardworking a...,False,,,500.0,United States,2024,2,17,Saturday,February,9,9,Digital Marketing & Content,"['social media marketing', 'content writing', ..."
1,1,1,Full Stack Developer,https://www.upwork.com/jobs/Full-Stack-Develop...,Job Title: Full Stack DeveloperWe are seeking ...,False,,,1100.0,United States,2024,2,17,Saturday,February,9,9,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j..."
2,2,2,SMMA Bubble App,https://www.upwork.com/jobs/SMMA-Bubble-App_%7...,I need someone to redesign my bubble.io site t...,True,10.0,30.0,,United States,2024,2,17,Saturday,February,9,8,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j..."
3,3,3,Talent Hunter Specialized in Marketing,https://www.upwork.com/jobs/Talent-Hunter-Spec...,Join Our Growing Team!We are an innovative com...,True,,,,United States,2024,2,17,Saturday,February,9,8,Digital Marketing & Content,"['social media marketing', 'content writing', ..."
4,4,4,Data Engineer,https://www.upwork.com/jobs/Data-Engineer_%7E0...,We are looking for a resource who can work par...,False,,,650.0,India,2024,2,17,Saturday,February,9,7,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53053,53053,53053,Partial Migration From WordPress to Shopify,https://www.upwork.com/jobs/Partial-Migration-...,We're moving from Wordpress to Shopify. The Sh...,False,,,150.0,Australia,2024,2,14,Wednesday,February,6,40,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j..."
53054,53054,53054,Logo work &amp; Event Booth Rendering,https://www.upwork.com/jobs/Logo-work-amp-Even...,I need some art works rendered in to booth des...,False,,,30.0,United States,2024,2,14,Wednesday,February,6,40,Graphic & Video Design,"['graphic design', 'adobe photoshop', 'adobe i..."
53055,53055,53055,Wedding Dress Collection Photographer,https://www.upwork.com/jobs/Wedding-Dress-Coll...,We are looking for a skilled photographer to c...,True,23.0,51.0,,Australia,2024,2,14,Wednesday,February,6,40,Digital Marketing & Content,"['social media marketing', 'content writing', ..."
53056,53056,53056,Design a startup profile,https://www.upwork.com/jobs/Design-startup-pro...,I building a startup company and I want to des...,False,,,70.0,Saudi Arabia,2024,2,14,Wednesday,February,6,40,Graphic & Video Design,"['graphic design', 'adobe photoshop', 'adobe i..."


### Drop unnecessary columns

In [33]:
df.drop(['Unnamed: 0.1', 'Unnamed: 0', 'hourly_low', 'hourly_high', 'budget'], axis=1, inplace=True)

## Define a preprocessing function for text data

In [34]:
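# Build the stop-word set once; rebuilding it on every call is slow for ~53k rows.
# Requires the NLTK stop-word corpus: nltk.download('stopwords')
STOP_WORDS = set(stopwords.words('english'))

def preprocess(text):
    """Lowercase, strip punctuation and digits, and drop English stop words."""
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
    text = re.sub(r'\d+', '', text)      # remove digits
    return [word for word in text.split() if word not in STOP_WORDS]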

In [35]:
df['tokens'] = df['description'].apply(preprocess)
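
To make the effect of `preprocess` concrete, the sketch below applies the same three steps to a single sentence. It assumes a tiny hand-written stop-word set (`TOY_STOP_WORDS`, hypothetical) in place of NLTK's English list so that it runs without downloading a corpus; the real list is much longer.

```python
import re

# Hypothetical stand-in for stopwords.words('english').
TOY_STOP_WORDS = {"we", "are", "a", "for", "with", "of", "the"}

def preprocess_demo(text):
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    text = re.sub(r"\d+", "", text)      # drop digits
    return [w for w in text.split() if w not in TOY_STOP_WORDS]

tokens = preprocess_demo("We are seeking a Node.js developer with 5 years of React!")
print(tokens)
```

Note that removing punctuation turns `Node.js` into `nodejs`, so a skill entered later as `node.js` would not match a token produced by this step.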

## Train a Word2Vec model on the preprocessed tokenized text

In [36]:
model_w2v = Word2Vec(sentences=df['tokens'], vector_size=100, window=5, min_count=2, workers=4)

In [37]:
model_w2v

<gensim.models.word2vec.Word2Vec at 0x23a9c9be3b0>

## Define a function to get the average Word2Vec vector for a list of tokens

In [38]:
def get_vector(tokens, model):
    valid_tokens = [word for word in tokens if word in model.wv]
    if not valid_tokens:
        return np.zeros(model.vector_size)
    return np.mean(model.wv[valid_tokens], axis=0)

In [39]:
X_w2v = np.array([get_vector(tokens, model_w2v) for tokens in df['tokens']])

In [40]:
X_w2v.shape

(53058, 100)

In [41]:
df['w2v_vector'] = [vec.tolist() for vec in X_w2v]

In [43]:
df.to_csv("streamlit.csv", index=False)
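
Writing a column of Python lists to CSV stores each list as its string representation, so the vectors have to be parsed again after loading. The sketch below illustrates this round trip with the standard library only; the two-row table and the use of `json.loads` are illustrative assumptions, not the notebook's code (the Streamlit app further down does its own parsing of this column).

```python
import csv
import io
import json

rows = [{"title": "Job A", "w2v_vector": [0.25, -0.5]},
        {"title": "Job B", "w2v_vector": [1.0, 0.0]}]

# Write: each list is converted to text such as "[0.25, -0.5]".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "w2v_vector"])
writer.writeheader()
writer.writerows(rows)

# Read: every cell comes back as a string and has to be parsed explicitly.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_cell = loaded[0]["w2v_vector"]
vectors = [json.loads(row["w2v_vector"]) for row in loaded]
print(type(raw_cell).__name__, vectors)
```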

In [44]:
user_skills = ['html', 'css', 'javascript', 'react']

## Match and rank job descriptions based on a user's skills:
1. `match_skills`: counts the number of overlapping skills between each job and the user.
2. Add a `skill_match_score` column to store the count of matching skills.
3. `get_freelancer_vector`: computes the average Word2Vec vector for the user's skill set.
4. `user_vector`: the vector representation of the user based on their skills.
5. Calculate the cosine similarity between the user's vector and each job description vector (`X_w2v`) and store the results in a new `similarity` column.


In [45]:
def match_skills(row_skills, user_skills):
    return len(set(row_skills) & set(user_skills))

In [46]:
df['skill_match_score'] = df['skills'].apply(lambda x: match_skills(x, user_skills))
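
One caveat worth checking: after `pd.read_csv`, the `skills` column holds strings such as `"['html', 'css']"` rather than lists, and `set()` applied to a string yields its individual characters, so the overlap with `user_skills` comes out as 0. A possible fix, sketched here under the assumption that every cell is a valid Python list literal, is to parse the cell with `ast.literal_eval` first (`match_skills_parsed` is a hypothetical name).

```python
import ast

def match_skills_parsed(row_skills, user_skills):
    """Count shared skills, parsing the cell first if it is still a string."""
    if isinstance(row_skills, str):
        row_skills = ast.literal_eval(row_skills)
    return len(set(row_skills) & set(user_skills))

cell = "['html', 'css', 'javascript', 'react', 'node.js']"
user_skills = ['html', 'css', 'javascript', 'react']

unparsed_overlap = len(set(cell) & set(user_skills))  # compares characters with skill names
parsed_overlap = match_skills_parsed(cell, user_skills)
print(unparsed_overlap, parsed_overlap)
```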

In [47]:
def get_freelancer_vector(skills, model):
    valid_skills = [skill for skill in skills if skill in model.wv]
    if not valid_skills:
        return np.zeros(model.vector_size)
    return np.mean(model.wv[valid_skills], axis=0)

In [48]:
user_vector = get_freelancer_vector(user_skills, model_w2v)
df['similarity'] = cosine_similarity([user_vector], X_w2v)[0]
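
Because the model is trained on single-word tokens, a multi-word skill such as `'social media marketing'` is unlikely to be a key in the vocabulary and would then be skipped by `get_freelancer_vector`. One option, sketched below with a plain dictionary standing in for `model_w2v.wv` (the vectors and the helper `skill_tokens` are hypothetical), is to split each skill into words before the lookup.

```python
def skill_tokens(skills, vocabulary):
    """Split multi-word skills into words and keep those found in the vocabulary."""
    tokens = []
    for skill in skills:
        for word in skill.lower().split():
            if word in vocabulary:
                tokens.append(word)
    return tokens

# Hypothetical vocabulary; in the notebook this role is played by model_w2v.wv.
toy_wv = {"social": [0.1, 0.9], "media": [0.2, 0.8], "marketing": [0.0, 1.0], "html": [1.0, 0.0]}

skills = ["social media marketing", "html"]
direct_hits = [s for s in skills if s in toy_wv]  # whole-phrase lookup
split_hits = skill_tokens(skills, toy_wv)         # word-by-word lookup
print(direct_hits, split_hits)
```

With the whole-phrase lookup only `html` contributes to the user vector; after splitting, all four words do.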

In [49]:
df

Unnamed: 0,title,link,description,is_hourly,country,year,month,day,day_name,month_name,hour,minute,jobs,skills,tokens,w2v_vector,skill_match_score,similarity
0,Experienced Media Buyer For Solar Pannel and R...,https://www.upwork.com/jobs/Experienced-Media-...,We’re looking for a talented and hardworking a...,False,United States,2024,2,17,Saturday,February,9,9,Digital Marketing & Content,"['social media marketing', 'content writing', ...","[looking, talented, hardworking, ads, manager,...","[0.6010310053825378, -0.6051793098449707, 0.40...",0,0.060960
1,Full Stack Developer,https://www.upwork.com/jobs/Full-Stack-Develop...,Job Title: Full Stack DeveloperWe are seeking ...,False,United States,2024,2,17,Saturday,February,9,9,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[job, title, full, stack, developerwe, seeking...","[0.17747215926647186, -0.7467168569564819, 0.4...",0,0.651165
2,SMMA Bubble App,https://www.upwork.com/jobs/SMMA-Bubble-App_%7...,I need someone to redesign my bubble.io site t...,True,United States,2024,2,17,Saturday,February,9,8,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[need, someone, redesign, bubbleio, site, opti...","[0.3020835816860199, -0.3401239514350891, -0.5...",0,0.594909
3,Talent Hunter Specialized in Marketing,https://www.upwork.com/jobs/Talent-Hunter-Spec...,Join Our Growing Team!We are an innovative com...,True,United States,2024,2,17,Saturday,February,9,8,Digital Marketing & Content,"['social media marketing', 'content writing', ...","[join, growing, teamwe, innovative, company, e...","[0.38471469283103943, -1.2988585233688354, 0.4...",0,0.018946
4,Data Engineer,https://www.upwork.com/jobs/Data-Engineer_%7E0...,We are looking for a resource who can work par...,False,India,2024,2,17,Saturday,February,9,7,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[looking, resource, work, parttime, one, proje...","[0.8640854954719543, -0.5999747514724731, 0.41...",0,0.290273
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53053,Partial Migration From WordPress to Shopify,https://www.upwork.com/jobs/Partial-Migration-...,We're moving from Wordpress to Shopify. The Sh...,False,Australia,2024,2,14,Wednesday,February,6,40,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[moving, wordpress, shopify, shopify, built, p...","[0.5835453867912292, -0.17499354481697083, -0....",0,0.379148
53054,Logo work &amp; Event Booth Rendering,https://www.upwork.com/jobs/Logo-work-amp-Even...,I need some art works rendered in to booth des...,False,United States,2024,2,14,Wednesday,February,6,40,Graphic & Video Design,"['graphic design', 'adobe photoshop', 'adobe i...","[need, art, works, rendered, booth, designs, u...","[0.44307059049606323, -0.39854303002357483, 0....",0,0.325033
53055,Wedding Dress Collection Photographer,https://www.upwork.com/jobs/Wedding-Dress-Coll...,We are looking for a skilled photographer to c...,True,Australia,2024,2,14,Wednesday,February,6,40,Digital Marketing & Content,"['social media marketing', 'content writing', ...","[looking, skilled, photographer, capture, esse...","[0.2167052924633026, -0.6799629330635071, 0.17...",0,0.038613
53056,Design a startup profile,https://www.upwork.com/jobs/Design-startup-pro...,I building a startup company and I want to des...,False,Saudi Arabia,2024,2,14,Wednesday,February,6,40,Graphic & Video Design,"['graphic design', 'adobe photoshop', 'adobe i...","[building, startup, company, want, design, pro...","[0.32275390625, -0.48095545172691345, 0.290559...",0,0.235234


In [50]:
df['final_score'] = df['skill_match_score'] + df['similarity']
top_jobs = df.sort_values(by='final_score', ascending=False).head(10)
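
Adding the raw match count to the cosine similarity mixes two scales: the count is a non-negative integer, while the similarity lies in [-1, 1], so a single matching skill can outweigh any difference in similarity. If a more balanced blend is wanted, one option is to rescale the count to [0, 1] and combine the two with a weight. The sketch below is an assumption for illustration (the helper `blended_score` and the weight 0.5 are not from the notebook).

```python
def blended_score(match_count, similarity, n_user_skills, weight=0.5):
    """Weighted blend of the matched-skill fraction and the cosine similarity."""
    match_fraction = match_count / n_user_skills if n_user_skills else 0.0
    return weight * match_fraction + (1 - weight) * similarity

# Two hypothetical jobs for a user with 4 skills.
job_a = blended_score(match_count=1, similarity=0.10, n_user_skills=4)
job_b = blended_score(match_count=0, similarity=0.90, n_user_skills=4)
print(job_a, job_b)
```

Under the plain sum, the first job (1 + 0.10) would rank above the second (0 + 0.90); with the blend, the job with the much higher similarity comes first.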

In [51]:
top_jobs

Unnamed: 0,title,link,description,is_hourly,country,year,month,day,day_name,month_name,hour,minute,jobs,skills,tokens,w2v_vector,skill_match_score,similarity,final_score
47087,React Developer,https://www.upwork.com/jobs/React-Developer_%7...,I am seeking a highly skilled React developer ...,False,Ukraine,2024,2,20,Tuesday,February,0,41,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[seeking, highly, skilled, react, developer, a...","[-0.5157040357589722, -0.4520156979560852, 0.0...",0,0.932548,0.932548
37058,Simple Blog Site React Typescript Gatsby,https://www.upwork.com/jobs/Simple-Blog-Site-R...,I need a simple blog site in react react nativ...,False,Pakistan,2024,2,13,Tuesday,February,8,12,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[need, simple, blog, site, react, react, nativ...","[-0.5441218018531799, -0.6518808603286743, 0.0...",0,0.924172,0.924172
14013,Frontend Developer Needed for Web Development ...,https://www.upwork.com/jobs/Frontend-Developer...,Seeking a frontend developer proficient in HTM...,False,United Arab Emirates,2024,2,16,Friday,February,19,19,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[seeking, frontend, developer, proficient, htm...","[-0.23090867698192596, -0.7956643104553223, 0....",0,0.917073,0.917073
45019,React JS developer with Dialog flow developmen...,https://www.upwork.com/jobs/React-developer-wi...,I am looking for a react developer who can bui...,False,United States,2024,2,16,Friday,February,14,39,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[looking, react, developer, build, front, end,...","[-0.6432909965515137, -0.6174487471580505, -0....",0,0.912928,0.912928
21175,Convert FIgma to HTML Looking expert,https://www.upwork.com/jobs/Convert-FIgma-HTML...,I am looking exert who can Convert FIgma to HT...,False,India,2024,2,17,Saturday,February,12,10,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[looking, exert, convert, figma, html, animati...","[-0.44085267186164856, -1.0898449420928955, 0....",0,0.90983,0.90983
30937,Build responsive js angular react based e com...,https://www.upwork.com/jobs/Build-responsive-a...,- Complete e commerce website based on js reac...,True,India,2024,2,14,Wednesday,February,14,33,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[complete, e, commerce, website, based, js, re...","[-0.6054804921150208, -0.4002763330936432, -0....",0,0.909025,0.909025
43224,Software Engineer Needed for Automation System...,https://www.upwork.com/jobs/Software-Engineer-...,Talent we are looking for:- Over 5 years of ex...,False,Australia,2024,2,19,Monday,February,5,11,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[talent, looking, years, experience, react, ts...","[-0.5261688232421875, -0.8312921524047852, 0.5...",0,0.907382,0.907382
48611,Html layout,https://www.upwork.com/jobs/Html-layout_%7E019...,"I am developing HTML, CSS code. I am a newbie....",False,Ukraine,2024,2,13,Tuesday,February,19,33,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[developing, html, css, code, newbiecan, write...","[-0.29437968134880066, -1.2572910785675049, 0....",0,0.906506,0.906506
40133,HTML / CSS developer needed,https://www.upwork.com/jobs/HTML-CSS-developer...,need a front developer just html css javascrip...,False,Pakistan,2024,2,20,Tuesday,February,8,53,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[need, front, developer, html, css, javascript...","[-0.29791998863220215, -0.5702160000801086, 0....",0,0.905858,0.905858
15596,To create responsive HTML and CSS in Django Pr...,https://www.upwork.com/jobs/create-responsive-...,I have an existing Django website.Need to add ...,False,India,2024,2,16,Friday,February,17,34,Web & App Development,"['html', 'css', 'javascript', 'react', 'node.j...","[existing, django, websiteneed, add, stylingcs...","[-0.3024587333202362, -0.6049514412879944, -0....",0,0.903884,0.903884


### Save the trained Word2Vec model to a file named "model_w2v.model" for later use or deployment

In [52]:
model_w2v.save("model_w2v.model")

### Build the recommendation system app with Streamlit

In [30]:
%%writefile recommendation_system_app.py
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
from gensim.models import Word2Vec
from sklearn.metrics.pairwise import cosine_similarity

# Page setup
st.set_page_config(
    page_title="Job Cluster Analysis & Recommendation",
    layout="wide",
    page_icon="📊"
)

# Title
st.title("📊 Job Cluster Analysis & Recommendation System")

# Load data - replace with your actual data loading
@st.cache_data
def load_data():
    try:
        # Load your main dataframe
        df_final = pd.read_csv("StremlitClustring.csv")
        
        # Ensure we have the expected columns
        required_columns = ['jobs', 'country', 'is_hourly']
        for col in required_columns:
            if col not in df_final.columns:
                raise ValueError(f"Required column '{col}' not found in dataset")

        # Create all required DataFrames based on your questions
        df_country = df_final.groupby(['country', 'jobs']).size().reset_index(name='count')
        jop_type = df_final.groupby(['jobs', 'is_hourly']).size().reset_index(name='count')
        top10_countries = df_final.groupby('country').size().nlargest(10).reset_index(name='count')
        
        # Time-based data (if date column exists)
        job_month = pd.DataFrame()
        job_day = pd.DataFrame()
        job_hour = pd.DataFrame()
        
        if 'date' in df_final.columns:
            try:
                df_final['date'] = pd.to_datetime(df_final['date'])
                df_final['month_name'] = df_final['date'].dt.month_name()
                df_final['day_name'] = df_final['date'].dt.day_name()
                job_month = df_final.groupby(['month_name', 'jobs']).size().reset_index(name='count')
                job_day = df_final.groupby(['day_name', 'jobs']).size().reset_index(name='count')
            except Exception as e:
                st.warning(f"Couldn't process date column: {str(e)}")
        
        if 'time' in df_final.columns:
            try:
                df_final['hour'] = pd.to_datetime(df_final['time']).dt.hour
                job_hour = df_final.groupby(['hour', 'jobs']).size().reset_index(name='count')
            except Exception as e:
                st.warning(f"Couldn't process time column: {str(e)}")
        
        # Create consistent sample data if needed
        unique_jobs = df_final['jobs'].unique()
        num_jobs = len(unique_jobs)
        
        if job_month.empty:
            months = ['Jan', 'Feb', 'Mar']
            job_month = pd.DataFrame({
                'month_name': np.repeat(months, num_jobs),
                'jobs': np.tile(unique_jobs, len(months)),
                'count': np.random.randint(50, 200, len(months)*num_jobs)
            })
        
        if job_day.empty:
            days = ['Mon', 'Tue', 'Wed']
            job_day = pd.DataFrame({
                'day_name': np.repeat(days, num_jobs),
                'jobs': np.tile(unique_jobs, len(days)),
                'count': np.random.randint(20, 100, len(days)*num_jobs)
            })
        
        if job_hour.empty:
            hours = [9, 10, 11]
            job_hour = pd.DataFrame({
                'hour': np.repeat(hours, num_jobs),
                'jobs': np.tile(unique_jobs, len(hours)),
                'count': np.random.randint(10, 50, len(hours)*num_jobs)
            })
        
        # Salary data (sample - replace with your actual data)
        df_melted = pd.DataFrame({
            'jobs': np.repeat(unique_jobs, 2),
            'Hourly Rate': np.tile([45, 30], num_jobs),
            'Rate Type': np.tile(['High', 'Low'], num_jobs)
        })
        
        df_hourly = pd.DataFrame({
            'jobs': unique_jobs,
            'hourly_low': np.random.randint(20, 35, num_jobs),
            'hourly_high': np.random.randint(35, 55, num_jobs)
        })
        df_hourly['hourly_avg'] = (df_hourly['hourly_low'] + df_hourly['hourly_high']) / 2
        top_avg = df_hourly.sort_values(by='hourly_avg', ascending=False).head(20)
        
        df_top10 = df_final[df_final['country'].isin(top10_countries['country'])]
        
        # Create top skills data (replace with actual skills analysis from your data)
        top_skills_data = pd.DataFrame({
            'Skill': ['Graphic Design', 'Web Development', 'JavaScript', 
                     'Python', 'Social Media Marketing', 'Content Writing',
                     'SEO', 'UI/UX Design', 'Mobile App Development', 'Data Analysis'],
            'Frequency': [850, 720, 680, 650, 600, 580, 550, 520, 500, 480]
        })
        
        return {
            'df_final': df_final,
            'df_country': df_country,
            'jop_type': jop_type,
            'top10_countries': top10_countries,
            'job_month': job_month,
            'job_day': job_day,
            'job_hour': job_hour,
            'df_melted': df_melted,
            'df_hourly': df_hourly,
            'top_avg': top_avg,
            'df_top10': df_top10,
            'top_skills_data': top_skills_data
        }
    except Exception as e:
        st.error(f"Error processing data: {str(e)}")
        return None

@st.cache_data
def load_recommendation_data():
    try:
        df = pd.read_csv("streamlit.csv")
        if 'w2v_vector' not in df.columns:
            raise ValueError("w2v_vector column not found in recommendation data")
        return df
    except Exception as e:
        st.error(f"Error loading recommendation data: {str(e)}")
        return None

# Load all data
data = load_data()
if data is None:
    st.stop()

df_recommend = load_recommendation_data()
if df_recommend is None:
    st.stop()

# Recommendation system
@st.cache_resource
def load_w2v_model():
    try:
        return Word2Vec.load("model_w2v.model")
    except Exception as e:
        st.error(f"Error loading Word2Vec model: {str(e)}")
        return None

model_w2v = load_w2v_model()
if model_w2v is None:
    st.stop()

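try:
    # ast.literal_eval parses the stored list literals without executing code, unlike eval
    from ast import literal_eval
    X_w2v = np.vstack(df_recommend['w2v_vector'].apply(literal_eval).values)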
except Exception as e:
    st.error(f"Error processing word vectors: {str(e)}")
    st.stop()

all_possible_skills = sorted([
    'graphic design', 'adobe photoshop', 'adobe illustrator',
    'logo design', 'video editing', 'adobe premiere', 'after effects',
    'motion graphics', 'branding', 'illustration', 'visual design',
    'ui design', 'html', 'css', 'javascript', 'react', 'node.js',
    'wordpress', 'php', 'python', 'flutter', 'mobile app development',
    'web development', 'ui/ux design', 'api integration', 'django',
    'social media marketing', 'content writing', 'seo', 'google ads',
    'facebook ads', 'email marketing', 'copywriting', 'instagram marketing',
    'analytics', 'tiktok ads', 'content creation', 'marketing strategy'
])

def get_freelancer_vector(skills, model):
    valid_skills = [skill for skill in skills if skill in model.wv]
    if not valid_skills:
        return np.zeros(model.vector_size)
    return np.mean(model.wv[valid_skills], axis=0)

def recommend_jobs(user_skills, top_n=5):
    user_vector = get_freelancer_vector(user_skills, model_w2v)
    df_recommend['similarity'] = cosine_similarity([user_vector], X_w2v)[0]
    return df_recommend.sort_values(by='similarity', ascending=False).head(top_n)

# App navigation
st.sidebar.title("Navigation")
app_mode = st.sidebar.radio("Select Mode:", 
                           ["Cluster Analysis", "Job Recommendations"])

if app_mode == "Cluster Analysis":
    # Job filter in sidebar
    st.sidebar.header("Filters")
    selected_jobs = st.sidebar.multiselect(
        "Select job clusters to display:",
        options=data['df_final']['jobs'].unique(),
        default=data['df_final']['jobs'].unique()
    )

    # Apply filters to relevant DataFrames
    df_filtered = data['df_final'][data['df_final']['jobs'].isin(selected_jobs)]
    df_country_filtered = data['df_country'][data['df_country']['jobs'].isin(selected_jobs)]
    jop_type_filtered = data['jop_type'][data['jop_type']['jobs'].isin(selected_jobs)]
    job_month_filtered = data['job_month'][data['job_month']['jobs'].isin(selected_jobs)]
    job_day_filtered = data['job_day'][data['job_day']['jobs'].isin(selected_jobs)]
    job_hour_filtered = data['job_hour'][data['job_hour']['jobs'].isin(selected_jobs)]
    df_melted_filtered = data['df_melted'][data['df_melted']['jobs'].isin(selected_jobs)]
    top_avg_filtered = data['top_avg'][data['top_avg']['jobs'].isin(selected_jobs)]
    df_top10_filtered = data['df_top10'][data['df_top10']['jobs'].isin(selected_jobs)]

    # Main dashboard
    tab1, tab2, tab3 = st.tabs(["Overview", "Geographical", "Temporal & Salary"])

    with tab1:
        st.header("Job Cluster Overview")
        
        col1, col2 = st.columns(2)
        with col1:
            st.subheader("Most Common Job Clusters")
            fig = px.histogram(data_frame=df_filtered, y='jobs', text_auto=True,
                              title='<b>Distribution of Job Clusters</b>',
                              color='jobs',
                              height=500)
            fig.update_layout(title_x=0.5, showlegend=False)
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            st.subheader("Cluster Proportion")
            fig = px.pie(data_frame=df_filtered, names='jobs',
                         title='<b>Percentage of Each Cluster</b>',
                         hole=0.3,
                         color_discrete_sequence=px.colors.qualitative.Pastel)
            fig.update_layout(title_x=0.5, height=500)
            st.plotly_chart(fig, use_container_width=True)
        
        # New Top 10 Skills section
        st.subheader("Top 10 In-Demand Skills")
        
        col1, col2 = st.columns([2, 1])
        with col1:
            fig = px.bar(data['top_skills_data'], x='Frequency', y='Skill', 
                        orientation='h', text='Frequency',
                        title='<b>Most Requested Skills</b>',
                        color='Skill',
                        color_discrete_sequence=px.colors.qualitative.Pastel)
            fig.update_layout(title_x=0.5, showlegend=False)
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            fig = px.pie(data['top_skills_data'], names='Skill', values='Frequency',
                        title='<b>Skills Distribution</b>',
                        hole=0.4)
            fig.update_layout(title_x=0.5, height=400)
            st.plotly_chart(fig, use_container_width=True)
        
        st.subheader("Hourly vs Non-Hourly Jobs")
        fig = px.pie(data_frame=df_filtered, names='is_hourly', 
                     title='<b>Percentage of Hourly Jobs</b>',
                     color='is_hourly',
                     color_discrete_map={'True':'#636EFA', 'False':'#EF553B'})
        fig.update_layout(title_x=0.5)
        st.plotly_chart(fig, use_container_width=True)

    with tab2:
        st.header("Geographical Distribution")
        
        col1, col2 = st.columns(2)
        with col1:
            st.subheader("Top Countries for Selected Clusters")
            fig = px.histogram(data_frame=df_country_filtered, x='country', y='count', 
                              text_auto=True, title='<b>Job Postings by Country</b>',
                              color='jobs')
            fig.update_layout(title_x=0.5)
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            st.subheader("Hourly Jobs by Country")
            fig = px.histogram(data_frame=df_top10_filtered, x='country', y='jobs', 
                              color='is_hourly', barmode='group', text_auto=True,
                              title='<b>Hourly Job Distribution by Country</b>',
                              color_discrete_map={'True':'#636EFA', 'False':'#EF553B'})
            fig.update_layout(title_x=0.5)
            st.plotly_chart(fig, use_container_width=True)

    with tab3:
        st.header("Temporal Patterns & Salary Analysis")
        
        col1, col2 = st.columns(2)
        with col1:
            st.subheader("Monthly Job Postings")
            fig = px.histogram(data_frame=job_month_filtered, x='month_name', y='count',
                              color='jobs', barmode='group', text_auto=True,
                              title='<b>Monthly Job Postings by Cluster</b>')
            fig.update_layout(title_x=0.5)
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            st.subheader("Daily Job Postings")
            fig = px.histogram(data_frame=job_day_filtered, x='day_name', y='count',
                              color='jobs', barmode='group', text_auto=True,
                              title='<b>Daily Job Postings by Cluster</b>')
            fig.update_layout(title_x=0.5)
            st.plotly_chart(fig, use_container_width=True)
        
        st.subheader("Hourly Posting Patterns")
        fig = px.line(
            job_hour_filtered,
            x='hour',
            y='count',
            color='jobs',
            markers=True,
            title='<b>Job Posting Times by Hour</b>'
        )
        fig.update_layout(
            xaxis_title='Hour of Day',
            yaxis_title='Number of Postings',
            title_x=0.5
        )
        st.plotly_chart(fig, use_container_width=True)
        
        st.subheader("Salary Analysis")
        col1, col2 = st.columns(2)
        with col1:
            fig = px.bar(
                data['df_melted'],
                x='jobs',
                y='Hourly Rate',
                color='Rate Type',
                barmode='group',
                text_auto=True,
                title='<b>Hourly Rate Ranges by Cluster</b>',
                color_discrete_sequence=px.colors.qualitative.Pastel
            )
            fig.update_layout(title_x=0.5)
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            fig = px.bar(
                data['top_avg'],
                x='jobs',
                y='hourly_avg',
                text='hourly_avg',
                color='jobs',
                title='<b>Top Paying Job Clusters</b>',
                color_discrete_sequence=px.colors.qualitative.Pastel
            )
            fig.update_layout(title_x=0.5, showlegend=False)
            st.plotly_chart(fig, use_container_width=True)

    # Key metrics
    st.sidebar.markdown("---")
    st.sidebar.subheader("Key Metrics")
    st.sidebar.metric("Total Jobs", len(df_filtered))
    st.sidebar.metric("Unique Countries", df_filtered['country'].nunique())
    hourly_pct = df_filtered['is_hourly'].value_counts(normalize=True).get(True, 0)*100
    st.sidebar.metric("Hourly Jobs", f"{hourly_pct:.1f}%")

else:
    # Job Recommendation System
    st.header("💼 Job Recommendation System")
    st.write("Get personalized job recommendations based on your skills")
    
    with st.sidebar:
        st.header("🔧 Select Your Skills")
        selected_skills = st.multiselect(
            "Choose skills you have:",
            options=all_possible_skills,
            default=['html', 'css', 'javascript']
        )
    
    if selected_skills:
        st.success(f"✅ Selected Skills: {', '.join(selected_skills)}")
        recommendations = recommend_jobs(selected_skills, top_n=10)
        
        st.subheader("🔍 Recommended Jobs")
        st.dataframe(recommendations[['title', 'description', 'similarity']]
                    .style.format({'similarity': "{:.2f}"}), 
                    use_container_width=True)
        
        fig = px.bar(
            recommendations,
            x='title',
            y='similarity',
            color='similarity',
            text='similarity',
            title='<b>Job Recommendation Scores</b>',
            color_continuous_scale='Bluered'
        )
        st.plotly_chart(fig, use_container_width=True)
        
        # Show market trends for recommended jobs
        st.subheader("📊 Market Trends for Recommended Jobs")
        recommended_titles = recommendations['title'].unique()
        market_data = data['df_final'][data['df_final']['title'].isin(recommended_titles)]
        
        if not market_data.empty:
            col1, col2 = st.columns(2)
            with col1:
                st.write("Geographical Distribution")
                fig = px.histogram(market_data, x='country', color='title', 
                                 barmode='group', title='<b>Jobs by Country</b>',
                                 color_discrete_sequence=px.colors.qualitative.Pastel)
                st.plotly_chart(fig, use_container_width=True)
            
            with col2:
                st.write("Employment Type")
                fig = px.pie(market_data, names='is_hourly', 
                            title='<b>Hourly vs Fixed-Price</b>',
                            color_discrete_map={'True':'#636EFA', 'False':'#EF553B'})
                st.plotly_chart(fig, use_container_width=True)
        else:
            st.info("No market data available for these specific jobs")
    else:
        st.info("Please select your skills from the sidebar to get recommendations")

# Download button
st.sidebar.markdown("---")
st.sidebar.download_button(
    label="Download Sample Data",
    data=data['df_final'].to_csv().encode('utf-8'),
    file_name='job_cluster_data.csv',
    mime='text/csv'
)

# Add footer
st.sidebar.markdown("---")
st.sidebar.markdown("""
**Recommendation System & Job Cluster Analysis**  
Developed with Eng: Yousef Abdelsalam  
""")

Overwriting recommendation_system_app.py


In [31]:
! streamlit run recommendation_system_app.py

^C
