---
format:
    html:
        embed-resources: true
---

# Visual EDA 

Write a function or a class that performs exploratory data analysis (EDA) on a given CSV file.

Run it on both of the cleaned CSV files that you created from your job description crawl.
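As a starting point, here is a minimal sketch of what such a function might look like. The name `run_eda`, the keys of the returned dictionary, and the sample rows are all illustrative assumptions, not a required interface; your own version should go much further and produce plots.

```python
import io

import pandas as pd


def run_eda(csv_source):
    """Load a CSV and return a few summary tables (hypothetical helper).

    `csv_source` can be a file path or any file-like object accepted by
    pandas.read_csv. No particular column names are assumed.
    """
    df = pd.read_csv(csv_source)

    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    return {
        "shape": df.shape,
        "missing": df.isna().sum(),  # missing values per column
        "numeric_summary": numeric.describe(),
        "top_categories": {
            col: categorical[col].value_counts().head(5)
            for col in categorical.columns
        },
    }


# Tiny in-memory stand-in for a cleaned CSV file (made-up rows).
sample_csv = io.StringIO(
    "title,location,salary\n"
    "Data Scientist,Washington DC,120000\n"
    "Data Analyst,Washington DC,85000\n"
    "Data Scientist,New York,\n"
)
report = run_eda(sample_csv)
print(report["shape"])
print(report["missing"])
```

In practice you would call the function once per cleaned file, passing each file's path instead of the in-memory sample.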

- Explore feature distributions and relationships
- Uncover trends, patterns, and correlations
- Consider incorporating a geo-spatial analysis where appropriate
  - Folium is a great tool for this: https://realpython.com/python-folium-web-maps-from-data/ 
- Visualize data effectively
- Make sure your results are highly visual, with high-quality plots.

By focusing on these steps, you'll extract valuable insights and inform deeper analysis of your job descriptions dataset.

Below are some possible things to focus on. You can add more, and you don't have to do all of them, but aim to produce the highest-quality results possible.

Focus on the few cases that interest you most. The points below are only ideas; nothing here is specifically required. The one thing you do need to do is **take the assignment seriously, think critically, and produce a high-quality, professional demonstration of EDA as applied to this dataset**.

### **Univariate Analysis (Single Feature)**
- **Frequency Counts**: For categorical features (e.g., job title, sector, job type), visualize frequency distribution (bar charts).
- **Salary Distribution**: Analyze the range and spread of salary data (histograms, box plots).
- **Job Posting Dates**: Plot distribution of job posting dates to find trends over time (time series plots).
- **Experience Level Distribution**: Explore the spread of entry-level, mid-level, and senior-level roles.
- **Location Distribution**: Map the distribution of jobs across different cities, states, or countries.
- **Job Type Analysis**: Count of job types (full-time, part-time, contract) and their proportions.
- **Job Description Length**: Analyze job description lengths by word or character count.
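A few of the univariate ideas above can be sketched with pandas and matplotlib. The column names `job_type` and `salary` and the toy values are assumptions for illustration; substitute whatever your cleaned CSV actually contains.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; `job_type` and `salary` are hypothetical column names.
df = pd.DataFrame({
    "job_type": ["full-time", "full-time", "contract", "part-time", "full-time"],
    "salary": [95000, 120000, 80000, 40000, 135000],
})

fig, (ax_counts, ax_hist, ax_box) = plt.subplots(1, 3, figsize=(14, 4))

# Frequency counts for a categorical feature.
counts = df["job_type"].value_counts()
ax_counts.bar(counts.index, counts.values)
ax_counts.set_title("Postings by job type")
ax_counts.set_ylabel("Number of postings")

# Range and spread of a numeric feature.
salaries = df["salary"].dropna().to_numpy()
ax_hist.hist(salaries, bins=5)
ax_hist.set_title("Salary distribution")
ax_hist.set_xlabel("Annual salary (USD)")

ax_box.boxplot(salaries)
ax_box.set_title("Salary spread")
ax_box.set_xticks([])

fig.tight_layout()
```

Calling `fig.savefig(...)` afterwards writes the figure to disk; labeled axes and titles go a long way toward making plots look professional.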

### **Bivariate Analysis (Two Features)**
- **Salary vs. Experience Level**: Explore how salary varies by experience level (box plot or scatter plot).
- **Salary vs. Job Type**: Analyze differences in salary based on job types (full-time, part-time, contract).
- **Salary vs. Location**: Check how salaries vary across different locations.
- **Job Title vs. Sector/Industry**: Look at the relationship between job titles and the industries they belong to.
- **Job Title vs. Skills/Technologies**: Explore which job titles require specific skills.
- **Company Size vs. Salary**: Compare salaries across different company sizes (small, medium, large).
- **Remote Work vs. Salary**: Analyze if remote jobs offer higher or lower salaries compared to on-site jobs.
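For a bivariate comparison such as salary by experience level, a grouped summary table is a useful companion to a box plot. This sketch assumes hypothetical columns `experience_level` and `salary` with made-up values.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "experience_level": ["entry", "mid", "senior", "entry", "senior", "mid"],
    "salary": [60000, 90000, 140000, 70000, 160000, 100000],
})

# An ordered categorical keeps the levels in a meaningful order in tables and plots.
order = ["entry", "mid", "senior"]
df["experience_level"] = pd.Categorical(
    df["experience_level"], categories=order, ordered=True
)

summary = (
    df.groupby("experience_level", observed=True)["salary"]
    .agg(["count", "median", "min", "max"])
)
print(summary)
```

The same pattern works for job type, location, company size, or a remote/on-site flag: swap the grouping column.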

### **Multivariate Analysis (Multiple Features)**
- **Experience Level vs. Salary vs. Location**: Analyze how salary and experience vary across locations (3D scatter plot or heatmap).
- **Job Type vs. Experience Level vs. Sector**: Explore trends across job types, experience levels, and sectors (grouped bar plots).
- **Skills vs. Salary vs. Experience**: Check whether certain skills command higher salaries at different experience levels.
- **Salary vs. Sector vs. Company Size**: Compare salary ranges across sectors and company sizes.
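One way to look at three features at once is a pivot table: two categorical features define the rows and columns, and a numeric feature is aggregated in each cell. The column names `sector`, `company_size`, and `salary` below are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "sector": ["tech", "tech", "finance", "finance", "tech", "finance"],
    "company_size": ["small", "large", "small", "large", "large", "large"],
    "salary": [90000, 130000, 85000, 125000, 150000, 115000],
})

# Median salary for every sector / company-size combination.
pivot = df.pivot_table(
    index="sector", columns="company_size", values="salary", aggfunc="median"
)
print(pivot)
```

A table shaped like this can be drawn directly as a heatmap, which tends to read better than a 3D scatter plot.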

### **Text Analysis and Feature Engineering**
- **Keyword Frequency**: Extract and analyze the most common keywords from job descriptions and required skills.
- **Text Length Analysis**: Look at the distribution of text lengths in job descriptions to identify patterns in verbosity.
- **NLP for Job Titles**: Group similar job titles by extracting key terms using Natural Language Processing (e.g., clustering similar roles).
- **Sentiment Analysis**: Perform sentiment analysis on company values or job descriptions to gauge company culture.
- **Named Entity Recognition (NER)**: Extract entities like company names, technologies, and locations from the text data.
- **Topic Modeling**: Identify topics or themes within job descriptions using topic modeling techniques like LDA (Latent Dirichlet Allocation).
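Keyword frequency is the simplest of the text ideas above and needs only the standard library. The stopword list, the tokenizing pattern, and the helper name `keyword_counts` are all illustrative assumptions; a real analysis would likely use a fuller stopword list or an NLP library.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"and", "the", "with", "in", "of", "a", "to", "for", "we", "are", "is"}


def keyword_counts(descriptions, top_n=5):
    """Return the `top_n` most common non-stopword tokens (hypothetical helper)."""
    counts = Counter()
    for text in descriptions:
        # Keep '+' and '#' inside tokens so terms like "c++" and "c#" survive.
        tokens = re.findall(r"[a-z][a-z+#]*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up job description snippets.
descriptions = [
    "We are looking for a data scientist with Python and SQL experience.",
    "Experience with Python, machine learning, and cloud platforms.",
    "Analyst role: SQL, Excel, and Python for reporting.",
]
top = keyword_counts(descriptions)
print(top)
```

The resulting list of `(keyword, count)` pairs can feed a horizontal bar chart or a word cloud.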

### **Date and Time Analysis**
- **Job Posting Frequency Over Time**: Check trends in job postings over time (monthly, weekly).
- **Application Deadline vs. Posting Date**: Analyze how much time is typically given for application submissions.
- **Job Posting Expiry Analysis**: Explore how long jobs remain posted before expiring.
- **Seasonality**: Identify seasonal hiring trends by analyzing the distribution of job postings by month or quarter.
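Posting frequency and time-to-deadline can both be derived once the date columns are parsed. The column names `date_posted` and `application_deadline` are assumptions, and the dates are made up.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "date_posted": [
        "2024-01-05", "2024-01-20", "2024-02-11",
        "2024-03-02", "2024-03-15", "2024-03-28",
    ],
    "application_deadline": [
        "2024-02-05", "2024-02-01", "2024-03-11",
        "2024-03-30", "2024-04-15", "2024-04-11",
    ],
})
df["date_posted"] = pd.to_datetime(df["date_posted"])
df["application_deadline"] = pd.to_datetime(df["application_deadline"])

# Number of postings per calendar month.
monthly = df.groupby(df["date_posted"].dt.to_period("M")).size()

# Days between posting and application deadline.
df["days_open"] = (df["application_deadline"] - df["date_posted"]).dt.days

print(monthly)
print(df["days_open"].describe())
```

Grouping by `dt.quarter` or `dt.month` instead of the monthly period is one way to look for seasonality when the data spans more than a year.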

### **Geospatial Analysis**
- **Heatmap of Job Locations**: Visualize job density by location (city, state, country) on a map.
- **Salary by Geographic Region**: Compare average salaries across different geographic regions (e.g., East Coast vs. West Coast).
- **Remote Work Proportion by Location**: Analyze how remote job opportunities vary across regions.
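A job-density map usually starts with an aggregation step, followed by drawing one marker per location. The sketch below assumes your data already has `city`, `lat`, and `lon` columns (hypothetical names; you may need to geocode first). The helper `build_job_map`, its default output path, and the rough map center are also assumptions. Folium is imported inside the helper so the aggregation runs even where Folium is not installed.

```python
import pandas as pd

# Made-up rows; `city`, `lat`, and `lon` are assumed column names.
df = pd.DataFrame({
    "city": ["Washington", "Washington", "New York", "Seattle", "New York", "Washington"],
    "lat": [38.9072, 38.9072, 40.7128, 47.6062, 40.7128, 38.9072],
    "lon": [-77.0369, -77.0369, -74.0060, -122.3321, -74.0060, -77.0369],
})

# One row per city with the number of postings there.
city_counts = (
    df.groupby(["city", "lat", "lon"])
    .size()
    .reset_index(name="n_jobs")
    .sort_values("n_jobs", ascending=False)
)


def build_job_map(city_counts, out_path="job_map.html"):
    """Draw one circle per city, sized by posting count (hypothetical helper)."""
    import folium  # imported lazily; requires folium to be installed

    # Roughly centered on the contiguous United States.
    job_map = folium.Map(location=[39.8, -98.6], zoom_start=4)
    for row in city_counts.itertuples(index=False):
        folium.CircleMarker(
            location=[row.lat, row.lon],
            radius=4 + 2 * row.n_jobs,
            popup=f"{row.city}: {row.n_jobs} postings",
            fill=True,
        ).add_to(job_map)
    job_map.save(out_path)
    return job_map


print(city_counts)
```

Calling `build_job_map(city_counts)` writes an interactive HTML map that can be embedded in the rendered report.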

### **Categorical Feature Exploration**
- **Sector/Industry Breakdown**: Analyze the distribution of jobs across different sectors or industries.
- **Certifications Breakdown**: Explore which certifications are most commonly required or preferred.
- **Company Size Distribution**: Look at the distribution of small, medium, and large companies in the dataset.
- **Job Platform Analysis**: Check how job postings differ across various job platforms (e.g., LinkedIn, Indeed).
- **Visa Sponsorship Availability**: Explore how often visa sponsorship is offered across job types or industries.

### **Correlations and Associations**
- **Correlation Matrix**: Compute correlations between numerical features (e.g., salary, years of experience) to find relationships.
- **Crosstab Analysis**: Perform crosstab analysis for categorical features (e.g., job type and location).
- **Feature Importance**: Use machine learning models (e.g., Random Forest) to evaluate feature importance for predicting salary or job title.
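The first two ideas above map directly onto pandas calls. The column names and values are made up, and the toy salary data is perfectly linear in experience on purpose, so the correlation comes out as exactly 1; real data will be noisier.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "salary": [60000, 80000, 100000, 120000, 140000],
    "years_experience": [1, 3, 5, 7, 9],
    "job_type": ["full-time", "contract", "full-time", "full-time", "contract"],
    "remote": ["yes", "yes", "no", "no", "yes"],
})

# Pearson correlation between numeric features.
corr = df[["salary", "years_experience"]].corr()

# Counts for every combination of two categorical features.
table = pd.crosstab(df["job_type"], df["remote"])

print(corr)
print(table)
```

Keep in mind that a correlation matrix only captures linear relationships between numeric columns; categorical associations need a crosstab or a grouped plot.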

### **Clustering and Segmentation**
- **Job Title Clustering**: Cluster similar job titles using text similarity or clustering algorithms (e.g., k-means).
- **Salary Segmentation**: Group job listings into salary tiers (low, medium, high) for further analysis.
- **Geographic Segmentation**: Segment job listings based on location proximity or geographic region.
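Salary segmentation can be sketched with quantile-based binning. Splitting into three equally sized tiers is an assumption made here for illustration; fixed dollar thresholds with `pd.cut` are an equally valid choice.

```python
import pandas as pd

# Made-up salaries.
salaries = pd.Series([40000, 55000, 70000, 85000, 100000, 130000], name="salary")

# Three tiers with (approximately) the same number of postings in each.
tiers = pd.qcut(salaries, q=3, labels=["low", "medium", "high"])

print(pd.concat([salaries, tiers.rename("tier")], axis=1))
```

The tier column can then serve as a grouping variable in any of the bivariate or multivariate plots.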

### **Dimensionality Reduction**
- **PCA**: Perform principal component analysis (PCA) to reduce the dimensionality of the data and visualize it in 2D.
- **t-SNE**: Perform t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the data and visualize it in 2D.
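To make the PCA idea concrete, here is a NumPy-only sketch that projects standardized features onto the first two principal components via the singular value decomposition. The feature matrix is random, made-up data. In practice, scikit-learn's `PCA` and `TSNE` classes are the more usual tools; this version only shows what PCA computes.

```python
import numpy as np

# Made-up numeric feature matrix: 50 postings, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]  # make two features strongly correlated

# Standardize so that no feature dominates because of its scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal directions, ordered by explained variance.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
coords_2d = X_std @ Vt[:2].T

# Share of total variance captured by each component.
explained = S**2 / np.sum(S**2)

print(coords_2d[:3])
print(explained)
```

A scatter plot of the two columns of `coords_2d`, colored by a categorical feature such as sector, is the typical 2D visualization.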

### **Patterns and Trends**
- **Trends in Job Types**: Analyze how the proportions of full-time, part-time, and contract jobs have changed over time.
- **Emerging Technologies**: Track trends in the demand for specific technologies or skills over time.
- **Sector Growth**: Identify growing or shrinking industries based on the number of job postings over time.
- **Benefits and Perks Trends**: Analyze which benefits (e.g., remote work, stock options) are becoming more or less common.
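Trends in proportions over time can be computed with a row-normalized crosstab. The column names `date_posted` and `job_type` and the rows below are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame({
    "date_posted": pd.to_datetime([
        "2024-01-03", "2024-01-17", "2024-01-25",
        "2024-02-02", "2024-02-19", "2024-02-27",
    ]),
    "job_type": [
        "full-time", "full-time", "contract",
        "full-time", "contract", "contract",
    ],
})

month = df["date_posted"].dt.to_period("M").rename("month")

# Each row sums to 1: the share of each job type within that month.
shares = pd.crosstab(month, df["job_type"], normalize="index")

print(shares)
```

Plotting `shares` as a stacked area or stacked bar chart shows how the mix shifts from month to month; the same approach applies to skills, sectors, or benefits flags.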

