<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Scatter Plot**


Estimated time needed: **45** minutes


## Overview

In this lab, you will create and interpret scatter plots to visualize relationships between variables and trends in a dataset. You will load the provided survey dataset directly into a pandas DataFrame and build several scatter-plot-based visualizations to explore developer trends, compensation, and preferences.



## Objectives


In this lab, you will:

- Create and analyze scatter plots to examine relationships between variables.

- Use scatter plots to identify trends and patterns in the dataset.

- Extend scatter plots with trend lines, bubble sizes, and color coding to draw data-driven insights.


## Setup: Working with the Dataset



**Install and import the required libraries**


In [9]:
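!pip install pandas
!pip install matplotlib
!pip install seaborn

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # used by the regplot, stripplot, and scatterplot cells below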



#### Step 1: Load the dataset


In [None]:
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/n01PQ9pSmiRX6520flujwQ/survey-data.csv"

df = pd.read_csv(file_path)
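
Before plotting, it can help to check the type and the number of missing values in each column you intend to use. The sketch below is illustrative only: `summarize_columns` is a hypothetical helper, and `sample` is a small invented stand-in for the survey DataFrame, not real survey data.

```python
import pandas as pd

# Hypothetical stand-in for the survey DataFrame; values are invented for illustration.
sample = pd.DataFrame({
    "Age": ["25-34 years old", "35-44 years old", None],
    "JobSatPoints_6": [20.0, None, 10.0],
    "YearsCodePro": ["5", "Less than 1 year", "12"],
})

def summarize_columns(frame, columns):
    """Return dtype, missing-value count, and distinct-value count per column."""
    subset = frame[columns]
    return pd.DataFrame({
        "dtype": subset.dtypes.astype(str),
        "missing": subset.isna().sum(),
        "unique": subset.nunique(),
    })

summary = summarize_columns(sample, ["Age", "JobSatPoints_6", "YearsCodePro"])
print(summary)
```

In the lab you could call `summarize_columns(df, [...])` with the column names used in each task to see which columns are text and which are numeric.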



### Task 1: Exploring Relationships with Scatter Plots



#### 1. Scatter Plot for Age vs. Job Satisfaction



Visualize the relationship between respondents' age (`Age`) and job satisfaction (`JobSatPoints_6`). Use this plot to identify any patterns or trends.




In [None]:
## Write your code here
df_clean = df[['Age', 'JobSatPoints_6']].dropna()

plt.figure(figsize=(10,6))
plt.scatter(df_clean['Age'], df_clean['JobSatPoints_6'], alpha=0.6, color='teal')
plt.title('Relationship between Age and Job Satisfaction')
plt.xlabel('Age')
plt.ylabel('Job Satisfaction Points')
plt.grid(True)
plt.show()
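
If `Age` turns out to hold bracket labels rather than numbers, matplotlib places the categories on the x-axis in order of first appearance, which can scramble the age order. The sketch below shows one way to sort rows by bracket first. The labels in `AGE_ORDER` are an assumption about how the survey encodes age, and `sort_by_age_bracket` is a hypothetical helper.

```python
import pandas as pd

# Assumed bracket labels, youngest to oldest; adjust to the labels actually present.
AGE_ORDER = [
    "Under 18 years old",
    "18-24 years old",
    "25-34 years old",
    "35-44 years old",
    "45-54 years old",
    "55-64 years old",
    "65 years or older",
]

def sort_by_age_bracket(frame, column="Age", order=AGE_ORDER):
    """Return a copy of frame sorted by age bracket, dropping unlisted labels."""
    result = frame[frame[column].isin(order)].copy()
    result[column] = pd.Categorical(result[column], categories=order, ordered=True)
    return result.sort_values(column)

# Invented sample rows for illustration only.
sample = pd.DataFrame({
    "Age": ["35-44 years old", "18-24 years old", "Prefer not to say", "25-34 years old"],
    "JobSatPoints_6": [30.0, 10.0, 5.0, 20.0],
})
ordered = sort_by_age_bracket(sample)
print(ordered["Age"].astype(str).tolist())
```

Passing `ordered["Age"].astype(str)` as the x values to `plt.scatter` should then lay out the brackets from youngest to oldest.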

#### 2. Scatter Plot for Compensation vs. Job Satisfaction


Explore the relationship between yearly compensation (`ConvertedCompYearly`) and job satisfaction (`JobSatPoints_6`) using a scatter plot.


In [None]:

## Write your code here
df_clean = df[['ConvertedCompYearly', 'JobSatPoints_6']].dropna()

plt.figure(figsize=(10,6))
plt.scatter(df_clean['ConvertedCompYearly'], df_clean['JobSatPoints_6'], alpha=0.5, color='green')
plt.xscale('log')  # Use log scale to handle high compensation outliers
plt.title('Yearly Compensation vs Job Satisfaction')
plt.xlabel('Yearly Compensation (log scale)')
plt.ylabel('Job Satisfaction Points')
plt.grid(True)
plt.show()
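
A scatter plot with heavy outliers can be hard to read by eye, so a rank-based correlation is one way to put a number on the pattern. The sketch below computes a Spearman correlation as the Pearson correlation of ranks, which avoids an extra dependency. `rank_correlation` is a hypothetical helper and the sample values are invented.

```python
import pandas as pd

def rank_correlation(frame, x_col, y_col):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    pairs = frame[[x_col, y_col]].dropna()
    # A log axis cannot represent zero or negative values, so drop them here too.
    pairs = pairs[pairs[x_col] > 0]
    return pairs[x_col].rank().corr(pairs[y_col].rank())

# Invented sample rows for illustration only.
sample = pd.DataFrame({
    "ConvertedCompYearly": [0, 20_000, 60_000, 150_000, 2_000_000],
    "JobSatPoints_6": [50.0, 10.0, 20.0, 30.0, 40.0],
})
rho = rank_correlation(sample, "ConvertedCompYearly", "JobSatPoints_6")
print(round(rho, 3))
```

Because ranks ignore the size of the gaps between values, a single very large salary influences this measure far less than it would a plain Pearson correlation.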

### Task 2: Enhancing Scatter Plots


#### 1. Scatter Plot with Trend Line for Age vs. Job Satisfaction



Add a regression line to the scatter plot of Age vs. JobSatPoints_6 to highlight trends in the data.


In [None]:
## Write your code here
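df_clean = df[['Age', 'JobSatPoints_6']].dropna().copy()

# regplot needs numeric x values. Assumption: if Age holds bracket labels such as
# '25-34 years old', the first number in each label serves as an approximate age.
df_clean['Age'] = pd.to_numeric(df_clean['Age'].astype(str).str.extract(r'(\d+)')[0], errors='coerce')
df_clean = df_clean.dropna()

# Scatter plot with regression line
plt.figure(figsize=(10,6))
sns.regplot(x='Age', y='JobSatPoints_6', data=df_clean, scatter_kws={'alpha':0.5, 'color':'blue'}, line_kws={'color':'red'})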
plt.title('Age vs Job Satisfaction with Regression Line')
plt.xlabel('Age')
plt.ylabel('Job Satisfaction Points')
plt.grid(True)
plt.show()
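
A regression line needs numeric x values. As a rough sketch, assuming `Age` holds bracket labels, each bracket can be mapped to an approximate midpoint and a straight line fitted with `numpy.polyfit`. The labels and midpoints in `AGE_MIDPOINTS` are assumptions (the open-ended brackets are guesses), and `fit_trend` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed bracket labels mapped to rough midpoints; open-ended brackets are guesses.
AGE_MIDPOINTS = {
    "Under 18 years old": 16,
    "18-24 years old": 21,
    "25-34 years old": 29.5,
    "35-44 years old": 39.5,
    "45-54 years old": 49.5,
    "55-64 years old": 59.5,
    "65 years or older": 70,
}

def fit_trend(frame, x_col="Age", y_col="JobSatPoints_6"):
    """Map age brackets to midpoints and fit y = slope * x + intercept."""
    data = frame[[x_col, y_col]].copy()
    data["AgeNumeric"] = data[x_col].map(AGE_MIDPOINTS)
    data = data.dropna(subset=["AgeNumeric", y_col])
    slope, intercept = np.polyfit(data["AgeNumeric"], data[y_col], deg=1)
    return data, slope, intercept

# Invented sample rows built to lie exactly on y = 2x + 1.
sample = pd.DataFrame({
    "Age": ["18-24 years old", "25-34 years old", "35-44 years old", "Prefer not to say"],
    "JobSatPoints_6": [43.0, 60.0, 80.0, 15.0],
})
data, slope, intercept = fit_trend(sample)
print(slope, intercept)
```

The returned `data` frame can be passed to `sns.regplot(x="AgeNumeric", y="JobSatPoints_6", data=data)`, or the fitted line can be drawn with `plt.plot` using `slope` and `intercept`.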

#### 2. Scatter Plot for Age vs. Work Experience


Visualize the relationship between Age (`Age`) and Work Experience (`YearsCodePro`) using a scatter plot.


In [None]:
## Write your code here
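# .copy() so the column assignment below modifies an independent DataFrame
df_clean = df[['Age', 'YearsCodePro']].dropna().copy()

def convert_years(x):
    """Map YearsCodePro survey strings to numbers; return None if unparseable."""
    if x == 'Less than 1 year':
        return 0.5
    elif x == 'More than 50 years':
        return 50
    else:
        try:
            return float(x)
        except (TypeError, ValueError):
            return None

df_clean['YearsCodePro'] = df_clean['YearsCodePro'].apply(convert_years)
df_clean = df_clean.dropna()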

plt.figure(figsize=(10,6))
plt.scatter(df_clean['Age'], df_clean['YearsCodePro'], alpha=0.5, color='orange')
plt.title('Age vs Years of Professional Coding Experience')
plt.xlabel('Age')
plt.ylabel('Years of Professional Coding Experience')
plt.grid(True)
plt.show()

### Task 3: Combining Scatter Plots with Additional Features


#### 1. Bubble Plot of Compensation vs. Job Satisfaction with Age as Bubble Size



Create a bubble plot to explore the relationship between yearly compensation (`ConvertedCompYearly`) and job satisfaction (`JobSatPoints_6`), with bubble size representing age.


In [None]:
## Write your code here
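df_clean = df[['ConvertedCompYearly', 'JobSatPoints_6', 'Age']].dropna().copy()

# Bubble size and color need numeric ages. Assumption: if Age holds bracket labels such as
# '25-34 years old', the first number in each label serves as an approximate age.
df_clean['Age'] = pd.to_numeric(df_clean['Age'].astype(str).str.extract(r'(\d+)')[0], errors='coerce')
df_clean = df_clean.dropna(subset=['Age'])

# Keep values up to the 99th percentile so extreme compensation outliers do not dominate
df_clean = df_clean[df_clean['ConvertedCompYearly'] <= df_clean['ConvertedCompYearly'].quantile(0.99)]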

plt.figure(figsize=(12,8))
scatter = plt.scatter(
    x=df_clean['ConvertedCompYearly'], 
    y=df_clean['JobSatPoints_6'], 
    s=df_clean['Age']*2, 
    alpha=0.6, 
    c=df_clean['Age'],    
    cmap='viridis',
    edgecolors='w',
    linewidth=0.5
)
plt.xscale('log') 
plt.colorbar(scatter, label='Age')
plt.title('Bubble Plot: Yearly Compensation vs Job Satisfaction (Bubble size = Age)')
plt.xlabel('Yearly Compensation (log scale)')
plt.ylabel('Job Satisfaction Points')
plt.grid(True, alpha=0.3)
plt.show()

#### 2. Scatter Plot for Popular Programming Languages by Job Satisfaction


Visualize the popularity of programming languages (`LanguageHaveWorkedWith`) against job satisfaction using a scatter plot. Use points to represent satisfaction levels for each language.


In [None]:
## Write your code here
df_clean = df[['LanguageHaveWorkedWith', 'JobSatPoints_6']].dropna()

# Split languages (semicolon-separated) into separate rows
df_expanded = df_clean.assign(LanguageHaveWorkedWith=df_clean['LanguageHaveWorkedWith'].str.split(';')).explode('LanguageHaveWorkedWith')

# Optional: trim whitespace
df_expanded['LanguageHaveWorkedWith'] = df_expanded['LanguageHaveWorkedWith'].str.strip()

# Scatter plot using seaborn
plt.figure(figsize=(12,8))
sns.stripplot(x='LanguageHaveWorkedWith', y='JobSatPoints_6', data=df_expanded, jitter=True, alpha=0.6, color='blue')
plt.xticks(rotation=90)
plt.title('Programming Languages vs Job Satisfaction')
plt.xlabel('Programming Language')
plt.ylabel('Job Satisfaction Points')
plt.grid(True, alpha=0.3)
plt.show()

### Task 4: Scatter Plot Comparisons Across Groups


#### 1. Scatter Plot for Compensation vs. Job Satisfaction by Employment Type


Visualize the relationship between yearly compensation (`ConvertedCompYearly`) and job satisfaction (`JobSatPoints_6`), categorized by employment type (`Employment`). Use color coding or markers to differentiate between employment types.


In [None]:
## Write your code here
df_clean = df[['ConvertedCompYearly', 'JobSatPoints_6', 'Employment']].dropna()

# Optional: remove extreme compensation outliers for better visualization
df_clean = df_clean[df_clean['ConvertedCompYearly'] <= df_clean['ConvertedCompYearly'].quantile(0.99)]

# Scatter plot with color coding by Employment type
plt.figure(figsize=(12,8))
sns.scatterplot(
    x='ConvertedCompYearly', 
    y='JobSatPoints_6', 
    hue='Employment', 
    data=df_clean, 
    alpha=0.6,
    palette='Set2'
)
plt.xscale('log')  # Log scale for compensation
plt.title('Yearly Compensation vs Job Satisfaction by Employment Type')
plt.xlabel('Yearly Compensation (log scale)')
plt.ylabel('Job Satisfaction Points')
plt.legend(title='Employment Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.show()

#### 2. Scatter Plot for Work Experience vs. Age Group by Country


Compare work experience (`YearsCodePro`) across different age groups (`Age`) and countries (`Country`). Use colors to represent different countries and markers for age groups.


In [None]:
## Write your code here
# Work on a copy so the original df stays available to the other cells
df_country = df[['Age', 'YearsCodePro', 'Country']].dropna().copy()

# Convert YearsCodePro to a numeric value
def convert_years(x):
    if x == 'Less than 1 year':
        return 0.5
    elif x == 'More than 50 years':
        return 50
    else:
        try:
            return float(x)
        except (TypeError, ValueError):
            return None

df_country['YearsCodePro'] = df_country['YearsCodePro'].apply(convert_years)
df_country = df_country.dropna()

# Select the five countries with the most respondents
# (taken from the data so the names always match the dataset's labels)
top_countries = df_country['Country'].value_counts().head(5).index.tolist()
df_country = df_country[df_country['Country'].isin(top_countries)]

# Scatter plot, one color per country
plt.figure(figsize=(10,6))
for country in top_countries:
    subset = df_country[df_country['Country']==country]
    plt.scatter(subset['Age'], subset['YearsCodePro'], alpha=0.5, label=country)

plt.xlabel('Age')
plt.ylabel('Years of Professional Coding Experience')
plt.title('Work Experience by Age and Country')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
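
The task also asks for markers to distinguish age groups. One way to combine both encodings is sketched below: color identifies the country and marker shape identifies the age group. The labels in `AGE_MARKERS` are assumed rather than taken from the dataset, `plot_experience` is a hypothetical helper, and the sample rows are invented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical marker choice per age group; the labels are assumptions.
AGE_MARKERS = {
    "18-24 years old": "o",
    "25-34 years old": "s",
    "35-44 years old": "^",
}

def plot_experience(frame, countries, age_markers=AGE_MARKERS):
    """Scatter experience by age: one color per country, one marker per age group."""
    fig, ax = plt.subplots(figsize=(10, 6))
    palette = plt.get_cmap("tab10").colors
    for i, country in enumerate(countries):
        for age, marker in age_markers.items():
            subset = frame[(frame["Country"] == country) & (frame["Age"] == age)]
            if subset.empty:
                continue  # skip combinations with no respondents
            ax.scatter(
                subset["Age"], subset["YearsCodePro"],
                color=palette[i % len(palette)], marker=marker,
                alpha=0.5, label=f"{country}, {age}",
            )
    ax.set_xlabel("Age")
    ax.set_ylabel("Years of Professional Coding Experience")
    ax.legend(fontsize="small")
    return ax

# Invented sample rows for illustration only.
sample = pd.DataFrame({
    "Country": ["India", "India", "Germany"],
    "Age": ["18-24 years old", "25-34 years old", "25-34 years old"],
    "YearsCodePro": [1.0, 6.0, 8.0],
})
ax = plot_experience(sample, ["India", "Germany"])
```

Call `plt.show()` afterwards to display the figure. With many countries and age groups the legend grows quickly, so restricting either list keeps the plot readable.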

### Final Step: Review


With these scatter plots, you will have analyzed data relationships across multiple dimensions, including compensation, job satisfaction, employment types, and demographics, to uncover meaningful trends in the developer community.


### Summary


After completing this lab, you will be able to:
- Analyze how numerical variables relate across specific groups, such as employment types and countries.
- Use scatter plots effectively to represent multiple variables with color, size, and markers.
- Gain insights into compensation, satisfaction, and demographic trends using advanced scatter plot techniques.


## Authors:
Ayushi Jain


### Other Contributors:
- Rav Ahuja
- Lakshmi Holla
- Malika


<!--## Change Log
|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|               
|2024-10-07|1.2|Madhusudan Moole|Reviewed and updated lab|                                                                                      
|2024-10-06|1.0|Raghul Ramesh|Created lab|-->


Copyright © IBM Corporation. All rights reserved.
