<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Line Charts**


Estimated time needed: **30** minutes


In this lab, you will focus on using line charts to analyze trends over time and across different categories in a dataset.



## Objectives


In this lab you will perform the following:


- Track trends in compensation across age groups and specific age ranges.

- Analyze job satisfaction trends based on experience level.

- Explore and interpret line charts to identify patterns and trends.


## Setup: Working with the Dataset
**Install the needed libraries**


In [1]:
!pip install pandas




In [2]:
!pip install matplotlib



**Download the CSV file containing the survey data.**


To start, download and load the dataset into a `pandas` DataFrame.



#### Step 1: Download the dataset


In [None]:
!wget -O survey-data.csv https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/n01PQ9pSmiRX6520flujwQ/survey-data.csv


--2025-07-10 22:09:08--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/n01PQ9pSmiRX6520flujwQ/survey-data.csv
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.63.118.104
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.63.118.104|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 159525875 (152M) [text/csv]
Saving to: ‘survey-data.csv’


2025-07-10 22:09:11 (58.0 MB/s) - ‘survey-data.csv’ saved [159525875/159525875]



#### Step 2: Import necessary libraries and load the dataset


In [None]:
import pandas as pd
import matplotlib.pyplot as plt


#### Load the data


In [None]:
df = pd.read_csv("survey-data.csv")
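Because the file is about 152 MB, you may want to parse only the columns this lab uses. Below is a minimal sketch of the `usecols` parameter of `pd.read_csv`, run on a tiny made-up CSV held in memory. The rows, and the `ResponseId` and `Country` columns in it, are illustrative assumptions, not survey data.

```python
import io

import pandas as pd

# Stand-in for survey-data.csv: a tiny hypothetical CSV, not the real file
csv_text = """ResponseId,Age,ConvertedCompYearly,WorkExp,JobSatPoints_6,Country
1,25-34 years old,85000,7,20,Germany
2,18-24 years old,,1,,India
"""

# Parse only the columns used in this lab; the others are never loaded into memory
cols = ['Age', 'ConvertedCompYearly', 'WorkExp', 'JobSatPoints_6']
df_small = pd.read_csv(io.StringIO(csv_text), usecols=cols)

# Empty fields become NaN, so ConvertedCompYearly is read as float64
print(df_small.dtypes)
```

With the real file, the equivalent call is `pd.read_csv("survey-data.csv", usecols=cols)`.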


#### Display the first few rows to understand the structure of the data


In [None]:
df.head()

### Task 1: Trends in Compensation Over Age Groups


##### 1. Line Chart of Median `ConvertedCompYearly` by Age Group


- Track how the median yearly compensation (ConvertedCompYearly) changes across different age groups.

- Use a line chart to visualize these trends.


In [None]:
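## Write your code here

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Drop rows with missing compensation
# (.copy() avoids pandas' SettingWithCopyWarning when the column is rescaled below)
df_clean = df.dropna(subset=['ConvertedCompYearly']).copy()

# Step 2: Optional scaling (for easier visualization)
df_clean['ConvertedCompYearly'] = df_clean['ConvertedCompYearly'] / 1e4  # scaling by 10,000

# Step 3: Group by Age and calculate the median
grouped = df_clean.groupby('Age')['ConvertedCompYearly'].median()

# Step 4: Order age groups from youngest to oldest. groupby() sorts labels alphabetically,
# which would put "Under 18 years old" last, so sort on each label's leading number instead.
# Labels without an age (e.g. "Prefer not to say") are dropped.
def age_sort_key(label):
    if label.startswith('Under'):
        return 0
    first = label.split('-')[0].split(' ')[0]
    return int(first) if first.isdigit() else None

ages = [a for a in grouped.index if age_sort_key(a) is not None]
grouped = grouped.reindex(sorted(ages, key=age_sort_key))

# Step 5: Plot
plt.figure(figsize=(10, 6))
plt.plot(grouped.index, grouped.values, marker='o')
plt.xticks(rotation=45)
plt.xlabel('Age Group')
plt.ylabel('Median Compensation (×10,000)')
plt.title('Median ConvertedCompYearly by Age Group')
plt.grid(True)
plt.tight_layout()
plt.show()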
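An alternative to reordering after grouping is to turn `Age` into an ordered categorical, so that `groupby` follows a fixed order directly. This is a minimal sketch on a small synthetic sample: the compensation values are invented, and `age_order` assumes the survey's age-band wording.

```python
import pandas as pd

# Small synthetic sample (hypothetical values, not survey data)
sample = pd.DataFrame({
    'Age': ['Under 18 years old', '18-24 years old', '25-34 years old',
            '18-24 years old', '65 years or older', '25-34 years old'],
    'ConvertedCompYearly': [5000, 30000, 70000, 40000, 90000, 80000],
})

# Assumed label order, youngest to oldest
age_order = ['Under 18 years old', '18-24 years old', '25-34 years old',
             '35-44 years old', '45-54 years old', '55-64 years old',
             '65 years or older']

# An ordered Categorical makes groupby follow age_order instead of alphabetical order
sample['Age'] = pd.Categorical(sample['Age'], categories=age_order, ordered=True)

# observed=True leaves out categories with no rows (e.g. '35-44 years old' here)
median_by_age = sample.groupby('Age', observed=True)['ConvertedCompYearly'].median()
print(median_by_age)
```

Any label that is not in `age_order` (such as "Prefer not to say") becomes NaN in the categorical, so `groupby` skips it.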

##### 2. Line Chart of Median `ConvertedCompYearly` for Ages 25 to 45


For a closer look, plot a line chart focusing on the median compensation for respondents between ages 25 and 45.


In [None]:
## Write your code here

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Convert age-range labels to approximate numeric midpoints
def extract_age_midpoint(age_str):
    if not isinstance(age_str, str):
        return np.nan
    if age_str.startswith('Under'):
        return 16  # Approximate for "Under 18 years old"
    if age_str.startswith('65'):
        return 70  # Approximate for "65 years or older"
    age_range = age_str.split(' ')[0]
    if '-' in age_range:
        try:
            start, end = map(int, age_range.split('-'))
            return (start + end) / 2
        except ValueError:
            return np.nan
    return np.nan  # e.g. "Prefer not to say"

# Work on a copy of the needed columns so that df stays intact for the later tasks
df_age = df[['Age', 'ConvertedCompYearly']].copy()
df_age['AgeMid'] = df_age['Age'].apply(extract_age_midpoint)

# Clean and filter: coerce compensation to numbers first, then drop missing values
df_age['ConvertedCompYearly'] = pd.to_numeric(df_age['ConvertedCompYearly'], errors='coerce')
df_age = df_age.dropna(subset=['ConvertedCompYearly', 'AgeMid'])
df_age = df_age[(df_age['AgeMid'] >= 25) & (df_age['AgeMid'] <= 45)]

# Group and compute the median
median_comp_by_age = df_age.groupby('AgeMid')['ConvertedCompYearly'].median().sort_index()

# Plot
plt.figure(figsize=(10, 6))
plt.plot(median_comp_by_age.index, median_comp_by_age.values, marker='o', linestyle='-', color='darkblue')
plt.title('Median Yearly Compensation for Ages 25 to 45')
plt.xlabel('Age (Midpoint)')
plt.ylabel('Median Compensation (USD)')
plt.grid(True)
plt.tight_layout()
plt.show()
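Because `Age` is recorded in bands, the 25-to-45 filter operates on band midpoints rather than on individual ages. This quick pure-Python sketch shows which bands survive the filter. The `age_midpoint` helper is hypothetical and uses the same approximations as the cell above.

```python
import math


def age_midpoint(label):
    """Map a survey age label to an approximate numeric midpoint (NaN if unknown)."""
    if label.startswith('Under'):
        return 16.0  # assumed stand-in for "Under 18 years old"
    if label.startswith('65'):
        return 70.0  # assumed stand-in for "65 years or older"
    first = label.split(' ')[0]
    if '-' in first:
        start, end = (int(x) for x in first.split('-'))
        return (start + end) / 2
    return math.nan


labels = ['18-24 years old', '25-34 years old', '35-44 years old',
          '45-54 years old', 'Prefer not to say']
mids = {label: age_midpoint(label) for label in labels}

# Comparisons with NaN are False, so "Prefer not to say" is excluded automatically
in_range = [label for label, mid in mids.items() if 25 <= mid <= 45]
print(in_range)  # ['25-34 years old', '35-44 years old']
```

The resulting chart therefore has only two points, one per band.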

### Task 2: Trends in Job Satisfaction by Experience Level



##### 1. Line Chart of Job Satisfaction (`JobSatPoints_6`) by Experience Level



- Use a column that approximates experience level to analyze how job satisfaction changes with experience.

- If needed, substitute an available experience-related column for `Experience`.


In [None]:
## Write your code here

import pandas as pd
import matplotlib.pyplot as plt

# Keep only the relevant columns and drop rows with missing values in either
df_exp = df[['WorkExp', 'JobSatPoints_6']].dropna().copy()

# Group by years of work experience (WorkExp) and calculate the median JobSatPoints_6
median_jobsat_by_exp = df_exp.groupby('WorkExp')['JobSatPoints_6'].median().sort_index()

# Plot
plt.figure(figsize=(10, 6))
plt.plot(median_jobsat_by_exp.index, median_jobsat_by_exp.values, marker='o', linestyle='-', color='purple')
plt.title('Median Job Satisfaction (JobSatPoints_6) by Experience Level')
plt.xlabel('Years of Work Experience (WorkExp)')
plt.ylabel('Median Job Satisfaction Score')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
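`WorkExp` holds years of experience, so grouping by every distinct value can produce a jagged line wherever only a few respondents share a value. One option is to bin experience with `pd.cut` and report each band's respondent count next to its median. This sketch uses synthetic data, and the band edges and labels are hypothetical choices that assume whole years.

```python
import pandas as pd

# Synthetic rows for illustration only (not survey values)
sample = pd.DataFrame({
    'WorkExp': [0, 1, 3, 4, 7, 9, 12, 15, 22, 30],
    'JobSatPoints_6': [10, 20, 15, 25, 30, 20, 40, 35, 50, 45],
})

# Hypothetical experience bands; right=False makes each bin [left, right)
bins = [0, 2, 5, 10, 20, 51]
labels = ['0-1', '2-4', '5-9', '10-19', '20+']
sample['ExpBand'] = pd.cut(sample['WorkExp'], bins=bins, labels=labels, right=False)

# Median satisfaction per band, plus how many respondents each median is based on
summary = sample.groupby('ExpBand', observed=True)['JobSatPoints_6'].agg(['median', 'count'])
print(summary)
```

Plotting `summary['median']` against the band labels gives one point per band. The `count` column shows which points rest on only a handful of responses.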

### Task 3: Trends in Job Satisfaction and Compensation by Experience


##### 1. Line Chart of Median `ConvertedCompYearly` Over Experience Level

- This line chart will track how median compensation (`ConvertedCompYearly`) changes with increasing experience.

- Use a column such as `WorkExp` or another relevant experience-related column.


In [None]:
## Write your code here

import pandas as pd
import matplotlib.pyplot as plt

# Clean the data: drop rows with missing compensation or experience
df_clean = df.dropna(subset=['ConvertedCompYearly', 'WorkExp']).copy()

# Convert WorkExp to a numeric format (values that are already numeric pass through unchanged)
def convert_workexp(value):
    if isinstance(value, str):
        text = value.strip().lower()
        if "less than" in text:
            return 0.5
        elif "more than" in text or "10+" in text:
            return 11  # Assumed stand-in for an open-ended upper category
        try:
            return float(text)
        except ValueError:
            return None
    return value

df_clean['WorkExpNum'] = df_clean['WorkExp'].apply(convert_workexp)

# Drop rows that couldn't be converted
df_clean = df_clean.dropna(subset=['WorkExpNum'])

# Ensure compensation is numeric
df_clean['ConvertedCompYearly'] = pd.to_numeric(df_clean['ConvertedCompYearly'], errors='coerce')
df_clean = df_clean.dropna(subset=['ConvertedCompYearly'])

# Group by WorkExpNum and calculate median compensation
comp_by_exp = df_clean.groupby('WorkExpNum')['ConvertedCompYearly'].median().sort_index()

# Plotting
plt.figure(figsize=(10,6))
plt.plot(comp_by_exp.index, comp_by_exp.values, marker='o', linestyle='-', color='teal')
plt.title('Median Converted Compensation by Years of Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Median Yearly Compensation (ConvertedCompYearly)')
plt.grid(True)
plt.tight_layout()
plt.show()
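The `convert_workexp` helper can also be written in vectorized form: map the known text labels first, then let `pd.to_numeric(..., errors='coerce')` convert the rest and turn anything unparseable into NaN. In this sketch the text labels and their stand-in values are hypothetical. If `WorkExp` is already numeric in your copy of the data, `pd.to_numeric` leaves it unchanged.

```python
import pandas as pd

# Hypothetical mix of numeric strings, text labels, and a missing value
work_exp = pd.Series(['3', '0.5', 'Less than 1 year', 'More than 50 years', '12', None])

# Assumed stand-in values for open-ended text labels
text_map = {'Less than 1 year': 0.5, 'More than 50 years': 51}

# Replace known labels, then coerce everything else to numbers (failures become NaN)
work_exp_num = pd.to_numeric(work_exp.replace(text_map), errors='coerce')
print(work_exp_num.tolist())  # [3.0, 0.5, 0.5, 51.0, 12.0, nan]
```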

##### 2. Line Chart of Job Satisfaction (`JobSatPoints_6`) Across Experience Levels

- Create a line chart to explore trends in job satisfaction (`JobSatPoints_6`) based on experience level.

- This chart will show how job satisfaction changes as years of experience increase.


In [None]:
## Write your code here

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Clean and prepare data
df_clean = df.dropna(subset=['JobSatPoints_6', 'WorkExp']).copy()

# Convert WorkExp to numeric
def convert_workexp(value):
    if isinstance(value, str):
        value = value.strip().lower()
        if 'less than' in value:
            return 0.5
        elif 'more than' in value or '10+' in value:
            return 11
        try:
            return float(value)
        except ValueError:
            return None
    return value

df_clean['WorkExpNum'] = df_clean['WorkExp'].apply(convert_workexp)
df_clean = df_clean.dropna(subset=['WorkExpNum'])

# Step 2: Group by experience and calculate median job satisfaction
satisfaction_by_exp = df_clean.groupby('WorkExpNum')['JobSatPoints_6'].median().sort_index()

# Step 3: Plot
plt.figure(figsize=(10,6))
plt.plot(satisfaction_by_exp.index, satisfaction_by_exp.values, marker='o', linestyle='-', color='orange')
plt.title('Job Satisfaction (JobSatPoints_6) Across Experience Levels')
plt.xlabel('Years of Experience')
plt.ylabel('Job Satisfaction (JobSatPoints_6)')
plt.grid(True)
plt.tight_layout()
plt.show()
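One option for putting a single number on how satisfaction moves with experience is Spearman's rank correlation, which assumes only a monotonic relationship rather than a linear one. This sketch uses synthetic values and computes Spearman's coefficient as the Pearson correlation of the ranks, which avoids needing SciPy (`rank()` gives tied values their average rank).

```python
import pandas as pd

# Synthetic values for illustration only (not survey data)
sample = pd.DataFrame({
    'WorkExpNum': [1, 2, 3, 5, 8, 13],
    'JobSatPoints_6': [10, 12, 11, 20, 25, 30],
})

# Spearman's rho = Pearson correlation of the ranked values
rho = sample['WorkExpNum'].rank().corr(sample['JobSatPoints_6'].rank())
print(round(rho, 3))  # 0.943
```

Applied to `df_clean[['WorkExpNum', 'JobSatPoints_6']]` from the cell above, a value near 0 would indicate little monotonic relationship, whatever the shape of the line chart.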

#### Final Step: Review


In this lab, you focused on analyzing trends in compensation and job satisfaction, specifically exploring how these metrics change with age and experience levels using line charts.


### Summary


In this lab, you explored essential data visualization techniques with a focus on analyzing trends using line charts. You learned to:

- Visualize how median compensation changes across age groups to understand salary trends.

- Track changes in median compensation over various experience levels, identifying how earnings progress with experience.

- Examine trends in job satisfaction by experience, revealing how satisfaction varies throughout a developer's career.

These analyses show how compensation and job satisfaction relate to factors such as age and experience. Line charts make it easy to follow how a metric changes across ordered groups, which is useful for interpreting professional trends in the developer community.


## Authors:
Ayushi Jain


### Other Contributors:
- Rav Ahuja
- Lakshmi Holla
- Malika


<!--
## Change Log
|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2024-10-28|1.2|Madhusudhan Moole|Updated lab|
|2024-10-16|1.1|Madhusudhan Moole|Updated lab|
|2024-10-15|1.0|Raghul Ramesh|Created lab|
-->


Copyright © IBM Corporation. All rights reserved.
