# Project Title - Stack Overflow Developer Insights: Analysis of the 2025 Developer Survey


## Data set selection

> In this section, you will need to provide the following information about the selected data set:
>
> - Source with a link

> - Stack Overflow 2025 Annual Developer Survey
> - https://survey.stackoverflow.co/

> - Fields

> - The dataset contains responses from 49,000+ developers across 62 questions covering areas such as:
> - Country
> - Education level
> - Years of coding experience
> - Salary
> - Employment role
> - Technologies used (languages, frameworks, tools)
> - AI tool usage
> - Job satisfaction
> - Professional status
> - Remote work
> - Learning preferences

> - License

> - The Stack Overflow Developer Survey dataset is released under the Open Data Commons Attribution License (ODC-By), allowing reuse with attribution.

### Data set selection rationale

> Why did you select this data set?

> I selected the Stack Overflow 2025 Annual Developer Survey because it provides a large, rich, and up-to-date snapshot of the global software development community. With nearly 50,000 participants from more than 170 countries, it includes detailed information about developers’ salaries, experience levels, education backgrounds, and technology preferences. The dataset is valuable because it lets me explore concrete questions about career growth, compensation, and the impact of new technologies such as AI tools. The answers can help identify skill trends, clarify industry needs, and highlight factors associated with developer success.

### Questions to be answered

> Using statistical analysis and visualization, what questions would you like to be able to answer about this dataset?
> This could include questions such as:

> - What is the relationship between X and Y variables?
> - What is the distribution of the variables?
> - What is the relationship between the variables and the target?
>   You will need to frame these questions in a way that shows value to a stakeholder (i.e., why should we know about the relationship between X and Y variables?)

> - How do developer salaries vary based on years of professional experience?
> - Does using AI-assisted coding tools correlate with higher salaries?
> - Which programming languages are the most commonly used among developers in 2025?
> - Does education level (degree vs. no degree) significantly affect salary ranges?
>
> These questions will help reveal patterns that can be valuable for students exploring career paths, educators designing curricula, and developers planning their skill growth.

### Visualization ideas

> Provide a few examples of what you plan to visualize to answer the questions you posed in the previous section. In this project, you will be producing 6-8 visualizations. You will also be producing an interactive chart using Plotly.
> Think about what those visualizations could be: what are the variables used in the charts? What insights do you hope to gain from them?
> To answer my research questions, I plan to create the following visualizations:

> Histogram
> 1. Visualize the distribution of developer salaries
> 2. Shows whether salary data is skewed, centered, or spread out

> Boxplot by Experience Level
> 1. Compare salary distributions across different experience groups
> 2. Helps identify outliers and income progression

> Bar Chart of Most Popular Programming Languages
> 1. Count frequency of languages selected by developers
> 2. Reveals technology trends and skill demand

> Scatter Plot with Regression Line
> 1. Relationship between years of experience and salary
> 2. Shows correlation and direction of trend

> Grouped Bar Chart
> 1. Compare salaries of AI coding tool users vs. non-users
> 2. Highlights how new technologies influence compensation

> Pie Chart of Education Levels
> 1. Distribution of highest education achieved by developers
> 2. Helps understand developer backgrounds

> Together, these 6 visualizations will provide a complete picture of salary trends, skill usage, and developer demographics in the 2025 tech landscape.
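As a rough sketch of the data preparation behind the boxplot by experience level, the snippet below groups respondents into experience bands and computes the median salary per band. It uses the `WorkExp` and `ConvertedCompYearly` columns from the survey file. The helper name `salary_by_experience`, the band edges, and the small demo frame are assumptions made for illustration only and are not survey data.

```python
import pandas as pd

def salary_by_experience(df, bins=(0, 2, 5, 10, 20, 60)):
    """Median yearly compensation per experience band (band edges are illustrative)."""
    # Keep only rows where both experience and salary are present
    data = df[["WorkExp", "ConvertedCompYearly"]].dropna()
    # Build labels such as "0-2", "2-5", ... from consecutive bin edges
    labels = [f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]
    bands = pd.cut(data["WorkExp"], bins=list(bins), labels=labels, include_lowest=True)
    # observed=True drops bands that contain no respondents
    return data.groupby(bands, observed=True)["ConvertedCompYearly"].median()

# Tiny made-up frame to show the shape of the result (NOT survey data)
demo = pd.DataFrame({
    "WorkExp": [1, 2, 4, 8, 15, None],
    "ConvertedCompYearly": [30000, 40000, 55000, 70000, None, 90000],
})
summary = salary_by_experience(demo)
print(summary)
```

The same banded column could be passed to `sns.boxplot` as the x-axis variable. Medians are used here because salary data is often skewed, which is one of the things the histogram is meant to check.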

In [None]:
# 🚀 Importing some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
import pandas as pd

df = pd.read_csv("data/survey_results_public.csv")
df.head()



Unnamed: 0,ResponseId,MainBranch,Age,EdLevel,Employment,EmploymentAddl,WorkExp,LearnCodeChoose,LearnCode,LearnCodeAI,...,AIAgentOrchestration,AIAgentOrchWrite,AIAgentObserveSecure,AIAgentObsWrite,AIAgentExternal,AIAgentExtWrite,AIHuman,AIOpen,ConvertedCompYearly,JobSat
0,1,I am a developer by profession,25-34 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Employed,"Caring for dependents (children, elderly, etc.)",8.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",...,Vertex AI,,,,ChatGPT,,When I don’t trust AI’s answers,"Troubleshooting, profiling, debugging",61256.0,10.0
1,2,I am a developer by profession,25-34 years old,"Associate degree (A.A., A.S., etc.)",Employed,,2.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",...,,,,,,,When I don’t trust AI’s answers;When I want to...,All skills. AI is a flop.,104413.0,9.0
2,3,I am a developer by profession,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Independent contractor, freelancer, or self-em...",None of the above,10.0,"Yes, I am not new to coding but am learning ne...",Online Courses or Certification (includes all ...,"Yes, I learned how to use AI-enabled tools for...",...,,,,,ChatGPT;Claude Code;GitHub Copilot;Google Gemini,,When I don’t trust AI’s answers;When I want to...,"Understand how things actually work, problem s...",53061.0,8.0
3,4,I am a developer by profession,35-44 years old,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Employed,None of the above,4.0,"Yes, I am not new to coding but am learning ne...","Other online resources (e.g. standard search, ...","Yes, I learned how to use AI-enabled tools for...",...,,,,,ChatGPT;Claude Code,,When I don’t trust AI’s answers;When I want to...,,36197.0,6.0
4,5,I am a developer by profession,35-44 years old,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Independent contractor, freelancer, or self-em...","Caring for dependents (children, elderly, etc.)",21.0,"No, I am not new to coding and did not learn n...",,"Yes, I learned how to use AI-enabled tools for...",...,,,,,,,When I don’t trust AI’s answers,"critical thinking, the skill to define the tas...",60000.0,7.0
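The preview shows that multi-select answers, such as `AIAgentExternal`, are stored as a single semicolon-delimited string. Frequency charts like the planned bar chart need those strings split into individual options first. The following is a minimal sketch of that step using only the standard library. The helper name `count_options` is hypothetical, and the sample values are the five `AIAgentExternal` entries visible in the preview, with missing answers written as `None`.

```python
from collections import Counter

def count_options(answers, sep=";"):
    """Count how many respondents selected each option in a multi-select column."""
    counts = Counter()
    for answer in answers:
        # Skip missing answers (None, NaN floats, or empty strings)
        if not isinstance(answer, str) or not answer.strip():
            continue
        # A set ensures each respondent counts at most once per option
        options = {part.strip() for part in answer.split(sep) if part.strip()}
        counts.update(options)
    return counts

# The five AIAgentExternal values from the preview rows (None = missing)
sample = [
    "ChatGPT",
    None,
    "ChatGPT;Claude Code;GitHub Copilot;Google Gemini",
    "ChatGPT;Claude Code",
    None,
]
tool_counts = count_options(sample)
print(tool_counts.most_common(2))  # [('ChatGPT', 3), ('Claude Code', 2)]
```

On the full DataFrame, the same function should accept a column directly, for example `count_options(df["AIAgentExternal"])`, since missing values are skipped. A pandas-only alternative is `df[col].str.split(";").explode().value_counts()`.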


In [None]:
df.columns.tolist()

['ResponseId',
 'MainBranch',
 'Age',
 'EdLevel',
 'Employment',
 'EmploymentAddl',
 'WorkExp',
 'LearnCodeChoose',
 'LearnCode',
 'LearnCodeAI',
 'AILearnHow',
 'YearsCode',
 'DevType',
 'OrgSize',
 'ICorPM',
 'RemoteWork',
 'PurchaseInfluence',
 'TechEndorseIntro',
 'TechEndorse_1',
 'TechEndorse_2',
 'TechEndorse_3',
 'TechEndorse_4',
 'TechEndorse_5',
 'TechEndorse_6',
 'TechEndorse_7',
 'TechEndorse_8',
 'TechEndorse_9',
 'TechEndorse_13',
 'TechEndorse_13_TEXT',
 'TechOppose_1',
 'TechOppose_2',
 'TechOppose_3',
 'TechOppose_5',
 'TechOppose_7',
 'TechOppose_9',
 'TechOppose_11',
 'TechOppose_13',
 'TechOppose_16',
 'TechOppose_15',
 'TechOppose_15_TEXT',
 'Industry',
 'JobSatPoints_1',
 'JobSatPoints_2',
 'JobSatPoints_3',
 'JobSatPoints_4',
 'JobSatPoints_5',
 'JobSatPoints_6',
 'JobSatPoints_7',
 'JobSatPoints_8',
 'JobSatPoints_9',
 'JobSatPoints_10',
 'JobSatPoints_11',
 'JobSatPoints_13',
 'JobSatPoints_14',
 'JobSatPoints_15',
 'JobSatPoints_16',
 'JobSatPoints_15_TEXT',
 