# Task B: Stack Overflow Developer Survey 2024 Analytics

You are provided with the results of Stack Overflow's 2024 Developer Survey. Your task is to analyse the responses and extract insights about the programming industry.

## Setup
If you are in Google Colab, you should be able to run the cell below as-is. Otherwise, use the provided conda `environment.yml` file, which lists all the dependencies.

In [None]:
%pip install pandas
import pandas as pd

## Getting to know the data reader

In [None]:
from data.stack_overflow_developer_survey.developer_survey_reader import SurveyDataReader

SURVEY_SUBDIR = "data/stack_overflow_developer_survey"
SCHEMA_RELATIVE_PATH = f"{SURVEY_SUBDIR}/survey_results_schema.csv"
DATA_RELATIVE_PATH = f"{SURVEY_SUBDIR}/survey_results_public.csv"

reader = SurveyDataReader(SCHEMA_RELATIVE_PATH, DATA_RELATIVE_PATH)

In [None]:
print(reader.get_schema())

print(len(reader.get_data()))

print(reader.get_data()[0:10])  # Be careful when printing the data: there is a lot of it!
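Because the setup cell already installs pandas, one alternative for exploration is to load the same two CSV files straight into DataFrames. This is a minimal sketch, assuming both files are ordinary comma-separated files with a header row; `load_survey` is a hypothetical helper, not part of `SurveyDataReader`.

```python
import pandas as pd


def load_survey(schema_source, data_source):
    """Hypothetical helper: read the schema and the responses into DataFrames.

    Accepts anything pd.read_csv accepts (a path or a file-like object).
    """
    schema = pd.read_csv(schema_source)
    # low_memory=False avoids chunked type inference, which can otherwise
    # give mixed-type columns (and a DtypeWarning) on a file this large.
    data = pd.read_csv(data_source, low_memory=False)
    return schema, data
```

In the notebook, `schema_df, data_df = load_survey(SCHEMA_RELATIVE_PATH, DATA_RELATIVE_PATH)` reuses the paths defined above, and `data_df.shape` and `data_df.head(10)` play the same role as the `len(...)` and slicing prints.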

## Questions

1. Print all the questions asked in the developer survey

2. Which age range has the most responses in the survey?

3. How many survey respondents can we say for certain work for a company larger than Marshall Wace? (Feel free to ask one of us if you don't remember how large Marshall Wace is!)

4. How many people had less than 1 year of coding experience before (or outside of) coding professionally?

5. Of the people who had 1 or more years of coding experience outside of coding professionally, what is the average number of years they spent coding outside of work? For simplicity, consider only respondents who gave an exact number of years in both columns (i.e. exclude those who answered more than 50 years or less than 1 year).

6. What is the cumulative compensation of the respondents who disclosed their total compensation? (What assumption are we making here?)

7. What is the most used language for software development?

8. How do developers perceive the benefits of AI, broken down by their field (as given by the first question, with id `MainBranch`)?

9. Recreate the graph of the most used IDEs found [here](https://survey.stackoverflow.co/2024/technology/#1-integrated-development-environment). As an extension, recreate it for each type of respondent, as shown on the site (it does not need to be interactive).
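Several of these questions reduce to a handful of pandas operations. The following is one possible sketch, not a reference solution. It assumes the responses are already in a DataFrame, that the 2024 column names include `Age`, `YearsCode`, `YearsCodePro` and `LanguageHaveWorkedWith`, and that multi-select answers are stored as `;`-separated strings; verify all of these against the schema. For question 5 it further assumes that years spent coding outside work equal `YearsCode` minus `YearsCodePro`. Every function name here is hypothetical.

```python
import pandas as pd

# Assumed 2024 column names; confirm them against survey_results_schema.csv.
AGE_COL = "Age"
YEARS_TOTAL_COL = "YearsCode"      # total years coding, including education
YEARS_PRO_COL = "YearsCodePro"     # years coding professionally
LANGUAGE_COL = "LanguageHaveWorkedWith"


def list_questions(schema: pd.DataFrame, column: str = "question") -> list:
    """Q1: question texts from the schema; the 'question' column name is assumed."""
    return schema[column].dropna().tolist()


def most_common_answer(df: pd.DataFrame, column: str) -> str:
    """Q2: the most frequent non-missing answer in a single-choice column."""
    return df[column].value_counts().idxmax()


def mean_years_outside_pro(df: pd.DataFrame) -> float:
    """Q5, assuming years outside professional coding = YearsCode - YearsCodePro.

    Non-numeric answers such as 'Less than 1 year' or 'More than 50 years'
    become NaN under errors="coerce" and are then dropped.
    """
    total = pd.to_numeric(df[YEARS_TOTAL_COL], errors="coerce")
    pro = pd.to_numeric(df[YEARS_PRO_COL], errors="coerce")
    outside = (total - pro).dropna()
    return float(outside[outside >= 1].mean())


def total_of(df: pd.DataFrame, column: str) -> float:
    """Q6: sum a numeric column; blanks (NaN) are skipped by sum().

    The result is only meaningful if every value uses the same currency
    and covers the same pay period.
    """
    return float(pd.to_numeric(df[column], errors="coerce").sum())


def multiselect_counts(df: pd.DataFrame, column: str, sep: str = ";") -> pd.Series:
    """Q7/Q9: how often each option appears in an 'A;B;C'-style column."""
    return df[column].dropna().str.split(sep).explode().str.strip().value_counts()


def multiselect_share(df: pd.DataFrame, column: str, sep: str = ";") -> pd.Series:
    """Percentage of respondents who answered `column` that selected each option."""
    answered = df[column].notna().sum()
    return multiselect_counts(df, column, sep) / answered * 100


def multiselect_by_group(df: pd.DataFrame, column: str, group_col: str,
                         sep: str = ";") -> pd.DataFrame:
    """Q8/Q9 extension: option counts split by a single-choice column such as MainBranch."""
    pairs = df[[group_col, column]].dropna()
    pairs = pairs.assign(option=pairs[column].str.split(sep)).explode("option")
    pairs = pairs.reset_index(drop=True)  # explode repeats index labels; make them unique
    return pd.crosstab(pairs["option"].str.strip(), pairs[group_col])
```

For example, `multiselect_counts(df, LANGUAGE_COL).idxmax()` returns the most frequently selected language. For question 9, applying `multiselect_share` to the IDE column and then calling `.head(15).sort_values().plot.barh()` (which needs matplotlib) draws a horizontal bar chart; whether its percentages match the site depends on the site using respondents who answered the question as the denominator, which is worth checking.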

## Bonus Task: SurveyDataReader

`SurveyDataReader` is a basic class that gives programmatic access to the underlying survey data. It is implemented with basic data structures and no external dependencies, so there is plenty of room for optimisation. If you're feeling adventurous, try to speed up its basic operations and add some of your own, potentially using a package such as [NumPy](https://numpy.org/).
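One possible direction for the bonus task, sketched under the assumption that the reader holds its rows as a list of equal-length lists of strings plus a list of column names (its actual internals may differ): convert the table to a 2-D NumPy array once, after which column access is a slice and counting runs in compiled code rather than a Python loop. `ArraySurvey` and its methods are hypothetical names.

```python
import numpy as np


class ArraySurvey:
    """Hypothetical NumPy-backed store for survey rows.

    Assumes `rows` is a non-empty list of equal-length lists of strings and
    `header` holds the matching column names.
    """

    def __init__(self, header, rows):
        self.header = list(header)
        # Dict lookup instead of list.index() on every column access.
        self._index = {name: i for i, name in enumerate(self.header)}
        # One up-front conversion to a 2-D fixed-width Unicode array.
        self.table = np.array(rows, dtype=str)

    def column(self, name):
        """Return one column as a 1-D array (a view, no copying)."""
        return self.table[:, self._index[name]]

    def count_equal(self, name, value):
        """Number of rows whose `name` column equals `value`, vectorised."""
        return int(np.count_nonzero(self.column(name) == value))

    def value_counts(self, name):
        """(value, count) pairs for one column, most frequent first."""
        values, counts = np.unique(self.column(name), return_counts=True)
        order = np.argsort(counts, kind="stable")[::-1]
        return list(zip(values[order].tolist(), counts[order].tolist()))
```

The trade-off is an up-front conversion cost plus fixed-width storage: every cell is padded to the length of the longest string, so a few long multi-select answers can make the array larger than the equivalent Python lists. The approach pays off when the same table is queried many times.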