# Stack Overflow Developer Survey 2017 Analysis

## Introduction
This project analyzes the 2017 Stack Overflow Developer Survey data.
We will explore:
1. The most popular programming languages among developers.
2. The average salaries across different countries.
3. How many developers program as a hobby.

## Business Understanding
The objective is to analyze Stack Overflow survey data to gain insights about:
1. Which countries offer the highest average salaries for developers.
2. How many developers worldwide program as a hobby or contribute to open source.

These insights could help aspiring developers understand global trends and prioritize certain skills or locations.


## Data Understanding

In [None]:
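# Note: run this cell after `data` has been loaded (see "Loading the Data" below).
# We inspect the raw columns here, before rows with missing values are dropped.
relevant = data[['Country', 'ProgramHobby', 'Salary']]

# Summary statistics for the numeric columns
print(relevant.describe())

# Number of missing values per column
print(relevant.isnull().sum())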


## Data Preparation
The data was filtered to include only the relevant columns:
- `Country`: To group and compare salary data by location.
- `ProgramHobby`: To analyze developer interests.
- `Salary`: To calculate average salaries.

Rows with a missing value in any of these columns were removed, so every remaining respondent contributes to both the salary and the hobby analysis.

## Modeling
- The average salary by country was calculated using the `groupby` and `mean` functions.
- The distribution of programming hobbies was analyzed using `value_counts`.
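
The two calculations listed above can be sketched on a small made-up frame. The country labels, answers, and salary figures below are illustrative assumptions, not survey data; only the `groupby`/`mean` and `value_counts` calls mirror the approach described.

```python
import pandas as pd

# Toy data standing in for the cleaned survey frame (all values are made up).
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "ProgramHobby": ["Yes", "No", "Yes", "Yes", "No"],
    "Salary": [100.0, 200.0, 50.0, 70.0, 90.0],
})

# Mean salary per country, highest first.
mean_salary = toy.groupby("Country")["Salary"].mean().sort_values(ascending=False)

# Number of respondents per ProgramHobby answer.
hobby_counts = toy["ProgramHobby"].value_counts()

print(mean_salary)
print(hobby_counts)
```

On this toy frame, country `A` averages 150 and country `B` averages 70, and the answer `Yes` appears three times.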

## Evaluation
The results show:
- Countries like Switzerland and the United States offer the highest average salaries for developers.
- Many developers report programming as a hobby, which suggests engagement with programming beyond paid work.

## Deployment
The results will be shared through:
1. A GitHub repository containing the full code and analysis.
2. A blog post summarizing the key insights and visualizations.

### Steps
1. Load and inspect the data.
2. Clean the data.
3. Perform the required analyses.
4. Visualize the results.
5. Document and push the project to GitHub.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

### Loading the Data
We load the survey data from the `survey_results_public.csv` file using Pandas.

In [None]:
file_path = "survey_results_public.csv"  # Provide the correct path to the file
data = pd.read_csv(file_path)



# Display the first 5 rows
data.head()
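
If the full file is large, one option is to load only the columns the analysis needs. The sketch below is a minimal illustration that reads from an in-memory string rather than the real file; the rows and the `Other` column are made up, and whether this is worthwhile depends on your memory budget.

```python
import io
import pandas as pd

# A tiny made-up CSV standing in for survey_results_public.csv.
csv_text = """Respondent,Country,ProgramHobby,Salary,Other
1,A,Yes,50000,x
2,B,No,,y
"""

# `usecols` tells read_csv to parse only the listed columns.
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Country", "ProgramHobby", "Salary"],
)

print(subset)
```

The empty `Salary` field in the second row is read as a missing value, which is what the cleaning step later removes.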


### Inspecting the Columns
We inspect the columns to identify which ones are relevant to our analysis.

In [None]:
print(data.columns)

### Data Cleaning
We examine the missing values and remove unnecessary columns to prepare the data for analysis.

In [None]:
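# Select relevant columns
columns_needed = ['Country', 'ProgramHobby', 'Salary']

# Keep only those columns and drop rows with missing values.
# Chaining returns a new DataFrame, which avoids the SettingWithCopyWarning
# raised by calling dropna(inplace=True) on a slice of `data`.
filtered_data = data[columns_needed].dropna()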

# Display the cleaned data
filtered_data.head()
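
Before dropping rows, it can help to check how much data the filter removes. The following is a minimal sketch on a made-up frame; the values are illustrative assumptions, not survey data.

```python
import pandas as pd

# Toy frame with one missing country and one missing salary (made-up values).
toy = pd.DataFrame({
    "Country": ["A", "B", None, "C"],
    "Salary": [50000.0, float("nan"), 70000.0, 90000.0],
})

# Fraction of missing values in each column.
missing_share = toy.isnull().mean()

# Rows that survive when any missing value disqualifies a row.
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_share)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

Here each column is missing a quarter of its values, yet half of the rows are dropped, because the gaps fall in different rows.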

### Analyze Most Popular Programming Languages
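We analyze the `HaveWorkedLanguage` column (the 2017 survey's name for the languages a respondent has worked with) to determine the most commonly used programming languages among developers. This column is not part of `filtered_data`, so we read it from the full `data` frame.

In [None]:
# Each answer is a semicolon-separated list; split it into one row per language
# and strip the whitespace left around each name.
languages = data['HaveWorkedLanguage'].dropna().str.split(';').explode().str.strip()
top_languages = languages.value_counts().head(10)
top_languages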
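
To illustrate how splitting and exploding turns multi-select answers into one row per language, here is a minimal sketch on made-up answers. It assumes the answers are semicolon-separated, possibly with a space after each semicolon.

```python
import pandas as pd

# Hypothetical multi-select answers; None stands for a skipped question.
answers = pd.Series(["Python; SQL", "JavaScript; Python", None, "SQL"])

languages_toy = (
    answers.dropna()     # ignore respondents who skipped the question
    .str.split(";")      # "Python; SQL" -> ["Python", " SQL"]
    .explode()           # one row per list element
    .str.strip()         # remove the leading space left by the split
)

counts = languages_toy.value_counts()
print(counts)
```

Without the final `str.strip()`, `"SQL"` and `" SQL"` would be counted as two different languages.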

#### Bar Chart of Most Popular Programming Languages
The chart below shows the top 10 most commonly used programming languages among developers.

In [None]:
top_languages.plot(kind='bar', figsize=(10, 5))
plt.title("Most Popular Programming Languages")
plt.xlabel("Programming Language")
plt.ylabel("Number of Developers")
plt.show()

### Analyze Salaries by Country
We use the `Salary` column to calculate the average salary in each country and keep the ten countries with the highest averages.

In [None]:
# Calculate the average salary by country
avg_salary_by_country = filtered_data.groupby('Country')['Salary'].mean().sort_values(ascending=False).head(10)

# Display the results
print(avg_salary_by_country)
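
A plain mean can be dominated by countries with very few respondents. One possible refinement, sketched here on made-up numbers, is to report the respondent count next to the mean and keep only countries above a threshold. The name `min_respondents` and its value are hypothetical choices, not part of the original analysis.

```python
import pandas as pd

# Made-up salaries; country B has a single, very high-earning respondent.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "C", "C"],
    "Salary": [60000.0, 80000.0, 100000.0, 250000.0, 40000.0, 50000.0],
})

# Hypothetical minimum number of respondents for a country to be ranked.
min_respondents = 2

# Mean salary and respondent count per country.
summary = toy.groupby("Country")["Salary"].agg(["mean", "count"])

# Keep countries with enough respondents, then rank by mean salary.
reliable = summary[summary["count"] >= min_respondents].sort_values(
    "mean", ascending=False
)

print(reliable)
```

In this toy example, country `B` would top an unfiltered ranking on the strength of one respondent; with the filter it is excluded.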

### Analyzing Programming Hobbies
We count how often each answer appears in the `ProgramHobby` column, which records whether a respondent programs as a hobby or contributes to open source.

In [None]:
# Count the frequency of programming hobbies
hobbies = filtered_data['ProgramHobby'].value_counts()

# Display the results
print(hobbies)
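
Raw counts are easier to compare as percentages. A minimal sketch, using made-up answers rather than the survey's actual response labels:

```python
import pandas as pd

# Hypothetical answers to a yes/no style question.
answers = pd.Series(["Yes", "Yes", "No", "Yes"])

# normalize=True returns fractions that sum to 1; scale them to percent.
shares = answers.value_counts(normalize=True) * 100

print(shares.round(1))
```

The same call on `filtered_data['ProgramHobby']` would give the share of respondents for each answer.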


### Visualization: Average Salaries by Country
The bar chart below shows the top 10 countries with the highest average developer salaries.

In [None]:
# Plot a bar chart for average salaries by country
avg_salary_by_country.plot(kind='bar', figsize=(10, 5))
plt.title("Average Developer Salaries by Country")
plt.xlabel("Country")
plt.ylabel("Average Salary (USD)")
plt.show()

### Visualization: Distribution of Programming Hobbies
The bar chart below shows the distribution of programming hobbies among developers.

In [None]:
# Plot a bar chart for programming hobbies distribution
hobbies.plot(kind='bar', figsize=(10, 5))
plt.title("Distribution of Programming Hobbies")
plt.xlabel("Hobby")
plt.ylabel("Number of Developers")
plt.show()