# Survey Dataset Exploration

In this notebook, we will:

- Load the dataset that will be used throughout the capstone project  
- Explore its structure and contents  
- Get familiar with the data types and key columns


## Loading the Dataset

We’ll load the developer survey dataset into a Pandas DataFrame to begin exploring its structure and contents.


Let’s import the necessary libraries for data loading and initial exploration.


In [1]:
import pandas as pd
print(pd.__version__)

2.2.2


We’ll use the `pandas.read_csv()` function to load the CSV file into a DataFrame for analysis.


The cell below passes the dataset URL directly to `read_csv()`, so the file is downloaded and parsed into a DataFrame in a single step. No separate download is needed.


In [14]:
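# Public location of the survey CSV file
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/n01PQ9pSmiRX6520flujwQ/survey-data.csv"
# read_csv fetches the file from the URL and parses it into a DataFrame
df = pd.read_csv(URL)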
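Because `read_csv()` reads straight from the URL, every rerun of the cell downloads the file again. If you rerun the notebook often, one option is to keep a local copy after the first load. The sketch below assumes that approach; the `load_survey` helper and the `survey-data.csv` cache filename are hypothetical and not part of the course material.

```python
from pathlib import Path

import pandas as pd


def load_survey(url: str, cache_path: str = "survey-data.csv") -> pd.DataFrame:
    """Load the survey CSV, downloading it only if no local copy exists.

    `load_survey` and the default `cache_path` are hypothetical names
    chosen for this sketch.
    """
    cache = Path(cache_path)
    if cache.exists():
        # Reuse the local copy instead of downloading again
        return pd.read_csv(cache)
    # First run: read straight from the URL, then save a local copy
    df = pd.read_csv(url)
    df.to_csv(cache, index=False)  # index=False keeps the original columns only
    return df
```

In the notebook this would replace the direct call with `df = load_survey(URL)`. Delete the cached file if you ever need a fresh download.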

## Exploring the Dataset

In [15]:
df.head()

|  | ResponseId | MainBranch | Age | Employment | RemoteWork | Check | CodingActivities | EdLevel | LearnCode | LearnCodeOnline | ... | JobSatPoints_6 | JobSatPoints_7 | JobSatPoints_8 | JobSatPoints_9 | JobSatPoints_10 | JobSatPoints_11 | SurveyLength | SurveyEase | ConvertedCompYearly | JobSat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | I am a developer by profession | Under 18 years old | Employed, full-time | Remote | Apples | Hobby | Primary/elementary school | Books / Physical media | | ... | | | | | | | | | | |
| 1 | 2 | I am a developer by profession | 35-44 years old | Employed, full-time | Remote | Apples | Hobby;Contribute to open-source projects;Other... | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Books / Physical media;Colleague;On the job tr... | Technical documentation;Blogs;Books;Written Tu... | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | | | |
| 2 | 3 | I am a developer by profession | 45-54 years old | Employed, full-time | Remote | Apples | Hobby;Contribute to open-source projects;Other... | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | Books / Physical media;Colleague;On the job tr... | Technical documentation;Blogs;Books;Written Tu... | ... | | | | | | | Appropriate in length | Easy | | |
| 3 | 4 | I am learning to code | 18-24 years old | Student, full-time | | Apples | | Some college/university study without earning ... | Other online resources (e.g., videos, blogs, f... | Stack Overflow;How-to videos;Interactive tutorial | ... | | | | | | | Too long | Easy | | |
| 4 | 5 | I am a developer by profession | 18-24 years old | Student, full-time | | Apples | | Secondary school (e.g. American high school, G... | Other online resources (e.g., videos, blogs, f... | Technical documentation;Blogs;Written Tutorial... | ... | | | | | | | Too short | Easy | | |
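With 114 columns, pandas hides the middle columns behind `...` in the preview. One way to see every column for a few rows is to transpose the preview so each column becomes a row, and lift the display limits only temporarily with `pd.option_context`. The sketch below uses a made-up wide frame standing in for `df`; on the survey data the equivalent would be `df.head(3).T`.

```python
import pandas as pd

# Toy wide DataFrame standing in for the survey data (hypothetical values)
toy = pd.DataFrame({f"col_{i}": range(5) for i in range(30)})

# Transpose the first rows so each column is listed as a separate row
preview = toy.head(3).T

# Lift the row/column display limits only inside this block
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(preview)
```

Using `option_context` rather than `pd.set_option` keeps the rest of the notebook on the default display settings.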


We’ll begin by counting the rows in the dataset; each row is one survey response.  
This gives us a sense of the dataset’s size before diving deeper.


In [16]:
len(df)

65437

Next, we’ll check how many columns are present in the dataset to understand its structure and available fields.


In [17]:
len(df.columns)

114
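The row and column counts can also be read together from the `DataFrame.shape` attribute, which returns a `(rows, columns)` tuple. For the survey data it matches the two counts above, i.e. `(65437, 114)`. A small stand-in frame shows the idea:

```python
import pandas as pd

# Small stand-in DataFrame; on the survey data use `df` directly
toy = pd.DataFrame({
    "ResponseId": [1, 2, 3],
    "Age": ["18-24 years old", "25-34 years old", None],
})

n_rows, n_cols = toy.shape  # (rows, columns) in a single attribute
print(f"{n_rows} rows x {n_cols} columns")
```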

Next, we’ll inspect the data type of each column.  
This shows how the data is stored and whether any columns need type conversion before analysis. Because there are 114 columns, pandas prints only the first and last few.


In [18]:
df.dtypes

ResponseId               int64
MainBranch              object
Age                     object
Employment              object
RemoteWork              object
                        ...   
JobSatPoints_11        float64
SurveyLength            object
SurveyEase              object
ConvertedCompYearly    float64
JobSat                 float64
Length: 114, dtype: object
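Since the full `dtypes` listing is truncated, a compact summary can help: `dtypes.value_counts()` counts columns per type, and `select_dtypes()` splits numeric columns from text ones. Where a text column really holds numbers, `pd.to_numeric(..., errors="coerce")` converts it and turns unparseable entries into `NaN`. The sketch uses a toy frame; `YearsExp` is a hypothetical column, not necessarily one in the survey.

```python
import pandas as pd

# Toy frame with the kinds of dtypes seen above; "YearsExp" is a hypothetical
# text column holding numbers mixed with phrases
toy = pd.DataFrame({
    "ResponseId": [1, 2, 3],
    "Age": ["18-24 years old", "35-44 years old", "Prefer not to say"],
    "YearsExp": ["5", "Less than 1 year", "12"],
    "JobSat": [7.0, 8.0, None],
})

# Number of columns per dtype
print(toy.dtypes.value_counts())

# Separate numeric columns from everything else (text/object columns)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)

# Convert a text column to numbers; entries that can't be parsed become NaN
toy["YearsExp_num"] = pd.to_numeric(toy["YearsExp"], errors="coerce")
print(toy["YearsExp_num"].tolist())
```

Using `exclude="number"` for the text group avoids depending on whether pandas stores strings as `object` or as its newer string dtype.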

Next, we’ll estimate the mean age of the survey participants to get a sense of how old a typical respondent is.


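Since the `Age` column contains categorical ranges rather than numbers, we’ll map each range to an approximate numeric age (roughly its midpoint) and average those values. Respondents who chose "Prefer not to say" are left out of the average.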


In [37]:
# Map each age range to an approximate numeric age (roughly the range midpoint)
age_mapping = {
    'Under 18 years old': 17,
    '18-24 years old': 21,
    '25-34 years old': 29.5,
    '35-44 years old': 39.5,
    '45-54 years old': 48,
    '55-64 years old': 58,
    '65 years or older': 67,
    'Prefer not to say': None  # no age information; becomes NaN
}

# Add a numeric version of Age; labels not in the mapping also become NaN
df['Age_numeric'] = df['Age'].map(age_mapping)

# Mean age; NaN values are skipped by .mean()
mean_age = df['Age_numeric'].mean()

# Display result
print(f"Estimated average age of survey participants: {mean_age:.1f} years")

Estimated average age of survey participants: 32.8 years
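The hand-written mapping above uses 48 and 58 for the `45-54` and `55-64` ranges, slightly below their true midpoints of 49.5 and 59.5. A sketch that derives the midpoint from the label text itself avoids such slips. It assumes closed ranges follow the `NN-NN years old` pattern seen in the preview, and it reuses the notebook's choices of 17 and 67 for the two open-ended categories, which have no midpoint. `age_midpoint` is a hypothetical helper, and the resulting average may differ slightly from the 32.8 years reported above.

```python
import math
import re

import pandas as pd

# Assumed representative ages for the two open-ended labels
OPEN_ENDED = {"Under 18 years old": 17.0, "65 years or older": 67.0}


def age_midpoint(label):
    """Return a numeric age for an age-range label, or NaN if none applies."""
    if label in OPEN_ENDED:
        return OPEN_ENDED[label]
    # Assumes closed ranges are written like "25-34 years old"
    match = re.fullmatch(r"(\d+)-(\d+) years old", str(label))
    if match:
        low, high = (int(g) for g in match.groups())
        return (low + high) / 2
    return math.nan  # "Prefer not to say" or a missing value


# Example on a few labels; on the real data: df["Age"].map(age_midpoint)
ages = pd.Series(["18-24 years old", "45-54 years old", "Prefer not to say", "65 years or older"])
ages_numeric = ages.map(age_midpoint)
print(ages_numeric.tolist())
print(round(ages_numeric.mean(), 1))
```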


The dataset comes from a worldwide survey.  
To gauge the global reach of the responses, we’ll count how many **unique countries** appear in the `Country` column.


In [48]:
df["Country"].nunique()

185
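`nunique()` ignores missing values by default, so respondents who left `Country` blank do not add an extra category; `dropna=False` counts them as one. To see which countries contribute the most responses, `value_counts()` sorts categories by frequency. The sketch below uses a made-up Series in place of `df["Country"]`.

```python
import pandas as pd

# Toy stand-in for df["Country"] (hypothetical values)
countries = pd.Series(["India", "Germany", "India", None, "Brazil", "Germany", "India"])

print(countries.nunique())               # distinct countries, missing values excluded
print(countries.nunique(dropna=False))   # missing values counted as one extra value
print(countries.value_counts().head(3))  # most common countries first
```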

---