# 1. Import Libraries and Load Data

I import pandas and use `urllib` to download the main survey dataset from IBM Cloud Object Storage, save it locally as `survey.csv`, and then load it into a pandas DataFrame.

In [1]:
import pandas as pd
import urllib.request

# Define the URL for the dataset
dataset_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m1_survey_data.csv"

# Download the file
filename = 'survey.csv'
urllib.request.urlretrieve(dataset_url, filename)

# Load the data into a DataFrame
df = pd.read_csv(filename)
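
Re-running the cell above downloads the file again every time. A minimal sketch of a cached loader follows, assuming it is acceptable to reuse a local copy when one already exists. The helper name `load_survey` is hypothetical and not part of the original notebook.

```python
from pathlib import Path
import urllib.request

import pandas as pd


def load_survey(url: str, filename: str = "survey.csv") -> pd.DataFrame:
    """Download the CSV only if no local copy exists, then load it.

    Hypothetical helper for illustration; the notebook itself downloads
    unconditionally.
    """
    path = Path(filename)
    if not path.exists():
        # Network call; may raise urllib.error.URLError if the host is unreachable
        urllib.request.urlretrieve(url, path)
    return pd.read_csv(path)
```

With this helper, `df = load_survey(dataset_url)` would replace the download and `read_csv` steps, and deleting `survey.csv` forces a fresh download.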

# 2. Inspect the Data

Displaying the first 5 rows to understand the dataset's structure. The `...` column in the output marks where pandas hides the middle columns of a wide table.

In [2]:
df.head()

(index),Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
0,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
1,9,I am a developer by profession,Yes,Once a month or more often,The quality of OSS and closed source software ...,Employed full-time,New Zealand,No,Some college/university study without earning ...,"Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,,23.0,Man,No,Bisexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
2,13,I am a developer by profession,Yes,Less than once a month but more than once per ...,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,United States,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)","Computer science, computer engineering, or sof...",...,Somewhat more welcome now than last year,Tech articles written by other developers;Cour...,28.0,Man,No,Straight / Heterosexual,White or of European descent,Yes,Appropriate in length,Easy
3,16,I am a developer by profession,Yes,Never,The quality of OSS and closed source software ...,Employed full-time,United Kingdom,No,"Master’s degree (MA, MS, M.Eng., MBA, etc.)",,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,26.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Neither easy nor difficult
4,17,I am a developer by profession,Yes,Less than once a month but more than once per ...,The quality of OSS and closed source software ...,Employed full-time,Australia,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,29.0,Man,No,Straight / Heterosexual,Hispanic or Latino/Latina;Multiracial,No,Appropriate in length,Easy
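
The `...` in the output above means `head()` is hiding columns. A small sketch of two ways to see everything, using a toy wide DataFrame as a stand-in for the survey data (the `col0` to `col29` names are hypothetical):

```python
import pandas as pd

# Toy stand-in for the survey DataFrame (hypothetical columns),
# wide enough that pandas would normally truncate it
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(30)})

# Temporarily lift the column limit so the rendered head shows every column
with pd.option_context("display.max_columns", None):
    text = repr(df.head())
print(text)

# The full list of column names is also available directly
column_names = df.columns.tolist()
print(column_names)
```

`option_context` restores the previous display setting when the `with` block ends, so the rest of the notebook keeps its compact output.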


# 3. Explore Data Dimensions

Checking the dimensions of the dataset (number of rows and columns).

In [3]:
print(f"The dataset has {df.shape[0]} rows.")
print(f"The dataset has {df.shape[1]} columns.")

The dataset has 11552 rows.
The dataset has 85 columns.
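
Since `df.shape` is a `(rows, columns)` tuple, it can also be unpacked in one step. A short sketch on a toy DataFrame (a three-row stand-in, not the real survey data):

```python
import pandas as pd

# Toy stand-in; the real df is loaded from survey.csv
df = pd.DataFrame({"Respondent": [4, 9, 13], "Age": [22.0, 23.0, 28.0]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(f"{n_rows:,} rows x {n_cols} columns, {df.size:,} cells in total")
```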


# 4. Check Data Types

Inspecting the data types of all columns to identify which are numeric and which are text (object).

In [4]:
df.dtypes

Column,dtype
Respondent,int64
MainBranch,object
Hobbyist,object
OpenSourcer,object
OpenSource,object
...,...
Sexuality,object
Ethnicity,object
Dependents,object
SurveyLength,object
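
With 85 columns, the truncated `dtypes` listing does not show every column. One way to summarize it, sketched here on a three-column toy DataFrame built from values in the `head()` output, is to count the dtypes and split the column names with `select_dtypes`:

```python
import pandas as pd

# Toy stand-in using a few values from the head() output above
df = pd.DataFrame({
    "Respondent": [4, 9, 13],
    "Country": ["United States", "New Zealand", "United States"],
    "Age": [22.0, 23.0, 28.0],
})

# Count how many columns share each dtype
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)

# Split column names into numeric and non-numeric groups
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Text:", text_cols)
```

Using `exclude="number"` rather than naming the `object` dtype keeps the split working even if a pandas version stores text in a dedicated string dtype.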


# 5. Initial Data Insights

Computing two quick summary statistics for a first look at the data: the mean age of participants and the number of unique countries represented in the survey.

In [5]:
print(f"The mean age of participants is: {df['Age'].mean()}")
print(f"The number of unique countries in the survey is: {df['Country'].nunique()}")

The mean age of participants is: 30.77239449133718
The number of unique countries in the survey is: 135
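
Both `mean()` and `nunique()` skip missing values by default, so it helps to know how many rows actually contribute to the figures above. A sketch on a toy DataFrame, where one missing age has been introduced deliberately for illustration (it is not taken from the survey):

```python
import pandas as pd

# Toy stand-in; the None in Age is a deliberately added missing value
df = pd.DataFrame({
    "Country": ["United States", "New Zealand", "United States", "United Kingdom"],
    "Age": [22.0, 23.0, 28.0, None],
})

missing_age = df["Age"].isna().sum()   # rows with no age recorded
mean_age = df["Age"].mean()            # NaN values are skipped by default
n_countries = df["Country"].nunique()  # nunique() also ignores NaN by default

print(f"Mean age: {mean_age:.1f} (computed over {df['Age'].count()} of {len(df)} rows)")
print(f"Rows with missing age: {missing_age}")
print(f"Unique countries: {n_countries}")
```

Formatting with `:.1f` also avoids printing the mean to fourteen decimal places.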
