# Initial Python Setup

In [None]:
import pandas as pd
import numpy as np
import requests as req

from zipfile import ZipFile
from io import BytesIO

# Setting Variables
Here we set a few variables (the download URL and the name of the CSV file inside the archive) to keep the import code tidy.

In [None]:
url = 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2022.zip'
file_name = 'survey_results_public.csv'

# Importing the Dataset

In [None]:
response = req.get(url)
response.raise_for_status()  # stop early if the download failed
zip_file = ZipFile(BytesIO(response.content))  # open the archive in memory
df = pd.read_csv(BytesIO(zip_file.read(file_name)))
display(df)
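
To try the in-memory unzip pattern without a network call, here is a minimal sketch that builds a small zip archive in memory and reads a CSV from it in the same way. The file name `example.csv` and its contents are made up for illustration and are not part of the survey data.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

# Build a tiny zip in memory to stand in for response.content (hypothetical data)
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("example.csv", "ResponseId,Country\n1,Canada\n2,Brazil\n")

# Same pattern as the import cell: bytes -> ZipFile -> read_csv
with ZipFile(BytesIO(buffer.getvalue())) as zf:
    toy_df = pd.read_csv(BytesIO(zf.read("example.csv")))

print(toy_df)
```

Nothing is written to disk at any point, which is why the import cell wraps the downloaded bytes in `BytesIO`.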

# Cleaning the Data

Before we can run regressions on the variables, we need to clean the data.

## Dummy Variables

Many of the variables in the data are multi-select lists, where the user could choose any, all, or none of the options. The selections are stored as a single semicolon-separated string, so to work with this data we need to make a dummy variable for each item in the list. We will use the following string method on each such column (it works on a column, not on the whole DataFrame):
```python
df["LanguageHaveWorkedWith"].str.get_dummies(";")
```
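
As a small illustration of what this method returns, the sketch below applies it to a toy Series. The values are made up and are not taken from the survey. Each distinct option becomes a 0/1 column, and a missing answer becomes a row of zeros.

```python
import numpy as np
import pandas as pd

# Toy multi-select answers (hypothetical, not survey data)
answers = pd.Series(["Python;SQL", "SQL", np.nan, "Python;Rust"])

# One 0/1 column per distinct option, prefixed so the source question stays visible
toy_dummies = answers.str.get_dummies(";").add_prefix("Language-")

print(toy_dummies)
```

Because a missing answer yields all zeros, a respondent who skipped the question is indistinguishable here from one who selected nothing, which may matter later when regressing on these columns.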

In [None]:
dummy_languages = df["LanguageHaveWorkedWith"].str.get_dummies(";").add_prefix("Language-")
dummy_devtype = df["DevType"].str.get_dummies(";").add_prefix("DevType-")
dummy_learncodeoffline = df["LearnCode"].str.get_dummies(";").add_prefix("LearnCode-")
dummy_learncodeonline = df["LearnCodeOnline"].str.get_dummies(";").add_prefix("LearnCodeOnline-")  # distinct prefix so the two sets cannot collide

df1 = pd.concat([df, dummy_languages, dummy_devtype, dummy_learncodeoffline, dummy_learncodeonline], axis = 1)
display(df1)

We keep the individual dummy DataFrames because their column lists let us select every dummy variable from one set at once, as in the cell below.

In [None]:
display(df1[dummy_learncodeoffline.columns])
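
One use of this pattern is summarizing a whole set of dummies in a single step. The sketch below uses a toy DataFrame, with hypothetical values, to count how often each option was selected. The names `toy`, `toy_dummies`, `toy1`, and `counts` are made up for the example.

```python
import pandas as pd

# Toy stand-in for the survey data (hypothetical values)
toy = pd.DataFrame({"LearnCode": ["Books;School", "School", "Books;Colleague"]})
toy_dummies = toy["LearnCode"].str.get_dummies(";").add_prefix("LearnCode-")
toy1 = pd.concat([toy, toy_dummies], axis=1)

# Select the whole dummy set by its column list, then count selections per option
counts = toy1[toy_dummies.columns].sum().sort_values(ascending=False)

print(counts)
```

The same column list could be passed to a regression as the set of predictors, without retyping each dummy name.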