# Introduction
Every year, Stack Overflow surveys its users about the tools they use, their programming languages, their salaries, and much more. The results are a useful source of data on the technologies that lead worldwide, and that's what we'll explore in this project! This first notebook is dedicated solely to data cleaning 🫡



In [5]:
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)
pd.options.display.float_format = "{:,.2f}".format

In [6]:
df_schema = pd.read_csv("../../data/raw/survey_results_schema.csv")
df_schema.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   qid         78 non-null     object
 1   qname       78 non-null     object
 2   question    78 non-null     object
 3   force_resp  67 non-null     object
 4   type        78 non-null     object
 5   selector    78 non-null     object
dtypes: object(6)
memory usage: 3.8+ KB
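
The schema is handy for checking what a column actually asks. Below is a minimal sketch of a lookup helper. The name `question_for` and the toy rows are made up for illustration, and it assumes the `qname` value matches the column name, which may not hold for every column in the real file.

```python
import pandas as pd

# Toy stand-in for survey_results_schema.csv; the rows are illustrative, not real data.
schema_demo = pd.DataFrame(
    {
        "qname": ["Age", "YearsCode"],
        "question": ["What is your age?", "How many years have you been coding?"],
    }
)


def question_for(schema: pd.DataFrame, qname: str) -> str:
    """Return the question text stored for `qname`, or raise KeyError if absent."""
    matches = schema.loc[schema["qname"] == qname, "question"]
    if matches.empty:
        raise KeyError(f"no schema row with qname={qname!r}")
    return matches.iloc[0]


print(question_for(schema_demo, "Age"))
```

With the real data, the call would presumably be `question_for(df_schema, "Age")`.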


In [7]:
df = pd.read_csv("../../data/raw/survey_results_public.csv", index_col="ResponseId")
df.head()

ResponseId,Q120,MainBranch,Age,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,OrgSize,PurchaseInfluence,TechList,BuyNewTool,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,SOAI,AISelect,AISent,AIAcc,AIBen,AIToolInterested in Using,AIToolCurrently Using,AIToolNot interested in Using,AINextVery different,AINextNeither different nor similar,AINextSomewhat similar,AINextVery similar,AINextSomewhat different,TBranch,ICorPM,WorkExp,Knowledge_1,Knowledge_2,Knowledge_3,Knowledge_4,Knowledge_5,Knowledge_6,Knowledge_7,Knowledge_8,Frequency_1,Frequency_2,Frequency_3,TimeSearching,TimeAnswering,ProfessionalTech,Industry,SurveyLength,SurveyEase,ConvertedCompYearly
1,I agree,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,I agree,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,Hobby;Contribute to open-source projects;Boots...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Friend or fam...,Formal documentation provided by the owner of ...,Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,I have a great deal of influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per month or weekly,"Yes, definitely","I don't think it's super necessary, but I thin...",Yes,Indifferent,Other (please explain),Somewhat distrust,Learning about a codebase;Writing code;Debuggi...,Writing code;Committing and reviewing code,,,,,,,Yes,People manager,10.0,Strongly agree,Agree,Strongly agree,Agree,Agree,Agree,Agree,Strongly agree,1-2 times a week,10+ times a week,Never,15-30 minutes a day,15-30 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,285000.0
3,I agree,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Professional development or self-paced l...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Formal documentation provided by the owner of ...,,27.0,23.0,"Developer, back-end","5,000 to 9,999 employees",I have some influence,Given a list,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,250000.0,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,Markdown File;Stack Overflow for Teams,Markdown File,Microsoft Teams;Slack;Zoom,Slack;Zoom,,,,,Stack Overflow;Stack Exchange;Stack Overflow f...,A few times per month or weekly,Yes,Less than once per month or monthly,Neutral,,"No, and I don't plan to",,,,,,,,,,,,Yes,Individual contributor,23.0,Strongly agree,Neither agree nor disagree,Agree,Agree,Agree,Agree,Agree,Agree,6-10 times a week,6-10 times a week,3-5 times a week,30-60 minutes a day,30-60 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,250000.0
4,I agree,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Colleague;Friend or family member;Other online...,Formal documentation provided by the owner of ...,,12.0,7.0,"Developer, front-end",100 to 499 employees,I have some influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,156000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Jira,Jira,Discord;Google Meet;Microsoft Teams;Slack;Zoom,Discord;Google Meet;Slack;Zoom,,,,,Stack Overflow;Stack Exchange,A few times per week,Yes,Less than once per month or monthly,"No, not really",I'm wearing of Stack Overflow using AI.,"No, and I don't plan to",,,,,,,,,,,,Yes,Individual contributor,7.0,Strongly agree,Strongly disagree,Strongly agree,Strongly agree,Agree,Neither agree nor disagree,Agree,Agree,1-2 times a week,10+ times a week,1-2 times a week,15-30 minutes a day,30-60 minutes a day,Automated testing;Continuous integration (CI) ...,,Appropriate in length,Easy,156000.0
5,I agree,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,Hobby;Contribute to open-source projects;Profe...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Online Courses or Certi...,Formal documentation provided by the owner of ...,Other;Codecademy;edX,6.0,4.0,"Developer, full-stack",20 to 99 employees,I have some influence,Investigate,Start a free trial;Ask developers I know/work ...,Philippines,PHP\tPhilippine peso,1320000.0,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,Confluence;Jira;Notion,Confluence;Jira;Notion,Discord;Google Meet;Slack;Zoom,Discord;Google Meet;Slack;Zoom,ChatGPT,ChatGPT,,,Stack Overflow;Stack Exchange,A few times per week,No,,Neutral,Using AI to suggest better answer to my questi...,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,Project planning;Testing code;Committing and r...,Learning about a codebase;Writing code;Documen...,,,,,,,Yes,Individual contributor,6.0,Agree,Strongly agree,Agree,Agree,Neither agree nor disagree,Agree,Strongly agree,Agree,1-2 times a week,1-2 times a week,3-5 times a week,60-120 minutes a day,30-60 minutes a day,Microservices;Automated testing;Observability ...,Other,Appropriate in length,Neither easy nor difficult,23456.0


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 89184 entries, 1 to 89184
Data columns (total 83 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Q120                                 89184 non-null  object 
 1   MainBranch                           89184 non-null  object 
 2   Age                                  89184 non-null  object 
 3   Employment                           87898 non-null  object 
 4   RemoteWork                           73810 non-null  object 
 5   CodingActivities                     73764 non-null  object 
 6   EdLevel                              87973 non-null  object 
 7   LearnCode                            87663 non-null  object 
 8   LearnCodeOnline                      70084 non-null  object 
 9   LearnCodeCoursesCert                 37076 non-null  object 
 10  YearsCode                            87435 non-null  object 
 11  YearsCodePro                     

In [9]:
df.columns

interest_df = df[
    [
        "MainBranch", "Age", "Employment", "RemoteWork", "EdLevel", "LearnCodeCoursesCert", "YearsCode",
        "YearsCodePro", "DevType", "BuyNewTool", "Country", "LanguageHaveWorkedWith", "LanguageWantToWorkWith",
        "DatabaseHaveWorkedWith", "DatabaseWantToWorkWith", "PlatformHaveWorkedWith", "PlatformWantToWorkWith",
        "WebframeHaveWorkedWith", "WebframeWantToWorkWith", "MiscTechHaveWorkedWith", "MiscTechWantToWorkWith",
        "ToolsTechHaveWorkedWith", "ToolsTechWantToWorkWith", "NEWCollabToolsHaveWorkedWith",
        "NEWCollabToolsWantToWorkWith", "OpSysPersonal use", "OpSysProfessional use", "AISearchHaveWorkedWith",
        "AISearchWantToWorkWith", "AIDevHaveWorkedWith", "AIDevWantToWorkWith", "AISelect", "AISent", "AIAcc",
        "AIBen", "AIToolInterested in Using", "AIToolCurrently Using", "AIToolNot interested in Using",
        "ProfessionalTech", "Industry", "ConvertedCompYearly","WorkExp",
    ]
].copy()

Perfect, I've finished selecting the columns I'm interested in!

In [10]:
interest_df.head()

ResponseId,MainBranch,Age,Employment,RemoteWork,EdLevel,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,BuyNewTool,Country,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith,AISelect,AISent,AIAcc,AIBen,AIToolInterested in Using,AIToolCurrently Using,AIToolNot interested in Using,ProfessionalTech,Industry,ConvertedCompYearly,WorkExp
1,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",Start a free trial;Ask developers I know/work ...,United States of America,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Yes,Indifferent,Other (please explain),Somewhat distrust,Learning about a codebase;Writing code;Debuggi...,Writing code;Committing and reviewing code,,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",285000.0,10.0
3,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,27.0,23.0,"Developer, back-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,,,,,"No, and I don't plan to",,,,,,,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",250000.0,23.0
4,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,12.0,7.0,"Developer, front-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,,,,,"No, and I don't plan to",,,,,,,Automated testing;Continuous integration (CI) ...,,156000.0,7.0
5,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other;Codecademy;edX,6.0,4.0,"Developer, full-stack",Start a free trial;Ask developers I know/work ...,Philippines,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,ChatGPT,ChatGPT,,,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,Project planning;Testing code;Committing and r...,Learning about a codebase;Writing code;Documen...,,Microservices;Automated testing;Observability ...,Other,23456.0,6.0


In [11]:
interest_df = interest_df.rename(
    columns={
        "LearnCodeCoursesCert": "CoursesCert",
        "DevType": "CurrentJob",
        "LanguageHaveWorkedWith": "WorkedLanguage",
        "LanguageWantToWorkWith": "TargetLanguage",
        "DatabaseHaveWorkedWith": "WorkedDatabase",
        "DatabaseWantToWorkWith": "TargetDatabase",
        "PlatformHaveWorkedWith": "WorkedPlatform",
        "PlatformWantToWorkWith": "TargetPlatform",
        "WebframeHaveWorkedWith": "WorkedFramework",
        "WebframeWantToWorkWith": "TargetFramework",
        "MiscTechHaveWorkedWith": "WorkedLibraries",
        "MiscTechWantToWorkWith": "TargetLibraries",
        "ToolsTechHaveWorkedWith": "WorkedTools",
        "ToolsTechWantToWorkWith": "TargetTools",
        "NEWCollabToolsHaveWorkedWith": "WorkedCollabTools",
        "NEWCollabToolsWantToWorkWith": "TargetCollabTools",
        "OpSysPersonal use": "OSPersonal",
        "OpSysProfessional use": "OSProffesional",
        "AISearchHaveWorkedWith": "WorkedAISearch",
        "AISearchWantToWorkWith": "TargetAISearch",
        "AIDevHaveWorkedWith": "WorkedAIDeveloper",
        "AIDevWantToWorkWith": "TargetAIDeveloper",
        "AISelect": "z_UsingAI",
        "AISent": "z_FavorableAI",
        "AIAcc": "z_BenefitsAI",
        "AIBen": "z_TrustAI",
        "ConvertedCompYearly": "YearlySalary",
        "WorkExp": "WorkingYears",
    }
)
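
With a mapping this long, a typo in one of the keys would be silently ignored, because `rename` skips keys that don't match a column by default. As a hedged sketch on a toy frame (the two-column `demo` is invented for illustration), passing `errors="raise"` makes pandas complain instead:

```python
import pandas as pd

# Toy frame with two of the survey's column names; the values are illustrative.
demo = pd.DataFrame({"DevType": ["Developer, back-end"], "WorkExp": [23.0]})

renamed = demo.rename(
    columns={"DevType": "CurrentJob", "WorkExp": "WorkingYears"},
    errors="raise",  # fail if a key is not an existing column
)
print(list(renamed.columns))

# A misspelled key ("DevTyp") now raises instead of being skipped.
try:
    demo.rename(columns={"DevTyp": "CurrentJob"}, errors="raise")
except KeyError as exc:
    print("rename rejected:", exc)
```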

In [12]:
pd.DataFrame(interest_df.isna().sum(), columns=['isnull'])

Column,isnull
MainBranch,0
Age,0
Employment,1286
RemoteWork,15374
EdLevel,1211
CoursesCert,52108
YearsCode,1749
YearsCodePro,23048
CurrentJob,12312
BuyNewTool,6175
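
Raw counts are easier to judge next to the share of rows they represent. Here is a small sketch on a made-up frame (the `demo` data and the `pct_missing` column name are my own assumptions) that adds the percentage of missing values and sorts by it:

```python
import numpy as np
import pandas as pd

# Toy frame; the values are illustrative, not taken from the survey.
demo = pd.DataFrame(
    {
        "Age": ["18-24 years old", "25-34 years old", "45-54 years old", "25-34 years old"],
        "RemoteWork": [np.nan, "Remote", "Hybrid", np.nan],
        "CoursesCert": [np.nan, "Other", np.nan, np.nan],
    }
)

missing = pd.DataFrame(
    {
        "isnull": demo.isna().sum(),              # number of missing values per column
        "pct_missing": demo.isna().mean() * 100,  # same information as a percentage
    }
).sort_values("pct_missing", ascending=False)
print(missing)
```

The same expression should work on `interest_df` by swapping it in for `demo`.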


In [13]:
interest_df.columns

Index(['MainBranch', 'Age', 'Employment', 'RemoteWork', 'EdLevel',
       'CoursesCert', 'YearsCode', 'YearsCodePro', 'CurrentJob', 'BuyNewTool',
       'Country', 'WorkedLanguage', 'TargetLanguage', 'WorkedDatabase',
       'TargetDatabase', 'WorkedPlatform', 'TargetPlatform', 'WorkedFramework',
       'TargetFramework', 'WorkedLibraries', 'TargetLibraries', 'WorkedTools',
       'TargetTools', 'WorkedCollabTools', 'TargetCollabTools', 'OSPersonal',
       'OSProffesional', 'WorkedAISearch', 'TargetAISearch',
       'WorkedAIDeveloper', 'TargetAIDeveloper', 'z_UsingAI', 'z_FavorableAI',
       'z_BenefitsAI', 'z_TrustAI', 'AIToolInterested in Using',
       'AIToolCurrently Using', 'AIToolNot interested in Using',
       'ProfessionalTech', 'Industry', 'YearlySalary', 'WorkingYears'],
      dtype='object')

In [14]:
interest_df['AIToolCurrently Using'].value_counts()

AIToolCurrently Using
Writing code                                                                                                            6459
Writing code;Debugging and getting help                                                                                 3741
Writing code;Documenting code                                                                                           2363
Learning about a codebase;Writing code;Debugging and getting help                                                       1673
Writing code;Documenting code;Debugging and getting help                                                                1528
                                                                                                                        ... 
Learning about a codebase;Project planning;Documenting code;Debugging and getting help;Collaborating with teammates        1
Project planning;Documenting code;Debugging and getting help;Testing code;Deployment and monitoring    

In [15]:
interest_df['AIToolNot interested in Using'].value_counts()

AIToolNot interested in Using
Collaborating with teammates                                                                                                                                                          2087
Project planning                                                                                                                                                                      1397
Project planning;Collaborating with teammates                                                                                                                                         1297
Deployment and monitoring;Collaborating with teammates                                                                                                                                1039
Project planning;Deployment and monitoring;Collaborating with teammates                                                                                                                986
                                   

In [16]:
interest_df['AIToolInterested in Using'].value_counts()

AIToolInterested in Using
Learning about a codebase;Project planning;Writing code;Documenting code;Debugging and getting help;Testing code;Committing and reviewing code;Deployment and monitoring;Collaborating with teammates     1252
Learning about a codebase;Project planning;Documenting code;Debugging and getting help;Testing code;Committing and reviewing code;Deployment and monitoring;Collaborating with teammates                   742
Learning about a codebase                                                                                                                                                                                  624
Documenting code                                                                                                                                                                                           610
Learning about a codebase;Project planning;Documenting code;Testing code;Committing and reviewing code;Deployment and monitoring;Collaborating wit

In [17]:
interest_df.loc[interest_df['CurrentJob'].str.contains('Data scientist', na=False)]['AIToolCurrently Using'].value_counts()

AIToolCurrently Using
Writing code                                                                                                                                                            127
Writing code;Debugging and getting help                                                                                                                                  85
Writing code;Documenting code                                                                                                                                            82
Writing code;Documenting code;Debugging and getting help                                                                                                                 45
Learning about a codebase;Writing code;Debugging and getting help                                                                                                        40
                                                                                                                      

In [18]:
interest_df.loc[interest_df['CurrentJob'].str.contains('Data scientist', na=False)]['AIToolInterested in Using'].value_counts()

AIToolInterested in Using
Learning about a codebase;Project planning;Writing code;Documenting code;Debugging and getting help;Testing code;Committing and reviewing code;Deployment and monitoring;Collaborating with teammates     32
Learning about a codebase                                                                                                                                                                                 19
Documenting code                                                                                                                                                                                          15
Learning about a codebase;Project planning;Debugging and getting help;Testing code;Committing and reviewing code;Deployment and monitoring;Collaborating with teammates                                   15
Learning about a codebase;Project planning;Documenting code;Debugging and getting help;Testing code;Committing and reviewing code;Deployment and monitorin

I don't think it matters much which parts of the development workflow people use AI for. The most common responses, including those from data scientists, involve using AI to write code, and many respondents are interested in using it to learn about a codebase.
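
The counts above are per combination of answers, which makes it hard to see how often a single option was picked. A possible way to count individual options, sketched on a toy Series (the `answers` data is invented, and it assumes options are always separated by `;`):

```python
import numpy as np
import pandas as pd

# Toy answers in the survey's semicolon-separated format; illustrative only.
answers = pd.Series(
    [
        "Writing code",
        "Writing code;Debugging and getting help",
        "Learning about a codebase;Writing code",
        np.nan,
    ],
    name="AIToolCurrently Using",
)

option_counts = (
    answers.dropna()      # ignore respondents who skipped the question
    .str.split(";")       # one list of options per respondent
    .explode()            # one row per selected option
    .value_counts()
)
print(option_counts)
```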
# What's a codebase?


- ChatGPT:  
A codebase refers to the entire collection of source code, configuration files, dependencies, and any other relevant files that make up a software project. It encompasses all the code and related resources that are used to develop and maintain an application or software system.

A codebase typically includes:

1. Source code files: These are the actual programming files written in languages like Python, Java, JavaScript, C++, etc., that define the behavior and functionality of the software.

2. Configuration files: These files contain settings and configurations for the application, such as database connection details, environment variables, API keys, and other parameters that affect its behavior.

3. Build scripts and files: These are files responsible for building, compiling, and packaging the code into a runnable or deployable form. Examples include Makefiles, Gradle scripts, or build configurations for tools like Maven, npm, or webpack.

4. Dependency files: These files specify the external libraries, modules, or packages that the software depends on. They typically include package.json, requirements.txt, or Maven's pom.xml, which list the dependencies and their versions.

5. Documentation: This includes any relevant documentation files, such as README.md, API documentation, or user guides, which provide instructions and information about the codebase and its usage.

6. Tests: Test files and directories containing unit tests, integration tests, or other forms of automated tests that ensure the code behaves as expected and meets the defined requirements.

The codebase is essential for collaborative development, version control, and maintenance of the software. It provides a complete representation of the software project and allows developers to work together, track changes, and manage the code effectively.

Version control systems like Git are commonly used to manage the codebase, enabling collaboration, history tracking, branching, and merging, among other features.

#### Why would data scientists want to learn about a project's entire codebase?

***Anyway, for my purposes, those columns are not important.***

In [19]:
interest_df = interest_df.drop(['AIToolInterested in Using','AIToolCurrently Using', 'AIToolNot interested in Using'], axis=1)

In [20]:
interest_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 89184 entries, 1 to 89184
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   MainBranch         89184 non-null  object 
 1   Age                89184 non-null  object 
 2   Employment         87898 non-null  object 
 3   RemoteWork         73810 non-null  object 
 4   EdLevel            87973 non-null  object 
 5   CoursesCert        37076 non-null  object 
 6   YearsCode          87435 non-null  object 
 7   YearsCodePro       66136 non-null  object 
 8   CurrentJob         76872 non-null  object 
 9   BuyNewTool         83009 non-null  object 
 10  Country            87973 non-null  object 
 11  WorkedLanguage     87140 non-null  object 
 12  TargetLanguage     80709 non-null  object 
 13  WorkedDatabase     73435 non-null  object 
 14  TargetDatabase     60911 non-null  object 
 15  WorkedPlatform     63628 non-null  object 
 16  TargetPlatform     51308 no

Why is the `YearsCode` column stored as an object?

In [21]:
interest_df['YearsCode'].unique()

array([nan, '18', '27', '12', '6', '21', '4', '5', '20', '14', '10', '15',
       '11', '3', '24', '8', '13', 'Less than 1 year', '16', '33', '22',
       '30', '32', '7', '35', '28', '40', '17', '29', '19',
       'More than 50 years', '9', '38', '26', '34', '25', '2', '45', '23',
       '31', '43', '1', '48', '41', '50', '39', '42', '37', '36', '44',
       '46', '49', '47'], dtype=object)
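
Instead of scanning the array by eye, the values that block a numeric conversion can be isolated directly. A minimal sketch on a toy Series (the `demo` values are illustrative): anything that `pd.to_numeric` turns into NaN but that wasn't NaN to begin with is a non-numeric label.

```python
import numpy as np
import pandas as pd

# Toy column mixing numeric strings, text labels, and a missing value.
demo = pd.Series(["18", "Less than 1 year", np.nan, "More than 50 years", "6"])

parsed = pd.to_numeric(demo, errors="coerce")  # unparseable strings become NaN
non_numeric = demo[parsed.isna() & demo.notna()].unique()
print(non_numeric)
```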

That's because of the strings 'Less than 1 year' and 'More than 50 years'. I expect `YearsCodePro` to have the same issue. I'm going to recode those two values as numbers, just for fun.

In [22]:
interest_df['YearsCode']

ResponseId
1        NaN
2         18
3         27
4         12
5          6
        ... 
89180     20
89181      5
89182     10
89183      3
89184     17
Name: YearsCode, Length: 89184, dtype: object

In [23]:
# Recode the two text labels as numbers so the column can later be parsed as numeric.
interest_df['YearsCode'] = interest_df['YearsCode'].replace({"More than 50 years": 50, "Less than 1 year": 0})

In [24]:
# Apply the same recoding to YearsCodePro.
interest_df['YearsCodePro'] = interest_df['YearsCodePro'].replace({"More than 50 years": 50, "Less than 1 year": 0})

In [25]:
interest_df['YearsCode'] = pd.to_numeric(interest_df['YearsCode'], errors='coerce')
interest_df['YearsCodePro'] = pd.to_numeric(interest_df['YearsCodePro'], errors='coerce')
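
The recoding and conversion in the cells above could also be wrapped in one helper so both columns go through the same steps. This is only a sketch: `years_to_numeric` and `YEARS_LABELS` are hypothetical names, and mapping 'More than 50 years' to 50 is the same simplification used above, so it can't be told apart from a literal 50 afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical names; the label-to-number mapping mirrors the cells above.
YEARS_LABELS = {"Less than 1 year": 0, "More than 50 years": 50}


def years_to_numeric(column: pd.Series) -> pd.Series:
    """Map the two text labels to numbers, then parse everything as numeric."""
    return pd.to_numeric(column.replace(YEARS_LABELS), errors="coerce")


# Toy column; the values are illustrative.
demo = pd.Series(["18", "Less than 1 year", np.nan, "More than 50 years"])
print(years_to_numeric(demo).tolist())
```

On the real data this would presumably be applied as `interest_df[col] = years_to_numeric(interest_df[col])` for each of the two columns.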

In [26]:
interest_df.head()

ResponseId,MainBranch,Age,Employment,RemoteWork,EdLevel,CoursesCert,YearsCode,YearsCodePro,CurrentJob,BuyNewTool,Country,WorkedLanguage,TargetLanguage,WorkedDatabase,TargetDatabase,WorkedPlatform,TargetPlatform,WorkedFramework,TargetFramework,WorkedLibraries,TargetLibraries,WorkedTools,TargetTools,WorkedCollabTools,TargetCollabTools,OSPersonal,OSProffesional,WorkedAISearch,TargetAISearch,WorkedAIDeveloper,TargetAIDeveloper,z_UsingAI,z_FavorableAI,z_BenefitsAI,z_TrustAI,ProfessionalTech,Industry,YearlySalary,WorkingYears
1,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",Start a free trial;Ask developers I know/work ...,United States of America,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Yes,Indifferent,Other (please explain),Somewhat distrust,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",285000.0,10.0
3,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,27.0,23.0,"Developer, back-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,,,,,"No, and I don't plan to",,,,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",250000.0,23.0
4,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,12.0,7.0,"Developer, front-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,,,,,"No, and I don't plan to",,,,Automated testing;Continuous integration (CI) ...,,156000.0,7.0
5,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other;Codecademy;edX,6.0,4.0,"Developer, full-stack",Start a free trial;Ask developers I know/work ...,Philippines,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,ChatGPT,ChatGPT,,,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,Microservices;Automated testing;Observability ...,Other,23456.0,6.0


In [27]:
interest_df.drop_duplicates().value_counts().sum()

1168

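This is not the number of duplicates: `DataFrame.value_counts()` skips rows containing `NaN` by default, so 1168 is how many de-duplicated rows have every column filled in. The duplicated rows themselves (89184 - 87981 = 1203, comparing the row count before and after the next cell) are probably the people who didn't fill in the survey, so I'll drop them.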
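
A toy comparison of two ways of counting (hypothetical values, not survey data): `duplicated().sum()` counts repeated rows, while chaining `value_counts()` after `drop_duplicates()` counts unique rows without any missing value.

```python
import numpy as np
import pandas as pd

# Toy frame: rows 0 and 1 are identical, rows 2 and 3 are identical and
# entirely empty, row 4 is unique but contains a NaN.
toy = pd.DataFrame(
    {
        "Age": ["25-34", "25-34", np.nan, np.nan, "18-24"],
        "YearsCode": [5.0, 5.0, np.nan, np.nan, np.nan],
    }
)

# Rows that repeat an earlier row (the first occurrence is not flagged).
n_duplicates = int(toy.duplicated().sum())

# Unique rows with no missing value: value_counts() drops rows containing NaN.
n_complete_unique = int(toy.drop_duplicates().value_counts().sum())

print(n_duplicates, n_complete_unique)
```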

In [28]:
interest_df = interest_df.drop_duplicates()

In [29]:
interest_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 87981 entries, 1 to 89184
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   MainBranch         87981 non-null  object 
 1   Age                87981 non-null  object 
 2   Employment         87898 non-null  object 
 3   RemoteWork         73810 non-null  object 
 4   EdLevel            87973 non-null  object 
 5   CoursesCert        37076 non-null  object 
 6   YearsCode          87435 non-null  float64
 7   YearsCodePro       66136 non-null  float64
 8   CurrentJob         76872 non-null  object 
 9   BuyNewTool         83009 non-null  object 
 10  Country            87973 non-null  object 
 11  WorkedLanguage     87140 non-null  object 
 12  TargetLanguage     80709 non-null  object 
 13  WorkedDatabase     73435 non-null  object 
 14  TargetDatabase     60911 non-null  object 
 15  WorkedPlatform     63628 non-null  object 
 16  TargetPlatform     51308 no

In [30]:
interest_df = interest_df.reset_index(drop=True).copy()

In [31]:
# interest_df = interest_df.sort_index()

In [32]:
len(interest_df)

87981

In [33]:
interest_df

Unnamed: 0,MainBranch,Age,Employment,RemoteWork,EdLevel,CoursesCert,YearsCode,YearsCodePro,CurrentJob,BuyNewTool,Country,WorkedLanguage,TargetLanguage,WorkedDatabase,TargetDatabase,WorkedPlatform,TargetPlatform,WorkedFramework,TargetFramework,WorkedLibraries,TargetLibraries,WorkedTools,TargetTools,WorkedCollabTools,TargetCollabTools,OSPersonal,OSProffesional,WorkedAISearch,TargetAISearch,WorkedAIDeveloper,TargetAIDeveloper,z_UsingAI,z_FavorableAI,z_BenefitsAI,z_TrustAI,ProfessionalTech,Industry,YearlySalary,WorkingYears
0,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other,18.00,9.00,"Senior Executive (C-Suite, VP, etc.)",Start a free trial;Ask developers I know/work ...,United States of America,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Yes,Indifferent,Other (please explain),Somewhat distrust,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",285000.00,10.00
2,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,27.00,23.00,"Developer, back-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,,,,,"No, and I don't plan to",,,,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",250000.00,23.00
3,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,12.00,7.00,"Developer, front-end",Start a free trial;Ask developers I know/work ...,United States of America,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,,,,,"No, and I don't plan to",,,,Automated testing;Continuous integration (CI) ...,,156000.00,7.00
4,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Other;Codecademy;edX,6.00,4.00,"Developer, full-stack",Start a free trial;Ask developers I know/work ...,Philippines,HTML/CSS;JavaScript;TypeScript,HTML/CSS;JavaScript;Python;Rust;TypeScript,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Gatsby;NestJS;Next.js;Node.js;React,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,,,Docker;npm;Webpack;Yarn,Docker;npm;Yarn,Vim;Visual Studio Code,Vim;Visual Studio Code,Other (Please Specify):,Other (Please Specify):,ChatGPT,ChatGPT,,,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,Microservices;Automated testing;Observability ...,Other,23456.00,6.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87976,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,"Associate degree (A.A., A.S., etc.)",Udemy,20.00,5.00,"Developer, front-end",,Brazil,HTML/CSS;Java;JavaScript;SQL;TypeScript,Bash/Shell (all shells);C;Go;HTML/CSS;JavaScri...,MongoDB;MySQL;PostgreSQL;SQLite,PostgreSQL;SQLite,Digital Ocean;Firebase;Google Cloud;Heroku;Vercel,Amazon Web Services (AWS);Heroku;Netlify;Vercel,Angular;AngularJS;Express;jQuery;Node.js;Nuxt....,Express;Node.js;Nuxt.js;Svelte;Vue.js,Flutter,,Chocolatey;CMake;Docker;Maven (build tool);npm...,Docker;npm;Pip;Vite;Yarn,Android Studio;Atom;Eclipse;IntelliJ IDEA;Netb...,Visual Studio Code,Android;Windows,Android;Ubuntu;Windows,ChatGPT,ChatGPT,Whispr AI,Whispr AI,Yes,Very favorable,Increase productivity;Greater efficiency;Speed...,Somewhat trust,,,,
87977,I am a developer by profession,18-24 years old,"Student, full-time;Employed, part-time","Hybrid (some remote, some in-person)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)",,5.00,,"Developer, back-end",Start a free trial;Ask developers I know/work ...,Romania,Dart;Java;Python;SQL,Java;JavaScript;Python;SQL;TypeScript,PostgreSQL,PostgreSQL,,,Spring Boot,Spring Boot,Flutter;Opencv;RabbitMQ;Spring Framework;Torch...,RabbitMQ;Spring Framework;Torch/PyTorch,Maven (build tool);Pip,Maven (build tool);Pip,Android Studio;IntelliJ IDEA;PyCharm,IntelliJ IDEA;PyCharm,Windows,Windows,ChatGPT,ChatGPT,,,Yes,Very favorable,Speed up learning,Somewhat trust,,,,
87978,I code primarily as a hobby,Prefer not to say,I prefer not to say,,Something else,Codecademy;Coursera,10.00,,,,Israel,Assembly;Bash/Shell (all shells);C;C#;Python;R...,Python;Rust,SQLite,,Amazon Web Services (AWS),Amazon Web Services (AWS);Microsoft Azure,,,NumPy;Pandas;TensorFlow,NumPy;Pandas;Torch/PyTorch,Cargo,Cargo;Docker;Kubernetes;Terraform,Jupyter Notebook/JupyterLab;Neovim;Visual Stud...,Jupyter Notebook/JupyterLab;Neovim;Visual Stud...,Windows,,ChatGPT;Quora Poe,ChatGPT;Quora Poe,,,"No, but I plan to soon",Very favorable,,Highly trust,,,,
87979,I am a developer by profession,Under 18 years old,"Employed, part-time;Student, part-time","Hybrid (some remote, some in-person)","Secondary school (e.g. American high school, G...",Udemy,3.00,,System administrator,Ask developers I know/work with;Visit develope...,Switzerland,Bash/Shell (all shells);C#;HTML/CSS;Java;JavaS...,Bash/Shell (all shells);HTML/CSS;JavaScript;Po...,MariaDB;Microsoft SQL Server;MongoDB;MySQL;Red...,Cassandra;Cosmos DB;Dynamodb;MariaDB;Microsoft...,Amazon Web Services (AWS);Cloudflare;Google Cl...,Amazon Web Services (AWS);Cloudflare;Digital O...,Express;Next.js;Node.js;React;WordPress,Angular;AngularJS;Express;Next.js;Node.js;Reac...,CUDA;NumPy;Torch/PyTorch,CUDA;Flutter;NumPy;TensorFlow;Torch/PyTorch,Docker;Kubernetes;npm;Podman;Vite,Ansible;APT;Docker;Kubernetes;npm;Pip;Podman;T...,Eclipse;IntelliJ IDEA;Nano;Notepad++;PyCharm;V...,Jupyter Notebook/JupyterLab;Notepad++;Visual S...,Arch;Debian;iOS;iPadOS;MacOS;Ubuntu;Windows;Wi...,BSD;Red Hat;Solaris;Ubuntu;Windows,Bing AI;ChatGPT;WolframAlpha,Bing AI;ChatGPT;Google Bard AI,,,Yes,Favorable,Increase productivity;Greater efficiency;Speed...,Somewhat distrust,,,,


In [34]:
interest_df = interest_df.sort_index(axis='columns')

In [35]:
interest_df.head()

Unnamed: 0,Age,BuyNewTool,Country,CoursesCert,CurrentJob,EdLevel,Employment,Industry,MainBranch,OSPersonal,OSProffesional,ProfessionalTech,RemoteWork,TargetAIDeveloper,TargetAISearch,TargetCollabTools,TargetDatabase,TargetFramework,TargetLanguage,TargetLibraries,TargetPlatform,TargetTools,WorkedAIDeveloper,WorkedAISearch,WorkedCollabTools,WorkedDatabase,WorkedFramework,WorkedLanguage,WorkedLibraries,WorkedPlatform,WorkedTools,WorkingYears,YearlySalary,YearsCode,YearsCodePro,z_BenefitsAI,z_FavorableAI,z_TrustAI,z_UsingAI
0,18-24 years old,,,,,,,,None of these,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,25-34 years old,Start a free trial;Ask developers I know/work ...,United States of America,Other,"Senior Executive (C-Suite, VP, etc.)","Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Employed, full-time","Information Services, IT, Software Development...",I am a developer by profession,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),DevOps function;Microservices;Automated testin...,Remote,GitHub Copilot,ChatGPT;Neeva AI,Vim;Visual Studio Code,Firebase Realtime Database;Supabase,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Fly.io;Netlify;Render,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,GitHub Copilot,ChatGPT,Vim;Visual Studio Code,Supabase,Next.js;React;Remix;Vue.js,HTML/CSS;JavaScript;Python,Electron;React Native;Tauri,Amazon Web Services (AWS);Netlify;Vercel,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,10.0,285000.0,18.0,9.0,Other (please explain),Indifferent,Somewhat distrust,Yes
2,45-54 years old,Start a free trial;Ask developers I know/work ...,United States of America,,"Developer, back-end","Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Employed, full-time","Information Services, IT, Software Development...",I am a developer by profession,MacOS;Other Linux-based,MacOS;Other Linux-based,DevOps function;Microservices;Automated testin...,"Hybrid (some remote, some in-person)",,,Emacs;Helix,,,Haskell;OCaml;Rust,,,Cargo;Kubernetes;Nix,,,Emacs;Helix,,,Bash/Shell (all shells);Go,,Amazon Web Services (AWS);Google Cloud;OpenSta...,Cargo;Docker;Kubernetes;Make;Nix,23.0,250000.0,27.0,23.0,,,,"No, and I don't plan to"
3,25-34 years old,Start a free trial;Ask developers I know/work ...,United States of America,,"Developer, front-end","Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Employed, full-time",,I am a developer by profession,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Automated testing;Continuous integration (CI) ...,"Hybrid (some remote, some in-person)",,,IntelliJ IDEA;Vim;WebStorm,PostgreSQL;Redis,Node.js;Ruby on Rails;Vue.js,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,,Cloudflare;Heroku,Homebrew;npm;Vite,,,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,PostgreSQL;Redis,Node.js;React;Ruby on Rails;Vue.js;WordPress,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,,Cloudflare;Heroku,Homebrew;npm;Vite;Webpack;Yarn,7.0,156000.0,12.0,7.0,,,,"No, and I don't plan to"
4,25-34 years old,Start a free trial;Ask developers I know/work ...,Philippines,Other;Codecademy;edX,"Developer, full-stack","Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Employed, full-time;Independent contractor, fr...",Other,I am a developer by profession,Other (Please Specify):,Other (Please Specify):,Microservices;Automated testing;Observability ...,Remote,,ChatGPT,Vim;Visual Studio Code,Elasticsearch;MongoDB;PostgreSQL;Redis;Supabase,Express;NestJS;Next.js;Node.js;React;Remix;Vue.js,HTML/CSS;JavaScript;Python;Rust;TypeScript,,Amazon Web Services (AWS);Cloudflare;Digital O...,Docker;npm;Yarn,,ChatGPT,Vim;Visual Studio Code,BigQuery;Elasticsearch;MongoDB;PostgreSQL,Express;Gatsby;NestJS;Next.js;Node.js;React,HTML/CSS;JavaScript;TypeScript,,Amazon Web Services (AWS);Firebase;Heroku;Netl...,Docker;npm;Webpack;Yarn,6.0,23456.0,6.0,4.0,Increase productivity;Greater efficiency;Speed...,Very favorable,Somewhat trust,Yes


In [36]:
interest_df.shape

(87981, 39)

In [37]:
interest_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87981 entries, 0 to 87980
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Age                87981 non-null  object 
 1   BuyNewTool         83009 non-null  object 
 2   Country            87973 non-null  object 
 3   CoursesCert        37076 non-null  object 
 4   CurrentJob         76872 non-null  object 
 5   EdLevel            87973 non-null  object 
 6   Employment         87898 non-null  object 
 7   Industry           36774 non-null  object 
 8   MainBranch         87981 non-null  object 
 9   OSPersonal         86557 non-null  object 
 10  OSProffesional     78587 non-null  object 
 11  ProfessionalTech   41783 non-null  object 
 12  RemoteWork         73810 non-null  object 
 13  TargetAIDeveloper  19587 non-null  object 
 14  TargetAISearch     46150 non-null  object 
 15  TargetCollabTools  76649 non-null  object 
 16  TargetDatabase     609

I consider the data cleaned n.n  
In another Jupyter notebook, I'll start working on feature understanding :D

In [39]:
interest_df.to_pickle('../../data/interim/interest_SO_survey_2023.pkl')
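
The next notebook can presumably load the file back with `pd.read_pickle`. Here is a small round-trip sketch on a toy frame; the values and the file name `toy_survey.pkl` are hypothetical, and it writes to a temporary directory rather than `../../data/interim/`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame standing in for interest_df (hypothetical values).
toy = pd.DataFrame({"Country": ["Brazil", None], "YearsCode": [20.0, 3.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    path = Path(tmp_dir) / "toy_survey.pkl"
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises an AssertionError if values, dtypes, or index differ after the round trip.
pd.testing.assert_frame_equal(toy, restored)
```

Pickle stores the dtypes along with the values, so the float conversion of `YearsCode` and `YearsCodePro` does not have to be repeated after loading.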