## Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data

In [22]:
import pandas as pd

# load the CSV file with the read_csv method
df = pd.read_csv('survey_results_public.csv')
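As a minimal sketch of what `read_csv` does, the example below parses a small in-memory CSV through `io.StringIO` instead of a file on disk. The two column names mirror survey columns, but the shortened data and the name `df_small` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the real survey file is much larger
csv_text = "ResponseId,Age\n1,18-24 years old\n2,25-34 years old\n"

# read_csv accepts a file path or any file-like object
df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small.shape)
print(df_small.dtypes)
```

The first line of the text is used as the header row, and each column gets a dtype inferred from its values.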

In [23]:
# see the number of rows and columns (shape is an attribute, not a method, so no parentheses are needed)
df.shape

(89184, 84)
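Since `shape` is a plain tuple of `(rows, columns)`, it can be unpacked directly. A small sketch on a made-up DataFrame (the name `toy` and its contents are hypothetical, not part of the survey data):

```python
import pandas as pd

# Hypothetical toy data with 3 rows and 2 columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# shape is an attribute, so there are no parentheses
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# len() gives the same numbers one at a time
print(len(toy))          # number of rows
print(len(toy.columns))  # number of columns
```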

In [24]:
# view all columns
pd.set_option('display.max_columns', 84)

# view all rows (can be slow or overwhelming depending on the size of the dataset)
#pd.set_option('display.max_rows', 84)
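`set_option` changes the display setting for the rest of the session. If the wider display is only needed once, `pd.option_context` can apply it temporarily. The sketch below uses a made-up 30-column DataFrame (`wide` is a hypothetical name) and passes `None`, which means "no limit".

```python
import pandas as pd

# Hypothetical one-row DataFrame with 30 columns
wide = pd.DataFrame([list(range(30))], columns=[f"col{i}" for i in range(30)])

default_limit = pd.get_option("display.max_columns")

# The setting only applies inside the with-block
with pd.option_context("display.max_columns", None):
    inside_limit = pd.get_option("display.max_columns")
    print(wide)

# Outside the block the previous value is restored
after_limit = pd.get_option("display.max_columns")
print(default_limit, inside_limit, after_limit)
```

A permanent change made with `set_option` can be undone with `pd.reset_option("display.max_columns")`.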

In [25]:
# info() is a method that summarizes the dataset: column names, number of non-null entries per column, and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89184 entries, 0 to 89183
Data columns (total 84 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   ResponseId                           89184 non-null  int64  
 1   Q120                                 89184 non-null  object 
 2   MainBranch                           89184 non-null  object 
 3   Age                                  89184 non-null  object 
 4   Employment                           87898 non-null  object 
 5   RemoteWork                           73810 non-null  object 
 6   CodingActivities                     73764 non-null  object 
 7   EdLevel                              87973 non-null  object 
 8   LearnCode                            87663 non-null  object 
 9   LearnCodeOnline                      70084 non-null  object 
 10  LearnCodeCoursesCert                 37076 non-null  object 
 11  YearsCode                   

In [26]:
schema_df = pd.read_csv('survey_results_schema.csv')

schema_df

Unnamed: 0,qid,qname,question,force_resp,type,selector
0,QID16,S0,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
1,QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
2,QID310,Q310,"<div><span style=""font-size:19px;""><strong>You...",False,DB,TB
3,QID312,Q120,,True,MC,SAVR
4,QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
...,...,...,...,...,...,...
73,QID289,Knowledge_7,Waiting on answers to questions often causes i...,,MC,MAVR
74,QID289,Knowledge_8,I feel like I have the tools and/or resources ...,,MC,MAVR
75,QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
76,QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR


In [27]:
# head(n) returns the first n rows (5 when n is omitted)
# here we show the first 10 rows
schema_df.head(10)

Unnamed: 0,qid,qname,question,force_resp,type,selector
0,QID16,S0,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
1,QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
2,QID310,Q310,"<div><span style=""font-size:19px;""><strong>You...",False,DB,TB
3,QID312,Q120,,True,MC,SAVR
4,QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
5,QID2,MainBranch,Which of the following options best describes ...,True,MC,SAVR
6,QID127,Age,What is your age? *,True,MC,MAVR
7,QID296,Employment,Which of the following best describes your cur...,False,MC,MAVR
8,QID308,RemoteWork,Which best describes your current work situation?,False,MC,SAVR
9,QID297,CodingActivities,Which of the following best describes the code...,False,MC,MAVR


In [28]:
# this is used to show the last 10 rows
schema_df.tail(10)

Unnamed: 0,qid,qname,question,force_resp,type,selector
68,QID289,Knowledge_2,Knowledge silos prevent me from getting ideas ...,,MC,MAVR
69,QID289,Knowledge_3,I can find up-to-date information within my or...,,MC,MAVR
70,QID289,Knowledge_4,I am able to quickly find answers to my questi...,,MC,MAVR
71,QID289,Knowledge_5,I know which system or resource to use to find...,,MC,MAVR
72,QID289,Knowledge_6,I often find myself answering questions that I...,,MC,MAVR
73,QID289,Knowledge_7,Waiting on answers to questions often causes i...,,MC,MAVR
74,QID289,Knowledge_8,I feel like I have the tools and/or resources ...,,MC,MAVR
75,QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
76,QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR
77,QID290,Frequency_3,Encountering knowledge silos (where one indivi...,,MC,MAVR
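A short sketch of how `head` and `tail` behave, using a made-up 20-row DataFrame (`toy` and its column `n` are hypothetical). Both methods return 5 rows when called without an argument, and both keep the original index labels.

```python
import pandas as pd

# Hypothetical DataFrame with rows numbered 0..19
toy = pd.DataFrame({"n": range(20)})

first_default = toy.head()   # no argument: first 5 rows
first_three = toy.head(3)    # first 3 rows
last_three = toy.tail(3)     # last 3 rows, index labels 17, 18, 19

print(len(first_default))
print(first_three["n"].tolist())
print(last_three.index.tolist())
```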


## Python Pandas Tutorial (Part 2): DataFrame and Series Basics - Selecting Rows and Columns

In [55]:
# What is a DataFrame? Let's think of it as a dictionary of lists
# A DataFrame looks something like this, but in a more structured way

person = {
    "firstname" : "Shalom",
    "lastname" : "DOSSEH",
    "email" : "dossehdosseh14@gmail.com"
}

people = {
    "firstname" : ["Shalom"],
    "lastname" : ["DOSSEH"],
    "email" : ["dossehdosseh14@gmail.com"]
}

people = {
    "firstname" : ["Shalom", "Paul", "Lola", "Pierre", "Emma", "John", "Sarah"],
    "lastname" : ["DOSSEH", "Lil", "Badass", "Sage", "Smith", "Doe", "Johnson"],
    "email" : ["dossehdosseh14@gmail.com", "paullil@gmail.com", "lolabadass0outlook.com", "pipisage@outmail.com", "emma.smith@example.com", "john.doe@example.com", "sarah.johnson@example.com"]
}


people["email"]

['dossehdosseh14@gmail.com',
 'paullil@gmail.com',
 'lolabadass0outlook.com',
 'pipisage@outmail.com',
 'emma.smith@example.com',
 'john.doe@example.com',
 'sarah.johnson@example.com']

In [30]:
# Let's access and display the values of the dictionary

for i in range(len(people["firstname"])):
    firstname = people["firstname"][i]
    lastname = people["lastname"][i]
    email = people["email"][i]
    print(f"{firstname} {lastname} - {email}")

Shalom DOSSEH - dossehdosseh14@gmail.com
Paul Lil - paullil@gmail.com
Lola Badass - lolabadass0outlook.com
Pierre Sage - pipisage@outmail.com
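The index-based loop above can also be written with `zip`, which walks the three lists in parallel. This is only an alternative sketch, shown on a shortened two-entry copy of the dictionary; the `lines` list is a name introduced here for illustration.

```python
# Shortened copy of the dictionary, two entries only
people = {
    "firstname": ["Emma", "John"],
    "lastname": ["Smith", "Doe"],
    "email": ["emma.smith@example.com", "john.doe@example.com"],
}

# zip pairs up the i-th element of each list
lines = [
    f"{first} {last} - {email}"
    for first, last, email in zip(
        people["firstname"], people["lastname"], people["email"]
    )
]

for line in lines:
    print(line)
```

Either way, keeping the lists aligned is the programmer's job, which is one reason a DataFrame is more convenient for this kind of data.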


In [31]:
# OK, now let's create a DataFrame from the dictionary
# load pandas
import pandas as pd

In [32]:
df = pd.DataFrame(people)

In [33]:
# Here, let's display the email column of the DataFrame
# NB: a DataFrame is more complex and has more features than a dictionary, which allows a wide range of data manipulation

df["email"] # or df.email

0    dossehdosseh14@gmail.com
1           paullil@gmail.com
2      lolabadass0outlook.com
3        pipisage@outmail.com
Name: email, dtype: object
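The `df.email` shortcut only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A small sketch with made-up columns (`count` and `first name` are hypothetical, chosen to trigger both cases):

```python
import pandas as pd

toy = pd.DataFrame({
    "email": ["emma.smith@example.com"],
    "count": [3],
    "first name": ["Emma"],
})

# For a plain name, dot access and bracket access return the same Series
same = toy.email.equals(toy["email"])
print(same)

# "count" clashes with the DataFrame.count method, so dot access returns the method
print(callable(toy.count))
print(toy["count"].tolist())

# A name containing a space is only reachable with brackets
print(toy["first name"].tolist())
```

Bracket access works in every case, so it is the safer default.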

In [34]:
# Let's check the type of the column
type(df["email"])

pandas.core.series.Series

In [56]:
# pandas.core.series.Series: this is a Series, also called a one-dimensional array
# A column is a Series, so a DataFrame is a container of multiple Series objects

# Let's pass a list of column names to the DataFrame
df[['lastname', 'email']]

Unnamed: 0,lastname,email
0,DOSSEH,dossehdosseh14@gmail.com
1,Lil,paullil@gmail.com
2,Badass,lolabadass0outlook.com
3,Sage,pipisage@outmail.com
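The difference between single and double brackets is easy to miss: a single column name returns a Series, while a list of names returns a DataFrame, even when the list holds only one name. A minimal sketch on a shortened two-row version of the data (`single` and `double` are names introduced here):

```python
import pandas as pd

df = pd.DataFrame({
    "firstname": ["Emma", "John"],
    "lastname": ["Smith", "Doe"],
    "email": ["emma.smith@example.com", "john.doe@example.com"],
})

single = df["email"]      # one name -> Series
double = df[["email"]]    # list with one name -> DataFrame

print(type(single).__name__)
print(type(double).__name__)
print(double.shape)
```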


In [39]:
# This is used to get the column names
df.columns

Index(['firstname', 'lastname', 'email'], dtype='object')

In [52]:
# To get rows we use loc and iloc; iloc stands for "integer location"
# This returns a Series holding the first row of data

df.iloc[0]

firstname                      Shalom
lastname                       DOSSEH
email        dossehdosseh14@gmail.com
Name: 0, dtype: object

In [54]:
# Let's pass a list of rows

df.iloc[[0, 1]]

Unnamed: 0,firstname,lastname,email
0,Shalom,DOSSEH,dossehdosseh14@gmail.com
1,Paul,Lil,paullil@gmail.com


In [None]:
# Here we select the first two rows and only the email column (column index 2)
df.iloc[[0, 1], 2]
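The cell above has no recorded output, so here is a sketch of the same selection on a shortened three-row version of the data. It also uses `columns.get_loc` to look up the integer position of a column instead of hard-coding `2`; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "firstname": ["Emma", "John", "Sarah"],
    "lastname": ["Smith", "Doe", "Johnson"],
    "email": [
        "emma.smith@example.com",
        "john.doe@example.com",
        "sarah.johnson@example.com",
    ],
})

# Rows at positions 0 and 1, column at position 2
emails = df.iloc[[0, 1], 2]
print(emails)

# Look up the position of a column by name
email_pos = df.columns.get_loc("email")
print(email_pos)
```

The result is a Series named after the selected column, because only one column was requested.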

In [58]:
# NB: iloc grabs rows and columns by integer position only, not by name (0 for the first row, 1 for the second, and so on); loc locates rows and columns by label
df.loc[[0, 1], "email"]

0    dossehdosseh14@gmail.com
1           paullil@gmail.com
Name: email, dtype: object

In [59]:
# We can also pass a list of columns (iloc accepts lists too, but of integer positions)
df.loc[[0, 1], ["email", "lastname"]]

Unnamed: 0,email,lastname
0,dossehdosseh14@gmail.com,DOSSEH
1,paullil@gmail.com,Lil
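With the default index, labels and positions are the same numbers, which hides the difference between `loc` and `iloc`. The sketch below uses a made-up index of `10, 20, 30` (an assumption for illustration) so the two no longer coincide.

```python
import pandas as pd

toy = pd.DataFrame(
    {"lastname": ["Smith", "Doe", "Johnson"]},
    index=[10, 20, 30],  # hypothetical labels that differ from positions
)

by_label = toy.loc[10, "lastname"]   # label 10 is the first row
by_position = toy.iloc[0, 0]         # position 0 is also the first row
print(by_label, by_position)

# There is no row labelled 0, so loc raises KeyError
try:
    toy.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)
```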


In [60]:
# Let's go back to the initial DataFrame
import pandas as pd

# load the CSV file with the read_csv method
df = pd.read_csv('survey_results_public.csv')

In [67]:
df.head(2)

Unnamed: 0,ResponseId,Q120,MainBranch,Age,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,OrgSize,PurchaseInfluence,TechList,BuyNewTool,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,SOAI,AISelect,AISent,AIAcc,AIBen,AIToolInterested in Using,AIToolCurrently Using,AIToolNot interested in Using,AINextVery different,AINextNeither different nor similar,AINextSomewhat similar,AINextVery similar,AINextSomewhat different,TBranch,ICorPM,WorkExp,Knowledge_1,Knowledge_2,Knowledge_3,Knowledge_4,Knowledge_5,Knowledge_6,Knowledge_7,Knowledge_8,Frequency_1,Frequency_2,Frequency_3,TimeSearching,TimeAnswering,ProfessionalTech,Industry,SurveyLength,SurveyEase,ConvertedCompYearly
0,1,I agree,None of these,18-24 years old,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,I agree,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,Hobby;Contribute to open-source projects;Boots...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Friend or fam...,Formal documentation provided by the owner of ...,Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,I have a great deal of influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per month or weekly,"Yes, definitely","I don't think it's super necessary, but I thin...",Yes,Indifferent,Other (please explain),Somewhat distrust,Learning about a codebase;Writing code;Debuggi...,Writing code;Committing and reviewing code,,,,,,,Yes,People manager,10.0,Strongly agree,Agree,Strongly agree,Agree,Agree,Agree,Agree,Strongly agree,1-2 times a week,10+ times a week,Never,15-30 minutes a day,15-30 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,285000.0


In [70]:
# We can also slice with iloc and loc (loc slices include the end label; iloc slices exclude the end position)

df.loc[0:3, "Employment":"SOComm"]

Unnamed: 0,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,OrgSize,PurchaseInfluence,TechList,BuyNewTool,Country,Currency,CompTotal,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysPersonal use,OpSysProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,AISearchHaveWorkedWith,AISearchWantToWorkWith,AIDevHaveWorkedWith,AIDevWantToWorkWith,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm
0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,"Employed, full-time",Remote,Hobby;Contribute to open-source projects;Boots...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Friend or fam...,Formal documentation provided by the owner of ...,Other,18.0,9.0,"Senior Executive (C-Suite, VP, etc.)",2 to 9 employees,I have a great deal of influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,285000.0,HTML/CSS;JavaScript;Python,Bash/Shell (all shells);C#;Dart;Elixir;GDScrip...,Supabase,Firebase Realtime Database;Supabase,Amazon Web Services (AWS);Netlify;Vercel,Fly.io;Netlify;Render,Next.js;React;Remix;Vue.js,Deno;Elm;Nuxt.js;React;Svelte;Vue.js,Electron;React Native;Tauri,Capacitor;Electron;Tauri;Uno Platform;Xamarin,Docker;Kubernetes;npm;Pip;Vite;Webpack;Yarn,Godot;npm;pnpm;Unity 3D;Unreal Engine;Vite;Web...,Vim;Visual Studio Code,Vim;Visual Studio Code,iOS;iPadOS;MacOS;Windows;Windows Subsystem for...,MacOS;Windows;Windows Subsystem for Linux (WSL),Asana;Basecamp;GitHub Discussions;Jira;Linear;...,GitHub Discussions;Linear;Notion;Trello,Cisco Webex Teams;Discord;Google Chat;Google M...,Discord;Signal;Slack;Zoom,ChatGPT,ChatGPT;Neeva AI,GitHub Copilot,GitHub Copilot,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per month or weekly,"Yes, definitely"
2,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Professional development or self-paced l...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Formal documentation provided by the owner of ...,,27.0,23.0,"Developer, back-end","5,000 to 9,999 employees",I have some influence,Given a list,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,250000.0,Bash/Shell (all shells);Go,Haskell;OCaml;Rust,,,Amazon Web Services (AWS);Google Cloud;OpenSta...,,,,,,Cargo;Docker;Kubernetes;Make;Nix,Cargo;Kubernetes;Nix,Emacs;Helix,Emacs;Helix,MacOS;Other Linux-based,MacOS;Other Linux-based,Markdown File;Stack Overflow for Teams,Markdown File,Microsoft Teams;Slack;Zoom,Slack;Zoom,,,,,Stack Overflow;Stack Exchange;Stack Overflow f...,A few times per month or weekly,Yes,Less than once per month or monthly,Neutral
3,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Colleague;Friend or family member;Other online...,Formal documentation provided by the owner of ...,,12.0,7.0,"Developer, front-end",100 to 499 employees,I have some influence,Investigate,Start a free trial;Ask developers I know/work ...,United States of America,USD\tUnited States dollar,156000.0,Bash/Shell (all shells);HTML/CSS;JavaScript;PH...,Bash/Shell (all shells);HTML/CSS;JavaScript;Ru...,PostgreSQL;Redis,PostgreSQL;Redis,Cloudflare;Heroku,Cloudflare;Heroku,Node.js;React;Ruby on Rails;Vue.js;WordPress,Node.js;Ruby on Rails;Vue.js,,,Homebrew;npm;Vite;Webpack;Yarn,Homebrew;npm;Vite,IntelliJ IDEA;Vim;Visual Studio Code;WebStorm,IntelliJ IDEA;Vim;WebStorm,iOS;iPadOS;MacOS,iOS;iPadOS;MacOS,Jira,Jira,Discord;Google Meet;Microsoft Teams;Slack;Zoom,Discord;Google Meet;Slack;Zoom,,,,,Stack Overflow;Stack Exchange,A few times per week,Yes,Less than once per month or monthly,"No, not really"
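Note that `df.loc[0:3, ...]` above returned four rows (0 through 3), because label slices include both end points. Position slices with `iloc` follow the usual Python rule and exclude the end. A minimal sketch on a shortened three-row DataFrame (`toy` is a hypothetical name):

```python
import pandas as pd

toy = pd.DataFrame({
    "firstname": ["Emma", "John", "Sarah"],
    "lastname": ["Smith", "Doe", "Johnson"],
    "email": [
        "emma.smith@example.com",
        "john.doe@example.com",
        "sarah.johnson@example.com",
    ],
})

# Label slice: rows 0, 1, 2 and columns lastname through email, ends included
by_label = toy.loc[0:2, "lastname":"email"]

# Position slice: rows 0, 1 and columns 1, 2, ends excluded
by_position = toy.iloc[0:2, 1:3]

print(by_label.shape)
print(by_position.shape)
```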
