# Dictionary vs DataFrame (DF)
A <font color=blue>DataFrame</font> behaves much like a dictionary of lists: each key is a column name and each value holds that column's data. It is not a plain dictionary, though. Selecting a single column returns a **Series object** (verify using type()). <br>
A **Series** holds the data of a single column, so a DF is a container for multiple such Series.

In [83]:
# A dictionary is a convenient way to store data about a single person.
person = {
    'first': 'kas',
    'last': 'las',
    'email': 'kas@kas.com'
}

person['first']

'kas'

In [84]:
people = {
   'first': ['kas'],
   'last': ['las'],
   'email': ['kas@las.com'] 
}

In [85]:
# A dictionary of lists is a convenient way to store data about multiple people.
people = {
   'first': ['kas','bil','bap'],
   'last': ['las','bil','gap'],
   'email': ['kas@las.com','bil@bil.com','bap@bap.com'] 
}
people['email']

['kas@las.com', 'bil@bil.com', 'bap@bap.com']

In [86]:
type(people['email'])

list

In [87]:
import pandas as pd
df = pd.DataFrame(people)
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com


In [88]:
df['email']

0    kas@las.com
1    bil@bil.com
2    bap@bap.com
Name: email, dtype: object

In [89]:
# A column can also be accessed with dot ('.') notation, but bracket ('[]') notation is preferable.
# If the DataFrame already has an attribute or method with the same name as the column (e.g. 'count'), dot notation returns that attribute instead of the column.
df.email

0    kas@las.com
1    bil@bil.com
2    bap@bap.com
Name: email, dtype: object
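
The name clash mentioned above can be sketched as follows. The `inventory` frame and its `count` column are hypothetical and chosen only because `count` is also the name of a DataFrame method.

```python
import pandas as pd

# Hypothetical data: the column name 'count' collides with the DataFrame.count method.
inventory = pd.DataFrame({'item': ['pen', 'ink'], 'count': [3, 5]})

via_brackets = inventory['count']  # the column, returned as a Series
via_dot = inventory.count          # the bound method DataFrame.count, not the column

print(type(via_brackets))
print(callable(via_dot))
```

Bracket notation always looks the name up among the columns, so it is unaffected by such collisions.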

In [90]:
# Accessing a single column of a DF returns a Series.
type(df['email'])

pandas.core.series.Series

In [91]:
df[['email','first']]

,email,first
0,kas@las.com,kas
1,bil@bil.com,bil
2,bap@bap.com,bap


In [92]:
# Passing a list of column names selects multiple columns. The result is no longer a Series but a DF, which shows that a DF is a collection of Series.
type(df[['first','last']])

pandas.core.frame.DataFrame
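
A small sketch of the same distinction with a single column: it is the list inside the brackets, not the number of columns, that decides whether a DF comes back. The data repeats the `people` dictionary used earlier in this notebook.

```python
import pandas as pd

people = {
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}
df = pd.DataFrame(people)

one_col_series = df['email']    # string key -> Series
one_col_frame = df[['email']]   # list with one name -> DataFrame with one column

print(one_col_series.shape)  # (3,)
print(one_col_frame.shape)   # (3, 1)
```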

In [93]:
# Access the 'email' column in the row with index label 0.
df.loc[0,'email']

'kas@las.com'
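
`loc` also accepts lists and slices of labels, not only a single row and column. The sketch below reuses the `people` data from above; note that label slices with `loc` include both endpoints.

```python
import pandas as pd

people = {
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}
df = pd.DataFrame(people)

# Rows with labels 0 and 2, restricted to two columns.
picked = df.loc[[0, 2], ['email', 'last']]

# Label slices are inclusive: rows 0..1 and columns 'first'..'last'.
sliced = df.loc[0:1, 'first':'last']

print(picked)
print(sliced)
```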

In [94]:
df.columns

Index(['first', 'last', 'email'], dtype='object')

In [95]:
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com


In [96]:
# So far each row has the default integer index, e.g. 0, 1, 2.
# We can use a column as the index instead. Its values should ideally be unique so that each label identifies one row, although pandas does not enforce this by default.
df.set_index('email')

email,first,last
kas@las.com,kas,las
bil@bil.com,bil,bil
bap@bap.com,bap,gap
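
Whether a column is a good index candidate can be checked before setting it. The sketch below uses a hypothetical frame with a deliberately duplicated email; `is_unique` reports the duplicate, and `verify_integrity=True` makes `set_index` reject it.

```python
import pandas as pd

# Hypothetical data: the first and last rows share the same email.
dupes = pd.DataFrame({
    'first': ['kas', 'bil', 'bap'],
    'email': ['kas@las.com', 'bil@bil.com', 'kas@las.com'],
})

print(dupes['email'].is_unique)  # False

# Without verify_integrity the duplicated labels are accepted silently.
indexed = dupes.set_index('email')

rejected = False
try:
    dupes.set_index('email', verify_integrity=True)
except ValueError as err:
    rejected = True
    print('rejected:', err)
```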


In [97]:
# The index is unchanged in subsequent cells even though we set it in the previous cell, because set_index returns a new DataFrame.
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com


In [98]:
# To change the index permanently so that it is reflected in subsequent cells, pass inplace=True.
df.set_index('email',inplace=True)

In [99]:
df

email,first,last
kas@las.com,kas,las
bil@bil.com,bil,bil
bap@bap.com,bap,gap
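
An alternative to `inplace=True` is to keep the frame that `set_index` returns. This is only a sketch of that option, with `by_email` as an illustrative name; the original frame stays untouched.

```python
import pandas as pd

people = {
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}
original = pd.DataFrame(people)

# set_index returns a new DataFrame; the original is left as it was.
by_email = original.set_index('email')

print(list(original.columns))  # ['first', 'last', 'email']
print(by_email.index.name)     # email
```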


In [100]:
df.index

Index(['kas@las.com', 'bil@bil.com', 'bap@bap.com'], dtype='object', name='email')

In [101]:
# With the email as the index, a row can be looked up by its label.
df.loc['kas@las.com']

first    kas
last     las
Name: kas@las.com, dtype: object

In [102]:
df.loc['bil@bil.com','last']

'bil'

In [103]:
# loc works with index labels, so the old integer position no longer works and raises a KeyError.
#df.loc[0]

# iloc is position-based, so we can still pass the row number.
df.iloc[0]

first    kas
last     las
Name: kas@las.com, dtype: object
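
The difference between label-based and position-based lookup can be sketched by catching the error instead of commenting the line out. The data is the same `people` dictionary as before.

```python
import pandas as pd

people = {
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}
df = pd.DataFrame(people).set_index('email')

label_lookup_failed = False
try:
    df.loc[0]  # 0 is not one of the email labels
except KeyError:
    label_lookup_failed = True

first_row = df.iloc[0]   # position 0, whatever its label is
last_row = df.iloc[-1]   # negative positions count from the end

print(label_lookup_failed)
print(first_row.name, last_row.name)
```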

In [104]:
# Reset the index to the default integer index; 'email' becomes a regular column again.
df.reset_index(inplace=True)
df

,email,first,last
0,kas@las.com,kas,las
1,bil@bil.com,bil,bil
2,bap@bap.com,bap,gap
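
By default `reset_index` moves the old index back into the columns. If the old index is not needed any more, `drop=True` discards it instead. A minimal sketch with the same data:

```python
import pandas as pd

people = {
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}
by_email = pd.DataFrame(people).set_index('email')

restored = by_email.reset_index()            # 'email' returns as the first column
discarded = by_email.reset_index(drop=True)  # 'email' is thrown away

print(list(restored.columns))   # ['email', 'first', 'last']
print(list(discarded.columns))  # ['first', 'last']
```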


In [106]:
# The index can also be defined while loading the CSV file.
df = pd.read_csv("data/survey_results_public.csv",index_col='ResponseId')
df

ResponseId,Q120,MainBranch,Age,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,...,Frequency_1,Frequency_2,Frequency_3,TimeSearching,TimeAnswering,ProfessionalTech,Industry,SurveyLength,SurveyEase,ConvertedCompYearly
1,I agree,None of these,18-24 years old,,,,,,,,...,,,,,,,,,,
2,I agree,I am a developer by profession,25-34 years old,"Employed, full-time",Remote,Hobby;Contribute to open-source projects;Boots...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;Friend or fam...,Formal documentation provided by the owner of ...,Other,...,1-2 times a week,10+ times a week,Never,15-30 minutes a day,15-30 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,285000.0
3,I agree,I am a developer by profession,45-54 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Professional development or self-paced l...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Colleague;On the job tr...,Formal documentation provided by the owner of ...,,...,6-10 times a week,6-10 times a week,3-5 times a week,30-60 minutes a day,30-60 minutes a day,DevOps function;Microservices;Automated testin...,"Information Services, IT, Software Development...",Appropriate in length,Easy,250000.0
4,I agree,I am a developer by profession,25-34 years old,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Colleague;Friend or family member;Other online...,Formal documentation provided by the owner of ...,,...,1-2 times a week,10+ times a week,1-2 times a week,15-30 minutes a day,30-60 minutes a day,Automated testing;Continuous integration (CI) ...,,Appropriate in length,Easy,156000.0
5,I agree,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,Hobby;Contribute to open-source projects;Profe...,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Online Courses or Certi...,Formal documentation provided by the owner of ...,Other;Codecademy;edX,...,1-2 times a week,1-2 times a week,3-5 times a week,60-120 minutes a day,30-60 minutes a day,Microservices;Automated testing;Observability ...,Other,Appropriate in length,Neither easy nor difficult,23456.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89180,I agree,I am a developer by profession,25-34 years old,"Employed, full-time;Independent contractor, fr...",Remote,Hobby;Bootstrapping a business;Freelance/contr...,"Associate degree (A.A., A.S., etc.)",Online Courses or Certification;Other online r...,Formal documentation provided by the owner of ...,Udemy,...,,,,,,,,Too long,Neither easy nor difficult,
89181,I agree,I am a developer by profession,18-24 years old,"Student, full-time;Employed, part-time","Hybrid (some remote, some in-person)",School or academic work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Colleague;Online Courses or Certification;Othe...,Formal documentation provided by the owner of ...,,...,,,,,,,,Too long,Easy,
89182,I agree,I code primarily as a hobby,Prefer not to say,I prefer not to say,,,Something else,Books / Physical media;Hackathons (virtual or ...,,Codecademy;Coursera,...,,,,,,,,Too long,Neither easy nor difficult,
89183,I agree,I am a developer by profession,Under 18 years old,"Employed, part-time;Student, part-time","Hybrid (some remote, some in-person)",Hobby;School or academic work,"Secondary school (e.g. American high school, G...",Online Courses or Certification;Other online r...,Formal documentation provided by the owner of ...,Udemy,...,,,,,,,,Appropriate in length,Neither easy nor difficult,
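
To try `index_col` without the survey file on disk, the sketch below reads a tiny in-memory CSV through `io.StringIO`. The three columns and two rows are a cut-down stand-in taken from the output above, not the real file.

```python
import io
import pandas as pd

# Cut-down stand-in for the survey file: two rows, three columns.
csv_text = (
    "ResponseId,MainBranch,Age\n"
    "1,None of these,18-24 years old\n"
    "2,I am a developer by profession,25-34 years old\n"
)

survey = pd.read_csv(io.StringIO(csv_text), index_col='ResponseId')

print(survey.index.name)
print(survey.loc[2, 'Age'])
```

`index_col` also accepts a column position, so `index_col=0` would select the same column here.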


In [119]:
schema_df = pd.read_csv("data/survey_results_schema.csv",index_col='qid')
# Show at most 50 rows before the display is truncated.
pd.set_option('display.max_rows', 50)
schema_df

qid,qname,question,force_resp,type,selector
QID16,S0,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
QID310,Q310,"<div><span style=""font-size:19px;""><strong>You...",False,DB,TB
QID312,Q120,,True,MC,SAVR
QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
...,...,...,...,...,...
QID289,Knowledge_7,Waiting on answers to questions often causes i...,,MC,MAVR
QID289,Knowledge_8,I feel like I have the tools and/or resources ...,,MC,MAVR
QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR


In [120]:
# 'qid' is not unique, so loc returns every row with this label as a DF.
schema_df.loc['QID290']

qid,qname,question,force_resp,type,selector
QID290,Frequency,How frequently do you experience each of the f...,False,Matrix,Likert
QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR
QID290,Frequency_3,Encountering knowledge silos (where one indivi...,,MC,MAVR
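
The return type of `loc` depends on how many rows carry the label. The sketch below uses a small hand-built stand-in for the schema (three rows, two columns, values copied from the outputs above) rather than the real file.

```python
import pandas as pd

# Stand-in for schema_df: 'QID290' appears twice, 'QID6' once.
schema = pd.DataFrame(
    {'qname': ['Frequency', 'Frequency_1', 'Country'],
     'type': ['Matrix', 'MC', 'MC']},
    index=pd.Index(['QID290', 'QID290', 'QID6'], name='qid'),
)

many = schema.loc['QID290']  # duplicated label -> DataFrame with two rows
one = schema.loc['QID6']     # unique label -> Series for that single row

print(type(many))
print(type(one))
```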


In [121]:
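# sort_index returns a sorted copy. The labels are compared as strings, so 'QID100' sorts before 'QID51'.
schema_df.sort_index()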

qid,qname,question,force_resp,type,selector
QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
QID100,SOVisitFreq,How frequently would you say you visit Stack O...,False,MC,SAVR
QID101,SOAccount,Do you have a Stack Overflow account?,False,MC,SAVR
QID102,SOPartFreq,How frequently would you say you participate i...,False,MC,SAVR
QID106,SOComm,Do you consider yourself a member of the Stack...,False,MC,SAVR
...,...,...,...,...,...
QID51,CompTotal,What is your current total <b>annual</b> compe...,False,TE,SL
QID6,Country,"Where do you live? <span style=""font-weight: b...",True,MC,DL
QID61,S3,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
QID71,OpSys,What is the primary <b>operating system</b> in...,False,Matrix,Likert
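
Because the labels are strings, the order above is alphabetical rather than numeric. One possible way to sort by the number after the `QID` prefix is the `key` argument of `sort_index`. This is a sketch on a hand-built stand-in for the schema, and it assumes every label has the form `QID<number>`.

```python
import pandas as pd

# Stand-in for schema_df with four labels taken from the output above.
schema = pd.DataFrame(
    {'qname': ['Country', 'S1', 'SOVisitFreq', 'CompTotal']},
    index=pd.Index(['QID6', 'QID1', 'QID100', 'QID51'], name='qid'),
)

lexical = schema.sort_index()                    # string order
descending = schema.sort_index(ascending=False)  # reverse string order

# Assumption: labels look like 'QID<number>', so drop the prefix and compare as integers.
numeric = schema.sort_index(key=lambda idx: idx.str[3:].astype(int))

print(list(lexical.index))     # ['QID1', 'QID100', 'QID51', 'QID6']
print(list(descending.index))  # ['QID6', 'QID51', 'QID100', 'QID1']
print(list(numeric.index))     # ['QID1', 'QID6', 'QID51', 'QID100']
```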
