# Dictionary vs DataFrame (DF)
A <font color=blue>DataFrame</font> behaves much like a dictionary of lists, but each column is a pandas **Series** object rather than a plain list (verify with `type()`). <br>
A **Series** holds the data of a single column, so a DataFrame is a container for multiple Series that share one index.

In [2]:
# A dictionary is a convenient way to store data about a single person.
person = {
    'first': 'kas',
    'last': 'las',
    'email': 'kas@kas.com'
}

person['first']

'kas'

In [3]:
people = {
   'first': ['kas'],
   'last': ['las'],
   'email': ['kas@las.com'] 
}

In [43]:
# A dictionary of lists is a convenient way to store data about multiple people.
people = {
   'first': ['kas','bil','bap','sunny'],
   'last': ['las','bil','gap','deol'],
   'email': ['kas@las.com','bil@bil.com','bap@bap.com','sunny@deol.com'] 
}
people['email']

['kas@las.com', 'bil@bil.com', 'bap@bap.com', 'sunny@deol.com']

In [44]:
type(people['email'])

list

In [119]:
import pandas as pd
df = pd.DataFrame(people)
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com
3,sunny,deol,sunny@deol.com


In [46]:
df['email']

0       kas@las.com
1       bil@bil.com
2       bap@bap.com
3    sunny@deol.com
Name: email, dtype: object

In [47]:
# A column can also be accessed with dot ('.') notation, but bracket ('[]') notation is preferable.
# If a column shares its name with a DataFrame attribute or method (e.g. 'count'), dot notation returns that attribute instead of the column.
df.email

0       kas@las.com
1       bil@bil.com
2       bap@bap.com
3    sunny@deol.com
Name: email, dtype: object
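To see why bracket access is the safer habit, here is a minimal sketch. The frame `scores` and its column names are made up for illustration; the column `count` is chosen because it collides with the existing `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical frame: the column name 'count' collides with DataFrame.count()
scores = pd.DataFrame({'name': ['kas', 'bil'], 'count': [3, 5]})

by_bracket = scores['count']   # always the column, returned as a Series
by_dot = scores.count          # the DataFrame.count method, not the column

print(type(by_bracket))
print(callable(by_dot))
```

Dot notation does not raise an error here; it silently returns the method, which is easy to miss.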

In [48]:
### Series Object
# Each column is a Series object. A Series is like a list with more functionality, and a DataFrame is a container that holds many Series. Accessing a single column of a DataFrame returns a Series.
type(df['email'])

pandas.core.series.Series

In [49]:
### Multiple columns can be selected by passing a list of column names. The result is another DataFrame, not a Series.
df[['email','first']]

,email,first
0,kas@las.com,kas
1,bil@bil.com,bil
2,bap@bap.com,bap
3,sunny@deol.com,sunny


In [50]:
# Selecting multiple columns returns a DataFrame rather than a Series, which reflects that a DataFrame is a collection of Series.
type(df[['first','last']])

pandas.core.frame.DataFrame
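A related detail is that the return type depends on whether a label or a list of labels is passed, even when the list holds only one name. The sketch below rebuilds a shortened version of the `people` data so that it runs on its own.

```python
import pandas as pd

people = {
    'first': ['kas', 'bil'],
    'last': ['las', 'bil'],
    'email': ['kas@las.com', 'bil@bil.com'],
}
df = pd.DataFrame(people)

as_series = df['email']      # single label -> Series
as_frame = df[['email']]     # list with one label -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```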

In [51]:
# Access the 'email' column in the row labelled 0 (loc takes the row label first, then the column label)
df.loc[0,'email']

'kas@las.com'
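`loc` is not limited to a single cell. As a sketch on the same sample data, it also accepts lists of labels and label slices. Note that label slices in `loc` include both endpoints, unlike ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['kas', 'bil', 'bap', 'sunny'],
    'last': ['las', 'bil', 'gap', 'deol'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com', 'sunny@deol.com'],
})

# A list of row labels and a list of column labels returns a DataFrame
subset = df.loc[[0, 1], ['email', 'last']]

# Label slices in .loc include both endpoints: rows 0, 1 and 2
sliced = df.loc[0:2, 'first':'last']

print(subset)
print(sliced)
```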

In [13]:
df.columns

Index(['first', 'last', 'email'], dtype='object')

In [52]:
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com
3,sunny,deol,sunny@deol.com


In [15]:
# So far each row has had the default integer index: 0, 1, 2, 3.
# Any column can be set as the index instead. A column with unique values (such as email) works best, although pandas does not enforce uniqueness.
df.set_index('email')

email,first,last
kas@las.com,kas,las
bil@bil.com,bil,bil
bap@bap.com,bap,gap
sunny@deol.com,sunny,deol


In [53]:
# The index of df is unchanged, because set_index in the previous cell returned a new DataFrame instead of modifying df
df

,first,last,email
0,kas,las,kas@las.com
1,bil,bil,bil@bil.com
2,bap,gap,bap@bap.com
3,sunny,deol,sunny@deol.com


In [54]:
# To change the index permanently, so that it is reflected in subsequent cells, pass inplace=True
df.set_index('email',inplace=True)
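An equivalent way to keep the change, shown here only as an alternative sketch and not as something the notebook itself uses, is to assign the result of `set_index` back to the same name.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['kas', 'bil'],
    'last': ['las', 'bil'],
    'email': ['kas@las.com', 'bil@bil.com'],
})

# set_index returns a new frame; rebinding the name keeps the change
df = df.set_index('email')

print(df.index.name)
print(list(df.columns))
```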

In [55]:
df

email,first,last
kas@las.com,kas,las
bil@bil.com,bil,bil
bap@bap.com,bap,gap
sunny@deol.com,sunny,deol


In [56]:
df.index

Index(['kas@las.com', 'bil@bil.com', 'bap@bap.com', 'sunny@deol.com'], dtype='object', name='email')

In [57]:
# With a meaningful index, a row can be looked up directly by its label.
df.loc['kas@las.com']

first    kas
last     las
Name: kas@las.com, dtype: object

In [21]:
df.loc['bil@bil.com','last']

'bil'

In [58]:
# loc no longer accepts the old integer labels; the line below would raise a KeyError
#df.loc[0]

In [59]:
# iloc still accepts the integer row position, regardless of the index labels
df.iloc[0]

first    kas
last     las
Name: kas@las.com, dtype: object
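The difference between label-based and position-based access can be sketched in one self-contained cell. The data is a shortened copy of the sample above, and the variable names `loc_failed` and `value` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['kas', 'bil', 'bap'],
    'last': ['las', 'bil', 'gap'],
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com'],
}).set_index('email')

# .loc looks up labels, so the integer 0 is no longer a valid key
try:
    df.loc[0]
    loc_failed = False
except KeyError:
    loc_failed = True

# .iloc takes integer positions for rows and columns: row 1, column 0
value = df.iloc[1, 0]

print(loc_failed)
print(value)
```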

In [60]:
# Reset to the default integer index; 'email' becomes a regular column again
df.reset_index(inplace=True)
df

,email,first,last
0,kas@las.com,kas,las
1,bil@bil.com,bil,bil
2,bap@bap.com,bap,gap
3,sunny@deol.com,sunny,deol
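By default `reset_index` moves the old index back into the columns. If the old index is not needed, `drop=True` discards it instead. A small sketch on made-up sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['kas', 'bil'],
    'email': ['kas@las.com', 'bil@bil.com'],
}).set_index('email')

kept = df.reset_index()              # 'email' moves back into the columns
dropped = df.reset_index(drop=True)  # 'email' is discarded

print(list(kept.columns))
print(list(dropped.columns))
```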


In [124]:
# Set the index from multiple columns. Without inplace=True the change is not stored in df; a new DataFrame is returned.
df.set_index(['email','first'],inplace=False)
# df.reset_index(inplace=True)
# df

email,first,last
kas@las.com,kas,las
bil@bil.com,bil,bil
bap@bap.com,bap,gap
sunny@deol.com,sunny,deol
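Rows of a frame indexed by several columns can be looked up with a tuple of labels, one per index level. The following is a minimal sketch on the same sample data; the names `indexed`, `row` and `partial` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['kas@las.com', 'bil@bil.com', 'bap@bap.com', 'sunny@deol.com'],
    'first': ['kas', 'bil', 'bap', 'sunny'],
    'last': ['las', 'bil', 'gap', 'deol'],
})
indexed = df.set_index(['email', 'first'])

# A tuple with one label per level selects a single row (a Series)
row = indexed.loc[('kas@las.com', 'kas')]

# A single outer-level label selects every row under it (a DataFrame)
partial = indexed.loc['bil@bil.com']

print(row['last'])
print(partial)
```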


In [127]:
# Set the index from a list of labels (one label per row)
labels = ['x1','x2','x3','x4']
index = pd.Index(labels)
df.set_index(index)

,email,first,last
x1,kas@las.com,kas,las
x2,bil@bil.com,bil,bil
x3,bap@bap.com,bap,gap
x4,sunny@deol.com,sunny,deol
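When an index is built from a list, there must be exactly one label per row. The sketch below shows a lookup by the new labels and the `ValueError` that `set_index` raises when the lengths differ. The index name `id` is a made-up name for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['kas', 'bil', 'bap', 'sunny'],
    'last': ['las', 'bil', 'gap', 'deol'],
})

labels = pd.Index(['x1', 'x2', 'x3', 'x4'], name='id')  # 'id' is a made-up name
relabelled = df.set_index(labels)

print(relabelled.loc['x3', 'first'])

# One label per row is required; too few labels raises ValueError
try:
    df.set_index(pd.Index(['x1', 'x2']))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```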


In [129]:
# Set a MultiIndex by combining a list of labels with column names
labels = ['x1','x2','x3','x4']
index = pd.Index(labels)
df.set_index([index,'email','first'])

,email,first,last
x1,kas@las.com,kas,las
x2,bil@bil.com,bil,bil
x3,bap@bap.com,bap,gap
x4,sunny@deol.com,sunny,deol


In [128]:
df

,email,first,last
0,kas@las.com,kas,las
1,bil@bil.com,bil,bil
2,bap@bap.com,bap,gap
3,sunny@deol.com,sunny,deol


In [40]:
# The index can also be defined while loading a CSV file, using index_col
df = pd.read_csv("data/survey_results_public.csv",index_col='Respondent')
df

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,EduOther,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,"Taught yourself a new language, framework, or ...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",Taken an online course in programming or softw...,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88377,,Yes,Less than once a month but more than once per ...,The quality of OSS and closed source software ...,"Not employed, and not looking for work",Canada,No,Primary/elementary school,,"Taught yourself a new language, framework, or ...",...,,Tech articles written by other developers;Tech...,,Man,No,,,No,Appropriate in length,Easy
88601,,No,Never,The quality of OSS and closed source software ...,,,,,,,...,,,,,,,,,,
88802,,No,Never,,Employed full-time,,,,,,...,,,,,,,,,,
88816,,No,Never,"OSS is, on average, of HIGHER quality than pro...","Independent contractor, freelancer, or self-em...",,,,,,...,,,,,,,,,,
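To try `index_col` without the survey file on disk, the sketch below reads a made-up two-row CSV from an in-memory string. It also checks that setting the index at load time gives the same frame as calling `set_index` afterwards. The CSV contents are invented for illustration and are not taken from the survey data.

```python
import io
import pandas as pd

# Made-up two-row CSV standing in for a real file on disk
csv_text = "Respondent,Hobbyist,Country\n1,Yes,United Kingdom\n2,No,Thailand\n"

at_load = pd.read_csv(io.StringIO(csv_text), index_col='Respondent')
after_load = pd.read_csv(io.StringIO(csv_text)).set_index('Respondent')

print(at_load.equals(after_load))
print(at_load.loc[2, 'Country'])
```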


In [None]:
schema_df = pd.read_csv("data/survey_results_schema.csv",index_col='qid')
pd.set_option('display.max_rows', 50)
schema_df

qid,qname,question,force_resp,type,selector
QID16,S0,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
QID12,MetaInfo,Browser Meta Info,False,Meta,Browser
QID310,Q310,"<div><span style=""font-size:19px;""><strong>You...",False,DB,TB
QID312,Q120,,True,MC,SAVR
QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
...,...,...,...,...,...
QID289,Knowledge_7,Waiting on answers to questions often causes i...,,MC,MAVR
QID289,Knowledge_8,I feel like I have the tools and/or resources ...,,MC,MAVR
QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR


In [None]:
# The 'qid' index is not unique, so a single label can return several rows (a DataFrame)
schema_df.loc['QID290']

qid,qname,question,force_resp,type,selector
QID290,Frequency,How frequently do you experience each of the f...,False,Matrix,Likert
QID290,Frequency_1,Needing help from people outside of your immed...,,MC,MAVR
QID290,Frequency_2,Interacting with people outside of your immedi...,,MC,MAVR
QID290,Frequency_3,Encountering knowledge silos (where one indivi...,,MC,MAVR
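The output above shows that an index may contain repeated labels. As a sketch on made-up schema rows, the return type of `loc` then depends on how many rows match. `Index.is_unique` reports whether duplicates exist, and `set_index(..., verify_integrity=True)` can be used to reject them.

```python
import pandas as pd

# Made-up schema rows; 'QID290' appears twice on purpose
schema = pd.DataFrame({
    'qid': ['QID1', 'QID290', 'QID290'],
    'qname': ['S1', 'Frequency_1', 'Frequency_2'],
}).set_index('qid')

print(schema.index.is_unique)

one = schema.loc['QID1']      # label occurs once -> Series
many = schema.loc['QID290']   # label occurs twice -> DataFrame

print(type(one).__name__, type(many).__name__)

# verify_integrity=True makes set_index reject duplicate labels
try:
    schema.reset_index().set_index('qid', verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True

print(duplicates_rejected)
```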


In [None]:
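# Sort rows by index label. The labels are strings, so the order is lexicographic (QID100 comes before QID51)
schema_df.sort_index()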

qid,qname,question,force_resp,type,selector
QID1,S1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
QID100,SOVisitFreq,How frequently would you say you visit Stack O...,False,MC,SAVR
QID101,SOAccount,Do you have a Stack Overflow account?,False,MC,SAVR
QID102,SOPartFreq,How frequently would you say you participate i...,False,MC,SAVR
QID106,SOComm,Do you consider yourself a member of the Stack...,False,MC,SAVR
...,...,...,...,...,...
QID51,CompTotal,What is your current total <b>annual</b> compe...,False,TE,SL
QID6,Country,"Where do you live? <span style=""font-weight: b...",True,MC,DL
QID61,S3,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
QID71,OpSys,What is the primary <b>operating system</b> in...,False,Matrix,Likert
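The ordering in the output above follows string comparison rather than numeric order. A small sketch with made-up rows makes this visible and also shows the `ascending=False` option. Like `set_index`, `sort_index` returns a new frame unless the result is assigned back.

```python
import pandas as pd

schema = pd.DataFrame(
    {'qname': ['Country', 'SOVisitFreq', 'S1']},
    index=pd.Index(['QID6', 'QID100', 'QID1'], name='qid'),
)

ascending = schema.sort_index()
descending = schema.sort_index(ascending=False)

print(list(ascending.index))
print(list(descending.index))
```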
