# Pandas Part 3

Link: https://youtu.be/W9XjRYFkkyw

In [1]:
import pandas as pd

In pandas, an index is a set of labels that identifies the rows (and sometimes the columns) of a DataFrame or Series. It is a fundamental part of both data structures and provides a convenient way to access and manipulate data.


- Index (in a DataFrame): The index holds the row labels, which allow easy access, alignment, and manipulation of data within the DataFrame. It can be thought of as an address for each row. It can be a simple range of integers, a set of labels, or multi-level (hierarchical).

- Index (in a Series): The index of a Series works the same way as in a DataFrame, but it applies to a single column of data: it labels each data point in the Series.
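
As a small illustration of the two bullets above, the sketch below selects one column from a DataFrame and checks that the resulting Series carries the same index. The frame `demo_df` and the labels `'r1'` and `'r2'` are made up for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical two-row frame; the labels 'r1' and 'r2' are invented for illustration.
demo_df = pd.DataFrame(
    {'first': ['John', 'Jane'], 'last': ['Doe', 'Smith']},
    index=['r1', 'r2'],
)

# Selecting a single column returns a Series that keeps the row labels.
first_names = demo_df['first']
shares_index = first_names.index.equals(demo_df.index)

# The same label addresses the same row in the Series.
jane = first_names.loc['r2']
print(shares_index, jane)
```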

Pandas doesn't enforce unique index values, but keeping them unique is considered good practice.
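
A minimal sketch of what a non-unique index looks like in practice, assuming a made-up Series in which two rows share the label `'a'`. It shows one reason unique labels are easier to work with: the same `.loc` call can return either a single value or several rows.

```python
import pandas as pd

# Hypothetical data: two rows share the label 'a'.
s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

# pandas accepts the duplicate labels; is_unique reports whether any exist.
has_unique_labels = s.index.is_unique

# A duplicated label returns every matching row as a Series...
rows_for_a = s.loc['a']

# ...while a unique label returns a single value.
value_for_b = s.loc['b']
print(has_unique_labels, len(rows_for_a), value_for_b)
```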

In [2]:
people = {
    'first': ['John', 'Jane', 'Jim'],
    'last': ['Doe', 'Smith', 'Brown'],
    'email': ['john.doe@example.com', 'jane.smith@example.com', 'jim.brown@example.com']
}

In [3]:
# Setting index when creating dataframe
people_df = pd.DataFrame(people, index=people['email']) 
people_df

,first,last,email
john.doe@example.com,John,Doe,john.doe@example.com
jane.smith@example.com,Jane,Smith,jane.smith@example.com
jim.brown@example.com,Jim,Brown,jim.brown@example.com


In [4]:
# Setting the index after creating the DataFrame
people_df = pd.DataFrame(people)
people_df.set_index('email', inplace=True)

In [5]:
people_df.index

Index(['john.doe@example.com', 'jane.smith@example.com',
       'jim.brown@example.com'],
      dtype='object', name='email')

In [6]:
people_df.loc['john.doe@example.com','last']

'Doe'
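
For comparison, the sketch below rebuilds the same `people` data and reads the same cell twice: once by label with `.loc` and once by integer position with `.iloc`. The variable names `last_by_label` and `last_by_position` are introduced here for illustration only.

```python
import pandas as pd

people = {
    'first': ['John', 'Jane', 'Jim'],
    'last': ['Doe', 'Smith', 'Brown'],
    'email': ['john.doe@example.com', 'jane.smith@example.com', 'jim.brown@example.com'],
}
people_df = pd.DataFrame(people).set_index('email')

# Label-based lookup: row label, then column label.
last_by_label = people_df.loc['john.doe@example.com', 'last']

# Position-based lookup: row 0, column 1. After set_index, the remaining
# columns are 'first' (position 0) and 'last' (position 1).
last_by_position = people_df.iloc[0, 1]
print(last_by_label, last_by_position)
```

Setting a custom index changes what `.loc` accepts, while `.iloc` keeps working with integer positions.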

In [7]:
# Resetting index
people_df.reset_index(inplace = True)
people_df

,email,first,last
0,john.doe@example.com,John,Doe
1,jane.smith@example.com,Jane,Smith
2,jim.brown@example.com,Jim,Brown
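
The cells above use `inplace=True`. As an alternative sketch, both `set_index` and `reset_index` return a new DataFrame when `inplace` is left at its default, so the original stays unchanged. The names `indexed` and `restored` are hypothetical.

```python
import pandas as pd

people = {
    'first': ['John', 'Jane', 'Jim'],
    'last': ['Doe', 'Smith', 'Brown'],
    'email': ['john.doe@example.com', 'jane.smith@example.com', 'jim.brown@example.com'],
}
people_df = pd.DataFrame(people)

# Without inplace=True, set_index returns a new DataFrame.
indexed = people_df.set_index('email')

# reset_index moves 'email' back to a column and restores the default 0..n-1 index.
restored = indexed.reset_index()

print(list(people_df.columns))  # the original frame still has 'email' as a column
print(list(restored.columns))
```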


In [8]:
path_df = 'data/survey_results_public.csv'
path_schema = 'data/survey_results_schema.csv'

# Display options: show at most 10 columns and 10 rows (pass None instead of 10 to show everything)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 10)

df = pd.read_csv(path_df, index_col='Respondent')  # Setting the index while reading the CSV
schema_df = pd.read_csv(path_schema, index_col='Column')
df.head()

Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,...,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",...,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",...,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,...,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,...,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,...,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


In [9]:
df.loc[1]

MainBranch                 I am a student who is learning to code
Hobbyist                                                      Yes
OpenSourcer                                                 Never
OpenSource      The quality of OSS and closed source software ...
Employment                 Not employed, and not looking for work
                                      ...                        
Sexuality                                 Straight / Heterosexual
Ethnicity                                                     NaN
Dependents                                                     No
SurveyLength                                Appropriate in length
SurveyEase                             Neither easy nor difficult
Name: 1, Length: 84, dtype: object
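
Note that `df.loc[1]` looks up the row whose `Respondent` label is 1, not the row at position 1. The sketch below uses a tiny in-memory CSV (made up for this example, not the survey file) to show that `index_col` gives the same result as calling `set_index` after reading, and that label 1 corresponds to position 0.

```python
import io
import pandas as pd

# Hypothetical stand-in for the survey CSV, kept in memory so the example is self-contained.
csv_text = "Respondent,Hobbyist\n1,Yes\n2,No\n"

# Set the index while reading...
at_read = pd.read_csv(io.StringIO(csv_text), index_col='Respondent')

# ...or read first and set the index afterwards.
after_read = pd.read_csv(io.StringIO(csv_text)).set_index('Respondent')
same_frame = at_read.equals(after_read)

# Label 1 is the first row, which sits at position 0.
by_label = at_read.loc[1]
by_position = at_read.iloc[0]
print(same_frame, by_label.equals(by_position))
```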

In [10]:
schema_df.head()

Column,QuestionText
Respondent,Randomized respondent ID number (not in order ...
MainBranch,Which of the following options best describes ...
Hobbyist,Do you code as a hobby?
OpenSourcer,How often do you contribute to open source?
OpenSource,How do you feel about the quality of open sour...


In [11]:
# Use .loc with a row label (the survey column name) to look up its question text
schema_df.loc['MgrIdiot','QuestionText']

'How confident are you that your manager knows what they’re doing?'

In [12]:
df['MgrIdiot'].value_counts()

MgrIdiot
Somewhat confident        25207
Very confident            24344
Not at all confident       9516
I don't have a manager     2092
Name: count, dtype: int64

In [13]:
# Sort the index alphabetically so the schema is easier to browse
schema_df.sort_index(ascending=True, inplace=True)
schema_df

Column,QuestionText
Age,What is your age (in years)? If you prefer not...
Age1stCode,At what age did you write your first line of c...
BetterLife,Do you think people born today will have a bet...
BlockchainIs,Blockchain / cryptocurrency technology is prim...
BlockchainOrg,How is your organization thinking about or imp...
...,...
WorkPlan,How structured or planned is your work?
WorkRemote,How often do you work remotely?
WorkWeekHrs,"On average, how many hours per week do you work?"
YearsCode,"Including any education, how many years have y..."
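
As a further sketch, `sort_index` also accepts `ascending=False`, and a sorted index supports label slicing. The small frame `schema_demo` below is a hypothetical stand-in for `schema_df`, built from three of the question texts shown earlier.

```python
import pandas as pd

# Hypothetical miniature of the schema, deliberately out of alphabetical order.
schema_demo = pd.DataFrame(
    {'QuestionText': [
        'How often do you work remotely?',
        'Do you code as a hobby?',
        'How often do you contribute to open source?',
    ]},
    index=pd.Index(['WorkRemote', 'Hobbyist', 'OpenSourcer'], name='Column'),
)

# Without inplace=True, sort_index returns a new, sorted DataFrame.
ascending = schema_demo.sort_index()
descending = schema_demo.sort_index(ascending=False)

# Label slices on a sorted index include both endpoints.
subset = ascending.loc['Hobbyist':'OpenSourcer']
print(list(ascending.index))
print(list(descending.index))
print(len(subset))
```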
